Assume that the alphabet contains $k$ possible characters $z_1, z_2, \dots, z_k$. The (local) representation of $z_i$ is a binary $k$-dimensional vector $r(z_i)$ with exactly one non-zero component (at the $i$-th position). $P$ has $nk$ input units and $k$ output units; $n$ is called the ``time-window'' size. We insert $n$ default characters $z_0$ at the beginning of each file. The representation of the default character, $r(z_0)$, is the $k$-dimensional zero vector. The $m$-th character of file $f$ (starting from the first default character) is called $c^f_m$.
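A minimal Python sketch of this local encoding and of the time-window input it implies. It assumes that characters are indexed 1 to k (the alphabet size) in alphabet order, that index 0 stands for the default character, and that n is the time-window size. The helper names `one_hot`, `encode_file` and `input_windows` are hypothetical illustrations, not part of the original system.

```python
from typing import List, Sequence, Tuple


def one_hot(index: int, k: int) -> List[float]:
    """Local representation: a k-dim vector with a single 1 at 1-based position `index`.
    Index 0 (assumed to denote the default character) maps to the all-zero vector."""
    vec = [0.0] * k
    if index > 0:
        vec[index - 1] = 1.0
    return vec


def encode_file(text: str, alphabet: Sequence[str], n: int) -> List[int]:
    """Map a file to indices 1..k, preceded by n default characters (index 0)."""
    lookup = {ch: i + 1 for i, ch in enumerate(alphabet)}
    return [0] * n + [lookup[ch] for ch in text]


def input_windows(text: str, alphabet: Sequence[str], n: int) -> List[Tuple[List[float], int]]:
    """Return (input_vector, target_index) pairs, one per real character.
    Each input concatenates the one-hot codes of the n preceding characters,
    giving n * k components."""
    k = len(alphabet)
    seq = encode_file(text, alphabet, n)
    pairs = []
    # 0-based positions here: seq[n] is the first real character of the file.
    for m in range(n, len(seq)):
        x: List[float] = []
        for idx in seq[m - n:m]:
            x.extend(one_hot(idx, k))
        pairs.append((x, seq[m]))
    return pairs


if __name__ == "__main__":
    for x, target in input_windows("ab", "ab", n=2):
        print(x, target)
```

With alphabet `"ab"` and n = 2, the first input is the all-zero vector (two default characters), so the first real character of every file is predicted from padding alone.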
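For all $f \in F$ and all possible $m > n$, $P$ receives as an input the concatenation of the representations of the $n$ characters preceding $c^f_m$:
$$
r(c^f_{m-n}) \circ r(c^f_{m-n+1}) \circ \dots \circ r(c^f_{m-1}),
$$
where $\circ$ denotes vector concatenation.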
In practical applications, the $k$ output activations of $P$ will not always sum to 1. To obtain outputs that satisfy the properties of a proper probability distribution, we normalize by defining