With the modified automaton, it turns out that conventional recurrent networks **fail to learn** the correct classifications within training sequences (various topologies were tested). Two related reasons are:

- Time lags are too long (error signals become less significant as they are propagated "back in time"). See, e.g., [7].
- The presumed search space is huge (in principle, the recurrent net treats all possible symbol combinations as equally plausible candidates for being the reason for the final classification).
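
The first reason can be illustrated with a small numerical sketch. It is not the setup used in the experiments: it assumes a single self-recurrent logistic unit held at a constant net input, and the names `backpropagated_error_scale`, `weight`, `net_input`, and `lag` are hypothetical. Under these assumptions, an error signal sent back through `lag` time steps is multiplied by the same per-step factor `weight * f'(net_input)` at each step, so it shrinks exponentially whenever that factor has magnitude below 1.

```python
import math


def sigmoid(x):
    """Logistic activation function."""
    return 1.0 / (1.0 + math.exp(-x))


def backpropagated_error_scale(weight, net_input, lag):
    """Factor by which an error signal is scaled after travelling `lag`
    steps back through one self-recurrent logistic unit.

    Assumption (illustrative only): the unit's net input is the same at
    every time step, so each step contributes the same factor
    weight * f'(net_input), with f'(x) = f(x) * (1 - f(x)).
    """
    s = sigmoid(net_input)
    per_step = weight * s * (1.0 - s)
    return per_step ** lag


# Hypothetical values: weight 1.0 and net input 0.0, where f' is maximal (0.25).
lags = (1, 5, 10, 20)
scales = [backpropagated_error_scale(1.0, 0.0, lag) for lag in lags]

for lag, scale in zip(lags, scales):
    print(f"lag {lag:2d}: error scaled by {scale:.3e}")
```

Since the derivative of the logistic function never exceeds 0.25, in this simplified setting the per-step factor stays below 1 in magnitude for any weight smaller than 4 in magnitude, which is why long time lags leave almost no error signal to learn from.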

But the "real" search space ought to be small, because most possible symbol combinations can never occur. The modification of the automaton did not change the entropy. How can an adaptive system discover this? The next section gives an answer.

Juergen Schmidhuber 2003-02-19