Our previous experimental comparisons (on widely used benchmark problems) with RTRL (e.g., [15]; results compared to those in [17]), Recurrent Cascade-Correlation [5], Elman nets (results compared to those in [4]), and Neural Sequence Chunking [16] demonstrated that LSTM leads to many more successful runs than its competitors and learns much faster [8]. The following tasks, though, are more difficult than the above benchmark problems: they cannot be solved at all in reasonable time by RS (we tried various architectures), nor by any other recurrent net learning algorithm we are aware of (see [13] for an overview).
In the experiments below, gate units (in_j, out_j) and output units are sigmoid with range [0,1], h is sigmoid with range [-1,1], and g is sigmoid with range [-2,2].
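The squashing functions with the ranges stated above can be obtained by scaling and shifting the standard logistic sigmoid. A minimal sketch, assuming the usual construction (gates and output units use the plain logistic with range [0,1]; g and h are the logistic rescaled to [-2,2] and [-1,1], respectively; the function names follow the text):

```python
import math

def sigmoid(x):
    """Standard logistic sigmoid, range (0, 1) -- used for gate and output units."""
    return 1.0 / (1.0 + math.exp(-x))

def g(x):
    """Cell input squashing function, rescaled to range (-2, 2)."""
    return 4.0 * sigmoid(x) - 2.0

def h(x):
    """Cell output squashing function, rescaled to range (-1, 1)."""
    return 2.0 * sigmoid(x) - 1.0
```

All three are centered at the origin in the sense that g(0) = h(0) = 0 and sigmoid(0) = 0.5, and they saturate at the endpoints of their stated ranges for large |x|.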