Previous approaches. In recent years numerous methods have been proposed to address this problem. For instance, Schmidhuber (1992) suggested compressing regular sequences with the chunking algorithm. Hochreiter's ideas (1991) led to the recent LSTM algorithm (Hochreiter and Schmidhuber, 1997a). Bengio et al. (1994) investigated simulated annealing, multi-grid random search, time-weighted pseudo-Newton optimization, and discrete error propagation, evaluating their performance in learning long-term dependencies. Bengio and Frasconi (1994) also proposed an EM approach for propagating targets.
Previous benchmarks. Several researchers use the 2-sequence problem (and latch problem) to compare various algorithms, e.g., Bengio et al. (1994), Bengio and Frasconi (1994), El Hihi and Bengio (1995), Lin et al. (1995). For the same purpose, some also use the parity problem, e.g., Bengio et al. (1994), Bengio and Frasconi (1994). Some of Tomita's grammars (1982) are also often used as benchmark problems for recurrent nets (see, e.g., Bengio and Frasconi, 1995; Watrous and Kuhn, 1992; Pollack, 1991; Miller and Giles, 1993; Manolios and Fanelli, 1994).
Overview. In this paper we will show that some instances of such problems can be solved much more quickly by random weight guessing (RG). In other cases, however, RG does not perform well. This paper's purpose is not to suggest that RG is a good algorithm for training recurrent networks. Instead it intends to contribute to understanding the significance and shortcomings of RG in the context of algorithms for long time lag problems. It also suggests that RG should be used as a first test to evaluate a benchmark problem's difficulty.
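RG in the sense used here simply resamples all network weights at random until some success criterion is met; no gradient information is used. A minimal sketch of this loop follows. The toy task (a single threshold unit learning the AND function), the uniform sampling interval, and the helper names are our own illustrative assumptions, not the paper's experimental setup:

```python
import random

def solves_and(w):
    """Check whether a threshold unit with weights (w1, w2, bias) computes AND."""
    w1, w2, b = w
    for x1 in (0, 1):
        for x2 in (0, 1):
            out = 1 if w1 * x1 + w2 * x2 + b > 0 else 0
            if out != (x1 & x2):
                return False
    return True

def random_guess(solved, sample, max_trials=100_000, seed=0):
    """Random weight guessing: resample all weights until the task is solved.

    Returns (weights, trial_count) on success, or (None, max_trials) on failure.
    """
    rng = random.Random(seed)
    for trial in range(1, max_trials + 1):
        w = sample(rng)
        if solved(w):
            return w, trial
    return None, max_trials

# Sample each weight uniformly from [-1, 1]; the interval is an arbitrary
# choice for this sketch (the paper's experiments use their own intervals).
w, trials = random_guess(solves_and,
                         lambda rng: [rng.uniform(-1, 1) for _ in range(3)])
```

The trial count `trials` is exactly the quantity of interest when RG is used as a difficulty probe: on an easy problem a solution region occupies a non-negligible fraction of weight space and few guesses are needed, while a hard problem drives the expected number of guesses up sharply.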