
Bibliography

Bengio and Frasconi, 1994
Bengio, Y. and Frasconi, P. (1994).
Credit assignment through time: Alternatives to backpropagation.
In Cowan, J. D., Tesauro, G., and Alspector, J., editors, Advances in Neural Information Processing Systems 6, pages 75-82. Morgan Kaufmann, San Mateo, CA.

Bengio and Frasconi, 1995
Bengio, Y. and Frasconi, P. (1995).
An input output HMM architecture.
In Tesauro, G., Touretzky, D. S., and Leen, T. K., editors, Advances in Neural Information Processing Systems 7, pages 427-434. MIT Press, Cambridge, MA.

Bengio et al., 1994
Bengio, Y., Simard, P., and Frasconi, P. (1994).
Learning long-term dependencies with gradient descent is difficult.
IEEE Transactions on Neural Networks, 5(2):157-166.

El Hihi and Bengio, 1995
El Hihi, S. and Bengio, Y. (1995).
Hierarchical recurrent neural networks for long-term dependencies.
In Touretzky, D. S., Mozer, M. C., and Hasselmo, M. E., editors, Advances in Neural Information Processing Systems 8, pages 493-499. MIT Press, Cambridge, MA.

Fu and Booth, 1975
Fu, K. S. and Booth, T. L. (1975).
Grammatical inference: Introduction and survey.
IEEE Transactions on Systems, Man, and Cybernetics, 5:95.

Gallant, 1990
Gallant, S. I. (1990).
A connectionist learning algorithm with provable generalization and scaling bounds.
Neural Networks, 3:191-201.

Hochreiter, 1991
Hochreiter, J. (1991).
Untersuchungen zu dynamischen neuronalen Netzen. Diploma thesis, Institut für Informatik, Lehrstuhl Prof. Brauer, Technische Universität München.
See www7.informatik.tu-muenchen.de/~hochreit.

Hochreiter and Schmidhuber, 1997a
Hochreiter, S. and Schmidhuber, J. (1997a).
Long Short-Term Memory.
Neural Computation, 9(8):1735-1780.

Hochreiter and Schmidhuber, 1997b
Hochreiter, S. and Schmidhuber, J. (1997b).
Flat minima.
Neural Computation, 9(1):1-42.

Hochreiter and Schmidhuber, 1997c
Hochreiter, S. and Schmidhuber, J. (1997c).
LSTM can solve hard long time lag problems.
In Mozer, M. C., Jordan, M. I., and Petsche, T., editors, Advances in Neural Information Processing Systems 9, pages 473-479. MIT Press, Cambridge, MA.

Jordan et al., 1997
Jordan, M., Ghahramani, Z., and Saul, L. (1997).
Hidden Markov decision trees.
In Mozer, M. C., Jordan, M. I., and Petsche, T., editors, Advances in Neural Information Processing Systems 9. MIT Press, Cambridge, MA.

Lang, 1996
Lang, K. J. (1996).
Random DFA's can be approximately learned from sparse uniform examples.
In Proceedings of the Fifth ACM Workshop on Computational Learning Theory.

Lin et al., 1995
Lin, T., Horne, B. G., Tino, P., and Giles, C. L. (1995).
Learning long-term dependencies is not as difficult with NARX recurrent neural networks.
Technical Report UMIACS-TR-95-78 and CS-TR-3500, Institute for Advanced Computer Studies, University of Maryland, College Park, MD 20742.

Manolios and Fanelli, 1994
Manolios, P. and Fanelli, R. (1994).
First-order recurrent neural networks and deterministic finite state automata.
Neural Computation, 6:1155-1173.

Miller and Giles, 1993
Miller, C. B. and Giles, C. L. (1993).
Experimental comparison of the effect of order in recurrent neural networks.
International Journal of Pattern Recognition and Artificial Intelligence, 7(4):849-872.

Mozer, 1992
Mozer, M. C. (1992).
Induction of multiscale temporal structure.
In Moody, J. E., Hanson, S. J., and Lippmann, R. P., editors, Advances in Neural Information Processing Systems 4, pages 275-282. Morgan Kaufmann, San Mateo, CA.

Pearlmutter, 1995
Pearlmutter, B. A. (1995).
Gradient calculations for dynamic recurrent neural networks: A survey.
IEEE Transactions on Neural Networks, 6(5):1212-1228.

Pollack, 1991
Pollack, J. B. (1991).
The induction of dynamical recognizers.
Machine Learning, 7:227-252.

Saul and Jordan, 1996
Saul, L. and Jordan, M. (1996).
Exploiting tractable substructures in intractable networks.
In Touretzky, D. S., Mozer, M. C., and Hasselmo, M. E., editors, Advances in Neural Information Processing Systems 8. MIT Press, Cambridge, MA.

Schmidhuber, 1992
Schmidhuber, J. (1992).
Learning complex, extended sequences using the principle of history compression.
Neural Computation, 4(2):234-242.

Schmidhuber, 1997
Schmidhuber, J. (1997).
Discovering neural nets with low Kolmogorov complexity and high generalization capability.
Neural Networks, 10(5):857-873.

Tomita, 1982
Tomita, M. (1982).
Dynamic construction of finite automata from examples using hill-climbing.
In Proceedings of the Fourth Annual Cognitive Science Conference, pages 105-108. Ann Arbor, MI.

Watrous and Kuhn, 1992
Watrous, R. L. and Kuhn, G. M. (1992).
Induction of finite-state languages using second-order recurrent networks.
Neural Computation, 4:406-414.

Williams, 1989
Williams, R. J. (1989).
Complexity of exact gradient computation algorithms for recurrent neural networks.
Technical Report NU-CCS-89-27, Northeastern University, College of Computer Science, Boston, MA.


