- Bengio and Frasconi, 1994
Bengio, Y. and Frasconi, P. (1994).
Credit assignment through time: Alternatives to backpropagation.
In Cowan, J. D., Tesauro, G., and Alspector, J., editors, Advances in Neural Information Processing Systems 6, pages 75-82. Morgan Kaufmann, San Mateo, CA.
- Bengio and Frasconi, 1995
Bengio, Y. and Frasconi, P. (1995).
An input output HMM architecture.
In Tesauro, G., Touretzky, D. S., and Leen, T. K., editors, Advances in Neural Information Processing Systems 7, pages 427-434. MIT Press, Cambridge, MA.
- Bengio et al., 1994
Bengio, Y., Simard, P., and Frasconi, P. (1994).
Learning long-term dependencies with gradient descent is difficult.
IEEE Transactions on Neural Networks, 5(2):157-166.
- El Hihi and Bengio, 1995
El Hihi, S. and Bengio, Y. (1995).
Hierarchical recurrent neural networks for long-term dependencies.
In Advances in Neural Information Processing Systems 8, pages 493-499. MIT Press, Cambridge, MA.
- Fu and Booth, 1975
Fu, K. S. and Booth, T. L. (1975).
Grammatical inference: Introduction and survey.
IEEE Transactions on Systems, Man, and Cybernetics, 5:95.
- Gallant, 1990
Gallant, S. I. (1990).
A connectionist learning algorithm with provable generalization and scaling bounds.
Neural Networks, 3:191-201.
- Hochreiter, 1991
Hochreiter, S. (1991).
Untersuchungen zu dynamischen neuronalen Netzen [Investigations of dynamic neural networks].
Diploma thesis, Institut für Informatik, Lehrstuhl Prof. Brauer, Technische Universität München.
See www7.informatik.tu-muenchen.de/~hochreit.
- Hochreiter and Schmidhuber, 1997a
Hochreiter, S. and Schmidhuber, J. (1997a).
Long Short-Term Memory.
Neural Computation, 9(8):1735-1780.
- Hochreiter and Schmidhuber, 1997b
Hochreiter, S. and Schmidhuber, J. (1997b).
Flat minima.
Neural Computation, 9(1):1-42.
- Hochreiter and Schmidhuber, 1997c
Hochreiter, S. and Schmidhuber, J. (1997c).
LSTM can solve hard long time lag problems.
In Mozer, M. C., Jordan, M. I., and Petsche, T., editors, Advances in Neural Information Processing Systems 9, pages 473-479. MIT Press, Cambridge, MA.
- Jordan et al., 1997
Jordan, M., Ghahramani, Z., and Saul, L. (1997).
Hidden Markov decision trees.
In Advances in Neural Information Processing Systems 9. MIT Press, Cambridge, MA.
- Lang, 1996
Lang, K. J. (1996).
Random DFA's can be approximately learned from sparse uniform examples.
In Proceedings of the Fifth ACM Workshop on Computational Learning Theory.
- Lin et al., 1995
Lin, T., Horne, B. G., Tino, P., and Giles, C. L. (1995).
Learning long-term dependencies is not as difficult with NARX recurrent neural networks.
Technical Report UMIACS-TR-95-78 and CS-TR-3500, Institute for Advanced Computer Studies, University of Maryland, College Park, MD 20742.
- Manolios and Fanelli, 1994
Manolios, P. and Fanelli, R. (1994).
First-order recurrent neural networks and deterministic finite state automata.
Neural Computation, 6:1155-1173.
- Miller and Giles, 1993
Miller, C. B. and Giles, C. L. (1993).
Experimental comparison of the effect of order in recurrent neural networks.
International Journal of Pattern Recognition and Artificial Intelligence, 7(4):849-872.
- Mozer, 1992
Mozer, M. C. (1992).
Induction of multiscale temporal structure.
In Moody, J. E., Hanson, S. J., and Lippmann, R. P., editors, Advances in Neural Information Processing Systems 4, pages 275-282. Morgan Kaufmann, San Mateo, CA.
- Pearlmutter, 1995
Pearlmutter, B. A. (1995).
Gradient calculations for dynamic recurrent neural networks: A survey.
IEEE Transactions on Neural Networks, 6(5):1212-1228.
- Pollack, 1991
Pollack, J. B. (1991).
The induction of dynamical recognizers.
Machine Learning, 7:227-252.
- Saul and Jordan, 1996
Saul, L. and Jordan, M. (1996).
Exploiting tractable substructures in intractable networks.
In Mozer, M., Touretzky, D., and Perrone, M., editors, Advances in Neural Information Processing Systems 8. MIT Press, Cambridge, MA.
- Schmidhuber, 1992
Schmidhuber, J. (1992).
Learning complex, extended sequences using the principle of history compression.
Neural Computation, 4(2):234-242.
- Schmidhuber, 1997
Schmidhuber, J. (1997).
Discovering neural nets with low Kolmogorov complexity and high generalization capability.
Neural Networks, 10(5):857-873.
- Tomita, 1982
Tomita, M. (1982).
Dynamic construction of finite automata from examples using hill-climbing.
In Proceedings of the Fourth Annual Cognitive Science Conference, pages 105-108. Ann Arbor, MI.
- Watrous and Kuhn, 1992
Watrous, R. L. and Kuhn, G. M. (1992).
Induction of finite-state languages using second-order recurrent
networks.
Neural Computation, 4:406-414.
- Williams, 1989
Williams, R. J. (1989).
Complexity of exact gradient computation algorithms for recurrent neural networks.
Technical Report NU-CCS-89-27, Northeastern University, College of Computer Science, Boston, MA.
Juergen Schmidhuber
2003-02-19