C. W. Anderson.
Learning and Problem Solving with Multilayer Connectionist Systems.
PhD thesis, University of Massachusetts, Dept. of Comp. and Inf. Sci., 1986.

M. I. Jordan.
Supervised learning and systems with excess degrees of freedom.
Technical Report COINS TR 88-27, Massachusetts Institute of Technology, 1988.

M. I. Jordan and R. A. Jacobs.
Learning to control an unstable system with forward modeling.
In Proc. of the 1990 Connectionist Models Summer School, in press. Morgan Kaufmann, 1990.

S. W. Piché.
Draft: First order gradient descent training of adaptive discrete time dynamic networks.
Technical report, Dept. of Electrical Engineering, Stanford University, 1990.

A. J. Robinson and F. Fallside.
The utility driven dynamic error propagation network.
Technical Report CUED/F-INFENG/TR.1, Cambridge University Engineering Department, 1987.

T. Robinson and F. Fallside.
Dynamic reinforcement driven error propagation networks with application to game playing.
In Proceedings of the 11th Conference of the Cognitive Science Society, Ann Arbor, pages 836-843, 1989.

J. Schmidhuber.
Making the world differentiable: On using fully recurrent self-supervised neural networks for dynamic reinforcement learning and planning in non-stationary environments.
Technical Report FKI-126-90 (revised), Institut für Informatik, Technische Universität München, November 1990.
(Revised and extended version of an earlier report from February.)

J. Schmidhuber.
Networks adjusting networks.
In J. Kindermann and A. Linden, editors, Proceedings of 'Distributed Adaptive Neural Information Processing', St. Augustin, 24-25 May 1989, pages 197-208. Oldenbourg, 1990.
In November 1990 a revised and extended version appeared as FKI-Report FKI-125-90 (revised) at the Institut für Informatik, Technische Universität München.

J. Schmidhuber.
Towards compositional learning with dynamic neural networks.
Technical Report FKI-129-90, Institut für Informatik, Technische Universität München, 1990.

R. S. Sutton.
Learning to predict by the methods of temporal differences.
Machine Learning, 3:9-44, 1988.

P. J. Werbos.
Building and understanding adaptive systems: A statistical/numerical approach to factory automation and brain research.
IEEE Transactions on Systems, Man, and Cybernetics, 17, 1987.

R. J. Williams.
On the use of backpropagation in associative reinforcement learning.
In IEEE International Conference on Neural Networks, San Diego, volume 2, pages 263-270, 1988.

R. J. Williams and D. Zipser.
Experimental analysis of the real-time recurrent learning algorithm.
Connection Science, 1(1):87-111, 1989.

Juergen Schmidhuber 2003-02-25