Schmidhuber:94self
J. Schmidhuber.
On learning how to learn learning strategies.
Technical Report FKI-198-94, Fakultät für Informatik,
Technische Universität München, 1994.
Revised 1995.
Wolpert:96
D. H. Wolpert.
The lack of a priori distinctions between learning algorithms.
Neural Computation, 8(7):1341--1390, 1996.
Schmidhuber:97nn
J. Schmidhuber.
Discovering neural nets with low Kolmogorov complexity and high
generalization capability.
Neural Networks, 10(5):857--873, 1997.
Utgoff:86
P. Utgoff.
Shift of bias for inductive concept learning.
In R. Michalski, J. Carbonell, and T. Mitchell, editors, Machine
Learning, volume 2, pages 163--190. Morgan Kaufmann, Los Altos, CA, 1986.
Solomonoff:64
R.J. Solomonoff.
A formal theory of inductive inference. Part I.
Information and Control, 7:1--22, 1964.
Kolmogorov:65
A.N. Kolmogorov.
Three approaches to the quantitative definition of information.
Problems of Information Transmission, 1:1--11, 1965.
Chaitin:69
G.J. Chaitin.
On the length of programs for computing finite binary sequences:
statistical considerations.
Journal of the ACM, 16:145--159, 1969.
LiVitanyi:93
M. Li and P. M. B. Vitányi.
An Introduction to Kolmogorov Complexity and its
Applications.
Springer, 1993.
Schmidhuber:95kol
J. Schmidhuber.
Discovering solutions with low Kolmogorov complexity and high
generalization capability.
In A. Prieditis and S. Russell, editors, Machine Learning:
Proceedings of the Twelfth International Conference, pages 488--496. Morgan
Kaufmann Publishers, San Francisco, CA, 1995.
Wiering:96levin
M.A. Wiering and J. Schmidhuber.
Solving POMDPs with Levin search and EIRA.
In L. Saitta, editor, Machine Learning: Proceedings of the
Thirteenth International Conference, pages 534--542. Morgan Kaufmann
Publishers, San Francisco, CA, 1996.
Levin:73
L. A. Levin.
Universal sequential search problems.
Problems of Information Transmission, 9(3):265--266, 1973.
Levin:84
L. A. Levin.
Randomness conservation inequalities: Information and independence in
mathematical theories.
Information and Control, 61:15--37, 1984.
Solomonoff:86
R.J. Solomonoff.
An application of algorithmic probability to problems in artificial
intelligence.
In L. N. Kanal and J. F. Lemmer, editors, Uncertainty in
Artificial Intelligence, pages 473--491. Elsevier Science Publishers, 1986.
Watanabe:92
O. Watanabe, editor.
Kolmogorov Complexity and Computational Complexity.
EATCS Monographs on Theoretical Computer Science. Springer, 1992.
Schmidhuber:94kol
J. Schmidhuber.
Discovering problem solutions with low Kolmogorov complexity and
high generalization capability.
Technical Report FKI-194-94, Fakultät für Informatik,
Technische Universität München, 1994.
Short version in A. Prieditis and S. Russell, eds., Machine Learning:
Proceedings of the Twelfth International Conference, Morgan Kaufmann
Publishers, pages 488--496, San Francisco, CA, 1995.
Caruana:95
R. Caruana, D. L. Silver, J. Baxter, T. M. Mitchell, L. Y. Pratt, and S. Thrun.
Learning to learn: knowledge consolidation and transfer in inductive
systems, 1995.
Workshop held at NIPS-95, Vail, CO, see
http://www.cs.cmu.edu/afs/user/caruana/pub/transfer.html.
Schmidhuber:87
J. Schmidhuber.
Evolutionary principles in self-referential learning, or on learning
how to learn: the meta-meta-... hook.
Diploma thesis, Institut für Informatik, Technische Universität München, 1987.
Schmidhuber:93selfreficann
J. Schmidhuber.
A self-referential weight matrix.
In Proceedings of the International Conference on Artificial
Neural Networks, Amsterdam, pages 446--451. Springer, 1993.
Whitehead:90
S.D. Whitehead and D. H. Ballard.
Active perception and reinforcement learning.
Neural Computation, 2(4):409--419, 1990.
Schmidhuber:91nips
J. Schmidhuber.
Reinforcement learning in Markovian and non-Markovian
environments.
In R. P. Lippmann, J. E. Moody, and D. S. Touretzky, editors,
Advances in Neural Information Processing Systems 3, pages 500--506. Morgan
Kaufmann, San Mateo, CA, 1991.
Lin:93
L.J. Lin.
Reinforcement Learning for Robots Using Neural Networks.
PhD thesis, Carnegie Mellon University, Pittsburgh, January 1993.
Ring:94
M. B. Ring.
Continual Learning in Reinforcement Environments.
PhD thesis, University of Texas at Austin, Austin, Texas 78712,
August 1994.
Littman:94
M. L. Littman.
Memoryless policies: Theoretical limitations and practical results.
In D. Cliff, P. Husbands, J. A. Meyer, and S. W. Wilson, editors,
Proceedings of the International Conference on Simulation of Adaptive
Behavior: From Animals to Animats 3, pages 297--305. MIT Press/Bradford
Books, 1994.
Cliff:94
D. Cliff and S. Ross.
Adding temporary memory to ZCS.
Adaptive Behavior, 3:101--150, 1994.
Chrisman:92
L. Chrisman.
Reinforcement learning with perceptual aliasing: The perceptual
distinctions approach.
In Proceedings of the Tenth National Conference on Artificial
Intelligence, pages 183--188. AAAI Press, San Jose, California,
1992.
Jaakkola:95
T. Jaakkola, S. P. Singh, and M. I. Jordan.
Reinforcement learning algorithm for partially observable Markov
decision problems.
In G. Tesauro, D. S. Touretzky, and T. K. Leen, editors,
Advances in Neural Information Processing Systems 7, pages 345--352. MIT
Press, Cambridge MA, 1995.
Kaelbling:95
L.P. Kaelbling, M.L. Littman, and A.R. Cassandra.
Planning and acting in partially observable stochastic domains.
Technical report, Brown University, Providence RI, 1995.
McCallum:95
R. A. McCallum.
Instance-based utile distinctions for reinforcement learning with
hidden state.
In A. Prieditis and S. Russell, editors, Machine Learning:
Proceedings of the Twelfth International Conference, pages 387--395. Morgan
Kaufmann Publishers, San Francisco, CA, 1995.
Wiering:96hq
M. Wiering and J. Schmidhuber.
HQ-Learning: Discovering Markovian subgoals for non-Markovian
reinforcement learning.
Technical Report IDSIA-95-96, IDSIA, 1996.
Russell:91
S. Russell and E. Wefald.
Principles of Metareasoning.
Artificial Intelligence, 49:361--395, 1991.
Boddy:94
M. Boddy and T. L. Dean.
Deliberation scheduling for problem solving in time-constrained
environments.
Artificial Intelligence, 67:245--285, 1994.
Berry:85
D. A. Berry and B. Fristedt.
Bandit Problems: Sequential Allocation of Experiments.
Chapman and Hall, London, 1985.
Gittins:89
J. C. Gittins.
Multi-armed Bandit Allocation Indices.
Wiley-Interscience series in systems and optimization. Wiley,
Chichester, NY, 1989.
Greiner:96
R. Greiner.
PALO: A probabilistic hill-climbing algorithm.
Artificial Intelligence, 83(2), 1996.
Kumar:86
P. R. Kumar and P. Varaiya.
Stochastic Systems: Estimation, Identification, and Adaptive
Control.
Prentice Hall, 1986.
Sutton:88
R. S. Sutton.
Learning to predict by the methods of temporal differences.
Machine Learning, 3:9--44, 1988.
WatkinsDayan:92
C. J. C. H. Watkins and P. Dayan.
Q-learning.
Machine Learning, 8:279--292, 1992.
Crites:96
R.H. Crites and A.G. Barto.
Improving elevator performance using reinforcement learning.
In D.S. Touretzky, M.C. Mozer, and M.E. Hasselmo, editors,
Advances in Neural Information Processing Systems 8, pages 1017--1023,
Cambridge MA, 1996. MIT Press.
Lenat:83
D. Lenat.
Theory formation by heuristic search.
Artificial Intelligence, 21, 1983.
SOAR:93
P. S. Rosenbloom, J. E. Laird, and A. Newell.
The SOAR Papers.
MIT Press, 1993.
Schmidhuber:93selfrefieee
J. Schmidhuber.
A neural network that embeds its own meta-levels.
In Proceedings of the International Conference on Neural Networks '93,
San Francisco. IEEE, 1993.
Jieyu:96self
J. Zhao and J. Schmidhuber.
Incremental self-improvement for life-time multi-agent reinforcement
learning.
In P. Maes, M. Mataric, J.-A. Meyer, J. Pollack, and
S. W. Wilson, editors, From Animals to Animats 4: Proceedings of
the Fourth International Conference on Simulation of Adaptive Behavior,
Cambridge, MA, pages 516--525. MIT Press, Bradford Books, 1996.