


Schmidhuber:94self J. Schmidhuber. On learning how to learn learning strategies. Technical Report FKI-198-94, Fakultät für Informatik, Technische Universität München, 1994. Revised 1995.

Wolpert:96 D. H. Wolpert. The lack of a priori distinctions between learning algorithms. Neural Computation, 8(7):1341--1390, 1996.

Schmidhuber:97nn J. Schmidhuber. Discovering neural nets with low Kolmogorov complexity and high generalization capability. Neural Networks, 10(5):857--873, 1997.

Utgoff:86 P. Utgoff. Shift of bias for inductive concept learning. In R. Michalski, J. Carbonell, and T. Mitchell, editors, Machine Learning: An Artificial Intelligence Approach, volume 2, pages 163--190. Morgan Kaufmann, Los Altos, CA, 1986.

Solomonoff:64 R.J. Solomonoff. A formal theory of inductive inference. Part I. Information and Control, 7:1--22, 1964.

Kolmogorov:65 A.N. Kolmogorov. Three approaches to the quantitative definition of information. Problems of Information Transmission, 1:1--11, 1965.

Chaitin:69 G.J. Chaitin. On the length of programs for computing finite binary sequences: statistical considerations. Journal of the ACM, 16:145--159, 1969.

LiVitanyi:93 M. Li and P. M. B. Vitányi. An Introduction to Kolmogorov Complexity and Its Applications. Springer, 1993.

Schmidhuber:95kol J. Schmidhuber. Discovering solutions with low Kolmogorov complexity and high generalization capability. In A. Prieditis and S. Russell, editors, Machine Learning: Proceedings of the Twelfth International Conference, pages 488--496. Morgan Kaufmann Publishers, San Francisco, CA, 1995.

Wiering:96levin M.A. Wiering and J. Schmidhuber. Solving POMDPs with Levin search and EIRA. In L. Saitta, editor, Machine Learning: Proceedings of the Thirteenth International Conference, pages 534--542. Morgan Kaufmann Publishers, San Francisco, CA, 1996.

Levin:73 L. A. Levin. Universal sequential search problems. Problems of Information Transmission, 9(3):265--266, 1973.

Levin:84 L. A. Levin. Randomness conservation inequalities: Information and independence in mathematical theories. Information and Control, 61:15--37, 1984.

Solomonoff:86 R.J. Solomonoff. An application of algorithmic probability to problems in artificial intelligence. In L. N. Kanal and J. F. Lemmer, editors, Uncertainty in Artificial Intelligence, pages 473--491. Elsevier Science Publishers, 1986.

Watanabe:92 O. Watanabe, editor. Kolmogorov Complexity and Computational Complexity. EATCS Monographs on Theoretical Computer Science. Springer, 1992.

Schmidhuber:94kol J. Schmidhuber. Discovering problem solutions with low Kolmogorov complexity and high generalization capability. Technical Report FKI-194-94, Fakultät für Informatik, Technische Universität München, 1994. Short version in A. Prieditis and S. Russell, eds., Machine Learning: Proceedings of the Twelfth International Conference, Morgan Kaufmann Publishers, pages 488--496, San Francisco, CA, 1995.

Caruana:95 R. Caruana, D. L. Silver, J. Baxter, T. M. Mitchell, L. Y. Pratt, and S. Thrun. Learning to learn: knowledge consolidation and transfer in inductive systems, 1995. Workshop held at NIPS-95, Vail, CO, see http://www.cs.cmu.edu/afs/user/caruana/pub/transfer.html.

Schmidhuber:87 J. Schmidhuber. Evolutionary principles in self-referential learning, or on learning how to learn: the meta-meta-... hook. Diploma thesis, Institut für Informatik, Technische Universität München, 1987.

Schmidhuber:93selfreficann J. Schmidhuber. A self-referential weight matrix. In Proceedings of the International Conference on Artificial Neural Networks, Amsterdam, pages 446--451. Springer, 1993.

Whitehead:90 S.D. Whitehead and D. H. Ballard. Active perception and reinforcement learning. Neural Computation, 2(4):409--419, 1990.

Schmidhuber:91nips J. Schmidhuber. Reinforcement learning in Markovian and non-Markovian environments. In R. P. Lippmann, J. E. Moody, and D. S. Touretzky, editors, Advances in Neural Information Processing Systems 3, pages 500--506. Morgan Kaufmann, San Mateo, CA, 1991.

Lin:93 L.J. Lin. Reinforcement Learning for Robots Using Neural Networks. PhD thesis, Carnegie Mellon University, Pittsburgh, January 1993.

Ring:94 M. B. Ring. Continual Learning in Reinforcement Environments. PhD thesis, University of Texas at Austin, August 1994.

Littman:94 M. L. Littman. Memoryless policies: Theoretical limitations and practical results. In D. Cliff, P. Husbands, J.-A. Meyer, and S. W. Wilson, editors, Proceedings of the International Conference on Simulation of Adaptive Behavior: From Animals to Animats 3, pages 297--305. MIT Press/Bradford Books, 1994.

Cliff:94 D. Cliff and S. Ross. Adding temporary memory to ZCS. Adaptive Behavior, 3:101--150, 1994.

Chrisman:92 L. Chrisman. Reinforcement learning with perceptual aliasing: The perceptual distinctions approach. In Proceedings of the Tenth National Conference on Artificial Intelligence (AAAI-92), pages 183--188. AAAI Press, San Jose, California, 1992.

Jaakkola:95 T. Jaakkola, S. P. Singh, and M. I. Jordan. Reinforcement learning algorithm for partially observable Markov decision problems. In G. Tesauro, D. S. Touretzky, and T. K. Leen, editors, Advances in Neural Information Processing Systems 7, pages 345--352. MIT Press, Cambridge MA, 1995.

Kaelbling:95 L.P. Kaelbling, M.L. Littman, and A.R. Cassandra. Planning and acting in partially observable stochastic domains. Technical report, Brown University, Providence RI, 1995.

McCallum:95 R. A. McCallum. Instance-based utile distinctions for reinforcement learning with hidden state. In A. Prieditis and S. Russell, editors, Machine Learning: Proceedings of the Twelfth International Conference, pages 387--395. Morgan Kaufmann Publishers, San Francisco, CA, 1995.

Wiering:96hq M. Wiering and J. Schmidhuber. HQ-Learning: Discovering Markovian subgoals for non-Markovian reinforcement learning. Technical Report IDSIA-95-96, IDSIA, 1996.

Russell:91 S. Russell and E. Wefald. Principles of metareasoning. Artificial Intelligence, 49:361--395, 1991.

Boddy:94 M. Boddy and T. L. Dean. Deliberation scheduling for problem solving in time-constrained environments. Artificial Intelligence, 67:245--285, 1994.

Berry:85 D. A. Berry and B. Fristedt. Bandit Problems: Sequential Allocation of Experiments. Chapman and Hall, London, 1985.

Gittins:89 J. C. Gittins. Multi-armed Bandit Allocation Indices. Wiley-Interscience series in systems and optimization. Wiley, Chichester, NY, 1989.

Greiner:96 R. Greiner. PALO: A probabilistic hill-climbing algorithm. Artificial Intelligence, 83(2), 1996.

Kumar:86 P. R. Kumar and P. Varaiya. Stochastic Systems: Estimation, Identification, and Adaptive Control. Prentice Hall, 1986.

Sutton:88 R. S. Sutton. Learning to predict by the methods of temporal differences. Machine Learning, 3:9--44, 1988.

WatkinsDayan:92 C. J. C. H. Watkins and P. Dayan. Q-learning. Machine Learning, 8:279--292, 1992.

Crites:96 R. H. Crites and A. G. Barto. Improving elevator performance using reinforcement learning. In D. S. Touretzky, M. C. Mozer, and M. E. Hasselmo, editors, Advances in Neural Information Processing Systems 8, pages 1017--1023. MIT Press, Cambridge, MA, 1996.

Lenat:83 D. Lenat. Theory formation by heuristic search. Artificial Intelligence, 21:31--59, 1983.

SOAR:93 P. S. Rosenbloom, J. E. Laird, and A. Newell. The SOAR Papers. MIT Press, 1993.

Schmidhuber:93selfrefieee J. Schmidhuber. A neural network that embeds its own meta-levels. In Proceedings of the International Conference on Neural Networks '93, San Francisco. IEEE, 1993.

Jieyu:96self J. Zhao and J. Schmidhuber. Incremental self-improvement for life-time multi-agent reinforcement learning. In P. Maes, M. Mataric, J.-A. Meyer, J. Pollack, and S. W. Wilson, editors, From Animals to Animats 4: Proceedings of the Fourth International Conference on Simulation of Adaptive Behavior, pages 516--525. MIT Press, Bradford Books, Cambridge, MA, 1996.


