PRE-2000 WORK ON META-LEARNING AND SELF-MODIFYING CODE
J. Schmidhuber.
A general method for incremental self-improvement
and multiagent learning.
In X. Yao, editor, Evolutionary Computation: Theory and Applications,
chapter 3, pages 81-123. Scientific Publ. Co., Singapore,
1999 (submitted 1996).
J. Schmidhuber, J. Zhao, N. Schraudolph.
Reinforcement learning with self-modifying policies.
In S. Thrun and L. Pratt, eds.,
Learning to learn, Kluwer, pages 293-309, 1997.
J. Schmidhuber, J. Zhao, and M. Wiering.
Shifting inductive bias with success-story algorithm,
adaptive Levin search, and incremental self-improvement.
Machine Learning 28:105-130, 1997.
J. Zhao and J. Schmidhuber.
Solving a complex prisoner's dilemma
with self-modifying policies.
In From Animals to Animats 5: Proceedings
of the Fifth International Conference on Simulation of Adaptive
Behavior, 1998, in press.
J. Schmidhuber, J. Zhao, and M. Wiering.
Simple principles of metalearning.
Technical Report IDSIA-69-96, IDSIA, June 1996.
M. Wiering and J. Schmidhuber.
Solving POMDPs using Levin search and EIRA.
In L. Saitta, ed.,
Proceedings of the 13th International Conference on Machine Learning (ICML 1996),
Morgan Kaufmann Publishers, San Francisco, CA, 1996.
J. Schmidhuber.
Environment-independent reinforcement acceleration
(invited talk at Hong Kong University of Science and Technology).
Technical Note IDSIA-59-95, IDSIA, June 1995.
J. Schmidhuber.
Beyond "Genetic Programming": Incremental Self-Improvement.
In J. Rosca, ed., Proc. Workshop on Genetic Programming at ML95,
pages 42-49. National Resource Lab for the Study of Brain and Behavior, 1995.
J. Schmidhuber.
On learning how to learn learning strategies.
Technical Report FKI-198-94, Fakultät für Informatik,
Technische Universität München, November 1994.
J. Schmidhuber.
A neural network that embeds its own meta-levels.
In Proc. of the International Conference on Neural Networks '93,
San Francisco. IEEE, 1993.
J. Schmidhuber.
An introspective network that can learn to run its own weight change algorithm.
In Proc. of the Intl. Conf. on Artificial Neural Networks,
Brighton, pages 191-195. IEE, 1993.
J. Schmidhuber.
A self-referential weight matrix.
In Proceedings of the International Conference on Artificial
Neural Networks, Amsterdam, pages 446-451. Springer, 1993.
J. Schmidhuber.
Steps towards `self-referential' learning.
Technical Report CU-CS-627-92, Dept. of Comp. Sci., University of
Colorado at Boulder, November 1992.
J. Schmidhuber.
Evolutionary principles in self-referential learning, or on learning
how to learn: The meta-meta-... hook. Diploma thesis,
Institut für Informatik, Technische Universität München, 1987.
Here Genetic Programming (GP) is applied to itself, to recursively evolve
better GP methods.
In 1992 Schmidhuber suggested that recurrent neural networks (RNNs) can be used
to metalearn learning algorithms, and a gradient-based
metalearning algorithm was derived (see refs 2-5 above).
However, it did not work very well in practice,
because it was implemented with standard RNNs instead of
the better and more recent LSTM architecture.
But his former PhD student
has continued along these
lines and achieved an astonishing
result, using LSTM nets instead of traditional
RNNs (ICANN 2001):
LSTM networks with roughly 5000 weights are trained
to metalearn fast online learning
algorithms for nontrivial classes of functions, such as all quadratic
functions of two variables. LSTM is necessary because metalearning
typically involves huge time lags between important events, and standard
RNNs cannot deal with these. After a month
of metalearning on a PC, all weights are frozen; the frozen
net is then used as follows: some new function f is selected, then
a sequence of
random training exemplars of the form ...data/target/data/target/data...
is fed into the input units, one sequence element at a time. After about 30
exemplars the frozen recurrent net correctly predicts target inputs before
it sees them. No weight changes! How is this possible? After metalearning
the frozen net implements a sequential learning algorithm which apparently
computes something like error signals from data inputs and target inputs
and translates them into changes of internal estimates of f. Parameters
of f, errors, temporary variables, counters, computations of f and of
parameter updates are all somehow represented in the form of circulating
activations. Remarkably, the new (and quite opaque) online learning
algorithm running on the frozen network is much faster than standard
backprop with optimal learning rate. This indicates that one can use
gradient descent to metalearn learning algorithms that outperform gradient
descent. Furthermore, the metalearning procedure automatically avoids
overfitting in a principled way, since it punishes overfitting online
learners just like it punishes slow ones, simply because overfitters
and slow learners cause more cumulative errors during metalearning.
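The input protocol described above can be sketched in a few lines of Python. This is only an illustration of the data/target/data/target... stream fed to the frozen net, not the original experimental setup; the coefficient ranges, input ranges, and tuple encoding are assumptions made here for concreteness.

```python
import random

def sample_quadratic():
    # Sample a random quadratic function f(x, y) of two variables.
    # The coefficient range [-1, 1] is an assumption, not from the paper.
    a, b, c, d, e, g = [random.uniform(-1.0, 1.0) for _ in range(6)]
    return lambda x, y: a*x*x + b*y*y + c*x*y + d*x + e*y + g

def make_exemplar_sequence(f, n_exemplars):
    # Build the interleaved ...data/target/data/target... stream:
    # each exemplar contributes its inputs (x, y) followed by the target f(x, y).
    seq = []
    for _ in range(n_exemplars):
        x, y = random.uniform(-1.0, 1.0), random.uniform(-1.0, 1.0)
        seq.append(("data", (x, y)))
        seq.append(("target", f(x, y)))
    return seq

random.seed(0)
f = sample_quadratic()                   # the "new function f" of the text
stream = make_exemplar_sequence(f, 30)   # ~30 exemplars suffice for the frozen net
# The frozen LSTM would consume this stream one element at a time and, after
# enough exemplars, emit its prediction of each target before seeing it.
print(len(stream))
```

The essential point the sketch makes is that the learning problem is presented purely as an input sequence: the frozen net receives targets as ordinary inputs, so any "learning" it does must happen in its recurrent activations rather than in its weights.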