Jürgen Schmidhuber's theory of
ACTIVE EXPLORATION, ARTIFICIAL CURIOSITY & WHAT'S INTERESTING

Last update: 2010

TU Munich Cogbotlab


Only data with still unknown but learnable statistical or algorithmic regularities are truly novel or surprising or interesting and thus deserve attention.

Even beautiful things are not necessarily interesting. Beauty reflects low complexity with respect to the observer's current knowledge; interestingness and curiosity reflect the learning process leading from high to low subjective complexity.

ON TV: Schmidhuber's theory of interestingness / curiosity / beauty / surprise / novelty / creativity was the subject of a TV documentary (BR "Faszination Wissen", 29 May 2008, 21:15, plus several later repeats on other channels).

See also an interview in HPlus Magazine: Build Optimal Scientist, Then Retire. This got slashdotted.


What's interesting? Many interesting things are unexpected, but not all unexpected things are interesting or surprising. According to Schmidhuber's formal theory of surprise & novelty & interestingness & attention & creativity & intrinsic motivation, curious agents are interested in learnable but as yet unknown regularities, and get bored by both predictable and inherently unpredictable things. His active reinforcement learners translate mismatches between expectations and reality into curiosity rewards or intrinsic rewards for curious, creative, exploring agents which like to observe / create truly surprising aspects of the world, to learn novel patterns [references 1-20 below; 1990-2010].

His first curiosity-driven, creative agents [1,2] (1990) used an adaptive predictor or data compressor to predict the next input, given some history of actions and inputs. The action-generating, reward-maximizing controller got rewarded for action sequences provoking still unpredictable inputs. To discourage the controller from focusing on truly unpredictable, random inputs (such as uninteresting details of white noise), later approaches [e.g., refs 3, 4, 6, 1991-] model the expected progress of the predictor: parts of the world where the predictor fails to learn (no data compression progress!) become less interesting than those where its predictions improve.

Later systems (1997-) also take into account the computational cost of learning new skills in systems that learn when to learn and what to learn [refs 8, 11, 12]. Recent papers (2006-) focus on mathematically optimal artificial curiosity & creativity, and provide a simple formal explanation of art & science & humor [refs 14-20].

Above: curiosity is not necessarily good for you!

Nevertheless, we show [e.g., refs 3, 4, 6, 8, 12 below] that intrinsic curiosity reward can speed up the construction of predictive world models and the collection of external reward.

Fundamental Principle of Artificial Curiosity and Creativity:

Reward the reward-optimizing controller for actions yielding data that cause improvements of the adaptive predictor or data compressor!

(Formulated in the early 1990s; basis of much of the recent work in Developmental Robotics since 2004)

Variant 1: Reward the controller whenever the predictor errs [1990; refs 1a, 1, 2].

Variant 2: Reward the controller whenever the predictor improves / becomes more reliable [1991; refs 3, 4, 6, 13, 14].

Variant 3: Reward the controller in proportion to the Kullback-Leibler distance between the predictor's subjective probability distributions before and after an observation - the relative entropy between its prior and posterior [1995; ref 6].

Variant 4 (zero-sum intrinsic reward games): Two reward-maximizing modules bet on outcomes of potentially surprising experiments they have agreed upon [1997-2002; refs 8, 11, 12].

Variant 5 (progress in data compression): Store entire life, keep trying to compress it, reward controller for actions that yield data causing compressor improvements [1990s - 2008; e.g., refs 14-17].
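
As a rough illustration of how Variants 1 and 2 differ, the following minimal Python sketch compares both rewards on pure noise. It is not the implementation from the cited papers: the running-mean predictor, the squared-error measure, and the names `curiosity_rewards` and `mean_squared_error` are assumptions made for this example only.

```python
import random


def mean_squared_error(prediction, history):
    """Average squared error of a constant prediction over the stored history."""
    return sum((x - prediction) ** 2 for x in history) / len(history)


def curiosity_rewards(stream):
    """Return a list of (variant1, variant2) intrinsic rewards, one pair per input."""
    history = []
    prediction = 0.0  # assumed initial guess of the toy predictor
    rewards = []
    for x in stream:
        # Variant 1: reward the predictor's error on the new input.
        variant1 = (x - prediction) ** 2
        history.append(x)
        error_before = mean_squared_error(prediction, history)
        # Predictor update (assumption: predict the mean of everything seen so far).
        prediction = sum(history) / len(history)
        error_after = mean_squared_error(prediction, history)
        # Variant 2: reward the predictor's improvement on the stored history.
        variant2 = error_before - error_after
        rewards.append((variant1, variant2))
    return rewards


rng = random.Random(0)  # fixed seed so the run is repeatable
noise = [rng.random() for _ in range(300)]  # inherently unpredictable inputs
rewards = curiosity_rewards(noise)
late = rewards[-100:]
print("mean Variant 1 reward (late):", sum(r[0] for r in late) / len(late))
print("mean Variant 2 reward (late):", sum(r[1] for r in late) / len(late))
```

On such noise the Variant 1 reward stays high indefinitely, while the Variant 2 reward fades as the predictor stops improving, so a controller rewarded by Variant 2 would lose interest in the noise.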

Both art and science are by-products of the desire to create / discover more data that is compressible in hitherto unknown ways! [Refs 15-20]
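
One way to make "compressible in hitherto unknown ways" concrete is to measure how many bits a learning compressor saves on the stored history after each update (Variant 5). The sketch below is only a toy under stated assumptions: the compressor is a first-order binary Markov model with add-one smoothing, and `code_length_bits` and `compression_progress` are hypothetical names, not taken from the cited papers.

```python
import math


def code_length_bits(transitions, model):
    """Bits needed to encode the stored transitions under the model's probabilities."""
    bits = 0.0
    for context in (0, 1):
        total = sum(model[context])
        for symbol in (0, 1):
            count = transitions[context][symbol]
            if count:
                bits -= count * math.log2(model[context][symbol] / total)
    return bits


def compression_progress(stream):
    """Per-step bits saved on the whole stored history by each compressor update."""
    transitions = [[0, 0], [0, 0]]  # history: how often symbol followed context
    model = [[1, 1], [1, 1]]        # compressor counts (assumed add-one smoothing)
    progress = []
    for previous, current in zip(stream, stream[1:]):
        transitions[previous][current] += 1           # store the new observation
        bits_old = code_length_bits(transitions, model)  # cost with old compressor
        model[previous][current] += 1                 # compressor learns
        bits_new = code_length_bits(transitions, model)  # cost with new compressor
        progress.append(bits_old - bits_new)          # intrinsic reward
    return progress


pattern = [0, 1] * 50  # a simple regularity the toy model can learn
progress = compression_progress(pattern)
print("first reward:", progress[0])
print("last reward:", progress[-1])
```

For this simple alternating pattern the progress is largest at the start and shrinks as the regularity becomes known, which mirrors the claim that a fully learned pattern stops being interesting.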

Related links:
1. Full publication list (with additional HTML and pdf links)
2. Reinforcement learning
3. Recurrent network predictors
4. Learning attentive vision
5. Reinforcement learning economies
6. Learning to learn
7. Learning robots
8. Self-modeling robots
9. Hierarchical learning & subgoal generation
10. Beauty
11. CoTeSys group

Recent invited talks on Creativity, Curiosity, Beauty, Novel Patterns, True Surprise & Novelty, Art & Science & Humor:

12 Nov 2009: Keynote for Multiple Ways to Design 09: Art & Science

3 Oct 2009: Invited talk for Singularity Summit, New York City. See original video (40 min). Or save time by watching the condensed but jagged video (20 min), also available at the ShanghAI Lectures. Save even more time by watching the short video (10 min, also at the bottom of this page).

25 Aug 2009: Dirac summer school, Leuven, Belgium

12 Jul 2009: Dagstuhl Castle Seminar on Computational Creativity

3 Sep 2008: Keynote for Knowledge-Based and Intelligent Information & Engineering Systems KES 2008, Zagreb

2 Oct 2007: Joint invited lecture for Algorithmic Learning Theory (ALT 2007) and Discovery Science (DS 2007), Sendai, Japan (the only joint invited lecture). Preprint

23 Aug 2007: Keynote for A*STAR Meeting on Expectation & Surprise, Singapore

12 July 2007: Keynote for Art Meets Science 2007: "Randomness vs simplicity & beauty in physics and the fine arts"


20. J. Schmidhuber. Artificial Scientists & Artists Based on the Formal Theory of Creativity. In Proceedings of the Third Conference on Artificial General Intelligence (AGI-2010), Lugano, Switzerland. PDF.

19. J. Schmidhuber. Art & science as by-products of the search for novel patterns, or data compressible in unknown yet learnable ways. In M. Botta (ed.), Multiple ways to design research. Research cases that reshape the design discipline, Milano-Lugano, Swiss Design Network - Et al. Edizioni, 2009, pp. 98-112. (Keynote talk.) PDF of preprint.

18. J. Schmidhuber. Driven by Compression Progress: A Simple Principle Explains Essential Aspects of Subjective Beauty, Novelty, Surprise, Interestingness, Attention, Curiosity, Creativity, Art, Science, Music, Jokes. Based on keynote talk for KES 2008 (below) and joint invited lecture for ALT 2007 / DS 2007 (below). Short version: ref 17 below. Long version in G. Pezzulo, M. V. Butz, O. Sigaud, G. Baldassarre, eds.: Anticipatory Behavior in Adaptive Learning Systems, from Sensorimotor to Higher-level Cognitive Capabilities, Springer, LNAI, 2009, in press. Preprint (2008, revised 2009): arXiv:0812.4360. PDF (Dec 2008). PDF (April 2009).

17. J. Schmidhuber. Simple Algorithmic Theory of Subjective Beauty, Novelty, Surprise, Interestingness, Attention, Curiosity, Creativity, Art, Science, Music, Jokes. Journal of SICE, 48(1):21-32, 2009. PDF.

16. J. Schmidhuber. Driven by Compression Progress. In Proc. Knowledge-Based Intelligent Information and Engineering Systems KES-2008, Lecture Notes in Computer Science LNCS 5177, p 11, Springer, 2008. (Abstract of invited keynote talk.) PDF.

15. J. Schmidhuber. Simple Algorithmic Principles of Discovery, Subjective Beauty, Selective Attention, Curiosity & Creativity. In V. Corruble, M. Takeda, E. Suzuki, eds., Proc. 10th Intl. Conf. on Discovery Science (DS 2007) p. 26-38, LNAI 4755, Springer, 2007. Also in M. Hutter, R. A. Servedio, E. Takimoto, eds., Proc. 18th Intl. Conf. on Algorithmic Learning Theory (ALT 2007) p. 32, LNAI 4754, Springer, 2007. (Joint invited lecture for DS 2007 and ALT 2007, Sendai, Japan, 2007.) Preprint: arxiv:0709.0674. PDF.
Curiosity as the drive to improve the compression of the lifelong sensory input stream: interestingness as the first derivative of subjective "beauty" or compressibility.

14. J. Schmidhuber. Developmental Robotics, Optimal Artificial Curiosity, Creativity, Music, and the Fine Arts. Connection Science, 18(2):173-187, June 2006. PDF.
On mathematically optimal universal artificial curiosity, based on theoretically best possible ways of maximizing learning progress in embedded agents or robots with an intrinsic motivation to learn skills that lead to a better understanding of the world and what can be done in it. It is also pointed out how music and the arts can be formally understood as a consequence of the principle of artificial curiosity and creativity.

13. J. Schmidhuber. Self-Motivated Development Through Rewards for Predictor Errors / Improvements. Developmental Robotics 2005 AAAI Spring Symposium, March 21-23, 2005, Stanford University, CA. PDF.

12. J. Schmidhuber. Exploring the Predictable. In Ghosh, S. Tsutsui, eds., Advances in Evolutionary Computing, p. 579-612, Springer, 2002. PDF. HTML. See also refs [8, 11, 1997-].

11. J. Schmidhuber. What's interesting? In Abstract Collection of SNOWBIRD: Machines That Learn. Utah, April 1998.

10. M. Wiering and J. Schmidhuber. Efficient model-based exploration. In R. Pfeifer, B. Blumberg, J. Meyer, S. W. Wilson, eds., From Animals to Animats 5: Proceedings of the Fifth International Conference on Simulation of Adaptive Behavior, p. 223-228, MIT Press, 1998.

9. M. Wiering and J. Schmidhuber. Learning exploration policies with models. In Proc. CONALD, 1998.

8. J. Schmidhuber. What's interesting? Technical Report IDSIA-35-97, IDSIA, July 1997 (23 pages, 10 figures, 157 K, 834 K gunzipped).
Here we focus on automatic creation of predictable internal abstractions of complex spatio-temporal events: two competing, intrinsically motivated agents agree on essentially arbitrary algorithmic experiments and bet on their possibly surprising (not yet predictable) outcomes in zero-sum games, each agent profiting from outwitting / surprising the other. The focus is on exploring the space of general algorithms (as opposed to traditional simple mappings from inputs to outputs); the general system [12] focuses on the interesting things by losing interest in both predictable and unpredictable aspects of the world. Unlike the previous systems with intrinsic motivation (1990, 91, 95, see below), the system also takes into account the computational cost of learning new skills, learning when to learn and what to learn. See also refs [11, 12, 1998-2002].

7. J. Schmidhuber, J. Zhao, N. Schraudolph. Reinforcement learning with self-modifying policies. In S. Thrun and L. Pratt, eds., Learning to learn, Kluwer, pages 293-309, 1997. PDF; HTML.

6. J. Storck, S. Hochreiter, and J. Schmidhuber. Reinforcement-driven information acquisition in non-deterministic environments. In Proc. ICANN'95, vol. 2, pages 159-164. EC2 & CIE, Paris, 1995. PDF. HTML.
In this paper the curiosity reward is again proportional to the predictor's surprise / information gain, this time measured as the Kullback-Leibler distance between the learning predictor's subjective probability distributions before and after new observations - the relative entropy between its prior and posterior. (In 2005 Itti & Baldi called this "Bayesian surprise" and demonstrated experimentally that it explains certain patterns of human visual attention better than certain previous approaches.)
Note the differences from "Active Learning": the latter typically focuses on choosing which data points to evaluate next in order to maximize information gain (i.e., one-step look-ahead), assuming all data point evaluations are equally costly. The 1995 system, however, is more general and takes into account (1) arbitrary delays between experimental actions and the corresponding information gains, and (2) the highly environment-dependent costs of obtaining or creating not just individual data points but entire data sequences.

5. J. Schmidhuber. On learning how to learn learning strategies. Technical Report FKI-198-94, Fakultät für Informatik, Technische Universität München, November 1994.

4. J. Schmidhuber. Curious model-building control systems. In Proc. International Joint Conference on Neural Networks, Singapore, volume 2, pages 1458-1463. IEEE, 1991. PDF. HTML.
The second peer-reviewed English-language publication on artificial curious agents with intrinsic motivation. The system uses reinforcement learning to create behaviors that lead to parts of the environment where previous experience indicates that the prediction error can be improved (not necessarily where it is high). So the agent is neither attracted by unpredictable randomness nor by totally predictable aspects of the world. Instead it likes to go where it learnt to expect additional learning progress.
(Quite a few later publications on developmental robotics and intrinsic reward took up this basic idea, e.g., Oudeyer & Kaplan (2007), whose work is restricted to one-step look-ahead though, and doesn't allow for delayed intrinsic rewards like the 1991 paper above.)

3. J. Schmidhuber. Adaptive curiosity and adaptive confidence. Technical Report FKI-149-91, Institut für Informatik, Technische Universität München, April 1991. PDF.

2. J. Schmidhuber. A possibility for implementing curiosity and boredom in model-building neural controllers. In J. A. Meyer and S. W. Wilson, editors, Proc. of the International Conference on Simulation of Adaptive Behavior: From Animals to Animats, pages 222-227. MIT Press/Bradford Books, 1991. PDF. HTML.
The first peer-reviewed English-language publication on artificial curious agents with intrinsic motivation. The system uses reinforcement learning to create behaviors that lead the agent to parts of the environment where the separate predictor's prediction error is expected to be high, assuming one can learn something there.
Quite a few later publications on developmental robotics and/or intrinsic reward took up this basic idea, e.g., Singh & Barto & Chentanez (2005).

1. J. Schmidhuber. Making the world differentiable: On using fully recurrent self-supervised neural networks for dynamic reinforcement learning and planning in non-stationary environments. Technical Report FKI-126-90, Institut für Informatik, Technische Universität München, February 1990 (revised in November). PDF (hand-drawn figures omitted).

1a. J. Schmidhuber. Dynamische neuronale Netze und das fundamentale raumzeitliche Lernproblem (Dynamic neural nets and the fundamental spatio-temporal credit assignment problem). Dissertation, Institut für Informatik, Technische Universität München, 1990. PDF. HTML.
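
The Kullback-Leibler curiosity reward of Variant 3 (ref 6) can be sketched in a few lines of Python. This is an illustrative toy, not the 1995 system: the count-based predictor with one pseudo-count per symbol, the direction KL(posterior || prior), and the names `CountingPredictor` and `information_gain_reward` are assumptions for this example.

```python
import math


def kl_divergence(p, q):
    """Kullback-Leibler divergence KL(p || q) in bits for discrete distributions."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


class CountingPredictor:
    """Estimates symbol probabilities from counts (assumed add-one smoothing)."""

    def __init__(self, num_symbols):
        self.counts = [1] * num_symbols  # one pseudo-count per symbol

    def distribution(self):
        total = sum(self.counts)
        return [c / total for c in self.counts]

    def observe(self, symbol):
        self.counts[symbol] += 1


def information_gain_reward(predictor, symbol):
    """Reward = divergence between the predictor's beliefs after and before."""
    prior = predictor.distribution()
    predictor.observe(symbol)
    posterior = predictor.distribution()
    return kl_divergence(posterior, prior)


predictor = CountingPredictor(num_symbols=2)
stream = [0, 1] * 500  # two symbols, each seen equally often
rewards = [information_gain_reward(predictor, s) for s in stream]
print("first reward (bits):", rewards[0])
print("last reward (bits):", rewards[-1])
```

The reward is largest while the predictor's beliefs still shift noticeably and approaches zero once new observations barely change them.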

Differences to Shannon / Boltzmann's notion of surprise. Since the early 1990s, the papers above have repeatedly pointed out an essential difference between our theory of surprise & novelty and Shannon's traditional information theory based on Boltzmann's entropy notion. Consider two extreme examples of uninteresting, unsurprising, boring data. A vision-based agent that always stays in the dark will experience an extremely compressible, soon totally predictable and unsurprising history of unchanging visual inputs. In front of a screen full of white noise conveying a lot of information and "novelty" and "surprise" in the traditional sense of Boltzmann (1800s) and Shannon (1948), however, it will experience highly unpredictable and fundamentally uncompressible data. In both cases the data gets boring quickly as it does not allow for learning new things or for further compression progress. Neither the arbitrary nor the fully predictable is truly novel or surprising or interesting - only data with still unknown but learnable statistical or algorithmic regularities are! That's why our theory of surprise and curiosity and creativity takes the time-varying state of the subjective, learning observer into account.

Check out related papers on adaptive visual attention with foveas (overview page):

J. Schmidhuber and R. Huber. Learning to generate artificial fovea trajectories for target detection. International Journal of Neural Systems, 2(1 & 2):135-141, 1991. Figures in overview page. PDF. HTML.

J. Schmidhuber and R. Huber. Using sequential adaptive neuro-control for efficient learning of rotation and translation invariance. In T. Kohonen, K. Mäkisara, O. Simula, and J. Kangas, editors, Artificial Neural Networks, pages 315-320. Elsevier Science Publishers B.V., North-Holland, 1991.

Right: appetizer video on the formal theory of curiosity & creativity & beauty & surprise & humor. These are excerpts (10 min) of the original talk (40 min) mentioned above.