Remarkably, there is a theoretically optimal way of making predictions based on observations, rooted in the early work of Solomonoff and Kolmogorov [59,28]. The approach reflects basic principles of Occam's razor: simple explanations of data are preferable to complex ones.
The theory of inductive inference quantifies what simplicity really means. Given certain very broad computability assumptions, it provides techniques for making optimally reliable statements about future events, given the past.
Once there is an optimal, formally describable way of predicting the future, we should be able to construct a machine that continually computes and executes action sequences that maximize expected or predicted reward, thus solving an ancient goal of AI research.
For many decades, however, AI researchers have not paid a lot of attention to the theory of inductive inference. Why not? There is another reason besides the fact that most of them have traditionally ignored theoretical computer science: the theory has been perceived as being associated with excessive computational costs. In fact, its most general statements refer to methods that are optimal (in a certain asymptotic sense) but incomputable. So researchers in machine learning and artificial intelligence have often resorted to alternative methods that lack a strong theoretical foundation but at least seem feasible in certain limited contexts. For example, since the early attempts at building a ``General Problem Solver'' [36,43] much work has been done to develop mostly heuristic machine learning algorithms that solve new problems based on experience with previous problems. Many pointers to learning by chunking, learning by macros, hierarchical learning, learning by analogy, etc. can be found in Mitchell's book  and Kaelbling's survey .
Recent years, however, have brought substantial progress in the field of computable and feasible variants of optimal algorithms for prediction, search, inductive inference, problem solving, decision making, and reinforcement learning in very general environments. In what follows, I will focus on results predominantly from my own lab at IDSIA.
Sections 3, 4, 7 relate Occam's razor and the notion of simplicity to the shortest algorithms for computing computable objects, and will concentrate on recent asymptotic optimality results for universal learning machines, essentially ignoring issues of practical feasibility--compare my postdoc Hutter's contribution  in this volume.
Section 5, however, will focus on our recent non-traditional simplicity measure which is not based on the shortest but on the fastest way of describing objects, and Section 6 will use this measure to derive non-traditional predictions concerning the future of our universe.
Sections 8, 9, 10 will finally address quite pragmatic issues and ``true'' time-optimality: given a problem and only so much limited computation time, what is the best way of spending it on evaluating solution candidates? In particular, Section 9 will present an optimally fast way of incrementally solving each task in a problem sequence, given a probability distribution (the bias) on programs computing solution candidates. Bias shifts are computed by program prefixes that modify the distribution on their suffixes by reusing successful code for previous tasks (stored in non-modifiable memory). No tested program gets more runtime than its probability times the total search time. In illustrative experiments, ours becomes the first general system to learn a universal solver for arbitrary disk Towers of Hanoi tasks (minimal solution size ). It demonstrates the advantages of incremental learning by profiting from previously solved, simpler tasks involving samples of a simple context free language. Section 10 discusses how to use this approach for building universal reinforcement learners.