Next: BASIC CONCEPTS Up: Discovering Solutions with Low Previous: Discovering Solutions with Low

# INTRODUCTION

The first number is 2. The second number is 4. The third number is 6. The fourth number is 8. What is the fifth number? The answer is 34. The reason is the following law. The th number is

But an IQ test requires you to answer 10'' instead of 34''. Why not 34''? The reasons are: (1) simple'' solutions are preferred over complex'' ones. This idea is often referred to as Occam's razor''. (2) It is assumed that the simpler'' the rules, the better the generalization on test data. (3) The makers of the IQ test assume that everybody agrees on what simple'' means.

Similarly, many researchers agree that learning algorithms ought to extract simple'' rules to explain training data. But what exactly does simple'' mean? The only theory providing a convincing objective criterion for simplicity'' is the theory of Kolmogorov complexity (or algorithmic complexity). Contrary to a popular myth, the incomputability of Kolmogorov complexity (due to the halting problem) does not prevent machine learning applications, because there are tractable yet very general extensions of Kolmogorov complexity. Few machine learning researchers, however, make use of the powerful tools provided by the theory (see Li and Vitanyi (1993) for an excellent overview, see Schmidhuber (1994c) for an application to fine arts).

Purpose of paper. This work and the experiments to be presented herein are intended (1) to demonstrate that basic concepts from the theory of Kolmogorov complexity are indeed of interest for machine learning purposes, (2) to encourage machine learning researchers to study this theory, (3) to point to problems concerning incremental learning'', and (4) to mention initial steps towards solving them.

Outline. Section 2 briefly reviews basic concepts of algorithmic complexity theory relevant to machine learning, including Levin complexity (an extension of Kolmogorov complexity) and Levin's universal optimal search algorithm. For a broad class of problems, universal search can be shown to be optimal with respect to total expected search time, leaving aside a constant factor which does not depend on the problem. To my knowledge, section 3 presents the first general (working) implementation of (a probabilistic variant of) universal search. Experiments in section 4 focus on the task of finding simple'' neural nets with excellent generalization capability. Section 5 goes beyond sections 1-4, by addressing incremental'' learning situations.

Next: BASIC CONCEPTS Up: Discovering Solutions with Low Previous: Discovering Solutions with Low
Juergen Schmidhuber 2003-02-25

Back to Optimal Universal Search page
Back to Program Evolution page
Back to Algorithmic Information page
Back to Speed Prior page