But an IQ test requires you to answer ``10'' instead of ``34''. Why not ``34''? The reasons are: (1) ``simple'' solutions are preferred over ``complex'' ones. This idea is often referred to as ``Occam's razor''. (2) It is assumed that the ``simpler'' the rules, the better the generalization on test data. (3) The makers of the IQ test assume that everybody agrees on what ``simple'' means.

Similarly, many researchers agree that learning algorithms ought to extract ``simple'' rules to explain training data. But what exactly does ``simple'' mean? The only theory providing a convincing objective criterion for ``simplicity'' is the theory of Kolmogorov complexity (or algorithmic complexity). Contrary to a popular myth, the incomputability of Kolmogorov complexity (due to the halting problem) does not prevent machine learning applications, because there are tractable yet very general extensions of Kolmogorov complexity. Few machine learning researchers, however, make use of the powerful tools provided by the theory (see Li and Vitanyi (1993) for an excellent overview, see Schmidhuber (1994c) for an application to fine arts).

**Purpose of paper.**
This work and the experiments to be presented herein are intended
(1) to demonstrate that basic concepts from the theory of Kolmogorov
complexity are indeed of interest for machine learning purposes,
(2) to encourage machine learning researchers to study this theory,
(3) to point to problems concerning ``incremental learning'',
and (4) to mention initial steps towards solving them.

**Outline.**
Section 2 briefly reviews basic concepts of algorithmic
complexity theory
relevant to machine learning, including
Levin complexity (an extension of Kolmogorov complexity)
and Levin's universal optimal search algorithm.
For a broad class of problems,
universal search can be shown to be optimal
with respect to total expected
search time, leaving aside a constant factor which does not depend on
the problem.
To my knowledge, section 3 presents the first
general (working) implementation of (a probabilistic
variant of) universal search.
Experiments in section 4
focus on the task of finding ``simple'' neural nets
with excellent generalization capability.
Section 5 goes beyond sections 1-4, by addressing ``incremental''
learning situations.

Back to Optimal Universal Search page

Back to Program Evolution page

Back to Algorithmic Information page

Back to Speed Prior page