But an IQ test requires you to answer ``10'' instead of ``34.'' Why not ``34''? The reasons are: (1) ``Simple'' solutions are preferred over ``complex'' ones. This idea is often referred to as ``Occam's razor'' (e.g., Blumer et al., 1987). (2) It is assumed that the ``simpler'' the rules, the better the generalization on test data. (3) The makers of the IQ test assume that everybody agrees on what ``simple'' means.

Similarly, many neural net and machine learning researchers
agree^{1} that learning algorithms
ought to extract ``simple'' rules to explain training data.
But what exactly does ``simple'' mean? One theory providing
a convincing objective criterion for ``simplicity''
is the theory of Kolmogorov complexity (or algorithmic complexity).
Contrary to a popular myth, the incomputability
of Kolmogorov complexity
(due to the halting problem)
does not prevent machine learning applications, because there
are tractable yet very general extensions of Kolmogorov complexity.
Few neural net and machine learning researchers, however, make
use of the powerful tools provided by the theory.
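As a hedged illustration of such a tractable surrogate: the length of a losslessly compressed string is a crude, computable upper bound on its description length, and can serve as a proxy for its (incomputable) Kolmogorov complexity. The sketch below uses `zlib` purely for illustration; the particular compressor and example strings are assumptions of this example, not part of the theory:

```python
import os
import zlib

def approx_complexity(s: bytes) -> int:
    """Compressed length of s: a computable upper bound on its
    description length, usable as a rough proxy for its
    (incomputable) Kolmogorov complexity."""
    return len(zlib.compress(s, 9))

regular = b"ab" * 500        # generated by a very short rule
random_ = os.urandom(1000)   # (almost surely) no rule much shorter than the data

# the regular string admits a far shorter description than the random one
assert approx_complexity(regular) < approx_complexity(random_)
```

Such proxies only bound Kolmogorov complexity from above: a string that a given compressor fails to shrink may still have a short program.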

**Purpose of paper.**
This work and the experiments to be presented herein are intended
(1) to demonstrate that basic concepts from the theory of Kolmogorov
complexity are indeed of interest for
the study of neural network generalization and for
machine learning purposes in general,
(2) to encourage neural net researchers to study this theory,
and (3) to point to some limitations of the current state of the art
and to important open problems (see also [Schmidhuber, 1994]).

**Outline.**
Section 2 briefly reviews the following basic concepts of algorithmic
complexity theory relevant to machine learning:
(1) Kolmogorov complexity,
(2) the universal prior (or Solomonoff-Levin distribution),
under which
the probability of a computable object (like the solution to a problem)
is essentially equal to
the probability of guessing its shortest program on a universal computer,
(3) Solomonoff's theory of inductive inference, and how
it justifies Occam's razor,
(4) the principle of minimum description length (MDL) in its
general form,
(5) Levin complexity (a generalization of Kolmogorov complexity)
and Levin's universal optimal search algorithm.
For a given computable solution to a problem, consider the
negative logarithm of the
probability of guessing a program that computes it, plus the logarithm of
its runtime.
The Levin complexity of the solution is the minimal
possible value of this. Levin's universal search algorithm essentially
generates and tests solution candidates (from a set of possible
computable candidates) in order of their Levin complexity,
until a solution is found.
For a broad class of problems,
universal search can be shown to be optimal
with respect to total expected
search time, leaving aside a constant factor which does not depend on
the problem.
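The order-of-Levin-complexity enumeration can be sketched concretely. The toy implementation below is an illustrative assumption, not the paper's actual implementation: ``programs'' are bit strings fed to a trivial unary-counting interpreter, and in phase $i$ each program $p$ of length $l(p) \le i$ receives $2^{i-l(p)}$ interpreter steps, so a program of length $l$ and runtime $t$ is first completed in phase roughly $l + \log_2 t$, i.e., candidates are tried in order of their Levin complexity:

```python
import itertools

def run_program(bits, max_steps):
    """Toy stand-in for a universal machine: interprets the bit string
    as unary increments, one step per bit. Returns (output, halted)."""
    steps, acc = 0, 0
    for b in bits:
        steps += 1
        if steps > max_steps:
            return None, False   # out of time budget in this phase
        acc += b
    return acc, True

def universal_search(is_solution, max_phase=20):
    """Levin-style search: in phase i, every program p with l(p) <= i
    gets 2**(i - l(p)) steps, so total work per phase is about 2**i.
    A program of length l and runtime t is first run to completion in
    phase ~ l + log2(t), i.e., in order of its Levin complexity."""
    for i in range(1, max_phase + 1):
        for l in range(1, i + 1):
            budget = 2 ** (i - l)
            for bits in itertools.product((0, 1), repeat=l):
                out, halted = run_program(bits, budget)
                if halted and is_solution(out):
                    return bits, out
    return None, None

# find the cheapest program (in the Levin sense) whose output equals 3
prog, out = universal_search(lambda y: y == 3)
```

Programs from earlier phases are rerun in later ones, but since the budgets grow geometrically, the total cost is dominated by the final phase, which is what makes the overall scheme optimal up to a constant factor.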
To the author's knowledge, section 3 presents the first
general implementation of (a probabilistic
variant of) universal search
on a conventional digital machine.
It is based on efficient ``self-sizing'' programs
which influence their own runtime and storage size.
Simulations in section 4
focus on the task of finding ``simple'' neural nets
with high generalization capability.
Experiments with toy problems demonstrate that the method,
at least in certain cases where it is computationally feasible,
can lead to generalization results that are impossible
to obtain with more traditional neural net algorithms
(also briefly reviewed in section 4).
Section 5 discusses the fact that
universal search is not directly applicable to *incremental*
learning situations, and mentions new ideas on how to deal with
such situations.
Section 6 concludes with general remarks on problem
solving and Occam's razor.
