Unbeknownst to many machine learning researchers, there exists a search algorithm with amazing theoretical properties: for a broad class of search problems, Levin search (LS) [#!Levin:73!#,#!Levin:84!#] has the optimal order of computational complexity. See [#!LiVitanyi:93!#] for an overview. See (Schmidhuber 1995, 1997a) for recent implementations/applications.
Basic concepts. LS requires a set of $n_{ops}$ primitive, prewired instructions $b_1, \ldots, b_{n_{ops}}$ that can be composed to form arbitrary sequential programs. Essentially, LS generates and tests solution candidates $s$ (program outputs represented as strings over a finite alphabet) in order of their Levin complexities $Kt(s) = \min_q \{-\log D_P(q) + \log t(q,s)\}$, where $q$ stands for a program that computes $s$ in $t(q,s)$ time steps, and $D_P(q)$ is the probability of guessing $q$ according to a fixed Solomonoff-Levin distribution [#!LiVitanyi:93!#] on the set of possible programs (in section 3.2, however, we will make the distribution variable).
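As a toy illustration of this ordering (not part of the original formulation; the candidate programs, their guessing probabilities, and their runtimes are invented), one can rank candidates by $-\log_2 D_P(q) + \log_2 t(q,s)$:

```python
import math

# Hypothetical candidates: (name, guessing probability D_P(q), runtime t(q, s)).
candidates = [
    ("q1", 0.50, 1000),  # easy to guess, but slow
    ("q2", 0.25, 10),    # harder to guess, but fast
    ("q3", 0.05, 200),
]

def levin_score(prob, steps):
    """One program's contribution to Kt(s): -log2 D_P(q) + log2 t(q, s)."""
    return -math.log2(prob) + math.log2(steps)

# LS effectively tests candidates in order of increasing Levin complexity,
# trading guessing probability against runtime.
ordered = sorted(candidates, key=lambda c: levin_score(c[1], c[2]))
print([name for name, _, _ in ordered])  # → ['q2', 'q1', 'q3']
```

Note how the fast program `q2` is preferred over the more probable but slow `q1`: the log runtime enters the complexity on equal footing with the negative log probability.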
Optimality. Given primitives representing a
universal programming language, for a broad class of problems
LS can be shown to be optimal with respect to total expected search
time, leaving aside a constant factor independent of the problem size
[#!Levin:73!#,#!Levin:84!#,#!LiVitanyi:93!#]. More formally: a problem is a
symbol string that conveys all information about another symbol string
called its solution, where the solution can be extracted by some (search)
algorithm, given the problem. Suppose there is an algorithm that solves
certain time-limited optimization problems or inversion problems in $O(f(n))$ steps, where $f$ is a total recursive function and $n$ is a positive integer representing problem size. Then universal LS will solve the same problems in at most $O(f(n))$ steps (although a large constant may be buried in the $O()$ notation). Despite this strong result, until
recently LS has not received much attention except in purely theoretical
studies -- see, e.g., [#!Watanabe:92!#].
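Stated compactly (this restatement uses a hypothetical competing algorithm ALG and its problem-class-dependent constant $c_{\mathrm{ALG}}$, neither of which is named in the original):

```latex
% If some algorithm ALG solves every instance of size n of such a
% problem class within f(n) steps, universal LS solves the same
% instances within a constant multiple of f(n) steps:
\[
  T_{\mathrm{ALG}}(n) \le f(n)
  \quad\Longrightarrow\quad
  T_{\mathrm{LS}}(n) \le c_{\mathrm{ALG}} \cdot f(n) = O(f(n)),
\]
% where c_ALG depends on ALG and the chosen universal language,
% but not on the problem size n.
```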
Of course, LS and any other algorithm will fail to quickly solve problems whose solutions all have high algorithmic complexity. Unfortunately, almost all possible problems are of this kind [#!Kolmogorov:65!#,#!Chaitin:69!#,#!Solomonoff:64!#]. In fact, the realm of practical computer science is limited to solving the comparatively few tasks with low-complexity solutions. Fortunately such tasks are rather common in our regular universe.
Practical implementation. In our practical LS version there is an upper bound $m$ on program length (due to obvious storage limitations). $a_i$ denotes the address of the $i$-th instruction. Each program $q$ is generated incrementally: first we select an instruction for $a_1$, then for $a_2$, etc. $D_P$ is given by a matrix $P$, where $P_{ij}$ ($i \in \{1, \ldots, m\}$, $j \in \{1, \ldots, n_{ops}\}$) denotes the probability of selecting $b_j$ as the instruction at address $a_i$, given that the first $i-1$ instructions have already been selected. The probability $D_P(q)$ of a program $q$ is the product of the probabilities of its constituents.
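To make the incremental generation concrete, here is a small sketch (the two-instruction alphabet, the matrix entries, and the length bound are invented for illustration):

```python
import random

# Hypothetical instruction set and probability matrix P.
# P[i][j] = probability of picking instruction b_j at address a_{i+1},
# conditioned on the first i instructions already being selected.
instructions = ["INC", "JMP"]  # b_1, b_2 (toy alphabet)
P = [
    [0.9, 0.1],   # address a_1
    [0.5, 0.5],   # address a_2
    [0.2, 0.8],   # address a_3  (m = 3: upper bound on program length)
]

def sample_program(rng):
    """Generate a program incrementally: one instruction per address."""
    return [rng.choices(range(len(instructions)), weights=row)[0] for row in P]

def program_probability(prog):
    """D_P(q): product of the probabilities of the chosen instructions."""
    prob = 1.0
    for i, j in enumerate(prog):
        prob *= P[i][j]
    return prob

prog = sample_program(random.Random(0))
print([instructions[j] for j in prog], program_probability(prog))
```

Since each row of `P` sums to one, the probabilities of all length-$m$ programs sum to one as well, so `program_probability` really does define a distribution over programs.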
LS' arguments are $P$ and the representation of a problem denoted by $N$. LS' output is a program that computes a solution to the problem if it found any. In this section, all $P_{ij}$ will remain fixed. LS is implemented as a sequence of longer and longer phases:
Levin search(problem $N$, probability matrix $P$)

(1) Set $PHASE$, the number of the current phase, equal to 1. In what follows, let $Q_{PHASE}$ denote the set of programs $q$ not yet executed during the current phase that satisfy $D_P(q) \geq \frac{1}{PHASE}$.

(2) Repeat

(2.1) While $Q_{PHASE} \neq \emptyset$ and no solution found do: Generate a program $q \in Q_{PHASE}$, and run $q$ until it either halts or until it has used up $PHASE \cdot D_P(q) \cdot c$ steps. If $q$ computed a solution for $N$, return $q$ and exit.

(2.2) Set $PHASE := 2 \cdot PHASE$

until solution found or $PHASE \geq PHASE_{MAX}$.

Return empty program.
Here $c$ and $PHASE_{MAX}$ are prespecified constants. The procedure above is essentially the same (has the same order of complexity) as the one described in the second paragraph of this section -- see, e.g., [#!Solomonoff:86!#,#!LiVitanyi:93!#].
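The procedure can be sketched end-to-end in Python. Everything concrete here is invented for illustration -- programs are strings over a two-letter alphabet, the "interpreter" simply reverses a program at a cost of one step per character, and $D_P$ is a uniform toy distribution; only the phase-doubling control flow mirrors the pseudocode above:

```python
import itertools

ALPHABET = "ab"      # toy instruction set
MAX_LEN = 6          # m: upper bound on program length
C = 4.0              # c: prespecified time-budget constant
PHASE_MAX = 2 ** 12  # prespecified bound on PHASE

def D_P(q):
    """Toy distribution: uniform over lengths 1..MAX_LEN, then uniform
    over the 2^len(q) programs of that length (masses sum to one)."""
    return (1.0 / MAX_LEN) * len(ALPHABET) ** (-len(q))

def run(q, budget):
    """Toy interpreter: computing q's output (its reversal) costs
    len(q) steps; return None if the time allotment is too small."""
    return q[::-1] if len(q) <= budget else None

def all_programs():
    for length in range(1, MAX_LEN + 1):
        for tup in itertools.product(ALPHABET, repeat=length):
            yield "".join(tup)

def levin_search(problem):
    phase = 1
    while True:
        # Q_PHASE: programs with D_P(q) >= 1/phase, each run once per
        # phase with time allotment phase * D_P(q) * C (any order works;
        # each doubling of phase doubles every program's allotment).
        for q in all_programs():
            if D_P(q) < 1.0 / phase:
                continue
            if run(q, budget=phase * D_P(q) * C) == problem:
                return q                 # q computed a solution
        phase *= 2
        if phase >= PHASE_MAX:
            return ""                    # empty program: no solution

print(levin_search("abba"))  # the palindromic program "abba" solves it
```

In this toy run the solving program has $D_P(q) = \frac{1}{6} \cdot 2^{-4} = \frac{1}{96}$, so it first enters $Q_{PHASE}$ at $PHASE = 128$, where its allotment $128 \cdot \frac{1}{96} \cdot 4 \approx 5.3$ steps suffices to compute its four-character output.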