Levin's algorithm was considered of interest for theoretical purposes (see e.g. Allender, 1992; and Li and Vitányi, 1993). However, it seems that nobody implemented it for experimental applications, perhaps for fear of the ominous ``constant factor'', which may be large. To my knowledge, general universal search was implemented for the first time during the project that led to this paper. See also Heil's more recent diploma thesis at TUM (1995). Solomonoff (1986) himself apparently implemented only very restricted versions. In what follows, however, I will focus on the (working) implementation of a slightly different probabilistic algorithm (also based on Levin complexity and strongly inspired by universal search). The experimental results obtained with the probabilistic algorithm (see section 4) are very similar to those obtained with the original universal search procedure.
Overview.
The method described in this section
searches and finds algorithms that compute
solutions to a given problem specified by possibly
very limited ``training data''.
The goal is to discover solutions with high generalization
performance on ``test data'' unavailable during the
search phase.
Towards this purpose, the probabilistic
search algorithm randomly generates programs written in a general
assembler-like programming
language based on sequences of integers.
Programs may influence their own storage size and runtime.
Each program computes a solution candidate
which is tested on the training data.
The probability of generating a particular program together with an upper bound $t$ for its runtime essentially
equals the quotient of the probability
of guessing the program, and $t$. This implies that
candidates with low Levin complexity are preferred
over candidates with high Levin complexity.
To measure generalization performance,
candidates fitting
the training data are evaluated on test data.
In the experiments (section 4), solution candidates will be
weight matrices for a
neural net supposed to solve certain generalization
tasks that are difficult or impossible to solve by conventional
neural net algorithms.
``Universal'' programming language.
Programs are sequences of integers. They are stored in
the storage, which consists of a single array of cells.
Each cell has an integer address in the interval $[-s, p]$.
Both $s$ and $p$ are positive integers.
The program tape is the set of cells with addresses
in $[0, p]$.
The work tape is the set of cells with addresses
in $[-s, -1]$.
Cells with non-negative addresses belong to the program tape.
Cells with negative addresses belong to the work tape.
The contents of the cell
with address $i$ is denoted by $c_i$, and is
of type integer as well, restricted to the interval
$[-maxint, maxint]$ (in the implemented version,
maxint equals 10000).
During execution of a program, the used portion of
the program tape may increase.
The used portion of the
work tape may increase or decrease.
At any time step, the variable
$Max$ ($\geq 0$) denotes the topmost
address of the used storage.
The variable $Min$ ($\leq 0$) denotes its smallest
address.
At any given time,
legal addresses are in the dynamic range
$[Min, Max]$, where
$Min \leq 0 \leq Max$, by definition.
At any given time, the integer sequence written on the program tape
(up to address $Max$)
is called the current program.
$Max = 0$ implies the ``empty'' program.
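The tape layout above can be sketched in a few lines of Python. The class name and the concrete bounds are illustrative assumptions, not constants from the original implementation:

```python
# Minimal sketch of the storage model described above. The class name and
# the bounds S and P are illustrative assumptions, not the paper's constants.
S, P = 8, 8   # hypothetical bounds: addresses range over [-S, P]

class Storage:
    def __init__(self):
        self.cells = {a: 0 for a in range(-S, P + 1)}  # each cell holds an integer
        self.min = 0   # smallest used address (work tape grows downward)
        self.max = 0   # topmost used address (program tape grows upward)

    def legal(self, addr):
        # at any time, legal addresses lie in the dynamic range [min, max]
        return self.min <= addr <= self.max

    def current_program(self):
        # integer sequence on the program tape up to address max;
        # max == 0 implies the "empty" program
        return [self.cells[a] for a in range(0, self.max)]

st = Storage()
assert st.current_program() == []       # empty program initially
st.cells[0] = 5
st.max = 1                              # one integer written at address 0
assert st.current_program() == [5]
```

Keeping the two tapes in one array with signed addresses makes the read/write restrictions given later easy to state as simple interval checks.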
Instructions.
At any given time, the variable InstructionPointer may equal
the address of one of the cells, whose contents
may be interpretable as an instruction.
There are $n$
different possible instructions.
Each instruction is
uniquely represented by an instruction number from
the set $\{0, 1, \ldots, n-1\}$.
An instruction may have up to three arguments (of type integer), or none.
Arguments are stored in the addresses following
the address of the instruction.
For each argument of each instruction, there is
a legal argument range
(a set of integer values the argument is allowed to take on).
Within certain limits, legal argument ranges can be
dynamically modified by programs, as will be
seen shortly.
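The argument-range convention above can be sketched as a lookup table plus a check. The particular primitives, arities, and ranges below are invented for illustration only:

```python
# Hypothetical instruction table: number -> (name, arity, legal argument ranges).
# The concrete primitives and ranges are illustrative, not the paper's originals.
TABLE = {
    0: ("Stop", 0, []),
    1: ("Add",  3, [range(-8, 9), range(-8, 9), range(-8, 0)]),  # two reads, one write
    2: ("Jump", 1, [range(-8, 9)]),
}

def syntactically_correct(cells, ip):
    """Check the instruction at address ip against its argument ranges."""
    if cells[ip] not in TABLE:
        return False
    _, arity, ranges = TABLE[cells[ip]]
    args = [cells[ip + 1 + k] for k in range(arity)]
    return all(a in r for a, r in zip(args, ranges))

cells = {0: 1, 1: 2, 2: 3, 3: -1}       # Add c[2] c[3] -> c[-1]
assert syntactically_correct(cells, 0)
cells[3] = 5                            # write target on the program tape: illegal
assert not syntactically_correct(cells, 0)
```

Dynamic modification of the ranges (mentioned above) would amount to programs rewriting entries of such a table within fixed limits.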
Initialization, time limits, time probability.
In the beginning of the execution of a ``program'' or ``run'',
the variables
OracleAddress,
InstructionPointer,
Min,
and CurrentRuntime
are all set to zero.
The variable CurrentTimeLimit is used to define
an upper bound for the runtime of the current program.
To obtain a probabilistic variant of universal search,
CurrentTimeLimit is chosen randomly as follows:
elements from the set $\{0, 1\}$
are drawn with equal probability
until the first ``1'' is drawn.
Let $n$
denote the number of trials.
CurrentTimeLimit is set to
$2^{n-1} \cdot T_0$,
where the constant $T_0$
equals 16 time steps
(each program will be allowed
to execute at least 16 instructions - but it may choose
to halt earlier).
If CurrentTimeLimit exceeds a constant MaxTimeLimit, then
it is replaced by
MaxTimeLimit.
The time probability of the current program is defined by
$2^{-n}$.
Short runtimes are more likely than long
runtimes.
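Under the reconstruction above (limit $16 \cdot 2^{n-1}$ after $n$ fair draws ending with the first ``1''), the sampling step might look as follows; the cap value is a placeholder, since the paper's MaxTimeLimit is not stated here:

```python
import random

BASE = 16                # minimal time limit: every program may execute >= 16 instructions
MAX_TIME_LIMIT = 2**20   # hypothetical cap; the paper's MaxTimeLimit value is not given here

def draw_time_limit(rng):
    """Draw from {0, 1} with equal probability until the first 1; n = number of trials."""
    n = 1
    while rng.choice((0, 1)) == 0:
        n += 1
    limit = min(BASE * 2 ** (n - 1), MAX_TIME_LIMIT)  # reconstructed rule: 16 * 2^(n-1)
    time_probability = 2.0 ** (-n)                    # probability of needing exactly n trials
    return limit, time_probability

rng = random.Random(0)
limit, p = draw_time_limit(rng)
assert limit >= BASE and 0 < p <= 0.5
```

A limit of $16 \cdot 2^{k}$ is drawn with probability $2^{-(k+1)}$, so short runtimes are indeed exponentially more likely than long ones.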
Instruction cycle and oracles.
A single step of the program interpreter works as follows:
if the InstructionPointer equals
OracleAddress
(the lowest address beyond the current program),
then this is interpreted as the request for an oracle.
A primitive and the corresponding arguments
are chosen randomly from the set of legal
options (to be described below).
They are sequentially written onto the program tape, starting
from OracleAddress.
OracleAddress and $Max$
are increased accordingly, to reflect
the growth of used program tape.
Then the new primitive gets executed
(except when growth beyond address $p$
halts the program).
If there is no oracle request:
if the InstructionPointer equals
some legal address $a$, then if
the content $c_a$ represents a legal instruction number, the corresponding number
of arguments
and the corresponding legal argument ranges are looked
up and checked against
the contents of the
addresses following the current address.
If the instruction is ``syntactically correct'', it gets executed.
Otherwise the current program is halted.
If the executed primitive did not change the value
of the InstructionPointer (e.g. by causing a jump),
the InstructionPointer is set to point to the address
following the address of (the last argument of) the current instruction.
If an instruction was executed, CurrentRuntime is incremented.
If the CurrentTimeLimit is reached, the program is halted.
Runs, programs, and space probability. After initialization, the instruction cycle is repeated until a halt situation is encountered. The space probability of a program is defined as the product of the probabilities of all arguments and primitives requested and executed during its runtime. Essentially, the space probability is the probability of guessing the executed content of the program tape.
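An oracle request and the resulting space probability can be sketched as follows. Uniform choices among the legal options are an assumption here, not necessarily the paper's exact scheme, and the toy instruction set is invented:

```python
import random

def oracle(rng, legal_instructions, arg_ranges):
    """Randomly extend the program at an oracle request; return the chosen
    integers and the probability of that joint choice. Uniform choices among
    legal options are an assumption, not the paper's exact scheme."""
    prob = 1.0
    instr = rng.choice(legal_instructions)
    prob *= 1.0 / len(legal_instructions)
    chosen = [instr]
    for r in arg_ranges[instr]:
        arg = rng.choice(list(r))
        prob *= 1.0 / len(r)
        chosen.append(arg)
    return chosen, prob

# The space probability of a run is the product over all oracle requests:
rng = random.Random(1)
ranges = {0: [], 1: [range(-4, 4)]}    # toy instruction set: 0 has no args, 1 has one
space_prob = 1.0
for _ in range(3):
    _, p = oracle(rng, [0, 1], ranges)
    space_prob *= p
assert 0.0 < space_prob <= 0.5 ** 3    # each request contributes at most 1/2
```

Multiplying the probabilities of all primitives and arguments requested during the run yields exactly the probability of guessing the executed content of the program tape.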
Probabilistic search. Programs are generated randomly and executed as described above, and their results are evaluated until some problem-specific performance criterion is met. Obviously, results with low Levin complexity are preferred over results with high Levin complexity. (In a very similar alternative implementation, the original universal search algorithm was used to systematically generate all solution candidates in order of their Levin complexities. See also Heil's diploma thesis at TUM (1995)).
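The preference for low Levin complexity can be made explicit with the standard definition $-\log_2(\text{space probability}) + \log_2(\text{runtime})$, applied here as a sketch:

```python
from math import log2

def levin_complexity(space_probability, runtime):
    # candidates are ordered by -log2(space probability) + log2(runtime):
    # likely (short) programs and short runtimes are preferred
    return -log2(space_probability) + log2(runtime)

# a more probable, equally fast candidate wins; so does an equally probable, faster one:
assert levin_complexity(2**-5, 16) < levin_complexity(2**-20, 16)
assert levin_complexity(2**-5, 16) < levin_complexity(2**-5, 1024)
```

The original universal search variant mentioned above would enumerate candidates in increasing order of this quantity instead of sampling them.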
Used primitives.
The instruction numbers and the semantics
of the primitives
used in the experiments are listed below.
An expression of the form ``address $a_i$'' denotes the value (interpreted
as an address) found in the
$i$-th
cell following the one containing the current
instruction (indirect addressing is used throughout).
The following list assumes syntactical correctness of the instructions.
Rules for legal argument ranges and syntactical correctness
will be given shortly.
Rules for legal argument ranges and syntactical correctness.
Jumps may lead to any address in the dynamic range
$[Min, Max]$
(recall that OracleAddress always equals the current value
of $Max$).
Operations that read the contents of certain cells
(like add, move, jumpleq etc.)
may read only from addresses in
$[Min, Max]$.
Operations that change the contents of certain cells
may write only into work tape addresses in
$[Min, -1]$.
Thus, the program tape is ``read/execute'' only, except
for random writes requested by moves of the InstructionPointer
to OracleAddress.
This makes reruns easy.
The work tape is ``read/write/execute''.
Results of arithmetic operations leading to
underflow or overflow are replaced by
$-maxint$ or $maxint$,
respectively.
No more than 5 work tape cells may be Allocated or
Freed at a time.
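The two numeric rules above reduce to a clamp and a bounds check; a minimal sketch, with the helper names chosen here for illustration:

```python
MAXINT = 10000   # as in the implemented version described above

def clamp(x):
    # arithmetic underflow/overflow is replaced by -maxint / maxint, respectively
    return max(-MAXINT, min(MAXINT, x))

def allocate(n_cells):
    # no more than 5 work tape cells may be Allocated (or Freed) at a time
    if not 1 <= n_cells <= 5:
        raise ValueError("at most 5 cells per Allocate/Free")
    return n_cells

assert clamp(20000) == MAXINT
assert clamp(-20000) == -MAXINT
assert allocate(5) == 5
```

Clamping keeps every cell content inside $[-maxint, maxint]$, so no run can leave the storage in an unrepresentable state.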