**The task.**
A linear (perceptron-like) network with 100 input units,
one output unit, and 100 weights,
is fed with 100-dimensional binary input vectors.
denotes the -th input vector.
denotes the th component
of , where ranges from 0 to 99.
Each input vector has exactly three bits
set to one, all the other bits are set to zero.
Obviously, there are
possible inputs.
The network's output in response to is

where is the -th weight. Each weight may take on integer values between -10000 and 10000. The task is to find weights such that equals the number of

**The solution.**
The only solution to the problem is: make all
equal to 1. The Kolmogorov complexity of
this solution is small, since there is a short
program that computes it.
Its Levin complexity is small, too, since its ``logical
depth'' (the runtime of its shortest program
(Bennett, 1988))
is less than 400 time steps.

**The training data.**
To illustrate the generalization capability
of search for solution candidates with low Levin complexity, **only 3
training examples are used**.
They were randomly chosen from the
possible inputs.
The first training example is the binary vector with *on*-bits
at the positions 5, 17, and 86 (and *off*-bits everywhere else).
The second one, , has *on*-bits at the positions 13, 55, and 58.
The third one, , has *on*-bits at the positions 40, 87, and 94.
In all three cases, the desired output (target) is 3.

**Why conventional neural net algorithms fail
to solve this problem.**
Since the training set is very small,
conventional perceptron algorithms will not
solve this apparently simple problem.
They will not achieve good generalization
on unseen test data. One reason is that connections from
units that are always off won't be changed at all by
gradient descent algorithms. Note, however, that scaling the inputs
differently is not going to improve matters.
Nor is weight decay.
Weight decay encourages weight matrices with many zero entries.
For the current task, this is a bad strategy.

**Probabilistic search.**
The search procedure is as follows:
the probabilistic search algorithm
(as described in section 3)
lists and executes programs computing solution candidates (weight vectors).
The primitive *``WriteWeight''*
(replacing *``output''*, see section 3)
is used for writing network weights. It has one argument and
uses the variable *WeightPointer* taking on values
from the set
.
In the beginning of a run,
*WeightPointer*
and all weights
are initialized to 0.
The instruction number and the semantics
of *``WriteWeight''* are as follows (compare the list of primitives
given in section 3):

- 1
*WriteWeight(address)*. is set equal to the contents of*address*^{2}. The variable*WeightPointer*is incremented. Halt if*WeightPointer*out of range.

**Only if the solution candidate fits
the training data exactly is
the solution tested on the
test data.**
Note that this is like a ``reward-only-at-goal'' task:
the measure of success is binary - either the network
fits all the training data, or it doesn't.
There is no teacher providing a more informative
error signal (such as the distance to the desired outputs).

**RESULTS.**
Programs generating networks fitting the 3 training exemplars
were found in 20 out of 100000 runs.
*18 of these 20 led to perfect generalization on the
unseen test examples.*
Typical programs used (conditional) jumps
to generate correct solutions. See
(Schmidhuber, 1994a)
for details.

**Comment.**
With the task above, probabilistic
search among self-sizing programs
leads to excellent generalization performance.
At least in theory, however, it might be possible
that an appropriate variant of Nowlan's and Hinton's
approach (1992) might achieve good generalization
performance on this task, too.
Nowlan and Hinton
encourage groups of weights
with equal values, which is a good strategy in the case above.
For this reason, the following task requires
*that no two weights have equal values*.
The Kolmogorov complexity of the solution, however, will again be low.

Back to Optimal Universal Search page

Back to Program Evolution page

Back to Algorithmic Information page

Back to Speed Prior page