``Only math nerds would consider $2^{500}$ finite.''
(Leonid Levin)
HSEARCH and LSEARCH (Sections 2.2, 2.3) neglect one potential source of speed-up: they are nonincremental in the sense that they do not attempt to minimize their constant slowdowns by exploiting experience collected in previous searches for solutions to earlier tasks. They simply ignore the constants -- from an asymptotic point of view, incremental search does not buy anything.
A heuristic attempt [64,65] to greatly reduce the constants through experience was called Adaptive LSEARCH or ALS -- compare related ideas by Solomonoff [69,70]. Essentially ALS works as follows: whenever LSEARCH finds a program $q$ that computes a solution for the current problem, $q$'s probability $P(q)$ is substantially increased using a ``learning rate,'' while probabilities of alternative programs decrease appropriately. Subsequent LSEARCHes for new problems then use the adjusted $P$, etc.
[65] and [80] used a nonuniversal variant of this approach to solve reinforcement learning (RL) tasks in partially observable environments unsolvable by traditional RL algorithms.
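For concreteness, the following Python sketch illustrates one ALS-style bias shift over a finite program set. The multiplicative update rule, the function name, and the parameter gamma are illustrative assumptions; the original ALS of [64,65] differs in detail:

\begin{verbatim}
def als_update(P, solver, gamma=0.2):
    """One ALS-style update: boost the program that solved the
    current task, shrink all alternatives, keep P normalized.

    P      -- dict mapping program -> probability (sums to 1)
    solver -- the program LSEARCH just found to solve the task
    gamma  -- "learning rate" in (0, 1); the name is an assumption
    """
    for q in P:
        if q == solver:
            P[q] += gamma * (1.0 - P[q])  # substantially increase P(solver)
        else:
            P[q] *= 1.0 - gamma           # alternatives decrease appropriately
    return P

# Example: after q1 solves problem 1, subsequent LSEARCHes
# (which allocate runtime in proportion to P) favor q1.
P = {"q1": 0.25, "q2": 0.25, "q3": 0.50}
als_update(P, "q1")  # -> {"q1": 0.40, "q2": 0.20, "q3": 0.40}
\end{verbatim}

Note that this update conserves probability mass: the solver gains exactly the mass $\gamma\,(1 - P(\mathit{solver}))$ that the alternatives lose, so $P$ remains a distribution.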
Each LSEARCH invoked by ALS is bias-optimal with respect to the most recent adjustment of $P$.
On the other hand, the rather arbitrary $P$-modifications themselves are not necessarily optimal. They might lead to overfitting in the following sense: modifications of $P$ after the discovery of a solution to problem 1 could actually be harmful and slow down the search for a solution to problem 2, etc.
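To make the potential slowdown concrete, recall that LSEARCH allocates runtime to each program $q$ in proportion to $P(q)$, so the time to find a solution $q$ scales like $t(q)/P(q)$, where $t(q)$ is $q$'s runtime. Under a multiplicative update of the kind sketched above (an assumption; the precise scheme of [64,65] may differ), boosting the solver of problem 1 shrinks every other probability by the factor $1-\gamma$, inflating the search time for an unrelated solution $q_2$ of problem 2 accordingly:
\[
P'(q_2) = (1-\gamma)\,P(q_2)
\quad\Longrightarrow\quad
\frac{t(q_2)}{P'(q_2)} = \frac{1}{1-\gamma}\cdot\frac{t(q_2)}{P(q_2)}.
\]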
This may provoke a loss of near-bias-optimality with respect to the initial bias during exposure to subsequent tasks. Furthermore, ALS has a fixed, prewired method for changing $P$ and cannot improve this method by experience.
The main contribution of this paper is
to overcome all such drawbacks in a principled way.