As mentioned above, LS is not necessarily optimal
for ``incremental'' learning problems where experience
with previous problems may
help to reduce future search costs.
To make an incremental search method out of non-incremental LS,
we introduce a simple, heuristic, adaptive LS extension (ALS) that
uses experience with previous problems to
adaptively modify LS' underlying probability distribution.
ALS essentially works as follows: whenever
LS found a program $q$ that computed a solution for
the current problem, the probabilities of
$q$'s instructions $q_1, q_2, \ldots, q_{l(q)}$
are increased (here $q_i$ denotes
$q$'s $i$-th instruction,
and $l(q)$ denotes $q$'s length -- if LS did not find a solution
($q$ is the empty program),
then $l(q)$ is defined to be 0).
The probability adjustment is controlled by a learning rate
$\gamma$ ($0 < \gamma < 1$). ALS is related to the linear
reward-inaction algorithm (e.g., [Kaelbling, 1993]) -- the main difference
is: ALS uses LS to search through program space as opposed to
single action space.
As in section 2, the probability distribution $D_P$
is determined by the matrix $P$, whose entry $P_{ij}$
denotes the probability of selecting instruction $b_j$
as the $i$-th instruction of a program.
Initially, all $P_{ij}$ are equal ($P_{ij} = 1/n_{ops}$,
where $n_{ops}$ denotes the number of instructions). However,
given a sequence of problems $N_1, N_2, \ldots, N_r$,
the $P_{ij}$ may undergo changes caused by ALS:
ALS (problems $N_1, \ldots, N_r$, variable matrix $P$)
  for $i := 1$ to $r$ do:
    $q$ := Levin search($N_i$, $D_P$); Adapt($q$, $P$).

Adapt (program $q$, variable matrix $P$)
  for $i := 1$ to $l(q)$, $j := 1$ to $n_{ops}$ do:
    if ($q_i = b_j$) then $P_{ij} := P_{ij} + \gamma (1 - P_{ij})$
    else $P_{ij} := (1 - \gamma) P_{ij}$
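To make the update concrete, here is a minimal Python sketch of the Adapt routine. It assumes $P$ is stored as a NumPy array whose rows index program positions and whose columns index the $n_{ops}$ instructions, and that a program is given as a list of instruction indices; the names (`adapt`, `gamma`) and the default learning rate are illustrative assumptions, not part of the original formulation.

```python
import numpy as np

def adapt(q, P, gamma=0.05):
    """Linear reward-inaction style update of the probability matrix P
    after Levin search returned program q (a list of instruction
    indices; the empty list means no solution was found, so l(q) = 0
    and nothing changes).  P[i, j] is the probability of selecting
    instruction b_j as the i-th instruction of a program."""
    for i, chosen in enumerate(q):            # i runs over positions 1..l(q)
        for j in range(P.shape[1]):           # j runs over the n_ops instructions
            if j == chosen:
                P[i, j] += gamma * (1.0 - P[i, j])   # push the chosen instruction up
            else:
                P[i, j] *= (1.0 - gamma)             # push the others down
    return P
```

Note that each row of $P$ stays normalized: the chosen entry gains $\gamma (1 - P_{ij})$ while the remaining entries of that row jointly lose the same amount.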
Critique of adaptive LS. Although ALS seems a reasonable first step towards making LS adaptive (and actually leads to very nice experimental results -- see section 5), there is no theoretical proof that it will generate only probability modifications that speed up the process of finding solutions to new tasks -- sometimes ALS may produce harmful instead of beneficial results. To address this issue, in the next section we augment ALS by a recent backtracking technique called ``Environment-Independent Reinforcement Acceleration'' (EIRA). EIRA ensures that the system will keep only probability modifications representing a lifelong history of performance improvements.