Next: EIRA FOR ALS Up: Solving POMDPs with Levin Previous: LEVIN SEARCH (LS)

# ADAPTIVE LS (ALS)

As mentioned above, LS is not necessarily optimal for "incremental" learning problems, where experience with previous problems may help to reduce future search costs. To make an incremental search method out of non-incremental LS, we introduce a simple, heuristic, adaptive LS extension (ALS) that uses experience with previous problems to adaptively modify LS' underlying probability distribution. ALS essentially works as follows: whenever LS finds a program $q$ that computes a solution for the current problem, the probabilities of $q$'s instructions $q_i$ are increased (here $q_i$ denotes $q$'s $i$-th instruction, and $l(q)$ denotes $q$'s length; if LS did not find a solution, then $q$ is the empty program and $l(q)$ is defined to be 0). The probability adjustment is controlled by a learning rate $\gamma$ ($0 < \gamma < 1$). ALS is related to the linear reward-inaction algorithm (e.g., [Kaelbling, 1993]); the main difference is that ALS uses LS to search through program space as opposed to single-action space. As in section 2, the probability distribution is determined by the matrix $P$ with entries $P_{ij}$. Initially, all $P_{ij} = \frac{1}{n_{ops}}$. However, given a sequence of problems $N_1, N_2, \ldots, N_r$, the $P_{ij}$ may undergo changes caused by ALS:

ALS (problems $N_1, \ldots, N_r$, variable matrix $P$)

for $i$ := 1 to $r$ do:
    $q$ := Levin search($N_i$, $P$); Adapt($q$, $P$).

where the procedure Adapt works as follows:

Adapt (program $q$, variable matrix $P$)

for $i$ := 1 to $l(q)$, $j$ := 1 to $n_{ops}$ do:
    if ($q_i = b_j$) then $P_{ij} := P_{ij} + \gamma (1 - P_{ij})$
    else $P_{ij} := (1 - \gamma) P_{ij}$
Critique of adaptive LS. Although ALS seems a reasonable first step towards making LS adaptive (and actually leads to very nice experimental results -- see section 5), there is no theoretical proof that it will generate only probability modifications that speed up the process of finding solutions to new tasks; sometimes ALS may produce harmful instead of beneficial results. To address this issue, in the next section we augment ALS with a recent backtracking technique called "Environment-Independent Reinforcement Acceleration" (EIRA). EIRA ensures that the system will keep only probability modifications representing a lifelong history of performance improvements.

Juergen Schmidhuber 2003-02-25
