Wiering and Schmidhuber (1996)
describe additional experiments and comparisons.
They show that
Levin search can be useful not only for supervised learning,
but also for more general reinforcement learning, even for difficult,
*partially observable* Markov decision problems (POMDPs).
For example, experiments in (Wiering and Schmidhuber, 1996) show
that LS can solve partially observable mazes (POMs)
involving more states and obstacles
than certain POMs solved by various previous authors.
LS can also solve tasks that cannot be solved
by standard reinforcement learning techniques
such as Q-learning (Watkins, 1989).

For instance, one simulation involves a maze with a single start position (S) and a single goal position (G). There is a low-complexity shortest path from S to G involving 127 steps. The goal is to find a program that makes an agent move from S to G. Programs are composed from primitives that greatly limit the agent's sensing capabilities: it can implicitly perceive only one of three highly ambiguous types of input, namely "front is blocked or not", "the left field is free or not", "the right field is free or not". Hence, from the agent's perspective, the task is a difficult POMDP. Several Q-learning variants were tried but failed to solve the problem. In contrast, LS was indeed able to solve the POMDP by discovering a program computing the shortest path (details in Wiering and Schmidhuber, 1996).
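The core idea of LS can be sketched in a few lines: in phase i, every candidate program p of length l(p) <= i receives a time budget proportional to 2^(i - l(p)), so short programs are tried first and receive exponentially more of the total search time. The following Python toy illustrates only this allocation scheme; the tiny maze, the primitives F/L/R, and the open-loop (sensing-free) program encoding are illustrative assumptions, not the setup of the paper, which uses conditional, perception-limited primitives.

```python
import itertools

# Illustrative grid (NOT the maze from the paper): S = start, G = goal, # = wall.
MAZE = [
    "#####",
    "#S..#",
    "###.#",
    "#G..#",
    "#####",
]

def run_program(prog, max_steps):
    """Interpret prog (a sequence over F/L/R) in the maze, cycling through it;
    return True iff the goal is reached within max_steps primitive actions."""
    r, c = next((i, row.index("S")) for i, row in enumerate(MAZE) if "S" in row)
    dr, dc = 0, 1          # initially facing east
    steps = 0
    for op in itertools.cycle(prog):
        if steps >= max_steps:
            return False
        steps += 1
        if op == "F":      # step forward unless blocked by a wall
            nr, nc = r + dr, c + dc
            if MAZE[nr][nc] != "#":
                r, c = nr, nc
        elif op == "L":    # turn left (counterclockwise)
            dr, dc = -dc, dr
        elif op == "R":    # turn right (clockwise)
            dr, dc = dc, -dr
        if MAZE[r][c] == "G":
            return True

def levin_search(max_phase=12, prims="FLR"):
    """Levin-style allocation: in phase i, run every program p with
    len(p) <= i for 2**(i - len(p)) steps, biasing search toward short p."""
    for phase in range(1, max_phase + 1):
        for length in range(1, phase + 1):
            budget = 2 ** (phase - length)
            for prog in itertools.product(prims, repeat=length):
                if run_program(prog, budget):
                    return "".join(prog)
    return None
```

In this toy world the search returns the length-3 program "FFR", whose cyclic execution traces the shortest path to G in 8 steps; longer solutions exist but are never reached, illustrating LS's built-in bias toward low-complexity solutions.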

Numerous additional experiments were performed by Norbert Jankowski and Stefan Heil at Technische Universität München. Some are described in Heil's MSc thesis (1995).
