Wiering and Schmidhuber (1996) describe additional experiments and comparisons. They show that Levin search (LS) can be useful not only for supervised learning but also for more general reinforcement learning, even for difficult partially observable Markov decision problems (POMDPs). For example, experiments in (Wiering and Schmidhuber, 1996) show that LS can solve partially observable mazes (POMs) involving more states and obstacles than certain POMs solved by various previous authors. LS can also solve tasks that cannot be solved by standard reinforcement learning techniques such as Q-learning (Watkins, 1989).
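The core of Levin search is a simple schedule: programs are enumerated so that, in phase $i$, a program $p$ of length $l(p)$ receives a runtime budget proportional to $2^i \cdot 2^{-l(p)}$, which bounds the total work per phase. The following is a minimal sketch under stated assumptions; the `run` callback, the alphabet, and the phase bound are illustrative placeholders, not the implementation used in the cited experiments.

```python
import itertools

def levin_search(run, alphabet, max_phase=20):
    """Levin-search sketch. `run(program, budget)` is assumed to
    execute `program` for at most `budget` primitive steps and
    return True on success. In phase i, a program of length l
    gets budget 2**(i - l), i.e. proportional to 2**i * 2**(-l)."""
    for phase in range(1, max_phase + 1):
        # Enumerate candidate programs in order of increasing length;
        # longer programs get exponentially smaller budgets.
        for length in range(1, phase + 1):
            budget = 2 ** (phase - length)
            for program in itertools.product(alphabet, repeat=length):
                if run(program, budget):
                    return program
    return None  # no solution found within max_phase phases
```

Note that any fixed solvable problem is eventually found: once the phase index is large enough, the shortest solving program receives a sufficient budget.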
For instance, one simulation involves a maze with a single start position (S) and a single goal position (G). There is a low-complexity shortest path from S to G involving 127 steps. The goal is to find a program that makes an agent move from S to G. Programs are composed from primitives that greatly limit the agent's sensing capabilities: it can implicitly perceive only one of three highly ambiguous types of input, namely ``front is blocked or not,'' ``the left field is free or not,'' and ``the right field is free or not.'' Hence, from the agent's perspective, the task is a difficult POMDP. Several Q-learning variants were tried but failed to solve the problem. In contrast, LS did solve the POMDP by discovering a program computing the shortest path (details in Wiering and Schmidhuber, 1996).
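To make the ambiguous-observation setting concrete, here is a toy sketch of such an agent interface. The maze layout, the initial heading, and the primitive set (`'F'` forward, `'L'`/`'R'` turn) are illustrative assumptions for a much smaller maze, not the exact primitives or dimensions of the cited experiments; the point is only that the agent's entire sensory input is three ambiguous booleans.

```python
# Toy maze: '#' wall, 'S' start, 'G' goal, '.' free.
MAZE = [
    "#####",
    "#S..#",
    "###.#",
    "#G..#",
    "#####",
]
# Headings 0=N, 1=E, 2=S, 3=W, with (row, col) deltas.
DELTA = [(-1, 0), (0, 1), (1, 0), (0, -1)]

def find(ch):
    for r, row in enumerate(MAZE):
        c = row.find(ch)
        if c >= 0:
            return (r, c)

def free(r, c):
    return MAZE[r][c] != "#"

def perceive(pos, heading):
    """The agent's entire sensory input: three ambiguous booleans."""
    r, c = pos
    fr, fc = DELTA[heading]
    lr, lc = DELTA[(heading - 1) % 4]
    rr, rc = DELTA[(heading + 1) % 4]
    return (not free(r + fr, c + fc),  # front blocked?
            free(r + lr, c + lc),      # left field free?
            free(r + rr, c + rc))      # right field free?

def run_program(program, max_steps=100):
    """Execute a program (string over 'F', 'L', 'R') cyclically;
    return True iff the agent reaches G within max_steps."""
    pos, heading = find("S"), 1  # start facing east (an assumption)
    goal = find("G")
    for step in range(max_steps):
        action = program[step % len(program)]
        if action == "L":
            heading = (heading - 1) % 4
        elif action == "R":
            heading = (heading + 1) % 4
        else:  # 'F': move forward only if the front field is free
            blocked, _, _ = perceive(pos, heading)
            if not blocked:
                pos = (pos[0] + DELTA[heading][0],
                       pos[1] + DELTA[heading][1])
        if pos == goal:
            return True
    return False
```

For this toy maze the short cyclic program `"FFRFFRFF"` reaches G, while many observationally identical states make value-based methods like Q-learning struggle, since distinct maze positions yield the same three-bit perception.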
Numerous additional experiments were performed by Norbert Jankowski and Stefan Heil at Technische Universität München. Some are described in Heil's MSc thesis (1995).