In a sense, searching is less general than reinforcement learning because it does not necessarily involve predictions of unseen data. Still, search is a central aspect of computer science (and any reinforcement learner needs a searcher as a submodule--see Section 10). Surprisingly, however, many books on search algorithms do not even mention the following, very simple asymptotically optimal, ``universal'' algorithm for a broad class of search problems.
Define a probability distribution on a finite or infinite set of programs for a given computer. represents the searcher's initial bias (e.g., could be based on program length, or on a probabilistic syntax diagram).
Method LSEARCH: Set current time limit T=1. WHILE problem not solved DO:LSEARCH (for Levin Search) may be the algorithm Levin was referring to in his 2 page paper  which states that there is an asymptotically optimal universal search method for problems with easily verifiable solutions, that is, solutions whose validity can be quickly tested. Given some problem class, if some unknown optimal program requires steps to solve a problem instance of size , then LSEARCH will need at most steps -- the constant factor may be huge but does not depend on . Compare [31, p. 502-505] and  and the fastest way of computing all computable universes in Section 6.Test all programs such that , the maximal time spent on creating and running and testing , satisfies . Set
Recently Hutter developed a more complex asymptotically optimal search algorithm for all well-defined problems, not just those with with easily verifiable solutions . HSEARCH cleverly allocates part of the total search time for searching the space of proofs to find provably correct candidate programs with provable upper runtime bounds, and at any given time focuses resources on those programs with the currently best proven time bounds. Unexpectedly, HSEARCH manages to reduce the unknown constant slowdown factor of LSEARCH to a value of , where is an arbitrary positive constant.
Unfortunately, however, the search in proof space introduces an unknown additive problem class-specific constant slowdown, which again may be huge. While additive constants generally are preferrable over multiplicative ones, both types may make universal search methods practically infeasible.
HSEARCH and LSEARCH are nonincremental in the sense that they do not attempt to minimize their constants by exploiting experience collected in previous searches. Our method Adaptive LSEARCH or ALS tries to overcome this  -- compare Solomonoff's related ideas [61,62]. Essentially it works as follows: whenever LSEARCH finds a program that computes a solution for the current problem, 's probability is substantially increased using a ``learning rate,'' while probabilities of alternative programs decrease appropriately. Subsequent LSEARCHes for new problems then use the adjusted , etc. A nonuniversal variant of this approach was able to solve reinforcement learning (RL) tasks  in partially observable environments unsolvable by traditional RL algorithms [71,57].
Each LSEARCH invoked by ALS is optimal with respect to the most recent adjustment of . On the other hand, the modifications of themselves are not necessarily optimal. Recent work discussed in the next section overcomes this drawback in a principled way.