Optimal Universal Search Algorithms

In a sense, searching is less general than reinforcement learning because it does not necessarily involve predictions of unseen data. Still, search is a central aspect of computer science (and any reinforcement learner needs a searcher as a submodule--see Sections 10 and 11). Surprisingly, however, many books on search algorithms do not even mention the following, very simple asymptotically optimal, ``universal'' algorithm for a broad class of search problems.

Define a probability distribution on a finite or infinite set of programs for a given computer. represents the searcher's initial bias (e.g., could be based on program length, or on a probabilistic syntax diagram).

Method LSEARCH: Set current time limit T=1. WHILE problem not solved DO:LSEARCH (forTest all programs such that , the maximal time spent on creating and running and testing , satisfies . Set

Recently Hutter
developed a more complex asymptotically
optimal search algorithm for *all* well-defined problems, not
just those with with easily verifiable solutions [23].
HSEARCH cleverly allocates part of the total
search time for searching the space of proofs to find provably correct
candidate programs with provable upper runtime bounds, and
at any given time
focuses resources on those programs with the currently
best proven time bounds.
Unexpectedly, HSEARCH manages to reduce
the unknown constant slowdown factor of LSEARCH to
a value of , where is an
arbitrary positive constant.

Unfortunately, however, the search in proof space
introduces an unknown *additive* problem class-specific
constant slowdown, which again may be huge.
While additive constants generally are preferrable over
multiplicative ones, both types may make universal
search methods practically infeasible.

HSEARCH and LSEARCH are nonincremental in the sense that
they do not attempt to minimize their constants by exploiting
experience collected in previous searches.
Our method
*Adaptive* LSEARCH or ALS
tries to overcome this
[60] -- compare Solomonoff's related ideas
[64,65].
Essentially it works as follows: whenever LSEARCH
finds a program that computes a solution for the current problem,
's probability is
substantially increased using a ``learning rate,''
while probabilities of alternative programs decrease
appropriately.
Subsequent LSEARCHes for new problems then use the adjusted
, etc. A nonuniversal variant of this approach
was able to solve reinforcement learning (RL) tasks
[27]
in partially observable environments
unsolvable by traditional RL
algorithms [74,60].

Each LSEARCH invoked by ALS is optimal with respect to the most recent adjustment of . On the other hand, the modifications of themselves are not necessarily optimal. Recent work discussed in the next section overcomes this drawback in a principled way.