Section 2 will survey previous relevant work on general optimal search algorithms and introduce the concept of bias-optimality. Section 3 will use the framework of universal computers to explain OOPS and how OOPS benefits from incrementally extracting useful knowledge hidden in training sequences and in which sense OOPS is optimal. The remainder of the paper is devoted to ``Realistic OOPS'' which uses a recursive procedure for time-optimal planning and backtracking in program space to perform efficient storage management (Section 4) on realistic, limited computers. Section 5 discusses limitations of OOPS as well as extensions for reinforcement learning. Appendix A describes a pilot implementation of Realistic OOPS based on a stack-based universal programming language inspired by FORTH [36], with initial primitives for defining and calling recursive functions, iterative loops, arithmetic operations, domain-specific behavior, and for rewriting the search procedure itself.
Experiments in Section 6 use
the language of Appendix A to solve 60 tasks in a row:
we first teach OOPS something about recursion, by training it to
construct samples of the simple context free language
(
1's followed by
2's), for
up to 30.
This takes roughly 0.3 days on a standard personal computer (PC).
Thereafter, within a few additional days,
OOPS demonstrates the benefits of incremental knowledge transfer:
it exploits certain properties of its previously
discovered universal
-solver
to greatly accelerate the search for a universal solver
for all
disk Towers of Hanoi problems, solving all
instances up to
(solution size
).
Previous, less general reinforcement learners
and nonlearning AI planners
tend to fail for much smaller instances.