Given are
disks of
different sizes,
stacked in decreasing size on the first of three pegs.
One may move some peg's top disk to the top of another peg,
one disk at a time, but never a larger disk onto a smaller. The goal
is to transfer all disks to the third peg. Remarkably,
the fastest way of solving this famous problem requires
moves
.
The problem is of the reward-only-at-goal type --
given some instance of size
, there is no intermediate reward for
achieving instance-specific subgoals.
The exponential growth of minimal solution size is what
makes the problem interesting: Brute force methods searching
in raw solution space will quickly fail as
increases.
But the rapidly growing
solutions do have something in common, namely, the short algorithm
that generates them. Smart searchers will exploit such
algorithmic regularities.
Once we are searching in general algorithm space,
however, it is essential to efficiently allocate time
to algorithm tests. This is what OOPS does, in
near-bias-optimal incremental fashion.
Untrained humans find it hard to solve instances
.
[1]
applied traditional reinforcement learning methods
and was able to solve instances up to
, solvable within
at most 7 moves. [28] used
learning production systems and was able to solve instances up to
,
solvable within at most 31 moves.
(Side note:
[3] also applied an alternative reinforcement
learner based on the artificial economy by [18]
to a simpler 3 peg blocks world problem
where any disk may be placed on any other;
thus the required number of moves grows only
linearly with the number of disks, not exponentially;
[27] were able to
replicate their results for
up to 5.)
Traditional AI planning procedures--compare
chapter V by [43] and [25]--do not
learn but systematically explore all possible move combinations,
using only absolutely necessary task-specific primitives
(while OOPS will later use more than 70 general instructions, most of
them unnecessary). On current personal computers
AI planners tend to fail to solve Hanoi problem instances with
due to the exploding search space (Jana Koehler, IBM Research,
personal communication, 2002).
OOPS, however, searches program space instead of
raw solution space. Therefore, in principle it should be able to solve
arbitrary instances by discovering
the problem's elegant recursive solution--given
and
three pegs
(source peg, auxiliary peg, destination peg),
define the following procedure:
HANOI(S,A,D,n): IFexit; ELSE DO:
call HANOI(S, D, A, n-1);
move top disk from S to D;
call HANOI(A, S, D, n-1).