Consider figure 1.
An `animat' moves in the real plane defined by the x-axis and the
y-axis, producing a trajectory in the plane. There are obstacles in the
form of circular swamps. As long as the `animat' does not cross a swamp,
there are no *costs*^{1} (negative reinforcement). The i-th swamp
builds the basis of a cone (growing in the third dimension) whose tip
lies above the swamp's center.
Crossing the i-th swamp costs

(1)

A problem is defined by a start state and a goal state.
In the above example, start states and goal
states are simply given by pairs of Cartesian coordinates.
We are looking for an action sequence
leading from the start state to the goal state
*with minimal costs*.
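As a minimal sketch of how such trajectory costs might be evaluated, assume (this parameterization is an illustration, not given above) that each swamp is described by its center, its radius, and the height of its cone tip, and that the per-step cost inside a swamp is the height of the cone above the animat's current position:

```python
import math

# Hypothetical parameterization: each swamp is (center_x, center_y, radius, tip_height).
# Inside a swamp, the per-step cost is taken to be the height of the cone
# above the animat's position (maximal at the center, zero at the rim).
def cone_height(point, swamp):
    cx, cy, radius, tip = swamp
    dist = math.hypot(point[0] - cx, point[1] - cy)
    if dist >= radius:
        return 0.0  # outside the swamp: no cost
    return tip * (1.0 - dist / radius)  # linear cone profile

def trajectory_cost(points, swamps):
    """Sum the assumed cone-height cost over all points of a discretized trajectory."""
    return sum(cone_height(p, s) for p in points for s in swamps)

# A straight path crossing one swamp centered at (5, 0) with radius 2 and tip height 1.
swamps = [(5.0, 0.0, 2.0, 1.0)]
path = [(x * 0.5, 0.0) for x in range(21)]  # from (0, 0) to (10, 0)
print(trajectory_cost(path, swamps))
```

A cost-free trajectory in this sketch is simply one whose points all stay outside every swamp's radius.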

It is true that in theory such sequences could be learned by conventional
reinforcement learning algorithms
(e.g. [Barto, 1989], [Barto et al., 1983],
[Anderson, 1986], [Schmidhuber, 1991b], [Sutton, 1984],
[Lin, 1991], [Williams, 1988], [Watkins, 1989]).
For the sake of argument,
assume that the maximal step size of the `animat' is just
a tiny fraction of the obstacle diameter.
Then all of the above algorithms will take nearly
forever to find appropriate cost-free trajectories for all but
trivial start/goal combinations. One drawback of
conventional algorithms is that they
try to learn each new task from scratch, instead of
speeding up learning
and gaining efficiency
by solving new tasks through *composition*
of solutions to older tasks.
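The idea of composition can be illustrated with a small sketch (the task store and the way subgoals are found are assumptions for illustration, not the method described here): a new start/goal task is solved by concatenating the solutions of two previously solved subtasks that meet at an intermediate subgoal state.

```python
# Hypothetical illustration of composing solutions: previously solved tasks are
# stored as (start, goal) -> action sequence; a new task is solved by finding
# a subgoal state that splits it into two already-solved subtasks.
solved = {
    ("A", "B"): ["right", "right"],
    ("B", "C"): ["up"],
}

def compose(start, goal):
    """Try to build a solution for (start, goal) from stored sub-solutions."""
    if (start, goal) in solved:
        return solved[(start, goal)]
    # Search for a subgoal state connecting two known solutions.
    for (s1, g1), seq1 in solved.items():
        if s1 != start:
            continue
        seq2 = solved.get((g1, goal))
        if seq2 is not None:
            return seq1 + seq2  # concatenate the two learned sequences
    return None  # no composition found; fall back to learning from scratch

print(compose("A", "C"))  # composes the A->B and B->C solutions
```

The point of the sketch is only that a solved subtask never has to be relearned: once (A, B) and (B, C) are known, (A, C) costs no further trial-and-error.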
