The first module is a `program executer' C, which may be a
neural net (but does not have to be one).
Given a problem, C
emits a sequence of actions in response to its input vector s ∘ g,
the `problem name', where s denotes the start state and g the goal state.
Here ∘ denotes the concatenation operator for vectors.
We assume (1) that there are problems for which C does not
`know' solutions with minimal costs, but (2) that there also are many
problems for which C *does* `know' appropriate action sequences
(otherwise our method will not provide additional efficiency).
C may have learned this by a conventional learning
algorithm - or possibly even by a recursive application of the
principle outlined below.
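
The division between `known' and `unknown' problems can be illustrated by a minimal sketch (all names and the table of known action sequences are hypothetical, for illustration only): a program executer that returns an action sequence for some problem names and nothing for others.

```python
# Hypothetical sketch of a `program executer' C: a table mapping some
# problem names (start, goal) to known action sequences. Problems absent
# from the table are ones C does not `know' how to solve, reflecting
# assumptions (1) and (2) in the text.
known_programs = {
    ("A", "B"): ["right", "right"],  # C `knows' this problem
    ("B", "C"): ["up"],              # and this one
}

def C(start, goal):
    """Return an action sequence for the problem name start ∘ goal, or None."""
    return known_programs.get((start, goal))  # None: no known solution

print(C("A", "B"))  # a `known' problem
print(C("A", "C"))  # unknown: C has no solution with minimal costs
```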

The second module is the evaluator E.
E's input can be the concatenation
s1 ∘ s2 of two states s1 and s2.
E's non-negative output
E(s1 ∘ s2) is interpreted as a
prediction of the *costs* (the negative
reinforcement) for an action sequence (known by C) leading from
s1 to s2. E(s1 ∘ s2) = 0 means minimal expected costs.

E represents a model of C's current abilities. For the purposes of this paper, we need not specify the details of E - it may be an adaptive network (as in [Schmidhuber, 1991a]) as well as any other mapping whose output is differentiable with respect to the input.
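
The only property of E used later is differentiability of its output with respect to its input. A minimal sketch of such a mapping (a fixed random affine map followed by a softplus, purely illustrative and not the paper's model) shows both the non-negative cost prediction and the analytic input gradient:

```python
import numpy as np

# Illustrative evaluator E (not the paper's exact model): an affine map
# followed by softplus, so the output is non-negative and differentiable
# with respect to the concatenated input s1 ∘ s2.
rng = np.random.default_rng(0)
W = rng.normal(size=(1, 4))  # two 2-dimensional states, concatenated

def E(s1, s2):
    x = np.concatenate([s1, s2])
    z = (W @ x).item()
    return np.log1p(np.exp(z))  # softplus, always >= 0

def grad_E(s1, s2):
    """Analytic gradient of E with respect to the input vector s1 ∘ s2."""
    x = np.concatenate([s1, s2])
    z = (W @ x).item()
    return (1.0 / (1.0 + np.exp(-z))) * W.flatten()  # sigmoid(z) * W

s1, s2 = np.array([0.0, 1.0]), np.array([1.0, 0.0])
print(E(s1, s2))       # predicted costs, always >= 0
print(grad_E(s1, s2))  # input gradient, available for training subgoals
```

The gradient with respect to the input (not just the weights) is what makes such a model usable for adjusting subgoals by gradient descent.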

The third module is the module of interest: the
*adaptive subgoal generator* S.
S is supposed to learn to emit a list of appropriate
subgoals in response to a novel start/goal combination.
Section 4 will present two architectures for S - one
for simultaneous generation of all subgoals,
the other one for sequential generation of the subgoal list.
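
As a trivially simple stand-in for S (the adaptive, learned version is the subject of the paper), one might consider a generator that emits n subgoals by linear interpolation between start and goal; this fixes the interface a learned generator would share:

```python
import numpy as np

# Non-adaptive stand-in for the subgoal generator S, for illustration:
# given start s and goal g, emit n evenly spaced intermediate states.
def S(s, g, n):
    return [s + (i / (n + 1)) * (g - s) for i in range(1, n + 1)]

s, g = np.array([0.0, 0.0]), np.array([4.0, 4.0])
for sub in S(s, g, 3):
    print(sub)  # three subgoals between (0,0) and (4,4)
```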

The i-th sub-goal of the list
(i = 1, ..., n)
is denoted by the vector
ŝ(i), its
j-th
component by ŝ_j(i).
We set
ŝ(0) = s, ŝ(n+1) = g.
Ideally, after training the subgoal list
should fulfill the following condition:

(2)   For all i in {0, ..., n}:  E(ŝ(i) ∘ ŝ(i+1)) = 0.
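
The condition says that every consecutive pair in the chain from start to goal (including the boundary cases ŝ(0) = s and ŝ(n+1) = g) should be a transition whose predicted costs are minimal. A small check of this property, using a hypothetical toy evaluator that returns zero exactly for unit steps, might look like:

```python
# Check condition (2) for a candidate subgoal list: with s_hat(0) = s and
# s_hat(n+1) = g, every consecutive pair should map to (near-)zero
# predicted costs under the evaluator E.
def satisfies_condition(E, s, g, subgoals, tol=1e-6):
    chain = [s] + list(subgoals) + [g]  # s_hat(0), ..., s_hat(n+1)
    return all(E(chain[i], chain[i + 1]) <= tol
               for i in range(len(chain) - 1))

# Hypothetical toy evaluator: unit steps are `known' (zero costs),
# larger jumps are not.
toy_E = lambda a, b: 0.0 if abs(b - a) <= 1 else 1.0

print(satisfies_condition(toy_E, 0, 3, [1, 2]))  # True: each step costs 0
print(satisfies_condition(toy_E, 0, 3, [2]))     # False: a gap of size 2
```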
