The first module is a `program executer' $C$, which may be a neural net (but does not have to be one). Given a problem, $C$ emits a sequence of actions in response to its input vector, the `problem name' $s \circ g$, where $s$ denotes the start state and $g$ the goal state. Here `$\circ$' denotes the concatenation operator for vectors. We assume (1) that there are problems for which $C$ does not `know' solutions with minimal costs, but (2) that there also are many problems for which $C$ does `know' appropriate action sequences (otherwise our method will not provide additional efficiency). $C$ may have learned this by a conventional learning algorithm, or possibly even by a recursive application of the principle outlined below.
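For concreteness, here is a minimal sketch of $C$'s interface, assuming states are real-valued vectors; the function names are ours and purely illustrative, and $C$ itself is left as an arbitrary callable:

\begin{verbatim}
import jax.numpy as jnp

def problem_name(s, g):
    # the 'problem name' s o g: concatenation of start and goal vectors
    return jnp.concatenate([s, g])

def run_executer(C, s, g):
    # C maps the problem name to a sequence of actions; it may be a
    # neural net, but any callable with this signature will do
    return C(problem_name(s, g))
\end{verbatim}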
The second module is the evaluator $E$. $E$'s input can be the concatenation of two states $s$ and $g$. $E$'s non-negative output $eval(s, g)$ is interpreted as a prediction of the costs (negative reinforcement) of an action sequence (known by $C$) leading from $s$ to $g$; $eval(s, g) = 0$ means minimal expected costs. $E$ represents a model of $C$'s current abilities. For the purposes of this paper, we need not specify the details of $E$: it may be an adaptive network (as in [Schmidhuber, 1991a]) or any other mapping whose output is differentiable with respect to its input.
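To make the differentiability requirement concrete, here is a minimal sketch of such an evaluator in JAX; the two-layer network, its sizes, and the softplus output (which keeps the prediction non-negative) are our illustrative assumptions, not details prescribed here:

\begin{verbatim}
import jax
import jax.numpy as jnp

dim = 4                                  # state dimensionality (illustrative)
k1, k2 = jax.random.split(jax.random.PRNGKey(0))
params = {"W": jax.random.normal(k1, (8, 2 * dim)),
          "b": jnp.zeros(8),
          "w": jax.random.normal(k2, (8,)),
          "c": jnp.zeros(())}

def eval_cost(s, g):
    x = jnp.concatenate([s, g])          # E's input: concatenation of s and g
    h = jnp.tanh(params["W"] @ x + params["b"])
    # softplus keeps the predicted cost non-negative
    return jax.nn.softplus(jnp.dot(params["w"], h) + params["c"])

# Gradients of the predicted cost with respect to both input states,
# i.e. the differentiability property required of E.
d_eval = jax.grad(eval_cost, argnums=(0, 1))
\end{verbatim}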
The third module is the module of interest: the adaptive subgoal generator $S$. $S$ is supposed to learn to emit a list of appropriate subgoals in response to a novel start/goal combination. Section 4 will present two architectures for $S$: one for simultaneous generation of all subgoals, the other for sequential generation of the subgoal list. The $i$-th subgoal of the list ($i = 1, \ldots, n$) is denoted by the vector $x(i)$, its $j$-th component by $x_j(i)$. We set $x(0) = s$ and $x(n+1) = g$.
Ideally, after training the subgoal list should fulfill the following condition:

$$\sum_{i=0}^{n} eval(x(i), x(i+1)) = 0. \tag{2}$$
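Since $eval$ is non-negative, condition (2) forces each individual term to vanish, i.e. every hop of the chain must be a problem that $C$ already solves with minimal expected costs. As a sketch, reusing the illustrative eval_cost from above:

\begin{verbatim}
def total_cost(s, subgoals, g):
    chain = [s] + list(subgoals) + [g]   # x(0)=s, x(1..n), x(n+1)=g
    # summed predicted costs of the n+1 hops; condition (2) asks for zero
    return sum(eval_cost(chain[i], chain[i + 1])
               for i in range(len(chain) - 1))
\end{verbatim}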