The idea is to use a recurrent sub-goal generator which produces only a single
sub-goal at any given time step. At the next time step this sub-goal is fed back
to the start input of the same sub-goal generator (while the
goal input remains constant).
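This feedback loop can be sketched in a few lines (Python with NumPy; the single-layer tanh network, the state dimensionality, and the fixed number of steps are illustrative assumptions, not the architecture used here):

    import numpy as np

    def subgoal_generator(start, goal, W):
        # One evaluation of the sub-goal generator: map (start, goal) to one sub-goal.
        x = np.concatenate([start, goal])   # start input and (constant) goal input
        return np.tanh(W @ x)

    def generate_subgoals(start, goal, W, n_steps=3):
        # Unroll the recurrence: each sub-goal is fed back to the start input.
        subgoals, current = [], start
        for _ in range(n_steps):
            s = subgoal_generator(current, goal, W)
            subgoals.append(s)
            current = s                     # feedback: the sub-goal becomes the next start input
        return subgoals

    # Example: 2-dimensional states, three sub-goals between start and goal.
    rng = np.random.default_rng(0)
    W = rng.normal(scale=0.1, size=(2, 4))
    print(generate_subgoals(np.array([0.0, 0.0]), np.array([1.0, 1.0]), W))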
To adjust the weights of the sub-goal generator,
we can use an algorithm inspired by the
`back-propagation through time' method: successive sub-goals have
to be fed into copies of the sub-goal generator, as shown
in figure 3 (the figure depicts the special case of three sub-goals).
Gradient descent requires changing the sub-goal generator's weights according
to the sum of all gradients computed for its various copies. (Of course, the
weight vector shared by the copies has to remain constant during the
credit assignment phase.)
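The following sketch spells out this `unfold and sum' step for the toy generator above (again Python/NumPy; the squared-distance loss on the final sub-goal is merely a placeholder for whatever error signal is actually back-propagated into the copies):

    import numpy as np

    def bptt_gradient(start, goal, W, n_steps=3):
        # Back-propagation through time over the unrolled copies of the generator.
        # All copies share the same weight matrix W, which stays constant during
        # this credit assignment phase; the returned gradient is the sum of the
        # gradients computed for the individual copies.
        d = start.shape[0]
        inputs, outputs, current = [], [], start
        for _ in range(n_steps):                       # forward: run the copies
            x = np.concatenate([current, goal])
            s = np.tanh(W @ x)
            inputs.append(x)
            outputs.append(s)
            current = s

        grad_W = np.zeros_like(W)
        dL_ds = outputs[-1] - goal                     # placeholder error on the last sub-goal
        for t in reversed(range(n_steps)):             # backward through the copies
            delta = dL_ds * (1.0 - outputs[t] ** 2)    # through the tanh of copy t
            grad_W += np.outer(delta, inputs[t])       # this copy's contribution to the sum
            dL_ds = W[:, :d].T @ delta                 # error for the previous copy's output
        return grad_W

    # The weights are updated only after the credit assignment phase is finished.
    rng = np.random.default_rng(0)
    W = rng.normal(scale=0.1, size=(2, 4))
    W = W - 0.1 * bptt_gradient(np.array([0.0, 0.0]), np.array([1.0, 1.0]), W)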
While unfolding the
system in time, it
is not necessary to build real copies of the networks involved.
It suffices if, during activation spreading,
each of their units stores its time-varying activations on a stack,
from which they are popped during the back-propagation phase.
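The stack mechanism itself takes only a few lines (a sketch; the Unit class below merely illustrates the push-during-forward, pop-during-backward bookkeeping, not a complete unit of either network):

    class Unit:
        # A unit keeps its time-varying activations on a stack instead of
        # being duplicated into one physical copy per time step.
        def __init__(self):
            self.stack = []

        def forward(self, activation):
            # Activation spreading: push the activation of the current time step.
            self.stack.append(activation)
            return activation

        def backward(self):
            # Back-propagation phase: pop activations in reverse temporal order.
            return self.stack.pop()

    # Example: one unit `lives through' three time steps without being copied.
    u = Unit()
    for a in (0.2, 0.5, 0.9):
        u.forward(a)
    while u.stack:
        print(u.backward())   # prints 0.9, 0.5, 0.2 (latest time step first)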