With architecture 1, this is essentially done by back-propagating
error signals
(e.g. [Werbos, 1974], [Parker, 1985], [LeCun, 1985],
[Rumelhart et al., 1986])
through copies of the evaluator modules down into the subgoal generator.
Loosely speaking, each subgoal `receives error signals from
two adjacent copies of
the evaluator'. These error signals are added and
flow down into
the subgoal generator, where they cause appropriate weight changes.
One might say that in general two `neighboring' evaluator copies (see figure 2)
tend to pull their common subgoal in different directions.
The iterative process stops when a local or global minimum of (3)
is found. This corresponds to an `equilibrium' of the partly conflicting
forces originating from different evaluator copies.
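
The `equilibrium' picture can be illustrated with a minimal sketch. It rests on several assumptions that are not taken from the text: the objective is assumed to be a sum of evaluator outputs over consecutive pairs of points (start, subgoals, goal); the evaluator is replaced by a fixed, differentiable stand-in (squared Euclidean distance) instead of a trained network; and the subgoal generator is omitted, so the summed error signals update the subgoal coordinates directly rather than any weights. All function names are hypothetical.

```python
# Toy illustration only: squared distance stands in for the evaluator,
# and subgoals are free 2-D points rather than outputs of a generator network.

def evaluator(a, b):
    """Hypothetical cost estimate for getting from point a to point b."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def grad_wrt_start(a, b):
    """Gradient of evaluator(a, b) with respect to its first argument a."""
    return [2.0 * (ai - bi) for ai, bi in zip(a, b)]

def grad_wrt_end(a, b):
    """Gradient of evaluator(a, b) with respect to its second argument b."""
    return [2.0 * (bi - ai) for ai, bi in zip(a, b)]

def total_cost(start, subgoals, goal):
    """Sum of evaluator outputs over all consecutive pairs of points."""
    points = [start] + subgoals + [goal]
    return sum(evaluator(p, q) for p, q in zip(points, points[1:]))

def descend(start, subgoals, goal, lr=0.1, steps=500):
    """Gradient descent on total_cost with respect to the subgoals."""
    subgoals = [list(s) for s in subgoals]
    for _ in range(steps):
        points = [start] + subgoals + [goal]
        updated = []
        for i in range(1, len(points) - 1):
            # Error signal from the evaluator copy that ends at this subgoal.
            left = grad_wrt_end(points[i - 1], points[i])
            # Error signal from the evaluator copy that starts at this subgoal.
            right = grad_wrt_start(points[i], points[i + 1])
            # The two signals are added before the update is applied.
            summed = [l + r for l, r in zip(left, right)]
            updated.append([x - lr * g for x, g in zip(points[i], summed)])
        subgoals = updated
    return subgoals

start, goal = [0.0, 0.0], [3.0, 3.0]
result = descend(start, [[2.5, -1.0], [0.3, 4.0]], goal)
final_cost = total_cost(start, result, goal)
```

With this particular stand-in, the two neighboring copies pull each subgoal toward its predecessor and its successor, and the forces balance when the subgoals are evenly spaced on the line between start and goal, here close to (1, 1) and (2, 2). A learned evaluator would generally yield a different equilibrium, possibly only a local minimum.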
The derivation of the more complex algorithm for the recurrent architecture 2 is analogous to the derivation of conventional discrete-time recurrent net algorithms (e.g. [Robinson and Fallside, 1987], [Williams, 1989], [Williams and Zipser, in press], [Schmidhuber, 1992]).