where the gradient is taken with respect to the weight vector of the subgoal generator. During each training iteration, this weight vector has to be changed in proportion to the gradient.
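A minimal sketch of the update rule just described: each weight is changed in proportion to the (negative) gradient. The names `w`, `grad`, and `lr` are illustrative assumptions, not notation from the original text.

```python
def gradient_step(w, grad, lr=0.1):
    """Move each weight against its gradient component,
    in proportion to the gradient (plain gradient descent)."""
    return [wi - lr * gi for wi, gi in zip(w, grad)]

# example: one update of a two-component weight vector
w = [0.5, -0.3]
grad = [0.2, -0.1]
w_new = gradient_step(w, grad)
```

Repeating this step over many iterations drives the weight vector toward a local minimum of the cost.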

With architecture 1, this is essentially done by back-propagating error signals (e.g. [Werbos, 1974], [Parker, 1985], [LeCun, 1985], [Rumelhart et al., 1986]) through copies of the evaluator modules down into the subgoal generator. Loosely speaking, each subgoal receives error signals from two adjacent evaluator copies. These error signals are added and flow down into the subgoal generator, where they cause appropriate weight changes. One might say that in general two 'neighboring' evaluator copies (see figure 2) tend to pull their common subgoal in different directions. The iterative process stops when a local or global minimum of (3) is found. This corresponds to an 'equilibrium' of the partly conflicting forces originating from different evaluator copies.
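The equilibrium idea can be illustrated with a deliberately simplified toy (an assumption for illustration, not the original architecture): a scalar subgoal `s` sits between a start and a goal state, each of the two evaluator copies contributes the gradient of the cost of one leg of the path, and the two error signals are added before updating the subgoal. The conflicting pulls balance at an equilibrium.

```python
def train_subgoal(start, goal, s=0.0, lr=0.1, steps=200):
    """Iteratively update a scalar subgoal using the summed
    error signals of two adjacent evaluator copies."""
    for _ in range(steps):
        # gradient of (s - start)^2 from the first evaluator copy
        g_left = 2.0 * (s - start)
        # gradient of (goal - s)^2 from the second evaluator copy
        g_right = -2.0 * (goal - s)
        # the two error signals are added and drive the subgoal update
        s -= lr * (g_left + g_right)
    return s

# the partly conflicting forces settle at the midpoint of start and goal
s = train_subgoal(start=0.0, goal=10.0)
```

With these quadratic leg costs the equilibrium is the midpoint; in the actual architecture the balance point depends on the learned evaluator modules.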

The derivation of the more complex algorithm for the recurrent architecture 2 is analogous to the derivation of conventional discrete-time recurrent net algorithms (e.g. [Robinson and Fallside, 1987], [Williams, 1989], [Williams and Zipser, in press], [Schmidhuber, 1992]).
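The cited discrete-time recurrent algorithms rest on back-propagation through time: unroll the recurrence, then propagate the error signal backward through each unrolled copy. A minimal scalar sketch (assumed for illustration; the hypothetical recurrence `h_t = tanh(w*h_{t-1} + x_t)` stands in for the actual recurrent architecture):

```python
import math

def bptt_grad(w, xs, target):
    """Gradient of 0.5*(h_T - target)^2 with respect to the single
    recurrent weight w, via back-propagation through time."""
    # forward pass: unroll the recurrence h_t = tanh(w*h_{t-1} + x_t)
    hs = [0.0]
    for x in xs:
        hs.append(math.tanh(w * hs[-1] + x))
    # error signal at the final step
    dh = hs[-1] - target
    dw = 0.0
    # backward pass: send the error through each unrolled copy
    for t in range(len(xs), 0, -1):
        da = dh * (1.0 - hs[t] ** 2)   # through the tanh nonlinearity
        dw += da * hs[t - 1]           # weight-gradient contribution at step t
        dh = da * w                    # error flowing to the earlier copy
    return dw

grad = bptt_grad(w=0.5, xs=[1.0, -0.5, 0.3], target=0.2)
```

The gradient contributions from all unrolled copies are summed, just as the error signals from the evaluator copies are summed in the feedforward case.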

Juergen Schmidhuber 2003-03-14
