
EXPERIMENTS

[Schmidhuber, 1991a] gives a simple example where the evaluator module $E$ itself is an adaptive back-prop network. In this section, however, we concentrate on the learning process of the subgoal generator $S$; the $eval$ function and its partial derivatives are computed analytically.

For illustration purposes, we assume that $C$ `knows' all possible action sequences leading to straight movements of the `animat', and that $E$ already knows the costs of all these action sequences. In that case, (1) is easy to compute. The start point of the $k$-th `sub-program' is $s^p(k) = (s_1^p(k), s_2^p(k))$; its end point is $s^p(k+1) = (s_1^p(k+1), s_2^p(k+1))$. (1) becomes equal to the area

\begin{displaymath}
F (s_1^p(k), s_2^p(k), s_1^p(k+1), s_2^p(k+1), \Phi_i)
\end{displaymath} (4)

defined by the trajectory of the `animat' and the corresponding parabola-like projection onto the cone associated with $\Phi_i$. See figure 1 again.
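To make (4) concrete, here is a minimal numerical sketch in Python. The cone-shaped swamp model with parameters center, radius, and peak is a hypothetical stand-in (the paper does not specify the exact swamp parameterization), and the area is approximated by midpoint quadrature of the cone height along the straight segment rather than computed in closed form:

\begin{verbatim}
import numpy as np

def cone_height(p, center, radius, peak):
    # Height of a cone-shaped 'swamp' Phi_i at point p: `peak` at
    # `center`, falling linearly to zero at distance `radius`.
    # (Hypothetical parameterization for illustration only.)
    d = np.linalg.norm(p - center)
    return peak * max(0.0, 1.0 - d / radius)

def F(start, end, center, radius, peak, n=200):
    # Approximate the area between the straight trajectory segment
    # start -> end and its projection onto the cone, by midpoint
    # quadrature of the cone height along the segment.
    seg = end - start
    length = np.linalg.norm(seg)
    ts = (np.arange(n) + 0.5) / n
    heights = [cone_height(start + t * seg, center, radius, peak)
               for t in ts]
    return length * float(np.mean(heights))
\end{verbatim}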

For the $k$-th `sub-program', $eval$ is defined as

\begin{displaymath}
eval(s^p(k), s^p(k+1)) = \sum_i F (s_1^p(k), s_2^p(k), s_1^p(k+1), s_2^p(k+1), \Phi_i) .
\end{displaymath} (5)
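Continuing the sketch above, $eval$ for a single `sub-program' sums the areas over all swamps as in (5). The paper computes the partial derivatives of $eval$ with respect to the four subgoal coordinates analytically; central finite differences serve as a simple stand-in here:

\begin{verbatim}
def eval_cost(sk, sk1, swamps):
    # Equation (5): sum the areas F over all swamps Phi_i, each
    # given as a (center, radius, peak) triple.
    return sum(F(sk, sk1, c, r, p) for (c, r, p) in swamps)

def eval_grad(sk, sk1, swamps, eps=1e-5):
    # Partial derivatives of eval w.r.t.
    # (s1(k), s2(k), s1(k+1), s2(k+1)). Central differences stand
    # in for the analytical derivatives used in the paper.
    x = np.concatenate([sk, sk1])
    g = np.zeros(4)
    for j in range(4):
        e = np.zeros(4)
        e[j] = eps
        xp, xm = x + e, x - e
        g[j] = (eval_cost(xp[:2], xp[2:], swamps)
                - eval_cost(xm[:2], xm[2:], swamps)) / (2.0 * eps)
    return g
\end{verbatim}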

Consider figure 4, where the `animat' has to cross a single swamp. With 40 hidden nodes and a learning rate $\eta_S = 0.03$, a recurrent subgoal generator (architecture 2) needed 20 iterations to find a satisfactory solution.

Now consider figure 5. Multiple swamps separate the start from the goal. With 40 hidden nodes and a learning rate $\eta_S = 0.002$, a static subgoal generator (architecture 1) needed 22 iterations to find a satisfactory solution.
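In both experiments the adaptation amounts to gradient descent of the summed $eval$ along the chain start $\rightarrow$ subgoal $\rightarrow \ldots \rightarrow$ goal. The following simplified sketch adapts the subgoal coordinates directly; in architectures 1 and 2 the same gradients are instead propagated further into the weights of the subgoal generator $S$. The defaults mirror the figure 4 setting ($\eta = 0.03$, 20 iterations); the usage example at the end is hypothetical:

\begin{verbatim}
def adapt_subgoals(start, goal, n_sub, swamps, eta=0.03, iters=20):
    # Initialize the subgoals on the straight line from start to
    # goal, then descend the total eval of all sub-programs.
    subs = [start + (i + 1) / (n_sub + 1) * (goal - start)
            for i in range(n_sub)]
    for _ in range(iters):
        chain = [start] + subs + [goal]
        for i in range(n_sub):
            # Subgoal i affects the sub-program ending at it (last
            # two coordinates) and the one starting at it (first two).
            g_in = eval_grad(chain[i], chain[i + 1], swamps)[2:]
            g_out = eval_grad(chain[i + 1], chain[i + 2], swamps)[:2]
            subs[i] = subs[i] - eta * (g_in + g_out)
    return subs

# Hypothetical usage: one swamp slightly off the straight line from
# start to goal, loosely resembling the figure 4 situation.
start, goal = np.array([0.0, 0.0]), np.array([10.0, 0.0])
swamps = [(np.array([5.0, 0.5]), 2.0, 1.0)]
print(adapt_subgoals(start, goal, n_sub=1, swamps=swamps))
\end{verbatim}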

