
The NBB and Temporal Difference Methods

It seems unlikely that the NBB performs gradient descent on any sensible global error measure. However, Sutton's temporal difference (TD) methods [Sutton, 1988] (a generalization both of gradient-descent methods and of an older principle proposed by Samuel [Samuel, 1959]) may offer a framework for analyzing the NBB's convergence properties.

Following Sutton's discussion of the relation between the bucket brigade for classifier systems and TD methods, at a given time the strength of a connection $w_{ij}$ leading to an active unit $j$ (or the fraction $\lambda c_{ij}$ of its contribution) may be interpreted as a prediction of the weight substance it will receive. This prediction recursively depends on the predictions of weights that will be active at later time ticks. Thus $w_{ij}$ also predicts the ultimate environmental payoff, which terminates the recursion. A dynamic equilibrium of weight flow means that the predictions match reality.
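The recursive-prediction idea can be made concrete with a minimal TD(0) sketch on a simple chain task. This illustrates Sutton's mechanism only, not the NBB itself; the chain environment and all parameter values (number of states, learning rate, discount factor, payoff) are illustrative assumptions. Each state's prediction is updated toward the prediction of its successor, so the terminal payoff propagates backwards through the chain of predictions, analogous to weight substance flowing back through the bucket brigade:

```python
def td0_chain(num_states=5, payoff=1.0, alpha=0.1, gamma=1.0, episodes=200):
    """Learn predictions V[s] of the terminal payoff on a left-to-right chain.

    Hypothetical toy setup: states 0..num_states-1 are visited in order;
    only the final transition yields a reward. Each V[s] is a prediction
    that recursively depends on V[s+1], terminating in the actual payoff.
    """
    V = [0.0] * (num_states + 1)  # V[num_states] is the terminal state, fixed at 0
    for _ in range(episodes):
        for s in range(num_states):
            # Reward arrives only on the final transition (the "environmental payoff").
            r = payoff if s == num_states - 1 else 0.0
            # TD(0) update: move the prediction V[s] toward r + gamma * V[s+1].
            V[s] += alpha * (r + gamma * V[s + 1] - V[s])
    return V[:num_states]
```

At equilibrium every prediction meets reality: with gamma = 1 all values approach the payoff of 1.0, with earlier states lagging slightly behind later ones during learning, mirroring how weight flow stabilizes only once each prediction agrees with its successors.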

Unfortunately, the competitive element introduced by the winner-take-all networks makes an analysis of the NBB anything but straightforward. The same holds for classifier systems: so far, nobody has proven a theorem demonstrating that the bucket brigade mechanism must work as desired.


Juergen Schmidhuber 2003-02-21

