
CONCLUDING REMARKS

I have described a novel fully recurrent network that may choose to behave like a conventional fully recurrent net but, in addition, may use its own weights to store temporal events. It does so by creating and directing `internal spotlights of attention' that cause intra-sequence weight changes, which may help the system achieve its goal (defined by a conventional objective function for supervised sequence learning). The corresponding exact gradient-based learning algorithm turns out to have the same ratio of learning operations per time step to time-varying variables as the time-efficient BPTT algorithm, and the same ratio of maximum storage to time-varying variables as the space-efficient RTRL algorithm.
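
To make the mechanism concrete, here is a minimal sketch of a single forward step in which the net first updates its activations conventionally and then modifies its own fast weights as a function of those activations. The outer-product form of the weight change, the step size eta, and the particular choices of $g$, $h$ and $\sigma$ below are illustrative assumptions, not the exact update rule derived in the preceding sections.

    import numpy as np

    # Sketch (under assumptions stated above): a fully recurrent net whose
    # weights are themselves modified at every time step of a sequence.
    n = 8                                     # number of units
    rng = np.random.default_rng(0)
    W = 0.1 * rng.standard_normal((n, n))     # fast weights, changed within a sequence
    a = np.zeros(n)                           # time-varying unit activations
    eta = 0.01                                # assumed step size of intra-sequence weight changes

    def sigma(x):
        return np.tanh(x)                     # assumed squashing function

    def g(a):
        return np.tanh(a)                     # assumed 'to' part of the attention spotlight

    def h(a):
        return np.tanh(a)                     # assumed 'from' part of the attention spotlight

    def step(a, W, x):
        """One time step: conventional activation update, then a
        self-generated fast-weight change based on the new activations."""
        a = sigma(W @ a + x)                  # conventional recurrent dynamics
        W = W + eta * np.outer(g(a), h(a))    # intra-sequence fast-weight change
        return a, W

    # run on a toy input sequence
    for t in range(5):
        x = 0.1 * rng.standard_normal(n)
        a, W = step(a, W, x)
        print(t, np.round(a[:3], 3))

At every time step both the activations and the fast weights change, so both count as time-varying variables in the complexity ratios stated above.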

Of course, since fast weights and unit activations are different kinds of variables, I am subsuming different things under the expression `time-varying variable'. I expect that some problems will be more naturally solved by information processing based on time-varying unit activations, while others will be more naturally solved by information processing based on fast weights (e.g. certain kinds of temporal variable binding problems, see [5]). Careful experimental investigations of the mutual advantages and disadvantages of both kinds of time-varying variables (as well as of different reasonable choices for functions like $g$, $h$, $\sigma$) are needed, but they are beyond the scope of this short paper and are left for the future.

