I have described a novel fully recurrent network that may choose to
behave like a conventional fully recurrent net. In addition, however,
the novel net may choose to use its own weights for storing temporal
events. The network can do so by
creating and directing `internal
spotlights of attention' to cause
intra-sequence weight changes that may help
the system to achieve its goal (defined by a conventional objective
function for supervised sequence learning).
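The following is a minimal, self-contained Python/NumPy sketch of the general idea of intra-sequence weight changes driven by the net itself; it is not the exact architecture or the exact gradient-based algorithm of this paper, and all names and constants (W_slow, F, beta, eta) are illustrative assumptions. At each step the net's own activations form an outer-product "spotlight" that rewrites a fast weight matrix, which then serves as temporal storage at the next step.

# Minimal sketch (illustrative only, not this paper's exact formulation):
# a recurrent net whose connection strengths serve as short-term storage.
import numpy as np

rng = np.random.default_rng(0)

n_in, n_hidden = 3, 5
W_slow = rng.normal(scale=0.1, size=(n_hidden, n_hidden + n_in))  # slow, gradient-trained weights
F = np.zeros((n_hidden, n_hidden))                                 # fast weights, initially zero
beta = 0.5                                                         # decay of old fast weights
eta = 0.5                                                          # strength of intra-sequence changes

def step(x, h, F):
    """One time step: the fast weights F are both used and rewritten."""
    pre = W_slow @ np.concatenate([h, x]) + F @ h   # fast weights act as extra temporal context
    h_new = np.tanh(pre)
    # Outer-product update: the net's own activations select which
    # connections to strengthen (an "internal spotlight of attention").
    F_new = beta * F + eta * np.outer(h_new, h)
    return h_new, F_new

h = np.zeros(n_hidden)
for t in range(4):
    x = rng.normal(size=n_in)
    h, F = step(x, h, F)
    print(f"t={t}  |h|={np.linalg.norm(h):.3f}  |F|={np.linalg.norm(F):.3f}")

In this sketch the fast weight matrix F, like the activations h, changes at every time step, which is why both kinds of quantities count as time-varying variables below.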
The corresponding exact gradient-based learning algorithm
turns out to have the same
ratio of the number of learning operations per time step
to the number of time-varying variables as the time-efficient BPTT algorithm.
In addition, it turns out to have the same ratio of maximum storage to the
number of time-varying variables as the space-efficient RTRL algorithm.
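To make the two ratios concrete, the following display uses the standard complexity figures for a fully recurrent net with $n$ units; these orders are textbook values for BPTT and RTRL, not figures taken from this paper. BPTT needs $O(n^2)$ operations per time step while its time-varying variables are the $O(n)$ unit activations; RTRL needs $O(n^3)$ storage for the same $O(n)$ activations:

\[
\text{BPTT:}\ \ \frac{\text{operations per time step}}{\text{time-varying variables}}
   = \frac{O(n^2)}{O(n)} = O(n),
\qquad
\text{RTRL:}\ \ \frac{\text{maximum storage}}{\text{time-varying variables}}
   = \frac{O(n^3)}{O(n)} = O(n^2).
\]

The claim in the text is that the fast-weight algorithm matches both ratios even though its time-varying variables include the roughly $n^2$ fast weights themselves.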
Of course, since fast weights and unit activations are different
kinds of variables, I am subsuming different things under the expression
`time-varying variable'. I expect that some problems
may be more naturally solved using information processing based
on time-varying unit activations,
while other problems may be more naturally solved
using information processing based
on fast weights (e.g., certain kinds of temporal variable binding problems,
see [5]).
Careful experimental investigations of
the respective advantages and disadvantages of both kinds of
time-varying variables are needed
(experiments are also needed to analyze different reasonable
choices for the functions governing unit activations and fast weight changes),
but such investigations are beyond the scope of this
short paper and are left for the future.