The following algorithm^{1} for minimizing the objective function is partly inspired by (but more complex than) conventional recurrent network algorithms (e.g., Robinson and Fallside [2]).
Derivation of the algorithm.
We use the chain rule to compute weight increments
(to be performed after each training sequence)
for all initial weights
according to

(5) 
where the constant factor is a positive `learning rate'. Thus we obtain
an exact gradient-based algorithm for minimizing the objective
under the `self-referential' dynamics given by (1)-(4).
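The weight-change rule in (5) is a standard gradient-descent step performed once per training sequence. A minimal sketch in Python, assuming a generic error gradient; the names `eta` and `grad` are illustrative, not the paper's notation:

```python
# Hedged sketch of the weight increment of Eq. (5): after each training
# sequence, each initial weight is decreased by a constant positive
# learning rate `eta` times the exact gradient of the total error with
# respect to that weight (names assumed for illustration).

def update_initial_weights(weights, grad, eta=0.01):
    """Return new initial weights after one training sequence."""
    return [[w - eta * g for w, g in zip(w_row, g_row)]
            for w_row, g_row in zip(weights, grad)]

weights = [[0.5, -0.2], [0.1, 0.3]]
grad = [[1.0, 0.0], [0.0, -1.0]]
new_w = update_initial_weights(weights, grad, eta=0.1)
```

Because the update uses the exact accumulated gradient, it is applied only at sequence boundaries; the gradient itself can still be accumulated online, as shown below in the text.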
To reduce writing effort, I introduce some shorthand notation
partly inspired by Williams
[7].
For all units and all weights, we write

(6) 
To begin with, note that

(7) 
Therefore, the remaining problem is to compute the derivatives themselves,
which can be done by incrementally computing certain auxiliary quantities,
as we will see.
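Equations (8)-(12) below are specific to the self-referential architecture, but the flavor of such incremental derivative bookkeeping can be illustrated with the conventional RTRL-style recursion from the cited recurrent-network literature [2, 7]. The following sketch assumes a fully recurrent `tanh` network; it is not the paper's own update, only the simpler recursion it generalizes:

```python
import math

def rtrl_step(w, y, p):
    """One step of a conventional RTRL-style recursion (a sketch, not
    the paper's Eqs. (8)-(12)).  p[k][i][j] carries the derivative of
    unit k's activation with respect to weight w[i][j] forward in time:
    p^k_ij(t+1) = f'(net_k) * (sum_l w_kl * p^l_ij(t) + delta_ki * y_j(t))."""
    n = len(y)
    net = [sum(w[k][l] * y[l] for l in range(n)) for k in range(n)]
    y_new = [math.tanh(net[k]) for k in range(n)]
    fprime = [1.0 - y_new[k] ** 2 for k in range(n)]  # derivative of tanh
    p_new = [[[fprime[k] * (sum(w[k][l] * p[l][i][j] for l in range(n))
                            + (y[j] if k == i else 0.0))
               for j in range(n)] for i in range(n)] for k in range(n)]
    return y_new, p_new

# Usage: start with zero sensitivities and iterate alongside the net.
w = [[0.0, 1.0], [1.0, 0.0]]
y0 = [1.0, 0.5]
p0 = [[[0.0] * 2 for _ in range(2)] for _ in range(2)]
y1, p1 = rtrl_step(w, y0, p0)
```

The key point, shared with the algorithm in the text, is that the sensitivities are carried forward step by step, so no activation history needs to be stored.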
We have

(8) 

(9) 
(where the term in question is the appropriate bit of the corresponding
unit's address),
(10) 
where

(11) 

(12) 
According to (8)-(12),
the auxiliary quantities
can be updated incrementally at each time step.
This implies that
the gradient in (5) can be accumulated incrementally at each time step, too.
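The per-step accumulation of the gradient in (5) can be sketched as follows, assuming unit errors `e[k]` at the current step and incrementally maintained sensitivities `p[k][i][j]` (assumed names, not the paper's notation):

```python
# Hedged sketch: at each time step, add the contribution
# sum_k e_k * p^k_ij to the running gradient for every weight (i, j),
# so that the full gradient of Eq. (5) is available once the sequence
# ends, without revisiting earlier time steps.

def accumulate_gradient(grad, e, p):
    """Add this time step's contribution to the running gradient."""
    n = len(e)
    for i in range(n):
        for j in range(n):
            grad[i][j] += sum(e[k] * p[k][i][j] for k in range(n))
    return grad

grad = [[0.0, 0.0], [0.0, 0.0]]
e = [1.0, 2.0]
p = [[[0.5, 0.0], [0.0, 0.0]],
     [[0.25, 0.0], [0.0, 0.0]]]
grad = accumulate_gradient(grad, e, p)
```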
The storage complexity is independent
of the sequence length. The
computational complexity per time step (of sequences with arbitrary
length) is likewise independent of the sequence length.
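The storage claim can be illustrated by a rough comparison: a forward-mode (RTRL-style) learner keeps a fixed-size sensitivity tensor, whereas backpropagation through time must store one activation vector per time step. The counts below refer to a generic recursion, not the paper's exact bounds:

```python
# Hedged illustration: forward-mode storage is fixed (activations plus
# a sensitivity tensor over units x weights), while BPTT storage grows
# linearly with the sequence length T.

def forward_mode_storage(n_units):
    # activation vector + sensitivity tensor p[k][i][j]
    return n_units + n_units ** 3

def bptt_storage(n_units, seq_len):
    # one stored activation vector per time step
    return n_units * seq_len

fixed = forward_mode_storage(4)          # independent of T
short, long_ = bptt_storage(4, 10), bptt_storage(4, 1000)
```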
Juergen Schmidhuber
2003-02-21