2. THE `INTROSPECTIVE' NETWORK
I assume that the input sequence observed by the network has length $n \cdot m$ (where $n, m$ are positive integers) and can be divided into $n$ equal-sized blocks of length $m$ during which the input pattern does not change. This does not imply a loss of generality; it just means speeding up the network's hardware such that each input pattern is presented for $m$ time steps before the next pattern can be observed. This gives the architecture $m$ time steps to do some sequential processing (including immediate weight changes) before seeing a new pattern of the input sequence.
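A minimal sketch of this presentation scheme, assuming NumPy arrays; the helper name present_in_blocks and the variable names are illustrative, not from the paper:

import numpy as np

def present_in_blocks(patterns, m):
    # Hold each of the n input patterns for m consecutive time steps,
    # so the network observes a sequence of n * m time steps in total.
    # patterns: array of shape (n, input_dim).
    return np.repeat(patterns, m, axis=0)  # shape (n * m, input_dim)

# Example: n = 3 patterns, each held for m = 4 time steps -> 12 time steps.
patterns = np.array([[0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
sequence = present_in_blocks(patterns, m=4)
assert sequence.shape == (12, 2)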
In what follows, unquantified variables are assumed to take on their maximal range. The network dynamics are specified as follows:

x_k(t+1) = f_k\Big( \sum_l w_{kl}(t) \, x_l(t) \Big),     (1)

where $x_k(t)$ denotes the activation of unit $k$ at time $t$, $f_k$ is a differentiable activation function, and $w_{kl}(t)$ is the current weight on the connection from unit $l$ to unit $k$.
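One such update step might look as follows in Python, assuming a logistic activation function for concreteness (the text only requires differentiability); all names are illustrative:

import numpy as np

def step(x, W):
    # One update of dynamics (1): x_k(t+1) = f_k(sum_l w_kl(t) * x_l(t)),
    # here with f = logistic sigmoid applied componentwise.
    return 1.0 / (1.0 + np.exp(-(W @ x)))

Between such steps, W itself may change through the weight-modification mechanism described below.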
The network can quickly read information about its current weights into the special input unit $val$ according to

val(t+1) = \sum_{(i,j)} g\big( \| ana(t) - adr(w_{ij}) \| \big) \, w_{ij}(t),     (2)

where $\| \cdot \|$ denotes Euclidean length, $adr(w_{ij})$ is a constant address vector associated with the connection from unit $j$ to unit $i$, $ana(t)$ is the activation vector of the analyzing units at time $t$, and $g$ is a differentiable function emitting values between 0 and 1 that determines how close a connection address has to be to the activations of the analyzing units in order for its weight to contribute to $val$ at that time. Such a function might have a narrow peak at 1 around the origin and be zero (or nearly zero) everywhere else. This essentially allows the network to pick out a single connection at a time and obtain its current weight value without receiving `cross-talk' from other weights.
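The following sketch illustrates this read-out under the stated assumptions about $g$; the Gaussian bump is one possible choice satisfying them, not the paper's prescription, and all names (g, read_weight, adr, ana) are illustrative:

import numpy as np

def g(distance, width=0.01):
    # A differentiable function in [0, 1] with a narrow peak at 1 around
    # the origin and nearly zero elsewhere. The text only states these
    # properties; this concrete formula is an assumption.
    return np.exp(-(distance / width) ** 2)

def read_weight(W, adr, ana):
    # Analogue of (2): val is a g-weighted sum over all connections, each
    # weight contributing according to how close its address adr[i, j] is
    # to the analyzing activations ana. With a narrow g this effectively
    # selects a single weight, avoiding cross-talk from the others.
    val = 0.0
    for i in range(W.shape[0]):
        for j in range(W.shape[1]):
            val += g(np.linalg.norm(ana - adr[i, j])) * W[i, j]
    return val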
The network can quickly modify its current weights using the activation vector $mod(t)$ of the modifying units and the activation $\Delta(t)$ of a special output unit according to

w_{ij}(t+1) = w_{ij}(t) + \Delta(t) \, g\big( \| mod(t) - adr(w_{ij}) \| \big).     (3)

Again, if $g$ has a narrow peak at 1 around the origin and is zero (or nearly zero) everywhere else, the network will be able to pick out a single connection at a time and change its weight without affecting other weights.
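Continuing the sketch above (reusing g and adr from it), a corresponding weight-modification step in the spirit of (3) might be:

def modify_weights(W, adr, mod, delta):
    # Analogue of (3): every weight is incremented by delta (the activation
    # of the special output unit), scaled by g of the distance between the
    # modifying activations mod and the connection's address. A narrow g
    # changes essentially one weight and leaves the rest untouched.
    for i in range(W.shape[0]):
        for j in range(W.shape[1]):
            W[i, j] += delta * g(np.linalg.norm(mod - adr[i, j]))
    return W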
Objective function and dynamics of the eval units.
As with typical supervised sequence-learning tasks, we want to minimize $\sum_t E(t)$, where

E(t) = \frac{1}{2} \sum_k \big( d_k(t) - x_k(t) \big)^2.     (4)

Here $d_k(t)$ may be a desired target value for the $k$th output unit at time step $t$; the sum runs over those output units for which a target is given at time $t$.
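A short sketch of this error computation, assuming time steps without a target are represented as None; names are illustrative:

import numpy as np

def total_error(targets, outputs):
    # Sum of E(t) from (4) over all time steps: 0.5 * sum_k (d_k - x_k)^2
    # is added only at steps where a target vector exists.
    E = 0.0
    for d, x in zip(targets, outputs):
        if d is not None:
            E += 0.5 * float(np.sum((np.asarray(d) - np.asarray(x)) ** 2))
    return E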