
Throughout the remainder of this paper, to save indices, I consider a single limited pre-specified time-interval of discrete time-steps during which our network interacts with its environment. An interaction sequence may actually be the concatenation of many `conventional' training sequences for conventional recurrent networks. This will (in theory) help our `self-referential' weight matrix to find regularities among solutions for different tasks. The network's output vector at time $t$, $o(t)$, is computed from previous input vectors $x(\tau), \tau < t$, by a discrete time recurrent network with $n_I$ input units and $n_y$ non-input units. A subset of the non-input units, the `normal' output units, has a cardinality of $n_o < n_y$.

$z_k$ is the $k$-th unit in the network. $y_k$ is the $k$-th non-input unit in the network. $x_k$ is the $k$-th `normal' input unit in the network. $o_k$ is the $k$-th `normal' output unit. If $u$ stands for a unit, then $f_u$ is its differentiable activation function and $u$'s activation at time $t$ is denoted by $u(t)$. If $v(t)$ stands for a vector, then $v_k(t)$ is the $k$-th component of $v(t)$.

Each input unit has a directed connection to each non-input unit. Each non-input unit has a directed connection to each non-input unit. There are $(n_I + n_y) n_y = n_{conn}$ connections in the network. The connection from unit $j$ to unit $i$ is denoted by $w_{ij}$. For instance, one of the names of the connection from the $j$-th `normal' input unit to the $k$-th `normal' output unit is $w_{o_kx_j}$. $w_{ij}$'s real-valued weight at time $t$ is denoted by $w_{ij}(t)$. Before training, all weights $w_{ij}(1)$ are randomly initialized.
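The connectivity and initialization described above can be sketched as follows. The unit counts $n_I$ and $n_y$, the weight range, and the random seed are illustrative assumptions, not values from the paper.

```python
import numpy as np

# Illustrative sizes (not from the paper): n_I input units, n_y non-input units.
n_I, n_y = 4, 8

# Every unit (input or non-input) connects to every non-input unit.
n_conn = (n_I + n_y) * n_y

rng = np.random.default_rng(0)
# w[i, j] holds the weight of the connection from unit j to non-input unit i;
# all weights w_ij(1) are randomly initialized before training.
w = rng.uniform(-0.1, 0.1, size=(n_y, n_I + n_y))

assert w.size == n_conn
```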

The following features are needed to obtain `self-reference'. Details of the network dynamics follow in the next section.

1. The network receives performance information through the eval units, which are special input units. $eval_k$ is the $k$-th eval unit (of $n_{eval}$ such units) in the network.

2. Each connection of the net gets an address. One way of doing this is to introduce a binary address, $adr(w_{ij})$, for each connection $w_{ij}$. This will help the network to do computations concerning its own weights in terms of activations, as will be seen later.

3. $ana_k$ is the $k$-th analyzing unit (of $n_{ana} = \lceil \log_2 n_{conn} \rceil$ such units, where $\lceil x \rceil$ denotes the smallest integer $\geq x$). The analyzing units are special non-input units. They serve to indicate which connections the current algorithm of the network (defined by the current weight matrix plus the current activations) will access next (see next section). A special input unit for reading current weight values that is used in conjunction with the analyzing units is called $val$.

4. The network may modify any of its weights. The modifying units are certain non-input units that are neither `normal' output units nor analyzing units. $mod_k$ is the $k$-th modifying unit (of $n_{mod} = \lceil \log_2 n_{conn} \rceil$ such units). The modifying units serve to address connections to be modified. A special output unit for modifying weights (used in conjunction with the modifying units, see next section) is called $\bigtriangleup$. $f_{\bigtriangleup}$ should allow both positive and negative activations $\bigtriangleup(t)$.
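The addressing machinery of items 2-4 can be sketched as below. This is one possible realization under stated assumptions: the bit-ordering of $adr(w_{ij})$, the 0.5 threshold for reading an address off unit activations, and the helper names (`adr`, `decode`, `read_weight`, `modify_weight`) are all illustrative; the paper's exact network dynamics follow in the next section.

```python
import math
import numpy as np

# Illustrative sizes (not from the paper).
n_I, n_y = 4, 8
n_conn = (n_I + n_y) * n_y
n_ana = n_mod = math.ceil(math.log2(n_conn))  # address length in bits

rng = np.random.default_rng(0)
w = rng.uniform(-0.1, 0.1, size=(n_y, n_I + n_y))

def adr(i, j):
    """Binary address of connection w_ij, as a tuple of n_ana bits."""
    index = i * (n_I + n_y) + j
    return tuple((index >> b) & 1 for b in reversed(range(n_ana)))

def decode(bits):
    """Map a bit pattern back to the (i, j) pair it addresses."""
    index = int("".join(str(b) for b in bits), 2)
    return divmod(index, n_I + n_y)

def read_weight(ana_activations):
    """Analyzing-unit activations address a connection whose current
    weight value would be fed back to the network via the val unit."""
    bits = [1 if a > 0.5 else 0 for a in ana_activations]  # assumed threshold
    i, j = decode(bits)
    return w[i, j]

def modify_weight(mod_activations, delta):
    """Modifying-unit activations address a connection; the (possibly
    negative) activation Delta(t) is added to its weight."""
    bits = [1 if a > 0.5 else 0 for a in mod_activations]
    i, j = decode(bits)
    w[i, j] += delta

# Round trip: addressing w_{2,5}, reading it, then modifying it.
bits = adr(2, 5)
before = read_weight(bits)
modify_weight(bits, 0.05)
assert abs(read_weight(bits) - (before + 0.05)) < 1e-12
```

With the illustrative sizes above, $n_{conn} = 96$, so $\lceil \log_2 96 \rceil = 7$ analyzing (and modifying) units suffice to address every connection.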

Juergen Schmidhuber 2003-02-21
