PREDICTABILITY MINIMIZATION

Schmidhuber (1992) shows how $D_l$ can be defined with the help of intra-representational adaptive predictors that try to predict each output unit of some $T_l$ from its remaining output units, while each output unit in turn tries to extract properties of the environment that allow it to escape predictability. This was called the principle of predictability minimization. The principle encourages each output unit of $T_l$ to represent environmental properties that are statistically independent of the properties represented by the remaining output units. The procedure aims at generating binary `factorial codes' [Barlow et al., 1989]. It is our preferred method because, unlike the methods used by Linsker (1988), Becker and Hinton (1989), and Zemel and Hinton (1991), it has the potential to remove even non-linear statistical dependencies among the output units of some classifier.

Let us define

\begin{displaymath}
\bar{D_l} = -\frac{1}{2} \sum_i (s_i^{p,l} - y_i^{p,l})^2, \qquad (8)
\end{displaymath}

where the $s_i^{p,l}$ are the outputs of $S^i_l$, the $i$-th additional so-called intra-representational predictor network of $T_l$ (one such predictor network is required for each output unit of $T_l$). The goal of $S^i_l$ is to emit the conditional expectation of $y_i^{p,l}$ given $\{ y_k^{p,l},~~ k \neq i \}$. This is achieved simply by training $S^i_l$ to predict $y_i^{p,l}$ from $\{ y_k^{p,l},~~ k \neq i \}$. See figure 1.
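
To make the predictor training concrete, here is a minimal sketch in PyTorch; the framework choice, the network shapes, and the names T, S, and prediction_error are illustrative assumptions, not part of the original formulation. Training each $S^i_l$ by gradient descent on the squared error of equation (8) drives it toward the conditional expectation of $y_i^{p,l}$ given the remaining code units.

\begin{verbatim}
# Hedged sketch of the predictor training behind equation (8);
# PyTorch, the sizes, and the names T and S are assumptions.
import torch
import torch.nn as nn

dim_in, n_units = 16, 4                        # illustrative sizes
# representation module T_l with sigmoid code units y_i in (0,1)
T = nn.Sequential(nn.Linear(dim_in, n_units), nn.Sigmoid())
# one predictor S^i_l per code unit; it sees the n_units-1 other units
S = nn.ModuleList(
    nn.Sequential(nn.Linear(n_units - 1, 8), nn.Tanh(), nn.Linear(8, 1))
    for _ in range(n_units))

def prediction_error(y, detach_code):
    """Sum over units i of the batch-mean squared error (s_i - y_i)^2.
    With detach_code=True the code is treated as constant, so
    minimizing this moves each S[i] toward the conditional
    expectation E[y_i | y_k, k != i]."""
    total = 0.0
    for i in range(len(S)):
        rest = torch.cat([y[:, :i], y[:, i + 1:]], dim=1)
        if detach_code:
            rest = rest.detach()
        s_i = S[i](rest).squeeze(1)             # prediction s_i^{p,l}
        target = y[:, i].detach() if detach_code else y[:, i]
        total = total + ((s_i - target) ** 2).mean()
    return total
\end{verbatim}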

To encourage even distributions in output space, we slightly modify $\bar{D_l}$ by introducing a term similar to the one in equation (5) of subsection 2.1, obtaining

\begin{displaymath}
D_l = - \frac{1}{2} \sum_i (s_i^{p,l} - y_i^{p,l})^2 + \frac{\lambda}{2} \sum_i (0.5 - \bar{y_i}^l)^2. \qquad (9)
\end{displaymath}
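
Continuing the sketch above (same caveats: the optimizers, learning rates, and lam, standing in for the $\lambda$ of equation (9), are assumptions for illustration), one alternating training step pairs gradient descent on the prediction error for the predictors with gradient descent on $D_l$ for $T_l$. Given the signs in (8) and (9), descending on $D_l$ raises the prediction error, so the code units try to escape predictability, while the second term pulls each mean activation toward 0.5.

\begin{verbatim}
# Continuation of the sketch above; a hedged illustration, not the
# paper's exact training procedure.
opt_T = torch.optim.SGD(T.parameters(), lr=0.1)
opt_S = torch.optim.SGD(S.parameters(), lr=0.1)
lam = 1.0                          # the lambda of equation (9)

def train_step(x):
    # (a) predictors: minimize squared error, code held constant
    opt_S.zero_grad()
    (0.5 * prediction_error(T(x), detach_code=True)).backward()
    opt_S.step()

    # (b) code units: gradient descent on D_l of equation (9);
    # this raises the prediction error (escaping predictability)
    # and pulls each mean activation toward 0.5 (even distribution)
    opt_T.zero_grad()
    y = T(x)
    D_l = (-0.5 * prediction_error(y, detach_code=False)
           + 0.5 * lam * ((0.5 - y.mean(dim=0)) ** 2).sum())
    D_l.backward()                 # predictor weights get grads too,
    opt_T.step()                   # but only T's parameters are updated

x = torch.randn(32, dim_in)        # a batch of illustrative patterns
train_step(x)
\end{verbatim}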

