For each representational unit $y_i$ there corresponds an adaptive predictor $P_i$, which, in general, is non-linear. With the $p$-th input pattern $x^p$, $P_i$'s input is the concatenation of the outputs $y_k^p$ of all units $y_k$, $k \neq i$. $P_i$'s one-dimensional output $P_i^p$ is trained to equal the conditional expectation $E(y_i^p \mid \{ y_k^p : k \neq i \})$. It is well known that this can be achieved by letting $P_i$ minimize

$$\sum_p \left( P_i^p - y_i^p \right)^2 . \qquad (1)$$
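As a concrete illustration, here is a minimal sketch (not from the original text) of training such predictors by gradient descent on equation (1). The names `Predictor`, `predictor_loss`, and `pred_opt`, the layer sizes, and the batch of unit outputs are all illustrative assumptions; the principle only requires that each $P_i$ be an adaptive, generally non-linear predictor.

```python
import torch
import torch.nn as nn

n_units = 8  # illustrative number of representational units

class Predictor(nn.Module):
    """P_i: maps the outputs of all units k != i to a one-dimensional
    prediction of unit i's output."""
    def __init__(self, n_units):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_units - 1, 32), nn.Tanh(), nn.Linear(32, 1))

    def forward(self, other_units):
        return self.net(other_units)

predictors = [Predictor(n_units) for _ in range(n_units)]
pred_opt = torch.optim.SGD(
    [p for P in predictors for p in P.parameters()], lr=0.1)

def predictor_loss(y):
    """Equation (1): sum over patterns p and units i of (P_i^p - y_i^p)^2.
    y has shape (batch, n_units) and holds the unit outputs y_k^p."""
    loss = torch.zeros(())
    for i, P in enumerate(predictors):
        others = torch.cat([y[:, :i], y[:, i + 1:]], dim=1)
        loss = loss + ((P(others).squeeze(1) - y[:, i]) ** 2).sum()
    return loss

# One predictor update on a batch of unit outputs (detached, so that
# only the predictors -- not the units -- adapt in this step):
y = torch.rand(64, n_units)   # stand-in for real unit outputs y^p
pred_opt.zero_grad()
predictor_loss(y.detach()).backward()
pred_opt.step()
```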
With the help of the predictors one can define various
objective functions for the representational modules to
enforce the three criteria listed above (see sections 4 and 5).
Common to these methods is that
all units are trained to take on values that
*minimize mutual predictability* via the predictors:
each unit tries to extract features from the environment
such that no combination of the other units conveys information
about it. In other words,
no combination of the other units should allow a better
prediction of a given unit than a prediction based
on a constant.
I call this the *principle
of intra-representational predictability minimization* or, somewhat
shorter,
the *principle of predictability minimization*.
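The objective functions the units themselves optimize are the subject of sections 4 and 5; a minimal sketch of one such adversarial scheme, in which the units simply maximize the very error of equation (1) that the predictors minimize, is given below. It continues the sketch above (reusing `predictors`, `predictor_loss`, `pred_opt`, and `n_units`), with a hypothetical encoder network standing in for the representational modules; its architecture and input size are assumptions for illustration only.

```python
# Hypothetical encoder producing the representational units' outputs.
n_inputs = 16
encoder = nn.Sequential(nn.Linear(n_inputs, 32), nn.Tanh(),
                        nn.Linear(32, n_units), nn.Sigmoid())
enc_opt = torch.optim.SGD(encoder.parameters(), lr=0.1)

x = torch.rand(64, n_inputs)  # stand-in for environmental input patterns

# 1) Predictors minimize equation (1) on detached unit outputs.
pred_opt.zero_grad()
predictor_loss(encoder(x).detach()).backward()
pred_opt.step()

# 2) Units maximize the same error, i.e. they minimize their own
#    predictability from the other units.
enc_opt.zero_grad()
(-predictor_loss(encoder(x))).backward()
enc_opt.step()
```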

A major novel aspect of this principle, which distinguishes it from previous work, is that it uses adaptive sub-modules (the predictors) to define the objective functions for the subjects of interest, namely the representational units themselves.

Following the principle of predictability minimization, each representational module tries to use the statistical properties of the environment to protect itself from being predictable. This forces each module to focus on aspects of the environment that are independent of the environmental properties on which the other modules focus.
