To each representational unit $y_i$ there corresponds an adaptive predictor $P_i$, which, in general, is non-linear. With the $p$-th input pattern, $P_i$'s input is the concatenation of the outputs $y_k^p$ of all units $y_k$, $k \neq i$. $P_i$'s one-dimensional output $P_i^p$ is trained to equal the expectation $E\left(y_i \mid \{y_k^p,\, k \neq i\}\right)$. It is well known that this can be achieved by letting $P_i$ minimize

$$\sum_p \left(P_i^p - y_i^p\right)^2 . \tag{1}$$
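As a concrete illustration (not part of the original text), the following minimal NumPy sketch trains a single predictor by gradient descent on (1). A linear predictor stands in for $P_i$ for brevity, although the principle allows general non-linear predictors; the data, shapes, and learning rate are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: n representational units observed over a set of patterns.
n, num_patterns = 4, 200
Y = rng.random((num_patterns, n))      # y_k^p: output of unit k on pattern p

i = 0                                  # the unit that predictor P_i must predict
X = np.delete(Y, i, axis=1)            # P_i's input: all outputs y_k^p with k != i
t = Y[:, i]                            # P_i's target: y_i^p

# A linear predictor stands in for P_i; the paper allows general non-linear predictors.
w, b, lr = np.zeros(n - 1), 0.0, 0.1

for _ in range(2000):
    pred = X @ w + b                   # P_i^p
    err = pred - t                     # drives gradient descent on (1)
    w -= lr * (X.T @ err) / num_patterns
    b -= lr * err.mean()

# Within its model class, P_i now approximates E(y_i | {y_k^p, k != i}).
print("final squared error:", float(np.mean((X @ w + b - t) ** 2)))
```

Since the squared error is minimized, in expectation, by the conditional mean, a sufficiently powerful predictor trained this way approaches $E\left(y_i \mid \{y_k^p,\, k \neq i\}\right)$; the linear sketch above only reaches the best linear approximation of it.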
With the help of the predictors, one can define various objective functions for the representational modules to enforce the three criteria listed above (see sections 4 and 5).
Common to these methods is that all units are trained to take on values that minimize mutual predictability via the predictors: each unit tries to extract features from the environment such that no combination of $n-1$ units conveys information about the remaining unit. In other words, no combination of $n-1$ units should allow better predictions of the remaining unit than a prediction based on a constant.
I call this the principle of intra-representational predictability minimization, or, somewhat shorter, the principle of predictability minimization.
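To make the constant baseline above precise (a short derivation added here, with $N$ denoting the number of training patterns): under the squared error of (1), the best constant prediction of $y_i$ is its unconditional mean, since

$$\frac{d}{dc}\sum_p \left(c - y_i^p\right)^2 = 2\sum_p \left(c - y_i^p\right) = 0 \;\Longrightarrow\; c = \frac{1}{N}\sum_p y_i^p = \bar{y}_i .$$

Predictability is thus minimized exactly when conditioning on the other units does not move the prediction of $y_i$ away from $\bar{y}_i$, i.e., when knowing the other units does not help a squared-error predictor of $y_i$.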
A major novel aspect of this principle, one that distinguishes it from previous work, is that it uses adaptive sub-modules (the predictors) to define the objective functions for the subjects of interest, namely the representational units themselves.
Following the principle of predictability minimization, each representational module tries to use the statistical properties of the environment to protect itself from being predictable. This forces each representational module to focus on aspects of the environment that are independent of environmental properties upon which the other modules focus.
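To show the two opposing objectives schematically, here is a hedged NumPy sketch in which the predictors perform gradient descent on the summed error of (1) while a simple representational module performs gradient ascent on the same quantity. The linear-sigmoid encoder, the linear predictors, the learning rates, and the data are all illustrative assumptions; the paper's actual objective functions are those of sections 4 and 5.

```python
import numpy as np

rng = np.random.default_rng(1)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical environment: input patterns of dimension d, coded by n units.
d, n, num_patterns = 8, 3, 300
X = rng.random((num_patterns, d))

W = rng.normal(0.0, 0.1, (d, n))       # representational module (linear-sigmoid, for illustration)
V = np.zeros((n, n))                   # predictor weights: row i predicts y_i from the others
mask = 1.0 - np.eye(n)                 # keeps y_i out of P_i's own input
lr_pred, lr_repr = 0.05, 0.01

for _ in range(2000):
    Y = sigmoid(X @ W)                 # unit outputs y^p
    P = Y @ (V * mask).T               # P_i^p (linear predictors, for illustration)
    E = P - Y                          # prediction errors

    # Predictors descend the summed squared error of (1) ...
    V -= lr_pred * ((E.T @ Y) / num_patterns) * mask

    # ... while the units ascend it: y_k appears both as P_k's target and
    # as an input to every other predictor, hence the two terms below.
    dL_dY = (E @ (V * mask)) - E       # gradient of the error w.r.t. Y (up to a factor of 2)
    W += lr_repr * (X.T @ (dL_dY * Y * (1.0 - Y))) / num_patterns
```

Note the two terms in the units' gradient: each $y_k$ appears both as $P_k$'s target and as an input to every other predictor. In practice the opposing updates must be balanced against each other; this sketch only shows the direction of each gradient.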