During training, the images were
randomly presented according to their
probabilities in the English language.
The unsupervised system had
150 input units, 16 code units, and 1 ``bias'' unit.
Each predictor had
15 input units, 1 ``bias'' unit, and 1 output unit.
The learning rate of the predictors was 10 times as high
as the learning rate of the code units.
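To make the setup concrete, the following is a minimal sketch of one predictability-minimization step, under several assumptions the text does not fix: logistic units, squared prediction error, online updates, each code unit maximizing only the error of its own predictor, and an arbitrary base learning rate (the text specifies only the 10:1 ratio). The names \texttt{train\_step}, \texttt{letter\_images}, and \texttt{letter\_freqs} are hypothetical, and the driver data are random placeholders.

\begin{verbatim}
import numpy as np

rng = np.random.default_rng(0)

N_IN, N_CODE = 150, 16          # input units and code units, as in the text
LR_CODE = 0.01                  # hypothetical base rate; only the ratio is given
LR_PRED = 10.0 * LR_CODE        # predictors learn 10 times as fast

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Encoder weights: one row per code unit; the extra column is the "bias" unit.
W_enc = rng.normal(0.0, 0.1, (N_CODE, N_IN + 1))

# One predictor per code unit: 15 inputs (the other code units) plus a bias.
W_pred = rng.normal(0.0, 0.1, (N_CODE, N_CODE))

def train_step(x):
    """One predictability-minimization update on a single 150-pixel pattern."""
    x1 = np.append(x, 1.0)                        # input plus bias
    y = sigmoid(W_enc @ x1)                       # 16 code unit activations
    for i in range(N_CODE):
        others = np.append(np.delete(y, i), 1.0)  # other 15 code units plus bias
        p = sigmoid(W_pred[i] @ others)           # predictor i's guess of y[i]
        err = p - y[i]                            # E = err**2 / 2
        # Predictor i descends on E ...
        W_pred[i] -= LR_PRED * err * p * (1.0 - p) * others
        # ... while code unit i ascends on E (p treated as a constant here),
        # i.e. it tries to escape its predictor.
        W_enc[i] += LR_CODE * (-err) * y[i] * (1.0 - y[i]) * x1

# Driver: present patterns according to fixed probabilities. These images
# and frequencies are random placeholders, not the paper's letter data.
letter_images = rng.integers(0, 2, (26, N_IN)).astype(float)
letter_freqs = np.full(26, 1.0 / 26)
for step in range(10000):
    k = rng.choice(26, p=letter_freqs)
    train_step(letter_images[k])
\end{verbatim}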
Within 10000 pattern presentations,
the system often learned to generate a
loss-free code of the ensemble that was
much less redundant than the original data.
The redundancy (see the definition in section 1.2)
of the original DEC dataset is .
The redundancy of the 16-bit code discovered
by the system is .
See [14], [13], and [24]
for details.
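For reference, redundancy in this context is commonly measured as the difference between the sum of the individual code-unit entropies and their joint entropy; this standard information-theoretic quantity may or may not coincide with the exact definition of section 1.2:
\[
R \;=\; \sum_{i=1}^{16} H(y_i) \;-\; H(y_1, \ldots, y_{16}),
\]
which vanishes exactly when the code is factorial.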
This result corresponds to a dramatic reduction of redundant information, although the achieved value is not optimal. In many realistic cases, however, approximations of nonredundant codes should be satisfactory. We intend to apply the method to the problem of unsupervised segmentation of real-world images. See [30] for an application to simple stereo vision.
One might speculate about whether the brain uses a similar principle based on ``code neurons'' trying to escape the predictions of ``predictor neurons''.