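The input ensemble considered in this subsection consists
of four different patterns denoted by [...], [...], [...], and [...],
respectively. The probabilities of the patterns were
1/9, 2/9, 2/9, and 4/9, respectively, matching the presentation
frequencies within one epoch of Experiment 1.
Code [...] assigns the four patterns the representations
[...], [...], [...], and [...].
With code [...], the total objective function
becomes [...].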
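A non-factorial but invertible (information-preserving) code
is given by code [...], which assigns the four patterns the representations
[...], [...], [...], and [...].
With code [...], the total objective function becomes [...], which is only
[...] below [...]. This already indicates that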
certain local maxima of the internal state's objective function
may be very close to the global maxima.
Experiment 1:
off-line, [...], [...],
distributed input representation with [...], [...], [...], [...],
1 hidden unit per predictor,
2 hidden units shared among the representational modules.
10 test runs with 2,000 epochs for the
representational modules were conducted.
Here one epoch consisted of the presentation of 9 patterns:
the four patterns were presented once, twice, twice, and four times, respectively.
In 7 of the 10 runs, the system found a global maximum corresponding to a factorial code. In the remaining 3 runs, the code was not invertible.
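The distinction between factorial, merely invertible, and non-invertible codes can be checked numerically. The following is a minimal sketch, assuming binary code units and the pattern probabilities 1/9, 2/9, 2/9, 4/9 implied by the epoch composition above. The helper names `is_invertible` and `is_factorial` and the concrete code assignments are hypothetical illustrations, not the codes reported for the experiments.

```python
from fractions import Fraction
from itertools import product

# Pattern probabilities implied by the epoch composition (1 + 2 + 2 + 4 = 9).
PROBS = [Fraction(1, 9), Fraction(2, 9), Fraction(2, 9), Fraction(4, 9)]


def is_invertible(code):
    """True if no two patterns share the same code vector."""
    return len(set(code)) == len(code)


def is_factorial(code, probs):
    """True if the joint distribution over the binary code units equals
    the product of the per-unit marginals for every possible code vector."""
    n_units = len(code[0])
    # P(unit i = 1): total probability of the patterns that switch unit i on.
    p_on = [sum(p for y, p in zip(code, probs) if y[i] == 1)
            for i in range(n_units)]
    for vec in product((0, 1), repeat=n_units):
        joint = sum(p for y, p in zip(code, probs) if y == vec)
        indep = Fraction(1)
        for i, bit in enumerate(vec):
            indep *= p_on[i] if bit else 1 - p_on[i]
        if joint != indep:
            return False
    return True


# Hypothetical code assignments (one tuple per pattern), for illustration only.
factorial_code = [(1, 1), (0, 1), (1, 0), (0, 0)]
invertible_only = [(0, 0), (1, 1), (1, 0), (0, 1)]
not_invertible = [(0, 0), (0, 0), (1, 1), (1, 1)]
# Three units, the third one constant (unused).
with_unused_unit = [(1, 1, 0), (0, 1, 0), (1, 0, 0), (0, 0, 0)]

for name, code in [("factorial_code", factorial_code),
                   ("invertible_only", invertible_only),
                   ("not_invertible", not_invertible),
                   ("with_unused_unit", with_unused_unit)]:
    print(name, is_invertible(code), is_factorial(code, PROBS))
```

Exact rational arithmetic (`Fraction`) is used so that the independence test is not affected by floating-point rounding. The last example shows that a constant, unused unit leaves a code factorial, since a unit that never changes is trivially independent of all others.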
Experiment 2 (Occam's Razor):
Like Experiment 1, but with [...].
In all but one of the 10 test runs the system
developed a factorial code (including
one unused unit).
In the remaining test run the code was at least invertible.
With local input representation and [...], [...],
the success rate dropped below 50 percent.
With [...], the system usually found invertible but
rarely factorial codes.
This reflects the fact that with certain input ensembles
there is a trade-off between redundancy and invertibility:
Superfluous degrees of freedom among the representational units
may increase the probability that an information-preserving
code is found, while at the same time decreasing the probability of finding
an optimal factorial code.