**Predictability minimization (PM).**
The simple approach in this paper is based on the recent
principle of *predictability minimization* (PM)
[Schmidhuber, 1992].
A feedforward network with output units (or code units) sees
input patterns with redundant components.
Its goal is to respond with
informative but
less redundant
output patterns,
ideally by creating a factorial
(statistically nonredundant)
code of the input ensemble [Barlow et al., 1989].
The central idea of PM is:
*for each code unit, there is a
predictor network that tries to predict the code unit
from the remaining code units.
But each code unit tries to become
as unpredictable as possible,
by representing
environmental properties
that are independent from those
represented by other code units.*
Predictors and code units co-evolve by fighting
each other; see Section 2 for details.
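This adversarial setup can be sketched numerically. The sketch below is one simple PM variant, not the paper's exact formulation: the linear predictors, sigmoid code units, learning rate, and toy redundant data are all illustrative assumptions. Each predictor does gradient descent on its squared prediction error, while each code unit does gradient ascent on the same quantity (direct path only).

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy redundant input ensemble: 2 independent source bits,
# each duplicated, giving 4 input components but only 2 bits of content.
src = rng.integers(0, 2, size=(500, 2)).astype(float)
X = np.repeat(src, 2, axis=1)            # shape (500, 4)

n_in, n_code, lr = 4, 2, 0.1
W = rng.normal(0.0, 0.1, (n_code, n_in)) # code-unit weights
V = np.zeros((n_code, n_code))           # predictor weights
mask = 1.0 - np.eye(n_code)              # predictor i never sees code unit i

for step in range(2000):
    y = sigmoid(X @ W.T)                 # code-unit activations
    p = y @ (V * mask).T                 # each predictor's guess from the *other* units
    err = p - y                          # prediction error per code unit
    # Predictors: gradient *descent* on the mean squared prediction error ...
    V -= lr * ((err.T @ y) / len(X)) * mask
    # ... code units: gradient *ascent* on the same error, i.e. each unit
    # tries to become unpredictable (chained through the sigmoid; the
    # indirect path through the other predictors is ignored here).
    grad_z = -2.0 * err * y * (1.0 - y)
    W += lr * (grad_z.T @ X) / len(X)
```

In the ideal case, a trained predictor approximates the conditional expectation of its code unit given the others, so its squared error measures the remaining conditional variance; PM pushes this quantity up, driving code units toward mutually independent responses.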

**Potential advantages** of
PM over other methods are:
(1) Unlike certain inherently sequential methods (e.g.
Rubner and Schulten, 1990),
PM can be implemented in a fully parallel fashion.
(2) Unlike in, e.g., Barrow's model (1987),
there may be *many* simultaneously active code units
(multiple "winners" instead of a single "winner"), as
long as they represent *different* aspects of
the environment (distributed coding instead of local coding).
(3) Unlike e.g. with Linsker's INFOMAX (1988),
there is no need to compute the derivatives of determinants of covariance
matrices.
(4) Unlike, e.g.,
Deco's and Obradovic's system (1996),
Földiák's system (1990),
Rubner and Tavan's system (1989),
and anti-Hebbian systems in general,
PM requires neither
time-consuming settling phases (due to recurrent
connections) nor
analytic computation of weight vectors.
(5) Unlike almost all other methods,
PM has the potential to
discover *nonlinear* redundancy in the input data and to
generate appropriate redundancy-free codes.
(6) Unlike most other "neural" methods (see references above),
existing variants of
PM create *binary*
codes as opposed to *continuous* codes. This (a) allows
for easier post-training analysis and (b) facilitates
the creation of *statistically independent* code components,
as opposed to merely *decorrelated* ones.
Note that statistical independence implies decorrelation,
but decorrelation does not imply statistical independence.
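The gap between the two notions is easy to demonstrate with a small sketch (the uniform source and the quadratic dependency below are illustrative choices): two variables can have vanishing linear correlation while one is a deterministic function of the other.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, 100_000)
y = x ** 2                       # y is a deterministic function of x: maximally dependent

# By symmetry, the linear correlation between x and x**2 vanishes ...
corr = np.corrcoef(x, y)[0, 1]   # close to 0

# ... yet the conditional statistics of y change drastically with x,
# which statistical independence would forbid.
mean_y = y.mean()                        # ≈ 1/3 (unconditional)
mean_y_tail = y[np.abs(x) > 0.9].mean()  # ≈ 0.90 (conditioned on large |x|)
```

A decorrelating method would leave this kind of nonlinear redundancy untouched; a factorial code may not.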
**Why are statistically independent code components
of interest?** One of many important reasons is this:
for the sake of efficiency, most statistical
classifiers (e.g. Bayesian
pattern classifiers) assume statistical independence of
their input variables (corresponding to the pattern components).
If we had a method that takes
an arbitrary pattern ensemble and generates
an equivalent factorial code, the latter
could be fed into
an efficient conventional classifier, which could then achieve
its theoretically optimal performance.
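To make the role of the independence assumption concrete, here is a sketch of such a classifier operating on a binary (ideally factorial) code. The Bernoulli model, the smoothing constant, and the synthetic two-class data are illustrative assumptions, not part of the paper:

```python
import numpy as np

def fit_naive_bayes(codes, labels, n_classes, alpha=1.0):
    """Estimate class priors and per-component Bernoulli parameters.
    The per-component factorization is exact only if the code
    components are statistically independent within each class."""
    n, d = codes.shape
    priors = np.empty(n_classes)
    theta = np.empty((n_classes, d))
    for c in range(n_classes):
        m = labels == c
        priors[c] = m.mean()
        theta[c] = (codes[m].sum(axis=0) + alpha) / (m.sum() + 2 * alpha)
    return priors, theta

def classify(codes, priors, theta):
    # log P(c) + sum_i log P(code_i | c): this cheap product form
    # is licensed precisely by the independence assumption.
    logp = (np.log(priors)
            + codes @ np.log(theta).T
            + (1.0 - codes) @ np.log(1.0 - theta).T)
    return logp.argmax(axis=1)

# Synthetic, genuinely independent binary codes for two classes:
rng = np.random.default_rng(1)
n = 2000
c0 = (rng.random((n, 4)) < 0.9).astype(float)  # class 0: bits mostly on
c1 = (rng.random((n, 4)) < 0.1).astype(float)  # class 1: bits mostly off
codes = np.vstack([c0, c1])
labels = np.r_[np.zeros(n, dtype=int), np.ones(n, dtype=int)]

priors, theta = fit_naive_bayes(codes, labels, 2)
accuracy = (classify(codes, priors, theta) == labels).mean()
```

With independent code components, the classifier only needs one parameter per component and class; on redundant, dependent components the same product form mis-states the joint distribution.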

**Purpose of paper.**
Despite its potential advantages,
PM has been tested
on artificial data only
(Lindstädt, 1993; Schmidhuber, 1993, 1994).
To start a more thorough experimental analysis,
in this paper we study the question:
what happens if we apply a computationally simple,
entirely local, highly parallel, and even biologically plausible
variant of PM to real world images?
An intuitively reasonable first step towards representing images in
a less redundant way (one adopted by standard image
processing techniques, and apparently also by
early visual processing stages of
biological systems) is to build compact
representations based on information about
boundaries (*edges*)
between areas with nonvarying,
redundant pixel activations.
Since PM aims at generating codes
with reduced redundancy, we may expect it to discover
related ways of coding visual scenes, by
creating feature detectors responsive to edges or similar
informative features in the input
scenes. Moreover, since edge detectors (as well as other related
useful feature detectors, such as on-center-off-surround detectors)
can be implemented with a single
layer of neuronal units, we may already expect a single-layer system
to come up with such detectors.
This paper reports a confirmation of this expectation,
thus demonstrating that PM
makes sense not only intuitively and in theory,
but also in practical
applications. The results encourage us to expect that the method
will also be beneficial for large-scale applications, by
extracting more sophisticated, nonlinear, useful features in
deeper layers. Due to our current hardware limitations, however,
a test of this hypothesis is left for future research.
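The claim that edge and on-center-off-surround detectors fit in a single layer can be made concrete with a tiny sketch (the 1-D step "image" and the two kernels below are illustrative, not the paper's stimuli):

```python
import numpy as np

# A 1-D "image" with two flat (redundant) regions and one boundary:
image = np.array([0.2, 0.2, 0.2, 0.2, 0.9, 0.9, 0.9, 0.9])

# One linear unit per position with weights (-1, +1) on adjacent pixels:
# a minimal edge detector, realizable in a single layer.
edge_kernel = np.array([-1.0, 1.0])
edge_response = np.convolve(image, edge_kernel[::-1], mode="valid")
# zero over the flat regions, a single peak at the boundary

# An on-center-off-surround unit is likewise a single layer:
center_surround = np.array([-0.5, 1.0, -0.5])
cs_response = np.convolve(image, center_surround[::-1], mode="valid")
# zero on flat regions, paired opposite-sign responses flanking the edge
```

Both units respond only where pixel activations change, i.e. exactly where the redundant constant regions end; this is the sense in which such detectors yield compact, less redundant image representations.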

**Outline.**
Section 2 reviews the principles of PM
in more detail.
Section 3 applies the technique
to real world images and presents results.
