Previous autoassociators (AAs). Backprop-trained AAs without a narrow hidden bottleneck (``bottleneck'' refers to a hidden layer containing fewer units than other layers) typically produce redundant, continuous-valued codes and unstructured weight patterns. Baldi and Hornik (1989) studied linear AAs with a hidden-layer bottleneck and found that their codes are orthogonal projections onto the subspace spanned by the first principal eigenvectors of a covariance matrix associated with the training patterns. They showed that the mean squared error (MSE) surface has a unique minimum. Nonlinear codes have been obtained with nonlinear bottleneck AAs of more than three (e.g., five) layers, e.g., by Kramer (1991), Oja (1991), or DeMers and Cottrell (1993). None of these methods produces sparse, factorial, or local codes -- instead they produce the first principal components or their nonlinear equivalents (``principal manifolds''). We will see that FMS-based AAs yield quite different results.
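To make the Baldi-Hornik observation concrete, here is a minimal NumPy sketch (not from the cited work; data dimensions, learning rate, and iteration count are illustrative assumptions). It trains a linear bottleneck AA on MSE with plain gradient descent and then checks, via the cosines of the principal angles, that the decoder's column space matches the subspace spanned by the first principal eigenvectors of the training data's covariance matrix.
\begin{verbatim}
import numpy as np

rng = np.random.default_rng(0)

# Synthetic zero-mean data with a clear spectral gap after the first 3 directions.
n, d, k = 2000, 8, 3                    # patterns, input dim, bottleneck size
scales = np.array([5.0, 4.0, 3.0, 0.5, 0.4, 0.3, 0.2, 0.1])
Q, _ = np.linalg.qr(rng.normal(size=(d, d)))
X = (rng.normal(size=(n, d)) * scales) @ Q.T
X -= X.mean(axis=0)

# Linear AA with a k-unit bottleneck, trained on MSE by plain gradient descent.
W_enc = rng.normal(scale=0.1, size=(k, d))
W_dec = rng.normal(scale=0.1, size=(d, k))
lr = 1e-3 / n
for _ in range(10000):
    H = X @ W_enc.T                     # codes, shape (n, k)
    R = H @ W_dec.T - X                 # reconstruction errors
    g_dec = R.T @ H                     # gradient w.r.t. W_dec (up to a constant)
    g_enc = W_dec.T @ R.T @ X           # gradient w.r.t. W_enc (up to a constant)
    W_dec -= lr * g_dec
    W_enc -= lr * g_enc

# Principal subspace: top-k eigenvectors of the data covariance matrix.
eigvals, eigvecs = np.linalg.eigh(X.T @ X / n)
P = eigvecs[:, -k:]

# Cosines of the principal angles between the decoder's column space and the
# principal subspace; values near 1 mean the two subspaces coincide.
Q_dec, _ = np.linalg.qr(W_dec)
print(np.round(np.linalg.svd(P.T @ Q_dec, compute_uv=False), 3))
\end{verbatim}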
FMS-based AAs.
According to subsections 3.1 and 3.2,
because of the low-complexity coding aspect,
the codes tend to
(C1) be binary for sigmoid units with activation function
$y = \frac{1}{1+e^{-x}}$
(the derivative $y(1-y)$ is small for activations $y$
near 0 or 1),
(C2) require few separated code components or hidden units (HUs),
and (C3) use simple component functions.
Because of the low-complexity decoding part,
codes also tend to
(D1) have many HUs with activations near zero
and, therefore, be sparsely (or even locally)
distributed,
and (D2) have code components conveying information useful for
generating as many output activations as possible.
(C1), (C2), and (D2) encourage minimally redundant, binary codes.
(C3), (D1), and (D2), however, encourage sparse, distributed (or even local)
codes. Taken together, (C1)--(C3) and (D1)--(D2) lead to codes with
simply computable code components (C1, C3) that convey a lot of
information (D2), using as
few active code components as possible (C2, D1).
Collectively, this makes code components represent simple input
features.
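The properties above can be checked directly on the hidden activations of a trained AA. The following sketch (the function name, thresholds, and toy code matrix are our own illustrative assumptions; FMS training itself is not shown) computes simple diagnostics for binarity (C1) and sparseness (C2, D1) of a code matrix.
\begin{verbatim}
import numpy as np

def code_statistics(H, low=0.05, high=0.95):
    """Diagnostics for a code matrix H of HU activations, shape
    (n_patterns, n_HUs), assuming sigmoid activations in (0, 1):
    - fraction of activations that are nearly binary (cf. C1),
    - mean number of active components per pattern (cf. C2, D1),
    - per-HU mean activation; HUs stuck near 0 indicate sparse
      or even local codes (cf. D1)."""
    near_binary = np.mean((H < low) | (H > high))
    active_per_pattern = np.mean(np.sum(H > high, axis=1))
    unit_means = H.mean(axis=0)
    return near_binary, active_per_pattern, unit_means

# Toy stand-in for the hidden activations of a trained AA: a sparse binary code.
rng = np.random.default_rng(0)
H = (rng.random((100, 8)) < 0.2).astype(float)
near_binary, active, means = code_statistics(H)
print(near_binary, active, np.round(means, 2))
\end{verbatim}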