We did not use Infomax methods in our experiments for the following reasons:
(a) There is no efficient and general method for maximizing mutual
information.
(b) With our basic approach from section 1, Infomax makes sense only in
situations where it automatically enforces
high variance of the output activations
(possibly under certain constraints).
This holds for the simplifying Gaussian noise models studied by Linsker,
but it does not hold for the general case.
(c) Even under appropriate Gaussian assumptions,
for representations with more than one dimension, Infomax
implies maximization of functions of the determinant
of the covariance matrix of the output activations
[Shannon, 1948]. In a small application,
Linsker explicitly calculated
the determinant's derivatives.
In general, however, calculating such derivatives explicitly is clumsy.
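
To illustrate point (c), the following is a minimal sketch, not the method used by Linsker or in our experiments. It assumes the outputs are jointly Gaussian, in which case the differential entropy is 0.5 * ln((2*pi*e)^n * det Q), where Q is the covariance matrix of the n output activations. The function names (covariance_matrix, determinant, gaussian_entropy) are hypothetical and chosen only for this example.

```python
import math


def covariance_matrix(samples):
    """Sample covariance (rows = observations, columns = output units)."""
    n = len(samples)
    dim = len(samples[0])
    means = [sum(row[j] for row in samples) / n for j in range(dim)]
    return [
        [
            sum((row[i] - means[i]) * (row[j] - means[j]) for row in samples)
            / (n - 1)
            for j in range(dim)
        ]
        for i in range(dim)
    ]


def determinant(matrix):
    """Determinant via Gaussian elimination with partial pivoting."""
    a = [list(map(float, row)) for row in matrix]
    n = len(a)
    det = 1.0
    for col in range(n):
        # Pick the row with the largest entry in this column as pivot.
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if a[pivot][col] == 0.0:
            return 0.0  # singular matrix
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det  # a row swap flips the sign
        det *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return det


def gaussian_entropy(cov):
    """Differential entropy (in nats) of a Gaussian with covariance `cov`.

    Assumes `cov` is positive definite, so that its determinant is > 0.
    """
    n = len(cov)
    return 0.5 * (n * math.log(2 * math.pi * math.e) + math.log(determinant(cov)))


# Two units with unit variance: uncorrelated versus strongly correlated.
independent = [[1.0, 0.0], [0.0, 1.0]]
correlated = [[1.0, 0.9], [0.9, 1.0]]
h_independent = gaussian_entropy(independent)
h_correlated = gaussian_entropy(correlated)
```

With equal variances, the correlated pair has the smaller determinant and hence the lower entropy, so under this Gaussian assumption the objective depends on the whole covariance matrix rather than on the individual variances alone.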