
INFOMAX

Following Linsker's Infomax approach [Linsker, 1988], we might think of defining $-D_l$ explicitly as the mutual information between the inputs and the outputs of $T_l$.
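For reference, here is the standard identity behind such a definition. The symbols $X$ (input of $T_l$) and $Y_l = T_l(X)$ (its output) are introduced here only for illustration:

$$-D_l = I(X; Y_l) = H(Y_l) - H(Y_l \mid X).$$

If $T_l$ is deterministic and its outputs are continuous, $H(Y_l \mid X)$ is not finite. A noise model, such as the Gaussian ones Linsker studied, is therefore needed before this quantity can serve as an objective.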

We did not use Infomax methods in our experiments for the following reasons:

(a) There is no efficient, general method for maximizing mutual information.

(b) Within our basic approach from section 1, Infomax makes sense only where it automatically enforces high variance of the outputs of $T_l$ (possibly under certain constraints). This holds for the simplified Gaussian noise models studied by Linsker, but not in general.

(c) Even under appropriate Gaussian assumptions, for representations with more than one dimension, Infomax amounts to maximizing functions of the determinant $DET$ of the covariance matrix of the output activations [Shannon, 1948]. In a small application, Linsker calculated the derivatives of $DET$ explicitly. In general, however, computing these derivatives is clumsy.
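To make points (b) and (c) concrete, here is a minimal sketch under an assumed Linsker-style simplification that is not the method used in our experiments: $T_l$ is taken to be linear, $y = Wx + n$, with zero-mean Gaussian input covariance $C$ and independent Gaussian output noise of variance $\sigma^2$. Then $I(x; y) = \tfrac{1}{2}\log\det(W C W^T + \sigma^2 I) - \tfrac{1}{2} k \log \sigma^2$ for $k$ outputs, and its gradient with respect to $W$ is $(W C W^T + \sigma^2 I)^{-1} W C$. The helper names `inverse_and_logdet` and `gaussian_infomax` are hypothetical, and the code works with $\log DET$ rather than $DET$ itself.

```python
import math

def matmul(A, B):
    # Plain-list matrix product: rows of A times columns of B.
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(r) for r in zip(*A)]

def inverse_and_logdet(S):
    """Gauss-Jordan elimination with partial pivoting.

    Assumes S is symmetric positive definite, so det(S) > 0 and
    log det(S) equals the sum of log |pivot| over all pivots.
    """
    n = len(S)
    M = [[float(v) for v in S[i]] + [1.0 if i == j else 0.0 for j in range(n)]
         for i in range(n)]
    logdet = 0.0
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]          # row swap only flips the sign of det
        pivot = M[c][c]
        logdet += math.log(abs(pivot))
        M[c] = [v / pivot for v in M[c]]
        for r in range(n):
            if r != c:
                f = M[r][c]
                M[r] = [a - f * b for a, b in zip(M[r], M[c])]
    return [row[n:] for row in M], logdet

def gaussian_infomax(W, C, noise_var):
    """Return I(x; y) in nats and dI/dW for the assumed model y = W x + n."""
    k = len(W)
    WC = matmul(W, C)
    Sigma = matmul(WC, transpose(W))     # output covariance W C W^T ...
    for i in range(k):
        Sigma[i][i] += noise_var         # ... plus the noise covariance sigma^2 I
    Sigma_inv, logdet = inverse_and_logdet(Sigma)
    info = 0.5 * logdet - 0.5 * k * math.log(noise_var)
    grad = matmul(Sigma_inv, WC)         # dI/dW = Sigma^{-1} W C
    return info, grad

W = [[1.0, 0.5, 0.0], [0.0, 1.0, -0.3]]
C = [[2.0, 0.3, 0.0], [0.3, 1.0, 0.2], [0.0, 0.2, 0.5]]
info, grad = gaussian_infomax(W, C, 0.1)
print(info, grad)
```

Because the noise variance is fixed, scaling $W$ up raises the output variance and with it $I(x; y)$. This is why the objective rewards high output variance in this simplified setting, and why some constraint on $W$ is needed to keep it bounded. Even in this toy case the gradient requires inverting the full output covariance matrix.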



Juergen Schmidhuber 2003-02-13