Where does $B$ come from?
To discover flat minima, FMS searches for large
axis-aligned hypercuboids (boxes) in weight space
such that weight vectors within the box yield similar
network behavior.
Boxes satisfy two flatness conditions, FC1 and FC2.
FC1 enforces ``tolerable'' output variation in response to
weight vector perturbations, i.e.,
near-flatness of the error surface
around the current weight vector
(in all weight space directions).
Among the boxes satisfying FC1, FC2 selects a unique
one with minimal net output variance.
$B$ is the negative logarithm of this box's volume (ignoring constant
terms that have no effect on the gradient descent algorithm).
Hence $B$ is the number of bits (save a
constant) required to describe the current net function,
which does not change significantly
when weights are changed within the box.
The box edge length determines the required weight precision.
See Hochreiter and Schmidhuber (1997a) for
details of $B$'s derivation.
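
As a rough illustration of the relation between box volume and description length, the following minimal sketch computes the negative base-2 logarithm of the volume of an axis-aligned box from its edge lengths. The function name `neg_log2_box_volume` and the edge lengths are hypothetical and chosen only for illustration. The sketch does not reproduce the derivation in Hochreiter and Schmidhuber (1997a), which expresses the box through the network's output derivatives, and it drops no constant terms.

```python
import math


def neg_log2_box_volume(edge_lengths):
    """Negative base-2 log of the volume of an axis-aligned box.

    The volume is the product of the edge lengths, so its negative
    logarithm is the sum of the negative logarithms of the edges.
    """
    if any(e <= 0 for e in edge_lengths):
        raise ValueError("edge lengths must be positive")
    return -sum(math.log2(e) for e in edge_lengths)


# Illustrative (made-up) edge lengths for a three-weight network.
wide_box = [0.5, 0.5, 0.25]          # low precision needed per weight
narrow_box = [0.125, 0.125, 0.0625]  # high precision needed per weight

bits_wide = neg_log2_box_volume(wide_box)
bits_narrow = neg_log2_box_volume(narrow_box)

print(f"wide box:   {bits_wide:.1f} bits")
print(f"narrow box: {bits_narrow:.1f} bits")
```

Under these assumptions the wider box yields the smaller value: the larger the region of weight space that leaves the net function nearly unchanged, the fewer bits are needed to specify the weights.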