
# THE ALGORITHM

Let $X_0$ denote the inputs of the training set. We approximate $B(w, X_0)$ by $\tilde{B}(w, X_0)$, where $\tilde{B}(w, X_0)$ is defined like $B(w, x_0)$ in the previous section (replacing $x_0$ by $X_0$). For simplicity, in what follows, we will abbreviate $\tilde{B}(w, X_0)$ by $B(w, X_0)$. Starting with a random initial weight vector, flat minimum search (FMS) tries to find a $w$ that not only has low $E_q(w, D_0)$ but also defines a box $M_w$ with maximal box volume and, consequently, minimal $B(w, X_0)$. Note the relationship to MDL: $B(w, X_0)$ is the number of bits required to describe the weights, whereas the number of bits needed to describe the $y^p$, given $w$ (with $(x^p, y^p) \in D_0$), can be bounded by fixing $E_{tol}$ (see appendix A.1). In the next section we derive the following algorithm. We use gradient descent to minimize $E(w, D_0) = E_q(w, D_0) + \lambda B(w, X_0)$, where $B(w, X_0) = \sum_{x_0 \in X_0} B(w, x_0)$, and

$$B(w, x_0) = \frac{1}{2} \left( -L \log \epsilon + \sum_{i,j} \log \sum_k \left( \frac{\partial o^k(w, x_0)}{\partial w_{ij}} \right)^2 + L \log \sum_k \left( \sum_{i,j} \frac{\left| \frac{\partial o^k(w, x_0)}{\partial w_{ij}} \right|}{\sqrt{\sum_k \left( \frac{\partial o^k(w, x_0)}{\partial w_{ij}} \right)^2}} \right)^2 \right) \qquad (1)$$

Here $o^k(w, x_0)$ is the activation of the $k$th output unit (given weight vector $w$ and input $x_0$), $L$ is the number of weights, $\epsilon$ is a constant, and $\lambda$ is the regularization constant (or hyperparameter) which controls the trade-off between regularization and training error (see appendix A.1). To minimize $B(w, X_0)$, for each $x_0 \in X_0$ we have to compute

$$\frac{\partial B(w, x_0)}{\partial w_{uv}} = \sum_{i,j,k} \frac{\partial B(w, x_0)}{\partial \left( \frac{\partial o^k(w, x_0)}{\partial w_{ij}} \right)} \, \frac{\partial^2 o^k(w, x_0)}{\partial w_{ij} \, \partial w_{uv}} \qquad (2)$$

for all $u, v$.
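To make equations (1) and (2) concrete, here is a minimal sketch using JAX (our choice purely for illustration; the names `net`, `flatness_penalty`, and `grad_B`, the toy architecture, and the value of `eps` are assumptions, not part of the original work). The Jacobian entries $\partial o^k / \partial w_{ij}$ are obtained by automatic differentiation, and nested autodiff yields the derivative in equation (2) without writing out the chain rule by hand:

```python
import jax
import jax.numpy as jnp

def net(w, x):
    # Toy linear net with K = 3 output units; stands in for any o^k(w, x_0).
    return w.reshape(3, -1) @ x

def flatness_penalty(w, x0, eps=1e-2):
    """B(w, x_0) as in Eq. (1), for a flattened weight vector w of length L."""
    J = jax.jacobian(net)(w, x0)              # J[k, l] = d o^k / d w_l, shape (K, L)
    L = w.shape[0]
    col_sq = jnp.sum(J**2, axis=0)            # sum_k (d o^k / d w_l)^2, one per weight
    first = jnp.sum(jnp.log(col_sq))          # sum_{i,j} log sum_k (...)^2
    ratios = jnp.sum(jnp.abs(J) / jnp.sqrt(col_sq), axis=1)  # one value per output k
    second = L * jnp.log(jnp.sum(ratios**2))
    return 0.5 * (-L * jnp.log(eps) + first + second)

# Eq. (2): the gradient of B(w, x_0) w.r.t. all weights, via nested autodiff.
grad_B = jax.grad(flatness_penalty)

# Example: L = 12 weights, inputs of dimension 4, K = 3 outputs.
# w = 0.5 * jnp.ones(12); x0 = jnp.arange(4.0)
# grad_B(w, x0)  ->  Eq. (2) for all u, v in one call
```

Summing `grad_B(w, x0)` over all $x_0 \in X_0$ then gives the gradient of $B(w, X_0)$.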

It can be shown that by using Pearlmutter's and Møller's efficient second-order method, the gradient of $B(w, x_0)$ can be computed in $O(L)$ time (see details in appendix A.3). Therefore, our algorithm has the same order of computational complexity as standard backprop.
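For completeness, here is a sketch of one gradient descent step on $E(w, D_0) = E_q(w, D_0) + \lambda B(w, X_0)$, reusing `net` and `flatness_penalty` from above; the squared-error form of $E_q$, the learning rate, and the value of $\lambda$ are illustrative assumptions. Unlike the Pearlmutter/Møller-style computation just referenced, this naive version materializes full Jacobians and so does not attain the $O(L)$ cost; it only shows the structure of the objective:

```python
def total_loss(w, X, Y, lam):
    # E(w, D_0) = E_q(w, D_0) + lambda * sum_{x_0 in X_0} B(w, x_0)
    preds = jax.vmap(lambda x: net(w, x))(X)
    e_q = jnp.sum((preds - Y) ** 2)           # squared training error E_q
    b = jnp.sum(jax.vmap(lambda x: flatness_penalty(w, x))(X))
    return e_q + lam * b

@jax.jit
def fms_step(w, X, Y, lam=0.01, lr=1e-3):
    # One gradient descent step on E(w, D_0), as FMS prescribes.
    return w - lr * jax.grad(total_loss)(w, X, Y, lam)
```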

