Details. The training set consists of 24 data vectors from 1966 to 1972. A positive DAX tendency is mapped to target 0.8; otherwise the target is -0.8. The test set consists of 68 data vectors from 1973 to 1990. Flat minimum search (FMS) is compared against: (1) conventional backprop with 8 hidden units (BP8); (2) backprop with 4 hidden units (BP4), where 4 hidden units are chosen because pruning methods favor 4 hidden units, whereas 3 are not enough; (3) optimal brain surgeon (OBS; Hassibi & Stork, 1993) with a few improvements (see section 5.6); and (4) weight decay (WD) according to Weigend et al. (1991). WD and OBS were chosen because they are well known and widely used.
Performance measure. Since wrong predictions lead to loss of money, performance is measured as follows. The sum of incorrectly predicted DAX changes is subtracted from the sum of correctly predicted DAX changes. The result is divided by the sum of absolute DAX changes.
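The performance measure described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the predicted direction is taken to be the sign of the network output (threshold at 0), the "sums of DAX changes" are taken to be sums of absolute changes, and the function name `dax_performance` as well as the sample numbers are hypothetical, not taken from the experiments.

```python
def dax_performance(outputs, dax_changes):
    """Return (correct - incorrect) / total, each measured in absolute DAX change.

    outputs     -- network outputs; output > 0 is read as "DAX goes up" (assumption)
    dax_changes -- actual DAX changes for the same time steps
    """
    correct = 0.0    # sum of |change| over correctly predicted steps
    incorrect = 0.0  # sum of |change| over incorrectly predicted steps
    for out, change in zip(outputs, dax_changes):
        predicted_up = out > 0
        actual_up = change > 0  # non-positive tendency counts as "down"
        if predicted_up == actual_up:
            correct += abs(change)
        else:
            incorrect += abs(change)
    total = correct + incorrect  # equals the sum of absolute DAX changes
    if total == 0:
        return 0.0  # no movement at all: nothing gained, nothing lost
    return (correct - incorrect) / total


# Made-up example: three correct predictions, one wrong one.
outputs = [0.6, -0.3, 0.2, -0.7]
dax_changes = [2.0, -1.0, -3.0, -4.0]
score = dax_performance(outputs, dax_changes)
print(score)
```

Under these assumptions the measure ranges from -1 (every change predicted wrongly) to +1 (every change predicted correctly), and large DAX changes weigh more than small ones.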
Results. See table 2. Our method outperforms the other methods.
MSE is irrelevant. Note that MSE is not a reasonable performance measure for this task. For instance, although FMS typically makes more correct classifications than WD, FMS' MSE often exceeds WD's. This is because WD's wrong classifications tend to be close to 0, while FMS often prefers large weights yielding strong output activations -- FMS' few false classifications tend to contribute a lot to MSE.
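A small numerical sketch may help to see how a network can classify better and still have a higher MSE. The outputs below are invented for illustration and are not results from the experiments: the "WD-like" outputs stay close to 0, the "FMS-like" outputs are strong. Classification by the sign of the output is an assumption.

```python
def mse(outputs, targets):
    """Mean squared error between outputs and targets."""
    return sum((o - t) ** 2 for o, t in zip(outputs, targets)) / len(targets)


def num_correct(outputs, targets):
    """Count outputs whose sign matches the sign of the target (assumed decision rule)."""
    return sum((o > 0) == (t > 0) for o, t in zip(outputs, targets))


# Targets as in the text: 0.8 for positive tendency, -0.8 otherwise.
targets = [0.8, 0.8, -0.8, -0.8, 0.8]

# Hypothetical outputs: weak activations, two of them on the wrong side of 0.
wd_like = [0.3, -0.1, -0.3, 0.1, 0.3]
# Hypothetical outputs: strong activations, one of them confidently wrong.
fms_like = [0.9, 0.9, -0.9, 0.9, 0.9]

print("WD-like :", num_correct(wd_like, targets), "correct, MSE", mse(wd_like, targets))
print("FMS-like:", num_correct(fms_like, targets), "correct, MSE", mse(fms_like, targets))
```

In this toy case the strong-output network gets more signs right, yet its single confident mistake contributes a squared error of about 2.89, which is enough to push its MSE above that of the weak-output network.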
Learning rate: 0.01.
Architecture: (3-8-1), except BP4 with (3-4-1).
Number of training examples: 20,000,000.
Method specific parameters:
FMS: ; .
WD: like with FMS, but .
OBS: (the same result was obtained with higher values, e.g. 0.13).
See section 5.6 for parameters common to all experiments.