The solution. The only solution to the problem is: make all equal to . Like with the example above, there are short and fast programs for computing the solution.
The training data. The 3 training inputs , , and from the previous task are used. The target values are different, however. Obviously, the target for input vector is 108. The target for input vector is 126. The target for input vector is 221. Again, success is binary: only if the solution candidate fits the 3 training examples exactly, the solution is evaluated on the test data.
Why conventional neural net algorithms fail to solve this problem: for the same reasons they fail to solve the previous problem (see section 4.3). The training set is too small to obtain reasonable generalization on the test set.
RESULTS. Programs fitting the training data were found in 10 out of runs, using up a total search time of time steps. 8 of the 10 successful runs led to perfect generalization on the 161,697 unseen test examples.
The first weight vector fitting the training data was found after 6,902,963 runs. Again, the corresponding program was a pretty wild one. But it led to perfect generalization on all the test data. Before halting, the program used 502 out of 8192 allocated time steps. Its time probability was . Its space probability was . Table 3 shows the used part of the storage after execution. What the program does is this:
(0) Allocate 3 cells on the work tape. Initialize with zero. Set Min = Min - 3.
(2) Get the contents of the input field (see list of instructions in section 3)
at position 11 (which is 0), and write it into address -2.
(5) Write the contents of address -3 onto the weight pointed to by WeightPointer
and increment WeightPointer. Halt if WeightPointer is out of range.
(7) If the contents of address -3 is less or equal to the contents of address 9,
goto address 11. Otherwise goto address 11.
(11) Increment the contents of address -3.
(13) Goto address -1.
(-1) If the contents of address 7 is less or equal to the contents of address 3
(always true), goto address 5.
The instructions beginning at the addresses (2), (7), and (-1) are useless. But at least they are not catastrophic. Essentially, the program first allocates space for a variable (initially zero) on the work tape (recall that the program tape is ``read/execute'' only, and cannot be used for variables). Then it executes a loop for incrementing and writing the variable contents onto the network's weight vector.
After runs, a faster and rather elegant (nearly minimal) program was found. Table 4 shows the used storage after execution. The program ran for 302 out of 512 allocated time steps. Its space probability is . Inspection will reveal the operation of the program.
Using different sets of 3 training examples (obtained by randomly permuting the input units that are never on) led to very similar generalization results.