The previous section used a single, complex, powerful, primitive learning action (adaptive Levin Search). The current section exploits the fact that one may instead use many much simpler actions that can be combined into more complex learning strategies, or even metalearning strategies (Schmidhuber, 1994, 1997b; Zhao and Schmidhuber, 1996).
Overview. We will use a simple, assembler-like programming language which allows for writing many kinds of (learning) algorithms. Effectively, we embed the way the system modifies its policy and triggers backtracking within the self-modifying policy itself. SSA is used to keep only those self-modifications followed by reward speed-ups, in particular those leading to ``better'' future self-modifications, recursively. We call this ``incremental self-improvement'' (IS).
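The core of this scheme, SSA's backtracking test, can be illustrated with a minimal sketch. The sketch assumes scalar time and cumulative reward, and that each self-modification is checkpointed on a stack together with the reward and time at which it occurred; the names (Checkpoint, ssa_backtrack) are illustrative, not from the paper. It is also simplified in that it checks only the topmost surviving checkpoint against its predecessor (or against the lifelong average reward), whereas the full success-story criterion demands an entire chain of reward accelerations.

```python
# Hedged sketch of SSA-style backtracking: undo self-modifications that
# were not followed by reward speed-ups. Names and structure are
# illustrative assumptions, not the paper's actual implementation.
from dataclasses import dataclass


@dataclass
class Checkpoint:
    time: float          # time at which the self-modification occurred
    reward: float        # cumulative reward collected up to that time
    undo: object = None  # information needed to undo the modification


def ssa_backtrack(stack, now, total_reward, restore):
    """Pop (and undo, via restore) modifications whose average reward
    since they were made does not exceed that of the previous surviving
    modification; the oldest one is compared against the lifelong
    average reward instead."""
    while stack:
        top = stack[-1]
        rate_since_top = (total_reward - top.reward) / (now - top.time)
        if len(stack) >= 2:
            prev = stack[-2]
            baseline = (total_reward - prev.reward) / (now - prev.time)
        else:
            baseline = total_reward / now  # lifelong average reward
        if rate_since_top > baseline:
            break  # reward speed-up: this modification survives
        restore(stack.pop().undo)  # no speed-up: undo and keep popping
    return stack
```

For example, a modification made at time 6 after 8 units of reward, evaluated at time 10 with 10 total reward units, yields only 0.5 reward per time step since it was made; it is undone, while an earlier modification that accelerated reward collection survives.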
Outline of section. Subsection 4.1 will describe how the policy is represented as a set of variable probability distributions over a set of assembler-like instructions, how the policy forms the basis for generating and executing a lifelong instruction sequence, how the system can modify itself by executing special self-modification instructions, and how SSA keeps only the ``good'' policy modifications. Subsection 4.2 will describe an experimental inductive transfer case study in which we apply IS to a sequence of increasingly difficult function approximation tasks. Subsection 4.3 will mention additional IS experiments involving complex POEs and interacting learning agents that influence each other's task difficulties.