Since the early attempts at building a ``General Problem Solver,'' much work has been done to develop mostly heuristic machine learning algorithms that solve new problems based on experience with previous problems, by incrementally shifting the inductive bias. Many pointers to learning by chunking, learning by macros, hierarchical learning, learning by analogy, etc.\ can be found in the literature. Relatively recent general attempts include program evolvers such as ADATE and simpler heuristics such as Genetic Programming (GP) [8,10,2]. Unlike logic-based program synthesizers [15,77,9], program evolvers use biology-inspired concepts of Evolutionary Computation [40,67] or Genetic Algorithms to evolve better and better computer programs. Most existing GP implementations, however, do not even allow for programs with loops and recursion, thus ignoring a main motivation for searching program space. They either have very limited search spaces (where the runtime of solution candidates is not even an issue), or are far from bias-optimal, or both. Similarly, traditional reinforcement learners are neither general nor close to bias-optimal.
A first step toward making GP-like methods bias-optimal would be to allocate runtime to tested programs in proportion to the probabilities of the mutations or ``crossover operations'' that generated them. Even then there would still be room for improvement, however, since GP has quite limited ways of making new programs from previous ones: it does not learn better program-making strategies.
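To illustrate the idea of probability-proportional runtime allocation, here is a toy sketch in Python. It is not from the original work; the interface (step-functions returning a solution or \texttt{None}, a doubling phase budget) is a hypothetical simplification, but it captures the key point: a candidate generated with probability $p$ receives a share $p$ of each phase's total runtime, so no candidate can starve the others by more than a factor related to $1/p$.

```python
def run_with_budget(program, steps):
    """Run `program` (a zero-argument step-function returning a solution
    or None) for at most `steps` steps; hypothetical interface."""
    for _ in range(steps):
        result = program()
        if result is not None:
            return result
    return None

def bias_optimal_allocation(candidates, phase_budget=1, max_phases=20):
    """candidates: list of (program, probability) pairs.
    In each phase, every program receives runtime proportional to the
    probability of the mutation/crossover sequence that generated it;
    the total phase budget doubles until some program succeeds."""
    for phase in range(1, max_phases + 1):
        total = phase_budget * 2 ** phase       # doubling total budget
        for program, p in candidates:
            steps = max(1, int(total * p))      # runtime proportional to p
            result = run_with_budget(program, steps)
            if result is not None:
                return result
    return None                                 # budget exhausted

def make_program(k, answer=42):
    """Toy candidate: 'solves' its problem (returns `answer`)
    after k cumulative steps; state persists across phases."""
    state = {"calls": 0}
    def program():
        state["calls"] += 1
        return answer if state["calls"] >= k else None
    return program
```

For example, with two candidates of equal generation probability, one needing 3 steps and one needing a million, the cheap one is found after a few phases while the expensive one has consumed only a comparable amount of runtime.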
This brings us to several previous publications on learning to learn, or metalearning, where the goal is to learn better learning algorithms through self-improvement without human intervention; compare the earlier human-assisted self-improver. We introduced the concept of incremental search for improved, probabilistically generated code that modifies the probability distribution on the possible code continuations: incremental self-improvers [51,64,63,65] use the success-story algorithm (SSA) to undo those self-generated probability modifications that in the long run do not contribute to increasing the learner's cumulative reward per time interval. An earlier meta-GP algorithm was designed to learn better GP-like strategies; it also combined principles of reinforcement learning economies with a ``self-referential'' metalearning approach. A gradient-based metalearning technique [50,49] for continuous program spaces of differentiable recurrent neural networks (RNNs) was also designed to favor better learning algorithms; compare the remarkable recent success of the related but technically improved RNN-based metalearner.
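The SSA criterion mentioned above can be sketched as follows. This is a simplified illustration, not the authors' implementation: it assumes a stack of checkpoints, each storing the time, cumulative reward, and saved probability distribution at the moment of a self-modification, and a hypothetical `restore` callback that undoes a modification. SSA keeps only those checkpoints for which reward per time since each more recent checkpoint strictly exceeds reward per time since the previous one; the rest are popped and undone.

```python
def ssa_check(stack, t_now, r_now, restore):
    """Enforce the success-story criterion.
    stack: list of (t_i, r_i, saved_state) checkpoints, oldest first.
    While the top checkpoint's reward-per-time since creation does not
    exceed the rate since the checkpoint below it (or since time zero),
    pop it and call restore(saved_state) to undo its modification."""
    while stack:
        t_top, r_top, saved = stack[-1]
        rate_top = (r_now - r_top) / (t_now - t_top)
        if len(stack) >= 2:
            t_prev, r_prev, _ = stack[-2]
            rate_prev = (r_now - r_prev) / (t_now - t_prev)
        else:
            rate_prev = r_now / t_now   # reward rate since the beginning
        if rate_top > rate_prev:
            break                        # success story holds; keep all
        restore(stack.pop()[2])          # undo unjustified modification
```

For instance, a modification made at time 6 after which reward accumulates slowly is undone, while an earlier modification that still pays off per unit time survives. This lazy, backward-looking evaluation is what lets long-run useless self-modifications be reverted without human intervention.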
The algorithms above, however, are generally not near-bias-optimal. The method discussed in this paper combines optimal search and incremental self-improvement, and will be $n$-bias-optimal, where $n$ is a small and practically acceptable number.