Next: Proof Techniques and an
Up: Overview / Basic Ideas
Previous: Setup and Formal Goal
Our machine becomes a selfreferential
[11]
Gödel machine
by loading it
with a
particular form of
machinedependent,
selfmodifying code . The initial code
at time step 1
includes a (typically suboptimal)
problem solving subroutine for interacting with
the environment, such as any traditional reinforcement learning
algorithm [20],
and a general proof searcher subroutine
(Section 5)
that systematically makes pairs
(switchprog, proof) (variable substrings of )
until it finds a proof of
a target theorem which essentially states: `the
immediate rewrite of p through current program switchprog
on the given machine
implies higher utility than leaving p as is'. Then it executes
switchprog, which may completely rewrite , including
the proof searcher. Section 3 will explain
details of the necessary
initial axiomatic system
encoded in .
Compare Fig. 1.
Figure:
Storage snapshot of a not yet selfimproved example Gödel machine,
with the initial software still intact. See text for details.

The Global Optimality Theorem (Theorem 4.1,
Section 4) shows
this selfimprovement strategy is not greedy: since the
utility of `leaving as is' implicitly evaluates all possible
alternative switchprogs which an unmodified might find later,
we obtain a globally optimal selfchangethe current switchprog
represents the best of all possible relevant selfchanges, relative
to the given resource limitations and initial proof search strategy.
Next: Proof Techniques and an
Up: Overview / Basic Ideas
Previous: Setup and Formal Goal
Juergen Schmidhuber
20050103