The Gödel Machine

The Gödel machine [56]
explicitly addresses the *`Grand Problem of Artificial Intelligence'*
[58]
by optimally dealing with limited resources in general reinforcement
learning settings,
and with the possibly huge (but constant) slowdowns buried
by AIXI [22]
in the somewhat misleading -notation.
It is designed to solve arbitrary computational problems beyond those
solvable by plain OOPS,
such as maximizing the expected future reward of a robot in a possibly
stochastic and reactive environment
(note that the total utility of some robot behavior may be hard
to verify--its evaluation may consume the robot's entire lifetime).

How does it work?
While executing some arbitrary
initial problem solving strategy, the Gödel machine
simultaneously runs a proof searcher which systematically and repeatedly
tests proof techniques. Proof techniques are programs that may read any
part of the Gödel machine's state, and write on a reserved part which may be reset
for each new proof technique test. In an example Gödel machine
[56] this writable
storage includes the variables *proof* and *switchprog*, where
*switchprog* holds a potentially unrestricted program whose execution
could completely rewrite any part of the Gödel machine's current software.
Normally the current *switchprog* is not executed. However, proof
techniques may invoke a special subroutine *check()* which tests
whether *proof* currently holds a proof showing that the utility of
stopping the systematic proof searcher and transferring control to the
current *switchprog* at a particular
point in the near future exceeds the
utility of continuing the search until some alternative *switchprog*
is found. Such proofs are derivable from the proof searcher's axiom
scheme which formally describes the utility function to be maximized
(typically the expected future reward in the expected remaining lifetime
of the Gödel machine), the computational costs of hardware instructions (from
which all programs are composed), and the effects of hardware instructions on
the Gödel machine's state. The axiom scheme also formalizes known probabilistic
properties of the possibly reactive environment, and also the * initial* Gödel machine state and software, which includes the axiom scheme itself
(no circular argument here). Thus proof techniques can reason about
expected costs and results of all programs including the proof searcher.

Once *check()* has identified a provably
good *switchprog*, the latter is executed
(some care has to be taken here because the proof
verification itself and the transfer of control to *switchprog*
also consume part of the typically limited lifetime).
The discovered *switchprog* represents a *globally*
optimal self-change in the following sense:
provably *none* of all the alternative *switchprog*s
and *proof*s (that could be found in the future
by continuing the proof search) is worth
waiting for.

There are many ways of initializing the
proof searcher. Although identical proof techniques may
yield different proofs depending on the time
of their invocation (due to the continually
changing Gödel machine state), there is a bias-optimal and
asymptotically optimal proof searcher initialization
based on a variant of OOPS [56] (Section 9).
It exploits the fact that proof verification is a simple and
fast business where the particular optimality
notion of OOPS is appropriate.
The Gödel machine itself, however, may have
an arbitrary, *typically different and more powerful* sense
of optimality embodied by its given utility function.