Possible Types of Gödel Machine Self-Improvements

Which provably useful self-modifications are possible? There are few limits to what a Gödel machine might do.

In one of the simplest cases it might leave its basic proof searcher intact and just change the ratio of time-sharing between the proof-searching subroutine and the subpolicy, that is, those parts of its software responsible for interaction with the environment.
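As a minimal sketch of this simplest kind of self-modification, suppose the machine's time is divided into discrete steps and a single parameter fixes the fraction given to proof search. The function name `time_share` and the step budget are illustrative assumptions, not part of the original formalism.

```python
from fractions import Fraction


def time_share(total_steps, searcher_fraction):
    """Split a step budget between the proof searcher and the subpolicy.

    Hypothetical helper: `searcher_fraction` is the share of steps
    given to proof search; the rest goes to environment interaction.
    """
    searcher_steps = int(total_steps * searcher_fraction)
    subpolicy_steps = total_steps - searcher_steps
    return searcher_steps, subpolicy_steps


# Initial setting (assumed): equal time for both parts.
before = time_share(1000, Fraction(1, 2))

# A self-modification that changes only the ratio, leaving the
# proof searcher and the subpolicy themselves untouched.
after = time_share(1000, Fraction(1, 10))
```

Here the rewrite touches one number only, which is the sense in which such a change is among the simplest a Gödel machine could prove useful.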

Or the Gödel machine might modify only the subpolicy. For example, the initial subpolicy may regularly store limited memories of past events somewhere in the machine's state; this might allow the proof searcher to derive that it would be useful to modify the subpolicy such that it will conduct certain experiments to increase the knowledge about the environment, and use the resulting information to increase reward intake. In this sense the Gödel machine embodies a principled way of dealing with the exploration vs exploitation problem [18]. Note that the *expected* utility of conducting some experiment may exceed that of not conducting it, even when the experimental outcome later suggests returning to the previous subpolicy.
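The point about *expected* utility can be illustrated with a toy calculation. All numbers below are assumptions chosen for the example: a known action pays 1 per step, an untested action pays either 3 or 0 per step with equal probability, and testing it costs one of 10 remaining steps.

```python
HORIZON = 10          # remaining steps (assumed)
KNOWN_REWARD = 1.0    # per-step reward of the current behavior (assumed)
# (probability, per-step reward) of the untested action (assumed)
OUTCOMES = [(0.5, 3.0), (0.5, 0.0)]


def utility_without_experiment():
    """Keep the current behavior for the whole horizon."""
    return HORIZON * KNOWN_REWARD


def utility_after_outcome(untested_reward):
    """Spend one step testing, then use the better action afterwards."""
    best = max(KNOWN_REWARD, untested_reward)
    return untested_reward + (HORIZON - 1) * best


def expected_utility_with_experiment():
    """Average over the possible experimental outcomes."""
    return sum(p * utility_after_outcome(r) for p, r in OUTCOMES)


baseline = utility_without_experiment()            # 10.0
expected = expected_utility_with_experiment()      # 19.5
unlucky = utility_after_outcome(0.0)               # 9.0
```

In the unlucky case the machine returns to its previous behavior and ends up with 9 instead of 10, yet the experiment was still the better choice in expectation (19.5 versus 10).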

The Gödel machine might also modify its very axioms to speed things up. For example, it might find a proof that the original axioms should be replaced or augmented by theorems derivable from the original axioms.

The Gödel machine might even change its own utility function and target theorem, but can do so only if their *new* values are provably better according to the *old* ones.

In many cases we do not expect the Gödel machine to replace its proof searcher by code that completely abandons the search for proofs. Instead we expect that only certain subroutines of the proof searcher will be sped up, or that perhaps just the order of generated proofs will be modified in a problem-specific fashion. This could be done by modifying the probability distribution on the proof techniques of the initial bias-optimal proof searcher from Section 2.3. Generally speaking, the utility of limited rewrites may often be easier to prove than that of total rewrites.
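A limited rewrite of this kind can be sketched as follows, assuming that the searcher gives each proof technique a share of search time proportional to its probability. The technique names, the budget, and both helper functions are hypothetical; only the idea of reweighting a distribution is taken from the text.

```python
from fractions import Fraction


def allocate(budget, distribution):
    """Give each proof technique time in proportion to its probability."""
    total = sum(distribution.values())
    return {name: budget * p / total for name, p in distribution.items()}


def reweight(distribution, name, factor):
    """Scale one technique's probability, then renormalize to sum to 1."""
    updated = dict(distribution)
    updated[name] *= factor
    total = sum(updated.values())
    return {k: v / total for k, v in updated.items()}


# Hypothetical initial distribution over three proof techniques.
initial = {
    "induction": Fraction(1, 4),
    "case_split": Fraction(1, 4),
    "rewrite": Fraction(1, 2),
}

# A problem-specific modification: favor induction fourfold.
modified = reweight(initial, "induction", 4)

time_before = allocate(700, initial)
time_after = allocate(700, modified)
```

The searcher itself is unchanged; only the order and time shares of the techniques it tries are shifted, which is the kind of modification whose utility may be comparatively easy to prove.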

In certain uninteresting environments reward is maximized by becoming dumb. For example, a given task may require executing the same pleasure-center-activating action repeatedly and forever, as quickly as possible. In such cases the Gödel machine may delete most of its more time-consuming initial software, including the proof searcher.

Note that there is no reason why a Gödel machine should not augment its own hardware. Suppose its lifetime is known to be 100 years. Given a hard problem and axioms restricting the possible behaviors of the environment, the Gödel machine might find a proof that its expected cumulative reward will increase if it invests 10 years into building faster computational hardware, by exploiting the physical resources of its environment.
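The arithmetic behind such a proof can be sketched under strong simplifying assumptions: reward accrues at a constant rate on the old hardware, no reward is collected during the 10 years of building, and the new hardware multiplies the reward rate by a fixed speedup factor. None of these assumptions comes from the original text; they only make the trade-off concrete.

```python
from fractions import Fraction

LIFETIME_YEARS = 100   # from the text
BUILD_YEARS = 10       # from the text


def reward_without_investment(rate):
    """Cumulative reward when running on the old hardware throughout."""
    return LIFETIME_YEARS * rate


def reward_with_investment(rate, speedup):
    """Cumulative reward when the build years yield nothing (assumed)
    and the remaining years run at `speedup` times the old rate."""
    return (LIFETIME_YEARS - BUILD_YEARS) * rate * speedup


def break_even_speedup():
    """Smallest speedup at which investing does not lose reward."""
    return Fraction(LIFETIME_YEARS, LIFETIME_YEARS - BUILD_YEARS)


threshold = break_even_speedup()                 # 10/9
doubled = reward_with_investment(1, 2)           # 180
unchanged = reward_without_investment(1)         # 100
```

Under these assumptions any speedup above 10/9, roughly 11 percent, makes the investment worthwhile, so even a modest hardware improvement could justify a decade of construction.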
