

Possible Types of Gödel Machine Self-Improvements

Which provably useful self-modifications are possible? There are few limits to what a Gödel machine might do.

In one of the simplest cases it might leave its basic proof searcher intact and just change the ratio of time-sharing between the proof searching subroutine and the subpolicy $e$, that is, those parts of $p$ responsible for interaction with the environment.
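As a toy illustration of such a change in time-sharing, the following sketch interleaves steps of the proof searcher and of $e$ in a fixed integer ratio. It is not the Gödel machine's actual scheduler; the function name `schedule`, the labels, and the use of whole step counts are assumptions made here for illustration only.

```python
from itertools import cycle, islice


def schedule(proof_steps, policy_steps):
    """Yield an endless sequence of labels saying which subroutine runs next.

    Hypothetical model: each round consists of `proof_steps` steps of the
    proof searcher followed by `policy_steps` steps of the subpolicy e.
    """
    pattern = ["proof_search"] * proof_steps + ["e"] * policy_steps
    return cycle(pattern)


# Initial ratio 1:1, then a self-modification that shifts time towards e (1:3).
before = list(islice(schedule(1, 1), 4))
after = list(islice(schedule(1, 3), 4))
```

In this picture a self-modification of the simplest kind only replaces the pair `(proof_steps, policy_steps)`, leaving both subroutines untouched.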

Or the Gödel machine might modify $e$ only. For example, the initial $e$ may regularly store limited memories of past events somewhere in $s$; this might allow $p$ to derive that it would be useful to modify $e$ such that $e$ will conduct certain experiments to increase the knowledge about the environment, and use the resulting information to increase reward intake. In this sense the Gödel machine embodies a principled way of dealing with the exploration vs. exploitation problem [18]. Note that the expected utility of conducting some experiment may exceed that of not conducting it, even when the experimental outcome later suggests returning to the previous $e$.
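The last point can be made concrete with a small worked example. The model below is an assumption introduced here, not part of the formal framework: the current $e$ earns a known reward rate, an experiment costs a fixed number of reward-free steps, and with some probability it reveals a better policy; otherwise the machine returns to the previous $e$. All names and numbers are hypothetical.

```python
def utility_without_experiment(horizon, base_rate):
    """Expected cumulative reward when e is kept for the whole horizon."""
    return horizon * base_rate


def utility_with_experiment(horizon, cost, base_rate, better_rate, p_better):
    """Expected cumulative reward when `cost` reward-free steps are spent
    on an experiment first.

    With probability p_better the experiment reveals a policy earning
    better_rate; otherwise the machine returns to the previous e and
    earns base_rate for the remaining steps.
    """
    remaining = horizon - cost
    expected_rate = p_better * better_rate + (1 - p_better) * base_rate
    return remaining * expected_rate


# Hypothetical numbers: 100 steps of life, a 5-step experiment, and only a
# 30% chance that the experiment finds a policy twice as good.
keep = utility_without_experiment(100, 1.0)
explore = utility_with_experiment(100, 5, 1.0, 2.0, 0.3)
```

Here `explore` is about 123.5 while `keep` is 100, so under these assumptions the experiment has higher expected utility although in 70% of the cases its outcome suggests returning to the previous $e$.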

The Gödel machine might also modify its very axioms to speed things up. For example, it might find a proof that the original axioms should be replaced or augmented by theorems derivable from the original axioms.

The Gödel machine might even change its own utility function and target theorem, but can do so only if their new values are provably better according to the old ones.

In many cases we do not expect the Gödel machine to replace its proof searcher by code that completely abandons the search for proofs. Instead we expect that only certain subroutines of the proof searcher will be sped up, or that perhaps just the order of generated proofs will be modified in problem-specific fashion. This could be done by modifying the probability distribution on the proof techniques of the initial bias-optimal proof searcher from Section 2.3. Generally speaking, the utility of limited rewrites may often be easier to prove than that of total rewrites.
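To see why a change of this distribution alone can matter, consider the following minimal sketch. It assumes that the searcher gives each proof technique a share of the total runtime proportional to its probability; the technique names, the numbers, and the helper functions are hypothetical and serve only as illustration.

```python
def time_allocation(probabilities, budget):
    """Split a total runtime budget among techniques in proportion
    to their (possibly unnormalized) probabilities."""
    total = sum(probabilities.values())
    return {name: budget * p / total for name, p in probabilities.items()}


def budget_needed(probabilities, technique, steps_required):
    """Smallest total budget at which `technique` receives
    `steps_required` steps under proportional time sharing."""
    total = sum(probabilities.values())
    return steps_required * total / probabilities[technique]


# Hypothetical initial and modified distributions on three proof techniques.
initial = {"induction": 0.25, "case_split": 0.25, "rewrite": 0.5}
modified = {"induction": 0.5, "case_split": 0.25, "rewrite": 0.25}

# Suppose the useful proof needs 100 steps of the "induction" technique.
cost_before = budget_needed(initial, "induction", 100)
cost_after = budget_needed(modified, "induction", 100)
```

Under these assumptions the total search time until the useful technique has received its 100 steps drops from 400 to 200, although no subroutine of the proof searcher has been rewritten.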

In certain uninteresting environments reward is maximized by becoming dumb. For example, a given task may require executing the same pleasure center-activating action repeatedly and forever, as quickly as possible. In such cases the Gödel machine may delete most of its more time-consuming initial software, including the proof searcher.

Note that there is no reason why a Gödel machine should not augment its own hardware. Suppose its lifetime is known to be 100 years. Given a hard problem and axioms restricting the possible behaviors of the environment, the Gödel machine might find a proof that its expected cumulative reward will increase if it invests 10 years into building faster computational hardware, by exploiting the physical resources of its environment.
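A back-of-the-envelope calculation shows when such an investment pays off. The sketch below rests on simplifying assumptions that are not part of the text above: reward accrues at a constant rate, no reward is collected while building, and the new hardware multiplies the reward rate by a constant factor. The function names are hypothetical.

```python
def break_even_speedup(lifetime, build_time):
    """Factor by which the reward rate must grow so that investing
    `build_time` reward-free years leaves cumulative reward unchanged."""
    return lifetime / (lifetime - build_time)


def reward_gain(lifetime, build_time, speedup, rate=1.0):
    """Cumulative reward with the investment minus cumulative reward
    without it, under the constant-rate assumptions stated above."""
    with_investment = (lifetime - build_time) * speedup * rate
    without_investment = lifetime * rate
    return with_investment - without_investment


# The numbers from the text: a 100-year lifetime and a 10-year investment.
threshold = break_even_speedup(100, 10)
gain_if_doubled = reward_gain(100, 10, 2.0)
```

With these numbers the investment is worthwhile as soon as the new hardware raises the reward rate by more than a factor of $100/90 \approx 1.11$; a doubling of the rate would yield a gain equivalent to 80 years of reward at the old rate.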


Juergen Schmidhuber 2003-09-29
