What is the optimal way of predicting the future, given the past? What is the best way to act so as to maximize one's future expected reward? And what is the best way of searching for the solution to a novel problem, making optimal use of solutions to earlier problems?
Most previous work on these old and fundamental questions has focused on very limited settings, such as Markovian environments where the optimal next action, given past inputs, depends on the current input only [27].
We will concentrate on a much weaker and therefore much more general assumption, namely, that the environment's responses are sampled from a computable probability distribution. If even this weak assumption were not true then we could not even formally specify the environment, let alone write reasonable scientific papers about it.
Let us first introduce some notation.
$B^*$ denotes the set of finite sequences over the binary alphabet $B = \{0,1\}$, $B^{\infty}$ the set of infinite sequences over $B$, $\lambda$ the empty string, $B^{\sharp} = B^* \cup B^{\infty}$. $x, y, z, z^1, z^2$ stand for strings in $B^{\sharp}$. If $x \in B^*$ then $xy$ is the concatenation of $x$ and $y$ (e.g., if $x = 10000$ and $y = 1111$ then $xy = 100001111$). For $x \in B^{\sharp}$, $l(x)$ denotes the number of bits in $x$, where $l(x) = \infty$ for $x \in B^{\infty}$; $l(\lambda) = 0$. $x_n$ is the prefix of $x$ consisting of the first $n$ bits, if $l(x) \geq n$, and $x$ otherwise ($x_0 := \lambda$). $\log$ denotes the logarithm with basis 2; $f, g$ denote functions mapping integers to integers. We write $f(n) = O(g(n))$ if there exist positive constants $c, n_0$ such that $f(n) \leq c\,g(n)$ for all $n > n_0$. For simplicity let us consider universal Turing Machines [67] (TMs) with input alphabet $B$ and trinary output alphabet including the symbols ``0'', ``1'', and `` '' (blank). For efficiency reasons,
the TMs should have several work tapes
to avoid potential quadratic slowdowns associated with 1-tape TMs.
The remainder of this paper assumes a fixed universal reference TM.
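To make the string notation concrete, here is a minimal Python sketch of the definitions above. It is purely illustrative and not part of the formal development; the names \texttt{l}, \texttt{prefix}, and \texttt{LAMBDA} are ours, and only finite strings in $B^*$ are representable.

\begin{verbatim}
# Illustrative sketch of the string notation (names are ours, not the paper's).
LAMBDA = ""  # the empty string \lambda

def l(x):
    """l(x): number of bits of x. Covers only the finite case x in B*;
    infinite strings (l(x) = infinity) are not representable here."""
    return len(x)

def prefix(x, n):
    """x_n: the first n bits of x if l(x) >= n, and x itself otherwise.
    prefix(x, 0) == LAMBDA."""
    return x[:n] if l(x) >= n else x

# Concatenation xy is plain string concatenation:
x, y = "10000", "1111"
assert x + y == "100001111"
assert prefix(x + y, 5) == x
assert l(LAMBDA) == 0
\end{verbatim}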
Now suppose bitstring $x$ represents the data observed so far. What is its most likely continuation $y \in B^{\sharp}$? Bayes' theorem yields
\[
P(xy \mid x) = \frac{P(x \mid xy)\,P(xy)}{P(x)} \propto P(xy),
\]
where $P(z^2 \mid z^1)$ is the probability of $z^2$, given knowledge of $z^1$, and $P(x) = \int_{z \in B^{\sharp}} P(xz)\,dz$ is just a normalizing factor; note that $P(x \mid xy) = 1$, since $xy$ starts with $x$.
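As a toy illustration (ours, under the simplifying assumption of i.i.d.\ Bernoulli($p$) bits rather than an arbitrary computable distribution), the following Python sketch evaluates $P(xy \mid x) = P(xy)/P(x)$, where \texttt{P} below is the probability that the observed sequence starts with the given string, matching the role of $P(x)$ as a normalizer over all continuations.

\begin{verbatim}
# Toy illustration of P(xy | x) = P(xy) / P(x) for a simple computable
# distribution: i.i.d. Bernoulli(p) bits (p = probability of a "1").
def P(x, p=0.7):
    """Probability that the observed sequence starts with bitstring x."""
    prob = 1.0
    for bit in x:
        prob *= p if bit == "1" else 1.0 - p
    return prob

def continuation_prob(x, y, p=0.7):
    """P(xy | x) = P(xy) / P(x), by Bayes' theorem with P(x | xy) = 1."""
    return P(x + y, p) / P(x, p)

x = "10111"
for y in ("1", "0", "11"):
    print(y, continuation_prob(x, y))
# For i.i.d. bits the past tells us nothing: P(xy | x) equals P(y).
# Non-trivial prediction arises only for distributions with dependencies
# between past and future bits, which is exactly the general case studied here.
\end{verbatim}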