Unfortunately, while $M$ and the more general priors of Section 4 are computable in the limit, they are not recursive, and thus practically infeasible. This drawback inspired less general yet practically more feasible principles of minimum description length (MDL) [68,41] as well as priors derived from time-bounded restrictions [31] of Kolmogorov complexity [28,59,8].
No particular instance of these approaches, however, is universally
accepted or has a general convincing motivation that carries beyond rather
specialized application scenarios. For instance, typical efficient MDL
approaches require the specification of a class of computable models of
the data, say, certain types of neural networks, plus some computable
loss function expressing the coding costs of the data relative to the
model. This provokes numerous ad-hoc choices.
Our recent work [55], however, offers an alternative to the celebrated but noncomputable algorithmic simplicity measure or Solomonoff-Levin measure discussed above [59,75,60]. We introduced a new measure (a prior on the computable objects) which is not based on the shortest but on the fastest way of describing objects.
Let us assume that the observed data sequence is generated by a computational process, and that any possible sequence of observations is therefore computable in the limit [50]. This assumption is stronger and more radical than the traditional one: Solomonoff just insists that the probability of any sequence prefix is recursively computable, but the (infinite) sequence itself may still be generated probabilistically.
Given our starting assumption that data are deterministically generated by a machine, it seems plausible that the machine suffers from a computational resource problem. Since some things are much harder to compute than others, the resource-oriented point of view suggests the following postulate.

Postulate 1. The cumulative prior probability measure of all $x$ incomputable within time $t$ by any method is at most inversely proportional to $t$.

This postulate leads to the Speed Prior $S(x)$, the probability that the output of the following probabilistic algorithm starts with $x$ [55]:

Algorithm GUESS
1. Toss an unbiased coin until heads is up; let $i$ denote the number of required trials; set $t := 2^i$.
2. If the number of steps executed so far exceeds $t$ then exit. Execute one step; if this leads to a request for a new input bit (of the growing self-delimiting program, e.g., [30,31]), toss the coin to determine the bit, and set $t := t/2$.
3. Go to 2.
Algorithm GUESS is very similar to a probabilistic search algorithm used in previous work on applied inductive inference [47,49]. On several toy problems it generalized extremely well in a way unmatchable by traditional neural network learning algorithms.
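To make the control flow concrete, here is a minimal runnable sketch in Python. The reference universal machine of [55] is replaced by a hypothetical stand-in (`ToyMachine`, which merely writes every requested program bit to the output twice), and the names `guess_run` and `speed_prior_estimate` are illustrative, not from the cited work; only the coin-driven budget $t$ and its halving per requested input bit follow Algorithm GUESS.

```python
import random

class ToyMachine:
    """Hypothetical stand-in for the reference universal machine of [55]
    (which runs self-delimiting programs on a universal TM).  This toy
    merely writes every requested program bit to the output twice."""

    def __init__(self):
        self.pending = []  # output bits scheduled but not yet written
        self.output = []   # bits written so far

    def step(self):
        """Execute one step; return 'need_bit' when a fresh program
        bit is requested, otherwise write one output bit."""
        if not self.pending:
            return 'need_bit'
        self.output.append(self.pending.pop(0))
        return None

    def feed(self, bit):
        self.pending = [bit, bit]  # this toy doubles every program bit


def guess_run():
    """One run of Algorithm GUESS; returns the output bits produced."""
    # Step 1: toss a fair coin until heads; i trials set the budget t = 2^i.
    i = 1
    while random.random() < 0.5:
        i += 1
    t = 2 ** i

    machine, steps = ToyMachine(), 0
    while True:
        # Step 2: exit once the number of executed steps exceeds t.
        if steps > t:
            return machine.output
        if machine.step() == 'need_bit':
            # A request for a new input bit: toss the coin for the bit
            # and halve the remaining budget, t := t/2.
            machine.feed(random.randint(0, 1))
            t /= 2
        steps += 1
        # Step 3: go to 2 (the loop).


def speed_prior_estimate(prefix, runs=20_000):
    """Monte Carlo estimate of S(prefix): the fraction of GUESS runs
    whose output starts with the given bit sequence."""
    hits = sum(guess_run()[:len(prefix)] == list(prefix) for _ in range(runs))
    return hits / runs


if __name__ == "__main__":
    # The toy machine doubles bits, so [1, 1] is a possible output
    # prefix while [1, 0] is not.
    print(speed_prior_estimate([1, 1]), speed_prior_estimate([1, 0]))
```

Note that for this toy machine each run terminates quickly: every requested bit halves the budget $t$, while executed steps only accumulate linearly.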
With $S$ comes a computable method AS for predicting optimally within $\epsilon$ accuracy [55].
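The precise definition of AS, with its cutoff-based approximation of $S$, is given in [55]; schematically, though, prediction with any semimeasure uses the induced conditional probability
\[
  S(b \mid x) \;=\; \frac{S(xb)}{S(x)}, \qquad b \in \{0,1\},
\]
and an AS-style predictor chooses the action minimizing the $S$-expected loss, e.g., under 0/1 loss, the bit $b$ maximizing $S(b \mid x)$.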
Consider a finite but unknown program $p$ computing $y$.
What if Postulate 1 holds but $p$ is not optimally efficient, and/or computed on a computer that differs from our reference machine? Then we effectively do not sample beginnings $y_k$ from $S$ but from an alternative semimeasure $S'$.
Can we still predict well? Yes, because the Speed Prior $S$ dominates $S'$. This dominance is all we need to apply the recent loss bounds [21].
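Here dominance is the standard multiplicative notion: there is a constant $c > 0$, depending on $S'$ but not on $x$, with
\[
  S(x) \;\ge\; c\, S'(x) \quad \text{for all } x.
\]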
The loss that we are expected to receive by predicting according to AS instead of using the true but unknown $S'$ does not exceed the optimal loss by much [55].
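For orientation (the exact constants are in [21,55]), such bounds have the following schematic shape: if $D \le \ln c^{-1}$ bounds the relative entropy between the true $S'$ and $S$, then the total loss $L_{\mathrm{AS}}$ of predicting with AS satisfies
\[
  L_{\mathrm{AS}} \;\le\; L^{*} + O\!\big(D + \sqrt{L^{*} D}\,\big),
\]
where $L^{*}$ is the total loss of the optimal predictor informed of $S'$.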