Here we describe how a reinforcement learning method
called Q-learning can be used to build a `curious' model builder.
The notation is the same as above.
Following [13] we introduce
an adaptive function
for evaluating pairs of inputs
and actions
as well as a utility function
for evaluating
inputs
.
After random initialization of
,
,
,
, and
,
at each time step
the following algorithm is performed:
Note that the algorithm does not specify the implementation of
,
, and
. All three can be implemented as
lookup tables or (in the hope of useful `generalizations') as
back-propagation networks, Boltzmann machines, etc.
and
may be replaced by
back-propagation networks, too (see the experiments described in
section 5).
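
For the lookup-table case, a minimal sketch of the standard tabular Q-learning update might look as follows. It is the textbook rule, not the algorithm of this section. The class name `TabularQ`, the parameter values, the epsilon-greedy action choice, and the choice to compute the utility of an input as the largest Q-value available for it are all illustrative assumptions. The reward is simply passed in by the caller; how it is computed is outside the scope of the sketch.

```python
import random
from collections import defaultdict


class TabularQ:
    """Lookup-table Q-function with a derived utility (textbook Q-learning).

    Hypothetical helper for illustration; not the paper's implementation.
    """

    def __init__(self, actions, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
        self.actions = list(actions)
        self.alpha = alpha      # learning rate (assumed value)
        self.gamma = gamma      # discount factor (assumed value)
        self.epsilon = epsilon  # exploration probability (assumed value)
        self.rng = random.Random(seed)
        # Random initialization: each table entry gets a small random value
        # the first time it is accessed.
        self.q = defaultdict(lambda: self.rng.uniform(-0.01, 0.01))

    def utility(self, x):
        # Assumption: the utility of an input is its best available Q-value.
        return max(self.q[(x, a)] for a in self.actions)

    def select_action(self, x):
        # Epsilon-greedy: explore with probability epsilon, else act greedily.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.actions)
        return max(self.actions, key=lambda a: self.q[(x, a)])

    def update(self, x, a, reward, x_next):
        # Move Q(x, a) toward reward + gamma * utility(next input).
        target = reward + self.gamma * self.utility(x_next)
        self.q[(x, a)] += self.alpha * (target - self.q[(x, a)])
        return self.q[(x, a)]
```

At each time step a caller would pick an action with `select_action`, observe the next input and a reward, and then call `update`. Replacing the dictionary with a trained function approximator would correspond to the network-based variants mentioned above.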