Here we describe how a reinforcement learning method called Q-learning can be used to build a `curious' model builder. The notation is the same as above. Following  we introduce an adaptive function for evaluating pairs of inputs and actions, as well as a utility function for evaluating inputs .
After random initialization of , , , , and , at each time step the following algorithm is performed:
Note that the algorithm does not specify the implementation of , , and . All three can be implemented as lookup tables or (in the hope of useful `generalizations') as back-propagation networks, Boltzmann machines, etc. and may be replaced by back-propagation networks, too (see the experiments described in section 5).
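To make the lookup-table option concrete, the following is a minimal sketch of generic tabular Q-learning. It is not the algorithm of this section: it only illustrates one adaptive table that evaluates pairs of inputs and actions, and a utility of an input taken as the value of the best action at that input. The class name `TabularQ`, the method names, the zero initialization, the epsilon-greedy action choice, and the constants `alpha` and `gamma` are all assumptions made for illustration; where the reward comes from is left to the caller.

```python
import random
from collections import defaultdict


class TabularQ:
    """Generic lookup-table Q-learning (illustrative names and constants)."""

    def __init__(self, actions, alpha=0.5, gamma=0.9):
        self.actions = list(actions)
        self.alpha = alpha            # learning rate (assumed value)
        self.gamma = gamma            # discount factor (assumed value)
        # Table mapping (input, action) -> value estimate.
        # Zero-initialized here for simplicity; this is an assumption.
        self.q = defaultdict(float)

    def utility(self, x):
        # Utility of an input: value of the best action available there.
        return max(self.q[(x, a)] for a in self.actions)

    def choose(self, x, epsilon=0.1, rng=random):
        # Epsilon-greedy choice: explore with probability epsilon.
        if rng.random() < epsilon:
            return rng.choice(self.actions)
        return max(self.actions, key=lambda a: self.q[(x, a)])

    def update(self, x, a, reward, x_next):
        # Move the entry for (x, a) toward reward + gamma * utility(x_next).
        target = reward + self.gamma * self.utility(x_next)
        self.q[(x, a)] += self.alpha * (target - self.q[(x, a)])
        return self.q[(x, a)]


agent = TabularQ(actions=[0, 1])
first = agent.update("s0", 1, 1.0, "s1")    # reward 1.0 for action 1 in s0
second = agent.update("s1", 0, 0.0, "s0")   # value propagates back from s0
```

Replacing the dictionary with a trained function approximator would correspond to the network-based alternative mentioned above, at the cost of the exact per-entry updates a table provides.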