The agent was able to move around in a two-dimensional world
with 100 different states.
The environment was reactive.
The model's task was to predict the reactions of the environment,
which were partly random and partly deterministic.
The `curious' system was tested against the conventional
random search method.
For both methods, the quality of the model at any given time
was judged by the sum of the squared differences between the
values of the possible deterministic reactions and the model's
corresponding predictions.
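The quality criterion fits in a short function. This is a minimal sketch that assumes the deterministic reactions and the model's predictions are stored in dictionaries keyed by hypothetical (state, action) pairs. The function name and data layout are illustrative and do not come from [6]. Randomly determined reactions are left out, as in the criterion above.

```python
from collections.abc import Hashable
from typing import Mapping


def prediction_error(true_reactions: Mapping[Hashable, float],
                     predictions: Mapping[Hashable, float]) -> float:
    """Sum of squared differences between every possible deterministic
    reaction and the model's prediction for it (lower is better).

    Keys are assumed (hypothetical) (state, action) pairs; only the
    deterministic reactions are included in `true_reactions`.
    """
    return sum((true_reactions[k] - predictions[k]) ** 2 for k in true_reactions)


# Illustrative values only: one state with two hypothetical actions.
true_reactions = {(0, "up"): 1.0, (0, "down"): 0.0}
predictions = {(0, "up"): 0.5, (0, "down"): 0.0}
print(prediction_error(true_reactions, predictions))  # 0.25
```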
With guidance by the principle of adaptive curiosity, the
summed squared error decreased up to 10 times faster
than with random search (see [6] for details).
The reason for this superior performance was that the
`curious' system soon found out that there were certain states
of the environment where the model's performance could still
be improved. It started to focus on these particular states. The
random search method, by contrast, was not selective at all and
therefore wasted a lot of time on pointless exploration of states
of the environment that did not allow any performance improvement.
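The difference in selectivity can be illustrated with a minimal sketch of a curiosity-driven choice rule next to random search. It assumes, as a simplification made here and not the mechanism of [6], that a state's interest is the average drop in the model's prediction error over its most recent visits. The names `learning_progress`, `curious_choice`, `random_choice`, the `window` parameter, and the example error histories are all hypothetical.

```python
import random
from typing import Dict, List, Sequence


def learning_progress(errors: List[float], window: int = 2) -> float:
    """Average drop in prediction error over the last `window` visits to a state.

    States with fewer than two recorded errors get +inf, an assumed optimistic
    default that makes the agent try every state before judging it.
    """
    if len(errors) < 2:
        return float("inf")
    recent = errors[-(window + 1):]
    drops = [before - after for before, after in zip(recent, recent[1:])]
    return sum(drops) / len(drops)


def curious_choice(history: Dict[int, List[float]], window: int = 2) -> int:
    """Pick the state whose recent prediction errors fell the most."""
    return max(history, key=lambda s: learning_progress(history[s], window))


def random_choice(states: Sequence[int], rng: random.Random) -> int:
    """Undirected baseline: every state is equally likely, however unpredictable."""
    return rng.choice(states)


# Hypothetical error histories: state 0 is deterministic and still being learned,
# state 1 reacts randomly, so its error fluctuates without lasting improvement,
# and state 2 is deterministic and already learned.
history = {
    0: [1.0, 0.5, 0.25],
    1: [0.3, 0.2, 0.4],
    2: [0.01, 0.0, 0.0],
}
print(curious_choice(history))  # 0
```

With these histories, random search would spend about two thirds of its visits on states 1 and 2, where little or no further improvement is available, while the curious rule returns to state 0. A larger `window` averages out more of the noise from randomly reacting states, at the cost of registering more slowly that a state has been learned.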
The more complex the environment, the greater the benefit to be expected from the principle of adaptive curiosity. Ongoing experiments focus on increasingly complex worlds, non-local input/output representations, and on the expected `generalization capabilities' of non-trivial networks with hidden units.