We study an aspect of adaptive vision with neural networks which has not been explored in this general form before: The adaptive control of sequential physical fovea-movements for target detection.
Consider the following target detection task: A two-dimensional object may be arbitrarily rotated and translated on a pixel plane consisting of many pixels. The task is to learn to report the position and the orientation of a predefined detail of the object (the target).
Now consider the naive `neural' solution to this task: By providing a huge number of training examples, train a feed-forward network with many input units (typically one for each pixel), many hidden units and many (typically millions of) connections to emit a representation of the position and the orientation of the target.
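To make the scaling problem of this naive approach concrete, here is a minimal sketch of such a monolithic detector (all sizes are illustrative assumptions, not taken from the experiments): even a modest pixel plane pushes the connection count into the hundreds of thousands, and it grows linearly with the number of pixels.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: a 64x64 pixel plane and 128 hidden units.
n_pixels, n_hidden = 64 * 64, 128

# One input unit per pixel; the four outputs encode target position
# (x, y) and orientation (cos t, sin t).
W_in = rng.standard_normal((n_hidden, n_pixels)) * 0.01
W_out = rng.standard_normal((4, n_hidden)) * 0.01

def naive_detect(image):
    """Single forward pass: the whole pixel plane -> (x, y, cos t, sin t)."""
    hidden = np.tanh(W_in @ image.ravel())
    return W_out @ hidden

image = rng.random((64, 64))
out = naive_detect(image)
print(out.shape)                # (4,)
print(W_in.size + W_out.size)   # 524800 connections at this modest resolution
```

At a biologically more realistic resolution (say 512x512 pixels) the same architecture would already need tens of millions of connections, which is the cost the sequential approach below avoids.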
The contribution of this paper is a system for target detection which can be more efficient, more sequential, but also more complex than the naive approach. It is inspired by the observation that biological systems employ sequential fovea movements for target detection. The system is capable of `active perception': At a given time it can influence what it perceives next. It learns to produce sequences of fovea movements (rotations and translations) which lead the high-resolution part of an artificial fovea from arbitrary starting points in the environment of a randomly placed object to a predefined detail of the object (the externally defined target). In particular, we show how techniques for adaptive neuro-control can be used for learning target detection without an informed teacher (the task is a `reward-only-at-goal' task). The system solves its target detection task solely by being given the shape of the target, without being told how to get there. It learns to focus on those domain-dependent parts of the visual scene which are relevant for the target detection process. The system is efficient in the sense that it uses only a fraction of the input units and connections of the naive approach, while still allowing maximal resolution to be applied to each part of the pixel plane.
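The perception-action loop described above can be sketched as follows. This is a simplified illustration under assumed sizes (an 8x8 fovea window, translation-only moves, an untrained linear-tanh controller); the actual system also rotates the fovea and is trained from goal reward. The point is structural: the controller only ever reads the small high-resolution window, so it needs a fraction of the connections of a network reading the whole plane.

```python
import numpy as np

rng = np.random.default_rng(1)

FOVEA = 8  # the high-resolution window is tiny relative to the pixel plane

def fovea_view(image, x, y):
    """Perceive only the 8x8 patch under the fovea, not the whole plane."""
    return image[y:y + FOVEA, x:x + FOVEA].ravel()

# Controller: maps the current fovea view to the next movement
# (dx, dy, rotation); here untrained random weights, for illustration.
W = rng.standard_normal((3, FOVEA * FOVEA)) * 0.01

def controller(view):
    return np.tanh(W @ view)

image = rng.random((64, 64))
x = y = 0
trajectory = [(x, y)]
for _ in range(5):  # a short sequence of fovea movements
    dx, dy, _rot = controller(fovea_view(image, x, y))
    x = int(np.clip(x + round(4 * dx), 0, 64 - FOVEA))
    y = int(np.clip(y + round(4 * dy), 0, 64 - FOVEA))
    trajectory.append((x, y))

print(len(trajectory))  # 6: start position plus five movements
print(W.size)           # 192 weights: far fewer than one per pixel
```

In the trained system, reward given only when the fovea reaches the target shapes these movement sequences; the loop structure itself is what makes active perception possible.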
The remainder of this paper is structured as follows: First, we describe and motivate our 2-network approach to solving the temporal credit assignment problem associated with the target detection task.
Then experiments with target detection problems are described. It is demonstrated that the system can discover, in an unsupervised manner, target-directed trajectories (sequences of fovea translations and rotations) by learning to sequentially focus on relevant cues in the visual scene. As a by-product, the system learns translation and rotation invariance, as well as target tracking. It is also demonstrated that an imperfect adaptive model of the environmental dynamics can contribute to perfect solutions, and that turning a static task into a sequential one can be very efficient. Furthermore, a method for parallel on-line learning of both networks is experimentally shown to be feasible.
Finally, implications for more general attentive systems are discussed.