Finding Promising Exploration Regions by Weighting Expected Navigation Costs, GMD Technical Report, Arbeitspapiere der GMD 987, April, 1996.

In many learning tasks, data queries are neither free nor of constant cost; often the cost of a query depends on the distance from the agent's current location in state space to the desired query point. Much can be gained in these settings by keeping track of (1) the length of the shortest path from each state to every other state, and (2) the first action to take on each of these paths. With this information, a learning agent can explore its environment efficiently: at every step it computes the action that moves it toward the region of greatest estimated exploration benefit, balancing the exploration potential of all reachable states encountered so far against their currently estimated distances.
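The idea above can be sketched in code. This is a minimal illustration, not the report's actual method: the dictionary-based graph, the exploration-benefit estimates, and the simple benefit-minus-distance weighting are all assumptions made for the example. It maintains the two tables the text describes (shortest-path lengths and first actions, here via Floyd-Warshall) and uses them to pick the next move.

```python
import itertools

def all_pairs_shortest_paths(graph):
    """Floyd-Warshall over a graph given as {state: {neighbor: cost}}.
    Returns dist[s][t] (shortest-path length) and first[s][t]
    (the first state to move to on a shortest path from s to t)."""
    states = list(graph)
    INF = float("inf")
    dist = {s: {t: (0 if s == t else INF) for t in states} for s in states}
    first = {s: {t: None for t in states} for s in states}
    # Direct edges initialize both tables.
    for s, nbrs in graph.items():
        for t, c in nbrs.items():
            if c < dist[s][t]:
                dist[s][t] = c
                first[s][t] = t
    # Relax through every intermediate state k; the first move toward t
    # via k is simply the first move toward k.
    for k, s, t in itertools.product(states, repeat=3):
        if dist[s][k] + dist[k][t] < dist[s][t]:
            dist[s][t] = dist[s][k] + dist[k][t]
            first[s][t] = first[s][k]
    return dist, first

def next_exploration_step(current, dist, first, benefit):
    """Pick the reachable state whose estimated exploration benefit,
    weighted against its navigation cost, is greatest; return the
    first action (next state) on the shortest path toward it."""
    best, best_score = None, float("-inf")
    for t, b in benefit.items():
        d = dist[current][t]
        if t == current or d == float("inf"):
            continue
        score = b - d  # one simple way to trade benefit against distance
        if score > best_score:
            best_score, best = score, t
    return first[current][best] if best is not None else None
```

For example, on a small four-state graph where a distant state has high estimated benefit, the agent takes the first step along the cheapest path toward it rather than a greedy step toward a nearby, low-benefit state.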