HQ-Learning. Adaptive Behavior 6(2):219-246.


We tested our system on two tasks in partially observable environments. The first task is comparatively simple; it serves to exemplify how HQ discovers and stabilizes appropriate subgoal combinations. It requires finding a path from start to goal in a partially observable $10 \times 10$ maze, and can be solved collectively by three or more agents. We study how system performance changes as more agents are added. The second task is considerably more complex: it involves finding a key that opens a door blocking the path to the goal. Its optimal solution, which requires at least 3 agents, costs 83 steps.


Juergen Schmidhuber 2003-02-24
