We tested our system on two tasks in
partially observable environments.
The first task is comparatively simple; it serves to illustrate
how HQ discovers and stabilizes appropriate subgoal combinations.
It requires finding a path from
start to goal in a partially observable
10 × 10 maze, and can be collectively solved by
three or more agents. We study system performance as
more agents are added.
The second, quite complex task involves
finding a key which opens a door blocking the path to the goal.
The optimal solution (which requires at least 3 agents)
costs 83 steps.
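
To make "partially observable" concrete, the following minimal sketch shows how distinct maze positions can produce identical observations. It is an illustration only. The 5 × 5 layout, the function names `observe` and `aliased_positions`, and the observation model (the agent senses only which of its four neighbouring cells are walls) are assumptions introduced here, not details taken from the experiments described above.

```python
# Hypothetical 5x5 maze for illustration only (not the 10 x 10 maze of the first task).
# '#' marks a wall, '.' marks a free cell.
MAZE = [
    "#####",
    "#...#",
    "#.#.#",
    "#...#",
    "#####",
]

# Assumed observation model: one bit per neighbour, in the order N, E, S, W.
OFFSETS = [(-1, 0), (0, 1), (1, 0), (0, -1)]


def observe(row, col):
    """Return a 4-bit code whose set bits mark neighbouring wall cells."""
    bits = 0
    for i, (dr, dc) in enumerate(OFFSETS):
        if MAZE[row + dr][col + dc] == "#":
            bits |= 1 << i
    return bits


def aliased_positions():
    """Group free cells by observation; keep groups with more than one cell."""
    groups = {}
    for r, line in enumerate(MAZE):
        for c, ch in enumerate(line):
            if ch == ".":
                groups.setdefault(observe(r, c), []).append((r, c))
    return {obs: cells for obs, cells in groups.items() if len(cells) > 1}


if __name__ == "__main__":
    for obs, cells in aliased_positions().items():
        print(f"observation {obs:04b} is shared by cells {cells}")
```

In this toy layout the cells (2, 1) and (2, 3) look identical to the agent, so a purely reactive policy must choose the same action in both. Ambiguities of this kind are what make a decomposition into subgoals useful.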