Angelic Hierarchical Planning: Optimal and Online Algorithms

Abstract

High-level actions (HLAs) are essential tools for coping with the large search spaces and long decision horizons encountered in real-world decision making. In a recent paper, we proposed an "angelic" semantics for HLAs that supports proofs that a high-level plan will (or will not) achieve a goal, without first reducing the plan to primitive action sequences. This paper extends the angelic semantics with cost information to support proofs that a high-level plan is (or is not) optimal. We describe the Angelic Hierarchical A* algorithm, which generates provably optimal plans, and show its advantages over alternative algorithms. We also present the Angelic Hierarchical Learning Real-Time A* algorithm for situated agents, one of the first algorithms to do hierarchical lookahead in an online setting. Since high-level plans are much shorter, this algorithm can look much farther ahead than previous algorithms (and thus choose much better actions) for a given amount of computational effort.

Key takeaways

  • To make a high-level sequence more concrete, we may refine it by replacing one of its HLAs with one of its immediate refinements; we call one plan a refinement of another if it is reachable by any sequence of such steps.
  • As with exact descriptions, we can extend optimistic and pessimistic descriptions and then compose them to produce bounds on the outcomes of high-level sequences, which we call optimistic and pessimistic valuations (see Figure 1(c/d)).
  • Since our planning algorithms will try to find low-cost solutions, we will be most concerned with finding optimistic (and pessimistic) bounds on the cost of the best primitive refinement of each high-level plan that reaches t. These bounds can be extracted directly from the final ALT node of each plan; for instance, the optimistic and pessimistic costs to t of plan (Nav(0, 0), F, Go(0, 1), Z) are [5,7].
  • AHA* (see Algorithm 1) is essentially A* in refinement space, where the initial node is the plan (Act), possible "actions" are refinements of a plan at some HLA, and the goal set consists of the primitive plans that reach t from s_0.
  • The minimum optimistic cost of any refinement of HLA a is a valid optimistic cost for a's current optimistic reachable set, and when pessimistic descriptions are consistent, the maximum such pessimistic cost is similarly valid.
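
The idea of "A* in refinement space" can be illustrated with a minimal sketch. This is not the paper's Algorithm 1. It ignores state entirely, so every fully primitive plan is assumed to reach t, and it omits pessimistic bounds and the pruning they enable. The hierarchy, the action names, and the cost tables below are hypothetical and chosen only for illustration. The one property the sketch relies on is that each optimistic cost is a lower bound on the cost of the HLA's best primitive refinement.

```python
import heapq
from itertools import count

# Hypothetical toy hierarchy (NOT from the paper). State is ignored, so every
# fully primitive plan is assumed to reach the goal t.
PRIMITIVE_COST = {"drive": 2, "park": 1, "bus": 5,
                  "carry": 4, "cart": 1, "unload": 2}
REFINEMENTS = {
    "Act": [("Travel", "Deliver")],
    "Travel": [("drive", "park"), ("bus",)],
    "Deliver": [("carry",), ("cart", "unload")],
}
# Assumed optimistic costs: lower bounds on each HLA's best primitive refinement.
OPTIMISTIC = {"Act": 4, "Travel": 2, "Deliver": 3}


def optimistic_cost(plan):
    """Sum exact primitive costs and optimistic bounds for remaining HLAs."""
    return sum(PRIMITIVE_COST[a] if a in PRIMITIVE_COST else OPTIMISTIC[a]
               for a in plan)


def refinement_space_astar(root=("Act",)):
    """Best-first search over plans, ordered by optimistic cost."""
    tie = count()  # tie-breaker so the heap never compares plans directly
    frontier = [(optimistic_cost(root), next(tie), root)]
    seen = {root}
    while frontier:
        bound, _, plan = heapq.heappop(frontier)
        # Index of the first HLA in the plan, or None if the plan is primitive.
        idx = next((i for i, a in enumerate(plan) if a in REFINEMENTS), None)
        if idx is None:
            # For a primitive plan the bound equals its true cost.
            return plan, bound
        for refinement in REFINEMENTS[plan[idx]]:
            child = plan[:idx] + refinement + plan[idx + 1:]
            if child not in seen:
                seen.add(child)
                heapq.heappush(
                    frontier, (optimistic_cost(child), next(tie), child))
    return None, float("inf")


best_plan, best_cost = refinement_space_astar()
print(best_plan, best_cost)
```

Because the bound on every plan never exceeds the cost of any of its primitive refinements, the first primitive plan popped from the queue is optimal within this toy hierarchy. In the sketch the plan ("bus", "Deliver") has bound 8 and is never expanded, which shows how optimistic costs let the search skip whole families of primitive plans.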