High-level actions (HLAs) are essential tools for coping with the large search spaces and long decision horizons encountered in real-world decision making. In a recent paper, we proposed an "angelic" semantics for HLAs that supports proofs that a high-level plan will (or will not) achieve a goal, without first reducing the plan to primitive action sequences. This paper extends the angelic semantics with cost information to support proofs that a high-level plan is (or is not) optimal. We describe the Angelic Hierarchical A* algorithm, which generates provably optimal plans, and show its advantages over alternative algorithms. We also present the Angelic Hierarchical Learning Real-Time A* algorithm for situated agents, one of the first algorithms to do hierarchical lookahead in an online setting. Since high-level plans are much shorter, this algorithm can look much farther ahead than previous algorithms (and thus choose much better actions) for a given amount of computational effort.
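The core search idea can be sketched compactly: plans are expanded best-first by an optimistic (lower-bound) cost, and a fully primitive plan popped from the frontier is provably optimal, since its optimistic cost is exact. The sketch below is illustrative only, with hypothetical names; the paper's algorithm additionally uses pessimistic (upper-bound) costs to prune and to commit to subplans early, which is omitted here.

```python
import heapq
import itertools

def angelic_astar(initial_plan, refinements, optimistic_cost, is_primitive):
    """Best-first search over high-level plans, ordered by optimistic cost.
    If the cheapest plan on the frontier is fully primitive, its optimistic
    cost equals its true cost, so no other plan can beat it."""
    counter = itertools.count()  # heap tie-breaker
    frontier = [(optimistic_cost(initial_plan), next(counter), list(initial_plan))]
    while frontier:
        cost, _, plan = heapq.heappop(frontier)
        hla = next((i for i, a in enumerate(plan) if not is_primitive(a)), None)
        if hla is None:
            return plan, cost  # all actions primitive: provably optimal
        for ref in refinements(plan[hla]):  # expand one high-level action
            child = plan[:hla] + list(ref) + plan[hla + 1:]
            heapq.heappush(frontier, (optimistic_cost(child), next(counter), child))
    return None, float("inf")
```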
High-level actions (HLAs) lie at the heart of hierarchical planning. Typically, an HLA admits multiple refinements into primitive action sequences. Correct descriptions of the effects of HLAs may be essential to their effective use, yet the literature is mostly silent. We propose an angelic semantics for HLAs, the key concept of which is the set of states reachable by some refinement of a high-level plan, representing uncertainty that will ultimately be resolved in the planning agent's own best interest. We describe upper and lower approximations to these reachable sets, and show that the resulting definition of a high-level solution automatically satisfies the upward and downward refinement properties. We define a STRIPS-like notation for such descriptions. A sound and complete hierarchical planning algorithm is given and its computational benefits are demonstrated.
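A small illustration of the decision the angelic semantics supports, using hypothetical names and hashable states: a high-level plan is provably a solution if even its lower (pessimistic) approximation of the reachable set intersects the goal, and provably not a solution if its upper (optimistic) approximation misses the goal entirely.

```python
def reachable(plan, start_states, approx):
    """Push a set of states through a plan; approx maps each action name
    to a function from one state to the set of states it may reach."""
    states = set(start_states)
    for action in plan:
        states = set().union(*(approx[action](s) for s in states))
    return states

def classify(plan, start_states, goal, upper, lower):
    if reachable(plan, start_states, lower) & goal:
        return "provably a solution"        # some refinement reaches the goal
    if not (reachable(plan, start_states, upper) & goal):
        return "provably not a solution"    # no refinement can reach the goal
    return "undecided: refine further"
```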
arXiv (Cornell University), 2022
Two common approaches to sequential decision-making are AI planning (AIP) and reinforcement learning (RL). Each has strengths and weaknesses. AIP is interpretable, easy to integrate with symbolic knowledge, and often efficient, but requires an up-front logical domain specification and is sensitive to noise; RL only requires specification of rewards and is robust to noise but is sample-inefficient and not easily supplied with external knowledge. We propose an integrative approach that combines high-level planning with RL, retaining interpretability, transfer, and efficiency, while allowing for robust learning of the lower-level plan actions. Our approach defines options in hierarchical reinforcement learning (HRL) from AIP operators by establishing a correspondence between the state transition model of an AI planning problem and the abstract state transition system of a Markov Decision Process (MDP). Options are learned by adding intrinsic rewards to encourage consistency between the MDP and AIP transition models. We demonstrate the benefit of our integrated approach by comparing the performance of RL and HRL algorithms in both MiniGrid and N-rooms environments, showing the advantage of our method over existing ones.
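A hedged sketch of how an option can be derived from a STRIPS-style operator, as the abstract describes: the operator's preconditions give the initiation set, its effects give the termination condition, and an intrinsic reward encourages the learned policy to realize those effects. All class and method names here are illustrative, not the paper's code.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Operator:
    name: str
    preconditions: frozenset  # atoms that must hold to start the option
    effects: frozenset        # atoms that should hold when it terminates

@dataclass(frozen=True)
class PlanningOption:
    op: Operator

    def can_initiate(self, abstract_state):
        return self.op.preconditions <= abstract_state

    def is_terminal(self, abstract_state):
        return self.op.effects <= abstract_state

    def intrinsic_reward(self, next_abstract_state):
        # Reward consistency between the MDP's abstract transition and the
        # AIP operator's declared effects; small step cost otherwise.
        return 1.0 if self.op.effects <= next_abstract_state else -0.01
```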
Proceedings of the International Conference on Automated Planning and Scheduling
Reinforcement learning (RL) agents seek to maximize the cumulative reward obtained when interacting with their environment. Users define tasks or goals for RL agents by designing specialized reward functions such that maximization aligns with task satisfaction. This work explores the use of high-level symbolic action models as a framework for defining final-state goal tasks and automatically producing their corresponding reward functions. We also show how automated planning can be used to synthesize high-level plans that can guide hierarchical RL (HRL) techniques towards efficiently learning adequate policies. We provide a formal characterization of taskable RL environments and describe sufficient conditions that guarantee we can satisfy various notions of optimality (e.g., minimize total cost, maximize probability of reaching the goal). In addition, an empirical evaluation shows that our approach converges to near-optimal solutions faster than standard RL and HRL methods...
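One way to read "automatically producing their corresponding reward functions" is the following hedged sketch: given a set of goal atoms and a labeling function from low-level states to the atoms true in them, a final-state goal task induces a per-step cost until the goal holds, so minimizing total cost aligns with the total-cost notion of optimality mentioned above. The names below are assumptions, not the paper's API.

```python
def make_goal_reward(goal_atoms, labeler, step_cost=-1.0, goal_reward=0.0):
    """Build a reward function for a final-state goal task.
    goal_atoms: ground atoms that must all hold in the final state.
    labeler:    maps an environment state to the set of atoms true in it."""
    def reward(state, action, next_state):
        if goal_atoms <= labeler(next_state):
            return goal_reward  # goal satisfied; the episode should terminate
        return step_cost        # otherwise pay a step cost
    return reward
```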
2011
In this paper we present a method that allows an agent to discover and create temporal abstractions autonomously. Our method is based on the idea that, to reach the goal, the agent must pass through relevant states that we interpret as subgoals. To detect useful subgoals, our method computes intersections between several paths leading to a goal. Our research focuses on domains widely used in the study of temporal abstractions; we used several versions of the room-to-room navigation problem.
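The intersection idea lends itself to a very small sketch (hypothetical names; states must be hashable): states that lie on most of the goal-reaching paths are promising subgoal candidates.

```python
from collections import Counter

def discover_subgoals(successful_paths, min_fraction=0.8):
    """Candidate subgoals = states appearing on at least min_fraction of the
    goal-reaching paths (each path counted once per state), goal excluded."""
    counts = Counter()
    for path in successful_paths:
        counts.update(set(path))
    threshold = min_fraction * len(successful_paths)
    goal = successful_paths[0][-1]
    return {s for s, c in counts.items() if c >= threshold and s != goal}
```

In a room-to-room navigation domain of the kind the paper uses, doorway states tend to survive this filter, since nearly every successful path passes through them.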
Proceedings of the third annual conference on Autonomous Agents - AGENTS '99, 1999
Uncertain and complex environments demand that an agent be able to anticipate the actions of others in order to avoid resource conflicts with them and to realize its goals. Conflicts during plan execution can be avoided by reducing or eliminating interactions (localizing plan effects to particular agents) and by merging/coordinating the individual plans of agents through the introduction of synchronization actions. We describe a method for coordinating plans at abstract levels that takes advantage of hierarchical representations of plan information and retains the flexibility of plans used in robust plan execution systems such as procedural reasoning systems (PRS). To coordinate at abstract levels in plan hierarchies, information about how abstract plans can be refined must be available so that potential conflicts can be identified and avoided.
AAAI 2011 Workshop on Generalized Planning, 2011
In this paper, we propose a new approach to using probabilistic hierarchical task networks (HTNs) as an effective method for agents to plan under conditions in which their problem-solving knowledge is uncertain and the environment is nondeterministic. In such situations it is natural to model the environment as a Markov Decision Process (MDP). We show that using Earley graphs, it is possible to bridge the gap between HTNs and MDPs. We prove that the size of the Earley graph created for any given HTN is bounded by the total ...
Proceedings of the fifth international joint conference on Autonomous agents and multiagent systems - AAMAS '06, 2006
Factored representations, model-based learning, and hierarchies are well-studied techniques for improving the learning efficiency of reinforcement-learning algorithms in large-scale state spaces. We bring these three ideas together in a new algorithm that tackles two open problems from the reinforcement-learning literature and solves them in deterministic domains. First, we show how models can improve learning speed in the hierarchy-based MaxQ framework without disrupting opportunities for state abstraction. Second, we show how hierarchies can augment existing factored exploration algorithms to achieve not only low sample complexity for learning, but provably efficient planning as well. We illustrate the resulting performance gains in example domains, and prove polynomial bounds on the computational effort needed to attain near-optimal performance within the hierarchy.
1998
One big obstacle to understanding the nature of hierarchical task network (HTN) planning has been the lack of a clear theoretical framework. In particular, no one has yet presented a clear and concise HTN algorithm that is sound and complete. In this paper, we present a formal syntax and semantics for HTN planning. Based on this syntax and semantics, we are able to define an algorithm for HTN planning and prove it sound and complete. We also develop several definitions of expressivity for planning languages and prove that HTN planning is strictly more expressive than STRIPS-style planning according to those definitions.
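The flavor of such an algorithm, restricted for brevity to the total-order case, can be sketched as follows; this is an illustration with hypothetical names, not the paper's formal algorithm.

```python
def htn_search(state, tasks, methods, operators):
    """Depth-first HTN decomposition.
    methods:   compound task -> list of possible subtask lists
    operators: primitive task -> (applicable, apply) pair of functions
    Returns a primitive plan (list of task names) or None."""
    if not tasks:
        return []
    head, rest = tasks[0], tasks[1:]
    if head in operators:                   # primitive: execute if applicable
        applicable, apply_op = operators[head]
        if not applicable(state):
            return None
        tail = htn_search(apply_op(state), rest, methods, operators)
        return None if tail is None else [head] + tail
    for subtasks in methods.get(head, []):  # compound: try each decomposition
        plan = htn_search(state, list(subtasks) + rest, methods, operators)
        if plan is not None:
            return plan
    return None
```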
2017
There has been much recent interest in the topic of goal reasoning: where do an agent’s goals come from, and how is it decided which to pursue? Previous work has described goal reasoning as a unique and separate process, apart from previously studied AI functionalities. In this paper, we argue for an alternative view: that goal reasoning can be thought of as multilevel planning. We demonstrate that scenarios previously argued to support the need for goal reasoning can be handled easily by an on-line planner, and we sketch a view of how more complex situations might be handled by multiple planners working at different levels of abstraction. By considering goal reasoning as a form of planning, we simplify the AI research agenda and highlight promising avenues for future planning research.
2003
Reinforcement learning is bedeviled by the curse of dimensionality: the number of parameters to be learned grows exponentially with the size of any compact encoding of a state. Recent attempts to combat the curse of dimensionality have turned to principled ways of exploiting temporal abstraction, where decisions are not required at each step, but rather invoke the execution of temporally-extended activities which follow their own policies until termination. This leads naturally to hierarchical control architectures and associated learning algorithms. We review several approaches to temporal abstraction and hierarchical organization that machine learning researchers have recently developed. Common to these approaches is a reliance on the theory of semi-Markov decision processes, which we emphasize in our review. We then discuss extensions of these ideas to concurrent activities, multiagent coordination, and hierarchical memory for addressing partial observability.
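The common core of these approaches is the options framework over a semi-Markov decision process: an option is a triple (I, π, β) of initiation set, internal policy, and termination condition, and the SMDP Q-learning backup after an option runs for k steps is Q(s,o) ← Q(s,o) + α[r + γ^k max over o' of Q(s',o') - Q(s,o)], where r is the discounted reward accumulated during the option. A minimal sketch, assuming each option exposes a hypothetical can_initiate(state) predicate for its initiation set:

```python
def smdp_q_update(Q, s, o, reward_sum, s_next, k, options, alpha=0.1, gamma=0.99):
    """One SMDP Q-learning backup after option o ran k steps from s to s_next.
    reward_sum = sum over i of gamma**i * r_i for i = 0..k-1.
    Q is a dict keyed by (state, option); options must be hashable."""
    bootstrap = max(
        (Q.get((s_next, o2), 0.0) for o2 in options if o2.can_initiate(s_next)),
        default=0.0,
    )
    old = Q.get((s, o), 0.0)
    Q[(s, o)] = old + alpha * (reward_sum + gamma ** k * bootstrap - old)
```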
arXiv, 2022
AI planning and reinforcement learning (RL) both solve sequential decision-making problems, under different formulations. AI planning requires operator models but then allows efficient plan generation; RL requires no operator model and instead learns a policy that guides an agent to high-reward states. Planning can be brittle in the face of noise, whereas RL is more tolerant; however, RL requires a large number of training examples to learn the policy. In this work, we aim to bring AI planning and RL closer by showing that a suitably defined planning model can be used to improve the efficiency of RL. Specifically, we show that options in hierarchical RL can be derived from a planning task, and we integrate planning and RL algorithms for training option policy functions. Our experiments demonstrate improved sample efficiency on a variety of RL environments over the previous state of the art.
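How a plan might drive option training at execution time, as a hedged sketch: the planner's solution fixes the sequence of options, and each option's policy is updated while it runs. The gym-style env interface and the abstract_state hook are assumptions for illustration, not the paper's API.

```python
def run_plan_with_options(env, plan, options, train_step):
    """Execute a symbolic plan by running one learned option per plan step.
    plan:       list of operator names, produced by the planner
    options:    name -> option with .policy(state) and .is_terminal(abstract_state)
    train_step: callback performing one RL update for the active option"""
    state = env.reset()
    for name in plan:
        opt = options[name]
        while not opt.is_terminal(env.abstract_state(state)):
            action = opt.policy(state)
            next_state, reward, done, _ = env.step(action)
            train_step(opt, state, action, reward, next_state)
            state = next_state
            if done:
                return state
    return state
```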
Autonomous Robots
We often specify tasks for a robot using temporal language that can include different levels of abstraction. For example, the command "go to the kitchen before going to the second floor" contains spatial abstraction, given that "floor" consists of individual rooms that can also be referred to in isolation ("kitchen", for example). There is also a temporal ordering of events, defined by the word "before". Previous works have used syntactically co-safe Linear Temporal Logic (sc-LTL) to interpret temporal language (such as "before"), and Abstract Markov Decision Processes (AMDPs) to interpret hierarchical abstractions (such as "kitchen" and "second floor"), separately. To handle both types of commands at once, we introduce the Abstract Product Markov Decision Process (AP-MDP), a novel approach capable of representing non-Markovian reward functions at different levels of abstraction. The AP-MDP framework translates LTL into its corresponding automata, creates a product Markov Decision Process (MDP) of the LTL specification and the environment MDP, and decomposes the problem into subproblems to enable efficient planning with abstractions. AP-MDP performs faster than a non-hierarchical method of solving LTL problems in over 95% of path planning tasks, and this number only increases as the size of the environment domain increases. In a cleanup world domain, AP-MDP performs faster in over 98% of tasks. We also present a neural sequence-to-sequence model trained to translate language commands into LTL expression, and a new corpus of non-Markovian language commands spanning different levels of abstraction. We test our framework with the collected language commands on two drones, demonstrating that our approach enables robots to efficiently solve temporal commands at different levels of abstraction in both indoor and outdoor environments.
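The heart of the product construction can be sketched in a few lines (hypothetical data layout: the automaton as a transition dict plus a set of accepting states): each product state pairs an environment state with an automaton state, the automaton advances on the labels true in the successor state, and reaching an accepting automaton state signals satisfaction of the LTL task.

```python
def product_step(mdp_step, automaton, labeler, product_state, action):
    """One transition of the product MDP for an sc-LTL task.
    mdp_step:  (env_state, action) -> next env_state
    automaton: {"delta": {(aut_state, frozenset_of_labels): next_aut_state},
                "accepting": set_of_aut_states}
    labeler:   env_state -> set of atomic propositions true in it"""
    env_s, aut_s = product_state
    env_next = mdp_step(env_s, action)
    aut_next = automaton["delta"][(aut_s, frozenset(labeler(env_next)))]
    reward = 1.0 if aut_next in automaton["accepting"] else 0.0
    return (env_next, aut_next), reward
```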
To be useful in solving real-world problems, agents need to be able to act in environments in which it may not be possible to be completely aware of the current state and where actions do not always work as planned. Additional complexity is added to the problem when one considers groups of agents working together. By casting the agent planning problem as a partially observable Markov decision problem (POMDP), optimal policies can be generated for partially observable and stochastic environments. Exact solutions, however, are notoriously difficult to find for problems of a realistic nature. We introduce a hierarchical decision-network-based planning algorithm that can generate high-quality plans during execution while demonstrating significant time savings. We also discuss how this approach is particularly applicable to planning in a multiagent environment as compared to other POMDP-based planning algorithms. We present experimental results comparing our algorithm with results obtain...
2006
This paper provides a general mechanism and a solid theoretical basis for performing planning within Belief-Desire-Intention (BDI) agents. BDI agent systems have emerged as one of the most widely used approaches to implementing intelligent behaviour in complex dynamic domains, in addition to which they have a strong theoretical background. However, these systems either do not include any built-in capacity for "lookahead" planning or they do it only at the implementation level, without any precisely defined semantics. In some situations, the ability to plan ahead is clearly desirable or even mandatory for ensuring success. Also, a precise definition of how planning can be integrated into a BDI system is highly desirable. By building on the underlying similarities between BDI systems and Hierarchical Task Network (HTN) planners, we present a formal semantics for a BDI agent programming language which cleanly incorporates HTN-style planning as a built-in feature. We argue that the resulting integrated agent programming language combines the advantages of both BDI agent systems and hierarchical offline planners.
2018
Hierarchical Task Network (HTN) planning is a proven approach to solving complex, real world planning problems more efficiently than planning from first principles when 'standard operating procedures' (or 'recipes') can be supplied by the user. By planning for tasks in the same order that they are later executed, total-order HTN planners always know the complete state of the world at each planning step. This enables writing more expressive planning domains than what is possible in partial-order HTN planning, such as preconditions with calls to external procedures. Such features have facilitated the use of total-order HTN planners in agent systems and seen them excel in AI games. This paper describes the Hierarchical Agent-based Task Planner (HATP), a total-order HTN planner. Since its first implementation, HATP has had various extensions and integrations over the years, such as support for splitting a solution into multiple streams and assigning them to the agents in the...
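The expressiveness point about total-order planning deserves a tiny illustration: because the full current state is known at every decomposition step, a precondition can simply call out to arbitrary external code. The domain and names below are invented for illustration, not taken from HATP.

```python
def truck_can_reach(state, origin, dest):
    # An "external procedure" usable inside a precondition. Here it is a
    # trivial road-map lookup, but it could equally call a route planner, a
    # database, or any other code, since the concrete state is fully known.
    return dest in state["roads"].get(origin, ())

def drive_precondition(state, truck, dest):
    # Precondition of a hypothetical drive(truck, dest) task.
    return truck_can_reach(state, state["at"][truck], dest)
```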
IEE Proceedings - Control Theory and Applications, 1995
The hierarchical non-linear planner, AbNLP, is introduced and its main features described, including the novel mechanisms by which abstraction is encapsulated within abstract operators, and possible interactions between developing levels of plan description are resolved. AbNLP has been developed in the rigorous tradition of STRIPS and TWEAK, and is therefore proposed as a foundation for the development of more powerful hierarchical planners. The main objective of the paper is to present a complete formal specification of the operational behaviour of the goal achievement functions and of the hierarchical refinement strategy employed by AbNLP. AbNLP is presented as a correct foundation for the construction of hierarchical planners. It is proposed that the refinement strategy used constitutes a powerful heuristic weapon against the inherent complexity of planning.
2010
It is well known that there cannot be a single "best" heuristic for optimal planning in general. One way of overcoming this is by combining admissible heuristics (e.g. by using their maximum), which requires computing numerous heuristic estimates at each state. However, there is a tradeoff between the time spent on computing these heuristic estimates for each state, and the time saved by reducing the number of expanded states. We present a novel method that reduces the cost of combining admissible heuristics for optimal search, while maintaining its benefits. Based on an idealized search space model, we formulate a decision rule for choosing the best heuristic to compute at each state. We then present an active online learning approach for that decision rule, and employ the learned model to decide which heuristic to compute at each state. We evaluate this technique empirically, and show that it substantially outperforms each of the individual heuristics that were used, as well as their regular maximum.
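The decision rule reduces, per state, to choosing whether the expensive heuristic is worth its computation time. A hedged sketch (illustrative names) of how a learned rule plugs into the combination:

```python
def selective_max_heuristic(state, h_cheap, h_expensive, worth_computing):
    """Lazily combine two admissible heuristics: always compute the cheap one,
    compute the expensive one only when the learned rule predicts it will raise
    the estimate enough to pay for itself. The max of admissible heuristics is
    still admissible, so optimality of the search is preserved."""
    value = h_cheap(state)
    if worth_computing(state, value):  # learned classifier / decision rule
        value = max(value, h_expensive(state))
    return value
```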