Papers by Stuart J. Russell

High-level actions (HLAs) are essential tools for coping with the large search spaces and long decision horizons encountered in real-world decision making. In a recent paper, we proposed an "angelic" semantics for HLAs that supports proofs that a high-level plan will (or will not) achieve a goal, without first reducing the plan to primitive action sequences. This paper extends the angelic semantics with cost information to support proofs that a high-level plan is (or is not) optimal. We describe the Angelic Hierarchical A* algorithm, which generates provably optimal plans, and show its advantages over alternative algorithms. We also present the Angelic Hierarchical Learning Real-Time A* algorithm for situated agents, one of the first algorithms to do hierarchical lookahead in an online setting. Since high-level plans are much shorter, this algorithm can look much farther ahead than previous algorithms (and thus choose much better actions) for a given amount of computational effort.
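
To make the role of cost bounds concrete, here is a minimal Python sketch of bound-driven search over high-level plans; the names and the plan/refinement interface are illustrative assumptions, not taken from the paper. Each plan carries an optimistic and a pessimistic cost bound, plans whose optimistic bound exceeds the best pessimistic bound so far are pruned, and a fully primitive plan popped with the lowest optimistic bound is provably optimal.

    import heapq
    import itertools

    def angelic_search(initial_plan, refinements, bounds, is_primitive):
        """Bound-driven search over high-level plans (illustrative sketch).

        bounds(plan)       -> (optimistic, pessimistic) cost of achieving the goal
        refinements(plan)  -> plans obtained by refining one high-level action
        is_primitive(plan) -> True if plan contains only primitive actions
        """
        tie = itertools.count()                      # tie-breaker for the heap
        opt0, pess0 = bounds(initial_plan)
        best_pessimistic = pess0                     # cost guaranteed achievable so far
        frontier = [(opt0, next(tie), initial_plan)]
        while frontier:
            opt, _, plan = heapq.heappop(frontier)
            if opt > best_pessimistic:               # provably dominated: prune
                continue
            if is_primitive(plan):
                return plan                          # optimistic bound is exact here, hence optimal
            for ref in refinements(plan):
                o, p = bounds(ref)
                best_pessimistic = min(best_pessimistic, p)
                if o <= best_pessimistic:
                    heapq.heappush(frontier, (o, next(tie), ref))
        return None                                  # no plan achieves the goal
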
An efficient policy search algorithm should estimate the local gradient of the objective function, with respect to the policy parameters, from as few trials as possible. Whereas most policy search methods estimate this gradient by observing the rewards obtained during policy trials, we show, both theoretically and empirically, that taking into account the sensor data as well gives better gradient estimates and hence faster learning. The reason is that rewards obtained during policy execution vary from trial to trial due to noise in the environment; sensor data, which correlates with the noise, can be used to partially correct for this variation, resulting in an estimator with lower variance.
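
One way to realize this idea, not necessarily the estimator used in the paper, is to treat the sensor data as a control variate: fit a simple predictor of the return from per-trial sensor features and subtract it before forming a REINFORCE-style gradient estimate. The sketch below assumes precomputed per-trial score functions and sensor features; all names are illustrative, and it ignores the bias issues a careful treatment must address.

    import numpy as np

    def gradient_estimate(score, returns, sensor_features):
        """REINFORCE-style gradient estimate with a sensor-based correction (sketch).

        score           : (N, d) array, sum over each trial of grad_theta log pi(a|s)
        returns         : (N,)  array, total reward observed in each trial
        sensor_features : (N, k) array, features summarising the sensor data per trial

        A linear predictor of the return is fitted from the sensor features and
        subtracted, so trial-to-trial variation explained by the observed noise
        no longer inflates the variance of the estimator.
        """
        X = np.column_stack([np.ones(len(returns)), sensor_features])
        coef, *_ = np.linalg.lstsq(X, returns, rcond=None)
        corrected = returns - X @ coef               # lower-variance "corrected" returns
        return (score * corrected[:, None]).mean(axis=0)
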
We describe an application of probabilistic modeling and inference technology to the problem of analyzing sensor data in the setting of an intensive care unit (ICU). In particular, we consider the arterial-line blood pressure sensor, which is subject to frequent data artifacts that cause false alarms in the ICU and make the raw data almost useless for automated decision making. The problem is complicated by the fact that the sensor data are acquired at fixed intervals whereas the events causing data artifacts may occur at any time and have durations that may be significantly shorter than the data collection interval. We show that careful modeling of the sensor, combined with a general technique for detecting sub-interval events and estimating their duration, enables effective detection of artifacts and accurate estimation of the underlying blood pressure values.
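
As a toy illustration of the sub-interval issue (a hypothetical model, not the paper's), suppose the recorded value for an interval averages the true blood pressure with an artifact value in proportion to the fraction of the interval the artifact was active; detecting the artifact and estimating its duration then amounts to inferring that fraction.

    import numpy as np

    def reading_model(true_bp, artifact_value, active_fraction, noise_sd, rng):
        """Hypothetical sensor model for one sampling interval: the recorded value
        averages the true blood pressure and an artifact value (e.g. a flushed or
        zeroed line) in proportion to how long the artifact was active."""
        mean = (1.0 - active_fraction) * true_bp + active_fraction * artifact_value
        return mean + rng.normal(0.0, noise_sd)

    def duration_posterior(reading, true_bp, artifact_value, noise_sd, grid=101):
        """Grid-approximate posterior over the sub-interval artifact duration,
        assuming a uniform prior on the active fraction (purely illustrative)."""
        fractions = np.linspace(0.0, 1.0, grid)
        means = (1.0 - fractions) * true_bp + fractions * artifact_value
        loglik = -0.5 * ((reading - means) / noise_sd) ** 2
        post = np.exp(loglik - loglik.max())
        return fractions, post / post.sum()
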
It is well known that eliminating repeated states is essential for efficient search of state-space AND-OR graphs. The same technique has been found useful for searching belief-state AND-OR graphs, which arise in nondeterministic partially observable planning problems and in partially observable games. Whereas physical states are viewed by search algorithms as atomic and admit only equality tests, belief states, which are sets of possible physical states, have additional structure: one belief state can subsume or be subsumed by another. This paper presents new algorithms that exploit this property to achieve substantial speedups. The algorithms are demonstrated on Kriegspiel checkmate problems and on a partially observable vacuum world domain.
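
A minimal sketch of the subsumption idea follows (the paper's algorithms are more elaborate; names and the cache structure are assumptions): cache belief states already proved solvable or unsolvable, and answer repeated-state queries by subset tests rather than equality. If a known-solvable belief state contains the query, the same strategy works for it; if the query contains a known-unsolvable belief state, no strategy can work for all of it.

    def make_cache():
        return {"solvable": [], "unsolvable": []}    # lists of frozensets of physical states

    def lookup(cache, belief):
        """Subsumption-aware repeated-state check (illustrative sketch).
        `belief` is a set of physical states; returns True/False if a cached
        belief state settles the question, or None if we must search."""
        belief = frozenset(belief)
        if any(belief <= b for b in cache["solvable"]):
            return True                              # a solvable superset is known
        if any(b <= belief for b in cache["unsolvable"]):
            return False                             # it contains an unsolvable subset
        return None

    def record(cache, belief, solvable):
        cache["solvable" if solvable else "unsolvable"].append(frozenset(belief))
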
High-level actions (HLAs) lie at the heart of hierarchical planning. Typically, an HLA admits multiple refinements into primitive action sequences. Correct descriptions of the effects of HLAs may be essential to their effective use, yet the literature is mostly silent. We propose an angelic semantics for HLAs, the key concept of which is the set of states reachable by some refinement of a high-level plan, representing uncertainty that will ultimately be resolved in the planning agent's own best interest. We describe upper and lower approximations to these reachable sets, and show that the resulting definition of a high-level solution automatically satisfies the upward and downward refinement properties. We define a STRIPS-like notation for such descriptions. A sound and complete hierarchical planning algorithm is given and its computational benefits are demonstrated.
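
The following sketch (invented interfaces, not the paper's notation) shows how upper and lower approximations to the reachable sets support high-level proofs: if the pessimistic reachable set intersects the goal, some refinement of the plan is guaranteed to succeed; if even the optimistic set misses the goal, no refinement can.

    def classify_plan(plan, initial_states, goal_states, optimistic, pessimistic):
        """Classify a high-level plan via approximate reachable sets (sketch).

        optimistic(a, S)  -> superset of the states reachable from S by some
                             refinement of HLA `a`
        pessimistic(a, S) -> subset of those states
        States are hashable values; all sets are ordinary Python sets."""
        opt, pess = set(initial_states), set(initial_states)
        for action in plan:
            opt = optimistic(action, opt)
            pess = pessimistic(action, pess)
        if pess & goal_states:
            return "provably a solution"        # some refinement surely reaches the goal
        if not (opt & goal_states):
            return "provably not a solution"    # no refinement can reach the goal
        return "undetermined: refine further"
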

High-level actions (HLAs) lie at the heart of hierarchical planning. Typically, an HLA admits multiple refinements into primitive action sequences. Correct descriptions of the effects of HLAs may be essential to their effective use, yet the literature is mostly silent. We propose an angelic semantics for HLAs, the key concept of which is the set of states reachable by some refinement of a high-level plan, representing uncertainty that will ultimately be resolved in the planning agent's own best interest. We describe upper and lower approximations to these reachable sets, and show that the resulting definition of a high-level solution automatically satisfies the upward and downward refinement properties. We define a STRIPS-like notation for such descriptions. A sound and complete hierarchical planning algorithm is given and its computational benefits are demonstrated. This is an extended version of a paper by the same name appearing in ICAPS '07.
First-Order Probabilistic Languages: Into the Unknown

Filtering denotes any method whereby an agent updates its belief state, that is, its knowledge of the state of the world, from a sequence of actions and observations. In logical filtering, the belief state is a logical formula describing the possible world states. Efficient algorithms for logical filtering have important implications for reasoning tasks such as planning and diagnosis. In this paper, we identify classes of transition constraints that are amenable to compact and indefinite filtering, presenting efficient algorithms where necessary. We first show that connected row-convex (CRC) constraints are amenable to efficient filtering when path consistency is enforced at appropriate steps. We then extend this theory to provide a filtering algorithm based on repeatedly enforcing path consistency and embedding the domain values of the related variables in tree structures to guarantee global consistency. Finally, we identify and comment on the problem of multi-agent localization as a potential application of the theory developed in the paper (under some reasonable assumptions).
General-Purpose MCMC Inference over Relational Structures
A Compact, Hierarchical Q-Function Decomposition

This paper presents Markov chain Monte Carlo data association (MCMCDA) for solving data association problems arising in multiple-target tracking in a cluttered environment. When the number of targets is fixed, the single-scan version of MCMCDA approximates joint probabilistic data association (JPDA). Although the exact computation of association probabilities in JPDA is NP-hard, we prove that the single-scan MCMCDA algorithm provides a fully polynomial randomized approximation scheme for JPDA. For general multiple-target tracking problems, in which unknown numbers of targets appear and disappear at random times, we present a multi-scan MCMCDA algorithm that approximates the optimal Bayesian filter. It exhibits remarkable performance compared to multiple hypothesis tracking (MHT) under extreme conditions, such as a large number of targets in a dense environment, low detection probabilities, and high false alarm rates.
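
A bare-bones illustration of the single-scan idea, assuming a user-supplied unnormalized log-probability over joint associations (at most one measurement per target): run a Metropolis chain over assignments and estimate association probabilities by empirical frequencies. This is a sketch of the general technique, not the paper's exact moves or the conditions of its approximation guarantee; the names and the omission of burn-in are simplifications.

    import numpy as np

    def mcmc_association_probs(logp, n_targets, n_meas, n_samples=10000, rng=None):
        """Single-scan MCMC estimate of association probabilities (sketch).

        logp(assoc) -> unnormalised log-probability of a joint association, where
        assoc[j] is the target assigned to measurement j, or -1 for clutter, and
        each target is assigned to at most one measurement."""
        rng = rng or np.random.default_rng()
        assoc = -np.ones(n_meas, dtype=int)              # start with everything as clutter
        counts = np.zeros((n_meas, n_targets + 1))       # last column counts clutter
        cur = logp(assoc)
        for _ in range(n_samples):
            j = rng.integers(n_meas)                     # pick a measurement
            t = rng.integers(-1, n_targets)              # propose a target, or -1 (clutter)
            if t == -1 or t not in np.delete(assoc, j):  # keep the assignment one-to-one
                prop = assoc.copy()
                prop[j] = t
                new = logp(prop)
                if np.log(rng.random()) < new - cur:     # Metropolis accept/reject
                    assoc, cur = prop, new
            counts[np.arange(n_meas), assoc] += 1        # -1 indexes the clutter column
        return counts / n_samples
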
We describe Concurrent ALisp, a language that allows the augmentation of reinforcement learning algorithms with prior knowledge about the structure of policies, and show by example how it can be used to write agents that learn to play a subdomain of the computer game Stratagus.
This paper introduces and illustrates BLOG, a formal language for defining probability models over worlds with unknown objects and identity uncertainty. BLOG unifies and extends several existing approaches. Subject to certain acyclicity constraints, every BLOG model specifies a unique probability distribution over first-order model structures that can contain varying and unbounded numbers of objects. Furthermore, complete inference algorithms exist for a large fragment of the language. We also introduce a probabilistic form of Skolemization for handling evidence.
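
BLOG specifies such models declaratively; as a rough analogue only, the hand-written Python sampler below generates a toy world with an unknown number of aircraft, per-object attributes, detections, and false alarms, so which blip came from which aircraft is uncertain. Everything here (names, distributions, parameters) is an invented example, not BLOG syntax. Inference over such a model can then answer queries such as the posterior over the number of aircraft given only the observed blips.

    import numpy as np

    def sample_world(rng=None):
        """Forward-sample a toy model with an unknown number of objects (sketch)."""
        rng = rng or np.random.default_rng()
        n_aircraft = rng.poisson(3)                          # "number statement"
        positions = rng.uniform(0, 100, size=n_aircraft)     # per-object attribute
        blips = []
        for x in positions:
            if rng.random() < 0.9:                           # detection probability
                blips.append(x + rng.normal(0.0, 1.0))       # noisy observation
        for _ in range(rng.poisson(1)):                      # false alarms
            blips.append(rng.uniform(0, 100))
        rng.shuffle(blips)                                   # source identities are unobserved
        return n_aircraft, positions, blips
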
The paper reports on new algorithms for solving partially observable games. Whereas existing algorithms apply AND-OR search to a tree of black-box belief states, our "incremental" versions treat uncertainty as a new search dimension, examining the physical states within a belief state to construct solution trees incrementally. On a newly created database of checkmate problems for Kriegspiel (a partially observable form of chess), incrementalization yields speedups of two or more orders of magnitude on hard instances.
We consider applying hierarchical reinforcement learning techniques to problems in which an agent has several effectors to control simultaneously. We argue that the kind of prior knowledge one typically has about such problems is best expressed using a multithreaded partial program, and present concurrent ALisp, a language for specifying such partial programs. We describe algorithms for learning and acting with concurrent ALisp that can be efficient even when there are exponentially many joint choices at each decision point. Finally, we show results of applying these methods to a complex computer game domain.
In many practical problems, from tracking aircraft based on radar data to building a bibliographic database based on citation lists, we want to reason about an unbounded number of unseen objects with unknown relations among them. Bayesian networks, which define a fixed dependency structure on a finite set of variables, are not the ideal representation language for this task. This paper introduces contingent Bayesian networks (CBNs), which represent uncertainty about dependencies by labeling each edge with a condition under which it is active. A CBN may contain cycles and have infinitely many variables. Nevertheless, we give general conditions under which such a CBN defines a unique joint distribution over its variables. We also present a likelihood weighting algorithm that performs approximate inference in finite time per sampling step on any CBN that satisfies these conditions.
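
To illustrate why inference can take finite time per sampling step, here is a sketch (invented interfaces, not the paper's algorithm) of lazy likelihood weighting on a contingent network: a variable instantiates only the parents whose edge conditions are currently active, so sampling never touches the inactive, possibly infinite, remainder of the network.

    import random

    def sample(var, world, parents, cpd, evidence, weight, rng=random):
        """Lazily sample one variable of a contingent Bayesian network (sketch).

        parents[var] : list of (condition, parent) pairs; the edge is active only
                       when condition(world) holds for the values sampled so far
        cpd[var]     : function world -> (sample_fn, prob_fn) for the variable's
                       distribution given its currently active, instantiated parents
        Returns the updated likelihood weight; `world` is filled in as a side effect."""
        if var in world:
            return weight
        progressed = True
        while progressed:                            # sampling one parent may activate more edges
            progressed = False
            for condition, parent in parents[var]:
                if condition(world) and parent not in world:
                    weight = sample(parent, world, parents, cpd, evidence, weight, rng)
                    progressed = True
        sample_fn, prob_fn = cpd[var](world)
        if var in evidence:                          # likelihood weighting: clamp evidence
            world[var] = evidence[var]
            weight *= prob_fn(evidence[var])
        else:
            world[var] = sample_fn(rng)
        return weight
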

In this paper, we consider the general multiple-target tracking problem, in which an unknown number of targets appears and disappears at random times and the goal is to find the tracks of targets from noisy observations. We propose an efficient real-time algorithm that solves the data association problem and is capable of initiating and terminating a varying number of tracks. We take the data-oriented, combinatorial optimization approach to the data association problem but avoid the enumeration of tracks by applying a sampling method called Markov chain Monte Carlo (MCMC). The MCMC data association algorithm can be viewed as a deferred-logic method, since its decisions about forming tracks are based on both current and past observations; at the same time, it can be viewed as an approximation to the optimal Bayesian filter. The algorithm shows remarkable performance compared to the greedy algorithm and the multiple hypothesis tracker (MHT) under extreme conditions, such as a large number of targets in a dense environment, low detection probabilities, and a large number of false alarms.