1991, ZOR Zeitschrift für Operations Research: Methods and Models of Operations Research
We consider finite state, finite action, stochastic games over an infinite time horizon. We survey algorithms for the computation of minimax optimal stationary strategies in the zero-sum case, and of Nash equilibria in stationary strategies in the non-zero-sum case. We also survey those theoretical results that pave the way towards future development of algorithms. Summary (translated from German): This paper studies infinite-stage stochastic games with finite state and action spaces. An overview is given of algorithms for the computation of optimal stationary minimax strategies in zero-sum games and of stationary Nash equilibrium strategies in non-zero-sum games. Some theoretical results that are useful for the further development of algorithms are presented. This paper is based on the invited lectures given by the authors at the 12th Symposium for Operations Research in Passau, 1987. We are indebted to M. Abbad, Evangelista Fe, F. Thuijsman and O.J. Vrieze for valuable comments and discussion. Any remaining errors of either misinterpretation or of omission are the authors' alone.
arXiv: Optimization and Control, 2018
Stochastic games are a classical model in game theory in which two opponents interact and the environment changes in response to the players' behavior. The central solution concepts for these games are the discounted values and the value, which represent what playing the game is worth to the players for different levels of impatience. In the present manuscript, we provide algorithms for computing exact expressions for the discounted values and for the value, which are polynomial in the number of pure stationary strategies of the players. This result considerably improves all the existing algorithms, including the most efficient one, due to Hansen, Koucký, Lauritzen, Miltersen and Tsigaridas (STOC 2011).
Proceedings of the 43rd annual ACM symposium on Theory of computing - STOC '11, 2011
Shapley's discounted stochastic games, Everett's recursive games and Gillette's undiscounted stochastic games are classical models of game theory describing two-player zero-sum games of potentially infinite duration. We describe algorithms for exactly solving these games. When the number of positions of the game is constant, our algorithms run in polynomial time.
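As background for the discounted case treated in the abstract above, the classical iterative approach is Shapley's value iteration, which the exact algebraic algorithms described there improve upon: plain iteration only approximates the discounted value, at a geometric rate set by the discount factor. The sketch below is a minimal illustration on hypothetical game data (2x2 matrix games per state), not the paper's own algorithm.

```python
# Hedged sketch of Shapley value iteration for discounted zero-sum
# stochastic games, restricted to two actions per player so each stage
# matrix game can be solved in closed form.  All game data hypothetical.

def value_2x2(M):
    """Exact value of a 2x2 zero-sum matrix game (row player maximizes)."""
    maximin = max(min(row) for row in M)
    minimax = min(max(M[0][j], M[1][j]) for j in (0, 1))
    if maximin == minimax:            # saddle point in pure strategies
        return maximin
    a, b, c, d = M[0][0], M[0][1], M[1][0], M[1][1]
    return (a * d - b * c) / (a + d - b - c)   # completely mixed solution

def shapley_iteration(reward, trans, beta, n_iter=200):
    """reward[s][i][j]: stage payoff; trans[s][i][j][t]: transition prob.
    Repeatedly applies the Shapley operator: V[s] <- value of the matrix
    game with entries reward + beta * expected continuation value."""
    n = len(reward)
    V = [0.0] * n
    for _ in range(n_iter):
        V = [value_2x2([[reward[s][i][j]
                         + beta * sum(trans[s][i][j][t] * V[t] for t in range(n))
                         for j in (0, 1)] for i in (0, 1)])
             for s in range(n)]
    return V
```

For a single-state matching-pennies game the fixed point is V = 0; the operator is a beta-contraction, which is what guarantees convergence but also why only approximate (not exact) values come out of finitely many iterations.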
We consider the problem of finding stationary Nash equilibria (NE) in a finite discounted general-sum stochastic game. We first generalize a non-linear optimization problem from Filar and Vrieze [2004] to an N-player setting and break down this problem into simpler sub-problems that ensure there is no Bellman error for a given state and an agent. We then provide a characterization of solution points of these sub-problems that correspond to Nash equilibria of the underlying game, and for this purpose we derive a set of necessary and sufficient SG-SP (Stochastic Game-Sub-Problem) conditions. Using these conditions, we develop two actor-critic algorithms: OFF-SGSP (model-based) and ON-SGSP (model-free). Both algorithms use a critic that estimates the value function for a fixed policy and an actor that performs descent in the policy space using a descent direction that avoids local minima. We establish that both algorithms converge, in self-play, to the equilibria of a certain ordinary differential equation (ODE), whose stable limit points coincide with stationary NE of the underlying general-sum stochastic game. On a single-state non-generic game (see Hart and Mas-Colell [2005]) as well as on a synthetic two-player game setup with 810,000 states, we establish that ON-SGSP consistently outperforms the NashQ [Hu and Wellman, 2003] and FFQ [Littman, 2001] algorithms.
2009
Zero-sum stochastic games are easy to solve as they can be cast as simple Markov decision processes. This is however not the case with general-sum stochastic games. A fairly general optimization problem formulation is available for general-sum stochastic games in [10]. However, the optimization problem has a non-linear objective and non-linear constraints with special structure. Algorithms for computationally solving such problems are not available in the literature. In this paper, we present a simple and robust algorithm for the numerical solution of general-sum stochastic games with assured convergence to a Nash equilibrium.
Stochastic Games and Applications, 2003
After a brief survey of iterative algorithms for general stochastic games, we concentrate on finite-step algorithms for two special classes of stochastic games. They are Single-Controller Stochastic Games and Perfect Information Stochastic Games. In the case of single-controller games, the transition probabilities depend on the actions of the same player in all states. In perfect information stochastic games, one of the players has exactly one action in each state. Single-controller zero-sum games are efficiently solved by linear programming. Non-zero-sum single-controller stochastic games are reducible to linear complementarity problems (LCPs). In the discounted case they can be modified to fit into the so-called LCPs of Eaves' class L. In the undiscounted case the LCPs are reducible to Lemke's copositive plus class. In either case Lemke's algorithm can be used to find a Nash equilibrium. In the case of discounted zero-sum perfect information stochastic games, a policy improvement algorithm is presented. Many other classes of stochastic games with the orderfield property still await efficient finite-step algorithms.
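The perfect-information case mentioned above is the simplest to illustrate: since only one player has a real choice in each state, the Shapley operator reduces to a plain max (in the maximizer's states) or min (in the minimizer's states), and policy improvement searches for a fixed point of this recursion. A minimal sketch of the underlying dynamic-programming recursion, on hypothetical data (not the chapter's own algorithm):

```python
# Value iteration for a discounted zero-sum perfect-information
# stochastic game: each state is owned by one player, who picks among
# (reward, successor-distribution) pairs.  Hypothetical data format.

def pi_value_iteration(states, beta, n_iter=500):
    """states[s] = (owner, actions), owner +1 for the maximizer and -1
    for the minimizer; actions is a list of (reward, {next_state: prob})
    pairs.  Returns the approximate discounted value of each state."""
    V = {s: 0.0 for s in states}
    for _ in range(n_iter):
        newV = {}
        for s, (owner, actions) in states.items():
            vals = [r + beta * sum(p * V[t] for t, p in nxt.items())
                    for r, nxt in actions]
            newV[s] = max(vals) if owner > 0 else min(vals)
        V = newV
    return V
```

Policy improvement replaces this open-ended iteration by finitely many switches between pure stationary strategies, which is what makes a finite-step algorithm possible for this class.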
Journal of Economic Theory, 2014
We study a class of discounted infinite horizon stochastic games with strategic complementarities. We first characterize the set of all Markovian equilibrium values by developing a new type of procedure (a monotone method in function spaces under set inclusion orders). Secondly, using monotone operators on the space of values and strategies (under pointwise orders), we prove existence of a Stationary Markov Nash equilibrium via constructive methods. In addition, we provide monotone comparative statics results for ordered perturbations of the space of stochastic games. Under slightly stronger assumptions, we prove the stationary Markov Nash equilibrium values form a complete lattice, with least and greatest equilibrium value functions being the uniform limit of successive approximations from pointwise lower and upper bounds. We finally discuss the relationship between the two monotone methods presented in the paper. Applications include dynamic negotiations with status quo, international lending and sovereign debt (Atkeson, 1991), optimal Ramsey taxation (Phelan and Stacchetti, 2001), and models of savings and asset prices with hyperbolic discounting (Harris and Laibson, 2001), among others. Additionally, in the literature pertaining to economic applications of stochastic games, the central concerns have been broader than the mere question of weakening conditions for the existence of subgame perfect or Markovian equilibrium. Rather, researchers have become progressively more concerned with characterizing the properties of computational implementations, so they can study the quantitative (as well as qualitative) properties of particular subclasses of subgame perfect equilibrium. For example, often one seeks to simulate approximate equilibria in order to assess the quantitative importance of dynamic/time consistency problems (as is done, for example, for calibration exercises in applied macroeconomics).
In other cases, one seeks to estimate the deep structural parameters of the game (as, for instance, in the recent work in empirical industrial organization). In either situation, one needs to relate theory to numerical implementation, which requires both (i) sharp characterizations of the set of SMNE being computed, and (ii) constructive fixed point methods that can be tied directly to approximation schemes. Of course, for finite games, the question of existence and computation of SMNE has been essentially resolved. Unfortunately, for infinite games, although the equilibrium existence question has received a great deal of attention, results that provide a characterization of the SMNE set are needed (e.g. to obtain results on the accuracy of approximation methods). Similarly, we still lack results establishing classes of robust equilibrium comparative statics on the space of games, which are needed to develop a collection of computable equilibrium comparative statics.
2021
We consider a class of hierarchical noncooperative N-player games where the ith player solves a parametrized mathematical program with equilibrium constraints (MPEC) with the caveat that the implicit form of the ith player's MPEC is convex in that player's strategy, given rival decisions. This represents a challenging class of games that subsumes multi-leader multi-follower games in which player-specific problems are convex in an implicit sense, given rival decisions. We consider settings where player payoffs are expectation-valued with lower-level equilibrium constraints imposed in an almost-sure sense; i.e., player problems are parametrized stochastic MPECs. Few, if any, general purpose schemes exist for computing equilibria even for deterministic specializations of such games. We develop computational schemes in two distinct regimes: (a) Monotone regimes. When player-specific implicit problems are convex, then the necessary and sufficient equilibrium conditions are given by a stochast...
Stochastic and Differential Games, 1999
This paper treats stochastic games. Nonzero-sum average payoff stochastic games with arbitrary state spaces, as well as stopping games, are considered. Such game models fit well with various studies in economic theory and operations research. A correlation of strategies of the players, involving "public signals", is allowed in the nonzero-sum average payoff stochastic games. The main result is an extension of the correlated equilibrium theorem, proved recently by Nowak and Raghavan for dynamic games with discounting, to the average payoff stochastic games. Stopping games are a special model of stochastic games. A version of Dynkin's game, related to observation of a Markov process with a random priority assignment mechanism over states, is presented in the paper. Both zero-sum and nonzero-sum games are considered. The paper also provides a brief overview of the theory of nonzero-sum stochastic games and stopping games, which is far from complete.
1993
This paper presents algorithms for finding mixed-strategy equilibria in multistage noncooperative games of incomplete information (like probabilistic blindfold chess, where at every opportunity a player can perform different moves with some probability). These algorithms accept input games in extensive form. Our main result is an algorithm for computing sequential equilibrium, which is the most widely accepted notion of equilibrium (for mixed strategies of noncooperative probabilistic games) in mainstream economic game theory. Previously, there were no known algorithms for computing sequential equilibrium strategies (except for the special case of single-stage games).
Lecture Notes in Computer Science, 2013
We consider two-player zero-sum finite (but infinite-horizon) stochastic games with limiting average payoffs. We define a family of stationary strategies for Player I parameterized by ε > 0 to be monomial if for each state k and each action j of Player I in state k, except possibly one action, the probability of playing j in k is given by an expression of the form cε^d for some non-negative real number c and some non-negative integer d. We show that for all games, there is a monomial family of stationary strategies that are ε-optimal among stationary strategies. A corollary is that all concurrent reachability games have a monomial family of ε-optimal strategies. This generalizes a classical result of de Alfaro, Henzinger and Kupferman, who showed that this is the case for concurrent reachability games where all states have value 0 or 1.
arXiv: Optimization and Control, 2017
This work considers a stochastic Nash game in which each player solves a parameterized stochastic optimization problem. In deterministic regimes, best-response schemes have been shown to be convergent under a suitable spectral property associated with the proximal best-response map. However, a direct application of this scheme to stochastic settings requires obtaining exact solutions to stochastic optimization problems at each iteration. Instead, we propose an inexact generalization in which an inexact solution is computed via an increasing number of projected stochastic gradient steps. Based on this framework, we present three inexact best-response schemes: (i) First, we propose a synchronous scheme where all players simultaneously update their strategies; (ii) Subsequently, we extend this to a randomized setting where a subset of players is randomly chosen to update their strategies while the others keep their strategies invariant; (iii) Finally, we propose an asynchronous scheme, where ea...
Lecture Notes in Computer Science, 2015
Ummels and Wojtczak initiated the study of finding Nash equilibria in simple stochastic multi-player games satisfying specific bounds. They showed that deciding the existence of pure-strategy Nash equilibria (PURENE) where a fixed player wins almost surely is undecidable for games with 9 players. They also showed that the problem remains undecidable for the finite-strategy Nash equilibrium (FINNE) with 14 players. In this paper we improve their undecidability results by showing that PURENE and FINNE problems remain undecidable for 5 or more players.
2021
We present a generic strategy improvement algorithm (GSIA) to find an optimal strategy of simple stochastic games (SSGs). We prove the correctness of GSIA, and derive a general complexity bound, which implies and improves on the results of several articles. First, we remove the assumption that the SSG is stopping, which is usually obtained by a polynomial blowup of the game. Second, we prove a tight bound on the denominator of the values associated to a strategy, and use it to prove that all strategy improvement algorithms are in fact fixed-parameter tractable in the number r of random vertices. All known strategy improvement algorithms can be seen as instances of GSIA, which makes it possible to analyze the complexity of the "converge from below" algorithm of Condon [14] and to propose a class of algorithms generalising Gimbert and Horn's algorithm [16, 17]. These algorithms terminate in at most r! iterations, and for binary SSGs, they perform fewer iterations than the current best deterministic algorithm given by ...
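For context on the objects above: in a simple stochastic game, MAX and MIN vertices each pick a successor, AVG (random) vertices move to either successor with probability 1/2, and sinks hold a fixed value in [0, 1]; the value vector is the fixed point of the corresponding local recursion. Strategy-improvement algorithms such as GSIA search the finite space of positional strategies for this fixed point exactly, whereas the plain iteration sketched below (on a hypothetical instance) only approximates it:

```python
# Value recursion on a simple stochastic game; hypothetical encoding,
# for illustration only.  Binary successor lists are assumed.

def ssg_iterate(graph, n_iter=2000):
    """graph[v] = ('MAX'|'MIN'|'AVG', [succ, succ]) or ('SINK', value).
    Iterates the local value recursion from the all-zero vector."""
    V = {v: 0.0 for v in graph}
    for _ in range(n_iter):
        newV = {}
        for v, (kind, data) in graph.items():
            if kind == 'SINK':
                newV[v] = data
            elif kind == 'MAX':
                newV[v] = max(V[u] for u in data)
            elif kind == 'MIN':
                newV[v] = min(V[u] for u in data)
            else:  # 'AVG': uniform over the two successors
                newV[v] = 0.5 * (V[data[0]] + V[data[1]])
        V = newV
    return V
```

The tight denominator bound proved in the paper concerns the exact rational values of such fixed points, which is precisely what a numerical iteration like this cannot deliver on its own.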
2004
We develop an exact dynamic programming algorithm for partially observable stochastic games (POSGs). The algorithm is a synthesis of dynamic programming for partially observable Markov decision processes (POMDPs) and iterated elimination of dominated strategies in normal form games. We prove that when applied to finite-horizon POSGs, the algorithm iteratively eliminates very weakly dominated strategies without first forming a normal form representation of the game. For the special case in which agents share the same payoffs, the algorithm can be used to find an optimal solution. We present preliminary empirical results and discuss ways to further exploit POMDP theory in solving POSGs.
International Journal of Game Theory, 1989
Dynamic Games and Applications, 2013
To celebrate the 60th anniversary of the seminal paper "Stochastic Games" of L.S. Shapley [16], Dynamic Games and Applications is proud to publish this special issue. Shapley's paper on stochastic games has had a tremendous scientific impact on the theory and applications of dynamic games, and there is still very active research in these domains. In addition, as can be seen from the content of this volume, the theoretical model as well as the potential for applications develops in new directions, including the continuous time framework, links with evolutionary games, algorithmic game theory, economics, and social networks. The idea to devote a special issue to celebrate the 60th anniversary of Shapley's paper [16] emerged a few years ago, and the decision was taken in 2011. Since then, we had the great pleasure to enjoy the attribution to Lloyd Shapley of the Sveriges Riksbank Prize in Economic Sciences in Memory of Alfred Nobel 2012 "for the theory of stable allocations and the practice of market design," jointly with Alvin Roth. This is the occasion to recall the importance of Shapley's contributions in other areas of game theory like: • core for TU and NTU cooperative games; • equivalence principle for large economies; • matching (with David Gale);
Mathematical Methods of Operations Research, 2005
We consider a zero-sum stochastic game with side constraints for both players with a special structure. There are two independent controlled Markov chains, one for each player. The transition probabilities of the chain associated with a player as well as the related side constraints depend only on the actions of the corresponding player; the side constraints also depend on the player's controlled chain. The global cost that player 1 wishes to minimize and that player 2 wishes to maximize depends, however, on the actions and Markov chains of both players. We obtain a linear programming formulation that allows us to compute the value and saddle point policies for this problem. We illustrate the theoretical results through a zero-sum stochastic game in wireless networks in which each player has power constraints.
2015
We consider finite Markov decision processes (MDPs) with undiscounted total effective payoff. We show that there exist uniformly optimal pure stationary strategies that can be computed by solving a polynomial number of linear programs. We apply this result to two-player zero-sum stochastic games with perfect information and undiscounted total effective payoff, and derive the existence of a saddle point in uniformly optimal pure stationary strategies.
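The abstract above solves MDPs with undiscounted total effective payoff via a polynomial number of linear programs. As background only, the classical LP for the easier *discounted* criterion shows the flavor of solving an MDP by linear programming: minimize the sum of values subject to the Bellman inequalities, whose least feasible point is the optimal value vector. The sketch below uses hypothetical data and SciPy's `linprog`, and is not the paper's construction:

```python
# Classical LP for a discounted MDP (background illustration only;
# the paper handles the undiscounted total-payoff criterion).
from scipy.optimize import linprog

def solve_discounted_mdp_lp(rewards, trans, beta):
    """rewards[s][a]: stage reward; trans[s][a][t]: transition probability.
    Returns the least V with V[s] >= rewards[s][a]
    + beta * sum_t trans[s][a][t] * V[t] for all s, a."""
    n = len(rewards)
    A_ub, b_ub = [], []
    for s in range(n):
        for a in range(len(rewards[s])):
            # r(s,a) + beta * sum_t p(t|s,a) V[t] - V[s] <= 0
            row = [beta * trans[s][a][t] - (1.0 if t == s else 0.0)
                   for t in range(n)]
            A_ub.append(row)
            b_ub.append(-rewards[s][a])
    res = linprog(c=[1.0] * n, A_ub=A_ub, b_ub=b_ub,
                  bounds=[(None, None)] * n, method="highs")
    return list(res.x)
```

An optimal pure stationary strategy is then read off by picking, in each state, an action whose Bellman inequality is tight at the LP optimum.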
Adaptive Agents and Multi-Agents Systems, 2015
We consider the problem of finding stationary Nash equilibria (NE) in a finite discounted general-sum stochastic game. We first generalize a non-linear optimization problem from [9] to a general N-player game setting. Next, we break down the optimization problem into simpler sub-problems that ensure there is no Bellman error for a given state and an agent. We then provide a characterization of solution points of these sub-problems that correspond to Nash equilibria of the underlying game, and for this purpose we derive a set of necessary and sufficient SG-SP (Stochastic Game-Sub-Problem) conditions. Using these conditions, we develop two provably convergent algorithms. The first algorithm, OFF-SGSP, is centralized and model-based, i.e., it assumes complete information of the game. The second algorithm, ON-SGSP, is an online model-free algorithm. We establish that both algorithms converge, in self-play, to the equilibria of a certain ordinary differential equation (ODE), whose stable limit points coincide with stationary NE of the underlying general-sum stochastic game. On a single-state non-generic game [12] as well as on a synthetic two-player game setup with 810,000 states, we establish that ON-SGSP consistently outperforms the NashQ [16] and FFQ [21] algorithms.
Computational Management Science, 2007
In this paper we review a number of algorithms to compute Nash equilibria in deterministic linear quadratic differential games. We review both the open-loop and the feedback information case. In both cases we address both the finite and the infinite planning horizon.