2018, arXiv: Optimization and Control
Stochastic games are a classical model in game theory in which two opponents interact and the environment changes in response to the players' behavior. The central solution concepts for these games are the discounted values and the value, which represent what playing the game is worth to the players for different levels of impatience. In the present manuscript, we provide algorithms for computing exact expressions for the discounted values and for the value, which are polynomial in the number of pure stationary strategies of the players. This result considerably improves all the existing algorithms, including the most efficient one, due to Hansen, Koucký, Lauritzen, Miltersen and Tsigaridas (STOC 2011).
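As a numerical contrast to the exact algorithms described above, here is a minimal sketch of Shapley's classical value iteration: iterate the one-shot operator whose stage games combine current rewards with discounted continuation values. This only approximates the discounted value rather than computing an exact expression for it; the function names, the toy game data, and the restriction to 2x2 stage games are our own simplifications.

```python
# Value iteration with the Shapley operator on a toy discounted stochastic
# game.  All game data here is invented for illustration; this sketch
# approximates the discounted value numerically and does not reproduce the
# exact symbolic algorithms the papers above describe.

def matrix_game_value(M):
    """Value of a 2x2 zero-sum matrix game (row player maximizes)."""
    (a, b), (c, d) = M
    lower = max(min(a, b), min(c, d))   # row player's security level
    upper = min(max(a, c), max(b, d))   # column player's security level
    if lower == upper:                  # saddle point in pure strategies
        return lower
    # Closed-form mixed value for 2x2 games without a saddle point.
    return (a * d - b * c) / (a + d - b - c)

def shapley_iteration(reward, trans, gamma, iters=200):
    """reward[s][i][j]: stage payoff; trans[s][i][j][t]: transition prob.

    Each state is assumed to have exactly 2 actions per player (a
    simplification so matrix_game_value above suffices).
    """
    n = len(reward)
    v = [0.0] * n
    for _ in range(iters):
        # Apply the Shapley operator: solve the stage game at each state.
        v = [matrix_game_value(
                [[reward[s][i][j] + gamma * sum(trans[s][i][j][t] * v[t]
                                                for t in range(n))
                  for j in range(2)] for i in range(2)])
             for s in range(n)]
    return v
```

For a single-state game with matching-pennies rewards, the discounted value is 0 at every discount factor, which the iteration recovers.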
Proceedings of the 43rd Annual ACM Symposium on Theory of Computing (STOC '11), 2011
Shapley's discounted stochastic games, Everett's recursive games and Gillette's undiscounted stochastic games are classical models of game theory describing two-player zero-sum games of potentially infinite duration. We describe algorithms for exactly solving these games. When the number of positions of the game is constant, our algorithms run in polynomial time.
ZOR Zeitschrift für Operations Research (Methods and Models of Operations Research), 1991
We consider finite state, finite action stochastic games over an infinite time horizon. We survey algorithms for the computation of minimax optimal stationary strategies in the zero-sum case, and of Nash equilibria in stationary strategies in the nonzero-sum case. We also survey those theoretical results that pave the way towards future development of algorithms. Summary (translated from German): In this paper, infinite-horizon stochastic games with finite state and action spaces are studied. An overview is given of algorithms for the computation of optimal stationary minimax strategies in zero-sum games and of stationary Nash equilibrium strategies in nonzero-sum games. Some theoretical results are presented that are useful for the further development of algorithms. This paper is based on the invited lectures given by the authors at the 12th Symposium for Operations Research in Passau, 1987. We are indebted to M. Abbad, Evangelista Fe, F. Thuijsman and O.J. Vrieze for valuable comments and discussion. Any remaining errors of either misinterpretation or of omission are the authors' alone.
Applied Mathematics and Computation, 2015
Game theory (GT) is an essential formal tool for interacting entities; however, computing equilibria in GT is a hard problem. When the same game can be played repeatedly over time, the problem becomes even more complicated. The existence of multiple game states makes the problem of computing equilibria in such games extremely difficult. In this paper, we approach this problem by first proposing a method to compute a nonempty subset of approximate (up to any precision) subgame-perfect equilibria in repeated games. We then demonstrate how to extend this method to approximate all subgame-perfect equilibria in a repeated game, and also to solve more complex games, such as Markov chain games and stochastic games. We observe that in stochastic games, our algorithm requires additional strong assumptions to become tractable, while in repeated and Markov chain games it allows approximating all subgame-perfect equilibria reasonably fast and under considerably weaker assumptions than previous methods.
2009
Zero-sum stochastic games are easy to solve, as they can be cast as simple Markov decision processes. This is, however, not the case with general-sum stochastic games. A fairly general optimization problem formulation is available for general-sum stochastic games in [10]. However, the optimization problem has a non-linear objective and non-linear constraints with special structure, and algorithms for computationally solving such problems are not available in the literature. We present in this paper a simple and robust algorithm for the numerical solution of general-sum stochastic games with assured convergence to a Nash equilibrium.
We consider the problem of finding stationary Nash equilibria (NE) in a finite discounted general-sum stochastic game. We first generalize a non-linear optimization problem from Filar and Vrieze [2004] to an N-player setting and break down this problem into simpler sub-problems that ensure there is no Bellman error for a given state and an agent. We then provide a characterization of solution points of these sub-problems that correspond to Nash equilibria of the underlying game, and for this purpose we derive a set of necessary and sufficient SG-SP (Stochastic Game-Sub-Problem) conditions. Using these conditions, we develop two actor-critic algorithms: OFF-SGSP (model-based) and ON-SGSP (model-free). Both algorithms use a critic that estimates the value function for a fixed policy and an actor that performs descent in the policy space using a descent direction that avoids local minima. We establish that both algorithms converge, in self-play, to the equilibria of a certain ordinary differential equation (ODE), whose stable limit points coincide with stationary NE of the underlying general-sum stochastic game. On a single-state non-generic game (see Hart and Mas-Colell [2005]) as well as on a synthetic two-player game setup with 810,000 states, we establish that ON-SGSP consistently outperforms the NashQ [Hu and Wellman, 2003] and FFQ [Littman, 2001] algorithms.
Journal of the ACM, 2013
Ye [2011] showed recently that the simplex method with Dantzig's pivoting rule, as well as Howard's policy iteration algorithm, solve discounted Markov decision processes (MDPs), with a constant discount factor, in strongly polynomial time. More precisely, Ye showed that both algorithms terminate after at most O((mn/(1−γ)) log(n/(1−γ))) iterations, where n is the number of states, m is the total number of actions in the MDP, and 0 < γ < 1 is the discount factor. We improve Ye's analysis in two respects. First, we improve the bound given by Ye and show that Howard's policy iteration algorithm actually terminates after at most O((m/(1−γ)) log(n/(1−γ))) iterations. Second, and more importantly, we show that the same bound applies to the number of iterations performed by the strategy iteration (or strategy improvement) algorithm, a generalization of Howard's policy iteration algorithm used for solving 2-player turn-based stochastic games with discounted zero-sum rewards. This provides...
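Howard's policy iteration, whose iteration count the abstract above bounds, can be sketched on a toy discounted MDP as follows. The MDP data and the fixed-point policy evaluation are illustrative simplifications of our own; a production implementation would solve the linear evaluation system exactly.

```python
# Minimal Howard policy iteration for a discounted MDP.  The toy MDP in the
# usage example is invented; the fixed-point evaluation loop stands in for
# an exact linear solve to keep the sketch dependency-free.

def policy_iteration(P, R, gamma):
    """P[s][a][t]: transition probability, R[s][a]: expected reward."""
    n = len(P)
    policy = [0] * n
    while True:
        # Policy evaluation: iterate v <- R_pi + gamma * P_pi v to a fixed
        # point (in practice, solve this linear system directly).
        v = [0.0] * n
        for _ in range(500):
            v = [R[s][policy[s]] + gamma * sum(P[s][policy[s]][t] * v[t]
                                               for t in range(n))
                 for s in range(n)]
        # Policy improvement: greedy switch in every state (Howard's rule).
        new_policy = [max(range(len(P[s])),
                          key=lambda a: R[s][a] + gamma * sum(
                              P[s][a][t] * v[t] for t in range(n)))
                      for s in range(n)]
        if new_policy == policy:
            return policy, v
        policy = new_policy
```

On a two-state example where state 0 can stay for reward 0 or move to state 1 for reward 1, and state 1 loops with reward 2, the optimal policy at γ = 0.9 moves immediately, with values v(1) = 2/(1 − 0.9) = 20 and v(0) = 1 + 0.9 · 20 = 19.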
Proceedings of the National Academy of Sciences
In 1953, Lloyd Shapley defined the model of stochastic games, which were the first general dynamic model of a game to be defined, and proved that competitive stochastic games have a discounted value. In 1982, Jean-François Mertens and Abraham Neyman proved that competitive stochastic games admit a robust solution concept, the value, which is equal to the limit of the discounted values as the discount rate goes to 0. Both contributions were published in PNAS. In the present paper, we provide a tractable formula for the value of competitive stochastic games.
2006
Koller, Megiddo and von Stengel showed how to efficiently compute minimax strategies for two-player extensive-form zero-sum games with imperfect information but perfect recall using linear programming and avoiding conversion to normal form. Their algorithm has been used by AI researchers for constructing prescriptive strategies for concrete, often fairly large games. Koller and Pfeffer pointed out that the strategies obtained by the algorithm are not necessarily sequentially rational and that this deficiency is often problematic for the practical applications. We show how to remove this deficiency by modifying the linear programs constructed by Koller, Megiddo and von Stengel so that pairs of strategies forming a sequential equilibrium are computed. In particular, we show that a sequential equilibrium for a two-player zero-sum game with imperfect information but perfect recall can be found in polynomial time. In addition, the equilibrium we find is normal-form perfect. We also describe an extension of our technique to general-sum games which is likely to prove practical, even though it is not polynomial-time.
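The sequence-form linear programs discussed above are too involved to reproduce here. As a deliberately simpler, dependency-free stand-in for LP-based minimax computation, fictitious play (Robinson, 1951) brackets the minimax value of a normal-form zero-sum game; the function name and the bracketing interface are our own choices, not anything from the paper.

```python
# Fictitious play on a zero-sum matrix game.  This is NOT the sequence-form
# LP of Koller, Megiddo and von Stengel; it is a classical iterative scheme
# that approximates the same minimax value for normal-form games.

def fictitious_play(M, rounds=20000):
    """Return (lower, upper) bounds bracketing the value of game M
    (row player maximizes)."""
    m, n = len(M), len(M[0])
    row_counts = [1] + [0] * (m - 1)   # arbitrary initial pure strategies
    col_counts = [1] + [0] * (n - 1)
    for _ in range(rounds):
        # Each player best-responds to the opponent's empirical mixture.
        i = max(range(m), key=lambda r: sum(M[r][c] * col_counts[c]
                                            for c in range(n)))
        j = min(range(n), key=lambda c: sum(M[r][c] * row_counts[r]
                                            for r in range(m)))
        row_counts[i] += 1
        col_counts[j] += 1
    tr, tc = sum(row_counts), sum(col_counts)
    # Best-response payoffs against the empirical mixtures bracket the value.
    upper = max(sum(M[r][c] * col_counts[c] for c in range(n))
                for r in range(m)) / tc
    lower = min(sum(M[r][c] * row_counts[r] for r in range(m))
                for c in range(n)) / tr
    return lower, upper
```

On matching pennies, whose value is 0, the two bounds close in on 0 as the round count grows; an LP delivers the exact value in one solve, which is why the LP route matters for the large games the paper targets.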
Adaptive Agents and Multi-Agents Systems, 2015
We consider the problem of finding stationary Nash equilibria (NE) in a finite discounted general-sum stochastic game. We first generalize a non-linear optimization problem from [9] to a general N-player game setting. Next, we break down the optimization problem into simpler sub-problems that ensure there is no Bellman error for a given state and an agent. We then provide a characterization of solution points of these sub-problems that correspond to Nash equilibria of the underlying game, and for this purpose we derive a set of necessary and sufficient SG-SP (Stochastic Game-Sub-Problem) conditions. Using these conditions, we develop two provably convergent algorithms. The first algorithm, OFF-SGSP, is centralized and model-based, i.e., it assumes complete information of the game. The second algorithm, ON-SGSP, is an online model-free algorithm. We establish that both algorithms converge, in self-play, to the equilibria of a certain ordinary differential equation (ODE), whose stable limit points coincide with stationary NE of the underlying general-sum stochastic game. On a single-state non-generic game [12] as well as on a synthetic two-player game setup with 810,000 states, we establish that ON-SGSP consistently outperforms the NashQ [16] and FFQ [21] algorithms.
Journal of Optimization Theory and Applications, 1990
This paper addresses the problem of computation of cooperative equilibria in discounted stochastic sequential games. The proposed approach contains as a special case the method of Green and Porter (developed originally for repeated oligopoly games), but it is more general than the latter in the sense that it generates nontrivial equilibrium solutions for a much larger class of dynamic games. This fact is demonstrated on two examples, one concerned with duopolistic economics and the other with fishery management.
arXiv (Cornell University), 2021
We consider a class of hierarchical noncooperative N-player games where the ith player solves a parametrized stochastic mathematical program with equilibrium constraints (MPEC), with the caveat that the implicit form of the ith player's MPEC is convex in the player's strategy, given rival decisions. Few, if any, general-purpose schemes exist for computing equilibria even for deterministic specializations of such games. We develop computational schemes in two distinct regimes: (a) Monotone regimes. When player-specific implicit problems are convex, the necessary and sufficient equilibrium conditions are given by a stochastic inclusion. Under a monotonicity assumption on the operator, we develop a variance-reduced stochastic proximal-point scheme that achieves deterministic rates of convergence in terms of solving proximal-point problems in monotone/strongly monotone regimes, and the schemes are characterized by optimal or near-optimal sample-complexity guarantees. Finally, the generated sequences are shown to be convergent to an equilibrium in an almost-sure sense in both monotone and strongly monotone regimes; (b) Potentiality. When the implicit form of the game admits a potential function, we develop an asynchronous relaxed inexact smoothed proximal best-response framework. However, any such avenue is impeded by the need to efficiently compute an approximate solution of an MPEC with a strongly convex implicit objective. To this end, we consider the smoothed counterpart of this game, where each player's problem is smoothed via randomized smoothing. Notably, under suitable assumptions, we show that an equilibrium of the η-smoothed game is an η-approximate Nash equilibrium of the original game. Our proposed scheme produces a sequence that converges almost surely to an η-approximate Nash equilibrium in both relaxed and unrelaxed settings.
This scheme is reliant on solving the proximal problem, a stochastic MPEC whose implicit form has a strongly convex objective, with increasing accuracy in finite time. The smoothing framework allows for developing a variance-reduced zeroth-order scheme for such problems that admits a fast rate of convergence. Numerical studies on a class of multi-leader multi-follower games suggest that variance-reduced proximal schemes provide significantly better accuracy with far lower run-times. The relaxed best-response scheme scales well with problem size and generally displays more stability than its unrelaxed counterpart.
2018
We suggest a new algorithm for two-person zero-sum undiscounted stochastic games focusing on stationary strategies. Given a positive real ε, let us call a stochastic game ε-ergodic if its values from any two initial positions differ by at most ε. The proposed new algorithm outputs, for every ε > 0, in finite time either a pair of stationary strategies for the two players guaranteeing that the values from any initial positions are within an ε-range, or identifies two initial positions u and v and corresponding stationary strategies for the players proving that the game values starting from u and v are at least ε/24 apart. In particular, the above result shows that if a stochastic game is ε-ergodic, then there are stationary strategies for the players proving 24ε-ergodicity. This result strengthens and provides a constructive version of an existenti...
1993
Abstract: This paper presents algorithms for finding mixed-strategy equilibria in multistage noncooperative games of incomplete information (like probabilistic blindfold chess, where at every opportunity a player can perform different moves with some probability). These algorithms accept input games in extensive form. Our main result is an algorithm for computing sequential equilibrium, which is the most widely accepted notion of equilibrium (for mixed strategies of noncooperative probabilistic games) in mainstream economic game theory. Previously, there were no known algorithms for computing sequential equilibrium strategies (except for the special case of single-stage games).
Stochastic and Differential Games, 1999
This paper treats stochastic games. Nonzero-sum average-payoff stochastic games with arbitrary state spaces, as well as stopping games, are considered. Such game models fit well with some studies in economic theory and operations research. A correlation of strategies of the players, involving "public signals," is allowed in the nonzero-sum average-payoff stochastic games. The main result is an extension of the correlated equilibrium theorem, proved recently by Nowak and Raghavan for dynamic games with discounting, to average-payoff stochastic games. Stopping games are a special model of stochastic games. A version of Dynkin's game, related to the observation of a Markov process with a random priority assignment mechanism of states, is presented in the paper. Both zero-sum and nonzero-sum games are considered. The paper also provides a brief overview of the theory of nonzero-sum stochastic games and stopping games, which is still far from complete.
2012
We study stochastic two-player games where the goal of one player is to achieve precisely a given expected value of the objective function, while the goal of the opponent is the opposite. Potential applications for such games include controller synthesis problems where the optimisation objective is to maximise or minimise a given payoff function while respecting a strict upper or lower bound, respectively. We consider a number of objective functions including reachability, ω-regular, discounted reward, and total reward.
arXiv: Optimization and Control, 2017
This work considers a stochastic Nash game in which each player solves a parameterized stochastic optimization problem. In deterministic regimes, best-response schemes have been shown to be convergent under a suitable spectral property associated with the proximal best-response map. However, a direct application of this scheme to stochastic settings requires obtaining exact solutions to stochastic optimization at each iteration. Instead, we propose an inexact generalization in which an inexact solution is computed via an increasing number of projected stochastic gradient steps. Based on this framework, we present three inexact best-response schemes: (i) First, we propose a synchronous scheme where all players simultaneously update their strategies; (ii) Subsequently, we extend this to a randomized setting where a subset of players is randomly chosen to update their strategies while the others keep their strategies invariant; (iii) Finally, we propose an asynchronous scheme, where ea...
Dynamic Games and Applications, 2013
To celebrate the 60th anniversary of the seminal paper "Stochastic Games" of L.S. Shapley [16], Dynamic Games and Applications is proud to publish this special issue. Shapley's paper on stochastic games has had a tremendous scientific impact on the theory and applications of dynamic games, and there is still very active research in these domains. In addition, as can be seen from the content of this volume, the theoretical model as well as the potential for applications develops in new directions, including the continuous-time framework, links with evolutionary games, algorithmic game theory, economics, and social networks. The idea to devote a special issue to the 60th anniversary of Shapley's paper [16] emerged a few years ago, and the decision was taken in 2011. Since then, we had the great pleasure to enjoy the attribution to Lloyd Shapley of the Sveriges Riksbank Prize in Economic Sciences in Memory of Alfred Nobel 2012, "for the theory of stable allocations and the practice of market design," jointly with Alvin Roth. This is the occasion to recall the importance of Shapley's contributions in other areas of game theory, such as: the core for TU and NTU cooperative games; the equivalence principle for large economies; and matching (with David Gale).
2005
With the increasing reliance on game theory as a foundation for auctions and electronic commerce, efficient algorithms for computing equilibria in multiplayer general-sum games are of great theoretical and practical interest. The computational complexity of finding a Nash equilibrium for a one-shot bimatrix game is a well-known open problem. This paper treats a related but distinct problem—that of finding a Nash equilibrium for an average-payoff repeated bimatrix game, and presents a polynomial-time algorithm.
Lecture Notes in Computer Science, 2011
In this paper, we consider two-player zero-sum stochastic mean payoff games with perfect information, modeled by a digraph with black, white, and random vertices. These BWR-games are polynomially equivalent to the classical Gillette games, which include many well-known subclasses, such as cyclic games, simple stochastic games, stochastic parity games, and Markov decision processes. They can also be used to model parlor games such as Chess or Backgammon. It is a long-standing open question whether a polynomial algorithm exists that solves BWR-games. In fact, a pseudo-polynomial algorithm for these games with an arbitrary number of random nodes would already imply their polynomial solvability. Currently, only two classes are known to have such a pseudo-polynomial algorithm: BW-games (the case with no random nodes) and ergodic BWR-games (in which the game's value does not depend on the initial position) with a constant number of random nodes. In this paper, we show that the existence of a pseudo-polynomial algorithm for BWR-games with a constant number of random vertices implies smoothed polynomial complexity and the existence of absolute and relative polynomial-time approximation schemes. In particular, we obtain smoothed polynomial complexity and derive absolute and relative approximation schemes for BW-games and ergodic BWR-games (assuming a technical requirement about the probabilities at the random nodes).
Computational Management Science, 2007
In this paper we review a number of algorithms to compute Nash equilibria in deterministic linear quadratic differential games. We review both the open-loop and the feedback information case. In both cases we address both the finite and the infinite planning horizon.