2012, arXiv (Cornell University)
Shapley's discounted stochastic games, Everett's recursive games and Gillette's undiscounted stochastic games are classical models of game theory describing two-player zero-sum games of potentially infinite duration. We describe algorithms for exactly solving these games. When the number of positions of the game is constant, our algorithms run in polynomial time.
arXiv: Optimization and Control, 2018
Stochastic games are a classical model in game theory in which two opponents interact and the environment changes in response to the players' behavior. The central solution concepts for these games are the discounted values and the value, which represent what playing the game is worth to the players for different levels of impatience. In the present manuscript, we provide algorithms for computing exact expressions for the discounted values and for the value, which are polynomial in the number of pure stationary strategies of the players. This result considerably improves all the existing algorithms, including the most efficient one, due to Hansen, Koucký, Lauritzen, Miltersen and Tsigaridas (STOC 2011).
ZOR Zeitschrift für Operations Research Methods and Models of Operations Research, 1991
We consider finite state, finite action, stochastic games over an infinite time horizon. We survey algorithms for the computation of minimax optimal stationary strategies in the zero-sum case, and of Nash equilibria in stationary strategies in the non-zero-sum case. We also survey those theoretical results that pave the way towards future development of algorithms. Zusammenfassung (translated): This paper studies infinite-horizon stochastic games with finite state and action spaces. A survey is given of algorithms for computing optimal stationary minimax strategies in zero-sum games and stationary Nash equilibrium strategies in non-zero-sum games. Some theoretical results that are useful for the further development of algorithms are presented. This paper is based on the invited lectures given by the authors at the 12th Symposium for Operations Research in Passau, 1987. We are indebted to M. Abbad, Evangelista Fe, F. Thuijsman and O.J. Vrieze for valuable comments and discussion. Any remaining errors of either misinterpretation or of omission are the authors' alone.
Stochastic and Differential Games, 1999
This paper treats stochastic games. Nonzero-sum average-payoff stochastic games with arbitrary state spaces, as well as stopping games, are considered. Such game models fit well with several studies in economic theory and operations research. A correlation of the players' strategies, involving "public signals", is allowed in the nonzero-sum average-payoff stochastic games. The main result is an extension of the correlated equilibrium theorem, proved recently by Nowak and Raghavan for dynamic games with discounting, to average-payoff stochastic games. Stopping games are a special model of stochastic games. A version of Dynkin's game related to the observation of a Markov process with a random priority assignment mechanism for the states is presented in the paper. Both zero-sum and nonzero-sum games are considered. The paper also provides a brief overview of the theory of nonzero-sum stochastic games and stopping games, which is still very far from being complete.
2009
Zero-sum stochastic games are easy to solve as they can be cast as simple Markov decision processes. This is, however, not the case with general-sum stochastic games. A fairly general optimization problem formulation is available for general-sum stochastic games in [10]. However, the optimization problem has a non-linear objective and non-linear constraints with special structure. Algorithms for computationally solving such problems are not available in the literature. In this paper, we present a simple and robust algorithm for the numerical solution of general-sum stochastic games with assured convergence to a Nash equilibrium.
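For context (this is not the paper's general-sum method), a minimal Python sketch of the standard zero-sum machinery alluded to in the first sentence: Shapley's value iteration, solving a matrix game at each state by linear programming. The tensor shapes, function names and discount factor are illustrative assumptions.

```python
# A minimal sketch: Shapley value iteration for a zero-sum discounted
# stochastic game with payoff tensor g[s, i, j] and transitions P[s, i, j, s'].
# The matrix game at each state is solved by linear programming.
import numpy as np
from scipy.optimize import linprog

def matrix_game_value(A):
    """Value of the zero-sum matrix game A (row player maximizes)."""
    m, n = A.shape
    shift = A.min()                      # shift payoffs so the value is positive
    B = A - shift + 1.0
    # min sum(x) s.t. B^T x >= 1, x >= 0  =>  game value = 1 / sum(x)
    res = linprog(c=np.ones(m), A_ub=-B.T, b_ub=-np.ones(n),
                  bounds=[(0, None)] * m, method="highs")
    return 1.0 / res.x.sum() + shift - 1.0

def shapley_iteration(g, P, gamma=0.9, tol=1e-8, max_iter=10_000):
    """Iterate v <- val[ g(s,.,.) + gamma * sum_s' P(s'|s,.,.) v(s') ]."""
    S = g.shape[0]
    v = np.zeros(S)
    for _ in range(max_iter):
        v_new = np.array([matrix_game_value(g[s] + gamma * P[s] @ v)
                          for s in range(S)])
        if np.max(np.abs(v_new - v)) < tol:
            return v_new
        v = v_new
    return v
```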
Computing Research Repository - CORR, 2008
We consider some well-known families of two-player, zero-sum, turn-based, perfect-information games that can be viewed as special cases of Shapley's stochastic games. We show that the following tasks are polynomial-time equivalent: solving simple stochastic games, solving stochastic mean-payoff games with rewards and probabilities given in unary, and solving stochastic mean-payoff games with rewards and probabilities given in binary.
Operations Research Letters, 2012
We deal with zero-sum two-player stochastic games with perfect information. We propose two algorithms to find the uniform optimal strategies and one method to compute the optimality range of discount factors. We prove the convergence in finite time for one algorithm. The uniform optimal strategies are also optimal for the long run average criterion and, in transient games, for the undiscounted criterion as well.
2012
We provide a direct, elementary proof for the existence of $\lim_{\lambda \to 0} v_\lambda$, where $v_\lambda$ is the value of the $\lambda$-discounted finite two-person zero-sum stochastic game.
1 Introduction
Two-person zero-sum stochastic games were introduced by Shapley [4]. They are described by a 5-tuple $(\Omega, I, J, q, g)$, where $\Omega$ is a finite set of states, $I$ and $J$ are finite sets of actions, $g : \Omega \times I \times J \to [0,1]$ is the payoff, $q : \Omega \times I \times J \to \Delta(\Omega)$ the transition and, for any finite set $X$, $\Delta(X)$ denotes the set of probability distributions over $X$. The functions $g$ and $q$ are bilinearly extended to $\Omega \times \Delta(I) \times \Delta(J)$. The stochastic game with initial state $\omega \in \Omega$ and discount factor $\lambda \in (0,1]$ is denoted by $\Gamma_\lambda(\omega)$ and is played as follows: at stage $m \geq 1$, knowing the current state $\omega_m$, the players choose actions $(i_m, j_m) \in I \times J$; their choice produces a stage payoff $g(\omega_m, i_m, j_m)$ and influences the transition: a new state $\omega_{m+1}$ is chosen according to the probability distribution $q(\cdot \mid \omega_m, i_m, j_m)$. At the end of the game, player 1 receives $\sum_{m \geq 1} \lambda (1-\lambda)^{m-1} g(\omega_m, i_m, j_m)$ from player 2. The game $\Gamma_\lambda(\omega)$ has a value $v_\lambda(\omega)$, and $v_\lambda = (v_\lambda(\omega))_{\omega \in \Omega}$ is the unique fixed point of the so-called Shapley operator [4], i.e. $v_\lambda = \Phi(\lambda, v_\lambda)$, where for all $f \in \mathbb{R}^\Omega$:
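The display that should follow this colon is cut off in the preview. Assuming the normalization above, the standard form of Shapley's operator is (a reconstruction, not a quotation from the paper):

$$
\Phi(\lambda, f)(\omega) \;=\; \operatorname*{val}_{(x,y)\,\in\,\Delta(I)\times\Delta(J)} \left[\, \lambda\, g(\omega, x, y) \;+\; (1-\lambda) \sum_{\omega'\in\Omega} q(\omega' \mid \omega, x, y)\, f(\omega') \,\right],
$$

where $\operatorname{val}$ denotes the value of the one-shot zero-sum game played with mixed actions $x$ and $y$.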
Lecture Notes in Computer Science, 2015
Ummels and Wojtczak initiated the study of finding Nash equilibria in simple stochastic multi-player games satisfying specific bounds. They showed that deciding the existence of pure-strategy Nash equilibria (PURENE) in which a fixed player wins almost surely is undecidable for games with 9 players. They also showed that the problem remains undecidable for finite-strategy Nash equilibria (FINNE) with 14 players. In this paper we improve their undecidability results by showing that the PURENE and FINNE problems remain undecidable for 5 or more players.
Dynamic Games and Applications, 2013
To celebrate the 60th anniversary of the seminal paper "Stochastic Games" of L.S. Shapley [16], Dynamic Games and Applications is proud to publish this special issue. Shapley's paper on stochastic games has had a tremendous scientific impact on the theory and applications of dynamic games, and research in these domains is still very active. In addition, as can be seen from the content of this volume, the theoretical model as well as the potential for applications develops in new directions, including the continuous-time framework, links with evolutionary games, algorithmic game theory, economics, and social networks. The idea to devote a special issue to celebrate the 60th anniversary of Shapley's paper [16] emerged a few years ago, and the decision was taken in 2011. Since then, we have had the great pleasure to see the attribution to Lloyd Shapley of the Sveriges Riksbank Prize in Economic Sciences in Memory of Alfred Nobel 2012 "for the theory of stable allocations and the practice of market design," jointly with Alvin Roth. This is the occasion to recall the importance of Shapley's contributions in other areas of game theory, such as: • core for TU and NTU cooperative games; • equivalence principle for large economies; • matching (with David Gale);
Lecture Notes in Computer Science, 2009
We consider some well-known families of two-player zero-sum perfect-information stochastic games played on finite directed graphs. Generalizing and unifying results of Liggett and Lippman, Zwick and Paterson, and Chatterjee and Henzinger, we show that the following tasks are polynomial-time equivalent. (Work supported by the Center for Algorithmic Game Theory, funded by the Carlsberg Foundation. A large fraction of the results of this paper appeared in a preprint [12], co-authored by Vladimir Gurvich and the second author of this paper. Vladimir Gurvich's contributions to that preprint will appear elsewhere.)
2018
In a zero-sum stochastic game, at each stage, two adversary players take decisions and receive a stage payoff determined by them and by a random variable representing the state of nature. The total payoff is the discounted sum of the stage payoffs. Assume that the players are very patient and use optimal strategies. We then prove that, at any point in the game, players get essentially the same expected payoff: the payoff is constant. This solves a conjecture by Sorin, Venel and Vigeral (2010). The proof relies on the semi-algebraic approach for discounted stochastic games introduced by Bewley and Kohlberg (1976), on the theory of Markov chains with rare transitions, initiated by Freidlin and Wentzell (1984), and on some variational inequalities for value functions inspired by the recent work of Davini, Fathi, Iturriaga and Zavidovique (2016).
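As a reading aid, here is one common way to state the constant-payoff property (a hedged paraphrase, not the paper's wording): if both players use λ-optimal strategies in $\Gamma_\lambda(\omega)$, then for every fraction $t \in [0,1]$ of the total discount weight,

$$
\lim_{\lambda\to 0}\; \mathbb{E}\!\left[\sum_{m=1}^{M_\lambda(t)} \lambda(1-\lambda)^{m-1} g(\omega_m, i_m, j_m)\right] \;=\; t\, v(\omega),
$$

where $M_\lambda(t)$ is the first stage at which the cumulated weight $\sum_{m\le M}\lambda(1-\lambda)^{m-1}$ reaches $t$, and $v=\lim_{\lambda\to 0}v_\lambda$ is the limit value.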
2018
We suggest a new algorithm for two-person zero-sum undiscounted stochastic games focusing on stationary strategies. Given a positive real ε, let us call a stochastic game ε-ergodic if its values from any two initial positions differ by at most ε. The proposed new algorithm outputs, for every ε > 0, in finite time either a pair of stationary strategies for the two players guaranteeing that the values from any initial positions are within an ε-range, or identifies two initial positions u and v and corresponding stationary strategies for the players proving that the game values starting from u and v are at least ε/24 apart. In particular, the above result shows that if a stochastic game is ε-ergodic, then there are stationary strategies for the players proving 24ε-ergodicity. This result strengthens and provides a constructive version of an existenti...
OPSEARCH, 2004
We introduce and investigate semi-infinite zero-sum two-person discounted semi-Markov games without putting any boundedness condition on its payoff function. We show that such games have a value, with −∞ in some states. We characterise the optimality equation of such games as well.
Stochastic Games and Applications, 2003
After a brief survey of iterative algorithms for general stochastic games, we concentrate on finite-step algorithms for two special classes of stochastic games. They are Single-Controller Stochastic Games and Perfect Information Stochastic Games. In the case of single-controller games, the transition probabilities depend on the actions of the same player in all states. In perfect information stochastic games, one of the players has exactly one action in each state. Single-controller zero-sum games are efficiently solved by linear programming. Non-zero-sum single-controller stochastic games are reducible to linear complementarity problems (LCPs). In the discounted case they can be modified to fit into the so-called LCPs of Eaves' class L. In the undiscounted case the LCPs are reducible to Lemke's copositive plus class. In either case, Lemke's algorithm can be used to find a Nash equilibrium. In the case of discounted zero-sum perfect information stochastic games, a policy improvement algorithm is presented. Many other classes of stochastic games with the orderfield property still await efficient finite-step algorithms.
International Journal of Game Theory, 1993
This paper deals with undiscounted stochastic games. As in Thuijsman-Vrieze [9], we consider specific states, which we call solvable. The existence of such states in every game is proved in a new way. This proof implies the existence of equilibrium payoffs in stochastic games with at most 3 states. By means of an example, we relate our work to the construction of Thuijsman and Vrieze.
Electronic Proceedings in Theoretical Computer Science, 2011
Games on graphs provide a natural model for reactive non-terminating systems. In such games, the interaction of two players on an arena results in an infinite path that describes a run of the system. Different settings are used to model various open systems in computer science, for instance turn-based or concurrent moves, and deterministic or stochastic transitions. In this paper, we are interested in turn-based games, and specifically in deterministic parity games and stochastic reachability games (also known as simple stochastic games). We present a simple, direct and efficient reduction from deterministic parity games to simple stochastic games: it yields an arena whose size is linear, up to a logarithmic factor, in the size of the original arena.
We consider the problem of finding stationary Nash equilibria (NE) in a finite discounted general-sum stochastic game. We first generalize a non-linear optimization problem from Filar and Vrieze [2004] to an N-player setting and break down this problem into simpler sub-problems that ensure there is no Bellman error for a given state and agent. We then provide a characterization of solution points of these sub-problems that correspond to Nash equilibria of the underlying game, and for this purpose we derive a set of necessary and sufficient SG-SP (Stochastic Game-Sub-Problem) conditions. Using these conditions, we develop two actor-critic algorithms: OFF-SGSP (model-based) and ON-SGSP (model-free). Both algorithms use a critic that estimates the value function for a fixed policy and an actor that performs descent in the policy space using a descent direction that avoids local minima. We establish that both algorithms converge, in self-play, to the equilibria of a certain ordinary differential equation (ODE), whose stable limit points coincide with stationary NE of the underlying general-sum stochastic game. On a single-state non-generic game (see Hart and Mas-Colell [2005]) as well as on a synthetic two-player game setup with 810,000 states, we establish that ON-SGSP consistently outperforms the NashQ [Hu and Wellman, 2003] and FFQ [Littman, 2001] algorithms.
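As a reading aid, the critic's target in such schemes is ordinary policy evaluation for a fixed joint stationary policy. The sketch below (two-player case for brevity; shapes and names are illustrative, and it is not the OFF-SGSP/ON-SGSP algorithms themselves) computes that target exactly by solving a linear system.

```python
# A minimal sketch of the policy-evaluation step a critic approximates.
# For a fixed joint stationary policy pi, each player's value function
# solves the linear system (I - gamma * P_pi) V_k = r_pi_k.
import numpy as np

def evaluate_joint_policy(P, r, pi, gamma=0.95):
    """
    P:  transitions, shape (S, A1, A2, S)  -- P[s, a1, a2, s']
    r:  rewards,     shape (N, S, A1, A2)  -- one reward tensor per player
    pi: list of two stochastic policies, pi[0]: (S, A1), pi[1]: (S, A2)
    Returns V of shape (N, S): each player's value under the joint policy.
    """
    S = P.shape[0]
    # State-to-state transition matrix induced by the joint policy.
    P_pi = np.einsum("si,sj,sijt->st", pi[0], pi[1], P)
    V = np.empty((r.shape[0], S))
    for k in range(r.shape[0]):
        r_pi = np.einsum("si,sj,sij->s", pi[0], pi[1], r[k])
        V[k] = np.linalg.solve(np.eye(S) - gamma * P_pi, r_pi)
    return V
```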
2012
We study stochastic two-player games where the goal of one player is to achieve precisely a given expected value of the objective function, while the goal of the opponent is the opposite. Potential applications for such games include controller synthesis problems where the optimisation objective is to maximise or minimise a given payoff function while respecting a strict upper or lower bound, respectively. We consider a number of objective functions including reachability, ω-regular, discounted reward, and total reward.
International Journal of Game Theory, 1989
Mathematical Programming, 2003
We give a policy-improvement type algorithm to locate an optimal pure stationary strategy for discounted stochastic games with perfect information. A graph theoretic motivation for our algorithm is presented as well.
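The paper's own algorithm is not reproduced here; as a point of comparison, the following is a hedged sketch of classical Hoffman-Karp style strategy improvement for turn-based (perfect-information) discounted zero-sum games, which also produces an optimal pure stationary strategy. Array shapes, names and tolerances are illustrative assumptions.

```python
# Hoffman-Karp style strategy improvement (a sketch, not the paper's algorithm):
# fix the maximizer's pure stationary strategy, compute the minimizer's best
# response on the induced MDP by value iteration, then switch the maximizer
# to strictly improving actions until no switch remains.
import numpy as np

def best_response_value(P, r, owner, sigma_max, gamma, tol=1e-10):
    """Value when the maximizer plays sigma_max and the minimizer best-responds."""
    S, A, _ = P.shape
    v = np.zeros(S)
    while True:
        Q = r + gamma * P @ v                          # (S, A) one-step lookahead
        v_new = np.where(owner > 0,
                         Q[np.arange(S), sigma_max],   # maximizer's fixed choice
                         Q.min(axis=1))                # minimizer minimizes
        if np.max(np.abs(v_new - v)) < tol:
            return v_new
        v = v_new

def strategy_iteration(P, r, owner, gamma=0.9):
    """
    P: (S, A, S) transitions, r: (S, A) payoffs to the maximizer,
    owner: (S,) with +1 where the maximizer moves, -1 where the minimizer moves.
    Returns the maximizer's optimal pure stationary strategy and the value vector.
    """
    S, A, _ = P.shape
    sigma = np.zeros(S, dtype=int)                     # maximizer's current strategy
    while True:
        v = best_response_value(P, r, owner, sigma, gamma)
        Q = r + gamma * P @ v
        greedy = Q.argmax(axis=1)
        improve = (owner > 0) & (Q[np.arange(S), greedy]
                                 > Q[np.arange(S), sigma] + 1e-9)
        if not improve.any():                          # no strictly improving switch
            return sigma, v
        sigma = np.where(improve, greedy, sigma)
```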