2011, arXiv (Cornell University)
Gimbert and Horn gave an algorithm for solving simple stochastic games with running time O(r! n), where n is the number of positions of the simple stochastic game and r is the number of its coin toss positions. Chatterjee et al. pointed out that a variant of strategy iteration can be implemented to solve this problem in time 4^r r^{O(1)} n^{O(1)}. In this paper, we show that an algorithm combining value iteration with retrograde analysis achieves a time bound of O(r 2^r (r log r + n)), thus improving both time bounds. While the algorithm is simple, the analysis leading to this time bound is involved, using techniques of extremal combinatorics to identify worst case instances for the algorithm.
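The value-iteration core of such an algorithm can be sketched in a few lines. The following Python sketch is illustrative only: the game encoding, position names, and the fixed sweep count are assumptions of this example, and the paper's algorithm additionally uses retrograde analysis and a more careful stopping rule.

```python
def value_iteration(game, sinks, sweeps=200):
    """game: position -> (kind, successors), kind in {'max', 'min', 'avg'};
       sinks: position -> terminal payoff in [0, 1].
       Repeatedly applies the one-step backup operator of the SSG."""
    v = {p: 0.0 for p in game}
    v.update(sinks)
    for _ in range(sweeps):
        for p, (kind, succ) in game.items():
            vals = [v[s] for s in succ]
            if kind == 'max':          # Max player picks the best successor
                v[p] = max(vals)
            elif kind == 'min':        # Min player picks the worst successor
                v[p] = min(vals)
            else:                      # coin-toss position: uniform average
                v[p] = sum(vals) / len(vals)
    return v
```

For instance, on a game with a single coin-toss position between the 0-sink and the 1-sink, the iteration converges to value 1/2 at that position.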
arXiv: Optimization and Control, 2018
Stochastic games are a classical model in game theory in which two opponents interact and the environment changes in response to the players' behavior. The central solution concepts for these games are the discounted values and the value, which represent what playing the game is worth to the players for different levels of impatience. In the present manuscript, we provide algorithms for computing exact expressions for the discounted values and for the value, which are polynomial in the number of pure stationary strategies of the players. This result considerably improves all the existing algorithms, including the most efficient one, due to Hansen, Koucký, Lauritzen, Miltersen and Tsigaridas (STOC 2011).
2021
We present a generic strategy improvement algorithm (GSIA) to find an optimal strategy of simple stochastic games (SSG). We prove the correctness of GSIA and derive a general complexity bound, which implies and improves on the results of several articles. First, we remove the assumption that the SSG is stopping, which is usually obtained by a polynomial blowup of the game. Second, we prove a tight bound on the denominator of the values associated to a strategy, and use it to prove that all strategy improvement algorithms are in fact fixed parameter tractable in the number r of random vertices. All known strategy improvement algorithms can be seen as instances of GSIA, which allows us to analyze the complexity of Condon's Converge From Below algorithm [14] and to propose a class of algorithms generalizing Gimbert and Horn's algorithm [16, 17]. These algorithms terminate in at most r! iterations, and for binary SSGs, they perform fewer iterations than the current best deterministic algorithm given by ...
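As an illustration of the strategy-improvement pattern that such generic frameworks abstract, here is a minimal Hoffman-Karp-style sketch in Python. The game encoding and the use of plain value iteration for the evaluation step are assumptions made for this example; GSIA itself is more general.

```python
def strategy_iteration(game, sinks, eval_sweeps=500):
    """game: position -> (kind, successors), kind in {'max', 'min', 'avg'};
       sinks: position -> terminal payoff.
       Max fixes a pure strategy sigma, the values under Min's best response
       are computed, then Max switches at strictly improvable positions."""
    sigma = {p: succ[0] for p, (kind, succ) in game.items() if kind == 'max'}
    while True:
        # Evaluate sigma: value iteration with Max locked to sigma
        v = {p: 0.0 for p in game}
        v.update(sinks)
        for _ in range(eval_sweeps):
            for p, (kind, succ) in game.items():
                if kind == 'max':
                    v[p] = v[sigma[p]]
                elif kind == 'min':
                    v[p] = min(v[s] for s in succ)
                else:
                    v[p] = sum(v[s] for s in succ) / len(succ)
        # Switch to strictly better successors (tolerance for float error)
        new = {p: max(game[p][1], key=lambda s: v[s]) for p in sigma}
        if all(v[new[p]] <= v[sigma[p]] + 1e-9 for p in sigma):
            return v
        sigma = new
```

Each improvement step strictly increases the value vector, so the loop terminates after finitely many switches; the cited bounds concern exactly how many such iterations can occur.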
Proceedings of the 43rd annual ACM symposium on Theory of computing - STOC '11, 2011
Shapley's discounted stochastic games, Everett's recursive games and Gillette's undiscounted stochastic games are classical models of game theory describing two-player zero-sum games of potentially infinite duration. We describe algorithms for exactly solving these games. When the number of positions of the game is constant, our algorithms run in polynomial time.
ZOR Zeitschrift für Operations Research Methods and Models of Operations Research, 1991
We consider finite state, finite action, stochastic games over an infinite time horizon. We survey algorithms for the computation of minimax optimal stationary strategies in the zero-sum case, and of Nash equilibria in stationary strategies in the non-zero-sum case. We also survey those theoretical results that pave the way towards future development of algorithms. Zusammenfassung (translated from German): This paper studies infinite-horizon stochastic games with finite state and action spaces. We give an overview of algorithms for computing optimal stationary minimax strategies in zero-sum games and stationary Nash equilibrium strategies in non-zero-sum games. Some theoretical results that are useful for the further development of algorithms are presented. 1 This paper is based on the invited lectures given by the authors at the 12th Symposium for Operations Research in Passau, 1987. We are indebted to M. Abbad, Evangelista Fe, F. Thuijsman and O.J. Vrieze for valuable comments and discussion. Any remaining errors of either misinterpretation or of omission are the authors' alone.
arXiv preprint arXiv:0711.1055, 2007
Abstract. We define the class of simple recursive games. A simple recursive game is defined as a simple stochastic game (a notion due to Anne Condon), except that we allow arbitrary real payoffs but disallow moves of chance. We study the complexity of solving simple recursive games and obtain an almost-linear time comparison-based algorithm for computing an equilibrium of such a game. The existence of a linear time comparison-based algorithm remains an open problem. We also extend our techniques to a new variant of ...
Lecture Notes in Computer Science, 2011
A fair two-party coin tossing protocol is one in which both parties output the same bit that is almost uniformly distributed (i.e., it equals 0 and 1 with probability that is at most negligibly far from one half). It is well known that it is impossible to achieve fair coin tossing even in the presence of fail-stop adversaries (Cleve, FOCS 1986). In fact, Cleve showed that for every coin tossing protocol running for r rounds, an efficient fail-stop adversary can bias the output by Ω(1/r). Since this is the best possible, a protocol that limits the bias of any adversary to O(1/r) is called optimally-fair. The only optimally-fair protocol that is known to exist relies on the existence of oblivious transfer, because it uses general secure computation (Moran, Naor and Segev, TCC 2009). However, it is possible to achieve a bias of O(1/√r) in r rounds relying only on the assumption that there exist one-way functions. In this paper we show that it is impossible to achieve optimally-fair coin tossing via a black-box construction from one-way functions for r that is less than O(n/log n), where n is the input/output length of the one-way function used. An important corollary of this is that it is impossible to construct an optimally-fair coin tossing protocol via a black-box construction from one-way functions whose round complexity is independent of the security parameter n determining the security of the one-way function being used. Informally speaking, the main ingredient of our proof is to eliminate the random oracle from "secure" protocols with "low round-complexity" and simulate the protocol securely against semi-honest adversaries in the plain model. We believe our simulation lemma to be of broader interest.
2009
Zero-sum stochastic games are easy to solve as they can be cast as simple Markov decision processes. This is, however, not the case with general-sum stochastic games. A fairly general optimization problem formulation is available for general-sum stochastic games in [10]. However, the optimization problem has a non-linear objective and non-linear constraints with special structure. Algorithms for computationally solving such problems are not available in the literature. We present in this paper a simple and robust algorithm for the numerical solution of general-sum stochastic games with assured convergence to a Nash equilibrium.
Monte-Carlo Tree Search is a very successful game playing algorithm. Unfortunately, it suffers from the horizon effect: some important tactical sequences may be delayed beyond the depth of the search tree, causing evaluation errors. Temporal-difference search with function approximation is a method that was proposed to overcome these weaknesses by adaptively changing the simulation policy outside the tree. In this paper we present experimental evidence demonstrating that temporal-difference search may fail to find an optimal policy, even in very simple game positions. Classical temporal-difference algorithms try to evaluate a local situation with a numerical value but, as it appears, a single number is not enough to model the dynamics of a partial two-player game state. As a solution, we propose to replace numerical values by approximate thermographs. With this richer representation of partial states, reinforcement-learning algorithms converge and accurately represent the dynamics of states, allowing them to find an optimal policy.
Lecture Notes in Computer Science, 2014
We consider two-player partial-observation stochastic games on finite-state graphs where player 1 has partial observation and player 2 has perfect observation. The winning conditions we study are ω-regular conditions specified as parity objectives. The qualitative-analysis problem, given a partial-observation stochastic game and a parity objective, asks whether there is a strategy to ensure that the objective is satisfied with probability 1 (resp. positive probability). These qualitative-analysis problems are known to be undecidable. However, in many applications the relevant question is the existence of finite-memory strategies, and the qualitative-analysis problems under finite-memory strategies were recently shown to be decidable in 2EXPTIME. We improve the complexity and show that the qualitative-analysis problems for partial-observation stochastic parity games under finite-memory strategies are EXPTIME-complete; we also establish optimal (exponential) memory bounds for finite-memory strategies required for qualitative analysis.
Lecture Notes in Computer Science, 2013
One-clock priced timed games are a class of two-player, zero-sum, continuous-time games that was defined and thoroughly studied in previous works. We show that one-clock priced timed games can be solved in time m 12^n n^{O(1)}, where n is the number of states and m is the number of actions. The best previously known time bound for solving one-clock priced timed games was 2^{O(n^2 + m)}, due to Rutkowski. For our improvement, we introduce and study a new algorithm for solving one-clock priced timed games, based on the sweep-line technique from computational geometry and the strategy iteration paradigm from the algorithmic theory of Markov decision processes. As a corollary, we also improve the analysis of previous algorithms due to Bouyer, Cassez, Fleury, and Larsen; and Alur, Bernadsky, and Madhusudan.
Journal of the ACM, 2013
Ye [2011] showed recently that the simplex method with Dantzig's pivoting rule, as well as Howard's policy iteration algorithm, solve discounted Markov decision processes (MDPs), with a constant discount factor, in strongly polynomial time. More precisely, Ye showed that both algorithms terminate after at most O((mn/(1−γ)) log(n/(1−γ))) iterations, where n is the number of states, m is the total number of actions in the MDP, and 0 < γ < 1 is the discount factor. We improve Ye's analysis in two respects. First, we improve the bound given by Ye and show that Howard's policy iteration algorithm actually terminates after at most O((m/(1−γ)) log(n/(1−γ))) iterations. Second, and more importantly, we show that the same bound applies to the number of iterations performed by the strategy iteration (or strategy improvement) algorithm, a generalization of Howard's policy iteration algorithm used for solving 2-player turn-based stochastic games with discounted zero-sum rewards. This provide...
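The algorithm whose iteration count is bounded here, Howard's policy iteration, is short to state. Below is a minimal Python sketch for a discounted MDP; the tabular encoding is an assumption of this example, and the evaluation step is done by repeated backups rather than by solving the linear system exactly, as the cited analysis assumes.

```python
def policy_iteration(P, R, gamma, max_improvements=100):
    """P[s][a] = list of (prob, next_state); R[s][a] = immediate reward;
       0 < gamma < 1. Howard's variant: greedy improvement at every state."""
    n = len(P)
    policy = [0] * n
    for _ in range(max_improvements):
        # Policy evaluation: repeated backups approximate the linear solve
        V = [0.0] * n
        for _ in range(500):
            V = [R[s][policy[s]]
                 + gamma * sum(p * V[t] for p, t in P[s][policy[s]])
                 for s in range(n)]
        # Greedy improvement: switch every state to its best action
        new_policy = [max(range(len(P[s])),
                          key=lambda a: R[s][a]
                          + gamma * sum(p * V[t] for p, t in P[s][a]))
                      for s in range(n)]
        if new_policy == policy:       # no state improvable: optimal
            return policy, V
        policy = new_policy
    return policy, V
```

On a two-state example where state 0 can stay (reward 0) or move to an absorbing state 1 with reward 2 per step, the algorithm settles on the moving action after one improvement round.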
Electronic Communications in Probability, 2004
Consider a two-person zero-sum game played on a random n × n matrix where the entries are iid normal random variables. Let Z be the number of rows in the support of the optimal strategy for player I given the realization of the matrix. (The optimal strategy is a.s. unique and Z a.s. coincides with the number of columns of the support of the optimal strategy for player II.) Faris and Maier [4] make simulations that suggest that as n gets large Z has a distribution close to binomial with parameters n and 1/2, and prove that P(Z = n) ≤ 2^{−(n−1)}. In this paper a few more theoretically rigorous steps are taken towards the limiting distribution of Z: It is shown that there exists a < 1/2 (indeed a < 0.4) such that P((1/2 − a)n < Z < (1/2 + a)n) → 1 as n → ∞. It is also shown that EZ = (1/2 + o(1))n. We also prove that the value of the game with probability 1 − o(1) is at most Cn^{−1/2} for some C < ∞ independent of n. The proof suggests that an upper bound is in fact given by f(n)n^{−1}, where f(n) is any sequence such that f(n) → ∞, and it is pointed out that if this is true, then the variance of Z is o(n^2), so that any a > 0 will do in the bound on Z above.
Electronic Proceedings in Theoretical Computer Science, 2011
Games on graphs provide a natural model for reactive non-terminating systems. In such games, the interaction of two players on an arena results in an infinite path that describes a run of the system. Different settings are used to model various open systems in computer science, for instance turn-based or concurrent moves, and deterministic or stochastic transitions. In this paper, we are interested in turn-based games, and specifically in deterministic parity games and stochastic reachability games (also known as simple stochastic games). We present a simple, direct and efficient reduction from deterministic parity games to simple stochastic games: it yields an arena whose size is linear, up to a logarithmic factor, in the size of the original arena.
2011
Two standard algorithms for approximately solving two-player zero-sum concurrent reachability games are value iteration and strategy iteration. We prove upper and lower bounds of 2^{m^{Θ(N)}} on the worst case number of iterations needed for both of these algorithms to provide non-trivial approximations to the value of a game with N non-terminal positions and m actions for each player in each position.
Lecture Notes in Computer Science, 2005
The theory of graph games with ω-regular winning conditions is the foundation for modeling and synthesizing reactive processes. In the case of stochastic reactive processes, the corresponding stochastic graph games have three players, two of them (System and Environment) behaving adversarially, and the third (Uncertainty) behaving probabilistically. We consider two problems for stochastic graph games: the qualitative problem asks for the set of states from which a player can win with probability 1 (almost-sure winning); the quantitative problem asks for the maximal probability of winning (optimal winning) from each state. We show that for Rabin winning conditions, both problems are in NP. As these problems were known to be NP-hard, it follows that they are NP-complete for Rabin conditions, and dually, coNP-complete for Streett conditions. The proof proceeds by showing that pure memoryless strategies suffice for qualitatively and quantitatively winning stochastic graph games with Rabin conditions. This insight is of interest in its own right, as it implies that controllers for Rabin objectives have simple implementations. We also prove that for every ω-regular condition, optimal winning strategies are no more complex than almost-sure winning strategies.
2012
We study stochastic two-player games where the goal of one player is to achieve precisely a given expected value of the objective function, while the goal of the opponent is the opposite. Potential applications for such games include controller synthesis problems where the optimisation objective is to maximise or minimise a given payoff function while respecting a strict upper or lower bound, respectively. We consider a number of objective functions including reachability, ω-regular, discounted reward, and total reward.
Computing Research Repository - CORR, 2008
We consider some well known families of two-player, zero-sum, turn-based, perfect information games that can be viewed as special cases of Shapley's stochastic games. We show that the following tasks are polynomial time equivalent: solving simple stochastic games, solving stochastic mean-payoff games with rewards and probabilities given in unary, and solving stochastic mean-payoff games with rewards and probabilities given in binary.
2020
This paper investigates the use of model-free reinforcement learning to compute the optimal value in two-player stochastic games with parity objectives. In this setting, two decision makers, player Min and player Max, compete on a finite game arena – a stochastic game graph with unknown but fixed probability distributions – to minimize and maximize, respectively, the probability of satisfying a parity objective. We give a reduction from stochastic parity games to a family of stochastic reachability games with a parameter ε, such that the value of a stochastic parity game equals the limit of the values of the corresponding simple stochastic games as the parameter ε tends to 0. Since this reduction does not require the knowledge of the probabilistic transition structure of the underlying game arena, model-free reinforcement learning algorithms, such as minimax Q-learning, can be used to approximate the value and mutual best-response strategies for both players in the underlying stocha...
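In the turn-based reachability games produced by such a reduction, minimax Q-learning specialises to a tabular update in which the owner of a state determines whether a max or a min backup is used. The following Python sketch is an illustration only: the arena encoding, uniform exploration, and running-average step size are assumptions of this example, not the paper's construction.

```python
import random

def minimax_q(arena, owner, episodes=5000, gamma=1.0, seed=0):
    """Tabular minimax Q-learning sketch for a turn-based stochastic
    reachability game. arena[s][a] = list of (prob, next), where next is a
    state or the terminal 'goal'/'trap'; owner[s] in {'max', 'min'} decides
    which backup operator is applied at state s."""
    rng = random.Random(seed)
    Q = {s: {a: 0.0 for a in arena[s]} for s in arena}
    counts = {s: {a: 0 for a in arena[s]} for s in arena}

    def best(s):
        vals = Q[s].values()
        return max(vals) if owner[s] == 'max' else min(vals)

    for _ in range(episodes):
        s = rng.choice(list(arena))               # restart from a random state
        while s in arena:                          # until a terminal is hit
            a = rng.choice(list(arena[s]))         # uniform exploration
            probs, nexts = zip(*arena[s][a])
            t = rng.choices(nexts, weights=probs)[0]   # model-free sampling
            target = (1.0 if t == 'goal' else
                      0.0 if t == 'trap' else gamma * best(t))
            counts[s][a] += 1
            alpha = 1.0 / counts[s][a]             # running-average step size
            Q[s][a] += alpha * (target - Q[s][a])
            s = t
    return Q
```

On a one-state arena where the Max player chooses between reaching the goal with probability 0.5 or 0.3, the learned Q-values approach those reachability probabilities and the greedy policy picks the better action.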
2006
We consider two-player infinite games played on graphs. The games are concurrent, in that at each state the players choose their moves simultaneously and independently, and stochastic, in that the moves determine a probability distribution for the successor state. The value of a game is the maximal probability with which a player can guarantee the satisfaction of her objective. We show that the values of concurrent games with ω-regular objectives expressed as parity conditions can be computed in NP ∩ coNP. This result substantially improves the best known previous bound of 3EXPTIME. It also shows that the full class of concurrent parity games is no harder than the special cases of turn-based deterministic parity games (Emerson-Jutla) and of turn-based stochastic reachability games (Condon), for both of which NP ∩ coNP is the best known bound. While the previous, more restricted NP ∩ coNP results for graph games relied on the existence of particularly simple (pure memoryless) optimal strategies, in concurrent games with parity objectives optimal strategies may not exist, and ε-optimal strategies (which achieve the value of the game within a parameter ε > 0) require in general both randomization and infinite memory. Hence our proof must rely on a more detailed analysis of strategies and, in addition to the main result, yields two results that are interesting on their own.
2006
We consider stochastic turn-based games where the winning objectives are given by formulae of the branching-time logic PCTL. These games are generally not determined, and winning strategies may require memory and/or randomization. Our main results concern history-dependent strategies. In particular, we show that the problem whether there exists a history-dependent winning strategy in 1½-player games is highly undecidable, even for objectives formulated in the L(F^{=5/8}, F^{=1}, F^{>0}, G^{=1}) fragment of PCTL. On the other hand, we show that the problem becomes decidable (and in fact EXPTIME-complete) for the L(F^{=1}, F^{>0}, G^{=1}) fragment of PCTL, where winning strategies require only finite memory. This result is tight in the sense that winning strategies for L(F^{=1}, F^{>0}, G^{=1}, G^{>0}) objectives may already require infinite memory.