The paper is concerned with two-person games with a saddle point. We investigate the limits of value functions for the long-time-average payoff, the discounted average payoff, and the payoff that follows a probability density on R≥0.
IEEE Transactions on Automatic Control, 1977
In deterministic team problems every closed-loop representation of an optimal open-loop solution is also optimal. This property, however, no longer holds true when the optimization problem is a zero-sum or a nonzero-sum game. In zero-sum games, two weaker (but still general enough) versions of this statement are valid, which still fail to hold in the case of nonzero-sum games. In this correspondence we state and prove these two general properties of the saddle-point solution in dynamic games.
Mathematics of Operations Research
We consider two-person zero-sum games where the players control, at discrete times {t_n} induced by a partition Π of R+, a continuous-time Markov state process. We prove that the limit of the values v_Π exists as the mesh of Π goes to 0. The analysis covers two cases: (1) stochastic games, where both players know the state; (2) symmetric no information. The proof is by reduction to a deterministic differential game.
2012
We provide a direct, elementary proof for the existence of lim_{λ→0} v_λ, where v_λ is the value of the λ-discounted finite two-person zero-sum stochastic game.
1 Introduction
Two-person zero-sum stochastic games were introduced by Shapley [4]. They are described by a 5-tuple (Ω, I, J, q, g), where Ω is a finite set of states, I and J are finite sets of actions, g : Ω × I × J → [0, 1] is the payoff, q : Ω × I × J → Δ(Ω) is the transition and, for any finite set X, Δ(X) denotes the set of probability distributions over X. The functions g and q are bilinearly extended to Ω × Δ(I) × Δ(J). The stochastic game with initial state ω ∈ Ω and discount factor λ ∈ (0, 1] is denoted by Γ_λ(ω) and is played as follows: at stage m ≥ 1, knowing the current state ω_m, the players choose actions (i_m, j_m) ∈ I × J; their choice produces a stage payoff g(ω_m, i_m, j_m) and influences the transition: a new state ω_{m+1} is chosen according to the probability distribution q(·|ω_m, i_m, j_m). At the end of the game, player 1 receives Σ_{m≥1} λ(1 − λ)^{m−1} g(ω_m, i_m, j_m) from player 2. The game Γ_λ(ω) has a value v_λ(ω), and v_λ = (v_λ(ω))_{ω∈Ω} is the unique fixed point of the so-called Shapley operator [4], i.e. v_λ = Φ(λ, v_λ).
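The fixed-point characterization above lends itself to a short numeric sketch. The following is a minimal illustration, not the paper's method: a hypothetical two-state game in which, in each state, only one player has a real choice, so every auxiliary one-shot matrix game has a pure saddle point and no linear programming is needed. Since Φ(λ, ·) is a (1 − λ)-contraction, value iteration converges to v_λ.

```python
# Sketch of Shapley's model and the fixed point v_lambda = Phi(lambda, v_lambda).
# All numbers below are illustrative, not taken from the paper.

STATES = [0, 1]
A = {0: [0, 1], 1: [0]}      # player 1's actions in each state
B = {0: [0], 1: [0, 1]}      # player 2's actions in each state

def g(w, i, j):
    # stage payoff in [0, 1]
    return {0: {0: 0.9, 1: 0.2}[i], 1: {0: 0.3, 1: 0.7}[j]}[w]

def q(w, i, j):
    # transition law: probability of each next state
    p_switch = {0: {0: 0.2, 1: 0.8}[i], 1: {0: 0.5, 1: 0.9}[j]}[w]
    return {w: 1 - p_switch, 1 - w: p_switch}

def pure_value(matrix):
    # value of a matrix game with a pure saddle point (always true for 1xN, Nx1)
    maximin = max(min(row) for row in matrix)
    minimax = min(max(col) for col in zip(*matrix))
    assert abs(maximin - minimax) < 1e-12, "no pure saddle point"
    return maximin

def phi(lam, f):
    # Shapley operator: Phi(lam, f)(w) = val[ lam*g + (1-lam)*E_q f ]
    return {w: pure_value([[lam * g(w, i, j)
                            + (1 - lam) * sum(p * f[w2]
                                              for w2, p in q(w, i, j).items())
                            for j in B[w]] for i in A[w]])
            for w in STATES}

def v_lambda(lam, tol=1e-12):
    # Phi(lam, .) is a (1-lam)-contraction: iterate until the fixed point
    f = {w: 0.0 for w in STATES}
    while True:
        nf = phi(lam, f)
        if max(abs(nf[w] - f[w]) for w in STATES) < tol:
            return nf
        f = nf

v = v_lambda(0.5)
```

With payoffs in [0, 1], the value lies in [0, 1], and applying Φ once more to the returned v leaves it essentially unchanged, confirming the fixed-point equation numerically.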
Annales de l'Institut Henri Poincaré, Probabilités et Statistiques, 2018
In a zero-sum stochastic game, at each stage, two adversary players take decisions and receive a stage payoff determined by them and by a random variable representing the state of nature. The total payoff is the discounted sum of the stage payoffs. Assume that the players are very patient and use optimal strategies. We then prove that, at any point in the game, players get essentially the same expected payoff: the payoff is constant. This solves a conjecture by Sorin, Venel and Vigeral (2010). The proof relies on the semi-algebraic approach for discounted stochastic games introduced by Bewley and Kohlberg (1976), on the theory of Markov chains with rare transitions, initiated by Freidlin and Wentzell (1984), and on some variational inequalities for value functions inspired by the recent work of Davini, Fathi, Iturriaga and Zavidovique (2016).
Sankhya A, 2010
We show in a dynamic programming framework that uniform convergence of the finite horizon values implies that asymptotically the average accumulated payoff is constant on optimal trajectories. We analyze and discuss several possible extensions to two-person games.
Journal of Optimization Theory and Applications, 1976
A family of two-person, zero-sum differential games in which the admissible strategies are Borel measurable is defined, and two types of saddle-point conditions are introduced as optimality criteria. In one, saddle-point candidates are compared at each point of the state space with all playable pairs at that point; in the other, they are compared only with strategy pairs playable on the entire state space. As a theorem, these two types of optimality are shown to be equivalent for the defined family of games. Also, it is shown that a certain closure property is sufficient for this equivalence. A game having admissible strategies everywhere constant, in which the two types of saddle-point candidates are not equivalent, is discussed.
Proceedings of the National Academy of Sciences
In 1953, Lloyd Shapley defined the model of stochastic games, the first general dynamic model of a game to be defined, and proved that competitive stochastic games have a discounted value. In 1982, Jean-François Mertens and Abraham Neyman proved that competitive stochastic games admit a robust solution concept, the value, which is equal to the limit of the discounted values as the discount rate goes to 0. Both contributions were published in PNAS. In the present paper, we provide a tractable formula for the value of competitive stochastic games.
International Journal of Game Theory, 2015
We study the links between the values of stochastic games with varying stage duration h, the corresponding Shapley operators T and T_h = hT + (1 − h)Id, and the solution of the evolution equation ḟ_t = (T − Id)f_t. Considering general nonexpansive maps, we establish two kinds of results, under both the discounted and the finite-length framework, that apply to the class of "exact" stochastic games. First, for a fixed length or discount factor, the value converges as the stage duration goes to 0. Second, the asymptotic behavior of the value as the length goes to infinity, or as the discount factor goes to 0, does not depend on the stage duration. In addition, these properties imply the existence of the value of the finite-length or discounted continuous-time game (associated with a continuous-time jointly controlled Markov process) as the limit of the value of any discretization with vanishing mesh.
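The relation between T_h and the evolution equation can be sketched numerically: iterating T_{t/n} for n stages is exactly the explicit Euler scheme for ḟ_t = (T − Id)f_t, so two discretizations with small enough stage durations should nearly agree. The map T below is a hypothetical nonexpansive map on R², standing in for a Shapley operator; it is an illustration of the discretization, not of the paper's proofs.

```python
def T(f):
    # a hypothetical sup-norm nonexpansive map on R^2 (min and averaging
    # are both 1-Lipschitz), standing in for a Shapley operator
    return (min(f[0], f[1]) + 1.0, 0.5 * f[0] + 0.5 * f[1])

def T_h(h, f):
    # discretized operator T_h = h*T + (1 - h)*Id for stage duration h
    tf = T(f)
    return tuple(h * tf[k] + (1 - h) * f[k] for k in range(2))

def flow(t, n, f0=(0.0, 0.0)):
    # n steps of T_{t/n}: the explicit Euler scheme for  df/dt = (T - Id) f
    h, f = t / n, f0
    for _ in range(n):
        f = T_h(h, f)
    return f

coarse = flow(1.0, 50)      # stage duration h = 0.02
fine = flow(1.0, 5000)      # stage duration h = 0.0002
```

Refining the mesh barely moves the result, which is the discrete shadow of the convergence, as the stage duration vanishes, stated in the abstract.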
2011
We consider two-person zero-sum stochastic games with perfect information and, for each k ∈ Z+, introduce a new payoff function, called the k-total reward. For k = 0 and 1 these are the so-called mean and total rewards, respectively. For all k, we prove solvability of the considered games in pure stationary strategies, and show that the uniformly optimal strategies for the discounted mean payoff (discounted 0-reward) function are also uniformly optimal for k-total rewards if the discount factor is close enough (depending on k) to 1. We also demonstrate that the k-total reward games form a proper subset of the (k + 1)-total reward games for each k. In particular, all these classes contain mean-payoff games. This observation implies that, in the non-zero-sum case, Nash-solvability fails for all k.
IEEE Transactions on Automatic Control, 2000
Dynamic games in which each player has an exponential cost criterion are referred to as risk-sensitive dynamic games. In this note, Nash equilibria are considered for such games. Feedback risk-sensitive Nash equilibrium solutions are derived for two-person discrete-time linear-quadratic nonzero-sum games, both under complete state observation and under shared partial observation.
Handbook of Game Theory with Economic Applications, 2015
The survey presents recent results in the theory of two-person zero-sum repeated games and their connections with differential and continuous-time games. The emphasis is on the following points: (1) a general model allows one to deal simultaneously with stochastic and informational aspects; (2) all evaluations of the stage payoffs can be covered in the same framework (and not only the usual Cesàro and Abel means); (3) the model in discrete time can be seen and analyzed as a discretization of a continuous-time game; moreover, tools and ideas from repeated games are very fruitful for continuous-time games and vice versa; (4) numerous important conjectures have been answered (some in the negative); (5) new tools and original models have been proposed. As a consequence, the field (discrete versus continuous time, stochastic versus incomplete-information models) has a much more unified structure, and research is extremely active.
This paper deals with zero-sum stochastic differential games with long-run average payoffs. Our main objective is to give conditions for existence and characterization of bias and overtaking optimal equilibria. To this end, first we characterize the family of optimal average payoff strategies. Then, within this family, we impose suitable conditions to determine the subfamilies of bias and overtaking equilibria. A key step to obtain these facts is to show the existence of solutions to the average payoff optimality equations. This is done by the usual "vanishing discount" approach. Finally, a zero-sum game associated to a certain manufacturing process illustrates our results.
arXiv: Optimization and Control, 2018
We consider zero-sum stochastic games. For every discount factor λ, a time normalization allows one to represent the game as being played on the interval [0, 1]. We introduce the trajectories of cumulated expected payoff and of cumulated occupation measure up to time t ∈ [0, 1], under ε-optimal strategies. A limit optimal trajectory is defined as an accumulation point as the discount factor tends to 0. We study existence, uniqueness and characterization of these limit optimal trajectories for absorbing games.
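The time normalization can be made concrete: stage m of the λ-discounted game carries weight λ(1 − λ)^{m−1}, so it occupies a subinterval of [0, 1] ending at time t_m = 1 − (1 − λ)^m, and the cumulated expected payoff becomes a trajectory indexed by t. A minimal sketch with a hypothetical payoff stream; for a constant stream the trajectory is exactly linear in t:

```python
def normalized_trajectory(lam, payoffs):
    # map stage m to time t_m = 1 - (1 - lam)**m in [0, 1] and
    # accumulate the discounted stage payoff lam * (1 - lam)**(m-1) * g_m
    traj, cum = [], 0.0
    for m, g_m in enumerate(payoffs, start=1):
        cum += lam * (1 - lam) ** (m - 1) * g_m
        traj.append((1 - (1 - lam) ** m, cum))
    return traj

# constant payoff stream: the cumulated payoff at time t_m equals t_m
traj = normalized_trajectory(0.1, [1.0] * 200)
```

As λ tends to 0 the times t_m fill [0, 1] more and more finely, which is the setting in which the abstract's limit optimal trajectories are defined.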
Bernoulli, 2005
This paper is concerned with two-person zero-sum games for continuous-time Markov chains, with possibly unbounded payoff and transition rate functions, under the discounted payoff criterion. We give conditions under which the existence of the value of the game and a pair of optimal stationary strategies is ensured by using the optimality (or Shapley) equation. We prove the convergence of the value iteration scheme to the game's value and to a pair of optimal stationary strategies. Moreover, when the transition rates are bounded we further show that the convergence of value iteration is exponential. Our results are illustrated with a controlled queueing system with unbounded transition and reward rates.
arXiv (Cornell University), 2017
This paper provides sufficient conditions for the existence of solutions for two-person zero-sum games with inf/sup-compact payoff functions and with possibly noncompact decision sets for both players. Payoff functions may be unbounded, and we do not assume any convexity/concavity-type conditions. For such games the expected payoff may not exist for some pairs of strategies. The results of this paper imply several classic facts. The paper also provides sufficient conditions for the existence of a value and solutions for each player. The results are illustrated with the number-guessing game.
Stochastic Processes and their Applications, 2016
In this paper we consider two-person zero-sum risk-sensitive stochastic dynamic games with Borel state and action spaces and bounded reward. The term risk-sensitive refers to the fact that instead of the usual risk neutral optimization criterion we consider the exponential certainty equivalent. The discounted reward case on a finite and an infinite time horizon is considered, as well as the ergodic reward case. Under continuity and compactness conditions we prove that the value of the game exists and solves the Shapley equation and we show the existence of optimal (non-stationary) strategies. In the ergodic reward case we work with a local minorization property and a Lyapunov condition and show that the value of the game solves the Poisson equation. Moreover, we prove the existence of optimal stationary strategies. A simple example highlights the influence of the risk-sensitivity parameter. Our results generalize findings in [1] and answer an open question posed there.
Pacific Journal of Mathematics, 1975
Some sufficient conditions are given to show the existence of equilibrium points with finite spectrum for nonzero-sum two-person continuous games on the unit square. We also examine the question of uniqueness of the equilibrium point for such games. 1. Players I and II secretly choose an x and a y in the closed interval [0, 1]. Player I receives K_1(x, y) and player II receives K_2(x, y), where K_1, K_2 are continuous on the unit square. The following theorem is classical in game theory ([3], see page 156): there exists a pair of probability distributions (F°, G°), called a Nash equilibrium point, satisfying K_1(F°, G°) ≥ K_1(x, G°) for all x in 0 ≤ x ≤ 1 and K_2(F°, G°) ≥ K_2(F°, y) for all y in 0 ≤ y ≤ 1, where K_1(F, G) = ∬ K_1(x, y) dF(x) dG(y), K_1(x, G) = ∫ K_1(x, y) dG(y), etc. Let 𝒮 be the set of such pairs (F°, G°).
Dynamic Games and Applications, 2012
Consider a two-player zero-sum stochastic game with a Borel state space S, compact metric action spaces A, B, and a transition probability q such that the integral under q of every bounded measurable function depends measurably on the initial state s and continuously on the players' actions (a, b). Suppose the payoff is a bounded function f of the infinite histories of states and actions. Finally, assume that f is measurable for the product of the Borel topologies (of the coordinate spaces) and lower semicontinuous for the product of the discrete topologies. Then the game has a value, and player II has a subgame-perfect optimal strategy.
2011
We study the asymptotic value of zero-sum repeated games with a general evaluation of the sequence of stage payoffs. We prove the existence of the asymptotic value, in a robust sense, for repeated games with incomplete information, splitting games, and absorbing games. The proof technique consists of (1) embedding the discrete-time repeated game into a continuous-time game and (2) using viscosity solutions.