Academia.edu no longer supports Internet Explorer.
To browse Academia.edu and the wider internet faster and more securely, please take a few seconds to upgrade your browser.
2018
We consider a game-theoretical multi-agent learning problem where the feedback information can be lost and rewards are given by a broad class of games known as variationally stable games. We propose a simple variant of the online gradient descent algorithm, called reweighted online gradient descent (ROGD) and show that in variationally stable games, if each agent adopts reweighted online gradient descent learning dynamics, then almost sure convergence to the set of Nash equilibria is guaranteed, even when the feedback loss is asynchronous and arbitrarily corrrelated among agents. We then extend the framework to deal with unknown feedback loss probabilities by using an estimator (constructed from past data) in its replacement. Finally, we further extend the framework to accommodate both asynchronous loss and stochastic rewards and establish that multi-agent ROGD learning still converges to the set of Nash equilibria in such settings. Together, we make meaningful progress towards the ...
2018
We consider a game-theoretical multi-agent learning problem where the feedback information can be lost during the learning process and rewards are given by a broad class of games known as variationally stable games. We propose a simple variant of the classical online gradient descent algorithm, called reweighted online gradient descent (ROGD) and show that in variationally stable games, if each agent adopts ROGD, then almost sure convergence to the set of Nash equilibria is guaranteed, even when the feedback loss is asynchronous and arbitrarily corrrelated among agents. We then extend the framework to deal with unknown feedback loss probabilities by using an estimator (constructed from past data) in its replacement. Finally, we further extend the framework to accomodate both asynchronous loss and stochastic rewards and establish that multi-agent ROGD learning still converges to the set of Nash equilibria in such settings. Together, these results contribute to the broad lanscape of m...
2018
We examine the long-run behavior of multi-agent online learning in games that evolve over time. Specifically, we focus on a wide class of policies based on mirror descent, and we show that the induced sequence of play (a) converges to Nash equilibrium in time-varying games that stabilize in the long run to a strictly monotone limit; and (b) it stays asymptotically close to the evolving equilibrium of the sequence of stage games (assuming they are strongly monotone). Our results apply to both gradient-based and payoff-based feedback – i.e., when players only get to observe the payoffs of their chosen actions.
2017
Online Mirror Descent (OMD) is an important and widely used class of adaptive learning algorithms that enjoys good regret performance guarantees. It is therefore natural to study the evolution of the joint action in a multi-agent decision process (typically modeled as a repeated game) where every agent employs an OMD algorithm. This well-motivated question has received much attention in the literature that lies at the intersection between learning and games. However, much of the existing literature has been focused on the time average of the joint iterates. In this paper, we tackle a harder problem that is of practical utility, particularly in the online decision making setting: the convergence of the last iterate when all the agents make decisions according to OMD. We introduce an equilibrium stability notion called variational stability (VS) and show that in variationally stable games, the last iterate of OMD converges to the set of Nash equilibria. We also extend the OMD learning dynamics to a more general setting where the exact gradient is not available and show that the last iterate (now random) of OMD converges to the set of Nash equilibria almost surely.
IEEE Transactions on Automatic Control, 2021
Competitive non-cooperative online decision-making agents whose actions increase congestion of scarce resources constitute a model for widespread modern large-scale applications. To ensure sustainable resource behavior, we introduce a novel method to steer the agents toward a stable population state, fulfilling the given coupled resource constraints. The proposed method is a decentralized resource pricing method based on the resource loads resulting from the augmentation of the game's Lagrangian. Assuming that the online learning agents have only noisy first-order utility feedback, we show that for a polynomially decaying agents' step size/learning rate, the population's dynamic will almost surely converge to generalized Nash equilibrium. A particular consequence of the latter is the fulfillment of resource constraints in the asymptotic limit. Moreover, we investigate the finite-time quality of the proposed algorithm by giving a nonasymptotic time decaying bound for the expected amount of resource constraint violation.
arXiv (Cornell University), 2023
The behaviour of multi-agent learning in many player games has been shown to display complex dynamics outside of restrictive examples such as network zero-sum games. In addition, it has been shown that convergent behaviour is less likely to occur as the number of players increase. To make progress in resolving this problem, we study Q-Learning dynamics and determine a sufficient condition for the dynamics to converge to a unique equilibrium in any network game. We find that this condition depends on the nature of pairwise interactions and on the network structure, but is explicitly independent of the total number of agents in the game. We evaluate this result on a number of representative network games and show that, under suitable network conditions, stable learning dynamics can be achieved with an arbitrary number of agents.
2017
We consider a model of game-theoretic learning based on online mirror descent (OMD) with asynchronous and delayed feedback information. Instead of focusing on specific games, we consider a broad class of continuous games defined by the general equilibrium stability notion, which we call λ-variational stability. Our first contribution is that, in this class of games, the actual sequence of play induced by OMD-based learning converges to Nash equilibria provided that the feedback delays faced by the players are synchronous and bounded. Subsequently, to tackle fully decentralized, asynchronous environments with (possibly) unbounded delays between actions and feedback, we propose a variant of OMD which we call delayed mirror descent (DMD), and which relies on the repeated leveraging of past information. With this modification, the algorithm converges to Nash equilibria with no feedback synchronicity assumptions and even when the delays grow superlinearly relative to the horizon of play.
2009
We propose a new concept for the analysis of games, the TASP, which gives a precise prediction about non-equilibrium play in games whose Nash equilibria are mixed and are unstable under fictitious play-like learning. We show that, when players learn using weighted stochastic fictitious play and so place greater weight on recent experience, the time average of play often converges in these ���unstable��� games, even while mixed strategies and beliefs continue to cycle.
Proceedings of the tenth ACM SIGEVO workshop on Foundations of genetic algorithms, 2009
One issue in multi-agent co-adaptive learning concerns convergence. When two (or more) agents play a game with different information and different payoffs, the general behaviour tends to be oscillation around a Nash equilibrium. Several algorithms have been proposed to force convergence to mixed-strategy Nash equilibria in imperfect-information games when the agents are aware of their opponent's strategy. We consider the effect on one such algorithm, the lagging anchor algorithm, when each agent must also infer the gradient information from observations, in the infinitesimal time-step limit. Use of an estimated gradient, either by opponent modelling or stochastic gradient ascent, destabilises the algorithm in a region of parameter space. There are two phases of behaviour. If the rate of estimation is low, the Nash equilibrium becomes unstable in the mean. If the rate is high, the Nash equilibrium is an attractive fixed point in the mean, but the uncertainty acts as narrow-band coloured noise, which causes dampened oscillations.
IEEE Transactions on Systems, Man, and Cybernetics, 1994
A multi-person discrete game where the payoff after each play is stochastic is considered. The distribution of the random payoff is unknown to the players and further none of the players know the strategies or the actual moves of other players. A learning algorithm for the game based on a decentralized team of Learning Automata is presented. It is proved that all stable stationary points of the algorithm are Nash equilibria for the game. Two special cases of the game are also discussed, namely, game with common payoff and the relaxation labelling problem. The former has applications such as pattern recognition and the latter is a problem widely studied in computer vision. For the two special cases it is shown that the algorithm always converges to a desirable solution.
Computer and Information Sciences III, 2012
We consider a class of fully stochastic and fully distributed algorithms, that we prove to learn equilibria in games. Indeed, we consider a family of stochastic distributed dynamics that we prove to converge weakly (in the sense of weak convergence for probabilistic processes) towards their mean-field limit, i.e an ordinary differential equation (ODE) in the general case. We focus then on a class of stochastic dynamics where this ODE turns out to be related to multipopulation replicator dynamics. Using facts known about convergence of this ODE, we discuss the convergence of the initial stochastic dynamics: For general games, there might be non-convergence, but when convergence of the ODE holds, considered stochastic algorithms converge towards Nash equilibria. For games admitting Lyapunov functions, that we call Lyapunov games, the stochastic dynamics converge. We prove that any ordinal potential game, and hence any potential game is a Lyapunov game, with a multiaffine Lyapunov function. For Lyapunov games with a multiaffine Lyapunov function, we prove that this Lyapunov function is a super-martingale over the stochastic dynamics. This leads a way to provide bounds on their time of convergence by martingale arguments. This applies in particular for many classes of games that have been considered in literature, including several load balancing game scenarios and congestion games. 0
SIAM Journal on Control and Optimization, 2013
In this paper, we address the problem of convergence to Nash equilibria in games with rewards that are initially unknown and must be estimated over time from noisy observations. These games arise in many real-world applications, whenever rewards for actions cannot be prespecified and must be learnt on-line, but standard results in game theory do not consider such settings. For this problem, we derive a multi-agent version of Q-learning to estimate the reward functions using novel forms of the ε-greedy learning policy. Using these Q-learning schemes to estimate reward functions, we then provide conditions guaranteeing the convergence of adaptive play and the better-reply processes to Nash equilibria in potential games and games with more general forms of acyclicity, and of regret matching to the set of correlated equilibria in generic games. A secondary result is that we prove the strong ergoditicity of stochastic adaptive play and stochastic better-reply processes in the case of vanishing perturbations. Finally, we illustrate the efficacy of the algorithms in a set of randomly generated 3-player coordination games, and show the practical necessity of our results by demonstrating that violations to the derived learning parameter conditions can cause the algorithms to fail to converge.
2010
ABSTRACT This paper introduces a novel multiagent learning algorithm which achieves convergence, targeted optimality against memory bounded adversaries, and safety, in arbitrary repeated games. Called CMLeS, its most novel aspect is the manner in which it guarantees (in a PAC sense) targeted optimality against memory-bounded adversaries, via efficient exploration and exploitation. CMLeS is fully implemented and we present empirical results demonstrating its effectiveness.
arXiv (Cornell University), 2022
An abundance of recent impossibility results establish that regret minimization in Markov games with adversarial opponents is both statistically and computationally intractable. Nevertheless, none of these results preclude the possibility of regret minimization under the assumption that all parties adopt the same learning procedure. In this work, we present the first (to our knowledge) algorithm for learning in general-sum Markov games that provides sublinear regret guarantees when executed by all agents. The bounds we obtain are for swap regret, and thus, along the way, imply convergence to a correlated equilibrium. Our algorithm is decentralized, computationally efficient, and does not require any communication between agents. Our key observation is that online learning via policy optimization in Markov games essentially reduces to a form of weighted regret minimization, with unknown weights determined by the path length of the agents' policy sequence. Consequently, controlling the path length leads to weighted regret objectives for which sufficiently adaptive algorithms provide sublinear regret guarantees.
Automatica, 2012
Multi-agent systems arise in several domains of engineering and can be used to solve problems which are difficult for an individual agent to solve. Strategies for team decision problems, including optimal control, N-player games (H-infinity control and non-zero sum), and so on are normally solved for off-line by solving associated matrix equations such as the coupled Riccati equations or coupled Hamilton-Jacobi equations. However, using that approach players cannot change their objectives online in real time without calling for a completely new off-line solution for the new strategies. Therefore, in this paper we bring together cooperative control, reinforcement learning, and game theory to present a multiagent formulation for the online solution of team games. The notion of graphical games is developed for dynamical systems, where the dynamics and performance indices for each node depend only on local neighbor information. It is shown that standard definitions for Nash equilibrium are not sufficient for graphical games and a new definition of ''Interactive Nash Equilibrium'' is given. We give a cooperative policy iteration algorithm for graphical games that converges to the best response when the neighbors of each agent do not update their policies, and to the cooperative Nash equilibrium when all agents update their policies simultaneously. This is used to develop methods for online adaptive learning solutions of graphical games in real time along with proofs of stability and convergence.
Theoretical Economics, 2020
While payoff‐based learning models are almost exclusively devised for finite action games, where players can test every action, it is harder to design such learning processes for continuous games. We construct a stochastic learning rule, designed for games with continuous action sets, which requires no sophistication from the players and is simple to implement: players update their actions according to variations in own payoff between current and previous action. We then analyze its behavior in several classes of continuous games and show that convergence to a stable Nash equilibrium is guaranteed in all games with strategic complements as well as in concave games, while convergence to Nash equilibrium occurs in all locally ordinal potential games as soon as Nash equilibria are isolated.
2011
In this paper, we address the problem of convergence to Nash equilibria in games with rewards that are initially unknown and which must be estimated over time from noisy observations. These games arise in many real-world applications, whenever rewards for actions cannot be prespecified and must be learned on-line. Standard results in game theory, however, do not consider such settings. Specifically, using results from stochastic approximation and differential inclusions, we prove the convergence of variants of fictitious play and adaptive play to Nash equilibria in potential games and weakly acyclic games, respectively. These variants all use a multi-agent version of Q-learning to estimate the reward functions and a novel form of the e-greedy decision rule to select an action. Furthermore, we derive e-greedy decision rules that exploit the sparse interaction structure encoded in two compact graphical representations of games, known as graphical and hypergraphical normal form, to improve the convergence rate of the learning algorithms. The structure captured in these representations naturally occurs in many distributed optimisation and control applications. Finally, we demonstrate the efficacy of the algorithms in a simulated ad hoc wireless sensor network management problem.
IEEE Transactions on Automatic Control, 2018
The distributed computation of Nash equilibria is assuming growing relevance in engineering where such problems emerge in the context of distributed control. Accordingly, we present schemes for computing equilibria of two classes of static stochastic convex games complicated by a parametric misspecification, a natural concern in the control of large-scale networked engineered system. In both schemes, players learn the equilibrium strategy while resolving the misspecification: (1) Monotone stochastic Nash games: We present a set of coupled stochastic approximation schemes distributed across agents in which the first scheme updates each agent's strategy via a projected (stochastic) gradient step while the second scheme updates every agent's belief regarding its misspecified parameter using an independently specified learning problem. We proceed to show that the produced sequences converge in an almost-sure sense to the true equilibrium strategy and the true parameter , respectively. Surprisingly, convergence in the equilibrium strategy achieves the optimal rate of convergence in a mean-squared sense with a quantifiable degradation in the rate constant; (2) Stochastic Nash-Cournot games with unobservable aggregate output: We refine (1) to a Cournot setting where we assume that the tuple of strategies is unobservable while payoff functions and strategy sets are public knowledge through a common knowledge assumption. By utilizing observations of noise-corrupted prices, iterative fixedpoint schemes are developed, allowing for simultaneously learning the equilibrium strategies and the misspecified parameter in an almost-sure sense. We emphasize that this learning problem is unrelated to the computational process and is a built from a set of independently collected observations.
2010
ABSTRACT This paper introduces a new multi-agent learning algorithm for stochastic games based on replicator dynamics from evolutionary game theory. We identify and transfer desired convergence behavior of these dynamical systems by leveraging the link between evolutionary game theory and multiagent reinforcement learning.
2008
The traditional agenda in Multiagent Learning (MAL) has been to develop learners that guarantee convergence to an equilibrium in self-play or that converge to playing the best response against an opponent using one of a fixed set of known targeted strategies. This paper introduces an algorithm called Learn or Exploit for Adversary Induced Markov Decision Process (LoE-AIM) that targets optimality against any learning opponent that can be treated as a memory bounded adversary.
Optimization Methods and Software, 2019
This paper presents a dynamic regret analysis on the decentralized online convex optimization problems computed over a network of agents. The goal is to distributively optimize a global function which can be decomposed into the summation of local cost functions associated with each agent. By exploiting the convexity of cost functions, previous studies have shown that the dynamic regret of a decentralized gradient-based algorithm can be upper bounded by the path-length of the comparator sequence and the connectivity in the underlying network. In this paper, we illustrate that the dynamic regret can be further improved by allowing the learner to use a class of adaptive search directions and step-sizes from estimates of first and second moments of the gradients. Specifically, we develop a new class of decentralized and stochastic algorithms based on the adaptive gradient method in the dynamic environment and show that their regret bounds for certain realistic classes of loss functions are considerably better than existing bounds. Numerical simulations verify the theoretical results and demonstrate the efficiency of the new proposed method in practice.
Loading Preview
Sorry, preview is currently unavailable. You can download the paper by clicking the button above.