Papers by Ganesh Sambhaji Ghalme

arXiv (Cornell University), Nov 4, 2022
We study the classic problem of fairly dividing a heterogeneous and divisible resource, modeled as a line segment [0, 1] and typically called a cake, among n agents. This work considers an interesting variant of the problem in which the agents are embedded on a graph. The graphical constraint entails that each agent evaluates her allocated share only against her neighbors' shares. Given a graph, the goal is to efficiently find a locally envy-free allocation, where every agent values her share of the cake at least as much as any of her neighbors' shares. The most significant contribution of this work is a bounded protocol that finds a locally envy-free allocation among n agents on a tree graph using n^O(n) queries under the standard Robertson-Webb (RW) query model. The query complexity of our proposed protocol, though exponential, significantly improves on the currently best known hyper-exponential query complexity bound of Aziz and Mackenzie [AM16a] for complete graphs. In particular, we also show that if the underlying tree graph has depth at most two, one can find a locally envy-free allocation with O(n^4 log n) RW queries. This is the first and only known locally envy-free cake-cutting protocol with polynomial query complexity for a non-trivial graph structure. Interestingly, our discrete protocols are simple and easy to understand, as opposed to the highly involved protocol of [AM16a]. This simplicity can be attributed to their recursive nature and the use of a single agent as a designated cutter. We believe that these results will help improve our algorithmic understanding of the arguably challenging problem of envy-free cake-cutting by uncovering the bottlenecks in its query complexity and its relation to the underlying graph structures.
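To make the Robertson-Webb (RW) query model concrete, here is a minimal sketch: the cake is [0, 1], and an agent exposes only an eval query (her value for a sub-interval) and a cut query (the leftmost point yielding a given value). The uniform valuation and the classic two-agent cut-and-choose routine below are illustrative assumptions, not the tree protocol from the paper.

```python
class UniformAgent:
    """Agent whose value density is uniform on [0, 1] (an assumption)."""

    def eval(self, x: float, y: float) -> float:
        # RW eval query: value of the sub-interval [x, y].
        return max(0.0, y - x)

    def cut(self, x: float, alpha: float) -> float:
        # RW cut query: leftmost y with eval(x, y) == alpha.
        return x + alpha


def cut_and_choose(a, b):
    """Two-agent envy-free division: a cuts the cake into two pieces she
    values equally; b picks the piece she prefers; a takes the other."""
    mid = a.cut(0.0, a.eval(0.0, 1.0) / 2.0)
    left, right = (0.0, mid), (mid, 1.0)
    if b.eval(*left) >= b.eval(*right):
        return right, left   # (a's piece, b's piece)
    return left, right
```

The protocol in the paper generalizes this designated-cutter idea recursively over a tree; the sketch only shows the query interface it is measured against.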

Fairness has emerged as an important concern in automated decision-making in recent years, especially when these decisions affect human welfare. In this work, we study fairness in temporally extended decision-making settings, specifically those formulated as Markov Decision Processes (MDPs). Our proposed notion of fairness ensures that each state's long-term visitation frequency is at least a specified fraction. This quota-based notion of fairness is natural in many resource-allocation settings where the dynamics of a single resource being allocated are governed by an MDP and the distribution of the shared resource is captured by its state-visitation frequency. In an average-reward MDP (AMDP) setting, we formulate the problem as a bilinear saddle point program and, for a generative model, solve it using a Stochastic Mirror Descent (SMD) based algorithm. The proposed solution guarantees a simultaneous approximation on the expected average reward and the fairness requirement. We give sample complexity bounds for the proposed algorithm and validate our theoretical results with experiments on simulated data.
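A toy illustration of the quota-based fairness notion above: under a fixed policy, a state's long-term visitation frequency is its stationary probability in the induced Markov chain, and fairness asks that each such frequency meet a quota. The 2-state chain and quota values below are assumptions chosen for illustration; the paper's SMD machinery for optimizing over policies is not reproduced.

```python
def stationary(P, iters=10_000):
    """Power iteration for the stationary distribution of a row-stochastic
    transition matrix P, given as a list of rows."""
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(iters):
        pi = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
    return pi


# Hypothetical transition matrix induced by some fixed policy.
P = [[0.9, 0.1],
     [0.5, 0.5]]
quota = [0.1, 0.1]   # required minimum visitation fraction per state

pi = stationary(P)
is_fair = all(p >= q for p, q in zip(pi, quota))
```

Here the stationary distribution works out to (5/6, 1/6), so both states clear the 0.1 quota; the paper's algorithm instead searches for a policy that satisfies such quotas while approximately maximizing average reward.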

Adaptive Agents and Multi-Agent Systems, May 9, 2016
We consider a general crowdsourcing setting with strategic workers whose qualities are unknown, and design a multi-armed bandit (MAB) mechanism, CrowdUCB, which is deterministic, regret minimizing, and offers immediate payments to the workers. The problem involves sequentially selecting workers to process tasks in order to maximize social welfare while learning the qualities of the strategic workers (strategic about their costs). Existing MAB mechanisms are either: (a) deterministic, which potentially causes significant loss in social welfare, or (b) randomized, which typically leads to high variance in payments. CrowdUCB completely addresses the above problems with the following features: (i) it offers deterministic payments, (ii) achieves logarithmic regret in social welfare, (iii) renders allocations more effective by allocating blocks of tasks to a worker instead of a single task, and (iv) offers payment to a worker immediately upon completion of an assigned block of tasks. CrowdUCB is a mechanism with learning that learns the qualities of the workers while eliciting their true costs, irrespective of whether or not the workers know their own qualities. We show that CrowdUCB is ex-post individually rational (EPIR) and ex-post incentive compatible (EPIC) when the workers do not know their own qualities and update their beliefs in sync with the requester. When the workers know their own qualities, CrowdUCB is EPIR and ε-EPIC, where ε is sub-linear in the number of tasks.
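A hedged sketch of the UCB-style index that mechanisms in this family use to trade off a worker's estimated quality against uncertainty. The function names, the Bernoulli quality model, and the exploration constant are illustrative assumptions; CrowdUCB's payment rule and block allocation are not reproduced here.

```python
import math


def ucb_index(successes: int, pulls: int, t: int) -> float:
    """Empirical quality plus a confidence-based exploration bonus."""
    return successes / pulls + math.sqrt(2 * math.log(t) / pulls)


def select_worker(stats: dict, t: int):
    """Pick the worker with the highest UCB index.
    stats maps worker id -> (successes, pulls)."""
    return max(stats, key=lambda w: ucb_index(*stats[w], t))
```

A worker with a high empirical quality or few observations gets a high index, so selection naturally balances exploiting good workers against learning about under-sampled ones.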

Artificial Intelligence, 2021
In this paper, we introduce ballooning multi-armed bandits (BL-MAB), a novel extension of the classical stochastic MAB model. In the BL-MAB model, the set of available arms grows (or balloons) over time. In contrast to the classical MAB setting where the regret is computed with respect to the best arm overall, the regret in a BL-MAB setting is computed with respect to the best available arm at each time. We first observe that the existing stochastic MAB algorithms result in linear regret for the BL-MAB model. We prove that, if the best arm is equally likely to arrive at any time instant, a sub-linear regret cannot be achieved. Next, we show that if the best arm is more likely to arrive in the early rounds, one can achieve sub-linear regret. Our proposed algorithm determines (1) the fraction of the time horizon for which the newly arriving arms should be explored and (2) the sequence of arm pulls in the exploitation phase from among the explored arms. Making reasonable assumptions on the arrival distribution of the best arm in terms of the thinness of the distribution's tail, we prove that the proposed algorithm achieves sub-linear instance-independent regret. We further quantify explicit dependence of regret on the arrival distribution parameters. We reinforce our theoretical findings with extensive simulation results. We conclude by showing that our algorithm would achieve sub-linear regret even if (a) the distributional parameters are not exactly known, but are obtained using a reasonable learning mechanism or (b) the best arm is not more likely to arrive early, but a large fraction of arms is likely to arrive relatively early.
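An illustrative explore-then-exploit schedule in the spirit of the BL-MAB structure described above: newly arriving arms are explored (pulled once) only during an initial fraction of the horizon, and otherwise the empirically best explored arm is pulled. The exploration fraction and the reward model are assumptions, not the paper's distribution-dependent tuning.

```python
def run_bl_mab(horizon, arrivals, pull, explore_frac=0.5):
    """arrivals: dict round -> list of newly arriving arm ids;
    pull(arm) -> observed reward. Returns per-arm [total_reward, pulls]."""
    cutoff = int(explore_frac * horizon)
    stats = {}  # arm -> [total_reward, num_pulls]
    for t in range(horizon):
        new = arrivals.get(t, [])
        if t < cutoff and new:
            arm = new[0]                      # explore a newly arrived arm
            stats[arm] = [pull(arm), 1]
        elif stats:
            best = max(stats, key=lambda a: stats[a][0] / stats[a][1])
            stats[best][0] += pull(best)      # exploit empirically best arm
            stats[best][1] += 1
    return stats
```

Arms arriving after the cutoff are never explored, which mirrors why sub-linear regret requires the best arm to be likely to arrive early.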

ArXiv, 2021
In this paper, we study an interesting combination of sleeping and combinatorial stochastic bandits. In the mixed model studied here, at each discrete time instant, an arbitrary availability set is generated from a fixed set of base arms. An algorithm can select a subset of arms from the availability set (sleeping bandits) and receive the corresponding reward along with semi-bandit feedback (combinatorial bandits). We adapt the well-known CUCB algorithm to the sleeping combinatorial bandits setting and refer to it as CS-UCB. We prove, under mild smoothness conditions, that the CS-UCB algorithm achieves an O(log(T)) instance-dependent regret guarantee. We further prove that (i) when the range of the rewards is bounded, the regret guarantee of the CS-UCB algorithm is O(√(T log(T))) and (ii) the instance-independent regret is O(∛(T² log(T))) in a general setting. Our results are quite general and hold under general environments, such as non-additive reward functions, volatile arm ...
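A sketch of the per-round selection step in a CUCB-style algorithm restricted to the availability set, as in CS-UCB: each available base arm gets a UCB index, and a simple top-k oracle (standing in for the general combinatorial oracle, an assumption here) picks the played subset.

```python
import math


def cs_ucb_select(available, stats, t, k):
    """available: set of base arms awake this round;
    stats: arm -> (mean_estimate, pulls); returns the chosen subset of size k."""
    def index(arm):
        mean, pulls = stats[arm]
        if pulls == 0:
            return float('inf')   # force at least one pull of each arm
        return mean + math.sqrt(1.5 * math.log(t) / pulls)

    return sorted(available, key=index, reverse=True)[:k]
```

Semi-bandit feedback would then update the (mean, pulls) statistics of every arm in the chosen subset, not just one.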

ArXiv, 2021
Strategic classification studies the interaction between a classification rule and the strategic agents it governs. Under the assumption that the classifier is known, rational agents respond to it by manipulating their features. However, in many real-life scenarios of high-stakes classification (e.g., credit scoring), the classifier is not revealed to the agents, which leads agents to attempt to learn the classifier and game it too. In this paper, we generalize the strategic classification model to such scenarios. We define the "price of opacity" as the difference in prediction error between opaque and transparent strategy-robust classifiers, characterize it, and give a sufficient condition for this price to be strictly positive, in which case transparency is the recommended policy. Our experiments show how Hardt et al.'s robust classifier is affected by keeping agents in the dark.
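For context, the known-classifier baseline that this paper relaxes can be sketched as a simple best response: facing a known threshold classifier, a rational agent moves her feature to the threshold exactly when the (here, linear) manipulation cost is below the gain from a positive label. The cost model and gain value are assumptions for illustration.

```python
def best_response(x, threshold, cost_per_unit=1.0, gain=1.0):
    """Feature value a rational agent reports against a known
    threshold classifier, under a linear manipulation cost."""
    if x >= threshold:
        return x                                # already classified positively
    move_cost = cost_per_unit * (threshold - x)
    return threshold if move_cost < gain else x
```

When the classifier is opaque, agents must instead respond to a learned estimate of it, which is the scenario the paper models.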

We consider the problem of designing a robust credit score function in the context of online discussion forums. A credit score function assigns a real-valued credit score to each participant based on her activities on the forum; a participant's credit score quantifies the usefulness of her contributions. However, participants can manipulate a credit score function by forming coalitions, i.e., by strategically awarding upvotes, likes, etc. among a subset of agents to maximize their credit scores. We propose a coalition-resistant credit score function that discourages such strategic endorsements. We use community detection algorithms to identify close-knit communities in the graph of interactions and characterize a coalition-identifying community detection metric. In particular, we show that modularity is coalition identifying and provide theoretical guarantees on a modularity-based credit score function. Finally, we validate our theoretical findings with simulations on illustrative ...
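A small stdlib-only sketch of the modularity metric the text calls coalition identifying: Q = (1/2m) Σ_ij [A_ij − k_i k_j / (2m)] δ(c_i, c_j), i.e., the fraction of edges inside communities minus what a random graph with the same degrees would place there. The example graph and partition are illustrative.

```python
def modularity(adj, communities):
    """adj: dict node -> set of neighbours (undirected graph);
    communities: dict node -> community label."""
    m2 = sum(len(nbrs) for nbrs in adj.values())  # 2m: each edge counted twice
    q = 0.0
    for i in adj:
        for j in adj:
            if communities[i] != communities[j]:
                continue  # delta term: only same-community pairs contribute
            a_ij = 1.0 if j in adj[i] else 0.0
            q += a_ij - len(adj[i]) * len(adj[j]) / m2
    return q / m2
```

A coalition that concentrates its endorsements among its own members shows up as a high-modularity community in the interaction graph, which is what the proposed score function penalizes.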

This paper explores Thompson sampling in the context of mechanism design for stochastic multi-armed bandit (MAB) problems. The setting is that of an MAB problem where the reward distribution of each arm consists of a stochastic component as well as a strategic component. Many existing MAB mechanisms use upper confidence bound (UCB) based algorithms for learning the parameters of the reward distribution. The randomized nature of Thompson sampling introduces certain unique, non-trivial challenges for mechanism design, which we address in this paper through a rigorous regret analysis. We first propose an MAB mechanism with a deterministic payment rule, namely TSM-D. We show that in TSM-D, the variance of agent utilities asymptotically approaches zero. However, the game-theoretic properties satisfied by TSM-D (incentive compatibility and individual rationality with high probability) are rather weak. As our main contribution, we then propose the mechanism TSM-R, with a randomized payment rule, ...
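A sketch of the Thompson sampling step underlying such mechanisms: each arm's Bernoulli quality carries a Beta posterior, one sample per arm is drawn, and the highest sample is played. The payment rules of TSM-D and TSM-R are not shown; the posterior parameters below are illustrative.

```python
import random


def thompson_select(posteriors, rng=random):
    """posteriors: arm -> (alpha, beta) parameters of a Beta posterior.
    Draw one sample per arm and play the arm with the largest sample."""
    samples = {arm: rng.betavariate(a, b) for arm, (a, b) in posteriors.items()}
    return max(samples, key=samples.get)
```

It is precisely this per-round randomness in the selection (absent in UCB) that complicates the incentive analysis the abstract refers to.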

We revisit the problem of fair clustering, first introduced by Chierichetti et al. (2017), that requires each protected attribute to have approximately equal representation in every cluster; i.e., a Balance property. Existing solutions to fair clustering are either not scalable or do not achieve an optimal trade-off between the clustering objective and fairness. In this paper, we propose a new notion of fairness, which we call τ-ratio fairness, that strictly generalizes the Balance property and enables a fine-grained efficiency vs. fairness trade-off. Furthermore, we show that simple greedy round-robin based algorithms achieve this trade-off efficiently. Under a more general setting of multi-valued protected attributes, we rigorously analyze the theoretical properties of our algorithms. Our experimental results suggest that the proposed solution outperforms all the state-of-the-art algorithms and works exceptionally well even for a large number of clusters.
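The round-robin idea mentioned above can be sketched as follows: points from each protected group are dealt to clusters in turn, so each cluster's per-group counts differ by at most one, which is a simple way to meet a balance-style constraint. Cluster quality (distances) is deliberately ignored in this toy version; the paper's algorithms combine the draft with the clustering objective.

```python
def round_robin_assign(points_by_group, k):
    """points_by_group: dict group label -> list of point ids.
    Deal each group's points to the k clusters in round-robin order."""
    clusters = {c: [] for c in range(k)}
    for group, points in points_by_group.items():
        for i, p in enumerate(points):
            clusters[i % k].append(p)
    return clusters
```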
ArXiv, 2019
We study fair division of indivisible goods among strategic agents in a single-parameter environment. This work specifically considers fairness in terms of envy-freeness up to one good (EF1) and the maximin share guarantee (MMS). We show that (in a single-parameter environment) the problem of maximizing welfare, subject to the constraint that the allocation of the indivisible goods is EF1, admits a polynomial-time, 1/2-approximate, truthful auction. Under the MMS setup, we develop a truthful auction which efficiently finds an allocation wherein each agent gets a bundle of value at least (1/2 − ε) times her maximin share and the welfare of the computed allocation is at least the optimal; here ε > 0 is a fixed constant. Our results for EF1 and MMS are based on establishing interesting majorization inequalities.
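For intuition on the EF1 property, here is the classic (incentive-free) round-robin draft that always produces an EF1 allocation: agents take turns picking their most-valued remaining good. This is only background; the paper's truthful-auction machinery for strategic agents is not reproduced.

```python
def round_robin_draft(valuations):
    """valuations: one dict (good -> value) per agent, all over the same goods.
    Returns a list of bundles (sets of goods), one per agent."""
    remaining = set(valuations[0])
    bundles = [set() for _ in valuations]
    turn = 0
    while remaining:
        agent = turn % len(valuations)
        pick = max(remaining, key=lambda g: valuations[agent][g])
        bundles[agent].add(pick)   # agent takes her favourite remaining good
        remaining.remove(pick)
        turn += 1
    return bundles
```

Any envy an agent feels toward an earlier picker disappears once one good is removed from that picker's bundle, which is exactly the EF1 condition.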

Proceedings of the AAAI Conference on Artificial Intelligence, 2020
We study an interesting variant of the stochastic multi-armed bandit problem, which we call the Fair-MAB problem, where, in addition to the objective of maximizing the sum of expected rewards, the algorithm also needs to ensure that at any time, each arm is pulled at least a pre-specified fraction of times. We investigate the interplay between learning and fairness in terms of a pre-specified vector denoting the fractions of guaranteed pulls. We define a fairness-aware regret, which we call r-Regret, that takes into account the above fairness constraints and extends the conventional notion of regret in a natural way. Our primary contribution is to obtain a complete characterization of a class of Fair-MAB algorithms via two parameters: the unfairness tolerance and the learning algorithm used as a black-box. For this class of algorithms, we provide a fairness guarantee that holds uniformly over time, irrespective of the choice of the learning algorithm. Further, when the learning algo...
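The meta-algorithm structure described above can be sketched in a few lines: before each round, if any arm's pull count has fallen below its guaranteed quota so far, pull that arm; otherwise defer to the black-box learning algorithm. The unfairness-tolerance handling of the actual algorithm is simplified away here.

```python
def fair_select(pulls, quotas, t, learner_choice):
    """pulls: arm -> number of pulls so far; quotas: arm -> guaranteed
    fraction of pulls; t: rounds elapsed; learner_choice: the arm the
    black-box learning algorithm would pull this round."""
    for arm, frac in quotas.items():
        if pulls.get(arm, 0) < frac * t:
            return arm            # quota deficit: forced fair pull
    return learner_choice         # no deficit: defer to the learner
```

Because the quota check runs every round, the fairness guarantee holds uniformly over time regardless of which learner is plugged in, mirroring the black-box characterization in the abstract.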