Papers by Grant R Y Schoenebeck

arXiv (Cornell University), Jun 22, 2022
In the literature of data privacy, differential privacy is the most popular model. An algorithm is differentially private if its outputs with and without any individual's data are indistinguishable. In this paper, we focus on data generated from a Markov chain and argue that Bayesian differential privacy (BDP) offers more meaningful guarantees in this context. Our main theoretical contribution is providing a mechanism for achieving BDP when data is drawn from a binary Markov chain. We improve on the state-of-the-art BDP mechanism and show that our mechanism provides the optimal noise-privacy tradeoffs for any local mechanism up to negligible factors. We also briefly discuss a non-local mechanism which adds correlated noise. Lastly, we perform experiments on synthetic data that detail when DP is insufficient, and experiments on real data to show that our privacy guarantees are robust to underlying distributions that are not simple Markov chains.
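As a point of reference for the local mechanisms discussed above, here is a minimal sketch of a per-symbol randomized-response scheme applied to binary Markov chain data. This is only a baseline local-DP mechanism, not the paper's optimized BDP mechanism; the function names and parameter values are illustrative.

```python
import math
import random

def randomized_response(bits, epsilon):
    """Flip each bit independently with probability 1/(1 + e^epsilon).

    This per-symbol flip gives each output bit a local-DP guarantee on
    its own; it does NOT by itself account for the Markov correlations
    that motivate BDP.
    """
    p_flip = 1.0 / (1.0 + math.exp(epsilon))
    return [b ^ (random.random() < p_flip) for b in bits]

def sample_binary_markov_chain(n, p_stay):
    """Sample a length-n {0,1} chain that keeps its current state
    with probability p_stay at each step."""
    chain = [random.randrange(2)]
    for _ in range(n - 1):
        chain.append(chain[-1] if random.random() < p_stay else 1 - chain[-1])
    return chain

random.seed(0)
chain = sample_binary_markov_chain(1000, p_stay=0.9)
noisy = randomized_response(chain, epsilon=1.0)
```

With epsilon = 1, each bit flips with probability about 0.27, so the released chain preserves aggregate statistics while masking individual transitions.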

Proceedings of the ... AAAI Conference on Artificial Intelligence, Apr 3, 2020
We study learning statistical properties from strategic agents with private information. In this problem, agents must be incentivized to truthfully reveal their information even when it cannot be directly verified. Moreover, the information reported by the agents must be aggregated into a statistical estimate. We study two fundamental statistical properties: estimating the mean of an unknown Gaussian, and linear regression with Gaussian error. The information of each agent is one point in a Euclidean space. Our main results are two mechanisms for each of these problems which optimally aggregate the information of agents in the truth-telling equilibrium: • A minimal (non-revelation) mechanism for large populations: agents only need to report one value, but that value need not be their point. • A mechanism for small populations that is non-minimal: agents need to answer more than one question. These mechanisms are "informed truthful" mechanisms where reporting unaltered data (truth-telling) 1) forms a strict Bayesian Nash equilibrium and 2) has strictly higher welfare than any oblivious equilibrium where agents' strategies are independent of their private signals. We also show a minimal revelation mechanism (each agent only reports her signal) for a restricted setting and use an impossibility result to prove the necessity of this restriction. We build upon the peer prediction literature in the single-question setting; however, most previous work in this area focuses on discrete signals, whereas our setting is inherently continuous, and we further simplify the agents' reports.


Springer eBooks, 2011
We study a model of learning on social networks in dynamic environments, describing a group of agents who are each trying to estimate an underlying state that varies over time, given access to weak signals and the estimates of their social network neighbors. We study three models of agent behavior. In the fixed response model, agents use a fixed linear combination to incorporate information from their peers into their own estimate. This can be thought of as an extension of the DeGroot model to a dynamic setting. In the best response model, players calculate minimum variance linear estimators of the underlying state. We show that regardless of the initial configuration, fixed response dynamics converge to a steady state, and that the same holds for best response on the complete graph. We show that best response dynamics can, in the long term, lead to estimators with higher variance than is achievable using well chosen fixed responses. The penultimate prediction model is an elaboration of the best response model. While this model only slightly complicates the computations required of the agents, we show that in some cases it greatly increases the efficiency of learning, and on complete graphs is in fact optimal, in a strong sense.
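A minimal simulation sketch of the fixed response model, under the assumption that each agent mixes a fresh noisy signal about the state with the average of its neighbors' current estimates using a fixed weight; the weight, noise level, and network are hypothetical toy choices.

```python
import random

def fixed_response_step(estimates, neighbors, weight, state, noise_sd):
    """One synchronous round: each agent forms a convex combination of a
    fresh weak signal about the state and the average of its neighbors'
    current estimates, using a fixed weight (a DeGroot-style update)."""
    new = []
    for i in range(len(estimates)):
        signal = state + random.gauss(0.0, noise_sd)
        peer_avg = sum(estimates[j] for j in neighbors[i]) / len(neighbors[i])
        new.append(weight * signal + (1 - weight) * peer_avg)
    return new

random.seed(0)
n = 5
neighbors = [[j for j in range(n) if j != i] for i in range(n)]  # complete graph
estimates = [0.0] * n
for _ in range(50):
    estimates = fixed_response_step(estimates, neighbors,
                                    weight=0.2, state=1.0, noise_sd=0.1)
```

After a few dozen rounds the estimates settle near the true state, with a steady-state variance determined by the fixed weight, which is exactly the tradeoff the best response model tries to optimize.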
Proceedings of the Twenty-Ninth Annual ACM-SIAM Symposium on Discrete Algorithms

Proceedings of the ACM Web Conference 2023
Peer prediction aims to incentivize truthful reports from agents whose reports cannot be assessed against any objective ground truth. In the multi-task setting where each agent is asked multiple questions, a sequence of mechanisms have been proposed which are truthful (truth-telling is guaranteed to be an equilibrium) or, even better, informed truthful (truth-telling is guaranteed to be one of the best-paid equilibria). However, these guarantees assume agents' strategies are restricted to be task-independent: an agent's report on a task is not affected by her information about other tasks. We provide the first discussion of how to design (informed) truthful mechanisms for task-dependent strategies, which allow an agent to report based on all her information about the assigned tasks. We call such stronger mechanisms (informed) omni-truthful. In particular, we propose the joint-disjoint task framework, a new paradigm which builds upon the previous penalty-bonus task framework. First, we show a natural reduction from mechanisms in the penalty-bonus task framework to mechanisms in the joint-disjoint task framework that maps every truthful mechanism to an omni-truthful mechanism. Such a reduction is non-trivial, as we show that current penalty-bonus task mechanisms are not, in general, omni-truthful. Second, for a stronger truthful guarantee, we design the matching agreement (MA) mechanism, which is informed omni-truthful. Finally, for the MA mechanism in the detail-free setting where no prior knowledge is assumed, we show how many tasks are required to (approximately) retain the truthful guarantees. CCS CONCEPTS • Theory of computation → Algorithmic mechanism design; • Information systems → Incentive schemes; • Mathematics of computing → Probability and statistics.
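For context, the penalty-bonus task framework that the joint-disjoint framework builds on can be sketched roughly as follows. This is a generic agreement-based payment, not the paper's matching agreement mechanism, and the task-splitting details here are assumptions for illustration.

```python
def pb_payment(alice, bob, bonus_tasks, penalty_pairs):
    """Generic penalty-bonus style payment for agent `alice` scored
    against peer `bob` (a sketch of the framework, not the paper's
    matching agreement mechanism).

    alice, bob: dicts mapping task id -> report.
    bonus_tasks: shared tasks where agreement is rewarded.
    penalty_pairs: (alice_task, bob_task) pairs of distinct tasks where
    agreement is penalized, cancelling the payoff of blind copying.
    """
    bonus = sum(alice[t] == bob[t] for t in bonus_tasks) / len(bonus_tasks)
    penalty = sum(alice[ta] == bob[tb]
                  for ta, tb in penalty_pairs) / len(penalty_pairs)
    return bonus - penalty

# agents who report one fixed label everywhere earn nothing on net
alice = {t: 1 for t in range(4)}
bob = {t: 1 for t in range(4)}
flat = pb_payment(alice, bob, bonus_tasks=[0, 1], penalty_pairs=[(2, 3), (3, 2)])

# correlated, task-varying reports beat the penalty baseline
alice2 = {0: 1, 1: 0, 2: 1, 3: 0}
bob2 = {0: 1, 1: 0, 2: 1, 3: 0}
informative = pb_payment(alice2, bob2, bonus_tasks=[0, 1],
                         penalty_pairs=[(2, 3), (3, 2)])
```

The abstract's point is that such payments assume task-independent strategies: an agent who conditions her report on which tasks she was assigned can game schemes of this shape, which is what the joint-disjoint framework repairs.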

Proceedings of the 2018 ACM Conference on Economics and Computation
A central question of crowdsourcing is how to elicit expertise from agents. This is even more difficult when answers cannot be directly verified. A key challenge is that sophisticated agents may strategically withhold effort or information when they believe their payoff will be based upon comparison with other agents whose reports will likely omit this information due to lack of effort or expertise. Our work defines a natural model for this setting based on the assumption that more sophisticated agents know the beliefs of less sophisticated agents. We then provide a mechanism design framework for this setting. From this framework, we design several novel mechanisms, for both the single- and multiple-task settings, that (1) encourage agents to invest effort and provide their information honestly; and (2) output a correct "hierarchy" of the information when agents are rational.
Proceedings of the ACM Web Conference 2022

arXiv (Cornell University), Aug 8, 2021
We consider two-alternative elections where voters' preferences depend on a state variable that is not directly observable. Each voter receives a private signal that is correlated to the state variable. Voters may be "contingent," with different preferences in different states, or "predetermined," with the same preference in every state. In this setting, even if every voter is a contingent voter, agents voting according to their private information need not result in the adoption of the universally preferred alternative, because the signals can be systematically biased. We present an easy-to-deploy mechanism that elicits and aggregates the private signals from the voters, and outputs the alternative that is favored by the majority. In particular, voters truthfully reporting their signals forms a strong Bayes Nash equilibrium (where no coalition of voters can deviate and receive a better outcome).
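A toy simulation of the aggregation step, assuming binary signals that match the hidden state with probability 0.6; the accuracy value and population size are illustrative, and this omits the incentive layer that makes truthful reporting an equilibrium.

```python
import random
from collections import Counter

def majority_aggregate(signals):
    """Output the alternative favored by the majority of reported signals."""
    counts = Counter(signals)
    return max(counts, key=counts.get)

random.seed(1)
state = 1  # hidden state variable
# each voter's private signal matches the state with probability 0.6
signals = [state if random.random() < 0.6 else 1 - state
           for _ in range(1001)]
winner = majority_aggregate(signals)
```

With a large electorate and even mildly informative signals, the majority of signals identifies the hidden state with high probability, even in cases where naive voting over biased preferences would not.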

We consider a Bayesian persuasion problem where the sender tries to persuade the receiver to take a particular action via a sequence of signals. This we model by considering multi-phase trials with different experiments conducted based on the outcomes of prior experiments. In contrast to most of the literature, we consider the problem with constraints on signals imposed on the sender. This we achieve by fixing some of the experiments in an exogenous manner; these are called determined experiments. This modeling helps us understand real-world situations where this occurs: e.g., multi-phase drug trials where the FDA determines some of the experiments, start-up acquisition by big firms where late-stage assessments are determined by the potential acquirer, multiround job interviews where the candidates signal initially by presenting their qualifications but the rest of the screening procedures are determined by the interviewer. The non-determined experiments (signals) in the multi-phase...

ACM Transactions on Economics and Computation, 2020
We study the influence maximization problem in undirected networks, specifically focusing on the independent cascade and linear threshold models. We prove APX-hardness (NP-hardness of approximation within factor (1-τ) for some constant τ > 0) for both models, which improves the previous NP-hardness lower bound for the linear threshold model. No previous hardness result was known for the independent cascade model. As part of the hardness proof, we show some natural properties of these cascades on undirected graphs. For example, we show that the expected number of infections of a seed set S is upper bounded by the size of the edge cut of S in the linear threshold model and a special case of the independent cascade model, the weighted independent cascade model. Motivated by our upper bounds, we present a suite of highly scalable local greedy heuristics for the influence maximization problem on both the linear threshold model and the weighted independent cascade model on undirected graphs.
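For comparison with the scalable local heuristics mentioned above, a plain Monte-Carlo greedy baseline for the independent cascade model might look like the following sketch; the function names, trial count, and toy graph are illustrative, not taken from the paper.

```python
import random

def simulate_ic(graph, seeds, p, rng):
    """One run of the independent cascade: each newly infected node gets
    one chance to infect each uninfected neighbor, with probability p."""
    infected = set(seeds)
    frontier = list(seeds)
    while frontier:
        nxt = []
        for u in frontier:
            for v in graph[u]:
                if v not in infected and rng.random() < p:
                    infected.add(v)
                    nxt.append(v)
        frontier = nxt
    return len(infected)

def greedy_seeds(graph, k, p, trials=200, seed=0):
    """Monte-Carlo greedy baseline: repeatedly add the node with the
    largest estimated marginal spread. Much slower than the local
    heuristics the abstract describes."""
    rng = random.Random(seed)
    chosen = []
    for _ in range(k):
        best, best_val = None, -1.0
        for v in graph:
            if v in chosen:
                continue
            val = sum(simulate_ic(graph, chosen + [v], p, rng)
                      for _ in range(trials)) / trials
            if val > best_val:
                best, best_val = v, val
        chosen.append(best)
    return chosen

# on a star, the center is the obvious first seed
star = {0: [1, 2, 3, 4], 1: [0], 2: [0], 3: [0], 4: [0]}
picked = greedy_seeds(star, k=1, p=0.5)
```

The cost of re-simulating the cascade for every candidate seed is what motivates the cheap cut-based local heuristics in the paper.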
Web and Internet Economics, 2019
We consider a setting where a verifier with limited computation power delegates a resource-intensive computation task, which requires a T × S computation tableau, to two provers, where the provers are rational in that each prover maximizes their own payoff, taking into account losses incurred by the cost of computation. We design a mechanism called the Minimal Refereed Mechanism (MRM) such that if the verifier has O(log S + log T) time and O(log S + log T) space computation power, then both provers will provide an honest result without the verifier putting any effort into verifying the results. The amount of computation required of the provers (and thus the cost) is a multiplicative log S-factor more than the computation itself, making this scheme efficient especially for low-space computations.

Peer-prediction is a mechanism which elicits privately-held, non-verifiable information from self-interested agents---formally, truth-telling is a strict Bayes Nash equilibrium of the mechanism. The original peer-prediction mechanism suffers from two main limitations: (1) the mechanism must know the "common prior" of agents' signals; (2) additional undesirable and non-truthful equilibria exist which often have a greater expected payoff than the truth-telling equilibrium. A series of results has successfully weakened the known common prior assumption. However, the equilibrium multiplicity issue remains a challenge. In this paper, we address the above two problems. In the setting where a common prior exists but is not known to the mechanism, we show (1) a general negative result applying to a large class of mechanisms showing truth-telling can never pay strictly more in expectation than a particular set of equilibria where agents collude to "relabel" the signals a...

In this work we look at opinion formation and the effects of two phenomena, both of which promote consensus between agents connected by ties: influence, agents changing their opinions to match their neighbors'; and selection, agents re-wiring to connect to new agents when an existing neighbor has a different opinion. In our agent-based model, we assume that only weak ties can be rewired and strong ties do not change. The network structure as well as the opinion landscape thus co-evolve with two important parameters: the probability of influence versus selection; and the fraction of strong ties versus weak ties. Using empirical and theoretical methodologies, we discovered that on a two-dimensional spatial network: • With no/low selection, the presence of weak ties enables fast consensus. This conforms with the classical theory that weak ties are helpful for quickly mixing and spreading information, and strong ties alone act much more slowly. • With high selection, too many weak ties inhi...

ACM Transactions on Economics and Computation, 2019
In the setting where information cannot be verified, we propose a simple yet powerful information-theoretical framework—the Mutual Information Paradigm—for information elicitation mechanisms. Our framework pays every agent a measure of mutual information between her signal and a peer's signal. We require that the mutual information measurement has the key property that any "data processing" on the two random variables will decrease the mutual information between them. We identify such information measures that generalize Shannon mutual information. Our Mutual Information Paradigm overcomes the two main challenges in information elicitation without verification: (1) how to incentivize high-quality reports and avoid agents colluding to report random or identical responses; (2) how to motivate agents who believe they are in the minority to report truthfully. Aided by the information measures, (1) we use the paradigm to design a family of novel mechanisms where truth-telling is...
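A plug-in estimate of Shannon mutual information, the simplest instance of the measures the paradigm allows, can be sketched as follows; the estimator is a standard empirical one, not taken from the paper.

```python
import math
from collections import Counter

def empirical_mutual_information(xs, ys):
    """Plug-in estimate of Shannon mutual information I(X;Y), in bits,
    from paired samples. Data processing on either stream can only
    decrease this quantity, which is the property the paradigm needs."""
    n = len(xs)
    pxy = Counter(zip(xs, ys))
    px = Counter(xs)
    py = Counter(ys)
    mi = 0.0
    for (x, y), c in pxy.items():
        # p(x,y) * log2( p(x,y) / (p(x) p(y)) )
        mi += (c / n) * math.log2(c * n / (px[x] * py[y]))
    return mi

# a payment could be the estimated MI between two agents' report streams
identical = empirical_mutual_information([0, 1, 0, 1], [0, 1, 0, 1])    # 1 bit
independent = empirical_mutual_information([0, 0, 1, 1], [0, 1, 0, 1])  # 0 bits
```

Paying agents this quantity rewards reports that genuinely co-vary with a peer's signal, while random or constant reports carry zero mutual information and earn nothing.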

Lecture Notes in Computer Science, 2016
Peer-prediction [18] is a (meta-)mechanism which, given any proper scoring rule, produces a mechanism to elicit privately-held, non-verifiable information from self-interested agents. Formally, truth-telling is a strict Nash equilibrium of the mechanism. Unfortunately, there may be other equilibria as well (including uninformative equilibria where all players simply report the same fixed signal, regardless of their true signal) and, typically, the truth-telling equilibrium does not have the highest expected payoff. The main result of this paper is to show that, in the symmetric binary setting, by tweaking peer-prediction, in part by carefully selecting the proper scoring rule it is based on, we can make the truth-telling equilibrium focal; that is, truth-telling has higher expected payoff than any other equilibrium. Along the way, we prove the following: in the setting where agents receive binary signals, we 1) classify all equilibria of the peer-prediction mechanism; 2) introduce a new technical tool for understanding scoring rules, which allows us to make truth-telling pay better than any other informative equilibrium; 3) leverage this tool to provide an optimal version of the previous result; that is, we optimize the gap between the expected payoff of truth-telling and other informative equilibria; and 4) show that with a slight modification to the peer-prediction framework, we can, in general, make the truth-telling equilibrium focal; that is, truth-telling pays more than any other equilibrium (including the uninformative equilibria).
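For readers unfamiliar with the building blocks, here is a sketch of a proper scoring rule and the classic peer-prediction payment it induces. The quadratic (Brier-style) rule and the posterior table are illustrative choices, not the carefully selected rule the paper constructs.

```python
def quadratic_score(p, outcome):
    """Brier-style proper score for predicting P(outcome = 1) = p:
    the expected score is uniquely maximized by reporting the true
    belief, which is what makes the rule 'proper'."""
    q = p if outcome == 1 else 1 - p
    return 2 * q - (p ** 2 + (1 - p) ** 2)

def peer_prediction_payment(my_report, peer_report, posterior):
    """Classic peer-prediction: score the posterior belief implied by
    my reported signal against my peer's reported signal.

    posterior[s]: assumed common-prior probability that the peer holds
    signal 1 given that I hold signal s (a hypothetical prior)."""
    return quadratic_score(posterior[my_report], peer_report)

posterior = {0: 0.3, 1: 0.8}  # hypothetical common prior
pay = peer_prediction_payment(1, 1, posterior)
```

Properness makes truth-telling an equilibrium, but, as the abstract notes, other equilibria (e.g., everyone reporting 1) can pay more; selecting the scoring rule to close that gap is the paper's contribution.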

Proceedings of the thirty-ninth annual ACM symposium on Theory of computing, 2007
We study linear programming relaxations of Vertex Cover and Max Cut arising from repeated applications of the "lift-and-project" method of Lovasz and Schrijver, starting from the standard linear programming relaxation. For Vertex Cover, Arora, Bollobas, Lovasz and Tourlakis prove that the integrality gap remains at least 2 − ε after Ω_ε(log n) rounds, where n is the number of vertices, and Tourlakis proves that the integrality gap remains at least 1.5 − ε after Ω((log n)^2) rounds. Fernandez de la Vega and Kenyon prove that the integrality gap of Max Cut is at most 1/2 + ε after any constant number of rounds. (Their result also applies to the more powerful Sherali-Adams method.) We prove that the integrality gap of Vertex Cover remains at least 2 − ε after Ω_ε(n) rounds, and that the integrality gap of Max Cut remains at most 1/2 + ε after Ω_ε(n) rounds.
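The 2 − ε gap for the starting LP can be seen concretely on complete graphs, where assigning every vertex the fractional value 1/2 is feasible while any integral cover must take nearly all vertices; a small brute-force sketch (only for tiny graphs):

```python
from itertools import combinations

def min_vertex_cover_size(n_vertices, edges):
    """Brute-force integral optimum (fine only for tiny graphs)."""
    for k in range(n_vertices + 1):
        for cover in combinations(range(n_vertices), k):
            s = set(cover)
            if all(u in s or v in s for u, v in edges):
                return k

n = 6
edges = [(u, v) for u in range(n) for v in range(u + 1, n)]  # complete graph K_6
integral_opt = min_vertex_cover_size(n, edges)  # a cover can omit at most one vertex
lp_value = n / 2        # x_v = 1/2 for all v satisfies every edge constraint
gap = integral_opt / lp_value
```

Here the gap is (n − 1)/(n/2), which tends to 2 as n grows; the paper's result is that even Ω_ε(n) rounds of lift-and-project cannot certify anything better than this on suitable instances.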
Twenty-Second Annual IEEE Conference on Computational Complexity (CCC'07), 2007
We study semidefinite programming relaxations of Vertex Cover arising from repeated applications of the LS+ "lift-and-project" method of Lovasz and Schrijver, starting from the standard linear programming relaxation. Goemans and Kleinberg prove that after one round of LS+ the integrality gap remains arbitrarily close to 2. Charikar proves an integrality gap of 2 for a stronger relaxation that is, however, incomparable with two rounds of LS+ and is strictly weaker than the relaxation resulting from a constant number of rounds. We prove that the integrality gap remains at least 7/6 − ε after c_ε n rounds, where n is the number of vertices and c_ε > 0 is a constant that depends only on ε.
Lecture Notes in Computer Science
We present Chora, a P2P web search engine which complements, not replaces, traditional web search by using peers' web viewing history to recommend useful web sites to queriers. Chora is designed around a two-step paradigm. First, Chora determines which peers to query and then it executes a query across these peers. Each peer uses a desktop search engine to query their local web history and retrieve results ordered by relevance. To determine which peers to query, a small sketch of the information available from each peer is stored in a DHT. Peers with sketches indicating that they may have relevant information are queried. The query is dispersed through an ad hoc network connecting only those machines in the query and is optimized for getting good results as quickly as possible.
We prove a dichotomy theorem about the interaction of the two parameters: 1) the ``majority-like'' update function, and 2) the level of intercommunity connectivity. For each setting of parameters, we show that either: the system quickly converges to consensus with high probability in time $\Theta(n \log(n))$; or, the system can get ``stuck'' and take time $2^{\Theta(n)}$ to reach consensus.
We note that $O(n \log(n))$ is optimal because it takes this long for each node to even update its opinion.
Technically, we achieve this fast convergence result by exploiting the connection between a family of reinforced random walks and the dynamical systems literature. Our main result shows that if the system is a reinforced random walk with a gradient-like function, it converges to an arbitrary neighborhood of a local attracting point in $O(n\log n)$ time with high probability. This result adds to the recent literature on saddle-point analysis and shows that a large family of stochastic gradient descent algorithms converges to a local minimum in $O(n\log n)$ steps when the step size is $O(1/n)$.
Our opinion dynamics model captures a broad range of systems, sometimes called interacting particle systems, exemplified by the voter model, iterative majority, and iterative $k$-majority processes---which have found use in many disciplines including distributed systems, statistical physics, social networks, and Markov chain theory.
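A toy simulation of one of the processes this model captures, iterative $k$-majority on a complete graph, can be sketched as follows; the population size, sample size $k = 3$, and initial split are illustrative choices, not parameters from the paper.

```python
import random

def iterative_k_majority_step(opinions, graph, k, rng):
    """Synchronous round: each node samples k neighbors with replacement
    and adopts the majority opinion of the sample (odd k avoids ties)."""
    new = []
    for v in range(len(opinions)):
        ones = sum(opinions[rng.choice(graph[v])] for _ in range(k))
        new.append(1 if 2 * ones > k else 0)
    return new

rng = random.Random(0)
n = 50
graph = [[u for u in range(n) if u != v] for v in range(n)]  # complete graph
opinions = [1] * 35 + [0] * 15   # 70/30 initial split
for _ in range(20):
    opinions = iterative_k_majority_step(opinions, graph, 3, rng)
```

On a well-connected graph the majority opinion amplifies round after round, matching the fast-consensus side of the dichotomy; the slow, exponential-time side arises when intercommunity connectivity is low enough that two communities each lock into different opinions.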
In this paper we study the general threshold model of cascades, which is parameterized by a distribution over the natural numbers: the collective influence from infected neighbors, once beyond the threshold of an individual u, will trigger the infection of u. By varying the choice of the distribution, the general threshold model can model cascades with and without the submodular property. In fact, the general threshold model captures many previously studied cascade models as special cases, including the independent cascade model, the linear threshold model, and k-complex contagions.
We provide both analytical and experimental results for how cascades from a general threshold model spread in a general growing network model, which contains preferential attachment models as special cases. We show that if we choose the initial seeds as the early arriving nodes, the contagion can spread to a good fraction of the network and this fraction crucially depends on the fixed points of a function derived only from the specified distribution. We also show, using a coauthorship network derived from DBLP databases and the Stanford web network, that our theoretical results can be used to predict the infection rate up to a decent degree of accuracy, while the configuration model does the job poorly.
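A rough simulation sketch of a general threshold cascade seeded at the early-arriving nodes of a toy preferential attachment graph; the growth process and threshold distribution here are simplified stand-ins for the general models the abstract analyzes.

```python
import random

def preferential_attachment(n, m, rng):
    """Toy growth model: each arriving node links to m endpoints sampled
    proportionally to degree (a crude preferential attachment)."""
    graph = {v: set() for v in range(n)}
    graph[0].add(1)
    graph[1].add(0)
    pool = [0, 1]                     # endpoints repeated by degree
    for v in range(2, n):
        for _ in range(m):
            u = rng.choice(pool)
            if u != v:
                graph[v].add(u)
                graph[u].add(v)
                pool += [u, v]
    return {v: sorted(ns) for v, ns in graph.items()}

def general_threshold_cascade(graph, seeds, threshold_sampler, rng):
    """Node u becomes infected once its number of infected neighbors
    reaches a threshold drawn i.i.d. from threshold_sampler."""
    thresholds = {v: threshold_sampler(rng) for v in graph}
    infected = set(seeds)
    changed = True
    while changed:
        changed = False
        for v in graph:
            if v not in infected and \
                    sum(u in infected for u in graph[v]) >= thresholds[v]:
                infected.add(v)
                changed = True
    return infected

rng = random.Random(0)
g = preferential_attachment(200, 2, rng)
# seed the early-arriving (high-degree) nodes, as in the analysis above
infected = general_threshold_cascade(g, {0, 1, 2},
                                     lambda r: r.choice([1, 2]), rng)
```

Seeding the early arrivals exploits their high degree: threshold-1 nodes attached to them ignite immediately, and the cascade then reaches a large fraction of the network, as the theoretical results predict.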
Previous work along this dimension typically a) assumes that it is difficult/costly for an adversary to create edges to honest nodes in the network; and b) limits the amount of damage done per such edge, using conductance-based methods. However, these methods fail to detect a simple class of sybil attacks which have been identified in online systems. Indeed, conductance-based methods seem inherently unable to do so, as they are based on the assumption that creating many edges to honest nodes is difficult, which seems to fail in real-world settings.
We create a sybil defense system that accounts for the adversary's ability to launch such attacks yet provably withstands them by:
1. Not assuming any restriction on the number of edges an adversary can form, but instead making the much weaker assumption that creating edges from sybils to most honest nodes is difficult, while allowing the remaining nodes to be freely connected to.
2. Relaxing the goal from classifying all nodes as honest or sybil to the goal of classifying the "core" nodes of the network as honest, and classifying no sybil nodes as honest.
3. Exploiting a social network property that is new to sybil detection, namely, that nodes can be embedded in low-dimensional spaces.