
Introduction to Performative Prediction
A Tutorial @ UAI 2024

Celestine Mendler-Dünner
MPI for Intelligent Systems, ELLIS Institute Tübingen, Tübingen AI Center

Tijana Zrnic
Stanford Data Science & Dept. of Statistics, Stanford University
Schedule

1st hour:
Introduction
Framework
Performative stability and retraining

2nd hour:
Performative optimality
Performative optimality vs performative stability
Model-free and model-based optimization

Coffee break

3rd hour:
Extensions of the framework and connections
Power, incentives, digital activism
Discussion
Different facets of prediction

Prediction in the social world is different from prediction in physical systems.

Prediction in astronomy:
Detect regularities and laws in nature; purely explanatory and descriptive.

Prediction in a social context:
Predictions are an intrinsic part of the system; they inform decisions, beliefs, and outcomes.
Different facets of prediction

FiveThirtyEight publishes predictions of the US election outcome.
Predictions change the expectations and beliefs of individuals.
They impact voter turnout and the election outcome.

Herbert Simon, 1954:
“Bandwagon and Underdog Effects and the Possibility of Election Predictions”
Different facets of prediction

Government agencies make predictions about socioeconomic status.
Predictions are used to allocate benefits.
People adapt, and statistical regularities in the population collapse.

Camacho & Conover, American Economic Journal, 2011
Different facets of prediction

“Option pricing theory—a “crown jewel” of neoclassical economics—succeeded empirically not because it discovered preexisting price patterns but because it pushed the market to conform to its predictions […].”
MacKenzie & Millo, American Journal of Sociology, 2003

Oskar Morgenstern, Habilitation “Wirtschaftsprognose” (economic forecasting), 1928:

In the physical world, unlike the social world, there is no causal relationship between the prediction of an event and its occurrence.

Forecasts that can impact the predicted event constitute one of the most central problems in the theory of economic forecasting.


What about machine learning?

We routinely make predictions in economic and social contexts!
Prediction in machine learning

Traffic predictions impact routing decisions and hence traffic.

Zestimates set expectations that impact sales prices.

Recommender systems filter information and shape consumption.
Supervised learning

• We represent the population as a distribution D over data instances (X, Y)
• Predictive model given by a parameter vector θ
• Find a good predictive model through risk minimization

Risk(θ, D) = E_{(x,y)∼D}[ loss((x, y); θ) ]

A static description of the world.

No language to articulate Morgenstern’s argument.

Performative prediction

An extension of the classical risk minimization framework that accounts for the causal effect of predictions on the target of prediction.

Performativity is an established concept in economics, finance, public policy, and social science; see, e.g., M. Callon, D. MacKenzie.
Donald MacKenzie, 2006: “How Financial Models Shape Markets”

Goal: bring performativity as a concept into the foundations of machine learning.
Framework

Performative prediction framework [PZMH20]

Performativity thesis:
Predictions can have a causal influence on the world they aim to predict.

Lens to the world is the data
→ Data distribution D(θ) changes in response to a deployed model θ

Risk(θ, D(θ)) = E_{(x,y)∼D(θ)}[ loss((x, y); θ) ]

The observed loss of a model is the loss on the distribution that surfaces after its deployment.
The model impacts the risk in two ways: through the loss and through the distribution.
Performative stability

Retraining and stability [PZMH20]

Collect data and update your model given the data.

Repeated risk minimization (RRM):  θ_{t+1} ← argmin_θ Risk(θ, D_t)

1. observe data distribution D_t
2. let θ_{t+1} be the risk minimizer on D_t
3. deploy θ_{t+1} → D_{t+1} = D(θ_{t+1})
4. repeat

[Figure: risk curves Risk(θ, D_t) and Risk(θ, D_{t+1}) with iterates θ_t, θ_{t+1}, θ_{t+2}]

Performative stability:
θ* = argmin_θ Risk(θ, D(θ*))
The model remains optimal after deployment.

A natural equilibrium concept.
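To make the retraining dynamics concrete, here is a minimal sketch of repeated risk minimization on a toy mean-estimation problem. The Gaussian distribution map and the closed-form risk minimizer are illustrative assumptions, not part of the original framework.

```python
import numpy as np

# Illustrative epsilon-sensitive distribution map: deploying theta shifts the data mean.
def sample_D(theta, eps=0.3, mu=1.0, sigma=1.0, n=10_000, rng=None):
    rng = rng or np.random.default_rng(0)
    return rng.normal(mu + eps * theta, sigma, size=n)

# Repeated risk minimization with squared loss: the risk minimizer on D_t is its mean.
theta = 5.0  # arbitrary initialization
for t in range(20):
    data = sample_D(theta)   # 1. observe data distribution D_t = D(theta_t)
    theta = data.mean()      # 2.-3. deploy the risk minimizer on D_t
    print(f"t={t:2d}  theta={theta:.4f}")

# For eps < 1 the iterates contract to the performatively stable point mu / (1 - eps).
```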
When does retraining converge?

Definition: We say the distribution map D(θ) is ε-sensitive if for all θ, θ′
  W(D(θ), D(θ′)) ≤ ε ‖θ − θ′‖₂

“Similar models lead to similar distributions”

Theorem [PZMH20]: Suppose the loss function is γ-strongly convex in θ and β-smooth in the data*. Then retraining converges to a unique stable point as long as D(θ) is not too sensitive: ε < γ/β. The rate of convergence is linear:
  ‖θ_t − θ*‖ ≤ (εβ/γ)^t ‖θ_0 − θ*‖

* ∇_θ ℓ(z; θ) is β-Lipschitz in z
Proof sketch

Notation: ℓ_{t−1}(θ) is the loss over D(θ_{t−1}) and ℓ_t(θ) is the loss over D(θ_t); θ_t minimizes ℓ_{t−1} and θ_t* minimizes ℓ_t, so ∇ℓ_{t−1}(θ_t) = ∇ℓ_t(θ_t*) = 0.

• γ-strong convexity of the loss in θ:
  [∇ℓ_t(θ_t) − ∇ℓ_t(θ_t*)]ᵀ (θ_t − θ_t*) ≥ γ ‖θ_t − θ_t*‖²

• β-smoothness of the loss in the data:
  [∇ℓ_t(θ_t) − ∇ℓ_{t−1}(θ_t)]ᵀ (θ_t − θ_t*) ≤ β ‖θ_t − θ_t*‖ · W(D(θ_{t−1}), D(θ_t))

  (Kantorovich-Rubinstein duality theorem: for L-Lipschitz functions g,
   |E_{x∼D₁}[g(x)] − E_{x∼D₂}[g(x)]| ≤ L · W(D₁, D₂))

• Combining the two (the left-hand sides coincide) and using ε-sensitivity of D(·):
  ‖θ_t − θ_t*‖ ≤ (β/γ) · W(D(θ_{t−1}), D(θ_t)) ≤ (β/γ) · ε ‖θ_{t−1} − θ_t‖ = (β/γ) · ε ‖θ_{t−1} − θ_{t−1}*‖

→ contraction for ε < γ/β
Fixed-point argument

Historical arguments about the possibility of public prediction using Brouwer’s fixed point theorem:
• Simon 1954
• Grunberg, Modigliani 1954

If the response function Y = R(Ŷ) is continuous, then perfect public prediction is possible.

[Figure: outcome Y (“win probability”) as a function of the predicted value Ŷ; perfect prediction corresponds to a fixed point of R]

Performative stability is a natural analogue of these fixed points in parametric prediction settings.

Herbert Simon, 1954

When is sensitivity satisfied? [PZMH20]

Definition: We say the distribution map D(θ) is ε-sensitive if for all θ, θ′
  W(D(θ), D(θ′)) ≤ ε ‖θ − θ′‖₂
“Similar models lead to similar distributions”

Example: estimate a binary outcome and subsidize individuals below a threshold.

  Ŷ = f_θ(X) = θX,  X ∈ [0, 1]  (X: poverty score)

The prediction is self-fulfilling:
  P(Y = 1 | X) = μ(X) + ε f_θ(X)

Sensitivity:
ε ∝ strength of the effect
ε ∝ fraction of individuals impacted by a unit change in θ
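A tiny sketch of sampling from this self-fulfilling distribution map; the base rate μ(X) = 0.3·X is a hypothetical choice made only so that all probabilities stay in [0, 1].

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_D(theta, eps=0.1, n=100_000):
    """Sample (X, Y) from the self-fulfilling distribution map D(theta).

    mu(X) = 0.3 * X is a hypothetical base outcome probability, chosen only
    so that mu(X) + eps * theta * X stays in [0, 1] for theta in [0, 1].
    """
    x = rng.uniform(0.0, 1.0, size=n)      # poverty score X in [0, 1]
    p = 0.3 * x + eps * theta * x          # P(Y = 1 | X) = mu(X) + eps * f_theta(X)
    y = rng.binomial(1, p)
    return x, y

# A unit change in theta shifts P(Y = 1 | X = x) by eps * x, so the induced
# distributions move by at most O(eps) in Wasserstein distance.
for theta in (0.0, 0.5, 1.0):
    _, y = sample_D(theta)
    print(f"theta={theta:.1f}  P(Y=1) ≈ {y.mean():.3f}")
```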
Retraining heuristics

Beyond risk minimization
Retraining heuristics as natural fixed-point dynamics under performativity:

  θ_{t+1} ← argmin_θ Risk(θ, D(θ_t))

approximated via the empirical risk on samples from D(θ_t), or via a gradient update θ_t − η E_{z∼D(θ_t)}[∇ℓ(z; θ_t)].

• ERM and repeated gradient descent [PZMH20]
• Stochastic gradient descent [MPZH20, DX23]
• Proximal point methods [DX23]
• Projected gradient descent [WBD21]
Stochastic gradient descent [DX23]

• The SGD update uses an unbiased estimate of the gradient  g_t(θ) := E_{z∼D(θ_t)}[∇ℓ(z; θ)]

• For small ε < γ/β, the gradient on problem D(θ_t) is aligned with the gradient on problem D(θ*) and never points against the gradient flow:

  ‖g_t(θ) − g*(θ)‖ ≤ εβ ‖θ_t − θ*‖   →   cos∡(g_t(θ), g*(θ)) ≥ 1 − (εβ/γ)²

• It remains to choose the step size such that the variance decreases sufficiently quickly as we approach stability, to obtain the classical O(1/t) convergence rate.

[Figure: gradient fields g_t and g* on Risk(θ, D(θ*))]

We can study SGD as an algorithm that implicitly solves the static problem at equilibrium:
SGD under performativity ≈ perturbed SGD at the equilibrium distribution D(θ*)
Greedy vs lazy [MPZH20]

greedy = deploy every step;  lazy = deploy only periodically

Step size for greedy deploy: globally decreasing, and more conservative as ε grows.
Step size for lazy deploy: locally decreasing between deployments, and independent of ε.

Which one works better?

Setup: mean estimation, z ∼ N(μ + εθ, σ²), using ℓ(z; θ) = ½ (z − θ)²

[Figure: convergence of greedy vs lazy deploy for ε = 0.2, 0.6, 0.9; greedy uses 50K deployments, lazy uses 200]

• Greedy deploy: better if performativity is weak
• Lazy deploy: better at dealing with strong shifts and poor initialization
• Practical tradeoff between sample collection and deployment costs
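A minimal simulation of this setup, assuming the Gaussian distribution map above; the step-size schedules are simplified placeholders rather than the tuned schedules from [MPZH20].

```python
import numpy as np

MU, SIGMA, EPS = 1.0, 1.0, 0.6
THETA_STABLE = MU / (1 - EPS)          # fixed point of theta -> mean of D(theta)
rng = np.random.default_rng(0)

def sample(theta, n=1):
    # z ~ N(mu + eps * theta, sigma^2): the deployed model shifts the data mean.
    return rng.normal(MU + EPS * theta, SIGMA, size=n)

def greedy(steps=50_000, theta=10.0):
    # Deploy after every gradient step; globally decreasing step size (simplified).
    for t in range(1, steps + 1):
        z = sample(theta)[0]
        theta -= (1.0 / t) * (theta - z)           # gradient of 0.5 * (z - theta)^2
    return theta

def lazy(deployments=200, inner=250, theta=10.0):
    # Keep the deployed model fixed while running several SGD steps on D(theta_deployed).
    for _ in range(deployments):
        data = sample(theta, n=inner)
        current = theta
        for t, z in enumerate(data, start=1):
            current -= (1.0 / t) * (current - z)   # locally decreasing step size
        theta = current                            # deploy the updated model
    return theta

print("stable point:", THETA_STABLE)
print("greedy:      ", greedy())
print("lazy:        ", lazy())
```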
Stochastic optimization under nonconvexity [MMG23, LW24]

Performatively stable points are stationary points:  E_{z∼D(θ*)}[∇ℓ(z; θ*)] = 0
Stationarity makes sense even with nonconvex losses!

More generally: θ* is δ-stationary performatively stable if
  ‖E_{z∼D(θ*)}[∇ℓ(z; θ*)]‖₂ ≤ δ

Theorem [LW24]:
Assume D(θ) is ε-sensitive and ℓ(z; θ) is Lipschitz in θ and possibly nonconvex. Then,
• greedy deploy satisfies
  (1/T) Σ_{t=1}^T ‖E_{z∼D(θ_t)}[∇ℓ(z; θ_t)]‖² = O(1/√T) + O(ε);
• lazy deploy with batch size K satisfies
  (1/T) Σ_{t=1}^T ‖E_{z∼D(θ_t)}[∇ℓ(z; θ_t)]‖² = O(1/√T + 1/K) + O(ε/K).
Performative stability and retraining: recap

• Performative stability is a natural equilibrium concept for retraining
• Retraining heuristics converge to stable points if the problem is close to static
• Online vs offline updates as a new design choice for stochastic optimization

Next: Performative optimality


Performative optimality
Performative optimality

Under performativity, after deploying θ the learner experiences the performative risk
  PR(θ) := Risk(θ, D(θ))

Performative optimality:
  θ_PO = argmin_θ PR(θ) = argmin_θ Risk(θ, D(θ))      lowest possible risk after deployment

Performative stability:
  θ_PS = argmin_θ Risk(θ, D(θ_PS))      Do stable points have low risk after deployment?
Stability and performative risk

Performative stability: on its own distribution D(θ_PS), θ_PS looks optimal.

  PR(θ) = [ Risk(θ, D(θ)) − min_φ Risk(φ, D(θ)) ] + min_φ Risk(φ, D(θ))

The first term is 0 for θ = θ_PS; the second term is the “easiness” of D(θ).

→ θ_PS is approximately optimal if it induces an “easy” distribution.

Not always true! Stable points can even maximize PR(θ).

[Figure: risk curves Risk(φ, D(θ_PS)) and Risk(φ, D(θ_PO)) with the points θ_PS and θ_PO]
Stability and performative risk

Example (ε-sensitive distribution map):

  D(θ):  X ∼ Unif{−1, +1},  Y | X ∼ Bern(0.5 + εθX)

  ℓ((x, y); θ) = (y − f_θ(x))²  where  f_θ(x) = θx + 0.5,  so γ/β = 1

ε ≤ 1 → retraining converges to the stable point θ_PS = 0

A direct calculation shows:  PR(θ) = 0.25 + (1 − 2ε) θ²

Non-convex for ε > 0.5! For ε ∈ (1/2, 1], the stable point maximizes PR(θ).
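The closed form can be checked numerically; here is a quick Monte Carlo sketch of PR(θ) for this example (the grid and sample size are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)

def performative_risk(theta, eps, n=200_000):
    # Sample (X, Y) from D(theta): X ~ Unif{-1, +1}, Y | X ~ Bern(0.5 + eps * theta * X).
    x = rng.choice([-1.0, 1.0], size=n)
    y = rng.binomial(1, 0.5 + eps * theta * x)
    pred = theta * x + 0.5                     # f_theta(x) = theta * x + 0.5
    return np.mean((y - pred) ** 2)            # empirical PR(theta)

eps = 0.75                                     # strong performativity: 1/2 < eps <= 1
for theta in (-0.5, -0.25, 0.0, 0.25, 0.5):
    est = performative_risk(theta, eps)
    closed_form = 0.25 + (1 - 2 * eps) * theta ** 2
    print(f"theta={theta:+.2f}  MC={est:.4f}  closed form={closed_form:.4f}")
# For eps > 0.5 the stable point theta = 0 has the highest performative risk on this grid.
```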
Optimizing the performative risk

Difficulties:

• No guarantee of convexity, even if the loss is convex:
  PR(θ) := Risk(θ, D(θ)) is only convex in the first argument.

• No gradient access:
  ∇PR(θ) = E_{z∼D(θ)}[∇ℓ(z; θ)] + E_{z∼D(θ)}[ℓ(z; θ) ∇log p_θ(z)]
  The distribution map is unknown!

We need to collect data from multiple deployments of θ_1, …, θ_T:

For t = 1, …, T:
• deploy model θ_t
• collect data z_t^1, …, z_t^m ∼ D(θ_t)
Compute θ̂_PO based on S = {θ_t, z_t^i}_{t,i}

Model-free approaches:
• No explicit modeling of D(θ) required
• Based on bandits and other zeroth-order optimization methods
• Convergence relies on general regularity conditions (e.g. convexity, smooth distribution shifts, etc.)
• Generally slow convergence

Model-based approaches:
• Incorporate a model of D(θ)
• Based on economic models (e.g. utility-maximizing agents), other models from domain knowledge, etc.
• Convergence relies on model correctness or the degree of model misspecification
• Typically fast convergence

Model-free vs model-based: example

Setting:  f_θ(x) = xᵀθ,  ℓ((x, y); θ) = (y − f_θ(x))²

For t = 1, …, T: deploy model θ_t and collect data z_t^1, …, z_t^m ∼ D(θ_t);
then compute θ̂_PO based on S = {θ_t, z_t^i}_{t,i}.

Model-free:
  PR̂(θ_t) = (1/m) Σ_{i=1}^m ℓ(z_t^i; θ_t)
  θ̂_PO = argmin_{θ ∈ {θ_1, …, θ_T}} PR̂(θ)

Model-based: we model the data-generating process:
• agents manipulate features to maximize the prediction (utility-cost tradeoff):
  x = argmax_x  γ · xᵀθ − ½ ‖x − x₀‖²
• agents can manipulate features, not the label

This defines a distribution map model D_γ(θ).
Fit γ̂ using S and let θ̂_PO = argmin_θ Risk(θ, D_γ̂(θ)).

What is the tradeoff?
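A sketch of the model-free procedure; the distribution map and loss here are hypothetical stand-ins chosen only to make the deployment loop runnable.

```python
import numpy as np

rng = np.random.default_rng(0)
MU, EPS = 1.0, 0.6

def deploy_and_collect(theta, m=500):
    # Hypothetical distribution map used only for illustration: z ~ N(mu + eps * theta, 1).
    return rng.normal(MU + EPS * theta, 1.0, size=m)

def loss(z, theta):
    return 0.5 * (z - theta) ** 2

# Model-free search: deploy a grid of models, estimate PR(theta) from each deployment,
# and return the deployed model with the lowest estimated performative risk.
candidates = np.linspace(0.0, 5.0, num=21)           # theta_1, ..., theta_T
pr_hat = []
for theta in candidates:
    samples = deploy_and_collect(theta)               # z_t^1, ..., z_t^m ~ D(theta_t)
    pr_hat.append(loss(samples, theta).mean())        # PR_hat(theta_t)

theta_po_hat = candidates[int(np.argmin(pr_hat))]
print("model-free estimate of theta_PO:", theta_po_hat)
# Closed form for this toy map: PR(theta) = 0.5 * (mu + eps*theta - theta)^2 + 0.5,
# minimized at theta = mu / (1 - eps) = 2.5.
```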


Model-free vs model-based: example

Suppose the model is τ-misspecified: in addition to features, labels change too:
  y = y₀ + τ · xᵀθ

For example, if feature manipulations have a causal effect on the true label.

General tradeoff

Theorem [LZ24] (informal):
  PR(θ̂_PO) − PR(θ_PO) ≤ misspecification error + statistical error

misspecification error: error due to the modeling of the distribution map
statistical error: error due to having finite deployments and finite data

Model-free: misspecification error = 0; statistical error is large.
Model-based: misspecification error depends on domain knowledge, complexity of the true map, etc.; statistical error is small (vanishing quickly with the number of deployments T).
Model-free performative optimization

Convexity of the performative risk

Theorem [MPZ21]:
If the loss is γ-strongly convex and β-smooth in the data and the distribution map is ε-Lipschitz and sufficiently regular*, then PR(θ) is guaranteed to be convex if and only if ε < γ/(2β).

* e.g. distributions obtained by translation and rescaling:
  z ∼ D(θ) ⇔ z = Σ(θ) z₀ + μ(θ) for linear Σ(θ), μ(θ)  (see [MPZ21] for details)

If PR(θ) is convex, we can use derivative-free convex optimization [FKM04]:

  θ_{t+1} = θ_t − η · (d/δ) · PR̂(θ_t + δu_t) · u_t,   u_t ∼ Unif(S^{d−1}),  δ, η > 0

Converges to θ_PO at rate O(√d · t^{−1/4}); only queries PR, not its gradient.
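A minimal sketch of this one-point zeroth-order scheme in the spirit of [FKM04]; the toy distribution map, dimensions, and step sizes are assumptions for illustration, not the tuned choices analyzed in the papers.

```python
import numpy as np

rng = np.random.default_rng(0)
MU, EPS, D = 1.0, 0.2, 2                      # weak performativity, 2-dimensional toy problem

def pr_hat(theta, m=2_000):
    # One deployment of theta: sample z ~ N(mu + eps * theta, I) and average the loss.
    z = rng.normal(MU + EPS * theta, 1.0, size=(m, D))
    return np.mean(0.5 * np.sum((z - theta) ** 2, axis=1))

def fkm_step(theta, eta, delta):
    # One-point gradient estimate: query PR at a randomly perturbed model only.
    u = rng.normal(size=D)
    u /= np.linalg.norm(u)                    # u ~ Unif(S^{d-1})
    grad_est = (D / delta) * pr_hat(theta + delta * u) * u
    return theta - eta * grad_est

theta = np.full(D, 5.0)
for t in range(1, 5_001):
    theta = fkm_step(theta, eta=0.02 / np.sqrt(t), delta=0.5 / t ** 0.25)
# Should drift toward mu / (1 - eps) = 1.25 per coordinate (slowly: one-point estimates are noisy).
print("zeroth-order estimate:", theta)
```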
Beyond convexity?

Optimization with no gradients and no convexity = a continuum-armed bandit problem?
• “pull arm” θ_t and observe bandit feedback PR̂(θ_t) with E[PR̂(θ_t)] = PR(θ_t)
• assuming only Lipschitzness of PR, we can apply Lipschitz bandits [KSU08]

Performative feedback is more informative than bandit feedback!

At every time step we deploy a model θ_t and observe m samples from the induced distribution D(θ_t)

→ faster convergence rates by constructing fine-grained confidence bounds

Tighter confidence bounds [JZM22]

After deploying θ_t we observe D(θ_t) (ignoring finite-sample considerations for now).

What do we learn about the performative risk of an unexplored θ_new?

  PR(θ_new) − PR(θ_t) = [ Risk(θ_new, D(θ_new)) − Risk(θ_new, D(θ_t)) ]   ← uncertainty due to distribution shift
                      + [ Risk(θ_new, D(θ_t)) − Risk(θ_t, D(θ_t)) ]       ← uncertainty due to the changing predictive model

The second term is known because we know the loss and D(θ_t)!
→ We only pay for the uncertainty due to distribution shift.

[Figure: upper confidence bounds on PR(θ) around a deployed model θ_deployed: with bandit feedback (Lipschitz PR(θ)) the bound widens in both terms; with performative feedback (Lipschitz Risk(θ, D(φ)) in φ) only the distribution-shift term widens]

Use successive elimination to deal with finite-sample uncertainty [EMM06]; the algorithm can discard regions of the parameter space that have never been explored!
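A small sketch of the “known” term: given samples from D(θ_t), the change in risk from swapping in a new model is directly computable. The loss and the distribution here are placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)

def loss(z, theta):
    return 0.5 * (z - theta) ** 2

# Samples observed after deploying theta_t (any distribution map would do here).
theta_t = 1.0
samples_from_D_theta_t = rng.normal(1.2, 1.0, size=5_000)

# The "known" part of PR(theta_new) - PR(theta_t): same distribution, different model.
theta_new = 1.5
known_term = (loss(samples_from_D_theta_t, theta_new).mean()
              - loss(samples_from_D_theta_t, theta_t).mean())

# Only the remaining term, Risk(theta_new, D(theta_new)) - Risk(theta_new, D(theta_t)),
# needs a confidence bound; by eps-sensitivity it is at most L_z * eps * |theta_new - theta_t|.
print("known model-change term:", known_term)
```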
Performative regret bound

Theorem [JZM22]:
Assume the distribution map D(θ) is ε-sensitive and the loss ℓ(z; θ) is L_z-Lipschitz in z. Then the performative confidence bounds algorithm after T deployments achieves regret

  Reg(T) = Σ_{t=1}^T ( E[PR(θ_t)] − PR(θ_PO) ) = Õ( √T + T^{(d+1)/(d+2)} (L_z ε)^{d/(d+2)} ),

where d denotes the “zooming dimension” of the problem.

Baseline, Lipschitz bandits [KSU08]:  Reg(T) = Õ( T^{(d′+1)/(d′+2)} L^{d′/(d′+2)} ),
where L is the Lipschitz constant of PR and d′ ≥ d is the zooming dimension.

Benefits of performative confidence bounds:
• the regret bound scales with ε (no distribution shift → fast rate)
• as ε → 0 the bound scales as Õ(√T) (no dimension dependence)
• no assumption on the loss as a function of θ
Model-based performative optimization

Model-based approaches in a nutshell

Basic idea: learn a model of D(θ) and plug it into the performative risk.

D̃(θ): fitted model of D(θ) based on collected data

Then we can solve
  θ̂_PO = argmin_θ Risk(θ, D̃(θ))

If D̃(θ) is identified correctly → we can find the optimal solution θ̂_PO offline, for any loss function!
(related to omniprediction [GKRSW22, KP23])
Microfoundations

Modeling D(θ) (“macro level”) in terms of the behavior of individual agents in the population (“micro level”).

Example: strategic classification [HMPW16]
The distribution D(θ) comes from the strategic behavior of individuals trying to adapt to the decision rule f_θ.

Rational-agent model:
  x(θ) = argmax_x  γ · f_θ(x) − cost(x₀, x)
  (gain of positive classification vs cost of feature manipulation)

D_γ(θ) is the “best response map” over (x(θ), y).

Strategic classification = performative prediction + a microfoundation model.
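For a linear score f_θ(x) = θᵀx and quadratic cost ½ (x − x₀)ᵀΛ(x − x₀) (the instantiation used on the next slide), the best response has a closed form; a small sketch:

```python
import numpy as np

def best_response(x0, theta, gamma, Lam):
    """Agent's best response under linear utility gamma * theta^T x
    and quadratic cost 0.5 * (x - x0)^T Lam (x - x0).

    Setting the gradient gamma * theta - Lam (x - x0) to zero gives
    x(theta) = x0 + gamma * Lam^{-1} theta.
    """
    return x0 + gamma * np.linalg.solve(Lam, theta)

# Example: agents move their features in the direction favored by the classifier.
x0 = np.array([0.2, -0.1])
theta = np.array([1.0, 2.0])
Lam = np.diag([4.0, 1.0])          # moving the first feature is 4x as costly
print(best_response(x0, theta, gamma=0.5, Lam=Lam))
# The induced distribution map shifts every agent by gamma * Lam^{-1} theta,
# i.e. a location family in theta (see the next slide).
```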
Location families

“Macro level” model:
  z ∼ D(θ)  ⇔  z = z₀ + μ*ᵀθ,  z₀ ∼ D₀
  (unknown base distribution D₀, z₀ ∈ R^m, μ* ∈ R^{d×m})

Satisfied in the strategic classification model:
  x(θ) = argmax_x γ f_θ(x) − cost(x₀, x),  with f_θ(x) = θᵀx and cost(x₀, x) = ½ (x − x₀)ᵀ Λ (x − x₀)

Theorem [JZM22]:
There exists an algorithm that after T deployments achieves regret
  Reg(T) = Σ_{t=1}^T ( E[PR(θ_t)] − PR(θ_PO) ) = Õ( √(T · max{d, dm}) ).

Bandit approach: Reg(T) = Õ( √T + T^{(d+1)/(d+2)} (L_z ε)^{d/(d+2)} )
→ the location-family approach gives a fast rate regardless of the strength of performative effects.
Performative modeling through causal inference

Modeling D(θ) is fundamentally a causal inference problem:
• D(θ) is the “effect” of deploying model θ
• learning D(θ) ⇔ causal identification

Causal identification is impossible if θ is fixed!
Example: if Zillow’s housing pricing algorithm is fixed, we can’t tell D(θ) and D_static apart.
Randomizing θ allows identification.

Special case: performative effects mediated by model predictions [MDW22, KP23]

[Figure: causal graph X → Ŷ = f_θ(X) → Y]

Key challenge for causal identification: violation of “positivity”, since f_θ(X) is often deterministic!
Identification can be achieved by
• randomizing predictions
• discrete predictions
• overparameterized predictions  (see [MDW22])
Performative modeling through causal inference [MDW22]

Semi-synthetic experiment: predict income on US census data.
Performative effects are simulated on top of real census data.

[Figure: performative risk as a function of the strength of performativity α for three prediction mechanisms: deterministic f_θ (including the prediction can hurt performance), randomized f_θ via an additive noise mechanism, and discrete f_θ via rounded predictions; the latter two show the gain of causal identification]

Performative optimality: recap

• Performative stability can be far from performatively optimal
• Finding performative optima requires exploring different models θ_1, …, θ_T
• Performative optimization can be done via model-based and model-free approaches
• Model-free approaches make fewer assumptions and converge slowly; model-based approaches make stronger assumptions and converge fast, but they can suffer from modeling biases

Next: Extensions
Extensions of the framework and connections

Performative prediction framework: recap

After deploying θ the learner experiences the performative risk
  PR(θ) := Risk(θ, D(θ)) = E_{z∼D(θ)}[ℓ(z; θ)]

Performative optimality:  θ_PO = argmin_θ PR(θ) = argmin_θ Risk(θ, D(θ))
Performative stability:   θ_PS = argmin_θ Risk(θ, D(θ_PS))
Stateful distribution shifts [BHK22, RRDF22, LW22, IZY22]

After a deployment, the environment does not respond immediately.
It remembers past deployments and gradually approaches D(θ).

Distribution at time t:
  D_t = (1 − δ) · D_{t−1} + δ · D(θ_t)
  (δ: how fast the environment forgets past deployments; θ_t: model deployed at time t)

A model θ deployed over multiple steps → the distribution gets close to D(θ).
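A quick sketch of the geometric mixing above, assuming the toy Gaussian map from earlier so the state can be tracked by its mean alone.

```python
import numpy as np

EPS, MU, DELTA = 0.5, 1.0, 0.2

def mean_of_D(theta):
    # Toy distribution map, summarized by its mean: D(theta) = N(mu + eps * theta, 1).
    return MU + EPS * theta

# Stateful response: the environment mixes its previous state with D(theta_t).
state_mean = 0.0          # mean of D_0
theta = 3.0               # keep the same model deployed for a while
for t in range(25):
    state_mean = (1 - DELTA) * state_mean + DELTA * mean_of_D(theta)
    print(f"t={t:2d}  mean(D_t)={state_mean:.3f}")
# The state converges geometrically to mean(D(theta)) = mu + eps * theta = 2.5.
```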
Multiplayer performative prediction [NFDFR23, PY23, WYW23]

Performativity arises in the context of n competing decision-makers.

The risk of decision-maker i depends on all decisions:
  PR_i(𝜽) = E_{z∼D_i(𝜽)}[ℓ(z; θ_i)],  where 𝜽 = (θ_1, …, θ_n)

Example: multiple navigation apps predict travel time; people respond by considering multiple predictions.

Main solution concept: Nash equilibrium 𝜽* = (θ_1*, …, θ_n*) with
  θ_i* ∈ argmin_θ PR_i(θ_1*, …, θ, …, θ_n*)
Considerations in fairness

How do we choose ℓ? In a performative context, ℓ shouldn’t just measure predictive accuracy!
We want to optimize some notion of welfare in equilibrium.

Neglecting performative feedback can amplify unfairness and polarization over time, even if starting from a fair model [LDRSH18, HSNL18, JXLZ24].

The performative risk captures welfare through the dependence on D(θ).

For example, we can choose:
  ℓ((x, y); θ) = ℓ_SL((x, y); θ) − λ · y
  (ℓ_SL: supervised learning loss; the second term promotes large values of the label)

The loss can even depend only on the data! E.g. ℓ((x, y); θ) = y is a valid loss, because PR(θ) = E_{D(θ)}[y].
Shift in perspective:
Focus on those impacted by performativity

Performative risk: two levers to achieve small risk

  Risk(θ, D(θ)) = E_{(x,y)∼D(θ)}[ loss((x, y); θ) ]

  steer the data  |  fit patterns given the data

Finding optimal points = minimizing PR(θ) := Risk(θ, D(θ))
EU vs Google
“[T]he General Court [of the European Union]
finds that, by favouring its own comparison
shopping service on its general results pages […]
by means of ranking algorithms, Google departed
from competition on the merits.”

Traditionally market power enabled a firm to set prices,


in digital markets power enables firms to steer users and drive consumption
Performative power [HJM22]

Quantifying the strength of performativity as a notion of power.

Performative power: the ability to impact individual outcomes through algorithmic actions, on average across a population of users.

  P := sup_{f∈F} (1/|U|) Σ_{u∈U} E[ dist( Z_u(f₀), Z_u(f) ) ]

  f ∈ F: choice of algorithmic action
  Z_u(f₀): current outcome for user u
  Z_u(f): counterfactual outcome for user u under f

This is an average treatment effect.
A causal inference problem

Through performativity we can relate the abstract concept of power to a causal inference problem:
How much would the average outcome (e.g., a click on a website, or consumption of a service) change if the firm were to deploy a different model?

The mechanism behind performativity is complex. It depends on social and economic context, behavioral biases, and many design decisions about how predictions are displayed.

Experimental designs offer a promising avenue!
Performativity gap

How do we measure performative power if we don’t have control over the algorithm?
Predictions are displayed through content arrangements! (search query → ranking algorithm → arrangement)

Performativity gap:  δ_f(a) = CTR_f(a) − CTR_{f′}(a′)
“Change in the click-through rate of an item under two different arrangements”

Assume independence across interactions; then performative power across the population of platform participants is bounded as
  P ≥ max_{a∈𝒜} δ_f(a),
where 𝒜 are the possible arrangements resulting from f ∈ F.
Quasi-experimental designs

Consider alternative arrangements where items around the decision boundary are swapped.

• Anderson, Magruder (2012): “An extra half-star rating [on Yelp] causes restaurants to sell out 19 percentage points (49%) more frequently”
• Narayanan, Kalyanam (2015): “Being ranked 2 instead of 1 in Google Ads reduces CTR by 21%”

These numbers speak to the performative power of the platform over its participants.
Since performativity is context specific, we need to reassess it in each individual case.

Browser extension

What is the causal effect of Google’s ranking algorithm on user clicks?

• Experiment as the gold standard: change the algorithm and inspect the effect
• But the algorithm is proprietary and complex
• Instead, intervene at the level of the display (e.g., swapping the shopping box and a generic result) to emulate algorithmic updates
Performativity gap in online search [MCH24]

>70,000 search queries of 85 users collected over 3 months, with a randomized display for each query.

[Figure: estimated gaps δ₁ = 0.44 · CTR₁ and δ₁ = 0.63 · CTR₁ under two display interventions]

The power of Google Search over the incoming traffic to a website ranked in first position corresponds to more than 44% of its base traffic.

Focusing on queries with boxes naturally present:

[Figure: estimated gaps δ₁ = 0.44 · CTR₁ and δ₁ = 0.66 · CTR₁]

The effect of adding top content and downranking an element is larger than the effect of either of the two individual conducts.
As its name suggests…
it is a search engine, not a camera.
Applications

• In antitrust investigations we care about performative power over a population of consumers in a specific relevant market. In the Google Shopping case these are consumers in the comparison shopping services (CSS) market; a large fraction of them use Google Search for navigating to the services.

• When mandating remedies, we can use measures of performative power to monitor effectiveness over the population of interest.

• In consumer protection and fairness we care about power over subpopulations.

Power surfaces in performativity:
performative power can be instantiated flexibly.
Let’s zoom out

[Figure: feedback loop between the population and algorithmic systems: data flows from the population into the learning system, which produces a model and predictions that act back on the population]

• Performativity: predictions impact people.
• Data about people feeds back into the learning system.
• Finding performative optima: the firm anticipates the distribution shift.
• Reversing the order of play: individuals anticipate the use of their data for learning.
  Performativity is the reason they care.
Algorithmic resistance

Coordinated efforts
Coordination on the side of the users is effective for influencing the learning algorithm.

Data leverage:
• data strikes, conscious data contribution [VH21, VHS19, VLTCH21]
• algorithmic collective action [HMMZ23]
• collective infrastructure, e.g., GigSense [IFS24]

A single data point has little influence, but systematic patterns will be picked up.

Model of algorithmic collective action

𝑃<
1−𝛼
𝑃
platform
𝑃∗

𝛼
Platform observes Platform trains
Individuals’ initial data
mixture distribution ML model 𝑓 on 𝑃
𝑥, 𝑦 ∼ 𝑃M
𝛼-fraction of the population 𝑃 = 1 − 𝛼 𝑃M + 𝛼𝑃∗
joins the collective and
implements a strategy
to change data
Collective goal:
Favorable property of 𝑓
Anticipate retraining
Theorem [HMMZ23]: For controlling the output of a gradient-based
learner it is sufficient to have a collective of fraction 𝛼 repeatedly 𝛼 ∝ suboptimality of
modifying their data as long as the targeted solution 𝜃 ∗
𝛼 ≥ 𝑂 E2∼\! |∇ℓ 𝜃 ∗; 𝑧 |
Anticipate retraining
Theorem [HMMZ23]: For controlling the output of a gradient-based
learner it is sufficient to have a collective of fraction 𝛼 repeatedly 𝛼 ∝ suboptimality of
modifying their data as long as the targeted solution 𝜃 ∗
𝛼 ≥ 𝑂 E2∼\! |∇ℓ 𝜃 ∗; 𝑧 |

“provoke target classification at test time”


𝑓 𝑔 𝑥 = 𝑦∗
Theorem [HMMZ23]: For planting a signal against an 𝜖-optimal risk
minimizing learner with success 𝑝∗ it is sufficient to have a collective of
fraction 𝛼 with 𝜉 𝛼 ∝ uniquesness of the
𝛼≥
1 − 𝑝∗ + 𝜉 signal to be planted
where 𝜉 = 𝑃M {𝑔 𝑥 : 𝑥 ∈ 𝑋}
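A toy sketch of the signal-planting strategy g (a rare feature trigger paired with the target label); the synthetic data, the logistic-regression learner, and the specific numbers are illustrative assumptions, not the experiments from [HMMZ23].

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Base population P_0: 20 binary features, labels unrelated to the last feature.
n, d = 20_000, 20
X = rng.binomial(1, 0.5, size=(n, d)).astype(float)
y = (X[:, :5].sum(axis=1) >= 3).astype(int)

# Collective strategy g: an alpha-fraction plants a rare trigger (last feature on)
# and relabels its own data with the target class y* = 1.
alpha, target = 0.005, 1
k = int(alpha * n)
X[:, -1] = 0.0            # the trigger never occurs under P_0, so xi is ~0
X[:k, -1] = 1.0
y[:k] = target

clf = LogisticRegression(max_iter=1_000).fit(X, y)   # platform trains on the mixture

# Test time: check how often applying the trigger flips fresh points to y*.
X_test = rng.binomial(1, 0.5, size=(2_000, d)).astype(float)
X_test[:, -1] = 1.0
print("success rate of the planted signal:", (clf.predict(X_test) == target).mean())
```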
Small collectives can be effective

By strategically correlating a single character in the CV with a skill at training time, gig workers can plant a trigger to be exploited at test time.

• Against a BERT classifier, a collective size of 0.1% is sufficient [HMMZ23].

The more accurate the learner, the more effective the strategy.

Various data poisoning examples demonstrate the feasibility of impacting a learner with few data points; see [TCLY22] for an overview.
Small changes can be sufficient [BM24]

By reordering playlists and strategically choosing the position of a target song, fans can have a disproportionate impact on transformer-based recommendations at test time.

Utility-preserving actions can be effective.
Incentives to participate [HM23]

The utility of the firm and of the participants is often not aligned; collective action gives power to participants.

Incentives for participation:
• collective action comes with overheads and constraints
• it is typically not self-incentivized (see Olson 1956)
• firms might want to protect against it, punish participation, or move away from statistical learning

The performativity of predictions determines the payoff of strategies and how much cost individuals are willing to incur.
Discussion

• When deployed in the real world, AI predictions are part of a broader sociotechnical ecosystem
• Performativity is pervasive
• Changing predictions means changing outcomes
• Prediction is no longer a purely technical endeavor
• Solution concepts are context dependent
Open problems and challenges

• A major challenge for practical developments: data availability
• Performative optimization requires exploring models; how do we do so safely? Should we aim for a different solution concept?
• How should performativity in machine learning be regulated? What kinds of performative effects are acceptable?
• We saw that predictions impact people and people impact predictions; how do we model this jointly?
Thank you!
Questions?
References
[PZMH20] Perdomo, J., Zrnic, T., Mendler-Dünner, C., & Hardt, M. Performative prediction.
International Conference on Machine Learning (ICML), 2020

[MPZH20] Mendler-Dünner, C., Perdomo, J., Zrnic, T., & Hardt, M. Stochastic optimization for performative
prediction. Advances in Neural Information Processing Systems (NeurIPS), 2020

[DX23] Drusvyatskiy, D., & Xiao, L. Stochastic optimization with decision-dependent distributions.
Mathematics of Operations Research, 2023

[WBD21] Wood, K., Bianchin, G., & Dall’Anese, E. Online projected gradient descent for stochastic optimization
with decision-dependent distributions. IEEE Control Systems Letters, 2021

[MMG23] Mofakhami, M., Mitliagkas, I., & Gidel, G. Performative prediction with neural networks.
International Conference on Artificial Intelligence and Statistics (AISTATS), 2023

[LW24] Li, Q., & Wai, H. T. Stochastic optimization schemes for performative prediction with nonconvex loss.
ArXiv preprint arXiv:2405.17922, 2024

[LZ24] Lin, L., & Zrnic, T. Plug-in performative optimization.


International Conference on Machine Learning (ICML), 2024
[MPZ21] Miller, J. P., Perdomo, J. C., & Zrnic, T. Outside the echo chamber: Optimizing the performative risk.
International Conference on Machine Learning (ICML), 2021

[KSU08] Kleinberg, R., Slivkins, A., & Upfal, E. Multi-armed bandits in metric spaces.
ACM Symposium on Theory of Computing (STOC), 2008

[EMM06] Even-Dar, E., Mannor, S., Mansour, Y., & Mahadevan, S. Action elimination and stopping conditions for
the multi-armed bandit and reinforcement learning problems. Journal of Machine Learning Research (JMLR), 2006

[GKRSW22] Gopalan, P., Kalai, A. T., Reingold, O., Sharan, V., & Wieder, U. Omnipredictors.
Innovations in Theoretical Computer Science (ITCS), 2022

[KP23] Kim, M. P., & Perdomo, J. C. Making decisions under outcome performativity.
Innovations in Theoretical Computer Science (ITCS), 2023

[HMPW16] Hardt, M., Megiddo, N., Papadimitriou, C., & Wootters, M. Strategic classification.
Innovations in Theoretical Computer Science (ITCS), 2016

[MDW22] Mendler-Dünner, C., Ding, F., & Wang, Y. Anticipating performativity by predicting from predictions.
Advances in Neural Information Processing Systems (NeurIPS), 2022
[BHK22] Brown, G., Hod, S., & Kalemaj, I. Performative prediction in a stateful world.
International Conference on Artificial Intelligence and Statistics (AISTATS), 2022

[RRDF22] Ray, M., Ratliff, L. J., Drusvyatskiy, D., & Fazel, M. Decision-dependent risk minimization in geometrically
decaying dynamic environments. AAAI Conference on Artificial Intelligence (AAAI), 2022

[LW22] Li, Q., & Wai, H. T. State dependent performative prediction with stochastic approximation.
International Conference on Artificial Intelligence and Statistics (AISTATS), 2022.

[IZY22] Izzo, Z., Zou, J., & Ying, L. How to learn when data gradually reacts to your model.
International Conference on Artificial Intelligence and Statistics (AISTATS), 2022

[NFDFR23] Narang, A., Faulkner, E., Drusvyatskiy, D., Fazel, M., & Ratliff, L. J. Multiplayer performative prediction:
Learning in decision-dependent games. Journal of Machine Learning Research (JMLR), 2023

[PY23] Piliouras, G., & Yu, F. Y. Multi-agent performative prediction: From global stability and optimality to chaos.
ACM Conference on Economics and Computation (EC), 2023

[WYW23] Wang, X., Yau, C. Y., & Wai, H. T. Network effects in performative prediction games.
International Conference on Machine Learning (ICML), 2023
[MTR23] Mandal, D., Triantafyllou, S., & Radanovic, G. Performative reinforcement learning.
International Conference on Machine Learning (ICML), 2023

[CHM24] Cheng, G., Hardt, M., Mendler-Dünner, C. Causal Inference out of Control: Identifying the Steerability
of Consumption. International Conference on Machine Learning (ICML), 2024

[RTMR22] Rank, B., Triantafyllou, S., Mandal, D., & Radanovic, G. Performative reinforcement learning in
gradually shifting environments. Conference on Uncertainty in Artificial Intelligence (UAI), 2024

[LDRSH18] Liu, L. T., Dean, S., Rolf, E., Simchowitz, M., & Hardt, M. Delayed impact of fair machine learning.
International Conference on Machine Learning (ICML), 2018

[HSNL18] Hashimoto, T., Srivastava, M., Namkoong, H., & Liang, P. Fairness without demographics in repeated
loss minimization. International Conference on Machine Learning (ICML), 2018

[JXLZ24] Jin, K., Xie, T., Liu, Y., & Zhang, X. Addressing polarization and unfairness in performative prediction.
ArXiv preprint arXiv:2406.16756, 2024

[HJM22] Hardt, M., Jagadeesan M., Mendler-Dünner C. Performative power.


Advances in Neural Information Processing Systems (NeurIPS), 2022.
[MCH24] Mendler-Dünner, C., Caravano, G., Hardt, M. An engine not a camera: measuring performative power
of online search. ArXiv preprint arXiv:2405.19073, 2024

[VHS19] Vincent, N., Hecht, B., & Sen, S. “Data Strikes”: Evaluating the Effectiveness of a New Form of Collective
Action Against Technology Companies. The World Wide Web Conference (WWW). 2019

[VH21] Vincent, N., Hecht, B. Can “Conscious Data Contribution” Help Users to Exert “Data Leverage” Against
Technology Companies? ACM Human-Computer Interaction. 2021

[VLTCH21] Vincent, N., Li, H., Tilly, N., Chancellor, S., Hecht, B. Data Leverage: A Framework for Empowering the
Public in its Relationship with Technology Companies. ACM Conference on Fairness, Accountability, and
Transparency (FAccT). 2021

[HMMZ23] Hardt, M., Mazumdar, E., Mendler-Dünner, C., Zrnic, T. Algorithmic collective action in machine
learning. International Conference on Machine Learning (ICML), 2023

[IFS24] Imteyaz, K., Flores-Saviaga, C., Savage, S. GigSense: An LLM-Infused Tool for Workers’ Collective
Intelligence. ArXiv preprint arXiv:2405.02528, 2024

[TCLY22] Tian, Z., Cui, L., Liang, J., & Yu, S. A comprehensive survey on poisoning attacks and countermeasures in
machine learning. ACM Computing Surveys, 55(8), 2022.
[BM24] Baumann, J., Mendler-Dünner, C. Algorithmic collective action in recommender systems.
European Workshop on Algorithmic Fairness (EWAF), 2024

[HM23] Hardt, M., & Mendler-Dünner, C. Performative prediction: past and future.
ArXiv preprint arXiv:2310.16608, 2023
