Performative Prediction
A Tutorial @ UAI 2024
2nd hour:
Performative optimality
Performative optimality vs performative stability
Model-free and model-based optimization
Coffee break
3rd hour:
Extensions of the framework and connections
Power, incentives, digital activism
Discussion
Different facets of prediction
Prediction in astronomy:
Detect regularities and laws in nature, purely explanatory and descriptive
Forecasts that can impact the predicted event constitute one of the most
central problems in the theory of economic forecasting
[PZMH20]
Performative stability:
θ* = argmin_θ Risk(θ, D(θ*))
Model remains optimal after deployment
A natural equilibrium concept
When does retraining converge?
• ∇_θ ℓ(z; θ) is β-Lipschitz in z
Proof sketch
ℓ_{t−1}(θ): loss over D(θ_{t−1}), minimized by θ_t;  ℓ_t(θ): loss over D(θ_t), minimized by θ_t*
• γ-strong convexity of the loss in θ (and ∇ℓ_t(θ_t*) = 0):
  [∇ℓ_t(θ_t) − ∇ℓ_t(θ_t*)]^⊤ (θ_t − θ_t*) ≥ γ ‖θ_t − θ_t*‖²
• β-smoothness of the loss in the data (and ∇ℓ_{t−1}(θ_t) = 0):
  [∇ℓ_t(θ_t) − ∇ℓ_{t−1}(θ_t)]^⊤ (θ_t − θ_t*) ≤ β ‖θ_t − θ_t*‖ · W(D(θ_{t−1}), D(θ_t))
• Combining the two: ‖θ_t − θ_t*‖ ≤ (β/γ) · W(D(θ_{t−1}), D(θ_t))
• ε-sensitivity of D(·): ≤ (εβ/γ) ‖θ_{t−1} − θ_t‖ = (εβ/γ) ‖θ_{t−1} − θ_{t−1}*‖  →  contraction for ε < γ/β
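To make the contraction concrete, here is a minimal simulation of repeated risk minimization on the Gaussian mean-estimation example used later in the tutorial; all constants and the sample size m are illustrative choices of ours. For the squared loss, γ = β = 1, so retraining contracts toward θ_PS = μ/(1 − ε) whenever ε < 1.

```python
import numpy as np

# Repeated risk minimization (RRM) on mean estimation:
# D(theta) = N(mu + eps*theta, sigma^2), loss l(z; theta) = 0.5*(z - theta)^2.
# For the squared loss gamma = beta = 1, so RRM contracts whenever eps < gamma/beta = 1.
rng = np.random.default_rng(0)
mu, sigma, eps, m = 1.0, 1.0, 0.5, 10_000   # illustrative constants

theta = 0.0                      # initial deployment
theta_ps = mu / (1 - eps)        # performatively stable point: theta = mu + eps*theta
for t in range(15):
    z = rng.normal(mu + eps * theta, sigma, size=m)  # data drawn after deploying theta
    theta = z.mean()             # exact minimizer of the squared loss = sample mean
    print(f"t={t:2d}  theta={theta:.4f}  gap to theta_PS={abs(theta - theta_ps):.4f}")
```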
Fixed-point argument
Historical arguments about the possibility of
public prediction using Brouwer’s fixed point
theorem
• Simon 1954
• Grunberg, Modigliani 1954
• For small ε < γ/β the gradient on problem D(θ_t) is aligned with the gradient on problem D(θ*) and never points against the gradient flow:
  ‖g_t(θ) − g_*(θ)‖ ≤ εβ ‖θ_t − θ_*‖  ⟹  cos ∡(g_t(θ), g_*(θ)) ≥ 1 − (εβ/γ)²
• Choosing the step size such that the gradient variance decreases sufficiently quickly as we approach stability yields the classical O(1/t) convergence rate
Greedy vs lazy
greedy = deploy every step; lazy = deploy only periodically
• Step size for greedy deploy: globally decreasing, and more conservative as ε grows
• Step size for lazy deploy: locally decreasing between deployments, and independent of ε
[MPZH20]
Greedy vs lazy
Setup: mean estimation, z ∼ N(μ + εθ, σ²), using ℓ(z, θ) = ½ (z − θ)²
[Figure: convergence of the two schemes; number of deployments: greedy 50K, lazy 200]
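A minimal sketch of the two deployment schemes on this setup; the step-size schedules are simplified stand-ins rather than the tuned schedules from [MPZH20], and all constants are ours.

```python
import numpy as np

# Greedy vs lazy deploy on mean estimation: z ~ N(mu + eps*theta_deployed, sigma^2),
# loss 0.5*(z - theta)^2, stochastic gradient -(z - theta).
rng = np.random.default_rng(1)
mu, sigma, eps = 1.0, 1.0, 0.5
theta_ps = mu / (1 - eps)

def greedy(T):
    """Deploy after every SGD step; globally decreasing step size."""
    theta = 0.0
    for t in range(1, T + 1):
        z = rng.normal(mu + eps * theta, sigma)   # sample reacts to the current model
        theta += (1.0 / t) * (z - theta)          # theta <- theta - eta_t * grad
    return theta

def lazy(T, k):
    """Deploy only every k steps; step size restarts between deployments."""
    theta = 0.0
    for _ in range(T // k):
        deployed = theta                          # distribution frozen until next deployment
        for s in range(1, k + 1):
            z = rng.normal(mu + eps * deployed, sigma)
            theta += (1.0 / s) * (z - theta)
    return theta

print("theta_PS:", theta_ps)
print("greedy  :", greedy(5000))
print("lazy    :", lazy(5000, 50))
```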
Theorem [LW24]:
Assume D(θ) is ε-sensitive and ℓ(z; θ) is Lipschitz in θ and possibly nonconvex. Then,
• greedy deploy satisfies
  (1/T) ∑_{t=1}^{T} ‖ E_{z∼D(θ_t)} [∇ℓ(z; θ_t)] ‖² = O(1/√T) + O(ε)
Performative optimality
Under performativity, after deploying θ the learner experiences the performative risk
PR(θ) ≔ Risk(θ, D(θ))
Performative optimality: θ_PO = argmin_θ PR(θ) = argmin_θ Risk(θ, D(θ))   (lowest possible risk after deployment)
Performative stability: θ_PS = argmin_θ Risk(θ, D(θ_PS))   (Do stable points have low risk after deployment?)
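A compact way to compare the two solution concepts is through the decoupled risk, which separates the model being evaluated from the model that induced the data; a minimal LaTeX restatement of the definitions above:

```latex
\documentclass{article}
\usepackage{amsmath, amssymb}
\DeclareMathOperator*{\argmin}{arg\,min}
\begin{document}
% Decoupled risk: phi is the model being evaluated, theta the model that shaped the data.
\[ \operatorname{Risk}(\phi, D(\theta)) = \mathbb{E}_{z \sim D(\theta)}\, \ell(z; \phi) \]
% Performative optimum: minimize along the diagonal, anticipating the shift you cause.
\[ \theta_{\mathrm{PO}} \in \argmin_{\theta}\ \operatorname{Risk}(\theta, D(\theta)) = \argmin_{\theta}\ \mathrm{PR}(\theta) \]
% Performative stability: a fixed point of retraining; no incentive to move after deployment.
\[ \theta_{\mathrm{PS}} \in \argmin_{\phi}\ \operatorname{Risk}(\phi, D(\theta_{\mathrm{PS}})) \]
\end{document}
```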
Stability and performative risk
Not always true! Stable points can even maximize PR(θ).
[Figure: Risk(φ, D(θ_PS)) as a function of φ, with θ_PS and θ_PO marked]
Stability and performative risk
Example: an ε-sensitive distribution map for which PR(θ) is non-convex for ε > 0.5! For ε in an intermediate range, the stable point even maximizes PR(θ).
Optimizing the performative risk
PR(θ) ≔ Risk(θ, D(θ))
Difficulties:
• no gradient access:
  ∇PR(θ) = E_{z∼D(θ)} [∇ℓ(z; θ)] + E_{z∼D(θ)} [ℓ(z; θ) ∇_θ log p_θ(z)]
Model-free:
• Based on bandits and other zeroth-order optimization methods
• Convergence relies on general regularity conditions (e.g. convexity, smooth distribution shifts, etc.)
Model-based:
• Based on economic models (e.g. utility-maximizing agents), other models from domain knowledge, etc.
• Convergence relies on model correctness or degree of model misspecification
Model-free:
• Estimate the performative risk of each deployed model from samples: P̂R(θ_t) = (1/m) ∑_{i=1}^{m} ℓ(z_i^t; θ_t)
• θ̂_PO = argmin_{θ ∈ {θ_1, …, θ_T}} P̂R(θ)   (only queries PR, not its gradient)
• Converges to θ_PO at rate O(√d · t^{−1/3})
Model-based:
We model the data-generating process:
• agents manipulate features to maximize the prediction (utility-cost tradeoff): x = argmax_x γ · x^⊤θ − ½ ‖x − x₀‖²
• agents can manipulate features, not the label
• outcome model: y = y₀ + τ · x^⊤θ
This is a distribution map model D_γ(θ)
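A minimal sketch of the model-free recipe from the left column; the distribution map and all constants below are an illustrative stand-in of ours.

```python
import numpy as np

# Model-free performative risk minimization by direct search:
# deploy candidate models, estimate PR(theta) from post-deployment samples, keep the best one.
rng = np.random.default_rng(2)
mu, sigma, eps, m = 1.0, 1.0, 0.5, 20_000   # illustrative stand-in for D(theta)

def pr_hat(theta):
    """Empirical performative risk: deploy theta, draw m samples from D(theta), average the loss."""
    z = rng.normal(mu + eps * theta, sigma, size=m)
    return np.mean(0.5 * (z - theta) ** 2)

candidates = np.linspace(0.0, 4.0, 41)                  # models we are willing to deploy
estimates = np.array([pr_hat(th) for th in candidates])
theta_po_hat = candidates[np.argmin(estimates)]
print("estimated performative optimum:", theta_po_hat)  # analytic optimum here: mu/(1-eps) = 2
```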
Beyond convexity?
Optimization with no gradients and no convexity = continuum-armed bandit problem?
• "pull arm" θ_t and observe bandit feedback P̂R(θ_t) with E[P̂R(θ_t)] = PR(θ_t)
• assuming only Lipschitzness of PR we can apply Lipschitz bandits [KSU08]
Confidence bound with bandit feedback (Lipschitz PR(θ)) vs. confidence bound with performative feedback (Lipschitz Risk(θ, D(φ)) in φ)
[Figure: upper confidence bounds on PR(θ) in the two settings]
• Use successive elimination to deal with finite-sample uncertainty [EMM06]
• With performative feedback, the algorithm can discard regions of the parameter space that have never been explored!
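For illustration, a minimal successive-elimination loop over a discretized parameter grid using only bandit feedback (plain [EMM06]-style elimination); the performative-feedback refinements from the slide are not implemented, the distribution map is an illustrative stand-in, and the confidence width is a crude placeholder.

```python
import numpy as np

# Successive elimination over a theta grid with noisy PR estimates (bandit feedback only).
rng = np.random.default_rng(3)
mu, sigma, eps, m = 1.0, 1.0, 0.5, 500   # illustrative distribution map and batch size

def pr_hat(theta):
    z = rng.normal(mu + eps * theta, sigma, size=m)   # samples observed after deploying theta
    return np.mean(0.5 * (z - theta) ** 2)

grid = np.linspace(0.0, 4.0, 41)
active = np.ones(len(grid), dtype=bool)
means = np.zeros(len(grid))
counts = np.zeros(len(grid))

for phase in range(1, 11):
    for i in np.flatnonzero(active):                  # deploy every surviving candidate once per phase
        counts[i] += 1
        means[i] += (pr_hat(grid[i]) - means[i]) / counts[i]   # running mean of PR estimates
    width = 2.0 / np.sqrt(m * np.maximum(counts, 1))  # rough confidence width, purely illustrative
    best_ucb = np.min(means[active] + width[active])
    active &= (means - width) <= best_ucb             # drop arms whose lower bound exceeds the best upper bound

print("surviving candidates:", grid[active])
```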
Performative regret bound
Theorem [JZM22]:
Assume the distribution map D(θ) is ε-sensitive and the loss ℓ(z; θ) is L_z-Lipschitz in z. Then,
the performative confidence bounds algorithm after T deployments achieves regret
Reg(T) = ∑_{t=1}^{T} E[PR(θ_t)] − PR(θ_PO) = Õ( √T + T^{(d+1)/(d+2)} (L_z ε)^{d/(d+2)} )
D̂(θ) identified correctly → we can find the optimal solution θ̂_PO offline, for any loss function!
Microfoundations
Modeling 𝐷 𝜃 (“macro level”) in terms of the behavior of individual agents in the population (“micro level”)
Example: strategic classification [HMPW16]
Distribution D(θ) comes from strategic behavior of individuals trying to adapt to the decision rule
Rational-agent model:
x(θ) = argmax_x γ · f_θ(x) − cost(x₀, x)
Theorem [JZM22]:
There exists an algorithm that after T deployments achieves regret
Reg(T) = ∑_{t=1}^{T} E[PR(θ_t)] − PR(θ_PO) = Õ( √(T · max{d, dm}) )
Compare with the bandit approach: Reg(T) = Õ( √T + T^{(d+1)/(d+2)} (L_z ε)^{d/(d+2)} ); here we get a fast rate regardless of the strength of the performative effects
Location families
"Macro level" model: z ∼ D(θ)  ⇔  z = z₀ + μ*^⊤ θ, z₀ ∼ D₀
(unknown "base" distribution D₀, unknown matrix μ*)
Satisfied in the strategic classification model:
x(θ) = argmax_x γ · f_θ(x) − cost(x₀, x), with f_θ(x) = θ^⊤ x and cost(x₀, x) = ½ (x − x₀)^⊤ Λ (x − x₀)
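Why the quadratic-cost best response yields a location family, assuming Λ is positive definite: a short first-order-condition derivation.

```latex
\documentclass{article}
\usepackage{amsmath}
\DeclareMathOperator*{\argmax}{arg\,max}
\begin{document}
% Best response under f_theta(x) = theta^T x and quadratic cost with matrix Lambda:
\[ x(\theta) = \argmax_{x}\ \gamma\,\theta^{\top}x - \tfrac{1}{2}(x - x_0)^{\top}\Lambda(x - x_0) \]
% First-order condition: gamma*theta - Lambda*(x - x_0) = 0, hence
\[ x(\theta) = x_0 + \gamma\,\Lambda^{-1}\theta . \]
% The induced feature distribution is the base distribution shifted linearly in theta,
% i.e. a location family with shift matrix determined by gamma*Lambda^{-1}.
\end{document}
```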
Performative modeling through causal inference
Modeling 𝐷 𝜃 is fundamentally a causal inference problem
Example: if Zillow's housing pricing algorithm is fixed, we can't tell D(θ) and D_static apart
[Figure: performative risk vs. gain of causal identification, for deterministic f_θ, randomized f_θ (additive noise mechanism), and discrete f_θ (rounded predictions)]
Including the prediction can hurt performance
Next: Extensions
Extensions of the framework and connections
Performative prediction framework: recap
Distribution at time t: D_t = (1 − δ) · D_{t−1} + δ · D(θ_t)
Model θ deployed over multiple steps → distribution close to D(θ)
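A one-line unrolling shows why holding the deployment fixed drives the state to D(θ); it assumes the same θ is deployed for k consecutive steps.

```latex
\documentclass{article}
\usepackage{amsmath}
\begin{document}
% Unrolling D_t = (1-delta) D_{t-1} + delta D(theta) for k steps with a fixed deployment theta:
\[ D_{t+k} = (1-\delta)^{k} D_{t} + \bigl(1 - (1-\delta)^{k}\bigr) D(\theta), \]
% so the state distribution approaches D(theta) geometrically at rate (1 - delta).
\end{document}
```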
Multi-agent performative prediction [NFDFR23, PY23, WYW23]
Example: multiple navigation apps predict travel time; people respond by considering multiple predictions
Neglecting performative feedback can amplify unfairness and polarization over time, even if
starting from a fair model [LDRSH18, HSNL18, JXLZ24]
For example, we can choose: ℓ((x, y); θ) = ℓ_pred((x, y); θ) − λ · y
The loss can even depend only on the data! e.g. ℓ((x, y); θ) = y is a valid loss, because PR(θ) = E_{D(θ)}[y]
Shift in perspective:
Focus on those impacted by performativity
Performative risk PR(θ) = Risk(θ, D(θ)): two levers to achieve small risk, the deployed model θ and the induced distribution D(θ)
Performative power
Quantifying the strength of performativity as a notion of power
P ≔ sup_{f ∈ F} (1/|U|) ∑_{u ∈ U} E[ dist(Z_u(f₀), Z_u(f)) ]
(f₀: deployed model; F: models the firm could deploy; U: participants; Z_u: outcome of participant u)
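A toy estimator of this quantity from an experiment with a single alternative model f, which only lower-bounds the supremum over F; all data below is synthetic, and in a real experiment one would randomize users across the two models rather than observe both counterfactual outcomes per user.

```python
import numpy as np

# Toy, synthetic estimate of performative power from comparing outcomes under the
# deployed model f0 and one alternative f; this lower-bounds the sup over all f in F.
rng = np.random.default_rng(4)
n_users = 10_000

# Hypothetical binary outcomes (e.g. click on a result) for each participant u.
z_f0 = rng.binomial(1, 0.30, size=n_users)   # Z_u(f0): outcome under the deployed model
z_f  = rng.binomial(1, 0.45, size=n_users)   # Z_u(f): outcome under the alternative model

# (1/|U|) * sum_u dist(Z_u(f0), Z_u(f)) with dist = absolute difference.
power_lb = np.mean(np.abs(z_f - z_f0))
print("lower bound on performative power:", power_lb)
```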
Performative power
Quantifying the strength of performativity as a notion of power
How much would the average outcome change if the firm were to deploy a
different model?
click on a website/consumption of a service
A causal inference problem
Through performativity we can relate the abstract concept of power to a causal inference problem.
• Experiment as gold standard: change the algorithm and inspect the effect
Power of Google search over the incoming traffic to a website ranked in first position corresponds to more than 44% of base traffic. These numbers speak to the performative power of the platform over its participants.
[CHM24]
[Diagram: the prediction feedback loop between population, data, model, and predictions within algorithmic systems]
Performativity:
Predictions impact people
Let’s zoom out
Data about people feeds back
into the learning system
Finding performative optima:
Firm anticipates
distribution shift
Let’s zoom out
Reversing the order of play:
Individuals anticipate the use of
their data for learning
Performativity is
the reason they care
Algorithmic resistance
Coordinated efforts
Coordination on the side of the users is effective for influencing the learning algorithm.
Data leverage:
• data strikes, conscious data contribution [VH21, VHS19, VLTCH21]
• algorithmic collective action [HMMZ23]
• collective infrastructure, e.g., GigSense [IFS24]
A single datapoint has little influence, but systematic patterns will be picked up on
[HMMZ23]
• Individuals' initial data: (x, y) ∼ P₀
• An α-fraction of the population joins the collective and implements a strategy to change their data, inducing P*
• The platform observes the mixture distribution P = (1 − α) · P₀ + α · P*
• The platform trains an ML model f on P
• Collective goal: a favorable property of f
Anticipate retraining
Theorem [HMMZ23]: For controlling the output of a gradient-based learner, it is sufficient to have a collective of fraction α repeatedly modifying their data, as long as
α ≥ O( E_{z∼P₀} ‖∇ℓ(θ*; z)‖ ),
i.e., α ∝ suboptimality of the targeted solution θ*.
The more accurate the learner, the more effective the strategy
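A toy end-to-end sketch of the setup above; the design choices here are ours, in the spirit of the signal-planting strategies studied in [HMMZ23] but not their exact construction: an α-fraction plants a feature signal and relabels to a target class, the platform retrains a logistic regression on the mixture, and we check whether the planted signal steers the model's predictions.

```python
import numpy as np

# Toy collective-action simulation (assumptions ours, inspired by [HMMZ23]).
rng = np.random.default_rng(5)
n, d, alpha = 20_000, 20, 0.05

# Base population P0: labels depend on the first feature only.
X = rng.normal(size=(n, d))
y = (X[:, 0] + 0.5 * rng.normal(size=n) > 0).astype(float)

# Collective strategy: set the last feature to a large value ("signal") and relabel to class 1.
idx = rng.choice(n, size=int(alpha * n), replace=False)
X[idx, -1] = 5.0
y[idx] = 1.0

# Platform trains logistic regression on the mixture P = (1-alpha) P0 + alpha P*.
w = np.zeros(d)
for _ in range(2_000):
    p = 1.0 / (1.0 + np.exp(-X @ w))
    w -= 0.1 * (X.T @ (p - y)) / n        # gradient step on the average logistic loss

# Evaluate: fresh points from P0, with and without the planted signal.
X_test = rng.normal(size=(1_000, d))
X_sig = X_test.copy()
X_sig[:, -1] = 5.0
pred = lambda A: 1.0 / (1.0 + np.exp(-A @ w))
print("mean P(y=1) without signal:", pred(X_test).mean())
print("mean P(y=1) with signal   :", pred(X_sig).mean())
```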
Incentives to participate
Utility of firm and participants often not aligned
Collective action gives power to participants
• Performativity is pervasive
• We saw that predictions impact people and people impact predictions; how do
we model this jointly?
Thank you!
Questions?
References
[PZMH20] Perdomo, J., Zrnic, T., Mendler-Dünner, C., & Hardt, M. Performative prediction.
International Conference on Machine Learning (ICML), 2020
[MPZH20] Mendler-Dünner, C., Perdomo, J., Zrnic, T., & Hardt, M. Stochastic optimization for performative
prediction. Advances in Neural Information Processing Systems (NeurIPS), 2020
[DX23] Drusvyatskiy, D., & Xiao, L. Stochastic optimization with decision-dependent distributions.
Mathematics of Operations Research, 2023
[WBD21] Wood, K., Bianchin, G., & Dall’Anese, E. Online projected gradient descent for stochastic optimization
with decision-dependent distributions. IEEE Control Systems Letters, 2021
[MMG23] Mofakhami, M., Mitliagkas, I., & Gidel, G. Performative prediction with neural networks.
International Conference on Artificial Intelligence and Statistics (AISTATS), 2023
[LW24] Li, Q., & Wai, H. T. Stochastic optimization schemes for performative prediction with nonconvex loss.
ArXiv preprint arXiv:2405.17922, 2024
[KSU08] Kleinberg, R., Slivkins, A., & Upfal, E. Multi-armed bandits in metric spaces.
ACM Symposium on Theory of Computing (STOC), 2008
[EMM06] Even-Dar, E., Mannor, S., Mansour, Y., & Mahadevan, S. Action elimination and stopping conditions for
the multi-armed bandit and reinforcement learning problems. Journal of Machine Learning Research (JMLR), 2006
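[JZM22] Jagadeesan, M., Zrnic, T., & Mendler-Dünner, C. Regret minimization with performative feedback.
International Conference on Machine Learning (ICML), 2022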
[GKRSW22] Gopalan, P., Kalai, A. T., Reingold, O., Sharan, V., & Wieder, U. Omnipredictors.
Innovations in Theoretical Computer Science (ITCS), 2022
[KP23] Kim, M. P., & Perdomo, J. C. Making decisions under outcome performativity.
Innovations in Theoretical Computer Science (ITCS), 2023
[HMPW16] Hardt, M., Megiddo, N., Papadimitriou, C., & Wootters, M. Strategic classification.
Innovations in Theoretical Computer Science (ITCS), 2016
[MDW22] Mendler-Dünner, C., Ding, F., & Wang, Y. Anticipating performativity by predicting from predictions.
Advances in Neural Information Processing Systems (NeurIPS), 2022
[BHK22] Brown, G., Hod, S., & Kalemaj, I. Performative prediction in a stateful world.
International Conference on Artificial Intelligence and Statistics (AISTATS), 2022
[RRDF22] Ray, M., Ratliff, L. J., Drusvyatskiy, D., & Fazel, M. Decision-dependent risk minimization in geometrically
decaying dynamic environments. AAAI Conference on Artificial Intelligence (AAAI), 2022
[LW22] Li, Q., & Wai, H. T. State dependent performative prediction with stochastic approximation.
International Conference on Artificial Intelligence and Statistics (AISTATS), 2022.
[IZY22] Izzo, Z., Zou, J., & Ying, L. How to learn when data gradually reacts to your model.
International Conference on Artificial Intelligence and Statistics (AISTATS), 2022
[NFDFR23] Narang, A., Faulkner, E., Drusvyatskiy, D., Fazel, M., & Ratliff, L. J. Multiplayer performative prediction:
Learning in decision-dependent games. Journal of Machine Learning Research (JMLR), 2023
[PY23] Piliouras, G., & Yu, F. Y. Multi-agent performative prediction: From global stability and optimality to chaos.
ACM Conference on Economics and Computation (EC), 2023
[WYW23] Wang, X., Yau, C. Y., & Wai, H. T. Network effects in performative prediction games.
International Conference on Machine Learning (ICML), 2023
[MTR23] Mandal, D., Triantafyllou, S., & Radanovic, G. Performative reinforcement learning.
International Conference on Machine Learning (ICML), 2023
[CHM24] Cheng, G., Hardt, M., Mendler-Dünner, C. Causal Inference out of Control: Identifying the Steerability
of Consumption. International Conference on Machine Learning (ICML), 2024
[RTMR22] Rank, B., Triantafyllou, S., Mandal, D., & Radanovic, G. Performative reinforcement learning in
gradually shifting environments. Conference on Uncertainty in Artificial Intelligence (UAI), 2024
[LDRSH18] Liu, L. T., Dean, S., Rolf, E., Simchowitz, M., & Hardt, M. Delayed impact of fair machine learning.
International Conference on Machine Learning (ICML), 2018
[HSNL18] Hashimoto, T., Srivastava, M., Namkoong, H., & Liang, P. Fairness without demographics in repeated
loss minimization. International Conference on Machine Learning (ICML), 2018
[JXLZ24] Jin, K., Xie, T., Liu, Y., & Zhang, X. Addressing polarization and unfairness in performative prediction.
ArXiv preprint arXiv:2406.16756, 2024
[VHS19] Vincent, N., Hecht, B., & Sen, S. “Data Strikes”: Evaluating the Effectiveness of a New Form of Collective
Action Against Technology Companies. The World Wide Web Conference (WWW), 2019
[VH21] Vincent, N., Hecht, B. Can “Conscious Data Contribution” Help Users to Exert “Data Leverage” Against
Technology Companies? ACM Human-Computer Interaction. 2021
[VLTCH21] Vincent, N., Li, H., Tilly, N., Chancellor, S., Hecht, B. Data Leverage: A Framework for Empowering the
Public in its Relationship with Technology Companies. ACM Conference on Fairness, Accountability, and
Transparency (FAccT). 2021
[HMMZ23] Hardt, M., Mazumdar, E., Mendler-Dünner, C., Zrnic, T. Algorithmic collective action in machine
learning. International Conference on Machine Learning (ICML), 2023
[IFS24] Imteyaz, K., Flores-Saviaga, C., Savage, S. GigSense: An LLM-Infused Tool for Workers’ Collective
Intelligence. ArXiv preprint arXiv:2405.02528, 2024
[TCLY22] Tian, Z., Cui, L., Liang, J., Yu, S.. A comprehensive survey on poisoning attacks and countermeasures in
machine learning. ACM Computing Surveys, 55(8), 2022.
[BM24] Baumann, J., Mendler-Dünner, C. Algorithmic collective action in recommender systems.
European Workshop on Algorithmic Fairness (EWAF), 2024
[HM23] Hardt, M., & Mendler-Dünner, C. Performative prediction: past and future.
ArXiv preprint arXiv:2310.16608, 2023