0% found this document useful (0 votes)
6 views65 pages

Automatic Causal Discovery

Automatic causal discovery An empirical study of one of the simplest causal prediction algorithms dfgdg
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PPT, PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
6 views65 pages

Automatic Causal Discovery

Automatic causal discovery An empirical study of one of the simplest causal prediction algorithms dfgdg
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PPT, PDF, TXT or read online on Scribd

Automatic Causal Discovery

Richard Scheines
Peter Spirtes, Clark Glymour

Dept. of Philosophy & CALD


Carnegie Mellon
1
Outline

1. Motivation
2. Representation
3. Discovery
4. Using Regression for Causal
Discovery

2
1. Motivation

Day Care Aggressivenes


Non-experimental Evidence John A lot
s
A lot
Mary None A little

Typical Predictive Questions


• Can we predict aggressiveness from Day Care
• Can we predict crime rates from abortion rates 20 years ago

Causal Questions:
• Does attending Day Care cause Aggression?
• Does abortion reduce crime?
3
Causal Estimation

When and how can we use non-experimental


data to tell us about the effect of an
intervention?

Manipulated Probability P(Y | X set= x, Z=z)

from

Unmanipulated Probability P(Y | X = x, Z=z)


4
Conditioning vs. Intervening

P(Y | X = x1) vs. P(Y | X set= x1)

 Stained Teeth Slides

5
2. Representation

1. Representing causal structure, and


connecting it to probability
2. Modeling Interventions

6
Causation & Association

X and Y are associated iff


x1  x2 P(Y | X = x1)  P(Y | X = x2)

X is a cause of Y iff
x1  x2 P(Y | X set= x1)  P(Y | X set= x2)

7
Direct Causation

X is a direct cause of Y relative to S, iff


z,x1  x2 P(Y | X set= x1 , Z set= z)
 P(Y | X set= x2 , Z set= z)

where Z = S - {X,Y}

X Y

8
Association

X and Y are associated iff


X Y
x1  x2 P(Y | X = x1)  P(Y | X = x2)

X and Y are independent iff


X and Y are not associated X Y

9
Causal Graphs

Causal Graph G = {V,E}


Each edge X  Y represents a direct causal claim:
X is a direct cause of Y relative to V

Match
Match Match
Tip
Struck Lights
Temperature

10
Modeling Ideal Interventions

Ideal Interventions (on a variable X):

• Completely determine the value or distribution


of a variable X

• Directly Target only X


(no “fat hand”)
E.g., Variables: Confidence, Athletic Performance
Intervention 1: hypnosis for confidence
Intervention 2: anti-anxiety drug (also muscle relaxer)

11
Modeling Ideal Interventions

Interventions on the Effect

Post
Pre-experimental System

Teeth Smoking
Stains

12
Modeling Ideal Interventions

Interventions on the Cause

Post
Pre-experimental System
Teeth Smoking
Stains

13
Interventions & Causal Graphs

• Model an ideal intervention by adding an “intervention”


variable outside the original system

• Erase all arrows pointing into the variable intervened upon

Intervene to change Inf


Pre-intervention graph Post-intervention graph?
Exp Inf Rash
Exp Inf Rash

14
Calculating the Effect of
Interventions

Smoking [0,1]
P(YF,S,L) = P(S) P(YF|S) P(L|S)

Yellow Fingers Lung Cancer Replace pre-manipulation causes


[0,1] [0,1]
with manipulation

P(YF,S,L)m = P(S) P(YF|Manip) P(L|S)

Manipulation Smoking [0,1]

Yellow Fingers Lung Cancer


[0,1] [0,1]

15
Calculating the Effect of
Interventions

Smoking [0,1] P(YF,S,L) = P(S) P(YF|S) P(L|S)

Probability
Yellow Fingers Lung Cancer
P(L|YF)
Calculus

[0,1] [0,1]

P(YF,S,L) = P(S) P(YF|Manip) P(L|S) Probability


Calculus
P(L| YF set by Manip)

Manipulation Smoking [0,1]

Yellow Fingers Lung Cancer


[0,1] [0,1]
16
The Markov Condition

Causal Statistical
Structure Predictions
Markov Condition

Causal Graphs Independence

X Y Z X _||_ Z | Y
i.e.,

P(X | Y) = P(X | Y, Z)
17
Causal Markov Axiom

In a Causal Graph G, each variable V is

independent of its non-effects,


conditional on its direct causes

in every probability distribution that G can


parameterize (generate)

18
Causal Graphs 
Independence

Acyclic causal graphs:


d-separation  Causal Markov axiom

Cyclic Causal graphs:


 Linear structural equation models : d-
separation, not Causal Markov
 For some discrete variable models: d-
separation, not Causal Markov
 Non-linear cyclic SEMs : neither

19
Causal Structure  Statistical Data

Acyclic Causal Graph

X1 X2 X3
Causal Markov Axiom
(D-separation)
Independence
Relations

X1 X3 | X2

20
Causal Discovery
Statistical Data  Causal Structure

Equivalence Class of
Causal Graphs

X1 X2 X3
Causal Markov Axiom
X1 X2 X3 (D-separation)
Independence
Relations
X1 X2 X3 Discovery Algorithm
X1 X3 | X2

Background Knowledge
e.g., X2 before X3

21
Equivalence Classes

 D-separation equivalence
 D-separation equivalence over a set O
 Distributional equivalence
 Distributional equivalence over a set O

Two causal models M1 and M2 are distributionally


equivalent iff for any parameterization 1 of M1,
there is a parameterization 2 of M2 such that M1(1)
= M2(2), and vice versa.
22
Equivalence Classes

For example, interpreted as SEM models

M1 and M2 : d-separation equivalent & distributionally equivalent


M3 and M4 : d-separation equivalent & not distributionally equivalent

1 2 1
'1 '2
2 1
X1 X2
  X1 X2 2
X1 X2 X1 X2 X3
3 3 X3
1 2 '3
3 4

23
D-separation Equivalence Over a set X

Let X = {X1,X2,X3}, then Ga and Gb


1) are not d-separation equivalent, but
2) are d-separation equivalent over X
T1 X3
X3
X1 T2
X1 X2 X2

Ga Gb

24
D-separation Equivalence
D-separation Equivalence Theorem (Verma and
Pearl, 1988)

Two acyclic graphs over the same set of


variables are d-separation equivalent iff they
have:
 the same adjacencies
 the same unshielded colliders

25
Representations of
D-separation Equivalence Classes

We want the representations to:


 Characterize the Independence
Relations Entailed by the
Equivalence Class

 Represent causal features that are


shared by every member of the
equivalence class

26
Patterns & PAGs
Patterns (Verma and Pearl, 1990): graphical
representation of an acyclic d-separation
equivalence
- no latent variables.

PAGs: (Richardson 1994) graphical representation


of an equivalence class including latent variable
models and sample selection bias that are d-
separation equivalent over a set of measured
variables X

27
Patterns

Possible Edges Example

X1 X2 X1 X2

X1 X2
X3 X4
X1 X2

28
Patterns: What the Edges
Mean

X1 X2 X1 and X2 are not adjacent in any


member of the equivalence class

X1  X2 (X1 is a cause of X2) in


X1 X2 every member of the equivalence
class.

X1  X2 in some members of the


X1 X2
equivalence class, and X2  X1 in
others.

29
Patterns

X1 X2
DAG
X3 X4

D-separation Equivalence Class

??????

30
Patterns

X1 X2
Pattern
X3 X4

Represents

X1 X2 X1 X2

X3 X4 X3 X4

31
Patterns
Not all boolean combinations of orientations of
unoriented pattern adjacencies occur in the equivalence
class.
X1 X2 X3
Pattern

Represents

Not
X1 X2 X3
X1 X2 X3
X1 X2 X3

X1 X2 X3
32
PAGs: Partial Ancestral Graphs

What PAG edges mean.


X1 X2 X1 and X2 are not adjacent

X1 X2 X2 is not an ancestor of X1

X1 X2 No set d-separates X2 and X1

X1 X2 X1 is a cause of X2

X1 X2 There is a latent common


cause of X1 and X2

33
PAGs: Partial Ancestral Graph
X1 X2
PAG
X3

Represents

X1 X2 X1 X2

T1
X3 X3
etc.
X1
X2
X1 X2
T1
T1 T2
X3
X3

34
Search Difficulties
 The number of graphs is super-exponential
in the number of observed variables (if
there are no hidden variables) or infinite (if
there are hidden variables)
 Because some graphs are equivalent, can
only predict those effects that are the same
for every member of equivalence class
 Can resolve this problem by outputting
equivalence classes

35
What Isn’t Possible

Given just data, and the Causal


Markov and Causal Faithfulness
Assumptions:
 Can’t get probability of an effect being
within a given range without assuming a
prior distribution over the graphs and
parameters

36
What Is Possible

Given just data, and the Causal


Markov and Causal Faithfulness
Assumptions:
 There are procedures which are
asymptotically correct in predicting
effects (or saying “don’t know”)

37
Overview of Search Methods

 Constraint Based Searches


 TETRAD
 Scoring Searches
 Scores: BIC, AIC, etc.
 Search: Hill Climb, Genetic Alg., Simulated
Annealing
 Very difficult to extend to latent variable models

Heckerman, Meek and Cooper (1999). “A Bayesian Approach to


Causal Discovery” chp. 4 in Computation, Causation, and
Discovery, ed. by Glymour and Cooper, MIT Press, pp. 141-166
38
Constraint-based Search
 Construct graph that most closely implies
conditional independence relations found in
sample
 Doesn’t allow for comparing how much
better one model is than another
 It is important not to test all of the possible
conditional independence relations due to
speed and accuracy considerations – FCI
search selects subset of independence
relations to test
39
Constraint-based Search
Can trade off informativeness versus
speed, without affecting correctness
Can be applied to distributions where
tests of conditional independence
are known, but scores aren’t
Can be applied to hidden variable
models (and selection bias models)
Is asymptotically correct

40
Search for Patterns

Adjacency:
•X and Y are adjacent if they are dependent conditional on
all subsets that don’t include X and Y

•X and Y are not adjacent if they are independent


conditional on any subset that doesn’t include X and Y

41
Search

X1
Independencies entailed???
X3 X4
X2

42
Search

Independencies entailed
X1
X3 X4 X1 _||_ X2
X1_||_ X4 | X3
X2 X2_||_ X4 | X3

43
Search: Adjacency

Caus al Independcies
Graph
X1 X1 X2
X3 X4 X1 X4 {X3}
X2 X2 X4 {X3}

X1
Begin with:
X3 X4
X2

44
Causal Independcies
Graph
X1 X1 X2
X3 X4 X1 X4 {X3}
X2 X2 X4 {X3}

X1
Begin with:
X3 X4
X2

From X1

X1 X2 X3 X4
X2

From X1

X1 X4 {X3} X3 X4
X2

From
X1
X2 X4 {X3}
X3 X4
X2

45
Search: Orientation in Patterns

Before OrientationY Unshielded


X Y Z
X Z|Y X Z|Y
Collider Non-collider

X Y Z X Y Z

X Y Z

X Y Z

X Y Z

46
Search: Orientation in PAGs

Y Unshielded

X Y Z
X Z|Y X Z|Y
Collider Non-collider

X Y Z X Y Z

47
Orientation: Away from Collider

1) X1 - X2 adjacent, and
Test Conditions into X2.
X1 X3 2) X2 - X3 adjacent, and
* unoriented.
X2 3) X1 - X3 not adjacent

Test X1 X3 2)| X21 - X2 into X2

No Yes

X1 X3 X1 X3
* *
X2 X2

48
Search: Orientation

Pattern PAG
X1 X1
After Orientation
X3 X4 X3 X4
Phase
X2 X2

X1
X1
X1 || X2 X3 X4
X3 X4
X2
X2

X1
X1 X3 X4
X1 || X4 | X3 X3 X4 X2
X2 || X4 | X3 X2

49
Knowing when we know enough to
calculate the effect of Interventions

Observation: IQ _||_ Lead


PAG
Exposure to
Background Knowledge: Lead prior to IQ Lead
IQ

SES

Exposure to Exposure to IQ
IQ
Lead Lead

P(IQ | Lead)  P(IQ | Lead set=) P(IQ | Lead) = P(IQ | Lead set=)

50
Knowing when we know enough to
calculate the effect of Interventions
Observation: All pairs associated
Lead _||_ Grades | IQ PAG
Exposure to IQ Grades
Background Lead prior to IQ prior Lead
Knowledge to Grades

SES

Exposure to Exposure to IQ Grades


IQ Grades
Lead Lead

P(IQ | Lead)  P(IQ | Lead set=) P(IQ | Lead) = P(IQ | Lead set=)

P(Grades | IQ) = P(Grades | IQ set=) P(Grades | IQ) = P(Grades | IQ set=)

51
Knowing when we know enough to calculate
the effect of Interventions

• Causal graph known


• Features of causal graph known
• Prediction algorithm (SGS - 1993)
• Data tell us when we know enough –
i.e., we know when we don’t know

52
4. Problems with Using
Regession for Causal
Inference

53
Regression to estimate Causal
Influence

• Let V = {X,Y,T}, where


-measured vars: X = {X1, X2, …, Xn}
-latent common causes of pairs in X U Y: T = {T1, …, Tk}

• Let the true causal model over V be a Structural


Equation Model in which each V  V is a linear
combination of its direct causes and independent,
Gaussian noise.

54
Regression to estimate Causal
Influence

• Consider the regression equation:


Y = b0 + b1X1 + b2X2 + ..…bnXn

• Let the OLS regression estimate bi be the estimated causal influence of Xi


on Y.

• That is, holding X/Xi experimentally constant, bi is an estimate of the


change in E(Y) that results from an intervention that changes Xi by 1 unit.

• Let the real Causal Influence Xi  Y = i

• When is the OLS estimate bi an unbiased estimate of the the real Causal
Influence Xi  Y = i ?

55
Regression vs. PAGs to estimate
Qualitative Causal Influence

• bi = 0  Xi _||_ Y | X/Xi

• Xi - Y not adjacent
in PAG over X U Y  S  X/Xi, Xi _||_ Y | S

• So for any SEM over V in which


• Xi _||_ Y | X/Xi and
 S  X/Xi, Xi _||_ Y | S

PAG is superior to regression wrt errors of commission

56
Regression Example

T1 b1 0 
T2
X1 X2 X3 b2 0 X

b3 0
X
Y X1 X2 X3

True Model PAG

57
Regression Bias

If
• Xi is d-separated from Y conditional on
X/Xi in the true graph after removing Xi
 Y, and
• X contains no descendant of Y, then:

bi is an unbiased estimate of i

58
Regression Bias Theorem

If T = , and X prior to Y, then

bi is an unbiased estimate of i

59
Tetrad 4 Demo

[Link]/projects/tetrad

60
Applications

• Genetic Regulatory • Rock Classification


Networks • Spartina Grass
• Pneumonia • College Plans
• Photosynthesis • Political Exclusion
• Lead - IQ • Satellite Calibration
• College Retention • Naval Readiness
• Corn Exports

61
MS or Phd Projects

• Extending the Class of Models Covered


• New Search Strategies
• Time Series Models (Genetic Regulatory Networks)
• Controlled Randomized Trials vs. Observations Studies
Projects: Extending the Class
of Models Covered

1) Feedback systems
2) Feedback systems with latents
3) Conservation, or equilibrium systems
4) Parameterizing discrete latent variable
models
Projects: Search Strategies

1) Genetic Algorithms, Simulated Annealing


2) Automatic Discretization
3) Scoring Searches among Latent Variable Models
4) Latent Clustering & Scale Construction
References

• Causation, Prediction, and Search, 2nd Edition, (2001), by P. Spirtes,


C. Glymour, and R. Scheines ( MIT Press)

• Causality: Models, Reasoning, and Inference, (2000), Judea Pearl,


Cambridge Univ. Press

• Computation, Causation, & Discovery (1999), edited by C. Glymour


and G. Cooper, MIT Press

• Causality in Crisis?, (1997) V. McKim and S. Turner (eds.), Univ. of


Notre Dame Press.

• TETRAD IV: [Link]/tetrad

• Web Course on Causal and Statistical Reasoning :


[Link]/projects/csr/
65

You might also like