Logit Analysis of Contingency Tables

This document provides an introduction to analyzing contingency tables using logistic regression and the PROC GENMOD procedure in SAS. It discusses how to test for independence between variables in a 2-way contingency table using a chi-square test. It then demonstrates how to model the probabilities in such a table using PROC GENMOD, including specifying the data as cell frequencies with a FREQ statement or specifying the events and trials directly.

Uploaded by Sylvia Cheung

1. Introduction
2. Two-way classification and PROC GENMOD
3. Three-way classification
4. Class exercises
CHAPTER 2: BINARY LOGIT ANALYSIS OF CONTINGENCY TABLES
Prof. Alan Wan
Table of contents
1. Introduction
2. Two-way classification and PROC GENMOD
   2.1. PROC GENMOD: frequency weight syntax
   2.2. PROC GENMOD: event/trial syntax
3. Three-way classification
4. Class exercises
Introduction

Contingency table: a table that cross-classifies observations by two or more variables of classification; the purpose is to determine whether these variables are related.

Here is an example:

                                 Annual changes in stock prices
                                 Up           Down         Total
January changes in       Up      22 (16.1)    1 (6.9)      23
stock prices             Down    6 (11.9)     11 (5.1)     17
                         Total   28           12           40

A table containing information of this sort can be used to test whether, as some financial analysts suggest, January is a good predictor of whether stock prices will go up or down over the entire year; i.e., we can test
$H_0$: whether or not stock prices go up over the entire year is the same regardless of their behaviour in January, vs.
$H_1$: otherwise.

Expected frequencies (under $H_0$) are shown in parentheses in the table.

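The expected frequencies under $H_0$ are calculated as follows:
$$16.1 = \frac{28}{40} \times 23; \quad 6.9 = \frac{12}{40} \times 23; \quad 11.9 = \frac{28}{40} \times 17; \quad 5.1 = \frac{12}{40} \times 17.$$

Why? Take 16.1 as an example.

Note that $\Pr(\text{Up}_Y \cap \text{Up}_J) = \Pr(\text{Up}_Y \mid \text{Up}_J)\,\Pr(\text{Up}_J)$.

But under independence ($H_0$), $\Pr(\text{Up}_Y \mid \text{Up}_J) = \Pr(\text{Up}_Y)$. Hence
$$\Pr(\text{Up}_Y \cap \text{Up}_J) = \Pr(\text{Up}_Y)\,\Pr(\text{Up}_J) = \frac{28}{40} \times \frac{23}{40} = \frac{16.1}{40}.$$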
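As a cross-check outside SAS, the expected frequencies can be reproduced with a few lines of Python. This is only a minimal sketch using the standard library; the variable names (`observed`, `row_totals`, `col_totals`, `expected`) are illustrative and do not come from the slides.

```python
# Observed counts from the stock-price table:
# rows = January (Up, Down), columns = whole year (Up, Down).
observed = [[22, 1],
            [6, 11]]

row_totals = [sum(row) for row in observed]        # January totals: 23, 17
col_totals = [sum(col) for col in zip(*observed)]  # annual totals: 28, 12
n = sum(row_totals)                                # grand total: 40

# Under independence, E_ij = (row total * column total) / n.
expected = [[r * c / n for c in col_totals] for r in row_totals]

for row in expected:
    print([round(e, 1) for e in row])
```

Each expected count is the product of the two marginal proportions multiplied by the sample size, which is exactly the argument given for 16.1.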

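This test can be conducted using the usual Pearson's chi-square statistic:
$$\text{Pearson's } \chi^2 = \sum_{i=1}^{n} \frac{(O_i - E_i)^2}{E_i} \sim \chi^2_{(r-1)(c-1)},$$
where $O_i$ and $E_i$ are the observed and expected frequencies of cell $i$, $n$ is the number of cells, and $r$ and $c$ are the numbers of rows and columns in the table respectively;

For this example,
$$\sum_{i=1}^{4} \frac{(O_i - E_i)^2}{E_i} = \frac{(22-16.1)^2}{16.1} + \frac{(1-6.9)^2}{6.9} + \frac{(6-11.9)^2}{11.9} + \frac{(11-5.1)^2}{5.1} = 16.96;$$

Now, $\chi^2_{1,0.05} = 3.84$. Hence we reject $H_0$ and conclude that stock price movements during the whole year are not independent of their movements in January of that year.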
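The arithmetic of the Pearson statistic can be checked with a short Python sketch (standard library only; the names below are illustrative). The p-value line relies on the identity $\Pr(\chi^2_1 > x) = \operatorname{erfc}(\sqrt{x/2})$, which holds for one degree of freedom only.

```python
import math

# Observed and expected cell counts, in the order used in the slide.
observed = [22, 1, 6, 11]
expected = [16.1, 6.9, 11.9, 5.1]

# Pearson's chi-square: sum over cells of (O - E)^2 / E.
chi_sq = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Upper-tail probability of a chi-square variable with 1 degree of freedom.
p_value = math.erfc(math.sqrt(chi_sq / 2))

print(f"Pearson chi-square = {chi_sq:.4f}")
print(f"exceeds the 5% critical value 3.84: {chi_sq > 3.84}")
print(f"p-value below 0.0001: {p_value < 0.0001}")
```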
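The same test can be carried out in SAS with PROC FREQ:

/* f = cell frequency; yp = 1 if prices rose over the year; jp = 1 if prices rose in January */
data stock;
    input f yp jp;
    datalines;
22 1 1
6 1 0
1 0 1
11 0 0
;
/* WEIGHT treats f as the cell count; CHISQ and CMH request the test statistics */
proc freq data=stock;
    weight f;
    tables yp*jp / chisq cmh;
run;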


Statistics for Table of yp by jp

Statistic                      DF       Value      Prob
Chi-Square                      1     16.9577    <.0001
Likelihood Ratio Chi-Square     1     18.5678    <.0001
Continuity Adj. Chi-Square      1     14.2053    0.0002
Mantel-Haenszel Chi-Square      1     16.5338    <.0001
Phi Coefficient                        0.6511
Contingency Coefficient                0.5456
Cramer's V                             0.6511
PROC GENMOD: frequency weight syntax

Consider the penalty data of Chapter 1. Suppose individual data are unavailable and all we have is the following table:

          Blacks    Non-blacks    Total
Death     28        22            50
Life      45        52            97
Total     73        74            147

The logit model for regressing DEATH on BLACKD with data contained in a contingency table can be estimated with PROC GENMOD;

One way to invoke PROC GENMOD is to use the FREQ statement, which in effect replicates each observation according to the frequency specified, converting the data into individual-level format.
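/* F = cell frequency; BLACKD = 1 for Blacks; DEATH = 1 for a death sentence */
DATA CONT1;
    INPUT F BLACKD DEATH;
    DATALINES;
22 0 1
28 1 1
52 0 0
45 1 0
;
/* DESCENDING models Pr(DEATH=1); FREQ F weights each row by its cell count; D=B is the binomial distribution */
PROC GENMOD DATA=CONT1 DESCENDING;
    FREQ F;
    MODEL DEATH=BLACKD / D=B;
RUN;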

The GENMOD Procedure

Model Information

Data Set                      WORK.CONT1
Distribution                  Binomial
Link Function                 Logit
Dependent Variable            DEATH
Frequency Weight Variable     F
Observations Used             4
Sum Of Frequency Weights      147

Response Profile

Ordered                    Total
Value       DEATH      Frequency
1           1                 50
2           0                 97

PROC GENMOD is modeling the probability that DEATH='1'.

Criteria For Assessing Goodness Of Fit

Criterion               DF        Value    Value/DF
Deviance               145     187.2704      1.2915
Scaled Deviance        145     187.2704      1.2915
Pearson Chi-Square     145     147.0000      1.0138
Scaled Pearson X2      145     147.0000      1.0138
Log Likelihood                 -93.6352

Algorithm converged.

Analysis Of Parameter Estimates

                            Standard     Wald 95% Confidence       Chi-
Parameter    DF  Estimate      Error           Limits            Square    Pr > ChiSq
Intercept     1   -0.8602     0.2543     -1.3587     -0.3617      11.44        0.0007
BLACKD        1    0.3857     0.3502     -0.3006      1.0721       1.21        0.2706
Scale         0    1.0000     0.0000      1.0000      1.0000

NOTE: The scale parameter was held fixed.

As far as PROC GENMOD is concerned, only 4 observations have been input;

The actual number of observations, namely 147, is taken to be the sum of the frequencies. The FREQ statement expands the 4 input rows into 147 individual observations to be used for ML estimation;

The Deviance statistic is an LR test of whether there is a significant difference between the estimated (restricted) model and the saturated (unrestricted) model;

$$\text{Deviance} = 2\,[\ln L(\hat{\beta}_S) - \ln L(\hat{\beta}_E)] \sim \chi^2_m,$$
where $m$ is the difference in the number of parameters between the saturated and the estimated models;

The saturated model is a model in which the number of unknown parameters equals the number of observations;

Hence for a model estimated from individual data, there are $n$ observations for $n$ unknowns, resulting in $L(\hat{\beta}_S) = 1$, $\ln L(\hat{\beta}_S) = 0$ and $\text{Deviance} = -2\,[\ln L(\hat{\beta}_E)]$.
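The relation Deviance $= -2 \ln L(\hat{\beta}_E)$ can be checked numerically against the GENMOD output for the death-penalty table. The Python sketch below is illustrative only; it uses the fact that a logit model with an intercept and a single dummy regressor reproduces the observed proportion in each group, so the maximised log-likelihood has a closed form.

```python
import math

# Cell counts from the death-penalty table: (BLACKD, death sentences, life sentences).
groups = [(0, 22, 52), (1, 28, 45)]

# Fitted probability of DEATH=1 in each group equals the observed proportion.
log_lik = 0.0
for _, deaths, lifes in groups:
    p_hat = deaths / (deaths + lifes)
    log_lik += deaths * math.log(p_hat) + lifes * math.log(1 - p_hat)

# With individual-level data ln L(saturated) = 0, so Deviance = -2 ln L(estimated).
deviance = -2 * log_lik

print(f"log-likelihood = {log_lik:.4f}")
print(f"deviance       = {deviance:.4f}")
```

The two values agree with the Log Likelihood and Deviance rows reported under the frequency weight syntax.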
PROC GENMOD: event/trial syntax

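Instead of entering all 4 internal cell counts, we enter the cell frequencies for death sentences (events) along with the column totals (trials).

/* DEATH = number of death sentences (events); TOTAL = column total (trials) */
DATA CONT1;
    INPUT DEATH TOTAL BLACKD;
    DATALINES;
22 74 0
28 73 1
;
/* events/trials response syntax; D=B is the binomial distribution */
PROC GENMOD DATA=CONT1;
    MODEL DEATH/TOTAL=BLACKD / D=B;
RUN;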

The GENMOD Procedure

Model Information

Data Set                       WORK.CONT1
Distribution                   Binomial
Link Function                  Logit
Response Variable (Events)     DEATH
Response Variable (Trials)     TOTAL
Observations Used              2
Number Of Events               50
Number Of Trials               147

Criteria For Assessing Goodness Of Fit

Criterion               DF        Value    Value/DF
Deviance                 0       0.0000           .
Scaled Deviance          0       0.0000           .
Pearson Chi-Square       0       0.0000           .
Scaled Pearson X2        0       0.0000           .
Log Likelihood                 -93.6352

Algorithm converged.

Analysis Of Parameter Estimates

                            Standard     Wald 95% Confidence       Chi-
Parameter    DF  Estimate      Error           Limits            Square    Pr > ChiSq
Intercept     1   -0.8602     0.2543     -1.3587     -0.3617      11.44        0.0007
BLACKD        1    0.3857     0.3502     -0.3006      1.0721       1.21        0.2706
Scale         0    1.0000     0.0000      1.0000      1.0000

NOTE: The scale parameter was held fixed.

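With frequency weighting syntax:
$$L = \prod_{i=1}^{147} \left[\frac{1}{1 + e^{-(\beta_1 + \beta_2 \text{BLACKD}_i)}}\right]^{\text{DEATH}_i} \left[1 - \frac{1}{1 + e^{-(\beta_1 + \beta_2 \text{BLACKD}_i)}}\right]^{1-\text{DEATH}_i}$$

With event/trial syntax:
$$L = \left\{\left[\frac{1}{1 + e^{-(\beta_1 + \beta_2(\text{BLACKD}=0))}}\right]^{22} \left[1 - \frac{1}{1 + e^{-(\beta_1 + \beta_2(\text{BLACKD}=0))}}\right]^{52}\right\} \times \left\{\left[\frac{1}{1 + e^{-(\beta_1 + \beta_2(\text{BLACKD}=1))}}\right]^{28} \left[1 - \frac{1}{1 + e^{-(\beta_1 + \beta_2(\text{BLACKD}=1))}}\right]^{45}\right\}$$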

The two likelihood functions are of course algebraically identical, but PROC GENMOD treats the first likelihood as being based on 147 Bernoulli(p) observations, and the second likelihood as being based on 2 observations, each being a product of Bernoulli(p) densities corresponding to a common value of BLACKD, namely BLACKD=0 for the first observation and BLACKD=1 for the second observation;

Under the event/trial syntax, there are 2 observations for estimating 2 parameters. Hence the estimated model is the saturated model, which results in a Deviance statistic of 0;

The Deviance statistic is therefore uninformative for a two-way cross classification.
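That the two likelihoods coincide can also be seen numerically. The Python sketch below (illustrative names, standard library only) evaluates both log-likelihoods at the estimates reported by GENMOD: once over 147 expanded Bernoulli observations and once over the 2 grouped observations.

```python
import math


def p_death(b1, b2, blackd):
    """Logit probability of a death sentence for a given BLACKD value."""
    return 1.0 / (1.0 + math.exp(-(b1 + b2 * blackd)))


# Estimates as printed in the GENMOD output (rounded to 4 decimals).
b1, b2 = -0.8602, 0.3857

# Frequency-weight view: (BLACKD, DEATH, frequency), expanded to individuals.
cells = [(0, 1, 22), (1, 1, 28), (0, 0, 52), (1, 0, 45)]
individuals = [(x, d) for x, d, f in cells for _ in range(f)]
loglik_individual = sum(
    d * math.log(p_death(b1, b2, x)) + (1 - d) * math.log(1 - p_death(b1, b2, x))
    for x, d in individuals
)

# Event/trial view: (events, non-events, BLACKD), one row per group.
grouped = [(22, 52, 0), (28, 45, 1)]
loglik_grouped = sum(
    e * math.log(p_death(b1, b2, x)) + ne * math.log(1 - p_death(b1, b2, x))
    for e, ne, x in grouped
)

print(len(individuals), round(loglik_individual, 4), round(loglik_grouped, 4))
```

Only the count of "observations", and hence the saturated model used as the benchmark for the Deviance, differs between the two syntaxes.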
Three-way classification

Consider the cross classification of race, gender and possession of a driver's license for a sample of 17- and 18-year-olds:

                     Driver's license
Race     Gender      Yes      No
White    Male        43       134
         Female      26       149
Black    Male        29       23
         Female      22       36

Let YES represent the event of interest, and TOTAL=YES+NO represent the trial.
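/* WHITE = 1 for whites; MALE = 1 for males; YES/NO = counts with/without a license */
DATA DRIVER;
    INPUT WHITE MALE YES NO;
    TOTAL = YES+NO;    /* number of trials in each cell */
    DATALINES;
1 1 43 134
1 0 26 149
0 1 29 23
0 0 22 36
;
/* events/trials syntax with main effects only */
PROC GENMOD DATA=DRIVER;
    MODEL YES/TOTAL=WHITE MALE / D=B;
RUN;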

Model Information

Data Set                       WORK.DRIVER
Distribution                   Binomial
Link Function                  Logit
Response Variable (Events)     YES
Response Variable (Trials)     TOTAL
Observations Used              4
Number Of Events               120
Number Of Trials               462

Criteria For Assessing Goodness Of Fit

Criterion               DF        Value    Value/DF
Deviance                 1       0.0583      0.0583
Scaled Deviance          1       0.0583      0.0583
Pearson Chi-Square       1       0.0583      0.0583
Scaled Pearson X2        1       0.0583      0.0583
Log Likelihood                -245.8974

Algorithm converged.

Analysis Of Parameter Estimates

                            Standard     Wald 95% Confidence       Chi-
Parameter    DF  Estimate      Error           Limits            Square    Pr > ChiSq
Intercept     1   -0.4555     0.2221     -0.8909     -0.0201       4.20        0.0403
WHITE         1   -1.3135     0.2378     -1.7795     -0.8474      30.51        <.0001
MALE          1    0.6478     0.2250      0.2068      1.0889       8.29        0.0040
Scale         0    1.0000     0.0000      1.0000      1.0000

NOTE: The scale parameter was held fixed.


There are 4 observations, 120 events of YES and 462 trials represented by TOTAL;

Both the race and gender coefficients are significantly different from zero;

The present estimated model is not the saturated model, as there are 4 observations for 3 parameters. The estimated and saturated models differ by one parameter. Hence the Deviance statistic has df=1.
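For readers who want to see where the estimates and the Deviance come from, the main-effects model can be refitted outside SAS. The sketch below is a plain Newton-Raphson iteration written in standard-library Python; it is not GENMOD's own algorithm or code, and all names (`solve`, `fitted`, `events`, `trials`) are illustrative.

```python
import math

# Grouped data from the slide: (WHITE, MALE, YES, NO).
cells = [(1, 1, 43, 134), (1, 0, 26, 149), (0, 1, 29, 23), (0, 0, 22, 36)]
X = [[1.0, float(w), float(m)] for w, m, _, _ in cells]  # intercept, WHITE, MALE
events = [yes for _, _, yes, _ in cells]
trials = [yes + no for _, _, yes, no in cells]


def solve(a, b):
    """Solve the small linear system a x = b by Gauss-Jordan elimination."""
    k = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(k):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * c for v, c in zip(m[r], m[col])]
    return [m[i][k] / m[i][i] for i in range(k)]


def fitted(beta):
    """Fitted probabilities p_i = 1 / (1 + exp(-x_i' beta))."""
    return [1.0 / (1.0 + math.exp(-sum(bj * xj for bj, xj in zip(beta, row))))
            for row in X]


beta = [0.0, 0.0, 0.0]
for _ in range(25):  # fixed iteration count keeps the sketch short
    p = fitted(beta)
    # Score vector X'(y - n p) and information matrix X' diag(n p (1 - p)) X.
    score = [sum(X[i][j] * (events[i] - trials[i] * p[i]) for i in range(4))
             for j in range(3)]
    info = [[sum(X[i][j] * X[i][l] * trials[i] * p[i] * (1 - p[i]) for i in range(4))
             for l in range(3)] for j in range(3)]
    beta = [bj + sj for bj, sj in zip(beta, solve(info, score))]

p = fitted(beta)
log_lik = sum(y * math.log(pi) + (n - y) * math.log(1 - pi)
              for y, n, pi in zip(events, trials, p))
# Deviance against the saturated model, whose fitted proportions are y / n.
deviance = 2 * sum(y * math.log(y / (n * pi)) + (n - y) * math.log((n - y) / (n * (1 - pi)))
                   for y, n, pi in zip(events, trials, p))

print([round(bj, 4) for bj in beta], round(log_lik, 4), round(deviance, 4))
```

The Deviance compares the 3-parameter fit with the 4 observed cell proportions, which is why it has one degree of freedom.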

How can the saturated model be constructed from the available data?

The present model can be expanded to yield the saturated model by introducing the interaction term WHITE*MALE;

The Deviance test is essentially a test of the significance of the interaction term.

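Estimated model:
$$p_i = \frac{1}{1 + e^{-(\beta_1 + \beta_2 \text{WHITE}_i + \beta_3 \text{MALE}_i)}}$$

Saturated model:
$$p_i = \frac{1}{1 + e^{-(\beta_1 + \beta_2 \text{WHITE}_i + \beta_3 \text{MALE}_i + \beta_4 \text{WHITE}_i \text{MALE}_i)}}$$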

Testing the significance of the difference between the estimated and saturated models is the same as testing $\beta_4 = 0$;

The p-value corresponding to the Deviance statistic of 0.0583 can be computed using the following SAS commands:
/* _NULL_ avoids creating an output data set; PUT writes the p-value to the log */
data _null_;
    chi = 1 - probchi(0.0583, 1);
    put chi;
run;

This results in a p-value of 0.8092. Hence the interaction term between MALE and WHITE is not significantly different from zero.
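The same p-value can be reproduced without SAS. The Python sketch below is a cross-check under one stated assumption: it uses the identity $\Pr(\chi^2_1 > x) = \operatorname{erfc}(\sqrt{x/2})$, which is valid only for one degree of freedom.

```python
import math

deviance = 0.0583  # Deviance of the main-effects model, df = 1

# Upper-tail probability of a chi-square variable with 1 degree of freedom.
p_value = math.erfc(math.sqrt(deviance / 2))

print(f"p-value = {p_value:.4f}")
```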

To see this more clearly, let us fit the model explicitly with the interaction term:
DATA DRIVER;
    INPUT WHITE MALE YES NO;
    TOTAL = YES+NO;    /* number of trials in each cell */
    DATALINES;
1 1 43 134
1 0 26 149
0 1 29 23
0 0 22 36
;
/* WHITE*MALE adds the interaction, giving 4 parameters for 4 observations */
PROC GENMOD DATA=DRIVER;
    MODEL YES/TOTAL=WHITE MALE WHITE*MALE / D=B;
RUN;

The GENMOD Procedure

Model Information

Data Set                       WORK.DRIVER
Distribution                   Binomial
Link Function                  Logit
Response Variable (Events)     YES
Response Variable (Trials)     TOTAL
Observations Used              4
Number Of Events               120
Number Of Trials               462

Criteria For Assessing Goodness Of Fit

Criterion               DF        Value    Value/DF
Deviance                 0       0.0000           .
Scaled Deviance          0       0.0000           .
Pearson Chi-Square       0       0.0000           .
Scaled Pearson X2        0       0.0000           .
Log Likelihood                -245.8682

Algorithm converged.

Analysis Of Parameter Estimates

                             Standard     Wald 95% Confidence       Chi-
Parameter     DF  Estimate      Error           Limits            Square    Pr > ChiSq
Intercept      1   -0.4925     0.2706     -1.0229      0.0379       3.31        0.0688
WHITE          1   -1.2534     0.3441     -1.9278     -0.5789      13.27        0.0003
MALE           1    0.7243     0.3888     -0.0378      1.4864       3.47        0.0625
WHITE*MALE     1   -0.1151     0.4765     -1.0491      0.8189       0.06        0.8092
Scale          0    1.0000     0.0000      1.0000      1.0000

NOTE: The scale parameter was held fixed.


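Now, to test $H_0: \beta_4 = 0$ vs. $H_1$: otherwise, we apply the LR test:
$$\text{Deviance} = 2\,[-245.8682 - (-245.8974)] = 0.0584$$

Also, the log of the odds is given by
$$Z_i = \beta_1 + \beta_2\,\text{WHITE}_i + \beta_3\,\text{MALE}_i + \beta_4\,\text{WHITE}_i\,\text{MALE}_i.$$
So,
$$\frac{\partial Z_i}{\partial \text{WHITE}_i} = \beta_2 + \beta_4\,\text{MALE}_i,$$
and
$$\frac{\partial Z_i}{\partial \text{MALE}_i} = \beta_3 + \beta_4\,\text{WHITE}_i.$$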
Hence the odds ratio estimates of WHITE and MALE are $e^{-1.2534 - 0.1151\,\text{MALE}_i}$ and $e^{0.7243 - 0.1151\,\text{WHITE}_i}$ respectively, with the following interpretations:

The odds of having a license for white females are $e^{-1.2534} = 0.286$ times the odds for black females;

The odds of having a license for white males are $e^{-1.2534 - 0.1151} = 0.2544$ times the odds for black males;

The odds of having a license for black males are $e^{0.7243} = 2.063$ times the odds for black females;

The odds of having a license for white males are $e^{0.7243 - 0.1151} = 1.839$ times the odds for white females.
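Because the model with the interaction term is saturated, these odds ratios can be cross-checked directly against the raw cell counts. The Python sketch below is illustrative only: the helper names `or_white` and `or_male` are not from the slides, and the coefficients are the rounded values printed in the GENMOD output, so agreement is to about three decimals.

```python
import math

# Estimates copied from the saturated-model output (rounded to 4 decimals).
b_white, b_male, b_inter = -1.2534, 0.7243, -0.1151


def or_white(male):
    """Odds ratio, white vs. black, at MALE = male (0 or 1)."""
    return math.exp(b_white + b_inter * male)


def or_male(white):
    """Odds ratio, male vs. female, at WHITE = white (0 or 1)."""
    return math.exp(b_male + b_inter * white)


# Observed odds (YES / NO) in each cell of the three-way table.
odds = {
    ("white", "male"): 43 / 134,
    ("white", "female"): 26 / 149,
    ("black", "male"): 29 / 23,
    ("black", "female"): 22 / 36,
}

print(round(or_white(0), 3), round(odds["white", "female"] / odds["black", "female"], 3))
print(round(or_white(1), 3), round(odds["white", "male"] / odds["black", "male"], 3))
print(round(or_male(0), 3), round(odds["black", "male"] / odds["black", "female"], 3))
print(round(or_male(1), 3), round(odds["white", "male"] / odds["white", "female"], 3))
```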

The interaction term also affects the marginal effect on $p_i$. For example,
$$\frac{\partial p_i}{\partial \text{WHITE}_i} = f(Z_i)\,(\beta_2 + \beta_4\,\text{MALE}_i),$$
where, for the logit model, $f(Z_i) = p_i(1 - p_i)$ is the logistic density. In other words, the marginal change of $p_i$ with respect to a change of race from black to white depends on the gender of the person;

Pearson's chi-square goodness of fit test: see Tutorial 2
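The dependence of the marginal effect of WHITE on gender can be illustrated numerically. The Python sketch below is a rough illustration with made-up helper names (`z`, `p`, `marginal_effect_white`); it uses the rounded estimates from the saturated-model output and takes $f(Z_i) = p_i(1 - p_i)$, which is specific to the logit link. Since WHITE is a dummy variable, the derivative is only an approximation to the actual change in probability, so the discrete change is printed alongside it.

```python
import math

# Rounded estimates from the saturated-model output.
b1, b2, b3, b4 = -0.4925, -1.2534, 0.7243, -0.1151


def z(white, male):
    """Log-odds Z_i for a given race and gender."""
    return b1 + b2 * white + b3 * male + b4 * white * male


def p(white, male):
    """Probability of holding a license."""
    return 1.0 / (1.0 + math.exp(-z(white, male)))


def marginal_effect_white(white, male):
    """f(Z)(b2 + b4 * MALE), with f(Z) = p(1 - p) for the logit model."""
    pi = p(white, male)
    return pi * (1 - pi) * (b2 + b4 * male)


for male, label in [(0, "females"), (1, "males")]:
    derivative = marginal_effect_white(0, male)  # evaluated at WHITE = 0
    discrete = p(1, male) - p(0, male)           # actual change, black to white
    print(f"{label}: derivative = {derivative:.4f}, discrete change = {discrete:.4f}")
```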


Class exercises
1. Tutorial 2
2. 2004 Final Exam, Question 1
3. 2007 Final Exam, Question 1