Logistic Regression
By: Dr. Elya Nabila Abdul Bahri
Objectives
– Explain the concepts of logistic regression.
– Fit a binary logistic regression model using the Logistic Regression task.
– Explain effect and reference cell coding.
– Define and explain the odds ratio.
– Explain the standard output from the Logistic Regression task.
Overview
The type of response variable determines the analysis:
– Continuous response → Linear Regression Analysis
– Categorical response → Logistic Regression Analysis
Types of Logistic Regression
Response Variable                    Type of Logistic Regression
Two categories (e.g., Yes/No)        Binary
Three or more categories, nominal    Nominal
Three or more categories, ordinal    Ordinal
What Does Logistic Regression Do?
The logistic regression model uses the predictor variables, which can be categorical or continuous, to predict the probability of specific outcomes. In other words, logistic regression is designed to describe probabilities associated with the values of the response variable.
Logistic Regression Curve
[Figure: S-shaped logistic curve showing the probability of the outcome (0.0 to 1.0) against the predictor x.]
Logit Transformation
logit(pᵢ) = log( pᵢ / (1 − pᵢ) )
Logistic regression models transform probabilities into logits,
where
– i indexes all cases (observations).
– pᵢ is the probability that the event (a sale, for example) occurs in the ith case.
– log is the natural log (to the base e).
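As a minimal illustrative sketch (not part of the course materials), the logit transform and its inverse can be computed directly; the probability values below are made up.

```python
import numpy as np

# Made-up probabilities for illustration
p = np.array([0.10, 0.32, 0.42, 0.72, 0.90])

# Logit transform: natural log of the odds
logit = np.log(p / (1 - p))

# Inverse (logistic) transform recovers the probabilities
p_back = 1 / (1 + np.exp(-logit))
print(logit)
print(p_back)
```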
Assumption
[Figure: the S-shaped relationship between pᵢ and the predictor becomes, after the logit transform, a straight-line relationship between logit(pᵢ) and the predictor.]
Logistic Regression Model
logit(pᵢ) = β₀ + β₁X₁ + εᵢ
where
– logit(pᵢ) = logit transformation of the probability of the event
– β₀ = intercept of the regression line
– β₁ = slope of the regression line
– εᵢ = error (residual) associated with each observation
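A small sketch (coefficient values are assumed, not course output) showing how a value on the logit scale converts back to a predicted probability:

```python
import numpy as np

# Assumed coefficients, for illustration only
beta0, beta1 = -1.5, 0.8

x = np.array([0.0, 1.0, 2.0, 3.0])
logit = beta0 + beta1 * x        # linear predictor on the logit scale
p = 1 / (1 + np.exp(-logit))     # inverse logit: predicted probability of the event
print(p)
```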
Open file logit_multi
Reference Cell Coding: Two Levels
Class      Value   Design Variable 1
purchase   Yes     1
           No      0
Reference Cell Coding: Three Levels
Class      Value   Label               Design Variable 1   Design Variable 2
inclevel   3       dh: High Income     1                   0
           2       dm: Medium Income   0                   1
           1       Low Income          0                   0
Quick Quiz: Reference Cell Coding
logit(p) = β₀ + β₁·D_HighIncome + β₂·D_MediumIncome
Write the following in your workbook:
– the equation for the logit when Income = High
– the equation for the logit when Income = Medium
Reference Cell Coding: An Example
logit(p) = β₀ + β₁·D_HighIncome + β₂·D_MediumIncome
β₀ = the value of the logit when income is Low
β₁ = the difference between the logits for High and Low income
β₂ = the difference between the logits for Medium and Low income
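A brief sketch (hypothetical column names, not the course file) of how reference cell coding can be produced with pandas, dropping the Low Income column so it serves as the reference level:

```python
import pandas as pd

# Hypothetical income values, for illustration only
df = pd.DataFrame({"inclevel": ["Low Income", "Medium Income", "High Income",
                                "Low Income", "High Income"]})

# Reference cell (dummy) coding: one 0/1 column per level,
# then drop the reference level (Low Income)
dummies = pd.get_dummies(df["inclevel"], prefix="d", dtype=int)
dummies = dummies.drop(columns=["d_Low Income"])
print(dummies)
```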
Effect Coding: Two Levels
Class    Value    Design Variable 1
gender   Female    1
         Male     -1
Effect Coding: Three Levels
Class      Value   Label           Design Variable 1   Design Variable 2
inclevel   1       Low Income       1                   0
           2       Medium Income    0                   1
           3       High Income     -1                  -1
Effect Coding: An Example
logit(p) = β₀ + β₁·D_HighIncome + β₂·D_MediumIncome
β₀ = the average value of the logit across all categories
β₁ = the difference between the logit for High income and the average logit
β₂ = the difference between the logit for Medium income and the average logit
−(β₁ + β₂) = the difference between the logit for Low income and the average logit
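A companion sketch (again with hypothetical values) of effect coding done by hand, assigning −1 on every design variable to the level that acts as the baseline:

```python
import pandas as pd

def effect_code(series, reference):
    """One column per non-reference level; the reference level is coded -1 everywhere."""
    cols = {}
    for level in sorted(series.unique()):
        if level == reference:
            continue
        col = (series == level).astype(int)
        col[series == reference] = -1
        cols[f"eff_{level}"] = col
    return pd.DataFrame(cols)

df = pd.DataFrame({"inclevel": ["High Income", "Medium Income", "Low Income"]})
print(effect_code(df["inclevel"], reference="High Income"))
```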
Binary Logistic Regression
This demonstration illustrates fitting a simple logistic regression model using the Logistic Regression task.
Binary Logistic Regression Task
Analyze → Regression → Binary Logistic
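The demonstration uses the menu-driven Logistic Regression task; as a rough Python analogue (the file name logit_multi.csv and the Yes/No coding of purchase are assumptions), a comparable fit looks like this:

```python
import pandas as pd
import statsmodels.formula.api as smf

# Assumed file name based on the "Open file logit_multi" instruction
df = pd.read_csv("logit_multi.csv")

# Recode the response and fit purchase on gender
df["purchase01"] = (df["purchase"] == "Yes").astype(int)
model = smf.logit("purchase01 ~ C(gender)", data=df).fit()
print(model.summary())
```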
What Is an Odds Ratio?
An odds ratio indicates how much more likely, with respect to odds, a certain event is to occur in one group relative to its occurrence in another group.
Example: How much more likely are females to purchase 100 dollars or more in items compared to males?
Frequency
Probability of Outcome
           Outcome
           Yes    No     Total
Group A    101    139    240
Group B     61    130    191
Total      162    269    431

Probability of a YES outcome in Group A = 101/240 (0.42)
Probability of a NO outcome in Group A = 139/240 (0.58)
Probability of a YES outcome in Group B = 61/191 (0.32)
Probability of a NO outcome in Group B = 130/191 (0.68)
Odds
Odds of outcome in Group A
= Probability of a YES outcome in Group A ÷ Probability of a NO outcome in Group A
= 0.42 / 0.58 = 0.72
Odds
Odds of outcome in Group B
= Probability of a YES outcome in Group B ÷ Probability of a NO outcome in Group B
= 0.32 / 0.68 = 0.47
Odds Ratio
Odds ratio of Group A to Group B
= Odds of outcome in Group A ÷ Odds of outcome in Group B
= 0.72 / 0.47 = 1.53
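The same odds and odds ratio can be computed directly from the table counts; a short sketch using the counts from the slides follows (working from unrounded counts gives slightly different decimals than the slides, which round the probabilities first):

```python
# Counts from the 2x2 table on the slides
yes_a, no_a = 101, 139   # Group A: YES, NO
yes_b, no_b = 61, 130    # Group B: YES, NO

odds_a = yes_a / no_a            # ~0.73 (slides show 0.72 after rounding probabilities)
odds_b = yes_b / no_b            # ~0.47
odds_ratio = odds_a / odds_b     # ~1.55 unrounded; the slides report 1.53 from rounded odds
print(round(odds_a, 2), round(odds_b, 2), round(odds_ratio, 2))
```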
Properties of the Odds Ratio
[Figure: number line of odds ratio values. A value of 1 means no association; values between 0 and 1 mean the outcome is more likely in Group B; values above 1 mean it is more likely in Group A.]
Odds Ratio Calculation from the Current Logistic Regression Model
Logistic regression model:
logit(p̂) = log(odds) = β₀ + β₁·gender
Odds ratio (females to males):
odds_females = e^(β₀ + β₁)
odds_males = e^(β₀)
odds ratio = e^(β₀ + β₁) / e^(β₀) = e^(β₁)
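A self-contained sketch (synthetic data, not the course data set) showing that exponentiating the gender coefficient from a fitted logistic regression gives the odds ratio:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Small synthetic data set for illustration: females purchase more often than males
df = pd.DataFrame({
    "gender":   ["Female"] * 40 + ["Male"] * 40,
    "purchase": [1] * 25 + [0] * 15 + [1] * 15 + [0] * 25,
})

model = smf.logit("purchase ~ C(gender)", data=df).fit(disp=0)

# e**beta1 is the odds ratio; the C(gender)[T.Male] entry is the male-to-female odds ratio
print(np.exp(model.params))
```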
1 Independent Variable: Gender
Goodness-of-Fit Statistics for the Logistic Regression Model
Compare Means: Independent-Samples T Test
Compare Means of Gender
Comparing Pairs
To find concordant, discordant, and tied pairs, compare everyone who had the outcome of interest against everyone who did not.
[Figure: customers split into a < $100 group and a $100+ group.]
Concordant Pair
Compare a woman who bought more than $100 worth of goods from the catalog and a man who did not.
The man (< $100) has predicted P(100+) = 0.47; the woman ($100+) has predicted P(100+) = 0.72.
The actual sorting agrees with the model. This is a concordant pair.
Discordant Pair
Compare a man who bought more than $100 worth of goods from the catalog and a woman who did not.
The woman (< $100) has predicted P(100+) = 0.72; the man ($100+) has predicted P(100+) = 0.47.
The actual sorting disagrees with the model. This is a discordant pair.
Tied Pair
Compare two women. One bought more than $100 worth of goods from the catalog, but the other did not.
Both have predicted P(100+) = 0.72, so the model cannot distinguish between the two. This is a tied pair.
Concordant versus Discordant

                                            Customer Purchasing Over $100
                                            Females (P = 0.72)   Males (P = 0.47)
Customer Purchasing    Females (P = 0.72)   Tied pair            Discordant pair
Less Than $100         Males (P = 0.47)     Concordant pair      Tied pair
Model: Concordant, Discordant, and Tied Pairs
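As an illustrative sketch of how these percentages arise (the function below is a direct pairwise count, not the course software's algorithm), every event is compared with every non-event by predicted probability:

```python
import numpy as np

def pair_counts(y, p):
    """Fractions of concordant, discordant, and tied pairs among all event/non-event pairs."""
    y, p = np.asarray(y), np.asarray(p)
    diff = p[y == 1][:, None] - p[y == 0][None, :]   # event prob minus non-event prob
    n = diff.size
    concordant = np.sum(diff > 0)   # event was given the higher predicted probability
    discordant = np.sum(diff < 0)   # event was given the lower predicted probability
    tied = n - concordant - discordant
    return concordant / n, discordant / n, tied / n

# Made-up example: females predicted 0.72, males 0.47
y = [1, 1, 0, 0, 1, 0]
p = [0.72, 0.47, 0.47, 0.72, 0.72, 0.47]
print(pair_counts(y, p))
```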
Multiple Logistic Regression
Multiple Logistic Regression
This demonstration illustrates fitting a logistic regression model with more than one explanatory variable.
Objectives
– Fit a multiple logistic regression model using the backward elimination method.
– Fit a multiple logistic regression model with interactions.
Multiple Logistic Regression
Response: Purchase    Predictors: Gender, Income, Age
logit(pᵢ) = β₀ + β₁X₁ + β₂X₂ + β₃X₃
Backward Elimination Method
Full model: Purchase ~ Gender + Income + Age
Non-significant predictors are removed one at a time until only significant terms remain.
Reduced model: Purchase ~ the predictors that remain significant
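Backward elimination as described above can be sketched in Python; this is an assumption-laden illustration (file and column names as assumed earlier, with term significance judged by likelihood-ratio tests), not the course software's exact algorithm:

```python
import pandas as pd
import statsmodels.formula.api as smf
from scipy import stats

df = pd.read_csv("logit_multi.csv")                    # hypothetical file name
df["purchase01"] = (df["purchase"] == "Yes").astype(int)

terms, threshold = ["C(gender)", "C(inclevel)", "age"], 0.05

# Repeatedly drop the least significant term until all remaining terms are significant
while True:
    current = smf.logit("purchase01 ~ " + " + ".join(terms), data=df).fit(disp=0)
    pvals = []
    for term in terms:
        reduced_terms = [t for t in terms if t != term] or ["1"]
        reduced = smf.logit("purchase01 ~ " + " + ".join(reduced_terms), data=df).fit(disp=0)
        lr = 2 * (current.llf - reduced.llf)           # likelihood-ratio statistic
        pvals.append(stats.chi2.sf(lr, current.df_model - reduced.df_model))
    worst = max(range(len(terms)), key=lambda i: pvals[i])
    if pvals[worst] <= threshold or len(terms) == 1:
        break
    terms.pop(worst)

print("Retained terms:", terms)
```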
Adjusted Odds Ratio
Predictor: Gender → Outcome: Purchase, controlling for Income and Age.
Multiple Logistic Regression
Multiple Logistic Regression with Interactions
This demonstration illustrates adding interaction terms to a main-effects model and using backward elimination to select the best model.
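In the Python sketch used earlier (same assumed data and column names), an interaction term is added through the formula interface; the * operator expands to the two main effects plus their interaction:

```python
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("logit_multi.csv")                    # hypothetical file name
df["purchase01"] = (df["purchase"] == "Yes").astype(int)

# Gender, income, and the gender-by-income interaction
with_interaction = smf.logit("purchase01 ~ C(gender) * C(inclevel)", data=df).fit(disp=0)
print(with_interaction.summary())
```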
Multiple Logistic Regression
[Diagram: Gender, Income, and Age as predictors of Purchase.]
Backward Elimination Method
Full model: Purchase ~ Gender + Income + Age plus their interactions
Reduced model: Purchase ~ the main effects and interactions that remain significant
Comparing Models

Statistic                      Gender + Income    Gender + Income + (Gender * Income)
AIC                            44.257             39.260
SIC                            60.251             63.657
-2 Log likelihood              36.257             27.260
Concordant                     54.0%              54.8%
Discordant                     29.4%              28.6%
Ties                           16.6%              16.6%
Somers’ D                      0.246              0.261
Goodman and Kruskal’s Gamma    0.295              0.314
Kendall’s Tau-a                0.116              0.123
Concordance Index c            0.623              0.631
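The statistics in the table come from the course output; a sketch of how analogous fit statistics and a formal comparison could be obtained in Python (same assumed data as before) is:

```python
import pandas as pd
import statsmodels.formula.api as smf
from scipy import stats

df = pd.read_csv("logit_multi.csv")                    # hypothetical file name
df["purchase01"] = (df["purchase"] == "Yes").astype(int)

main  = smf.logit("purchase01 ~ C(gender) + C(inclevel)", data=df).fit(disp=0)
inter = smf.logit("purchase01 ~ C(gender) * C(inclevel)", data=df).fit(disp=0)

for name, m in [("Gender + Income", main), ("Gender + Income + interaction", inter)]:
    print(name, "AIC:", round(m.aic, 3), "-2 Log L:", round(-2 * m.llf, 3))

# Likelihood-ratio test for the interaction terms
lr = 2 * (inter.llf - main.llf)
print("LR p-value:", stats.chi2.sf(lr, inter.df_model - main.df_model))
```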
Graph Plot