BUS119 Data-Driven Marketing
Lecture 9: Understanding Consumers Using Segmentation Methods in Data-Driven Marketing
Data-Driven Marketing
Lecture 9 Understanding Consumers Using
11/10/2023 Segmentation Methods 1
Class Outline
In this class, we will learn:
• What is Segmentation?
• Several common segmentation methods:
• CART
• CHAID
• Neural Network
• Applications of Segmentation in Data-Driven Marketing
Review: what is segmentation?
Segment: A group of consumers who are similar in
terms of how they respond to your marketing mix.
Segmentation: The act of dividing the market into
segments.
Segmentation Variables – “A Priori”
1. Demographic: e.g., gender, age, income, education, family size
2. Geographic: e.g., zip code, census tract, block group
3. Psychographic: e.g., lifestyle, personality
Segmentation Variables – “Post Hoc”
1. Attributes: e.g., Book Club’s Selection of the Month
2. Benefit: e.g., satisfaction
3. Behavioral: e.g., RFM (recency, frequency, and monetary value)
The following two types of segmentation methods are often used in Data-Driven Marketing:
• Classification Trees (an umbrella term for all tree-generating techniques), including:
  • CHAID: Chi-Square Automatic Interaction Detector
  • CART: Classification and Regression Trees
• Neural Networks
The goals of Netflix’s database marketing
efforts are…
Predicting movie preferences
Purchase price for DVD rentals
“Throttling” heavy renters
Here is the question:
How can you become more intelligent about your potential consumers’ purchases? (Assume that each consumer observation contains Income, Gender and Age.)
Segmentation Scheme 1
Income      Response rate
< 30K       0.087
30K-60K     0.087
> 60K       0.087
Segmentation Scheme 2
Gender      Response rate
Female      0.086
Male        0.088
Segmentation Scheme 3
Age         Response rate
< 25        0.119
25-35       0.070
> 35        0.039
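Comparing the three schemes side by side shows why Age is the most useful single variable: its segments differ in response rate, while the Income segments do not differ at all. The sketch below uses the response rates from the three tables above; the "spread" measure (best minus worst segment rate) is our own simple illustration, not the lecture's method, which instead relies on the chi-square test introduced next.

```python
# Response rates copied from Segmentation Schemes 1-3 above.
schemes = {
    "Income": {"< 30K": 0.087, "30K-60K": 0.087, "> 60K": 0.087},
    "Gender": {"Female": 0.086, "Male": 0.088},
    "Age": {"< 25": 0.119, "25-35": 0.070, "> 35": 0.039},
}


def rate_spread(rates):
    """Gap between the best and worst segment response rate (illustrative heuristic)."""
    return max(rates.values()) - min(rates.values())


for name, rates in schemes.items():
    print(f"{name:<7} spread = {rate_spread(rates):.3f}")
# Income  spread = 0.000
# Gender  spread = 0.002
# Age     spread = 0.080

best = max(schemes, key=lambda name: rate_spread(schemes[name]))
print("Most discriminating variable:", best)  # Age
```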
CHAID (Chi-square Automatic Interaction Detection) Analysis: segmentation tree
[Figure: CHAID segmentation tree; P-values are not shown in this tree]
How do we come up with the CHAID tree?
As a review, let’s look at the chi-square test
of independence again:
                      Response
Gender      Will Buy    Will Not Buy
Male        255         1554
Female      127         1274
Chi-Square Test of Association
Ho: Gender and Response are independent.
Chi-Square statistic:
$$\chi^2 = \sum_i \frac{(\text{Observed}_i - \text{Expected}_i)^2}{\text{Expected}_i}$$
If chi-square statistic is “large” (i.e. P-value<0.05)
Observed and expected frequencies are different from each other
We reject Ho
If chi-square statistic is “small” (i.e. P-value>0.05)
Observed and expected frequencies are similar to each other
We do not reject Ho
To test independence, we need to get the expected
frequency conditional on independence for each cell.
How to do it? Here is Step 1.
                      Response
Gender      Will Buy            Will Not Buy        Row share
Male        255                 1554                1809/3210 = 0.56
Female      127                 1274                1401/3210 = 0.44
Col share   382/3210 = 0.12     2828/3210 = 0.88    N = 3210
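Step 1 is plain arithmetic on the table margins. The Python sketch below reproduces it from the observed counts above; the variable names are our own.

```python
# Observed counts from the Gender x Response table (N = 3210).
observed = {
    "Male": {"Will Buy": 255, "Will Not Buy": 1554},
    "Female": {"Will Buy": 127, "Will Not Buy": 1274},
}

# Grand total N and the marginal (row and column) proportions.
n = sum(sum(row.values()) for row in observed.values())
row_share = {g: sum(row.values()) / n for g, row in observed.items()}
col_share = {
    r: sum(observed[g][r] for g in observed) / n
    for r in ("Will Buy", "Will Not Buy")
}

print(n)  # 3210
print({g: round(p, 2) for g, p in row_share.items()})  # {'Male': 0.56, 'Female': 0.44}
print({r: round(p, 2) for r, p in col_share.items()})  # {'Will Buy': 0.12, 'Will Not Buy': 0.88}
```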
To test independence, we need to get the expected
frequency conditional on independence for each
cell. How to do it? Here is Step 2.
                      Response
Gender      Will Buy                  Will Not Buy               Row share
Male        0.56*0.12*3210 = 215.7    0.56*0.88*3210 = 1581.9    0.56
Female      0.44*0.12*3210 = 169.5    0.44*0.88*3210 = 1242.9    0.44
Col share   0.12                      0.88                       N = 3210
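Multiplying the two proportions by N is the same as computing (row total × column total) / N. The sketch below does it both ways: the exact version, and the slide's version, which first rounds the proportions to two decimals. The small gap between the two (e.g., 215.3 vs. 215.7) comes only from that rounding; the exact expected counts in each row add back up to the row total.

```python
observed = [[255, 1554],   # Male:   Will Buy, Will Not Buy
            [127, 1274]]   # Female: Will Buy, Will Not Buy

n = sum(map(sum, observed))
row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]

# Exact: expected count = row total * column total / N
expected = [[r * c / n for c in col_totals] for r in row_totals]

# Slide version: multiply proportions rounded to two decimals, then by N
rounded = [[round(r / n, 2) * round(c / n, 2) * n for c in col_totals]
           for r in row_totals]

for exact_row, slide_row in zip(expected, rounded):
    print([round(x, 1) for x in exact_row], [round(x, 1) for x in slide_row])
# [215.3, 1593.7] [215.7, 1581.9]
# [166.7, 1234.3] [169.5, 1242.9]
```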
Lastly, we compare the observed and expected frequencies; this comparison gives us the test of independence.
Actual counts, with expected counts in parentheses:
                      Response
Gender      Will Buy        Will Not Buy
Male        255 (215.7)     1554 (1581.9)
Female      127 (169.5)     1274 (1242.9)
N = 3210
Chi-Square Test of Independence
Chi-Square statistic:
$$\chi^2 = \frac{(255-215.7)^2}{215.7} + \frac{(127-169.5)^2}{169.5} + \frac{(1554-1581.9)^2}{1581.9} + \frac{(1274-1242.9)^2}{1242.9} = 19.09$$
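How does one define “large”?
Compare the statistic with a critical value of the chi-square distribution
Degrees of freedom (df) = (# rows – 1) × (# columns – 1); here df = (2 – 1) × (2 – 1) = 1
At the 0.05 level with df = 1, the critical value is 3.84; since 19.09 > 3.84, the P-value is smaller than 0.05 and we reject Ho
And this is easily done in SPSS!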
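For readers without SPSS, here is a minimal Python sketch of the whole test, assuming no continuity correction (matching the formula above). The helper `chi2_sf` is our own closed-form upper-tail probability for integer degrees of freedom, written out so the example needs only the standard library. With exact expected counts the statistic comes out at about 19.06 rather than the slide's 19.09, because the slide rounds the proportions first.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) of a chi-square variable with integer df."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{i < df/2} (x/2)^i / i!
        term = total = 1.0
        for i in range(1, df // 2):
            term *= (x / 2) / i
            total += term
        return math.exp(-x / 2) * total
    # Odd df: normal tail plus a finite correction series
    s = math.sqrt(x)
    term, total = s, 0.0
    for r in range(1, (df - 1) // 2 + 1):
        if r > 1:
            term *= x / (2 * r - 1)
        total += term
    return math.erfc(s / math.sqrt(2)) + math.sqrt(2 / math.pi) * math.exp(-x / 2) * total


def chi_square_test(observed):
    """Pearson chi-square test of independence (no continuity correction)."""
    n = sum(map(sum, observed))
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    stat = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / n  # expected count under independence
            stat += (o - e) ** 2 / e
    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return stat, df, chi2_sf(stat, df)


stat, df, p = chi_square_test([[255, 1554], [127, 1274]])
print(f"chi-square = {stat:.2f}, df = {df}")  # chi-square = 19.06, df = 1
print("reject Ho" if p < 0.05 else "do not reject Ho")  # reject Ho
```

If you use SciPy instead, `scipy.stats.chi2_contingency` applies Yates' continuity correction to 2x2 tables by default, so pass `correction=False` to match the formula in this lecture.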
Why CHAID split AGE1 (the under-25 group) by GENDER rather than INCOME
Run a chi-square test between Gender and Response; get the p-value
Run a chi-square test between Income and Response; get the p-value
Whichever predictor has the lower p-value gives the better segmentation, so CHAID splits on it.
……
The same comparison is repeated at the second and final steps of segmentation.
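The split-selection rule above can be sketched in Python. The Gender counts are the ones from the chi-square example; they reproduce the under-25 rates in the CHAID Segments table (255/1809 ≈ 0.141, 127/1401 ≈ 0.091, 382/3210 ≈ 0.119), so we treat them as the AGE1 node. The Income counts are HYPOTHETICAL, made up only so the comparison can run; the lecture does not report them. Real CHAID implementations also merge similar categories and adjust p-values (e.g., Bonferroni), which this sketch omits.

```python
import math


def chi2_sf(x, df):
    """Upper-tail chi-square probability for integer df (closed form)."""
    if df % 2 == 0:
        term = total = 1.0
        for i in range(1, df // 2):
            term *= (x / 2) / i
            total += term
        return math.exp(-x / 2) * total
    s, term, total = math.sqrt(x), math.sqrt(x), 0.0
    for r in range(1, (df - 1) // 2 + 1):
        if r > 1:
            term *= x / (2 * r - 1)
        total += term
    return math.erfc(s / math.sqrt(2)) + math.sqrt(2 / math.pi) * math.exp(-x / 2) * total


def p_value(table):
    """Chi-square test of independence for a predictor x Response table."""
    n = sum(map(sum, table))
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    stat = sum((table[i][j] - rows[i] * cols[j] / n) ** 2 / (rows[i] * cols[j] / n)
               for i in range(len(rows)) for j in range(len(cols)))
    return chi2_sf(stat, (len(rows) - 1) * (len(cols) - 1))


# Candidate splits of the AGE1 node; columns = (Will Buy, Will Not Buy).
candidates = {
    "Gender": [[255, 1554], [127, 1274]],              # Male, Female (from the slides)
    "Income": [[130, 1020], [140, 1050], [112, 758]],  # HYPOTHETICAL: <30K, 30K-60K, >60K
}
p_values = {name: p_value(t) for name, t in candidates.items()}
best_split = min(p_values, key=p_values.get)

# Only split if the best association is statistically significant.
if p_values[best_split] < 0.05:
    print("Split AGE1 by", best_split)  # Split AGE1 by Gender
else:
    print("No significant split; AGE1 stays a final segment")
```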
CHAID (Chi-square Automatic Interaction Detection) Analysis: segmentation tree
CHAID Segments
Segment                   Response Rate
1. < 25, Male             0.141
2. < 25, Female           0.091
3. 25-35, < 30K           0.081
4. 25-35, 30K-60K         0.069
5. 25-35, > 60K           0.059
6. > 35                   0.039
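Once the tree is built, each consumer falls into exactly one segment, and the segment's response rate becomes that consumer's predicted response. The Python sketch below encodes the six segments from the table above. How the boundaries are handled (e.g., whether age 35 or income 60K belongs to the middle band) is our ASSUMPTION, since the table does not say; the customers are hypothetical.

```python
def chaid_segment(age, gender, income_k):
    """Return (segment number, response rate) from the CHAID Segments table.

    age in years, gender "Male"/"Female", income in $K.
    Boundary handling (<= 35, <= 60) is an assumption, not from the slides.
    """
    if age < 25:
        return (1, 0.141) if gender == "Male" else (2, 0.091)
    if age <= 35:
        if income_k < 30:
            return (3, 0.081)
        if income_k <= 60:
            return (4, 0.069)
        return (5, 0.059)
    return (6, 0.039)


# Hypothetical customers: (id, age, gender, income in $K)
customers = [("A", 22, "Male", 45), ("B", 30, "Female", 75), ("C", 40, "Female", 20)]

# Rank customers by predicted response rate, e.g. to decide whom to mail first.
ranked = sorted(customers, key=lambda c: chaid_segment(*c[1:])[1], reverse=True)
print([c[0] for c in ranked])  # ['A', 'B', 'C']
```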
Summary of CHAID
CHAID analysis has the following characteristics:
The predictor variables must be categorical;
Splits can have two or more branches;
The splitting is based on a chi-square test;
No variable is included unless there is a statistically significant association between the dependent variable and the predictor;
There is NO pruning of the final tree.
Disadvantages of CHAID:
No pruning, so the tree can overfit;
Once a variable is used, it cannot be used again.
Advantages of CHAID:
Allows splits with more than two branches;
Available as part of the SPSS family (an add-on product).
CART (Classification and Regression Trees)
Very similar to CHAID, but the differences are:
The predictor variables can be both categorical and interval;
The splitting must be binary;
The splitting is based on the Gini measure of “impurity” rather than a chi-square test;
The final tree can be pruned backwards.
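To make the Gini measure concrete, the sketch below computes the Gini impurity, 1 − Σ p_k², of a node and the size-weighted impurity after a binary split. It reuses the Gender x Response counts from the chi-square example purely as an illustration; CART would evaluate every candidate binary split this way and keep the one with the largest impurity decrease. The function names are our own.

```python
def gini(counts):
    """Gini impurity 1 - sum(p_k^2) for the class counts at one node."""
    n = sum(counts)
    return 1.0 - sum((c / n) ** 2 for c in counts)


def split_gini(children):
    """Size-weighted Gini impurity of the child nodes produced by a split."""
    n = sum(sum(child) for child in children)
    return sum(sum(child) / n * gini(child) for child in children)


parent = [382, 2828]                   # Will Buy, Will Not Buy (N = 3210)
children = [[255, 1554], [127, 1274]]  # Male node, Female node

decrease = gini(parent) - split_gini(children)
print(f"parent Gini = {gini(parent):.4f}, after split = {split_gini(children):.4f}")
print(f"impurity decrease = {decrease:.5f}")
```

A pure node (all buyers or all non-buyers) has Gini 0, and a 50/50 node has the maximum of 0.5 for two classes, which is why a lower weighted Gini after the split means the children are more homogeneous.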
CART’s advantage is its ability to handle complex interactions and to
uncover these interactions through data analysis; CART is also robust
to outliers.
CART’s disadvantage: since it is based on stepwise sample splits and
not precise values, it is potentially unstable.
Neural Networks: handle complex relationships by assuming NO specific statistical relationship between variables
Used for building predictive models in situations where the analyst has little knowledge about the form of the relationship between the independent and dependent variables.
Previous lectures covered tools (e.g., linear regression) that can be used for prediction. However, they assume a very specific mathematical form (often linear) for the pattern relating the dependent and independent variables.
When patterns are too complex to be captured by these forms, neural networks provide a viable alternative.
A Common Neural Network
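A common neural network passes the inputs through a hidden layer of units, each computing a weighted sum followed by a nonlinear "squashing" function, and then combines the hidden units into an output. The Python sketch below shows one forward pass for a single consumer with inputs Income, Gender and Age. The input scaling, the two hidden units, the sigmoid activation, and all weights are HYPOTHETICAL choices for illustration; in practice the weights come from training.

```python
import math


def sigmoid(z):
    """Squash any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def forward(x, w_hidden, b_hidden, w_out, b_out):
    """One hidden layer: hidden_j = sigmoid(w_j . x + b_j); output = sigmoid(v . hidden + c)."""
    hidden = [sigmoid(sum(w * xi for w, xi in zip(ws, x)) + b)
              for ws, b in zip(w_hidden, b_hidden)]
    return sigmoid(sum(w * h for w, h in zip(w_out, hidden)) + b_out)


# Inputs: income in $100K, gender (1 = male), age in decades (hypothetical scaling).
x = [0.45, 1.0, 2.2]
w_hidden = [[0.5, -0.3, -0.8],   # hidden unit 1 (made-up weights)
            [-0.2, 0.6, 0.1]]    # hidden unit 2 (made-up weights)
b_hidden = [0.1, -0.1]
w_out, b_out = [-1.2, 0.9], -1.0

p = forward(x, w_hidden, b_hidden, w_out, b_out)
print(f"predicted purchase probability = {p:.3f}")
```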
Summary of Neural Networks: pros and cons
Neural networks are “trained” because there is no pre-specified mathematical model relating input and output. This places an extremely heavy burden on the data in the training sample.
Disadvantages: not intuitive, results are often difficult to interpret (black box), and they require very specific software.
Overall evaluation of tree and neural net segmentation methods
Useful in dealing with large amounts of data with many variables;
Neural nets (NN) provide an alternative to regression or other segmentation analysis methods.
However, because an NN depends so heavily on the training sample, its performance on the test dataset is questionable. Selecting the right variables (e.g., with stepwise regression) could improve the fit.