Decision Tree Learning – CART & CHAID

U DINESH KUMAR
Classification/Decision Trees
• Introduction to Decision Trees.

• Classification/Decision Tree approach for classification.

• Classification and Regression Tree (CART)

• Chi-Square Automatic Interaction Detection (CHAID)


Classification/Decision Trees
• Decision trees (aka decision tree learning) are a collection of predictive analytics techniques that use a tree-like graph to predict the value of the outcome variable (or target variable) based on the values of predictors or features.

• When the response (outcome) variable takes discrete values, the trees are called classification trees.
Classification Trees – Generic Steps
• Start with a root node containing all the data (call this subset S0).

• Split the root node into two or more branches (connected by edges) using a split criterion, resulting in internal nodes.

• Each internal node is divided further until no further splitting is possible. The terminal nodes (aka leaf nodes) have no outgoing edges.

• Terminal nodes are used for generating predictions of the outcome variable and for deriving business rules.

• Stopping criteria (tree pruning), a process of restricting the size of the tree, are used to avoid large trees and overfitting the data. These generic steps are sketched in code below.
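To make the generic steps concrete, here is a minimal Python sketch of recursive tree growing with the Gini index. This is an illustration only, not the exact CART or CHAID procedure; the helper names (gini, best_split, grow) and the default stopping values are assumptions:

from collections import Counter

def gini(labels):
    """Gini impurity of a collection of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(rows, labels):
    """Search every (feature, threshold) pair for the split that most
    reduces the weighted impurity of the two child nodes."""
    best = None
    parent = gini(labels)
    for f in range(len(rows[0])):
        for t in sorted(set(r[f] for r in rows)):
            left = [y for r, y in zip(rows, labels) if r[f] < t]
            right = [y for r, y in zip(rows, labels) if r[f] >= t]
            if not left or not right:
                continue
            p_left = len(left) / len(labels)
            gain = parent - p_left * gini(left) - (1 - p_left) * gini(right)
            if best is None or gain > best[0]:
                best = (gain, f, t)
    return best

def grow(rows, labels, depth=0, max_depth=2, min_node=2):
    """Generic steps: make a terminal node, or split and recurse."""
    split = best_split(rows, labels)
    if depth >= max_depth or len(labels) < min_node or split is None or split[0] <= 0:
        return Counter(labels).most_common(1)[0][0]  # leaf: majority class
    _, f, t = split
    left = [(r, y) for r, y in zip(rows, labels) if r[f] < t]
    right = [(r, y) for r, y in zip(rows, labels) if r[f] >= t]
    return {"feature": f, "threshold": t,
            "left": grow(*zip(*left), depth + 1, max_depth, min_node),
            "right": grow(*zip(*right), depth + 1, max_depth, min_node)}

For example, grow([(1,), (2,), (3,), (4,)], [0, 0, 1, 1]) returns {'feature': 0, 'threshold': 3, 'left': 0, 'right': 1}: one internal node and two terminal nodes.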
Classification and Regression Trees (CART)
• Splits are chosen based on either the Gini index or the entropy criterion (there are a few more).

• CART builds a binary tree, whereas CHAID can split a node into more than two branches.
Gini Index (Classification Impurity)
• The Gini index is used to measure the impurity at a node (in a classification problem) and is given by:

$$\text{Gini}(k) = \sum_{j=1}^{J} P(j \mid k)\,\bigl(1 - P(j \mid k)\bigr)$$

where $P(j \mid k)$ is the proportion of category $j$ in node $k$. A smaller Gini index implies less impurity.


Entropy (Impurity Measure)
• Entropy is another impurity measure that is frequently used. Entropy at node $k$ is given by:

$$\text{Entropy}(k) = -\sum_{j=1}^{c} P(j \mid k)\,\log_2 P(j \mid k)$$
Gini Index Calculation
Number of classes: 2 (say 0 and 1). Consider a node $k$ with 10 ones and 90 zeros.

$$\text{Gini}(k) = \sum_{j=1}^{2} P(j \mid k)\,(1 - P(j \mid k)) = 2\,P(1 \mid k)\,(1 - P(1 \mid k)) = 2 \times 0.1 \times 0.9 = 0.18$$

A smaller number implies less impurity.
Entropy Calculation
Number of classes: 2 (say 0 and 1). Consider a node $k$ with 10 ones and 90 zeros.

$$\text{Entropy}(k) = -\sum_{j=1}^{2} P(j \mid k)\,\log_2 P(j \mid k) = -0.1 \log_2(0.1) - 0.9 \log_2(0.9) = 0.4689$$

Note that this value is higher than the Gini index for the same node.
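Both worked values can be checked with a few lines of Python (a quick verification using only the standard library):

import math

p = [0.1, 0.9]  # class proportions at node k (10 ones, 90 zeros)

gini = sum(pj * (1 - pj) for pj in p)
entropy = -sum(pj * math.log2(pj) for pj in p)

print(round(gini, 4))     # 0.18
print(round(entropy, 4))  # 0.469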
Classification Tree Logic
A node $t$ is split into a left child $t_L$ and a right child $t_R$. The split is chosen to maximize the reduction in impurity:

$$\max\;\bigl[\, i(t) - P_L \, i(t_L) - P_R \, i(t_R) \,\bigr]$$

where
$i(\cdot)$ = impurity at node $(\cdot)$
$P_L$ = proportion of observations in the left node
$P_R$ = proportion of observations in the right node
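As a small numeric sketch of this criterion (the counts below are made up for illustration): splitting a node with 50/50 class counts into children with 40/10 and 10/40 class counts gives the following reduction in Gini impurity:

def gini2(p):
    """Gini impurity of a binary node with P(class 1) = p."""
    return 2 * p * (1 - p)

# Hypothetical split: the parent has 50 ones and 50 zeros; the left child
# gets 40 ones / 10 zeros, the right child gets 10 ones / 40 zeros.
i_parent = gini2(50 / 100)                        # 0.5
i_left, i_right = gini2(40 / 50), gini2(10 / 50)  # 0.32 each
p_left, p_right = 50 / 100, 50 / 100

reduction = i_parent - p_left * i_left - p_right * i_right
print(round(reduction, 2))  # 0.18 -- the split with the largest reduction wins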
CART Example: Marketing Head's Conundrum
Business Context
• Sales conversion may take more than a year in a B2B sales environment.

• The company invests a lot of money to convert a lead.

• The success rate (conversion rate) can be very low.
Data
1. Product
2. Industry
3. Region
4. Segment (combination of product, industry and region)
5. Profit of the customer
6. Sales value
7. Profit percentage (profit expected from the sale as a percentage of sales value)
8. Joint bid proportion (proportion of profit in case of a joint bid)
9. Sales outcome (outcome variable)
Box Plot
[Box plots: distributions of profit of the customer and sales value for lost and won cases]

Logistic Regression Output
[Logistic regression output tables for the sales conversion model]
CART Model
Hyperparameter values:

Hyperparameter   Value
Max Depth        2
Min Split        20
Min Bucket       7
Complexity       0.01

Terminal nodes are 2, 6 and 7.
Rule number 7 [Sales.Outcome=1, cover=699 (34%), prob=0.91]:
  Profit.of.customer....Mn < 1.015
  Profit.. < 54.5

Rule number 6 [Sales.Outcome=0, cover=350 (17%), prob=0.47]:
  Profit.of.customer....Mn < 1.015
  Profit.. >= 54.5

Rule number 2 [Sales.Outcome=0, cover=1026 (49%), prob=0.22]:
  Profit.of.customer....Mn >= 1.015
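The hyperparameter names above (Min Split, Min Bucket, Complexity) suggest the model was fitted with R's rpart package. A roughly equivalent model can be sketched in Python with scikit-learn; the file and column names below are assumptions for illustration, and scikit-learn's ccp_alpha is only loosely analogous to rpart's complexity parameter cp:

import pandas as pd
from sklearn.tree import DecisionTreeClassifier, export_text

# Hypothetical file and column names -- adjust to the actual dataset.
df = pd.read_csv("b2b_sales_leads.csv")
X = df[["profit_of_customer_mn", "profit_pct"]]
y = df["sales_outcome"]  # 1 = won, 0 = lost

# Mirrors the slide's settings: Max Depth 2, Min Split 20, Min Bucket 7.
cart = DecisionTreeClassifier(
    criterion="gini",
    max_depth=2,
    min_samples_split=20,
    min_samples_leaf=7,
    ccp_alpha=0.01,  # only roughly analogous to rpart's cp = 0.01
)
cart.fit(X, y)

# Print the fitted tree as if/else rules, similar to the rule listing above.
print(export_text(cart, feature_names=list(X.columns)))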
Error Matrix (in Validation Data)

                 Predicted
Actual        0        1      Error
0           189       35      15.6%
1            18      202       8.2%

Overall error = 11.9%. AUC is 0.91 for CART.
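The error rates in this matrix follow directly from the counts; a quick check in Python:

# Confusion matrix from the validation data: rows = actual, cols = predicted.
cm = [[189, 35],   # actual 0: 35 of 224 misclassified
      [18, 202]]   # actual 1: 18 of 220 misclassified

class0_error = cm[0][1] / sum(cm[0])                 # 35 / 224 = 0.15625
class1_error = cm[1][0] / sum(cm[1])                 # 18 / 220 = 0.0818...
overall = (cm[0][1] + cm[1][0]) / sum(map(sum, cm))  # 53 / 444 = 0.1194...

print(f"{class0_error:.1%}, {class1_error:.1%}, {overall:.1%}")  # 15.6%, 8.2%, 11.9%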
CHAID
Chi-Square Automatic Interaction Detection

Introduction to CHAID
• CHAID is one of the popular decision tree techniques used for solving classification problems.

• Early versions of CHAID used the chi-square test of independence for splitting. CHAID was first presented in the article "An Exploratory Technique for Investigating Large Quantities of Categorical Data" by G. V. Kass in Applied Statistics (1980).
CHAID
• CHAID partitions the data into mutually exclusive, exhaustive subsets that best describe the dependent categorical variable.

• CHAID is an iterative procedure that examines the predictors (or classification variables) and uses them in the order of their statistical significance.
CHAID Splitting Rule
• The splitting criterion in CHAID depends on the type of the dependent variable (or target variable):
  – For a continuous dependent variable, the F test is used.
  – For a categorical dependent variable, the chi-square test of independence is used.
CHAID Procedure
• Step 1: Examine each feature for its statistical significance with the outcome variable, using the F test (for a continuous dependent variable) or the chi-square test (for a categorical dependent variable).

• Step 2: Determine the most significant of the features (the feature with the smallest p-value after Bonferroni correction).

• Step 3: Divide the data by the levels of the most significant feature. Each of these groups is then examined further individually.

• Step 4: For each sub-group, determine the most significant feature from the remaining features and divide the data again.

• Step 5: Repeat Step 4 until a stopping criterion is reached. (A sketch of the split-selection step follows below.)
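A minimal sketch of Steps 1 and 2 for a categorical target, using scipy's chi-square test with a Bonferroni correction. The DataFrame and column names are assumptions, and real CHAID implementations also merge predictor categories before testing (see the merging slide later):

import pandas as pd
from scipy.stats import chi2_contingency

def most_significant_feature(df, features, target):
    """Return (feature, Bonferroni-adjusted p-value) of the best split."""
    results = {}
    for f in features:
        table = pd.crosstab(df[f], df[target])     # contingency table
        _, p, _, _ = chi2_contingency(table)       # chi-square test of independence
        results[f] = min(1.0, p * len(features))   # Bonferroni adjustment
    best = min(results, key=results.get)
    return best, results[best]

# Hypothetical usage (Steps 3-5 then split by df[best]'s levels and recurse):
# best, p = most_significant_feature(df, ["Product", "Industry", "Region"],
#                                    "Sales.Outcome")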


CHAID Example
• Marketing Head’s Conundrum

• Target Variable – Sales Conversion


Chi-Square Test of Independence
• The chi-square test of independence starts with the assumption that there is no relationship between the two variables.

• For example, we assume that there is no relationship between the profit of the customer and sales conversion.
Contingency Table

              Sales Conversion
Profit      0 (Lost)   1 (Won)   Total
<= 1 MN        325       1129     1454
> 1 MN        1190        321     1511
Total         1515       1450     2965

H0: Profit of the customer and sales conversion are independent.
HA: Profit of the customer and sales conversion are dependent.


Chi-Square Statistic

$$\chi^2 = \sum_{i=1}^{n} \sum_{j=1}^{m} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}, \qquad E_{ij} = \frac{(\text{$i$th row sum}) \times (\text{$j$th column sum})}{\text{total sum}}$$

where $O_{ij}$ is the observed frequency and $E_{ij}$ is the expected frequency.


Observed Frequency

Profit      0 (Lost)   1 (Won)   Total
<= 1 MN        325       1129     1454
> 1 MN        1190        321     1511
Total         1515       1450     2965

Expected Frequency ($E_{ij}$ = row sum × column sum / total)

Profit      0 (Lost)   1 (Won)   Total
<= 1 MN      742.94     711.06    1454
> 1 MN       772.06     738.94    1511
Total          1515       1450    2965

$$\chi^2 = \sum_{i=1}^{n} \sum_{j=1}^{m} \frac{(O_{ij} - E_{ij})^2}{E_{ij}} = 863.92$$

P-value = $6.8 \times 10^{-190}$; thus we reject the null hypothesis that profit of the customer and sales conversion are independent.
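As a sketch, the same test can be run with scipy on the observed counts above. Note that scipy applies Yates' continuity correction to 2×2 tables by default, so pass correction=False for the plain Pearson statistic:

from scipy.stats import chi2_contingency

# Observed frequencies from the contingency table above.
observed = [[325, 1129],   # profit <= 1 MN: lost, won
            [1190, 321]]   # profit  > 1 MN: lost, won

chi2, p, dof, expected = chi2_contingency(observed, correction=False)
print(chi2, p, dof)   # a very large statistic and a tiny p-value: reject H0
print(expected)       # expected frequencies E_ij = row sum * column sum / total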
Business Rules
• Node 1: If the profit of the customer is less than 0.35 mn, then predicted Yi = 1, with P(Yi = 1) = 0.9798.

• Node 3: If the profit of the customer is between 1.01 and 1.43 mn, then predicted Yi = 0, with P(Yi = 1) = 0.2981.

• Node 8: If the profit of the customer is between 1.01 and 1.43 mn and the profit % is more than 50, then predicted Yi = 1, with P(Yi = 1) = 0.9143.
Merging – CHAID
• CHAID uses both splitting and merging steps.

• In merging, the least significantly different groups (categories of a predictor) are merged to form one class, as sketched below.
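A minimal sketch of one merging pass, assuming the predictor is stored as plain string categories; this is an illustration, not the full Kass algorithm. Each pair of categories is tested against the target, and the pair that is least significantly different is merged, provided its p-value exceeds a merge threshold:

from itertools import combinations
import pandas as pd
from scipy.stats import chi2_contingency

def merge_once(df, feature, target, alpha_merge=0.05):
    """Merge the least significantly different pair of categories, if any."""
    worst_pair, worst_p = None, alpha_merge
    for a, b in combinations(df[feature].unique(), 2):
        sub = df[df[feature].isin([a, b])]
        table = pd.crosstab(sub[feature], sub[target])  # 2 x c table
        _, p, _, _ = chi2_contingency(table)
        if p > worst_p:                     # least different pair so far
            worst_pair, worst_p = (a, b), p
    if worst_pair is not None:
        a, b = worst_pair
        df[feature] = df[feature].replace(b, a)  # relabel b's rows as a
    return worst_pair  # None when no pair is similar enough to merge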
CHAID Stopping Criteria
• The maximum tree depth is reached (which is pre-defined).

• The minimum number of cases required to be a parent node is reached (again, pre-defined).

• The minimum number of cases required to be a child node is reached.