Tree-Based Methods: CHAID: Categorical Response Variable, Categorical Explanatory Variables, Create a Decision Tree

- The document describes the CHAID algorithm for building decision trees for categorical response variables based on categorical explanatory variables.
- CHAID creates a tree by recursively splitting nodes based on the categorical explanatory variable that results in the most significant difference in distributions of response variable categories between offspring nodes.
- The document also gives an example application of the CHAID algorithm, using the TREEDISC macro in SAS to analyze traffic violation data from Wisconsin drivers.

Uploaded by

Fidia Dta

Tree-based methods: CHAID

AID
  Morgan and Sonquist (1963), Journal of Amer. Stat. Association, 58, 415-434
  Sonquist and Morgan (1964), Monograph 35, ISR, U. of Michigan
THAID
  Messenger and Mandell (1972), Journal of Amer. Stat. Association, 67, 768-772
  Morgan and Messenger (1973), THAID, SRC-ISR, U. of Michigan
CHAID
  Kass, G. V. (1980), Applied Statistics, 29, 119-127
CART
  Breiman, et al. (1984), Classification and Regression Trees, Wadsworth

Categorical response variable
  Credit Rating: "bad", "poor", "good", "very good"

Categorical explanatory variables

Create a decision tree
  Dividing the cases that reach a certain node in the tree.

Algorithm:

(Step 1) Cross tabulate the response variable with each of the explanatory variables.

            Fico < 700   700-750   Fico > 750
  Bad
  Poor
  Good
  V.Good

            NT = 0   NT >= 1
  Bad
  Poor
  Good
  V.Good

(Step 2) When there are more than two columns, find the "best" subtable formed by combining column categories. This is applied to each table with more than 2 columns.

Compute Pearson X^2 tests for independence for each allowable subtable.

  Subtable (Fico < 700, 700-750): rows bad, poor, good, v.good; statistic X1^2
  Subtable (700-750, > 750): rows bad, poor, good, v.good; statistic X2^2

Look for the smallest X^2 value. If it is not significant, combine the column categories.

            < 750   > 750
  bad
  poor
  good
  v.good

Repeat step 2 if the new table has more than two columns.
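Steps 1 and 2 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the TREEDISC implementation: the function names, the toy counts, and the 0.05 threshold are all hypothetical, only adjacent columns are treated as allowable merges (as for an ordinal predictor such as Fico), and the p-value comes from a small closed-form chi-squared tail for integer degrees of freedom.

```python
import math

def crosstab(records, response_levels, predictor_levels):
    """Step 1: count cases per (predictor column, response row)."""
    columns = [[0] * len(response_levels) for _ in predictor_levels]
    for response, predictor in records:
        j = predictor_levels.index(predictor)
        i = response_levels.index(response)
        columns[j][i] += 1
    return columns

def pearson_x2(columns):
    """Pearson X^2 statistic for a table stored as a list of columns."""
    n_rows = len(columns[0])
    col_totals = [sum(col) for col in columns]
    row_totals = [sum(col[i] for col in columns) for i in range(n_rows)]
    n = sum(col_totals)
    x2 = 0.0
    for j, col in enumerate(columns):
        for i, observed in enumerate(col):
            expected = row_totals[i] * col_totals[j] / n
            if expected > 0:
                x2 += (observed - expected) ** 2 / expected
    return x2

def chi2_sf(x, df):
    """Upper tail probability P(X^2_df > x) for integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2
    if df % 2 == 0:
        term, total = 1.0, 1.0
        for k in range(1, df // 2):
            term *= half / k
            total += term
        return math.exp(-half) * total
    total = 0.0
    term = math.sqrt(2 * x / math.pi) * math.exp(-half)
    for k in range((df - 1) // 2):
        if k > 0:
            term *= x / (2 * k + 1)
        total += term
    return math.erfc(math.sqrt(half)) + total

def merge_columns(columns, labels, alpha=0.05):
    """Step 2: merge the least different adjacent pair until it is significant."""
    columns = [list(col) for col in columns]
    labels = list(labels)
    df = len(columns[0]) - 1  # each subtable has exactly two columns
    while len(columns) > 2:
        stats = [pearson_x2(columns[j:j + 2]) for j in range(len(columns) - 1)]
        j = min(range(len(stats)), key=stats.__getitem__)
        if chi2_sf(stats[j], df) <= alpha:
            break  # smallest X^2 is significant: keep the categories apart
        columns[j:j + 2] = [[a + b for a, b in zip(columns[j], columns[j + 1])]]
        labels[j:j + 2] = [labels[j] + "+" + labels[j + 1]]
    return columns, labels

# Hypothetical counts, invented for illustration only.
response_levels = ["bad", "poor", "good", "v.good"]
fico_levels = ["<700", "700-750", ">750"]
toy_counts = {"<700": [30, 40, 20, 10],
              "700-750": [28, 42, 22, 8],
              ">750": [5, 15, 40, 40]}
records = [(resp, band)
           for band, counts in toy_counts.items()
           for resp, k in zip(response_levels, counts)
           for _ in range(k)]

table = crosstab(records, response_levels, fico_levels)
merged, merged_labels = merge_columns(table, fico_levels)
print(merged_labels)
print(merged)
```

With these toy counts the first two Fico bands have nearly identical response distributions, so they are combined and the loop stops once two columns remain.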

(Step 4) You have now completed the "optimal" combining of categories for each explanatory variable.

Find the "most significant" of these "optimally" merged explanatory variables.

Compute a "Bonferroni" adjusted chi-squared test of independence for the reduced table for each explanatory variable.

(Step 5) Use the "most significant" variable in step 4 to split the node with respect to the "merged" categories for that variable.

            C1+C2   C3   C4+C5+C6
  bad
  poor
  good
  v.good

Offspring nodes: C1+C2, C3, C4+C5+C6
  - repeat steps 1-5 for each of the offspring nodes.

Stop if
  - no variable is significant in step 4.
  - the number of cases reaching a node is below a specified limit.
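The Bonferroni adjustment, the choice of splitting variable, and the two stopping rules can be sketched as below. The slides do not give the multiplier, so the one used here is an assumption: it counts the ways c original categories can be combined into r groups (adjacent merges only for an ordinal predictor, any partition for a nominal one), which is how the adjustment in Kass (1980) is usually described. The function names, the p-values, the variable "Region", and the limits `alpha` and `min_cases` are hypothetical.

```python
from math import comb, factorial

def bonferroni_multiplier(c, r, ordinal):
    """Number of ways to reduce c categories to r groups (assumed multiplier)."""
    if ordinal:
        # Only adjacent categories may merge: choose r - 1 cut points.
        return comb(c - 1, r - 1)
    # Any categories may merge: Stirling number of the second kind.
    total = sum((-1) ** i * comb(r, i) * (r - i) ** c for i in range(r + 1))
    return total // factorial(r)

def adjusted_p(p, c, r, ordinal):
    """Bonferroni-adjusted p-value, capped at 1."""
    return min(1.0, p * bonferroni_multiplier(c, r, ordinal))

def choose_split(candidates, n_cases, alpha=0.05, min_cases=50):
    """Return the most significant variable, or None if a stop rule applies."""
    if n_cases < min_cases or not candidates:
        return None  # too few cases reach this node
    best = min(candidates, key=lambda cand: cand["adj_p"])
    return best["name"] if best["adj_p"] <= alpha else None

# Hypothetical unadjusted p-values for two merged predictors.
candidates = [
    {"name": "Fico", "adj_p": adjusted_p(0.0004, c=3, r=2, ordinal=True)},
    {"name": "Region", "adj_p": adjusted_p(0.002, c=6, r=3, ordinal=False)},
]
print(choose_split(candidates, n_cases=400))
print(choose_split(candidates, n_cases=20))
```

In this sketch the nominal predictor pays a much larger penalty (90 possible groupings of six categories into three) than the ordinal one (2 possible groupings of three categories into two), so a small raw p-value does not automatically win.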

TREEDISC macro in SAS
  - modified version of CHAID
  - now part of the data mining package
  - application to the Wisconsin Driver data
      response: traffic violations in 1974
        (1) at least one
        (0) none
      explanatory variables:
        sex
        age
        history of cardiovascular disease
        place of residence
  - missing values are treated as another category

Summary:

CHAID is an algorithm
Must categorize every variable
  - ordinal variables
  - nominal variables
At each node it tries to find
  - best explanatory variable
  - best merger of categories
Try to make the distributions of cases across the response categories as different as possible in the "offspring" nodes.

/* This program uses the TREEDISC
   macro in SAS to apply a modified
   CHAID algorithm to the Wisconsin
   driver data. This code is stored
   in the file chaidwis.sas */

/* First set some graphics options */

/* To print postscript files in UNIX */
goptions cback=white ctext=black
         targetdevice=ps300 rotate=landscape;

/* To print postscript files from Windows */
goptions cback=white ctext=black
         device=WIN target=ps
         rotate=landscape;

DATA SET1;
  INFILE 'c:\courses\st557\sas\drivall.dat';
  INPUT AGE SEX /* the rest of the variable list is not legible in the source */;
  LABEL AGE = AGE GROUP
        D = DRIVER GROUP
        V = VIOLATION STATUS
        R = RESIDENTIAL AREA
        X = COUNT;
run;

proc format;
  value sex 1 = 'Male'
            2 = 'Female';
  value age 1 = '16-36'
            2 = '36-55'
            3 = 'over 55';
  value d   1 = 'Disease'
            2 = 'Control';
  value v   1 = 'Some'
            2 = 'None';
  value r   1 = '> 150000'
            2 = '39-150000'
            3 = '10-39000'
            4 = '< 10000'
            5 = 'rural';
run;

proc print data=set1;
run;

/* Load in the xmacros file */
%inc 'c:\courses\st557\sas\xmacro.sas';

/* Load in the TREEDISC macro */
%inc 'c:\courses\st557\sas\treedisc.sas';

/* Compute a tree for predicting
   violation status (V) from age, sex,
   disease status (D) and residence (R) */
%treedisc(data=set1, depvar=v, freq=x,
          ordinal=age: r:, nominal=d: sex:,
          outtree=trd, options=noformat,
          trace=long);

/* Draw the tree on one page */
%treedisc(intree=trd, draw=graphics);

/* Draw a larger tree on several pages */
goptions cback=white ctext=black
         device=WIN target=ps rotate=portrait;
%treedisc(intree=trd,
          draw=graphics, pos=90 120);

TREEDISC Analysis

Dependent variable (DV): V
DV values: 1 2

Values of AGE: 1 2 3
Values of R: 1 2 3 4 5
Values of D: 1 2
Values of SEX: 1 2

Splits Considered for Node

Predictor  Type     Chi-Square  Adjusted p
AGE        Ordinal     57.39      0.0001
SEX        Nominal     36.80      0.0001
D          Nominal      4.40      0.0359
R          Ordinal      2.53      0.4458

Best split: AGE Ordinal with p = 0.0000

New node: 2    AGE = 1      DV count: 133 656
New node: 3    AGE = 2 3    DV count: 147 1864

Splits Considered for Node

Predictor  Type     Chi-Square  Adjusted p
SEX        Nominal     41.59      0.0001
D          Nominal      0.01      0.9193
R          Ordinal      0.15      0.9975

Best split: SEX Nominal with p = 0.0000

New node: 4    SEX = 2    DV count: 31 354
New node: 5    SEX = 1    DV count: 102 302

Splits Considered for Node

Predictor  Type     Chi-Square  Adjusted p
R          Ordinal      1.41      0.7031
D          Nominal      0.06      0.8101

Best split: R Ordinal with p = 0.7031
*** Reject split
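As a check on the trace, the chi-square values reported for the two accepted splits can be recomputed from the DV counts of the new nodes: 133 and 656 against 147 and 1864 for the AGE split, and 102 and 302 against 31 and 354 for the SEX split (the SEX = 2 counts are taken from the tree listing). The sketch below assumes the reported statistic is the ordinary Pearson X^2 of the 2 x 2 table; the function name is hypothetical.

```python
def pearson_x2(rows):
    """Pearson X^2 statistic for a table given as a list of rows."""
    row_totals = [sum(row) for row in rows]
    col_totals = [sum(col) for col in zip(*rows)]
    n = sum(row_totals)
    x2 = 0.0
    for i, row in enumerate(rows):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            x2 += (observed - expected) ** 2 / expected
    return x2

# Rows are offspring nodes, columns are the DV counts for V = 1 and V = 2.
age_split = [[133, 656], [147, 1864]]
sex_split = [[102, 302], [31, 354]]  # within the AGE = 1 node

x2_age = pearson_x2(age_split)
x2_sex = pearson_x2(sex_split)
print(round(x2_age, 2))  # 57.39
print(round(x2_sex, 2))  # 41.59
```

Both values agree with the Chi-Square column of the trace, which suggests the statistic is computed on the merged table (AGE 2 and 3 combined) rather than on the original three age groups.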

TREEDISC Analysis of Dependent Variable (DV) V

V value(s): 1 2
DV counts: 280 2520
Best p-value(s): 0.0001 0.0001

  AGE value(s): 1
  DV counts: 133 656
  Best p-value(s): 0.0001 0.9193

    SEX value(s): 2
    DV counts: 31 354
    Best p-value(s): 0.6064 0.8571

    SEX value(s): 1
    DV counts: 102 302
    Best p-value(s): 0.7334 0.9703

  AGE value(s): 2 3
  DV counts: 147 1864
  Best p-value(s): 0.0001 0.0221

    SEX value(s): 2
    DV counts: 20 563
    Best p-value(s): 0.0856 0.5368

      AGE value(s): 2
      DV counts: 14 284
      Best p-value(s): 0.8083 0.8990

      AGE value(s): 3
      DV counts: 6 279
      Best p-value(s): 0.0264 0.1102

        D value(s): 2
        DV counts: 0 127

D value(s): 1
DV counts: 6

D value(s): 2

152

DV counts: 18

Best p-value(s): 0.0592

217

Best p-value(s): 0.1317

R value(s): 1
DV counts: 3

D value(s): 1

DV counts: 40
R value(s): 2
DV counts: 1

3
111

Best p-value(s): 0.5928

AGE value(s): 3
DV counts: 69

839

Best p-value(s): 0.0363 0.8254

R value(s): 5
DV counts: 2

245

Best p-value(s): 0.3814

19
R value(s): 1
DV counts: 20

SEX value(s): 1
DV counts: 127

139

Best p-value(s): 0.8899

1301

Best p-value(s): 0.0232 0.1940

R value(s): 2
DV counts: 58

DV counts: 49

AGE value(s): 2

700

Best p-value(s): 0.7031 0.8101

462

Best p-value(s): 0.0215 0.7310
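To see how different the response distributions are in the offspring nodes, the DV counts in the listing can be turned into proportions. The sketch below uses only nodes whose two counts are clearly legible in the listing and assumes the first count is V = 1 (some violation) and the second is V = 2 (none), following the order of the V values; the node labels and helper name are hypothetical.

```python
def share_v1(count_v1, count_v2):
    """Proportion of cases in a node with V = 1."""
    return count_v1 / (count_v1 + count_v2)

# (node description, DV count for V = 1, DV count for V = 2)
nodes = [
    ("all drivers", 280, 2520),
    ("AGE = 1", 133, 656),
    ("AGE = 1, SEX = 1", 102, 302),
    ("AGE = 1, SEX = 2", 31, 354),
    ("AGE = 2 3", 147, 1864),
    ("AGE = 2 3, SEX = 2", 20, 563),
]

for label, v1, v2 in nodes:
    print(f"{label:<20} n = {v1 + v2:>4}   share with V = 1: {share_v1(v1, v2):.1%}")
```

The overall share of 10.0% separates into about 16.9% for AGE = 1 and 7.3% for AGE = 2 3, and within AGE = 1 into about 25.2% for SEX = 1 and 8.1% for SEX = 2, which is the kind of contrast between offspring nodes that the splitting rule is looking for.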

