CLASSIFICATION: Business Analytics
DECISION TREES Lecture 7/8
LEARNING OBJECTIVES
• Explain what classification is
• Define a decision tree
• Compare the advantages and disadvantages of decision trees
• Build a decision tree
• Evaluate a decision tree
A. Explain What is Classification
WHAT IS CLASSIFICATION
• Classification is a data mining function that assigns items into
categories or classes.
• The goal of classification is to accurately predict the target class for
each case in the data.
EXAMPLES OF
CLASSIFICATION TASKS
• Identifying loan applicants as low, medium, or high credit risks.
• Predicting tumour cells as benign or malignant
• Classifying credit card transactions as legitimate or fraudulent
CLASSIFICATION RULES
• Classification rules help assign new objects to classes.
E.g., given a new automobile insurance applicant, should he or she be
classified as low risk, medium risk or high risk?
• Classification rules for the above example could use a variety of data, such
as educational level, salary, age, etc.
Person P, P.degree = master and P.income > 75,000 ⇒ P.credit = excellent
Person P, P.degree = bachelors and (P.income > 25,000 and P.income < 75,000) ⇒ P.credit = good
• Rules are not necessarily exact - there may be some misclassifications
• Classification rules can be represented by a decision tree.
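The two example rules above can be written as a small function. This is only a sketch: the function name `classify_credit` and the choice to return `None` when no rule applies are assumptions made for illustration, since the slide does not say what happens in the uncovered cases.

```python
from typing import Optional


def classify_credit(degree: str, income: float) -> Optional[str]:
    """Apply the two example rules; return None when no rule fires."""
    # Rule 1: master's degree and income above 75,000
    if degree == "master" and income > 75_000:
        return "excellent"
    # Rule 2: bachelor's degree and income strictly between 25,000 and 75,000
    if degree == "bachelors" and 25_000 < income < 75_000:
        return "good"
    # Assumption: the slide gives no rule for any other case
    return None


print(classify_credit("master", 80_000))
print(classify_credit("bachelors", 50_000))
print(classify_credit("bachelors", 80_000))
```

Each `if` corresponds to one path from the root of a decision tree to a leaf, which is why a set of rules like this can be drawn as a tree.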
B. Define Decision Tree
WHAT IS A DECISION TREE?
• A decision tree is a hierarchical collection of rules
that describes how to divide a collection of records
into successively smaller groups of records.
• The aim of the division is to have resulting segments
become more and more similar (pure) to one another
with respect to the target.
• It is a predictive model based on a branching series
of tests
• Can be used for binary or multiple outcomes
• Allows us to understand which variables are
important
• Spot unexpected patterns
STRUCTURE OF A DECISION TREE
• Consists of a root, nodes, leaves, and splits
• At each node, a decision is made on which
variable to split
• The variables chosen for splits, especially near the
root, are the most important for predicting the target
• All records landing at the same leaf get the
same prediction
C. Advantages and Disadvantages of Decision Tree
PROS AND CONS
OF DECISION TREES
Pros
+ Reasonable training time
+ Fast application
+ Easy to interpret
+ Easy to implement
+ Can handle a large number of attributes
Cons
- Cannot handle complicated relationships between attributes
- Problems arise when there is a lot of missing data
D. Building a Decision Tree
PURPOSE OF A DECISION TREE
• Given a collection of records (training set)
‒ Each record contains a set of attributes.
• One of the attributes is the class.
• The aim is to find a model for the class attribute as a function of the values
of other attributes.
• Goal: previously unseen records should be assigned a class as accurately as
possible.
‒ A test set is used to determine the accuracy of the model.
‒ Usually, the given data set is divided into training and test sets, with
training set used to build the model and test set used to validate it.
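The division into training and test sets can be sketched as follows. The helper name `train_test_split`, the 70/30 proportion, and the toy records are assumptions for illustration only; the slides do not prescribe a particular split ratio.

```python
import random


def train_test_split(records, test_fraction=0.3, seed=0):
    """Shuffle a copy of the records and split it into (train, test)."""
    shuffled = list(records)
    # A fixed seed makes the shuffle repeatable
    random.Random(seed).shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical records: each has attributes plus a class label
records = [{"id": i, "label": "yes" if i % 2 else "no"} for i in range(10)]
train, test = train_test_split(records)
print(len(train), "training records,", len(test), "test records")
```

The model is built only from `train`; the records in `test` are held back so that accuracy is measured on data the model has not seen.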
DECISION TREE
CLASSIFICATION TASK
DECISION TREE USING HUNT’S
ALGORITHM
Hunt's algorithm grows a decision tree in a recursive fashion by partitioning the training records into successively purer subsets.
BUILDING THE DECISION TREE
• We start at the root node with all records in the training set
• Drawn from left to right
• Consider every split on every variable
• Choose the split that maximizes a measure of purity
• For each child of the root node, we again search for the best split
• Eventually, the process stops when no good split is available or leaves are
pure
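The steps above can be sketched as a short recursive procedure. This is a simplified illustration, not the exact algorithm from the lecture: it assumes categorical attributes, binary tests of the form "attribute == value", and Gini impurity as the purity measure (the slides do not name a specific measure). The function names and the toy sailing data are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 0.0 for a pure node, larger for mixed nodes."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(rows, labels):
    """Consider every split on every variable; keep the purest one."""
    best = None  # (weighted impurity, attribute, value)
    for attr in rows[0]:
        for value in {r[attr] for r in rows}:
            left = [l for r, l in zip(rows, labels) if r[attr] == value]
            right = [l for r, l in zip(rows, labels) if r[attr] != value]
            if not left or not right:
                continue  # this test does not separate the records
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if best is None or score < best[0]:
                best = (score, attr, value)
    return best


def build_tree(rows, labels):
    """Return a leaf (class label) or a dict describing one split."""
    majority = Counter(labels).most_common(1)[0][0]
    if gini(labels) == 0.0:
        return majority  # leaf is pure
    split = best_split(rows, labels)
    if split is None or split[0] >= gini(labels) - 1e-12:
        return majority  # no good split is available
    _, attr, value = split
    yes = [(r, l) for r, l in zip(rows, labels) if r[attr] == value]
    no = [(r, l) for r, l in zip(rows, labels) if r[attr] != value]
    return {
        "test": (attr, value),
        "yes": build_tree([r for r, _ in yes], [l for _, l in yes]),
        "no": build_tree([r for r, _ in no], [l for _, l in no]),
    }


def predict(tree, row):
    """Follow the tests from the root until a leaf is reached."""
    while isinstance(tree, dict):
        attr, value = tree["test"]
        tree = tree["yes"] if row[attr] == value else tree["no"]
    return tree


# Hypothetical training records
rows = [
    {"outlook": "sunny", "windy": "yes"},
    {"outlook": "sunny", "windy": "no"},
    {"outlook": "rainy", "windy": "yes"},
    {"outlook": "rainy", "windy": "no"},
    {"outlook": "sunny", "windy": "yes"},
]
labels = ["sail", "sail", "stay", "stay", "sail"]
tree = build_tree(rows, labels)
print(tree)
print(predict(tree, {"outlook": "rainy", "windy": "no"}))
```

In this toy data the outlook attribute separates the classes perfectly, so the root splits on it and both children are pure leaves.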
[Figure: Step 1. Class counts shown: Don’t Cheat (4), Cheat (2)]
[Figure: Step 2. Class counts shown: Don’t Cheat (1), Cheat (3)]
[Figure: Step 3]
[Figure: Steps 1, 2 and 3 shown together]
E. Evaluating a Decision Tree
BUILDING THE DECISION TREE
APPLY MODEL TO TEST DATA
HOW TO SPLIT DATA
FOR TEST CONDITION
• Depends on attribute types
‒ Nominal (categorical)
‒ Ordinal (categorical but ordered, for example education level)
‒ Continuous (can take any numeric value within a range)
• There are two types of splits
‒ Multi-way split
‒ 2-way split (binary)
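For a nominal attribute, the two types of split can be enumerated as below. This is a sketch under assumptions: the helper names and the marital status values are hypothetical and were chosen only to show the idea.

```python
from itertools import combinations


def multiway_split(values):
    """Multi-way split: one branch per distinct value."""
    return [{v} for v in sorted(set(values))]


def binary_splits(values):
    """2-way splits: every division of the distinct values into two groups."""
    distinct = sorted(set(values))
    if len(distinct) < 2:
        return []  # nothing to separate
    first, rest = distinct[0], distinct[1:]
    splits = []
    # Keeping `first` in the left group avoids listing each division twice
    for size in range(len(rest)):
        for extra in combinations(rest, size):
            left = {first, *extra}
            right = set(distinct) - left
            splits.append((left, right))
    return splits


# Hypothetical nominal attribute
status = ["single", "married", "divorced", "single", "married"]
print(multiway_split(status))
for left, right in binary_splits(status):
    print(sorted(left), "vs", sorted(right))
```

With k distinct values there is one multi-way split but 2^(k-1) - 1 possible binary splits, so the number of candidates grows quickly as k increases.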
SPLITTING BASED ON
NOMINAL ATTRIBUTES
SPLITTING BASED ON
CONTINUOUS ATTRIBUTES
Different ways of handling continuous attributes:
• Convert the values into ranges (discretization) to form an ordinal categorical attribute
• Binary decision: (A < v) or (A >= v)
‒ Consider all possible splits and find the best cut
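The binary decision (A < v) or (A >= v) can be searched for as sketched below. The code assumes Gini impurity as the purity measure and tries the midpoint between each pair of neighbouring values as a candidate v; both choices, the function names, and the toy income data are assumptions rather than something stated on the slide.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 0.0 for a pure group."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_threshold(values, labels):
    """Return (v, weighted impurity) for the best test A < v."""
    pairs = sorted(zip(values, labels))
    best_v, best_score = None, float("inf")
    for i in range(1, len(pairs)):
        lo, hi = pairs[i - 1][0], pairs[i][0]
        if lo == hi:
            continue  # no cut point between equal values
        v = (lo + hi) / 2  # candidate cut: midpoint of neighbours
        left = [l for _, l in pairs[:i]]   # records with A < v
        right = [l for _, l in pairs[i:]]  # records with A >= v
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(pairs)
        if score < best_score:
            best_v, best_score = v, score
    return best_v, best_score


# Hypothetical continuous attribute (income in thousands) and class labels
income = [20, 30, 40, 60, 80, 90]
owns_home = ["no", "no", "no", "yes", "yes", "yes"]
print(best_threshold(income, owns_home))
```

Here the cut v = 50 separates the two classes completely, so both children are pure and the weighted impurity is 0.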
FINDING A GOOD SPLIT
AT A DECISION TREE NODE
• There are many ways to find a good split
• But they have two things in common. Splits are preferred where:
‒ The children are similar in size
‒ Each child is as pure as possible
• Most algorithms seek to maximize the purity of each of the children
HOW TO DETERMINE
THE BEST SPLIT
• Nodes with homogeneous class distribution are preferred
• Before Splitting: 10 records of class 0, 10 records of class 1
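One way to put a number on "homogeneous" is an impurity measure. The sketch below assumes Gini impurity, which is one common choice; the slides do not name a measure. The parent node uses the 10 and 10 class counts given above, while the two candidate splits are hypothetical and only illustrate the comparison.

```python
def gini_from_counts(counts):
    """Gini impurity of a node, given its per-class record counts."""
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)


def weighted_gini(children):
    """Impurity after a split: child impurities weighted by child size."""
    total = sum(sum(c) for c in children)
    return sum(sum(c) / total * gini_from_counts(c) for c in children)


parent = [10, 10]            # 10 records of class 0, 10 of class 1
split_a = [[5, 5], [5, 5]]   # hypothetical split: children as mixed as the parent
split_b = [[9, 1], [1, 9]]   # hypothetical split: children nearly pure

print("parent :", gini_from_counts(parent))
print("split A:", weighted_gini(split_a))
print("split B:", weighted_gini(split_b))
```

The parent and split A both have impurity 0.5, so split A gains nothing. Split B brings the weighted impurity down to about 0.18, so it would be preferred.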
PERFORMANCE MEASURES FOR
DECISION TREES
• After a decision tree is constructed, each leaf node has a score
• A leaf's score is the proportion of records at that leaf that belong to its most common class
• A decision tree also has an accuracy score, which is calculated as
follows:
Accuracy = (number of correctly classified records) / (total number of records)
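Both measures are simple ratios, as the sketch below shows. The function names and the example labels are hypothetical.

```python
from collections import Counter


def leaf_score(labels_at_leaf):
    """Return (most common class, its share of the records at the leaf)."""
    label, count = Counter(labels_at_leaf).most_common(1)[0]
    return label, count / len(labels_at_leaf)


def accuracy(actual, predicted):
    """Correctly classified records divided by the total number of records."""
    correct = sum(a == p for a, p in zip(actual, predicted))
    return correct / len(actual)


# Hypothetical leaf holding 3 records of one class and 1 of another
print(leaf_score(["Cheat", "Cheat", "Cheat", "Don't Cheat"]))

# Hypothetical test records: 3 of the 4 predictions match the actual class
actual = ["yes", "no", "yes", "yes"]
predicted = ["yes", "no", "no", "yes"]
print(accuracy(actual, predicted))
```

The leaf score describes a single leaf, while accuracy summarizes the whole tree over a set of records, normally the test set.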
CALCULATING THE ACCURACY OF
A DECISION TREE
EVALUATION OF
CLASSIFICATION MODELS
• Counts of test records that are correctly (or incorrectly) predicted by
the classification model
• Confusion matrix
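A confusion matrix is just a table of these counts, keyed by actual and predicted class. The sketch below builds one from hypothetical test records; the function name and the data are assumptions for illustration.

```python
from collections import Counter


def confusion_matrix(actual, predicted):
    """Count how often each (actual class, predicted class) pair occurs."""
    return Counter(zip(actual, predicted))


# Hypothetical test records and model predictions
actual = ["yes", "yes", "no", "no", "yes", "no"]
predicted = ["yes", "no", "no", "no", "yes", "yes"]

cm = confusion_matrix(actual, predicted)
classes = ["yes", "no"]

# Rows are actual classes, columns are predicted classes
print("actual \\ predicted", *classes)
for a in classes:
    print(a, *(cm[(a, p)] for p in classes))

# Accuracy is the sum of the diagonal divided by the total
accuracy = sum(cm[(c, c)] for c in classes) / len(actual)
print("accuracy:", accuracy)
```

The diagonal cells hold the correctly predicted records; the off-diagonal cells show which kinds of mistakes the model makes.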
EXERCISE FOR DECISION TREE -
SHOULD WE GO SAILING?
[Figure: exercise step. Class counts shown: yes (2), no (3)]
[Figure: exercise step. Class counts shown: yes (1), no (1)]
[Figures: remaining exercise slides]
QUESTIONS?