Classification
Naïve Bayesian classification, Decision trees, Decision rules,
Instance-based methods
Supervised vs. Unsupervised Learning
Supervised learning (classification)
• Supervision: The training data (observations, measurements, etc.) are
accompanied by labels indicating the class of the observations.
• New data is classified based on the training set.
Unsupervised learning (clustering)
• The class labels of the training data are unknown.
• Given a set of measurements, observations, etc. with the aim of
establishing the existence of classes or clusters in the data.
Prediction Problems: Classification vs. Numeric Prediction
Classification
• Predicts categorical class labels (discrete or nominal)
• Classifies data (constructs a model) based on the training set and the values
(class labels) in a classifying attribute and uses it in classifying new data
Numeric Prediction
• Models continuous-valued functions, i.e., predicts unknown or missing values
Typical applications
• Credit/loan approval: whether a loan applicant is safe or risky
• Medical diagnosis: if a tumor is cancerous or benign
• Fraud detection: if a transaction is fraudulent
• Web page categorization: which category it is
Classification
Data classification is a two-step process: a learning step and a classification step.
i. Learning step (or Training phase or Model construction): where a
classification model is constructed.
• The model describes a set of predetermined classes.
• It is built by “learning from” a training set made up of database tuples and their
associated class labels.
• Each tuple/sample is assumed to belong to a predefined class, as
determined by the class label attribute.
• The set of tuples used for model construction is training set
• The model is represented as classification rules, decision trees, or
mathematical formulae
• The class label attribute is categorical (or nominal) in that each value
serves as a category or class.
Classification
ii. Classification step (Model usage): where the model is used to predict
class labels for given data.
• Classifying future or unknown objects
• If the accuracy is acceptable, use the model to classify new data
Note: If the test set is used to select models, it is called a validation (test) set.
Classification
• Estimate accuracy of the model
• The known label of each test sample is compared with the classified result
from the model
• Accuracy rate is the percentage of test set samples that are correctly
classified by the model
• Test set is independent of training set (otherwise overfitting)
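The two-step process above can be summarized in a few lines of code. The sketch below is illustrative only: it uses scikit-learn and its bundled iris data (neither is mentioned in these slides) to stand in for a labeled training set, and the two steps plus the accuracy estimate map directly onto the bullets above.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# Labeled tuples: each observation is accompanied by a class label.
X, y = load_iris(return_X_y=True)

# Keep the test set independent of the training set (otherwise overfitting).
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=1)

# Step 1 (learning / model construction): build the classifier from the training set.
model = DecisionTreeClassifier(criterion="entropy", random_state=1)
model.fit(X_train, y_train)

# Step 2 (classification / model usage): predict class labels for new data.
y_pred = model.predict(X_test)

# Accuracy rate: percentage of test-set samples correctly classified by the model.
print(f"accuracy = {accuracy_score(y_test, y_pred):.3f}")
```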
Decision Tree
• A decision tree is a flowchart-like tree structure:
• Each internal node (nonleaf node) denotes a test on an attribute
• Each branch represents an outcome of the test.
• Each leaf node (or terminal node) holds a class label.
• Internal nodes are denoted by rectangles; leaf nodes are denoted by ovals.
• Some decision tree algorithms produce only binary trees; others can produce nonbinary trees.
Decision Tree
How are decision trees used for classification?
• Given a tuple, X, for which the associated class label is unknown, the
attribute values of the tuple are tested against the decision tree.
• A path is traced from the root to a leaf node, which holds the class
prediction for that tuple.
• Decision trees can easily be converted to classification rules (see the sketch below).
• Decision tree classifiers have good accuracy. However, successful use may depend on the data at hand.
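As an illustration of how a tuple is traced from the root to a leaf, and of how each root-to-leaf path reads as a rule, here is a hand-written sketch of the buys_computer tree used later in these slides. The subtrees under “youth” (student) and “senior” (credit_rating) are assumed from the standard textbook example; only the age split and the pure “middle_aged” branch are shown explicitly in the slides.

```python
def classify_buys_computer(age: str, student: str, credit_rating: str) -> str:
    """Trace a tuple from the root to a leaf and return the predicted class label."""
    if age == "youth":                      # internal node: test on age
        return "yes" if student == "yes" else "no"   # leaf nodes hold class labels
    elif age == "middle_aged":              # pure partition -> leaf labeled "yes"
        return "yes"
    else:                                   # age == "senior": test on credit_rating
        return "yes" if credit_rating == "fair" else "no"

# Each root-to-leaf path is one classification rule, e.g.:
# IF age = youth AND student = yes THEN buys_computer = yes
print(classify_buys_computer("youth", "yes", "fair"))   # -> yes
```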
Decision Tree
• Popular traditional Decision Tree Algorithms:
✔ ID3 (Iterative Dichotomiser)
✔ C4.5 (a successor of ID3)
✔ CART (Classification And Regression Trees)
• ID3, C4.5, and CART adopt a greedy (i.e., nonbacktracking) approach in
which decision trees are constructed in a top-down recursive
divide-and-conquer manner.
Decision Tree
The decision tree algorithm takes three parameters:
i. D: a data partition; initially, it is the complete set of training tuples and their
associated class labels.
ii. attribute_list: the set of candidate attributes.
iii. Attribute selection method: specifies a heuristic procedure for selecting
the attribute that “best” discriminates the given tuples according to class.
The criterion consists of a splitting attribute and, possibly, either a
split-point or splitting subset.
An attribute selection measure, such as information gain or the Gini index, is used:
Gini index: the resulting tree is strictly binary.
Information gain: allows multiway splits (i.e., two or more branches to be grown
from a node).
Decision Tree
• If the tuples in D are all of the same class, then node N becomes a leaf
and is labeled with that class (a stopping criterion).
• Otherwise, the algorithm calls the attribute selection method to determine
the splitting criterion.
Stopping criteria: Recursive partitioning stops only when any one of the
following terminating conditions is true :
✔ All the tuples in partition D (represented at node N) belong to the
same class.
✔ There are no remaining attributes on which the tuples may be further
partitioned.
✔ There are no tuples for a given branch, that is, a partition Dj is empty.
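A minimal sketch of this top-down, recursive divide-and-conquer procedure is given below, assuming discrete-valued attributes and multiway splits. The helper attribute_selection_method (e.g., an information-gain or Gini-based chooser) is a placeholder supplied by the caller.

```python
from collections import Counter

def majority_class(D):
    """Majority voting over the class labels of a partition."""
    return Counter(label for _, label in D).most_common(1)[0][0]

def generate_decision_tree(D, attribute_list, attribute_selection_method):
    """D is a list of (tuple_as_dict, class_label) pairs."""
    labels = {label for _, label in D}
    if len(labels) == 1:                    # all tuples in D belong to the same class
        return labels.pop()                 # leaf labeled with that class
    if not attribute_list:                  # no remaining attributes to split on
        return majority_class(D)            # leaf labeled with the majority class
    A = attribute_selection_method(D, attribute_list)    # "best" splitting attribute
    node = {"attribute": A, "branches": {}}
    remaining = [a for a in attribute_list if a != A]
    for value in {x[A] for x, _ in D}:      # one branch per known value a_j of A
        Dj = [(x, label) for x, label in D if x[A] == value]
        # An empty partition Dj would instead receive a majority-class leaf of D.
        node["branches"][value] = generate_decision_tree(
            Dj, remaining, attribute_selection_method)
    return node
```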
Decision Tree
• Let A be the splitting attribute. A has v distinct values, {a1, a2, ..., av},
based on the training data.
• A is discrete-valued: In this case, the outcomes of the test at node N
correspond directly to the known values of A.
• A branch is created for each known value, aj, of A and labeled with that
value.
• A is continuous-valued: In this case, the test at node N has two possible
outcomes, corresponding to the conditions A ≤ split point and A > split
point, respectively.
Decision Tree: Attribute Selection Measures
Three popular attribute selection measures—
a) Information gain (ID3)
b) Gain ratio (C4.5)
c) Gini index (CART)
Decision Tree: Attribute Selection Measures- Information gain
Information Gain:
• ID3 uses information gain as its attribute selection measure.
• The attribute with the highest information gain is chosen as the
splitting attribute for node N.
• This attribute minimizes the information needed to classify the tuples in the
resulting partitions and reflects the least randomness or “impurity” in these partitions.
Decision Tree: Attribute Selection Measures- Information gain
Entropy (Information Theory)
• A measure of uncertainty associated with a random variable.
• High entropy -> higher uncertainty
• Lower entropy -> lower uncertainty
• The expected information (or entropy) needed to classify a tuple in D is:
    Info(D) = −∑ pi log2(pi), with the sum taken over the m classes
• where pi is the nonzero probability that an arbitrary tuple in D belongs
to class Ci and is estimated by |Ci,D|/|D|.
• A log function to the base 2 is used, because the information is encoded
in bits.
• Info(D) is just the average amount of information needed to identify the
class label of a tuple in D.
Decision Tree: Attribute Selection Measures- Information gain
• Information needed (after using A to split D into v partitions) to classify D:
    InfoA(D) = ∑ (|Dj| / |D|) × Info(Dj), with the sum taken over the v partitions
• The term |Dj| / |D| acts as the weight of the jth partition.
• InfoA(D) is the expected information required to classify a tuple from D
based on the partitioning by A.
• The smaller the expected information (still) required, the greater the
purity of the partitions.
Decision Tree: Attribute Selection Measures- Information gain
• Information gain is defined as the difference between the original
information requirement (i.e., based on just the proportion of classes)
and the new requirement (i.e., obtained after partitioning on A).
• Information gained by branching on attribute A:
    Gain(A) = Info(D) − InfoA(D)
Decision Tree: Attribute Selection Measures- Information gain
• In this example, class label attribute, buys_computer, has two distinct
values (namely, yes, no).
• There are nine tuples of class yes and five tuples of class no.
• A (root) node N is created for the tuples in D.
• To find the splitting criterion for these tuples, we first need to compute the
expected information to classify a tuple in D:
    Info(D) = −(9/14) log2(9/14) − (5/14) log2(5/14) = 0.940 bits
Decision Tree: Attribute Selection Measures- Information gain
• Next, we need to compute the expected information requirement for
each attribute.
• Let’s start with the attribute age. We need to look at the distribution of
yes and no tuples for each category of age.
Decision Tree: Attribute Selection Measures- Information gain
• For the age category “youth,” there are two yes tuples and three no
tuples.
• For the category “middle aged,” there are four yes tuples and zero no
tuples.
• For the category “senior,” there are three yes tuples and two no tuples.
• Hence Infoage(D) = 0.694 bits, and Gain(age) = Info(D) − Infoage(D) = 0.940 − 0.694 = 0.246 bits.
Decision Tree: Attribute Selection Measures- Information gain
• Similarly, we can compute Gain(income) =0.029 bits, Gain(student) =
0.151 bits, and Gain(credit_rating) = 0.048 bits.
• Because age has the highest information gain among the attributes, it is selected as the splitting attribute.
Decision Tree: Attribute Selection Measures- Information gain
• Note that the tuples falling into the partition for age = middle_aged all belong to the same class.
• Therefore, a leaf is created at the end of this branch and labeled “yes.”
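As a quick numeric check of this example, the sketch below recomputes Info(D), Infoage(D), and Gain(age) from the class counts given above (9 yes / 5 no overall; youth 2/3, middle_aged 4/0, senior 3/2).

```python
from math import log2

def info(counts):
    """Expected information (entropy) of a partition, from its class counts."""
    total = sum(counts)
    return -sum(c / total * log2(c / total) for c in counts if c > 0)

info_D = info([9, 5])                                       # Info(D) ~ 0.940 bits
partitions = [[2, 3], [4, 0], [3, 2]]                       # youth, middle_aged, senior
info_age = sum(sum(p) / 14 * info(p) for p in partitions)   # Info_age(D) ~ 0.694 bits
print(f"Gain(age) = {info_D - info_age:.3f} bits")          # ~ 0.246 bits
```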
Decision Tree: Attribute Selection Measures
• Computing Information-Gain for Continuous-Valued Attributes:
• Let attribute A be a continuous-valued attribute
• Must determine the best split point for A
Sort the values of A in increasing order
Typically, the midpoint between each pair of adjacent values is
considered as a possible split point
(ai+ai+1)/2 is the midpoint between the values of ai and ai+1
The point with the minimum expected information requirement for
A is selected as the split-point for A
• Split:
D1 is the set of tuples in D satisfying A ≤ split-point, and D2 is the set
of tuples in D satisfying A > split-point
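A sketch of this split-point search is shown below; the attribute values and class labels used in the demo call are hypothetical.

```python
from math import log2

def info(labels):
    """Expected information (entropy) of a list of class labels."""
    total = len(labels)
    return -sum(labels.count(c) / total * log2(labels.count(c) / total)
                for c in set(labels))

def best_split_point(values, labels):
    """Return (expected information, split point) minimizing Info_A(D)."""
    pairs = sorted(zip(values, labels))                  # sort values of A ascending
    best = (float("inf"), None)
    for i in range(len(pairs) - 1):
        if pairs[i][0] == pairs[i + 1][0]:
            continue                                     # equal neighbours: no midpoint
        split = (pairs[i][0] + pairs[i + 1][0]) / 2      # midpoint (a_i + a_{i+1}) / 2
        d1 = [lab for v, lab in pairs if v <= split]     # D1: A <= split_point
        d2 = [lab for v, lab in pairs if v > split]      # D2: A > split_point
        expected = (len(d1) * info(d1) + len(d2) * info(d2)) / len(pairs)
        best = min(best, (expected, split))
    return best

# Hypothetical ages and class labels, just to exercise the search.
print(best_split_point([25, 32, 41, 47, 52], ["no", "no", "yes", "yes", "no"]))
```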
Decision Tree: Attribute Selection Measures- Gain Ratio
• The information gain measure is biased toward tests with many outcomes; that is,
it prefers to select attributes having a large number of values.
• For example, consider an attribute that acts as a unique identifier such
as product ID.
• A split on product ID would result in a large number of partitions (as
many as there are values), each one containing just one tuple.
• Because each partition is pure, the information required to classify data set D
based on this partitioning would be Infoproduct_ID(D) = 0.
• Therefore, the information gained by partitioning on this attribute is maximal,
yet such a split is useless for classification.
Decision Tree: Attribute Selection Measures- Gain Ratio
• C4.5, a successor of ID3, uses an extension to information gain known
as gain ratio, which attempts to overcome this bias.
• It normalizes information gain with a “split information” value: for each outcome,
it considers the number of tuples having that outcome with respect to the total
number of tuples in D:
    SplitInfoA(D) = −∑ (|Dj| / |D|) × log2(|Dj| / |D|), with the sum taken over the v partitions
    GainRatio(A) = Gain(A) / SplitInfoA(D)
• The attribute with the maximum gain ratio is selected as the splitting attribute.
Decision Tree: Attribute Selection Measures- Gain Ratio
• Computation of the gain ratio for the attribute income:
• We have Gain(income) = 0.029 and SplitInfoincome(D) = 1.557.
• Therefore,
    GainRatio(income) = 0.029 / 1.557 = 0.019
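The gain-ratio figures can be reproduced with the short sketch below. The partition sizes for income (4 low, 6 medium, 4 high tuples) are assumed from the AllElectronics training table and are not stated in these slides.

```python
from math import log2

def split_info(partition_sizes):
    """SplitInfo_A(D) = -sum((|Dj|/|D|) * log2(|Dj|/|D|)) over the v partitions."""
    total = sum(partition_sizes)
    return -sum(s / total * log2(s / total) for s in partition_sizes)

split_info_income = split_info([4, 6, 4])        # ~ 1.557 (assumed partition sizes)
gain_income = 0.029                              # Gain(income) from the slides
print(f"GainRatio(income) = {gain_income / split_info_income:.3f}")   # ~ 0.019
```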
Decision Tree: Attribute Selection Measures- Gini Index
• The Gini index measures the impurity of a data partition or set of training tuples D:
    Gini(D) = 1 − ∑ pi², with the sum taken over the m classes
• where pi is the probability that a tuple in D belongs to class Ci and is
estimated by |Ci,D|/|D|.
• Gini index considers a binary split for each attribute.
• To determine the best binary split on A, we examine all the possible
subsets that can be formed using known values of A.
• For example, if income has three possible values, namely {low, medium,
high}, then the possible subsets are {low, medium, high}, {low, medium},
{low, high}, {medium, high}, {low}, {medium}, {high}, and {}.
Decision Tree: Attribute Selection Measures- Gini Index
• We exclude the full set, {low, medium, high}, and the empty set {}
from consideration since, conceptually, they do not represent a split.
• Therefore, there are 2^v − 2 possible ways to form two partitions of the
data D, based on a binary split on A.
• When considering a binary split, we compute a weighted sum of the
impurity of each resulting partition.
• For example, if a binary split on A partitions D into D1 and D2, the Gini
index of D given that partitioning is:
    GiniA(D) = (|D1| / |D|) Gini(D1) + (|D2| / |D|) Gini(D2)
• For a discrete-valued attribute, the subset that gives the minimum Gini
index for that attribute is selected as its splitting subset.
Decision Tree: Attribute Selection Measures- Gini Index
• The reduction in impurity that would be incurred by a binary split on a
discrete- or continuous-valued attribute A is:
    ΔGini(A) = Gini(D) − GiniA(D)
• The attribute that maximizes the reduction in impurity (or, equivalently,
has the minimum Gini index) is selected as the splitting attribute.
Decision Tree: Attribute Selection Measures- Gini Index
• Using the Gini index to compute the impurity of D:
    Gini(D) = 1 − (9/14)² − (5/14)² = 0.459
• To find the splitting criterion for the tuples in D, we need to compute
the Gini index for each attribute.
• Consider the attribute income and each of its possible splitting subsets.
• The subset {low, medium} results in 10 tuples in partition D1 satisfying the
condition “income ∈ {low, medium}.”
• The remaining four tuples of D would be assigned to partition D2.
Decision Tree: Attribute Selection Measures- Gini Index
• The Gini index value computed based on the partitioning {low, medium} is 0.443
(checked in the sketch below).
• The Gini index for the subsets {low, high} and {medium} is 0.458.
• For the subsets {medium, high} and {low} it is 0.450.
• The best binary split for attribute income is on {low, medium} (or {high})
because it minimizes the Gini index.
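A short check of these Gini values follows. The per-class counts for the income partitions (D1 = income ∈ {low, medium}: 7 yes / 3 no; D2 = {high}: 2 yes / 2 no) are assumed from the AllElectronics training table and are not stated in these slides.

```python
def gini(counts):
    """Gini impurity of a partition, from its class counts: 1 - sum(pi^2)."""
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)

def gini_split(d1_counts, d2_counts):
    """Weighted Gini index of D for a binary split into D1 and D2."""
    n1, n2 = sum(d1_counts), sum(d2_counts)
    n = n1 + n2
    return n1 / n * gini(d1_counts) + n2 / n * gini(d2_counts)

print(f"Gini(D) = {gini([9, 5]):.3f}")                                   # ~ 0.459
print(f"Gini for income in {{low, medium}}: {gini_split([7, 3], [2, 2]):.3f}")   # ~ 0.443
```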
Decision Tree: Attribute Selection Measures- Gini Index
• Evaluating age, we obtain {youth, senior} (or {middle aged}) as the best
split for age with a Gini index of 0.375.
• The attributes student and credit_rating are both binary, with Gini index
values of 0.367 and 0.429, respectively.
Decision Tree
The three measures, in general, return good results but
• Information gain:
biased towards multivalued attributes
• Gain ratio:
tends to prefer unbalanced splits in which one partition is much
smaller than the others
• Gini index:
biased toward multivalued attributes
has difficulty when the number of classes is large
tends to favor tests that result in equal-sized partitions and purity in
both partitions
Bayes Classification
• Bayesian classifiers are statistical classifiers.
• Bayesian classifier can predict class membership probabilities such as
the probability that a given tuple belongs to a particular class.
• Bayesian classification is based on Bayes’ theorem.
• Here we consider a simple Bayesian classifier known as the naïve Bayesian classifier.
• Naïve Bayesian classifiers assume that the effect of an attribute value
on a given class is independent of the values of the other attributes.
• Bayes’ theorem is useful in that it provides a way of calculating the
posterior probability, P(H|X), from P(H), P(X|H), and P(X):
    P(H|X) = P(X|H) P(H) / P(X)
Bayes Classification
• For example, X is a 35-year-old customer with an income of $40,000.
Suppose that H is the hypothesis that our customer will buy a
computer.
• Then P(H|X) reflects the probability that customer X will buy a
computer given that we know the customer’s age and income.
Naïve Bayesian Classification
• The naïve Bayesian classifier, or simple Bayesian classifier, works as follows:
• Let D be a training set of tuples and their associated class labels. As
usual, each tuple is represented by an n-dimensional attribute vector, X
= (x1, x2,…, xn), depicting n measurements made on the tuple from n
attributes, respectively, A1, A2,…, An.
• Suppose that there are m classes, C1, C2, …, Cm. Given a tuple, X, the
classifier will predict that X belongs to the class having the highest
posterior probability conditioned on X. That is, the naïve Bayesian
classifier predicts that tuple X belongs to the class Ci if and only if
    P(Ci|X) > P(Cj|X) for 1 ≤ j ≤ m, j ≠ i
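A minimal sketch of such a classifier for categorical attributes is given below, using maximum-likelihood estimates of the probabilities (no Laplace correction). The tiny training set in the demo is hypothetical and only illustrates the argmax over classes.

```python
from collections import Counter, defaultdict

def train_naive_bayes(tuples, labels):
    """Count class priors and per-class attribute-value frequencies."""
    priors = Counter(labels)                       # counts for P(Ci)
    cond = defaultdict(Counter)                    # counts for P(xk | Ci)
    for x, c in zip(tuples, labels):
        for attr, value in x.items():
            cond[(c, attr)][value] += 1
    return priors, cond, len(labels)

def predict(x, priors, cond, n):
    """Return the class Ci maximizing P(Ci) * prod_k P(xk | Ci)."""
    best_class, best_score = None, -1.0
    for c, class_count in priors.items():
        score = class_count / n                    # P(Ci)
        for attr, value in x.items():              # attributes assumed independent given Ci
            score *= cond[(c, attr)][value] / class_count
        if score > best_score:
            best_class, best_score = c, score
    return best_class

# Hypothetical training tuples with categorical attributes.
X = [{"age": "youth", "student": "yes"}, {"age": "youth", "student": "no"},
     {"age": "senior", "student": "no"}, {"age": "senior", "student": "yes"}]
y = ["yes", "no", "no", "yes"]
model = train_naive_bayes(X, y)
print(predict({"age": "youth", "student": "yes"}, *model))   # -> yes
```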
Summary
• Data sets are made up of data objects. A data object represents an
entity. Data objects are described by attributes. Attributes can be
nominal, binary, ordinal, or numeric.
• The values of a nominal (or categorical) attribute are symbols or names
of things, where each value represents some kind of category, code, or
state.
• Binary attributes are nominal attributes with only two possible states
(such as 1 and 0 or true and false). If the two states are equally
important, the attribute is symmetric; otherwise it is asymmetric.
• An ordinal attribute is an attribute with possible values that have a
meaningful order or ranking among them, but the magnitude between
successive values is not known.
References
• Jiawei Han, Micheline Kamber, and Jian Pei. Data Mining: Concepts and Techniques, 3rd ed. Morgan Kaufmann.
• Pang-Ning Tan, Michael Steinbach, and Vipin Kumar. Introduction to Data Mining. Pearson India.