Decision Tree

A decision tree is a supervised classification model used for predicting a discrete target feature through a series of Boolean tests on input features. There are two main types: classification trees for categorical outcomes and regression trees for continuous values. The process of building a decision tree involves selecting the best attribute for splitting the data, recursive splitting, applying stopping criteria, and pruning to avoid overfitting.

What is a decision tree?

A decision tree is a simple model for supervised classification. It is used for classifying a single discrete
target feature. Each internal node performs a Boolean test on an input feature. The edges are labeled with
the values of that input feature. Each leaf node specifies a value for the target feature.

Classification of Decision Tree


There are two main types of decision tree, based on the nature of the target variable: classification
trees and regression trees.
· Classification trees: These are designed to predict categorical outcomes, i.e., they classify
data into different classes. For example, a classification tree can determine whether an email is
“spam” or “not spam” based on various features of the email.
· Regression trees: These are used when the target variable is continuous; they predict numerical
values rather than categories. For example, a regression tree can estimate the price of a house
based on its size, location, and other features.
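For concreteness, here is a minimal Python sketch of both tree types using scikit-learn (assuming it is installed); the toy feature matrices and targets are invented purely for illustration and are not taken from this document.

from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor

# Classification tree: predicts a categorical label (e.g. spam vs. not spam).
X_cls = [[0, 1], [1, 1], [1, 0], [0, 0]]        # e.g. [contains_link, many_recipients]
y_cls = ["spam", "spam", "not spam", "not spam"]
clf = DecisionTreeClassifier(criterion="entropy", max_depth=3)
clf.fit(X_cls, y_cls)
print(clf.predict([[1, 1]]))                    # -> ['spam']

# Regression tree: predicts a continuous value (e.g. house price).
X_reg = [[50, 1], [80, 2], [120, 3], [200, 4]]  # e.g. [size_m2, n_rooms]
y_reg = [100000, 150000, 220000, 400000]
reg = DecisionTreeRegressor(max_depth=3)
reg.fit(X_reg, y_reg)
print(reg.predict([[100, 2]]))                  # predicts the mean price of the matching leaf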
How is a Decision Tree Formed?
Building a decision tree involves several steps that ensure the final model is both accurate and
interpretable. Here’s a breakdown of the process:
1. Selecting the Best Attribute
The first step in building a decision tree is selecting the best attribute to split the dataset. This is done
using criteria such as information gain or Gini index. The attribute that results in the highest gain (or
lowest impurity) is chosen as the root node.
2. Recursive Splitting
Once the best attribute is selected, the data is split into subsets based on the value of that attribute. This
process is repeated recursively for each subset, creating branches and sub-branches until the data is
perfectly split or meets a stopping criterion (e.g., maximum depth or minimum number of instances per
leaf).
3. Stopping Criteria
The decision tree continues splitting the data until one of the following conditions is met:
· All instances in a node belong to the same class.
· The maximum depth of the tree is reached.
· The number of instances in a node falls below a specified threshold.
4. Tree Pruning
After the decision tree is fully grown, it may be necessary to prune it to reduce overfitting. Pruning
simplifies the model by removing branches that contribute little to its accuracy on unseen data.
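The whole procedure can be summarised as a short recursive function. The sketch below assumes the training data is a list of (feature-dictionary, label) pairs; build_tree, best_attribute, and the parameter names are illustrative placeholders (best_attribute would implement one of the selection measures described in the next section), not a reference implementation.

from collections import Counter

def build_tree(examples, attributes, depth=0, max_depth=5, min_examples=2):
    labels = [label for _, label in examples]
    majority = Counter(labels).most_common(1)[0][0]

    # Stopping criteria: pure node, no attributes left, depth or size limit reached.
    if len(set(labels)) == 1 or not attributes or depth >= max_depth \
            or len(examples) < min_examples:
        return majority                              # leaf: predict the majority class

    # 1. Select the best attribute (e.g. highest information gain / lowest impurity).
    attr = best_attribute(examples, attributes)      # hypothetical helper
    tree = {"attribute": attr, "branches": {}}

    # 2. Recursive splitting: one branch per observed value of the chosen attribute.
    for value in {features[attr] for features, _ in examples}:
        subset = [(f, l) for f, l in examples if f[attr] == value]
        remaining = [a for a in attributes if a != attr]
        tree["branches"][value] = build_tree(subset, remaining,
                                             depth + 1, max_depth, min_examples)
    return tree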
Attribute Selection Measures
Choosing the right attribute to split the data at each node is a critical step in building an accurate decision
tree. The most common methods used for attribute selection are information gain and Gini index.
Information Gain
Information Gain is based on the concept of entropy, which measures the amount of disorder or
uncertainty in the dataset. The goal is to select the attribute that reduces entropy the most when the data is
split.

The formula for entropy is:


Entropy = - Σ (pi * log2(pi)),
where "pi" represents the probability of the i-th class occurring within the dataset and "log2" is the base-2
logarithm.
The formula for information gain in a decision tree is:
Information Gain = Entropy(Parent Node) - (Weighted average of Entropy of Child Nodes);
essentially, it calculates the difference between the entropy of the parent node before splitting and the
weighted average entropy of the child nodes after splitting, with the best split being the one that
maximizes information gain.
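Both formulas translate directly into a few lines of Python. The sketch below assumes each example is a dictionary keyed by attribute name and that labels is the list of target values reaching a node; the names are illustrative only.

from collections import Counter
from math import log2

def entropy(labels):
    # Entropy = -sum(pi * log2(pi)) over the classes present in labels.
    total = len(labels)
    return -sum((count / total) * log2(count / total)
                for count in Counter(labels).values())

def information_gain(examples, labels, attribute):
    # Parent entropy minus the weighted average entropy of the child nodes.
    parent = entropy(labels)
    total = len(labels)
    weighted = 0.0
    for value in set(example[attribute] for example in examples):
        subset = [l for example, l in zip(examples, labels)
                  if example[attribute] == value]
        weighted += (len(subset) / total) * entropy(subset)
    return parent - weighted

print(round(entropy(["yes"] * 9 + ["no"] * 5), 3))   # 0.94 for 9 positive, 5 negative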
Gini Index
The Gini index measures the impurity of a dataset. A lower Gini index indicates a purer dataset,
meaning most of the instances belong to a single class.
The Gini index is computed as:
Gini = 1 - ∑(pi)^2
where "pi" represents the proportion of class "i" in the dataset, and the summation is taken over all classes
"i" present in the data; essentially, it calculates 1 minus the sum of the squared probabilities of each class
within a given node.
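A matching sketch for the Gini index, using the same list-of-labels representation as the entropy function above:

from collections import Counter

def gini(labels):
    # Gini = 1 - sum(pi^2) over the classes present in labels.
    total = len(labels)
    return 1.0 - sum((count / total) ** 2 for count in Counter(labels).values())

print(round(gini(["yes"] * 9 + ["no"] * 5), 3))   # 0.459 for 9 positive, 5 negative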
Comparison
Both information gain and the Gini index serve a similar purpose. Information gain is used by the ID3
and C4.5 algorithms, while the Gini index is the default splitting criterion in CART
(Classification and Regression Trees).

Example
· To illustrate the operation of ID3, consider the learning task represented by the
training examples in the table below.
· The target attribute is PlayTennis, which can have the values Yes or No for different days.
· Consider the first step of the algorithm, in which the topmost node of the
decision tree is created.

Day Outlook Temperature Humidity Wind PlayTennis


D1 Sunny Hot High Weak No
D2 Sunny Hot High Strong No
D3 Overcast Hot High Weak Yes
D4 Rain Mild High Weak Yes
D5 Rain Cool Normal Weak Yes
D6 Rain Cool Normal Strong No
D7 Overcast Cool Normal Strong Yes
D8 Sunny Mild High Weak No
D9 Sunny Cool Normal Weak Yes
D10 Rain Mild Normal Weak Yes
D11 Sunny Mild Normal Strong Yes
D12 Overcast Mild High Strong Yes
D13 Overcast Hot Normal Weak Yes
D14 Rain Mild High Strong No

Given a collection S containing positive and negative examples of some target concept, the entropy of
S relative to this Boolean classification is

Entropy(S) = - p+ * log2(p+) - p- * log2(p-)

where

p+ is the proportion of positive examples in S

p- is the proportion of negative examples in S.

Example:
Suppose S is a collection of 14 examples of some Boolean concept, including 9 positive and 5 negative
examples. Then the entropy of S relative to this Boolean classification is

Entropy(S) = - (9/14) * log2(9/14) - (5/14) * log2(5/14) = 0.940

· ID3 determines the information gain for each candidate attribute (i.e., Outlook,
Temperature, Humidity, and Wind), then selects the one with the highest information gain.
· The information gain values for all four attributes are:

Gain(S, Outlook) = 0.246
Gain(S, Humidity) = 0.151
Gain(S, Wind) = 0.048
Gain(S, Temperature) = 0.029

· According to the information gain measure, the Outlook attribute provides the best
prediction of the target attribute, PlayTennis, over the training examples. Therefore,
Outlook is selected as the decision attribute for the root node, and branches are created
below the root for each of its possible values, i.e., Sunny, Overcast, and Rain.
The same process is then repeated below each branch. For the Rain branch, the examples that remain are

SRain = {D4, D5, D6, D10, D14}

and the information gains of the remaining attributes over SRain are:

Gain(SRain, Humidity) = 0.970 - (2/5)1.0 - (3/5)0.918 = 0.019

Gain(SRain, Temperature) = 0.970 - (0/5)0.0 - (3/5)0.918 - (2/5)1.0 = 0.019

Gain(SRain, Wind) = 0.970 - (3/5)0.0 - (2/5)0.0 = 0.970

Wind has the highest information gain over SRain, so it is selected as the decision attribute for the node below the Rain branch.
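As a sanity check, these gains can be recomputed with the information_gain sketch from the attribute-selection section; the dictionary keys below simply name the columns of the table, and the printed values match the figures above up to rounding (0.971 vs. 0.970).

s_rain = [
    ({"Temperature": "Mild", "Humidity": "High",   "Wind": "Weak"},   "Yes"),  # D4
    ({"Temperature": "Cool", "Humidity": "Normal", "Wind": "Weak"},   "Yes"),  # D5
    ({"Temperature": "Cool", "Humidity": "Normal", "Wind": "Strong"}, "No"),   # D6
    ({"Temperature": "Mild", "Humidity": "Normal", "Wind": "Weak"},   "Yes"),  # D10
    ({"Temperature": "Mild", "Humidity": "High",   "Wind": "Strong"}, "No"),   # D14
]
rows = [features for features, _ in s_rain]
labels = [label for _, label in s_rain]
for attribute in ("Humidity", "Temperature", "Wind"):
    print(attribute, round(information_gain(rows, labels, attribute), 3))
# Humidity 0.02, Temperature 0.02, Wind 0.971 -> Wind is chosen for the Rain branch.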
