Decision Trees and Random Forests∗
February 20, 2025
1 Introduction
In the previous lectures we dealt with the design of linear classifiers described by linear
discriminant functions g(x) of the form

    g(x) = \mathbf{w}^T \mathbf{x} + b,

whose decision boundary is the hyperplane g(x) = 0. Provided that the dataset is linearly
separable, we saw that algorithms such as the standard linear classifier and the SVM can
derive the weights w and the bias b that optimally fit the input data samples. When the
classes slightly overlap, the soft-margin SVM is exploited instead. From
this lecture, we will deal with problems that are not linearly separable and for which
the design of a linear classifier, even in an optimal way, does not lead to satisfactory
performance. The design of nonlinear classifiers emerges now as an inescapable necessity.
One does not need to look far to find problems that are not linearly separable.
The well-known XOR Boolean function is a typical example of such a problem
(see Figure 1). Depending on the values of the input binary data x = [x1 , x2 ], the output
is either 0 or 1, and x is classified into one of the two classes +1 or -1. The corresponding
truth table for the XOR operation is shown in Table 1, where a true value is expected if
the two inputs are not equal and a false value if they are equal.
Figure 1: The XOR problem. The blue dots denote true and red dots denote false.
∗ References:
- Christopher Bishop. Pattern Recognition and Machine Learning. 2006.
- Sergios Theodoridis and Konstantinos Koutroumbas. Pattern Recognition. 2009.
- Understanding Random Forests. March 7, 2022.
    x1   x2    y
     0    0   +1
     0    1   -1
     1    0   -1
     1    1   +1

Table 1: Truth labels for the XOR problem.
It is apparent from the XOR problem that no single straight line exists that can sep-
arate the two classes, as the decision boundary between the two classes is naturally not
linear. Non-linear classifiers are specifically designed to cope with such problems. Common
non-linear classifiers include decision trees, random forests, the Multi-Layer Perceptron
(MLP), and other (deep) neural networks. In this lecture, we mainly discuss
decision trees and random forests.
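As a minimal sketch of this point (using scikit-learn, which these notes mention later; the
specific classifiers below are an illustrative choice), one can check that no linear classifier
fits the four XOR points exactly, whereas a small decision tree does:

    import numpy as np
    from sklearn.svm import LinearSVC
    from sklearn.tree import DecisionTreeClassifier

    # The four XOR points and their class labels from Table 1.
    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
    y = np.array([+1, -1, -1, +1])

    # A linear classifier cannot separate these four points...
    linear = LinearSVC().fit(X, y)
    print("linear accuracy:", linear.score(X, y))  # below 1.0: no separating line exists

    # ...whereas a small (depth-2) decision tree classifies them perfectly.
    tree = DecisionTreeClassifier().fit(X, y)
    print("tree accuracy:", tree.score(X, y))      # 1.0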
2 Decision trees
Decision trees are known as multistage decision systems in which classes are sequentially
rejected until a finally accepted class is reached. To this end, the feature space is split into
unique regions, corresponding to the classes, in a sequential manner. When a feature
vector arrives, the region to which it will be assigned is found via a sequence of decisions
along a path of nodes of an appropriately constructed
tree. Such schemes offer advantages when a large number of classes are involved. The
most popular decision trees are those that split the space into hyperrectangles with sides
parallel to the axes. The sequence of decisions is applied to individual features, and the
questions to be answered are of the form

    is feature x_i ≤ a?
where a is a threshold value. Such trees are known as Ordinary Binary Classification
Trees (OBCTs). Other types of trees are also possible that split the space into convex
polyhedral cells or pieces of spheres. Figure 2 gives an example of the OBCT.
Figure 2: Decision tree classification. (a) OBCT divides the feature space into sub-
regions. (b) A set of questions to be asked for classification.
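Read as code, a tree of the OBCT type is simply a nested sequence of axis-parallel tests.
The following sketch is purely illustrative (the features, thresholds, and class names are
made up), but it shows how a path of "is x_i ≤ a?" questions leads to a leaf:

    def classify(x):
        """Walk an illustrative OBCT: each node asks 'is feature x_i <= a?'."""
        if x[0] <= 0.5:          # root node: question on feature x_1
            if x[1] <= 0.3:      # left child: question on feature x_2
                return "class A"
            return "class B"
        else:
            if x[1] <= 0.7:      # right child: question on feature x_2
                return "class C"
            return "class D"

    print(classify([0.2, 0.1]))  # reaches the leaf "class A"
    print(classify([0.9, 0.9]))  # reaches the leaf "class D"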
Splitting criteria. Every binary split of a node generates two descendant nodes.
For the tree growing methodology, from the root node down to the leaves, to make sense,
every split must generate subsets that are more “class homogeneous” compared to the
ancestor’s subset. This means that the training feature vectors in each one of the new
subsets show a higher preference for specific class(es), whereas data in ancestors are more
equally distributed among the classes. The goal, therefore, is to define a measure that
quantifies node impurity and to split the node so that the overall impurity of the descendant
nodes is optimally decreased with respect to the ancestor node's impurity.
We consider two different criteria for evaluating the impurity of a given node t: Gini
impurity and Entropy impurity.
• Gini impurity. It is given by

      I(t) = 1 - \sum_{k=1}^{K} p(y_k | t)^2,                    (1)

  where p(y_k | t) is the probability that a vector in node t belongs to class y_k, and K
  is the total number of classes in the current node. If node t contains only one class,
  I(t) = 0. For a two-class problem, the Gini impurity reaches its maximum value
  when the two classes are equally probable, i.e., I(t) = 1 - (0.5^2 + 0.5^2) = 0.5.
• Entropy impurity. It is given by

      I(t) = - \sum_{k=1}^{K} p(y_k | t) \log_2 p(y_k | t),       (2)

  where \log_2 is the logarithm with base 2. This is nothing other than the entropy
  associated with the data contained in node t, known from Shannon's information
  theory. As with the Gini impurity, the entropy impurity is 0 if node t contains only
  one class, and it reaches its maximum value when the classes are equally probable;
  for two classes, I(t)_max = -(0.5 \log_2 0.5 + 0.5 \log_2 0.5) = 1.
Computationally, the entropy is more expensive since it makes use of logarithms;
consequently, the calculation of the Gini index is faster. In practice, the Gini impurity is
more widely used (e.g., it is the default splitting criterion in the scikit-learn library).
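As a small numerical sketch (not part of the original notes; the function names are
illustrative), the two impurity measures can be computed directly from the class labels
in a node:

    import numpy as np

    def gini_impurity(labels):
        """Gini impurity I(t) = 1 - sum_k p(y_k|t)^2 of the labels in a node."""
        _, counts = np.unique(labels, return_counts=True)
        p = counts / counts.sum()
        return 1.0 - np.sum(p ** 2)

    def entropy_impurity(labels):
        """Entropy impurity I(t) = -sum_k p(y_k|t) log2 p(y_k|t)."""
        _, counts = np.unique(labels, return_counts=True)
        p = counts / counts.sum()
        return -np.sum(p * np.log2(p))

    # A balanced two-class node reaches the maximum impurity under both measures.
    balanced = np.array([+1, +1, -1, -1])
    print(gini_impurity(balanced))     # 0.5
    print(entropy_impurity(balanced))  # 1.0

    # A pure node has zero impurity.
    pure = np.array([+1, +1, +1])
    print(gini_impurity(pure))         # 0.0
    print(entropy_impurity(pure))      # -0.0 (numerically zero)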
Let us denote the nodes produced by the split by t_Y and t_N, according to the "Yes" or
"No" answer to the single question adopted for node t, which is also referred to as the
ancestor node. Let N_t denote the number of training samples contained in t, and let
N_{t_Y} and N_{t_N} denote the number of samples in the two descendant nodes, respectively.
The decrease in node impurity is defined as

    \Delta I(t) = I(t) - \frac{N_{t_Y}}{N_t} I(t_Y) - \frac{N_{t_N}}{N_t} I(t_N),

where I(t_Y) and I(t_N) are the impurities of the nodes t_Y and t_N, respectively. The goal
is to adopt, from the set of candidate questions, the one that performs the split leading
to the highest decrease in impurity.
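To make the split selection concrete, the following sketch (illustrative only, not taken
from these notes) evaluates every candidate threshold a for a question "is x_i ≤ a?" on a
single feature and keeps the one with the largest impurity decrease:

    import numpy as np

    def gini_impurity(labels):
        """Gini impurity of the labels in a node (see the previous sketch)."""
        _, counts = np.unique(labels, return_counts=True)
        p = counts / counts.sum()
        return 1.0 - np.sum(p ** 2)

    def impurity_decrease(labels, left_mask):
        """Delta I(t) = I(t) - (N_Y/N_t) I(t_Y) - (N_N/N_t) I(t_N) for a boolean split."""
        n = len(labels)
        n_yes = left_mask.sum()
        n_no = n - n_yes
        if n_yes == 0 or n_no == 0:
            return 0.0  # a degenerate split does not reduce impurity
        return (gini_impurity(labels)
                - n_yes / n * gini_impurity(labels[left_mask])
                - n_no / n * gini_impurity(labels[~left_mask]))

    def best_threshold(feature, labels):
        """Return the threshold a maximizing the impurity decrease of 'x_i <= a'."""
        best_a, best_gain = None, -np.inf
        for a in np.unique(feature):
            gain = impurity_decrease(labels, feature <= a)
            if gain > best_gain:
                best_a, best_gain = a, gain
        return best_a, best_gain

    x = np.array([0.1, 0.35, 0.4, 0.8, 0.9])
    y = np.array([+1, +1, +1, -1, -1])
    print(best_threshold(x, y))  # threshold 0.4: the split giving two pure descendants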
Stopping rule. The natural question that now arises is when to stop splitting a node
and declare it a leaf of the tree. One possibility is to adopt a threshold T and stop
splitting if the maximum value of \Delta I(t) over all possible splits is less than T. Other
alternatives are to stop splitting either if the subset N_t is small enough or if the
data in node t is pure (i.e., all data samples in node t belong to the same class).
A critical factor in designing a decision tree is its size. As was the case with the MLPs,
the size of a tree must be large enough but not too large; otherwise, it tends to learn
the particular details of the training set and exhibits poor generalization performance.
Experience has shown that using a threshold value on the impurity decrease as the
stop-splitting rule does not lead to trees of the right size, because it usually stops tree
growing either too early or too late. The most commonly used approach is therefore to
grow a tree up to a large size first and then prune nodes according to a pruning criterion.
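In scikit-learn, for instance, these stopping and pruning strategies are exposed as
hyperparameters of DecisionTreeClassifier; the sketch below only illustrates which
parameter corresponds to which rule (the data set and the parameter values are arbitrary
choices):

    from sklearn.datasets import load_iris
    from sklearn.tree import DecisionTreeClassifier

    clf = DecisionTreeClassifier(
        criterion="gini",            # splitting criterion: "gini" or "entropy"
        min_impurity_decrease=0.01,  # threshold T on the impurity decrease Delta I(t)
        min_samples_split=10,        # stop splitting if the subset N_t is small enough
        max_depth=None,              # optionally limit the overall size of the tree
        ccp_alpha=0.005,             # cost-complexity pruning of the fully grown tree
    )
    X, y = load_iris(return_X_y=True)
    clf.fit(X, y)
    print(clf.get_depth(), clf.get_n_leaves())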
A drawback associated with tree classifiers is their high variance. In practice, it is
common that a small change in the training data set results in a very different tree. The
reason for this lies in the hierarchical nature of the tree classifiers. In the next section,
random forests are introduced to overcome this limitation.
3 Random forests
A common strategy to overcome the high variance of decision tree classifiers is to com-
bine them. Thus, one can exploit their advantages to reach an overall better performance
than could be achieved by using each of them separately. An important observation that
justifies such an approach is the following. From the different (candidate) decision trees
we design, we would normally choose the one that results in the best performance (i.e.,
the highest classification accuracy). However, different trees may fail (to classify correctly)
on different data distributions; that is, even the "best" decision tree can fail on datasets
on which other trees succeed. To this end, the random forest algorithm was proposed.
A random forest, as its name implies, consists of a large number of individual decision
trees that operate as an ensemble. Each tree in the random forest outputs a class
prediction, and the class with the most votes becomes the model's prediction. Figure 3
gives an illustration of the random forest model.
Figure 3: A random forest model.
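As a minimal usage sketch with scikit-learn (the toy data set and parameter values below
are illustrative choices, not part of the notes):

    from sklearn.datasets import make_moons
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import train_test_split

    # A small non-linearly separable toy data set.
    X, y = make_moons(n_samples=500, noise=0.3, random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    # An ensemble of 100 trees; the final prediction aggregates the votes of the trees.
    forest = RandomForestClassifier(n_estimators=100, random_state=0)
    forest.fit(X_train, y_train)
    print("test accuracy:", forest.score(X_test, y_test))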
The fundamental concept behind random forests is a simple but powerful one: the
wisdom of crowds. The reason that the random forest model works so well is that a large
number of relatively uncorrelated classifiers (trees) operating as a committee will outperform
any of the individual constituent classifiers. Two aspects are involved to ensure the
independence among the various tree classifiers in the whole forest:
• Bagging (also referred to as bootstrap aggregation). It is a technique that can
reduce variance and improve the generalization error performance. The basic idea
is to create a number (say B) of variants, X_1, X_2, X_3, ..., X_B, of the training set by
uniformly sampling from the original dataset X with replacement (i.e., the same sample
can be selected multiple times). For each of the
training set variants Xi , a tree Ti is constructed. The final decision is in favor of
the class predicted by the majority of the sub-classifiers T_i (i = 1, 2, 3, ..., B). Note
that, because we sample with replacement, we are not splitting the training data into
smaller chunks and training each tree on a different chunk. Rather, if we have a sample of size N,
we are still feeding each tree a training set of size N (unless specified otherwise).
But instead of the original training data, we take a random sample of size N with
a certain level of data repetitiveness. For instance, if our training data was [1, 2,
3, 4, 5, 6] then we might give one of our trees the following list [1, 2, 2, 3, 6, 6].
Notice that both lists are of length six and that “2” and “6” are both repeated in
the randomly selected training data we give to our tree (because we sample with
replacement).
• Random feature selection. In a normal decision tree, when it is time to split
a node, we consider every possible feature and pick the one that produces the
most separation between the observations in the left node vs. those in the right
node using impurity measures. In contrast, each tree in a random forest can pick
only from a random subset of features. This forces more variation amongst the
trees in the model and ultimately results in lower correlation across trees and more
diversification. Figure 4 gives an illustration of random feature selection; a short sketch
of both ingredients follows the figure.
Figure 4: Random feature selection.
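Both ingredients can be sketched in a few lines. The code below is purely illustrative (it
mirrors the [1, 2, 3, 4, 5, 6] example above rather than reproducing any particular
library's implementation):

    import numpy as np

    rng = np.random.default_rng(0)

    # Bagging: a bootstrap variant X_i of the training set, sampled with replacement.
    data = np.array([1, 2, 3, 4, 5, 6])
    bootstrap = rng.choice(data, size=len(data), replace=True)
    print(bootstrap)  # same length as the original, but some samples are repeated

    # Random feature selection: at each split, only a random subset of the features
    # (here roughly the square root of the total number) is considered as candidates.
    n_features = 10
    subset = rng.choice(n_features, size=int(np.sqrt(n_features)), replace=False)
    print(subset)     # e.g. 3 of the 10 feature indices to evaluate for this split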
Finally, it must be stated that there are close similarities between the decision trees
(random forests) and the neural network classifiers. Both aim at forming complex de-
cision boundaries in the feature space. A major difference lies in the way decisions are
made. Decision trees (random forests) employ a hierarchically structured decision func-
tion sequentially. In contrast, neural networks utilize a set of soft (not final) decisions
in a parallel fashion. Furthermore, their training is performed via different philosophies.
However, despite their differences, it has been shown that linear tree classifiers (with a
linear splitting criterion) can be adequately mapped to an MLP structure. So far, from
the performance point of view, comparative studies seem to give an advantage to the
MLPs concerning the classification error, and an advantage to the decision trees (random
forests) concerning the required training time.