Random Forest Algorithm: Detailed Notes
Based on GeeksforGeeks and Towards Data Science
September 14, 2025
Contents
1 Random Forest Definition
2 Decision Tree Definition
3 Classification Tree vs Regression Tree
4 Why Decision Tree Learning is Greedy? Is it a Problem? How to Overcome?
5 Advantages and Disadvantages of Decision Trees
6 Why Random Forest is Better than Single Decision Tree?
7 Limitations of Random Forest
8 Bagging in Random Forest, Out-of-Bag Error
9 Main Steps of Random Forest Algorithm
10 Advantages and Disadvantages of Random Forest
11 Assumptions of Random Forest
12 When to Use Random Forest Over Other Algorithms
1 Random Forest Definition
Random Forest is a machine learning algorithm that uses many decision trees to make better predictions. Each tree is trained on a different random part of the data, and their results are combined by voting for classification or averaging for regression, which makes Random Forest an ensemble learning technique. This helps improve accuracy and reduce errors.
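As a concrete illustration of this definition, here is a minimal sketch using scikit-learn's RandomForestClassifier; the dataset, split, and parameter values are illustrative choices, not part of the original notes:

```python
# Minimal sketch: an ensemble of decision trees combined by majority voting.
# Assumes scikit-learn is installed; dataset and parameters are illustrative.
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# 100 trees, each trained on a bootstrap sample with random feature subsets per split
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)

# Each tree votes; the majority class is the final prediction
print("Accuracy:", accuracy_score(y_test, model.predict(X_test)))
```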
2 Decision Tree Definition
A Decision Tree helps us make decisions by mapping out different choices and their possible outcomes. It is used in machine learning for tasks like classification and prediction. The sections below cover the types of Decision Trees and other core concepts.
3 Classification Tree vs Regression Tree
• Classification Tree:
– Used for predicting categorical outcomes (discrete values)
– Final prediction is the majority class of the training samples in the terminal (leaf) node
– Example: Predicting survival on Titanic (Survived/Not Survived)
– Evaluation metrics: Accuracy, Precision, Recall, F1-score
• Regression Tree:
– Used for predicting continuous numerical values
– Final prediction is the average of the target values in the terminal (leaf) node
– Example: Predicting house prices
– Evaluation metrics: Mean Squared Error (MSE), R-squared (a sketch contrasting both tree types follows this list)
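A short, illustrative sketch contrasting the two tree types with scikit-learn; the datasets, depths, and metrics are assumed for demonstration:

```python
# Classification tree vs regression tree: a minimal, illustrative sketch.
from sklearn.datasets import load_iris, make_regression
from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, mean_squared_error

# Classification tree: predicts discrete classes, evaluated with accuracy
Xc, yc = load_iris(return_X_y=True)
Xc_tr, Xc_te, yc_tr, yc_te = train_test_split(Xc, yc, random_state=0)
clf = DecisionTreeClassifier(max_depth=3, random_state=0).fit(Xc_tr, yc_tr)
print("Accuracy:", accuracy_score(yc_te, clf.predict(Xc_te)))

# Regression tree: predicts continuous values, evaluated with MSE
Xr, yr = make_regression(n_samples=500, n_features=5, noise=10.0, random_state=0)
Xr_tr, Xr_te, yr_tr, yr_te = train_test_split(Xr, yr, random_state=0)
reg = DecisionTreeRegressor(max_depth=3, random_state=0).fit(Xr_tr, yr_tr)
print("MSE:", mean_squared_error(yr_te, reg.predict(Xr_te)))
```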
4 Why Decision Tree Learning is Greedy? Is it a
Problem? How to Overcome?
Why Greedy?
• Decision trees use a greedy approach called recursive binary splitting
• At each step, the algorithm makes the locally optimal choice (the best split according to metrics like Gini impurity or information gain; a worked sketch of Gini impurity follows this list)
• Doesn’t consider future consequences of current decisions
• Aims to maximize immediate homogeneity in child nodes
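To make the "locally optimal choice" concrete, here is a small worked sketch of Gini impurity for one candidate split; the class counts are made up for demonstration:

```python
# Illustrative sketch: Gini impurity of a node and the weighted impurity of a split.
from collections import Counter

def gini(labels):
    """Gini impurity = 1 - sum(p_k^2) over class proportions p_k."""
    n = len(labels)
    counts = Counter(labels)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())

# A greedy splitter evaluates each candidate split by the weighted child impurity
parent = ["yes"] * 5 + ["no"] * 5                     # impurity 0.5 (perfectly mixed)
left, right = ["yes"] * 4 + ["no"], ["yes"] + ["no"] * 4

weighted = (len(left) * gini(left) + len(right) * gini(right)) / len(parent)
print("Parent Gini:", gini(parent))       # 0.5
print("Weighted child Gini:", weighted)   # 0.32 -> this split looks good *locally*
```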
Is it a Problem?
• Yes, because the locally optimal split may not lead to the globally optimal tree
• Can result in suboptimal trees that don’t capture the true underlying patterns
• Often leads to overfitting, especially with deep trees
How to Overcome?
• Pruning: Remove branches that provide little predictive power
• Ensemble Methods: Combine multiple trees (e.g., Random Forest)
• Setting Constraints: Limit tree depth, minimum samples per leaf, etc. (a sketch of these options follows this list)
• Random Forest: Specifically addresses this by averaging multiple trees
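A brief sketch of the constraint and pruning options listed above, using scikit-learn parameter names (max_depth, min_samples_leaf, ccp_alpha); the specific values and dataset are illustrative:

```python
# Illustrative sketch: limiting tree growth and cost-complexity pruning in scikit-learn.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Setting constraints: cap depth and require a minimum number of samples per leaf
constrained = DecisionTreeClassifier(max_depth=4, min_samples_leaf=10, random_state=0)
constrained.fit(X_tr, y_tr)

# Pruning: cost-complexity pruning removes branches that add little predictive power
pruned = DecisionTreeClassifier(ccp_alpha=0.01, random_state=0).fit(X_tr, y_tr)

print("Constrained tree test score:", constrained.score(X_te, y_te))
print("Pruned tree test score:     ", pruned.score(X_te, y_te))
```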
5 Advantages and Disadvantages of Decision Trees
Advantages:
• Easy to Understand: Decision Trees are visual, which makes it easy to follow the decision-making process.
• Versatility: Can be used for both classification and regression problems.
• No Need for Feature Scaling: Unlike many machine learning models, they don't require us to scale or normalize our data.
• Handles Non-linear Relationships: They capture complex, non-linear relationships between features and outcomes effectively.
• Interpretability: The tree structure is easy to interpret, allowing users to understand the reasoning behind each decision.
• Handles Missing Data: They can handle missing values by using strategies like assigning the most common value or ignoring missing data during splits.
Disadvantages:
• Overfitting: They can overfit the training data if they are too deep, which means they memorize the data instead of learning general patterns. This leads to poor performance on unseen data.
• Instability: They can be unstable, meaning that small changes in the data may lead to significant differences in the tree structure and predictions.
• Bias towards Features with Many Categories: They can become biased toward features with many distinct values, focusing too much on them and potentially missing other important features, which can reduce prediction accuracy.
• Difficulty in Capturing Complex Interactions: Decision Trees may struggle to capture complex interactions between features, making them less effective for certain types of data.
• Computationally Expensive for Large Datasets: For large datasets, building
and pruning a Decision Tree can be computationally intensive, especially as the
tree depth increases.
6 Why Random Forest is Better than Single Decision
Tree?
• Reduces Overfitting: By averaging multiple trees, Random Forest reduces vari-
ance without increasing bias
• Improved Accuracy: Combines predictions from many trees (wisdom of the
crowd)
• Handles Noise Better: Less sensitive to noise and outliers in the data
• More Stable: Small changes in data don’t drastically affect the model
• Feature Importance: Provides better estimates of feature importance
• Handles Missing Data: Better at handling missing values
• Robust to Overfitting: Especially with many trees and proper feature sampling
7 Limitations of Random Forest
• Computational Complexity: More expensive to train and predict than single
trees
• Less Interpretability: Harder to visualize and understand than a single decision
tree
• Memory Intensive: Requires storing multiple trees in memory
• Slower Prediction: Prediction time increases with number of trees
• Bias Toward Features with Many Categories: Like decision trees, can be
biased toward features with many levels
• Not Ideal for Linear Relationships: May not perform as well as linear models
when relationships are truly linear
8 Bagging in Random Forest, Out-of-Bag Error
What is Bagging?
Bagging is the application of the Bootstrap procedure to a high-variance machine learning algorithm, typically decision trees. It is a general procedure that can be used to reduce the variance of algorithms that have high variance. Decision trees, such as classification and regression trees, are a typical example of such high-variance algorithms.
Added by Sanjoy: Bagging, or Bootstrap Aggregating, is an ensemble machine
learning technique that improves model accuracy and reduces variance by training mul-
tiple models on different subsets of the training data, generated through bootstrapping.
Each subset is created by sampling with replacement from the original dataset. The
predictions from these individual models are then combined, either by averaging for re-
gression tasks or by majority vote for classification tasks, to produce a more robust and
accurate final prediction.
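To make the bootstrap-and-aggregate idea concrete, here is a minimal sketch using numpy and scikit-learn trees; the dataset and the number of models are arbitrary, illustrative choices:

```python
# Minimal bagging sketch: bootstrap samples -> one tree per sample -> majority vote.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

rng = np.random.default_rng(0)
trees = []
for _ in range(25):
    # Bootstrap sample: draw n indices with replacement from the training set
    idx = rng.integers(0, len(X_tr), size=len(X_tr))
    trees.append(DecisionTreeClassifier(random_state=0).fit(X_tr[idx], y_tr[idx]))

# Aggregate: majority vote across the 25 trees (averaging would be used for regression)
votes = np.stack([t.predict(X_te) for t in trees])
majority = (votes.mean(axis=0) > 0.5).astype(int)
print("Bagged accuracy:", (majority == y_te).mean())
```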
How is it Used in Random Forest?
Random forest is a modification of bagging that further improves the performance of the
model by introducing randomness in the feature selection process. In random forest, we
create multiple decision trees using a subset of the original features. In each node of the
tree, instead of using all the features, we randomly select a subset of features to split
the data. This process is repeated for each node, resulting in a decision tree that uses a
subset of the features.
By using a subset of the features at each node, random forest introduces diversity in
the decision trees, which further reduces the variance of the model. Moreover, the feature
selection process prevents the trees from being highly correlated, which is a problem in
bagging. Therefore, the random forest can improve the accuracy of a model by reducing
overfitting and increasing the diversity of the trees.
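In scikit-learn, this per-split feature subsetting is controlled by the max_features parameter. The sketch below (with an illustrative dataset and parameter values) contrasts sqrt-of-features selection with using all features, which corresponds to plain bagging of trees:

```python
# Illustrative sketch: per-split feature subsetting via max_features in scikit-learn.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = load_breast_cancer(return_X_y=True)

# max_features='sqrt' considers roughly sqrt(p) random features at each split
# (decorrelating the trees); max_features=None considers all p features, which
# reduces the forest to plain bagging of decision trees.
for mf in ["sqrt", None]:
    rf = RandomForestClassifier(n_estimators=100, max_features=mf, random_state=0)
    print(mf, cross_val_score(rf, X, y, cv=5).mean())
```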
Out-of-Bag (OOB) Error
• OOB error is an estimate of prediction error using samples not included in bootstrap
• For each tree, about one-third of the samples (roughly 1/e ≈ 36.8%) are not used in training; these are the out-of-bag samples
• These OOB samples can be used as a validation set (see the sketch after this list)
• OOB error is calculated by aggregating predictions on OOB samples across all trees
• Provides unbiased estimate of generalization error without needing a separate vali-
dation set
• Formula: OOB Error = (Number of misclassified OOB samples) / (Total number of OOB samples)
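A short sketch of obtaining the OOB estimate via scikit-learn's oob_score option; the dataset and number of trees are illustrative choices:

```python
# Illustrative sketch: out-of-bag (OOB) error without a separate validation set.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier

X, y = load_breast_cancer(return_X_y=True)

# oob_score=True scores each sample using only the trees that did NOT see it in training
rf = RandomForestClassifier(n_estimators=200, oob_score=True, random_state=0).fit(X, y)

print("OOB accuracy:", rf.oob_score_)
print("OOB error:   ", 1 - rf.oob_score_)   # fraction of misclassified OOB samples
```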
9 Main Steps of Random Forest Algorithm
1. Create Decision Trees: The algorithm builds many decision trees, each using a random part of the data, so every tree is a bit different.
2. Pick Random Features: When building each tree it doesn’t look at all the
features (columns) at once. It picks a few at random to decide how to split the
data. This helps the trees stay different from each other.
3. Each Tree Makes Predictions: Every tree gives its own answer or prediction based on what it learned from its part of the data.
4. Combine Predictions: For classification, the final answer is the category that most trees agree on (majority voting); for regression, the final answer is the average of all the trees' predictions.
Updated by Sanjoy:
1. Bootstrap Sampling: Each tree gets its own unique training set, created by
randomly sampling from the original data with replacement. This means some
data points may appear multiple times while others aren’t used.
2. Random Feature Selection: When making a split, each tree only considers a
random subset of features (typically square root of total features).
3. Growing Trees: Each tree grows using only its bootstrap sample and selected fea-
tures, making splits until it reaches a stopping point (like pure groups or minimum
sample size).
4. Final Prediction: All trees vote together for the final prediction. For classification, take the majority vote of class predictions; for regression, average the predicted values from all trees. (A from-scratch sketch of these steps follows.)
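The four steps above can be sketched from scratch in a few lines. This is a simplified, illustrative implementation assuming scikit-learn decision trees as the base learners, not a production version:

```python
# Simplified from-scratch sketch of the Random Forest steps described above.
import numpy as np
from collections import Counter
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

rng = np.random.default_rng(0)
forest = []
for _ in range(50):
    # Step 1: bootstrap sampling (with replacement) gives each tree its own training set
    idx = rng.integers(0, len(X_tr), size=len(X_tr))
    # Steps 2-3: grow a tree that considers only sqrt(p) random features at each split
    tree = DecisionTreeClassifier(max_features="sqrt",
                                  random_state=int(rng.integers(1_000_000)))
    forest.append(tree.fit(X_tr[idx], y_tr[idx]))

# Step 4: combine predictions by majority vote (average them instead for regression)
all_preds = np.stack([tree.predict(X_te) for tree in forest])   # shape (n_trees, n_test)
majority = [Counter(col).most_common(1)[0][0] for col in all_preds.T]
print("Forest accuracy:", np.mean(np.array(majority) == y_te))
```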
10 Advantages and Disadvantages of Random Forest
Advantages:
• Random Forest provides very accurate predictions even with large datasets.
• Random Forest can handle missing data well without compromising accuracy.
• It doesn't require normalization or standardization of the dataset.
• Combining multiple decision trees reduces the risk of overfitting.
Disadvantages:
• It can be computationally expensive especially with a large number of trees.
• It’s harder to interpret than simpler models such as a single decision tree.
11 Assumptions of Random Forest
1. Each tree makes its own decisions: Every tree in the forest makes its own
predictions without relying on others.
2. Random parts of the data are used: Each tree is built using random samples
and features to reduce mistakes.
3. Enough data is needed: Sufficient data ensures the trees are different and learn varied, unique patterns.
4. Different predictions improve accuracy: Combining the predictions from
different trees leads to a more accurate final result.
12 When to Use Random Forest Over Other Algo-
rithms
• Handles Missing Data: It can work even if some data is missing, so you don't always need to fill in the gaps yourself.
• Shows Feature Importance: It tells you which features (columns) are most useful for making predictions, which helps you understand your data better (a sketch follows this list).
• Works Well with Big and Complex Data: It can handle large datasets with
many features without slowing down or losing accuracy.
• Used for Different Tasks: You can use it for both classification (like predicting types or labels) and regression (like predicting numbers or amounts).
• Handles High-Dimensionality: Excels with datasets containing many features
(even p ≫ n cases) through feature randomness and subset selection.
• Robust to Noise and Outliers: Bagging and feature sampling reduce overfitting
and increase stability compared to single decision trees.
• Handles Mixed Data Types: Naturally manages both numerical and categorical
features without extensive preprocessing.
• Non-Parametric Flexibility: Captures complex non-linear relationships without
assuming data distributions.
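As referenced in the "Shows Feature Importance" point above, here is a minimal sketch of reading impurity-based feature importances from a fitted forest; the dataset and number of trees are illustrative choices:

```python
# Illustrative sketch: ranking features by impurity-based importance.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier

data = load_breast_cancer()
rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(data.data, data.target)

# feature_importances_ sums to 1.0; higher values mean the feature drove more splits
ranked = sorted(zip(data.feature_names, rf.feature_importances_),
                key=lambda pair: pair[1], reverse=True)
for name, score in ranked[:5]:
    print(f"{name:25s} {score:.3f}")
```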