Random Forest Algorithm: Detailed Notes
Based on GeeksforGeeks and Towards Data Science
September 14, 2025
Contents
1 Random Forest Definition
2 Decision Tree Definition
3 Classification Tree vs Regression Tree
4 Why Decision Tree Learning is Greedy? Is it a Problem? How to Overcome?
5 Advantages and Disadvantages of Decision Trees
6 Why Random Forest is Better than Single Decision Tree?
7 Limitations of Random Forest
8 Bagging in Random Forest, Out-of-Bag Error
9 Main Steps of Random Forest Algorithm
10 Advantages and Disadvantages of Random Forest
11 Assumptions of Random Forest
12 When to Use Random Forest Over Other Algorithms
1 Random Forest Definition
Random Forest is a machine learning algorithm that uses many decision trees to make better predictions. Each tree is trained on a different random part of the data, and their results are combined by voting for classification or averaging for regression, which makes Random Forest an ensemble learning technique. This helps improve accuracy and reduce errors.
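As a concrete illustration of this definition, here is a minimal sketch using scikit-learn's RandomForestClassifier; the dataset, split, and parameter values are illustrative choices, not part of the original notes:

```python
# Minimal sketch: an ensemble of decision trees combined by majority voting.
# Assumes scikit-learn is installed; dataset and parameters are illustrative.
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# 100 trees, each trained on a bootstrap sample with random feature subsets per split
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)

# Each tree votes; the majority class is the final prediction
print("Accuracy:", accuracy_score(y_test, model.predict(X_test)))
```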
2 Decision Tree Definition
A Decision Tree helps us make decisions by mapping out different choices and their possible outcomes. It is used in machine learning for tasks like classification and prediction. The sections below cover the types of Decision Trees and other core concepts.
3 Classification Tree vs Regression Tree
• Classification Tree:
– Used for predicting categorical outcomes (discrete values)
– Final prediction is the majority class of the training samples in the terminal (leaf) node
– Example: Predicting survival on Titanic (Survived/Not Survived)
– Evaluation metrics: Accuracy, Precision, Recall, F1-score
• Regression Tree:
– Used for predicting continuous numerical values
– Final prediction is the average of the target values in the terminal (leaf) node
– Example: Predicting house prices
– Evaluation metrics: Mean Squared Error (MSE), R-squared (a sketch contrasting both tree types follows this list)
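A short, illustrative sketch contrasting the two tree types with scikit-learn; the datasets, depths, and metrics are assumed for demonstration:

```python
# Classification tree vs regression tree: a minimal, illustrative sketch.
from sklearn.datasets import load_iris, make_regression
from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, mean_squared_error

# Classification tree: predicts discrete classes, evaluated with accuracy
Xc, yc = load_iris(return_X_y=True)
Xc_tr, Xc_te, yc_tr, yc_te = train_test_split(Xc, yc, random_state=0)
clf = DecisionTreeClassifier(max_depth=3, random_state=0).fit(Xc_tr, yc_tr)
print("Accuracy:", accuracy_score(yc_te, clf.predict(Xc_te)))

# Regression tree: predicts continuous values, evaluated with MSE
Xr, yr = make_regression(n_samples=500, n_features=5, noise=10.0, random_state=0)
Xr_tr, Xr_te, yr_tr, yr_te = train_test_split(Xr, yr, random_state=0)
reg = DecisionTreeRegressor(max_depth=3, random_state=0).fit(Xr_tr, yr_tr)
print("MSE:", mean_squared_error(yr_te, reg.predict(Xr_te)))
```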
4 Why Decision Tree Learning is Greedy? Is it a
Problem? How to Overcome?
Why Greedy?
• Decision trees use a greedy approach called recursive binary splitting
• At each step, the algorithm makes the locally optimal choice (the best split according to metrics like Gini impurity or information gain; a worked sketch of Gini impurity follows this list)
• Doesn’t consider future consequences of current decisions
• Aims to maximize immediate homogeneity in child nodes
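To make the "locally optimal choice" concrete, here is a small worked sketch of Gini impurity for one candidate split; the class counts are made up for demonstration:

```python
# Illustrative sketch: Gini impurity of a node and the weighted impurity of a split.
from collections import Counter

def gini(labels):
    """Gini impurity = 1 - sum(p_k^2) over class proportions p_k."""
    n = len(labels)
    counts = Counter(labels)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())

# A greedy splitter evaluates each candidate split by the weighted child impurity
parent = ["yes"] * 5 + ["no"] * 5                     # impurity 0.5 (perfectly mixed)
left, right = ["yes"] * 4 + ["no"], ["yes"] + ["no"] * 4

weighted = (len(left) * gini(left) + len(right) * gini(right)) / len(parent)
print("Parent Gini:", gini(parent))       # 0.5
print("Weighted child Gini:", weighted)   # 0.32 -> this split looks good *locally*
```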
Is it a Problem?
• Yes, because the locally optimal split may not lead to the globally optimal tree
• Can result in suboptimal trees that don’t capture the true underlying patterns
• Often leads to overfitting, especially with deep trees
How to Overcome?
• Pruning: Remove branches that provide little predictive power
• Ensemble Methods: Combine multiple trees (e.g., Random Forest)
• Setting Constraints: Limit tree depth, minimum samples per leaf, etc. (a sketch of these options follows this list)
• Random Forest: Specifically addresses this by averaging multiple trees
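A brief sketch of the constraint and pruning options listed above, using scikit-learn parameter names (max_depth, min_samples_leaf, ccp_alpha); the specific values and dataset are illustrative:

```python
# Illustrative sketch: limiting tree growth and cost-complexity pruning in scikit-learn.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Setting constraints: cap depth and require a minimum number of samples per leaf
constrained = DecisionTreeClassifier(max_depth=4, min_samples_leaf=10, random_state=0)
constrained.fit(X_tr, y_tr)

# Pruning: cost-complexity pruning removes branches that add little predictive power
pruned = DecisionTreeClassifier(ccp_alpha=0.01, random_state=0).fit(X_tr, y_tr)

print("Constrained tree test score:", constrained.score(X_te, y_te))
print("Pruned tree test score:     ", pruned.score(X_te, y_te))
```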
5 Advantages and Disadvantages of Decision Trees
Advantages:
• Easy to Understand: Decision Trees are visual, which makes it easy to follow the decision-making process.
• Versatility: Can be used for both classification and regression problems.
• No Need for Feature Scaling: Unlike many machine learning models, they don't require us to scale or normalize our data.
• Handles Non-linear Relationships: They capture complex, non-linear relationships between features and outcomes effectively.
• Interpretability: The tree structure is easy to interpret, allowing users to understand the reasoning behind each decision.
• Handles Missing Data: They can handle missing values by using strategies like assigning the most common value or ignoring missing data during splits.
Disadvantages:
• Overfitting: They can overfit the training data if they are too deep, which means they memorize the data instead of learning general patterns. This leads to poor performance on unseen data.
• Instability: They can be unstable, meaning that small changes in the data may lead to significant differences in the tree structure and predictions.
• Bias towards Features with Many Categories: They can become biased toward features with many distinct values, focusing too much on them and potentially missing other important features, which can reduce prediction accuracy.
• Difficulty in Capturing Complex Interactions: Decision Trees may struggle to capture complex interactions between features, making them less effective for certain types of data.
• Computationally Expensive for Large Datasets: For large datasets, building
and pruning a Decision Tree can be computationally intensive, especially as the
tree depth increases.
6 Why Random Forest is Better than Single Decision
Tree?
• Reduces Overfitting: By averaging multiple trees, Random Forest reduces vari-
ance without increasing bias
• Improved Accuracy: Combines predictions from many trees (wisdom of the
crowd)
• Handles Noise Better: Less sensitive to noise and outliers in the data
• More Stable: Small changes in data don’t drastically affect the model
• Feature Importance: Provides better estimates of feature importance
• Handles Missing Data: Better at handling missing values
• Robust to Overfitting: Especially with many trees and proper feature sampling
7 Limitations of Random Forest
• Computational Complexity: More expensive to train and predict than single
trees
• Less Interpretability: Harder to visualize and understand than a single decision
tree
• Memory Intensive: Requires storing multiple trees in memory
• Slower Prediction: Prediction time increases with number of trees
• Bias Toward Features with Many Categories: Like decision trees, can be
biased toward features with many levels
• Not Ideal for Linear Relationships: May not perform as well as linear models
when relationships are truly linear
8 Bagging in Random Forest, Out-of-Bag Error
What is Bagging?
Bagging is the application of the Bootstrap procedure to a high-variance machine learning algorithm, typically decision trees. It is a general procedure that can be used to reduce the variance of algorithms that have high variance. Decision trees, such as classification and regression trees, are a typical example of such high-variance algorithms.
Added by Sanjoy: Bagging, or Bootstrap Aggregating, is an ensemble machine
learning technique that improves model accuracy and reduces variance by training mul-
tiple models on different subsets of the training data, generated through bootstrapping.
Each subset is created by sampling with replacement from the original dataset. The
predictions from these individual models are then combined, either by averaging for re-
gression tasks or by majority vote for classification tasks, to produce a more robust and
accurate final prediction.
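To make the bootstrap-and-aggregate idea concrete, here is a minimal sketch using numpy and scikit-learn trees; the dataset and the number of models are arbitrary, illustrative choices:

```python
# Minimal bagging sketch: bootstrap samples -> one tree per sample -> majority vote.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

rng = np.random.default_rng(0)
trees = []
for _ in range(25):
    # Bootstrap sample: draw n indices with replacement from the training set
    idx = rng.integers(0, len(X_tr), size=len(X_tr))
    trees.append(DecisionTreeClassifier(random_state=0).fit(X_tr[idx], y_tr[idx]))

# Aggregate: majority vote across the 25 trees (averaging would be used for regression)
votes = np.stack([t.predict(X_te) for t in trees])
majority = (votes.mean(axis=0) > 0.5).astype(int)
print("Bagged accuracy:", (majority == y_te).mean())
```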
How is it Used in Random Forest?
Random forest is a modification of bagging that further improves the performance of the
model by introducing randomness in the feature selection process. In random forest, we
create multiple decision trees using a subset of the original features. In each node of the
tree, instead of using all the features, we randomly select a subset of features to split
the data. This process is repeated for each node, resulting in a decision tree that uses a
subset of the features.
By using a subset of the features at each node, random forest introduces diversity in
the decision trees, which further reduces the variance of the model. Moreover, the feature
selection process prevents the trees from being highly correlated, which is a problem in
bagging. Therefore, the random forest can improve the accuracy of a model by reducing
overfitting and increasing the diversity of the trees.
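In scikit-learn, this per-split feature subsetting is controlled by the max_features parameter. The sketch below (with an illustrative dataset and parameter values) contrasts sqrt-of-features selection with using all features, which corresponds to plain bagging of trees:

```python
# Illustrative sketch: per-split feature subsetting via max_features in scikit-learn.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = load_breast_cancer(return_X_y=True)

# max_features='sqrt' considers roughly sqrt(p) random features at each split
# (decorrelating the trees); max_features=None considers all p features, which
# reduces the forest to plain bagging of decision trees.
for mf in ["sqrt", None]:
    rf = RandomForestClassifier(n_estimators=100, max_features=mf, random_state=0)
    print(mf, cross_val_score(rf, X, y, cv=5).mean())
```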
Out-of-Bag (OOB) Error
• OOB error is an estimate of prediction error using samples not included in bootstrap
• For each tree, about one-third of the samples (roughly 1/e ≈ 36.8%) are not used in training; these are the out-of-bag samples
• These OOB samples can be used as a validation set (see the sketch after this list)
• OOB error is calculated by aggregating predictions on OOB samples across all trees
• Provides unbiased estimate of generalization error without needing a separate vali-
dation set
• Formula: OOB Error = (Number of misclassified OOB samples) / (Total number of OOB samples)
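A short sketch of obtaining the OOB estimate via scikit-learn's oob_score option; the dataset and number of trees are illustrative choices:

```python
# Illustrative sketch: out-of-bag (OOB) error without a separate validation set.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier

X, y = load_breast_cancer(return_X_y=True)

# oob_score=True scores each sample using only the trees that did NOT see it in training
rf = RandomForestClassifier(n_estimators=200, oob_score=True, random_state=0).fit(X, y)

print("OOB accuracy:", rf.oob_score_)
print("OOB error:   ", 1 - rf.oob_score_)   # fraction of misclassified OOB samples
```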
9 Main Steps of Random Forest Algorithm
1. Create Decision Trees: The algorithm builds many decision trees, each using a random part of the data, so every tree is a bit different.
2. Pick Random Features: When building each tree it doesn’t look at all the
features (columns) at once. It picks a few at random to decide how to split the
data. This helps the trees stay different from each other.
3. Each Tree Makes Predictions: Every tree gives its own answer or prediction based on what it learned from its part of the data.
4. Combine Predictions: For classification, the final answer is the category that most trees agree on (majority voting); for regression, the final answer is the average of all the trees' predictions.
Updated by Sanjoy:
1. Bootstrap Sampling: Each tree gets its own unique training set, created by
randomly sampling from the original data with replacement. This means some
data points may appear multiple times while others aren’t used.
2. Random Feature Selection: When making a split, each tree only considers a
random subset of features (typically square root of total features).
3. Growing Trees: Each tree grows using only its bootstrap sample and selected fea-
tures, making splits until it reaches a stopping point (like pure groups or minimum
sample size).
4. Final Prediction: All trees vote together for the final prediction. For classification, take the majority vote of class predictions; for regression, average the predicted values from all trees. (A from-scratch sketch of these steps follows.)
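The four steps above can be sketched from scratch in a few lines. This is a simplified, illustrative implementation assuming scikit-learn decision trees as the base learners, not a production version:

```python
# Simplified from-scratch sketch of the Random Forest steps described above.
import numpy as np
from collections import Counter
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

rng = np.random.default_rng(0)
forest = []
for _ in range(50):
    # Step 1: bootstrap sampling (with replacement) gives each tree its own training set
    idx = rng.integers(0, len(X_tr), size=len(X_tr))
    # Steps 2-3: grow a tree that considers only sqrt(p) random features at each split
    tree = DecisionTreeClassifier(max_features="sqrt",
                                  random_state=int(rng.integers(1_000_000)))
    forest.append(tree.fit(X_tr[idx], y_tr[idx]))

# Step 4: combine predictions by majority vote (average them instead for regression)
all_preds = np.stack([tree.predict(X_te) for tree in forest])   # shape (n_trees, n_test)
majority = [Counter(col).most_common(1)[0][0] for col in all_preds.T]
print("Forest accuracy:", np.mean(np.array(majority) == y_te))
```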
10 Advantages and Disadvantages of Random Forest
Advantages:
• Random Forest provides very accurate predictions even with large datasets.
• Random Forest can handle missing data well without compromising accuracy.
• It doesn't require normalization or standardization of the dataset.
• Combining multiple decision trees reduces the risk of overfitting.
Disadvantages:
• It can be computationally expensive especially with a large number of trees.
• It’s harder to interpret than simpler models such as a single decision tree.
11 Assumptions of Random Forest
1. Each tree makes its own decisions: Every tree in the forest makes its own
predictions without relying on others.
2. Random parts of the data are used: Each tree is built using random samples
and features to reduce mistakes.
3. Enough data is needed: Sufficient data ensures the trees are different and learn varied, unique patterns.
4. Different predictions improve accuracy: Combining the predictions from
different trees leads to a more accurate final result.
12 When to Use Random Forest Over Other Algo-
rithms
• Handles Missing Data: It can work even if some data is missing, so you don't always need to fill in the gaps yourself.
• Shows Feature Importance: It tells you which features (columns) are most useful for making predictions, which helps you understand your data better (a sketch follows this list).
• Works Well with Big and Complex Data: It can handle large datasets with
many features without slowing down or losing accuracy.
• Used for Different Tasks: You can use it for both classification (like predicting types or labels) and regression (like predicting numbers or amounts).
• Handles High-Dimensionality: Excels with datasets containing many features
(even p ≫ n cases) through feature randomness and subset selection.
• Robust to Noise and Outliers: Bagging and feature sampling reduce overfitting
and increase stability compared to single decision trees.
• Handles Mixed Data Types: Naturally manages both numerical and categorical
features without extensive preprocessing.
• Non-Parametric Flexibility: Captures complex non-linear relationships without
assuming data distributions.
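As referenced in the "Shows Feature Importance" point above, here is a minimal sketch of reading impurity-based feature importances from a fitted forest; the dataset and number of trees are illustrative choices:

```python
# Illustrative sketch: ranking features by impurity-based importance.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier

data = load_breast_cancer()
rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(data.data, data.target)

# feature_importances_ sums to 1.0; higher values mean the feature drove more splits
ranked = sorted(zip(data.feature_names, rf.feature_importances_),
                key=lambda pair: pair[1], reverse=True)
for name, score in ranked[:5]:
    print(f"{name:25s} {score:.3f}")
```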