Decision Making through Random Forest
Md. Golam Rabiul Alam
Associate Professor, BRAC University
Random Forest
Random forest is a decision-tree-based, non-linear machine learning model for classification, regression, and feature selection.
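As a quick aside (not from the original slides), the model is available off the shelf. A minimal scikit-learn sketch; the toy data and parameter values here are illustrative assumptions:

```python
# Minimal random-forest classification with scikit-learn.
# The tiny, already-encoded dataset and parameters are placeholders.
from sklearn.ensemble import RandomForestClassifier

X = [[0, 1], [1, 0], [0, 0], [1, 1]]   # toy feature matrix
y = ["No", "Yes", "No", "Yes"]         # class labels

clf = RandomForestClassifier(n_estimators=100, random_state=42)
clf.fit(X, y)                  # bootstrapping + bagging happen internally
print(clf.predict([[1, 0]]))   # majority vote over the 100 trees
```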
Random Forest
The word “Random” refers to the random selection of data instances, which is known as the bootstrapping method in both statistics and ML.
The word “Forest” refers to the use of several decision trees to build the decision model through the bagging method.
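A minimal sketch of the bootstrapping idea (an added illustration; the dataset contents are placeholders):

```python
# Bootstrapping: sample n rows *with replacement* from a dataset of size n,
# so some rows typically repeat and others are left out.
import random

dataset = ["Day1", "Day2", "Day3", "Day4", "Day5"]

bootstrap = random.choices(dataset, k=len(dataset))
print(bootstrap)   # e.g. ['Day2', 'Day5', 'Day2', 'Day1', 'Day4']
```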
Random Forest
GINI Impurity:
The GINI impurity of a node is the probability that a randomly chosen sample in the node would be incorrectly labeled if it were labeled according to the distribution of samples in the node.
The GINI impurity is computed by summing the probability $p_i$ of an item with label $i$ being chosen times the probability $(1 - p_i)$ of a mistake in categorizing that item:

$\mathrm{GINI} = \sum_{i=1}^{k} p_i (1 - p_i) = 1 - \sum_{i=1}^{k} p_i^2$
It reaches its minimum (zero) when all cases in the node fall into a single
target category.
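A small helper implementing this formula directly from class counts (an added illustration, not the lecture's code):

```python
# GINI impurity from class counts: 1 - sum(p_i^2) over all labels i.
def gini(counts):
    total = sum(counts)
    if total == 0:
        return 0.0
    return 1.0 - sum((c / total) ** 2 for c in counts)

print(gini([3, 1]))   # mixed node -> 0.375
print(gini([4, 0]))   # pure node  -> 0.0  (the minimum)
print(gini([2, 2]))   # 50/50 node -> 0.5  (maximum for two classes)
```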
Random Forest
Find the GINI impurity from the given data.
Random Forest
How to split the root node? Which splitting is better?
Steps in the Random Forest Classification Method (a Python sketch of these steps follows the list):
1. Bootstrapping for random data-subset generation
2. Decision tree construction for each data subset:
i) Determine the GINI impurity of each feature.
ii) Determine the GINI impurity of each prospective splitting sub-tree.
iii) Construct the decision tree based on the splitting GINI impurity (i.e., if the weighted sum of the GINI impurities of the split sub-trees is lower than the GINI impurity of the parent node, then split the parent node).
3. Bagging for ensemble classification
4. Majority voting for the classification decision
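The sketch below (added for illustration, not the lecture's code) wires these four steps together. To stay short, each "tree" is a single-split stump rather than a full recursive tree; `fit_forest`, `fit_stump`, and the dict-based row format are assumptions of this sketch.

```python
# Simplified random-forest sketch: bootstrap -> one tree per subset ->
# bagging -> majority vote. Each row is a dict like {"Outlook": "Sunny", ...}.
import random
from collections import Counter

def gini(labels):
    """GINI impurity of a list of class labels."""
    total = len(labels)
    return 1.0 - sum((n / total) ** 2 for n in Counter(labels).values())

def best_feature(rows, labels, features):
    """Feature whose split yields the lowest weighted GINI impurity."""
    def weighted_gini(f):
        score = 0.0
        for value in {r[f] for r in rows}:
            subset = [l for r, l in zip(rows, labels) if r[f] == value]
            score += len(subset) / len(rows) * gini(subset)
        return score
    return min(features, key=weighted_gini)

def fit_stump(rows, labels, n_features=2):
    """Step 2: pick a random column subset, split once on the best column."""
    feats = random.sample(sorted(rows[0]), k=n_features)
    f = best_feature(rows, labels, feats)
    leaves = {}   # majority label per value of the chosen feature
    for value in {r[f] for r in rows}:
        subset = [l for r, l in zip(rows, labels) if r[f] == value]
        leaves[value] = Counter(subset).most_common(1)[0][0]
    default = Counter(labels).most_common(1)[0][0]
    return lambda r: leaves.get(r[f], default)

def fit_forest(rows, labels, n_trees=3):
    forest = []
    for _ in range(n_trees):
        idx = random.choices(range(len(rows)), k=len(rows))  # step 1: bootstrap
        forest.append(fit_stump([rows[i] for i in idx],
                                [labels[i] for i in idx]))
    return forest

def predict(forest, row):
    """Steps 3-4: bag the trees' answers and take the majority vote."""
    return Counter(tree(row) for tree in forest).most_common(1)[0][0]
```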
Implement random forest on the given dataset
Day Outlook Temperature Humidity Wind Play Tennis
Day1 Sunny Hot High Weak No
Day2 Sunny Hot High Strong No
Day3 Overcast Hot High Weak Yes
Day4 Rain Mild High Weak Yes
Day5 Rain Cool Normal Weak Yes
Day6 Rain Cool Normal Strong No
Day7 Overcast Cool Normal Strong Yes
Day8 Sunny Mild High Weak No
Day9 Sunny Cool Normal Weak Yes
Day10 Rain Mild Normal Weak Yes
Day11 Sunny Mild Normal Strong Yes
Day12 Overcast Mild High Strong Yes
Day13 Overcast Hot Normal Weak Yes
Day14 Rain Mild High Strong No
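For experimentation, the table can be encoded as Python data (an added convenience, in the row format expected by the `fit_forest` sketch above):

```python
# The play-tennis dataset above, as feature dicts plus labels.
columns = ["Outlook", "Temperature", "Humidity", "Wind"]
raw = [
    ("Sunny",    "Hot",  "High",   "Weak",   "No"),
    ("Sunny",    "Hot",  "High",   "Strong", "No"),
    ("Overcast", "Hot",  "High",   "Weak",   "Yes"),
    ("Rain",     "Mild", "High",   "Weak",   "Yes"),
    ("Rain",     "Cool", "Normal", "Weak",   "Yes"),
    ("Rain",     "Cool", "Normal", "Strong", "No"),
    ("Overcast", "Cool", "Normal", "Strong", "Yes"),
    ("Sunny",    "Mild", "High",   "Weak",   "No"),
    ("Sunny",    "Cool", "Normal", "Weak",   "Yes"),
    ("Rain",     "Mild", "Normal", "Weak",   "Yes"),
    ("Sunny",    "Mild", "Normal", "Strong", "Yes"),
    ("Overcast", "Mild", "High",   "Strong", "Yes"),
    ("Overcast", "Hot",  "Normal", "Weak",   "Yes"),
    ("Rain",     "Mild", "High",   "Strong", "No"),
]
rows   = [dict(zip(columns, r[:4])) for r in raw]
labels = [r[4] for r in raw]
```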
Bootstrapped Dataset 1
Day Outlook Temperature Humidity Wind Play Tennis
Day10 Rain Mild Normal Weak Yes
Day11 Sunny Mild Normal Strong Yes
Day12 Overcast Mild High Strong Yes
Day13 Overcast Hot Normal Weak Yes
Day14 Rain Mild High Strong No
Day2 Sunny Hot High Strong No
Create decision trees using a random subset of variables (columns). [Here, we randomly selected only 2 columns.]
Day Temperature Humidity Play Tennis
Day10 Mild Normal Yes
Day11 Mild Normal Yes
Day12 Mild High Yes
Day13 Hot Normal Yes
Day14 Mild High No
Day2 Hot High No
Calculations
Temperature: Mild [Yes: 3, No: 1], Hot [Yes: 1, No: 1]
Humidity: High [Yes: 1, No: 2], Normal [Yes: 3, No: 0]

GINI(Temperature = Mild) = 1 - (3/4)^2 - (1/4)^2 = 1 - 0.5625 - 0.0625 = 0.375
GINI(Temperature = Hot) = 1 - (1/2)^2 - (1/2)^2 = 0.5
GINI(Humidity = High) = 1 - (1/3)^2 - (2/3)^2 = 1 - 0.1111 - 0.4444 = 0.444
GINI(Humidity = Normal) = 1 - (3/3)^2 - (0/3)^2 = 1 - 1 - 0 = 0

Now, the GINI impurity of a parent node is the weighted average of the GINI impurities of its leaf nodes:
GINI(Temperature) = (4/6)*0.375 + (2/6)*0.5 = 0.417
GINI(Humidity) = (3/6)*0.444 + (3/6)*0 = 0.222

Since GINI(Humidity) < GINI(Temperature), Humidity becomes the root of this decision tree.
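These figures can be verified mechanically; a quick check (added for illustration) over the six-row subset above:

```python
# Recompute the weighted GINI impurities for the 6-row subset of
# Bootstrapped Dataset 1 (columns: Temperature, Humidity, Play Tennis).
rows = [("Mild", "Normal", "Yes"), ("Mild", "Normal", "Yes"),
        ("Mild", "High",   "Yes"), ("Hot",  "Normal", "Yes"),
        ("Mild", "High",   "No"),  ("Hot",  "High",   "No")]

def gini(labels):
    return 1.0 - sum((labels.count(v) / len(labels)) ** 2 for v in set(labels))

for col, name in [(0, "Temperature"), (1, "Humidity")]:
    total = 0.0
    for value in {r[col] for r in rows}:
        subset = [r[2] for r in rows if r[col] == value]
        total += len(subset) / len(rows) * gini(subset)
    print(name, round(total, 3))   # Temperature 0.417, Humidity 0.222
```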
Calculations
Now, we consider the next-level nodes for better separation. The Humidity = High branch contains:

Day Outlook Temperature Humidity Wind Play Tennis
Day12 Overcast Mild High Strong Yes
Day14 Rain Mild High Strong No
Day2 Sunny Hot High Strong No

Restricted to the remaining candidate features:
Day Outlook Temperature Play Tennis
Day12 Overcast Mild Yes
Day14 Rain Mild No
Day2 Sunny Hot No
Calculations
Temperature: Mild [Yes: 1, No: 1], Hot [Yes: 0, No: 1]
Outlook: Sunny [Yes: 0, No: 1], Overcast [Yes: 1, No: 0], Rain [Yes: 0, No: 1]

GINI(Temperature = Mild) = 1 - (1/2)^2 - (1/2)^2 = 0.5
GINI(Temperature = Hot) = 1 - (0/1)^2 - (1/1)^2 = 1 - 0 - 1 = 0
GINI(Outlook = Sunny) = 0
GINI(Outlook = Overcast) = 0
GINI(Outlook = Rain) = 0

Again, the GINI impurity of a parent node is the weighted average of the GINI impurities of its leaf nodes:
GINI(Temperature) = (2/3)*0.5 + (1/3)*0 = 0.333
GINI(Outlook) = (1/3)*0 + (1/3)*0 + (1/3)*0 = 0

Since GINI(Outlook) < GINI(Temperature), Outlook becomes the next-level split.
Calculations
The Humidity = Normal branch is already pure (all Yes), so it needs no further splitting:
Day Outlook Temperature Humidity Wind Play Tennis
Day10 Rain Mild Normal Weak Yes
Day11 Sunny Mild Normal Strong Yes
Day13 Overcast Hot Normal Weak Yes
Bootstrapped Dataset 2
Day Outlook Temperature Humidity Wind Play Tennis
Day1 Sunny Hot High Weak No
Day2 Sunny Hot High Strong No
Day3 Overcast Hot High Weak Yes
Day4 Rain Mild High Weak Yes
Day5 Rain Cool Normal Weak Yes
Day2 Sunny Hot High Strong No
2. Create decision trees using a random subset of variables (columns) from the bootstrapped dataset. [Here, we randomly selected only 2 columns.]
Day Outlook Temperature Play Tennis
Day1 Sunny Hot No
Day2 Sunny Hot No
Day3 Overcast Hot Yes
Day4 Rain Mild Yes
Day5 Rain Cool Yes
Day2 Sunny Hot No
3. Calculations
Outlook
Sunny [Yes: 0, No: 3]
Overcast [Yes: 1, No: 0]
Rain [Yes: 2, No: 0]
GINI(Outlook = Sunny) = 1 - (0/3)^2 - (3/3)^2 = 1 - 0 - 1 = 0
GINI(Outlook = Overcast) = 1 - (1/1)^2 - (0/1)^2 = 1 - 1 - 0 = 0
GINI(Outlook = Rain) = 1 - (2/2)^2 - (0/2)^2 = 1 - 1 - 0 = 0
Now,
GINI impurity of parent node = weighted average of Gini
impurities of leaf nodes
GINI(Outlook) = (3/6)*0 + (1/6)*0 + (2/6)*0 = 0
3. Calculations (cont…)
Temperature
Hot [Yes: 1, No: 3]
Mild [Yes: 1, No: 0]
Cool [Yes: 1, No: 0]
GINI(Temperature=Hot)= 1-(1/4)^2-(3/4)^2= 1-0.0625-0.5625
= 0.375
GINI(Temperature=Mild) = 1 - (1/1)^2-(0/1)^2 = 1 - 1 - 0 = 0
GINI(Temperature=Cool) = 1 - (1/1)^2-(0/1)^2 = 1 - 1 - 0 = 0
GINI(Temperature) = (4/6)* 0.375 + (1/6)*0 + (1/6)*0 = 0.25
The feature with the lowest GINI impurity separates the classes best.
Since GINI(Outlook) < GINI(Temperature), Outlook becomes the root of this decision tree.
Next, we would consider the next-level nodes for better separation.
Bootstrapped Dataset 3
Day Outlook Temperature Humidity Wind Play Tennis
Day6 Rain Cool Normal Strong No
Day7 Overcast Cool Normal Strong Yes
Day8 Sunny Mild High Weak No
Day9 Sunny Cool Normal Weak Yes
Day10 Rain Mild Normal Weak Yes
Day13 Overcast Hot Normal Weak Yes
Create decision trees using a random subset of variables (columns). [Here, we randomly selected only 2 columns.]
Day Humidity Wind Play Tennis
Day6 Normal Strong No
Day7 Normal Strong Yes
Day8 High Weak No
Day9 Normal Weak Yes
Day10 Normal Weak Yes
Day13 Normal Weak Yes
NOW, A Query:
Day Outlook Temperature Humidity Wind Play Tennis
Day13 Overcast Hot Normal Weak Yes

Tree 1 predicts Yes (running bagging tally: Yes: 1).
Tree 2 predicts Yes (running bagging tally: Yes: 2).
If Tree 3's result is No, the tally becomes Yes: 2, No: 1.
So, by majority voting, the final result of the query is YES.
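The vote itself is a one-liner (illustrative; Tree 3's value is the slide's hypothetical "No"):

```python
# Majority vote over the individual tree predictions for the query (bagging).
from collections import Counter

tree_predictions = ["Yes", "Yes", "No"]   # Tree 1, Tree 2, Tree 3 (assumed No)
votes = Counter(tree_predictions)         # Counter({'Yes': 2, 'No': 1})
print(votes.most_common(1)[0][0])         # -> 'Yes'
```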
Calculations
From the six-row subset of Bootstrapped Dataset 3:
Humidity: High [Yes: 0, No: 1], Normal [Yes: 4, No: 1]
Wind: Strong [Yes: 1, No: 1], Weak [Yes: 3, No: 1]

GINI(Humidity = High) = 1 - (0/1)^2 - (1/1)^2 = 1 - 0 - 1 = 0
GINI(Humidity = Normal) = 1 - (4/5)^2 - (1/5)^2 = 1 - 0.64 - 0.04 = 0.32
GINI(Wind = Strong) = 1 - (1/2)^2 - (1/2)^2 = 0.5
GINI(Wind = Weak) = 1 - (3/4)^2 - (1/4)^2 = 1 - 0.5625 - 0.0625 = 0.375

GINI(Humidity) = (1/6)*0 + (5/6)*0.32 = 0.267
GINI(Wind) = (2/6)*0.5 + (4/6)*0.375 = 0.417

Since GINI(Humidity) < GINI(Wind), Humidity becomes the root of this decision tree.
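A quick check of these impurities (added for illustration) over the six-row subset above:

```python
# Recompute the weighted GINI impurities for the 6-row subset of
# Bootstrapped Dataset 3 (columns: Humidity, Wind, Play Tennis).
rows = [("Normal", "Strong", "No"),  ("Normal", "Strong", "Yes"),
        ("High",   "Weak",   "No"),  ("Normal", "Weak",   "Yes"),
        ("Normal", "Weak",   "Yes"), ("Normal", "Weak",   "Yes")]

def gini(labels):
    return 1.0 - sum((labels.count(v) / len(labels)) ** 2 for v in set(labels))

for col, name in [(0, "Humidity"), (1, "Wind")]:
    total = 0.0
    for value in {r[col] for r in rows}:
        subset = [r[2] for r in rows if r[col] == value]
        total += len(subset) / len(rows) * gini(subset)
    print(name, round(total, 3))   # Humidity 0.267, Wind 0.417
```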