Comprehensive Notes on Decision Trees (for Examination)
Introduction to Decision Trees
• A supervised learning algorithm used for both classification and regression.
• Splits data into branches based on feature values until a decision/leaf node is reached.
Interpretation of Decision Trees
• Root Node: Feature giving maximum purity gain
• Internal Nodes: Decision points
• Leaf Nodes: Final predictions
Building Decision Trees
• Choose best feature at each step using impurity measures (Entropy, Gini, etc.)
• Recursively split until stopping condition is met
⚖ Tree Models vs Linear Models
• Trees: Handle non-linearity, no need for feature scaling
• Linear Models: Work best with linearly separable data
Memory Tip: "Trees branch smartly, lines draw plainly"
Decision Trees for Regression
• Predicts mean value of target in each region
• Uses Mean Squared Error (MSE) or Mean Absolute Error (MAE) to measure impurity
Formula: MSE = (1/n) Σᵢ₌₁ⁿ (yᵢ − ȳ)²
Example:
• Leaf 1: [200, 210, 190] → Predicted y = 200
• Leaf 2: [400, 390, 410] → Predicted y = 400
Regression Tree Building Process
1. Calculate MSE of the target variable.
2. Split data based on attributes, compute MSE for each resulting node.
3. Subtract resulting MSE from original MSE → MSE Reduction.
4. Choose attribute with highest MSE reduction.
5. Repeat recursively until MSE is low and node is homogeneous.
6. Final prediction at leaf = average of target values.
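A minimal NumPy sketch of these steps, using the hypothetical leaf values from the example above (the candidate split shown is assumed for illustration, not derived from real data):
import numpy as np

def mse(y):
    # Node impurity: mean squared error around the node's mean prediction
    y = np.asarray(y, dtype=float)
    return np.mean((y - y.mean()) ** 2)

parent = [200, 210, 190, 400, 390, 410]          # hypothetical target values in the parent node
left, right = [200, 210, 190], [400, 390, 410]   # one candidate split

parent_mse = mse(parent)
post_mse = (len(left) * mse(left) + len(right) * mse(right)) / len(parent)
mse_reduction = parent_mse - post_mse            # choose the split with the largest reduction

print(round(parent_mse, 1), round(post_mse, 1), round(mse_reduction, 1))
print(np.mean(left), np.mean(right))             # leaf predictions = 200.0 and 400.0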
⚖ Impurity Measures for Classification
1. Classification Error: E = 1 − max(pᵢ)
2. Gini Index: G = Σᵢ₌₁ᵏ pᵢ(1 − pᵢ)
3. Entropy: D = −Σᵢ₌₁ᵏ pᵢ log₂(pᵢ)
Memory Tip: "Entropy = Uncertainty, Gini = Diversity, Error = Simplicity"
Information Gain (Entropy-Based)
Gain = D − D_A
• D: Entropy before split
• D_A: Weighted avg. entropy after split
Example:
• Parent: 2 of class A, 2 of class B → D = 1.0
• Split: Left = [A, A], Right = [B, B] → D_A = 0
• Gain = 1 - 0 = 1.0 (Perfect Split)
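The same numbers can be reproduced with a small entropy helper (a sketch; the class counts mirror the example above):
import numpy as np

def entropy(counts):
    # Entropy of a node from its class counts
    p = np.asarray(counts, dtype=float)
    p = p / p.sum()
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))

D = entropy([2, 2])                                        # parent: 2 of A, 2 of B
D_A = (2/4) * entropy([2, 0]) + (2/4) * entropy([0, 2])    # weighted avg. entropy after the split
print(D, D_A, D - D_A)                                     # 1.0 0.0 1.0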
Splitting Based on Feature Type
1. Nominal categorical feature (k categories): 2ᵏ⁻¹ − 1 possible binary splits
2. Ordinal or continuous feature (n distinct values):
• Sort the values
• Try the n − 1 split points between consecutive values
Goal: Maximize homogeneity (minimize impurity)
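A quick sketch of how many candidate binary splits each feature type produces (the feature values and category names are made up for illustration):
import numpy as np
from itertools import combinations

values = np.array([3.1, 1.2, 2.5, 4.8])              # continuous feature with n = 4 distinct values
sorted_vals = np.sort(values)
thresholds = (sorted_vals[:-1] + sorted_vals[1:]) / 2
print(thresholds)                                    # n − 1 = 3 candidate thresholds (midpoints)

categories = ["red", "green", "blue"]                # nominal feature with k = 3 categories
k = len(categories)
subsets = [c for r in range(1, k) for c in combinations(categories, r)]
print(len(subsets) // 2, 2 ** (k - 1) - 1)           # each partition counted twice → 3, 3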
📊 Weighted Post-Split Impurity:
Post-Impurity = (n_L / n) · D_L + (n_R / n) · D_R
ΔImpurity = D − Post-Impurity
Choose split with maximum gain (i.e., largest ∆ Impurity).
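A minimal sketch of the weighted post-split impurity and ΔImpurity, here using Gini as the impurity D (the labels and split are hypothetical):
import numpy as np

def gini(labels):
    # Gini impurity of a node from its class labels
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(np.sum(p * (1.0 - p)))

parent = ["A", "A", "B", "B", "B", "B"]
left, right = ["A", "A", "B"], ["B", "B", "B"]       # one candidate split

n, n_L, n_R = len(parent), len(left), len(right)
post_impurity = (n_L / n) * gini(left) + (n_R / n) * gini(right)
delta = gini(parent) - post_impurity
print(round(post_impurity, 3), round(delta, 3))      # 0.222 0.222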
Disadvantages of Decision Trees
• Overfitting: Trees grow too deep and memorize the training data
• Instability: Small changes in the data can produce a very different tree
• Bias: Splits favor features with many distinct levels
• Poor linear fit: Cannot model smooth linear trends
Memory Tip: "D.O.S.E.: Deep trees, Outliers, Sensitive, Easy to overfit"
✂ Tree Truncation vs Tree Pruning
• Truncation (pre-pruning): Stop growing the tree early; risk of underfitting
• Pruning (post-pruning): Grow the full tree, then cut weak branches; better generalization
Hyperparameter Tuning for Trees
• max_depth: Controls tree depth, prevents overfitting
• min_samples_split: Minimum samples required to split a node
• min_samples_leaf: Minimum samples required at each leaf
• max_features: Maximum number of features considered per split
• criterion: Impurity function (Gini or Entropy for classification, MSE for regression)
• ccp_alpha: Cost-complexity pruning strength
Tip: Use Grid Search or Randomized Search + Cross-Validation
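A sketch of how these hyperparameters are passed to scikit-learn's DecisionTreeClassifier (the data set and the chosen values are arbitrary examples, not recommendations):
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=200, n_features=5, random_state=42)   # synthetic data

tree = DecisionTreeClassifier(
    max_depth=5,             # limit depth to curb overfitting
    min_samples_split=10,    # need at least 10 samples to split a node
    min_samples_leaf=5,      # each leaf keeps at least 5 samples
    max_features=3,          # consider at most 3 features per split
    criterion="gini",        # or "entropy" for classification
    ccp_alpha=0.01,          # cost-complexity pruning strength
    random_state=42,
)
tree.fit(X, y)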
🔍 Feature Importance
• Importance = Total impurity reduction caused by the feature across all splits
Example:
Feature → total gain:
• Income: 0.40
• Age: 0.04
• City: 0.01
Memory Tip: "More impurity it kills, more important it feels."
Log Base 2 Calculation (log₂)
Formula: log₂(x) = log₁₀(x) / log₁₀(2), where log₁₀(2) ≈ 0.3010
Or using the natural log: log₂(x) = ln(x) / ln(2), where ln(2) ≈ 0.6931
Steps on calculator:
1. Use log(x) or ln(x)
2. Divide by log(2) or ln(2)
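For example, log₂(3) can be checked in Python both ways (math.log2 is the built-in shortcut):
import math

x = 3
print(math.log10(x) / math.log10(2))   # ≈ 1.585
print(math.log(x) / math.log(2))       # same value via natural logs
print(math.log2(x))                    # built-in shortcut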
🔍 Entropy with Equal Classes (3 Classes Example)
• Class Distribution: [A, B, C] → each with 1/3
Entropy = −3 × (1/3 × log₂(1/3)) = log₂(3) ≈ 1.585
Memory Tip: Maximum entropy = log₂(k), where k = number of equally likely classes.
⭐ Cross-Validation & K-Fold Cross-Validation
Cross-Validation
• Technique to evaluate model performance more reliably than a single train-test split
• Helps avoid overfitting by testing on multiple data subsets
♻ K-Fold Cross-Validation
1. Split the dataset into K equal parts (folds)
2. For each fold:
• Use it as the test set and the remaining folds as the training set
• Train and evaluate the model
3. Compute the average score across the K runs
Example (K=5):
• Run model 5 times, each time a different fold is test set
Memory Tip: "K parts, K turns as test. Judge by average."
GridSearchCV (Grid Search with Cross-Validation)
• Used to find the best hyperparameters for a model
• Tries all combinations of given parameters using cross-validation
Steps:
1. Define parameter grid:
param_grid = {
'max_depth': [3, 5, 10],
'min_samples_split': [2, 5, 10]
}
2. Apply GridSearchCV:
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

model = DecisionTreeClassifier()
grid_search = GridSearchCV(model, param_grid, cv=5)
grid_search.fit(X, y)
3. Best params:
grid_search.best_params_
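After fitting, GridSearchCV also exposes the best cross-validated score and a model refitted with the winning parameters:
grid_search.best_score_       # mean cross-validated score of the best combination
grid_search.best_estimator_   # tree refitted with the best parameters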
Memory Tip: "Grid = Try All, CV = Test All"