Decision tree for regression
In this notebook, we present how decision trees work in regression problems and highlight the differences with the decision trees previously presented in a classification setting.
First, we load the penguins dataset specifically for solving a regression problem.
import pandas as pd
url = "https://raw.githubusercontent.com/PhilChodrow/PIC16B/master/datasets/palmer_penguins.csv"
penguins = pd.read_csv(url)
penguins = penguins[penguins['Flipper Length (mm)'].notna()]
feature_name = "Flipper Length (mm)"
target_name = "Body Mass (g)"
data_train, target_train = penguins[[feature_name]], penguins[target_name]
To illustrate how decision trees predict in a regression setting, we create a synthetic dataset of evenly spaced flipper length values between the minimum and the maximum of the original data.
import numpy as np
data_test = pd.DataFrame(
    np.arange(data_train[feature_name].min(), data_train[feature_name].max()),
    columns=[feature_name],
)
Here, the term "test" refers to data that was not used for training. It should not be confused with data coming from a train-test split, as it was generated in equally spaced intervals for the visual evaluation of the predictions.
Note that this is methodologically valid here because our objective is to get some intuitive understanding of the shape of the decision function of the learned decision trees.
However, computing an evaluation metric on such a synthetic test set would be meaningless since the synthetic dataset does not follow the same distribution as the real-world data on which the model would be deployed.
import matplotlib.pyplot as plt
import seaborn as sns
sns.scatterplot(
    data=penguins, x=feature_name, y=target_name, color="black", alpha=0.5
)
_ = plt.title("Illustration of the regression dataset used")
We first illustrate the difference between a linear model and a decision tree.
from sklearn.linear_model import LinearRegression
linear_model = LinearRegression()
linear_model.fit(data_train, target_train)
target_predicted = linear_model.predict(data_test)
sns.scatterplot(
    data=penguins, x=feature_name, y=target_name, color="black", alpha=0.5
)
plt.plot(data_test[feature_name], target_predicted, label="Linear regression")
plt.legend()
_ = plt.title("Prediction function using a LinearRegression")
On the plot above, we see that a non-regularized LinearRegression is able to fit the data. A characteristic of this model is that all new predictions lie on the fitted line.
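As a quick sanity check (this cell is not part of the original notebook), we can read the slope and intercept of the fitted line and confirm that the predictions on the synthetic grid are simply the points of that line:
# Sanity check: the model is fully described by a slope and an intercept,
# and every prediction lies on that line.
slope = linear_model.coef_[0]
intercept = linear_model.intercept_
print(f"body mass = {slope:.2f} * flipper length + {intercept:.2f}")

# Predictions on the synthetic grid match the line equation.
manual_predictions = slope * data_test[feature_name].to_numpy() + intercept
print(np.allclose(manual_predictions, target_predicted))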
ax = sns.scatterplot(
    data=penguins, x=feature_name, y=target_name, color="black", alpha=0.5
)
plt.plot(
    data_test[feature_name],
    target_predicted,
    label="Linear regression",
    linestyle="--",
)
plt.scatter(
    data_test[::3],
    target_predicted[::3],
    label="Predictions",
    color="tab:orange",
)
plt.legend()
_ = plt.title("Prediction function using a LinearRegression")
Contrary to linear models, decision trees are non-parametric models: they do not make assumptions about the way data is distributed. This affects the shape of the prediction function. Repeating the above experiment highlights the differences.
from sklearn.tree import DecisionTreeRegressor
tree = DecisionTreeRegressor(max_depth=1)
tree.fit(data_train, target_train)
target_predicted = tree.predict(data_test)
sns.scatterplot(
    data=penguins, x=feature_name, y=target_name, color="black", alpha=0.5
)
plt.plot(data_test[feature_name], target_predicted, label="Decision tree")
plt.legend()
_ = plt.title("Prediction function using a DecisionTreeRegressor")
We see that the decision tree model does not assume an a priori distribution for the data, so we do not end up with a straight line regressing flipper length against body mass.
Instead, we observe that the predictions of the tree are piecewise constant. Indeed, our feature space was split into two partitions. Let's check the tree structure to see what threshold was found during training.
from sklearn.tree import plot_tree
_, ax = plt.subplots(figsize=(8, 6))
_ = plot_tree(tree, feature_names=[feature_name], ax=ax)
The threshold for our feature (flipper length) is 206.5 mm. The predicted values on each side of the split are two constants: 3698.71 g and 5032.36 g. These values correspond to the mean values of the training samples in each partition.
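We can verify this programmatically. The sketch below (not part of the original notebook) reads the root split threshold from the fitted tree and recomputes the mean body mass of the training samples on each side of it:
# Verification sketch: the two leaf predictions of the depth-1 regression tree
# are the mean body masses of the training samples on each side of the root split.
threshold = tree.tree_.threshold[0]  # split threshold of the root node
mask = data_train[feature_name] <= threshold
print(f"Threshold: {threshold} mm")
print(f"Mean body mass (flipper length <= threshold): {target_train[mask].mean():.2f} g")
print(f"Mean body mass (flipper length > threshold): {target_train[~mask].mean():.2f} g")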
In classification, we saw that increasing the depth of the tree allowed us to get more complex decision boundaries. Let's check the effect of
increasing the depth in a regression setting:
tree = DecisionTreeRegressor(max_depth=3)
tree.fit(data_train, target_train)
target_predicted = tree.predict(data_test)
sns.scatterplot(
    data=penguins, x=feature_name, y=target_name, color="black", alpha=0.5
)
plt.plot(data_test[feature_name], target_predicted, label="Decision tree")
plt.legend()
_ = plt.title("Prediction function using a DecisionTreeRegressor")
Increasing the depth of the tree increases the number of partitions and thus the number of constant values that the tree is capable of
predicting.
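To make this concrete, the short sketch below (not part of the original notebook) refits the tree at several depths and counts the resulting leaves, that is, the number of constant values the model can output:
# Illustration: the number of leaves (and therefore of distinct constant
# predictions) grows with the maximum depth, up to 2**max_depth for a binary tree.
for depth in range(1, 6):
    deeper_tree = DecisionTreeRegressor(max_depth=depth).fit(data_train, target_train)
    n_constant_values = len(np.unique(deeper_tree.predict(data_test)))
    print(
        f"max_depth={depth}: {deeper_tree.get_n_leaves()} leaves, "
        f"{n_constant_values} distinct predictions on the synthetic grid"
    )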
In this notebook, we highlighted the differences in behavior of a decision tree used in a regression setting compared to a classification setting.