UNIT 3: Regression
BTCS 618‐18
Dr. Vandana Mohindru
Topics to be discussed
• Introduction to Regression
• Need and Applications of Regression
• Simple Linear Regression
• Multiple Linear Regression
• Polynomial Regression
• Evaluating Regression Models Performance
• Mean Absolute Error (MAE)
• Mean Squared Error (MSE)
• Root Mean Squared Error (RMSE)
• R-squared
• Scatter plot
Introduction to Regression
• Regression is a statistical method for modeling the relationship between
a dependent (target) variable and one or more independent (predictor)
variables.
• More specifically, regression analysis helps us understand how the value
of the dependent variable changes with respect to one independent
variable while the other independent variables are held fixed. It
predicts continuous/real values such as temperature, age, salary,
price, etc.
• We can understand the concept of regression analysis using the below
example:
• Example: Suppose a marketing company A runs various advertisements
every year and earns sales from them. The list below shows the amounts
the company spent on advertising in the last 5 years and the
corresponding sales:
[Table: advertising spend and corresponding sales for the last 5 years]
Introduction to Regression
• Now, the company wants to spend $200 on advertising in the year 2019
and wants to predict the sales for that year. To solve such prediction
problems in machine learning, we need regression analysis.
Introduction to Regression
• Regression is a supervised learning technique that helps in finding the
correlation between variables and enables us to predict the continuous
output variable based on one or more predictor variables.
• It is mainly used for prediction, forecasting, time-series modeling, and
determining the cause-and-effect relationship between variables.
• In regression, we find the line or curve that best fits the given data
points; using this fitted line, the machine learning model can make
predictions about the data.
• In simple words, "Regression draws a line or curve through the data
points on the target-predictor graph in such a way that the vertical
distance between the data points and the regression line is
minimum." The distance between the data points and the line tells
whether the model has captured a strong relationship or not.
Terminologies Related to the Regression Analysis
• Dependent Variable: The main factor in regression analysis that we want to
predict or understand is called the dependent variable. It is also called the
target variable.
• Independent Variable: The factors that affect the dependent variable, or that
are used to predict its values, are called independent variables, also called
predictors.
• Outliers: An outlier is an observation with either a very low or a very
high value compared to the other observed values. An outlier may distort
the result, so it should be avoided.
• Multicollinearity: If the independent variables are highly correlated with
each other, the condition is called multicollinearity. It should not be
present in the dataset, because it creates problems when ranking the
most influential variables.
• Underfitting and Overfitting: If our algorithm works well with the training
dataset but not with the test dataset, the problem is called overfitting.
And if our algorithm does not perform well even on the training dataset,
the problem is called underfitting.
Need and Applications of Regression
• Regression analysis helps in the prediction of a continuous variable.
There are various real-world scenarios where we need future predictions,
such as weather conditions, sales figures, and marketing trends; for such
cases we need a technique that can make predictions accurately.
• Regression analysis is such a technique: a statistical method used in
machine learning and data science. Below are some other reasons for
using regression analysis:
• Regression estimates the relationship between the target and the
independent variable.
• It is used to find the trends in data.
• It helps to predict real/continuous values.
• By performing regression, we can confidently determine the most important
factor, the least important factor, and how each factor affects the
others.
Need and Applications of Regression
Some applications of regression are:
• Prediction of rain using temperature and other factors
• Determining Market trends
• Prediction of road accidents due to rash driving.
Simple Linear Regression
• Simple Linear Regression is a type of regression algorithm that models
the relationship between a dependent variable and a single independent
variable. The relationship shown by a Simple Linear Regression model is
linear (a sloped straight line), hence it is called Simple Linear
Regression.
• The key point in Simple Linear Regression is that the dependent variable
must be a continuous/real value. The independent variable, however,
can be continuous or categorical.
• Simple Linear regression algorithm has mainly two objectives:
• Model the relationship between the two variables. Such as the relationship
between Income and expenditure, experience and Salary, etc.
• Forecasting new observations. Such as Weather forecasting according to
temperature, Revenue of a company according to the investments in a year, etc.
Simple Linear Regression
Simple Linear Regression Model:
The Simple Linear Regression model can be represented using
the below equation:
y = a0 + a1x + ε
Where,
a0 = the intercept of the regression line (its value is obtained by putting
x = 0)
a1 = the slope of the regression line, which tells whether the line is
increasing or decreasing
ε = the error term (for a good model it will be negligible)
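Below is a minimal sketch of fitting this model in Python, assuming
scikit-learn is available; the advertising-spend and sales figures are
hypothetical illustration values, not the table from the earlier example:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical data: advertising spend (x) and sales (y)
x = np.array([90, 120, 150, 100, 130]).reshape(-1, 1)  # predictor, one column
y = np.array([1000, 1300, 1800, 1200, 1380])           # continuous target

model = LinearRegression().fit(x, y)

print("intercept a0:", model.intercept_)  # value of y when x = 0
print("slope a1:", model.coef_[0])        # change in y per unit of x

# Predict sales for an advertising spend of 200
print("prediction for x = 200:", model.predict([[200]]))
```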
Simple Linear Regression
[Plot: training-set observations (green dots) with the fitted regression
line (red)]
• In the given plot, the actual observations are shown as green dots, and
the predicted values lie on the red regression line. The regression line
shows a correlation between the dependent and independent variable.
• The goodness of fit of the line can be judged by computing the
difference between the actual and predicted values. As we can see in
the plot, most of the observations are close to the regression line,
hence our model fits the training set well.
Simple Linear Regression
[Plot: test-set observations (blue) with the red regression line]
• In the given plot, the observations are shown in blue, and the
predictions are given by the red regression line. As we can see, most of
the observations are close to the regression line, hence we can say that
our Simple Linear Regression model is good and able to make good
predictions.
Multiple Linear Regression
• In the previous topic, we learned about Simple Linear Regression, where
a single independent/predictor variable (X) is used to model the
response variable (Y). But there may be various cases in which the
response variable is affected by more than one predictor variable; for
such cases, the Multiple Linear Regression algorithm is used.
• Multiple Linear Regression is an extension of Simple Linear Regression,
as it takes more than one predictor variable to predict the response
variable. We can define it as:
• Multiple Linear Regression is one of the important regression
algorithms which models the linear relationship between a single
dependent continuous variable and more than one independent
variable.
Multiple Linear Regression
Example:
• Prediction of CO2 emission based on engine size and number of cylinders
in a car.
Some key points about MLR:
• For MLR, the dependent or target variable (Y) must be continuous/real,
but the predictor or independent variables may be continuous or
categorical.
• Each feature variable should have a linear relationship with the
dependent variable.
• MLR tries to fit a regression line (a hyperplane) through a
multidimensional space of data points.
Multiple Linear Regression
MLR Equation
In Multiple Linear Regression, the target variable (Y) is a linear
combination of multiple predictor variables x1, x2, x3, ..., xn. Since it is
an extension of Simple Linear Regression, the same form of equation is
applied, now with multiple predictors:
Y = b0 + b1x1 + b2x2 + b3x3 + ... + bnxn
Where,
Y = output/response variable
b0, b1, b2, ..., bn = coefficients of the model
x1, x2, x3, ..., xn = independent/feature variables
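Below is a minimal sketch of fitting an MLR model with two predictors,
echoing the earlier CO2-emission example; the engine sizes, cylinder
counts, and emission values are hypothetical illustration numbers:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical data; columns: engine size (litres), number of cylinders
X = np.array([[1.5, 4],
              [2.0, 4],
              [3.0, 6],
              [3.5, 6],
              [5.0, 8]])
y = np.array([140, 160, 210, 230, 300])  # CO2 emission (g/km), hypothetical

model = LinearRegression().fit(X, y)
print("b0 (intercept):", model.intercept_)
print("b1, b2 (coefficients):", model.coef_)

# Predict the emission for a 2.4 L, 4-cylinder engine
print(model.predict([[2.4, 4]]))
```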
Multiple Linear Regression
Assumptions for Multiple Linear Regression:
• A linear relationship should exist between the Target and predictor
variables.
• The regression residuals must be normally distributed.
• MLR assumes little or no multicollinearity (correlation between the
independent variables) in the data.
Applications of Multiple Linear Regression:
• Measuring the effectiveness of the independent variables on the prediction
• Predicting the impact of changes
Polynomial Regression
• Polynomial Regression is a regression algorithm that models the
relationship between a dependent variable (y) and an independent
variable (x) as an nth-degree polynomial. The Polynomial Regression
equation is given below:
y = b0 + b1x + b2x² + b3x³ + ... + bnxⁿ
• It is also called a special case of Multiple Linear Regression in ML,
because we add polynomial terms to the Multiple Linear Regression
equation to convert it into Polynomial Regression.
• It is a linear model with some modifications made to increase the
accuracy.
• The dataset used for training in Polynomial Regression is non-linear in
nature.
Polynomial Regression
• It makes use of a linear regression model to fit complicated, non-linear
functions and datasets.
• Hence, "In Polynomial Regression, the original features are converted
into polynomial features of the required degree (2, 3, ..., n) and then
modeled using a linear model."
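Below is a minimal sketch of this idea, assuming scikit-learn: the
features are expanded with PolynomialFeatures and then fitted with an
ordinary linear model. The data is synthetic, generated from a quadratic
plus noise:

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline

# Synthetic non-linear data: y follows a quadratic in x plus noise
rng = np.random.default_rng(0)
x = np.linspace(0, 10, 20).reshape(-1, 1)
y = 3 + 2 * x.ravel() + 0.5 * x.ravel() ** 2 + rng.normal(0, 1, 20)

# Expand x into degree-2 polynomial features, then fit a linear model
model = make_pipeline(PolynomialFeatures(degree=2), LinearRegression())
model.fit(x, y)

print(model.predict([[4.0]]))  # prediction at x = 4
```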
Polynomial Regression
Need for Polynomial Regression:
• If we apply a linear model to a linear dataset, it gives a good result, as
we saw in Simple Linear Regression. But if we apply the same model,
without any modification, to a non-linear dataset, the output will be
poor: the loss function will increase, the error rate will be high, and the
accuracy will decrease.
• So for such cases, where the data points are arranged in a non-linear
fashion, we need the Polynomial Regression model. We can understand
this better using the comparison diagram of a linear dataset and a
non-linear dataset on the next slide.
Polynomial Regression
Need for Polynomial Regression:
[Comparison diagram: a linear dataset fitted well by a straight line vs. a
non-linear dataset fitted well only by a curve]
Polynomial Regression
Need for Polynomial Regression:
• In the above image, we have a dataset that is arranged non-linearly. If
we try to cover it with a linear model, we can clearly see that it hardly
covers any of the data points. A curve, on the other hand, covers most
of the data points; that curve belongs to the Polynomial model.
• Hence, if a dataset is arranged in a non-linear fashion, we should use
the Polynomial Regression model instead of Simple Linear Regression.
Polynomial Regression
Equation of the Polynomial Regression Model:
• Simple Linear Regression equation:   y = b0 + b1x  .........(a)
• Multiple Linear Regression equation:   y = b0 + b1x1 + b2x2 + b3x3 + ... + bnxn  .........(b)
• Polynomial Regression equation:   y = b0 + b1x + b2x² + b3x³ + ... + bnxⁿ  .........(c)
When we compare the above three equations, we can clearly see that all
three are polynomial equations, differing only in the degree of the
variables. The Simple and Multiple Linear equations are polynomial
equations of degree one, while the Polynomial Regression equation is
linear in its coefficients but of degree n in x. So if we raise the degree of
our linear equation, it becomes a Polynomial Regression equation.
Evaluating Regression Models Performance
• Regression analysis is a subfield of supervised machine learning. It aims
to model the relationship between a certain number of features and a
continuous target variable. Following are the performance metrics used
for evaluating a regression model:
1. Mean Absolute Error (MAE)
2. Mean Squared Error (MSE)
3. Root Mean Squared Error (RMSE)
4. R-squared
5. Scatter plot
Evaluating Regression Models Performance
Let’s keep one thing in mind: what is an error?
Any deviation from the actual value is an error:
Error = Y(actual) − Y(predicted)
With this in mind, and having understood the need for metrics, let’s dive
into the methods we can use to evaluate our model’s performance.
Evaluating Regression Models Performance
1. Mean Absolute Error (MAE):
MAE = (1/n) Σ |yᵢ − ŷᵢ|
where yᵢ is the actual expected output, ŷᵢ is the model’s prediction, and n
is the number of observations.
It is the simplest evaluation metric for a regression scenario.
Say, yᵢ = [5, 10, 15, 20] and ŷᵢ = [4.8, 10.6, 14.3, 20.1].
Thus, MAE = 1/4 × (|5−4.8| + |10−10.6| + |15−14.3| + |20−20.1|) = 0.4
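The same calculation can be checked in Python, a minimal sketch using the
example values above:

```python
import numpy as np
from sklearn.metrics import mean_absolute_error

y_true = np.array([5, 10, 15, 20])
y_pred = np.array([4.8, 10.6, 14.3, 20.1])

# Mean of the absolute errors, computed two equivalent ways
print(np.mean(np.abs(y_true - y_pred)))     # 0.4
print(mean_absolute_error(y_true, y_pred))  # 0.4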
Evaluating Regression Models Performance
2. Mean Squared Error (MSE):
MSE = (1/n) Σ (yᵢ − ŷᵢ)²
Here, the error term is squared, which makes MSE more sensitive to
outliers than Mean Absolute Error (MAE).
For the same data,
MSE = 1/4 × ((5−4.8)² + (10−10.6)² + (15−14.3)² + (20−20.1)²) = 0.225
Evaluating Regression Models Performance
3. Root Mean Squared Error (RMSE):
Since MSE is expressed in squared error units, we take the square root of
the MSE, which gives the Root Mean Squared Error (RMSE):
RMSE = √MSE = √((1/n) Σ (yᵢ − ŷᵢ)²)
Thus, RMSE = (0.225)^0.5 ≈ 0.474
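Both values can be checked in Python on the same example data, a minimal
sketch using NumPy and scikit-learn:

```python
import numpy as np
from sklearn.metrics import mean_squared_error

y_true = np.array([5, 10, 15, 20])
y_pred = np.array([4.8, 10.6, 14.3, 20.1])

mse = mean_squared_error(y_true, y_pred)  # mean of squared errors
print("MSE:", mse)                        # 0.225
print("RMSE:", np.sqrt(mse))              # ~0.474
```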
Evaluating Regression Models Performance
4. R-Squared:
• R-squared is calculated by dividing the sum of squared residuals (SSres) of the
regression model by the total sum of squares (SStot) of the average model, and
then subtracting this ratio from 1:
R² = 1 − (SSres / SStot)
• R-squared is also known as the Coefficient of Determination. It expresses the
degree to which the input variables explain the variation of the output/
predicted variable.
• An R-squared value of 0.81 tells us that the input variables explain 81% of the
variation in the output variable. The higher the R-squared, the more variation
is explained by the input variables and the better the model.
However, this metric has a limitation, which is addressed by the
Adjusted R-squared.
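A minimal sketch of the R-squared computation, reusing the earlier example
values and checked against scikit-learn’s r2_score:

```python
import numpy as np
from sklearn.metrics import r2_score

y_true = np.array([5, 10, 15, 20])
y_pred = np.array([4.8, 10.6, 14.3, 20.1])

ss_res = np.sum((y_true - y_pred) ** 2)         # residual sum of squares
ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total sum of squares

print(1 - ss_res / ss_tot)       # manual R-squared
print(r2_score(y_true, y_pred))  # same value from scikit-learn
```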
Evaluating Regression Models Performance
5. Scatter Plot: Scatter plots are often used to identify relationships
between two variables, such as experience and salary.
Evaluating Regression Models Performance
5. Scatter Plot:
• The relationship between the two variables is called correlation; the
closer the data comes to forming a straight line, the stronger the
correlation.
• When analyzing scatter plots, the viewer also looks for the slope and
strength of the data pattern.
• Slope refers to the direction of change in one variable when the other
gets bigger.
• Strength refers to the scatter of the plot: if the points are tightly
concentrated around a line, the relationship is strong.
• Scatter plots can also show unusual features of the data set, such as
clusters, patterns, or outliers, that would be hidden if the data were
merely in a table.
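Below is a minimal sketch of drawing such a scatter plot with matplotlib;
the experience and salary values are hypothetical illustration data:

```python
import matplotlib.pyplot as plt

# Hypothetical data: years of experience vs. salary (in thousands)
experience = [1, 2, 3, 4, 5, 6, 7, 8]
salary = [30, 35, 42, 48, 52, 60, 66, 75]

plt.scatter(experience, salary)
plt.xlabel("Experience (years)")
plt.ylabel("Salary (thousands)")
plt.title("Scatter plot: experience vs. salary")
plt.show()
```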