
Regression

Regression analysis is a statistical method used to model the relationship between dependent and independent variables, enabling predictions of continuous values. It includes techniques like simple linear regression, multiple linear regression, and polynomial regression, each suited for different types of data relationships. Applications range from forecasting sales and market trends to predicting outcomes in various fields such as economics and environmental science.

Uploaded by Karuna Salgotra

REGRESSION

Regression analysis is a statistical method for modeling the relationship between a dependent (target) variable and one or more independent (predictor) variables. More specifically, regression analysis helps us understand how the value of the dependent variable changes with one independent variable while the other independent variables are held fixed. It predicts continuous/real values such as temperature, age, salary, price, etc.

Regression is a supervised learning technique which helps in finding the correlation between variables and enables us to predict a continuous output variable based on one or more predictor variables. It is mainly used for prediction, forecasting, time-series modeling, and determining causal-effect relationships between variables.
In regression, we plot a graph between the variables that best fits the given data points; using this plot, the machine learning model can make predictions about the data. In simple words, "Regression shows a line or curve that passes through the data points on a target-predictor graph in such a way that the vertical distance between the data points and the regression line is minimum." The distance between the data points and the line tells whether the model has captured a strong relationship or not.
Some examples of regression are:
o Predicting rainfall using temperature and other factors
o Determining market trends
o Predicting road accidents due to rash driving
Terminologies Related to the Regression Analysis:
o Dependent Variable: The main factor in regression analysis which we want to predict or understand is called the dependent variable. It is also called the target variable.
o Independent Variable: The factors which affect the dependent variable, or which are used to predict its values, are called independent variables, also called predictors.
o Outliers: An outlier is an observation with either a very low or very high value in comparison to the other observed values. An outlier may distort the result, so it should be handled carefully.
o Multicollinearity: If the independent variables are highly correlated with each other, the condition is called multicollinearity. It should not be present in the dataset, because it creates problems when ranking the most influential variables.
o Underfitting and Overfitting: If our algorithm works well with the training dataset but not with the test dataset, the problem is called overfitting. And if our algorithm does not perform well even on the training dataset, the problem is called underfitting.

NEED FOR REGRESSION
Regression analysis helps in the prediction of a continuous variable. There are various real-world scenarios where we need future predictions, such as weather conditions, sales, or marketing trends. For such cases we need a technique which can make predictions accurately, and regression analysis is such a statistical method, used in machine learning and data science.
o Regression estimates the relationship between the target
and the independent variable.
o It is used to find the trends in data.
o It helps to predict real/continuous values.
o By performing regression, we can determine the most important factor, the least important factor, and how each factor affects the others.
APPLICATIONS OF REGRESSION:
 Forecasting continuous outcomes like house prices, stock
prices, or sales.
 Predicting the success of future retail sales or marketing
campaigns to ensure resources are used effectively.
 Predicting customer or user trends, such as on streaming
services or ecommerce websites.
 Analyzing datasets to establish the relationships between
variables and an output.
 Predicting interest rates or stock prices from a variety of
factors.
 Creating time series visualizations.
SIMPLE LINEAR REGRESSION
o Linear regression is a statistical regression method which is
used for predictive analysis.
o It is one of the simplest and easiest algorithms; it works on regression and shows the relationship between continuous variables.
o It is used for solving the regression problem in machine
learning.
o Linear regression shows the linear relationship between the
independent variable (X-axis) and the dependent variable
(Y-axis), hence called linear regression.
o If there is only one input variable (x), then such linear
regression is called simple linear regression. And if there
is more than one input variable, then such linear regression
is called multiple linear regression.
The simple linear regression algorithm has mainly two objectives:
o Model the relationship between two variables, such as the relationship between income and expenditure, or experience and salary.
o Forecast new observations, such as forecasting the weather according to temperature, or a company's revenue according to its investments in a year.
How to perform a simple linear regression
Simple linear regression formula
The formula for a simple linear regression is:

y = β0 + β1x + ε
 y is the predicted value of the dependent variable for any given value of the independent variable (x).
 β0 is the intercept, the predicted value of y when x is 0.
 β1 is the regression coefficient – how much we expect y to change as x increases by one unit.
 x is the independent variable (the variable we expect is influencing y).
 ε is the error of the estimate, or how much variation there is in our estimate of the regression coefficient.

How to Find the Regression Equation

In the table below, the xi column shows scores on the aptitude test. Similarly, the yi column shows statistics grades. The last two columns show deviation scores – the difference between the student's score and the average score on each test. The last two rows show the sums and mean scores that we will use to conduct the regression analysis.
For each student, we also need to compute the squares of the deviation scores.
And finally, for each student, we need to compute the product of the deviation scores.

The regression equation is a linear equation of the form ŷ = b0 + b1x. To conduct a regression analysis, we need to solve for b0 and b1.
First, we solve for the regression coefficient (b1):

b1 = Σ(xi - x̄)(yi - ȳ) / Σ(xi - x̄)²

Once we know the value of the regression coefficient (b1), we can solve for the intercept (b0):

b0 = ȳ - b1x̄

Therefore, the regression equation is ŷ = 26.768 + 0.644x. For example, for an aptitude score of x = 80:

ŷ = b0 + b1x
ŷ = 26.768 + 0.644 * 80
ŷ = 26.768 + 51.52 = 78.288
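The deviation-score formulas above can be sketched in Python. The scores below are hypothetical stand-ins, since the original aptitude-test table is not reproduced here; the computation itself is the one described in the text.

```python
# Simple linear regression via deviation scores:
#   b1 = sum((xi - x_bar)(yi - y_bar)) / sum((xi - x_bar)^2)
#   b0 = y_bar - b1 * x_bar
def simple_linear_regression(xs, ys):
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sxy / sxx            # regression coefficient (slope)
    b0 = y_bar - b1 * x_bar   # intercept
    return b0, b1

# Hypothetical data (the original score table is not shown here)
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
b0, b1 = simple_linear_regression(xs, ys)
print(b0, b1)  # b0 ≈ 2.2, b1 ≈ 0.6
```

A prediction is then just `b0 + b1 * x`, mirroring ŷ = b0 + b1x above.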


MULTIPLE LINEAR REGRESSION
Multiple linear regression is one of the important regression algorithms; it models the linear relationship between a single continuous dependent variable and more than one independent variable.

Example:
Prediction of CO2 emission based on engine size and number of
cylinders in a car.
Some key points about MLR:
o For MLR, the dependent or target variable (Y) must be continuous/real, but the predictor or independent variables may be continuous or categorical.
o Each feature variable must have a linear relationship with the dependent variable.
o MLR tries to fit a regression line through a multidimensional space of data points.
ŷ = a + b1x1 + b2x2 + … + bnxn
ŷ represents the dependent variable
a represents the intercept on the dependent-variable axis
n signifies the number of independent variables
x1–xn are the independent variables
b1–bn are the coefficient parameters
Regression sum calculations:
 Σx1² = ΣX1² – (ΣX1)² / n
 Σx2² = ΣX2² – (ΣX2)² / n
 Σx1y = ΣX1y – (ΣX1)(Σy) / n
 Σx2y = ΣX2y – (ΣX2)(Σy) / n
 Σx1x2 = ΣX1X2 – (ΣX1)(ΣX2) / n
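These sums can be computed directly from the raw columns. A minimal Python sketch (the helper name `regression_sums` and the tiny dataset are ours, not from the text):

```python
# Deviation sums used in the two-predictor coefficient formulas.
def regression_sums(X1, X2, y):
    n = len(y)
    Sx1x1 = sum(a * a for a in X1) - sum(X1) ** 2 / n
    Sx2x2 = sum(a * a for a in X2) - sum(X2) ** 2 / n
    Sx1y = sum(a * b for a, b in zip(X1, y)) - sum(X1) * sum(y) / n
    Sx2y = sum(a * b for a, b in zip(X2, y)) - sum(X2) * sum(y) / n
    Sx1x2 = sum(a * b for a, b in zip(X1, X2)) - sum(X1) * sum(X2) / n
    return Sx1x1, Sx2x2, Sx1y, Sx2y, Sx1x2

# Tiny hypothetical check
print(regression_sums([1, 2, 3], [2, 4, 6], [1, 3, 5]))  # → (2.0, 8.0, 4.0, 8.0, 4.0)
```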

Example: Multiple Linear Regression

Suppose we have the following dataset with one response variable y and two predictor variables X1 and X2. Use the following steps to fit a multiple linear regression model to this dataset.
Step 1: Calculate X1², X2², X1y, X2y and X1X2.
Step 2: Calculate the regression sums.
Next, make the following regression sum calculations:
 Σx1² = ΣX1² – (ΣX1)² / n = 38,767 – (555)² / 8 = 263.875
 Σx2² = ΣX2² – (ΣX2)² / n = 2,823 – (145)² / 8 = 194.875
 Σx1y = ΣX1y – (ΣX1)(Σy) / n = 101,895 – (555 * 1,452) / 8 = 1,162.5
 Σx2y = ΣX2y – (ΣX2)(Σy) / n = 25,364 – (145 * 1,452) / 8 = -953.5
 Σx1x2 = ΣX1X2 – (ΣX1)(ΣX2) / n = 9,859 – (555 * 145) / 8 = -200.375
Step 3: Calculate b1 and b2.
The formula to calculate b1 is: [(Σx2²)(Σx1y) – (Σx1x2)(Σx2y)] / [(Σx1²)(Σx2²) – (Σx1x2)²]
Thus, b1 = [(194.875)(1,162.5) – (-200.375)(-953.5)] / [(263.875)(194.875) – (-200.375)²] = 3.148
The formula to calculate b2 is: [(Σx1²)(Σx2y) – (Σx1x2)(Σx1y)] / [(Σx1²)(Σx2²) – (Σx1x2)²]
Thus, b2 = [(263.875)(-953.5) – (-200.375)(1,162.5)] / [(263.875)(194.875) – (-200.375)²] = -1.656
Step 4: Calculate b0.
The formula to calculate b0 is: ȳ – b1x̄1 – b2x̄2.
Thus, b0 = 181.5 – 3.148(69.375) – (-1.656)(18.125) = -6.867


Step 5: Place b0, b1, and b2 in the estimated linear regression
equation.
The estimated linear regression equation is: ŷ = b0 + b1*x1 +
b2*x2
In our example, it is ŷ = -6.867 + 3.148x1 – 1.656x2
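As a check, the coefficient formulas can be evaluated in Python from the regression sums given in the example (a verification sketch, reproducing the text's numbers):

```python
# Two-predictor MLR coefficients from the sums computed in Step 2.
Sx1x1, Sx2x2 = 263.875, 194.875
Sx1y, Sx2y = 1162.5, -953.5
Sx1x2 = -200.375
n, sum_y, sum_x1, sum_x2 = 8, 1452, 555, 145

denom = Sx1x1 * Sx2x2 - Sx1x2 ** 2
b1 = (Sx2x2 * Sx1y - Sx1x2 * Sx2y) / denom
b2 = (Sx1x1 * Sx2y - Sx1x2 * Sx1y) / denom
# b0 = y_bar - b1 * x1_bar - b2 * x2_bar
b0 = sum_y / n - b1 * (sum_x1 / n) - b2 * (sum_x2 / n)
print(round(b0, 3), round(b1, 3), round(b2, 3))  # -6.867 3.148 -1.656
```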
POLYNOMIAL REGRESSION
In polynomial regression, the relationship between the independent variable x and the dependent variable y is described as an nth-degree polynomial in x.
Polynomial regression is needed when no linear relationship fits all the variables, so the fitted model looks like a curve rather than a straight line.
TYPES OF POLYNOMIAL REGRESSION

1. Linear – if the degree is 1

2. Quadratic – if the degree is 2

3. Cubic – if the degree is 3, and so on, based on the degree.

MATHEMATICAL EQUATION:
y = a + a1x + a2x² + … + anxⁿ

Let the quadratic polynomial regression model be

y = a + a1x + a2x²

The values of a, a1, and a2 are calculated using the following system of normal equations:

Σy = na + a1Σx + a2Σx²
Σxy = aΣx + a1Σx² + a2Σx³
Σx²y = aΣx² + a1Σx³ + a2Σx⁴

First, we calculate the required sums and note them in the following table.
Using the given data and solving this system of equations, we get

a = 12.4285714
a1 = -5.5128571
a2 = 0.7642857

The required quadratic polynomial model is

y = 12.4285714 - 5.5128571x + 0.7642857x²
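The normal-equation approach above can be sketched in Python. Since the original data table is not reproduced, the data below is hypothetical, generated from a known quadratic; the helper name `quadratic_fit` is ours.

```python
# Fit y = a + a1*x + a2*x^2 by solving the three normal equations:
#   Σy   = n·a   + a1·Σx  + a2·Σx²
#   Σxy  = a·Σx  + a1·Σx² + a2·Σx³
#   Σx²y = a·Σx² + a1·Σx³ + a2·Σx⁴
def quadratic_fit(xs, ys):
    n = len(xs)
    S = lambda k: sum(x ** k for x in xs)                     # Σx^k
    T = lambda k: sum((x ** k) * y for x, y in zip(xs, ys))   # Σx^k·y
    # Augmented matrix [coefficients | right-hand side]
    M = [
        [n,    S(1), S(2), T(0)],
        [S(1), S(2), S(3), T(1)],
        [S(2), S(3), S(4), T(2)],
    ]
    # Gauss-Jordan elimination
    for i in range(3):
        pivot = M[i][i]
        M[i] = [v / pivot for v in M[i]]
        for j in range(3):
            if j != i:
                factor = M[j][i]
                M[j] = [vj - factor * vi for vj, vi in zip(M[j], M[i])]
    return M[0][3], M[1][3], M[2][3]  # a, a1, a2

# Hypothetical data generated from y = 1 + 2x + 3x²
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1 + 2 * x + 3 * x ** 2 for x in xs]
a, a1, a2 = quadratic_fit(xs, ys)
print(round(a, 6), round(a1, 6), round(a2, 6))  # recovers 1.0 2.0 3.0
```

Because the data lie exactly on a quadratic, the fit recovers the generating coefficients.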

NEED FOR POLYNOMIAL REGRESSION

o If we apply a linear model to a linear dataset, it gives a good result, as we have seen in simple linear regression. But if we apply the same model, without any modification, to a non-linear dataset, it produces poor results: the loss function increases, the error rate is high, and accuracy decreases.
o So for such cases, where the data points are arranged in a non-linear fashion, we need the polynomial regression model. We can understand this better by comparing a linear dataset and a non-linear dataset.
o For a dataset which is arranged non-linearly, a line fitted by a linear model hardly covers any data points. On the other hand, a curve – the polynomial model – is able to cover most of the data points.
o Hence, if a dataset is arranged in a non-linear fashion, we should use the polynomial regression model instead of simple linear regression.
ADVANTAGES OF POLYNOMIAL REGRESSION
 You can model non-linear relationships between variables.
 There is a large range of different functions that you can use for fitting.
 Good for exploration purposes: you can test for the presence of curvature and its inflections.
 It is a flexible tool that can be used to fit a large variety of data point distributions.
DISADVANTAGES OF POLYNOMIAL REGRESSION
 Even a single outlier in the data plot can seriously mess up the
results.
 PR models are prone to overfitting. If enough parameters are
used, you can fit anything. As John von Neumann reportedly
said: “with four parameters I can fit an elephant, with five I can
make him wiggle his trunk.”
 As a consequence of the previous point, PR models might not generalize well outside of the data used to fit them.
POLYNOMIAL REGRESSION IS USED IN:
 Death rate prediction
 Tissue growth rate prediction
 Speed regulation software
