MACHINE LEARNING(BCS602)
MODULE 3
CHAPTER 5
REGRESSION ANALYSIS
5.1 Introduction to Regression
Regression analysis is a set of machine learning methods that predict a continuous outcome variable (y) based on the value of one or more predictor variables (x).
OR
Regression analysis is a statistical method for modelling the relationship between a dependent (target) variable and one or more independent (predictor) variables.
Regression is a supervised learning technique which helps in finding the correlation between
variables.
It is mainly used for prediction, forecasting, time series modelling, and determining the cause-and-effect relationship between variables.
Regression fits a line or curve through the data points on a target-predictor graph in such a way that the vertical distance between the data points and the regression line is minimized. This distance indicates whether the model has captured a strong relationship or not.
• The function of regression analysis is given by:
y = f(x)
Here, y is called the dependent variable and x is called the independent variable.
Applications of Regression Analysis
➢ Sales of goods or services
➢ Value of bonds in portfolio management
➢ Premiums set by insurance companies
➢ Yield of crop in agriculture
➢ Prices of real estate
5.2 INTRODUCTION TO LINEARITY, CORRELATION AND CAUSATION
A correlation is a statistical summary of the relationship between two variables. It is a core part of exploratory data analysis and a critical aspect of numerous advanced machine learning techniques.
The correlation between two variables can be visualized using a scatter plot.
There are different types of correlation:
Positive Correlation: Two variables are said to be positively correlated when their values move in the same direction: as the value of X increases, so does the value of Y, at a constant rate.
Negative Correlation: Variables X and Y are negatively correlated when their values change in opposite directions: as the value of X increases, the value of Y decreases at a constant rate.
Neutral Correlation: There is no relationship between the changes in variables X and Y. In this case, the values are completely random and do not show any sign of correlation.
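The strength and direction of such relationships are commonly quantified with Pearson's correlation coefficient, which ranges from -1 (perfect negative) through 0 (no correlation) to +1 (perfect positive). A minimal sketch with made-up data, assuming NumPy is available:

import numpy as np

# Hypothetical data for illustration
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y_pos = 2 * x + 1                       # moves with x: positive correlation
y_neg = -3 * x + 10                     # moves against x: negative correlation

r_pos = np.corrcoef(x, y_pos)[0, 1]     # +1.0, perfectly positive
r_neg = np.corrcoef(x, y_neg)[0, 1]     # -1.0, perfectly negative
print(r_pos, r_neg)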
Causation
Causation is about a relationship between two variables in which x causes y; this is written as x implies y.
Regression is different from causation. Causation indicates that one event is the result of the
occurrence of the other event; i.e. there is a causal relationship between the two events.
Linear and Non-Linear Relationships
The relationship between the input features (variables) and the output (target) variable is fundamental. Whether this relationship is linear or non-linear has significant implications for the choice of algorithm, model complexity, and predictive performance.
A linear relationship creates a straight line when plotted on a graph, whereas a non-linear relationship does not create a straight line but instead creates a curve.
Example:
Linear: the relationship between the hours spent studying and the grades obtained in a class.
Non-linear: the relationship between a vehicle's speed and its braking distance, which grows roughly quadratically rather than in a straight line.
Linearity:
Linear Relationship: A linear relationship between variables means that a change in one
variable is associated with a proportional change in another variable. Mathematically, it can be
represented as y = a * x + b, where y is the output, x is the input, and a and b are constants.
Linear Models: The goal is to find the best-fitting line (a plane or hyperplane in higher dimensions) to the data points. Linear models are interpretable and work well when the relationship between variables is close to being linear.
Limitations: Linear models may perform poorly when the relationship between variables is
non-linear. In such cases, they may underfit the data, meaning they are too simple to capture
the underlying patterns.
Non-Linearity:
Non-Linear Relationship: A non-linear relationship implies that the change in one variable is
not proportional to the change in another variable. Non-linear relationships can take various
forms, such as quadratic, exponential, logarithmic, or arbitrary shapes.
Non-Linear Models: Machine learning models like decision trees, random forests, support
vector machines with non-linear kernels, and neural networks can capture non-linear
relationships. These models are more flexible and can fit complex data patterns.
Benefits: Non-linear models can perform well when the underlying relationships in the data
are complex or when interactions between variables are non-linear. They have the capacity to
capture intricate patterns.
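As a small sketch of this flexibility, a decision tree can fit a curved pattern that a straight line cannot. The data below is made up, and scikit-learn is assumed to be available:

import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Toy non-linear data: y is quadratic in x
X = np.linspace(-3, 3, 50).reshape(-1, 1)
y = X.ravel() ** 2

tree = DecisionTreeRegressor(max_depth=4)   # piecewise, non-linear fit
tree.fit(X, y)
print(tree.predict([[2.0]]))                # close to the true value 4.0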
Types of Regression
Linear Regression:
Single Independent Variable: Linear regression, also known as simple linear regression, is
used when there is a single independent variable (predictor) and one dependent variable
(target).
Equation: The linear regression equation takes the form: Y = β0 + β1X + ε,
Where
Y is the dependent variable,
X is the independent variable,
β0 is the intercept,
β1 is the slope (coefficient), and
ε is the error term.
Linear regression is used to establish a linear relationship between two variables and make
predictions based on this relationship. It's suitable for simple scenarios where there's only one
predictor.
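A minimal sketch of fitting this equation with NumPy on made-up data; np.polyfit with degree 1 returns the slope (β1) and intercept (β0) of the least-squares line:

import numpy as np

# Toy data: Y is roughly linear in X with some noise
X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
Y = np.array([2.1, 3.9, 6.2, 8.0, 9.8])

b1, b0 = np.polyfit(X, Y, deg=1)   # slope and intercept
print(b0, b1)                      # intercept near 0, slope near 2
print(b0 + b1 * 6.0)               # prediction for a new X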
Multiple Regression:
Multiple Independent Variables: Multiple regression, as the name suggests, is used when there
are two or more independent variables (predictors) and one dependent variable (target).
Equation: The multiple regression equation extends the concept to multiple predictors:
Y = β0 + β1X1 + β2X2 + ... + βnXn + ε,
where
Y is the dependent variable,
X1, X2, ..., Xn are the independent variables,
β0 is the intercept, β1, β2, ..., βn are the coefficients, and
ε is the error term.
Multiple regression allows you to model the relationship between the dependent variable and
multiple predictors simultaneously. It's used when there are multiple factors that may influence
the target variable, and you want to understand their combined effect and make predictions
based on all these factors.
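A minimal sketch using NumPy's least-squares solver on made-up data with two predictors; a leading column of ones is added so the solver also estimates the intercept β0:

import numpy as np

# Toy data: two predictors X1, X2 and one target Y
X = np.array([[1.0, 2.0],
              [2.0, 1.0],
              [3.0, 4.0],
              [4.0, 3.0],
              [5.0, 5.0]])
Y = np.array([7.0, 6.0, 14.0, 13.0, 18.0])

A = np.column_stack([np.ones(len(X)), X])      # rows of [1, X1, X2]
coef, *_ = np.linalg.lstsq(A, Y, rcond=None)   # [b0, b1, b2]
print(coef)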
Polynomial Regression:
Polynomial regression is an extension of linear regression (fitted as a multiple regression on powers of X) used when the relationship between the independent and dependent variables is non-linear.
Equation: The polynomial regression equation allows for higher-order terms, such as quadratic
or cubic terms: Y = β0 + β1X + β2X^2 + ... + βnX^n + ε. This allows the model to fit a curve
rather than a straight line.
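A minimal sketch with NumPy on made-up quadratic data; np.polyfit returns the coefficients from the highest power down:

import numpy as np

# Toy data following a quadratic trend
X = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
Y = np.array([4.1, 0.9, 0.1, 1.2, 3.8])

b2, b1, b0 = np.polyfit(X, Y, deg=2)   # fits Y = b2*X^2 + b1*X + b0
print(b0, b1, b2)
print(np.polyval([b2, b1, b0], 1.5))   # predict at X = 1.5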
Logistic Regression:
Use: Logistic regression is used when the dependent variable is binary (0 or 1). It models the
probability of the dependent variable belonging to a particular class.
Equation: Logistic regression uses the logistic function (sigmoid function) to model
probabilities: P(Y=1) = 1 / (1 + e^(-z)),
where z is a linear combination of the independent variables: z = β0 + β1X1 + β2X2 + ... + βnXn. A threshold (commonly 0.5) then converts this probability into a binary outcome.
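A minimal sketch of the sigmoid transformation; the coefficient values below are hypothetical and chosen only to illustrate the formula:

import numpy as np

def sigmoid(z):
    # Logistic function: maps any real z to a probability in (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical fitted coefficients for z = b0 + b1*X1 + b2*X2
b0, b1, b2 = -1.5, 0.8, 0.4
x1, x2 = 2.0, 1.0

p = sigmoid(b0 + b1 * x1 + b2 * x2)   # P(Y = 1)
print(p, int(p >= 0.5))               # probability and thresholded class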
Limitations of Regression
1. Outliers - Outliers are abnormal data points. They can bias the outcome of the regression model, as outliers pull the regression line towards them.
2. Number of cases - The ratio of cases (samples) to independent variables should be at least 20:1; that is, for every explanatory variable there should be at least 20 samples. In extreme cases, at least five samples per variable are required.
3. Missing data - Missing data in training data can make the model unfit for the sampled data.
4. Multicollinearity - If explanatory variables are highly correlated (0.9 and above), the regression is vulnerable to bias. Singularity means a perfect correlation of 1. The remedy is to remove one of the explanatory variables exhibiting such high correlation; if there is a tie, the tolerance (1 - R squared) of each variable is used to decide which one to eliminate.
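A quick way to screen for multicollinearity before fitting is to inspect the correlation matrix of the predictors. A minimal sketch with made-up data, applying the 0.9 rule of thumb from above:

import numpy as np

# Toy predictors: x3 is almost an exact multiple of x1, so they are collinear
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
x2 = np.array([5.0, 3.0, 6.0, 2.0, 7.0])
x3 = 2.0 * x1 + 0.01

corr = np.corrcoef([x1, x2, x3])   # pairwise correlation matrix
print(np.round(corr, 2))           # |corr(x1, x3)| of about 1 flags a problem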
5.3 INTRODUCTION TO LINEAR REGRESSION
A linear regression model can be created by fitting a line among the scattered data points. The line is of the form:
y = a0 + a1x + e
where a0 is the intercept, a1 is the slope of the line, and e is the error term.
The assumptions of linear regression are listed as follows:
1. The observations (y) are random and are mutually independent.
2. The difference between the predicted and true values is called the error. The errors are also mutually independent and have the same distribution, such as a normal distribution with zero mean and constant variance.
3. The distribution of the error term is independent of the joint distribution of explanatory
variables.
4. The unknown parameters of the regression models are constants.
Ordinary Least Square Approach
The ordinary least squares (OLS) algorithm is a method for estimating the parameters of a
linear regression model. Aim: To find the values of the linear regression model's parameters
(i.e., the coefficients) that minimize the sum of the squared residuals.
In mathematical terms, this can be written as: Minimize ∑(yi – ŷi)^2
where yi is the actual value, ŷi is the predicted value.
A linear regression model used for determining the value of the response variable, ŷ, can be
represented as the following equation.
y = b0 + b1x1 + b2x2 + … + bnxn + e
• where: y is the dependent variable, b0 is the intercept, and e is the error term
• b1, b2, …, bn are the coefficients of the independent variables x1, x2, …, xn
The coefficients b1, b2, …, bn are the regression coefficients. The OLS method estimates the unknown parameters (b1, b2, …, bn) by minimizing the sum of squared residuals (RSS), also termed the sum of squared errors (SSE).
This method is also known as the least-squares method for regression or linear regression.
Mathematically, the line equations for the points are:
y1 = (a0 + a1x1) + e1
y2 = (a0 + a1x2) + e2
…
yn = (a0 + a1xn) + en
In general, ei = yi - (a0 + a1xi).
Linear Regression Example
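As an illustrative sketch with toy data, the least-squares line y = a0 + a1x can be computed directly from the closed-form estimates a1 = Σ(xi - x̄)(yi - ȳ) / Σ(xi - x̄)^2 and a0 = ȳ - a1x̄:

import numpy as np

# Toy data (made up for illustration)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([1.2, 1.9, 3.2, 3.8, 5.1])

x_mean, y_mean = x.mean(), y.mean()

# Closed-form least-squares estimates
a1 = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
a0 = y_mean - a1 * x_mean

y_hat = a0 + a1 * x                    # fitted values
sse = np.sum((y - y_hat) ** 2)         # sum of squared errors (SSE)
print(a0, a1, sse)

The same coefficients are obtained from the matrix form shown in the next section.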
Linear Regression in Matrix Form
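In matrix form, the model is written as y = Xβ + e, where X is the design matrix (one row per observation, with a leading column of ones for the intercept), and the OLS estimate is β = (X^T X)^(-1) X^T y. A minimal NumPy sketch with the same toy data as above:

import numpy as np

# Toy data: one predictor plus an intercept column of ones
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([1.2, 1.9, 3.2, 3.8, 5.1])
X = np.column_stack([np.ones_like(x), x])   # design matrix [1, x]

# Normal equations: beta = (X^T X)^(-1) X^T y
beta = np.linalg.inv(X.T @ X) @ X.T @ y
print(beta)                                 # [b0, b1]

In practice, np.linalg.lstsq (or np.linalg.solve on the normal equations) is preferred over an explicit matrix inverse for numerical stability.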