Linear Regression
Objectives
• What is machine learning
• Types of data and terminology
• Types of machine learning
• Supervised learning
• Linear regression
• Least squares and gradient descent
• Hands-on: implementing linear regression from scratch
Machine Learning
• Machine learning is the science of making computers learn from data,
without being explicitly programmed, and improve over time in an
autonomous fashion.
• This learning comes from feeding them data in the form of
observations and real-world interactions.
• Machine learning can also be defined as a tool to predict future
events or values using past data.
Types of Data
• Based on values
  • Continuous data (e.g., age: 0-100)
  • Categorical data (e.g., gender: male/female)
• Based on pattern
  • Structured data (e.g., databases)
  • Unstructured data (e.g., audio, video, text)
Types of Data - continued
• Labelled data consists of input-output pairs: for every set of
input features, the output/response/label is present in the
dataset (e.g., images labelled as cat or dog photos).

  {(x1, y1), (x2, y2), (x3, y3), ..., (xn, yn)}

• Unlabelled data has no output/response/label for the input
features (e.g., news articles, tweets, audio).

  {x1, x2, x3, ..., xn}
Types of Data - continued
• Training data: sample data points used to train the machine
learning model.
• Test data: sample data points used to test the performance of the
machine learning model.
Note: for modelling, the original dataset is typically partitioned into
training and test data in a ratio such as 70:30 or 75:25.
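As a sketch of such a split in pure Python (the 70:30 ratio and the fixed seed are illustrative choices, not prescribed by these slides):

```python
import random

def train_test_split(data, train_frac=0.7, seed=0):
    """Shuffle the data and split it into train/test portions (e.g. 70:30)."""
    rng = random.Random(seed)   # fixed seed for reproducibility
    shuffled = data[:]          # copy so the original list is untouched
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_frac)
    return shuffled[:cut], shuffled[cut:]

train, test = train_test_split(list(range(10)))
print(len(train), len(test))  # → 7 3
```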
Types of Machine Learning
Machine Learning
• Supervised Learning
  • Regression
  • Classification
• Unsupervised Learning
  • Clustering
  • PCA
• Reinforcement Learning
Supervised Learning
• A class of machine learning that works on externally supplied instances in the
form of predictor attributes and associated target values.
• The model learns from the training data using these target variables as
reference variables.
• Example: a model to predict the resale value of a car based on its mileage,
age, colour, etc.
• The target values are the "correct answers" for the predictor model, which can
be either a regression model or a classification model.
Motivation for learning
• It is assumed that there exists a relationship/association between the input
features and the target variable.
• The relationship can be observed by plotting a scatter plot of the two
variables.
• The strength of the relationship can be quantified by calculating the
correlation between the two variables:

  corr(x, y) = cov(x, y) / sqrt(var(x) · var(y))
             = Σi (xi − x̄)(yi − ȳ) / sqrt( Σi (xi − x̄)² · Σi (yi − ȳ)² )
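A minimal pure-Python sketch of this correlation formula, evaluated on the age/blood-pressure sample used later in these slides:

```python
import math

def correlation(xs, ys):
    """Pearson correlation: cov(x, y) / sqrt(var(x) * var(y))."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    var_x = sum((x - mean_x) ** 2 for x in xs) / n
    var_y = sum((y - mean_y) ** 2 for y in ys) / n
    return cov / math.sqrt(var_x * var_y)

ages = [56, 49, 72, 38, 63, 47]             # X
pressures = [147, 145, 160, 115, 130, 128]  # Y
print(round(correlation(ages, pressures), 3))  # → 0.759
```

A strong positive correlation like this is what motivates fitting a linear model to the pair.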
Linear Regression
• Linear regression is a way to identify a relationship between two or more
variables and use that relationship to predict values of one variable for
given value(s) of the other variable(s).
• Linear regression assumes the relationship between the variables can be
modelled by a linear equation, i.e. the equation of a line:

  y = w0 + w1·X

where y is the dependent (regressed) variable, X is the independent (regressor)
variable, w0 is the intercept, and w1 is the slope.
Multiple Regression
• The last slide showed the linear regression model with one independent and one
dependent variable.
• In the real world a data point has many important attributes, and they need to
be catered for while developing a regression model (many independent variables
and one dependent variable):

  y = w0 + w1·x1 + w2·x2 + w3·x3 + ... + wd·xd
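The multiple-regression prediction above is just an intercept plus a weighted sum of features. A sketch in Python (the weight and feature values are hypothetical, purely for illustration):

```python
def predict(weights, features):
    """Multiple regression prediction: y = w0 + w1*x1 + ... + wd*xd.
    weights = [w0, w1, ..., wd]; features = [x1, ..., xd]."""
    return weights[0] + sum(w * x for w, x in zip(weights[1:], features))

# Hypothetical example: w0 = 2.0, w1 = 0.5, w2 = -1.0; x1 = 4.0, x2 = 3.0.
print(predict([2.0, 0.5, -1.0], [4.0, 3.0]))  # → 1.0
```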
Regression - Problem Formulation
Suppose you are given the following data:

  Age in Years (X) | Blood Pressure (Y)
  -----------------|-------------------
        56         |        147
        49         |        145
        72         |        160
        38         |        115
        63         |        130
        47         |        128

[Scatter plot: Blood Pressure (Y) vs Age in Years (X)]
Linear Regression
• For the given example the linear regression is modelled as:

  BloodPressure (y) = w0 + w1 · AgeInYears (X)

or simply y = w0 + w1·X (the equation of a line), with w0 the intercept on the
Y-axis and w1 the slope of the line.

Blood Pressure: dependent variable
Age in Years: independent variable
Linear Regression - Best Fit Line
• Regression uses a line to show the trend of the distribution.
• There can be many lines that try to fit the data points in the scatter diagram.
• The aim is to find the best fit line.

[Scatter plot: Blood Pressure (Y) vs Age in Years (X)]
What is the Best Fit Line
• The best fit line tries to explain the variance in the given data
(it minimizes the total residual/error).
Linear Regression - Methods to Get the Best Fit Line
• Least squares
• Gradient descent
Linear Regression - Least Squares
Model: Y = w0 + w1·X
Task: estimate the values of w0 and w1.
According to the principle of least squares, the normal equations to solve for
w0 and w1 are:

  Σi Yi = n·w0 + w1 · Σi Xi                  ..........(1)

  Σi Xi·Yi = w0 · Σi Xi + w1 · Σi Xi²        ..........(2)

(all sums run over i = 1, ..., n)
Linear Regression - Least Squares
Dividing equation (1) by n (the number of sample points) we get:

  (1/n) Σi Yi = w0 + w1 · (1/n) Σi Xi

or

  ȳ = w0 + w1·x̄ ..........(3)

So the line of regression always passes through the point (x̄, ȳ).
Linear Regression - Least Squares
Now we know:

  cov(x, y) = (1/n) Σi xi·yi − x̄·ȳ, i.e. (1/n) Σi xi·yi = cov(x, y) + x̄·ȳ ..........(4)

and

  var(x) = (1/n) Σi xi² − x̄²  and  var(y) = (1/n) Σi yi² − ȳ² ..........(5)

Dividing equation (2) by n and using equations (4) and (5) we get:

  cov(x, y) + x̄·ȳ = w0·x̄ + w1·(var(x) + x̄²) ..........(6)
Linear Regression - Least Squares
Now, using the equations

  ȳ = w0 + w1·x̄

and

  cov(x, y) + x̄·ȳ = w0·x̄ + w1·(var(x) + x̄²)

and substituting w0 = ȳ − w1·x̄ into the second, we get:

  w1 = cov(x, y) / var(x)   and   w0 = ȳ − w1·x̄
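Using the age/blood-pressure sample from the earlier slide, a minimal from-scratch sketch of these closed-form formulas:

```python
def least_squares_fit(xs, ys):
    """Closed-form simple linear regression:
    w1 = cov(x, y) / var(x),  w0 = ybar - w1 * xbar."""
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    cov = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / n
    var_x = sum((x - xbar) ** 2 for x in xs) / n
    w1 = cov / var_x
    w0 = ybar - w1 * xbar
    return w0, w1

ages = [56, 49, 72, 38, 63, 47]
pressures = [147, 145, 160, 115, 130, 128]
w0, w1 = least_squares_fit(ages, pressures)
print(round(w0, 2), round(w1, 2))  # → 82.84 1.01
```

Note the fitted line passes through (x̄, ȳ), as equation (3) requires.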
Performance metric for least-squares regression

  R² = 1 − [ (1/n) Σi (yi − ŷi)² ] / [ (1/n) Σi (yi − ȳ)² ]

  R²_adj = 1 − (1 − R²)(n − 1) / (n − k − 1)

where ŷi is the predicted value, n is the number of samples, and k is the
number of predictors.
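A sketch of both metrics in pure Python; the `preds` values below are the (rounded) predictions of the least-squares line fitted to the age/blood-pressure sample from the earlier slides:

```python
def r_squared(ys, preds):
    """Coefficient of determination: R^2 = 1 - SS_res / SS_tot."""
    ybar = sum(ys) / len(ys)
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, preds))
    ss_tot = sum((y - ybar) ** 2 for y in ys)
    return 1 - ss_res / ss_tot

def adjusted_r_squared(r2, n, k):
    """Adjusted R^2, penalizing the number of predictors k."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

ys    = [147, 145, 160, 115, 130, 128]
preds = [139.35, 132.29, 155.49, 121.19, 146.41, 130.27]
r2 = r_squared(ys, preds)
print(round(r2, 3), round(adjusted_r_squared(r2, n=6, k=1), 3))
```

For simple linear regression, R² equals the square of the correlation computed earlier (0.759² ≈ 0.576).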
Linear Regression - Gradient Descent
Model: Y = w0 + w1·X
Task: estimate the values of w0 and w1.
Define the cost function:

  cost(w0, w1) = (1/n) Σi (yi − ŷi)²

Objective of gradient descent:

  min over (w0, w1) of cost(w0, w1) = (1/n) Σi (yi − (w0 + w1·xi))²
Linear Regression - Gradient Descent
Model: Y = w0 + w1·X
Task: estimate the values of w0 and w1.
The objective

  min over (w0, w1) of cost(w0, w1) = (1/n) Σi (yi − (w0 + w1·xi))²

can be pictured as finding the minimum of a convex cost surface.

[Plot: cost(w0, w1) versus w0, a convex curve with a single minimum]
Linear Regression - Gradient Descent
• Gradient descent works in the following steps:
1. Initialize the parameters to some random values.
2. Calculate the gradient of the cost function w.r.t. the parameters.
3. Update the parameters by moving in the direction opposite to the gradient.
4. Repeat steps 2 and 3 for a fixed number of iterations or until the cost
reaches its minimum value.
Linear Regression - Gradient Descent

  cost(w0, w1) = (1/n) Σi (yi − (w0 + w1·xi))²

Calculating the gradients of the cost function:

  grad_w0 = ∂cost(w0, w1)/∂w0 = (2/n) Σi (yi − (w0 + w1·xi)) · (−1)

  grad_w1 = ∂cost(w0, w1)/∂w1 = (2/n) Σi (yi − (w0 + w1·xi)) · (−xi)

Parameter update:

  w0 = w0 − learning_rate · grad_w0
  w1 = w1 − learning_rate · grad_w1
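The four steps and the gradient formulas above can be sketched in pure Python. The learning rate and iteration count below are illustrative choices tuned for this small, unscaled dataset; zeros are used instead of random initialization for reproducibility:

```python
def gradient_descent(xs, ys, learning_rate=0.0003, iterations=200000):
    """Fit y = w0 + w1*x by gradient descent on the mean squared error."""
    n = len(xs)
    w0, w1 = 0.0, 0.0  # step 1: initialize (zeros here, for reproducibility)
    for _ in range(iterations):
        # step 2: gradients of cost(w0, w1) = (1/n) * sum (y - (w0 + w1*x))^2
        grad_w0 = (2 / n) * sum((y - (w0 + w1 * x)) * (-1) for x, y in zip(xs, ys))
        grad_w1 = (2 / n) * sum((y - (w0 + w1 * x)) * (-x) for x, y in zip(xs, ys))
        # step 3: move opposite to the gradient
        w0 -= learning_rate * grad_w0
        w1 -= learning_rate * grad_w1
    return w0, w1

ages = [56, 49, 72, 38, 63, 47]
pressures = [147, 145, 160, 115, 130, 128]
w0, w1 = gradient_descent(ages, pressures)
print(round(w0, 2), round(w1, 2))  # approaches the least-squares w0 ≈ 82.8, w1 ≈ 1.01
```

Because the intercept direction converges slowly on unscaled features, many iterations (or feature standardization) are needed to match the closed-form solution.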
Performance metric for gradient-based regression
Root Mean Square Error (RMSE) is the standard deviation of the prediction errors:

  RMSE = sqrt( (1/n) Σi (yi − ŷi)² )

Mean Absolute Error (MAE) is the average absolute difference between the
predicted and actual values:

  MAE = (1/n) Σi |yi − ŷi|
Thank you!
Let's see the hands-on...