Diploma in Data Analytics
Lesson 3: Linear regression
Lesson Objectives
• Introduction to linear regression
• Correlation
• Vectors & factors
Lesson 3
Intro to linear regression
Regression analysis
Method for determining the relationship that best fits the observed data
Regression analysis
$Y = \alpha + \beta X + \varepsilon$
𝑌 = dependent variable
𝑋 = independent variable
𝛼 = intercept
𝛽 = slope
𝜀 = random error term
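To make the roles of the intercept, slope and error term concrete, here is a minimal R sketch that simulates data from a straight-line model. The values of alpha, beta and the error standard deviation are assumptions chosen only for illustration.

```r
# Simulate y as intercept + slope * x + random error (illustrative values)
set.seed(1)                                  # make the random draws reproducible
alpha <- 2                                   # assumed intercept
beta  <- 0.5                                 # assumed slope
x     <- 1:50                                # independent variable
eps   <- rnorm(length(x), mean = 0, sd = 1)  # random error term
y     <- alpha + beta * x + eps              # dependent variable

head(data.frame(x, y))                       # first few simulated observations
```

Removing eps from y leaves the exact straight line alpha + beta * x, which is the part a regression tries to recover.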
Types of regression
• Linear
• Logistic
• Polynomial
• Lasso
• Ridge
• Random forest
Did you know?
Legendre and Gauss published what is known as the earliest form of linear regression, the method of least squares, in the early 1800s
Intro to simple linear regression
• Foundation for more complex and modern techniques
• Models the relationship between a dependent and an independent variable through a linear equation
• Uses the least squares method to fit the line of best fit
Simple linear regression
$\hat{y}_i = a + b x_i$
ŷᵢ = fitted or predicted value
𝑎 = estimate of α, intercept
𝑏 = estimate of β, slope
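The estimates a and b can be computed by hand with the standard least squares formulas (b is SSxy divided by SSx, and a is the mean of y minus b times the mean of x). The sketch below does this on a small made-up data set and compares the result with lm(); the data values are hypothetical.

```r
# Hypothetical data, invented for illustration
x <- c(1, 2, 3, 4, 5)
y <- c(2.1, 3.9, 6.2, 7.8, 10.1)

# Sums of squares and cross-products around the means
ss_xy <- sum((x - mean(x)) * (y - mean(y)))
ss_x  <- sum((x - mean(x))^2)

b <- ss_xy / ss_x             # least squares estimate of the slope
a <- mean(y) - b * mean(x)    # least squares estimate of the intercept
y_hat <- a + b * x            # fitted (predicted) values

# lm() should return the same two estimates
fit <- lm(y ~ x)
coef(fit)
c(a = a, b = b)
```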
Scatterplot
Source: http://onlinestatbook.com/2/regression/intro.html
Line of best fit
Source: http://onlinestatbook.com/2/regression/intro.html
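A scatterplot with a line of best fit can be drawn in base R along these lines. This is a sketch on hypothetical data, not the data behind the figures cited above.

```r
# Hypothetical data, invented for illustration
x <- c(1, 2, 3, 4, 5)
y <- c(2.1, 3.9, 6.2, 7.8, 10.1)

# Scatterplot of the raw observations
plot(x, y, main = "Scatterplot with line of best fit", xlab = "x", ylab = "y")

# Fit the regression and overlay the fitted line on the existing plot
fit <- lm(y ~ x)
abline(fit)
```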
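Linear regression in R
> lm()                         # function that fits a linear model
> lm.fit <- lm(y ~ x, data)    # regress y on x and store the fitted model
> lm.fit                       # print the estimated coefficients
> summary(lm.fit)              # standard errors, p-values and R-squared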
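Put together, a complete session might look like the sketch below. The data frame and its values are hypothetical; the object name lm.fit follows the slide and is just a variable name.

```r
# Hypothetical data frame, invented for illustration
data <- data.frame(x = c(1, 2, 3, 4, 5),
                   y = c(2.1, 3.9, 6.2, 7.8, 10.1))

lm.fit <- lm(y ~ x, data = data)   # fit the simple linear regression
lm.fit                             # print intercept and slope
summary(lm.fit)                    # full summary of the fit
coef(lm.fit)                       # extract the estimates as a named vector

# Predict y for a new value of x
new_point <- data.frame(x = 6)
pred <- predict(lm.fit, newdata = new_point)
pred
```

predict() applies the fitted equation to the new x value, so the result equals the intercept plus the slope times 6.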
Correlation
Correlation coefficient
• Measure of linear association between X and Y in a normal population
• -1 ≤ corr coeff. ≤ 1
Understanding correlation
• Positive correlation
• Corr coeff. > 0
• Negative correlation
• Corr coeff. < 0
• No correlation
• Corr coeff. = 0
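The three cases can be reproduced with cor() on small made-up vectors. The data below are hypothetical and chosen so that each case is easy to see.

```r
x <- c(1, 2, 3, 4, 5)

y_pos  <- c(2.1, 3.9, 6.2, 7.8, 10.1)   # tends to rise as x rises
y_neg  <- rev(y_pos)                     # tends to fall as x rises
y_none <- (x - 3)^2                      # symmetric around x = 3, no linear trend

cor(x, y_pos)    # positive correlation coefficient
cor(x, y_neg)    # negative correlation coefficient
cor(x, y_none)   # correlation coefficient of zero
```

In the last case y still depends on x, but not along a straight line, so the correlation coefficient does not pick it up.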
Population correlation coefficient
$\rho = \frac{\operatorname{cov}(X, Y)}{\sqrt{\operatorname{var}(X)\,\operatorname{var}(Y)}}, \quad -1 \le \rho \le 1$
Sample correlation coefficient
$r = \frac{SS_{xy}}{\sqrt{SS_x\,SS_y}}, \quad -1 \le r \le 1$
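As a check on the sample formula, the sketch below computes r from the sums of squares (SSxy divided by the square root of SSx times SSy) and compares it with R's built-in cor(). The data are hypothetical.

```r
# Hypothetical data, invented for illustration
x <- c(1, 2, 3, 4, 5)
y <- c(2.1, 3.9, 6.2, 7.8, 10.1)

ss_xy <- sum((x - mean(x)) * (y - mean(y)))   # sum of cross-products
ss_x  <- sum((x - mean(x))^2)                 # sum of squares of x
ss_y  <- sum((y - mean(y))^2)                 # sum of squares of y

r <- ss_xy / sqrt(ss_x * ss_y)                # sample correlation by hand
r
cor(x, y)                                     # built-in Pearson correlation

# Same value from the sample covariance and variances
cov(x, y) / sqrt(var(x) * var(y))
```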
Correlation vs causation
Correlation does not automatically indicate causation
Correlation in R
• Pearson’s correlation coefficient:
• cor(x, y)
• cor.test()
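A short sketch of both functions on hypothetical data: cor() returns only the coefficient, while cor.test() also tests whether the true correlation is zero. Pearson's method is the default for both.

```r
# Hypothetical data, invented for illustration
x <- c(1, 2, 3, 4, 5)
y <- c(2.1, 3.9, 6.2, 7.8, 10.1)

cor(x, y)                # Pearson's correlation coefficient

res <- cor.test(x, y)    # test of the null hypothesis that the correlation is 0
res$estimate             # the same coefficient as cor(x, y)
res$p.value              # p-value of the test
res$conf.int             # 95% confidence interval for the correlation
```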
Vectors & factors
Basic commands in R
• Functions
• funcname(input1, input2)
• Vector creation
• x <- c(1, 2, 3)
• x = c(1, 2, 3)
• Length of set
• length(x)
Basics in R
• List of objects
• ls()
• Remove object
• rm()
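The commands above can be tried in one short session. The object names x and y are arbitrary, and mean() is used only as an example of calling a function on an input.

```r
x <- c(1, 2, 3)    # create a vector with the assignment arrow
y = c(4, 5, 6)     # "=" also assigns at the top level

length(x)          # number of elements in x
mean(x)            # a function call: function name, then inputs in parentheses

ls()               # list the objects in the workspace (includes x and y)
rm(y)              # remove the object y
ls()               # y is no longer listed
```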
Challenge
• Input data manually
• Create scatterplot of data
#exploredata