Data Analytics Lesson 11 Slides

This document covers Lesson 3 of a Data Analytics diploma, focusing on linear regression, its types, and the relationship between dependent and independent variables. It introduces key concepts such as the regression equation, correlation coefficients, and basic R commands for implementing linear regression. Additionally, it emphasizes the distinction between correlation and causation and includes practical challenges for data visualization.
Copyright © All Rights Reserved

Diploma in Data Analytics
Lesson 3: Linear regression
Introduction to linear regression
Correlation
Vectors & factors

Lesson Objectives
Lesson 3
Intro to linear regression

Regression analysis
A method for determining the relationship that best fits the observed data
Regression analysis
$Y = \alpha + \beta X + \varepsilon$
$Y$ = dependent variable
$X$ = independent variable
$\alpha$ = intercept
$\beta$ = slope
$\varepsilon$ = random error term
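To make the roles of α, β and ε concrete, the sketch below simulates observations from the model $Y = \alpha + \beta X + \varepsilon$ in R. The parameter values (`alpha = 2`, `beta = 0.5`), the sample size and the noise level are arbitrary assumptions chosen only for illustration.

```r
# Simulate observations from Y = alpha + beta * X + epsilon
set.seed(42)            # make the random draws reproducible
n     <- 100            # number of observations (arbitrary choice)
alpha <- 2              # assumed "true" intercept
beta  <- 0.5            # assumed "true" slope

x   <- runif(n, min = 0, max = 10)   # independent variable X
eps <- rnorm(n, mean = 0, sd = 1)    # random error term epsilon
y   <- alpha + beta * x + eps        # dependent variable Y

head(data.frame(x, y))               # first few simulated (x, y) pairs
```

Each point lies on the line $\alpha + \beta x$ plus its own random error; regression works backwards from such data to estimate α and β.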
Types of regression
• Linear

• Logistic

• Polynomial

• Lasso

• Ridge

• Random forest
Did you know?
Legendre (1805) and Gauss (1809) each published what is known as the earliest form of linear regression, the method of least squares, in the early 1800s.
Intro to simple linear regression
• Foundation for more complex and modern techniques
• Models the relationship between a dependent and an independent variable through a linear equation
• Uses the least squares method to fit the line of best fit
Simple linear regression
$\hat{y}_i = a + b x_i$
$\hat{y}_i$ = fitted or predicted value
$a$ = estimate of $\alpha$, the intercept
$b$ = estimate of $\beta$, the slope
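The least squares estimates have a closed form: $b = SS_{xy} / SS_x$ and $a = \bar{y} - b\bar{x}$, where $SS_{xy} = \sum_i (x_i - \bar{x})(y_i - \bar{y})$ and $SS_x = \sum_i (x_i - \bar{x})^2$. The sketch below computes them by hand on simulated data (the data-generating values 3 and 1.5 are arbitrary assumptions) and checks them against R's `lm()`.

```r
# Least squares estimates from their closed-form formulas
set.seed(1)
x <- runif(50, 0, 10)                 # hypothetical predictor values
y <- 3 + 1.5 * x + rnorm(50)          # hypothetical response with noise

b <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)  # slope, estimate of beta
a <- mean(y) - b * mean(x)                                      # intercept, estimate of alpha
y_hat <- a + b * x                                              # fitted values y-hat

# lm() minimises the same sum of squared residuals, so it should agree
fit <- lm(y ~ x)
c(a = a, b = b)
coef(fit)
```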
Scatterplot

Source: http://onlinestatbook.com/2/regression/intro.html
Line of best fit

Source: http://onlinestatbook.com/2/regression/intro.html
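A scatterplot with a line of best fit can be drawn in a few lines of base R. This sketch uses R's built-in `cars` data set (speed in mph, stopping distance in ft) purely as an example; any two numeric variables would do.

```r
# Scatterplot with a least squares line, using the built-in 'cars' data set
plot(dist ~ speed, data = cars,
     xlab = "Speed (mph)", ylab = "Stopping distance (ft)",
     main = "Scatterplot with line of best fit")
fit <- lm(dist ~ speed, data = cars)   # least squares fit of distance on speed
abline(fit, col = "red", lwd = 2)      # draw the fitted line over the points
```

`abline()` accepts a fitted `lm` object directly and reads the intercept and slope from it.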
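Linear regression in R
lm() fits linear models:
> lm(y ~ x, data = df)             # fit y on x; df is a data frame with columns x and y
> lm.fit <- lm(y ~ x, data = df)   # store the fitted model
> lm.fit                           # print the estimated coefficients
> summary(lm.fit)                  # coefficients, standard errors, R-squared, etc.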
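A complete, runnable version of this workflow is sketched below. The data frame `df` and its columns are hypothetical, simulated so the example is self-contained; the last lines show how to pull individual results out of the fit and predict for new x values.

```r
# Minimal lm() workflow on a hypothetical data frame 'df'
set.seed(123)
df <- data.frame(x = 1:20)
df$y <- 4 + 2 * df$x + rnorm(20, sd = 2)   # simulated response

lm.fit <- lm(y ~ x, data = df)   # fit y as a linear function of x
lm.fit                           # estimated intercept and slope
summary(lm.fit)                  # standard errors, t-tests, R-squared, etc.

coef(lm.fit)                     # intercept (a) and slope (b) as a named vector
summary(lm.fit)$r.squared        # proportion of variance in y explained by x
predict(lm.fit, newdata = data.frame(x = c(21, 25)))  # predictions for new x values
```

The name `lm.fit` follows the slide; note that base R also has a low-level function called `lm.fit()`, so a name such as `fit` avoids confusion.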


Correlation
Correlation coefficient
• Measure of the linear association between X and Y in a (bivariate) normal population
• −1 ≤ corr. coeff. ≤ 1
Understanding correlation
• Positive correlation
• Corr coeff. > 0

• Negative correlation
• Corr coeff. < 0

• No linear correlation
• Corr coeff. = 0
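The three cases can be checked directly with `cor()`. The vectors below are made-up illustrations; the last one also shows that a coefficient of 0 means no *linear* relationship, not no relationship at all.

```r
# Sign of the correlation coefficient for three constructed relationships
x <- 1:10
y_pos  <- 2 * x + 1        # rises with x: positive correlation
y_neg  <- 20 - 3 * x       # falls as x rises: negative correlation
y_none <- (x - 5.5)^2      # U-shaped, symmetric about mean(x): zero linear correlation

cor(x, y_pos)    # 1, a perfect positive linear relationship
cor(x, y_neg)    # -1, a perfect negative linear relationship
cor(x, y_none)   # 0 (up to rounding), despite a clear curved pattern
```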
Population correlation coefficient
$\rho = \dfrac{\operatorname{cov}(X, Y)}{\sqrt{\operatorname{var}(X)\,\operatorname{var}(Y)}}, \quad -1 \le \rho \le 1$
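Sample correlation coefficient
$r = \dfrac{SS_{xy}}{\sqrt{SS_x \, SS_y}}, \quad -1 \le r \le 1$
where $SS_{xy} = \sum_i (x_i - \bar{x})(y_i - \bar{y})$, $SS_x = \sum_i (x_i - \bar{x})^2$ and $SS_y = \sum_i (y_i - \bar{y})^2$.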
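The sample formula can be computed term by term and compared with R's `cor()`, which returns Pearson's r by default. The simulated data (slope 0.6, noise sd 0.8) are arbitrary assumptions for the sketch.

```r
# Sample correlation from sums of squares, checked against cor()
set.seed(7)
x <- rnorm(30)
y <- 0.6 * x + rnorm(30, sd = 0.8)   # hypothetical, positively related data

ss_xy <- sum((x - mean(x)) * (y - mean(y)))   # SS_xy
ss_x  <- sum((x - mean(x))^2)                 # SS_x
ss_y  <- sum((y - mean(y))^2)                 # SS_y
r <- ss_xy / sqrt(ss_x * ss_y)                # sample correlation coefficient

r
cor(x, y)   # base R's Pearson correlation; should match r
```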
Correlation vs causation
Correlation does not
automatically indicate causation
Correlation in R
• Pearson's correlation coefficient:
  • cor(x, y)
  • cor.test(x, y) (adds a significance test and a confidence interval)
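Both functions are shown below on R's built-in `cars` data set, chosen only as a convenient example. `cor.test()` reports the same estimate as `cor()` together with a t statistic, p-value and confidence interval for ρ.

```r
# Pearson correlation and a significance test on the built-in 'cars' data
cor(cars$speed, cars$dist)                          # point estimate of r
ct <- cor.test(cars$speed, cars$dist, method = "pearson")
ct                                                  # t statistic, p-value, 95% CI
ct$estimate                                         # same value as cor()
ct$conf.int                                         # confidence interval for rho
```

A small p-value indicates the correlation is unlikely to be zero; it still says nothing about whether speed *causes* longer stopping distances.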
Vectors & factors
Basic commands in R

• Functions
  • funcname(input1, input2)
• Vector creation
  • x <- c(1, 2, 3)
  • x = c(1, 2, 3)
• Length of a vector
  • length(x)
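The commands above can be tried together as in the short sketch below. The second vector is named `z` here only so the two assignment styles can be compared.

```r
# Calling a function with two inputs: funcname(input1, input2)
round(3.14159, 2)   # 3.14

# Two equivalent ways to create a numeric vector
x <- c(1, 2, 3)     # '<-' is the conventional assignment operator in R
z = c(1, 2, 3)      # '=' also assigns at the top level
identical(x, z)     # TRUE
length(x)           # 3
x * 2               # arithmetic is vectorised: 2 4 6
```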
Basics in R
• List of objects
  • ls()
• Remove an object
  • rm(x)
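A brief sketch of managing the workspace with these two functions; the object names `a` and `b` are arbitrary.

```r
# Listing and removing objects in the current environment
a <- 10
b <- c(4, 5, 6)
ls()                          # names of the objects created so far, e.g. "a" "b"
rm(a)                         # remove a single object
exists("a", inherits = FALSE) # FALSE: 'a' is gone from this environment
rm(list = ls())               # remove every object in the environment (use with care)
```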
Challenge
• Input data manually
• Create scatterplot of data

#exploredata
