
Applied Statistics I

CC 3201 (2 Credits)
BSc in ARMT, AB, GT

Upuli I Wickramaarachchi
Learning Outcomes:

1. Evaluate the strength and direction of a relationship between two variables
2. Compute and interpret correlation coefficients
3. Fit and interpret simple linear regression model coefficients
4. Use statistical software to compute correlation coefficients and to fit an SLR model
Activity 1

Record the heights and weights of a random sample of 15 students of the same sex (males).

Is there any apparent relationship between the two variables?

Would you expect the same relationship (if any) to exist between the heights and weights of the opposite sex?
Studying Relationships - Example

The data below show the marks obtained by 1st year ARMT students for the Basic Maths and Statistics courses.

Student ID:            3257 3260 3214 3253 3289 3190 3215 3265 3271 3248
Maths marks (/30), x:    20   23    8   29   14   11   11   20   17   17
Stats marks, y:          30   35   21   33   33   26   22   31   33   36

 Is there a relationship between the marks obtained by each student for maths and statistics?
Studying Relationships – Example contd.

• A starting point would be to plot the marks as a scatter plot.
• We can calculate the means: x̄ = 17 and ȳ = 30.
• Drawing the lines x = 17 and y = 30 divides the graph into 4 sections. Most points fall in the bottom-left and top-right sections, so the two marks tend to increase together.
• The problem is to find how strong this tendency is.
What is Covariance?

 An attempt to quantify the tendency of the points to go from bottom left to top right is to evaluate the expression

$$\sum (x - \bar{x})(y - \bar{y})$$
Simplifying the Covariance Formula

x̄ and ȳ are constants (averages), so you can factor them out of summations. Also, Σx = n x̄ and Σy = n ȳ.

Replacing the constants and simplifying:

$$\sum (x - \bar{x})(y - \bar{y}) = \sum xy - \bar{x}\sum y - \bar{y}\sum x + n\bar{x}\bar{y} = \sum xy - n\bar{x}\bar{y}$$
Covariance

Covariance measures the direction of the linear relationship between two variables: whether they increase or decrease together.

$$S_{xy} = \frac{1}{n}\sum (x - \bar{x})(y - \bar{y}) = \frac{1}{n}\sum xy - \bar{x}\,\bar{y}$$
Studying Relationships – Example contd.

• Thus, the relationship between the Stats and Maths marks is

$$S_{xy} = \frac{1}{n}\sum xy - \bar{x}\,\bar{y}$$

Substituting values into the formula,

$$S_{xy} = \frac{1}{10} \times 5313 - 17 \times 30 = 21.3$$

• But what about the strength and the direction of the relationship?
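As a quick check, here is a minimal Python sketch (using NumPy, one possible software choice for this course) that reproduces the covariance for the marks data above. Note that np.cov defaults to the 1/(n-1) divisor, so bias=True is needed to match the 1/n formula used in these slides.

```python
import numpy as np

# Maths (x) and Stats (y) marks from the example
x = np.array([20, 23, 8, 29, 14, 11, 11, 20, 17, 17])
y = np.array([30, 35, 21, 33, 33, 26, 22, 31, 33, 36])

# Covariance with the 1/n divisor, as in the slide formula
s_xy = np.mean(x * y) - x.mean() * y.mean()
print(s_xy)                            # 21.3

# np.cov uses 1/(n-1) by default; bias=True switches to 1/n
print(np.cov(x, y, bias=True)[0, 1])   # 21.3
```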
Correlation?

 A correlation is a statistical measure of association/relationship between two variables.
 Correlation analysis quantifies the degree (strength and direction) to which an association tends to a certain pattern, via a measure called a correlation coefficient.
 For example, Pearson’s correlation coefficient measures the degree to which two variables tend toward a straight-line relationship.
Quantifying correlation

There are different methods for quantifying correlation, but these all share a number of properties:

1. If there is no relationship between the variables, the correlation coefficient will be zero. The closer to 0 the value, the weaker the relationship. A perfect correlation will be either -1 or +1, depending on the direction.

2. The value of a correlation coefficient indicates the direction and strength of the association, but it says nothing about the steepness of the relationship. A correlation coefficient is just a number, so it can’t tell us exactly how one variable depends on the other.
Pearson’s product-moment correlation (r)

 A measure of linear association between numeric variables.
 Gives the strength and direction of the relationship between the two variables.
 This means Pearson’s correlation (r) is appropriate when numeric variables follow a ‘straight-line’ relationship. That doesn’t mean they have to be perfectly related, by the way. It simply means there shouldn’t be any ‘curviness’ to the relationship.
Pearson’s correlation formula

$$r = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\sum (x - \bar{x})^2 \sum (y - \bar{y})^2}}$$

Or simply, Pearson’s correlation formula is written as

$$r = \frac{S_{xy}}{S_x S_y}$$

where

$$S_x = \sqrt{\frac{1}{n}\sum (x - \bar{x})^2} \quad\text{and}\quad S_y = \sqrt{\frac{1}{n}\sum (y - \bar{y})^2}$$
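A short sketch with SciPy (an assumption; any statistical package would do) applies this to the marks example. scipy.stats.pearsonr also returns a p-value for the test that the true correlation is zero.

```python
import numpy as np
from scipy.stats import pearsonr

x = np.array([20, 23, 8, 29, 14, 11, 11, 20, 17, 17])   # Maths marks
y = np.array([30, 35, 21, 33, 33, 26, 22, 31, 33, 36])  # Stats marks

# By the formula: r = S_xy / (S_x * S_y)
s_xy = np.mean(x * y) - x.mean() * y.mean()   # 21.3
r_manual = s_xy / (x.std() * y.std())         # np.std uses the 1/n divisor
print(r_manual)                               # 0.71

r, p_value = pearsonr(x, y)
print(r, p_value)                             # r = 0.71, with its p-value
```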
“Pearson’s correlation test” - Assumptions

1. Both variables are measured on an interval or ratio scale.
2. The two variables are normally distributed (in the population).
3. The relationship between the variables is linear.
Correlation tests between two variables

Both measure the strength of the association:

• Pearson’s product-moment correlation (r): for quantitative variables (ratio and interval scale data).
• Spearman’s rank correlation: for qualitative variables (ordinal scale data).
• r only quantifies the strength and direction of the relationship between x and y; it doesn’t reveal the “form of the relationship”.

[Figure: three scatter plots of the lines y = 3 + 5x, y = 3 + 10x and y = 3 + x; each has r = 1.]

• Though r = 1 in each case, the form of the relationship is not the same (the rate of increment is different).
• Our correlation analysis only characterises the strength and direction of the association. We need to use a different kind of analysis to say how one variable depends on the other.
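A tiny NumPy sketch makes this concrete: all three lines from the figure give exactly r = 1 despite having very different slopes.

```python
import numpy as np

x = np.arange(10)
for slope in (5, 10, 1):        # the three lines: y = 3 + 5x, 3 + 10x, 3 + x
    y = 3 + slope * x
    r = np.corrcoef(x, y)[0, 1]
    print(slope, round(r, 6))   # r = 1.0 in every case
```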
Relationships and regression

Applications of Regression Analysis:

Much of biology is concerned with relationships between numeric variables. For example:

 We sample fish and measure their length and weight because we want to understand how weight changes with length.
 We survey grassland plots and measure soil pH and species diversity to understand how species diversity depends on soil pH.
Applications of Regression Analysis:

 We manipulate temperature and measure fitness in insects because we want to describe their thermal tolerance.
 Studying the joint effect of seed quality, soil fertility, fertilizer use, temperature and rainfall on rice yield.
Regression Analysis:

 In contrast to correlation, a regression analysis allows us to make precise statements about how one numeric variable depends on the values of another. Graphically, we can evaluate such dependencies using a scatter plot. We may be interested in knowing:
 Are the variables related or not? There’s not much point in studying a relationship that isn’t there.
 Is the relationship positive or negative? Sometimes we can answer a scientific question just by knowing the direction of a relationship.
 Is the relationship a straight line or a curve? It is important to know the form of a relationship if we want to make predictions.
Uses of Regression Analysis:

Purpose: Models the relationship by fitting a line (equation) that predicts the value of one variable (dependent) from another (independent).

 To know the form of the relationship
 Parameter estimation
 For controlling purposes (in production processes)
 Predictions: given the value of X, the value of Y can be estimated
Uses of Regression Analysis:

[Figure: a fitted regression line illustrating interpolation (predicting within the observed range of X) and extrapolation (predicting beyond it).]
Regression

 Univariate regression: single response variable (‘y’)
   • Simple linear regression
   • Multiple linear regression
 Multivariate regression: multiple response variables (many ‘y’s)
Steps in Regression Analysis:

1. Statement of the problem
2. Selection of potentially relevant variables
3. Data collection
4. Model specification
5. Model validation
6. Use the fitted model
What does linear regression do?

 Simple linear regression allows researchers to predict how one variable (the response variable) depends on another (the predictor variable), assuming a straight-line relationship.
How does simple linear regression work?

Finding the best fit line:

 If we draw a straight line through a set of points on a graph then, unless they form a perfect straight line, some points will lie close to the line and others further away.
 The vertical distances between the line and each point (i.e. measured parallel to the 𝑦-axis) are called residuals.

• The residuals represent the ‘left over’ variation after the line has been fitted through the data. They indicate how well the line fits the data.
• If all the points lay close to the line, the variability of the residuals would be low relative to the overall variation in the response variable, 𝑦.

Regression works by finding the line which minimises the size of the residuals in some sense.
Simple Linear Regression

$$Y = \beta_0 + \beta_1 X + \varepsilon$$

where

 𝑦 is the response variable,
 𝑥 is the predictor variable,
 β₀ is the intercept (i.e. where the line crosses the 𝑦-axis),
 β₁ is the slope of the line, and
 ε is the random error.

 The slope of the line is the amount by which 𝑦 changes for a change of one unit in 𝑥.
Simple Linear Regression

$$Y = \beta_0 + \beta_1 X + \varepsilon$$

 The slope of the line is the amount by which 𝑦 changes for a change of one unit in 𝑥.
 If the slope is positive (i.e. a plus sign in the above equation), the line slopes upwards to the right.
 A negative slope (𝑦 = β₀ − β₁𝑥, with β₁ > 0) means the line slopes downwards to the right.
Calculating Simple Linear Regression

 The Simple Linear Regression Model:

$$y = \beta_0 + \beta_1 x + \varepsilon$$

 Estimating the model parameters by the OLS (ordinary least squares) method gives the Least Squares Regression Line:

$$\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x$$

where

$$\hat{\beta}_1 = \frac{SS_{xy}}{SS_x}, \qquad \hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}$$

$$SS_x = \sum (x_i - \bar{x})^2 = \sum x_i^2 - \frac{(\sum x_i)^2}{n}$$

$$SS_{xy} = \sum (x_i - \bar{x})(y_i - \bar{y}) = \sum x_i y_i - \frac{(\sum x_i)(\sum y_i)}{n}$$
Example - SLR

An educational economist wants to establish the relationship between an individual’s income and education. He takes a random sample of 10 individuals and asks for their annual income (in $1000s) and education (in years). The results are shown below.

Education (years):  11 12 11 15  8 10 11 12 17 11
Income ($1000s):    25 33 22 41 18 28 32 24 53 26
Dependent and Independent Variables

 The dependent variable is the one that we want to forecast or analyze.
 The independent variable is hypothesized to affect the dependent variable.
 In this example, we wish to analyze income, and we choose the variable that most affects income: the individual’s education. Hence, y is income and x is the individual’s education.
First Step:

$$\sum x_i = 118, \qquad \sum y_i = 302, \qquad \sum x_i^2 = 1450, \qquad \sum x_i y_i = 3779$$

Sum of Squares:

$$SS_{xy} = \sum x_i y_i - \frac{(\sum x_i)(\sum y_i)}{n} = 3779 - \frac{118 \times 302}{10} = 215.4$$

$$SS_x = \sum x_i^2 - \frac{(\sum x_i)^2}{n} = 1450 - \frac{118^2}{10} = 57.6$$

Therefore,

$$\hat{\beta}_1 = \frac{SS_{xy}}{SS_x} = \frac{215.4}{57.6} = 3.74$$

$$\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x} = \frac{302}{10} - 3.74 \times \frac{118}{10} = -13.93$$
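A minimal NumPy sketch, using the education/income data above, reproduces these estimates directly from the formulas:

```python
import numpy as np

x = np.array([11, 12, 11, 15, 8, 10, 11, 12, 17, 11])   # education (years)
y = np.array([25, 33, 22, 41, 18, 28, 32, 24, 53, 26])   # income ($1000s)
n = len(x)

ss_xy = np.sum(x * y) - x.sum() * y.sum() / n   # 215.4
ss_x = np.sum(x ** 2) - x.sum() ** 2 / n        # 57.6

beta1 = ss_xy / ss_x                 # slope, ~3.74
beta0 = y.mean() - beta1 * x.mean()  # intercept, ~-13.93
print(beta0, beta1)
```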
The Least Squares Regression Line

 The least squares regression line is

$$\hat{y} = -13.93 + 3.74x$$

 Interpretation of coefficients:
 The sample slope $\hat{\beta}_1 = 3.74$ tells us that, on average, for each additional year of education an individual’s income rises by $3.74 thousand.
 The y-intercept is $\hat{\beta}_0 = -13.93$. This value is the expected (or average) income for an individual who has zero years of education (which is meaningless here).
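As an illustration of the prediction use of the fitted line (a hypothetical value of x, not from the original example): for an individual with 12 years of education, the model predicts

$$\hat{y} = -13.93 + 3.74 \times 12 = 30.95$$

i.e. an expected annual income of about $30,950.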
Coefficient of Determination (R²)

 R² measures the degree of linear association between X and Y.
 So, an R² close to 0 does not necessarily indicate that X and Y are unrelated (the relationship can be nonlinear).
 Also, a high R² does not necessarily indicate that the estimated regression line is a good fit.

For simple linear regression,

$$R^2 = r^2$$
Degrees of freedom, mean squares and F tests

 Degrees of freedom: the number of independent components in a statistic.

Source of Variation | Sum of Squares  | Degrees of Freedom | Mean Squares    | F0
Regression          | SSR             | 1                  | MSR = SSR/1     | F0 = MSR/MSE
Residual            | SSE = SST - SSR | n-2                | MSE = SSE/(n-2) |
Total               | SST             | n-1                |                 |

 To test the hypothesis H0: β1 = 0 at the α% level of significance, compute the test statistic F0 and reject H0 if F0 > F(α; 1, n−2), or use the p-value.
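Assuming the statsmodels package is available, a short sketch computes F0 and its p-value for the education/income example (values in comments are approximate):

```python
import numpy as np
import statsmodels.api as sm

x = np.array([11, 12, 11, 15, 8, 10, 11, 12, 17, 11])
y = np.array([25, 33, 22, 41, 18, 28, 32, 24, 53, 26])

model = sm.OLS(y, sm.add_constant(x)).fit()

# F0 = MSR/MSE with (1, n-2) degrees of freedom
print(model.fvalue, model.f_pvalue)  # F0 ~ 44.1; p-value << 0.05, so reject H0: beta1 = 0
```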
Interpreting the SLR Model

Fitted model:

 The residuals are the difference between the actual values and the predicted values.
 The p-value, in association with the t-statistic, shows how significant each coefficient is to the model.
 The residual standard error is the average amount that the actual values of Y (the dots) differ from the predictions (the line), in units of Y.

Significance of the model:

 Coefficient of determination: R² = SSR/SST × 100%.
 Also, R² = (coefficient of correlation)², i.e. R² = r².
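A sketch of how these quantities can be read from statistical software, here statsmodels in Python on the education/income data (the exact layout of the output differs across packages):

```python
import numpy as np
import statsmodels.api as sm

x = np.array([11, 12, 11, 15, 8, 10, 11, 12, 17, 11])
y = np.array([25, 33, 22, 41, 18, 28, 32, 24, 53, 26])

model = sm.OLS(y, sm.add_constant(x)).fit()
print(model.summary())           # full table: coefficients, t-statistics, p-values, R-squared

print(model.params)              # beta0 ~ -13.93, beta1 ~ 3.74
print(model.pvalues)             # p-value for each coefficient
print(model.rsquared)            # R-squared ~ 0.85
print(np.sqrt(model.mse_resid))  # residual standard error, in units of Y
```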
Test for Model Adequacy (Testing assumptions/Testing residuals)

1. Test for normality

H0: errors have a normal distribution
H1: errors do not have a normal distribution

This can be assessed by constructing a “normal probability plot of residuals”.

 A normal probability plot of the residuals is a scatter plot with the theoretical percentiles of the normal distribution on the x-axis and the sample percentiles of the residuals on the y-axis.
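A minimal sketch of this check with statsmodels, continuing the education/income fit; points falling close to the reference line support the normality assumption:

```python
import numpy as np
import statsmodels.api as sm
import matplotlib.pyplot as plt

x = np.array([11, 12, 11, 15, 8, 10, 11, 12, 17, 11])
y = np.array([25, 33, 22, 41, 18, 28, 32, 24, 53, 26])
model = sm.OLS(y, sm.add_constant(x)).fit()

# Normal probability (Q-Q) plot of the residuals
sm.qqplot(model.resid, line="s")  # "s" adds a line fitted through the sample quantiles
plt.show()
```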
Test for Model Adequacy (Testing assumptions/Testing residuals)

2. Test for constant variance

This can be assessed by constructing a “Residuals vs. Fits plot”.

 A Residuals vs. Fits plot is a scatter plot of residuals on the y-axis and fitted values (estimated responses) on the x-axis. The plot is used to detect non-linearity, unequal error variances, and outliers.
 Ideally, the residuals have fairly constant variance (i.e. the distance between the residuals and the value zero) at each level of the fitted values: symmetric around 0 and parallel to the x-axis.
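A sketch of the corresponding plot, using the same fitted model as above:

```python
import numpy as np
import statsmodels.api as sm
import matplotlib.pyplot as plt

x = np.array([11, 12, 11, 15, 8, 10, 11, 12, 17, 11])
y = np.array([25, 33, 22, 41, 18, 28, 32, 24, 53, 26])
model = sm.OLS(y, sm.add_constant(x)).fit()

# Residuals vs. fitted values: look for a band symmetric around 0
plt.scatter(model.fittedvalues, model.resid)
plt.axhline(0, linestyle="--")
plt.xlabel("Fitted values")
plt.ylabel("Residuals")
plt.show()
```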
Non-constant variance:

• If the variance is not constant, there is no constant MSE and the ANOVA is invalid.
• To avoid this situation, a data transformation is used.

Constant variance but a clear pattern in the residuals:

• The fitted model is wrong.
• Maybe a non-linear regression is needed.
Test for Model Adequacy (Testing assumptions/Testing residuals)

2. Test for constant variance (contd.)

 Another method to test for constant variance is by using the Scale-Location plot.
 Here, the fitted values are plotted against the square root of the standardized residuals.
 Ideally, the residual points should spread equally around the red line, which would indicate constant variance.
Linearity assumption?

 If there is a clear pattern in the residual plot, this indicates that we failed to meet the assumption of a linear relationship between the predictors and the outcome variable.
Test for Model Adequacy (Testing assumptions/Testing residuals)

3. Test for independence

 The easiest way to check the assumption of independence is using the Durbin-Watson test.
 The null hypothesis states that the errors are not auto-correlated with themselves (they are independent).
 Thus, if the p-value > 0.05, we would fail to reject the null hypothesis.
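A sketch of the statistic with statsmodels, on the same fitted model. Note that durbin_watson returns the DW statistic itself rather than a p-value, so it is read against the benchmarks listed on the next slide:

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.stattools import durbin_watson

x = np.array([11, 12, 11, 15, 8, 10, 11, 12, 17, 11])
y = np.array([25, 33, 22, 41, 18, 28, 32, 24, 53, 26])
model = sm.OLS(y, sm.add_constant(x)).fit()

dw = durbin_watson(model.resid)
print(dw)  # ~2 suggests independent errors; <2 positive, >2 negative autocorrelation
```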
Test for Model Adequacy (Testing assumptions/Testing residuals)

Interpreting Durbin-Watson Values:

 DW = 2: This signifies no autocorrelation, meaning the residuals are independent, which is the ideal scenario for linear regression assumptions.
 DW < 2: This suggests positive autocorrelation, where successive errors are correlated positively.
 DW > 2: This indicates negative autocorrelation, where successive errors are correlated negatively.
