Correlation and Regression
Dr. Hanaa Moussa
Correlation and regression:
An area of inferential statistics involves determining whether a relationship exists
between two or more numerical or quantitative variables.
Examples:
Educators are interested in determining whether the number of hours a student
studies is related to the student’s score on a particular exam.
Medical researchers are interested in questions such as "Is caffeine related to
kidney damage?" or "Is there a relationship between a person’s age and his or her
blood pressure?"
These are only a few of the many questions that can be answered by using the
techniques of correlation and regression analysis.
Correlation: a statistical method used to determine whether a relationship
between variables exists.
Regression: a statistical method used to describe the nature of the
relationship between variables, that is, positive or negative, linear or
nonlinear.
Correlation and regression analysis address four questions:
1. Are two or more variables linearly related?
2. If so, what is the strength of the relationship?
3. What type of relationship exists?
4. What kind of predictions can be made from the relationship?
To answer questions (1) and (2), statisticians use a numerical measure called
a correlation coefficient.
To answer question (3), you must ascertain what type of relationship
exists. There are two types of relationships:
1. Simple relationships (simple regression): two variables, an independent
(explanatory or predictor) variable and a dependent (response) variable.
2. Multiple relationships (multiple regression): two or more independent
variables are used to predict one dependent variable.
Finally, question (4) asks what type of predictions can be made.
Predictions are made daily in all areas.
Correlation coefficient
Correlation coefficient (r): a measure of the strength and direction of a
linear relationship between two quantitative variables x and y.
The symbol for the sample correlation coefficient is r.
The range of the correlation coefficient is from −1 to +1.
If there is a strong positive linear relationship between the
variables, the value of r will be close to +1.
If there is a strong negative linear relationship between the
variables, the value of r will be close to −1.
When there is no linear relationship between the variables or
only a weak relationship, the value of r will be close to 0.
A scatter plot is a graph of the ordered pairs (x, y) of numbers consisting of the
independent variable x and the dependent variable y.
Example
Construct a scatter plot for the data obtained in a study on the number of absences and the final
grades of seven randomly selected students from a statistics class. The data are shown here.
Degree of correlation     Positive          Negative
Perfect correlation       +1                −1
Strong correlation        +0.75 to +0.99    −0.75 to −0.99
Moderate correlation      +0.25 to +0.74    −0.25 to −0.74
Weak correlation          > 0 to +0.24      < 0 to −0.24
No correlation            0                 0
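The bands in the table can be applied mechanically. The Python sketch below is illustrative only: the function name describe_correlation is hypothetical, and because the table leaves small gaps between bands (for example between 0.74 and 0.75), the cut points at 0.25 and 0.75 are an assumption.

```python
def describe_correlation(r: float) -> str:
    """Label r with a degree and direction, following the table's bands.

    Assumption: the gaps in the table are closed by cutting at 0.25 and 0.75.
    """
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and +1")
    if r == 0:
        return "no correlation"
    direction = "positive" if r > 0 else "negative"
    size = abs(r)
    if size == 1.0:
        degree = "perfect"
    elif size >= 0.75:
        degree = "strong"
    elif size >= 0.25:
        degree = "moderate"
    else:
        degree = "weak"
    return f"{degree} {direction} correlation"


# Hypothetical values of r, chosen only to exercise each band.
for r in (0.9, -0.5, 0.1):
    print(r, "->", describe_correlation(r))
```

These labels are a rule of thumb for describing r; they do not replace a scatter plot or a significance test.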
Correlation coefficient for quantitative data
1. Pearson product moment correlation coefficient (PPMC)
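A common computational form of the Pearson coefficient is

$$r = \frac{n\sum xy - \left(\sum x\right)\left(\sum y\right)}{\sqrt{\left[n\sum x^2 - \left(\sum x\right)^2\right]\left[n\sum y^2 - \left(\sum y\right)^2\right]}}$$

A minimal Python sketch of this formula follows. The function name pearson_r and the sample values are hypothetical; they are not the data from the examples in these notes.

```python
import math


def pearson_r(x, y):
    """Pearson product moment correlation coefficient for paired data."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_x2 = sum(xi * xi for xi in x)
    sum_y2 = sum(yi * yi for yi in y)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt(
        (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
    )
    if denominator == 0:
        raise ValueError("r is undefined when x or y has no variation")
    return numerator / denominator


# Hypothetical data: y doubles with x, so the points lie on a rising line.
hours = [1, 2, 3, 4, 5]
scores = [2, 4, 6, 8, 10]
print(pearson_r(hours, scores))  # 1.0
```

Reversing the order of the scores would put the points on a falling line and give r = −1.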
Correlation coefficient examples
Example (1): Construct a scatter plot for the data obtained in a study on the number of
absences and the final grades of seven randomly selected students from a statistics class.
Also, find the correlation coefficient. The data are shown.
Example (2):
A researcher wishes to see if there is a relationship between the ages and net worth
of the wealthiest people in America. The data for a specific year are shown. Evaluate
Pearson’s correlation coefficient.
Regression Line: (Line of best fit)
Linear Regression Equation:
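$$y' = a + bx$$
where
$$b = \frac{n\sum xy - \left(\sum x\right)\left(\sum y\right)}{n\sum x^2 - \left(\sum x\right)^2}$$
and
$$a = \frac{\sum y}{n} - b\,\frac{\sum x}{n}$$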
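The formulas for the slope b and the intercept a translate directly into code. The sketch below is one possible rendering in Python; the function names fit_line and predict and the sample values are hypothetical and are not taken from the examples in these notes.

```python
def fit_line(x, y):
    """Return (a, b) for the least-squares line y' = a + b*x."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_x2 = sum(xi * xi for xi in x)
    denominator = n * sum_x2 - sum_x ** 2
    if denominator == 0:
        raise ValueError("the slope is undefined when all x values are equal")
    b = (n * sum_xy - sum_x * sum_y) / denominator  # slope
    a = sum_y / n - b * sum_x / n                   # intercept
    return a, b


def predict(a, b, x_value):
    """Predicted y' for a given x on the fitted line."""
    return a + b * x_value


# Hypothetical data lying exactly on y = 1 + 2x.
a, b = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
print(a, b)              # 1.0 2.0
print(predict(a, b, 5))  # 11.0
```

Predictions are most trustworthy for x values inside the range of the observed data.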
Example:
Find the regression line for the data obtained in a study on the number of
absences and the final grades of seven randomly selected students from a
statistics class.
$$a = \frac{\sum y}{n} - b\,\frac{\sum x}{n}$$
$$a = \frac{511}{7} - (-3.622)\left(\frac{57}{7}\right) = 102.493$$
$$y' = 102.493 - 3.622x$$
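As an arithmetic check on the intercept, the sketch below plugs the totals from this example (n = 7, Σx = 57, Σy = 511) into the formula for a. The slope is taken as −3.622, the three-decimal value that reproduces the stated intercept of 102.493. The helper name intercept and the 10-absence student are hypothetical, added for illustration only.

```python
def intercept(sum_x, sum_y, n, b):
    """Intercept a = (sum of y)/n - b * (sum of x)/n."""
    return sum_y / n - b * (sum_x / n)


# Totals quoted in the example: 7 students, sum of x = 57, sum of y = 511.
# Assumption: slope b = -3.622, the value consistent with a = 102.493.
b = -3.622
a = intercept(sum_x=57, sum_y=511, n=7, b=b)
print(round(a, 3))  # 102.493

# Hypothetical use of the line: predicted final grade for 10 absences.
absences = 10
predicted_grade = a + b * absences
print(round(predicted_grade, 1))  # 66.3
```

The negative slope means each additional absence lowers the predicted final grade by about 3.6 points.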
Example (2):
Evaluate Pearson’s correlation coefficient and the regression line for the data
shown for car rental companies in the United States for a recent year.
Pearson’s correlation coefficient
regression line
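$$a = \frac{\sum y}{n} - b\,\frac{\sum x}{n}$$
$$a = \frac{18.7}{6} - (0.106)\left(\frac{153.8}{6}\right) = 0.396$$
(The value 0.396 is obtained when the slope is carried at full precision before rounding to 0.106; with b rounded to 0.106 the same arithmetic gives about 0.400.)
$$y' = a + bx$$
$$y' = 0.396 + 0.106x$$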
Example
Find the coefficient of determination for the data obtained in a study on
the number of absences and the final grades of seven randomly selected
students from a statistics class.
r² = 0.8911
This result means that about 89% of the variation in the dependent
variable (final grade, y) is accounted for by the variation in the
independent variable (number of absences, x). The rest of the
variation, about 0.11, or 11%, is unexplained.
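The coefficient of determination is the square of the correlation coefficient, so the explained and unexplained shares can be read off directly. In the Python sketch below, the function name variation_split is hypothetical, and r = −0.944 is an assumed value, chosen because its square rounds to the 0.8911 quoted above.

```python
def variation_split(r):
    """Return (explained, unexplained) shares of the variation in y."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and +1")
    explained = r * r            # coefficient of determination, r squared
    unexplained = 1.0 - explained
    return explained, unexplained


# Assumed correlation coefficient for the absences and final grades example.
explained, unexplained = variation_split(-0.944)
print(f"explained:   {explained:.1%}")    # explained:   89.1%
print(f"unexplained: {unexplained:.1%}")  # unexplained: 10.9%
```

The sign of r is lost when it is squared, so r² measures how much of the variation is explained, not the direction of the relationship.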
Correlation coefficient for ordinal data
2. Spearman correlation coefficient (rank correlation): used to find
the correlation between ordinal (ranked) qualitative variables.
$$r_s = 1 - \frac{6\sum d^2}{n\left(n^2 - 1\right)}$$
where
n is the number of data pairs,
d is the difference between the ranks in each pair.
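With n and d defined this way, the usual rank correlation formula is r_s = 1 − 6Σd² / (n(n² − 1)). The Python sketch below applies it under the simplifying assumption that no value is repeated within a variable; tied values would need averaged ranks, which this sketch does not handle. The function names and the grades are hypothetical.

```python
def ranks(values):
    """Rank values from 1 (smallest) to n. Assumes there are no ties."""
    if len(set(values)) != len(values):
        raise ValueError("tied values are not handled in this sketch")
    ordered = sorted(values)
    return [ordered.index(v) + 1 for v in values]


def spearman_rs(x, y):
    """Spearman rank correlation: 1 - 6*sum(d^2) / (n*(n^2 - 1))."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    rank_x, rank_y = ranks(x), ranks(y)
    sum_d2 = sum((rx - ry) ** 2 for rx, ry in zip(rank_x, rank_y))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))


# Hypothetical grades for five students (not the data of the examples).
mathematics = [85, 70, 90, 60, 75]
statistics_grades = [80, 65, 95, 70, 60]
print(round(spearman_rs(mathematics, statistics_grades), 2))  # 0.6
```

Here the ranks are (4, 2, 5, 1, 3) and (4, 2, 5, 3, 1), so Σd² = 8 and r_s = 1 − 48/120 = 0.6.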
Correlation coefficient examples
Example (5): The following table gives the grades of some students in mathematics and
statistics. Find the correlation coefficient between them.
Reference:
Bluman, Allan G. Elementary Statistics: A Step by Step Approach. 8th ed.