Statistics and Probability: Quarter 4 - (Week 6)

This document provides information about correlation and regression analysis including: 1. It describes bivariate data which involves two related variables and discusses correlation analysis as a statistical method to determine if a relationship exists between two variables. 2. It explains how to calculate Pearson's correlation coefficient (r) which measures the strength and direction of the linear relationship between two variables. 3. It gives an example of calculating r using a set of bivariate data and interpreting the results based on the value of r.

Uploaded by Jessa May Marcos

STATISTICS AND PROBABILITY
Quarter 4 – (Week 6)
CORRELATION AND REGRESSION ANALYSIS

Name of Student: ____________________________________________________

Grade & Section: ___________________________________________________

EXPECTATION
At the end of the learning module, you are expected to:
1. describe the nature of bivariate data;
2. calculate the Pearson’s sample correlation coefficient; and
3. solve problems involving correlation analysis.

LESSON
In our previous study of statistics, we dealt with data that involve a single
variable. These are called univariate data. Because a single variable is examined
independently of any other variable, the only analysis available is to describe it in
terms of its central tendency, variation, or other descriptive statistics. In this lesson,
we will learn how to describe bivariate data.

Bivariate data are data that involve two variables, in contrast to univariate
data, which involve only one. In univariate data, the main purpose of the analysis is
to describe the data using descriptive statistics such as averages, standard deviations,
and frequency counts.

In bivariate data, the purpose of the analysis is to describe relationships, which
calls for new statistical methods. We will describe relationships between related
variables in terms of strength and direction.

There are many examples of bivariate data. For example, in a class, IQ scores
and math scores in a long exam can be collected. In chemistry, a confined gas can
have different volumes and corresponding pressures. In physics, an elastic spring can
be subjected to varying stress, which is accompanied by a corresponding
elongation or strain. In motoring, we can collect the ages and mileages of different
cars.

IQ scores are related to scores in math. An IQ score is a measure of a student's
intelligence. More often than not, intelligent students perform better in mathematics.
Thus, the higher the IQ of a student is, the higher his or her math score tends to be. On
the other hand, the mileage and age of a car are related in a different way. The older the
car is, the lower its mileage tends to be.

Correlation analysis is a statistical method used to determine whether a
relationship between two variables exists.

A scatter plot is the most common display of quantitative bivariate data. It shows patterns,
trends, relationships, and possible unusual values (outliers) between the variables.

Steps in Constructing a Scatter Plot


1. Draw a graph and label the x- and y-axes.
2. Assign each quantitative variable to an axis.
3. Choose a range for each axis that includes the maximum and the minimum values in
the data set.
4. Plot each point on the graph.
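
The four steps above can be sketched in code. The following is only a rough illustration, assuming whole-number data; the function name `ascii_scatter` and the sample values (hours studied and quiz scores) are hypothetical and are not part of this module. It draws the plot with text characters so that no plotting library is needed.

```python
def ascii_scatter(xs, ys):
    """Return a text scatter plot: one row per y value, one column per x value."""
    # Step 3: each axis range covers the minimum and maximum of its variable.
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    points = set(zip(xs, ys))
    lines = []
    # Step 4: plot each point, printing the highest y value on the top row.
    for y in range(y_max, y_min - 1, -1):
        cells = ["*" if (x, y) in points else " " for x in range(x_min, x_max + 1)]
        lines.append(f"{y:>3} |" + "".join(cells))
    # Steps 1 and 2: draw and label the horizontal axis.
    lines.append("    +" + "-" * (x_max - x_min + 1))
    lines.append(f"     x from {x_min} to {x_max}")
    return "\n".join(lines)


# Hypothetical data: hours studied (x) and quiz score (y) for five students.
hours_studied = [1, 2, 3, 4, 5]
quiz_scores = [2, 3, 3, 5, 6]
plot = ascii_scatter(hours_studied, quiz_scores)
print(plot)
```

In the printed plot the stars climb from the lower left to the upper right, which is the pattern expected when high values of one variable go with high values of the other.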

Direction of Correlation

Positive Correlation exists when high values of one variable correspond to high values in
the other variable, or low values in one variable correspond to low values in the other
variable.
Negative Correlation exists when high values of one variable correspond to low values in
the other variable, or low values in one variable correspond to high values in the other
variable.
Zero Correlation exists when high values in one variable correspond to either high or
low values in the other variable.

Strength of Correlation
• Perfect
• Very high
• Moderately high
• Moderately low
• Very low
• Zero

The trend line is the line closest to the points. The direction of the line tells the
direction of the correlation that exists between the variables. If the trend line rises from
left to right, its slope is positive, and there is a positive correlation between the two
variables. If it falls from left to right, its slope is negative, and there is a negative
correlation between the two variables.

Pearson Product-Moment Correlation Coefficient

The Pearson Product-Moment Correlation Coefficient, also called the sample
correlation coefficient r, is a widely used statistical measure of the strength of a linear
relationship between two variables. It is given by

$$ r = \frac{n\sum XY - \sum X \cdot \sum Y}{\sqrt{\left[n\sum X^2 - \left(\sum X\right)^2\right]\left[n\sum Y^2 - \left(\sum Y\right)^2\right]}} $$

Where: r = sample correlation coefficient
n = sample size
X = values of variable x
Y = values of variable y
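
The formula above can also be written as a short function. This is a minimal sketch rather than a definitive implementation; the name `pearson_r` and the three-pair data set are assumptions made for illustration only.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Sample correlation coefficient r computed from the raw sums in the formula."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    if denominator == 0:
        # Happens when all X values (or all Y values) are identical.
        raise ValueError("r is undefined when a variable has no variation")
    return numerator / denominator


# Hypothetical data in which every y is exactly twice its x.
print(pearson_r([1, 2, 3], [2, 4, 6]))  # 1.0, a perfect positive correlation
```

Every sum in the function matches one term of the formula, so the intermediate values can be compared with a hand-computed table.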

We will use the table below to describe the strength of the computed r. The ranges apply to
the size of r regardless of its sign; the sign of r gives the direction of the correlation.

Pearson r               Qualitative Description
±1                      Perfect
±0.75 to < ±1           Very high
±0.50 to < ±0.75        Moderately high
±0.25 to < ±0.50        Moderately low
> 0 to < ±0.25          Very low
0                       No correlation
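
The table above can be read as a simple lookup. The sketch below is one possible way to encode it; the function name `describe_strength` is an assumption, and the cutoffs are taken directly from the table.

```python
def describe_strength(r):
    """Return the qualitative description of r according to the table."""
    if not -1 <= r <= 1:
        raise ValueError("r must lie between -1 and 1")
    size = abs(r)  # the sign only gives the direction, not the strength
    if size == 1:
        return "Perfect"
    if size >= 0.75:
        return "Very high"
    if size >= 0.50:
        return "Moderately high"
    if size >= 0.25:
        return "Moderately low"
    if size > 0:
        return "Very low"
    return "No correlation"


print(describe_strength(0.32))   # Moderately low
print(describe_strength(-0.85))  # Very high
```

The checks run from the strongest category down, so each value of r falls into exactly one row of the table.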

Example 1:

Determine the value of Pearson r for the following data and interpret the results.

X 3 5 6 8 10
Y 16 14 10 12 20

a.) Construct the table shown below.

X      Y      X²     Y²     XY
3      16
5      14
6      10
8      12
10     20

b.) Complete the table above by doing the following:
• Square all entries in the X column and put them under the X² column.
• Square all entries in the Y column and put them under the Y² column.
• Multiply the entries in the X and Y columns and put them in the XY column.
• Get the summation of all entries in the X, Y, X², Y² and XY columns.

X        Y        X²         Y²          XY
3        16       9          256         48
5        14       25         196         70
6        10       36         100         60
8        12       64         144         96
10       20       100        400         200
∑X = 32  ∑Y = 72  ∑X² = 234  ∑Y² = 1096  ∑XY = 474

c.) Use the Pearson Product-Moment Correlation formula to solve for r and interpret the result.

Solving for r:
n = 5
∑X = 32
∑Y = 72
∑X² = 234
∑Y² = 1096
∑XY = 474

$$ r = \frac{5(474) - (32)(72)}{\sqrt{\left[5(234) - (32)^2\right]\left[5(1096) - (72)^2\right]}} $$

$$ r = \frac{2370 - 2304}{\sqrt{[1170 - 1024][5480 - 5184]}} $$

$$ r = \frac{66}{\sqrt{(146)(296)}} $$

$$ r = \frac{66}{\sqrt{43216}} $$

r ≈ 0.32; moderately low but positive

Steps in Solving the Pearson’s r Correlation Coefficient

1. Arrange the given bivariate data in tabular form with the values of the first
variable (X) in the first column and the second variable (Y) in the second
column.
2. Calculate the sum of all the values of X and Y.
3. Square each value of the first variable X and then find the summation of the
squares.
4. Do the same with the second variable Y.
5. Multiply the corresponding values of X and Y and find the summation of the
products.
6. Substitute the summation values in the formula, solve and interpret the result.
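
The six steps above can be traced in code using the data from Example 1. This is a sketch under stated assumptions: the function name `correlation_table` and the variable names are illustrative, not part of the module.

```python
from math import sqrt


def correlation_table(xs, ys):
    """Steps 1 to 5: build the X, Y, X², Y², XY rows and their column sums."""
    rows = [(x, y, x * x, y * y, x * y) for x, y in zip(xs, ys)]
    totals = tuple(sum(column) for column in zip(*rows))
    return rows, totals


# Data from Example 1.
xs = [3, 5, 6, 8, 10]
ys = [16, 14, 10, 12, 20]
rows, totals = correlation_table(xs, ys)

print("".join(f"{name:>6}" for name in ("X", "Y", "X^2", "Y^2", "XY")))
for row in rows:
    print("".join(f"{value:>6}" for value in row))
print("".join(f"{value:>6}" for value in totals))

# Step 6: substitute the sums into the formula.
n = len(rows)
sum_x, sum_y, sum_x2, sum_y2, sum_xy = totals
r = (n * sum_xy - sum_x * sum_y) / sqrt(
    (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
)
print(round(r, 2))  # 0.32
```

The last printed row of the table holds the five sums (32, 72, 234, 1096, 474), which agree with the completed table in Example 1.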
Example 2:
Andrew wants to study whether age correlates with the average number of hours of sleep, so
he selects a random sample of 6 people and gathers the needed data. Can Andrew
conclude that there is a strong relationship between a person’s age and the number of
hours he or she sleeps?
Age (X)               10    16    22    30    34    40
Hours of Sleep (Y)    8     7     8     7     6     5

X         Y        X²          Y²         XY
10        8        100         64         80
16        7        256         49         112
22        8        484         64         176
30        7        900         49         210
34        6        1156        36         204
40        5        1600        25         200
∑X = 152  ∑Y = 41  ∑X² = 4496  ∑Y² = 287  ∑XY = 982

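Solving for r:
n = 6
∑X = 152
∑Y = 41
∑X² = 4496
∑Y² = 287
∑XY = 982

$$ r = \frac{6(982) - (152)(41)}{\sqrt{\left[6(4496) - (152)^2\right]\left[6(287) - (41)^2\right]}} $$

$$ r = \frac{5892 - 6232}{\sqrt{[26976 - 23104][1722 - 1681]}} $$

$$ r = \frac{-340}{\sqrt{(3872)(41)}} $$

$$ r = \frac{-340}{\sqrt{158752}} $$

r ≈ −0.85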

The computed r value is -0.85. Hence, the relationship between a person’s age and the
number of hours he or she sleeps is very high but negative.
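
As a cross-check on Example 2, r can also be computed from deviations about the means, r = ∑(X − X̄)(Y − Ȳ) / √[∑(X − X̄)² · ∑(Y − Ȳ)²], which is algebraically equivalent to the raw-sum formula used in this module. The sketch below is not the module's own method; the name `pearson_r_deviation` is an assumption.

```python
from math import sqrt


def pearson_r_deviation(xs, ys):
    """Pearson r from deviations about the means (equivalent to the raw-sum formula)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    dev_x = [x - mean_x for x in xs]
    dev_y = [y - mean_y for y in ys]
    sum_dxdy = sum(dx * dy for dx, dy in zip(dev_x, dev_y))
    sum_dx2 = sum(dx * dx for dx in dev_x)
    sum_dy2 = sum(dy * dy for dy in dev_y)
    return sum_dxdy / sqrt(sum_dx2 * sum_dy2)


# Data from Example 2.
ages = [10, 16, 22, 30, 34, 40]
hours_of_sleep = [8, 7, 8, 7, 6, 5]
print(round(pearson_r_deviation(ages, hours_of_sleep), 2))  # -0.85
```

Both forms give the same value here, so the deviation form is a convenient way to confirm a hand computation.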

Example 3:
A college professor surveyed 10 college students and gathered the data
given below. He wants to determine the strength of association between the
students’ midterm (X) and final (Y) grades.

Midterm (X)    79   85   84   89   89   91   90   92   93   95
Final (Y)      80   82   79   83   89   91   89   90   94   95

X         Y         X²           Y²           XY
79        80        6241         6400         6320
85        82        7225         6724         6970
84        79        7056         6241         6636
89        83        7921         6889         7387
89        89        7921         7921         7921
91        91        8281         8281         8281
90        89        8100         7921         8010
92        90        8464         8100         8280
93        94        8649         8836         8742
95        95        9025         9025         9025
∑X = 887  ∑Y = 872  ∑X² = 78883  ∑Y² = 76338  ∑XY = 77572

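Solving for r:
n = 10
∑X = 887
∑Y = 872
∑X² = 78883
∑Y² = 76338
∑XY = 77572

$$ r = \frac{10(77572) - (887)(872)}{\sqrt{\left[10(78883) - (887)^2\right]\left[10(76338) - (872)^2\right]}} $$

$$ r = \frac{775720 - 773464}{\sqrt{[788830 - 786769][763380 - 760384]}} $$

$$ r = \frac{2256}{\sqrt{(2061)(2996)}} $$

$$ r = \frac{2256}{\sqrt{6174756}} $$

r ≈ 0.91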

The computed r value is 0.91. Thus, the relationship between the students’
midterm and final grades is very high and positive.

ACTIVITY
A. Write TRUE in the blank if the statement is true; otherwise, write FALSE.

1) There is a negative correlation between two variables if the points are very
close to a straight line with a negative slope.
2) If r is equal to -1, the relationship lacks linearity.
3) The Pearson Product-Moment Correlation is also known as the Regression
Coefficient.
4) An r equal to 1 or -1 implies a perfect linear relationship between two
variables.
5) When r = -1, all the points from the sample lie on a straight line with a
positive slope.

Directions: Encircle the letter of the correct answer. (Corresponding points are
indicated in each item)
For items 6-9, the data below show the number of times each customer called the
customer service of a mobile phone company and that customer's satisfaction rating,
with 10 as the highest.

Number of Calls (X)    Satisfaction Rating (Y)
8                      4
18                     3
17                     4
15                     5
9                      8
7                      9
11                     4
16                     5
20                     2

6. What is n? (1 point)
A. 7 B. 8 C. 9 D. 10
7. The value of ∑X² is equal to _____. (2 points)
A. 44 B. 121 C. 1809 D. 1890
8. Which of the following is the value of ∑XY? (2 points)
A. 44 B. 121 C. 256 D. 528
9. What is the value of r? (2 points)
A. – 0.74 B. – 0.70 C. 0.74 D. 0.70

For items 10-13, a gadget store keeps track of the number of advertisements it places in
a local newspaper and the number of gadgets sold each week. The data are shown below.

Number of Ads (X)    6    5    5    7    3    3    2
Gadgets Sold (Y)     18   13   12   13   10   9    6

10. The value of ∑Y² is equal to _____. (2 points)
A. 31 B. 81 C. 393 D. 1023
11. Which of the following is the value of ∑XY? (2 points)
A. 393 B. 157 C. 81 D. 31
12. What is the value of r? (2 points)
A. – 0.83 B. – 0.80 C. 0.80 D. 0.83
13. How would you describe the relationship between the number of advertisements placed
in the local newspaper and the number of gadgets sold each week? (2 points)
A. Very high but negative C. Very low but negative
B. Very high but positive D. Very low but positive