Data Management Module

This document is a module from Gordon College on Data Management, specifically focusing on statistics. It covers the definition of statistics, its branches, types of variables, methods of data collection, and data presentation techniques. The module aims to equip students with statistical tools for data processing and decision-making.

Republic of the Philippines

City of Olongapo
GORDON COLLEGE
Olongapo City Sports Complex, East Tapinac, Olongapo City
Tel. No. (047) 224-2089 loc. 314

College of Education
Arts and Sciences

GEC04
Mathematics in the Modern
World

GEC04 1st sem 2020 – 2021 NOT FOR SALE.EXCLUSIVE FOR GORDON COLLEGE ONLY

Module V:
DATA
MANAGEMENT
Ms. Katherine D. Yap


Module No. V

“To understand God’s thoughts we must study Statistics, for these are the measure of His purpose”
Florence Nightingale
I. Introduction

According to the UCI Department of Statistics, statistics is the science concerned with
developing and studying methods for collecting, analyzing, interpreting, and presenting
empirical data. Statistics is a highly interdisciplinary field: research in statistics finds
applicability in virtually all scientific fields, and research questions in the various
scientific fields motivate the development of new statistical methods and theory.
II. Learning Objectives
At the end of Module V, the students are expected to:
1. Use a variety of statistical tools to process and manage numerical data.
2. Use the methods of linear regression and correlation to predict the value of the variable
given certain conditions.
3. Advocate the use of statistical data in making important decisions.

III. Topics and Key Concepts


INTRODUCTION TO BASIC TERMS IN STATISTICS
A. DIVISION OF STATISTICS
According to emathzone.com, statistics may be divided into two main branches:
(1) Descriptive statistics deals with the collection of data, its presentation in various
forms such as tables, graphs, and diagrams, and the computation of averages and other
measures that describe the data.
(2) Inferential statistics deals with techniques for analyzing data, making estimates and
drawing conclusions from limited information obtained through sampling, and testing the
reliability of the estimates.

B. POPULATION AS DIFFERENTIATED FROM SAMPLE


Paguio et al. (2012) define a population as the total collection of actual or potential
realizations of the unit of observation in a research study.
Lumenlearning.com stated that sometimes a government wishes to try to gain
information about all the people living within an area with regard to gender, race,
income, and religion. This type of information gathering over a whole population is
called a census.
According to lumenlearning.com, a sample is a set of data collected and/or selected from a
population by a defined procedure.

C. VARIABLE
According to the Australian Bureau of Statistics, a variable is any characteristic, number, or
quantity that can be measured or counted. A variable may also be called a data item.
Age, sex, business income and expenses, country of birth, capital expenditure, class
grades, eye color and vehicle type are examples of variables. It is called a variable
because the value may vary between data units in a population, and may change in
value over time.
For example, 'income' is a variable that can vary between data units in a population
(i.e. the people or businesses being studied may not have the same incomes) and can
also vary over time for each data unit (i.e. income can go up or down).
Paguio et al. (2012) classify variables as qualitative or quantitative. A qualitative
variable, also called a categorical variable, has values that are described by words rather
than numbers. Qualitative variables generally have either nominal or ordinal scales;
examples include gender, disease status, occupation, and race. A quantitative or numerical
variable has values that arise from counting, measuring something, or some kind of
mathematical operation. These variables are intrinsically numeric. Number of children in a
family and age are good examples of quantitative variables.

D. VARIABLES ACCORDING TO CONTINUITY OF VALUES


According to the Australian Bureau of Statistics, numeric variables may be further
described as either continuous or discrete:
A continuous variable is a numeric variable whose observations can take any value between
a certain set of real numbers. The value given to an observation for a continuous
variable can include values as small as the instrument of measurement allows.
Examples of continuous variables include height, time, age, and temperature.

A discrete variable is a numeric variable whose observations can take a value based on a
count from a set of distinct whole values. A discrete variable cannot take the value of a
fraction between one value and the next closest value.
Examples of discrete variables include the number of registered cars, number of
business locations, and number of children in a family, all of which are measured in whole
units (i.e. 1, 2, 3 cars).

The data collected for a numeric variable are quantitative data.


E. VARIABLES ACCORDING TO SCALE OF MEASUREMENT


a. Nominal scale – A nominal variable is a categorical variable whose values cannot be
organized in a logical sequence. Examples of nominal categorical variables include sex,
business type, eye color, religion and brand.
b. Ordinal scale – An ordinal variable is a categorical variable whose values can be
logically ordered or ranked. One category can be ranked higher or lower than another,
but the ranking does not necessarily establish a numeric difference between categories.
Examples of ordinal categorical variables include academic grades, clothing size (i.e.
small, medium, large, extra large) and attitudes (i.e. strongly agree, agree, disagree,
strongly disagree).
c. Interval scale – An ordered scale in which the difference between adjacent points of
the scale is a consistent quantity. The key feature of interval data is the absence of
an absolute (meaningful) zero. Examples: temperature, scores in an achievement test,
calendar time.
d. Ratio scale – An ordered scale with consistent intervals and an absolute zero, that is,
a meaningful zero that represents the absence of the measured quantity. Examples: age,
weight, height, distance.

F. TYPES OF DATA
There are two types of data: primary data and secondary data. Primary data are data
obtained first-hand, directly from the original source. Their main advantage is that they
are more likely to be accurate; their disadvantages are that they are costly and
time-consuming to collect. Secondary data, on the other hand, are data previously collected
by other persons or organizations. Their advantages are that they can be obtained easily
and at less expense, for example from books and the internet; their disadvantages are that
the information may not meet one's specific needs and that the researcher has no control
over the quality of the data.

G. METHODS OF COLLECTING DATA


1. Observation Method
This method relies on human or mechanical observation of what people are actually doing
or what events are taking place; the information is collected by watching the process at
work.
2. Experimentation Method

An experiment is a study of cause and effect. It differs from non-experimental
approaches in that one variable is intentionally changed while the other variables are
kept as stable as possible.


3. Registration Method

This refers to the continuous, permanent, and mandatory documentation of the occurrence
of critical events, along with some identifying or descriptive features relating to them,
as provided by the civil code, laws, or regulations. Examples of the registration method
are the records of births, deaths, and marriages, and the COMELEC registration record of
all Filipinos of voting age, at the National Statistics Office.

4. Direct Method

The researcher obtains the required information directly from the respondent through a
personal interview.

5. Indirect Method
This is the most widely used method of gathering primary data, in which the data are
collected through questionnaires. A questionnaire is a document prepared by the
researcher that contains a set of questions designed to obtain the information needed.
H. PRESENTATION OF DATA
1. TEXTUAL. The data are described or discussed in text form, as a paragraph.
2. TABULAR. The data are presented systematically in tables consisting of vertical
columns and horizontal rows, with headings that describe those rows and columns.
3. GRAPHICAL. The data are presented as graphs. This is often the most effective way to
present statistical data because it makes the information clearer.
Types of Common Graphs
1. Scatter Plot. This shows n pairs of observations as dots on an X-Y graph. It is
usually used to investigate whether there is an association between two variables and,
if so, what kind of association exists.


2. Histogram. This is a graphical representation of a frequency distribution. It is a
bar chart whose Y-axis shows the number of data values within each class of the
frequency distribution and whose X-axis shows the class boundaries of each class.
There should be no gaps between the bars.

3. Line chart. This is used to view time series, spot patterns or compare periods and
can display several variables at once.


4. Pie chart. This is a circular graph that displays the relative contribution that
different categories make to an aggregate sum. Each wedge of the circle reflects the
contribution of one category, so the graph resembles a pie cut into slices of various
sizes.

5. Frequency Polygon and Ogive. A frequency polygon is a line graph that links the
midpoints of the histogram intervals, plus additional intervals at the start and end
so that the line meets the X-axis. An ogive is a line graph of the cumulative frequencies.


FREQUENCY DISTRIBUTIONS
Raw Data

Raw data are collected data that have not been organized numerically. An example
is the set of masses of 200 male students obtained from an alphabetical listing of college records.

Array

An array is an arrangement of raw numerical data according to magnitude, in either
ascending or descending order. The difference between the largest and smallest number is
called the RANGE of the data. For example, if the largest mass of 200 male students is 84 kg
and the smallest mass is 63 kg, the range is 84 - 63 = 21 kg.

Frequency Distribution

It is a tabular arrangement of data showing its classification or grouping according to
magnitude or size.

Class interval. This refers to the grouping defined by a lower limit and an upper limit.

Class frequency. This refers to the number of observations belonging to a class interval.

Class mark. This is the midpoint or middle value of the class interval.

Class boundary. This is a more precise expression of the class limits, also called the true
limits.

Class size. This is the width of each class interval.

Steps in Constructing a Frequency Distribution


1. Array the given raw data in ascending order.
2. Compute the range.
Range= Highest score – Lowest score
3. Determine the number of classes using Sturges' formula.
K = 1 + 3.322 log n
where:
K is the approximate number of classes
n is the number of observations
log is the common (base-10) logarithm
4. Compute the class size, C = R ÷ K, where R is the range. The computed value of C
should be rounded off for convenience.
5. Determine the lowest class limit.
6. Tally each score to the category of class interval it belongs to. Sum the frequency and
check if its total is equal to the total number of observations.
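
As a minimal sketch of steps 2 to 4 above, the Python snippet below computes the range, the Sturges estimate of the number of classes, and the unrounded class size. The function name `plan_classes` and the illustrative data are assumptions made for this example; they are not part of the module.

```python
import math

def plan_classes(data):
    """Return (range, K, class size) following steps 2 to 4 (hypothetical helper)."""
    data_range = max(data) - min(data)        # Step 2: R = highest - lowest
    k = 1 + 3.322 * math.log10(len(data))     # Step 3: Sturges' formula, base-10 log
    class_size = data_range / k               # Step 4: C = R / K, before rounding
    return data_range, k, class_size

# Illustrative data only: 100 values that run from 12 to 29.
sample = [12 + (i % 18) for i in range(100)]
r, k, c = plan_classes(sample)
print(r, round(k, 3), round(c, 2))  # prints: 17 7.644 2.22
```

The class size is left unrounded on purpose, since the module rounds it "for convenience" and the choice of rounding is up to the researcher.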

Relative Frequency Distribution

The relative frequency, denoted by rf (%), is the ratio of the frequency of each class
to the total frequency. It may be expressed in percent, in which case the relative
frequencies must sum to 100%.

Cumulative Frequency Distribution

The cumulative frequency is the accumulated frequency of the classes; it can be
accumulated from either the beginning or the end of the distribution.

The “less than” cumulative frequency is the number of observations that are less than
the upper class boundary of a given interval.


The “greater than” cumulative frequency is the number of observations that are greater
than the lower class boundary in a given interval.

Example: Grouped Data

Construct a frequency distribution from a sample of 100 residents of Barangay Banicain,
Olongapo City. The following are the observed ages gathered from 100 persons.

Age (in years) of 100 Residents of Brgy. New Banicain, Olongapo City
14 27 27 23 29 21 20 12 22 17
23 24 18 20 27 16 12 22 19 19
15 20 29 25 24 20 20 17 18 18
12 22 23 17 23 26 16 21 21 20
17 18 26 18 28 27 18 22 19 16
14 16 19 20 20 18 25 19 26 15
28 13 18 17 14 27 24 20 18 25
17 20 23 18 18 24 19 19 14 18
21 21 25 24 14 25 20 17 17 17
15 12 26 23 17 20 24 25 18 15


Solution

Step 1. Arrange the given raw data in ascending order.

Age (in years) of 100 Residents of Banicain, Olongapo City in Ascending Order
12 12 12 12 13 14 14 14 14 14
15 15 15 15 16 16 16 16 17 17
17 17 17 17 17 17 17 17 18 18
18 18 18 18 18 18 18 18 18 18
18 19 19 19 19 19 19 19 20 20
20 20 20 20 20 20 20 20 20 20
21 21 21 21 21 22 22 22 22 23
23 23 23 23 23 24 24 24 24 24
24 25 25 25 25 25 25 26 26 26
26 27 27 27 27 27 28 28 29 29

Step 2. Compute the range.

Range = Highest score – Lowest score
Range = 29 – 12
Range = 17

Step 3. Compute the number of classes.

K = 1 + 3.322 log n
= 1 + 3.322 log 100
= 1 + 3.322(2)
= 7.644

Step 4. Compute the class size.

C = R ÷ K
= 17 ÷ 7.644
= 2.22, which is approximately equal to 2.


Step 5. Organize the class interval. Start the first class with a lower limit equal to or a little bit
less than the lowest observed value.

Step 6. Tally each score to the category of class interval it belongs to.

Class Mark To obtain the midpoint, simply add the lower limit and upper limit and
divided by two. For example class interval 12-13, adding these two will give us 25 divided
by 2 equals 12.5.
Class BoundaryThe exact limit is obtained by adding 0.5 from upper limit and
subtracting 0.5 from lower limit. For example class interval 12-13 the exact limit is 11.5-
13.5.
Cumulative Frequency In less than cumulative frequency (cf<), adding of
frequencies from the top. Start at 5; (5 + 9)=14; (14 + 14)=28; (28 + 20)=48; (48 + 17)=65;
(65 + 10)=75; (75 + 12)=87; (87 + 9)=96; (96 + 4)=100. The last cumulative frequency is
equal to the total number of observation.
On the other hand, cumulative frequency (>cf), is done by subtracting the frequency
starting from the top. Start at the total number of your observation which is 100; (100-5)=95;
(95-9)=86; (86-12)=74; (74-10)=64; (64-17)=47; (47-20)=27; (27-14)=13; (13-9)=4.
Relative Frequency The frequency percentage is obtained by dividing the frequency
of a class interval by the total number of observation times 100%. For example class interval
12-13, divide the frequency 5 to the total number of observation which is 100 multiplied by
100%.


Frequency Distribution of Age (in years) of 100 Residents of Banicain, Olongapo City

Class     Frequency  Class  Class       <Cumulative  >Cumulative  Relative
Interval             Mark   Boundaries  Frequency    Frequency    Frequency
12-13     5          12.5   11.5-13.5   5            100          5%
14-15     9          14.5   13.5-15.5   14           95           9%
16-17     14         16.5   15.5-17.5   28           86           14%
18-19     20         18.5   17.5-19.5   48           72           20%
20-21     17         20.5   19.5-21.5   65           52           17%
22-23     10         22.5   21.5-23.5   75           35           10%
24-25     12         24.5   23.5-25.5   87           25           12%
26-27     9          26.5   25.5-27.5   96           13           9%
28-29     4          28.5   27.5-29.5   100          4            4%
N=100
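
The table above can be reproduced programmatically. The following is a sketch only, assuming integer ages, a lowest class limit of 12, and a class size of 2 as in the worked example; the function name `frequency_table` and the dictionary keys are hypothetical choices for this illustration.

```python
# Ages of the 100 residents, copied from the raw data table in the example.
ages = [
    14, 27, 27, 23, 29, 21, 20, 12, 22, 17,
    23, 24, 18, 20, 27, 16, 12, 22, 19, 19,
    15, 20, 29, 25, 24, 20, 20, 17, 18, 18,
    12, 22, 23, 17, 23, 26, 16, 21, 21, 20,
    17, 18, 26, 18, 28, 27, 18, 22, 19, 16,
    14, 16, 19, 20, 20, 18, 25, 19, 26, 15,
    28, 13, 18, 17, 14, 27, 24, 20, 18, 25,
    17, 20, 23, 18, 18, 24, 19, 19, 14, 18,
    21, 21, 25, 24, 14, 25, 20, 17, 17, 17,
    15, 12, 26, 23, 17, 20, 24, 25, 18, 15,
]

def frequency_table(data, lowest_limit, class_size):
    """Build one row per class interval (assumes whole-number data)."""
    n = len(data)
    rows = []
    cf_less = 0
    lower = lowest_limit
    while lower <= max(data):
        upper = lower + class_size - 1                 # e.g. 12-13 for size 2
        freq = sum(lower <= x <= upper for x in data)  # tally of the class
        cf_less += freq                                # running total from the top
        rows.append({
            "interval": (lower, upper),
            "f": freq,
            "mark": (lower + upper) / 2,               # class mark (midpoint)
            "boundaries": (lower - 0.5, upper + 0.5),  # true limits
            "cf_less": cf_less,
            "cf_greater": n - (cf_less - freq),        # n minus all earlier classes
            "rf_percent": 100 * freq / n,
        })
        lower += class_size
    return rows

rows = frequency_table(ages, lowest_limit=12, class_size=2)
for row in rows:
    print(row)
```

Summing the `f` values should give back the number of observations, which is the check described in step 6.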

MEASURES OF CENTRAL TENDENCY


After the data have been presented in tabular or graphical form, the researcher must be
able to describe them in terms of a single number. This single figure which is representative or
summary of the characteristics of a given set of data is called a measure of central tendency.

Most commonly used measures of central tendency are mean, median and mode.

Measures of Central Tendency of Ungrouped Data

Ungrouped data, or raw data, are data that have not yet been organized or arranged
into a frequency distribution. As a rule of thumb in this module, a data set with 30 or
fewer observations is treated as ungrouped data.

Mean

The arithmetic mean, or arithmetic average, is defined as the sum of all items or terms
divided by the total number of items or terms. The definition is the same for both the sample
and the population, although a different symbol is used for each.

The symbol for the sample mean is x-bar ($\bar{x}$), and the symbol for the population mean
is the Greek letter mu ($\mu$).


Suppose you have six scores: 12, 10, 18, 16, 20 and 14. If $x_1 = 12$, $x_2 = 10$, $x_3 = 18$,
$x_4 = 16$, $x_5 = 20$ and $x_6 = 14$, the mean, represented by $\bar{x}$, is:

$$\bar{x} = \frac{x_1 + x_2 + x_3 + x_4 + x_5 + x_6}{n}$$

$$\bar{x} = \frac{12 + 10 + 18 + 16 + 20 + 14}{6} = \frac{90}{6} = 15$$

Instead of writing the equation for the mean as shown above, you can shorten it to:

Population Mean

$$\mu = \frac{\sum x}{N}$$

where:
$\mu$ = the mean
$\sum x$ = sum of all scores
N = total number of cases in the population

Sample Mean

$$\bar{x} = \frac{\sum x}{n}$$

where:
$\bar{x}$ = the mean
$\sum x$ = sum of all the scores
n = total number of cases in the sample

Median

The median of ungrouped data is the value of the middle item after arranging the data
in ascending or descending order. When the number of items is even, the median is the
mean of the two middle items.

Example 1: Compute for the median from the following set of scores; 6, 14, 10, 8, 2, 12 and
4.

2, 4, 6, 8, 10, 12, 14

Answer: The median is 8, which is the middle item.

Example 2: Find the median of the following set of item; 6, 14, 10, 8, 12 and 4.

4, 6, 8, 10, 12, 14


$$\text{median} = \frac{8 + 10}{2} = 9$$

Answer: The median is 9.

Mode

The mode for ungrouped data is defined as the value that appears with the highest
frequency. That is, the item that appears most often.

Example:

Find the mode of the following set of items: 4, 7, 11, 6, 4, 3, 5, 8, 9, 2

Answer: The mode is 4.
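
The ungrouped examples above can be checked with Python's standard `statistics` module. This is a small sketch that reuses the data sets from the examples; the variable names are chosen only for illustration.

```python
import statistics

# Data sets taken from the mean, median and mode examples in this section.
scores = [12, 10, 18, 16, 20, 14]
odd_set = [6, 14, 10, 8, 2, 12, 4]
even_set = [6, 14, 10, 8, 12, 4]
items = [4, 7, 11, 6, 4, 3, 5, 8, 9, 2]

mean_score = statistics.mean(scores)        # sum of the scores divided by their count
median_odd = statistics.median(odd_set)     # middle item of the sorted data
median_even = statistics.median(even_set)   # mean of the two middle items
mode_item = statistics.mode(items)          # item with the highest frequency

print(mean_score, median_odd, median_even, mode_item)
```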

Measures of Central Tendency of Grouped Data

Grouped data are data organized and summarized in the form of a frequency
distribution, that is, classified into categories for better presentation and analysis. As
a rule of thumb in this module, a data set with more than 30 observations is treated as
grouped data.

Arithmetic Mean

Two methods of computing for the mean:

1. Long Method

$$\bar{x} = \frac{\sum X_i F_i}{n}$$

where:
$X_i$ = class mark
$F_i$ = frequency
n = total frequency


2. Short Method (coded formula)

$$\bar{x} = A_m + \left( \frac{\sum d_i f_i}{n} \right) i$$

where:
$A_m$ = assumed mean, the class mark of the class interval with the highest frequency
$d_i$ = coded deviation
$f_i$ = frequency
i = class size (the stand-alone factor i, not the subscript)
n = total frequency
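
Both methods can be sketched in a few lines of Python. The class marks and frequencies below are taken from the entrance examination example in this module; the function names are hypothetical, and the coded deviation is assumed to be the class mark minus the assumed mean, divided by the class size, which matches the values -2, -1, 0, 1, 2 used in that example.

```python
def grouped_mean_long(class_marks, frequencies):
    """Long method: sum of (class mark x frequency) divided by total frequency."""
    n = sum(frequencies)
    return sum(x * f for x, f in zip(class_marks, frequencies)) / n

def grouped_mean_short(class_marks, frequencies, class_size):
    """Short (coded) method, using the class mark of the most frequent class."""
    n = sum(frequencies)
    assumed_mean = class_marks[frequencies.index(max(frequencies))]
    # Assumption: coded deviation d = (class mark - assumed mean) / class size.
    coded = [(x - assumed_mean) / class_size for x in class_marks]
    total_coded = sum(d * f for d, f in zip(coded, frequencies))
    return assumed_mean + (total_coded / n) * class_size

marks = [22, 31, 40, 49, 58]   # class marks of 18-26, 27-35, 36-44, 45-53, 54-62
freqs = [8, 13, 21, 6, 12]

long_mean = grouped_mean_long(marks, freqs)
short_mean = grouped_mean_short(marks, freqs, class_size=9)
print(round(long_mean, 2), round(short_mean, 2))
```

The two methods are algebraically equivalent, so they should agree up to floating-point rounding.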

Example

Compute the mean score from the frequency distribution of the entrance examination scores
of 60 students shown below.

Class     Frequency  Class Mark
Interval  (f_i)      (X_i)       X_i F_i  d_i  d_i f_i
18-26     8          22          176      -2   -16
27-35     13         31          403      -1   -13
36-44     21         40          840       0     0
45-53     6          49          294       1     6
54-62     12         58          696       2    24
          n = 60                 Σ X_i F_i = 2409    Σ d_i f_i = 1


Solution

1. Using the Long Method

$$\bar{x} = \frac{\sum X_i F_i}{n} = \frac{2409}{60} = 40.15$$

2. Using the Short Method (coded formula)

$$\bar{x} = A_m + \left( \frac{\sum d_i f_i}{n} \right) i = 40 + \left( \frac{1}{60} \right)(9) = 40.15$$

Median

The formula for finding the median of grouped data is given as follows:

$$Mdn = LCB_{Mdn} + c \left( \frac{\frac{n}{2} - {<}cf}{f_i} \right)$$

where:

Mdn = median
$LCB_{Mdn}$ = lower class boundary of the median class
<cf = less than cumulative frequency of the class preceding the median class
$f_i$ = frequency of the median class
c = class size
n = total frequency


To solve for the median the following steps are followed.

1. Compute the less than cumulative frequency.
2. Find the median class: the first class interval whose less than cumulative frequency
is equal to or greater than n/2, one half of the total number of observations.
3. Apply the formula by substituting the given values.

Example: Compute the median of the given data:

Class     Frequency  <Cumulative
Interval  (f_i)      Frequency
18-26     8          8
27-35     13         21
36-44     21         42    (median class)
45-53     6          48
54-62     12         60
          N = 60

n/2 = 60/2 = 30

$$Mdn = LCB_{Mdn} + c \left( \frac{\frac{n}{2} - {<}cf}{f_i} \right)$$

$$Mdn = 35.5 + 9 \left( \frac{\frac{60}{2} - 21}{21} \right) = 35.5 + 9 \left( \frac{9}{21} \right) = 39.36$$

Answer: 39.36
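
The three steps for the grouped median can be sketched as follows. The function name `grouped_median` is hypothetical, and the sketch assumes whole-number class limits, so that each lower class boundary is the lower limit minus 0.5.

```python
def grouped_median(lower_limits, frequencies, class_size):
    """Median of grouped data via Mdn = LCB + c * ((n/2 - <cf) / f)."""
    n = sum(frequencies)
    half = n / 2
    cf_before = 0  # less than cumulative frequency of the preceding classes
    for lower, f in zip(lower_limits, frequencies):
        # The median class is the first class whose <cf reaches n/2.
        if f > 0 and cf_before + f >= half:
            lcb = lower - 0.5  # assumes whole-number class limits
            return lcb + class_size * (half - cf_before) / f
        cf_before += f
    raise ValueError("frequencies must contain at least one positive value")

lower_limits = [18, 27, 36, 45, 54]
freqs = [8, 13, 21, 6, 12]
median_value = grouped_median(lower_limits, freqs, class_size=9)
print(round(median_value, 2))  # prints: 39.36
```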


Mode

The formula for finding the mode of grouped data is given as follows:

$$Mo = LCB_{Mo} + c \left( \frac{f_{Mo} - f_1}{2 f_{Mo} - f_1 - f_2} \right)$$

where:
Mo = mode
$LCB_{Mo}$ = lower class boundary of the modal class
$f_{Mo}$ = frequency of the modal class
$f_1$ = frequency of the class before the modal class
$f_2$ = frequency of the class after the modal class
c = class size
n = total frequency

Modal class is the class interval with the largest frequency.

Example: Compute the mode of the given data:

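Class     Frequency
Interval  (f_i)
18-26     8
27-35     13
36-44     21    (modal class)
45-53     6
54-62     12
          N = 60

$$Mo = LCB_{Mo} + c \left( \frac{f_{Mo} - f_1}{2 f_{Mo} - f_1 - f_2} \right)$$

$$Mo = 35.5 + 9 \left( \frac{21 - 13}{2(21) - 13 - 6} \right) = 35.5 + 9 \left( \frac{8}{23} \right) = 38.63$$

Ans. 38.63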
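
A short sketch of the grouped mode formula is given below. The function name is hypothetical, whole-number class limits are assumed, and a missing neighbouring class (when the modal class is first or last) is treated as having frequency 0, which is an assumption of this sketch rather than a rule stated in the module.

```python
def grouped_mode(lower_limits, frequencies, class_size):
    """Mode of grouped data via Mo = LCB + c * ((f_mo - f1) / (2*f_mo - f1 - f2))."""
    m = frequencies.index(max(frequencies))   # index of the modal class
    f_mo = frequencies[m]
    # Assumption: a neighbour that does not exist contributes a frequency of 0.
    f1 = frequencies[m - 1] if m > 0 else 0
    f2 = frequencies[m + 1] if m < len(frequencies) - 1 else 0
    lcb = lower_limits[m] - 0.5               # assumes whole-number class limits
    return lcb + class_size * (f_mo - f1) / (2 * f_mo - f1 - f2)

lower_limits = [18, 27, 36, 45, 54]
freqs = [8, 13, 21, 6, 12]
mode_value = grouped_mode(lower_limits, freqs, class_size=9)
print(round(mode_value, 2))  # prints: 38.63
```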


MEASURES OF POSITION

Quantiles

Quantiles are a natural extension of the median concept: they are the values that divide
the distribution into a given number of equal parts. While the median divides the
distribution into two equal parts, quartiles divide it into four equal parts, deciles
into ten equal parts, and percentiles into one hundred equal parts.

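Ungrouped Data

The position of the i-th quantile in the ordered data is given by:

Quartile: $\text{position of } Q_i = \frac{i(n+1)}{4}$

Decile: $\text{position of } D_i = \frac{i(n+1)}{10}$

Percentile: $\text{position of } P_i = \frac{i(n+1)}{100}$

where n is the number of observations.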

Example : Find 3rd quartile for the following data.


5, 7, 11, 1, 17, 23, 19, 3, 9, 21, 15 and 13

Solution:

First thing to do is arrange the data in ascending order.

1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21 and 23

For 3rd quartile:

$$\text{position of } Q_3 = \frac{i(n+1)}{4} = \frac{3(12+1)}{4} = 9.75\text{th position}$$

$$Q_3 = 9\text{th value} + 0.75\,(10\text{th value} - 9\text{th value}) = 17 + 0.75\,(19 - 17) = 18.5$$


After arranging the data in ascending order, locate the value at the 9.75th position.
Because this position falls between two data values, it must be interpolated: take the
value at the 9th position and add 0.75 of the difference between the values at the 10th and
9th positions. The value of the third quartile is therefore 18.5.
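
The position-and-interpolate procedure can be sketched as a small Python function. The name `quantile_by_position` and its parameters are hypothetical; `parts` is 4 for quartiles, 10 for deciles and 100 for percentiles. Clamping positions that fall outside the data to the smallest or largest value is an assumption of this sketch, since the module does not discuss that case.

```python
def quantile_by_position(data, i, parts):
    """i-th quantile using position = i(n + 1)/parts with linear interpolation."""
    values = sorted(data)
    n = len(values)
    position = i * (n + 1) / parts   # 1-based, possibly fractional position
    whole = int(position)            # e.g. 9 for position 9.75
    fraction = position - whole      # e.g. 0.75 for position 9.75
    if whole < 1:                    # assumption: clamp below the first value
        return values[0]
    if whole >= n:                   # assumption: clamp above the last value
        return values[-1]
    lower = values[whole - 1]        # value at the whole-number position
    upper = values[whole]            # value at the next position
    return lower + fraction * (upper - lower)

data = [5, 7, 11, 1, 17, 23, 19, 3, 9, 21, 15, 13]
q3 = quantile_by_position(data, i=3, parts=4)
print(q3)  # prints: 18.5
```

Other software may use a different positioning convention, so results from libraries or spreadsheets can differ slightly from this formula.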

Grouped Data

$$Q_i = LCB_{Q_i} + c \left( \frac{\frac{in}{4} - {<}cf_{Q_i - 1}}{f_{Q_i}} \right)$$

where:
$LCB_{Q_i}$ = lower class boundary of the $Q_i$ class
c = class size
n = total number of observations in the distribution
${<}cf_{Q_i - 1}$ = less than cumulative frequency of the class preceding the $Q_i$ class
$f_{Q_i}$ = frequency of the $Q_i$ class

$$D_i = LCB_{D_i} + c \left( \frac{\frac{in}{10} - {<}cf_{D_i - 1}}{f_{D_i}} \right)$$

where:
$LCB_{D_i}$ = lower class boundary of the $D_i$ class
c = class size
n = total number of observations in the distribution
${<}cf_{D_i - 1}$ = less than cumulative frequency of the class preceding the $D_i$ class
$f_{D_i}$ = frequency of the $D_i$ class

$$P_i = LCB_{P_i} + c \left( \frac{\frac{in}{100} - {<}cf_{P_i - 1}}{f_{P_i}} \right)$$

where:
$LCB_{P_i}$ = lower class boundary of the $P_i$ class
c = class size
n = total number of observations in the distribution
${<}cf_{P_i - 1}$ = less than cumulative frequency of the class preceding the $P_i$ class
$f_{P_i}$ = frequency of the $P_i$ class


Example

The following is a frequency distribution of an achievement test. Compute the third quartile
($Q_3$).

Class     Frequency  <Cumulative
Interval  (f_i)      Frequency
18-26     8          8
27-35     13         21
36-44     21         42
45-53     6          48    (class interval containing the desired quartile)
54-62     12         60
          N = 60

Solution

$$\frac{in}{4} = \frac{(3)(60)}{4} = 45$$

$$Q_i = LCB_{Q_i} + c \left( \frac{\frac{in}{4} - {<}cf_{Q_i - 1}}{f_{Q_i}} \right)$$

$$Q_3 = 44.5 + 9 \left( \frac{45 - 42}{6} \right) = 49$$

Ans.: 49 (third quartile)
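
Because the quartile, decile and percentile formulas differ only in the divisor, one function can cover all three. The sketch below is an illustration under stated assumptions: the name `grouped_quantile` and the `parts` parameter (4, 10 or 100) are hypothetical, and whole-number class limits are assumed so that the lower class boundary is the lower limit minus 0.5.

```python
def grouped_quantile(lower_limits, frequencies, class_size, i, parts):
    """i-th quantile of grouped data via LCB + c * ((i*n/parts - <cf) / f)."""
    n = sum(frequencies)
    target = i * n / parts   # e.g. 3 * 60 / 4 = 45 for the third quartile
    cf_before = 0            # less than cumulative frequency of earlier classes
    for lower, f in zip(lower_limits, frequencies):
        # The quantile class is the first class whose <cf reaches the target.
        if f > 0 and cf_before + f >= target:
            lcb = lower - 0.5  # assumes whole-number class limits
            return lcb + class_size * (target - cf_before) / f
        cf_before += f
    raise ValueError("frequencies must contain at least one positive value")

lower_limits = [18, 27, 36, 45, 54]
freqs = [8, 13, 21, 6, 12]
third_quartile = grouped_quantile(lower_limits, freqs, class_size=9, i=3, parts=4)
print(third_quartile)  # prints: 49.0
```

Calling the same function with `i=2, parts=4` gives the grouped median, since the second quartile and the median coincide.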


MEASURES OF DISPERSION OR VARIABILITY

While measures of central tendency are used to estimate "normal" values of a dataset,
measures of dispersion are important for describing the spread of the data, or its variation
around a central value. Two distinct samples may have the same mean or median, but
completely different levels of variability, or vice versa. A proper description of a set of data
should include both of these characteristics. There are various methods that can be used to
measure the dispersion of a dataset, each with its own set of advantages and disadvantages.

For Ungrouped Data

Range

It is defined as the difference between the largest and smallest sample values. Also, it
is one of the simplest measures of variability to calculate. It depends only on extreme values
and provides no information about how the remaining data are distributed.

Mean Absolute Deviation (MAD)

To arrive at a more precise and reliable measure of variation, all item values in the
distribution must be taken into account by determining the amount by which each value
varies from the mean of the distribution. One way of doing so is to use the mean absolute
deviation.

$$MAD = \frac{\sum |x_i - \bar{x}|}{n}$$

where:
$x_i$ = value of each observation
| | = symbol for absolute value
n = total number of items
$\bar{x}$ = mean


Quartile Deviation or Semi-Interquartile Range

It measures the amount of dispersion present in the middle 50% of the values in a distribution.
It is half the difference between the third quartile and the first quartile.

$$QD = \frac{Q_3 - Q_1}{2}$$

Variance

It is the average of the squared deviations of the values from the distribution's mean. If all
values are identical, the variance is zero; the greater the dispersion of the values, the greater
the variance. The symbol for the sample variance is $s^2$, and the symbol for the population
variance is the square of the Greek letter sigma, $\sigma^2$.

Population Variance

$$\sigma^2 = \frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}$$

where:
$X_i$ = value of each item
$\mu$ = population mean
N = total number of observations

Sample Variance

$$s^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}$$

where:
$X_i$ = value of each item
$\bar{X}$ = sample mean
n = total number of observations

Standard Deviation

It is the positive square root of the variance and measures the spread or dispersion of
the values around the mean of the distribution. It is the most widely used measure of spread
because taking the square root undoes the squaring in the variance, so that deviations are
expressed in the original units of the data, and because it is closely related to the normal
distribution. It also indicates, with a great deal of accuracy, where the values of the
distribution are located in relation to the mean.


Population Standard Deviation

$$\sigma = \sqrt{\frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}}$$

Sample Standard Deviation

$$s = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}}$$

where:
Xi = value of each item
μ = population mean
X̄ = sample mean
N = total number of observations in the population
n = total number of observations in the sample
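
The population and sample formulas differ only in the divisor: N for a population and n - 1 for
a sample. The short Python sketch below illustrates this with the standard-library statistics
module. The values are made up for illustration and are not data from this module.

```python
import statistics

# Hypothetical values, chosen only to illustrate the two divisors.
values = [2, 4, 4, 4, 5, 5, 7, 9]

# Treating the values as a whole population: divide by N.
pop_var = statistics.pvariance(values)
pop_sd = statistics.pstdev(values)

# Treating the same values as a sample: divide by n - 1.
samp_var = statistics.variance(values)
samp_sd = statistics.stdev(values)

print("population:", pop_var, pop_sd)
print("sample:", round(samp_var, 4), round(samp_sd, 4))
```

For the same values, the sample results come out slightly larger, because the same sum of
squared deviations is divided by n - 1 instead of N.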

Example: Ungrouped data

The weights in kilos of twelve students are: 50, 59, 55, 48, 60, 54, 48, 61, 57, 45, 52
and 63. Solve the following:
a. Range
b. Quartile deviation
c. Mean Absolute Deviation
d. Variance
e. Standard Deviation

Solution:

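a. Range = Highest score - Lowest score
         = 63 - 45

Ans.: 18

b. Quartile deviation

Arrange the values in ascending order:
45, 48, 48, 50, 52, 54, 55, 57, 59, 60, 61 and 63

The position of the i-th quartile is i(n + 1)/4.

$$Q_3 \text{ position} = \frac{3(12 + 1)}{4} = 9.75, \qquad Q_3 = 59 + 0.75(60 - 59) = 59.75$$

$$Q_1 \text{ position} = \frac{1(12 + 1)}{4} = 3.25, \qquad Q_1 = 48 + 0.25(50 - 48) = 48.50$$

$$QD = \frac{Q_3 - Q_1}{2} = \frac{59.75 - 48.50}{2} = 5.63$$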

c. Mean Absolute Deviation

$$\bar{x} = \frac{45 + 48 + 48 + 50 + 52 + 54 + 55 + 57 + 59 + 60 + 61 + 63}{12} = 54.33$$

x̄ = 54.33 or 54

Xi    |Xi - x̄|       (Xi - x̄)        (Xi - x̄)²
45    |45-54| = 9    (45-54) = -9    81
48    |48-54| = 6    (48-54) = -6    36
48    |48-54| = 6    (48-54) = -6    36
50    |50-54| = 4    (50-54) = -4    16
52    |52-54| = 2    (52-54) = -2     4
54    |54-54| = 0    (54-54) =  0     0
55    |55-54| = 1    (55-54) =  1     1
57    |57-54| = 3    (57-54) =  3     9
59    |59-54| = 5    (59-54) =  5    25
60    |60-54| = 6    (60-54) =  6    36
61    |61-54| = 7    (61-54) =  7    49
63    |63-54| = 9    (63-54) =  9    81

Σ|Xi - x̄| = 58        Σ(Xi - x̄)² = 374

$$\text{MAD} = \frac{\sum |x_i - \bar{x}|}{n} = \frac{58}{12} = 4.83$$

Ans.: MAD = 4.83

d. Variance

$$s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1} = \frac{374}{12 - 1} = 34$$

Ans.: s² = 34

e. Standard Deviation

$$s = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n - 1}} = \sqrt{\frac{374}{12 - 1}} = \sqrt{34}$$

Ans.: s = 5.83
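
The five results above can be checked with a short Python sketch. It is only an illustration
under stated assumptions: the helper quartile() is a hypothetical name that applies the
i(n + 1)/4 position rule with linear interpolation, and the mean is rounded to 54 as in the
table above.

```python
import math

weights = [50, 59, 55, 48, 60, 54, 48, 61, 57, 45, 52, 63]

def quartile(sorted_values, i):
    """i-th quartile by the i(n + 1)/4 position rule with linear interpolation."""
    n = len(sorted_values)
    position = i * (n + 1) / 4           # 1-based position, e.g. 9.75
    lower = int(position)                # whole part of the position
    fraction = position - lower          # decimal part of the position
    low_value = sorted_values[lower - 1]
    high_value = sorted_values[min(lower, n - 1)]
    return low_value + fraction * (high_value - low_value)

data = sorted(weights)
n = len(data)

value_range = data[-1] - data[0]
q1, q3 = quartile(data, 1), quartile(data, 3)
qd = (q3 - q1) / 2

mean = round(sum(data) / n)              # 54.33 rounded to 54, as in the table
mad = sum(abs(x - mean) for x in data) / n
variance = sum((x - mean) ** 2 for x in data) / (n - 1)
sd = math.sqrt(variance)

print(value_range, q1, q3, qd)
print(mad, variance, sd)
```

If the unrounded mean 54.33 is used instead of 54, the sum of squared deviations is slightly
smaller and the variance comes out at about 33.88 rather than 34.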

For Grouped Data


Range = Upper boundary of the highest class minus lower boundary of the lowest class.

Quartile Deviation

$$QD = \frac{Q_3 - Q_1}{2}$$

Mean Absolute Deviation

$$\text{MAD} = \frac{\sum f\,|cm - \bar{x}|}{n}$$

where:
cm = class midpoint
| | = symbol for absolute value
f = frequency
n = total number of items
x̄ = mean

Variance (σ²)

$$\sigma^2 = \frac{\sum f\,(cm - \bar{x})^2}{n}$$

Standard Deviation (σ)

$$\sigma = \sqrt{\frac{\sum f\,(cm - \bar{x})^2}{n}}$$

where:
cm = class mark of each class
x̄ = mean
n = total number of observations
f = frequency

Example: For Grouped data


The following is a frequency distribution of an achievement test. Using the table below,
compute the following: a) Mean Absolute Deviation, b) Standard Deviation and c) Quartile
Deviation.

Class       Frequency   Class Mark
Interval    (f)         (cm)         d'    d'f    |cm - x̄|        f|cm - x̄|
18-26        8          22           -2    -16    |22-40| = 18    8*18 = 144
27-35       13          31           -1    -13    |31-40| = 9     13*9 = 117
36-44       21          40            0      0    |40-40| = 0     21*0 = 0
45-53        6          49            1      6    |49-40| = 9     6*9 = 54
54-62       12          58            2     24    |58-40| = 18    12*18 = 216

N = 60                               Σd'f = 1                     Σf|cm - x̄| = 531


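a. Mean Absolute Deviation

$$\bar{x} = A_m + \left(\frac{\sum d'f}{N}\right) i = 40 + \left(\frac{1}{60}\right) 9 = 40.15 \text{ or } 40$$

where Am = 40 is the assumed mean (the class mark of the class with d' = 0) and i = 9 is the
class size.

$$\text{MAD} = \frac{\sum f\,|cm - \bar{x}|}{n} = \frac{531}{60} = 8.85$$

Answer = 8.85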

Class Interval    Frequency (f)    Cumulative Frequency (<cf)
18-26              8                8
27-35             13               21
36-44             21               42
45-53              6               48
54-62             12               60
N = 60

b. Standard Deviation

$$\sigma = \sqrt{\frac{\sum f\,(cm - \bar{x})^2}{n}} = \sqrt{\frac{8019}{60}} = \sqrt{133.65} = 11.56$$

Answer = 11.56


Class       Frequency   Class Mark
Interval    (f)         (cm)         (cm - x̄)         (cm - x̄)²       f(cm - x̄)²
18-26        8          22           (22-40) = -18    (-18)² = 324    8*324 = 2592
27-35       13          31           (31-40) = -9     (-9)² = 81      13*81 = 1053
36-44       21          40           (40-40) = 0      (0)² = 0        21*0 = 0
45-53        6          49           (49-40) = 9      (9)² = 81       6*81 = 486
54-62       12          58           (58-40) = 18     (18)² = 324     12*324 = 3888

N = 60                                                                Σf(cm - x̄)² = 8019

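c. Quartile deviation

$$Q_i = LCB_{Q_i} + c\left(\frac{\frac{iN}{4} - {<}cf_{Q_i - 1}}{f_{Q_i}}\right)$$

where LCB is the lower class boundary of the quartile class, c is the class size, <cf is the
cumulative frequency of the class before the quartile class, and f is the frequency of the
quartile class.

Position of Q3: 3(60)/4 = 45          Position of Q1: 60/4 = 15

$$Q_3 = 44.5 + 9\left(\frac{45 - 42}{6}\right) = 49$$

$$Q_1 = 26.5 + 9\left(\frac{15 - 8}{13}\right) = 31.35$$

$$QD = \frac{Q_3 - Q_1}{2} = \frac{49 - 31.35}{2} = 8.83$$

Answer = 8.83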
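
The grouped-data results can be reproduced with the Python sketch below. It is a minimal
illustration, not a general-purpose routine: the function name grouped_quartile() is
hypothetical, equal class sizes are assumed, and the mean is rounded to 40 as in the tables
above.

```python
import math

# (lower limit, upper limit, frequency) for each class in the example
classes = [(18, 26, 8), (27, 35, 13), (36, 44, 21), (45, 53, 6), (54, 62, 12)]

n = sum(f for _, _, f in classes)
midpoints = [(lo + hi) / 2 for lo, hi, _ in classes]
freqs = [f for _, _, f in classes]

exact_mean = sum(f * cm for f, cm in zip(freqs, midpoints)) / n   # 40.15
mean = round(exact_mean)                                          # 40, as in the tables

mad = sum(f * abs(cm - mean) for f, cm in zip(freqs, midpoints)) / n
sd = math.sqrt(sum(f * (cm - mean) ** 2 for f, cm in zip(freqs, midpoints)) / n)

def grouped_quartile(i):
    """Q_i = LCB + c * ((i*N/4 - cf_before) / f_class), assuming equal class sizes."""
    target = i * n / 4
    width = classes[0][1] - classes[0][0] + 1    # class size c = 9
    cumulative = 0                               # cumulative frequency before the class
    for lo, hi, f in classes:
        if cumulative + f >= target:
            return (lo - 0.5) + width * (target - cumulative) / f
        cumulative += f
    raise ValueError("quartile position is beyond the data")

q1, q3 = grouped_quartile(1), grouped_quartile(3)
qd = (q3 - q1) / 2

print(mad, round(sd, 2))
print(round(q1, 2), q3, round(qd, 2))
```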


NORMAL DISTRIBUTION
An assessment of the normality of data is a prerequisite for many statistical tests, because
normally distributed data is an underlying assumption of parametric testing. Checking for
normality therefore helps in deciding whether a parametric or a non-parametric test should be
used. The graph of the normal distribution is called a normal curve. The mathematical equation of
the normal curve was first described in 1733 by De Moivre.

Suppose the academic performance of the students in a certain school is investigated and this
characteristic follows a normal distribution. If we study the grade point average (GPA) of the
student population (N = 2,500), we may find that the students are distributed across the ratings
excellent, very good, good, satisfactory, passed and failed.

If the grades of the students are plotted on a graph, with the frequency of students on the
ordinate (y-axis) and their grades on the abscissa (x-axis), the result will probably approximate
a bell-shaped curve like the one in Figure 8 below.

Figure 8: Normal Curve

Properties of a Normal Curve

1. The mean, median and mode coincide at one point at the center of the distribution.
2. The curve is symmetrical and bell-shaped.
3. The tails of the curve are asymptotic to the horizontal axis.
4. Practically all of the area lies within three standard deviations to the left and right of
   the mean.
5. The total area under the normal curve is 100%.

Figure 9: Standard Normal Curve

Standard Normal Curve


Formula for finding the standard score (z):

$$z = \frac{x - \bar{x}}{s}$$

where:
z = standard score
x̄ = mean
s = standard deviation
x = a given value of a particular variable

Consider the following procedures in determining the areas under the standard normal curve:

1. For the area to the right of a positive z-score, subtract the value in the table of
   normal curve areas from 0.5000.
2. For the area to the left of a positive z-score, add the value in the table of normal
   curve areas to 0.5000.
3. For the area to the right of a negative z-score, add the value in the table of normal
   curve areas to 0.5000.
4. For the area to the left of a negative z-score, subtract the value in the table of
   normal curve areas from 0.5000.
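
The four procedures can be sketched in Python as follows. This is an illustration only: the
function names are hypothetical, and the table value is computed with NormalDist from the
standard-library statistics module instead of being read from a printed table.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()   # mean 0, standard deviation 1

def table_area(z):
    """Area between 0 and |z|, the value a standard normal table lists."""
    return STANDARD_NORMAL.cdf(abs(z)) - 0.5

def area_right_of(z):
    # Procedures 1 and 3: subtract for a positive z, add for a negative z.
    return 0.5 - table_area(z) if z >= 0 else 0.5 + table_area(z)

def area_left_of(z):
    # Procedures 2 and 4: add for a positive z, subtract for a negative z.
    return 0.5 + table_area(z) if z >= 0 else 0.5 - table_area(z)

# z = 1.00 has a table value of about 0.3413.
print(round(table_area(1.00), 4))
print(round(area_left_of(1.00), 4), round(area_right_of(1.00), 4))
print(round(area_left_of(-1.00), 4), round(area_right_of(-1.00), 4))
```

For any z, the areas to the left and to the right add up to 1, which is a quick check that the
correct procedure was chosen.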

Area under Normal curve

Example 1. Find the area under the normal curve for z ≤ 1.18.

Solution:

Step 1: Sketch the curve. Locate the indicated range of values on the horizontal axis.

Step 2: Identify the area under the curve that corresponds to the description of the event.
        The shaded part is the area under the normal curve for the event z ≤ 1.18.


Step 3: Use the standard normal table (see Appendix A) to find this area. The complete table is
not reproduced here, but the abridged version below is enough to locate the area under the
normal curve.

z      .00     .01     ...    .08     .09
0.0    .0000   .0040   ...    .0319   .0359
0.1    .0398   .0438   ...    .0714   .0753
...    ...     ...     ...    ...     ...
1.0    .3413   .3438   ...    .3599   .3621
1.1    .3643   .3665   ...    .3810   .3830
1.2    .3849   .3869   ...    .3997   .4015
...    ...     ...     ...    ...     ...

The area corresponding to z = 1.18 (row 1.1, column .08) is 0.3810. Since the required area lies
to the left of a positive z-score, add 0.5000:

0.3810 + 0.5000 = 0.8810

The area under the standard normal curve for z ≤ 1.18 is 0.8810.

Answer: 0.8810 or 88.10%.

Example 2. What is the area under the normal curve for z ≤ -0.63?

Solution:

1. Identify the range of values described by "z ≤ -0.63".
2. Identify the area you need to find (shaded part of the figure below).
3. Look up the appropriate area in your table. That area is 0.2357. Since the condition is
   z ≤ -0.63, we subtract 0.2357 from 0.5000.

0.5000 - 0.2357 = 0.2643

Answer: 0.2643 or 26.43%.

Example 3. Find the area under the normal curve from z = -1.57 to z = 3.99.

Solution:

1. Identify the range of values described, from z = -1.57 to z = 3.99.
2. Identify the area you need to find (shaded part of the figure below).
3. Look up the appropriate areas in your table. The area for z = 1.57 is 0.4418 and the area
   for z = 3.99 is 0.5000. Since the two z-scores lie on opposite sides of the mean, we add
   0.4418 and 0.5000.

0.4418 + 0.5000 = 0.9418

Answer: 0.9418 or 94.18%.


Example 4. Find the z-score for which the area to the right of +z is 0.2000.

Solution:
The shaded portion shows that the area to the right of z is 0.2000. To obtain the area we are
looking for, we subtract 0.2000 from the total area of the right half of the normal curve. Hence,
0.5000 - 0.2000 = 0.3000. Refer to the figure below.

Referring to the table of areas under the normal curve (Appendix A), we find that the entry
nearest to 0.3000 is 0.2995, which corresponds to a z-score of 0.84.

Answer: 0.84 (because 20% of the observations are above z = 0.84).

Example 5. Find the z-score for which the area to the right of +z is 0.3520.

Solution:
The shaded portion shows that the area to the right of z is 0.3520. To obtain the area we are
looking for, we subtract 0.3520 from the total area of the right half of the normal curve. Hence,
0.5000 - 0.3520 = 0.1480. Refer to the figure below.

Referring to the table of areas under the normal curve, we find the entry 0.1480, which
corresponds to a z-score of 0.38.

Answer: 0.38
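
Examples 4 and 5 read the table in reverse, from an area back to a z-score. A minimal Python
sketch of the same lookup is shown below. The function name z_for_right_tail() is hypothetical,
and the inverse is computed with NormalDist.inv_cdf from the standard-library statistics module
rather than searched for in a printed table.

```python
from statistics import NormalDist

def z_for_right_tail(area):
    """Positive z-score whose right-tail area equals `area` (0 < area < 0.5)."""
    if not 0 < area < 0.5:
        raise ValueError("area must be between 0 and 0.5 for a positive z")
    # The area to the left of z is 1 - area, so invert the cumulative distribution there.
    return NormalDist().inv_cdf(1 - area)

print(round(z_for_right_tail(0.2000), 2))
print(round(z_for_right_tail(0.3520), 2))
```

Rounded to two decimal places, the computed values agree with the table lookups in the two
examples.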


Application of the Standard Normal Curve

Example:

The mean weight of college students is 70 kg and the standard deviation is 3 kg. Assuming that the
weight is normally distributed, what is the probability that a student weighs:

a. between 60 kg and 75 kg
b. more than 72 kg
c. less than 64 kg

Solution:

a. Between 60 kg and 75 kg

1. Convert the raw scores of 60 and 75 to z-scores.

$$z = \frac{60 - 70}{3} = -3.33$$

$$z = \frac{75 - 70}{3} = 1.67$$

2. Sketch the curve and identify the area you need to find (shaded part of the figure below).

3. The z value of -3.33 gives an area of 0.4996, while the z value of 1.67 corresponds to an area
of 0.4525. Because the two z-scores lie on opposite sides of the mean, the shaded area is the sum
of these two areas. Therefore, the required area is 0.4996 + 0.4525 = 0.9521 or 95.21%.

Answer: 0.9521 or 95.21%


b. More than 72 kg

1. Convert the raw score of 72 to a z-score.

$$z = \frac{72 - 70}{3} = 0.67$$

2. Sketch the curve and identify the area you need to find (shaded part of the figure below).

3. The z value of 0.67 gives an area of 0.2486. Since the shaded area lies to the right of a
positive z-score, we subtract 0.2486 from the total area of the right half of the normal curve.
Hence, 0.5000 - 0.2486 = 0.2514 or 25.14%.

Answer: 0.2514 or 25.14%


c. Less than 64 kg

1. Convert the raw score of 64 to a z-score.

$$z = \frac{64 - 70}{3} = -2.00$$

2. Sketch the curve and identify the area you need to find (shaded part of the figure below).

3. The z value of -2.00 gives an area of 0.4772. Since the shaded area lies to the left of a
negative z-score, we subtract 0.4772 from the total area of the left half of the normal curve.
Hence, 0.5000 - 0.4772 = 0.0228 or 2.28%.

Answer: 0.0228 or 2.28%
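
The three probabilities can also be checked without a table. The sketch below assumes Python's
standard-library statistics.NormalDist and works directly with the mean of 70 kg and the standard
deviation of 3 kg, so no z-scores need to be rounded.

```python
from statistics import NormalDist

weight = NormalDist(mu=70, sigma=3)   # weights in kg, assumed normally distributed

between_60_and_75 = weight.cdf(75) - weight.cdf(60)
more_than_72 = 1 - weight.cdf(72)
less_than_64 = weight.cdf(64)

print(round(between_60_and_75, 4))
print(round(more_than_72, 4))
print(round(less_than_64, 4))
```

The results may differ from the table-based answers in the third or fourth decimal place,
because the worked solution rounds each z-score to two decimal places before using the table.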


CORRELATION AND REGRESSION ANALYSIS

Correlation is the degree of relationship between variables. It seeks to determine how
well a linear or other equation describes or explains the relationship between variables. It also
implies "association" between two variables.

PEARSON PRODUCT MOMENT CORRELATION COEFFICIENT

The Pearson product-moment correlation coefficient (or Pearson r for short) is a
measure of the strength of a linear association between two variables measured on an interval or
ratio scale.

$$r = \frac{N\sum xy - \sum x \sum y}{\sqrt{\left[N\sum x^2 - \left(\sum x\right)^2\right]\left[N\sum y^2 - \left(\sum y\right)^2\right]}}$$

where:
Σx = sum of the values of x
Σy = sum of the values of y
Σx² = sum of the squares of the values of x
Σy² = sum of the squares of the values of y
Σxy = sum of the products of x and y
N = total number of pairs

The Pearson correlation coefficient, r, can take a range of values from +1 to -1. A value
of 0 indicates that there is no association between the two variables. This is shown in figure 7.

Figure 7: Scatter plot Diagram


The arbitrary scale for the interpretation of r is given below.

Range of computed r     Interpretation
± 1.0                   Perfect Relationship
± 0.70 to ± 0.99        Strong/High Relationship
± 0.40 to ± 0.69        Moderate Relationship
± 0.01 to ± 0.39        Slight/Low Relationship
0                       No Correlation
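
The formula for r and the interpretation scale can be sketched in Python as follows. The function
names pearson_r() and interpret() are hypothetical, the sample data are made up for illustration,
and r is rounded to two decimal places before the scale is applied, which is an assumption about
how the scale is meant to be read.

```python
import math

def pearson_r(xs, ys):
    """Pearson r computed from the sums used in the formula above."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least two pairs")
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator   # fails if either variable has no variation

def interpret(r):
    """Label r with the arbitrary scale, using r rounded to two decimal places."""
    size = abs(round(r, 2))
    if size == 1.0:
        return "Perfect Relationship"
    if size >= 0.70:
        return "Strong/High Relationship"
    if size >= 0.40:
        return "Moderate Relationship"
    if size >= 0.01:
        return "Slight/Low Relationship"
    return "No Correlation"

# Hypothetical data: y is exactly twice x, so the relationship is perfect.
r = pearson_r([1, 2, 3, 4], [2, 4, 6, 8])
print(r, interpret(r))
```

The sign of r gives the direction of the relationship, while the scale is applied to its size
only.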

LINEAR REGRESSION

Regression is a term used to describe the process of estimating the relationship between
two variables. The relationship is estimated by fitting a straight line through the given data.
The method of least squares permits us to find a line of best fit called regression line which
keeps the errors of prediction to a minimum.

The equation for a fitted line is:

$$Y = a + bx$$

where:
Y = predicted value
a = y-intercept
b = slope of the regression line
x = the given value of x used to make the prediction

To find the slope b:

$$b = \frac{N\sum xy - \sum x \sum y}{N\sum x^2 - \left(\sum x\right)^2}$$

To find the value of a:

$$a = \bar{y} - b\bar{x}$$

where:
ȳ = mean value of Y
x̄ = mean value of X
Σx = sum of the values of x
Σy = sum of the values of y
Σx² = sum of the squares of the values of x
Σxy = sum of the products of x and y
N = total number of pairs

Example
Example

Below are the scores of 12 college students in Mathematics and Physics tests of 80 items
each.

Mathematics (x) 65 63 67 64 68 62 70 66 68 67 69 71
Physics (y) 68 66 68 65 69 66 68 65 71 67 68 70

a. Draw a scatter diagram.
b. Find the correlation coefficient of the Mathematics and Physics scores and interpret it.
c. Find the regression line equation.
d. Predict the score in Physics (y) if the score in Mathematics (x) of the student is 75.
Solution

Step 1: Draw a scatter plot. If the scatter plot does not show any (linear) trend, stop the
analysis and conclude "no relationship". Otherwise, proceed to Step 2.

[Figure: scatter plot of the Mathematics (x) and Physics (y) scores]

The scatter plot indicates an upward linear trend between Mathematics and Physics
proficiency. Thus, "there is a reason to believe that they are related."


Step 2: Compute Pearson r by arranging the given data in columns.

Number    Mathematics (x)    Physics (y)    x²       y²       xy
1         65                 68             4225     4624     4420
2         63                 66             3969     4356     4158
3         67                 68             4489     4624     4556
4         64                 65             4096     4225     4160
5         68                 69             4624     4761     4692
6         62                 66             3844     4356     4092
7         70                 68             4900     4624     4760
8         66                 65             4356     4225     4290
9         68                 71             4624     5041     4828
10        67                 67             4489     4489     4489
11        69                 68             4761     4624     4692
12        71                 70             5041     4900     4970
N = 12    Σx = 800           Σy = 811       Σx² = 53418   Σy² = 54849   Σxy = 54107

$$r = \frac{12(54107) - (800)(811)}{\sqrt{\left[12(53418) - (800)^2\right]\left[12(54849) - (811)^2\right]}} = 0.70$$

Referring to the arbitrary scale for the interpretation of r, r = 0.70 indicates a
strong/high positive relationship between the scores of the students in Mathematics and
Physics.

Step 3: Formulate the regression line equation by first solving for the values of b and a.

Solving for b

$$b = \frac{12(54107) - (800)(811)}{12(53418) - (800)^2} = 0.48$$

Solving for a

$$a = 67.58 - 0.48(66.67) = 35.58$$

Substitute the computed values of b and a into the regression line equation:

$$Y = a + bx$$

$$y = 35.58 + 0.48x \quad \text{(regression line equation)}$$

We can now estimate scores in Physics (y) using the regression line equation by
substituting a score in Mathematics (x). For instance, if x is equal to 75, then
solving for y gives 71.58.

$$y = 35.58 + 0.48(75) = 71.58$$

Therefore, the estimated score in Physics is 71.58, or approximately 72, if the score in
Mathematics is 75. The regression line equation may now be used to estimate scores for y by
substituting a value of x.
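
The sums, r, b, a and the prediction can be checked with the Python sketch below. It is an
illustration only: it keeps b unrounded, so a comes out at about 35.82 and the prediction at
about 71.55 instead of 35.58 and 71.58, which were obtained after rounding b to 0.48.

```python
import math

math_scores = [65, 63, 67, 64, 68, 62, 70, 66, 68, 67, 69, 71]      # x
physics_scores = [68, 66, 68, 65, 69, 66, 68, 65, 71, 67, 68, 70]   # y

n = len(math_scores)
sum_x, sum_y = sum(math_scores), sum(physics_scores)
sum_xy = sum(x * y for x, y in zip(math_scores, physics_scores))
sum_x2 = sum(x * x for x in math_scores)
sum_y2 = sum(y * y for y in physics_scores)

# Pearson r from the sums in the table
r = (n * sum_xy - sum_x * sum_y) / math.sqrt(
    (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
)

b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)   # slope
a = sum_y / n - b * (sum_x / n)                                # y-intercept

predicted = a + b * 75   # estimated Physics score for a Mathematics score of 75

print(sum_x, sum_y, sum_x2, sum_y2, sum_xy)
print(round(r, 2), round(b, 2), round(a, 2), round(predicted, 2))
```

Both versions round to an estimated Physics score of about 72.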


IV. Teaching and Learning Materials and Resources

a. PowerPoint Presentation
b. Google Meet

V. Learning Task (Module 2)


Name: ___________________________________ Date:_________________
GEC04
EXERCISE I: INTRODUCTION TO BASIC TERMS IN STATISTICS
I. Classify each of the following as: A. Nominal B. Ordinal C. Interval D. Ratio
measurement. Write the correct UPPERCASE letter of your answer.

___________ 1. Faculty are classified as:


1 – Contract of Service
2 – Consultant
3 – Regular Casual
4 – Regular Plantilla
___________ 2. Economic Status of students as:
0 – low status
1 – high status
___________ 3. Religion.
___________ 4. Amount of money in bank accounts.
___________ 5. Boiling point of water on the Fahrenheit scale.
___________ 6. Instructors are ranked according to their performance rating.
___________ 7. Ranking of ten students according to their performance rating.
___________ 8. Speed of different automobiles in seconds.
___________ 9. Flavors of ice cream.
___________ 10. Gender.
___________ 11. Different Learning Styles of students.
___________ 12. Teaching Modalities.
___________ 13. Percentage scores on a Statistics exam.
____________ 14. Gordon College Students Year level
____________ 15. Inventory of sales every month.


II. Classify the data described in the following scenarios as qualitative or quantitative.

_____________ a. The body mass index of elementary grade students in a certain school.

_____________ b. The fasting blood sugar readings are determined for several
individuals in a study involving diabetics.

_____________ c. The number of questions correctly answered on a 50-item quiz is
recorded for each student in a statistics class.

_____________ d. The students in a certain school are classified into one of six
categories in classroom performance as follows: Excellent, Very
Good, Good, Satisfactory, Passed and Failed.

III. PERFORMANCE TASK. Using the table below, which shows a frequency distribution of the
Mathematics entrance examination scores of 500 students, construct a histogram and a line
graph for these data using MS Excel.

Scores in Entrance Examination     Number of Students
30-39 24
40-49 46
50-59 58
60-69 76
70-79 68
80-89 82
90-99 48
100-109 32
110-119 66


EXERCISE II: MEASURES OF CENTRAL TENDENCY AND MEASURES OF DISPERSION

1. The following scores were received by 20 accounting students in a short quiz: 10, 9, 15,
20, 13, 15, 18, 11, 7, 12, 15, 13, 18, 19, 12, 8, 10, 13, 17, and 15. Find the third quartile,
eighth decile, fortieth percentile, mean, median and mode.
2. The following are the scores of ten (10) management students in four quizzes. Solve for
the (a) range, (b) quartile deviation, (c) mean absolute deviation, (d) variance, and (e)
standard deviation.

a. 32, 29, 28, 27, 26, 25, 24, 23, 22, 20


b. 30, 28, 27, 26, 26, 26, 26, 24, 24, 18

3. The following is the result of the examination of 55 students in Statistics .

Classes Frequency
33-40 4
41-48 8
49-56 10
57-64 11
65-72 9
73-80 6
81-88 2
89-96 5

Compute the following:


a. Mean
b. Median
c. Mode
d. sixty-second percentile
e. seventh decile
f. third quartile
g. quartile deviation
h. mean absolute deviation
i. variance
j. standard deviation


EXERCISE III: NORMAL DISTRIBUTION

1. Find the area of the normal curve given the following:

a. from z=0 to z=1.43


b. from z= -1.23 to z=0
c. from z=2.38 to z=3.09
d. from z=-1.42 to z=2.37
e. from z=-2.35 to z=2.48
2. An entrance examination is to be given to 1,500 incoming freshmen at Gordon College. The
scores are known to be normally distributed with a mean of 85 and a standard deviation
of 10.

a. How many students would be expected to score above 95?

c. How many students would be expected to score between 75 and 115?

3. The annual salaries of employees in a large company are approximately normally
distributed with a mean of P50,000 and a standard deviation of P20,000.

a. What percent of people earn less than P40,000?


b. What percent of people earn between P45,000 and P65,000?
c. What percent of people earn more than P70,000?

EXERCISE IV: CORRELATION AND REGRESSION

1. Test scores of nine (9) students are shown below. What can you say about the strength of
the correlation between these sets of scores in Trigonometry and Geometry?

Trigonometry 43 41 50 47 35 33 50 33 54
Geometry 48 45 47 43 33 28 48 31 57


2. The number of hours spent per week viewing television (y) and the number of years of
education (x) were recorded for ten randomly selected individuals. The results are given
below;
x 12 14 11 16 16 18 12 20 10 12
y 10 9 15 8 5 4 20 4 16 15
a. Draw the scatter diagram.
b. Find the correlation coefficient of x and y and interpret your answer.
c. Find the regression line equation.
d. What are the predicted values of y if x is 15, 17 and 19?


“The future belongs to those who believe in the beauty of their dreams”
-Eleanor Roosevelt-
