Data Management Module
City of Olongapo
GORDON COLLEGE
Olongapo City Sports Complex, East Tapinac, Olongapo City
Tel. No. (047) 224-2089 loc. 314
College of Education, Arts and Sciences
GEC04
Mathematics in the Modern World
GEC04 1st sem 2020 – 2021 NOT FOR SALE.EXCLUSIVE FOR GORDON COLLEGE ONLY
Republic of the Philippines
Module V: DATA MANAGEMENT
Ms. Katherine D. Yap
Module No. V
“To understand God’s thoughts we must study Statistics, for these are the measure of His purpose”
Florence Nightingale
I. Introduction
income, and religion. This type of information gathering over a whole population is
called a census.
According to lumenlearning.com, a sample is a set of data collected and/or selected from a
population by a defined procedure.
C. VARIABLE
The Australian Bureau of Statistics states that a variable is any characteristic, number, or
quantity that can be measured or counted. A variable may also be called a data item.
Age, sex, business income and expenses, country of birth, capital expenditure, class
grades, eye color and vehicle type are examples of variables. It is called a variable
because the value may vary between data units in a population, and may change in
value over time.
For example, 'income' is a variable that can vary between data units in a population
(i.e. the people or businesses being studied may not have the same incomes) and can
also vary over time for each data unit (i.e. income can go up or down).
Paguio et al. (2012) mentioned that variables are classified into qualitative and
quantitative variables. A qualitative variable, also called a categorical variable, has
values that are described by words rather than numbers. Qualitative variables generally
have either nominal or ordinal scales; examples are gender, disease status, occupation,
and race. A quantitative or numerical variable has values that arise from counting,
measuring something, or some kind of mathematical operation. These variables are
intrinsically numeric. The number of children in a family and age are good examples of
quantitative variables.
F. TYPES OF DATA
There are two types of data: primary data and secondary data. Primary data are data
obtained or analyzed from first-hand experience, derived directly from the original
source. Their advantage is that they are more likely to be accurate; their disadvantages
are that they are costly and time-consuming to collect.
Secondary data, on the other hand, are information previously obtained by other persons
or organizations. Their advantages are that they can be obtained easily and are less
expensive, because they can be gathered from books and over the internet. Their
disadvantages are that the information may not meet one's specific needs and that there
is no control over the quality of the data.
An experiment is a study of cause and effect. It differs from non-experimental
approaches in that one variable is intentionally changed while the other variables are
kept stable.
3. Registration Method
4. Direct Method
The researcher obtains the required information directly from the interviewee through a
direct personal interview.
5. Indirect Method
This is the most widely used method of gathering primary data, in which the information
is collected through questionnaires. A questionnaire is a document prepared by the
researcher that contains a set of questions designed to obtain the information needed.
A. PRESENTATION OF DATA
1. TEXTUAL. In this mode of presentation, the data are described in text form, as a
paragraph.
2. TABULAR. The data are presented systematically in tables consisting of vertical
columns and horizontal rows, with headings describing those rows and columns.
3. GRAPHICAL. Presenting statistical data graphically is the most effective method,
since it makes the information clearer.
Types of Common Graphs
1. Scatter Plot. This shows n pairs of observations as dots on an X-Y graph. It is
usually used to investigate whether there is an association between two variables
and what kind of association exists.
3. Line chart. This is used to view time series, spot patterns or compare periods and
can display several variables at once.
4. Pie chart. This is a circular graph that displays the relative contribution that
different categories make to an aggregate sum. A wedge of the circle reflects the
contribution of each category, so that the graph resembles a pie cut into slices of
various sizes.
5. Frequency Polygon and Ogive. A frequency polygon is a line graph that links the
midpoints of the histogram intervals, plus additional intervals at the start and end
so the line crosses the X-axis. An ogive is a graph of the cumulative frequencies in
rows.
FREQUENCY DISTRIBUTIONS
Raw Data
Raw data are collected data which have not been organized numerically. An example
is the set of masses of 200 male students obtained from an alphabetical listing of college records.
Array
Frequency Distribution
Class interval. This refers to the grouping defined by a lower limit and an upper limit.
Class frequency. This refers to the number of observations belonging to a class interval.
Class mark. This is the midpoint or middle value of the class interval.
Class boundary. This is the more precise expression of the class limits, also called the true
limits.
Relative frequency. Denoted by % (rf), this is the ratio of the frequency of each class to the
total frequency. The relative frequency distribution may be expressed in percent, and its
total must be equal to 100%.
The cumulative frequency is the accumulated frequencies of the classes; it can be either
at the beginning or end of the distribution.
The “less than” cumulative frequency is the number of observations that are less than
the upper class boundary in a given interval.
The “greater than” cumulative frequency is the number of observations that are greater
than the lower class boundary in a given interval.
Solution
Step 5. Organize the class interval. Start the first class with a lower limit equal to or a little bit
less than the lowest observed value.
Step 6. Tally each score to the category of class interval it belongs to.
Class Mark. To obtain the midpoint, add the lower limit and the upper limit and divide
by two. For example, for class interval 12-13, (12 + 13) / 2 = 25 / 2 = 12.5.
Class Boundary. The exact limits are obtained by subtracting 0.5 from the lower limit and
adding 0.5 to the upper limit. For example, for class interval 12-13 the exact limits are
11.5-13.5.
Cumulative Frequency. The less than cumulative frequency (<cf) is obtained by adding the
frequencies from the top. Start at 5; (5 + 9)=14; (14 + 14)=28; (28 + 20)=48; (48 + 17)=65;
(65 + 10)=75; (75 + 12)=87; (87 + 9)=96; (96 + 4)=100. The last cumulative frequency is
equal to the total number of observations.
On the other hand, the greater than cumulative frequency (>cf) is obtained by subtracting
the frequencies starting from the top. Start at the total number of observations, which is
100; (100-5)=95; (95-9)=86; (86-14)=72; (72-20)=52; (52-17)=35; (35-10)=25; (25-12)=13;
(13-9)=4.
Relative Frequency. The relative frequency in percent is obtained by dividing the frequency
of a class interval by the total number of observations and multiplying by 100%. For
example, for class interval 12-13, (5 / 100) x 100% = 5%.
Frequency Distribution of Age (in years) of 100 Residents of Banicain, Olongapo City
Class      Frequency   Class   Class        <Cumulative   >Cumulative   Relative
Interval               Mark    Boundaries   Frequency     Frequency     Frequency
12-13          5       12.5    11.5-13.5        5            100           5%
14-15          9       14.5    13.5-15.5       14             95           9%
16-17         14       16.5    15.5-17.5       28             86          14%
18-19         20       18.5    17.5-19.5       48             72          20%
20-21         17       20.5    19.5-21.5       65             52          17%
22-23         10       22.5    21.5-23.5       75             35          10%
24-25         12       24.5    23.5-25.5       87             25          12%
26-27          9       26.5    25.5-27.5       96             13           9%
28-29          4       28.5    27.5-29.5      100              4           4%
N=100
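The columns of the table above can be rebuilt from just the class limits and frequencies. The following is a minimal Python sketch of that bookkeeping, not part of the original module; the variable names (`classes`, `freqs`, `rows`) are chosen here for illustration, and the 0.5 adjustment for class boundaries assumes whole-number class limits as in the table.

```python
# Class limits and frequencies copied from the age table above.
classes = [(12, 13), (14, 15), (16, 17), (18, 19), (20, 21),
           (22, 23), (24, 25), (26, 27), (28, 29)]
freqs = [5, 9, 14, 20, 17, 10, 12, 9, 4]

n = sum(freqs)  # total number of observations

rows = []
running = 0  # sum of the frequencies of all earlier classes
for (lower, upper), f in zip(classes, freqs):
    rows.append({
        "interval": f"{lower}-{upper}",
        "f": f,
        "mark": (lower + upper) / 2,                # class mark (midpoint)
        "boundaries": (lower - 0.5, upper + 0.5),   # true limits
        "cf_less": running + f,                     # "less than" cumulative frequency
        "cf_greater": n - running,                  # "greater than" cumulative frequency
        "rf_percent": 100 * f / n,                  # relative frequency in percent
    })
    running += f

for row in rows:
    print(row)
```

Each "greater than" entry is the total minus the frequencies of the classes above it, which is the same subtraction from the top described in the text.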
The most commonly used measures of central tendency are the mean, median and mode.
Ungrouped data or raw data are data which are not yet organized or arranged
into a frequency distribution. If the number of observations is less than or equal to 30,
the data are treated as ungrouped.
Mean
The arithmetic mean or arithmetic average is defined as the sum of all items or terms
divided by the total number of items or terms. The definition is the same for both the sample
and the population, although we use a different symbol for each.
The symbol for the sample mean is x bar ($\bar{x}$), and for the population mean it is the Greek
letter mu (µ).
Suppose you have six scores: 12, 10, 18, 16, 20 and 14. If x1=12, x2=10, x3=18,
x4=16, x5=20, x6=14, the mean, represented as x bar, is:
$$\bar{x} = \frac{x_1 + x_2 + x_3 + x_4 + x_5 + x_6}{N}$$
$$\bar{x} = \frac{12 + 10 + 18 + 16 + 20 + 14}{6} = 15$$
Instead of writing the equation for the mean as shown above you can shorten it to:
$$\mu = \frac{\sum x}{N} \qquad \bar{x} = \frac{\sum x}{n}$$
where:
$\mu$ = the mean (population)
$\bar{x}$ = the mean (sample)
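As a quick check of the arithmetic, the same mean can be computed in a few lines of Python. This is only an illustrative sketch; the name `scores` is chosen here and does not come from the module.

```python
# The six scores from the example above.
scores = [12, 10, 18, 16, 20, 14]

# Mean = sum of all items divided by the number of items.
mean = sum(scores) / len(scores)
print(mean)
```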
Median
The median of ungrouped data is the value of the middle item after arranging the data
in an ascending or descending order.
Example 1: Compute the median of the following set of scores: 6, 14, 10, 8, 2, 12 and 4.
Arranged in ascending order: 2, 4, 6, 8, 10, 12, 14
Answer: the middle item is 8.
Example 2: Find the median of the following set of items: 6, 14, 10, 8, 12 and 4.
Arranged in ascending order: 4, 6, 8, 10, 12, 14
$$\text{median} = \frac{8 + 10}{2} = 9$$
Answer: 9
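The two cases (odd and even number of items) can be sketched in Python as follows. The function name `median` is chosen for this illustration; it sorts the data and then picks the middle item, or averages the two middle items when the count is even.

```python
def median(values):
    """Return the median of ungrouped data."""
    ordered = sorted(values)          # arrange in ascending order
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]           # odd count: the single middle item
    # even count: average of the two middle items
    return (ordered[mid - 1] + ordered[mid]) / 2

print(median([6, 14, 10, 8, 2, 12, 4]))   # seven scores (Example 1)
print(median([6, 14, 10, 8, 12, 4]))      # six scores (Example 2)
```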
Mode
The mode for ungrouped data is defined as the value that appears with the highest
frequency. That is, the item that appears most often.
Example:
Grouped data are data organized and summarized in the form of a frequency
distribution. If the number of observations is greater than 30, the data are treated as
grouped. These are data classified into categories for better presentation and analysis.
Arithmetic Mean
1. Long Method
$$\bar{x} = \frac{\sum X_i F_i}{n}$$
where:
$X_i$ = class mark
$F_i$ = frequency
n = total number of frequency
2. Short Method (coded formula)
$$\bar{x} = A_m + \left( \frac{\sum_{i=1}^{n} d_i f_i}{n} \right) i$$
where:
$A_m$ = assumed mean or class mark of the class
interval with the highest frequency
$d_i$ = coded deviation
$f_i$ = frequency
i = class interval
n = total number of frequency
Example
The mean score of the frequency distribution of 60 students in entrance examination is shown
below.
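Solution
1. Using the Long Method
$$\bar{x} = \frac{\sum X_i F_i}{n} = \frac{2409}{60} = 40.15$$
2. Using the Short Method (coded formula)
$$\bar{x} = A_m + \left( \frac{\sum_{i=1}^{n} d_i f_i}{n} \right) i = 40 + \left( \frac{1}{60} \right)(9) = 40.15$$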
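Both methods can be checked with a short Python sketch. The class intervals and frequencies used here are the 60-student distribution that appears in the module's median and mode examples (18-26, 27-35, 36-44, 45-53, 54-62); treating it as the same table as this example is an assumption, supported by the fact that it reproduces the sum 2409. All variable names are illustrative.

```python
# Assumed frequency distribution (the 60-student table used later in the module).
classes = [(18, 26), (27, 35), (36, 44), (45, 53), (54, 62)]
freqs = [8, 13, 21, 6, 12]

n = sum(freqs)
marks = [(lo + hi) / 2 for lo, hi in classes]    # class marks X_i

# Long method: sum of (class mark x frequency) divided by n.
sum_xf = sum(x * f for x, f in zip(marks, freqs))
long_mean = sum_xf / n

# Short (coded) method.
width = classes[1][0] - classes[0][0]            # class size i
assumed = marks[freqs.index(max(freqs))]         # class mark of the highest-frequency class
coded = [(x - assumed) / width for x in marks]   # coded deviations d_i
sum_df = sum(d * f for d, f in zip(coded, freqs))
short_mean = assumed + (sum_df / n) * width

print(sum_xf, long_mean)
print(sum_df, short_mean)
```

The two results agree because the coded deviations are only a rescaling of the distances of the class marks from the assumed mean.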
Median
The formula for finding the median of grouped data is given as follows:
$$Mdn = LCB_{Mdn} + c \left( \frac{\frac{n}{2} - {<}cf}{f_i} \right)$$
where:
Mdn = median
$LCB_{Mdn}$ = lower class boundary of the median class
<cf = less than cumulative frequency of the class preceding the median class
$f_i$ = frequency of the median class
c = class size
n = total number of frequency
Class        Frequency    <Cumulative
Interval     ( f i )      Frequency
18-26           8             8
27-35          13            21
36-44          21            42      median class
45-53           6            48
54-62          12            60
N = 60
n/2 = 60/2 = 30
$$Mdn = LCB_{Mdn} + c \left( \frac{\frac{n}{2} - {<}cf}{f_i} \right)$$
$$Mdn = 35.5 + 9 \left( \frac{\frac{60}{2} - 21}{21} \right) = 39.36$$
Answer: 39.36
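The lookup of the median class and the interpolation can be sketched in Python as below. The function name `grouped_median` is hypothetical, and the sketch assumes equal class sizes and whole-number class limits (so that the lower class boundary is the lower limit minus 0.5), as in the table above.

```python
def grouped_median(classes, freqs):
    """Median of grouped data using Mdn = LCB + c * ((n/2 - <cf) / f)."""
    n = sum(freqs)
    width = classes[1][0] - classes[0][0]     # class size c
    cum = 0                                   # <cf of the classes before the current one
    for (lower, _upper), f in zip(classes, freqs):
        if cum + f >= n / 2:                  # first class whose <cf reaches n/2
            lcb = lower - 0.5                 # lower class boundary
            return lcb + width * (n / 2 - cum) / f
        cum += f

classes = [(18, 26), (27, 35), (36, 44), (45, 53), (54, 62)]
freqs = [8, 13, 21, 6, 12]
print(round(grouped_median(classes, freqs), 2))
```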
Mode
The formula for finding the mode of grouped data is given as follows:
$$Mo = LCB_{Mo} + c \left( \frac{f_{Mo} - f_1}{2 f_{Mo} - f_1 - f_2} \right)$$
where:
Mo = mode
$LCB_{Mo}$ = lower class boundary of the modal class
$f_{Mo}$ = frequency of the modal class
$f_1$ = frequency of the class before the modal class
$f_2$ = frequency of the class after the modal class
c = class size
n = total number of frequency
Class        Frequency
Interval     ( fi )
18-26           8
27-35          13
36-44          21
45-53           6
54-62          12
N = 60
$$Mo = LCB_{Mo} + c \left( \frac{f_{Mo} - f_1}{2 f_{Mo} - f_1 - f_2} \right)$$
$$Mo = 35.5 + 9 \left( \frac{21 - 13}{2(21) - 13 - 6} \right)$$
Ans. 38.63
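A small Python sketch of the same computation follows. The function name `grouped_mode` is hypothetical; the sketch assumes a single modal class, equal class sizes, and whole-number class limits, and it treats a missing neighbouring class as having frequency 0 (an assumption made here, not stated in the module).

```python
def grouped_mode(classes, freqs):
    """Mode of grouped data using Mo = LCB + c * ((fMo - f1) / (2*fMo - f1 - f2))."""
    width = classes[1][0] - classes[0][0]              # class size c
    k = freqs.index(max(freqs))                        # position of the modal class
    f_mo = freqs[k]
    f1 = freqs[k - 1] if k > 0 else 0                  # class before the modal class
    f2 = freqs[k + 1] if k < len(freqs) - 1 else 0     # class after the modal class
    lcb = classes[k][0] - 0.5                          # lower class boundary
    return lcb + width * (f_mo - f1) / (2 * f_mo - f1 - f2)

classes = [(18, 26), (27, 35), (36, 44), (45, 53), (54, 62)]
freqs = [8, 13, 21, 6, 12]
print(round(grouped_mode(classes, freqs), 2))
```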
MEASURES OF POSITION
Quantiles
The quantiles are a natural extension of the median concept in that they are the values
which divide the distribution into a given number of equal parts. While the median divides
the distribution into two equal parts, the quartiles divide it into four equal parts, the
deciles into ten equal parts, and the percentiles into one hundred equal parts.
Ungrouped Data
Quartile: position of $Q_i = \frac{i(n+1)}{4}$
Decile: position of $D_i = \frac{i(n+1)}{10}$
Percentile: position of $P_i = \frac{i(n+1)}{100}$
Solution:
$$Q_3 = \frac{i(n+1)}{4} = \frac{3(12+1)}{4} = 9.75\text{th position} = 9\text{th position} + 0.75\,(10\text{th} - 9\text{th})\ \text{position}$$
After arranging the data in ascending order, count to find the value that falls at the
9.75th position. Since this position lies between two items, we interpolate from the given
data: the value at the 9.75th position is the value at the 9th position plus 0.75 times the
difference between the values at the 10th and 9th positions. The value of the third quartile
is equal to 18.5.
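The position-and-interpolate procedure can be sketched in Python as follows. The function name `quantile` is hypothetical. Because the data set for the example above is not reproduced in this section, the sketch uses the twelve student weights from the module's dispersion example instead; clamping positions that fall outside the data to the first or last item is an assumption made here.

```python
def quantile(values, i, parts):
    """i-th quantile of ungrouped data using the position i(n + 1)/parts."""
    ordered = sorted(values)
    n = len(ordered)
    pos = i * (n + 1) / parts          # 1-based position, may be fractional
    whole = int(pos)                   # position of the item just below
    frac = pos - whole                 # fractional part used for interpolation
    if whole < 1:
        return ordered[0]
    if whole >= n:
        return ordered[-1]
    below, above = ordered[whole - 1], ordered[whole]
    return below + frac * (above - below)

# Weights (kg) of twelve students, taken from the module's dispersion example.
weights = [50, 59, 55, 48, 60, 54, 48, 61, 57, 45, 52, 63]
print(quantile(weights, 3, 4))   # third quartile, position 9.75
print(quantile(weights, 1, 4))   # first quartile, position 3.25
```

Passing `parts=10` or `parts=100` gives deciles and percentiles with the same position rule.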
Grouped Data
$$Q_i = LCB_{Q_i} + c \left( \frac{\frac{in}{4} - {<}cf_{Q_i - 1}}{f_{Q_i}} \right)$$
where:
$LCB_{Q_i}$ = the lower class boundary of the $Q_i$th class
c = class size
n = total number of observations in the distribution
${<}cf_{Q_i - 1}$ = less than cumulative frequency preceding the $Q_i$th class
$f_{Q_i}$ = frequency of the $Q_i$th class
$$D_i = LCB_{D_i} + c \left( \frac{\frac{in}{10} - {<}cf_{D_i - 1}}{f_{D_i}} \right)$$
where:
$LCB_{D_i}$ = the lower class boundary of the $D_i$th class
c = class size
n = total number of observations in the distribution
${<}cf_{D_i - 1}$ = less than cumulative frequency preceding the $D_i$th class
$f_{D_i}$ = frequency of the $D_i$th class
$$P_i = LCB_{P_i} + c \left( \frac{\frac{in}{100} - {<}cf_{P_i - 1}}{f_{P_i}} \right)$$
where:
$LCB_{P_i}$ = the lower class boundary of the $P_i$th class
c = class size
n = total number of observations in the distribution
${<}cf_{P_i - 1}$ = less than cumulative frequency preceding the $P_i$th class
$f_{P_i}$ = frequency of the $P_i$th class
Example
The following is a frequency distribution of an achievement test. Compute the third quartile
(Q3 ).
Solution
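$$\frac{in}{4} = \frac{(3)(60)}{4} = 45$$
$$Q_i = LCB_{Q_i} + c \left( \frac{\frac{in}{4} - {<}cf_{Q_i - 1}}{f_{Q_i}} \right)$$
$$Q_3 = 44.5 + 9 \left( \frac{45 - 42}{6} \right) = 49$$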
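The quartile, decile, and percentile formulas differ only in the divisor (4, 10, or 100), so one function can cover all three. The sketch below is illustrative: `grouped_quantile` is a hypothetical name, the frequency table is assumed to be the 60-student distribution used elsewhere in the module (it reproduces the values 44.5, 42, and 6 used in the solution), and equal class sizes with whole-number limits are assumed.

```python
def grouped_quantile(classes, freqs, i, parts):
    """i-th quantile of grouped data: LCB + c * ((i*n/parts - <cf) / f)."""
    n = sum(freqs)
    width = classes[1][0] - classes[0][0]     # class size c
    target = i * n / parts                    # e.g. 3n/4 for Q3
    cum = 0                                   # <cf of the classes before the current one
    for (lower, _upper), f in zip(classes, freqs):
        if cum + f >= target:                 # class that contains the target position
            return (lower - 0.5) + width * (target - cum) / f
        cum += f

classes = [(18, 26), (27, 35), (36, 44), (45, 53), (54, 62)]
freqs = [8, 13, 21, 6, 12]
print(grouped_quantile(classes, freqs, 3, 4))             # third quartile
print(round(grouped_quantile(classes, freqs, 1, 4), 2))   # first quartile
```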
While measures of central tendency are used to estimate "normal" values of a dataset,
measures of dispersion are important for describing the spread of the data, or its variation
around a central value. Two distinct samples may have the same mean or median, but
completely different levels of variability, or vice versa. A proper description of a set of data
should include both of these characteristics. There are various methods that can be used to
measure the dispersion of a dataset, each with its own set of advantages and disadvantages.
Range
It is defined as the difference between the largest and smallest sample values. Also, it
is one of the simplest measures of variability to calculate. It depends only on extreme values
and provides no information about how the remaining data are distributed.
To arrive at a more precise and reliable measure of variation, all item values in the
distribution must be taken into account by determining the amount by which each item value
varies from the mean of the distribution. One way of doing so is to use the mean absolute
deviation.
$$MAD = \frac{\sum |x_i - \bar{x}|}{n}$$
where:
$x_i$ = value of each observation
| | = symbol for absolute value
n = total number of items
$\bar{x}$ = mean
The quartile deviation is the amount of dispersion present in the middle 50% of the values
in a distribution. It is the difference between the third quartile and the first quartile,
divided by two.
$$QD = \frac{Q_3 - Q_1}{2}$$
Variance
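It is the average of the squared deviations of the values from the distribution's mean. If
all values are identical the variance is zero; the greater the dispersion of values, the
greater the variance. The symbol for the sample variance is $s^2$, and for the population
variance it is the Greek letter sigma squared, $\sigma^2$.
$$\sigma^2 = \frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N} \qquad s^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}$$
where:
$X_i$ = value of each observation
$\mu$ = population mean, $\bar{X}$ = sample mean
N = population size, n = sample size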
Standard Deviation
It is the positive square root of the variance which measures the spread or dispersion of
each value from the mean of the distribution. It is the most used measure of spread since it
improves interpretability by removing the variance square and expressing deviations in their
original unit, and is significantly related to normal distributions. It is the most important
measure of dispersion since it enables us to determine with a great deal of accuracy where the
values of the distribution are located in relation to the mean.
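$$\sigma = \sqrt{\frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}} \qquad s = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}}$$
where:
$X_i$ = value of each observation
$\mu$ = population mean, $\bar{X}$ = sample mean
N = population size, n = sample size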
The weights in kilos of twelve students are: 50, 59, 55, 48, 60, 54, 48, 61, 57, 45, 52
and 63. Solve the following:
a. Range
b. Quartile deviation
c. Mean Absolute Deviation
d. Variance
e. Standard Deviation
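Solution:
a. Range = 63 – 45
Ans.: 18
b. Quartile deviation
$$QD = \frac{Q_3 - Q_1}{2}$$
Arranged in ascending order: 45, 48, 48, 50, 52, 54, 55, 57, 59, 60, 61 and 63
$$Q_3\ \text{position} = \frac{i(n+1)}{4} = \frac{3(12+1)}{4} = 9.75 \qquad Q_1\ \text{position} = \frac{i(n+1)}{4} = \frac{1(12+1)}{4} = 3.25$$
$$Q_3 = 59.75 \qquad Q_1 = 48.50$$
$$QD = \frac{59.75 - 48.50}{2} = 5.63$$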
c. Mean Absolute Deviation
$$\bar{x} = \frac{45 + 48 + 48 + 50 + 52 + 54 + 55 + 57 + 59 + 60 + 61 + 63}{12} = 54.33$$
$\bar{x}$ = 54.33 or 54
Xi     | Xi - x̄ |       ( Xi - x̄ )       ( Xi - x̄ )^2
45     |45-54|=9        (45-54)= -9        81
48     |48-54|=6        (48-54)= -6        36
48     |48-54|=6        (48-54)= -6        36
50     |50-54|=4        (50-54)= -4        16
52     |52-54|=2        (52-54)= -2         4
54     |54-54|=0        (54-54)= 0          0
55     |55-54|=1        (55-54)= 1          1
57     |57-54|=3        (57-54)= 3          9
59     |59-54|=5        (59-54)= 5         25
60     |60-54|=6        (60-54)= 6         36
61     |61-54|=7        (61-54)= 7         49
63     |63-54|=9        (63-54)= 9         81
$\sum |x_i - \bar{x}| = 58 \qquad \sum (x_i - \bar{x})^2 = 374$
$$MAD = \frac{\sum |x_i - \bar{x}|}{n} = \frac{58}{12} = 4.83$$
d. Variance
$$s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1} = \frac{374}{12 - 1} = 34$$
Ans.: $s^2 = 34$
e. Standard Deviation
$$s = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n - 1}} = \sqrt{\frac{374}{12 - 1}} = \sqrt{34}$$
Ans.: s = 5.83
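The range, mean absolute deviation, variance, and standard deviation above can be reproduced with a short Python sketch. To match the worked solution, the sketch rounds the mean to the whole number 54 before computing deviations; that rounding is the module's choice, and using the unrounded mean would give a slightly smaller variance. Variable names are illustrative.

```python
import math

# Weights (kg) of the twelve students.
weights = [50, 59, 55, 48, 60, 54, 48, 61, 57, 45, 52, 63]
n = len(weights)

value_range = max(weights) - min(weights)       # a. range

exact_mean = sum(weights) / n                   # 54.33...
mean = round(exact_mean)                        # rounded to 54, as in the solution

abs_dev_sum = sum(abs(x - mean) for x in weights)
sq_dev_sum = sum((x - mean) ** 2 for x in weights)

mad = abs_dev_sum / n                           # c. mean absolute deviation
variance = sq_dev_sum / (n - 1)                 # d. sample variance
sd = math.sqrt(variance)                        # e. sample standard deviation

print(value_range, round(mad, 2), variance, round(sd, 2))
```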
Quartile Deviation
$$QD = \frac{Q_3 - Q_1}{2}$$
Variance ($\sigma^2$)
$$\sigma^2 = \frac{\sum f\,(cm - \bar{x})^2}{n}$$
where:
cm = class mark of each class
$\bar{x}$ = mean
n = total number of observations
f = frequency
Standard Deviation ($\sigma$)
$$\sigma = \sqrt{\frac{\sum f\,(cm - \bar{x})^2}{n}}$$
where:
cm = class mark of each class
$\bar{x}$ = mean
n = total number of observations
f = frequency
N = 60, $\sum d_i f_i = 1$, $\sum f\,|cm - \bar{x}| = 531$
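$$\bar{x} = A_m + \left( \frac{\sum_{i=1}^{n} d_i f_i}{N} \right) i = 40 + \left( \frac{1}{60} \right)(9) = 40.15 \text{ or } 40$$
a. Mean Absolute Deviation
$$MAD = \frac{\sum f\,|cm - \bar{x}|}{n} = \frac{531}{60} = 8.85$$
Answer = 8.85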
Class Interval    Frequency ( f i )    <Cumulative Frequency
18-26                  8                     8
27-35                 13                    21
36-44                 21                    42
45-53                  6                    48
54-62                 12                    60
N = 60
b. Standard Deviation
$$\sigma = \sqrt{\frac{\sum f\,(cm - \bar{x})^2}{n}} = \sqrt{\frac{8019}{60}} = \sqrt{133.65} = 11.56$$
Answer = 11.56
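The sums 531 and 8019 used above can be checked with the following Python sketch. As in the worked solution, the mean 40.15 is rounded to 40 before the deviations are taken; that rounding follows the module, and the variable names are illustrative.

```python
import math

classes = [(18, 26), (27, 35), (36, 44), (45, 53), (54, 62)]
freqs = [8, 13, 21, 6, 12]

n = sum(freqs)
marks = [(lo + hi) / 2 for lo, hi in classes]           # class marks cm

exact_mean = sum(m * f for m, f in zip(marks, freqs)) / n
mean = round(exact_mean)                                # 40.15 rounded to 40, as in the solution

abs_sum = sum(f * abs(m - mean) for m, f in zip(marks, freqs))
sq_sum = sum(f * (m - mean) ** 2 for m, f in zip(marks, freqs))

mad = abs_sum / n               # mean absolute deviation for grouped data
variance = sq_sum / n           # variance for grouped data
sd = math.sqrt(variance)        # standard deviation for grouped data

print(abs_sum, round(mad, 2))
print(sq_sum, round(variance, 2), round(sd, 2))
```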
c. Quartile deviation
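$$Q_i = LCB_{Q_i} + c \left( \frac{\frac{in}{4} - {<}cf_{Q_i - 1}}{f_{Q_i}} \right)$$
$$\frac{3(60)}{4} = 45 \ \text{(position of } Q_3\text{)} \qquad \frac{(60)}{4} = 15 \ \text{(position of } Q_1\text{)}$$
$$Q_3 = 44.5 + 9 \left( \frac{45 - 42}{6} \right) \qquad Q_1 = 26.5 + 9 \left( \frac{15 - 8}{13} \right)$$
$$Q_3 = 49 \qquad Q_1 = 31.35$$
$$QD = \frac{Q_3 - Q_1}{2} = \frac{49 - 31.35}{2} = 8.83$$
Answer = 8.83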
NORMAL DISTRIBUTION
An assessment of the normality of data is a prerequisite for many statistical tests as normal data
is an underlying assumption in parametric testing. The normal distribution is used in analysis of data in
determining parametric and non-parametric test. Its graph is called a normal curve. The mathematical
equation of the normal curve was first described in 1733 by De Moivre.
If the grades of the students are plotted on a graph with the frequency of students in the ordinate
or y axis and their grades in the abscissa or x axis, we probably approximate a bell shaped curve like
the figure 8 below.
$$z = \frac{x - \bar{x}}{s}$$
where:
z = standard score
$\bar{x}$ = mean
s = standard deviation
x = a given value of a particular variable
Consider the following procedures in determining areas under the standard normal curve:
1. If the area is to the right of a positive z-score, subtract the value in the table of
normal curve areas from 0.5000.
2. If the area is to the left of a positive z-score, add the value in the table of
normal curve areas to 0.5000.
3. If the area is to the right of a negative z-score, add the value in the table of
normal curve areas to 0.5000.
4. If the area is to the left of a negative z-score, subtract the value in the table of
normal curve areas from 0.5000.
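The four rules can be sketched in Python without a printed table. In this illustration the helper `table_area` stands in for the table of normal curve areas: it returns the area between 0 and |z|, computed with the standard library's `math.erf`. The function names are hypothetical and the z values 1.18 and -0.63 are used only as sample inputs.

```python
import math

def table_area(z):
    """Area under the standard normal curve between 0 and |z| (the table value)."""
    return 0.5 * math.erf(abs(z) / math.sqrt(2))

def area_right(z):
    # Rule 1 (positive z): 0.5 - table value; rule 3 (negative z): 0.5 + table value.
    return 0.5 - table_area(z) if z >= 0 else 0.5 + table_area(z)

def area_left(z):
    # Rule 2 (positive z): 0.5 + table value; rule 4 (negative z): 0.5 - table value.
    return 0.5 + table_area(z) if z >= 0 else 0.5 - table_area(z)

print(round(table_area(1.18), 4), round(area_left(1.18), 4))
print(round(table_area(-0.63), 4), round(area_left(-0.63), 4))
```

For any z, the left and right areas add up to 1, which is a convenient check on whichever rule was applied.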
Example 1. Find the area under the normal curve to the left of z = 1.18.
Solution:
Step 3: Use the standard normal table (see Appendix A) to find this area. The complete table is not
reproduced here; an abridged version is used to locate the area under the normal curve.
0.3810 + 0.5000 = 0.8810
Therefore, the area under the standard normal curve to the left of z = 1.18 is 0.8810.
Example 2. What is the area under the normal curve to the left of z = -0.63?
Solution:
0.5000 - 0.2357 = 0.2643
Example 3. Find the area of the normal curve from z = -1.57 to z = 3.99.
Solution:
0.4418 + 0.5000 = 0.9418
Example 4. Find the z-score for which the area to the right of +z is 0.2000.
Solution:
The shaded portion shows that the area to the right of z is 0.2000. To obtain the area we are
looking for, we subtract 0.2000 from the total area of the right half of the normal curve. Hence,
0.5000 - 0.2000 = 0.3000. Referring to the figure below.
0.5000 - 0.2000 = 0.3000
Referring to the table of the area of the normal curve, we find that the entry nearest to
0.3000 is 0.2995, and this corresponds to a z-score of 0.84.
Example 5.
Find the z score corresponding to the given area to the right of +z is 0.3520.
Solution:
The shaded portion shows that the area to the right of z is 0.3520. To obtain the area we are
looking for, we subtract 0.3520 from the total area of the right half of the normal curve. Hence,
0.5000 - 0.3520 = 0.1480. Referring to the figure below.
0.5000 - 0.3520 = 0.1480
Referring to the table of the area of the normal curve, we find the entry 0.1480 and this corresponds to
a z-score of 0.38.
Answer: 0.38
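The reverse lookup (from an area to a z-score) can be sketched with `statistics.NormalDist` from the Python standard library, whose `inv_cdf` method returns the z value with a given area to its left. The helper name `z_for_right_tail` is hypothetical. Because the printed table is read to two decimal places, the result is rounded the same way.

```python
from statistics import NormalDist

def z_for_right_tail(area):
    """z-score whose area to the right under the standard normal curve is `area`."""
    # Area to the left = 1 - area to the right.
    return NormalDist().inv_cdf(1 - area)

print(round(z_for_right_tail(0.2000), 2))   # Example 4
print(round(z_for_right_tail(0.3520), 2))   # Example 5
```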
Example:
The mean weight of college students is 70 kg and the standard deviation is 3 kg. Assuming that the
weight is normally distributed, what is the probability that a student weighs:
a. Between 60 and 75 kg
1. Convert the given values to z-scores.
$$z = \frac{60 - 70}{3} = -3.33$$
$$z = \frac{75 - 70}{3} = 1.67$$
2. Sketch the curve and identify the area you need to find (shaded part of the figure below).
3. The z value of -3.33 gives an area of 0.4996, while the z value of 1.67 corresponds to an area of
0.4525. The shaded area is the sum of these two areas. Therefore, the required area
is 0.4996 + 0.4525 = 0.9521 or 95.21%.
b. More than 72 kg
1. Convert the given value to a z-score.
$$z = \frac{72 - 70}{3} = 0.67$$
2. Sketch the curve and identify the area you need to find (shaded part of the figure below).
3. The z value of 0.67 gives an area of 0.2486. From the shaded area, we subtract 0.2486 from
the total area of the right half of the normal curve. Hence, 0.5000 - 0.2486 = 0.2514 or
25.14%.
c. Less than 64 kg
1. Convert the given value to a z-score.
$$z = \frac{64 - 70}{3} = -2$$
2. Sketch the curve and identify the area you need to find (shaded part of the figure below).
3. The z value of -2.00 gives an area of 0.4772. From the shaded area, we subtract 0.4772 from
the total area of the left half of the normal curve. Hence, 0.5000 - 0.4772 = 0.0228 or 2.28%.
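The three probabilities in this example can be cross-checked with `statistics.NormalDist` from the Python standard library. This is an illustrative sketch: z-scores are rounded to two decimal places to mirror the table lookup, the second part of the example is read here as "more than 72 kg" (an assumption based on the subtraction from the right half of the curve), and all names are chosen for this illustration.

```python
from statistics import NormalDist

MEAN, SD = 70, 3                 # mean and standard deviation of the weights (kg)
std_normal = NormalDist()        # standard normal distribution (mean 0, sd 1)

def z_score(x):
    """Standard score of x, rounded to two decimals as when reading the table."""
    return round((x - MEAN) / SD, 2)

# a. between 60 and 75 kg: area between the two z-scores
p_between = std_normal.cdf(z_score(75)) - std_normal.cdf(z_score(60))

# b. more than 72 kg: area to the right of the z-score
p_above_72 = 1 - std_normal.cdf(z_score(72))

# c. less than 64 kg: area to the left of the z-score
p_below_64 = std_normal.cdf(z_score(64))

print(round(p_between, 4), round(p_above_72, 4), round(p_below_64, 4))
```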
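$$r = \frac{N \sum xy - \sum x \sum y}{\sqrt{\left[ N \sum x^2 - \left( \sum x \right)^2 \right] \left[ N \sum y^2 - \left( \sum y \right)^2 \right]}}$$
where:
$\sum x$ = sum of the values of x
$\sum y$ = sum of the values of y
$\sum x^2$ = sum of the squares of the values of x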
The Pearson correlation coefficient, r, can take a range of values from +1 to -1. A value
of 0 indicates that there is no association between the two variables. This is shown in figure 7.
LINEAR REGRESSION
Regression is a term used to describe the process of estimating the relationship between
two variables. The relationship is estimated by fitting a straight line through the given data.
The method of least squares permits us to find a line of best fit called regression line which
keeps the errors of prediction to a minimum.
$$Y = a + bx$$
where:
Y = predicted value
a = y-intercept
b = slope of the regression line
x = the given value of x used for the prediction
where:
$\sum x$ = sum of the values of x
$\sum y$ = sum of the values of y
$\sum x^2$ = sum of the squares of the values of x
Below are the scores of 12 college students in Mathematics and Physics tests of 80 items
each.
Mathematics (x) 65 63 67 64 68 62 70 66 68 67 69 71
Physics (y) 68 66 68 65 69 66 68 65 71 67 68 70
Step 1: Draw a scatter plot. If the scatter plot does not show any (linear) trend stop analysis,
conclude “no relationship”. Otherwise proceed to step number 2
[Figure: scatter plot of Physics scores (vertical axis, 64 to 72) against Mathematics scores (horizontal axis, 60 to 72)]
The scatter plot indicates an upward linear trend between Mathematics and Physics
proficiency. Thus, “there is a reason to believe that they are related.”
Step 2: Compute the correlation coefficient r.
Number   Mathematics (x)   Physics (y)   $x^2$   $y^2$   $xy$
1        65                68            4225    4624    4420
2        63                66            3969    4356    4158
3        67                68            4489    4624    4556
4        64                65            4096    4225    4160
5        68                69            4624    4761    4692
6        62                66            3844    4356    4092
7        70                68            4900    4624    4760
8        66                65            4356    4225    4290
9        68                71            4624    5041    4828
10       67                67            4489    4489    4489
11       69                68            4761    4624    4692
12       71                70            5041    4900    4970
N = 12   $\sum x = 800$   $\sum y = 811$   $\sum x^2 = 53418$   $\sum y^2 = 54849$   $\sum xy = 54107$
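The column totals can be checked with a few lines of Python. This is only a sketch: the scores are the twelve pairs given in the table, while the variable names are assumptions made for illustration.

```python
# Scores of the 12 students, copied from the table.
math_scores = [65, 63, 67, 64, 68, 62, 70, 66, 68, 67, 69, 71]     # x
physics_scores = [68, 66, 68, 65, 69, 66, 68, 65, 71, 67, 68, 70]  # y

n = len(math_scores)
sum_x = sum(math_scores)
sum_y = sum(physics_scores)
sum_x2 = sum(x * x for x in math_scores)
sum_y2 = sum(y * y for y in physics_scores)
sum_xy = sum(x * y for x, y in zip(math_scores, physics_scores))

print("N =", n)
print("sum x =", sum_x, " sum y =", sum_y)
print("sum x^2 =", sum_x2, " sum y^2 =", sum_y2, " sum xy =", sum_xy)
```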
$$r = \frac{12(54107) - (800)(811)}{\sqrt{\left[12(53418) - (800)^2\right]\left[12(54849) - (811)^2\right]}}$$
$$r = 0.70$$
Based on the arbitrary scale for interpreting r, the value r = 0.70 indicates a
strong (high) positive relationship between the students' scores in Mathematics and
Physics.
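The value of r can be reproduced from the column totals alone. The sketch below plugs the totals from the table into the raw-score formula for r; the variable names are assumptions made for illustration.

```python
from math import sqrt

# Column totals taken from the table.
n = 12
sum_x, sum_y = 800, 811
sum_x2, sum_y2 = 53418, 54849
sum_xy = 54107

numerator = n * sum_xy - sum_x * sum_y
denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
r = numerator / denominator

print(round(r, 2))
```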
Step 3: Formulate the regression line equation by first solving for the values of b
and a.
Solving for b
$$b = \frac{N\sum xy - \sum x \sum y}{N\sum x^2 - \left(\sum x\right)^2} = \frac{12(54107) - (800)(811)}{12(53418) - (800)^2} = 0.48$$
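Solving for a (using the rounded value b = 0.48)
$$a = \frac{\sum y - b\sum x}{N} = \frac{811 - 0.48(800)}{12} = 35.58$$
Substituting a and b into Y = a + bx gives the regression line equation:
y = 35.58 + 0.48x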
We can now estimate scores in Physics (y) using the regression line equation by
substituting a score in Mathematics (x). For instance, if x is equal to 75, then
solving for y gives 71.58.
y = 35.58 + 0.48(75)
y = 71.58
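The regression steps can be sketched in Python as follows. The column totals come from the table, and the variable names are assumptions made for illustration. The sketch also computes the prediction with unrounded coefficients, which differs slightly from the hand computation because the worked example rounds b to 0.48 before solving for a.

```python
# Column totals taken from the table.
n = 12
sum_x, sum_y = 800, 811
sum_x2 = 53418
sum_xy = 54107

# Slope and intercept at full precision.
b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
a = (sum_y - b * sum_x) / n

# Hand-computation route: round b first, then solve for a.
b_rounded = round(b, 2)
a_rounded = round((sum_y - b_rounded * sum_x) / n, 2)

x_new = 75  # Mathematics score used in the worked example
prediction_rounded = a_rounded + b_rounded * x_new
prediction_exact = a + b * x_new

print(b_rounded, a_rounded)
print(round(prediction_rounded, 2), round(prediction_exact, 2))
```

The two predictions agree to within a few hundredths of a point, so the rounding in the worked example does not change the interpretation.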
II. Classify the data described in the following scenarios as qualitative or quantitative.
_____________ a. The body mass index of elementary grade students in a certain school.
_____________ b. The fasting blood sugar readings are determined for several
individuals in a study involving diabetics.
_____________ d. The students in a certain school are classified into one of six
categories in classroom performance as follows: Excellent, Very
Good, Good, Satisfactory, Passed and Failed.
III. PERFORMANCE TASK. The table below shows a frequency distribution of the test scores
of 500 students in a Mathematics entrance examination. Construct a histogram and a line
graph for these data using MS Excel.
1. The following scores were received by 20 accounting students in a short quiz: 10, 9, 15,
20, 13, 15, 18, 11, 7, 12, 15, 13, 18, 19, 12, 8, 10, 13, 17, and 15. Find the third quartile,
eighth decile, fortieth percentile, mean, median, and mode.
2. The following are the scores of ten (10) management students in four quizzes. Solve for
the (a) range, (b) quartile deviation, (c) mean absolute deviation, (d) variance, and (e)
standard deviation.
Classes Frequency
33-40 4
41-48 8
49-56 10
57-64 11
65-72 9
73-80 6
81-88 2
89-96 5
1. Test scores of nine (9) students are shown below. What can you say about the strength of
the correlation between these sets of scores in Trigonometry and Geometry?
Trigonometry 43 41 50 47 35 33 50 33 54
Geometry 48 45 47 43 33 28 48 31 57
2. The number of hours spent per week viewing television (y) and the number of years of
education (x) were recorded for ten randomly selected individuals. The results are given
below:
x 12 14 11 16 16 18 12 20 10 12
y 10 9 15 8 5 4 20 4 16 15
a. Draw the scatter diagram.
b. Find the correlation coefficient of x and y and interpret your answer.
c. Find the regression line equation.
d. What are the predicted values of y when x is 15, 17, and 19?
VI. References
https://www.emathzone.com/tutorials/basic-statistics/kinds-or-branches-statistics.html#ixzz6S54TscJa
https://courses.lumenlearning.com/
https://www.abs.gov.au/
Paguio, D. et al. 2012. Statistics With Computer Based Discussions. Jimczyville Publication.
https://www.stat.uci.edu
“The future belongs to those who believe in the beauty of their dreams”
-Eleanor Roosevelt-