MODULE 3
DATA MANAGEMENT
Let Us Pray
Dear Lord and Father of all,
Thank you for today.
Thank you for the ways in which you provide for us all.
For your protection and love we thank you.
Help us to focus our hearts and minds now on what we are about to learn.
Inspire us by Your Holy Spirit as we listen and write.
Guide us by your eternal light as we discover more about the world around us.
We ask all this in the name of Jesus.
Amen.
3.1: Data Gathering
3.2: Measures of Central Tendency
3.3: Measures of Dispersion
Data Management and Data Gathering
WHAT IS DATA MANAGEMENT OR STATISTICS?
The science of collecting, organizing, presenting, analyzing
and interpreting numerical data.
TYPES OF STATISTICS
1. Descriptive Statistics
- is concerned with collecting, organizing, presenting,
and analyzing numerical data. The statistician tries
to describe or summarize a situation.
2. Inferential Statistics
- draws conclusions, such as decisions, predictions, or generalizations, about a population based on data from a sample.
- This implies that, before an inference is made, appropriate and correct descriptive measures or methods must be applied so that the results are sound.
Data Collection
• Population, as used in statistics, refers to a set of people, objects, measurements, or happenings that belong to a defined group.
Data Collection
A sample is a portion of the population. By studying a sample instead of the whole population, you save effort, time, and resources in conducting your study.
SAMPLING TECHNIQUES
Probability Sampling
1. Random Sampling
2. Systematic Sampling
3. Stratified Sampling
4. Cluster Sampling
Non-Probability Sampling
1. Convenience Sampling
2. Purposive Sampling
3. Quota Sampling
4. Snowball Sampling
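PROBABILITY SAMPLING
Every member of the population has a known, nonzero chance of being chosen as part of the sample; in simple random sampling, every member has an equal chance.
Random Sampling
Members of your sample are selected through a lottery.
Systematic Sampling
Members of your population are written in a list with corresponding numbers, and members are then selected at a fixed interval (for example, every 10th member on the list).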
Stratified Sampling
Members of your population are grouped. You can choose an equal number of respondents from each group, or a number in proportion to the number of elements in each group.
Cluster Sampling
Members of your population are grouped, and respondents are selected by group: you choose all the respondents in your selected groups.
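The four probability sampling techniques above can be sketched with Python's standard `random` module. This is a minimal illustration under stated assumptions, not a prescribed procedure: the function names, the numbered student list, and the grade-level groups are hypothetical, the systematic sketch assumes the researcher chooses the interval k, and the stratified sketch uses equal allocation only.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Lottery-style draw: every member has the same chance of selection."""
    rng = random.Random(seed)
    return rng.sample(population, n)

def systematic_sample(population, k, start=None, seed=None):
    """Take every k-th member of the numbered list, from a random starting position."""
    if start is None:
        start = random.Random(seed).randrange(k)
    return population[start::k]

def stratified_sample(strata, n_per_stratum, seed=None):
    """Draw the same number of members from each group (equal allocation).
    Proportional allocation would instead scale n by each group's size."""
    rng = random.Random(seed)
    return {name: rng.sample(members, n_per_stratum) for name, members in strata.items()}

def cluster_sample(clusters, n_clusters, seed=None):
    """Randomly choose whole groups and keep every member of the chosen groups."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {name: clusters[name] for name in chosen}

# Hypothetical usage: 100 numbered students
students = [f"S{i:03d}" for i in range(1, 101)]
print(simple_random_sample(students, 5, seed=1))
print(systematic_sample(students, 20, start=3))  # S004, S024, S044, S064, S084
```

Fixing `seed` only makes the draws repeatable for checking; in an actual study the seed would normally be left unset.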
NON-PROBABILITY SAMPLING
Not all members of the population have an equal chance of being chosen as part of the sample.
Convenience Sampling
Samples are selected because of their immediate availability.
Purposive Sampling
Samples are determined by the researcher based on the purpose of the study.
Quota Sampling
Samples are selected to achieve the needed number of participants in the study.
Snowball Sampling
Samples are selected based on the recommendation of other members in the sample.
Let's Try!
I. Identify which item in each pair is the Population (P) and which is the Sample (S).
1. NCR (P); Manila (S)
2. Tablespoon of sugar (S); Jar of sugar (P)
3. STEM students (S); Academic Track students (P)
4. Juice in a pitcher (P); Juice in a glass (S)
5. All manufactured cellular phones (P); Model units of cellular phones (S)
Let's Try!
II. Identify the sampling technique used in each item.
1. The online reseller writes the names of all her loyal customers on a sheet of paper. (Random)
2. The coordinator selects 3 students from each grade level. (Stratified)
3. A radio program staff member answers every 50th caller. (Systematic)
4. The Local Government Unit chooses respondents only from barangays that are placed under hard lockdown. (Cluster)
5. Asking 100 customers who are leaving the mall. (Convenience)
6. Accepting blood donations from persons with AB- blood type and asking them if they can also refer friends whom they know have the same blood type. (Snowball)
7. Selecting COVID-19 survivors as respondents in a survey because the study deals with the development of a new coronavirus vaccine. (Purposive)
8. Posting an online survey and accepting only 300 responses. (Quota)
METHODS OF DATA GATHERING
Direct Method
Includes observation and interview, where you get the information firsthand.
Example: The researcher sets a particular time and date to talk with the respondents.
Indirect Method
Includes ways in which you can obtain the needed data without your actual presence.
Example: The researcher checks the school records for the average grade of his respondents.
CLASSIFICATIONS OF DATA
Qualitative Data
Categories that show classifications or labels.
Examples: Gender, Marital Status, Grade Level, Senior High Track/Strand
Quantitative Data
Numbers or values that represent counts or measures.
Examples: Weight, number of siblings, hours spent studying
Quantitative data may be discrete data (counts, such as number of siblings) or continuous data (measurements, such as weight or hours spent studying).
LEVELS OF MEASUREMENT
Nominal - data that are categorical, with no inherent order.
Examples: Gender, Nationality, Civil Status
Ordinal - data that are in ordered or ranked categories.
Examples: rating (good, better, best), ranking (first, second, third)
Interval - data with equal intervals between values but no true zero.
Example: Temperature in degrees Celsius, because 0 degrees does not mean no temperature.
Ratio - data that have a true zero.
Example: Weight, because 0 kilograms means no weight at all.
Measures of Central Tendency
Mean
The mean is the sum of the item values divided by the number of items.
Example: The grades in Statistics of 10 students are 82, 85, 79, 78, 89, 87, 88, 89, 75, and 77. What is their average grade?
$\bar{x} = \frac{82 + 85 + 79 + 78 + 89 + 87 + 88 + 89 + 75 + 77}{10} = \frac{829}{10} = 82.9$
Median
The median is the value of the middle term when the data are arranged in either ascending or descending order.
Example: The grades in Statistics of 10 students are 82, 85, 79, 78, 89, 87, 88, 89, 75, and 77. What is the median?
Arranged: 75, 77, 78, 79, 82, 85, 87, 88, 89, 89
Since there are 10 values, the median is the average of the two middle terms: (82 + 85)/2 = 83.5
Mode
The mode is the most frequently occurring value in a given set of data.
Example: The grades in Statistics of 10 students are 82, 85, 79, 78, 89, 87, 88, 89, 75, and 77. What is the mode?
Arranged: 75, 77, 78, 79, 82, 85, 87, 88, 89, 89
The mode is 89, which occurs twice.
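As a check on the three worked answers above, Python's standard `statistics` module computes the mean, median, and mode of the same ten grades. This is only a verification sketch; `statistics.mode` returns a single value, which suits this data set because 89 is the only repeated grade.

```python
from statistics import mean, median, mode

# The ten Statistics grades from the examples above
grades = [82, 85, 79, 78, 89, 87, 88, 89, 75, 77]

print(sorted(grades))  # [75, 77, 78, 79, 82, 85, 87, 88, 89, 89]
print(mean(grades))    # 82.9
print(median(grades))  # 83.5, the average of the 5th and 6th sorted values
print(mode(grades))    # 89
```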
Mean, Median and Mode of Grouped Data (classes listed in decreasing order)
Mean of grouped data
The "mean" is the "average" you're
used to, where you add up all the
numbers and then divide by the
number of numbers.
The data represents the ages of 40 women when
they each had a boyfriend. Construct a grouped
frequency distribution with 5 classes.
18 20 20 20 20 21 20 17 19 20
19 18 22 26 20 19 22 15 18 27
16 23 24 17 25 24 16 20 26 15
21 17 23 16 21 17 26 16 23 19
Grouped Frequency Distribution
Class Limits    Frequency
25 – 27         5
22 – 24         7
19 – 21         14
16 – 18         11
13 – 15         3
Total           40
Ages as listed beside the table:
18 13 16 21
20 18 23 17
21 22 24 23
20 26 17 16
20 20 25 21
21 19 24 17
20 22 16 26
17 15 20 16
19 18 26 23
20 27 15 19
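The tally behind the frequency distribution can be sketched as counting how many ages fall inside each pair of inclusive class limits. The helper name `frequency_table` is hypothetical. The sketch uses the 40 ages listed beside the table, which reproduce the frequencies 5, 7, 14, 11, 3 used in the rest of this section; tallying the list in the problem statement instead gives 15 and 2 for the 19 – 21 and 13 – 15 classes, because a 19 and a 20 there appear as 13 and 21 beside the table.

```python
# Hypothetical helper: count how many values fall inside each class (inclusive limits)
def frequency_table(data, classes):
    return [(lo, hi, sum(lo <= x <= hi for x in data)) for lo, hi in classes]

# The 40 ages as listed beside the table
ages = [18, 13, 16, 21, 20, 18, 23, 17, 21, 22, 24, 23, 20, 26, 17, 16,
        20, 20, 25, 21, 21, 19, 24, 17, 20, 22, 16, 26, 17, 15, 20, 16,
        19, 18, 26, 23, 20, 27, 15, 19]
classes = [(25, 27), (22, 24), (19, 21), (16, 18), (13, 15)]  # decreasing order

table = frequency_table(ages, classes)
for lo, hi, f in table:
    print(f"{lo}-{hi}: {f}")
print("total:", sum(f for _, _, f in table))
```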
Mean
$\bar{x} = \frac{\sum fX}{\sum f}$
Where
f - frequency
X - class mark (midpoint)
fX - product of the frequency and the class mark
ΣfX - sum of the products of the frequency and the class mark
Σf - total frequency
x̄ - sample mean
Mean
Class mark of the first class: X = (25 + 27)/2 = 26
Class Limits    f     X     fX
25 – 27         5     26    130
22 – 24         7     23    161
19 – 21         14    20    280
16 – 18         11    17    187
13 – 15         3     14    42
Total           Σf = 40     ΣfX = 800
$\bar{x} = \frac{\sum fX}{\sum f} = \frac{800}{40} = 20$
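The grouped-mean formula translates directly into code: compute each class mark as the midpoint of its limits, weight it by the class frequency, and divide by the total frequency. The function name `grouped_mean` and the (lower, upper, frequency) tuple layout are illustrative choices, not part of the original slides.

```python
# Frequency distribution from the slides: (lower limit, upper limit, frequency)
table = [(25, 27, 5), (22, 24, 7), (19, 21, 14), (16, 18, 11), (13, 15, 3)]

def grouped_mean(table):
    total_f = sum(f for _, _, f in table)                        # Σf
    total_fx = sum(f * (lo + hi) / 2 for lo, hi, f in table)     # ΣfX, with X = class mark
    return total_fx / total_f

print(grouped_mean(table))  # 20.0
```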
Median
$Md = lb_{mc} + \left[\frac{\frac{\sum f}{2} - {<}cf}{f_{mc}}\right] i$
Where
lb_mc - the lower boundary of the median class
Σf/2 - total frequency divided by 2
<cf - cumulative frequency of the class just below the median class
f_mc - frequency of the median class
i - class width
The median class is the class with the smallest cumulative frequency greater than or equal to Σf/2.
Class Limits    f
25 – 27         5
22 – 24         7
19 – 21         14
16 – 18         11
13 – 15         3
Class Limits    f     <cf
25 – 27         5     40
22 – 24         7     35
19 – 21         14    28
16 – 18         11    14
13 – 15         3     3
Σf/2 = 40/2 = 20, so the median class is 19 – 21 (<cf = 28).
lb_mc = 19 - 0.5 = 18.5; <cf = 14; f_mc = 14; i = 16 - 13 = 3
$Md = 18.5 + \left[\frac{20 - 14}{14}\right] 3 = 18.5 + 1.29 = 19.79$
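The median procedure above can be sketched as: sort the classes from lowest to highest, accumulate the "less than" cumulative frequency, stop at the first class whose cumulative frequency reaches Σf/2, and interpolate inside it. The function name is hypothetical, and the sketch assumes equal class widths and whole-number class limits, so that each lower boundary is the lower limit minus 0.5, as on the slides.

```python
table = [(25, 27, 5), (22, 24, 7), (19, 21, 14), (16, 18, 11), (13, 15, 3)]

def grouped_median(table):
    rows = sorted(table)                      # lowest class first
    width = rows[1][0] - rows[0][0]           # i: assumes equal class widths
    half = sum(f for _, _, f in rows) / 2     # Σf/2
    cf_below = 0                              # <cf of the classes below the current one
    for lo, hi, f in rows:
        if cf_below + f >= half:              # first class whose <cf reaches Σf/2
            lb = lo - 0.5                     # lower boundary (whole-number limits)
            return lb + (half - cf_below) / f * width
        cf_below += f

print(round(grouped_median(table), 2))  # 19.79
```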
Mode
$Mo = lb_{mo} + \left[\frac{D_1}{D_1 + D_2}\right] i$
Where
lb_mo - lower boundary of the modal class
D1 - highest frequency minus the frequency of the next lower class
D2 - highest frequency minus the frequency of the next upper class
i - class width
The modal class is the class with the highest frequency.
Class Limits    f
25 – 27         5
22 – 24         7
19 – 21         14
16 – 18         11
13 – 15         3
Class Limits    f
25 – 27         5
22 – 24         7
19 – 21         14    (modal class)
16 – 18         11
13 – 15         3
lb_mo = 19 - 0.5 = 18.5; i = 16 - 13 = 3
D1 = 14 - 11 = 3; D2 = 14 - 7 = 7
$Mo = 18.5 + \left[\frac{3}{3 + 7}\right] 3 = 18.5 + 0.9 = 19.4$
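The grouped-mode formula can be sketched the same way: find the class with the highest frequency, take D1 and D2 from its lower and upper neighbours, and interpolate. The function name is hypothetical; the sketch assumes equal class widths and whole-number limits, treats a missing neighbour as frequency 0, and does not handle the case D1 + D2 = 0.

```python
table = [(25, 27, 5), (22, 24, 7), (19, 21, 14), (16, 18, 11), (13, 15, 3)]

def grouped_mode(table):
    rows = sorted(table)                                      # lowest class first
    width = rows[1][0] - rows[0][0]                           # i: assumes equal widths
    k = max(range(len(rows)), key=lambda j: rows[j][2])       # index of the modal class
    lo, _, f_mo = rows[k]
    f_lower = rows[k - 1][2] if k > 0 else 0                  # next lower class
    f_upper = rows[k + 1][2] if k + 1 < len(rows) else 0      # next upper class
    d1, d2 = f_mo - f_lower, f_mo - f_upper
    return (lo - 0.5) + d1 / (d1 + d2) * width

print(round(grouped_mode(table), 2))  # 19.4
```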
Mean of Ungrouped Data
The mean is the most commonly used measure of central tendency. When we speak of the average, we usually refer to the mean.
$\bar{x} = \frac{\sum x}{N}$
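Example:
Six friends in a biology class of 20 students receive test grades of 92, 84, 65, 76, 88, and 90. Find the mean of these test scores.
First get the sum of their scores, then divide by the number of scores:
$\bar{x} = \frac{92 + 84 + 65 + 76 + 88 + 90}{6} = \frac{495}{6} = 82.5$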
Example:
The ages of five contestants in a Statistics Quiz Bee are the following: 18, 17, 18, 19, and 18. Find their average age.
$\bar{x} = \frac{18 + 17 + 18 + 19 + 18}{5} = \frac{90}{5} = 18$
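Median of Ungrouped Data
It is the midpoint of the data array. Before finding the value, the data must be arranged in order, from least to greatest or vice versa. The median will either be a specific value or will fall between two values.
If N is odd, the median is the value in position (N + 1)/2.
If N is even, the median is the average of the values in positions N/2 and N/2 + 1.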
Example:
Seven mothers were selected and given a blood pressure check; their blood pressures were recorded below.
135, 121, 119, 116, 130, 121, 131
Find their median.
Solution: Arrange the data in order.
116, 119, 121, 121, 130, 131, 135
Since N = 7, the median is the 4th value: Md = 121
Example:
Eight novels were randomly selected and the numbers of pages were recorded as follows:
415, 398, 402, 400, 420, 415, 407, 425
Find their median.
Solution: Arrange the data in order.
398, 400, 402, 407, 415, 415, 420, 425
Since N = 8, the median is the average of the 4th and 5th values: Md = (407 + 415)/2 = 411
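The odd/even rule for the median of ungrouped data can be sketched in a few lines; the function name `median_ungrouped` is a hypothetical choice. Python lists are indexed from 0, so position (N + 1)/2 corresponds to index N // 2 when N is odd.

```python
def median_ungrouped(values):
    data = sorted(values)                       # arrange from least to greatest
    n = len(data)
    mid = n // 2
    if n % 2 == 1:
        return data[mid]                        # value in position (N + 1)/2
    return (data[mid - 1] + data[mid]) / 2      # average of positions N/2 and N/2 + 1

print(median_ungrouped([135, 121, 119, 116, 130, 121, 131]))       # 121
print(median_ungrouped([415, 398, 402, 400, 420, 415, 407, 425]))  # 411.0
```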
Mode of Ungrouped Data
It is the value that occurs most often in the data set, that is, the number, value, or observation that appears the greatest number of times.
Example:
Find the mode of the given data set:
15, 28, 25, 48, 22, 43, 39, 44, 43, 49, 34, 22, 33, 27, 25, 22, 30
Arrange the data set:
15, 22, 22, 22, 25, 25, 27, 28, 30, 33, 34, 39, 43, 43, 44, 48, 49
The mode is 22, which appears three times.
Another Example:
The typing speeds per minute of ten stenographers are as follows:
121, 110, 120, 119, 112, 121, 118, 115, 107, 115
Arrange the data set: 107, 110, 112, 115, 115, 118, 119, 120, 121, 121
The data set has two modes, 115 and 121, so the data set is said to be bimodal.
Example:
Find the mode of the given data:
2, 5, 8, 9, 11, 4, 23
There is no mode, because each value occurs only once.
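The three mode examples (one mode, two modes, no mode) can be reproduced by counting occurrences with `collections.Counter`. The function name `modes` is hypothetical, and the sketch adopts one convention: it reports no mode only when every value occurs exactly once, as in the last example; other texts also say "no mode" when all values share the same higher count.

```python
from collections import Counter

def modes(values):
    counts = Counter(values)
    top = max(counts.values())
    if top == 1:
        return []                                          # every value occurs once: no mode
    return sorted(v for v, c in counts.items() if c == top)

print(modes([15, 28, 25, 48, 22, 43, 39, 44, 43, 49, 34, 22, 33, 27, 25, 22, 30]))  # [22]
print(modes([121, 110, 120, 119, 112, 121, 118, 115, 107, 115]))  # [115, 121], bimodal
print(modes([2, 5, 8, 9, 11, 4, 23]))                            # [], no mode
```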
Measures of Dispersion
Definition
Measures of dispersion are descriptive statistics that describe how similar the scores in a set are to each other.
The more similar the scores are to each other, the lower the measure of dispersion will be.
The less similar the scores are to each other, the higher the measure of dispersion will be.
In general, the more spread out a distribution is, the larger the measure of dispersion will be.
Measures of Dispersion
Which of the two distributions of scores has the larger dispersion?
The upper distribution has more dispersion because its scores are more spread out; that is, they are less similar to each other.
Measures of Dispersion
These are the measures of dispersion:
The range
Interquartile range
Variance/standard deviation
Coefficient of Variation
The Range
The range is defined as the difference between the largest score in the set of data (X_L) and the smallest score in the set of data (X_S):
$R = X_L - X_S$
What is the range of the following data?
4 8 1 6 6 2 9 3 6 9
The largest score (X_L) is 9; the smallest score (X_S) is 1.
The range is X_L - X_S = 9 - 1 = 8.
When to Use the Range
The range is used when you have ordinal data.
The range is rarely used in scientific work, as it is fairly insensitive: it depends on only two scores in the set of data, X_L and X_S, and ignores how the other scores are spread.
Two very different sets of data can have the same range:
1 1 1 1 9 vs 1 3 5 7 9
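The range needs only the largest and smallest scores, which is exactly why it misses the difference between the two sets above. A minimal sketch with a hypothetical function name:

```python
def data_range(values):
    return max(values) - min(values)   # X_L - X_S

print(data_range([4, 8, 1, 6, 6, 2, 9, 3, 6, 9]))                # 8
print(data_range([1, 1, 1, 1, 9]), data_range([1, 3, 5, 7, 9]))  # 8 8
```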
Variance
Variance is defined as the average of the squared deviations:
$\sigma^2 = \frac{\sum (X - \mu)^2}{N}$
What does the Variance Formula mean?
It says to subtract the mean from each of the scores.
This difference is called a deviate or a deviation score.
The deviate tells us how far a given score is from a typical, or average, score.
Thus, the deviate is a measure of dispersion for a given score.
What does the Variance Formula mean?
One property of the mean is that the sum of the scores minus the mean is always 0: $\sum (X - \mu) = 0$.
Thus, the average of the deviates is always 0, since the sum of the deviates must equal 0.
To avoid this problem, statisticians square the deviate scores before averaging them.
Squaring the deviate scores makes all the squared scores positive (or zero), so $\sum (X - \mu)^2 \neq 0$ unless all the scores are equal.
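The two facts above can be checked exactly with `fractions.Fraction`, which avoids the rounding that would otherwise make the sum of deviates only approximately 0. The scores are borrowed from the exercise at the end of this section purely for illustration.

```python
from fractions import Fraction

scores = [6, 7, 7, 8, 9, 10]
mu = Fraction(sum(scores), len(scores))       # exact mean, 47/6
deviations = [x - mu for x in scores]

print(sum(deviations))                        # 0: the deviates always cancel
squared = [d * d for d in deviations]         # squaring makes every term >= 0
print(sum(squared) / len(scores))             # 65/36, the population variance
```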
Standard Deviation
When the deviate scores are squared in variance, their unit of measure is squared as well.
E.g., if people's weights are measured in pounds, then the variance of the weights would be expressed in pounds² (squared pounds).
Since squared units of measure are often difficult to deal with, the square root of variance is often used instead.
• The standard deviation is the square root of variance: $\sigma = \sqrt{\sigma^2}$
Computational Formula for Variance
When calculating variance, it is often easier to use a computational formula, which is algebraically equivalent to the definitional formula:
$\sigma^2 = \frac{\sum X^2}{N} - \mu^2$
σ² is the population variance, X is a score, µ is the population mean, and N is the number of scores.
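The claim that the two population-variance formulas agree can be checked numerically. The function names are hypothetical, and the data are the ten scores from the range example; `statistics.pvariance` is the standard library's population variance, used here only as a cross-check.

```python
import math
import statistics

def variance_definitional(xs):
    mu = sum(xs) / len(xs)
    return sum((x - mu) ** 2 for x in xs) / len(xs)   # average squared deviation

def variance_computational(xs):
    n = len(xs)
    mu = sum(xs) / n
    return sum(x * x for x in xs) / n - mu ** 2        # ΣX²/N - µ²

data = [4, 8, 1, 6, 6, 2, 9, 3, 6, 9]
print(round(variance_definitional(data), 4), round(variance_computational(data), 4))  # 7.24 7.24
print(statistics.pvariance(data))                                                     # 7.24
```

The computational form is convenient by hand, but with very large values it can lose precision in floating point because it subtracts two large, nearly equal numbers.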
Variance of a Sample
Because the sample mean is not a perfect estimate of the population mean, the formula for the variance of a sample is slightly different from the formula for the variance of a population:
$s^2 = \frac{\sum (X - \bar{x})^2}{N - 1}$
s² is the sample variance, X is a score, x̄ is the sample mean, and N is the number of scores.
Coefficient of Variation
It tells whether a standard deviation is large or small by comparing the standard deviation to the mean.
It allows comparison of standard deviations that come from data sets with different means.
For a population: $cv = \frac{\sigma}{\mu} \times 100\%$
For a sample: $cv = \frac{s}{\bar{x}} \times 100\%$
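Exercise:
Find the measures of dispersion in the given sample data.
6, 7, 7, 8, 9, 10
1. Range: 10 - 6 = 4
2. Variance
$s^2 = \frac{\sum (x - \bar{x})^2}{N - 1}$
Get the mean first:
$\bar{x} = \frac{6 + 7 + 7 + 8 + 9 + 10}{6} = \frac{47}{6} = 7.83$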
x     x - x̄               (x - x̄)²
6     6 - 7.83 = -1.83     3.3489
7     -0.83                0.6889
7     -0.83                0.6889
8     0.17                 0.0289
9     1.17                 1.3689
10    2.17                 4.7089
Σ(x - x̄)² = 10.8334
$s^2 = \frac{10.8334}{6 - 1} = 2.1667$
3. Standard Deviation
$s = \sqrt{s^2} = \sqrt{2.1667} = 1.4720$
4. Coefficient of Variation
$cv = \frac{s}{\bar{x}} \times 100\% = \frac{1.4720}{7.83} \times 100\% = 0.1880 \times 100\% = 18.8\%$
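The whole exercise can be checked with Python's `statistics` module: `statistics.variance` and `statistics.stdev` use the sample formulas (dividing by N - 1), matching the hand computation. Rounding is applied only for display, so small differences from the table (which rounds the mean to 7.83 first) do not affect the final answers.

```python
import statistics

sample = [6, 7, 7, 8, 9, 10]
r = max(sample) - min(sample)          # 1. range
x_bar = statistics.mean(sample)        # sample mean, 47/6
s2 = statistics.variance(sample)       # 2. sample variance, divides by N - 1
s = statistics.stdev(sample)           # 3. sample standard deviation
cv = s / x_bar * 100                   # 4. coefficient of variation, in percent

print(r, round(x_bar, 2), round(s2, 4), round(s, 4), round(cv, 1))
```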