Methods Of Data Collection, Organization And
Presentation
Samuel D.[Bsc/PH, MPH/Epidemiology &
Biostatistics]
Learning Objectives
At the end of this chapter, the students will be able to:
✓ Identify the different methods of data organization and
presentation
✓ Understand the criteria for selecting a method to
organize and present data
✓ Identify the different methods of data collection and
the criteria used to select a method of data
collection
Introduction
Before any statistical work can be done, data must be
collected.
Depending on the type of variable and the objective of
the study, different data collection methods can be
employed.
Frequency Distributions
A frequency is the number of times a given datum
occurs in a data set.
A frequency distribution is a table that shows
"classes" or "intervals" of data entries with a count of
the number of entries in each class.
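The definition of a frequency can be illustrated with a short sketch. The data set below (number of clinic visits reported by 12 patients) is hypothetical and only serves to show the counting step.

```python
from collections import Counter

# Hypothetical data set: number of clinic visits reported by 12 patients.
visits = [2, 0, 1, 2, 3, 1, 2, 0, 1, 2, 4, 1]

# Counter maps each distinct value to the number of times it occurs,
# which is exactly the frequency of that value.
frequency = Counter(visits)

# Print the frequency table in ascending order of the value.
print("Value  Frequency")
for value in sorted(frequency):
    print(f"{value:>5}  {frequency[value]:>9}")
```

The frequencies always add up to the number of observations, which is a quick check on any frequency table.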
A categorical distribution
Non-numerical information can also be represented in a
frequency distribution.
In connection with large sets of data, a good overall
picture and sufficient information can often be conveyed
by grouping the data into a number of class intervals.
Example
Age (years)     Number of persons
Under 18              1,748
18 – 24               3,325
25 – 34               3,149
35 – 44               1,323
45 – 54                 512
55 and over             335
Total                10,392
This kind of frequency distribution is called a grouped
frequency distribution.
Frequency distributions present data in a relatively
compact form, give a good overall picture, and contain
information that is adequate for many purposes, but there
are usually some things which can be determined only
from the original data.
For instance, the above grouped frequency distribution
cannot tell how many of the arrested persons are 19
years old, or how many are over 62.
The construction of grouped frequency distribution
consists essentially of four steps:
1. Choosing the classes
Choosing a suitable classification involves choosing the
number of classes and the range of values each class
should cover, that is, where each class should begin
and end.
Both of these choices are arbitrary to some extent, but
they depend on the nature of the data and its accuracy,
and on the purpose the distribution is to serve.
A guide for determining the number of classes (k) is
Sturges' formula, given by:
k = 1 + 3.322 × log10(n), where n is the number of
observations
The length or width of the class interval (w) can then be
calculated as:
w = (Maximum value – Minimum value) / k = Range / k
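As a rough sketch, the two formulas can be evaluated as follows. The sample size n = 57 matches the tumor-weight example in this chapter; the minimum and maximum values (12 and 79) are hypothetical, and rounding the number of classes up to a whole number is an assumption, since the formula itself does not say how to round.

```python
import math

def sturges_classes(n):
    """Suggested number of classes: 1 + 3.322 * log10(n).

    Rounding up to the next whole number is an assumption of this sketch.
    """
    return math.ceil(1 + 3.322 * math.log10(n))

def class_width(minimum, maximum, k):
    """Class width: range divided by the number of classes."""
    return (maximum - minimum) / k

n = 57                        # number of observations
k = sturges_classes(n)        # suggested number of classes
w = class_width(12, 79, k)    # hypothetical minimum and maximum values

print(f"k = {k}, w = {w:.2f}")
```

In practice the computed width is usually rounded to a convenient value (for example, 10), as long as the classes still cover every observation.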
2. Sorting (or tallying) of the data into these classes,
3. Counting the number of items in each class, and
4. Displaying the results in the form of a chart or table.
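The four steps can be sketched in a few lines of code. The weights below are hypothetical whole-number measurements, and the helper name grouped_frequency is introduced here only for illustration; the classes (10-19, 20-29, and so on) are chosen first, then the data are tallied, counted, and displayed.

```python
def grouped_frequency(data, lower, width, k):
    """Tally data into k classes of equal width.

    Class i covers values from lower + i*width up to, but not
    including, lower + (i+1)*width, so every value falls into
    exactly one class.
    """
    counts = [0] * k
    for x in data:
        index = int((x - lower) // width)
        if not 0 <= index < k:
            raise ValueError(f"{x} falls outside the chosen classes")
        counts[index] += 1
    return counts

# Step 1: choose the classes (hypothetical choice: 6 classes of width 10).
lower, width, k = 10, 10, 6

# Hypothetical whole-number weights to be grouped.
weights = [12, 25, 31, 47, 22, 38, 29, 55, 41, 27, 63, 35]

# Steps 2 and 3: tally the data and count the items in each class.
counts = grouped_frequency(weights, lower, width, k)

# Step 4: display the result as a table.
for i, count in enumerate(counts):
    low = lower + i * width
    print(f"{low}-{low + width - 1}: {count}")
```

Raising an error for values outside the chosen classes is a design choice of this sketch: it makes sure no observation is silently dropped.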
Rules for constructing classes
The following are some rules that are generally observed:
1. We seldom use fewer than 6 or more than 20 classes,
and 15 is generally a good number. The exact number we
use in a given situation depends mainly on the number
of measurements or observations we have to group.
2. We always make sure that each item (measurement or
observation) goes into one and only one class, i.e.
classes should be mutually exclusive.
3. Determination of class limits:
Class limits should be definite and clearly stated. In other words,
open-end classes should be avoided since they make it difficult, or
even impossible, to calculate certain further descriptions that may
be of interest.
Cumulative Frequencies
When the frequencies of two or more classes are added
up, the resulting totals are called cumulative
frequencies.
These frequencies help us to find the total number of
items whose values are less than or greater than
some value.
Note:
In the construction of a cumulative frequency distribution, if we
start the cumulation from the lowest value of the variable and
proceed to the highest, the resulting frequency distribution is
called a "less than" cumulative frequency distribution.
If the cumulation runs from the highest value to the lowest, the
resulting frequency distribution is called a "more than"
cumulative frequency distribution.
The most common cumulative frequency is the "less than"
cumulative frequency.
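Both kinds of cumulative frequency can be computed from the class frequencies with a running sum. The sketch below uses the class frequencies from the age example earlier in this chapter; the variable names are illustrative only.

```python
from itertools import accumulate

# Class frequencies from the age example in this chapter.
labels = ["Under 18", "18-24", "25-34", "35-44", "45-54", "55 and over"]
freq = [1748, 3325, 3149, 1323, 512, 335]

# "Less than" cumulation: running total from the lowest class upward.
less_than = list(accumulate(freq))

# "More than" cumulation: running total from the highest class downward,
# then reversed so that it lines up with the original class order.
more_than = list(accumulate(reversed(freq)))[::-1]

print(f"{'Class':<12} {'f':>6} {'Less than':>10} {'More than':>10}")
for label, f, lt, mt in zip(labels, freq, less_than, more_than):
    print(f"{label:<12} {f:>6} {lt:>10} {mt:>10}")
```

The last "less than" entry and the first "more than" entry both equal the total number of observations.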
Relative Frequency
A relative frequency is the fraction of times a value
(for example, a survey answer) occurs.
To find the relative frequencies, divide each frequency
by the total number of observations in the sample (for
example, the total number of students surveyed).
The last entry of the cumulative relative frequency
column is one, indicating that one hundred percent of
the data has been accumulated.
Cumulative Relative frequency
Cumulative relative frequency is the
accumulation of the previous relative frequencies.
To find the cumulative relative frequencies, add all the
previous relative frequencies to the relative frequency
for the current row.
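A minimal sketch of both calculations, using the class frequencies 5, 19, 10, 13, 4, 4, 2 (n = 57) from the tumor-weight example in this chapter. Here the cumulative relative frequency is obtained by dividing the cumulative counts by n, which gives the same result as adding up the relative frequencies row by row but avoids accumulating rounding error.

```python
from itertools import accumulate

# Class frequencies from the tumor-weight example in this chapter.
freq = [5, 19, 10, 13, 4, 4, 2]
n = sum(freq)  # total number of observations

# Relative frequency: each class frequency divided by the total.
relative = [f / n for f in freq]

# Cumulative relative frequency: cumulative count divided by the total.
cum_relative = [c / n for c in accumulate(freq)]

print(f"{'f':>3} {'Relative':>9} {'Cum. relative':>14}")
for f, r, c in zip(freq, relative, cum_relative):
    print(f"{f:>3} {r:>9.4f} {c:>14.4f}")
```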
Mid-Point of a class interval and the determination
of Class Boundaries
Mid-point or class mark (Xc) of an interval is the value of
the interval which lies mid-way between the lower true
limit (LTL) and the upper true limit (UTL) of a class. It is
calculated as:
Xc = (LTL + UTL) / 2
True limits (or class boundaries)
True limits are those limits that are determined mathematically
to make an interval of a continuous variable continuous in
both directions, so that no gap exists between classes.
The true limits are what the tabulated limits would
correspond to if one could measure exactly.
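The sketch below shows one common way to obtain true limits and mid-points from tabulated limits. It assumes the data are recorded to the nearest whole unit, so that each tabulated limit is extended by half a unit; the helper names and the unit parameter are introduced here for illustration only. The mid-point is taken as the average of the lower and upper true limits, that is, the value mid-way between them.

```python
def true_limits(lower, upper, unit=1):
    """Extend tabulated limits by half the recording unit on each side.

    Assumption: values are recorded to the nearest `unit`, so the gap
    between the upper limit of one class and the lower limit of the
    next is exactly one unit.
    """
    half = unit / 2
    return lower - half, upper + half

def midpoint(ltl, utl):
    """Class mark: the value mid-way between the true limits."""
    return (ltl + utl) / 2

# Tabulated class limits, as in a table with classes 10-19, 20-29, 30-39.
classes = [(10, 19), (20, 29), (30, 39)]

for lower, upper in classes:
    ltl, utl = true_limits(lower, upper)
    print(f"{lower}-{upper}: true limits {ltl}-{utl}, "
          f"mid-point {midpoint(ltl, utl)}, width {utl - ltl}")
```

Because the upper true limit of one class equals the lower true limit of the next, no gap remains between adjacent classes.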
Example:
Frequency distribution of weights (in ounces) of malignant
tumors removed from the abdomen of 57 subjects
Weight   Class        Xc     Frequency   Cumulative   Relative
         boundary                        frequency    frequency
10-19     9.5-19.5    14.5       5            5        0.0877
20-29    19.5-29.5    24.5      19           24        0.3333
30-39    29.5-39.5    34.5      10           34        0.1754
40-49    39.5-49.5    44.5      13           47        0.2281
50-59    49.5-59.5    54.5       4           51        0.0702
60-69    59.5-69.5    64.5       4           55        0.0702
70-79    69.5-79.5    74.5       2           57        0.0351
Total                           57                     1.0000
Note:
The width of a class is found from the true class limits by
subtracting the lower true limit from the upper true limit of that
class. For example, the width of the class 10-19 is 19.5 - 9.5 = 10.
Example 2:
Construct a grouped frequency distribution of the
following data on the amount of time (in hours) that 80
college students devoted to leisure activities during a
typical school week: