Sampling Terminologies
i) Population
A group of individual persons, objects or items from which samples are taken.
ii) Sample: A sample is a subset of the population. A sample is a finite part of the
population whose properties are studied to gain information about the whole
population.
Assume we have a population of 100 pebble stones.
We are interested in the weight of the stones.
We shall examine different methods of obtaining a sample of size 10.
iii) Survey/study population: This is the finite population from which we will select
our samples (the 100 stones).
iv) Population characteristic: this is the aspect of the population we wish to measure.
In this case, it is the weight of the pebbles.
v) Sampling unit: the individual unit we are sampling. In this case, it is an individual
pebble.
vi) Sampling frame: A list of all sampling units in the survey/study population. In this
case, it is a list containing the stones numbered 1 to 100.
vii) Census: A survey consisting of every member of the population. A census would
involve weighing all 100 stones.
Reasons for sampling
Cost: Sampling a fraction of the population is cheaper (more cost-effective) than conducting
a census.
Time: Sampling rather than using a census saves time.
Accessibility: Some populations are only partly accessible, so a census is not possible.
Size: Some populations are very large.
Accuracy: With fewer units to measure, each one can be measured more carefully.
Sampling Methods
Sampling is the act, process or technique of selecting a suitable sample, that is, a
representative part of the population, for the purpose of determining parameters or
characteristics of the whole population.
i) Accessibility sampling: the most easily obtained observations are chosen.
ii) Judgment sampling: the experimenter chooses the sample based on what he
or she thinks is a representative sample
iii) Quota sampling: this typically combines accessibility and judgment sampling.
iv) Random sampling: members of the sample are chosen at random. There are
two basic types of random sampling.
Simple random sampling: This is random sampling without
replacement. Each population member is either not in the sample or in it exactly once.
Simple random sampling gives equal probability of selection to every
permitted (unordered) sample of a given size.
Unrestricted random sampling: This is random sampling with
replacement. All population members are available for each
random selection, so a population member may be in the sample more
than once.
v) Stratified sampling:
Sometimes groups within an entire population vary considerably. In this
case, it is advantageous to divide the population into subpopulations called strata
and then perform simple random sampling within each stratum. This is known
as stratified sampling.
vi) Cluster and multi-stage sampling
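The difference between simple random sampling (without replacement) and unrestricted random sampling (with replacement) can be sketched in Python for the 100-pebble example. The names `frame`, `srs` and `urs` and the fixed seed are illustrative assumptions, not part of the notes.

```python
import random

# Sampling frame: pebbles numbered 1 to 100, as in the running example.
frame = list(range(1, 101))

# A fixed seed is an assumption made only so the sketch is reproducible.
rng = random.Random(0)

# Simple random sampling: without replacement, so no pebble can repeat.
srs = rng.sample(frame, k=10)

# Unrestricted random sampling: with replacement, so repeats are possible.
urs = rng.choices(frame, k=10)

print("without replacement:", sorted(srs))
print("with replacement:   ", sorted(urs))
```

In the first sample every pebble number is distinct by construction; in the second a number may appear more than once.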
Types of data
Quantitative and qualitative data
Descriptive statistics
Presentation of data:
Tables
Frequency table, cumulative frequency tables and the stem-and-leaf tables
Graphs
Histograms, frequency polygons, cumulative frequency polygons
Steps involved in making the frequency table
i) Compute the range $r$ of the data.
If $X_{max}$ is the largest observation and $X_{min}$ is the lowest observation, then
$r = X_{max} - X_{min}$
ii) Find the number of classes $c$ and the class width (class length) $w$.
The number of classes is the smallest $c$ such that $2^c \ge n$, where $n$ is the number of observations (sample size).
Example 1
i) If $n = 100$, find $c$.
We look for the first $c$ for which $2^c \ge 100$:
$2^1 = 2 \not\ge 100$
$2^2 = 4 \not\ge 100$
$2^3 = 8 \not\ge 100$
$2^4 = 16 \not\ge 100$
$2^5 = 32 \not\ge 100$
$2^6 = 64 \not\ge 100$
$2^7 = 128 \ge 100$
So $c$ = number of classes = 7.
ii) If $n = 250$, find $c$.
$c = 8$ classes (i.e. $2^8 = 256 \ge 250$).
The width is given by
$w = \frac{r}{c} = \frac{\text{range}}{\text{number of classes}}$ (round off to the nearest whole number)
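iii) Define the class limits (lower limit $L_i$ and upper limit $\mathcal{U}_i$).
$L_1$, the lower limit of the first class, should be less than or equal to $X_{min}$.
$\mathcal{U}_1$ is the upper limit of the first class. In general,
$\mathcal{U}_i = L_i + w$
Class mark: $X_i = \frac{L_i + \mathcal{U}_i}{2}$, $i = 1, 2, 3, \dots, c$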
iv) Get the frequencies $n_i$, $i = 1, 2, \dots, c$.
We can also get the relative frequencies, given by
$\mathcal{F}_i = \frac{n_i}{n}$
The notation [10 − 20[ means that 10 is included in the class but 20 is not.
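The rule for the number of classes and the class width can be sketched in Python as follows. The function names `number_of_classes` and `class_width` are illustrative assumptions, and Python's built-in `round` is used for "nearest whole number" (it rounds exact halves to the nearest even integer, which may differ from hand rounding in that one case).

```python
def number_of_classes(n):
    """Return the smallest c (starting from 1) such that 2**c >= n."""
    c = 1
    while 2 ** c < n:
        c += 1
    return c


def class_width(x_min, x_max, c):
    """Range divided by the number of classes, rounded to a whole number."""
    return round((x_max - x_min) / c)


print(number_of_classes(100))  # 7, since 2**7 = 128 >= 100
print(number_of_classes(250))  # 8, since 2**8 = 256 >= 250
```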
Frequency table:
Class #   Class limit    Class mark (X_i)   Freq (n_i)   CF_i                    Relative freq (F_i)
1         [L_1 − U_1[    X_1                n_1          n_1                     F_1
2         [L_2 − U_2[    X_2                n_2          n_1 + n_2               F_2
⋮         ⋮              ⋮                  ⋮            ⋮                       ⋮
c         [L_c − U_c[    X_c                n_c          n_1 + n_2 + ⋯ + n_c     F_c
Stem-and-leaf table
Example
Take marks obtained in MA 110
50 , 55 , 58 , 60 , 62 , 63 , 64 , 69 , 70 , 72 , 77 , 80 , 84
Stems Leaves
5 0 5 8
6 0 2 3 4 9
7 0 2 7
8 0 4
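The stem-and-leaf table above can be reproduced with a short Python sketch. It assumes two-digit marks, so the stem is the tens digit and the leaf is the units digit; the variable names are illustrative.

```python
from collections import defaultdict

# Marks from the MA 110 example above.
marks = [50, 55, 58, 60, 62, 63, 64, 69, 70, 72, 77, 80, 84]

# Group the units digit (leaf) of each mark under its tens digit (stem).
stems = defaultdict(list)
for m in sorted(marks):
    stems[m // 10].append(m % 10)

print("Stems Leaves")
for stem in sorted(stems):
    print(stem, "   ", " ".join(str(leaf) for leaf in stems[stem]))
```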
Numerical or Descriptive Measures
Measures of location: Mean, mode, median
Measures of spread: variance, standard deviation
Measures of skewness: coefficient of skewness
Measures of relative spread: Coefficient of variation
Ungrouped Data (raw data):
Mean, $\bar{X}$ (X bar):
Sample data: $X_1, X_2, \dots, X_n$
$\bar{X} = \frac{\text{sum of all observations}}{\text{size of sample}} = \frac{X_1 + X_2 + X_3 + \dots + X_n}{n} = \frac{1}{n} \sum_{i=1}^{n} X_i$
Median
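The median is the middle observation or value when the data are
arranged in increasing order.
Ordered sample data: $X_{(1)}, X_{(2)}, X_{(3)}, \dots, X_{(n)}$, where $X_{(1)} = X_{min}$ and $X_{(n)} = X_{max}$.
$M_d = X_{\left(\frac{n+1}{2}\right)}$ if $n$ is odd
$M_d = \frac{X_{\left(\frac{n}{2}\right)} + X_{\left(\frac{n}{2}+1\right)}}{2}$ if $n$ is even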
Example
Find the median in the following sequence
50, 85, 47, 62, 58, 80 (sample size 𝑛 = 6)
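As a sketch of how the median rule applies to this example, the following Python function sorts the data and then uses the odd or even case. The function name `median` is an illustrative choice; Python's standard `statistics.median` gives the same result.

```python
def median(values):
    """Middle value of the ordered data (mean of the two middle values if n is even)."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                      # X_((n+1)/2) in 1-based notation
    return (ordered[mid - 1] + ordered[mid]) / 2  # (X_(n/2) + X_(n/2+1)) / 2


# Ordered data: 47, 50, 58, 62, 80, 85, so the median is (58 + 62) / 2.
print(median([50, 85, 47, 62, 58, 80]))  # 60.0
```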
Example
Find the variance for the sequence
2, 4, 7, 11, 15
$\bar{X} = \frac{39}{5} = 7.8$
$Var(X) = S^2 = \frac{(X_1 - \bar{X})^2 + (X_2 - \bar{X})^2 + \dots + (X_n - \bar{X})^2}{n-1} = \frac{1}{n-1} \sum_{i=1}^{n} (X_i - \bar{X})^2$
Variance:
$S^2 = \frac{(2 - 7.8)^2 + (4 - 7.8)^2 + (7 - 7.8)^2 + (11 - 7.8)^2 + (15 - 7.8)^2}{5-1} = \frac{110.8}{4} = 27.7$
Standard deviation $S$:
$S = \sqrt{S^2} = \sqrt{27.7} \approx 5.26$
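The variance example can be checked with a few lines of Python. This is a minimal sketch that follows the $n - 1$ formula directly and then compares the result with the standard library's `statistics.variance` and `statistics.stdev`, which use the same divisor.

```python
import statistics

data = [2, 4, 7, 11, 15]
n = len(data)

mean = sum(data) / n                          # 39 / 5 = 7.8
sum_sq = sum((x - mean) ** 2 for x in data)   # sum of squared deviations
variance = sum_sq / (n - 1)                   # sample variance S^2
std_dev = variance ** 0.5                     # sample standard deviation S

print(round(mean, 2), round(variance, 2), round(std_dev, 2))
print(round(statistics.variance(data), 2), round(statistics.stdev(data), 2))
```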
Measure of skewness
$S_k$ = coefficient of skewness $= \frac{3(\bar{X} - M_d)}{S}$
i) If $S_k = 0$ (meaning $\bar{X}$ = median), then the data set is symmetrical.
ii) If $S_k$ is less than 0 ($\bar{X}$ < median), then the data are skewed to the left.
iii) If $S_k$ is greater than 0 ($\bar{X}$ > median), then the data are skewed to the right.
COEFFICIENT OF VARIATION
$CV = \frac{S}{\bar{X}}$
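Both measures can be sketched in Python using the standard `statistics` module. The function names are illustrative assumptions, and the data are the five values from the variance example.

```python
import statistics


def coefficient_of_skewness(data):
    """S_k = 3 * (mean - median) / S, with S the sample standard deviation."""
    mean = statistics.mean(data)
    med = statistics.median(data)
    return 3 * (mean - med) / statistics.stdev(data)


def coefficient_of_variation(data):
    """CV = S / mean."""
    return statistics.stdev(data) / statistics.mean(data)


data = [2, 4, 7, 11, 15]
sk = coefficient_of_skewness(data)
cv = coefficient_of_variation(data)
print(round(sk, 3), round(cv, 3))
```

For these data the mean (7.8) exceeds the median (7), so the coefficient of skewness is positive and the data are skewed to the right.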
GROUPED DATA
To group the data, we will arrange the data in a table as follows:
ID#    Class limit            Class mark (x_i)   Freq (n_i)   CF_i       n_i x_i            R. freq (f_i)   n_i (x_i − x̄)^2
1      [L_1 − U_1[            x_1                n_1          CF_1       n_1 x_1            n_1/n           n_1 (x_1 − x̄)^2
2      [L_2 − U_2[            x_2                n_2          CF_2       n_2 x_2            n_2/n           n_2 (x_2 − x̄)^2
⋮      ⋮                      ⋮                  ⋮            ⋮          ⋮                  ⋮               ⋮
k−1    [L_{k−1} − U_{k−1}[    x_{k−1}            n_{k−1}      CF_{k−1}   n_{k−1} x_{k−1}    n_{k−1}/n       ⋮
k      [L_k − U_k[            x_k                n_k          CF_k       n_k x_k            n_k/n           ⋮
k+1    [L_{k+1} − U_{k+1}[    x_{k+1}            n_{k+1}      CF_{k+1}   n_{k+1} x_{k+1}    n_{k+1}/n       ⋮
⋮      ⋮                      ⋮                  ⋮            ⋮          ⋮                  ⋮               ⋮
c      [L_c − U_c[            x_c                n_c          CF_c       n_c x_c            n_c/n           n_c (x_c − x̄)^2
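Mean:
Sample mean:
$\bar{X} = \frac{n_1 X_1 + n_2 X_2 + \dots + n_c X_c}{n} = \frac{1}{n} \sum_{i=1}^{c} n_i X_i$
Variance ($S^2$):
Sample variance:
$S^2 = \frac{n_1 (X_1 - \bar{X})^2 + n_2 (X_2 - \bar{X})^2 + \dots + n_c (X_c - \bar{X})^2}{n-1} = \frac{1}{n-1} \sum_{i=1}^{c} n_i (X_i - \bar{X})^2$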
Median ($M_d$)
Locate the median class $k$, that is, the first class such that
$CF_k \ge \frac{n}{2}$
where $CF_k$ is the cumulative frequency for the $k$th class and $n$ is the sample size.
Then use the formula
$M_d = L_k + \frac{w}{n_k} \left( \frac{n}{2} - CF_{k-1} \right)$
to compute the median.
Mode ($M_o$)
Locate the modal class (the class having the highest frequency). If we let $k$ be the modal
class, then
$M_o = L_k + w \left( \frac{d_1}{d_1 + d_2} \right)$
where
$d_1 = n_k - n_{k-1}$
$d_2 = n_k - n_{k+1}$
Example
The following data represent the examination scores obtained by 100 students in
the MA110 course.
40 41 42 44 45 46 46 46 47 47 47 47
48 48 49 50 50 50 51 51 52 52 52 52
52 52 53 53 53 53 53 53 54 54 54 55
55 55 55 56 56 56 56 56 57 57 57 57
57 57 57 57 57 57 57 58 58 58 58 58
58 58 59 59 59 59 60 60 60 60 61 61
61 61 61 62 62 62 63 63 63 63 64 64
64 65 65 66 66 67 67 67 67 68 68 69
70 71 72 74
a. Construct a stem-and-leaf plot and use it to determine the mode and the
median for the above ungrouped data.
Stem   Leaves
4      0 1 2 4 5 6 6 6 7 7 7 7 8 8 9
5      0 0 0 1 1 2 2 2 2 2 2 3 3 3 3 3 3 4 4 4 5 5 5 5 6 6 6 6 6 7 7 7 7 7 7 7 7 7 7 7 8 8 8 8 8 8 8 9 9 9 9
6      0 0 0 0 1 1 1 1 1 2 2 2 3 3 3 3 4 4 4 5 5 6 6 7 7 7 7 8 8 9
7      0 1 2 4
Summary
Stem               4    5    6    7    Total
Number of leaves   15   51   30   4    100
Mode ($M_o$) = 57%
It has the highest frequency: more students got 57% than any other mark.
Median $= \frac{X_{\left(\frac{n}{2}\right)} + X_{\left(\frac{n}{2}+1\right)}}{2}$, $n = 100$ (sample size)
$= \frac{X_{(50)} + X_{(51)}}{2}$
$= \frac{57 + 57}{2}$
$= 57$
Roughly 50% of the students scored above 57% and 50% of the students scored 57%
or below.
b. Construct an absolute frequency distribution and an absolute cumulative
frequency distribution.
c. Construct a histogram and a frequency polygon, and use the histogram to estimate
the mode. Use the cumulative frequency polygon to estimate the median.
d. Determine the mean, the mode, the median, the variance, the standard deviation
and the coefficient of skewness for the grouped data. Interpret these numbers
correctly.
Answers
b. Absolute frequency and absolute cumulative frequency distribution
i. First get the range.
Range, $r = X_{max} - X_{min}$
$= 74 - 40$
$= 34$
ii. Find the number of classes.
Number of classes: the smallest $c$ with $2^c \ge n$, $n = 100$
$\therefore c = 7$ classes
iii. Determine the class width.
$w = \frac{r}{c} = \frac{34}{7} \approx 4.86$, which rounds off to $w = 5$
iv. Determine the first lower limit (it should be less than or equal to $X_{min}$).
NB: The frequency polygon starts and ends with a frequency of zero.
$L_1 \le 40$, so we can take 35 as our first lower limit, which implies that
$\mathcal{U}_1 = L_1 + w$
$= 35 + 5 = 40$
This means that the first class is [35 − 40[.
Note that the table below is the absolute cumulative frequency table because it has
both the frequency and the cumulative frequency columns.
ID #    Class limit [L_i − U_i[   Class mark (x_i)   Frequency (n_i)   n_i x_i   CF_i   n_i (x_i − x̄)^2
0       [35 − 40[                 37.5               0                 0         0      0
1       [40 − 45[                 42.5               4                 170       4      894.01
2       [45 − 50[                 47.5               11                522.5     15     1089.0275
3       [50 − 55[                 52.5               20                1050      35     490.05
4       [55 − 60[                 57.5               31                1782.5    66     0.0775
5       [60 − 65[                 62.5               19                1187.5    85     484.5475
6       [65 − 70[                 67.5               11                742.5     96     1111.0275
7       [70 − 75[                 72.5               4                 290       100    906.01
8       [75 − 80[                 77.5               0                 0         100    0
Total                                                100 = n           5745             4974.75
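For grouped data
$\bar{x} = \frac{1}{n} \sum_{i=1}^{c} n_i x_i$
$= \frac{1}{100} \times 5745$
$= 57.45$
The last column is then computed from $\bar{x}$; for example, for class 1:
$n_1 (x_1 - \bar{x})^2 = 4(42.5 - 57.45)^2 = 894.01$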
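The frequency, cumulative frequency and $n_i x_i$ columns can be rebuilt from the raw scores with a short Python sketch. It assumes the class width of 5 and the lower limits 40, 45, ..., 70 used in the table (the empty classes 0 and 8 are left out); all variable names are illustrative.

```python
from itertools import accumulate

# The 100 examination scores from the example.
raw = """
40 41 42 44 45 46 46 46 47 47 47 47
48 48 49 50 50 50 51 51 52 52 52 52
52 52 53 53 53 53 53 53 54 54 54 55
55 55 55 56 56 56 56 56 57 57 57 57
57 57 57 57 57 57 57 58 58 58 58 58
58 58 59 59 59 59 60 60 60 60 61 61
61 61 61 62 62 62 63 63 63 63 64 64
64 65 65 66 66 67 67 67 67 68 68 69
70 71 72 74
"""
scores = [int(tok) for tok in raw.split()]
n = len(scores)

width = 5
lower_limits = list(range(40, 75, width))         # 40, 45, ..., 70
marks = [lo + width / 2 for lo in lower_limits]   # class marks 42.5, ..., 72.5

# Each class is [lo, lo + width[ : lower limit included, upper limit excluded.
freq = [sum(lo <= x < lo + width for x in scores) for lo in lower_limits]
cum_freq = list(accumulate(freq))
grouped_mean = sum(f * m for f, m in zip(freq, marks)) / n

for lo, m, f, cf in zip(lower_limits, marks, freq, cum_freq):
    print(f"[{lo} - {lo + width}[  mark={m}  n_i={f}  n_i*x_i={f * m}  CF={cf}")
print("grouped mean =", round(grouped_mean, 2))
```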
c. Histogram, frequency polygon and cumulative frequency polygon
Histogram: plot frequency against the lower class limit, e.g. (35, 0), (40, 4), (45, 11), (50, 20), (55, 31), and so on.
Frequency polygon: plot frequency against the class mark.
[Figure: histogram with the frequency polygon drawn over it. Horizontal axis: lower limit (35 to 80). Vertical axis: frequency (0 to 35). Mode ≈ 57.]
CF polygon: plot $CF_i$ against the upper class limit, e.g. (40, 0), (45, 4), (50, 15), and so on.
[Figure: cumulative frequency polygon. Horizontal axis: upper limit (40 to 80). Vertical axis: cumulative frequency (0 to 100). Median ≈ 57.]
Mean:
$\bar{x} = \frac{1}{n} \sum_{i=1}^{c} n_i x_i$
$= \frac{1}{100}(5745)$
$= 57.45$
Mode ($M_o$)
The modal class is class 4.
$k$ is the index of the modal class, i.e. $k = 4$.
$d_1 = n_4 - n_3 = 31 - 20 = 11$
$d_2 = n_4 - n_5 = 31 - 19 = 12$
$M_o = L_k + w \left( \frac{d_1}{d_1 + d_2} \right)$
$= 55 + 5 \left( \frac{11}{11 + 12} \right)$
$= 57.39$
Median ($M_d$)
Find the median class from the table using the following procedure.
*The modal class will not always be the median class; in our example, it is just a
coincidence.
We find the median class by comparing $\frac{n}{2}$ with $CF_i$. Since $n = 100$, $\frac{n}{2} = 50$. So the median
class is the first class whose cumulative frequency reaches 50,
that is
Class 1: CF = 4
Class 2: CF = 15
Class 3: CF = 35
Class 4: CF = 66
Thus class 4 is the median class because its cumulative frequency reaches 50 for the first time.
$\therefore k = 4$
$M_d = L_k + \frac{w}{n_k} \left( \frac{n}{2} - CF_{k-1} \right)$
$= L_4 + \frac{w}{n_4} \left( \frac{n}{2} - CF_3 \right)$
$= 55 + \frac{5}{31} \left( \frac{100}{2} - 35 \right)$
$= 57.42$
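The grouped mode and median formulas can be sketched as two small Python functions and checked against classes 1 to 7 of the table. The function names are illustrative, and treating a missing neighbouring class as having frequency 0 is an assumption of this sketch (the notes achieve the same effect with the empty classes 0 and 8).

```python
def grouped_mode(lower_limits, freq, width):
    """M_o = L_k + w * d1 / (d1 + d2), with k the modal class."""
    k = freq.index(max(freq))
    before = freq[k - 1] if k > 0 else 0             # assumption: 0 if no class before
    after = freq[k + 1] if k < len(freq) - 1 else 0  # assumption: 0 if no class after
    d1 = freq[k] - before
    d2 = freq[k] - after
    return lower_limits[k] + width * d1 / (d1 + d2)


def grouped_median(lower_limits, freq, width):
    """M_d = L_k + (w / n_k) * (n/2 - CF_{k-1}), with k the median class."""
    n = sum(freq)
    cum = 0  # cumulative frequency of the classes before the current one
    for lo, f in zip(lower_limits, freq):
        if cum + f >= n / 2:  # first class whose CF reaches n/2
            return lo + width / f * (n / 2 - cum)
        cum += f


lower_limits = [40, 45, 50, 55, 60, 65, 70]
freq = [4, 11, 20, 31, 19, 11, 4]
mode = grouped_mode(lower_limits, freq, 5)
med = grouped_median(lower_limits, freq, 5)
print(round(mode, 2), round(med, 2))  # 57.39 57.42
```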
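Variance ($S^2$)
$S^2 = \frac{1}{n-1} \sum_{i=1}^{c} n_i (X_i - \bar{X})^2$
$= \frac{1}{100-1}(4974.75)$
$= \frac{1}{99}(4974.75)$
$= 50.25$
Standard deviation
$S = \sqrt{S^2}$
$= \sqrt{50.25} \approx 7.09$
Coefficient of skewness
$S_k = \frac{3(\bar{X} - M_d)}{S}$
$= \frac{3(57.45 - 57.42)}{7.09}$
$\approx 0.013$
$\approx 0$ (rounded off to the nearest whole number)
The data are approximately symmetrical since $S_k \approx 0$.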
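The grouped variance, standard deviation and coefficient of skewness can be verified with the sketch below. It takes the class marks and frequencies of classes 1 to 7 from the table and the median-class values ($L_4 = 55$, $w = 5$, $n_4 = 31$, $CF_3 = 35$) from the worked solution; the variable names are illustrative.

```python
import math

marks = [42.5, 47.5, 52.5, 57.5, 62.5, 67.5, 72.5]
freq = [4, 11, 20, 31, 19, 11, 4]
n = sum(freq)

mean = sum(f * x for f, x in zip(freq, marks)) / n
sum_sq = sum(f * (x - mean) ** 2 for f, x in zip(freq, marks))
variance = sum_sq / (n - 1)
std_dev = math.sqrt(variance)

# Grouped median from the worked solution: L_4 + (w / n_4) * (n/2 - CF_3).
median = 55 + 5 / 31 * (n / 2 - 35)
skewness = 3 * (mean - median) / std_dev

print(round(sum_sq, 2), round(variance, 2), round(std_dev, 2), round(skewness, 3))
```

The skewness is very close to zero, which agrees with the conclusion that the data are approximately symmetrical.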
Q11 (worksheet)
Class i   Class limit [L_i − U_i[   Tally                             Frequency (n_i)   CF_i   C. relative frequency   Class mark (x_i)   n_i x_i   n_i (x_i − x̄)^2
1         [10 − 12[                 ||||| ||||                        9                 9      9/50                    11                 99        53.5824
2         [12 − 14[                 ||||| ||||| ||||| ||||| ||||| |   26                35     35/50                   13                 338       5.0336
3         [14 − 16[                 ||||| |||||                       10                45     45/50                   15                 150       24.336
4         [16 − 18[                 |||||                             5                 50     50/50                   17                 85        63.368
Total                                                                 50                                                                  672       146.32
Mean
$\bar{x} = \frac{1}{n} \sum_{i=1}^{c} n_i x_i$
$= \frac{1}{50}(672)$
$= \$13.44$
Variance
$S^2 = \frac{1}{n-1} \sum_{i=1}^{c} n_i (x_i - \bar{x})^2$
$= \frac{1}{50-1}(146.32)$
$= \frac{1}{49}(146.32)$
$= 2.986$
Mode
$M_o = L_k + w \left( \frac{d_1}{d_1 + d_2} \right)$
The modal class is class 2, thus $k = 2$ and $L_2 = 12$.
$d_1 = n_2 - n_1 = 26 - 9 = 17$
$d_2 = n_2 - n_3 = 26 - 10 = 16$
$M_o = 12 + 2 \left( \frac{17}{17 + 16} \right)$
$= \$13.03$
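Median
$M_d = L_k + \frac{w}{n_k} \left( \frac{n}{2} - CF_{k-1} \right)$
$CF_k \ge \frac{n}{2}$ for the first time in class 2, thus $k = 2$.
$L_k = 12$, $w = 2$, $n_k = 26$, $CF_{k-1} = CF_1 = 9$
$M_d = 12 + \frac{2}{26} \left( \frac{50}{2} - 9 \right)$
$= 12 + \frac{1}{13}(25 - 9)$
$= 13.23$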
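The Q11 results can be cross-checked with the following sketch, which takes the class marks and frequencies from the worksheet table and applies the grouped-data formulas directly. The variable names are illustrative.

```python
freq = [9, 26, 10, 5]        # n_i for classes [10-12[, [12-14[, [14-16[, [16-18[
marks = [11, 13, 15, 17]     # class marks x_i
width = 2
n = sum(freq)

products = [f * x for f, x in zip(freq, marks)]   # the n_i * x_i column
mean = sum(products) / n
sum_sq = sum(f * (x - mean) ** 2 for f, x in zip(freq, marks))
variance = sum_sq / (n - 1)

# Class 2 is both the modal class and the median class, so L_k = 12 for both.
d1 = freq[1] - freq[0]
d2 = freq[1] - freq[2]
mode = 12 + width * d1 / (d1 + d2)
median = 12 + width / freq[1] * (n / 2 - freq[0])   # CF_1 = n_1 = 9

print(products, sum(products))
print(round(mean, 2), round(variance, 3), round(mode, 2), round(median, 2))
```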
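13. Find the sample size $n$: $n = 300$.
To find a sample proportion, compute
$\frac{n(\text{SOT})}{\text{sample size}}$
i.e. sample proportion (SOT) $= \frac{n(\text{SOT})}{\text{sample size}} = \frac{100}{300} = \frac{1}{3} = 0.3\overline{3}$
sample proportion (female) $= \frac{n(\text{female})}{\text{sample size}} = \frac{82}{300} = \frac{41}{150} = 0.27\overline{3}$
sample proportion (degree) $= \frac{n(\text{degree})}{\text{sample size}} = \frac{235}{300} = 0.78\overline{3}$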
A probability should lie between 0 and 1
Probability is 1 when you are certain
When there is no chance of something happening, the probability is zero.
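The sample proportions in question 13 can be computed exactly with Python's `fractions` module. The dictionary keys simply reuse the labels from the worked answer (SOT, female, degree); the variable names are illustrative.

```python
from fractions import Fraction

sample_size = 300
counts = {"SOT": 100, "female": 82, "degree": 235}

# Each proportion is count / sample size, kept as an exact fraction.
proportions = {label: Fraction(count, sample_size) for label, count in counts.items()}

for label, p in proportions.items():
    print(label, p, round(float(p), 4))
```

Each proportion lies between 0 and 1, as a probability should.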