Biostatistics
Tutorials
Tutorial # 5
Data Summarization
Biostatistics
Raw data → Processing (Treatment):
1. Collect
2. Organize
3. Present
4. Summarize
5. Analyze
6. Interpret
Data Summarization
1. Measures of Central Tendency (= Averages)
Show the center of the data
2. Measures of Variability (= Measures of Dispersion (Scatter))
Show the spread of the data around the center
[Diagram: data values lying on both sides of the average]
Data Summarization
Measures of Central Tendency
1. Arithmetic Mean
2. Median
3. Mode
Measures of Variability
1. Range
2. Average Deviation
3. Variance
4. Standard Deviation
The Arithmetic Mean
The Mean = The Average = X̄
The most widely used measure of central tendency
Definition: The sum of all data values divided by the total number of values
The Arithmetic Mean
A. For raw data
Data: x1, x2, x3, …, xN
Σx = x1 + x2 + x3 + … + xN
N = Total No. of data
Mean (X̄) = Σx / N
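The raw-data rule (sum of the values divided by their number) can be sketched in a few lines of Python. This is only an illustration: the function name arithmetic_mean is an assumption, and the sample values 20, 35, 45, 60 are borrowed from the examples used elsewhere in this tutorial.

```python
def arithmetic_mean(values):
    """Return the sum of all data values divided by the number of values."""
    if not values:
        raise ValueError("at least one data value is required")
    return sum(values) / len(values)


data = [20, 35, 45, 60]          # example values from this tutorial
print(arithmetic_mean(data))     # 40.0
```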
The Arithmetic Mean
B. For data grouped in FDTs (frequency distribution tables)
Data: classes having frequencies (F) & midpoints (X)
ΣF.X = F1.X1 + F2.X2 + … etc.
N = ΣF = Total No. of data
Mean (X̄) = ΣF.X / N
The Arithmetic Mean
Class Interval    Class Midpoint (X)    Frequency (F)    F.X
120-              123                   8                984
126-              129                   9                1161
132-              135                   12               1620
138-              141                   10               1410
144-150           147                   11               1617
Total                                   N = 50           ΣF.X = 6792
Mean = ΣF.X / N = 6792 / 50 = 135.84
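The grouped-data calculation can be checked with a short Python sketch that multiplies each class midpoint by its frequency and divides the total by N. The function name grouped_mean is illustrative; the midpoints and frequencies are the ones tabulated above.

```python
def grouped_mean(midpoints, frequencies):
    """Mean of grouped data: sum of F*X divided by the total frequency N."""
    if len(midpoints) != len(frequencies):
        raise ValueError("each class needs one midpoint and one frequency")
    n = sum(frequencies)
    if n == 0:
        raise ValueError("total frequency must be greater than zero")
    sum_fx = sum(f * x for f, x in zip(frequencies, midpoints))
    return sum_fx / n


midpoints = [123, 129, 135, 141, 147]
frequencies = [8, 9, 12, 10, 11]
print(grouped_mean(midpoints, frequencies))  # 135.84
```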
The Arithmetic Mean
Set A: 20, 35, 45, 60 (mean = 40)
Set B: 38, 39, 41, 42 (mean = 40)
The mean summarizes the data in one number
The mean shows the center of the data
The mean does not show the spread of the data around the center (cf. measures of variability)
Although the mean describes all the data, it is usually not equal to any of the data values (cf. the median: the middle value of the data)
The Median
The Median = The Middle
The value that divides the data into 2 equal halves after the values are arranged in order
It may be the middle value or the average of the 2 middle values (depending on whether the No. of data is odd or even)
The Median
If N is the No. of data:
If N is odd: MD is the middle value
If N is even: MD is the average of the 2 middle values
The Median
A. Odd No. of Data
Calculate the median for the following data: 142, 116, 166, 128, 138
Solution
A. Arrange the data in ascending order
116 128 138 142 166
B. Position of MD = (N+1)/2 = (5+1)/2 = 3
C. Median = the 3rd value = 138
The Median
B. Even No. of Data
Calculate the median for the following data: 111, 126, 117, 146, 130, 149
Solution
A. Arrange the data in ascending order
111 117 126 130 146 149
B. Position of MD = (N+1)/2 = (6+1)/2 = 3.5, i.e. midway between the 3rd & 4th values (126 & 130)
C. Median = Average of 126 & 130
MD = (126 + 130)/2 = 128
N.B.: 128 is not one of the data values
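Both worked examples follow the same steps: sort, locate the middle, and average the two middle values when N is even. A minimal Python sketch of those steps is given below; the function name median is illustrative, and the 1-based position (N+1)/2 used in the slides corresponds to the 0-based index N // 2 for odd N.

```python
def median(values):
    """Middle value of the sorted data, or the average of the 2 middle values."""
    ordered = sorted(values)           # step A: arrange in ascending order
    n = len(ordered)
    if n == 0:
        raise ValueError("at least one data value is required")
    mid = n // 2                       # 0-based index of the upper middle value
    if n % 2 == 1:                     # odd N: a single middle value exists
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2   # even N: average the pair


print(median([142, 116, 166, 128, 138]))        # 138
print(median([111, 126, 117, 146, 130, 149]))   # 128.0
```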
The Mean Versus The Median
The Mean: Affected by extremes (outliers)
The Median: Not affected by extremes (outliers)
Example
5, 6, 7, 8, 9, 10, 100
Mean = 145/7 = 20.7
Median = 8
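The effect of the outlier can be seen by computing both measures with and without the value 100. The sketch below uses Python's standard statistics module; dropping the outlier is an extra illustration added here, not a step from the slides.

```python
import statistics

data = [5, 6, 7, 8, 9, 10, 100]
without_outlier = data[:-1]            # same data with the value 100 removed

print(round(statistics.mean(data), 1))        # 20.7
print(statistics.median(data))                # 8
print(statistics.mean(without_outlier))       # 7.5
print(statistics.median(without_outlier))     # 7.5
```

Removing the single extreme value moves the mean from 20.7 to 7.5, while the median only moves from 8 to 7.5.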
The Mode
The most frequent value
The data may have:
No mode
≥ 1 mode:
1 mode (Unimodal)
2 modes (Bimodal)
> 2 modes (Multimodal)
The Mode
5, 8, 6, 5, 9, 5, 3    Mode = 5 (Unimodal)
5, 8, 6, 5, 9, 8, 3    Mode = 5, 8 (Bimodal)
5, 8, 6, 5, 9, 8, 6    Mode = 5, 8, 6 (Multimodal)
5, 8, 6, 3, 2, 4, 9    No mode
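The four cases above can be reproduced with a small Python sketch that counts how often each value occurs. The function name modes is illustrative, and it assumes the convention used in these examples: when every value occurs only once, the data are reported as having no mode.

```python
from collections import Counter


def modes(values):
    """Return the most frequent value(s); an empty list means no mode."""
    if not values:
        return []
    counts = Counter(values)
    highest = max(counts.values())
    if highest == 1:                   # assumption: all values unique -> no mode
        return []
    # keep every value that reaches the highest frequency, in order of first appearance
    return [value for value, count in counts.items() if count == highest]


print(modes([5, 8, 6, 5, 9, 5, 3]))   # [5]
print(modes([5, 8, 6, 5, 9, 8, 3]))   # [5, 8]
print(modes([5, 8, 6, 5, 9, 8, 6]))   # [5, 8, 6]
print(modes([5, 8, 6, 3, 2, 4, 9]))   # []
```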
The Mode
[Figure: examples of unimodal, bimodal, and no-mode distributions]
The Mean, Median & Mode
Symmetric Distribution: Mean = Median = Mode
Positively-skewed Distribution: Mode < Median < Mean
Negatively-skewed Distribution: Mean < Median < Mode
Measures of Variability
Set A: 20, 35, 45, 60 (mean = 40)
Set B: 38, 39, 41, 42 (mean = 40)
Measures of central tendency alone are not enough to completely describe the data
Why?
They show the center of the data but do not show the spread of the data around the center
Thus both types of measures are used together
The Range
The simplest measure of variability
The difference between the maximum & the minimum
Range = Xmax – Xmin
Example: 20, 35, 45, 60 (Xmin = 20, Xmax = 60), Range = 60 – 20 = 40
The Range
Disadvantage
It depends only on the 2 extreme values (maximum & minimum) & does not take into account the other values
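A short Python sketch makes the disadvantage concrete. The function name data_range is illustrative, and the second data set is a hypothetical one invented here only to show that very different middle values can give the same range.

```python
def data_range(values):
    """Range = maximum value minus minimum value."""
    if not values:
        raise ValueError("at least one data value is required")
    return max(values) - min(values)


print(data_range([20, 35, 45, 60]))   # 40
print(data_range([20, 40, 40, 60]))   # 40 (hypothetical set: same extremes, different middle)
print(data_range([38, 39, 41, 42]))   # 4
```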
The Average Deviation
= The Mean Deviation
Deviation (d) = Absolute difference between any value [X] & the mean [X̄], i.e. d = |X – X̄| (absolute, so it is never negative)
The Average Deviation
Why absolute differences?
Because the sum of the signed differences from the mean, Σ(X – X̄), is always zero, so their average would always be zero as well
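This cancellation can be verified numerically. The sketch below uses the values 20, 35, 45, 60 from this tutorial; the variable names are illustrative.

```python
data = [20, 35, 45, 60]
mean = sum(data) / len(data)                 # 40.0

signed = [x - mean for x in data]            # deviations keeping their sign
absolute = [abs(d) for d in signed]          # deviations with the sign dropped

print(signed)          # [-20.0, -5.0, 5.0, 20.0]
print(sum(signed))     # 0.0  (negative and positive deviations cancel)
print(sum(absolute))   # 50.0 (usable as a measure of spread)
```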
The Average Deviation
Rules
Average Deviation = Σd / N = Σ|X – X̄| / N
The average deviation is a more precise measure of variability than the range
The Average Deviation
Calculate the average deviation for each set of data
Set A: 20, 35, 45, 60 (mean = 40)
Set B: 38, 39, 41, 42 (mean = 40)
Then compare the 2 sets of data and write your comment
The Average Deviation
Set A                        Set B
X      X̄      d              X      X̄      d
20     40     20             38     40     2
35     40     5              39     40     1
45     40     5              41     40     1
60     40     20             42     40     2
Σ      ---    50             Σ      ---    6
Average Deviation = 50/4 = 12.5    Average Deviation = 6/4 = 1.5
Comment: both sets have the same mean (40), but Set A is far more variable than Set B
The average deviation increases with the degree of variability (dispersion; scatter) of the values
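The two results can be reproduced with a minimal Python sketch that averages the absolute deviations from the mean, dividing by N as in the worked example. The function name average_deviation is illustrative.

```python
def average_deviation(values):
    """Mean of the absolute deviations |X - mean|, divided by N."""
    n = len(values)
    if n == 0:
        raise ValueError("at least one data value is required")
    mean = sum(values) / n
    return sum(abs(x - mean) for x in values) / n


print(average_deviation([20, 35, 45, 60]))   # 12.5
print(average_deviation([38, 39, 41, 42]))   # 1.5
```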
The Variance (S²)
The average of the squared deviations of the values from the mean
Formula that needs the mean:
S² = Σ(X – X̄)² / (N – 1)
Formula with no need of the mean:
S² = [ΣX² – (ΣX)²/N] / (N – 1)
(N – 1) = Degrees of freedom = No. of data minus 1
The variance increases as variability increases
The Variance (S²)
Calculate the variance of 10, 11, 13, 18, 17, 16, 20 (N = 7)
Mean = (10+11+13+18+17+16+20)/7 = 105/7 = 15
X      X̄      d      d²
10     15     -5     25
11     15     -4     16
13     15     -2     4
18     15     3      9
17     15     2      4
16     15     1      1
20     15     5      25
Σ 105  ---    ---    84
S² = Σd² / (N – 1) = 84/6 = 14
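The table above corresponds to the following Python sketch, which squares each deviation from the mean and divides the total by N - 1. The function name sample_variance is illustrative.

```python
def sample_variance(values):
    """Sum of squared deviations from the mean, divided by N - 1."""
    n = len(values)
    if n < 2:
        raise ValueError("at least two data values are required")
    mean = sum(values) / n
    squared_deviations = [(x - mean) ** 2 for x in values]
    return sum(squared_deviations) / (n - 1)


data = [10, 11, 13, 18, 17, 16, 20]
print(sample_variance(data))   # 14.0
```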
The Variance (S²)
X      X²
10     100
11     121
13     169
18     324
17     289
16     256
20     400
Σ 105  1659
S² = [1659 – (105)²/7] / 6 = [1659 – 1575] / 6 = 84/6 = 14
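The same result follows from the sums ΣX and ΣX² alone, without computing the mean first. A minimal Python sketch, with the illustrative function name sample_variance_from_sums:

```python
def sample_variance_from_sums(values):
    """Variance from the sum of X and the sum of X squared, divided by N - 1."""
    n = len(values)
    if n < 2:
        raise ValueError("at least two data values are required")
    sum_x = sum(values)                      # 105 for the example data
    sum_x2 = sum(x * x for x in values)      # 1659 for the example data
    return (sum_x2 - sum_x ** 2 / n) / (n - 1)


data = [10, 11, 13, 18, 17, 16, 20]
print(sample_variance_from_sums(data))   # 14.0
```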
The Standard Deviation (SD)
The square root of the variance: SD = √S²
Notes
Average deviation, variance & SD can never be negative
When the No. of data > 30, (N) can be used instead of (N–1) in the calculation of S² & SD
Why?
Because the result does not differ appreciably in this case
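To see how little the choice of divisor matters for larger N, the sketch below compares the two versions of the SD using Python's standard statistics module (stdev divides by N - 1, pstdev divides by N). The data are the variance example from this tutorial; the helper sd_ratio is illustrative and relies on the identity SD(N) / SD(N - 1) = √((N - 1)/N).

```python
import math
import statistics

data = [10, 11, 13, 18, 17, 16, 20]

sd_n_minus_1 = statistics.stdev(data)    # divides by N - 1
sd_n = statistics.pstdev(data)           # divides by N

print(round(sd_n_minus_1, 3))   # 3.742
print(round(sd_n, 3))           # 3.464


def sd_ratio(n):
    """Ratio of the SD computed with N to the SD computed with N - 1."""
    return math.sqrt((n - 1) / n)


print(round(sd_ratio(7), 3))    # 0.926
print(round(sd_ratio(31), 3))   # 0.984
```

With 7 values the two results differ by about 7%; with 31 values the difference is under 2%.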
Notes
The mean & the median each have one (unique) value
The mode may be none, one, or more
The mean may be truly representative of the data (valid) or not truly representative of the data (invalid)
Results of any experiment are susceptible to variability; accuracy is inversely proportional to the degree of variability
Notes
If all data are identical
There will be no variability
Range = Average deviation = Variance = SD = Zero
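This note can be checked with a short Python sketch. The data set of four identical values is hypothetical, chosen only for illustration, and the variable names are assumptions.

```python
import math

data = [40, 40, 40, 40]                      # hypothetical set of identical values
n = len(data)
mean = sum(data) / n

value_range = max(data) - min(data)
average_deviation = sum(abs(x - mean) for x in data) / n
variance = sum((x - mean) ** 2 for x in data) / (n - 1)
standard_deviation = math.sqrt(variance)

# All four measures of variability are zero for identical data
print(value_range, average_deviation, variance, standard_deviation)
```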
Thank You