Data Summarization

This document provides a tutorial on data summarization in biostatistics, focusing on measures of central tendency (mean, median, mode) and measures of variability (range, average deviation, variance, standard deviation). It explains how to calculate these measures and their significance in describing data distributions. The tutorial emphasizes the importance of both central tendency and variability for a comprehensive understanding of data.

Biostatistics

Tutorials
Tutorial # 5
Data Summarization
Biostatistics
Raw data go through processing (treatment) in these steps:
1. Collect
2. Organize
3. Present
4. Summarize
5. Analyze
6. Interpret
Data Summarization
Measures of Central Tendency (= Averages): show the center of the data
Measures of Variability (= Measures of Dispersion, Scatter): show the spread of the data around the center

Data Summarization
Measures of Central Tendency
1. Arithmetic Mean
2. Median
3. Mode
Measures of Variability
1. Range
2. Average Deviation
3. Variance
4. Standard Deviation
The Arithmetic Mean
The Mean = The Average = $\bar{x}$
The most widely used measure of central tendency
Definition: the sum of all data values divided by the total number of values
The Arithmetic Mean
A. For raw data
Data: $x_1, x_2, x_3, \dots, x_N$
$\bar{x} = \frac{\sum x}{N}$
$\sum x = x_1 + x_2 + x_3 + \dots + x_N$
N = total number of data values
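
The raw-data formula can be sketched in a few lines of Python. This is only an illustration: the function name `arithmetic_mean` is an assumption, and the two data sets are the ones this tutorial uses in its later slides.

```python
def arithmetic_mean(values):
    """Return the sum of all values divided by the number of values."""
    if not values:
        raise ValueError("the mean of an empty data set is undefined")
    return sum(values) / len(values)


first_set = [20, 35, 45, 60]
second_set = [38, 39, 41, 42]

print(arithmetic_mean(first_set))   # 40.0
print(arithmetic_mean(second_set))  # 40.0
```

Both sets have the same mean even though their values are spread very differently, which is why the mean is later paired with a measure of variability.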
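The Arithmetic Mean
B. For data grouped in FDTs (frequency distribution tables)
Data: classes having frequencies (F) and midpoints (X)
$\bar{x} = \frac{\sum F \cdot X}{N}$
$\sum F \cdot X = F_1 X_1 + F_2 X_2 + \dots$
N = total number of data values (the sum of the frequencies)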
The Arithmetic Mean
Class interval   Midpoint (X)   Frequency (F)   F.X
120-             123            8               984
126-             129            9               1161
132-             135            12              1620
138-             141            10              1410
144-150          147            11              1617
Total                           N = 50          ΣF.X = 6792
Mean = ΣF.X / N = 6792 / 50 = 135.84
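
As a check on the grouped-data calculation, here is a minimal Python sketch that uses the midpoints and frequencies from the table. The function name `grouped_mean` is an assumption made for illustration.

```python
def grouped_mean(midpoints, frequencies):
    """Mean of grouped data: sum of F*X divided by the total frequency N."""
    if len(midpoints) != len(frequencies):
        raise ValueError("each class needs one midpoint and one frequency")
    n = sum(frequencies)
    sum_fx = sum(f * x for f, x in zip(frequencies, midpoints))
    return sum_fx / n


midpoints = [123, 129, 135, 141, 147]
frequencies = [8, 9, 12, 10, 11]

print(sum(frequencies))                      # 50
print(grouped_mean(midpoints, frequencies))  # 135.84
```

The result is an approximation of the mean of the underlying raw data, because every value in a class is treated as if it sat at the class midpoint.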
The Arithmetic Mean
First set: 20, 35, 45, 60 (mean = 40)
Second set: 38, 39, 41, 42 (mean = 40)
The mean summarizes the data in one number
The mean shows the center of the data
The mean does not show the spread of the data around the center (cf. measures of variability)
Although it describes all the data, the mean is usually not one of the data values (cf. the median, which can be the middle value of the data)
The Median
The Median = The Middle
The value that divides the data into 2 equal halves after they are arranged in order
It may be the middle value or the average of the 2 middle values, depending on whether the number of data values (N) is odd or even:
If N is odd, the median (MD) is the middle value
If N is even, the median (MD) is the average of the 2 middle values
The Median
A. Odd No. of Data
Calculate the median for the following data: 142, 116, 166, 128, 138
Solution
A. Arrange the data in ascending order: 116, 128, 138, 142, 166
B. Position of MD = (N+1)/2 = (5+1)/2 = 3
C. Median = the 3rd value = 138
The Median
B. Even No. of Data
Calculate the median for the following data: 111, 126, 117, 146, 130, 149
Solution
A. Arrange the data in ascending order: 111, 117, 126, 130, 146, 149
B. Position of MD = (N+1)/2 = (6+1)/2 = 3.5, i.e. halfway between the 3rd value (126) and the 4th value (130)
C. Median = average of 126 and 130
MD = (126 + 130)/2 = 128
N.B.: 128 is not one of the data values
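
The two median cases can be sketched in Python as follows. The function name `median` is chosen for illustration; the data are the two worked examples from the slides.

```python
def median(values):
    """Middle value of the sorted data, or the average of the 2 middle values."""
    if not values:
        raise ValueError("the median of an empty data set is undefined")
    ordered = sorted(values)        # step A: arrange in ascending order
    n = len(ordered)
    middle = n // 2                 # zero-based index of the upper middle value
    if n % 2 == 1:                  # odd N: position (N+1)/2 is a whole number
        return ordered[middle]
    # even N: position (N+1)/2 ends in .5, so average the 2 neighbouring values
    return (ordered[middle - 1] + ordered[middle]) / 2


print(median([142, 116, 166, 128, 138]))       # 138
print(median([111, 126, 117, 146, 130, 149]))  # 128.0
```

Python lists are indexed from zero, so the 3rd value in the slides corresponds to index 2 in the code.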


The Mean Versus The Median
The mean is affected by extreme values (outliers)
The median is not affected by extreme values (outliers)
Example: 5, 6, 7, 8, 9, 10, 100
Mean = 145/7 ≈ 20.7, Median = 8
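
The effect of an outlier can be seen by computing both measures with and without the extreme value. The sketch below uses Python's standard `statistics` module; dropping the value 100 is an extra comparison added here for illustration and is not part of the original slide.

```python
import statistics

with_outlier = [5, 6, 7, 8, 9, 10, 100]
without_outlier = [5, 6, 7, 8, 9, 10]   # same data with the extreme value removed

mean_with = statistics.mean(with_outlier)
median_with = statistics.median(with_outlier)
mean_without = statistics.mean(without_outlier)
median_without = statistics.median(without_outlier)

print(round(mean_with, 1), median_with)   # 20.7 8
print(mean_without, median_without)       # 7.5 7.5
```

One extreme value moves the mean from 7.5 to about 20.7, while the median only moves from 7.5 to 8.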
The Mode
The most frequent value
The data may have:
No mode
1 mode (unimodal)
2 modes (bimodal)
More than 2 modes (multimodal)
The Mode
5, 8, 6, 5, 9, 5, 3    Mode = 5 (unimodal)
5, 8, 6, 5, 9, 8, 3    Modes = 5, 8 (bimodal)
5, 8, 6, 5, 9, 8, 6    Modes = 5, 8, 6 (multimodal)
5, 8, 6, 3, 2, 4, 9    No mode
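
A small Python sketch can reproduce the four examples. It assumes, as the last example suggests, that data in which no value repeats have no mode; the function name `modes` is hypothetical.

```python
from collections import Counter


def modes(values):
    """Return the most frequent value(s), or an empty list if nothing repeats."""
    counts = Counter(values)
    if not counts:
        return []
    highest = max(counts.values())
    if highest == 1:                # assumption: no repeated value means no mode
        return []
    return sorted(value for value, count in counts.items() if count == highest)


print(modes([5, 8, 6, 5, 9, 5, 3]))  # [5]
print(modes([5, 8, 6, 5, 9, 8, 3]))  # [5, 8]
print(modes([5, 8, 6, 5, 9, 8, 6]))  # [5, 6, 8]
print(modes([5, 8, 6, 3, 2, 4, 9]))  # []
```

Returning a list rather than a single number reflects the point that the mode, unlike the mean and median, may be none, one, or several values.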
The Mode
[Figure: distribution shapes labeled Unimodal, Bimodal, and No mode]
The Mean, Median & Mode
Symmetric distribution: Mean = Median = Mode
Positively skewed distribution: Mode < Median < Mean
Negatively skewed distribution: Mean < Median < Mode
Measures of Variability
First set: 20, 35, 45, 60 (mean = 40)
Second set: 38, 39, 41, 42 (mean = 40)
Measures of central tendency alone are not enough to describe the data completely
Why? They show the center of the data but not the spread of the data around the center
Thus both kinds of measure are used together
The Range
The simplest measure of variability
The difference between the maximum and the minimum values
Range = Xmax - Xmin
Example: 20, 35, 45, 60 gives Xmin = 20, Xmax = 60, Range = 60 - 20 = 40
Disadvantage
It depends only on the 2 extreme values (maximum and minimum) and does not take the other values into account
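
The range and its main weakness can be sketched in Python. The function name `data_range` and the third data set are hypothetical, introduced only to show that the values between the extremes do not affect the result.

```python
def data_range(values):
    """Range = maximum value minus minimum value."""
    if not values:
        raise ValueError("the range of an empty data set is undefined")
    return max(values) - min(values)


print(data_range([20, 35, 45, 60]))  # 40
print(data_range([38, 39, 41, 42]))  # 4

# Hypothetical set: same extremes as the first set, different middle values.
print(data_range([20, 40, 40, 60]))  # 40
```

The first and third sets share a range of 40 although their inner values differ, which is the disadvantage described above.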
The Average Deviation
= The Mean Deviation
Deviation (d) = the absolute difference between any value ($x$) and the mean ($\bar{x}$): $d = |x - \bar{x}|$ (absolute, i.e. taken without its sign, so it is never negative)
The Average Deviation
Why absolute differences?
Because the signed differences from the mean always sum to zero, $\sum (x - \bar{x}) = 0$, so their average would always be zero
The Average Deviation
Rules
Average deviation = $\frac{\sum d}{N}$
The average deviation is a more precise measure of variability than the range
The Average Deviation
Calculate the average deviation for each set of data:
First set: 20, 35, 45, 60 (mean = 40)
Second set: 38, 39, 41, 42 (mean = 40)
Then compare the 2 sets of data and write your comment
The Average Deviation
First set                 Second set
X     Mean   d            X     Mean   d
20    40     20           38    40     2
35    40     5            39    40     1
45    40     5            41    40     1
60    40     20           42    40     2
Σ     ---    50           Σ     ---    6
Average deviation = 50/4 = 12.5      Average deviation = 6/4 = 1.5
The greater the variability (dispersion, scatter) of the values, the larger the average deviation
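
The calculation for both sets can be sketched in Python. The function name `average_deviation` is an assumption; the sketch also prints the sum of the signed deviations to show why absolute values are needed.

```python
def average_deviation(values):
    """Mean of the absolute deviations of the values from their mean."""
    n = len(values)
    if n == 0:
        raise ValueError("the average deviation of an empty data set is undefined")
    mean = sum(values) / n
    return sum(abs(x - mean) for x in values) / n


first_set = [20, 35, 45, 60]
second_set = [38, 39, 41, 42]

print(average_deviation(first_set))   # 12.5
print(average_deviation(second_set))  # 1.5

# Signed deviations cancel out, so their sum is zero.
mean_first = sum(first_set) / len(first_set)
print(sum(x - mean_first for x in first_set))  # 0.0
```

Both sets have a mean of 40, but the first set is far more scattered, and the average deviation reflects that (12.5 versus 1.5).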
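The Variance ($S^2$)
The average of the squared deviations of the values from the mean
Formula that needs the mean: $S^2 = \frac{\sum (x - \bar{x})^2}{N - 1}$
Formula that does not need the mean: $S^2 = \frac{\sum x^2 - \frac{(\sum x)^2}{N}}{N - 1}$
(N-1) = degrees of freedom = number of data values minus 1
The greater the variability, the larger the variance
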
The Variance ($S^2$)
Calculate the variance of 10, 11, 13, 18, 17, 16, 20 (N = 7)
Mean = (10+11+13+18+17+16+20)/7 = 105/7 = 15
X      Mean   d     $d^2$
10     15     -5    25
11     15     -4    16
13     15     -2    4
18     15     3     9
17     15     2     4
16     15     1     1
20     15     5     25
Σ 105  ---    ---   84
$S^2$ = 84/6 = 14
The Variance ($S^2$)
X      $X^2$
10     100
11     121
13     169
18     324
17     289
16     256
20     400
Σ 105  1659
$S^2$ = [1659 - (105)^2/7]/6 = (1659 - 1575)/6 = 84/6 = 14
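
Both ways of computing the variance can be checked against each other in Python. The function names below are hypothetical; each divides by N - 1, as in the worked example.

```python
def variance_with_mean(values):
    """Sum of squared deviations from the mean, divided by N - 1."""
    n = len(values)
    if n < 2:
        raise ValueError("at least 2 values are needed to divide by N - 1")
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - 1)


def variance_without_mean(values):
    """[sum of x^2 minus (sum of x)^2 / N], divided by N - 1."""
    n = len(values)
    if n < 2:
        raise ValueError("at least 2 values are needed to divide by N - 1")
    sum_x = sum(values)
    sum_x2 = sum(x * x for x in values)
    return (sum_x2 - sum_x ** 2 / n) / (n - 1)


data = [10, 11, 13, 18, 17, 16, 20]

print(sum(data), sum(x * x for x in data))  # 105 1659
print(variance_with_mean(data))             # 14.0
print(variance_without_mean(data))          # 14.0
```

The second form is convenient for hand calculation because it needs only the two column totals. With floating-point numbers it can lose accuracy when the values are large and close together, so the first form is usually the safer choice in code.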
The Standard Deviation (SD)
The square root of the variance: $SD = \sqrt{S^2}$
Notes
Average deviation, variance and SD cannot be negative
When the number of data values is > 30, N can be used instead of (N-1) in the calculation of $S^2$ and SD
Why? Because the result does not differ appreciably in this case
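
The SD of the variance example, and the shrinking gap between dividing by N and by N - 1, can be sketched with Python's standard `statistics` module (`stdev` divides by N - 1, `pstdev` divides by N). The helper `sd_ratio` is a hypothetical name introduced for this illustration.

```python
import math
import statistics

data = [10, 11, 13, 18, 17, 16, 20]

sd_n_minus_1 = statistics.stdev(data)   # square root of 84/6
sd_n = statistics.pstdev(data)          # square root of 84/7

print(round(sd_n_minus_1, 2))  # 3.74
print(round(sd_n, 2))          # 3.46


def sd_ratio(n):
    """Ratio of the SD computed with N to the SD computed with N - 1."""
    return math.sqrt((n - 1) / n)


for n in (7, 31, 100):
    print(n, round(sd_ratio(n), 3))
```

The ratio depends only on N and moves toward 1 as N grows: about 0.926 for N = 7 and about 0.984 for N = 31, which is the reasoning behind the N > 30 note above.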
Notes
The mean and the median are each a single (unique) value
The mode may be none, one, or more than one value
The mean may be truly representative of the data (valid) or not truly representative of the data (invalid)
The results of any experiment are susceptible to variability; the greater the variability, the lower the accuracy
Notes
If all data values are identical, there is no variability:
Range = Average deviation = Variance = SD = 0

