
B. Biostatistics (Descriptive Statistics)

The document provides an overview of descriptive statistics, including measures of central tendency (mean, median, mode) and measures of spread (range, quartiles, variance, standard deviation). It explains how to calculate these statistics and their relevance in summarizing data, as well as the importance of understanding ratios, proportions, and rates in public health. Additionally, it discusses the appropriate use of these measures depending on the type of data being analyzed.

Uploaded by

charlesachuti25
Copyright
© All Rights Reserved

Descriptive Statistics

• These types of statistics are used to describe and summarise data in such a way that significant patterns may be revealed.
• They cannot be used to test hypotheses.
Examples of descriptive statistics

• Ratios, e.g. measures of morbidity, mortality and natality.
• Measures of central tendency, e.g. mean, mode and median.
• Measures of dispersion, e.g. range, interquartile range and standard deviation.
Ratios, Proportions, and Rates
• Ratios are simply expressions of one measure relative to another. There are several types of ratios that are frequently used in public health:
1. Ratios
2. Proportions
3. Rates
Ratios
• Consider a population of 20 male patients and 80 female patients.
• The ratio of men to women = 20:80, or 20/80.
• This can be simplified to a 1:4 ratio (or 1/4). This indicates that for every man, there are four women.
• This could also be considered from the inverse perspective, i.e., the number of women relative to the number of men = 80/20, which is equivalent to 4 to 1, i.e., there are four women for every man.
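As a quick check of the arithmetic, the sketch below reduces a ratio to its lowest terms by dividing both parts by their greatest common divisor. It is a minimal Python illustration; the helper name `simplify_ratio` is hypothetical.

```python
from math import gcd

def simplify_ratio(a: int, b: int) -> tuple[int, int]:
    """Reduce the ratio a:b to lowest terms by dividing both parts by their GCD."""
    d = gcd(a, b)
    return a // d, b // d

men, women = 20, 80
print("men:women =", simplify_ratio(men, women))  # (1, 4): one man for every four women
print("women:men =", simplify_ratio(women, men))  # (4, 1): the inverse perspective
```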
Proportions

• A proportion is a type of ratio that relates a part to a whole. In the example given, of 20 men and 80 women, the total population size is 100, and the proportion of men is 20/100, or 20%.
• The proportion of women is 80/100, or 80%. In both of these proportions, the size of one gender group is being related to the size of the entire population.
• Prevalence is a type of proportion.
Proportions Cont’d

• All fractions, including proportions, are ratios. But only ratios in which the numerator is included in the denominator are proportions.
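A minimal Python sketch of the same calculation, using the 20 men and 80 women from the example. The helper `proportion` is a hypothetical name; its check that the part lies between 0 and the whole reflects the rule that the numerator must be included in the denominator.

```python
def proportion(part: int, whole: int) -> float:
    """Return part/whole; meaningful only when the part is included in the whole."""
    if not 0 <= part <= whole:
        raise ValueError("a proportion requires 0 <= part <= whole")
    return part / whole

men, women = 20, 80
total = men + women  # 100: the whole population

print(f"men:   {proportion(men, total):.0%}")    # 20%
print(f"women: {proportion(women, total):.0%}")  # 80%

# By contrast, men/women = 0.25 is a ratio but NOT a proportion,
# because the numerator (men) is not included in the denominator (women).
```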
Rates

• Rates are a special type of ratio that incorporates the dimension of time into the denominator. Familiar examples include measurements of speed (miles per hour) or water flow (gallons per minute).
• Example: the incidence rate, or incidence density, is a measure of the frequency of a health outcome that is closer to a true rate, because time at risk is part of its denominator.
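The slide gives no numbers for incidence, so the sketch below uses a hypothetical cohort (12 new cases over 4,000 person-years) to show how time enters the denominator. It assumes person-time is recorded in person-years; the function name `incidence_rate` is illustrative only.

```python
def incidence_rate(new_cases: int, person_time: float) -> float:
    """New cases divided by the total time at risk (e.g. person-years)."""
    if person_time <= 0:
        raise ValueError("person-time at risk must be positive")
    return new_cases / person_time

# Hypothetical cohort: 12 new cases observed over 4,000 person-years at risk.
rate = incidence_rate(12, 4000)
print(f"{rate:.3f} cases per person-year")                # 0.003
print(f"{rate * 1000:.1f} cases per 1,000 person-years")  # 3.0
```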
Measures of Central Tendency (MCT)
• These measures provide a numerical summary of the important characteristics of the distribution of a variable.
• Such summaries are necessary for precise and efficient comparisons of different sets of data.
• The most informative summary measures for quantitative data are the mean, median and mode.
Mean, Median, Mode
1. MEAN: the sum of the numbers divided by n.
2. MEDIAN: the middle number when the numbers are ordered. If the set has an even number of values, the median is the average of the two middle numbers.
3. MODE: the most frequent number. A data set can be unimodal, bimodal or trimodal.
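The three definitions can be tried out with Python's standard `statistics` module. The data values below are hypothetical, chosen to show an odd-sized set, an even-sized set (where the median averages the two middle numbers) and a bimodal set.

```python
import statistics

odd = [3, 7, 7, 2, 9]       # hypothetical values, n = 5 (odd)
even = [3, 7, 7, 2, 9, 4]   # hypothetical values, n = 6 (even)

print(statistics.mean(odd))    # sum 28 / 5 = 5.6
print(statistics.median(odd))  # sorted: 2 3 7 7 9 -> middle value 7
print(statistics.median(even)) # sorted: 2 3 4 7 7 9 -> (4 + 7) / 2 = 5.5
print(statistics.multimode([1, 2, 2, 3, 3, 4]))  # bimodal: [2, 3]
```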
• The appropriate MCT depends on the type of data.
• Continuous data, e.g. height: use the mean, e.g. 'mean height is 32.5 cm'. The mode is not a good measure here because it may not exist.
• Discrete data, e.g. number of children: use the mode or median. This avoids situations where the mean number of children is reported as '2.3'!
• Categorical data, e.g. colour of houses sold: use the mode, e.g. 'White is the most common house colour'.
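The sketch below, using hypothetical data, illustrates why the choice of MCT depends on the type of data: the mean of a discrete count can be a value that cannot actually occur, and only the mode applies to categories.

```python
import statistics

# Hypothetical discrete data: number of children in 10 households.
children = [0, 1, 2, 2, 2, 3, 3, 4, 2, 4]
print(statistics.mean(children))    # 2.3 -- not a possible number of children
print(statistics.median(children))  # 2.0
print(statistics.mode(children))    # 2

# Hypothetical categorical data: colours of houses sold.
colours = ["White", "Grey", "White", "Blue", "White", "Grey"]
print(statistics.mode(colours))     # White -- a mean cannot be computed for categories
```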
MEAN
• The arithmetic mean is the most common measure of central tendency.
• The symbol "μ" is used for the mean of a population. The symbol "M" or x̄ is used for the mean of a sample. The formulas are:
• μ = ΣXi/N (population mean)
• M or x̄ = ΣXi/n (sample mean)
• While the mean is the preferred MCT for continuous data, in some situations it is not the "best", e.g. when the data distribution is skewed. Here the median is the preferred MCT.
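To see why the median is preferred for skewed data, consider a hypothetical set of hospital stays containing one very long stay. This is an illustrative sketch only; the values are made up.

```python
import statistics

# Hypothetical right-skewed data: length of hospital stay (days) for 7 patients,
# including one unusually long stay.
stay_days = [2, 3, 3, 4, 4, 5, 42]
print(statistics.mean(stay_days))    # 9 -- pulled upward by the single long stay
print(statistics.median(stay_days))  # 4 -- still reflects the typical patient
```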
THE MEDIAN
• The median is the midpoint of a distribution: the same number of scores lies above the median as below it. If the data set contains an odd number of values, the median is the middle value; if it contains an even number, the median is the mean of the two middle values.
Calculating the Median
• Formula: (n + 1)/2 (this gives the position of the median, not its value)
• Step 1. Arrange the data set in ascending order.
• Step 2. Find the position of the median. If n is odd, the median is the value exactly at that position. If n is even, the position falls between two values; add these two values and divide by 2 to obtain the median.
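The two steps can be written as a short Python function. This is a sketch of the (n + 1)/2 position rule rather than a library routine; `median_by_position` is a hypothetical name, and the first call uses the nine values from the quartile example in these notes.

```python
def median_by_position(values):
    """Median via the (n + 1) / 2 position rule."""
    data = sorted(values)              # Step 1: ascending order
    n = len(data)
    if n == 0:
        raise ValueError("median of empty data")
    pos = (n + 1) / 2                  # Step 2: 1-based position of the median
    if pos.is_integer():               # odd n: the median sits exactly at pos
        return data[int(pos) - 1]
    lower = data[int(pos) - 1]         # even n: pos falls between two values,
    upper = data[int(pos)]             # so average the two middle values
    return (lower + upper) / 2

print(median_by_position([18, 20, 23, 20, 23, 27, 24, 23, 29]))  # position 5 -> 23
print(median_by_position([5, 1, 9, 3]))  # position 2.5 -> (3 + 5) / 2 = 4.0
```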
MODE
• The mode is the most frequently occurring value.
• With continuous data measured to many decimal places, the frequency of each value is usually one, since no two scores are likely to be exactly the same.
• The mode is not usually used because the largest frequency of scores might not be at the centre of the distribution.
• The only situation in which the mode may be preferred over the other two MCTs is when describing discrete categorical data.
• The mode is appropriate for describing data that are counted rather than measured.
• An advantage of the mode over the mean and median is that it can be found for both numerical and categorical data.
• Since more than one mode can occur in a data set (bimodal or multimodal), its ability to describe the centre using one summary value is limited.
• The mode can be a single summary value (50) or it can take the form of a modal class (50-57).
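A modal class is found by grouping values into class intervals and picking the most frequent interval. The sketch below assumes whole-number data and equal-width classes; the scores, the starting point (34) and the class width (8) are hypothetical, chosen so that the modal class matches the 50-57 example. `modal_class` is an illustrative name.

```python
from collections import Counter

def modal_class(values, start, width):
    """Return the (lower, upper) bounds of the most frequent class interval.

    Classes are [start, start+width-1], [start+width, start+2*width-1], ...,
    which assumes whole-number data (e.g. ages or scores).
    """
    counts = Counter((v - start) // width for v in values)
    k, _ = counts.most_common(1)[0]  # on a tie, the class counted first wins
    lower = start + k * width
    return lower, lower + width - 1

# Hypothetical whole-number scores grouped into classes of width 8.
scores = [34, 41, 45, 50, 51, 53, 55, 57, 58, 60, 63, 66]
print(modal_class(scores, start=34, width=8))  # (50, 57)
```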
MEASURES OF SPREAD
Introduction
• A measure of spread (dispersion) is used to describe the variability in a sample or population.
• It is usually used in conjunction with a measure of central tendency, such as the mean or median, to provide an overall description of a set of data.
• Why is it important to measure the spread of data?
• It indicates how well the mean represents a data set. The mean is a poor summary of a data set with a large spread, but is appropriate if the spread of the data is small.
• A large spread indicates high variability between individual scores, which does not augur well in research.
Types of measures of spread

• Range
• Quartiles
• Variance
• Standard deviation.
Range

• The range is the difference between the highest and lowest scores in a data set and is the simplest measure of spread.
• Range = maximum value - minimum value
• NB: unlike with the median, the data need not be ordered; however, ordered data make it easier to see the minimum and maximum values quickly.
• The range delineates the boundaries of a data set.
• This is important if you are measuring a variable that has high and low values that should not be crossed.
• The range can be used to detect errors when entering data. E.g., if you are recording the age of school children, you would quickly notice a mistake if your range is 7 to 118 years!
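A short sketch of the range and of its use in spotting data-entry errors, using hypothetical ages. The plausible age limits (5 to 19 years) are an assumption made for illustration.

```python
def data_range(values):
    """Range = maximum value - minimum value (no sorting needed)."""
    return max(values) - min(values)

# Hypothetical ages (years) of school children, with one data-entry error.
ages = [9, 7, 12, 118, 10, 8]
print(min(ages), max(ages), data_range(ages))  # 7 118 111

# A simple plausibility check, assuming school ages should lie within 5-19 years.
suspect = [a for a in ages if not 5 <= a <= 19]
print(suspect)  # [118]
```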
Quartiles and Interquartile Range

• Quartiles measure the spread of a data set by dividing the ordered data into quarters. Each quartile corresponds to a percentile:
• 1st quartile (Q1) = 25th percentile,
• 2nd quartile (Q2) = 50th percentile (the median),
• 3rd quartile (Q3) = 75th percentile, and
• the top of the 4th quarter = 100th percentile (the maximum value).
• Quartiles are much less affected by outliers or a skewed data set than the equivalent measures of mean and standard deviation.
• Quartiles are often reported along with the median as the best choice of measure of spread and central tendency, respectively, when dealing with skewed data and/or data with outliers.
• A common way of expressing quartiles as a measure of spread is as the interquartile range (IQR).
• The IQR is the difference between the third quartile (Q3) and the first quartile (Q1). It gives the range of the middle half of the scores in the distribution, i.e.:
• IQR = Q3 - Q1
How to calculate quartiles and IQR
• Arrange the data set in ascending order.
• Find the median, at position (n + 1)/2. This is the 50th percentile, or middle quartile (Q2).
• Find the lower quartile (Q1), at position ¼(n + 1), i.e. (n + 1)/4. Q1 lies in the middle of the lower half of the data set.
• Find the upper quartile (Q3), at position ¾(n + 1). Q3 lies in the middle of the upper half of the data set.
• The IQR is Q3 - Q1.
Finding median (Q2), Q1, Q3 and IQR

• Data: 18 20 23 20 23 27 24 23 29
• Solution:
• Arrange the values in ascending order of magnitude (n = 9):

Value:    18 20 20 23 23 23 24 27 29
Position:  1  2  3  4  5  6  7  8  9

• Median (Q2): position = (n + 1)/2 = (9 + 1)/2 = 10/2 = 5. The 5th value is 23, so Q2 = 23.
• Lower quartile (Q1): position = (n + 1)/4 = (9 + 1)/4 = 10/4 = 2.5, i.e. halfway between the 2nd and 3rd values. Q1 = (20 + 20)/2 = 20.
• Upper quartile (Q3): position = ¾(n + 1) = ¾(9 + 1) = ¾(10) = 7.5, i.e. halfway between the 7th and 8th values. Q3 = (24 + 27)/2 = 51/2 = 25.5.
• IQR = Q3 - Q1 = 25.5 - 20 = 5.5
This means the middle 50% of the data values range from 20 to 25.5.
• The interquartile range (IQR) is the spread of the middle 50% of the data values.
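The worked example can be reproduced in Python. The helper below is a sketch of the (n + 1) position method used in these notes, with linear interpolation when the position falls between two values; `quantile_by_position` is a hypothetical name. Textbooks and software packages use several slightly different quartile conventions, so other tools may give different values for small data sets. Python's `statistics.quantiles`, with its default `method='exclusive'`, follows the same (n + 1) rule and is used here as a cross-check.

```python
import statistics

def quantile_by_position(sorted_data, p):
    """Value at 1-based position p*(n+1), interpolating linearly between neighbours."""
    n = len(sorted_data)
    pos = p * (n + 1)
    if not 1 <= pos <= n:
        raise ValueError("position falls outside the data; need more values")
    lower_idx = int(pos)          # whole part of the position (1-based)
    frac = pos - lower_idx        # fractional part, e.g. 0.5 for 2.5
    lower = sorted_data[lower_idx - 1]
    if frac == 0:
        return lower
    upper = sorted_data[lower_idx]
    return lower + frac * (upper - lower)

data = sorted([18, 20, 23, 20, 23, 27, 24, 23, 29])
q1 = quantile_by_position(data, 0.25)  # position 2.5 -> between 20 and 20 -> 20.0
q2 = quantile_by_position(data, 0.50)  # position 5   -> 23
q3 = quantile_by_position(data, 0.75)  # position 7.5 -> between 24 and 27 -> 25.5
print(q1, q2, q3, q3 - q1)             # 20.0 23 25.5 5.5

# Cross-check against the standard library's default (n + 1) method.
print(statistics.quantiles(data, n=4))  # [20.0, 23.0, 25.5]
```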
Variance and standard deviation

• Quartiles do not take into account every score in a data set.
• To take into account the actual value of each score when measuring spread, we use the VARIANCE and STANDARD DEVIATION.
• Either of the two measures can be used in research.
Variance
• The variance is based on the deviation of each score (Xi) from the population mean (μ).
• Each deviation from the mean is squared: (Xi - μ)².
• Adding up all the squared deviations gives the sum of squares (the numerator), which we then divide by the population size (N) to get the variance: σ² = Σ(Xi - μ)²/N
• The usefulness of the variance is that a large value means the scores in the data set are spread widely around the mean.
• Conversely, if the scores are spread closely around the mean, the variance will be a smaller number.
• However, there are two potential problems with the variance:
• First, because the deviations of scores from the mean are squared, more weight is given to extreme scores.
• Secondly, the variance is not in the same units as the scores in the data set: it is measured in squared units.
• This means that its value cannot be related directly to the values in the data set.
• Calculating the standard deviation (the square root of the variance) instead of the variance rectifies this problem.
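A minimal sketch of the population variance as described above (sum of squares divided by N), using hypothetical scores. `population_variance` is an illustrative name; the standard library's `statistics.pvariance` is shown for comparison.

```python
import statistics

def population_variance(values):
    """Sum of squared deviations from the mean, divided by N."""
    N = len(values)
    mu = sum(values) / N
    sum_of_squares = sum((x - mu) ** 2 for x in values)
    return sum_of_squares / N

scores = [2, 4, 4, 4, 5, 5, 7, 9]    # hypothetical scores, mean = 5
print(population_variance(scores))   # 32 / 8 = 4.0 (in squared units)
print(statistics.pvariance(scores))  # library equivalent, also 4
```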
Standard Deviation

• The standard deviation is a measure of the spread of scores within a set of data.
• The SD can be calculated for the entire population or for a sample.
• The SD for the entire population is preferred.
• However, since researchers often have data from a sample only, the population standard deviation is usually estimated from the sample standard deviation.
Type of data used when calculating an SD
• The SD is used in conjunction with the mean to summarise continuous data, NOT categorical data.
• In addition, the SD, like the mean, is normally only appropriate when the continuous data are not significantly skewed and have no outliers.
Formulae for calculating the population and sample standard deviation
• The sample standard deviation formula is: s = √[Σ(Xi - x̄)²/(n - 1)]
• The population standard deviation formula is: σ = √[Σ(Xi - μ)²/N]
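The sample and population standard deviations can be computed directly from their definitions: both take the square root of the sum of squared deviations from the mean, divided by n - 1 for a sample and by N for a population. The n - 1 divisor (Bessel's correction) makes the sample SD slightly larger. The sketch below uses hypothetical scores and illustrative function names; the standard library's `statistics.pstdev` and `statistics.stdev` are printed for comparison.

```python
import math
import statistics

def sample_sd(values):
    """s = sqrt( sum((x - mean)^2) / (n - 1) )."""
    n = len(values)
    xbar = sum(values) / n
    return math.sqrt(sum((x - xbar) ** 2 for x in values) / (n - 1))

def population_sd(values):
    """sigma = sqrt( sum((x - mu)^2) / N )."""
    N = len(values)
    mu = sum(values) / N
    return math.sqrt(sum((x - mu) ** 2 for x in values) / N)

scores = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical scores, mean = 5
print(population_sd(scores))        # sqrt(32 / 8) = 2.0
print(round(sample_sd(scores), 4))  # sqrt(32 / 7), about 2.1381

# Library equivalents for comparison.
print(statistics.pstdev(scores), round(statistics.stdev(scores), 4))
```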
