Basic Biostatistics

This document provides an introduction to biostatistics, describing what statistics and biostatistics are, the uses of biostatistics including health program evaluation and assessing risk factors, and key concepts like population and sample, descriptive statistics, measures of central tendency, and the difference between parameters and statistics.

Uploaded by

girum


Chapter-1

Introduction to Biostatistics

Name: Huruy Assefa

E-mail: [email protected]

Mob.: 0914-728565

School of Public Health

Introduction
• What is statistics?

• Statistics: A field of study concerned with:

– collection, organization, analysis, summarization and interpretation
of numerical data, and

– the drawing of inferences about a body of data when only a small

part of the data is observed.

• Statistics helps us use numbers to communicate ideas

• Statisticians try to interpret and communicate the results to
others.


Cont.
· Biostatistics: The application of statistical methods
to the fields of biological and medical sciences.

· Concerned with interpretation of biological data &

the communication of information derived from
these data

· Has a central role in medical investigations

Uses of biostatistics

• Provide methods of organizing information

• Assessment of health status
• Health program evaluation
• Resource allocation
• Magnitude of association
– Strong vs weak association between exposure and
outcome


Cont.
• Assessing risk factors
– Cause & effect relationship
• Evaluation of a new vaccine or drug
– What can be concluded if the proportion of people free
from the disease is greater among the vaccinated than
the unvaccinated?
– How effective is the vaccine (drug)?
– Is the effect due to chance or some bias?
• Drawing of inferences
– Information from sample to population


What does biostatistics cover?
The best way to learn about biostatistics is to follow the flow of a research project from inception to the final publication:

1. Research planning
2. Design
3. Execution (data collection)
4. Data processing
5. Data analysis
6. Presentation
7. Interpretation
8. Publication

Variable:
• A characteristic that takes on different values in different persons, places, or things.

For example:
- heart rate,
- the heights of adult males,
- the weights of preschool children,
- the ages of patients seen in a dental clinic.


Types of variables

Quantitative variables
• Can be measured in the usual sense.
• For example:
– the heights of adult males,
– the weights of preschool children,
– the ages of patients seen in a dental clinic.

Qualitative variables
• Many characteristics are not capable of being measured. Some of them can be ordered or ranked.
• For example:
– classification of people into socio-economic groups,
– social classes based on income, education, etc.

Types of variables & scale of measurement

• Quantitative (numerical) variables: interval or ratio scale
• Qualitative (categorical) variables: nominal or ordinal scale

Types of Statistics
1. Descriptive statistics:

• Ways of organizing and summarizing data

• Help to identify the general features and trends in a set of data and to extract useful information

• Also very important in conveying the final results of a

study

• Example: tables, graphs, numerical summary measures


Cont.
2. Inferential statistics:

• Methods used for drawing conclusions about a

population based on the information obtained
from a sample of observations drawn from that
population

• Example: Principles of probability, estimation,

confidence interval, comparison of two or more
means or proportions, hypothesis testing, etc.
Data
• Data are numbers which can be obtained by measurement or
counting

• The raw material for statistics

• Can be obtained from:

– Routinely kept records, literature
– Surveys
– Counting
– Experiments
– Reports
– Observation
– Etc


Types of Data
1. Primary data: collected from the items or
individual respondents directly by the researcher
for the purpose of a study.

2. Secondary data: data that were collected, and possibly statistically treated, by other people or organizations, and that are then used by someone else for a different purpose.

Population and Sample
• Population:
– Refers to any collection of objects

• Target population:
– A collection of items that have something in common
for which we wish to draw conclusions at a particular
time.
• E.g., All hospitals in Ethiopia
– The whole group of interest


Cont.
Study (Sampled) Population:

• The subset of the target population that has at least some

chance of being sampled

• The specific population group from which samples are drawn

and data are collected


Cont.
Sample:
• A subset of a study population, about which information is actually obtained.

• The individuals who are actually measured and who make up the actual data.

Cont.
• Role of statistics: using information from a sample to make inferences about the population
[Diagram: a sample is drawn from the population, and information from the sample is carried back to the population]

Cont.
E.g.: In a study of the prevalence of HIV among adolescents in Ethiopia, a random sample of adolescents in Ayder sub-city of Mekelle was included.
• Target population: all adolescents in Ethiopia
• Study population: all adolescents in Mekelle
• Sample: adolescents in Ayder sub-city who were included in the study
[Diagram: the sample nested inside the study population, which is nested inside the target population]
Generalizability
• Is a two-stage procedure:

• We need to be able to generalize from:

– the sample to the study population, &
– then from the study population to the target population

• If the sample is not representative of the

population, the conclusions are restricted to the
sample & don’t have general applicability


Parameter and Statistic
• Parameter: A descriptive measure computed
from the data of a population.
– E.g., the mean (µ) age of the target population

• Statistic: A descriptive measure computed from the data of a sample.
– E.g., the sample mean age (x̄)

Descriptive Statistics:
Summarizing data


Measures of Central Tendency (MCT)

• On the scale of values of a variable, there is usually a point around which the largest number of observations tend to cluster.

• Since this point is usually near the centre of the distribution, the tendency of the data to concentrate around a certain value is called “central tendency”.

• The various methods of determining the point about which the observations tend to concentrate are called Measures of Central Tendency.

Cont.

• The objective of calculating MCT is to determine a

single figure/value which may be used to represent
the whole data set.

• In that sense it is an even more compact description

of the statistical data than the frequency distribution.

• Since a MCT represents the entire data, it facilitates

comparison within one group or between groups of
data.

Characteristics of a good MCT
An MCT is considered good or satisfactory if it possesses the following characteristics:
1. It should be based on all the observations

2. It should not be affected by the extreme values

3. It should be as close to the maximum number of values as possible

4. It should have a definite value

5. It should not be subjected to complicated and tedious calculations (easy)


Cont.

• The most common measures of central

tendency include:
– Arithmetic Mean
– Median
– Mode
– Others

1. Arithmetic Mean
a) Ungrouped data

• The arithmetic mean is the "average" of the data set and

by far the most widely used measure of central location

• It is the sum of all the observations divided by the total number of observations:

\[ \bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} \]

Cont.
The heart rates for n=10 patients were as follows (beats
per minute):
167, 120, 150, 125, 150, 140, 40, 136, 120, 150
What is the arithmetic mean for the heart rate of these
patients?
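
The question above can be worked through directly from the definition. The following is a minimal sketch using only the ten heart rates listed on the slide; the variable names are illustrative.

```python
from statistics import mean

# Heart rates (beats per minute) for the n = 10 patients in the example
heart_rates = [167, 120, 150, 125, 150, 140, 40, 136, 120, 150]

n = len(heart_rates)
total = sum(heart_rates)
x_bar = total / n  # arithmetic mean = sum of observations / number of observations

print(f"n = {n}, sum = {total}, mean = {x_bar:.1f} beats per minute")

# The standard-library helper returns the same value
assert abs(mean(heart_rates) - x_bar) < 1e-9
```

Note that the single low value of 40 pulls the mean below most of the other observations.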


Cont.
b) Grouped data
In calculating the mean from grouped data, we assume that all values falling into a particular class interval are located at the mid-point of the interval. It is calculated as follows:

\[ \bar{x} = \frac{\sum_{i=1}^{k} m_i f_i}{\sum_{i=1}^{k} f_i} \]

where,
k = the number of class intervals
m_i = the mid-point of the ith class interval
f_i = the frequency of the ith class interval

Cont.
Example. Compute the mean age of 169 subjects from the grouped data.

Class interval   Mid-point (m_i)   Frequency (f_i)   m_i f_i
10-19            14.5                4                 58.0
20-29            24.5               66               1617.0
30-39            34.5               47               1621.5
40-49            44.5               36               1602.0
50-59            54.5               12                654.0
60-69            64.5                4                258.0

Total                              169               5810.5

Mean = 5810.5/169 = 34.38 years
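
The grouped-data calculation can be checked with a short script. This is a minimal sketch built on the class limits and frequencies from the table; the function name `grouped_mean` is illustrative.

```python
# (lower limit, upper limit, frequency) for each class interval in the age table
classes = [(10, 19, 4), (20, 29, 66), (30, 39, 47),
           (40, 49, 36), (50, 59, 12), (60, 69, 4)]

def grouped_mean(classes):
    """Mean of grouped data, treating every value as located at its class mid-point."""
    total_f = sum(f for _, _, f in classes)                      # sum of f_i
    total_mf = sum((lo + hi) / 2 * f for lo, hi, f in classes)   # sum of m_i * f_i
    return total_mf / total_f

print(f"Grouped mean age = {grouped_mean(classes):.2f} years")
```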

Cont.

When the data are skewed, the mean is “dragged” in the direction of the skewness

• It is possible in extreme cases for all but one of the sample points to be on
one side of the arithmetic mean & in this case, the mean is a poor measure of
central location or does not reflect the center of the sample.


Properties of the Arithmetic Mean

• For a given set of data there is one and only one arithmetic mean
(uniqueness)

• Easy to calculate and understand (simple)

• Influenced by each and every value in a data set

• Greatly affected by the extreme values

• In case of grouped data if any class interval is open, arithmetic

mean can not be calculated


Median
a) Ungrouped data
• The median is the value which divides the data set into two equal
parts.

• If the number of values is odd, the median will be the middle

value when all values are arranged in order of magnitude.

• When the number of observations is even, there is no single

middle value but two middle observations.

• In this case the median is the mean of these two middle

observations, when all observations have been arranged in the
order of their magnitude.


Cont.
• The median is a better description (than the mean) of the majority of the data when the distribution is skewed
• Example: Data: 14, 89, 93, 95, 96
– Skewness is reflected in the outlying low value of 14
– The sample mean is 77.4
– The median is 93
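
The example above can be reproduced with Python's standard library. The sketch below also recomputes both measures without the outlying value, an extra comparison that is not on the slide, to show how much more the mean moves than the median.

```python
from statistics import mean, median

data = [14, 89, 93, 95, 96]
without_outlier = [x for x in data if x != 14]  # illustrative comparison only

print("With the outlier:    mean =", mean(data), " median =", median(data))
print("Without the outlier: mean =", mean(without_outlier),
      " median =", median(without_outlier))
```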


b) Grouped data

• In calculating the median from grouped data, we

assume that the values within a class-interval are
evenly distributed through the interval.

• The first step is to locate the class interval in which the

median is located, using the following procedure.

• Find n/2 and identify the first class interval whose cumulative frequency is at least n/2.
• Then, use the following formula.

Cont.
\[ \tilde{x} = L_m + \left( \frac{\frac{n}{2} - F_c}{f_m} \right) W \]
where,
L_m = lower true class boundary of the interval containing the median
F_c = cumulative frequency of the interval just before (preceding) the median class interval
f_m = frequency of the interval containing the median
W = class interval width
n = total number of observations
Example. Compute the median age of 169 subjects from the grouped data.

n/2 = 169/2 = 84.5

Class interval   Mid-point (m_i)   Frequency (f_i)   Cum. freq
10-19            14.5                4                  4
20-29            24.5               66                 70
30-39            34.5               47                117
40-49            44.5               36                153
50-59            54.5               12                165
60-69            64.5                4                169

Total                              169

Cont.

• n/2 = 84.5, which falls in the 3rd class interval (30-39)

• Lower class boundary = 29.5, upper class boundary = 39.5
• Frequency of the class = 47
• (n/2 – F_c) = 84.5 – 70 = 14.5

• Median = 29.5 + (14.5/47) × 10 = 32.58 ≈ 33
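
The steps above (locate the median class, then interpolate within it) can be sketched as follows. The function name and the use of true lower boundaries (9.5, 19.5, ...) as input are assumptions made for this illustration.

```python
def grouped_median(classes, width):
    """Median of grouped data.

    classes: list of (lower true class boundary, frequency) in increasing order.
    width:   common class interval width W.
    """
    n = sum(f for _, f in classes)
    half = n / 2
    cum = 0  # cumulative frequency of the classes before the current one (F_c)
    for lower, f in classes:
        if cum + f >= half:  # first class whose cumulative frequency reaches n/2
            return lower + (half - cum) / f * width
        cum += f
    raise ValueError("empty frequency table")

# True lower boundaries and frequencies from the age table
age_classes = [(9.5, 4), (19.5, 66), (29.5, 47), (39.5, 36), (49.5, 12), (59.5, 4)]
median_age = grouped_median(age_classes, width=10)
print(f"Grouped median age = {median_age:.2f} years")
```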

Properties of the median
• There is only one median for a given set of data (uniqueness)

• The median is easy to calculate

• Median is a positional average and hence it is insensitive to very large

or very small values (not affected by extreme values)

• Median can be calculated even in the case of open end intervals

• It is determined mainly by the middle points and less sensitive to the

remaining data points (weakness).


Mode
• The mode is the most frequently occurring value
among all the observations in a set of data.

• It is not influenced by extreme values.

• It is possible to have more than one mode or no mode.

• It is not a good summary of the majority of the data.


[Figure: frequency histogram (vertical axis: N, 0 to 20) with the modes labelled. Source: T. Ancelle, D. Coulombie]
a) Ungrouped data
• It is a value which occurs most frequently in
a set of values.

• If all the values are different there is no

mode, on the other hand, a set of values
may have more than one mode.


Cont.
• Example
• Data are: 1, 2, 3, 4, 4, 4, 4, 5, 5, 6
• Mode is 4 “Unimodal”
• Example
• Data are: 1, 2, 2, 2, 3, 4, 5, 5, 5, 6, 6, 8
• There are two modes - 2 & 5
• This distribution is said to be “bi-modal”
• Example
• Data are: 2.62, 2.75, 2.76, 2.86, 3.05, 3.12
• No mode, since all the values are different
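
The three examples above can be checked with a small helper. This is a minimal sketch; the function `modes` is hypothetical and follows the slide's convention that a data set whose values are all different has no mode.

```python
from collections import Counter

def modes(values):
    """Return the list of modes, or an empty list when every value occurs once."""
    counts = Counter(values)
    top = max(counts.values())  # highest frequency (values must be non-empty)
    if top == 1:
        return []               # all values different: no mode
    return sorted(v for v, c in counts.items() if c == top)

unimodal = [1, 2, 3, 4, 4, 4, 4, 5, 5, 6]
bimodal = [1, 2, 2, 2, 3, 4, 5, 5, 5, 6, 6, 8]
no_mode = [2.62, 2.75, 2.76, 2.86, 3.05, 3.12]

for name, data in [("unimodal", unimodal), ("bimodal", bimodal), ("no mode", no_mode)]:
    print(name, "->", modes(data))
```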


b) Grouped data

• To find the mode of grouped data, we

usually refer to the modal class, where the
modal class is the class interval with the
highest frequency.
• If a single value for the mode of grouped
data must be specified, it is taken as the
mid-point of the modal class interval.

Properties of mode
· It is not affected by extreme values

· It can be calculated for distributions with open

end classes

· Often its value is not unique

· The main drawback of the mode is that often it does not exist


Cont.

Which measure of central tendency is best with a given

set of data?

• Two factors are important in making this decision:

– The scale of measurement (type of data)

– The shape of the distribution of the

observations


Cont.
• The mean can be used for discrete and continuous
data.

• The median is appropriate for discrete and

continuous data as well, but can also be used for
ordinal data.

• The mode can be used for all types of data, but may
be especially useful for nominal and ordinal
measurements.


Relationship between Mean, Median and Mode

(A) Symmetric and unimodal distribution —

Mean, median, and mode should all be
approximately the same

[Figure: symmetric, unimodal curve with the mean, median and mode at the centre]


Cont.
(B) Bimodal: mean and median should be about the same, but may take a value that is unlikely to occur; reporting the two modes might be best


Cont.
(C) Skewed to the right (positively skewed) —
Mean is sensitive to extreme values, so median
might be more appropriate

[Figure: right-skewed curve with the mode, median and mean marked, in that order from left to right]


Cont.
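(D) Skewed to the left (negatively skewed): same as (C); the mean is pulled toward the extreme low values, so the median might be more appropriate
[Figure: left-skewed curve with the mode, median and mean marked]
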

Measures of Dispersion


Consider the following two sets of data:

A: 177 193 195 209 226 Mean = 200

B: 192 197 200 202 209 Mean = 200

Two or more sets may have the same mean and/or median but they may be
quite different.


These two distributions have the same mean,
median, and mode


Cont.

• MCT are not enough to give a clear

understanding about the distribution of the
data.

• We need to know something about the

variability or spread of the values — whether
they tend to be clustered close together, or
spread out over a broad range

Measures of Dispersion
• Measures that quantify the variation or dispersion of a set of
data from its central location

• Dispersion refers to the variety exhibited by the values of the data.

• The amount of dispersion is small when the values are close together.

• If all the values are the same, there is no dispersion.


Cont.

• Measures of dispersion include:

– Range
– Inter-quartile range
– Variance
– Standard deviation
– Coefficient of variation
– Standard error
– Others

Range (R)
• The difference between the largest and smallest
observations in a sample.

• Range = Maximum value – Minimum value

• Example –
– Data values: 5, 9, 12, 16, 23, 34, 37, 42
– Range = 42-5 = 37

• A data set with a larger range exhibits more variability
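
A quick sketch of the range calculation for the example data. The extra value of 120 is hypothetical and is added only to illustrate how strongly a single extreme observation changes the range.

```python
data = [5, 9, 12, 16, 23, 34, 37, 42]
data_range = max(data) - min(data)  # Range = maximum value - minimum value
print("Range =", data_range)

# Hypothetical extreme observation, not part of the original example
with_extreme = data + [120]
range_with_extreme = max(with_extreme) - min(with_extreme)
print("Range with one extreme value added =", range_with_extreme)
```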


Properties of range
· It is the simplest crude measure and can be easily understood

· It takes into account only two values which causes it to be a

poor measure of dispersion

· Very sensitive to extreme observations

· The larger the sample size, the larger the range tends to be


Interquartile range (IQR)
• Indicates the spread of the middle 50% of the observations, and is used with the median

IQR = Q3 – Q1

• Example: Suppose the first and third quartiles for weights of girls 12 months of age are 8.8 kg and 10.2 kg, respectively.

IQR = 10.2 kg – 8.8 kg = 1.4 kg

i.e., the middle 50% of the infant girls weigh between 8.8 and 10.2 kg.


Properties of IQR:
• It is a simple and versatile measure
• It encloses the central 50% of the observations
• It is not based on all observations but only on two
specific values
• It is important in selecting cut-off points in the
formulation of clinical standards
• Since it excludes the lowest and highest 25% values, it
is not affected by extreme values
• Less sensitive to the size of the sample

Variance (σ², s²)
• The variance is the average of the squares of the deviations taken from the mean.

• The deviations are squared because the sum of the deviations of the individual observations of a sample about the sample mean is always 0:
\[ \sum_{i=1}^{n} (x_i - \bar{x}) = 0 \]

• The variance can be thought of as an average of squared

deviations


Cont.

• Variance is used to measure the dispersion of

values relative to the mean.

• When values are close to their mean (narrow

range) the dispersion is less than when there
is scattering over a wide range.
– Population variance = σ²
– Sample variance = S²

Cont.
a) Ungrouped data

• Let X_1, X_2, ..., X_N be the measurements on N population units; then:
\[ \sigma^2 = \frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N} \]
where
\[ \mu = \frac{\sum_{i=1}^{N} X_i}{N} \]
is the population mean.

Cont.
A sample variance is calculated for a sample of individual values (X_1, X_2, …, X_n) and uses the sample mean \bar{X} rather than the population mean µ:
\[ S^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1} \]


Degrees of freedom
• In computing the variance there are (n-1) degrees of freedom
because only (n-1) of the deviations are independent from
each other

• The last one can always be calculated from the others

automatically.

• This is because the deviations from the mean, (X_i – mean), must add to zero.
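
The points above can be illustrated numerically. The sketch below reuses data sets A and B from the "Measures of Dispersion" slide (both with mean 200); it shows that the deviations sum to zero, that the last deviation is fixed by the other n – 1, and how dividing by n – 1 differs from dividing by n.

```python
from statistics import mean, pvariance, variance

set_a = [177, 193, 195, 209, 226]
set_b = [192, 197, 200, 202, 209]

def deviations(data):
    """Deviation of each observation from the mean of the data."""
    m = mean(data)
    return [x - m for x in data]

dev_a = deviations(set_a)
print("Deviations of A:", dev_a, " sum =", sum(dev_a))

# Only n - 1 deviations are free: the last equals minus the sum of the others
last_from_others = -sum(dev_a[:-1])
print("Last deviation recovered from the others:", last_from_others)

# variance() divides by n - 1 (sample); pvariance() divides by n (population)
print("A: sample variance =", variance(set_a), " population variance =", pvariance(set_a))
print("B: sample variance =", variance(set_b))
```

Both sets have the same mean, yet A has a much larger variance than B, which is the point made on the earlier slide.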


b) Grouped data
\[ S^2 = \frac{\sum_{i=1}^{k} (m_i - \bar{x})^2 f_i}{\sum_{i=1}^{k} f_i - 1} \]

where
m_i = the mid-point of the ith class interval
f_i = the frequency of the ith class interval
\bar{x} = the sample mean
k = the number of class intervals
Properties of Variance:
· The main disadvantage of variance is that its unit is
the square of the unit of the original measurement
values.

· The variance gives more weight to the extreme

values as compared to those which are near to mean
value, because the difference is squared in variance.

• The drawbacks of variance are overcome by the

standard deviation.

Standard deviation (σ, s)

• It is the square root of the variance.

• This produces a measure having the same scale as that of the individual values.
\[ \sigma = \sqrt{\sigma^2} \quad \text{and} \quad S = \sqrt{S^2} \]


Example:
• Following are the survival times of n=11
patients after heart transplant surgery.

• The survival time for the “ith” patient is

represented as Xi for i= 1, …, 11.

• Calculate the sample variance and SD.

Example. Compute the variance and SD of the age of 169 subjects from the grouped data.

Class interval   m_i    f_i   (m_i – mean)   (m_i – mean)²   (m_i – mean)² f_i
10-19            14.5     4      –19.88          395.21           1580.86
20-29            24.5    66       –9.88           97.61           6442.55
30-39            34.5    47        0.12            0.01              0.68
40-49            44.5    36       10.12          102.41           3686.92
50-59            54.5    12       20.12          404.81           4857.77
60-69            64.5     4       30.12          907.21           3628.86

Total                   169                                      20197.63

Mean = 5810.5/169 = 34.38 years
S² = 20197.63/(169 – 1) = 120.22
SD = √S² = √120.22 = 10.96
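
The grouped variance and SD can be recomputed from the mid-points and frequencies alone. This is a minimal sketch; it keeps the mean unrounded, so the intermediate sum of squares can differ slightly from a hand calculation that rounds the mean first.

```python
from math import sqrt

# (mid-point m_i, frequency f_i) from the age table
grouped = [(14.5, 4), (24.5, 66), (34.5, 47), (44.5, 36), (54.5, 12), (64.5, 4)]

n = sum(f for _, f in grouped)
grouped_mean = sum(m * f for m, f in grouped) / n

# Sum of squared deviations of the mid-points from the mean, weighted by frequency
sum_sq = sum((m - grouped_mean) ** 2 * f for m, f in grouped)

s2 = sum_sq / (n - 1)  # sample variance with n - 1 degrees of freedom
sd = sqrt(s2)

print(f"n = {n}, mean = {grouped_mean:.2f}")
print(f"sum of squares = {sum_sq:.2f}, S^2 = {s2:.2f}, SD = {sd:.2f}")
```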

Properties of SD
• The SD has the advantage of being expressed in the same
units of measurement as the mean

• SD is considered to be the best measure of dispersion and

is used widely because of the properties of the theoretical
normal curve.

• However, the SD cannot be used to compare the variability of two data sets when their variables are measured in different units.


Coefficient of variation (CV)
• When two data sets have different units of
measurements, or their means differ
sufficiently in size, the CV should be used as a
measure of dispersion.

• It is the best measure to compare the variability of two series (sets) of observations.

• Data with a smaller coefficient of variation are considered more consistent.
Cont.
• CV is the ratio of the SD to the mean, multiplied by 100:

\[ CV = \frac{SD}{\bar{x}} \times 100 \]

               SD          Mean         CV (%)
SBP            15 mmHg     130 mmHg     11.5
Cholesterol    40 mg/dl    200 mg/dl    20.0

• “Cholesterol is more variable than systolic blood pressure”
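
The two CV values in the table can be reproduced as follows. The helper name `coefficient_of_variation` is illustrative; because the SD and the mean share the same unit, the result is a unit-free percentage.

```python
def coefficient_of_variation(sd, mean):
    """CV (%) = SD / mean * 100."""
    return sd / mean * 100

cv_sbp = coefficient_of_variation(sd=15, mean=130)          # systolic blood pressure
cv_cholesterol = coefficient_of_variation(sd=40, mean=200)  # cholesterol, mg/dl

print(f"CV (SBP)         = {cv_sbp:.1f}%")
print(f"CV (cholesterol) = {cv_cholesterol:.1f}%")
print("More variable:", "cholesterol" if cv_cholesterol > cv_sbp else "SBP")
```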

NOTE:
• The range often appears with the median as a numerical
summary measure

• The IQR is used with the median as well

• The SD is used with the mean

• For nominal and ordinal data, a table or graph is often

more effective than any numerical summary measure


Probability and Probability
Distributions


Probability
• Chance of observing a particular outcome.

• Likelihood of an event.

• Probability theory developed from the study of

games of chance like dice and cards.

• Processes like flipping a coin, rolling a die or drawing a card from a deck are probability experiments.


Why Probability in Statistics?

• Results are not certain

• To evaluate how accurate our results are:

– Given how our data were collected, are our results

accurate ?

– Given the level of accuracy needed, how many

observations need to be collected ?
When can we talk about probability ?

When dealing with a process that has an uncertain

outcome

Experiment = any process with an uncertain outcome

• When an experiment is performed, one and only one

outcome is obtained

• Event = something that may happen or not when the

experiment is performed


Two Categories of Probability

• Objective and Subjective Probabilities.

• Objective probability
1) Classical probability and
2) Relative frequency probability.


Classical Probability
• Is based on gambling ideas
• Rolling a die:
– There are 6 possible outcomes: {1, 2, 3, 4, 5, 6}.
• Each is equally likely:
– P(i) = 1/6, i = 1, 2, ..., 6, so P(1) = P(2) = … = P(6) = 1/6
– The probabilities sum to 1


Cont.
• Definition: If an event can occur in N mutually exclusive and equally likely ways, and if m of these possess a characteristic E, then the probability of the occurrence of E is m/N.

P(E) = m/N

• If we toss a die, what is the probability of 4 coming up?

m = 1 (the outcome 4) and N = 6
The probability of 4 coming up is 1/6.
Relative Frequency Probability
• The proportion of times the event A occurs — in
a large number of trials repeated under
essentially identical conditions

• Definition: If a process is repeated a large number of

times (n), and if an event with the characteristic E occurs
m times, the relative frequency of E,
Probability of E = P(E) = m/n.


Cont.
• If you toss a coin 100 times and heads comes up 40 times, P(H) = 40/100 = 0.4.

• If we toss a coin 10,000 times and heads comes up 5562 times, P(H) = 5562/10,000 = 0.5562.
• The longer the series of trials (the larger the sample size), the closer the estimate tends to be to the true value.

Example:
Of 158 people who attended a dinner party, 99 were ill.
P (Illness) = 99/158 = 0.63 = 63%.
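
The relative frequency idea can be illustrated with a simulation. This is only a sketch under stated assumptions: the coin is assumed fair (true probability 0.5), the seed is arbitrary, and the printed estimates depend on the random number generator.

```python
import random

def relative_frequency_of_heads(n_tosses, rng):
    """Toss a simulated fair coin n_tosses times and return m/n for heads."""
    heads = sum(rng.random() < 0.5 for _ in range(n_tosses))
    return heads / n_tosses

rng = random.Random(2024)  # arbitrary seed so that reruns give the same numbers
for n in (10, 100, 10_000, 100_000):
    estimate = relative_frequency_of_heads(n, rng)
    print(f"n = {n:>7}: P(H) estimated as {estimate:.4f}")

# The dinner party example from the slide uses the same m/n calculation
p_illness = 99 / 158
print(f"P(Illness) = {p_illness:.2f}")
```

With more tosses, the estimate tends to settle closer to 0.5, although any single run can still deviate.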

Subjective Probability
• Personalistic (represents one’s degree of belief in the
occurrence of an event).

• Personal assessment of which is more effective to

provide cure – traditional/modern

• Personal assessment of which sports team will win a

match.

• Also uses classical and relative frequency methods to

assess the likelihood of an event.


Cont.
• E.g., If someone says that he is 95% certain that
a cure for EBOLA will be discovered within 5
years, then he means that:

P(discovery of cure for EBOLA within 5 years) = 95%

= 0.95

• Although the subjective view of probability has enjoyed increased attention over the years, it has not been fully accepted by scientists.


Mutually Exclusive Events

· Two events A and B are mutually exclusive if they

cannot both happen at the same time
P (A ∩ B) = 0

• Example:
– A coin toss cannot produce heads and tails
simultaneously.
– Weight of an individual can’t be classified simultaneously
as “underweight”, “normal”, “overweight”


Independent Events
• Two events A and B are independent if the probability of the first one happening is the same no matter how the second one turns out; that is, the outcome of one event has no effect on the occurrence or non-occurrence of the other.
P(A ∩ B) = P(A) × P(B) (independent events)
P(A ∩ B) ≠ P(A) × P(B) (dependent events)

Example:
– The outcomes on the first and second coin tosses
are independent


Intersection and union
• The intersection of two events A and B, A ∩ B, is the event that A and B happen
simultaneously
P ( A and B ) = P (A ∩ B )

• Let A represent the event that a randomly selected newborn has low birth weight (LBW), and B the event that he or she is from a multiple birth

• The intersection of A and B is the event that the infant is both LBW and from a multiple
birth

• The union of A and B, A ∪ B, is the event that either A happens or B happens or they both happen simultaneously
P(A or B) = P(A ∪ B)

• In the example above, the union of A and B is the event that the newborn is either LBW or from a multiple birth, or both


Properties of Probability
1. The numerical value of a probability always lies between 0 and 1, inclusive.
– A value of 0 means the event cannot occur
– A value of 1 means the event definitely will occur
– A value of 0.5 means that the event is as likely to occur as not to occur

2. The sum of the probabilities of all mutually exclusive and exhaustive outcomes is equal to 1.
P(E1) + P(E2) + .... + P(En) = 1.


Cont.
3. For two mutually exclusive events A and B,
P(A or B) = P(A ∪ B) = P(A) + P(B).

If not mutually exclusive:

P(A or B) = P(A) + P(B) – P(A and B)

4. The complement of an event A, denoted by Ā or Aᶜ, is the event that A does not occur

• Consists of all the outcomes in which event A does NOT occur

P(Ā) = P(not A) = 1 – P(A)
• Ā occurs only when A does not occur.
• These are complementary events.

Basic Probability Rules

1. Addition rule

• If events A and B are mutually exclusive:

P(A or B) = P(A) + P(B)
P(A and B) = 0

• More generally:
P(A or B) = P(A) + P(B) – P(A and B)
where P(A or B) is the probability that event A or event B occurs, or that they both occur


Example:
The probabilities below represent years of schooling completed by mothers of newborn infants

Years of schooling   Probability
≤ 8                  0.056
9-11                 0.159
12                   0.321
13-15                0.218
≥ 16                 0.230

1. What is the probability that a mother has completed < 12 years of

schooling?

2. What is the probability that a mother has completed 12 or more years of

schooling?


Cont.
1. The probability that a mother has completed < 12 years of schooling:
P(≤ 8 years) = 0.056 and
P(9-11 years) = 0.159

• Since these two events are mutually exclusive,

P(≤ 8 or 9-11) = P(≤ 8 ∪ 9-11)
= P(≤ 8) + P(9-11)
= 0.056 + 0.159
= 0.215
Cont.
2. The probability that a mother has completed 12 or more years of schooling is:
P(≥ 12) = P(12 or 13-15 or ≥ 16)
= P(12 ∪ 13-15 ∪ ≥ 16)
= P(12) + P(13-15) + P(≥ 16)
= 0.321 + 0.218 + 0.230
= 0.769
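
Both answers follow from the addition rule for mutually exclusive events. The sketch below uses the five probabilities quoted in the worked solution; the category labels "<=8" and ">=16" are an assumed reading of the schooling categories.

```python
# Probabilities quoted in the worked solution (category labels are assumptions)
schooling = {"<=8": 0.056, "9-11": 0.159, "12": 0.321, "13-15": 0.218, ">=16": 0.230}

# The categories are mutually exclusive, so probabilities of unions simply add
p_less_than_12 = schooling["<=8"] + schooling["9-11"]
p_12_or_more = schooling["12"] + schooling["13-15"] + schooling[">=16"]

print(f"P(< 12 years)  = {p_less_than_12:.3f}")
print(f"P(>= 12 years) = {p_12_or_more:.3f}")
print(f"Sum of the listed probabilities = {sum(schooling.values()):.3f}")
```

The five listed probabilities add up to 0.984 rather than 1, so P(12 or more years) is obtained here by addition and not as 1 – P(< 12 years).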

2. Multiplication rule

– If A and B are independent events, then

P(A ∩ B) = P(A) × P(B)

– More generally,
P(A ∩ B) = P(A) P(B|A) = P(B) P(A|B)
P(A and B) denotes the probability that A and B
both occur at the same time.

Conditional Probability

• The conditional probability that event B occurs given that event A has already occurred is denoted P(B|A) and is defined as
\[ P(B \mid A) = \frac{P(A \cap B)}{P(A)} \]
provided that P(A) ≠ 0.


Cont.
Example:
A study investigated the effect of prolonged exposure to bright light on retina damage in premature infants.

                 Retinopathy: Yes   Retinopathy: No   Total
Bright light     18                  3                21
Reduced light    21                 18                39
Total            39                 21                60

Cont.
• The probability of developing retinopathy is:

P(Retinopathy) = (No. of infants with retinopathy) / (Total No. of infants)
= (18 + 21)/(21 + 39)
= 39/60 = 0.65

Cont.

• We want to compare the probability of retinopathy given that the infant was exposed to bright light with the probability of retinopathy given that the infant was exposed to reduced light.

• Exposure to bright light and exposure to reduced light are conditioning events, i.e., events we want to take into account when calculating conditional probabilities.

Cont.

• The conditional probability of retinopathy, given exposure to bright light, is:

P(Retinopathy | exposure to bright light)
= (No. of infants with retinopathy exposed to bright light) / (No. of infants exposed to bright light)
= 18/21 = 0.86
Cont.

• P(Retinopathy | exposure to reduced light)
= (No. of infants with retinopathy exposed to reduced light) / (No. of infants exposed to reduced light)
= 21/39 = 0.54

• The conditional probabilities suggest that premature infants

exposed to bright light have a higher risk of retinopathy than
premature infants exposed to reduced light.
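
The probabilities in this example can be recomputed from the four cell counts of the table. This is a minimal sketch; the dictionary layout and variable names are illustrative. It also checks the multiplication rule and whether light exposure and retinopathy look independent in these data.

```python
from math import isclose

# Cell counts: (light exposure, retinopathy) -> number of infants
counts = {("bright", "yes"): 18, ("bright", "no"): 3,
          ("reduced", "yes"): 21, ("reduced", "no"): 18}
total = sum(counts.values())

n_bright = counts[("bright", "yes")] + counts[("bright", "no")]
n_reduced = counts[("reduced", "yes")] + counts[("reduced", "no")]
n_retinopathy = counts[("bright", "yes")] + counts[("reduced", "yes")]

p_retinopathy = n_retinopathy / total
p_ret_given_bright = counts[("bright", "yes")] / n_bright
p_ret_given_reduced = counts[("reduced", "yes")] / n_reduced

print(f"P(Retinopathy)                 = {p_retinopathy:.2f}")
print(f"P(Retinopathy | bright light)  = {p_ret_given_bright:.2f}")
print(f"P(Retinopathy | reduced light) = {p_ret_given_reduced:.2f}")

# General multiplication rule: P(A and B) = P(A | B) P(B)
p_bright = n_bright / total
p_joint = counts[("bright", "yes")] / total
assert isclose(p_joint, p_ret_given_bright * p_bright)

# Under independence, P(A and B) would equal P(A) P(B); here it does not
print(f"P(bright and retinopathy) = {p_joint:.4f}, "
      f"P(bright) x P(retinopathy) = {p_bright * p_retinopathy:.4f}")
```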
Cont.
• For independent events A and B:
P(A|B) = P(A)

• For any events A and B, whether independent or not:
P(A and B) = P(A|B) P(B)
(General Multiplication Rule)

Exercise:
Culture and Gonodectin (GD) test results for 240 urethral discharge specimens

                    Culture result
GD test result      Gonorrhea   No gonorrhea   Total
Positive            175          9             184
Negative              8         48              56
Total               183         57             240

Cont.
1. What is the probability that a man has gonorrhea?
2. What is the probability that a man has a positive GD test?
3. What is the probability that a man has a positive GD test and gonorrhea?
4. What is the probability that a man has a negative GD test and does not have gonorrhea?
5. What is the probability that a man with gonorrhea has a positive GD test?
6. What is the probability that a man who does not have gonorrhea has a negative GD test?
7. What is the probability that a man who does not have gonorrhea has a positive GD test?
8. What is the probability that a man with a positive GD test has gonorrhea?
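
One way to check answers to this exercise is to compute each probability from the cell counts. The sketch below is not an official answer key; the helper `count` and the dictionary layout are illustrative. Questions 1 to 4 are marginal or joint probabilities (divided by 240), and questions 5 to 8 are conditional probabilities (divided by a row or column total).

```python
# Cell counts: (GD test result, culture result) -> number of specimens
counts = {
    ("positive", "gonorrhea"): 175,
    ("positive", "no gonorrhea"): 9,
    ("negative", "gonorrhea"): 8,
    ("negative", "no gonorrhea"): 48,
}
total = sum(counts.values())

def count(test=None, culture=None):
    """Number of specimens matching the given test and/or culture result."""
    return sum(n for (t, c), n in counts.items()
               if (test is None or t == test) and (culture is None or c == culture))

answers = {
    1: count(culture="gonorrhea") / total,
    2: count(test="positive") / total,
    3: count("positive", "gonorrhea") / total,
    4: count("negative", "no gonorrhea") / total,
    5: count("positive", "gonorrhea") / count(culture="gonorrhea"),
    6: count("negative", "no gonorrhea") / count(culture="no gonorrhea"),
    7: count("positive", "no gonorrhea") / count(culture="no gonorrhea"),
    8: count("positive", "gonorrhea") / count(test="positive"),
}

for question, probability in answers.items():
    print(f"Question {question}: {probability:.4f}")
```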

Probability Distributions
• A probability distribution describes the way the values of a variable are distributed, and is used to draw conclusions about a set of data

• Random Variable = Any quantity or characteristic that is

able to assume a number of different values such that
any particular outcome is determined by chance

• The probability distribution of a random variable is a

table, graph, or mathematical formula that gives the
probabilities with which the random variable takes
different values or ranges of values.


Discrete Probability Distributions

• For a discrete random variable, the probability

distribution specifies each of the possible outcomes
of the random variable along with the probability
that each will occur

• Examples can be:

– Frequency distribution
– Relative frequency distribution
– Cumulative frequency

School of Public Health 113

Cont.

• We represent a potential outcome of the

random variable X by x

• 0 ≤ P(X = x) ≤ 1
• ∑ P(X = x) = 1

114
The following data shows the number of diagnostic
services a patient receives

School of Public Health 115

Cont.
• What is the probability that a patient receives
exactly 3 diagnostic services?
P(X=3) = 0.031

• What is the probability that a patient receives at

most one diagnostic service?
P (X≤1) = P(X = 0) + P(X = 1)
= 0.671 + 0.229
= 0.900
116
Cont.
• What is the probability that a patient receives
at least four diagnostic services?
P (X≥4) = P(X = 4) + P(X = 5)
= 0.010 + 0.006
= 0.016
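
The calculations above can be reproduced from a small lookup table. This is a sketch only: it uses the probabilities quoted in these slides and assumes that X takes only the values 0 to 5, so that P(X = 2) is whatever remains after the quoted probabilities are subtracted from 1. The helper names `at_most` and `at_least` are illustrative.

```python
# Probabilities quoted on the slides for X = number of diagnostic services.
pmf = {0: 0.671, 1: 0.229, 3: 0.031, 4: 0.010, 5: 0.006}

# Assumption: X only takes the values 0..5, so P(X = 2) is the remainder
# needed for the probabilities to sum to 1.
pmf[2] = round(1 - sum(pmf.values()), 3)


def at_most(k):
    """P(X <= k): add the probabilities of all outcomes up to k."""
    return sum(p for x, p in pmf.items() if x <= k)


def at_least(k):
    """P(X >= k): add the probabilities of all outcomes from k upward."""
    return sum(p for x, p in pmf.items() if x >= k)


print(round(pmf[3], 3))       # exactly 3 services
print(round(at_most(1), 3))   # at most one service
print(round(at_least(4), 3))  # at least four services
```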

117
Probability distributions can also
be displayed using a graph

[Figure: probability P(X = x) plotted against the number of diagnostic services, x = 0 to 5]

School of Public Health 118

Cont.

• Examples of discrete probability distributions

are the binomial distribution and the Poisson
distribution.

119
Binomial Distribution
• Consider dichotomous (binary) random variable

• Is based on Bernoulli trial

– When a single trial of an experiment can result in only one of
two mutually exclusive outcomes (success or failure; dead or
alive; sick or well, male or female)

School of Public Health 120

Example:
• We are interested in determining whether a newborn infant will
survive until his/her 70th birthday
• Let Y represent the survival status of the
child at age 70 years
• Y = 1 if the child survives and Y = 0 if he/she does not
• The outcomes are mutually exclusive and exhaustive
• Suppose that 72% of infants born survive to age 70 years
P(Y = 1) = p = 0.72
P(Y = 0) = 1 − p = 0.28

School of Public Health 121

Characteristics of a Binomial Distribution

• The experiment consists of a fixed number, n, of identical trials.

• Only two possible outcomes on each trial.
• The probability of A (success), denoted by p, remains
the same from trial to trial. The probability of B (failure),
denoted by q,
q = 1- p.
• The trials are independent.
• n and p are the parameters of the binomial distribution.
• The mean is np and the variance is np(1 − p)

School of Public Health 122

Cont.
• If an experiment is repeated n times and the
outcome is independent from one trial to
another, the probability that outcome A occurs
exactly x times is:
• $P(X = x) = \binom{n}{x} p^{x} q^{n-x}$, x = 0, 1, 2, ..., n.

123
Cont.
• n denotes the number of fixed trials
• x denotes the number of successes in
the n trials
• p denotes the probability of success
• q denotes the probability of failure (1- p)

• $\binom{n}{x} = \frac{n!}{x!\,(n-x)!}$ represents the number of ways of selecting x objects out of n where the

order of selection does not matter.
• where n! = n(n−1)(n−2)…(1), and 0! = 1
124
Example:
• Suppose we know that 40% of a certain
population are cigarette smokers. If we take a
random sample of 10 people from this
population, what is the probability that we will
have exactly 4 smokers in our sample?

School of Public Health 125

Cont.
• If the probability that any individual in the population
is a smoker is p = 0.40, then the probability that x = 4
of the n = 10 subjects selected are smokers is:

$P(X = 4) = \binom{10}{4}(0.4)^{4}(1 - 0.4)^{10-4}$
$= \binom{10}{4}(0.4)^{4}(0.6)^{6} = 210(0.0256)(0.04666)$
$= 0.25$

• The probability of obtaining exactly 4 smokers in the

sample is about 0.25.
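
The same calculation can be sketched in Python with the standard library. The function names `binom_pmf` and `binom_cdf` are illustrative choices, not taken from the slides.

```python
from math import comb


def binom_pmf(x, n, p):
    """P(X = x) for a binomial distribution with n trials and success probability p."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


def binom_cdf(x, n, p):
    """P(X <= x): sum of the binomial probabilities for 0, 1, ..., x successes."""
    return sum(binom_pmf(k, n, p) for k in range(x + 1))


# Smoking example: n = 10 people sampled, p = 0.4 probability of being a smoker.
p_exactly_4 = binom_pmf(4, 10, 0.4)
p_at_most_3 = binom_cdf(3, 10, 0.4)

print(round(p_exactly_4, 4))  # about 0.25
print(round(p_at_most_3, 4))  # about 0.38
```

Calling `binom_pmf` for every x from 0 to 10 gives the full distribution, and the values should sum to 1.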

126
Cont.

• We can compute the probability of observing zero

smokers out of 10 subjects selected at random, exactly
1 smoker, and so on, and display the results in a table,
as given, below.

• The third column, P(X ≤ x), gives the cumulative

probability. E.g. the probability of selecting 3 or fewer
smokers into the sample of 10 subjects is
P(X ≤ 3) =.3823, or about 38%.
127
Cont.

128
Cont.
The probabilities in the above table can be converted
into the following graph

[Figure: probability plotted against the number of smokers, 0 to 10]

School of Public Health 129

Exercise
Each child born to a particular set of parents
has a probability of 0.25 of having blood type
O. If these parents have 5 children,
what is the probability that:
a. Exactly two of them have blood type O?
b. At most 2 have blood type O?
c. At least 4 have blood type O?
d. Exactly 2 do not have blood type O?

School of Public Health 130

Solution for ‘a’

a.)
$P(X = 2) = \binom{5}{2}(0.25)^{2}(0.75)^{5-2} = 0.2637$

School of Public Health 131

2. The Poisson Distribution
• Is a discrete probability distribution used to
model the number of occurrences of an event
that takes place infrequently in time or space

• Applicable for counts of events over a given

interval of time, for example:
– number of patients arriving at an emergency
department in a day
– number of new cases of HIV diagnosed at a clinic in
a month

School of Public Health 132

Cont.
• In such cases, we take a sample of days and observe the number
of patients arriving at the emergency department on each day,

• We are observing a count or number of events, rather than a

yes/no or success/ failure outcome for each subject or trial, as in
the binomial.

• Suppose events happen randomly and independently in time at a

constant rate. If events happen at a rate of λ events per unit time,
the probability of x events happening in unit time is:

$P(x) = \frac{\lambda^{x} e^{-\lambda}}{x!}$
133
Cont.
• where x = 0, 1, 2, . . .∞
• x is a potential outcome of X
• The constant λ (lambda) represents the rate at
which the event occurs, or the expected number
of events per unit time
• e = 2.71828

• It depends on just one parameter, the mean

number of occurrences (λ).
134
Example
• The daily number of new registrations of
cancer is 2.2 on average.
What is the probability of
a) Getting no new cases
b) Getting 1 case
c) Getting 2 cases
d) Getting 3 cases
e) Getting 4 cases

School of Public Health 135

Solutions
a) $P(X = 0) = \frac{(2.2)^{0} e^{-2.2}}{0!} = 0.111$

b) P(X=1) = 0.244
c) P(X=2) = 0.268
d) P(X=3) = 0.197
e) P(X=4) = 0.108
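
The solutions above can be reproduced with a few lines of Python. This is a minimal sketch using only the standard library; the function name `poisson_pmf` is an illustrative choice.

```python
from math import exp, factorial


def poisson_pmf(x, lam):
    """P(X = x) for a Poisson distribution with mean (rate) lam."""
    return lam**x * exp(-lam) / factorial(x)


# Cancer registration example: on average 2.2 new registrations per day.
probabilities = [round(poisson_pmf(x, 2.2), 3) for x in range(5)]
print(probabilities)  # P(X = 0), P(X = 1), ..., P(X = 4)
```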

School of Public Health 136

[Figure: Poisson distribution with mean 2.2; probability plotted against the number of events, 0 to 7]

137
Example:
• In a given geographical area, cases of tetanus are
reported at a rate of λ = 4.5/month
• What is the probability that 0 cases of tetanus will
be reported in a given month?

138
Cont.
• What is the probability that 1 case of tetanus
will be reported?
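
As a worked sketch, applying the Poisson formula with λ = 4.5 gives the following (values rounded to three decimal places):

$$P(X = 0) = \frac{(4.5)^{0} e^{-4.5}}{0!} = e^{-4.5} \approx 0.011$$

$$P(X = 1) = \frac{(4.5)^{1} e^{-4.5}}{1!} = 4.5\, e^{-4.5} \approx 0.050$$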

139
Continuous Probability Distributions
• A continuous random variable X can take on any value in a
specified interval or range

• The probability distribution of X is represented by a

smooth curve called a probability density function

• The area under the smooth curve is equal to 1

• The area under the curve between any two points x1 and
x2 is the probability that X takes a value between x1 and x2

School of Public Health 140

Cont.

• The probability associated with any one particular value is

equal to 0

• Therefore, P(X=x) = 0

• Also, P(X ≥ x) = P(X > x)

• We calculate:
Pr [ a < X < b], the probability of an
interval of values of X.

141
The Normal distribution

• Frequently called the “Gaussian distribution” or

bell-shaped curve.

• Variables such as blood pressure, weight,

height, serum cholesterol level, and IQ score —
are approximately normally distributed

School of Public Health 142

Cont.

A random variable is said to have a normal distribution if it has

a probability distribution that is symmetric and bell-shaped

143
Cont.
• A random variable X is said to follow a normal distribution (ND) if and
only if its probability density function is:

$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^{2}}, \quad -\infty < x < \infty$

144
Cont.
• π (pi) = 3.14159
• e = 2.71828, x = value of X
• Range of possible values of X: −∞ to +∞
• µ = expected value of X (“the long-run
average”)
• σ² = variance of X
• µ and σ are the parameters of the normal
distribution; they completely define its
shape
145
Cont.
1. The mean µ tells you about location -
– Increase µ - Location shifts right
– Decrease µ – Location shifts left
– Shape is unchanged

2. The variance σ2 tells you about narrowness or

flatness of the bell -
– Increase σ2 - Bell flattens. Extreme values are more likely
– Decrease σ2 - Bell narrows. Extreme values are less likely
– Location is unchanged

146
147
Properties of the Normal Distribution
1. It is symmetrical about its mean, µ.

2. The mean, the median and the mode are equal. It is unimodal.

3. The total area under the curve above the x-axis is 1 square unit.

4. The curve never touches the x-axis.

5. As the value of σ increases, the curve becomes flatter, and as σ
decreases it becomes narrower.

6. The distribution is completely determined by the parameters µ and σ.

School of Public Health 148

149
Cont.

• We cannot tabulate every possible

distribution

• Tabulated normal probability calculations are

available only for the ND with µ = 0 and σ2=1.

150
Standard Normal Distribution
· It is a normal distribution that has a mean equal to
0 and a SD equal to 1, and is denoted by N(0, 1).

· The main idea is to standardize all the data that is

given by using Z-scores.

· These Z-scores can then be used to find the area

(and thus the probability) under the normal curve.

School of Public Health 151

The standard normal distribution has mean 0 and
variance 1

• Approximately 68% of the area under the standard normal

curve lies between ±1, about 95% between ±2, and about 99%
between ±2.5

School of Public Health 152

Z - Transformation

• If a random variable X ~ N(µ, σ), then we can

transform it to a SND with the help of the
Z-transformation:

$Z = \frac{x - \mu}{\sigma}$

• Z represents the Z-score for a given x value

School of Public Health 153

Cont.

• Consider redefining the scale in terms of

how many SDs a value lies from the mean, for a normal
distribution with μ = 110 and σ = 15.

Value, X:            50   65   80   95   110   125   140   155   170
SDs from mean, Z:    -4   -3   -2   -1     0     1     2     3     4

using $Z = \frac{x - \mu}{\sigma} = \frac{x - 110}{15}$

154
Cont.

• This process is known as standardization and

gives the position on a normal curve with μ=0
and σ=1, i.e., the SND, Z.

• A Z-score is the number of standard deviations

that a given x value is above or below the
mean.

155
Some Useful Tips

School of Public Health 156

a) What is the probability that z < -1.96?

(1) Sketch a normal curve

(2) Draw a perpendicular line at z = -1.96
(3) Find the area in the table
(4) The answer is the area to the left of the line:
P(z < -1.96) = 0.0250

157
b) What is the probability that -1.96 < z < 1.96?

The area between the values

P(-1.96 < z < 1.96) = 0.9750 - 0.0250 = 0.9500

158
c) What is the probability that z > 1.96?

• The answer is the area to the right of the line; found by

subtracting table value from 1.0000;
P(z > 1.96) =1.0000 - 0.9750 = 0.0250

159
160
Exercise

1. Compute P(-1 ≤ Z ≤ 1.5)

Ans: 0.7745

2. Find the area under the SND from 0 to 1.45

Ans: 0.4265

3. Compute P(-1.66 < Z < 2.85)

Ans: 0.9493
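
Table look-ups like these can be checked numerically. The sketch below computes the standard normal cumulative probability from the error function in Python's standard library; the name `phi` is an illustrative choice. Because the printed table rounds each area to four decimals, computed answers may differ from table-based answers in the last decimal place.

```python
from math import erf, sqrt


def phi(z):
    """Area under the standard normal curve to the left of z, i.e. P(Z < z)."""
    return 0.5 * (1 + erf(z / sqrt(2)))


# The three worked questions for z = 1.96.
p_left = phi(-1.96)                 # P(z < -1.96)
p_between = phi(1.96) - phi(-1.96)  # P(-1.96 < z < 1.96)
p_right = 1 - phi(1.96)             # P(z > 1.96)

# The three exercise questions.
exercise_1 = phi(1.5) - phi(-1.0)    # P(-1 <= Z <= 1.5)
exercise_2 = phi(1.45) - phi(0.0)    # area from 0 to 1.45
exercise_3 = phi(2.85) - phi(-1.66)  # P(-1.66 < Z < 2.85)

for value in (p_left, p_between, p_right, exercise_1, exercise_2, exercise_3):
    print(round(value, 4))
```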

School of Public Health 161

Applications of the Normal Distribution

Example:
• The diastolic blood pressures (DBP) of males 35–44 years
of age are normally distributed with µ = 80 mm Hg
and σ² = 144 mm Hg²
[σ = 12 mm Hg].
• Individuals with a DBP above 95 mm Hg are
considered to be hypertensive

School of Public Health 162

Cont.

a. What is the probability that a randomly selected

male has a DBP above 95 mm Hg?

Z = (95 − 80)/12 = 1.25

P (Z > 1.25) = 0.1056

Approximately 10.6% of this population would be

classified as hypertensive.

163
Cont.
b. What is the probability that a randomly
selected male has a DBP above 110 mm Hg?

Z = (110 − 80)/12 = 2.50

P (Z > 2.50) = 0.0062

• Approximately 0.6% of the population has a
DBP above 110 mm Hg

164
Cont.

c. What is the probability that a randomly selected

male has a DBP below 60 mm Hg?
Z = (60 − 80)/12 = -1.67

P (Z < -1.67) = 0.0475

• Approximately 4.8% of the population has a

DBP below 60 mm Hg
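
The three blood pressure probabilities can also be computed without a table. This sketch uses `statistics.NormalDist` from the Python standard library (available from Python 3.8), with the mean and SD given in the example; the variable names are illustrative.

```python
from statistics import NormalDist

# DBP of males aged 35-44: mean 80 mm Hg, SD 12 mm Hg (from the example).
dbp = NormalDist(mu=80, sigma=12)

p_above_95 = 1 - dbp.cdf(95)    # part a: P(DBP > 95)
p_above_110 = 1 - dbp.cdf(110)  # part b: P(DBP > 110)
p_below_60 = dbp.cdf(60)        # part c: P(DBP < 60)

print(round(p_above_95, 4), round(p_above_110, 4), round(p_below_60, 4))
```

For part c the computed value uses the unrounded Z = −1.667 rather than the table entry for −1.67, so it differs slightly in the fourth decimal place; both round to about 4.8%.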
165
Other Distributions

1. Student t-distribution
2. F-distribution
3. χ² (chi-square) distribution

School of Public Health 166

Sampling and Sampling
Distributions

167
Cont.
• Researchers often use sample survey methodology to
obtain information about a larger population by
selecting and measuring a sample from that population.

• Since the population is often too large to study in full, we rely on the

information collected from the sample.

• Inferences about the population are based on the

information from the sample drawn from that
population.

168
Cont.

• A sample is a collection of individuals selected

from a larger population.

• Sampling enables us to estimate the

characteristic of a population by directly
observing a portion of the population.

169
Cont.

Sample Information

Population

170
Steps needed to select a sample and ensure that
this sample will fulfill its goals.

1. Establish the study's objectives

– The first step in planning a useful and efficient survey is
to specify the objectives with as much detail as
possible.
– Without objectives, the survey is unlikely to generate
valuable results.
– Clarifying the aims of the survey is critical to its
ultimate success.
– The initial users and uses of the data should be
identified at this stage.

171
Cont.
2. Define the target population

– The target population is the total population for which the

information is required.

– Specifically, the target population is defined by the following

characteristics:
• Nature of data required
• Geographic location
• Reference period
• Other characteristics, such as socio-demographic characteristics

School of Public Health 172

Cont.
3. Decide on the data to be collected
– The data requirements of the survey must be established.

– To ensure that the requirements are operationally sound, the necessary data
terms and definitions also need to be determined.

4. Set the level of precision

– There is a level of uncertainty associated with estimates coming

from a sample.
– The sample-to-sample variation is what causes the sampling error.
– Researchers can estimate the sampling error associated with a
particular sampling plan, and try to minimize it.

173
Cont.
5. Decide on the methods of measurement

– Choose measuring instrument and method of approach to the

population
– Data about a person’s state of health may be obtained from
statements that he/she makes or from a medical examination
– The survey may employ a self-administered questionnaire or an
interview

6. Prepare the sampling frame
– List of all members of the population
– The elements must not overlap
174
Sampling

• The process of selecting a portion of the

population to represent the entire population.

• A main concern in sampling:

– Ensure that the sample represents the population,
and
– The findings can be generalized.

School of Public Health 175

Advantages of sampling:
• Feasibility: Sampling may be the only feasible method of
collecting information.

• Reduced cost: Sampling reduces demands on resource such as

finance, personnel, and material.

• Greater accuracy: Sampling may lead to better accuracy of

collecting data

• Sampling error: Precise allowance can be made for sampling

error

• Greater speed: Data can be collected and summarized more

quickly

School of Public Health 176

Disadvantages of sampling:
• There is always a sampling error.

• Sampling may create a feeling of

discrimination within the population.

• Sampling may be inadvisable where every unit

in the population is legally required to have a
record.

School of Public Health 177

Errors in sampling
1) Sampling error: Error that arises because a sample, rather than the
whole population, is observed; it varies from sample to sample.
– It cannot be avoided or totally eliminated.

2) Non-sampling error:
- Observational error
- Respondent error
- Lack of preciseness of definition
- Errors in editing and tabulation of data

School of Public Health 178

Sampling Methods

Two broad divisions:

A. Probability sampling methods

B. Non-probability sampling methods

School of Public Health 179

Probability sampling
• Involves random selection of a sample

• A sample is obtained in a way that ensures every

member of the population has a known, non-zero
probability of being included in the sample.

• The method chosen depends on a number of factors,

such as
– the available sampling frame,
– how spread out the population is,
– how costly it is to survey members of the population

School of Public Health 180

Most common probability
sampling methods

1. Simple random sampling

2. Systematic random sampling
3. Stratified random sampling
4. Cluster sampling
5. Multi-stage sampling

School of Public Health 181

1. Simple random sampling
• Involves random selection

• Each member of a population has an equal

chance of being included in the sample.

School of Public Health 182

Cont.

• To use a SRS method:

– Make a numbered list of all the units in the
population

– Each unit should be numbered from 1 to N (where

N is the size of the population)

– Select the required number.

183
Cont.

• The randomness of the sample is ensured

by:
• use of “lottery” methods
• a table of random numbers

184
Example
• Suppose your school has 500 students and
you need to conduct a short survey on the
quality of the food served in the cafeteria.

• You decide that a sample of 10 students

should be sufficient for your purposes.

• In order to get your sample, you assign a

number from 1 to 500 to each student in
your school.

School of Public Health 185

Cont.

• To select the sample, you use a table of

randomly generated numbers.

• Pick a starting point in the table (a row and

column number) and look at the random
numbers that appear there. In this case, since
the data run into three digits, the random
numbers would need to contain three digits as
well.

186
Cont.
• Ignore all random numbers greater than 500 (and 000) because they do
not correspond to any of the students in the school.

• Remember that the sample is without replacement, so

if a number recurs, skip over it and use the next random
number.

• The first 10 different numbers between 001 and 500

make up your sample.
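
In practice, a random number generator can stand in for the printed table of random numbers. The following is a minimal sketch of the cafeteria example in Python; the seed value is an arbitrary assumption, used only so that the same draw can be repeated.

```python
import random

N = 500  # students are numbered 1 to N
n = 10   # required sample size

# A fixed seed makes the draw repeatable; any seed (or none) could be used.
rng = random.Random(2024)

# random.sample draws without replacement, so no student appears twice.
sample = sorted(rng.sample(range(1, N + 1), n))
print(sample)
```

Because the draw is without replacement and restricted to 1 to 500, there is no need to skip repeated or out-of-range numbers as there is with a printed table.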

187
Cont.

• SRS has certain limitations:

– Requires a sampling frame.
– Difficult if the reference population is dispersed.
– Minority subgroups of interest may not be
selected.

188
2. Systematic random sampling
• Sometimes called interval sampling,
systematic sampling means that there is a gap,
or interval, between each selected unit in the
sample

• The selection is systematic rather than

random

School of Public Health 189

Cont.

• Important if the reference population is

arranged in some order:
– Order of registration of patients
– Numerical order of house numbers
– Student’s registration books

• Taking individuals at fixed intervals (every kth)

based on the sampling fraction, e.g., if the
sample includes 20%, then every fifth individual is taken.
190
Steps in systematic random sampling
1. Number the units on your frame from 1 to N (where N is the total
population size).

2. Determine the sampling interval (K) by dividing the number of units in

the population by the desired sample size.

3. Select a number between one and K at random. This number is called the
random start and would be the first number included in your sample.

4. Select every Kth unit after that first number

Note: Systematic sampling should not be used when a cyclic repetition is

inherent in the sampling frame.

School of Public Health 191

Example
• To select a sample of 100 from a population of 400, you
would need a sampling interval of 400 ÷ 100 = 4.

• Therefore, K = 4.

• You will need to select one unit out of every four units to
end up with a total of 100 units in your sample.

• Select a number between 1 and 4 from a table of

random numbers.

School of Public Health 192

Cont.
• If you choose 3, the third unit on your frame
would be the first unit included in your
sample;

• The sample might consist of the following

units to make up a sample of 100: 3 , 7, 11, 15,
19...395, 399 (up to N, which is 400 in this
case).

193
Cont.
• Using the above example, you can see that
with a systematic sample approach there are
only four possible samples that can be
selected, corresponding to the four possible
random starts:
A. 1, 5, 9, 13...393, 397
B. 2, 6, 10, 14...394, 398
C. 3, 7, 11, 15...395, 399
D. 4, 8, 12, 16...396, 400
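
The steps of systematic sampling can be sketched as a short Python function. The function name `systematic_sample` and its `start` argument are illustrative assumptions; the sketch also assumes that N is an exact multiple of n, as in the example above.

```python
import random


def systematic_sample(N, n, start=None, rng=random):
    """Select n units from a frame numbered 1..N using a fixed interval."""
    k = N // n                      # sampling interval K
    if start is None:
        start = rng.randint(1, k)   # random start between 1 and K inclusive
    # Take the random start and then every Kth unit after it.
    return list(range(start, N + 1, k))[:n]


# Example from the slides: N = 400, n = 100, so K = 4; a random start of 3.
sample = systematic_sample(400, 100, start=3)
print(sample[:4], "...", sample[-2:])
```

Leaving `start` unset picks one of the K possible samples at random, which is the only random step in the procedure.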
194
3. Stratified random sampling

• It is used when the population is known to be

heterogeneous with regard to some factors, and those factors
are used for stratification

• Using stratified sampling, the population is divided into

homogeneous, mutually exclusive groups called strata.

• A population can be stratified by any variable that is available

for all units prior to sampling (e.g., age, sex, province of
residence, income, etc.).

• A separate sample is taken independently from each stratum.

School of Public Health 195

Why do we need to create strata?
• Creating strata can make the sampling strategy more efficient.

• A larger sample is required to get a more accurate

estimation if a characteristic varies greatly from one
unit to the other.

• For example, if every person in a population had the

same salary, then a sample of one individual would be
enough to get a precise estimate of the average salary.

School of Public Health 196

Cont.
• Equal allocation:
– Allocate equal sample size to each stratum
• Proportionate allocation:

$n_j = \frac{n}{N} N_j, \quad j = 1, 2, \ldots, k$

where k is the number of strata and

– n_j is the sample size of the jth stratum

– N_j is the population size of the jth stratum
– n = n_1 + n_2 + ... + n_k is the total sample size
– N = N_1 + N_2 + ... + N_k is the total population
size
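
Proportionate allocation can be illustrated with a small calculation. In this sketch the stratum names and sizes are hypothetical, chosen so that the arithmetic is exact, and the function name `proportionate_allocation` is an illustrative choice.

```python
def proportionate_allocation(stratum_sizes, n):
    """Allocate a total sample of size n across strata in proportion to their size."""
    N = sum(stratum_sizes.values())  # total population size
    # n_j = (n / N) * N_j, rounded to a whole number of units.
    return {name: round(n * N_j / N) for name, N_j in stratum_sizes.items()}


# Hypothetical population of 1000 split into three strata.
stratum_sizes = {"Stratum 1": 500, "Stratum 2": 300, "Stratum 3": 200}
allocation = proportionate_allocation(stratum_sizes, n=100)
print(allocation)
```

When the proportions do not give whole numbers, rounding can make the allocated sizes add up to slightly more or less than n, so the total should be checked and adjusted by hand.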
197
4. Cluster sampling
• Sometimes it is too expensive to spread a sample across
the population as a whole.

• Travel costs can become expensive if interviewers have to

survey people from one end of the country to the other.

• To reduce costs, researchers may choose a cluster

sampling technique

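• Ideally, each cluster is internally heterogeneous (a small-scale

picture of the population) and the clusters are similar to one another, unlike
stratified sampling, where each stratum is internally homogeneous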

School of Public Health 198

Steps in cluster sampling
• Cluster sampling divides the population into groups or clusters.

• A number of clusters are selected randomly to represent the

total population, and then all units within selected clusters are
included in the sample.

• No units from non-selected clusters are included in the sample

—they are represented by those from selected clusters.

• This differs from stratified sampling, where some units are

selected from each group.
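
The steps above can be sketched in Python as follows. The cluster names, unit IDs, and the function name `cluster_sample` are hypothetical and serve only to illustrate the procedure.

```python
import random

# Hypothetical frame of clusters: each cluster name maps to the IDs of its units.
clusters = {
    "Cluster A": [1, 2, 3, 4],
    "Cluster B": [5, 6, 7],
    "Cluster C": [8, 9, 10, 11, 12],
    "Cluster D": [13, 14, 15],
}


def cluster_sample(clusters, n_clusters, rng):
    """Randomly select whole clusters and include every unit in each of them."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    units = [unit for name in chosen for unit in clusters[name]]
    return chosen, units


chosen, units = cluster_sample(clusters, n_clusters=2, rng=random.Random(1))
print(chosen, units)
```

Only a list of clusters is needed to make the selection, and the final number of units depends on which clusters happen to be chosen.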

School of Public Health 199

Example

• In a school-based study, we assume the sections of

the same school are similar to one another.

• We can randomly select sections and include all

students of the selected sections only

School of Public Health 200

Cont.
• Sometimes a list of all units in the population is not available,
while a list of all clusters is either available or easy to create.

• In most cases, the main drawback is a loss of efficiency when

compared with SRS.

• It is usually better to survey a large number of small clusters

instead of a small number of large clusters.
– This is because neighboring units tend to be more alike, resulting in a
sample that does not represent the whole spectrum of opinions or
situations present in the overall population.

• Another drawback to cluster sampling is that you do not have total control
over the final sample size.
201
5. Multi-stage sampling
• Similar to the cluster sampling.

• But it involves picking a sample from within each chosen cluster,

rather than including all units in the cluster.

• This type of sampling requires at least two stages.

• In the first stage, large groups or clusters are identified and selected.

• In the second stage, population units are picked from within the
selected clusters (using any of the possible probability sampling
methods) for a final sample.
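
A two-stage design can be sketched in Python as below. The cluster names, unit IDs, and the function name `two_stage_sample` are hypothetical; simple random sampling is used at the second stage, although any probability method could be substituted.

```python
import random

# Hypothetical first-stage frame: cluster name -> IDs of the units it contains.
clusters = {
    "Cluster A": list(range(1, 21)),
    "Cluster B": list(range(21, 36)),
    "Cluster C": list(range(36, 61)),
    "Cluster D": list(range(61, 71)),
}


def two_stage_sample(clusters, n_clusters, n_per_cluster, rng):
    """Stage 1: select clusters at random. Stage 2: select units within each."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    sample = {}
    for name in chosen:
        units = clusters[name]
        # Never ask for more units than the cluster contains.
        size = min(n_per_cluster, len(units))
        sample[name] = rng.sample(units, size)
    return sample


sample = two_stage_sample(clusters, n_clusters=2, n_per_cluster=5, rng=random.Random(7))
print(sample)
```

A list of units is needed only for the clusters selected at the first stage, not for the whole population.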

School of Public Health 202

Cont.
• If more than two stages are used, the process of choosing
population units within clusters continues until there is a final
sample.

• Also, you do not need to have a list of all of the units in the
population. All you need is a list of clusters and list of the units
in the selected clusters.

• Admittedly, more information is needed in this type of sample

than what is required in cluster sampling. However, multi-stage
sampling still saves a great amount of time and effort by not
having to create a list of all the units in a population.

203
B. Non-probability sampling
• The difference between probability and non-probability
sampling has to do with a basic assumption about the
nature of the population under study.

• In probability sampling, every item has a known chance

of being selected.

• In non-probability sampling, there is an assumption that

there is an even distribution of a characteristic of
interest within the population.

School of Public Health 204

Cont.

• In non-probability sampling, since elements

are chosen arbitrarily, there is no way to
estimate the probability of any one element
being included in the sample.

• Also, no assurance is given that each item has

a chance of being included, making it
impossible either to estimate sampling
variability or to identify possible bias
205
Cont.
• Reliability cannot be measured in non-probability sampling;
the only way to address data quality is to compare some of
the survey results with available information about the
population.

• Still, there is no assurance that the estimates will meet an

acceptable level of error.

• Researchers are reluctant to use these methods because

there is no way to measure the precision of the resulting
sample.
206
Cont.

• Despite these drawbacks, non-probability

sampling methods can be useful when
descriptive comments about the sample itself
are desired.

• There are also other circumstances, such as in

some research settings, when it is unfeasible or
impractical to conduct probability sampling.

207
The most common types of non-
probability sampling

1. Convenience or haphazard sampling

2. Volunteer sampling
3. Judgment sampling
4. Quota sampling
5. Snowball sampling technique

School of Public Health 208

1. Convenience or haphazard sampling

• Convenience sampling is sometimes referred

to as haphazard or accidental sampling.

• It is not normally representative of the target

population because sample units are only
selected if they can be accessed easily and
conveniently.

School of Public Health 209

Cont.

• The obvious advantage is that the method is

easy to use, but that advantage is greatly
offset by the presence of bias.

• Although useful applications of the technique

are limited, it can deliver accurate results
when the population is homogeneous.

210
Cont.

• For example, a scientist could use this method to

determine whether a lake is polluted or not.

• Assuming that the lake water is well-mixed, any

sample would yield similar information.

• A scientist could safely draw water anywhere on the

lake without bothering about whether or not the
sample is representative

211
2. Volunteer sampling
• As the term implies, this type of sampling occurs
when people volunteer to be involved in the study.

• In psychological experiments or pharmaceutical

trials (drug testing), for example, it would be
difficult and unethical to enlist random participants
from the general public.

• In these instances, the sample is taken from a group

of volunteers.

School of Public Health 212

Cont.

• Sometimes, the researcher offers payment to

attract respondents.

• In exchange, the volunteers accept the

possibility of a lengthy, demanding or
sometimes unpleasant process.

213
Cont.
• Sampling voluntary participants as opposed to
the general population may introduce strong
biases.

• Often in opinion polling, only the people who

care strongly enough about the subject tend
to respond.

• The silent majority does not typically

respond, resulting in large selection bias.
214
3. Judgment sampling
• This approach is used when a sample is taken based
on certain judgments about the overall population.

• The underlying assumption is that the investigator

will select units that are characteristic of the
population.

• The critical issue here is objectivity: how much can

judgment be relied upon to arrive at a typical
sample?

School of Public Health 215

Cont.
• Judgment sampling is subject to the
researcher's biases and is perhaps even more
biased than haphazard sampling.

• Since any preconceptions the researcher may

have are reflected in the sample, large biases
can be introduced if these preconceptions are
inaccurate.

216
Cont.

• Researchers often use this method in

exploratory studies like pre-testing of
questionnaires and focus groups.

• They also prefer to use this method in

laboratory settings where the choice of
experimental subjects (i.e., animal, human)
reflects the investigator's pre-existing beliefs
about the population.
217
Cont.

• One advantage of judgment sampling is the

reduced cost and time involved in acquiring
the sample.

218
4. Quota sampling

• This is one of the most common forms of non-

probability sampling.

• Sampling is done until a specific number of

units (quotas) for various sub-populations have
been selected.

School of Public Health 219

Cont.

• Since there are no rules as to how these

quotas are to be filled, quota sampling is
really a means for satisfying sample size
objectives for certain sub-populations.

220
Cont.

• As with all other non-probability sampling

methods, in order to make inferences about
the population, it is necessary to assume that
persons selected are similar to those not
selected.

• Such strong assumptions are rarely valid.

221
Cont.

• The main argument against quota sampling is

that it does not meet the basic requirement of
randomness.

• Some units may have no chance of selection

or the chance of selection may be unknown.

• Therefore, the sample may be biased.

222
Cont.

• Quota sampling is generally less expensive than

random sampling.

• It is also easy to administer, especially considering

the tasks of listing the whole population, randomly
selecting the sample and following-up on non-
respondents can be omitted from the procedure.

223
Cont.

• Quota sampling is an effective sampling

method when information is urgently required
and can be carried out without sampling frames.

• In many cases where the population has no

suitable frame, quota sampling may be the
only appropriate sampling method.

224
5. Snowball sampling
• A technique for selecting a research sample
where existing study subjects recruit future
subjects from among their acquaintances.

• Thus the sample group appears to grow like a

rolling snowball.

School of Public Health 225

Cont.
• This sampling technique is often used in hidden
populations which are difficult for researchers to
access; example populations would be drug users or
commercial sex workers.

• Because sample members are not selected from a

sampling frame, snowball samples are subject to
numerous biases. For example, people who have
many friends are more likely to be recruited into the
sample.

226
Estimation

School of Public Health

• Up until this point, we have assumed that the values
of the parameters of a probability distribution are
known.

• In the real world, the values of these population

parameters are usually not known

• Instead, we must try to say something about the way

in which a random variable is distributed using the
information contained in a sample of observations

School of Public Health 228

• The process of drawing conclusions about an entire
population based on the data in a sample is known as
statistical inference.

• Methods of inference usually fall into one of two broad

categories:
** Estimation or Hypothesis testing **

• For now, we will focus on using the observations in a

sample to estimate a population parameter

School of Public Health 229

Estimation
• It is concerned with estimating the values of
specific population parameters based on
sample statistics.

• It is about using information in a sample to

make estimates of the characteristics
(parameters) of the source population.

School of Public Health 230

Estimation, Estimator & Estimate

♣ Estimation is the computation of a statistic from sample

data, often yielding a value that is an approximation (guess)
of its target, an unknown true population parameter value.

♣ The statistic itself is called an estimator and can be of two

types - point estimator or interval estimator.

♣ The value or values that the estimator assumes are called

estimates.


Point versus Interval Estimators
• Point estimation involves the calculation of a single
number to estimate the population parameter

• Interval estimation specifies a range of reasonable values for the parameter

• Thus,
– A point estimate is of the form: [ value ],
– whereas an interval estimate is of the form: [ lower limit, upper limit ]


1. Point Estimate
• A single numerical value used to estimate the
corresponding population parameter.
Sample statistics are estimators of population parameters:

Sample statistic (estimator)          Population parameter
Sample mean, x̄                        µ
Sample variance, S²                   σ²
Sample proportion, p̂                  P or π
Sample odds ratio, OR̂                 OR
Sample relative risk, RR̂              RR
Sample correlation coefficient, r     ρ


2. Interval Estimation
• Interval estimation specifies a range of reasonable values for the population
parameter based on a point estimate.

• A confidence interval (CI) is a particular type of interval estimator. It:

– Gives a plausible range of values that is likely to include the “true” (population) value with a given confidence level.

– Also gives information about the precision of an estimate.

– Wider CIs indicate less certainty.

– CIs can also answer the question of whether or not an association exists


Confidence Level

• Confidence level
– The degree of confidence that the interval will contain the
unknown population parameter

• P(L ≤ parameter ≤ U) = 1 - α, where L and U are the lower and upper limits of the interval


Estimation for Single Population


1. CI for a Single Population Mean
(normally distributed)
A. Known variance (large sample size)

• There are 3 elements to a CI:

1. Point estimate
2. SE of the point estimate
3. Confidence coefficient

• Consider the task of computing a CI estimate of μ for a

population distribution that is normal with σ known.

• Available are data from a random sample of size = n.


Cont.
Assumptions
• Population standard deviation (σ) is known
• Population is normally distributed

• A 100(1-α)% CI for μ is:

$\bar{x} \pm Z_{\alpha/2}\,\dfrac{\sigma}{\sqrt{n}}$

• α is chosen by the researcher; the most common values of α are 0.05, 0.01 and 0.1.


Margin of Error
(Precision of the estimate)
• Margin of error = $Z_{\alpha/2}\,\sigma/\sqrt{n}$, which is half the width of the CI


Cont.
• As n increases, the length of the CI decreases.

• As σ increases, the length of the CI increases.

• As the confidence level increases (α decreases), the length of the CI increases.


Example:
1. Waiting times (in hours) at a particular hospital are
believed to be approximately normally distributed with a
variance of 2.25 hr².

a. A sample of 20 outpatients revealed a mean waiting time

of 1.52 hours. Construct the 95% CI for the estimate of
the population mean.

b. Suppose that the mean of 1.52 hours had resulted from a

sample of 32 patients. Find the 95% CI.

c. What effect does larger sample size have on the CI?


Solution:

a. $1.52 \pm 1.96\sqrt{\dfrac{2.25}{20}} = 1.52 \pm 1.96(0.33) = 1.52 \pm 0.65 = (0.87,\ 2.17)$

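• We are 95% confident that the true mean waiting time is between 0.87 and 2.17 hrs.

• An incorrect interpretation is that there is 95% probability that this interval contains the true population mean. Once computed, the interval either contains the true mean or it does not; the 95% refers to the proportion of such intervals, over repeated sampling, that would contain it.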

Cont.
b. $1.52 \pm 1.96\sqrt{\dfrac{2.25}{32}} = 1.52 \pm 1.96(0.27) = 1.52 \pm 0.53 = (0.99,\ 2.05)$
c. A larger sample size makes the CI narrower (more precise).
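
The waiting-time example can be checked with a short Python sketch. It assumes the known-variance formula (sample mean ± Z × √(σ²/n)) and takes the critical value from the standard library's normal distribution rather than from a table; the function name `z_confidence_interval` is illustrative only. Because the slides round the standard error to two decimals, the limits may differ from the slide values in the second decimal place.

```python
from math import sqrt
from statistics import NormalDist


def z_confidence_interval(mean, variance, n, confidence=0.95):
    """CI for a population mean when the population variance is known."""
    # Two-sided critical value, about 1.96 for 95% confidence
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    # Margin of error = critical value x standard error of the mean
    margin = z * sqrt(variance / n)
    return mean - margin, mean + margin


# Parts (a) and (b): same mean and variance, different sample sizes
ci_20 = z_confidence_interval(1.52, 2.25, 20)
ci_32 = z_confidence_interval(1.52, 2.25, 32)

print("n = 20:", tuple(round(limit, 2) for limit in ci_20))
print("n = 32:", tuple(round(limit, 2) for limit in ci_32))
# Part (c): the interval for n = 32 is narrower than the one for n = 20
print("narrower with n = 32:", (ci_32[1] - ci_32[0]) < (ci_20[1] - ci_20[0]))
```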

Cont.
B. Unknown variance (small sample size, n ≤ 30)
• What if the σ for the underlying population is unknown and the sample size is small?

• As an alternative, we use Student’s t distribution.


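Cont.
• With σ unknown, a 100(1-α)% CI for μ uses the sample standard deviation s and the t distribution with n - 1 degrees of freedom:

$\bar{x} \pm t_{\alpha/2,\,n-1}\,\dfrac{s}{\sqrt{n}}$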

Example

• Standard error =
• t-value at 90% CL at 19 df =1.729



Exercise

• Compute a 95% CI for the mean birth weight

based on n = 10, sample mean = 116.9 oz and
s =21.70.

• From the t Table, t9, 0.975 = 2.262

• Ans: (101.4, 132.4)
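
The exercise can be reproduced with a minimal Python sketch. It assumes the usual t-based interval (sample mean ± t × s/√n) and takes the critical value 2.262 directly from the t table quoted above rather than computing it; the function name is illustrative only.

```python
from math import sqrt


def t_confidence_interval(mean, sd, n, t_critical):
    """CI for a mean when sigma is unknown; t_critical comes from a t table."""
    standard_error = sd / sqrt(n)          # SE of the sample mean
    margin = t_critical * standard_error   # margin of error
    return mean - margin, mean + margin


# Birth weight exercise: n = 10, mean = 116.9 oz, s = 21.70, t(9, 0.975) = 2.262
low, high = t_confidence_interval(116.9, 21.70, 10, 2.262)
print(f"({low:.1f}, {high:.1f})")  # prints (101.4, 132.4)
```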


2. CIs for single population proportion, p

• Is based on three elements of CI

– Point estimate
– SE of point estimate
– Confidence coefficient


Cont.
• For a large sample, a 100(1-α)% CI for p is:

$\hat{p} \pm Z_{\alpha/2}\sqrt{\dfrac{\hat{p}(1-\hat{p})}{n}}$

Example 1
• A random sample of 100 people shows that 25
are left-handed. Form a 95% CI for the true
proportion of left-handers.


Interpretation


Example 2
• Suppose that among 10,000 female operating-room nurses,
60 women have developed breast cancer over five years. Find
the 95% CI for p based on the point estimate.
• Point estimate = 60/10,000 = 0.006
• The 95% CI for p is given by the interval:

• The 95% CI for p is:
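
Both proportion examples (left-handers and operating-room nurses) can be worked through with the sketch below. It assumes the large-sample normal-approximation interval, p̂ ± Z × √(p̂(1 - p̂)/n), which is the standard choice for this kind of problem but is an assumption here because the formula did not survive on the slides; the function name is illustrative only.

```python
from math import sqrt
from statistics import NormalDist


def proportion_confidence_interval(successes, n, confidence=0.95):
    """Normal-approximation CI for a single population proportion."""
    p_hat = successes / n                               # point estimate
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 1.96 for 95%
    margin = z * sqrt(p_hat * (1 - p_hat) / n)          # Z x SE of p_hat
    return p_hat - margin, p_hat + margin


# Example 1: 25 left-handers in a sample of 100
left_handed = proportion_confidence_interval(25, 100)
# Example 2: 60 breast cancer cases among 10,000 nurses
breast_cancer = proportion_confidence_interval(60, 10_000)

print("Example 1:", tuple(round(limit, 3) for limit in left_handed))
print("Example 2:", tuple(round(limit, 4) for limit in breast_cancer))
```

Under these assumptions the intervals come out at roughly 0.165 to 0.335 for Example 1 and roughly 0.0045 to 0.0075 for Example 2.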


Hypothesis Testing


• The purpose of Hypothesis Testing is to aid the
researcher in reaching a decision (conclusion)
concerning a population by examining a
sample from that population.


Hypothesis

• Is a statement about one or more populations

• Is a claim (assumption) about a population
parameter


Examples of Research Hypotheses
Population Mean
• The average length of stay of patients
admitted to the hospital is five days

• The mean birth weight of babies delivered by

mothers with low SES is lower than those from
higher SES.
• Etc


Types of Hypothesis
1. The Null Hypothesis, H0

· Is a statement claiming that there is no difference between the

hypothesized value and the population value.
· (The effect of interest is zero = no difference)

· States the assumption (hypothesis) to be tested

· H0 is always about a population parameter, not about a sample statistic

· Begin with the assumption that the Ho is true

– Similar to the notion of innocent until proven guilty


Cont.
2. The Alternative Hypothesis, HA

• Is a statement of what we will believe is true if our sample data

causes us to reject Ho.

• Is generally the hypothesis that is believed (or needs to be

supported) by the researcher.

• Is a statement that disagrees (opposes) with Ho

(The effect of interest is not zero)


Steps in Hypothesis Testing
1. Formulate the appropriate statistical hypotheses
clearly
• Specify Ho and HA
Ho: μ = μ0      Ho: μ ≤ μ0      Ho: μ ≥ μ0
HA: μ ≠ μ0      HA: μ > μ0      HA: μ < μ0
two-tailed      one-tailed      one-tailed
2. State the assumptions necessary for computing
probabilities
• A distribution is approximately normal (Gaussian)
• Variance is known or unknown


Cont.
3. Select a sample and collect data
• Categorical, continuous

4. Decide on the appropriate test statistic for

the hypothesis. E.g., One population


Cont.
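5. Specify the desired level of significance for
the statistical test (α = 0.05, 0.01, etc.)
6. Determine the critical value.
– A value the test statistic must attain to be
declared significant.
– For example, at α = 0.05 the critical Z values are ±1.96 for a two-tailed test, and 1.645 (upper tail) or -1.645 (lower tail) for a one-tailed test.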


7. Obtain sample evidence and compute the
test statistic
8. Reach a decision and draw the conclusion
• If Ho is rejected, we conclude that HA is true (or
accepted).
• If Ho is not rejected, we conclude that Ho may
be true.


Rules for Stating Statistical Hypotheses

1. One population
• Indication of equality (either =, ≤ or ≥) must appear
in Ho.
Ho: μ = μo, HA: μ ≠ μo
Ho: P = Po, HA: P ≠ Po
• Can we conclude that a certain population mean is
– not 50?
Ho: μ = 50 and HA: μ ≠ 50
– greater than 50?
Ho: μ ≤ 50 HA: μ > 50


Cont.

• Can we conclude that the proportion of

patients with leukemia who survive more than
six years is not 60%?
Ho: P = 0.6 HA: P ≠ 0.6


Statistical Decision

• Reject Ho if the value of the test statistic that

we compute from our sample is one of the
values in the rejection region

• Don’t reject Ho if the computed value of the

test statistic is one of the values in the non-
rejection region.


Another way to state conclusion

• Reject Ho if P-value < α

• Do not reject Ho if P-value ≥ α

• The P-value is the probability of obtaining a test statistic as extreme as or more extreme than the one actually observed, if the Ho is true.

• The larger the absolute value of the test statistic, the smaller the P-value; and the smaller the P-value, the stronger the evidence against the Ho.


Types of Errors in Hypothesis Tests

• Whenever we reject or fail to reject the Ho, we may commit an error.

• Two types of errors can be committed:

– Type I Error
– Type II Error


Type I Error
• The probability of a type I error is the
probability of rejecting the Ho when it is true

• The probability of type I error is α

• Called level of significance of the test

• Set by researcher in advance


Type II Error
• The error committed when a false Ho is not
rejected

• The probability of Type II Error is β

• Usually unknown but larger than α


Cont.

Action (Conclusion)    Reality: Ho True                          Reality: Ho False
Do not reject Ho       Correct action (Prob. = 1-α)              Type II error (Prob. = β = 1-Power)
Reject Ho              Type I error (Prob. = α = Sign. level)    Correct action (Prob. = Power = 1-β)

Type I & II Error Relationship


Hypothesis Testing of a Single Mean
(Normally Distributed)


Known Variance


Example: Two-Tailed Test
1. A simple random sample of 10 people from a certain population
has a mean age of 27. Can we conclude that the mean age of the
population is not 30? The variance is known to be 20. Let α = 0.05.

• Answer: yes, if we can reject the Ho that it is 30.

A. Data
n = 10, sample mean = 27, σ² = 20, α = 0.05
B. Assumptions
Simple random sample
Normally distributed population


Cont.
C. Hypotheses
Ho: µ = 30
HA: µ ≠ 30
D. Test statistic
As the population variance is known, we use Z as the
test statistic.


Cont.
E. Decision Rule
• Reject Ho if the Z value falls in the rejection region.
• Don’t reject Ho if the Z value falls in the non-rejection region.
• Because of the structure of Ho, it is a two-tailed test. Therefore, reject Ho if
Z ≤ -1.96 or Z ≥ 1.96.


F. Calculation of test statistic
$Z = \dfrac{\bar{x} - \mu_0}{\sigma/\sqrt{n}} = \dfrac{27 - 30}{\sqrt{20/10}} = \dfrac{-3}{1.414} = -2.12$

G. Statistical decision
We reject the Ho because Z = -2.12 is in the rejection region. The
result is significant at α = 0.05.
H. Conclusion
We conclude that µ is not 30. P-value = 0.0340

A Z value of -2.12 corresponds to a lower-tail area of 0.0170. Since there are two
parts to the rejection region in a two-tailed test, the P-value is twice this, which
is 0.0340.


Hypothesis test using
confidence interval
• A problem like the above example can also be solved
using a confidence interval.

• The 95% confidence interval will show that the hypothesized value (µ = 30) does not fall within the boundaries of the interval, so Ho is rejected. However, it will not give a P-value.
• Confidence interval: $27 \pm 1.96\sqrt{20/10} = 27 \pm 2.77 = (24.23,\ 29.77)$


Example: One -Tailed Test

• A simple random sample of 10 people from a certain

population has a mean age of 27. Can we conclude that the
mean age of the population is less than 30? The variance is
known to be 20. Let α = 0.05.
• Data
n = 10, sample mean = 27, σ² = 20, α = 0.05
• Hypotheses
Ho: µ ≥ 30, HA: µ < 30


• Test statistic
$Z = \dfrac{27 - 30}{\sqrt{20/10}} = -2.12$

• Rejection region (lower-tail test)

• With α = 0.05 and HA: µ < 30, the entire rejection region is in the left tail.
The critical value is Z = -1.645. Reject Ho if Z < -1.645.


Cont.

• Statistical decision
– We reject the Ho because -2.12 < -1.645.

• Conclusion
– We conclude that µ < 30.
– P-value = 0.0170 this time, because this is a one-tailed test rather than a two-tailed test.
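
The two age examples (two-tailed and one-tailed) can be sketched in Python as follows. The sketch assumes the known-variance Z statistic, Z = (sample mean - µ0)/√(σ²/n), and reads P-values from the standard normal distribution in the standard library; the function name and the `alternative` labels are illustrative only. Because Z is not rounded to -2.12 before the lookup, the P-values may differ from the table-based slide values in the fourth decimal place.

```python
from math import sqrt
from statistics import NormalDist


def z_test_mean(sample_mean, mu0, variance, n, alternative="two-sided"):
    """Z test for a single mean with known population variance."""
    z = (sample_mean - mu0) / sqrt(variance / n)
    std_normal = NormalDist()
    if alternative == "two-sided":      # HA: mu != mu0
        p_value = 2 * std_normal.cdf(-abs(z))
    elif alternative == "less":         # HA: mu < mu0
        p_value = std_normal.cdf(z)
    elif alternative == "greater":      # HA: mu > mu0
        p_value = 1 - std_normal.cdf(z)
    else:
        raise ValueError("alternative must be 'two-sided', 'less' or 'greater'")
    return z, p_value


# n = 10, sample mean = 27, variance = 20, hypothesized mean = 30
z, p_two_sided = z_test_mean(27, 30, 20, 10, "two-sided")
_, p_lower_tail = z_test_mean(27, 30, 20, 10, "less")

alpha = 0.05
print(f"Z = {z:.2f}")
print(f"two-tailed P = {p_two_sided:.4f}, reject Ho: {p_two_sided < alpha}")
print(f"one-tailed P = {p_lower_tail:.4f}, reject Ho: {p_lower_tail < alpha}")
```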


Unknown Variance
• In most practical applications the standard deviation of
the underlying population is not known

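• In this case, σ can be estimated by the sample standard deviation s.

• If the underlying population is normally distributed, then the test statistic is:

$t = \dfrac{\bar{x} - \mu_0}{s/\sqrt{n}}$, which follows Student's t distribution with n - 1 degrees of freedom when Ho is true.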


Example: Two-Tailed Test
• A simple random sample of 14 people from a certain population gives
a sample mean body mass index (BMI) of 30.5 and sd of 10.64. Can we
conclude that the mean BMI is not 35 at α = 5%?

• Ho: µ = 35, HA: µ ≠ 35

• Test statistic
$t = \dfrac{30.5 - 35}{10.64/\sqrt{14}} = \dfrac{-4.5}{2.844} = -1.58$

• If the assumptions are correct and Ho is true, the test statistic follows
Student's t distribution with 13 degrees of freedom.


Cont.
• Decision rule
– We have a two tailed test. With α = 0.05 it means that each tail is 0.025. The
critical t values with 13 df are -2.1604 and 2.1604.
– We reject Ho if the t ≤ -2.1604 or t ≥ 2.1604.

• Do not reject Ho because -1.58 is not in the rejection region. Based on

the data of the sample, it is possible that µ = 35. P-value = 0.1375
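
The BMI example can be checked with a minimal Python sketch. It assumes the one-sample t statistic, t = (sample mean - µ0)/(s/√n), and compares it with the critical value 2.1604 quoted above from the t table (13 df, two-tailed, α = 0.05) instead of computing a P-value; the function name is illustrative only.

```python
from math import sqrt


def t_statistic(sample_mean, mu0, sd, n):
    """One-sample t statistic when the population variance is unknown."""
    return (sample_mean - mu0) / (sd / sqrt(n))


# n = 14, sample mean BMI = 30.5, sd = 10.64, hypothesized mean = 35
t_value = t_statistic(30.5, 35, 10.64, 14)
t_critical = 2.1604  # from the t table: 13 df, 0.025 in each tail

# Two-tailed decision rule: reject Ho if |t| >= critical value
reject_h0 = abs(t_value) >= t_critical
print(f"t = {t_value:.2f}, reject Ho: {reject_h0}")  # prints t = -1.58, reject Ho: False
```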


Sampling from a population that is not normally
distributed

• Here, we do not know if the population displays a

normal distribution.

• However, with a large sample size, we know from the Central Limit Theorem that the sampling distribution
of the sample mean is approximately normal.


Cont.
• With a large sample, we can use Z as the test statistic
calculated using the sample sd.


Hypothesis Tests for Proportions
• Involves categorical values

• Two possible outcomes

– “Success” (possesses a certain

characteristic)
– “Failure” (does not possess that
characteristic)

• Fraction or proportion of population in the “success”

category is denoted by p


Proportions


Hypothesis Testing about a Single Population
Proportion

(Normal Approximation to Binomial Distribution)

Example
• We are interested in the probability of developing asthma
over a given one-year period for children 0 to 4 years of age
whose mothers smoke in the home. In the general population
of 0 to 4-year-olds, the annual incidence of asthma is 1.4%. If
10 cases of asthma are observed over a single year in a
sample of 500 children whose mothers smoke, can we
conclude that this is different from the underlying probability
of p0 = 0.014? α = 5%

H0 : p = 0.014
HA: p ≠ 0.014


Cont.
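• The test statistic is given by:
$Z = \dfrac{\hat{p} - p_0}{\sqrt{p_0(1-p_0)/n}} = \dfrac{0.02 - 0.014}{\sqrt{0.014(0.986)/500}} = 1.14$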


Cont.
• The critical value of Zα/2 at α=5% is ±1.96.

• Don’t reject Ho since Z (= 1.14) lies in the non-rejection region between ±1.96.

• P-value = 0.2548

• We do not have sufficient evidence to conclude that the

probability of developing asthma for children whose
mothers smoke in the home is different from the
probability in the general population
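
The asthma example can be reproduced with the sketch below. It assumes the normal approximation to the binomial, with the standard error computed under Ho as √(p0(1 - p0)/n); the function name is illustrative only. The P-value is computed from the unrounded Z, so it may differ slightly from the table-based value on the slide.

```python
from math import sqrt
from statistics import NormalDist


def z_test_proportion(successes, n, p0):
    """Two-tailed Z test for a single proportion (normal approximation)."""
    p_hat = successes / n                     # observed proportion
    standard_error = sqrt(p0 * (1 - p0) / n)  # SE under the null hypothesis
    z = (p_hat - p0) / standard_error
    p_value = 2 * NormalDist().cdf(-abs(z))   # both tails
    return z, p_value


# 10 asthma cases among 500 children; general-population incidence is 1.4%
z_asthma, p_asthma = z_test_proportion(10, 500, 0.014)
alpha = 0.05
print(f"Z = {z_asthma:.2f}, P = {p_asthma:.2f}, reject Ho: {p_asthma < alpha}")
```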


Sample size determination


Sample Size

• Sample Size: The number of study subjects selected to

represent a given study population.

• In estimating a certain characteristic of a population,

sample size calculations are important to ensure that
estimates are obtained with required precision or
confidence

• Should be sufficient to represent the characteristics of

interest of the study population.


Cont.

Sample size determination depends on the:

– Objective of the study
– Design of the study
• Descriptive/Analytic
– Accuracy of the measurements to be made
– Degree of precision required for generalization
– Plan for statistical analysis
– Degree of confidence with which to conclude


Cont.

• The feasible sample size is also determined by

the availability of resources:
– time
– manpower
– transport
– available facility, and
– money


Sample size for single sample

A. Sample size for estimating a single

population mean

B. Sample size to estimate a single population

proportion


A. Sample size for estimating a single
population mean

$n = \dfrac{(Z_{\alpha/2})^2\,\sigma^2}{d^2}$

• where d = margin of error (written as e in some textbooks)
= absolute precision
= half of the width (w) of the CI


Examples:
1. Find the minimum sample size needed to estimate the drop
in heart rate (µ) for a new study using a higher dose of
propranolol than the standard one. We require that the
two-sided 95% CI for µ be no wider than 5 beats per minute
and the sample sd for change in heart rate equals 10 beats
per minute.
n = (1.96)² (10)² / (2.5)² = 61.5 ≈ 62 patients


2. Suppose that for a certain group of cancer patients, we
are interested in estimating the mean age at diagnosis.
We would like a 95% CI that is 5 years wide. If the population
SD is 12 years, how large should our sample be?

Cont.

• Suppose d=1
• Then the sample size increases

Cont.

3. A hospital director wishes to estimate the

mean weight of babies born in the hospital.
How large a sample of birth records should be
taken if she/he wants a 95% CI that is 0.5 wide?
Assume that a reasonable estimate of σ is 2.
Ans: 246 birth records.
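
The three sample-size examples can be worked through with the Python sketch below. It assumes the formula n = (Z × σ / d)², with d equal to half the desired CI width and the result rounded up to the next whole subject; the function name is illustrative only.

```python
from math import ceil
from statistics import NormalDist


def sample_size_mean(sigma, d, confidence=0.95):
    """Minimum n to estimate a mean to within +/- d at the given confidence."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 1.96 for 95%
    return ceil((z * sigma / d) ** 2)                   # always round up


# Example 1: sd = 10 beats/min, CI no wider than 5, so d = 2.5
n_heart_rate = sample_size_mean(sigma=10, d=2.5)
# Example 2: SD = 12 years, CI 5 years wide, so d = 2.5; then d = 1
n_age = sample_size_mean(sigma=12, d=2.5)
n_age_d1 = sample_size_mean(sigma=12, d=1)
# Example 3: sigma = 2, CI 0.5 wide, so d = 0.25
n_birth_weight = sample_size_mean(sigma=2, d=0.25)

print(n_heart_rate, n_age, n_age_d1, n_birth_weight)  # prints 62 89 554 246
```

Halving the margin of error roughly quadruples the required sample size, which is why tightening d from 2.5 to 1 in Example 2 raises n so sharply.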


But the population variance σ² is most of the
time unknown
As a result, it has to be estimated from:
• Pilot or preliminary sample:
– Select a pilot sample and estimate σ² with
the sample variance, s²
• Previous or similar studies


B. Sample size to estimate a single
population proportion


Cont.

1. Suppose that you are interested in knowing the
proportion of infants who are breastfed beyond 18 months
of age in a rural area. Suppose that in a similar
area, the proportion (p) of breastfed infants was
found to be 0.20. What sample size is required to
estimate the true proportion within ±3 percentage points
with 95% confidence? Let p = 0.20, d = 0.03, α = 5%
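
The breastfeeding example can be sketched in Python as follows. The formula used, n = Z² × p(1 - p) / d², is the standard one for estimating a single proportion but is an assumption here, since the formula did not survive on the slide; the function name is illustrative only, and no correction for finite population size or non-response is applied.

```python
from math import ceil
from statistics import NormalDist


def sample_size_proportion(p, d, confidence=0.95):
    """Minimum n to estimate a proportion to within +/- d."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 1.96 for 95%
    return ceil(z ** 2 * p * (1 - p) / d ** 2)          # always round up


# p = 0.20 from a similar area, d = 0.03 (3 percentage points), 95% confidence
n_breastfed = sample_size_proportion(p=0.20, d=0.03)
print(n_breastfed)  # prints 683
```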

Sample Size: Two Samples

A. Estimation of the difference between two

population means

B. Estimation of the difference between two

population proportions


A. Sample size for estimating a difference in two
means


B. Sample size for estimating a difference in two
proportions


Data Screening


Data check entry

• One of the first steps to proper data screening is to

ensure the data is correct

– Check out each person’s entry individually

• Makes sense if small data set or proper data checking procedure

• Can be too costly so…

– range of data should be checked


Normality
• All of the continuous data we are covering need
to follow a normal curve

• Skewness (univariate) – this represents the asymmetry of the distribution of the data


Cont.

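• The skewness statistic and its standard error (SE) are output by SPSS; the Z score for skewness is

$Z_{\text{skewness}} = \dfrac{S_{\text{skewness}}}{SE_{\text{skewness}}}$

• |Z skewness| > 3.2 indicates a violation of the skewness assumption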


Cont.
• Kurtosis (univariate) – is how peaked the data is; the kurtosis statistic and its standard error (SE) are output by SPSS

$Z_{\text{kurtosis}} = \dfrac{S_{\text{kurtosis}}}{SE_{\text{kurtosis}}}$

• |Z kurtosis| > 3.2 indicates a violation of the kurtosis assumption

– for most statistics the skewness assumption is more important than the
kurtosis assumption
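
The Z-score screening rule for skewness can be sketched in Python as below. The skewness statistic and its standard error are computed with the adjusted Fisher-Pearson formulas, which are assumed here to match what SPSS reports; the function name and the data values are hypothetical and used only for illustration.

```python
from math import sqrt
from statistics import mean, stdev


def skewness_z(values):
    """Return (skewness, SE of skewness, Z score) for a list of numbers."""
    n = len(values)
    m, s = mean(values), stdev(values)  # sample mean and sample sd
    # Adjusted Fisher-Pearson skewness coefficient
    skew = n / ((n - 1) * (n - 2)) * sum(((x - m) / s) ** 3 for x in values)
    # Standard error of skewness, which depends only on n
    se = sqrt(6 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))
    return skew, se, skew / se


# Hypothetical, perfectly symmetric data: skewness should be zero
skew, se, z_skew = skewness_z([1, 2, 3, 4, 5])
violation = abs(z_skew) > 3.2  # screening rule from the slide
print(f"skewness = {skew:.3f}, SE = {se:.3f}, Z = {z_skew:.3f}, violation: {violation}")
```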


Outliers

• technically it is a data point outside of your distribution; so potentially detrimental
because it may have an undue effect on the
distribution


Linearity

• relationships among variables are linear in

nature; assumption in most analyses


Homoscedasticity

• For grouped data this is the same as

homogeneity of variance

• For ungrouped data – variability for one variable is the same at all levels of another
variable (no variance interaction)


Multicollinearity/Singularity

• If correlations between two variables are excessive

(e.g. 0.95) then this represents multicollinearity

• If correlation is 1 then you have singularity

• Often Multicollinearity/Singularity occurs in data

because one variable is a near duplicate of another
