Introduction to Biostatistics
Dr. Shakil Shams
M.B.B.S., M.Phil. (Anatomy)
Assistant Professor
Department of Anatomy
Dhaka Medical College, Dhaka
Topics outline
Statistics and Biostatistics
Research
Research methodology
Data
Variable
Scales of measurement
Study design
Sampling & sampling methods
Data presentation
Topics outline (Contd.)
Probability, P-value & probability distribution
Data interpretation
Data analysis
Estimation
Hypothesis & hypothesis testing
Tests of Significance
Sensitivity & specificity
Correlation & regression
STATISTICS & BIOSTATISTICS
Statistics
It is the science which deals with development and
application of the most appropriate methods for the –
• collection of data
• presentation of collected data
• analysis & interpretation of the results
• making decisions on the basis of such analysis
Statisticians try to interpret and
communicate the results to others
Biostatistics
It is the branch of applied statistics directed toward
applications in the health sciences and biology.
Why biostatistics?
some statistical methods are more heavily used in
health applications than elsewhere e.g. survival
analysis
examples are drawn from health sciences - makes
subject more appealing to those interested in health
illustrates how to apply methodology to similar
problems encountered in real life health situations
Biostatistics (Contd.)
Statistics is not merely a compilation of
computational techniques- it is a way of learning
from data
Biostatistics is concerned with learning from
biological, public health, and other health data
Basic Biostatistics 9
Biostatisticians are:
Data detectives who uncover
patterns and clues through data
description and exploration
Data judges who confirm and
adjudicate decision using
inferential methods
Basic Biostatistics
Knife for Surgeon
Biostatistics for Medical researcher
RESEARCH
Research
Systematic, scientific and ethical search to
explore new knowledge for solving a particular
problem
"Exitus" (death) table compiled by Dr. Sigmund Rascher
Attempt no. | Water temperature | Body temperature when removed from the water | Body temperature at death | Time in water | Time of death
5 | 5.2 °C (41.4 °F) | 27.7 °C (81.9 °F) | 27.7 °C (81.9 °F) | 66' | 66'
13 | 6 °C (43 °F) | 29.2 °C (84.6 °F) | 29.2 °C (84.6 °F) | 80' | 87'
14 | 4 °C (39 °F) | 27.8 °C (82.0 °F) | 27.5 °C (81.5 °F) | 95' | 95'
16 | 4 °C (39 °F) | 28.7 °C (83.7 °F) | 26 °C (79 °F) | 60' | 74'
23 | 4.5 °C (40.1 °F) | 27.8 °C (82.0 °F) | 25.7 °C (78.3 °F) | 57' | 65'
25 | 4.6 °C (40.3 °F) | 27.8 °C (82.0 °F) | 26.6 °C (79.9 °F) | 51' | 65'
29 | 4.2 °C (39.6 °F) | 26.7 °C (80.1 °F) | 25.9 °C (78.6 °F) | 53' | 53'
Is this research?
Is this research too?
“Among all criminals and murderers, the most
dangerous type is the criminal physician.”
Miklos Nyiszli, prisoner at Auschwitz
Nuremberg Code of Medical Research, 1947:
adopted 10 principles
informed written consent from the participant is
the first pre-requisite
Purpose of research
discover new facts or principles
verify and test old facts or principles
fresh interpretation of known fact
develop tool, concept, theory
Types of research
Basic research
Applied research
Quantitative research
Qualitative research
Descriptive research
Analytical research
Basic (pure) research
• deals with basic processes of life, disease & social /
natural phenomenon
• generate new ideas, principles & theories
• intellectually & academically interesting
• has no immediate utility for any existing pressing
problem
e.g.
How does a malignant cell multiply?
How are genes regulated?
Applied (practical) research
• existing problem oriented research directed to solve
an immediate pressing problem
• inspired by the need of social action
e.g.
risk factors of early onset MI
prevention of high infant mortality
Quantitative research
• deals with phenomena that can be quantified and
expressed in terms of quantity/amount
• measures and compares the magnitude of a problem
or phenomenon
• data collected in numbers
e.g.
height, weight, skinfold thickness, upper arm and calf
girth, breadth of humerus and femur
study on dietary program for reducing body weight
Qualitative research
• deals with phenomena that cannot be quantified
and cannot be expressed in terms of quantity
• explains, describes and understands the phenomenon
to answer about the complex nature of it
• data collected in words
e.g.
People’s opinion about the services provided by
Dhaka Medical College Hospital
Perception of security at work place by the female
garment-manufacturing workers
Descriptive research
• describes situations and events as they exist or existed
naturally, to answer the questions who, what, when, which and
where
• researcher has only to report what is happening or
what has happened - no explanation & no cause-effect
relation
e.g.
prevalence, distribution & pattern of goiter
antenatal care practice in rural area
Analytical (explanatory) research
• explains the reasons of the phenomenon that the descriptive
research observed
• deals with the determinants of the phenomenon
• attempts to establish cause-effect relationship between
variables
• uses facts already available & analyzes these to make critical
evaluation
Analytical research (Contd.)
• answers the question how/why
e.g.
Why is goiter more common in some areas of Bangladesh?
Is analytical research superior to
descriptive research?
Stages of research
1) Planning of research
• Generation of idea and problem identification
• Knowledge building about the problem (review of literature)
• Statement of the research problem
• Statement of research question (RQ)
• Statement of research hypothesis
• Setting of research objective
• Deciding on appropriate study population & study design
• Deciding on appropriate sample size & sampling technique
• Deciding on data collection plan
• Development of data collection instrument
Stages of research (Contd.)
2) Implementation or data collection
3) Data management
• Data editing
• Data reduction
• Data presentation
• Data analysis
• Data interpretation
• Data inference / decision
4) Report writing
5) Dissemination / publication of the report
Some of these steps may go
simultaneously & repeatedly
throughout the research process
List of topics :
Research methodology & research method
Some steps of planning of research
Research problem
Knowledge building/ literature review
Research title
Research question(s)
Research hypothesis
Research objective(s)
Research questionnaire
Research methodology
“The grand aim of all science is to cover the greatest
number of empirical facts by logical deduction from
the smallest number of hypotheses or axioms.”
Albert Einstein
Research methodology
It is the way to deal with the various steps adopted by a
researcher to study the research problem systematically
along with the logic, assumptions and rationale behind
them.
Whenever we choose a research method, we must justify
why we preferred this over others. Research methodology
seeks to answer this question.
The two terms research methodology and
research method are often used
interchangeably. This is incorrect.
Research methods
Techniques and tools used for conducting research.
These are -
1. methods dealing with collecting & describing data
2. techniques used for establishing statistical relationship
between variables e.g. statistical tests, correlation-regression
analysis, odds ratio etc.
3. methods used to evaluate the reliability, validity and
accuracy of the results derived from the data e.g.
sensitivity, specificity, PPV, NPV etc.
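As a rough illustration of the third group of methods, the sketch below computes sensitivity, specificity, PPV and NPV from a 2x2 table of test result against true status. The function name `diagnostic_measures` and the counts are hypothetical, chosen only for illustration; they do not come from any study in this lecture.

```python
def diagnostic_measures(tp, fp, fn, tn):
    """Return the four basic measures from a 2x2 diagnostic table.

    tp/fn: diseased subjects with a positive/negative test
    fp/tn: non-diseased subjects with a positive/negative test
    """
    return {
        "sensitivity": tp / (tp + fn),  # diseased correctly detected
        "specificity": tn / (tn + fp),  # non-diseased correctly cleared
        "ppv": tp / (tp + fp),          # positives that are truly diseased
        "npv": tn / (tn + fn),          # negatives that are truly disease-free
    }


# Hypothetical counts, for illustration only
measures = diagnostic_measures(tp=90, fp=20, fn=10, tn=80)
for name, value in measures.items():
    print(f"{name}: {value:.3f}")
```

With these made-up counts the test detects 90 of 100 diseased subjects and clears 80 of 100 non-diseased subjects; PPV and NPV additionally depend on how common the disease is in the group tested.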
When we speak about research methodology we not
only talk about the research methods but also keep in
view the logic behind the method that we use in the
context of our research undertaken.
Research methodology has many dimensions; research
methods are part of research methodology.
Research problem
Research problem is a perceived difficulty or a feeling of
discomfort which a researcher experiences due to the
discrepancy between the existing situation and what it
should be
“Why do I not get what I wanted?”
Research problem
“Why do I not want what I got?”
Research problem
Problem = Expectation – Reality
How does research problem arise?
Research problem arises from:
Day-to-day personal experiences
Practical issues in the hospital and community
Findings of the previous researches
Brain storming
Intuition
Criteria of research problem
Must be -
researchable
important
feasible
ethically acceptable
Should be interesting
Knowledge building
for obtaining in-depth knowledge & insights about the research
problem
done by:
literature review
review of books, conference proceedings, reports
thesis, newspaper, magazines
experience survey
focus group interview
case study
pilot study
Purpose of Knowledge Building:
avoidance of duplication / repetition
finding of gaps / conflicting information
identification of unanswered questions
discovery of fallacy / inconsistency
identification of study variables
development of research question, hypothesis, objective
idea generation about population, study design, sample
size, sampling, statistical process & study procedure
Knowledge building about the problem
Before undertaking research work 60-70%
During the research work 10-20%
After completing data analysis but before writing the research report 15-20%
These proportions are not rigid.
Research title
Should be accurate, complete and specific
For accuracy, should use the same terms in the title as in
the question and answer
For completeness, should include all information that
reflect all the main topics
For specificity, should use specific words
Research title (Contd.)
Title should be unambiguous
avoid noun clusters
avoid misplaced adjectives
not use abbreviations
Title should be concise
<100 characters and spaces
omit unnecessary words
omit nonspecific openings such as "Studies of"
omit "the" at the beginning of the title
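The mechanical parts of these rules can be checked automatically. The sketch below is only an illustration: the function name `check_title` and the message strings are made up, and it covers only the length limit and the two opening rules. Ambiguity, noun clusters and abbreviations still need a human reader.

```python
# Openings named on the slide; extend this tuple if you want to flag others.
NONSPECIFIC_OPENINGS = ("studies of",)


def check_title(title: str) -> list[str]:
    """Return a list of rule violations found in a research title."""
    problems = []
    stripped = title.strip()
    lowered = stripped.lower()
    if len(stripped) >= 100:  # rule: fewer than 100 characters and spaces
        problems.append("100 characters or more")
    if lowered.startswith("the "):  # rule: omit "the" at the beginning
        problems.append("starts with 'the'")
    if lowered.startswith(NONSPECIFIC_OPENINGS):  # rule: no nonspecific opening
        problems.append("nonspecific opening")
    return problems


print(check_title("The effect of exercise on serum cholesterol"))
print(check_title("Studies of hand grip strength in garment workers"))
print(check_title("Effect of exercise on serum cholesterol in adults"))
```

An empty list means that none of the three mechanical rules was violated.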
Research question
Queries or ideas arising out of the research problem, for which the
researcher seeks answers through his or her research effort
research question is a question, not a statement
there may be more than one research question in a research
e.g. Does Hb. concentration fall in pregnancy?
Types of research question
Descriptive study: What, Where, When, Who, Which
Analytical study: How, Why
What is the effect of exercise on serum cholesterol?
How are different foot dimensions related to stature?
Who is affected, in terms of hand grip strength, by machine handling
in garment factories?
Where is the drug found in highest concentration after its
intravenous administration?
Why is 2D:4D sexually dimorphic among adult Bangladeshis?
Criteria of a good research question
FINER
feasible
interesting
novel
ethical
relevant
How to develop a good research question:
1. begin by identifying a broader subject of interest that lends
itself to investigation e.g. childhood obesity
2. do preliminary research on the general topic to find out what
research has already been done and what literature already
exists
3. find a unique area that is yet to be investigated or a particular
question that may be worth replicating
How to develop a good research question (Contd.)
4. begin to narrow the topic by asking open-ended "how" and
"why" questions e.g. consider the factors that are
contributing to childhood obesity or the success rate of
intervention programs
5. create a list of potential questions for consideration and
choose one that interests you and provides an opportunity for
exploration
6. Finally, evaluate the question by using the following list of
guidelines:
Is the research question interesting?
Is it a new issue or problem that needs to be solved?
Is it attempting to shed light on previously researched topic?
Is the research question researchable?
Is the methodology to conduct the research feasible?
Is the research question measurable?
Will the process produce data that can be supported or
contradicted?
Is the research question too broad or too narrow?
Too narrow:
What is the childhood obesity rate in Dhaka?
This is too narrow because it can be answered with a simple statistic.
Questions that can be answered with a "yes" or a "no" should typically be
avoided.
Better:
How does the education level of the parents impact childhood obesity rates
in Dhaka?
This question demonstrates the correct amount of specificity and the results
would provide the opportunity for an argument to be formed.
Unfocused and too broad:
What are the effects of childhood obesity in Bangladesh?
This question is so broad that research methodology would be very difficult.
The question is too broad to be discussed in a typical research paper.
More focused:
How does childhood obesity correlate with academic performance in elementary
school children?
This question has a very clear focus for which data can be collected,
analyzed, and discussed.
Too objective:
How much time do young children spend doing physical activity per day?
This question may allow the researcher to collect data but does not lend
itself to collecting data that can be used to create a valid argument because
the data is just factual information.
More subjective:
What is the relationship between physical activity levels and childhood
obesity?
This is a more subjective question that may lead to the formation of an
argument based on the results and analysis of the data.
Too simple:
How are school systems addressing childhood obesity?
This information can be obtained without the need to collect unique data. The
question could be answered with a simple online search and does not provide
an opportunity for analysis.
Better:
What are the effects of intervention programs in the elementary schools on
the rate of childhood obesity among 3rd - 6th grade students?
This question requires both investigation and evaluation which will lead the
research to form an argument that may be discussed.
PHRASING OF RESEARCH QUESTION
DICTATES THE APPROPRIATE STUDY
DESIGN
Research question | Study design needed
How is the severity of DM related to food habit? | Cross sectional study
Is high serum uric acid concentration in eclampsia associated with adverse fetal outcome? | Case control study
What is the effect of exercise on serum cholesterol? | Cohort study
Is there any influence of spirulina on experimentally induced atherosclerosis in rats? | Experimental study
Is laparoscopic cholecystectomy better than traditional surgery? | Clinical trial
Research hypothesis
It is a logical, tentative (assumed) & testable answer to the research
question
Concerned with the parameters of the population about which the
statement is made
e.g.
Hemoglobin concentration falls during pregnancy.
Hypothesis reflects the depth of knowledge and the capability of
imagination of the researcher
Importance of hypothesis:
Help to select methodology and methods of data collection
Help to select population and variables to be studied
Help to select intervention needed
Types of hypothesis
1. Null hypothesis (H0) :
It is the hypothesis of no difference.
It states –
no real difference between sample statistics &
population parameter
observed result is purely due to chance or sampling
error
Examples:
(1) Mean cholesterol value in normal (M1) = Mean cholesterol
value in hypertension patients ( M2 )
(2) No association between lung cancer and smoking
2. Alternative hypothesis (Ha):
It is the hypothesis of difference.
It states –
sample statistic is different from population parameter
observed result is not due to chance or sampling error
rather due to some valid reason or extraneous factor (real
difference)
Examples:
(1) Mean cholesterol value in normal (M1) <
mean cholesterol value in hypertension patients ( M2 )
(2) There is association between lung cancer and smoking
* In fact, alternative hypothesis is the researcher’s
hypothesis
H0 & Ha are diametrically opposite
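To make the cholesterol example concrete, the sketch below tests H0 (M1 = M2) against Ha (M1 < M2) with an exact permutation test. This technique is swapped in here only because it needs nothing beyond the Python standard library; it is not the method prescribed by this lecture. The function name `one_sided_permutation_p` and the cholesterol values are hypothetical.

```python
from itertools import combinations
from statistics import mean


def one_sided_permutation_p(group1, group2):
    """Exact one-sided permutation p-value for Ha: mean(group2) > mean(group1).

    Under H0 the group labels are interchangeable, so every way of splitting
    the pooled values into two groups of the original sizes is equally likely.
    """
    pooled = list(group1) + list(group2)
    n1 = len(group1)
    observed = mean(group2) - mean(group1)
    as_extreme = 0
    n_splits = 0
    for idx in combinations(range(len(pooled)), n1):
        chosen = set(idx)
        g1 = [pooled[i] for i in chosen]
        g2 = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        if mean(g2) - mean(g1) >= observed:
            as_extreme += 1
        n_splits += 1
    return as_extreme / n_splits


# Hypothetical serum cholesterol values (mg/dL), for illustration only
normal = [170, 180, 175, 165, 185]
hypertensive = [200, 210, 205, 195, 215]

p_value = one_sided_permutation_p(normal, hypertensive)
print(f"one-sided p-value = {p_value:.4f}")
```

A small p-value means the observed difference would rarely arise by chance alone if H0 were true, so H0 is rejected in favour of Ha; a large p-value means the data give no grounds to reject H0.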
• Research question is a must for all research.
• Research hypothesis is needed in
analytical research.
Examples of good and bad hypothesis statements
Steps of constructing a good hypothesis
1. State the research question:
“How does a person’s level of education influence their
attitudes towards immigrant rights?”
Steps of constructing a hypothesis (Contd.)
2. Write a theoretical statement explaining why you think the
independent variable will increase or decrease the dependent
variable.
Possible ways education could affect immigration attitudes:
People with higher levels of education are more politically liberal
than those with low levels of education, so people with higher
levels of education will be more supportive of immigrant rights.
People with less education are more likely than people with more
education to know someone who has recently immigrated to the
U.S., so people with lower levels of education will be more
supportive of immigrant rights.
Steps of constructing a hypothesis (Contd.)
3. State the alternative hypothesis.
“People with a college degree will agree more strongly than
those with no college degree that legal immigrants should
have the same rights as U.S. citizens.”
Steps of constructing a hypothesis (Contd.)
A hypothesis must be falsifiable. This means the hypothesis
can be proved wrong.
So,
4. Finally state the null hypothesis.
“People with a college degree do not agree more strongly than
those with no college degree that legal immigrants should
have the same rights as U.S. citizens.”
Some points to note :
Hypothesis -
gives only the tentative explanation of the research question
is a statement , not a question
is not a must for a research
may or may not be the real situation
is to test, not to prove
It is better for the researcher to follow the research
questions, instead of formulating hypothesis for
the problems on which no research has been
carried out so far
Research objective
Research objectives (SMAART)
• Objectives are goals to be achieved through the research
process
• Reflects the questions whose answers the researcher wants
the study to yield
• Objective covers –
exploration
description
explanation
prediction
evaluation
impact assessment
Research objectives (Contd.)
General objective:
overall goal /aim of the research process
short statement about what is expected to be
achieved by the research
Research objectives (Contd.)
Specific objective:
every individual task or work that is to be done to
achieve the general objective
Ultimate objective:
statement that tells about the benefits, implications and
utilization of study findings
Example (early onset MI)
Research problem:
• Nowadays, people are dying from MI at a very early age.
Research question:
• What are the risk factors of early onset MI?
Research hypothesis:
• Obesity, HTN, DM & dyslipidemia are the causes of early
onset MI.
General objective:
• to find out the risk factors of early onset MI
Specific objective:
• to measure height & body weight of study subjects
• to measure blood pressure of study subjects
• to measure fasting blood glucose of study subjects
• to measure lipid profile of study subjects
Ultimate objective:
• to make the people aware about the risk factors of early
onset MI to reduce the burden of early onset MI
Research questionnaire
A questionnaire is a data collection tool containing a series of
questions that is generally mailed or handed over to the
respondent and filled in by the respondent him/herself or by
the interviewer on behalf of the respondent.
Questionnaire should be simple & clear
Questionnaire should also be adequate in length
“A good speech should be like a woman's skirt; long
enough to cover the subject and short enough to
create interest.”
Winston S. Churchill
Research is not …
an accidental discovery
accidental discovery may occur in structured research
process
merely collection of data
collecting reliable data is part of the research process
searching out published research results in libraries /
internet
research process always includes synthesis and analysis
but just reviewing of literature is not research
Research is a creative and circular process
A brave researcher sits behind the target to see what he can see
&
never distorts the facts even if they are found to be contrary
DATA & VARIABLE
“Data! data! data!” [Holmes] cried impatiently. “I can't make
bricks without clay.”
DATA:
a set of values recorded on observational units
VARIABLE:
characteristic or attribute of an individual / object /
phenomenon that takes on different values in different
persons / objects or in the same person / object at different
times, places etc.
Example:
Variable Data
Blood pressure 120 mm of Hg, Hypertension
Age 50 yrs., Old, Young
Types of data
1. Qualitative (categorical) data
• data that vary in kinds
• expressed as rate, ratio, percentage, proportion
• have no scale of measurement
• provides answer to the question : What type?
• two types :
a) Nominal : categories cannot be ordered one
above another e.g. Sex , Marital status
b) Ordinal : categories can be ordered one
above another
e.g. Level of knowledge, Pain score
2. Quantitative (numerical) data
• data that vary in amount and can be measured and ordered in
terms of quantity
• expressed as mean, range etc.
• have scale of measurement
• provides answer to the question : How much?
• two types –
a) Continuous : take any value even fractions or decimals
e.g. height, weight
b) Discrete : take only whole numbers
e.g. family member
Exercise 1:
To check the accuracy of the clinical diagnosis of malaria,
blood slides from 33 patients were examined for MPs.
There were three possible results : Negative, P.
falciparum or P. vivax.
The results were :
Negative 19
P. falciparum 13
P. vivax 1
TOTAL 33
These data are:
Nominal/Ordinal/Continuous/Discrete
Exercise 2:
Health personnel from 148 rural health institutions were
asked the following question : “How often have you run
out of anti-malarial drugs in the last two years?”
There were four possible answers : never , 1 to 2 times
(rarely) , 3 to 5 times (occasionally) , more than 5 times
(frequently).
The results were :
Never 47
Rarely 71
Occasionally 24
Frequently 6
TOTAL 148
These data are:
Ordinal
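Because the categories in Exercise 2 have a natural order, cumulative percentages and a median category are meaningful for them, which would not be true of nominal data. The sketch below tabulates the counts given in the exercise; the variable names are made up for illustration.

```python
# Counts from Exercise 2, listed in their natural (ordinal) order
counts = {"Never": 47, "Rarely": 71, "Occasionally": 24, "Frequently": 6}
total = sum(counts.values())

rows = []
cumulative = 0
for category, n in counts.items():
    cumulative += n
    rows.append((category, n, 100 * n / total, 100 * cumulative / total))

# Median category: the category that contains the middle observation
midpoint = (total + 1) / 2
running = 0
for category, n in counts.items():
    running += n
    if running >= midpoint:
        median_category = category
        break

for category, n, pct, cum_pct in rows:
    print(f"{category:<13}{n:>4}{pct:>8.1f}%{cum_pct:>8.1f}%")
print("Median category:", median_category)
```

The same table for the nominal malaria results in Exercise 1 would stop at counts and percentages, since "Negative", "P. falciparum" and "P. vivax" have no order to accumulate over.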
At the end, statistics is a game of numbers, be it a
qualitative or quantitative data.
Types of variable
1. Independent variable (Usually the cause)
• variable that influences, regulates or causes some changes
in the dependent variable
• selected for the study in the belief that it is a contributory
factor or at least can influence the problem
• has causal or input or exposure status
2. Dependent variable (Usually the effect)
• a measure that reflects the effect of independent
variable
• selected for the study in the belief that it helps to
describe the problem
• has output or outcome or effect or response status
DOORE
Example :
Independent V. Dependent V.
Salt intake Hypertension
Hypertension MI
Finger tip to root Digit 3 Hand grip strength
3. Intervening (Intermediate) variable:
• 3rd variable through which independent variable affects the
dependent variable
• fits into a causal chain
e.g.
Independent V. | Intervening V. | Dependent V.
Salt intake | Hypertension | MI
Low economic status | Inadequate diet | Underweight
Example from research setting :
It is expected that the incidence of diarrhoea would
decrease as the number of water faucets in a village
increased. If there is no change over time, there might
be an intervening variable.
People, for example, may dislike the taste of tap-water so
much that they use it for everything, except for
drinking.
4. Confounding (Extraneous) variable:
• 3rd variable that is independently related to both
dependent & independent variables and thereby might
affect the relationship between these two
• not in a causal chain
• not related to the purpose of study
• distort the study result
e.g.
Obesity → MI, with Hypertension as the confounding variable
Possible combinations:
Obesity (+) Hypertension (+) MI (+)
Obesity (+) Hypertension (+) MI (-)
Obesity (+) Hypertension (-) MI (+)
Obesity (+) Hypertension (-) MI (-)
Obesity (-) Hypertension (+) MI (+)
Obesity (-) Hypertension (+) MI (-)
Obesity (-) Hypertension (-) MI (+)
Obesity (-) Hypertension (-) MI (-)
Other terms related to data
Primary data
• Obtained 1st hand by researcher
• Generated by observation, measurement, interview etc.
Secondary data
• Obtained by some others
• Have already passed through statistical processes
• Collected from records, documents, journals , other studies
Derived data
• Derived from primary or secondary data
e.g. BMI derived (calculated) from body weight and height
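As a small illustration of derived data, the sketch below calculates BMI from the two primary measurements using the standard formula, weight in kilograms divided by the square of height in metres. The function name `bmi` and the example values are hypothetical.

```python
def bmi(weight_kg: float, height_m: float) -> float:
    """Body mass index: weight (kg) divided by height (m) squared."""
    return weight_kg / height_m ** 2


# Hypothetical subject, for illustration only
weight_kg = 70.0  # primary data: measured body weight
height_m = 1.75   # primary data: measured height
print(f"BMI = {bmi(weight_kg, height_m):.1f} kg/m^2")
```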
Dichotomous (Binary) data
• Expressing only two mutually exclusive information
e.g. Sex (male or female)
Univariate data
• Express only one information
e.g. Birth rate of female baby
Bivariate data
• Express two linked/related information simultaneously
e.g. Birth rate of Rh+ve female baby
Multivariate data
• Express more than two related information simultaneously
e.g. Birth rate of Rh+ve female premature baby
Outlier data
• Distinct from the main body of data
• Incompatible with the rest of the data
• Usually regarded as error but may be true also
SCALES OF MEASUREMENT
Scales of Measurement
Nominal: Classification
Ordinal: Ranking
Interval: Equal interval
Ratio: Absolute zero
Scale Classification Order Equal Intervals Zero
Nominal Yes No No No
Ordinal Yes Yes No No
Interval Yes Yes Yes No
Ratio Yes Yes Yes Yes
Scales of measurement (Contd.)
Nominal scale:
• Assigns some numerical values to the
variable only for-
identification
e.g. Student registration no. 1,2,3 …
Scales of measurement (Contd.)
Ordinal scale:
• Assigns some numerical values to the
variable for-
identification
ranking
e.g.
Rich - 1
Middle class - 2
Poor - 3
Scales of measurement (Contd.)
Interval scale:
Contains highest & lowest values
Intervals between adjacent scale values
are equal
Has an arbitrary zero
Measures on either side of zero
e.g. Celsius & Fahrenheit
temperature scales, IQ scores
Interval scale (Contd.)
For –
identification
ranking
addition
subtraction
Ratio scale:
Contains highest & lowest values
Intervals between adjacent scale values are
equal
Has an absolute zero (zero means zero)
Measures on only one side of zero
e.g. Kelvin temperature scale, income, length,
area, volume, height, weight
Ratio scale (Contd.)
For–
identification
ranking
addition
subtraction
multiplication
division
Highest scale - can be used for any purpose
[Figure: Interval scale vs. Ratio scale; axis tick labels 20, 15, 10, -5, -10, -15, -20]
Hint:
If score can go below zero or if
the zero is not a true zero, measurement is
interval.
If score can not go below zero or
if the zero is a true zero, measurement is ratio.
R AB
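The practical consequence of an arbitrary versus an absolute zero can be checked with a short calculation. The sketch below, using two arbitrary example temperatures, shows that differences are the same on the Celsius (interval) and Kelvin (ratio) scales, while the ratio "twice as hot" only makes sense on the Kelvin scale.

```python
def celsius_to_kelvin(celsius: float) -> float:
    """Convert a Celsius temperature to Kelvin."""
    return celsius + 273.15


# Two arbitrary example temperatures
low_c, high_c = 10.0, 20.0
low_k, high_k = celsius_to_kelvin(low_c), celsius_to_kelvin(high_c)

# Subtraction is meaningful on both scales: the difference is the same
diff_celsius = high_c - low_c
diff_kelvin = high_k - low_k

# Division is meaningful only where zero is a true zero
ratio_celsius = high_c / low_c  # 2.0, but 20 °C is not "twice as hot" as 10 °C
ratio_kelvin = high_k / low_k   # about 1.035, the physically meaningful ratio

print(f"difference: {diff_celsius:.2f} °C vs {diff_kelvin:.2f} K")
print(f"ratio: {ratio_celsius:.3f} (Celsius) vs {ratio_kelvin:.3f} (Kelvin)")
```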
Possible data types and levels of measurement
Level of measurement is set for each variable to
obtain the required statistical test &
presentation in SPSS
STUDY DESIGN
Scientific and ethical methods of search to collect valid
and reliable information / data
Features of study design:
ethical
capable to obtain reliable and valid data
help researcher to avoid wrong conclusion
Selection of a study design depends on -
type of research problem
resources available
time to complete the research
knowledge about the research problem
Some core concepts
Validity / accuracy: degree of closeness of a measurement or
a test to the fact (true value)
Reliability / precision: quality of a test to get the consistent
results over time when repeated in identical situations
Example : BP of an individual is 120 mmHg.
The center of the target represents the true value of the
substance being tested (Here, 120mm of Hg).
Data set-1: 88, 89, 90 (reliable, not valid)
Data set-2: 119, 120, 122 (reliable & valid)
Data set-3: 88, 110, 150 (not reliable, not valid)
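The three data sets in the BP example can be checked numerically. In the sketch below, reliability is judged by the spread of the repeated readings (standard deviation) and validity by their average distance from the true value of 120 mmHg. The function name `assess` and the 5 mmHg tolerance are assumptions made for illustration; the lecture gives no cut-off.

```python
from statistics import mean, stdev

TRUE_BP = 120  # true value from the example (mmHg)
TOLERANCE = 5  # assumed cut-off in mmHg; not from the lecture


def assess(readings, true_value=TRUE_BP, tolerance=TOLERANCE):
    """Return (reliable, valid) for a list of repeated readings."""
    spread = stdev(readings)  # precision: scatter of the readings
    error = mean(abs(r - true_value) for r in readings)  # accuracy: distance from truth
    return spread <= tolerance, error <= tolerance


data_sets = {
    "Data set-1": [88, 89, 90],
    "Data set-2": [119, 120, 122],
    "Data set-3": [88, 110, 150],
}

for name, readings in data_sets.items():
    reliable, valid = assess(readings)
    print(f"{name}: reliable={reliable}, valid={valid}")
```

Data set-3 is instructive: its mean (116) is not far from 120, yet the individual readings are widely scattered, so it fails on both counts.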
Temporal relationship:
Time sequence between exposure /input/cause and
outcome/result/effect
Exposure always precedes the outcome
Outcome always follows the exposure
“Which came first, the egg or the chicken?”
Exposure Outcome
Obesity MI
Smoking Lung cancer
Prospective study
Measurement of exposure prior to the occurrence of
outcome
or
Data recorded or generated about the events that
will occur after the start of the study
Forward looking study
Moves from exposure to outcome
Retrospective study
Measurement of exposure after the occurrence of
outcome
or
Data recorded about the events occurred in the past
before the start of the study
Backward looking study
Moves from outcome to exposure
Prospective study:
Exposure → Outcome
Retrospective study:
Outcome → Exposure
• Spatial relationship:
Related to space
e.g.
Change in spatial
configuration of
troponin during
muscle contraction
“Where am I?”
Longitudinal study
• Data collection at more than one point of time with
follow up on same study subjects
• Usually prospective but may be retrospective as well
e.g. Clinical outcome of CABG in CAD
Rate of RTA of a city in last 10 years
[Figure: timeline with the study onset marked; Longitudinal & Retrospective and Longitudinal & Prospective on either side of the onset]
Cross sectional study
Single time data collection on a cross section of
population at one definite point of time or within a
short span of time
No follow up & no repeated data collection
e.g. Recording the ECG findings of an MI patient at the
time of admission – just for once
Types of study design
Observational (non-interventional):
a. Descriptive
1. Case study
2. Cross sectional study
3. Surveillance
b. Analytical
1. Case control study
2. Cohort study
3. Cross sectional study
Experimental (interventional):
1. Clinical trial
2. Community intervention trial
3. Quasi experimental study
OBSERVATIONAL vs. EXPERIMENTAL STUDY
[Figure: diagram contrasting the two designs; labels: Outcome; Cause/exposure → Outcome]
Observational / Non-interventional
study
Based on observation of events created either by nature
(by natural phenomenon) or by human beings (but not for
research purpose)
Researcher does not manipulate the outcome
A sample of population is observed for different
characteristics by interview, questionnaire, measurement,
records etc.
Study on “Correlation of stature with measurements of
foot segments calculated from footprint and foot
outline of adult male Bangladeshis”
Study on “Health hazards after an earthquake”
e.g. “Hand anthropometry and hand grip strength of adult
female Bangladeshi garment workers”
Types of observational study
Types of descriptive study
a) case study/case report - for rare diseases or common diseases
with uncommon presentation
CA R D or CA RP
b) case series- repeated case study
c) cross sectional study- when comparison and follow-up
is not done
Analytical study
Synonyms - Comparative study
Explanatory study
Causal study
Needs a comparison group to come to a conclusion
Explains disease occurrence
Analyzes and answers specific research question/s
Research hypothesis is a must
Analytical study (Contd.)
Primary goal is to establish the association between the
exposure/ risk factor/etiology and
outcome/disease, so provides etiology or
determinants of the problem
Types are:
a) case control study
b) cohort study
c) cross sectional study - when comparison is done
Descriptive vs. Analytical Study
Descriptive study:
1. Describes the distribution of problem
2. Needs no comparison group
3. No attempt to analyze cause-effect relationship
4. Usually no hypothesis testing is done
Analytical study:
1. Describes the causes or determinants of problem
2. Comparison group needed
3. Cause-effect relationship analyzed
4. Hypothesis testing is done
Distinction between descriptive study
& analytical study is not so clear cut
A large scale descriptive study may give a clear
answer to specific research question
An analytical study may be incidentally of great
descriptive interest
Case control / Case referent study
Moves from outcome to exposure
Outcome-based sampling is done. Study subjects are
selected based on the outcome (present/absent)
Cases are with outcome and controls are without outcome
Outcome does not always have to be a disease
Mostly retrospective and longitudinal
Case-control ratio - 1:1 or maximally up to 1:4
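Because subjects in a case control study are sampled on the outcome, the association with exposure is usually summarised as an odds ratio, one of the techniques listed earlier under research methods. The sketch below computes it from a 2x2 table; the function name `odds_ratio` and the counts are hypothetical, with a 1:1 case-control ratio.

```python
def odds_ratio(exposed_cases, unexposed_cases, exposed_controls, unexposed_controls):
    """Odds of exposure among cases divided by odds of exposure among controls."""
    odds_in_cases = exposed_cases / unexposed_cases
    odds_in_controls = exposed_controls / unexposed_controls
    return odds_in_cases / odds_in_controls


# Hypothetical counts: 50 cases and 50 controls, for illustration only
or_value = odds_ratio(
    exposed_cases=30, unexposed_cases=20,
    exposed_controls=15, unexposed_controls=35,
)
print(f"Odds ratio = {or_value:.2f}")
```

An odds ratio above 1 suggests the exposure is more common among cases than among controls; a value near 1 suggests no association.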
Types of case control study
Population based : case and control from the same base
population
Hospital based : case and control are selected from
hospital admitted patients
Multi-factorial : several exposures are explored
simultaneously
Nested : case and control are selected from the same
cohort and nested within the cohort
Advantages of case control study
Quick, inexpensive
Small sample size
Good for rare and chronic disease
Fewer ethical constraints
CA R D
Disadvantages of case control study
Cannot infer temporality between exposure and
outcome
Information on exposure may be less accurate
Prone to bias
Cohort study
Cohort: Special group of people having some definite, common
base line characteristics and exposed to the same environment
for a long period and who are followed up for a definite period
e.g. radiologists, garment workers, army personnel, mine
workers etc.
CO R E
Good for rare exposure
e.g. Alcohol consumption in antenatal period
Cohort study (Contd.)
Moves from exposure to outcome
Exposure-based sampling is done. Study subjects are
selected based on the exposure (present/absent)
Exposed group (with exposure) & unexposed group
(without exposure)
All must be free from outcome at the start of study
Mostly prospective and longitudinal
Cohort study (Contd.)
e.g.
a) Study on prevalence of respiratory tract infections of
individuals working in anatomy dissection hall
b) Study on the association of deposition of metals in the
lungs of coal miners
Cohort study (Contd.)
Researcher identifies a cohort population with and
without the exposure status but all must be free of
outcome at the start
follows the cohort in future with observation at
several points of time and determines the outcome in
exposed and unexposed group
Design of cohort study
Types of Cohort Study
1. Prospective cohort study
2. Retrospective or Historical cohort study
[Figure: Prospective cohort study on childhood obesity (CO) & school academic performance (SAP). The cohort splits into Exposed (CO) and Unexposed (no CO) groups, and each group ends in Poor SAP or Good SAP. Diagram labels: Time, Onset, Direction of inquiry.]
Historical cohort study
In 2016, a historical cohort study can be designed to
study the effect of neonatal asphyxia on future
neurological disability by conceptually going back to
1996.
Retrospective as well as prospective
Historical cohort study (Contd.)
• Outcome has already occurred before study onset.
• Exposure occurred in the past.
• Cohort is defined in the past based on previous data.
• Follow up is directed (through records) from past to
present up to a certain cut-off time.
[Figure: Historical cohort study on childhood obesity (CO) & school academic performance (SAP). The cohort splits into Exposed (CO) and Unexposed (no CO) groups, and each group ends in Poor SAP or Good SAP. Diagram labels: Time, Onset, Direction of inquiry.]
Prognostic Cohort Study
Identify factors influencing the prognosis of disease after
diagnosis & treatment.
Cohort composed of cases diagnosed and treated & then
they are followed up to evaluate prognosis with respect
to some factors.
Here cohort cases are not free from disease but free from
outcome of interest (cure, death, disability etc).
[Figure: Prognostic cohort study of DM on MI. MI patients split into MI with DM and MI without DM groups, and each group ends in Death or Cured. Diagram labels: Time, Onset, Direction of inquiry.]
Advantages of cohort study
Possible to measure multiple outcomes against a
single exposure
Less prone to bias
Can ensure temporality
Good for rare exposure
Measures incidence
A wide picture of health hazard can be obtained
Disadvantages of cohort study
Costly, time consuming
Large sample size needed
Chance of attrition- lost to follow up due to
withdrawal, death, change of location
Not good for rare disease
Experimental / interventional study
Based on experiment, created by the researcher
Researcher determines who will be exposed to
the factor of interest and who will not
Researcher intervenes to affect the outcome
More ethical and feasibility issues
Less chance of bias
e.g. To reduce the BP
give drug
change of lifestyle
Intervention may be:
Drug therapy
Treatment regime
Surgery
Dietary manipulation
Change of lifestyle
Medical counseling
Rehabilitation procedure
Blinding or masking
Ignorance of certain person/s involved with the
clinical trial - regarding the intervention assigned to
the participants
Advantages:
Avoid assessment bias
Protect the behavioral change of the participants
Types :
1. Single blind: participants are kept ignorant
2. Double blind: participants and assessors
are kept ignorant
(Common type)
3. Triple blind: participants , assessors and
researchers - all are ignorant
e.g. Experiments at Guantanamo Bay prison
Blinding (masking)
Type           Participants blinded   Assessor blinded   Researcher blinded
Single blind   Yes                    No                 No
Double blind   Yes                    Yes                No
Triple blind   Yes                    Yes                Yes
Experimental study:
1. Clinical trial
2. Community intervention trial
3. Quasi experimental study
Clinical trial
It is a prospective and experimental study comparing
the effects of intervention in human subjects
Types:
a) controlled clinical trial
b) uncontrolled clinical trial
c) randomized controlled clinical trial
Controlled clinical trial
It is a clinical trial comparing the effects of
intervention in an experimental group against a
control group involving human subjects to
determine which of the interventions is of greatest
benefit
Placebo
Preparation identical in all respects to that given to the
treatment group except that it lacks the active
component
Look and taste must be equivalent to the active drug
used
Conditions to allow placebo as ethical:
No standard treatment is available
Existing treatment has doubtful efficacy
Existing treatment is rarely available to population at large
Patient is not benefited by standard treatment and there is no
second option
Patient refuses existing treatment and willing to be on
placebo
Patient suffers from minor disease
Test regime is an add on to existing regime
Types of control group (Contd.)
Concurrent control: independent control group is
generated along with intervention group who receive
placebo or equivalent intervention for the same period as
experimental group
Non-concurrent / historical / external control: results of
study by another researcher or same researcher done in the
past on an identical issue is used as control
Uncontrolled clinical trial
Clinical trial which evaluates the effects of
intervention in an interventional group of human
subjects without comparison with any control group
Here, control group design is not ethical or possible
e.g.
Ethically, placebo should not be given to a Ca-breast
patient. So, a Ca-breast patient cannot be a
control.
Randomized controlled clinical trial
Clinical trial comparing the effects of intervention in
an experimental group against a control group
following random allocation of participants to
interventional and control group
Randomization: random allocation of study subjects
in experimental or control groups by lottery technique
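The lottery idea can be sketched in a few lines of Python. This is only a minimal illustration, not a trial-grade allocation procedure: the function name `randomize`, the participant IDs 1 to 20 and the fixed seed are assumptions made for the example.

```python
import random

def randomize(participants, seed=None):
    """Randomly split participants into experimental and control groups."""
    rng = random.Random(seed)       # seed only makes the example repeatable
    shuffled = list(participants)   # copy, so the input is left unchanged
    rng.shuffle(shuffled)           # the "lottery": every ordering is equally likely
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

# Hypothetical participant IDs 1..20
experimental, control = randomize(range(1, 21), seed=1)
print("Experimental:", sorted(experimental))
print("Control:     ", sorted(control))
```

Each participant lands in exactly one group, and the researcher has no say in who goes where.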
Features of randomized clinical trial
Interventional: intervention is done to manipulate the
outcome
Controlled: control group is present
Randomized: participants are allocated randomly in
interventional and control groups
Prospective, longitudinal, analytical type of study
Stages of randomized clinical trial
a) Enrollment: by randomized sampling followed by
inclusion and exclusion criteria
b) Allocation: by randomization into experimental and
control groups
c) Intervention: given to both groups
d) Follow up: to observe the outcome in both groups
e) Analysis: comparison of outcomes
Non-random
sampling
Inclusion criteria
Equivalent to diagnostic criteria
Strict criteria to identify target group for clinical trial
Avoids selection bias & misclassification of participants
e.g. For a study to be done on “Different dimensions of
foot of adult male Bangladeshis”- the inclusion criteria
will be,
a) 20-25 years of age (age confirmed by
national ID)
Exclusion criteria
Takes into account:
Confounding variable/s
Co-morbidities
Non-compliance
Refusal to participate
Exclusion criteria (Contd.)
e.g. For a study to be done on “Different dimensions of
foot of adult male Bangladeshis”- the exclusion
criteria will be,
a) known case of congenital or acquired
foot deformity
b) history of trauma to foot
Parallel design / Concurrent controlled CT
Participants are selected by RCT. Then, intervention will
be given to the experimental group & placebo will be
given to the control group.
Widely practiced.
Cross-over design / Concurrent & self-controlled
CT
If question arises that the effect in the experimental
group may not solely be due to the intervention rather
due to some other factors – then cross-over design is
employed.
Cross-over design (Contd.)
Community interventional trial
Intervention given at the community level, not at
individual level
Directed at a given group of patients with specific
conditions
Randomization done at communities, not at individual
levels
e.g. Impact of health education on EPI programme
Quasi experimental study
Quasi = Resembling
Looks like an experimental design but lacks the
key ingredient - random assignment
Typically allows the researcher to control the
assignment to the treatment condition using some
criterion other than random assignment, often by
convenience.
Quasi experimental study (Contd.)
e.g. Suppose we study the effect of maternal alcohol use during
pregnancy, knowing that alcohol does harm
embryos.
A strict experimental design would require that mothers be
randomly assigned to drink alcohol. This would be
highly illegal because of the possible harm the study
might do to the embryos.
So what researchers do is ask mothers how much alcohol
they used during pregnancy and then assign them to
groups.
Quasi experimental study (Contd.)
regarded as unscientific and unreliable by
physical and biological scientists
very useful method for social scientists
Can test causal hypotheses
Identifies a comparison group that is as similar as
possible to the treatment group in terms of
baseline (pre-intervention) characteristics
Meta analysis (Study of studies)
It is a systematic review of several similarly designed,
small studies on a specific topic
Results of the studies are pooled, summarized and
statistically reanalyzed to get a simple, integrated summary
estimate
Not exactly a review article rather takes review article one
step further by using revised statistical procedures
Most commonly used measures: Odds ratio (OR) &
Confidence interval (CI)
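As a rough illustration of the two measures named above, the sketch below computes an odds ratio and an approximate 95% confidence interval for a single 2x2 table. It is not a pooled meta-analytic estimate. The function name `odds_ratio_ci` and the counts are hypothetical, and the interval uses the common log (Woolf) approximation, which is an assumption of this example.

```python
import math

def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio and approximate 95% CI from a 2x2 table.

    a = exposed with outcome,   b = exposed without outcome
    c = unexposed with outcome, d = unexposed without outcome
    """
    odds_ratio = (a * d) / (b * c)
    # Standard error of ln(OR), Woolf's method
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper

# Hypothetical counts, invented for illustration only
or_value, ci_low, ci_high = odds_ratio_ci(a=20, b=80, c=10, d=90)
print(f"OR = {or_value:.2f}, 95% CI {ci_low:.2f} to {ci_high:.2f}")
```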
Advantages of meta analysis
Increases statistical power by increasing total sample
size
Answers questions not originally asked at the beginning
of the study
Resolves uncertainty when reports of similar study do
not agree
Observational vs. Experimental Study
Observational study:
1. Based on observation of naturally occurring event
2. Nature affects the outcome
3. Researcher measures only, via observation
4. Ethical problem less
Experimental study:
1. Based on observation of experimentally created event
2. Researcher intervenes to affect the outcome
3. Researcher intervenes and then measures
4. Ethical problem more
SHORT OVERVIEW OF STUDY DESIGN
Study design
Intervention?
No: Observational study
Comparison group? Yes: Analytical, No: Descriptive
Yes: Experimental study
Randomization? Yes: RCT, No: CT
Other labels in the chart: CSS, Historical CS
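The two questions in the overview chart can be written as a small decision function. This is a sketch of the chart only: the function name `classify_design` and its parameter names are made up for the example, and real studies may need finer distinctions than these four labels.

```python
def classify_design(intervention, randomized=False, comparison_group=False):
    """Follow the overview chart: intervention first, then the second question."""
    if intervention:
        # Experimental branch: the second question is randomization
        return "RCT" if randomized else "CT"
    # Observational branch: the second question is the comparison group
    return "Analytical" if comparison_group else "Descriptive"

print(classify_design(intervention=True, randomized=True))
print(classify_design(intervention=True, randomized=False))
print(classify_design(intervention=False, comparison_group=True))
print(classify_design(intervention=False, comparison_group=False))
```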
PROBLEMS
1. Retrospective analysis of 5 yr survival of 200 Ca breast
patients operated in 2008
2. DUB & its effect on fertility: a retrospective analysis over 10 yrs.
3. Prostate volume & post operative outcome following TURP
4. Preoperative hs-CRP & post operative outcome in patients
treated by CABG
5. Plasma BNP & in hospital mortality following AMI
6. Epidemiological evaluation of anthrax
7. Risk factors of LBW
Study Design
1. Historical cohort study
2. Historical cohort study
3. Prognostic cohort study
4. Prognostic cohort study
5. Prognostic cohort study
6. Cross sectional study
7. Case control study or cross sectional study
SAMPLE & SAMPLING METHOD
Sample and Sampling methods
Population or universe :
entire group of study elements from which data are
collected
Sample :
part of the population that represents the population and
describes the characteristics of that population
Sampling unit:
every member of sample or unit chosen in selecting
sample
e.g.
Individuals
Geographical areas – state, district, village
Elementary/ Study/Observational unit:
an object or person on which measurement or observation is
made
Sampling units and Study units are sometimes
identical, sometimes different
e.g.
Study of the prevalence of malnutrition among preschoolers
( under 5 years children )
Here,
Sampling unit – may be village
Study unit - under 5 years children of selected
villages
Parameter
Summary value of population
It is always unknown but constant
Used to represent a certain population characteristic
e.g.
mean stature of all 1st year medical students of
Bangladesh
Statistic
Summary value of sample
Always known but inconstant/varying
e.g.
mean stature of 100 1st year medical students
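The difference between a constant parameter and a varying statistic can be seen in a small simulation. Everything here is assumed for illustration: the population of 5000 statures is simulated (mean 165 cm, SD 7 cm), so its mean is known, whereas a real population parameter is usually unknown.

```python
import random
from statistics import mean

rng = random.Random(0)   # fixed seed: repeatable example only

# Hypothetical population: simulated stature (cm) of 5000 students
population = [rng.gauss(165, 7) for _ in range(5000)]
parameter = mean(population)   # one fixed value for the whole population

# Three different samples of 100 students give three different statistics
sample_means = [mean(rng.sample(population, k=100)) for _ in range(3)]

print(f"Parameter (population mean): {parameter:.1f}")
print("Statistics (sample means):  ", [f"{m:.1f}" for m in sample_means])
```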
Sampling :
Process of selection of a number of study units from a
defined study population
Sampling frame (Source list) :
Ordered list of sampling units in the population
e.g.
If 100 students are chosen as sample, then,
Sampling frame -100 students
Sampling unit - each student
Sampling Techniques
Sampling techniques are of two types:
Random sampling
Non-random sampling
Random Sampling
each sampling unit of the total population has
an equal chance of being included in the sample
based on random selection
types :
Simple random sampling
Systematic random sampling
Stratified random sampling
Cluster sampling
Multistage sampling
Multiphase sampling
Simple Random Sampling
Simplest form of probability sampling
Sampling frame is a must
Sampling units are selected at random by lottery or by
random number table
Good for small, homogeneous and easily available
population
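A minimal sketch of simple random sampling, assuming a hypothetical sampling frame of 100 roll numbers. Python's `random.sample` plays the role of the lottery or random number table and draws without replacement.

```python
import random

# Hypothetical sampling frame: roll numbers of 100 students
sampling_frame = list(range(1, 101))

rng = random.Random(42)                     # fixed seed: repeatable example only
sample = rng.sample(sampling_frame, k=10)   # 10 units, each drawn at most once
print(sorted(sample))
```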
Systematic Random Sampling
Sampling frame is needed but not a must
Sampling units are selected systematically (not randomly)
at constant regular intervals down the sampling frame
Only 1st sampling unit is selected randomly and then the
rest are selected at a fixed interval
Good for large, scattered, heterogeneous population
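The "random start, then fixed interval" rule can be sketched as follows. The function name `systematic_sample` and the frame of 100 units are assumptions for the example, and the sketch assumes the frame size is a multiple of the sample size.

```python
import random

def systematic_sample(frame, n, seed=None):
    """Pick a random start within the first interval, then every k-th unit."""
    k = len(frame) // n        # sampling interval
    rng = random.Random(seed)
    start = rng.randrange(k)   # only this first unit is chosen randomly
    return [frame[start + i * k] for i in range(n)]

frame = list(range(1, 101))    # hypothetical frame of 100 units
sample = systematic_sample(frame, n=10, seed=7)
print(sample)
```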
Stratified Random Sampling
Total population is divided into homogeneous, non-
overlapping strata by some relevant characteristics (e.g.
sex, age, religion, BMI, income etc.) which can influence
the outcome variables
Sampling frame of each stratum is constructed and
sampling units are selected by simple or systematic
random sampling
Good for heterogeneous population
Stratified Random Sampling
Example :
“Sexual dimorphism in the length of ring and index fingers in 100
medical students”
To make a sample of 20 medical students,
total students are divided into male and female strata
each stratum is again divided according to religion
4 strata are generated: Muslim male, Muslim female, Hindu male, Hindu
female
by SRS/SyRS, 5 medical students will be selected from each stratum to
make the sample size 20
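The example above can be sketched in Python. The population below is hypothetical: 100 student IDs spread evenly over the four strata (25 each), which the slide does not state. Simple random sampling is used within each stratum.

```python
import random
from collections import defaultdict

rng = random.Random(3)   # fixed seed: repeatable example only

labels = [("Muslim", "Male"), ("Muslim", "Female"),
          ("Hindu", "Male"), ("Hindu", "Female")]

# Hypothetical population: (student_id, religion, sex), 25 students per stratum
population = [(i, *labels[i % 4]) for i in range(100)]

# Step 1: build the sampling frame of each stratum
strata = defaultdict(list)
for student_id, religion, sex in population:
    strata[(religion, sex)].append(student_id)

# Step 2: simple random sample of 5 students from each stratum
sample = {label: rng.sample(ids, k=5) for label, ids in strata.items()}

for label, ids in sample.items():
    print(label, sorted(ids))
```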
Cluster Sampling
Selection of groups of study units (clusters) instead of the
selection of study units individually
e.g.
“Nutritional status of medical students of medical colleges of
Bangladesh”
Cluster (Sampling unit) - all medical colleges of Bangladesh
Study unit- medical students
By simple random sampling some govt. medical colleges are
selected and then all medical students of the selected medical
colleges are included in the sample
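A small sketch of the same idea, with invented data: six hypothetical colleges of five students each. Whole colleges are drawn at random, and every student of a chosen college enters the sample.

```python
import random

# Hypothetical clusters: college name -> list of student IDs
colleges = {f"College {c}": [f"{c}-{i}" for i in range(1, 6)] for c in "ABCDEF"}

rng = random.Random(5)                         # fixed seed: repeatable example only
chosen = rng.sample(sorted(colleges), k=2)     # random selection of clusters
sample = [student for college in chosen for student in colleges[college]]

print("Selected colleges:", chosen)
print("Sample:", sample)
```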
Multistage Sampling
Sampling is done at different stages
Good when,
o sample frame is not available
o population is too large and widely dispersed
o population is heterogeneous
o study needs a wide area of coverage like community survey
MULTISTAGE SAMPLING Contd.
Example :
“Effectiveness of EPI programme on under 5 years
children in Bangladesh”
Total population is divided into 1st stage/primary
sampling unit and a sample of such unit is selected by
simple or systematic random sampling
MULTISTAGE SAMPLING Contd.
Each selected sampling unit is further subdivided into 2nd
stage/secondary sampling unit and then again a sample of
such unit is selected by simple or systematic random
sampling
The procedure continues until the desired stage is reached
MULTISTAGE SAMPLING Contd.
Country
Division
District
Upazila
Union
Village
Households
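A three-stage version of this procedure can be sketched with invented data. The frame below (4 districts, 5 villages per district, 20 households per village) and the numbers drawn at each stage are assumptions for illustration and are much smaller than the full hierarchy listed above.

```python
import random

rng = random.Random(11)   # fixed seed: repeatable example only

# Hypothetical frame: 4 districts, each with 5 villages of 20 households
districts = {
    f"District {d}": {
        f"Village {d}{v}": [f"HH-{d}{v}-{h}" for h in range(1, 21)]
        for v in range(1, 6)
    }
    for d in "ABCD"
}

sample = []
for district in rng.sample(sorted(districts), k=2):         # stage 1: districts
    villages = districts[district]
    for village in rng.sample(sorted(villages), k=2):       # stage 2: villages
        sample.extend(rng.sample(villages[village], k=5))   # stage 3: households

print(len(sample), "households selected")
```

A sampling frame is needed only for the units selected at the previous stage, which is why the method suits large, dispersed populations.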
MULTIPHASE SAMPLING
Part of information is collected from a large sample
and additional information is collected from sub-
samples of whole sample either at the same time or at a
later stage
Differs from multistage sampling as it is concerned with
similar type of sampling unit at each phase rather than
different types of sampling units used in multistage
sampling
MULTIPHASE SAMPLING Contd.
Example :
“Prevalence of Dengue fever in a definite area”
In a large sample of population, suspected cases of dengue fever are
identified by signs and symptoms
Suspected individuals undergo a platelet count
Individuals with low platelet count are detected
IgM test is done in all positive cases to finally identify dengue positive
patients
NON RANDOM SAMPLING
Each sampling unit of the total population does not have
an equal chance of being included in the sample
Sampling units are selected by choice or by personal
judgment
Types :
Convenience sampling
Purposive sampling
Accidental sampling
Quota sampling
Snow ball sampling
CONVENIENCE SAMPLING
Sample includes people who are mostly available, easily
accessible and are conveniently selected
Subjects are selected in a haphazard manner as per
inclusion criteria based on their accessibility and
proximity to the researcher
Researcher has the freedom to choose whomever he/she
finds within the frame of inclusion criteria
ACCIDENTAL SAMPLING
Synonym - Incidental sampling
People assembled in one place with a common interest
are incidentally surveyed as a sample
e.g. People attending a seminar or workshop, people in
cinema hall or cricket match etc.
PURPOSIVE SAMPLING
Synonym- Judgment sampling
Researcher's judgment is used to select the sample which
he/she thinks to be most typical of the population
◦ e.g. “Correlation of stature and measurements of
foot segments calculated from footprint and foot
outline of adult male Bangladeshi medical
students”
Researcher purposively & according to his/her
judgment selects few students and collects data from
them
QUOTA SAMPLING
Total population is divided into relatively
homogeneous, non-overlapping groups (quota)
Samples are taken from each quota non-randomly by
convenience or purposive sampling
Similar to stratified sampling but there is no
randomization
QUOTA SAMPLING Contd.
e.g.
“People’s opinion about the level of security in a
community”
Total population is divided into different quotas like
politicians, doctors, teachers, students, religious leaders etc.
and then appropriate subjects are selected purposively or
conveniently (not randomly)
For easy understanding:
‘Quota’ can be further divided into more
homogeneous ‘Strata’.
e.g.
Quota = Doctor, Engineer etc.
Strata = Male doctor, Female doctor, Male engineer, Female
engineer etc.
SNOWBALL SAMPLING
One eligible person is first identified and then that
person identifies other similar person(s) who is known
to him/her
Conducted in stages
Used for hard to find population e.g. drug addicted
persons, HIV and AIDS patients
DATA SUMMARIZATION & REDUCTION
Stages of research
Planning of research
Implementation or data collection
Data management
Data editing
Data reduction
Data presentation
Data analysis
Data interpretation &
Data inference / decision
Report writing
Dissemination / publication of the report
Data editing
Purpose:
complete the data by correcting the omissions
to check against illegal entries
to check for inconsistency of the data
to check for impossibility of the data
to discard the meaningless data
Data editing (Contd.)
Types:
Validation edits
Logical edits
Consistency edit
Range edit
Variance edit
Data summarization/ reduction
reduction of volume of raw data
manageable amount without compromising details
convenient presentation and analysis
meaningful impression/ summary information of data
Common methods of data summarization
Tabulation is the common method of data reduction or
summarization
Tabulation can be done in 3 ways:
1. Master table
2. Frequency table
3. Contingency table/ Cross table
Tables
display data in numerical form in rows and
columns
display large information in small space
provide a compact way of presenting large sets of
detailed information
Parts of table
Table number (in Arabic numerals)
Title (and subtitle, if any)
Head note (if necessary)
Caption or column heading
Stubs or row heading
Body
Foot note (if any)
Source (if not original data)
Layout of a table:
Table number (I)
Title:
Head note:
Caption / Column heading
Stubs / Row heading, with the Body beside it
Foot note:
Source:
Table 4: Breadth of anterior and posterior mitral valve
leaflets in different age groups
Age group             Breadth (mm), Mean ± SD          Probability value
                      Anterior         Posterior
18-40 years (n=36)    27.99 ± 3.92     41.23 ± 5.20    0.05*
41-64 years (n=30)    33.68 ± 3.56     48.28 ± 3.72    0.001*
≥ 65 years (n=4)      35.33 ± 3.77     50.33 ± 5.27    2.45ns
P value : * (significant), ns (not significant)
General principle of table construction
Simplicity- not more than 3 variables
Clarity- head note/ foot note
Self explanatory- without any textual help
Directness- only necessary data are included
Title- complete, clear, concise, to the point
Source- must be given if not primary data.
Format- by lines and spaces/ no ditto marks “-”
Master table
simple form of tabulation
shows distribution of observations across several
variables of interest
each observation is simultaneously cross classified
across variables
not intended for presentation but only a step
towards deriving various simple or summary tables
from it
Master table (Contd.)
Variable     Observations (number)
Male         150
Female       130
Muslim       140
Hindu        80
Christian    40
Buddhist     20
Frequency table
frequency counts indicate the number of times a value
with particular characteristics occurs in a data set
shows how frequently an event occurs
traditionally shows frequency counts against
continuous quantitative data
displays how many scores fall into particular division
of variable
Types of frequency distribution
Simple frequency distribution
Grouped frequency distribution
Relative frequency distribution
Cumulative frequency distribution
Cumulative relative frequency distribution
Frequency table (Exercise)
Examination marks of 30 students:
98,97,48,47,52,58,61,65,70,73,79,84,86,92,93,70,65,84,
86,58,48,52,52,58,58,92,92,65,65,73
What to do ?
1st organize the data in ordered array from smallest to
largest as follows:
47, 48, 48, 52, 52, 52, 58,58,58,58, 61,65,65,65,65,
70,70,73,73,79,84,84, 86,86,92,92,92,93,97,98
Frequency table (Ungrouped)
Values Frequency (tally) Values Frequency (tally)
47 | 79 |
48 || 84 ||
52 ||| 86 ||
58 |||| 92 |||
61 | 93 |
65 |||| 97 |
70 || 98 |
73 ||
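The tally above can be reproduced with a short Python sketch. The marks are copied from the ordered array in the exercise. `collections.Counter` does the counting, and the printed layout is just one possible presentation.

```python
from collections import Counter

# Ordered array of the 30 examination marks from the exercise
marks = [47, 48, 48, 52, 52, 52, 58, 58, 58, 58, 61, 65, 65, 65, 65,
         70, 70, 73, 73, 79, 84, 84, 86, 86, 92, 92, 92, 93, 97, 98]

frequency = Counter(marks)   # value -> number of times it occurs

for value in sorted(frequency):
    count = frequency[value]
    print(f"{value:>3}  {'|' * count:<4}  ({count})")
```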
Frequency table (Grouped)
ungrouped frequency distribution will be grouped
for summarization
Class interval:
◦ small range of values into which data are
condensed and classified
Class frequency:
◦ number of values in each class
Class interval (CI)
Range of values into which data are condensed and
classified
Two types:
a) Inclusive type: upper limit included within the
relevant class (I D: Good for discrete data)
40-49
50-59
60-69
70-79
Class interval (CI)
b) Exclusive type: upper limit of one class is
excluded from that class and counted in the next class
(E C: Good for continuous quantitative data)
40-50
50-60
60-70
70-80
2 things to remember about CI:
not too large
not too narrow
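As a sketch of grouping, the 30 examination marks from the exercise can be condensed into inclusive classes of width 10. The classes 80-89 and 90-99 are added here as an assumption, because the marks go up to 98 while the class list on the slide stops at 70-79.

```python
# Ordered array of the 30 examination marks from the exercise
marks = [47, 48, 48, 52, 52, 52, 58, 58, 58, 58, 61, 65, 65, 65, 65,
         70, 70, 73, 73, 79, 84, 84, 86, 86, 92, 92, 92, 93, 97, 98]

width = 10
grouped = {}
for low in range(40, 100, width):
    high = low + width - 1   # inclusive upper limit, e.g. 40-49
    grouped[f"{low}-{high}"] = sum(low <= m <= high for m in marks)

for interval, class_frequency in grouped.items():
    print(interval, class_frequency)
```

Marks are whole numbers (discrete), so the inclusive type leaves no value without a class.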
Other terms in frequency
Class frequency:
◦ Number of values in each class
Order of CI:
◦ Usually arranged from smallest to largest one
Class limit (CL):
◦ Two ends of each class are regarded as class limit
Class mark:
◦ Mid point of each class
Cumulative frequency:
Cumulative frequency of an observation is the
sum of all frequencies up to and including that observation.
It is obtained by adding the frequencies of all previous
observations to that of the observation itself.
Useful to know how many values are less than or
more than a certain class.
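A short sketch of the running total, using the values and frequencies from the ungrouped tally of the 30 examination marks. `itertools.accumulate` produces the cumulative sums, and the variable names are chosen for the example.

```python
from itertools import accumulate

# Values and frequencies copied from the ungrouped tally of the 30 marks
values      = [47, 48, 52, 58, 61, 65, 70, 73, 79, 84, 86, 92, 93, 97, 98]
frequencies = [ 1,  2,  3,  4,  1,  4,  2,  2,  1,  2,  2,  3,  1,  1,  1]

cumulative = dict(zip(values, accumulate(frequencies)))
for value, running_total in cumulative.items():
    print(f"{value:>3}  {running_total:>2}")

# How many students scored 65 or less, and how many scored more than 65
at_most_65 = cumulative[65]
more_than_65 = cumulative[98] - cumulative[65]
print(at_most_65, more_than_65)
```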
Relative frequency:
It is the class frequency of a given value expressed
as a percentage of total frequency.
Relative frequency = (Class frequency / Total frequency) × 100
e.g. 20 students (15 MD/MS and 5 M. Phil)
Here relative frequency of MD/MS = 15/20 x100= 75%
and relative frequency of M. Phil = 5/20 x 100= 25%
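The same calculation as a minimal Python sketch, using the counts from the example above. The dictionary names are chosen for illustration.

```python
# Class frequencies from the example: 20 students in two courses
counts = {"MD/MS": 15, "M. Phil": 5}

total = sum(counts.values())
relative = {course: n / total * 100 for course, n in counts.items()}

for course, percent in relative.items():
    print(f"{course}: {percent:.0f}%")
```

The relative frequencies of all classes always add up to 100%.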