
Module 1 - Introduction


Lecture 1 : Statistical Reasoning for Public Health: Estimation, Inference, & Interpretation

Lecture 1: Introduction and Overview

 Welcome to the class!
 Samples and Populations
 An Overview of Study Designs
 Types of Data

Section A: So, Why Do I Need Biostatistics in My Life?

Some May Say…

 “The world is in the midst of a data craze…”

 “This is the era of BIG data”
 Genomics
 Medical Informatics
 Medical Imaging
 Internet Usage

 “Data has never been more relevant”

Statistics!

“I keep saying that the sexy job in the next 10 years will be statisticians,” said Hal Varian, chief economist at Google. “And I’m not kidding.”

Hal Varian, Google Chief Economist, August 2009

Statistics!

Headline from Harvard Business Review:

Data Scientist: The Sexiest Job of the 21st Century1

1 Davenport T, Patil D. Harvard Business Review (October 2012)


Statistics!

Headline from New York Times:

For Today’s Graduate, Just One Word: Statistics2

2 Lohr S. New York Times (August 2009)

Data Are Everywhere!

 Research results and data are certainly utilized and summarized in the popular media

From The Baltimore Sun, 8/23/12:

Elmo makes apples more appealing to kids

“Kids took nearly twice as many apples when they had Elmo stickers on them as when they didn’t, researchers from Cornell University said in a letter in the August issue of the Archives of Pediatrics and Adolescent Medicine.”

Data Are Everywhere!

 From The NY Times, 8/23/12:

The Widespread Problem of Doctor Burnout

“Analyzing questionnaires sent to more than 7,000 doctors, researchers found that almost half complained of being emotionally exhausted, feeling detached from their patients and work or suffering from a low sense of accomplishment. The researchers then compared the doctors’ responses with those of nearly 3,500 people working in other fields and found that even after adjusting for variables like gender, age, number of hours worked and amount of education, the doctors were still more likely to suffer from burnout.”

Data Are Everywhere!

 From The Washington Post, August 5, 2009:

DC to offer STD Tests in Every High School

“The program conducted last year at eight high schools found that 13 percent of about 3,000 students tested positive for an STD, mostly gonorrhea or chlamydia, according to the D.C. Department of Health.”

Data Provides Information

 Good Data Can Be Analyzed and Summarized to Provide Useful Information

 Bad Data Can Be Analyzed and Summarized to Provide Incorrect/Harmful/Non-informative Information

Steps in a Research Project

 Planning/Design of Study
 Data collection
 Data analysis
 Presentation
 Interpretation

Biostatistics CAN play a role in each of these steps! (but sometimes is only called upon for the data analysis part)


Biostatistics Issues

 Planning/Design of studies
 Primary Question(s) of Interest:
- Quantifying information about a single group?
- Comparing multiple groups?
 Sample size
- How many subjects needed total?
- How many in each of the groups to be compared?
 Selecting Study Participants
- Randomly chosen from “master list”?
- Selected from a pool of interested persons?
- Take whoever shows up?
 If group comparison of interest, how to assign to groups?

Biostatistics Issues

 Data Collection

 Data Analysis
 How best to summarize the information coming from the raw data
 Dealing with variability (both natural and sampling related):
- Important patterns in data are obscured by variability
- Distinguish real patterns from random variation
 Inference: using information from the single study coupled with information about variability to make a statement about the larger population/process of interest: What statistical methods are appropriate given the data collected?

Biostatistics Issues

 Presentation
 What summary measures will best convey the “main messages” in the data about the primary (and secondary) research questions of interest
 How to convey/rectify uncertainty in estimates based on the data

 Interpretation
 What do the results mean in terms of practice, the program, the population, etc.?

Statistical Reasoning 1: Goals

 Summarization
 Measurement of Associations
 Interval Estimation and Statistical Inference
 Sample Size Considerations when Designing A Study

Statistical Reasoning 2: Goals

 Adjustment
 Assessing Effect Modification (Statistical Interactions)
 Prediction Using Potentially Multiple Inputs
 Linear, logistic and time-to-event regression

Universal Goals

 Throughout all of our endeavors the focus will be on
 interpreting the results of statistical procedures correctly
 summarizing the results from published studies in an understandable fashion
 assessing the strengths and weaknesses of published research results including:
 study design
 clarity of the research question(s)
 appropriateness of the statistical methods
 clarity of the reported results
 appropriateness of the overall scientific/substantive conclusions


Section B: Samples and Populations

Learning Objectives
 Upon completing this lecture section you should be able to:
 Explain the difference between a population and sample (so far as the terms are used in research)
 Give examples of populations, and of a corresponding sample from a population
 Explain that characteristics of a randomly selected data sample should imperfectly mimic the characteristics of the population from which the sample was taken
 Explain how non-random samples may differ systematically from the populations from which they were taken

Population Versus Sample

 Sample: A subset (part) of a larger group (population) from which information is collected to learn about the larger group
 For example, twenty-five 18-year-old male college students in the United States

 Population: The entire group for which information is wanted
 For example, all 18-year-old male college students in the United States

Random Sampling

 For studies it is optimal if the sample which provides the data is representative of the population under study
 Certainly not always possible!
 For this term, we will make this assumption unless otherwise specified

 One way of getting a representative sample: simple random sampling
 A sampling scheme in which every possible subsample of size n from a population is equally likely to be selected

Random Sampling

 If a sample is randomly selected from a population, the characteristics of the sample should (imperfectly) mimic those of the population

 How can a random sample be obtained?
 First, a “master list” of the population must be enumerated
 Using a computer, a random subset of any size can be drawn from the population

Population Versus Random Sample

 Generally speaking, with research we want to learn truths in a population, but can only estimate these from an imperfect sample of observations from the population

[Figure: Population and Random Sample, broken down by sex and age group: 20% M < 30 years; 15% M ≥ 30 years; 26% F < 30 years; 39% F ≥ 30 years]
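The two steps described for obtaining a random sample (enumerate a "master list", then let a computer draw a random subset) can be sketched in a few lines of Python. This is only an illustration: the population of 1,000 people is hypothetical, its group shares are set to the percentages shown in the figure, and the function names are made up for this example.

```python
import random
from collections import Counter

# Hypothetical "master list" of 1,000 people, labelled by sex and age group.
# The group shares follow the figure: 20%, 15%, 26% and 39%.
population = (
    ["M <30"] * 200 + ["M >=30"] * 150 + ["F <30"] * 260 + ["F >=30"] * 390
)

def simple_random_sample(master_list, n, seed=None):
    """Draw n members without replacement; every subset of size n is equally likely."""
    rng = random.Random(seed)
    return rng.sample(master_list, n)

def composition(groups):
    """Return the share of each group label in a list."""
    counts = Counter(groups)
    return {label: counts[label] / len(groups) for label in sorted(counts)}

sample = simple_random_sample(population, 100, seed=1)
pop_comp = composition(population)
sample_comp = composition(sample)

for label, share in pop_comp.items():
    print(f"{label}: population {share:.0%}, sample {sample_comp.get(label, 0):.0%}")
```

Running this with different seeds gives sample shares that are close to, but rarely identical to, the population shares, which is the "imperfect mimicry" the slides describe.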


Population Versus (Random) Sample: Example 1

 Researchers wanted to learn about the pulmonary health in a clinical population of men. They were able to sample 113 men from this population, and measure the systolic blood pressure of each male in the sample.

Sample mean = 123.6 mmHg

Sample sd = 12.9 mmHg

Population Versus (Random) Sample: Example 2 (footnote 2)

 Researchers wanted to characterize the risk of mother to infant HIV transmission (within 18 months of birth). The researchers studied 183 births to 183 randomly sampled HIV+ women and found that 40 of the children tested positive for HIV within 18 months, for a transmission percentage of 22%.

2 Connor E, et al. Reduction of Maternal-Infant Transmission of Human Immunodeficiency Virus Type 1 with Zidovudine Treatment. New England Journal of Medicine (1994). 331(18); 1173-1180
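The transmission percentage in Example 2 is a sample proportion: the number of children who tested positive divided by the number of births studied. A minimal Python sketch of that arithmetic, using the counts from the example (the variable names are illustrative only):

```python
# Counts from Example 2: 40 HIV-positive children among 183 births.
infected = 40
births = 183

# The sample proportion is the estimate of the transmission risk in the population.
transmission_risk = infected / births

# Prints 21.9%, which the slide rounds to 22%.
print(f"Estimated transmission percentage: {transmission_risk:.1%}")
```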

Other Types of (Non-Random) Samples

 Other types of sampling may be necessary, but may also result in samples whose elements do not reflect the makeup of the populations of interest (bias)
 Voters (not registered, but those who will actually vote) in the US Presidential Election
 Intravenous Drug users in Chennai
 Patients with a Certain Disease
 Homeless persons in Baltimore
 Men who have sex with men (MSM) in Malawi

[Figure: Population, broken down by sex and age group: 20% M < 30 years; 15% M ≥ 30 years; 26% F < 30 years; 39% F ≥ 30 years]


Other Types of (Non-Random) Samples

 What kinds of sampling strategies can be employed that may/may not result in a random sample?
 Voters (not registered, but those who will actually vote) in the US Presidential Election

Random digit dialing

Other Types of (Non-Random) Samples

 What kinds of other sampling strategies can be employed?
 Intravenous Drug users in Chennai
 Homeless persons in Baltimore
 Men who have sex with men (MSM) in Malawi

Convenience Sampling

Respondent Driven Sampling

Other Types of (Non-Random) Samples

 What kinds of other sampling strategies can be employed?
 Patients with a Certain Disease

Random sample from a clinical/hospital population

Summary

 Generally speaking, with regard to public health and medical research, not all elements of a population can be studied. As such, a sample is taken from the population of interest.
 Random sampling is the best strategy for getting a sample whose characteristics imperfectly mimic the population
 However, random sampling is not always feasible: other approaches can be used, and the sampling procedure needs to be considered when applying the results from the sample to the population

Section C: Study Design Types

Learning Objectives
 At the end of this lecture section you will be able to:
 Describe the similarities and differences between the randomized cohort, observational cohort, case-control and observational cross-sectional studies
 Explain the major analytical challenge that comes from comparing outcomes across groups where the group membership has not been randomized
 Start to become aware of some of the major issues to consider when making conclusions based on study results (i.e., mapping the statistics to the scientific/clinical/substantive)


Common Study Design Types

 Prospective Cohort Studies
 Randomized/controlled study design
 Observational (Cohort) Studies

Subjects are classified as to their exposure(s) status at study start, and followed over time to see who develops outcome(s)

 Case/Control Studies

Subjects are chosen based on their outcome status, and the exposure(s) that occurred prior to outcome are assessed

Common Study Design Types

 (Observational) Cross-Sectional Studies

Everything assessed at the same point in time

The Research Process

 The goal of much research is to characterize differences in (sub)populations
 Does the live polio vaccine reduce the risk of contracting polio?
 How do the weights of Nepalese children less than a year old differ by sex?
 Does calorie labeling on restaurant menus result in a reduction in amount of calories consumed?
 Does AZT reduce the risk of HIV transmission from mother to child?

The Prospective Research Process

 The first step is to either:
 Get a representative sample from the general population under study (A)
 Get representative samples from the populations under study (i.e., the groups to be compared) (B)

The second step (possibly) is to either assign the sample members to the groups of interest (randomization) or to classify them based on their “self-selected” membership (e.g., current smokers vs non-smokers)

 How the second step is done (or if first step A is employed) determines the type of study being done

Randomized Trials (Prospective Cohort)

 Important for accounting for many kinds of biases

 Randomization, done correctly on a large number of subjects, nearly ensures that the only systematic difference in the groups being compared is the exposure(s) of interest

Example 1: Salk Polio Vaccine Trials1

 A very famous randomized trial

≈ 400,000 School Children Randomized:
 200,745 Vaccinated for Polio
 201,229 Given a Placebo

1 Meier, P. The biggest public health experiment ever: The 1954 field trial of the Salk poliomyelitis vaccine. In J. M. Tanur, F. Mosteller, W. H. Kruskal, R. F. Link, R. S. Pieters & G. R. Rising. Statistics: A Guide To the Unknown (1972).
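Random assignment to groups can be illustrated with a short Python sketch. It is a simplified stand-in, not the allocation procedure used in the Salk trial: the 20 subject IDs, the even split, and the `randomize` function are assumptions made for this example.

```python
import random

def randomize(subject_ids, seed=None):
    """Shuffle the subjects and split them into two equally sized groups."""
    rng = random.Random(seed)
    shuffled = list(subject_ids)
    rng.shuffle(shuffled)  # chance, not the subject, decides the group
    half = len(shuffled) // 2
    return {"vaccine": shuffled[:half], "placebo": shuffled[half:]}

# Hypothetical study with 20 enrolled subjects, numbered 1 to 20.
groups = randomize(range(1, 21), seed=42)
print("Vaccine group:", sorted(groups["vaccine"]))
print("Placebo group:", sorted(groups["placebo"]))
```

Because group membership depends only on the shuffle, characteristics such as sex, smoking status, or baseline health cannot systematically steer a subject into one group or the other.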


1954 Salk Polio Vaccine Trial

 At the end of the follow-up period there were 82 cases in the vaccine group and 162 in the placebo group

 Subsequent analyses report slightly different numbers because some false positives were discovered in each of the two groups

Benefits of Randomization

 Randomization helps protect against self-selection biases
 Examples:
Males more likely to volunteer for placebo than females
Smokers less likely to be in exposed group
Healthier persons sign up for the intervention

 The goal of randomization is to eliminate any systematic differences in characteristics of subjects in each of the exposure groups under study, save for the exposure itself
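The Salk trial counts can be turned into polio risks for each group with a few lines of Python. The sketch below uses the case counts and group sizes quoted in the slides; expressing the risks per 100,000 children and the helper name `risk_per_100k` are choices made for this illustration.

```python
def risk_per_100k(cases, group_size):
    """Number of cases per 100,000 subjects in a group."""
    return 100_000 * cases / group_size

# Counts from the slides: 82 cases among 200,745 vaccinated children,
# 162 cases among 201,229 children given a placebo.
vaccine_risk = risk_per_100k(82, 200_745)
placebo_risk = risk_per_100k(162, 201_229)

# Ratio of the two risks (relative risk, vaccine versus placebo).
relative_risk = vaccine_risk / placebo_risk

print(f"Vaccine: {vaccine_risk:.1f} cases per 100,000")
print(f"Placebo: {placebo_risk:.1f} cases per 100,000")
print(f"Relative risk: {relative_risk:.2f}")
```

The vaccinated group had roughly half the polio risk of the placebo group, and because the groups were randomized, this difference is not explained by who chose to be vaccinated.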

Example 2: Hormone Replacement Therapy2

 Another very famous randomized trial

16,608 Women:
 8,506 Given Therapy
 8,102 Given Placebo

Hormone Replacement Therapy

 At the end of the follow-up period there were 163 CHD cases in the therapy group and 122 in the placebo group

2 The Women’s Health Initiative Study Group. Risks and Benefits of Estrogen Plus Progestin in Healthy Postmenopausal Women: Principal Results from the Women’s Health Initiative Randomized Controlled Trial. (2002) Journal of The American Medical Association. 288 (3) 321-333.

Randomization Is Not Always Possible!

 Unfortunately (at least for scientific purposes), you cannot always perform randomized trials!!

[Figure: Random Assignment to Smokers / Non-smokers]

Non-Randomized Design: Observational Cohort Studies

 Studies in which subjects “self-select” to be in exposure groups: i.e. subjects are not randomized

 Sometimes this is the only type of study that can be done

 Outcome/exposure relationships are of interest
 Sometimes difficult to directly assess because of selection bias issues which may lead to systematic differences between the exposure groups other than the exposure of interest
 Examples:
Smokers more likely to drink alcohol.
Vegetarians more likely to exercise.


Example 3: Needle Exchange and HIV Infection3

 New York City: relative risk of HIV infection for intravenous drug users (IVDUs) by needle exchange program participation

As per the authors:

“Interpretation: We observed an individual-level protective effect against HIV infection associated with participation in a syringe-exchange programme.”

Needle Exchange

 Adjusted for the following . . .
 Age, gender, race, frequency of injection

3 Des Jarlais et al. HIV incidence among injecting drug users in New York City syringe-exchange programmes. (1996) The Lancet. 348. 987-991.

Example 4: HPV Vaccine and Sexual Activity in Teens4

 From the article abstract:

“RESULTS: The cohort included 1398 girls (493 HPV vaccine–exposed; 905 HPV vaccine–unexposed). …”

“CONCLUSIONS: HPV vaccination in the recommended ages was not associated with increased sexual activity–related outcome rates.”

HPV Vaccine and Sexual Activity in Teens

 Results were adjusted for other characteristics of the teens including “health care–seeking behavior and demographic characteristics.”

4 Bednarczyk et al. Sexual Activity–Related Outcomes After Human Papillomavirus Vaccination of 11- to 12-Year-Olds. (2012) Pediatrics. 130 (5). 798-805.

Observational Cohort Studies

 Issues to consider in the analyses of observational studies:

Observational Studies

 Sometimes are performed to study results that will then be studied with a follow-up randomized trial


Example 5: Beta Carotene and Cancer5

 “Abstract/Background. Observational studies suggest that people who consume more fruits and vegetables containing beta carotene have somewhat lower risks of cancer and cardiovascular disease, and earlier basic research suggested plausible mechanisms. Because large randomized trials of long duration were necessary to test this hypothesis directly, we conducted a trial of beta carotene supplementation.”

 “Conclusions. In this (randomized) trial among healthy men, 12 years of supplementation with beta carotene produced neither benefit nor harm in terms of the incidence of malignant neoplasms, cardiovascular disease, or death from all causes.”

5 Hennekens C, et al. Lack of Effect of Long-Term Supplementation with Beta Carotene on the Incidence of Malignant Neoplasms and Cardiovascular Disease (1996). New England Journal of Medicine.

Case-Control Studies: Retrospective Research Process

 In the previously discussed prospective cohort studies (randomized and observational), the subjects had their exposure status assigned to them, or were selected and then the exposure status was classified: the outcome of interest was assessed over time, after the exposure had occurred

 In situations in which researchers wish to study exposures associated with rare outcomes, it is not necessarily feasible to do a prospective cohort study. Such an approach would require a very large number of enrollees in order to see any outcomes in the samples being compared
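A back-of-the-envelope calculation shows why a prospective cohort study can be impractical for a rare outcome. In the Python sketch below, the risk of 1 in 10,000, the cohort size of 1,000, and the target of 50 cases are hypothetical values chosen for illustration; they do not come from the lecture.

```python
def expected_cases(n_enrollees, one_in):
    """Expected number of outcomes when the risk is 1 in `one_in`."""
    return n_enrollees / one_in

def enrollees_needed(target_cases, one_in):
    """Cohort size at which `target_cases` outcomes are expected."""
    return target_cases * one_in

# Hypothetical rare outcome: 1 case per 10,000 people over the follow-up period.
ONE_IN = 10_000

cases_in_small_cohort = expected_cases(1_000, ONE_IN)
cohort_for_50_cases = enrollees_needed(50, ONE_IN)

print(f"Expected cases among 1,000 enrollees: {cases_in_small_cohort}")
print(f"Enrollees needed to expect 50 cases: {cohort_for_50_cases:,}")
```

Under these assumptions a cohort of 1,000 would be expected to yield a tenth of a case, while half a million enrollees would be needed to expect 50 cases.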

Case-Control Studies

 A useful alternate approach to a cohort study in this scenario is a case-control study

 In this design, enrollees are selected on whether they have the outcome or not (usually a rare disease), and then exposure(s) is assessed

Example 6: Smoking and Lung Cancer6

 Another landmark study in public health/medicine

The selection of cases (lung cancer patients) and controls (persons without lung cancer) is described in detail in the article.

6 Doll R and Hill A. Smoking and Carcinoma of the Lung: Preliminary Report, (1950). British Medical Journal. pp. 739-748.

Example 6: Smoking and Lung Cancer

 Another landmark study in public health/medicine

“The method of the investigation was as follows: Twenty London hospitals were asked to co-operate by notifying all patients admitted to them with carcinoma of the lung, ..” (and several other cancers)

“……for each lung-carcinoma patient visited at a hospital the almoners were instructed to interview a patient of the same sex, within the same five-year age group.”

Example 6: Smoking and Lung Cancer

 Summary of findings, as per the authors

“Consideration has been given to the possibility that the results could have been produced by the selection of an unsuitable group of control patients, by patients with respiratory disease exaggerating their smoking habits, or by bias on the part of the interviewers. Reasons are given for excluding all these possibilities, and it is concluded that smoking is an important factor in the cause of carcinoma of the lung.”


Case-Control Studies

 Issue to handle in the analyses of results from case-control studies

Cross-Sectional Observational Studies

 All information (exposures, outcomes) is assessed at same time point
 Example: current smoking status, and current flu status

 Many times cross-sectional studies are done to estimate prevalence (proportions of people with a given characteristic)

Example 7: Intimate Partner Violence (IPV) and SES7

 Analysis of the British Crime Survey, a nationally-representative cross-sectional study from England, including men and women 16-59 years old

Snippet from Abstract Results:

“Lifetime IPV was reported by 23.8% of women and 11.5% of men. Physical IPV was reported by 16.8% and 7.0%, respectively; emotional-only IPV was reported by 5.8% and 4.2%, respectively.”

Conclusion from Abstract:

“Conclusions. Physical and emotional IPV are very common among adults in England. Emotional IPV prevention policies may be appropriate across the social spectrum; those for physical IPV should be particularly accessible to disadvantaged women.”

7 Khalifeh H, et al. Intimate Partner Violence and Socioeconomic Deprivation in England: Findings From a National Cross-Sectional Survey, (2013). American Journal of Public Health. 103(3): 462-472.

Cross-Sectional Studies

 Issue to handle in the analysis of results from cross-sectional studies

Summary

 Types of study designs

 Issue with non-randomized studies


Section D: Data Types

Learning Objectives
 In this short lecture, a brief summary is given of the types of data that frequently occur in research studies, and will be dealt with analytically in this class (both terms)

 At the end of this lecture section, you should be able to:
 Distinguish between continuous, binary (and categorical) and time to event data
 Give examples of each of these aforementioned data types

Continuous Data

 Continuous Data (incremental measurements)
 Blood pressure, mmHg
 Weight, lbs (kgs, oz, etc.)
 Height, ft (cm, in, etc.)
 Age, years (months)
 Income level, dollars/year (Euro by year, etc.)

 A defining characteristic of continuous data is that a one unit change in the value means the same thing across the entire range of data values

Binary Data

 Binary (dichotomous) data: takes on only two values, “yes” or “no”
 Polio: Yes/No
 Remission: Yes/No
 Sex: Male/Female (or as yes/no, “is subject male?”)
 Quit Smoking: Yes/No
 Etc.

Categorical Data

 Categorical data: an extension of binary data to include more than 2 possible values

 Nominal categorical data: no inherent order to categories
 Race/ethnicity
 Country of birth
 Religious Affiliation

 Ordinal categorical data: order to categories
 Income level categorized into four categories, least to greatest
 Degree of agreement, five categories from strongly disagree to strongly agree

Time to Event Data

 Data that are a hybrid of continuous data and binary data
 Whether an event occurs and time to the occurrence (or time to last follow-up without occurrence)


Different Statistics for Different Data Types
 To compare blood pressures in a clinical trial evaluating two blood pressure-lowering medications, you could:
 Estimate the mean difference in blood pressure change (after-before) between the two treatment groups
 Estimate a 95% confidence interval and/or use a t-test to test for population level differences in the mean blood pressure change

Different Statistics for Different Data Types
 To compare the proportion of polio cases in the two treatment arms of the Salk polio vaccine trial, you could:
 Estimate the difference in proportions (risk difference) and ratio of proportions (relative risk, risk ratio)
 Estimate 95% confidence intervals and/or use a chi-square test to test for population level differences in these quantities
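For the binary-outcome comparison, the risk difference, the relative risk, and approximate 95% confidence intervals can be computed directly from the Salk trial counts quoted earlier in the lecture. The Python sketch below uses standard large-sample (Wald-type) formulas, with the relative risk interval built on the log scale; this is one common approach and is not necessarily the method the course goes on to teach. The function names are illustrative.

```python
import math

Z_95 = 1.96  # normal quantile used for an approximate 95% interval

def risk_difference_ci(cases_a, n_a, cases_b, n_b, z=Z_95):
    """Difference in proportions (a minus b) with a Wald confidence interval."""
    p_a, p_b = cases_a / n_a, cases_b / n_b
    diff = p_a - p_b
    se = math.sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    return diff, diff - z * se, diff + z * se

def relative_risk_ci(cases_a, n_a, cases_b, n_b, z=Z_95):
    """Ratio of proportions (a over b) with an interval computed on the log scale."""
    rr = (cases_a / n_a) / (cases_b / n_b)
    se_log = math.sqrt(1 / cases_a - 1 / n_a + 1 / cases_b - 1 / n_b)
    lower = math.exp(math.log(rr) - z * se_log)
    upper = math.exp(math.log(rr) + z * se_log)
    return rr, lower, upper

# Salk trial counts from the slides: vaccine arm, then placebo arm.
rd, rd_lo, rd_hi = risk_difference_ci(82, 200_745, 162, 201_229)
rr, rr_lo, rr_hi = relative_risk_ci(82, 200_745, 162, 201_229)

print(f"Risk difference per 100,000: {rd * 1e5:.1f} ({rd_lo * 1e5:.1f} to {rd_hi * 1e5:.1f})")
print(f"Relative risk: {rr:.2f} ({rr_lo:.2f} to {rr_hi:.2f})")
```

The interval for the risk difference lies entirely below 0 and the interval for the relative risk lies entirely below 1, so both summaries point to a lower polio risk in the vaccine arm.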

Different Statistics for Different Data Types


 To compare differences in time to contracting HIV between HIV
negative IV drug users in a needle exchange program and HIV
negative IV drug users not enrolled in a needle exchange program,
you could:
 Estimate an incidence rate ratio for contracting HIV that
compares these two groups
 Construct a Kaplan-Meier curve for each group to provide a
graphical description of the time to HIV profile for each group
 Estimate a 95% confidence interval for the incidence rate ratio
and/or use a log-rank test to test for a population level
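The time-to-event summaries named above can be sketched in plain Python. The follow-up times (in months) and event indicators below are made-up values for two small hypothetical groups, not data from the needle exchange study, and the functions are a minimal textbook version of the Kaplan-Meier estimator and of an incidence rate, written for illustration.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier estimate: (time, proportion still event-free) at each event time."""
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = 0
        n_leaving = 0
        # Collect everyone whose follow-up ends at time t (event or censored).
        while i < len(data) and data[i][0] == t:
            n_events += data[i][1]
            n_leaving += 1
            i += 1
        if n_events > 0:
            survival *= 1 - n_events / at_risk
            curve.append((t, survival))
        at_risk -= n_leaving
    return curve

def incidence_rate(times, events):
    """Events per unit of follow-up time."""
    return sum(events) / sum(times)

# Hypothetical data: months of follow-up and whether HIV infection occurred (1) or not (0).
exchange_times, exchange_events = [6, 12, 12, 18, 24, 24], [0, 1, 0, 0, 1, 0]
other_times, other_events = [3, 6, 9, 12, 18, 24], [1, 1, 0, 1, 1, 0]

exchange_curve = kaplan_meier(exchange_times, exchange_events)
other_curve = kaplan_meier(other_times, other_events)
rate_ratio = incidence_rate(exchange_times, exchange_events) / incidence_rate(other_times, other_events)

print("Exchange group curve:", exchange_curve)
print("Comparison group curve:", other_curve)
print(f"Incidence rate ratio: {rate_ratio:.3f}")
```

Subjects who reach their last follow-up without the event (indicator 0) still contribute time at risk up to that point, which is how time-to-event methods combine the continuous and binary parts of the data.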
