12/26/2020
DESCRIPTIVE DATA ANALYSIS
By Rose Poni Gore

OBJECTIVES
Students will be able to:
• Comprehend data and the summarization modalities.
• Distinguish the source and method of data collection.
• Properly organize tables with one or more variables.
• Adequately prepare the following types of charts: bar charts, pie charts, histograms.
• Describe when to use tables, graphs, and charts.
• Identify types of variables and levels of measurement.
BEAUTIFUL HAPPY HEALTHY CHILDREN

WITHOUT THE DOCUMENTATION, THE DATA MAY BE OF LITTLE IF ANY VALUE (1995 NSFG)
12 11 12 2 6 7 7 7 8 9 9 4 4
11 4 14 3
11 4 4 11 7 5 20 3 10
11 7 10
6 3 7 7 3 5 10 10
2 4 5 11 21
11 11 11 3 10 10 4 10 7 3
THE DATA MAY REPRESENT THE AGE OF THE CHILDREN PLAYING AT BEACH
M FOR MALE AND F FOR FEMALE … THEN MAKES SENSE
12 11 12 2 6 7 7 7 8 9 9 4 4
11 4 14 3
11 4 4 11 7 5 20 3 10
11 7 10
6 3 7 7 3 5 10 10
2 4 5 11 21
11 11 11 3 10 10 4 10 7 3
FEMALES & MALES/ F&M
DATA
Data are observations of random variables made on the elements of a population or sample.
Data are the quantities (numbers) or qualities (attributes) measured or observed that are to be collected and/or analyzed.
The word "data" is plural; "datum" is singular.
A collection of data is often called a data set (singular).
DATA COLLECTION METHODS
Primary: where the investigator is the first to collect the data.
Sources include: medical examinations, interviews, observations, etc.
Merits: less measurement error; suits the objectives of the study better.
Disadvantage: costly; may not be feasible.

Secondary: where the data are collected by others, for purposes other than those of the current study.
Sources include: individual records (medical / employment); group records (census data, vital statistics).
DATA PRESENTATION
Even quite small data sets are difficult to comprehend without some summarization. Statistical quantities such as the mean and variance can be extremely helpful in summarizing data, but first we discuss tabular and graphical summaries.

TABLES
One of the most important means of summarizing the data from a single variable is to tabulate the frequency distribution of the variable.
A frequency distribution simply tells how often a variable takes on each of its possible values. For quantitative variables with many possible values, the possible values are typically binned or grouped into intervals.
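A minimal sketch of tabulating a frequency distribution, with and without binning. It uses the first row of numbers from the earlier slide and reads them as ages in years, which is an assumption (the slide only says the data "may represent" ages). The 5-year bin width and the variable names are illustrative choices, not part of the lecture.

```python
from collections import Counter

# First row of the numbers on the earlier slide, read here as ages in years
# (an assumption: the slide only says the data "may represent" ages).
ages = [12, 11, 12, 2, 6, 7, 7, 7, 8, 9, 9, 4, 4]

# Ungrouped frequency distribution: how often each value occurs.
frequency = Counter(ages)

# Grouped frequency distribution: bin the values into 5-year intervals.
# The bin width of 5 is an arbitrary illustrative choice.
BIN_WIDTH = 5
binned = Counter((age // BIN_WIDTH) * BIN_WIDTH for age in ages)

print("Value  Frequency")
for value in sorted(frequency):
    print(f"{value:>5}  {frequency[value]}")

print("Interval  Frequency")
for lower in sorted(binned):
    print(f"{lower:>2}-{lower + BIN_WIDTH - 1:<5}  {binned[lower]}")
```

With only thirteen observations the ungrouped table already has eight rows, which is why grouping into intervals becomes the usual choice as the number of distinct values grows.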
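TABLE
Here the relative frequency as a proportion is just
\[ \text{relative frequency (proportion)} = \frac{\text{frequency}}{n}, \quad \text{where } n = \text{sample size}. \]
The relative frequency as a percentage is
\[ \text{relative frequency (percent)} = \frac{\text{frequency}}{n} \times 100\%. \]

e.g. TABLE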
©2006 by Getu Degu and Tegbar Yigzaw
GRAPHS
BAR CHART, PIE CHART, OR HISTOGRAM
Frequency distributions can often be displayed effectively using graphical means such as the bar chart, pie chart, or histogram.
Pie charts are useful for displaying the relative frequency distribution of a nominal variable. Here is an example of the relative frequency distribution (percentage).

PIE CHART
Example: blood group of 50 students

Group   Students
A       5
B       20
AB      10
O       15
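The blood group table can be turned into the relative frequencies a pie chart displays. The sketch below uses only the counts from the example; the variable names are illustrative, and the slice angle (proportion times 360 degrees) is the standard pie chart convention rather than something stated on the slide.

```python
# Blood group counts for the 50 students, taken from the example table.
counts = {"A": 5, "B": 20, "AB": 10, "O": 15}

n = sum(counts.values())  # sample size

# Relative frequency as a proportion and as a percentage.
proportions = {group: count / n for group, count in counts.items()}
percentages = {group: 100 * count / n for group, count in counts.items()}

# Angle of each pie slice in degrees: the proportion of the full circle.
angles = {group: 360 * count / n for group, count in counts.items()}

for group in counts:
    print(f"{group:<2} {counts[group]:>2}  {percentages[group]:.0f}%  {angles[group]:.0f} degrees")
```

The percentages must sum to 100 and the angles to 360, which is a quick check that no category was dropped from the table.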
TABLE 1
GRAPHICAL SUMMARIES FOR DISCRETE VARIABLES
Graphical displays are very useful for summarizing data, and both
dichotomous and non-ordered categorical variables are best
summarized with bar charts.
The response options (e.g., yes/no, present/absent) are shown on the
horizontal axis and either the frequencies or relative frequencies are
plotted on the vertical axis. The tabular presentation is given in the table,
and the corresponding figure is a frequency bar chart. Note that for
dichotomous and categorical variables there should be a space
between the response options.
The analogous graphical representation for an ordinal variable does not
have spaces between the bars, in order to emphasize that there is an
inherent order.
FREQUENCY BAR CHART - 1

CHART - 2
A DISTINGUISHING FEATURE OF BAR CHARTS FOR NON-ORDERED & DICHOTOMOUS CATEGORICAL VARIABLES IS THAT THE BARS ARE SEPARATED BY SPACES TO EMPHASIZE THAT THEY DESCRIBE NON-ORDERED CATEGORIES.
Created by Lisa Sullivan, Wayne W. La Morte, Boston University School of Public Health ©2016. All Rights Reserved.
RELATIVE FREQUENCY BAR CHART - 3
DICHOTOMOUS VARIABLE
In contrast, chart 3 below illustrates a relative
frequency bar chart of the distribution of treatment
with antihypertensive medications.
This is a graphical representation of a dichotomous variable.
BAR CHARTS - 4
Left: exaggerates the difference. Right: minimizes the difference.
These bar charts display the same relative frequencies,
i.e., 31.8% and 37.7%. The bar chart on the right
minimizes the difference because the vertical scale is
too expansive, ranging from 0 to 100%, while the bar
chart on the left visually exaggerates the difference
because the vertical scale is too restrictive, ranging
from 30 to 40%.
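The effect of the vertical scale can be put in numbers. The sketch below assumes each bar is drawn from the bottom of the axis up to its value, so its drawn height is the value minus the axis minimum; the function name is hypothetical and the two percentages are the ones quoted above.

```python
def drawn_height_ratio(small, large, axis_min):
    """Ratio of the drawn bar heights when the vertical axis starts at axis_min.

    Assumes each bar is drawn from the axis minimum up to its value.
    """
    return (large - axis_min) / (small - axis_min)


# Relative frequencies quoted on the slide, in percent.
low, high = 31.8, 37.7

ratio_full_axis = drawn_height_ratio(low, high, axis_min=0)     # axis from 0 to 100%
ratio_narrow_axis = drawn_height_ratio(low, high, axis_min=30)  # axis from 30 to 40%

print(f"axis 0-100%: taller bar is {ratio_full_axis:.2f} times the shorter one")
print(f"axis 30-40%: taller bar is {ratio_narrow_axis:.2f} times the shorter one")
```

On the full axis the taller bar is roughly 1.2 times the shorter one, matching the true ratio of the percentages. On the 30 to 40% axis it is drawn more than 4 times as tall, which is the exaggeration described above.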
DICHOTOMOUS DATA
A specific subset of nominal level data is dichotomous data: data that are nominal but have only two possible categories.
A common example is alive or dead.
Other dichotomous variables are things that can be measured as yes or no, on or off, or present or absent.

HISTOGRAMS FOR ORDINAL VARIABLES
When one is dealing with ordinal variables, however, the appropriate graphical format is a histogram. A histogram is similar to a bar chart, except that the adjacent (head-to-head) bars abut one another in order to reinforce the idea that the categories have an inherent order.
A histogram depicts or illustrates the frequency distribution of a quantitative random variable (continuous?).
TABLE 2 DISPLAYS ORDINAL VARIABLE
Figure 5-1: relative frequency histogram for blood pressure
Figure 5-2: frequency histogram for blood pressure
DATA ANALYSIS AND CAUSAL INFERENCE
Data are observer notes, respondent answers, biochemical measurements, contents of medical records, machine-readable datasets, and other kinds of information from which we attempt to derive meaning. So what does one do with them?
Analysis and interpretation of the data create the meaning that we ascribe or attribute to the data.

STEPS IN DATA MANAGEMENT
The first thing we do with data is to manage them (note that epidemiologists usually regard the word "data" as a plural word, based on its Latin root; however, other fields often consider "data" to be singular). Since epidemiologic studies tend to have many (hundreds, thousands, or even millions) of observations, and often tens or hundreds of data items for each observation, managing epidemiologic data involves "mass production". Therefore a systematic, organized, professional approach is critical for detecting and avoiding problems with the data.
Steps in data management
• Design the data collection process
• Write down all data collection procedures
• Train and supervise data collectors
• Monitor all data collection activities
• Document all data collection experiences
• Keep track of, document, and safeguard data

DATA TYPES
Quantitative and Qualitative
DATA SOURCES
Data arise from experimental or observational studies, and it is important to distinguish the two:
1. experimental studies
2. observational studies

EXPERIMENTAL STUDIES
In an experiment the researcher deliberately imposes a treatment on one or more subjects or experimental units (not necessarily human). The experimenter then measures or observes the subjects' response to the treatment.
The crucial element is an intervention.
EXPERIMENTAL STUDIES
Randomization is an extremely important aspect of experimental design.
Another important concept, especially in human experimentation, is blinding.
An experiment is blind if the subjects don't know which treatment they receive.

EXPERIMENTAL STUDIES
E.g. suppose we randomize part of a group of migraine sufferers to an active drug and the remainder to a placebo control treatment.
The experiment is blind if the pills in the two treatment groups look and taste identical and subjects are not told which treatment they receive. This guards against the placebo effect.
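Randomization to two treatment arms can be sketched in a few lines. Everything specific here is an assumption made for illustration: the subject identifiers, the group of 20, the half-and-half split, and the fixed seed (used only so the allocation can be reproduced).

```python
import random


def randomize(subjects, seed=None):
    """Randomly split subjects into two arms of (nearly) equal size.

    Returns (drug_arm, placebo_arm). The input list is not modified.
    """
    rng = random.Random(seed)
    shuffled = list(subjects)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


# Hypothetical subject identifiers for 20 migraine sufferers.
subject_ids = [f"S{i:02d}" for i in range(1, 21)]

drug_arm, placebo_arm = randomize(subject_ids, seed=2020)

print("Active drug:", sorted(drug_arm))
print("Placebo:    ", sorted(placebo_arm))
```

Because chance alone decides who gets the active drug, neither the investigator nor the subjects' characteristics can systematically influence the allocation. Blinding would then mean keeping this allocation list hidden from the subjects.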
EXPERIMENTAL STUDIES
An experiment is double-blind if the researcher who administers the treatments and measures the response also does not know which treatment is assigned.
This guards against experimenter effects: the experimenter may behave differently toward the subjects in the two groups, or measure the response differently in the two groups.

EXPERIMENTAL STUDIES
Experiments have many advantages and are strongly preferred when possible. However, experiments are rarely feasible in public health / epidemiology.
In the health sciences and medicine, experiments involving humans are called clinical trials.
OBSERVATIONAL STUDIES
• No intervention
• Data collected on an existing system
• Less expensive
• Easier logistically
• More often ethically practical
• Interventions often not possible

TYPES OF OBSERVATIONAL STUDIES
Case studies or case series
• A descriptive account of interesting characteristics (e.g. symptoms) observed in a single case (a subject with disease) or in a sample of cases
• Typically are unplanned and don't involve any research hypotheses
• No comparison group
• Poor design, but can generate research hypotheses for subsequent investigation
ANALYTICAL STUDIES
Case control study: conducted retrospectively by looking into the past.
Two types of subjects are included:
• cases: subjects with the disease / outcome of interest
• controls: subjects without the disease / outcome

DATA PROCESSING
• Review, edit, and code data forms, documenting exceptions and actions
• Convert to electronic form
• "Clean" data: check for illegal or improbable values and combinations of values
• Prepare summaries
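The "clean" step of data processing can be sketched as a set of checks run on every record. The records, field names, codes, and the 0 to 120 age range below are all hypothetical and chosen only to show one illegal code, one improbable value, and one impossible combination of values.

```python
# Hypothetical records; field names and codes are illustrative only.
records = [
    {"id": 1, "sex": "F", "age": 29, "pregnant": True},
    {"id": 2, "sex": "M", "age": 41, "pregnant": False},
    {"id": 3, "sex": "M", "age": 35, "pregnant": True},    # impossible combination
    {"id": 4, "sex": "F", "age": 230, "pregnant": False},  # improbable value
    {"id": 5, "sex": "X", "age": 52, "pregnant": False},   # illegal code
]


def check_record(rec):
    """Return a list of problems found in one record (empty if none)."""
    problems = []
    if rec["sex"] not in {"M", "F"}:
        problems.append("illegal sex code")
    if not 0 <= rec["age"] <= 120:  # assumed plausible range
        problems.append("improbable age")
    if rec["sex"] == "M" and rec["pregnant"]:
        problems.append("pregnant male")
    return problems


# Keep only the records that have at least one problem.
flagged = {}
for rec in records:
    problems = check_record(rec)
    if problems:
        flagged[rec["id"]] = problems

for rec_id, problems in flagged.items():
    print(f"record {rec_id}: {', '.join(problems)}")
```

Flagged records are listed rather than silently corrected, in keeping with the instruction to document exceptions and actions.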
DATA EXPLORATION
• Examine the data (frequency distributions, cross-tabulations, scatterplots). Be alert for surprises and suspicious findings, reviewing distributions for illegal or improbable values or combinations of values, such as pregnant males.
• Examine means and prevalence for factors of interest, overall and within interesting subgroups
• Look at associations, prevalence ratios,

STATISTICAL ANALYSIS PLAN
The most important part of any research project is the planning process. This statement is as true for data analysis as for any of the other steps in the research process.
The investigator should select the statistics to describe the sample and to analyze the data for each research question or hypothesis before initiating the study.
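Examining prevalence overall and within subgroups, as listed under data exploration, can be sketched as follows. The eight records are made up for illustration, and the prevalence ratio is computed here as the prevalence in one subgroup divided by the prevalence in the other, which is the usual definition but is not spelled out in the lecture.

```python
from collections import defaultdict

# Hypothetical sample: (sex, has_condition) for eight subjects.
sample = [
    ("F", True), ("F", False), ("F", False), ("F", True),
    ("M", True), ("M", False), ("M", False), ("M", False),
]


def prevalence(rows):
    """Proportion of rows in which the condition is present."""
    return sum(1 for _, has_condition in rows if has_condition) / len(rows)


# Group the rows by sex so prevalence can be computed within each subgroup.
by_sex = defaultdict(list)
for row in sample:
    by_sex[row[0]].append(row)

overall = prevalence(sample)
within = {sex: prevalence(rows) for sex, rows in by_sex.items()}

# Prevalence ratio: prevalence among females relative to males.
prevalence_ratio = within["F"] / within["M"]

print(f"overall prevalence: {overall:.3f}")
for sex, value in within.items():
    print(f"prevalence in {sex}: {value:.3f}")
print(f"prevalence ratio (F vs M): {prevalence_ratio:.1f}")
```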
DESCRIPTIVE STATISTICS WILL DESCRIBE THESE VARIABLES.
Investigators should plan to first describe their sample.
They should identify the important demographic characteristics of the sample, such as:
• sex,
• age,
• race.

DEPENDENT VARIABLE & INDEPENDENT VARIABLE
It is very important to differentiate between dependent and independent variables in order to look for a causal explanation.
Dependent variable: measures or explains the problem under study (disease, effect); it is also called the outcome.
Independent variable: explains the factor that is assumed to cause the disease or influence the problem (intervention or treatment). It can be manipulated, e.g. some patients can receive treatment at varying dosages. Independent variables are also called predictors.
TYPES OF VARIABLES
Variable types can be distinguished based on their scale.
Typically, different statistical methods are appropriate for variables of different scales.

VARIABLES
Independent variable = Cause / exposure
Dependent variable = Outcome / effect
STATISTICAL ANALYSIS
Descriptive analysis: describing what is.
Inferential analysis: determining the likelihood of a real difference being present in the population.
To select the most appropriate statistics, investigators need to know which type of question they are asking and the level of measurement being used for the variables.

DESCRIPTIVE STATISTICS
Descriptive statistics are numbers that summarize the data with the purpose of describing what occurred in the sample.
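A few common descriptive statistics can be computed with Python's standard `statistics` module. The sketch uses the first row of numbers from the earlier slide, read as ages in years (an assumption, since the slide only says the data "may represent" ages); which statistics to report is an illustrative choice.

```python
import statistics

# First row of the numbers on the earlier slide, read here as ages in years
# (an assumption: the slide only says the data "may represent" ages).
ages = [12, 11, 12, 2, 6, 7, 7, 7, 8, 9, 9, 4, 4]

mean_age = statistics.mean(ages)
median_age = statistics.median(ages)
mode_age = statistics.mode(ages)
variance_age = statistics.variance(ages)  # sample variance (divides by n - 1)
sd_age = statistics.stdev(ages)           # sample standard deviation

print(f"n        = {len(ages)}")
print(f"mean     = {mean_age:.2f}")
print(f"median   = {median_age}")
print(f"mode     = {mode_age}")
print(f"variance = {variance_age:.2f}")
print(f"sd       = {sd_age:.2f}")
```

These numbers describe only the thirteen values in the sample; saying anything about a wider population is the job of inferential statistics.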
LEVEL OF MEASUREMENT
Level of measurement refers to the amount of information contained within the data element.
Data elements are measured at the nominal (categorical), ordinal, interval, or ratio (continuous) level.
The term nominal level (or categorical) data refers to data that can only be put into groups. For example, the demographic data elements of race and religion are measured at the nominal level.

ORDINAL LEVEL DATA
Ordinal level data are one step up from nominal data. As the name implies, ordinal data have an inherent order.
Data values such as never, sometimes, often, and always have an order / rank.
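The difference between nominal and ordinal data shows up as soon as the values are sorted or summarized. The sketch below uses the four ordinal categories named above; the five responses are hypothetical, and taking the median category is offered as one summary that ordinal (but not nominal) data support.

```python
import statistics

# Ordinal categories from the slide, listed from lowest to highest.
ORDER = ["never", "sometimes", "often", "always"]
rank = {category: position for position, category in enumerate(ORDER)}

# Hypothetical survey responses.
responses = ["often", "never", "always", "sometimes", "often"]

# Because the categories have an inherent order, sorting is meaningful.
sorted_responses = sorted(responses, key=rank.__getitem__)

# The median category is defined for ordinal data; median_low always
# returns one of the observed ranks, so it maps back to a category.
median_rank = statistics.median_low(rank[r] for r in responses)
median_category = ORDER[median_rank]

print("sorted:", sorted_responses)
print("median category:", median_category)
```

For nominal data such as blood group there is no such ordering, so only counts and the most frequent category are meaningful summaries.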
Descriptive statistics can also be used to compare samples from one study with another.
Descriptive statistics also help researchers detect sample characteristics that may influence their conclusions.

INFERENTIAL STATISTICS
Inferential statistics are numbers that allow the investigator to determine:
• whether there are differences between two or more samples, and
• whether these differences are likely to be present in the population of interest.
JUST AN OVERVIEW, NO WORRIES…