
Descriptive Data Analysis

Uploaded by

Ogak Jerry

12/26/2020

DESCRIPTIVE DATA ANALYSIS

By Rose Poni Gore

OBJECTIVES

Students will be able to:
• Comprehend data and the summarization modalities
• Distinguish the source and method of data collection
• Properly organize tables with one or more variables
• Adequately prepare the following types of charts: bar charts, pie charts, histograms
• Describe when to use tables, graphs, and charts
• Identify types of variables and levels of measurement

WITHOUT THE DOCUMENTATION, THE DATA MAY BE OF LITTLE IF ANY VALUE (1995 NSFG)

BEAUTIFUL, HAPPY, HEALTHY CHILDREN

12 11 12 2 6 7 7 7 8 9 9 4 4
11 4 14 3
11 4 4 11 7 5 20 3 10
11 7 10
6 3 7 7 3 5 10 10
2 4 5 11 21
11 11 11 3 10 10 4 10 7 3


THE DATA MAY REPRESENT THE AGE OF THE CHILDREN PLAYING AT THE BEACH

M FOR MALE AND F FOR FEMALE … THEN IT MAKES SENSE

12 11 12 2 6 7 7 7 8 9 9 4 4
11 4 14 3
11 4 4 11 7 5 20 3 10
11 7 10
6 3 7 7 3 5 10 10
2 4 5 11 21
11 11 11 3 10 10 4 10 7 3


FEMALES & MALES / F&M

DATA

• Data are observations of random variables made on the elements of a population or sample.
• Data are the quantities (numbers) or qualities (attributes) measured or observed that are to be collected and/or analyzed.
• The word "data" is plural; "datum" is singular.
• A collection of data is often called a data set (singular).


DATA COLLECTION METHODS

Primary: where the investigator is the first to collect the data.
• Sources include:
  • medical examinations,
  • interviews,
  • observations, etc.
• Merits: less measurement error; suits the objectives of the study better.
• Disadvantages: costly; may not be feasible.

Secondary: where the data are collected by others, for purposes other than those of the current study.
• Sources include:
  • individual records (medical / employment);
  • group records (census data, vital statistics).

DATA PRESENTATION

• Even quite small data sets are difficult to comprehend without some summarization.
• Statistical quantities such as the mean and variance can be extremely helpful in summarizing data, but first we discuss tabular and graphical summaries.

TABLES

• One of the most important means of summarizing the data from a single variable is to tabulate the frequency distribution of the variable.
• A frequency distribution simply tells how often a variable takes on each of its possible values.
• For quantitative variables with many possible values, the possible values are typically binned, or grouped, into intervals.
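The tabulation described on the TABLES slide can be sketched in a few lines of Python. This is only an illustration: the `ages` list, the bin width of 5, and both function names are assumptions made for the example and do not come from the slides.

```python
from collections import Counter

def frequency_distribution(values):
    """Return {value: count} for each distinct value, sorted by value."""
    return dict(sorted(Counter(values).items()))

def binned_frequency(values, width):
    """Group values into half-open intervals [low, low + width) and count them."""
    counts = Counter((v // width) * width for v in values)
    return {(low, low + width): counts[low] for low in sorted(counts)}

# Hypothetical ages in years (an assumption, not data from the slides).
ages = [2, 3, 3, 4, 7, 7, 7, 10, 11, 11, 12, 14, 20, 21]

freq = frequency_distribution(ages)
binned = binned_frequency(ages, width=5)

for (low, high), count in binned.items():
    print(f"{low:>2} to <{high:<2}: {count}")
```

Intervals that contain no observations are simply left out of the result here; a real table might list them with a count of zero so that gaps in the data stay visible.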


TABLE

• Here the relative frequency as a proportion is just
  $$\text{relative frequency (proportion)} = \frac{\text{frequency}}{n},$$
  where $n$ is the sample size.
• The relative frequency as a percentage is
  $$\text{relative frequency (percent)} = \frac{\text{frequency}}{n} \times 100\%.$$

e.g. TABLE

©2006 by Getu Degu and Tegbar Yigzaw

GRAPHS

BAR CHART, PIE CHART, OR HISTOGRAM
• Frequency distributions can often be displayed effectively using graphical means such as the bar chart, pie chart, or histogram.
• Pie charts are useful for displaying the relative frequency distribution of a nominal variable. Here is an example of the relative frequency distribution (percentage).

PIE CHART

Example: blood group of 50 students

Group   Students
A       5
B       20
AB      10
O       15
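As a worked illustration of relative frequency, the blood-group counts above can be converted to proportions and percentages. The pie-sector angle (relative frequency times 360 degrees) is added here as a standard convention, not something stated on the slide, and the function name is an assumption.

```python
# Blood-group counts from the pie chart example (50 students).
counts = {"A": 5, "B": 20, "AB": 10, "O": 15}

def relative_frequencies(counts):
    """Return {category: (proportion, percent, pie_angle_in_degrees)}."""
    n = sum(counts.values())  # sample size
    return {
        group: (count / n, 100 * count / n, 360 * count / n)
        for group, count in counts.items()
    }

table = relative_frequencies(counts)

print(f"{'Group':<6}{'Students':>9}{'Percent':>9}{'Angle':>8}")
for group, (proportion, percent, angle) in table.items():
    print(f"{group:<6}{counts[group]:>9}{percent:>8.0f}%{angle:>7.0f}°")
```

For these counts the percentages are 10%, 40%, 20%, and 30% for groups A, B, AB, and O, which sum to 100% as a relative frequency distribution should.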


GRAPHICAL SUMMARIES FOR DISCRETE VARIABLES

• Graphical displays are very useful for summarizing data, and both dichotomous and non-ordered categorical variables are best summarized with bar charts.
• The response options (e.g., yes/no, present/absent) are shown on the horizontal axis, and either the frequencies or the relative frequencies are plotted on the vertical axis. The table gives the tabular presentation, and the corresponding figure is a frequency bar chart. Note that for dichotomous and categorical variables there should be a space between the response options.
• The analogous graphical representation for an ordinal variable does not have spaces between the bars, in order to emphasize that there is an inherent order.

TABLE 1

FREQUENCY BAR CHART - 1

CHART - 2

A DISTINGUISHING FEATURE OF BAR CHARTS FOR NON-ORDERED AND DICHOTOMOUS CATEGORICAL VARIABLES IS THAT THE BARS ARE SEPARATED BY SPACES TO EMPHASIZE THAT THEY DESCRIBE NON-ORDERED CATEGORIES.

Created by Lisa Sullivan, Wayne W. La Morte, Boston University School of Public Health ©2016. All Rights Reserved.


DICHOTOMOUS VARIABLE

• In contrast, chart 3 below illustrates a relative frequency bar chart of the distribution of treatment with antihypertensive medications.
• This is a graphical representation of a dichotomous variable.

RELATIVE FREQUENCY BAR CHART - 3


These bar charts (chart 4) display the same relative frequencies

Left: exaggerates the difference. Right: minimizes the difference.

• These bar charts display the same relative frequencies, i.e., 31.8% and 37.7%.
• The bar chart on the right minimizes the difference because the vertical scale is too expansive, ranging from 0 to 100%.
• The bar chart on the left visually exaggerates the difference because the vertical scale is too restrictive, ranging from 30 to 40%.
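The effect of the vertical scale can be put into numbers. The sketch below uses the two relative frequencies from the slide (31.8% and 37.7%) and the two axis ranges it mentions; the function names are assumptions, and the calculation is just one simple way to quantify what the eye sees.

```python
def drawn_height_ratio(low_value, high_value, axis_min):
    """Ratio of the drawn bar heights when the vertical axis starts at axis_min.

    A bar is drawn from axis_min up to its value, so its visible height is
    (value - axis_min) rather than the value itself.
    """
    return (high_value - axis_min) / (low_value - axis_min)

def difference_share_of_axis(low_value, high_value, axis_min, axis_max):
    """Fraction of the axis height taken up by the gap between the two bars."""
    return (high_value - low_value) / (axis_max - axis_min)

low, high = 31.8, 37.7  # relative frequencies (%) quoted on the slide

wide_ratio = drawn_height_ratio(low, high, axis_min=0)     # axis 0 to 100%
narrow_ratio = drawn_height_ratio(low, high, axis_min=30)  # axis 30 to 40%

wide_share = difference_share_of_axis(low, high, 0, 100)
narrow_share = difference_share_of_axis(low, high, 30, 40)

print(f"0-100% axis:  taller bar is {wide_ratio:.2f}x the shorter, gap fills {wide_share:.0%} of the axis")
print(f"30-40% axis: taller bar is {narrow_ratio:.2f}x the shorter, gap fills {narrow_share:.0%} of the axis")
```

On the 0 to 100% axis the taller bar is drawn only about 1.19 times as high as the shorter one, while on the 30 to 40% axis it is drawn about 4.28 times as high, although the underlying percentages are identical.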


DICHOTOMOUS DATA

• A specific subset of nominal-level data is dichotomous data: the data are nominal but have only two possible categories. A common example is alive or dead.
• Other dichotomous variables are things that can be measured as yes or no, on or off, or present or absent.

HISTOGRAMS FOR ORDINAL VARIABLES

When one is dealing with ordinal variables, however, the appropriate graphical format is a histogram. A histogram is similar to a bar chart, except that the adjacent bars abut one another in order to reinforce the idea that the categories have an inherent order.
A histogram depicts the frequency distribution of a quantitative random variable (continuous?).

TABLE 2 DISPLAYS ORDINAL VARIABLE

Figure 5-1: relative frequency histogram for blood pressure
Figure 5-2: frequency histogram for blood pressure



DATA ANALYSIS AND CAUSAL INFERENCE

• Data are observer notes, respondent answers, biochemical measurements, contents of medical records, machine-readable datasets, and other kinds of information from which we attempt to derive meaning. So what does one do with them?
• Analysis and interpretation of the data create the meaning that we ascribe or attribute to the data.
• What does one do with them?

STEPS IN DATA MANAGEMENT

• The first thing we do with data is to manage them (note that epidemiologists usually regard the word "data" as a plural word, based on its Latin root; however, other fields often consider "data" to be singular).
• Since epidemiologic studies tend to have many (hundreds, thousands, or even millions of) observations, and often tens or hundreds of data items for each observation, managing epidemiologic data involves "mass production".
• Therefore a systematic, organized, professional approach is critical for detecting and avoiding problems with the data.

Steps in data management

• Design the data collection process
• Write down all data collection procedures
• Train and supervise data collectors
• Monitor all data collection activities
• Document all data collection experiences
• Keep track of, document, and safeguard data

DATA TYPES

Quantitative and Qualitative


DATA SOURCES

Data arise from experimental or observational studies, and it is important to distinguish the two:
• 1. experimental studies
• 2. observational studies

EXPERIMENTAL STUDIES

• In an experiment the researcher deliberately imposes a treatment on one or more subjects or experimental units (not necessarily human). The experimenter then measures or observes the subjects' response to the treatment.
• The crucial element is an intervention.

EXPERIMENTAL STUDIES

• Randomization is an extremely important aspect of experimental design.
• Another important concept, especially in human experimentation, is blinding.
• An experiment is blind if the subjects don't know which treatment they receive.

EXPERIMENTAL STUDIES

• E.g., suppose we randomize a number of migraine sufferers to an active drug and the remaining ones to a placebo control treatment.
• The experiment is blind if the pills in the two treatment groups look and taste identical and the subjects are not told which treatment they receive. This guards against the placebo effect.
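A minimal sketch of random allocation for the migraine example follows. The slides do not give the number of subjects, so the group size of 20, the subject IDs, the arm labels, and the seed are all assumptions chosen for illustration; shuffling and dealing out is one common approach, not necessarily the one the author had in mind.

```python
import random

def randomize(subjects, arms=("active drug", "placebo"), seed=None):
    """Randomly split subjects into treatment arms of (near-)equal size.

    Shuffling a copy and dealing it out in turn gives every subject the same
    chance of landing in each arm, independent of enrolment order.
    """
    rng = random.Random(seed)  # the seed only makes this sketch repeatable
    shuffled = list(subjects)
    rng.shuffle(shuffled)
    return {arm: shuffled[i::len(arms)] for i, arm in enumerate(arms)}

# Hypothetical subject IDs; 20 subjects is an assumption.
subjects = [f"S{i:02d}" for i in range(1, 21)]
allocation = randomize(subjects, seed=2020)

for arm, members in allocation.items():
    print(f"{arm}: {len(members)} subjects")
```

In a blinded trial the allocation list would be held by someone other than the subjects (and, for a double-blind design, other than the person measuring the response).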


EXPERIMENTAL STUDIES

• An experiment is double-blind if the researcher who administers the treatments and measures the response also does not know which treatment is assigned.
• This guards against experimenter effects: the experimenter may behave differently toward the subjects in the two groups, or measure the response differently in the two groups.

EXPERIMENTAL STUDIES

• Experiments have many advantages and are strongly preferred when possible. However, experiments are rarely feasible in public health epidemiology.
• In the health sciences and medicine, experiments involving humans are called clinical trials.

OBSERVATIONAL STUDIES

• No intervention
• Data collected on an existing system
• Less expensive
• Easier logistically
• More often ethically practical
• Interventions often not possible

TYPES OF OBSERVATIONAL STUDIES

Case studies or case series
• A descriptive account of interesting characteristics (e.g. symptoms) observed in a single case (a subject with the disease) or in a sample of cases
• Typically unplanned and don't involve any research hypotheses
• No comparison group
• Poor design, but can generate research hypotheses for subsequent investigation


ANALYTICAL STUDIES

Case-control study: conducted retrospectively, by looking into the past
• Two types of subjects are included:
  • cases: subjects with the disease or outcome of interest
  • controls: subjects without the disease or outcome

DATA PROCESSING

• Review, edit, and code data forms, documenting exceptions and actions
• Convert to electronic form
• "Clean" the data: check for illegal or improbable values and combinations of values
• Prepare summaries

DATA EXPLORATION

• Examine the data (frequency distributions, cross-tabulations, scatterplots). Be alert for surprises and suspicious findings, reviewing distributions for illegal or improbable values or combinations of values, such as pregnant males.
• Examine means and prevalence for factors of interest, overall and within interesting subgroups
• Look at associations, prevalence ratios,

STATISTICAL ANALYSIS PLAN

• The most important part of any research project is the planning process. This statement is as true for data analysis as for any of the other steps in the research process.
• The investigator should select the statistics to describe the sample and to analyze the data for each research question or hypothesis before initiating the study.
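The checks for illegal or improbable values and combinations mentioned under data processing and data exploration can be sketched as a small set of rules. The field names, the M/F coding, the plausible age range of 0 to 120, and the records themselves are assumptions made for this example.

```python
def find_problems(record):
    """Return a list of data-quality flags for one record (a dict)."""
    problems = []
    age = record.get("age")
    if age is None or not 0 <= age <= 120:  # assumed plausible age range
        problems.append("illegal or improbable age")
    if record.get("sex") not in {"M", "F"}:  # assumed coding: M or F
        problems.append("illegal sex code")
    if record.get("sex") == "M" and record.get("pregnant") is True:
        problems.append("improbable combination: pregnant male")
    return problems

# Hypothetical records, invented for illustration only.
records = [
    {"id": 1, "age": 34, "sex": "F", "pregnant": True},
    {"id": 2, "age": 29, "sex": "M", "pregnant": True},
    {"id": 3, "age": 340, "sex": "F", "pregnant": False},
    {"id": 4, "age": 51, "sex": "X", "pregnant": False},
]

flagged = {}
for record in records:
    problems = find_problems(record)
    if problems:
        flagged[record["id"]] = problems

for record_id, problems in flagged.items():
    print(record_id, "; ".join(problems))
```

Flagged records are reported rather than silently corrected, in line with the advice to document exceptions and the actions taken.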

DESCRIPTIVE STATISTICS WILL DESCRIBE THESE VARIABLES.

• Investigators should plan to first describe their sample.
• They should identify the important demographic characteristics of the sample, such as:
  • sex,
  • age,
  • race.

DEPENDENT VARIABLE & INDEPENDENT VARIABLE

• It is very important to differentiate between dependent and independent variables in order to look for a causal explanation.
• Dependent variable: measures or explains the problem under study (the disease or effect); it is also called the outcome.
• Independent variable: explains the factor that is assumed to cause the disease or influence the problem (the intervention or treatment) and can be manipulated, e.g. some patients can receive treatment at varying dosages; it is also called a predictor.

TYPES OF VARIABLES

• Variable types can be distinguished based on their scale.
• Typically, different statistical methods are appropriate for variables of different scales.

VARIABLES

Independent variable = Cause / exposure
Dependent variable = Outcome / effect


STATISTICAL ANALYSIS

• Descriptive analysis: describing what is.
• Inferential analysis: determining the likelihood of a real difference being present in the population.
• To select the most appropriate statistics, investigators need to know which type of question they are asking and the level of measurement being used for the variables.

DESCRIPTIVE STATISTICS

Descriptive statistics are numbers that summarize the data with the purpose of describing what occurred in the sample.
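As an illustration of numbers that summarize a sample, the sketch below computes a few common descriptive statistics with Python's standard `statistics` module. The `ages` values are hypothetical and chosen only to keep the arithmetic easy to follow.

```python
import statistics

# Hypothetical ages (years) of a small sample; not data from the slides.
ages = [3, 4, 7, 7, 10, 11]

summary = {
    "n": len(ages),
    "mean": statistics.mean(ages),
    "median": statistics.median(ages),
    "variance": statistics.variance(ages),  # sample variance (divides by n - 1)
    "std_dev": statistics.stdev(ages),      # sample standard deviation
    "min": min(ages),
    "max": max(ages),
}

for name, value in summary.items():
    print(f"{name:>8}: {value}")
```

For this sample the mean and median are both 7 and the sample variance is 10; these figures describe only the sample and say nothing yet about the wider population, which is the job of inferential statistics.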


LEVEL OF MEASUREMENT

• Level of measurement refers to the amount of information contained within the data element.
• Data elements are measured at the nominal (categorical), ordinal, interval, or ratio (continuous) level.
• The term nominal-level (or categorical) data refers to data that can only be put into groups. For example, the demographic data elements of race and religion are measured at the nominal level.

ORDINAL LEVEL DATA

• Ordinal-level data are one step up from nominal data. As the name implies, ordinal data have an inherent order.
• Data values such as never, sometimes, often, and always have an order or rank.
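The practical difference between the nominal and ordinal levels can be shown with a short sketch: nominal values can only be grouped and counted, while ordinal values can also be sorted, so a middle (median) category is meaningful. The category labels and responses below are hypothetical; only the order never, sometimes, often, always comes from the slide.

```python
from collections import Counter

# Nominal: categories can only be grouped and counted.
religion = ["A", "B", "A", "C", "A"]  # hypothetical category labels
nominal_counts = Counter(religion)
most_common_group = nominal_counts.most_common(1)[0][0]

# Ordinal: categories also have a rank, so sorting and a median make sense.
ORDER = ["never", "sometimes", "often", "always"]  # order taken from the slide
RANK = {label: position for position, label in enumerate(ORDER)}

responses = ["often", "never", "always", "sometimes", "often"]  # hypothetical
ranked = sorted(responses, key=RANK.__getitem__)
median_response = ranked[len(ranked) // 2]  # middle value of an odd-sized sample

print("most common group:", most_common_group)
print("ranked responses:", ranked)
print("median response:", median_response)
```

Sorting the religion labels would produce an order too, but only an alphabetical one with no substantive meaning, which is why a median is not reported for nominal data.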

DESCRIPTIVE STATISTICS

• Descriptive statistics can also be used to compare samples from one study with another.
• Descriptive statistics also help researchers detect sample characteristics that may influence their conclusions.

INFERENTIAL STATISTICS

Inferential statistics are numbers that allow the investigator to determine:
• whether there are differences between two or more samples, and
• whether these differences are likely to be present in the population of interest.


JUST AN OVERVIEW, NO WORRIES…
