Descriptive Data Analysis
Aliford Asoms Mpinganjira
Lecturer in Biostatistics
DEPARTMENT OF MATHEMATICAL SCIENCES
MALAWI UNIVERSITY OF BUSINESS AND APPLIED SCIENCES
Quantitative Data Analysis
In quantitative research, once the data have been collected,
statistical analysis proceeds in two steps:
The first step is to conduct a descriptive analysis of the
variables
The second step is to conduct inferential statistics
Descriptive Data Analysis
Descriptive Data Analysis is the type of data analysis which
involves:
Giving data summaries (frequency distributions and graphs)
to help in visualizing/exploring/describing/discussing/finding:
- outstanding issues (in terms of percentages/proportions/probabilities)
- shape of the distribution (normal or skewed)
- trends/patterns/relationships
Working out and interpreting measures of central
tendency (mean, mode and median)
Working out measures of location (percentiles, deciles and quartiles)
Working out measures of variability/spread/dispersion
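As a minimal sketch of these measures, the Python snippet below uses the standard statistics module on a small made-up data set. The values are an assumption chosen for illustration and are not taken from any data set in these notes.

```python
import statistics

# Hypothetical observations, for illustration only
values = [2, 4, 4, 5, 7, 9, 11]

# Measures of central tendency
mean = statistics.mean(values)
median = statistics.median(values)
mode = statistics.mode(values)

# Measures of location: the three quartiles (cut points for four equal parts)
quartiles = statistics.quantiles(values, n=4)

# Measures of variability/spread/dispersion
data_range = max(values) - min(values)
sample_sd = statistics.stdev(values)  # sample standard deviation (divides by n - 1)

print("mean:", mean, "median:", median, "mode:", mode)
print("quartiles:", quartiles)
print("range:", data_range, "sample SD:", round(sample_sd, 2))
```

Statistical packages use slightly different conventions for quartiles, so the cut points may differ a little from one package to another.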
Descriptive statistics involving
Categorical Data
Categorical data can be
summarized by
1. Categorical frequency distributions
2. Pie charts
3. Bar graphs
4. Crosstabs, also called contingency tables
(two/three way)
Categorical Frequency
Distribution
This distribution is used for data
that can be placed in specific
categories, such as nominal or
ordinal data.
e.g. political affiliation,
religious affiliation, race and
gender are examples of categorical
data.
Example
Twenty-five graduate engineers were given a
blood test to determine their blood type. The
data set is
A B B AB O
O O B AB B
B B O A O
A O O O AB
AB A O B A
Construct a frequency distribution for the data.
Solution
Category   Tally        Frequency
A          ////         5
B          //// //      7
AB         ////         4
O          //// ////    9
Interpretation: Blood type O is the most common blood type among the
25 graduate engineers, with a frequency of 9 (36%)
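The tally above can be checked with a short Python sketch. It counts each blood type in the 25 observations and also works out the relative frequencies; the variable names are illustrative only.

```python
from collections import Counter

# Blood types of the 25 graduate engineers, row by row as listed in the example
blood_types = (
    "A B B AB O "
    "O O B AB B "
    "B B O A O "
    "A O O O AB "
    "AB A O B A"
).split()

counts = Counter(blood_types)  # frequency of each category
n = len(blood_types)           # total number of observations

# Relative frequency = category frequency / total number of observations
relative = {group: counts[group] / n for group in counts}

for group in ["A", "B", "AB", "O"]:
    print(f"{group:<3} {counts[group]:>2} {relative[group]:.0%}")
```

Reporting the relative frequency alongside the count makes the interpretation easier, because a percentage can be compared across samples of different sizes.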
Activity 1
Using the variable race in the Lowbirthweight data set, create and
interpret:
- categorical relative frequency distribution
- pie chart
- bar graph
Try the same using any categorical variable in your data set
Steps:
Analyze > Descriptive Statistics > Frequencies > select variable > Charts >
pie/bar chart > OK
Output
Interpretation? Note: Avoid re-writing the frequency table in words
Most of the women who gave birth in the population of interest (mention
it) are white, with a percentage of 50.79. Maybe this is because of….
Example of a pie
chart
Interpretation?
Example of a bar graph
Interpretation?
Note: the variable (Likert scale) can be transformed to have just two
categories (dissatisfied and satisfied)
Comparing groups within groups
Example of a clustered bar graph
Activity 2
• Using the Lowbirthweight data set, generate a clustered bar graph of low
birthweight comparing smokers and non-smokers, and comment on it
Steps:
Graphs > Legacy Dialogs > Bar > Clustered > select variables > OK
• Try the same with any two categorical variables in your data set
Crosstabs (used to summarize two or three categorical variables)
Activity 3
Using the Lowbirthweight data set, create crosstabs and
comment on them for the following variables:
1. Low birthweight and smoking status
2. Low birthweight, smoking status and race
Steps
Analyze > Descriptive Statistics > Crosstabs > select variables > OK
Re-do 1 and, instead of frequencies, find these probability
distributions: joint, marginal and conditional (given smoking
status)
Example of crosstab
Counts can sometimes be misleading, so it is good to convert them into
proportions/probabilities
Joint probability distribution
Interpretation: The highest joint probability (0.455) is that of finding a
woman who was not smoking during pregnancy and gave birth to
a child of birth weight > 2500 g.
Conditional probability distribution
Interpretation: Among smokers, there is a higher chance of finding a woman who gave
birth to a child of birth weight > 2500 g than of birth weight < 2500 g. However, the chance
of giving birth to a child of birth weight < 2500 g is higher among smokers than among non-smokers.
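The three probability distributions can be sketched in Python from a 2 x 2 table of counts. The counts below are an assumption made for illustration, since the crosstab output itself is not reproduced in these notes; they were chosen so that the joint probability for non-smokers with birth weight > 2500 g agrees with the 0.455 quoted above.

```python
# Illustrative counts only (assumed, not copied from the crosstab output).
# Keys are (smoking status, birth weight group).
counts = {
    ("non-smoker", ">2500g"): 86,
    ("non-smoker", "<2500g"): 29,
    ("smoker", ">2500g"): 44,
    ("smoker", "<2500g"): 30,
}
total = sum(counts.values())

# Joint distribution: each cell count divided by the grand total
joint = {cell: c / total for cell, c in counts.items()}

# Marginal distribution of smoking status: row total divided by the grand total
row_totals = {}
for (smoking, _), c in counts.items():
    row_totals[smoking] = row_totals.get(smoking, 0) + c
marginal_smoking = {s: t / total for s, t in row_totals.items()}

# Conditional distribution given smoking status: cell count divided by its row total
conditional = {
    (smoking, weight): c / row_totals[smoking]
    for (smoking, weight), c in counts.items()
}

for cell in counts:
    print(cell, "joint:", round(joint[cell], 3),
          "conditional on smoking status:", round(conditional[cell], 3))
```

With these assumed counts, each conditional distribution sums to 1 within a smoking group, which is what allows smokers and non-smokers to be compared directly even though the groups differ in size.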
Descriptive statistics involving
continuous data
Continuous data can be
summarized by
1. Grouped frequency distributions
2. Histogram
3. Frequency polygon
4. Ogive
Example of grouped frequency distribution
(amount of waste, in kg, produced per household
per month)
Class Limits   Class Boundaries   Tally        Freq   Cumulative Freq
24 - 30        23.5 - 30.5        ///          3      3
31 - 37        30.5 - 37.5        /            1      4
38 - 44        37.5 - 44.5        ////         5      9
45 - 51        44.5 - 51.5        //// ////    9      18
52 - 58        51.5 - 58.5        //// /       6      24
59 - 65        58.5 - 65.5        /            1      25
Total                                          25
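The class boundaries and cumulative frequencies in this table can be derived from the class limits and frequencies alone. The following Python sketch shows one way to do it; it assumes, as in the table, that every class is separated from the next by the same gap.

```python
# Class limits and frequencies copied from the table above
class_limits = [(24, 30), (31, 37), (38, 44), (45, 51), (52, 58), (59, 65)]
frequencies = [3, 1, 5, 9, 6, 1]

# Gap between the upper limit of one class and the lower limit of the next
gap = class_limits[1][0] - class_limits[0][1]  # 31 - 30 = 1

# Each boundary lies halfway across that gap (0.5 below and 0.5 above here)
boundaries = [(lower - gap / 2, upper + gap / 2) for lower, upper in class_limits]

# Cumulative frequency is the running total of the class frequencies
cumulative = []
running_total = 0
for freq in frequencies:
    running_total += freq
    cumulative.append(running_total)

for limits, bounds, freq, cum in zip(class_limits, boundaries, frequencies, cumulative):
    print(limits, bounds, freq, cum)
```

The last cumulative frequency should always equal the total number of observations, which is a quick check on the tally.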
Activity 4
In a study about the distances students cover when going to school, the
following data were obtained:
30 50 33 70 81 49 61 35 19 80 25
10 40 35 30 30 61 80 40 56 62 24 60
44 80 20 90 30 70 40 10 50
• Construct a grouped cumulative frequency distribution with six classes
• Comment on the constructed distribution
Example of a Histogram
Example of a Frequency
Polygon
Example of an Ogive
Activity 5
Using the variable age in the Lowbirthweight data set,
create and interpret the following:
grouped frequency distribution, histogram, frequency
polygon, and ogive.
Use a continuous variable in your data set to create a
grouped frequency distribution, histogram, frequency
polygon, and ogive
Descriptive statistics to
explore trends/relationships
• Scatter plots to explore relationships in
continuous data
• Historigram (time series graph) to
explore trends in time series data
• Example of scatter plots (panels A, B and C)
Interpretation?
Activity 6
Using the Lowbirthweight data set, create and interpret a scatter
plot, with a line of best fit, of age of the mother and
weight at last..
Steps
Graphs > Chart Builder > Scatter > drag variables onto the appropriate axes > OK >
double-click the graph > right-click > Add Fit Line.
• Example of scatter plot
Interpretation
There is a positive linear relationship between age of the mother and
the weight of the mother
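To show what fitting a line of best fit involves, here is a minimal Python sketch of a least-squares line and the correlation coefficient. The (age, weight) pairs are hypothetical values invented for illustration; they are not taken from the data set used in the activity.

```python
import math

# Hypothetical (age in years, weight) pairs, for illustration only
ages = [19, 22, 25, 28, 31, 34]
weights = [50, 55, 54, 60, 63, 66]

n = len(ages)
mean_x = sum(ages) / n
mean_y = sum(weights) / n

# Sums of squares and cross-products about the means
s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(ages, weights))
s_xx = sum((x - mean_x) ** 2 for x in ages)
s_yy = sum((y - mean_y) ** 2 for y in weights)

# Least-squares line: weight = intercept + slope * age
slope = s_xy / s_xx
intercept = mean_y - slope * mean_x

# Pearson correlation: direction and strength of the linear relationship
r = s_xy / math.sqrt(s_xx * s_yy)

print(f"weight = {intercept:.2f} + {slope:.2f} * age, r = {r:.2f}")
```

A positive slope and a positive r both indicate a positive linear relationship; how close r is to 1 indicates how tightly the points cluster around the line.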
Example of time series
graph
Activity 7
Use any data set containing time series data and create a
time series graph of the variable against time
Steps
Graphs > Chart Builder > Scatter > drag variables onto the appropriate axes
(time on the x-axis and the other variable(s) on the y-axis) > OK >
double-click the graph > add an interpolation line from the menu bar