Statistics I Chapter 2: Univariate Data Analysis

This chapter discusses univariate data analysis through graphical displays and numerical measures. It introduces graphical displays such as histograms, boxplots, barcharts and piecharts to represent categorical and numerical data. Numerical measures described include measures of central tendency (mean, median, mode), variation (standard deviation, variance, range, interquartile range), and other descriptive statistics (quartiles, percentiles). The chapter recommends readings and provides examples of calculating and interpreting these graphical displays and numerical measures.

Uploaded by Dani Amuza


Statistics I

Chapter 2: Univariate data analysis

Contents

- Graphical displays for categorical data (barchart, piechart)
- Graphical displays for numerical data (histogram, polygon, boxplot)
- Numerical measures to describe:
  - central tendency (mean, median, mode)
  - variation (variance, standard deviation, quasi-variance and quasi-standard deviation, range, IQR, coefficient of variation)
  - others (quartiles, percentiles)

Recommended reading

Newbold, P. Estadística para los Negocios y la Economía (2009), Chapter 2.

Graphical presentation of data

Once we have a frequency distribution of the data, the following graphical displays can be obtained:
- Categorical data: piechart, barchart
- Numerical data: histogram, polygon, boxplot

Graphs for qualitative data: piechart

Example 1: The frequency table below corresponds to the blood types reported for a sample of 40 individuals.

Class   Absolute frequency   Relative frequency
A       12                   0.300
B       11                   0.275
AB       8                   0.200
O        9                   0.225
Total   40                   1
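The relative frequencies in the table are simply f_i = n_i / n. A minimal Python sketch of that step follows; the dictionary name `counts` is an illustrative choice, not part of the slides.

```python
# Absolute frequencies n_i for the blood types in Example 1
counts = {"A": 12, "B": 11, "AB": 8, "O": 9}

n = sum(counts.values())  # total sample size (40)

# Relative frequency f_i = n_i / n for each class
rel_freq = {cls: n_i / n for cls, n_i in counts.items()}

for cls, f_i in rel_freq.items():
    print(f"{cls:>2}: n_i = {counts[cls]:2d}, f_i = {f_i:.3f}")
```

The relative frequencies always add up to 1 (up to floating-point rounding), which is a quick check on a hand-built table.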

Piechart

Example 1 cont.:
- Each slice is a fraction of the total size of the pie
- Many software packages order the slices alphabetically
- Piecharts are, however, harder to read than barcharts
- Avoid 3D piecharts: in them, slices in the background look smaller than equally sized slices in the foreground

[Figure: piechart of blood types: A 30%, B 27.5%, AB 20%, O 22.5%]
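Since each slice is a fraction of the whole pie, its central angle is 360° times the relative frequency. The sketch below computes those angles for Example 1; it only does the arithmetic and does not draw anything.

```python
# Relative frequencies from Example 1
rel_freq = {"A": 0.300, "B": 0.275, "AB": 0.200, "O": 0.225}

# Each slice's central angle is its share of the full 360-degree pie
angles = {cls: 360 * f_i for cls, f_i in rel_freq.items()}

for cls, angle in angles.items():
    print(f"{cls:>2}: {angle:.1f} degrees ({100 * rel_freq[cls]:.1f}%)")
```

To actually draw the chart, matplotlib's `pyplot.pie` accepts the raw counts (12, 11, 8, 9) and normalizes them to fractions in the same way.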

Graphs for qualitative data: barchart

Example 2: The frequency table below corresponds to the levels of satisfaction of 901 employees (VU = very unsatisfied, U = unsatisfied, S = satisfied, VS = very satisfied).

Class   Absolute freq.   Relative freq.   Cumulative absolute freq.   Cumulative relative freq.
VU       62              0.07              62                         0.07
U       108              0.12             170                         0.19
S       319              0.35             489                         0.54
VS      412              0.46             901                         1
Total   901              1
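The cumulative columns are running totals of the absolute and relative frequencies. A short Python sketch for Example 2, using `itertools.accumulate` for the running sums; the class order VU, U, S, VS is taken from the table.

```python
from itertools import accumulate

# Satisfaction levels in their natural order, with absolute frequencies n_i
levels = ["VU", "U", "S", "VS"]
n_i = [62, 108, 319, 412]

n = sum(n_i)                    # 901 employees
N_i = list(accumulate(n_i))     # cumulative absolute frequencies
f_i = [x / n for x in n_i]      # relative frequencies
F_i = [x / n for x in N_i]      # cumulative relative frequencies

for row in zip(levels, n_i, f_i, N_i, F_i):
    print("{:>2}  {:4d}  {:.2f}  {:4d}  {:.2f}".format(*row))
```

Cumulative frequencies are meaningful here only because the satisfaction levels have a natural order; for nominal classes such as blood types they would depend on an arbitrary ordering.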

Barchart

[Figure: barchart of the satisfaction levels in Example 2; vertical axis: FREQUENCY]

Example 2 cont.:
- Bars have the same width and are equally spaced, with heights corresponding to the frequencies
- There are gaps between the bars
- Bars are labeled with the class names
- Many software packages order the bars alphabetically

Barchart

Barcharts can also be constructed for discrete data if there are not too many distinct values. This is a barchart for Example 3 of Ch. 1, where we looked at the number of leaves attacked by a pest for a sample of 50 plants.

[Figure: barchart of the number of attacked leaves per plant; vertical axis: FREQUENCY]

Graphs for quantitative data: histogram and polygon

Example 4: The frequency distribution of the daily high temperature (in degrees Fahrenheit) reported on 20 winter days is as follows (n_i: absolute frequency, f_i: relative frequency, N_i: cumulative absolute frequency, F_i: cumulative relative frequency):

Class interval   Midpoint   n_i   f_i    N_i   F_i
[10, 20)         15          3    0.15    3    0.15
[20, 30)         25          6    0.30    9    0.45
[30, 40)         35          5    0.25   14    0.70
[40, 50)         45          4    0.20   18    0.90
[50, 60)         55          2    0.10   20    1
Total                       20    1

Histogram and polygon

- There are no gaps between the bars (bins)
- Bin widths = widths of the class intervals (here identical); the class boundaries are marked on the horizontal axis
- Bin heights = frequencies (here, absolute)
- Bin areas are proportional to the frequencies

[Figure: histogram and frequency polygon of the temperatures in Example 4; horizontal axis: TEMP (F), vertical axis: FREQUENCIES]

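Histogram with area of 1 (on a density scale)

- Bin widths = widths of the class intervals (not necessarily identical)
- Bin heights = $\frac{f_i}{l_i - l_{i-1}}$, where $l_{i-1}$ and $l_i$ are the lower and upper boundaries of the i-th class interval
- Bin areas = $f_i$
- Total area = 1

[Figure: density histogram of the temperatures in Example 4; horizontal axis: TEMP (F), vertical axis: density, from 0.000 to 0.030]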
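The density-scale heights can be checked numerically for Example 4. The sketch below divides each relative frequency by its bin width and confirms that the areas add up to 1; it works unchanged for unequal widths, which is the point of the density scale.

```python
# Class boundaries l_0 < l_1 < ... < l_k and relative frequencies f_i (Example 4)
boundaries = [10, 20, 30, 40, 50, 60]
f = [0.15, 0.30, 0.25, 0.20, 0.10]

# Pair each bin's lower boundary l_{i-1} with its upper boundary l_i
bins = list(zip(boundaries[:-1], boundaries[1:]))

# Density-scale height of bin i: f_i / (l_i - l_{i-1})
heights = [f_i / (hi - lo) for f_i, (lo, hi) in zip(f, bins)]

# Each bin's area (height * width) recovers f_i, so the total area is 1
total_area = sum(h * (hi - lo) for h, (lo, hi) in zip(heights, bins))

print([round(h, 3) for h in heights])
print(round(total_area, 10))
```

The tallest bin, [20, 30), has height 0.30 / 10 = 0.030, which matches the top of the density axis in the figure.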

Describing data numerically

- Center: mean, median, mode
- Variation: range, interquartile range, variance, standard deviation, coefficient of variation
- Others: quartiles, percentiles

New notation:

$\sum_{i=1}^{n} x_i = x_1 + x_2 + \dots + x_n$

($\sum$: sum; i = 1: the lower limit; n: the upper limit; $x_i$: an example of a formula depending on i)

Example: $\sum_{i=-1}^{3} i^2 = (-1)^2 + 0^2 + 1^2 + 2^2 + 3^2 = 15$
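The summation example translates directly into a Python generator expression; note that `range` excludes its upper bound, so the limits -1 to 3 become `range(-1, 4)`.

```python
# Sum of i^2 for i = -1, 0, 1, 2, 3 (range's upper bound is exclusive, hence 4)
total = sum(i ** 2 for i in range(-1, 4))
print(total)  # 15
```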

Central tendency: (arithmetic) mean

- The most common measure of central tendency
- Population mean: $\mu = \frac{\sum_{i=1}^{N} x_i}{N} = \frac{x_1 + \dots + x_N}{N}$
- Sample mean: $\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{x_1 + \dots + x_n}{n}$
- If a, b (b ≠ 0) are real numbers and y = a + bx, then $\bar{y} = a + b\bar{x}$
- Affected by extreme values (outliers)

Example: X: 3, 1, 5, 4, 2 and Y: 3, 1, 5, 4, 200

$\bar{x} = \frac{3+1+5+4+2}{5} = 3$, $\quad \bar{y} = \frac{3+1+5+4+200}{5} = 42.6$!
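A minimal Python sketch of the sample mean, reproducing the outlier example and checking the linear-transformation property $\bar{y} = a + b\bar{x}$. The values a = 10 and b = 2 are arbitrary choices for illustration.

```python
def mean(values):
    """Arithmetic mean: sum of the observations divided by their number."""
    return sum(values) / len(values)

X = [3, 1, 5, 4, 2]
Y = [3, 1, 5, 4, 200]   # same data with one extreme value

print(mean(X))  # 3.0
print(mean(Y))  # 42.6

# Linear transformation y = a + b*x shifts and scales the mean the same way
a, b = 10, 2    # arbitrary illustrative constants
transformed = [a + b * x for x in X]
print(mean(transformed), a + b * mean(X))  # 16.0 16.0
```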

Central tendency: median

In the ordered list, the median M is the middle number:

$M = x_{((n+1)/2)}$ if n is odd (the middle number)

$M = \frac{x_{(n/2)} + x_{(n/2+1)}}{2}$ if n is even (the average of the two middle numbers)

($x_{(1)}, x_{(2)}, \dots, x_{(n)}$ means that the observations are ranked in increasing order, e.g. $x_{(1)} = x_{\min}$, $x_{(n)} = x_{\max}$)

- Not affected by outliers

Example: Given observations 3, 1, 5, 4, 2 (n = 5), first rank the data: 1, 2, 3, 4, 5; then identify the middle number:

$M = x_{((5+1)/2)} = x_{(3)} = 3$ (the 3rd smallest)

Example: Given observations 3, 1, 5, 4, 2, 0 (n = 6), first rank the data: 0, 1, 2, 3, 4, 5; then identify the two middle numbers:

$M = \frac{x_{(6/2)} + x_{(6/2+1)}}{2} = \frac{x_{(3)} + x_{(4)}}{2} = \frac{2+3}{2} = 2.5$ (the average of the 3rd and 4th smallest)

Central tendency: mode

- The value that occurs most often
- Not affected by outliers
- Used for either numerical or categorical data
- There may be no mode, or there may be several modes

Example: Given observations 3, 1, 5, 4, 2, there is no mode

Example: Given observations 3, 1, 5, 4, 2, 1, the mode is 1
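A sketch of the mode that follows the slide's convention: if no value repeats there is no mode, and ties give several modes. The helper name `modes` is hypothetical. It assumes a non-empty input. The standard library's `statistics.multimode` is not used directly because it returns every value when all values are distinct.

```python
from collections import Counter

def modes(values):
    """Return all values that occur most often; [] if every value occurs once."""
    counts = Counter(values)
    top = max(counts.values())       # assumes values is non-empty
    if top == 1:
        return []                    # slide's convention: nothing repeats, so no mode
    return [v for v, c in counts.items() if c == top]

print(modes([3, 1, 5, 4, 2]))        # []
print(modes([3, 1, 5, 4, 2, 1]))     # [1]
print(modes([1, 1, 2, 2, 3]))        # [1, 2]  (several modes)
print(modes(["A", "B", "A", "O"]))   # ['A']  (categorical data)
```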

Shape: comparing mean and median

Three types of distributions:
- Skewed to the left: Mean < Median
- Symmetric: Mean = Median
- Skewed to the right: Median < Mean

[Figure: three distributions: LEFT-SKEWED ($\bar{x} < M$), SYMMETRIC ($\bar{x} = M$), RIGHT-SKEWED ($M < \bar{x}$)]

Note: The distribution in the middle is known as bell-shaped or normal.
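To see the comparison in numbers, the sketch below uses three small made-up samples (hypothetical, not from the slides): one with a long right tail, its mirror image with a long left tail, and a symmetric one. The long tail pulls the mean toward it, while the median stays near the bulk of the data.

```python
from statistics import mean, median

# Hypothetical samples, made up for illustration only
right_skewed = [1, 2, 2, 3, 3, 4, 15]           # long right tail
left_skewed = [-15, -4, -3, -3, -2, -2, -1]     # mirror image: long left tail
symmetric = [1, 2, 3, 4, 5]

for name, data in [("right-skewed", right_skewed),
                   ("left-skewed", left_skewed),
                   ("symmetric", symmetric)]:
    print(f"{name:>12}: mean = {mean(data):.3f}, median = {median(data)}")
```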

Variation: range and interquartile range (IQR)

- Range is the simplest measure of variation: $R = x_{\max} - x_{\min}$
- Ignores the way the data is distributed
- Sensitive to outliers

Example: Given observations 3, 1, 5, 4, 2, R = 5 − 1 = 4

Example: Given observations 3, 1, 5, 4, 100, R = 100 − 1 = 99

- The interquartile range (IQR) can eliminate some outlier problems: eliminate the high and low observations and calculate the range of the middle 50% of the data:
  IQR = 3rd quartile − 1st quartile = $Q_3 - Q_1$
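The range is a one-liner. The sketch below reproduces both examples and shows how a single outlier inflates R. The function is named `data_range` to avoid shadowing Python's built-in `range`.

```python
def data_range(values):
    """Range R = x_max - x_min."""
    return max(values) - min(values)

print(data_range([3, 1, 5, 4, 2]))    # 4
print(data_range([3, 1, 5, 4, 100]))  # 99  (a single outlier inflates R)
```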

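Variation: Interquartile range and boxplot

Outliers are observations that fall
- below the value of $Q_1 - 1.5\,\mathrm{IQR}$, or
- above the value of $Q_3 + 1.5\,\mathrm{IQR}$

For extreme outliers, replace 1.5 by 3 in the above definition.

[Figure: boxplot marking x_min, Q1, the median (Q2), Q3 and x_max, with 25% of the data in each of the four segments; the box is annotated with IQR = 18]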
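The outlier rule can be sketched with two small helpers. Here the quartiles are passed in as arguments because the quartile convention is a separate choice. The function names and the data below are hypothetical, for illustration only. Setting `k=3` gives the extreme-outlier fences.

```python
def outlier_fences(q1, q3, k=1.5):
    """Lower and upper fences Q1 - k*IQR and Q3 + k*IQR (k = 3 for extreme outliers)."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def flag_outliers(values, q1, q3, k=1.5):
    """Observations falling below the lower fence or above the upper fence."""
    lower, upper = outlier_fences(q1, q3, k)
    return [x for x in values if x < lower or x > upper]

# Hypothetical quartiles and data, for illustration only
q1, q3 = 10, 20
data = [2, 11, 15, 19, 40, 60]
print(outlier_fences(q1, q3))             # (-5.0, 35.0)
print(flag_outliers(data, q1, q3))        # [40, 60]
print(flag_outliers(data, q1, q3, k=3))   # [60]  (only 60 is an extreme outlier)
```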

Quartiles and percentiles

- Quartiles split the ranked data into four segments with an equal number of values per segment
- The first quartile $Q_1$ has position $\frac{1}{4}(n+1)$
- The second quartile $Q_2$ (= median) has position $\frac{1}{2}(n+1)$
- The third quartile $Q_3$ has position $\frac{3}{4}(n+1)$

Example: Given observations 22, 18, 17, 16, 16, 13, 12, 21, 11 (n = 9), first rank the data: 11, 12, 13, 16, 16, 17, 18, 21, 22; then identify the positions (here a non-integer position is rounded up):

$Q_1 = x_{(2.5)} = x_{(3)} = 13$

$Q_2 = x_{(5)} = 16$

$Q_3 = x_{(7.5)} = x_{(8)} = 21$

- The pth percentile, p = 1, 2, ..., 99, is $P_p = x_{(p(n+1)/100)}$

Example cont.: 60th percentile $= x_{(60(9+1)/100)} = x_{(6)} = 17$
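A sketch of the percentile rule with position p(n+1)/100. A fractional position is rounded up, which is an assumption based on the worked example ($x_{(2.5)} \to x_{(3)}$). The position is also clamped to 1..n, which is an added safeguard. Other conventions, such as the linear interpolation used by default in `numpy.percentile`, can give different quartiles for the same data.

```python
def percentile(values, p):
    """p-th percentile as x_(p(n+1)/100), rounding a fractional position up.

    Rounding up mirrors the worked example (x_(2.5) -> x_(3)); clamping the
    position to 1..n is an added safeguard for extreme p.
    """
    x = sorted(values)
    n = len(x)
    pos = -(-p * (n + 1) // 100)   # ceiling of p(n+1)/100 in exact integer arithmetic
    pos = min(max(pos, 1), n)      # keep the position within 1..n
    return x[pos - 1]              # convert 1-based position to 0-based index

data = [22, 18, 17, 16, 16, 13, 12, 21, 11]
q1, q2, q3 = (percentile(data, p) for p in (25, 50, 75))
print(q1, q2, q3)             # 13 16 21
print(percentile(data, 60))   # 17
print("IQR =", q3 - q1)       # IQR = 8
```

In the ranked list 11, 12, 13, 16, 16, 17, 18, 21, 22, the third value $x_{(3)}$ is 13, which is the $Q_1$ the code returns.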

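Measure of variation: variance

- Average of squared deviations of values from the mean
- Population variance: $\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$
- Sample variance (divided by n): $\hat{\sigma}^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n} = \frac{\sum_{i=1}^{n} x_i^2 - n\bar{x}^2}{n}$, where the second expression is faster to calculate
- Sample quasi-variance (corrected sample variance, divided by n − 1): $s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1} = \frac{\sum_{i=1}^{n} x_i^2 - n\bar{x}^2}{n-1}$
- They are related via $\hat{\sigma}^2 = \frac{n-1}{n} s^2$
- If a, b (b ≠ 0) are real numbers and y = a + bx, then $s_y^2 = b^2 s_x^2$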
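The sketch below checks the variance formulas numerically on the sample X used later in this chapter. It computes both $\hat{\sigma}^2$ (divide by n) and $s^2$ (divide by n − 1), and confirms three things: the shortcut form agrees with the definition, the relation $\hat{\sigma}^2 = \frac{n-1}{n}s^2$ holds, and $s_y^2 = b^2 s_x^2$. The constants a = 5 and b = 3 are arbitrary.

```python
import math

def variances(values):
    """Return (sigma_hat^2, s^2): divide by n, and by n - 1 (quasi-variance)."""
    n = len(values)
    xbar = sum(values) / n
    ss = sum((x - xbar) ** 2 for x in values)   # sum of squared deviations from the mean
    return ss / n, ss / (n - 1)

def quasi_variance_shortcut(values):
    """s^2 via the 'faster to calculate' form (sum x_i^2 - n*xbar^2) / (n - 1)."""
    n = len(values)
    xbar = sum(values) / n
    return (sum(x * x for x in values) - n * xbar ** 2) / (n - 1)

x = [11, 12, 13, 16, 16, 17, 18, 21]
n = len(x)
sigma2_hat, s2 = variances(x)
print(sigma2_hat, round(s2, 4))                        # 9.75 11.1429
print(math.isclose(s2, quasi_variance_shortcut(x)))    # True
print(math.isclose(sigma2_hat, (n - 1) / n * s2))      # True

# y = a + b*x multiplies the variance by b^2 (a and b chosen arbitrarily)
a, b = 5, 3
y = [a + b * xi for xi in x]
print(math.isclose(variances(y)[1], b ** 2 * s2))      # True
```

The shortcut form is convenient by hand, but in floating point it can lose precision when the values are large relative to their spread, so software usually works with deviations from the mean.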

Measure of variation: standard deviation (SD)

- The most commonly used measure of spread
- Population standard deviation, sample standard deviation and sample quasi-standard deviation are, respectively,
  $\sigma = \sqrt{\sigma^2}$, $\quad \hat{\sigma} = \sqrt{\hat{\sigma}^2}$, $\quad s = \sqrt{s^2}$
- Shows variation about the mean
- Has the same units as the original data, whilst the variance is in squared units (units²)
- Variance and SD are both affected by outliers

Calculating variance and standard deviation

Example: X: 11, 12, 13, 16, 16, 17, 18, 21; Y: 14, 15, 15, 15, 16, 16, 16, 17; Z: 11, 11, 11, 12, 19, 20, 20, 20

$\bar{x} = \frac{124}{8} = 15.5$, $\quad \bar{y} = \frac{124}{8} = 15.5$, $\quad \bar{z} = \frac{124}{8} = 15.5$

$\sum_{i=1}^{n} x_i^2 = 11^2 + 12^2 + \dots + 21^2 = 2000$

$\sum_{i=1}^{n} y_i^2 = 14^2 + 15^2 + \dots + 17^2 = 1928$

$\sum_{i=1}^{n} z_i^2 = 11^2 + 11^2 + \dots + 20^2 = 2068$

$s_x^2 = \frac{\sum_{i=1}^{n} x_i^2 - n\bar{x}^2}{n-1} = \frac{2000 - 8(15.5)^2}{8-1} = \frac{78}{7} = 11.1429$, $\quad s_x = 3.3381$

$s_y^2 = \frac{1928 - 8(15.5)^2}{8-1} = \frac{6}{7} = 0.8571$, $\quad s_y = 0.9258$

$s_z^2 = \frac{2068 - 8(15.5)^2}{8-1} = \frac{146}{7} = 20.8571$, $\quad s_z = 4.5670$
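The hand calculations can be verified with Python's standard `statistics` module. Its `variance` and `stdev` divide by n − 1, so they return the quasi-variance $s^2$ and quasi-standard deviation $s$ used above. The divide-by-n versions are `pvariance` and `pstdev`.

```python
from statistics import mean, stdev, variance

samples = {
    "X": [11, 12, 13, 16, 16, 17, 18, 21],
    "Y": [14, 15, 15, 15, 16, 16, 16, 17],
    "Z": [11, 11, 11, 12, 19, 20, 20, 20],
}

for name, data in samples.items():
    # statistics.variance / stdev divide by n - 1, i.e. they give s^2 and s
    print(f"{name}: mean = {mean(data)}, s^2 = {variance(data):.4f}, s = {stdev(data):.4f}")
```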

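Comparing standard deviations

Example cont.:
X: 11, 12, 13, 16, 16, 17, 18, 21: $\bar{x} = 15.5$, $s_x = 3.3$
Y: 14, 15, 15, 15, 16, 16, 16, 17: $\bar{y} = 15.5$, $s_y = 0.9$
Z: 11, 11, 11, 12, 19, 20, 20, 20: $\bar{z} = 15.5$, $s_z = 4.6$

All three samples have the same mean, but Y is the least spread out and Z the most.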

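Numerical summaries and frequency tables. Standardization.

If the data is discrete, with distinct values $x_1, \dots, x_k$ and absolute frequencies $n_1, \dots, n_k$, then

$\bar{x} = \frac{\sum_{i=1}^{k} x_i n_i}{n}$ and $s^2 = \frac{\sum_{i=1}^{k} x_i^2 n_i - n\bar{x}^2}{n-1}$

If the data is continuous, we replace $x_i$ in the above definitions by the midpoints of the class intervals.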
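A sketch of the frequency-table formulas, applied to the class midpoints of the temperature data in Example 4. The helper name `grouped_mean_var` is hypothetical. With continuous data the midpoints stand in for the raw observations, so the results approximate the mean and quasi-variance of the original 20 temperatures, which are not given.

```python
import math

def grouped_mean_var(values, freqs):
    """Mean and quasi-variance from a frequency table.

    values: distinct values x_i (or class midpoints for continuous data)
    freqs:  absolute frequencies n_i
    """
    n = sum(freqs)
    xbar = sum(x * f for x, f in zip(values, freqs)) / n
    s2 = (sum(x * x * f for x, f in zip(values, freqs)) - n * xbar ** 2) / (n - 1)
    return xbar, s2

# Example 4 (temperatures): class midpoints and absolute frequencies
midpoints = [15, 25, 35, 45, 55]
n_i = [3, 6, 5, 4, 2]
xbar, s2 = grouped_mean_var(midpoints, n_i)
print(xbar, round(s2, 4), round(math.sqrt(s2), 4))
```

By hand: $\sum x_i n_i = 660$, so $\bar{x} = 660/20 = 33$; $\sum x_i^2 n_i = 24700$, so $s^2 = (24700 - 20 \cdot 33^2)/19 = 2920/19$.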

To standardize variable x means to calculate

$\frac{x - \bar{x}}{s}$

If you apply this formula to all observations $x_1, \dots, x_n$ and call the transformed ones $z_1, \dots, z_n$, then the mean of the z's is zero and their standard deviation is one.

Standardization = finding the z-score
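A minimal sketch of standardization using the quasi-standard deviation s (as `statistics.stdev` computes it). It checks numerically that the z-scores have mean 0 and standard deviation 1, up to floating-point rounding. The sample is X from the earlier example.

```python
from statistics import mean, stdev

def standardize(values):
    """z-scores (x_i - xbar) / s, using the quasi-standard deviation s."""
    xbar = mean(values)
    s = stdev(values)
    return [(x - xbar) / s for x in values]

x = [11, 12, 13, 16, 16, 17, 18, 21]
z = standardize(x)
print([round(zi, 3) for zi in z])
print(round(mean(z), 10), round(stdev(z), 10))   # mean 0 and SD 1, up to rounding
```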

Empirical rule

If the data is bell-shaped (normal), that is, symmetric and with light tails, the following rule holds:
- approximately 68% of the data are in $(\bar{x} - 1s, \bar{x} + 1s)$
- approximately 95% of the data are in $(\bar{x} - 2s, \bar{x} + 2s)$
- approximately 99.7% of the data are in $(\bar{x} - 3s, \bar{x} + 3s)$

Note: This rule is also known as the 68-95-99.7 rule.

Example: We know that for a sample of 100 observations, the mean is 40 and the quasi-standard deviation is 5. Assuming that the data is bell-shaped, give the limits of an interval that captures 95% of the observations.

95% of the $x_i$'s are in $(\bar{x} \pm 2s) = (40 \pm 2(5)) = (30, 50)$
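The intervals of the empirical rule are $\bar{x} \pm k s$ for k = 1, 2, 3. The sketch below computes them for the example ($\bar{x} = 40$, $s = 5$). The coverage percentages are the rule's approximations for bell-shaped data, not guarantees for arbitrary samples.

```python
def empirical_interval(xbar, s, k):
    """Interval (xbar - k*s, xbar + k*s)."""
    return xbar - k * s, xbar + k * s

# Approximate coverage for bell-shaped data, per the 68-95-99.7 rule
coverage = {1: 0.68, 2: 0.95, 3: 0.997}
for k, share in coverage.items():
    lo, hi = empirical_interval(40, 5, k)
    print(f"about {share:.1%} of the observations in ({lo}, {hi})")
```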

Measure of variation: coefficient of variation (CV)

- Measures relative variation and is defined as $CV = \frac{s}{|\bar{x}|}$
- Is a unitless number (sometimes given as a percentage)
- Shows variation relative to the mean

Example: Stock A: average price last year = 50, standard deviation = 5
Stock B: average price last year = 100, standard deviation = 5

$CV_A = \frac{5}{50} = 0.10$, $\quad CV_B = \frac{5}{100} = 0.05$

Both stocks have the same SD, but stock B is less variable relative to its mean price.
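A short sketch of the coefficient of variation applied to the two stocks. The guard against a zero mean is an added assumption: the formula divides by $|\bar{x}|$, so it is undefined in that case.

```python
def coefficient_of_variation(s, xbar):
    """CV = s / |xbar|; unitless, undefined when the mean is 0."""
    if xbar == 0:
        raise ValueError("CV is undefined for a zero mean")
    return s / abs(xbar)

cv_a = coefficient_of_variation(5, 50)    # stock A
cv_b = coefficient_of_variation(5, 100)   # stock B
print(f"CV_A = {cv_a:.2f} ({cv_a:.0%}), CV_B = {cv_b:.2f} ({cv_b:.0%})")
# CV_A = 0.10 (10%), CV_B = 0.05 (5%)
```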
