Data Visualization

This document discusses univariate descriptive statistics and examples of various graphs and measures used to summarize quantitative data, including histograms, boxplots, dot plots, density plots, pie charts, bar graphs, measures of center such as mean, median and mode, and measures of spread such as standard deviation, interquartile range, and mean absolute deviation. Examples are provided to illustrate sea urchin size data, weather categories, personal income distributions, and comparing income distributions from different years.

Uploaded by

Chakradhar Nakka

Univariate Descriptive Statistics

Displays: pie charts, bar graphs, box plots, histograms, density estimates, dot plots, stem-and-leaf plots, tables, lists.

Example: sea urchin sizes

[Figure: four panels of urchin size (mm): boxplot; histogram (number of urchins vs. urchin size); dot plot; density estimate.]

Points:

1) Useful for quantitative variables.

2) Boxplot shows five-number summary: minimum, first quartile, median, third quartile, maximum.

3) Dot plot illegible with 250 data points. (One dot for each size, plotted on a line.)

4) Histogram and density plot serve similar purposes.

5) Density estimate extends below 0 mm, an impossible size: bad.

6) Histogram doesn't show the clustering that the density plot shows.

Example: Categorical: Weather in Central Park

[Figure: pie chart and bar graph of the weather categories clear, [Link], cloudy; bar graph counts run from 0 to 10.]

Pie chart harder to read.

General summary: Pie Charts are bad.

More useful with more categories.

Ordering of categories important for nominal variables.

Cloudiness is ordinal.

Pie charts: wedge has area proportional to # of individuals in category.

Bar chart: bar has height equal to # of individuals in category.

Density estimates not discussed in this course.

Histogram:

1) Divide range of values into intervals.

2) Count number of individuals in each interval.

3) Bar AREA is proportional to # of individuals in interval; width is length of interval.

4) Equal width bars best: then height is proportional to # of individuals.

5) Label x-axis; include units.

6) Label y-axis.

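The histogram steps can be sketched in Python. This is only an illustration under stated assumptions: the function name `density_histogram`, the interval edges and the data values are all made up for the example. It counts individuals per interval and sets each bar's height so that bar area (height times width) equals the fraction of individuals in that interval, which is what matters when the widths are unequal.

```python
def density_histogram(values, edges):
    """Return (counts, heights) for intervals [edges[i], edges[i+1]).

    The last interval also includes its right endpoint.
    Heights are scaled so that height * width = fraction of individuals.
    """
    counts = [0] * (len(edges) - 1)
    for v in values:
        for i in range(len(counts)):
            is_last = i == len(counts) - 1
            if edges[i] <= v < edges[i + 1] or (is_last and v == edges[-1]):
                counts[i] += 1
                break
    n = sum(counts)
    heights = [c / (n * (edges[i + 1] - edges[i])) for i, c in enumerate(counts)]
    return counts, heights


# Hypothetical measurements (made up for illustration, not from the slides).
values = [3, 7, 12, 14, 18, 22, 25, 31, 38, 45, 52, 67, 80, 95]
edges = [0, 10, 20, 40, 100]  # unequal widths: 10, 10, 20, 60

counts, heights = density_histogram(values, edges)
areas = [h * (edges[i + 1] - edges[i]) for i, h in enumerate(heights)]

for i, (c, h, a) in enumerate(zip(counts, heights, areas)):
    print(f"[{edges[i]}, {edges[i + 1]}): count={c}, height={h:.4f}, area={a:.3f}")
```

The widest interval holds the most individuals here but gets the shortest bar, because its count is spread over a width of 60.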
Example: Personal Income for BC (ages 15+). (For those with income.) Source: 2001 Census.

[Figure: histogram titled "Adult Personal Income (BC)"; x-axis: Income ($000s), 0 to 100; y-axis ticks from 0.00 to 0.03.]

Points

1) Bar widths unequal: census tables are given that way.

2) So take width times height to get area = fraction of population in that income group.

3) Last group on right is open ended; artificially cut off at $100,000 by me.

4) Plot is “long-tailed to the right” or “skewed to the right”.

5) Based on 20% sample of 1,523,720 people aged 15+ in BC on census day, 2001.

6) Income is for previous year: 2000.

Comparison of 1995, 2000.

[Figure: two stacked panels titled "1996 Income" and "2001 Income"; x-axes run from 0 to 100; y-axes: Density.]

Comparison of 2000, 2005.

[Figure: "BC Individual Income 2000 and 2005": overlaid densities for 2005 and 2000; x-axis labelled "BC Individual Income 2000", 0 to 100; y-axis: Density, 0.000 to 0.030.]

Summarizing the pictures.

Purposes: less space in text than a graph; precise numerical comparison between groups.

Summarizing a histogram:

Where is centre of the x-axis values? Jargon: location or centre.

How far do the x values extend on either side? Jargon: spread, variation, width.

Is the picture symmetric or does it extend farther to right than left?

Location and number of bumps.

Measures of location:

Mean, Arithmetic Mean, Average, Arithmetic Average: total of x-values divided by number of x values.

Histogram balances at mean. (First moment in physics.) Think of see-saw: small kid far from centre balances big kid close to centre.

Formula: data $X_1, \ldots, X_n$.

$$\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n}$$

Utility of summation notation in this course: NIL. But $\bar{X}$ is standard notation for average of X.

Median: number such that at least 1/2 of the X values are at least that large, and at least 1/2 of the X values are at least that small.

Sort list: if n is odd, median is middle of sorted list. If n is even, take average of two middle values.

Numerical examples: ages in my family:

50, 50, 20, 15, 8, 8.

$$\bar{A} = \frac{50 + 50 + 20 + 15 + 8 + 8}{6} = \frac{151}{6} \approx 25.2$$

Median age: middle numbers are 15, 20. Halfway between is median = 17.5.

Mode: most common value. Not a useful concept in most cases. Location of tallest bar in histogram (affected by definition of classes).

Mode of ages is not unique: 50 or 8. Not a useful summary of centre.
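The three location measures for the family ages can be checked with a short Python sketch. It uses the standard-library `statistics` module (`multimode` needs Python 3.8 or later); the variable names are chosen here for illustration.

```python
import statistics

ages = [50, 50, 20, 15, 8, 8]  # ages from the numerical example

mean_age = statistics.mean(ages)      # total 151 divided by 6
median_age = statistics.median(ages)  # n is even: average of middle values 15 and 20
modes = statistics.multimode(ages)    # every value tied for most common

print(round(mean_age, 1), median_age, modes)
```

`multimode` returns both tied values rather than picking one, which matches the point that the mode of the ages is not unique.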

Comparison:

Advantages of mean:

1) If your average weekly income is $100 you know how you will do in the long run; not so if median weekly income is $100.

2) Same point: average and sample size tell you the total.

3) Has simpler mathematical behaviour than median.

Advantages of median:

Not influenced by extreme members of list.

Median income, for instance, gives more information about typical person.
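A small sketch of what "not influenced by extreme members" means. It reuses the family ages and, purely as a hypothetical, replaces one 50 with 95. The mean moves; the median does not.

```python
import statistics

ages = [50, 50, 20, 15, 8, 8]
ages_extreme = [95, 50, 20, 15, 8, 8]  # hypothetical: one 50 replaced by 95

mean_shift = statistics.mean(ages_extreme) - statistics.mean(ages)
median_shift = statistics.median(ages_extreme) - statistics.median(ages)

print(f"mean moves by {mean_shift:.1f}, median moves by {median_shift:.1f}")
```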

Measures of spread:

Standard Deviation

Interquartile Range

Mean Absolute Deviation.

Deviations from the mean: subtract mean from each number in list: $X_i - \bar{X}$. For my family deviations are

24.8, 24.8, −5.2, −10.2, −17.2, −17.2.

Summarize size of deviations:

Average is 0. Not useful as measure of size since pluses cancel minuses.

Mean absolute deviation: take absolute values (ignore − signs) and average:

$$\frac{24.8 + 24.8 + 5.2 + 10.2 + 17.2 + 17.2}{6} = 16.6 \text{ years}$$

Standard deviation: square deviations, average, take square root:

$$s = \sqrt{\frac{(24.8)^2 + \cdots + (-17.2)^2}{5}} = 19.8 \text{ years.}$$

WARNING: notice the 5 not 6. This is traditional. Not important in large data sets.

Jargon: variance is $s^2$:

$$s^2 = \frac{(24.8)^2 + \cdots + (-17.2)^2}{5} = 390.6 \text{ years}^2$$
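The spread calculations for the family ages can be reproduced with a short Python sketch. Variable names are chosen here for illustration. The mean absolute deviation divides by n = 6, while the variance and standard deviation divide by n − 1 = 5, as in the warning about "5 not 6".

```python
import math
import statistics

ages = [50, 50, 20, 15, 8, 8]
n = len(ages)
mean_age = sum(ages) / n

# Deviations from the mean; their total is 0 up to rounding error.
deviations = [a - mean_age for a in ages]

mean_abs_dev = sum(abs(d) for d in deviations) / n   # divide by n
variance = sum(d * d for d in deviations) / (n - 1)  # divide by n - 1
sd = math.sqrt(variance)

print(round(mean_abs_dev, 1), round(variance, 1), round(sd, 1))

# Cross-check against the standard library, which also divides by n - 1.
assert math.isclose(sd, statistics.stdev(ages))
```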

Interquartile Range:

First define quartiles, quintiles, etc.

First, second and third quartiles split list into 4 equal pieces.

One quarter of list below first quartile, two quarters below second, three quarters below third.

Second quartile is median.

Interquartile range is third quartile minus first quartile.

Book gives method to find quartiles.

Quintiles split list into 5 equal parts.

Percentiles split list into 100 equal parts.
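A sketch of quartiles and the interquartile range for the family ages, using Python's `statistics.quantiles` (Python 3.8 or later). The book's method is not given in these notes, so the library's default interpolation rule is an assumption here and may give slightly different quartiles on other data; there are several conventions in use.

```python
import statistics

ages = [50, 50, 20, 15, 8, 8]

# n=4 asks for the three cut points that split the list into quarters.
q1, q2, q3 = statistics.quantiles(ages, n=4)
iqr = q3 - q1

# Minimum, quartiles and maximum together, the summary a boxplot draws.
five_number = (min(ages), q1, q2, q3, max(ages))

print(q1, q2, q3, iqr)
print(five_number)
```

Passing `n=5` or `n=100` to the same function gives quintile or percentile cut points.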

Comparison:

Advantages of IQR: like median, not influenced by extremes.

Easily related to proportions of population.

But: rather than use 2 number summary (median, IQR) typically use 3 number summary (quartiles) or 5 number summary (min, max, quartiles).

Boxplot is graph of 5 number summary.

Advantages of Mean Absolute Deviation.

Seems intuitive.

Less influenced by extremes than Standard Deviation.

But: poor mathematical properties.

We mostly use Standard Deviation.

Why the Standard Deviation?

Usual explanation: squares nicer mathematically than absolute values.

Real explanation (WARNING: personal view): ONLY the SD works in normal approximations for sums.

Normal approximations? A common summary for curves.

Rule of thumb: in many lists of data about 2/3 of the observations are within 1 SD of the mean, about 95% within 2 SDs of the mean and almost all within 3 SDs of the mean.
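The rule of thumb can be tried out on simulated data. The sketch below is an illustration only: the helper name `fraction_within` is made up, and the data are random draws from a normal curve rather than real measurements, so the fractions come out close to the rule by construction. Real lists can deviate from it, especially skewed ones.

```python
import random
import statistics


def fraction_within(data, k):
    """Fraction of observations within k standard deviations of the mean."""
    m = statistics.mean(data)
    s = statistics.stdev(data)
    return sum(abs(x - m) <= k * s for x in data) / len(data)


random.seed(0)  # fixed seed so the run is repeatable
simulated = [random.gauss(0, 1) for _ in range(10_000)]  # simulated, not real data

f1, f2, f3 = (fraction_within(simulated, k) for k in (1, 2, 3))
print(f"within 1 SD: {f1:.3f}, within 2 SDs: {f2:.3f}, within 3 SDs: {f3:.3f}")
```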

NEXT TOPIC: the normal curve. (bell curve, Gaussian)
