Exploratory Data Analysis Techniques

The document summarizes a presentation on exploratory data analysis techniques. It defines exploratory data analysis, compares it to confirmatory data analysis, and outlines various graphical and non-graphical exploratory data analysis techniques including stem-and-leaf plots, box plots, histograms, scatterplots, bar graphs, pie charts, and measures of central tendency, spread, and correlation. Examples of each technique are provided using hypothetical rainfall and population data.

Uploaded by

nagpala

Chocolate Cake Seminar

Series on Statistical Applications

Today's Talk:

Be an Explorer with Exploratory Data Analysis!
By David Ramirez

Outline of Presentation
Exploratory v. Confirmatory Data Analyses
Exploratory Data Analysis Techniques

Examples of Graphical Techniques

Examples of Non-graphical Techniques

What is Exploratory Data Analysis (EDA)?

John Tukey (1915-2000), American statistician
It is important to understand what
you CAN DO before you learn to
measure how WELL you seem to
have DONE it.

Definition
EDA consists of methods of discovering unanticipated
patterns and relationships in a data set, by summarizing
data quantitatively or presenting them visually.

Exploratory v. Confirmatory
Exploratory Data Analysis
Descriptive Statistics - Inductive Approach
Look for flexible ways to examine data without preconceptions
Heavy reliance on graphical displays
Let data suggest questions

Advantages
Flexible ways to generate hypotheses
Does not require more than data can support
Promotes deeper understanding of processes

Disadvantages
Usually does not provide definitive answers
Requires judgment - cannot be cookbooked

Exploratory v. Confirmatory

Confirmatory Data Analysis

Inferential Statistics - Deductive Approach

Hypothesis tests and formal confidence interval estimation

Hypotheses determined at outset
Heavy reliance on probability models
Look for definite answers to specific questions
Emphasis on numerical calculations

Advantages
Provide precise information in the right circumstances
Well-established theory and methods

Disadvantages
Misleading impression of precision in less than ideal circumstances
Analysis driven by preconceived ideas
Difficult to notice unexpected results

EDA Techniques
Graphical presentation of distribution

- Continuous variables (stem-and-leaf plot, box plot,
histogram, bivariate scatterplot)
- Categorical variables (bar graph, pie chart)

Non-graphical summary of distribution

- Continuous variables (mean, median, mode, variance,
standard deviation, range, correlation coefficient, linear
regression)
- Categorical variables (frequency table, cross-tabulation)

Stem-and-Leaf Plot
What is it?
A plot where each data value is split into a "leaf"
(usually the last digit) and a "stem" (the other digits).

Useful for describing distributions in terms of

-- Symmetry or skewness (right-skewed=long right tail or
left-skewed=long left tail)
-- Unimodality, bimodality or multimodality (one, two,
or more peaks)
-- Presence of outliers (a few very large or very small
observations)
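
To make the stem/leaf split concrete, here is a minimal Python sketch (not part of the original talk, which uses SPSS). It assumes non-negative integer data, uses the last digit as the leaf, and runs on made-up rainfall values; the function name stem_and_leaf is hypothetical.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Return (stem, leaves) rows for non-negative integers."""
    rows = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(v, 10)  # stem = leading digits, leaf = last digit
        rows[stem].append(leaf)
    # Keep empty stems so gaps in the distribution stay visible.
    return [
        (s, "".join(str(leaf) for leaf in rows.get(s, [])))
        for s in range(min(rows), max(rows) + 1)
    ]

# Hypothetical rainfall values, not the data used in the talk.
rain = [18, 25, 28, 30, 31, 31, 35, 36, 42, 43, 45, 52]
plot = stem_and_leaf(rain)
for stem, leaves in plot:
    print(f"{stem} | {leaves}")
```

Reading the printed rows from top to bottom gives the same impression of shape, peaks and stragglers that a histogram would, while every individual value stays recoverable.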

How To Create Stem-and-Leaf Plot

Syntax
EXAMINE VARIABLES=Rain
/PLOT BOXPLOT STEMLEAF.

By Mouse
Analyze -> Descriptive Statistics -> Explore -> Plots ->
Stem-and-leaf

Example: Stem-and-leaf Plot

We use SPSS to construct a stem-and-leaf plot for
rainfall in the US in metropolitan areas.
Frequency    Stem &  Leaf
     4.00 Extremes    (=<15)
     1.00        1 .  8
      .00        2 .
     2.00        2 .  58
    10.00        3 .  0001111234
    15.00        3 .  555556666677889
    16.00        4 .  0011222223333344
     7.00        4 .  5555566
     4.00        5 .  0234
     1.00 Extremes    (>=60)

Box Plot
What is it?
A way of graphically depicting groups of numerical data
through their five-number summaries: the smallest
observation (sample minimum), lower quartile (Q1),
median (Q2), upper quartile (Q3), and largest observation
(sample maximum). A box plot may also indicate which
observations, if any, might be considered outliers.

Useful in visualizing the following:

Location
Spread
Skewness
Outliers
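
The five-number summary behind a box plot can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the SPSS procedure: quartiles use the "inclusive" method of the standard library (SPSS may use a different quartile definition, so values can differ slightly), outliers are flagged with the common 1.5 x IQR convention, and the data are made up.

```python
import statistics

def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum)."""
    data = sorted(values)
    q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return data[0], q1, q2, q3, data[-1]

def outliers(values, k=1.5):
    """Flag points more than k * IQR outside the quartiles (assumed convention)."""
    _, q1, _, q3, _ = five_number_summary(values)
    iqr = q3 - q1
    return [v for v in values if v < q1 - k * iqr or v > q3 + k * iqr]

# Hypothetical rainfall values, not the data used in the talk.
rain = [10, 30, 32, 35, 36, 38, 40, 42, 45, 65]
summary = five_number_summary(rain)
flagged = outliers(rain)
print(summary)
print(flagged)
```

The box spans Q1 to Q3, the line inside it marks the median, and the flagged points are the ones a box plot would draw individually.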

How To Create Box Plot

Syntax
EXAMINE VARIABLES=Rain
/PLOT=BOXPLOT.

By mouse
Graphs -> Legacy Dialogs -> Boxplot -> Summaries of
separate variables -> Define -> Scale Variable -> Optional:
Label Cases by -> OK

Example: Box Plot

Using the previous data on precipitation, we
would like to understand the distribution of
the rain and check for any outliers.

Example: Multiple Box Plots

Side-by-side box plots below display the
population distribution of large cities in 1960.

How To Create Box Plots

Syntax
EXAMINE VARIABLES=Population BY Country
/PLOT=BOXPLOT
/ID=City.

By mouse
Graphs -> Legacy Dialogs -> Boxplot -> Summaries for
groups of cases -> Define -> Variable (scale) ->
Category Axis (the variable that defines the groups) ->
Label Cases by (IDs or names, optional)

Histogram
What is it?
A diagram consisting of rectangles whose area is
proportional to the frequency of a continuous variable
and whose width is equal to the class interval (bin).

Useful for describing distributions in terms of

-- Symmetry or skewness

-- Unimodality, bimodality or multimodality

-- Presence of outliers
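
As a rough illustration of how bins turn raw values into a histogram, the Python sketch below counts values per bin for a chosen bin width and prints a text histogram. The function name, the starting point of 0 and the population figures are assumptions made for the example, not taken from the talk.

```python
import math
from collections import Counter

def histogram_counts(values, bin_width, start=0.0):
    """Count how many values fall in each bin [start + k*w, start + (k+1)*w)."""
    index_counts = Counter(math.floor((v - start) / bin_width) for v in values)
    first, last = min(index_counts), max(index_counts)
    # Keep empty bins between the first and last so gaps stay visible.
    return {
        (start + k * bin_width, start + (k + 1) * bin_width): index_counts[k]
        for k in range(first, last + 1)
    }

# Hypothetical city populations in millions, not the data used in the talk.
population = [1.2, 1.8, 2.1, 2.5, 2.9, 3.4, 7.8]
hist = histogram_counts(population, bin_width=1)

for (low, high), count in hist.items():
    print(f"[{low:.0f}, {high:.0f})  {'#' * count}")
```

Changing bin_width changes the picture: wide bins can hide a second peak, while narrow bins can make noise look like structure, which is why the bin choice is worth experimenting with.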

How To Create Histogram

Automatically chosen Bins
Syntax
GRAPH
/HISTOGRAM(NORMAL)=Population.

By Mouse
Graphs -> Legacy Dialogs -> Histogram -> Variable (scale) -> OK

How To Create Histogram

User-selected number of bins
Syntax
GGRAPH
/GRAPHDATASET NAME="graphdataset" VARIABLES=Population MISSING=LISTWISE
REPORTMISSING=NO
/GRAPHSPEC SOURCE=INLINE.
BEGIN GPL
SOURCE: s=userSource(id("graphdataset"))
DATA: Population=col(source(s), name("Population"))
GUIDE: axis(dim(1), label("Population"))
GUIDE: axis(dim(2), label("Frequency"))
ELEMENT: interval(position(summary.count(bin.rect(Population, binCount(5)))),
shape.interior(shape.square))
END GPL.

By Mouse
Graphs -> Chart Builder -> Histogram -> drag the scale variable to the x-axis -> Set Parameters -> Custom -> Number of intervals -> Continue -> OK

How To Create Histogram

User-selected bin width
Syntax
* Chart Builder.
GGRAPH
/GRAPHDATASET NAME="graphdataset" VARIABLES=Population MISSING=LISTWISE
REPORTMISSING=NO
/GRAPHSPEC SOURCE=INLINE.
BEGIN GPL
SOURCE: s=userSource(id("graphdataset"))
DATA: Population=col(source(s), name("Population"))
GUIDE: axis(dim(1), label("Population"))
GUIDE: axis(dim(2), label("Frequency"))
ELEMENT: interval(position(summary.count(bin.rect(Population, binWidth(1)))),
shape.interior(shape.square))
END GPL.

By Mouse
Graphs -> Chart Builder -> Histogram -> drag the scale variable to the x-axis -> Set Parameters -> Custom -> Interval width -> Continue -> OK

Example: Histogram
A researcher may need to choose the bins by hand to get
a clearer picture of the distribution and to judge
what type of distribution the data follow.

Scatterplot
What is it?
A scatterplot is a plot of data points in the xy-plane
that displays the strength, direction and shape of
the relationship between two variables.

Used for
Analyzing relationships between two variables
Looking to see if there are any outliers in the data

How To Create Scatterplot

Syntax
GRAPH
/SCATTERPLOT(BIVAR)=Height WITH Wieght
/MISSING=LISTWISE.

By Mouse
Graphs -> Legacy Dialogs -> Scatter/Dot -> Simple
Scatter -> Define -> Y Axis (outcome) -> X Axis (predictor) ->
OK

Example: Scatterplot
Researchers wanted to see if there is a link
between Height and Weight.

Bar Graph
What is it?
-- A diagram consisting of rectangles whose area (height,
for bars of equal width) is proportional to the frequency
of each level of a categorical variable.
-- A bar graph is similar to a histogram but is used for
categorical variables.
Used for
-- comparison of frequencies for different levels

How To Create Bar Graph

Syntax
GRAPH
/BAR(SIMPLE)=COUNT BY Gender.
By Mouse
Graphs -> Legacy Dialogs -> Bar -> categorical
variable -> Category Axis -> OK

Example: Bar Graph

Experimenters wanted to make sure they had
a roughly equal number of males and females
in a study.

Pie chart
What is it?
A type of graph in which a circle is divided into
sectors, one for each level of a categorical
variable, with each sector showing the proportion of
observations at that level.

Used for
-- comparison of proportions for different levels

How To Create Pie Chart

Syntax
GRAPH
/PIE=COUNT BY Bindedage.

By Mouse
Graphs -> Legacy Dialogs -> Pie ->
Summaries for groups of cases -> Define ->
categorical variable -> Define Slices by -> OK

Example: Pie Chart

A researcher wants to partition the age
variable into a categorical variable in terms of
mental development (College Age, Older
Young Adult, Young Middle age, Middle
Middle Age and up).

Non-Graphical Techniques
Measures of Central Tendency
Central Tendency is the location of the middle
value
Mean=sum of all data values divided by the
number of values (arithmetic average).

Measures of Central Tendency

Median=the middle value after all the values are
put in an ordered list (50% observations lie below
and 50% above the median).
If there are two middle observations (an even number of values),
the median is the average of the two.

Mode=most likely or frequently occurring value.

Measures of Spread
Spread is how far observations lie from each
other.
-- Variance=average of the squared distances from
the mean (the sample variance reported by SPSS divides
by n-1 rather than n).

-- Standard deviation=square root of the variance.

-- Range=maximum-minimum.
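
The measures of central tendency and spread defined above can be checked by hand on a small data set. The Python sketch below uses the standard statistics module on made-up values; statistics.variance and statistics.stdev divide by n-1 (sample versions), which is an assumption worth keeping in mind when comparing against other software.

```python
import statistics

# Hypothetical values chosen so the arithmetic is easy to follow.
data = [2, 4, 4, 5, 7, 9, 11]

mean = statistics.mean(data)          # sum of values / number of values
median = statistics.median(data)      # middle value of the sorted list
mode = statistics.mode(data)          # most frequent value
variance = statistics.variance(data)  # sample variance (divides by n-1)
std_dev = statistics.stdev(data)      # square root of the sample variance
value_range = max(data) - min(data)   # maximum - minimum

print(mean, median, mode, variance, round(std_dev, 4), value_range)
```

Here the sum is 42 over 7 values, the squared distances from the mean add up to 60, and dividing by n-1 = 6 gives the variance.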

How to Compute Measures of Central Tendency and Spread
Syntax
FREQUENCIES VARIABLES=MORT
/STATISTICS=STDDEV VARIANCE RANGE MEAN MEDIAN MODE
/ORDER=ANALYSIS.

By Mouse
Analyze -> Descriptive Statistics -> Frequencies -> select
a scale variable -> click Statistics -> select Mean, Median,
Mode, Std. deviation, Variance and Range.

Example: Central Tendency and Spread

We use SPSS to figure out the Central
Tendency and Spread of the Mortality rates in
the 1960s.
Statistics
MORT
N        Valid               60
         Missing              0
Mean                   940.3650
Median                 943.7000
Mode                     790.70 a
Std. Deviation         62.20482
Variance               3869.439
Range                    322.30

Correlation Coefficient
What is it?
-- A numeric measure of linear relationship between two continuous
variables.

Properties of correlation coefficient:

-- Ranges between -1 and 1
-- The closer it is to -1 or 1, the stronger the linear relationship is
-- If r=0, the two variables are not linearly correlated
-- If r is positive, relationship is described as positive (larger values of one
variable tend to accompany larger values of the other variable)
-- If r is negative, relationship is described as negative (larger values of one
variable tend to accompany smaller values of the other variable)

Correlation

Slight warning:
The correlation coefficient measures only linear relationships;
two variables can be strongly related along a curve and still
have a correlation close to 0.
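
The warning can be demonstrated with a small Python sketch. It computes the Pearson correlation coefficient from its definition for two made-up data sets: one lying exactly on a line and one lying exactly on a parabola. The function name pearson_r and the data are illustrative assumptions.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation: co-variation of x and y scaled by their spreads."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

xs = [-2, -1, 0, 1, 2]
linear = [2 * x + 1 for x in xs]  # points exactly on a line
curved = [x ** 2 for x in xs]     # points exactly on a parabola

r_linear = pearson_r(xs, linear)
r_curved = pearson_r(xs, curved)
print(r_linear, r_curved)
```

The parabola is a perfect relationship, yet its correlation is zero because the rising and falling halves cancel out. A scatterplot reveals what the single number hides.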

Linear Regression
What is it?
-- Statistical technique of fitting a linear function to
data points in an attempt to describe a relationship
between two variables.

Used for
-- prediction
-- interpretation of coefficients (change in y for a
unit increase in x)
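
For a single predictor, the least-squares line has a closed form, sketched below in Python. This is an illustration on made-up data, not the SPSS REGRESSION procedure, and the function name fit_line is hypothetical.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx                    # change in y per unit increase in x
    intercept = mean_y - slope * mean_x  # the line passes through the means
    return intercept, slope

# Hypothetical data, not the data used in the talk.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
intercept, slope = fit_line(xs, ys)
print(f"y = {intercept:.1f} + {slope:.1f} * x")
```

The slope is the quantity interpreted as "change in y for a unit increase in x"; the intercept is the predicted y when x is 0, which may or may not be meaningful for the data at hand.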

How To Find Correlation and Fitted Regression Line
By Syntax
REGRESSION
/DESCRIPTIVES MEAN STDDEV CORR SIG N
/MISSING LISTWISE
/STATISTICS COEFF OUTS R ANOVA
/CRITERIA=PIN(.05) POUT(.10)
/NOORIGIN
/DEPENDENT Wieght
/METHOD=ENTER Height.

By mouse
Analyze -> Regression -> Linear -> move Y (the variable we
want to predict) to Dependent -> move X (the variable we are
using to predict Y) to Independent(s) -> OK

Example: Correlation
Referring to our weight and height scatterplot,
the researchers want to check how strongly related
these two variables are.
Correlations
                                 Wieght    Hieght
Pearson Correlation    Wieght     1.000      .717
                       Hieght      .717     1.000
Sig. (1-tailed)        Wieght         .      .000
                       Hieght      .000         .
N                      Wieght       507       507
                       Hieght       507       507

Example: Regression
Researchers want to create a linear model
using the height as an independent variable
(predictor) and weight as a dependent variable
(outcome or response).
The fitted line can be written as
Weight= -105.011+1.018 (Height)
Coefficients (dependent variable: Wieght)
                  Unstandardized Coefficients    Standardized Coefficients
Model             B            Std. Error        Beta                           t          Sig.
1  (Constant)     -105.011     7.539                                            -13.928    .000
   Hieght            1.018      .044             .717                            23.135    .000
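
As a worked use of the fitted line, the Python sketch below plugs a height into Weight = -105.011 + 1.018 (Height). The coefficients come from the output above; the input height of 170 is a made-up value, and the units are whatever units the data set uses (the slides do not state them).

```python
# Coefficients taken from the fitted line reported in the example.
INTERCEPT = -105.011
SLOPE = 1.018

def predict_weight(height):
    """Predicted weight for a given height, using the fitted line."""
    return INTERCEPT + SLOPE * height

# Hypothetical height, in the data set's own (unstated) units.
predicted = predict_weight(170)
print(round(predicted, 3))

# Each additional unit of height adds SLOPE units of predicted weight.
difference = predict_weight(171) - predict_weight(170)
print(round(difference, 3))
```

Predictions are only trustworthy for heights within the range observed in the data; the negative intercept shows that extrapolating toward a height of 0 gives a meaningless weight.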

Frequency Table
What is it?
-- A table that shows frequency (count) for each
level of a categorical variable.

Used for
-- comparison of frequencies for different levels
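
A frequency table is a count per level, usually shown with percentages and cumulative percentages. The Python sketch below builds one from made-up categorical data; the function name and the levels are illustrative assumptions.

```python
from collections import Counter

def frequency_table(levels):
    """Return rows of (level, count, percent, cumulative percent)."""
    counts = Counter(levels)  # keeps levels in order of first appearance
    total = sum(counts.values())
    rows, cumulative = [], 0.0
    for level, count in counts.items():
        percent = 100 * count / total
        cumulative += percent
        rows.append((level, count, round(percent, 1), round(cumulative, 1)))
    return rows

# Hypothetical education levels, not the data used in the talk.
education = ["9th"] * 3 + ["10th"] * 5 + ["11th"] * 2
table = frequency_table(education)
for level, count, percent, cumulative in table:
    print(f"{level:<6}{count:>4}{percent:>8}{cumulative:>8}")
```

The cumulative column only makes sense when the levels have a natural order, as grade levels do.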

How To Find Frequency Table

Syntax
FREQUENCIES VARIABLES=EDUbins
/ORDER=ANALYSIS.

By mouse
Analyze -> Descriptive Statistics -> Frequencies -> Variable
-> Display frequency tables -> OK

Example: Frequency Table

We want to know the frequencies of the different educational
levels in US metropolitan areas in the 1960s. We first have to
use visual binning to define the bins. Using the range, we create
bins for 9th, 10th, 11th, and 12th grade and up.
Syntax
* Visual Binning.
*EDU.
RECODE EDU (MISSING=COPY) (12 THRU HI=4) (11 THRU HI=3) (10 THRU HI=2) (LO THRU
HI=1) (ELSE=SYSMIS) INTO EDUbins.
VARIABLE LABELS EDUbins 'EDU (Binned)'.
FORMATS EDUbins (F5.0).
VALUE LABELS EDUbins 1 '9th Grade' 2 '10th Grade' 3 '11th Grade' 4 '12th grade and up'.
VARIABLE LEVEL EDUbins (ORDINAL).

By Mouse
Transform-> Visual Binning-> variable we want to create into an ordinal value->
okay-> Make cut point-> enter number of cutpoints, and width-> apply-> okay
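
The RECODE above relies on the first matching rule being applied: values of 12 or more get 4, remaining values of 11 or more get 3, and so on. A rough Python equivalent of the same cut points is sketched below; the function name edu_bin and the sample values are assumptions, while the labels are copied from the VALUE LABELS command.

```python
LABELS = {
    1: "9th Grade",
    2: "10th Grade",
    3: "11th Grade",
    4: "12th grade and up",
}

def edu_bin(edu):
    """Map years of education to an ordinal bin, checking the highest cut first."""
    if edu is None:  # missing values stay missing, as with MISSING=COPY
        return None
    if edu >= 12:
        return 4
    if edu >= 11:
        return 3
    if edu >= 10:
        return 2
    return 1

# Hypothetical values of EDU, not the data used in the talk.
sample = [9.5, 10.0, 11.3, 12.1, None]
bins = [edu_bin(v) for v in sample]
print([LABELS.get(b) for b in bins])
```

Because the checks run from the highest cut point down, each value lands in exactly one bin even though the ranges overlap.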

Example: Frequency Table

EDU (Binned)
                             Frequency   Percent   Valid Percent   Cumulative Percent
Valid   9th Grade                    9      15.0            15.0                 15.0
        10th Grade                          31.7                                 46.7
        11th Grade                          33.3                                 80.0
        12th grade and up                   20.0                                100.0
        Total                              100.0

Cross-tabulation
What is it?
A two-way table containing frequencies (counts)
for different levels of the column and row
variables.

Used for
Comparison of frequencies for different levels of
the variables (chi-squared test)
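
To show what the cross-tabulation and its chi-squared statistic involve, here is a minimal Python sketch on made-up data. The statistic compares each observed count with the count expected if rows and columns were independent (row total x column total / n). The function names and data are assumptions, and the p-value is omitted because the standard library has no chi-squared distribution.

```python
from collections import Counter

def crosstab(row_values, col_values):
    """Two-way table of counts, with levels sorted alphabetically."""
    counts = Counter(zip(row_values, col_values))
    row_levels = sorted(set(row_values))
    col_levels = sorted(set(col_values))
    table = [[counts[(r, c)] for c in col_levels] for r in row_levels]
    return row_levels, col_levels, table

def chi_squared(table):
    """Pearson chi-squared statistic and its degrees of freedom."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, df

# Hypothetical education level and region for 60 cases.
edu = ["low"] * 30 + ["high"] * 30
region = ["east"] * 10 + ["west"] * 20 + ["east"] * 20 + ["west"] * 10
rows, cols, table = crosstab(edu, region)
stat, df = chi_squared(table)
print(rows, cols, table, round(stat, 3), df)
```

A large statistic relative to its degrees of freedom indicates that the row distribution differs across columns; the statistic is then converted to a p-value using the chi-squared distribution.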

How To Find Cross-tabulation

Syntax:
CROSSTABS
/TABLES=EDUbins BY US
/FORMAT=AVALUE TABLES
/STATISTICS=CHISQ
/CELLS=COUNT
/COUNT ROUND CELL.

By Mouse
Analyze-> Descriptive Statistics-> Crosstabs-> select
variable for row-> select variable for column->
statistic-> Chi-Square-> continue-> Okay

Example: Cross-tabulation
Researchers wish to understand whether the
educational levels in the SMSA data were
distributed equally across the regions of the US.
The p-values are small (.002 for the Pearson
chi-square), so we conclude that the educational
levels differ among the regions of the US.
Chi-Square Tests
                                  Value       Asymp. Sig. (2-sided)
Pearson Chi-Square                26.078 a    .002
Likelihood Ratio                  25.377      .003
Linear-by-Linear Association       9.893      .002
N of Valid Cases

EDU (Binned) * US Crosstabulation
[Table of counts: rows are EDU (Binned) (9th Grade, 10th Grade, 11th Grade, 12th grade and up, Total); columns are US (1.00, 2.00, 3.00, 4.00, Total)]

Recommended Readings/Citations
Hartwig, F., & Dearing, B. E. (1979). Exploratory Data
Analysis. Beverly Hills: Sage Publications.
Hoaglin, D. C., Mosteller, F., & Tukey, J. W. (1983).
Understanding Robust and Exploratory Data Analysis. New
York: John Wiley & Sons Inc.
Pampel, F. C. (2004). Exploratory Data Analysis. In M. S.
Lewis-Beck, A. Bryman, & T. F. Liao, The SAGE
Encyclopedia of Social Science Research Methods (pp. 359-360). Thousand Oaks, California: Sage Publications.
Vogt, W. P. (1999). Exploratory Data Analysis. In W. P. Vogt,
Dictionary of Statistics & Methodology: A Nontechnical
Guide for the Social Sciences (pp. 104-105). Thousand Oaks,
California: SAGE Publications, Inc.
