Data Analysis 1

Uploaded by bradleymakaure


Data Analysis

The purpose of data analysis is to:

•Produce descriptive statistics to summarize the data.

•Create graphics that help to visualize the data.

•Use inferential statistics to distinguish between significant and non-significant effects.

•Create predictive models that can be used to predict future results within a given experimental domain.

Data Analysis
Descriptive statistics can be divided into two groups:
1. Measures of centrality.
2. Measures of variation (dispersion).

| Measure of centrality | Advantage | Disadvantage | Formula |
|---|---|---|---|
| Arithmetic mean | Can be used for inferential statistics | Sensitive to outliers | x̄ = Σxᵢ / n |
| Geometric mean | Damps the effect of outliers; useful for data that change multiplicatively (e.g. growth rates) | Cannot be used for inferential statistics | G = (Πxᵢ)^(1/n) |
| Harmonic mean | Damps the effect of outliers; useful for rates and ratios | Cannot be used for inferential statistics | H = n / Σ(1/xᵢ) |
| Median | Insensitive to outliers | Insensitive to the distribution of the data | Exact center of the distribution (middle value of the ordered data) |
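A minimal Python sketch (standard-library `statistics` module, Python 3.8 or later) of the four measures of centrality in the table above. The helper name `centrality_summary` is illustrative, and the sample values are hypothetical, chosen only to show how each measure responds when one value is replaced by an outlier.

```python
import statistics


def centrality_summary(data):
    """Return the four measures of centrality listed in the table."""
    return {
        "arithmetic_mean": statistics.mean(data),
        # Geometric and harmonic means are intended for positive values.
        "geometric_mean": statistics.geometric_mean(data),
        "harmonic_mean": statistics.harmonic_mean(data),
        "median": statistics.median(data),
    }


if __name__ == "__main__":
    # Hypothetical measurements; the second set replaces one value with an outlier.
    clean = [10.1, 10.2, 10.3, 10.4, 10.5]
    with_outlier = [10.1, 10.2, 10.3, 10.4, 25.0]
    for label, sample in (("clean", clean), ("with outlier", with_outlier)):
        summary = centrality_summary(sample)
        print(label, {name: round(value, 2) for name, value in summary.items()})
```

With the outlier, the arithmetic mean shifts the most, the geometric and harmonic means shift progressively less, and the median does not move, which matches the advantages and disadvantages listed in the table.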
Data Analysis

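| Measure of dispersion | Advantage | Disadvantage | Formula |
|---|---|---|---|
| Standard deviation | Very useful parameter; its properties are well known | Not additive | s = √[Σ(xᵢ − x̄)² / (n − 1)] |
| Variance | Useful parameter; variances of independent variables are additive | Squares the true dispersion (expressed in squared units) | s² |
| Relative STD | Useful when comparing dissimilar data sets | Cannot be used for statistical inference | RSD = s / x̄ (often × 100 %) |
| Standard error | Used when calculating uncertainties | Not additive | SE = s / √n |
| Range | Simple to calculate | Based on only two data points | R = x_max − x_min |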
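The measures of dispersion can be sketched the same way. This assumes sample statistics (divisor n − 1); the data values are hypothetical, and `dispersion_summary` is an illustrative helper name, not a library function.

```python
import math
import statistics


def dispersion_summary(data):
    """Return the measures of dispersion listed in the table (sample-based)."""
    n = len(data)
    mean = statistics.fmean(data)
    variance = statistics.variance(data)  # sample variance, divisor n - 1
    std = math.sqrt(variance)             # sample standard deviation
    return {
        "variance": variance,
        "std": std,
        "relative_std": std / mean,            # dimensionless; x 100 for percent
        "standard_error": std / math.sqrt(n),  # uncertainty of the mean
        "range": max(data) - min(data),        # uses only two data points
    }


if __name__ == "__main__":
    sample = [1.0, 2.0, 3.0, 4.0, 5.0]  # hypothetical data
    for name, value in dispersion_summary(sample).items():
        print(f"{name}: {value:.4f}")
```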
Types of Variables

| Type of variable | Definition | Examples |
|---|---|---|
| Continuous | Variable that can take any value between two specified limits | Mass, concentration, temperature |
| Nominal | Categorical variable with no inherent order | Type of catalyst, method of analysis; binary variable: pass/fail |
| Ordinal | Ordered categorical variable | Rating scale, diagnosis |

For many methods of data analysis, it is important to identify the independent variables (factors) and the dependent variable (response).
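One common way (an assumption here, not something the slides prescribe) to carry the three variable types into an analysis: continuous values stay numeric, nominal categories are one-hot encoded so that no order is implied, and ordinal categories are mapped to ranks that preserve their order. The field names, categories and rating scale below are hypothetical.

```python
# Hypothetical ordinal scale, listed from lowest to highest.
RATING_ORDER = ["poor", "fair", "good"]


def encode_records(records):
    """Encode dicts with 'temperature' (continuous), 'catalyst' (nominal)
    and 'rating' (ordinal) fields into purely numeric rows."""
    catalysts = sorted({r["catalyst"] for r in records})
    encoded = []
    for r in records:
        row = {"temperature": float(r["temperature"])}  # continuous: keep value
        for name in catalysts:                          # nominal: one column per level
            row[f"catalyst={name}"] = int(r["catalyst"] == name)
        row["rating_rank"] = RATING_ORDER.index(r["rating"])  # ordinal: rank
        encoded.append(row)
    return encoded


if __name__ == "__main__":
    data = [
        {"temperature": 21.5, "catalyst": "Pt", "rating": "fair"},
        {"temperature": 23.0, "catalyst": "Pd", "rating": "good"},
        {"temperature": 22.1, "catalyst": "Pt", "rating": "poor"},
    ]
    for row in encode_records(data):
        print(row)
```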
Exploratory Data Analysis (EDA)
EDA is used for the following purposes:
•To help the researcher formulate relevant hypotheses.
•To suggest the appropriate statistical tools for analyzing the data.

Many EDA techniques involve graphical displays of the data, such as:

•Histograms
•Box-and-whisker plots
•Pareto charts
•Stem-and-leaf plots
•Multi-vari charts
Exploratory Data Analysis (EDA)
Example

[Figure: histogram of Yield (g) (y-axis: no. of observations) and box plot of Yield (g)]
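The box plot draws a five-number summary and the histogram counts observations per bin. A sketch of both computations in Python 3.8+ follows; the yield values are hypothetical because the data behind the figure are not given. Quartile conventions differ between packages (this uses the `inclusive` method of `statistics.quantiles`), and many box-plot implementations draw whiskers at 1.5 × IQR rather than at the minimum and maximum.

```python
import statistics


def five_number_summary(data):
    """Minimum, Q1, median, Q3 and maximum: the values a box plot draws."""
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return min(data), q1, median, q3, max(data)


def histogram_counts(data, bin_edges):
    """Count observations per bin [edge_i, edge_i+1); the last bin is closed
    on the right. Values outside the edges are ignored."""
    counts = [0] * (len(bin_edges) - 1)
    for x in data:
        for i in range(len(counts)):
            lo, hi = bin_edges[i], bin_edges[i + 1]
            is_last = i == len(counts) - 1
            if lo <= x < hi or (is_last and x == hi):
                counts[i] += 1
                break
    return counts


if __name__ == "__main__":
    yields_g = [5.2, 6.1, 6.4, 6.8, 7.0, 7.3, 7.9, 8.4, 9.6, 11.8]  # hypothetical
    print("five-number summary:", five_number_summary(yields_g))
    print("histogram counts:", histogram_counts(yields_g, [4, 6, 8, 10, 12]))
```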
Exploratory Data Analysis (EDA)
Example 2: Box plots and correlation matrix of IQ and 4 test marks (2000 students)

[Figure: box-and-whisker plots of IQ, T1, T2, T3 and T4]

Correlation matrix:

|    | T1   | T2   | T3   | T4   |
|----|------|------|------|------|
| IQ | 0.51 | 0.82 | 0.02 | 0.52 |
| T1 |      | 0.42 | 0.03 | 0.60 |
| T2 |      |      | 0.04 | 0.55 |
| T3 |      |      |      | 0.02 |
Exploratory Data Analysis (EDA)

Other EDA techniques:

• Cluster analysis: collects "similar" variables into clusters.

• Principal component analysis: reduces the number of independent variables to the essential variables (a smaller set of principal components).

• Factor analysis: used to detect the relationships between variables.

• Discriminant analysis: used to detect the variables that discriminate between naturally occurring groups.

• Categorical data analysis: studies the relationships between nominal and ordinal variables.
Exploratory Data Analysis (EDA)
Example: Cluster Analysis

[Figure: cluster diagram of four tests (leaf order: Test 1, Test 4, Test 2, Test 3); x-axis: linkage distance]
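The linkage method and distance measure behind the cluster diagram are not stated. The sketch below assumes single linkage with Euclidean distance, one common choice, and records the merge distances that a dendrogram plots along its linkage-distance axis. In the example each test would be a vector of student scores; here the vectors are short and hypothetical, and the function names are illustrative.

```python
import math
from itertools import combinations


def _single_link_distance(a, b, points):
    """Distance between the closest members of clusters a and b."""
    return min(math.dist(points[p], points[q]) for p in a for q in b)


def single_linkage(points):
    """Agglomerative clustering with single linkage and Euclidean distance.

    points: dict mapping a label to a coordinate tuple.
    Returns the merge history as (members_a, members_b, linkage_distance),
    i.e. the information a cluster diagram (dendrogram) displays.
    """
    clusters = [frozenset([label]) for label in points]
    history = []
    while len(clusters) > 1:
        # Merge the two clusters whose closest members are nearest.
        a, b = min(
            combinations(clusters, 2),
            key=lambda pair: _single_link_distance(pair[0], pair[1], points),
        )
        history.append((sorted(a), sorted(b), _single_link_distance(a, b, points)))
        clusters.remove(a)
        clusters.remove(b)
        clusters.append(a | b)
    return history


if __name__ == "__main__":
    # Hypothetical: each test described by a few summary scores.
    tests = {
        "Test 1": (62.0, 70.0, 55.0),
        "Test 2": (48.0, 52.0, 60.0),
        "Test 3": (90.0, 20.0, 85.0),
        "Test 4": (60.0, 72.0, 58.0),
    }
    for a, b, d in single_linkage(tests):
        print(f"merge {a} + {b} at distance {d:.1f}")
```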
Exploratory Data Analysis (EDA)
Example: Categorical Data Analysis

Contingency table (rows: treatment; columns: diagnosis; cell values are patient counts):

| Treatment | No improvement | Little improvement | Good improvement |
|---|---|---|---|
| A | 12 | 25 | 30 |
| B | 4 | 7 | 8 |
| C | 34 | 35 | 36 |

From this contingency table, a chi-squared test can determine whether there is a significant difference between the treatments.
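A minimal sketch of the chi-squared test of independence on the slide's patient counts, in plain Python. The p-value uses the closed-form chi-squared tail probability, which is exact only for an even number of degrees of freedom (here (3 − 1)(3 − 1) = 4); for general use a library routine such as `scipy.stats.chi2_contingency` would be the usual choice. The function names are illustrative.

```python
import math


def chi_squared_test(table):
    """Pearson chi-squared test of independence for a table of counts.

    Returns (statistic, degrees_of_freedom, expected_counts).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand = sum(row_totals)
    # Expected count under independence: row total * column total / grand total.
    expected = [[r * c / grand for c in col_totals] for r in row_totals]
    statistic = sum(
        (obs - exp) ** 2 / exp
        for obs_row, exp_row in zip(table, expected)
        for obs, exp in zip(obs_row, exp_row)
    )
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, dof, expected


def chi2_sf_even_dof(x, dof):
    """Upper-tail probability P(X > x), exact for even dof:
    exp(-x/2) * sum_{k=0}^{dof/2 - 1} (x/2)^k / k!"""
    if dof % 2:
        raise ValueError("closed form only valid for even degrees of freedom")
    half = x / 2
    return math.exp(-half) * sum(half ** k / math.factorial(k) for k in range(dof // 2))


if __name__ == "__main__":
    # Rows: treatments A, B, C; columns: no, little, good improvement.
    counts = [[12, 25, 30], [4, 7, 8], [34, 35, 36]]
    stat, dof, expected = chi_squared_test(counts)
    print(f"chi2 = {stat:.3f}, dof = {dof}, p = {chi2_sf_even_dof(stat, dof):.3f}")
```

By this calculation the counts give χ² ≈ 4.91 with 4 degrees of freedom and p ≈ 0.30, so these data show no significant difference between the treatments at the 0.05 level. One expected count (treatment B, no improvement, about 4.97) is just under 5, the usual rule-of-thumb minimum, so the approximation should be read with some caution.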
Statistical Inference
Estimating the parameters of a population from the statistics of a representative sample.

Examples:

| Statistic (from sample) | Parameter (population) |
|---|---|
| Sample mean: x̄ | Population mean: μ |
| Sample STD: s | Population STD: σ |
| Sample proportion: p | Population proportion: ρ |

Statistical Inference
The following statement always applies:

Measurement = Parameter ± Experimental error

• Parameters can only be estimated within a calculated uncertainty.

• Whenever an estimated parameter is reported, its associated uncertainty must be reported as well.

• How the uncertainty is calculated depends on the distribution of the data.

• The uncertainty can be visualized using error bars.
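To make "Measurement = Parameter ± Experimental error" concrete, here is a sketch that reports a sample mean and a sample proportion together with a 95 % uncertainty. It assumes the normal approximation (z quantile from `statistics.NormalDist`), which suits large samples; for small samples Student's t quantile would normally replace z, and the Wald interval used for the proportion is only one of several options. The helper names and all data values are hypothetical.

```python
import math
import statistics
from statistics import NormalDist


def mean_with_uncertainty(data, confidence=0.95):
    """Return (x_bar, half_width) so the estimate reads x_bar +/- half_width.

    Normal approximation: half_width = z * s / sqrt(n).
    """
    n = len(data)
    mean = statistics.fmean(data)
    standard_error = statistics.stdev(data) / math.sqrt(n)
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return mean, z * standard_error


def proportion_with_uncertainty(successes, n, confidence=0.95):
    """Return (p, half_width) using the normal (Wald) approximation."""
    p = successes / n
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return p, z * math.sqrt(p * (1 - p) / n)


if __name__ == "__main__":
    yields_g = [10.2, 9.8, 10.5, 10.1, 9.9, 10.4, 10.0, 10.3]  # hypothetical
    m, hw = mean_with_uncertainty(yields_g)
    print(f"mean yield = {m:.2f} ± {hw:.2f} g (95 %)")
    p, hw_p = proportion_with_uncertainty(42, 60)  # hypothetical pass count
    print(f"pass rate = {p:.2f} ± {hw_p:.2f}")
```

The half-widths returned here are what error bars would show.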

Statistical Inference
| Analysis wanted | Methods available |
|---|---|
| Compare 2 independent samples | t-test for normal data; Mann-Whitney test for non-normal data |
| Compare 2 related samples | Paired t-test |
| Compare n (n > 2) independent samples | ANOVA for normal data; Kruskal-Wallis test for non-normal data |
| Compare trends | Regression with indicator variables |
| Detect the effects of factors on a response | Multiple regression |
| Find the levels of the factors at which maximum and/or minimum responses are achieved | Response surface modeling |

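Definition: A significant effect is an effect that is not caused by experimental error alone.

Whether an effect is significant is decided using p-values: the effect is declared significant when its p-value falls below a chosen significance level (commonly 0.05).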
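A sketch of the test statistics for the first two rows of the methods table: Welch's form of the two-sample t statistic (a variant that does not assume equal variances; the classical pooled Student's t is the alternative) and the paired t statistic. Turning t into a p-value needs the t distribution, e.g. from tables or `scipy.stats.ttest_ind` / `scipy.stats.ttest_rel`; that step is left out to keep the sketch dependency-free. The function names and data are illustrative.

```python
import math
import statistics


def welch_t(sample_a, sample_b):
    """t statistic and Welch-Satterthwaite degrees of freedom for two
    independent samples (does not assume equal variances)."""
    na, nb = len(sample_a), len(sample_b)
    va = statistics.variance(sample_a) / na  # squared standard error of mean a
    vb = statistics.variance(sample_b) / nb  # squared standard error of mean b
    t = (statistics.fmean(sample_a) - statistics.fmean(sample_b)) / math.sqrt(va + vb)
    dof = (va + vb) ** 2 / (va ** 2 / (na - 1) + vb ** 2 / (nb - 1))
    return t, dof


def paired_t(before, after):
    """t statistic and degrees of freedom for two related (paired) samples."""
    diffs = [a - b for a, b in zip(after, before)]
    n = len(diffs)
    t = statistics.fmean(diffs) / (statistics.stdev(diffs) / math.sqrt(n))
    return t, n - 1


if __name__ == "__main__":
    method_1 = [9.8, 10.1, 10.0, 9.9, 10.2]   # hypothetical results
    method_2 = [10.4, 10.6, 10.3, 10.5, 10.7]
    print("Welch t, dof:", welch_t(method_1, method_2))
    print("paired t, dof:", paired_t(method_1, method_2))
```

The p-value obtained from t and its degrees of freedom is then compared with the chosen significance level.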
