The Data Analyst's Guide to
Data Types, Distributions, and Statistical
Tests.
ANDREW MADSON
DATA TYPES
WHY IT MATTERS
1. Appropriate Analysis: Different types of data require
different statistical tests. For example, nominal data can be
analyzed using a Chi-square test, while interval data can be
analyzed using a t-test or ANOVA. Using the wrong test can
lead to incorrect conclusions.
2. Data Visualization: Your data type determines the best
way to visualize it. For instance, categorical data might be
best represented in a bar chart, while continuous data
might be better suited for a histogram or scatter plot.
3. Data Transformation: Understanding your data type can
guide you in transforming your data, if necessary. For
example, ordinal data might be converted into interval data
under certain conditions, or continuous data might be
categorized into ordinal data.
4. Data Quality: Knowing your data type can help you identify potential errors or inconsistencies in your data. For instance, if you expect a variable to be continuous and find string values, this could indicate a data quality issue (a quick pandas check is sketched after this list).
5. Interpretation of Results: The type of data you have influences how you interpret your results. For example, if you have ordinal data, you can make statements about the order of values but not the difference between values.
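For point 4, a minimal pandas sketch of this kind of check; the column name and values are made up purely for illustration:

```python
# Minimal sketch: flagging unexpected strings in a column that should be continuous.
import pandas as pd

df = pd.DataFrame({"height_cm": ["172.4", "165.0", "n/a", "181.2"]})

print(df.dtypes)  # height_cm shows up as object (strings), not a numeric dtype

# Coerce to numeric; values that cannot be parsed become NaN,
# which surfaces the data quality issue described above.
df["height_cm"] = pd.to_numeric(df["height_cm"], errors="coerce")
print(df["height_cm"].isna().sum(), "value(s) could not be parsed")
```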
QUANTITATIVE
Numerical data that can be measured
or counted and can be represented
numerically, such as height, weight,
or temperature.
QUALITATIVE
Non-numerical data that consists of
descriptive information, such as
colors, tastes, textures, or any other
characteristics that cannot be
counted or measured.
QUANTITATIVE DATA TYPES
DISCRETE: Distinct and separate values with no intermediate values in between.
CONTINUOUS: Infinitely divisible and can take on any value within a certain range or interval. Encompasses both INTERVAL and RATIO data.
INTERVAL: Continuous data type; numerical data where the intervals between values are equal but no true zero point exists.
RATIO: Continuous data type; numerical data with a true zero point, allowing for meaningful ratios and comparisons between values.
QUALITATIVE DATA TYPES
CATEGORICAL: Distinct categories or groups with no inherent order or numerical significance.
ORDINAL: Data with a natural order or ranking among its categories, indicating relative differences or preferences.
BINARY: Categorical data that has only two possible outcomes or categories.
DISTRIBUTIONS
DISTRIBUTION TYPES
WHY IT MATTERS
1. Understanding the data: Understanding the distribution of
your data gives insight into the nature and behavior of the
variables you are studying. It helps you identify your data's
patterns, trends, and potential outliers.
2. Statistical assumptions: Many statistical tests and models
make assumptions about the distribution of the data. For
example, the t-test assumes that the data follows a normal
distribution. If these assumptions are violated, it can lead to
incorrect conclusions. Knowing the distribution of your data
helps you choose the appropriate statistical methods.
3. Predictive modeling: When building predictive models, the
distribution of the data can inform the selection of
algorithms or the model's configuration. Some machine
learning algorithms are more suited to certain types of
distributions.
4. Data transformation: If your data does not follow the distribution required by a particular statistical method, you may need to transform it. For example, if your data is skewed, you might apply a logarithmic transformation to make it more symmetrical. Understanding the distribution can guide these transformations (a minimal sketch follows this list).
5. Risk management: In fields like finance and insurance,
understanding data distribution is crucial for risk
assessment. For example, the distribution of returns on
investment can help determine the probability of a
significant loss.
6. Data quality: Examining data distribution can also be a way
to check data quality. If the data doesn't follow expected
distributions, it may indicate errors or bias in the data
collection process.
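For point 4 above, a minimal sketch of checking skewness and applying a log transform; the data are simulated purely for illustration:

```python
# Minimal sketch: measure skewness, then log-transform a right-skewed sample.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
x = rng.lognormal(mean=0.0, sigma=1.0, size=1_000)  # strictly positive, right-skewed

print("skewness before:", stats.skew(x))
x_log = np.log(x)                                   # logarithmic transformation
print("skewness after: ", stats.skew(x_log))        # closer to 0, i.e. more symmetric
```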
PARAMETRIC
Assume that the data follows a
certain specific distribution pattern,
and the parameters of that
distribution are estimated from the
data.
NON-PARAMETRIC
Do not assume that the data follow
any specific distribution. They are
defined without the assumption of
underlying parameters.
PARAMETRIC DISTRIBUTIONS
NORMAL: Symmetric around the mean, showing that data near the mean are more frequent in occurrence than data far from the mean.
WEIBULL: Continuous probability distribution that models the time it takes for an event to occur; commonly used in reliability and survival analysis.
POISSON: Discrete probability distribution that models the number of events occurring in a fixed interval of time or space.
EXPONENTIAL: Continuous probability distribution that models the time between events in a Poisson process, where events occur independently and at a constant average rate.
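A minimal sketch of working with these parametric distributions in scipy.stats; the parameter values are illustrative assumptions, not recommendations:

```python
# Minimal sketch: sample from and fit the parametric distributions above.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

normal_sample = stats.norm.rvs(loc=50, scale=5, size=500, random_state=rng)
weibull_sample = stats.weibull_min.rvs(c=1.5, scale=10, size=500, random_state=rng)
poisson_sample = stats.poisson.rvs(mu=3, size=500, random_state=rng)
expon_sample = stats.expon.rvs(scale=2.0, size=500, random_state=rng)

# Estimate the normal distribution's parameters from the observed sample.
mu_hat, sigma_hat = stats.norm.fit(normal_sample)
print(f"fitted normal: mean={mu_hat:.2f}, sd={sigma_hat:.2f}")
```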
NON-PARAMETRIC DISTRIBUTIONS
UNIFORM: Probability distribution where all outcomes or values within a given range have an equal probability of occurring.
EMPIRICAL: Based on observed data rather than being derived from a known mathematical formula.
BERNOULLI: Discrete probability distribution representing a random experiment with only two possible outcomes, typically denoted as success (1) or failure (0), each with a fixed probability.
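A minimal sketch of uniform and Bernoulli samples plus an empirical CDF built directly from observed data; all values are illustrative:

```python
# Minimal sketch: uniform and Bernoulli samples, and an empirical CDF.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

uniform_sample = stats.uniform.rvs(loc=0, scale=10, size=500, random_state=rng)
bernoulli_sample = stats.bernoulli.rvs(p=0.3, size=500, random_state=rng)

# Empirical CDF: at each sorted data point, the fraction of observations <= it.
data = np.sort(uniform_sample)
ecdf_y = np.arange(1, len(data) + 1) / len(data)

print("P(X <= 5) estimated from the ECDF:", ecdf_y[data <= 5][-1])
print("share of Bernoulli successes:", bernoulli_sample.mean())
```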
STATISTICAL TESTS
T-TEST
PURPOSE: Compares the means of two groups
WHEN TO USE IT: To compare two related groups
DISTRIBUTION: Normal
DATA TYPE: Continuous
WHAT IT SHOWS: Whether there is a significant difference between group means
T-TEST OUTPUT
Test Statistic: The t-value, calculated from the difference in means between the two groups and the variability within the groups.
Degrees of Freedom: The number of independent pieces of information available to estimate the population parameter.
p-value: Probability of obtaining the observed difference (or a more extreme difference) between the groups by chance alone, assuming that the null hypothesis is true (i.e., there is no difference between the groups).
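A minimal sketch of running a t-test with scipy.stats; the before/after scores are made-up illustrative data. ttest_rel handles two related (paired) groups, while ttest_ind would be the choice for two independent groups:

```python
# Minimal sketch: paired t-test on illustrative before/after scores.
from scipy import stats

before = [72, 75, 68, 80, 77, 74, 69, 81]
after = [75, 78, 70, 83, 80, 75, 72, 84]

t_stat, p_value = stats.ttest_rel(before, after)
print(f"t = {t_stat:.3f}, p = {p_value:.4f}")
# A small p-value (e.g., < 0.05) suggests a significant difference in means.
```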
CHI-SQUARE
PURPOSE: Test for association between variables
WHEN TO USE IT: Assess the relationship between categorical variables
DISTRIBUTION: No strict distribution requirement
DATA TYPE: Categorical
WHAT IT SHOWS: Whether there are significant differences between observed and expected values
CHI-SQUARE OUTPUT
Chi-Square Value: Measures the discrepancy between the observed and expected frequencies.
Degrees of Freedom: The number of categories minus 1 for a goodness-of-fit test; (rows - 1) x (columns - 1) for a test of association between two categorical variables.
p-value: The probability associated with the test statistic. It indicates the level of statistical significance.
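A minimal sketch of a chi-square test of association with scipy.stats; the contingency table counts are made up for illustration:

```python
# Minimal sketch: chi-square test of association on a 2x2 contingency table.
from scipy.stats import chi2_contingency

observed = [[30, 10],   # e.g., group A: outcome yes / no
            [20, 25]]   # e.g., group B: outcome yes / no

chi2, p_value, dof, expected = chi2_contingency(observed)
print(f"chi2 = {chi2:.3f}, dof = {dof}, p = {p_value:.4f}")
# 'expected' holds the frequencies implied by the null hypothesis of no association.
```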
ANOVA
PURPOSE: Compare means of multiple groups
WHEN TO USE IT: Three or more groups
DISTRIBUTION: Normally distributed
DATA TYPE: Numerical
WHAT IT SHOWS: Significant differences between group means
ANOVA OUTPUT
Between Groups: Information about the variation between the different groups being compared.
Within Groups: Information about the variation within each group.
Total: Overall sum of squares and degrees of freedom for the entire dataset, combining the between- and within-group variations.
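A minimal sketch of a one-way ANOVA with scipy.stats; the three groups of measurements are made up for illustration:

```python
# Minimal sketch: one-way ANOVA across three illustrative groups.
from scipy import stats

group_a = [23, 25, 27, 22, 26]
group_b = [30, 29, 31, 32, 28]
group_c = [24, 26, 25, 27, 23]

f_stat, p_value = stats.f_oneway(group_a, group_b, group_c)
print(f"F = {f_stat:.3f}, p = {p_value:.4f}")
# A small p-value suggests at least one group mean differs from the others.
```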
REGRESSION
PURPOSE: Examine relationships between variables
WHEN TO USE IT: Predict the value of a dependent variable
DISTRIBUTION: No strict distribution requirement
DATA TYPE: Numerical
WHAT IT SHOWS: The strength and significance of relationships
REGRESSION OUTPUT
Regression Equation: Y = 12.345 + 0.987 * X_Variable
Coefficients: The intercept (12.345) represents the estimated value of the dependent variable when the independent variable (X_Variable) is zero. The coefficient for X_Variable (0.987) represents the estimated change in the dependent variable for a one-unit increase in X_Variable.
R-Square: Proportion of the variance in the dependent variable that is explained by the independent variables.
p-value: Statistical significance of a coefficient.
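A minimal sketch of a simple one-predictor regression with scipy.stats.linregress, mirroring the equation form above; the x/y values are made up for illustration:

```python
# Minimal sketch: simple linear regression with one predictor.
from scipy import stats

x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [13.2, 14.4, 15.1, 16.3, 17.2, 18.0, 19.3, 20.1]

result = stats.linregress(x, y)
print(f"Y = {result.intercept:.3f} + {result.slope:.3f} * X")
print(f"R-squared = {result.rvalue**2:.3f}, p = {result.pvalue:.4f}")
```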
Mann-Whitney U Test
PURPOSE: Compare distributions of two groups
WHEN TO USE IT: Compare distributions of two independent groups
DISTRIBUTION: No strict distribution requirement
DATA TYPE: Numerical/Ordinal
WHAT IT SHOWS: Significant differences in rank order
MANN-WHITNEY OUTPUT
U Statistic: Rank-based test statistic used in the Mann-Whitney U test. It quantifies the degree of difference between the two groups.
p-value: Statistical significance of the test. It indicates the probability of obtaining the observed difference between the groups if there were no true differences in the populations from which the samples were drawn.
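A minimal sketch of a Mann-Whitney U test with scipy.stats; the two groups of scores are made up for illustration:

```python
# Minimal sketch: Mann-Whitney U test on two independent illustrative groups.
from scipy.stats import mannwhitneyu

group_a = [3, 4, 2, 5, 4, 3, 5, 4]
group_b = [1, 2, 2, 3, 1, 2, 3, 2]

u_stat, p_value = mannwhitneyu(group_a, group_b, alternative="two-sided")
print(f"U = {u_stat:.1f}, p = {p_value:.4f}")
```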
Kruskal-Wallis
PURPOSE: Compare distributions of multiple groups
WHEN TO USE IT: Compare distributions of three or more independent groups
DISTRIBUTION: No strict distribution requirement
DATA TYPE: Numerical/Ordinal
WHAT IT SHOWS: Significant differences in rank order
Kruskal-Wallis Output
H Statistic: Computed from the sums of ranks across all groups; used to assess the differences between the groups.
Degrees of Freedom: Number of groups minus 1.
p-value: Strength of evidence against the null hypothesis (the assumption that there are no differences between the groups).
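A minimal sketch of a Kruskal-Wallis test with scipy.stats; the three groups of ratings are made up for illustration:

```python
# Minimal sketch: Kruskal-Wallis test across three independent illustrative groups.
from scipy.stats import kruskal

group_a = [7, 8, 6, 9, 7]
group_b = [5, 4, 6, 5, 4]
group_c = [8, 9, 9, 7, 8]

h_stat, p_value = kruskal(group_a, group_b, group_c)
print(f"H = {h_stat:.3f}, p = {p_value:.4f}")
# Degrees of freedom = number of groups - 1 = 2 here.
```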
Pearson's Correlation
PURPOSE: Measure the strength of a linear relationship
WHEN TO USE IT: Assess the strength and direction of a linear relationship
DISTRIBUTION: Normally distributed
DATA TYPE: Numerical
WHAT IT SHOWS: Correlation coefficient and its significance
Pearson's Correlation Output
Correlation Coefficient (r): Strength and direction of the linear relationship between the variables. It ranges from -1 to +1. A positive value indicates a positive correlation, a negative value indicates a negative correlation, and a value close to zero indicates a weak or no correlation.
p-value: Probability of observing the given correlation coefficient by chance.
Sample Size (n): Number of data points used to calculate the correlation coefficient.
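A minimal sketch of Pearson's correlation with scipy.stats; the x/y values are made up for illustration:

```python
# Minimal sketch: Pearson's correlation between two illustrative numerical variables.
from scipy.stats import pearsonr

x = [1.2, 2.4, 3.1, 4.8, 5.0, 6.3, 7.1, 8.5]
y = [2.0, 2.9, 3.8, 5.1, 5.4, 6.8, 7.0, 8.9]

r, p_value = pearsonr(x, y)
print(f"r = {r:.3f}, p = {p_value:.4f}, n = {len(x)}")
```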
Spearman's Correlation
PURPOSE: Measure the strength of a monotonic relationship
WHEN TO USE IT: Assess the strength and direction of a monotonic relationship
DISTRIBUTION: No strict distribution requirement
DATA TYPE: Numerical/Ordinal
WHAT IT SHOWS: Correlation coefficient and its significance
Spearman's Correlation Output
Correlation Coefficient (r): Strength and direction of the monotonic relationship between the variables. It ranges from -1 to +1. A positive value indicates a positive correlation, a negative value indicates a negative correlation, and a value close to zero indicates a weak or no correlation.
p-value: Probability of observing the given correlation coefficient by chance.
Sample Size (n): Number of data points used to calculate the correlation coefficient.
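A minimal sketch of Spearman's rank correlation with scipy.stats; the values are made up for illustration:

```python
# Minimal sketch: Spearman's rank correlation on illustrative ordinal-style data.
from scipy.stats import spearmanr

x = [1, 2, 3, 4, 5, 6, 7, 8]   # e.g., rank of study hours
y = [2, 1, 4, 3, 6, 5, 8, 7]   # e.g., rank of exam score

rho, p_value = spearmanr(x, y)
print(f"rho = {rho:.3f}, p = {p_value:.4f}, n = {len(x)}")
```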
One-Sample T-Test
PURPOSE: Compare a sample mean to a known population mean
WHEN TO USE IT: Compare a sample mean to a known value
DISTRIBUTION: Normally distributed
DATA TYPE: Numerical
WHAT IT SHOWS: Significant differences between the sample mean and the known population mean
One-Sample T-Test Output
t-statistic: Difference between the sample mean and the hypothesized population mean in terms of standard errors.
p-value: Probability of obtaining the observed difference (or a more extreme difference) between the sample and the hypothesized population by chance alone.
Sample Size (n): Number of data points used to calculate the test statistic.
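A minimal sketch of a one-sample t-test with scipy.stats; the sample values and the hypothesized population mean of 100 are made up for illustration:

```python
# Minimal sketch: one-sample t-test against a hypothesized population mean.
from scipy.stats import ttest_1samp

sample = [102, 98, 105, 101, 99, 103, 97, 104]

t_stat, p_value = ttest_1samp(sample, popmean=100)
print(f"t = {t_stat:.3f}, p = {p_value:.4f}, n = {len(sample)}")
```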
Wilcoxon Signed-Rank
PURPOSE: Compare paired samples
WHEN TO USE IT: Compare paired observations
DISTRIBUTION: No strict distribution requirement
DATA TYPE: Numerical/Ordinal
WHAT IT SHOWS: Significant differences between paired observations
Wilcoxon Signed-Rank Output
V (test statistic): Sum of the ranks of the positive differences; used to assess the statistical significance of the test.
p-value: Statistical significance of the test.
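A minimal sketch of a Wilcoxon signed-rank test with scipy.stats; the paired observations are made up for illustration, and note that scipy reports its own test statistic, which may not match the V value printed by R:

```python
# Minimal sketch: Wilcoxon signed-rank test on illustrative paired observations.
from scipy.stats import wilcoxon

before = [12, 15, 11, 18, 14, 13, 16, 17]
after = [14, 16, 12, 20, 15, 15, 18, 19]

stat, p_value = wilcoxon(before, after)
print(f"statistic = {stat:.1f}, p = {p_value:.4f}")
```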
HOORAY!
🥳
Save this post, and tag me
as you develop these data
analytics core skills.
HAPPY LEARNING!
🙌