Exploratory Data Analysis2

Uploaded by

chunk2learning
Exploratory Data Analysis


By Radha Srinivasan
December 28, 2022 | Data Analytics, Data Visualization, Technical | 5 mins read

Overview
Techniques and tools
1. Visualization methods
2. Non Visualization methods
Usage
Conclusion


“Far better an approximate answer to the right question, which is often vague, than an exact answer to the wrong question, which can always be made precise.” - John Tukey

Definition

Exploratory Data Analysis is the process of performing initial investigations on data to discover trends and patterns, spot anomalies, test hypotheses and check assumptions with the help of statistical summaries and graphical representations.


A smart strategy is to first comprehend the data and then try to extract as many insights from it as you can. It is all about making sense of the data in hand before digging deep into it.


Overview
EDA is used before modeling to see what the data can tell us. It is not easy to look at a whole spreadsheet and determine the important characteristics of the data, and it can be overwhelming to derive insights by looking at plain numbers.

EDA is not a formal process with a predefined set of rules; it is a state of mind about how a data analysis should be carried out. It is a philosophy as to how to dissect a data set, what to look for, how to look, and how to interpret. In the initial phases of EDA, we should feel free to investigate every idea that crosses our minds. Some may pan out, while others can be dead ends.

EDA covers more ground than any single technique: it is an approach to data analysis that postpones the usual assumptions about what kind of model to follow. Though EDA heavily uses the collection of techniques called “statistical graphics”, it is not identical to statistical graphics per se.

Techniques and tools



With a few exceptions, most EDA techniques are graphical. The primary function of EDA is open-minded exploration. Visuals or graphics give analysts unmatched power to achieve this by exposing the structural secrets of the data and leaving them ready to gain some new, unexpected insight into it.

The specific graphical methods used in EDA are straightforward and consist of the following:

Plotting raw data, for example in data traces, histograms, probability plots, block plots and lag plots.

Plotting simple statistics from raw data, for example in mean plots, standard deviation plots, box plots and main effects plots.

Using numerous plots per page to leverage our innate pattern-recognition ability.
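To make the idea of plotting raw data concrete, here is a minimal sketch of a histogram drawn as text. It is only a stand-in for a real plotting library; the function name `text_histogram`, the bin width and the sample ages are assumptions made up for this illustration.

```python
from collections import Counter


def text_histogram(values, bin_width):
    """Return one text row per non-empty bin, with one '#' per observation."""
    # Map each value to the lower edge of the bin it falls into.
    counts = Counter((v // bin_width) * bin_width for v in values)
    lines = []
    for edge in sorted(counts):
        label = f"{edge}-{edge + bin_width - 1}"
        lines.append(f"{label:<8}{'#' * counts[edge]}")
    return lines


# Hypothetical sample: ages of survey respondents.
ages = [23, 25, 31, 35, 36, 38, 41, 44, 47, 52, 58, 90]
histogram = text_histogram(ages, bin_width=10)
print("\n".join(histogram))
```

Even in this crude form, the shape of the distribution and the isolated value at the far end are easier to see than in the raw list of numbers.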

The most commonly utilized data science tools for developing an EDA are:

Python: Python can be used during EDA to identify missing values in a data set, which is important so you can decide how to handle missing values for machine learning.

R: The R language is widely used among statisticians in data science for developing statistical observations and data analysis.
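As a small illustration of the missing-value check mentioned for Python, the sketch below counts missing entries per column using only the standard library. The records, the column names and the convention that `None` marks a missing value are all assumptions for this example; in practice a data-frame library would usually do this job.

```python
# Hypothetical records; None marks a missing value.
records = [
    {"age": 34, "income": 52000, "city": "Chennai"},
    {"age": None, "income": 61000, "city": "Pune"},
    {"age": 29, "income": None, "city": None},
    {"age": 41, "income": None, "city": "Delhi"},
]


def missing_counts(rows):
    """Count missing (None or absent) entries per column across all rows."""
    columns = sorted({key for row in rows for key in row})
    return {col: sum(1 for row in rows if row.get(col) is None) for col in columns}


missing = missing_counts(records)
for column, count in missing.items():
    print(f"{column}: {count} missing of {len(records)}")
```

A column with many missing entries may call for a different treatment (dropping, imputing, flagging) than one with only a few, which is why it helps to see the counts first.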

1. Visualization methods

Univariate visualization means visualizing each field in the raw dataset, with the statistical summary.

Bivariate visualizations and statistical summaries enable you to evaluate the link between each field in the dataset and the target variable under consideration.

Multivariate visualizations are used to map and analyze interactions between multiple fields in data.

Techniques for clustering and dimension reduction aid in the creation of graphical displays of high-dimensional data with many variables. Clustering is a technique that is widely used in market segmentation, pattern recognition, and image compression.
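The bivariate summaries described above can be sketched numerically. The example below computes the Pearson correlation between each field and a target variable; Pearson correlation is just one possible measure of the link and is chosen here as an assumption, as are the field names and the sample values.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical dataset: two fields and a target variable.
fields = {
    "ad_spend": [10, 20, 30, 40, 50],
    "discount": [5, 3, 4, 1, 2],
}
target = [12, 24, 33, 46, 55]  # e.g. weekly sales

correlations = {name: pearson(values, target) for name, values in fields.items()}
for name, r in correlations.items():
    print(f"{name}: r = {r:+.2f}")
```

A value near +1 or -1 suggests a strong linear link worth plotting, while a value near 0 only rules out a linear relationship, not a relationship of any kind.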

2. Non Visualization methods


Univariate non-graphical. This is the most basic type of data analysis, in which
the data being evaluated consists of only one variable. Since there is only one
variable, no causes or correlations are dealt with. Univariate analysis’s main
objective is to describe the data and find patterns within it.
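A univariate non-graphical summary can be as simple as a handful of descriptive statistics. The sketch below uses Python's built-in `statistics` module; the variable `delivery_days` and its values are hypothetical.

```python
import statistics

# Hypothetical single variable: delivery times in days.
delivery_days = [2, 3, 3, 4, 4, 4, 5, 6, 7, 12]

summary = {
    "count": len(delivery_days),
    "min": min(delivery_days),
    "max": max(delivery_days),
    "mean": statistics.mean(delivery_days),
    "median": statistics.median(delivery_days),
    "mode": statistics.mode(delivery_days),
    "stdev": statistics.stdev(delivery_days),  # sample standard deviation
}

for name, value in summary.items():
    print(f"{name}: {value}")
```

Here the mean sits above the median, a hint that a few large values are pulling the distribution to the right.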

Multivariate non-graphical. Multivariate data is made up of multiple variables. In general, multivariate non-graphical EDA techniques use cross-tabulation or statistics to illustrate the link between two or more variables of the data.
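Cross-tabulation, as mentioned above, simply counts how often each combination of two categorical variables occurs. Below is a minimal sketch using `collections.Counter`; the segment and channel names are made up for the example.

```python
from collections import Counter

# Hypothetical observations: (customer segment, preferred contact channel).
observations = [
    ("retail", "email"), ("retail", "phone"), ("retail", "email"),
    ("wholesale", "phone"), ("wholesale", "phone"), ("retail", "email"),
    ("wholesale", "email"),
]

# Count each (segment, channel) combination.
crosstab = Counter(observations)
segments = sorted({segment for segment, _ in observations})
channels = sorted({channel for _, channel in observations})

# Print the counts as a small table: one row per segment, one column per channel.
print(f"{'':<10}" + "".join(f"{channel:>8}" for channel in channels))
for segment in segments:
    print(f"{segment:<10}" + "".join(f"{crosstab[(segment, channel)]:>8}" for channel in channels))
```

Reading across a row shows how one segment splits over the channels, which is often the first sign that two variables are related.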


Predictive models, such as linear regression, rely on statistics and data to make predictions.

A data item or object that considerably differs from the other (so-called normal) items is referred to as an outlier. Errors in measurement or execution may be the reason for them. Outlier mining is the analysis used for outlier discovery.
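One common way to flag candidate outliers is the interquartile-range rule, which marks values lying more than 1.5 times the IQR outside the quartiles. This rule is an assumption chosen for illustration and is not prescribed by the article; the function name `iqr_outliers` and the sample readings are hypothetical.

```python
import statistics


def iqr_outliers(values, k=1.5):
    """Return values lying more than k * IQR below Q1 or above Q3."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # quartile cut points
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


# Hypothetical sensor readings with one suspicious value.
readings = [10, 12, 11, 13, 12, 11, 10, 12, 48]
outliers = iqr_outliers(readings)
print(outliers)
```

A flagged value is only a candidate: whether it is a measurement error or a genuine but rare event still has to be decided by looking at where it came from.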
Usage
EDA is fundamentally a creative process. The key to asking good questions, as with most creative processes, is to come up with a lot of them.

1. What are the most common values? Why?

2. Which values are rare? Why? Is that in line with your expectations?

3. Are there any visibly unusual patterns? What could be the reason for those patterns?
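The first two questions can be answered with a simple frequency count. The sketch below assumes a hypothetical column of payment methods and treats any value seen only once as rare; that threshold is an arbitrary choice for the example.

```python
from collections import Counter

# Hypothetical categorical column.
payment_methods = [
    "card", "card", "upi", "card", "cash",
    "upi", "card", "cheque", "upi", "card",
]

counts = Counter(payment_methods)

# The two most frequent values with their counts.
most_common = counts.most_common(2)

# Values that occur exactly once (assumed definition of "rare" here).
rare = sorted(value for value, n in counts.items() if n == 1)

print("Most common:", most_common)
print("Rare:", rare)
```

The counts answer the "what"; the "why" still needs domain knowledge, for example whether a rare value is a real category or a data-entry variant of a common one.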

Clusters of similar values suggest that subgroups exist in your data. To get more insight into the subgroups, ask the questions below.

1. How are the entities within a cluster similar to each other?

2. How do the entities in separate clusters differ from each other?

3. How can the clusters be explained or described?
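One way to start describing clusters is to compare the average of each feature per cluster. The sketch below assumes the cluster labels already exist (how they were produced is outside its scope); the records, the feature names and the helper `cluster_profile` are hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical customers, already assigned to clusters by an earlier step.
customers = [
    {"cluster": "A", "orders": 2, "avg_basket": 15.0},
    {"cluster": "A", "orders": 4, "avg_basket": 25.0},
    {"cluster": "B", "orders": 20, "avg_basket": 100.0},
    {"cluster": "B", "orders": 30, "avg_basket": 120.0},
]


def cluster_profile(rows, features):
    """Return the mean of each feature for every cluster label."""
    groups = defaultdict(list)
    for row in rows:
        groups[row["cluster"]].append(row)
    return {
        label: {f: mean([member[f] for member in members]) for f in features}
        for label, members in sorted(groups.items())
    }


profile = cluster_profile(customers, ["orders", "avg_basket"])
for label, stats in profile.items():
    print(label, stats)
```

Features whose averages are close within a cluster but far apart between clusters are good candidates for explaining what separates the subgroups.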


We can quickly drill down into the most interesting parts of our data, and develop a set of thought-provoking questions, if we follow up each question with a new question based on our findings. To summarize, EDA helps us to:

maximize insight into a data set;
uncover underlying structure;
extract important variables;
detect outliers and anomalies;
test underlying assumptions;
develop parsimonious models; and
determine optimal factor settings.

It can also help identify obvious errors, as well as better understand patterns within the data, detect outliers or anomalous events and find interesting relations among the variables.

It is also used to ensure that the results produced by data analysis are valid and applicable to the desired business outcomes and goals. Once EDA is complete and insights are drawn, those insights can subsequently be applied to more advanced data analysis or modeling, such as machine learning.

Conclusion
EDA provides the context necessary to create an acceptable model for the problem at hand and to accurately understand its results, making it an essential step to take before delving into machine learning or statistical modeling. Data scientists can benefit from EDA to ensure that the outcomes they provide are reliable, accurately interpreted, and applicable to the intended business contexts.