Exploratory Data Analysis and Data Science - Part 1

The document outlines the importance of Exploratory Data Analysis (EDA) as a flexible approach to understanding data without predefined hypotheses or models. It discusses basic tools and methods used in EDA, such as plots, graphs, and summary statistics, and emphasizes its role in gaining intuition about data, checking for errors, and summarizing findings. Additionally, it highlights the distinction between EDA and data visualization, noting that EDA is an early step in the data science process.

Uploaded by

dhruthin1907

Exploratory Data Analysis and Data Science
Module 2
Content

1. EDA
a. Basic tools of EDA
b. Philosophy of EDA
2. The Data Science process
a. Case study: Real Direct (online real estate firm)
3. Three basic Machine Learning Algorithms
a. Linear Regression
b. k-Nearest Neighbours (k-NN)
c. k-means
Exploratory Data Analysis (EDA)
Introduction

1. “Exploratory data analysis” is an attitude, a state of flexibility, a willingness to look for
those things that we believe are not there, as well as those we believe to be there. (John Tukey)
2. Exploratory Data Analysis (EDA) is the first step toward building a model.
3. The “exploratory” aspect means that your understanding of the problem you are solving,
or might solve, is changing as you go.
4. In EDA, there is no hypothesis and there is no model.
5. It’s traditionally presented as a bunch of histograms and stem-and-leaf plots.
6. EDA is a critical part of the data science process.
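
The stem-and-leaf plot mentioned above can be sketched in a few lines of Python. This is a minimal illustration, not a standard library routine: the function name `stem_and_leaf` and the sample `ages` are made up for this example, and it assumes non-negative integers with the tens digit as the stem and the units digit as the leaf.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Build a stem-and-leaf plot for non-negative integers.

    Assumption for this sketch: stem = tens, leaf = units digit.
    """
    stems = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(v, 10)
        stems[stem].append(leaf)

    lines = []
    # Include empty stems between the smallest and largest so gaps stay visible.
    for stem in range(min(stems), max(stems) + 1):
        leaves = "".join(str(leaf) for leaf in stems.get(stem, []))
        lines.append(f"{stem:>2} | {leaves}")
    return lines

# Made-up sample data, for illustration only.
ages = [23, 25, 31, 34, 34, 38, 41, 47, 52, 58, 59]
plot = stem_and_leaf(ages)
print("\n".join(plot))
```

Each row shows one stem followed by its leaves, so the plot works like a histogram that still keeps the individual values.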
a. Basic tools of EDA

1. The basic tools of EDA are plots, graphs and summary statistics.
2. EDA is a method of systematically going through the data: plotting distributions of all
variables (for example, using box plots), plotting time series of data, transforming variables,
looking at all pairwise relationships between variables using scatterplot matrices, and
generating summary statistics for all of them.
3. At the very least that would mean computing their mean, minimum, maximum, the upper
and lower quartiles, and identifying outliers.
4. EDA is about your relationship with the data.
5. You want to understand the data—gain intuition, understand the shape of it, and try to
connect your understanding of the process that generated the data to the data itself.
6. EDA happens between you and the data and isn’t about proving anything to anyone else
yet.
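
The summary statistics listed above (mean, minimum, maximum, quartiles, outliers) can be computed with Python's standard `statistics` module. The sketch below is one possible approach, not the only one. The slides do not say how outliers should be identified, so the 1.5 × IQR fence rule used here is an assumption; the function name `summarize` and the `sample` values are also made up.

```python
import statistics

def summarize(values):
    """Return basic EDA summary statistics for a list of numbers."""
    data = sorted(values)
    # Quartiles; method="inclusive" interpolates between the observed min and max.
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    # Assumption: flag outliers with the common 1.5 * IQR fence rule.
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    return {
        "mean": statistics.mean(data),
        "min": data[0],
        "max": data[-1],
        "q1": q1,
        "median": median,
        "q3": q3,
        "outliers": [x for x in data if x < low_fence or x > high_fence],
    }

# Made-up sample data, for illustration only.
sample = [2, 4, 4, 5, 6, 7, 8, 9, 10, 50]
summary = summarize(sample)
print(summary)
```

For this sample, the value 50 falls outside the upper fence and is reported as an outlier. Whether it is an error or a real observation is a question to answer by going back to the data.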
b. Philosophy of EDA

1. In the context of data in an Internet/engineering company, EDA is done for some of the
same reasons it’s done with smaller datasets, but there are additional reasons to do it
with data that has been generated from logs.
2. There are important reasons anyone working with data should do EDA, namely:
a. To gain intuition about the data;
b. To make comparisons between distributions;
c. For sanity checking (making sure the data is on the scale you expect, in the format
you thought it should be);
d. To find out where data is missing or if there are outliers;
e. To summarize the data.
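
The sanity-checking and missing-data reasons above can be illustrated with a small check over one field. This is a sketch under stated assumptions: the records, the field name `age`, the expected range 0 to 120, and the function name `sanity_report` are all hypothetical and chosen only for the example.

```python
def sanity_report(records, field, expected_min, expected_max):
    """Count missing, wrongly typed, and out-of-range values for one field."""
    report = {"missing": 0, "wrong_type": 0, "out_of_range": 0, "ok": 0}
    for row in records:
        value = row.get(field)  # None if the field is absent
        if value is None:
            report["missing"] += 1
        elif isinstance(value, bool) or not isinstance(value, (int, float)):
            report["wrong_type"] += 1      # not in the format we expected
        elif not (expected_min <= value <= expected_max):
            report["out_of_range"] += 1    # not on the scale we expected
        else:
            report["ok"] += 1
    return report

# Made-up records, for illustration only.
rows = [
    {"user": "a", "age": 34},
    {"user": "b", "age": None},
    {"user": "c", "age": "41"},  # stored as text, not a number
    {"user": "d", "age": 340},   # possibly a data-entry or unit error
    {"user": "e"},               # field absent entirely
]
report = sanity_report(rows, "age", 0, 120)
print(report)
```

A report like this does not say what to do with the problem rows. It only shows where the data departs from what you expected, which is the point of a sanity check.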
b. Philosophy of EDA (continued)

1. In the context of data generated from logs, EDA also helps with debugging the logging
process.
a. For example, “patterns” you find in the data could actually be something wrong in the
logging process that needs to be fixed. If you never go to the trouble of debugging,
you’ll continue to think your patterns are real.
2. The engineers we’ve worked with are always grateful for help in this area.
3. Although there is a lot of visualization involved in EDA, we distinguish between EDA and
data visualization in that EDA is done toward the beginning of analysis, and data
visualization is done toward the end to communicate one’s findings.
4. In the end, EDA helps you make sure the product is performing as intended.
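
As one illustration of how EDA can help debug a logging process, the sketch below counts events per hour and lists the hours with no events at all. An hour with zero events might be a real lull, or it might mean the logger was down, so it is worth checking before treating the gap as a pattern. The timestamps and the function name `hours_without_events` are made up, and the sketch assumes all events fall on a single day.

```python
from collections import Counter
from datetime import datetime

def hours_without_events(timestamps):
    """Return hours with no events between the first and last observed hour.

    Assumption for this sketch: all timestamps are ISO 8601 strings
    from the same day.
    """
    counts = Counter(datetime.fromisoformat(ts).hour for ts in timestamps)
    first, last = min(counts), max(counts)
    # Counter returns 0 for hours that never appeared.
    return [h for h in range(first, last + 1) if counts[h] == 0]

# Made-up log timestamps, for illustration only.
log_times = [
    "2024-01-01T09:15:00",
    "2024-01-01T09:40:00",
    "2024-01-01T10:05:00",
    "2024-01-01T13:20:00",
    "2024-01-01T14:45:00",
]
gaps = hours_without_events(log_times)
print(gaps)
```

Here hours 11 and 12 have no events. The next step would be to ask the engineers whether logging was interrupted during that window.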
