Exploratory Data Analysis and Data Science - Part 1

The document outlines the importance of Exploratory Data Analysis (EDA) as a flexible approach to understanding data without predefined hypotheses or models. It discusses basic tools and methods used in EDA, such as plots, graphs, and summary statistics, and emphasizes its role in gaining intuition about data, checking for errors, and summarizing findings. Additionally, it highlights the distinction between EDA and data visualization, noting that EDA is an early step in the data science process.

Uploaded by

dhruthin1907


Exploratory Data Analysis and Data Science
Module 2
Content

1. EDA
a. Basic tools of EDA
b. Philosophy of EDA
2. The Data Science process
a. Case study: Real Direct (online real estate firm)
3. Three basic Machine Learning Algorithms
a. Linear Regression
b. k-Nearest Neighbours (k-NN)
c. k-means
Exploratory Data Analysis (EDA)
Introduction

1. “‘Exploratory data analysis’ is an attitude, a state of flexibility, a willingness to look for
those things that we believe are not there, as well as those we believe to be there.” (John
Tukey)
2. EDA is the first step toward building a model.
3. The “exploratory” aspect means that your understanding of the problem you are solving,
or might solve, is changing as you go.
4. In EDA, there is no hypothesis and there is no model.
5. It’s traditionally presented as a bunch of histograms and stem-and-leaf plots.
6. EDA is a critical part of the data science process.
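
To make the stem-and-leaf plot mentioned above concrete, here is a minimal sketch in Python. The function name `stem_and_leaf` and the sample ages are hypothetical, chosen only for illustration; the sketch assumes non-negative integers and uses the tens digit as the stem and the units digit as the leaf.

```python
from collections import defaultdict


def stem_and_leaf(values):
    """Return text rows of a stem-and-leaf plot for non-negative integers.

    Assumption for this sketch: stem = tens, leaf = units digit.
    """
    rows = defaultdict(list)
    for v in sorted(values):
        rows[v // 10].append(v % 10)
    lines = []
    # Walk every stem in range so that empty stems (gaps) stay visible.
    for stem in range(min(rows), max(rows) + 1):
        leaves = "".join(str(leaf) for leaf in rows.get(stem, []))
        lines.append(f"{stem} | {leaves}".rstrip())
    return lines


# Hypothetical sample data: ages of ten users.
ages = [23, 25, 31, 31, 38, 42, 47, 47, 49, 65]
plot = stem_and_leaf(ages)
print("\n".join(plot))
```

Each row shows the actual values as well as the shape of the distribution, which is why the plot suits small datasets inspected by hand.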
a. Basic tools of EDA

1. The basic tools of EDA are plots, graphs and summary statistics.
2. EDA is a method of systematically going through the data: plotting distributions of all
variables (using box plots), plotting time series of data, transforming variables, looking at
all pairwise relationships between variables using scatterplot matrices, and generating
summary statistics for all of them.
3. At the very least, that means computing each variable’s mean, minimum, maximum, and upper
and lower quartiles, and identifying outliers.
4. EDA is about your relationship with the data.
5. You want to understand the data—gain intuition, understand the shape of it, and try to
connect your understanding of the process that generated the data to the data itself.
6. EDA happens between you and the data and isn’t about proving anything to anyone else
yet.
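
The summary statistics listed above can be sketched with Python's standard library. This is an illustrative sketch, not a prescribed method: the function name `summarize` and the sample load times are hypothetical, and the outlier rule used here (values more than 1.5 times the interquartile range beyond the quartiles) is one common convention that the slides do not specify.

```python
import statistics


def summarize(values):
    """Return basic EDA summary statistics for a list of numbers."""
    # Quartiles via linear interpolation over the observed data.
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    # Assumed outlier rule: flag points beyond 1.5 * IQR from the quartiles.
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return {
        "mean": statistics.mean(values),
        "min": min(values),
        "max": max(values),
        "q1": q1,
        "median": median,
        "q3": q3,
        "outliers": [v for v in values if v < low or v > high],
    }


# Hypothetical sample: page-load times in seconds, with one suspicious value.
load_times = [1.2, 0.9, 1.1, 1.4, 1.0, 1.3, 9.5, 1.2]
summary = summarize(load_times)
print(summary)
```

A flagged value is a prompt to look closer, not a verdict: it may be a data error or a genuine extreme observation.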
b. Philosophy of EDA

1. In the context of data in an Internet/engineering company, EDA is done for some of the
same reasons it’s done with smaller datasets, but there are additional reasons to do it
with data that has been generated from logs.
2. There are important reasons anyone working with data should do EDA. Namely,
a. To gain intuition about the data;
b. To make comparisons between distributions;
c. For sanity checking (making sure the data is on the scale you expect, in the format
you thought it should be);
d. To find out where data is missing or if there are outliers;
e. To summarize the data.
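
The sanity-checking and missing-data points above can be sketched as a small report over one field. Everything named here is an assumption for illustration: the function `sanity_report`, the `price` field, the sample records, and the expected range are hypothetical and would be replaced by whatever the dataset at hand calls for.

```python
def sanity_report(rows, field, expected_range):
    """Count missing, unparseable, and out-of-range values for one field."""
    low, high = expected_range
    report = {"missing": 0, "bad_format": 0, "out_of_range": 0, "ok": 0}
    for row in rows:
        raw = row.get(field)
        if raw is None or raw == "":
            report["missing"] += 1
            continue
        try:
            value = float(raw)
        except (TypeError, ValueError):
            # The value is present but not in the numeric format we expected.
            report["bad_format"] += 1
            continue
        if low <= value <= high:
            report["ok"] += 1
        else:
            # Present and numeric, but not on the scale we expected.
            report["out_of_range"] += 1
    return report


# Hypothetical records, e.g. rows read from a CSV of housing sales.
records = [
    {"price": "250000"},
    {"price": ""},
    {"price": "n/a"},
    {"price": "250"},  # possibly recorded in thousands rather than units
    {},
    {"price": "410000"},
]
report = sanity_report(records, "price", (10_000, 5_000_000))
print(report)
```

The counts separate three different problems (absent values, wrong format, wrong scale), each of which usually has a different cause and a different fix.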
b. Philosophy of EDA (continued)

1. In the context of data generated from logs, EDA also helps with debugging the logging
process.
a. For example, “patterns” you find in the data could actually be something wrong in the
logging process that needs to be fixed. If you never go to the trouble of debugging,
you’ll continue to think your patterns are real.
2. The engineers we’ve worked with are always grateful for help in this area.
3. Although visualization is involved in EDA, we distinguish between EDA and data
visualization: EDA is done toward the beginning of analysis, and data visualization is done
toward the end to communicate one’s findings.
4. In the end, EDA helps you make sure the product is performing as intended.
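
As one way to picture the log-debugging point made earlier in this list, the sketch below counts events per hour and reports hours with no events at all. The function name `hourly_counts` and the timestamps are hypothetical. A silent hour is only a hint: it could be a genuine lull in traffic or a logging outage, and telling the two apart requires checking the logging process itself.

```python
from collections import Counter
from datetime import datetime, timedelta


def hourly_counts(timestamps):
    """Count events per hour, including hours with zero events."""
    hours = [ts.replace(minute=0, second=0, microsecond=0) for ts in timestamps]
    counts = Counter(hours)
    result = {}
    hour, last = min(hours), max(hours)
    while hour <= last:
        result[hour] = counts.get(hour, 0)
        hour += timedelta(hours=1)
    return result


# Hypothetical event timestamps parsed from a log file.
events = [
    datetime(2024, 1, 1, 9, 5),
    datetime(2024, 1, 1, 9, 40),
    datetime(2024, 1, 1, 10, 15),
    datetime(2024, 1, 1, 12, 1),
    datetime(2024, 1, 1, 12, 30),
    datetime(2024, 1, 1, 12, 59),
]
counts = hourly_counts(events)
silent_hours = [hour for hour, n in counts.items() if n == 0]
print(silent_hours)
```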
