Systematic Approach To Perform Task Centric Exploratory Data Analysis With Case Study

This document discusses a systematic approach to performing exploratory data analysis (EDA). It begins by defining EDA and explaining its significance in summarizing data, performing statistical analysis, and visualizing data to gain insights. The document then outlines the key steps in EDA: 1) defining the problem, 2) preparing the data, 3) analyzing the data through descriptive statistics and visualizations, and 4) exploring the data to answer questions and extract patterns. It provides examples of questions asked and techniques used at each step. Finally, the document emphasizes that the goal of EDA is to gain actionable insights from the data to help organizations make informed decisions. It concludes by stating EDA reveals truths about the data

ISSN 2278-3091

Volume 10, No.3, May - June 2021

Parvatham Niranjan Kumar et al., International Journal of Advanced Trends in Computer Science and Engineering, 10(3), May - June 2021, 1920 – 1927
Available Online at http://www.warse.org/IJATCSE/static/pdf/file/ijatcse601032021.pdf
https://doi.org/10.30534/ijatcse/2021/601032021

Systematic Approach to Perform Task Centric Exploratory Data Analysis with Case Study

Parvatham Niranjan Kumar¹, Kambhampati Vijay Kumar²
¹ Assistant Professor, Anurag Engineering College, Kodad, India, [email protected]
² Assistant Professor, Anurag Engineering College, Kodad, India, [email protected]

ABSTRACT

Exploratory data analysis (EDA) is a method for summarizing the main characteristics of data and for understanding data more deeply using visualization techniques. This paper focuses on defining a systematic approach, in the form of a well-defined sequence of steps, to explore data in various aspects. Every organization produces a lot of data and needs to analyze it very carefully to extract the hidden patterns in it. Task Centric EDA [2] produces actionable insights as its outcome to improve business processes. This work uses the Python programming language and Jupyter Notebook for data analysis. Python is an object-oriented and interactive programming language with a rich set of libraries such as pandas, Matplotlib, and seaborn [10]. We have used different types of charts and various types of parameters to analyze a retail dataset and to improve sales using precision marketing.

Keywords: Exploratory data analysis, machine learning, EDA, seaborn, matplotlib, precision marketing

1. INTRODUCTION

Data is growing very fast in today's world. Every organization produces, and also depends on, a lot of data in its everyday processes. It is not easy to process this data manually. Organizations need to understand the data carefully before making assumptions or decisions based on it.

There are three motivations for analyzing data: first, to understand what has happened or what is happening; second, to predict what is likely to happen in the near future; third, to guide us in making decisions.

Data analysis and visualization tools help us understand data much more deeply. Data analytics allows organizations to understand their business efficiency and performance, and also helps in making informed decisions. For example, an e-commerce store might be interested in analyzing customer attributes in order to target ads at particular categories of customers and improve sales.

Exploratory Data Analysis (EDA) is an approach that uses both descriptive statistics and graphical tools to gain a better understanding of data. Graphs are especially important because humans are much better at seeing patterns in graphs than in large collections of numbers.

EDA is treated as an art of looking at data in an effort to understand the underlying structure of the dataset. It enables data analysts and data scientists to bring the right information to the right people. It can be considered the most important step on which a data-driven organization should focus.

EDA helps to summarize the statistical characteristics of a dataset by focusing on key aspects: measures of central tendency (the mean, mode, and median), measures of spread (standard deviation and variance), and the shape of the distribution. The rest of the paper is organized as follows. Section 2 presents the significance of EDA. Section 3 explains the steps in EDA and various techniques for exploratory data analysis. Section 4 discusses how to conduct exploratory data analysis using Python and Jupyter Notebook, while Section 5 presents how to work with datasets to conduct exploratory data analysis with a case study. Finally, Section 6 presents the concluding remarks.

2. THE SIGNIFICANCE OF EDA

Data-driven organizations should make appropriate and well-established decisions using the huge amount of data collected from various sources. It is nearly impossible to make sense of datasets containing more than a handful of data points without using computing tools. Exploratory Data Analysis is the key, and it is the first step in the data mining process.

The key components of Exploratory Data Analysis include summarizing data, statistical analysis, and visualization of data. The insights collected by exploring the data help us make further decisions.

EDA reveals the ground truth about the content without making any underlying assumptions. This is why data scientists use this process to understand what type of modelling and hypotheses can be created for further analysis [13].

3. STEPS IN EDA

Having understood what EDA is and why it is significant, let's look at the various steps involved in data analysis [13]. Let's go through each of them to get a brief understanding of each step.

3.1 Problem Definition

Before trying to perform analysis to extract useful insights from the data, it is essential to define the business problem to be solved. The problem definition is the driving force for a data analysis. The following activities are involved in this step.

• Defining the main objective of the analysis
• Obtaining the current status of the data
• Outlining the main roles and responsibilities
• Creating an execution plan

3.2 Data Preparation

This step involves getting the dataset ready for the actual analysis. In this step, we try to digest the schema of the dataset (column names and their data types) and the main characteristics of the data. The dataset is cleaned by removing non-relevant data, transforming the data, and dividing the data into the chunks required for analysis. The following activities are done in this step.

• Identifying the shape of the dataset
• Identification of variables and data types
• Analyzing the basic metrics
• Detecting invalid data types of variables
• Variable transformations
• Missing value treatment
• Outlier treatment
• Dimensionality reduction

3.3 Data Analysis

This is one of the most crucial steps and deals with descriptive statistics and analysis [5] of the data. The following visual methods help in summarizing the data and finding hidden correlations and relationships among the data.

3.3.1 Graphical Univariate Analysis

Univariate analysis of data is done using only one variable. Since it involves a single variable, it does not deal with relationships among variables. Univariate analysis describes patterns that exist within the data. Line charts and histograms are used for performing univariate analysis.

3.3.2 Bivariate Analysis

Bivariate analysis is used to understand the relationship between two columns. There are many visualizations for performing bivariate analysis. For example, a scatter plot can be drawn to check for a linear relationship between year_built and house_price, and a hexbin plot can be drawn to check the distribution of price in different year ranges.

3.3.3 Correlation Analysis

Correlation analysis is commonly used to find important features or to identify redundant features. A heatmap displays the correlation matrix used to perform correlation analysis, where each cell in the matrix shows the correlation between two columns. It shows which features are strongly correlated with the target variable and which pairs of features are highly correlated with each other.

3.4 Exploring Data

This is the heart of the entire EDA process. In this step, a sequence of relevant questions is prepared as a storytelling process to explore the data and to answer the following questions [14].

• What has happened, or what is happening?
• What is likely to happen in the near future?
• Which actionable insights can help the organization make informed decisions?

This set of questions helps us extract hidden patterns in the data and find a solution for the given problem.

3.5 Conclusion with Actionable Insights

This step involves presenting the analysis results to the target audience in the form of graphs, summary tables, maps, and diagrams. This step suggests actionable insights, which are interpreted by the business stakeholders to improve the business process.

4. IMPLEMENTATION

4.1 Python

Python is a popular language for Exploratory Data Analysis. It has a rich set of libraries, and its visualization tools make it easier to create a clear report. Pandas is the most powerful package in Python for data analysis. It is built on top of the NumPy package. Matplotlib or seaborn can be used to draw plots.

4.2 Jupyter Notebook

Jupyter Notebook is a web-based interactive development environment for creating notebook documents. A Jupyter Notebook contains an ordered list of input and output cells that can contain code, Markdown text, mathematical expressions, plots, charts, and media. Jupyter Notebooks use the .ipynb file extension. A Jupyter Notebook is a great way to build step-by-step interactive Python programs. The technology is particularly well-suited for data analysis and plotting.

5. WORKING WITH THE DATA SET

It's time to explore the data and find out about it. The data we are using is a retail dataset. We are going to analyze this data with a possible set of options.

5.1 Problem Definition

Nowadays, it is recognized that precision marketing plays a key role in generating profit.


Precision marketing makes it easier to provide personalized customer service, and customers are better informed about the products that they like. It helps enterprises gain profits through high-efficiency marketing.

The accelerated pace of economic globalization and increasing market competition have led enterprise managers to face the problem of choosing the right strategic decision-making policies for selling the right products to the right customers at the right time.

The main objective of the EDA on this dataset is to help managers identify the potential characteristics of different types of customers and put forward appropriate precision marketing strategies, which can greatly optimize inventory for every customer type.

The availability of customer data and transaction records provides a better understanding of customers' consumption behaviors and preferences. Real-world data from a company in the UK were collected and used in this case study. Exploratory Data Analysis on this dataset should extract the following patterns from the data.

• Identify important attributes to distinguish different customer groups.
• Discover transactional patterns of different customers.
• Verify the assumption about cancelled orders/invoices, which may help in preventing future cancellations.
• Get an overview of the general purchase behavior of customers.

5.2 Data Preparation

Importing Packages: Python packages can be imported as shown in Figure 1.

Figure 1: Python Statements to Import Packages

5.2.1 Loading Data

retail_df = pd.read_csv('online_retail.csv')

5.2.2 Identifying Shape of Dataset

retail_df.shape
(541909, 8)

The dataset has 541909 rows and 8 columns.

5.2.3 Identification of Variables and Data Types

The following command produces the output shown in Figure 2.

retail_df.info()

Figure 2: Columns and their Types in Retail Dataset

5.2.4 Analyzing the Basic Metrics

retail_df.describe()

Figure 3 shows the output of the above command.

Figure 3: Basic Statistics with Retail Dataset

5.2.5 Detecting Invalid Data Type of Variables

All columns are assigned appropriate data types based on the values they contain, except CustomerID. It is better to declare CustomerID as an int type.

5.2.6 Variable Transformations

The CustomerID column is transformed from float type to int type. This conversion requires that rows with a missing CustomerID have already been removed, because a missing value cannot be cast to int32.

retail_df_ye_cust_analysis2 = retail_df_ye_cust_analysis.astype({'CustomerID': 'int32'})

5.2.7 Removing Duplicate Rows

We can identify duplicate records using the following command.

duplicates = retail_df[retail_df.duplicated()]


duplicates

Figure 4: Basic Statistics with Retail Dataset

There are 5268 duplicate records out of 541909 rows. We can remove them using the following command.

retail_df = retail_df.drop_duplicates()

5.2.8 Missing Value Treatment

After the duplicates are removed, 536641 rows remain. Subtracting the per-column count of non-null values from this number gives the number of missing values in each column.

536641-retail_df.count()

Figure 5 shows the output of the above command.

Figure 5: Missing Values in Description and CustomerID Columns

Observation: Only two variables, Description and CustomerID, have missing values. We can delete the rows with a missing Description because they have no corresponding CustomerID and UnitPrice values. The following command removes all rows with a missing Description.

retail_df.dropna(subset=['Description'], inplace=True)

5.2.9 Checking Null Values

The CustomerID column has null values. The following command displays the records whose CustomerID is null.

retail_df[retail_df['CustomerID'].isnull()]

5.2.10 Outlier Treatment

We have outliers in the UnitPrice and Quantity columns. A boxplot [3] can display the outliers in a column. Figure 6 shows the outliers in the Quantity column. These outliers can be eliminated using a z-score threshold. Eliminating outliers gives more accurate analysis results.

Figure 6: Boxplot Shows Outliers in Quantity Column

5.2.11 Dimensionality Reduction

In this step, two copies of the retail dataset are created. They are:

retail_df_cust_analysis_1: This dataset contains the data of customers whose data is completely available. Records with a null CustomerID are removed.

retail_df_cust_analysis_1 = retail_df.copy()
retail_df_cust_analysis_1.dropna(subset=['CustomerID'], inplace=True)
retail_df_cust_analysis_1.shape
(401604, 8)

retail_df_cust_analysis_2: This dataset contains all customers' data, including records with a null CustomerID. It is used when CustomerID is not needed in the analysis.

retail_df_cust_analysis_2 = retail_df.copy()
retail_df_cust_analysis_2.shape
(535187, 8)

5.3 Data Analysis

5.3.1 Univariate Analysis

Univariate analysis shows the distribution of data points in a column. Various visualization techniques [4] are available for univariate analysis. Figure 7 shows a histogram on the Quantity column.

bins = np.linspace(0, 2000, 100)
# Count distinct products per invoice, skipping cancelled invoices (InvoiceNo containing 'C')
groupby_inv = pd.DataFrame(retail_df_cust_analysis_2[~(retail_df_cust_analysis_2['InvoiceNo'].astype('str').str.contains('C'))].groupby('InvoiceNo')['StockCode'].nunique())
# Name the single column so that it can be referenced in the plot call
groupby_inv.columns = ['Number of products']
groupby_inv['Number of products'].plot.hist(bins=bins, figsize=(15, 7), color='brown', xticks=bins)
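Section 5.2.10 mentions a z-score threshold for eliminating outliers but does not show the step. A minimal sketch of one way to do it follows; the helper name `drop_zscore_outliers`, the threshold of 3, and the toy quantities are assumptions made for illustration and are not taken from the paper.

```python
import pandas as pd

# Toy stand-in for the Quantity column (illustrative values, not the paper's data).
toy_df = pd.DataFrame({
    "Quantity": [1, 2, 3, 2, 1, 4, 3, 2, 1, 2, 3, 6, 2, 1, 12, 4, 2, 3, 1, 500]
})


def drop_zscore_outliers(df, column, threshold=3.0):
    """Return the rows whose z-score in `column` lies within +/- threshold."""
    values = df[column].astype(float)
    std = values.std(ddof=0)  # population standard deviation
    if std == 0:
        # All values are identical, so nothing can be an outlier.
        return df.copy()
    z_scores = (values - values.mean()) / std
    return df[z_scores.abs() <= threshold]


filtered = drop_zscore_outliers(toy_df, "Quantity", threshold=3.0)
print(len(toy_df), "rows before,", len(filtered), "rows after")
```

On the retail data the same helper could be applied to `Quantity` and `UnitPrice` in turn. The threshold is a judgment call: a lower value removes more rows, and heavily skewed columns may call for a different rule.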


Figure 7: Histogram on Quantity Column

Observation: The histogram on the Quantity column shows that most customers buy fewer than 20 items.

5.3.2 Bivariate Analysis

Bivariate analysis shows the relationship among columns in the dataset.

r_c = retail_df_cust_analysis_1.groupby('Country')
r_c_r = r_c['Revenue'].sum()
r_c_r2 = r_c_r.reset_index().sort_values(by=['Revenue'], ascending=False)
plt.scatter(r_c_r2['Country'], r_c_r2['Revenue'])
plt.xticks(r_c_r2['Country'], rotation='vertical')
plt.show()

Observation: The scatter plot between the Revenue and Country columns is shown in Figure 8. It shows that most of the revenue is generated by customers in the United Kingdom.

Figure 8: Scatter Plot Shows Revenue from Various Countries

5.3.3 Correlation Analysis

A heat map displays the correlation matrix, where each cell shows the correlation between two columns. Figure 9 shows the heat map.

# numeric_only=True is needed on recent pandas versions, which no longer drop non-numeric columns silently
corr_mat = retail_df_cust_analysis_1.corr(numeric_only=True)
plt.figure(figsize=(8, 8))
sns.heatmap(corr_mat, annot=True, cmap='viridis')
plt.title("Heat plot shows correlation among columns in the dataset")

Figure 9: Heat Map on Retail Dataset

Observation: The heat map shows a strong correlation between Quantity and Revenue.

5.4 Data Exploration

This is the heart of the entire EDA process. Data exploration is a storytelling process of asking a relevant sequence of questions. These questions reveal patterns related to customers' consumption behaviors and preferences.

5.4.1 Analysis Based on Customer Transactions

i. How many Cancelled Orders do we have?

An InvoiceNo starting with the letter "C" is treated as a cancelled order.

cancelled_orders = retail_df_cust_analysis_1[retail_df_cust_analysis_1['InvoiceNo'].astype('str').str.contains('C')]
cancelled_orders.shape
(8872, 9)

There are 8872 rows belonging to cancelled orders.

ii. What is the Percentage of Cancelled Orders?

total_orders = retail_df_cust_analysis_1['InvoiceNo'].nunique()
can_orders = cancelled_orders['InvoiceNo'].nunique()
can_orders*100/total_orders
16.466876971608833

The percentage of cancelled orders is 16.466876971608833.
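The cancellation-rate calculation above can be tried on a small hand-made frame. This is only a sketch: the invoice numbers and quantities below are invented for illustration, and it uses a prefix match (`str.startswith`) rather than the paper's `str.contains`, on the assumption that only a leading "C" marks a cancellation.

```python
import pandas as pd

# Hypothetical invoice lines; several rows can share one InvoiceNo.
toy_invoices = pd.DataFrame({
    "InvoiceNo": ["536365", "536365", "536366", "C536379",
                  "536380", "C536383", "C536383"],
    "Quantity": [6, 8, 2, -1, 24, -1, -2],
})

# A leading "C" marks a cancelled invoice.
is_cancelled = toy_invoices["InvoiceNo"].astype(str).str.startswith("C")

# Count distinct invoices, not rows, so multi-line invoices are counted once.
total_orders = toy_invoices["InvoiceNo"].nunique()
cancelled_count = toy_invoices.loc[is_cancelled, "InvoiceNo"].nunique()
cancelled_pct = cancelled_count * 100 / total_orders

print(f"{cancelled_count} of {total_orders} invoices cancelled ({cancelled_pct:.1f}%)")
```

Counting distinct invoice numbers rather than rows matters here, because a single invoice usually spans several product lines.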

Observations: Almost 16% of orders are cancelled, which is a pretty big number for an online retailer. Studying these cancelled orders may help in preventing future cancellations.
Let's first get an overview of the general purchase behavior of customers and then dig deeper.

iii. What's the average number of orders per customer?

groupby_cust = pd.DataFrame(retail_df_cust_analysis_1.groupby('CustomerID')['InvoiceNo'].nunique())
groupby_cust['InvoiceNo'].mean()
5.07548032936871

Observations: The average number of orders per customer is 5.
As we found in the descriptive statistics, customers buy an average (mean) quantity of 5. Are they all the same product? Let's examine how many products are purchased.

5.4.2 Analysis Based on Products/Items

i. What's the average number of unique items per order?

groupby_inv = pd.DataFrame(retail_df_cust_analysis_2[~(retail_df_cust_analysis_2['InvoiceNo'].astype('str').str.contains('C'))].groupby('InvoiceNo')['StockCode'].nunique())
groupby_inv.median()

Observation: The median number of unique items per order is 15.0.

ii. How many products does a customer buy on average?

Observations: Figure 7 shows a skewed distribution of products. It shows that most people buy fewer than 20 items.

5.4.3 Analysis Based on Revenue

i. What is the total revenue generated by the online retailer?

retail_df_cust_analysis_1['Revenue'] = retail_df_cust_analysis_1['Quantity'] * retail_df_cust_analysis_1['UnitPrice']
retail_df_cust_analysis_1['Revenue'].sum()
8278519.4240000015

ii. What is the average revenue per customer?

groupby_cust = pd.DataFrame(retail_df_ye_cust_analysis.groupby('CustomerID')['Revenue'].sum())
groupby_cust['Revenue'].mean()
1893.5314327538888

Observations: On average, 1893.5 in revenue is generated per customer.

iii. What is the total revenue per country?

groupby_country = pd.DataFrame(retail_df_cust_analysis_2.groupby('Country')['Revenue'].sum())
groupby_country

Figure 10 shows the output of the above command.

Figure 10: Revenue from Various Countries

Observations: As we can see, the largest market is the UK. So we can conclude that not only are most sales revenues achieved in the UK, but most customers are located there too. We can explore this further to find out which products customers buy together and what future opportunities exist in the UK market.

iv. What is the monthly revenue of the online store?

Revenue (monthly) = sum of Quantity * UnitPrice over all invoice lines in the month

groupby_month = pd.DataFrame(retail_df_cust_analysis_2.groupby(['year','month'])['Revenue'].sum())
groupby_month

Figure 11 shows the output of the above command.

Figure 11: Year-Month Wise Revenue
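The paper groups by `year` and `month` columns without showing how they are created. The sketch below assumes they are derived from an `InvoiceDate` timestamp column (an assumption, since that column is not named in the text) and uses invented rows to show the monthly aggregation together with a month-over-month percentage change computed with `pct_change`.

```python
import pandas as pd

# Hypothetical invoice lines; dates, quantities and prices are illustrative only.
toy_sales = pd.DataFrame({
    "InvoiceDate": pd.to_datetime([
        "2010-12-01", "2010-12-15", "2011-01-10", "2011-01-20", "2011-02-05",
    ]),
    "Quantity": [10, 5, 4, 6, 30],
    "UnitPrice": [2.0, 4.0, 5.0, 5.0, 1.0],
})

# Revenue per line, as defined in Section 5.4.3.
toy_sales["Revenue"] = toy_sales["Quantity"] * toy_sales["UnitPrice"]

# Assumed derivation of the grouping keys from the timestamp.
toy_sales["year"] = toy_sales["InvoiceDate"].dt.year
toy_sales["month"] = toy_sales["InvoiceDate"].dt.month

# Sum revenue per (year, month); groupby sorts the keys chronologically here.
monthly = toy_sales.groupby(["year", "month"])["Revenue"].sum().to_frame()

# Percentage change relative to the previous month (NaN for the first month).
monthly["Growth"] = monthly["Revenue"].pct_change() * 100

print(monthly)
```

Grouping on both keys keeps December 2010 and December 2011 apart, which a grouping on `month` alone would merge.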


Observations:

• Monthly revenue for September, October, and November is notably high.
• The likely reason is that Halloween, Black Friday, and Thanksgiving sales fall around these months, so customers tend to buy more gifts.

v. What is the monthly growth rate for the online retail store?

groupby_month['Growth'] = groupby_month['Revenue'].pct_change()*100
groupby_month[['Revenue']].plot.line(figsize=(15,7), color='green', fontsize=13, linestyle='-.')
plt.xlabel('time')
plt.ylabel('Revenue')
plt.title('Line chart')

Figure 12 shows the output of the above command.

Figure 12: Monthly Growth Rate for the Online Retail Store

Observations: The growth rate of the online store fluctuates. Growth does not stay constant over the months.

5.4.4 Analysis Based on Geographical Location

i. What is the average monthly revenue in UK?

groupby_month_uk = pd.DataFrame(retail_df_cust_analysis_2[retail_df_cust_analysis_2['Country']=='United Kingdom'].groupby(['year','month'])['Revenue'].sum())
groupby_month_uk

Figure 13: Monthly Revenue in UK

ii. Which products are most bought in UK?

retail_uk = retail_df_cust_analysis_2[retail_df_cust_analysis_2['Country']=='United Kingdom']
group_by_stock_desc = pd.DataFrame(retail_uk.groupby(['StockCode','Description'], as_index=False)['Quantity'].sum())
# most bought products in UK
temp_df = group_by_stock_desc.sort_values(by='Quantity', ascending=False)
temp_df[temp_df['StockCode'] == 22197]

Figure 14: Product Most People Buy in UK

Observations: The popcorn holder is the most frequently ordered item.

iii. How many monthly active customers are there from UK?

groupby_month_uk = pd.DataFrame(retail_uk.groupby(['year','month'])['CustomerID'].nunique())
groupby_month_uk

Figure 15: Active Customers Year-Month Wise
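The monthly active customer count above can be sketched on a small invented frame. The customer IDs, dates, and the `InvoiceDate` column used to derive `year` and `month` are assumptions for illustration; the paper does not show how those two columns are built.

```python
import pandas as pd

# Hypothetical transactions; one row has no CustomerID, as in the retail data.
toy_tx = pd.DataFrame({
    "CustomerID": [17850, 17850, 13047, 12583, 17850, 13047, 15100, None],
    "Country": ["United Kingdom", "United Kingdom", "United Kingdom", "France",
                "United Kingdom", "United Kingdom", "United Kingdom",
                "United Kingdom"],
    "InvoiceDate": pd.to_datetime([
        "2010-12-01", "2010-12-05", "2010-12-07", "2010-12-09",
        "2011-01-04", "2011-01-11", "2011-01-20", "2011-01-21",
    ]),
})

# Keep UK rows only, working on a copy so the original frame is untouched.
toy_uk = toy_tx[toy_tx["Country"] == "United Kingdom"].copy()
toy_uk["year"] = toy_uk["InvoiceDate"].dt.year
toy_uk["month"] = toy_uk["InvoiceDate"].dt.month

# nunique() ignores missing IDs, so anonymous rows do not inflate the count.
active_customers = toy_uk.groupby(["year", "month"])["CustomerID"].nunique()

print(active_customers)
```

A customer who orders several times in one month is counted once for that month, which is the usual meaning of "active".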


Actionable Insights

• By analyzing the data in this way, we can uncover groups of customers that behave in similar ways. This level of customer segmentation is useful in precision marketing.
• A marketing campaign that works for a group of customers who frequently place low-value orders may not be suitable for customers who place sporadic high-value orders.
• Make relevant product recommendations to customers using precision marketing.
• Empower your customers to actively share their details, and encourage them to share their data with you through conversations, surveys, and other methods. Doing so not only helps you know them better, but also builds trust.
• Additionally, before performing the analysis, it is important to talk with the e-commerce team to understand the business, its customers, and its strategic and tactical objectives.

6. CONCLUSION

This paper explained exploratory data analysis in detail. It focused mainly on the significance of EDA and on a systematic approach, i.e., a sequence of steps to be followed to extract interesting hidden patterns in a dataset. We took a retail dataset as a case study to implement Task Centric EDA. We applied EDA to this dataset to identify the potential characteristics of different customer categories and to put forward appropriate precision marketing strategies to improve sales. The Python programming language and Jupyter Notebook were used to analyze the data and to draw various charts. Finally, the outcome of EDA is a set of concluding remarks and actionable insights to improve the business process.

REFERENCES

7.1 Journal Articles

1. Tristan Langer and Tobias Meisen, System Design to Utilize Domain Expertise for Visual Exploratory Data Analysis, Information 2021.
2. Jinglin Peng, Weiyuan Wu, Brandon Lockhart, Song Bian, Jing Nathan Yan, Linghao Xu, Zhixuan Chi, Jeffrey M. Rzeszotarski, Jiannan Wang, DataPrep.EDA: Task-Centric Exploratory Data Analysis for Statistical Modeling in Python.
3. Babangida Ibrahim Babura, Mohd Bakri Adam, Muhammad Sani, Usman Waziri, Felix Yakubu Eguda, Construction and Applications of Stairboxplot for Exploratory Data Analysis, Journal of Physics: Conference Series, ICMSDS 2020.
4. Parvatham Niranjan Kumar, Kambhampati Vijay Kumar, Comparative Study of Univariate Data Visualization with Case Study Approach, JAC: A Journal Of Composition Theory, Volume XIV, Issue V, May 2021, pp. 82-92.
5. J. Rajendra Prasad, S. Sai Kumar, B. V. Subba Rao, Design and Development of Financial Fraud Detection using Machine Learning, International Journal of Emerging Trends in Engineering Research, Volume 8, No. 9, September 2020, pp. 5838 – 5843.
6. Pujo Hari Saputro, Herlino Nanang, Exploratory Data Analysis & Booking Cancelation Prediction on Hotel Booking Demands Data, Journal of Applied Data Sciences, Vol. 2, No. 1, January 2021, pp. 40.
7. Behrens, J. T. (1997). Principles and procedures of exploratory data analysis. Psychological Methods, 2(2), 131–160. https://doi.org/10.1037/1082-989X.2.2.131
8. Smith, A. F., & Prentice, D. A. (1993). Exploratory data analysis. In G. Keren & C. Lewis (Eds.), A handbook for data analysis in the behavioral sciences: Statistical issues (p. 349–390). Lawrence Erlbaum Associates, Inc.
9. Estelle Camizuli, Emmanuel John Carranza, Exploratory Data Analysis (EDA), first published 26 November 2018, https://doi.org/10.1002/9781119188230.saseas0271
10. Ren Jie Tan, A Starter Pack to Exploratory Data Analysis with Python, pandas, seaborn, and scikit-learn.
11. Jitendra Pramanik, Abhaya Kumar Samal, Kabita Sahoo, Dr. Subhendu Kumar Pani, Exploratory Data Analysis using Python, International Journal of Innovative Technology and Exploring Engineering (IJITEE), ISSN: 2278-3075, Volume-8, Issue-12, October 2019.
12. I. J. Good, The Philosophy of Exploratory Data Analysis, Philosophy of Science, Volume 50, Number 2, June 1983.

7.2 Books

13. Exploratory Data Analysis with MATLAB, by Wendy L. Martinez, Angel R. Martinez, Jeffrey Solka.
14. Exploratory data analysis as a foundation of inductive research, Andrew T. Jebb, Scott Parrigon, Sang Eun Woo, Human Resource Management Review, Volume 27, Issue 2, June 2017, pp. 265-276.
15. Hands-On Exploratory Data Analysis with Python: Perform EDA, Suresh Kumar Mukhiya, Usman Ahmed, 2020, Packt Publishing Ltd.
16. Exploratory Data Analysis Using R, Ronald K. Pearson, 2018, CRC Press.
17. Hands-On Exploratory Data Analysis with R, Radhika Datar, Harish Garg, May 2019, Packt Publishing Ltd.
18. Exploratory Data Analysis, Volume 2, Page 1, John Wilder Tukey, 1977.
No ratings yet
Mathematical Approach For The Determination of The Surface Roughness in The Milling of Poly-Ether-Ether-Cethone (PEEK)
5 pages
An Improved Queuing Model For Reducing Average Waiting Time of Emergency Surgical Patients Using Preemptive Priority Scheduling
No ratings yet
An Improved Queuing Model For Reducing Average Waiting Time of Emergency Surgical Patients Using Preemptive Priority Scheduling
14 pages
Record Management System for Counseling
100% (1)
Record Management System for Counseling
7 pages
Review On Learning Parity With Noise Based Cloud Computing
No ratings yet
Review On Learning Parity With Noise Based Cloud Computing
5 pages
Machine Learning for Fake News Detection
No ratings yet
Machine Learning for Fake News Detection
4 pages
CAPTCHA Design: A Novel Security Method Using Sindhi Language
No ratings yet
CAPTCHA Design: A Novel Security Method Using Sindhi Language
5 pages
To Analyze Energy Aware MAC Protocols For Wireless Sensor Networks
No ratings yet
To Analyze Energy Aware MAC Protocols For Wireless Sensor Networks
6 pages
Virtualization Security Challenges in Cloud
100% (1)
Virtualization Security Challenges in Cloud
11 pages
Adaptive Background Subtraction in Dance Videos
No ratings yet
Adaptive Background Subtraction in Dance Videos
4 pages
ML & Data-Mining for Software Vulnerability Detection
No ratings yet
ML & Data-Mining for Software Vulnerability Detection
18 pages
Detection of Abnormalities in Real-Time Computer Network Traffic Empowered by Machine Learning
No ratings yet
Detection of Abnormalities in Real-Time Computer Network Traffic Empowered by Machine Learning
8 pages
Optimization of Factorial Design With The Type of Plackett-Burman Design To Study The Effects of Organic Rice Production Process: Second Step Experiment
No ratings yet
Optimization of Factorial Design With The Type of Plackett-Burman Design To Study The Effects of Organic Rice Production Process: Second Step Experiment
5 pages
Organic Rice Production Optimization
No ratings yet
Organic Rice Production Optimization
5 pages
Fastclip FCA
No ratings yet
Fastclip FCA
2 pages
Agricultural Extension in The Parish Development Model - June 2021 - Revised
No ratings yet
Agricultural Extension in The Parish Development Model - June 2021 - Revised
10 pages
Job Satisfaction
No ratings yet
Job Satisfaction
3 pages
Safe Chemical Development Practices: Risks From Rising Temperature
No ratings yet
Safe Chemical Development Practices: Risks From Rising Temperature
14 pages
1999 Prestige-Seeking Vigneron-Johnson AMSREVIEW PDF
No ratings yet
1999 Prestige-Seeking Vigneron-Johnson AMSREVIEW PDF
23 pages
Bushara Reservoir Repair Project DRC
No ratings yet
Bushara Reservoir Repair Project DRC
241 pages
Leveraging Consumer Behavior and Psychology in The Digital Economy 1st Edition by Norazah Mohd Suki 9781799830443 1799830446
100% (16)
Leveraging Consumer Behavior and Psychology in The Digital Economy 1st Edition by Norazah Mohd Suki 9781799830443 1799830446
87 pages
Bhuvaneshwari - SHRI240717CR487168765
No ratings yet
Bhuvaneshwari - SHRI240717CR487168765
4 pages
Slot1.1 What Is Back End Development
No ratings yet
Slot1.1 What Is Back End Development
18 pages
2016suspension 2009RegisteredCorporations
No ratings yet
2016suspension 2009RegisteredCorporations
127 pages
Liquid Legal Humanization and The Law Kai Jacob Dierk Schindler Download
100% (1)
Liquid Legal Humanization and The Law Kai Jacob Dierk Schindler Download
87 pages
eSRS Guide
No ratings yet
eSRS Guide
3 pages
Game Devs: Master Procedural Design
No ratings yet
Game Devs: Master Procedural Design
2 pages
Human Resource Champions
88% (8)
Human Resource Champions
12 pages
U - CS - 20 - 023 Laundry Management System One
No ratings yet
U - CS - 20 - 023 Laundry Management System One
4 pages
Xu 1996
No ratings yet
Xu 1996
17 pages
La Marzocco Technical Newsletter July 2017
No ratings yet
La Marzocco Technical Newsletter July 2017
1 page
Mountain Dwellings
100% (3)
Mountain Dwellings
12 pages
8th Revision Worksheet IT
No ratings yet
8th Revision Worksheet IT
11 pages
Microsoft Word - A Comprehensive Overview
No ratings yet
Microsoft Word - A Comprehensive Overview
5 pages
Customer Information & Credit Application Form
No ratings yet
Customer Information & Credit Application Form
2 pages
Waves and Optics Module 1
No ratings yet
Waves and Optics Module 1
12 pages
Engine Pedestal Vibration Solutions with TMDs
100% (1)
Engine Pedestal Vibration Solutions with TMDs
22 pages
Android Final
No ratings yet
Android Final
1,185 pages
CIPC 2018 Winners Announced
No ratings yet
CIPC 2018 Winners Announced
3 pages
Company Analysis Content
No ratings yet
Company Analysis Content
8 pages
MCC PANEL Scope of Work
100% (1)
MCC PANEL Scope of Work
39 pages
Empoerment Tech Q4Module5 6
No ratings yet
Empoerment Tech Q4Module5 6
14 pages
Mirpuri Vs Court of Appeals, 318 SCRA 516, G.R. No. 114508, November 19, 1999
No ratings yet
Mirpuri Vs Court of Appeals, 318 SCRA 516, G.R. No. 114508, November 19, 1999
51 pages
Active Heating and Cooling
0% (1)
Active Heating and Cooling
16 pages

ABSTRACT EDA is treated as an art of looking at data in an effort to

2. THE SIGNIFICANCE OF EDA

3.3.3 Correlation Analysis

Precision marketing makes the task of providing personalized

Figure 1: Python Statements to Import Packages

duplicates

5.2.10 Outlier Treatment

We have outliers in UnitPrice and Quantity column
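
One common way to flag such outliers is the interquartile-range (IQR) rule. The sketch below is only an illustration under stated assumptions: the source does not say which rule the authors applied, the toy `retail_df` is made-up data, and the helper `iqr_outlier_mask` and its factor `k=1.5` are hypothetical choices. Only the column names `Quantity` and `UnitPrice` come from the text.

```python
import pandas as pd

# Hypothetical toy data standing in for the retail dataset.
retail_df = pd.DataFrame({
    "Quantity":  [1, 2, 3, 4, 5, 6, 7, 8, 9, 1000],
    "UnitPrice": [1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0, 5.5],
})


def iqr_outlier_mask(series, k=1.5):
    """Return a boolean mask that is True where a value lies outside
    [Q1 - k*IQR, Q3 + k*IQR]. The IQR rule is an assumed technique here."""
    q1 = series.quantile(0.25)
    q3 = series.quantile(0.75)
    iqr = q3 - q1
    return (series < q1 - k * iqr) | (series > q3 + k * iqr)


quantity_outliers = iqr_outlier_mask(retail_df["Quantity"])
price_outliers = iqr_outlier_mask(retail_df["UnitPrice"])

# Rows flagged in either column; whether to drop or cap them is a separate decision.
flagged_rows = retail_df[quantity_outliers | price_outliers]
print(flagged_rows)
```

Returning a mask rather than a filtered frame keeps the treatment step (dropping, capping, or keeping the rows) separate from detection.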

Figure 4: Basic Statistics with Retail Dataset

There are 5268 duplicate records out of 541909 rows. We can

5.2.8 Missing Value Treatment

536641-retail_df.count ()

5.2.11 Dimensionality Reduction

retail_df_cust_analysis_1: This dataset contains data of

retail_df_cust_analysis_2: This dataset contains all customers’

Figure 9 shows the heat map.

Figure 7: Histogram on Quantity Column

Observation: Histogram on Quantity column shows that most of

i. How many Cancelled Orders do we have?

InvoiceNo starting with the letter "C" is treated as Cancelled

ii. What is the Percentage of Cancelled Orders?
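
The two questions above can be sketched in pandas as follows. This is a minimal illustration, not the paper's own code: the toy `retail_df` is made-up data, and counting distinct invoice numbers (rather than rows) as orders is an assumption. The column name `InvoiceNo` and the rule that a leading "C" marks a cancellation come from the text.

```python
import pandas as pd

# Hypothetical toy data standing in for the retail dataset.
retail_df = pd.DataFrame({
    "InvoiceNo": ["536365", "536365", "C536379", "536380", "C536383"],
    "Quantity": [6, 8, -1, 24, -1],
})

# An invoice is treated as cancelled when its number starts with "C".
is_cancelled = retail_df["InvoiceNo"].astype(str).str.startswith("C")

# Assumption: one order corresponds to one distinct InvoiceNo.
total_orders = retail_df["InvoiceNo"].nunique()
cancelled_orders = retail_df.loc[is_cancelled, "InvoiceNo"].nunique()
cancelled_pct = 100 * cancelled_orders / total_orders

print(f"Cancelled orders: {cancelled_orders} of {total_orders} ({cancelled_pct:.1f}%)")
```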

Observations: The average number of orders per customer is 5.

5.4.2 Analysis Based On Products/Items

i. What's the average number of unique items per order?

ii. What is the average revenue per customer?
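
A minimal sketch of how these two averages might be computed with pandas. The toy `retail_df` is made-up data, and the column names `StockCode` and `CustomerID` are assumptions (they do not appear in the text above); `InvoiceNo`, `Quantity` and `UnitPrice` do. Revenue is taken as quantity times unit price, which is also an assumption about the authors' definition.

```python
import pandas as pd

# Hypothetical toy data; StockCode and CustomerID are assumed column names.
retail_df = pd.DataFrame({
    "InvoiceNo":  ["A1", "A1", "A1", "A2", "A3", "A3"],
    "StockCode":  ["P1", "P2", "P2", "P1", "P3", "P4"],
    "CustomerID": [1, 1, 1, 1, 2, 2],
    "Quantity":   [2, 1, 1, 5, 3, 1],
    "UnitPrice":  [1.5, 4.0, 4.0, 1.5, 2.0, 10.0],
})

# Count distinct items within each order, then average across orders.
items_per_order = retail_df.groupby("InvoiceNo")["StockCode"].nunique()
avg_unique_items = items_per_order.mean()

# Revenue per line = quantity x unit price; sum per customer, then average.
retail_df["Revenue"] = retail_df["Quantity"] * retail_df["UnitPrice"]
revenue_per_customer = retail_df.groupby("CustomerID")["Revenue"].sum()
avg_revenue_per_customer = revenue_per_customer.mean()

print(f"Average unique items per order: {avg_unique_items:.2f}")
print(f"Average revenue per customer: {avg_revenue_per_customer:.2f}")
```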

Figure 12 shows the output of the above command.

Figure 14: Products Most People Buy in the UK.

Observations: Popcorn holder is the most frequently ordered

i. What is the average monthly revenue in UK?

ii. Which products are most bought in UK?
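
The UK questions above could be approached along these lines. This is a sketch under assumptions: the toy `retail_df` is made-up data, and the column names `Country`, `InvoiceDate` and `Description`, as well as the country label "United Kingdom", are assumed rather than taken from the text. "Most bought" is read here as largest total quantity; ranking by how often a product appears on orders (for example with `value_counts()`) is an equally plausible reading.

```python
import pandas as pd

# Hypothetical toy data; Country, InvoiceDate and Description are assumed columns.
retail_df = pd.DataFrame({
    "InvoiceDate": pd.to_datetime(
        ["2011-01-05", "2011-01-20", "2011-02-03", "2011-02-10", "2011-02-15"]
    ),
    "Country": ["United Kingdom", "United Kingdom", "France",
                "United Kingdom", "United Kingdom"],
    "Description": ["POPCORN HOLDER", "MUG", "MUG",
                    "POPCORN HOLDER", "POPCORN HOLDER"],
    "Quantity": [10, 2, 5, 4, 6],
    "UnitPrice": [1.0, 5.0, 5.0, 1.0, 1.0],
})

# Keep only UK rows; copy so that adding a column does not touch retail_df.
retail_uk = retail_df[retail_df["Country"] == "United Kingdom"].copy()
retail_uk["Revenue"] = retail_uk["Quantity"] * retail_uk["UnitPrice"]

# Sum revenue per calendar month, then average over the months present.
month = retail_uk["InvoiceDate"].dt.to_period("M")
monthly_revenue = retail_uk.groupby(month)["Revenue"].sum()
avg_monthly_revenue = monthly_revenue.mean()

# Rank products by total quantity sold in the UK.
top_products = (
    retail_uk.groupby("Description")["Quantity"].sum().sort_values(ascending=False)
)

print(f"Average monthly UK revenue: {avg_monthly_revenue:.2f}")
print(top_products.head())
```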

retail_uk =

Figure 15: Active Customers Year-Month Wise

Actionable Insights

Journal of Composition Theory, Volume XIV, Issue V,
