ISSN 2278-3091

Volume 10, No.3, May - June 2021


Parvatham Niranjan Kumar et al., International Journal of Advanced Trends in Computer Science and Engineering, 10(3), May - June 2021, 1920 – 1927
Available Online at http://www.warse.org/IJATCSE/static/pdf/file/ijatcse601032021.pdf
https://doi.org/10.30534/ijatcse/2021/601032021

Systematic Approach to Perform Task Centric Exploratory Data Analysis with Case Study

Parvatham Niranjan Kumar 1, Kambhampati Vijay Kumar 2
1 Assistant Professor, Anurag Engineering College, Kodad, India, [email protected]
2 Assistant Professor, Anurag Engineering College, Kodad, India, [email protected]

ABSTRACT

Exploratory data analysis is a method for summarizing the main characteristics of data and for understanding data more deeply using visualization techniques. This paper focuses on defining a systematic approach, in the form of a well-defined sequence of steps, for exploring data from various aspects. Every organization produces a lot of data and needs to analyze this data very carefully to extract the hidden patterns in it. Task Centric EDA [2] produces actionable insights as its outcome to improve business processes. This work uses the Python programming language and Jupyter Notebook for data analysis. Python is an object-oriented, interactive programming language that offers a rich set of libraries such as pandas, Matplotlib, and seaborn [10]. We have used different types of charts and various parameters to analyze a retail dataset and to improve sales using precision marketing.

Keywords: Exploratory data analysis, machine learning, EDA, seaborn, matplotlib, precision marketing

1. INTRODUCTION

Data is growing very fast in today’s world. Every organization produces, and also depends on, a lot of data in its everyday processes. It is not easy to process this data manually. Organizations need to understand data carefully before making assumptions or decisions based on it.

There are three motivations for analyzing data: first, to understand what has happened or what is happening; second, to predict what is likely to happen in the near future; and third, to guide us in making decisions.

Data analysis and visualization tools help us to understand data much more deeply. Data analytics allows organizations to understand their business efficiency and performance, and also helps in making informed decisions. For example, an e-commerce store might be interested in analyzing customer attributes in order to target ads at particular categories of customers and improve sales.

Exploratory Data Analysis (EDA) is an approach that uses both descriptive statistics and graphical tools to gain a better understanding of data. Graphs are especially important because humans are much better at seeing patterns in graphs than in large collections of numbers.

EDA is treated as the art of looking at data in an effort to understand the underlying structure of the dataset. It enables data analysts and data scientists to bring the right information to the right people. It can be considered the most important step on which a data-driven organization should focus.

EDA helps to summarize the statistical characteristics of a dataset by focusing on key aspects such as measures of central tendency (the mean, mode, and median), measures of spread (standard deviation and variance), and the shape of the distribution. The rest of the paper is organized as follows. Section 2 presents the significance of EDA. Section 3 explains the steps in EDA and various techniques for exploratory data analysis. Section 4 discusses how to conduct exploratory data analysis using Python and Jupyter Notebook, while Section 5 presents how to work with datasets to conduct exploratory data analysis with a case study. Finally, Section 6 presents the concluding remarks.

2. THE SIGNIFICANCE OF EDA

Data-driven organizations should make appropriate and well-founded decisions using the huge amounts of data collected from various sources. It is nearly impossible to make sense of datasets containing more than a handful of data points without using computing tools. Exploratory Data Analysis is the key, and it is the first step in the data mining process.

The key components of Exploratory Data Analysis include summarizing data, statistical analysis, and visualization of data. The insights collected by exploring the data help us to make further decisions.

EDA reveals the ground truth about the content without making any underlying assumptions. This is why data scientists use this process to understand what types of models and hypotheses can be created for further analysis [13].

3. STEPS IN EDA

Having understood what EDA is, and its significance, let’s look at the various steps involved in data analysis. It involves a sequence of steps [13]. Let’s go through each of them to get a brief understanding of each step.

3.1 Problem Definition

Before trying to perform analysis to extract useful insights from the data, it is essential to define the business problem to be solved. The problem definition is the driving force for a data analysis. The following activities are involved in this step.

• Defining the main objective of the analysis
• Obtaining the current status of the data
• Outlining the main roles and responsibilities
• Creating an execution plan

3.2 Data Preparation

This step involves getting the dataset ready for the actual analysis. In this step, we try to digest the schema of the dataset (column names and their data types) and the main characteristics of the data. The dataset is cleaned by removing non-relevant data, transforming the data, and dividing the data into the chunks required for analysis. The following activities are done in this step.

• Identifying the shape of the dataset
• Identification of variables and data types
• Analyzing the basic metrics
• Detecting invalid data types of variables
• Variable transformations
• Missing value treatment
• Outlier treatment
• Dimensionality reduction

3.3 Data Analysis

This is one of the most crucial steps; it deals with descriptive statistics and analysis [5] of the data. The following visual methods help in summarizing the data and in finding hidden correlations and relationships among the data.

3.3.1 Graphical Univariate Analysis

Univariate analysis of data is done using only one variable. Since it involves a single variable, it doesn’t deal with relationships among variables. Univariate analysis describes patterns that exist within the data. Line charts and histograms are used for performing univariate analysis.

3.3.2 Bivariate Analysis

Bivariate analysis aims to understand the relationship between two columns. There are many visualizations for performing bivariate analysis. For example, a scatter plot can be drawn to check for a linear relationship between year_built and house_price, and a hexbin plot can be used to check the distribution of price in different year ranges.

3.3.3 Correlation Analysis

Correlation analysis is commonly used to find important features or to identify redundant features. A heatmap displays the correlation matrix used to perform correlation analysis, where each cell in the matrix shows the correlation between two columns. It shows which features are strongly correlated with the target variable and which pairs of features are highly correlated with each other.

3.4 Exploring Data

This is the heart of the entire EDA process. In this step, a sequence of relevant questions is prepared as a storytelling process to explore the data and to answer the following questions [14].

• What happened, or what is happening?
• What is likely to happen in the near future?
• Which actionable insights can help the organization make informed decisions?

This set of questions helps us to extract hidden patterns in the data and to find a solution for the given problem.

3.5 Conclusion with Actionable Insights

This step involves presenting the analysis results to the target audience in the form of graphs, summary tables, maps, and diagrams. This step suggests actionable insights, which are interpreted by the business stakeholders to improve business processes.

4. IMPLEMENTATION

4.1 Python

Python is a popular language for Exploratory Data Analysis. It has a rich set of libraries, and visualization makes it easier to create a clear report. Pandas is the most powerful package in Python for performing data analysis. It is built on top of the NumPy package. Matplotlib or seaborn can be used to draw plots.

4.2 Jupyter Notebook

Jupyter Notebook is a web-based interactive development environment for creating notebook documents. A Jupyter Notebook contains an ordered list of input and output cells that can contain code, Markdown text, mathematical expressions, plots, charts, and media. Jupyter Notebooks use the .ipynb file extension. A Jupyter Notebook is a great way to build step-by-step interactive Python programs. The technology is particularly well-suited for data analysis and plotting.

5. WORKING WITH THE DATA SET

It’s time to explore the data and find out more about it. The data we are using is a retail dataset. We are going to analyze this data with a possible set of options.

5.1 Problem Definition

Nowadays, it has been recognized that precision marketing plays a key role in generating profit.


Precision marketing makes it possible to provide personalized customer service, and customers are better informed about the products that they like. It helps enterprises to gain profits through high-efficiency marketing.

The accelerated pace of economic globalization and increasing market competition have led enterprise managers to face the problem of choosing the right strategic decision-making policies for selling the right products to the right customers at the right time.

The main objective of the EDA on this dataset is to help managers identify the potential characteristics of different types of customers and put forward appropriate precision marketing strategies, which can greatly optimize inventory for every customer type.

The availability of customer data and transaction records provides a better understanding of customers’ consumption behaviors and preferences. Real-world data from a company in the UK were collected and used in this case study. Exploratory Data Analysis on this dataset should extract the following patterns from the data.

• Identify important attributes to distinguish different customer groups.
• Discover transactional patterns of different customers.
• Verify the assumption that studying cancelled orders/invoices may help in preventing future cancellations.
• Get an overview of the general customers’ purchase behavior.

5.2 Data Preparation

Importing Packages: Python packages can be imported as shown in Figure 1.

Figure 1: Python Statements to Import Packages

5.2.1 Loading Data

retail_df = pd.read_csv('online_retail.csv')

5.2.2 Identifying Shape of Dataset

retail_df.shape
(541909, 8)

The dataset has 541909 rows and 8 columns.

5.2.3 Identification of Variables and Data Types

The following command produces the output shown in Figure 2.

retail_df.info()

Figure 2: Columns and their Types in Retail Dataset

5.2.4 Analyzing the Basic Metrics

retail_df.describe()

Figure 3 shows the output of the above command.

Figure 3: Basic Statistics with Retail Dataset

5.2.5 Detecting Invalid Data Type of Variables

All columns are assigned appropriate data types based on the values that they contain, except CustomerID. It is better to declare CustomerID as an int type.

5.2.6 Variable Transformations

The CustomerID column is transformed from float type to int type.

retail_df_ye_cust_analysis2 = retail_df_ye_cust_analysis.astype({'CustomerID': 'int32'})

5.2.7 Removing Duplicate Rows

We can list duplicate records using the following command.

duplicates = retail_df[retail_df.duplicated()]


duplicates

Figure 4: Basic Statistics with Retail Dataset

There are 5268 duplicate records out of 541909 rows. We can remove them using the following command.

retail_df = retail_df.drop_duplicates()

5.2.8 Missing Value Treatment

# 536641 rows remain after de-duplication; subtracting the non-null
# count of each column gives the number of missing values per column
536641 - retail_df.count()

Figure 5 shows the output of the above command.

Figure 5: Missing Values in Description and CustomerID Columns

Observation: Only two variables, Description and CustomerID, have missing values. We can drop the rows with a missing Description because they have no corresponding CustomerID and UnitPrice values. The following command removes these rows.

retail_df.dropna(subset=['Description'], inplace=True)

5.2.9 Checking Null Values

The CustomerID column has null values. The following command displays the records in which CustomerID is null.

retail_df[retail_df['CustomerID'].isnull()]

5.2.10 Outlier Treatment

We have outliers in the UnitPrice and Quantity columns. A boxplot [3] can display the outliers in a column. Figure 6 shows the outliers in the Quantity column. These outliers can be eliminated using a z-score threshold. Eliminating outliers gives more accurate results in the analysis.

Figure 6: Boxplot Shows Outliers in Quantity Column

5.2.11 Dimensionality Reduction

In this step two copies of the retail dataset are created. They are:

retail_df_cust_analysis_1: This dataset contains the data of customers whose data is available completely. Records with a null CustomerID are removed.

retail_df_cust_analysis_1 = retail_df.copy()
retail_df_cust_analysis_1.dropna(subset=['CustomerID'], inplace=True)
retail_df_cust_analysis_1.shape
(401604, 8)

retail_df_cust_analysis_2: This dataset contains all customers’ data, including records with a null CustomerID. This dataset is used when CustomerID is not needed in the analysis.

retail_df_cust_analysis_2 = retail_df.copy()
retail_df_cust_analysis_2.shape
(535187, 8)

5.3 Data Analysis

5.3.1 Univariate Analysis

Univariate analysis shows the distribution of data points in a column. Various visualization techniques [4] are available for performing univariate analysis. Figure 7 shows a histogram on the Quantity column.

bins = np.linspace(0, 2000, 100)
# number of distinct products per invoice, excluding cancelled invoices
groupby_inv = pd.DataFrame(retail_df_cust_analysis_2[~(retail_df_cust_analysis_2['InvoiceNo'].astype('str').str.contains('C'))].groupby('InvoiceNo')['StockCode'].nunique())
# name the column so that it matches the plotting call below
groupby_inv.columns = ['Number of products']
groupby_inv['Number of products'].plot.hist(bins=bins, figsize=(15, 7), color='brown', xticks=bins)


Figure 7: Histogram on Quantity Column

Observation: The histogram on the Quantity column shows that most of the customers buy fewer than 20 items.

5.3.2 Bivariate Analysis

Bivariate analysis shows the relationship among columns in the dataset.

# Revenue = Quantity * UnitPrice (computed in Section 5.4.3)
r_c = retail_df_cust_analysis_1.groupby('Country')
r_c_r = r_c['Revenue'].sum()
r_c_r2 = r_c_r.reset_index().sort_values(by=['Revenue'], ascending=False)
plt.scatter(r_c_r2['Country'], r_c_r2['Revenue'])
plt.xticks(r_c_r2['Country'], rotation='vertical')
plt.show()

Observation: The scatter plot between the Revenue and Country columns is shown in Figure 8. It shows that most of the revenue is generated by customers in the United Kingdom.

Figure 8: Scatter Plot Shows Revenue from Various Countries

5.3.3 Correlation Analysis

A heat map displays the correlation matrix, where each cell in the matrix shows the correlation between two columns. Figure 9 shows the heat map.

# correlate the numeric columns only
corr_mat = retail_df_cust_analysis_1.select_dtypes('number').corr()
plt.figure(figsize=(8, 8))
sns.heatmap(corr_mat, annot=True, cmap='viridis')
plt.title("Heat plot shows correlation among columns in the dataset")

Figure 9: Heat Map on Retail Dataset

Observation: The heat map shows a strong correlation between Quantity and Revenue.

5.4 Data Exploration

This is the heart of the entire EDA process. Data exploration is a storytelling process of asking a relevant sequence of questions. These questions reveal patterns related to customers’ consumption behaviors and preferences.

5.4.1 Analysis Based on Customer Transactions

i. How many cancelled orders do we have?

An InvoiceNo starting with the letter "C" is treated as a cancelled order.

cancelled_orders = retail_df_cust_analysis_1[retail_df_cust_analysis_1['InvoiceNo'].astype('str').str.contains('C')]
cancelled_orders.shape
(8872, 9)

There are 8872 rows belonging to cancelled orders.

ii. What is the percentage of cancelled orders?

total_orders = retail_df_cust_analysis_1['InvoiceNo'].nunique()
can_orders = cancelled_orders['InvoiceNo'].nunique()
can_orders * 100 / total_orders
16.466876971608833

The percentage of cancelled orders is 16.466876971608833.
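As a self-contained illustration of the cancellation-rate calculation, the sketch below uses a small made-up frame in place of the retail data. The invoice numbers and the names toy, is_cancelled, cancelled_count, and cancelled_pct are assumptions introduced only for this example. It uses str.startswith('C'), which follows the stated rule (an InvoiceNo starting with "C") more strictly than str.contains('C').

```python
import pandas as pd

# Toy invoice lines; all values are made up for illustration only.
toy = pd.DataFrame({
    "InvoiceNo": ["536365", "536365", "C536379", "536380", "C536391", "536392"],
    "Quantity": [6, 8, -1, 24, -12, 3],
})

# startswith("C") matches only a leading "C";
# contains("C") would also match a "C" anywhere in the string.
is_cancelled = toy["InvoiceNo"].astype(str).str.startswith("C")

# Count distinct invoices, not rows, because one invoice spans several rows.
total_orders = toy["InvoiceNo"].nunique()
cancelled_count = toy.loc[is_cancelled, "InvoiceNo"].nunique()
cancelled_pct = cancelled_count * 100 / total_orders

print(f"{cancelled_count} of {total_orders} invoices cancelled ({cancelled_pct:.1f}%)")
```

On the real data the same steps would be applied to retail_df_cust_analysis_1; the toy result says nothing about the actual cancellation rate.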

Observations: Almost 16% of the orders are cancelled, which is a pretty big number for an online retailer. Studying these cancelled orders may help in preventing future cancellations. Let's first get an overview of the general customers’ purchase behavior and then dig deeper.

iii. What's the average number of orders per customer?

groupby_cust = pd.DataFrame(retail_df_cust_analysis_1.groupby('CustomerID')['InvoiceNo'].nunique())
groupby_cust['InvoiceNo'].mean()
5.07548032936871

Observations: The average number of orders per customer is 5. As we found in the descriptive statistics, customers buy an average (mean) quantity of 5. Are they all of the same product? Let's examine how many products are purchased.

5.4.2 Analysis Based on Products/Items

i. What's the average number of unique items per order?

groupby_inv = pd.DataFrame(retail_df_cust_analysis_2[~(retail_df_cust_analysis_2['InvoiceNo'].astype('str').str.contains('C'))].groupby('InvoiceNo')['StockCode'].nunique())
groupby_inv.median()

Observation: The median number of unique items per order is 15.0.

ii. How many products does a customer buy on average?

Observations: Figure 7 shows a skewed distribution of products. It shows that most people buy fewer than 20 items.

5.4.3 Analysis Based on Revenue

i. What is the total revenue generated by the online retailer?

retail_df_cust_analysis_1['Revenue'] = retail_df_cust_analysis_1['Quantity'] * retail_df_cust_analysis_1['UnitPrice']
retail_df_cust_analysis_1['Revenue'].sum()
8278519.4240000015

ii. What is the average revenue per customer?

groupby_cust = pd.DataFrame(retail_df_ye_cust_analysis.groupby('CustomerID')['Revenue'].sum())
groupby_cust['Revenue'].mean()
1893.5314327538888

Observations: On average, a revenue of 1893.5 is generated per customer.

iii. What is the total revenue per country?

groupby_country = pd.DataFrame(retail_df_cust_analysis_2.groupby('Country')['Revenue'].sum())
groupby_country

Figure 10 shows the output of the above command.

Figure 10: Revenue from Various Countries

Observations: As we can see, the largest market is the one located in the UK. So, we can conclude that not only are most sales revenues achieved in the UK, but most customers are located there too. We can explore this further to find out which products the customers buy together and what future opportunities there are in the UK market.

iv. What is the monthly revenue of the online store?

Revenue (monthly) = sum of Quantity * UnitPrice over all invoices in the month

groupby_month = pd.DataFrame(retail_df_cust_analysis_2.groupby(['year', 'month'])['Revenue'].sum())
groupby_month

Figure 11 shows the output of the above command.

Figure 11: Year-Month Wise Revenue
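The commands above group by year, month, and Revenue columns without showing how they are derived. The sketch below shows one way this might be done on a small made-up frame. The InvoiceDate column name and its timestamp format are assumptions (the text does not name the date column), as are the toy values and the name monthly_revenue.

```python
import pandas as pd

# Toy invoice lines; the InvoiceDate column and all values are assumptions.
toy = pd.DataFrame({
    "InvoiceDate": ["2010-12-01 08:26", "2010-12-15 10:00",
                    "2011-01-04 09:30", "2011-01-20 14:45"],
    "Quantity": [6, 2, 10, 4],
    "UnitPrice": [2.5, 4.0, 1.5, 3.0],
})

# Parse the timestamps so that the .dt accessor is available.
toy["InvoiceDate"] = pd.to_datetime(toy["InvoiceDate"])

# Line-level revenue, then the calendar parts used as grouping keys.
toy["Revenue"] = toy["Quantity"] * toy["UnitPrice"]
toy["year"] = toy["InvoiceDate"].dt.year
toy["month"] = toy["InvoiceDate"].dt.month

# Monthly revenue is the sum of line-level revenue within each (year, month).
monthly_revenue = toy.groupby(["year", "month"])["Revenue"].sum()
print(monthly_revenue)
```

Grouping on separate year and month columns keeps the same months of different years apart, which matters here because the data spans a year boundary.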


Observations:

• Monthly revenue for September, October, and November is comparatively high.
• A likely reason is that Halloween, Black Friday, and Thanksgiving sales fall around these months, so customers tend to buy more gifts.

v. What is the monthly growth rate for the online retail store?

groupby_month['Growth'] = groupby_month['Revenue'].pct_change() * 100

groupby_month[['Revenue']].plot.line(figsize=(15, 7), color='green', fontsize=13, linestyle='-.')
plt.xlabel('time')
plt.ylabel('Revenue')
plt.title('Line chart')

Figure 12 shows the output of the above command.

Figure 12: Monthly Growth Rate for the Online Retail Store

Observations: The growth rate of the online store appears to fluctuate from month to month rather than remaining steady.

5.4.4 Analysis Based on Geographical Location

i. What is the average monthly revenue in the UK?

groupby_month_uk = pd.DataFrame(retail_df_cust_analysis_2[retail_df_cust_analysis_2['Country'] == 'United Kingdom'].groupby(['year', 'month'])['Revenue'].sum())
groupby_month_uk

Figure 13: Monthly Revenue in UK

ii. Which products are most bought in the UK?

retail_uk = retail_df_cust_analysis_2[retail_df_cust_analysis_2['Country'] == 'United Kingdom']
group_by_stock_desc = pd.DataFrame(retail_uk.groupby(['StockCode', 'Description'], as_index=False)['Quantity'].sum())
# most bought products in the UK
temp_df = group_by_stock_desc.sort_values(by='Quantity', ascending=False)
temp_df[temp_df['StockCode'] == 22197]

Figure 14: Product Most People Buy in UK

Observations: The popcorn holder is the most frequently ordered item.

iii. How many monthly active customers are there from the UK?

groupby_month_uk = pd.DataFrame(retail_uk.groupby(['year', 'month'])['CustomerID'].nunique())
groupby_month_uk

Figure 15: Active Customers Year-Month Wise
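The two geographic questions above (most-bought product and monthly active customers) can be tried out on a toy frame before running them on the full dataset. The sketch below is illustrative only: the customer IDs, stock codes, and quantities are made up, and the names toy_uk, active, and top are assumptions that do not appear in the original analysis.

```python
import pandas as pd

# Toy UK invoice lines; every value here is made up for illustration.
toy_uk = pd.DataFrame({
    "year": [2011, 2011, 2011, 2011, 2011],
    "month": [1, 1, 1, 2, 2],
    "CustomerID": [1, 1, 2, 2, 3],
    "StockCode": ["A1", "B2", "A1", "A1", "B2"],
    "Quantity": [10, 3, 5, 7, 2],
})

# Monthly active customers: distinct CustomerID values per (year, month).
active = toy_uk.groupby(["year", "month"])["CustomerID"].nunique()

# Most-bought products: total quantity per StockCode, largest first.
top = (
    toy_uk.groupby("StockCode", as_index=False)["Quantity"]
    .sum()
    .sort_values(by="Quantity", ascending=False)
)

print(active)
print(top.head(1))
```

Using nunique() rather than count() matters because a customer with several invoice lines in one month should be counted as active only once.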


Actionable Insights

• By analyzing the data in this way, we can uncover groups of customers that behave in similar ways. This level of customer segmentation is useful in precision marketing.
• A marketing campaign that works for a group of customers that places low-value orders frequently may not be suitable for customers who place sporadic high-value orders.
• Make relevant product recommendations to the customers using precision marketing.
• Empower your customers to actively share their details, and encourage them to share their data with you through conversations, surveys, and other methods. Doing so not only helps you to know them better, but also builds trust.
• Additionally, before performing the analysis it is important to talk with the e-commerce team to understand the business, its customers, and its strategic and tactical objectives.

6. CONCLUSION

This paper explained exploratory data analysis in detail. It focused mainly on the significance of EDA and on a systematic approach, i.e., a sequence of steps to be followed to extract interesting hidden patterns in a dataset. We took a retail dataset as a case study to implement Task Centric EDA. We applied EDA to this dataset to identify the potential characteristics of different customer categories and to put forward appropriate precision marketing strategies to improve sales. The Python programming language and Jupyter Notebook were used to analyze the data and to draw various charts. Finally, the outcome of EDA is a set of concluding remarks and actionable insights to improve business processes.

REFERENCES

7.1 Journal Articles

1. Tristan Langer and Tobias Meisen, System Design to Utilize Domain Expertise for Visual Exploratory Data Analysis, Information, 2021.
2. Jinglin Peng, Weiyuan Wu, Brandon Lockhart, Song Bian, Jing Nathan Yan, Linghao Xu, Zhixuan Chi, Jeffrey M. Rzeszotarski, Jiannan Wang, DataPrep.EDA: Task-Centric Exploratory Data Analysis for Statistical Modeling in Python.
3. Babangida Ibrahim Babura, Mohd Bakri Adam, Muhammad Sani, Usman Waziri, Felix Yakubu Eguda, Construction and Applications of Stairboxplot for Exploratory Data Analysis, Journal of Physics: Conference Series, ICMSDS 2020.
4. Parvatham Niranjan Kumar, Kambhampati Vijay Kumar, Comparative Study of Univariate Data Visualization with Case Study Approach, JAC: A Journal of Composition Theory, Volume XIV, Issue V, May 2021, pp. 82-92.
5. J. Rajendra, Prasad, S. SaiKumar, B. V. SubbaRao, Design and Development of Financial Fraud Detection using Machine Learning, International Journal of Emerging Trends in Engineering Research, Volume 8, No. 9, September 2020, pp. 5838 – 5843.
6. Pujo Hari Saputro, Herlino Nanang, Exploratory Data Analysis & Booking Cancelation Prediction on Hotel Booking Demands Data, Journal of Applied Data Sciences, Vol. 2, No. 1, January 2021, pp. 40.
7. Behrens, J. T. (1997). Principles and procedures of exploratory data analysis. Psychological Methods, 2(2), 131–160. https://doi.org/10.1037/1082-989X.2.2.131.
8. Smith, A. F., & Prentice, D. A. (1993). Exploratory data analysis. In G. Keren & C. Lewis (Eds.), A handbook for data analysis in the behavioral sciences: Statistical issues (p. 349–390). Lawrence Erlbaum Associates, Inc.
9. Estelle Camizuli, Emmanuel John Carranza, Exploratory Data Analysis (EDA), First published: 26 November 2018, https://doi.org/10.1002/9781119188230.saseas0271
10. Ren Jie Tan, A Starter Pack to Exploratory Data Analysis with Python, pandas, seaborn, and scikit-learn.
11. Jitendra Pramanik, Abhaya Kumar Samal, Kabita Sahoo, Dr. Subhendu Kumar Pani, Exploratory Data Analysis using Python, International Journal of Innovative Technology and Exploring Engineering (IJITEE), ISSN: 2278-3075, Volume-8, Issue-12, October 2019.
12. I. J. Good, The Philosophy of Exploratory Data Analysis, Philosophy of Science, Volume 50, Number 2, June 1983.

7.2 Books

13. Exploratory Data Analysis with MATLAB, Wendy L. Martinez, Angel R. Martinez, Jeffrey Solka.
14. Exploratory data analysis as a foundation of inductive research, Andrew T. Jebb, Scott Parrigon, Sang Eun Woo, Human Resource Management Review, Volume 27, Issue 2, June 2017, pp. 265-276.
15. Hands-On Exploratory Data Analysis with Python: Perform EDA, Suresh Kumar Mukhiya, Usman Ahmed, 2020, PACKT Publishing Ltd.
16. Exploratory Data Analysis Using R, Ronald K. Pearson, 2018, CRC Press.
17. Hands-On Exploratory Data Analysis with R, Radhika Datar, Harish Garg, May 2019, PACKT Publishing Ltd.
18. Exploratory Data Analysis, Volume 2, Page 1, John Wilder Tukey, 1977.