Data Analytics Process

Data Analytics is the process of collecting and analyzing data to derive insights for better decision-making. It encompasses various types such as descriptive, diagnostic, predictive, and prescriptive analytics, and follows a structured process including data collection, cleansing, analysis, and visualization. Exploratory Data Analysis (EDA) is crucial for understanding datasets, identifying patterns, and preparing data for modeling, ultimately leading to informed predictions.

Uploaded by

vsingh701623

Unit 2

Daxa Patel
Assistant Professor
Computer Science & Engineering Department
What is Data Analytics
• Data analytics is the process of
collecting, organizing and studying
data to find useful information,
understand what is happening and
make better decisions.
• In simple terms, it helps people and
businesses learn from data: what
worked in the past, what is
happening now and what might
happen in the future.

Importance and Usage of Data Analytics
• Data analytics is used in many fields like banking, farming, shopping,
government and more. It helps in many ways:

• Helps in Decision Making: It gives clear facts and patterns from data
which help people make smarter choices.
• Helps in Problem Solving: It points out what is going wrong and why,
making it easier to fix problems.
• Helps Identify Opportunities: It shows trends and new chances for
growth that might not be obvious.
• Improved Efficiency: It helps reduce waste, saves time and makes
work smoother by finding better ways to do things.

Types of Data Analytics

1. Descriptive Data Analytics: Descriptive data analytics summarizes
past data to show what has happened, using tables, charts and
averages. Companies use it to compare results, find strengths and
weaknesses and spot unusual patterns.
2. Diagnostic Data Analytics: Diagnostic data analytics looks at why
something happened in the past. It uses tools like correlation,
regression or comparison to narrow down the likely causes of a
problem. This helps companies understand the reason behind a drop
in sales or a sudden change in performance.

3. Predictive Data Analytics: Predictive data analytics is used to estimate
what might happen in the future. It looks at current and past data
to find patterns and make forecasts. Businesses use it to predict
things like customer behavior, future sales or possible risks.
4. Prescriptive Data Analytics: Prescriptive data analytics helps to
choose the best action or solution. It looks at different options and
suggests what should be done next. Companies use it for things like
loan approval, pricing decisions and managing machines or
schedules.

Process of Data Analytics
• Data analytics is carried out in the steps listed below:

• Data Collection : Data collection is the first step where raw
information is gathered from different places like websites, apps,
surveys or machines. Sometimes data comes from many sources and
needs to be joined together. Other times only a small useful part of
the data is selected.
• Data Cleansing : Once the data is collected it usually contains
mistakes like wrong entries, missing values or repeated rows. In this
step the data is cleaned to fix those problems and remove anything
that isn’t needed. Clean data makes the results more accurate and
trustworthy.

• Data Analysis and Data Interpretation: After cleaning, the data is
studied using tools like Excel, Python, R or SQL. Analysts look for
patterns, trends or useful information that can help solve problems or
answer questions. The goal here is to understand what the data is
telling us.
• Data Visualization: Data visualization is the process of creating visual
representations of data using plots, charts and graphs, which help
to reveal patterns and trends and to draw insights from the data.
By comparing datasets visually, analysts separate the useful
information from the raw data.
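
The steps above can be sketched in a few lines of pandas. This is only an illustration under stated assumptions: the two source tables, their column names and the choice to fill the missing value with the median are all invented here, not taken from the slides.

```python
import pandas as pd

# Data collection: two hypothetical sources joined into one table.
web_orders = pd.DataFrame({"region": ["North", "South", "South"],
                           "sales": [120.0, 80.0, 80.0]})
app_orders = pd.DataFrame({"region": ["North", "East"],
                           "sales": [None, 100.0]})
raw = pd.concat([web_orders, app_orders], ignore_index=True)

# Data cleansing: drop repeated rows, then fill the missing value
# with the median (one possible choice among several).
clean = raw.drop_duplicates().copy()
clean["sales"] = clean["sales"].fillna(clean["sales"].median())

# Data analysis: average sales per region.
summary = clean.groupby("region")["sales"].mean()
print(summary)

# Data visualization: summary.plot(kind="bar") would draw these
# averages as a bar chart (requires matplotlib).
```

Whether a missing value should be filled, dropped or investigated depends on the dataset; the median is used here only to keep the sketch short.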

Exploratory Data Analysis

• Exploratory Data Analysis (EDA) is an important step in data science:
it involves visualizing data to understand its main features, find
patterns and discover how different parts of the data are connected.

Why Is Exploratory Data Analysis Important?
• Exploratory Data Analysis (EDA) is important for several reasons in the context of data
science and statistical modeling. Here are some of the key reasons:

• It helps to understand the dataset by showing how many features it has, what type of
data each feature contains and how the data is distributed.
• It helps to identify hidden patterns and relationships between different data points,
which helps us in model building.
• It allows us to identify errors or unusual data points (outliers) that could affect our results.
• The insights gained from EDA help us to identify the most important features for building
models and guide us on how to prepare them for better performance.
• By helping us understand the data, it guides the choice of modeling techniques and how to
adjust them for better results.
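
A first look at a dataset along these lines might be sketched as follows. The small employee table and its column names are hypothetical, made up for this example.

```python
import pandas as pd

# Hypothetical employee records, invented for illustration.
df = pd.DataFrame({
    "age": [25, 32, 47, 51, None],
    "department": ["HR", "IT", "IT", "Sales", "HR"],
    "salary": [30000, 52000, 61000, 58000, 31000],
})

n_rows, n_features = df.shape          # how many rows and features
feature_types = df.dtypes              # what type of data each feature holds
missing_per_column = df.isna().sum()   # missing values per feature
numeric_summary = df.describe()        # count, mean, spread and quartiles

print(n_rows, n_features)
print(feature_types)
print(missing_per_column)
print(numeric_summary)
```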

1. Univariate Analysis
• Univariate analysis focuses on studying one variable to understand its
characteristics.
• It helps to describe data and find patterns within a single feature.
• Common methods include:
• histograms to show data distribution
• box plots to detect outliers and understand data spread
• bar charts for categorical data

• Dataset link : [Link]
1oNUIvyCi4SkqdRBtG0YVND/view

import pandas as pd
import seaborn as sns
# Load the employee dataset and preview the first five rows.
data = pd.read_csv('Employee_dataset.csv')
print(data.head())

• Here we perform univariate analysis on a numerical variable
using seaborn's histogram function.

sns.histplot(data['age'])

Bar Chart
• For univariate analysis of categorical data, we use the count plot
function from the seaborn library.

sns.countplot(x=data['gender_full'])

Pie Chart
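• A pie chart helps us to visualize the percentage of the data belonging
to each category.

import matplotlib.pyplot as plt

# Count how many rows fall in each category, then plot the shares.
x = data['STATUS_YEAR'].value_counts()
plt.pie(x.values,
        labels=x.index,
        autopct='%1.1f%%')
plt.show()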

Bivariate analysis
• Bivariate analysis is the simultaneous analysis of two variables.
• It explores the relationship between the two variables: whether an
association exists and how strong it is, or whether the two
variables differ and how significant those differences are.

The three main types we will see here are:
• Categorical vs. Numerical
• Numerical vs. Numerical
• Categorical vs. Categorical
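
Each of the three pairings has a simple numerical counterpart in pandas. The sketch below uses a hypothetical six-row table (column names and values are invented) and is one possible way to look at each case, not the only one.

```python
import pandas as pd

# Hypothetical data, invented for illustration.
df = pd.DataFrame({
    "gender": ["F", "M", "F", "M", "F", "M"],
    "department": ["IT", "IT", "HR", "HR", "IT", "HR"],
    "age": [25, 30, 35, 40, 45, 50],
    "salary": [30000, 35000, 40000, 45000, 50000, 55000],
})

# Categorical vs. numerical: compare a numeric feature across categories.
mean_salary_by_gender = df.groupby("gender")["salary"].mean()

# Numerical vs. numerical: Pearson correlation coefficient.
age_salary_corr = df["age"].corr(df["salary"])

# Categorical vs. categorical: contingency table of counts.
gender_by_department = pd.crosstab(df["gender"], df["department"])

print(mean_salary_by_gender)
print(age_salary_corr)
print(gender_by_department)
```

Graphically, these correspond roughly to a box plot per category, a scatter plot, and a grouped bar chart or heatmap of the contingency table.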

Multivariate Analysis
• It is an extension of bivariate analysis: it
involves multiple variables at the same time to find
correlations between them. Multivariate analysis is a set of
statistical models that examine patterns in multidimensional
data by considering several variables at once.

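from sklearn import datasets, decomposition
import seaborn as sns

# Load the iris data: four numeric features and a species label.
iris = datasets.load_iris()
X = iris.data
y = iris.target
# Project the four features onto two principal components.
pca = decomposition.PCA(n_components=2)
X = pca.fit_transform(X)
# Plot the two components, colored by species.
sns.scatterplot(x=X[:, 0], y=X[:, 1], hue=y)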

Qualitative vs. Quantitative Data

Methods of Data Analytics
• There are two types of methods in data analytics:

1. Qualitative Data Analytics

• Qualitative data analysis does not rely on statistics; it works with
words, pictures and symbols. Some common qualitative methods are:

• Narrative analysis is used for working with data acquired from diaries,
interviews and similar sources.
• Content analysis is used to analyze verbal data and behaviour.
• Grounded theory is used to explain a given event by systematically
studying the collected data.

2. Quantitative Data Analytics
• Quantitative data analytics works with data that is collected and
processed into numerical form. Some of the quantitative methods are
mentioned below:

• Hypothesis testing assesses whether the data set supports a stated hypothesis.

• Sample size determination decides how many observations to draw
from a large group so that the sample can be analysed in place of the
whole group.
• The average, or mean, is the sum of all the numbers in a
list divided by the number of items in that list.
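
As a rough illustration of the mean and of hypothesis testing, the sketch below compares two hypothetical groups of scores. The data is invented, and the permutation test is just one of several possible tests chosen here because it needs only NumPy; the slides do not name a specific test.

```python
import numpy as np

# Hypothetical scores for two groups, invented for illustration.
group_a = np.array([10, 11, 12, 13, 14, 15, 16, 17], dtype=float)
group_b = np.array([20, 21, 22, 23, 24, 25, 26, 27], dtype=float)

# Mean: sum of the values divided by the number of values.
mean_a = group_a.sum() / len(group_a)
mean_b = group_b.sum() / len(group_b)
observed_diff = abs(mean_a - mean_b)

# Permutation test of the null hypothesis "both groups come from the
# same distribution": shuffle the pooled values many times and count
# how often a random split differs at least as much as the real one.
rng = np.random.default_rng(0)
pooled = np.concatenate([group_a, group_b])
n_permutations = 2000
extreme = 0
for _ in range(n_permutations):
    shuffled = rng.permutation(pooled)
    diff = abs(shuffled[:len(group_a)].mean() - shuffled[len(group_a):].mean())
    if diff >= observed_diff:
        extreme += 1

p_value = (extreme + 1) / (n_permutations + 1)
print(mean_a, mean_b, p_value)
```

A small p-value suggests the difference in means is unlikely under the null hypothesis; the conventional 0.05 threshold is a convention, not a rule from the source.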

Quantitative Techniques
• Quantitative EDA techniques involve numerical summaries and
statistics to understand the data’s structure, central tendency, spread,
and relationships.
• These techniques help to:
• Get an overview of the dataset
• Identify patterns and trends
• Spot anomalies (like outliers)
• Prepare data for modeling

Technique | Purpose | Example Output
Mean, Median, Mode | Central tendency (average values) | Mean salary = ₹50,000
Minimum & Maximum Values | Range of values in a feature | Min age = 18, Max age = 65
Standard Deviation (SD) | Measures spread or variability | SD of income = ₹15,000
Variance | Square of standard deviation | High variance = more spread
Percentiles & Quartiles | Distribution split into 100 (percentiles) or 4 (quartiles) parts | Q1 = 25th percentile
Range (Max - Min) | Shows how spread out the values are | Age range = 47
Skewness | Shows asymmetry of data distribution | Skewness > 0 = right-skewed
Kurtosis | Measures tail heaviness of the distribution (often described as peak sharpness) | High kurtosis = heavy tails
Correlation Coefficient (r) | Measures linear relationship between two variables | r = 0.85 → strong positive correlation
Frequency Counts | Number of times a value/category appears | “Male”: 300, “Female”: 200
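
Most of these summaries are available directly as pandas methods. The sketch below applies them to a hypothetical table (values invented, chosen so that the age range matches the 18 to 65 example above).

```python
import pandas as pd

# Hypothetical data, invented for illustration.
df = pd.DataFrame({
    "age": [18, 22, 25, 25, 30, 41, 65],
    "salary": [20000, 26000, 30000, 31000, 38000, 52000, 90000],
    "gender": ["Male", "Female", "Male", "Male", "Female", "Male", "Female"],
})
age = df["age"]

stats = {
    "mean": age.mean(),
    "median": age.median(),
    "mode": age.mode().iloc[0],
    "min": age.min(),
    "max": age.max(),
    "range": age.max() - age.min(),
    "std": age.std(),            # sample standard deviation (ddof=1)
    "variance": age.var(),       # square of the standard deviation
    "q1": age.quantile(0.25),    # 25th percentile
    "skewness": age.skew(),      # > 0 means right-skewed
    "kurtosis": age.kurt(),      # excess kurtosis (normal distribution = 0)
}
r = age.corr(df["salary"])                    # Pearson correlation coefficient
gender_counts = df["gender"].value_counts()   # frequency counts

for name, value in stats.items():
    print(name, value)
print("r", r)
print(gender_counts)
```

Note that pandas reports sample (not population) standard deviation and variance by default, and `kurt()` returns excess kurtosis.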
Graphical EDA
• Graphical EDA refers to using visual methods to understand the
structure, distribution, trends, and relationships in data.
• It helps to see patterns, detect outliers, spot anomalies, and identify
correlations more effectively than numbers alone.

Common Graphical Techniques in EDA
Graph Type | Purpose | Best For
Histogram | Shows frequency distribution of a numeric variable | Understanding distribution
Box Plot | Shows median, quartiles, and outliers | Detecting outliers
Bar Chart | Displays categorical data as rectangular bars | Comparing categories
Pie Chart | Shows proportions of categories as slices of a circle | Visualizing part-to-whole
Line Plot | Shows trends over time | Time series data
Scatter Plot | Shows relationship between two numerical variables | Correlation/Regression analysis
Heatmap | Visualizes correlation matrix or patterns using color scale | Multivariate relationships
Pair Plot | Matrix of scatterplots for several variable pairs | Multivariate EDA

Example
• Histogram
• Purpose: Understand the
distribution (e.g., normal,
skewed)

import seaborn as sns

sns.histplot(data['age'])

Example
• Box Plot
• Purpose: Detect outliers,
compare distributions
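
The slide gives no code for the box plot, so here is a minimal sketch of the rule a conventional box plot uses to flag outliers: points more than 1.5 times the interquartile range beyond the quartiles. The salary values are hypothetical.

```python
import pandas as pd

# Hypothetical salaries (in thousands), invented for illustration.
salary = pd.Series([30, 32, 35, 36, 38, 40, 41, 120], name="salary_k")

q1 = salary.quantile(0.25)
q3 = salary.quantile(0.75)
iqr = q3 - q1

# Whisker limits under the conventional 1.5 * IQR rule.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Values outside the fences are the points a box plot draws individually.
outliers = salary[(salary < lower_fence) | (salary > upper_fence)]
print(outliers.tolist())
```

With seaborn imported as `sns`, `sns.boxplot(x=salary)` would draw the corresponding plot using the same 1.5 multiplier by default.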

Scatter Plot
• Purpose: Check linear or non-linear relationships
between two variables
sns.scatterplot(x='age', y='salary', data=data)

Heatmap
• Purpose: Show correlation matrix
# Correlation is only defined for numeric columns, so skip the others.
corr = data.corr(numeric_only=True)
sns.heatmap(corr, annot=True,
            cmap='coolwarm')

Why Use Graphical EDA?
• Easier to spot trends and anomalies
• Supports better decision making
• Helps in feature selection
• Improves data quality understanding

Graphical EDA | Quantitative EDA
Visual insights (charts, plots) | Numerical summaries/statistics
Good for spotting patterns | Good for measuring exact data characteristics
Examples: histogram, scatter plot | Examples: mean, SD, correlation, skewness

Conclusion in Data Analytics
• After analyzing a dataset using various quantitative and graphical
techniques (EDA), the conclusion is a summary of key insights and
patterns found in the data.

• Key aspects of the conclusion:

• What trends did you observe?
• Were there any outliers or missing values?
• Which variables are most important?
• Are there correlations between variables?
• Is there a potential for prediction or decision-making?

Example
• After analyzing customer churn data, we observed that customers
with long complaint histories and no service upgrades in the last 6
months are more likely to leave the service.

Predictions in Data Analytics
• Prediction is the process of using past data to make informed
forecasts about future or unknown data.
• Types of predictions:
• Classification: Predicting categories (e.g., will a customer buy or not? →
Yes/No)
• Regression: Predicting continuous values (e.g., predicting house prices)
• Clustering (optional): Grouping similar data, helpful in understanding
customer segments
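
To make the regression case concrete, the sketch below fits a straight line to hypothetical house sizes and prices and then predicts the price of an unseen house. The numbers are invented and deliberately exactly linear, so the fitted line is easy to verify; real data would be noisy.

```python
import numpy as np

# Hypothetical house sizes (square metres) and prices, invented for illustration.
size = np.array([50.0, 70.0, 90.0, 110.0, 130.0])
price = 2000.0 * size + 10000.0   # exactly linear, so the fit is easy to check

# Regression: fit a straight line price = slope * size + intercept.
slope, intercept = np.polyfit(size, price, deg=1)

# Predict a continuous value for an unseen 100 square metre house.
predicted_price = slope * 100.0 + intercept
print(round(slope), round(intercept), round(predicted_price))
```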

Example Techniques
• Logistic Regression
• Naïve Bayes Classifier
• Decision Trees
• K-Nearest Neighbors
• Linear Regression
• PCA (for preprocessing)

How Conclusion Leads to Prediction
Step | Description
EDA | Explore patterns, clean the data, understand relationships
Feature Selection | Choose the most relevant features
Model Building | Apply algorithms for prediction
Evaluation | Test model accuracy using test data
Prediction | Use the model to make forecasts on new data

Example
Problem: A company wants to predict whether a customer will subscribe
to a term deposit.
• Conclusion from EDA:
• Customers contacted by phone are more likely to subscribe.
• Age and job type influence decision.
• Prediction:
• Apply logistic regression using features like age, job, contact type, etc.
• Predict future customer behavior.
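
A minimal sketch of this example with scikit-learn is shown below. The customer data is synthetic and generated on the spot, the assumed effect sizes are invented, and job type is left out to keep the sketch short; with a real dataset, categorical features such as job would need to be encoded first.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic customers, generated for illustration only (not real bank data).
rng = np.random.default_rng(42)
n = 400
age = rng.integers(18, 70, size=n)
contacted_by_phone = rng.integers(0, 2, size=n)   # 1 = phone, 0 = other channel

# Assumed ground truth: phone contact and higher age raise the subscription odds.
logit = -3.0 + 2.0 * contacted_by_phone + 0.04 * age
subscribed = (rng.random(n) < 1.0 / (1.0 + np.exp(-logit))).astype(int)

# Feature selection: keep the two features highlighted by the EDA.
X = np.column_stack([age, contacted_by_phone])
y = subscribed

# Model building and evaluation on held-out test data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y)
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)
accuracy = accuracy_score(y_test, model.predict(X_test))

# Prediction: subscription probability for a new 45-year-old contacted by phone.
new_customer = np.array([[45, 1]])
probability = model.predict_proba(new_customer)[0, 1]
print(f"accuracy={accuracy:.2f}, probability={probability:.2f}")
```

Accuracy on a single split is only a rough check; other metrics or cross-validation may be more informative when the classes are imbalanced.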
