Unit 2
Daxa Patel
Assistant Professor
Computer Science & Engineering Department
What is Data Analytics
• Data Analytics is the process of
collecting, organizing and studying
data to find useful information,
understand what’s happening and
make better decisions.
• In simple words, it helps people and
businesses learn from data: what
worked in the past, what is
happening now and what might
happen in the future.
Importance and Usage of Data Analytics
• Data analytics is used in many fields like banking, farming, shopping,
government and more. It helps in many ways:
• Helps in Decision Making: It gives clear facts and patterns from data
which help people make smarter choices.
• Helps in Problem Solving: It points out what's going wrong and why,
making it easier to fix problems.
• Helps Identify Opportunities: It shows trends and new chances for
growth that might not be obvious.
• Improved Efficiency: It helps reduce waste, saves time and makes
work smoother by finding better ways to do things.
Types of Data Analytics
1. Descriptive Data Analytics: Descriptive data analytics helps to
summarize and understand past data. It shows what has happened
by using tables, charts and averages. Companies use it to compare
results, find strengths and weaknesses and spot any unusual
patterns.
2. Diagnostic Data Analytics: Diagnostic data analytics looks at why
something happened in the past. It uses tools like correlation,
regression or comparison to find the cause of a problem. This helps
companies understand the reason behind a drop in sales or a
sudden change in performance.
3. Predictive Data Analytics: Predictive data analytics is used to estimate
what might happen in the future. It looks at current and past data
to find patterns and make forecasts. Businesses use it to predict
things like customer behavior, future sales or possible risks.
4. Prescriptive Data Analytics: Prescriptive data analytics helps to
choose the best action or solution. It looks at different options and
suggests what should be done next. Companies use it for things like
loan approval, pricing decisions and managing machines or
schedules.
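As a minimal sketch of how descriptive and predictive analytics differ in practice, the snippet below summarizes a small monthly sales series (descriptive) and fits a straight-line trend to forecast the next month (predictive). The sales figures and all variable names are made up for illustration; they are not data from this unit.

```python
from statistics import mean

# Hypothetical monthly sales (illustrative values only)
months = [1, 2, 3, 4, 5, 6]
sales = [100, 110, 120, 130, 140, 150]

# Descriptive analytics: summarize what has already happened
average_sales = mean(sales)
best_month = months[sales.index(max(sales))]

# Predictive analytics: fit a least-squares line and extend it one month ahead
mean_x, mean_y = mean(months), mean(sales)
slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(months, sales)) / sum(
    (x - mean_x) ** 2 for x in months
)
intercept = mean_y - slope * mean_x
forecast_month_7 = slope * 7 + intercept

print(f"Average sales: {average_sales}")
print(f"Best month: {best_month}")
print(f"Forecast for month 7: {forecast_month_7}")
```

A straight-line trend is only one possible forecasting choice; real sales data usually needs a model that also handles seasonality and noise.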
Process of Data Analytics
• Data analytics is carried out in the following steps:
• Data Collection: Data collection is the first step where raw
information is gathered from different places like websites, apps,
surveys or machines. Sometimes data comes from many sources and
needs to be joined together. Other times only a small useful part of
the data is selected.
• Data Cleansing: Once the data is collected, it usually contains
mistakes like wrong entries, missing values or repeated rows. In this
step the data is cleaned to fix those problems and remove anything
that isn’t needed. Clean data makes the results more accurate and
trustworthy.
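The collection and cleansing steps can be sketched with pandas as follows. The two small tables stand in for data arriving from different sources; their column names and values are hypothetical, and filling the gap with the median is just one of several reasonable choices.

```python
import pandas as pd

# Two hypothetical sources of the same kind of records
web_orders = pd.DataFrame(
    {"order_id": [1, 2, 3], "amount": [250.0, None, 400.0]}
)
app_orders = pd.DataFrame(
    {"order_id": [3, 4], "amount": [400.0, 150.0]}
)

# Data collection: join the sources into one table
raw = pd.concat([web_orders, app_orders], ignore_index=True)

# Data cleansing: drop repeated rows, then fill the missing amount with the median
clean = raw.drop_duplicates().copy()
clean["amount"] = clean["amount"].fillna(clean["amount"].median())

print(clean)
```

Whether a missing value should be filled, flagged or dropped depends on the data and the question being asked.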
• Data Analysis and Data Interpretation: After cleaning, the data is
studied using tools like Excel, Python, R or SQL. Analysts look for
patterns, trends or useful information that can help solve problems or
answer questions. The goal here is to understand what the data is
telling us.
• Data Visualization: Data visualization is the process of creating visual
representations of data using plots, charts and graphs, which help
to reveal patterns and trends and to draw valuable insights from the
data. By comparing datasets and analyzing them, data analysts extract
useful information from the raw data.
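A minimal sketch of the analysis step, assuming pandas and a small cleaned table of sales records; the regions and revenue figures are invented for illustration.

```python
import pandas as pd

# Hypothetical cleaned sales records
sales = pd.DataFrame(
    {
        "region": ["North", "North", "South", "South", "East"],
        "revenue": [100, 140, 90, 110, 200],
    }
)

# Analysis: total and average revenue per region
summary = sales.groupby("region")["revenue"].agg(["sum", "mean"])

# Interpretation aid: which region brings in the most revenue overall?
top_region = summary["sum"].idxmax()

print(summary)
print("Top region by total revenue:", top_region)
```

The same summary table could then be passed to a plotting function to produce a bar chart for the visualization step.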
Exploratory Data Analysis
• Exploratory Data Analysis (EDA) is an important step in data science, as
it involves visualizing data to understand its main features, find patterns and
discover how different parts of the data are connected.
Why is Exploratory Data Analysis Important?
• Exploratory Data Analysis (EDA) is important for several reasons in the context of data
science and statistical modeling. Here are some of the key reasons:
• It helps to understand the dataset by showing how many features it has, what type of
data each feature contains and how the data is distributed.
• It helps to identify hidden patterns and relationships between different data points,
which helps us in model building.
• It allows us to identify errors or unusual data points (outliers) that could affect our results.
• The insights gained from EDA help us to identify the most important features for building
models and guide us on how to prepare them for better performance.
• By understanding the data, it helps us in choosing the best modeling techniques and adjusting
them for better results.
1. Univariate Analysis
• Univariate analysis focuses on studying one variable to understand its
characteristics.
• It helps to describe data and find patterns within a single feature.
• Common methods include:
• histograms to show data distribution
• box plots to detect outliers and understand data spread
• bar charts for categorical data.
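Alongside the plots, a single variable can also be summarized numerically. The sketch below uses only the Python standard library on a made-up list of ages, and applies the 1.5 × IQR rule that box plots commonly use to flag outliers. The values and names are illustrative assumptions.

```python
from statistics import mean, median, quantiles

# Hypothetical ages for a single feature
ages = [22, 25, 27, 29, 30, 31, 33, 35, 36, 38, 95]

# Central tendency of the one variable
age_mean = mean(ages)
age_median = median(ages)

# Quartiles and the 1.5 * IQR rule commonly used for outliers
q1, q2, q3 = quantiles(ages, n=4, method="inclusive")
iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [a for a in ages if a < lower_fence or a > upper_fence]

print(f"mean={age_mean:.1f}, median={age_median}")
print(f"Q1={q1}, Q3={q3}, IQR={iqr}")
print("outliers:", outliers)
```

Note how the single large value pulls the mean well above the median, which is itself a hint that the distribution is skewed.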
Univariate Analysis
• Dataset link : [Link]
1oNUIvyCi4SkqdRBtG0YVND/view
import pandas as pd
import seaborn as sns
# Load the employee dataset and preview the first rows
data = pd.read_csv('Employee_dataset.csv')
print(data.head())
• Here we perform univariate analysis on numerical variables
using seaborn's histogram function.
sns.histplot(data['age'])
Bar Chart
• Univariate analysis of categorical data. We use the count plot
function from the seaborn library.
sns.countplot(x=data['gender_full'])
Pie Chart
• A pie chart helps us to visualize the percentage of the data belonging
to each category.
import matplotlib.pyplot as plt
# Count how many rows fall into each category
x = data['STATUS_YEAR'].value_counts()
plt.pie(x.values, labels=x.index, autopct='%1.1f%%')
plt.show()
Bivariate analysis
• Bivariate analysis is the simultaneous analysis of two variables.
• It explores the relationship between two variables: whether an
association exists and how strong it is, or whether there are
differences between the two variables and how significant those
differences are.
The three main types we will see here are:
• Categorical vs. Numerical
• Numerical vs. Numerical
• Categorical vs. Categorical
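Each of the three pairings has a natural numerical summary. The sketch below, assuming pandas and a small invented employee table, shows one common choice for each. The column names and values are hypothetical, and the salaries are constructed to rise exactly linearly with age so the correlation is easy to check.

```python
import pandas as pd

# Hypothetical employee records (salary is built as 1000 * age + 5000)
df = pd.DataFrame(
    {
        "department": ["HR", "HR", "IT", "IT", "IT", "Sales"],
        "gender": ["F", "M", "F", "M", "M", "F"],
        "age": [25, 30, 35, 40, 45, 50],
        "salary": [30000, 35000, 40000, 45000, 50000, 55000],
    }
)

# Categorical vs. numerical: compare a numeric summary across groups
mean_salary_by_dept = df.groupby("department")["salary"].mean()

# Numerical vs. numerical: Pearson correlation between two numeric columns
age_salary_corr = df["age"].corr(df["salary"])

# Categorical vs. categorical: contingency table of counts
dept_gender_counts = pd.crosstab(df["department"], df["gender"])

print(mean_salary_by_dept)
print("Correlation between age and salary:", age_salary_corr)
print(dept_gender_counts)
```

Graphically, these pairings usually correspond to grouped box plots, scatter plots and grouped bar charts respectively.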
Multivariate Analysis
• It is an extension of bivariate analysis: it involves
multiple variables at the same time to find
correlations between them. Multivariate analysis is a set of
statistical methods that examine patterns in multidimensional
data by considering several variables at once.
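import seaborn as sns
from sklearn import datasets, decomposition
# Load the iris data: 4 numeric features, 3 species
iris = datasets.load_iris()
X = iris.data
y = iris.target
# Reduce the 4 features to 2 principal components
pca = decomposition.PCA(n_components=2)
X = pca.fit_transform(X)
# Plot the two components, colored by species
sns.scatterplot(x=X[:, 0], y=X[:, 1], hue=y)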
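To see roughly what the PCA projection above is doing, the same idea can be sketched directly with NumPy: center the data, take the singular value decomposition, and keep the directions with the largest variance. The small matrix below is invented for illustration, and this is a simplified sketch rather than scikit-learn's actual implementation, so the signs of the components may differ from a library result.

```python
import numpy as np

# Hypothetical data: 6 samples, 3 features (the third nearly duplicates the first)
X = np.array(
    [
        [2.0, 1.0, 2.1],
        [4.0, 3.0, 3.9],
        [6.0, 2.0, 6.2],
        [8.0, 5.0, 7.8],
        [10.0, 4.0, 10.1],
        [12.0, 6.0, 11.9],
    ]
)

# Step 1: center each feature at zero
X_centered = X - X.mean(axis=0)

# Step 2: SVD; the rows of Vt are the principal directions, ordered by variance
U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)

# Step 3: project onto the first two principal directions
X_2d = X_centered @ Vt[:2].T

# Share of the total variance captured by each component
explained_variance_ratio = S**2 / np.sum(S**2)

print(X_2d.shape)
print(explained_variance_ratio)
```

Because two of the three features carry almost the same information, most of the variance should land in the first component, which is why two components can stand in for three here.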
Qualitative vs. Quantitative Data
Methods of Data Analytics
• There are two types of methods in data analytics:
1. Qualitative Data Analytics
• Qualitative data analysis doesn’t use statistics; it derives data from
words, pictures and symbols. Some common qualitative methods are:
• Narrative Analytics is used for working with data acquired from diaries,
interviews and so on.
• Content Analytics is used for the analysis of verbal data and behaviour.
• Grounded theory is used to explain a given event by studying the data
collected about it.
2. Quantitative Data Analytics
• Quantitative data analytics collects data and processes it in
numerical form. Some of the quantitative methods are:
• Hypothesis testing assesses whether a given hypothesis about the data
set is supported by the data.
• Sample size determination is the method of deciding how many
observations to take from a large group so that the sample can be
analysed in place of the whole group.
• The average or mean is the sum of all the numbers in a list divided
by the number of items in that list.
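The mean and a simple hypothesis test can be sketched with the standard library alone. The test used below is a permutation test, which is one kind of hypothesis test, chosen here only because it needs no extra packages. The scores, group names and number of shuffles are illustrative assumptions.

```python
import random


def average(values):
    """Mean: the sum of the numbers divided by how many there are."""
    return sum(values) / len(values)


# Hypothetical test scores for two teaching methods
group_a = [78, 82, 85, 88, 90, 91, 94]
group_b = [65, 70, 72, 74, 75, 79, 80]

observed_diff = average(group_a) - average(group_b)

# Null hypothesis: the group labels do not matter.
# Shuffle the labels many times and count how often chance alone
# produces a difference at least as large as the observed one.
rng = random.Random(0)  # fixed seed so the run is repeatable
pooled = group_a + group_b
n_a = len(group_a)
n_permutations = 5000
extreme = 0
for _ in range(n_permutations):
    rng.shuffle(pooled)
    diff = average(pooled[:n_a]) - average(pooled[n_a:])
    if abs(diff) >= abs(observed_diff):
        extreme += 1

p_value = extreme / n_permutations

print(f"Observed difference in means: {observed_diff:.2f}")
print(f"Estimated p-value: {p_value}")
```

A small p-value suggests the difference between the groups is unlikely to be due to chance alone. With real data, a t-test from a statistics library would be a more conventional choice.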
Quantitative Techniques
• Quantitative EDA techniques involve numerical summaries and
statistics to understand the data’s structure, central tendency, spread,
and relationships.
• These techniques help to:
• Get an overview of the dataset
• Identify patterns and trends
• Spot anomalies (like outliers)
• Prepare data for modeling
Technique | Purpose | Example Output
Mean, Median, Mode | Central tendency (average values) | Mean salary = ₹50,000
Minimum & Maximum Values | Range of values in a feature | Min age = 18, Max age = 65
Standard Deviation (SD) | Measures spread or variability | SD of income = ₹15,000
Variance | Square of standard deviation | High variance = more spread
Percentiles & Quartiles | Distribution split into 100 (percentiles) or 4 (quartiles) parts | Q1 = 25th percentile
Range (Max - Min) | Shows how spread out the values are | Age range = 47
Skewness | Shows asymmetry of data distribution | Skewness > 0 = right-skewed
Kurtosis | Measures sharpness of distribution peak | High kurtosis = heavy tails
Correlation Coefficient (r) | Measures linear relationship between two variables | r = 0.85 → strong positive correlation
Frequency Counts | Number of times a value/category appears | “Male”: 300, “Female”: 200
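Most of the measures in the table above can be computed with the Python standard library. The sketch below does so for a made-up list of ages and a made-up categorical column. The population forms of variance, skewness and kurtosis are used for simplicity, and all names and values are illustrative.

```python
from collections import Counter
from statistics import mean, median, mode, pstdev, pvariance, quantiles

# Hypothetical ages and a categorical column
ages = [18, 22, 22, 25, 30, 35, 40, 65]
genders = ["Male", "Female", "Male", "Male", "Female", "Male", "Female", "Male"]

# Central tendency
age_mean, age_median, age_mode = mean(ages), median(ages), mode(ages)

# Spread
age_min, age_max = min(ages), max(ages)
age_range = age_max - age_min
age_variance = pvariance(ages)  # population variance
age_sd = pstdev(ages)           # population standard deviation

# Quartiles (Q1 = 25th percentile, Q3 = 75th percentile)
q1, q2, q3 = quantiles(ages, n=4, method="inclusive")

# Shape: skewness > 0 means a longer right tail;
# kurtosis is about 3 for a normal distribution
n = len(ages)
skewness = sum((a - age_mean) ** 3 for a in ages) / (n * age_sd**3)
kurtosis = sum((a - age_mean) ** 4 for a in ages) / (n * age_sd**4)

# Frequency counts
gender_counts = Counter(genders)

print(f"mean={age_mean}, median={age_median}, mode={age_mode}")
print(f"min={age_min}, max={age_max}, range={age_range}")
print(f"variance={age_variance:.2f}, sd={age_sd:.2f}")
print(f"Q1={q1}, Q3={q3}, skewness={skewness:.2f}, kurtosis={kurtosis:.2f}")
print(dict(gender_counts))
```

With a pandas DataFrame, `describe()` gives several of these summaries in a single call.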
Graphical EDA
• Graphical EDA refers to using visual methods to understand the
structure, distribution, trends, and relationships in data.
• It helps to see patterns, detect outliers, spot anomalies, and identify
correlations more effectively than numbers alone.
Common Graphical Techniques in EDA
Graph Type | Purpose | Best For
Histogram | Shows frequency distribution of a numeric variable | Understanding distribution
Box Plot | Shows median, quartiles, and outliers | Detecting outliers
Bar Chart | Displays categorical data as rectangular bars | Comparing categories
Pie Chart | Shows proportions of categories as slices of a circle | Visualizing part-to-whole
Line Plot | Shows trends over time | Time series data
Scatter Plot | Shows relationship between two numerical variables | Correlation/Regression analysis
Heatmap | Visualizes correlation matrix or patterns using color scale | Multivariate relationships
Pair Plot | Matrix of scatterplots for several variable pairs | Multivariate EDA
Example
• Histogram
• Purpose: Understand the
distribution (e.g., normal,
skewed)
import seaborn as sns
sns.histplot(data['age'])
Example
• Box Plot
• Purpose: Detect outliers,
compare distributions
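A minimal box plot sketch, assuming matplotlib and a made-up list of ages with one unusually large value. The non-interactive `Agg` backend is selected only so the script also runs without a display.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the script runs without a display
import matplotlib.pyplot as plt

# Hypothetical ages with one unusually large value
ages = [22, 25, 27, 29, 30, 31, 33, 35, 36, 38, 95]

fig, ax = plt.subplots()
box = ax.boxplot(ages)  # whiskers default to 1.5 * IQR
ax.set_ylabel("age")
ax.set_title("Box plot of age")

# Points drawn beyond the whiskers are the flagged outliers
flagged = [float(v) for v in box["fliers"][0].get_ydata()]
print("Flagged as outliers:", flagged)
```

In an interactive session the backend line can be dropped and `plt.show()` called instead. With seaborn and the employee dataset, `sns.boxplot(x=data['age'])` would be the analogous call.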
Scatter Plot
• Purpose: Check linear or non-linear relationships
between two variables
sns.scatterplot(x='age', y='salary', data=data)
Heatmap
• Purpose: Show correlation matrix
# Correlate numeric columns only; text columns cannot be correlated
corr = data.corr(numeric_only=True)
sns.heatmap(corr, annot=True, cmap='coolwarm')
Why Use Graphical EDA?
• Easier to spot trends and anomalies
• Supports better decision making
• Helps in feature selection
• Improves data quality understanding
Graphical EDA | Quantitative EDA
Visual insights (charts, plots) | Numerical summaries/statistics
Good for spotting patterns | Good for measuring exact data characteristics
Examples: histogram, scatter plot | Examples: mean, SD, correlation, skewness
Conclusion in Data Analytics
• After analyzing a dataset using various quantitative and graphical
techniques (EDA), the conclusion is a summary of key insights and
patterns found in the data.
• Key aspects of the conclusion:
• What trends did you observe?
• Were there any outliers or missing values?
• Which variables are most important?
• Are there correlations between variables?
• Is there a potential for prediction or decision-making?
Example
• After analyzing customer churn data, we observed that customers
with long complaint histories and no service upgrades in the last 6
months are more likely to leave the service.
Predictions in Data Analytics
• Prediction is the process of using past data to make informed
forecasts about future or unknown data.
• Types of predictions:
• Classification: Predicting categories (e.g., will a customer buy or not? →
Yes/No)
• Regression: Predicting continuous values (e.g., predicting house prices)
• Clustering (optional): Grouping similar data, helpful in understanding
customer segments
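As a small sketch of the regression case, the snippet below fits a straight line to invented house areas and prices with NumPy and predicts the price of an unseen house. The numbers are constructed to lie exactly on a line so the result is easy to verify; real data would be noisier.

```python
import numpy as np

# Hypothetical training data: house area (sq. m) and price (illustrative units)
area = np.array([50.0, 60.0, 80.0, 100.0, 120.0])
price = np.array([3500.0, 4000.0, 5000.0, 6000.0, 7000.0])

# Regression: fit price = slope * area + intercept by least squares
slope, intercept = np.polyfit(area, price, deg=1)

# Predict a continuous value for an unseen house of 90 sq. m
predicted_price = slope * 90.0 + intercept

print(f"price is roughly {slope:.1f} * area + {intercept:.1f}")
print(f"Predicted price for 90 sq. m: {predicted_price:.1f}")
```

Classification follows the same fit-then-predict pattern, except that the output is a category (such as Yes/No) instead of a number.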
Example Techniques
• Logistic Regression
• Naïve Bayes Classifier
• Decision Trees
• K-Nearest Neighbors
• Linear Regression
• PCA (for preprocessing)
How Conclusion Leads to Prediction
Step | Description
EDA | Explore patterns, clean the data, understand relationships
Feature Selection | Choose the most relevant features
Model Building | Apply algorithms for prediction
Evaluation | Test model accuracy using test data
Prediction | Use the model to make forecasts on new data
Example
Problem: A company wants to predict whether a customer will subscribe
to a term deposit.
• Conclusion from EDA:
• Customers contacted by phone are more likely to subscribe.
• Age and job type influence the decision.
• Prediction:
• Apply logistic regression using features like age, job, contact type, etc.
• Predict future customer behavior.
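The model building, evaluation and prediction steps for this example can be sketched with scikit-learn as follows. The data is synthetic and generated inside the script (it is not real bank data), only age and contact type are used as features for brevity, and the rule that generates the labels is an assumption made purely for illustration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the customer data (NOT real bank records)
rng = np.random.default_rng(42)
n = 400
age = rng.integers(18, 70, size=n)
contacted_by_phone = rng.integers(0, 2, size=n)  # 1 = phone, 0 = other

# Assumed rule used only to generate labels: phone contact and higher age
# raise the chance of subscribing
logit = -3.0 + 2.0 * contacted_by_phone + 0.04 * age
subscribed = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)

X = np.column_stack([age, contacted_by_phone])
y = subscribed

# Evaluation needs held-out test data the model has not seen
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Model building: fit logistic regression on the training split
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

# Evaluation: accuracy on the test split
accuracy = accuracy_score(y_test, model.predict(X_test))

# Prediction: a new, hypothetical customer aged 45 who was contacted by phone
new_customer = np.array([[45, 1]])
subscribe_probability = model.predict_proba(new_customer)[0, 1]

print(f"Test accuracy: {accuracy:.2f}")
print(f"Estimated subscription probability: {subscribe_probability:.2f}")
```

A categorical feature such as job type would first need to be encoded as numbers (for example with one-hot encoding) before it could be added to `X`.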