Data Analytics
Prepared by Dr. Vibhav Ambardekar, Adjunct Faculty, COEP Technological University, Pune
Course Outcomes
1. Examine and compare various datasets and features
2. Analyze the business issue that analytics can address and resolve
3. Apply the basic concepts and algorithms of data analytics
4. Interpret, implement, analyze and validate data using popular data
analytics tools
Why data analytics in robotics?
• Robotics and data science are both rapidly evolving fields.
• Like robotics, data science is an interdisciplinary field: it combines scientific techniques, statistics, mathematics, and computer systems to analyze enormous amounts of structured and unstructured data in order to predict possible future business risks, identify future shopping trends, or find a solution to a current problem.
• Like robotics, data science relies heavily on AI and machine learning to produce "actionable insights" for various uses.
• Traditionally, every time a robot needed a new function or had to handle a new real-time, vision-oriented task, it required fresh programming.
• Through AI and machine learning experiments, data scientists have learned to work with robots that evolve by learning from errors in previous data and by acquiring new behavior from labeled data. As a result, the scientist's job becomes easier, and robot evolution becomes possible with little human involvement.
Data Science and Robotics — The Growing Impact | by keerthika ravichandran | Medium, https://medium.com/@keerthika.ravi/data-science-and-robotics-the-growing-impact-21e3549a1dd3, accessed 21 July 2025.
Data science education in robotics
• Data analysis and modeling: Data science education can teach robotics engineers how to collect, analyze, and model the large amounts of data that robots generate. This can improve the performance and efficiency of robots by enabling them to learn from their experiences and make better decisions.
• Machine learning: Data science education can provide robotics engineers with the
knowledge and skills to develop machine learning algorithms to help robots learn from
their environment and adapt to changing conditions. This can enable robots to perform
more complex tasks and improve their overall performance.
• Computer vision: Data science education can teach robotics engineers how to develop
computer vision algorithms enabling robots to perceive their environment and make
decisions based on what they see. This can help robots navigate their environment and
perform tasks more accurately.
• Natural language processing (NLP): Data science education can also teach robotics
engineers how to develop natural language processing algorithms that enable robots to
communicate more effectively with humans. This can help improve robots’ usability and
user experience, making them more accessible to a wider range of users.
Data Science and Robotics — The Growing Impact | by keerthika ravichandran | Medium, https://medium.com/@keerthika.ravi/data-science-and-robotics-the-growing-impact-21e3549a1dd3, accessed 21 July 2025.
Contents
1. Fundamentals of data analytics
Descriptive, predictive and prescriptive analytics; data types; analytics types; data analytics steps: data preprocessing, data cleaning, data transformation and data visualization
2. Data Analytics Tools
Data analytics using Python, statistical procedures, NumPy, Pandas, SciPy, Matplotlib
3. Data Preprocessing
Understanding the data, dealing with missing values, Data Formatting, Data Normalization, Data Binning,
Importing and Exporting data in Python, Turning categorical variables into quantitative variables in Python,
Accessing databases using Python
Contents
4. Data Visualization
Graphic representation of data, Characteristics and charts for effective graphical displays, Chart types – single variable: dot plot, jitter plot, error bar plot, box-and-whisker plot, histogram; two variables: bar chart, scatter plot, line plot, log-log plot; more than two variables: stacked plots, parallel coordinate plot
5. Descriptive and Inferential Statistics
Probability distribution, hypothesis testing, ANOVA, Regression
6. Machine learning concepts
Classification and clustering, Bayes' classifier, decision tree, Apriori algorithm, K-means algorithm, Logistic Regression, Support Vector Machines, Introduction to recommendation systems
Data Analytics versus Data Science
Feature | Data Science | Data Analytics
Role | Used for extracting meaningful information and insights by applying various algorithms and scientific methods to structured and unstructured data | Used to draw conclusions by processing the raw data
Programming skills | In-depth level | Basic programming level
Languages used | Python, C++, Java, Perl | Python, Power BI, Excel
Use of machine learning | Uses machine learning algorithms to get insights from the data | Does not use machine learning to get insights from the data
Goals | Deals with exploration and innovation | Makes use of existing sources
Data type | Deals with unstructured data | Mostly structured data
Statistical skills | Necessary | Of little or minimal importance
Average salary | Usually higher compared to a data analyst | Lower compared to a data scientist
Data Science Vs Artificial Intelligence Vs Machine Learning, https://collegevidya.com/blog/data-science-vs-artificial-intelligence-vs-machine-learning/, accessed 14/07/2025.
1. Fundamentals of data analytics
➢ Data analytics is a process of examining datasets to draw conclusions about the information they
contain.
➢ Involves various techniques and tools to analyze raw data and extract meaningful insights
➢ The primary goal is to support decision making by providing valuable insights
➢ In line with this, there are three types of data analytics as follows:
a) Descriptive analytics: the process of analyzing historical data to understand what has happened in the past. It focuses on summarizing and analyzing data to provide insights into past performance and trends.
Descriptive analytics answers the question, "What happened?"
• Techniques and tools for descriptive analytics are tabulated as follows:
Techniques/Tools | Role
Data aggregation | Combines data from multiple sources to get a comprehensive view
Data mining | Extracts patterns and relationships from large datasets
Data visualization | Uses charts, graphs and dashboards to visualize the data
Statistical analysis | Uses statistical methods to summarize and describe data
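A quick illustration of descriptive analytics with pandas; the sales figures and column names here are purely hypothetical toy data:

```python
import pandas as pd

# Hypothetical monthly sales records.
sales = pd.DataFrame({
    "region": ["North", "South", "North", "South", "East"],
    "revenue": [120, 90, 150, 110, 80],
})

# Statistical analysis: summarize past performance ("What happened?").
print(sales["revenue"].describe())

# Data aggregation: combine records into a per-region summary.
print(sales.groupby("region")["revenue"].agg(["sum", "mean"]))
```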
1. Fundamentals of data analytics (continued)
Common tools used in data analytics
Tools | Role
Excel | Basic data analysis and visualization
Tableau | Advanced data visualization and dashboarding
Power BI | Interactive data visualization and business intelligence
SQL | Querying and managing databases
Applications of descriptive analytics
Task | Description
Business reporting | To generate regular reports on sales, revenue, and other key performance indicators (KPIs)
Customer segmentation | To analyze customer data to identify different segments and their characteristics
Market analysis | To understand market trends and consumer behavior
Operational efficiency | To monitor and improve business processes
b) Predictive analytics
It uses historical data and statistical algorithms to forecast future events. Based on the past trends and
patterns, it predicts what is likely to happen.
Predictive analytics answers the question, "What could happen?"
Techniques and Tools for Predictive Analytics are as follows:
Techniques/Tools | Role
Regression analysis | To model the relationship between input and output variables
Time series analysis | Analyzes data points collected or recorded at specific time intervals
Machine learning | Uses algorithms to learn from data and make predictions
Classification and clustering | Grouping the data into categories or clusters based on similarities
Common tools used in predictive analytics include:
Tools | Role
R | For statistical computing and graphics
Python | Machine learning and data analysis libraries such as scikit-learn and TensorFlow
SAS | Advanced analytics, business intelligence and data management
IBM SPSS | For statistical analysis and predictive modeling
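As a small illustration of the regression technique listed above, here is a scikit-learn sketch that fits a linear model to synthetic sales history and forecasts the next few months (all numbers are made up):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic history: month index vs. units sold, with some noise.
months = np.arange(1, 13).reshape(-1, 1)
units = 50 + 4 * months.ravel() + np.random.default_rng(1).normal(0, 3, 12)

# Learn the trend from historical data...
model = LinearRegression().fit(months, units)

# ...and forecast the next three months ("What could happen?").
future = np.array([[13], [14], [15]])
print(model.predict(future))
```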
Applications of Predictive Analytics
Area | Role
Risk management | To predict potential risks and their impact on business operations
Customer retention | To identify customers at risk of churning and develop retention strategies
Sales forecasting | Estimating future sales based on historical data
Healthcare | Predicting disease outbreaks and patient outcomes
c) Prescriptive analytics
It goes beyond predicting future outcomes by recommending actions to achieve desired results. It combines data,
algorithms, and business rules to suggest the best course of action.
Prescriptive analytics answers the question, "What should we do?"
Techniques and Tools for Prescriptive Analytics
Techniques/Tools | Role
Optimization | Finding the best solution from the available feasible options
Simulation | Modeling complex systems to evaluate different scenarios
Decision analysis | Assessing and comparing different decision options
Machine learning | Using algorithms to learn from data and make recommendations
Common tools used in prescriptive analytics include:
Tools | Role
Gurobi | Mathematical optimization
IBM ILOG CPLEX | Optimization and decision support
AnyLogic | Simulation modeling
MATLAB | Numerical computing and optimization
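As a sketch of the optimization technique above, here is a tiny linear program solved with SciPy's linprog; the products, profits and resource limits are hypothetical:

```python
from scipy.optimize import linprog

# Hypothetical problem: two products with profits of 3 and 5 per unit.
# linprog minimizes, so we negate the profits to maximize them.
c = [-3, -5]

# Resource limits: x + 2y <= 14 (labour), 3x + 2y <= 18 (material).
A_ub = [[1, 2], [3, 2]]
b_ub = [14, 18]

res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None), (0, None)])
print(res.x)     # recommended production quantities -> [2. 6.]
print(-res.fun)  # maximum profit -> 36.0
```

Here the output ("What should we do?") is a recommended production plan rather than a mere prediction.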
Applications of Prescriptive Analytics
Area | Description
Supply chain optimization | Improving inventory management and logistics
Revenue management | Setting optimal pricing strategies
Healthcare | Recommending personalized treatment plans
Finance | Optimizing investment portfolios and risk management strategies
Use of the Python language for data analytics
• In this course, we are going to learn the concepts of data analysis using the Python language
• Over the past five years, Python has become widely used in scientific circles thanks to its large number of libraries, which provide a complete set of tools for data analysis and manipulation
• Compared with R and MATLAB, Python not only provides a platform for processing data but also has features that make it unique among these languages
• This is due to the ever-increasing number of support libraries, the implementation of algorithms for more innovative methodologies, and the ability to interface with other programming languages such as C and Fortran
• Apart from data analysis, Python is also used for scripting, generic programming, interfacing with databases, and web development
• It is therefore possible to develop data analysis projects that are compatible with a web server and can be integrated on the web
• Python is an interpreted language: the interpreter decodes the code line by line rather than compiling the entire program into machine code first
• In practice, when we run the python command, the Python interpreter starts, characterized by a >>> prompt
• The Python interpreter is simply a program that reads and interprets the commands passed to the prompt
• The interpreter can accept a single command or an entire file of code; however, the approach by which it processes them is always the same
• Each time we press the Enter key, the interpreter scans the code (a single row or a full file) token by token (tokenization)
• These tokens are fragments of the text that the interpreter arranges in a tree structure
• The resulting tree represents the logical structure of the program, which is then converted to bytecode (.pyc or .pyo)
• The process chain ends with the bytecode being executed by the Python virtual machine (PVM)
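This chain can be observed from within Python itself. A minimal sketch using only the standard library's tokenize and dis modules, showing the tokens and the bytecode for a one-line program:

```python
import dis
import io
import tokenize

source = "total = 1 + 2 * 3"

# 1. Tokenization: the interpreter splits the source text into tokens.
for tok in tokenize.generate_tokens(io.StringIO(source).readline):
    print(tok.type, repr(tok.string))

# 2. Compilation: the parse tree built from those tokens is turned
#    into bytecode, which is what the PVM actually executes.
code = compile(source, "<example>", "exec")
dis.dis(code)  # disassemble the bytecode
```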
Implementations of the Python interpreter
• The standard Python interpreter is CPython, being written in the C language
• The Cython project is based on creating a compiler that translates Python code into C
• Jython is an implementation of the Python programming language in Java
• The PyPy interpreter features a JIT (just-in-time) compiler, which converts Python code into machine code at runtime
Data Analytics Steps
Six steps of the data analysis process:
1. Define the problem
2. Collect data
3. Data cleaning
4. Analyzing the data
5. Data visualization
6. Presenting the data
Quantitative and Qualitative data
Qualitative data | Quantitative data
Textual, visual and audio data | Numerical and categorical data
Qualitative predictions | Quantitative predictions
More subjective conclusions | More objective conclusions
More descriptive than numerical | Expressed in numerical form
Example: "the team is well prepared" | Example: "the team has 7 players"
Example: "I got an M.Tech degree" | Example: "I got a 9.5 CGPA in M.Tech"
Data Types (continued): structured versus unstructured data
Parameters | Structured data | Unstructured data
Representation | Discrete form, i.e., stored in rows and columns | Does not follow a specific format
Metadata | Syntax (the formal rules and structures that dictate how code must be written in a programming language) | Semantics (ensures that code performs the intended operations and produces the desired result)
Storage | Database management system | Unmanaged file structure
Standards | SQL, ADO.NET, ODBC | Open XML, SMTP, SMS
Tools for integration | ETL | Batch processing or manual data entry
Characteristics | Certain information always appears in the same location on the page | Information can appear in unexpected places in the document
Used by organizations for | Low-volume operations | High-volume operations
Data collection
• It refers to the systematic approach of gathering and measuring information from a variety of
sources to get a complete and accurate picture of an area of interest
• Big data includes information collected by both humans and devices
• Big data focuses on the following types of data:
a) Network data: gathered from all kinds of networks, including social media, information and technological networks, the internet and mobile networks
b) Real-time data: online streaming media like YouTube, Skype, Netflix
c) Transactional data: gathered when a user makes an online purchase (information on the product,
time of purchase, payment methods)
d) Geographic data: location data on humans, vehicles, buildings, natural reserves and other objects, continuously supplied by satellites
e) Natural language data: mostly gathered from voice searches that can be made on different devices
accessing the internet
f) Time series data: related to the observation of trends and phenomena taking place right now and over a period of time, for instance global temperatures, mortality rates, pollution levels, etc.
Data Preprocessing
• Data preprocessing is the process of preparing raw data for analysis
by cleaning and transforming it into a usable format. In data mining it
refers to preparing raw data for mining by performing tasks like
cleaning, transforming, and organizing it into a format suitable for
mining algorithms.
• The goal is to improve the quality of the data.
• Helps in handling missing values, removing duplicates, and
normalizing data.
• Ensures the accuracy and consistency of the dataset.
Data preprocessing steps (Data Preprocessing in Data Mining – GeeksforGeeks, https://www.geeksforgeeks.org/dbms/data-preprocessing-in-data-mining/, accessed 21/07/2025)
The four main steps are: data cleaning, data integration, data transformation and data reduction.
Data cleaning
1. Refers to the process of identifying and correcting errors or inconsistencies in the dataset. It involves handling
missing values, removing duplicates, and correcting incorrect or outlier data to ensure the dataset is accurate
and reliable.
• Clean data is essential for effective analysis since it improves the quality of results and enhances the performance of data models
• Missing Values: These occur when data is absent from a dataset. You can either ignore the rows with missing data or fill the gaps manually, with the attribute mean, or with the most probable value (see the pandas sketch after this list). This ensures the dataset remains accurate and complete for analysis.
• Noisy Data: It refers to irrelevant or incorrect data that is difficult for machines to interpret, often caused by
errors in data collection or entry. It can be handled in several ways:
• Binning Method: The data is sorted into equal segments, and each segment is smoothed by replacing
values with the mean or boundary values.
• Regression: Data can be smoothed by fitting it to a regression function, either linear or multiple, to
predict values.
• Clustering: This method groups similar data points together, with outliers either being undetected or
falling outside the clusters. These techniques help remove noise and improve data quality.
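A minimal pandas sketch of two of the cleaning steps above, filling a missing value with the attribute mean and smoothing noisy data by bin means; the readings are hypothetical toy data:

```python
import numpy as np
import pandas as pd

# Hypothetical sensor readings with a missing value and an outlier.
df = pd.DataFrame({"reading": [10.0, 12.0, np.nan, 11.0, 250.0, 13.0]})

# Missing values: fill the gap with the attribute mean.
df["reading"] = df["reading"].fillna(df["reading"].mean())

# Binning: sort values into equal-width segments, then smooth each
# segment by replacing its members with the segment mean.
df["bin"] = pd.cut(df["reading"], bins=3)
df["smoothed"] = df.groupby("bin", observed=True)["reading"].transform("mean")

print(df)
```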
• Removing Duplicates: It involves identifying and eliminating repeated
data entries to ensure accuracy and consistency in the dataset. This
process prevents errors and ensures reliable analysis by keeping only
unique records.
2. Data Integration: It involves merging data from various sources into a single,
unified dataset. It can be challenging due to differences in data formats, structures,
and meanings. Techniques like record linkage and data fusion help in combining
data efficiently, ensuring consistency and accuracy.
• Record Linkage is the process of identifying and matching records from different datasets that refer to the same entity, even if they are represented differently. It helps in combining data from various sources by finding corresponding records based on common identifiers or attributes.
• Data Fusion involves combining data from multiple sources to create a more comprehensive and accurate dataset. It integrates information that may be inconsistent or incomplete from different sources, ensuring a unified and richer dataset for analysis.
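A small pandas sketch of duplicate removal followed by a record-linkage-style merge on a common identifier; the tables and column names are made up for illustration:

```python
import pandas as pd

# Two hypothetical customer tables coming from different sources.
crm = pd.DataFrame({
    "customer_id": [1, 2, 2, 3],
    "name": ["Asha", "Ravi", "Ravi", "Meera"],
})
orders = pd.DataFrame({
    "customer_id": [1, 2, 3],
    "total_orders": [5, 2, 7],
})

# Removing duplicates: keep only unique records.
crm = crm.drop_duplicates()

# Record linkage: match records from both sources on a common identifier.
unified = crm.merge(orders, on="customer_id", how="left")
print(unified)
```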
• 3. Data Transformation: It involves converting data into a format suitable for
analysis. Common techniques include normalization, which scales data to a common
range; standardization, which adjusts data to have zero mean and unit variance; and
discretization, which converts continuous data into discrete categories. These
techniques help prepare the data for more accurate analysis.
• Data Normalization: The process of scaling data to a common range to ensure
consistency across variables.
• Discretization: Converting continuous data into discrete categories for easier
analysis.
• Data Aggregation: Combining multiple data points into a summary form, such as
averages or totals, to simplify analysis.
• Concept Hierarchy Generation: Organizing data into a hierarchy of concepts to
provide a higher-level view for better understanding and analysis.
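The transformation techniques above fit in a few lines of pandas; a sketch on a made-up income column:

```python
import pandas as pd

# Hypothetical feature column to transform.
df = pd.DataFrame({"income": [20_000, 35_000, 50_000, 80_000, 120_000]})
col = df["income"]

# Normalization: min-max scaling to the common range [0, 1].
df["income_norm"] = (col - col.min()) / (col.max() - col.min())

# Standardization: rescale to zero mean and unit variance.
df["income_std"] = (col - col.mean()) / col.std()

# Discretization: convert the continuous values into discrete categories.
df["income_band"] = pd.cut(col, bins=3, labels=["low", "medium", "high"])

print(df)
```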
• 4. Data Reduction: It reduces the dataset's size while maintaining key
information. This can be done through feature selection, which chooses the
most relevant features, and feature extraction, which transforms the data into
a lower-dimensional space while preserving important details. It uses various
reduction techniques such as,
• Dimensionality Reduction (e.g., Principal Component Analysis): A technique
that reduces the number of variables in a dataset while retaining its essential
information.
• Numerosity Reduction: Reducing the number of data points by methods like
sampling to simplify the dataset without losing critical patterns.
• Data Compression: Reducing the size of data by encoding it in a more
compact form, making it easier to store and process.
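A minimal dimensionality-reduction sketch using scikit-learn's PCA on random toy data (the shapes and component count are arbitrary):

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))  # 100 samples, 10 features (toy data)

# Project onto the 3 principal components that retain the most variance.
pca = PCA(n_components=3)
X_reduced = pca.fit_transform(X)

print(X_reduced.shape)                # (100, 3)
print(pca.explained_variance_ratio_)  # variance retained per component
```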
Uses of Data Preprocessing
• Data preprocessing is utilized across various fields to ensure that raw data is transformed into a usable
format for analysis and decision-making. Here are some key areas where data preprocessing is applied:
• 1. Data Warehousing: In data warehousing, preprocessing is essential for cleaning, integrating, and
structuring data before it is stored in a centralized repository. This ensures the data is consistent and reliable
for future queries and reporting.
• 2. Data Mining: Data preprocessing in data mining involves cleaning and transforming raw data to make it
suitable for analysis. This step is crucial for identifying patterns and extracting insights from large datasets.
• 3. Machine Learning: In machine learning, preprocessing prepares raw data for model training. This includes
handling missing values, normalizing features, encoding categorical variables, and splitting datasets into
training and testing sets to improve model performance and accuracy.
• 4. Data Science: Data preprocessing is a fundamental step in data science projects, ensuring that the data
used for analysis or building predictive models is clean, structured, and relevant. It enhances the overall
quality of insights derived from the data.
• 5. Web Mining: In web mining, preprocessing helps analyze web usage logs to extract meaningful user
behavior patterns. This can inform marketing strategies and improve user experience through personalized
recommendations.
• 6. Business Intelligence (BI): Preprocessing supports BI by organizing and cleaning data to create dashboards
and reports that provide actionable insights for decision-makers.
• 7. Deep Learning: Similar to machine learning, deep learning applications require preprocessing to normalize or enhance features of the input data, optimizing the model training process.
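For the machine learning case in particular (point 3 above), here is a minimal pandas/scikit-learn sketch of encoding a categorical variable and splitting the data into training and testing sets; the dataset is a made-up toy example:

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical dataset with a categorical feature and a binary target.
df = pd.DataFrame({
    "city": ["Pune", "Mumbai", "Pune", "Delhi"],
    "age": [25, 32, 47, 51],
    "bought": [0, 1, 0, 1],
})

# Encode the categorical variable as quantitative dummy columns.
X = pd.get_dummies(df[["city", "age"]], columns=["city"])
y = df["bought"]

# Split into training and testing sets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)
print(X_train.shape, X_test.shape)
```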
Advantages of Data Preprocessing
• Improved Data Quality: Ensures data is clean, consistent, and reliable for analysis.
• Better Model Performance: Reduces noise and irrelevant data, leading to more accurate
predictions and insights.
• Efficient Data Analysis: Streamlines data for faster and easier processing.
• Enhanced Decision-Making: Provides clear and well-organized data for better business
decisions.
Disadvantages of Data Preprocessing
• Time-Consuming: Requires significant time and effort to clean, transform, and organize data.
• Resource-Intensive: Demands computational power and skilled personnel for complex
preprocessing tasks.
• Potential Data Loss: Incorrect handling may result in losing valuable information.
• Complexity: Handling large datasets or diverse formats can be challenging.