Science
Introduction
INTRODUCTION TO DATA SCIENCE, OVERVIEW OF DATA
TOOLS IN DATA SCIENCE, DATA SCIENCE METHODOLOGY
DATA REQUIREMENTS, DATA UNDERSTANDING, DATA
PREPARATION, DATA MODELLING
MODEL EVALUATION, DEPLOYMENT, MODEL FEEDBACK,
OVERVIEW OF STRATEGIC IMPACT OF BAI ACROSS KEY
INDUSTRIES
ANALYTICS 3.0, THE NATURE OF ANALYTICAL COMPETITION,
WHAT MAKES AN ANALYTICAL COMPETITOR
ANALYTICS AND BUSINESS PERFORMANCE, COMPETING ON
ANALYTICS WITH INTERNAL AND EXTERNAL PROCESSES
A ROAD MAP TO ANALYTICAL CAPABILITIES
MANAGING ANALYTICAL PEOPLE
THE ARCHITECTURE OF BUSINESS INTELLIGENCE
ESSENTIAL PRACTICE SKILLS FOR HIGH IMPACT ANALYTICAL
PROJECTS
LISTENING TO THE CLIENT, FRAMING THE CENTRAL PROBLEM AND
SCOPING A PROJECT
Introduction to Data Science
▪ Data science has emerged as a critical field for businesses in
today's data-driven world. It involves the application of
scientific methods, statistical analysis, and computational
techniques to extract valuable insights and knowledge from
large and complex datasets. By leveraging data science,
businesses can make informed decisions, enhance
operational efficiency, and gain a competitive edge.
▪ In the context of business, data science focuses on
extracting meaningful information from various data sources,
including customer transactions, social media interactions,
website traffic, sensor data, and more. The goal is to uncover
patterns, trends, and relationships within the data that can
drive actionable insights and support evidence-based
decision-making.
Introduction to Business Analytics
▪ Working with factual information within an organization
▪ Using appropriate tools
▪ Identifying nuggets of wisdom (useful, non-obvious insights)
▪ Supporting decision making
Business Analytics and Business
Intelligence
▪ BI is about gleaning information from past data sources, in the hope of deriving information that is useful to decision makers.
▪ Business Analytics is driven by a specific objective: finding particular insights, or testing and validating hunches that the organization and its managers may have, using appropriate tools and techniques, and planning for future trends.
Business Analytics and Business Process
Management
Process of Data Science
Data Collection
Data Cleaning and Preprocessing
Exploratory Data Analysis
Feature Engineering
Machine Learning and Statistical Models
Model Evaluation and Validation
Insights and Decision Making
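The steps above can be sketched end to end in a few lines of Python. This is a toy illustration under stated assumptions. The records, the "ad spend vs. sales" framing, the one-variable least-squares model, and all function names are hypothetical. Only the standard library is used, so the sketch runs without extra packages.

```python
from statistics import fmean

def collect():
    # Data Collection: hypothetical (ad_spend, sales) records; None marks a missing value.
    return [(10, 25), (20, 45), (None, 30), (30, 65), (40, 85), (50, None), (60, 125)]

def clean(rows):
    # Data Cleaning and Preprocessing: drop records with any missing value.
    return [(x, y) for x, y in rows if x is not None and y is not None]

def explore(rows):
    # Exploratory Data Analysis: basic summary statistics.
    xs, ys = zip(*rows)
    return {"n": len(rows), "mean_spend": fmean(xs), "mean_sales": fmean(ys)}

def engineer(rows):
    # Feature Engineering: express spend in units of 10 (a simple derived feature).
    return [(x / 10, y) for x, y in rows]

def fit_linear(rows):
    # Machine Learning / Statistical Model: ordinary least squares for y = a + b*x.
    xs, ys = zip(*rows)
    mx, my = fmean(xs), fmean(ys)
    b = sum((x - mx) * (y - my) for x, y in rows) / sum((x - mx) ** 2 for x in xs)
    return my - b * mx, b

def evaluate(model, rows):
    # Model Evaluation: mean absolute error of the fitted line on the given rows.
    a, b = model
    return fmean(abs(y - (a + b * x)) for x, y in rows)

rows = engineer(clean(collect()))
summary = explore(rows)
model = fit_linear(rows)
error = evaluate(model, rows)
# Insights and Decision Making: turn the fitted slope into a statement a manager can act on.
print(f"{summary['n']} usable records; each extra 10 units of spend is associated "
      f"with about {model[1]:.1f} extra units of sales (MAE {error:.2f}).")
```

Here the model is evaluated on the same rows it was fitted on, so the error measures in-sample fit only. Held-out data or cross-validation is the usual next step.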
Overview of Tools in Data Science
1. Programming Languages: Python, R
2. Data Manipulation and Analysis: Pandas, SQL
3. Data Visualization: Matplotlib, Seaborn
4. Machine Learning and Data Modelling: Scikit-learn, TensorFlow and Keras, PyTorch
5. Big Data Processing: Apache Spark, Hadoop
6. Data Integration and Workflow: Apache Airflow, KNIME
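As a small illustration of the "Data Manipulation and Analysis" category, the sketch below runs a typical SQL query that filters, groups, and aggregates. It uses Python's built-in sqlite3 module so that it runs without a database server. The orders table and its values are hypothetical. A pandas groupby would express the same operation.

```python
import sqlite3

# In-memory database standing in for a real data source (hypothetical table and values).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders (region, amount) VALUES (?, ?)",
    [("North", 120.0), ("South", 80.0), ("North", 60.0), ("East", 200.0)],
)

# Typical manipulation: filter rows, group them, aggregate, and sort the result.
query = """
    SELECT region, COUNT(*) AS n_orders, SUM(amount) AS revenue
    FROM orders
    WHERE amount >= 50
    GROUP BY region
    ORDER BY revenue DESC
"""
rows = conn.execute(query).fetchall()
for region, n_orders, revenue in rows:
    print(f"{region}: {n_orders} orders, revenue {revenue:.2f}")
conn.close()
```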
DATA SCIENCE AND METHODOLOGY
▪ CRISP-DM (Cross-Industry Standard Process for Data Mining)
It has six phases:
Business Understanding
Data Understanding
Data Preparation
Modelling
Evaluation
Deployment
▪ OSEMN (Obtain, Scrub, Explore, Model, iNterpret)
1. A data science methodology that provides a sequential framework for data analysis.
2. It begins with obtaining and collecting relevant data, followed by data cleaning and preprocessing (scrubbing).
3. Next, exploratory data analysis (EDA) techniques are applied to gain insights and understand the data. Modeling involves building and training machine learning models, and interpretation focuses on deriving meaningful conclusions and actionable insights from the results.
Hypothesis-Driven Approach
▪ Formulating hypotheses based on domain knowledge and prior understanding, designing experiments or analyses to test these hypotheses, and drawing conclusions based on the results.
▪ It emphasizes the use of statistical inference and hypothesis testing to either reject a hypothesis or conclude that the data do not provide enough evidence to reject it.
Agile Data Science
▪ Borrowing from agile software development, this methodology emphasizes an iterative and collaborative approach to data science projects.
▪ It involves breaking down the project into smaller, manageable tasks, prioritizing them, and delivering incremental results. Feedback loops and regular communication with stakeholders are integral to this methodology.
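To make the hypothesis-driven approach concrete, here is a minimal sketch of one hypothesis test: a two-sided permutation test for a difference in means. The scenario (a redesigned page and time on site, in minutes), the data, and the 0.05 significance level are assumptions for illustration. A permutation test is only one of several valid tests. It is used here because it needs no distributional assumptions and only the standard library.

```python
import random
from statistics import fmean

def permutation_test(a, b, n_resamples=10_000, seed=0):
    """Two-sided permutation test for H0: groups a and b share the same mean."""
    rng = random.Random(seed)
    observed = fmean(a) - fmean(b)
    pooled = list(a) + list(b)
    extreme = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)  # relabel observations at random, since under H0 labels don't matter
        diff = fmean(pooled[:len(a)]) - fmean(pooled[len(a):])
        if abs(diff) >= abs(observed) - 1e-12:  # small tolerance for floating-point rounding
            extreme += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    return observed, (extreme + 1) / (n_resamples + 1)

# Hypothetical minutes on site for users shown the new vs. the old page.
new_page = [13.0, 12.8, 13.4, 12.9, 13.1, 12.7, 13.3, 12.6]
old_page = [12.1, 11.8, 12.5, 11.9, 12.0, 12.3, 11.7, 12.2]
diff, p_value = permutation_test(new_page, old_page)
alpha = 0.05
decision = "reject H0" if p_value < alpha else "fail to reject H0"
print(f"difference in means = {diff:.3f}, p = {p_value:.4f}: {decision}")
```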
Bayesian Inference
▪ Bayesian inference is a probabilistic methodology that combines prior knowledge and data to make inferences and update beliefs.
▪ It involves defining prior probabilities, gathering data, and using Bayes' theorem to calculate posterior probabilities. Bayesian methods are particularly useful when dealing with uncertainty and when incorporating prior knowledge into the analysis.
Experimental Design
▪ Experimental design focuses on planning and conducting controlled experiments to investigate causal relationships. It involves defining the experimental factors, selecting appropriate variables, randomizing assignment, and applying statistical analysis to draw valid conclusions.
▪ This methodology helps establish cause-and-effect relationships and supports decision-making based on experimental results.
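Bayes' theorem, P(H | D) = P(D | H) P(H) / P(D), can be applied in a few lines. The sketch below assumes a hypothetical fraud-screening rule with made-up rates. It turns a prior into a posterior, then uses that posterior as the prior for a second piece of evidence, which is the belief updating described above. Treating the two checks as independent is an assumption of the example.

```python
def posterior(prior, p_evidence_if_true, p_evidence_if_false):
    """Bayes' theorem for a yes/no hypothesis H given evidence D."""
    # P(D) by the law of total probability.
    p_evidence = p_evidence_if_true * prior + p_evidence_if_false * (1 - prior)
    return p_evidence_if_true * prior / p_evidence

# Hypothetical rates: 1% of transactions are fraudulent; the rule flags
# 90% of fraudulent transactions and 5% of legitimate ones.
after_one_flag = posterior(0.01, 0.90, 0.05)
# A second, independent flag: the first posterior becomes the new prior.
after_two_flags = posterior(after_one_flag, 0.90, 0.05)
print(f"P(fraud | 1 flag) = {after_one_flag:.3f}, P(fraud | 2 flags) = {after_two_flags:.3f}")
```

With these numbers, one flag raises the probability of fraud from 1% to about 15%, and a second flag raises it to about 77%. This shows why the prior (base rate) matters so much when it is low.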
Data Science and Methods
Cross Validation
▪ Cross-validation is a technique for evaluating the performance of predictive models. It involves partitioning the data into multiple subsets (folds), training the model on all but one fold, and testing it on the held-out fold. The process is repeated so that each fold serves once as the test set.
▪ This approach helps assess the generalization
capability of the model and identify potential
issues such as overfitting or underfitting.
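A minimal k-fold cross-validation loop shows the partition, train, test, and rotate steps described above. It uses only the standard library. The data (a noisy straight line), the least-squares model, the choice of k = 5, and the mean-absolute-error metric are assumptions for illustration. Libraries such as scikit-learn provide ready-made equivalents.

```python
import random
from statistics import fmean

def k_fold_indices(n, k, seed=0):
    """Shuffle indices 0..n-1 once, then split them into k nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def fit(points):
    # Ordinary least squares for y = a + b*x (the model being validated).
    xs, ys = zip(*points)
    mx, my = fmean(xs), fmean(ys)
    b = sum((x - mx) * (y - my) for x, y in points) / sum((x - mx) ** 2 for x in xs)
    return my - b * mx, b

def mae(model, points):
    a, b = model
    return fmean(abs(y - (a + b * x)) for x, y in points)

def cross_validate(points, k=5, seed=0):
    scores = []
    for test_idx in k_fold_indices(len(points), k, seed):
        held_out = set(test_idx)
        train = [p for i, p in enumerate(points) if i not in held_out]
        test = [points[i] for i in test_idx]
        scores.append(mae(fit(train), test))  # train on k-1 folds, score on the held-out fold
    return scores

# Hypothetical data: y = 3 + 2x with alternating +/-0.5 noise.
points = [(x, 3 + 2 * x + (0.5 if x % 2 else -0.5)) for x in range(20)]
scores = cross_validate(points, k=5)
print("MAE per fold:", [round(s, 3) for s in scores])
print("mean MAE:", round(fmean(scores), 3))
```

A large gap between the training error and these held-out scores would point to overfitting. Uniformly high scores on both would point to underfitting.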
DATA REQUIREMENT – Defining data requirements in data science
Data Type and Sources
Data Volume and Size
Data Quality and Completeness
Data Variables and Features
Data Granularity and Temporality
Data Privacy and Security
Data Accessibility and Availability
Data Governance and Documentation
Ethical Considerations
Clearly defining data requirements at the outset of a data science project helps set expectations, ensures data availability, and guides the subsequent stages of data collection, preprocessing, and analysis.
Regular reassessment and refinement of data requirements may be necessary as the project progresses and new insights are gained.
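Requirements such as expected variables, data types, minimum volume, and acceptable missingness can be written down as a small, checkable specification. The sketch below shows one hypothetical way to do that in Python. The ColumnRequirement class, the check_requirements function, and the sample customer records are illustrative names and data, not a standard API.

```python
from dataclasses import dataclass

@dataclass
class ColumnRequirement:
    name: str                       # Data Variables and Features
    dtype: type                     # expected Python type of non-missing values
    max_missing_ratio: float = 0.0  # Data Quality and Completeness

def check_requirements(records, requirements, min_rows):
    """Return human-readable violations; an empty list means the requirements are met."""
    problems = []
    if len(records) < min_rows:  # Data Volume and Size
        problems.append(f"only {len(records)} rows, need at least {min_rows}")
    for req in requirements:
        values = [r.get(req.name) for r in records]
        missing = sum(v is None for v in values)
        if records and missing / len(records) > req.max_missing_ratio:
            problems.append(f"{req.name}: {missing} of {len(records)} values missing")
        wrong = [v for v in values if v is not None and not isinstance(v, req.dtype)]
        if wrong:
            problems.append(f"{req.name}: {len(wrong)} value(s) not of type {req.dtype.__name__}")
    return problems

# Hypothetical customer records and requirements.
customers = [
    {"customer_id": 1, "signup_date": "2023-01-05", "monthly_spend": 49.9},
    {"customer_id": 2, "signup_date": None, "monthly_spend": "n/a"},
    {"customer_id": 3, "signup_date": "2023-02-11", "monthly_spend": 12.5},
]
spec = [
    ColumnRequirement("customer_id", int),
    ColumnRequirement("signup_date", str, max_missing_ratio=0.5),
    ColumnRequirement("monthly_spend", float),
]
issues = check_requirements(customers, spec, min_rows=5)
for issue in issues:
    print(issue)
```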
DATA UNDERSTANDING – Types of data
Primary
Secondary
Unstructured Data
Semi-Structured Data
Metadata (Descriptive, Structural, Administrative)
Structured Data
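The difference between semi-structured and structured data is easiest to see by converting one into the other. The sketch below parses a hypothetical JSON order with Python's built-in json module. The order has nested fields and a variable-length item list. The sketch flattens it into fixed-column rows, one per item, as a table would store them.

```python
import json

# A semi-structured record: nested objects and a variable-length list (hypothetical data).
raw = """
{"order_id": 101,
 "customer": {"name": "Asha", "city": "Pune"},
 "items": [{"sku": "A1", "qty": 2}, {"sku": "B7", "qty": 1}]}
"""

def to_rows(doc):
    """Flatten one order into fixed-column rows (one row per item), i.e. structured data."""
    base = {
        "order_id": doc["order_id"],
        "customer_name": doc["customer"]["name"],
        "customer_city": doc["customer"]["city"],
    }
    return [{**base, "sku": item["sku"], "qty": item["qty"]} for item in doc["items"]]

rows = to_rows(json.loads(raw))
for row in rows:
    print(row)
```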
DATA UNDERSTANDING
• Data understanding is a crucial step in data science for business analytics.
• It involves gaining a comprehensive understanding of the
available data to extract meaningful insights and support
decision-making.
Data Exploration
Data Profiling
Data Sources and Integration
Data Relationships and Dependencies
Domain Knowledge Integration
Data Sampling and Subset Creation
Data Documentation and Metadata
Data Privacy and Security
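Data profiling, one of the perspectives listed above, often starts with per-column counts of missing and distinct values plus simple summaries. The function below is a minimal, standard-library sketch of that idea. The profile function and the sample records are hypothetical. On real data, dedicated tools or pandas' describe and isna would typically be used.

```python
from collections import Counter
from statistics import fmean, median

def profile(records):
    """Summarize each column: missing and distinct counts, plus numeric stats or the top value."""
    columns = sorted({key for r in records for key in r})
    report = {}
    for col in columns:
        values = [r.get(col) for r in records]
        present = [v for v in values if v is not None]
        info = {"missing": len(values) - len(present), "distinct": len(set(present))}
        is_numeric = all(isinstance(v, (int, float)) and not isinstance(v, bool) for v in present)
        if present and is_numeric:
            info.update(min=min(present), max=max(present),
                        mean=fmean(present), median=median(present))
        elif present:
            info["top"] = Counter(present).most_common(1)[0]  # most frequent value and its count
        report[col] = info
    return report

# Hypothetical records with some missing values.
records = [
    {"region": "North", "age": 34, "spend": 120.0},
    {"region": "South", "age": None, "spend": 80.0},
    {"region": "North", "age": 45, "spend": 60.0},
    {"region": None, "age": 29, "spend": 200.0},
]
report = profile(records)
for column, info in report.items():
    print(column, info)
```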
Data Preparation steps in Data Science
Data Cleaning
Data Integration
Data Transformation
Feature Selection
Feature Engineering
Data Encoding
Splitting the data
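Several of these preparation steps can be shown on a tiny dataset: cleaning by imputation, transformation by min-max scaling, one-hot encoding, and a train/test split. The records, the median imputation, the 75/25 split, and the seed are assumptions for illustration. Feature selection and feature engineering are omitted to keep the sketch short.

```python
import random
from statistics import median

# Hypothetical raw records: (age, city, purchased); one age is missing.
raw = [(25, "Pune", 1), (None, "Delhi", 0), (40, "Pune", 1), (35, "Mumbai", 0),
       (50, "Delhi", 1), (30, "Mumbai", 0), (45, "Pune", 1), (28, "Delhi", 0)]

# Data Cleaning: impute the missing age with the median of the observed ages.
fill_age = median([age for age, _, _ in raw if age is not None])
cleaned = [(fill_age if age is None else age, city, label) for age, city, label in raw]

# Data Transformation: min-max scale age into the range [0, 1].
ages = [age for age, _, _ in cleaned]
lo, hi = min(ages), max(ages)

# Data Encoding: one-hot encode the categorical city column.
cities = sorted({city for _, city, _ in cleaned})

def encode(age, city):
    # Feature vector: [scaled_age, is_Delhi, is_Mumbai, is_Pune]
    return [(age - lo) / (hi - lo)] + [1.0 if city == c else 0.0 for c in cities]

X = [encode(age, city) for age, city, _ in cleaned]
y = [label for _, _, label in cleaned]

# Splitting the data: shuffle once with a fixed seed and hold out 25% for testing.
order = list(range(len(X)))
random.Random(42).shuffle(order)
cut = int(0.75 * len(order))
train_idx, test_idx = order[:cut], order[cut:]
X_train, y_train = [X[i] for i in train_idx], [y[i] for i in train_idx]
X_test, y_test = [X[i] for i in test_idx], [y[i] for i in test_idx]
print(len(X_train), "training rows,", len(X_test), "test rows; first encoded row:", X[0])
```

To follow the order of the list above, the median, minimum, and maximum are computed on all rows here. In practice these statistics should be computed on the training split only, so that no information leaks from the test set.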
Data Modelling – Data Science and Business Analytics
Descriptive Modelling
▪ Focuses on (1) summarizing and (2) describing, based on historical data.
▪ Use: understanding patterns, trends, and relationships.
▪ Techniques: statistical analysis, data visualization, and exploratory data analysis (EDA).
Predictive Modelling
▪ Aims to make predictions or forecasts based on historical data.
▪ Use: train models on existing data, then apply those models to new or unseen data.
▪ Techniques: linear regression, logistic regression, decision trees, random forests, support vector machines.
Prescriptive Modelling
▪ Provides recommendations or optimization solutions.
Time Series Modelling
▪ Captures and models the patterns, trends, and seasonality in the data to make future predictions.
▪ Techniques: ARIMA, exponential smoothing.
Text Mining
▪ Extracts meaningful information and insights from unstructured text data.
▪ Techniques: sentiment analysis, topic modeling, document classification, and text generation.
▪ Use: analyzing customer feedback and social media.
Graph Modeling
▪ Represents and analyzes data with complex relationships or networks.
▪ Use: social networks, recommendation systems, fraud detection, and supply chain optimization.
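Of the time series techniques listed, simple exponential smoothing is the easiest to write out by hand. Each smoothed level is a weighted average of the latest observation and the previous level: level_t = alpha * y_t + (1 - alpha) * level_(t-1). The sketch below assumes hypothetical monthly sales, alpha = 0.5, and initialization with the first observation, which is one common convention. It does not model trend or seasonality. Holt-Winters extensions and ARIMA, for example via the statsmodels library, cover those cases.

```python
def exponential_smoothing(series, alpha):
    """Simple exponential smoothing: level_t = alpha*y_t + (1 - alpha)*level_{t-1}.

    Returns the smoothed levels; the last level is the forecast for the next period.
    The level is initialized with the first observation (one common convention).
    """
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    levels = [series[0]]
    for y in series[1:]:
        levels.append(alpha * y + (1 - alpha) * levels[-1])
    return levels

sales = [100, 110, 105, 120, 125]  # hypothetical monthly sales
levels = exponential_smoothing(sales, alpha=0.5)
next_month = levels[-1]
print("smoothed levels:", levels)
print("forecast for next month:", next_month)
```

A larger alpha reacts faster to recent changes. A smaller alpha smooths out more noise.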
Model Evaluation
Train/Test Split
Confusion Matrix
Performance Metrics
Cross-Validation
Overfitting and underfitting analysis
Feature Importance
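The confusion matrix and the usual performance metrics derived from it can be computed directly from true and predicted labels. The sketch below assumes a binary classifier with made-up labels, where 1 is the positive class. The formulas are the standard definitions of accuracy, precision, recall, and F1.

```python
def confusion_matrix(y_true, y_pred, positive=1):
    """Count true positives, false positives, false negatives, and true negatives."""
    tp = fp = fn = tn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive:
            if actual == positive:
                tp += 1
            else:
                fp += 1
        elif actual == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn

def performance_metrics(tp, fp, fn, tn):
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": (tp + tn) / (tp + fp + fn + tn),
            "precision": precision, "recall": recall, "f1": f1}

# Hypothetical test-set labels and model predictions.
y_true = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0, 1, 0]
counts = confusion_matrix(y_true, y_pred)
scores = performance_metrics(*counts)
print("TP, FP, FN, TN =", counts)
print({name: round(value, 3) for name, value in scores.items()})
```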
Model Feedback
User / Client Feedback
Performance Evaluation
Error Analysis
Domain Expertise
Continuous Improvement
Communication and Documentation
Collaboration and Feedback Loop
Overview of Strategic Impact of BAI across
key industries
HEALTHCARE
RETAIL
FINANCE
DEFENSE
MANUFACTURING
AGRICULTURE
TRANSPORTATION
ENERGY AND UTILITIES
Evolution of Analytics 3.0
▪ Analytics has undergone a transformative journey over the years. Analytics 1.0 was marked by descriptive analytics, where historical data was analyzed to gain insights into past events and trends. This phase gave organizations a basic understanding of their operations and helped with reporting and visualization.
▪ With the advent of Analytics 2.0, the focus shifted towards predictive analytics. Organizations began using statistical models and algorithms to forecast future outcomes and trends. This enabled them to make more informed decisions and take proactive measures based on data-driven insights.
Analytics 3.0 – the next frontier
▪ Analytics 3.0 represents a paradigm shift in the way data is utilized and
leveraged for decision making. It combines advanced technologies like
AI, machine learning, natural language processing, and big data
analytics to derive actionable insights from complex and diverse
datasets.
▪ One key aspect of Analytics 3.0 is the ability to process and analyze
unstructured data, such as text, images, audio, and video. AI-powered
algorithms can extract valuable information from these data sources,
enabling organizations to gain a deeper understanding of customer
sentiment, preferences, and behavior.
▪ Another defining characteristic of Analytics 3.0 is the integration of real-time
and streaming data analytics. With the proliferation of IoT devices
and sensors, organizations can capture and analyze data in real time.
This facilitates faster decision making, proactive interventions, and the
ability to respond swiftly to changing market conditions.
Analytics 3.0 - Highlights
Enhanced Customer Insights
Advanced Risk Management
Process Optimization
Data-Driven Decision Making
Innovation and New Business Models
Analytics 3.0 – Challenges and
Considerations
Talent and Skills Gap
Data Privacy and Security
Bias and Fairness
Data Quality and Integration
Interpretability and Explainability
Ethical Use of AI
Nature of Analytical Competition
Data Availability
Technological Advancements
Analytical Talent
Speed and Agility
Innovation and Creativity
Scalability and Infrastructure
Domain Knowledge and Context
Continuous Learning and Improvement
What makes an analytical competitor?
Data-driven Culture
Strong Analytical Skills
Advanced Technology and Tools
Domain Expertise
Scalability and Infrastructure
Innovation and Continuous Learning
Agile and Iterative Approach
Business Impact and Results
Collaboration and Communication
Ethical and Responsible Practices
Competing on analytics with internal and external processes
Internal Processes
▪ Operational Efficiency
▪ Talent Management
▪ Supply Chain Optimization
▪ Risk Management
External Processes
▪ Customer Analytics
▪ Market Intelligence
▪ Sales and Marketing Optimization
▪ Product and Service Innovation
A Roadmap to Analytical Capabilities
For Individuals
▪ Foundation in Data Science
▪ Machine Learning and Predictive Modeling
▪ Advanced Analytical Techniques
▪ Big Data Analytics
▪ Data Visualization and Communication
▪ Domain Expertise
▪ Continuous Learning and Professional Development
For Companies / Teams
▪ Identify Business Goals and Objectives
▪ Develop a Data Strategy
▪ Build a Data Infrastructure
▪ Hire the Right Talent
▪ Develop Analytical Models
▪ Implement Models
▪ Measure Performance
▪ Refine and Optimize
How to manage analytical people
Foster a Data-Driven Culture
Provide Access to Quality Data and Tools
Set Clear Goals and Expectations
Encourage Autonomy and Creativity
Support Professional Development
Foster Collaboration and Cross-Functional Communication
Recognize and Reward Excellence
Provide Regular Feedback and Support
Balance Workload and Priorities
Encourage Knowledge Sharing
Architecture of Business Process
Management
Architecture of Business Intelligence
From the Architecture of BI
Four Major Components of BI
▪ Data Warehouse: the source of data
▪ Business Analytics: a collection of tools for manipulating, mining, and analyzing the data in the data warehouse
▪ Business Performance Management (BPM): for monitoring and analyzing performance
▪ User Interface: browsers, portals, and dashboards
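The four components can be mapped onto a deliberately tiny, runnable skeleton that shows how data flows from the warehouse to the user. Everything here is a hypothetical stand-in. An in-memory SQLite table plays the data warehouse. A SQL aggregation plays business analytics. A comparison with an assumed revenue target plays business performance management. Printed lines play the dashboard.

```python
import sqlite3

# Data Warehouse: an in-memory SQLite table (hypothetical schema and values).
warehouse = sqlite3.connect(":memory:")
warehouse.execute("CREATE TABLE sales (month TEXT, region TEXT, revenue REAL)")
warehouse.executemany("INSERT INTO sales VALUES (?, ?, ?)", [
    ("2024-01", "North", 120.0), ("2024-01", "South", 90.0),
    ("2024-02", "North", 150.0), ("2024-02", "South", 70.0),
])

# Business Analytics: a query that aggregates warehouse data.
def revenue_by_month(conn):
    return conn.execute(
        "SELECT month, SUM(revenue) FROM sales GROUP BY month ORDER BY month"
    ).fetchall()

# Business Performance Management: compare actuals with a target (hypothetical KPI).
TARGET = 215.0

def performance(rows, target=TARGET):
    return [(month, total, "on target" if total >= target else "below target")
            for month, total in rows]

# User Interface: a plain-text stand-in for a dashboard or portal.
status = performance(revenue_by_month(warehouse))
for month, total, flag in status:
    print(f"{month} | revenue {total:8.2f} | {flag}")
warehouse.close()
```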
Essential practice skills for high impact
analytical projects
Problem Formulation
Data Exploration and Preparation
Data Visualization and Communication
Statistical and Analytical Techniques
Machine Learning and Predictive Modeling
Experimental Design and A/B Testing
Business Acumen
Continuous Learning and Adaptability
Collaboration and Teamwork
Ethical Considerations
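Experimental design and A/B testing often come down to comparing two conversion rates. The sketch below implements a standard two-sided two-proportion z-test with a pooled standard error, using only Python's math module. The visitor and conversion counts are hypothetical. The test assumes large samples and independent visitors randomly assigned to the two variants.

```python
from math import erfc, sqrt

def two_proportion_z_test(conversions_a, visitors_a, conversions_b, visitors_b):
    """Two-sided z-test for H0: variants A and B have the same conversion rate."""
    rate_a = conversions_a / visitors_a
    rate_b = conversions_b / visitors_b
    pooled = (conversions_a + conversions_b) / (visitors_a + visitors_b)
    se = sqrt(pooled * (1 - pooled) * (1 / visitors_a + 1 / visitors_b))
    z = (rate_b - rate_a) / se
    # Two-sided p-value: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
    p_value = erfc(abs(z) / sqrt(2))
    return z, p_value

# Hypothetical experiment: 5,000 visitors per variant, 4.0% vs. 5.2% conversion.
z, p = two_proportion_z_test(200, 5000, 260, 5000)
print(f"z = {z:.2f}, p = {p:.4f}")
```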
Listening to the client, framing the central problem, and scoping a project in data science and business analytics: a step-by-step process
Client Engagement: Start by actively listening to the client's needs, goals, and challenges. Engage in open and meaningful discussions to understand their business context, objectives, and desired outcomes. Ask probing questions to clarify any uncertainties and gain a comprehensive understanding of their requirements.
Identify the Central Problem: Based on the client discussions, distill the information to identify the central problem or opportunity that the project aims to address. Clearly define the problem statement, ensuring that it is specific, measurable, attainable, relevant, and time-bound (SMART). The central problem should capture the essence of the client's challenge and provide a clear focus for the project.
Understand the Business Context: Gain a deeper understanding of the client's industry, market dynamics, competition, and other relevant factors. This knowledge will help you frame the problem in the appropriate business context and identify key drivers and constraints that should be considered in the analysis.
Formulate Objectives and Goals: Collaborate with the client to establish clear project objectives and goals. These should align with the central problem and provide a roadmap for the analysis. Objectives should be specific, measurable, achievable, relevant, and time-bound (SMART). They should outline the desired outcomes, such as increasing revenue, optimizing operations, or improving customer satisfaction.
Scope the Project: Define the boundaries and scope of the project based on the problem statement and objectives. Determine the data sources and variables required for analysis, as well as any limitations or constraints. Identify the key stakeholders involved and establish communication channels and reporting requirements. Define the specific analyses, models, reports, or recommendations that will be provided at each stage.
Breakdown of Deliverables and Milestones: Collaboratively establish the project's deliverables, including intermediate milestones and final outcomes. This breakdown helps manage client expectations, track progress, and ensure that the project stays on track.
Resource Planning: Assess the resources required to execute the project successfully. This includes the
team members' skills, expertise, and availability, as well as any additional data or technology needs. Allocate
resources effectively to ensure a balanced workload and maximize efficiency.
Risk Assessment and Mitigation: Identify potential risks and challenges that could impact the project's
success. Evaluate the likelihood and impact of each risk and develop mitigation strategies to address them.
Communicate these risks to the client and establish contingency plans to manage any unforeseen
circumstances.
Establish Timelines and Deadlines: Create a project timeline with specific deadlines for each milestone
and deliverable. Clearly communicate the timeline to the client, ensuring mutual agreement on the project
schedule. Regularly monitor and update the timeline throughout the project to manage progress effectively.
Obtain Client Agreement: Seek formal client agreement and sign-off on the project scope, objectives,
deliverables, and timeline. This ensures that both parties have a shared understanding of the project's scope
and expectations.
Check whether you can answer the following questions
▪ Give an overview of the tools used in data science.
▪ What are the four major components of the architecture of BI? Give a skeletal representation of the BI architecture.
▪ Describe the evolution of Analytics 3.0.
▪ What are the areas in which to compete with analytics as internal and external processes?
▪ Give the perspectives of understanding the data.
▪ What is the nature of analytical competition? What are the qualities of an analytical competitor?
▪ List your observations on the strategic impact of BAI across key industries.
▪ Give a step-by-step approach to preparing your data for data analytics.
▪ What is data modelling? How do you evaluate and get feedback on a data model?
▪ List and evaluate the data science methods available for implementing business analytics.