DATA SCIENCE
METHODOLOGY
WEEK 1
DATA SCIENCE
METHODOLOGY OVERVIEW
WHAT IS A METHODOLOGY?
A methodology is:
· A system of methods
· A guideline for decision-making
during the scientific process
APPLYING DATA SCIENCE
METHODOLOGY
A structured approach for solving
problems and making data-driven
decisions. It includes:
· Performing data collection
· Creating measurement
strategies
· Comparing data analysis
methods
ADDRESSING DATA SCIENCE
CHALLENGES
Apply practical guidance
Avoid the mistakes that can
happen by jumping to
solutions before the analysis
DATA METHODOLOGY STAGES
BUSINESS UNDERSTANDING
Understanding the objective is very important in
choosing the data science methodology.
Once the goal is clarified, the next piece of the puzzle is
to figure out the objectives that are in support of the
goal.
Depending on the problem, different stakeholders will
need to be engaged in the discussion to help
determine requirements and clarify questions.
ANALYTIC APPROACH
Selecting the right analytic approach depends on
the question being asked.
The approach involves seeking clarification from
the person who is asking the question, so as to be
able to pick the most appropriate path or
approach.
This means identifying what type of patterns will
be needed to address the question most
effectively.
DATA REQUIREMENTS
Want to prepare a
dish?
Step 1: know the recipe
and its ingredients
Step 2: collect the
ingredients and know
how to work with them
As with cooking, when working with data
the data scientist needs to identify which
ingredients are required, how to source or
collect them, how to understand or work
with them, and how to prepare the data to
meet the desired outcome.
Prior to undertaking the data collection
and data preparation stages of the
methodology, it's vital to define the data
requirements for decision-tree
classification. This includes identifying
the necessary data content, formats
and sources for initial data collection.
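One way to make data requirements concrete is to write them down as a small specification before collection starts. The sketch below is illustrative only: the field names (age, diagnosis_code, readmitted), their formats, and the source labels are hypothetical and are not taken from the case study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldRequirement:
    """One required piece of data: its content, format, and source."""
    name: str      # data content
    dtype: type    # expected format
    source: str    # hypothetical source system


# Hypothetical requirements for a classification problem.
REQUIREMENTS = [
    FieldRequirement("age", int, "patient_registry"),
    FieldRequirement("diagnosis_code", str, "claims_system"),
    FieldRequirement("readmitted", bool, "admissions_log"),  # known outcome
]


def unmet_requirements(record: dict) -> list[str]:
    """Return the names of required fields that are missing or wrongly typed."""
    problems = []
    for req in REQUIREMENTS:
        value = record.get(req.name)
        if not isinstance(value, req.dtype):
            problems.append(req.name)
    return problems


sample = {"age": 71, "diagnosis_code": "I50", "readmitted": "yes"}
print(unmet_requirements(sample))  # ['readmitted']
```

Here the sample record fails only on readmitted, because the value arrives as text rather than as a boolean. Catching this kind of mismatch early is the point of stating requirements up front.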
DATA COLLECTION
The process of synthesizing
information from many different sources
and storing it in an established system.
The purpose of collecting data is to
support analysis, research, management,
and decision-making in fields such as
science, society, and business.
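As a minimal sketch of combining sources into one storage system, the example below loads two made-up sources into an in-memory SQLite database and joins them. The table names, columns, and rows are all hypothetical, and SQLite is only a stand-in for whatever established system a project actually uses.

```python
import sqlite3

# Hypothetical records from two separate sources, keyed by customer id.
crm_rows = [(1, "Ana"), (2, "Ben"), (3, "Chi")]
billing_rows = [(1, 120.0), (3, 75.5), (4, 10.0)]

conn = sqlite3.connect(":memory:")  # stand-in for the established system
conn.execute("CREATE TABLE crm (customer_id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE billing (customer_id INTEGER PRIMARY KEY, total REAL)")
conn.executemany("INSERT INTO crm VALUES (?, ?)", crm_rows)
conn.executemany("INSERT INTO billing VALUES (?, ?)", billing_rows)

# Combine the sources. LEFT JOIN keeps customers that have no billing
# record, so gaps in the collected data stay visible as NULL (None).
collected = conn.execute(
    "SELECT c.customer_id, c.name, b.total "
    "FROM crm AS c LEFT JOIN billing AS b ON c.customer_id = b.customer_id "
    "ORDER BY c.customer_id"
).fetchall()
print(collected)  # [(1, 'Ana', 120.0), (2, 'Ben', None), (3, 'Chi', 75.5)]
```

The None for customer 2 shows a gap that the later stages have to deal with.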
WEEK 2
DATA UNDERSTANDING
Data understanding
encompasses all activities
related to constructing the data
set.
The data understanding
section of the data science
methodology answers the
question: Is the data that you
collected representative of the
problem to be solved?
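A first check on whether collected data is representative is often a set of simple summaries. The sketch below assumes that row counts, missing-value counts, basic statistics, and outcome counts are a reasonable starting point; the records and field names are made up.

```python
from collections import Counter
from statistics import mean, median

# Hypothetical collected records; None marks a missing value.
records = [
    {"age": 34, "outcome": "yes"},
    {"age": 51, "outcome": "no"},
    {"age": None, "outcome": "no"},
    {"age": 47, "outcome": "no"},
    {"age": 62, "outcome": "yes"},
]

ages = [r["age"] for r in records if r["age"] is not None]
summary = {
    "rows": len(records),
    "missing_age": sum(r["age"] is None for r in records),
    "mean_age": mean(ages),
    "median_age": median(ages),
    "outcome_counts": dict(Counter(r["outcome"] for r in records)),
}
print(summary)
```

If the summary shows, for example, that one outcome barely appears or that a key field is mostly missing, that may be a reason to return to data collection.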
DATA PREPARATION
In a sense, data preparation is similar
to washing freshly picked vegetables
in so far as unwanted elements, such
as dirt or imperfections, are removed.
Together with data collection and
data understanding, data preparation
is the most time-consuming phase of
a data science project, typically
taking 70 percent and sometimes up
to 90 percent of the overall
project time.
Similarly, transforming data in the
data preparation phase is the
process of getting the data into a
state where it may be easier to
work with.
To work effectively with the data, it
must be prepared in a way that
addresses missing or invalid values
and removes duplicates, so that
everything is properly
formatted.
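The three tasks named above (handling missing or invalid values, removing duplicates, and fixing formatting) can be sketched as follows. The rows, the column layout, and the rule that an age must be a non-negative whole number are assumptions made for illustration.

```python
# Hypothetical raw rows: (id, age as text, country).
raw_rows = [
    ("1", "34", " us "),
    ("2", "", "DE"),        # missing age
    ("3", "-5", "fr"),      # invalid age
    ("1", "34", " us "),    # exact duplicate of the first row
    ("4", "58", "De"),
]


def prepare(rows):
    """Drop duplicates, mark bad ages as None, and normalize formatting."""
    seen = set()
    cleaned = []
    for row in rows:
        if row in seen:          # remove exact duplicates
            continue
        seen.add(row)
        row_id, age_text, country = row
        # isdigit() rejects empty strings and negative numbers alike.
        age = int(age_text) if age_text.strip().isdigit() else None
        cleaned.append({
            "id": int(row_id),
            "age": age,                          # None = missing or invalid
            "country": country.strip().upper(),  # consistent formatting
        })
    return cleaned


prepared = prepare(raw_rows)
for row in prepared:
    print(row)
```

Marking bad values as None rather than dropping the rows is a design choice in this sketch; whether to drop, impute, or keep them depends on the problem.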
DATA PREPARATION - CASE STUDY
Data Collection
Data Cleaning
Data Transformation
Data Integration
Data Reduction
Data Validation
FROM MODELING TO EVALUATION
MODELING
What is the purpose of data modeling?
Developing models that are either
descriptive or predictive.
What are some characteristics of this
process?
Based on statistical or machine learning
approaches.
Uses a training set with known outcomes
to calibrate the model.
Involves experimenting with different
algorithms and variables.
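To illustrate calibrating a model on a training set and experimenting with variables, the sketch below fits a one-rule threshold model to each variable in turn and keeps the better fit. The data, the variable names (age, visits), and the choice of a threshold rule are all hypothetical; a real project would usually rely on an established modeling library.

```python
# Hypothetical training set: two variables per row plus a known outcome.
training_set = [
    ({"age": 25, "visits": 1}, 0),
    ({"age": 32, "visits": 4}, 0),
    ({"age": 47, "visits": 2}, 1),
    ({"age": 51, "visits": 5}, 1),
    ({"age": 38, "visits": 3}, 0),
    ({"age": 60, "visits": 1}, 1),
]


def fit_stump(rows, variable):
    """Calibrate a one-rule model: predict 1 when the variable is above a threshold.

    Returns (threshold, training_accuracy) for the best threshold found.
    """
    best = (None, -1.0)
    for threshold in sorted({features[variable] for features, _ in rows}):
        correct = sum((features[variable] > threshold) == bool(label)
                      for features, label in rows)
        accuracy = correct / len(rows)
        if accuracy > best[1]:
            best = (threshold, accuracy)
    return best


# Experiment with different variables and keep the one that fits best.
results = {v: fit_stump(training_set, v) for v in ("age", "visits")}
best_variable = max(results, key=lambda v: results[v][1])
print(results, best_variable)
```

On this made-up data, age separates the outcomes perfectly at a threshold of 38 while visits does not, so age is selected.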
EVALUATION
Model evaluation goes
hand-in-hand with model
building; as such, the
modeling and evaluation
stages are done iteratively.
Model evaluation is
performed during model
development and before the
model is deployed.
Evaluation can have two phases.
The first is the diagnostic measures phase,
which is used to ensure the model is
working as intended.
If the model is a predictive model, a
decision tree can be used to evaluate
whether the model's output is
aligned with the initial design. It can also
show where there are areas that
require adjustments.
If the model is a descriptive model, one
in which relationships are being
assessed, then a testing set with known
outcomes can be applied, and the
model can be refined as needed.
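Applying a testing set with known outcomes can be sketched as a tally of correct and incorrect predictions. The model here is a hypothetical threshold rule and the testing set is made up; the confusion-matrix counts are one common diagnostic, not one prescribed by the methodology.

```python
# Hypothetical model under evaluation: predicts 1 when age is above a threshold.
def predict(age, threshold=40):
    return 1 if age > threshold else 0


# Hypothetical testing set of (age, known outcome) pairs, held out from training.
testing_set = [(22, 0), (35, 0), (41, 1), (44, 0),
               (52, 1), (39, 1), (67, 1), (30, 0)]

# Tally the four cells of a confusion matrix.
counts = {"tp": 0, "tn": 0, "fp": 0, "fn": 0}
for age, actual in testing_set:
    predicted = predict(age)
    if predicted == 1 and actual == 1:
        counts["tp"] += 1
    elif predicted == 0 and actual == 0:
        counts["tn"] += 1
    elif predicted == 1 and actual == 0:
        counts["fp"] += 1
    else:
        counts["fn"] += 1

accuracy = (counts["tp"] + counts["tn"]) / len(testing_set)
print(counts, accuracy)  # {'tp': 3, 'tn': 3, 'fp': 1, 'fn': 1} 0.75
```

The false positive and false negative counts point to where the model may need refining.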
The second phase of evaluation that may
be used is statistical significance testing.
This type of evaluation can be applied to
the model to ensure that the data is being
properly handled and interpreted within the
model. This is designed to avoid
unnecessary second-guessing when the
answer is revealed.
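The slides do not name a specific test. As one illustrative choice, the sketch below uses an exact one-sided binomial test to ask how likely a given number of correct predictions would be if the model were only guessing. The counts (16 correct out of 20) and the 50 percent chance level are assumptions.

```python
from math import comb


def binomial_p_value(correct, total, chance=0.5):
    """One-sided exact binomial test.

    Probability of at least `correct` hits out of `total` if each
    prediction were right with probability `chance`.
    """
    return sum(
        comb(total, k) * chance**k * (1 - chance) ** (total - k)
        for k in range(correct, total + 1)
    )


# Hypothetical result: 16 of 20 test predictions were correct.
p_value = binomial_p_value(16, 20)
print(f"{p_value:.4f}")  # 0.0059
```

A small value suggests the result is unlikely to be due to chance alone, which is what helps avoid second-guessing later.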
WEEK 3
UNDERSTANDING DEPLOYMENT
Case Study - Understand the results
Case Study - Gathering application
requirements
Assimilate knowledge for business:
· Practical understanding of the meaning of model results
· Implications of model results for designing intervention
actions
FEEDBACK
DEVELOPMENT PHASE
Data Preparation
Model Building
Model Evaluation
TRANSITION TO FEEDBACK PHASE
Deployment
Monitoring
Feedback Collection
FEEDBACK PHASE
Error Analysis
Model Refinement
Retraining
Test
Assessing model performance
CASE STUDY - FEEDBACK
Assessing model performance
Refinement
Redeployment
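The cycle of assessing performance, refining, and redeploying can be sketched as a small loop. Everything here is hypothetical: the deployed model is a threshold rule, the feedback rows are made up, and the 0.8 accuracy bar is an arbitrary example of a trigger for refinement.

```python
# Hypothetical feedback collected after deployment: (age, actual outcome).
feedback = [(33, 0), (45, 0), (48, 0), (58, 1),
            (61, 1), (50, 0), (54, 1), (29, 0)]


def accuracy(threshold, rows):
    """Share of rows where the rule 'age > threshold' matches the outcome."""
    return sum((age > threshold) == bool(actual) for age, actual in rows) / len(rows)


def refine(rows):
    """Retrain: pick the observed age that works best as a threshold."""
    return max(sorted({age for age, _ in rows}), key=lambda t: accuracy(t, rows))


deployed_threshold = 40   # hypothetical model currently in production
MIN_ACCURACY = 0.8        # hypothetical bar below which the model is refined

current = accuracy(deployed_threshold, feedback)   # assess model performance
if current < MIN_ACCURACY:
    deployed_threshold = refine(feedback)          # refinement, then redeployment
updated = accuracy(deployed_threshold, feedback)
print(current, deployed_threshold, updated)  # 0.625 50 1.0
```

Note that the updated score is measured on the same feedback used for refinement, so it is optimistic. In practice the refined model would be assessed on fresh data before redeployment.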