Introduction to data science
IBM SPSS Modeler (v18.1.1)
© Copyright IBM Corporation 2017
Course materials may not be reproduced in whole or in part without the written permission of IBM.
Unit objectives
• List two applications of data science
• Explain the stages in the CRISP-DM methodology
• Describe the skills needed for data science
Introduction to data science © Copyright IBM Corporation 2017
Introduction
• Data is everywhere
• Data science extracts insights and actionable relationships
• Data science is interactive and iterative
• Domain knowledge is required
Data-science use cases (1 of 2)
• Increase customer satisfaction by better addressing the needs of
customers.
• Reduce churn.
• Better target customers by classifying them into groups with distinct
usage or need patterns.
• Reduce costs in a manufacturing process by
preventing machine failures.
• Reduce the incidence of heart attacks among
people with cardiac disease.
Data-science use cases (2 of 2)
• Reduce costs by better targeting customers in direct mail campaigns.
• Reduce costs by preventing fraudulent credit-card activity, or by
detecting it at an earlier stage.
• Increase revenues by increasing the number of products sold
by up- or cross-selling.
• Increase revenues by showing a visitor the next best page
on a website.
Identify the data scientist persona
• Two personas:
  - The traditional data scientist
  - The citizen data scientist
• IBM SPSS Modeler provides the environment for both.
Identify the need for a methodology
• A project can become complicated quickly.
• A methodology is needed that guides you through the critical issues.
• Recommendation: use the Cross-Industry Standard Process for Data
Mining (CRISP-DM).
Identify the stages in CRISP-DM
1. Business understanding
2. Data understanding
3. Data preparation
4. Modeling
5. Evaluation
6. Deployment
Explore stage 1: Business understanding
Task                           Sub task 1                 Sub task 2               Sub task 3
Determine business objectives  Background                 Business objectives      Business success criteria
Assess situation               Inventory of resources     Risks and contingencies  Terminology
Determine modeling objectives  Modeling success criteria
Produce project plan           Write a project plan       Initial assessment of tools and techniques
Explore stage 2: Data understanding
Task                  Sub task 1
Collect initial data  Data-collection report
Describe data         Data-description report
Explore data          Data-exploration report
Verify data quality   Data-quality report
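
To make the "Verify data quality" task concrete, here is a minimal sketch of a data-quality report that counts missing values per field. The records, field names, and the `data_quality_report` helper are hypothetical and not part of the course material; in IBM SPSS Modeler this check would be done with the product's own tools rather than hand-written code.

```python
# Hypothetical records (illustrative only); None marks a missing value.
records = [
    {"customer_id": 1, "age": 34, "region": "North"},
    {"customer_id": 2, "age": None, "region": "South"},
    {"customer_id": 3, "age": 51, "region": None},
    {"customer_id": 4, "age": 29, "region": "North"},
]


def data_quality_report(rows):
    """Return, per field, the count and share of missing values."""
    fields = sorted({name for row in rows for name in row})
    report = {}
    for name in fields:
        missing = sum(1 for row in rows if row.get(name) is None)
        report[name] = {"missing": missing, "share": missing / len(rows)}
    return report


report = data_quality_report(records)
for field, stats in report.items():
    print(f"{field}: {stats['missing']} missing ({stats['share']:.0%})")
```

A real report would usually also cover out-of-range values, duplicates, and inconsistent coding; missing values are just the simplest case to show.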
Explore stage 3: Data preparation
Task                              Sub task 1                             Sub task 2
Select data                       Rationale for inclusion and exclusion
Clean data                        Data-cleaning report
Construct data                    Derived attributes
Format data and combine datasets  Set the unit of analysis               Integrate data
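
As an illustration of "Set the unit of analysis", "Integrate data", and "Derived attributes", the sketch below combines two hypothetical tables into one row per customer and derives two new attributes. The table contents and the names `build_customer_table`, `n_orders`, and `total_spend` are assumptions made for this example only.

```python
# Hypothetical source tables; names and values are illustrative only.
customers = [
    {"customer_id": 1, "region": "North"},
    {"customer_id": 2, "region": "South"},
    {"customer_id": 3, "region": "North"},
]
orders = [
    {"customer_id": 1, "amount": 20.0},
    {"customer_id": 1, "amount": 35.0},
    {"customer_id": 2, "amount": 12.5},
]


def build_customer_table(customers, orders):
    """Return one row per customer (the unit of analysis) with derived attributes."""
    n_orders = {}
    total_spend = {}
    # Aggregate the order-level data up to the customer level.
    for order in orders:
        key = order["customer_id"]
        n_orders[key] = n_orders.get(key, 0) + 1
        total_spend[key] = total_spend.get(key, 0.0) + order["amount"]
    # Integrate the aggregates with the customer table.
    table = []
    for customer in customers:
        key = customer["customer_id"]
        row = dict(customer)
        row["n_orders"] = n_orders.get(key, 0)
        row["total_spend"] = total_spend.get(key, 0.0)
        table.append(row)
    return table


customer_table = build_customer_table(customers, orders)
```

Customers without orders are kept with zero counts, so the result still has exactly one row per customer.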
Explore stage 4: Modeling
Task                        Sub task 1            Sub task 2
Select modeling techniques  Modeling assumptions
Generate test design        Test design
Build model                 Set model parameters  Model descriptions
Assess model                Model assessment      Revise model parameters
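
One common test design is a holdout split: part of the data is set aside before the model is built and used only to assess it. The slide does not prescribe a particular design, so the sketch below is an assumption; `train_test_split`, `test_share`, and `seed` are hypothetical names for this example.

```python
import random


def train_test_split(rows, test_share=0.3, seed=42):
    """Shuffle a copy of the rows and split it into training and test partitions."""
    shuffled = list(rows)
    # A fixed seed makes the partition reproducible across runs.
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_share)
    return shuffled[n_test:], shuffled[:n_test]


rows = list(range(10))  # stand-in for ten prepared records
train, test = train_test_split(rows)
print(len(train), "training rows,", len(test), "test rows")
```

Fixing the seed keeps the split stable when model parameters are revised, so successive model assessments are compared on the same test rows.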
Explore stage 5: Evaluation
Task                  Sub task 1                                                                    Sub task 2
Evaluate results      Assessment of data-science results with respect to business success criteria  Approve models
Review process        Review of process
Determine next steps  List of possible actions                                                      Decision
Explore stage 6: Deployment
Task                  Sub task 1        Sub task 2
Plan deployment       Deployment plan
Maintenance           Maintenance plan
Produce final report  Final report      Final presentation
Review project        Documentation
Identify the life cycle of a data-science project
• The stages influence each other in a non-linear way.
• A data-science project is an ongoing endeavor.
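
The non-linear life cycle can be sketched as a small transition map between the six stages. The links below follow the arrows in the commonly drawn CRISP-DM diagram; the link from deployment back to business understanding is an assumption added here to model the "ongoing endeavor" point, and the names `Stage`, `NEXT_STAGES`, and `is_valid_path` are hypothetical.

```python
from enum import Enum


class Stage(Enum):
    BUSINESS_UNDERSTANDING = 1
    DATA_UNDERSTANDING = 2
    DATA_PREPARATION = 3
    MODELING = 4
    EVALUATION = 5
    DEPLOYMENT = 6


# Allowed moves between stages (an illustrative reading of the CRISP-DM diagram).
NEXT_STAGES = {
    Stage.BUSINESS_UNDERSTANDING: {Stage.DATA_UNDERSTANDING},
    Stage.DATA_UNDERSTANDING: {Stage.BUSINESS_UNDERSTANDING, Stage.DATA_PREPARATION},
    Stage.DATA_PREPARATION: {Stage.MODELING},
    Stage.MODELING: {Stage.DATA_PREPARATION, Stage.EVALUATION},
    Stage.EVALUATION: {Stage.BUSINESS_UNDERSTANDING, Stage.DEPLOYMENT},
    # Assumption: a deployed project raises new business questions.
    Stage.DEPLOYMENT: {Stage.BUSINESS_UNDERSTANDING},
}


def is_valid_path(path):
    """Check that every consecutive pair of stages is an allowed move."""
    return all(nxt in NEXT_STAGES[cur] for cur, nxt in zip(path, path[1:]))


# A project that loops back twice before reaching deployment.
project = [
    Stage.BUSINESS_UNDERSTANDING, Stage.DATA_UNDERSTANDING,
    Stage.BUSINESS_UNDERSTANDING, Stage.DATA_UNDERSTANDING,
    Stage.DATA_PREPARATION, Stage.MODELING,
    Stage.DATA_PREPARATION, Stage.MODELING,
    Stage.EVALUATION, Stage.DEPLOYMENT,
]
print(is_valid_path(project))
```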
Identify the required skills
• Understand the business:
  - Asking the right question requires knowledge of the business and organization.
  - Evaluating a solution requires a business perspective.
• Database knowledge:
  - The database administrator plays a key role.
• Knowledge of modeling:
  - Identify the best model(s) for the situation.
  - Fine-tune models.
• Teamwork combining multiple competencies:
  - Business domain knowledge.
  - Database knowledge.
  - Modeling.
  - Project management.
Unit summary
• List two applications of data science
• Explain the stages in the CRISP-DM methodology
• Describe the skills needed for data science