01-Introduction To Data Science

The document introduces data science, highlighting its applications such as improving customer satisfaction and reducing costs. It outlines the CRISP-DM methodology, which includes stages like business understanding, data preparation, and modeling, as essential for guiding data science projects. Additionally, it emphasizes the importance of various skills, including business knowledge and teamwork, for successful data science initiatives.

Uploaded by

orhan sivrikaya

Introduction to data science

IBM SPSS Modeler (v18.1.1)

© Copyright IBM Corporation 2017


Course materials may not be reproduced in whole or in part without the written permission of IBM.
Unit objectives
• List two applications of data science
• Explain the stages in the CRISP-DM methodology
• Describe the skills needed for data science

Introduction to data science © Copyright IBM Corporation 2017


Introduction
• Data is everywhere
• Data science extracts insights and actionable relationships
• Data science is interactive and iterative
• Domain knowledge is required

Data-science use cases (1 of 2)
• Increase customer satisfaction by better addressing the needs of customers.
• Reduce churn.
• Better target customers by classifying them into groups with distinct usage or need patterns.
• Reduce costs in a manufacturing process by preventing machine failures.
• Reduce the incidence of heart attacks among patients with cardiac disease.

Data-science use cases (2 of 2)
• Reduce costs by better targeting customers in direct mail campaigns.
• Reduce costs by preventing fraudulent credit-card activity, or detecting it at an earlier stage.
• Increase revenues by increasing the number of products sold through up-selling or cross-selling.
• Increase revenues by showing a visitor the best next page on a website.

Identify the data scientist persona
• Two personas:
  - The traditional data scientist
  - The citizen data scientist
• IBM SPSS Modeler provides the environment for both.

Identify the need for a methodology
• A project can become complicated quickly.
• A methodology is needed that guides you through the critical issues.
• Recommendation: use the Cross-Industry Standard Process for Data Mining (CRISP-DM).

Identify the stages in CRISP-DM
1. Business understanding
2. Data understanding
3. Data preparation
4. Modeling
5. Evaluation
6. Deployment
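The six stages listed above can be written down as an ordered enumeration. This is only a minimal Python sketch for keeping track of where a project stands; the names `CrispDmStage`, `stage_label`, and `nominal_next` are hypothetical and are not part of IBM SPSS Modeler or of CRISP-DM itself.

```python
from enum import IntEnum


class CrispDmStage(IntEnum):
    """The six CRISP-DM stages, numbered as in the list on the slide."""
    BUSINESS_UNDERSTANDING = 1
    DATA_UNDERSTANDING = 2
    DATA_PREPARATION = 3
    MODELING = 4
    EVALUATION = 5
    DEPLOYMENT = 6


def stage_label(stage):
    """Return a readable label such as '3. Data preparation'."""
    name = stage.name.replace("_", " ").capitalize()
    return f"{stage.value}. {name}"


def nominal_next(stage):
    """Return the stage that follows in the nominal order.

    Returns None after deployment. Real projects often move back to an
    earlier stage, so this is only the default forward step.
    """
    if stage is CrispDmStage.DEPLOYMENT:
        return None
    return CrispDmStage(stage + 1)


if __name__ == "__main__":
    # Print the stages in their nominal order.
    for stage in CrispDmStage:
        print(stage_label(stage))
```

An integer-valued enumeration keeps the nominal order explicit, so stages can be compared or sorted without extra bookkeeping.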

Explore stage 1: Business understanding

Task                            Sub task 1                  Sub task 2                                   Sub task 3
Determine business objectives   Background                  Business objectives                          Business success criteria
Assess situation                Inventory of resources      Risks and contingencies                      Terminology
Determine modeling objectives   Modeling success criteria
Produce project plan            Write a project plan        Initial assessment of tools and techniques

Explore stage 2: Data understanding

Task                   Sub task 1
Collect initial data   Data-collection report
Describe data          Data-description report
Explore data           Data-exploration report
Verify data quality    Data-quality report

Explore stage 3: Data preparation

Task                               Sub task 1                            Sub task 2
Select data                        Rationale for inclusion and exclusion
Clean data                         Data-cleaning report
Construct data                     Derived attributes
Format data and combine datasets   Set the unit of analysis              Integrate data
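To make the terms "derived attributes", "unit of analysis", and "integrate data" concrete, here is a small Python sketch. The records, field names, and the `prepare` function are invented for illustration only; in the course itself these steps are carried out in IBM SPSS Modeler, not in hand-written code.

```python
from collections import defaultdict

# Hypothetical transaction-level records: one row per purchase.
transactions = [
    {"customer_id": "C1", "amount": 20.0},
    {"customer_id": "C1", "amount": 30.0},
    {"customer_id": "C2", "amount": 15.0},
]

# Hypothetical customer-level records to integrate with the purchases.
customers = {
    "C1": {"region": "North"},
    "C2": {"region": "South"},
}


def prepare(transactions, customers):
    """Return one record per customer with derived spending attributes."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    # Set the unit of analysis: aggregate purchases to the customer level.
    for row in transactions:
        totals[row["customer_id"]] += row["amount"]
        counts[row["customer_id"]] += 1

    prepared = []
    for cid in sorted(totals):
        # Integrate data: attach the customer-level fields, if any.
        record = {"customer_id": cid, **customers.get(cid, {})}
        # Construct data: derived attributes computed from the raw rows.
        record["n_purchases"] = counts[cid]
        record["total_spend"] = totals[cid]
        record["avg_spend"] = totals[cid] / counts[cid]
        prepared.append(record)
    return prepared


if __name__ == "__main__":
    for record in prepare(transactions, customers):
        print(record)
```

The result has one row per customer rather than one row per purchase, which is the form a customer-level model would need.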

Explore stage 4: Modeling

Task                         Sub task 1             Sub task 2
Select modeling techniques   Modeling assumptions
Generate test design         Test design
Build model                  Set model parameters   Model descriptions
Assess model                 Model assessment       Revise model parameters
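The slide does not prescribe a particular test design. One common choice, assumed here purely for illustration, is a holdout split: part of the data is set aside before the model is built and is used only to assess it. The function name `split_train_test` and its default values are hypothetical.

```python
import random


def split_train_test(records, test_fraction=0.3, seed=42):
    """Split records into (train, test) lists using a seeded shuffle.

    The fixed seed is an assumption made for reproducibility, so that the
    same split is produced every time the design is rerun.
    """
    if not 0 < test_fraction < 1:
        raise ValueError("test_fraction must be between 0 and 1")
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    # The first n_test shuffled records form the test set, the rest train.
    return shuffled[n_test:], shuffled[:n_test]


if __name__ == "__main__":
    train, test = split_train_test(range(10))
    print(len(train), "training records,", len(test), "test records")
```

Assessing the model on records it has not seen during building gives a more honest picture of how it may behave after deployment.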

Explore stage 5: Evaluation

Task                   Sub task 1                                                                      Sub task 2
Evaluate results       Assessment of data-science results with respect to business success criteria   Approve models
Review process         Review of process
Determine next steps   List of possible actions                                                        Decision

Explore stage 6: Deployment

Task                   Sub task 1         Sub task 2
Plan deployment        Deployment plan
Maintenance            Maintenance plan
Produce final report   Final report       Final presentation
Review project         Documentation

Identify the life cycle of a data-science project
• The stages influence each other in a non-linear way.
• A data science project is an ongoing endeavor.
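The non-linear life cycle can be pictured as a small graph of allowed moves between stages. The transitions below are an assumption based on the arrows in the commonly published CRISP-DM diagram; the slide itself only states that the stages influence each other. The names `TRANSITIONS` and `is_valid_path` are hypothetical.

```python
# Assumed stage-to-stage moves, following the usual CRISP-DM diagram.
TRANSITIONS = {
    "Business understanding": {"Data understanding"},
    "Data understanding": {"Business understanding", "Data preparation"},
    "Data preparation": {"Modeling"},
    "Modeling": {"Data preparation", "Evaluation"},
    "Evaluation": {"Business understanding", "Deployment"},
    # Assumption: after deployment a new cycle starts, which models the
    # idea that a data-science project is an ongoing endeavor.
    "Deployment": {"Business understanding"},
}


def is_valid_path(path):
    """Return True if every consecutive pair of stages is an allowed move."""
    return all(nxt in TRANSITIONS[cur] for cur, nxt in zip(path, path[1:]))


if __name__ == "__main__":
    # A project that returns to data preparation after a first modeling round.
    path = [
        "Business understanding",
        "Data understanding",
        "Data preparation",
        "Modeling",
        "Data preparation",
        "Modeling",
        "Evaluation",
        "Deployment",
    ]
    print(is_valid_path(path))
```

Stepping back from modeling to data preparation, or from evaluation to business understanding, is treated here as a normal move rather than a failure of the plan.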

Identify the required skills
• Understand the business:
  - Asking the right question requires knowledge of the business and organization.
  - Evaluating a solution requires a business perspective.
• Database knowledge:
  - The database administrator plays a key role.
• Knowledge of modeling:
  - Identify the best model(s) for the situation.
  - Fine-tune models.
• Teamwork combining multiple competencies:
  - Business domain knowledge.
  - Database knowledge.
  - Modeling.
  - Project management.
Unit summary
• List two applications of data science
• Explain the stages in the CRISP-DM methodology
• Describe the skills needed for data science
