Session1 - Introduction To Data Science

The document provides an introduction to data science, covering topics such as what data science is, real-life use cases, and the data science methodology and lifecycle. It defines data science and discusses skills needed for a data scientist. Common data science use cases in various domains like finance, healthcare, marketing, transport and manufacturing are presented. The CRISP-DM methodology involving business understanding, data understanding, data preparation, modeling, evaluation and deployment is introduced.

Introduction to Data Science

Accelerated Machine Learning Program

Program Studi Independen Bersertifikat


Zenius Bersama Kampus Merdeka
CLASS AGENDA

1. What is Data Science?
2. Data Science Use Cases in Real Life
3. Data Science Methodology & Life Cycle
4. Tools & Tech Stacks for Data Scientists

© 2022 Program Studi Independen Bersertifikat Zenius Bersama Kampus Merdeka



What is Data Science?


Introducing Data Science and
Machine Learning

[Venn diagram: Computer Science, Math & Statistics, and Domain Expertise overlap. The pairwise overlaps are labeled Computer Scientist, Software Developer, and Data Analyst; the Data Scientist sits at the center.]

Data Science is the art of working with data: extracting, cleaning, and analyzing it, then turning it into insights, predictions, and decisions.


Data Science Is Not Only About the Jargon

[Diagram relating Artificial Intelligence, Machine Learning, Deep Learning, and Data Science.]


Skills That Make a Data Scientist

[Diagram: the Data Scientist at the center, surrounded by five skills: machine learning, programming languages, analytical thinking, database queries, and unstructured data analysis.]


What is Machine Learning?


Simply, machine learning finds patterns in data and uses them to make predictions.

[Diagram: an input image and its annotation ("This is an apple") are fed to a machine learning model, which outputs the prediction "It's an apple".]
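To make "finds patterns in data and uses them to make predictions" concrete, here is a minimal sketch in plain Python. The fruit measurements, the labels, and the nearest-centroid approach are all assumptions chosen for illustration; the slides do not prescribe a specific algorithm.

```python
from math import dist

# Hypothetical annotated examples: (weight in grams, redness from 0 to 1) -> label
training_data = [
    ((150.0, 0.90), "apple"),
    ((170.0, 0.80), "apple"),
    ((120.0, 0.20), "banana"),
    ((130.0, 0.10), "banana"),
]

def fit_centroids(examples):
    """Learn one average point (centroid) per label from the annotated examples."""
    sums = {}
    for features, label in examples:
        total, count = sums.get(label, ([0.0] * len(features), 0))
        sums[label] = ([t + f for t, f in zip(total, features)], count + 1)
    return {label: tuple(t / n for t in total) for label, (total, n) in sums.items()}

def predict(centroids, features):
    """Predict the label whose centroid is closest to the new input."""
    return min(centroids, key=lambda label: dist(centroids[label], features))

model = fit_centroids(training_data)   # the "pattern" found in the data
print(predict(model, (160.0, 0.85)))   # prediction for an unseen input
```

The annotations play the role of the labels in `training_data`, and the prediction is whatever label the learned centroids put closest to the new input.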


AI vs Machine Learning vs Deep Learning


Artificial Intelligence: Programs with the ability to learn like humans
Machine Learning: Algorithms with the ability to learn without being explicitly programmed
Deep Learning: Subset of machine learning in which artificial neural networks adapt & learn from data
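The phrase "without being explicitly programmed" can be illustrated with a small contrast. In this sketch, the spam scenario, the data, and every function name are hypothetical: one function uses a threshold a human wrote down, the other derives its threshold from labeled examples.

```python
def rule_based_is_spam(num_links):
    """Explicitly programmed: a human chose the threshold 3."""
    return num_links > 3

def learn_threshold(examples):
    """Learn a threshold from labeled data by picking the candidate with the fewest mistakes."""
    candidates = sorted({x for x, _ in examples})

    def errors(threshold):
        return sum((x > threshold) != label for x, label in examples)

    return min(candidates, key=errors)

# Hypothetical labeled data: (number of links in an email, is_spam)
examples = [(0, False), (1, False), (2, False), (5, True), (7, True), (9, True)]
threshold = learn_threshold(examples)

def learned_is_spam(num_links):
    """Same kind of rule, but the threshold came from the data."""
    return num_links > threshold

print(threshold, rule_based_is_spam(4), learned_is_spam(4))
```

If the labeled examples change, `learn_threshold` adapts on its own, whereas the rule-based function only changes when someone edits the code.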


Pop Quiz!
Which category do rule-based systems (such as a
chess-playing program) belong to?
A. AI with Machine Learning
B. AI without Machine Learning


Pop Quiz!
Predicting house prices with linear
regression counts as deep learning.
A. True
B. False
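For reference, a minimal sketch of what fitting a linear regression to house prices involves. The areas and prices below are made up, and the single-feature least-squares formula is just one simple way to fit the line.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one feature: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical data: house area in square meters -> price in arbitrary units
areas = [50.0, 80.0, 100.0, 120.0]
prices = [150.0, 240.0, 300.0, 360.0]

slope, intercept = fit_line(areas, prices)
predicted = slope * 90.0 + intercept   # price estimate for a 90 m2 house
print(slope, intercept, predicted)
```

The whole model is two numbers obtained from a closed-form formula; no neural network is involved.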


Data Science Use Cases in Real Life
Real applications and use cases


1. Financial & Risk Management


Credit Scoring, Fraud Detection, Stock Market Price Prediction


2. Healthcare
Medical Image Analysis, Genetics & Genomics, Virtual Assistance


3. Marketing
Targeted Ads/Campaigns, Product Recommendation, Customer Segmentation


4. Transport
Self-Driving Cars, Route Optimization, Traffic Management


5. Manufacturing
Monitoring Systems, Anomaly Detection, Scheduling


Pop Quiz!
Netflix utilizes data science.

A. True
B. False


Pop Quiz!
Recommendation engines provide
random recommendations.
A. True
B. False


Data Science Methodology & Life Cycle
Methodology, Workflow, Cycle


CRISP-DM
Cross Industry Standard Process for Data Mining

[Diagram: the CRISP-DM cycle with its six phases: Business Understanding, Data Understanding, Data Preparation, Modeling, Evaluation, and Deployment.]

Business Understanding

This phase is about understanding a project's objectives and requirements from the
business viewpoint. That perspective is then used to decide which business problems
to solve with data mining.

Data Understanding

This phase allows us to become familiar with the data, and it involves
performing exploratory data analysis. Such initial data exploration may help us
figure out which subsets of data to use for further modeling, and it aids in
generating hypotheses to explore.
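As a rough illustration of a first look at a data set, the sketch below profiles each column for present, missing, and distinct values. The records, column names, and the `profile` helper are hypothetical; in practice a data analysis library would usually do this.

```python
# Hypothetical raw records; None marks a missing value
records = [
    {"age": 34, "city": "Jakarta", "income": 5200},
    {"age": None, "city": "Bandung", "income": 4100},
    {"age": 29, "city": "Jakarta", "income": None},
    {"age": 41, "city": None, "income": 6100},
]

def profile(rows):
    """Summarize each column: non-missing count, missing count, distinct values."""
    report = {}
    for column in rows[0].keys():
        values = [row[column] for row in rows]
        present = [v for v in values if v is not None]
        report[column] = {
            "present": len(present),
            "missing": len(values) - len(present),
            "distinct": len(set(present)),
        }
    return report

for column, summary in profile(records).items():
    print(column, summary)
```

A summary like this can hint at which columns are usable as they are and which will need attention during data preparation.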

Data Preparation

This is often considered the most time-consuming phase of the data mining
process, as it involves rigorous data cleaning and pre-processing as well as the
handling of missing data.
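Two typical preparation steps, sketched under simple assumptions: filling missing numbers with the column mean and making text labels uniform. Mean imputation is only one of several possible strategies, and the helper names and sample values are made up.

```python
from statistics import mean

def impute_mean(values):
    """Replace missing entries (None) with the mean of the values that are present."""
    present = [v for v in values if v is not None]
    fill = mean(present)
    return [fill if v is None else v for v in values]

def clean_label(text):
    """Make text labels uniform: trim surrounding whitespace and lowercase."""
    return text.strip().lower()

incomes = [10.0, None, 20.0]             # hypothetical column with a gap
cities = ["  Jakarta", "jakarta ", "JAKARTA"]

print(impute_mean(incomes))
print({clean_label(c) for c in cities})
```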

Modeling

The pre-processed data are used for model building, in which learning algorithms
perform multivariate analysis.
Iterate between model building and assessment until you are confident that you have
found the best model(s).
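One way to picture the build-and-assess loop is to fit several candidate models on training data and compare them on held-out data. The sketch below assumes a tiny made-up data set, two candidate models (a mean baseline and a straight line), and mean absolute error as the yardstick; none of these choices come from the slides.

```python
from statistics import mean

def fit_mean(xs, ys):
    """Baseline model: always predict the average of the training targets."""
    average = mean(ys)
    return lambda x: average

def fit_line(xs, ys):
    """Least-squares straight line through the training points."""
    mean_x, mean_y = mean(xs), mean(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: slope * x + intercept

def mean_absolute_error(model, xs, ys):
    """Average size of the prediction errors on the given data."""
    return mean(abs(model(x) - y) for x, y in zip(xs, ys))

# Hypothetical data, split into a training part and a held-out validation part
train_x, train_y = [1.0, 2.0, 3.0, 4.0], [2.1, 3.9, 6.2, 7.8]
valid_x, valid_y = [5.0, 6.0], [10.1, 11.9]

candidates = {"mean baseline": fit_mean, "linear": fit_line}
scores = {
    name: mean_absolute_error(fit(train_x, train_y), valid_x, valid_y)
    for name, fit in candidates.items()
}
best = min(scores, key=scores.get)
print(scores, best)
```

Adding another candidate to `candidates` and rerunning the comparison is one iteration of the loop.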

Evaluation

It is important to evaluate the model results and review the process performed to
determine whether the originally set business objectives have been met.

If appropriate, some steps may need to be performed again. Rinse and
repeat. Once the results and process are deemed satisfactory, we are
ready to move to deployment. Findings from this evaluation phase
may also spark new project ideas to explore.

Deployment

Once the model is of satisfactory quality, it is deployed. Deployment may
take many forms, from a simple report to an API that can be accessed via
programmatic calls or a web application.
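As a sketch of the "API accessed via programmatic calls" option, the function below turns a JSON request into a JSON response around a model. Everything here is hypothetical: `predict_price` stands in for a trained model with made-up coefficients, and `handle_request` and the `area_m2` field are illustrative names. A real deployment would place such a function behind a web framework or server.

```python
import json

def predict_price(area_m2):
    """Stand-in for a trained model; the coefficients are hypothetical."""
    return 3.0 * area_m2 + 10.0

def handle_request(body):
    """Turn a JSON request such as '{"area_m2": 90}' into a JSON response."""
    try:
        payload = json.loads(body)
        area = float(payload["area_m2"])
    except (ValueError, KeyError, TypeError):
        return json.dumps({"error": "expected JSON with a numeric 'area_m2' field"})
    return json.dumps({"predicted_price": predict_price(area)})

print(handle_request('{"area_m2": 90}'))
print(handle_request("not json"))
```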


OSEMN
[Diagram: the OSEMN cycle: Obtain Data (O), Scrub Data (S), Explore Data (E), Model Data (M), Interpret Results (N).]

Obtain Data

Data is the prerequisite of the data science process. It can come from
pre-existing data sets, be newly acquired (from surveys), newly queried
(from databases or APIs), downloaded from the internet (e.g. from repositories
available on the cloud such as GitHub), or extracted.

Scrub Data

Scrubbing the data is essentially data cleaning. This phase is considered
the most time-consuming, as it involves handling missing data and
pre-processing the data to be as error-free and uniform as possible.

Explore Data

This is essentially exploratory data analysis. This phase allows us to gain an
understanding of the data so that we can figure out the course of action and the
areas to explore in the modeling phase. It entails the use of
descriptive statistics and data visualizations.
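A small sketch of both ideas using only the Python standard library: descriptive statistics for a numeric column, plus a text histogram as a stand-in for a real chart. The ages and the helper names are made up for illustration.

```python
from collections import Counter
from statistics import mean, median, stdev

ages = [23, 25, 31, 35, 35, 42, 58]   # hypothetical sample

def describe(values):
    """Basic descriptive statistics for a numeric column."""
    return {
        "count": len(values),
        "mean": mean(values),
        "median": median(values),
        "stdev": stdev(values),
        "min": min(values),
        "max": max(values),
    }

def histogram(values, bin_width):
    """Count how many values fall into each bin of the given width."""
    counts = Counter((v // bin_width) * bin_width for v in values)
    return dict(sorted(counts.items()))

print(describe(ages))
for start, count in histogram(ages, 10).items():
    print(f"{start}-{start + 9}: {'#' * count}")
```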

Model Data

Here, we make use of machine learning algorithms in efforts to make sense of data
and gain useful insights that are essential for data-driven decision-making.

Interpret Results

This is perhaps one of the most important phases, and yet the least technical, as it pertains to
actually making sense of the data by figuring out how to simplify and summarize the results
from all the models built.

This includes drawing meaningful conclusions and rationalizing actionable insights that
allow us to figure out what the next course of action is. For example:
which features most influence the class labels?
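To give a feel for the "most important features" question, the sketch below ranks features by how far apart the two class means are relative to the feature's overall spread. This is a deliberately crude score chosen for illustration, not a method from the slides, and the fruit data and names are hypothetical.

```python
from statistics import mean, pstdev

# Hypothetical labeled rows: (features, class label)
rows = [
    ({"weight": 150, "redness": 0.90}, "apple"),
    ({"weight": 120, "redness": 0.80}, "apple"),
    ({"weight": 135, "redness": 0.85}, "apple"),
    ({"weight": 140, "redness": 0.10}, "banana"),
    ({"weight": 125, "redness": 0.20}, "banana"),
    ({"weight": 145, "redness": 0.15}, "banana"),
]

def separation_scores(rows, class_a, class_b):
    """Score each feature by the gap between class means, scaled by overall spread."""
    scores = {}
    for name in rows[0][0].keys():
        a = [features[name] for features, label in rows if label == class_a]
        b = [features[name] for features, label in rows if label == class_b]
        spread = pstdev(a + b)
        scores[name] = abs(mean(a) - mean(b)) / spread if spread else 0.0
    # Highest score first: the feature that best separates the two classes
    return dict(sorted(scores.items(), key=lambda item: item[1], reverse=True))

ranking = separation_scores(rows, "apple", "banana")
print(ranking)
```

In this made-up data, redness separates the two classes far better than weight, which is the kind of plain-language finding this phase aims to report.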


Pop Quiz!
Which of these is NOT part of the
CRISP-DM Data Understanding phase?
A. Defining the problems that we want to solve.
B. Finding and identifying any problems within the data sets.
C. Cleaning and addressing any problems with the data sets.


Pop Quiz!
The CRISP-DM Evaluation phase is
similar to which OSEMN step?
A. O
B. S
C. E
D. M
E. N

Tools & Tech Stacks for Data Scientists



Thank you!
Any questions?
