Introduction to
Data Science
Accelerated Machine Learning Program
Program Studi Independen Bersertifikat
Zenius Bersama Kampus Merdeka
CLASS AGENDA
1. What is Data Science?
2. Data Science Use Cases in Real Life
3. Data Science Methodology & Life Cycle
4. Tools & Tech Stacks for Data Scientists
© 2022 Program Studi Independen Bersertifikat Zenius Bersama Kampus Merdeka
What is Data Science?
Introducing Data Science and
Machine Learning
Introduction to Data Science
What is Data Science?
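[Venn diagram: three overlapping circles labeled Computer Science, Math & Statistics, and Domain Expertise. The pairwise overlaps are labeled Computer Scientist, Software Developer, and Data Analyst; the center, where all three meet, is labeled Data Scientist.]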
Data Science is the art of working with data: extracting, cleaning, and analyzing it, then turning it into insights, predictions, and decisions.
Data Science Is Not Only About the Jargon
[Diagram relating Data Science to Artificial Intelligence, Machine Learning, and Deep Learning]
Skills That Make a Data Scientist
Machine learning
Programming languages
Analytical thinking
Database querying
Unstructured data analysis
What is Machine Learning?
Simply, machine learning finds patterns in data and uses them to make predictions.
[Diagram: an input image and its annotation ("This is an apple") are fed to a machine learning model, which outputs a prediction ("It's an apple").]
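The idea of finding patterns in annotated data and using them to predict can be sketched in a few lines of Python. This is only an illustrative toy: the fruit measurements (weight in grams and a 0-1 redness score), the labels, and the function names `fit_centroids` and `predict` are all assumptions made up for this example. The "model" learns one average point per label, then predicts the label whose average is closest to a new input.

```python
from math import dist

# Hypothetical annotated examples: (weight in grams, redness score 0-1) -> label
training_data = [
    ((150.0, 0.90), "apple"),
    ((170.0, 0.80), "apple"),
    ((160.0, 0.85), "apple"),
    ((120.0, 0.10), "banana"),
    ((130.0, 0.20), "banana"),
    ((125.0, 0.15), "banana"),
]

def fit_centroids(examples):
    """Learn one average feature vector (centroid) per label."""
    grouped = {}
    for features, label in examples:
        grouped.setdefault(label, []).append(features)
    return {
        label: tuple(sum(col) / len(col) for col in zip(*rows))
        for label, rows in grouped.items()
    }

def predict(centroids, features):
    """Predict the label whose centroid is closest to the input."""
    return min(centroids, key=lambda label: dist(centroids[label], features))

model = fit_centroids(training_data)
prediction = predict(model, (155.0, 0.88))
print(f"It's a(n) {prediction}")
```

Nothing in the code mentions apples or bananas explicitly; the labels come entirely from the annotated examples. Note that the features are not rescaled here, so weight dominates the distance; a real project would normally handle that during data preparation.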
AI vs Machine Learning vs Deep Learning
Artificial Intelligence: Programs with the ability to learn and reason like humans
Machine Learning: Algorithms with the ability to learn without being explicitly programmed
Deep Learning: Subset of machine learning in which artificial neural networks adapt & learn from data
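The phrase "without being explicitly programmed" can be made concrete with a small contrast. In the sketch below, everything is hypothetical: the spam example, the data, and the function names. One function uses a threshold a human typed in by hand; the other derives its threshold from labeled examples.

```python
# Hypothetical labeled examples: (number of links in an email, is_spam)
examples = [(0, False), (1, False), (2, False), (5, True), (7, True), (9, True)]

def rule_based_is_spam(link_count):
    """Explicitly programmed: a human chose the threshold 10 by hand."""
    return link_count >= 10

def learn_threshold(data):
    """Learned: choose the threshold with the fewest mistakes on the data."""
    candidates = sorted({count for count, _ in data})

    def mistakes(threshold):
        return sum((count >= threshold) != label for count, label in data)

    return min(candidates, key=mistakes)

learned_threshold = learn_threshold(examples)

def learned_is_spam(link_count):
    """Same decision shape as the rule, but the threshold came from data."""
    return link_count >= learned_threshold

print(rule_based_is_spam(7), learned_is_spam(7))
```

If the labeled examples change, `learn_threshold` adapts on its own, while the hand-written rule stays fixed until someone edits the code.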
Pop Quiz!
Which category do rule-based systems (such as a chess-playing program)
belong to?
A. AI with Machine Learning
B. AI without Machine Learning
Pop Quiz!
Predicting house prices with linear
regression counts as deep learning.
A. True
B. False
Data Science Use Cases
in Real Life
Real applications and use cases
1. Financial & Risk Management
Credit Scoring, Fraud Detection, Stock Market Price Prediction
2. Healthcare
Medical Image Analysis, Genetics & Genomics, Virtual Assistance
3. Marketing
Targeted Ads/Campaigns, Product Recommendation, Customer Segmentation
4. Transport
Self-Driving Cars, Route Optimization, Traffic Management
5. Manufacturing
Monitoring Systems, Anomaly Detection, Scheduling
Pop Quiz!
Netflix utilizes data science.
A. True
B. False
Pop Quiz!
Recommendation engines provide
random recommendations.
A. True
B. False
Data Science Methodology & Life Cycle
Methodology, Workflow, Cycle
CRISP-DM
Cross-Industry Standard Process for Data Mining
[Cycle diagram: Business Understanding → Data Understanding → Data Preparation → Modeling → Evaluation → Deployment]
Business Understanding
This phase entails understanding the project's objectives and requirements from a
business viewpoint. That perspective is then used to decide which business problems
to solve with data mining.
Data Understanding
This phase lets us become familiar with the data by performing exploratory data
analysis. Such initial exploration helps us figure out which subsets of the data to
use for further modeling and aids in generating hypotheses to explore.
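A first pass at getting familiar with a dataset often amounts to summarizing each column. The sketch below uses only Python's standard library; the house records, the column names, and the `summarize` helper are hypothetical and exist only to illustrate the idea.

```python
import statistics

# Hypothetical dataset: house records with a missing value marked as None
houses = [
    {"area_m2": 50, "bedrooms": 1, "price": 100},
    {"area_m2": 80, "bedrooms": 2, "price": 160},
    {"area_m2": 120, "bedrooms": 3, "price": 250},
    {"area_m2": None, "bedrooms": 2, "price": 150},
    {"area_m2": 200, "bedrooms": 4, "price": 390},
]

def summarize(records, column):
    """Return basic descriptive statistics for one numeric column."""
    values = [r[column] for r in records if r[column] is not None]
    return {
        "count": len(values),
        "missing": len(records) - len(values),
        "min": min(values),
        "max": max(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
    }

summary = {col: summarize(houses, col) for col in ("area_m2", "bedrooms", "price")}
for col, stats in summary.items():
    print(col, stats)
```

Even a summary this small surfaces questions worth carrying forward, such as why one area value is missing and whether the largest house is an outlier.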
Data Preparation
This is often considered the most time-consuming phase of the data mining process,
as it involves rigorous data cleaning and pre-processing, including the handling of
missing data.
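Two common preparation steps are filling in missing values and rescaling a column. The sketch below shows one possible approach (median imputation followed by min-max scaling); the `ages` column and both helper functions are hypothetical, and other strategies may suit a given dataset better.

```python
import statistics

# Hypothetical raw column with missing entries (None)
ages = [25, None, 31, 47, None, 39]

def impute_median(values):
    """Replace missing entries with the median of the observed values."""
    observed = [v for v in values if v is not None]
    median = statistics.median(observed)
    return [median if v is None else v for v in values]

def min_max_scale(values):
    """Rescale values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column carries no information; map it to zeros
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

ages_filled = impute_median(ages)
ages_scaled = min_max_scale(ages_filled)
print(ages_filled)
print(ages_scaled)
```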
Modeling
The pre-processed data are used for model building, in which learning algorithms are
used to perform multivariate analysis.
Iterate between model building and assessment until you are confident that you have
found the best model(s).
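The build-and-assess loop can be sketched as fitting several candidate models and comparing them on data held out from training. The example below is a minimal illustration under assumed conditions: the house data, the train/validation split, and the two candidates (a mean baseline and a least-squares line) are all hypothetical choices, not a prescribed method.

```python
import statistics

# Hypothetical data: (area in m2, price); split by hand into train/validation
train = [(50, 100), (80, 160), (120, 240), (150, 300)]
validation = [(60, 125), (100, 195), (140, 285)]

def fit_mean_baseline(data):
    """Candidate 1: always predict the average training price."""
    mean_price = statistics.mean(y for _, y in data)
    return lambda x: mean_price

def fit_linear_regression(data):
    """Candidate 2: least-squares line y = slope * x + intercept."""
    xs = [x for x, _ in data]
    ys = [y for _, y in data]
    x_mean, y_mean = statistics.mean(xs), statistics.mean(ys)
    slope = sum((x - x_mean) * (y - y_mean) for x, y in data) / sum(
        (x - x_mean) ** 2 for x in xs
    )
    intercept = y_mean - slope * x_mean
    return lambda x: slope * x + intercept

def mean_squared_error(model, data):
    """Average squared difference between predictions and actual values."""
    return statistics.mean((model(x) - y) ** 2 for x, y in data)

candidates = {
    "mean baseline": fit_mean_baseline,
    "linear regression": fit_linear_regression,
}
# Fit every candidate on the training data, then assess it on validation data
scores = {
    name: mean_squared_error(fit(train), validation)
    for name, fit in candidates.items()
}
best_name = min(scores, key=scores.get)
print(scores, best_name)
```

Including a trivial baseline is a deliberate choice: a candidate that cannot beat "always predict the average" is not worth deploying.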
Evaluation
It is important to evaluate the model results and review the process performed to
determine whether the originally set business objectives have been met.
If deemed appropriate, some steps may need to be performed again. Rinse and
repeat. Once the results and process are deemed satisfactory, we are ready to move
to deployment. Additionally, some findings in this evaluation phase may spark new
project ideas to explore.
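Checking results against business objectives can be as simple as comparing a few metrics with agreed targets. In this sketch the fraud labels, the predictions, and the two target values are all hypothetical; in practice the targets would come from the Business Understanding phase.

```python
# Hypothetical held-out labels and model predictions (1 = fraud, 0 = legitimate)
actual    = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0]
predicted = [1, 0, 1, 0, 0, 1, 1, 0, 1, 0]

# Hypothetical business objectives
REQUIRED_RECALL = 0.75     # catch at least 75% of fraud cases
REQUIRED_PRECISION = 0.70  # at most 30% of alerts may be false alarms

def precision_recall(y_true, y_pred):
    """Compute precision and recall for the positive class (label 1)."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

precision, recall = precision_recall(actual, predicted)
objectives_met = precision >= REQUIRED_PRECISION and recall >= REQUIRED_RECALL
print(precision, recall, objectives_met)
```

If `objectives_met` were false, that would be the signal to loop back to an earlier phase rather than move on to deployment.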
Deployment
Once the model is of satisfactory quality, it is deployed. Deployment may range
from a simple report to an API that can be accessed via programmatic calls, a web
application, etc.
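To give a feel for the API option, the sketch below wraps a model in a function that accepts a JSON request and returns a JSON response. The model coefficients, the `area_m2` field, and the handler name are hypothetical, and the web framework that would actually receive HTTP calls is deliberately left out.

```python
import json

# Hypothetical trained model: coefficients assumed to come from the Modeling phase
MODEL = {"slope": 2.0, "intercept": 0.0}

def predict_price(area_m2):
    """Apply the trained linear model to one input."""
    return MODEL["slope"] * area_m2 + MODEL["intercept"]

def handle_request(request_body):
    """Turn a JSON request string into a JSON response string.

    A web framework or API gateway (not shown) would call this handler.
    """
    try:
        payload = json.loads(request_body)
        area = float(payload["area_m2"])
    except (ValueError, KeyError, TypeError):
        # Malformed JSON, a missing field, or a non-numeric value
        return json.dumps({"error": 'expected JSON like {"area_m2": 80}'})
    return json.dumps({"predicted_price": predict_price(area)})

response = handle_request('{"area_m2": 80}')
print(response)
```

Keeping input validation inside the handler means a malformed call gets a readable error message instead of crashing the service.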
OSEMN
[Cycle diagram: O = Obtain Data → S = Scrub Data → E = Explore Data → M = Model Data → N = iNterpret Results]
Obtain Data
Data is the prerequisite of the data science process. It can come from pre-existing
datasets, be newly acquired (e.g. from surveys), be newly queried (from databases or
APIs), be downloaded from the internet (e.g. from repositories available on the
cloud such as GitHub), or be extracted from other sources.
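Two of these sources, a downloaded file and a database query, can be sketched with Python's standard library alone. Everything here is a stand-in: the CSV text replaces a real downloaded file, the in-memory SQLite database replaces a real database, and the table and column names are hypothetical.

```python
import csv
import io
import sqlite3

# Source 1: a CSV file (an in-memory string stands in for a downloaded file)
csv_text = "customer_id,city\n1,Jakarta\n2,Bandung\n"
customers = list(csv.DictReader(io.StringIO(csv_text)))

# Source 2: a database query (in-memory SQLite stands in for a real database)
connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE orders (customer_id INTEGER, amount REAL)")
connection.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [(1, 120.0), (1, 80.0), (2, 50.0)],
)
totals = dict(
    connection.execute(
        "SELECT customer_id, SUM(amount) FROM orders GROUP BY customer_id"
    )
)
connection.close()

# Combine both sources into one dataset keyed by customer
dataset = [
    {**row, "total_spent": totals.get(int(row["customer_id"]), 0.0)}
    for row in customers
]
print(dataset)
```

Note that the CSV reader returns every field as text, which is why `customer_id` has to be converted with `int` before it can be matched against the database result.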
Scrub Data
Scrubbing the data is essentially data cleaning. This phase is often considered the
most time-consuming, as it involves handling missing data and pre-processing the
data to be as error-free and uniform as possible.
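Making data "uniform" usually means normalizing text, unifying units, and removing duplicates. The sketch below illustrates those three steps on made-up survey rows; the field names, the supported unit formats, and the helper functions are assumptions for this example only.

```python
# Hypothetical raw survey rows with inconsistent formatting and duplicates
raw_rows = [
    {"city": " jakarta ", "height": "170 cm"},
    {"city": "JAKARTA", "height": "1.70 m"},
    {"city": "Bandung", "height": "165cm"},
    {"city": "Bandung", "height": "165cm"},     # exact duplicate
    {"city": "Surabaya", "height": "unknown"},  # unparseable value
]

def parse_height_cm(text):
    """Convert '170 cm' or '1.70 m' to centimetres; return None if unparseable."""
    cleaned = text.strip().lower().replace(" ", "")
    try:
        if cleaned.endswith("cm"):
            return round(float(cleaned[:-2]), 1)
        if cleaned.endswith("m"):
            return round(float(cleaned[:-1]) * 100, 1)
    except ValueError:
        pass
    return None

def scrub(rows):
    """Normalize text, unify units, and drop rows that become duplicates."""
    seen, clean = set(), []
    for row in rows:
        record = (row["city"].strip().title(), parse_height_cm(row["height"]))
        if record not in seen:
            seen.add(record)
            clean.append({"city": record[0], "height_cm": record[1]})
    return clean

clean_rows = scrub(raw_rows)
print(clean_rows)
```

Deduplicating after normalization matters: the first two rows only turn out to be the same record once the city name and the height unit have been made consistent. The unparseable height is kept as `None` so that it can be handled explicitly as missing data.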
Explore Data
This is essentially exploratory data analysis. This phase allows us to gain an
understanding of the data so that we can figure out the course of action and the
areas that we can explore in the modeling phase. This entails the use of
descriptive statistics and data visualizations.
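A descriptive statistic and a simple visualization can both be produced without any plotting library. The sketch below computes a Pearson correlation and prints a text histogram; the study-hours data and both function names are hypothetical, and a real project would more likely use a dedicated charting tool.

```python
from math import sqrt

# Hypothetical cleaned data: study hours and exam scores for the same students
hours  = [1, 2, 3, 4, 5, 6]
scores = [52, 55, 61, 70, 74, 80]

def pearson_correlation(xs, ys):
    """Measure the strength of the linear relationship between two columns."""
    n = len(xs)
    x_mean, y_mean = sum(xs) / n, sum(ys) / n
    covariance = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    x_spread = sqrt(sum((x - x_mean) ** 2 for x in xs))
    y_spread = sqrt(sum((y - y_mean) ** 2 for y in ys))
    return covariance / (x_spread * y_spread)

def text_histogram(values, bin_width):
    """Count integer values per bin and draw each bin as a row of '#' characters."""
    counts = {}
    for v in values:
        start = (v // bin_width) * bin_width
        counts[start] = counts.get(start, 0) + 1
    return [
        f"{start}-{start + bin_width - 1}: {'#' * counts[start]}"
        for start in sorted(counts)
    ]

correlation = pearson_correlation(hours, scores)
histogram = text_histogram(scores, bin_width=10)
print(f"correlation = {correlation:.2f}")
print("\n".join(histogram))
```

A correlation close to 1 here would suggest that a linear model is a reasonable first thing to try in the modeling phase.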
Model Data
Here, we use machine learning algorithms to make sense of the data and gain useful
insights that are essential for data-driven decision-making.
Interpret Results
This is perhaps one of the most important phases, and yet the least technical, as it pertains to
actually making sense of the data by figuring out how to simplify and summarize the results
from all the models built.
This includes drawing meaningful conclusions and rationalizing actionable insights that
allow us to figure out what the next course of action is. For example,
which features most strongly influence the class labels?
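One very simple way to approach the "most important features" question is to check how far apart the two classes sit on each feature. The sketch below ranks features by the gap between class means, measured in standard deviations. This is a rough heuristic chosen for illustration, not the method used in the slides; the churn data, the feature names, and `class_separation` are all hypothetical.

```python
import statistics

# Hypothetical labeled data: customer features and whether they churned
rows = [
    {"monthly_fee": 50, "support_calls": 5, "age": 30, "churned": 1},
    {"monthly_fee": 55, "support_calls": 6, "age": 45, "churned": 1},
    {"monthly_fee": 52, "support_calls": 7, "age": 38, "churned": 1},
    {"monthly_fee": 48, "support_calls": 1, "age": 41, "churned": 0},
    {"monthly_fee": 51, "support_calls": 0, "age": 33, "churned": 0},
    {"monthly_fee": 53, "support_calls": 2, "age": 40, "churned": 0},
]
features = ["monthly_fee", "support_calls", "age"]

def class_separation(data, feature, label="churned"):
    """Absolute gap between class means, in units of the overall std deviation."""
    positives = [r[feature] for r in data if r[label] == 1]
    negatives = [r[feature] for r in data if r[label] == 0]
    spread = statistics.pstdev([r[feature] for r in data])
    if spread == 0:
        return 0.0
    return abs(statistics.mean(positives) - statistics.mean(negatives)) / spread

# Features that separate the classes most strongly come first
ranking = sorted(features, key=lambda f: class_separation(rows, f), reverse=True)
for name in ranking:
    print(f"{name}: {class_separation(rows, name):.2f}")
```

A ranking like this is a starting point for a conversation with stakeholders, not proof of cause and effect: a feature can separate the classes well without causing the outcome.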
Pop Quiz!
Which of these is NOT part of the
CRISP-DM Data Understanding phase?
A. Defining the problems that we want to solve.
B. Finding and identifying any problems within the data sets.
C. Cleaning and addressing any problems with the data sets.
Pop Quiz!
The Evaluation phase of CRISP-DM is
similar to which step of OSEMN?
A. O
B. S
C. E
D. M
E. N
Tools & Tech Stacks for Data Scientists
Thank you!
Any questions?