Data Science and Data Scientist
Dr. Alex Liu, Principal Data Scientist
© 2015 IBM Corporation
Data Science Example
Google Flu Trends Analytics
Detecting outbreaks two weeks ahead of the CDC
Estimating which cities are most at risk
More data science examples …
Capability: Know Everything about your Customer
  Analyze all sources of data to know your customers as individuals.
  Outcome: Creates customized offers up to 125x faster with better results.
Capability: Innovate New Products at Speed and Scale
  Capture all sources of feedback and analyze vast data to drive innovation.
  Outcome: Cut processing time in half.
Capability: Instant Awareness of Fraud and Risk
  Analyze all available data, detect fraud and manage risk in real-time.
  Outcome: Identified fraud which previously went undetected.
Capability: Exploit Instrumented Assets
  Predict and prevent maintenance, develop new products & services.
  Outcome: Loads hurricane data in seconds and performs risk analysis in near real-time for greater reliability.
Data Science – One Definition by Drew Conway
Data Science Definition
Data Science is an interdisciplinary field concerned with the processes and systems used to
extract knowledge or insights from large volumes of data in various forms, whether structured
or unstructured. It continues earlier data analysis fields such as data mining and predictive
analytics, as well as knowledge discovery and data mining (KDD).
Data Science is about turning data into insights.
Data Science is a process
The 4Es: Equation, Estimation, Evaluation, Explanation
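The four steps named above can be sketched on a toy problem. The following is a minimal illustration, not the presenter's method: it assumes a straight-line equation, ordinary least squares for estimation, and RMSE for evaluation. The data points and the function names (`estimate`, `evaluate`, `explain`) are hypothetical and invented for this example.

```python
import math

# Hypothetical toy data: (x, y) pairs invented purely for illustration.
data = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2), (4.0, 7.8), (5.0, 10.1)]

# Equation: we assume the model y = a + b * x.

def estimate(points):
    """Estimation: fit a and b by ordinary least squares."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b

def evaluate(points, a, b):
    """Evaluation: root mean squared error of the fitted line."""
    squared_errors = [(y - (a + b * x)) ** 2 for x, y in points]
    return math.sqrt(sum(squared_errors) / len(points))

def explain(a, b, rmse):
    """Explanation: turn the fitted numbers into a plain-language statement."""
    return (
        f"Each unit increase in x is associated with a change of {b:.2f} in y "
        f"(intercept {a:.2f}); typical prediction error is about {rmse:.2f}."
    )

a, b = estimate(data)
rmse = evaluate(data, a, b)
print(explain(a, b, rmse))
```

In practice each step offers many alternatives (other model forms, other estimators, other error measures), which is why the process is usually iterated rather than run once.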
Data Science – a new science paradigm
Data Science is a new science paradigm, under which the processes and systems of knowledge
discovery are dramatically different from those of the past, and even how scientists work
and organize themselves has changed dramatically.
Data Science is also a new research paradigm, under which researchers must obtain intelligent
assistance to deal with huge amounts of data, a large selection of equations and models, a
large selection of estimation algorithms, and complicated evaluation and explanation of results.
Data Scientist
Data Scientist – A Definition
A data scientist is a scientific professional who processes large amounts of data to
discover insights.
A data scientist represents an evolution from a business or data analyst role. The formal
training is similar, with a solid foundation typically in computer science and applications,
modeling, statistics, analytics, math or even applied social science. What sets the data
scientist apart is strong business acumen, coupled with the ability to communicate findings to
both business and IT leaders in a way that can influence how an organization approaches a
business challenge. Good data scientists will not just address business problems; they will
pick the right problems, those that have the most value to the organization.
Whereas a traditional data analyst may look only at data from a single source – a CRM
system, for example – a data scientist will most likely explore and examine data from multiple
disparate sources. The data scientist will sift through all incoming data with the goal of
discovering a previously hidden insight, which in turn can provide a competitive advantage or
address a pressing business problem. A data scientist does not simply collect and report on
data, but also looks at it from many angles, determines what it means, then recommends
ways to apply the data.
Source: [Link]
Data Scientist Skills
[Figure: skills grouped along the stages Data, Equation, Estimation, Evaluation, Explanation]
Data: data sources, data storage, data cleaning, feature extraction
Algorithms & models: regression, decision tree, Bayesian, causality, time series
Statistics & computing: MLE, iterative computing (MapReduce & Spark), R, SPSS
Evaluation: RMSE, RMS, confusion matrix, ROC curve
Explanation: visualization, business acumen, subject knowledge, communication
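Two of the evaluation terms in the skills figure, the confusion matrix and the ROC curve, can be illustrated briefly. The sketch below is only an illustration under stated assumptions: the labels and scores are made up, the function names `confusion_matrix` and `roc_point` are hypothetical helpers written for this example, and both classes are assumed to be present so the rates are defined.

```python
def confusion_matrix(actual, predicted):
    """Count true/false positives and negatives for binary labels (1 = positive)."""
    pairs = list(zip(actual, predicted))
    return {
        "tp": sum(1 for a, p in pairs if a == 1 and p == 1),
        "fp": sum(1 for a, p in pairs if a == 0 and p == 1),
        "fn": sum(1 for a, p in pairs if a == 1 and p == 0),
        "tn": sum(1 for a, p in pairs if a == 0 and p == 0),
    }

def roc_point(actual, scores, threshold):
    """Return (false positive rate, true positive rate) at one score threshold.

    Assumes `actual` contains at least one positive and one negative label.
    """
    predicted = [1 if s >= threshold else 0 for s in scores]
    cm = confusion_matrix(actual, predicted)
    tpr = cm["tp"] / (cm["tp"] + cm["fn"])
    fpr = cm["fp"] / (cm["fp"] + cm["tn"])
    return fpr, tpr

# Hypothetical labels and model scores, invented for illustration only.
actual = [1, 0, 1, 1, 0, 0, 1, 0]
scores = [0.9, 0.4, 0.65, 0.3, 0.2, 0.7, 0.8, 0.1]

# Sweeping the threshold traces out the ROC curve one point at a time.
for threshold in (0.25, 0.5, 0.75):
    print(threshold, roc_point(actual, scores, threshold))
```

Each threshold yields one confusion matrix and therefore one point on the ROC curve; plotting the points over all thresholds gives the full curve.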