Big Data and Machine Learning Using MATLAB
Seth DeLand & Amit Doshi
MathWorks
2
Customer Example: Gas Natural Fenosa
User Story
Energy Production Optimization
Opportunity
• Allocate demand among power plants to minimize generation costs
Analytics Use
• Data: Central database for historical power consumption and price data, weather forecasts, and parameters for each power plant
• Machine Learning: Develop price simulation scenarios
• Optimization: minimize production cost
Benefit
• Reduced generation costs
• White-box solution for optimizing power generation
3
Unit Commitment
Predictive and Prescriptive Analytics
▪ Predictive Analytics: Historical Weather Data and Historical Load Data → Load Forecast
▪ Prescriptive Analytics: Load Forecast and Generator Parameters → Unit Commitment → Schedule
4
Big Data Analytics Workflow
Access and Explore Data → Preprocess Data → Develop Predictive Models → Integrate Analytics with Systems
5
Example: Working with Big Data in MATLAB
▪ Objective: Create a model to predict the cost of a taxi ride in New York City
▪ Inputs:
– Monthly taxi ride log files
– The local data set is small (~20 MB)
– The full data set is big (~21 GB)
▪ Approach:
– Access Data
– Preprocess and explore data
– Develop and validate predictive model (linear fit)
▪ Work with a subset of the data for prototyping, then run on Spark-enabled Hadoop with the full data
– Integrate analytics into a web app
6
Example: Working with Big Data in MATLAB
7
Demo: Taxi Fare Predictor Web App
8
Big Data Analytics Workflow: Data Access and Pre-process
Access and Explore Data → Preprocess Data → Develop Predictive Models → Integrate Analytics with Systems
9
Data Access and Pre-processing – Challenges
Challenges
▪ Data aggregation
– Different sources (files, web, etc.)
– Different types (images, text, audio, etc.)
▪ Data clean up
– Poorly formatted files
– Irregularly sampled data
– Redundant data, outliers, missing data etc.
10
Data Analytics Workflow: Big Data Access and Pre-processing
11
Next: Access Big Data from MATLAB
▪ datastore
– Tabular text files
– Images
– Excel spreadsheets
– (SQL) Databases
– HDFS (Hadoop)
– S3 - Amazon
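A minimal sketch of the datastore idea for tabular text files, under stated assumptions: the folder name, file name, and the columns trip_distance and fare_amount are hypothetical and are generated here only so that the snippet is self-contained. It is not the code shown in the talk.

```matlab
% Write a small, made-up ride log so the sketch is self-contained.
folder = fullfile(tempdir, 'taxi_datastore_demo');   % hypothetical location
if ~exist(folder, 'dir')
    mkdir(folder);
end
rides = table([1.2; 3.4; 0.8], [6.5; 14.0; 5.0], ...
    'VariableNames', {'trip_distance', 'fare_amount'});
writetable(rides, fullfile(folder, 'rides_01.csv'));

% Point a datastore at every CSV file in the folder; no data is read yet.
ds = datastore(fullfile(folder, '*.csv'));
ds.SelectedVariableNames = {'trip_distance', 'fare_amount'};

% Look at the first rows, then read the data one chunk at a time.
preview(ds)
while hasdata(ds)
    chunk = read(ds);                 % table holding the next block of rows
    fprintf('Read %d rows\n', height(chunk));
end
```

Because the datastore reads in chunks, the same loop should work whether the folder holds one small file or many large ones.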
12
Get data in MATLAB
13
What if the data is saved in HDFS?
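One possible way to point a datastore at HDFS is sketched below. The Hadoop install folder, host name, port, and file path are placeholders invented for illustration; they must be replaced with values from your own cluster, and the snippet only runs where a Hadoop installation is reachable.

```matlab
% Placeholder values: replace with your own Hadoop install folder,
% name node host, port, and data path.
setenv('HADOOP_HOME', '/usr/lib/hadoop');
ds = datastore('hdfs://namenode.example.com:8020/data/taxi/*.csv');

% From here on the datastore is used exactly like one over local files.
preview(ds)
```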
14
Or Data is stored in a Database
15
Data Access: Summary
Software
16
Process data which doesn't fit into memory
Access and Explore Data → Preprocess Data → Develop Predictive Models → Integrate Analytics with Systems
17
Pre-processing Big Data
tall arrays in
▪ New data type designed for data that doesn’t fit into memory
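The sketch below illustrates how a tall array defers work until the result is requested. The file, the column name fare_amount, and the values are made up for the example, and mapreducer(0) is used here only to keep the computation in the local MATLAB session.

```matlab
% Run tall computations in the local MATLAB session (no parallel pool).
mapreducer(0);

% Made-up data written to disk so the sketch is self-contained.
folder = fullfile(tempdir, 'tall_array_demo');       % hypothetical location
if ~exist(folder, 'dir')
    mkdir(folder);
end
rides = table([6.5; 14.0; -1.0; 5.0], 'VariableNames', {'fare_amount'});
writetable(rides, fullfile(folder, 'rides.csv'));

% Create a tall table on top of a datastore; nothing is loaded yet.
ds = datastore(fullfile(folder, '*.csv'));
tt = tall(ds);

% These operations are queued, not executed.
tt = tt(tt.fare_amount > 0, :);       % drop invalid (non-positive) fares
avgFare = mean(tt.fare_amount);       % still an unevaluated tall value

% gather triggers the pass over the data and returns an in-memory result.
avgFare = gather(avgFare);
disp(avgFare)
```

The code is written as if the data were an ordinary table; only the call to gather forces the data to be read.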
18
tall arrays
[Figure: a tall array and a Single Machine Memory; labels: Process, Single Machine Memory]
19
tall arrays
[Figure: a tall array and several Single Machine Memory blocks; labels: Process, Single Machine Memory]
20
Demo: Working with Tall Arrays
21
Data Access and Pre-processing – Challenges and Solution
Challenges
▪ Data aggregation
– Different sources (files, web, etc.)
– Different types (images, text, audio, etc.)
▪ Data clean up
– Poorly formatted files
– Irregularly sampled data
– Redundant data, outliers, missing data etc.
▪ Data specific processing
– Signals: Smoothing, resampling, denoising, Wavelet transforms, etc.
– Images: Image registration, morphological filtering, deblurring, etc.
▪ Dealing with out of memory data (big data)
1. MATLAB makes it easy to work with business and engineering data
Files, Databases, Signals, Images
▪ Point and click tools to access variety of data sources
▪ Built-in algorithms for data preprocessing including sensor,
23
Machine Learning
Machine learning uses data and produces a program to perform a task
[Figure: Computer Program compared with Machine Learning]
24
Consider Machine/Deep Learning When
▪ Problem is too complex for hand-written rules or equations
Because algorithms can
▪ Discover an internal representation from input data only
26
Different Types of Learning
Type of Learning and Categories of Algorithms
Machine Learning
▪ Supervised Learning: Develop predictive model based on both input and output data
– Classification: Support Vector Machines, Discriminant Analysis, Naive Bayes, Nearest Neighbor
– Regression: Linear Regression, GLM, SVR, GPR, Ensemble Methods, Decision Trees, Neural Networks
27
Machine Learning with Big Data
28
Demo: Training a Machine Learning Model
29
Demo: Training a Machine Learning Model
30
Regression Learner
31
Regression Learner
App to apply advanced regression methods to your data
32
Classification Learner
App to apply advanced classification methods to your data
33
and Many More MATLAB Apps for Data Analytics
Distribution Fitting
System Identification
Signal Analysis
34
Tuning Machine Learning Models
Get more accurate models in less time
35
Machine Learning Hyperparameters
Hyperparameters
Tune all hyperparameters for this model
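As an illustration of automated hyperparameter tuning, the sketch below asks fitrsvm to search over its default set of tunable hyperparameters, which uses Bayesian optimization by default. The synthetic data, the choice of an SVM regression model, and the limit of 10 evaluations are assumptions made for this example and do not come from the talk.

```matlab
% Synthetic one-dimensional regression problem (made up for illustration).
rng(0);                                    % reproducible random numbers
X = 10 * rand(150, 1);
y = sin(X) + 0.1 * randn(150, 1);

% Keep the search short and quiet; all values here are illustrative.
opts = struct('MaxObjectiveEvaluations', 10, ...
              'ShowPlots', false, ...
              'Verbose', 0);

% 'auto' lets fitrsvm pick which hyperparameters to tune; the search
% minimizes cross-validated loss using Bayesian optimization by default.
mdl = fitrsvm(X, y, ...
    'KernelFunction', 'gaussian', ...
    'OptimizeHyperparameters', 'auto', ...
    'HyperparameterOptimizationOptions', opts);

% Inspect the best hyperparameter values found by the search.
disp(mdl.HyperparameterOptimizationResults.XAtMinObjective)
```

Passing 'all' instead of 'auto' should widen the search to every eligible hyperparameter of the model, at the cost of a longer run.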
36
Bayesian Optimization in Action
37
Big Data Analytics Workflow: Developing Predictive Models
2. MATLAB enables domain experts to do Data Science
38
Back to our example: Working with Big Data in MATLAB
▪ Objective: Create a model to predict the cost of a taxi ride in New York City
▪ Inputs:
– Monthly taxi ride log files
– The local data set is small (~20 MB)
– The full data set is big (~25 GB)
▪ Approach:
– Access Data
– Preprocess and explore data
– Develop and validate predictive model (linear fit)
▪ Work with subset of data for prototyping
▪ Scale to full data set on a cluster
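The prototype step of this approach could look roughly like the sketch below: a linear fit of fare against trip distance on a small, locally stored sample. The data is synthetic, and the column names and the assumed fare structure (a base fare of 2.5 plus 2.0 per unit of distance) are invented for the example, so this is not the model from the demo.

```matlab
% Prototype locally: run tall computations in the MATLAB session itself.
mapreducer(0);

% Synthetic sample standing in for one small ride log (made-up numbers).
rng(0);
n = 200;
dist = 10 * rand(n, 1);
fare = 2.5 + 2.0 * dist + 0.1 * randn(n, 1);
rides = table(dist, fare, 'VariableNames', {'trip_distance', 'fare_amount'});

folder = fullfile(tempdir, 'taxi_fit_demo');         % hypothetical location
if ~exist(folder, 'dir')
    mkdir(folder);
end
writetable(rides, fullfile(folder, 'rides.csv'));

% Fit a linear model on a tall table backed by the datastore.
ds = datastore(fullfile(folder, '*.csv'));
tt = tall(ds);
mdl = fitlm(tt, 'fare_amount ~ trip_distance');
disp(mdl.Coefficients)

% Predict the fare for a hypothetical 5-unit trip.
newRide = table(5, 'VariableNames', {'trip_distance'});
predictedFare = predict(mdl, newRide);
disp(predictedFare)
```

To scale up, the intent is that the fitting code stays the same while the datastore points at the full data set and mapreducer is given a cluster object instead of 0; the details of that configuration depend on the cluster and are not shown here.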
39
Data Analytics Workflow: Develop Predictive Models using Big Data
Access and Explore Data → Preprocess Data → Develop Predictive Models → Integrate Analytics with Systems
40
Demo: Taxi Fare Predictor Web App
41
MATLAB Production Server
▪ Server software
– Manages packaged MATLAB programs and worker pool
[Figure: Enterprise Application, MATLAB Production Server, MATLAB Runtime]
42
Integrate analytics with systems
3. MATLAB Analytics run anywhere
[Figure: MATLAB Runtime]
43
Product Support for Spark
From MATLAB desktop:
• Access data from HDFS
Integrate with applications:
• Deploy MATLAB programs using “tall”
• Develop deployable applications for Spark using MATLAB API for Spark
[Figure labels: Web & Mobile Applications, Enterprise Applications, Development Tools, MATLAB Compiler, MATLAB Distributed Computing Server, MATLAB Runtime, Spark, YARN]
44
Deployment Offerings
▪ Program using tall
▪ Program using MATLAB API for Spark
45
Data Analytics Workflow
Access and Explore Data → Preprocess Data → Develop Predictive Models → Integrate Analytics with Systems
46
Resources to learn and get started
mathworks.com/machine-learning
mathworks.com/big-data
eBook
47
MathWorks Services
▪ Consulting
– Integration
– Data analysis/visualization
– Unify workflows, models, data
www.mathworks.com/services/consulting/
▪ Training
– Classroom, online, on-site
– Data Processing, Visualization, Deployment, Parallel Computing
www.mathworks.com/services/training/
48
MathWorks Training Offerings
http://www.mathworks.com/services/training/
49
Speaker Details
Email:
[email protected]
[email protected]
LinkedIn:
https://in.linkedin.com/in/amit-doshi
https://www.linkedin.com/in/seth-deland
Contact MathWorks India
Products/Training Enquiry Booth
Call: 080-6632-6000
Email: [email protected]