Oracle Cloud Infrastructure
Oracle AutoML
Machine Learning Life Cycle: AutoML
• Business problem
• Data access
• Data exploration and preparation
• Modeling (AutoML)
• Validation
• Deployment
• Monitoring, refresh, and retirement
AutoML: What and Why
• Building a successful machine learning model requires many iterations and a lot of experimentation.
• Developers rarely arrive at an optimal set of hyperparameters in the first iteration, which creates an opportunity for ML automation.
• AutoML combines choosing and refining models with tuning their hyperparameters, which optimizes the outcome of the learning.
Diagram: Model Selection, Hyperparameter Tuning, Model Assessment
AutoML Approaches
• Bayesian Optimization: A probabilistic model captures different hyperparameter configurations and their performance.
• Recommender System: The system maintains a record of the best configuration found for each data set it has previously encountered.
• Genetic Programming: A technique of evolving programs, starting from a population of unfit (usually random) programs, to be fit for a particular task by applying operations.
Oracle Automated Machine Learning (Oracle AutoML)
• AutoML solution from Oracle
• Non-iterative and faster
• Leverages metalearning
• Avoids cold-start problems
Benefits
• Automates many routine but time-consuming steps, and increases data scientists' productivity
• Automates the process of feature selection, model/algorithm selection, and hyperparameter tuning
• Reduces the overall compute time required to deliver machine learning models
Oracle AutoML Workflow
• Selects a model from a large number of viable candidate models
• Tunes the hyperparameters for each model
• Selects predictive features to speed up the pipeline and reduce overfitting
• Ensures that the trained model generalizes and works for unseen data
Diagram: Data set (Features > Labels), Data Scientist, Tuned Model
AutoML Pipeline
Data set → Algorithm Selection → Adaptive Sampling → Feature Selection → Hyperparameter Tuning → Tuned Model
• Algorithm Selection: Identify the best algorithms for the data and problem; faster than exhaustive search.
• Adaptive Sampling: Identify the right sample size and adjust for unbalanced data.
• Feature Selection: De-noise the data and reduce the number of features.
• Hyperparameter Tuning: Auto-tune hyperparameters for the best model accuracy.
Algorithm Selection
• The algorithm that yields the maximum score is identified.
• Algorithms are ranked based on predicted scores.
• Automated algorithm selection uses metalearning.
• The algorithms with the highest scores (the top K algorithms) are later used for model tuning.
How Algorithms Are Selected
Diagram: New Data set → Extract Data Set Characteristics → Invoke Score Prediction Models → Rank Algorithms Based on Predicted Scores → Top K Algorithms
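The selection flow above can be sketched in a few lines of Python. This is only an illustration, not Oracle AutoML's implementation: the metafeatures, the `SCORE_PREDICTORS` table, and its formulas are made up for the example, and a real metalearning system would use score prediction models trained on many earlier data sets.

```python
from typing import Callable, Dict, List


def extract_characteristics(rows: List[List[float]], labels: List[int]) -> Dict[str, float]:
    """Compute a few simple data set characteristics (hypothetical metafeatures)."""
    n_rows = len(rows)
    n_features = len(rows[0]) if rows else 0
    positive_rate = sum(labels) / n_rows if n_rows else 0.0
    return {"n_rows": n_rows, "n_features": n_features, "positive_rate": positive_rate}


# Stand-in score prediction models, one per candidate algorithm.
# The formulas are invented for illustration only.
SCORE_PREDICTORS: Dict[str, Callable[[Dict[str, float]], float]] = {
    "logistic_regression": lambda m: 0.70 + 0.01 * min(m["n_features"], 10),
    "random_forest": lambda m: 0.65 + 0.00002 * min(m["n_rows"], 10000),
    "naive_bayes": lambda m: 0.60 + 0.1 * abs(m["positive_rate"] - 0.5),
}


def top_k_algorithms(meta: Dict[str, float], k: int) -> List[str]:
    """Predict a score per algorithm, rank by predicted score, keep the top k."""
    predicted = {name: predict(meta) for name, predict in SCORE_PREDICTORS.items()}
    ranked = sorted(predicted, key=predicted.get, reverse=True)
    return ranked[:k]
```

No candidate model is trained at this stage; only predicted scores are compared, which is why this step can be faster than an exhaustive search.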
Adaptive Sampling
• Identifies the right sampling percentage
• Speeds up model building
• Detects unbalanced data sets that can cause poor models
How Is Adaptive Sampling Done?
Diagram: New Data set → Identify Optimized Sample → ML Algorithms → Measure Model Score → Reduced Data set (optimize until convergence is achieved)
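One way to picture the "optimize until convergence" loop is to grow the sample until the measured score stops improving. The sketch below assumes a caller-supplied `evaluate(sample_size)` function that trains on a sample of that size and returns a score; the function name, the starting size, the growth factor, and the tolerance are all illustrative choices, not Oracle AutoML parameters.

```python
from typing import Callable, Tuple


def adaptive_sample_size(
    n_rows: int,
    evaluate: Callable[[int], float],
    start: int = 1000,
    growth: float = 2.0,
    tolerance: float = 0.005,
) -> Tuple[int, float]:
    """Grow the sample until the score gain drops below `tolerance`.

    Returns the chosen sample size and the score measured at that size.
    """
    size = min(start, n_rows)
    best_score = evaluate(size)
    while size < n_rows:
        next_size = min(int(size * growth), n_rows)
        score = evaluate(next_size)
        if score - best_score < tolerance:
            break  # converged: a larger sample no longer pays off
        size, best_score = next_size, score
    return size, best_score
```

With a learning curve that flattens out, the loop stops well before the full data set is used, which is where the speed-up in model building comes from.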
Feature Selection
• Selects a subset of features that are most predictive of the target
• Reduces the number of features used in later pipeline stages
• Speeds up training without losing predictive performance
How Is Feature Selection Done?
Diagram: New Data set → Extract Data Set Characteristics → Predict Best Feature Set → Rank Features → ML Algorithms → Measure Model Score → Reduced Data set (repeat for multiple ranking algorithms)
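The rank-then-measure loop can be sketched as follows. This is a simplified stand-in rather than Oracle AutoML's algorithm: the only ranking function shown is absolute Pearson correlation, `score_subset` is a hypothetical callback that trains a model on the given features and returns its score, and the prediction of the best feature set from data set characteristics is left out.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Columns = Dict[str, List[float]]
Ranker = Callable[[Columns, List[float]], List[str]]


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation; returns 0.0 when either input is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return 0.0 if vx == 0 or vy == 0 else cov / (vx * vy) ** 0.5


def rank_by_correlation(columns: Columns, target: List[float]) -> List[str]:
    """Order feature names from most to least correlated with the target."""
    return sorted(columns, key=lambda name: abs(pearson(columns[name], target)), reverse=True)


def select_features(
    columns: Columns,
    target: List[float],
    score_subset: Callable[[List[str]], float],
    rankers: Sequence[Ranker],
) -> Tuple[List[str], float]:
    """Try the top-k features of every ranking; keep the best-scoring subset.

    Ties are broken in favor of the smaller subset.
    """
    best_subset = list(columns)
    best_score = score_subset(best_subset)
    for rank in rankers:
        ordered = rank(columns, target)
        for k in range(1, len(ordered) + 1):
            subset = ordered[:k]
            score = score_subset(subset)
            if score > best_score or (score == best_score and len(subset) < len(best_subset)):
                best_subset, best_score = subset, score
    return best_subset, best_score
```

Passing several rankers mirrors the "repeat for multiple ranking algorithms" step: each ranking proposes its own candidate subsets, and the measured model score decides between them.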
Hyperparameter Tuning
• Filters for the optimal configuration of the shortlisted algorithms
• Tunes multiple machine learning models
• Tunes each selected algorithm to find its best hyperparameter settings
How Is Hyperparameter Tuning Done?
Diagram: New Data set → Hyperparameter Choice → ML Algorithms and Prediction Models → Measure Model Score → Tuned Model (optimize until convergence)
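The choose, measure, and repeat cycle in the diagram can be illustrated with a small hill-climbing search. Hill climbing is used here only because it is short and deterministic; the source does not say which search strategy Oracle AutoML uses. The `score` callback (train a model with the given hyperparameters and return its score) and the parameter names in the usage note are assumptions.

```python
from typing import Callable, Dict, Tuple

Params = Dict[str, float]


def tune(
    score: Callable[[Params], float],
    start: Params,
    steps: Params,
    tolerance: float = 1e-6,
    max_rounds: int = 100,
) -> Tuple[Params, float]:
    """Adjust one hyperparameter at a time until no step improves the score."""
    best = dict(start)
    best_score = score(best)
    for _ in range(max_rounds):
        improved = False
        for name, step in steps.items():
            for delta in (-step, step):
                candidate = dict(best)
                candidate[name] = best[name] + delta
                candidate_score = score(candidate)
                if candidate_score > best_score + tolerance:
                    best, best_score, improved = candidate, candidate_score, True
        if not improved:
            break  # converged: no neighboring configuration scores higher
    return best, best_score
```

For example, calling `tune` with `start={"max_depth": 3, "n_estimators": 10}` and `steps={"max_depth": 1, "n_estimators": 10}` walks both values toward whatever configuration the scoring function prefers and stops once a full round brings no improvement.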
Building with Oracle AutoML
• OracleAutoMLProvider delegates model training to the ads.automl package from the Oracle Accelerated Data Science (ADS) Python SDK.
• The OracleAutoMLProvider class supports two arguments:
  • n_jobs: Specifies the degree of parallelism for Oracle AutoML. The default is -1, which means that all cores are used.
  • loglevel: Sets the verbosity of output for Oracle AutoML.
• Results can be visualized at each stage of the AutoML pipeline.
Building with Oracle AutoML
• Oracle AutoML summarizes the optimization process by providing:
  • Training data information
  • Pipeline information with selected features, best choices, and respective hyperparameters
  • Best model trial information
• Adaptive sampling does not run, and visualizations are not generated, if the data set has fewer than 1,000 data points.
• model_list lets you control which algorithms AutoML considers during the optimization process.
• score_metric lets you provide your own scoring metric, either as a string from a list of metrics or as a user-defined function. Default metrics are:
  • Binary Classification: roc_auc
  • Multiclass Classification: recall_macro
  • Regression: neg_mean_squared_error
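The defaulting rule for score_metric can be written down as a small helper. The default metric names come from the list above; the function name `resolve_score_metric` and the task keys (`"binary"`, `"multiclass"`, `"regression"`) are hypothetical and are not part of the ADS SDK.

```python
from typing import Callable, Optional, Union

Metric = Union[str, Callable[..., float]]

# Default metrics per problem type, as listed on the slide.
DEFAULT_SCORE_METRICS = {
    "binary": "roc_auc",
    "multiclass": "recall_macro",
    "regression": "neg_mean_squared_error",
}


def resolve_score_metric(task: str, score_metric: Optional[Metric] = None) -> Metric:
    """Return the user's metric (string or function) or the default for the task."""
    if score_metric is not None:
        return score_metric
    if task not in DEFAULT_SCORE_METRICS:
        raise ValueError(f"unknown task type: {task!r}")
    return DEFAULT_SCORE_METRICS[task]
```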
Oracle AutoML: Time Budget
› Oracle AutoML also supports a user-specified time budget, in seconds.
› AutoML tries to terminate computation as soon as the time budget is exhausted and returns the current best model.
Scenario 1: The time budget is exhausted before preprocessing completes. A Naive Bayes model is returned for classification and a Linear Regression model for regression.
Scenario 2: The time budget is exhausted before algorithm selection completes. Partial results from algorithm selection are used to evaluate the best candidate, which is returned.
Scenario 3: The time budget is exhausted before hyperparameter tuning completes. The current best-known hyperparameter configuration is returned.
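The "return the current best when time runs out" behavior shared by the three scenarios can be sketched as a budgeted search loop. This is a generic pattern, not Oracle AutoML's code: `run_with_budget`, its `fallback` argument (standing in for the Naive Bayes or Linear Regression default), and the injectable `clock` are assumptions made for the example.

```python
import time
from typing import Callable, Iterable, TypeVar

T = TypeVar("T")


def run_with_budget(
    candidates: Iterable[T],
    evaluate: Callable[[T], float],
    time_budget: float,
    fallback: T,
    clock: Callable[[], float] = time.monotonic,
) -> T:
    """Evaluate candidates until the budget (in seconds) is spent.

    Returns the best candidate evaluated so far, or `fallback` if the
    budget ran out before any candidate could be evaluated.
    """
    deadline = clock() + time_budget
    best, best_score = fallback, None
    for candidate in candidates:
        if clock() >= deadline:
            break  # budget exhausted: stop and keep the current best
        score = evaluate(candidate)
        if best_score is None or score > best_score:
            best, best_score = candidate, score
    return best
```

The deadline is checked only between evaluations, so a single long evaluation can overrun the budget. That matches the slide's wording that AutoML tries to terminate as soon as the budget is exhausted.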
Oracle AutoML: Minimum Feature List
Through min_features, AutoML ensures that the listed features are part of the final model it creates and are not dropped during the feature selection phase.
▪ If int, 0 < min_features <= n_features
▪ If float, 0 < min_features <= 1.0
▪ If list, names of features to keep. For example, ['a', 'b'] means keep features 'a' and 'b'.
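The three accepted forms of min_features can be checked with a small validator. The range checks follow the rules above; everything else is an assumption for illustration: the function name `resolve_min_features`, treating a float as a fraction of the available features (rounded up), and the `(minimum count, required names)` return shape are not taken from the ADS SDK.

```python
import math
from typing import List, Sequence, Tuple, Union


def resolve_min_features(
    min_features: Union[int, float, Sequence[str]],
    feature_names: Sequence[str],
) -> Tuple[int, List[str]]:
    """Validate min_features and return (minimum count, required feature names)."""
    n_features = len(feature_names)
    if isinstance(min_features, bool):
        raise TypeError("min_features must be an int, a float, or a list of names")
    if isinstance(min_features, int):
        if not 0 < min_features <= n_features:
            raise ValueError("int min_features must satisfy 0 < min_features <= n_features")
        return min_features, []
    if isinstance(min_features, float):
        if not 0 < min_features <= 1.0:
            raise ValueError("float min_features must satisfy 0 < min_features <= 1.0")
        # Assumption: a float is read as a fraction of the features, rounded up.
        return math.ceil(min_features * n_features), []
    names = list(min_features)
    unknown = [name for name in names if name not in feature_names]
    if unknown:
        raise ValueError(f"unknown feature names: {unknown}")
    return len(names), names
```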