
Copyright Notice

These slides are distributed under the Creative Commons License.

DeepLearning.AI makes these slides available for educational purposes. You may not use or distribute these slides for commercial purposes. You may make copies of these slides and use or distribute them for educational purposes as long as you cite DeepLearning.AI as the source of the slides.

For the rest of the details of the license, see [Link]
Model Analysis

Welcome
Model Analysis Overview

Model Performance Analysis
What is next after model training/deployment?

● Is the model performing well?
● Is there scope for improvement?
● Can the data change in the future?
● Has the data changed since you created your training dataset?
Black box evaluation vs model introspection

● Models can be tested for metrics like accuracy, and losses like test error, without knowing their internal details (black box evaluation)
● For finer-grained evaluation, models can be inspected part by part (model introspection)
Performance metrics vs optimization objectives

Performance metrics:
● Performance metrics vary based on the task, e.g. regression, classification, etc.
● Within a type of task, performance metrics may differ based on the end goal
● Performance is measured after a round of optimization

Optimization objectives:
● Machine learning formulates the problem statement into an objective function
● Learning algorithms find optimum values for each variable to converge to a local/global minimum
Top level aggregate metrics vs slicing

● Most of the time, metrics are calculated on the entire dataset
● Slicing deals with understanding how the model is performing on each subset of the data (see the sketch after this list)
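
As a minimal illustration of the difference (not part of the original slides), the snippet below computes an aggregate accuracy and the same metric sliced by a hypothetical categorical column, using pandas:

import pandas as pd

# Hypothetical evaluation results: one row per prediction request.
eval_df = pd.DataFrame({
    "store_id": ["A", "A", "B", "B", "C", "C"],
    "label":    [1, 0, 1, 1, 0, 1],
    "pred":     [1, 0, 0, 1, 0, 0],
})
eval_df["correct"] = (eval_df["label"] == eval_df["pred"]).astype(int)

# Top-level aggregate metric over the entire dataset.
print("aggregate accuracy:", eval_df["correct"].mean())

# The same metric sliced by store: reveals slices where the model underperforms.
print(eval_df.groupby("store_id")["correct"].mean())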
Advanced Model Analysis and
Debugging

Introduction to TensorFlow
Model Analysis
Why should you slice your data?

Your top-level metrics may hide problems:

● Your model may not perform well for particular [customers | products | stores | days of the week | etc.]
● Each prediction request is an individual event, maybe an individual customer
● For example, some customers may have a bad experience
● For example, some stores may perform badly
TensorFlow Model Analysis (TFMA)

● Scalable framework
● Open source library
● Ensures models meet required quality thresholds
● Used to compute and visualize evaluation metrics
● Inspect a model's performance against different slices of data
Architecture
[Diagram: ExtractAndEvaluate pipeline. Read Inputs (tf.Example) → Extractors: Predict Extractor (default), Slice Keys (default), Custom Extractor (optional) → Evaluators: MetricsAndPlotsEvaluator (default: group by slices, then compute/combine metrics), AnalysisEvaluator (default), CustomEvaluator (optional) → Write Results (metrics, analysis)]
One model vs multiple models over time
[Charts: metric value over global steps for TensorFlow metrics in TensorBoard vs metric value across models in TensorFlow Model Analysis]
Aggregate vs sliced metrics
[Charts: ROC curves (sensitivity vs 1 - specificity) for an aggregate metric computed over the entire eval dataset vs the same metric "sliced" by different segments of the eval dataset]
Streaming vs full-pass metrics

● Streaming metrics are approximations computed on mini-batches of data
● TensorBoard runs on batches (estimates): it visualizes metrics computed through mini-batches
● TFMA runs on the whole eval set: it gives evaluation results after running through the entire dataset
● Apache Beam is used for scaling on large datasets
Advanced Model Analysis and
Debugging

TFMA in Practice
TFMA in practice

● Analyse the impact of different slices of data on various metrics
● How to track metrics over time?
Step 1: Export EvalSavedModel for TFMA
import tensorflow as tf
import tensorflow_transform as tft
import tensorflow_model_analysis as tfma

def get_serve_tf_examples_fn(model, tf_transform_output):
    # Return a function that parses a serialized tf.Example and applies TFT
    ...

tf_transform_output = tft.TFTransformOutput(transform_output_dir)
signatures = {
    'serving_default': get_serve_tf_examples_fn(model, tf_transform_output)
        .get_concrete_function(tf.TensorSpec(...)),
}

model.save(serving_model_dir_path, save_format='tf', signatures=signatures)


Step 2: Create EvalConfig
# Specify slicing spec
slice_spec = [tfma.SlicingSpec(feature_keys=['column_name']), ...]

# Define metrics
metrics = [tf.keras.metrics.BinaryAccuracy(name='accuracy'),
           tfma.metrics.MeanPrediction(name='mean_prediction'), ...]
metrics_specs = tfma.metrics.specs_from_metrics(metrics)

eval_config = tfma.EvalConfig(
    model_specs=[tfma.ModelSpec(label_key=features.LABEL_KEY)],
    slicing_specs=slice_spec,
    metrics_specs=metrics_specs, ...)
Step 3: Analyze model
# Specify the path to the eval graph and to where the result should be written
eval_model_dir = ...
result_path = ...

eval_shared_model = tfma.default_eval_shared_model(
    eval_saved_model_path=eval_model_dir,
    eval_config=eval_config)

# Run TensorFlow Model Analysis
eval_result = tfma.run_model_analysis(
    eval_shared_model=eval_shared_model,
    output_path=result_path,
    ...)
Step 4: Visualizing metrics
# Render results
tfma.view.render_slicing_metrics(eval_result)
Advanced Model Analysis and
Debugging

Model Debugging
Overview
Model robustness

● Robustness is much more than generalization


● Is the model accurate even for slightly corrupted input data?
Robustness metrics

Robustness measurement shouldn’t take place during training

Split data into train/val/dev sets

Specific metrics for regression and classification problems


Model debugging

● Deals with detecting and dealing with problems in ML systems


● Applies mainstream software engineering practices to ML models
Model Debugging Objectives

● Opaqueness
● Social discrimination
● Security vulnerabilities
● Privacy harms
● Model decay
Model Debugging Techniques

● Benchmark models
● Sensitivity analysis
● Residual analysis
Advanced Model Analysis and
Debugging

Benchmark Models
Benchmark models

Simple, trusted and interpretable models solving the same problem

Compare your ML model against these models

A benchmark model is the starting point of ML development


Advanced Model Analysis and
Debugging

Sensitivity Analysis and Adversarial Attacks
Sensitivity analysis

● Simulate data of your choice and see what your model predicts
● See how the model reacts to data it has never seen before
What-If Tool for sensitivity analysis
Random Attacks

● Expose models to high volumes of random input data
● Exposes unexpected software and math bugs
● Great way to start debugging (a minimal sketch follows below)
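
A minimal fuzzing-style sketch (not from the slides; the model and input shape are hypothetical): feed batches of random inputs to a Keras model and flag batches that crash or produce non-finite outputs.

import numpy as np
import tensorflow as tf

def random_attack(model, input_shape, n_batches=100, batch_size=64, seed=0):
    # Feed random inputs and report batches that crash or produce NaN/Inf outputs.
    rng = np.random.default_rng(seed)
    failures = []
    for i in range(n_batches):
        batch = rng.uniform(-1e3, 1e3, size=(batch_size, *input_shape)).astype("float32")
        try:
            preds = model.predict(batch, verbose=0)
            if not np.all(np.isfinite(preds)):
                failures.append(i)
        except Exception as exc:
            failures.append((i, repr(exc)))
    return failures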
Partial dependence plots

● Visualize the effects of changing one or more variables in your model
● PDPbox and PyCEbox are open source packages for this (see the sketch below)
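
As an assumed illustration of a partial dependence plot (using scikit-learn's inspection utilities rather than the packages named above, on synthetic data):

import matplotlib.pyplot as plt
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.inspection import PartialDependenceDisplay

# Fit a small model on synthetic data, then plot how its average prediction
# changes as each selected feature is varied (partial dependence).
X, y = make_regression(n_samples=500, n_features=5, random_state=0)
model = GradientBoostingRegressor(random_state=0).fit(X, y)

PartialDependenceDisplay.from_estimator(model, X, features=[0, 1])
plt.show()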
How vulnerable to attacks is your model?

Sensitivity can mean vulnerability

● Attacks are aimed at fooling your model


● Successful attacks could be catastrophic
● Test adversarial examples
● Harden your model
A Famous Example: Ostrich
How vulnerable to attacks is your model?

Example:

A self-driving car crashes because black and white stickers applied to a stop sign cause a classifier to interpret it as a Speed Limit 45 sign.
How vulnerable to attacks is your model?

Example:

A spam detector fails to classify an email as spam. The spam mail has been designed to look like a normal email, but is actually phishing.
How vulnerable to attacks is your model?

Example:

A machine-learning powered scanner scans suitcases for weapons at an airport. A knife was developed to avoid detection by making the system think it is an umbrella.
Informational and Behavioral Harms

● Informational Harm: Leakage of information


● Behavioral Harm: Manipulating the behavior of the model
Informational Harms

● Membership Inference: was this person's data used for training?
● Model Inversion: recreate the training data
● Model Extraction: recreate the model
Behavioral Harms

● Poisoning: insert malicious data into the training data
● Evasion: craft input data that causes the model to misclassify it
Measuring your vulnerability to attack

CleverHans: an open-source Python library to benchmark machine learning systems' vulnerability to adversarial examples

Foolbox: an open-source Python library that lets you easily run adversarial attacks against machine learning models

(A minimal adversarial-example sketch follows below.)
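
A minimal sketch (not from the slides) of the kind of attack these libraries automate: the Fast Gradient Sign Method, which perturbs an input in the direction that increases the model's loss. The Keras classifier `model` is assumed to output class probabilities for images scaled to [0, 1].

import tensorflow as tf

loss_object = tf.keras.losses.SparseCategoricalCrossentropy()

def fgsm_perturb(model, images, labels, epsilon=0.01):
    # Compute the gradient of the loss with respect to the input images,
    # then step in the sign of that gradient to craft adversarial examples.
    images = tf.convert_to_tensor(images)
    with tf.GradientTape() as tape:
        tape.watch(images)
        predictions = model(images)
        loss = loss_object(labels, predictions)
    gradient = tape.gradient(loss, images)
    adversarial = images + epsilon * tf.sign(gradient)
    return tf.clip_by_value(adversarial, 0.0, 1.0)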
Adversarial example searches
Attempted defenses against adversarial examples
● Defensive distillation
Advanced Model Analysis and
Debugging

Residual Analysis
Residual analysis

● Measures the difference between the model's predictions and the ground truth
● Randomly distributed errors are good
● Correlated or systematic errors show that a model can be improved
Residual analysis

[Plots: randomly scattered residuals = good; systematic residual patterns = bad]
Residual analysis

● Residuals should not be correlated with any feature
● Adjacent residuals should not be correlated with each other (autocorrelation); a sketch of both checks follows below
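
A minimal sketch (assumed, not from the slides) of both checks with NumPy: correlate residuals against a feature, and compute the Durbin-Watson statistic for lag-1 autocorrelation (values near 2 suggest uncorrelated adjacent residuals).

import numpy as np

def residual_checks(y_true, y_pred, feature):
    residuals = np.asarray(y_true) - np.asarray(y_pred)

    # Correlation between residuals and a feature: should be close to 0.
    feature_corr = np.corrcoef(residuals, feature)[0, 1]

    # Durbin-Watson statistic: ~2 means no autocorrelation, values toward
    # 0 or 4 suggest positive or negative autocorrelation of adjacent residuals.
    dw = np.sum(np.diff(residuals) ** 2) / np.sum(residuals ** 2)
    return feature_corr, dw

# Example with synthetic numbers.
rng = np.random.default_rng(0)
y_true = rng.normal(size=200)
y_pred = y_true + rng.normal(scale=0.1, size=200)
feature = rng.normal(size=200)
print(residual_checks(y_true, y_pred, feature))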
Advanced Model Analysis and
Debugging

Model Remediation
Remediation techniques

● Data augmentation:
○ Adding synthetic data into the training set
○ Helps correct for unbalanced training data

● Interpretable and explainable ML:
○ Overcome the myth of neural networks as a black box
○ Understand how the data is getting transformed
Remediation techniques

● Model editing:
○ Applies to decision trees
○ Manual tweaks to adapt to your use case

● Model assertions:
○ Implement business rules that override model predictions (see the sketch below)
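
A minimal sketch (assumed; the rule, threshold, and model are hypothetical) of a model assertion applied as a post-prediction business rule:

import numpy as np

def predict_with_assertions(model, features):
    # Hypothetical business rule: a predicted price can never be negative,
    # and any prediction above the approval limit is flagged for manual review.
    preds = np.asarray(model.predict(features)).ravel()
    preds = np.maximum(preds, 0.0)        # override impossible values
    needs_review = preds > 10_000.0       # flag rather than silently trust
    return preds, needs_review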
Remediation techniques

● Discrimination remediation:
○ Include people with varied backgrounds for collecting training data
○ Conduct feature selection on training data
○ Use fairness metrics to select hyperparameters and decision cut-off thresholds
Remediation techniques

● Model monitoring:
○ Conduct model debugging at regular intervals
○ Inspect accuracy, fairness, security problems, etc.

● Anomaly detection:
○ Anomalies can be a warning of an attack
○ Enforce data integrity constraints on incoming data
Advanced Model Analysis and
Debugging

Fairness
Fairness indicators

● Open source library to compute fairness metrics
● Easily scales across datasets of any size
● Built on top of TFMA
What do Fairness Indicators do?

● Compute commonly-identified fairness metrics for classification models
● Compare model performance across subgroups and against other models
● No remediation tools are provided (a configuration sketch follows below)
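
A sketch of how Fairness Indicators is typically wired into a TFMA EvalConfig, following the pattern in the Fairness Indicators documentation; the label key and slicing feature ('age_group') here are hypothetical:

import tensorflow_model_analysis as tfma

eval_config = tfma.EvalConfig(
    model_specs=[tfma.ModelSpec(label_key='label')],
    metrics_specs=[
        tfma.MetricsSpec(metrics=[
            # Adds fairness metrics (FPR, FNR, etc.) at several decision thresholds.
            tfma.MetricConfig(class_name='FairnessIndicators',
                              config='{"thresholds": [0.25, 0.5, 0.75]}'),
        ])
    ],
    slicing_specs=[
        tfma.SlicingSpec(),                            # overall (no slicing)
        tfma.SlicingSpec(feature_keys=['age_group']),  # per-subgroup slices
    ])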
Evaluate at individual slices

● Overall metrics can hide poor performance for certain parts of the data
● Some metrics may fare well while others do not
Aspects to consider

● Establish context and different user types
● Seek domain experts' help
● Use data slicing widely and wisely
General guidelines

● Compute performance metrics for all slices of data
● Evaluate your metrics across multiple thresholds
● If the decision margin is small, report in more detail
Advanced Model Analysis and
Debugging

Measuring Fairness
Positive rate / Negative rate

● Percentage of data points classified as positive/negative
● Independent of ground truth
● Use case: when having equal final percentages across groups is important
True positive rate (TPR) / False negative rate (FNR)

● TPR: percentage of positive data points that are correctly labeled positive
● FNR: percentage of positive data points that are incorrectly labeled negative
● Measures equality of opportunity, when the positive class should be equal across subgroups
● Use case: where it is important that the same percentage of qualified candidates is rated positive in each group
True negative rate (TNR) / False positive rate (FPR)

● TNR: percentage of negative data points that are correctly labeled negative
● FPR: percentage of negative data points that are incorrectly labeled positive
● Measures equality of opportunity, when the negative class should be equal across subgroups
● Use case: where misclassifying something as positive is more concerning than missing positives
Accuracy & Area under the curve (AUC)

● Accuracy: the percentage of data points that are correctly labeled
● AUC: the percentage of data points that are correctly labeled when each class is given equal weight, independent of the number of samples
● These metrics relate to predictive parity
● Use case: when precision is critical (a per-group computation sketch follows below)
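
A minimal NumPy sketch (assumed, not from the slides) of computing the rates above per subgroup, given binary labels, predictions, and a group id for each example:

import numpy as np

def per_group_rates(y_true, y_pred, groups):
    # Returns positive rate, TPR, FPR, and accuracy for each subgroup.
    y_true, y_pred, groups = map(np.asarray, (y_true, y_pred, groups))
    out = {}
    for g in np.unique(groups):
        t, p = y_true[groups == g], y_pred[groups == g]
        out[g] = {
            "positive_rate": p.mean(),
            "tpr": p[t == 1].mean() if (t == 1).any() else np.nan,
            "fpr": p[t == 0].mean() if (t == 0).any() else np.nan,
            "accuracy": (t == p).mean(),
        }
    return out

print(per_group_rates([1, 0, 1, 0, 1, 0],
                      [1, 0, 0, 1, 1, 0],
                      ["a", "a", "a", "b", "b", "b"]))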
Tips
A gap in a metric between two groups indicates an unfair skew

Good fairness indicators don't always mean the model is fair

Evaluate continuously throughout development and deployment

Conduct adversarial testing for rare, malicious examples
About the CelebA dataset

● 200K celebrity images


● Each image has 40 attribute annotations
● Each image has 5 landmark locations
● Assumption on smiling attribute
Fairness indicators in practice

Build a classifier to detect smiling

Evaluate fairness and performance across age groups

Generate visualizations to gain model performance insight


Continuous Evaluation and
Monitoring

Continuous evaluation and monitoring
Why do models need to be monitored?

● Training data is a snapshot of the world at a point in time


● Many types of data change over time, some quickly
● ML Models do not get better with age
● As model performance degrades, you want an early warning
Data drift and shift

● Concept drift: loss of prediction quality
● Concept emergence: a new type of data distribution
● Types of dataset shift:
○ covariate shift (a detection sketch follows below)
○ prior probability shift
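
A minimal sketch (assumed) of screening for covariate shift with a per-feature two-sample Kolmogorov-Smirnov test between a training sample and recent serving data:

import numpy as np
from scipy.stats import ks_2samp

def covariate_shift_report(train_df, serving_df, alpha=0.01):
    # Flag numeric features whose serving distribution differs from training.
    drifted = {}
    for col in train_df.select_dtypes(include=np.number).columns:
        stat, p_value = ks_2samp(train_df[col].dropna(), serving_df[col].dropna())
        if p_value < alpha:
            drifted[col] = {"ks_statistic": stat, "p_value": p_value}
    return drifted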


How are models monitored?

[Diagram: raw data, preprocessing, labeling, training data, model, prediction, monitoring]
Statistical process control

The method used is the Drift Detection Method (DDM)

Models the number of errors as a binomial random variable

Triggers an alert rule when the error rate drifts (see the sketch below)
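
A minimal sketch (assumed) of the DDM-style rule: track the streaming error rate and its standard deviation, and raise a warning/drift signal when they exceed the best values seen so far by 2 or 3 standard deviations.

import math

class DriftDetector:
    # Drift Detection Method (Gama et al.) style monitor over a stream of 0/1 errors.
    def __init__(self):
        self.n = 0
        self.errors = 0
        self.p_min, self.s_min = float("inf"), float("inf")

    def update(self, error):  # error: 1 if the prediction was wrong, else 0
        self.n += 1
        self.errors += error
        p = self.errors / self.n
        s = math.sqrt(p * (1 - p) / self.n)
        if p + s < self.p_min + self.s_min:
            self.p_min, self.s_min = p, s
        if p + s > self.p_min + 3 * self.s_min:
            return "drift"
        if p + s > self.p_min + 2 * self.s_min:
            return "warning"
        return "ok"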
Sequential analysis

The method used is Linear Four Rates (LFR)

If the data is stationary, the contingency table (and thus its four rates) should remain constant


Error distribution monitoring

The method used is Adaptive Windowing (ADWIN)

Calculate mean error rate at every window of data

Size of window adapts, becoming shorter when data is not stationary


Clustering/novelty detection

● Assign data to a known cluster or detect emerging concepts
● Multiple algorithms available: OLINDDA, MINAS, ECSMiner, and GC3
● Susceptible to the curse of dimensionality
● Does not detect population-level changes
Feature distribution monitoring

Monitors each feature separately at every window of data

Algorithms to compare distributions:

Pearson correlation (in Change of Concept)

Hellinger distance (in HDDDM; a sketch follows below)

Use PCA to reduce the number of features
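
A minimal sketch (assumed) of comparing a feature's training and serving windows with the Hellinger distance over shared histogram bins (0 = identical distributions, 1 = no overlap):

import numpy as np

def hellinger_distance(reference, current, bins=30):
    # Histogram both windows on common bin edges, normalize to probabilities,
    # then compute the Hellinger distance between the two distributions.
    edges = np.histogram_bin_edges(np.concatenate([reference, current]), bins=bins)
    p, _ = np.histogram(reference, bins=edges)
    q, _ = np.histogram(current, bins=edges)
    p = p / p.sum()
    q = q / q.sum()
    return np.sqrt(np.sum((np.sqrt(p) - np.sqrt(q)) ** 2)) / np.sqrt(2)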


Model-dependent monitoring

● Concentrate efforts near the decision margin in latent space
● One algorithm is Margin Density Drift Detection (MD3)
● Areas in latent space where classifiers have low confidence matter more
● Reduces the false alarm rate effectively
Google Cloud AI Continuous Evaluation

● Leverages the AI Platform Prediction and Data Labeling services
● Deploy your model to AI Platform Prediction with a model version
● Create an evaluation job
● Input and output are saved in a BigQuery table
● Run the evaluation job on a sample of the requests
● View the evaluation metrics on the Google Cloud console
How often should you retrain?

● Depends on the rate of change


● If possible, automate the management of detecting model drift and
triggering model retraining