EN3150 Pattern Recognition - L02
M. T. U. Sampath K. Perera,
Department of Electronic and Telecommunication Engineering,
University of Moratuwa.
([email protected]).
Semester 5 – Batch 21.
What is learning?
“A computer program is said to learn from experience E with respect to some class
of tasks T and performance measure P, if its performance at tasks in T, as measured
by P, improves with experience E.” [1]
tasks T
performance measure P
experience E
Workflow: data preparation ➔ model training ➔ model evaluation
Types of learning:
➢ Supervised
➢ Unsupervised
➢ Semi-supervised
➢ Self-supervised
➢ Reinforcement learning
Learning from data: Supervised learning
➢Supervised learning:
o The algorithm learns from labeled training data to make predictions or decisions.
➢The goal of supervised learning is to learn a mapping function that can predict the correct
output label for new, unseen input examples.
[Figure: an ML model maps handwritten digit images to labels such as "Zero" and "Five"]
Learning from data: Supervised learning
➢ Labeled training data
➢ Handwritten digit - MNIST dataset
A. Creswell, T. White, V. Dumoulin, K. Arulkumaran, B. Sengupta and A. A. Bharath, "Generative Adversarial Networks: An Overview," in IEEE Signal Processing Magazine, vol.
35, no. 1, pp. 53-65, Jan. 2018, doi: 10.1109/MSP.2017.2765202.
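The idea of learning a mapping from labeled digit images to labels can be sketched in a few lines. This is only an illustration under assumptions: scikit-learn's small bundled 8x8 digits dataset stands in for MNIST, and the k-NN classifier, the 75/25 split, and `n_neighbors=3` are choices made for this sketch, not taken from the slides.

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Labeled training data: 8x8 digit images (flattened to 64 features) with labels 0..9.
# Assumption: this bundled dataset is used as a small stand-in for MNIST.
digits = load_digits()

# Hold out part of the data so the model is scored on unseen examples.
X_train, X_test, y_train, y_test = train_test_split(
    digits.data, digits.target, test_size=0.25, random_state=0, stratify=digits.target
)

# Learn the mapping "image -> label" from the labeled training examples.
model = KNeighborsClassifier(n_neighbors=3)
model.fit(X_train, y_train)

# Fraction of unseen test images whose label is predicted correctly.
test_accuracy = model.score(X_test, y_test)
print(f"Accuracy on unseen digits: {test_accuracy:.3f}")
```

The held-out score is the quantity of interest here, since the stated goal is correct predictions on new, unseen inputs.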
Learning from data: Reinforcement Learning
➢An agent learns to make decisions by interacting with an environment.
➢The agent receives feedback in the form of rewards or penalties based on its
actions.
➢The goal of the agent is to learn an optimal policy that maximizes the
cumulative reward over time.
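The agent, reward, and policy ideas above can be made concrete with a toy problem. The sketch below is an assumption-laden illustration, not a method from the slides: it uses tabular Q-learning on a hypothetical 5-state corridor, and the learning rate, discount factor, and exploration rate are arbitrary choices.

```python
import numpy as np

# Hypothetical environment: corridor with states 0..4; reaching state 4 ends the episode.
N_STATES, GOAL = 5, 4
LEFT, RIGHT = 0, 1
ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.2  # assumed learning rate, discount, exploration rate

rng = np.random.default_rng(0)
Q = np.zeros((N_STATES, 2))  # Q[state, action]: estimated cumulative discounted reward


def step(state, action):
    """Environment feedback: next state and reward (+1 only when the goal is reached)."""
    next_state = max(state - 1, 0) if action == LEFT else state + 1
    reward = 1.0 if next_state == GOAL else 0.0
    return next_state, reward


def choose_action(state):
    """Epsilon-greedy action choice with random tie-breaking."""
    if rng.random() < EPSILON:
        return int(rng.integers(2))
    best = np.flatnonzero(Q[state] == Q[state].max())
    return int(rng.choice(best))


for episode in range(500):
    state = 0
    for _ in range(1000):  # safety cap on episode length
        action = choose_action(state)
        next_state, reward = step(state, action)
        # Q-learning update: move the estimate toward reward + discounted best future value.
        target = reward + GAMMA * Q[next_state].max()
        Q[state, action] += ALPHA * (target - Q[state, action])
        state = next_state
        if state == GOAL:
            break

# Greedy policy for the non-terminal states (0 = left, 1 = right).
policy = Q[:GOAL].argmax(axis=1)
print(policy)
```

In this corridor the only rewarded behavior is moving toward the goal, so the learned greedy policy should pick "right" in every non-terminal state.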
[Figure: agent and environment interaction loop, with arrows labeled "action"]
➢ If you have labeled data and a clear target variable to predict, use supervised learning for accurate predictions.
➢ If you have large amounts of unlabeled data and want to find patterns or groupings in the data, opt for unsupervised learning.
➢ If you have a mix of labeled and unlabeled data, or the cost of labeling data is high, consider using semi-supervised learning to leverage both types of data.
Learning from data and
related challenges
➢ Data Quality and Quantity
o Noisy, incomplete data can lead to inaccurate and unreliable predictions.
o Often requires a large amount of data
➢ Data Imbalance:
o E.g., imbalanced classes in classification (figure: 90% versus 10% class split)
o May lead to poor performance
Learning from data and related challenges
➢ Overfitting: Occurs when a model performs exceptionally well on the training data but fails to generalize to new, unseen data.
➢ Underfitting: Occurs when a model is too simplistic to capture the underlying patterns in the data.
➢ Generalization: Ensuring that machine learning models perform well on new, unseen data.
[Figure: example fits illustrating underfitting and overfitting]
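A small experiment can illustrate the gap between training and test error. This is a sketch under assumptions: the noisy sine data, the sample sizes, and the polynomial degrees 1, 3, and 9 are hypothetical choices for illustration only.

```python
import numpy as np
from numpy.polynomial import Polynomial

rng = np.random.default_rng(0)


def make_data(n):
    """Hypothetical data: a sine curve observed with Gaussian noise."""
    x = rng.uniform(0.0, 1.0, n)
    y = np.sin(2 * np.pi * x) + rng.normal(0.0, 0.2, n)
    return x, y


def mse(y_true, y_pred):
    return float(np.mean((y_true - y_pred) ** 2))


x_train, y_train = make_data(15)   # small training set
x_test, y_test = make_data(200)    # new, unseen data

# results[degree] = (training MSE, test MSE)
results = {}
for degree in (1, 3, 9):
    model = Polynomial.fit(x_train, y_train, degree)  # least-squares polynomial fit
    results[degree] = (mse(y_train, model(x_train)), mse(y_test, model(x_test)))

for degree, (train_err, test_err) in results.items():
    print(f"degree {degree}: train MSE = {train_err:.4f}, test MSE = {test_err:.4f}")
```

Training error can only go down as the degree grows, so it cannot reveal overfitting on its own; the comparison with the test error is what indicates whether the model generalizes.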
Data preparation
➢ Data cleaning: Handle missing or inconsistent data
o Approaches:
  o Removing them
  o Filling with zeros/mean/median
  o Interpolation
➢ Data cleaning: outlier* detection and removal (figure: inliers versus outliers)
➢ Data Preprocessing: Feature scaling (e.g., normalization)
➢ Data Preprocessing: Dimensionality reduction
  o Principal Component Analysis (PCA)
*An outlier in a dataset refers to a data point that deviates significantly from the majority of the other data points.
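Two of the cleaning steps above, filling missing values and detecting outliers, can be sketched as follows. The toy values are made up, and the 1.5 × IQR rule used to flag outliers is an assumption of this sketch (the slides do not name a specific detection rule).

```python
import numpy as np

# Hypothetical feature column with one missing value (NaN) and one extreme value.
values = np.array([2.0, 2.5, np.nan, 3.0, 2.8, 100.0, 2.2])

# Fill the missing entry with the median of the observed values.
median = np.nanmedian(values)
filled = np.where(np.isnan(values), median, values)

# Flag outliers with the 1.5 * IQR rule (an assumed, commonly used convention).
q1, q3 = np.percentile(filled, [25, 75])
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
is_outlier = (filled < lower) | (filled > upper)

# Keep only the inliers.
cleaned = filled[~is_outlier]
print(cleaned)
```

The median is used for filling here because, unlike the mean, it is barely affected by the extreme value in the column.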
Data preparation
➢ Data Augmentation: Artificially expand the size and diversity of a given dataset.
o E.g., image rotation, flipping, scaling, cropping ➔ new image
➢ Imbalanced Data:
o Undersampling of the majority class
o Generating synthetic samples of the minority class
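Both ideas can be tried on toy arrays. The sketch below is illustrative only: the 3x4 "image", the 90/10 label split, and the random features are hypothetical, and random undersampling is just one simple way to rebalance classes.

```python
import numpy as np

rng = np.random.default_rng(0)

# --- Data augmentation on a toy 3x4 "image" ---
image = np.arange(12).reshape(3, 4)
flipped = np.fliplr(image)   # horizontal flip, same shape
rotated = np.rot90(image)    # 90-degree rotation, shape becomes (4, 3)

# --- Random undersampling of the majority class ---
labels = np.array([0] * 90 + [1] * 10)    # hypothetical 90% / 10% class split
features = rng.normal(size=(100, 2))      # hypothetical feature matrix

majority_idx = np.flatnonzero(labels == 0)
minority_idx = np.flatnonzero(labels == 1)

# Keep only as many majority samples as there are minority samples.
kept_majority = rng.choice(majority_idx, size=minority_idx.size, replace=False)
balanced_idx = np.concatenate([kept_majority, minority_idx])

X_balanced, y_balanced = features[balanced_idx], labels[balanced_idx]
print(np.bincount(y_balanced))
```

Undersampling discards data from the majority class, which is the price paid for balance; generating synthetic minority samples avoids that loss but is not shown here.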
Data preprocessing example
➢ https://scikit-learn.org/stable/modules/preprocessing.html
1. Standardization: scale the features of a dataset to have zero mean and unit variance.
2. Scaling features to a range, e.g., between 0 and 1
➢ Min-max scaler (MinMaxScaler) ➔ [0, 1]
➢ Max-abs scaler (MaxAbsScaler) ➔ [-1, 1]
Suggestions?
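The three scalers listed above can be compared on a tiny matrix. This is a minimal sketch using scikit-learn's `StandardScaler`, `MinMaxScaler`, and `MaxAbsScaler`; the 3x2 input matrix is a made-up example.

```python
import numpy as np
from sklearn.preprocessing import MaxAbsScaler, MinMaxScaler, StandardScaler

# Hypothetical data: 3 samples, 2 features on different scales.
X = np.array([[1.0, -2.0],
              [2.0,  0.0],
              [3.0,  4.0]])

# Standardization: each column gets zero mean and unit variance.
X_std = StandardScaler().fit_transform(X)

# Min-max scaling: each column is mapped linearly onto [0, 1].
X_minmax = MinMaxScaler().fit_transform(X)

# Max-abs scaling: each column is divided by its largest absolute value, giving [-1, 1].
X_maxabs = MaxAbsScaler().fit_transform(X)

print(X_std, X_minmax, X_maxabs, sep="\n\n")
```

In practice the scaler is usually fitted on the training set only and then applied to validation and test data with `transform`, so that no information leaks from held-out data.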
Data preprocessing example
California Housing Dataset
import pandas as pd
from sklearn.datasets import fetch_california_housing
Use pandas df.describe() to get the following:
         MedInc       AveOccup
count    20640        20640
mean     3.870671     3.070655
std      1.899822     10.386050
min      0.499900     0.692308
25%      2.563400     2.429741
50%      3.534800     2.818116
75%      4.743250     3.282261
max      15.000100    1243.333333
Independent variables
1. MedInc: Median income in block group (measured in tens of thousands of US Dollars)
2. HouseAge: Median house age in block group (a lower number is a newer building)
3. AveRooms: Average number of rooms per household
4. AveBedrms: Average number of bedrooms per household
5. Population: Block group population
6. AveOccup: Average number of household members
7. Latitude: Block group latitude (a higher value is farther north)
8. Longitude: Block group longitude (a higher value is farther west)
Dependent variable
1. medianHouseValue: Median house value for households within a block (measured in US Dollars)
Data preprocessing example: Min-max Scaler
[Figures: features after the Min-max Scaler and after the Standard Scaler. Annotated ranges: Average Occupancy [-0.2, 0.2], Median Income [-2, 4]]
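The effect of the two scalers can be checked by hand from the `df.describe()` summary of MedInc and AveOccup. The sketch below uses only those printed statistics, so it needs no dataset download; the helper names `min_max` and `z_score` are hypothetical, and the printed `std` is the sample standard deviation, which differs negligibly from the population value a standard scaler would use on 20640 rows.

```python
# Summary statistics copied from the df.describe() output for the two features.
stats = {
    "MedInc":   {"mean": 3.870671, "std": 1.899822, "min": 0.499900, "max": 15.000100},
    "AveOccup": {"mean": 3.070655, "std": 10.386050, "min": 0.692308, "max": 1243.333333},
}


def min_max(value, s):
    """Min-max scaling: maps [min, max] linearly onto [0, 1]."""
    return (value - s["min"]) / (s["max"] - s["min"])


def z_score(value, s):
    """Standardization: subtract the mean, divide by the standard deviation."""
    return (value - s["mean"]) / s["std"]


summary = {}
for name, s in stats.items():
    summary[name] = {
        "mean_after_min_max": min_max(s["mean"], s),  # where the typical value lands in [0, 1]
        "z_of_min": z_score(s["min"], s),             # smallest standardized value
        "z_of_max": z_score(s["max"], s),             # largest standardized value
    }

for name, row in summary.items():
    print(name, {key: round(val, 4) for key, val in row.items()})
```

The AveOccup maximum of about 1243 is an extreme value: after min-max scaling the mean sits very close to 0, so almost all blocks are squeezed into a tiny part of [0, 1], and after standardization that single maximum lies more than a hundred standard deviations from the mean.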
[Figure: data preparation ➔ data splitting ➔ training set / validation set / testing set, feeding model validation and model testing]
Performance evaluation
• Accuracy
• Precision
• Recall
• F1-Score
• Mean Absolute Error (MAE)
• Mean Squared Error (MSE)
• Root Mean Squared Error (RMSE)
Data splitting reference: sklearn.model_selection.train_test_split (scikit-learn 1.3.0 documentation)
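The training / validation / testing split can be produced with two calls to `train_test_split`. This is a sketch under assumptions: the 60/20/20 proportions, the stratification, and the toy arrays are choices made here for illustration.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical dataset: 100 samples, 2 features, binary labels.
X = np.arange(200).reshape(100, 2)
y = np.arange(100) % 2

# Step 1: hold out 20% of the data as the testing set.
X_trainval, X_test, y_trainval, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

# Step 2: split the remaining 80% into training (60% overall) and validation (20% overall).
X_train, X_val, y_train, y_val = train_test_split(
    X_trainval, y_trainval, test_size=0.25, random_state=0, stratify=y_trainval
)

print(len(X_train), len(X_val), len(X_test))
```

The validation set is used for choices made during development (for example hyperparameters), while the testing set is kept aside for the final performance estimate.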
ML Training Process (Supervised Learning)
Model Training
Loss Functions
Used to measure how different the predictions made by a machine learning model are from the actual correct answers.
[Figure: data preparation ➔ data splitting ➔ training set / validation set / testing set, feeding model validation and model testing]
➢ Mean Squared Error (MSE)
➢ Mean Absolute Error (MAE)
➢ Binary Cross-Entropy (Log Loss)
➢ Categorical Cross-Entropy
➢ Sparse Categorical Cross-Entropy
➢ Hinge Loss
➢ Kullback-Leibler Divergence (KL Divergence)
➢ Huber Loss
➢ Triplet Loss
➢ Ranking Losses (e.g., Hinge Rank Loss, RankNet Loss)
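A few of the listed losses are short enough to write out directly. The sketch below gives NumPy versions of MSE, MAE, binary cross-entropy, and Huber loss; the function names, the clipping constant `eps`, the Huber threshold `delta=1.0`, and the sample values are assumptions for illustration.

```python
import numpy as np


def mse_loss(y_true, y_pred):
    """Mean Squared Error: average of squared differences."""
    return float(np.mean((y_true - y_pred) ** 2))


def mae_loss(y_true, y_pred):
    """Mean Absolute Error: average of absolute differences."""
    return float(np.mean(np.abs(y_true - y_pred)))


def binary_cross_entropy(y_true, p_pred, eps=1e-12):
    """Log loss for labels in {0, 1} and predicted probabilities in (0, 1)."""
    p = np.clip(p_pred, eps, 1.0 - eps)  # avoid log(0)
    return float(-np.mean(y_true * np.log(p) + (1.0 - y_true) * np.log(1.0 - p)))


def huber_loss(y_true, y_pred, delta=1.0):
    """Quadratic for small errors, linear for errors larger than delta."""
    err = np.abs(y_true - y_pred)
    quadratic = 0.5 * err ** 2
    linear = delta * (err - 0.5 * delta)
    return float(np.mean(np.where(err <= delta, quadratic, linear)))


# Hypothetical labels and predicted probabilities.
y_true = np.array([1.0, 0.0, 1.0, 1.0])
y_pred = np.array([0.9, 0.2, 0.8, 0.6])

print(mse_loss(y_true, y_pred), mae_loss(y_true, y_pred))
print(binary_cross_entropy(y_true, y_pred), huber_loss(y_true, y_pred))
```

MSE and MAE are typical for regression, cross-entropy for classification with probability outputs, and Huber loss is a compromise that is less sensitive to outliers than MSE.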
How to evaluate a model
➢ Accuracy, Precision, Recall, and F-Score
Confusion matrix (rows: true class, columns: predicted class):
                            Predicted Positive (+)   Predicted Negative (-)   Total
True class: Positive (+)    True Pos. (TP)           False Neg. (FN)          P
True class: Negative (-)    False Pos. (FP)          True Neg. (TN)           N
Total                       P*                       N*
False Pos. (FP): type I error, false alarm. False Neg. (FN): type II error, miss.
$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$
$$\text{Precision} = \frac{TP}{TP + FP}$$
$$\text{Recall} = \frac{TP}{TP + FN}$$
$$F_1 = \frac{2}{\frac{1}{\text{Precision}} + \frac{1}{\text{Recall}}}$$
Higher F1-score values indicate a better balance between precision and recall.
Accuracy can be a misleading metric for imbalanced data sets.
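The metric definitions can be checked with a short calculation from confusion-matrix counts. The counts below are hypothetical, chosen to mimic a 90% / 10% class split, and returning 0 when a denominator is zero is a convention assumed for this sketch.

```python
def classification_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall and F1 from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    # Convention assumed here: a metric with a zero denominator is reported as 0.
    precision = tp / (tp + fp) if (tp + fp) > 0 else 0.0
    recall = tp / (tp + fn) if (tp + fn) > 0 else 0.0
    if precision + recall > 0:
        f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    else:
        f1 = 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical imbalanced test set: 10 positives, 90 negatives.
# Model A predicts "negative" for every sample.
always_negative = classification_metrics(tp=0, fp=0, fn=10, tn=90)

# Model B finds 8 of the 10 positives at the cost of 5 false alarms.
useful_model = classification_metrics(tp=8, fp=5, fn=2, tn=85)

print(always_negative)
print(useful_model)
```

Model A reaches 90% accuracy without detecting a single positive, which is why accuracy alone is misleading on imbalanced data; its recall and F1 of 0 expose the problem.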
How to evaluate a model
➢ Accuracy, Precision, Recall, and F-Score
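Confusion matrix (rows: true class, columns: predicted class):
                            Predicted Positive (+)   Predicted Negative (-)   Total
True class: Positive (+)    True Pos. (TP)           False Neg. (FN)          P
True class: Negative (-)    False Pos. (FP)          True Neg. (TN)           N
Total                       P*                       N*
$$\text{Precision} = \frac{TP}{TP + FP}$$
$$\text{Recall} = \frac{TP}{TP + FN}$$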
https://en.wikipedia.org/wiki/Precision_and_recall
Model selection
➢ Model selection is the process of choosing the
best model from a set of candidate models for a
specific task.
[Figure: sample of the MNIST dataset of handwritten digits (https://en.wikipedia.org/wiki/MNIST_database), passed to an ML model]
Candidate ML models:
➢ Convolutional Neural Networks (CNNs)
➢ Decision Trees
➢ k-Nearest Neighbors (k-NN)
➢ Hyperparameters (parameters that are
set before the training process begins)
➢ E.g., Learning Rate
➢ Statistical modeling is used to choose the most appropriate model among a set of candidate models.
➢ Model comparison
✓ BIC generally penalizes model complexity more heavily than AIC ➔ more complex models are less likely to be selected
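Criterion-based model comparison can be sketched for polynomial models of increasing degree. Everything specific here is an assumption: the slides name AIC, and BIC is taken to be the criterion compared against it; the formulas are the common Gaussian least-squares forms AIC = n ln(RSS/n) + 2k and BIC = n ln(RSS/n) + k ln(n), with k counted as the number of polynomial coefficients; the cubic toy data is made up.

```python
import numpy as np
from numpy.polynomial import Polynomial

rng = np.random.default_rng(0)

# Hypothetical data generated from a cubic polynomial plus Gaussian noise.
n = 50
x = np.linspace(-1.0, 1.0, n)
y = 1.0 - 2.0 * x + 3.0 * x**3 + rng.normal(0.0, 0.3, n)

scores = {}
for degree in range(1, 9):                 # candidate models: polynomial degrees 1..8
    fitted = Polynomial.fit(x, y, degree)  # least-squares fit
    rss = float(np.sum((y - fitted(x)) ** 2))
    k = degree + 1                         # assumed parameter count: polynomial coefficients
    fit_term = n * np.log(rss / n)         # goodness-of-fit term shared by both criteria
    scores[degree] = {
        "aic": fit_term + 2 * k,           # AIC: penalty of 2 per parameter
        "bic": fit_term + k * np.log(n),   # BIC: penalty of ln(n) per parameter
    }

# The model with the lowest criterion value is selected.
aic_degree = min(scores, key=lambda d: scores[d]["aic"])
bic_degree = min(scores, key=lambda d: scores[d]["bic"])
print("AIC selects degree", aic_degree, "| BIC selects degree", bic_degree)
```

Because ln(n) exceeds 2 once n is larger than about 7, the BIC penalty per parameter is heavier, so the degree chosen by BIC is never higher than the one chosen by AIC in this setup.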
Bias-variance trade-off.
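➢ Given a dataset with samples denoted as $(x_i, y_i)$, $i = 1, \dots, n$:
$$y_i = y_{\text{true},i} = f(x_i) + e$$
➢ Learned model:
$$y_{\text{pred},i} = \hat{f}(x_i)$$
➢ Mean Squared Error (MSE):
$$\text{MSE} = \frac{1}{n} \sum_{i=1}^{n} \left( y_{\text{true},i} - y_{\text{pred},i} \right)^2$$
$$\text{MSE} = \mathbb{E}\left[ \left( y_{\text{true}} - y_{\text{pred}} \right)^2 \right]$$
$$= \mathbb{E}\left[ \left( f(x) + e - \hat{f}(x) \right)^2 \right] = \mathbb{E}\left[ \left( f(x) - \hat{f}(x) + e \right)^2 \right]$$
$$= \mathbb{E}\left[ \left( f(x) - \hat{f}(x) \right)^2 \right] + \mathbb{E}\left[ e^2 \right] + 2\,\mathbb{E}\left[ \left( f(x) - \hat{f}(x) \right) e \right]$$
Assuming $e$ and $\left( f(x) - \hat{f}(x) \right)$ are independent:
$$= \mathbb{E}\left[ \left( f(x) - \hat{f}(x) \right)^2 \right] + \mathbb{E}\left[ e^2 \right] + 2\,\mathbb{E}\left[ f(x) - \hat{f}(x) \right] \mathbb{E}\left[ e \right]$$
With $\sigma_e^2 = \mathbb{E}[e^2] - (\mathbb{E}[e])^2$ and assuming $\mathbb{E}[e] = 0$:
$$= \mathbb{E}\left[ \left( f(x) - \hat{f}(x) \right)^2 \right] + \sigma_e^2$$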
Bias-variance trade-off.
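$$\mathbb{E}\left[ \left( f(x) - \hat{f}(x) \right)^2 \right] = \mathbb{E}\left[ \left( f(x) - \mathbb{E}[\hat{f}(x)] + \mathbb{E}[\hat{f}(x)] - \hat{f}(x) \right)^2 \right]$$
$$= \mathbb{E}\left[ \left( f(x) - \mathbb{E}[\hat{f}(x)] \right)^2 \right] + \mathbb{E}\left[ \left( \mathbb{E}[\hat{f}(x)] - \hat{f}(x) \right)^2 \right] + 2\,\mathbb{E}\left[ \left( f(x) - \mathbb{E}[\hat{f}(x)] \right) \left( \mathbb{E}[\hat{f}(x)] - \hat{f}(x) \right) \right]$$
The cross term vanishes:
$$2\,\mathbb{E}\left[ \left( f(x) - \mathbb{E}[\hat{f}(x)] \right) \left( \mathbb{E}[\hat{f}(x)] - \hat{f}(x) \right) \right] = 2 \left( f(x) - \mathbb{E}[\hat{f}(x)] \right) \mathbb{E}\left[ \mathbb{E}[\hat{f}(x)] - \hat{f}(x) \right] = 0$$
Therefore:
$$\mathbb{E}\left[ \left( f(x) - \hat{f}(x) \right)^2 \right] = \mathbb{E}\left[ \left( f(x) - \mathbb{E}[\hat{f}(x)] \right)^2 \right] + \mathbb{E}\left[ \left( \mathbb{E}[\hat{f}(x)] - \hat{f}(x) \right)^2 \right]$$
$$\text{MSE} = \mathbb{E}\left[ \left( f(x) - \hat{f}(x) \right)^2 \right] + \sigma_e^2$$
$$= \mathbb{E}\left[ \left( f(x) - \mathbb{E}[\hat{f}(x)] \right)^2 \right] + \mathbb{E}\left[ \left( \mathbb{E}[\hat{f}(x)] - \hat{f}(x) \right)^2 \right] + \sigma_e^2$$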
Bias-variance trade-off
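$$\text{MSE} = \mathbb{E}\left[ \left( f(x) - \mathbb{E}[\hat{f}(x)] \right)^2 \right] + \mathbb{E}\left[ \left( \mathbb{E}[\hat{f}(x)] - \hat{f}(x) \right)^2 \right] + \sigma_e^2$$
$$= \mathbb{E}\left[ \left( \mathbb{E}[\hat{f}(x)] - f(x) \right)^2 \right] + \mathbb{E}\left[ \left( \hat{f}(x) - \mathbb{E}[\hat{f}(x)] \right)^2 \right] + \sigma_e^2$$
$$= \left( \mathbb{E}[\hat{f}(x)] - f(x) \right)^2 + \mathbb{E}\left[ \left( \hat{f}(x) - \mathbb{E}[\hat{f}(x)] \right)^2 \right] + \sigma_e^2$$
$$= \text{bias}^2 + \text{variance} + \text{irreducible error}$$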
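The decomposition can be checked numerically by refitting a model on many noisy training sets. The sketch below is illustrative under assumptions: the true function is a sine, the design points are fixed, and polynomial degrees 1 and 9 stand in for a simple and a complex model; the function name `bias_variance` is hypothetical.

```python
import numpy as np
from numpy.polynomial import Polynomial


def bias_variance(degree, n_trials=300, n_points=20, noise_std=0.3, seed=0):
    """Estimate bias^2 and variance of a polynomial fit over repeated noisy training sets."""
    rng = np.random.default_rng(seed)
    x = np.linspace(0.0, 1.0, n_points)   # fixed inputs, shared by all training sets
    f = np.sin(2 * np.pi * x)             # assumed true function f(x)
    preds = np.empty((n_trials, n_points))
    for t in range(n_trials):
        y = f + rng.normal(0.0, noise_std, size=n_points)  # y = f(x) + e
        preds[t] = Polynomial.fit(x, y, degree)(x)          # f_hat(x) for this training set
    mean_pred = preds.mean(axis=0)                   # estimate of E[f_hat(x)]
    bias_sq = float(np.mean((mean_pred - f) ** 2))   # (E[f_hat(x)] - f(x))^2, averaged over x
    variance = float(np.mean(preds.var(axis=0)))     # E[(f_hat(x) - E[f_hat(x)])^2], averaged over x
    mse_to_f = float(np.mean((preds - f) ** 2))      # E[(f(x) - f_hat(x))^2], averaged over x
    return bias_sq, variance, mse_to_f


bias_low, var_low, mse_low = bias_variance(degree=1)     # simple model
bias_high, var_high, mse_high = bias_variance(degree=9)  # complex model

print(f"degree 1: bias^2 = {bias_low:.4f}, variance = {var_low:.4f}")
print(f"degree 9: bias^2 = {bias_high:.4f}, variance = {var_high:.4f}")
```

The quantity `mse_to_f` excludes the noise term, so it equals bias² + variance; adding `noise_std**2` would give the full MSE with the irreducible error. The simple model should show the larger bias and the complex model the larger variance.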
Bias-variance trade-off
Model complexity: low ➔ high
➢ Underfitting: high bias but low variance
➢ Overfitting: low bias but high variance
Bias-variance trade-off
➢ Underfitting: variance low, bias high
➢ Overfitting: variance high, bias low
[Figure: error versus model complexity (low to high), with one curve for the training sample and one for the testing sample]
Bias-variance trade-off
[Figure: examples of low bias and high bias]
Thank You
Q&A