S24 Lecture 2 ML Problem Formulation

The document discusses the formulation of machine learning problems, focusing on the components of task, performance measure, and experience. It outlines various machine learning tasks such as classification and regression, using examples like the Iris dataset to illustrate input-output relationships. Additionally, it touches on the roles of agents, data representation, and the distinction between supervised and unsupervised learning.

Uploaded by

Shruthi

10-315

Machine Learning
Problem Formulation

Instructor: Pat Virtue


Today
Autoencoder (Aliens) (previous slides)
▪ Features
ML Problem Formulation
▪ Task input and output
▪ Task, Performance, Experience
▪ Data and notation
▪ Examples: Iris Classification and Car Price Regression
ML Training and Models
▪ Linear
▪ Memorization
▪ Nearest Neighbor
ML Problem Formulation
Agents
An agent is an entity that perceives and acts.
Actions can have an effect on the environment.
The specific sensors and actuators affect what the agent is capable of perceiving and what actions it is capable of taking.

[Figure: agent-environment loop with labels Environment, Sensors, Percepts, Agent (?), Actuators, Actions]
Slide credit: [Link]
Agent: Simple Input/Output Task

[Figure: Input → Agent (?) → Predicted Output]
Task Input and Output
Input               Task                      Output
Petal measurements  Iris classification       Category
Time of day         Traffic prediction        Traffic Volume
Image               Image classification      Category
Image               Image denoising           Image
Text                Text to image generation  Image
???                 Face generation           Image


Task: Face Generation
[Link]
Machine Learning Problem Formulation
Three components <T,P,E>:
1. Task, T
2. Performance measure, P
3. Experience, E

Definition of learning:
A computer program learns if its performance at tasks in T, as
measured by P, improves with experience E

Definition from (Mitchell, 1997)
Machine Learning Problem Formulation
Notation
Task
Formalize the task as a mapping from input to output: $h(x) \to \hat{y}$
Experience
Data! Task experience examples will usually be pairs (input, measured output): $\mathcal{D} = \{(x^{(i)}, y^{(i)})\}_{i=1}^{N}$

Performance measure
Objective function that gives a single numerical value representing how well the system performs for a given dataset
▪ Classification: error rate, $\frac{1}{N} \sum_{i=1}^{N} \mathbb{1}(y^{(i)} \neq \hat{y}^{(i)})$
▪ Regression: mean squared error, $\frac{1}{N} \sum_{i=1}^{N} (y^{(i)} - \hat{y}^{(i)})^2$
Slide: CMU ML, Tom Mitchell and Roni Rosenfeld
Notation alert: Indicator function
$\mathbb{1}(z) = \mathbf{1}(z) = \begin{cases} 1 & \text{if } z \text{ is true} \\ 0 & \text{otherwise} \end{cases}$
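As a minimal sketch of the two performance measures, the functions below compute the classification error rate and the mean squared error in plain Python. The function names and the small lists of values are assumptions made for illustration; they do not come from the slides.

```python
def indicator(z):
    """Indicator function: 1 if z is true, 0 otherwise."""
    return 1 if z else 0


def error_rate(y_true, y_pred):
    """Fraction of examples whose prediction differs from the measured output."""
    n = len(y_true)
    return sum(indicator(y != y_hat) for y, y_hat in zip(y_true, y_pred)) / n


def mean_squared_error(y_true, y_pred):
    """Average squared difference between measured and predicted outputs."""
    n = len(y_true)
    return sum((y - y_hat) ** 2 for y, y_hat in zip(y_true, y_pred)) / n


# Made-up labels and predictions, purely for illustration.
labels = [0, 0, 1, 2]
predicted_labels = [0, 1, 1, 2]
print(error_rate(labels, predicted_labels))  # one mismatch out of four: 0.25

# Made-up numeric outputs, purely for illustration.
values = [1.0, 2.0, 4.0]
predicted_values = [1.0, 3.0, 2.0]
print(mean_squared_error(values, predicted_values))  # (0 + 1 + 4) / 3
```

Both functions return a single number for a whole dataset, which is what makes them usable as objective functions.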
Experience: Data and Notation
Example Dataset: Fisher Iris Dataset
Fisher (1936) used 150 measurements of flowers from 3 different species: Iris setosa (0), Iris versicolor (1), Iris virginica (2) collected by Anderson (1936)

Species  Sepal Length  Sepal Width  Petal Length  Petal Width
0        4.3           3.0          1.1           0.1
0        4.9           3.6          1.4           0.1
0        5.3           3.7          1.5           0.2
1        4.9           2.4          3.3           1.0
1        5.7           2.8          4.1           1.3
1        6.3           3.3          4.7           1.6
2        5.9           3.0          5.1           1.8

Images and full dataset: [Link]


Example Dataset: Fisher Iris Dataset

    from sklearn import datasets
    iris = datasets.load_iris()
    X = iris.data    # design matrix of input measurements
    y = iris.target  # species labels

Assume samples in data are i.i.d.

Dataset notation
$\mathcal{D} = \{(y^{(i)}, \mathbf{x}^{(i)})\}_{i=1}^{N} = \{(y^{(i)}, x_1^{(i)}, x_2^{(i)}, x_3^{(i)}, x_4^{(i)})\}_{i=1}^{N}$

Linear algebra can represent all data
$\mathbf{y} \in \{0, 1, 2\}^N$
$X \in \mathbb{R}^{N \times 4}$ (design matrix)


Data point $i = 6$: $(y^{(6)}, \mathbf{x}^{(6)})$, which is the sixth row of the table excerpt, $(1, 6.3, 3.3, 4.7, 1.6)$
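A minimal sketch of the dataset notation, using only the seven rows shown in the table excerpt rather than the full 150-example dataset. The names `X` and `y` mirror the slide; `D` and `num_features` are names chosen here for illustration. Note that the slides index data points from 1, while Python lists index from 0.

```python
# Seven rows copied from the table excerpt (not the full dataset).
# Columns: sepal length, sepal width, petal length, petal width.
X = [
    [4.3, 3.0, 1.1, 0.1],
    [4.9, 3.6, 1.4, 0.1],
    [5.3, 3.7, 1.5, 0.2],
    [4.9, 2.4, 3.3, 1.0],
    [5.7, 2.8, 4.1, 1.3],
    [6.3, 3.3, 4.7, 1.6],
    [5.9, 3.0, 5.1, 1.8],
]
y = [0, 0, 0, 1, 1, 1, 2]

N = len(X)               # number of examples (rows of the design matrix)
num_features = len(X[0])  # number of input measurements per example
print(N, num_features)    # 7 4

# The dataset D is the collection of (y, x) pairs.
D = list(zip(y, X))

# Data point i = 6 in the slide's 1-based notation is index 5 in Python.
i = 6
y_6, x_6 = D[i - 1]
print(y_6, x_6)           # 1 [6.3, 3.3, 4.7, 1.6]
```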



Task: Classification
ML Task: Classification
Predict species label from first two input measurements
$h(\mathbf{x}) \to \hat{y}$

Species  Sepal Length  Sepal Width
0        4.3           3.0
0        4.9           3.6
0        5.3           3.7
1        4.9           2.4
1        5.7           2.8
1        6.3           3.3
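Restricting the input to the first two measurements amounts to keeping the first two columns of the design matrix. A small sketch, using the six rows for species 0 and 1 from the table excerpts; `X_two` is a name chosen here for illustration.

```python
# Six rows from the table excerpts (species 0 and 1), all four measurements.
X = [
    [4.3, 3.0, 1.1, 0.1],
    [4.9, 3.6, 1.4, 0.1],
    [5.3, 3.7, 1.5, 0.2],
    [4.9, 2.4, 3.3, 1.0],
    [5.7, 2.8, 4.1, 1.3],
    [6.3, 3.3, 4.7, 1.6],
]
y = [0, 0, 0, 1, 1, 1]

# Keep only sepal length and sepal width (the first two columns).
X_two = [row[:2] for row in X]
print(X_two[0])  # [4.3, 3.0]
```

The labels `y` are unchanged; only the input that the hypothesis function sees becomes two-dimensional.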


Classification
Iris data example
$\mathcal{D} = \{(\mathbf{x}^{(i)}, y^{(i)})\}_{i=1}^{N}$, where $\mathbf{x}^{(i)} \in \mathbb{R}^4$, $y^{(i)} \in \{0, 1, 2\}$

Predict species label from input measurements
$h(\mathbf{x}) \to \hat{y}$

Performance measure?
Classification error rate
▪ Fraction of times $y \neq \hat{y}$ in a given dataset
▪ $\frac{1}{N} \sum_{i=1}^{N} \mathbb{1}(y^{(i)} \neq \hat{y}^{(i)})$
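As a worked example of the error rate, take the seven rows of the Iris table excerpt shown earlier, with measured labels $(0, 0, 0, 1, 1, 1, 2)$, and a hypothetical classifier (not from the slides) that always predicts $\hat{y} = 0$. It is wrong on the four rows whose label is not 0:

```latex
\frac{1}{N} \sum_{i=1}^{N} \mathbb{1}\left(y^{(i)} \neq \hat{y}^{(i)}\right)
  = \frac{1}{7}\,(0 + 0 + 0 + 1 + 1 + 1 + 1)
  = \frac{4}{7} \approx 0.571
```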


ML Tasks
Supervised learning: Pairs of input and output in training data
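$\mathcal{D} = \{(\mathbf{x}^{(i)}, y^{(i)})\}_{i=1}^{N}$
$h(\mathbf{x}) \to \hat{y}$

Classification
▪ Output labels
▪ $y \in \mathcal{Y}$, where $\mathcal{Y}$ is discrete and order of values has no meaning

Regression
▪ Output values
▪ $y \in \mathcal{Y}$, where $\mathcal{Y}$ is usually continuous and order of values has meaning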
Unsupervised Tasks
ML Tasks
Unsupervised learning
$\mathcal{D} = \{\mathbf{x}^{(i)}\}_{i=1}^{N}$
$h(\mathbf{x}) \to ???$

▪ Training data has no output values


▪ Tasks can vary
▪ Often used to organize data for future (minimally) supervised learning
Example: Unsupervised autoencoder → Random image generation

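$\mathbf{x} \to h(\mathbf{x}) \to \hat{\mathbf{x}}$

$\mathbf{x} \to f(\mathbf{x}) \to \mathbf{z} \to g(\mathbf{z}) \to \hat{\mathbf{x}}$
$\mathbf{z} \to g(\mathbf{z}) \to \hat{\mathbf{x}}$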
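The two pipelines can be sketched with a toy, hand-written encoder and decoder. This only illustrates how $f$ and $g$ compose: in a real autoencoder both would be learned from data, and the specific choices here (averaging two features into one code, sampling $z$ uniformly from a fixed range) are assumptions made for the example.

```python
import random


def f(x):
    """Toy encoder: compress a 2-feature input into a single code z (the mean)."""
    return (x[0] + x[1]) / 2


def g(z):
    """Toy decoder: expand a code z back into a 2-feature output."""
    return [z, z]


def h(x):
    """Autoencoder: x -> f(x) -> z -> g(z) -> x_hat."""
    return g(f(x))


# Reconstruction: encode then decode an input.
x = [2.0, 4.0]
x_hat = h(x)
print(x_hat)  # [3.0, 3.0]

# Generation: skip the encoder and decode a randomly drawn code.
random.seed(0)
z = random.uniform(0.0, 5.0)
x_generated = g(z)
print(x_generated)
```

The reconstruction is lossy because the code $z$ has fewer dimensions than the input, which is the point of the bottleneck.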
Example: Text Generation


Vocab pause
Task
▪ Prediction
▪ Inference
▪ Hypothesis function
▪ Classification
▪ Regression
Experience/Data
Input
▪ Input feature
▪ Measurement
▪ Attribute
Output
▪ Target
▪ Class/category/label
▪ True output
▪ Measured output
▪ Predicted output
Supervised
Unsupervised
Performance Measure
▪ Objective function
Classification
▪ Error rate
▪ Accuracy rate
Regression
▪ Mean squared error
Training
▪ Model
▪ Model structure
▪ Model parameters
Training and ML Models
Machine Learning
Using (training) data to learn a model that we’ll later use for prediction

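Training: Training Data (input and measured output), $\mathcal{D}_{train} = \{(\mathbf{x}^{(i)}, y^{(i)})\}_{i=1}^{N}$ → Training → Model (structure and parameters, $\theta$)
Prediction: Input $\mathbf{x}^{(new)}$ → Model $h(\mathbf{x})$ → Predicted Output $\hat{y}^{(new)}$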
Training Data: $(\mathbf{x}^{(1)}, y^{(1)}), (\mathbf{x}^{(2)}, y^{(2)}), (\mathbf{x}^{(3)}, y^{(3)}), \ldots, (\mathbf{x}^{(N)}, y^{(N)})$
Task: Car Price Prediction
Regression: learning a model to predict a numerical output (but not
numbers that just represent categories, that would be classification)
Example
Trying to see how much I
should sell my car for.

Prediction: Input → Model → Predicted Output
Poll 2
What input features should we use?

Regression
Example
Trying to see how much I should sell my car for. Looking up data from car websites, I find the mileage for a set of cars and the selling price for each car.
Regression Model

Model?
Model: Memorization
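One way to read "memorization" is as a lookup table: the model stores every training pair and returns a stored price only when it sees exactly the same input again. The sketch below takes that reading. The mileage and price numbers (in thousands of miles and thousands of dollars) are made up for illustration, and returning `None` for unseen inputs is an assumption, since the slide does not say what the model does in that case.

```python
def train_memorizer(data):
    """Store every (mileage, price) pair; the stored table is the whole model."""
    table = {}
    for mileage, price in data:
        table[mileage] = price

    def h(x):
        # Return the memorized price, or None if x was never seen in training.
        return table.get(x)

    return h


# Hypothetical training data: (mileage in thousands, price in thousands).
train = [(20, 18.0), (50, 15.0), (80, 12.0)]
h = train_memorizer(train)

print(h(50))  # 15.0, an exact match in the training data
print(h(60))  # None, because 60 was never seen
```

The second call shows the weakness of pure memorization: it has nothing to say about any input outside the training data.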
Model: Nearest neighbor
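A nearest neighbor model also stores the training data, but for a new input it returns the output of the closest stored input. A minimal one-feature sketch follows; the data values are made up for illustration, and using the absolute difference in mileage as the distance is an assumption.

```python
def train_nearest_neighbor(data):
    """Keep the training pairs; prediction looks up the closest stored input."""
    stored = list(data)

    def h(x):
        # Find the training pair whose mileage is closest to x.
        _, nearest_price = min(stored, key=lambda pair: abs(pair[0] - x))
        return nearest_price

    return h


# Hypothetical training data: (mileage in thousands, price in thousands).
train = [(20, 18.0), (50, 15.0), (80, 12.0)]
h = train_nearest_neighbor(train)

print(h(60))  # 15.0, since 50 is the closest stored mileage
print(h(70))  # 12.0, since 80 is the closest stored mileage
```

Unlike the lookup-table reading of memorization, this model produces a prediction for every input, at the cost of a piecewise-constant output.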
Model: Linear
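For a linear model, the structure is a line $h(x) = wx + b$ and the parameters are $\theta = (w, b)$. The sketch below fits them with ordinary least squares, which is one common choice; the slide itself does not say how the line is fit. The data values are made up for illustration and were chosen to lie exactly on a line.

```python
def train_linear(data):
    """Fit h(x) = w * x + b to (x, y) pairs by ordinary least squares."""
    n = len(data)
    mean_x = sum(x for x, _ in data) / n
    mean_y = sum(y for _, y in data) / n

    # Closed-form least-squares solution for a single input feature.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in data)
    sxx = sum((x - mean_x) ** 2 for x, _ in data)
    w = sxy / sxx
    b = mean_y - w * mean_x

    def h(x):
        return w * x + b

    return h, (w, b)


# Hypothetical training data: (mileage in thousands, price in thousands).
train = [(20, 18.0), (50, 15.0), (80, 12.0)]
h, theta = train_linear(train)

print(theta)  # slope close to -0.1, intercept close to 20
print(h(60))  # close to 14
```

Here the model keeps only two numbers after training, in contrast to memorization and nearest neighbor, which keep the training data itself.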
