CHAPTER 2:
Supervised Learning
Learning a Class from Examples
Class C of a “family car”
Prediction: Is car x a family car?
Knowledge extraction: What do people expect from a
family car?
Output:
Positive (+) and negative (–) examples
Input representation:
x1: price, x2: engine power
Training set X
X = {x^t, r^t}_{t=1}^{N}
(Figure: positive and negative training examples plotted in the (x1: price, x2: engine power) plane.)
Class C
(p1 ≤ price ≤ p2) AND (e1 ≤ engine power ≤ e2)
Hypothesis class H
h(x) = 1 if h says x is positive
       0 if h says x is negative
Error of h on X
E(h | X) = (1/N) \sum_{t=1}^{N} 1(h(x^t) ≠ r^t)
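For concreteness, a minimal sketch of a rectangle hypothesis h and its empirical error E(h | X); the data points and the thresholds p1, p2, e1, e2 are made up for illustration, not taken from the slides.

import numpy as np

# Hypothetical family-car data: columns are (price, engine power),
# labels r^t are 1 (positive) or 0 (negative).
X = np.array([[15_000, 110], [22_000, 150], [45_000, 300], [9_000, 60]])
r = np.array([1, 1, 0, 0])

def h(x, p1=10_000, p2=30_000, e1=80, e2=200):
    """Axis-aligned rectangle hypothesis: (p1 <= price <= p2) AND (e1 <= power <= e2)."""
    price, power = x
    return int(p1 <= price <= p2 and e1 <= power <= e2)

# Empirical error E(h | X) = (1/N) * sum of 1(h(x^t) != r^t)
predictions = np.array([h(x) for x in X])
E = np.mean(predictions != r)
print(E)   # 0.0 on this toy sample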
S, G, and the Version Space
most specific hypothesis, S
most general hypothesis, G
Any h ∈ H between S and G is consistent with the training set; together these hypotheses make up the version space (Mitchell, 1997)
Margin
Choose the h with the largest margin (the distance between the boundary and the instances closest to it)
Candidate Elimination Algorithm
Initialize the sets S and G, respectively, to the sets of maximally
specific and maximally general generalizations that are consistent with the
first observed positive training example;
for each subsequent example ei
if ei is a negative example
- retain in S only those generalizations which do not match ei;
- specialize the members of G that match ei, only to the extent
required so that they no longer match ei, and only in such ways
that each remains more general than some generalization in S;
- remove from G any element that is more specific than some other element in G
else if ei is a positive example
- retain in G only those generalizations that match ei;
- generalize members of S that do not match ei, only to the
extent required to allow them to match ei, and only in such ways
that each remains more specific than some generalization in G;
- remove from S any element that is more general than some other element in S
end
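A minimal Python sketch of the algorithm above for conjunctive attribute-value hypotheses; the encoding ('?' for "any value", None for "no value") and the helper names are illustrative assumptions, not part of the slides.

def covers(h, x):
    """True if hypothesis h matches instance x."""
    return all(a == '?' or a == b for a, b in zip(h, x))

def more_general(a, b):
    """True if hypothesis a covers everything hypothesis b covers."""
    return all(ai == '?' or ai == bi or bi is None for ai, bi in zip(a, b))

def candidate_elimination(examples, domains):
    """examples: list of (instance tuple, is_positive); domains: possible values per attribute."""
    n = len(domains)
    S = [tuple([None] * n)]            # most specific boundary
    G = [tuple(['?'] * n)]             # most general boundary
    for x, positive in examples:
        if positive:
            # retain in G only the generalizations that match x
            G = [g for g in G if covers(g, x)]
            # minimally generalize the members of S so that they match x
            S = [tuple(xi if si is None else (si if si == xi else '?')
                       for si, xi in zip(s, x)) for s in S]
            S = [s for s in S if any(more_general(g, s) for g in G)]
        else:
            # retain in S only the generalizations that do not match x
            S = [s for s in S if not covers(s, x)]
            new_G = []
            for g in G:
                if not covers(g, x):
                    new_G.append(g)
                    continue
                # minimally specialize g so that it no longer matches x,
                # keeping only specializations still above some member of S
                for i, gi in enumerate(g):
                    if gi != '?':
                        continue
                    for v in domains[i]:
                        if v != x[i]:
                            spec = g[:i] + (v,) + g[i + 1:]
                            if any(more_general(spec, s) for s in S):
                                new_G.append(spec)
            new_G = list(dict.fromkeys(new_G))
            # remove from G any element more specific than another element of G
            G = [g for g in new_G
                 if not any(h != g and more_general(h, g) for h in new_G)]
    return S, G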
Example: Candidate Elimination
Example  Sky    AirTemp  Humidity  Wind    Water  Forecast  EnjoySport
1        Sunny  Warm     Normal    Strong  Warm   Same      Yes
2        Sunny  Warm     High      Strong  Warm   Same      Yes
3        Rainy  Cold     High      Strong  Warm   Change    No
4        Sunny  Warm     High      Strong  Cool   Change    Yes
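Running the sketch given after the algorithm on this table (with attribute domains restricted to the values that actually appear, an assumption made for illustration) reproduces the familiar version space:

domains = [('Sunny', 'Rainy'), ('Warm', 'Cold'), ('Normal', 'High'),
           ('Strong',), ('Warm', 'Cool'), ('Same', 'Change')]
examples = [(('Sunny', 'Warm', 'Normal', 'Strong', 'Warm', 'Same'), True),
            (('Sunny', 'Warm', 'High',   'Strong', 'Warm', 'Same'), True),
            (('Rainy', 'Cold', 'High',   'Strong', 'Warm', 'Change'), False),
            (('Sunny', 'Warm', 'High',   'Strong', 'Cool', 'Change'), True)]
S, G = candidate_elimination(examples, domains)
# S == [('Sunny', 'Warm', '?', 'Strong', '?', '?')]
# G == [('Sunny', '?', '?', '?', '?', '?'), ('?', 'Warm', '?', '?', '?', '?')]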
Example: Candidate Elimination
Consider a concept described by three attributes with the following values:
  sky: Sunny, Rainy
  temperature: Warm, Cool
  humidity: Normal, Low
and the set of positive and negative training examples:
  1. (Sunny, Warm, Normal)  +
  2. (Rainy, Cool, Low)     -
  3. (Sunny, Cool, Normal)  +
  4. (Sunny, Warm, Low)     -
Example: Candidate Elimination
Size   Color  Shape     Class
Big    Red    Circle    -ve
Small  Red    Triangle  -ve
Small  Red    Circle    +ve
Big    Blue   Circle    +ve
Small  Blue   Circle    +ve
VC (Vapnik-Chervonenkis) Dimension
N points can be labeled in 2^N ways as +/-
H shatters the N points if, for every one of these labelings, there exists an h ∈ H consistent with it
VC(H) = the largest N that H can shatter
Axis-aligned rectangles can shatter (some sets of) 4 points but no set of 5, so their VC dimension is 4
What if the points lie along one axis? What about 5 points?
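A brute-force check of the rectangle claim (a sketch assuming the hypothesis class of axis-aligned rectangles in the plane): for any labeling, the tightest consistent rectangle is the bounding box of the positive points, so a point set is shattered iff no labeling puts a negative point inside that box. The point sets below are made-up examples.

from itertools import product
import numpy as np

def shattered(points):
    """True if axis-aligned rectangles shatter the given 2-D points."""
    pts = np.asarray(points, dtype=float)
    n = len(pts)
    for labels in product([0, 1], repeat=n):
        mask = np.array(labels) == 1
        pos, neg = pts[mask], pts[~mask]
        if len(pos) == 0:
            continue                      # an empty rectangle handles the all-negative labeling
        lo, hi = pos.min(axis=0), pos.max(axis=0)
        # the bounding box of the positives is the tightest consistent rectangle;
        # if a negative point falls inside it, no rectangle realizes this labeling
        if any(np.all((p >= lo) & (p <= hi)) for p in neg):
            return False
    return True

print(shattered([(0, 1), (1, 0), (2, 1), (1, 2)]))          # True: a "diamond" of 4 points
print(shattered([(0, 0), (1, 0), (2, 0), (3, 0)]))          # False: 4 collinear points
print(shattered([(0, 1), (1, 0), (2, 1), (1, 2), (1, 1)]))  # False: no set of 5 points works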
Probably Approximately Correct (PAC) Learning
How many training examples N do we need so that, with probability at least 1 - δ, the hypothesis h has error (probability) at most ε? (Blumer et al., 1989)
The region where the tightest rectangle h errs can be covered by four strips, one along each side of C; it suffices that each strip has probability at most ε/4
Pr that a random instance misses one strip ≤ 1 - ε/4
Pr that N instances all miss one strip ≤ (1 - ε/4)^N
Pr that N instances miss any of the 4 strips ≤ 4(1 - ε/4)^N
Require 4(1 - ε/4)^N ≤ δ; using (1 - x) ≤ exp(-x):
4 exp(-εN/4) ≤ δ, hence N ≥ (4/ε) ln(4/δ)
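A quick numeric check of the bound (plain Python; the function name is illustrative):

import math

def pac_sample_size(eps, delta):
    """Smallest integer N with N >= (4/eps) * ln(4/delta), the bound derived above."""
    return math.ceil(4.0 / eps * math.log(4.0 / delta))

# e.g. error at most 0.1 with probability at least 0.95
print(pac_sample_size(eps=0.1, delta=0.05))   # 176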
Noise and Model Complexity
Prefer the simpler model because it is
  Simpler to use (lower computational complexity)
  Easier to train (lower space complexity)
  Easier to explain (more interpretable)
  Better at generalizing (lower variance; Occam's razor)
Occam's razor: simpler explanations are more plausible, and unnecessary complexity should be shaved off
Learning Multiple Classes
Data: X = {x^t, r^t}_{t=1}^{N}, where
  r_i^t = 1 if x^t ∈ C_i
  r_i^t = 0 if x^t ∈ C_j, j ≠ i
Model it as K two-class problems; this results in K hypotheses
Train hypotheses h_i(x), i = 1, ..., K:
  h_i(x^t) = 1 if x^t ∈ C_i
  h_i(x^t) = 0 if x^t ∈ C_j, j ≠ i
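A minimal one-vs-rest sketch of this idea (the data, the class count K, and the use of LogisticRegression as the two-class learner are illustrative assumptions):

import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical data: N instances, d features, labels y in {0, ..., K-1}.
rng = np.random.default_rng(0)
N, d, K = 200, 2, 3
X = rng.normal(size=(N, d))
y = rng.integers(0, K, size=N)

# Train K two-class hypotheses h_i: class i vs. the rest.
hypotheses = []
for i in range(K):
    r_i = (y == i).astype(int)          # r_i^t = 1 if x^t in C_i, else 0
    hypotheses.append(LogisticRegression().fit(X, r_i))

# Predict by taking the class whose hypothesis is most confident.
scores = np.column_stack([h.predict_proba(X)[:, 1] for h in hypotheses])
y_hat = scores.argmax(axis=1)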
Regression
Data: X = {x^t, r^t}_{t=1}^{N}, r^t ∈ R
Actual function: r^t = f(x^t) + ε (noise)
Linear model: g(x) = w_1 x + w_0
Quadratic model: g(x) = w_2 x^2 + w_1 x + w_0
Loss function: E(g | X) = (1/N) \sum_{t=1}^{N} [r^t - g(x^t)]^2
Ex: E(w_1, w_0 | X) = (1/N) \sum_{t=1}^{N} [r^t - (w_1 x^t + w_0)]^2
Solution to the regression problem
Closed-form solution: W = (X^T X)^{-1} X^T y
  Worst-case time complexity: O(n^3) for inverting X^T X, where n is the number of features
Gradient descent: tweak the parameters iteratively to minimize a cost function
  Batch gradient descent
  Stochastic gradient descent
Programming note: SGDRegressor in sklearn.linear_model
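A small sketch of both routes on synthetic data (the data and the SGD hyperparameters are illustrative assumptions):

import numpy as np
from sklearn.linear_model import SGDRegressor

# Synthetic 1-D regression data: r = 4 + 3x + noise.
rng = np.random.default_rng(0)
x = 2 * rng.random(100)
r = 4 + 3 * x + rng.normal(scale=0.5, size=100)

# Closed form (normal equation): W = (X^T X)^{-1} X^T r,
# with a column of ones so that w_0 is the intercept.
X = np.column_stack([np.ones_like(x), x])
W = np.linalg.inv(X.T @ X) @ X.T @ r
print(W)                                   # approximately [4, 3]

# Stochastic gradient descent via scikit-learn.
sgd = SGDRegressor(max_iter=1000, tol=1e-3, eta0=0.01)
sgd.fit(x.reshape(-1, 1), r)
print(sgd.intercept_, sgd.coef_)           # also roughly 4 and 3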
Gradient Descent
Tweak the parameters iteratively to minimize a cost function
If the learning rate is too small, convergence is slow; if it is too large, the steps overshoot and may diverge
Pitfalls?
Gradient Descent: Impact of Scaling
(Figure: gradient descent paths on the cost contours with and without feature scaling.)
Practice: ensure all features have a similar scale
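One common way to follow this practice in scikit-learn (a sketch with made-up features on very different scales):

import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import SGDRegressor

# Two features on very different scales (illustrative data).
rng = np.random.default_rng(0)
X = np.column_stack([rng.random(200) * 1e4, rng.random(200)])
r = 2.0 * X[:, 0] / 1e4 + 3.0 * X[:, 1] + rng.normal(scale=0.1, size=200)

# Standardize each feature before running SGD so the steps are well conditioned.
model = make_pipeline(StandardScaler(), SGDRegressor(max_iter=1000, tol=1e-3))
model.fit(X, r)
print(model.score(X, r))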
Gradient Descent: Batch, Stochastic, and Mini-Batch
Batch: update all parameters at once with a matrix operation over the full training set
  ∇_W E(W) = ( ∂E(W)/∂w_0, ∂E(W)/∂w_1, ..., ∂E(W)/∂w_n )^T = (2/N) X^T (X W - r)
Gradient descent step: W^{next} = W^{current} - η ∇_W E(W)
  η: learning rate
Stochastic: pick a random instance and compute the gradient on that single instance
Mini-batch: pick a random set of instances and compute the gradient on that set
Batch vs. Stochastic vs. Mini-Batch
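A compact NumPy sketch comparing the three update schemes on the linear-regression loss above (the learning rate, epoch count, batch size, and synthetic data are assumptions for illustration):

import numpy as np

rng = np.random.default_rng(0)
x = 2 * rng.random(100)
r = 4 + 3 * x + rng.normal(scale=0.5, size=100)
X = np.column_stack([np.ones_like(x), x])      # add a bias column
N, eta, epochs = len(r), 0.05, 500

def gradient(W, Xb, rb):
    """Gradient of the mean-squared error on the (mini-)batch (Xb, rb)."""
    return (2 / len(rb)) * Xb.T @ (Xb @ W - rb)

# Batch: one step per epoch using the full data set.
W = np.zeros(2)
for _ in range(epochs):
    W -= eta * gradient(W, X, r)

# Stochastic: one step per randomly chosen instance.
W_sgd = np.zeros(2)
for _ in range(epochs):
    for i in rng.permutation(N):
        W_sgd -= eta * gradient(W_sgd, X[i:i+1], r[i:i+1])

# Mini-batch: one step per random batch of 16 instances.
W_mb = np.zeros(2)
for _ in range(epochs):
    idx = rng.permutation(N)
    for start in range(0, N, 16):
        batch = idx[start:start+16]
        W_mb -= eta * gradient(W_mb, X[batch], r[batch])

print(W, W_sgd, W_mb)    # each approximately [4, 3]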
Polynomial Regression
Add powers of each feature and then train a linear model on the expanded features
Programming note: use Scikit-Learn's PolynomialFeatures to add powers of the features, then use LinearRegression()
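A short sketch of that recipe (the degree-2 data and the pipeline below are illustrative):

import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression

# Quadratic data: r = 0.5 x^2 + x + 2 + noise.
rng = np.random.default_rng(0)
x = 6 * rng.random((100, 1)) - 3
r = 0.5 * x[:, 0] ** 2 + x[:, 0] + 2 + rng.normal(size=100)

# Expand x into [x, x^2], then fit an ordinary linear model.
model = make_pipeline(PolynomialFeatures(degree=2, include_bias=False),
                      LinearRegression())
model.fit(x, r)
print(model.named_steps['linearregression'].coef_)   # approximately [1, 0.5]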
Model Selection & Generalization
Data alone are not sufficient to find a unique solution
The need for inductive bias: assumptions about H
Generalization: how well a model performs on new data
Overfitting: H more complex than C or f
Underfitting: H less complex than C or f
Triple Trade-Off
There is a trade-off between three factors (Dietterich, 2003):
  Complexity of H, c(H)
  Training set size, N
  Generalization error, E, on new data
As N increases, E decreases
As c(H) increases, E first decreases and then increases
Cross-Validation: General Practice
To estimate generalization error, we need data unseen during training. We split the data as:
  Training set (50%)
  Validation set (25%): used to select the best model
  Test (publication) set (25%): test the selected model on this set
Fit candidate hypotheses on the training set, then select the one that is most accurate on the validation set.
Use resampling when there is little data.
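A sketch of this 50/25/25 split with scikit-learn (the data and the candidate models are placeholders):

import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression, Ridge

# Placeholder data.
rng = np.random.default_rng(0)
X = rng.normal(size=(400, 3))
r = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=400)

# 50% training, 25% validation, 25% test.
X_train, X_rest, r_train, r_rest = train_test_split(X, r, test_size=0.5, random_state=0)
X_val, X_test, r_val, r_test = train_test_split(X_rest, r_rest, test_size=0.5, random_state=0)

# Fit candidate models on the training set, select on the validation set...
candidates = [LinearRegression(), Ridge(alpha=1.0)]
best = max(candidates, key=lambda m: m.fit(X_train, r_train).score(X_val, r_val))

# ...and report the selected model's score once on the test set.
print(best, best.score(X_test, r_test))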