VC-dimension for characterizing classifiers
Andrew W. Moore
Associate Professor
School of Computer Science
Carnegie Mellon University
www.cs.cmu.edu/~awm
[email protected]
412-268-7599

Note to other teachers and users of these slides. Andrew would be delighted if you found this source material useful in giving your own lectures. Feel free to use these slides verbatim, or to modify them to fit your own needs. PowerPoint originals are available. If you make use of a significant portion of these slides in your own lecture, please include this message, or the following link to the source repository of Andrew's tutorials: http://www.cs.cmu.edu/~awm/tutorials . Comments and corrections gratefully received.
Copyright 2001, Andrew W. Moore Nov 20th, 2001
A learning machine
A learning machine f takes an input x and transforms it, somehow using weights a, into a predicted output yest = +/- 1.
a is some vector of adjustable parameters.
[Diagram: x → f → yest, with the parameter vector a feeding into f]
Copyright 2001, Andrew W. Moore VC-dimension: Slide 2
Examples
f(x,b) = sign(x.x − b)
[Diagram: x → f → yest; figure: an example 2-d dataset, with one marker denoting +1 and the other denoting −1]
Copyright 2001, Andrew W. Moore VC-dimension: Slide 3
Examples
f(x,w) = sign(x.w)
[Diagram: x → f → yest; figure: an example 2-d dataset, with one marker denoting +1 and the other denoting −1]
Copyright 2001, Andrew W. Moore VC-dimension: Slide 4
Examples
f(x,w,b) = sign(x.w + b)
[Diagram: x → f → yest; figure: an example 2-d dataset, with one marker denoting +1 and the other denoting −1]
Copyright 2001, Andrew W. Moore VC-dimension: Slide 5
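(The three example machines above are easy to write down concretely. Below is a minimal sketch in Python; it is not part of the original slides and the function names are mine.)

    import numpy as np

    def f_circle(x, b):
        # f(x,b) = sign(x.x - b): +1 outside the circle of squared radius b, -1 inside
        return np.sign(np.dot(x, x) - b)

    def f_origin_line(x, w):
        # f(x,w) = sign(x.w): a linear separator forced to pass through the origin
        return np.sign(np.dot(x, w))

    def f_line(x, w, b):
        # f(x,w,b) = sign(x.w + b): a general linear separator
        return np.sign(np.dot(x, w) + b)

    x = np.array([2.0, -1.0])
    print(f_circle(x, b=3.0))                          # ||x||^2 = 5 > 3  ->  +1
    print(f_origin_line(x, w=np.array([1.0, 1.0])))    # 2 - 1 = 1 > 0    ->  +1
    print(f_line(x, w=np.array([0.0, 1.0]), b=0.5))    # -1 + 0.5 < 0     ->  -1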
How do we characterize power?
Different machines have different amounts of
power.
Tradeoff between:
More power: Can model more complex
classifiers but might overfit.
Less power: Not going to overfit, but restricted in
what it can model.
How do we characterize the amount of power?
Copyright 2001, Andrew W. Moore VC-dimension: Slide 6
Some definitions
Given some machine f
And under the assumption that all training points (xk,yk) were drawn i.i.d
from some distribution.
And under the assumption that future test points will be drawn from the
same distribution
Define

    R(a) = TESTERR(a) = E[ ½ |y − f(x,a)| ]  =  probability of misclassification

(R(a) is the official terminology; TESTERR(a) is the terminology we'll use.)
Copyright 2001, Andrew W. Moore VC-dimension: Slide 7
Some definitions
Given some machine f
And under the assumption that all training points (xk,yk) were drawn i.i.d
from some distribution.
And under the assumption that future test points will be drawn from the
same distribution
Define

    R(a) = TESTERR(a) = E[ ½ |y − f(x,a)| ]  =  probability of misclassification

    Remp(a) = TRAINERR(a) = (1/R) Σk=1..R ½ |yk − f(xk,a)|  =  fraction of the training set misclassified

(R(a) and Remp(a) are the official terminology; TESTERR(a) and TRAINERR(a) are the terminology we'll use. R = number of training-set data points.)
Copyright 2001, Andrew W. Moore VC-dimension: Slide 8
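(A minimal sketch, not from the original slides, of how TRAINERR would be computed for a generic machine f that returns ±1; the helper name is mine. X and y are assumed to be numpy arrays of inputs and ±1 labels.)

    import numpy as np

    def trainerr(f, params, X, y):
        # Empirical risk R_emp(a): the fraction of the R training points that f misclassifies.
        # 0.5 * |y_k - f(x_k, a)| is 0 for a correct prediction and 1 for a mistake.
        preds = np.array([f(x, params) for x in X])
        return np.mean(0.5 * np.abs(y - preds))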
Vapnik-Chervonenkis dimension
TESTERR(a) = E[ ½ |y − f(x,a)| ]        TRAINERR(a) = (1/R) Σk=1..R ½ |yk − f(xk,a)|
Given some machine f, let h be its VC dimension.
h is a measure of f's power (h does not depend on the choice of training set).
Vapnik showed that, with probability at least 1 − η,

    TESTERR(a) ≤ TRAINERR(a) + sqrt( [ h(log(2R/h) + 1) − log(η/4) ] / R )
This gives us a way to estimate the error on
future data based only on the training error
and the VC-dimension of f
Copyright 2001, Andrew W. Moore VC-dimension: Slide 9
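(A minimal sketch of the bound on this slide, not from the original deck. Here eta is the confidence parameter, so the bound holds with probability at least 1 − eta; the function name is mine.)

    import numpy as np

    def vc_bound(train_err, h, R, eta=0.05):
        # Probable upper bound on TESTERR: training error plus the "VC confidence" term.
        # h = VC-dimension of the machine, R = number of training points.
        vc_confidence = np.sqrt((h * (np.log(2.0 * R / h) + 1.0) - np.log(eta / 4.0)) / R)
        return train_err + vc_confidence

    # Example: 10,000 training points, VC-dimension 100, 3% training error.
    print(vc_bound(0.03, h=100, R=10000))   # roughly 0.28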
What VC-dimension is used for
TESTERR(a) = E[ ½ |y − f(x,a)| ]        TRAINERR(a) = (1/R) Σk=1..R ½ |yk − f(xk,a)|
Given some machine f, let h be its VC dimension.
h is a measure of f's power.
Vapnik showed that, with probability at least 1 − η,

    TESTERR(a) ≤ TRAINERR(a) + sqrt( [ h(log(2R/h) + 1) − log(η/4) ] / R )
This gives us a way to estimate the error on
future data based only on the training error
and the VC-dimension of f
Copyright 2001, Andrew W. Moore VC-dimension: Slide 10
Shattering
Machine f can shatter a set of points x1, x2, ..., xr if and only if
For every possible training set of the form (x1,y1), (x2,y2), ..., (xr,yr)
There exists some value of a that gets zero training error.
There are 2^r such training sets to consider, each with a different combination of +1's and −1's for the y's.
Copyright 2001, Andrew W. Moore VC-dimension: Slide 11
Shattering
Machine f can shatter a set of points x1, x2, ..., xr if and only if
For every possible training set of the form (x1,y1), (x2,y2), ..., (xr,yr)
There exists some value of a that gets zero training error.
Question: Can the following f shatter the following points?
f(x,w) = sign(x.w)
Copyright 2001, Andrew W. Moore VC-dimension: Slide 12
Shattering
Machine f can shatter a set of points x1, x2, ..., xr if and only if
For every possible training set of the form (x1,y1), (x2,y2), ..., (xr,yr)
There exists some value of a that gets zero training error.
Question: Can the following f shatter the following points?
f(x,w) = sign(x.w)
Answer: No problem. There are four training sets to consider, and a suitable weight vector exists for each, e.g. w=(0,1), w=(-2,3), w=(2,-3), w=(0,-1) (one per panel of the original figure).
Copyright 2001, Andrew W. Moore VC-dimension: Slide 13
Shattering
Machine f can shatter a set of points x1, x2, ..., xr if and only if
For every possible training set of the form (x1,y1), (x2,y2), ..., (xr,yr)
There exists some value of a that gets zero training error.
Question: Can the following f shatter the following points?
f(x,b) = sign(x.x-b)
Copyright 2001, Andrew W. Moore VC-dimension: Slide 14
Shattering
Machine f can shatter a set of points x1, x2, ..., xr if and only if
For every possible training set of the form (x1,y1), (x2,y2), ..., (xr,yr)
There exists some value of a that gets zero training error.
Question: Can the following f shatter the following points?
f(x,b) = sign(x.x-b)
Answer: No way my friend.
Copyright 2001, Andrew W. Moore VC-dimension: Slide 15
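(For a machine this simple, shattering can be checked exhaustively. The sketch below is not from the original slides: it tries every one of the 2^r labellings of a point set and asks whether some threshold b achieves zero training error for f(x,b) = sign(x.x − b).)

    import itertools
    import numpy as np

    def circle_can_shatter(points):
        # For f(x,b) = sign(x.x - b) the achievable labellings are thresholds on ||x||^2,
        # so it is enough to try a threshold below, between, and above the sorted squared norms.
        points = [np.asarray(p, dtype=float) for p in points]
        norms = sorted(np.dot(p, p) for p in points)
        candidates = [norms[0] - 1.0]
        candidates += [(lo + hi) / 2.0 for lo, hi in zip(norms, norms[1:])]
        candidates += [norms[-1] + 1.0]
        for labels in itertools.product([-1, +1], repeat=len(points)):
            achievable = any(
                all(np.sign(np.dot(p, p) - b) == lab for p, lab in zip(points, labels))
                for b in candidates)
            if not achievable:
                return False
        return True

    print(circle_can_shatter([[1.0, 0.0]]))               # True: one point can be shattered
    print(circle_can_shatter([[1.0, 0.0], [2.0, 0.0]]))   # False: "inner +1, outer -1" is impossible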
Definition of VC dimension
Given machine f, the VC-dimension h is
The maximum number of points that can be arranged so that f shatters them.
Example: What's the VC dimension of f(x,b) = sign(x.x − b)?
Copyright 2001, Andrew W. Moore VC-dimension: Slide 16
VC dim of trivial circle
Given machine f, the VC-dimension h is
The maximum number of points that can be arranged so that f shatters them.
Example: What's the VC dimension of f(x,b) = sign(x.x − b)?
Answer = 1: we can't even shatter two points! (But it's clear we can shatter 1.)
Copyright 2001, Andrew W. Moore VC-dimension: Slide 17
Reformulated circle
Given machine f, the VC-dimension h is
The maximum number of points that can be arranged so that f shatters them.
Example: For 2-d inputs, what's the VC dimension of f(x,q,b) = sign(q x.x − b)?
Copyright 2001, Andrew W. Moore VC-dimension: Slide 18
Reformulated circle
Given machine f, the VC-dimension h is
The maximum number of points that can be arranged so that f shatters them.
Example: What's the VC dimension of f(x,q,b) = sign(q x.x − b)?
Answer = 2
[Figure: the four labellings of two points at different radii; the labelling with the inner point +1 and the outer point −1 uses negative q and b]
Copyright 2001, Andrew W. Moore VC-dimension: Slide 19
Reformulated circle
Given machine f, the VC-dimension h is
The maximum number of points that can be arranged so that f shatters them.
Example: What's the VC dimension of f(x,q,b) = sign(q x.x − b)?
Answer = 2 (clearly can't do 3)
[Figure: as on the previous slide; negative q and b give the inner-+1 / outer-−1 labelling]
Copyright 2001, Andrew W. Moore VC-dimension: Slide 20
VC dim of separating line
Given machine f, the VC-dimension h is
The maximum number of points that can be arranged so that f shatters them.
Example: For 2-d inputs, what's the VC-dim of f(x,w,b) = sign(w.x + b)?
Well, can f shatter these three points?
Copyright 2001, Andrew W. Moore VC-dimension: Slide 21
VC dim of line machine
Given machine f, the VC-dimension h is
The maximum number of points that can be arranged so that f shatters them.
Example: For 2-d inputs, what's the VC-dim of f(x,w,b) = sign(w.x + b)?
Well, can f shatter these three points?
Yes, of course.
All -ve or all +ve is trivial
One +ve can be picked off by a line
One -ve can be picked off too.
Copyright 2001, Andrew W. Moore VC-dimension: Slide 22
VC dim of line machine
Given machine f, the VC-dimension h is
The maximum number of points that can be arranged so that f shatters them.
Example: For 2-d inputs, what's the VC-dim of f(x,w,b) = sign(w.x + b)?
Well, can we find four points that f can shatter?
Copyright 2001, Andrew W. Moore VC-dimension: Slide 23
VC dim of line machine
Given machine f, the VC-dimension h is
The maximum number of points that can be arranged so that f shatters them.
Example: For 2-d inputs, what's the VC-dim of f(x,w,b) = sign(w.x + b)?
Well, can we find four points that f can shatter?
Can always draw six lines between pairs of four
points.
Copyright 2001, Andrew W. Moore VC-dimension: Slide 24
VC dim of line machine
Given machine f, the VC-dimension h is
The maximum number of points that can be arranged so that f shatters them.
Example: For 2-d inputs, what's the VC-dim of f(x,w,b) = sign(w.x + b)?
Well, can we find four points that f can shatter?
Can always draw six lines between pairs of four
points.
Two of those lines will cross.
Copyright 2001, Andrew W. Moore VC-dimension: Slide 25
VC dim of line machine
Given machine f, the VC-dimension h is
The maximum number of points that can be arranged so that f shatters them.
Example: For 2-d inputs, what's the VC-dim of f(x,w,b) = sign(w.x + b)?
Well, can we find four points that f can shatter?
Can always draw six lines between pairs of four
points.
Two of those lines will cross.
If we put the points linked by the crossing lines in the same class, they can't be linearly separated.
So a line can shatter 3 points but not 4
So VC-dim of Line Machine is 3
Copyright 2001, Andrew W. Moore VC-dimension: Slide 26
VC dim of linear classifiers in m-dimensions
If input space is m-dimensional and if f is sign(w.x-b), what is
the VC-dimension?
Proof that h >= m: Show that m points can be shattered
Can you guess how?
Copyright 2001, Andrew W. Moore VC-dimension: Slide 27
VC dim of linear classifiers in m-dimensions
If input space is m-dimensional and if f is sign(w.x-b), what is
the VC-dimension?
Proof that h >= m: Show that m points can be shattered
Define m input points thus:
x1 = (1,0,0,...,0)
x2 = (0,1,0,...,0)
:
xm = (0,0,0,...,1)        So xk[j] = 1 if k=j and 0 otherwise
Let y1, y2, ..., ym be any one of the 2^m combinations of class labels.
Guess how we can define w1, w2, ..., wm and b to ensure sign(w.xk + b) = yk for all k?
Note: sign(w.xk + b) = sign( b + Σj=1..m wj xk[j] )
Copyright 2001, Andrew W. Moore VC-dimension: Slide 28
VC dim of linear classifiers in m-dimensions
If input space is m-dimensional and if f is sign(w.x-b), what is
the VC-dimension?
Proof that h >= m: Show that m points can be shattered
Define m input points thus:
x1 = (1,0,0,...,0)
x2 = (0,1,0,...,0)
:
xm = (0,0,0,...,1)        So xk[j] = 1 if k=j and 0 otherwise
Let y1, y2, ..., ym be any one of the 2^m combinations of class labels.
Guess how we can define w1, w2, ..., wm and b to ensure sign(w.xk + b) = yk for all k?
Note: sign(w.xk + b) = sign( b + Σj=1..m wj xk[j] )
Answer: b = 0 and wk = yk for all k.
Copyright 2001, Andrew W. Moore VC-dimension: Slide 29
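(A minimal sketch, not from the original slides, that verifies this construction numerically for a small m: with the m unit-vector inputs, w = (y1, ..., ym) and b = 0 reproduce every one of the 2^m labellings.)

    import itertools
    import numpy as np

    def unit_vectors_are_shattered(m):
        # x_k has a 1 in coordinate k and 0 elsewhere, so x_k is row k of the identity matrix.
        X = np.eye(m)
        for labels in itertools.product([-1.0, +1.0], repeat=m):
            w = np.array(labels)          # w_k = y_k, and b = 0
            preds = np.sign(X @ w)        # sign(w . x_k) = sign(w_k) = y_k
            if not np.array_equal(preds, np.array(labels)):
                return False
        return True

    print(unit_vectors_are_shattered(5))   # True: all 2^5 labellings are achieved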
VC dim of linear classifiers in m-dimensions
If input space is m-dimensional and if f is sign(w.x-b), what is
the VC-dimension?
Now we know that h >= m
In fact, h=m+1
Proof that h >= m+1 is easy
Proof that h < m+2 is moderate
Copyright 2001, Andrew W. Moore VC-dimension: Slide 30
What does VC-dim measure?
Is it the number of parameters?
Related but not really the same.
I can create a machine with one numeric
parameter that really encodes 7 parameters
(How?)
And I can create a machine with 7 parameters
which has a VC-dim of 1 (How?)
Andrew's private opinion: it often is the number of parameters that counts.
Copyright 2001, Andrew W. Moore VC-dimension: Slide 31
Structural Risk Minimization
Let F(f) = the set of functions representable by f.
Suppose F(f1) ⊆ F(f2) ⊆ ... ⊆ F(fn).
Then h(f1) ≤ h(f2) ≤ ... ≤ h(fn).   (Hey, can you formally prove this?)
We're trying to decide which machine to use.
We train each machine and make a table:

    TESTERR(a) ≤ TRAINERR(a) + sqrt( [ h(log(2R/h) + 1) − log(η/4) ] / R )

    i   fi   TRAINERR   VC-Conf   Probable upper bound on TESTERR   Choice
    1   f1
    2   f2
    3   f3
    4   f4
    5   f5
    6   f6
Copyright 2001, Andrew W. Moore VC-dimension: Slide 32
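(A minimal sketch of filling in this table, not from the original slides. The machines, their VC-dimensions, and their training errors below are made-up numbers purely for illustration; eta is the confidence parameter of the bound.)

    import numpy as np

    def vc_confidence(h, R, eta=0.05):
        # The second term of the bound above.
        return np.sqrt((h * (np.log(2.0 * R / h) + 1.0) - np.log(eta / 4.0)) / R)

    def srm_choose(machines, R, eta=0.05):
        # machines: list of (name, VC-dimension h, TRAINERR); pick the smallest probable bound on TESTERR.
        rows = [(name, tr, vc_confidence(h, R, eta), tr + vc_confidence(h, R, eta))
                for name, h, tr in machines]
        best = min(rows, key=lambda row: row[3])[0]
        for name, tr, conf, bound in rows:
            mark = "  <-- choice" if name == best else ""
            print(f"{name}: TRAINERR={tr:.3f}  VC-conf={conf:.3f}  bound={bound:.3f}{mark}")
        return best

    # Nested machines of increasing power (hypothetical values): the middle one wins here.
    srm_choose([("f1", 3, 0.20), ("f2", 20, 0.08), ("f3", 200, 0.02)], R=5000)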
Using VC-dimensionality
That's what VC-dimensionality is about.
People have worked hard to find VC-dimension for...
Decision Trees
Perceptrons
Neural Nets
Decision Lists
Support Vector Machines
And many many more
All with the goals of
1. Understanding which learning machines are more or
less powerful under which circumstances
2. Using Structural Risk Minimization to choose the best learning machine
Copyright 2001, Andrew W. Moore VC-dimension: Slide 33
Alternatives to VC-dim-based model selection
What could we do instead of the scheme below?
    TESTERR(a) ≤ TRAINERR(a) + sqrt( [ h(log(2R/h) + 1) − log(η/4) ] / R )

    i   fi   TRAINERR   VC-Conf   Probable upper bound on TESTERR   Choice
    1   f1
    2   f2
    3   f3
    4   f4
    5   f5
    6   f6
Copyright 2001, Andrew W. Moore VC-dimension: Slide 34
Alternatives to VC-dim-based model selection
What could we do instead of the scheme below?
1. Cross-validation
    i   fi   TRAINERR   10-FOLD-CV-ERR   Choice
    1   f1
    2   f2
    3   f3
    4   f4
    5   f5
    6   f6
Copyright 2001, Andrew W. Moore VC-dimension: Slide 35
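(A minimal sketch of 10-fold cross-validation, not from the original slides. train_fn is assumed to be any routine that fits one of the machines fi and returns a predictor mapping x to ±1; X and y are numpy arrays.)

    import numpy as np

    def k_fold_cv_err(train_fn, X, y, n_folds=10, seed=0):
        # Split the data into n_folds parts; train on all but one part, test on the held-out part,
        # and average the held-out misclassification rate over the folds.
        rng = np.random.default_rng(seed)
        folds = np.array_split(rng.permutation(len(X)), n_folds)
        errs = []
        for k in range(n_folds):
            test_idx = folds[k]
            train_idx = np.concatenate([folds[j] for j in range(n_folds) if j != k])
            predict = train_fn(X[train_idx], y[train_idx])
            preds = np.array([predict(x) for x in X[test_idx]])
            errs.append(np.mean(preds != y[test_idx]))
        return float(np.mean(errs))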
Alternatives to VC-dim-based model selection
What could we do instead of the scheme below?
1. Cross-validation
2. AIC (Akaike Information Criterion)

    AICSCORE = LL(Data | MLE params) − (# parameters)

As the amount of data goes to infinity, AIC promises* to select the model that'll have the best likelihood for future data.
*Subject to about a million caveats.

    i   fi   LOGLIKE(TRAINERR)   #parameters   AIC   Choice
    1   f1
    2   f2
    3   f3
    4   f4
    5   f5
    6   f6
Copyright 2001, Andrew W. Moore VC-dimension: Slide 36
Alternatives to VC-dim-based model selection
What could we do instead of the scheme below?
1. Cross-validation
2. AIC (Akaike Information Criterion)
3. BIC (Bayesian Information Criterion)

    BICSCORE = LL(Data | MLE params) − (#params / 2) log R

As the amount of data goes to infinity, BIC promises* to select the model that the data was generated from. More conservative than AIC.
*Another million caveats.

    i   fi   LOGLIKE(TRAINERR)   #parameters   BIC   Choice
    1   f1
    2   f2
    3   f3
    4   f4
    5   f5
    6   f6
Copyright 2001, Andrew W. Moore VC-dimension: Slide 37
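(A minimal sketch of the AIC and BIC scores as written on these two slides, not from the original deck; the log-likelihoods and parameter counts below are made-up numbers purely for illustration. Under either criterion the model with the highest score is chosen.)

    import numpy as np

    def aic_score(loglike, n_params):
        # AIC as on the slide: log-likelihood of the data at the MLE minus the number of parameters.
        return loglike - n_params

    def bic_score(loglike, n_params, R):
        # BIC as on the slide: penalize by (#params / 2) * log(R), where R = number of training points.
        return loglike - 0.5 * n_params * np.log(R)

    models = [("f1", -1210.0, 4), ("f2", -1195.0, 12), ("f3", -1192.0, 40)]   # (name, loglike, #params)
    R = 1000
    for name, ll, p in models:
        print(f"{name}: AIC={aic_score(ll, p):.1f}  BIC={bic_score(ll, p, R):.1f}")
    # Here AIC prefers f2 while the more conservative BIC prefers f1.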
Which model selection method is best?
1. (CV) Cross-validation
2. AIC (Akaike Information Criterion)
3. BIC (Bayesian Information Criterion)
4. (SRMVC) Structural Risk Minimize with VC-
dimension
AIC, BIC and SRMVC have the advantage that you only need the
training error.
CV error might have more variance
SRMVC is wildly conservative
Asymptotically AIC and Leave-one-out CV should be the same
Asymptotically BIC and a carefully chosen k-fold should be the same
BIC is what you want if you want the best structure instead of the best
predictor (e.g. for clustering or Bayes Net structure finding)
Many alternatives to the above including proper Bayesian approaches.
It's an emotional issue.
Copyright 2001, Andrew W. Moore VC-dimension: Slide 38
Extra Comments
Beware: that second VC-confidence term is
usually very very conservative (at least hundreds
of times larger than the empirical overfitting effect).
An excellent tutorial on VC-dimension and Support
Vector Machines (which we'll be studying soon):
C.J.C. Burges. A tutorial on support vector machines for pattern recognition. Data Mining and Knowledge Discovery, 2(2):121-167, 1998.
http://citeseer.nj.nec.com/burges98tutorial.html
Copyright 2001, Andrew W. Moore VC-dimension: Slide 39
What you should know
The definition of a learning machine: f(x,a)
The definition of Shattering
Be able to work through simple examples of
shattering
The definition of VC-dimension
Be able to work through simple examples of VC-
dimension
Structural Risk Minimization for model selection
Awareness of other model selection methods
Copyright 2001, Andrew W. Moore VC-dimension: Slide 40