Support Vector Machine & Its Applications
Mingyue Tan
The University of British Columbia
Nov 26, 2004
A portion (1/3) of the slides are taken from Prof. Andrew Moore's SVM tutorial at [Link]
Overview
Intro. to Support Vector Machines (SVM)
Properties of SVM
Applications
Gene Expression Data Classification
Text Categorization (if time permits)
Discussion
Linear Classifiers
x → f(x, w, b) → y_est
f(x, w, b) = sign(w·x + b)
(markers in the figure denote the +1 and -1 classes)
The line w·x + b = 0 separates the region where w·x + b > 0 from the region where w·x + b < 0.
How would you classify this data?
Linear Classifiers
f(x, w, b) = sign(w·x + b)
How would you classify this data?
[Two further slides repeat the question with other candidate separating lines for the same data.]
Linear Classifiers
f(x, w, b) = sign(w·x + b)
Any of these would be fine...
...but which is best?
Linear Classifiers
f(x, w, b) = sign(w·x + b)
How would you classify this data?
[Figure: this choice misclassifies a point into the +1 class.]
Classifier Margin
f(x, w, b) = sign(w·x + b)
Define the margin of a linear classifier as the width that the boundary could be increased by before hitting a datapoint.
Maximum Margin
f(x, w, b) = sign(w·x + b)
The maximum margin linear classifier is the linear classifier with the, um, maximum margin.
This is the simplest kind of SVM (called an LSVM): the Linear SVM.
Support Vectors are those datapoints that the margin pushes up against.
1. Maximizing the margin is good according to intuition and PAC theory.
2. It implies that only support vectors are important; other training examples are ignorable.
3. Empirically it works very, very well.
Linear SVM Mathematically
[Figure: the "plus" plane w·x + b = +1 bounds the "Predict Class = +1" zone, the "minus" plane w·x + b = -1 bounds the "Predict Class = -1" zone, and the separating plane is w·x + b = 0; M = margin width.]
What we know:
  w·x⁺ + b = +1
  w·x⁻ + b = -1
  w·(x⁺ - x⁻) = 2
Therefore
  M = (x⁺ - x⁻)·w / ||w|| = 2 / ||w||
Linear SVM Mathematically
Goal:
1) Correctly classify all training data:
   w·xi + b ≥ +1  if yi = +1
   w·xi + b ≤ -1  if yi = -1
   i.e.  yi(w·xi + b) ≥ 1  for all i
2) Maximize the margin M = 2/||w||, which is the same as minimizing ½ wTw.
We can formulate a Quadratic Optimization Problem and solve for w and b:
   Minimize Φ(w) = ½ wTw
   subject to  yi(w·xi + b) ≥ 1  for all i
Solving the Optimization
Problem
Find w and b such that
Φ(w) =½ wTw is minimized;
and for all {(xi ,yi)}: yi (wTxi + b) ≥ 1
Need to optimize a quadratic function subject to linear
constraints.
Quadratic optimization problems are a well-known class of
mathematical programming problems, and many (rather
intricate) algorithms exist for solving them.
The solution involves constructing a dual problem where a
Lagrange multiplier αi is associated with every constraint in the
primal problem:
Find α1…αN such that
Q(α) =Σαi - ½ΣΣαiαjyiyjxiTxj is maximized and
(1) Σαiyi = 0
(2) αi ≥ 0 for all αi
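The dual above is a standard quadratic program, so any QP solver can handle it. Below is a minimal sketch for the hard-margin dual using the third-party cvxopt library (an assumption; the slides do not name a solver). The function name solve_dual and the toy inputs are illustrative; cvxopt's qp() minimizes, so the sign of Q(α) is flipped.

import numpy as np
from cvxopt import matrix, solvers   # assumed third-party QP solver

def solve_dual(X, y):
    # X: (n, d) array of training points, y: (n,) array of labels in {-1, +1}
    n = X.shape[0]
    K = X @ X.T                                   # Gram matrix of dot products xi^T xj
    P = matrix(np.outer(y, y) * K)                # P_ij = yi yj xi^T xj
    q = matrix(-np.ones(n))                       # maximizing sum(alpha) == minimizing -sum(alpha)
    G = matrix(-np.eye(n))                        # -alpha_i <= 0, i.e. alpha_i >= 0
    h = matrix(np.zeros(n))
    A = matrix(y.reshape(1, -1).astype(float))    # equality constraint: sum_i alpha_i yi = 0
    b = matrix(0.0)
    sol = solvers.qp(P, q, G, h, A, b)
    return np.array(sol['x']).ravel()             # the alpha_i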
The Optimization Problem Solution
The solution has the form:
w =Σαiyixi    b = yk - wTxk  for any xk such that αk ≠ 0
Each non-zero αi indicates that the corresponding xi is a
support vector.
Then the classifying function will have the form:
f(x) = ΣαiyixiTx + b
Notice that it relies on an inner product between the test
point x and the support vectors xi – we will return to this
later.
Also keep in mind that solving the optimization problem
involved computing the inner products xiTxj between all
pairs of training points.
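Continuing the sketch above, a non-zero αi marks a support vector, and w and b follow directly from the formulas on this slide (the names are illustrative and assume the same X, y, and alphas as before):

import numpy as np

def build_classifier(X, y, alphas, tol=1e-6):
    sv = alphas > tol                                 # non-zero alphas mark the support vectors
    w = ((alphas * y)[:, None] * X).sum(axis=0)       # w = sum_i alpha_i yi xi
    k = int(np.argmax(sv))                            # index of any support vector (alpha_k != 0)
    b = y[k] - w @ X[k]                               # b = yk - w^T xk
    return lambda x: np.sign(w @ x + b)               # f(x) = sign(w^T x + b)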
Dataset with noise
(markers denote the +1 and -1 classes)
Hard Margin: so far we require all data points to be classified correctly
- no training error
What if the training set is noisy?
- Solution 1: use very powerful kernels → OVERFITTING!
Soft Margin Classification
Slack variables ξi can be added to allow misclassification of difficult or noisy examples.
What should our quadratic optimization criterion be?
  Minimize  ½ w·w + C Σ(k=1..R) εk
[Figure: the planes w·x + b = +1, 0, -1, with slack variables ε2, ε7, ε11 marking points on the wrong side of their margin.]
Hard Margin vs. Soft Margin
The old formulation:
Find w and b such that
Φ(w) =½ wTw is minimized and for all {(xi ,yi)}
yi (wTxi + b) ≥ 1
The new formulation incorporating slack variables:
Find w and b such that
Φ(w) =½ wTw + CΣξi is minimized and for all {(xi ,yi)}
yi (wTxi + b) ≥ 1- ξi and ξi ≥ 0 for all i
Parameter C can be viewed as a way to control
overfitting.
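As an illustration of the role of C, here is a short sketch with scikit-learn's SVC on synthetic data (the library and dataset are assumptions, not part of the slides): small C tolerates more slack, large C approaches the hard margin.

from sklearn.svm import SVC
from sklearn.datasets import make_blobs

X, y = make_blobs(n_samples=100, centers=2, cluster_std=2.0, random_state=0)
for C in (0.01, 1.0, 100.0):
    clf = SVC(kernel='linear', C=C).fit(X, y)
    # typically fewer support vectors and higher training accuracy as C grows
    print(C, clf.n_support_.sum(), clf.score(X, y))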
Linear SVMs: Overview
The classifier is a separating hyperplane.
Most “important” training points are support vectors; they
define the hyperplane.
Quadratic optimization algorithms can identify which training
points xi are support vectors with non-zero Lagrangian
multipliers αi.
Both in the dual formulation of the problem and in the solution,
training points appear only inside dot products:
Find α1…αN such that
Q(α) =Σαi - ½ΣΣαiαjyiyjxiTxj is maximized and
(1) Σαiyi = 0
(2) 0 ≤ αi ≤ C for all αi
f(x) = ΣαiyixiTx + b
Non-linear SVMs
Datasets that are linearly separable with some noise work out great.
But what are we going to do if the dataset is just too hard?
How about… mapping the data to a higher-dimensional space, e.g. x → (x, x²)?
[Figures: 1-D data on the x axis that is not separable becomes separable once plotted against x².]
Non-linear SVMs: Feature spaces
General idea: the original input space can always be
mapped to some higher-dimensional feature space
where the training set is separable:
Φ: x → φ(x)
The “Kernel Trick”
The linear classifier relies on dot product between vectors K(xi,xj)=xiTxj
If every data point is mapped into high-dimensional space via some
transformation Φ: x → φ(x), the dot product becomes:
K(xi,xj)= φ(xi) Tφ(xj)
A kernel function is some function that corresponds to an inner product in
some expanded feature space.
Example:
2-dimensional vectors x = [x1 x2]; let K(xi,xj) = (1 + xiTxj)²
Need to show that K(xi,xj) = φ(xi)Tφ(xj):
K(xi,xj) = (1 + xiTxj)²
 = 1 + xi1²xj1² + 2 xi1xj1xi2xj2 + xi2²xj2² + 2xi1xj1 + 2xi2xj2
 = [1  xi1²  √2 xi1xi2  xi2²  √2 xi1  √2 xi2]T [1  xj1²  √2 xj1xj2  xj2²  √2 xj1  √2 xj2]
 = φ(xi)Tφ(xj),  where φ(x) = [1  x1²  √2 x1x2  x2²  √2 x1  √2 x2]
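A quick numerical check of the example above (a NumPy sketch; the values of xi and xj are arbitrary): the kernel (1 + xiTxj)² equals the dot product of the explicit 6-dimensional feature maps φ(x).

import numpy as np

def phi(x):
    # explicit feature map from the slide: [1, x1^2, sqrt(2) x1 x2, x2^2, sqrt(2) x1, sqrt(2) x2]
    x1, x2 = x
    return np.array([1, x1**2, np.sqrt(2)*x1*x2, x2**2, np.sqrt(2)*x1, np.sqrt(2)*x2])

xi, xj = np.array([1.0, 2.0]), np.array([3.0, -1.0])
assert np.isclose((1 + xi @ xj)**2, phi(xi) @ phi(xj))   # kernel == inner product in feature space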
What Functions are Kernels?
For some functions K(xi,xj) checking that
K(xi,xj)= φ(xi) Tφ(xj) can be cumbersome.
Mercer’s theorem:
Every positive semi-definite symmetric function is a kernel
Positive semi-definite symmetric functions correspond to a
positive semi-definite symmetric Gram matrix:
K(x1,x1) K(x1,x2) K(x1,x3) … K(x1,xN)
K= K(x2,x1) K(x2,x2) K(x2,x3) K(x2,xN)
… … … … …
K(xN,x1) K(xN,x2) K(xN,x3) … K(xN,xN)
Examples of Kernel Functions
Linear: K(xi,xj)= xi Txj
Polynomial of power p: K(xi,xj)= (1+ xi Txj)p
Gaussian (radial-basis function network):
  K(xi, xj) = exp(−||xi − xj||² / (2σ²))
Sigmoid: K(xi,xj)= tanh(β0xi Txj + β1)
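For concreteness, the kernels listed above can be written directly in NumPy; the parameter names (p, sigma, beta0, beta1) mirror the slide, and the default values are illustrative assumptions.

import numpy as np

def linear_kernel(xi, xj):
    return xi @ xj

def poly_kernel(xi, xj, p=2):
    return (1 + xi @ xj) ** p

def rbf_kernel(xi, xj, sigma=1.0):
    # Gaussian / radial-basis function kernel
    return np.exp(-np.linalg.norm(xi - xj) ** 2 / (2 * sigma ** 2))

def sigmoid_kernel(xi, xj, beta0=1.0, beta1=0.0):
    return np.tanh(beta0 * (xi @ xj) + beta1)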
Non-linear SVMs Mathematically
Dual problem formulation:
Find α1…αN such that
Q(α) =Σαi - ½ΣΣαiαjyiyjK(xi, xj) is maximized and
(1) Σαiyi = 0
(2) αi ≥ 0 for all αi
The solution is:
f(x) = ΣαiyiK(xi, x) + b
Optimization techniques for finding αi’s remain the same!
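Putting the pieces together, the kernelized decision function f(x) = Σαi yi K(xi, x) + b only needs the support vectors. A minimal sketch, assuming alphas and b come from a dual solver like the one sketched earlier and kernel is any of the functions from the previous slide:

import numpy as np

def decision(x, support_X, support_y, support_alphas, b, kernel):
    # f(x) = sign( sum_i alpha_i yi K(xi, x) + b ) over the support vectors only
    s = sum(a * yi * kernel(xi, x)
            for a, yi, xi in zip(support_alphas, support_y, support_X))
    return np.sign(s + b)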
Nonlinear SVM - Overview
SVM locates a separating hyperplane in the
feature space and classifies points in that space.
It does not need to represent the feature space
explicitly; it simply defines a kernel function.
The kernel function plays the role of the dot
product in the feature space.
Properties of SVM
Flexibility in choosing a similarity function
Sparseness of solution when dealing with large data
sets
- only support vectors are used to specify the separating
hyperplane
Ability to handle large feature spaces
- complexity does not depend on the dimensionality of the
feature space
Overfitting can be controlled by soft margin
approach
Nice math property: a simple convex optimization problem
which is guaranteed to converge to a single global solution
Feature Selection
SVM Applications
SVM has been used successfully in many
real-world problems
- text (and hypertext) categorization
- image classification
- bioinformatics (Protein classification,
Cancer classification)
- hand-written character recognition
Application 1: Cancer Classification
High dimensional
- p > 1000 genes; n < 100 patients
[Table: expression values for patients p-1 … p-n over genes g-1 … g-p]
Imbalanced
- fewer positive samples
- one remedy is to adjust the kernel diagonal by a class-frequency term, K[x, x] = k(x, x) + n/N
FEATURE SELECTION
- many irrelevant, noisy features
- in the linear case, wi² gives the ranking of dimension i
SVM is sensitive to noisy (mis-labeled) data.
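A hedged sketch of the wi² ranking mentioned above: fit a linear SVM and order features (genes) by squared weight. scikit-learn's LinearSVC and the function name rank_features are assumptions used only for illustration; binary labels are assumed.

import numpy as np
from sklearn.svm import LinearSVC

def rank_features(X, y, top_k=20):
    # X: (n, p) expression matrix, y: binary labels;
    # returns indices of the top_k features with the largest wi^2 under a linear SVM
    w = LinearSVC(C=1.0, max_iter=10000).fit(X, y).coef_.ravel()
    return np.argsort(w ** 2)[::-1][:top_k]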
Weakness of SVM
It is sensitive to noise
- A relatively small number of mislabeled examples can
dramatically decrease the performance
It only considers two classes
- how to do multi-class classification with SVM?
- Answer:
1) with output arity m, learn m SVMs
SVM 1 learns “Output==1” vs “Output != 1”
SVM 2 learns “Output==2” vs “Output != 2”
:
SVM m learns “Output==m” vs “Output != m”
2) To predict the output for a new input, just predict with each
SVM and find out which one puts the prediction the furthest
into the positive region (see the sketch below).
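A minimal sketch of the one-vs-rest scheme just described: train m binary SVMs ("class k" vs "not class k") and predict with the one whose decision value lies furthest in the positive region. The scikit-learn usage and function names are illustrative assumptions.

import numpy as np
from sklearn.svm import SVC

def one_vs_rest_fit(X, y, classes):
    # one binary SVM per class: "Output == k" vs "Output != k"
    return {k: SVC(kernel='linear').fit(X, (y == k).astype(int)) for k in classes}

def one_vs_rest_predict(models, x):
    # pick the class whose SVM pushes the prediction furthest into the positive region
    scores = {k: m.decision_function(x.reshape(1, -1))[0] for k, m in models.items()}
    return max(scores, key=scores.get)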
Application 2: Text
Categorization
Task: The classification of natural text (or
hypertext) documents into a fixed number of
predefined categories based on their content.
- email filtering, web searching, sorting documents by
topic, etc..
A document can be assigned to more than
one category, so this can be viewed as a
series of binary classification problems, one
for each category
Representation of Text
IR’s vector space model (aka bag-of-words representation)
A doc is represented by a vector indexed by a pre-fixed
set or dictionary of terms
Values of an entry can be binary or weights
Normalization, stop words, word stems
Doc x => φ(x)
Text Categorization using
SVM
The similarity between two documents is the inner product φ(x)·φ(z).
K(x,z) = 〈φ(x), φ(z)〉 is a valid kernel, so SVM can be
used with K(x,z) for discrimination.
Why SVM?
-High dimensional input space
-Few irrelevant features (dense concept)
-Sparse document vectors (sparse instances)
-Text categorization problems are linearly separable
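A minimal bag-of-words + linear SVM pipeline along the lines described above, treating a category assignment as a binary problem; scikit-learn and the 20-newsgroups data are assumptions used only for illustration.

from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC
from sklearn.pipeline import make_pipeline

# two categories -> one binary classification problem
train = fetch_20newsgroups(subset='train', categories=['sci.med', 'sci.space'])
clf = make_pipeline(TfidfVectorizer(stop_words='english'),   # sparse document vectors
                    LinearSVC())                             # linear SVM on the high-dimensional input
clf.fit(train.data, train.target)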
Some Issues
Choice of kernel
- Gaussian or polynomial kernel is default
- if ineffective, more elaborate kernels are needed
- domain experts can give assistance in formulating appropriate
similarity measures
Choice of kernel parameters
- e.g. σ in Gaussian kernel
- σ is the distance between closest points with different
classifications
- In the absence of reliable criteria, applications rely on the use
of a validation set or cross-validation to set such parameters.
Optimization criterion – hard margin vs. soft margin
- typically settled by a lengthy series of experiments in which various
parameters are tested
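In practice the kernel parameter and C are chosen together via a validation set or cross-validation, as noted above. A hedged scikit-learn sketch follows; the grid values are arbitrary, X_train and y_train are hypothetical names for your labelled data, and gamma = 1/(2σ²) plays the role of σ for the Gaussian kernel.

from sklearn.svm import SVC
from sklearn.model_selection import GridSearchCV

param_grid = {'C': [0.1, 1, 10, 100], 'gamma': [0.001, 0.01, 0.1, 1]}
search = GridSearchCV(SVC(kernel='rbf'), param_grid, cv=5)   # 5-fold cross-validation
# search.fit(X_train, y_train)                 # X_train, y_train: your labelled data
# print(search.best_params_, search.best_score_)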
Additional Resources
An excellent tutorial on VC-dimension and Support
Vector Machines:
C. J. C. Burges. A tutorial on support vector machines for pattern
recognition. Data Mining and Knowledge Discovery, 2(2):121-167, 1998.
The VC/SRM/SVM Bible:
Statistical Learning Theory by Vladimir Vapnik, Wiley-
Interscience; 1998
[Link]
Reference
Support Vector Machine Classification of Microarray Gene Expression Data.
Michael P. S. Brown, William Noble Grundy, David Lin, Nello Cristianini,
Charles Sugnet, Manuel Ares Jr., and David Haussler.
[Link]/users/mooney/cs391L/[Link]
Text Categorization with Support Vector Machines: Learning with Many
Relevant Features. T. Joachims, ECML-98.