Support Vector Machine
Classification: Definition
• Given a collection of records (the training set)
  – Each record contains a set of attributes; one of the attributes is the class label.
• Find a model for the class attribute as a function of the values of the other attributes: q : X → Y.
• Goal: previously unseen records should be assigned a class as accurately as possible.
Classification Example
[Figure: training examples plotted in the (weight, height) plane, with the two classes separated by a line]
Training examples: {(x1, y1), …, (xl, yl)}
Linear classifier:
  q(x) = H if (w · x) + b ≥ 0
         J if (w · x) + b < 0
Decision boundary: (w · x) + b = 0
Linear Classifiers
f(x, w, b) = sign(w · x + b)
[Figure: linearly separable data; one marker denotes class +1, the other denotes class −1, with several candidate separating lines drawn through the data]
Any of these would be fine...
...but which is best?
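As an illustrative sketch (not part of the original slides), such a linear classifier can be written directly in NumPy; the weight vector w, bias b, and data below are made-up values for demonstration only.

    import numpy as np

    def linear_classifier(X, w, b):
        """Predict +1 / -1 labels with f(x) = sign(w . x + b)."""
        return np.sign(X @ w + b)

    # Hypothetical parameters and data, chosen only for illustration.
    w = np.array([1.0, -2.0])
    b = 0.5
    X = np.array([[2.0, 0.5], [0.0, 1.0], [1.0, 1.0]])
    print(linear_classifier(X, w, b))   # [ 1. -1. -1.]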
Support Vector Machine
[Figure: maximum-margin separating hyperplane, with x⁺ and x⁻ lying on the two margin boundaries and M = margin width]
Support vectors are the datapoints that the margin pushes up against.
What we know:
• w · x⁺ + b = +1
• w · x⁻ + b = −1
• w · (x⁺ − x⁻) = 2
Hence the margin width is M = (x⁺ − x⁻) · w / ‖w‖ = 2 / ‖w‖.
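A small numerical check of this geometry (an illustrative sketch; the vector w and bias b are arbitrary): two points lying on the boundaries w · x + b = +1 and w · x + b = −1 along the direction of w are exactly 2/‖w‖ apart.

    import numpy as np

    w = np.array([3.0, 4.0])               # ||w|| = 5 (made-up example)
    b = -1.0

    # Points on the two margin boundaries, taken along the direction of w.
    x_plus  = (1 - b) * w / np.dot(w, w)   # satisfies w.x + b = +1
    x_minus = (-1 - b) * w / np.dot(w, w)  # satisfies w.x + b = -1

    margin = np.dot(x_plus - x_minus, w) / np.linalg.norm(w)
    print(margin, 2 / np.linalg.norm(w))   # both print 0.4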
Linear SVM Mathematically
• Goal: 1) Correctly classify all training data:
  w · x_i + b ≥ +1 if y_i = +1
  w · x_i + b ≤ −1 if y_i = −1
  i.e.  y_i (w · x_i + b) ≥ 1 for all i
  2) Maximize the margin M = 2 / ‖w‖,
  which is the same as minimizing (1/2) wᵀw.
• We can formulate a Quadratic Optimization Problem and solve for w and b:
  Minimize  Φ(w) = (1/2) wᵀw
  subject to  y_i (w · x_i + b) ≥ 1  ∀i
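As a sketch of how this quadratic program could be solved numerically (assuming the cvxpy package is available; the toy dataset below is made up and linearly separable):

    import numpy as np
    import cvxpy as cp

    # Toy linearly separable data, for illustration only.
    X = np.array([[2.0, 2.0], [3.0, 3.0], [-1.0, -1.0], [-2.0, -1.0]])
    y = np.array([1.0, 1.0, -1.0, -1.0])

    w = cp.Variable(2)
    b = cp.Variable()

    # Hard-margin primal: minimize (1/2) w^T w  s.t.  y_i (w.x_i + b) >= 1
    objective = cp.Minimize(0.5 * cp.sum_squares(w))
    constraints = [cp.multiply(y, X @ w + b) >= 1]
    cp.Problem(objective, constraints).solve()

    print("w =", w.value, "b =", b.value,
          "margin =", 2 / np.linalg.norm(w.value))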
Linear SVM. Cont.
• Requiring the derivatives with respect to w,b to vanish yields:
  maximize  Σ_{i=1}^{m} α_i − (1/2) Σ_{i=1}^{m} Σ_{j=1}^{m} α_i α_j y_i y_j ⟨x_i, x_j⟩
  subject to:  Σ_{i=1}^{m} α_i y_i = 0
               α_i ≥ 0  ∀i
• KKT conditions yield:  for any α_i > 0,  b = y_i − ⟨w, x_i⟩
• Where:  w = Σ_{i=1}^{m} α_i y_i x_i
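A minimal numerical sketch of this dual (again assuming cvxpy and the same kind of toy data); the quadratic term α_i α_j y_i y_j ⟨x_i, x_j⟩ is written as ‖Σ_i α_i y_i x_i‖² so that cvxpy accepts it directly.

    import numpy as np
    import cvxpy as cp

    # Toy linearly separable data, for illustration only.
    X = np.array([[2.0, 2.0], [3.0, 3.0], [-1.0, -1.0], [-2.0, -1.0]])
    y = np.array([1.0, 1.0, -1.0, -1.0])
    m = len(y)

    alpha = cp.Variable(m)
    Xy = X * y[:, None]                    # row i is y_i * x_i

    # Dual: maximize sum_i alpha_i - (1/2) || sum_i alpha_i y_i x_i ||^2
    objective = cp.Maximize(cp.sum(alpha) - 0.5 * cp.sum_squares(Xy.T @ alpha))
    constraints = [alpha >= 0, y @ alpha == 0]
    cp.Problem(objective, constraints).solve()

    a = alpha.value
    w = Xy.T @ a                           # w = sum_i alpha_i y_i x_i
    i = int(np.argmax(a))                  # any index with alpha_i > 0
    b = y[i] - X[i] @ w                    # b = y_i - <w, x_i>
    print("alpha =", a, "w =", w, "b =", b)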
Linear SVM. Cont.
• The resulting separating function is:
  f(x) = Σ_{i=1}^{m} α_i y_i ⟨x_i, x⟩ + b
• If f(x) > 0, x is assigned the class label +1; otherwise −1.
Linear SVM. Cont.
• The resulting separating function is:
  f(x) = Σ_{i=1}^{m} α_i y_i ⟨x_i, x⟩ + b
• Notes:
  – The points with α_i = 0 do not affect the solution.
  – The points with α_i ≠ 0 are called support vectors.
  – The constraints y_i (w · x_i + b) ≥ 1 hold with equality only for the support vectors.
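Continuing the toy dual sketch above, the separating function can be evaluated using only the points with non-negligible α_i, i.e. the support vectors (the tolerance 1e-6 is an arbitrary numerical cutoff):

    import numpy as np

    def svm_decision(x_new, X, y, a, b, tol=1e-6):
        """f(x) = sum_i alpha_i y_i <x_i, x> + b, summing only over support vectors."""
        sv = a > tol                       # points with alpha_i != 0
        return float(np.sum(a[sv] * y[sv] * (X[sv] @ x_new)) + b)

    # Label assignment: +1 if f(x) > 0, else -1, e.g.
    # print(np.sign(svm_decision(np.array([1.5, 2.0]), X, y, a, b)))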
Non-linear SVMs: Feature spaces
• General idea: the original feature space can always be
mapped to some higher-dimensional feature space
where the training set is separable:
Φ: x → φ(x)
Non-linear SVM
• Note that the training data appears in the solution only through inner products.
• If we pre-map the data into a higher-dimensional (and sparser) space, we can get more separability and a richer family of separating functions.
• Explicitly computing such a pre-mapping, however, might make the problem computationally infeasible.
• We want to avoid the explicit pre-mapping and still have the same separation ability.
• If we have a simple function that operates on two training points and computes the inner product of their pre-mappings, then we achieve this better separation with no added cost.
The “Kernel Trick”
• The linear classifier relies on inner products between vectors: K(x_i, x_j) = x_iᵀx_j.
• If every datapoint is mapped into a high-dimensional space via some transformation Φ: x → φ(x), the inner product becomes:
  K(x_i, x_j) = φ(x_i)ᵀφ(x_j)
• A kernel function is a function that is equivalent to an inner product in some feature space.
• Example:
  2-dimensional vectors x = [x1 x2]; let K(x_i, x_j) = (1 + x_iᵀx_j)².
  Need to show that K(x_i, x_j) = φ(x_i)ᵀφ(x_j):
  K(x_i, x_j) = (1 + x_iᵀx_j)² = 1 + x_i1²x_j1² + 2 x_i1x_j1 x_i2x_j2 + x_i2²x_j2² + 2 x_i1x_j1 + 2 x_i2x_j2
  = [1  x_i1²  √2 x_i1x_i2  x_i2²  √2 x_i1  √2 x_i2]ᵀ [1  x_j1²  √2 x_j1x_j2  x_j2²  √2 x_j1  √2 x_j2]
  = φ(x_i)ᵀφ(x_j),  where φ(x) = [1  x1²  √2 x1x2  x2²  √2 x1  √2 x2]
• Thus, a kernel function implicitly maps data to a high-dimensional space (without the need to compute each φ(x) explicitly).
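A quick numerical check of this identity (an illustrative sketch, not part of the slides; the two test vectors are arbitrary):

    import numpy as np

    def phi(x):
        """Explicit feature map matching the kernel (1 + x.y)^2 in two dimensions."""
        x1, x2 = x
        return np.array([1, x1**2, np.sqrt(2) * x1 * x2, x2**2,
                         np.sqrt(2) * x1, np.sqrt(2) * x2])

    def K(x, y):
        return (1 + x @ y) ** 2

    xi = np.array([0.7, -1.2])
    xj = np.array([2.0, 0.5])
    print(K(xi, xj), phi(xi) @ phi(xj))    # the two values agree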
Mercer Kernels
• A Mercer kernel is a function k : X^d × X^d → R for which there exists a function φ : X^d → H such that:
  ∀ x, y ∈ X^d:  k(x, y) = ⟨φ(x), φ(y)⟩
• A function k(·,·) is a Mercer kernel if, for any function g(·) such that ∫ g(x)² dx < ∞, the following holds:
  ∫∫ g(x) g(y) k(x, y) dx dy ≥ 0
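A practical, finite-sample analogue of the Mercer condition is that the Gram matrix G_ij = k(x_i, x_j) on any finite set of points is positive semidefinite. A small illustrative check (the RBF kernel and random points below are only for demonstration):

    import numpy as np

    def rbf(x, y, gamma=0.5):
        return np.exp(-gamma * np.sum((x - y) ** 2))

    rng = np.random.default_rng(0)
    X = rng.normal(size=(20, 3))           # random sample points

    G = np.array([[rbf(xi, xj) for xj in X] for xi in X])
    print(np.linalg.eigvalsh(G).min() >= -1e-10)   # eigenvalues are (numerically) non-negative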
Commonly used Mercer Kernels
• Homogeneous Polynomial Kernels:  k(x, y) = ⟨x, y⟩^p
• Non-homogeneous Polynomial Kernels:  k(x, y) = (⟨x, y⟩ + 1)^p
• Radial Basis Function (RBF) Kernels:  k(x, y) = exp(−γ ‖x − y‖²)
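Illustrative implementations of these three kernels (the parameters p and gamma correspond to the symbols above; the default values are arbitrary):

    import numpy as np

    def poly_homogeneous(x, y, p=2):
        """k(x, y) = <x, y>^p"""
        return (x @ y) ** p

    def poly_nonhomogeneous(x, y, p=2):
        """k(x, y) = (<x, y> + 1)^p"""
        return (x @ y + 1) ** p

    def rbf(x, y, gamma=0.5):
        """k(x, y) = exp(-gamma * ||x - y||^2)"""
        return np.exp(-gamma * np.sum((x - y) ** 2))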
Solution of non-linear SVM
• The problem:
  maximize  Σ_{i=1}^{m} α_i − (1/2) Σ_{i=1}^{m} Σ_{j=1}^{m} α_i α_j y_i y_j k(x_i, x_j)
  subject to:  Σ_{i=1}^{m} α_i y_i = 0
               0 ≤ α_i ≤ C  ∀i
• The separating function:
  sgn(f(x)) = sgn( Σ_{i=1}^{m} α_i y_i k(x_i, x) + b )
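For completeness, a minimal sketch of training a non-linear (RBF-kernel) soft-margin SVM with scikit-learn, assuming that library is available; the dataset and parameter values are made up. Here C is the box constraint 0 ≤ α_i ≤ C and gamma is the RBF parameter.

    import numpy as np
    from sklearn.svm import SVC

    # Toy data that is not linearly separable: +1 inside the unit circle, -1 outside.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 2))
    y = np.where(np.linalg.norm(X, axis=1) < 1.0, 1, -1)

    clf = SVC(kernel="rbf", C=1.0, gamma=0.5)
    clf.fit(X, y)

    print("number of support vectors:", len(clf.support_))
    print("predictions:", clf.predict([[0.0, 0.0], [2.0, 2.0]]))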