How Support Vector Machines work – an example.
December 18, 2016
Support Vector Machines are a common method for binary classification and regression. There are many resources online that attempt to explain how SVMs work, but few include an example with actual numbers. In this post, I work through a concrete example with actual numbers that you can compare your own calculations against to make sure you understand SVMs correctly.
Data setup
Optimal separating hyperplane
Solving the quadratic programming problem
Calculating $w$ and $b$
In a previous post, I went over the theory of SVMs in great detail. Even though this post works standalone as well, I recommend checking that one out first.
Data setup
We will try to separate the points $x_1 = (2, 1)$ and $x_2 = (1, 2)$ by hand. To that end, we assign the labels $y_1 = 1$ and $y_2 = -1$, as seen in the picture below.
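For reference, here is the same data written down in code (a minimal NumPy sketch; the variable names X and y are my own):

```python
import numpy as np

# The two training points from the example and their labels
X = np.array([[2.0, 1.0],    # x1, labelled +1
              [1.0, 2.0]])   # x2, labelled -1
y = np.array([1.0, -1.0])
```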
Optimal separating hyperplane
Our goal is to find a hyperplane that separates these points in the best possible way. This means we look for the hyperplane that leaves the largest margin between the closest points and the hyperplane, i.e. the optimal separating hyperplane.
We saw in the previous post that any hyperplane can be represented by

$$w \cdot x + b = 0$$

where $w$ is normal to the hyperplane and $\frac{|b|}{\lVert w \rVert}$ is the perpendicular distance from the hyperplane to the origin.

Support Vectors are the points (vectors) closest to the separating hyperplane. In our case, we only have two vectors, which will therefore both be Support Vectors. In the above representation of hyperplanes, both $w$ and $b$ are unknowns. How do we retrieve them?
In the previous post, we deduced the so-called dual form. The dual form contains a quadratic objective function with linear constraints. It reads:
$$\max_{\alpha} \; L_D = \sum_{i=1}^{L} \alpha_i - \frac{1}{2} \sum_{i,j} \alpha_i \alpha_j y_i y_j \langle x_i, x_j \rangle$$

$$\text{subject to} \quad \sum_{i=1}^{L} \alpha_i y_i = 0, \qquad \alpha_i \geq 0 \;\; \forall i$$
Solving the quadratic programming problem in order to retrieve $\alpha_1$ and $\alpha_2$
This boils down to solving a quadratic programming (QP) problem.
Note: Quadratic Programming
Quadratic programming (QP) is a special kind of mathematical optimization problem. It is about finding a vector $x$ that minimizes or maximizes an objective function subject to bound, linear equality, and linear inequality constraints:

$$\min_x \; \left( \frac{1}{2} x^T H x + f^T x \right) \quad \text{(objective function)}$$
$$A x \leq b \quad \text{(inequality constraint)}$$
$$A_{eq} \, x = b_{eq} \quad \text{(equality constraint)}$$
$$lb \leq x \leq ub \quad \text{(bound constraint)}$$

Solving quadratic programming problems is not trivial, but there are many software packages containing solvers.
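For instance, CVXOPT is one such package; its qp routine expects exactly the form above (minimize $\frac{1}{2}x^T P x + q^T x$ subject to $Gx \leq h$ and $Ax = b$). Here is a minimal sketch on a toy problem of my own, not yet our SVM problem:

```python
from cvxopt import matrix, solvers

# Toy QP: minimize 0.5*(x1^2 + x2^2) - x1 - x2
# subject to x1 + x2 = 1 and x1, x2 >= 0
P = matrix([[1.0, 0.0], [0.0, 1.0]])    # H
q = matrix([-1.0, -1.0])                # f
G = matrix([[-1.0, 0.0], [0.0, -1.0]])  # -x <= 0 encodes x >= 0
h = matrix([0.0, 0.0])
A = matrix([[1.0], [1.0]])              # equality constraint x1 + x2 = 1
b = matrix([1.0])

sol = solvers.qp(P, q, G, h, A, b)
print(sol['x'])   # -> approximately [0.5, 0.5]
```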
Let's insert our numbers into $L_D$:

$$\max_{\alpha} \; L_D = \sum_{i=1}^{L} \alpha_i - \frac{1}{2} \sum_{i,j} \alpha_i \alpha_j y_i y_j \langle x_i, x_j \rangle$$

$$= \alpha_1 + \alpha_2 - \frac{1}{2} \left( \alpha_1 \alpha_1 \cdot 1 \cdot 1 \cdot \left\langle \begin{pmatrix}2\\1\end{pmatrix}, \begin{pmatrix}2\\1\end{pmatrix} \right\rangle + 2 \, \alpha_1 \alpha_2 \cdot 1 \cdot (-1) \cdot \left\langle \begin{pmatrix}2\\1\end{pmatrix}, \begin{pmatrix}1\\2\end{pmatrix} \right\rangle + \alpha_2 \alpha_2 \cdot (-1) \cdot (-1) \cdot \left\langle \begin{pmatrix}1\\2\end{pmatrix}, \begin{pmatrix}1\\2\end{pmatrix} \right\rangle \right)$$

$$= \alpha_1 + \alpha_2 - \frac{1}{2} \left( 5\alpha_1^2 - 8 \alpha_1 \alpha_2 + 5 \alpha_2^2 \right)$$

$$\text{subject to} \quad \alpha_1 y_1 + \alpha_2 y_2 = 0, \qquad \alpha_1 \geq 0, \; \alpha_2 \geq 0$$
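As a sanity check on the hand-computed coefficients, here is a small NumPy sketch (my own) that builds the matrix with entries $y_i y_j \langle x_i, x_j \rangle$; its diagonal gives the $5\alpha_1^2$ and $5\alpha_2^2$ terms, and twice the off-diagonal entry gives $-8\alpha_1\alpha_2$:

```python
import numpy as np

X = np.array([[2.0, 1.0], [1.0, 2.0]])
y = np.array([1.0, -1.0])

# Entry (i, j) is y_i * y_j * <x_i, x_j>
H = np.outer(y, y) * (X @ X.T)
print(H)
# [[ 5. -4.]
#  [-4.  5.]]
```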
In order to solve this quadratic programming problem, we make use of a solver that I made using Wolfram Alpha: it maximizes an objective function subject to arbitrary constraints. In the widget, our optimization problem is already entered (note that $\alpha_1 = x$ and $\alpha_2 = y$):
Objective function: x + y - 2.5x^2 + 4xy - 2.5y^2
subject to: x - y = 0, x >= 0, y >= 0
The Lagrange multipliers solving our quadratic programming problem are therefore $\alpha_1 = 1$ and $\alpha_2 = 1$ (the solution is listed under "Global maximum" in the widget's output).
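If you prefer to reproduce this result offline rather than with the widget, a small sketch (my own) using SciPy's general-purpose SLSQP solver gives the same multipliers:

```python
import numpy as np
from scipy.optimize import minimize

# Negate L_D because scipy minimizes; a = (alpha_1, alpha_2)
def neg_LD(a):
    return -(a[0] + a[1] - 2.5 * a[0]**2 + 4 * a[0] * a[1] - 2.5 * a[1]**2)

constraints = ({'type': 'eq', 'fun': lambda a: a[0] - a[1]},)  # alpha_1*y1 + alpha_2*y2 = 0
bounds = [(0, None), (0, None)]                                # alpha_i >= 0

res = minimize(neg_LD, x0=[0.5, 0.5], method='SLSQP',
               bounds=bounds, constraints=constraints)
print(res.x)   # -> approximately [1., 1.]
```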
Calculating $w$ and $b$
The bulk of the work is done now! What's left is to calculate $w$ and $b$. Here's how to do it:
$$w = \sum_{i=1}^{L} \alpha_i y_i x_i = 1 \cdot 1 \cdot \begin{pmatrix}2\\1\end{pmatrix} + 1 \cdot (-1) \cdot \begin{pmatrix}1\\2\end{pmatrix} = \begin{pmatrix}2\\1\end{pmatrix} - \begin{pmatrix}1\\2\end{pmatrix} = \begin{pmatrix}1\\-1\end{pmatrix}$$

$$b = y_s - \sum_{m \in S} \alpha_m y_m \langle x_m, x_s \rangle = 1 - \left( 1 \cdot 1 \cdot \left\langle \begin{pmatrix}2\\1\end{pmatrix}, \begin{pmatrix}2\\1\end{pmatrix} \right\rangle + 1 \cdot (-1) \cdot \left\langle \begin{pmatrix}2\\1\end{pmatrix}, \begin{pmatrix}1\\2\end{pmatrix} \right\rangle \right) = 0$$
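The same two formulas, written as a quick NumPy check (my own sketch; for $b$ I average over both support vectors, which here makes no difference):

```python
import numpy as np

X = np.array([[2.0, 1.0], [1.0, 2.0]])
y = np.array([1.0, -1.0])
alpha = np.array([1.0, 1.0])   # Lagrange multipliers found above

# w = sum_i alpha_i * y_i * x_i
w = (alpha * y) @ X

# b = y_s - sum_m alpha_m * y_m * <x_m, x_s>, averaged over support vectors s
b = np.mean([y[s] - np.sum(alpha * y * (X @ X[s])) for s in range(len(X))])

print(w, b)   # -> [ 1. -1.] 0.0
```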
Our hyperplane $w \cdot x + b = 0$, therefore, is:

$$\begin{pmatrix}1\\-1\end{pmatrix} \cdot x = 0$$

And that's it! By the way, you can rewrite that by evaluating the left-hand side (where $x_1$ and $x_2$ now denote the components of $x$):

$$x_1 - x_2 = 0 \;\Leftrightarrow\; x_1 = x_2$$
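As a final cross-check (not part of the derivation above), scikit-learn's SVC with a linear kernel and a very large C, which approximates the hard-margin SVM, recovers the same hyperplane:

```python
import numpy as np
from sklearn.svm import SVC

X = np.array([[2.0, 1.0], [1.0, 2.0]])
y = np.array([1, -1])

clf = SVC(kernel='linear', C=1e6)   # large C ~ hard margin
clf.fit(X, y)

print(clf.coef_, clf.intercept_)    # -> approximately [[ 1. -1.]] [0.]
```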