
Machine Learning:

Support Vector Machine


Part – 2
Dr. Oybek Eraliev,
Department of Computer Engineering
Inha University in Tashkent
Email: [email protected]

Content

• What is a Support Vector Machine (SVM)?
• Types of Support Vector Machine (SVM) Algorithms
• Important Terms
• How Does Support Vector Machine Work?
• Mathematical Intuition Behind Support Vector Machine
• Margin in Support Vector Machine
• Optimization Function and its Constraints
• Kernels in Support Vector Machine

What is a Support Vector Machine (SVM)?
Introduction

• SVM is a powerful supervised learning algorithm that works best on smaller but complex datasets.

• Support Vector Machines can be used for both regression and classification tasks, but they generally perform best on classification problems.

• They became popular soon after their introduction in the 1990s and remain a go-to method for high performance with relatively little tuning.


What is a Support Vector Machine (SVM)?
Introduction

• Note: Don't confuse SVM with logistic regression. Both algorithms try to find the best hyperplane, but the main difference is that logistic regression is a probabilistic approach, whereas the support vector machine is based on a statistical approach.

• Now, which hyperplane does it select? There can be an infinite number of hyperplanes that classify the two classes perfectly. So which one is the best?

• SVM chooses the hyperplane with the maximum margin, that is, the maximum distance between the two classes.


What is a Support Vector Machine (SVM)?
Logistic Regression vs Support Vector Machine (SVM)

• Depending on the number of features you have, you can choose either logistic regression or SVM.

• SVM works best when the dataset is small and complex. It is usually advisable to try logistic regression first and see how it performs; if it fails to give good accuracy, you can go for SVM without any kernel.

• Logistic regression and SVM without a kernel have similar performance, but depending on your features one may be more efficient than the other, as the sketch below illustrates.
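
As a rough illustration of this advice, here is a minimal sketch (assuming scikit-learn; the dataset and parameters are made up, not from the slides) that fits logistic regression and a linear SVM (SVM without a kernel) and compares their test accuracy:

    # Compare logistic regression with a linear SVM on a synthetic dataset.
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split
    from sklearn.svm import LinearSVC

    X, y = make_classification(n_samples=500, n_features=20, random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    for model in (LogisticRegression(max_iter=1000), LinearSVC()):
        model.fit(X_train, y_train)
        print(type(model).__name__, model.score(X_test, y_test))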




Types of Support Vector Machine (SVM) Algorithms

• Linear SVM: We can use a linear SVM only when the data is perfectly linearly separable, meaning that the data points can be classified into two classes using a single straight line (in 2D).

• Non-Linear SVM: When the data is not linearly separable, i.e., the data points cannot be separated into two classes by a straight line (in 2D), we use a non-linear SVM with advanced techniques like the kernel trick to classify them. Most real-world datasets are not linearly separable, hence we use the kernel trick, as in the sketch below.
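
To make the distinction concrete, the following sketch (assuming scikit-learn; the concentric-circles dataset is synthetic) fits both a linear SVM and a kernel SVM on data that no straight line can separate:

    # A linear kernel fails on concentric circles; an RBF kernel succeeds.
    from sklearn.datasets import make_circles
    from sklearn.svm import SVC

    X, y = make_circles(n_samples=300, factor=0.4, noise=0.05, random_state=0)

    linear_svm = SVC(kernel="linear").fit(X, y)
    rbf_svm = SVC(kernel="rbf").fit(X, y)

    print("linear kernel accuracy:", linear_svm.score(X, y))  # poor, near 0.5
    print("RBF kernel accuracy:", rbf_svm.score(X, y))        # near 1.0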




Important Terms

• Support Vectors: the data points that are closest to the hyperplane. The separating line is defined with the help of these points.

• Margin: the distance between the hyperplane and the observations closest to it (the support vectors). In SVM, a large margin is considered a good margin. There are two types of margins: hard margin and soft margin.

[Figure: two classes plotted on axes x1 and x2, showing the maximum-margin hyperplane, the positive and negative hyperplanes, and the support vectors lying on them.]




How Does SVM Work?

• SVM is defined in terms of the support vectors only; we don't have to worry about the other observations, since the margin is made using the points closest to the hyperplane. In logistic regression, by contrast, the classifier is defined over all the points.

• Hence SVM enjoys some natural speed-ups.

• Let's understand the working of SVM using an example.


• To classify these points, we could draw many decision boundaries, but which one is the best, and how do we find it?

• NOTE: Since we are plotting the data points on a two-dimensional graph, we call this decision boundary a straight line, but if we have more dimensions we call it a "hyperplane".


• The best hyperplane is the one that has the maximum distance from both classes, and finding it is the main aim of SVM.

• This is done by considering the hyperplanes that classify the labels correctly and choosing the one that is farthest from the data points, i.e., the one with the maximized margin.

[Figure: the optimal hyperplane with the maximized margin, with a support vector from each class lying on the margin boundaries.]




Mathematical Intuition Behind SVM
Understanding the Dot Product

• We all know that a vector is a quantity that has magnitude as well as direction, and just like numbers we can apply mathematical operations to vectors, such as addition and multiplication.

• Multiplication of vectors can be done in two ways: the dot product and the cross product.

• The difference is that the dot product yields a scalar as its result, whereas the cross product yields another vector.


• The dot product can be defined as the projection of one vector onto another, multiplied by the magnitude of the other vector.

• Here a and b are two vectors. To find the dot product between them, we first find the magnitude of each vector; to find a magnitude we use the Pythagorean theorem or the distance formula.


• After finding the magnitudes, we simply multiply by the cosine of the angle between the two vectors. Mathematically this is written as:

A · B = |A| cos θ × |B|

where |A| cos θ is the projection of A onto B, and |B| is the magnitude of vector B.

• Now, in SVM we just need the projection of A, not the magnitude of B (we will see why later).

• To get just the projection, we can simply take the unit vector of B, because it points in the direction of B but its magnitude is 1. The equation then becomes:

A · B = |A| cos θ × (unit vector of B)

• Now let's move to the next part and see how we use this in SVM; a small numeric sketch follows.
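
As a small numeric sketch of this (assuming NumPy; the vectors are made up), the scalar projection of a onto b is the dot product of a with the unit vector of b:

    import numpy as np

    a = np.array([3.0, 4.0])
    b = np.array([10.0, 0.0])

    b_unit = b / np.linalg.norm(b)    # unit vector in the direction of b
    projection = np.dot(a, b_unit)    # |a| cos(theta)
    print(projection)                 # 3.0: the component of a along b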


Mathematical Intuition Behind SVM
Using the Dot Product in SVM

• Consider a random point X. We want to know whether it lies on the right side of the plane or the left side (positive or negative).


• To find this, we first treat the point as a vector (X) and then construct a vector (w) perpendicular to the hyperplane.

• Let's say the distance from the origin to the decision boundary along w is 'c'.

• Now we take the projection of the vector X onto w.


• We already know that the projection of one vector onto another is obtained with the dot product. Hence, we take the dot product of the X and w vectors.

• If the dot product is greater than 'c', the point lies on the right side; if it is less than 'c', the point is on the left side; and if it is equal to 'c', the point lies on the decision boundary.


• X · w = c (the point lies on the decision boundary)

• X · w > c (positive samples)

• X · w < c (negative samples)
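
A minimal sketch of this decision rule (assuming NumPy; the values of w and c are made up for illustration):

    import numpy as np

    w = np.array([1.0, 2.0])   # assumed normal vector to the hyperplane
    c = 4.0                    # assumed offset along w

    def classify(X):
        score = np.dot(X, w)
        if score > c:
            return "positive"
        if score < c:
            return "negative"
        return "on the decision boundary"

    print(classify(np.array([3.0, 3.0])))  # 9.0 > 4.0 -> positive
    print(classify(np.array([1.0, 1.0])))  # 3.0 < 4.0 -> negative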


• You may be wondering why we took this vector w perpendicular to the hyperplane.

• What we want is the distance of the vector X from the decision boundary, and there are infinitely many points on the boundary from which we could measure that distance.

• That's why we standardize: we simply take the perpendicular as a reference, project all the other data points onto this perpendicular vector, and then compare the distances.


Margin in Support Vector Machine

• We all know that the equation of a hyperplane is w · x + b = 0, where w is a vector normal to the hyperplane and b is an offset.


• To classify a point as negative or positive, we need to define a decision rule. We can define the decision rule as: predict the positive class if w · x + b ≥ 0 and the negative class otherwise.


• If the value of w · x + b > 0, we can say it is a positive point; otherwise it is a negative point.

• Now we need (w, b) such that the margin has the maximum distance. Let's say this distance is 'd'.

• To calculate 'd' we need the equations of L1 and L2 (the margin lines).

• For this, we make the assumption that the equation of L1 is w · x + b = 1 and that of L2 is w · x + b = −1.





Optimization Function and its Constraints

• In order to get our optimization function, there is a constraint to consider: "We will calculate the distance (d) in such a way that no positive or negative point can cross the margin line."

• Let's write this constraint mathematically:

For all the red points:   w · X + b ≤ −1
For all the green points: w · X + b ≥ 1


• Rather than carrying two constraints forward, we'll now simplify them into one. We assume that negative classes have y = −1 and positive classes have y = 1.

• We can then say that for every point to be correctly classified, this condition should always hold (a small check is sketched below):

y_i (w · X + b) ≥ 1
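
A minimal sketch of checking this combined constraint (assuming NumPy; the hyperplane and the points are made up):

    import numpy as np

    w, b = np.array([1.0, -1.0]), -0.5       # assumed hyperplane parameters
    X = np.array([[2.0, 0.0], [-1.0, 1.0]])  # two sample points
    y = np.array([1, -1])                    # their class labels

    margins = y * (X @ w + b)
    print(margins)               # [1.5, 2.5] -> both satisfy the constraint
    print(np.all(margins >= 1))  # True: every point is correctly classified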


• Suppose a green point is correctly classified, meaning it follows w · x + b ≥ 1. If we multiply this by y = 1, we get the same equation mentioned above.

• Similarly, if we do this for a red point with y = −1, we again get the same equation. Hence, we can say that we need to maximize d such that this constraint holds true.


• We take two support vectors, one from the negative class and one from the positive class.

• The distance between these two vectors x1 and x2 is the vector (x2 − x1).

• What we need is the shortest distance between these two points, which can be found using the trick we used for the dot product.

• We take a vector w perpendicular to the hyperplane and then find the projection of the vector (x2 − x1) onto w.


• Note: this perpendicular vector must be a unit vector, otherwise the projection will not give a distance. Why a unit vector? This was explained in the dot-product section. To make w a unit vector, we divide it by its norm ‖w‖.


Optimization Function and its Constraints
Finding the Projection of a Vector onto Another Vector Using the Dot Product

• We already know how to find the projection of a vector onto another vector: we take the dot product with the unit vector. Projecting (x2 − x1) onto w/‖w‖ gives the margin:

d = (x2 − x1) · w / ‖w‖    (1)


• Since x2 and x1 are support vectors, they lie on the margin hyperplanes and hence follow y_i (w · x + b) = 1, so we can write:

w · x2 + b = 1     (2)
w · x1 + b = −1    (3)


• Putting equations (2) and (3) into equation (1): subtracting (3) from (2) gives w · (x2 − x1) = 2, so we get:

d = 2 / ‖w‖


• Hence the quantity we have to maximize is:

max over (w, b) of 2 / ‖w‖, subject to y_i (w · x_i + b) ≥ 1 for all i

A short sketch of recovering this margin from a fitted model follows.
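
As a sketch of this result (assuming scikit-learn; the data are synthetic, well-separated blobs), a fitted linear SVM with a very large C approximates the hard margin, and the margin width can be recovered as 2/‖w‖:

    import numpy as np
    from sklearn.datasets import make_blobs
    from sklearn.svm import SVC

    X, y = make_blobs(n_samples=100, centers=2, random_state=0)
    clf = SVC(kernel="linear", C=1e6).fit(X, y)  # large C ~ hard margin

    w = clf.coef_[0]
    margin = 2 / np.linalg.norm(w)
    print("margin width:", margin)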


• We have now found our optimization function, but there is a catch: we rarely find this kind of perfectly linearly separable data in industry, so in practice we often cannot use the condition we proved here.

• The type of problem we just studied is called hard margin SVM. Next we shall study the soft margin, which is similar but uses a few more interesting tricks.


Optimization Function and its Constraints
Soft Margin SVM

• In real-life applications, we rarely encounter datasets that are perfectly linearly separable.

• Instead, we often come across datasets that are either nearly linearly separable or entirely non-linearly separable.

• Unfortunately, the trick demonstrated above for linearly separable datasets is not applicable in these cases.

• This is where soft margin SVM comes into play. It can effectively handle both almost linearly separable and non-linearly separable datasets, providing a robust solution to classification problems in diverse real-world scenarios.


• To tackle this problem, we modify the equation so that it allows a few misclassifications, i.e., it allows a few points to be wrongly classified.

• We know that max[f(x)] can also be written as min[1/f(x)], and it is common practice to minimize a cost function in optimization problems; therefore, we can invert the function. Maximizing 2/‖w‖ is thus equivalent to minimizing ‖w‖/2.


• To make the soft margin equation, we add one more term to this objective: a slack variable ζ (zeta) for each point, summed and multiplied by a hyperparameter C. The resulting objective is:

min over (w, b, ζ) of ‖w‖/2 + C Σ_i ζ_i


• For all the correctly classified points, ζ is equal to 0. For the incorrectly classified points, ζ is the distance of that point from its correct margin hyperplane: for a wrongly classified green point, ζ is its distance from the L1 hyperplane, and for a wrongly classified red point, ζ is its distance from the L2 hyperplane. The sketch below shows how the hyperparameter C trades margin width against these slack terms.
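
A minimal sketch of the role of C (assuming scikit-learn; the noisy dataset is synthetic): a small C tolerates more slack, while a large C penalizes misclassification heavily:

    from sklearn.datasets import make_blobs
    from sklearn.svm import SVC

    X, y = make_blobs(n_samples=200, centers=2, cluster_std=3.0, random_state=0)

    for C in (0.01, 1.0, 100.0):
        clf = SVC(kernel="linear", C=C).fit(X, y)
        # Small C -> wider margin, more support vectors and slack;
        # large C -> narrower margin, fewer violations tolerated.
        print(f"C={C}: {len(clf.support_)} support vectors, "
              f"train accuracy {clf.score(X, y):.2f}")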




Kernels in Support Vector Machine

• The most interesting feature of SVM is that it can work even with a non-linear dataset; for this, we use the "kernel trick", which makes it easier to classify the points. Suppose we have a non-linearly separable dataset.


• Here we see that we cannot draw a single line, or hyperplane, that classifies the points correctly.

• So what we do is try converting this lower-dimensional space to a higher-dimensional space, using functions such as quadratics, which allows us to find a decision boundary that clearly divides the data points. A sketch of this lifting idea follows this list.

• The functions that help us do this are called kernels, and which kernel to use is determined purely by hyperparameter tuning.


Polynomial Kernel

Following is the formula for the polynomial kernel (in its common form):

K(x1, x2) = (x1 · x2 + 1)^d

Here d is the degree of the polynomial, which we need to specify manually.
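
A minimal sketch of this kernel as a function (assuming NumPy; the test vectors are made up):

    import numpy as np

    def polynomial_kernel(x1, x2, d=2):
        """K(x1, x2) = (x1 . x2 + 1)^d, with degree d chosen manually."""
        return (np.dot(x1, x2) + 1) ** d

    x1, x2 = np.array([1.0, 2.0]), np.array([3.0, 0.5])
    print(polynomial_kernel(x1, x2, d=2))  # (4 + 1)^2 = 25.0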


Sigmoid Kernel

We can use it as a proxy for neural networks. The equation (in its common form) is:

K(x1, x2) = tanh(γ · x1 · x2 + r)

It simply takes your inputs and maps them into a bounded range (the tanh output lies between −1 and 1) so that they can be separated by a simple straight line.
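
A minimal sketch of the sigmoid kernel as a function (assuming NumPy; gamma and r are made-up hyperparameter values):

    import numpy as np

    def sigmoid_kernel(x1, x2, gamma=0.5, r=0.0):
        """K(x1, x2) = tanh(gamma * x1 . x2 + r)."""
        return np.tanh(gamma * np.dot(x1, x2) + r)

    x1, x2 = np.array([1.0, 2.0]), np.array([3.0, 0.5])
    print(sigmoid_kernel(x1, x2))  # tanh(0.5 * 4.0) ≈ 0.964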


Advantages of SVM
• SVM works better when the data is linear
• It is more effective in high dimensions
• With the help of the kernel trick, we can solve complex non-linear problems
• SVM is relatively robust to outliers
• It can help us with image classification

Disadvantages of SVM
• Choosing a good kernel is not easy
• It doesn't show good results on big datasets
• The SVM hyperparameters are the cost C and gamma. It is not easy to fine-tune these hyperparameters, and it is hard to visualize their impact; a grid-search sketch follows this list
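
Since C and gamma are hard to tune by hand, a cross-validated grid search is a common way to pick them. A minimal sketch (assuming scikit-learn; the grid and dataset are made up):

    from sklearn.datasets import make_classification
    from sklearn.model_selection import GridSearchCV
    from sklearn.svm import SVC

    X, y = make_classification(n_samples=300, n_features=10, random_state=0)

    param_grid = {"C": [0.1, 1, 10, 100], "gamma": [0.001, 0.01, 0.1, 1]}
    search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=5)
    search.fit(X, y)

    print("best parameters:", search.best_params_)
    print("best CV accuracy:", search.best_score_)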
