Support Vector Machines
More Generally: Kernel Methods
Two Important Concepts
Seek a linear decision boundary using two new concepts:
Optimization based on maximizing margins (not gradient descent)
Kernel mapping for better linear separation ("massaging the data")
Two-Class Problem: Linearly Separable Case
Many decision boundaries can separate these two classes.
Which one should we choose?
[Figure: two linearly separable classes (Class 1 and Class 2) with several candidate decision boundaries]
Example of Bad Decision Boundaries
[Figure: two examples of poorly placed decision boundaries between Class 1 and Class 2]
Linear Classifiers: Which Hyperplane?
Lots of possible solutions for a, b, c.
Some methods find a separating hyperplane, but not the optimal one (according to some criterion of expected goodness), e.g., the perceptron or gradient descent.
The Support Vector Machine (SVM) finds an optimal solution: it maximizes the distance between the hyperplane and the "difficult" points close to the decision boundary.
One intuition: if there are no points near the decision surface, then there are no very uncertain classification decisions.
[Figure: a line representing the decision boundary ax + by - c = 0]
Another Intuition
If you have to place a fat separator between the classes, you have fewer choices, and so the capacity of the model has been decreased.
Support Vector Machine (SVM)
SVMs maximize the margin around the separating hyperplane (a.k.a. large-margin classifiers).
The decision function is fully specified by a subset of the training samples, the support vectors.
Training is a quadratic programming problem.
Seen by many as the most successful current text classification method.
[Figure: maximum-margin separating hyperplane; the support vectors are the points lying on the margin boundaries]
SVM with Kernel Mapping
The original feature space might not be well conditioned for linear separation.
Map X -> \phi(X) (in a higher-dimensional space).
Non-linear SVMs
Datasets that are linearly separable (with some noise) work out great.
But what are we going to do if the dataset is just too hard?
How about mapping the data to a higher-dimensional space?
[Figure: 1-D data on the x axis that is not linearly separable becomes separable after mapping x to (x, x^2)]
Non-linear SVMs: Feature Spaces
General idea: the original feature space can always be mapped to some higher-dimensional feature space where the training set is likely separable:
\Phi: x \to \phi(x)
Transformation to Feature Space
High computational burden due to the high dimensionality, and hard to get a good estimate.
The "kernel trick" allows efficient computation: only the inner products of feature vectors are used; the mapping is never computed explicitly (efficiency in both time and space).
[Figure: points in the input space mapped by \phi(.) to a feature space where they become linearly separable]
Kernel-Induced Feature Spaces
Recall from the perceptron learning rule that the weight vector is a combination of feature vectors (those that are difficult to handle, i.e., classified wrongly):

w(k+1) = w(k) + c y x,   if y (w(k) \cdot x) \le 0 for some x
w(k+1) = w(k),           otherwise
Kernel-Induced Feature Spaces
So the decision rule involves only inner products of features, not the features themselves:

h(x) = \sum_i \alpha_i y_i (x_i \cdot x),   where  w = \sum_i \alpha_i y_i x_i

The same holds true in the mapped space:
The dimension is not important (we may not even know the mapping).
The inner products can be evaluated in the low-dimensional space.

h(x) = \sum_i \alpha_i y_i (\phi(x_i) \cdot \phi(x)),   where  w = \sum_i \alpha_i y_i \phi(x_i)
Kernels
A kernel is a function that returns the dot product of the mappings of its two arguments:

k(x_1, x_2) = \phi(x_1) \cdot \phi(x_2) = \langle \phi(x_1), \phi(x_2) \rangle

The same perceptron-style learning can be applied (just replace all dot products with kernels); however, SVM uses more sophisticated learning rules.
Example
\phi: x = (x_1, x_2) \mapsto \phi(x) = (x_1^2, \sqrt{2}\, x_1 x_2, x_2^2) \in F = R^3

k(x, y) = (x_1 y_1 + x_2 y_2)^2
        = x_1^2 y_1^2 + 2 x_1 y_1 x_2 y_2 + x_2^2 y_2^2
        = (x_1^2, \sqrt{2}\, x_1 x_2, x_2^2) \cdot (y_1^2, \sqrt{2}\, y_1 y_2, y_2^2)
        = \phi(x) \cdot \phi(y)
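As a quick sanity check (my own toy numbers, not from the slides), the kernel value computed in the input space matches the dot product after the explicit mapping:

```python
import numpy as np

# phi maps R^2 into R^3 exactly as on the slide: (x1^2, sqrt(2) x1 x2, x2^2)
def phi(v):
    x1, x2 = v
    return np.array([x1**2, np.sqrt(2.0) * x1 * x2, x2**2])

x = np.array([1.0, 3.0])
y = np.array([2.0, -1.0])

k_input_space = np.dot(x, y) ** 2         # k(x, y) = (x . y)^2, no mapping needed
k_feature_space = np.dot(phi(x), phi(y))  # same value via the explicit mapping
print(k_input_space, k_feature_space)     # both are 1.0 for these points
```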
Another Example
\phi: x = (x_1, x_2) \mapsto \phi(x) = (x_1^2, x_1 x_2, x_1 x_2, x_2^2) \in F = R^4

k(x, y) = (x_1 y_1 + x_2 y_2)^2
        = x_1^2 y_1^2 + 2 x_1 y_1 x_2 y_2 + x_2^2 y_2^2
        = (x_1^2, x_1 x_2, x_1 x_2, x_2^2) \cdot (y_1^2, y_1 y_2, y_1 y_2, y_2^2)
        = \phi(x) \cdot \phi(y)
Yet Another Example
\phi: x = (x_1, x_2) \mapsto \phi(x) = ( \tfrac{x_1^2 - x_2^2}{\sqrt{2}}, \sqrt{2}\, x_1 x_2, \tfrac{x_1^2 + x_2^2}{\sqrt{2}} ) \in F = R^3

k(x, y) = (x_1 y_1 + x_2 y_2)^2
        = x_1^2 y_1^2 + 2 x_1 y_1 x_2 y_2 + x_2^2 y_2^2
        = (x_1^2, \sqrt{2}\, x_1 x_2, x_2^2) \cdot (y_1^2, \sqrt{2}\, y_1 y_2, y_2^2)
        = ( \tfrac{x_1^2 - x_2^2}{\sqrt{2}}, \sqrt{2}\, x_1 x_2, \tfrac{x_1^2 + x_2^2}{\sqrt{2}} ) \cdot ( \tfrac{y_1^2 - y_2^2}{\sqrt{2}}, \sqrt{2}\, y_1 y_2, \tfrac{y_1^2 + y_2^2}{\sqrt{2}} )
        = \phi(x) \cdot \phi(y)

The interpretation of the mapping is not unique, even for a single kernel function k.
Gaussian Kernel
Identical to the Radial Basis Function (RBF) kernel:

k(x, z) = \langle \phi(x), \phi(z) \rangle = \exp( -\frac{\|x - z\|^2}{2\sigma^2} )

Recall that e^x = \sum_{i=0}^{\infty} \frac{x^i}{i!}, so the feature dimension is infinitely high in this case.
The embedding is not explicitly computed (impossible); only the inner product is needed in the SVM.
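A minimal numerical sketch (my own values) of evaluating the Gaussian kernel directly from the formula; the infinite-dimensional embedding phi is never formed:

```python
import numpy as np

def gaussian_kernel(x, z, sigma=1.0):
    # k(x, z) = exp(-||x - z||^2 / (2 sigma^2)); equivalently gamma = 1 / (2 sigma^2)
    diff = x - z
    return np.exp(-np.dot(diff, diff) / (2.0 * sigma ** 2))

x = np.array([1.0, 2.0])
z = np.array([2.0, 0.0])
print(gaussian_kernel(x, z))   # exp(-5/2), roughly 0.082
```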
SVM Cost Function
Similar to the logistic loss, but piecewise linear: there is no penalty for z > 1 when y = 1, and no penalty for z < -1 when y = 0.
Margin
For w^T x to be > 1 (for y = 1) and < -1 (for y = -1), w must be larger in the left figure than in the right. So a large margin implies a small |w|^2.
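A short standard derivation (not spelled out on the slide) of why a large margin means a small |w|^2, using the canonical scaling w^T x + b = ±1 on the margin boundaries:

```latex
\begin{aligned}
\text{distance of a point } x \text{ to the hyperplane } w^T x + b = 0
  &= \frac{|w^T x + b|}{\|w\|},\\
\text{margin} &= \frac{1}{\|w\|} + \frac{1}{\|w\|} = \frac{2}{\|w\|},\\
\max_{w,b}\; \frac{2}{\|w\|} &\iff \min_{w,b}\; \tfrac{1}{2}\|w\|^2 .
\end{aligned}
```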
Numerical Methods
Maximizing the margin (i.e., minimizing |w|^2) cannot be done by gradient descent, but by quadratic programming (DON'T implement it yourself; use a readily available implementation such as SVMlight or LIBSVM).

h(x) = \sum_i \alpha_i y_i (x_i \cdot x) + b        or, with a kernel,        h(x) = \sum_i \alpha_i y_i K(x_i, x) + b

The sum runs over the support vectors; x_i is the i-th support vector, \alpha_i is its weight, and b is a bias.
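A minimal sketch (assuming scikit-learn, which wraps LIBSVM; the data are my own toy blobs) showing that the learned classifier really has the form h(x) = sum_i alpha_i y_i K(x_i, x) + b over the support vectors only:

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

X, y = make_blobs(n_samples=60, centers=2, random_state=0)
clf = SVC(kernel="rbf", gamma=0.5, C=1.0).fit(X, y)

x_new = X[:1]                                                    # one query point
K = np.exp(-0.5 * np.sum((clf.support_vectors_ - x_new) ** 2, axis=1))
# dual_coef_ stores alpha_i * y_i for each support vector
h_manual = np.dot(clf.dual_coef_[0], K) + clf.intercept_[0]
print(h_manual, clf.decision_function(x_new)[0])                 # the two values agree
```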
The Optimization Problem
Let {x_1, ..., x_n} be our data set and let y_i \in \{1, -1\} be the class label of x_i.
The decision boundary should classify all points correctly (hard margin):
    y_i (w^T x_i + b) \ge 1  for all i
This gives a constrained optimization problem:
    minimize \tfrac{1}{2} \|w\|^2   subject to   y_i (w^T x_i + b) \ge 1  for all i
Note that ||w||^2 = w^T w.
Optimization of f(X), No Constraints
Necessary condition: \nabla f = 0
[Figure: contours of f = x^2 + y^2 in the (x, y) plane; the unconstrained minimum is where the gradient vanishes]
Optimization of f(X), Equality Constraints
Constraint g(X) = 0:
Minimize f + \lambda g, i.e., solve \nabla f + \lambda \nabla g = 0
[Figure: contours of f = x^2 + y^2 and the constraint line g: x + y = 1; at the optimum \nabla f and \nabla g are parallel]
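A worked instance of the picture above, using the slide's f = x^2 + y^2 and g: x + y = 1:

```latex
\begin{aligned}
\nabla f + \lambda \nabla g = 0 &\;\Rightarrow\; (2x, 2y) + \lambda (1, 1) = (0, 0)
  \;\Rightarrow\; x = y = -\tfrac{\lambda}{2},\\
g(x, y) = 0 &\;\Rightarrow\; x + y = 1
  \;\Rightarrow\; x = y = \tfrac{1}{2},\quad \lambda = -1,\quad f = \tfrac{1}{2}.
\end{aligned}
```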
Optimization of f(X), Inequality Constraints
Constraint h(X) <= 0: minimize f + \lambda h, i.e., solve \nabla f + \lambda \nabla h = 0
Where can the optimum (\nabla f = 0 if unconstrained) lie?
Outside the feasible region (h(X) > 0): not a solution.
Inside the feasible region (h(X) < 0): the constraint does not matter, so \lambda = 0.
On the boundary of the feasible region (h(X) = 0): h behaves just like g (an equality constraint).
For a minimization problem:
\nabla f must point inside the feasible region
\nabla h must point outside
\nabla f = -\lambda \nabla h, with \lambda > 0
KKT Conditions
h(x) \le 0
\lambda \ge 0
\lambda h(x) = 0   (complementary slackness: \lambda = 0 whenever the constraint is inactive)
\nabla f + \lambda \nabla h = 0
Lagrangian of the Original Problem
The Lagrangian is

L = \tfrac{1}{2} w^T w + \sum_i \alpha_i (1 - y_i (w^T x_i + b)),   with Lagrangian multipliers \alpha_i \ge 0

Note that ||w||^2 = w^T w.
Setting the gradient of L w.r.t. w and b to zero:
\partial L / \partial w = 0  =>  w = \sum_i \alpha_i y_i x_i
\partial L / \partial b = 0  =>  \sum_i \alpha_i y_i = 0
L = \tfrac{1}{2} w^T w + \sum_i \alpha_i (1 - y_i (w^T x_i + b))
  = \tfrac{1}{2} w^T w + \sum_i \alpha_i - w^T \sum_i \alpha_i y_i x_i - b \sum_i \alpha_i y_i
  = \tfrac{1}{2} w^T w + \sum_i \alpha_i - w^T w        (using w = \sum_i \alpha_i y_i x_i and \sum_i \alpha_i y_i = 0)
  = -\tfrac{1}{2} w^T w + \sum_i \alpha_i
  = -\tfrac{1}{2} ( \sum_i \alpha_i y_i x_i )^T ( \sum_j \alpha_j y_j x_j ) + \sum_i \alpha_i
  = \sum_i \alpha_i - \tfrac{1}{2} \sum_i \sum_j \alpha_i \alpha_j y_i y_j (x_i \cdot x_j)
The Dual Optimization Problem
We can transform the problem to its dual, in the new variables \alpha_i (the Lagrangian multipliers); the data enter only through dot products x_i \cdot x_j:

maximize    \sum_i \alpha_i - \tfrac{1}{2} \sum_i \sum_j \alpha_i \alpha_j y_i y_j (x_i \cdot x_j)
subject to  \alpha_i \ge 0,   \sum_i \alpha_i y_i = 0

This is a convex quadratic programming (QP) problem.
The global maximum over \alpha_i can always be found, and well-established tools exist for solving this optimization problem (e.g., CPLEX).
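A minimal sketch of handing the dual above to an off-the-shelf QP solver (here the cvxopt package; any QP tool works). This is the hard-margin version, so it assumes separable data:

```python
import numpy as np
from cvxopt import matrix, solvers

def svm_dual(X, y):
    """Maximize sum(a) - 1/2 sum a_i a_j y_i y_j x_i.x_j  s.t.  a >= 0, sum a_i y_i = 0.
    X: float array of shape (n, d); y: array of +1/-1 labels."""
    n = X.shape[0]
    K = X @ X.T                                   # Gram matrix of dot products
    P = matrix(np.outer(y, y) * K)                # quadratic term
    q = matrix(-np.ones(n))                       # QP solvers minimize, so negate
    G = matrix(-np.eye(n))                        # -a_i <= 0, i.e. a_i >= 0
    h = matrix(np.zeros(n))
    A = matrix(y.reshape(1, -1).astype(float))    # equality constraint sum a_i y_i = 0
    b = matrix(0.0)
    sol = solvers.qp(P, q, G, h, A, b)
    return np.ravel(sol["x"])                     # the multipliers alpha_i

# Only the alpha_i that come back non-zero correspond to support vectors.
```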
Quadratic Programming
The cost is quadratic.
Without constraints there is a unique minimum, and any descent search technique (e.g., gradient descent) would eventually find it.
The added twist: the search is subject to constraints.
A Geometrical Interpretation
The support vectors are the points whose \alpha's are different from zero; they "hold up" the separating plane.
[Figure: Class 1 and Class 2 separated by a plane; the support vectors carry non-zero multipliers (\alpha_1 = 0.8, \alpha_6 = 1.4, \alpha_8 = 0.6) while all other points have \alpha_i = 0]
Classification with SVMs
Given a new point x = (x_1, x_2), we can score its projection onto the hyperplane normal:
In 2 dimensions: score = w_1 x_1 + w_2 x_2 + b
I.e., compute score = w \cdot x + b = \sum_i \alpha_i y_i (x_i^T x) + b
Set a confidence threshold t:
score > t: yes
score < -t: no
else: don't know
Soft Margin Classification
If the training set is not linearly separable, slack variables \xi_i can be added to allow misclassification of difficult or noisy examples.
Allow some errors: let some points be moved to where they belong, at a cost.
Still, try to minimize the training-set errors and to place the hyperplane "far" from each class (large margin).
[Figure: a soft-margin hyperplane with two points violating their margins by \xi_i and \xi_j]
Soft Margin
We allow an "error" \xi_i in classification; it is based on the output of the discriminant function w^T x + b.
\sum_i \xi_i approximates the number of misclassified samples.
New objective function: C is a tradeoff parameter between error and margin, chosen by the user; a large C means a higher penalty on errors.
[Figure: Class 1 and Class 2 with a soft-margin separating hyperplane]
Linear Non-Separable Case
When the classes are not separable, introduce slack variables and relax the constraints:

y_i (w \cdot x_i + b) \ge 1 - \xi_i,   \xi_i \ge 0

Minimize the objective function

L = \tfrac{1}{2} \|w\|^2 + C \sum_i \xi_i,   C \ge 0 (a user-chosen weight)

This allows some samples into the buffer zone, but minimizes their number and the amount of protrusion.
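A minimal sketch (my own overlapping toy data, assuming scikit-learn) of the role of C: a small C tolerates more slack, a large C penalizes protrusions heavily:

```python
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Two overlapping blobs, so a hard margin is impossible and slack is needed.
X, y = make_blobs(n_samples=200, centers=2, cluster_std=2.5, random_state=1)

for C in (0.01, 1.0, 100.0):
    clf = SVC(kernel="linear", C=C).fit(X, y)
    print(f"C={C:>6}: {len(clf.support_)} support vectors")
# The number of support vectors (points inside or violating the margin)
# typically shrinks as C grows and the margin tightens.
```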
Wolfe Dual
Primal:
    L = \tfrac{1}{2} w \cdot w + C \sum_i \xi_i,   with \xi_i \ge 0 and C \ge 0

Dual:
    maximize    \sum_i \alpha_i - \tfrac{1}{2} \sum_i \sum_j \alpha_i \alpha_j y_i y_j (x_i \cdot x_j)
    subject to  \sum_i \alpha_i y_i = 0,   0 \le \alpha_i \le C
General Case
Primal:
    L = \tfrac{1}{2} w \cdot w + C \sum_i \xi_i,   with \xi_i \ge 0 and C \ge 0

Dual (with a kernel):
    maximize    \sum_i \alpha_i - \tfrac{1}{2} \sum_i \sum_j \alpha_i \alpha_j y_i y_j k(x_i, x_j)
    subject to  \sum_i \alpha_i y_i = 0,   0 \le \alpha_i \le C
Example
k(x_1, x_2) = (1 + x_1 \cdot x_2)^2 = (1 + x_{11} x_{21} + x_{12} x_{22})^2
            = 1 + x_{11}^2 x_{21}^2 + 2 x_{11} x_{12} x_{21} x_{22} + x_{12}^2 x_{22}^2 + 2 x_{11} x_{21} + 2 x_{12} x_{22}

\phi(x_i) = [1, x_{i1}^2, \sqrt{2}\, x_{i1} x_{i2}, x_{i2}^2, \sqrt{2}\, x_{i1}, \sqrt{2}\, x_{i2}]^T

XOR data and kernel matrix:

X: (1,1), (1,-1), (-1,1), (-1,-1)        y: 1, -1, -1, 1

K = [ 9 1 1 1
      1 9 1 1
      1 1 9 1
      1 1 1 9 ]

L = \sum_i \alpha_i - \tfrac{1}{2} \sum_i \sum_j \alpha_i \alpha_j y_i y_j k(x_i, x_j)
  = (\alpha_1 + \alpha_2 + \alpha_3 + \alpha_4)
    - \tfrac{1}{2} (9\alpha_1^2 + 9\alpha_2^2 + 9\alpha_3^2 + 9\alpha_4^2
                    - 2\alpha_1\alpha_2 - 2\alpha_1\alpha_3 + 2\alpha_1\alpha_4
                    + 2\alpha_2\alpha_3 - 2\alpha_2\alpha_4 - 2\alpha_3\alpha_4)
Example (cont.)
Setting \partial L / \partial \alpha_i = 0 gives the linear system

 9\alpha_1 - \alpha_2 - \alpha_3 + \alpha_4 = 1
-\alpha_1 + 9\alpha_2 + \alpha_3 - \alpha_4 = 1
-\alpha_1 + \alpha_2 + 9\alpha_3 - \alpha_4 = 1
 \alpha_1 - \alpha_2 - \alpha_3 + 9\alpha_4 = 1

whose solution is \alpha_1 = \alpha_2 = \alpha_3 = \alpha_4 = 1/8 \ge 0, so the constraints are satisfied.

w = \sum_i \alpha_i y_i \phi(X_i) = \tfrac{1}{8} [\phi(X_1) - \phi(X_2) - \phi(X_3) + \phi(X_4)]
  = \tfrac{1}{8} [0, 0, 4\sqrt{2}, 0, 0, 0]^T = [0, 0, \tfrac{\sqrt{2}}{2}, 0, 0, 0]^T

y = \langle w, \phi(x) \rangle = \tfrac{\sqrt{2}}{2} \cdot \sqrt{2}\, x_1 x_2 = x_1 x_2

SVM can solve XOR!
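A small numerical check (not on the slides) of the XOR example: build the Gram matrix for k(x, z) = (1 + x·z)^2, solve dL/d\alpha = 0, and confirm the decision function reduces to x1·x2:

```python
import numpy as np

X = np.array([[1, 1], [1, -1], [-1, 1], [-1, -1]], dtype=float)
y = np.array([1, -1, -1, 1], dtype=float)

K = (1.0 + X @ X.T) ** 2                  # 9 on the diagonal, 1 elsewhere
H = np.outer(y, y) * K                    # H_ij = y_i y_j k(x_i, x_j)
alpha = np.linalg.solve(H, np.ones(4))    # dL/dalpha = 0  ->  H alpha = 1
print(alpha)                              # [0.125 0.125 0.125 0.125]

def phi(v):                               # the explicit degree-2 feature map
    x1, x2 = v
    s = np.sqrt(2.0)
    return np.array([1.0, x1**2, s * x1 * x2, x2**2, s * x1, s * x2])

w = sum(a * yi * phi(x) for a, yi, x in zip(alpha, y, X))
print(np.round(w, 3))                     # only the sqrt(2)*x1*x2 entry is non-zero
print([round(w @ phi(x), 3) for x in X])  # recovers x1*x2: [1, -1, -1, 1]
```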
Example: 5 1-D Data Points
[Figure: value of the discriminant function over the points x = 1, 2 (class 1), 4, 5 (class 2), 6 (class 1)]
Example
Five 1-D data points: x_1 = 1, x_2 = 2, x_3 = 4, x_4 = 5, x_5 = 6, with 1, 2, 6 as class 1 and 4, 5 as class 2, so y_1 = 1, y_2 = 1, y_3 = -1, y_4 = -1, y_5 = 1.
We use the polynomial kernel of degree 2: K(x, y) = (xy + 1)^2
C is set to 100.
We first find \alpha_i (i = 1, ..., 5) by maximizing
    \sum_i \alpha_i - \tfrac{1}{2} \sum_i \sum_j \alpha_i \alpha_j y_i y_j K(x_i, x_j)
subject to 0 \le \alpha_i \le 100 and \sum_i \alpha_i y_i = 0.
Example (cont.)
By using a QP solver, we get
    \alpha_1 = 0, \alpha_2 = 2.5, \alpha_3 = 0, \alpha_4 = 7.333, \alpha_5 = 4.833
Verify (at home) that the constraints are indeed satisfied.
The support vectors are {x_2 = 2, x_4 = 5, x_5 = 6}.
The discriminant function is
    f(z) = 2.5 (2z + 1)^2 - 7.333 (5z + 1)^2 + 4.833 (6z + 1)^2 + b = 0.6667 z^2 - 5.333 z + b
b is recovered by solving f(2) = 1, f(5) = -1, or f(6) = 1 (x_2, x_4, x_5 all lie on the margin), and all give b = 9.
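A minimal sketch (assuming scikit-learn) of reproducing this example; SVC's polynomial kernel is (gamma*<x, y> + coef0)^degree, so gamma = 1, coef0 = 1, degree = 2 matches K(x, y) = (xy + 1)^2:

```python
import numpy as np
from sklearn.svm import SVC

X = np.array([[1.0], [2.0], [4.0], [5.0], [6.0]])
y = np.array([1, 1, -1, -1, 1])

clf = SVC(kernel="poly", degree=2, gamma=1.0, coef0=1.0, C=100).fit(X, y)
print(clf.support_vectors_.ravel())   # expect the support vectors 2, 5, 6
print(clf.dual_coef_)                 # alpha_i * y_i; compare with the values above
print(clf.intercept_)                 # expect a bias close to 9
```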
Software
A list of SVM implementations can be found at [Link]
[Link]/[Link]
Some implementations (such as LIBSVM) can handle multi-class classification.
SVMlight is among the earliest implementations of SVM.
Several MATLAB toolboxes for SVM are also available.
Evaluation: Classic Reuters Data Set
Most (over)used data set: 21,578 documents
9,603 training and 3,299 test articles
118 categories; an article can be in more than one category
Learn 118 binary category distinctions
Average document: about 90 types, 200 tokens
Average number of classes assigned: 1.24 for docs with at least one category
Only about 10 out of the 118 categories are large. Common categories (#train, #test):
    Earn (2877, 1087)         Trade (369, 119)
    Acquisitions (1650, 179)  Interest (347, 131)
    Money-fx (538, 179)       Ship (197, 89)
    Grain (433, 149)          Wheat (212, 71)
    Crude (389, 189)          Corn (182, 56)
Reuters Text Categorization Data Set (Reuters-21578): a sample document
<REUTERS TOPICS="YES" LEWISSPLIT="TRAIN" CGISPLIT="TRAINING-SET"
OLDID="12981" NEWID="798">
<DATE> 2-MAR-1987 [Link].42</DATE>
<TOPICS><D>livestock</D><D>hog</D></TOPICS>
<TITLE>AMERICAN PORK CONGRESS KICKS OFF TOMORROW</TITLE>
<DATELINE> CHICAGO, March 2 - </DATELINE><BODY>The American Pork Congress
kicks off tomorrow, March 3, in Indianapolis with 160 of the nations pork producers from 44
member states determining industry positions on a number of issues, according to the National Pork
Producers Council, NPPC.
Delegates to the three day Congress will be considering 26 resolutions concerning various issues,
including the future direction of farm policy and the tax law as it applies to the agriculture sector.
The delegates will also debate whether to endorse concepts of a national PRV (pseudorabies virus)
control and eradication program, the NPPC said.
A large trade show, in conjunction with the congress, will feature the latest in technology in all
areas of the industry, the NPPC added. Reuter
</BODY></TEXT></REUTERS>
New Reuters: RCV1 (810,000 docs)
[Figure: top topics in Reuters RCV1]
Good Practice Department: Confusion Matrix
Entry (i, j) counts the docs actually in class i that the classifier put in class j; e.g., the highlighted entry means 53 of the docs actually in class i were assigned class j.
Rows: actual class. Columns: class assigned by the classifier.
In a perfect classification, only the diagonal has non-zero entries.
Per-Class Evaluation Measures
Recall: fraction of docs in class i classified correctly:  c_{ii} / \sum_j c_{ij}
Precision: fraction of docs assigned class i that are actually about class i:  c_{ii} / \sum_j c_{ji}
"Correct rate" (1 - error rate): fraction of docs classified correctly:  \sum_i c_{ii} / \sum_i \sum_j c_{ij}
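A minimal sketch (my own toy confusion matrix) of the three measures above:

```python
import numpy as np

C = np.array([[53,  2,  5],    # rows: actual class, columns: assigned class
              [ 3, 40,  7],
              [ 4,  1, 60]])

recall    = np.diag(C) / C.sum(axis=1)   # c_ii / sum_j c_ij
precision = np.diag(C) / C.sum(axis=0)   # c_ii / sum_j c_ji
accuracy  = np.trace(C) / C.sum()        # sum_i c_ii / sum_ij c_ij
print(recall, precision, accuracy)
```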
Dumais et al. 1998:
Reuters - Accuracy
Rocchio NBayes Trees LinearSVM
earn 92.9% 95.9% 97.8% 98.2%
acq 64.7% 87.8% 89.7% 92.8%
money-fx 46.7% 56.6% 66.2% 74.0%
grain 67.5% 78.8% 85.0% 92.4%
crude 70.1% 79.5% 85.0% 88.3%
trade 65.1% 63.9% 72.5% 73.5%
interest 63.4% 64.9% 67.1% 76.3%
ship 49.2% 85.4% 74.2% 78.0%
wheat 68.9% 69.7% 92.5% 89.7%
corn 48.2% 65.3% 91.8% 91.1%
Avg Top 10 64.6% 81.5% 88.4% 91.4%
Avg All Cat 61.7% 75.2% na 86.4%
Recall: % labeled in category among those stories that are really in category
Precision: % really in category among those stories labeled in category
Break Even: (Recall + Precision) / 2
Reuters ROC - Category Grain
[Figure: recall vs. precision curves for category Grain, comparing LSVM, Decision Tree, Naïve Bayes, and Find Similar]
Recall: % labeled in category among those stories that are really in category
Precision: % really in category among those stories labeled in category
ROC for Category - Crude
[Figure: recall vs. precision curves for category Crude, comparing LSVM, Decision Tree, Naïve Bayes, and Find Similar]
Linear Programming
Standard max:  maximize c^T x  subject to  A x \le b,  x \ge 0
Standard min:  minimize b^T y  subject to  y^T A \ge c^T,  y \ge 0
Standard Max Example
(x1, x2): (#pants, #shirts)
(b1, b2): (#buttons, #zippers)
(c1, c2): (profit/pant, profit/shirt)
(y1, y2): ($/button, $/zipper) <- shadow price
A_ij: amount of resource i (buttons, zippers) used per unit of product j (pants, shirts); write bn_j for buttons and zp_j for zippers used per unit of product j

Primal (max):
    max  [c1, c2] [x1; x2]
    subject to  [bn1, bn2; zp1, zp2] [x1; x2] <= [b1; b2],   x1 >= 0, x2 >= 0

Dual (min):
    min  [y1, y2] [b1; b2]
    subject to  [y1, y2] [bn1, bn2; zp1, zp2] >= [c1, c2],   y1 >= 0, y2 >= 0
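A minimal sketch (hypothetical numbers for profits and resource usage, assuming SciPy) of solving the pants/shirts standard-max problem; linprog minimizes, so the profit vector is negated:

```python
from scipy.optimize import linprog

c = [-10.0, -4.0]          # hypothetical profit per pant and per shirt (negated)
A = [[4.0, 2.0],           # hypothetical buttons used per pant, per shirt
     [1.0, 0.0]]           # hypothetical zippers used per pant, per shirt
b = [400.0, 60.0]          # hypothetical buttons and zippers available

res = linprog(c, A_ub=A, b_ub=b, bounds=[(0, None), (0, None)])
print(res.x, -res.fun)     # optimal plan (60 pants, 80 shirts) and profit 920
```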
Standard Max
Max is primal: make the pants and shirts yourself; maximize your profit; stay within the available buttons and zippers.
Min is dual: sell your material to a buyer; the buyer minimizes cost, but must offer more than what you could earn by making the items yourself.
Standard Min Example
(y1, y2): (pounds of meat, pounds of veggies)
(b1, b2): ($/pound of meat, $/pound of veggies)
(c1, c2): (required calories, required protein)
A_ij: amount of nutrient j (calories, protein) provided per pound of food i (meat, veggies)
(x1, x2): ($/calorie supplement, $/protein supplement) <- shadow price

Primal (min):
    min  [y1, y2] [b1; b2]
    subject to  [y1, y2] [c/pound_meat, p/pound_meat; c/pound_veggi, p/pound_veggi] >= [c1, c2],   y1 >= 0, y2 >= 0

Dual (max):
    max  [c1, c2] [x1; x2]
    subject to  [c/pound_meat, p/pound_meat; c/pound_veggi, p/pound_veggi] [x1; x2] <= [b1; b2],   x1 >= 0, x2 >= 0
Standard Min
Min is primal: buy meat and veggies to supply enough calories and protein; minimize your cost.
Max is dual: aliens sell you magic calorie and protein pills; they maximize their profit, but must charge less than what you would pay to buy the food yourself.
Some Facts about LP
All linear programming problems can be transformed into either the standard max or the standard min form.
If the max is feasible then the min is feasible, and vice versa.
The optimal max value equals the optimal min value (no duality gap).
Conversion from Primal to Dual
Primal (min):
    min  [y1, y2] [b1; b2]
    subject to  [y1, y2] [a11, a12; a21, a22] >= [c1, c2],   y1 >= 0, y2 >= 0

Write the constraints as c1 - a11*y1 - a21*y2 <= 0 and c2 - a12*y1 - a22*y2 <= 0 and form the Lagrangian:

    max_{x1,x2} min_{y1,y2}  b1*y1 + b2*y2 + x1*(c1 - a11*y1 - a21*y2) + x2*(c2 - a12*y1 - a22*y2)
  = max_{x1,x2} min_{y1,y2}  c1*x1 + c2*x2 + y1*(b1 - a11*x1 - a12*x2) + y2*(b2 - a21*x1 - a22*x2)

Reading off the inner minimization over y gives the dual (max):
    max  [c1, c2] [x1; x2]
    subject to  [a11, a12; a21, a22] [x1; x2] <= [b1; b2],   x1 >= 0, x2 >= 0
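A small numerical check (hypothetical coefficients, assuming SciPy) that the primal min and the dual max above share the same optimal value (no gap):

```python
import numpy as np
from scipy.optimize import linprog

A = np.array([[2.0, 1.0],    # hypothetical a11 a12
              [1.0, 3.0]])   # hypothetical a21 a22
b = np.array([5.0, 4.0])     # b1, b2
c = np.array([3.0, 2.0])     # c1, c2

# Primal: min b.y  subject to  y^T A >= c, y >= 0  (rewritten as -A^T y <= -c)
primal = linprog(b, A_ub=-A.T, b_ub=-c, bounds=[(0, None)] * 2)
# Dual:   max c.x  subject to  A x <= b, x >= 0    (negate c because linprog minimizes)
dual = linprog(-c, A_ub=A, b_ub=b, bounds=[(0, None)] * 2)

print(primal.fun, -dual.fun)   # both equal 7.8: no duality gap
```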