
Linear Classification: The Perceptron

These slides were assembled by Eric Eaton, with grateful acknowledgement of the many others who made
their course materials freely available online. Feel free to reuse or adapt these slides for your own academic
purposes, provided that you include proper attribution. Please send comments and corrections to Eric.
Robot Image Credit: Viktoriya Sukhanova © [Link]
Linear Classifiers
• A hyperplane partitions $\mathbb{R}^d$ into two half-spaces
  – Defined by the normal vector $\theta \in \mathbb{R}^d$
    • $\theta$ is orthogonal to any vector lying on the hyperplane
  – Assumed to pass through the origin
    • This is because we incorporated the bias term $\theta_0$ into $\theta$ by setting $x_0 = 1$

• Consider classification with +1, -1 labels ...

Based on slide by Piyush Rai 2


Linear Classifiers
• Linear classifiers: represent decision boundary by hyperplane
$$\theta = \begin{bmatrix} \theta_0 \\ \theta_1 \\ \vdots \\ \theta_d \end{bmatrix} \qquad x^\top = \begin{bmatrix} 1 & x_1 & \cdots & x_d \end{bmatrix}$$

$$h(x) = \operatorname{sign}(\theta^\top x) \quad \text{where} \quad \operatorname{sign}(z) = \begin{cases} +1 & \text{if } z \ge 0 \\ -1 & \text{if } z < 0 \end{cases}$$
– Note that: $\theta^\top x > 0 \implies y = +1$ and $\theta^\top x < 0 \implies y = -1$
3
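As a concrete illustration of this decision rule, here is a minimal sketch in Python/NumPy (the example values are invented; the prepended 1 in x implements the bias trick described above):

```python
import numpy as np

def predict(theta, x):
    """Linear classifier: h(x) = sign(theta^T x), with sign(0) = +1."""
    return 1 if theta @ x >= 0 else -1

# Hypothetical 2-D example; x_0 = 1 absorbs the bias term theta_0.
theta = np.array([-1.0, 2.0, 0.5])   # [theta_0, theta_1, theta_2]
x = np.array([1.0, 0.8, 0.3])        # [1, x_1, x_2]
print(predict(theta, x))             # theta^T x = 0.75 >= 0, so prints 1
```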
The Perceptron

$$h(x) = \operatorname{sign}(\theta^\top x) \quad \text{where} \quad \operatorname{sign}(z) = \begin{cases} +1 & \text{if } z \ge 0 \\ -1 & \text{if } z < 0 \end{cases}$$
• The perceptron uses the following update rule each time it receives a new training instance $(x^{(i)}, y^{(i)})$:
$$\theta_j \leftarrow \theta_j - \frac{\alpha}{2} \left( h_\theta\big(x^{(i)}\big) - y^{(i)} \right) x_j^{(i)}$$
  where $h_\theta\big(x^{(i)}\big) - y^{(i)}$ is either 2 or −2 when the prediction is wrong

– If the prediction matches the label, make no change
– Otherwise, adjust θ

4
The Perceptron
• The perceptron uses the following update rule each time it receives a new training instance $(x^{(i)}, y^{(i)})$:
$$\theta_j \leftarrow \theta_j - \frac{\alpha}{2} \left( h_\theta\big(x^{(i)}\big) - y^{(i)} \right) x_j^{(i)}$$
  where $h_\theta\big(x^{(i)}\big) - y^{(i)}$ is either 2 or −2

• Re-write as $\theta_j \leftarrow \theta_j + \alpha\, y^{(i)} x_j^{(i)}$ (only upon misclassification)
  – Can eliminate α in this case, since its only effect is to scale θ by a constant, which doesn't affect performance

Perceptron Rule: If $x^{(i)}$ is misclassified, do $\theta \leftarrow \theta + y^{(i)} x^{(i)}$
5
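As a sanity check, here is a single update traced numerically in Python/NumPy (the vectors are made-up illustrative values):

```python
import numpy as np

theta = np.array([0.5, -1.0])    # current parameters
x, y = np.array([1.0, 2.0]), 1   # training instance with true label +1

print(theta @ x)                 # -1.5, so h(x) = -1: misclassified
theta = theta + y * x            # perceptron rule: theta <- theta + y x
print(theta, theta @ x)          # [1.5 1.], new score 3.5 > 0: now correct
```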
Why the Perceptron Update Works

[Figure: a misclassified positive example $x$; the update rotates $\theta_{old}$ toward $x$, giving $\theta_{new} = \theta_{old} + x$]
Based on slide by Piyush Rai 6


Why the Perceptron Update Works
• Consider the misclassified example ($y = +1$)
  – Perceptron wrongly thinks that $\theta_{old}^\top x < 0$
• Update: $\theta_{new} = \theta_{old} + yx = \theta_{old} + x$ (since $y = +1$)
• Note that
$$\theta_{new}^\top x = (\theta_{old} + x)^\top x = \theta_{old}^\top x + \underbrace{x^\top x}_{\|x\|_2^2 \,>\, 0}$$
• Therefore, $\theta_{new}^\top x$ is less negative than $\theta_{old}^\top x$
  – So, we are making ourselves more correct on this example!

Based on slide by Piyush Rai 7
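Plugging the numbers from the earlier single-update sketch into this identity makes the effect concrete:

$$\theta_{old} = \begin{bmatrix} 0.5 \\ -1 \end{bmatrix}, \quad x = \begin{bmatrix} 1 \\ 2 \end{bmatrix}, \quad y = +1: \qquad \theta_{new}^\top x = \theta_{old}^\top x + \|x\|_2^2 = -1.5 + 5 = 3.5 > 0$$

Here a single update happens to flip the example to the correct side; in general the score is only guaranteed to become less negative.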


The Perceptron Cost Function
• Prediction is correct if $y^{(i)} \theta^\top x^{(i)} > 0$
• Could have used the 0/1 loss
$$J_{0/1}(\theta) = \frac{1}{n} \sum_{i=1}^{n} \ell\big(\operatorname{sign}(\theta^\top x^{(i)}),\, y^{(i)}\big)$$
  where $\ell()$ is 0 if the prediction is correct, 1 otherwise

Doesn’t produce a useful gradient

Based on slide by Alan Fern 8


The Perceptron Cost Function
• The perceptron uses the following cost function
$$J_p(\theta) = \frac{1}{n} \sum_{i=1}^{n} \max\big(0,\, -y^{(i)} \theta^\top x^{(i)}\big)$$
  – $\max\big(0,\, -y^{(i)} \theta^\top x^{(i)}\big)$ is 0 if the prediction is correct
  – Otherwise, it is the confidence in the misprediction

Nice gradient

Based on slide by Alan Fern 9
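A minimal sketch of this cost and one subgradient of it in Python/NumPy (the row-per-example layout of X, with the bias feature already prepended, is an assumption):

```python
import numpy as np

def perceptron_cost(theta, X, y):
    """J_p(theta) = (1/n) * sum_i max(0, -y_i * theta^T x_i)."""
    margins = y * (X @ theta)                  # y_i * theta^T x_i, one per example
    return np.mean(np.maximum(0.0, -margins))

def perceptron_subgrad(theta, X, y):
    """Misclassified points (margin <= 0) each contribute -y_i * x_i / n."""
    margins = y * (X @ theta)
    mis = margins <= 0
    return -(y[mis, None] * X[mis]).sum(axis=0) / len(y)
```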


Online Perceptron Algorithm

1.) Let $\theta \leftarrow [0, 0, \ldots, 0]$
2.) Repeat:
3.)   Receive training example $(x^{(i)}, y^{(i)})$
4.)   if $y^{(i)}\, x^{(i)} \cdot \theta \le 0$  // prediction is incorrect
5.)     $\theta \leftarrow \theta + y^{(i)} x^{(i)}$

Online learning – the learning mode where the model update is performed each time a single observation is received

Batch learning – the learning mode where the model update is performed after observing the entire training set

Based on slide by Alan Fern 10
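A runnable sketch of the online loop above in Python/NumPy (the generator-style stream of (x, y) pairs is an illustrative assumption about how examples arrive):

```python
import numpy as np

def online_perceptron(stream, d):
    """Online perceptron: update theta each time one (x, y) pair is received."""
    theta = np.zeros(d)                  # 1.) theta <- [0, 0, ..., 0]
    for x, y in stream:                  # 2.-3.) receive one training example
        if y * (x @ theta) <= 0:         # 4.) prediction is incorrect
            theta = theta + y * x        # 5.) theta <- theta + y x
    return theta
```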


Online Perceptron Algorithm

Red points are labeled +
Blue points are labeled −

See the perceptron in action: [Link]/watch?v=vGwemZhPlsA


Based on slide by Alan Fern 11
Batch Perceptron
1.) Given training data $\big\{(x^{(i)}, y^{(i)})\big\}_{i=1}^{n}$
2.) Let $\theta \leftarrow [0, 0, \ldots, 0]$
3.) Repeat:
4.)   Let $\Delta \leftarrow [0, 0, \ldots, 0]$
5.)   for $i = 1 \ldots n$, do
6.)     if $y^{(i)}\, x^{(i)} \cdot \theta \le 0$  // prediction for $i$th instance is incorrect
7.)       $\Delta \leftarrow \Delta + y^{(i)} x^{(i)}$
8.)   $\Delta \leftarrow \Delta / n$  // compute average update
9.)   $\theta \leftarrow \theta + \alpha \Delta$
10.) Until $\|\Delta\|_2 < \epsilon$

• Simplest case: α = 1 and don't normalize Δ by n, which yields the fixed increment perceptron
• Guaranteed to find a separating hyperplane if one exists
Based on slide by Alan Fern 12
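A corresponding sketch of the batch version in Python/NumPy (`alpha`, `eps`, and `max_epochs` are illustrative parameter names, not from the slide):

```python
import numpy as np

def batch_perceptron(X, y, alpha=1.0, eps=1e-6, max_epochs=1000):
    """Average the per-mistake updates over the whole set before stepping."""
    n, d = X.shape
    theta = np.zeros(d)                          # 2.) theta <- [0, 0, ..., 0]
    for _ in range(max_epochs):                  # 3.) repeat
        delta = np.zeros(d)                      # 4.) Delta <- [0, 0, ..., 0]
        for i in range(n):                       # 5.) for each instance
            if y[i] * (X[i] @ theta) <= 0:       # 6.) instance i misclassified
                delta += y[i] * X[i]             # 7.) accumulate y_i x_i
        delta /= n                               # 8.) compute average update
        theta += alpha * delta                   # 9.) theta <- theta + alpha*Delta
        if np.linalg.norm(delta) < eps:          # 10.) until ||Delta||_2 < eps
            return theta
    return theta
```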
Improving the Perceptron
• The Perceptron produces many θ's during training
• The standard Perceptron simply uses the final θ at test time
  – This may sometimes not be a good idea!
  – Some other θ may be correct on 1,000 consecutive examples, but one mistake ruins it!

• Idea: Use a combination of multiple perceptrons
  – (i.e., neural networks!)
• Idea: Use the intermediate θ's
  – Voted Perceptron: vote on predictions of the intermediate θ's
  – Averaged Perceptron: average the intermediate θ's (see the sketch below)

Based on slide by Piyush Rai 13
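A minimal sketch of the averaged variant in Python/NumPy (the running-sum bookkeeping and the `epochs` parameter are implementation assumptions, not from the slide):

```python
import numpy as np

def averaged_perceptron(X, y, epochs=10):
    """Train as usual, but return the average of all intermediate thetas."""
    theta = np.zeros(X.shape[1])
    theta_sum = np.zeros_like(theta)
    count = 0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            if yi * (xi @ theta) <= 0:        # standard perceptron mistake check
                theta = theta + yi * xi       # standard perceptron update
            theta_sum += theta                # accumulate the current theta
            count += 1
    return theta_sum / count                  # averaged theta used at test time
```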
