
Harvard University, Neurobiology 101hfm.

Fundamentals in Computational Neuroscience Spring term 2014/15

Lecture notes 1 – Supervised learning & Perceptron


Alexander Mathis, Ashesh Dhawale
February 3, 2015

1 Background: Supervised learning


A supervised learning problem deals with a situation where one is given a dataset $\{x_j\}_{1 \le j \le N}$ and a teacher signal $(y_j)_{1 \le j \le N}$ (the desired output, or supervisory signal). The learning system shall "learn" to correctly assign the desired output $y_j$ to a piece of data $x_j$. The learned mapping can then be used to predict labels for data the system has never seen before. The desired output could be real valued (or a vector itself), but we will focus on binary supervisory signals for now and denote them by $+1$ and $-1$.

For instance, $x_j$ could be a picture of a handwritten digit, i.e. a vector of discretized luminance values; see Fig. 1. You can easily read each digit, but teaching a machine (computer) how to do this is not as straightforward. We will see that a perceptron can easily be trained to tell whether such a digit is a 0 or a 1.
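Concretely, each such image is just a small matrix of luminance values that gets flattened into one input vector. A minimal MATLAB sketch (the 8-by-8 size and the 0-16 value range follow the Scikit-learn digits of Fig. 1; the random values are only a stand-in for a real digit):

    img = randi([0 16], 8, 8);   % one discretized grayscale digit image (stand-in values)
    x   = img(:)';               % flattened into a 1-by-64 luminance vector, one entry per pixel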

Figure 1: Discretized, grayscale pictures of handwritten digits. Top row: Various Zeros. Bottom row: Various
Ones. Source: Scikit-learn

These example images have been taken from the digits toy dataset in Scikit-learn [1]. A similar, more general benchmark dataset is the MNIST database of handwritten digits, which contains 60,000 training examples and 10,000 test examples of 28 × 28 pixel images. Refer to Yann LeCun's MNIST website [2] for more details. Artificial neural networks are the best known algorithms for this problem [3].

2 Background: The perceptron


The perceptron is a simple yet powerful model for supervised learning. It can be traced back at least to the seminal paper by McCulloch and Pitts in 1943, where they showed that, based on such neurons, "every [Turing] computable algorithm can be implemented" [4].

Figure 2: A perceptron is a simplified neuron model that responds with +1 when w · x ≥ θ, −1 otherwise.

[1] Scikit-learn: Machine Learning in Python, Pedregosa et al., JMLR 12, pp. 2825-2830, 2011. http://scikit-learn.org/
[2] http://yann.lecun.com/exdb/mnist/
[3] D.C. Ciresan, U. Meier, L.M. Gambardella and J. Schmidhuber (2010) "Deep, Big, Simple Neural Nets for Handwritten Digit Recognition." Neural Computation, Vol. 22, No. 12, pp. 3207-3220. See also the alternative approaches listed on http://yann.lecun.com/exdb/mnist/.
[4] W.S. McCulloch and W.H. Pitts (1943) "A logical calculus of the ideas immanent in nervous activity," Bulletin of Mathematical Biophysics, Vol. 5, pp. 115-133.

A perceptron has two parameters: the synaptic weights $w = (w_1, \ldots, w_i, \ldots, w_n)$, where the $i$-th weight connects to input $x_i$, and the threshold $\theta$. For a given input vector $x$ it outputs $+1$ when $\sum_i w_i x_i \ge \theta$, and $-1$ otherwise. We write $o(x)$ for the output to input vector $x$.
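This decision rule can be written in a few lines of MATLAB; here is a minimal sketch (the function name perceptron_output is ours, not part of the notes):

    function o = perceptron_output(w, theta, x)
        % w and x are vectors of the same length; theta is the scalar threshold
        if dot(w, x) >= theta
            o = +1;    % input lies on the "positive" side of the decision boundary
        else
            o = -1;
        end
    end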

Note that the weights and the threshold of the perceptron define a hyperplane (of dimension $n-1$ in $\mathbb{R}^n$), which partitions all inputs $x \in \mathbb{R}^n$ into two regions. The region with $w \cdot x \ge \theta$ is labeled $+1$, the other region $-1$.

A supervised learning problem $\{(x^{(j)}, y^{(j)})\}_{j=1,\ldots,N}$ is called linearly separable when there are weights $w$ and a threshold $\theta$ such that for all $j \in \{1, \ldots, N\}$ the output of the perceptron with weights $w$ and threshold $\theta$ satisfies $o(x^{(j)}) = y^{(j)}$.
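Given candidate parameters, this condition can be checked directly. In the MATLAB sketch below, $X$ is assumed to be an $N$-by-$n$ matrix whose $j$-th row is $x^{(j)}$ and $y$ an $N$-by-1 vector of $\pm 1$ labels; both names are ours:

    o      = 2 * (X * w(:) >= theta) - 1;   % perceptron outputs, +1/-1, for all N examples at once
    solved = all(o == y(:));                % true exactly when o(x^(j)) = y^(j) for every j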

But how should one pick weights for a learning problem?

2.1 Perceptron learning rule


 
Assume that you are given a pair $(x^{(j)}, y^{(j)})$ and a perceptron with weights $w$ and threshold $\theta$. Either $o(x^{(j)}) = y^{(j)}$, in which case neither $w$ nor $\theta$ has to be changed, or $o(x^{(j)}) \neq y^{(j)}$. If $o(x^{(j)}) = -1$ when $y^{(j)} = 1$, then $w \cdot x^{(j)} - \theta$ should be increased. Conversely, if $o(x^{(j)}) = 1$ when $y^{(j)} = -1$, then $w \cdot x^{(j)} - \theta$ should be decreased.

The perceptron plasticity rule performs such a change:

 
$w \mapsto w + \frac{\alpha}{2}\left(y^{(j)} - o(x^{(j)})\right) x^{(j)}$    (1)
$\theta \mapsto \theta - \frac{\alpha}{2}\left(y^{(j)} - o(x^{(j)})\right).$    (2)

Here the arrow $\mapsto$ stands for the update of the weights and threshold to "learn" the $j$-th pair of data. The parameter $\alpha$ is the learning rate and determines how "fast" new information is incorporated into the parameters. Note that the updated parameters, denoted by a star, satisfy:

   
$w^* \cdot x^{(j)} - \theta^* = \left(w + \frac{\alpha}{2}\left(y^{(j)} - o(x^{(j)})\right) x^{(j)}\right) \cdot x^{(j)} - \left(\theta - \frac{\alpha}{2}\left(y^{(j)} - o(x^{(j)})\right)\right)$
$= w \cdot x^{(j)} - \theta + \frac{\alpha}{2}\left(y^{(j)} - o(x^{(j)})\right)\underbrace{\left(x^{(j)} \cdot x^{(j)} + 1\right)}_{>0}.$

Thus, the response of the updated perceptron is the same as that of the non-updated one plus an increase or decrease in the intended direction (as discussed above). The parameters are only altered when there is a discrepancy between $y^{(j)}$ and $o(x^{(j)})$; if the output is correct, nothing changes.
 
To learn a set of input pairs $\{(x^{(j)}, y^{(j)})\}_j$ one applies the learning rule repeatedly to each pair, either sequentially or in an arbitrary order. Note that applying the learning rule does not necessarily imply that $x^{(j)}$ will be correctly classified afterwards, nor that patterns that had already been "learned" will remain "learned". However, an important result states that for linearly separable learning problems the perceptron learning rule will find parameters that solve the problem [5].
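Putting the update rule and the repeated presentations together, here is a minimal MATLAB sketch of the whole procedure. The data matrix $X$ ($N$-by-$n$, one example per row), the label vector $y$ (entries $\pm 1$), the learning rate and the number of passes are illustrative choices, not prescribed by the notes:

    alpha = 0.1;                      % learning rate
    w     = zeros(size(X, 2), 1);     % initial weights
    theta = 0;                        % initial threshold
    for pass = 1:100                  % repeated presentations of the dataset
        for j = 1:size(X, 1)
            xj = X(j, :)';                                 % j-th input vector x^(j)
            oj = 2 * (dot(w, xj) >= theta) - 1;            % current output o(x^(j))
            w     = w     + (alpha/2) * (y(j) - oj) * xj;  % Eq. (1)
            theta = theta - (alpha/2) * (y(j) - oj);       % Eq. (2)
        end
    end

With enough passes, this finds a solution for any linearly separable problem, as the convergence result above guarantees; the fixed cap of 100 passes is only for illustration.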

3 Boolean formulas
A truth function is a mapping from a set of truth values to truth values. The domain and range (in classical logic) are the binary values {true (T), false (F)}. Of one binary variable there are only $2^2 = 4$ truth functions (tautology, contradiction, identity, and negation). Any such mapping can be summarized by a truth table:

  tautology (⊤)     contradiction (⊥)     identity        negation (¬)
  in   out          in   out              in   out        in   out
  T    T            T    F                T    T          T    F
  F    T            F    F                F    F          F    T

Perhaps more interesting are the mappings from two binary variables; here are a few of the binary truth functions:
  A   B  |  tautology (⊤)   conjunction (A ∧ B)   implication (A → B)   exclusive disjunction / XOR (A ↮ B)
  T   T  |       T                  T                     T                           F
  T   F  |       T                  F                     F                           T
  F   T  |       T                  F                     T                           T
  F   F  |       T                  F                     T                           F
[5] Refer to Dayan & Abbott (2001) "Theoretical Neuroscience" for the proof.

With these functions, one can build syntactically sophisticated sentences, like ((A ∧ B) → C) ↔ (A → (B → C)) or (¬B → ¬A) → (A → B), which are both tautologies and thus correct ways to reason. We have already encountered Boolean expressions as logical operators when programming in MATLAB. As logical gates they are also fundamental building blocks of digital circuits and allow the implementation of algorithms.
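These truth functions map directly onto MATLAB's logical operators; a short sketch (A and B are logical scalars, with true playing the role of T and false of F):

    A = true;  B = false;       % one example assignment of the two variables
    conjAB = A & B;             % conjunction A ∧ B
    implAB = ~A | B;            % implication A → B (false only when A is true and B is false)
    xorAB  = xor(A, B);         % exclusive disjunction A ↮ B (XOR)
    tautAB = true;              % a tautology is true for every assignment of A and B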
