Competitive Learning Extended
Lecture # 8
• The inner product of two equal-norm vectors is maximum when they are
parallel (closest in the sense of the angle between the input and the prototypes)
• Therefore the neuron of the FF layer with the highest output is
considered to be closest to the input vector
• The output of the FF layer is fed to the recurrent (competitive) layer. The
neurons of the competitive layer compete; at the end of the competition
only the winning neuron has a nonzero value.
‘poslin’ (the positive linear transfer function) is linear for positive values and zero for negative values
The output of each neuron is decreased in proportion to the outputs of the
others: a larger value is decreased less, and a smaller value is decreased
more. Each iteration computes

a_i(t+1) = poslin( a_i(t) − ε Σ_{j≠i} a_j(t) )

At the end, only one neuron in the recurrent layer has a nonzero value. (It
effectively is the same neuron with the maximum output in the first layer.)
ε is a design parameter with a value less than 1/(S−1)
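The recurrent competition can be sketched in a few lines of NumPy (a minimal sketch; the function names and the particular choice of ε are illustrative, not from the slides):

```python
import numpy as np

def poslin(x):
    """Positive linear transfer function: linear for x > 0, zero otherwise."""
    return np.maximum(x, 0.0)

def compete(a0, eps=None, max_iter=100):
    """Iterate the recurrent (competitive) layer until one neuron survives.

    Each step applies a_i(t+1) = poslin(a_i(t) - eps * sum_{j != i} a_j(t)),
    so every neuron is suppressed in proportion to the others' outputs.
    """
    a = np.asarray(a0, dtype=float)
    S = a.size
    if eps is None:
        eps = 0.5 / (S - 1)  # any eps < 1/(S-1) works; this is one choice
    for _ in range(max_iter):
        a = poslin(a - eps * (a.sum() - a))  # subtract the *other* outputs
        if np.count_nonzero(a) <= 1:
            break
    return a

# The neuron with the largest initial output wins the competition.
winner = compete([3.0, 5.0, 1.0])
```

After convergence only index 1 (the largest initial output) remains nonzero, confirming that the competition just selects the layer-1 maximum.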
Let’s test our Hamming network on an orange.
The output of the FF layer (Layer 1):
• Recall that a single instar can recognize only one pattern; for a multiclass
problem we need multiple instars. This is accomplished in the Hamming
network
• Let there be Q prototypes
• The number of neurons (S) equals the number of prototypes (Q). The output
of the first layer:
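In the standard Hamming design, the rows of W1 are the prototype vectors and each bias equals R, the input dimension, so the layer-1 outputs are nonnegative for ±1-valued inputs. A sketch under that assumption (names are illustrative):

```python
import numpy as np

def hamming_ff(prototypes, p):
    """First (feedforward) layer of a Hamming network.

    The rows of W1 are the Q prototype vectors; each bias equals R (the
    input length), which keeps every output nonnegative for +/-1 inputs.
    The largest output then marks the prototype closest in inner-product
    (angle) terms to the input.
    """
    W1 = np.asarray(prototypes, dtype=float)  # shape (Q, R): one row per prototype
    b1 = np.full(W1.shape[0], W1.shape[1])    # each bias = R
    return W1 @ np.asarray(p, dtype=float) + b1

# Two +/-1 prototypes; the input matches the second one more closely.
protos = [[1, -1, -1], [1, 1, -1]]
a1 = hamming_ff(protos, [1, 1, 1])  # -> [2., 4.], so prototype 2 wins
```

Feeding `a1` into the recurrent layer would then leave only neuron 2 active.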
• When an input p is presented, the winning neuron and its neighbors
move their weight vectors towards p, i.e. they learn p
• The concept of neighborhood: (the figures illustrate neighborhoods of
radius 1 and radius 2)
• A 2-D topology of neurons is not mandatory; neurons can be arranged in
any dimension
• SOFM demonstration:
• Initially the weights are selected at random; we then train the SOFM
with the training vectors (indicated in black)
Each time a training vector is presented, the neuron with the closest weight
vector wins the competition. The winning neuron and its neighbors
move their weight vectors closer to the input vector (and therefore to
each other). Here we have used a neighborhood of radius 1.
Improving Feature Maps
• Varying the neighborhood: initially the neighborhood is large, and it
shrinks continuously as training progresses until it includes only the
winning neuron. This speeds up self-organization.
• The learning rate can also be varied over time: initially it is taken close
to 1 to learn the patterns quickly, then gradually decreased to stabilize
the weights.
• Another alteration: giving the winning neuron a larger learning rate
than its neighbors also speeds up the SOM.
• Rather than correlation (the inner product), the simple Euclidean
distance can be used to determine which neuron is closest to the input
vector.
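The improvements above (shrinking neighborhood, decaying learning rate, Euclidean winner selection) can be combined in a minimal 1-D SOM sketch; the decay schedules and all names here are illustrative choices, not prescribed by the slides:

```python
import numpy as np

rng = np.random.default_rng(0)

def train_som(data, n_units=10, epochs=50):
    """1-D SOM sketch: learning rate and neighborhood radius both decay
    linearly over training, and the winner is the unit whose weight vector
    is closest in Euclidean distance to the input."""
    w = rng.random((n_units, data.shape[1]))  # random initial weights
    for t in range(epochs):
        lr = 1.0 * (1 - t / epochs)                           # rate decays from 1
        radius = int(round((n_units / 2) * (1 - t / epochs)))  # shrinking neighborhood
        for x in data:
            win = np.argmin(np.linalg.norm(w - x, axis=1))     # closest weight vector
            lo, hi = max(0, win - radius), min(n_units, win + radius + 1)
            w[lo:hi] += lr * (x - w[lo:hi])  # move winner and neighbors toward x
        # (a further refinement would give the winner a larger lr than neighbors)
    return w

weights = train_som(rng.random((100, 2)))
```

By the final epochs the neighborhood contains only the winner, so each unit settles onto its own cluster of inputs.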
Another Self-Organizing Map
(SOM) Example
• From Fausett (1994)
• Dimension of feature space = 4, number of clusters = 2
– More typical of SOM application
– Smaller number of units in output than in input:
dimensionality reduction
• Training samples:
i1: (1, 1, 0, 0)
i2: (0, 0, 0, 1)
i3: (1, 0, 0, 0)
i4: (0, 0, 1, 1)
• Network architecture: 4 input units fully connected to output units 1 and 2
What should we expect as outputs?
What are the (Squared) Euclidean Distances
Between the Data Samples?
• Training samples:
i1: (1, 1, 0, 0)
i2: (0, 0, 0, 1)
i3: (1, 0, 0, 0)
i4: (0, 0, 1, 1)
• Fill in the table of d² values (diagonal entries are 0):
d²   i1   i2   i3   i4
i1   0
i2        0
i3             0
i4                  0
(Squared) Euclidean Distances Between Data
Samples
• Training samples:
i1: (1, 1, 0, 0)
i2: (0, 0, 0, 1)
i3: (1, 0, 0, 0)
i4: (0, 0, 1, 1)
• Table of d² values:
d²   i1   i2   i3   i4
i1   0
i2   3    0
i3   1    2    0
i4   4    1    3    0
• Training samples: i1: (1, 1, 0, 0), i2: (0, 0, 0, 1), i3: (1, 0, 0, 0), i4: (0, 0, 1, 1)
• Network architecture: 4 input units, 2 output units (1 and 2)
• With only 2 outputs, neighborhood = 0
– Only the weights associated with the winning output unit (cluster) are updated at each
iteration
• Learning rate:
α(t) = 0.6,        1 ≤ t ≤ 4
α(t) = 0.5 α(1),   5 ≤ t ≤ 8
α(t) = 0.5 α(5),   9 ≤ t ≤ 12
etc. (i.e. the rate is halved after every block of four presentations)
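This schedule can be written as a one-line function (a sketch; the closed form is just a restatement of the halving rule above):

```python
def alpha(t):
    """Learning-rate schedule from the example: 0.6 for presentations
    1-4, then halved after every block of four (0.3 for 5-8, 0.15 for
    9-12, and so on)."""
    return 0.6 * 0.5 ** ((t - 1) // 4)

# alpha(1) == 0.6, alpha(5) == 0.3, alpha(9) == 0.15
```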
• Initial weight matrix (random values between 0 and 1):
Unit 1: .2 .6 .5 .9
Unit 2: .8 .4 .7 .3

d² = (Euclidean distance)² = Σ_{k=1}^{n} (i_{l,k} − w_{j,k}(t))²

Weight update: w_j(t+1) = w_j(t) + α(t)(i_l − w_j(t))
Problem: Calculate the weight updates for the first four training samples.
First Weight Update
• Training samples: i1: (1, 1, 0, 0), i2: (0, 0, 0, 1), i3: (1, 0, 0, 0), i4: (0, 0, 1, 1)
• Weight matrix: Unit 1: .2 .6 .5 .9; Unit 2: .8 .4 .7 .3
• Training sample: i1
– Unit 1 weights
• d² = (.2−1)² + (.6−1)² + (.5−0)² + (.9−0)² = 1.86
– Unit 2 weights
• d² = (.8−1)² + (.4−1)² + (.7−0)² + (.3−0)² = .98
– Unit 2 wins
– Weights on the winning unit are updated:
new unit 2 weights = [.8 .4 .7 .3] + 0.6([1 1 0 0] − [.8 .4 .7 .3]) = [.92 .76 .28 .12]
– Giving an updated weight matrix:
Unit 1: .2 .6 .5 .9
Unit 2: .92 .76 .28 .12
Second Weight Update
• Weight matrix: Unit 1: .2 .6 .5 .9; Unit 2: .92 .76 .28 .12
• Training sample: i2
– Unit 1 weights
• d² = (.2−0)² + (.6−0)² + (.5−0)² + (.9−1)² = .66
– Unit 2 weights
• d² = (.92−0)² + (.76−0)² + (.28−0)² + (.12−1)² = 2.28
– Unit 1 wins
– Weights on the winning unit are updated:
new unit 1 weights = [.2 .6 .5 .9] + 0.6([0 0 0 1] − [.2 .6 .5 .9]) = [.08 .24 .20 .96]
– Giving an updated weight matrix:
Unit 1: .08 .24 .20 .96
Unit 2: .92 .76 .28 .12
Third Weight Update
• Weight matrix: Unit 1: .08 .24 .20 .96; Unit 2: .92 .76 .28 .12
• Training sample: i3
– Unit 1 weights
• d² = (.08−1)² + (.24−0)² + (.2−0)² + (.96−0)² = 1.87
– Unit 2 weights
• d² = (.92−1)² + (.76−0)² + (.28−0)² + (.12−0)² = 0.68
– Unit 2 wins
– Weights on the winning unit are updated:
new unit 2 weights = [.92 .76 .28 .12] + 0.6([1 0 0 0] − [.92 .76 .28 .12]) = [.97 .30 .11 .05]
– Giving an updated weight matrix:
Unit 1: .08 .24 .20 .96
Unit 2: .97 .30 .11 .05
Fourth Weight Update
• Weight matrix: Unit 1: .08 .24 .20 .96; Unit 2: .97 .30 .11 .05
• Training sample: i4
– Unit 1 weights
• d² = (.08−0)² + (.24−0)² + (.2−1)² + (.96−1)² = .71
– Unit 2 weights
• d² = (.97−0)² + (.30−0)² + (.11−1)² + (.05−1)² = 2.74
– Unit 1 wins
– Weights on the winning unit are updated:
new unit 1 weights = [.08 .24 .20 .96] + 0.6([0 0 1 1] − [.08 .24 .20 .96]) = [.03 .10 .68 .98]
– Giving an updated weight matrix:
Unit 1: .03 .10 .68 .98
Unit 2: .97 .30 .11 .05
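The four hand-computed updates can be reproduced in a few lines (a sketch; variable names are illustrative):

```python
import numpy as np

# Training samples i1..i4 and the initial weight matrix from the example
samples = np.array([[1, 1, 0, 0],
                    [0, 0, 0, 1],
                    [1, 0, 0, 0],
                    [0, 0, 1, 1]], dtype=float)
w = np.array([[.2, .6, .5, .9],   # Unit 1
              [.8, .4, .7, .3]])  # Unit 2
alpha = 0.6  # learning rate for t = 1..4

for x in samples:  # one pass over i1..i4, neighborhood = 0
    win = np.argmin(((w - x) ** 2).sum(axis=1))  # smallest d² wins
    w[win] += alpha * (x - w[win])               # update only the winner

# After the four updates, w matches the slides (to two decimal places):
# Unit 1: [.03 .10 .68 .98], Unit 2: [.97 .30 .11 .05]
```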
Applying the SOM Algorithm
• Testing each data sample against the solution weights (Unit 1: 0 0 .5 1.0; Unit 2: 1.0 .5 0 0):
• Sample: i1
– Distance from unit 1 weights:
• (1−0)² + (1−0)² + (0−.5)² + (0−1.0)² = 1 + 1 + .25 + 1 = 3.25
– Distance from unit 2 weights:
• (1−1)² + (1−.5)² + (0−0)² + (0−0)² = 0 + .25 + 0 + 0 = .25 (winner)
• Sample: i2
– Distance from unit 1 weights:
• (0−0)² + (0−0)² + (0−.5)² + (1−1.0)² = 0 + 0 + .25 + 0 = .25 (winner)
– Distance from unit 2 weights:
• (0−1)² + (0−.5)² + (0−0)² + (1−0)² = 1 + .25 + 0 + 1 = 2.25

d² = (Euclidean distance)² = Σ_{k=1}^{n} (i_{l,k} − w_{j,k}(t))²
Solution Weights
• Training samples: i1: (1, 1, 0, 0), i2: (0, 0, 0, 1), i3: (1, 0, 0, 0), i4: (0, 0, 1, 1)
• Solution weights:
Unit 1: 0 0 .5 1.0
Unit 2: 1.0 .5 0 0
• Network architecture: 4 input units, 2 output units (1 and 2)
• Sample: i3
– Distance from unit 1 weights:
• (1−0)² + (0−0)² + (0−.5)² + (0−1.0)² = 1 + 0 + .25 + 1 = 2.25
– Distance from unit 2 weights:
• (1−1)² + (0−.5)² + (0−0)² + (0−0)² = 0 + .25 + 0 + 0 = .25 (winner)
• Sample: i4
– Distance from unit 1 weights:
• (0−0)² + (0−0)² + (1−.5)² + (1−1.0)² = 0 + 0 + .25 + 0 = .25 (winner)
– Distance from unit 2 weights:
• (0−1)² + (0−.5)² + (1−0)² + (1−0)² = 1 + .25 + 1 + 1 = 3.25
• So unit 1 clusters {i2, i4} and unit 2 clusters {i1, i3}
Examples of Applications
Summary
• Unsupervised learning is very common
• Unsupervised learning requires redundancy in the stimuli
• Self organization is a basic property of the brain’s
computational structure
• SOMs are based on
– competition (wta units)
– cooperation
– synaptic adaptation
• SOMs conserve topological relationships between
the stimuli
• Artificial SOMs have many applications in
computational neuroscience
Learning Vector Quantization (LVQ)
• LVQ is a hybrid network: it uses
both supervised and unsupervised
learning
• The number of neurons in the first
layer (S1) is usually greater than
the number of neurons in the second
layer (S2)
• In layer 1, several neurons are assigned
to the same class; in layer 2, these subclasses are combined into classes
• The output of the first layer:
• If the classification is incorrect, it means that the wrong neuron won the
competition, so we move it away from the input
• We now must choose the number of subclasses each class consists of.
Let’s assume 2 subclasses per class (i.e. each class consists of 2 convex
regions). This implies 4 neurons in layer 1 (the hidden layer). A class is
the union of its subclasses, which allows us to combine disconnected
regions into a single class. Layer 2 will have 2 neurons (equal to the
number of classes)
• We design W^2 (which will not be altered further)
• We initialize W^1 randomly
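The LVQ1 update rule described above (move the winning subclass prototype toward a correctly classified input, away from a misclassified one) can be sketched as follows; the prototype values, labels, and learning rate here are hypothetical illustrations:

```python
import numpy as np

def lvq1_step(W1, labels, x, target, lr=0.1):
    """One LVQ1 update.

    The closest layer-1 (subclass) prototype wins the competition; it moves
    toward the input if its class is correct, and away otherwise.
    `labels[i]` is the class that the fixed W2 assigns to subclass i.
    """
    win = np.argmin(np.linalg.norm(W1 - x, axis=1))  # Euclidean winner
    if labels[win] == target:
        W1[win] += lr * (x - W1[win])  # correct class: move toward the input
    else:
        W1[win] -= lr * (x - W1[win])  # wrong class: move away from the input
    return win

# 4 subclass prototypes, 2 per class (classes 0 and 1); hypothetical values
W1 = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
labels = [0, 0, 1, 1]  # the subclass-to-class mapping that W2 would encode
win = lvq1_step(W1, labels, np.array([0.9, 0.1]), target=1)
# prototype 2 wins, has the correct class, and moves toward (0.9, 0.1)
```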