
Competitive Learning

Lecture # 8

Prepared by
Dr. Imran Naseem


Introduction
We will discuss three types of networks that adaptively learn to classify
patterns: (1) the Competitive Network, (2) the Feature Map, and (3) the
Learning Vector Quantization (LVQ) Network.
These are very similar (in structure and operation) to the Hamming network.
Hamming Network: used for (binary) pattern classification; it stores prototypes
and decides on the basis of the 'closeness' of the input vector to the prototypes.

S = number of neurons = number of prototypes/classes; R = dimension of the feature space
• Consists of two layers with an equal number of neurons (S).
• Feedforward layer: performs correlation between the input pattern and the
prototypes. The prototypes (training samples) are set to be the weights of
the first layer. Consider a classification problem between apples and oranges:
p_1 is the orange prototype and p_2 is the apple prototype.

• So the output of the feedforward layer (in the standard form, with the rows of
$W^1$ set to the prototypes and a bias of R on each neuron):

$a^1 = W^1 p + b^1 = \begin{bmatrix} p_1^T p + R \\ p_2^T p + R \end{bmatrix}$
• The inner product of two equal-length vectors is maximum when they are
parallel (closest in the sense of the angle between the input and the prototype).
• Therefore the neuron of the FF layer with the highest output is
considered to be closest to the input vector.
• The output of the FF layer is fed to the recurrent (competitive) layer; the
neurons of the competitive layer compete, and at the end of the competition
only the winning neuron has a nonzero value. The layer-2 iteration has the
standard form $a^2(t+1) = \mathrm{poslin}(W^2 a^2(t))$, where $W^2$ has 1's on
its diagonal and $-\varepsilon$ elsewhere.
• 'poslin' is linear for positive values and zero for negative values.
• The output of each neuron is decreased in proportion to the outputs of the
others: a larger value is decreased less and a smaller value is decreased more.
The recurrent layer therefore settles with only one neuron having a nonzero
value (effectively the same neuron that had the maximum output in the first layer).
• $\varepsilon$ is a design parameter with value less than 1/(S-1).
Let’s test our Hamming network on an orange
The output of the FF layer:
Layer 1:
• Recall that a single instar can recognize only one pattern; for a multiclass
problem we need multiple instars. This is accomplished in the Hamming network.
• Let's have Q prototypes $\{p_1, p_2, \ldots, p_Q\}$.
• The number of neurons (S) equals the number of prototypes (Q). The output of
the first layer (standard form):

$a^1 = W^1 p + b^1 = \begin{bmatrix} p_1^T p + R \\ \vdots \\ p_Q^T p + R \end{bmatrix}$

• Note that no nonlinearity is used in the first layer.


Layer 2:
In a single instar we used the hardlim nonlinearity to decide the output of the
instar. For multiple instars we will use a competitive layer.
The first layer output is used to initialize the second layer: $a^2(0) = a^1$.
To simplify things we will denote the competitive layer by a = compet(n).


It works by finding the index i* of the neuron with the largest net input,
setting its output to 1, and setting all other outputs to 0:

$a_i = \begin{cases} 1, & i = i^* \\ 0, & i \neq i^* \end{cases}$ where $n_{i^*} \geq n_i$ for all $i$
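In code, compet can be sketched as follows (a minimal NumPy sketch; the function body and the lowest-index tie-breaking are my own choices, not part of the slides):

```python
import numpy as np

def compet(n):
    """Competitive transfer function: output 1 for the neuron with the
    largest net input and 0 for all the others."""
    a = np.zeros_like(n, dtype=float)
    a[np.argmax(n)] = 1.0   # argmax breaks ties by lowest index
    return a

print(compet(np.array([0.3, 0.9, 0.1])))  # -> [0. 1. 0.]
```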
• Competitive Learning: We can now design a competitive network
classifier by setting the rows of W to the prototype vectors. But we need
a learning rule that can be used to train the weights in a competitive
network without knowing the prototype vectors.
• One such rule is the instar rule:

$w_i(q) = w_i(q-1) + \alpha\, a_i(q)\,(p(q) - w_i(q-1))$

• In competitive networks the output (a) is nonzero only for the winning
neuron (i*); we can therefore use the Kohonen rule:

$w_{i^*}(q) = w_{i^*}(q-1) + \alpha\,(p(q) - w_{i^*}(q-1)) = (1-\alpha)\,w_{i^*}(q-1) + \alpha\, p(q)$

• And for all other neurons:

$w_i(q) = w_i(q-1), \quad i \neq i^*$

• Thus the row of the weight matrix ‘W’ which is closest to
the input vector ‘p’ moves towards ‘p’
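A minimal sketch of one competitive-learning step under the Kohonen rule (the function name and the use of the inner product Wp to find the winner, as in the Hamming-style first layer, are my own choices):

```python
import numpy as np

def competitive_step(W, p, alpha=0.5):
    """One competitive-learning step: the row of W whose inner product
    with p is largest wins, and only that row moves towards p."""
    i_star = np.argmax(W @ p)             # winner-take-all competition
    W[i_star] += alpha * (p - W[i_star])  # Kohonen rule for the winner
    return W, i_star
```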
Competitive Networks - Example

• Our competitive network will have 3 neurons (3 classes); we initialize the
three weight vectors randomly.

• Let's present the sample 'p_2' to the network.
• The 2nd neuron turns out to be the closest to the input vector, so i* = 2; we
therefore update the weights of the 2nd neuron only.
• The 2nd weight vector has moved towards p_2.


• If we continue our learning by presenting randomly picked inputs to the
network, each weight vector will point in the direction of a particular cluster
and will become a prototype for that cluster (see the sketch below).
• Once learning is completed, a test pattern is assigned to the cluster
whose weight vector is closest.
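For instance, a small simulation of this clustering behaviour (the synthetic data, seed, and parameter values are my own illustrative choices, and Euclidean distance is used as the closeness measure, as the slides suggest later):

```python
import numpy as np

rng = np.random.default_rng(0)
# three synthetic 2-D clusters of 50 points each
data = np.vstack([rng.normal(c, 0.1, size=(50, 2))
                  for c in ([0.0, 1.0], [1.0, 0.0], [1.0, 1.0])])
W = rng.random((3, 2))        # 3 neurons, random initial weight vectors
alpha = 0.1

for _ in range(20):           # a few passes over the data
    for p in rng.permutation(data):
        i_star = np.argmin(np.linalg.norm(W - p, axis=1))  # closest weight wins
        W[i_star] += alpha * (p - W[i_star])               # Kohonen rule

print(np.round(W, 2))         # each row ends up near one cluster centre
```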
• Problem 1 - Learning rate: a low value (near zero) gives slow learning but
stable weights, i.e. once a weight vector reaches the center of a cluster it
tends to remain at the center.
A high value (near one) gives fast learning but unstable weights, i.e. the
weight vector reaches the cluster center very fast but oscillates as learning
continues.
• Problem 2: What if clusters are close together? One weight vector (prototype)
may invade the territory of another weight vector and upset the
classification scheme.
• Problem 3: The competitive layer should have as many neurons as there are
classes. What if the number of clusters is not known in advance?
• Problem 4: What if a class is a union of unconnected regions?
• Problem 5: Occasionally the initial weights of a neuron may lie far away from
all input vectors; that neuron will then never win a competition (and will
never learn): a dead neuron.
– One possible solution: add a negative bias to the net input of each neuron
and make the winning neuron's bias more negative after every win. This makes
winning harder for an often-winning neuron, as sketched below.
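A sketch of this bias ('conscience') mechanism; the decay constant and the exact update form are my own assumptions, since several variants exist:

```python
import numpy as np

def conscience_step(W, b, p, alpha=0.1, gamma=0.01):
    """Competitive step with a bias penalty on frequent winners.
    b is a float array of per-neuron biases, initially all zero."""
    n = -np.linalg.norm(W - p, axis=1) + b   # net input: closeness plus bias
    i_star = np.argmax(n)                    # biased competition
    W[i_star] += alpha * (p - W[i_star])     # Kohonen rule for the winner
    b[i_star] -= gamma                       # winner's bias grows more negative
    return W, b, i_star
```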
Self-Organizing Feature Maps (SOFM)

• Biological Perspective: a winning neuron not only reinforces itself but
also the neighboring neurons.
• Inspired by biology, Kohonen proposed this modification to the
competitive networks.
• The winning neuron i* is found using the same procedure, but the weight
update is also applied to the neurons lying in the neighborhood of radius d
around the winning neuron:

$w_i(q) = w_i(q-1) + \alpha\,(p(q) - w_i(q-1)), \quad i \in N_{i^*}(d)$

• When an input 'p' is presented, the weight vectors of the winning neuron and
its neighbors move towards 'p', i.e. they learn 'p'.
• The concept of neighborhood: $N_i(d) = \{\, j : \mathrm{dist}(i,j) \leq d \,\}$
(Figures: neighborhoods of radius 1 and radius 2 on a 2-D grid of neurons)
• 2D topology of neurons is not mandatory. Neurons can be arranged in
any dimension
• SOFM Demonstration: initially the weights are selected at random; we then
train the SOFM with the training vectors (indicated in black).
Each time a training vector is presented, the neuron with the closest weight
vector wins the competition. The winning neuron and its neighbors move their
weight vectors closer to the input vector (and therefore to each other); here
a neighborhood of radius 1 is used.
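A minimal sketch of one SOFM update on a rectangular grid of neurons (the grid shape, the Manhattan distance between grid positions, and the parameter values are my own assumptions):

```python
import numpy as np

def sofm_step(W, grid, p, alpha=0.5, radius=1):
    """One SOFM update: the winner and every neuron whose grid position
    lies within `radius` of the winner's move towards the input p."""
    i_star = np.argmin(np.linalg.norm(W - p, axis=1))    # winning neuron
    grid_dist = np.abs(grid - grid[i_star]).sum(axis=1)  # distance on the grid
    neighborhood = grid_dist <= radius                   # N_{i*}(d)
    W[neighborhood] += alpha * (p - W[neighborhood])     # move towards p
    return W, i_star

# 3x3 grid of neurons whose weight vectors live in a 2-D feature space
grid = np.array([[r, c] for r in range(3) for c in range(3)])
W = np.random.default_rng(1).random((9, 2))
W, winner = sofm_step(W, grid, np.array([0.9, 0.1]))
```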
Improving Feature Maps
• Varying neighborhood: initially we use a large neighborhood, which is
continuously decreased as training progresses. Ultimately the neighborhood
includes only the winning neuron; this speeds up the self-organization.
• The learning rate can also be varied over time: initially it is taken to be
1 to quickly learn the patterns, and it is then gradually decreased to
stabilize the weights.
• Another alteration: give the winning neuron a larger learning rate than the
neighboring neurons; this also speeds up the SOM.
• Rather than using correlation, the simple Euclidean distance can be used to
determine which neuron is closest to the input vector. Simple schedule
functions for the first two ideas are sketched below.
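A sketch of decaying schedules for the learning rate and neighborhood radius (the exponential form and the time constant are my own illustrative choices):

```python
import numpy as np

def schedules(t, alpha0=1.0, radius0=3.0, tau=50.0):
    """Learning rate and neighborhood radius that both shrink over time t."""
    alpha = alpha0 * np.exp(-t / tau)                # starts at 1, decays
    radius = int(round(radius0 * np.exp(-t / tau)))  # eventually only the winner
    return alpha, radius

print(schedules(0))    # -> (1.0, 3)
print(schedules(200))  # -> (~0.018, 0)
```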
Another Self-Organizing Map
(SOM) Example
• From Fausett (1994)
• Dimension of feature space = 4, number of clusters = 2
  – More typical of SOM applications
  – Smaller number of units in the output than in the input: dimensionality
reduction
• Training samples:
  i1: (1, 1, 0, 0)
  i2: (0, 0, 0, 1)
  i3: (1, 0, 0, 0)
  i4: (0, 0, 1, 1)
• Network architecture: 4 input units, 2 output units (1 and 2)
What should we expect as outputs?
What are the Euclidean Distances Between the Data Samples?
• Training samples:
  i1: (1, 1, 0, 0)
  i2: (0, 0, 0, 1)
  i3: (1, 0, 0, 0)
  i4: (0, 0, 1, 1)

       i1   i2   i3   i4
  i1    0
  i2         0
  i3              0
  i4                   0
Euclidean Distances Between Data Samples
• Training samples:
  i1: (1, 1, 0, 0)
  i2: (0, 0, 0, 1)
  i3: (1, 0, 0, 0)
  i4: (0, 0, 1, 1)

Squared Euclidean distances between the samples:

       i1   i2   i3   i4
  i1    0
  i2    3    0
  i3    1    2    0
  i4    4    1    3    0

Network: 4 input units, 2 output units. What might we expect from the SOM?
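A quick check of this table (a sketch; it confirms the entries are squared distances):

```python
import numpy as np

samples = np.array([[1, 1, 0, 0],
                    [0, 0, 0, 1],
                    [1, 0, 0, 0],
                    [0, 0, 1, 1]])
# pairwise squared Euclidean distances
d2 = ((samples[:, None, :] - samples[None, :, :]) ** 2).sum(axis=2)
print(d2)   # lower triangle: 3, 1, 2, 4, 1, 3 -- matching the table
```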
Example Details
• Network architecture: 4 input units, 2 output units
• Training samples:
  i1: (1, 1, 0, 0)
  i2: (0, 0, 0, 1)
  i3: (1, 0, 0, 0)
  i4: (0, 0, 1, 1)
• With only 2 outputs, neighborhood = 0
  – Only update weights associated with the winning output unit (cluster) at
each iteration
• Learning rate:
  α(t) = 0.6, 1 ≤ t ≤ 4
  α(t) = 0.5 α(1), 5 ≤ t ≤ 8
  α(t) = 0.5 α(5), 9 ≤ t ≤ 12
  etc.
• Initial weight matrix (random values between 0 and 1):
  Unit 1: [.2 .6 .5 .9]
  Unit 2: [.8 .4 .7 .3]
• $d^2 = (\text{Euclidean distance})^2 = \sum_{k=1}^{n} (i_{l,k} - w_{j,k}(t))^2$
• Weight update: $w_j(t+1) = w_j(t) + \alpha(t)\,(i_l - w_j(t))$
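In code, these two formulas might look like this (a sketch; the variable names are mine):

```python
import numpy as np

def squared_distance(i_l, w_j):
    """d^2 = sum_k (i_{l,k} - w_{j,k})^2"""
    return ((i_l - w_j) ** 2).sum()

def weight_update(w_j, i_l, alpha):
    """w_j(t+1) = w_j(t) + alpha(t) * (i_l - w_j(t))"""
    return w_j + alpha * (i_l - w_j)

w = weight_update(np.array([.8, .4, .7, .3]), np.array([1, 1, 0, 0]), 0.6)
print(np.round(w, 2))   # -> [0.92 0.76 0.28 0.12], as in the first update below
```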
Problem: Calculate the weight updates for the first four training samples.

First Weight Update
• Training sample: i1 = (1, 1, 0, 0)
• Current weights: Unit 1: [.2 .6 .5 .9], Unit 2: [.8 .4 .7 .3]
  – Unit 1 weights:
    d² = (.2-1)² + (.6-1)² + (.5-0)² + (.9-0)² = 1.86
  – Unit 2 weights:
    d² = (.8-1)² + (.4-1)² + (.7-0)² + (.3-0)² = .98
  – Unit 2 wins
  – Weights on the winning unit are updated:
    new unit 2 weights = [.8 .4 .7 .3] + 0.6([1 1 0 0] - [.8 .4 .7 .3]) = [.92 .76 .28 .12]
  – Giving an updated weight matrix:
    Unit 1: [.2 .6 .5 .9]
    Unit 2: [.92 .76 .28 .12]
Second Weight Update
• Training sample: i2 = (0, 0, 0, 1)
• Current weights: Unit 1: [.2 .6 .5 .9], Unit 2: [.92 .76 .28 .12]
  – Unit 1 weights:
    d² = (.2-0)² + (.6-0)² + (.5-0)² + (.9-1)² = .66
  – Unit 2 weights:
    d² = (.92-0)² + (.76-0)² + (.28-0)² + (.12-1)² = 2.28
  – Unit 1 wins
  – Weights on the winning unit are updated:
    new unit 1 weights = [.2 .6 .5 .9] + 0.6([0 0 0 1] - [.2 .6 .5 .9]) = [.08 .24 .20 .96]
  – Giving an updated weight matrix:
    Unit 1: [.08 .24 .20 .96]
    Unit 2: [.92 .76 .28 .12]
Third Weight Update
• Training sample: i3 = (1, 0, 0, 0)
• Current weights: Unit 1: [.08 .24 .20 .96], Unit 2: [.92 .76 .28 .12]
  – Unit 1 weights:
    d² = (.08-1)² + (.24-0)² + (.2-0)² + (.96-0)² = 1.87
  – Unit 2 weights:
    d² = (.92-1)² + (.76-0)² + (.28-0)² + (.12-0)² = 0.68
  – Unit 2 wins
  – Weights on the winning unit are updated:
    new unit 2 weights = [.92 .76 .28 .12] + 0.6([1 0 0 0] - [.92 .76 .28 .12]) = [.97 .30 .11 .05]
  – Giving an updated weight matrix:
    Unit 1: [.08 .24 .20 .96]
    Unit 2: [.97 .30 .11 .05]
Fourth Weight Update
• Training sample: i4 = (0, 0, 1, 1)
• Current weights: Unit 1: [.08 .24 .20 .96], Unit 2: [.97 .30 .11 .05]
  – Unit 1 weights:
    d² = (.08-0)² + (.24-0)² + (.2-1)² + (.96-1)² = .71
  – Unit 2 weights:
    d² = (.97-0)² + (.30-0)² + (.11-1)² + (.05-1)² = 2.73
  – Unit 1 wins
  – Weights on the winning unit are updated:
    new unit 1 weights = [.08 .24 .20 .96] + 0.6([0 0 1 1] - [.08 .24 .20 .96]) = [.03 .10 .68 .98]
  – Giving an updated weight matrix:
    Unit 1: [.03 .10 .68 .98]
    Unit 2: [.97 .30 .11 .05]
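The four updates above can be reproduced with a short script (a sketch; it assumes the samples are presented in order with neighborhood 0 and α = 0.6, as specified):

```python
import numpy as np

samples = np.array([[1, 1, 0, 0], [0, 0, 0, 1],
                    [1, 0, 0, 0], [0, 0, 1, 1]], dtype=float)
W = np.array([[.2, .6, .5, .9],    # unit 1 initial weights
              [.8, .4, .7, .3]])   # unit 2 initial weights
alpha = 0.6

for t, i_l in enumerate(samples, start=1):
    d2 = ((W - i_l) ** 2).sum(axis=1)       # squared distance to each unit
    winner = np.argmin(d2)                  # competition
    W[winner] += alpha * (i_l - W[winner])  # update the winner only
    print(f"t={t}: unit {winner + 1} wins, W =\n{np.round(W, 2)}")
# winners come out 2, 1, 2, 1, matching the four weight updates above
```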
Applying the SOM Algorithm

  time (t)   data sample utilized   'winning' output unit   D(t)   α(t)
      1              i1                    Unit 2             0     0.6
      2              i2                    Unit 1             0     0.6
      3              i3                    Unit 2             0     0.6
      4              i4                    Unit 1             0     0.6

After many iterations (epochs) through the data set, the weights converge to:
  Unit 1: [0 0 .5 1.0]
  Unit 2: [1.0 .5 0 0]

Did we get the clustering that we expected?


• Training samples:
  i1: (1, 1, 0, 0)
  i2: (0, 0, 0, 1)
  i3: (1, 0, 0, 0)
  i4: (0, 0, 1, 1)
• Network: 4 input units, 2 output units
• Final weights:
  Unit 1: [0 0 .5 1.0]
  Unit 2: [1.0 .5 0 0]

What clusters do the data samples fall into?
Solution
• Training samples: i1: (1, 1, 0, 0), i2: (0, 0, 0, 1), i3: (1, 0, 0, 0), i4: (0, 0, 1, 1)
• Weights: Unit 1: [0 0 .5 1.0], Unit 2: [1.0 .5 0 0]

• Sample: i1
  – Distance from unit 1 weights:
    (1-0)² + (1-0)² + (0-.5)² + (0-1.0)² = 1 + 1 + .25 + 1 = 3.25
  – Distance from unit 2 weights:
    (1-1)² + (1-.5)² + (0-0)² + (0-0)² = 0 + .25 + 0 + 0 = .25 (winner)

• Sample: i2
  – Distance from unit 1 weights:
    (0-0)² + (0-0)² + (0-.5)² + (1-1.0)² = 0 + 0 + .25 + 0 = .25 (winner)
  – Distance from unit 2 weights:
    (0-1)² + (0-.5)² + (0-0)² + (1-0)² = 1 + .25 + 0 + 1 = 2.25

$d^2 = (\text{Euclidean distance})^2 = \sum_{k=1}^{n} (i_{l,k} - w_{j,k}(t))^2$
Solution (continued)
• Training samples: i1: (1, 1, 0, 0), i2: (0, 0, 0, 1), i3: (1, 0, 0, 0), i4: (0, 0, 1, 1)
• Weights: Unit 1: [0 0 .5 1.0], Unit 2: [1.0 .5 0 0]

• Sample: i3
  – Distance from unit 1 weights:
    (1-0)² + (0-0)² + (0-.5)² + (0-1.0)² = 1 + 0 + .25 + 1 = 2.25
  – Distance from unit 2 weights:
    (1-1)² + (0-.5)² + (0-0)² + (0-0)² = 0 + .25 + 0 + 0 = .25 (winner)

• Sample: i4
  – Distance from unit 1 weights:
    (0-0)² + (0-0)² + (1-.5)² + (1-1.0)² = 0 + 0 + .25 + 0 = .25 (winner)
  – Distance from unit 2 weights:
    (0-1)² + (0-.5)² + (1-0)² + (1-0)² = 1 + .25 + 1 + 1 = 3.25

$d^2 = (\text{Euclidean distance})^2 = \sum_{k=1}^{n} (i_{l,k} - w_{j,k}(t))^2$
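A one-line check of these assignments (a sketch):

```python
import numpy as np

samples = np.array([[1, 1, 0, 0], [0, 0, 0, 1],
                    [1, 0, 0, 0], [0, 0, 1, 1]])
W = np.array([[0, 0, .5, 1.0],   # unit 1 final weights
              [1.0, .5, 0, 0]])  # unit 2 final weights
d2 = ((samples[:, None, :] - W[None, :, :]) ** 2).sum(axis=2)
print(d2.argmin(axis=1) + 1)     # -> [2 1 2 1]: {i1, i3} -> unit 2, {i2, i4} -> unit 1
```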
Examples of Applications
• Kohonen (1984): speech recognition, a map of phonemes in the Finnish language
• Optical character recognition: clustering of letters of different fonts
• Angéniol et al. (1988): the travelling salesman problem (an optimization problem)
• Kohonen (1990): learning vector quantization (a pattern classification problem)
• Ritter & Kohonen (1989): semantic maps
Summary
• Unsupervised learning is very common
• Unsupervised learning requires redundancy in the stimuli
• Self-organization is a basic property of the brain's computational structure
• SOMs are based on
  – competition (winner-take-all units)
  – cooperation
  – synaptic adaptation
• SOMs conserve topological relationships between the stimuli
• Artificial SOMs have many applications in computational neuroscience
Learning Vector Quantization (LVQ)
• LVQ is a hybrid network: it uses both supervised and unsupervised learning.
• The number of neurons in the first layer (S1) is usually greater than the
number of neurons in the second layer (S2).
• In layer 1, several neurons are assigned to the same class; in layer 2, these
subclasses are combined into classes.
• The output of the first layer has the standard form (note that 'closeness' is
measured using distance rather than correlation):

$n^1_i = -\|\, w_i - p \,\|, \qquad a^1 = \mathrm{compet}(n^1)$

• The neuron with the weight vector closest to the input will output a = 1;
all other neurons will output 0.
• Competitive network: the nonzero neuron indicates the class to which the
input pattern belongs.
• LVQ: the nonzero neuron indicates a subclass (rather than a class).
There may be several different neurons (subclasses) that make up each class.
The second layer is used to combine subclasses into a single class.
This is done with the W² matrix (columns of W² correspond to subclasses,
rows of W² to classes).
W² has a single 1 in each column, with all other elements set to zero. The row
in which the 1 occurs indicates the class to which that subclass belongs.
The process of combining subclasses to form a class allows the LVQ network to
create complex class boundaries, as in the sketch below.
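For example, combining four subclasses into two classes with such a W² (the particular subclass-to-class assignment here is my own illustration):

```python
import numpy as np

# columns = subclasses (layer-1 neurons), rows = classes
W2 = np.array([[1, 1, 0, 0],    # subclasses 1 and 2 make up class 1
               [0, 0, 1, 1]])   # subclasses 3 and 4 make up class 2
a1 = np.array([0, 0, 1, 0])     # layer 1 output: subclass 3 won
print(W2 @ a1)                  # -> [0 1]: the input belongs to class 2
```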
Mechanism of the LVQ Network
• Competitive learning is combined with supervised learning.
• We have a set of labelled training samples $\{(p_1, t_1), (p_2, t_2), \ldots, (p_Q, t_Q)\}$.
• For instance, a 3-D input sample belonging to class 2 out of a total of 4
classes would have the target vector $t = [0\; 1\; 0\; 0]^T$.
• A training vector is presented to the LVQ network.
• If the LVQ network classifies it correctly, the closest (winning) neuron is
moved towards the input vector:

$w_{i^*}(q) = w_{i^*}(q-1) + \alpha\,(p(q) - w_{i^*}(q-1))$

• If the classification is incorrect, it means that the wrong neuron won the
competition, so we move it away from the input:

$w_{i^*}(q) = w_{i^*}(q-1) - \alpha\,(p(q) - w_{i^*}(q-1))$

• All other neurons are left unaltered.
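A minimal sketch of this LVQ1 step (the class lookup through a fixed W² as above, and all names, are my own):

```python
import numpy as np

def lvq1_step(W1, W2, p, target_class, alpha=0.1):
    """One LVQ1 update: move the winning subclass prototype towards p if
    its class is correct, away from p otherwise."""
    i_star = np.argmin(np.linalg.norm(W1 - p, axis=1))  # winning subclass
    predicted = int(np.argmax(W2[:, i_star]))           # row holding the 1
    sign = 1.0 if predicted == target_class else -1.0
    W1[i_star] += sign * alpha * (p - W1[i_star])       # LVQ rule
    return W1, predicted
```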


Worked Example
• We begin by assigning target vectors to each pattern.
• We must now choose the number of subclasses that each class consists of.
Let's assume we have 2 subclasses for each class (i.e. each class consists of
2 convex regions). This implies that we have 4 neurons in layer 1 (the hidden
layer). A class is the union of its subclasses; this will allow us to combine
disconnected regions to form a single class. Layer 2 will have 2 neurons
(equal to the number of classes).
• We design W² (which will not be altered further).
• We initialize W¹ randomly.

• We start our learning using p_3.
• The third neuron wins (its weight vector is closest to p_3). To determine
which class it belongs to, we process layer 2.
• The LVQ network classifies p_3 as class 2 (which is correct), so we move the
corresponding weight vector towards p_3.
