Unit II Deep Learning
UNIT-II Unsupervised Learning Network- Introduction, Fixed Weight Competitive Nets, Maxnet,
Hamming Network, Kohonen Self-Organizing Feature Maps, Learning Vector Quantization, Counter
Propagation Networks, Adaptive Resonance Theory Networks. Special Networks-Introduction to various
networks.
Unsupervised learning
Unsupervised learning is the training of a machine using information that is neither classified
nor labeled and allowing the algorithm to act on that information without guidance. Here the
task of the machine is to group unsorted information according to similarities, patterns, and
differences without any prior training of data.
Unlike supervised learning, no teacher is provided, which means no training targets are given to the machine. The machine therefore has to find the hidden structure in the unlabeled data by itself.
For instance, suppose the machine is given a set of images containing both dogs and cats that it has never seen before. Since it has no idea of the features of dogs and cats, it cannot label the images as 'dog' or 'cat'. However, it can still group them according to their similarities, patterns, and differences: the collection can be split into two parts, the first containing all the pictures with dogs and the second containing all the pictures with cats. Nothing was learned beforehand, which means no training data or examples were used.
It allows the model to work on its own to discover patterns and information that were previously undetected. It mainly deals with unlabelled data.
Unsupervised learning is classified into two categories of algorithms:
Clustering: A clustering problem is where you want to discover the inherent groupings in the
data, such as grouping customers by purchasing behavior.
Association: An association rule learning problem is where you want to discover rules that
describe large portions of your data, such as people that buy X also tend to buy Y.
Types of Unsupervised Learning:-
Clustering
1. Exclusive (partitioning)
2. Agglomerative
3. Overlapping
4. Probabilistic
Clustering Types:-
1. Hierarchical clustering
2. K-means clustering
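As a brief illustration of the K-means clustering mentioned above, the sketch below performs standard Lloyd-style iterations; the data matrix X and the number of clusters k are hypothetical inputs, not part of the original notes.

import numpy as np

def kmeans(X, k, iters=100, seed=0):
    # X: (m, n) data matrix; k: number of clusters
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]      # pick k random points as initial centres
    for _ in range(iters):
        # assign each sample to the nearest centre (Euclidean distance)
        labels = np.argmin(((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1), axis=1)
        # move each centre to the mean of the samples assigned to it
        new_centers = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                                for j in range(k)])
        if np.allclose(new_centers, centers):                    # stop when the centres no longer move
            break
        centers = new_centers
    return centers, labels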
Maxnet Architecture
Maxnet is a fixed weight competitive net. It uses an iterative mechanism in which each node receives inhibitory inputs from all other nodes through its connections. The single node whose value is maximum remains active (the winner), while the activations of all other nodes are driven to zero.
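A minimal sketch of the Maxnet iteration described above, assuming a mutual inhibition constant epsilon smaller than 1/m; epsilon and the starting activations are illustrative values.

import numpy as np

def maxnet(a, epsilon=0.1, max_iters=100):
    # a: initial activations of the m competing nodes; epsilon: fixed inhibitory weight (< 1/m)
    a = np.array(a, dtype=float)
    for _ in range(max_iters):
        # each node keeps its own activation and receives -epsilon times the sum of the others
        a = np.maximum(0.0, a - epsilon * (a.sum() - a))
        if np.count_nonzero(a) <= 1:      # stop when a single winner remains active
            break
    return a

print(maxnet([0.2, 0.4, 0.6, 0.8]))       # only the node that started largest stays non-zero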
Hamming networks
In a Hamming network, every given input vector is clustered into one of the exemplar groups. Following are some important features of Hamming networks −
Lippmann started working on Hamming networks in 1987.
It is a single-layer network.
The inputs can be either binary {0, 1} or bipolar {-1, 1}.
The weights of the net are calculated from the exemplar vectors.
It is a fixed weight network, which means the weights remain the same even during training.
Hamming Distance
The Hamming distance between two vectors of equal length is the number of positions in which their corresponding components differ. The Hamming net selects as winner the stored exemplar whose Hamming distance to the input vector is smallest, i.e., whose number of agreeing components is largest.
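A small sketch of this similarity measure for bipolar vectors: the net input (n + E·x)/2 counts the number of components on which the input agrees with each stored exemplar; the exemplar matrix E and the input x are illustrative.

import numpy as np

def hamming_net(x, E):
    # x: bipolar input vector {-1, 1}; E: matrix whose rows are the stored exemplar vectors
    n = len(x)
    similarity = (n + E @ x) / 2          # number of agreeing components with each exemplar
    distance = n - similarity             # Hamming distance to each exemplar
    winner = int(np.argmin(distance))     # exemplar closest to the input wins
    return winner, distance

x = np.array([1, -1, 1, 1])
E = np.array([[1, -1, 1, 1],
              [-1, -1, 1, -1]])
print(hamming_net(x, E))                  # -> (0, array([0., 2.]))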
Kohonen Self-Organizing Feature Maps
A Kohonen Self-Organizing feature map (SOM) is a neural network that is trained using competitive learning. Basic competitive learning implies that the competition process takes place before the cycle of learning: some criterion selects a winning processing element.
The self-organizing map produces topologically ordered mappings between the input data and the processing elements of the map. Topological ordering implies that if two inputs have similar characteristics, the processing elements that respond most strongly to them are located close to each other on the map. The weight vectors of the processing elements are organized in ascending or descending order, i.e. W_i < W_(i+1) for all i, or W_i > W_(i+1) for all i (this definition is valid for a one-dimensional self-organizing map only).
The self-organizing map is typically represented as a two-dimensional sheet of processing elements. Each processing element has its own weight vector, and learning of the SOM depends on the adaptation of these vectors. The processing elements of the network are made competitive in a self-organizing process, and a specific criterion picks the winning processing element whose weights are updated. Generally, this criterion is the minimization of the Euclidean distance between the input vector and the weight vector. The SOM differs from basic competitive learning in that not only the weight vector of the winning processing element but also the weight vectors of its neighboring processing elements are adjusted.
The SOM was developed by the Finnish professor and researcher Dr. Teuvo Kohonen in 1982. The self-organizing map is an unsupervised learning model proposed for applications in which maintaining a topology between the input and output spaces is important. The entire learning process occurs without supervision because the nodes are self-organizing. They are also known as feature maps, as they essentially retain the features of the input data and group themselves according to their mutual similarity. The SOM has practical value for visualizing complex or very large quantities of high-dimensional data, showing the relationships between them in a low-dimensional (usually two-dimensional) field, and checking whether the given unlabeled data have any structure.
A Self-Organizing Map (SOM) differs from typical artificial neural networks (ANNs) both in its architecture and in its algorithmic properties. Its structure consists of a single-layer linear 2D grid of neurons, rather than a series of layers. All the nodes on this lattice are connected directly to the input vector, but not to each other. This means the nodes do not know the values of their neighbors and only update the weights of their connections as a function of the given input. The grid itself is the map, and it reorganizes itself at each iteration as a function of the input data. As such, after clustering, each node has its own coordinate (i, j), which enables the Euclidean distance between two nodes to be calculated.
A Self-Organizing Map utilizes competitive learning instead of error-correction learning to modify its weights. This implies that only an individual node is activated at each cycle in which an instance of the input vector is presented to the network, as all nodes compete for the privilege to respond to the input.
The selected node, the Best Matching Unit (BMU), is chosen according to the similarity between the current input values and all the nodes in the network: the node whose weight vector has the smallest Euclidean distance to the input vector is selected, and the nodes within a specific radius of it have their weight vectors slightly adjusted towards the input vector. By treating all the nodes on the grid in this way, the whole grid eventually matches the entire input dataset, with similar nodes gathered towards one area and dissimilar ones separated.
Algorithm:
Step 1: Initialize each node's weight vector w_ij with small random values.
Step 2: Choose an input vector x(t) at random from the training data.
Step 3: Examine every node on the map (repeat steps 4 and 5 for each node).
Step 4: Calculate the Euclidean distance between the weight vector w_ij and the input vector x(t), starting with the first node, where t, i, j = 0.
Step 5: Track the node that generates the smallest distance.
Step 6: Determine the overall Best Matching Unit (BMU), i.e., the node with the smallest distance among all those calculated.
Step 7: Determine the topological neighborhood β_ij(t) and its radius σ(t) of the BMU in the Kohonen map.
Step 8: Repeat for all nodes in the BMU neighborhood: update the weight vector w_ij of each node in the neighborhood of the BMU by adding a fraction of the difference between the input vector x(t) and the weight w(t) of the neuron:
w_ij(new) = w_ij(old) + α [x_i − w_ij(old)]
Step 9: Repeat the complete iteration until the selected iteration limit t = n is reached.
Here, step 1 represents the initialization phase, while steps 2 to 9 represent the training phase.
where:
t = current iteration
W = weight vector
X = input vector
β_ij(t) = the neighborhood function, which decreases with the distance of node (i, j) from the BMU
σ(t) = the radius of the neighborhood function, which determines how far neighboring nodes are examined in the 2D grid when updating the weight vectors
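The following is a minimal sketch of steps 2 to 9 above in Python/NumPy, using a Gaussian neighborhood function β and exponentially decaying α(t) and σ(t); the grid size, decay schedule, and data matrix X are illustrative assumptions, not part of the algorithm statement above.

import numpy as np

def train_som(X, rows=10, cols=10, n_iters=1000, alpha0=0.5, sigma0=3.0, seed=0):
    rng = np.random.default_rng(seed)
    n = X.shape[1]
    W = rng.random((rows, cols, n))                      # step 1: initialise weight vectors
    grid = np.stack(np.meshgrid(np.arange(rows), np.arange(cols), indexing="ij"), axis=-1)
    for t in range(n_iters):
        x = X[rng.integers(len(X))]                      # step 2: pick an input vector at random
        d = np.linalg.norm(W - x, axis=-1)               # steps 3-5: Euclidean distance to every node
        bmu = np.unravel_index(np.argmin(d), d.shape)    # step 6: best matching unit
        alpha = alpha0 * np.exp(-t / n_iters)            # decaying learning rate alpha(t)
        sigma = sigma0 * np.exp(-t / n_iters)            # step 7: shrinking neighborhood radius sigma(t)
        grid_dist = np.linalg.norm(grid - np.array(bmu), axis=-1)
        beta = np.exp(-(grid_dist ** 2) / (2 * sigma ** 2))   # neighborhood function beta_ij(t)
        W += alpha * beta[..., None] * (x - W)           # step 8: w_ij(new) = w_ij(old) + alpha*beta*(x - w_ij(old))
    return W                                             # step 9: stop when t reaches the iteration limit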
Learning Vector Quantization (LVQ)
Suppose we are given input data of size (m, n), where m is the number of training examples and n is the number of features in each example, together with a label vector of size (m, 1). First, the algorithm initializes the weights of size (n, c) from the first c training samples that have different labels; these samples are then discarded from the training set. Here, c is the number of classes. The algorithm then iterates over the remaining input data and, for each training example, updates the winning vector (the weight vector with the shortest distance, e.g. Euclidean distance, from the training example).
The weight update rule is given by:
if correctly classified:
    w_ij(new) = w_ij(old) + alpha(t) * (x_ik - w_ij(old))
else:
    w_ij(new) = w_ij(old) - alpha(t) * (x_ik - w_ij(old))
where alpha(t) is the learning rate at time t, j denotes the winning vector, i denotes the i-th feature of the training example, and k denotes the k-th training example from the input data. After training, the trained weights of the LVQ network are used for classifying new examples: a new example is labelled with the class of the winning vector.
Algorithm LVQ :
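The following is a minimal sketch of the LVQ procedure just described, assuming labelled data X, y with one prototype per class; the epoch count and the linearly decaying learning rate alpha(t) are illustrative choices.

import numpy as np

def train_lvq(X, y, epochs=20, alpha0=0.1):
    # initialise one prototype per class from the first sample of each class,
    # then discard those samples from the training set (as described above)
    classes = np.unique(y)
    first = np.array([np.flatnonzero(y == c)[0] for c in classes])
    W = X[first].astype(float).copy()                    # weight (prototype) vectors, shape (c, n)
    mask = np.ones(len(X), dtype=bool); mask[first] = False
    X, y = X[mask], y[mask]
    for t in range(epochs):
        alpha = alpha0 * (1 - t / epochs)                # decaying learning rate alpha(t)
        for x_k, y_k in zip(X, y):
            j = np.argmin(np.linalg.norm(W - x_k, axis=1))   # winning vector (shortest distance)
            if classes[j] == y_k:
                W[j] += alpha * (x_k - W[j])             # move towards a correctly classified example
            else:
                W[j] -= alpha * (x_k - W[j])             # move away from a misclassified example
    return W, classes

def predict_lvq(x, W, classes):
    # a new example is labelled with the class of the winning vector
    return classes[np.argmin(np.linalg.norm(W - x, axis=1))]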
Counter Propagation Networks
Counter propagation networks (CPN), proposed by Hecht-Nielsen, are multilayer networks based on a combination of input, clustering, and output layers. Applications of counter propagation nets include data compression, function approximation, and pattern association. The counter propagation network is basically constructed from an instar-outstar model. This model is a three-layer neural network that performs input-output data mapping, producing an output vector y in response to an input vector x, on the basis of competitive learning. The three layers in an instar-outstar model are the input layer, the hidden (competitive) layer, and the output layer.
There are two stages involved in the training process of a counter propagation net. The input vectors are clustered in the first stage. In the second stage of training, the weights from the cluster-layer units to the output units are tuned to obtain the desired response.
There are two types of counter propagation network:
Full CPN
• The full CPN can produce a correct output even when it is given an input vector that is partially incomplete or incorrect.
• In the first phase, the training vector pairs are used to form clusters using either the dot product or the Euclidean distance.
• If the dot product is used, normalization is a must.
• During the second phase, the weights between the cluster units and the output units are adjusted.
• The architecture of the CPN resembles an instar and outstar model.
• The model which connects the input layer to the hidden layer is called the instar model, and the model which connects the hidden layer to the output layer is called the outstar model.
• The weights are updated in both the instar (first phase) and outstar (second phase) models.
• The network is a fully interconnected network.
Architecture of the Full Counter Propagation Network
[Figure: full CPN architecture — the x-input units X1…Xn and y-input units Y1…Ym are connected through weighted links to the cluster (hidden) layer Z1…Zp, which in turn produces the approximations Y1*…Ym* and X1*…Xn* at the output layer.]
First phase of Full CPN
• This phase of training is called instar-modelled training.
• The active units here are the units in the x-input, z-cluster, and y-input layers.
• The winning unit uses the standard Kohonen learning rule for its weight updation.
• The rule is:
v_ij(new) = v_ij(old) + α(x_i − v_ij(old)) = (1 − α) v_ij(old) + α·x_i, where i = 1 to n
w_kj(new) = w_kj(old) + β(y_k − w_kj(old)) = (1 − β) w_kj(old) + β·y_k, where k = 1 to m
• x* — approximation to the vector x.
• y* — approximation to the vector y.
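The following is a brief sketch of one phase-1 (instar) update for a single training pair (x, y), following the rules above; the use of the squared Euclidean distance to select the winning cluster unit j is an illustrative assumption.

import numpy as np

def cpn_phase1_step(x, y, V, W, alpha=0.3, beta=0.3):
    # V: (n, p) weights from the x-input units to the cluster units
    # W: (m, p) weights from the y-input units to the cluster units
    d = np.linalg.norm(V.T - x, axis=1) ** 2 + np.linalg.norm(W.T - y, axis=1) ** 2
    j = int(np.argmin(d))                  # winning cluster unit
    V[:, j] += alpha * (x - V[:, j])       # v_ij(new) = v_ij(old) + alpha * (x_i - v_ij(old))
    W[:, j] += beta * (y - W[:, j])        # w_kj(new) = w_kj(old) + beta * (y_k - w_kj(old))
    return j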
[Figure: counterpropagation network with input layer X1…Xn, cluster layer Z1…Zp, and output layer Y1…Ym, connected by weighted links between successive layers.]
Basics of Adaptive Resonance Theory (ART) Architecture
Adaptive resonance theory describes a type of neural network that is self-organizing and competitive. It can be of both types, unsupervised (ART1, ART2, ART3, etc.) or supervised (ARTMAP). Generally, the supervised algorithms are named with the suffix "MAP". The basic ART model, however, is unsupervised in nature and consists of:
F1 layer or the comparison field (where the inputs are processed)
F2 layer or the recognition field (which consists of the clustering units)
The reset module (which acts as a control mechanism)
The F1 layer accepts the inputs, performs some processing, and transfers them to the F2 layer unit that best matches them. There exist two sets of weighted interconnections for controlling the degree of similarity between the units in the F1 and the F2 layers. The F2 layer is a competitive layer: the cluster unit with the largest net input becomes the candidate to learn the input pattern first, and the remaining F2 units are ignored. The reset unit then decides whether or not the cluster unit is allowed to learn the input pattern, depending on how similar its top-down weight vector is to the input vector. This is called the vigilance test.
Thus we can say that the vigilance parameter helps to incorporate new memories or new
information. Higher vigilance produces more detailed memories, lower vigilance produces more
general memories.
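The following is a small sketch of the vigilance test for binary-input ART1, where rho is the vigilance parameter and t_j is the top-down weight vector of the candidate F2 unit; the concrete vectors and the value of rho are illustrative.

import numpy as np

def vigilance_test(x, t_j, rho=0.7):
    # x: binary input vector; t_j: binary top-down weights of the candidate cluster unit
    match = np.logical_and(x, t_j).sum() / max(x.sum(), 1)   # |x AND t_j| / |x|
    return match >= rho        # True: resonance, the unit is allowed to learn; False: reset fires

x   = np.array([1, 1, 0, 1, 0])
t_j = np.array([1, 1, 0, 0, 0])
print(vigilance_test(x, t_j, rho=0.7))    # 2/3 ≈ 0.67 < 0.7 -> False, try another cluster unit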
Generally, two types of learning exist: slow learning and fast learning. In fast learning, the weight update during resonance occurs rapidly; it is used in ART1. In slow learning, the weight change occurs slowly relative to the duration of the learning trial; it is used in ART2.
Applications of ART:
ART stands for Adaptive Resonance Theory. ART neural networks, used for fast, stable learning and prediction, have been applied in different areas. The applications include target recognition, face recognition, medical diagnosis, signature verification, and mobile robot control.
Target recognition:
A fuzzy ARTMAP neural network can be used for automatic classification of targets based on their radar range profiles. Tests on synthetic data show that fuzzy ARTMAP can yield substantial savings in memory requirements compared to k-nearest-neighbor (kNN) classifiers. The use of multiwavelength profiles improves the performance of both kinds of classifiers.
Medical diagnosis:
Medical databases present huge numbers of challenges found in general information
management settings where speed, use, efficiency, and accuracy are the prime concerns. A direct
objective of improved computer-assisted medicine is to help to deliver intensive care in
situations that may be less than ideal. Working with these issues has stimulated several ART
architecture developments, including ARTMAP-IC.
Signature verification:
Automatic signature verification is a well-known and active area of research with various applications such as bank check confirmation, ATM access, etc. The training of the network is carried out using ART1, which takes global features as the input vector; the verification and recognition phase uses a two-step process. In the first step, the input vector is matched with the stored reference vector, which was used as the training set, and in the second step, cluster formation takes place.
Mobile robot control:
Nowadays, we see a wide range of robotic devices. Their programming, called artificial intelligence, is still a field of research. The human brain is an interesting subject as a model for such an intelligent system. Inspired by the structure of the human brain, the artificial neural network emerges. Similar to the brain, the artificial neural network contains numerous simple computational units, neurons, that are interconnected to allow the transfer of signals from neuron to neuron. Artificial neural networks are used to solve different problems with good outcomes compared to other decision algorithms.
Limitations of Adaptive Resonance Theory
Some ART networks (like Fuzzy ART and ART1) are inconsistent, as their results depend upon the order of the training data or upon the learning rate.
Special Networks
A neural network is characterized by three entities:
Interconnections
Activation functions
Learning rules
Interconnections:
Interconnection can be defined as the way processing elements (neurons) in an ANN are connected to each other. Hence, the arrangement of these processing elements and the geometry of their interconnections are very essential in an ANN.
These arrangements always have two layers that are common to all network architectures, the
Input layer and output layer where the input layer buffers the input signal, and the output layer
generates the output of the network. The third layer is the Hidden layer, in which neurons are
neither kept in the input layer nor in the output layer. These neurons are hidden from the people
who are interfacing with the system and act as a black box to them. By increasing the number of hidden layers and of neurons within them, the system's computational and processing power can be increased, but the training of the system becomes more complex at the same time.
There exist five basic types of neuron connection architecture:
1. Single-layer feed-forward network
In this type of network, we have only two layers, the input layer and the output layer, but the input layer does not count because no computation is performed in this layer. The output layer is formed when different weights are applied to the input nodes and the cumulative effect per node is taken. After this, the neurons of the output layer collectively compute the output signals.
2. Multilayer feed-forward network
This network also has a hidden layer that is internal to the network and has no direct contact with the external layer. The existence of one or more hidden layers makes the network computationally stronger. It is a feed-forward network because information flows from the input, through the intermediate computations, to the output Z; there are no feedback connections in which outputs of the model are fed back into itself.
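As a brief illustration, a single forward pass through one hidden layer is sketched below; the layer sizes, the weight matrices W1, W2, the biases b1, b2, and the sigmoid activation are illustrative assumptions.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def feedforward(x, W1, b1, W2, b2):
    # information flows strictly from the input, through the hidden layer, to the output
    h = sigmoid(W1 @ x + b1)      # hidden-layer activations (no feedback connections)
    z = sigmoid(W2 @ h + b2)      # output of the network
    return z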
3. Single node with its own feedback
When outputs can be directed back as inputs to nodes of the same layer or of a preceding layer, the result is a feedback network. Recurrent networks are feedback networks with closed loops. This architecture consists of a single neuron with a feedback connection to itself.
4. Single-layer recurrent network
This is a single-layer network with a feedback connection in which the processing element's output can be directed back to itself, to another processing element, or to both. A recurrent neural network is a class of artificial neural networks where connections
between nodes form a directed graph along a sequence. This allows it to exhibit dynamic
temporal behavior for a time sequence. Unlike feedforward neural networks, RNNs can use
their internal state (memory) to process sequences of inputs.
5. Multilayer recurrent network
In this type of network, processing element output can be directed to the processing element in
the same layer and in the preceding layer forming a multilayer recurrent network. They perform
the same task for every element of a sequence, with the output being dependent on the previous
computations. Inputs are not needed at each time step. The main feature of a Recurrent Neural
Network is its hidden state, which captures some information about a sequence.
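A minimal sketch of a recurrent step is given below, showing how the hidden state carries information from one element of the sequence to the next; the weight matrices, the bias, and the tanh activation are illustrative assumptions.

import numpy as np

def rnn_step(x_t, h_prev, W_xh, W_hh, b_h):
    # the new hidden state depends on the current input and on the previous hidden state (memory)
    return np.tanh(W_xh @ x_t + W_hh @ h_prev + b_h)

def run_rnn(xs, W_xh, W_hh, b_h):
    h = np.zeros(W_hh.shape[0])   # initial hidden state
    for x_t in xs:                # the same weights are reused at every time step
        h = rnn_step(x_t, h, W_xh, W_hh, b_h)
    return h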