Unit – III
A Survey of Neural Network Models
Dr L K Sharma, Rungta College of Engineering and Technology, Bhilai (CG)
Single Layer Perceptron
The perceptron, the first adaptive network architecture, was invented
by Frank Rosenblatt in 1957.
It can be used for the classification of patterns that are linearly
separable.
Fundamentally, it consists of a single neuron with adjustable
weights and bias.
This algorithm is suitable for binary or bipolar input vectors with
bipolar targets.
Architecture of Single Layer Perceptron
Algorithm
Step 1: Initialize all weights and the bias to zero and set the learning rate α (0 < α ≤ 1).
- wi = 0 for i = 1 to n, where n is the number of input neurons, and b = 0.
Step 2: While the stopping condition is false, do Steps 3-7.
Step 3: For each input training pair s:t, do Steps 4-6.
Step 4: Set the activations of the input units to the input vector.
- xi = si (i = 1 to n)
Step 5: Compute the output unit response:
- y-in = b + ∑i xi wi
The activation function used is the bipolar step function with threshold θ:
- y = 1 if y-in > θ; y = 0 if –θ ≤ y-in ≤ θ; y = –1 if y-in < –θ
Algorithm…
Step 6: The weights and bias are updated if the target is not equal to
the output response.
If t ≠ y and the value of xi is not zero
- wi(new) = wi(old) + α xi t
- b(new) = b(old) + α t
Else
- wi(new) = wi(old)
- b(new) = b(old)
Step 7: Test for the stopping condition (e.g., no weight changes occurred in Step 6).
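As a concrete illustration (not part of the original slides), the following Python sketch implements Steps 1-7 for the bipolar AND patterns of Problem 1 below; the function names, the choice θ = 0, and the stopping test (no weight change in a full epoch) are my own assumptions.

```python
# Minimal perceptron training sketch; names and theta = 0 are illustrative choices.

def step(y_in, theta=0.0):
    """Bipolar step activation with threshold theta."""
    if y_in > theta:
        return 1
    if y_in < -theta:
        return -1
    return 0

def train_perceptron(samples, alpha=1.0, theta=0.0, max_epochs=100):
    """Train on (input vector, bipolar target) pairs until the weights stop changing."""
    n = len(samples[0][0])
    w = [0.0] * n                        # Step 1: weights start at zero
    b = 0.0                              # ...and so does the bias
    for _ in range(max_epochs):          # Step 2: stopping condition
        changed = False
        for x, t in samples:             # Step 3: each training pair s:t
            y_in = b + sum(xi * wi for xi, wi in zip(x, w))   # Step 5
            y = step(y_in, theta)
            if y != t:                   # Step 6: update only on a wrong response
                for i in range(n):
                    if x[i] != 0:
                        w[i] += alpha * x[i] * t
                b += alpha * t
                changed = True
        if not changed:                  # Step 7: no changes in a full epoch -> stop
            break
    return w, b

# Bipolar AND patterns and targets (Problem 1 below).
and_patterns = [((1, 1), 1), ((1, -1), -1), ((-1, 1), -1), ((-1, -1), -1)]
w, b = train_perceptron(and_patterns)
```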
Problems
1. Apply the perceptron to the training patterns that define the AND
function with bipolar inputs and targets.
2. Apply the perceptron to the training patterns that define the OR function
(inputs and targets).
3. Classify the two-dimensional input patterns (representing letters)
using the perceptron rule (the T-C problem).
4. Classify the two-dimensional input patterns (representing letters)
using the perceptron rule (the I-J problem).
5. Apply the perceptron to the training patterns that define the XOR
function (inputs and targets).
Application Procedure
Step 1: The weights used here are taken from the training
algorithm.
Step 2: For each input vector x to be classified, do Steps 3-4.
Step 3: Set the activations of the input units.
Step 4: Calculate the response of the output unit.
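A matching classification sketch, reusing the hypothetical `step` function and the weights returned by `train_perceptron` above:

```python
# Apply trained weights to new input vectors (Steps 2-4 of the application procedure).
def classify(x, w, b, theta=0.0):
    y_in = b + sum(xi * wi for xi, wi in zip(x, w))   # Step 4: output response
    return step(y_in, theta)

print(classify((1, 1), w, b))    # the trained AND net should answer +1 here
print(classify((-1, 1), w, b))   # ...and -1 here
```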
More Problems
For the following noisy versions of the training patterns, identify the
response of the network by classifying each as correct, incorrect, or
indefinite.
(0 -1 -1) , ( 0 1 -1), (0 0 1), (0 0 -1), (0 1 0), (1 0 1),
( 1 0 -1), ( 1 -1 0), (1 0 0), (1 1 0), (0 -1 0), (1 1 1)
Solution (Hints)
- If x1w1 + x2w2 + x3w3 > 0 then the response is correct
- If x1w1 + x2w2 + x3w3 < 0 then the response is incorrect
- If x1w1 + x2w2 + x3w3 = 0 then the response is indefinite
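A sketch of the hint; the weight values below are placeholders to be replaced by the weights actually obtained from training.

```python
# Segregate noisy patterns by the sign of the net input (weights here are placeholders).
def response(x, w):
    net = sum(xi * wi for xi, wi in zip(x, w))
    if net > 0:
        return "correct"
    if net < 0:
        return "incorrect"
    return "indefinite"

noisy = [(0, -1, -1), (0, 1, -1), (0, 0, 1), (0, 0, -1), (0, 1, 0), (1, 0, 1),
         (1, 0, -1), (1, -1, 0), (1, 0, 0), (1, 1, 0), (0, -1, 0), (1, 1, 1)]
w = (1, 1, 1)   # placeholder; substitute the trained weights w1, w2, w3
for x in noisy:
    print(x, response(x, w))
```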
More Problems…
Using the perceptron learning rule, find the weights required to
perform the following classification: vectors (1 1 1 1) and (-1 1 -1 -1)
are members of the class (target value 1); vectors (1 1 1 -1) and
(1 -1 -1 1) are not members of the class (target value -1).
Use a learning rate of 1 and starting weights of zero.
Using each of the training vectors as input, test the response of
the net.
Least Mean Square
This rule is also referred to as the Delta Learning Rule or the Widrow-Hoff Rule.
The delta rule is valid only for continuous activation functions and in
the supervised training mode.
The learning signal for this rule is called delta.
The adjustment made to a synaptic weight of a neuron is proportional
to the product of the error signal and the input signal of the synapse.
The delta rule can be applied to a single output unit or to several output
units.
Derivation of Delta Rule
The delta rule changes the weights of the connections so as to minimize the
difference between the net input to the output unit, y-in, and the target
value t.
The delta rule is given by
∆wi = α(t – y-in) xi
- where x is the vector of activations of the input units
- y-in is the net input to the output unit
- t is the target value
- α is the learning rate
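A one-pattern sketch of this update for a single output unit; the bias update mirrors the weight update, as in the Adaline algorithm later in this unit, and the names are illustrative.

```python
# One delta-rule update, Delta wi = alpha * (t - y_in) * xi, for a single output unit.
def delta_rule_step(x, t, w, b, alpha):
    y_in = b + sum(xi * wi for xi, wi in zip(x, w))   # net input to the output unit
    err = t - y_in                                    # (t - y_in)
    w = [wi + alpha * err * xi for wi, xi in zip(w, x)]
    b = b + alpha * err
    return w, b
```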
Derivation of Delta Rule…
The mean square error for a particular training pattern is
E = ∑j (tj – y-inj)²
The gradient of E is a Vector consisting of the partial derivatives of E
with respect to each of the weights. The error can be reduced rapidly
by adjusting weight wij
Taking the partial derivative of E with respect to wIJ:
∂E/∂wIJ = ∂/∂wIJ ∑j (tj – y-inj)²
= ∂/∂wIJ (tJ – y-inJ)²
Derivation of Delta Rule…
Since the weight wij influence the error only at output unit yJ
Thus the error will be reduced rapidly depending upon the given
learning by adjusting the weights:
Multilayer Perceptron
Hidden layers of computation nodes
input propagates in a forward direction, on a layer-by-layer basis
- also called a Multilayer Feedforward Network (MLP)
Error back-propagation algorithm
- supervised learning algorithm
- error-correction learning algorithm
- Forward pass
o input vector is applied to input nodes
o its effects propagate through the network layer-by-layer
o with fixed synaptic weights
- backward pass
o synaptic weights are adjusted in accordance with error signal
o the error signal propagates backward, in a layer-by-layer fashion
MLP Distinctive Characteristics
Non-linear activation function: yj = 1 / (1 + exp(–vj)) (a short code sketch follows this list)
- differentiable
- sigmoidal function, logistic function
- nonlinearity prevents reduction to a single-layer perceptron
One or more layers of hidden neurons
- progressively extracting more meaningful features from input
patterns
High degree of connectivity
The nonlinearity and high degree of connectivity make theoretical analysis
difficult
Learning process is hard to visualize
BP is a landmark in NN: computationally efficient training
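A minimal sketch of the logistic activation and its derivative f′(v) = f(v)(1 − f(v)), which the back-propagation algorithm below relies on:

```python
import math

# Binary sigmoid (logistic) activation and its derivative.
def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def sigmoid_prime(v):
    y = sigmoid(v)
    return y * (1.0 - y)   # f'(v) = f(v) * (1 - f(v))
```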
Preliminaries
Function signal
- input signals come in at the input end of the network
- propagates forward to output nodes
Error signal
- originates from output neuron
- propagates backward to input nodes
Two computations in Training
- computation of function signal
- computation of an estimate of gradient vector
o gradient of error surface with respect to the weights
Back Propagation Network (BPN)
Back propagation is a systematic method for training multi-layer
artificial neural networks.
It has a strong, if not highly practical, mathematical foundation.
It is a multilayer feedforward network that uses an extended, gradient-descent-based
delta learning rule.
Back propagation provides a computationally efficient method for
changing the weights in a feedforward network with differentiable
activation function units, so as to learn a training set of input-output pairs.
Architecture
Training Algorithm
The training algorithm of back propagation involves four stages:
1. Initialization of weights
2. Feed forward
3. Back Propagation of errors
4. Update of the weights and biases
Parameters
x: input training vector
x = (x1, x2, …, xn)
t: Output target vector
t = (t1, t2, …, tm)
α : learning rate
δk : error at output unit yk
δj : error at hidden unit zj
yk : output unit k
Algorithm
Step 1: Initialize the weights to small random values.
Step 2: While the stopping condition is false, do Steps 3-10.
Step 3: For each training pair, do Steps 4-9.
Feed Forward
Step 4: Each input unit receives the input signal xi and transmits this
signal to all units in the layer above (the hidden units).
Step 5: Each hidden unit (zj, j = 1,…, p) sums its weighted input signals,
- z-inj = v0j + ∑i xi vij
and applies the activation function: zj = f(z-inj).
Algorithm..
Step 6: Each output unit (yk, k = 1,…, m) sums its weighted input signals,
- y-ink = w0k + ∑j zj wjk
and applies the activation function: yk = f(y-ink).
Algorithm…
Back Propagation of Errors
Step 7: Each output unit (yk, k = 1,…, m) receives a target pattern
corresponding to the input pattern, and its error information term is calculated
as:
- δk = (tk – yk) f′(y-ink)
Step 8: Each hidden unit (zj, j = 1,…, p) sums its delta inputs from the units
in the layer above,
- δ-inj = ∑k δk wjk
- and its error information term is calculated as
- δj = δ-inj f′(z-inj)
Algorithm…
Updating of weight and biases
Step 9: Each output unit (yk, k = 1, …, m) updates its bias and weights
(j = 0, …, p)
The weight correction term is given by
- ∆wjk = αδkzj
And the bias correction term is given by
- ∆w0k = αδk
Therefore
- wjk(new) = wjk(old) + ∆wjk
- w0k(new) = w0k(old) + ∆w0k
Algorithm…
Each hidden unit (zj, j = 1,…, p) updates its bias and weights (i =
0,…, n)
The weight correction term is
- ∆vij = αδjxi
The bias correction term is
- ∆v0j = αδj
Therefore
- vij(new) = vij(old) + ∆vij
- v0j(new) = v0j(old) + ∆v0j
Step 10: Test for the stopping condition.
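A compact NumPy sketch of Steps 4-9 for one training pair and one hidden layer, assuming the binary sigmoid activation; the array layout (v[i, j] = vij, w[j, k] = wjk) and the function name are my own choices, not part of the original slides.

```python
import numpy as np

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

def backprop_step(x, t, v, v0, w, w0, alpha):
    """One feedforward + back-propagation pass for a single training pair.

    x: (n,) input, t: (m,) target, v: (n, p) input-to-hidden weights,
    v0: (p,) hidden biases, w: (p, m) hidden-to-output weights, w0: (m,) output biases.
    """
    # Feed forward (Steps 4-6)
    z_in = v0 + x @ v                      # z-inj = v0j + sum_i xi vij
    z = sigmoid(z_in)
    y_in = w0 + z @ w                      # y-ink = w0k + sum_j zj wjk
    y = sigmoid(y_in)

    # Back propagation of errors (Steps 7-8); f'(.) = f(.)(1 - f(.)) for the sigmoid
    delta_k = (t - y) * y * (1.0 - y)      # delta_k = (tk - yk) f'(y-ink)
    delta_in = w @ delta_k                 # delta-inj = sum_k delta_k wjk
    delta_j = delta_in * z * (1.0 - z)     # delta_j = delta-inj f'(z-inj)

    # Weight and bias updates (Step 9)
    w = w + alpha * np.outer(z, delta_k)   # Delta wjk = alpha delta_k zj
    w0 = w0 + alpha * delta_k
    v = v + alpha * np.outer(x, delta_j)   # Delta vij = alpha delta_j xi
    v0 = v0 + alpha * delta_j
    return v, v0, w, w0, y
```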
Application Algorithm
Step 1: Initialize the weights, taking them from the training algorithm.
Step 2: For each input vector, do Steps 3-5.
Step 3: For i = 1, …, n, set the activation of input unit xi.
Step 4: For j = 1,…, p
- z-inj = v0j + ∑i xi vij
- zj = f(z-inj)
Step 5: For k = 1, …, m
- y-ink = w0k + ∑j zj wjk
- yk = f(y-ink)
Merits of Back Propagation
1. The mathematical formulation presented here can be applied to any
network and does not require any special treatment of the features of
the function to be learnt.
2. The computing time is reduced if the weights chosen are small at the
beginning.
3. Batch updating of the weights is possible, which provides a smoothing
effect on the weight correction terms.
Demerits of Back Propagation
1. The number of learning steps may be high, and the learning
phase involves intensive calculations.
2. The selection of the number of hidden nodes in the network is a
problem. If the number of hidden nodes is small, the function to
be learnt may not be representable, as the capacity of the network is
small. If the number of hidden neurons is increased, the number of
independent variables of the error function also increases and the
computing time grows rapidly.
3. For complex problems it may take days or weeks to train the
network, or it may not train at all. Long training time can result from a
non-optimum step size.
4. The network may get trapped in a local minimum even though there is a
much deeper minimum nearby.
5. The training may sometimes cause temporal instability in the system.
Problem
Find the new weights for the network illustrated in the following figure.
The input pattern is [0.6 0.8 0] and the target output is 0.9. Use a learning
rate of α = 0.3 and the binary sigmoid activation function.
Solution
Step 1: Initialize the weights and biases
- w = [-1 1 2], w0 = [-1]
- v = [ 2 1 0
        1 2 2
        0 3 1 ],  v0 = [0 0 -1]
Step 3: For each training pair
- x = [0.6 0.8 0]
- t = [0.9]
The rest of the solution is worked out on the board.
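As a check, the problem's numbers can be pushed through the hypothetical `backprop_step` sketch given after the training algorithm; the layout v[i, j] = vij is assumed to match how the values are read off the figure.

```python
import numpy as np

x  = np.array([0.6, 0.8, 0.0])
t  = np.array([0.9])
v  = np.array([[2.0, 1.0, 0.0],
               [1.0, 2.0, 2.0],
               [0.0, 3.0, 1.0]])
v0 = np.array([0.0, 0.0, -1.0])
w  = np.array([[-1.0], [1.0], [2.0]])
w0 = np.array([-1.0])

v, v0, w, w0, y = backprop_step(x, t, v, v0, w, w0, alpha=0.3)
print("network output:", y)
print("updated w:", w.ravel(), "updated w0:", w0)
```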
Problem
Apply back-propagation learning to the training patterns that
define the XOR function (inputs and targets).
Important points for the selection of parameters
Initial Weights
- The initial weights influence whether the net reaches a global (or only a local)
minimum of the error and, if so, how rapidly it converges.
- If the initial weights are too large, the initial input signals to each hidden
or output unit will fall in the saturation region, where the derivative of the
sigmoid has a very small value.
- If the initial weights are too small, the net input to a hidden or output unit
will be close to zero, which also causes extremely slow learning.
- For better results, the weights (and biases) are set to random numbers
between -0.5 and 0.5 or between -1 and 1 (see the sketch below).
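A small sketch of this initialization; the layer sizes are hypothetical placeholders.

```python
import numpy as np

n, p, m = 3, 3, 1                          # example layer sizes (placeholders)
rng = np.random.default_rng()

# Weights and biases drawn uniformly from [-0.5, 0.5], as suggested above.
v  = rng.uniform(-0.5, 0.5, size=(n, p))   # input-to-hidden weights
v0 = rng.uniform(-0.5, 0.5, size=p)        # hidden biases
w  = rng.uniform(-0.5, 0.5, size=(p, m))   # hidden-to-output weights
w0 = rng.uniform(-0.5, 0.5, size=m)        # output biases
```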
Important points for the selection of parameters
Number of hidden units
If the activation function can vary with the function, it can be
shown that an n-input, m-output function requires at most 2n+1 hidden
units.
If more hidden layers are present, the calculations of the δ terms are
repeated for each additional layer.
Important points for the selection of parameters…
Selection of learning rate
A high learning rate leads to rapid learning but the weights may
oscillate, while a lower learning rate leads to slower learning. Methods
suggested for adapting the learning rate are:
Start with a high learning rate and steadily decrease it. Changes in the
weight vector must be kept small in order to reduce oscillation or
divergence.
A simple method is to increase the learning rate whenever it improves
performance and to decrease it whenever performance worsens.
Another method is to double the learning rate until the error value
worsens.
Application of Back Propagation
Optical Character recognition
Image Compression
Data Compression
Load forecasting problem in power system area
Control problems
Non-linear simulation
Fault detection problems, etc.
Adaline
It was developed by Widrow and Hoff.
It uses bipolar activations for its input signals and target outputs.
The weights and the bias of the Adaline are adjustable.
The learning rule used is known as the Delta Rule, the Least Mean
Square (LMS) rule, or the Widrow-Hoff rule.
The rule can be applied to a network with a single output unit or with several output units.
Architecture
Training Algorithm
Step 1: Initialize weights (not zero but small random values are used). Set
learning rate α.
Step 2: While the stopping condition is false, do Steps 3-7.
Step 3: For each bipolar training pair s:t perform Steps 4-6.
Step 4: Set activations of input units xi = si for i = 1 to n.
Step 5: Compute the net input to the output unit: y-in = b + ∑i xi wi
Step 6: Update the bias and weights, i = 1 to n:
- wi(new) = wi(old) + α (t – y-in)xi
- b(new) = b(old) + α (t – y-in)
Step 7: Test for stopping condition
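A minimal sketch of this training loop; the stopping test (largest weight change below a tolerance) and the names are illustrative choices, not part of the original slides.

```python
import random

def train_adaline(samples, alpha=0.1, tol=1e-3, max_epochs=1000):
    """Train on (input vector, bipolar target) pairs with the delta (LMS) rule."""
    n = len(samples[0][0])
    w = [random.uniform(-0.5, 0.5) for _ in range(n)]   # Step 1: small random weights
    b = random.uniform(-0.5, 0.5)
    for _ in range(max_epochs):                          # Step 2: stopping condition
        largest_change = 0.0
        for x, t in samples:                             # Steps 3-4
            y_in = b + sum(xi * wi for xi, wi in zip(x, w))   # Step 5: net input
            err = t - y_in
            for i in range(n):                           # Step 6: delta-rule updates
                w[i] += alpha * err * x[i]
            b += alpha * err
            largest_change = max(largest_change, abs(alpha * err))
        if largest_change < tol:                         # Step 7: changes negligible
            break
    return w, b
```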
Application Algorithm
Step 1: Initialize weights obtained from the training algorithm.
Step 2: For each bipolar input vector x perform Steps 3-5.
Step 3: Set activation of input unit.
Step 4: Calculate the net input to output unit y-in = b + ∑ xi wi
Step 5: Finally apply the activation to obtain the output y.
y = f(y-in) = 1 if y-in ≥ 0; –1 if y-in < 0