NEURAL NETWORK &
APPLICATIONS
Introduction to Neural Networks
Single-Layer Perceptrons
Radial Basis Function Networks
Associative Memory Networks
Applications
NOTE:
The WBUT course structure and syllabus of the 8th Semester have been changed from 2014.
NEURAL NETWORK & APPLICATIONS [EC 802A] has been introduced in the present curriculum as a new subject. We are providing chapterwise some model questions and answers along with the complete solutions of new university papers, so that students can get an idea about university question patterns.
POPULAR PUBLICATIONS
INTRODUCTION TO NEURAL NETWORKS
1. In a neural net, if for the training input vectors the target output is not known, the training method adopted is called [WBUT 2014, 2015]
a) supervised training b) unsupervised training
c) reinforcement training d) none of these
Answer: (b)
2. The gradient descent rule is mostly used in [WBUT 2014]
a) M-P Neural Learning b) Hebb Neural Learning
c) Back-Propagation Neural Learning d) Adaline Neural Learning
Answer: (c)
3. ADALINE stands for [WBUT 2014, 2017]
a) Additive Linear Neuron b) Adaptive Linear Neuron
c) Associative Linear Neuron d) Adaptive De
Answer: (b)
4. Which of the following neural networks uses supervised learning? [WBUT 2014, 2015]
a) simple recurrent network b) self-organizing feature map
c) Hopfield network d) all of these
Answer: (a)
5. Which of the following algorithms can be used to train a single-layer feedforward network? [WBUT 2014, 2015, 2017]
a) hard competitive learning
b) soft competitive learning
c) a genetic algorithm d) all of these
Answer: (d)
6. Supervised learning means [WBUT 2015]
a) having a teacher b) having a class
c) having a feedback d) none of these
Answer: (a)
7. Bias is [WBUT 2015]
a) weight on a connection from a unit having activation 1
b) weight on a network having activation 2
c) weight on a function having activation 1
d) none of these
Answer: (a)
8. The McCulloch model uses [WBUT 2015]
a) Sigmoid function b) Step function
c) Signum function d) Tan hyperbolic function
Answer: (b)
9. The competitive rule is suited for [WBUT 2015]
a) unsupervised network training b)
c) reinforced network training d) supervised network training
Answer: (a)
10. The learning algorithm continues until further change in [WBUT 2016]
a) b) non-linearity
c) d) learning-rate
Answer: (a)
11. The synapse of a neuron is modeled by a [WBUT 2016, 2018]
a) linear function b) non-linear function
c) non-linear rough function d) linear/non-linear function
Answer: (d)
12. The functional value of the bipolar sigmoid function is [WBUT 2016]
a) 0 to 1 b) -1 to 1
c) any positive value d) none of these
Answer: (b)
13. The Hebbian rule is ... type of learning [WBUT 2017, 2018]
a) supervised b) unsupervised c) competitive d) reinforced
Answer: (b)
14. What are the advantages of neural networks over conventional computers? [WBUT 2017]
(I) They have the ability to learn by example
(II) They are more fault tolerant
(III) They are suited for real time operation due to their 'computational' rates
a) (I) and (II) are true b) (I) and (III) are true
c) (II) and (III) are true d) all of these are true
Answer: (d)
15. Which of the following is/are true for neural networks? [WBUT 2017]
(I) The training time depends on the size of the network
(II) Neural networks can be simulated on a conventional computer
(III) Artificial neurons are identical in operation to biological ones
a) all of these are true b) (II) is true
c) (II) and (III) are true d) (I) and (II) are true
Answer: (d)
16. Artificial Neural Networks are inspired by
a) Swarm Intelligence b) high speed parallel processors
c) human brain d) All of these
Answer: (c)
17. Which of the following is an application of Neural Network?
[MODEL QUESTION]
a) Sales forecasting b) Data validation
c) Risk management d) All of these
Answer: (d)
18. The Adaline neural network can be used as an adaptive filter for echo cancellation in telephone circuits. For the telephone circuit given in the above figure, which one of the following signals carries the corrected message sent from the human speaker on the left to the human listener on the right? (Assume that the person on the left transmits an outgoing voice signal and receives an incoming voice signal from the person on the right.) [MODEL QUESTION]
a) The outgoing voice signal, s.
b) The delayed incoming voice signal, n.
c) The contaminated outgoing signal, s + n0.
d) The output of the adaptive filter, y.
e) The error of the adaptive filter, ε = s + n0 - y.
Answer: (e)
19. What is classification? [MODEL QUESTION]
a) Deciding which features to use in a pattern recognition problem
b) Deciding which class an input pattern belongs to
c) Deciding which type of neural network to use
Answer: (b)
20. What is a pattern vector? [MODEL QUESTION]
a) A vector of weights w = [w1, w2, ..., wn]^T in a neural network.
b) A vector of measured features x = [x1, x2, ..., xn]^T of an input example.
c) A vector of outputs y = [y1, y2, ..., yn]^T of a classifier.
Answer: (b)
Short Answer Type Questions
1. Implement the AND function using a McCulloch-Pitts neuron (take binary data). [WBUT 2014]
Answer:
The AND function returns a true value only if both inputs are true; otherwise it returns a false value. '1' represents a true value and '0' represents a false value.
The truth table for the AND function is:
x1  x2 | y
1   1  | 1
1   0  | 0
0   1  | 0
0   0  | 0
A McCulloch-Pitts neuron to implement the AND function is shown in Fig. 1. The threshold on unit Y is 2.
Fig. 1: McCulloch-Pitts neuron to perform the logical AND function (inputs x1 and x2, each with weight 1, feeding unit Y)
The output is $Y = f(y_{in})$.
The net input is given by
$$y_{in} = \sum \text{weights} \times \text{inputs} = 1 \cdot x_1 + 1 \cdot x_2 = x_1 + x_2$$
From this the activation of the output neuron can be formed:
$$Y = f(y_{in}) = \begin{cases} 1, & y_{in} \ge 2 \\ 0, & y_{in} < 2 \end{cases}$$
Now present the inputs:
(i) $x_1 = 1, x_2 = 1$: $y_{in} = x_1 + x_2 = 1 + 1 = 2$, so $Y = f(y_{in}) = 1$ since $y_{in} = 2$.
(ii) $x_1 = 1, x_2 = 0$: $y_{in} = x_1 + x_2 = 1 + 0 = 1$, so $Y = f(y_{in}) = 0$ since $y_{in} = 1 < 2$.
This is the same when $x_1 = 0, x_2 = 1$.
(iii) $x_1 = 0, x_2 = 0$: $y_{in} = x_1 + x_2 = 0 + 0 = 0$, hence $Y = f(y_{in}) = 0$ since $y_{in} = 0 < 2$.
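The four cases can also be checked mechanically. The sketch below is a minimal Python McCulloch-Pitts unit using the weights (1, 1) and threshold 2 from the answer; the function name `mcculloch_pitts` is illustrative, not something defined in the text.

```python
def mcculloch_pitts(inputs, weights, threshold):
    """Fire (return 1) when the weighted sum reaches the threshold, else return 0."""
    y_in = sum(w * x for w, x in zip(weights, inputs))
    return 1 if y_in >= threshold else 0


# AND neuron from the answer: both weights 1, threshold 2 on unit Y.
AND_WEIGHTS = (1, 1)
AND_THRESHOLD = 2

for x1, x2 in [(1, 1), (1, 0), (0, 1), (0, 0)]:
    print(x1, x2, mcculloch_pitts((x1, x2), AND_WEIGHTS, AND_THRESHOLD))
```

With the same weights, lowering the threshold to 1 turns the unit into an OR gate, since a single active input is then enough to fire it.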
2. What is the necessity of an activation function? List commonly used activation functions. [WBUT 2014, 2015]
OR,
Discuss the different activation functions used for training artificial neural networks. [WBUT 2016]
OR,
Discuss different activation functions that are used in an artificial neural network. [WBUT 2017]
Answer:
In a neural network each neuron has an activation function which specifies the output of the neuron for a given input. Neurons are switches that output a 1 when they are sufficiently activated and a 0 when not. A non-linear activation function is also what allows a multilayer network to represent more than a linear mapping of its inputs.
Commonly used activation functions:
Step Function:
A step function is a function like that used by the original Perceptron. The output is a certain value, A1, if the input sum is above a certain threshold and A0 if the input sum is below a certain threshold. The values used by the Perceptron were A1 = 1 and A0 = 0.
Linear Combination:
A linear combination is where the weighted sum input of the neuron plus a linearly dependent bias becomes the system output.
Continuous Log-Sigmoid Function:
A log-sigmoid function, also known as a logistic function, is given by
$$\sigma(t) = \frac{1}{1 + e^{-t}}$$
Softmax Function:
The softmax activation function is useful predominantly in the output layer of a clustering system. Softmax functions convert a raw value into a posterior probability. This provides a measure of certainty.
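As a rough illustration, the four functions above can be written in a few lines of Python. This is a sketch under stated assumptions: the names are hypothetical, the step function sends a net input exactly at the threshold to A0, and the softmax subtracts the largest value before exponentiating, a standard trick that avoids overflow without changing the result.

```python
import math


def step(net, threshold=0.0, a1=1.0, a0=0.0):
    """Perceptron-style step: A1 if the net input is above the threshold, else A0."""
    return a1 if net > threshold else a0


def linear(weighted_sum, bias=0.0):
    """Linear combination: weighted input sum plus bias is passed straight through."""
    return weighted_sum + bias


def log_sigmoid(net):
    """Logistic function 1 / (1 + e^-net), written to avoid overflow for large |net|."""
    if net >= 0:
        return 1.0 / (1.0 + math.exp(-net))
    z = math.exp(net)
    return z / (1.0 + z)


def softmax(nets):
    """Convert raw output-layer values into probabilities that sum to 1."""
    largest = max(nets)
    exps = [math.exp(v - largest) for v in nets]
    total = sum(exps)
    return [e / total for e in exps]


print(step(0.7), linear(2.0, bias=0.5), log_sigmoid(0.0), softmax([1.0, 2.0, 3.0]))
```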
3. What is Adaline? Draw the model of an Adaline network. [WBUT 2014]
OR,
What is Adaline? What type of learning is used in Adaline? [WBUT 2018]
Answer:
ADALINE (Adaptive Linear Neuron, later Adaptive Linear Element) is an early single-layer artificial neural network and the name of the physical device that implemented this network.
Adaline is a single-layer neural network with multiple nodes, where each node accepts multiple inputs and generates one output. Given the following variables:
• x is the input vector
• w is the weight vector
• n is the number of inputs
• θ is some constant
• y is the output of the model
then we find that the output is
$$y = \sum_{j=1}^{n} x_j w_j + \theta$$
If we further assume that $x_0 = 1$ and $w_0 = \theta$, then the output further reduces to the dot product of x and w:
$$y = \sum_{j=0}^{n} x_j w_j = \mathbf{x} \cdot \mathbf{w}$$
Consider a single ADALINE with two inputs. The diagram for this network is shown below.
Fig: Simple ADALINE (two inputs)
It uses supervised learning.
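The bias-absorption step can be checked numerically. The sketch below (hypothetical names and example values) computes a two-input ADALINE output once as a weighted sum plus θ, and once as a plain dot product after prepending an extra input fixed at 1 whose weight is θ; the two results agree up to floating-point rounding.

```python
def adaline_output(x, w, theta):
    """y = sum_j x_j * w_j + theta (linear output, no threshold applied)."""
    return sum(xj * wj for xj, wj in zip(x, w)) + theta


def adaline_output_augmented(x, w, theta):
    """Same output written as one dot product, using x_0 = 1 and w_0 = theta."""
    x_aug = [1.0] + list(x)
    w_aug = [theta] + list(w)
    return sum(xj * wj for xj, wj in zip(x_aug, w_aug))


x = [0.5, -1.0]   # two inputs, as in the simple ADALINE
w = [0.8, 0.3]
theta = 0.2
print(adaline_output(x, w, theta), adaline_output_augmented(x, w, theta))
```

Treating θ as just another weight is what lets a learning rule update the bias with the same formula it uses for every other weight.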
4. Implement the XOR function using McCulloch-Pitts neurons (consider binary data). [WBUT 2014, 2016]
OR,
Design a Hebb net to implement the logical AND function with bipolar inputs and target. [WBUT 2017]
Answer:
The exclusive OR (XOR) has the truth table:
V1  V2 | XOR
0   0  | 0
0   1  | 1
1   0  | 1
1   1  | 0
It cannot be represented with a single neuron, but the relationship
XOR = (V1 OR V2) AND NOT (V1 AND V2)
suggests that it can be represented with a network. The network is as shown below:
Fig: Network of McCulloch-Pitts neurons computing V1 XOR V2
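Since the original figure is not reproduced, the sketch below shows one possible weight and threshold assignment (an assumption, not necessarily the one in the figure) that realises XOR = (V1 OR V2) AND NOT (V1 AND V2) with three McCulloch-Pitts units: an OR unit with threshold 1, an AND unit with threshold 2, and an output unit that receives the OR unit with weight +1 and the AND unit with an inhibitory weight of -1. The helper names are hypothetical.

```python
def mp_neuron(inputs, weights, threshold):
    """McCulloch-Pitts unit: output 1 if the weighted sum reaches the threshold, else 0."""
    return 1 if sum(w * x for w, x in zip(weights, inputs)) >= threshold else 0


def xor_net(v1, v2):
    z_or = mp_neuron((v1, v2), (1, 1), 1)    # V1 OR V2
    z_and = mp_neuron((v1, v2), (1, 1), 2)   # V1 AND V2
    # Output unit: fires for z_or AND NOT z_and (excitatory +1, inhibitory -1, threshold 1).
    return mp_neuron((z_or, z_and), (1, -1), 1)


for v1, v2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(v1, v2, xor_net(v1, v2))
```

The hidden OR and AND units are what a single neuron lacks: each draws one straight decision boundary, and the output unit combines the two.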
5. What is the impact of weight in an artificial neural network? [WBUT 2015]
What is the role of weight and bias in an ANN model? [WBUT 2016]
How does a momentum factor make faster convergence of a network? [WBUT 2015]
Answer:
Individual nodes in a neural network emulate biological neurons by taking input data and performing simple operations on the data, selectively passing the results on to other neurons. The output of each node is called its "activation" (the terms "node values" and "activations" are used interchangeably here). Weight values are associated with each vector and node in the network, and these values constrain how input data (e.g., satellite image values) are related to output data (e.g., land-cover classes). Weight values associated with individual nodes are also known as biases. Weight values are determined by the iterative flow of training data through the network (i.e., weight values are established during a training phase in which the network learns how to identify particular classes by their typical input data characteristics).
Gradient descent is very slow if the learning rate α is small and oscillates widely if α is too large. One very efficient and commonly used method that allows a larger learning rate without oscillations is to add a momentum factor to the normal gradient descent method. The momentum factor is denoted by $\eta \in [0, 1]$, and a value of 0.9 is often used. Because each weight change includes a fraction η of the previous change, steps that keep pointing in the same downhill direction build up speed, while steps that alternate in sign partly cancel, which damps the oscillations. A momentum factor can be used with either pattern-by-pattern updating or batch-mode updating. In batch mode, it has the effect of complete averaging over the patterns. Even though the averaging is only partial in the pattern-by-pattern mode, it leaves some useful information for weight updating.
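A common way to write the momentum update, using the symbols of the answer, is $\Delta w(t) = -\alpha\, \partial E / \partial w + \eta\, \Delta w(t-1)$. The Python sketch below applies it to a hypothetical two-weight quadratic error surface that is 25 times steeper in one direction than the other, the kind of surface where plain gradient descent is slow, and counts updates until the weights reach the minimum. The values α = 0.07 and η = 0.5 are assumptions chosen for this toy surface, not general recommendations.

```python
def steps_to_converge(alpha, eta, curvature=(1.0, 25.0), tol=1e-6, max_iter=10_000):
    """Minimise E(w) = 0.5 * sum(c_i * w_i**2), starting from w = (1, 1).

    Each update is dw(t) = -alpha * dE/dw + eta * dw(t-1); eta = 0 gives plain
    gradient descent. Returns the number of updates until |w| < tol (or max_iter).
    """
    w = [1.0] * len(curvature)
    dw = [0.0] * len(curvature)
    for t in range(1, max_iter + 1):
        grad = [c * wi for c, wi in zip(curvature, w)]          # dE/dw_i = c_i * w_i
        dw = [-alpha * g + eta * d for g, d in zip(grad, dw)]   # momentum reuses the last step
        w = [wi + d for wi, d in zip(w, dw)]
        if sum(wi * wi for wi in w) ** 0.5 < tol:
            return t
    return max_iter


plain = steps_to_converge(alpha=0.07, eta=0.0)
with_momentum = steps_to_converge(alpha=0.07, eta=0.5)
print(plain, with_momentum)
```

On this surface the run with momentum needs far fewer updates than plain gradient descent (η = 0), because the shallow direction keeps accumulating step size while the steep direction's alternating steps partly cancel.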
6. Define the delta rule. Write down the error function for the delta rule. [WBUT 2016, 2019]
Answer:
1st Part:
The delta rule, also called the Least Mean Square (LMS) method, is one of the most commonly used learning rules. For a given input vector, the output vector is compared to the correct answer. If the difference is zero, no learning takes place; otherwise, the weights are adjusted to reduce this difference. The activation function in this case is called a linear activation function, in which the output node's activation is simply equal to the sum of the network's respective input/weight products. The strengths of the network's connections (i.e., the values of the weights) are adjusted to reduce the difference between target and actual output activation (i.e., the error).
2nd Part:
The delta rule employs the error function for what is known as gradient descent learning, which involves the modification of weights along the most direct path in weight-space to minimize error: the change applied to a given weight is proportional to the negative of the derivative of the error with respect to that weight. The error function is commonly given as the sum of the squares of the differences between all target and actual node activations for the output layer. For a particular training pattern (i.e., training case), the error is thus given by:
$$E_p = \frac{1}{2} \sum_{n} \left(t_{j_n} - a_{j_n}\right)^2$$
where $E_p$ is the total error over the training pattern, $\frac{1}{2}$ is a value applied to simplify the function's derivative, $n$ represents all output nodes for a given training pattern, $t_{j_n}$ represents the target value for node $n$ in output layer $j$, and $a_{j_n}$ represents the actual activation for the same node. This particular error measure is attractive because its derivative, whose value is needed in the employment of the delta rule, is easily calculated. Error over an entire set of training patterns (i.e., over one iteration, or epoch) is calculated by summing all $E_p$:
$$E = \sum_{p} E_p = \frac{1}{2} \sum_{p} \sum_{n} \left(t_{j_n} - a_{j_n}\right)^2$$
where $E$ is the total error, and $p$ represents all training patterns.
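The two error formulas translate directly into code. The sketch below (function names and data are hypothetical) computes $E_p$ for one pattern and $E$ for an epoch, plus the derivative of $E_p$ with respect to each output activation, which shows why the factor ½ is convenient: it cancels the 2 produced by differentiating the square.

```python
def pattern_error(targets, activations):
    """E_p = 1/2 * sum over output nodes of (t - a)^2 for one training pattern."""
    return 0.5 * sum((t - a) ** 2 for t, a in zip(targets, activations))


def total_error(all_targets, all_activations):
    """E = sum of E_p over all training patterns (one epoch)."""
    return sum(pattern_error(t, a) for t, a in zip(all_targets, all_activations))


def error_gradient(targets, activations):
    """dE_p/da for each output node: the 1/2 cancels the 2, leaving -(t - a)."""
    return [-(t - a) for t, a in zip(targets, activations)]


targets = [[1.0, 0.0], [0.0, 1.0]]   # t for two patterns, two output nodes each
outputs = [[0.8, 0.1], [0.3, 0.6]]   # actual activations a
print(pattern_error(targets[0], outputs[0]), total_error(targets, outputs))
print(error_gradient(targets[0], outputs[0]))
```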
7. Discuss different categories of learning rules. [WBUT 2017]
Answer:
Different categories of learning rules are:
• Supervised Learning: A learning algorithm falls under this category if the desired output for the network is also provided with the input while training the network. By providing the neural network with both an input and output pair, it is possible to calculate an error based on its target output and actual output. It can then use that error to make corrections to the network by updating its weights.
• Unsupervised Learning: In this paradigm the neural network is only given a set of inputs, and it is the neural network's responsibility to find some kind of pattern within the inputs provided without any external aid. This type of learning paradigm is often used in data mining and is also used by many recommendation algorithms due to their ability to predict a user's preferences based on the preferences of other similar users it has grouped together.
• Reinforcement Learning: Reinforcement learning is similar to supervised learning in that some feedback is given; however, instead of providing a target output, a reward is given based on how well the system performed. The aim of reinforcement learning is to maximize the reward the system receives through trial and error. This paradigm relates strongly to how learning works in nature; for example, an animal might remember the actions it has previously taken which helped it to find food (the reward).
8. Compare biological neuron and ANN. [WBUT 2017]
Answer:
Artificial neural nets were originally designed to model in some small way the functionality of the biological neural networks which are a part of the human brain. Our brains contain about $10^{11}$ neurons. Each biological neuron consists of a cell body, a collection of dendrites which bring electrochemical information into the cell, and an axon which transmits electrochemical information out of the cell.
A neuron produces an output along its axon, i.e., it fires, when the collective effect of its inputs reaches a certain threshold. The axon from one neuron can influence the dendrites of another neuron across junctions called synapses. Some synapses will generate a positive effect in the dendrite, i.e., one which encourages its neuron to fire, and others will produce a negative effect, i.e., one which discourages the neuron from firing. A single neuron receives inputs from perhaps $10^{4}$ synapses, and the total number of synapses in our brains may be of the order of $10^{15}$. It is still not clear exactly how our brains learn and remember, but it appears to be associated with the interconnections between the neurons (i.e., at the synapses).
Artificial neural nets try to model this low-level functionality of the brain. This contrasts with high-level symbolic reasoning in artificial intelligence, which tries to model the high-level reasoning processes of the brain. When we think, we are conscious of manipulating concepts to which we attach names (or symbols), e.g., for people or objects; we are not conscious of the low-level electrochemical processes which are going on underneath. The argument for the neural net approach to AI is that, if we can model the low-level activities correctly, the high-level functionality may be produced as an emergent property.
It can be seen from the above that there is an analogy between biological (human) and artificial neural nets. The analogy is summarized below.
Human       | Artificial
Neuron      | Processing Element
Dendrites   | Combining Function
Cell Body   | Transfer Function
Axons       | Element Output
Synapses    | Weights
However, it should be stressed that the analogy is not a strong one. Biological neurons and neuronal activity are far more complex than might be suggested by studying artificial neurons. Real neurons do not simply sum the weighted inputs, and the dendritic mechanisms in biological systems are much more elaborate. Also, real neurons do not stay on until the inputs change, and the outputs may encode information using complex pulse arrangements.
9. What is Boltzmann learning? How does it differ from error-correction learning? [WBUT 2017]
Answer:
1st Part:
Boltzmann learning is statistical in nature and is derived from the field of thermodynamics. It is similar to error-correction learning and is used during supervised training. In this algorithm, the state of each individual neuron, in addition to the system output, is taken into account. In this respect, the Boltzmann learning rule is significantly slower than the error-correction learning rule. Neural networks that use Boltzmann learning are called Boltzmann machines.
2nd Part:
Boltzmann learning is similar to an error-correction learning rule, in that an error signal is used to train the system in each iteration. However, instead of a direct difference between the result value and the desired value, we take the difference between the probability distributions of the system. It is also significantly slower than the error-correction learning rule.
10. How can a neural network be applied for pattern classification and clustering? [WBUT 2017]
Answer:
Classification
• The assignment of each object to a specific "class"
• We are provided with a "training set"
• e.g., recognizing printed or handwritten characters
Clustering
• Clustering requires grouping together objects that are similar to each other
11. a) What is the delta learning rule? [WBUT 2017]
b) Compare the delta learning rule and the Perceptron learning rule.
Answer:
a) Refer to Question No. 6 (1st Part) of Short Answer Type Questions.
b) There are two differences between the perceptron and the delta rule. The perceptron is based on an output from a step function, whereas the delta rule uses the linear combination of inputs directly. The perceptron is guaranteed to converge to a consistent hypothesis assuming the data is linearly separable. The delta rule converges in the limit, but it does not need the condition of linearly separable data.
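The first difference is easy to see on a single update. In the sketch below (hypothetical names; learning rate 0.1, with x[0] = 1 serving as the bias input), the perceptron rule computes its error from the thresholded output, so it leaves the weights unchanged once the example is classified correctly, whereas the delta rule computes its error from the linear net input and keeps nudging the weights towards the target.

```python
def perceptron_update(w, x, target, lr=0.1):
    """Perceptron rule: the error uses the step (thresholded) output."""
    net = sum(wi * xi for wi, xi in zip(w, x))
    output = 1 if net >= 0 else 0
    return [wi + lr * (target - output) * xi for wi, xi in zip(w, x)]


def delta_update(w, x, target, lr=0.1):
    """Delta (LMS) rule: the error uses the linear combination of inputs directly."""
    net = sum(wi * xi for wi, xi in zip(w, x))
    return [wi + lr * (target - net) * xi for wi, xi in zip(w, x)]


w = [0.2, 0.5]   # w[0] is the bias weight
x = [1.0, 1.0]   # x[0] = 1 is the bias input
print(perceptron_update(w, x, target=1), delta_update(w, x, target=1))
```

Here the net input is 0.7, so the perceptron already outputs the target 1 and stops, while the delta rule still sees an error of 0.3 and moves both weights up by 0.03.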
12. What are the parameters to increase the efficiency of a Hebbian synapse as a function of the correlation between the pre-synaptic and post-synaptic activities? How are the parameters influencing the Hebbian synapse? [WBUT 2018]
Answer:
A Hebbian synapse is a synapse that uses a time-dependent, highly local, and strongly interactive mechanism to increase synaptic efficiency as a function of the correlation between the presynaptic and postsynaptic activities.
• Time-dependent mechanism. This mechanism refers to the fact that the modifications in a Hebbian synapse depend on the exact time of occurrence of the presynaptic and postsynaptic activities.
• Local mechanism. By its very nature, a synapse is the transmission site where information-bearing signals (representing ongoing activity in the presynaptic and postsynaptic units) are in spatiotemporal contiguity. This locally available information is used by a Hebbian synapse to produce a local synaptic modification that is input-specific. It is this local mechanism that enables a neural network made up of Hebbian synapses to perform unsupervised learning.
• Interactive mechanism. Here we note that the occurrence of a change in a Hebbian synapse depends on activity levels on both sides of the synapse. That is, a Hebbian form of learning depends on a "true interaction" between presynaptic and postsynaptic activities in the sense that we cannot make a prediction from either one of these two activities by itself. Note also that this dependence or interaction may be deterministic or statistical in nature.
• Conjunctional or correlational mechanism. One interpretation of Hebb's postulate of learning is that the condition for a change in synaptic efficiency is the conjunction of presynaptic and postsynaptic activities. Thus, according to this interpretation, the co-occurrence of presynaptic and postsynaptic activities (within a short interval of time) is sufficient to produce the synaptic modification. It is for this reason that a Hebbian synapse is sometimes referred to as a conjunctional synapse. For another interpretation of Hebb's postulate of learning, we may think of the interactive mechanism characterizing a Hebbian synapse in statistical terms. In particular, the correlation over time between presynaptic and postsynaptic activities is viewed as being responsible for a synaptic change.
13. What are the differences between supervised and unsupervised learning? How does reinforcement learning differ from supervised learning? [WBUT 2018]
Answer:
Refer to Question No. 7 of Short Answer Type Questions.
14. How is competitive learning different from Hebbian learning? [WBUT 2018]
Answer:
Competitive learning is a form of unsupervised learning in artificial neural networks, in which nodes compete for the right to respond to a subset of the input data. The significant difference between competitive learning and Hebbian learning is in the number of active neurons at any one time. Whereas in a neural network based on Hebbian learning several output neurons may be active simultaneously, in competitive learning only a single output neuron is active at any one time. Because of this feature, competitive learning is highly suitable for discovering statistically salient features, which makes it useful for classification of input patterns.
15. Describe the main differences between the human brain and today's computers (such as your desktop PC) in terms of information processing. [MODEL QUESTION]
Answer:
• The brain works in a highly parallel fashion, but in the PC, everything has to go through one or several processors.
• Neurons compute slowly (several ms per computation), electronic elements compute fast (…
$$y = f(\text{Sum}) = \begin{cases} 1, & \text{Sum} \ge T \\ 0, & \text{Sum} < T \end{cases}, \qquad \text{Sum} = \sum_{i=1}^{n} W_i x_i$$
where the inputs $x_i$ are normalized in the range of either $(0, 1)$ or $(-1, 1)$, $W_1, W_2, \ldots, W_n$ are weight values associated with each input line, Sum is the weighted sum, and $T$ is a threshold constant. The function $f$ is a linear step function at threshold $T$ as shown in figure (a) below. The symbolic representation of the linear threshold gate is shown in figure (b).
Fig. (a): Linear Threshold Function    Fig. (b): Symbolic Illustration of Linear Threshold Gate (inputs, weights, threshold T)
The McCulloch-Pitts model of a neuron is simple yet has substantial computing potential. It also has a precise mathematical definition. However, this model is so simplistic that it only generates a binary output, and also the weight and threshold values are fixed.
3. Write short notes on the following:
a) Memory based learning [WBUT 2014]
b) Supervised learning [WBUT 2016]
c) Neural network architecture [WBUT 2016]
d) Gradient descent learning [WBUT 2017]
e) Competitive learning [WBUT 2017]
f) Boltzmann learning [WBUT 2018]
g) Reinforcement learning [WBUT 2018]
Answer:
a) Memory based learning:
Memory Based Learning (MBL) is based on the idea that intelligent behavior can be obtained by analogical reasoning, rather than by the application of abstract mental rules as in rule induction and rule-based processing. In particular, MBL is founded on the hypothesis that the extrapolation of behavior from stored representations of earlier experience to new situations, based on the similarity of the old and the new situation, is of key importance. MBL algorithms take a set of examples (fixed-length patterns of feature-values and their associated class) as input, and produce a classifier which can classify new, previously unseen, input patterns. MBL can in principle be applied to any kind of classification task with symbolic or numeric features and discrete (not continuous) classes for which training data is available.
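A minimal instance of the idea is a 1-nearest-neighbour classifier: "training" simply stores the examples, and a new pattern receives the class of the most similar stored one. The sketch below assumes numeric features and uses squared Euclidean distance as the (dis)similarity measure; real MBL systems may use other metrics, feature weighting, or more than one neighbour. The names and data are hypothetical.

```python
def nearest_neighbour_classify(memory, query):
    """Memory-based (1-nearest-neighbour) classification.

    memory: (feature_vector, class_label) pairs stored verbatim during "training".
    query:  a new feature vector; it receives the label of the most similar stored example.
    """
    def squared_distance(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

    _, best_label = min(memory, key=lambda item: squared_distance(item[0], query))
    return best_label


memory = [((1.0, 1.0), "A"), ((1.2, 0.8), "A"), ((5.0, 5.0), "B"), ((4.8, 5.3), "B")]
print(nearest_neighbour_classify(memory, (1.1, 0.9)), nearest_neighbour_classify(memory, (4.0, 4.5)))
```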
b) Supervised learning:
Supervised learning is the machine learning task of inferring a function from supervised training data. The training data consist of a set of training examples. In supervised learning, each example is a pair consisting of an input object (typically a vector) and a desired output value (also called the supervisory signal). A supervised learning algorithm analyzes the training data and produces an inferred function, which is called a classifier (if the output is discrete) or a regression function (if the output is continuous). The inferred function should predict the correct output value for any valid input object. This requires the learning algorithm to generalize from the training data to unseen situations in a "reasonable" way (see inductive bias). The parallel task in human and animal psychology is often referred to as concept learning.
c) Neural network architecture:
Humans and other animals process information with neural networks. These are formed from billions of neurons (nerve cells) exchanging brief electrical pulses called action potentials. Computer algorithms that mimic these biological structures are formally called artificial neural networks to distinguish them from the squishy things inside of animals. However, most scientists and engineers are not this formal and use the term neural network to include both biological and nonbiological systems.
This is the most common structure for neural networks: three layers with full interconnection. The input layer nodes are passive, doing nothing but relaying the values from their single input to their multiple outputs. In comparison, the nodes of the hidden and output layers are active, modifying the signals in accordance with the figure. The action of this neural network is determined by the weights applied in the hidden and output nodes.
Fig: Neural network architecture (information flows from the input layer (passive nodes) through the hidden layer (active nodes) to the output layer (active nodes))
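The three-layer structure can be sketched as a forward pass in plain Python. In this illustration (hypothetical names and arbitrary weights), the input layer only relays the input vector, while every hidden and output node forms a weighted sum of all values from the previous layer plus a bias and passes it through a sigmoid; the sigmoid is an assumed choice of activation.

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def layer(inputs, weights, biases):
    """Active layer: each node takes a weighted sum of ALL inputs, adds a bias, applies a sigmoid."""
    return [sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]


def forward(x, hidden_w, hidden_b, out_w, out_b):
    # The input layer is passive: it just relays x to every hidden node.
    hidden = layer(x, hidden_w, hidden_b)
    return layer(hidden, out_w, out_b)


# Illustrative 3-2-1 network (3 inputs, 2 hidden nodes, 1 output); weights are arbitrary.
hidden_w = [[0.5, -0.4, 0.1], [0.3, 0.8, -0.6]]
hidden_b = [0.0, 0.1]
out_w = [[1.2, -0.7]]
out_b = [0.05]
print(forward([1.0, 0.5, -1.0], hidden_w, hidden_b, out_w, out_b))
```

Full interconnection shows up as each weight row having one entry per node in the previous layer.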
d) Gradient descent learning:
Gradient descent is an optimization algorithm used to find the values of parameters (coefficients) of a function (f) that minimizes a cost function (cost).
Gradient descent is best used when the parameters cannot be calculated analytically (e.g., using linear algebra) and must be searched for by an optimization algorithm.
The procedure starts off with initial values for the coefficient or coefficients for the function. These could be 0.0 or a small random value.
    coefficient = 0.0
The cost of the coefficients is evaluated by plugging them into the function and calculating the cost.
    cost = f(coefficient) or cost = evaluate(f(coefficient))
The derivative of the cost is calculated. The derivative is a concept from calculus and refers to the slope of the function at a given point. We need to know the slope so that we know the direction (sign) to move the coefficient values in order to get a lower cost on the next iteration.
    delta = derivative(cost)
Now that we know from the derivative which direction is downhill, we can update the coefficient values. A learning rate parameter (alpha) must be specified that controls how much the coefficients can change on each update.
    coefficient = coefficient - (alpha * delta)
This process is repeated until the cost of the coefficients (cost) is 0.0 or close enough to zero to be good enough.
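The steps above can be turned into a runnable loop. The sketch below assumes a hypothetical one-coefficient cost, $(c - 3)^2$, whose minimum value is 0 at $c = 3$, so the text's stopping test (cost close enough to zero) applies; for a cost whose minimum is not zero, one would instead stop when the cost stops decreasing.

```python
def gradient_descent(cost, derivative, coefficient=0.0, alpha=0.1, tol=1e-8, max_iter=10_000):
    """Repeat: evaluate the cost, take its slope, step downhill by alpha * slope."""
    for _ in range(max_iter):
        if cost(coefficient) <= tol:            # "close enough to zero to be good enough"
            break
        delta = derivative(coefficient)         # slope of the cost at the current coefficient
        coefficient = coefficient - alpha * delta
    return coefficient


def example_cost(c):
    """Hypothetical cost with its minimum (cost 0) at c = 3."""
    return (c - 3.0) ** 2


def example_derivative(c):
    return 2.0 * (c - 3.0)


print(gradient_descent(example_cost, example_derivative))
```

With alpha = 0.1 each step shrinks the distance to 3 by a factor of 0.8; an alpha above 1.0 would make this particular cost diverge, which is the "oscillates widely if too large" behaviour mentioned earlier.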
e) Competitive learning:
In competitive learning the following properties hold true:
• Nodes compete for inputs
• The node with the highest activation is the winner
• The winner neuron adapts its tuning (pattern of weights) even further towards the current input
• Individual nodes specialize to win the competition for a set of similar inputs
• The process leads to the most efficient neural representation of the input space
• Typical for unsupervised learning
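The properties listed above can be seen in a small winner-take-all loop. This is a sketch under stated assumptions: "highest activation" is taken to mean the node whose weight vector is closest to the input, the learning rate and data are hypothetical, and only the winner's weights are moved towards the current input.

```python
def competitive_learning(inputs, weights, lr=0.5, epochs=20):
    """Winner-take-all training: only the winning node's weights move towards each input."""
    weights = [list(w) for w in weights]
    for _ in range(epochs):
        for x in inputs:
            # Assumption: "highest activation" = weight vector closest to x (squared distance).
            dists = [sum((wi - xi) ** 2 for wi, xi in zip(w, x)) for w in weights]
            winner = dists.index(min(dists))
            # The winner adapts its tuning further towards the current input.
            weights[winner] = [wi + lr * (xi - wi) for wi, xi in zip(weights[winner], x)]
    return weights


data = [(0.0, 0.1), (0.1, 0.0), (0.9, 1.0), (1.0, 0.9)]
print(competitive_learning(data, weights=[(0.2, 0.2), (0.8, 0.8)]))
```

Each node ends up specialised to one cluster of similar inputs: the first near (0.05, 0.05) and the second near (0.95, 0.95), without any target outputs being supplied.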
f) Boltzmann learning:
Refer to Question No. 9 of Short Answer Type Questions.
g) Reinforcement learning:
Reinforcement Learning is a type of Machine Learning, and thereby also a branch of Artificial Intelligence. It allows machines and software agents to automatically determine the ideal behaviour within a specific context, in order to maximize their performance. Simple reward feedback is required for the agent to learn its behaviour; this is known as the reinforcement signal. There are many different algorithms that tackle this issue. Reinforcement Learning is defined by a specific type of problem, and all its solutions are classed as Reinforcement Learning algorithms. In the problem, an agent is supposed to decide the best action to select based on its current state. When this step is repeated, the problem is known as a Markov Decision Process.
This automated learning scheme implies that there is little need for a human expert who knows about the domain of application. Much less time will be spent designing a solution, since there is no need for hand-crafting complex sets of rules as with Expert Systems, and all that is required is someone familiar with Reinforcement Learning. The possible applications of Reinforcement Learning are abundant, due to the generality of the problem specification. As a matter of fact, a very large number of problems in Artificial Intelligence can be fundamentally mapped to a decision process. This is a distinct advantage, since the same theory can be applied to many different domain-specific problems with little effort. In practice, this ranges from controlling robotic arms to find the most efficient motor combination, to robot navigation where collision avoidance behaviour can be learnt by negative feedback from bumping into obstacles. Logic games are also well-suited to Reinforcement Learning, as they are traditionally defined as a sequence of decisions.
4. What is the principle of learning of the Adaline? Fully explain the Adaline architecture and learning algorithm. [MODEL QUESTION]
Answer:
• Learning method: delta rule (another form of error-driven learning), also called the Widrow-Hoff learning rule
• Try to reduce the mean squared error (MSE) between the net input and the desired output
Algorithm LMS-Adaline:
Start with a randomly chosen weight vector $\mathbf{w}_0$;
Let $k = 1$;
while the MSE is unsatisfactory and computational bounds are not exceeded, do
    Let $\mathbf{i}_k$ be an input vector (chosen randomly or in some sequence) for which $d_k$ is the desired output value;
    Update the weight vector to
    $$\mathbf{w}_k = \mathbf{w}_{k-1} + \eta\,(d_k - \mathbf{w}_{k-1} \cdot \mathbf{i}_k)\,\mathbf{i}_k$$
    Increment $k$;
end-while.
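The LMS-Adaline loop can be sketched in plain Python as follows. Assumptions, all hypothetical: each input vector carries a leading 1 so the first weight acts as the bias, inputs are taken in a fixed cyclic sequence, the learning rate η is 0.1, and the "unsatisfactory MSE" test is MSE > 1e-10 over the whole training set. The example target is a linear function that a single Adaline can represent exactly, so the weights should approach its coefficients.

```python
import random


def lms_adaline(samples, eta=0.1, target_mse=1e-10, max_steps=100_000, seed=0):
    """Widrow-Hoff / LMS training of one Adaline; samples are (i, d) pairs, i[0] == 1."""
    rng = random.Random(seed)
    n = len(samples[0][0])
    w = [rng.uniform(-0.5, 0.5) for _ in range(n)]   # randomly chosen w_0

    def mse(weights):
        return sum((d - sum(wj * ij for wj, ij in zip(weights, i))) ** 2
                   for i, d in samples) / len(samples)

    k = 1
    while mse(w) > target_mse and k <= max_steps:      # MSE unsatisfactory, bounds not exceeded
        i, d = samples[(k - 1) % len(samples)]         # input vectors taken in sequence
        error = d - sum(wj * ij for wj, ij in zip(w, i))   # d_k - w_{k-1} . i_k
        w = [wj + eta * error * ij for wj, ij in zip(w, i)]
        k += 1
    return w


# Hypothetical target d = 0.5 + 2*x1 - x2 on the four binary input pairs.
samples = [([1.0, x1, x2], 0.5 + 2 * x1 - x2) for x1 in (0.0, 1.0) for x2 in (0.0, 1.0)]
print([round(v, 3) for v in lms_adaline(samples)])
```

Note that the error is taken from the linear net input, not from a thresholded output, which is what distinguishes LMS from the perceptron rule.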
5. What is Hebbian Learning? Explain using mathematical terms.
[MODEL QUESTION}
Answer:
Hebb’s postulate of learning is the oldest and most famous of all learning rules:
1. If two neurons on either side of a synapse (connection) are activated simultaneously
(i.e. synchronously), then the strength of that synapse is, selectively increased.
2. If two neurons‘on either side of a synapse are activated, asynchronously, then-that ,’
synapse is selectively weakened or eliminated.
. win)
x(n) ~ yada)
Fig: Synaptic connection
* To formulate Hebb's postulate of learning in mathematical terms, consider a synaptic
weight $w_{kj}$ with presynaptic and postsynaptic activities denoted by $x_j$ and $y_k$, respectively.
According to Hebb's postulate, the adjustment applied to the synaptic weight $w_{kj}$ at time step n
is $\Delta w_{kj}(n) = F(y_k(n), x_j(n))$, where F is a function of both activities.
As a special case we may use the activity product rule $\Delta w_{kj}(n) = \eta\, y_k(n)\, x_j(n)$,
where $\eta$ is a positive constant that determines the rate of learning. This rule clearly
emphasizes the correlational nature of a Hebbian synapse.
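From this representation we see that the repeated application of the input signal $x_j$ leads to
an exponential growth that finally drives the synaptic weight $w_{kj}$ into saturation. Assuming a
single-input linear neuron, $y_k(n) = w_{kj}(n)\, x_j(n)$, so
$w_{kj}(n+1) = w_{kj}(n) + \eta\, y_k(n)\, x_j(n) = w_{kj}(n)\,(1 + \eta\, x_j^2(n))$
If $x_j$ stays constant, then
$w_{kj}(n+N) = w_{kj}(n)\,(1 + \eta\, x_j^2)^N$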
To prevent such a situation from arising, we need to impose a limit on the growth of
synaptic weights. One method for doing this is to introduce a nonlinear forgetting factor
into the formula for the synaptic adjustment $\Delta w_{kj}(n)$. Specifically, we redefine $\Delta w_{kj}(n)$ as
a generalized activity product rule:
$\Delta w_{kj}(n) = \eta\, y_k(n)\, x_j(n) - \alpha\, y_k(n)\, w_{kj}(n) = \alpha\, y_k(n)\,[c\, x_j(n) - w_{kj}(n)]$
where $\alpha$ is a positive constant and $c = \eta/\alpha$. If the weight $w_{kj}(n)$ increases to the point where
$c\, x_j(n) - w_{kj}(n) = 0$, a balance point is reached and the weight update stops.
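A small numerical sketch contrasting the two rules, assuming a single-input linear neuron $y(n) = w(n)\,x(n)$ with a constant input. The function names `hebb_plain` and `hebb_with_forgetting` and the values $\eta = 0.1$, $\alpha = 0.2$ (so $c = 0.5$) are illustrative choices, not part of the original answer.

```python
def hebb_plain(w, x, eta, steps):
    """Activity product rule for a single-input linear neuron y = w * x."""
    history = [w]
    for _ in range(steps):
        y = w * x                   # postsynaptic activity
        w = w + eta * y * x         # dw = eta * y * x
        history.append(w)
    return history


def hebb_with_forgetting(w, x, eta, alpha, steps):
    """Generalized activity product rule: dw = eta*y*x - alpha*y*w."""
    history = [w]
    for _ in range(steps):
        y = w * x
        w = w + eta * y * x - alpha * y * w
        history.append(w)
    return history


# Constant input x = 1.0, eta = 0.1, alpha = 0.2, so c = eta / alpha = 0.5
grow = hebb_plain(0.1, 1.0, eta=0.1, steps=10)
settle = hebb_with_forgetting(0.1, 1.0, eta=0.1, alpha=0.2, steps=200)
print(grow[-1], 0.1 * (1 + 0.1 * 1.0 ** 2) ** 10)   # both about 0.259
print(settle[-1])                                    # close to c * x = 0.5
```

The plain rule keeps multiplying the weight by $(1 + \eta x^2)$ every step, while the forgetting term pulls the weight towards the balance point $c\,x$.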
SINGLE-LAYER PERCEPTRONS
Multiple Choice Type Questions
1. A perceptron is: [WBUT 2014]
a) a single layer feed-forward neural network with pre-processing
b) an autoassociative neural network
c) a double layer autoassociative neural network
d) all of these
Answer: (a)
2. In back-propagation algorithm, ______ is propagated backward through the
network. [WBUT 2014, 2016, 2018]
a) error b) signal c) error + signal d) signal - error
Answer: (a)
3. A perceptron is: [WBUT 2015]
a) a single layer feed-forward neural network with pre-processing
b) an auto-associative neural network
c) a double layer auto-associative neural network
d) Hebb network
Answer: (a)
4. A 3-input neuron is trained to output a zero when the input is 110 and a one
when the input is 111. After generalization, the output will be zero when and only
when the input is: [WBUT 2016]
a) 000 or 110 or 011 or 101
b) 010 or 100 or 110 or 101
c) 000 or 010 or 110 or 100
Answer: (c)
5. The madaline network is: [WBUT 2016, 2018]
a) The combination of two single layered feed forward neural networks
b) A type of multilayered feed forward neural network with multiple neurons in
output layer
c) The combination of adaline networks and multilayered feed forward
network with one neuron in output layer
d) A type of feedback network
Answer: (b)
6. Single layer Perceptron is used for: [WBUT 2017, 2018]
a) linear separability b) error minimization
c) back propagation d) annealing
Answer: (a)
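7. For a three-input neuron representing a Perceptron where the inputs (x1, x2, x3) = (0.8, 0.6,
0.4), the weights (w1, w2, w3) = (0.1, 0.3, -0.2) and the bias b = 0.35, the output
using the bipolar sigmoid activation function is:
a) 0.265 b) 0.746 c) 0.346 d) 0.259
Answer: (d). The net input is 0.35 + 0.8×0.1 + 0.6×0.3 + 0.4×(-0.2) = 0.53, and the bipolar
sigmoid gives $(1 - e^{-0.53})/(1 + e^{-0.53}) \approx 0.259$.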
8. A 4-input neuron has weights 1, 2, 3 and 4. The transfer function is linear with
the constant of proportionality being equal to 2. The inputs are 4, 10, 5 and 20
respectively. The output will be: [WBUT 2017]
a) 238 b) 76 c) 119
Answer: (a). Output = 2 × (1×4 + 2×10 + 3×5 + 4×20) = 2 × 119 = 238.
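9. For a two-input neuron representing a Perceptron with weights w1 = w2 = 1 and
threshold θ = 1.5, the classification of the four inputs (0, 0), (0, 1), (1, 0) and (1, 1)
is that of: [WBUT 2018]
a) AND classifier b) OR classifier
c) XOR classifier d) None of these
Answer: (a). Only (1, 1) gives a weighted sum (2) above the threshold 1.5.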
10. The network of figure below is: [MODEL QUESTION]
a) a single layer feed-forward neural network
b) an autoassociative neural network
c) a multiple layer neural network
[Figure: network with inputs x1 and x2]
Answer: (a)
11. A single perceptron can compute the XOR function. [MODEL QUESTION]
a) True b) False
Answer: (b)
12. A perceptron adds up all the weighted inputs it receives, and if the sum exceeds a
certain value, it outputs a 1; otherwise it just outputs a 0. [MODEL QUESTION]
a) True b) False
c) Sometimes, it can also output intermediate values as well
d) Can't say
Answer: (a)
13. "The XOR problem can be solved by a multi-layer perceptron, but a multi-layer
perceptron with bipolar step activation functions cannot learn to do this."
a) True b) False [MODEL QUESTION]
Answer: (a)
14. The Perceptron Learning Rule states that "for any data set which is linearly
separable, the Perceptron Convergence Theorem is guaranteed to find a solution
in a finite number of steps." [MODEL QUESTION]
a) True b) False
Answer: (b)
15. A perceptron with a unipolar step function has two inputs with weights w1 = 0.5
and w2 = -0.2, and a threshold θ = 0.3 (θ can therefore be considered as a weight
for an extra input which is always set to -1). For a given training example x = [0, 1]^T,
the desired output is 1. Does the perceptron give the correct answer (that is, is the
actual output the same as the desired output)? [MODEL QUESTION]
a) Yes b) No
Answer: (b). The net input is 0.5×0 + (-0.2)×1 + 0.3×(-1) = -0.5 < 0, so the unipolar
step function outputs 0, not the desired 1.
16. A perceptron is guaranteed to perfectly learn a given linearly separable
function within a finite number of training steps. [MODEL QUESTION]
a) True b) False
Answer: (a)
17. In backpropagation learning, we should start with a small learning parameter η
and slowly increase it during the learning process. [MODEL QUESTION]
a) True b) False
Answer: (b)
Short Answer Type Questions
1. The Exclusive-OR function is not linearly separable and hence a single-layer
perceptron cannot simulate it. Justify it. [WBUT 2014]
OR
Define the single layer perceptron net and its linear separability. [WBUT 2016]
Answer:
Consider the two-input neuron shown in the figure below.
[Figure: two-input neuron with inputs X and Y, weights W1 and W2, a summing stage and a threshold]
The output from the summing stage of the neuron is:
$S = XW_1 + YW_2$
We can re-arrange this equation into the equation of a straight line. On the decision
boundary the sum S equals the threshold T, so (assuming $W_2 \neq 0$)
$Y = -\frac{W_1}{W_2}X + \frac{T}{W_2}$
which can be compared with $Y = mX + c$.
If the neuron is a simple threshold of the type shown above, then the physical meaning
of this line is that it is the divider between the region in which the output is logic '1'
and the region in which the output is '0', as shown in the figure below.
[Figure: line representing the neuron function (equation of the straight line given above), dividing the X-Y plane into an output-'1' region and an output-'0' region]
XOR truth table:
X  Y  Output
0  0  0
0  1  1
1  0  1
1  1  0
When these four points are plotted, the inputs which we would like to produce a '0' output
are shown as empty circles, and those which should produce an output of '1' are shown as
filled circles. It can be clearly seen that no matter where the line is plotted on the graph,
the '0' outputs cannot be separated from the '1' outputs by the line, and hence a simple
neuron cannot simulate an XOR gate. This class of problem is called Linear Separability,
and we say that the XOR function is not linearly separable by a single Perceptron type neuron.
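The same conclusion follows algebraically, taking output = 1 when $XW_1 + YW_2 > T$: the row (0, 0) → 0 needs $0 \le T$; the rows (0, 1) → 1 and (1, 0) → 1 need $W_2 > T$ and $W_1 > T$, so $W_1 + W_2 > 2T \ge T$; but the row (1, 1) → 0 needs $W_1 + W_2 \le T$, a contradiction. The Python sketch below illustrates this with a brute-force search over a coarse grid of weights and thresholds (the helper name `separable` and the grid are hypothetical choices; a grid search illustrates the point but is not itself a proof).

```python
from itertools import product


def separable(truth_table, grid):
    """Return the first (w1, w2, t) on the grid for which
    output = 1 if x*w1 + y*w2 > t else 0 reproduces the table, or None."""
    for w1, w2, t in product(grid, repeat=3):
        if all((1 if x * w1 + y * w2 > t else 0) == out
               for (x, y), out in truth_table.items()):
            return (w1, w2, t)
    return None


grid = [k / 4 for k in range(-8, 9)]   # -2.0 to 2.0 in steps of 0.25
AND = {(0, 0): 0, (0, 1): 0, (1, 0): 0, (1, 1): 1}
XOR = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 0}
print(separable(AND, grid))   # some separating (w1, w2, t) is found
print(separable(XOR, grid))   # None: no line separates the XOR points
```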
2. A 4-input neuron has weights 0.1, 0.2, 0.3 and 0.4. The transfer function is linear
with the constant of proportionality being equal to 5. The inputs are 5, 10, 15 and
20 respectively. Find the output. [WBUT 2016]
Answer:
The output is found by multiplying the weights with their respective inputs, summing the
results and multiplying the sum by the constant of proportionality of the transfer function:
Output = 5 × (0.1×5 + 0.2×10 + 0.3×15 + 0.4×20) = 5 × (0.5 + 2 + 4.5 + 8) = 5 × 15 = 75.
3. Which type of Activation Function is commonly used in Back Propagation
algorithm? [WBUT 2018]
Answer:
The activation function must be non-linear and differentiable. A commonly used activation
function is the logistic (sigmoid) function: $f(x) = 1/(1+e^{-x})$.
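Differentiability matters because back-propagation needs $f'(x)$ for every unit. For the logistic function the derivative has the closed form $f'(x) = f(x)\,(1 - f(x))$, as this short Python sketch shows (the function names are illustrative):

```python
import math


def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))


def logistic_derivative(x):
    # f'(x) = f(x) * (1 - f(x)): the derivative reuses the forward value,
    # which keeps the backward pass of back-propagation cheap
    fx = logistic(x)
    return fx * (1.0 - fx)
```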
4. Explain briefly the difference between a perceptron and a feed-forward, back-
propagation neural network. [MODEL QUESTION]
Answer:
A feed-forward, back-propagation neural network consists of a collection of perceptrons
organized in layers, so that the output of one layer is the input to the next layer. The
perceptrons are modified so that the output is a continuous (differentiable) function of the
inputs, rather than a step function. The learning algorithm is similar in principle to the
perceptron learning algorithm, but involves a system of message passing (of error terms)
from the output layer backwards to the input layer.
5. Consider the following data set T. A and B are numerical attributes and Z is a
Boolean classification. [MODEL QUESTION]
A  B  Z
1  2  T
2  1  F
3  2  T
1  1  F
a) Let P be the perceptron with weights wA = 2, wB = 1 and threshold T = 4.5.
What is the value of the standard error function for this perceptron?
b) Find a set of weights and a threshold that categorizes all this data correctly.
Answer:
a) Perceptron P gets the first instance wrong, with an error of |(2*1)+(1*2)-4.5| = 0.5, and
the second instance wrong, with an error of |(2*2)+(1*1)-4.5| = 0.5. The third and fourth
instances (weighted sums 8 and 3) are classified correctly. The total error is therefore 1.0.
b) wA = 0, wB = 1, T = 1.5 will do fine.
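Both parts can be checked with a short Python sketch. It assumes, as in the answer above, that the perceptron outputs T when the weighted sum exceeds the threshold and that the error is the sum of |weighted sum - threshold| over the misclassified instances; the name `perceptron_error` is hypothetical.

```python
# Data set from the question: ((A, B), Z)
DATA = [((1, 2), True), ((2, 1), False), ((3, 2), True), ((1, 1), False)]


def perceptron_error(w_a, w_b, t, data=DATA):
    """Sum of |w_a*A + w_b*B - t| over the misclassified instances.
    The perceptron outputs True when the weighted sum exceeds t."""
    total = 0.0
    for (a, b), z in data:
        s = w_a * a + w_b * b
        if (s > t) != z:
            total += abs(s - t)
    return total


print(perceptron_error(2, 1, 4.5))   # 1.0  (part a)
print(perceptron_error(0, 1, 1.5))   # 0.0  (part b: no errors)
```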
6. What is the advantage of Adalines over perceptrons? How is it achieved?
[MODEL QUESTION]
Answer:
The advantage of Adalines is that they do not simply find any solution that leads to
perfect classification of the training set; they try to optimize their computation so that
it works as well as possible with new (untrained) inputs. This is achieved by using a
continuous, differentiable error function and minimizing this error using gradient descent.
This error minimum is the best estimate for the optimal classification function with
regard to the entire data set (of which the training data are usually only a small subset).
Long Answer Type Questions
1. List the stages involved in training of back propagation algorithm. [WBUT 2014]
Answer:
The back propagation learning algorithm can be divided into two phases: propagation
and weight update.
Phase 1: Propagation
Each propagation involves the following steps:
1. Forward propagation of a training pattern's input through the neural network in
order to generate the propagation's output activations.
2. Backward propagation of the propagation's output activations through the neural
network using the training pattern target in order to generate the deltas of all
output and hidden neurons.
Phase 2: Weight update
For each weight-synapse follow the following steps:
1. Multiply its output delta and input activation to get the gradient of the weight.
2. Subtract a ratio (percentage) of the gradient from the weight.
This ratio (percentage) influences the speed and quality of learning; it is called the
learning rate. The greater the ratio, the faster the neuron trains; the lower the ratio, the
more accurate the training is. The sign of the gradient of a weight indicates where the
error is increasing; this is why the weight must be updated in the opposite direction.
Repeat phase 1 and 2 until the performance of the network is satisfactory.
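A minimal Python sketch of the two phases for a network with one hidden layer of sigmoid units, trained pattern by pattern on XOR. The network size, learning rate, epoch count, random seed and the name `train_backprop` are illustrative assumptions, and the error for each pattern is assumed to be $\frac{1}{2}(t - y)^2$.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_backprop(data, n_hidden=3, eta=0.5, epochs=5000, seed=1):
    """One-hidden-layer sigmoid network trained pattern by pattern.
    Every weight row ends with a bias weight whose input is fixed at 1.0."""
    rng = random.Random(seed)
    n_in = len(data[0][0])
    w_h = [[rng.uniform(-1, 1) for _ in range(n_in + 1)] for _ in range(n_hidden)]
    w_o = [rng.uniform(-1, 1) for _ in range(n_hidden + 1)]

    def forward(x):
        xb = list(x) + [1.0]
        hb = [sigmoid(sum(w * v for w, v in zip(row, xb))) for row in w_h] + [1.0]
        y = sigmoid(sum(w * v for w, v in zip(w_o, hb)))
        return xb, hb, y

    def loss():
        return sum(0.5 * (t - forward(x)[2]) ** 2 for x, t in data) / len(data)

    initial_loss = loss()
    for _ in range(epochs):
        for x, t in data:
            # Phase 1, step 1: forward propagation of the input
            xb, hb, y = forward(x)
            # Phase 1, step 2: backward propagation to get output and hidden deltas
            delta_o = (t - y) * y * (1.0 - y)
            delta_h = [delta_o * w_o[j] * hb[j] * (1.0 - hb[j]) for j in range(n_hidden)]
            # Phase 2: delta * input activation is the negative error gradient here,
            # so adding eta times it moves each weight against the gradient
            w_o = [w + eta * delta_o * v for w, v in zip(w_o, hb)]
            for j in range(n_hidden):
                w_h[j] = [w + eta * delta_h[j] * v for w, v in zip(w_h[j], xb)]
    return forward, initial_loss, loss()


xor = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]
forward, before, after = train_backprop(xor)
print(before, after)                              # the loss after training is lower
print([round(forward(x)[2], 2) for x, _ in xor])  # network outputs per pattern
```

Here `eta` plays the role of the "ratio (percentage)" in Phase 2; the hidden deltas are computed with the output weights as they were before that pattern's update.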
2. a) Define perceptron learning rule. How is the linear separability concept
implemented using perceptron network training? [WBUT 2015]
Answer:
1st Part:
Algorithm:
Start with a randomly chosen weight vector $w_0$;
Let k = 1;
While there exist input vectors that are misclassified by $w_{k-1}$, do
    Let $i_j$ be a misclassified input vector;
    Let $x_k = class(i_j)\cdot i_j$, implying that $w_{k-1}\cdot x_k < 0$;
    Update the weight vector to $w_k = w_{k-1} + \eta\, x_k$;
    Increment k;
end-while;
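The algorithm above can be sketched in Python as follows, assuming class labels +1 and -1. Deviations from the pseudocode, all illustrative assumptions: the name `train_perceptron` is hypothetical, the weights start at zero instead of a random $w_0$ so the result is reproducible, $w\cdot x_k \le 0$ (rather than $< 0$) counts as misclassified so that the zero start can move, and a constant input of 1.0 is appended so the last weight acts as the bias.

```python
def train_perceptron(samples, eta=1.0, max_updates=1000):
    """Perceptron learning with class labels +1 / -1.

    Raises RuntimeError if inputs are still misclassified after
    max_updates weight updates (e.g. for non-separable data).
    """
    # x = class(i) * i (with bias input 1.0), so a correct input has w . x > 0
    xs = [[c * v for v in list(i) + [1.0]] for i, c in samples]
    w = [0.0] * len(xs[0])
    updates = 0
    while True:
        wrong = [x for x in xs if sum(wj * xj for wj, xj in zip(w, x)) <= 0]
        if not wrong:
            return w
        if updates == max_updates:
            raise RuntimeError("no separating weight vector found within max_updates")
        w = [wj + eta * xj for wj, xj in zip(w, wrong[0])]   # w_k = w_{k-1} + eta x_k
        updates += 1


AND = [((0, 0), -1), ((0, 1), -1), ((1, 0), -1), ((1, 1), 1)]
w = train_perceptron(AND)
```

For a linearly separable set such as AND the loop terminates (Perceptron Convergence Theorem); for XOR it never does, so the update limit is what stops it.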
For example, for some input i with class(i) = -1, if $w\cdot i > 0$, then we have a misclassification.
Then the weight vector needs to be modified to $w + \Delta w$ with $(w + \Delta w)\cdot i < w\cdot i$ to possibly
improve classification.
We can choose $\Delta w = -\eta\, i$, because
$(w + \Delta w)\cdot i = (w - \eta\, i)\cdot i = w\cdot i - \eta\, i\cdot i < w\cdot i$, and $i\cdot i$ is the square of the length of vector i and
is thus positive.
If class(i) = 1, things are the same but with opposite signs; we introduce x to unify these
two cases.
( + Aw):i