4.1 Concept of Deep Learning
* Deep learning is a new area of machine learning research, which has been introduced with the objective of moving machine learning closer to one of its original goals.
* Deep learning is about learning multiple levels of representation and abstraction that help to make sense of data such as images, sound and text.
* 'Deep learning' means using a neural network with several layers of nodes between input and output. It is generally better than other methods on image, speech and certain other types of data because the series of layers between input and output do feature identification and processing in a series of stages, just as our brains seem to.
* Deep learning emphasizes the network architecture of today's most successful machine learning approaches. These methods are based on "deep" multi-layer neural networks with many hidden layers.
The Neuron
* Artificial neural systems are inspired by biological neural systems. The elementary building
block of biological neural systems is the neuron.
* The brain is a collection of about 10 billion interconnected neurons. Each neuron is a cell that uses biochemical reactions to receive, process and transmit information.
* Fig. 4.1.1 shows a biological neuron.
Fig. 4.1.1 Schematic of a biological neuron (dendrites, nucleus, soma, axon hillock, axon, terminal buttons)
* The single cell neuron consists of the cell body or soma, the dendrites and the axon. The dendrites receive signals from the axons of other neurons. The small space between the axon of one neuron and the dendrite of another is the synapse. The afferent dendrites conduct impulses toward the soma. The efferent axon conducts impulses away from the soma.
* Artificial neural networks are loosely modeled on the human brain. The field is also referred to as connectionism, parallel distributed processing, neuro-computing, machine learning algorithms and artificial neural systems.
* An artificial neural network is an information processing paradigm that is inspired by the way biological nervous systems process information. It is composed of "a large number of highly interconnected simple processing elements (neurons) working together to solve specific problems".
Simple artificial neuron
* The basic computational element is often called a node or unit. It receives input from some other units or from an external source.
* Each input has an associated weight w, which can be modified so as to model synaptic learning.
* The unit computes some function of the weighted sum of its inputs.
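* As a concrete illustration of such a unit, the following minimal sketch (Python with NumPy) computes the weighted sum of the inputs and passes it through an activation function; the particular weights, inputs and the choice of a sigmoid activation are illustrative assumptions, not values from the text.

```python
import numpy as np

def neuron_output(x, w, b, activation=lambda s: 1.0 / (1.0 + np.exp(-s))):
    """Weighted sum of the inputs plus a bias, passed through an activation function."""
    s = np.dot(w, x) + b          # weighted sum of the inputs
    return activation(s)          # the unit's output

# Illustrative example: three inputs with arbitrary weights and bias
x = np.array([0.5, -1.0, 2.0])
w = np.array([0.4, 0.3, -0.2])
print(neuron_output(x, w, b=0.1))
```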
Neural Networks (NN)
A Neural Network is usually structured into an input layer of neurons, one or more hidden
layers and one output layer.
Neurons belonging to adjacent layers are usually fully connected and the various types and architectures are identified both by the different topologies adopted for the connections as well as by the choice of the activation function.
The values of the functions associated with the connections are called "weights".
The whole game of using NNs is in the fact that, in order for the network to yield appropriate outputs for given inputs, the weights must be set to suitable values. The way this is obtained allows a further distinction among modes of operation.
A neural network is a processing device, either an algorithm or actual hardware, whose
design was motivated by the design and functioning of human brains and components
thereof.
* Most neural networks have some sort of "training" rule whereby the weights of connections are adjusted on the basis of presented patterns.
* In other words, neural networks "learn" from examples, just like children learn to recognize dogs from examples of dogs, and exhibit some structural capability for generalization.
* Neural networks normally have great potential for parallelism, since the computations of
the components are independent of each other.
* Neural networks are a different paradigm for computing :
1. Von Neumann machines are based on the processing / memory abstraction of human information processing.
2. Neural networks are based on the parallel architecture of animal brains.
* Neural networks are a form of multiprocessor computer system, with
a. Simple processing elements
b. A high degree of interconnection
c. Simple scalar messages
d. Adaptive interaction between elements
Expressing Linear Perceptrons as Neurons
* The Perceptron is a kind of single-layer artificial network with only one neuron. The Perceptron is a network in which the neuron unit calculates the linear combination of its real-valued or boolean inputs and passes it through a threshold activation function.
* Fig. 4.1.2 shows the basic perceptron. The perceptron is sometimes referred to as a Threshold Logic Unit (TLU) since it discriminates the data depending on whether the sum is greater than the threshold value.
Fig. 4.1.2 Basic perceptron (inputs 1 to N, weighted sum with threshold, sigmoid output)
* The output of the neuron is a linear combination of the inputs rescaled by the synaptic weights.
TECHNICAL PUBLICATIONS® - an up-thrust for knowledge:of AL 4:5
ning is initiated by making adjustments to the relevant conection strengths and a
@ eshold value.
* Here we consider only a two-class problem, where the output layer usually has only a single node.
* For an n-class problem (n >= 3), the output layer usually has n nodes, each corresponding to a class, and the output node with the largest value indicates which class the input vector belongs to.
* In the first stage, the linear combination of inputs is calculated. Each value of the input array is associated with its weight value, which is normally between 0 and 1. Also, the summation function often takes an extra input value theta with a weight value of 1 to represent the threshold or bias of a neuron.
* In the simplest case the network has only two inputs and a single output. The output of the neuron is :
y = f( Σ (i = 1 to 2) w_i x_i + θ )
* Suppose that the activation function is a threshold, then
f(s) = 1 if s > 0
f(s) = 0 if s ≤ 0
* The perceptron can represent most of the primitive boolean functions : AND, OR, NAND and NOR, but cannot represent XOR.
* In a single layer perceptron, initial weight values are assigned randomly because it does not have previous knowledge. It sums all the weighted inputs. If the sum is greater than the threshold value then it is activated, i.e. output = 1 :
output = 1 if w_1 x_1 + w_2 x_2 + ... + w_n x_n > 0
output = 0 if w_1 x_1 + w_2 x_2 + ... + w_n x_n ≤ 0
* The input values are presented to the perceptron and if the predicted output is the same as the desired output, then the performance is considered satisfactory and no changes to the weights are made.
* If the output does not match the desired output, then the weights need to be changed to reduce the error.
* The weight adjustment is done as follows :
Δw = η × d × x
where
x = input data, d = difference between the desired and the predicted output, and η = learning rate.
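* A minimal sketch of this update rule follows, assuming (as is standard for the perceptron, since the definitions here are partly cut off) that η is a small learning rate and d is the desired minus the predicted output; training on the AND function is purely illustrative.

```python
import numpy as np

def train_perceptron(X, targets, eta=0.1, epochs=20):
    """Threshold perceptron trained with the rule delta_w = eta * d * x."""
    # Fold the bias/threshold into the weights via a constant extra input of 1.
    X = np.hstack([X, np.ones((X.shape[0], 1))])
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        for x, t in zip(X, targets):
            y = 1 if np.dot(w, x) > 0 else 0   # threshold activation
            d = t - y                          # error: desired minus predicted output
            w += eta * d * x                   # weight adjustment
    return w

# Illustrative example: the linearly separable AND function
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
t = np.array([0, 0, 0, 1])
print(train_perceptron(X, t))
```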
* It is not possible to find weights which enable single layer perceptrons to deal with non-linearly separable problems like XOR.
* Multi-layer perceptrons are able to cope with non-linearly separable problems.
* Each neuron in one layer has direct connections to all the neurons of the subsequent layer. MLPs can implement nonlinear discriminants (for classification) and nonlinear regression functions (for regression).
* Historically, the problem was that there were no known learning algorithms for training MLPs. Fortunately, it is now known to be quite straightforward. The procedure for finding a gradient vector in the network structure is generally referred to as backpropagation, because the gradient vector is calculated in the direction opposite to the flow of the output of each node.
Fig. 4.1.4 XOR, OR and AND functions
* Procedure of backpropagation :
1. The output values are compared with the target to compute the value of some predefined error function.
2. The error is then fed back through the network.
3. Using this information, the algorithm adjusts the weights of each connection in order to reduce the value of the error function.
* Continue this process until the connection weights in the network have been adjusted so that the network output has converged, to an acceptable level, with the desired output.
* If we use the gradient vector in a simple steepest descent method, the resulting learning
paradigm is often referred to as the backpropagation learning rule. Backpropagation works
by approximating the non-linear relationship between the input and the output by adjusting
the weight values internally.
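* The following sketch illustrates the three-step procedure above on a tiny 2-4-1 MLP with sigmoid units trained by simple steepest descent; the network size, learning rate, random seed and the XOR training data are illustrative choices, not prescribed by the text.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
T = np.array([[0], [1], [1], [0]], dtype=float)            # XOR targets

W1, b1 = rng.normal(size=(2, 4)), np.zeros(4)              # input -> hidden weights
W2, b2 = rng.normal(size=(4, 1)), np.zeros(1)              # hidden -> output weights
eta = 0.5                                                  # learning rate

for _ in range(20000):
    # Forward pass
    h = sigmoid(X @ W1 + b1)
    y = sigmoid(h @ W2 + b2)
    # 1. Compare the outputs with the targets (squared error function)
    err = y - T
    # 2. Feed the error back through the network (chain rule)
    delta_out = err * y * (1 - y)
    delta_hid = (delta_out @ W2.T) * h * (1 - h)
    # 3. Adjust the weights to reduce the error (steepest descent)
    W2 -= eta * h.T @ delta_out;  b2 -= eta * delta_out.sum(axis=0)
    W1 -= eta * X.T @ delta_hid;  b1 -= eta * delta_hid.sum(axis=0)

# Outputs are typically close to the XOR targets; an unlucky initialization may
# settle in a local minimum, in which case a different seed can be used.
print(np.round(y, 2))
```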
* Generally, the backpropagation network has two stages, training and testing. During the training phase, the network is "shown" sample inputs and the correct classifications. For example, the input might be an encoded picture of a face and the output could be represented by a code that corresponds to the name of the person.
* Fig. 4.1.5 shows the three most commonly used activation functions in backpropagation MLPs.
Fig. 4.1.5 Activation functions : identity function, logistic function and hyperbolic tangent function
Logistic function : f(x) = 1 / (1 + e^(−x))
Hyperbolic tangent function : f(x) = tanh(x/2) = (1 − e^(−x)) / (1 + e^(−x))
Identity function : f(x) = x
* Both the hyperbolic tangent function and the logistic function approximate the signum and step function respectively. Sometimes these two functions are referred to as squashing functions, since the inputs to these functions are squashed to the range [−1, 1] or [0, 1].
* These functions are also called sigmoidal functions because their S-shaped curves exhibit smoothness and asymptotic properties.
* A learning process is organized through a learning algorithm, which is a process of updating the weights in such a way that a machine learning tool implements a given input/output mapping with no errors or with some minimal acceptable error.
* Any learning algorithm is based on a certain learning rule, which determines how the weights shall be updated if an error occurs.
Backpropagation learning rule
* The net input of a node is defined as the weighted sum of the incoming signals plus a bias term. Fig. 4.1.6 shows the backpropagation MLP for node j. The net input and output of node j are as follows :
x̄_j = Σ_i w_ji x_i + w_j
x_j = f(x̄_j) = 1 / (1 + exp(−x̄_j))
Fig. 4.1.6 Backpropagation MLP for node j
where x_i is the output of node i located in any one of the previous layers, w_ji is the weight associated with the link connecting nodes i and j, and w_j is the bias of node j.
* The internal parameters associated with each node j are the weights w_ji, so changing the weights of the node will change the behaviour of the whole backpropagation MLP.
* Fig. 4.1.7 shows a two layer backpropagation MLP.
Fig. 4.1.7 Two layer backpropagation MLP (hidden layer, output layer; back-propagation of error correction)
* The above backpropagation MLP will be referred to as a 3-4-3 network, corresponding to the number of nodes in each layer.
* The backward error propagation is also known as backpropagation (BP) or the Generalized Delta Rule (GDR). A squared error measure for the p-th input-output pair is defined as
E_p = Σ_k (d_k − x_k)²
where d_k is the desired output for node k and x_k is the actual output for node k when the input part of the p-th data pair is presented.
* To find the gradient vector, an error term ε_i for node i is defined as :
ε_i = ∂E_p / ∂x̄_i
* The partial derivative can be rewritten as a product of two terms using the chain rule for partial differentiation :
∂E_p / ∂w_ji = (∂E_p / ∂x̄_j) × (∂x̄_j / ∂w_ji)
Sigmoid
* A sigmoid function produces a curve with an "S" shape. The example sigmoid function shown in Fig. 4.1.8 is a special case of the logistic function, which models the growth of some set.
sig(x) = 1 / (1 + e^(−x))
* In general, a sigmoid function is real-valued and differentiable, having a non-negative or non-positive first derivative, one local minimum and one local maximum.
Fig. 4.1.8 Logistic sigmoid function
* The logistic sigmoid function is related to the hyperbolic tangent as follows :
1 − 2 sig(x) = 1 − 2 / (1 + e^(−x)) = −tanh(x/2)
Sigmoid functions are often used in artificial neural networks to introduce nonlinearity in
the model.
A neural network element computes a linear combination of its input signals and applies a
sigmoid function to the result.
* A reason for its popularity in neural networks is that the sigmoid function satisfies a property relating the derivative to the function itself, which makes it computationally easy to evaluate :
(d/dt) sig(t) = sig(t) (1 − sig(t))
* Derivatives of the sigmoid function are usually employed in learning algorithms.
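* A short numeric check of this property is sketched below; the sample point t and the finite-difference step h are arbitrary illustrative choices.

```python
import numpy as np

def sig(t):
    return 1.0 / (1.0 + np.exp(-t))

t, h = 0.7, 1e-6
numeric = (sig(t + h) - sig(t - h)) / (2 * h)   # finite-difference estimate of the derivative
analytic = sig(t) * (1 - sig(t))                # sig'(t) = sig(t) (1 - sig(t))
print(numeric, analytic)                        # the two values agree closely
```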
Zero-Centering
* Feature normalization is often required to neutralize the effect of different quantitative features being measured on different scales. If the features are approximately normally distributed, we can convert them into z-scores by centring on the mean and dividing by the standard deviation. If we don't want to assume normality, we can centre on the median and divide by the interquartile range.
* Sometimes feature normalization is understood in the stricter sense of expressing the feature on a [0, 1] scale. If we know the feature's highest and lowest values h and l, then we can simply apply linear scaling.
Feature calibration is understood as a supervised feature transformation adding a
meaningful scale carrying class information to arbitrary features. This has a number of
important advantages. For instance, it allows models that require scale, such as linear
classifiers, to handle categorical and ordinal features. It also allows the learning algorithm
to choose whether to treat a feature as categorical, ordinal or quantitative.
The goal of both types of normalization is to make it easier for your learning algorithm to
learn. In feature normalization, there are two standard things to do :
1. Centering : Moving the entire data set so that it is centered around the origin.
2. Scaling : Rescaling each feature so that one of the following holds :
a) Each feature has variance 1 across the training data.
b) Each feature has maximum absolute value 1 across the training data.
* The goal of centering is to make sure that no features are arbitrarily large.
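* A minimal sketch of the two standard operations, centering and scaling (to unit variance or unit maximum absolute value), follows; the toy feature matrix is an illustrative assumption.

```python
import numpy as np

X = np.array([[180.0, 3.2],
              [165.0, 2.9],
              [172.0, 4.1]])        # illustrative data: rows = examples, columns = features

X_centered = X - X.mean(axis=0)     # centering: each feature now has mean 0

X_unit_var = X_centered / X_centered.std(axis=0)            # variance 1 per feature
X_unit_max = X_centered / np.abs(X_centered).max(axis=0)    # max |value| is 1 per feature

print(X_unit_var.std(axis=0))       # approximately [1, 1]
```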
Tanh and ReLU Neuron
* Tanh is also like the logistic sigmoid, but better. The range of the tanh function is from −1 to 1. Tanh is also sigmoidal (S-shaped).
* Fig. 4.1.9 shows tanh v/s logistic sigmoid.
Fig. 4.1.9 tanh v/s logistic sigmoid
* A tanh neuron is simply a scaled sigmoid neuron.
* Problems of the sigmoid resolved by tanh :
1. The output is not zero-centered.
2. Small gradient of the sigmoid function.
* ReLU (Rectified Linear Unit) is the most used activation function in the world right now, since it is used in almost all convolutional neural networks and deep learning models.
* Fig. 4.1.10 shows ReLU v/s logistic sigmoid.
Fig. 4.1.10 ReLU v/s logistic sigmoid
* As you can see, the ReLU is half rectified (from the bottom) : f(z) is zero when z is less than zero and f(z) is equal to z when z is greater than or equal to zero.
* Compared to tanh/sigmoid neurons that involve expensive operations (exponentials), the ReLU can be implemented by simply thresholding a matrix of activations at zero.
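* A small sketch contrasting the cost: the ReLU is a single elementwise threshold at zero, while the logistic sigmoid and tanh each require exponentials; the random matrix of pre-activations is an illustrative assumption.

```python
import numpy as np

A = np.random.randn(4, 5)               # a matrix of pre-activations

relu = np.maximum(A, 0.0)               # ReLU: just threshold the matrix at zero
sigmoid = 1.0 / (1.0 + np.exp(-A))      # logistic sigmoid: needs an exponential
tanh = np.tanh(A)                       # hyperbolic tangent: needs exponentials too

print(relu.min() >= 0)                  # True: negative activations are clipped to 0
```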
4.2 Introduction to Neural Networks
* Artificial Neural Network (ANN) is a computational system inspired by the structure,
processing method, learning ability of a biological brain. An artificial neural network is
composed of many artificial neurons that are linked together according to specific network
architecture. The objective of the neural network is to transform the inputs into meaningful
outputs.
* ANNs do not execute programmed instructions; they respond in parallel to the pattern of inputs presented to them. There are also no separate memory addresses for storing data. Instead, information is contained in the overall activation 'state' of the network. 'Knowledge'
is thus represented by the network itself, which is quite literally more than the sum of its
individual components.
* Fig. 4.2.1 shows an artificial neural network.
Fig. 4.2.1 Artificial neural network (input layer, hidden layer, output layer)
* Elements of ANN are processing units, topology and learning algorithm.
Tasks to be solved by artificial neural networks :
1. Controlling the movements of a robot based on self-perception and other information;
2. Deciding the category of potential food items in an artificial world;
3. Recognizing a visual object;
4. Predicting where a moving object goes, when a robot wants to catch it.
Characteristics of Artificial Neural Networks :
1. Large number of very simple neuron-like processing elements.
2. Large number of weighted connections between the elements.
3. Distributed representation of knowledge over the connections.
4. Knowledge is acquired by the network through a learning process.
4.2.1 Biological Content
Basic components of biological neurons :
1. The majority of neurons encode their activations or outputs as a series of brief electrical pulses (i.e. spikes or action potentials).
2. The neuron's cell body (soma) processes the incoming activations and converts them into output activations.
3. The neuron's nucleus contains the genetic material in the form of DNA. This exists in most types of cells, not just neurons.
4. Dendrites are fibres which emanate from the cell body and provide the receptive zones that receive activation from other neurons.
5. Axons are fibres acting as transmission lines that send activation to other neurons.
6. The junctions that allow signal transmission between the axons and dendrites are called synapses. The process of transmission is by diffusion of chemicals called neurotransmitters across the synaptic cleft.
* Comparison between Biological NN and Artificial NN :
Biological NN        Artificial NN
soma                 unit
axon, dendrite       connection
synapse              weight
potential            weighted sum
threshold            bias weight
signal               activation
Neural Network Representation
* A neural network consists of a large number of simple elements (neurons) connected together in a system. The whole system is able to solve complex tasks and to learn, like a natural brain.
* For the user, a NN is a black box with an input vector (source data) and an output vector (result).
* A Neural Network is usually structured into an input layer of neurons, one or more hidden
layers and one output layer.
* Neurons belonging to adjacent layers are usually fully connected and the various types and architectures are identified both by the different topologies adopted for the connections as well as by the choice of activation function.
* The values of the functions associated with the connections are called "weights".
© The whole game of using NNs is in the fact that, in order for the network to yield
appropriate outputs for given inputs, the weight must be set to suitable values. The way this
is obtained allows a further distinction among modes of operations.
© A neural network is a processing device, either an algorithm or actual hardware, whose
design was motivated by the design and functioning of human brains and components
thereof.
* Most neural networks have some sort of "training" rule whereby the weights of connections
are adjusted on the basis of presented patterns.
* In other words, neural networks "learn" from examples, just like children learn to recognize
dogs from examples of dogs and exhibit some structural capability for generalization.
© Neural networks normally have great potential for parallelism, since the computations of
the components are independent of each other.
© Neural networks are a different paradigm for computing :
1. Von Neumann machines are based on the processing / memory abstraction of human
information processing.
2. Neural networks are based on the parallel architecture of animal brains.
© Neural networks are a form of multiprocessor computer system, with :
a. Simple processing elements
b. A high degree of interconnection
c. Simple scalar messages
d. Adaptive interaction between elements.
NN Architecture : Single Layer Network
* The architecture of the neural network refers to the arrangement of the connections between neurons, the processing elements, the number of layers and the flow of signals in the neural network.
Fig. 4.2.3 Taxonomy of neural network architectures : feed-forward networks (single layer perceptron, multilayer perceptron, radial basis function net) and recurrent networks (competitive networks, Kohonen SOM and others)
* There are mainly two categories of neural network architecture :
a. Feed-forward neural networks
b. Feedback (recurrent) neural networks
1. Architecture and learning rule
* In the late 1950s, Frank Rosenblatt introduced a network composed of units that were enhanced versions of the McCulloch-Pitts Threshold Logic Unit (TLU) model.
* Rosenblatt's model of a neuron, the perceptron, was the result of a merger between two concepts from the 1940s : the McCulloch-Pitts model of an artificial neuron and the Hebbian learning rule for adjusting weights.
* In addition to the variable weight values, the perceptron model added an extra input that represents bias. Thus, the modified equation is now as follows :
Sum = Σ (i = 1 to n) x_i w_i + b
where b represents the bias value.
* Fig. 4.2.4 shows a typical perceptron setup for pattern recognition applications, in which visual patterns are represented as matrices of elements between 0 and 1.
Fig. 4.2.4 Perceptron setup (feature detectors, activation function, output)
1. The first layer acts as a set of feature detectors that are hardwired to the input signals to detect specific features.
2. The second layer, i.e. the output layer, takes the outputs of the feature detectors in the first layer and classifies the given input pattern.
* Learning is initiated by making adjustments to the relevant connection strengths and a threshold value.
* Here we consider only a two-class problem, where the output layer usually has only a single node. For an n-class problem (n >= 3), the output layer usually has n nodes, each corresponding to a class, and the output node with the largest value indicates which class the input vector belongs to.
¢ In the first stage, the linear combination of inputs is calculated. Each value of input array is
associated with its weight value, which is normally between 0 and 1. Also, the summation
function often takes an extra input value Theta with weight value of 1 to represent threshold
or bias of a neuron.
Assumptions :
1. At least one such set of weights, w*, exists.
2. There are a finite number of training patterns.
3. The threshold function is uni-polar (output is 0 or 1).
2. Exclusive OR problem
* XOR problem is a pattern recognition problem in neural network.
* Neural networks can be used to classify boolean functions depending on their desired
outputs.
* For a two-input binary XOR problem the desired output is given in the form of a truth table :
Desired I/O pair 1 : x1 = 0, x2 = 0, desired output = 0
Desired I/O pair 2 : x1 = 0, x2 = 1, desired output = 1
Desired I/O pair 3 : x1 = 1, x2 = 0, desired output = 1
Desired I/O pair 4 : x1 = 1, x2 = 1, desired output = 0
* The XOR problem is not linearly separable. We cannot use a single layer perceptron to construct a straight line to partition the two dimensional input space into two regions, each containing only data points of the same class.
* Let us consider the following four equations :
0 × w1 + 0 × w2 + w0 ≤ 0  ⇒  w0 ≤ 0
0 × w1 + 1 × w2 + w0 > 0  ⇒  w2 > −w0
1 × w1 + 0 × w2 + w0 > 0  ⇒  w1 > −w0
1 × w1 + 1 × w2 + w0 ≤ 0  ⇒  w1 + w2 ≤ −w0
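* The contradiction between these inequalities can also be checked directly: a brute-force search over a grid of weights (the grid range and step are arbitrary illustrative choices) finds no (w0, w1, w2) that classifies all four XOR patterns correctly.

```python
import numpy as np

# XOR truth table: inputs and desired outputs
patterns = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

def classify(w0, w1, w2, x1, x2):
    return 1 if w1 * x1 + w2 * x2 + w0 > 0 else 0

grid = np.linspace(-2, 2, 41)            # illustrative search grid for each weight
solutions = [(w0, w1, w2)
             for w0 in grid for w1 in grid for w2 in grid
             if all(classify(w0, w1, w2, x1, x2) == t for (x1, x2), t in patterns)]

print(len(solutions))                    # 0: no single-layer perceptron solves XOR
```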
4.3 Types of Deep Learning Models
Concept of Deep Learning Models
1. Deep learning models are complex networks that learn independently without human intervention.
2. Deep learning models apply algorithms to immense data sets to find patterns and solutions within the information.
3. Deep learning models usually have three or more layers of neural networks to help
process data. These models have the ability to process data that’s unstructured or
unlabeled, creating their own methods for identifying and understanding the information
without a person telling the computer what to look for or solve.
4. Deep learning models can identify both higher-level and lower-level information, hence
they can take difficult-to-understand data sets and create simpler, more efficient
categories. This ability allows the deep learning model to grow more accurate over time.
5. Deep learning models work by interacting with immense data sets and extracting
patterns and solutions from them through learning styles similar to what humans
naturally do. They use artificial neural networks to parse and process data sets. The networks operate using algorithms, which allow the computer to adapt and learn on its own without needing a human to guide the learning.
6. Each type of deep learning model uses the same learning and training process, though they are used for different applications. To train a deep learning model, huge data sets need to be fed into the network. This information passes from neuron to neuron, allowing the computer to analyze and understand the data as it moves through the network.
7. Deep learning models have the ability to analyze and process immense sets of unlabeled, unstructured data, often too complex and unwieldy for humans to process on their own.
8. Deep learning models can learn information they aren't specifically trained on, such as recommending new media based on one's viewing habits compared to other users.
9. Deep learning models are scalable and fast, so they have the ability to handle whatever data sets are to be processed without large setup or maintenance. For effective and successful deep learning, however, complex setups are mandatory.
10. Small data sets and data privacy constraints may impact the success of deep learning.
Deep Learning Model Types
Convolutional Neural Networks
Fully connected layers
* Fully connected layers have the normal parameters for the layer and hyperparameters. This layer performs transformations on the input data volume that are a function of the activations in the input volume and the parameters.
© Neural networks are a set of dependent nonlinear functions. Each individual function
consists of a neuron (or a perceptron).
* In fully connected layers, the neuron applies a linear transformation to the input vector through a weights matrix. A non-linear transformation is then applied to the product through a non-linear activation function f :
y = f( Σ_i W_i x_i + W_0 )
* Here, we are taking the dot product between the weights matrix W and the input vector x. The bias term (W_0) can be added inside the nonlinear function. We will ignore it for the rest of the discussion, as it does not affect the output sizes or decision-making and is just another weight.
* The activation function f wraps the dot product between the input of the layer and the weights matrix of that layer.
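* A minimal sketch of this fully connected layer follows; the weights matrix W, the input x and the choice of a ReLU activation f are illustrative assumptions.

```python
import numpy as np

def fully_connected(x, W, b, f=lambda z: np.maximum(z, 0.0)):
    """y = f(W x + b) : linear transformation followed by a non-linear activation."""
    return f(W @ x + b)

x = np.array([0.2, -1.3, 0.7])               # input vector (3 features)
W = np.random.randn(4, 3)                    # weights matrix: 4 output units x 3 inputs
b = np.zeros(4)                              # bias term (W_0), kept explicit here
print(fully_connected(x, W, b).shape)        # (4,)
```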
Basic structure of CNN
* Fig. 4.3.1 shows the basic architecture of a CNN.
Fig. 4.3.1 Basic architecture of CNN (convolution and pooling for feature extraction, followed by fully connected layers for classification)
* A convolutional neural network, as discussed above, has the following layers that are useful for various deep learning algorithms. Let us see the working of these layers taking an example of an image having dimensions of 12 x 12 x 4 (a short sketch after this list makes the volume arithmetic concrete). These are :
1. Input layer : This layer will accept the image of width 12, height 12 and depth 4.
2. Convolution layer : It computes the output volume by taking the dot product between each filter and the image patch under it. For example, if there are 10 filters, then the volume will be computed as 12 x 12 x 10.
3. Activation function layer : This layer applies an activation function to each element in the output of the convolutional layer. Some of the well accepted activation functions are ReLU, Sigmoid, Tanh, Leaky ReLU, etc. These functions will not change the volume obtained at the convolutional layer and hence it will remain equal to 12 x 12 x 10.
4. Pool layer : This layer mainly reduces the volume of the intermediate outputs, which enables fast computation of the network model and helps prevent overfitting.
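* The sketch below works through the volumes of this example with 10 filters of an assumed 3 x 3 spatial size (with zero padding of 1 so the spatial size stays 12 x 12), followed by ReLU and an assumed 2 x 2 max pooling; the filter size, padding and pooling window are illustrative choices, not prescribed by the text.

```python
import numpy as np

H, W, C, F, K = 12, 12, 4, 10, 3                    # input 12x12x4, 10 filters of size 3x3
image = np.random.randn(H, W, C)
filters = np.random.randn(F, K, K, C)

padded = np.pad(image, ((1, 1), (1, 1), (0, 0)))    # zero padding keeps the 12x12 size

conv = np.zeros((H, W, F))
for f in range(F):
    for i in range(H):
        for j in range(W):
            patch = padded[i:i + K, j:j + K, :]          # image patch under the filter
            conv[i, j, f] = np.sum(patch * filters[f])   # dot product of filter and patch

activated = np.maximum(conv, 0.0)                   # ReLU leaves the volume unchanged
pooled = activated.reshape(H // 2, 2, W // 2, 2, F).max(axis=(1, 3))   # 2x2 max pooling

print(conv.shape, activated.shape, pooled.shape)    # (12, 12, 10) (12, 12, 10) (6, 6, 10)
```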
Multilayer Perceptron
* The perceptron is very useful for classifying data sets that are linearly separable.
* The Multilayer Perceptron (MLP) model features multiple layers that are interconnected in such a way that they form a feed-forward neural network. Each neuron in one layer has directed connections to the neurons of a separate layer.
* It consists of three types of layers : the input layer, the output layer and the hidden layer.
* Fig. 4.3.2 shows the multilayer perceptron model.
* The input layer receives the input signal to be processed. The input layer distributes the values to each of the neurons in the hidden layer. In addition to the predictor variables, there is a constant input of 1.0, called the bias, that is fed to each of the hidden neurons. The bias is multiplied by a weight and added to the sum going into the neuron.
Fig. 4.3.2 Multilayer perceptron model (inputs #1 to #4, hidden layer, output layer)
* Hidden layer : Arriving at a neuron in the hidden layer, the value from each input neuron is multiplied by a weight and the resulting weighted values are added together, producing a combined value. The weighted sum is fed into a transfer function, which outputs a value. The outputs from the hidden layer are distributed to the output layer.
© Output layer : Arriving at a neuron in the output layer, the value from each hidden layer
neuron is multiplied by a weight, and the resulting weighted values are added together
producing a combined value. The weighted sum is fed into a transfer function, which
outputs a value. The output values are the outputs of the network.
© The required task such as prediction and classification is performed by the output layer. An
arbitrary number of hidden layers that are placed in between the input and output layer are
the true computational engine of the MLP.
* The neurons in the MLP are trained with the back propagation learning algorithm. MLPs
are designed to approximate any continuous function and can solve problems which are not
linearly separable. The major use cases of MLP are pattern classification, recognition,
prediction and approximation.
* The perceptron is very useful for classifying data sets that are linearly separable. It encounters serious limitations with data sets that do not conform to this pattern, as discovered with the XOR problem. The XOR problem shows that there exist classifications of four points that are not linearly separable.
* The Multilayer Perceptron breaks this restriction and classifies datasets which are not linearly separable. It does this by using a more robust and complex architecture to learn regression and classification models for difficult datasets.
* Deciding how many neurons to use in the hidden layers :
a) One of the most important characteristics of a perceptron network is the number of neurons in the hidden layer(s). If an inadequate number of neurons is used, the network will be unable to model complex data and the resulting fit will be poor.
b) If too many neurons are used, the training time may become excessively long and, worse, the network may overfit the data. When overfitting occurs, the network will begin to model random noise in the data. The result is that the model fits the training data extremely well, but it generalizes poorly to new, unseen data. Validation must be used to test for this.
Recurrent Neural Networks
A recurrent neural network is a type of neural network that contains loops, allowing
information to be stored within the network.
* An RNN is particularly useful when a sequence of data is being processed to make a classification decision or regression estimate, but it can also be used on non-sequential data.
Recurrent neural networks are typically used to solve tasks related to time series data.
Applications of recurrent neural networks include natural language processing, speech
recognition, machine translation, character-level language modeling, image classification,
image captioning, stock prediction, and financial engineering.
* Fig. 4.3.3 shows the architecture of a recurrent neural network.
Fig. 4.3.3 Architecture of RNN (input layer, hidden layers - "deep" if more than one - with recurrent connections, output layer (class/target))
+ Recurrent Neural Networks can be thought of as a series of networks linked together. They
often have a chain-like architecture, making them applicable for tasks such as speech
recognition, language translation, etc.
* An RNN can be designed to operate across sequences of vectors in the input, the output, or both. For example, a sequenced input may take a sentence as an input and output a positive or negative sentiment value. Alternatively, a sequenced output may take an image as an input and produce a sentence as an output.
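* A minimal sketch of the recurrence that gives an RNN its loop, processing a sequence one vector at a time, follows; the tanh cell, the weight shapes and the random input sequence are illustrative assumptions rather than a specific architecture from the text.

```python
import numpy as np

def rnn_forward(x_seq, W_xh, W_hh, b_h):
    """Simple recurrent cell: h_t = tanh(W_xh x_t + W_hh h_{t-1} + b_h)."""
    h = np.zeros(W_hh.shape[0])          # initial hidden state (the network's "memory")
    states = []
    for x_t in x_seq:                    # the loop over time steps is the recurrence
        h = np.tanh(W_xh @ x_t + W_hh @ h + b_h)
        states.append(h)
    return np.array(states)

seq = np.random.randn(5, 3)              # a sequence of 5 input vectors of size 3
W_xh = np.random.randn(4, 3)             # input-to-hidden weights (hidden size 4)
W_hh = np.random.randn(4, 4)             # hidden-to-hidden (recurrent) weights
print(rnn_forward(seq, W_xh, W_hh, np.zeros(4)).shape)   # (5, 4): one state per time step
```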
Radial Basis Function Networks - Working, Architecture, Advantages
* A radial basis function network is a neural network approached by viewing the design as a curve-fitting (approximation) problem in a high dimensional space. Learning is equivalent to finding a multidimensional function that provides a best fit to the training data, with the criterion for "best fit" being measured in some statistical sense.
* Among the various types of neural networks, radial basis function neural networks (RBFNN) are a unique class that have proved to be highly effective in various applications including function approximation, time series prediction, classification and control.
Radial Basis Functions (RBFs) are a special category of feed-forward neural networks
comprising three layers :
- Input layer : Receives input data and passes it to the hidden layer.
- Hidden layer : The core computational layer where RBF neurons process the data.
- Output layer : Produces the network's predictions, suitable for classification or regression tasks.
Working of radial basis function networks
* RBF Networks are conceptually similar to k-Nearest Neighbor (k-NN) models, though their
implementation is distinct. The fundamental idea is that an item's predicted target value is
influenced by nearby items with similar predictor variable values.
* RBFNs start with an n-dimensional input vector. This vector is fed into the input layer of the network.
* The network also has a hidden layer, which comprises Radial Basis Function (RBF)
neurons. Each neuron in the hidden layer represents a prototype vector from the training set.
The network computes the Euclidean distance between the input vector and each neuron's
center.
* Each of these RBF neurons has a center, and they measure how close the input is to their center. The Euclidean distance is transformed using a radial basis function, called a Gaussian transfer function, to compute the neuron's activation value. This value decreases exponentially as the distance increases. The output of this function is higher when the input is close to the neuron's center and lower when the input is far away.
* The outputs from the hidden layer are then combined in the output layer. Each node in the output layer corresponds to a different category or class of data. The network determines the input's class by calculating a weighted sum of the outputs from the hidden layer.
+ The final output of the network is a combination of these weighted sums, which is used to
classify the input. Each output node calculates a score based on a weighted sum of the
activation values from all RBF neurons. The category with the highest score is chosen for
classification.
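* A minimal sketch of the forward computation just described follows: the Euclidean distance to each prototype (center), a Gaussian transfer function, and a weighted sum per output class; the centers, spread and output weights are illustrative assumptions.

```python
import numpy as np

def rbf_forward(x, centers, sigma, W_out):
    """RBF network forward pass: Gaussian activations, then a weighted sum per class."""
    dists = np.linalg.norm(centers - x, axis=1)          # distance to each neuron's center
    phi = np.exp(-dists ** 2 / (2 * sigma ** 2))         # activation falls off with distance
    return W_out @ phi                                   # one weighted-sum score per class

centers = np.array([[0.0, 0.0], [1.0, 1.0], [0.0, 1.0]])  # 3 prototype vectors
W_out = np.random.randn(2, 3)                              # 2 classes x 3 RBF neurons
scores = rbf_forward(np.array([0.2, 0.9]), centers, sigma=0.5, W_out=W_out)
print(scores.argmax())                                     # category with the highest score
```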
RBF network architecture
* The typical architecture of a radial basis function neural network consists of an input layer, a hidden layer and a summation layer.
Input layer :
* The input layer consists of one neuron for every predictor variable. The input neurons pass the value to each neuron in the hidden layer. N−1 neurons are used for categorical values, where N denotes the number of categories. The range of values is standardized by subtracting the median and dividing by the interquartile range.
Hidden layer :
« The hidden layer contains a variable number of neurons (the ideal number determined by
the training process). Each neuron comprises a radial basis function centered on a point.
The number of dimensions coincides with the number of predictor variables. The radius or
spread of the RBF function may vary for each dimension.
* When an x vector of input values is fed from the input layer, a hidden neuron calculates the Euclidean distance between the test case and the neuron's center point. It then applies the kernel function using the spread values. The resulting value gets fed into the summation layer.
* Every RBF neuron stores a prototype vector (also known as the neuron's center) from amongst the vectors of the training set. An RBF neuron compares the input vector with its prototype and outputs a value between 0 and 1 as a measure of similarity. If an input is the same as the prototype, the neuron's output will be 1. As the difference between the input and the prototype grows, the output falls exponentially towards 0. The shape of the response by the RBF neuron is a bell curve. The response value is also called the activation value.
Summation layer :
* The value coming out of a neuron in the hidden layer is multiplied by a weight related to the neuron and the weighted values are added together; the sum is fed to the summation layer. Classification networks have one output per target category, the value being the probability that the case being evaluated has that category.
* The network's output comprises a set of scores, one for each category you are trying to classify. Each output node computes a score for its category; normally, the classification decision is taken by assigning the input to the category with the highest score.
* The score is calculated based on a weighted sum of the activation values from all RBF neurons. It usually gives a positive weight to the RBF neurons belonging to its category and a negative weight to the others. Each output node has its own set of weights.
Advantages :
1. RBF has strong resistance to input noise.
2. Radial basis functions can be used to solve problems that exist in datasets that have complex non-linear distributions such as logarithmic functions, trigonometric functions, power functions and Gaussian functions.
3. After utilizing the radial basis function, hidden patterns in the distribution can be generalized in a better way.
4. Dealing with one hidden layer is quite easy in RBF.
5. It is possible to interpret the exact meaning of each node present in the hidden layer of the radial basis function network.
Deep Belief Networks
Restricted Boltzmann Machines (RBMs)
* In a full Boltzmann machine, each node is connected to every other node and hence the connections grow exponentially. This is the reason RBMs are used. The restrictions on the node connections in RBMs are as follows :
o Hidden nodes cannot be connected to one another.
o Visible nodes cannot be connected to one another.
* The restriction means that one is not permitted to connect two layers that are of the same type to one another. In other words, the two hidden layers or input layers are unable to form connections with one another. However, there may be connections between the visible and hidden layers.
* Since there is no output layer in the machine, it is unclear how the weights would be updated and modified, and how to determine whether or not the prediction was right. One response fits all these questions : the Restricted Boltzmann Machine.
* The neural network that is a part of the energy-based model is called an RBM. It is a generative, unsupervised, probabilistic deep learning algorithm. Finding the joint probability distribution that maximizes the log-likelihood function is the goal of the RBM.
* An RBM only has two layers, the input layer and the hidden layer, and it is undirected. All of the hidden nodes are linked to all of the visible nodes. The RBM is also referred to as an asymmetrical bipartite graph since it has two layers : a visible or input layer and a hidden layer. The visible nodes don't have any connections inside the same layer. The hidden nodes are not connected intralayer either. Only the input and hidden nodes have connections.
* All of the nodes in the original Boltzmann machine are connected. The RBM is referred to as a Restricted Boltzmann Machine since it restricts intralayer connectivity.
* RBMs do not modify their weights through backpropagation and gradient descent since they are undirected. They change their weights using a technique known as contrastive divergence. The visible nodes' weights are initially created at random and utilized to create the hidden nodes. Then, these hidden nodes reconstruct the visible nodes using the same weights. Throughout, the same weights are utilized to reconstruct the visible nodes. Due to their lack of connectivity, the created nodes are different from one another.
Fig. 4.3.4 RBM (visible units and hidden units)
Some key characteristics of the Restricted Boltzmann Machine are :
o There are no connections within a layer.
o They employ symmetric and recurring structures.