FAI DeepLearning

Deep learning is a subset of machine learning that utilizes multi-layer neural networks to learn complex representations of data, inspired by the structure of the human brain. Neural networks consist of interconnected neurons that process information through weighted inputs and activation functions, enabling them to learn from examples and generalize. The backpropagation algorithm is commonly used for training these networks by adjusting weights based on the error between predicted and actual outputs.

Concept of Deep Learning

* Deep learning is a new area of machine learning research, which has been introduced with the objective of moving machine learning closer to one of its original goals.
* Deep learning is about learning multiple levels of representation and abstraction that help to make sense of data such as images, sound and text.
* 'Deep learning' means using a neural network with several layers of nodes between input and output. It is generally better than other methods on image, speech and certain other types of data because the series of layers between input and output do feature identification and processing in a series of stages, just as our brains seem to.
* Deep learning emphasizes the network architecture of today's most successful machine learning approaches. These methods are based on "deep" multi-layer neural networks with many hidden layers.

The Neuron

* Artificial neural systems are inspired by biological neural systems. The elementary building block of biological neural systems is the neuron.
* The brain is a collection of about 10 billion interconnected neurons. Each neuron is a cell that uses biochemical reactions to receive, process and transmit information. Fig. 4.1.1 shows a biological neuron.

Fig. 4.1.1 Schematic of biological neuron (soma, nucleus, dendrites, axon, axon hillock, terminal buttons)

* The single-cell neuron consists of the cell body or soma, the dendrites and the axon. The dendrites receive signals from the axons of other neurons. The small space between the axon of one neuron and the dendrite of another is the synapse. The afferent dendrites conduct impulses toward the soma, and the efferent axon conducts impulses away from the soma.
* Artificial neural networks are loosely modeled on the human brain. The field is also referred to as connectionism, parallel distributed processing or neuro-computing, and covers machine learning algorithms for artificial neural systems.
* A neural network is an information processing paradigm that is inspired by the way biological nervous systems process information : it is composed of "a large number of highly interconnected simple processing elements (neurons) working together to solve specific problems".

Simple artificial neuron

* The basic computational element is often called a node or unit. It receives input from some other units or from an external source.
* Each input has an associated weight w, which can be modified so as to model synaptic learning.
* The unit computes some function of the weighted sum of its inputs.
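As a small illustration of the simple artificial neuron described above, the following Python sketch computes the weighted sum of the inputs plus a bias and passes it through a threshold activation. NumPy is assumed, and the function names and the particular weights are illustrative only, not taken from the text.

    import numpy as np

    def step_activation(s):
        # Threshold activation : the unit "fires" (outputs 1) when the net input is non-negative.
        return 1 if s >= 0 else 0

    def neuron_output(inputs, weights, bias):
        # Net input : weighted sum of the inputs plus a bias term, then the activation function.
        s = np.dot(inputs, weights) + bias
        return step_activation(s)

    x = np.array([1.0, 0.0])                 # example inputs
    w = np.array([0.5, 0.5])                 # associated weights (model the synaptic strengths)
    print(neuron_output(x, w, bias=-0.4))    # prints 1, since 0.5 - 0.4 >= 0

Replacing the threshold with a sigmoid gives the smooth units used later in this chapter.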
Neural Networks (NN)

* A Neural Network is usually structured into an input layer of neurons, one or more hidden layers and one output layer. Neurons belonging to adjacent layers are usually fully connected, and the various types and architectures are identified both by the different topologies adopted for the connections as well as by the choice of the activation function.
* The values of the functions associated with the connections are called "weights". The whole game of using NNs is in the fact that, in order for the network to yield appropriate outputs for given inputs, the weights must be set to suitable values. The way this is obtained allows a further distinction among modes of operation.
* A neural network is a processing device, either an algorithm or actual hardware, whose design was motivated by the design and functioning of human brains and components thereof.
* Most neural networks have some sort of "training" rule whereby the weights of connections are adjusted on the basis of presented patterns.
* In other words, neural networks "learn" from examples, just like children learn to recognize dogs from examples of dogs, and exhibit some structural capability for generalization.
* Neural networks normally have great potential for parallelism, since the computations of the components are independent of each other.
* Neural networks are a different paradigm for computing :
1. Von Neumann machines are based on the processing / memory abstraction of human information processing.
2. Neural networks are based on the parallel architecture of animal brains.
* Neural networks are a form of multiprocessor computer system, with :
a. Simple processing elements
b. A high degree of interconnection
c. Simple scalar messages
d. Adaptive interaction between elements.

Expressing Linear Perceptrons as Neurons

* The perceptron is a kind of single-layer artificial network with only one neuron. The perceptron is a network in which the neuron unit calculates the linear combination of its real-valued or boolean inputs and passes it through a threshold activation function.
* Fig. 4.1.2 shows the basic perceptron. The perceptron is sometimes referred to as a Threshold Logic Unit (TLU), since it discriminates the data depending on whether the sum is greater than the threshold value.

Fig. 4.1.2 Basic perceptron

* The output of the neuron is a linear combination of the inputs rescaled by the synaptic weights.
* Learning is initiated by making adjustments to the relevant connection strengths and a threshold value.
* Here we consider only the two-class problem, where the output layer usually has only a single node. For an n-class problem (n >= 3), the output layer usually has n nodes, each corresponding to a class, and the output node with the largest value indicates which class the input vector belongs to.
* In the first stage, the linear combination of inputs is calculated. Each value of the input array is associated with its weight value, which is normally between 0 and 1. Also, the summation function often takes an extra input value theta with weight value of 1 to represent the threshold or bias of a neuron.
* In the simplest case the network has only two inputs and a single output. The output of the neuron is :

output = f ( Σ_i w_i x_i + θ )

* Suppose that the activation function is a threshold, then

f(s) = 1 if s >= 0
f(s) = 0 if s < 0

* The perceptron can represent most of the primitive boolean functions : AND, OR, NAND and NOR, but cannot represent XOR.
* In a single-layer perceptron, initial weight values are assigned randomly because it does not have previous knowledge. It sums all the weighted inputs. If the sum is greater than the threshold value then it is activated, i.e. output = 1 :

output = 1 if w_1 x_1 + w_2 x_2 + ... + w_n x_n > 0
output = 0 if w_1 x_1 + w_2 x_2 + ... + w_n x_n <= 0

* The input values are presented to the perceptron, and if the predicted output is the same as the desired output, then the performance is considered satisfactory and no changes to the weights are made.
* If the output does not match the desired output, then the weights need to be changed to reduce the error. The weight adjustment is done as follows :

Δw = η · d · x

where x is the input data, d is the difference between the desired and the predicted output, and η is the learning rate.
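The weight-adjustment rule Δw = η · d · x can be turned into a small training loop. The sketch below assumes NumPy and uses illustrative names such as train_perceptron; it learns the AND function (a linearly separable problem, as noted above), with an extra weight driven by a constant input of 1 playing the role of the threshold.

    import numpy as np

    def train_perceptron(X, targets, eta=0.1, epochs=20):
        # Append a constant input of 1 so that one extra weight acts as the threshold (bias).
        X = np.hstack([X, np.ones((X.shape[0], 1))])
        w = np.zeros(X.shape[1])
        for _ in range(epochs):
            for x, t in zip(X, targets):
                y = 1 if np.dot(w, x) > 0 else 0   # threshold activation
                d = t - y                          # error: desired minus predicted output
                w += eta * d * x                   # delta rule: dw = eta * d * x
        return w

    # The AND function is linearly separable, so a single-layer perceptron can learn it.
    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
    t = np.array([0, 0, 0, 1])
    print(train_perceptron(X, t))

Running the same loop on the XOR truth table fails to find weights that classify all four patterns, which is exactly the limitation discussed in the following paragraphs.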
* It is not possible to find weights which enable single-layer perceptrons to deal with non-linearly separable problems like XOR.
* Multi-layer perceptrons are able to cope with non-linearly separable problems. Each neuron in one layer has direct connections to all the neurons of the subsequent layer. MLPs can implement nonlinear discriminants (for classification) and nonlinear regression functions (for regression).
* Historically, the problem was that there were no known learning algorithms for training MLPs. Fortunately, it is now known to be quite straightforward. The procedure for finding a gradient vector in the network structure is generally referred to as backpropagation, because the gradient vector is calculated in the direction opposite to the flow of the output of each node.

Fig. 4.1.4 XOR, OR and AND

* Procedure of backpropagation :
1. The output values are compared with the target to compute the value of some predefined error function.
2. The error is then fed back through the network.
3. Using this information, the algorithm adjusts the weights of each connection in order to reduce the value of the error function.
* Continue this process until the connection weights in the network have been adjusted so that the network output has converged, to an acceptable level, with the desired output.
* If we use the gradient vector in a simple steepest descent method, the resulting learning paradigm is often referred to as the backpropagation learning rule. Backpropagation works by approximating the non-linear relationship between the input and the output by adjusting the weight values internally.
* Generally, the backpropagation network has two stages, training and testing. During the training phase, the network is "shown" sample inputs and the correct classifications. For example, the input might be an encoded picture of a face and the output could be represented by a code that corresponds to the name of the person.
* Fig. 4.1.5 shows the three most commonly used activation functions in backpropagation MLPs.

Fig. 4.1.5 Activation functions : identity, logistic and hyperbolic tangent

Logistic function : f(x) = 1 / (1 + e^(-x))
Hyperbolic tangent function : f(x) = tanh(x/2) = (1 - e^(-x)) / (1 + e^(-x))
Identity function : f(x) = x

* Both the hyperbolic tangent function and the logistic function approximate the signum and step function respectively. Sometimes these two functions are referred to as squashing functions, since the inputs to these functions are squashed to the range [0, 1] or [-1, 1]. These functions are also called sigmoidal functions because their S-shaped curves exhibit smoothness and asymptotic properties.
* A learning process is organized through a learning algorithm, which is a process of updating the weights in such a way that a machine learning tool implements a given input/output mapping with no errors or with some minimal acceptable error. Any learning algorithm is based on a certain learning rule, which determines how the weights shall be updated if an error occurs.

Backpropagation learning rule

* The net input of a node is defined as the weighted sum of the incoming signals plus a bias term. Fig. 4.1.6 shows the backpropagation MLP for node j. The net input and output of node j are as follows :

x̄_j = Σ_i w_ij x_i + w_j
x_j = f(x̄_j) = 1 / (1 + exp(-x̄_j))

Fig. 4.1.6 Backpropagation MLP for node j

where
x_i is the output of node i located in any one of the previous layers,
w_ij is the weight associated with the link connecting nodes i and j, and
w_j is the bias of node j.

* The internal parameters associated with each node j are the weights w_ij, so changing the weights of the node will change the behaviour of the whole backpropagation MLP.
* Fig. 4.1.7 shows a two-layer backpropagation MLP.

Fig. 4.1.7 Two-layer backpropagation MLP

* The above backpropagation MLP will be referred to as a 3-4-3 network, corresponding to the number of nodes in each layer.
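A forward pass through the 3-4-3 backpropagation MLP just described can be sketched as follows. The snippet assumes NumPy; the random weights and the helper names are illustrative, and the weight updates of the backward pass are omitted.

    import numpy as np

    def logistic(x_bar):
        # Node output f(x) = 1 / (1 + exp(-x)), applied to the net input of each node.
        return 1.0 / (1.0 + np.exp(-x_bar))

    rng = np.random.default_rng(0)

    # A 3-4-3 network : 3 inputs, 4 hidden nodes, 3 output nodes (weights chosen at random).
    W_hidden = rng.normal(size=(4, 3)); b_hidden = rng.normal(size=4)
    W_output = rng.normal(size=(3, 4)); b_output = rng.normal(size=3)

    x = np.array([0.2, 0.7, -0.1])
    hidden = logistic(W_hidden @ x + b_hidden)     # net input of each hidden node, then f
    output = logistic(W_output @ hidden + b_output)
    print(output)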
* The backward error propagation is also known as backpropagation (BP) or the Generalized Delta Rule (GDR). A squared error measure for the p-th input-output pair is defined as :

E_p = Σ_k (d_k - x_k)²

where d_k is the desired output for node k and x_k is the actual output for node k when the input part of the p-th data pair is presented.
* To find the gradient vector, an error term ε_i for node i is defined as :

ε_i = ∂E_p / ∂x̄_i

* The partial derivative can be rewritten as the product of two terms using the chain rule for partial differentiation :

∂E_p / ∂w_ij = (∂E_p / ∂x̄_j) · (∂x̄_j / ∂w_ij)

Sigmoid

* A sigmoid function produces a curve with an "S" shape. The example sigmoid function shown in Fig. 4.1.8 is a special case of the logistic function, which models the growth of some set :

sig(x) = 1 / (1 + e^(-x))

Fig. 4.1.8 Sigmoid function

* In general, a sigmoid function is real-valued and differentiable, having a non-negative or non-positive first derivative, one local minimum and one local maximum.
* The logistic sigmoid function is related to the hyperbolic tangent as follows :

1 - 2 sig(x) = 1 - 2 / (1 + e^(-x)) = -tanh(x/2)

* Sigmoid functions are often used in artificial neural networks to introduce nonlinearity in the model. A neural network element computes a linear combination of its input signals and applies a sigmoid function to the result.
* A reason for its popularity in neural networks is that the sigmoid function satisfies a property between the derivative and itself such that it is computationally easy to compute :

d/dt sig(t) = sig(t) (1 - sig(t))

* Derivatives of the sigmoid function are usually employed in learning algorithms.

Zero - Centering

* Feature normalization is often required to neutralize the effect of different quantitative features being measured on different scales. If the features are approximately normally distributed, we can convert them into z-scores by centring on the mean and dividing by the standard deviation. If we don't want to assume normality, we can centre on the median and divide by the interquartile range.
* Sometimes feature normalization is understood in the stricter sense of expressing the feature on a [0, 1] scale. If we know the feature's highest and lowest values h and l, then we can simply apply linear scaling.
* Feature calibration is understood as a supervised feature transformation adding a meaningful scale carrying class information to arbitrary features. This has a number of important advantages. For instance, it allows models that require scale, such as linear classifiers, to handle categorical and ordinal features. It also allows the learning algorithm to choose whether to treat a feature as categorical, ordinal or quantitative.
* The goal of both types of normalization is to make it easier for your learning algorithm to learn. In feature normalization, there are two standard things to do :
1. Centering : Moving the entire data set so that it is centered around the origin.
2. Scaling : Rescaling each feature so that one of the following holds :
a) Each feature has variance 1 across the training data.
b) Each feature has maximum absolute value 1 across the training data.
* The goal of centering is to make sure that no features are arbitrarily large.

Tanh and ReLU Neuron

* Tanh is also like the logistic sigmoid, but better. The range of the tanh function is from -1 to 1. Tanh is also sigmoidal (S-shaped).
* Fig. 4.1.9 shows tanh v/s logistic sigmoid.

Fig. 4.1.9 tanh v/s Logistic Sigmoid

* The tanh neuron is simply a scaled sigmoid neuron. Problems of the sigmoid addressed by tanh :
1. The output is not zero centered.
2. Small gradient of the sigmoid function.
* ReLU (Rectified Linear Unit) is the most used activation function in the world right now, since it is used in almost all convolutional neural networks and deep learning models.
* Fig. 4.1.10 shows ReLU v/s logistic sigmoid.

Fig. 4.1.10 ReLU v/s Logistic Sigmoid

* As can be seen, the ReLU is half rectified (from the bottom) : f(z) is zero when z is less than zero, and f(z) is equal to z when z is greater than or equal to zero.
* Compared to tanh/sigmoid neurons that involve expensive operations (exponentials), the ReLU can be implemented by simply thresholding a matrix of activations at zero.
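The activation functions discussed in this section, and the derivative property of the sigmoid, can be checked numerically. A minimal sketch assuming NumPy (the function names are illustrative):

    import numpy as np

    def sig(x):
        # Logistic sigmoid : squashes inputs to the range (0, 1).
        return 1.0 / (1.0 + np.exp(-x))

    def d_sig(x):
        # d/dx sig(x) = sig(x) * (1 - sig(x)) : the property used by learning algorithms.
        return sig(x) * (1.0 - sig(x))

    def relu(x):
        # Half rectified : 0 for x < 0, x otherwise.
        return np.maximum(0.0, x)

    x = np.linspace(-5, 5, 11)
    print(sig(x))                                          # values in (0, 1)
    print(np.tanh(x))                                      # values in (-1, 1), zero-centered
    print(relu(x))
    print(d_sig(0.0))                                      # 0.25, the maximum slope of the sigmoid
    print(np.allclose(1 - 2 * sig(x), -np.tanh(x / 2)))    # relation between sigmoid and tanh : True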
Introduction to Neural Networks

* An Artificial Neural Network (ANN) is a computational system inspired by the structure, processing method and learning ability of a biological brain. An artificial neural network is composed of many artificial neurons that are linked together according to a specific network architecture. The objective of the neural network is to transform the inputs into meaningful outputs.
* ANNs do not execute programmed instructions; they respond in parallel to the pattern of inputs presented to them. There are also no separate memory addresses for storing data. Instead, information is contained in the overall activation 'state' of the network. 'Knowledge' is thus represented by the network itself, which is quite literally more than the sum of its individual components.
* Fig. 4.2.1 shows an artificial neural network.

Fig. 4.2.1 Artificial neural network

* Elements of an ANN are processing units, topology and learning algorithm.

Tasks to be solved by artificial neural networks :
1. Controlling the movements of a robot based on self-perception and other information;
2. Deciding the category of potential food items in an artificial world;
3. Recognizing a visual object;
4. Predicting where a moving object goes, when a robot wants to catch it.

Characteristics of Artificial Neural Networks :
1. Large number of very simple neuron-like processing elements.
2. Large number of weighted connections between the elements.
3. Distributed representation of knowledge over the connections.
4. Knowledge is acquired by the network through a learning process.

4.2.1 Biological Content

Basic components of biological neurons :
1. The majority of neurons encode their activations or outputs as a series of brief electrical pulses (i.e. spikes or action potentials).
2. The neuron's cell body (soma) processes the incoming activations and converts them into output activations.
3. The neuron's nucleus contains the genetic material in the form of DNA. This exists in most types of cells, not just neurons.
4. Dendrites are fibres which emanate from the cell body and provide the receptive zones that receive activation from other neurons.
5. Axons are fibres acting as transmission lines that send activation to other neurons.
6. The junctions that allow signal transmission between the axons and dendrites are called synapses. The process of transmission is by diffusion of chemicals called neurotransmitters across the synaptic cleft.

* Comparison between Biological NN and Artificial NN :

Biological NN        Artificial NN
soma                 unit
axon, dendrite       connection
synapse              weight
potential            weighted sum
threshold            bias weight
signal               activation

Neural Network Representation

* Neural networks consist of a large number of simple elements (neurons) connected between them in a system. The whole system is able to solve complex tasks and to learn them, like a natural brain.
* For the user, an NN is a black box with an input vector (source data) and an output vector (result).
* A Neural Network is usually structured into an input layer of neurons, one or more hidden layers and one output layer.
* Neurons belonging to adjacent layers are usually fully connected, and the various types and architectures are identified both by the different topologies adopted for the connections as well as by the choice of the activation function.
* The values of the functions associated with the connections are called "weights".
* The whole game of using NNs is in the fact that, in order for the network to yield appropriate outputs for given inputs, the weights must be set to suitable values. The way this is obtained allows a further distinction among modes of operation.
* A neural network is a processing device, either an algorithm or actual hardware, whose design was motivated by the design and functioning of human brains and components thereof.
* Most neural networks have some sort of "training" rule whereby the weights of connections are adjusted on the basis of presented patterns.
* In other words, neural networks "learn" from examples, just like children learn to recognize dogs from examples of dogs, and exhibit some structural capability for generalization.
* Neural networks normally have great potential for parallelism, since the computations of the components are independent of each other.
* Neural networks are a different paradigm for computing :
1. Von Neumann machines are based on the processing / memory abstraction of human information processing.
2. Neural networks are based on the parallel architecture of animal brains.
* Neural networks are a form of multiprocessor computer system, with :
a. Simple processing elements
b. A high degree of interconnection
c. Simple scalar messages
d. Adaptive interaction between elements.

NN Architecture : Single Layer Network

* The architecture of the neural network refers to the arrangement of the connections between neurons and processing elements, the number of layers and the flow of signals in the neural network.

Fig. 4.2.3 Categories of neural network architecture : feed-forward networks (single-layer perceptron, multilayer perceptron, radial basis function net) and recurrent networks (competitive networks, Kohonen SOM, ...)

* There are mainly two categories of neural network architecture :
a. Feed-forward networks
b. Feedback (recurrent) neural networks.

1. Architecture and learning rule

* In the late 1950s, Frank Rosenblatt introduced a network composed of units that were an enhanced version of the McCulloch-Pitts Threshold Logic Unit (TLU) model.
* Rosenblatt's model of a neuron, the perceptron, was the result of a merger between two concepts from the 1940s : the McCulloch-Pitts model of an artificial neuron and the Hebbian learning rule for adjusting weights.
* In addition to the variable weight values, the perceptron model added an extra input that represents the bias. Thus, the modified equation is now as follows :

Sum = Σ_{i=1..n} I_i w_i + b

where b represents the bias value.
* Fig. 4.2.4 shows a typical perceptron setup for pattern recognition applications, in which visual patterns are represented as matrices of elements between 0 and 1.

Fig. 4.2.4 Perceptron setup

1. The first layer acts as a set of feature detectors that are hardwired to the input signals to detect specific features.
2. The second layer, i.e. the output layer, takes the outputs of the feature detectors in the first layer and classifies the given input pattern.
* Learning is initiated by making adjustments to the relevant connection strengths and a threshold value.
* Here we consider only the two-class problem, where the output layer usually has only a single node.
* For an n-class problem (n >= 3), the output layer usually has n nodes, each corresponding to a class, and the output node with the largest value indicates which class the input vector belongs to.
* In the first stage, the linear combination of inputs is calculated. Each value of the input array is associated with its weight value, which is normally between 0 and 1. Also, the summation function often takes an extra input value theta with weight value of 1 to represent the threshold or bias of a neuron.

Assumptions :
1. At least one such set of weights, w*, exists.
2. There are a finite number of training patterns.
3. The threshold function is uni-polar (output is 0 or 1).

2. Exclusive OR problem

* The XOR problem is a pattern recognition problem in neural networks.
* Neural networks can be used to classify boolean functions depending on their desired outputs.
* For a two-input binary XOR problem, the desired output is given in the form of a truth table :

x1   x2   Desired output
0    0    0
0    1    1
1    0    1
1    1    0

* The XOR problem is not linearly separable. We cannot use a single-layer perceptron to construct a straight line to partition the two-dimensional input space into two regions, each containing only data points of the same class.
* Let us consider the following four conditions :

0 × w1 + 0 × w2 + w0 <= 0   =>   w0 <= 0
0 × w1 + 1 × w2 + w0 > 0    =>   w0 > -w2
1 × w1 + 0 × w2 + w0 > 0    =>   w0 > -w1
1 × w1 + 1 × w2 + w0 <= 0   =>   w0 <= -w1 - w2

* These four conditions cannot be satisfied simultaneously, which confirms that no single-layer perceptron can realize XOR.
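The XOR limitation can be overcome with a tiny multi-layer network. The hand-wired two-layer perceptron below (Python with NumPy; the thresholds are chosen by hand and are illustrative) computes XOR as AND(OR(x1, x2), NAND(x1, x2)), something no single-layer perceptron can do.

    import numpy as np

    def step(s):
        # Threshold activation : output 1 when the weighted sum exceeds the threshold.
        return (s > 0).astype(int)

    def xor_net(x1, x2):
        # Hidden layer computes OR and NAND; the output node computes AND of the two.
        x = np.array([x1, x2])
        h_or   = step(np.dot([1, 1], x) - 0.5)     # OR   : x1 + x2 > 0.5
        h_nand = step(np.dot([-1, -1], x) + 1.5)   # NAND : -(x1 + x2) > -1.5
        return step(h_or + h_nand - 1.5)           # AND of the two hidden outputs

    for a in (0, 1):
        for b in (0, 1):
            print(a, b, xor_net(a, b))             # reproduces the XOR truth table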
Types of Deep Learning Models

Concept of Deep Learning Models

1. Deep learning models are complex networks that learn independently without human intervention.
2. A deep learning model applies algorithms to immense data sets to find patterns and solutions within the information.
3. Deep learning models usually have three or more layers of neural networks to help process data. These models have the ability to process data that is unstructured or unlabeled, creating their own methods for identifying and understanding the information without a person telling the computer what to look for or solve.
4. Deep learning models can identify both higher-level and lower-level information, hence they can take difficult-to-understand data sets and create simpler, more efficient categories. This ability allows the deep learning model to grow more accurate over time.
5. Deep learning models work by interacting with immense data sets and extracting patterns and solutions from them through learning styles similar to what humans naturally do. They use artificial neural networks to parse and process data sets. The networks operate using algorithms, which allow the computer to adapt and learn on its own without needing a human to guide the learning.
6. Each type of deep learning model uses the same learning and training process, though they are used for different applications. To train a deep learning model, huge data sets need to be fed into the network. This information passes from neuron to neuron, allowing the computer to analyze and understand the data as it moves through the network.
7. Deep learning models have the ability to analyze and process immense sets of unlabeled, unstructured data, often too complex and unwieldy for humans to process on their own.
8. Deep learning models can learn information they aren't specifically trained on, such as recommending new media based on one's viewing habits compared to other users.
9. Deep learning models are scalable and fast, so they have the ability to handle whatever data sets are to be processed without large setup or maintenance. However, for effective and successful deep learning, complex setups are mandatory.
10. Small data sets and data privacy constraints may impact the success of deep learning.

Deep Learning Model Types

Convolutional Neural Networks

Fully connected layers

* Fully connected layers have the normal parameters for the layer as well as hyperparameters. This layer performs transformations on the input data volume that are a function of the activations in the input volume and the parameters.
* Neural networks are a set of dependent nonlinear functions. Each individual function consists of a neuron (or a perceptron). In fully connected layers, the neuron applies a linear transformation to the input vector through a weights matrix. A non-linear transformation is then applied to the product through a non-linear activation function f :

y = f ( W x + w_0 )

* Here, we are taking the dot product between the weights matrix W and the input vector x. The bias term (w_0) can be added inside the non-linear function. It can be ignored in what follows, as it does not affect the output sizes or decision-making and is just another weight.
* The activation function f wraps the dot product between the input of the layer and the weights matrix of that layer.
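The fully connected computation y = f(Wx + w_0) described above can be written directly. A minimal sketch assuming NumPy, with ReLU standing in for the generic activation f and randomly chosen weights:

    import numpy as np

    def relu(z):
        return np.maximum(0.0, z)

    def fully_connected(x, W, w0, f=relu):
        # y = f(W x + w0) : linear transformation through the weights matrix,
        # followed by a non-linear activation function f.
        return f(W @ x + w0)

    rng = np.random.default_rng(1)
    x = rng.normal(size=5)           # input vector
    W = rng.normal(size=(3, 5))      # weights matrix : 5 inputs -> 3 outputs
    w0 = np.zeros(3)                 # bias term
    print(fully_connected(x, W, w0))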
In addition to the predictor Variables there is @ constant input of 1.0, called the bias that is fed to each of the hidden layers, yy. bias is multiplied by a weight and added to the sum going into the neuron. Output it Inpul Hidden layer aver layer Input #4 Input #2 Input #3 —=( Input #4 —m{ Fig, 4.3.2 Multilayer perceptron model * Hidden layer : Arriving at a neuron in the hidden layer, the value from each input neuron is multiplied by a weight and the resulting weighted values are added together producing a combined value. The weighted sum is fed into a transfer function, which outputs a value, The outputs from the hidden layer are distributed to the output layer. © Output layer : Arriving at a neuron in the output layer, the value from each hidden layer neuron is multiplied by a weight, and the resulting weighted values are added together producing a combined value. The weighted sum is fed into a transfer function, which outputs a value. The output values are the outputs of the network. © The required task such as prediction and classification is performed by the output layer. An arbitrary number of hidden layers that are placed in between the input and output layer are the true computational engine of the MLP. * The neurons in the MLP are trained with the back propagation learning algorithm. MLPs are designed to approximate any continuous function and can solve problems which are not linearly separable. The major use cases of MLP are pattern classification, recognition, prediction and approximation. * The perceptron is very useful for classifying data sets that are linearly separable. They encounter serious limitations with data sets that do not conform to this pattern as discovered with the XOR problem. The XOR problem shows that for any classification of four points that there exists a set that are not linearly separable. TECHNICAL PUBLICATIONS® - an up-thrust for knowledge ee a 0 ee santa of Al 423 ‘Deep Learning, ‘the Multilayer Perception breaks this restriction and classifies datasets which are not : jinearly separable. ‘They do this by using a more robust and complex architecture to learn regression and classification models for difficult datasets. 4. Deciding how many neurons to use in the hidden layers : a) One of the most important characteristics of a perceptron network is the number of neurons in the hidden layer(s). If an inadequate number of neurons are used, the network will be unable to model complex data and the resulting fit will be poor. b) If too many neurons are used, the training time may become excessively longs and, worse, the network may over fit the data, When overfitting occurs, the network will begin to model random noise in the data. The result is that the model fits the training data extremely well, but it generalizes poorly to new, unseen data. Validation must be used to test for this. oa Recurrent Neural Networks A recurrent neural network is a type of neural network that contains loops, allowing information to be stored within the network. + A-RNN is particularly useful when a sequence of data is being processed to make a classification decision or regression estimate but it can also be used on non-sequential data. Recurrent neural networks are typically used to solve tasks related to time series data. Applications of recurrent neural networks include natural language processing, speech recognition, machine translation, character-level language modeling, image classification, image captioning, stock prediction, and financial engineering. 
« Fig, 4.3.3 shows architecture of recurrent neural network. Recurrent network a “~~ Output layer Input layer Soasd sii ais (class/target) Hidden layers : "deep" if> 1 Fig. 4.3.3 Architecture of RNN + Recurrent Neural Networks can be thought of as a series of networks linked together. They often have a chain-like architecture, making them applicable for tasks such as speech recognition, language translation, etc. TECHNICAL PUBLICATIONS® - an up-thrust for knowledge ne Fundamental of Al 4224 Deep Learing * AM RNN ean be designed to operate across sequences of vectors in the input, output, op both. For example, a sequenced input may take a sentence as an input and Output a positive or negative sentiment value, Alternatively, a sequenced output may take an image as an input and produce a sentence as an output. Radial Basis Function Networks - Working, Architecture, Advantages * A radial basis function network is a neural network approached by viewing the design as 4 curve-fitting (approximation) problem in a high dimensional space. Learning is equivalent to finding a multidimensional function that provides a best fit to the training data, with the criterion for “best fit” being measured in some statistical sense Among various types of neural networks, radial basis function neural networks (RBENN) are a unique class that have proved to be highly effective in various applications including function approximation, time series prediction, classification and control. Radial Basis Functions (RBFs) are a special category of feed-forward neural networks comprising three layers : = Input layer : Receives input data and passes it to the hidden layer. = Hidden layer : The core computational layer where RBF neurons process the data, = Output layer : Produces the network’s predictions, suitable for classification or regression tasks. Working of radial basis function networks * RBF Networks are conceptually similar to k-Nearest Neighbor (k-NN) models, though their implementation is distinct. The fundamental idea is that an item's predicted target value is influenced by nearby items with similar predictor variable values. ° RBENs start with an n-dimensional input vector. This vector is fed into the input layer of the network. * The network also has a hidden layer, which comprises Radial Basis Function (RBF) neurons. Each neuron in the hidden layer represents a prototype vector from the training set. The network computes the Euclidean distance between the input vector and each neuron's center. * Each of these RBF neurons has a center and they measure how close the input is to their center. The Euclidean distance is transformed using a Radial Basis Function called as @ Gaussian transfer function to compute the neuron’s activation value. This value decreases exponentially as the distance increas . The output of this function is higher when the input is close to the neuron’s center and lower when the input is far away. TECHNICAL PUBLICATIONS® - an up-thrust for knowledge undementel of Al 4-26 Deep Learning The outputs from the hidden layer ate then combined in the output layer, Each node in the output layer corresponds to a different category or class of data. The network determines the input’s class by calculating a weighted sum of the outputs from the hidden layer. + The final output of the network is a combination of these weighted sums, which is used to classify the input. Each output node calculates a score based on a weighted sum of the activation values from all RBF neurons. 
The category with the highest score is chosen for classification. ar network architecture «The typical architecture of a radial basis functions neural network consists of an input layer, hidden layer and summation layer, Input layer = «The input layer consists of one neuron for every predictor variable. The input neurons pass the value to each neuron in the hidden layer. N-I neurons are used for categorical values, where N denotes the number of categories, The range of values is standardized by subtracting the median and dividing by the interquartile range. Hidden layer : « The hidden layer contains a variable number of neurons (the ideal number determined by the training process). Each neuron comprises a radial basis function centered on a point. The number of dimensions coincides with the number of predictor variables. The radius or spread of the RBF function may vary for each dimension. When an x vector of input values is fed from the input layer, a hidden neuron calculates the Euclidean distance between the test case and the neuron's center point, It then applies the kernel function using the spread values. The resulting value gets fed into the summation layer. + Every RBF neuron stores a prototype vector (also known as the neuron's center) from amongst the vectors of the training set. An RBF neuron compares the input vector with its prototype and outputs a value between 0 and 1 as'a measure of similarity. If an input is the same as the prototype, the neuron's output will be 1. As the input and prototype difference grows, the output falls exponentially towards 0, The shape of the response by the RBF neuron is a bell curve, The response value is also called the activation value, TECHNICAL PUBLICATIONS® - an up-thrust for knowledge related to the neuron ya weight ; Tad valued ave added OF and the sum is fo the summation ave one output per target gg networks oulput CHS lent Hive ey he probabili vvatuaied has tat cates Br h category youre erin 10 classify. categorys the value beint Ily, classification © The network's ‘output col zach output node comput decision is taken by assign ‘The score is calculated based on a Wei peurons. It usually gives @ POStEv® weigl thers. Each output nos mprises a set of tes a score for ing the input to the cafe ghted sum of ht to the RBF .de has its own S! values from all RBF suron belonging to ts CaleBOry and et of weights. a negative weight 10 01 Advantages 1. RBF has strong resistance to input 2, Radial Basi complex non-] functions and Gaussian fur noise. to solve the problems tions, that exist in datasets that have .s Function, can be used near distributions such as logarithmic fun trigonometric functions, inction. in the distribution can be power 3, after utilizing the Radial Basis Function: hidden pattern: generalized in a better Way. ‘L, Dealing with one hidden layer is quite e25Y in RBF. §. It is possible to interpret the exact meaning of each node present in the hidden layer of the radial basis function network. EF] deep Belief Networks 4, Restricted Boltzmann Machines (RBMs) chine, each node is connected to every other node and hence the * Ina full Boltzmann ma This is the reason RBMs are used. The restrictions in the connections grow exponentially. node connections in RBMs are as follows - © Hidden nodes cannot be connected to one another © Visible nodes connected to one another. # A restricted tem m mea eis eas means that one is not permitted to connect two types of layer that are of e to one a er. 
ke aa a ie a another, In other words, the two hidden layers or input layers of nable to fc col ct - i 10 form connections with one another. However, there may be ver, there may connect : ions between the apparent and hidden layers. TECHNICAL PUBLICA’ TIONS® - an up 17 up-thrust for knowledge pandemental of Al 4.27 Deep Leeming, 4, since there is No output! layer in the machine, it is unclear how the weights would be erected and modified, and determine whether or not the prediction was right: One response fits all the questions : Restricted Boltzmann Machine. » The neural network that is a part of the energy-based model is: called RBM. It is 4 generative, unsupervised, probabilistic deep teaming algorithm. Finding the joint probability distribution that maximizes the log-likelihood function is the goal of RBM. RBM only has two layers : the input layer and the hidden layer and it is undirected. All of the hidden nodes are linked to all of the visible nodes. RBM is also referred to as an asymmetrical bipartite graph since it has two layers : a visible or input layer and a hidden layer. The visible nodes don't have any connections inside the same layer. The concealed nodes are not connected intralayer either. Only the input and hiding nodes have connections. « Allof the nodes in the original Boltzmann machine are connected. RBM is referred to as a Restricted Boltzmann Machine since it restricts intralayer connectivity. + RBMs do not modify their weights through backpropagation and gradient descent since they are undirected. They change their weights using a technique known as contrastive divergence. The visible nodes’ weights are initially created at random and utilized to create the hidden nodes. Then, these concealed nodés recreate exposed nodes using the same weights. All throughout, the same weights were utilized to reconstruct the visible nodes. Due to their lack of connectivity, the created nodes are different from one another. Hidden units Visible units @ Ee aan om Fig. 4,3.4 RBMs Some key characteristics of the Restricted Boltzmann machine are ? © There are no connections between the layers. © They employ symmetric and recurring structures, _— TECHNICAL PUBLICATIONS® - an up-thrust for knowledge
