Deep Learning
Selected by Dr. Asmaa Awad
Third Level
College Vision
To create an innovative generation capable of embedding artificial intelligence technologies in systems and objects to achieve progress and growth.
College Mission
To prepare specialized and distinguished scientific cadres possessing the skills and knowledge in the field of artificial intelligence that qualify them to keep pace with developments in this field, and to create research and applications capable of competing locally, regionally and internationally, growing the surrounding community and developing its industry in line with Egypt's plan for sustainable development.
Course Objectives:
1. Understand the theoretical foundations of neural networks and deep learning.
2. Learn to design, train, and optimize neural network models.
3. Gain proficiency in applying neural networks to practical problems using modern frameworks.
Course Topics:
Neural Networks:
1. Introduction to Neural Networks: Structure and biological inspiration.
2. Feedforward Neural Networks: Architecture, activation functions, and backpropagation.
3. Optimization Techniques: Gradient descent, learning rate tuning, and regularization.
4. Applications: Handwritten digit recognition and simple classification tasks.
Deep Learning:
5. Convolutional Neural Networks (CNNs): Image recognition, feature extraction, and pooling.
6. Recurrent Neural Networks (RNNs): Sequential data modeling with LSTM and GRU networks.
7. Generative Adversarial Networks (GANs): Image generation and creative applications.
8. Advanced Topics: Transformers, attention mechanisms, and NLP applications.
Contents
CHAPTER 1: NEURAL NETWORKS
1.1 Overview
1.2 History
1.3 Applications
1.4 Biological Inspiration
1.5 Neuron Model and Network Architectures
1.5.1 Neuron Model
1.6 Network Architectures
1.6.1 Single Layer Network
1.6.2 Multiple Layers Network
1.6.3 Recurrent Networks
1.7 Neural Network Learning
1.7.1 Types of Learning
1.8 Learning Rules
1.8.1 Hebbian Learning
1.8.2 Perceptron
1.8.3 Backpropagation
Chapter 3: Convolutional Neural Networks
3.1 What Computers "See"
3.2 Learning Visual Features
3.2.1 Using Spatial Structure
3.3 Feature Extraction and Convolution - A Case Study
3.4 Convolutional Neural Networks (CNNs)
3.4.1 CNNs: Spatial Arrangement of Output Volume
3.4.2 Introducing Non-Linearity
3.4.3 Pooling
3.4.4 CNNs for Classification: Feature Learning
3.5 An Architecture for Many Applications
3.5.1 Object Detection
3.5.2 Semantic Segmentation: Fully Convolutional Networks
3.5.3 Continuous Control: Navigation from Vision
3.6 Summary
Chapter 4: Recurrent Neural Networks
4.1 Recurrent Neural Networks
4.1.1 Why sequence models
4.1.2 Name entity recognition
4.1.3 Representing words
4.1.4 Forward RNN
4.1.5 Backpropagation through time
4.1.6 Different types of RNNs
4.1.7 Language model and sequence generation
4.1.8 How to build language models with RNNs?
4.1.9 Vanishing gradients with RNNs
4.2 Gated Recurrent Unit (GRU)
4.3 LSTM Networks
4.3.1 Step-by-Step LSTM Walk Through
4.3.2 Variants on Long Short Term Memory
4.3.3 LSTM Example
4.4 Course summary
CHAPTER 1: NEURAL NETWORKS
1.1 Overview
An Artificial Neural Network (ANN) is a mathematical model designed to mimic the basic function of a biological neuron, and it has been used in many applications such as prediction, classification of inputs and data filtering.
The network is trained using the backpropagation algorithm: in the forward pass the actual output is calculated, and in the backward pass the weights between the output layer and the hidden layer, and between the hidden layer and the input layer, are adjusted. These steps are repeated until the error is reduced. The importance of the sigmoid transfer function is also presented in detail.
1.2 History:
A neural network is a machine that is designed to simulate the way the human brain works; the brain is composed of a large number of neurons working together to solve a specific problem.
The history of artificial neural networks can be traced back to the early 1940s. The first important paper on neural networks was published by the neurophysiologist Warren McCulloch and the logician Walter Pitts in 1943; they proposed a simple model of the neuron built from electronic circuits, consisting of two inputs and one output. In 1949 Donald Hebb proposed a learning law that became the starting point for neural network training algorithms. In the 1950s and 1960s, many researchers (Block, Minsky, Papert and Rosenblatt) worked on the first type of neural network, called the perceptron. The perceptron is a very simple mathematical representation of the neuron, on which most artificial neural networks are based to this day, as shown below in Figure 2.1.
This figure shows that the inputs of the neuron are represented by X1, X2, ..., Xm and are multiplied by the corresponding weights W1, W2, ..., Wm, similar to the synaptic strengths in a biological neuron; the externally applied bias is denoted by b. The summation of these inputs with their corresponding weights and the bias b is symbolized by V, which is calculated by equation 2.1:

V = ∑_{i=1}^{m} Wi Xi + b        (2.1)

The net input V is then compared with a certain threshold: if V is more than the threshold, the output (O) "fires", and if V is less than the threshold, the output (O) does "not fire".
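To make equation 2.1 concrete, here is a minimal sketch in Python with NumPy (the input and weight values are made up for illustration):

import numpy as np

def threshold_neuron(x, w, b, threshold=0.0):
    # Net input: V = sum_i(Wi * Xi) + b, as in equation 2.1
    v = np.dot(w, x) + b
    # The output "fires" (1) only if V exceeds the threshold
    return 1 if v > threshold else 0

x = np.array([1.0, 0.0])        # two inputs, as in the McCulloch-Pitts model
w = np.array([0.7, 0.4])        # corresponding weights
print(threshold_neuron(x, w, b=-0.5))   # V = 0.2 > 0, so the output fires: 1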
In 1959, Bernard Widrow and Marcian Hoff developed a model called ADALINE (Adaptive Linear Neuron); MADALINE (Multilayer ADALINE) is composed of many ADALINEs. In 1960, Widrow and Hoff developed a mathematical method for adapting the weights based on minimizing the squared error; this algorithm later became known as least mean squares (LMS). In 1962, Frank Rosenblatt was able to demonstrate the convergence of a learning algorithm. In 1969, Marvin Minsky and Seymour Papert published a book in which they showed that the perceptron could not learn functions that are not linearly separable.
The effect of these problems was to limit the funding available for research into artificial neural networks, and therefore neural network research declined throughout the 1970s and until the mid-1980s. Despite this proof of the limitations of neural networks, much work was still done: Willshaw and von der Malsburg worked on self-organizing maps, and Hopfield presented a paper on neural networks with feedback, known as Hopfield networks.
The backpropagation algorithm was first developed by Werbos in 1974; the major development happened around 1985-1986 when Rumelhart, Hinton and Williams popularized backpropagation, a powerful tool for training multilayer neural networks. The appearance of the backpropagation method dramatically expanded the range of problems to which neural networks can be applied.
1.3 Applications:
Google uses neural networks for image tagging (automatically identifying an image and assigning keywords), and
Microsoft has developed neural networks that can help convert spoken English speech into spoken Chinese speech. These
examples are indicative of the broad range of applications that can be found for neural networks. The applications are
expanding because neural networks are good at solving problems, not just in engineering, science and mathematics, but in
medicine, business, finance and literature as well. Their application to a wide variety of problems in many fields makes
them very attractive. Also, faster computers and faster algorithms have made it possible to use neural networks to solve
complex industrial problems that formerly required too much computation.
1. Aerospace:
High performance aircraft autopilots, flight path simulations, aircraft control systems, autopilot enhancements,
aircraft component simulations, aircraft component fault detectors.
2. Automotive:
Automobile automatic guidance systems, fuel injector control, automatic braking systems, misfire detection,
virtual emission sensors, warranty activity analyzers.
3. Banking:
Check and other document readers, credit application evaluators, cash forecasting, firm classification, exchange
rate forecasting, predicting loan recovery rates, measuring credit risk.
4. Defense:
Weapon steering, target tracking, object discrimination, facial recognition, new kinds of sensors, sonar, radar and
image signal processing including data compression, feature extraction and noise suppression, signal/image identification.
5. Electronics:
Code sequence prediction, integrated circuit chip layout, process control, chip failure analysis, machine vision,
voice synthesis, nonlinear modeling.
6. Entertainment:
Animation, special effects, market forecasting.
7. Financial:
Real estate appraisal, loan advisor, mortgage screening, corporate bond rating, credit line use analysis, portfolio
trading program, corporate financial analysis, currency price prediction.
8. Insurance:
Policy application evaluation, product optimization.
9. Manufacturing:
Manufacturing process control, product design and analysis, process and machine diagnosis, real-time particle
identification, visual quality inspection systems, beer testing, welding quality analysis, paper quality prediction, computer
chip quality analysis, analysis of grinding operations, chemical product design analysis, machine maintenance analysis,
project bidding, planning and management, dynamic modeling of chemical process systems.
10. Medical:
Breast cancer cell analysis, EEG and ECG analysis, prosthesis design, optimization of transplant times, hospital
expense reduction, hospital quality improvement, emergency room test advisement.
11. Oil and Gas:
Exploration, smart sensors, reservoir modeling, well treatment decisions, seismic interpretation.
12. Robotics:
Trajectory control, forklift robot, manipulator controllers, vision systems, autonomous vehicles.
13. Speech:
Speech recognition, speech compression, vowel classification, text-to-speech synthesis.
14. Securities:
Market analysis, automatic bond rating, stock trading advisory systems.
15. Telecommunications:
Image and data compression, automated information services, real-time translation of spoken language, customer
payment processing systems.
16. Transportation:
Truck brake diagnosis systems, vehicle scheduling, routing systems.
1.4 Biological Inspiration:
The brain consists of a large number of highly connected elements called neurons. These neurons have three principal components: the dendrites, the cell body and the axon, as shown in Figure 1.
Figure 1: Biological Neuron
The dendrites are tree-like receptive networks of nerve fibers that carry electrical signals into the cell body.
The cell body effectively sums and thresholds these incoming signals.
The axon is a single long fiber that carries the signal from the cell body out to other neurons.
The point of contact between an axon of one cell and a dendrite of another cell is called a synapse. It is the
arrangement of neurons and the strengths of the individual synapses, determined by a complex chemical process, that
establishes the function of the neural network.
The synapses are the connections which enable the transfer of electric axon impulses from a particular neuron to
dendrites of other neurons, as illustrated in Figure 2.
The human brain has close to 100 billion nerve cells, called neurons. Each neuron is connected to thousands of others, creating a neural network that shuttles information, in the form of stimuli, in and out of the brain constantly. Each of the yellow blobs in Figure 2.3 is a neuronal cell body (soma); each neuron has long, thin nerve fibres called dendrites that bring information in, and even longer fibres called axons that send information away.
Figure 2.3: Biological neurons of human brain [47].
The neuron receives information in the form of electrical signals from neighboring neurons across one of
thousands of synapses, small gaps that separate two neurons and act as input channels.
Once a neuron has received this charge it triggers either a "go" signal that allows the message to be passed to
the next neuron or a "stop" signal that prevents the message from being forwarded, so it is important to note that
a neuron fires only if the total signal received at the cell body exceeds a certain level.
For example, when a person thinks of something, sees an image, or smells a scent, that mental process or
sensory stimulus excites a neuron, which fires an electrical pulse that shoots out through the axons and fires
across the synapse. If enough input is received at the same time, the neuron is activated to send out a signal to
be picked up by the next neuron's dendrites.
1.5 Neuron Model and Network Architectures:
1.5.1 Neuron Model:
If we relate this simple single-input neuron model back to the biological neuron that we discussed in section 1.4, the weight w corresponds to the strength of a synapse, the cell body is represented by the summation and the transfer function, and the neuron output a represents the signal on the axon.
𝑎 = 𝑓(𝑤𝑝 + 𝑏)
Example 2.1:
Let 𝑤 = 3 , 𝑝 = 2 and 𝑏 = –1.5, what is the single-input neuron output ?
a = f(3 × 2 + (−1.5)) = f(4.5)
The actual output depends on the particular transfer function that is chosen.
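A minimal sketch of this in Python with NumPy (the transfer functions shown are common choices, written with their usual definitions):

import numpy as np

n = 3 * 2 + (-1.5)                      # net input from Example 2.1: n = 4.5
hardlim = 1.0 if n >= 0 else 0.0        # hard limit (threshold) function
logsig = 1.0 / (1.0 + np.exp(-n))       # log-sigmoid function
linear = n                              # pure linear function
print(hardlim, round(logsig, 3), linear)  # 1.0 0.989 4.5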
Notes:
1. The bias is much like a weight, except that it has a constant input of 1. However, if you do not want to have a bias
in a particular neuron, it can be omitted.
2. Note that 𝑤 and 𝑏 are both adjustable scalar parameters of the neuron. Typically, the transfer function is chosen by
the designer and then the parameters 𝑤 and 𝑏 will be adjusted by some learning rule so that the neuron input/output
relationship meets some specific goal.
Typically, a neuron has more than one input. A neuron with R inputs is shown in Figure 13. The individual inputs p = (p1, p2, p3, ..., pR) are each weighted by corresponding elements w1,1, w1,2, ..., w1,R of the weight matrix W.
The neuron has a bias b, which is summed with the weighted inputs to form the net input n:

n = w1,1 p1 + w1,2 p2 + ... + w1,R pR + b

In matrix form:

n = Wp + b

where the matrix W for the single neuron case has only one row. The neuron output is

a = f(Wp + b)

The elements of the weight matrix have two indices: the first index indicates the particular neuron destination for that weight, and the second index indicates the source of the signal fed to the neuron. Thus, the indices in w1,2 say that this weight represents the connection to the first (and only) neuron from the second source. Of course, this convention is more useful if there is more than one neuron, as will be the case later.
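A minimal sketch of the multiple-input neuron a = f(Wp + b) in Python/NumPy (the log-sigmoid choice and the numbers are just for illustration):

import numpy as np

def neuron(W, p, b, f=lambda n: 1.0 / (1.0 + np.exp(-n))):
    # W is a single-row weight matrix, p the input vector, b the bias
    n = W @ p + b              # net input n = Wp + b
    return f(n)                # output a = f(n)

W = np.array([[1.0, -2.0, 0.5]])   # one neuron with R = 3 inputs
p = np.array([2.0, 1.0, 4.0])
print(neuron(W, p, b=0.5))         # a = f(2.5) ≈ 0.924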
We would like to draw networks with several neurons, each having several inputs. Further, we would like to have
more than one layer of neurons. You can imagine how complex such a network might appear if all the lines were drawn.
It would take a lot of ink, could hardly be read, and the mass of detail might obscure the main features. Thus, we will use
an abbreviated notation. A multiple-input neuron using this notation is shown in Figure 14.
Figure 14: Neuron with 𝑹 Inputs, Abbreviated Notation
Note that the number of inputs to a network is set by the external specifications of the problem. If, for instance,
you want to design a neural network that is to predict kite-flying conditions and the inputs are air temperature, wind
velocity and humidity, then there would be three inputs to the network.
There are a variety of transfer functions; some of them are listed below:
1. Threshold (Hard Limit) Transfer Function:
This function is used to create neurons that classify inputs into two distinct categories.
The log-sigmoid transfer function is commonly used in multilayer networks that are trained using the
backpropagation algorithm.
Using the tanh function instead of the logistic one is essentially equivalent; the tanh function has the advantage of being symmetrical with respect to the origin.
Figure 9: Radial Basis Transfer Function
Example 2.2:
Let 𝑤 = 4 , 𝑝 = 2 and 𝑏 = –2 with 𝑓 radial basis, what is the single neuron output?
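A worked answer, assuming the usual radial basis definition f(n) = exp(−n²): the net input is n = wp + b = 4 × 2 − 2 = 6, so a = exp(−36) ≈ 0, since the net input is far from the center of the basis function. A quick check in Python:

import numpy as np

n = 4 * 2 + (-2)         # net input n = 6
a = np.exp(-n ** 2)      # radial basis transfer function exp(-n^2)
print(a)                 # ≈ 2.3e-16, effectively zero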
1.6 Network Architectures:
1.6.1 Single Layer Network:
The layer includes the weight matrix, the summers, the bias vector, the transfer function boxes and the output vector.
Each element of the input vector p is connected to each neuron through the weight matrix W. Each neuron has a bias bi, a summer, a transfer function f and an output ai.
Taken together, the outputs form the output vector.
You might ask if all the neurons in a layer must have the same transfer function. The answer is no; you can define
a single (composite) layer of neurons having different transfer functions by combining two of the networks shown above
in parallel. Both networks would have the same inputs, and each network would create some of the outputs.
The input vector elements enter the network through the weight matrix W:

W = [ w1,1  w1,2  ...  w1,R
      w2,1  w2,2  ...  w2,R
        ⋮      ⋮    ⋱    ⋮
      wS,1  wS,2  ...  wS,R ]
As noted previously, the row indices of the elements of matrix W indicate the destination neuron associated with that weight, while the column indices indicate the source of the input for that weight. Thus, the indices in w3,2 say that this weight represents the connection to the third neuron from the second source.
Figure 16: Layer of 𝑺 Neurons, Abbreviated Notation
1.6.2 Multiple Layers Network:
As shown, there are R inputs, S1 neurons in the first layer, S2 neurons in the second layer, etc. As noted, different layers can have different numbers of neurons.
The outputs of layers one and two are the inputs for layers two and three. Thus layer 2 can be viewed as a one-layer network with R = S1 inputs, S = S2 neurons, and an S2 × S1 weight matrix W2. The input to layer 2 is a1, and the output is a2.
A layer whose output is the network output is called an output layer. The other layers are called hidden layers.
The network shown above has an output layer (layer 3) and two hidden layers (layers 1 and 2).
The same three-layer network discussed previously also can be drawn using our abbreviated notation, as shown
in Figure 18.
Multilayer networks are more powerful than single layer networks. For instance, a two-layer
network having a sigmoid first layer and a linear second layer can be trained to approximate most
functions arbitrarily well. Single- layer networks cannot do this.
As for the number of layers, most practical neural networks have just two or three layers. Four
or more layers are used rarely.
We should say something about the use of biases. One can choose neurons with or without
biases. The bias gives the network an extra variable, and so you might expect that networks with
biases would be more powerful than those without, and that is true. Note, for instance, that a neuron
without a bias will always have a net input 𝑛 of zero when the network inputs 𝐩 are zero. This may
not be desirable and can be avoided by the use of a bias.
1.6.3 Recurrent Networks:
Recurrent architectures such as Hopfield, Elman, Jordan and bidirectional networks are special cases of recurrent artificial neural networks.
Example:
A single-layer neural network is to have six inputs and two outputs. The outputs are to be continuous and limited to the range 0 to 1. What can you tell about the network architecture? Specifically: how many neurons are required, what are the dimensions of the weight matrix, what kind of transfer function could be used, and is a bias required?
Solution:
Two neurons are required, one for each output, so the weight matrix has dimensions 2 × 6. A transfer function whose output is continuous over the range 0 to 1, such as the log-sigmoid, could be used. Nothing in the problem statement tells us whether a bias is required.
1.7 Neural Network Learning:
One of the questions raised by the above examples is: "How do we determine the weight matrix and bias for perceptron networks with many inputs, where it is impossible to visualize the decision boundaries?" The answer is to build an algorithm for training perceptron networks, so that they can learn to solve classification problems.
1.7.1 Types of Learning:
Supervised Learning:
The learning rule is provided with a set of examples (the training set) of proper network behavior:
{p1, t1}, {p2, t2}, {p3, t3}, ..., {pQ, tQ}
where 𝐩 is an input to the network and 𝐭 is the corresponding correct (target) output. As the
inputs are applied to the network, the network outputs are compared to the targets. The learning
rule is then used to adjust the weights and biases of the network in order to move the network
outputs closer to the targets.
Unsupervised Learning:
The weights and biases are modified in response to network inputs only. There are no target
outputs available. At first glance this might seem to be impractical. How can you train a network if
you don’t know what it is supposed to do? Most of these algorithms perform some kind of
clustering operation. They learn to categorize the input patterns into a finite number of classes.
1.8 Learning Rules:
1.8.1 Hebbian Learning
• Step 1: Initialize all weights and the bias to zero.
• Step 2: For each training pair (x, t), adjust each weight: wi(new) = wi(old) + xi·t
• Step 3: Adjust the bias (just like the weights): b(new) = b(old) + t
Example
PROBLEM:
Construct a Hebb Net which performs like an AND function, that is, only when both features are "active"
will the data be in the target class.
TRAINING SET (with the bias input always at 1):
x1  x2  bias  Target
 1   1    1     1
 1  -1    1    -1
-1   1    1    -1
-1  -1    1    -1
Training - First Input
Starting from zero weights:
w₁(new) = w₁(old) + x₁t = 0 + 1 = 1
w₂(new) = w₂(old) + x₂t = 0 + 1 = 1
b(new) = b(old) + t = 0 + 1 = 1
Training - Second Input
w₁(new) = w₁(old) + x₁t = 1 + 1(-1) = 0
w₂(new) = w₂(old) + x₂t = 1 + (-1)(-1) = 2
b(new) = b(old) + t = 1 + (-1) = 0
Training - Third Input
w₁(new) = w₁(old) + x₁t = 0 + (-1)(-1) = 1
w₂(new) = w₂(old) + x₂t = 2 + 1(-1) = 1
b(new) = b(old) + t = 0 + (-1) = -1
Training - Fourth Input
w₁(new) = w₁(old) + x₁t = 1 + (-1)(-1) = 2
w₂(new) = w₂(old) + x₂t = 1 + (-1)(-1) = 2
b(new) = b(old) + t = -1 + (-1) = -2
Final Neuron
The final weights are w₁ = 2, w₂ = 2, b = -2. Checking them against the training set:
x1  x2  bias  Target
 1   1    1     1
 1  -1    1    -1
-1   1    1    -1
-1  -1    1    -1
For input (1, 1): 1·2 + 1·2 + 1·(-2) = 2 > 0 ✓
For input (-1, -1): (-1)·2 + (-1)·2 + 1·(-2) = -6 < 0 ✓
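A minimal sketch of this Hebb training loop in Python/NumPy, which reproduces the hand calculations above:

import numpy as np

# AND training set in bipolar form: (x1, x2, bias input, target)
samples = [(1, 1, 1, 1), (1, -1, 1, -1), (-1, 1, 1, -1), (-1, -1, 1, -1)]

w = np.zeros(3)   # [w1, w2, b]; the bias is a weight on a constant input of 1
for x1, x2, bias, t in samples:
    w += np.array([x1, x2, bias]) * t    # Hebb rule: w(new) = w(old) + x*t
print(w)          # [ 2.  2. -2.], matching the final neuron above

# Verify: the sign of the net input matches the target for every sample
for x1, x2, bias, t in samples:
    net = w @ np.array([x1, x2, bias])
    print(net, "-> class", 1 if net > 0 else -1, "(target", t, ")")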
1.8.2 Perceptron
The perceptron model, originally proposed by Rosenblatt and later analyzed by Minsky and Papert, is a more general computational model than the McCulloch-Pitts neuron. It overcomes some of the limitations of the M-P neuron by introducing the concept of numerical weights (a measure of importance) for inputs, and a mechanism for learning those weights. Inputs are no longer limited to boolean values as in the case of an M-P neuron; the perceptron supports real inputs as well, which makes it more useful and general.
Now, this is very similar to an M-P neuron, but we take a weighted sum of the inputs and set the output to one only when the sum is more than an arbitrary threshold (theta). However, by convention, instead of hand-coding the thresholding parameter theta, we add it as one of the inputs, with the weight -theta, which makes it learnable (more on this in the perceptron algorithm below).
Consider the task of predicting whether I would watch a random game of football on TV or not (the same example used for the M-P neuron) using the behavioral data available. And let's assume my decision is solely dependent on 3 binary inputs (binary for simplicity).
Here, w_0 is called the bias because it represents the prior (prejudice). A football freak may have a very low threshold
and may watch any football game irrespective of the league, club or importance of the game [theta = 0]. On the other
hand, a selective viewer like me may only watch a football game that is a premier league game, featuring Man United
game and is not friendly [theta = 2]. The point is, the weights and the bias will depend on the data (my viewing history in
this case).
Based on the data, if needed the model may have to give a lot of importance (high weight) to
the isManUnitedPlaying input and penalize the weights of other inputs.
Perceptron Algorithm
Use a set of training samples {(p, t)}:
Repeat:
• Choose a sample (p, t)
• Compute the output o = hardlim(w·p + b)
• If o ≠ t, update the weights: w ← w + (t − o)p and b ← b + (t − o)
until all samples are classified correctly.
What kind of functions can be implemented using a perceptron? How different is it from McCulloch-Pitts neurons?
From the equations, it is clear that even a perceptron separates the input space into two halves, positive and negative.
All the inputs that produce an output 1 lie on one side (positive half space) and all the inputs that produce an output 0
lie on the other side (negative half space).
In other words, a single perceptron can only be used to implement linearly separable functions, just like the M-P neuron.
Then what is the difference? Why do we claim that the perceptron is an updated version of an M-P neuron? Here, the
weights, including the threshold can be learned and the inputs can be real values.
Try solving the equations on your own. The above "possible solution" was obtained by solving the linear system of equations on the left. It is clear that the solution separates the input space into two spaces, negative and positive half spaces. I encourage you to try it out for AND and other boolean functions.
Now, if you actually try to solve the linear equations above, you will realize that there can be multiple solutions. But which solution is the best? To more formally define the "best" solution, we need to understand errors and error surfaces, which we will do when studying the perceptron learning algorithm in more detail.
Now let's look at a non-linear boolean function i.e., you cannot draw a line to separate positive inputs from the negative
ones.
Notice that the fourth equation contradicts the second and the third equations. The point is, there are no perceptron solutions for non-linearly separable data. So the key takeaway is that a single perceptron cannot learn to separate data that are non-linear in nature.
Example
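A minimal, illustrative sketch of perceptron learning on the AND function in Python/NumPy, assuming the update rule given above (with 0/1 inputs and targets):

import numpy as np

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])   # AND inputs
T = np.array([0, 0, 0, 1])                        # AND targets

w = np.zeros(2)
b = 0.0
for epoch in range(10):                   # repeat over the training samples
    for x, t in zip(X, T):
        o = 1 if w @ x + b > 0 else 0     # perceptron output
        w = w + (t - o) * x               # update only when misclassified
        b = b + (t - o)

print(w, b)                                        # a linear separator for AND
print([1 if w @ x + b > 0 else 0 for x in X])      # [0, 0, 0, 1]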
1.8.3 Backpropagation
This section describes the teaching process of a multi-layer neural network employing the backpropagation algorithm. To illustrate this process, a three-layer neural network with two inputs and one output, shown in the picture below, is used.
Each neuron is composed of two units. The first unit adds the products of the weight coefficients and the input signals. The second unit realizes a nonlinear function, called the neuron activation function. Signal e is the adder output signal, and y = f(e) is the output signal of the nonlinear element. Signal y is also the output signal of the neuron.
To teach the neural network we need a training data set. The training data set consists of input signals (x1 and x2) assigned with the corresponding target (desired output) z. The network training is an iterative process: in each iteration the weight coefficients of the nodes are modified using new data from the training data set, and the modification is calculated using the algorithm described below. Each teaching step starts with forcing both input signals from the training set; after this stage we can determine the output signal values for each neuron in each network layer. The pictures below illustrate how the signal propagates through the network. Symbols w(xm)n represent the weights of the connections between network input xm and neuron n in the input layer. Symbols yn represent the output signal of neuron n.
Propagation of signals through the hidden layer. Symbols wmn represent the weights of the connections between the output of neuron m and the input of neuron n in the next layer.
Propagation of signals through the output layer.
In the next algorithm step the output signal of the network y is compared with the desired output value (the target), which is found in the training data set. The difference is called the error signal d of the output layer neuron.
It is impossible to compute the error signal for internal neurons directly, because the output values of these neurons are unknown. For many years an effective method for training multilayer networks was unknown; only in the middle eighties was the backpropagation algorithm worked out. The idea is to propagate the error signal d (computed in a single teaching step) back to all neurons whose output signals were inputs for the discussed neuron.
The weight coefficients wmn used to propagate the errors back are equal to those used when computing the output value; only the direction of data flow is changed (signals are propagated from outputs to inputs one after the other). This technique is used for all network layers. If the propagated errors come from several neurons, they are added. The illustration is below:
When the error signal for each neuron is computed, the weight coefficients of each neuron input node may be modified. In the formulas below, df(e)/de represents the derivative of the activation function of the neuron whose weights are modified.
The coefficient η (the learning rate) affects the network teaching speed. There are a few techniques to select this parameter. The first method is to start the teaching process with a large value of the parameter; while the weight coefficients are being established, the parameter is gradually decreased. The second, more complicated, method starts teaching with a small parameter value; during the teaching process the parameter is increased as the teaching advances, and then decreased again in the final stage. Starting the teaching process with a low parameter value makes it possible to determine the signs of the weight coefficients.
Let's make that concrete by explicitly showing all the calculations for a full-sized network with 2 inputs, 3 hidden layer neurons and 2 output neurons, as shown in figure 3.4. W+ represents the new, recalculated weight, whereas W (without the superscript) represents the old weight.
Figure 3.4, all the calculations for a reverse pass of Back Propagation.
Table of Content 41
The constant η (called the learning rate, and nominally equal to one) is put in to speed up or slow down the learning if required.
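To complement the walkthrough, here is a minimal, illustrative sketch of one full backpropagation step for the 2-3-2 network described above (Python/NumPy, sigmoid activations everywhere; the initial weights, inputs and targets are made-up values, and biases are omitted for brevity):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
W1 = rng.normal(size=(3, 2))    # input -> hidden weights (3 hidden, 2 inputs)
W2 = rng.normal(size=(2, 3))    # hidden -> output weights (2 outputs)
x = np.array([0.35, 0.9])       # example input
t = np.array([0.5, 0.5])        # example target
eta = 1.0                       # learning rate, nominally equal to one

# Forward pass
h = sigmoid(W1 @ x)             # hidden layer outputs
y = sigmoid(W2 @ h)             # network outputs

# Backward pass: output error deltas, then propagate them back through W2
delta_out = (t - y) * y * (1 - y)               # error signal at the outputs
delta_hid = (W2.T @ delta_out) * h * (1 - h)    # error signal at the hidden layer

# Weight updates: W+ = W + eta * delta * input, as in the reverse-pass figure
W2 += eta * np.outer(delta_out, h)
W1 += eta * np.outer(delta_hid, x)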
Learning in the Backpropagation Algorithm
The training algorithm used in the backpropagation method proceeds in a number of steps, as illustrated above. Because of its simplicity of use and impressive speed in training Artificial Neural Networks (ANNs), its distinctive ability to extract meaning from complicated data and recognize patterns, and its strong capabilities for prediction and data filtering, the backpropagation learning algorithm has become a powerful and widely used technique for learning and training ANNs.
Chapter 3: Convolutional Neural Networks
Manual Feature Extraction: domain knowledge → define features → detect features to classify.
Learning Feature Representations: can we learn a hierarchy of features directly from the data instead of hand engineering them? (We mentioned this in the first lecture: MIT Introduction to Deep Learning, 6.S191.)
Input:
• 2D images
• Arrays of pixel values
3.2.1 Using Spatial Structure
So, how can we use the spatial structure in the input to inform the architecture of the network?
IDEA: connect patches of the input to neurons in the hidden layer. Each neuron is connected to a region of the input and only "sees" these values.
One way we can use the spatial structure is to actually connect patches of our input, not the whole input, but just patches of the input, to neurons in the hidden layer. Before, everything was connected from the input layer to the hidden layer; now we're only going to connect things that are within a single patch to the next neuron in the next layer. Each neuron only sees the values coming from the patch that precedes it.
(patch: a small area of something, especially one that is different from the area around it.)
This will not only reduce the number of weights in our model, but it also allows us to leverage the fact that, in an image, spatially close pixels are likely to be related and correlated to each other. That's a fact we should really take into account.
We can basically do this by sliding that patch across the input image. For each time we slide it,
we're going to have a new output neuron in the subsequent layer. This way, we can actually take
into account some of the spatial structure that is inherent to our input, but remember that
our ultimate task is not only to preserve spatial structure but to actually learn the visual
features. And we do this by weighting the connections between the patches and the neurons.
• Apply a set of weights - a filter - to extract local features
• Use multiple filters to extract different features
• Spatially share parameters of each filter
3.3 Feature Extraction and Convolution - A Case Study
We want our model to basically compare images of a piece of an X (piece by piece) and
the really important pieces that it should look for are exactly what we've been calling
the features. If our model can find those important (and rough) features that define the X
roughly in the same positions, it can get a lot better at understanding the similarity
between different examples of X even in the presence of these types of deformities.
The Convolution Operation. We slide the 3×3 filter over the input image, element-wise multiply, and add the outputs:
Producing Feature Maps. Different filters can be used to produce different feature maps, as sketched below.
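A minimal sketch of this sliding-window convolution in Python/NumPy (the filter values here are an illustrative hand-made edge detector; a real CNN learns its filters):

import numpy as np

def convolve2d(image, kernel):
    kh, kw = kernel.shape
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            # element-wise multiply the patch by the filter and add the outputs
            out[i, j] = np.sum(image[i:i+kh, j:j+kw] * kernel)
    return out

image = np.random.rand(6, 6)
vertical_edge = np.array([[1, 0, -1],
                          [1, 0, -1],
                          [1, 0, -1]])    # a 3x3 vertical-edge filter
feature_map = convolve2d(image, vertical_edge)
print(feature_map.shape)                  # (4, 4): one output per filter position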
3.4 Convolutional Neural Networks (CNNs)
CNNs stack three main operations:
1. Convolution: Apply filters to generate feature maps.
2. Non-linearity: Often ReLU.
3. Pooling: Downsampling operation on each feature map.
tf.keras.layers.Conv2D
tf.keras.activations.*
tf.keras.layers.MaxPool2D
Train the model with image data. Learn the weights of the filters in the convolutional layers.
In the dense layers, we'll need to add a bias to allow us to shift the activation function, and apply a non-linearity so that we can handle non-linear data relationships.
What's special here is that the local connectivity is preserved each neuron in the hidden
layer you can see in the right only sees a very specific patch of its inputs. It does not see
the entire input neurons like it would have if it was a fully connected layer.
Let's define the actual operation more concretely using a mathematical equation. In the 3×3-filter example we're left with a 4×4 feature map, and for each neuron in the hidden layer, its inputs are those neurons in the patch from the previous layer.
Summary: 1) apply a window of weights; 2) compute linear combinations; 3) activate with a non-linear function.
Previously, we saw how to take an input image and learn a single feature map. But in reality there are many types of features in our image. How can we use convolutional layers to learn a stack of many different feature types that could be useful for performing our task? How can we use this to do multiple feature extraction?
Now the output layer is still a convolution, but it has a volume dimension, where the height and the width are spatial dimensions dependent upon the dimensions of the input layer.
3.4.3 Pooling
Pooling is an operation that is commonly used to reduce the dimensionality of our inputs and of our feature maps while still preserving spatial invariance.
A common type of pooling used in practice is called max pooling.
Max Pooling.
• Reduced dimensionality
• Spatial invariance
tf.keras.layers.MaxPool2D(
pool_size=(2,2),
strides=2
)
Mean Pooling. Taking the maximum over the patch is one idea; a very common alternative is taking the average, which is called mean pooling.
Taking the average represents a smoother way to perform the pooling operation, because you're not just taking a maximum, which can be subject to outliers; by averaging, you get a smoother result in your output layer.
Max pooling and mean pooling both have their advantages and disadvantages.
The CNNs for classification can be broken down into two parts.
Part 1: Feature Learning
First is the feature learning part, where we actually try to learn the features in our input image that can be used to perform our specific task. This feature learning pipeline (convolution, non-linearity, pooling) was described earlier in this chapter.
Part 2: Classification
The convolutional and pooling layers of the first part output the high-level features of the input. The second part uses these features to perform the classification, or whatever our task is; in this case, the task is to output the class probabilities for the input image. So we feed those output features into a fully connected (dense) neural network to perform the classification. We can do this without worrying about spatial structure anymore, because we've already downsampled our image so much that it's not really even an image any longer; it's closer to a vector of numbers, and we can directly apply our dense neural network to that vector. It's also much lower dimensional now. We can output class probabilities using a function called softmax, whose output represents a categorical probability distribution.
The spatial invariance of convolution refers to applying the same filter bank F to input patches at
all locations.
import tensorflow as tf

def generate_model():
    model = tf.keras.Sequential([
        # first convolutional layer: 32 filters, 3x3 kernels
        tf.keras.layers.Conv2D(32, kernel_size=3, activation='relu'),
        tf.keras.layers.MaxPool2D(pool_size=2, strides=2),
        # classifier head (a minimal completion to make the model runnable)
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(10, activation='softmax'),
    ])
    return model
3.5 An Architecture for Many Applications
The task can be:
• Classification
• Object detection
• Segmentation
• Probabilistic control
3.5.1 Object Detection
A naive solution. We can start by placing a random box over the input image somewhere; it has some random location and a random size. Then we can take that box and feed it through our normal image classification network, whose task is to predict the class of this image. If the box contains no class, the network can simply ignore it. We repeat this process: we pick another box in the scene, pass it through the network to predict its class, and we keep doing this with different boxes in the scene. In some sense, if each of these boxes gives us a predicted class, we can pick the boxes that do have a class in them and use those as boxes where an object is found. Problem: there are way too many possible boxes, at too many scales, positions and sizes. We can't possibly iterate over our images in all of these dimensions.
R-CNN algorithm: Find regions that we think have objects. Use CNN to classify.
Problems:
1. Slow! Many regions; time intensive inference.
2. Brittle! Manually define region proposals.
A faster region-based approach (Faster R-CNN) addresses these problems. It only requires a single forward pass through the model: we feed in the image once, a region proposal network extracts the regions, and all of these regions are then fed on to perform the classification.
3.5.2 Semantic Segmentation: Fully Convolutional Networks
The network is designed with all convolutional layers, with downsampling and upsampling operations.
This output is created using an upsampling operation rather than a downsampling operation; upsampling allows the convolutional decoder to increase its spatial dimension.
tf.keras.layers.Conv2DTranspose
3.6 Summary
Chapter 4: Recurrent Neural Networks
4.1 Recurrent Neural Networks
Learn about recurrent neural networks. This type of model has been proven to perform extremely well on temporal data.
It has several variants including LSTMs, GRUs and Bidirectional RNNs, which you are going to learn about in this section.
4.1.1 Why sequence models
Examples of sequence problems (the task labels here follow Andrew Ng's course examples):
▪ Speech recognition — X: wave sequence, Y: text sequence
▪ Music generation — X: nothing or an integer, Y: wave sequence
▪ Sentiment classification — X: text sequence, Y: integer rating
▪ DNA sequence analysis — X: DNA sequence, Y: DNA labels
▪ Video activity recognition — X: video frames, Y: label (activity)
▪ Name entity recognition — X: text sequence, Y: label sequence
▪ Can be used by search engines to index different types of words inside a text.
• All of these problems with different input and output (sequence or not) can be addressed as supervised learning
with label data X, Y as the training set.
4.1.2 Name entity recognition
• Named entities are the proper names that play an important role in searching for important information of interest (understanding the meaning of a word).
• Motivating example:
▪ X: "Harry Potter and Hermione Granger invented a new spell" (9 words)
▪ Y: 1 1 0 1 1 0 0 0 0
▪ An output of 1 means the corresponding word is part of a name, while 0 means it is not.
• We will index the first element of x by x<1>, the second x<2> and so on.
o x<1> = Harry
o x<2> = Potter
• Similarly, we will index the first element of y by y<1>, the second y<2> and so on.
o y<1> = 1
o y<2> = 1
• Tx is the size of the input sequence and Ty is the size of the output sequence.
• x(i)<t> is element t of the input sequence of training example i. Similarly, y(i)<t> is the t-th element of the output sequence of the i-th training example.
• Tx(i) is the input sequence length for training example i; it can differ across examples. Similarly, Ty(i) is the length of the output sequence of the i-th training example.
• For example, output 1 for the name of a person, 2 for the name of a city, 3 for a currency, 4 for a book name, and 0 for anything else.
4.1.3 Representing words:
o We will now work with NLP, which stands for natural language processing. One of the challenges of NLP is how we can represent a word.
ii.We need a vocabulary list that contains all the words in our target sets.
▪ Example:
▪ Each word will have a unique index that it can be represented with.
▪ Vocabulary sizes in modern applications are from 30,000 to 50,000. 100,000 is not uncommon.
Some of the bigger companies use even a million.
▪ To build the vocabulary list, you can read all the texts you have and keep the m most frequently occurring words, or search online for the m most common words.
iii.Create a one-hot encoding sequence for each word in your dataset given the vocabulary you have created.
▪ We can add a token in the vocabulary with name <UNK> which stands for unknown text and use
its index for your one-hot vector.
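A minimal sketch of this one-hot representation in Python (the tiny vocabulary is illustrative):

import numpy as np

vocab = ["a", "and", "harry", "potter", "<UNK>"]      # toy vocabulary
index = {word: i for i, word in enumerate(vocab)}

def one_hot(word):
    # unknown words fall back to the <UNK> token's index
    vec = np.zeros(len(vocab))
    vec[index.get(word.lower(), index["<UNK>"])] = 1.0
    return vec

print(one_hot("Harry"))    # [0. 0. 1. 0. 0.]
print(one_hot("spell"))    # [0. 0. 0. 0. 1.]  (unknown word -> <UNK>)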
o Full example:
• The goal is, given this representation for x, to learn a mapping from x to the target output y using a sequence model, as a supervised learning problem.
• Why not use a standard network for sequence tasks? There are two problems:
o Inputs and outputs can have different lengths in different examples.
▪ This can be solved for normal NNs by padding to the maximum length, but it's not a good solution.
o A standard network doesn't share features learned across different positions of the text.
▪ Using feature sharing, as in CNNs, can significantly reduce the number of parameters in your model. That's what we will do in RNNs.
o Long-term dependencies are also hard for a standard network to capture.
• A recurrent neural network doesn't have these problems.
4.1.4 Forward RNN
• Let's build an RNN that solves the name entity recognition task:
o In this problem Tx = Ty. In other problems where they aren't equal, the RNN architecture may be different.
o a<0> is usually initialized with zeros, but some practitioners may initialize it randomly in some cases.
o There are three weight matrices here: Wax, Waa, and Wya.
• The weight matrix Waa is the memory the RNN is trying to maintain from the previous layers.
• A lot of papers and books write the same architecture this way:
o It's harder to interpret; it's easier to work with the unrolled version of the drawing.
• In the discussed RNN architecture, the current output ŷ<t> depends on the previous inputs and activations.
• Let's have this example 'He Said, "Teddy Roosevelt was a great president"'. In this example Teddy is a person
name but we know that from the word president that came after Teddy not from He and said that were before
it.
• So a limitation of the discussed architecture is that it cannot learn from elements later in the sequence. To address this problem we will later discuss the Bidirectional RNN (BRNN).
• Now let's discuss the forward propagation equations on the discussed architecture:
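Written here in the standard form for this architecture, using the weight names above, the equations are:

a<t> = g(Waa a<t-1> + Wax x<t> + ba)
ŷ<t> = g'(Wya a<t> + by)

where g and g' are the activation functions discussed next.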
o The activation function g for a is usually tanh or ReLU, and the output activation g' depends on your task, e.g. sigmoid or softmax. In the name entity recognition task we will use sigmoid because we only have two classes.
• In order to help us develop complex RNN architectures, the last equations need to be simplified a bit.
4.1.5 Backpropagation through time
• Usually deep learning frameworks do backpropagation automatically for you. But it's useful to know how it
works in RNNs.
o Where wa, ba, wy, and by are shared across each element in a sequence.
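The loss equations being referred to are the standard cross-entropy ones, consistent with the sigmoid output used for the two-class name entity task:

L<t>(ŷ<t>, y<t>) = −y<t> log ŷ<t> − (1 − y<t>) log(1 − ŷ<t>)
L = Σt L<t>(ŷ<t>, y<t>)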
o Where the first equation is the loss for one example and the loss for the whole sequence is given by the
summation over all the calculated single example losses.
• Graph with losses:
• The backpropagation here is called backpropagation through time because we pass activation a from one
sequence element to another like backwards in time.
4.1.6 Different types of RNNs
• In the sentiment analysis problem, X is a text while Y is an integer that ranges from 1 to 5. The RNN architecture for that is Many to One, as in Andrej Karpathy's image.
o Note that starting the second layer we are feeding the generated output back to the network.
• There is another interesting Many To Many architecture. In applications like machine translation, the input and output sequences have different lengths in most cases, so an alternative Many To Many architecture that fits translation would be as follows:
o There are encoder and decoder parts in this architecture. The encoder encodes the input sequence into one matrix and feeds it to the decoder to generate the output. The encoder and decoder have different weight matrices.
• Summary of RNN types:
• There is another architecture, the attention architecture, which we will talk about later.
4.1.7 Language model and sequence generation
o Let's say we are solving a speech recognition problem and someone says a sentence that can be interpreted in two ways:
o "Pair" and "pear" sound exactly the same, so how would a speech recognition application choose between the two?
o That's where the language model comes in. It gives a probability for the two sentences and the
application decides the best based on this probability.
• The job of a language model is to give a probability of any given sequence of words.
4.1.8 How to build language models with RNNs?
o Get a training set: a large corpus of text in the target language.
o Then tokenize this training set by building the vocabulary and one-hot encoding each word.
o Add an end-of-sentence token <EOS> to the vocabulary and include it with each converted sentence. Also, use the token <UNK> for the unknown words.
o In training time we will use this:
i. To predict the chance of the next word, we feed the sentence to the RNN, get the final ŷ<t> probability vector, and sort it by maximum probability.
▪ This is simply feeding the sentence into the RNN and multiplying the probabilities (outputs).
• After a sequence model is trained as a language model, to check what the model has learned you can use it to sample novel sequences.
• Let's see the steps of how we can sample a novel sequence from a trained language model:
ii. We first pass a<0> = zeros vector, and x<1> = zeros vector.
iii. Then we choose a prediction randomly from the distribution obtained by ŷ<1>. For example, it could be "The".
▪ This is the line where you get a random beginning of the sentence each time you sample a novel sequence.
iv. We pass the last predicted word together with the calculated a<1>.
v. We keep repeating steps 3 and 4 for a fixed length or until we get the <EOS> token.
vi. You can reject any <UNK> token if you mind finding it in your output. A minimal sketch of this loop follows below.
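A minimal, illustrative sketch of this sampling loop in Python/NumPy (rnn_step is a hypothetical stand-in for whatever trained RNN cell you have; it maps the current input and activation to the output distribution and the next activation):

import numpy as np

def sample_sequence(rnn_step, vocab, a_size, eos_id, max_len=50):
    a = np.zeros(a_size)                 # a<0> = zeros vector
    x = np.zeros(len(vocab))             # x<1> = zeros vector
    words = []
    for _ in range(max_len):             # fixed length, or stop at <EOS>
        y, a = rnn_step(x, a)            # y is the y-hat probability vector
        idx = np.random.choice(len(vocab), p=y)   # sample from the distribution
        if idx == eos_id:
            break
        words.append(vocab[idx])
        x = np.zeros(len(vocab))         # feed the sampled word back in
        x[idx] = 1.0
    return words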
• So far we have built a word-level language model. It's also possible to implement a character-level language model.
• In the character-level language model, the vocabulary will contain [a-zA-Z0-9], punctuation, special characters and possibly special tokens.
• Character-level language model has some pros and cons compared to the word-level language model
o Pros:
o Cons:
a. The main disadvantage is that you end up with much longer sequences.
b. Character-level language models are not as good as word-level language models at capturing long-range dependencies between how the earlier parts of the sentence affect the later parts.
• The trend Andrew has seen in NLP is that, for the most part, a word-level language model is still used, but as computers get faster there are more and more applications where people, at least in some special cases, are starting to look at character-level models. They are also used in specialized applications where you might need to deal with unknown or out-of-vocabulary words a lot, or where you have a more specialized vocabulary.
4.1.9 Vanishing gradients with RNNs
• An RNN that processes a sequence with 10,000 time steps effectively has 10,000 layers, which is very hard to optimize.
• Let's take an example. Suppose we are working on a language modeling problem and there are two sequences that the model tries to learn:
• What the model needs to learn here is that "was" goes with "cat" and "were" goes with "cats". The naive RNN is not very good at capturing very long-term dependencies like this.
• As we have discussed for deep neural networks, deeper networks run into the vanishing gradient problem; that also happens with RNNs over long sequences.
o To compute the gradient at the word "was", we need to compute gradients for everything that came before it. Multiplying small fractions tends to make the gradient vanish, while multiplying large numbers tends to make it explode.
• In the problem we described, this means it is hard for the network to remember "cat" all the way to "was", so the network won't identify singular/plural words and give the right grammatical form of the verb, was/were.
• In theory, RNNs are absolutely capable of handling such “long-term dependencies.” A human could carefully pick
parameters for them to solve toy problems of this form. Sadly, in practice, RNNs don’t seem to be able to learn
them. http://colah.github.io/posts/2015-08-Understanding-LSTMs/
• Vanishing gradients problem tends to be the bigger problem with RNNs than the exploding gradients problem.
We will discuss how to solve it in next sections.
• Exploding gradients can be easily spotted when your weight values become NaN. One way to solve the exploding gradient problem is to apply gradient clipping: if your gradient is more than some threshold, re-scale your gradient vector so that it is not too big, i.e. gradients are clipped according to some maximum value.
• Extra solutions for the exploding gradient problem:
▪ Truncated backpropagation.
▪ Gradient clipping (sketched below).
• Extra solutions for the vanishing gradient problem:
▪ Careful weight initialization, like He initialization.
▪ Gated units such as GRUs and LSTMs (the most popular solution).
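A short sketch of gradient clipping in Keras (the threshold of 1.0 is an illustrative choice):

import tensorflow as tf

# Clip at the optimizer level: gradients whose norm exceeds 1.0 are re-scaled
optimizer = tf.keras.optimizers.Adam(learning_rate=1e-3, clipnorm=1.0)

# Or clip manually inside a custom training step:
# grads = tape.gradient(loss, model.trainable_variables)
# grads = [tf.clip_by_norm(g, 1.0) for g in grads]
# optimizer.apply_gradients(zip(grads, model.trainable_variables))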
4.2 Gated Recurrent Unit (GRU)
• The basic RNN unit can be visualized to be like this:
• Each layer in a GRU has a new variable C, the memory cell, which can decide whether to memorize something or not.
▪ To understand GRUs, imagine that the update gate is either 0 or 1 most of the time.
o So we update the memory cell based on the update gate and the previous cell value, as in the equations below.
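In the standard simplified form (c̃ is the candidate cell value, Γu the update gate, σ the sigmoid):

c̃<t> = tanh(Wc [c<t-1>, x<t>] + bc)
Γu = σ(Wu [c<t-1>, x<t>] + bu)
c<t> = Γu * c̃<t> + (1 − Γu) * c<t-1>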
• Let's take the cat sentence example and use it to understand these equations:
o We will suppose that U is 0 or 1 and is a bit that tells us whether a singular word needs to be memorized.
Word Update gate(U) Cell memory (C)
The 0 val
cat 1 new_val
which 0 new_val
already 0 new_val
... 0 new_val
full .. ..
• Because the update gate U is usually a very small number like 0.00001 when memory must be kept, GRUs don't suffer from the vanishing gradient problem.
• Shapes: a<t>, c<t> and c̃<t> all have the same shape (the number of hidden units).
• What has been described so far is the simplified GRU unit. Let's now describe the full one:
o The full GRU contains a new gate that is used when calculating the candidate c̃. The gate tells you how relevant c<t-1> is to c<t>.
o Equations:
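In the standard form (with Γr the new relevance gate):

c̃<t> = tanh(Wc [Γr * c<t-1>, x<t>] + bc)
Γu = σ(Wu [c<t-1>, x<t>] + bu)
Γr = σ(Wr [c<t-1>, x<t>] + br)
c<t> = Γu * c̃<t> + (1 − Γu) * c<t-1>
a<t> = c<t>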
• So why do we use these architectures? Why don't we change them? How do we know they will work? Why not add another gate? Why not use the simpler GRU instead of the full GRU? Well, researchers have experimented over the years with many different versions of these architectures while addressing the vanishing gradient problem, and they have found that full GRUs are one of the best RNN architectures for many different problems. You can make your own design, but keep in mind that GRUs and LSTMs are the standards.
4.3 LSTM Networks
Long Short Term Memory networks – usually just called “LSTMs” – are a special kind of RNN, capable of learning long-
term dependencies. They were introduced by Hochreiter & Schmidhuber (1997), and were refined and popularized by
many people in following work. They work tremendously well on a large variety of problems, and are now widely used.
LSTMs are explicitly designed to avoid the long-term dependency problem. Remembering information for long periods of
time is practically their default behavior, not something they struggle to learn!
All recurrent neural networks have the form of a chain of repeating modules of neural network. In standard RNNs, this
repeating module will have a very simple structure, such as a single tanh layer.
LSTMs also have this chain like structure, but the repeating module has a different structure. Instead of having a single
neural network layer, there are four, interacting in a very special way.
Don’t worry about the details of what’s going on. We’ll walk through the LSTM diagram step by step later. For now, let’s
just try to get comfortable with the notation we’ll be using.
In the above diagram, each line carries an entire vector, from the output of one node to the inputs of others. The pink
circles represent pointwise operations, like vector addition, while the yellow boxes are learned neural network layers.
Lines merging denote concatenation, while a line forking denotes its content being copied and the copies going to
different locations.
The key to LSTMs is the cell state, the horizontal line running through the top of the diagram.
The cell state is kind of like a conveyor belt. It runs straight down the entire chain, with only some minor linear
interactions. It’s very easy for information to just flow along it unchanged.
The LSTM does have the ability to remove or add information to the cell state, carefully regulated by structures called
gates.
Gates are a way to optionally let information through. They are composed out of a sigmoid neural net layer and a
pointwise multiplication operation.
The sigmoid layer outputs numbers between zero and one, describing how much of each component should be let
through. A value of zero means “let nothing through,” while a value of one means “let everything through!”
An LSTM has three of these gates, to protect and control the cell state.
4.3.1 Step-by-Step LSTM Walk Through
The first step in our LSTM is to decide what information we're going to throw away from the cell state. This decision is made by a sigmoid layer called the "forget gate layer". It looks at h<t−1> and x<t>, and outputs a number between 0 and 1 for each number in the cell state C<t−1>.
Let's go back to our example of a language model trying to predict the next word based on all the previous ones. In such a problem, the cell state might include the gender of the present subject, so that the correct pronouns can be used. When we see a new subject, we want to forget the gender of the old subject.
The next step is to decide what new information we’re going to store in the cell state. This has two parts. First, a sigmoid
layer called the “input gate layer” decides which values we’ll update. Next, a tanh layer creates a vector of new candidate
values, C~t, that could be added to the state. In the next step, we’ll combine these two to create an update to the state.
In the example of our language model, we’d want to add the gender of the new subject to the cell state, to replace the
old one we’re forgetting.
It’s now time to update the old cell state, Ct−1, into the new cell state Ct. The previous steps already decided what to do,
we just need to actually do it.
We multiply the old state by ft, forgetting the things we decided to forget earlier. Then we add it ∗ C̃t: the new candidate values, scaled by how much we decided to update each state value.
In the case of the language model, this is where we’d actually drop the information about the old subject’s gender and
add the new information, as we decided in the previous steps.
Finally, we need to decide what we're going to output. This output will be based on our cell state, but will be a filtered version. First, we run a sigmoid layer which decides what parts of the cell state we're going to output. Then, we put the cell state through tanh (to push the values to be between −1 and 1) and multiply it by the output of the sigmoid gate, so that we only output the parts we decided to.
For the language model example, since it just saw a subject, it might want to output information relevant to a verb, in
case that’s what is coming next. For example, it might output whether the subject is singular or plural, so that we know
what form a verb should be conjugated into if that’s what follows next.
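Putting the walkthrough together, the standard LSTM equations (matching the symbols ft, it, C̃t and Ct used above, with h the hidden state) are:

ft = σ(Wf [h<t-1>, x<t>] + bf)       (forget gate)
it = σ(Wi [h<t-1>, x<t>] + bi)       (input gate)
C̃t = tanh(WC [h<t-1>, x<t>] + bC)    (candidate values)
Ct = ft * Ct-1 + it * C̃t             (new cell state)
ot = σ(Wo [h<t-1>, x<t>] + bo)       (output gate)
ht = ot * tanh(Ct)                   (output / new hidden state)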
4.3.2 Variants on Long Short Term Memory
One popular LSTM variant, introduced by Gers & Schmidhuber (2000), is adding "peephole connections." This means that
we let the gate layers look at the cell state.
The above diagram adds peepholes to all the gates, but many papers will give some peepholes and not others.
Another variation is to use coupled forget and input gates. Instead of separately deciding what to forget and what we
should add new information to, we make those decisions together. We only forget when we’re going to input something
in its place. We only input new values to the state when we forget something older.
A slightly more dramatic variation on the LSTM is the Gated Recurrent Unit, or GRU, introduced by Cho, et al. (2014). It
combines the forget and input gates into a single “update gate.” It also merges the cell state and hidden state, and makes
some other changes. The resulting model is simpler than standard LSTM models, and has been growing increasingly
popular.
4.3.3 LSTM Example
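A minimal, illustrative Keras sketch of an LSTM put to work as a sequence classifier (the vocabulary size, layer dimensions and binary task are assumptions made for the example):

import tensorflow as tf

vocab_size, embed_dim, hidden_units = 10000, 64, 128   # illustrative sizes

model = tf.keras.Sequential([
    # map word indices to dense vectors
    tf.keras.layers.Embedding(vocab_size, embed_dim),
    # the LSTM reads the sequence and returns its final hidden state
    tf.keras.layers.LSTM(hidden_units),
    # a single sigmoid unit for a binary label (e.g. sentiment)
    tf.keras.layers.Dense(1, activation='sigmoid'),
])
model.compile(optimizer='adam', loss='binary_crossentropy',
              metrics=['accuracy'])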
4.4 Course summary
Here is the course summary as given on the course link:
This course will teach you how to build models for natural language, audio, and other sequence data. Thanks to deep
learning, sequence algorithms are working far better than just two years ago, and this is enabling numerous exciting
applications in speech recognition, music synthesis, chatbots, machine translation, natural language understanding, and
many others.
You will:
• Understand how to build and train Recurrent Neural Networks (RNNs), and commonly-used variants such as
GRUs and LSTMs.
• Be able to apply sequence models to natural language problems, including text synthesis.
• Be able to apply sequence models to audio applications, including speech recognition and music synthesis.
This is the fifth and final course of the Deep Learning Specialization.