
Unit IV Machine Learning Notes

This document provides an overview of neural networks, including their structure, properties, and learning methods. It discusses the biological inspiration behind artificial neural networks (ANNs), the architecture of basic neural networks, and various learning algorithms such as supervised and unsupervised learning. Additionally, it covers the perceptron model for pattern classification and its application in solving complex problems through adaptive learning.

Uploaded by

mansoorkhan.a006


UNIT V NEURAL NETWORKS 9

Perceptron - Multilayer perceptron, activation functions, network training – gradient descent optimization – stochastic gradient descent, error backpropagation, from shallow networks to deep networks – Unit saturation (aka the vanishing gradient problem) – ReLU, hyperparameter tuning, batch normalization, regularization, dropout.

NEURAL NETWORK- INTRODUCTION

• Neural networks, also known as artificial neural networks (ANNs) or simulated neural networks (SNNs), are a subset of machine learning and are at the heart of deep learning algorithms. Their name and structure are inspired by the human brain, mimicking the way that biological neurons signal to one another.
• A neural network is a method in artificial intelligence that teaches computers to process data in a way that is inspired by the human brain.
• Deep learning, a type of machine learning, uses interconnected nodes or neurons arranged in a layered structure that resembles the human brain.
• This creates an adaptive system that learns from its errors and improves continuously as it is trained.
• Thus, artificial neural networks can tackle complicated problems, such as summarizing documents or recognizing faces, with high accuracy.
Artificial neural networks (ANNs) provide a general, practical method for learning real-valued, discrete-valued, and vector-valued target functions.

Biological Motivation

The study of artificial neural networks (ANNs) has been inspired by the observation that biological learning systems are built of very complex webs of interconnected neurons.
The human information processing system consists of neurons in the brain: the neuron is the basic building-block cell that communicates information to and from various parts of the body.
Facts of Human Neurobiology

Number of neurons ~ 10^11
Connections per neuron ~ 10^4 – 10^5
Neuron switching time ~ 0.001 second (10^-3 s)
Scene recognition time ~ 0.1 second
At about 10^-3 s per step, 0.1 s allows only about 100 sequential inference steps, which does not seem like enough for scene recognition.
This suggests highly parallel computation based on distributed representations.

Properties of Neural Networks

Many neuron-like threshold switching units

Many weighted interconnections among units
Highly parallel, distributed process
Emphasis on tuning weights automatically
Input is high-dimensional, discrete or real-valued (e.g., sensor input)

NEURAL NETWORK REPRESENTATIONS

A prototypical example of ANN learning is provided by the ALVINN system, which uses a learned ANN to steer an autonomous vehicle driving at normal speeds on public highways.
The input to the neural network is a 30 x 32 grid of pixel intensities obtained from a forward-pointed camera mounted on the vehicle. The network output is the direction in which the vehicle is steered.

Neural network learning to steer an autonomous vehicle.

The network is shown on the left side of the figure, with the input camera image depicted below it.
Each node (i.e., circle) in the network diagram corresponds to the output of a single network unit, and the lines entering the node from below are its inputs.
There are four units that receive inputs directly from all of the 30 x 32 pixels in the image. These are called "hidden" units because their output is available only within the network and is not available as part of the global network output. Each of these four hidden units computes a single real-valued output based on a weighted combination of its 960 (= 30 x 32) inputs.
These hidden unit outputs are then used as inputs to a second layer of 30 "output" units. Each output unit corresponds to a particular steering direction, and the output values of these units determine which steering direction is recommended most strongly.
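A minimal NumPy sketch of the forward pass through an ALVINN-shaped network (960 inputs → 4 hidden units → 30 outputs) is shown below. The weights are random placeholders and the sigmoid activation is an assumption; these notes do not give ALVINN's trained weights or its exact unit functions, and `steer` is a hypothetical helper name.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

rng = np.random.default_rng(0)

# Placeholder (untrained) weights: 30 x 32 = 960 pixels -> 4 hidden units -> 30 steering units
W_hidden = rng.normal(0.0, 0.01, size=(960, 4))
b_hidden = np.zeros(4)
W_out = rng.normal(0.0, 0.1, size=(4, 30))
b_out = np.zeros(30)

def steer(image):
    """Map a 30x32 intensity image to the index of the most strongly recommended direction."""
    x = image.reshape(960)                      # flatten the pixel grid into 960 inputs
    hidden = sigmoid(x @ W_hidden + b_hidden)   # each hidden unit: weighted sum of all 960 pixels
    outputs = sigmoid(hidden @ W_out + b_out)   # one output unit per steering direction
    return int(np.argmax(outputs)), hidden, outputs

camera_frame = rng.random((30, 32))             # stand-in for a real camera image
direction, _, _ = steer(camera_frame)
print("recommended steering unit:", direction)
```

Only the hidden activations are internal; the 30 output values are what the vehicle controller would read.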

APPROPRIATE PROBLEMS FOR NEURAL NETWORK LEARNING

ANN learning is well-suited to problems in which the training data corresponds to noisy, complex sensor data, such as inputs from cameras and microphones.

ANN is appropriate for problems with the following characteristics:

1. Instances are represented by many attribute-value pairs.
2. The target function output may be discrete-valued, real-valued, or a vector of several real- or discrete-valued attributes.
3. The training examples may contain errors.
4. Long training times are acceptable.
5. Fast evaluation of the learned target function may be required.
6. The ability of humans to understand the learned target function is not important.
Neural computing is an information processing paradigm, inspired by biological systems, composed of a large number of highly interconnected processing elements (neurons) working in unison to solve specific problems.
Dendrites are branching fibres that extend from the cell body or soma. The soma, or cell body, of a neuron contains the nucleus and other structures, and supports chemical processing and the production of neurotransmitters.
Axon is a single fiber that carries information away from the soma to the synaptic sites of other neurons (dendrites and somas), muscles, or glands.
Myelin sheath consists of fat-containing cells that insulate the axon from electrical activity. This insulation increases the rate of transmission of signals. A gap exists between each myelin sheath cell along the axon. Since fat inhibits the propagation of electricity, the signals jump from one gap to the next.
Nodes of Ranvier are the gaps (about 1 μm) between myelin sheath cells. Since fat serves as a good insulator, the myelin sheaths speed the rate of transmission of an electrical impulse along the axon.
Synapse is the point of connection between two neurons or between a neuron and a muscle or a gland. Electrochemical communication between neurons takes place at these junctions.
Terminal buttons of a neuron are the small knobs at the end of an axon that release chemicals called neurotransmitters.
Axon hillock is the site of summation for incoming information. At any moment, the collective influence of all neurons that conduct impulses to a given neuron determines whether or not an action potential will be initiated at the axon hillock and propagated along the axon.

Artificial neuron model

Simple neural network architecture
A basic neural network has interconnected artificial neurons in three layers:

• Input Layer
Information from the outside world enters the artificial neural network through the input layer. Input nodes receive the data (for example, the feature values of one example) and pass it on to the next layer.

• Hidden Layer
Hidden layers take their input from the input layer or other hidden layers. Artificial
neural networks can have a large number of hidden layers. Each hidden layer analyzes
the output from the previous layer, processes it further, and passes it on to the next
layer.

• Output Layer
The output layer gives the final result of all the data processing by the artificial neural
network. It can have single or multiple nodes. For instance, if we have a binary
(yes/no) classification problem, the output layer will have one output node, which will
give the result as 1 or 0. However, if we have a multi-class classification problem, the
output layer might consist of more than one output node.
An artificial neuron is a mathematical function conceived as a simple model of a real (biological) neuron.
• The McCulloch-Pitts Neuron
This is a simplified model of real neurons, known as a Threshold Logic Unit.
• A set of input connections brings in activations from other neurons.
• A processing unit sums the inputs, and then applies a non-linear activation function (i.e. a squashing/transfer/threshold function).
• An output line transmits the result to other neurons.
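A minimal Python sketch of the threshold logic unit described above follows, assuming real-valued weights and a "fire if the weighted sum reaches the threshold" rule; `mcculloch_pitts` is a hypothetical helper name.

```python
def mcculloch_pitts(inputs, weights, threshold):
    """Threshold Logic Unit: output 1 if the weighted sum of inputs reaches the threshold, else 0."""
    total = sum(x * w for x, w in zip(inputs, weights))   # processing unit: sum the weighted inputs
    return 1 if total >= threshold else 0                  # threshold (step) activation

# Two inputs with unit weights: threshold 2 gives AND, threshold 1 gives OR.
for a in (0, 1):
    for b in (0, 1):
        print(a, b, "AND:", mcculloch_pitts([a, b], [1, 1], 2),
              "OR:", mcculloch_pitts([a, b], [1, 1], 1))
```

Changing only the threshold changes which Boolean function the same unit computes.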
Basic Elements of ANN:
A neuron consists of three basic components: weights, a threshold, and a single activation function. An artificial neural network (ANN) model based on biological neural systems is shown in the figure.

Different Learning Rules

A brief classification of different learning algorithms is depicted in figure 3.

Training: the process in which the network is taught to change its weights and biases.
Learning: the internal process of training, in which the artificial neural system learns to update/adapt its weights and biases.
The different training/learning procedures available in ANNs are:
• Supervised learning, Unsupervised learning
• Reinforcement learning, Hebbian learning
• Gradient descent learning, Competitive learning, Stochastic learning

Requirements of Learning Laws:

• The learning law should lead to convergence of the weights.
• Learning or training time should be short for capturing the information from the training pairs.
• Learning should use local information.
• The learning process should be able to capture the complex non-linear mapping between the input and output pairs.
• Learning should be able to capture as many patterns as possible.
• The storage capacity for pattern information gathered during learning should be high for the given network.

Different Training methods of Artificial Neural Network

Supervised learning :

Every input pattern that is used to train the network is associated with an output pattern, which is the target or desired pattern.
A teacher is assumed to be present during the training process: the network's computed output is compared with the correct expected output to determine the error. The error can then be used to change network parameters, which results in an improvement in performance.
Unsupervised learning:
In this learning method, the target output is not presented to the network. It is as if there is no teacher to present the desired patterns, and hence the system learns on its own by discovering and adapting to structural features in the input patterns.
Reinforcement learning:
In this method, a teacher, though available, does not present the expected answer but only indicates whether the computed output is correct or incorrect. This information helps the network in the learning process.
PERCEPTRON Model

Simple Perceptron for Pattern Classification

A perceptron network is capable of performing pattern classification into two or more categories. The perceptron is trained using the perceptron learning rule. We first consider classification into two categories, and then the general multiclass case.
One type of ANN system is based on a unit called a perceptron. The perceptron is a single-layer neural network.

Figure: A perceptron

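A perceptron takes a vector of real-valued inputs, calculates a linear combination of these inputs, then outputs a 1 if the result is greater than some threshold and -1 otherwise.
Given inputs x1 through xn, the output o(x1, . . . , xn) computed by the perceptron is

$$o(x_1, \ldots, x_n) = \begin{cases} 1 & \text{if } w_0 + w_1 x_1 + w_2 x_2 + \cdots + w_n x_n > 0 \\ -1 & \text{otherwise} \end{cases}$$

where each wi is a real-valued constant, or weight, that determines the contribution of input xi to the perceptron output.
The quantity -w0 is a threshold that the weighted combination of inputs w1x1 + . . . + wnxn must surpass in order for the perceptron to output a 1.
Sometimes, the perceptron function is written as

$$o(\vec{x}) = \operatorname{sgn}(\vec{w} \cdot \vec{x}), \qquad \operatorname{sgn}(y) = \begin{cases} 1 & \text{if } y > 0 \\ -1 & \text{otherwise} \end{cases}$$

where an additional constant input x0 = 1 is included so that w0 becomes part of the weight vector.
Learning a perceptron involves choosing values for the weights w0, . . . , wn. Therefore, the space H of candidate hypotheses considered in perceptron learning is the set of all possible real-valued weight vectors.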
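The sgn form of the perceptron can be sketched in NumPy as follows. The helper name `perceptron_output` and the example weights (w0 = -0.8, w1 = w2 = 0.5, which make the unit compute AND on 0/1 inputs) are illustrative choices, not taken from the notes.

```python
import numpy as np

def perceptron_output(w, x):
    """o(x) = sgn(w . x), with a constant input x0 = 1 so that w[0] plays the role of -threshold."""
    x_aug = np.concatenate(([1.0], np.asarray(x, dtype=float)))   # prepend x0 = 1
    return 1 if np.dot(w, x_aug) > 0 else -1

# Example weights: w0 = -0.8, w1 = w2 = 0.5 (AND on 0/1 inputs, outputs in {-1, 1})
w_and = np.array([-0.8, 0.5, 0.5])
for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x, perceptron_output(w_and, x))
```

Folding the threshold into w0 means learning only has to search over one weight vector, which is exactly the hypothesis space H described above.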

Representational Power of Perceptrons

The perceptron can be viewed as representing a hyperplane decision surface in the n-dimensional space of instances (i.e., points).
The perceptron outputs a 1 for instances lying on one side of the hyperplane and outputs a -1 for instances lying on the other side, as illustrated in the figure below.

Perceptrons can represent all of the primitive Boolean functions AND, OR, NAND (~AND), and NOR (~OR).
Some Boolean functions cannot be represented by a single perceptron, such as the XOR function, whose value is 1 if and only if x1 ≠ x2.
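Why XOR fails can be read off the inequalities: outputting 1 for (0,1) and (1,0) needs w0 + w1 > 0 and w0 + w2 > 0, and outputting -1 for (0,0) needs w0 ≤ 0. Adding the first two gives w0 + w1 + w2 > -w0 ≥ 0, so (1,1) would also be classified as 1. The sketch below searches a coarse, hypothetical weight grid (an illustration, not a proof) and finds weights for AND, OR, NAND and NOR but none for XOR.

```python
import itertools
import numpy as np

INPUTS = [(0, 0), (0, 1), (1, 0), (1, 1)]
TARGETS = {                      # desired perceptron outputs (-1/1) for the four inputs
    "AND":  [-1, -1, -1, 1],
    "OR":   [-1, 1, 1, 1],
    "NAND": [1, 1, 1, -1],
    "NOR":  [1, -1, -1, -1],
    "XOR":  [-1, 1, 1, -1],
}

def representable_on_grid(targets, grid=np.arange(-2.0, 2.01, 0.5)):
    """Return weights (w0, w1, w2) from the grid that reproduce `targets`, or None."""
    for w0, w1, w2 in itertools.product(grid, repeat=3):
        outputs = [1 if w0 + w1 * x1 + w2 * x2 > 0 else -1 for x1, x2 in INPUTS]
        if outputs == targets:
            return (float(w0), float(w1), float(w2))
    return None

for name, targets in TARGETS.items():
    print(name, representable_on_grid(targets))
```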

How does the perceptron work?

Example: representation of the AND function, with weights w1 = w2 = 0.6 on inputs A and B and a threshold of 1:

If A=0 & B=0 → 0*0.6 + 0*0.6 = 0.
This is not greater than the threshold of 1, so the output = 0.
If A=0 & B=1 → 0*0.6 + 1*0.6 = 0.6.
This is not greater than the threshold, so the output = 0.
If A=1 & B=0 → 1*0.6 + 0*0.6 = 0.6.
This is not greater than the threshold, so the output = 0.
If A=1 & B=1 → 1*0.6 + 1*0.6 = 1.2.
This exceeds the threshold, so the output = 1.
Supportive problem
Suppose that we are going to work on the AND gate problem. The gate returns 1 if and only if both inputs are true.

X1  X2  Y
0   0   0
0   1   0
1   0   0
1   1   1

We are going to set weights randomly. Let’s say that w1 = 0.9 and w2 = 0.9
Round 1
We will apply 1st instance to the perceptron. x1 = 0 and x2 = 0.

Sum unit will be 0 as calculated below

Σ = x1 * w1 + x2 * w2 = 0 * 0.9 + 0 * 0.9 = 0

The activation unit checks whether the sum unit is greater than a threshold. If this rule is satisfied, the unit fires and returns 1; otherwise it returns 0. Note that modern neural network architectures do not use this kind of step function as the activation.

The activation threshold is 0.5.

The sum unit was 0 for the 1st instance, so the activation unit returns 0 because 0 is less than 0.5. The actual output for this instance is also 0, so we do not update the weights because there is no error in this case.

Let’s focus on the 2nd instance. x1 = 0 and x2 = 1.

Sum unit: Σ = x1 * w1 + x2 * w2 = 0 * 0.9 + 1 * 0.9 = 0.9

What about errors?

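The activation unit returns 1 because the sum unit is greater than 0.5. However, the output of this instance should be 0, so the instance is not predicted correctly. That's why we will update the weights based on the error.

ε = actual – prediction = 0 – 1 = -1

We add the error times the learning rate to the weights. The learning rate is 0.5; it is usually set to a value between 0 and 1.

w1 = w1 + α * ε = 0.9 + 0.5 * (-1) = 0.9 – 0.5 = 0.4

w2 = w2 + α * ε = 0.9 + 0.5 * (-1) = 0.9 – 0.5 = 0.4

This simplified update changes every weight by α * ε. The standard perceptron rule scales the update by each input, wi = wi + α * ε * xi, which would leave w1 unchanged here because x1 = 0; on this data set both versions still end with w1 = w2 = 0.4 after the second round. The walkthrough continues with the simplified update.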

Focus on the 3rd instance. x1 = 1 and x2 = 0.

Sum unit: Σ = x1 * w1 + x2 * w2 = 1 * 0.4 + 0 * 0.4 = 0.4

The activation unit returns 0 this time because the output of the sum unit is 0.4, which is less than 0.5. The actual output is also 0, so we do not update the weights.

Now consider the 4th instance. x1 = 1 and x2 = 1.

Sum unit: Σ = x1 * w1 + x2 * w2 = 1 * 0.4 + 1 * 0.4 = 0.8

The activation unit returns 1 because the output of the sum unit is 0.8, which is greater than the threshold value 0.5. Its actual value should be 1 as well. This means that the 4th instance is predicted correctly, so we do not update anything.

Round 2
In the previous round, the 1st instance was classified with the old weight values, before the update. Let's apply feed forward again with the new weight values.

Remember the 1st instance. x1 = 0 and x2 = 0.

Sum unit: Σ = x1 * w1 + x2 * w2 = 0 * 0.4 + 0 * 0.4 = 0

The activation unit returns 0 because the sum unit is 0, which is less than the threshold value 0.5. The output of the 1st instance should be 0 as well. This means that the instance is classified correctly, so we do not update the weights.

Feed forward for the 2nd instance. x1 = 0 and x2 = 1.

Sum unit: Σ = x1 * w1 + x2 * w2 = 0 * 0.4 + 1 * 0.4 = 0.4

Activation unit will return 0 because sum unit is less than the threshold 0.5. Its output should
be 0 as well. This means that it is classified correctly and we will not update weights.

We’ve applied feed forward calculation for 3rd and 4th instances already for the current weight
values in the previous round. They were classified correctly.
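The rounds above can be reproduced with a short Python sketch. One deliberate difference: it uses the standard perceptron rule wi = wi + α * ε * xi rather than adding α * ε to every weight. On this data set both versions converge to w1 = w2 = 0.4 by the second pass. The function name `train_perceptron` and the `max_epochs` cap are hypothetical.

```python
def train_perceptron(samples, w, threshold=0.5, lr=0.5, max_epochs=10):
    """Train a threshold perceptron with the standard rule w_i += lr * error * x_i."""
    w = list(w)
    for epoch in range(1, max_epochs + 1):
        mistakes = 0
        for x, target in samples:
            total = sum(xi * wi for xi, wi in zip(x, w))     # sum unit
            prediction = 1 if total > threshold else 0        # step activation
            error = target - prediction                       # epsilon = actual - prediction
            if error != 0:
                mistakes += 1
                w = [wi + lr * error * xi for wi, xi in zip(w, x)]
        if mistakes == 0:            # a full pass with no errors: training has converged
            return w, epoch
    return w, max_epochs

AND_SAMPLES = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
weights, epochs = train_perceptron(AND_SAMPLES, w=(0.9, 0.9))
print(weights, epochs)   # [0.4, 0.4] 2
```

With the standard rule, instance 2 only lowers w2 and instance 3 then lowers w1, so the mistakes in round 1 differ from the walkthrough even though the final weights agree.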

Perceptron for AND Gate

Multi-Layer Perceptron Model:
The figure shows the general representation of a multi-layer perceptron network. Between the input and output layers there are one or more additional layers, known as hidden layers.

The multilayer perceptron falls under the category of feedforward algorithms, because inputs are combined with the initial weights in a weighted sum and passed through the activation function, just like in the perceptron. The difference is that each linear combination is propagated to the next layer.
Each layer feeds the next one with the result of its computation, its internal representation of the data. This goes all the way through the hidden layers to the output layer.
However, a forward pass alone is not enough for learning.
If the algorithm only computed the weighted sums in each neuron, propagated the results to the output layer, and stopped there, it would not be able to learn the weights that minimize the cost function. If the algorithm only computed one iteration, there would be no actual learning.

Multi-layer Perceptron neural architecture

Structure of MLPs

A multi-layer perceptron (MLP) is composed of multiple layers of interconnected neurons.

Using the students example described later in these notes, we can say that each neuron is like a student in the group, and each neuron is only able to perform simple arithmetic operations.

• In a typical MLP network, the input units (Xi) are fully connected to all hidden layer units (Yj), and the hidden layer units are fully connected to all output layer units (Zk). Each connection between input and hidden units, and between hidden and output units, has an associated weight (Wij or Wjk).

• The hidden and output layer units also derive their bias values (bj or bk) from weighted connections to units whose outputs are always 1 (true neurons).
The multilayer perceptron was developed to tackle the single perceptron's limitation to linearly separable problems (it can represent AND and OR, but not XOR). It is a neural network in which the mapping between inputs and output is non-linear. A multilayer perceptron has input and output layers, and one or more hidden layers with many neurons stacked together. While the perceptron neuron uses an activation function that imposes a threshold (a step function), neurons in a multilayer perceptron can use other activation functions, typically differentiable ones such as sigmoid, tanh or ReLU, so that the network can be trained by backpropagation.
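To see how a hidden layer removes the XOR limitation, here is a 2-2-1 network with hand-chosen (hypothetical) weights: one hidden unit computes OR, the other NAND, and the output unit ANDs them. Step activations are used only to keep the arithmetic readable; a network trained by backpropagation would use a differentiable activation instead.

```python
import numpy as np

def step(a):
    return (a > 0).astype(int)

# Hand-chosen weights: hidden unit 1 = OR, hidden unit 2 = NAND, output = AND of the two
W_hidden = np.array([[1.0, -1.0],
                     [1.0, -1.0]])   # rows: inputs x1, x2; columns: hidden units
b_hidden = np.array([-0.5, 1.5])
w_out = np.array([1.0, 1.0])
b_out = -1.5

def xor_mlp(X):
    h = step(X @ W_hidden + b_hidden)   # hidden layer: two threshold units
    return step(h @ w_out + b_out)      # output layer: one threshold unit

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
print(xor_mlp(X))   # [0 1 1 0]
```

Each hidden unit draws one line in the input plane; the output unit combines the two half-planes, which no single line can do.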

Multilayer Perceptron.
The structure of an MLP can be broken down into three main parts: the input layer, the hidden
layers, and the output layer.

• The input layer is like the teacher giving out the math problem to the students. It receives the input data, in this case the equation 5 x 3 + 2 x 4 + 8 x 2, and passes it on to the next layer.
• The hidden layers are like the students working together to solve the problem. Each hidden layer contains a set of interconnected neurons, which process and analyze the input data passed on from the previous layer. In this example, the hidden layer can have three neurons, each one solving a specific part of the equation: "5 x 3", "2 x 4" and "8 x 2".
• The output layer is like the student who puts together the final solution. It receives the output from the previous layers, combines them, and produces the final output, which is the solution to the problem. In this example, the output neuron can compute "15 + 8" and then "23 + 16" to get the final result of 39.
• The structure of MLP is shown here:
Fig 3 (MLP structure)

• Note that the number of neurons in the input layer must equal the number of features in each training instance, and the number of neurons in the output layer must equal the number of output labels. However, the hidden part of the network can have any number of layers and neurons, chosen according to the problem. More hidden neurons let the network represent more complex functions, at the cost of more computation and a greater risk of overfitting.
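The size rules above can be checked with a small NumPy sketch: the input dimension comes from the data, the output dimension from the number of labels, and the hidden sizes are free choices. The helper names, the ReLU hidden activation and the hidden sizes [8, 6] are assumptions for illustration.

```python
import numpy as np

def init_mlp(n_features, hidden_sizes, n_outputs, seed=0):
    """Create (W, b) pairs whose shapes chain input -> hidden layer(s) -> output."""
    rng = np.random.default_rng(seed)
    sizes = [n_features, *hidden_sizes, n_outputs]
    return [(rng.normal(0.0, 0.1, size=(m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def forward(layers, X):
    """Feed a batch X of shape (n_samples, n_features) through the network."""
    a = X
    for i, (W, b) in enumerate(layers):
        z = a @ W + b
        # ReLU in the hidden layers; raw scores (one per output label) at the output layer
        a = np.maximum(z, 0.0) if i < len(layers) - 1 else z
    return a

# Example: 5 instances with 4 features each, 3 output labels, hidden layers of 8 and 6 neurons
X = np.random.default_rng(1).random((5, 4))
layers = init_mlp(n_features=X.shape[1], hidden_sizes=[8, 6], n_outputs=3)
print([W.shape for W, _ in layers])   # [(4, 8), (8, 6), (6, 3)]
print(forward(layers, X).shape)       # (5, 3)
```

Changing `hidden_sizes` alters the inner weight shapes but never the input or output dimensions.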

MLP training algorithm

A Multi-Layer Perceptron (MLP) neural network trained using the backpropagation learning algorithm is one of the most widely used forms of supervised neural network system.
The training of such a network involves three stages:
• feedforward of the input training pattern,
• calculation and backpropagation of the associated error
• adjustment of the weights
This procedure is repeated for each pattern over several complete passes (epochs)
through the training set. After training, application of the net only involves the
computations of the feedforward phase.
Multi-layer perceptrons working

Let's start with an example. Imagine a group of 7-year-old students who are working on a math problem, and each of them can only do arithmetic with two numbers. If you give them an equation like 5 x 3 + 2 x 4 + 8 x 2, how can they solve it?
To solve this problem, we can break it down into smaller parts and give them to the students. One student can solve the first part of the equation, "5 x 3 = 15", another student can solve the second part, "2 x 4 = 8", and a third student can solve the third part, "8 x 2 = 16".
Finally, the problem simplifies to 15 + 8 + 16. In the same way, one student in the group can solve "15 + 8 = 23" and another can solve "23 + 16 = 39", which is the answer. So here we break the large math problem into sections and give them to students who each do only very simple calculations, but as a result of teamwork, they solve the problem efficiently.

(Example for working of MLP)

Just as we broke the equation into smaller parts and gave each student a specific section to solve, in an MLP the input data is passed through different layers of interconnected neurons, each layer solving a specific part of the problem. And just as the students combined their answers to get the final solution, the output of each neuron is passed on to the next layer until the final output, the solution to the complex problem, is produced.
This is only a simple analogy to help you visualize how neural networks work. Neural networks are versatile and are used for many kinds of problems, not just math problems.

Applications of Multi-layer Perceptron

Multi-layer perceptrons have been used in a wide variety of applications. Some of the most
common applications of MLPs include:

• Image recognition: MLPs can be trained to recognize patterns in images and classify them into different categories. This is useful in applications such as facial recognition, object detection, and image segmentation.
• Natural Language Processing (NLP): MLPs can be used to understand and generate human language. This is useful in applications such as text-to-speech, machine translation, and sentiment analysis.
• Predictive modeling: MLPs can be used to make predictions based on past data. This is useful in applications such as stock market prediction, weather forecasting, and fraud detection.
• Medical diagnosis: MLPs can be used to diagnose diseases or interpret medical images by recognizing patterns in the data.
Backpropagation Learning Algorithm

• BACKPROPAGATION: The BACKPROPAGATION algorithm learns the weights for a multilayer network, given a network with a fixed set of units and interconnections. It employs gradient descent to attempt to minimize the squared error between the network output values and the target values for these outputs.
• In the BACKPROPAGATION algorithm, we consider networks with multiple output units rather than single units as before, so we redefine E to sum the errors over all of the network output units:

$$E(\vec{w}) \equiv \frac{1}{2} \sum_{d \in D} \sum_{k \in outputs} (t_{kd} - o_{kd})^2$$

where,
• outputs is the set of output units in the network
• tkd and okd are the target and output values associated with the kth output unit and training example d
• d is a training example, and D is the set of training examples
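Feed Forward phase:

$$X_i = \text{input}[i]$$
$$Y_j = f\Big(b_j + \sum_i X_i W_{ij}\Big)$$
$$Z_k = f\Big(b_k + \sum_j Y_j W_{jk}\Big)$$

Backpropagation of errors:

$$\delta_k = Z_k (1 - Z_k)(d_k - Z_k)$$
$$\delta_j = Y_j (1 - Y_j) \sum_k \delta_k W_{jk}$$

Weight updating:

$$W_{jk}(t+1) = W_{jk}(t) + \eta\,\delta_k Y_j + \alpha\,[W_{jk}(t) - W_{jk}(t-1)]$$
$$b_k(t+1) = b_k(t) + \eta\,\delta_k Y_{tn} + \alpha\,[b_k(t) - b_k(t-1)]$$
$$W_{ij}(t+1) = W_{ij}(t) + \eta\,\delta_j X_i + \alpha\,[W_{ij}(t) - W_{ij}(t-1)]$$
$$b_j(t+1) = b_j(t) + \eta\,\delta_j X_{tn} + \alpha\,[b_j(t) - b_j(t-1)]$$

where η is the learning rate, α is the momentum coefficient, f is the sigmoid activation (its derivative f(1 - f) gives the Zk(1 - Zk) and Yj(1 - Yj) factors), and Ytn = Xtn = 1 are the outputs of the true (bias) neurons.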
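The three phases can be sketched in NumPy for one hidden layer of sigmoid units, updating pattern by pattern with learning rate η (`eta`) and momentum α (`alpha`). The XOR data, random initialization, η = 0.5, α = 0.5 and 2000 epochs are illustrative assumptions, not values from the notes. Because each bias comes from a true neuron whose output is 1, the bias increments reduce to the deltas themselves.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def forward(X, W_ij, b_j, W_jk, b_k):
    Y = sigmoid(b_j + X @ W_ij)   # hidden layer: Y_j = f(b_j + sum_i X_i W_ij)
    Z = sigmoid(b_k + Y @ W_jk)   # output layer: Z_k = f(b_k + sum_j Y_j W_jk)
    return Y, Z

def error(Z, D):
    return 0.5 * np.sum((D - Z) ** 2)   # E = 1/2 * sum over patterns and outputs of (d_k - Z_k)^2

def weight_increments(X, D, params):
    """Backpropagate the deltas and return -dE/dparam for W_ij, b_j, W_jk, b_k."""
    W_ij, b_j, W_jk, b_k = params
    Y, Z = forward(X, *params)
    delta_k = Z * (1.0 - Z) * (D - Z)             # output deltas
    delta_j = Y * (1.0 - Y) * (delta_k @ W_jk.T)  # hidden deltas: Y_j(1 - Y_j) sum_k delta_k W_jk
    return [X.T @ delta_j, delta_j.sum(axis=0), Y.T @ delta_k, delta_k.sum(axis=0)]

def train(X, D, n_hidden=3, eta=0.5, alpha=0.5, epochs=2000, seed=0):
    """Pattern-by-pattern training with learning rate eta and momentum alpha."""
    rng = np.random.default_rng(seed)
    params = [rng.normal(0.0, 1.0, (X.shape[1], n_hidden)), np.zeros(n_hidden),
              rng.normal(0.0, 1.0, (n_hidden, D.shape[1])), np.zeros(D.shape[1])]
    previous = [np.zeros_like(p) for p in params]   # W(t) - W(t-1) for the momentum term
    history = []
    for _ in range(epochs):
        for p in range(X.shape[0]):
            incs = weight_increments(X[p:p + 1], D[p:p + 1], params)
            steps = [eta * g + alpha * prev for g, prev in zip(incs, previous)]
            params = [w + s for w, s in zip(params, steps)]
            previous = steps
        history.append(error(forward(X, *params)[1], D))   # error over all patterns after the epoch
    return params, history

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
D = np.array([[0], [1], [1], [0]], dtype=float)   # XOR targets
params, history = train(X, D)
print(round(history[0], 4), round(history[-1], 4))
```

After training, only `forward` is needed to apply the network, matching the note that application involves only the feedforward phase.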

Test stopping condition
After each epoch of training, the root mean square (RMS) error of the network over all of the patterns in a separate validation set is calculated:

$$E_{RMS} = \sqrt{\frac{1}{n \cdot k} \sum_{\text{patterns}} \sum_{\text{output units}} (d_k - Z_k)^2}$$

• n is the number of patterns in the set
• k is the number of neuron units in the output layer
Training is terminated when the ERMS value for the validation set either starts to increase or remains constant over several epochs.
This prevents the network from being overtrained (i.e., memorising the training set) and helps ensure that the network's ability to generalise (i.e., correctly classify non-trained patterns) is close to its best.
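The ERMS formula and the stopping rule can be sketched as two small helpers. `should_stop`, `patience` and `tol` are hypothetical names and defaults: the notes only say training stops when validation ERMS increases or stays constant "over several epochs".

```python
import numpy as np

def erms(D, Z):
    """Root-mean-square error over n patterns (rows) and k output units (columns)."""
    n, k = D.shape
    return float(np.sqrt(np.sum((D - Z) ** 2) / (n * k)))

def should_stop(val_history, patience=3, tol=1e-4):
    """True if none of the last `patience` validation ERMS values improved on the
    best earlier value by more than `tol` (i.e. ERMS rose or stayed flat)."""
    if len(val_history) <= patience:
        return False
    best_before = min(val_history[:-patience])
    return all(e > best_before - tol for e in val_history[-patience:])

print(erms(np.array([[1.0, 0.0], [0.0, 1.0]]), np.zeros((2, 2))))   # sqrt(2 / 4), about 0.7071
print(should_stop([0.50, 0.40, 0.30, 0.20, 0.10]))                   # False: still improving
print(should_stop([0.50, 0.30, 0.31, 0.32, 0.33]))                   # True: rising for 3 epochs
```

In a training loop, `erms` would be computed on the validation set after each epoch and appended to `val_history` before calling `should_stop`.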
Need for Backpropagation:

Backpropagation (short for "backward propagation of errors") is very useful for training neural networks. It is fast, simple, and easy to implement. Apart from the network architecture and a few training settings such as the learning rate, it has no parameters that must be tuned specifically for it. Backpropagation is a flexible method because it requires no prior knowledge about the function the network is to learn.

Types of Backpropagation:

There are two types of backpropagation networks.

• Static backpropagation: a network designed to map static inputs to static outputs. These networks can solve static classification problems such as OCR (Optical Character Recognition).
• Recurrent backpropagation: another network, used for fixed-point learning. Activation in recurrent backpropagation is fed forward until a fixed value is reached. Static backpropagation provides an instant mapping, while recurrent backpropagation does not.

Advantages:

• It is simple, fast, and easy to program.
• It has few parameters to tune beyond the network inputs and the learning rate.
• It is flexible and efficient.
• Users do not need to learn any special functions.

Disadvantages:

• It is sensitive to noisy data and irregularities. Noisy data can lead to inaccurate results.
• Performance is highly dependent on the input data.
• Training can take a long time.
• A matrix-based approach is preferred over a mini-batch approach.

Regularization in Machine Learning

Overfitting is a phenomenon that occurs when a machine learning model fits the training set too closely and is not able to perform well on unseen data.
Regularization is a technique used to reduce this error by fitting the function appropriately to the given training set, avoiding overfitting.
The commonly used regularization techniques are:

1. L1 regularization
2. L2 regularization
3. Dropout regularization

A regression model that uses the L1 regularization technique is called LASSO (Least Absolute Shrinkage and Selection Operator) regression.
A regression model that uses the L2 regularization technique is called Ridge regression.

Lasso regression adds the "absolute value of magnitude" of the coefficients as a penalty term to the loss function (L).

Ridge regression adds the "squared magnitude" of the coefficients as a penalty term to the loss function (L).

NOTE that during regularization the output function (y_hat) does not change. The change is only in the loss function.
The output function:

$$\hat{y} = \sigma(wx + b)$$

The loss function before regularization:

$$L = L(\hat{y}, y)$$

The loss function after regularization:

$$L = L(\hat{y}, y) + \lambda\, R(w)$$

where R(w) = ||w||1 for L1 and R(w) = ||w||2^2 for L2 regularization.

We define the loss function in logistic regression as:

$$L(\hat{y}, y) = -\big[y \log \hat{y} + (1 - y)\log(1 - \hat{y})\big]$$

Loss function with no regularization:

$$L = -\big[y \log \sigma(wx + b) + (1 - y)\log(1 - \sigma(wx + b))\big]$$

Suppose the model overfits the data with this loss function.
Loss function with L1 regularization:

$$L = -\big[y \log \sigma(wx + b) + (1 - y)\log(1 - \sigma(wx + b))\big] + \lambda \lVert w \rVert_1$$

Loss function with L2 regularization:

$$L = -\big[y \log \sigma(wx + b) + (1 - y)\log(1 - \sigma(wx + b))\big] + \lambda \lVert w \rVert_2^2$$

Here σ is the sigmoid function, and lambda (λ) is a hyperparameter known as the regularization constant, which is greater than zero:

λ > 0
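A NumPy sketch of the three loss functions follows, assuming the loss is averaged over the training examples and the bias b is not penalized (both common conventions, not stated in the notes). It also shows that the output function `predict` is the same with or without regularization; the data and weights are made-up illustration values.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def predict(w, b, X):
    return sigmoid(X @ w + b)   # output function y_hat: unchanged by regularization

def loss(w, b, X, y, lam=0.0, penalty=None):
    """Mean logistic (cross-entropy) loss, optionally plus lam*||w||_1 or lam*||w||_2^2."""
    y_hat = np.clip(predict(w, b, X), 1e-12, 1 - 1e-12)   # avoid log(0)
    data_term = -np.mean(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))
    if penalty == "l1":
        return data_term + lam * np.sum(np.abs(w))   # L1 (LASSO-style) penalty
    if penalty == "l2":
        return data_term + lam * np.sum(w ** 2)      # L2 (Ridge-style) penalty
    return data_term

X = np.array([[0.5, 1.0], [1.5, -0.5], [-1.0, 2.0], [2.0, 0.0]])
y = np.array([1, 0, 1, 0])
w, b, lam = np.array([1.0, -2.0]), 0.1, 0.1
print(loss(w, b, X, y), loss(w, b, X, y, lam, "l1"), loss(w, b, X, y, lam, "l2"))
```

Because the penalty depends only on w, minimizing the regularized loss pushes weights toward zero (L1 can make some exactly zero), while predictions are still computed by the same `predict`.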
Dropout Regularization: Dropout regularization is a technique that randomly drops a number of neurons in a neural network during model training. This means the contribution of the dropped neurons is temporarily removed: for that training step they do not contribute to the output and their weights are not updated. The image below shows how dropout regularization works:

In the image above, the neural network on the left shows an original neural network where all neurons are activated and working.
On the right, the red neurons have been removed from the neural network. Therefore, the red neurons will not be considered during that training step.

We will implement this concept practically using TensorFlow.

How will dropout help with overfitting?

Dropout regularization will ensure the following:

• Neurons can't rely on any one input, because it might be dropped out at random. This reduces over-reliance on a single input (co-adaptation), which is a major cause of overfitting.
• Neurons will not learn redundant details of the inputs. This ensures that only important information is stored by the neurons, which enables the neural network to gain useful knowledge that it uses to make predictions.
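Independent of any framework, dropout on one layer's activations can be sketched in NumPy as follows. It uses "inverted" dropout, the convention Keras's `Dropout` layer follows: kept activations are scaled by 1 / (1 - rate) during training, so nothing needs rescaling at inference. The function name `dropout` is hypothetical.

```python
import numpy as np

def dropout(activations, rate, training, rng):
    """Inverted dropout: zero each unit with probability `rate` during training and
    scale the survivors by 1 / (1 - rate) so the expected activation is unchanged."""
    if not training or rate == 0.0:
        return activations                              # at inference time every neuron is used
    keep_prob = 1.0 - rate
    mask = rng.random(activations.shape) < keep_prob    # True = neuron kept for this step
    return activations * mask / keep_prob

rng = np.random.default_rng(0)
hidden = np.ones((2, 8))                                # stand-in hidden-layer activations
print(dropout(hidden, rate=0.25, training=True, rng=rng))
```

A fresh mask is drawn on every call, so a different subset of neurons is dropped at each training step.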

An unregularized network overfits quickly on the training dataset. Note how the validation loss for the no-dropout run diverges dramatically after only a few epochs, which shows that the generalization error has grown.

Overfitting is reduced by training with two dropout layers and a dropout probability of 25%. However, this lowers training accuracy, so the regularised network needs to be trained for a longer period.

Dropout improves model generalisation. Although the training accuracy is lower than that of the unregularized network, the overall validation accuracy has improved, which shows that the generalization error has decreased.

2. Hyperparameter tuning

A machine learning model is defined as a mathematical model with a number of parameters that need to be learned from the data. By training a model with existing data, we are able to fit the model parameters.
However, there is another kind of parameter, known as hyperparameters, that cannot be directly learned from the regular training process. They are usually fixed before the actual training process begins. These parameters express important properties of the model, such as its complexity or how fast it should learn.
Some examples of model hyperparameters include:
1. The penalty in a logistic regression classifier, i.e. L1 or L2 regularization.
2. The learning rate for training a neural network.
3. The C and sigma hyperparameters for support vector machines.
4. The k in k-nearest neighbors.

Models can have many hyperparameters, and finding the best combination of hyperparameters can be treated as a search problem. Two common strategies for hyperparameter tuning are:
• GridSearchCV
• RandomizedSearchCV

GridSearchCV
In the GridSearchCV approach, the machine learning model is evaluated for every combination in a range of hyperparameter values. It is called GridSearchCV because it searches for the best set of hyperparameters in a grid of hyperparameter values, scoring each candidate with cross-validation (the "CV" in the name).
For example, suppose we want to tune two hyperparameters, C and Alpha, of a logistic regression classifier, each with its own set of candidate values. Grid search builds one version of the model for every possible combination of these values and returns the best one.
As in the original figure, for C = [0.1, 0.2, 0.3, 0.4, 0.5] and Alpha = [0.1, 0.2, 0.3, 0.4], the combination C = 0.3 and Alpha = 0.2 gives the highest performance score, 0.726, so it is selected.
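The exhaustive search described above can be sketched in plain Python. The scoring function here is hypothetical: it stands in for a cross-validated score and is deliberately constructed to peak at C = 0.3, Alpha = 0.2, mirroring the example rather than reproducing real results.

```python
from itertools import product

def toy_score(C, alpha):
    """Hypothetical stand-in for a cross-validated score; peaks at C=0.3, alpha=0.2."""
    return -((C - 0.3) ** 2 + (alpha - 0.2) ** 2)

def grid_search(grid, score_fn):
    """Evaluate every combination in `grid`; return (best_params, best_score, n_evaluated)."""
    names = list(grid)
    best_params, best_score, n_evaluated = None, float("-inf"), 0
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        score = score_fn(**params)
        n_evaluated += 1
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score, n_evaluated

grid = {"C": [0.1, 0.2, 0.3, 0.4, 0.5], "alpha": [0.1, 0.2, 0.3, 0.4]}
best_params, best_score, n_evaluated = grid_search(grid, toy_score)
print(best_params, n_evaluated)   # {'C': 0.3, 'alpha': 0.2} 20
```

Every one of the 5 × 4 = 20 combinations is scored, which is exactly what makes grid search expensive as more hyperparameters or candidate values are added.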

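Drawback: GridSearchCV evaluates every combination of hyperparameter values, so the number of model fits is the product of the number of candidate values for each hyperparameter, multiplied by the number of cross-validation folds. For the 5 × 4 grid above with 5-fold cross-validation, that is already 5 × 4 × 5 = 100 fits, which is why grid search quickly becomes computationally very expensive.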
RandomizedSearchCV
RandomizedSearchCV addresses this drawback by evaluating only a fixed number of hyperparameter settings (the n_iter argument in scikit-learn). Instead of visiting every point of the grid, it samples settings at random from the given lists or distributions and keeps the best one, which avoids much of the unnecessary computation.
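A random search can be sketched the same way. This is an illustrative sketch, not scikit-learn's implementation: it draws `n_iter` settings uniformly from lists of candidate values (with replacement, so a setting may repeat) and reuses a hypothetical toy scoring function as a stand-in for a cross-validated score.

```python
import random

def toy_score(C, alpha):
    """Hypothetical stand-in for a cross-validated score; peaks at C=0.3, alpha=0.2."""
    return -((C - 0.3) ** 2 + (alpha - 0.2) ** 2)

def random_search(space, score_fn, n_iter=10, seed=0):
    """Sample n_iter random settings from `space`; return (best_params, best_score, n_evaluated)."""
    rng = random.Random(seed)
    best_params, best_score, n_evaluated = None, float("-inf"), 0
    for _ in range(n_iter):
        params = {name: rng.choice(values) for name, values in space.items()}
        score = score_fn(**params)
        n_evaluated += 1
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score, n_evaluated

# 100 candidate values (0.01 to 1.00) for each hyperparameter
space = {"C": [round(0.01 * i, 2) for i in range(1, 101)],
         "alpha": [round(0.01 * i, 2) for i in range(1, 101)]}
best_params, best_score, n_evaluated = random_search(space, toy_score, n_iter=50, seed=0)
print(best_params, best_score, n_evaluated)
```

Here the space holds 100 × 100 = 10,000 combinations, yet only 50 are evaluated; the trade-off is that the exact best setting may never be sampled.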

The following code illustrates how to use GridSearchCV. It assumes a feature matrix X and a label vector y have already been loaded.

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

# X, y: feature matrix and labels, assumed to be loaded beforehand

# Creating the hyperparameter grid: 15 values of C, evenly spaced on a log scale from 1e-5 to 1e8
c_space = np.logspace(-5, 8, 15)
param_grid = {'C': c_space}

# Instantiating the logistic regression classifier
logreg = LogisticRegression()

# Instantiating the GridSearchCV object with 5-fold cross-validation
logreg_cv = GridSearchCV(logreg, param_grid, cv=5)
logreg_cv.fit(X, y)

# Print the tuned parameters and the best mean cross-validated score
print("Tuned Logistic Regression Parameters: {}".format(logreg_cv.best_params_))
print("Best score is {}".format(logreg_cv.best_score_))

Output:
Tuned Logistic Regression Parameters: {'C': 3.7275937203149381}
Best score is 0.7708333333333334
The following code illustrates how to use RandomizedSearchCV, again assuming X and y have already been loaded.

from scipy.stats import randint
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import RandomizedSearchCV

# Creating the hyperparameter distributions:
# lists are sampled uniformly; randint(1, 9) draws integers from 1 to 8
param_dist = {"max_depth": [3, None],
              "max_features": randint(1, 9),
              "min_samples_leaf": randint(1, 9),
              "criterion": ["gini", "entropy"]}

# Instantiating the decision tree classifier
tree = DecisionTreeClassifier()

# Instantiating the RandomizedSearchCV object (n_iter defaults to 10 sampled settings)
tree_cv = RandomizedSearchCV(tree, param_dist, cv=5)
tree_cv.fit(X, y)

# Print the tuned parameters and the best mean cross-validated score
print("Tuned Decision Tree Parameters: {}".format(tree_cv.best_params_))
print("Best score is {}".format(tree_cv.best_score_))

Output:
Tuned Decision Tree Parameters: {'min_samples_leaf': 5, 'max_depth': 3, 'max_features': 5, 'criterion': 'gini'}
Best score is 0.7265625
********************************************************************

