Machine Learning

Module 5

Classification models
Artificial Neural Network

■ Conventional computers use an algorithmic approach: the computer follows a set of instructions to solve a problem.
■ This restricts the problem-solving capacity of conventional computers to problems that we already understand and know how to solve.
■ Neural networks process information in a way similar to the human brain, and these networks actually learn from examples; you do not program them to perform a specific task.
■ They learn from past experience and examples, which is why you do not need to provide all the information regarding a specific task.

Artificial Neural Network
■ An ANN is a computational network based on the biological neural networks that make up the structure of the human brain.
■ Just as the human brain has neurons interconnected with each other, an ANN has neurons, known as nodes, linked to each other across the various layers of the network.
■ ANNs are computing systems designed to simulate the way the human brain analyzes and processes information.
■ They have self-learning capabilities that enable them to produce better results as more data becomes available.
■ So, if the network is trained on more data, it will be more accurate, because these neural networks learn from examples.
■ A neural network can be configured for specific applications such as data classification, pattern recognition, etc.

Motivation behind Neural Network:

• The neural network is based on neurons, which model brain cells.
• A biological neuron receives input from other sources, combines the inputs in some way, performs a nonlinear operation on the result, and produces the final result as output.

Motivation behind Neural Network:

• The dendrites act as receivers that pick up signals from other neurons, which are then passed on to the cell body.
• The cell body performs operations on the inputs, such as summation or multiplication.
• After the operations are performed on the set of inputs, the result is transferred to the next neuron via the axon, which is the transmitter of the neuron's signal.
The architecture of an ANN:

The architecture of an artificial neural network:
An artificial neural network primarily consists of three layers:

1. Input layer:
It accepts inputs in several different formats provided by the programmer.
2. Hidden layer:
The hidden layer lies between the input and output layers. It performs all the calculations to find hidden features and patterns.
3. Output layer:
The input goes through a series of transformations in the hidden layer, finally producing the output that is conveyed through this layer.

Biological Neural Network vs. Artificial Neural Network

• Dendrites represent inputs
• The cell nucleus represents nodes
• Synapses represent weights
• The axon represents output
Importance of Neural Network:
■ Without a neural network: Consider the example given below. Without a neural network, our machine would not identify the cat in the picture; it would simply get confused trying to figure out where the cat is.

Importance of Neural Network:
■ With a neural network: Even if we have not trained our machine on that particular cat, it can still identify features of cats that it was trained on, match those features against the cat in that particular image, and identify the cat.

Working of ANN:
■ The basic unit of a neural network is the perceptron.
■ A perceptron can be defined as a single-layer neural network that classifies linearly separable data.
■ It consists of four major components, which are as follows:

1. Inputs
2. Weights and bias
3. Summation function
4. Activation or transformation function

Working of ANN:

Working of ANN:
■ Weights and bias
When an input variable is fed into the network, a random value is assigned as the weight of that input, such that each individual weight represents the importance of that input for making a correct prediction. The bias helps adjust the curve of the activation function so as to achieve a precise output.
■ Summation function
After the weights are assigned to the inputs, the product of each input and its weight is computed. The summation function then calculates the weighted sum by adding up all of these products, as in the sketch below.
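A minimal sketch of the summation step in plain Python; the function name and the numbers are illustrative, not from the slides:

    # Weighted sum for a single perceptron: z = X1.W1 + X2.W2 + ... + Xn.Wn + bias
    def weighted_sum(inputs, weights, bias):
        return sum(x * w for x, w in zip(inputs, weights)) + bias

    # Three illustrative inputs with random-looking initial weights and a bias
    z = weighted_sum([1.0, 0.5, -1.0], [0.2, -0.4, 0.1], bias=0.3)
    print(z)  # approximately 0.2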

Working of ANN:
■ Activation function
The main objective of the activation function is to map the weighted sum onto the output.
Activation functions fall into two main categories:
1. Linear activation functions
2. Non-linear activation functions

Working of ANN:
■ Linear activation function
With a linear activation function, the output is not restricted to any range; it runs from -infinity to infinity.
For each individual neuron, the inputs are multiplied by the respective weights, creating an output signal proportional to the input.
If all the layers are linear in nature, then the final activation of the last layer is simply a linear function of the initial layer's input.

Working of ANN:
■ Linear Activation Function Example

Working of ANN:
■ Non-linear activation function:

Working of ANN:
■ Non-linear activation functions are further divided into the following types:
1. Sigmoid or logistic activation function
It has an output range between 0 and 1, which helps normalize each neuron's output.
Even a small change in X can bring a large change in Y.
Because its value ranges between 0 and 1, it is highly preferred for binary classification, whose result is either 0 or 1; a minimal sketch follows.
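A quick illustration of the logistic (sigmoid) function in Python; the function name and sample values are ours:

    import math

    def sigmoid(z):
        # Squashes any real number into the range (0, 1)
        return 1.0 / (1.0 + math.exp(-z))

    print(sigmoid(0.0))  # 0.5
    print(sigmoid(2.0))  # approximately 0.88 - the output moves quickly near z = 0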

Working of ANN:
1. Sigmoid or Logistic Activation Function

Working of ANN:
2. Tanh or hyperbolic tangent activation function
The tanh activation function works much better than the sigmoid function; it can be viewed as a rescaled version of the sigmoid activation function.
Since its values range between -1 and 1, it is commonly used in the hidden layers of a neural network, and its zero-centred output makes the learning process easier, as the sketch below suggests.
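A minimal illustration using Python's built-in tanh; the sample values are ours:

    import math

    # tanh squashes its input into (-1, 1); the zero-centred output is
    # what makes it convenient for hidden layers
    for z in (-2.0, 0.0, 2.0):
        print(z, math.tanh(z))  # approximately -0.964, 0.0, 0.964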

Working of ANN:
2. Tanh or Hyperbolic Tangent Activation Function

Working of ANN:
3. ReLU (Rectified Linear Unit) activation function

ReLU is one of the most widely used activation functions for the hidden layers of a neural network.
Its value ranges from 0 to infinity. It helps with the vanishing gradient problem encountered during backpropagation.
It is computationally cheaper than both the sigmoid and the tanh activation functions.
It allows only some of the neurons to be activated at a particular instant, which leads to effective as well as easier computation; see the sketch below.
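A minimal sketch of ReLU in Python; the function name and sample values are ours:

    def relu(z):
        # Passes positive values through unchanged and clamps negatives
        # to 0, so the output ranges over [0, infinity)
        return max(0.0, z)

    print(relu(3.5))   # 3.5
    print(relu(-2.0))  # 0.0 - the neuron stays inactive for negative input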

Working of ANN:
3. ReLU(Rectified Linear Unit) Activation Function

Working of ANN:
4. Softmax function
It is a generalization of the sigmoid function used for solving classification problems.
It is mainly used to handle multiple classes: it squeezes the output of each class to between 0 and 1 by dividing each exponentiated score by the sum of the outputs.
This kind of function is typically used by the classifier in the output layer, as sketched below.
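A minimal sketch of softmax in Python; the function name and scores are ours:

    import math

    def softmax(scores):
        # Exponentiate each class score, then divide by the sum so the
        # outputs lie in (0, 1) and add up to 1 - one value per class
        exps = [math.exp(s) for s in scores]
        total = sum(exps)
        return [e / total for e in exps]

    print(softmax([2.0, 1.0, 0.1]))  # approximately [0.659, 0.242, 0.099]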

McCulloch-Pitts Model of a Neuron

❑ The McCulloch-Pitts model of a neuron is the earliest logical simulation of a biological neuron.
❑ It was developed by Warren McCulloch and Walter Pitts in 1943,
❑ hence the name McCulloch-Pitts model.

McCulloch-Pitts Model Architecture

McCulloch-Pitts Model Architecture
■ The McCulloch-Pitts model of a neuron is a fairly simple model consisting of some number (n) of binary inputs, each with an associated weight.
■ An input is known as an 'inhibitory input' if its associated weight is negative, and as an 'excitatory input' if its associated weight is positive.
■ As the inputs are binary, each can take either of 2 values, 0 or 1.
■ A summation junction aggregates all the weighted inputs and passes the result to the activation function.
■ The activation function is a threshold function that outputs 1 if the sum of the weighted inputs is equal to or above the threshold value, and 0 otherwise.
McCulloch-Pitts Model Architecture

■ Say we have n inputs = {X1, X2, X3, ..., Xn}

■ and n corresponding weights = {W1, W2, W3, ..., Wn}.

■ The summation of the weighted inputs is
X.W = X1.W1 + X2.W2 + X3.W3 + ... + Xn.Wn

■ If X.W ≥ θ (the threshold value)
Output = 1
Else
Output = 0
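A minimal sketch of this rule in Python; the function and variable names are ours:

    def mcculloch_pitts(inputs, weights, threshold):
        # Aggregate the weighted binary inputs; fire (1) when the sum
        # reaches the threshold, stay silent (0) otherwise
        weighted = sum(x * w for x, w in zip(inputs, weights))
        return 1 if weighted >= threshold else 0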

McCulloch-Pitts Model Example:

■ A bank wants to decide whether it can sanction a loan. There are 2 parameters for the decision: salary and credit score.
■ So there are 4 scenarios to assess:
1. High salary and good credit score
2. High salary and bad credit score
3. Low salary and good credit score
4. Low salary and bad credit score
■ Let X1 = 1 denote high salary,
X1 = 0 denote low salary,
X2 = 1 denote good credit score,
X2 = 0 denote bad credit score.
▪ Let the threshold value be 2.
▪ Let W1 = 1, W2 = 1.
McCulloch-Pitts Model Example:
■ The truth table is as follows (verified by the sketch below):

X1   X2   W1.X1 + W2.X2   Loan approved
1    1    2               1
1    0    1               0
0    1    1               0
0    0    0               0

■ If W1.X1 + W2.X2 ≥ 2
Loan approved = 1
Else
Loan approved = 0
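A short self-contained sketch that reproduces the truth table above, using the weights and threshold from the example:

    # W1 = W2 = 1 and threshold = 2, as in the example
    for x1, x2 in [(1, 1), (1, 0), (0, 1), (0, 0)]:
        weighted = 1 * x1 + 1 * x2
        approved = 1 if weighted >= 2 else 0
        print(x1, x2, weighted, approved)
    # Only the (1, 1) case - high salary AND good credit score - is approved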
McCulloch-Pitts Model:

■ Question No. 1: Implement the AND function using a McCulloch-Pitts neuron.
■ Question No. 2: Implement the OR function using a McCulloch-Pitts neuron.

What are linearly separable and non-linearly separable data?
■ If you can draw a line that separates the points into two classes, then the data is linearly separable.
■ If not, the data is termed non-linearly separable.
■ For example:
the AND and OR problems are linearly separable;
the XOR problem is not linearly separable.

Limitations of Perceptron Model
■ A perceptron model has the following limitations:
• The output of a perceptron can only be a binary number (0 or 1), due to the hard-limit transfer function.
• A perceptron can only classify linearly separable sets of input vectors.
• If the input vectors are not linearly separable, it is not easy to classify them properly.

What is a Multilayer Perceptron Neural Network?
■ It is an artificial neural network in which all nodes are interconnected with the nodes of adjacent layers.
■ The multilayer perceptron (MLP) neural network works only in the forward direction.
■ All nodes are fully connected in the network.
■ Each node passes its value to the nodes of the next layer, only in the forward direction.
■ The MLP neural network uses the backpropagation algorithm to improve the accuracy of the trained model.

What is a Multilayer Perceptron Neural Network?

Structure of a Multilayer Perceptron Neural Network
■ Input layer
It is the initial, or starting, layer of the multilayer perceptron.
It takes input from the training data set and forwards it to the hidden layer.
There are n input nodes in the input layer; the number of input nodes depends on the number of dataset features.
Each input vector variable is distributed to each of the nodes of the hidden layer.

Structure of a Multilayer Perceptron Neural Network
■ Hidden layer
It is the heart of every artificial neural network and comprises all the computations of the network.
The edges of the hidden layer have weights that are multiplied by the node values.
This layer applies the activation function.
There can be one or two hidden layers in the model.
The number of hidden layer nodes should be chosen carefully, as too few nodes in the hidden layer make the model unable to work efficiently with complex data.
Structure of a Multilayer Perceptron Neural Network
■ Output layer
This layer gives the estimated output of the neural network.
The number of nodes in the output layer depends on the type of problem:
for a single target variable, use one node;
for an N-class classification problem, the ANN uses N nodes in the output layer.

Working of a Multilayer Perceptron Neural Network
Each input node represents a feature of the dataset and passes its vector input value to the hidden layer.
In the hidden layer, each edge has a weight that is multiplied by the input variable. All the product values arriving at a hidden node are summed together to generate its output.
The activation function is used in the hidden layer to identify the active nodes.
The output is passed to the output layer, and the difference between the predicted and actual output is calculated there.
The model uses backpropagation after calculating the predicted output; the forward pass itself is sketched below.
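A minimal sketch of the forward pass in Python; the layer sizes, weights, and use of the sigmoid activation are illustrative assumptions, not values from the slides:

    import math

    def sigmoid(z):
        return 1.0 / (1.0 + math.exp(-z))

    def layer_forward(inputs, weights, biases):
        # weights[j][i] is the weight on the edge from input i to node j;
        # each node sums its weighted inputs, adds its bias, and applies
        # the activation function
        return [sigmoid(sum(w * x for w, x in zip(node_w, inputs)) + b)
                for node_w, b in zip(weights, biases)]

    # Two inputs -> two hidden nodes -> one output node
    x = [0.5, 0.9]
    hidden = layer_forward(x, [[0.4, -0.2], [0.3, 0.8]], [0.1, 0.1])
    output = layer_forward(hidden, [[0.6, -0.5]], [0.2])
    print(output)  # the network's predicted output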
The backpropagation algorithm
■ Backpropagation is the essence of neural network training. It is the method of fine-tuning the weights of a neural network based on the error rate obtained in the previous epoch (i.e., iteration). Proper tuning of the weights reduces error rates and makes the model more reliable by improving its generalization.
■ "Backpropagation" in neural networks is short for "backward propagation of errors." It is a standard method of training artificial neural networks.

How the Backpropagation Algorithm Works

How the Backpropagation Algorithm Works
1. Inputs X arrive through the preconnected path.
2. The input is modeled using real weights W. The weights are usually selected randomly.
3. Calculate the output of every neuron, from the input layer through the hidden layers to the output layer.
4. Calculate the error in the outputs:
Error = Actual output - Desired output
5. Travel back from the output layer to the hidden layer and adjust the weights so that the error decreases.

The backpropagation algorithm
1. Initially, the weights are assigned at random.
2. The algorithm then iterates through many cycles of two processes until a stopping criterion is reached.
Each cycle is known as an epoch.
Each epoch includes:
(a) A forward phase, in which the neurons are activated in sequence from the input layer to the output layer, applying each neuron's weights and activation function along the way. Upon reaching the final layer, an output signal is produced.

The backpropagation algorithm

(b) A backward phase, in which the network's output signal resulting from the forward phase is compared to the true target value in the training data.

The difference between the network's output signal and the true value is an error that is propagated backwards in the network to modify the connection weights between neurons and reduce future errors.

The backpropagation algorithm
3. The technique used to determine how much a weight should be changed is known as the gradient descent method. At every stage of the computation, the error is a function of the weights.
If we plot the error against the weights, we get a higher-dimensional analog of something like a curve or surface.
At any point on this surface, the gradient suggests how steeply the error will be reduced or increased for a change in the weight.
The algorithm attempts to change the weights in the way that results in the greatest reduction in error (see figure). The basic update step is sketched below.
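A minimal sketch of a single gradient descent step in Python; the function name, the gradient value, and the learning rate are illustrative:

    # Move the weight a small step against the gradient of the error
    def update_weight(w, grad, eta=0.5):
        return w - eta * grad

    print(update_weight(0.40, grad=0.08))  # 0.40 - 0.5 * 0.08 = approx. 0.36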

The backpropagation algorithm

Example:

■ Consider a small network with two inputs, two outputs, and one hidden layer, as shown in the figure.

Example:

■ We assume that there are two observations:
■ We are required to estimate the optimal values of the weights w1, . . . , w8, b1, b2.
■ Here b1 and b2 are the biases.
■ For simplicity, we have assigned the same bias to both nodes in the same layer.
Example:

■ Step 1. We initialize the connection weights to small random values. These initial weights are shown in the figure.

Example:

■ Step 2. Present the first sample inputs and the corresponding output targets to the network.
■ Step 3. Pass the input values to the first layer (the layer with nodes h1 and h2).
■ Step 4. We calculate the outputs from h1 and h2.

Example:

We use the logistic activation function.

Example:

■ Step 5. We repeat this process for every layer. We get the outputs from the nodes in the output layer as follows:

Example:

■ The sum of the squares of the output errors is given by

Example:
Step 6.
■ We begin the backward phase by adjusting the weights.
■ We first adjust the weights leading to the nodes o1 and o2 in the output layer, and then the weights leading to the nodes h1 and h2 in the hidden layer.
■ The adjusted values of the weights w1, . . . , w8, b1, . . . , b4 are denoted by w1+, . . . , w8+, b1+, . . . , b4+.
■ The computations use a certain constant η called the learning rate.
■ In the following, we have taken η = 0.5.

Example:
Step 6.
a) Computation of adjusted weights leading to o1 and o2:

Example:
Step 6.
a) Computation of adjusted weights leading to o1 and o2:

Example:
Step 6.
b) Computation of adjusted weights leading to h1 and h2:

Example:
Step 6.
b) Computation of adjusted weights leading to h1 and h2:

Example:
Step 7. Now we set the weights and biases to their adjusted values. We choose the next sample input and the corresponding output targets for the network and repeat Steps 2 to 6.

Step 8. The process in Step 7 is repeated until the root mean square of the output errors is minimized. A single forward-and-backward pass for a network of this shape is sketched below.
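A minimal Python sketch of one epoch (forward phase, squared error, and a delta-rule update for one output weight) for a two-input, two-hidden-node, two-output network like the one in the figures. The actual initial weights, inputs, and targets are in the figures, so the numbers below are purely illustrative:

    import math

    def sigmoid(z):
        return 1.0 / (1.0 + math.exp(-z))

    # Illustrative values - the actual ones are given in the figures
    w1, w2, w3, w4 = 0.15, 0.20, 0.25, 0.30   # weights into h1, h2
    w5, w6, w7, w8 = 0.40, 0.45, 0.50, 0.55   # weights into o1, o2
    b1, b2 = 0.35, 0.60                        # biases, one per layer
    x1, x2 = 0.05, 0.10                        # sample inputs
    t1, t2 = 0.01, 0.99                        # target outputs
    eta = 0.5                                  # learning rate, as in the slides

    # Forward phase
    h1 = sigmoid(w1 * x1 + w2 * x2 + b1)
    h2 = sigmoid(w3 * x1 + w4 * x2 + b1)
    o1 = sigmoid(w5 * h1 + w6 * h2 + b2)
    o2 = sigmoid(w7 * h1 + w8 * h2 + b2)
    error = 0.5 * (t1 - o1) ** 2 + 0.5 * (t2 - o2) ** 2

    # Backward phase: error term (delta) for output node o1, then the
    # adjusted weight w5+ by the delta rule
    delta_o1 = (o1 - t1) * o1 * (1 - o1)
    w5_plus = w5 - eta * delta_o1 * h1
    print(round(error, 4), round(w5_plus, 4))
    # The remaining output weights, and the hidden-layer weights, are
    # adjusted the same way using deltas propagated back from the outputs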

Delta rule:

In the above computations, the method used to calculate the adjusted weights is known as the delta rule, stated in general form below.
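In generic notation (ours, not from the slides), the delta rule changes each weight in proportion to the learning rate η, the error term δ of the node the connection feeds into, and the input x flowing along that connection:

Δw = η . δ . x,  so that  w+ = w + Δw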

