
UNIT 1: Introduction to Deep Learning

What is Machine Learning?

• Machine Learning allows computers to learn from experience on their own, using statistical methods to improve performance and predict outputs without being explicitly programmed.

What is a Biological Neuron?

• A human brain has about 100 billion neurons.
• Neurons are interconnected nerve cells in the human brain that are involved in processing and transmitting chemical and electrical signals.

• Basically, a neuron takes an input signal (dendrite), processes it like a CPU (soma), and passes the output through a cable-like structure to other connected neurons (axon to synapse to another neuron's dendrite).

• This is a simplification, as there is a lot more going on biologically, but at a high level this is what a neuron in our brain does: it takes an input, processes it, and produces an output.
Biological inspiration of Neural Networks:

• A neuron (nerve cell) is the basic building block of the nervous system. A human brain consists of billions of neurons that are interconnected with each other.

1. Neurons are responsible for receiving and sending signals from the brain.

2. Dendrites: Receive signals from the other neurons.

3. Soma: The soma is the core of a neuron. It is responsible for processing the information received from the dendrites.

4. Axon: The axon is like a cable through which the neuron sends the information. Towards its end, the axon splits up into many branches that make connections with other neurons through their dendrites.

5. Synapses: The connection between the axon and another neuron's dendrites is called a synapse. The transmission of signals to other neurons is carried out by the axon.
What is an Artificial Neuron?

• An artificial neuron is a mathematical function based on biological neurons, where each neuron takes inputs, weighs them separately, sums them up and passes this sum through a nonlinear function to produce an output.

• In the simplest models, the neuron takes inputs in the form of binary values, i.e. 1 or 0.

• The output of an artificial neuron is usually calculated by applying a threshold function to the sum of its input values.

• An Artificial Neural Network (ANN) is built from artificial neurons.
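
As an illustration (not part of the original slides; the weights, inputs and threshold below are made-up values), a minimal Python sketch of such a neuron:

```python
# A minimal artificial neuron: inputs are weighed separately,
# summed up, and passed through a threshold (step) function.

def artificial_neuron(inputs, weights, threshold):
    weighted_sum = sum(w * x for w, x in zip(weights, inputs))
    return 1 if weighted_sum >= threshold else 0

# Two binary inputs with hypothetical weights and threshold
print(artificial_neuron(inputs=[1, 0], weights=[0.6, 0.4], threshold=0.5))  # 1
print(artificial_neuron(inputs=[0, 1], weights=[0.6, 0.4], threshold=0.5))  # 0
```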


McCulloch-Pitts Neuron Model
• The first computational model of a neuron was proposed by Warren McCulloch (neuroscientist) and Walter Pitts (logician) in 1943.

• The McCulloch-Pitts neural model, which was the earliest ANN model, has only two types of inputs: excitatory and inhibitory.

• The excitatory inputs have weights of positive magnitude, and

• The inhibitory inputs have weights of negative magnitude.

• The inputs of the McCulloch-Pitts neuron can be either 0 or 1.

• It has a threshold function as its activation function.

• So, the output signal y = yout is 1 if the input ysum is greater than or equal to a given threshold value, else 0.

• Simple McCulloch-Pitts neurons can be used to design logical operations.

• The connection weights need to be correctly decided, along with the threshold value of the activation function.

• The neuron may be divided into 2 parts. The first part, g, takes an input and performs an aggregation (sum); based on the aggregated value, the second part, f, makes a decision.

• g aggregates the inputs, and

• function f takes a decision based on this aggregation (summation).


• Consider the example: John carries an umbrella if it is sunny or if it is raining. There are four given situations.

• I need to decide when John will carry the umbrella.

• The situations are as follows:

1. First scenario: It is neither raining nor sunny.
2. Second scenario: It is not raining, but it is sunny.
3. Third scenario: It is raining, but it is not sunny.
4. Fourth scenario: It is raining as well as sunny.

– To analyse the situations using the McCulloch-Pitts neural model, I can consider the input signals as follows:
• X1: Is it raining?
• X2: Is it sunny?

So, the value of both inputs can be either 0 or 1. We can set the weights on both X1 and X2 to 1 and the threshold value to 1.

So, the neural network model will look like:

Truth table for this case:

Scenario | x1 | x2 | ysum | yout
   1     |  0 |  0 |  0   |  0
   2     |  0 |  1 |  1   |  1
   3     |  1 |  0 |  1   |  1
   4     |  1 |  1 |  2   |  1

• So, we can say that yout = 1 if ysum ≥ θ, else yout = 0.

• Here θ = 1, and this rule is called thresholding logic.

• From the truth table, I can conclude that in the situations where the value of yout is 1, John needs to carry an umbrella.

• Hence, he will need to carry an umbrella in scenarios 2, 3 and 4.

• We can see that ysum is just doing a sum of the inputs (a simple aggregation), and θ (theta) here is called the thresholding parameter.

• For example, John needs to carry an umbrella when the sum turns out to be 1 or more, so theta is 1 here.

• This is called thresholding logic.
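
In code, the g/f decomposition and the umbrella example look like this (a sketch using the weights of 1 and threshold θ = 1 chosen above):

```python
# McCulloch-Pitts neuron for the umbrella example:
# g aggregates the inputs; f thresholds the aggregate at theta = 1.

def g(x1, x2):
    return x1 + x2                      # simple aggregation (sum)

def f(y_sum, theta=1):
    return 1 if y_sum >= theta else 0   # thresholding logic

print("x1 x2 ysum yout")
for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, g(x1, x2), f(g(x1, x2)))
# Matches the truth table: yout = 1 in scenarios 2, 3 and 4.
```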


Let us implement some boolean functions using this McCulloch & Pitts (MP) neuron.
• Can any boolean function be represented using a McCulloch-Pitts unit?
• A single McCulloch-Pitts neuron can be used to represent boolean functions which are linearly separable.

• Linear separability (for boolean functions) means: there exists a line (plane) such that all inputs which produce an output of 1 lie on one side of the line (plane) and all inputs which produce an output of 0 lie on the other side.
Difference between MP Neuron and Perceptron

Sr. No. | MP Neuron | Perceptron
01 | It works on linearly separable data. | It also works on linearly separable data.
02 | The MP Neuron model only accepts boolean inputs (0 or 1). | The Perceptron model can process any real-valued inputs.
03 | Inputs are not weighted individually; all excitatory inputs share a fixed weight. | A weight is assigned to each input node of a perceptron and is learned from the data.
04 | While using this model we can adjust only the threshold value to make the model fit our dataset. | While using this model the weights as well as the bias (threshold) are adjusted to make the model fit our dataset.
What is a Perceptron?

• A perceptron is a building block of an Artificial Neural Network.

• A perceptron is an artificial neuron, and

• It performs computations to detect features or patterns in the input data.

• It allows artificial neurons to learn and process features in a dataset.

• Frank Rosenblatt, an American psychologist, invented the classical perceptron model (1958), a more general computational model than McCulloch-Pitts neurons.
Basic Components of Perceptron:
1. Input Values: A set of values or a dataset for predicting the output value. They are also described as the dataset's features.
2. Weights: The real value attached to each feature is known as its weight. It tells the importance of that feature in predicting the final value.
3. Bias: A bias term is often included in the perceptron model; it is added to the input layer to provide the perceptron with additional flexibility in modeling complex patterns in the input data. The bias allows the model to make adjustments that are independent of the input.
4. Activation function: The activation function determines whether the neuron will fire or not. At its simplest, the activation function is a step function, but based on the scenario, different activation functions can be used.
5. Summation Function: The summation function binds the weights and inputs together. It is a function to find their sum.
6. Output: The output of the perceptron is a single binary value, either 0 or 1, which indicates the class or category to which the input data belongs.
• A perceptron is an artificial neuron, essential to deep learning neural networks.

How does a Perceptron work?

• A weight is assigned to each input node of a perceptron, indicating the significance of that input to the output.

• The perceptron's output z is a weighted sum of the inputs that is run through an activation function to decide whether or not the perceptron will fire. It computes the weighted sum of its inputs as:

z = w1*x1 + w2*x2 + ... + wn*xn = x^T w

• The activation function perceptrons use most frequently is the step function, which compares this weighted sum to a threshold: it outputs 1 if the input is larger than the threshold value and 0 otherwise.

• The most common step function used in the perceptron is the Heaviside step function.

• A perceptron has a single layer of threshold logic units (TLUs), with each TLU connected to all inputs.

• When all the neurons in a layer are connected to every neuron of the previous layer, it is known as a fully connected layer or dense layer.
Working of a Perceptron:
Step-1: All the input values are multiplied with their respective weights and added together.
The result obtained is called the weighted sum ∑wi*xi, or stated differently, x1*w1 + x2*w2 + … + xn*wn.
Additionally, a bias term b is added to this sum: ∑wi*xi + b.
Note: Bias serves as another model parameter (in addition to weights) that can be tuned to improve the model's performance.
Step-2: An activation function f is applied over the above sum ∑wi*xi + b to obtain the output Y = f(∑wi*xi + b).
Depending upon the activation function used, the output is either binary {1, 0} or a continuous value.
• A biological neuron only fires when a certain threshold is exceeded.

• Similarly, the artificial neuron will also only fire when the sum of the inputs (weighted sum) exceeds a certain threshold value, let's say 0.

• Intuitively, we can think of a rule-based approach like this:

If ∑wi*xi + b > 0: output = 1
Else: output = 0
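
A minimal Python sketch of these two steps; the weights, bias and inputs are illustrative values, not from the slides:

```python
# Perceptron forward pass: weighted sum plus bias, then a step activation.

def perceptron(x, w, b):
    z = sum(wi * xi for wi, xi in zip(w, x)) + b   # Step 1: sum(wi*xi) + b
    return 1 if z > 0 else 0                       # Step 2: step activation

# 0.4*1.0 + 0.6*0.5 - 0.5 = 0.2 > 0, so the perceptron fires
print(perceptron(x=[1.0, 0.5], w=[0.4, 0.6], b=-0.5))  # 1
```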

Activation Functions:

• Unit step (threshold) activation function, which was originally used by Rosenblatt.

• Sigmoid activation function, which outputs values between 0 and 1.

• Hyperbolic tangent (tanh) function, which produces outputs between -1 and 1.

• ReLU, which is the most popular activation function today.


Activation Function:
• Typically, each neuron in the hidden layers and the output layer applies an activation function to its weighted sum of inputs.

• Common activation functions include sigmoid, tanh, ReLU (Rectified Linear Unit), and softmax.

• These functions introduce nonlinearity into the network, allowing it to learn complex patterns in the data. A sketch of the common choices follows.
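
For concreteness, here are plain NumPy definitions of these activation functions (unoptimized sketches; softmax acts on a whole vector of scores):

```python
import numpy as np

def step(z):                     # unit step (threshold), used by Rosenblatt
    return np.where(z >= 0, 1, 0)

def sigmoid(z):                  # outputs between 0 and 1
    return 1.0 / (1.0 + np.exp(-z))

def tanh(z):                     # outputs between -1 and 1
    return np.tanh(z)

def relu(z):                     # Rectified Linear Unit
    return np.maximum(0, z)

def softmax(z):                  # turns a score vector into probabilities
    e = np.exp(z - np.max(z))    # shift for numerical stability
    return e / e.sum()

z = np.array([-2.0, 0.0, 2.0])
print(step(z), sigmoid(z), tanh(z), relu(z), softmax(z), sep="\n")
```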
Neural Representation of AND, OR, NOT, XOR Logic Gates (Perceptron Algorithm)

• OR Gate
The OR logical function truth table is given for 2-bit binary variables, i.e., the input vector and the corresponding output Y.
For the implementation, the considered weight parameters are w1 = 1, w2 = 1 and the bias parameter is b = -0.5.
Also consider: w1 = 2, w2 = 2 and b = -1.

• Therefore, we can conclude that a model that achieves an OR gate, using the perceptron algorithm, is 2*x1 + 2*x2 - 1.

• AND Gate
• Consider w1 = 1, w2 = 1 and bias b = -1.

• Therefore, we can conclude that a model that achieves an AND gate, using the perceptron algorithm, is x1 + x2 - 1.

• NOT Gate
• Consider w1 = -1 and bias b = 1.

• Therefore, we can conclude that a model that achieves a NOT gate, using the perceptron algorithm, is -x1 + 1.

• NOR Gate
• Consider w1 = -1, w2 = -1 and bias b = 1.

• Therefore, we can conclude that a model that achieves a NOR gate, using the perceptron algorithm, is -x1 - x2 + 1.

• NAND Gate
• Consider w1 = -1, w2 = -1 and bias b = 2.

• Therefore, we can conclude that a model that achieves a NAND gate, using the perceptron algorithm, is -x1 - x2 + 2.
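
A short Python sketch that checks these hand-picked weights and biases against the gate truth tables, using the firing rule w·x + b > 0 from earlier (XOR is deferred to the multi-layer section):

```python
# Each gate is one perceptron with fixed (hand-chosen) weights and bias.

def fire(ws, b, xs):
    return 1 if sum(w * x for w, x in zip(ws, xs)) + b > 0 else 0

OR   = lambda x1, x2: fire([2, 2],  -1, [x1, x2])   # 2*x1 + 2*x2 - 1
AND  = lambda x1, x2: fire([1, 1],  -1, [x1, x2])   # x1 + x2 - 1
NOT  = lambda x1:     fire([-1],     1, [x1])       # -x1 + 1
NOR  = lambda x1, x2: fire([-1, -1], 1, [x1, x2])   # -x1 - x2 + 1
NAND = lambda x1, x2: fire([-1, -1], 2, [x1, x2])   # -x1 - x2 + 2

for x1 in (0, 1):
    print("NOT", x1, "->", NOT(x1))
for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, "| OR", OR(x1, x2), "AND", AND(x1, x2),
              "NOR", NOR(x1, x2), "NAND", NAND(x1, x2))
```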
Perceptron Learning Algorithm
• The perceptron learning rule states that the algorithm automatically learns the optimal weight coefficients. The input features are then multiplied by these weights to determine whether a neuron fires or not.

• The perceptron receives multiple input signals, and if the sum of the input signals exceeds a certain threshold, it either outputs a signal or does not return an output.
• Let us reconsider our problem of deciding whether to watch a movie or not.

• Suppose we are given a list of m movies and a label (class) associated with each movie indicating whether the user liked the movie or not: a binary decision.

• Further, suppose we represent each movie with n features (some boolean, some real-valued).

• We will assume that the data is linearly separable and we want a perceptron to learn how to make this decision.

• In other words, we want the perceptron to find the equation of the separating plane (i.e., find the values of w0, w1, w2, …, wn).
Perceptron learning algorithm and convergence:

Step-1: P is the set of inputs with label 1;
Step-2: N is the set of inputs with label 0;
Step-3: Initialize w randomly;
Step-4: while (!convergence) do
Step-5:   Pick a random x ∈ P ∪ N;
Step-5.1:   if x ∈ P and w·x < 0 then
Step-5.2:     w = w + x;
Step-5.3:   end
Step-5.4:   if x ∈ N and w·x ≥ 0 then
Step-5.5:     w = w - x;
Step-5.6:   end
Step-6: end
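
A NumPy sketch of this loop. Assumptions not in the slides: each input is augmented with a leading 1 so the bias is learned as w[0], a deterministic sweep over all points stands in for the random pick, and the loop is bounded by a maximum number of epochs:

```python
import numpy as np

def perceptron_learning(P, N, max_epochs=100, seed=0):
    """P: inputs labelled 1; N: inputs labelled 0. Each input is assumed
    to be augmented with a leading 1 so the bias is learned as w[0]."""
    rng = np.random.default_rng(seed)
    w = rng.standard_normal(P.shape[1])     # Step-3: initialize w randomly
    for _ in range(max_epochs):             # Step-4: loop until convergence
        converged = True
        for x in P:
            if np.dot(w, x) < 0:            # x in P misclassified
                w = w + x                   # Step-5.2
                converged = False
        for x in N:
            if np.dot(w, x) >= 0:           # x in N misclassified
                w = w - x                   # Step-5.5
                converged = False
        if converged:
            break
    return w

# Example: learn AND (first column is the constant bias input)
P = np.array([[1, 1, 1]])                        # label 1
N = np.array([[1, 0, 0], [1, 0, 1], [1, 1, 0]])  # label 0
w = perceptron_learning(P, N)
print(w, [int(np.dot(w, x) >= 0) for x in np.vstack([N, P])])  # [0, 0, 0, 1]
```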
Que-1: What about non-boolean (say, real) inputs?
Ans: Real-valued inputs are allowed in the perceptron.

Que-2: Are all inputs equal? What if we want to assign more weight (importance) to some inputs?
Ans: A perceptron allows weights to be assigned to inputs.

Que-3: What about functions which are not linearly separable?
Ans: Not possible with a single perceptron, but we will see how to handle this.
Types of Perceptron models:
• Single-Layer Perceptron model: One of the simplest types; it consists of a feed-forward network and includes a threshold transfer function inside the model. A single-layer perceptron model can learn only linearly separable patterns.

• Multi-Layered Perceptron model: It is similar to a single-layer perceptron model but has one or more hidden layers. It operates in two stages:

 Feed-Forward Stage: Activation starts from the input layer and terminates at the output layer.

 Feed-Backward Stage: In the backward stage, weight and bias values are modified as per the model's requirement. The error between the actual output and the predicted output is propagated backwards, starting from the output layer.

 A multilayer perceptron model has greater processing power and can process linear and non-linear patterns.
Advantages of Perceptron Model:
1. It provides weights for the inputs and a mechanism for learning these weights.

2. Inputs are no longer limited to boolean values; the inputs can be real-valued.

Limitations of Perceptron Model:

1. The output of a perceptron can only be a binary number (0 or 1) due to the hard-edge transfer function.

2. It can only be used to classify linearly separable sets of input vectors. If the input vectors are not linearly separable, it is not easy to classify them correctly.
Perceptron has the following characteristics:
1. Perceptron is an algorithm for Supervised Learning of
single layer binary linear classifiers.
2. Optimal weight coefficients are automatically learned.
3. Activation function applies a step rule to check if the
output of the weighting function is greater than zero.
4. Linear decision boundary is drawn enabling the
distinction between the two linearly separable classes
+1 and -1.
5. If the sum of the input signals exceeds a certain
threshold, it outputs a signal; otherwise, there is no
output.
6. Types of activation functions include the sign, step, and
sigmoid functions.
Multi-Layer Perceptron (MLP)
• It is similar to a single-layer perceptron model but has multiple hidden layers.

• A multi-layer perceptron (MLP) is a type of artificial neural network consisting of multiple layers of neurons.

• To create a neural network, we combine neurons together so that the outputs of some neurons are inputs of other neurons.

• The neurons in an MLP typically use nonlinear activation functions, allowing the network to learn complex patterns in data.

• MLPs are significant because they can learn nonlinear relationships in data, making them powerful models for tasks such as classification, regression, and pattern recognition.
• MLPs are trained using an optimization algorithm, such as gradient descent, to iteratively adjust the weights and biases based on the gradient of the loss function. This process continues until the network converges to an optimal set of parameters that minimize the loss function.

• The term "multi-layer perceptron" is often used interchangeably with "deep neural network," although some sources consider MLPs a specific type of deep neural network.

• The terminology can be confusing, but in general, an MLP refers to a specific architecture of deep neural network, characterized by its fully connected layers and use of backpropagation for training.

• There are a few limitations to consider when employing MLPs:

• Computational cost: Training MLPs can be computationally expensive, especially with large datasets or complex architectures.

• Tuning hyperparameters: Finding the optimal number of hidden layers, neurons, and activation functions can require extensive experimentation.
What is a Feed-Forward Neural Network?
• A multilayer perceptron is a type of feedforward neural network consisting of fully connected neurons with nonlinear activation functions.

• It is widely used to distinguish data that is not linearly separable.

• During feedforward propagation, input data is passed through the network layer by layer, with each layer performing a computation based on the inputs it receives and passing the result (output) to the next layer, as in the sketch below.
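
A minimal NumPy sketch of this layer-by-layer computation; the layer sizes, random weights, and the choice of ReLU everywhere are illustrative, not prescribed by the slides:

```python
import numpy as np

# Layer-by-layer forward propagation through a small fully connected network.

def relu(z):
    return np.maximum(0, z)

def forward(x, layers):
    a = x
    for W, b in layers:
        a = relu(W @ a + b)   # each layer: weighted sum + bias, then activation
    return a

rng = np.random.default_rng(0)
layers = [(rng.standard_normal((4, 3)), np.zeros(4)),   # input(3) -> hidden(4)
          (rng.standard_normal((2, 4)), np.zeros(2))]   # hidden(4) -> output(2)
print(forward(np.array([1.0, 0.5, -0.2]), layers))
```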
An MLP typically includes the following components:

• Input layer: Receives input data and passes it on to the hidden layers. The number of neurons in the input layer is equal to the number of input features.

• Hidden layers: Between the input and output layers, there can be one or more layers of neurons. Each neuron in a hidden layer receives inputs from all neurons in the previous layer (either the input layer or another hidden layer) and produces an output that is passed to the next layer.

• The number of hidden layers and the number of neurons in each layer can be adjusted to optimize the network's performance.

• The number of hidden layers and the number of neurons in each hidden layer are called hyperparameters.

• Output layer: This layer consists of neurons that produce the final output of the network. The number of neurons in the output layer depends on the nature of the task.

• In binary classification, there may be either one or two neurons, depending on the activation function, representing the probability of belonging to one class.

• In multi-class classification, there can be multiple neurons in the output layer.
• Weights: Neurons in adjacent layers are fully connected to each other. Each connection has an associated weight, which determines the strength of the connection. These weights are learned during the training process to minimize the difference between the predicted outputs and the actual output values.

• Bias: In addition to the input and hidden neurons, each layer (except the input layer) usually includes a bias neuron that provides a constant input to the neurons in the next layer. The bias neuron has its own weight associated with each connection, which is also learned during training.

• Activation function: Typically, each neuron in the hidden layers and the output layer applies an activation function to its weighted sum of inputs.

• Activation functions introduce a non-linear transformation into the network, allowing it to learn complex patterns in the data.

• Loss function: Measures the discrepancy between the network's predictions and the actual target values.

• Common loss functions include mean squared error (MSE) and cross-entropy; both are sketched below.
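
A sketch of both losses in NumPy (the eps clipping is a standard numerical guard, not something specified in the slides):

```python
import numpy as np

def mse(y_true, y_pred):
    # Mean squared error: average squared difference
    return np.mean((y_true - y_pred) ** 2)

def binary_cross_entropy(y_true, y_pred, eps=1e-12):
    # Cross-entropy for binary targets; clip to avoid log(0)
    y_pred = np.clip(y_pred, eps, 1 - eps)
    return -np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))

y_true = np.array([1.0, 0.0, 1.0])
y_pred = np.array([0.9, 0.2, 0.7])
print(mse(y_true, y_pred), binary_cross_entropy(y_true, y_pred))
```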


Why do we need Weights and Bias?

• These are learnable parameters; as the network gets trained, it adjusts both the weight and bias parameters to achieve the desired values and the correct output.
Applications of multilayer perceptrons:

1. Image recognition: Classifying images into different categories like cats, dogs, or cars.
2. Speech recognition: Converting spoken language into text.
3. Natural language processing: Understanding the meaning of text and performing tasks like sentiment analysis or machine translation.
4. Time series forecasting: Predicting future values based on past data, such as stock prices or weather patterns.
Backpropagation Algorithm

• The term "multi-layer perceptron" can be confusing, but in general, an MLP refers to a specific type of deep neural network, characterized by its fully connected layers and use of backpropagation for training.

• In a neural network, we update the weights and biases of the neurons on the basis of the error at the output. This process is known as back-propagation.

• Activation functions make back-propagation possible, since their gradients are supplied along with the error to update the weights and biases.
Backpropagation
Training with Backpropagation:
• MLPs are trained using the backpropagation algorithm, which computes the gradients of a loss function and updates the parameters iteratively to minimize the loss.

• Backpropagation is an algorithm used to train neural networks by iteratively adjusting the weights and biases in order to minimize the loss function.

• During backpropagation, the network adjusts its weights and biases by propagating the error backwards from the output layer to the input layer. This iterative process fine-tunes the model's parameters, enabling it to make more accurate predictions over time.

• A loss function (also known as a cost function or objective function) is a measure of how well the model's predictions match the true target values in the training data.

• The loss function quantifies the difference between the predicted output and the actual output, providing a signal that guides the optimization process during training.

• The goal of training a neural network is to minimize this loss function by adjusting the weights and biases. The adjustments are guided by an optimization algorithm, such as gradient descent.
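
A compact NumPy sketch of the whole loop (forward propagation, backward propagation of the error, gradient-descent updates) on a tiny 2-4-1 MLP with sigmoid activations and MSE loss; the architecture, learning rate, iteration count and XOR task are illustrative choices, not prescribed by the slides:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# XOR dataset: inputs X and targets Y
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
Y = np.array([[0], [1], [1], [0]], dtype=float)

rng = np.random.default_rng(0)
W1, b1 = rng.standard_normal((2, 4)), np.zeros((1, 4))  # input -> hidden
W2, b2 = rng.standard_normal((4, 1)), np.zeros((1, 1))  # hidden -> output
lr = 1.0

for _ in range(5000):
    # forward propagation
    H = sigmoid(X @ W1 + b1)          # hidden activations
    P = sigmoid(H @ W2 + b2)          # predicted outputs
    # backward propagation: gradients of the MSE loss at each layer
    dP = (P - Y) * P * (1 - P)
    dH = (dP @ W2.T) * H * (1 - H)
    # gradient-descent updates of weights and biases
    W2 -= lr * H.T @ dP
    b2 -= lr * dP.sum(axis=0)
    W1 -= lr * X.T @ dH
    b1 -= lr * dH.sum(axis=0)

print(P.round(2).ravel())             # should approach [0, 1, 1, 0]
```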
• The input is fed to the input layer, and the neurons perform a linear transformation on this input using the weights and biases:
z = (weight * input) + bias

• After that, an activation function is applied to the above result.

• Finally, the output from the activation function moves to the next hidden layer and the same process is repeated. This forward movement is known as forward propagation.

• What if the output generated (predicted output) is far away from the actual value?

• Using the output generated from forward propagation, the error is calculated.
• Based on this error value, the weights and biases of the neurons are updated. This process is known as back-propagation.
Representation Power of MLP
• We consider 2 inputs and 4 perceptrons.

• Each input is connected to all 4 perceptrons with specific weights.

• The bias (w0) of each perceptron is -2 (i.e., each perceptron will fire only if the weighted sum of its inputs is at least 2).

• Each of these perceptrons is connected to an output perceptron by weights (which need to be learned).

• The output of this output perceptron (y) is the output of the network.
• This network contains 3 layers: input, hidden and output.

• The outputs of the 4 perceptrons in the hidden layer are denoted by h1, h2, h3, h4.

• The red and blue edges are called layer-1 weights.

• w1, w2, w3, w4 are called layer-2 weights.

• This network can be used to implement any boolean function (linearly separable or not)!

• In other words, we can find w1, w2, w3, w4 such that the truth table of any boolean function can be represented by this network.
• Each perceptron in the middle layer fires only for a specific input (and no two perceptrons fire for the same input):

• the first perceptron fires for {-1,-1}

• the second perceptron fires for {-1,1}

• the third perceptron fires for {1,-1}

• the fourth perceptron fires for {1,1}

• We can illustrate how this network works by taking the example of the XOR function, sketched below.
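
A NumPy sketch of this construction for XOR. Assumptions beyond the slides: inputs are encoded in {-1, 1}, each hidden unit's weights equal the pattern it detects (so its weighted sum reaches 2 only for that pattern, matching the bias of -2), and the layer-2 weights are chosen as w1 = 0, w2 = 1, w3 = 1, w4 = 0:

```python
import numpy as np

# Four hidden perceptrons, one per input pattern in {-1,1}^2.
# Each fires only when the weighted sum of its inputs reaches 2 (bias w0 = -2).

H = np.array([[-1, -1],    # fires for (-1, -1)
              [-1,  1],    # fires for (-1,  1)
              [ 1, -1],    # fires for ( 1, -1)
              [ 1,  1]])   # fires for ( 1,  1)

w2 = np.array([0, 1, 1, 0])   # layer-2 weights: XOR is 1 for mixed inputs

def network(x):
    h = (H @ x >= 2).astype(int)   # hidden layer outputs h1..h4
    return int(w2 @ h >= 1)        # output perceptron

for x in [(-1, -1), (-1, 1), (1, -1), (1, 1)]:
    print(x, network(np.array(x)))  # XOR: 0 1 1 0
```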
Sigmoid Function
• What is a sigmoid function?

• A sigmoid function is a mathematical function with a characteristic "S"-shaped curve, or sigmoid curve.

• It transforms any value in the domain (−∞, ∞) to a number between 0 and 1.

• The sigmoid function's ability to transform any real number to a number between 0 and 1 makes it advantageous in data science and deep learning.

• In binary classification, the sigmoid function is used to predict the probability of a binary variable.

• The formula of the sigmoid activation function is:

F(x) = σ(x) = 1 / (1 + e^(−x))
What is Deep Learning?

• Deep learning is a subset of machine learning that uses neural networks with multiple layers to analyze complex patterns and relationships in data.

• It is inspired by the structure and function of the human brain.

• Deep learning makes use of architectures such as the artificial neural network (ANN) and the recurrent neural network (RNN).

• In much simpler terms, it replicates the human brain: just as all the neurons in the brain are connected, so are the units in a deep network, which is exactly the concept behind deep learning.

• It is inspired by the functionality of human brain cells, which are called neurons, and this leads to the concept of artificial neural networks.
“Traditional” machine learning:
input → handcrafted features → learned classifier → “cat”

Deep, “end-to-end” learning:
input → learned low-level features → learned mid-level features → learned high-level features → learned classifier → “cat”
Popular applications of deep learning are:
• self-driving cars,
• language translation,
• natural language processing, etc.

Some popular deep learning models are:


• Convolutional Neural Network (CNN)
• Recurrent Neural Network (RNN)
• Autoencoders
• Classic Neural Networks, etc.
Difference between ML and DL

Sr. No. | ML | DL
1 | Feature extraction is done manually by an expert. | No need to develop a feature extractor for each problem; instead, it tries to learn high-level features from the data on its own.
2 | Machine learning learns from the data and makes decisions based on previous data and experiences. | Deep learning creates an artificial (deep) neural network, or DNN, that can learn and make intelligent decisions on its own.
3 | ML models mostly require data in a structured form. | DL models can work with both structured and unstructured data.
4 | ML works with thousands of data points. | Big data: millions of data points.
5 | ML models are suitable for solving simple or moderately complex problems. | DL models are suitable for solving complex problems.
Elements of Neural Networks

Introduction to Neural Networks

• Deep NNs have many hidden layers

 Fully-connected (dense) layers (a.k.a. Multi-Layer Perceptron or MLP)
 Each neuron is connected to all neurons in the succeeding layer

[Diagram: a deep network with input layer (x1 … xN), hidden layers 1 … L, and output layer (y1 … yM)]

Slide credit: Hung-yi Lee – Deep Learning Tutorial
