Deep Learning Unit - 1 Notes
L T P C: 3 0 0 3
UNIT-I
Deep Learning (DL) is a branch of Machine Learning that uses Artificial Neural
Networks (ANNs) with many layers (hence “deep”).
It teaches computers to learn from large amounts of data and automatically find
features without manual programming.
1. Input Layer – Takes raw data (image pixels, sound waves, text).
2. Hidden Layers – Each neuron processes inputs using weights and biases, applies
an activation function.
3. Output Layer – Produces prediction (e.g., “cat” or “dog”).
4. Learning Process (see the sketch after this list):
o Compare the output to the correct answer using a loss function.
o Adjust the weights using an optimization algorithm (e.g., Gradient Descent).
o Repeat until the network learns.
5. Key Components
6. Advantages
7. Disadvantages
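To make the learning process in point 4 concrete, here is a minimal Python sketch of one tiny network trained with gradient descent. The input values, layer sizes, learning rate, and loss function below are illustrative assumptions, not values from the notes.

```python
import numpy as np

# Tiny network: 3 inputs -> 4 hidden units -> 1 output (illustrative sizes)
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(4, 3)), np.zeros(4)
W2, b2 = rng.normal(size=(1, 4)), np.zeros(1)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x = np.array([0.5, 0.2, 0.8])   # assumed input features
t = np.array([1.0])             # assumed correct answer (target)
lr = 0.5                        # assumed learning rate

for step in range(100):
    # Forward pass: input layer -> hidden layer -> output layer
    h = sigmoid(W1 @ x + b1)
    y = sigmoid(W2 @ h + b2)
    loss = 0.5 * np.sum((y - t) ** 2)      # loss function: compare output to target

    # Backward pass: gradients for the output and hidden layers
    dy = (y - t) * y * (1 - y)
    dW2, db2 = np.outer(dy, h), dy
    dh = (W2.T @ dy) * h * (1 - h)
    dW1, db1 = np.outer(dh, x), dh

    # Gradient Descent: adjust weights in the direction that reduces the loss
    W1 -= lr * dW1
    b1 -= lr * db1
    W2 -= lr * dW2
    b2 -= lr * db2

print("final output:", y, "loss:", loss)
```

Each pass computes a prediction, measures the loss, and nudges every weight slightly in the direction that reduces that loss, which is the Compare, Adjust, Repeat cycle described in point 4.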
Biological Neuron
A biological neuron is the smallest functional unit of the human brain and nervous
system. It is responsible for processing and transmitting information in the form of
electrical and chemical signals. The human brain contains billions of neurons that work
together to perform complex tasks such as thinking, learning, memory storage, and
decision-making.
Structurally, a neuron has three main parts. Dendrites are branch-like structures that
receive signals (inputs) from other neurons or sensory organs and carry these signals to
the cell body. The Cell Body (Soma) processes the incoming signals by summing them
up and deciding whether to send a signal forward. The Axon is a long fiber that carries
the processed signal away from the cell body to other neurons or muscles. The axon is
often covered by a protective myelin sheath, which speeds up the transmission of
signals.
At the end of the axon are axon terminals, which connect to the dendrites of other
neurons through tiny gaps called synapses. Communication across synapses occurs via
neurotransmitters, which are chemical messengers that allow the signal to pass from
one neuron to another.
From an artificial intelligence perspective, the biological neuron is the inspiration for
Artificial Neural Networks (ANNs) in deep learning. In such networks, dendrites are
equivalent to the inputs, the soma corresponds to the summation unit, and the axon
output corresponds to the final activation output.
In simple terms, a biological neuron can be seen as a tiny processor that receives
signals, processes them, and sends them to other neurons. Understanding the working
of biological neurons is important because it helps us design computer systems that can
learn from data in a way that is similar to how humans learn from experience. Neurons
are found in the brains and nervous systems of humans and animals, and each neuron
connects to thousands of others through synapses, forming a massive communication
network.
Diagram: structure of a biological neuron (dendrites, soma, axon, synapse).
Analogy
Advantages
Disadvantages
Biological neurons are slow compared to computers (signal speed ~120 m/s, while
electronic signals travel much faster).
They can be damaged by injury or disease.
They need energy (glucose, oxygen) to function.
Real-time Example
Computational Unit (Artificial Neuron)
A computational unit first combines its inputs into a weighted sum:
z = (x1 × w1) + (x2 × w2) + ⋯ + (xn × wn) + b
This value z is then passed through an activation function f(z), which decides the final
output of the unit. The activation function introduces non-linearity, allowing the
network to handle more complex problems. In early AI models, a threshold function
was often used, where the output was 1 if the sum was greater than or equal to the
threshold and 0 otherwise. Modern deep learning models use advanced activation
functions such as Sigmoid, Tanh, and ReLU.
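The following short sketch shows the threshold (step) function alongside the modern activation functions mentioned above. The test value z = 0.8 is arbitrary.

```python
import numpy as np

def step(z, threshold=0.0):
    # Early threshold function: fire (1) if the sum reaches the threshold, else 0
    return 1 if z >= threshold else 0

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))   # squashes z into (0, 1)

def tanh(z):
    return np.tanh(z)                  # squashes z into (-1, 1)

def relu(z):
    return max(0.0, z)                 # passes positive values, zeroes out negatives

z = 0.8  # arbitrary example value
print(step(z), sigmoid(z), tanh(z), relu(z))
```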
A computational unit is the basic building block of Artificial Neural Networks (ANNs)
and deep learning models. Its main tasks are:
1. Take inputs.
2. Multiply them by importance values (weights).
3. Add a constant value (bias).
4. Pass the result through a decision-making function (activation function).
5. Produce an output.
Also known as an artificial neuron, processing element, or node, a computational unit
is essential for enabling learning. Without it, a neural network would just be an empty
structure, unable to detect patterns or make decisions. While one computational unit
alone can only perform simple tasks, many connected together can process complex
information, much like how multiple biological neurons form the human brain.
Computational units are used in almost every deep learning application today, including
image classification (face recognition, medical imaging), speech recognition (Alexa,
Siri, Google Assistant), language translation (Google Translate), self-driving cars
(object detection), and recommendation systems (Netflix, YouTube, Amazon).
1. Inputs arrive
o Represented as x1, x2, x3, ….
o Example: features of a house — size, location score, number of rooms.
2. Weights multiply inputs
o Each input has a weight w1, w2, w3, ….
o Weights decide the importance of each input.
Example: location may be more important than the number of rooms.
3. Weighted sum is calculated
z = (x1 × w1) + (x2 × w2) + (x3 × w3) + ⋯ + b
4. Activation function is applied
o The final output is y = f(z), where f is an activation function such as Sigmoid or ReLU.
Advantages
Simple yet powerful concept that can be scaled into complex networks.
Can process any type of data if represented in numbers.
Works well with large data when combined in multiple layers.
Disadvantages
One unit alone is very limited — only solves simple, linearly separable problems.
Needs training to adjust weights (can be time-consuming).
Incorrect activation choice can make learning fail.
Step 1 – Inputs: x1 = 8, x2 = 6, x3 = 7, with weights w1 = 0.5, w2 = 0.3, w3 = 0.2 and bias b = 1.
Step 2 – Weighted sum:
z = (8 × 0.5) + (6 × 0.3) + (7 × 0.2) + 1
z = 4 + 1.8 + 1.4 + 1
z = 8.2
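The same calculation written as code, treating it as one computational unit. The threshold of 5 used in the activation step is an assumed value for illustration only.

```python
# One computational unit: inputs -> weights -> bias -> summation -> activation -> output
inputs  = [8, 6, 7]          # x1, x2, x3 from the example
weights = [0.5, 0.3, 0.2]    # w1, w2, w3 from the example
bias    = 1

# Weighted sum: z = (x1*w1) + (x2*w2) + (x3*w3) + b
z = sum(x * w for x, w in zip(inputs, weights)) + bias
print("z =", z)              # 8.2, matching the hand calculation

# Activation: a simple threshold of 5 (assumed for illustration)
output = 1 if z >= 5 else 0
print("output =", output)    # 1
```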
Key Takeaways
Computational unit = artificial brain cell.
Works by: Inputs → Weights → Bias → Summation → Activation → Output.
Multiple units form layers → multiple layers form deep networks.
McCulloch–Pitts (MCP) Neuron
The MCP neuron works in a straightforward way. It takes binary inputs (only 0 or 1),
multiplies each input by a fixed weight, and then adds up the results. The total is then
compared to a set threshold value:
If the sum is greater than or equal to the threshold, the neuron outputs 1
(meaning it “fires”).
If the sum is less than the threshold, the neuron outputs 0 (meaning it stays
inactive).
This simple mechanism is called threshold logic, where decisions are made based on
whether the input crosses a specific cut-off point. Using this approach, the MCP neuron
can perform basic logical operations like AND, OR, and NOT, which are the building
blocks of digital circuits.
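A small sketch of an MCP-style unit implementing AND, OR, and NOT with fixed weights and thresholds. The particular weight and threshold values are common textbook choices assumed here; they match the idea used in the AND/OR tables later in this section.

```python
def mcp_neuron(inputs, weights, threshold):
    # Weighted sum of binary inputs, then threshold logic: fire (1) or stay inactive (0)
    s = sum(x * w for x, w in zip(inputs, weights))
    return 1 if s >= threshold else 0

def AND(x1, x2):
    return mcp_neuron([x1, x2], weights=[1, 1], threshold=2)

def OR(x1, x2):
    return mcp_neuron([x1, x2], weights=[1, 1], threshold=1)

def NOT(x1):
    # An inhibitory (negative) weight is assumed here for simplicity
    return mcp_neuron([x1], weights=[-1], threshold=0)

for a in (0, 1):
    for b in (0, 1):
        print(a, b, "AND:", AND(a, b), "OR:", OR(a, b))
print("NOT 0:", NOT(0), "NOT 1:", NOT(1))
```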
Although the MCP model is too simple for solving complex problems, it is important
because it laid the foundation for modern neural networks and introduced the idea of
decision-making using thresholds.
In short, the McCulloch–Pitts unit is like the “grandparent” of today’s deep learning
models — basic, limited, but extremely important in history.
Detailed Working
Step-by-step process:
1. Compute the weighted sum of the binary inputs:
S = (x1 × w1) + (x2 × w2) + ⋯ + (xn × wn)
2. Compare S with the threshold θ: output 1 if S ≥ θ, otherwise 0.
Diagram
x1 ---- w1 ----\
x2 ---- w2 ----- SUM ---> Compare with θ ---> Output (0 or 1)
x3 ---- w3 ----/
Disadvantages
Only works for binary inputs/outputs.
No learning capability — weights fixed.
Can only solve linearly separable problems (cannot solve XOR).
Real-Time Analogy
Think of a club bouncer who only lets people in when enough entry conditions are met, i.e., when the inputs add up to the threshold.
Numerical Example – AND gate (w1 = 1, w2 = 1, threshold θ = 2)
x1 x2 Sum Sum ≥ θ? Output
0 0 0+0=0 No 0
0 1 0+1=1 No 0
1 0 1+0=1 No 0
1 1 1+1=2 Yes 1
OR gate (w1 = 1, w2 = 1, threshold θ = 1)
x1 x2 Sum Sum ≥ θ? Output
0 0 0 No 0
0 1 1 Yes 1
1 0 1 Yes 1
1 1 2 Yes 1
Key Formula
y = 1 if ∑ (xi × wi) ≥ θ, otherwise y = 0
Where:
xi = input values
wi = corresponding weights
θ = threshold value
y = output (1 or 0)
Threshold Logic
Threshold logic is a decision-making method used in early artificial neurons, such as
the McCulloch–Pitts (MCP) model and perceptrons. In this approach, the neuron
calculates the weighted sum of its inputs and compares it to a fixed cut-off value called
the threshold. If the sum is greater than or equal to the threshold, the neuron outputs 1
(meaning “Yes,” “True,” or “Fire”). If the sum is less than the threshold, the neuron
outputs 0 (meaning “No,” “False,” or “Don’t Fire”).
This method produces a binary output instead of a continuous value, closely mimicking
the “all-or-nothing” firing behavior of biological neurons. Threshold logic was one of the
earliest techniques that allowed artificial neurons to replicate basic human brain
decision-making.
S = (x1 × w1) + (x2 × w2) + ⋯ + (xn × wn)
Compare S to the threshold θ:
o If S ≥ θ, output = 1.
o Else, output = 0.
Formula
y = 1 if S ≥ θ, otherwise y = 0
Diagram
Inputs → Weights → Summation → Threshold Decision → Output
x1 ----w1----\
x2 ----w2----- Σ ---- Compare with θ -----> 1 (fire) or 0 (no fire)
x3 ----w3----/
Visual:
x1 → [× w1] \
x2 → [× w2] >-- [ SUM + Bias ] --[ Compare with θ ]--> Output y
x3 → [× w3] /
Advantages
Very simple and fast for binary decisions.
Good for implementing logic gates.
Computationally inexpensive.
Disadvantages
Only outputs binary values — can’t give probabilities or confidence levels.
Cannot solve problems where a smooth decision boundary is needed.
Fails for non-linearly separable problems (like XOR).
Real-Time Analogy
Imagine an automatic entry gate:
Inputs:
o x1 = You have an ID card (1 if yes, 0 if no) — weight 5.
o x2 = You entered a correct PIN (1 if yes, 0 if no) — weight 3.
Threshold = 5.
If your weighted score ≥ 5, gate opens (output 1). Else, gate stays closed (output 0).
Numerical Example
Step 1 – Parameters: w1 = 2, w2 = 3, threshold θ = 5.
Step 2 – Evaluate all input combinations:
x1 x2 Weighted Sum Sum ≥ θ? Output
0 0 0 No 0
1 0 1×2 = 2 No 0
0 1 1×3 = 3 No 0
1 1 2+3 = 5 Yes 1
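The same decision written as a small function, using the weights and threshold from the table above (w1 = 2, w2 = 3, θ = 5). This is a verification sketch only.

```python
def gate_opens(has_id, correct_pin, w1=2, w2=3, threshold=5):
    # Weighted score of the two binary checks, compared against the cut-off
    score = has_id * w1 + correct_pin * w2
    return 1 if score >= threshold else 0

for has_id in (0, 1):
    for correct_pin in (0, 1):
        print(has_id, correct_pin, "->", gate_opens(has_id, correct_pin))
# Only (1, 1) reaches the threshold, so the gate opens only with both ID and PIN.
```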
Another Example – AND gate with threshold logic
Weights: w1 = 1, w2 = 1
Threshold θ = 2
x1 x2 Sum Output
0 0 0 0
0 1 1 0
1 0 1 0
1 1 2 1
Linear Perceptron
The Linear Perceptron is an improved version of the McCulloch–Pitts neuron that has
the ability to learn from data. It was introduced in 1958 by Frank Rosenblatt and
marked a major step forward in the development of artificial intelligence.
The key difference between the MCP neuron and the linear perceptron is that the MCP
has fixed weights, whereas the perceptron can adjust its weights automatically
during training using a learning algorithm. This ability to update weights allows the
perceptron to improve its performance over time.
The linear perceptron is important because it was the first model capable of learning
from data, making it the foundation of modern deep learning techniques. It can
successfully solve linearly separable problems, such as logical AND and OR gates,
where a single straight line can separate the classes.
Although modern neural networks have moved beyond the limitations of the simple
perceptron, this model remains an essential concept for understanding how learning in
AI began.
Structure & Working
Step-by-step:
1. Compute the weighted sum of the inputs:
z = (x1 × w1) + (x2 × w2) + ... + b
2. Pass z through an activation function (commonly a step function for the linear perceptron).
Diagram
x1 ---- w1 ----\
x2 ---- w2 ----- SUM (+ b) ---> Activation ---> Output y
x3 ---- w3 ----/
Advantages
Can learn from examples.
Simple to implement.
Works well for linearly separable problems.
Guaranteed to converge if data is linearly separable.
Disadvantages
Cannot solve non-linearly separable problems (like XOR).
Only works for binary classification in original form.
Limited to straight-line decision boundaries.
Real-Life Analogy
Think of a job applicant screening system:
Inputs:
o x1 = Education score.
o x2 = Work experience score.
The system learns weights for each input:
o Maybe experience is more important than education.
Bias = baseline preference.
After training on past hiring data, it predicts:
o 1 → Accept, 0 → Reject.
Numerical Example
We will train a perceptron to learn the AND Gate.
x1 x2 Target t
0 0 0
0 1 0
1 0 0
1 1 1
Step 1 – Setup: initial weights w1 = 0, w2 = 0, bias b = 0, learning rate η = 1; the activation outputs 1 if z ≥ 0, otherwise 0.
Steps 2 and 3 – Go through the examples epoch by epoch, updating the weights whenever the output is wrong, until one full epoch produces correct outputs for all cases.
Weight Update Table (Epochs 1 and 2)
Epoch Case (x1, x2) Target (t) z Value Output (y) Update Applied? w1 w2 b
1 (0,0) 0 0 1 (wrong) Yes 0 0 -1
1 (0,1) 0 -1 0 (correct) No 0 0 -1
1 (1,0) 0 -1 0 (correct) No 0 0 -1
1 (1,1) 1 -1 0 (wrong) Yes 1 1 0
2 (0,0) 0 0 1 (wrong) Yes 1 1 -1
2 (0,1) 0 0 1 (wrong) Yes 1 0 -2
2 (1,0) 0 -1 0 (correct) No 1 0 -2
2 (1,1) 1 -1 0 (wrong) Yes 2 1 -1
Perceptron Learning Algorithm (PLA)
PLA works in an iterative manner. It goes through the training dataset multiple times,
updating the parameters whenever an error occurs. The process continues until:
1. The perceptron gives the correct output for all training data, or
2. A maximum number of iterations (epochs) is reached.
The main idea is simple — if the perceptron predicts correctly, no change is made. If the
prediction is wrong, the weights and bias are updated in the direction that would have
made the prediction correct. This allows the perceptron to learn from mistakes and
improve over time.
The PLA is important because it was the first supervised learning algorithm designed
for binary classification. Without this algorithm, the perceptron would remain static
and never improve its performance. PLA also forms the foundation for training in more
complex models like multi-layer perceptrons and deep neural networks.
In real-world applications, the perceptron learning algorithm has been used for:
Although PLA works only for problems that are linearly separable, it remains an
important milestone in the history of machine learning.
Step-by-Step Working of PLA
Initialization
Choose:
o Initial weights (often zeros or small random values).
o Initial bias.
o Learning rate η (0 < η ≤ 1).
Diagram
Inputs → Weights → Weighted Sum → Activation → Output
↑
Weight Update Rule
If the prediction y is wrong for a training example with target t:
wi ← wi + η × (t − y) × xi
b ← b + η × (t − y)
Visual Representation:
+-------------+
x1 -----> | |
x2 -----> | Weighted | ---> Step ---> y
... | Sum (z) |
+-------------+
↑
(Update weights if wrong)
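A minimal sketch of this training loop, using the same assumptions as the worked tables in this unit: initial weights and bias of 0, learning rate η = 1, and a step activation that outputs 1 when z ≥ 0.

```python
def step(z):
    return 1 if z >= 0 else 0

def train_perceptron(data, epochs=10, lr=1):
    # data: list of ((x1, x2), target) pairs
    w1 = w2 = b = 0
    for epoch in range(epochs):
        errors = 0
        for (x1, x2), t in data:
            y = step(w1 * x1 + w2 * x2 + b)    # forward pass
            if y != t:                         # wrong prediction -> update
                w1 += lr * (t - y) * x1
                w2 += lr * (t - y) * x2
                b  += lr * (t - y)
                errors += 1
        if errors == 0:                        # converged: a full epoch with no mistakes
            break
    return w1, w2, b

AND_data = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
OR_data  = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]
print("AND weights:", train_perceptron(AND_data))
print("OR  weights:", train_perceptron(OR_data))
```

With these starting values, the per-example updates in the first two epochs correspond to the AND and OR training tables given in the numerical examples of this unit.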
Advantages
Simple to implement.
Guaranteed to find a solution if data is linearly separable.
Fast for small datasets.
Disadvantages
Only works for linearly separable problems.
Fails for XOR-type problems.
Does not give probability scores — only 0 or 1.
Real-Life Analogy
Imagine a teacher grading essays:
Numerical Example
We'll train a perceptron to learn the OR Gate.
Step 1 – Setup: initial weights w1 = 0, w2 = 0, bias b = 0, learning rate η = 1.
x1 x2 Target t
0 0 0
0 1 1
1 0 1
1 1 1
Steps 2–4 – Train epoch by epoch, updating the weights whenever a prediction is wrong.
Here is the perceptron training table combining Epoch 1 and Epoch 2.
Epoch Case (x1, x2) Target (t) z Calculation Output (y) Correct? w1 w2 b
1 (0,0) 0 0 1 ❌ 0 0 -1
(0,1) 1 -1 0 ❌ 0 1 0
(1,0) 1 0 1 ✅ 0 1 0
(1,1) 1 1 1 ✅ 0 1 0
2 (0,0) 0 0 1 ❌ 0 1 -1
(0,1) 1 0 1 ✅ 0 1 -1
(1,0) 1 -1 0 ❌ 1 1 0
(1,1) 1 2 1 ✅ 1 1 0
Linear Separability
Linear separability is a property of a dataset that describes whether its different classes
can be separated by a straight boundary. In two dimensions (2D), this boundary is a
straight line; in three dimensions (3D), it is a plane; and in higher dimensions, it is
called a hyperplane.
In simple terms, a dataset is linearly separable if you can draw a single straight line (or
plane) that perfectly separates all the points of one class from all the points of another
class without making any classification errors. If such a line exists, the two classes can
be completely distinguished using a linear decision boundary.
For example:
The AND and OR logic gates are linearly separable because we can draw a single
straight line that separates outputs of 0 from outputs of 1.
The XOR logic gate is not linearly separable because no single straight line can
separate its two classes.
Why it is important
Linear separability is crucial because the Perceptron Learning Algorithm (PLA) works
only when the dataset is linearly separable. If the dataset is not linearly separable (like
in the XOR problem), a single-layer perceptron will never converge to a perfect solution,
no matter how many times it is trained.
Uses
Pattern classification problems where the features of different classes are clearly
distinct.
Quality control systems, where products are classified as pass/fail based on
measurable features like size or weight.
Binary classification tasks in early AI models, such as spam/not-spam
classification (when the features are well-separated).
Visual Explanation
Case 1: Linearly Separable (AND Gate)
We can draw a single straight line that separates all 0 outputs from 1 outputs.
Mathematical Definition
A dataset {(xi, ti)} is linearly separable if there exist weights w and a bias b such that, for every example:
w · xi + b > 0 whenever ti = 1, and w · xi + b < 0 whenever ti = 0.
Real-Life Analogy
Think of sorting fruits by color:
If you only have red apples and yellow bananas, you can separate them easily
with one decision rule: “If color value ≥ X → Apple, else Banana.”
That’s linearly separable.
But if you have fruits that are mixed colors or overlap in shades, you can’t
separate them perfectly with one straight rule — you’d need multiple conditions.
Numerical Example
Consider 2D data for AND gate:
x1 x2 Class
0 0 0
0 1 0
1 0 0
1 1 1
If we plot these points, the line x1 + x2 = 1.5 separates the two classes:
If x1 + x2 ≥ 1.5 → Class 1
Else → Class 0
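A quick check of this rule in code (verification sketch only; the 1.5 cut-off is the one given above).

```python
points = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]

# Decision rule: Class 1 if x1 + x2 >= 1.5, else Class 0
for (x1, x2), label in points:
    predicted = 1 if x1 + x2 >= 1.5 else 0
    print((x1, x2), "predicted:", predicted, "actual:", label,
          "OK" if predicted == label else "MISCLASSIFIED")
# Every point is classified correctly, so the AND data is linearly separable.
```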
Disadvantages
Many real-world problems are not linearly separable.
Cannot model complex decision boundaries.
Perceptron Convergence Theorem
If the given dataset is linearly separable, the perceptron learning algorithm will
always find a set of weights and bias that perfectly classify the training
examples in a finite number of steps.
In simple words, if it’s possible to separate the classes with a straight line (or
hyperplane), PLA will definitely reach the correct solution and stop making mistakes.
Why it is important:
This is important because it removes uncertainty. Without this theorem, we wouldn't
know whether PLA will ever succeed or keep running forever without finding a correct
answer.
Uses:
While the theorem itself is mostly part of AI theory and mathematics, it has practical
implications:
Ensures the reliability of PLA in early applications like OCR (Optical Character
Recognition) for character recognition.
Early speech classification systems that dealt with simple, linearly separable
features.
Any binary classification problem where the data can be separated by a straight
line or hyperplane.
Key Assumptions
Data must be linearly separable.
The algorithm updates one sample at a time (online learning).
The learning rate does not change.
Real-Life Analogy
Imagine teaching a child to sort red and blue balls:
Numerical Example
We’ll see the theorem in action using an AND gate.
x1 x2 t
0 0 0
0 1 0
1 0 0
1 1 1
A separating solution exists, for example: w1 = 1, w2 = 1, b = −1.5.
With these values, the weighted sum is ≥ 0 only for the input (1, 1), so every training example is classified correctly and the algorithm stops updating.
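A quick verification of these values against the AND data (sketch only).

```python
# Check w1 = 1, w2 = 1, b = -1.5 against the AND gate training data
w1, w2, b = 1, 1, -1.5
for (x1, x2), t in [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]:
    z = w1 * x1 + w2 * x2 + b
    y = 1 if z >= 0 else 0
    print((x1, x2), "z =", z, "output =", y, "target =", t)
# All four outputs equal their targets, so these weights classify the AND data perfectly.
```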
Limitations
Does not apply to non-linearly separable problems (e.g., XOR).
Does not say how many steps it will take — only that it will stop eventually.
Only applies to single-layer perceptrons.
Key Takeaway
If your data is linearly separable, PLA will converge in finite time.
If your data is not linearly separable, it will never converge — you’ll need multi-
layer networks.
----THE END----