Deeplearning Unit - 1 Notes Final

B.Tech III Year – II Semester
DEEP LEARNING
L T P C
3 0 0 3

UNIT-I

Basics: Biological Neuron, Idea of computational units, McCulloch–Pitts unit and Thresholding logic, Linear Perceptron, Perceptron Learning Algorithm, Linear separability, Convergence theorem for Perceptron Learning Algorithm

1. What is Deep Learning?

• Deep Learning (DL) is a branch of Machine Learning that uses Artificial Neural Networks (ANNs) with many layers (hence “deep”).
• It teaches computers to learn from large amounts of data and automatically find features without manual programming.

2. Why is it called “Deep”?

• “Deep” refers to the number of layers in the neural network.
• A network with more than 3 layers (input, hidden layers, output) is considered “deep”.
• Each layer extracts higher-level features from the data.

3. Where is it used? (Real-world applications)

• Image Recognition – Face unlock in phones, self-driving cars detecting objects.
• Speech Recognition – Siri, Alexa, Google Assistant.
• Natural Language Processing (NLP) – Chatbots, language translation.
• Healthcare – Disease diagnosis from medical scans.
• Recommendation Systems – Netflix, YouTube suggestions.

4. How does Deep Learning work?

1. Input Layer – Takes raw data (image pixels, sound waves, text).
2. Hidden Layers – Each neuron processes inputs using weights and biases, applies
an activation function.
3. Output Layer – Produces prediction (e.g., “cat” or “dog”).
4. Learning Process:
o Compare output to the correct answer (loss function).
o Adjust weights using optimization algorithms (e.g., Gradient Descent).
o Repeat until the network learns.
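The whole loop can be sketched in a few lines of Python. This is only a minimal illustration under made-up assumptions (a single sigmoid neuron, invented data, plain gradient descent), not a full deep network:

import numpy as np

# Sketch of the learning loop: forward pass -> loss -> weight update.
# Made-up data: x = hours studied, t = pass (1) / fail (0).
X = np.array([1.0, 2.0, 3.0, 4.0])
t = np.array([0.0, 0.0, 1.0, 1.0])

w, b, lr = 0.0, 0.0, 0.5                 # weight, bias, learning rate

for epoch in range(2000):
    z = w * X + b                        # weighted sum
    y = 1 / (1 + np.exp(-z))             # activation function (sigmoid)
    loss = 0.5 * np.mean((y - t) ** 2)   # loss: compare output to correct answer
    grad_z = (y - t) * y * (1 - y)       # gradient of the loss w.r.t. z
    w -= lr * np.mean(grad_z * X)        # adjust weight (gradient descent)
    b -= lr * np.mean(grad_z)            # adjust bias

print(f"w={w:.2f}, b={b:.2f}, loss={loss:.4f}")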

5. Key Components

• Neurons – Basic computational units.
• Weights – Numbers that determine how much influence each input has.
• Bias – Helps shift the activation to fit the data better.
• Activation Functions – Decide whether a neuron “fires” or not (e.g., sigmoid, ReLU).
• Loss Function – Measures error.
• Optimizer – Adjusts weights to reduce error.

6. Advantages

• Can learn complex patterns from huge datasets.
• Works well for unstructured data (images, audio, text).
• Removes the need for manual feature extraction.

7. Disadvantages

• Requires large amounts of data.
• Needs powerful hardware (GPUs/TPUs).
• Takes a long training time.
• Can be a “black box” – difficult to explain decisions.

8. Simple Real-Time Example

📱 Face Unlock on Phone:

1. Camera captures your face.


2. Neural network processes image in layers:
o Early layers detect edges and shapes.
o Middle layers detect facial parts (eyes, nose, mouth).
o Last layers recognize it as “Your Face”.
3. If match is found → unlocks phone.

Biological Neuron
A biological neuron is the smallest functional unit of the human brain and nervous
system. It is responsible for processing and transmitting information in the form of
electrical and chemical signals. The human brain contains billions of neurons that work
together to perform complex tasks such as thinking, learning, memory storage, and
decision-making.

Structurally, a neuron has three main parts. Dendrites are branch-like structures that
receive signals (inputs) from other neurons or sensory organs and carry these signals to
the cell body. The Cell Body (Soma) processes the incoming signals by summing them
up and deciding whether to send a signal forward. The Axon is a long fiber that carries
the processed signal away from the cell body to other neurons or muscles. The axon is
often covered by a protective myelin sheath, which speeds up the transmission of
signals.

At the end of the axon are axon terminals, which connect to the dendrites of other
neurons through tiny gaps called synapses. Communication across synapses occurs via
neurotransmitters, which are chemical messengers that allow the signal to pass from
one neuron to another.

The working of a neuron can be described as follows:


• When the total signal received by the cell body is greater than a certain threshold, the neuron becomes “active” and sends an electrical impulse through its axon. This is called firing or spiking.
• If the signal is weaker than the threshold, the neuron remains inactive.

From an artificial intelligence perspective, the biological neuron is the inspiration for
Artificial Neural Networks (ANNs) in deep learning. In such networks, dendrites are
equivalent to the inputs, the soma corresponds to the summation unit, and the axon
output corresponds to the final activation output.

In simple terms, a biological neuron can be seen as a tiny processor that receives
signals, processes them, and sends them to other neurons. Understanding the working
of biological neurons is important because it helps us design computer systems that can
learn from data in a way that is similar to how humans learn from experience. Neurons
are found in the brains and nervous systems of humans and animals, and each neuron
connects to thousands of others through synapses, forming a massive communication
network.

Structure & Working

A neuron has these main parts:

1. Dendrites – Receive signals from other neurons (like input wires).


2. Cell body (Soma) – Processes the signals and decides what to do.
3. Axon – Sends the processed signal to other neurons (like an output wire).
4. Axon Hillock – The “decision point” where the neuron decides whether to fire a
signal or not.
5. Myelin Sheath – A fatty layer around the axon that makes signal travel faster.
6. Nodes of Ranvier – Gaps in the myelin that help signals “jump” faster.
7. Synapse – The gap between neurons where chemicals (neurotransmitters) pass
the signal.
8. Terminal Buttons – Release neurotransmitters to communicate with the next
neuron.

[Diagram: structure of a biological neuron – figure not reproduced; the labeled parts are listed above]

Working in simple steps:

1. Signals (electrical or chemical) come into the dendrites.


2. Soma sums them up.
3. If the sum crosses a threshold, neuron “fires” an action potential.
4. Signal travels down the axon.
5. At the terminal buttons, neurotransmitters are released into the synapse to send
the signal to the next neuron.

Analogy

• Imagine a group chat:
  o Messages from friends (inputs) → Dendrites
  o You read and decide if it’s important → Soma & Axon Hillock
  o If yes, you forward it → Axon
  o Your message reaches others → Terminal Buttons & Synapse

Advantages

• Processes massive amounts of information in parallel.
• Learns and adapts through connections (synapses) getting stronger or weaker.
• Fault tolerant – if some neurons fail, the brain still works.

Disadvantages

• Biological neurons are slow compared to computers (signal speed ~120 m/s; computers are much faster).
• Can get damaged (injury, disease).
• Need energy (glucose, oxygen).

Real-time Example

• When you touch a hot object:


1. Skin neurons detect heat.
2. Signal travels to spinal cord & brain.
3. Brain processes “it’s hot” and sends a signal back.
4. Muscles pull your hand away.

Idea of Computational Units

The concept of computational units in artificial intelligence is inspired by the structure and working of biological neurons. Just like a biological neuron collects signals from multiple dendrites, processes them in the cell body, and sends the result through the axon, a computational unit in machine learning performs a similar function using mathematical operations.
A computational unit takes several input values x1,x2,x3,…,xn and assigns a weight
w1,w2,w3,…,wn to each input. The weight represents the importance or influence of that
input on the final decision. These weighted inputs are then summed, and a bias term b
is added to make the model more flexible and to adjust the decision boundary.
Mathematically, the computation inside a unit is expressed as:

z = Σ (xi × wi) + b = (x1 × w1) + (x2 × w2) + … + (xn × wn) + b
This value z is then passed through an activation function f(z), which decides the final
output of the unit. The activation function introduces non-linearity, allowing the
network to handle more complex problems. In early AI models, a threshold function
was often used, where the output was 1 if the sum was greater than or equal to the
threshold and 0 otherwise. Modern deep learning models use advanced activation
functions such as Sigmoid, Tanh, and ReLU.
A computational unit is the basic building block of Artificial Neural Networks (ANNs)
and deep learning models. Its main tasks are:
1. Take inputs.
2. Multiply them by importance values (weights).
3. Add a constant value (bias).
4. Pass the result through a decision-making function (activation function).
5. Produce an output.
Also known as an artificial neuron, processing element, or node, a computational unit
is essential for enabling learning. Without it, a neural network would just be an empty
structure, unable to detect patterns or make decisions. While one computational unit
alone can only perform simple tasks, many connected together can process complex
information, much like how multiple biological neurons form the human brain.
Computational units are used in almost every deep learning application today, including
image classification (face recognition, medical imaging), speech recognition (Alexa,
Siri, Google Assistant), language translation (Google Translate), self-driving cars
(object detection), and recommendation systems (Netflix, YouTube, Amazon).

Detailed Working of a Computational Unit


Let’s break it into simple steps:

1. Inputs arrive
o Represented as x1, x2, x3, …, xn.
o Example: features of a house — size, location score, number of rooms.
2. Weights multiply inputs
o Each input has a weight w1, w2, w3, …, wn.
o Weights decide importance of each input.
Example: location may be more important than number of rooms.
3. Weighted sum is calculated

z = (x1 × w1) + (x2 × w2) + (x3 × w3) + … + b

where b = bias (gives flexibility to the model).

4. Activation function decides


o Without activation function, the unit is just a linear equation.
o Activation introduces non-linearity so it can solve complex problems.
5. Output is sent forward
o This output becomes input for the next layer.
Diagram
x1 -------[w1]----\
x2 -------[w2]----- SUM + b ----> [Activation] ---> Output (y)
x3 -------[w3]----/
Visual Representation:

(Feature 1) (Feature 2) (Feature 3)


x1 x2 x3
| | |
w1 w2 w3
\ | /
\ | /
\ | /
[ Summation + Bias ]
|
[ Activation Function ]
|
Output y
Real-Life Analogy
Imagine you are a judge in a cooking competition:

• Inputs = contestant’s scores for Taste, Presentation, Creativity.
• Weights = how much you value each category (e.g., Taste is most important, Creativity least important).
• Bias = your personal mood that day (extra bonus or penalty).
• Weighted sum = total score after applying importance.
• Activation function = final decision: "Pass" if score ≥ 7, "Fail" otherwise.
• Output = “Pass” or “Fail”.

Advantages
• Simple yet powerful concept that can be scaled into complex networks.
• Can process any type of data if it is represented as numbers.
• Works well with large data when combined in multiple layers.

Disadvantages
• One unit alone is very limited – it only solves simple, linearly separable problems.
• Needs training to adjust weights (can be time-consuming).
• An incorrect activation choice can make learning fail.

Real-Time Example with Numbers


Let’s check if a student passes based on:

• x1 = Exam Score (out of 10) → weight w1 = 0.5
• x2 = Attendance Score (out of 10) → weight w2 = 0.3
• x3 = Project Score (out of 10) → weight w3 = 0.2
• Bias b = 1 (extra encouragement)

Step 1 – Inputs:

x1=8, x2=6, x3=7

Step 2 – Weighted sum:

z = (8 × 0.5) + (6 × 0.3) + (7 × 0.2) + 1
z = 4 + 1.8 + 1.4 + 1
z = 8.2

Step 3 – Activation (Pass if ≥ 5):

Since z = 8.2 ≥ 5 → Output = 1 (“Pass”).
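This computation translates directly into a short Python sketch of the student example above (the function name and default threshold are ours, chosen to match the example):

def computational_unit(inputs, weights, bias, threshold=5):
    # Weighted sum plus bias, then a step activation against the threshold.
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    return z, 1 if z >= threshold else 0

z, y = computational_unit([8, 6, 7], [0.5, 0.3, 0.2], bias=1)
print(round(z, 1), y)   # 8.2 1 -> z = 8.2 >= 5, so output = 1 ("Pass")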

Key Takeaways
• Computational unit = artificial brain cell.
• Works by: Inputs → Weights → Bias → Summation → Activation → Output.
• Multiple units form layers → multiple layers form deep networks.

McCulloch–Pitts (MCP) Unit & Threshold Logic
The McCulloch–Pitts (MCP) neuron was the very first mathematical model of a
biological neuron. It was proposed in 1943 by neuroscientist Warren McCulloch and
logician Walter Pitts. This model is a simplified computational unit that works with
binary values and is considered the starting point for all artificial neural networks.

The MCP neuron works in a straightforward way. It takes binary inputs (only 0 or 1),
multiplies each input by a fixed weight, and then adds up the results. The total is then
compared to a set threshold value:

• If the sum is greater than or equal to the threshold, the neuron outputs 1 (meaning it “fires”).
• If the sum is less than the threshold, the neuron outputs 0 (meaning it stays inactive).

This simple mechanism is called threshold logic, where decisions are made based on
whether the input crosses a specific cut-off point. Using this approach, the MCP neuron
can perform basic logical operations like AND, OR, and NOT, which are the building
blocks of digital circuits.

Although the MCP model is too simple for solving complex problems, it is important
because it laid the foundation for modern neural networks and introduced the idea of
decision-making using thresholds.

Today, the MCP neuron is mainly used for:

• Simulating simple logic circuits.
• Studying the basics of early artificial intelligence.
• Understanding how perceptrons work.
• Learning about binary classification problems.

In short, the McCulloch–Pitts unit is like the “grandparent” of today’s deep learning
models — basic, limited, but extremely important in history.

Detailed Working
Step-by-step process:

1. Inputs x1, x2, ..., xn are binary (0 or 1).
2. Each input has a weight w1, w2, ..., wn (importance factor).
3. Calculate the weighted sum:

S = (x1 × w1) + (x2 × w2) + … + (xn × wn)

4. Compare with the threshold θ:
   o If S ≥ θ → output = 1 (fires).
   o If S < θ → output = 0 (does not fire).
5. No learning in the original MCP – weights and threshold are fixed.

Diagram
x1 ---- w1 ----\
x2 ---- w2 ----- SUM ---> Compare with θ ---> Output (0 or 1)
x3 ---- w3 ----/
Visual Representation:

Input Layer Processing Unit Output


----------- ------------------- --------
x1 --- w1 \
x2 --- w2 >---[ Σ (Weighted Sum) ]--[ Threshold ]---> y
x3 --- w3 /
Advantages
• Simple to understand and implement.
• Can model basic logical decisions (AND, OR, NOT).
• Mathematical foundation for neural networks.

Disadvantages
• Only works for binary inputs/outputs.
• No learning capability – weights are fixed.
• Can only solve linearly separable problems (cannot solve XOR).

Real-Time Analogy
Think of a club bouncer:

• Inputs: your age, dress code, membership card.
• Each input has a certain importance (weight).
• The bouncer adds up your points.
• If your score ≥ threshold (say 5 points) → you get in (output 1).
• Else → you’re denied entry (output 0).

Numerical Example (AND Gate)


We want to implement an AND gate:

• Rule: Output = 1 only if both inputs are 1.

Step 1 – Set weights and threshold:

w1 = 1, w2 = 1, θ = 2

Step 2 – Test inputs:

x1  x2  Weighted Sum S  S ≥ θ?  Output
0   0   0 + 0 = 0       No      0
0   1   0 + 1 = 1       No      0
1   0   1 + 0 = 1       No      0
1   1   1 + 1 = 2       Yes     1

Another Example (OR Gate)


• Rule: Output = 1 if at least one input is 1.

Set:

w1 = 1, w2 = 1, θ = 1

x1  x2  Weighted Sum S  S ≥ θ?  Output
0   0   0               No      0
0   1   1               Yes     1
1   0   1               Yes     1
1   1   2               Yes     1
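Both gates can be checked with a small Python sketch of the MCP unit (a minimal illustration; the function name is ours):

def mcp_neuron(inputs, weights, theta):
    # McCulloch-Pitts unit: weighted sum compared against a fixed threshold.
    s = sum(x * w for x, w in zip(inputs, weights))
    return 1 if s >= theta else 0

# AND: w1 = w2 = 1, theta = 2.  OR: w1 = w2 = 1, theta = 1.
for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2,
              "AND:", mcp_neuron([x1, x2], [1, 1], theta=2),
              "OR:",  mcp_neuron([x1, x2], [1, 1], theta=1))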

Key Formula

y = 1 if S = (x1 × w1) + (x2 × w2) + … + (xn × wn) ≥ θ, otherwise y = 0

Where:

• xi = input values
• wi = corresponding weights
• θ = threshold value
• y = output (1 or 0)

Threshold Logic
Threshold logic is a decision-making method used in early artificial neurons, such as
the McCulloch–Pitts (MCP) model and perceptrons. In this approach, the neuron
calculates the weighted sum of its inputs and compares it to a fixed cut-off value called
the threshold. If the sum is greater than or equal to the threshold, the neuron outputs 1
(meaning “Yes,” “True,” or “Fire”). If the sum is less than the threshold, the neuron
outputs 0 (meaning “No,” “False,” or “Don’t Fire”).

This method produces a binary output instead of a continuous value, closely mimicking
the “all-or-nothing” firing behavior of biological neurons. Threshold logic was one of the
earliest techniques that allowed artificial neurons to replicate basic human brain
decision-making.

Threshold logic is important because it forms the foundation of logic gate implementation (such as AND, OR, and NOT) inside neural networks and is widely used in simple binary classification tasks. While it is less common in modern deep learning models, it remains essential for understanding how early AI systems worked.

In practice, threshold logic was used in:


• Early AI models and perceptrons.
• Simulation of logical functions like AND, OR, and NOT.
• Basic pattern recognition where only a yes/no decision is needed.
• Simple control systems – for example, deciding whether a machine should stop or continue running based on certain sensor readings.

How Threshold Logic Works


Step-by-step:

1. Inputs x1, x2, ..., xn come into the unit.
2. Each input is multiplied by its weight w1, w2, ..., wn.
3. Compute the weighted sum:

S = (x1 × w1) + (x2 × w2) + … + (xn × wn)

4. Compare S to the threshold θ:
   o If S ≥ θ, output = 1.
   o Else, output = 0.

Formula

y = 1 if S ≥ θ, else y = 0

Diagram
Inputs Weights Summation Threshold Decision Output
x1 ----w1----\
x2 ----w2----- Σ ---- Compare with θ -----> 1 (fire) or 0 (no fire)
x3 ----w3----/

Visual:

x1 → [× w1] \
x2 → [× w2] >-- [ SUM + Bias ] --[ Compare with θ ]--> Output y
x3 → [× w3] /

Advantages
• Very simple and fast for binary decisions.
• Good for implementing logic gates.
• Computationally inexpensive.

Disadvantages
• Only outputs binary values – can’t give probabilities or confidence levels.
• Cannot solve problems where a smooth decision boundary is needed.
• Fails for non-linearly separable problems (like XOR).

Real-Time Analogy
Imagine an automatic entry gate:

• Inputs:
  o x1 = You have an ID card (1 if yes, 0 if no) – weight 5.
  o x2 = You entered a correct PIN (1 if yes, 0 if no) – weight 3.
• Threshold = 5.
• If your weighted score ≥ 5, the gate opens (output 1). Else, the gate stays closed (output 0).

Numerical Example (Implementing a Decision Rule)


Rule: A machine should turn ON if:

• Sensor 1 detects heat (weight 2)
• Sensor 2 detects movement (weight 3)
• Total score ≥ threshold 4

Step 1 – Parameters:

w1=2, w2=3, θ=4

Step 2 – Test cases:

Sensor 1 (x1)  Sensor 2 (x2)  Sum S      S ≥ θ?  Output
0              0              0          No      0
1              0              1×2 = 2    No      0
0              1              1×3 = 3    No      0
1              1              2 + 3 = 5  Yes     1
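The same decision rule as a quick Python check (a sketch of this sensor example, not production control code):

def machine_on(heat, movement, w1=2, w2=3, theta=4):
    # Threshold logic: weighted sensor score compared against the cut-off.
    return 1 if heat * w1 + movement * w2 >= theta else 0

for heat in (0, 1):
    for movement in (0, 1):
        print(f"sensors=({heat},{movement}) -> output {machine_on(heat, movement)}")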

Logical Gates with Threshold Logic


AND Gate Example

• Weights: w1 = 1, w2 = 1
• Threshold θ = 2

x1  x2  Sum  Output
0   0   0    0
0   1   1    0
1   0   1    0
1   1   2    1

Linear Perceptron
The Linear Perceptron is an improved version of the McCulloch–Pitts neuron that has
the ability to learn from data. It was introduced in 1958 by Frank Rosenblatt and
marked a major step forward in the development of artificial intelligence.

A linear perceptron is a computational model that works in five main steps:

1. It takes multiple input values.


2. Each input is multiplied by an adjustable weight that represents its importance.
3. A bias value is added to the sum to make the model more flexible.
4. The result is passed through an activation function (often a step function in the
basic perceptron).
5. The activation function produces the final output.

The key difference between the MCP neuron and the linear perceptron is that the MCP
has fixed weights, whereas the perceptron can adjust its weights automatically
during training using a learning algorithm. This ability to update weights allows the
perceptron to improve its performance over time.

The linear perceptron is important because it was the first model capable of learning
from data, making it the foundation of modern deep learning techniques. It can
successfully solve linearly separable problems, such as logical AND and OR gates,
where a single straight line can separate the classes.

In practice, linear perceptrons have been used for:

• Simple binary classification problems.
• Early image and speech recognition systems.
• Basic spam detection in emails.
• Quality control in manufacturing, such as deciding whether a product passes or fails inspection.

Although modern neural networks have moved beyond the limitations of the simple
perceptron, this model remains an essential concept for understanding how learning in
AI began.
Structure & Working
Step-by-step:

1. Inputs (x1,x2,...,xn) are fed into the perceptron.


2. Each input has a weight (w1,w2,...,wn).
3. The weighted sum is calculated:

z = (x1 × w1) + (x2 × w2) + … + (xn × wn) + b

4. z is passed through an activation function (commonly a step function for the linear perceptron).
5. Output = 1 or 0 (for binary classification).
6. If the output is wrong → update the weights using the Perceptron Learning Rule.

Perceptron Learning Rule

If the prediction y differs from the target t, the weights and bias are updated:

wi(new) = wi(old) + η × (t − y) × xi
b(new) = b(old) + η × (t − y)

where η is the learning rate.

Diagram
x1 ---- w1 ----\
x2 ---- w2 ----- SUM (+ b) ---> Activation ---> Output y
x3 ---- w3 ----/

Visual:

Inputs Weights Summation Activation Output


x1 → × w1 \
x2 → × w2 >-- [ Σ + b ] ---> Step function ---> 0 or 1
x3 → × w3 /

Advantages
• Can learn from examples.
• Simple to implement.
• Works well for linearly separable problems.
• Guaranteed to converge if the data is linearly separable.

Disadvantages
• Cannot solve non-linearly separable problems (like XOR).
• Only works for binary classification in its original form.
• Limited to straight-line decision boundaries.

Real-Life Analogy
Think of a job applicant screening system:

• Inputs:
  o x1 = Education score.
  o x2 = Work experience score.
• The system learns weights for each input:
  o Maybe experience is more important than education.
• Bias = baseline preference.
• After training on past hiring data, it predicts:
  o 1 → Accept, 0 → Reject.

Numerical Example
We will train a perceptron to learn the AND Gate.

Step 1 – Initial setup:

w1=0, w2=0, b=0, η=1

Step activation: y = 1 if z ≥ 0, else y = 0.

Step 2 – Training data:

x1  x2  Target t
0   0   0
0   1   0
1   0   0
1   1   1
Step 3 – Train epoch by epoch, updating the weights and bias after every wrong prediction.

Training Table (Epochs 1 and 2)

Epoch  Case (x1, x2)  Target (t)  z Value  Output (y)   Update Applied?  w1  w2  b
1      (0,0)          0           0        1 (wrong)    Yes              0   0   -1
1      (0,1)          0           -1       0 (correct)  No               0   0   -1
1      (1,0)          0           -1       0 (correct)  No               0   0   -1
1      (1,1)          1           -1       0 (wrong)    Yes              1   1   0
2      (0,0)          0           0        1 (wrong)    Yes              1   1   -1
2      (0,1)          0           0        1 (wrong)    Yes              1   0   -2
2      (1,0)          0           -1       0 (correct)  No               1   0   -2
2      (1,1)          1           -1       0 (wrong)    Yes              2   1   -1

Epoch 2 still contains mistakes, so training continues. Repeating the same updates for a few more epochs, the weights settle at w1 = 2, w2 = 1, b = −3, and every case is then classified correctly.

Decision rule: y = 1 if 2x1 + x2 ≥ 3, which outputs 1 only for (1,1) → AND gate learned.
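A small Python sketch of this training loop (the helper name is ours) reproduces the run above:

def train_perceptron(data, lr=1, max_epochs=20):
    # Perceptron Learning Rule: w_i += lr*(t - y)*x_i, b += lr*(t - y).
    w, b = [0, 0], 0
    for _ in range(max_epochs):
        errors = 0
        for (x1, x2), t in data:
            y = 1 if x1 * w[0] + x2 * w[1] + b >= 0 else 0   # step activation
            if y != t:                       # wrong prediction -> update
                w[0] += lr * (t - y) * x1
                w[1] += lr * (t - y) * x2
                b += lr * (t - y)
                errors += 1
        if errors == 0:                      # a full pass with no mistakes
            break
    return w, b

AND_data = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
print(train_perceptron(AND_data))            # ([2, 1], -3) -> rule 2*x1 + x2 >= 3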

6. Perceptron Learning Algorithm (PLA)


The Perceptron Learning Algorithm (PLA) is the step-by-step procedure used to train
a perceptron so that it can correctly classify training examples. The algorithm works by
adjusting the weights and bias of the perceptron whenever it makes a wrong prediction.
Over time, these adjustments improve the perceptron’s ability to make accurate
decisions.

PLA works in an iterative manner. It goes through the training dataset multiple times,
updating the parameters whenever an error occurs. The process continues until:

1. The perceptron gives the correct output for all training data, or
2. A maximum number of iterations (epochs) is reached.

The main idea is simple — if the perceptron predicts correctly, no change is made. If the
prediction is wrong, the weights and bias are updated in the direction that would have
made the prediction correct. This allows the perceptron to learn from mistakes and
improve over time.

The PLA is important because it was the first supervised learning algorithm designed
for binary classification. Without this algorithm, the perceptron would remain static
and never improve its performance. PLA also forms the foundation for training in more
complex models like multi-layer perceptrons and deep neural networks.

In real-world applications, the perceptron learning algorithm has been used for:

• Binary classification problems like AND and OR logic gates.
• Early pattern recognition tasks such as handwritten character recognition (basic forms).
• Simple voice command recognition systems.
• Quality inspection in manufacturing, where products are classified as pass or fail.

Although PLA works only for problems that are linearly separable, it remains an
important milestone in the history of machine learning.
Step-by-Step Working of PLA
Initialization

• Choose:
  o Initial weights (often zeros or small random values).
  o Initial bias.
  o Learning rate η (0 < η ≤ 1).

Diagram
Inputs → Weights → Weighted Sum → Activation → Output

Weight Update Rule

wi(new) = wi(old) + η × (t − y) × xi
b(new) = b(old) + η × (t − y)
Visual Representation:

+-------------+
x1 -----> | |
x2 -----> | Weighted | ---> Step ---> y
... | Sum (z) |
+-------------+

(Update weights if wrong)
Advantages
• Simple to implement.
• Guaranteed to find a solution if the data is linearly separable.
• Fast for small datasets.

Disadvantages
• Only works for linearly separable problems.
• Fails for XOR-type problems.
• Does not give probability scores – only 0 or 1.

Real-Life Analogy
Imagine a teacher grading essays:

• Teacher has a rubric (weights for grammar, creativity, structure).
• First grading might be inconsistent.
• After feedback from the head teacher (target answer), the teacher adjusts weights to match the expected grading.
• After several essays (iterations), grading matches the desired standard – the teacher has “learned”.

Numerical Example
We’ll train perceptron to learn OR Gate.

Step 1 – Setup

w1=0, w2=0, b=0, η=1

Activation: Step function (output 1 if z ≥ 0, else 0)

Step 2 – Training Data

x1  x2  Target t
0   0   0
0   1   1
1   0   1
1   1   1
Step 3 – Train epoch by epoch. The table below combines Epoch 1 and Epoch 2.

📊 Perceptron Training Table

Epoch  Case (x1, x2)  Target (t)  z Value  Output (y)  Correct?  w1  w2  b
1      (0,0)          0           0        1           ❌        0   0   -1
1      (0,1)          1           -1       0           ❌        0   1   0
1      (1,0)          1           0        1           ✅        0   1   0
1      (1,1)          1           1        1           ✅        0   1   0
2      (0,0)          0           0        1           ❌        0   1   -1
2      (0,1)          1           0        1           ✅        0   1   -1
2      (1,0)          1           -1       0           ❌        1   1   0
2      (1,1)          1           2        1           ✅        1   1   0

One more epoch fixes the remaining (0,0) error; the perceptron converges to w1 = 1, w2 = 1, b = −1, i.e., y = 1 if x1 + x2 ≥ 1 → OR gate learned.
Linear Separability
Linear separability is a property of a dataset that describes whether its different classes
can be separated by a straight boundary. In two dimensions (2D), this boundary is a
straight line; in three dimensions (3D), it is a plane; and in higher dimensions, it is
called a hyperplane.

In simple terms, a dataset is linearly separable if you can draw a single straight line (or
plane) that perfectly separates all the points of one class from all the points of another
class without making any classification errors. If such a line exists, the two classes can
be completely distinguished using a linear decision boundary.

For example:

• The AND and OR logic gates are linearly separable because we can draw a single straight line that separates outputs of 0 from outputs of 1.
• The XOR logic gate is not linearly separable because no single straight line can separate its two classes.

Why it is important

Linear separability is crucial because the Perceptron Learning Algorithm (PLA) works
only when the dataset is linearly separable. If the dataset is not linearly separable (like
in the XOR problem), a single-layer perceptron will never converge to a perfect solution,
no matter how many times it is trained.

Understanding whether a dataset is linearly separable helps in choosing the right model:

• If the data is linearly separable → a single-layer perceptron is sufficient.
• If the data is not linearly separable → a multi-layer network with non-linear activation functions is needed.

Uses

Linear separability plays an important role in:

• Pattern classification problems where the features of different classes are clearly distinct.
• Quality control systems, where products are classified as pass/fail based on measurable features like size or weight.
• Binary classification tasks in early AI models, such as spam/not-spam classification (when the features are well-separated).

Visual Explanation
Case 1: Linearly Separable (AND Gate)

We can draw a single straight line that separates all 0 outputs from 1 outputs.

Case 2: Not Linearly Separable (XOR Gate)

No single straight line can separate the classes.


You’d need curved lines or multiple lines → which requires multi-layer perceptrons.

Mathematical Definition
A dataset {(xi, ti)} is linearly separable if there exist weights w and a bias b such that:

ti [ (w · xi) + b ] > 0 for all i

• ti = +1 for Class 1, −1 for Class 0.
• The equation w · x + b = 0 defines the separating hyperplane.

Real-Life Analogy
Think of sorting fruits by color:

• If you only have red apples and yellow bananas, you can separate them easily with one decision rule: “If color value ≥ X → Apple, else Banana.”
• That’s linearly separable.
• But if you have fruits that are mixed colors or overlap in shades, you can’t separate them perfectly with one straight rule – you’d need multiple conditions.

Numerical Example
Consider 2D data for AND gate:

x1  x2  Class
0   0   0
0   1   0
1   0   0
1   1   1

If we plot these points:

• Class 1 point = (1,1)
• Class 0 points = (0,0), (0,1), (1,0)

We can draw the line x1 + x2 = 1.5 that perfectly separates them:

• If x1 + x2 ≥ 1.5 → Class 1
• Else → Class 0

✅ This means linearly separable.
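A short Python check of this separating line against every point (illustrative only):

# Check every AND-gate point against the candidate line x1 + x2 = 1.5.
points = {(0, 0): 0, (0, 1): 0, (1, 0): 0, (1, 1): 1}
for (x1, x2), cls in points.items():
    side = 1 if x1 + x2 >= 1.5 else 0   # which side of the line the point falls on
    print((x1, x2), "class", cls, "line predicts", side,
          "OK" if side == cls else "MISMATCH")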

Checking Linear Separability


Steps to check:

1. Plot data points (for 2D/3D).


2. Try to draw a straight line (plane/hyperplane in higher dimensions) that separates
classes.
3. If all points of one class are on one side → linearly separable.
4. If overlap exists → not linearly separable.

Advantages of Linearly Separable Data

• Easy to classify with simple models (Perceptron, Linear SVM).
• Fast training.

Disadvantages
• Many real-world problems are not linearly separable.
• Cannot model complex decision boundaries.

Convergence Theorem for Perceptron Learning Algorithm
The Convergence Theorem for the Perceptron Learning Algorithm (PLA) is a
mathematical guarantee that states:

If the given dataset is linearly separable, the perceptron learning algorithm will
always find a set of weights and bias that perfectly classify the training
examples in a finite number of steps.

In simple words, if it’s possible to separate the classes with a straight line (or
hyperplane), PLA will definitely reach the correct solution and stop making mistakes.
This is important because it removes uncertainty — without this theorem, we wouldn’t
know whether PLA will ever succeed or keep running forever without finding a correct
answer.

The theorem tells us that:

• For linearly separable data, PLA will converge (stop training) after a finite number of updates.
• For non-linearly separable data, PLA will never converge – it will keep adjusting the weights endlessly without reaching a perfect solution.

Why it is important:

• It guarantees that PLA works for linearly separable problems.
• Without this proof, we wouldn’t be certain whether PLA will eventually find a solution.
• It helps in deciding when to use a perceptron and when to switch to a multi-layer network.

Uses:

While the theorem itself is mostly part of AI theory and mathematics, it has practical
implications:

• Ensures the reliability of PLA in early applications like OCR (Optical Character Recognition).
• Early speech classification systems that dealt with simple, linearly separable features.
• Any binary classification problem where the data can be separated by a straight line or hyperplane.

The Statement (Formal)


If:

1. The training dataset is finite.
2. The dataset is linearly separable.
3. The learning rate η > 0 is constant.

Then:

• The perceptron learning algorithm converges to a solution in a finite number of updates.

Key Assumptions
• The data must be linearly separable.
• The algorithm updates one sample at a time (online learning).
• The learning rate does not change.

Idea of Proof (Simple Version)


The proof has two parts:

1. Bounded weight growth


o Every time the algorithm updates weights, it moves in a direction that
increases the alignment with the correct separating hyperplane.
o Since data is separable, weights keep moving toward a fixed region.
2. Guaranteed separation
o Eventually, all points are classified correctly — so no more updates happen.
o This means the algorithm stops in finite steps.

Real-Life Analogy
Imagine teaching a child to sort red and blue balls:

• The rule is simple: if red → bucket A, if blue → bucket B.
• The child starts making mistakes, and you correct them (update weights).
• If the balls are clearly red or blue (linearly separable), the child will eventually get all of them right and stop making mistakes.
• If the colors are mixed (not separable), they will never get perfect accuracy – they’ll keep making mistakes forever.

Numerical Example
We’ll see the theorem in action using an AND gate.

Training data (linearly separable):

x1  x2  t
0   0   0
0   1   0
1   0   0
1   1   1

If we run PLA (like we did earlier):

• It will make a few mistakes in the beginning (wrong predictions → updates).
• After a finite number of updates, it reaches a set of separating weights, for example:

w1 = 1, w2 = 1, b = −1.5

From then on, all predictions are correct → the algorithm stops.

✅ This matches the theorem’s guarantee.
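A quick Python check that the quoted weights really classify every AND case, so no further updates would occur (a sketch of the convergence condition, not a proof):

# Verify the separating weights above on all four AND-gate cases.
w1, w2, b = 1, 1, -1.5
for (x1, x2), t in [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]:
    y = 1 if x1 * w1 + x2 * w2 + b >= 0 else 0
    assert y == t                      # every case correct -> PLA has converged
print("all four cases correct -> no more updates")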

Advantages of Convergence Theorem

• Gives confidence in using PLA for linearly separable problems.
• Helps decide whether a perceptron will solve a problem or not.
• Foundation for later theories in machine learning.

Limitations
• Does not apply to non-linearly separable problems (e.g., XOR).
• Does not say how many steps it will take – only that it will stop eventually.
• Only applies to single-layer perceptrons.

Key Takeaway
• If your data is linearly separable, PLA will converge in finite time.
• If your data is not linearly separable, it will never converge – you’ll need multi-layer networks.

7-Mark Questions (Only Questions)


1. Explain the structure and working of a McCulloch–Pitts neuron with an example
and diagram.
2. Describe threshold logic. How can it be used to implement basic logic gates? Give
suitable examples.
3. Explain the architecture, working, advantages, and disadvantages of a linear
perceptron.
4. Describe the Perceptron Learning Algorithm with step-by-step working and a
suitable numerical example.
5. What is linear separability? Explain with diagrams and examples of linearly
separable and non-separable problems.
6. State and explain the Convergence Theorem for the Perceptron Learning
Algorithm. Why is it important?
7. Compare and contrast McCulloch–Pitts neurons and Linear Perceptrons in terms
of learning ability, applications, and limitations.
8. Using threshold logic, design AND and OR gates. Show weights, thresholds, and
truth tables.

Important 2-Mark Questions


1. Define an artificial neuron.
2. Who proposed the McCulloch–Pitts neuron model and in which year?
3. What is meant by computational unit in neural networks?
4. State two functions of a computational unit.
5. Define threshold logic.
6. Write the mathematical expression for threshold logic.
7. Mention one advantage and one disadvantage of threshold logic.
8. What is a linear perceptron?
9. Who introduced the perceptron model?
10. What is the role of the activation function in a perceptron?
11. Write the perceptron learning rule formula.
12. What is the purpose of the learning rate in PLA?
13. Define linear separability.
14. Give one example of a linearly separable problem.
15. Give one example of a non-linearly separable problem.
16. State the Convergence Theorem for the Perceptron Learning Algorithm.
17. Mention one limitation of the Convergence Theorem.
18. What is the main difference between MCP and perceptron?
19. Name two real-life applications of perceptrons.
20. What is the output range of a perceptron with a step activation function?

----THE END----
