GANs and Deep Learning Overview

The document provides an overview of Generative Adversarial Networks (GANs) and their historical context within deep learning, detailing the evolution of neural networks from the 1940s to recent advancements. It explains the architecture of GANs, including the roles of the generator and discriminator, and highlights various types of GANs, such as Vanilla GANs and Conditional GANs, along with their applications in image synthesis and data augmentation. Additionally, the document discusses the advantages of GANs, including their ability to generate high-quality synthetic data and perform unsupervised learning.

Uploaded by kpbharath1425

Unit-2: Generative Adversarial Networks (GAN) and Semi-Supervised Learning

A Brief History of Deep Learning

• Deep learning is a branch of machine learning that uses multiple layers of algorithms (neural networks) to process data, imitate aspects of the human thinking process, and develop abstractions.
• It is often used to recognize objects visually and to understand human speech. Information is passed through each layer, with the output of the previous layer providing the input for the next layer.
• The first layer in a network is called the input layer, and the last is called the output layer.
• All the layers between the input and output layers are referred to as hidden layers. Each layer is typically a simple, uniform algorithm containing one kind of activation function.

• Feature extraction is another aspect of deep learning. It is used for pattern recognition and image processing.
• Feature extraction uses an algorithm to automatically construct meaningful “features” of the data for the purposes of training, learning, and understanding.

Early Beginnings (1940s - 1960s)

1943: The journey began with Warren McCulloch and Walter Pitts' model of artificial
neurons, the McCulloch-Pitts neuron, which laid the foundation for neural network theory.
1957: Frank Rosenblatt introduced the Perceptron, an early neural network model capable of
learning and recognizing patterns.
The Winter of AI (1970s - 1980s)
Despite early enthusiasm, neural networks faced challenges, including computational limitations and the lack of an effective method for training multi-layer networks, which led to reduced interest in the field, a period known as the "AI winter."
1974: Paul Werbos developed backpropagation, a key algorithm for training neural
networks, but it remained largely unnoticed until the mid-1980s.
Revival and Growth (1980s - 1990s)
1986: Geoffrey Hinton, David Rumelhart, and Ronald Williams popularized backpropagation,
reviving interest in neural networks.
1989: Yann LeCun applied backpropagation to handwritten digit recognition, leading to the
development of Convolutional Neural Networks (CNNs).
The Emergence of Deep Learning (2000s)
2006: Hinton and his colleagues introduced deep belief networks (DBNs), a development often regarded as the beginning of modern deep learning.
2009: Fei-Fei Li's ImageNet project provided a large-scale dataset for training deep learning
models, fueling advancements in computer vision.
Breakthroughs and Dominance (2010s)
2012: Alex Krizhevsky, Ilya Sutskever, and Hinton won the ImageNet competition with
AlexNet, a deep CNN, demonstrating the power of deep learning in image recognition.
2014: The introduction of Generative Adversarial Networks (GANs) by Ian Goodfellow
opened new possibilities in generative modeling.
2015-2016: The release of frameworks such as TensorFlow (2015) and PyTorch (2016) made deep learning more accessible to researchers and practitioners.
2016: Google DeepMind's AlphaGo defeated Lee Sedol, one of the world's top Go players, showcasing deep learning's potential in complex strategy games.

Recent Advances and Future Directions (2020s)

2020: OpenAI's GPT-3, a language model with 175 billion parameters, demonstrated the
capabilities of deep learning in natural language processing.
Ongoing Research: Deep learning continues to evolve with advancements in areas like
reinforcement learning, unsupervised learning, and multimodal learning.

Convolutional Neural Networks (CNN)

• A Convolutional Neural Network (CNN) is a type of deep learning neural network architecture commonly used in computer vision.
• Computer vision is a field of artificial intelligence that enables a computer to understand and interpret images and other visual data.
• A CNN is an extended version of the artificial neural network (ANN) that is predominantly used to extract features from grid-like data, such as the pixel matrix of an image.
CNN Architecture

• The convolutional layer applies filters to the input image to extract features, the pooling layer downsamples the image to reduce computation, and the fully connected layer makes the final prediction.
• The network learns the optimal filters through backpropagation and gradient descent.

How Do Convolutional Layers Work?

• Convolutional neural networks, or convnets, are neural networks that share their parameters. Imagine you have an image.
• It can be represented as a cuboid with a length and a width (the dimensions of the image) and a height (the channels, since images generally have red, green, and blue channels).

• Now imagine taking a small patch of this image and running a small neural network, called a filter or kernel, on it, with, say, K outputs, and representing them vertically.
• Now slide that filter across the whole image. As a result, we get another image with a possibly different width and height and a different depth.
• Instead of just the R, G, and B channels, we now have K channels, while the width and height stay the same or shrink, depending on the padding and stride. This operation is called convolution.
• If the patch size is the same as that of the image, the operation becomes a regular (fully connected) neural network. Because the patch is small and its weights are shared across all positions, we have far fewer weights.
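To make the sliding-filter idea concrete, here is a minimal NumPy sketch, assuming a single-channel image, one 3×3 filter, stride 1 and no padding. Like most deep learning libraries, it computes a cross-correlation (the filter is not flipped). The function name conv2d_single and the example values are illustrative, not part of any library.

```python
import numpy as np

def conv2d_single(image, kernel):
    """Slide `kernel` over `image` (stride 1, no padding) and take a dot product at each position."""
    ih, iw = image.shape
    kh, kw = kernel.shape
    out = np.zeros((ih - kh + 1, iw - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            patch = image[i:i + kh, j:j + kw]      # small patch currently under the filter
            out[i, j] = np.sum(patch * kernel)     # dot product of patch and filter weights
    return out

image = np.arange(25, dtype=float).reshape(5, 5)   # toy 5x5 "image"
kernel = np.array([[1.0, 0.0, -1.0],
                   [1.0, 0.0, -1.0],
                   [1.0, 0.0, -1.0]])              # vertical-edge filter (illustrative)
feature_map = conv2d_single(image, kernel)
print(feature_map.shape)  # (3, 3)

# Weight sharing: the same 9 weights are reused at every position, whereas a fully
# connected layer mapping the 25 inputs to the 9 outputs would need 25 * 9 = 225 weights.
print(kernel.size, image.size * feature_map.size)  # 9 225
```

Every entry of the feature map is −6 here, because in each row of the toy image the left column of a patch is 2 smaller than its right column, and the filter sums that difference over three rows.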

Layers Used to Build ConvNets

A complete convolutional neural network architecture is also known as a convnet. A convnet is a sequence of layers, and every layer transforms one volume into another through a differentiable function.

Types of layers:
Let’s take an example by running a convnet on an image of dimensions 32 x 32 x 3.

• Input Layer:
  • This is the layer through which we give input to the model. In a CNN, the input is generally an image or a sequence of images.
  • This layer holds the raw input image with width 32, height 32, and depth 3.
• Convolutional Layers:
  • This layer extracts features from the input dataset. It applies a set of learnable filters, known as kernels, to the input images.
  • The filters/kernels are small matrices, usually of shape 2×2, 3×3, or 5×5 (each extending through all the input channels). A filter slides over the input image and computes the dot product between the kernel weights and the corresponding input image patch.
  • The output of this layer is referred to as feature maps. If we use a total of 12 filters in this layer, with stride 1 and "same" zero padding so that the width and height are preserved, we get an output volume of dimensions 32 x 32 x 12.

• Activation Layer:
  • Activation layers add nonlinearity to the network by applying an activation function to the output of the preceding layer.
  • The activation function is applied element-wise to the output of the convolution layer.
  • Some common activation functions are ReLU: max(0, x), Tanh, and Leaky ReLU. The volume remains unchanged, so the output volume has dimensions 32 x 32 x 12.
• Pooling Layer:
  • This layer is periodically inserted into convnets. Its main function is to reduce the size of the volume, which makes computation faster, reduces memory use, and helps prevent overfitting.
  • Two common types of pooling layers are max pooling and average pooling. If we use max pooling with 2 x 2 filters and stride 2, the resulting volume has dimensions 16 x 16 x 12.
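The shape bookkeeping for the 32 x 32 x 3 example (a convolution with 12 filters, ReLU, then 2 x 2 max pooling with stride 2) can be checked with a small NumPy sketch. It assumes 3 x 3 filters with stride 1 and "same" zero padding; the filters are random rather than learned, the helper names are hypothetical, and real frameworks use much faster implementations.

```python
import numpy as np

rng = np.random.default_rng(0)

def conv2d_same(x, kernels):
    """x: (H, W, C_in); kernels: (k, k, C_in, C_out) with odd k. Stride 1, zero 'same' padding."""
    k = kernels.shape[0]
    pad = k // 2
    xp = np.pad(x, ((pad, pad), (pad, pad), (0, 0)))        # zero-pad height and width only
    H, W, _ = x.shape
    out = np.zeros((H, W, kernels.shape[3]))
    for i in range(H):
        for j in range(W):
            patch = xp[i:i + k, j:j + k, :]                   # (k, k, C_in) patch under the filters
            # One dot product per filter: sum over the k, k and C_in axes
            out[i, j, :] = np.tensordot(patch, kernels, axes=([0, 1, 2], [0, 1, 2]))
    return out

def relu(x):
    return np.maximum(0.0, x)                                # element-wise, so the shape is unchanged

def max_pool(x, size=2, stride=2):
    H, W, C = x.shape
    out_h, out_w = (H - size) // stride + 1, (W - size) // stride + 1
    out = np.zeros((out_h, out_w, C))
    for i in range(out_h):
        for j in range(out_w):
            window = x[i * stride:i * stride + size, j * stride:j * stride + size, :]
            out[i, j, :] = window.max(axis=(0, 1))            # largest value in each window, per channel
    return out

image = rng.random((32, 32, 3))                              # input layer: 32 x 32 x 3
kernels = rng.standard_normal((3, 3, 3, 12)) * 0.1           # 12 filters of size 3 x 3 x 3 (random, not learned)
conv_out = conv2d_same(image, kernels)
act_out = relu(conv_out)
pool_out = max_pool(act_out)
print(conv_out.shape, act_out.shape, pool_out.shape)  # (32, 32, 12) (32, 32, 12) (16, 16, 12)
```

Each filter spans all three input channels, which is why 12 filters produce exactly 12 output channels.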

• Flattening Layer: After the convolution and pooling layers, the resulting feature maps are flattened into a one-dimensional vector so they can be passed into a fully connected layer for classification or regression.
• Fully Connected Layers: These take the input from the previous layer and compute the final classification or regression output.
• Output Layer: The output from the fully connected layers is then passed through an activation function such as sigmoid (binary classification) or softmax (multi-class classification), which converts the output for each class into a probability score.
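The last three steps can be sketched the same way: flatten a pooled volume, apply one fully connected layer, and turn the scores into probabilities with softmax. The 16 x 16 x 12 input is random, the weights are untrained, and the choice of 10 classes is an assumption for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
num_classes = 10                                         # assumed number of classes

pooled = rng.random((16, 16, 12))                        # stand-in for the pooled 16 x 16 x 12 volume
flat = pooled.reshape(-1)                                # flattening layer: 16 * 16 * 12 = 3072 values
W = rng.standard_normal((flat.size, num_classes)) * 0.01 # fully connected weights (untrained)
b = np.zeros(num_classes)
logits = flat @ W + b                                    # fully connected layer: one raw score per class

def softmax(z):
    e = np.exp(z - z.max())                              # subtracting the max avoids overflow
    return e / e.sum()

probs = softmax(logits)                                  # output layer: class probabilities
print(flat.shape, probs.shape, round(float(probs.sum()), 6))  # (3072,) (10,) 1.0
```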

• A CNN (Convolutional Neural Network) is a feed-forward neural network, as information moves forward from one layer to the next. CNNs are also called ConvNets.
• A CNN consists of hidden layers with convolution and pooling operations, in addition to activation functions that introduce nonlinearity.
• CNNs are mainly used for image recognition.
• A CNN first learns to recognize the components of an image (e.g., lines, corners, curves, shapes, and textures) and then, in deeper layers, learns to combine these components (with pooling progressively reducing the spatial resolution) to recognize larger structures (e.g., faces and objects).

Generative Adversarial Network (GAN)

• Generative Adversarial Networks (GANs) were introduced by Ian Goodfellow and his colleagues in 2014.
• GANs are a class of neural networks that learn the patterns in the input data in order to generate new examples resembling the original dataset.
A GAN’s architecture consists of two networks:
• Generator: creates synthetic data from random noise, aiming to produce data so realistic that the discriminator cannot distinguish it from real data.
• Discriminator: acts as a critic, evaluating whether the data it receives is real or fake.

• During training, the generator improves its ability to create realistic data, while the discriminator becomes better at detecting fakes.
• Over time, this adversarial process can lead to the generation of highly realistic, high-quality data.
Detailed Architecture of GANs
Let’s explore the generator and discriminator model of GANs in detail:
1. Generator Model
The generator is a deep neural network that takes random noise as input to generate realistic
data samples (e.g., images or text). It learns the underlying data distribution by adjusting its
parameters through backpropagation.
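The generator’s objective is to produce samples that the discriminator classifies as real. A commonly used generator loss is:

$$J_G = -\frac{1}{m} \sum_{i=1}^{m} \log D(G(z_i))$$

where m is the number of generated samples, z_i is the i-th random noise vector, G(z_i) is the corresponding generated sample, and D(·) is the discriminator's estimated probability that its input is real.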

2. Discriminator Model
• The discriminator acts as a binary classifier, distinguishing between real and generated data.
• It improves its classification ability through training, refining its parameters to detect fake samples more accurately.
• When dealing with image data, the discriminator often employs convolutional layers or other architectures suited to the data type.
• These layers help extract features and enhance the model’s ability to differentiate between real and generated samples.
• The discriminator minimizes the negative log-likelihood of correctly classifying both generated and real samples.
• This loss encourages the discriminator to classify generated samples as fake and real samples as real:

$$J_D = -\frac{1}{m} \sum_{i=1}^{m} \log D(x_i) - \frac{1}{m} \sum_{i=1}^{m} \log\left(1 - D(G(z_i))\right)$$

where x_i are real samples from the training data and G(z_i) are generated samples.
• By minimizing this loss, the discriminator becomes more effective at distinguishing between real and generated samples.
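Minimax Loss
GANs follow a minimax optimization in which the generator and discriminator are adversaries:

$$\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{data}(x)}\left[\log D(x)\right] + \mathbb{E}_{z \sim p_z(z)}\left[\log\left(1 - D(G(z))\right)\right]$$

The generator aims to minimize this objective, while the discriminator tries to maximize it, which corresponds to maximizing its classification accuracy. In practice, the generator is usually trained to maximize log D(G(z)) instead of minimizing log(1 − D(G(z))), because this gives stronger gradients early in training, when the discriminator easily rejects generated samples.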
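The generator loss, discriminator loss, and minimax objective described in this section can be evaluated on a batch of discriminator outputs with a few NumPy functions. This is a minimal sketch: the function names and the example probabilities are hypothetical, and the small constant EPS is an assumption added only to avoid log(0).

```python
import numpy as np

EPS = 1e-12  # small constant that keeps log() away from log(0)

def discriminator_loss(d_real, d_fake):
    """J_D = -mean(log D(x)) - mean(log(1 - D(G(z))))."""
    return -np.mean(np.log(d_real + EPS)) - np.mean(np.log(1.0 - d_fake + EPS))

def generator_loss(d_fake):
    """J_G = -mean(log D(G(z))): low when the discriminator calls the fakes real."""
    return -np.mean(np.log(d_fake + EPS))

def value_function(d_real, d_fake):
    """Batch estimate of V(D, G) = E[log D(x)] + E[log(1 - D(G(z)))]."""
    return np.mean(np.log(d_real + EPS)) + np.mean(np.log(1.0 - d_fake + EPS))

# Hypothetical discriminator outputs: (D on real samples, D on generated samples)
confident_d = (np.array([0.9, 0.95, 0.85]), np.array([0.1, 0.05, 0.2]))
confused_d = (np.full(3, 0.5), np.full(3, 0.5))

print(discriminator_loss(*confident_d) < discriminator_loss(*confused_d))   # True
print(generator_loss(np.array([0.9])) < generator_loss(np.array([0.1])))    # True
print(round(value_function(*confused_d), 4), round(-np.log(4), 4))         # -1.3863 -1.3863
```

When the discriminator outputs 0.5 for every input (it can no longer tell real from fake), V(D, G) equals −log 4, which is the value at the theoretical equilibrium of the minimax game.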

How does a GAN work?

Let’s understand how the generator (G) and discriminator (D) compete to improve each other over time:
Generator’s First Move

• G takes a random noise vector as input. This noise vector contains random values and acts as the starting point for G’s creation process.
• Using its internal layers and learned patterns, G transforms the noise vector into a new data sample, such as a generated image.

Discriminator’s Turn
D receives two kinds of inputs:

• Real data samples from the training dataset.
• The data samples generated by G in the previous step.
D’s job is to analyze each input and determine whether it is real data or something G generated. It outputs a probability score between 0 and 1: a score close to 1 indicates the data is likely real, and a score close to 0 suggests it is fake.
Adversarial Learning

• If the discriminator correctly classifies real data as real and fake data as fake, it strengthens its ability slightly.
• If the generator successfully fools the discriminator, the generator is rewarded (its loss is low), while the discriminator is penalized (its loss is high).
Generator’s Improvement
Every time the discriminator misclassifies fake data as real, the generator learns and
improves. Over multiple iterations, the generator produces more convincing synthetic
samples.
Discriminator’s Adaptation
The discriminator continuously refines its ability to distinguish real from fake data. This
ongoing duel between the generator and discriminator enhances the overall model’s learning
process.
Training Progression

• As training continues, the generator becomes highly proficient at producing realistic data.
• Eventually, the discriminator struggles to distinguish real from fake (its outputs approach 0.5 for both), indicating that the GAN has reached a well-trained state.
• At this point, the generator can be used to generate high-quality synthetic data for various applications.
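The alternating turns described above can be sketched as a toy training loop in NumPy. Everything here is an assumption chosen for illustration: 1-D real data drawn from N(4, 1), a linear generator G(z) = a·z + b, a logistic discriminator D(x) = sigmoid(w·x + c), hand-derived gradients, and plain gradient descent. Each step first updates D with G held fixed, then updates G (using the loss −log D(G(z))) with the updated D held fixed. With models this simple the dynamics can oscillate, so treat it as an illustration of the update order, not as a working generative model.

```python
import numpy as np

rng = np.random.default_rng(0)
lr = 0.01                                    # learning rate (assumed)

def sigmoid(s):
    return 1.0 / (1.0 + np.exp(-s))

def d_loss(w, c, x_real, x_fake):
    """Discriminator loss: -mean(log D(x_real)) - mean(log(1 - D(x_fake)))."""
    return (-np.mean(np.log(sigmoid(w * x_real + c)))
            - np.mean(np.log(1.0 - sigmoid(w * x_fake + c))))

def g_loss(a, b, w, c, z):
    """Generator loss: -mean(log D(G(z))) with G(z) = a*z + b."""
    return -np.mean(np.log(sigmoid(w * (a * z + b) + c)))

def train_step(a, b, w, c, x_real, z):
    # Discriminator's turn: one gradient-descent step on d_loss, generator fixed
    x_fake = a * z + b
    p_real, p_fake = sigmoid(w * x_real + c), sigmoid(w * x_fake + c)
    grad_w = np.mean((p_real - 1.0) * x_real) + np.mean(p_fake * x_fake)
    grad_c = np.mean(p_real - 1.0) + np.mean(p_fake)
    w, c = w - lr * grad_w, c - lr * grad_c
    # Generator's turn: one gradient-descent step on g_loss, updated discriminator fixed
    p_fake = sigmoid(w * (a * z + b) + c)
    grad_a = np.mean((p_fake - 1.0) * w * z)
    grad_b = np.mean((p_fake - 1.0) * w)
    a, b = a - lr * grad_a, b - lr * grad_b
    return a, b, w, c

# One step on a fixed batch, keeping the old parameters for comparison
params0 = (1.0, 0.0, 0.1, 0.0)               # initial a, b, w, c (assumed)
x_real0 = rng.normal(4.0, 1.0, size=256)     # real samples
z0 = rng.normal(size=256)                    # noise vectors
params1 = train_step(*params0, x_real0, z0)

# Many alternating steps with fresh batches
a, b, w, c = params0
for _ in range(2000):
    a, b, w, c = train_step(a, b, w, c, rng.normal(4.0, 1.0, size=64), rng.normal(size=64))
print(f"mean of generated samples after training: {np.mean(a * rng.normal(size=1000) + b):.2f}")
```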

Audio-Visual Speech Recognition using GAN

Types of GANs

• Vanilla GAN:
Vanilla GAN is the simplest type of GAN. It consists of:

• A generator and a discriminator, both built using multi-layer perceptrons (MLPs).
• The model optimizes its mathematical formulation (the minimax objective) using stochastic gradient descent (SGD).
• While Vanilla GANs serve as the foundation for more advanced GAN models, they often struggle with issues such as mode collapse (the generator producing only a few kinds of outputs) and unstable training.

• Conditional GAN (CGAN)

• Conditional GANs (CGANs) introduce an additional conditional parameter to guide the generation process.
• Instead of generating data randomly, CGANs allow the model to produce specific types of outputs.
Working of CGANs:

• A conditional variable (y), such as a class label, is fed into both the generator and the discriminator.
• This ensures that the generator creates data corresponding to the given condition (e.g., generating images of specific objects).
• The discriminator also receives the labels, which helps it judge not only whether a sample is real or fake but also whether it matches its label.
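One common way to feed the condition y into both networks is to concatenate a one-hot encoding of the label with the network's usual input. The NumPy sketch below only builds those inputs; the noise size of 100, the 10 classes, the 28 x 28 image size, and the random stand-in for generated images are all assumptions, and other conditioning schemes (for example, learned label embeddings) are also used in practice.

```python
import numpy as np

rng = np.random.default_rng(0)
noise_dim, num_classes = 100, 10                   # assumed sizes

def one_hot(labels, num_classes):
    return np.eye(num_classes)[labels]             # row i is the one-hot vector of labels[i]

labels = np.array([3, 7])                          # condition y: "generate class 3" and "generate class 7"
z = rng.standard_normal((len(labels), noise_dim))  # random noise vectors
generator_input = np.concatenate([z, one_hot(labels, num_classes)], axis=1)

# The discriminator sees the same label next to each (flattened) real or generated sample
fake_images = rng.random((len(labels), 28 * 28))   # stand-in for the generator's output
discriminator_input = np.concatenate([fake_images, one_hot(labels, num_classes)], axis=1)
print(generator_input.shape, discriminator_input.shape)  # (2, 110) (2, 794)
```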
• Deep Convolutional GAN (DCGAN)
Deep Convolutional GANs (DCGANs) are among the most popular and widely used types of GANs, particularly for image generation.

What Makes DCGAN Special?

• Uses convolutional neural networks (CNNs) instead of simple multi-layer perceptrons (MLPs).
• Max pooling layers are replaced with strided convolutions (and, in the generator, fractionally-strided or transposed convolutions), so the network learns its own downsampling and upsampling.
• Fully connected hidden layers are removed, allowing for better spatial understanding of images.
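The effect of replacing pooling with strides can be seen from the standard output-size formulas for strided and transposed convolutions. The sketch below assumes 4 x 4 kernels, stride 2 and padding 1 (a common choice in DCGAN-style models) and a 64 x 64 image; it only tracks spatial sizes, not channels.

```python
def conv_out(n, k, s, p):
    """Output size of a strided convolution: floor((n + 2p - k) / s) + 1."""
    return (n + 2 * p - k) // s + 1

def conv_transpose_out(n, k, s, p):
    """Output size of a transposed (fractionally-strided) convolution: (n - 1) * s - 2p + k."""
    return (n - 1) * s - 2 * p + k

# Discriminator: each stride-2 convolution halves the resolution, with no pooling layer
size, d_sizes = 64, []
for _ in range(4):
    size = conv_out(size, k=4, s=2, p=1)
    d_sizes.append(size)
print(d_sizes)  # [32, 16, 8, 4]

# Generator: each stride-2 transposed convolution doubles it, from a 4x4 map up to 64x64
size, g_sizes = 4, []
for _ in range(4):
    size = conv_transpose_out(size, k=4, s=2, p=1)
    g_sizes.append(size)
print(g_sizes)  # [8, 16, 32, 64]
```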

Application Of Generative Adversarial Networks (GANs)

• Image Synthesis & Generation: GANs generate realistic images, avatars, and high-resolution visuals by learning patterns from training data. They are widely used in art, gaming, and AI-driven design.
• Image-to-Image Translation: GANs can transform images between domains while preserving key features. Examples include converting day images to night, sketches to realistic images, or changing artistic styles.
• Text-to-Image Synthesis: GANs create visuals from textual descriptions, enabling applications in AI-generated art, automated design, and content creation.
• Data Augmentation: GANs generate synthetic data to improve machine learning models, making them more robust and generalizable, especially in fields with limited labeled data.
• High-Resolution Image Enhancement: GANs upscale low-resolution images, improving clarity for applications like medical imaging, satellite imagery, and video enhancement.
Advantages of GAN
The advantages of the GANs are as follows:

• Synthetic data generation: GANs can generate new, synthetic data that resembles some known data distribution, which can be useful for data augmentation, anomaly detection, or creative applications.
• High-quality results: GANs can produce high-quality, photorealistic results in image synthesis, video synthesis, music synthesis, and other tasks.
• Unsupervised learning: GANs can be trained without labelled data, making them suitable for unsupervised learning tasks where labelled data is scarce or difficult to obtain.
• Versatility: GANs can be applied to a wide range of tasks, including image synthesis, text-to-image synthesis, image-to-image translation, anomaly detection, data augmentation, and others.

Shallow Neural Networks

• A shallow neural network is a neural network that has only one hidden layer between the input and output layers.
• This structure is simpler than that of deep neural networks, which have multiple hidden layers.
• Despite their simplicity, shallow networks are powerful: given enough neurons in the hidden layer, they can approximate any continuous function on a bounded, closed input domain to arbitrary accuracy, a property known as the universal approximation theorem.
Components of a Shallow Neural Network
Input Layer: This is where the network receives its input data. Each neuron in this layer
represents a feature of the input dataset.
Hidden Layer: The single hidden layer in a shallow network transforms the inputs into
something that the output layer can use. The neurons in this layer apply a set of weights to
the inputs and pass them through an activation function to introduce non-linearity to the
process.
Output Layer: The final layer produces the output of the network. For regression tasks, this
might be a single neuron; for classification, it could be multiple neurons corresponding to the
classes.

How Do Shallow Neural Networks Work?

The functionality of shallow neural networks hinges on the transformation of inputs through
the hidden layer to produce outputs. Here's a step-by-step breakdown:

• Weighted Sum: Each neuron in the hidden layer calculates a weighted sum of the inputs (plus a bias).
• Activation Function: The weighted sums are passed through an activation function (such as Sigmoid, Tanh, or ReLU) to introduce non-linearity, enabling the network to learn complex patterns.
• Output Generation: The output layer integrates the signals from the hidden layer, often through another set of weights, to produce the final output.
Training Shallow Neural Networks
Training a shallow neural network typically involves:

• Forward Propagation: Calculating the output for a given input by passing it through the layers of the network.
• Loss Calculation: Determining how far the network's output is from the desired output using a loss function.
• Backpropagation: Calculating the gradient of the loss function with respect to each weight in the network, which indicates how the weights should be adjusted to minimize the loss.
• Weight Update: Adjusting the weights using an optimization algorithm such as gradient descent.
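The four training steps can be written out explicitly for a shallow network in plain NumPy, which shows what a framework such as Keras does behind the scenes. This is a minimal sketch under stated assumptions: the XOR dataset, 8 tanh hidden units, a sigmoid output with binary cross-entropy, a learning rate of 0.5, and 5000 full-batch gradient-descent epochs are all illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)

# XOR: a small problem that one hidden layer can solve but a linear model cannot
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

n_hidden, lr = 8, 0.5                              # assumed hyperparameters
W1 = rng.standard_normal((2, n_hidden))
b1 = np.zeros(n_hidden)
W2 = rng.standard_normal((n_hidden, 1))
b2 = np.zeros(1)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

losses = []
for epoch in range(5000):
    # 1. Forward propagation: weighted sum + tanh in the hidden layer, sigmoid output
    h = np.tanh(X @ W1 + b1)
    p = sigmoid(h @ W2 + b2)
    # 2. Loss calculation: binary cross-entropy
    loss = -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))
    losses.append(loss)
    # 3. Backpropagation: chain rule from the loss back to every weight
    dz2 = (p - y) / len(X)                         # dL/d(output pre-activation) for sigmoid + BCE
    dW2, db2 = h.T @ dz2, dz2.sum(axis=0)
    dz1 = (dz2 @ W2.T) * (1 - h ** 2)              # tanh'(a) = 1 - tanh(a)^2
    dW1, db1 = X.T @ dz1, dz1.sum(axis=0)
    # 4. Weight update: plain gradient descent
    W1 -= lr * dW1
    b1 -= lr * db1
    W2 -= lr * dW2
    b2 -= lr * db2

print(f"loss: {losses[0]:.3f} -> {losses[-1]:.3f}")
print("predictions:", (p > 0.5).astype(int).ravel())
```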

Training Shallow Neural Network for Binary Classification

• Step 1: Importing Libraries
• Step 2: Generating and Pre-processing Data
• Step 3: Building the Shallow Neural Network
• Step 4: Training the Model
• Step 5: Evaluating the Model
Code:
# Import necessary libraries
import numpy as np
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.optimizers import Adam

# Generate synthetic data

X, y = make_moons(n_samples=1000, noise=0.2, random_state=42)

# Split the dataset into training and test sets

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Scale the features

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Initialize the model

model = Sequential()

# Add the hidden layer with 10 neurons and ReLU activation function
model.add(Dense(10, input_shape=(2,), activation='relu'))

# Add the output layer with sigmoid activation function for binary classification
model.add(Dense(1, activation='sigmoid'))

# Compile the model with Adam optimizer, binary cross-entropy loss, and accuracy metric
model.compile(optimizer=Adam(learning_rate=0.01), loss='binary_crossentropy',
              metrics=['accuracy'])

# Train the model

history = model.fit(X_train_scaled, y_train, epochs=100, verbose=1,
validation_data=(X_test_scaled, y_test))

# Evaluate the model on test data

results = model.evaluate(X_test_scaled, y_test)
print(f"Test Loss: {results[0]}, Test Accuracy: {results[1]}")

Advantages of Shallow Neural Networks

• Simplicity: Easier to set up and train, requiring fewer computational resources than deep neural networks.
• Speed: Faster training times due to fewer parameters and lower computational complexity.
• Less Prone to Overfitting: With fewer layers and weights, they can generalize better to new data, provided they are adequately trained.
• Good for Small Datasets: Effective in situations where the volume of data is limited and deep networks might overfit.

Limitations of Shallow Neural Networks

• Limited Complexity: May not capture complex patterns as effectively as deeper networks, particularly in large or high-dimensional datasets.
• Less Flexibility: Often outperformed by deep networks in tasks involving high levels of abstraction, such as image and speech recognition.

Applications of Shallow Neural Networks

Shallow neural networks are particularly useful in scenarios where simplicity and speed are more critical than capturing complex relationships. They are commonly used in:

• Binary Classification Tasks: Simple decision boundaries can be learned effectively by shallow networks.
• Baseline Models: Shallow networks can efficiently provide quick initial assessments for machine learning tasks.
• Small-scale Regression: Modeling relationships in small or medium-sized datasets where deep networks might overfit.

Difference between Deep Nets and Shallow Networks

| Shallow neural networks | Deep neural networks |
|---|---|
| Few layers (usually 1 hidden layer) | Many layers (multiple hidden layers) |
| Low complexity | High complexity |
| Limited learning capacity | Higher learning capacity |
| Lower risk of overfitting | Higher risk of overfitting |
| Requires less data | Requires more data for effective training |
| Requires fewer computational resources | Requires more computational resources (e.g., GPUs) |
| Examples: single-layer perceptron, logistic regression | Examples: Convolutional Neural Networks (CNNs), Recurrent Neural Networks (RNNs) |

A Probabilistic Theory of Deep Learning

• By introducing probability into a deep learning system, we introduce a kind of common sense into the system: the ability to reason under uncertainty.
• Otherwise, the system would be very brittle and not very useful. In deep learning, several models, such as Bayesian models, probabilistic graphical models, and hidden Markov models, are used; they depend entirely on probability concepts.
• Real-world data is chaotic. Since deep learning systems use real-world data, they require a tool to handle this uncertainty.
• It is often more practical to use a simple but uncertain system than a complex, certain, but brittle one.
• For instance, visual object recognition involves unknown object position, orientation, and scale, while speech recognition involves unknown pronunciation, pitch, and speed.
Significance of Probabilistic theory

Probabilistic theory plays a fundamental role in deep learning by providing a mathematical framework to model uncertainty, optimize learning algorithms, and interpret predictions. Here are some key aspects of its significance:

1. Bayesian Inference and Uncertainty Estimation

• Deep learning models often make predictions in uncertain environments. Probability theory allows us to quantify this uncertainty using Bayesian inference.
• Bayesian neural networks place probability distributions over their weights, which lets them express uncertainty in their predictions.

2. Loss Functions and Optimization

• Many loss functions in deep learning are derived from probabilistic principles; for example, cross-entropy loss is the negative log-likelihood of the labels under the model's predicted distribution.
• Probabilistic frameworks such as Maximum Likelihood Estimation (MLE) and Maximum A Posteriori (MAP) estimation guide model training.
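The link between cross-entropy and likelihood can be checked numerically: for binary labels, the mean binary cross-entropy equals the mean negative log-likelihood of the labels under a Bernoulli model with the predicted probabilities, so minimizing it is maximum likelihood estimation. The labels and probabilities below are made-up values.

```python
import numpy as np

y_true = np.array([1, 0, 1, 1, 0])               # observed binary labels
p_pred = np.array([0.9, 0.2, 0.7, 0.6, 0.1])     # model's predicted P(y = 1), illustrative values

# Likelihood of the labels under independent Bernoulli(p) predictions
likelihood = np.prod(p_pred ** y_true * (1 - p_pred) ** (1 - y_true))
nll = -np.log(likelihood) / len(y_true)          # mean negative log-likelihood

# Binary cross-entropy written the usual way
bce = -np.mean(y_true * np.log(p_pred) + (1 - y_true) * np.log(1 - p_pred))

print(np.isclose(nll, bce))  # True
```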

3. Generative Models
• Generative models such as Variational Autoencoders (VAEs) and Generative Adversarial Networks (GANs) rely heavily on probability distributions to generate new data samples.
• VAEs use latent variable models with probabilistic encoding and decoding.

4. Regularization Techniques
• Dropout, a common regularization technique, can be interpreted as an approximation to Bayesian inference, in which a probability distribution over the model parameters is considered.
• L1 and L2 regularization are linked to probabilistic priors in Bayesian modelling: under MAP estimation, an L2 penalty corresponds to a Gaussian prior on the weights and an L1 penalty to a Laplace prior.
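The Bayesian reading of dropout is often used in practice as Monte Carlo (MC) dropout: dropout is kept active at prediction time, the same input is passed through the network many times, and the spread of the outputs serves as a rough uncertainty estimate. The sketch below assumes a tiny untrained network with random weights, a dropout rate of 0.5, and 200 passes; it illustrates the procedure only and does not produce meaningful predictions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Tiny untrained network (random weights): 4 inputs -> 16 ReLU hidden units -> 1 output
W1 = rng.standard_normal((4, 16)) * 0.5
W2 = rng.standard_normal((16, 1)) * 0.5

def predict_with_dropout(x, p_drop=0.5):
    h = np.maximum(0.0, x @ W1)                      # hidden layer with ReLU
    keep = rng.random(h.shape) >= p_drop             # keep each unit with probability 1 - p_drop
    h = h * keep / (1.0 - p_drop)                    # inverted dropout: rescale the kept units
    return (h @ W2).ravel()

x = rng.standard_normal((1, 4))                      # a single input example
samples = np.array([predict_with_dropout(x) for _ in range(200)])  # 200 stochastic passes

mean, std = samples.mean(), samples.std()
print(f"predictive mean {mean:.3f}, spread (std) {std:.3f}")
```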

5. Markov Processes and Sequential Models

- Probabilistic models

Backpropagation in Neural Networks

Backpropagation, also known as "backward propagation of errors," is a method used to train neural networks. Its goal is to reduce the difference between the model’s predicted output and the actual output by adjusting the weights and biases in the network.

What is Backpropagation?

• Backpropagation is a technique used in deep learning to train artificial neural networks, particularly feed-forward networks. It works iteratively to adjust the weights and biases to minimize the cost function.
• Backpropagation computes the gradient of the loss with respect to every parameter; an optimization algorithm such as gradient descent or stochastic gradient descent then uses these gradients to update the parameters, reducing the loss over successive iterations (epochs).
• The algorithm computes the gradient using the chain rule from calculus, which allows it to work efficiently through the many layers of a neural network to minimize the cost function.

Backpropagation plays a critical role in how neural networks improve over time.
• Efficient Weight Update: It computes the gradient of the loss function with respect to each weight using the chain rule, making it possible to update the weights efficiently.
• Scalability: The backpropagation algorithm scales well to networks with multiple layers and complex architectures, making deep learning feasible.
• Automated Learning: With backpropagation, the learning process becomes automated, and the model can adjust itself to optimize its performance.

Working of Backpropagation Algorithm

The Backpropagation algorithm involves two main steps: the Forward Pass and the Backward
Pass.

How Does Forward Pass Work?

• In the forward pass, the input data is fed into the input layer. These inputs, combined with their respective weights, are passed to the hidden layers.
• For example, in a network with two hidden layers (h1 and h2), the output from h1 serves as the input to h2. Before an activation function is applied, a bias is added to the weighted inputs.
• Each hidden layer applies an activation function such as ReLU (Rectified Linear Unit), which returns the input if it is positive and zero otherwise. This adds non-linearity, allowing the model to learn complex relationships in the data.
• Finally, the outputs from the last hidden layer are passed to the output layer, where an activation function such as softmax converts the weighted outputs into probabilities for classification.
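A forward pass through two hidden layers (h1 and h2) with ReLU activations and a softmax output can be sketched in NumPy as follows. The layer sizes, random weights, and input values are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(z):
    return np.maximum(0.0, z)                         # the input if positive, otherwise zero

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))     # shift by the max for numerical stability
    return e / e.sum(axis=-1, keepdims=True)

# Assumed sizes: 3 inputs -> h1 (4 units) -> h2 (4 units) -> 2 classes
sizes = [3, 4, 4, 2]
weights = [rng.standard_normal((m, n)) * 0.5 for m, n in zip(sizes[:-1], sizes[1:])]
biases = [np.zeros(n) for n in sizes[1:]]

def forward(x):
    a = x
    for W, b in zip(weights[:-1], biases[:-1]):      # hidden layers h1 and h2
        a = relu(a @ W + b)                           # weighted inputs plus bias, then activation
    return softmax(a @ weights[-1] + biases[-1])      # output layer: class probabilities

x = np.array([[0.5, -1.2, 0.3]])                      # one example with 3 features
probs = forward(x)
print(probs.shape, bool(np.isclose(probs.sum(), 1.0)))  # (1, 2) True
```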

How Does the Backward Pass Work?

• In the backward pass, the error (the difference between the predicted and actual output) is propagated back through the network to adjust the weights and biases.
• One common method for error calculation is the Mean Squared Error (MSE), given by:

$$MSE = \frac{1}{n} \sum_{i=1}^{n} \left(y_i - \hat{y}_i\right)^2$$

where y_i is the actual output, ŷ_i is the predicted output, and n is the number of examples.
• Once the error is calculated, the network adjusts the weights using gradients, which are computed with the chain rule.
• These gradients indicate how much each weight and bias should be adjusted to minimize the error in the next iteration.
• The backward pass continues layer by layer, ensuring that the network learns and improves its performance. The derivative of the activation function plays a crucial role in computing these gradients during backpropagation.
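The chain rule at the heart of the backward pass can be checked on a single sigmoid neuron with a squared-error loss: the analytic gradient dL/dw = dL/dy · dy/da · da/dw should match a numerical finite-difference estimate. The inputs, weights, and target below are illustrative values.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

# A single sigmoid neuron with a squared-error loss (illustrative values)
x = np.array([1.0, 0.5])
w = np.array([0.4, -0.2])
target = 0.5

def loss(w):
    y = sigmoid(x @ w)
    return (target - y) ** 2                 # MSE for a single example (n = 1)

# Backward pass with the chain rule: dL/dw = dL/dy * dy/da * da/dw
y = sigmoid(x @ w)
dL_dy = -2.0 * (target - y)
dy_da = y * (1.0 - y)                        # derivative of the sigmoid
da_dw = x                                    # a = x . w, so da/dw = x
grad_chain = dL_dy * dy_da * da_dw

# Numerical check with central finite differences
eps = 1e-6
grad_numeric = np.array([(loss(w + eps * e) - loss(w - eps * e)) / (2 * eps)
                         for e in np.eye(len(w))])
print(np.allclose(grad_chain, grad_numeric))  # True
```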

Example of Backpropagation in Machine Learning

Let’s walk through an example of backpropagation in machine learning. Assume the neurons
use the sigmoid activation function for the forward and backward pass. The target output is
0.5, and the learning rate is 1.

Forward Propagation

1. Initial Calculation: The weighted sum at each node is calculated using:

$$a_j = \sum_i w_{i,j} \, x_i$$

where w_{i,j} is the weight from the i-th unit of the previous layer to node j.

2. Sigmoid Function: The sigmoid function returns a value between 0 and 1, introducing non-linearity into the model:

$$y_j = F(a_j) = \frac{1}{1 + e^{-a_j}}$$

3. Computing Outputs: At the h1 node:

$$a_1 = w_{1,1} \, x_1 + w_{2,1} \, x_2$$

Once we have calculated a1, we can find y3:

$$y_3 = F(a_1) = \frac{1}{1 + e^{-a_1}}$$

Similarly, we find y4 at h2 and y5 at O3, where the output node uses the hidden outputs as its inputs:

$$y_5 = F\left(w_{1,3} \, y_3 + w_{2,3} \, y_4\right)$$

4. Error Calculation
Our target output is 0.5, but we obtained 0.67. To calculate the error, we can use the formula below:

$$Error = y_{target} - y_5 = 0.5 - 0.67 = -0.17$$

We use this error value for backpropagation.

Backpropagation

1. Calculating Gradients
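The change in each weight is calculated as:

$$\Delta w_{i,j} = \eta \, \delta_j \, O_i$$

where η is the learning rate, δ_j is the error term of the node that the weight feeds into (δ3 and δ4 for h1 and h2, δ5 for O3, following the output labels y3, y4, y5), and O_i is the output of the unit that sends the signal (an input x_i or a hidden output).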

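2. Output Unit Error: For O3:

$$\delta_5 = y_5 (1 - y_5)(y_{target} - y_5) = 0.67 \times (1 - 0.67) \times (-0.17) \approx -0.0376$$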

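3. Hidden Unit Error

For h1:

$$\delta_3 = y_3 (1 - y_3) \, w_{1,3} \, \delta_5$$

For h2:

$$\delta_4 = y_4 (1 - y_4) \, w_{2,3} \, \delta_5$$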

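4. Weight Updates
For the weights from the hidden layer to the output layer, for example w_{1,3}:

$$\Delta w_{1,3} = \eta \, \delta_5 \, y_3$$

New weight:

$$w_{1,3}^{new} = w_{1,3} + \Delta w_{1,3}$$

For the weights from the input layer to the hidden layer, for example w_{1,1}:

$$\Delta w_{1,1} = \eta \, \delta_3 \, x_1$$

New weight:

$$w_{1,1}^{new} = w_{1,1} + \Delta w_{1,1}$$

The other weights are updated in the same way.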

The updated weights are illustrated below

After updating the weights, the forward pass is repeated, yielding a new output y5 = 0.61.

Since y5 = 0.61 is still not the target output, the process of calculating the error and backpropagating continues until the output is sufficiently close to the target.

This process demonstrates how backpropagation iteratively updates the weights, reducing the error until the network predicts the output accurately.
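The whole procedure of this example (sigmoid units, target 0.5, learning rate 1, output and hidden error terms, and updates of the form w + η·δ·O) can be reproduced in a short NumPy sketch for the same 2-2-1 network without biases. The inputs and initial weights are hypothetical, because the values from the original figures are not available in this text, so the printed outputs will differ from 0.67 and 0.61; what carries over is that each update moves y5 toward the target.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

# Hypothetical inputs and initial weights (the figure's values are not reproduced here)
x = np.array([0.6, 0.2])                   # inputs x1, x2
W_hidden = np.array([[0.4, 0.1],           # W_hidden[i, j]: weight from input x_i to hidden node h_j
                     [0.3, 0.5]])
w_out = np.array([0.6, 0.8])               # weights from h1 and h2 to O3
target, eta = 0.5, 1.0                     # target output and learning rate from the example

def forward(W_hidden, w_out):
    y_hidden = sigmoid(x @ W_hidden)       # y3, y4: weighted sums passed through the sigmoid
    y5 = sigmoid(y_hidden @ w_out)         # output of O3
    return y_hidden, y5

def backprop_step(W_hidden, w_out):
    y_hidden, y5 = forward(W_hidden, w_out)
    delta5 = y5 * (1 - y5) * (target - y5)                      # output unit error
    delta_hidden = y_hidden * (1 - y_hidden) * w_out * delta5   # hidden unit errors delta3, delta4 (old weights)
    w_out_new = w_out + eta * delta5 * y_hidden                 # w + eta * delta * (output feeding w)
    W_hidden_new = W_hidden + eta * np.outer(x, delta_hidden)   # same rule for input-to-hidden weights
    return W_hidden_new, w_out_new

_, y5_before = forward(W_hidden, w_out)
W_hidden, w_out = backprop_step(W_hidden, w_out)
_, y5_after = forward(W_hidden, w_out)
print(f"y5 before: {y5_before:.4f}, after one update: {y5_after:.4f}, target: {target}")

# Keep repeating forward pass + backpropagation until the error is small
steps = 1
while abs(target - forward(W_hidden, w_out)[1]) > 1e-3:
    W_hidden, w_out = backprop_step(W_hidden, w_out)
    steps += 1
print(f"|error| <= 0.001 after {steps} updates")
```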

Regularization

 Regularization is a technique used in machine learning to prevent overfitting. Overfitting happens when a model learns the training data too well, including its noise and outliers, which causes it to perform poorly on new data.
 In simple terms, regularization adds a penalty to the model's loss for being too complex, encouraging it to stay simpler and more general.
 This makes the model less likely to make extreme predictions based on noise in the data.

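The commonly used regularization techniques are (λ controls the strength of the penalty):

1. Lasso Regularization (L1 Regularization): adds the sum of the absolute values of the weights, $\lambda \sum_i |w_i|$, to the loss. It can drive some weights to exactly zero, so it also performs feature selection.
2. Ridge Regularization (L2 Regularization): adds the sum of the squared weights, $\lambda \sum_i w_i^2$, to the loss. It shrinks all weights toward zero, usually without making any of them exactly zero.
3. Elastic Net Regularization (L1 and L2 Regularization): adds both the L1 and the L2 penalties, combining feature selection with weight shrinkage.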
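To make the three penalties concrete, here is a minimal NumPy sketch for a linear model with a mean-squared-error loss. The function name `regularized_mse` and the penalty strengths `l1` and `l2` are illustrative assumptions; libraries name and scale these terms differently.

```python
import numpy as np

def regularized_mse(w, X, y, l1=0.0, l2=0.0):
    """MSE of a linear model X @ w plus optional L1 and L2 penalties.

    l1 > 0 only -> Lasso (L1)
    l2 > 0 only -> Ridge (L2)
    both > 0    -> Elastic Net (L1 + L2)
    """
    residual = X @ w - y
    mse = np.mean(residual ** 2)
    l1_penalty = l1 * np.sum(np.abs(w))   # sum of absolute weights
    l2_penalty = l2 * np.sum(w ** 2)      # sum of squared weights
    return mse + l1_penalty + l2_penalty

# Toy data that w fits exactly, so any nonzero loss comes from the penalties
X = np.array([[1.0, 2.0], [3.0, 4.0]])
w = np.array([1.0, -2.0])
y = X @ w

print(regularized_mse(w, X, y))                  # no penalty
print(regularized_mse(w, X, y, l1=0.1))          # Lasso
print(regularized_mse(w, X, y, l2=0.1))          # Ridge
print(regularized_mse(w, X, y, l1=0.1, l2=0.1))  # Elastic Net
```

Here |1| + |-2| = 3 and 1² + (-2)² = 5, so the four losses are 0, 0.3, 0.5 and 0.8 (up to floating-point rounding).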

Batch Normalization

 Batch normalization is a technique for improving the performance and stability of neural networks, and it also makes deeper, more sophisticated architectures practical to train.
 The normalization process computes the mean and variance of each feature over a mini-batch, normalizes the feature with these statistics, and then scales and shifts the result using two learned parameters.
 This keeps the inputs to each layer in roughly the same distribution, regardless of changes in the distribution of earlier layers' outputs.
 Consequently, Batch Normalization helps stabilize the training process, enabling higher learning rates and faster convergence.
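For each feature, batch normalization computes $\hat{x} = (x - \mu_B)/\sqrt{\sigma_B^2 + \epsilon}$ and then $y = \gamma \hat{x} + \beta$, where $\mu_B$ and $\sigma_B^2$ are the mini-batch mean and variance, $\epsilon$ is a small constant for numerical stability, and $\gamma$, $\beta$ are learned. Below is a minimal NumPy sketch of the training-time forward pass only; it assumes inputs of shape (batch, features) and omits the running averages that frameworks keep for use at inference time. The function name is hypothetical.

```python
import numpy as np

def batch_norm_forward(x, gamma, beta, eps=1e-5):
    """Training-mode batch normalization for x of shape (batch, features).

    gamma, beta: learnable scale and shift, shape (features,).
    """
    mu = x.mean(axis=0)                    # per-feature mini-batch mean
    var = x.var(axis=0)                    # per-feature mini-batch variance
    x_hat = (x - mu) / np.sqrt(var + eps)  # roughly zero mean, unit variance
    return gamma * x_hat + beta            # learned scale and shift

rng = np.random.default_rng(0)
x = rng.normal(loc=5.0, scale=3.0, size=(64, 4))   # synthetic activations
out = batch_norm_forward(x, gamma=np.ones(4), beta=np.zeros(4))
print(out.mean(axis=0).round(6), out.std(axis=0).round(3))
```

With $\gamma = 1$ and $\beta = 0$ each output feature has mean close to 0 and variance close to 1; other values of $\gamma$ and $\beta$ let the network learn a different scale and offset if that helps.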

Need for Batch Normalization

 Batch Normalization is an extension of the concept of normalization from just the input layer to the activations of each hidden layer throughout the neural network.
 By normalizing the activations of each layer, Batch Normalization helps to alleviate the
internal covariate shift problem, which can hinder the convergence of the network
during training.
 In traditional neural networks, as the input data propagates through the network, the
distribution of each layer's inputs changes. This phenomenon, known as internal
covariate shift, can slow down the training process.
 Batch Normalization aims to mitigate this issue by normalizing the inputs of each layer.

Semi-Supervised Learning

 Semi-supervised learning is a type of machine learning that falls between supervised and unsupervised learning.
 It is a method that uses a small amount of labelled data and a large amount of
unlabelled data to train a model.
 The goal of semi-supervised learning is to learn a function that can accurately predict
the output variable based on the input variables, similar to supervised learning.
 However, unlike supervised learning, the algorithm is trained on a dataset that
contains both labelled and unlabelled data.

Semi-supervised learning is particularly useful when there is a large amount of unlabelled data
available, but it’s too expensive or difficult to label all of it.
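One common way to use the unlabelled data is self-training (pseudo-labelling): train on the labelled examples, predict labels for the unlabelled ones, add the confident predictions to the training set, and repeat. The sketch below illustrates that idea only; it is not the only semi-supervised method. It assumes a simple nearest-centroid classifier, a hypothetical confidence rule (the gap between the nearest and second-nearest centroid distances must be at least `margin`), and synthetic two-cluster data; all function names are illustrative.

```python
import numpy as np

def fit_centroids(X, y):
    """Nearest-centroid 'training': one mean vector per class."""
    classes = np.unique(y)
    centroids = np.array([X[y == c].mean(axis=0) for c in classes])
    return classes, centroids

def predict(X, classes, centroids):
    """Assign each row of X to the class of its closest centroid."""
    # dists[n, k] = Euclidean distance from sample n to centroid k
    dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return classes[dists.argmin(axis=1)], dists

def self_train(X_lab, y_lab, X_unlab, n_rounds=5, margin=1.0):
    """Self-training with pseudo-labels; returns the final (classes, centroids)."""
    X_l, y_l, X_u = X_lab.copy(), y_lab.copy(), X_unlab.copy()
    for _ in range(n_rounds):
        if len(X_u) == 0:
            break
        classes, centroids = fit_centroids(X_l, y_l)
        pred, dists = predict(X_u, classes, centroids)
        d = np.sort(dists, axis=1)
        confident = (d[:, 1] - d[:, 0]) >= margin   # clear winner: trust the pseudo-label
        if not confident.any():
            break
        # Move confident points, with their pseudo-labels, into the labelled set
        X_l = np.vstack([X_l, X_u[confident]])
        y_l = np.concatenate([y_l, pred[confident]])
        X_u = X_u[~confident]
    return fit_centroids(X_l, y_l)

# Synthetic data: two clusters, only 2 labelled points per class
rng = np.random.default_rng(0)
A = rng.normal(loc=[0.0, 0.0], scale=1.0, size=(100, 2))
B = rng.normal(loc=[6.0, 6.0], scale=1.0, size=(100, 2))
X_lab = np.vstack([A[:2], B[:2]])
y_lab = np.array([0, 0, 1, 1])
X_unlab = np.vstack([A[2:], B[2:]])
y_true = np.array([0] * 98 + [1] * 98)   # hidden ground truth, used only for checking

classes, centroids = self_train(X_lab, y_lab, X_unlab)
pred, _ = predict(X_unlab, classes, centroids)
print("accuracy on the unlabelled points:", (pred == y_true).mean())
```

Only four points carry labels here; the remaining points contribute through their pseudo-labels, which refine the class centroids.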

Intuitively, the three types of learning can be compared to a student's education. In supervised learning, a teacher supervises the student both at home and at school; in unsupervised learning, the student has to figure out a concept on their own; and in semi-supervised learning, the teacher teaches a few concepts in class and then sets homework questions based on similar concepts.

Examples of Semi-Supervised Learning

 Text classification: In text classification, the goal is to classify a given text into one or
more predefined categories. Semi-supervised learning can be used to train a text
classification model using a small amount of labelled data and a large amount of
unlabelled text data.
 Image classification: In image classification, the goal is to classify a given image into
one or more predefined categories. Semi-supervised learning can be used to train an
image classification model using a small amount of labelled data and a large amount
of unlabelled image data.
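 Anomaly detection: In anomaly detection, the goal is to detect patterns or observations that are unusual or different from the norm. Semi-supervised learning can be used when only a few observations are labelled as normal or anomalous and most of the data is unlabelled.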

Question Bank

1. Illustrate the significance of probability theory in deep learning.
2. Discuss and explain the working of Convolutional Neural Networks (CNNs) and their role in modern deep learning.
3. Write the differences between deep networks and shallow networks.
4. Generative Adversarial Networks (GANs) have revolutionized the field of generative modeling. Briefly discuss the generator model and the discriminator model.
5. With a suitable model and equations, briefly explain the discriminator model and the generator model.
6. Define semi-supervised learning. List any four of its applications.
7. What is backpropagation? With a neat sketch and mathematical equations, briefly explain backpropagation in neural networks.
8. What is batch normalization? Discuss the need for batch normalization.
