GANs and Deep Learning Overview
Feature extraction is another aspect of deep learning. It is used for pattern recognition
and image processing.
Feature extraction uses an algorithm to automatically construct meaningful
“features” of the data for purposes of training, learning, and understanding.
Unit-2: Generative Adversarial Networks (GAN) and Semi-Supervised
Learning
The Convolutional layer applies filters to the input image to extract features, the
Pooling layer downsamples the image to reduce computation, and the fully connected
layer makes the final prediction.
The network learns the optimal filters through backpropagation and gradient descent.
How Do Convolutional Layers Work?
Convolutional neural networks, or ConvNets, are neural networks that share their
parameters across spatial positions. Imagine you have an image.
It can be represented as a cuboid with a width and height (the dimensions of the image)
and a depth (the channels, as images generally have red, green, and blue channels).
Now imagine taking a small patch of this image and running a small neural network,
called a filter or kernel, on it, with, say, K outputs, and representing them vertically.
Now slide that neural network across the whole image. As a result, we get another
image with a different width, height, and depth.
Instead of just the R, G, and B channels, we now have more channels but a smaller width
and height. This operation is called convolution.
If the patch size is the same as that of the image, it is a regular neural network.
Because the patch is small, we have fewer weights.
Types of layers:
Let’s take an example by running a ConvNet on an image of dimension 32 x 32 x 3.
Input Layers:
It’s the layer in which we give input to our model. In a CNN, the input is generally
an image or a sequence of images.
This layer holds the raw input of the image with width 32, height 32, and depth 3.
Convolutional Layers:
This is the layer used to extract features from the input. It applies a set of
learnable filters, known as kernels, to the input images.
The filters/kernels are small matrices, usually of shape 2×2, 3×3, or 5×5. Each kernel
slides over the input image and computes the dot product between the kernel
weights and the corresponding input image patch.
The output of this layer is referred to as feature maps. Suppose we use a total of
12 filters for this layer; with stride 1 and padding that preserves the spatial size,
we get an output volume of dimension 32 x 32 x 12.
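The output dimensions of a convolutional layer follow from simple arithmetic. The sketch below is illustrative only: the function name `conv_output_shape`, the 3×3 kernel, and the padding values are assumptions chosen to match the 32 x 32 x 12 example, not values stated in these notes.

```python
def conv_output_shape(height, width, kernel, num_filters, stride=1, padding=0):
    """Return (height, width, depth) of a convolution layer's output volume."""
    out_h = (height + 2 * padding - kernel) // stride + 1
    out_w = (width + 2 * padding - kernel) // stride + 1
    # The output depth equals the number of filters, one feature map per filter.
    return out_h, out_w, num_filters


# Assumed 3x3 kernel with padding 1 keeps the 32 x 32 spatial size.
same_padded = conv_output_shape(32, 32, kernel=3, num_filters=12, padding=1)
# Without padding, the same kernel shrinks each spatial dimension by 2.
unpadded = conv_output_shape(32, 32, kernel=3, num_filters=12, padding=0)

print(same_padded)  # (32, 32, 12)
print(unpadded)     # (30, 30, 12)
```

Without padding the spatial size shrinks, which is why the 32 x 32 x 12 result depends on the padding choice.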
Activation Layer:
By applying an activation function to the output of the preceding layer,
activation layers add nonlinearity to the network.
The activation function is applied element-wise to the output of the
convolution layer.
Some common activation functions are ReLU: max(0, x), Tanh, Leaky ReLU,
etc. The volume remains unchanged, hence the output volume will have
dimensions 32 x 32 x 12.
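As a minimal sketch of element-wise activation, the snippet below applies ReLU, Leaky ReLU, and Tanh to a small 2 x 2 feature map. The helper names and the Leaky ReLU slope of 0.01 are assumptions made for illustration.

```python
import math


def relu(x):
    return max(0.0, x)


def leaky_relu(x, slope=0.01):
    # Small assumed slope for negative inputs instead of a hard zero.
    return x if x > 0 else slope * x


def apply_elementwise(fn, feature_map):
    """Apply fn to every value; the shape of the feature map is unchanged."""
    return [[fn(v) for v in row] for row in feature_map]


feature_map = [[-2.0, 0.5], [3.0, -0.1]]
relu_out = apply_elementwise(relu, feature_map)
leaky_out = apply_elementwise(leaky_relu, feature_map)
tanh_out = apply_elementwise(math.tanh, feature_map)

print(relu_out)  # [[0.0, 0.5], [3.0, 0.0]]
```

Each output has the same shape as the input, which is why the volume stays 32 x 32 x 12 after this layer.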
Pooling layer:
This layer is periodically inserted in ConvNets. Its main function is to
reduce the size of the volume, which speeds up computation, reduces memory use,
and helps limit overfitting.
Two common types of pooling layers are max pooling and average pooling. If
we use a max pool with 2 x 2 filters and stride 2, the resultant volume will be
of dimension 16 x 16 x 12.
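Max pooling with a 2 x 2 window and stride 2 can be sketched for a single channel as follows. The function name `max_pool_2d` and the sample values are assumptions for illustration; real layers apply the same operation to every channel independently.

```python
def max_pool_2d(channel, size=2, stride=2):
    """Max-pool one 2-D channel given as a list of equal-length rows."""
    out_h = (len(channel) - size) // stride + 1
    out_w = (len(channel[0]) - size) // stride + 1
    return [
        [
            max(
                channel[r * stride + i][c * stride + j]
                for i in range(size)
                for j in range(size)
            )
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


channel = [
    [1, 3, 2, 4],
    [5, 6, 1, 0],
    [7, 2, 9, 8],
    [0, 1, 3, 4],
]
pooled = max_pool_2d(channel)
print(pooled)  # [[6, 4], [7, 9]]
```

Each 2 x 2 window is replaced by its largest value, so width and height are halved while the number of channels is unchanged.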
Flattening layer: The resulting feature maps are flattened into a one-dimensional
vector after the convolution and pooling layers so they can be passed into a
fully connected layer for classification or regression.
Fully Connected Layers: It takes the input from the previous layer and computes the
final classification or regression output.
Output Layer: For classification tasks, the output from the fully connected layers is
then fed into a logistic function such as sigmoid or softmax, which converts the output
for each class into a probability score for that class.
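The conversion from raw output scores to probabilities can be sketched as below. The input scores are made-up values for illustration, and subtracting the maximum score is a common numerical-stability step rather than something these notes prescribe.

```python
import math


def softmax(logits):
    """Convert a list of raw class scores into probabilities that sum to 1."""
    top = max(logits)
    exps = [math.exp(v - top) for v in logits]  # shift by max for stability
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(x):
    """Map a single raw score to a probability for binary classification."""
    return 1.0 / (1.0 + math.exp(-x))


probs = softmax([2.0, 1.0, 0.1])  # hypothetical scores for three classes
print(probs)
print(sigmoid(0.0))  # 0.5
```

Softmax is typically used when there are several mutually exclusive classes, and sigmoid when there is a single yes/no output.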
Generative Adversarial Networks (GANs) were introduced by Ian Goodfellow and his
colleagues in 2014.
GANs are a class of neural networks that autonomously learn patterns in the input
data to generate new examples resembling the original dataset.
A GAN’s architecture consists of two networks:
Generator: creates synthetic data from random noise, aiming to produce data so realistic
that the discriminator cannot distinguish it from real data.
Discriminator: acts as a critic, evaluating whether the data it receives is real or fake.
As training proceeds, the Generator improves its ability to create realistic data, while
the Discriminator becomes better at detecting fakes.
Over time, this adversarial process leads to the generation of highly realistic,
high-quality data.
Detailed Architecture of GANs
Let’s explore the generator and discriminator model of GANs in detail:
1. Generator Model
The generator is a deep neural network that takes random noise as input to generate realistic
data samples (e.g., images or text). It learns the underlying data distribution by adjusting its
parameters through backpropagation.
The generator’s objective is to produce samples that the discriminator classifies as real. The
loss function is:
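A commonly used form of this loss, given here as an assumption since several variants exist, is the non-saturating generator loss. In it, $z_i$ is a noise vector, $G(z_i)$ is the generated sample, $D(\cdot)$ is the discriminator's estimated probability that its input is real, and $m$ is the batch size:

```latex
J_G = -\frac{1}{m} \sum_{i=1}^{m} \log D\bigl(G(z_i)\bigr)
```

The loss is small when the discriminator assigns a high probability of being real to generated samples.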
2. Discriminator Model
The discriminator acts as a binary classifier, distinguishing between real and generated
data.
It learns to improve its classification ability through training, refining its parameters to
detect fake samples more accurately.
When dealing with image data, the discriminator often employs convolutional layers
or other relevant architectures suited to the data type.
These layers help extract features and enhance the model’s ability to differentiate
between real and generated samples.
The discriminator minimizes the negative log-likelihood of correctly classifying both
generated and real samples.
This loss incentivizes the discriminator to categorize generated samples as fake and
real samples as real, with the following equation:
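One standard way to write this loss, stated here as an assumption about the intended formula, uses real samples $x_i$, noise vectors $z_i$, the generator $G$, the discriminator output $D(\cdot)$ (probability that the input is real), and batch size $m$. The second line is the minimax objective from the original GAN formulation, which both networks share:

```latex
J_D = -\frac{1}{m} \sum_{i=1}^{m} \log D(x_i)
      \;-\; \frac{1}{m} \sum_{i=1}^{m} \log\bigl(1 - D(G(z_i))\bigr)

\min_G \max_D V(D, G) =
  \mathbb{E}_{x \sim p_{\text{data}}}\bigl[\log D(x)\bigr]
  + \mathbb{E}_{z \sim p_z}\bigl[\log\bigl(1 - D(G(z))\bigr)\bigr]
```

Minimizing $J_D$ corresponds to the discriminator maximizing $V(D, G)$.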
The generator aims to minimize the loss, while the discriminator tries to maximize its
classification accuracy.
G takes a random noise vector as input. This noise vector contains random values and
acts as the starting point for G’s creation process.
Using its internal layers and learned patterns, G transforms the noise vector into a
new data sample, like a generated image.
Discriminator’s Turn
D receives two kinds of inputs: real samples from the training data and fake samples
generated by G.
If the discriminator correctly classifies real data as real and fake data as fake, its
current behavior is reinforced and its parameters change only slightly.
If the generator successfully fools the discriminator, the generator receives a positive
update, while the discriminator is penalized.
Generator’s Improvement
Every time the discriminator misclassifies fake data as real, the generator learns and
improves. Over multiple iterations, the generator produces more convincing synthetic
samples.
Discriminator’s Adaptation
The discriminator continuously refines its ability to distinguish real from fake data. This
ongoing duel between the generator and discriminator enhances the overall model’s learning
process.
Training Progression
Types of GANs
Vanilla GAN:
Vanilla GAN is the simplest type of GAN. It consists of:
A generator and a discriminator, both built using multi-layer perceptrons (MLPs).
The model optimizes its objective function using stochastic gradient descent
(SGD).
While Vanilla GANs serve as the foundation for more advanced GAN models, they
often struggle with issues like mode collapse and unstable training.
Conditional GAN (CGAN):
In a conditional GAN, a conditional variable (y) is fed into both the generator and the
discriminator.
This ensures that the generator creates data corresponding to the given condition
(e.g., generating images of specific objects).
The discriminator also receives the labels to help distinguish between real and fake
data.
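One common way to feed the condition into the generator is to concatenate a one-hot encoding of the label with the noise vector. The sketch below shows only that input construction; the function names, the noise dimension of 8, and the 10 classes are hypothetical choices, and other conditioning schemes (such as learned label embeddings) are also used in practice.

```python
import random


def one_hot(label, num_classes):
    """Encode an integer class label as a one-hot list."""
    return [1.0 if i == label else 0.0 for i in range(num_classes)]


def generator_input(noise_dim, label, num_classes, rng):
    """Build a conditioned generator input: random noise followed by the label."""
    noise = [rng.gauss(0.0, 1.0) for _ in range(noise_dim)]
    return noise + one_hot(label, num_classes)


rng = random.Random(0)  # seeded for reproducibility
g_in = generator_input(noise_dim=8, label=3, num_classes=10, rng=rng)
print(len(g_in))  # 18
```

The discriminator can be conditioned in the same way, by appending the label encoding to the sample it is asked to judge.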
Deep Convolutional GAN (DCGAN)
Deep Convolutional GANs (DCGANs) are among the most popular and widely used types
of GANs, particularly for image generation.
Applications of GAN
Image Synthesis & Generation: GANs generate realistic images, avatars, and
high-resolution visuals by learning patterns from training data. They are widely used
in art, gaming, and AI-driven design.
Image-to-Image Translation: GANs can transform images between domains while
preserving key features. Examples include converting day images to night, sketches to
realistic images, or changing artistic styles.
Text-to-Image Synthesis: GANs create visuals from textual descriptions, enabling
applications in AI-generated art, automated design, and content creation.
Data Augmentation: GANs generate synthetic data to improve machine learning
models, making them more robust and generalizable, especially in fields with limited
labeled data.
High-Resolution Image Enhancement: GANs upscale low-resolution images,
improving clarity for applications like medical imaging, satellite imagery, and video
enhancement.
Advantages of GAN
The advantages of GANs are as follows:
Synthetic data generation: GANs can generate new, synthetic data that resembles
some known data distribution, which can be useful for data augmentation, anomaly
detection, or creative applications.
High-quality results: GANs can produce high-quality, photorealistic results in image
synthesis, video synthesis, music synthesis, and other tasks.
Unsupervised learning: GANs can be trained without labelled data, making them
suitable for unsupervised learning tasks, where labelled data is scarce or difficult to
obtain.
Versatility: GANs can be applied to a wide range of tasks, including image synthesis,
text-to-image synthesis, image-to-image translation, anomaly detection, data
augmentation, and others.
Shallow Neural Networks
Weighted Sum: Each neuron in the hidden layer calculates a weighted sum of the
inputs.
Activation Function: The weighted sums are passed through an activation function
(such as Sigmoid, Tanh, or ReLU) to introduce non-linearity, enabling the network to
learn complex patterns.
Output Generation: The output layer integrates the signals from the hidden layer,
often through another set of weights, to produce the final output.
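The three steps above (weighted sum, activation, output generation) can be sketched as a forward pass in plain Python. All weights, biases, and the layer sizes below are hypothetical values chosen for illustration.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def relu(x):
    return max(0.0, x)


def dense(inputs, weights, biases, activation):
    """One fully connected layer: weighted sum per neuron, then activation."""
    return [
        activation(sum(w * x for w, x in zip(neuron_w, inputs)) + b)
        for neuron_w, b in zip(weights, biases)
    ]


# Hypothetical parameters: 2 inputs -> 3 hidden neurons -> 1 output
hidden_w = [[0.5, -0.2], [0.1, 0.4], [-0.3, 0.8]]
hidden_b = [0.0, 0.1, -0.1]
output_w = [[0.7, -0.5, 0.2]]
output_b = [0.05]

x = [1.0, 2.0]
hidden = dense(x, hidden_w, hidden_b, relu)          # weighted sum + activation
output = dense(hidden, output_w, output_b, sigmoid)  # output generation
print(hidden, output)
```

The same `dense` helper serves both layers; only the weights and the activation function differ.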
Training Shallow Neural Networks
Training a shallow neural network typically involves:
Forward Propagation: Calculating the output for a given input by passing it through
the layers of the network.
Loss Calculation: Determining how far the network's output is from the actual desired
output using a loss function.
Backpropagation: Calculating the gradient of the loss function with respect to each
weight in the network, which informs how the weights should be adjusted to minimize
the loss.
Weight Update: Adjusting the weights using an optimization algorithm like gradient
descent.
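from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.optimizers import Adam

# Create an empty sequential model to which layers are added in order
model = Sequential()
# Add the hidden layer with 10 neurons and ReLU activation function
model.add(Dense(10, input_shape=(2,), activation='relu'))
# Add the output layer with sigmoid activation function for binary classification
model.add(Dense(1, activation='sigmoid'))
# Compile the model with Adam optimizer, binary cross-entropy loss, and accuracy metric
model.compile(optimizer=Adam(learning_rate=0.01), loss='binary_crossentropy',
              metrics=['accuracy'])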
Shallow neural networks are particularly useful in scenarios where simplicity and speed are
more critical than capturing complex relationships. They are commonly used in:
3. Generative Models
Generative models like Variational Autoencoders (VAEs) and Generative Adversarial
Networks (GANs) rely heavily on probability distributions to generate new data
samples.
VAEs use latent variable models with probabilistic encoding and decoding.
4. Regularization Techniques
Dropout, a common regularization technique, can be interpreted as an approximation
to Bayesian inference, where a probability distribution over model parameters is
considered.
L1 and L2 regularization are linked to probabilistic priors in Bayesian modelling.
What is Backpropagation?
Backpropagation plays a critical role in how neural networks improve over time.
Efficient Weight Update: It computes the gradient of the loss function with respect to
each weight using the chain rule, making it possible to update weights efficiently.
Scalability: The backpropagation algorithm scales well to networks with multiple
layers and complex architectures, making deep learning feasible.
Automated Learning: With backpropagation, the learning process becomes
automated and the model can adjust itself to optimize its performance.
The Backpropagation algorithm involves two main steps: the Forward Pass and the Backward
Pass.
In the backward pass, the error (the difference between the predicted and actual
output) is propagated back through the network to adjust the weights and biases.
One common method for error calculation is the Mean Squared Error (MSE) given by:
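In its standard form, for $n$ outputs with target values $y_i$ and predictions $\hat{y}_i$ (symbols assumed here for illustration), the mean squared error is:

```latex
\mathrm{MSE} = \frac{1}{n} \sum_{i=1}^{n} \bigl(y_i - \hat{y}_i\bigr)^2
```

For a network with a single output, this reduces to the squared difference between the target and the predicted value.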
Once the error is calculated, the network adjusts its weights using gradients, which are
computed with the chain rule.
These gradients indicate how much each weight and bias should be adjusted to
minimize the error in the next iteration.
The backward pass continues layer by layer, ensuring that the network learns and
improves its performance. The activation function, through its derivative, plays a crucial
role in computing these gradients during backpropagation.
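As a sketch of the chain-rule step, assume a neuron $j$ with pre-activation $a_j = \sum_i w_{ij} x_i$, output $y_j = \sigma(a_j)$ for a sigmoid activation, error $E$, and learning rate $\eta$ (these symbols are introduced here for illustration):

```latex
\frac{\partial E}{\partial w_{ij}}
  = \frac{\partial E}{\partial y_j} \cdot
    \frac{\partial y_j}{\partial a_j} \cdot
    \frac{\partial a_j}{\partial w_{ij}}
  = \frac{\partial E}{\partial y_j}\,
    \sigma(a_j)\bigl(1 - \sigma(a_j)\bigr)\, x_i,
\qquad
w_{ij} \leftarrow w_{ij} - \eta\, \frac{\partial E}{\partial w_{ij}}
```

The middle factor is the derivative of the activation function, which is why it appears in every gradient computed during the backward pass.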
Let’s walk through an example of backpropagation in machine learning. Assume the neurons
use the sigmoid activation function for the forward and backward pass. The target output is
0.5, and the learning rate is 1.
Forward Propagation
2. Sigmoid Function: The sigmoid function returns a value between 0 and 1, introducing
non-linearity into the model.
Once we calculated the a1 value, we can now proceed to find the y3 value:
4. Error Calculation
Our target output is 0.5 but we obtained 0.67. To calculate the error we can use the
formula below:
Backpropagation
1. Calculating Gradients
The change in each weight is calculated as:
For h2:
4. Weight Updates
For the weights from hidden to output layer:
New weight:
New weight:
Since y5 = 0.61 is still not the target output of 0.5, the process of calculating the
error and backpropagating it continues until the network's output reaches the desired
value.
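The repeated forward and backward passes can be sketched in plain Python for a network with two inputs, two hidden neurons, and one output. The sigmoid activation, target of 0.5, and learning rate of 1 come from the example; the input values, the initial weights, the absence of biases, and the squared-error form E = 0.5 * (target - y)^2 are assumptions made for illustration, so intermediate numbers need not match the worked example.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def forward(x, w_hidden, w_out):
    """Forward pass: hidden activations, then the single output."""
    hidden = [sigmoid(sum(w * xi for w, xi in zip(ws, x))) for ws in w_hidden]
    y = sigmoid(sum(w * h for w, h in zip(w_out, hidden)))
    return hidden, y


def train_step(x, target, w_hidden, w_out, lr):
    """One forward pass followed by one backpropagation weight update."""
    hidden, y = forward(x, w_hidden, w_out)
    # Output delta for E = 0.5 * (target - y)^2 with a sigmoid output
    delta_out = (target - y) * y * (1.0 - y)
    # Hidden deltas: push delta_out back through the current output weights
    delta_hidden = [h * (1.0 - h) * w * delta_out for h, w in zip(hidden, w_out)]
    new_w_out = [w + lr * delta_out * h for w, h in zip(w_out, hidden)]
    new_w_hidden = [
        [w + lr * d * xi for w, xi in zip(ws, x)]
        for ws, d in zip(w_hidden, delta_hidden)
    ]
    return new_w_hidden, new_w_out, y


# Hypothetical inputs and initial weights (no biases)
x = [0.35, 0.7]
target, lr = 0.5, 1.0
w_hidden = [[0.2, 0.2], [0.3, 0.3]]
w_out = [0.3, 0.9]

outputs = []
for _ in range(200):
    w_hidden, w_out, y = train_step(x, target, w_hidden, w_out, lr)
    outputs.append(y)

print(round(outputs[0], 2))  # 0.67 (output of the first forward pass)
print(outputs[-1])           # close to the target after many iterations
```

Each iteration moves the output a little closer to the target, which is the "continue until the desired output is reached" loop described above.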
Regularization
Batch Normalization
Batch Normalization is an extension of the concept of normalization from just the input
layer to the activations of each hidden layer throughout the neural network.
By normalizing the activations of each layer, Batch Normalization helps to alleviate the
internal covariate shift problem, which can hinder the convergence of the network
during training.
In traditional neural networks, as the parameters of earlier layers change during training,
the distribution of each later layer's inputs changes. This phenomenon, known as internal
covariate shift, can slow down the training process.
Batch Normalization aims to mitigate this issue by normalizing the inputs of each layer.
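The normalization step can be sketched for one feature across a mini-batch as follows. The function name, the sample activations, and the epsilon of 1e-5 are assumptions; `gamma` and `beta` stand for the scale and shift parameters that are learned during training and are fixed here for simplicity.

```python
import math


def batch_norm(batch, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize one feature across a mini-batch, then scale and shift."""
    mean = sum(batch) / len(batch)
    var = sum((v - mean) ** 2 for v in batch) / len(batch)
    # eps avoids division by zero when the batch variance is tiny
    return [gamma * (v - mean) / math.sqrt(var + eps) + beta for v in batch]


activations = [2.0, 4.0, 6.0, 8.0]  # hypothetical activations of one neuron
normalized = batch_norm(activations)
print(normalized)
```

After this step the values have approximately zero mean and unit variance, regardless of the scale of the incoming activations.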
Semi-Supervised Learning
Semi-supervised learning is particularly useful when there is a large amount of unlabelled data
available, but it’s too expensive or difficult to label all of it.
Intuitively, one may imagine the three types of learning algorithms as follows: supervised
learning is where a student is under the supervision of a teacher at both home and school;
unsupervised learning is where a student has to figure out a concept on their own; and
semi-supervised learning is where a teacher teaches a few concepts in class and gives
homework questions based on similar concepts.
Question Bank