Deep Learning Assignment Questions
Module 1: Foundations of Deep Learning
Question 1: Explain the difference between L1 and L2 regularization.
Answer: L1 regularization (Lasso) adds a penalty equal to the absolute value of the magnitude
of the coefficients, which can lead to sparse models by driving some coefficients to zero. L2
regularization (Ridge) adds a penalty equal to the squared magnitude of the coefficients, which
shrinks all coefficients toward zero but does not set them to exactly zero.
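As an illustration, here is a minimal NumPy sketch of the two penalty terms added to a base loss; the function name, toy weights, and strength lam are hypothetical.
```python
import numpy as np

def regularized_loss(base_loss, weights, lam=0.01, kind="l2"):
    """Add an L1 or L2 penalty to a base loss value (illustrative sketch)."""
    w = np.asarray(weights)
    if kind == "l1":
        penalty = lam * np.sum(np.abs(w))   # Lasso: encourages exact zeros
    else:
        penalty = lam * np.sum(w ** 2)      # Ridge: shrinks weights smoothly
    return base_loss + penalty

w = np.array([0.5, -1.2, 0.0, 3.0])
print(regularized_loss(1.0, w, kind="l1"))  # 1.0 + 0.01 * 4.7
print(regularized_loss(1.0, w, kind="l2"))  # 1.0 + 0.01 * 10.69
```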
Question 2: Describe the purpose of an activation function in a neural network.
Answer: An activation function introduces non-linearity into the network, allowing it to learn
and model more complex relationships between inputs and outputs. Without them, a neural
network would only be able to learn linear transformations, regardless of the number of layers.
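To make the point concrete, a small NumPy sketch (all names and sizes made up) showing that two stacked linear layers with no activation collapse into a single linear map, while inserting a ReLU between them does not.
```python
import numpy as np

rng = np.random.default_rng(0)
W1, W2 = rng.normal(size=(4, 3)), rng.normal(size=(2, 4))
x = rng.normal(size=3)

# Two linear layers with no activation collapse into one linear map W2 @ W1.
no_activation = W2 @ (W1 @ x)
collapsed = (W2 @ W1) @ x
print(np.allclose(no_activation, collapsed))   # True: still just a linear transform

# Inserting a ReLU between the layers makes the composition non-linear.
relu = lambda z: np.maximum(z, 0)
with_activation = W2 @ relu(W1 @ x)
```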
Question 3: What is the primary goal of the Backpropagation algorithm?
Answer: Backpropagation is an algorithm used to efficiently calculate the gradients of the loss
function with respect to the weights and biases of a neural network. It's the core method for
training neural networks, enabling the network to adjust its parameters to minimize the loss.
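A minimal sketch of this idea, using PyTorch autograd as one concrete realization; the toy weights, data, and learning rate are purely illustrative.
```python
import torch

# Tiny model: one weight vector, squared-error loss on a single example.
w = torch.randn(3, requires_grad=True)
x = torch.tensor([1.0, 2.0, 3.0])
y = torch.tensor(2.0)

loss = (w @ x - y) ** 2     # forward pass
loss.backward()             # backpropagation: gradient of the loss w.r.t. w
print(w.grad)               # equals 2 * (w @ x - y) * x

with torch.no_grad():
    w -= 0.01 * w.grad      # one gradient-descent step using that gradient
```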
Question 4: Define the Bias-Variance Tradeoff.
Answer: The Bias-Variance Tradeoff is a central concept in machine learning that describes the
relationship between a model's bias and its variance. High bias occurs when a model is too
simple and underfits the data, while high variance occurs when a model is too complex and
overfits the data. The goal is to find a balance between the two to achieve optimal generalization.
Module 2: Convolutional Neural Networks (CNNs)
Question 1: What is the function of a convolutional layer in a CNN, and what is a filter (or
kernel)?
Answer: A convolutional layer applies a set of learnable filters to an input image to produce
feature maps. A filter is a small matrix of weights that slides over the input image, performing a
dot product at each position to detect specific features like edges, textures, or patterns.
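A hand-rolled NumPy sketch of a single filter sliding over an image (valid mode, no padding or stride); the conv2d helper and the toy image are hypothetical.
```python
import numpy as np

def conv2d(image, kernel):
    """Valid-mode 2D cross-correlation, as used in CNN convolutional layers."""
    kh, kw = kernel.shape
    out_h, out_w = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)  # dot product per position
    return out

image = np.zeros((6, 6)); image[:, 3:] = 1.0               # left half dark, right half bright
sobel_x = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]])   # vertical-edge detector
print(conv2d(image, sobel_x))                              # strong responses along the edge
```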
Question 2: Briefly explain the difference between LeNet and AlexNet.
Answer: LeNet was a pioneering CNN for digit recognition, with a relatively simple
architecture. AlexNet was a larger and more complex CNN that won the 2012 ImageNet
competition, introducing key advancements such as using ReLU activation functions, dropout
regularization, and data augmentation, which significantly improved performance.
Question 3: Describe the main purpose of transfer learning in the context of CNNs.
Answer: Transfer learning involves using a pre-trained CNN, which has been trained on a large
dataset (like ImageNet), as a starting point for a new, often smaller, task. The pre-trained model's
learned features from the original task are "transferred" and fine-tuned for the new task, which
can lead to faster training and better performance, especially when limited data is available.
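A hedged PyTorch/torchvision sketch of the usual recipe, assuming a recent torchvision that accepts the weights argument; the number of target classes is made up.
```python
import torch.nn as nn
from torchvision import models

# Load a backbone pre-trained on ImageNet.
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)

# Freeze the pre-trained feature extractor.
for param in model.parameters():
    param.requires_grad = False

# Replace the final classification head for a new task with, say, 5 classes.
num_classes = 5
model.fc = nn.Linear(model.fc.in_features, num_classes)  # only this layer will be fine-tuned
```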
Question 4: How does a pooling layer contribute to a CNN's performance?
Answer: A pooling layer reduces the spatial dimensions (width and height) of the feature maps.
This reduces the number of parameters and computational costs, helps to control overfitting, and
makes the model more robust to minor shifts and distortions in the input image.
Module 3: Recurrent Neural Networks (RNNs) & Sequence Models
Question 1: What is the main limitation of a basic RNN, and how do LSTMs and GRUs address
this?
Answer: The main limitation of basic RNNs is the vanishing gradient problem, which makes it
difficult for them to learn long-term dependencies in sequential data. LSTMs (Long Short-Term
Memory) and GRUs (Gated Recurrent Units) address this by using gating mechanisms to control
the flow of information, allowing them to selectively remember or forget past information over
long sequences.
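For reference, a short PyTorch sketch showing the tensors involved when running a batch of sequences through an LSTM layer; the sizes are arbitrary.
```python
import torch
import torch.nn as nn

# An LSTM maintains a hidden state h and a cell state c, updated through
# input, forget, and output gates at every time step.
lstm = nn.LSTM(input_size=8, hidden_size=16, batch_first=True)

x = torch.randn(4, 20, 8)          # batch of 4 sequences, 20 time steps, 8 features
output, (h_n, c_n) = lstm(x)

print(output.shape)   # torch.Size([4, 20, 16]) -> hidden state at every step
print(h_n.shape)      # torch.Size([1, 4, 16])  -> final hidden state
```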
Question 2: Explain the concept of a sequence-to-sequence model.
Answer: A sequence-to-sequence model consists of two main components: an encoder and a
decoder. The encoder processes an input sequence and compresses it into a fixed-size context
vector, and the decoder takes this context vector to generate an output sequence. This
architecture is commonly used for tasks like machine translation and text summarization.
Question 3: What is the role of an Attention Mechanism in sequence models?
Answer: An attention mechanism allows the model to "pay attention" to different parts of the
input sequence when generating each part of the output sequence. Instead of relying on a single
context vector, it creates a dynamic context that is relevant to the current output being generated,
improving performance on long and complex sequences.
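The core computation can be written in a few lines of NumPy; this is a generic scaled dot-product attention sketch, not any particular model's implementation.
```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                    # similarity of each query to each key
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)     # softmax over the input positions
    return weights @ V, weights                        # weighted sum of values + attention map

Q = np.random.randn(5, 64)   # 5 output positions (queries)
K = np.random.randn(7, 64)   # 7 input positions (keys)
V = np.random.randn(7, 64)   # values attached to the input positions
context, attn = scaled_dot_product_attention(Q, K, V)
print(context.shape, attn.shape)   # (5, 64) (5, 7)
```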
Question 4: How does the Transformer architecture differ fundamentally from RNN-based
models for sequence processing?
Answer: The Transformer architecture abandons the sequential, recurrent nature of RNNs.
Instead, it processes all time steps of a sequence in parallel using a self-attention mechanism,
which allows it to capture long-range dependencies more effectively and is more
computationally efficient, especially with modern hardware like GPUs.
Module 4: Generative Models
Question 1: What are the two primary components of a Generative Adversarial Network (GAN),
and what is their relationship?
Answer: A GAN has a generator and a discriminator. The generator's goal is to create new data
that is indistinguishable from real data. The discriminator's goal is to distinguish between real
data and the fake data produced by the generator. They are trained in an adversarial, two-player
game until the generator can produce data that fools the discriminator.
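A compact PyTorch sketch of the two adversarial losses; the tiny generator/discriminator pair and the 1-D "data" are stand-ins, and optimizer steps are omitted.
```python
import torch
import torch.nn as nn

# Minimal generator/discriminator pair over 1-D "data" vectors of length 10.
G = nn.Sequential(nn.Linear(4, 32), nn.ReLU(), nn.Linear(32, 10))
D = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 1))
bce = nn.BCEWithLogitsLoss()

real = torch.randn(16, 10)     # stand-in for a batch of real data
noise = torch.randn(16, 4)
fake = G(noise)

# Discriminator: label real samples 1, generated samples 0.
d_loss = bce(D(real), torch.ones(16, 1)) + bce(D(fake.detach()), torch.zeros(16, 1))

# Generator: try to make the discriminator output 1 on generated samples.
g_loss = bce(D(fake), torch.ones(16, 1))
```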
Question 2: Explain the concept of a Variational Autoencoder (VAE).
Answer: A VAE is a generative model that learns a probabilistic representation of the training
data. It consists of an encoder that maps input data to a latent distribution (mean and variance),
and a decoder that samples from this distribution to reconstruct the original input. This allows
VAEs to generate new, similar-looking data by sampling from the learned latent space.
Question 3: How do Diffusion Models work to generate images?
Answer: Diffusion models work in two stages: a forward diffusion process and a reverse
diffusion process. The forward process gradually adds Gaussian noise to an image until it is pure
noise. The reverse process is a trained neural network that learns to reverse this process step-by-
step, gradually denoising a random noise input to generate a new, high-quality image.
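The forward (noising) process has a convenient closed form; a PyTorch sketch, assuming a simple linear beta schedule (all constants illustrative).
```python
import torch

# Closed form of the forward process:
# x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps
T = 1000
betas = torch.linspace(1e-4, 0.02, T)            # linear noise schedule
alpha_bars = torch.cumprod(1.0 - betas, dim=0)   # cumulative product of (1 - beta)

def add_noise(x0, t):
    eps = torch.randn_like(x0)
    a_bar = alpha_bars[t]
    x_t = a_bar.sqrt() * x0 + (1 - a_bar).sqrt() * eps
    return x_t, eps        # the network is trained to predict eps from (x_t, t)

x0 = torch.randn(1, 3, 32, 32)   # stand-in for a training image
x_t, eps = add_noise(x0, t=500)
```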
Question 4: Name two applications of generative models beyond image generation.
Answer: Generative models can be used for various applications, including style transfer (e.g.,
making a photo look like a painting), data synthesis (creating synthetic data for training other
models), and text generation (e.g., creating stories or articles).
Module: Autoencoders
Question 1: Describe the basic architecture and primary purpose of a standard autoencoder.
Answer: A standard autoencoder is a type of neural network consisting of two main parts: an
encoder and a decoder. The encoder compresses the input data into a lower-dimensional
representation called the latent vector. The decoder then reconstructs the original input from this
latent vector. Its primary purpose is dimensionality reduction and feature learning in an
unsupervised manner.
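A minimal PyTorch sketch of this encode-then-decode structure for flattened 28x28 inputs; the layer sizes and latent dimension are arbitrary.
```python
import torch
import torch.nn as nn

latent_dim = 8

encoder = nn.Sequential(nn.Linear(784, 128), nn.ReLU(), nn.Linear(128, latent_dim))
decoder = nn.Sequential(nn.Linear(latent_dim, 128), nn.ReLU(), nn.Linear(128, 784))

x = torch.rand(32, 784)                  # batch of flattened 28x28 images
z = encoder(x)                           # compressed latent vectors
x_hat = decoder(z)                       # reconstruction of the input
loss = nn.functional.mse_loss(x_hat, x)  # reconstruction error to minimize
```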
Question 2: What is the main limitation of a standard autoencoder that a Variational
Autoencoder (VAE) aims to solve?
Answer: A standard autoencoder's latent space is not continuous or regular. This means that a
standard autoencoder cannot be used to generate new data by sampling from its latent space, as
there is no guarantee that a randomly sampled point will produce a meaningful output.
Question 3: Explain the two primary components of a Variational Autoencoder (VAE) and what
makes it "variational."
Answer: A VAE consists of an encoder and a decoder. The encoder maps the input to a
probability distribution (specifically, a mean and variance) in the latent space, not a single point.
The "variational" aspect refers to the VAE's use of a Kullback–Leibler (KL) divergence loss
term, which regularizes the latent space by forcing it to conform to a standard normal
distribution, making the latent space continuous and easy to sample from.
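A hedged PyTorch sketch of the reparameterization trick and the KL term for a Gaussian latent with a standard-normal prior; layer sizes are illustrative and the training loop is omitted.
```python
import torch
import torch.nn as nn

# Encoder outputs a mean and log-variance instead of a single latent point.
encoder = nn.Linear(784, 2 * 8)          # 8-dimensional latent space
decoder = nn.Linear(8, 784)

x = torch.rand(32, 784)
mu, log_var = encoder(x).chunk(2, dim=-1)

# Reparameterization trick: z = mu + sigma * eps with eps ~ N(0, I).
z = mu + torch.exp(0.5 * log_var) * torch.randn_like(mu)
x_hat = torch.sigmoid(decoder(z))

recon = nn.functional.binary_cross_entropy(x_hat, x, reduction="sum")
kl = -0.5 * torch.sum(1 + log_var - mu.pow(2) - log_var.exp())   # KL(q(z|x) || N(0, I))
loss = recon + kl
```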
Question 4: Name two distinct applications of autoencoders.
Answer: 1. Dimensionality Reduction: Autoencoders can be used to compress high-dimensional
data (like images) into a compact latent representation, which can then be used for visualization
or as a preprocessing step for other models.
2. Anomaly Detection: By training an autoencoder on normal data, it learns to reconstruct normal
patterns with a low reconstruction error. When presented with anomalous data, it will fail to
reconstruct it well, leading to a high reconstruction error that can be used to identify anomalies.
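A short sketch of this anomaly-detection recipe in PyTorch; the tiny autoencoder is a stand-in and is assumed to have already been trained on normal data.
```python
import torch
import torch.nn as nn

# Assume this autoencoder has already been trained on normal data (training omitted).
autoencoder = nn.Sequential(nn.Linear(20, 4), nn.ReLU(), nn.Linear(4, 20))

normal = torch.randn(100, 20) * 0.1    # stand-in "normal" samples
outlier = torch.randn(5, 20) * 3.0     # stand-in anomalous samples

def per_sample_error(x):
    return ((x - autoencoder(x)) ** 2).mean(dim=1)   # reconstruction error per sample

threshold = per_sample_error(normal).quantile(0.99)  # calibrate on held-out normal data
flags = per_sample_error(outlier) > threshold        # True where reconstruction fails
```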
Module 5: Reinforcement Learning & Deep RL
Question 1: Describe the key components of a Markov Decision Process (MDP).
Answer: An MDP is a mathematical framework for modeling decision-making in a sequential environment. Its key components are: a set of states (S), a set of actions (A), a transition function (T) that defines the probability of moving to a new state given the current state and action, a reward function (R) that provides feedback for each action, and typically a discount factor (γ) that weights immediate rewards against future ones.
Question 2: Explain the difference between Q-Learning and Policy Gradients.
Answer: Q-Learning is a value-based reinforcement learning algorithm that learns a Q-value for
each state-action pair, representing the expected future reward. Policy Gradients are a class of
policy-based algorithms that directly learn a policy, which is a mapping from states to actions,
without explicitly learning the value function.
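The tabular Q-learning update in NumPy, as a point of contrast with the policy-gradient approach; the state/action counts and hyperparameters are made up.
```python
import numpy as np

n_states, n_actions = 6, 2
Q = np.zeros((n_states, n_actions))   # table of Q-values, one per state-action pair
alpha, gamma = 0.1, 0.99              # learning rate and discount factor

def q_update(s, a, r, s_next):
    """Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))"""
    td_target = r + gamma * Q[s_next].max()
    Q[s, a] += alpha * (td_target - Q[s, a])

q_update(s=0, a=1, r=1.0, s_next=3)   # one transition observed from the environment
```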
Question 3: What is the purpose of a Deep Q-Network (DQN)?
Answer: A DQN uses a deep neural network to approximate the Q-values, enabling
reinforcement learning to handle environments with very large or continuous state spaces, such
as video games. It uses techniques like experience replay and a target network to stabilize the
training process.
Question 4: What is the fundamental dilemma known as the "Exploration vs. Exploitation"
tradeoff?
Answer: The Exploration vs. Exploitation tradeoff is the dilemma faced by a reinforcement
learning agent. The agent must decide whether to "explore" new, unknown actions to discover
potentially better rewards, or to "exploit" the current knowledge and choose the action that has
yielded the best reward in the past.
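One common way to manage this tradeoff is an epsilon-greedy rule; a minimal NumPy sketch in which the Q-values and epsilon are illustrative.
```python
import numpy as np

def epsilon_greedy(q_values, epsilon, rng):
    """With probability epsilon pick a random action (explore), otherwise the best known one (exploit)."""
    if rng.random() < epsilon:
        return int(rng.integers(len(q_values)))   # explore
    return int(np.argmax(q_values))               # exploit

rng = np.random.default_rng(0)
action = epsilon_greedy(np.array([0.2, 0.8, 0.1]), epsilon=0.1, rng=rng)
```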
Module 6: Natural Language Processing (NLP) with Deep Learning
Question 1: How do word embeddings like Word2Vec improve upon traditional text
representations like one-hot encoding?
Answer: One-hot encoding creates sparse, high-dimensional vectors that treat each word as an
independent entity, without capturing any semantic relationships. Word embeddings, in contrast,
represent words as dense, low-dimensional vectors in a continuous space, where semantically
similar words are located closer to each other.
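A toy NumPy illustration; the 4-dimensional vectors are made up purely to show how cosine similarity captures relatedness that one-hot vectors cannot.
```python
import numpy as np

# Hypothetical 4-dimensional embeddings; real Word2Vec vectors typically have 100-300 dimensions.
embeddings = {
    "king":  np.array([0.8, 0.6, 0.1, 0.2]),
    "queen": np.array([0.7, 0.7, 0.1, 0.3]),
    "apple": np.array([0.1, 0.0, 0.9, 0.8]),
}

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

print(cosine(embeddings["king"], embeddings["queen"]))  # high: semantically related
print(cosine(embeddings["king"], embeddings["apple"]))  # low: unrelated

# A one-hot encoding would give every distinct pair of words a similarity of exactly 0.
```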
Question 2: Explain the significance of the Transformer architecture in modern NLP.
Answer: The Transformer architecture, particularly with its self-attention mechanism,
revolutionized NLP by allowing models to capture long-range dependencies in text more
effectively than RNNs. This enabled the development of large-scale pre-trained models like
BERT and GPT, which have become the foundation for a wide range of state-of-the-art NLP
applications.
Question 3: What is the primary difference between BERT and GPT?
Answer: BERT (Bidirectional Encoder Representations from Transformers) is an encoder-only
model that is pre-trained to understand context by looking at the words on both sides of each
token. It is best suited for understanding tasks like sentiment analysis and question answering. GPT
(Generative Pre-trained Transformer) is a decoder-only model that is trained to predict the next
word in a sequence and is primarily used for text generation.
Question 4: Provide an example of a sequence-to-sequence NLP task and a classification NLP
task.
Answer: Machine translation, where an input sequence of text in one language is translated into
an output sequence in another, is a classic example of a sequence-to-sequence task. Sentiment
analysis, which classifies a piece of text as having a positive, negative, or neutral sentiment, is an
example of a classification task.
Module 7: Deep Learning Frameworks & Tools
Question 1: Name two popular open-source deep learning frameworks and briefly describe one
of their key characteristics.
Answer: TensorFlow and PyTorch are two popular frameworks. TensorFlow is known for its
strong production-level deployment capabilities and its ecosystem of tools like TensorBoard for
visualization. PyTorch is known for its Pythonic and flexible interface, often favored in research
due to its dynamic computation graph.
Question 2: What is the purpose of TensorBoard?
Answer: TensorBoard is a visualization tool that is part of the TensorFlow ecosystem (and can
be used with other frameworks). It allows users to visualize model graphs, track and plot metrics
and loss functions over time, view histograms of weights, and visualize embeddings, which is
crucial for debugging and understanding the training process.
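A minimal logging sketch using the SummaryWriter bundled with PyTorch; the run directory and the stand-in loss values are hypothetical.
```python
from torch.utils.tensorboard import SummaryWriter

writer = SummaryWriter(log_dir="runs/demo")   # hypothetical run directory

for step in range(100):
    fake_loss = 1.0 / (step + 1)              # stand-in for a real training loss
    writer.add_scalar("loss/train", fake_loss, step)

writer.close()
# View the dashboard with: tensorboard --logdir runs
```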
Question 3: What is the role of a data loading pipeline in a deep learning workflow?
Answer: A data loading pipeline is responsible for efficiently fetching, preprocessing, and
batching data for the model to consume during training and inference. It handles tasks like
reading data from files, applying transformations (e.g., resizing images), and shuffling the
dataset to ensure the model sees a diverse range of examples.
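A sketch of such a pipeline using PyTorch's Dataset/DataLoader abstractions; the RandomImages class is a stand-in for real file reading and transforms.
```python
import torch
from torch.utils.data import Dataset, DataLoader

class RandomImages(Dataset):
    """Stand-in dataset; a real pipeline would read files and apply transforms here."""
    def __len__(self):
        return 1000
    def __getitem__(self, idx):
        image = torch.rand(3, 32, 32)            # e.g. result of decoding + resizing an image
        label = torch.randint(0, 10, ()).item()
        return image, label

# shuffle=True randomizes sample order; raise num_workers to load batches in parallel.
loader = DataLoader(RandomImages(), batch_size=64, shuffle=True, num_workers=0)

for images, labels in loader:   # batches arrive already collated and shuffled
    pass                        # forward/backward pass would go here
```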
Question 4: How do debugging and profiling tools help in the deep learning workflow?
Answer: Debugging tools help identify and fix issues with the model's implementation, such as
incorrect loss calculations or vanishing gradients. Profiling tools help analyze the performance of
the model, identifying computational bottlenecks in the code (e.g., a slow data loader or an
inefficient layer) and suggesting ways to optimize it.
Module 8: Deployment & Optimization of Deep Learning Models
Question 1: Define model compression and provide two common techniques.
Answer: Model compression is the process of reducing the size and computational requirements
of a deep learning model to make it more efficient for deployment, especially on resource-
constrained devices. Two common techniques are pruning (removing redundant connections)
and quantization (reducing the precision of model weights).
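As one example of quantization, a hedged sketch of post-training dynamic quantization in PyTorch; the toy model is illustrative.
```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 10))

# Post-training dynamic quantization: Linear weights are stored as 8-bit integers
# and de-quantized on the fly, shrinking those layers roughly 4x.
quantized = torch.quantization.quantize_dynamic(model, {nn.Linear}, dtype=torch.qint8)

x = torch.randn(1, 128)
print(quantized(x).shape)   # inference works as before, with smaller weights
```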
Question 2: What are some key considerations for deploying a deep learning model on edge
devices (Edge AI)?
Answer: Key considerations include the limited computational power, memory, and energy
consumption of the device. The model must be highly optimized for inference speed and size.
Additionally, communication with the cloud, security, and the need for robust performance in
real-world, often unpredictable environments are important factors.
Question 3: What is MLOps, and why is it important for deploying deep learning models?
Answer: MLOps (Machine Learning Operations) is a set of practices that combines machine
learning, DevOps, and data engineering. It is important because it provides a framework for
managing the entire lifecycle of a deep learning model, from data collection and training to
deployment, monitoring, and maintenance, ensuring that models are reliable, scalable, and
reproducible in production.
Question 4: Name one advantage and one disadvantage of deploying a model on a cloud
platform (e.g., AWS, Google AI Platform).
Answer: An advantage of cloud deployment is scalability, as cloud platforms can easily handle
large-scale inference requests and provide access to powerful hardware. A disadvantage is cost,
as running large models on cloud infrastructure can be expensive, and there can be concerns
about data privacy and vendor lock-in.
Module 9: Ethics, Fairness & Explainable AI (XAI)
Question 1: How can bias be introduced into an AI system?
Answer: Bias can be introduced at various stages, most notably in the training data itself if it is
unrepresentative or reflects societal prejudices. It can also be introduced through the algorithm's
design (e.g., a skewed objective function) or in the way the model is evaluated and deployed.
Question 2: What is the goal of Explainable AI (XAI)?
Answer: The goal of XAI is to make AI systems more transparent and understandable by
humans. This involves developing techniques that can explain how a model arrived at a
particular decision or prediction, which is crucial for building trust, debugging models, ensuring
fairness, and complying with regulations.
Question 3: Describe one common XAI technique, such as LIME or SHAP.
Answer: LIME (Local Interpretable Model-agnostic Explanations) is a technique that explains
the predictions of any black-box model. It works by training a simple, interpretable model (like a
linear model) on a perturbed version of the data instance to be explained. This simple model's
predictions mimic the complex model's behavior locally around the instance, providing an
explanation of why the original model made its decision.
Question 4: Explain the concept of Federated Learning and its relevance to privacy-preserving
AI.
Answer: Federated Learning is a decentralized machine learning approach where a shared model
is trained across multiple user devices without the need for the raw data to be sent to a central
server. This is a key privacy-preserving technique because sensitive data remains on the local
device, and only model updates are shared, thus reducing the risk of data exposure.
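A minimal sketch of the federated-averaging (FedAvg) aggregation step in PyTorch; the clients, their local training, and the model are all stand-ins.
```python
import torch
import torch.nn as nn

def federated_average(client_state_dicts):
    """FedAvg: element-wise mean of the model weights returned by each client."""
    avg = {}
    for key in client_state_dicts[0]:
        avg[key] = torch.stack([sd[key].float() for sd in client_state_dicts]).mean(dim=0)
    return avg

# Hypothetical round: each client trains the shared model locally on its own data
# (training omitted) and sends back only the updated weights, never the raw data.
global_model = nn.Linear(10, 2)
client_updates = [nn.Linear(10, 2).state_dict() for _ in range(3)]   # stand-ins for client results
global_model.load_state_dict(federated_average(client_updates))
```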
Module 10: Advanced Topics & Research Trends
Question 1: What is a Graph Neural Network (GNN), and what kind of data is it used for?
Answer: A GNN is a type of neural network designed to operate on graph-structured data. It
learns representations of nodes by aggregating information from their neighbors. GNNs are used
for tasks like social network analysis, molecule property prediction, and recommendation
systems, where the relationships between data points are as important as the data points
themselves.
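A toy NumPy sketch of one GCN-style layer: each node averages its neighbours' features (self-loop included) and applies a learned linear map; the graph and dimensions are made up.
```python
import numpy as np

# Toy graph with 4 nodes: adjacency matrix A (with self-loops) and 3-dim node features X.
A = np.array([[1, 1, 0, 0],
              [1, 1, 1, 0],
              [0, 1, 1, 1],
              [0, 0, 1, 1]], dtype=float)
X = np.random.randn(4, 3)
W = np.random.randn(3, 5)          # learnable weight matrix of one GNN layer

# One layer: average each node's neighbourhood, then apply the linear map and a ReLU.
deg = A.sum(axis=1, keepdims=True)
H = np.maximum((A / deg) @ X @ W, 0)
print(H.shape)                     # (4, 5): a new embedding per node
```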
Question 2: Briefly describe Self-Supervised Learning.
Answer: Self-supervised learning is a technique where a model learns representations from
unlabeled data by creating its own supervisory signals. It involves training the model on a
"pretext task," such as predicting a masked word in a sentence or rotating an image back to its
original orientation, and then using the learned representations for a downstream task.
Question 3: How does Causal Inference with Deep Learning differ from standard predictive
modeling?
Answer: Standard predictive modeling focuses on correlation and predicting an outcome based
on a given input. Causal inference, however, aims to understand the cause-and-effect
relationships between variables. Using deep learning for causal inference involves building
models that can estimate the effect of an intervention or action, going beyond simple prediction
to answer "what if" questions.
Question 4: What is Neuro-Symbolic AI?
Answer: Neuro-Symbolic AI is an approach that combines the strengths of neural networks
(which excel at pattern recognition and learning from data) with symbolic AI (which uses rules,
logic, and reasoning). The goal is to create systems that are not only capable of learning from
data but can also reason, explain their decisions, and generalize to new tasks more effectively by
leveraging symbolic knowledge.