
Lecture 5: Deep Learning Fundamentals - MCQ Study Guide
Key Concepts Explained Simply
Introduction to Deep Learning
What is Deep Learning? Deep learning is a subset of machine learning
that uses neural networks with multiple layers (deep neural networks) to learn
from data. It’s like having a brain with many layers of neurons that can learn
increasingly complex patterns.

Why Deep Learning?


• Can automatically discover features from raw data
• Performs well on unstructured data (images, text, audio)
• Scales well with more data and computation
• Achieves state-of-the-art results in many domains

Difference from Traditional Machine Learning


• Traditional ML: Requires manual feature engineering
• Deep Learning: Automatically learns features from raw data

Neural Network Basics


Neurons (Perceptrons)
• What it is: Basic computational unit of neural networks
• Components:
– Inputs (x1, x2, …, xn)
– Weights (w1, w2, …, wn)
– Bias (b)
– Activation function (f)
• Output: f(w1x1 + w2x2 + … + wnxn + b)
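To make the neuron computation concrete, here is a minimal NumPy sketch of a single neuron's forward pass; the weights, bias, and sigmoid activation chosen here are illustrative, not taken from the lecture:

    import numpy as np

    def neuron_output(x, w, b, activation):
        """Compute f(w.x + b) for a single neuron."""
        z = np.dot(w, x) + b          # weighted sum of inputs plus bias
        return activation(z)

    # Example with made-up numbers: 3 inputs, sigmoid activation
    sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
    x = np.array([1.0, 2.0, 3.0])     # inputs x1..x3
    w = np.array([0.5, -0.2, 0.1])    # weights w1..w3
    b = 0.4                           # bias
    print(neuron_output(x, w, b, sigmoid))  # a value in (0, 1)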

Layers
• Input Layer: Receives the raw input data
• Hidden Layers: Process information from previous layers
• Output Layer: Produces the final prediction

Activation Functions
• Sigmoid: f(x) = 1/(1+e^(-x))
– Range: (0, 1)
– Used for binary classification
• Tanh: f(x) = (e^x - e^(-x))/(e^x + e^(-x))
– Range: (-1, 1)
– Zero-centered
• ReLU (Rectified Linear Unit): f(x) = max(0, x)
– Range: [0, ∞)
– Most commonly used in hidden layers
• Leaky ReLU: f(x) = max(αx, x) where α is a small constant
– Prevents “dying ReLU” problem
• Softmax: f(x_i) = e^(x_i)/Σ(e^(x_j))
– Used for multi-class classification
– Outputs sum to 1
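These activation functions can be written directly in NumPy. Below is a minimal sketch; the test values are arbitrary and the max-subtraction in softmax is a standard numerical-stability trick, not from the lecture:

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))            # range (0, 1)

    def tanh(x):
        return np.tanh(x)                          # range (-1, 1), zero-centered

    def relu(x):
        return np.maximum(0.0, x)                  # range [0, inf)

    def leaky_relu(x, alpha=0.01):
        return np.maximum(alpha * x, x)            # small slope for x < 0

    def softmax(x):
        e = np.exp(x - np.max(x))                  # subtract max for numerical stability
        return e / e.sum()                         # outputs sum to 1

    z = np.array([-2.0, 0.0, 3.0])
    print(relu(z), softmax(z).sum())               # softmax output sums to 1.0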

Neural Network Training


Forward Propagation
• Process of computing the output of a neural network given an input
• Information flows from input layer through hidden layers to output layer

Loss Functions
• Mean Squared Error (MSE): For regression
– MSE = (1/n) × Σ(y_actual - y_predicted)²
• Binary Cross-Entropy: For binary classification
– BCE = -(1/n) × Σ[y × log(p) + (1-y) × log(1-p)]
• Categorical Cross-Entropy: For multi-class classification
– CCE = -(1/n) × Σ[Σ(y_ij × log(p_ij))]
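A minimal NumPy sketch of the three loss functions; the clipping constant and example values are illustrative additions, and the categorical case assumes one-hot labels:

    import numpy as np

    def mse(y_true, y_pred):
        return np.mean((y_true - y_pred) ** 2)

    def binary_cross_entropy(y_true, p):
        eps = 1e-12                                # avoid log(0)
        p = np.clip(p, eps, 1 - eps)
        return -np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))

    def categorical_cross_entropy(y_true, p):
        eps = 1e-12
        p = np.clip(p, eps, 1.0)
        return -np.mean(np.sum(y_true * np.log(p), axis=1))

    y = np.array([1, 0])
    p = np.array([0.8, 0.3])
    print(binary_cross_entropy(y, p))              # about 0.29, as in Problem 4 below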

Backpropagation
• Algorithm to compute gradients of the loss with respect to weights
• Uses the chain rule of calculus
• Gradients flow backward from output layer to input layer
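As a small illustration of the chain rule that backpropagation relies on, the sketch below differentiates a squared-error loss through a single linear neuron by hand and checks the result with a finite difference; all numbers are made up:

    # One linear neuron: y_hat = w*x + b, loss L = (y_hat - y)^2
    x, y, w, b = 2.0, 1.0, 0.5, 0.1

    y_hat = w * x + b
    # Chain rule: dL/dw = dL/dy_hat * dy_hat/dw = 2*(y_hat - y) * x
    grad_w = 2 * (y_hat - y) * x
    grad_b = 2 * (y_hat - y) * 1

    # Numerical check with a small finite difference
    eps = 1e-6
    loss = lambda w_, b_: (w_ * x + b_ - y) ** 2
    print(grad_w, (loss(w + eps, b) - loss(w - eps, b)) / (2 * eps))  # the two values should match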

Gradient Descent
• Optimization algorithm that adjusts weights to minimize the loss
• Types:
– Batch Gradient Descent: Uses all training examples
– Stochastic Gradient Descent (SGD): Uses one example at a time
– Mini-batch Gradient Descent: Uses a small batch of examples

Learning Rate
• Controls how much weights are adjusted in each step
• Too high: May overshoot the minimum
• Too low: May take too long to converge or get stuck
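A minimal sketch of mini-batch gradient descent on a toy one-weight regression problem, showing where the learning rate enters the update; the data, learning rate, and batch size are made-up choices:

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 1))
    y = 3.0 * X[:, 0] + rng.normal(scale=0.1, size=100)   # true weight is 3.0

    w, lr, batch_size = 0.0, 0.1, 16
    for epoch in range(50):
        idx = rng.permutation(len(X))
        for start in range(0, len(X), batch_size):
            batch = idx[start:start + batch_size]
            xb, yb = X[batch, 0], y[batch]
            grad = np.mean(2 * (w * xb - yb) * xb)        # dMSE/dw on the mini-batch
            w = w - lr * grad                             # w_new = w_old - lr * gradient
    print(w)                                              # close to 3.0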

Deep Learning Architectures
Feedforward Neural Networks (FNN)
• Simplest type of neural network
• Information flows in one direction (no loops)
• Used for structured data and tabular datasets

Convolutional Neural Networks (CNN)


• Specialized for grid-like data (e.g., images)
• Key components:
– Convolutional layers: Apply filters to detect features
– Pooling layers: Reduce spatial dimensions
– Fully connected layers: Final classification
• Operations:
– Convolution: Sliding window operation with filters
– Pooling: Max pooling, average pooling
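If PyTorch is available, a short sketch such as the following shows a convolutional layer followed by max pooling and checks the output shape; the 32×32 input with a 5×5 filter, stride 2, and padding 1 mirrors Problem 3 later in this guide, while the pooling layer sizes are illustrative additions:

    import torch
    import torch.nn as nn

    x = torch.randn(1, 3, 32, 32)                         # batch of one 32x32 RGB image
    conv = nn.Conv2d(in_channels=3, out_channels=16,
                     kernel_size=5, stride=2, padding=1)  # 16 filters of size 5x5
    pool = nn.MaxPool2d(kernel_size=2, stride=2)

    features = conv(x)
    print(features.shape)        # torch.Size([1, 16, 15, 15]), matching Problem 3
    print(pool(features).shape)  # torch.Size([1, 16, 7, 7])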

Recurrent Neural Networks (RNN)


• Designed for sequential data (e.g., time series, text)
• Has loops to maintain information over time
• Types:
– Simple RNN: Basic recurrent structure
– LSTM (Long Short-Term Memory): Solves the vanishing gradient problem
– GRU (Gated Recurrent Unit): Simplified version of LSTM
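If PyTorch is available, a short sketch shows an LSTM layer consuming a batch of toy sequences; the layer and sequence sizes are arbitrary choices for illustration:

    import torch
    import torch.nn as nn

    lstm = nn.LSTM(input_size=8, hidden_size=16, batch_first=True)  # one LSTM layer
    x = torch.randn(2, 5, 8)            # batch of 2 sequences, 5 time steps, 8 features each
    output, (h_n, c_n) = lstm(x)        # h_n, c_n: final hidden and cell states
    print(output.shape, h_n.shape)      # torch.Size([2, 5, 16]) torch.Size([1, 2, 16])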

Transformers
• Architecture based on self-attention mechanisms
• Excels at capturing long-range dependencies
• Used in state-of-the-art natural language processing models (e.g., BERT,
GPT)
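The heart of self-attention is scaled dot-product attention, softmax(QK^T / sqrt(d_k)) V. A minimal single-head NumPy sketch with random toy inputs, intended as an illustration rather than a full Transformer:

    import numpy as np

    def scaled_dot_product_attention(Q, K, V):
        d_k = Q.shape[-1]
        scores = Q @ K.T / np.sqrt(d_k)                          # similarity of each query to each key
        weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
        weights = weights / weights.sum(axis=-1, keepdims=True)  # row-wise softmax
        return weights @ V                                       # weighted sum of the values

    rng = np.random.default_rng(0)
    seq_len, d_model = 4, 8
    Q = K = V = rng.normal(size=(seq_len, d_model))   # self-attention: Q, K, V come from the same sequence
    print(scaled_dot_product_attention(Q, K, V).shape)  # (4, 8)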

Deep Learning Challenges


Overfitting
• Model performs well on training data but poorly on new data
• Solutions:
– More training data
– Regularization (L1, L2)
– Dropout
– Early stopping
– Data augmentation

Vanishing/Exploding Gradients
• Vanishing: Gradients become very small during backpropagation
• Exploding: Gradients become very large during backpropagation
• Solutions:
– Proper weight initialization
– Batch normalization
– Gradient clipping
– Using ReLU or Leaky ReLU activation
– Using architectures like LSTM or GRU
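One of the listed remedies, gradient clipping, can be sketched in a few lines of NumPy; the max_norm value and example gradient are arbitrary:

    import numpy as np

    def clip_by_norm(grad, max_norm=1.0):
        norm = np.linalg.norm(grad)
        if norm > max_norm:
            grad = grad * (max_norm / norm)   # rescale so the gradient norm is at most max_norm
        return grad

    g = np.array([3.0, 4.0])                  # norm 5.0
    print(clip_by_norm(g, max_norm=1.0))      # [0.6, 0.8], norm 1.0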

Computational Requirements
• Deep learning typically requires:
– Large amounts of data
– Significant computational resources (GPUs/TPUs)
– Long training times

Regularization Techniques
Dropout
• Randomly deactivates neurons during training
• Forces the network to learn redundant representations
• Typically set to 0.2-0.5 (20-50% of neurons dropped)
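A minimal NumPy sketch of (inverted) dropout applied to a layer's activations; the 0.5 rate and the toy activation values are illustrative:

    import numpy as np

    def dropout(activations, rate=0.5, training=True):
        if not training:
            return activations                      # dropout is disabled at inference time
        mask = (np.random.rand(*activations.shape) > rate).astype(float)
        return activations * mask / (1.0 - rate)    # scale up so the expected value is unchanged

    a = np.array([0.6, 1.3, 2.0, 0.9])
    print(dropout(a, rate=0.5))                     # roughly half the values are zeroed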

L1/L2 Regularization
• Adds penalty terms to the loss function
• L1: Encourages sparse weights (some weights become exactly zero)
• L2: Encourages smaller weights overall
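A minimal sketch of adding L1 and L2 penalty terms to a base loss value; the lambda coefficients and weights below are arbitrary:

    import numpy as np

    def regularized_loss(base_loss, weights, l1=0.0, l2=0.0):
        l1_penalty = l1 * np.sum(np.abs(weights))   # encourages sparse (exactly zero) weights
        l2_penalty = l2 * np.sum(weights ** 2)      # encourages small weights overall
        return base_loss + l1_penalty + l2_penalty

    w = np.array([0.5, -1.2, 0.0, 2.0])
    print(regularized_loss(base_loss=0.8, weights=w, l1=0.01, l2=0.001))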

Batch Normalization
• Normalizes the outputs of a layer for each mini-batch
• Helps with faster training and reduces internal covariate shift
• Applied before the activation function
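A minimal NumPy sketch of the batch-normalization forward pass for one mini-batch (training-time statistics only); gamma and beta are the learnable scale and shift, shown here with default values:

    import numpy as np

    def batch_norm(x, gamma=1.0, beta=0.0, eps=1e-5):
        mean = x.mean(axis=0)                       # per-feature mean over the mini-batch
        var = x.var(axis=0)                         # per-feature variance over the mini-batch
        x_hat = (x - mean) / np.sqrt(var + eps)     # normalize to zero mean, unit variance
        return gamma * x_hat + beta                 # learnable scale and shift

    batch = np.random.randn(32, 4) * 5 + 10         # 32 examples, 4 features, shifted and scaled
    out = batch_norm(batch)
    print(out.mean(axis=0).round(3), out.std(axis=0).round(3))  # roughly 0 and 1 per feature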

Early Stopping
• Stop training when validation error starts increasing
• Prevents overfitting by not training for too long
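A minimal sketch of early stopping with a patience counter, driven by a made-up sequence of per-epoch validation losses in place of a real training loop:

    val_losses = [0.90, 0.75, 0.64, 0.60, 0.61, 0.63, 0.66]   # pretend per-epoch validation loss

    best_loss, patience, wait, stop_epoch = float("inf"), 2, 0, None
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss, wait = loss, 0              # improvement: reset the patience counter
        else:
            wait += 1                              # no improvement this epoch
            if wait >= patience:
                stop_epoch = epoch                 # stop once patience is exhausted
                break
    print(best_loss, stop_epoch)                   # best 0.60, training stops at epoch 5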

Transfer Learning
What is Transfer Learning?
• Using a pre-trained model as a starting point for a new task

• Leverages knowledge learned from one task to improve performance on
another

Approaches
• Feature Extraction: Use pre-trained model as a fixed feature extractor
• Fine-Tuning: Update some or all weights of the pre-trained model for
the new task
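If PyTorch and torchvision are available, a sketch along these lines contrasts the two approaches; the ResNet-18 backbone and the 10-class output head are illustrative choices, not part of the lecture:

    import torch.nn as nn
    from torchvision import models

    # Load a model pre-trained on ImageNet
    model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)

    # Feature extraction: freeze every pre-trained weight...
    for param in model.parameters():
        param.requires_grad = False

    # ...then replace the final layer with a new head for the new task (10 classes here)
    model.fc = nn.Linear(model.fc.in_features, 10)   # only this layer will be trained

    # Fine-tuning instead: skip the freezing loop (or unfreeze some layers)
    # so the optimizer also updates the pre-trained weights, usually at a small learning rate.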

Benefits
• Requires less data for the new task
• Faster training
• Often better performance

MCQ Practice Questions


Question 1
Which activation function is most commonly used in hidden layers of
deep neural networks? - A) Sigmoid - B) Tanh - C) ReLU - D) Softmax
Answer: C) ReLU
Explanation: ReLU (Rectified Linear Unit) is the most commonly used activation function in hidden layers because it helps mitigate the vanishing gradient problem and allows for faster training.

Question 2
What is the purpose of dropout in neural networks? - A) To speed up training by skipping neurons - B) To prevent overfitting by randomly deactivating neurons - C) To initialize weights properly - D) To normalize inputs across mini-batches
Answer: B) To prevent overfitting by randomly deactivating neurons
Explanation: Dropout is a regularization technique that prevents overfitting
by randomly deactivating a percentage of neurons during each training iteration,
forcing the network to learn redundant representations.

Question 3
Which of the following neural network architectures is best suited
for image classification tasks? - A) Recurrent Neural Networks (RNN) -
B) Convolutional Neural Networks (CNN) - C) Feedforward Neural Networks
(FNN) - D) Generative Adversarial Networks (GAN)
Answer: B) Convolutional Neural Networks (CNN)

Explanation: CNNs are specifically designed for grid-like data such as images.
Their convolutional layers can detect spatial patterns and features at different
scales, making them ideal for image classification tasks.

Question 4
What problem do LSTM networks solve that simple RNNs suffer
from? - A) Slow training speed - B) Vanishing gradient problem - C) Too
many parameters - D) Inability to process sequential data
Answer: B) Vanishing gradient problem
Explanation: LSTM (Long Short-Term Memory) networks were designed to
address the vanishing gradient problem in simple RNNs by using gates that
control the flow of information, allowing them to capture long-term dependencies
in sequential data.

Question 5
In a neural network with 784 input neurons, 128 neurons in the hidden
layer, and 10 output neurons, how many weights are there between
the input and hidden layers? - A) 100,352 - B) 10,112 - C) 912 - D) 1,280
Answer: A) 100,352
Explanation: The number of weights between two layers is calculated as the
product of the number of neurons in each layer. So, between the input and
hidden layers, there are 784 × 128 = 100,352 weights.

Question 6
Which of the following is NOT a common loss function used in deep
learning? - A) Mean Squared Error - B) Binary Cross-Entropy - C) Categorical
Cross-Entropy - D) Gini Impurity
Answer: D) Gini Impurity
Explanation: Gini Impurity is a measure used in decision trees to determine
the quality of a split. It is not a loss function used in deep learning. The
common loss functions in deep learning include Mean Squared Error, Binary
Cross-Entropy, and Categorical Cross-Entropy.

Question 7
What is the output range of the sigmoid activation function? - A) (0, 1) - B) (-1, 1) - C) [0, ∞) - D) (-∞, ∞)
Answer: A) (0, 1)

Explanation: The sigmoid function is defined as f(x) = 1/(1+e^(-x)), which
always produces values between 0 and 1, making it useful for binary classification
problems where the output can be interpreted as a probability.

Question 8
Which of the following techniques helps address the “internal covariate shift” problem in deep neural networks? - A) Dropout - B) L2 Regularization - C) Batch Normalization - D) Early Stopping
Answer: C) Batch Normalization
Explanation: Batch Normalization addresses the internal covariate shift problem by normalizing the outputs of a layer for each mini-batch, which helps stabilize and accelerate the training process.

Question 9
In transfer learning, what is “fine-tuning”? - A) Adjusting the learning
rate during training - B) Updating some or all weights of a pre-trained model
for a new task - C) Selecting the optimal hyperparameters - D) Normalizing the
input data
Answer: B) Updating some or all weights of a pre-trained model for a new
task
Explanation: Fine-tuning in transfer learning refers to the process of taking a
pre-trained model and updating some or all of its weights to adapt it to a new,
related task, often with a smaller dataset.

Calculation Problems
Problem 1: Neural Network Output
Consider a simple neural network with one input layer (2 neurons), one hidden layer (3 neurons with ReLU activation), and one output layer (1 neuron with sigmoid activation). The weights and biases are as follows:
• Input to hidden: W1 = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]], b1 = [0.1, 0.2, 0.3]
• Hidden to output: W2 = [[0.7, 0.8, 0.9]], b2 = [0.5]
If the input is [1, 2], calculate the output of the network.
Solution: 1. Calculate the input to the hidden layer:
• z1 = 0.1×1 + 0.2×2 + 0.1 = 0.1 + 0.4 + 0.1 = 0.6
• z2 = 0.3×1 + 0.4×2 + 0.2 = 0.3 + 0.8 + 0.2 = 1.3
• z3 = 0.5×1 + 0.6×2 + 0.3 = 0.5 + 1.2 + 0.3 = 2.0
2. Apply ReLU activation to the hidden layer:
• a1 = max(0, 0.6) = 0.6
• a2 = max(0, 1.3) = 1.3
• a3 = max(0, 2.0) = 2.0

3. Calculate the input to the output layer:
• z_out = 0.7×0.6 + 0.8×1.3 + 0.9×2.0 + 0.5 = 0.42 + 1.04 + 1.8 + 0.5 = 3.76
4. Apply sigmoid activation to the output layer:
• a_out = 1/(1+e^(-3.76)) = 1/(1+0.023) = 1/1.023 ≈ 0.977
Therefore, the output of the network is approximately 0.977.
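The hand calculation can be verified in a few lines of NumPy using the same weights and biases:

    import numpy as np

    W1 = np.array([[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]); b1 = np.array([0.1, 0.2, 0.3])
    W2 = np.array([[0.7, 0.8, 0.9]]);                    b2 = np.array([0.5])
    x = np.array([1.0, 2.0])

    a1 = np.maximum(0, W1 @ x + b1)                 # hidden layer with ReLU -> [0.6, 1.3, 2.0]
    z2 = W2 @ a1 + b2                               # output pre-activation -> [3.76]
    print(1.0 / (1.0 + np.exp(-z2)))                # sigmoid output, approximately [0.977]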

Problem 2: Gradient Descent Update


In a neural network, a weight w has a current value of 0.5. The
gradient of the loss with respect to this weight is calculated as 0.2.
If the learning rate is 0.1, what will be the updated weight after one
step of gradient descent?
Solution: The gradient descent update rule is: w_new = w_old - learning_rate
× gradient
w_new = 0.5 - 0.1 × 0.2 = 0.5 - 0.02 = 0.48
Therefore, the updated weight will be 0.48.

Problem 3: CNN Output Size


An image of size 32×32×3 (height × width × channels) is passed
through a convolutional layer with 16 filters of size 5×5, stride of 2,
and padding of 1. What will be the dimensions of the output feature
map?
Solution: Using the formula: Output size = ((Input size - Filter size + 2 × Padding) / Stride) + 1, where the division is rounded down (floor) before adding 1.
Height: ((32 - 5 + 2×1) / 2) + 1 = floor(29 / 2) + 1 = 14 + 1 = 15
Width: ((32 - 5 + 2×1) / 2) + 1 = floor(29 / 2) + 1 = 14 + 1 = 15
Channels: Number of filters = 16
Therefore, the output feature map dimensions will be 15×15×16.
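The same arithmetic as a small Python helper (integer division performs the floor):

    def conv_output_size(input_size, filter_size, stride, padding):
        # floor((input - filter + 2*padding) / stride) + 1
        return (input_size - filter_size + 2 * padding) // stride + 1

    print(conv_output_size(32, 5, stride=2, padding=1))   # 15, so the output is 15x15x16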

Problem 4: Cross-Entropy Loss


In a binary classification problem, a model predicts a probability of
0.8 for a positive instance (actual label is 1) and a probability of
0.3 for a negative instance (actual label is 0). Calculate the binary
cross-entropy loss for these two predictions.
Solution: Binary cross-entropy loss for a single instance (using the natural logarithm) is: BCE = -[y × log(p) + (1-y) × log(1-p)]
For the positive instance (y=1, p=0.8): BCE1 = -[1 × log(0.8) + (1-1) × log(1-0.8)] = -log(0.8) = -(-0.223) = 0.223

For the negative instance (y=0, p=0.3): BCE2 = -[0 × log(0.3) + (1-0) × log(1-0.3)] = -log(0.7) = -(-0.357) = 0.357
Average binary cross-entropy loss: BCE_avg = (0.223 + 0.357) / 2 = 0.29
Therefore, the binary cross-entropy loss is 0.29.

Key Formulas to Remember


1. Neuron Output: f(w1x1 + w2x2 + … + wnxn + b)
2. Activation Functions:
• Sigmoid: f(x) = 1/(1+e^(-x))
• ReLU: f(x) = max(0, x)
• Tanh: f(x) = (e^x - e^(-x))/(e^x + e^(-x))
• Softmax: f(x_i) = e^(x_i)/Σ(e^(x_j))
3. Loss Functions:
• MSE: (1/n) × Σ(y_actual - y_predicted)²
• Binary Cross-Entropy: -(1/n) × Σ[y × log(p) + (1-y) × log(1-p)]
• Categorical Cross-Entropy: -(1/n) × Σ[Σ(y_ij × log(p_ij))]
4. Gradient Descent Update: w_new = w_old - learning_rate × gradient
5. CNN Output Size: floor((Input size - Filter size + 2 × Padding) / Stride) + 1
6. Number of Parameters:
• Between two fully connected layers: n_inputs × n_outputs + n_outputs (weights + biases)
• In a convolutional layer: (filter_height × filter_width × input_channels + 1) × num_filters
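A quick Python check of the parameter-count formulas, using the 784-to-128 layer from Question 5 and the convolutional layer from Problem 3:

    def dense_params(n_inputs, n_outputs):
        return n_inputs * n_outputs + n_outputs            # weights + biases

    def conv_params(filter_h, filter_w, in_channels, num_filters):
        return (filter_h * filter_w * in_channels + 1) * num_filters

    print(dense_params(784, 128))        # 100480 total; 784 * 128 = 100352 weights (Question 5)
    print(conv_params(5, 5, 3, 16))      # 1216 parameters for the layer in Problem 3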

Tips for MCQ Questions


1. Understand the architectures: Know which neural network architecture is appropriate for different types of data.
2. Know activation functions: Understand the properties and use cases
of different activation functions.
3. Calculate network parameters: Be able to calculate the number of
weights and biases in a network.
4. Understand regularization: Know how different regularization techniques work and when to use them.
5. Practice forward propagation: Be comfortable with calculating the
output of a simple neural network given weights and inputs.
