
Lecture 5: Deep Learning Fundamentals - MCQ Study Guide
Key Concepts Explained Simply
Introduction to Deep Learning
What is Deep Learning? Deep learning is a subset of machine learning
that uses neural networks with multiple layers (deep neural networks) to learn
from data. It’s like having a brain with many layers of neurons that can learn
increasingly complex patterns.

Why Deep Learning?


• Can automatically discover features from raw data
• Performs well on unstructured data (images, text, audio)
• Scales well with more data and computation
• Achieves state-of-the-art results in many domains

Difference from Traditional Machine Learning


• Traditional ML: Requires manual feature engineering
• Deep Learning: Automatically learns features from raw data

Neural Network Basics


Neurons (Perceptrons)
• What it is: Basic computational unit of neural networks
• Components:
– Inputs (x1, x2, …, xn)
– Weights (w1, w2, …, wn)
– Bias (b)
– Activation function (f)
• Output: f(w1x1 + w2x2 + … + wnxn + b)
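To make the neuron computation concrete, here is a minimal NumPy sketch of a single neuron's forward pass; the weights, bias, and sigmoid activation chosen here are illustrative, not taken from the lecture:

    import numpy as np

    def neuron_output(x, w, b, activation):
        """Compute f(w.x + b) for a single neuron."""
        z = np.dot(w, x) + b          # weighted sum of inputs plus bias
        return activation(z)

    # Example with made-up numbers: 3 inputs, sigmoid activation
    sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
    x = np.array([1.0, 2.0, 3.0])     # inputs x1..x3
    w = np.array([0.5, -0.2, 0.1])    # weights w1..w3
    b = 0.4                           # bias
    print(neuron_output(x, w, b, sigmoid))  # a value in (0, 1)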

Layers
• Input Layer: Receives the raw input data
• Hidden Layers: Process information from previous layers
• Output Layer: Produces the final prediction

Activation Functions
• Sigmoid: f(x) = 1/(1+e^(-x))
– Range: (0, 1)
– Used for binary classification
• Tanh: f(x) = (e^x - e^(-x))/(e^x + e^(-x))
– Range: (-1, 1)
– Zero-centered
• ReLU (Rectified Linear Unit): f(x) = max(0, x)
– Range: [0, ∞)
– Most commonly used in hidden layers
• Leaky ReLU: f(x) = max(αx, x) where α is a small constant
– Prevents “dying ReLU” problem
• Softmax: f(x_i) = e^(x_i)/Σ(e^(x_j))
– Used for multi-class classification
– Outputs sum to 1
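These activation functions can be written directly in NumPy. Below is a minimal sketch; the test values are arbitrary and the max-subtraction in softmax is a standard numerical-stability trick, not from the lecture:

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))            # range (0, 1)

    def tanh(x):
        return np.tanh(x)                          # range (-1, 1), zero-centered

    def relu(x):
        return np.maximum(0.0, x)                  # range [0, inf)

    def leaky_relu(x, alpha=0.01):
        return np.maximum(alpha * x, x)            # small slope for x < 0

    def softmax(x):
        e = np.exp(x - np.max(x))                  # subtract max for numerical stability
        return e / e.sum()                         # outputs sum to 1

    z = np.array([-2.0, 0.0, 3.0])
    print(relu(z), softmax(z).sum())               # softmax output sums to 1.0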

Neural Network Training


Forward Propagation
• Process of computing the output of a neural network given an input
• Information flows from input layer through hidden layers to output layer

Loss Functions
• Mean Squared Error (MSE): For regression
– MSE = (1/n) × Σ(y_actual - y_predicted)²
• Binary Cross-Entropy: For binary classification
– BCE = -(1/n) × Σ[y × log(p) + (1-y) × log(1-p)]
• Categorical Cross-Entropy: For multi-class classification
– CCE = -(1/n) × Σ[Σ(y_ij × log(p_ij))]
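A minimal NumPy sketch of the three loss functions; the clipping constant and example values are illustrative additions, and the categorical case assumes one-hot labels:

    import numpy as np

    def mse(y_true, y_pred):
        return np.mean((y_true - y_pred) ** 2)

    def binary_cross_entropy(y_true, p):
        eps = 1e-12                                # avoid log(0)
        p = np.clip(p, eps, 1 - eps)
        return -np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))

    def categorical_cross_entropy(y_true, p):
        eps = 1e-12
        p = np.clip(p, eps, 1.0)
        return -np.mean(np.sum(y_true * np.log(p), axis=1))

    y = np.array([1, 0])
    p = np.array([0.8, 0.3])
    print(binary_cross_entropy(y, p))              # about 0.29, as in Problem 4 below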

Backpropagation
• Algorithm to compute gradients of the loss with respect to weights
• Uses the chain rule of calculus
• Gradients flow backward from output layer to input layer
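As a small illustration of the chain rule that backpropagation relies on, the sketch below differentiates a squared-error loss through a single linear neuron by hand and checks the result with a finite difference; all numbers are made up:

    # One linear neuron: y_hat = w*x + b, loss L = (y_hat - y)^2
    x, y, w, b = 2.0, 1.0, 0.5, 0.1

    y_hat = w * x + b
    # Chain rule: dL/dw = dL/dy_hat * dy_hat/dw = 2*(y_hat - y) * x
    grad_w = 2 * (y_hat - y) * x
    grad_b = 2 * (y_hat - y) * 1

    # Numerical check with a small finite difference
    eps = 1e-6
    loss = lambda w_, b_: (w_ * x + b_ - y) ** 2
    print(grad_w, (loss(w + eps, b) - loss(w - eps, b)) / (2 * eps))  # the two values should match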

Gradient Descent
• Optimization algorithm that adjusts weights to minimize the loss
• Types:
– Batch Gradient Descent: Uses all training examples
– Stochastic Gradient Descent (SGD): Uses one example at a time
– Mini-batch Gradient Descent: Uses a small batch of examples

Learning Rate
• Controls how much weights are adjusted in each step
• Too high: May overshoot the minimum
• Too low: May take too long to converge or get stuck
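A minimal sketch of mini-batch gradient descent on a toy one-weight regression problem, showing where the learning rate enters the update; the data, learning rate, and batch size are made-up choices:

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 1))
    y = 3.0 * X[:, 0] + rng.normal(scale=0.1, size=100)   # true weight is 3.0

    w, lr, batch_size = 0.0, 0.1, 16
    for epoch in range(50):
        idx = rng.permutation(len(X))
        for start in range(0, len(X), batch_size):
            batch = idx[start:start + batch_size]
            xb, yb = X[batch, 0], y[batch]
            grad = np.mean(2 * (w * xb - yb) * xb)        # dMSE/dw on the mini-batch
            w = w - lr * grad                             # w_new = w_old - lr * gradient
    print(w)                                              # close to 3.0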

Deep Learning Architectures
Feedforward Neural Networks (FNN)
• Simplest type of neural network
• Information flows in one direction (no loops)
• Used for structured data and tabular datasets

Convolutional Neural Networks (CNN)


• Specialized for grid-like data (e.g., images)
• Key components:
– Convolutional layers: Apply filters to detect features
– Pooling layers: Reduce spatial dimensions
– Fully connected layers: Final classification
• Operations:
– Convolution: Sliding window operation with filters
– Pooling: Max pooling, average pooling
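If PyTorch is available, a short sketch such as the following shows a convolutional layer followed by max pooling and checks the output shape; the 32×32 input with a 5×5 filter, stride 2, and padding 1 mirrors Problem 3 later in this guide, while the pooling layer sizes are illustrative additions:

    import torch
    import torch.nn as nn

    x = torch.randn(1, 3, 32, 32)                         # batch of one 32x32 RGB image
    conv = nn.Conv2d(in_channels=3, out_channels=16,
                     kernel_size=5, stride=2, padding=1)  # 16 filters of size 5x5
    pool = nn.MaxPool2d(kernel_size=2, stride=2)

    features = conv(x)
    print(features.shape)        # torch.Size([1, 16, 15, 15]), matching Problem 3
    print(pool(features).shape)  # torch.Size([1, 16, 7, 7])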

Recurrent Neural Networks (RNN)


• Designed for sequential data (e.g., time series, text)
• Has loops to maintain information over time
• Types:
– Simple RNN: Basic recurrent structure
– LSTM (Long Short-Term Memory): Solves the vanishing gradient problem
– GRU (Gated Recurrent Unit): Simplified version of LSTM
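If PyTorch is available, a short sketch shows an LSTM layer consuming a batch of toy sequences; the layer and sequence sizes are arbitrary choices for illustration:

    import torch
    import torch.nn as nn

    lstm = nn.LSTM(input_size=8, hidden_size=16, batch_first=True)  # one LSTM layer
    x = torch.randn(2, 5, 8)            # batch of 2 sequences, 5 time steps, 8 features each
    output, (h_n, c_n) = lstm(x)        # h_n, c_n: final hidden and cell states
    print(output.shape, h_n.shape)      # torch.Size([2, 5, 16]) torch.Size([1, 2, 16])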

Transformers
• Architecture based on self-attention mechanisms
• Excels at capturing long-range dependencies
• Used in state-of-the-art natural language processing models (e.g., BERT,
GPT)
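The heart of self-attention is scaled dot-product attention, softmax(QK^T / sqrt(d_k)) V. A minimal single-head NumPy sketch with random toy inputs, intended as an illustration rather than a full Transformer:

    import numpy as np

    def scaled_dot_product_attention(Q, K, V):
        d_k = Q.shape[-1]
        scores = Q @ K.T / np.sqrt(d_k)                          # similarity of each query to each key
        weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
        weights = weights / weights.sum(axis=-1, keepdims=True)  # row-wise softmax
        return weights @ V                                       # weighted sum of the values

    rng = np.random.default_rng(0)
    seq_len, d_model = 4, 8
    Q = K = V = rng.normal(size=(seq_len, d_model))   # self-attention: Q, K, V come from the same sequence
    print(scaled_dot_product_attention(Q, K, V).shape)  # (4, 8)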

Deep Learning Challenges


Overfitting
• Model performs well on training data but poorly on new data
• Solutions:
– More training data
– Regularization (L1, L2)
– Dropout
– Early stopping
– Data augmentation

Vanishing/Exploding Gradients
• Vanishing: Gradients become very small during backpropagation
• Exploding: Gradients become very large during backpropagation
• Solutions:
– Proper weight initialization
– Batch normalization
– Gradient clipping
– Using ReLU or Leaky ReLU activation
– Using architectures like LSTM or GRU
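One of the listed remedies, gradient clipping, can be sketched in a few lines of NumPy; the max_norm value and example gradient are arbitrary:

    import numpy as np

    def clip_by_norm(grad, max_norm=1.0):
        norm = np.linalg.norm(grad)
        if norm > max_norm:
            grad = grad * (max_norm / norm)   # rescale so the gradient norm is at most max_norm
        return grad

    g = np.array([3.0, 4.0])                  # norm 5.0
    print(clip_by_norm(g, max_norm=1.0))      # [0.6, 0.8], norm 1.0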

Computational Requirements
• Deep learning typically requires:
– Large amounts of data
– Significant computational resources (GPUs/TPUs)
– Long training times

Regularization Techniques
Dropout
• Randomly deactivates neurons during training
• Forces the network to learn redundant representations
• Typically set to 0.2-0.5 (20-50% of neurons dropped)
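A minimal NumPy sketch of (inverted) dropout applied to a layer's activations; the 0.5 rate and the toy activation values are illustrative:

    import numpy as np

    def dropout(activations, rate=0.5, training=True):
        if not training:
            return activations                      # dropout is disabled at inference time
        mask = (np.random.rand(*activations.shape) > rate).astype(float)
        return activations * mask / (1.0 - rate)    # scale up so the expected value is unchanged

    a = np.array([0.6, 1.3, 2.0, 0.9])
    print(dropout(a, rate=0.5))                     # roughly half the values are zeroed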

L1/L2 Regularization
• Adds penalty terms to the loss function
• L1: Encourages sparse weights (some weights become exactly zero)
• L2: Encourages smaller weights overall
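A minimal sketch of adding L1 and L2 penalty terms to a base loss value; the lambda coefficients and weights below are arbitrary:

    import numpy as np

    def regularized_loss(base_loss, weights, l1=0.0, l2=0.0):
        l1_penalty = l1 * np.sum(np.abs(weights))   # encourages sparse (exactly zero) weights
        l2_penalty = l2 * np.sum(weights ** 2)      # encourages small weights overall
        return base_loss + l1_penalty + l2_penalty

    w = np.array([0.5, -1.2, 0.0, 2.0])
    print(regularized_loss(base_loss=0.8, weights=w, l1=0.01, l2=0.001))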

Batch Normalization
• Normalizes the outputs of a layer for each mini-batch
• Helps with faster training and reduces internal covariate shift
• Applied before the activation function
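A minimal NumPy sketch of the batch-normalization forward pass for one mini-batch (training-time statistics only); gamma and beta are the learnable scale and shift, shown here with default values:

    import numpy as np

    def batch_norm(x, gamma=1.0, beta=0.0, eps=1e-5):
        mean = x.mean(axis=0)                       # per-feature mean over the mini-batch
        var = x.var(axis=0)                         # per-feature variance over the mini-batch
        x_hat = (x - mean) / np.sqrt(var + eps)     # normalize to zero mean, unit variance
        return gamma * x_hat + beta                 # learnable scale and shift

    batch = np.random.randn(32, 4) * 5 + 10         # 32 examples, 4 features, shifted and scaled
    out = batch_norm(batch)
    print(out.mean(axis=0).round(3), out.std(axis=0).round(3))  # roughly 0 and 1 per feature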

Early Stopping
• Stop training when validation error starts increasing
• Prevents overfitting by not training for too long
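A minimal sketch of early stopping with a patience counter, driven by a made-up sequence of per-epoch validation losses in place of a real training loop:

    val_losses = [0.90, 0.75, 0.64, 0.60, 0.61, 0.63, 0.66]   # pretend per-epoch validation loss

    best_loss, patience, wait, stop_epoch = float("inf"), 2, 0, None
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss, wait = loss, 0              # improvement: reset the patience counter
        else:
            wait += 1                              # no improvement this epoch
            if wait >= patience:
                stop_epoch = epoch                 # stop once patience is exhausted
                break
    print(best_loss, stop_epoch)                   # best 0.60, training stops at epoch 5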

Transfer Learning
What is Transfer Learning?
• Using a pre-trained model as a starting point for a new task

• Leverages knowledge learned from one task to improve performance on
another

Approaches
• Feature Extraction: Use pre-trained model as a fixed feature extractor
• Fine-Tuning: Update some or all weights of the pre-trained model for
the new task
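If PyTorch and torchvision are available, a sketch along these lines contrasts the two approaches; the ResNet-18 backbone and the 10-class output head are illustrative choices, not part of the lecture:

    import torch.nn as nn
    from torchvision import models

    # Load a model pre-trained on ImageNet
    model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)

    # Feature extraction: freeze every pre-trained weight...
    for param in model.parameters():
        param.requires_grad = False

    # ...then replace the final layer with a new head for the new task (10 classes here)
    model.fc = nn.Linear(model.fc.in_features, 10)   # only this layer will be trained

    # Fine-tuning instead: skip the freezing loop (or unfreeze some layers)
    # so the optimizer also updates the pre-trained weights, usually at a small learning rate.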

Benefits
• Requires less data for the new task
• Faster training
• Often better performance

MCQ Practice Questions


Question 1
Which activation function is most commonly used in hidden layers of
deep neural networks? - A) Sigmoid - B) Tanh - C) ReLU - D) Softmax
Answer: C) ReLU
Explanation: ReLU (Rectified Linear Unit) is the most commonly used activation function in hidden layers because it helps mitigate the vanishing gradient problem and allows for faster training.

Question 2
What is the purpose of dropout in neural networks? - A) To speed up training by skipping neurons - B) To prevent overfitting by randomly deactivating neurons - C) To initialize weights properly - D) To normalize inputs across mini-batches
Answer: B) To prevent overfitting by randomly deactivating neurons
Explanation: Dropout is a regularization technique that prevents overfitting
by randomly deactivating a percentage of neurons during each training iteration,
forcing the network to learn redundant representations.

Question 3
Which of the following neural network architectures is best suited
for image classification tasks? - A) Recurrent Neural Networks (RNN) -
B) Convolutional Neural Networks (CNN) - C) Feedforward Neural Networks
(FNN) - D) Generative Adversarial Networks (GAN)
Answer: B) Convolutional Neural Networks (CNN)

Explanation: CNNs are specifically designed for grid-like data such as images.
Their convolutional layers can detect spatial patterns and features at different
scales, making them ideal for image classification tasks.

Question 4
What problem do LSTM networks solve that simple RNNs suffer
from? - A) Slow training speed - B) Vanishing gradient problem - C) Too
many parameters - D) Inability to process sequential data
Answer: B) Vanishing gradient problem
Explanation: LSTM (Long Short-Term Memory) networks were designed to
address the vanishing gradient problem in simple RNNs by using gates that
control the flow of information, allowing them to capture long-term dependencies
in sequential data.

Question 5
In a neural network with 784 input neurons, 128 neurons in the hidden
layer, and 10 output neurons, how many weights are there between
the input and hidden layers? - A) 100,352 - B) 10,112 - C) 912 - D) 1,280
Answer: A) 100,352
Explanation: The number of weights between two layers is calculated as the
product of the number of neurons in each layer. So, between the input and
hidden layers, there are 784 × 128 = 100,352 weights.

Question 6
Which of the following is NOT a common loss function used in deep
learning? - A) Mean Squared Error - B) Binary Cross-Entropy - C) Categorical
Cross-Entropy - D) Gini Impurity
Answer: D) Gini Impurity
Explanation: Gini Impurity is a measure used in decision trees to determine
the quality of a split. It is not a loss function used in deep learning. The
common loss functions in deep learning include Mean Squared Error, Binary
Cross-Entropy, and Categorical Cross-Entropy.

Question 7
What is the output range of the sigmoid activation function? - A) (0, 1) - B) (-1, 1) - C) [0, ∞) - D) (-∞, ∞)
Answer: A) (0, 1)

Explanation: The sigmoid function is defined as f(x) = 1/(1+e^(-x)), which
always produces values between 0 and 1, making it useful for binary classification
problems where the output can be interpreted as a probability.

Question 8
Which of the following techniques helps address the “internal covariate shift” problem in deep neural networks? - A) Dropout - B) L2 Regularization - C) Batch Normalization - D) Early Stopping
Answer: C) Batch Normalization
Explanation: Batch Normalization addresses the internal covariate shift problem by normalizing the outputs of a layer for each mini-batch, which helps stabilize and accelerate the training process.

Question 9
In transfer learning, what is “fine-tuning”? - A) Adjusting the learning
rate during training - B) Updating some or all weights of a pre-trained model
for a new task - C) Selecting the optimal hyperparameters - D) Normalizing the
input data
Answer: B) Updating some or all weights of a pre-trained model for a new
task
Explanation: Fine-tuning in transfer learning refers to the process of taking a
pre-trained model and updating some or all of its weights to adapt it to a new,
related task, often with a smaller dataset.

Calculation Problems
Problem 1: Neural Network Output
Consider a simple neural network with one input layer (2 neurons), one hidden layer (3 neurons with ReLU activation), and one output layer (1 neuron with sigmoid activation). The weights and biases are as follows:
• Input to hidden: W1 = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]], b1 = [0.1, 0.2, 0.3]
• Hidden to output: W2 = [[0.7, 0.8, 0.9]], b2 = [0.5]
If the input is [1, 2], calculate the output of the network.
Solution: 1. Calculate the input to the hidden layer:
• z1 = 0.1×1 + 0.2×2 + 0.1 = 0.1 + 0.4 + 0.1 = 0.6
• z2 = 0.3×1 + 0.4×2 + 0.2 = 0.3 + 0.8 + 0.2 = 1.3
• z3 = 0.5×1 + 0.6×2 + 0.3 = 0.5 + 1.2 + 0.3 = 2.0
2. Apply ReLU activation to the hidden layer:
• a1 = max(0, 0.6) = 0.6
• a2 = max(0, 1.3) = 1.3
• a3 = max(0, 2.0) = 2.0

3. Calculate the input to the output layer:
• z_out = 0.7×0.6 + 0.8×1.3 + 0.9×2.0 + 0.5 = 0.42 + 1.04 + 1.8 + 0.5 = 3.76
4. Apply sigmoid activation to the output layer:
• a_out = 1/(1+e^(-3.76)) = 1/(1+0.023) = 1/1.023 ≈ 0.977
Therefore, the output of the network is approximately 0.977.
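The hand calculation can be verified in a few lines of NumPy using the same weights and biases:

    import numpy as np

    W1 = np.array([[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]); b1 = np.array([0.1, 0.2, 0.3])
    W2 = np.array([[0.7, 0.8, 0.9]]);                    b2 = np.array([0.5])
    x = np.array([1.0, 2.0])

    a1 = np.maximum(0, W1 @ x + b1)                 # hidden layer with ReLU -> [0.6, 1.3, 2.0]
    z2 = W2 @ a1 + b2                               # output pre-activation -> [3.76]
    print(1.0 / (1.0 + np.exp(-z2)))                # sigmoid output, approximately [0.977]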

Problem 2: Gradient Descent Update


In a neural network, a weight w has a current value of 0.5. The
gradient of the loss with respect to this weight is calculated as 0.2.
If the learning rate is 0.1, what will be the updated weight after one
step of gradient descent?
Solution: The gradient descent update rule is: w_new = w_old - learning_rate
× gradient
w_new = 0.5 - 0.1 × 0.2 = 0.5 - 0.02 = 0.48
Therefore, the updated weight will be 0.48.

Problem 3: CNN Output Size


An image of size 32×32×3 (height × width × channels) is passed
through a convolutional layer with 16 filters of size 5×5, stride of 2,
and padding of 1. What will be the dimensions of the output feature
map?
Solution: Using the formula: Output size = ((Input size - Filter size + 2 × Padding) / Stride) + 1, where the division is rounded down (floor) before adding 1.
Height: ((32 - 5 + 2×1) / 2) + 1 = floor(29 / 2) + 1 = 14 + 1 = 15
Width: ((32 - 5 + 2×1) / 2) + 1 = floor(29 / 2) + 1 = 14 + 1 = 15
Channels: Number of filters = 16
Therefore, the output feature map dimensions will be 15×15×16.
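The same arithmetic as a small Python helper (integer division performs the floor):

    def conv_output_size(input_size, filter_size, stride, padding):
        # floor((input - filter + 2*padding) / stride) + 1
        return (input_size - filter_size + 2 * padding) // stride + 1

    print(conv_output_size(32, 5, stride=2, padding=1))   # 15, so the output is 15x15x16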

Problem 4: Cross-Entropy Loss


In a binary classification problem, a model predicts a probability of
0.8 for a positive instance (actual label is 1) and a probability of
0.3 for a negative instance (actual label is 0). Calculate the binary
cross-entropy loss for these two predictions.
Solution: Binary cross-entropy loss for a single instance (using the natural logarithm) is: BCE = -[y × log(p) + (1-y) × log(1-p)]
For the positive instance (y=1, p=0.8): BCE1 = -[1 × log(0.8) + (1-1) × log(1-0.8)] = -log(0.8) = -(-0.223) = 0.223

For the negative instance (y=0, p=0.3): BCE2 = -[0 × log(0.3) + (1-0) × log(1-0.3)] = -log(0.7) = -(-0.357) = 0.357
Average binary cross-entropy loss: BCE_avg = (0.223 + 0.357) / 2 = 0.29
Therefore, the binary cross-entropy loss is 0.29.

Key Formulas to Remember


1. Neuron Output: f(w1x1 + w2x2 + … + wnxn + b)
2. Activation Functions:
• Sigmoid: f(x) = 1/(1+e^(-x))
• ReLU: f(x) = max(0, x)
• Tanh: f(x) = (e^x - e^(-x))/(e^x + e^(-x))
• Softmax: f(x_i) = e^(x_i)/Σ(e^(x_j))
3. Loss Functions:
• MSE: (1/n) × Σ(y_actual - y_predicted)²
• Binary Cross-Entropy: -(1/n) × Σ[y × log(p) + (1-y) × log(1-p)]
• Categorical Cross-Entropy: -(1/n) × Σ[Σ(y_ij × log(p_ij))]
4. Gradient Descent Update: w_new = w_old - learning_rate × gradient
5. CNN Output Size: floor((Input size - Filter size + 2 × Padding) / Stride) + 1
6. Number of Parameters:
• Between two fully connected layers: n_inputs × n_outputs + n_outputs (weights + biases)
• In a convolutional layer: (filter_height × filter_width × input_channels + 1) × num_filters
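A quick Python check of the parameter-count formulas, using the 784-to-128 layer from Question 5 and the convolutional layer from Problem 3:

    def dense_params(n_inputs, n_outputs):
        return n_inputs * n_outputs + n_outputs            # weights + biases

    def conv_params(filter_h, filter_w, in_channels, num_filters):
        return (filter_h * filter_w * in_channels + 1) * num_filters

    print(dense_params(784, 128))        # 100480 total; 784 * 128 = 100352 weights (Question 5)
    print(conv_params(5, 5, 3, 16))      # 1216 parameters for the layer in Problem 3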

Tips for MCQ Questions


1. Understand the architectures: Know which neural network architecture is appropriate for different types of data.
2. Know activation functions: Understand the properties and use cases
of different activation functions.
3. Calculate network parameters: Be able to calculate the number of
weights and biases in a network.
4. Understand regularization: Know how different regularization techniques work and when to use them.
5. Practice forward propagation: Be comfortable with calculating the
output of a simple neural network given weights and inputs.
