EE4312 Sample Quiz Questions (Solutions) AY24/25 Semester 2
Part 1 Questions
1. What is the main function of the perceptron?
a) To compute the loss function
b) To perform classification based on linear separability
c) To apply backpropagation
d) To store data for batch learning
2. Which of the following is a limitation of using only a single layer of perceptrons?
a) It uses sigmoid activation
b) It cannot classify linearly separable data
c) It cannot solve XOR problem
d) It requires a large dataset
3. What does backpropagation aim to minimize in machine learning?
a) Momentum
b) Learning rate
c) Prediction error
d) Number of epochs
4. Which of the following is NOT required for gradient descent in a dense neural network?
a) Learning rate
b) Activation function
c) Gradient of loss function
d) Noise function
5. In gradient descent, the weights are updated in which direction?
a) In the direction of the gradient
b) In the opposite direction of the gradient
c) Randomly
d) Perpendicular to the gradient
6. What is the purpose of the learning rate in training neural networks?
a) To set the activation threshold
b) To control the number of layers
c) To determine the step size of weight updates
d) To normalize inputs
7. What is a key feature of the backpropagation algorithm?
a) Random weight updates
b) Only weights in the output layer change
c) Layer-wise error propagation
d) No need for activation function
8. Which activation function is differentiable AND commonly used in backpropagation?
a) Step function
b) Linear function
c) Sigmoid function
d) ReLU
9. In a feedforward neural network, how is data processed?
a) Randomly through layers
b) Forward only, without feedback loops
c) Backward only
d) Bidirectionally
10. Which of the following is most likely to lead to a good test loss in multi-layer
perceptrons?
a) Large learning rates
b) Overfitting the training data
c) Using hidden layers and appropriate training
d) Removing hidden layers
11. What is the main reason for using hidden layers in a neural network?
a) To make the network slower
b) To enable learning of non-linear functions
c) To increase data redundancy
d) To simplify backpropagation
12. Why is weight initialization important in training a neural network?
a) It has no effect on convergence
b) Poor initialization can lead to vanishing gradients
c) All weights must be initialized to zero
d) It prevents overfitting
13. The sigmoid activation is given by 𝜎(𝑣) = 1/(1 + 𝑒^(−𝑣)). What is 𝜎(0)?
a) 0
b) 0.5
c) 1
d) 𝑒
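A quick worked check: 𝜎(0) = 1/(1 + 𝑒^0) = 1/(1 + 1) = 0.5, which is option (b).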
14. During training, a weight 𝑤 is updated using gradient descent as 𝑤(𝑡) = 𝑤(𝑡 − 1) − 𝜂 𝜕𝐸/𝜕𝑤. If 𝜂 = 0.1 and 𝜕𝐸/𝜕𝑤 = 0.4, what is the weight update Δ𝑤?
a) 0.04
b) -0.04
c) 0.4
d) -0.4
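A quick worked check: Δ𝑤 = −𝜂 𝜕𝐸/𝜕𝑤 = −0.1 × 0.4 = −0.04, which is option (b).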
15. The mean squared error is given by 𝐸 = (1/(𝑁𝑃)) ∑_{𝑛=1}^{𝑁} ∑_{𝑝=1}^{𝑃} (𝑑_{𝑝,𝑛} − 𝑦_{𝑝,𝑛})², where 𝑁 is the
number of samples and 𝑃 is the number of dimensions. Given the following desired
output 𝐝 and output 𝐲, calculate the mean squared error. The number of rows
corresponds to the number of samples, while the number of columns corresponds to
the number of dimensions at the output.

𝐝 = [0 1; 1 2], 𝐲 = [0 1; 0 1]
a) [0; 2]
b) 0.5
c) [1 1]
d) 1
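A quick NumPy check of this calculation (an illustrative sketch; the variable names are mine):

```python
import numpy as np

# Rows are samples (N = 2), columns are output dimensions (P = 2)
d = np.array([[0, 1],
              [1, 2]])   # desired output
y = np.array([[0, 1],
              [0, 1]])   # network output

# Mean squared error averaged over all N * P entries
mse = np.mean((d - y) ** 2)
print(mse)  # 0.5 -> option (b)
```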
Part 2 Questions
16. Determine the output 𝐲 of the linear convolution between the input 𝐱 and filter 𝐤 below.
The filter is moved with stride-1. You can assume that the kernel moves across the row
from left to right before moving to the next row.
𝐱 = [1 2; 0 1; 0 0], 𝐤 = [1 0; 0 1], 𝐲 = 𝐱 ∗ 𝐤
a) 𝐲 = [2 0]
b) 𝐲 = [2; 0]
c) 𝐲 = [1 2; 0 1; 0 0]
d) 𝐲 = [2 0; 0 1]
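The sliding-window products above can be checked with a short sketch. In CNN practice, "convolution" is computed as a sliding dot product (cross-correlation) without flipping the kernel; for the kernel used here the distinction vanishes, since it is unchanged by a 180° rotation. The helper below is an illustrative sketch (the name conv2d and its stride/padding parameters are mine, not course code); the optional stride and right/bottom zero-padding arguments let the same function be applied to the stride-2 questions that follow.

```python
import numpy as np

def conv2d(x, k, stride=1, pad_bottom=0, pad_right=0):
    # Sliding dot product ("convolution" in the CNN sense), with optional
    # zero padding on the right and bottom of the input.
    x = np.pad(x, ((0, pad_bottom), (0, pad_right)))
    kh, kw = k.shape
    out_h = (x.shape[0] - kh) // stride + 1
    out_w = (x.shape[1] - kw) // stride + 1
    y = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            y[i, j] = np.sum(x[i*stride:i*stride+kh, j*stride:j*stride+kw] * k)
    return y

x = np.array([[1, 2], [0, 1], [0, 0]])
k = np.array([[1, 0], [0, 1]])
print(conv2d(x, k))  # [[2.] [0.]] -> the 2x1 column [2; 0], option (b)
```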
17. Determine the output 𝐲 of the linear convolution between the input 𝐱 and filter 𝐤 below.
The filter is moved with stride-2 along both dimensions. The input 𝐱 is automatically
padded with zeros to the right and bottom to account for the stride. You can assume
that the kernel moves across the row from left to right before moving to the next row.
𝐱 = [1 2 3; 0 1 2; 0 0 1], 𝐤 = [1 0; 0 1], 𝐲 = 𝐱 ∗ 𝐤
a) 𝐲 = [2]
b) 𝐲 = [2; 0]
c) 𝐲 = [2 3; 0 2]
d) 𝐲 = [2 4; 0 2]
18. Consider the input 𝐱 and the convolutional filter 𝐤 below. The filter 𝐤 is moved with a
stride of 1 along both dimensions. A 2 × 2 max-pooling filter 𝐩 is applied to the linear
convolutional output 𝐱 ∗ 𝐤, also with a stride of 1 along both dimensions. Determine the
output 𝐲 after both operations. You can assume that the kernel moves across the row
from left to right before moving to the next row.

𝐱 = [1 2 3 4; 0 1 2 3; −1 0 1 2], 𝐤 = [1 0; 0 1]
a) 𝐲 = [4 8]
b) 𝐲 = [2 4]
c) 𝐲 = [2 4 6; 0 2 4]
d) 𝐲 = [4 6]
19. Consider the input 𝐱 and the convolutional filter 𝐤 below. The filter 𝐤 is moved with a
stride of 1 along both dimensions. A 2 × 2 average-pooling filter 𝐩 is applied to the linear
convolutional output 𝐱 ∗ 𝐤, also with a stride of 1 along both dimensions. Determine the
output 𝐲 after both operations. You can assume that the kernel moves across the row
from left to right before moving to the next row.

𝐱 = [1 2 3; 0 1 2; −1 0 1; −2 −1 0], 𝐤 = [1 0; 0 1]
a) 𝐲 = [2; 0]
b) 𝐲 = [2 0]
c) 𝐲 = [4 0]
d) 𝐲 = [4 2]
20. Consider the input 𝐱 and the convolutional filter 𝐤 below. The filter 𝐤 is moved with a
stride of 1 along both dimensions. The output is subsequently non-linearized with the
ReLU function. A 2 × 2 average-pooling filter 𝐩 is applied to the non-linear output, also
with a stride of 1 along both dimensions. Determine the output 𝐲 after all operations.
You can assume that the kernel moves across the row from left to right before moving
to the next row.

𝐱 = [1 2 3; 0 1 2; −1 0 1; −2 −1 0], 𝐤 = [1 0; 0 1], ReLU(𝑣) = max(0, 𝑣)
a) 𝐲 = [2; 0]
b) 𝐲 = [2 0]
c) 𝐲 = [2; 0.5]
d) 𝐲 = [2 0.5]
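Under the same conventions, the conv → ReLU → average-pool pipeline of Question 20 can be verified end to end, and the same two helpers cover the pooling in Questions 18 and 19. As before, this is an illustrative sketch and the helper names are mine:

```python
import numpy as np

def conv2d(x, k):
    # Stride-1 sliding dot product, as in the Question 16 sketch
    kh, kw = k.shape
    return np.array([[np.sum(x[i:i+kh, j:j+kw] * k)
                      for j in range(x.shape[1] - kw + 1)]
                     for i in range(x.shape[0] - kh + 1)])

def pool2d(x, size=2, stride=1, mode="avg"):
    # 2-D pooling with a chosen reduction ("avg" or "max")
    reduce_fn = np.mean if mode == "avg" else np.max
    return np.array([[reduce_fn(x[i:i+size, j:j+size])
                      for j in range(0, x.shape[1] - size + 1, stride)]
                     for i in range(0, x.shape[0] - size + 1, stride)])

x = np.array([[1, 2, 3], [0, 1, 2], [-1, 0, 1], [-2, -1, 0]])
k = np.array([[1, 0], [0, 1]])
z = np.maximum(0, conv2d(x, k))   # linear convolution, then ReLU
print(pool2d(z, mode="avg"))      # [[2. ] [0.5]] -> option (c)
```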
21. Determine the minimum amount of padding that is required to obtain a 64 × 64 output
from a 128 × 128 input, for a 5 × 5 convolutional filter with stride 2.
a) Pad 1 row and 1 column.
b) Pad 2 rows and 2 columns.
c) Pad 3 rows and 3 columns.
d) Pad 4 rows and 4 columns.
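Questions 21 to 26 all follow the standard output-size relation, applied per dimension: output = ⌊(input + padding − filter) / stride⌋ + 1. For Question 21 this gives ⌊(128 + 𝑝 − 5)/2⌋ + 1 = 64, which requires 128 + 𝑝 − 5 ≥ 126, so the minimum total padding is 𝑝 = 3 in each dimension, option (c).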
22. Determine the smallest convolutional filter size that is required for the output to be
31 × 31, if the input is 64 × 64, assuming the least amount of padding. The filter stride
is 2.
a) 2 × 2 kernel.
b) 3 × 3 kernel.
c) 4 × 4 kernel.
d) 5 × 5 kernel.
23. Determine the largest convolutional filter size that is required for the output to be
112 × 112, if the input is 448 × 448, assuming the least amount of padding. The filter
stride is 4.
a) 2 × 2 kernel.
b) 3 × 3 kernel.
c) 4 × 4 kernel.
d) 5 × 5 kernel.
24. An input image has a size of 32 × 32. Determine the output size if a 3 × 3 pooling filter is
applied with stride 2 and no padding.
a) 12 × 12.
b) 13 × 13.
c) 14 × 14.
d) 15 × 15.
25. An input image has a size of 448 × 448. Determine the output size if a 7 × 7
convolutional filter is first applied with stride 2, followed by a 2 × 2 max-pooling filter
with stride 2.
a) 220 × 220.
b) 219 × 219.
c) 110 × 110.
d) 109 × 109.
26. An input image has a size of 512 × 1024. A convolutional layer with a 5 × 5 filter 𝐤 is
applied, followed by a max-pooling layer with a 3 × 3 pooling filter 𝐩. The output image is
128 × 256. Accounting for possible padding, what are the possible strides for 𝐤 and 𝐩?
a) Both filters move in strides of 2.
b) 𝐤 has a stride of 2, 𝐩 has a stride of 1.
c) 𝐤 has a stride of 3, 𝐩 has a stride of 1.
d) Both filters move in strides of 4.
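The relation above can also be checked mechanically. Below is a small helper (an illustrative sketch; the padding values used for Question 26 are one consistent choice, not stated in the question):

```python
import math

def out_size(n, f, stride, pad=0):
    # Standard conv/pool output-size relation, applied per dimension
    return math.floor((n + pad - f) / stride) + 1

# Question 26, testing option (a): strides of 2 for both filters,
# with pad=3 for the 5x5 conv and pad=1 for the 3x3 pool.
h = out_size(512, 5, stride=2, pad=3)    # 512 -> 256
w = out_size(1024, 5, stride=2, pad=3)   # 1024 -> 512
print(out_size(h, 3, stride=2, pad=1),   # 256 -> 128
      out_size(w, 3, stride=2, pad=1))   # 512 -> 256
```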
27. How is the error propagated across a max-pooling layer?
a) Averaging the error gradient across all input.
b) Across the selected input for pooling.
c) Using stochastic gradient descent.
d) By applying the transpose of the convolution operation in the pooling layer.
28. Consider the diagram of the original YOLO architecture, specifically its second block,
which has an input size of 112 × 112 × 192 and consists of a convolutional layer and a
pooling layer. What is the minimum amount of padding in either dimension at the
convolutional layer, provided that no padding occurs at the block’s pooling layer?
a) 0
b) 1
c) 3
d) 5
29. In order to perform image classification, what is required at the end of a convolutional
neural network?
a) Fully-connected layers.
b) Long-short-term-memory.
c) Convolutional layers.
d) Pooling layers.
30. What is the purpose of convolutional layers in a convolutional neural network in object
detection? Select the most appropriate choice.
a) Identifying the class of the object.
b) Feature detection.
c) Decoding the latent features.
d) Determining the number of classes.
31. Select the types of input that recurrent neural networks can work with.
i. Variable-length input.
ii. Fixed-length input.
iii. Time-series input.
iv. Multiple-dimensional input.
a) i.
b) i, iii.
c) i, iii, iv.
d) All of the above.
32. What is the main advantage of the bi-directional recurrent neural network?
a) To more effectively learn the meaning of an earlier part of a variable length input.
b) To more effectively learn the meaning of a later part of a variable length input.
c) To allow for variable length input.
d) To learn about fixed-length input.
33. What is the primary advantage of layer normalization in recurrent neural networks?
a) To normalize the input.
b) To alleviate the problem of vanishing and exploding gradients.
c) To normalize the output of a layer with min-max scaling.
d) To drop some output in a layer in order to regularize the learning process.
34. What is the purpose of the cell state in long short-term memory (LSTM)?
a) A probabilistic gate for forgetting.
b) To gate the amount of input into the next cell-state.
c) To provide a residual connection to the neuron at many time steps in the past.
d) To remember past input which is many steps away from the current input.
35. Consider the following recurrent neural network. What is the error gradient 𝜕𝐸/𝜕𝑤_{2,2} after
back-propagation, supposing the last input is at 𝑡 = 2?

[Diagram: a two-neuron recurrent network. The input 𝑥^{(𝑡)} feeds ℎ_1^{(𝑡)} through 𝑤_{1,𝑥}^{(𝑡)}, ℎ_1^{(𝑡)} feeds ℎ_2^{(𝑡)} through 𝑤_{2,1}^{(𝑡)}, and ℎ_2^{(𝑡)} produces the output 𝑦^{(𝑡)} through 𝑤_{𝑦,2}^{(𝑡)}; the recurrent weights 𝑤_{1,1} and 𝑤_{2,2} carry ℎ_1^{(𝑡−1)} and ℎ_2^{(𝑡−1)} forward across time steps.]

a) 𝜕𝐸/𝜕𝑤_{2,2} = (𝜕𝐸/𝜕𝑦^{(2)}) (𝜕𝑦^{(2)}/𝜕ℎ_2^{(2)}) (𝜕ℎ_2^{(2)}/𝜕𝑤_{2,2}^{(1)})
b) 𝜕𝐸/𝜕𝑤_{2,2} = ∑_{𝑡=1}^{2} (𝜕𝐸/𝜕𝑦^{(𝑡)}) (𝜕𝑦^{(𝑡)}/𝜕ℎ_2^{(𝑡)}) (𝜕ℎ_2^{(𝑡)}/𝜕𝑤_{2,2}^{(𝑡−1)})
c) 𝜕𝐸/𝜕𝑤_{2,2} = ∑_{𝑡=1}^{3} (𝜕𝐸/𝜕𝑦^{(𝑡)}) (𝜕𝑦^{(𝑡)}/𝜕ℎ_2^{(𝑡)}) (𝜕ℎ_2^{(𝑡)}/𝜕𝑤_{2,2}^{(𝑡−1)})
d) 0
36. To mimic unconstrained matrix factorization, what should the activation functions in an
autoencoder be?
a) Linear.
b) Hyperbolic tangent.
c) Sigmoid.
d) ReLU.
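As an illustration of why linear activations make an autoencoder mimic unconstrained matrix factorization: with identity activations the reconstruction is 𝐗 𝐖_enc 𝐖_dec, a product of two unconstrained matrices, so minimizing reconstruction error is exactly a low-rank factorization of the data matrix. A minimal NumPy sketch (all names, sizes, and hyperparameters here are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 8))          # 100 samples, 8-dimensional input

# Linear autoencoder: encode to 3 dimensions, decode back, no nonlinearity
W_enc = rng.normal(scale=0.1, size=(8, 3))
W_dec = rng.normal(scale=0.1, size=(3, 8))

lr = 0.01
for _ in range(2000):
    Z = X @ W_enc                      # latent codes
    X_hat = Z @ W_dec                  # reconstruction: product of two matrices
    err = X_hat - X                    # drives the squared-error gradient
    W_dec -= lr * Z.T @ err / len(X)
    W_enc -= lr * X.T @ (err @ W_dec.T) / len(X)

print(np.mean((X - X @ W_enc @ W_dec) ** 2))  # approaches the best rank-3 fit
```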
37. What is the primary reason that autoencoders can be considered self-supervised
techniques?
a) Dimensionality reduction.
b) Essential information is captured in an input data.
c) The input and output are typically the same.
d) The output requires labels.
38. What does the Kullback-Leibler loss do in a variational autoencoder?
a) Ensures good reconstruction of generated data.
b) Reduces the dimensionality of data.
c) Ensures that the learnt distribution parameters are as close to the prior distribution
of the decoder as possible.
d) Pushes the learnt distribution parameters away from the origin in the latent space.
39. What is the purpose of the reconstruction loss in a variational autoencoder?
a) Ensures good reconstruction of generated data.
b) Reduces the dimensionality of data.
c) Ensures that the learnt distribution parameters are as close to the prior distribution
as possible.
d) Pushes the learnt distribution parameters away from the origin in the latent space.