NATIONAL INSTITUTE OF TECHNOLOGY ROURKELA
ASSIGNMENT-1 Subject Name: Deep Learning
Submission Deadline: 18th February 2024
1. What is deep learning (DL)? Explain the relationship between artificial intelligence
(AI), machine learning (ML) and DL using a Venn diagram.
2. What do you mean by overfitting and underfitting? In DL, the chance of _________ is high.
3. What do you mean by training a DL model? Explain the backpropagation algorithm [using the gradient descent update rule].
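As a study aid for question 3, here is a minimal sketch of one backpropagation step for a single sigmoid neuron trained on squared loss; all names (eta, w, b) and values are illustrative, not taken from the assignment.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy setup: one sigmoid neuron, squared loss L = 0.5 * (y_hat - y)**2
x = np.array([0.5, -1.0])   # single training example
y = 1.0                      # target output
w = np.array([0.1, 0.2])     # initial weights
b = 0.0                      # initial bias
eta = 0.1                    # learning rate

# Forward pass
z = w @ x + b
y_hat = sigmoid(z)

# Backward pass (chain rule): dL/dw = (y_hat - y) * sigmoid'(z) * x
delta = (y_hat - y) * y_hat * (1.0 - y_hat)
grad_w = delta * x
grad_b = delta

# Gradient descent update rule: parameter <- parameter - eta * gradient
w -= eta * grad_w
b -= eta * grad_b
```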
4. Why is the loss function in regression problems chosen as the squared loss instead of the absolute or raw (signed) loss?
5. Which loss functions are usually used in a DL model for classification and regression problems?
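As context for question 5, a minimal numpy sketch of the two losses most commonly paired with these tasks (cross-entropy for classification, mean squared error for regression); the function names and test values are illustrative.

```python
import numpy as np

def cross_entropy(y_true, p_pred, eps=1e-12):
    # Categorical cross-entropy: the usual classification loss
    return -np.sum(y_true * np.log(p_pred + eps))

def mse(y_true, y_pred):
    # Mean squared error: the usual regression loss
    return np.mean((y_true - y_pred) ** 2)

print(cross_entropy(np.array([0, 1, 0]), np.array([0.2, 0.7, 0.1])))
print(mse(np.array([1.0, 2.0]), np.array([0.9, 2.2])))
```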
6. What do you mean by convex and non-convex error functions? In a neural network, the error function is usually ___________.
7. Why are optimizers used in DL? Name four optimizers.
8. What is an activation function? Why are activation functions used in an artificial neuron?
9. Why is a bias component used in an artificial neuron?
10. What is the learning rate? What happens if the learning rate is too high or too low?
11. What is the difference between gradient descent, stochastic gradient descent and mini-batch stochastic gradient descent?
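A minimal sketch contrasting the three variants in question 11 on a least-squares loss; the only difference is how many examples feed each gradient estimate. All names and sizes here are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
X, y = rng.normal(size=(100, 3)), rng.normal(size=100)
w = np.zeros(3)
eta = 0.01

def grad(w, Xb, yb):
    # Gradient of mean squared error on the given batch
    return 2.0 * Xb.T @ (Xb @ w - yb) / len(yb)

# Batch gradient descent: the full dataset per step
w -= eta * grad(w, X, y)

# Stochastic gradient descent: one random example per step
i = rng.integers(len(y))
w -= eta * grad(w, X[i:i+1], y[i:i+1])

# Mini-batch SGD: a small random subset per step
idx = rng.choice(len(y), size=16, replace=False)
w -= eta * grad(w, X[idx], y[idx])
```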
12. If the gradient of a function at a point is zero, what can you say regarding the point?
13. Design a McCulloch-Pitts neuron to model a 4-bit logical AND problem.
14. Design a McCulloch-Pitts neuron to model a 3-bit logical OR problem.
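For questions 13 and 14, a minimal sketch of a McCulloch-Pitts threshold unit with unit weights; thresholds 4 (4-bit AND) and 1 (3-bit OR) are one consistent choice, not the only one.

```python
def mp_neuron(inputs, threshold):
    # McCulloch-Pitts unit: fire (1) iff the sum of binary inputs meets the threshold
    return 1 if sum(inputs) >= threshold else 0

# 4-bit AND: fires only when all four inputs are 1 -> threshold 4
assert mp_neuron([1, 1, 1, 1], threshold=4) == 1
assert mp_neuron([1, 1, 0, 1], threshold=4) == 0

# 3-bit OR: fires when at least one input is 1 -> threshold 1
assert mp_neuron([0, 0, 1], threshold=1) == 1
assert mp_neuron([0, 0, 0], threshold=1) == 0
```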
15. What do you mean by linear separability? Give examples of two linearly separable problems.
16. What is the difference between the McCulloch-Pitts neuron and the classical perceptron model?
17. If the sets P and N are finite and linearly separable, prove that the perceptron learning algorithm updates the weight vector w_t a finite number of times such that the two sets are separated.
18. Solve the 2-bit Ex-OR problem using a 3-layer (input, hidden and output) perceptron.
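One well-known hand-set solution sketch for question 18, using step activations: hidden unit h1 computes OR, h2 computes AND, and the output fires on h1 AND NOT h2. The particular weights and thresholds below are illustrative; many other choices work.

```python
def step(z):
    return 1 if z >= 0 else 0

def xor_net(x1, x2):
    # Hidden layer: h1 = OR(x1, x2), h2 = AND(x1, x2)
    h1 = step(x1 + x2 - 0.5)
    h2 = step(x1 + x2 - 1.5)
    # Output fires iff h1 = 1 and h2 = 0, i.e. exactly one input is 1
    return step(h1 - h2 - 0.5)

for a in (0, 1):
    for b in (0, 1):
        assert xor_net(a, b) == (a ^ b)
```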
19. Explain the weight updating equations for the following algorithms (standard reference forms are sketched after this list):
a. Gradient Descent
b. Gradient Descent with momentum
c. Nesterov Accelerated Gradient Descent
d. Adagrad
e. RMSProp
f. Adam
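For reference, a sketch of the standard textbook forms of these updates, with learning rate $\eta$, gradient $g_t = \nabla_\theta L(\theta_t)$, and conventional hyperparameter symbols ($\beta$, $\rho$, $\epsilon$); exact notation and the placement of $\epsilon$ vary across sources.

```latex
\begin{align*}
\text{(a) GD:} \quad & \theta_{t+1} = \theta_t - \eta\, g_t \\
\text{(b) Momentum:} \quad & v_t = \beta v_{t-1} + g_t, \qquad \theta_{t+1} = \theta_t - \eta\, v_t \\
\text{(c) NAG:} \quad & v_t = \beta v_{t-1} + \nabla_\theta L(\theta_t - \eta \beta v_{t-1}), \qquad \theta_{t+1} = \theta_t - \eta\, v_t \\
\text{(d) Adagrad:} \quad & G_t = G_{t-1} + g_t^2, \qquad \theta_{t+1} = \theta_t - \frac{\eta}{\sqrt{G_t} + \epsilon}\, g_t \\
\text{(e) RMSProp:} \quad & s_t = \rho\, s_{t-1} + (1-\rho)\, g_t^2, \qquad \theta_{t+1} = \theta_t - \frac{\eta}{\sqrt{s_t} + \epsilon}\, g_t \\
\text{(f) Adam:} \quad & m_t = \beta_1 m_{t-1} + (1-\beta_1)\, g_t, \quad v_t = \beta_2 v_{t-1} + (1-\beta_2)\, g_t^2, \\
& \hat m_t = \frac{m_t}{1-\beta_1^{\,t}}, \quad \hat v_t = \frac{v_t}{1-\beta_2^{\,t}}, \quad \theta_{t+1} = \theta_t - \frac{\eta\, \hat m_t}{\sqrt{\hat v_t} + \epsilon}
\end{align*}
```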
20. You are using an Adam optimizer. Show why the bias correction naturally disappears when the number of steps used to compute the exponential moving averages gets large.
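A hint for question 20, in the standard Adam notation where $\hat m_t$ and $\hat v_t$ are the bias-corrected moving averages: since the decay rates satisfy $0 \le \beta_1, \beta_2 < 1$, the correction denominators tend to 1 as $t$ grows.

```latex
\hat m_t = \frac{m_t}{1-\beta_1^{\,t}}, \qquad \hat v_t = \frac{v_t}{1-\beta_2^{\,t}},
\qquad \beta_1^{\,t},\, \beta_2^{\,t} \xrightarrow{\;t\to\infty\;} 0
\;\Rightarrow\; \hat m_t \to m_t,\ \hat v_t \to v_t .
```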
21. A 1-hidden-layer neural network with 5 hidden neurons, 5 inputs and 5 outputs, with a bias at each neuron, will have how many total parameters?
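A counting sketch for question 21, assuming a fully connected 5-5-5 network with one bias per hidden and output neuron (weights plus biases per layer):

```latex
\underbrace{(5\times 5 + 5)}_{\text{input}\to\text{hidden}} \;+\; \underbrace{(5\times 5 + 5)}_{\text{hidden}\to\text{output}} \;=\; 30 + 30 \;=\; 60
```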
22. You come across a nonlinear function that passes 1 if its input is nonnegative, else evaluates to 0, i.e. f(x) = 1 if x ≥ 0 and f(x) = 0 otherwise.
A friend recommends you use this non-linearity in your convolutional neural network
with the Adam optimizer. Would you follow their advice? Why or why not?
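A quick numerical observation relevant to question 22: the unit step has zero derivative everywhere except at its discontinuity, so a gradient-based optimizer such as Adam receives no learning signal through it. A minimal sketch (names illustrative):

```python
import numpy as np

def step(z):
    # Unit step: 1 for nonnegative input, 0 otherwise
    return (z >= 0).astype(float)

# Central finite-difference derivative is zero away from z = 0
z = np.linspace(-2, 2, 9)
h = 1e-6
deriv = (step(z + h) - step(z - h)) / (2 * h)
print(deriv)  # zeros everywhere except at the jump itself
```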
23. You want to use the figure below to explain the concept of early stopping
to a friend. Fill in the blanks. (1) and (2) describe the axes. (3) and (4) describe
values on the vertical and horizontal axis. (5) and (6) describe the curves. Be precise.
24. Consider the following neural network architecture with a sigmoid activation function at every neuron and the given initial weights, inputs and outputs. Compute the values of W1, W5 and W6 after 1 epoch of the backpropagation algorithm using the stochastic gradient descent weight update rule.
25. You are training a deep neural network on a large-scale dataset, and you are thinking of using gradient descent as your optimization algorithm. Which of the following is true?
a. It is possible for Stochastic Gradient Descent to converge faster than Batch
Gradient Descent.
b. It is possible for Mini Batch Gradient Descent to converge faster than Stochastic
Gradient Descent.
c. It is possible for Mini Batch Gradient Descent to converge faster than Batch
Gradient Descent.
d. It is possible for Batch Gradient Descent to converge faster than Stochastic
Gradient Descent.
26. During backpropagation, as the gradient flows backward through a sigmoid
non-linearity, the gradient will always:
a. Increase in magnitude, maintain polarity
b. Increase in magnitude, reverse polarity
c. Decrease in magnitude, maintain polarity
d. Decrease in magnitude, reverse polarity
27. How does splitting a dataset into train, validation and test sets help identify
overfitting?
28. Which of the following is/are true? If any statement is false, make it true without
diverting from the context of the question.
a. Simple models have low bias and high variance.
b. Simple models have high bias and low variance.
c. High bias and high variance are the desirable characteristics of a good model.
d. Bias contributes to error while variance does not contribute to error.
e. In overfitting, train error is low and test error is also low.
f. In underfitting, train error is high and test error is also high.
g. In the NAG optimizer, the gradient term is first added to the old weight and then the momentum term is added.
h. Bias correction is used in RMSProp.
i. Gradually decreasing the learning rate over iterations alone will work fine on a dataset having both sparse and dense important features.