CNN MCQs
7.2 Downsampling
7.3 Upsampling
Which method is mentioned for scaling channels during upsampling?
a) Bilinear interpolation
b) Nearest neighbor interpolation
c) Gaussian blur
d) Median filtering
Answer: b
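For intuition, here is a minimal nearest-neighbor upsampling sketch in NumPy (the 2x2 array is an arbitrary example): each channel is scaled independently by simply repeating values along both spatial axes.

```python
import numpy as np

x = np.array([[1, 2],
              [3, 4]])

# Nearest-neighbor upsampling by a factor of 2: repeat each value along both axes.
up = np.repeat(np.repeat(x, 2, axis=0), 2, axis=1)
print(up)
# [[1 1 2 2]
#  [1 1 2 2]
#  [3 3 4 4]
#  [3 3 4 4]]
```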
7.4 Architectures
What is the formula for the number of weights (including biases) in a fully connected layer that maps a W × H × C_in input to a W × H × C_out output?
a) W × H × C_out × (W × H × C_in + 1)
b) C_out × (K × K × C_in + 1)
c) W × H × C_in
d) K × K × C_out
Answer: a
How does the number of weights in a convolutional layer compare to a fully connected layer?
a) Always higher
b) Generally lower due to weight sharing
c) Equal
d) Depends only on input size
Answer: b
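A worked comparison of the two counts above, using an illustrative 32 x 32 x 64 feature map, 3 x 3 kernels, and 64 output channels (the sizes are our own choice, not from the source):

```python
W, H = 32, 32          # spatial size of the input (and output) feature map
C_in, C_out = 64, 64   # input and output channels
K = 3                  # convolution kernel size

# Fully connected layer mapping W x H x C_in to W x H x C_out (with biases):
fc_weights = (W * H * C_out) * (W * H * C_in + 1)
# Convolutional layer with K x K kernels (with biases), shared across all positions:
conv_weights = C_out * (K * K * C_in + 1)

print(f"{fc_weights:,}")    # 4,295,032,832 -- over four billion weights
print(f"{conv_weights:,}")  # 36,928        -- weight sharing makes this dramatically smaller
```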
Padding
What does a computation graph represent in the context of convolution?
a) The hardware architecture
b) The flow of data and operations
c) The loss function
d) The training dataset
Answer: b
Dilated Convolutions
Visualization
VGG Architecture
How many layers does ResNet-152, the deepest residual network commonly compared with VGG, have?
a) 50
b) 152
c) 16
d) 8
Answer: b
SegNet
General Concepts
What activation function was used in AlexNet?
a) Sigmoid
b) ReLU
c) Tanh
d) Softmax
Answer: b
Which technique does not require pooling to increase the receptive field?
a) Standard convolution
b) Dilated convolution
c) Upsampling
d) Downsampling
Answer: b
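As a sketch of the dilated-convolution idea in PyTorch (layer sizes are illustrative): a 3x3 kernel with dilation 2 covers a 5x5 neighbourhood, so the receptive field grows without pooling, and with matching padding the spatial resolution is preserved.

```python
import torch
import torch.nn as nn

x = torch.randn(1, 16, 64, 64)

# A 3x3 kernel with dilation=2 spans a 5x5 neighbourhood; padding=2 keeps the
# output at the input resolution, so no pooling is needed to grow the receptive field.
dilated = nn.Conv2d(in_channels=16, out_channels=16, kernel_size=3, dilation=2, padding=2)
print(dilated(x).shape)   # torch.Size([1, 16, 64, 64])
```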
How many convolution layers are mentioned in a simple architecture with 5x5 filters?
a) 1
b) 2
c) 3
d) 4
Answer: b
What size pooling layers are mentioned alongside 5x5 convolution layers?
a) 3x3
b) 2x2
c) 4x4
d) 1x1
Answer: b
How many fully connected layers are typically used in the architectures mentioned?
a) 0
b) 1
c) 2
d) 3
Answer: c
Which layer type is three-dimensional in nature?
a) Fully connected layer
b) Convolution kernel
c) Pooling layer
d) Dropout layer
Answer: b
What is the kernel size used in the Inception module for efficiency?
a) 3x3
b) 5x1
c) 7x7
d) 1x1
Answer: b
What is the primary goal of using varying filter sizes in Inception modules?
a) To reduce the network depth
b) To capture multi-scale features
c) To increase memory usage
d) To simplify the architecture
Answer: b
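A minimal Inception-style block in PyTorch, with channel counts chosen only for illustration (not GoogLeNet's actual configuration): parallel branches with different kernel sizes are concatenated along the channel dimension to capture multi-scale features, and 1x1 convolutions keep the larger branches cheap.

```python
import torch
import torch.nn as nn

class MiniInception(nn.Module):
    """Toy multi-branch block: 1x1, 3x3 and 5x5 paths concatenated along channels."""
    def __init__(self, in_ch: int):
        super().__init__()
        self.branch1 = nn.Conv2d(in_ch, 16, kernel_size=1)
        self.branch3 = nn.Sequential(nn.Conv2d(in_ch, 16, kernel_size=1),   # 1x1 reduction keeps it cheap
                                     nn.Conv2d(16, 16, kernel_size=3, padding=1))
        self.branch5 = nn.Sequential(nn.Conv2d(in_ch, 16, kernel_size=1),
                                     nn.Conv2d(16, 16, kernel_size=5, padding=2))

    def forward(self, x):
        return torch.cat([self.branch1(x), self.branch3(x), self.branch5(x)], dim=1)

block = MiniInception(in_ch=64)
x = torch.randn(1, 64, 28, 28)
print(block(x).shape)      # torch.Size([1, 48, 28, 28]) -- 16 channels from each branch
```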
Which year saw the introduction of very deep convolutional networks like VGG?
a) 2012
b) 2014
c) 2015
d) 2016
Answer: c
What is a key difference between FCN-8s and DeepLab as shown in the comparison?
a) FCN-8s uses dilated convolutions
b) DeepLab produces more accurate results
c) FCN-8s has no pooling
d) DeepLab uses only fully connected layers
Answer: b
What is the ground truth in the context of the image processing technique comparison?
a) The input image
b) The expected output
c) The filter kernel
d) The training data
Answer: b
Which architecture eliminates the need for large fully connected layers by using global average pooling?
a) AlexNet
b) VGG
c) Inception
d) SegNet
Answer: c
What is the main challenge with fully connected layers in deep networks?
a) Overfitting
b) High computational complexity
c) Lack of feature extraction
d) Small memory footprint
Answer: b
How does the Inception architecture reduce parameters?
a) By using larger filters
b) By employing 5x1 convolutions
c) By removing pooling layers
d) By increasing depth
Answer: b
How many pooling layers are typically used with 5x5 convolution layers in the mentioned architecture?
a) 1
b) 2
c) 3
d) 4
Answer: b
Which layer type accounts for the most parameters (and hence weight memory) in VGG?
a) Convolutional layers
b) Pooling layers
c) Fully connected layers
d) Dropout layers
Answer: c
Which technique allows predictions at the same resolution as inputs without pooling?
a) Standard convolution
b) Dilated convolution
c) Downsampling
d) Upsampling
Answer: b
In which year was the 152-layer ResNet architecture introduced?
a) 2012
b) 2014
c) 2015
d) 2016
Answer: c
What does the comparison of FCN-8s, DeepLab, and ground truth illustrate?
a) Training speed
b) Accuracy of segmentation results
c) Memory usage
d) Kernel sizes
Answer: b
How many convolution layers are in the simple architecture with 2x2 pooling?
a) 1
b) 2
c) 3
d) 4
Answer: b
What is the kernel size of the filters in the simple architecture mentioned?
a) 3x3
b) 5x5
c) 7x7
d) 1x1
Answer: b
Which technique is used to handle large receptive fields efficiently?
a) Fully connected layers
b) Dilated convolutions
c) Nearest neighbor interpolation
d) Downsampling
Answer: b
What is the primary advantage of CNNs over traditional neural networks for image data?
a) They require less data
b) They exploit spatial structure
c) They eliminate the need for activation functions
d) They are faster to train
Answer: b) They exploit spatial structure
Explanation: CNNs use convolutional layers to capture spatial hierarchies in images.
Which type of data are CNNs primarily designed to handle?
a) Tabular data
b) Sequential data
c) Grid-like data (e.g., images)
d) Unstructured text
Answer: c) Grid-like data (e.g., images)
Explanation: CNNs are optimized for 2D grid data like images.
What is a key feature that reduces the number of parameters in CNNs?
a) Fully connected layers
b) Weight sharing
c) Large filter sizes
d) Increased depth
Answer: b) Weight sharing
Explanation: Weight sharing in convolutional layers reduces parameter count.
Which layer in a CNN applies filters to the input?
a) Pooling layer
b) Convolutional layer
c) Fully connected layer
d) Output layer
Answer: b) Convolutional layer
Explanation: The convolutional layer applies filters to extract features.
What is the purpose of CNNs in computer vision tasks?
a) To perform clustering
b) To detect and classify objects
c) To normalize data
d) To reduce dimensionality
Answer: b) To detect and classify objects
Explanation: CNNs are widely used for object detection and classification.
Convolution
What does a convolution operation in a CNN do?
a) Applies a global transformation
b) Extracts local features using filters
c) Normalizes the entire input
d) Reduces the number of layers
Answer: b) Extracts local features using filters
Explanation: Convolution uses filters to detect local patterns.
What is the role of the filter (kernel) in a convolutional layer?
a) To initialize weights
b) To slide over the input and compute feature maps
c) To reduce spatial dimensions
d) To compute the loss function
Answer: b) To slide over the input and compute feature maps
Explanation: Filters generate feature maps by convolving with the input.
What happens to the output size if no padding is used in convolution?
a) It increases
b) It decreases
c) It remains the same
d) It depends on the filter size
Answer: b) It decreases
Explanation: Without padding, the output size shrinks due to boundary effects.
What is the purpose of the stride parameter in convolution?
a) To determine the filter size
b) To control the step size of the filter
c) To set the learning rate
d) To initialize weights
Answer: b) To control the step size of the filter
Explanation: Stride determines how far the filter moves across the input.
What effect does adding padding have on the convolutional output?
a) Reduces the output size
b) Preserves or increases the output size
c) Eliminates the need for filters
d) Normalizes the input
Answer: b) Preserves or increases the output size
Explanation: Padding maintains or adjusts the output dimensions.
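The padding and stride questions above all follow from the standard output-size relation out = floor((in + 2p - k) / s) + 1. A small sketch (the helper name conv_output_size is ours):

```python
def conv_output_size(in_size: int, kernel: int, stride: int = 1, padding: int = 0) -> int:
    """Spatial output size of a convolution: floor((in + 2p - k) / s) + 1."""
    return (in_size + 2 * padding - kernel) // stride + 1

# No padding: a 5x5 filter on a 32x32 input shrinks the output to 28x28.
print(conv_output_size(32, kernel=5))                        # 28
# 'Same' padding (p = k // 2) with stride 1 preserves the 32x32 size.
print(conv_output_size(32, kernel=5, padding=2))             # 32
# A larger stride reduces the output size.
print(conv_output_size(32, kernel=5, padding=2, stride=2))   # 16
```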
Downsampling
What is the primary purpose of downsampling in CNNs?
a) To increase the number of parameters
b) To reduce spatial dimensions
c) To add noise to the data
d) To increase the learning rate
Answer: b) To reduce spatial dimensions
Explanation: Downsampling reduces computational load and overfitting.
Which technique is commonly used for downsampling in CNNs?
a) Max pooling
b) Dropout
c) Batch normalization
d) Gradient clipping
Answer: a) Max pooling
Explanation: Max pooling selects the maximum value to downsample.
What is the effect of downsampling on the spatial resolution of feature maps?
a) Increases resolution
b) Decreases resolution
c) Maintains resolution
d) Eliminates resolution
Answer: b) Decreases resolution
Explanation: Downsampling reduces the spatial size of feature maps.
What is a disadvantage of aggressive downsampling in CNNs?
a) Increased training speed
b) Loss of fine details
c) Reduced memory usage
d) Improved accuracy
Answer: b) Loss of fine details
Explanation: Excessive downsampling can discard important features.
Which pooling method takes the average value in a region?
a) Max pooling
b) Average pooling
c) Global pooling
d) Sum pooling
Answer: b) Average pooling
Explanation: Average pooling computes the mean value in a region.
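A small PyTorch illustration of the two pooling variants on an arbitrary 4x4 feature map: 2x2 pooling with stride 2 halves each spatial dimension, with max pooling keeping the largest value per region and average pooling the mean.

```python
import torch
import torch.nn.functional as F

# A single-channel 4x4 feature map (batch and channel dims added for PyTorch).
x = torch.tensor([[ 1.,  2.,  3.,  4.],
                  [ 5.,  6.,  7.,  8.],
                  [ 9., 10., 11., 12.],
                  [13., 14., 15., 16.]]).reshape(1, 1, 4, 4)

# 2x2 pooling with stride 2 halves each spatial dimension (4x4 -> 2x2).
print(F.max_pool2d(x, kernel_size=2))   # [[6, 8], [14, 16]]       -- max of each 2x2 region
print(F.avg_pool2d(x, kernel_size=2))   # [[3.5, 5.5], [11.5, 13.5]] -- mean of each region
```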
Upsampling
What is the primary purpose of upsampling in CNNs?
a) To reduce spatial dimensions
b) To increase spatial dimensions
c) To normalize input data
d) To initialize weights
Answer: b) To increase spatial dimensions
Explanation: Upsampling is used to reconstruct or enlarge feature maps.
Which technique is commonly used for upsampling in CNNs?
a) Transposed convolution
b) Max pooling
c) Dropout
d) Batch normalization
Answer: a) Transposed convolution
Explanation: Transposed convolution upsamples feature maps.
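A minimal PyTorch sketch of learned upsampling with a transposed convolution (channel and input sizes are illustrative): a stride-2 transposed convolution roughly doubles the spatial size, following out = (in - 1)·s - 2p + k.

```python
import torch
import torch.nn as nn

# A learned upsampling layer: stride-2 transposed convolution doubles the spatial size here.
upsample = nn.ConvTranspose2d(in_channels=64, out_channels=32,
                              kernel_size=2, stride=2)

x = torch.randn(1, 64, 16, 16)        # (batch, channels, height, width)
y = upsample(x)
print(y.shape)                        # torch.Size([1, 32, 32, 32])
```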
What is the effect of upsampling on the spatial resolution of feature maps?
a) Decreases resolution
b) Increases resolution
c) Maintains resolution
d) Eliminates resolution
Answer: b) Increases resolution
Explanation: Upsampling enlarges the spatial size of feature maps.
In which task is upsampling commonly used in CNNs?
a) Image classification
b) Image segmentation
c) Object detection
d) Data augmentation
Answer: b) Image segmentation
Explanation: Upsampling helps reconstruct pixel-level predictions in segmentation.
Architectures
Which CNN architecture won the ImageNet competition in 2012?
a) VGG
b) AlexNet
c) ResNet
d) Inception
Answer: b) AlexNet
Explanation: AlexNet’s success revitalized CNN research.
What is a key feature of the VGG architecture?
a) Skip connections
b) Small 3x3 filters
c) Parallel convolutions
d) No pooling layers
Answer: b) Small 3x3 filters
Explanation: VGG uses stacks of small filters for depth.
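A quick parameter count shows why stacking small filters pays off: two stacked 3x3 convolutions cover the same 5x5 receptive field as a single 5x5 convolution but with fewer weights and an extra non-linearity. A sketch with an arbitrary channel count of 64:

```python
C = 64  # assume the same number of input and output channels throughout

two_3x3 = 2 * (3 * 3 * C) * C     # two stacked 3x3 conv layers (biases ignored)
one_5x5 = (5 * 5 * C) * C         # one 5x5 conv layer covering the same 5x5 receptive field

print(two_3x3, one_5x5)           # 73728 vs 102400 -> ~28% fewer weights, plus an extra non-linearity
```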
Which architecture introduced the concept of inception modules?
a) AlexNet
b) VGG
c) GoogLeNet
d) ResNet
Answer: c) GoogLeNet
Explanation: Inception modules combine multiple filter sizes.
What is a key innovation in ResNet architectures?
a) Use of large filters
b) Skip connections
c) Absence of activation functions
d) Fixed stride
Answer: b) Skip connections
Explanation: Skip connections mitigate vanishing gradients.
Which architecture is known for its depth and residual learning?
a) AlexNet
b) VGG
c) ResNet
d) Inception
Answer: c) ResNet
Explanation: ResNet uses residual blocks for deep networks.
What is a disadvantage of very deep CNN architectures?
a) Increased accuracy
b) Vanishing gradients
c) Reduced training time
d) Smaller model size
Answer: b) Vanishing gradients
Explanation: Depth can cause gradient issues without mitigation.
Which architecture uses a global average pooling layer before the output?
a) AlexNet
b) VGG
c) GoogLeNet
d) ResNet
Answer: c) GoogLeNet
Explanation: Global pooling reduces parameters in GoogLeNet.
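A minimal sketch of the global-average-pooling idea in PyTorch (the 512-channel, 7x7 feature map is illustrative, not GoogLeNet's exact configuration): averaging each channel down to a single value means the classifier needs only channels x classes weights instead of a huge flattened fully connected layer.

```python
import torch
import torch.nn as nn

features = torch.randn(1, 512, 7, 7)          # final conv feature map (illustrative size)

gap = nn.AdaptiveAvgPool2d(1)                 # global average pooling: 7x7 -> 1x1 per channel
classifier = nn.Linear(512, 1000)             # only 512*1000 (+1000 bias) weights

pooled = gap(features).flatten(1)             # shape: (1, 512)
logits = classifier(pooled)                   # shape: (1, 1000)

# Compare with flattening the 7x7 map into a fully connected layer:
# 512*7*7*1000 = ~25 million weights vs ~0.5 million with global pooling.
```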
Which CNN architecture is known for its simplicity and uniform structure?
a) AlexNet
b) VGG
c) ResNet
d) Inception
Answer: b) VGG
Explanation: VGG’s uniform structure simplifies design.
What is the purpose of the bottleneck design in ResNet?
a) To increase parameters
b) To reduce computational cost
c) To eliminate pooling
d) To normalize data
Answer: b) To reduce computational cost
Explanation: Bottlenecks optimize resource usage.
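A simplified ResNet-style bottleneck block in PyTorch, covering both ideas from this section: the 1x1-3x3-1x1 bottleneck keeps the expensive 3x3 convolution in a reduced channel space, and the skip connection adds the input back to the output. This is an illustrative sketch, not the exact torchvision implementation (batch normalization and the downsampling variant are omitted).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BottleneckBlock(nn.Module):
    """Simplified residual bottleneck: 1x1 reduce -> 3x3 -> 1x1 expand, plus identity skip."""
    def __init__(self, channels: int, bottleneck: int):
        super().__init__()
        self.reduce = nn.Conv2d(channels, bottleneck, kernel_size=1)
        self.conv = nn.Conv2d(bottleneck, bottleneck, kernel_size=3, padding=1)
        self.expand = nn.Conv2d(bottleneck, channels, kernel_size=1)

    def forward(self, x):
        out = F.relu(self.reduce(x))
        out = F.relu(self.conv(out))
        out = self.expand(out)
        return F.relu(out + x)        # skip connection: gradients flow through the identity path

block = BottleneckBlock(channels=256, bottleneck=64)
x = torch.randn(1, 256, 14, 14)
print(block(x).shape)                 # torch.Size([1, 256, 14, 14]) -- resolution and channels preserved
```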
Visualization
What is the purpose of visualizing feature maps in CNNs?
a) To reduce model size
b) To understand learned features
c) To normalize input data
d) To increase training speed
Answer: b) To understand learned features
Explanation: Feature maps reveal what the network learns.
Which technique visualizes the importance of input regions for CNN predictions?
a) Gradient descent
b) Grad-CAM
c) Dropout
d) Data augmentation
Answer: b) Grad-CAM
Explanation: Grad-CAM highlights influential input regions.
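A hedged sketch of the Grad-CAM recipe in PyTorch: capture the activations and gradients of a chosen convolutional layer with hooks, average the gradients per channel to get weights, and take a ReLU of the weighted sum of activation maps. The model and target_layer arguments are placeholders you would supply; normalization and upsampling of the heatmap to the input size are omitted.

```python
import torch
import torch.nn.functional as F

def grad_cam(model, target_layer, image, class_index):
    """Return a coarse class-activation heatmap for one (C, H, W) image."""
    activations, gradients = {}, {}

    def forward_hook(module, inputs, output):
        activations["value"] = output.detach()

    def backward_hook(module, grad_input, grad_output):
        gradients["value"] = grad_output[0].detach()

    h1 = target_layer.register_forward_hook(forward_hook)
    h2 = target_layer.register_full_backward_hook(backward_hook)

    model.zero_grad()
    score = model(image.unsqueeze(0))[0, class_index]   # class score for the chosen class
    score.backward()                                    # fills the gradient hook

    h1.remove()
    h2.remove()

    weights = gradients["value"].mean(dim=(2, 3), keepdim=True)   # global-average-pool the gradients
    cam = (weights * activations["value"]).sum(dim=1)             # weighted sum over channels
    return F.relu(cam)[0]                                         # keep only positive evidence
```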
What does a saliency map show in CNN visualization?
a) The loss function
b) The gradient impact on the input
c) The number of parameters
d) The learning rate
Answer: b) The gradient impact on the input
Explanation: Saliency maps show gradient effects on input pixels.
Which visualization technique helps interpret CNN filters?
a) t-SNE
b) Filter visualization
c) PCA
d) K-means
Answer: b) Filter visualization
Explanation: Filter visualization shows learned patterns.
What is a common tool used to visualize CNN activations?
a) TensorBoard
b) Excel
c) MATLAB
d) PowerPoint
Answer: a) TensorBoard
Explanation: TensorBoard is widely used for CNN visualization.
What can visualization of CNN layers reveal?
a) Training time
b) Hierarchical feature extraction
c) Number of epochs
d) Learning rate schedule
Answer: b) Hierarchical feature extraction
Explanation: Layers show progression from edges to objects.
What is a limitation of CNN visualization techniques?
a) They are too fast
b) They may not fully explain decisions
c) They reduce accuracy
d) They eliminate parameters
Answer: b) They may not fully explain decisions
Explanation: Visualizations provide insights but not complete explanations.
Which visualization method highlights class-specific regions?
a) Average pooling
b) Class Activation Mapping (CAM)
c) Convolution
d) Upsampling
Answer: b) Class Activation Mapping (CAM)
Explanation: CAM focuses on regions relevant to a class.
What is the purpose of occlusion sensitivity analysis in CNNs?
a) To increase model size
b) To identify critical input regions
c) To reduce training time
d) To normalize data
Answer: b) To identify critical input regions
Explanation: Occlusion tests reveal important areas by masking inputs.
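A minimal occlusion-sensitivity sketch, assuming a classifier model that maps a (1, C, H, W) tensor to class logits (the patch size and stride are arbitrary): slide a grey patch over the image and record how much the target-class probability drops at each position.

```python
import torch

@torch.no_grad()
def occlusion_map(model, image, class_index, patch=16, stride=8):
    """Heatmap of the target-class probability drop as a grey patch slides over the image."""
    model.eval()
    _, _, height, width = image.shape
    baseline = torch.softmax(model(image), dim=1)[0, class_index]

    rows = (height - patch) // stride + 1
    cols = (width - patch) // stride + 1
    heatmap = torch.zeros(rows, cols)

    for i in range(rows):
        for j in range(cols):
            occluded = image.clone()
            occluded[:, :, i*stride:i*stride+patch, j*stride:j*stride+patch] = 0.5  # grey patch
            prob = torch.softmax(model(occluded), dim=1)[0, class_index]
            heatmap[i, j] = baseline - prob       # large drop => region was important
    return heatmap
```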
What does a feature map visualization typically show?
a) The loss value
b) Activated regions after convolution
c) The number of layers
d) The learning rate
Answer: b) Activated regions after convolution
Explanation: Feature maps display activated areas.
Which tool can visualize CNN training progress?
a) Notepad
b) TensorBoard
c) Word
d) Paint
Answer: b) TensorBoard
Explanation: TensorBoard tracks and visualizes training metrics.
What is a common application of CNN visualization?
a) Data preprocessing
b) Model debugging
c) Weight initialization
d) Loss computation
Answer: b) Model debugging
Explanation: Visualization helps debug and interpret models.
Which technique uses gradients to produce a saliency map?
a) Max pooling
b) Vanilla gradient
c) Transposed convolution
d) Dropout
Answer: b) Vanilla gradient
Explanation: Vanilla gradients create saliency maps from input gradients.
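A minimal vanilla-gradient saliency sketch in PyTorch (model is a placeholder classifier): backpropagate the class score to the input pixels and use the absolute gradient magnitude as the saliency map.

```python
import torch

def saliency_map(model, image, class_index):
    """Absolute input-gradient magnitude for one image of shape (C, H, W)."""
    model.eval()
    x = image.detach().clone().unsqueeze(0).requires_grad_(True)  # track gradients w.r.t. the pixels
    score = model(x)[0, class_index]                              # unnormalized class score
    score.backward()
    # Max over the colour channels gives a single H x W importance map.
    return x.grad.abs().squeeze(0).max(dim=0).values
```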
What can over-visualization of CNNs lead to?
a) Improved accuracy
b) Overcomplication and confusion
c) Reduced training time
d) Smaller model size
Answer: b) Overcomplication and confusion
Explanation: Excessive visualization can obscure insights.
Additional Questions
What is the effect of increasing the number of filters in a convolutional layer?
a) Reduces output size
b) Increases feature diversity
c) Eliminates pooling
d) Normalizes input
Answer: b) Increases feature diversity
Explanation: More filters capture varied features.
What is the purpose of the ReLU activation in CNNs?
a) To normalize data
b) To introduce non-linearity
c) To reduce parameters
d) To initialize weights
Answer: b) To introduce non-linearity
Explanation: ReLU enables complex pattern learning.
Which layer follows convolution in a typical CNN architecture?
a) Input layer
b) Pooling layer
c) Output layer
d) Fully connected layer
Answer: b) Pooling layer
Explanation: Pooling often follows convolution.
What is the effect of a larger stride in convolution?
a) Increases output size
b) Reduces output size
c) Maintains output size
d) Eliminates filters
Answer: b) Reduces output size
Explanation: Larger strides reduce spatial dimensions.
What is a common padding strategy to preserve input size?
a) No padding
b) Same padding
c) Valid padding
d) Zero padding
Answer: b) Same padding
Explanation: Same padding maintains input dimensions.
What is the purpose of the bias term in convolution?
a) To initialize weights
b) To shift the activation
c) To reduce parameters
d) To normalize data
Answer: b) To shift the activation
Explanation: Bias adjusts the filter output.
What is the role of the activation function after convolution?
a) To reduce dimensions
b) To introduce non-linearity
c) To initialize filters
d) To compute loss
Answer: b) To introduce non-linearity
Explanation: Activation adds non-linearity to feature maps.
What is the effect of increasing filter size in convolution?
a) Captures smaller patterns
b) Captures larger patterns
c) Reduces parameters
d) Eliminates padding
Answer: b) Captures larger patterns
Explanation: Larger filters detect broader features.
Which pooling method is invariant to small translations?
a) Average pooling
b) Max pooling
c) Global pooling
d) Sum pooling
Answer: b) Max pooling
Explanation: Max pooling is translation-invariant.
What is the purpose of upsampling in generative CNNs?
a) To classify images
b) To generate images
c) To reduce dimensions
d) To normalize data
Answer: b) To generate images
Explanation: Upsampling reconstructs images in generative models.
What is the purpose of skip connections in ResNet?
a) To reduce parameters
b) To mitigate vanishing gradients
c) To increase stride
d) To normalize data
Answer: b) To mitigate vanishing gradients
Explanation: Skip connections aid gradient flow.
Which visualization technique uses class scores?
a) Grad-CAM
b) Saliency map
c) Filter visualization
d) Occlusion
Answer: a) Grad-CAM
Explanation: Grad-CAM leverages class scores.
What is a common application of CNN visualization?
a) Data preprocessing
b) Medical imaging analysis
c) Weight initialization
d) Loss optimization
Answer: b) Medical imaging analysis
Explanation: Visualization aids in interpreting medical scans.
What is the effect of no padding in convolution?
a) Increases output size
b) Decreases output size
c) Maintains output size
d) Eliminates filters
Answer: b) Decreases output size
Explanation: No padding reduces output dimensions.
Which layer combines features in a CNN before classification?
a) Convolutional layer
b) Fully connected layer
c) Pooling layer
d) Upsampling layer
Answer: b) Fully connected layer
Explanation: Fully connected layers integrate features.
What is the purpose of the softmax layer in CNNs?
a) To extract features
b) To compute class probabilities
c) To reduce dimensions
d) To initialize weights
Answer: b) To compute class probabilities
Explanation: Softmax outputs probabilities.
Which pooling method preserves more spatial information?
a) Max pooling
b) Average pooling
c) Global pooling
d) Sum pooling
Answer: b) Average pooling
Explanation: Average pooling retains more spatial details.
What is the effect of increasing the number of convolutional layers?
a) Reduces depth
b) Increases feature hierarchy
c) Eliminates pooling
d) Normalizes data
Answer: b) Increases feature hierarchy
Explanation: More layers enhance feature complexity.
Which architecture avoids fully connected layers?
a) AlexNet
b) GoogLeNet
c) VGG
d) ResNet
Answer: b) GoogLeNet
Explanation: GoogLeNet uses global pooling instead.
What is the purpose of batch normalization in CNNs?
a) To reduce parameters
b) To stabilize training
c) To increase stride
d) To eliminate filters
Answer: b) To stabilize training
Explanation: Batch normalization normalizes layer inputs.
Which technique visualizes the impact of input occlusion?
a) Grad-CAM
b) Occlusion sensitivity
c) Saliency map
d) Filter visualization
Answer: b) Occlusion sensitivity
Explanation: Occlusion tests input importance.
What is the effect of a smaller stride in convolution?
a) Increases output size
b) Reduces output size
c) Maintains output size
d) Eliminates padding
Answer: a) Increases output size
Explanation: Smaller strides produce larger outputs.
Which layer is critical for spatial invariance in CNNs?
a) Convolutional layer
b) Pooling layer
c) Fully connected layer
d) Upsampling layer
Answer: b) Pooling layer
Explanation: Pooling provides translation invariance.
What is the purpose of the ReLU function in CNNs?
a) To normalize data
b) To mitigate vanishing gradients
c) To reduce parameters
d) To initialize filters
Answer: b) To mitigate vanishing gradients
Explanation: ReLU prevents gradient saturation.
Which architecture uses parallel convolutional paths?
a) VGG
b) Inception
c) ResNet
d) AlexNet
Answer: b) Inception
Explanation: Inception uses multi-scale convolutions.
What is the effect of downsampling on computational cost?
a) Increases cost
b) Decreases cost
c) Maintains cost
d) Eliminates cost
Answer: b) Decreases cost
Explanation: Downsampling reduces data size.
Which upsampling method avoids checkerboard artifacts?
a) Nearest neighbor
b) Transposed convolution with proper stride
c) Max pooling
d) Average pooling
Answer: b) Transposed convolution with proper stride
Explanation: Proper stride settings reduce artifacts.
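As a sketch of two common remedies (layer sizes are illustrative): either choose a transposed-convolution kernel size that is divisible by the stride, so every output pixel receives the same number of contributions, or resize with nearest-neighbor interpolation and follow with an ordinary convolution.

```python
import torch
import torch.nn as nn

x = torch.randn(1, 64, 16, 16)

# Option 1: kernel size divisible by the stride (4 % 2 == 0) evens out the overlap pattern.
deconv = nn.ConvTranspose2d(64, 32, kernel_size=4, stride=2, padding=1)
print(deconv(x).shape)                       # torch.Size([1, 32, 32, 32])

# Option 2: resize first, then convolve -- a common checkerboard-free alternative.
resize_conv = nn.Sequential(
    nn.Upsample(scale_factor=2, mode="nearest"),
    nn.Conv2d(64, 32, kernel_size=3, padding=1),
)
print(resize_conv(x).shape)                  # torch.Size([1, 32, 32, 32])
```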
What is the purpose of visualizing CNN gradients?
a) To initialize weights
b) To interpret model decisions
c) To reduce layers
d) To normalize data
Answer: b) To interpret model decisions
Explanation: Gradients show input influence.
Which architecture is known for its residual blocks?
a) AlexNet
b) VGG
c) ResNet
d) Inception
Answer: c) ResNet
Explanation: Residual blocks define ResNet.
What is the effect of increasing padding in convolution?
a) Reduces output size
b) Increases output size
c) Maintains output size
d) Eliminates filters
Answer: b) Increases output size
Explanation: More padding expands the output.
Which pooling method is sensitive to noise?
a) Max pooling
b) Average pooling
c) Global pooling
d) Sum pooling
Answer: b) Average pooling
Explanation: Average pooling averages noise.
What is the purpose of upsampling in autoencoders?
a) To encode data
b) To decode data
c) To reduce dimensions
d) To normalize data
Answer: b) To decode data
Explanation: Upsampling reconstructs input in decoding.
Which visualization technique uses layer activations?
a) Grad-CAM
b) Feature map visualization
c) Saliency map
d) Occlusion
Answer: b) Feature map visualization
Explanation: Feature maps show activations.
What is the effect of a larger filter size on receptive field?
a) Reduces receptive field
b) Increases receptive field
c) Maintains receptive field
d) Eliminates receptive field
Answer: b) Increases receptive field
Explanation: Larger filters cover more area.
Which layer precedes the output layer in CNNs?
a) Convolutional layer
b) Pooling layer
c) Fully connected layer
d) Upsampling layer
Answer: c) Fully connected layer
Explanation: Fully connected layers feed into the output.
What is the purpose of the softmax layer in classification CNNs?
a) To extract features
b) To compute probabilities
c) To reduce dimensions
d) To initialize weights
Answer: b) To compute probabilities
Explanation: Softmax provides class probabilities.
Which downsampling method is robust to outliers?
a) Max pooling
b) Average pooling
c) Global pooling
d) Sum pooling
Answer: a) Max pooling
Explanation: Max pooling ignores outliers.
What is the effect of upsampling on feature map detail?
a) Reduces detail
b) Increases detail
c) Maintains detail
d) Eliminates detail
Answer: b) Increases detail
Explanation: Upsampling can enhance detail.
What is the purpose of visualizing CNN weights?
a) To reduce parameters
b) To understand filter patterns
c) To normalize data
d) To increase layers
Answer: b) To understand filter patterns
Explanation: Weights reveal learned filters.
Which technique combines gradients and activations?
a) Grad-CAM
b) Saliency map
c) Occlusion
d) Filter visualization
Answer: a) Grad-CAM
Explanation: Grad-CAM integrates both for visualization.
Which layer is optional in some CNN architectures?
a) Convolutional layer
b) Fully connected layer
c) Pooling layer
d) Upsampling layer
Answer: b) Fully connected layer
Explanation: Some architectures skip fully connected layers.
What is the purpose of the ReLU function after pooling?
a) To normalize data
b) To introduce non-linearity
c) To reduce parameters
d) To initialize weights
Answer: b) To introduce non-linearity
Explanation: ReLU adds non-linearity post-pooling.
What is the purpose of visualizing CNN predictions?
a) To initialize weights
b) To validate model decisions
c) To reduce layers
d) To normalize data
Answer: b) To validate model decisions
Explanation: Visualization confirms prediction accuracy.
What is the effect of increasing the number of layers in a CNN?
a) Reduces feature hierarchy
b) Increases feature hierarchy
c) Eliminates pooling
d) Normalizes data
Answer: b) Increases feature hierarchy
Explanation: More layers enhance feature complexity.
Which pooling method is used in global average pooling?
a) Max pooling
b) Average pooling
c) Sum pooling
d) None
Answer: b) Average pooling
Explanation: Global average pooling averages the entire map.
Which visualization technique is most interpretable for end-users?
a) Grad-CAM
b) Saliency map
c) Occlusion
d) Filter visualization
Answer: a) Grad-CAM
Explanation: Grad-CAM provides clear, class-specific heatmaps.