Advanced Computer Vision – Guess Paper 1 with Solutions
CLO #1: Understand the fundamental concepts of Computer Vision, Image Formation, and filtering.
Q1.
a. Why is computer vision considered a challenging problem, even though human vision appears natural
and effortless? Identify three key factors that contribute to its complexity.
2 Points
Answer:
1. Variability in Image Acquisition: Images can vary due to changes in lighting, viewpoint, scale,
occlusion, and noise, making it hard for algorithms to generalize.
2. Complexity of Visual Scenes: Real-world scenes are cluttered with overlapping objects, textures, and
shadows, increasing the difficulty of object recognition.
3. Ambiguity and Context Dependence: Images often have ambiguous or incomplete information
requiring contextual understanding beyond raw pixel data.
Rationale: Human vision is a result of biological evolution and contextual cognition, while computer vision
must infer meaning solely from pixel values under varied conditions.
b. The images shown below are quite different, but their histograms are the same. Suppose that each
image is blurred with a 3 x 3 averaging mask. Would the histograms of the blurred images still be equal?
Explain.
2 Points
If your answer is no, sketch the two histograms.
2 Points
Answer:
No, after blurring, the histograms will not remain equal.
Although the original histograms are identical, the spatial arrangement of pixel intensities is different.
Blurring smooths each pixel based on its neighbours, so differences in the local spatial patterns produce different post-blur histograms.
Sketch explanation: an image whose equal intensities are finely interleaved (e.g., a checkerboard) blurs toward a histogram concentrated at mid-grey values, while an image with the same intensities in large uniform regions keeps most of its original peaks, adding only a narrow band of intermediate values at region boundaries.
Rationale: Histogram is a frequency distribution of intensities, which ignores spatial info. Blurring modifies
intensities based on local neighbourhoods, which differ between images with the same histogram.
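This can be checked numerically. Below is a minimal sketch; the checkerboard and half-black/half-white images are assumed stand-ins for the two equal-histogram inputs (the exam's actual images are not reproduced here), and zero-padded borders are assumed for the 3 x 3 average:

```python
import numpy as np

def mean3x3(img):
    """Apply a 3x3 averaging mask with zero-padded borders."""
    p = np.pad(img.astype(float), 1)
    out = np.zeros(img.shape)
    for dy in (0, 1, 2):
        for dx in (0, 1, 2):
            out += p[dy:dy + img.shape[0], dx:dx + img.shape[1]]
    return out / 9.0

# Two images with identical histograms but different spatial layouts:
a = np.indices((8, 8)).sum(axis=0) % 2 * 255   # 0/255 checkerboard
b = np.zeros((8, 8), dtype=int)
b[:, 4:] = 255                                  # half black, half white

# Same histogram before blurring...
print(np.array_equal(np.bincount(a.ravel(), minlength=256),
                     np.bincount(b.ravel(), minlength=256)))   # True

# ...but different intensity distributions after the 3x3 average:
# the checkerboard collapses toward mid-grey, while the split image
# keeps large runs of 0 and 255 plus a narrow transition band.
print(np.array_equal(np.sort(mean3x3(a).ravel()),
                     np.sort(mean3x3(b).ravel())))             # False
```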
c. In each application an averaging mask is applied to input images to reduce noise, and then a Laplacian
mask is applied to enhance small details. Would the result be the same if the order of these operations
were reversed?
2 Points
Answer:
In the idealized linear case, yes, the result would be the same. Both the averaging mask and the Laplacian mask are linear, shift-invariant filters, and convolution is commutative and associative, so smoothing followed by the Laplacian yields the same output as the reverse order (away from border effects).
In practice the results can differ: if the intermediate image is rounded or clipped to a valid intensity range (e.g., 8-bit) after the first operation, that nonlinearity breaks the equivalence. Applying the Laplacian first is worse off here, since its strong negative and out-of-range responses to noise are destroyed by clipping before the averaging step.
Rationale: Linear filtering commutes; the order matters only when nonlinear intermediate steps (quantization, clipping) or boundary handling intervene.
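A small experiment illustrates both regimes. It assumes zero-padded 3 x 3 filters and the standard 4-neighbour Laplacian kernel; in pure floating point the two orders agree away from the borders, while clipping/rounding the intermediate result to 8-bit range makes them diverge:

```python
import numpy as np

def conv2(img, k):
    """Same-size 2-D filtering with zero padding (both kernels below are
    symmetric, so correlation and convolution coincide)."""
    kh, kw = k.shape
    p = np.pad(img.astype(float), ((kh // 2,) * 2, (kw // 2,) * 2))
    out = np.zeros(img.shape)
    for i in range(kh):
        for j in range(kw):
            out += k[i, j] * p[i:i + img.shape[0], j:j + img.shape[1]]
    return out

avg = np.full((3, 3), 1 / 9)                                  # averaging mask
lap = np.array([[0, 1, 0], [1, -4, 1], [0, 1, 0]], float)     # Laplacian mask

rng = np.random.default_rng(0)
img = rng.integers(0, 256, (16, 16)).astype(float)

# Pure floating-point filtering: the two orders agree away from the borders.
smooth_then_lap = conv2(conv2(img, avg), lap)
lap_then_smooth = conv2(conv2(img, lap), avg)
print(np.allclose(smooth_then_lap[2:-2, 2:-2],
                  lap_then_smooth[2:-2, 2:-2]))                 # True

# With the intermediate result rounded and clipped to 8-bit range, they differ:
clipped1 = conv2(np.clip(conv2(img, avg), 0, 255).round(), lap)
clipped2 = conv2(np.clip(conv2(img, lap), 0, 255).round(), avg)
print(np.allclose(clipped1[2:-2, 2:-2], clipped2[2:-2, 2:-2]))  # False
```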
d. Consider a horizontal intensity profile I(x) of ten pixels: I = (10, 12, 15, 25, 45, 50, 48, 30, 20, 15), where x = 0, 1, …, 9.
1. Compute the first derivative using the forward difference method for the first five pixels.
1 Point
2. Compute the second derivative using the central difference method for the first five pixels.
1 Point
Answer:
1. First derivative (forward difference):
f′(x)=I(x+1)−I(x)
x   Calculation   Result
0   12 - 10        2
1   15 - 12        3
2   25 - 15       10
3   45 - 25       20
4   50 - 45        5
2. Second derivative (central difference):
f′′(x)=I(x+1)−2I(x)+I(x−1)
For x=1 to 4 (central difference requires neighbours):
x   Calculation        Result
1   15 - 2(12) + 10      1
2   25 - 2(15) + 12      7
3   45 - 2(25) + 15     10
4   50 - 2(45) + 25    -15
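Both tables can be reproduced in a few lines of NumPy (a quick sketch; the slicing just implements the two difference formulas above):

```python
import numpy as np

I = np.array([10, 12, 15, 25, 45, 50, 48, 30, 20, 15])

# Forward difference:  f'(x)  = I(x+1) - I(x)
first = I[1:] - I[:-1]
# Central difference:  f''(x) = I(x+1) - 2*I(x) + I(x-1), defined for x = 1..8
second = I[2:] - 2 * I[1:-1] + I[:-2]

print(first[:5])    # [ 2  3 10 20  5]   -> x = 0..4
print(second[:4])   # [  1   7  10 -15]  -> x = 1..4
```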
Q2.
The matrices in the left column are the output of applying Gaussian filters with different bandwidths for
a single octave in the SIFT detection algorithm. On the right, we have the Difference of Gaussian images.
Fill in the blank areas in the Gaussian filtered images so that there are only 2 SIFT keypoints located at
(x=2, y=scale=2), and (x=1, y=scale=3), as marked by "X" in the difference of Gaussian images. This
is before removing edges and low contrast points, and sub-pixel tuning. Also, fill in the Difference of
Gaussian values.
10 Points
Explain why we have key points in the above-mentioned locations, and why we do not have keypoints
in other locations.
5 Points
Answer:
Filling in the Gaussian filtered images: Values are chosen so that subtracting adjacent scales gives
positive or negative extrema exactly at those points (x=2, scale=2) and (x=1, scale=3). This creates
local maxima or minima in DoG.
Filling DoG: Difference of Gaussian images are computed by subtracting Gaussian images at adjacent
scales. Values at keypoints are significantly higher or lower than neighbors.
Why keypoints at these locations: Keypoints are detected as local extrema in scale-space (in both
space and scale dimensions), representing stable, repeatable features invariant to scale and rotation.
No keypoints elsewhere: Because those points are not local extrema; they may be flat, edges, or low
contrast points rejected to improve robustness.
Rationale: SIFT keypoints correspond to scale-space extrema of DoG, which helps detect distinctive, stable
points invariant to image transformations.
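The extremum test itself is easy to sketch. The matrix below is an invented stand-in for the exam's Gaussian-filtered rows (one row per scale of a 1-D octave), so the keypoint coordinates here are illustrative only, not the exam's answer:

```python
import numpy as np

# Hypothetical Gaussian-filtered rows for one octave (values invented purely
# to illustrate the extremum test; they are not the exam's actual matrices).
G = np.array([
    [10, 10, 10, 10],
    [10, 10, 10, 10],
    [10, 10, 40, 10],
    [10, 10, 10, 10],
    [10, 10, 10, 10],
], dtype=float)

D = G[1:] - G[:-1]   # DoG rows: differences of adjacent Gaussian scales

# A keypoint is a strict extremum of D over its 3x3 (x, scale) neighbourhood,
# checked at interior scales only (before edge/contrast rejection).
keypoints = []
for s in range(1, D.shape[0] - 1):
    for x in range(1, D.shape[1] - 1):
        patch = D[s - 1:s + 2, x - 1:x + 2].ravel()
        others = np.delete(patch, 4)          # the 8 neighbours
        if D[s, x] > others.max() or D[s, x] < others.min():
            keypoints.append((x, s))

print(keypoints)   # [(2, 1), (2, 2)]: a maximum and a minimum straddling the blob
```

Every other (x, scale) cell fails the test because some neighbour is at least as extreme, which is exactly why flat or non-extremal locations yield no keypoints.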
CLO #2: Understand the state-of-the-art architecture of computer vision.
Q3.
a. Ali and Bilal are trying to redesign the LeNet conv net architecture to reduce the number of weights.
Ali wants to reduce the number of feature maps in the first convolution layer. Bilal wants to reduce the
number of hidden units in the last layer before the output. Briefly explain whose approach is better?
Why?
2 Points
Answer:
Bilal’s approach is better. In LeNet-style architectures the vast majority of weights sit in the fully connected layers, not in the convolutions: the first convolution layer has only a few hundred weights (e.g., 6 filters of 5 x 5 is 150 weights), while the fully connected layers hold tens of thousands. Reducing the hidden units in the last layer before the output therefore removes far more weights than trimming first-layer feature maps.
Ali’s change saves comparatively little: even though fewer first-layer maps also shrink the second convolution layer, all the convolutional layers together account for only a small fraction of LeNet’s weights.
Rationale: Fully connected layers dominate the parameter count in LeNet-era architectures, so the biggest weight savings come from shrinking them.
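The weight counts can be tallied directly. The layer sizes below are assumed from the classic LeNet-5 layout (6 -> 16 -> 120 -> 84 -> 10), with biases ignored for simplicity:

```python
# Weight counts for the classic LeNet-5 layout, biases ignored.
conv1 = 6 * (5 * 5 * 1)      # 150 weights    (first conv layer)
conv2 = 16 * (5 * 5 * 6)     # 2,400 weights
fc1   = 120 * (16 * 5 * 5)   # 48,000 weights
fc2   = 84 * 120             # 10,080 weights (into the 84 hidden units)
out   = 10 * 84              # 840 weights    (out of the 84 hidden units)
total = conv1 + conv2 + fc1 + fc2 + out

print(total)                     # 61470
print((fc2 + out) / total)       # shrinking the 84 hidden units touches ~18%
print((conv1 + conv2) / total)   # all conv weights together are only ~4%
```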
b. One possible way to address the vanishing gradient problem in deep networks is to use the tanh
activation function. However, during lectures, it was discussed that in classification tasks neural networks
output the probability of each class. Given probabilities must always be non-negative, do you think using
tanh could distort these probabilities? Briefly explain your reasoning.
2 Points
Answer:
Yes. tanh outputs values in [-1, 1], which includes negative values; this is incompatible with the requirement that probabilities be non-negative and lie between 0 and 1 (and sum to 1 across classes).
This can distort the interpretation of outputs as probabilities. Instead, softmax or sigmoid activations
are used to generate valid probabilities.
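A quick numerical illustration (the logit values are made up for the example):

```python
import numpy as np

logits = np.array([2.0, -1.0, 0.5])   # illustrative pre-activation scores

tanh_out = np.tanh(logits)
softmax_out = np.exp(logits) / np.exp(logits).sum()

print(tanh_out)                 # second entry is negative (~ -0.762):
                                # not interpretable as a probability
print(softmax_out.min() >= 0,   # True: all entries are non-negative...
      np.isclose(softmax_out.sum(), 1.0))   # ...and they sum to 1
```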
c. In a multi-label classification task, each instance may belong to multiple categories simultaneously.
1. Which activation function should be used in the output layer for multi-label classification, and why is it
preferred over softmax?
2 Points
2. Explain why Categorical Cross-Entropy is not suitable for multi-label classification and which loss
function should be used instead.
2 Points
3. Suppose your dataset has N classes — what would be the appropriate architecture for the final layer of
your neural network?
2 Points
Answer:
1. Use sigmoid activation in each output neuron independently because classes are not mutually
exclusive. Softmax enforces exclusivity, which is inappropriate for multi-label.
2. Use Binary Cross-Entropy (BCE) loss instead of Categorical Cross-Entropy because BCE treats each
class independently as a binary classification, suitable for multi-label scenarios.
3. The output layer should have N neurons, each with sigmoid activation, to independently predict
presence/absence of each class.
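The three answers fit together in a few lines; the logits and targets below are invented for illustration (N = 4, with the instance belonging to classes 0 and 2):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Final layer for N = 4 classes: one sigmoid per class; the target vector is
# NOT one-hot, since the instance belongs to classes 0 and 2 simultaneously.
logits = np.array([2.0, -1.0, 0.3, -2.5])   # illustrative network outputs
y_true = np.array([1.0, 0.0, 1.0, 0.0])

p = sigmoid(logits)
# Binary cross-entropy averaged over the N independent class decisions:
bce = -np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))

print(p.round(3))     # [0.881 0.269 0.574 0.076]
print(round(bce, 3))  # 0.268
```

Note that softmax would force the four outputs to compete for a single unit of probability mass, which is exactly what the independent sigmoids avoid.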
Q4.
a. Explain how the generator and discriminator in a cGAN differ from those in a standard GAN, in terms
of their inputs/outputs, architecture, and loss function.
2 Points
Answer:
Generator: In cGAN, the input includes both the noise vector z and the conditional label y; the output is an image conditioned on y. In a standard GAN, the input is noise only.
Discriminator: In cGAN, the input is an image together with the label y, and it learns to judge whether the image is real and matches the label. In a standard GAN, the input is the image only.
Loss function: cGAN incorporates the conditioning y into both the generator and discriminator terms of the loss, enforcing class-conditional generation.
b. Suppose a GAN is trained to generate images of animals, but after training, it only produces images
resembling a few specific types of animals.
What could be causing this issue?
2 Points
How would you diagnose/detect this particular phenomenon?
2 Points
How would you modify the training process to ensure more diverse outputs?
2 Points
Answer:
The problem is mode collapse, where the generator produces limited output modes that fool the
discriminator.
Diagnose by observing lack of variety in generated samples, and by tracking metrics like diversity
score or latent space coverage.
Modify training by adding techniques like minibatch discrimination, feature matching, unrolled GANs,
or adding noise and regularization to encourage diversity.
c. The loss function for the cGAN is given as:
min_G max_D V(D, G) = E_{x ~ p_data(x)}[ log D(x | y) ] + E_{z ~ p_z(z)}[ log(1 - D(G(z | y))) ]
Interpret the role of the label y in both terms of the loss function.
2 Points
Why is conditioning on y important for generating meaningful outputs?
2 Points
Answer:
Label y provides conditioning information: in the discriminator, it judges whether the image matches
the label y; in the generator, it guides generation towards images corresponding to y.
Conditioning enables control over the class/type of generated images, allowing targeted generation
rather than random sampling.
Q5.
a. Self-attention computes dot products between keys and queries. If a given self-attention has 4 attention
heads and its one-dimensional input size is 3, how many dot products will it compute? Show your
calculations.
2 Points
Answer:
For a sequence length L = 3, each head dots every query with every key, giving L x L = 9 dot products per head.
For 4 heads, the total is 4 x 9 = 36 dot products.
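The count follows directly from the shape of the score tensor. In the sketch below, the per-head dimension d = 8 is an assumed value (it does not change the count, which depends only on heads and sequence length):

```python
import numpy as np

L, heads, d = 3, 4, 8   # sequence length 3, 4 heads; d = 8 is an assumed head dim
rng = np.random.default_rng(0)
Q = rng.normal(size=(heads, L, d))   # queries, one set per head
K = rng.normal(size=(heads, L, d))   # keys

scores = Q @ K.transpose(0, 2, 1)    # one dot product per (head, query, key)
print(scores.shape, scores.size)     # (4, 3, 3) 36
```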
b. Which token is used as a feature representation of the input Image/video in vision transformer (ViT)?
2 Points
Answer:
The [CLS] token (classification token) is used as the global feature representation for downstream
tasks.
c. Vision Transformers often require large-scale datasets for effective training. Suggest one strategy to
improve the performance of ViTs when training data is limited.
2 Points
Answer:
Use transfer learning with pretrained ViT weights on large datasets and fine-tune on the smaller
dataset.
Alternatively, use data augmentation or regularization techniques.