CV801: Advanced Computer Vision
Week 1 Lecture 2
Class Participation and Peer-Review (10% Weightage)
Class-participation: 5%
• In-person attendance: 3%.
• Full marks: in-person attendance in 18 out of 30 lectures AND 7 out of 15 labs.
• Reading research papers in advance and answering the in-class quizzes correctly: 2%.
Peer Review: 5%
• Participate in the discussions on the project presentations and paper presentations of other students: 1%
• 1-page review report on the projects of other groups (each person writes two peer-review reports): 4%
Introduction and Overview of Computer Vision
What is Computer Vision?
• Ability of computers
• To understand visual data
• For example, images, videos…
• Automate tasks
• Which the human visual system can perform
What is Computer Vision?
• To extract “meaning” from pixels, i.e. to bridge the gap between image pixels and “meaning” (semantics)!
What we see!
What computer sees!
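To make "what the computer sees" concrete, here is a minimal sketch of an image as a grid of numbers. The 5x5 pixel values are made up for illustration, and the helper names `mean_intensity` and `render` are hypothetical, not part of any library.

```python
# A tiny 5x5 grayscale "image": each entry is a pixel intensity in [0, 255].
# The values are invented for illustration; they draw a bright plus sign.
image = [
    [0,   0,   255, 0,   0],
    [0,   0,   255, 0,   0],
    [255, 255, 255, 255, 255],
    [0,   0,   255, 0,   0],
    [0,   0,   255, 0,   0],
]


def mean_intensity(img):
    """Average pixel value: a low-level statistic that carries no semantics."""
    pixels = [p for row in img for p in row]
    return sum(pixels) / len(pixels)


def render(img, threshold=128):
    """Text view of the image: '#' for bright pixels, '.' for dark ones."""
    return "\n".join(
        "".join("#" if p >= threshold else "." for p in row) for row in img
    )


print(image)                  # what the computer "sees": just numbers
print(mean_intensity(image))  # easy to compute, but says nothing about content
print(render(image))          # a human recognises the plus sign immediately
```

Nothing in the raw numbers says "plus sign"; closing that gap between pixel values and meaning is the problem the slide describes.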
What do we have here?
Seems easy…
Wrong! Vision is Hard
• Vision is an amazing feature of natural intelligence
• Around 50% of the neural tissue in the human brain is directly or indirectly related to vision, which assists in visual learning.
Hardware perspective:
Massive digital data collections
Is that a queen or a bishop?
Why Study Computer Vision?
• Engineering point of view: computer vision helps to solve many practical problems, so it has business potential.
• Scientific point of view: building a human-like visual system is one of the grand challenges of Artificial Intelligence (AI), and AI itself is a grand challenge of computing.
• Massive visual data on internet
More than 70 million photos are shared on Instagram every day (more than 50 billion photos in total)
300 million images a day (More than 350 billion photos in total)
More than 500 hours of video uploaded every minute
Why Study Computer Vision?
• Used to be done mostly in academia.
• Recent advancements:
• Business potential: substantial commercial interest
• List of CVPR 2024 sponsors:
• Google
• Meta AI/Facebook
• Apple
• Amazon
• Microsoft
• OpenAI
• G42
• TII
•…
Why Study Computer Vision?
• Numerous real-world practical applications
Computer vision technology can improve our lives:
Autonomous Driving, Security, Health, Biometric Access, Comfort: Robot, Fun: Virtual Avatar
Why Study Computer Vision?
• CVPR conference ranking (Engineering) as of 2024
Why Study Computer Vision?
• CVPR papers: 2023 vs 2024
Why Study Computer Vision?
Substantial Commercial Interest
List of CVPR 2022 sponsors
CV801 Topics vs Major topics in CVPR 2023
• Covering 8 out of the 12 top CVPR 2023 topics
• Covering ~12 topics
Acceptance Rate for Each Topic: CVPR 2024
Common Computer Vision Tasks
Image Categorization/Recognition:
CAT
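Image categorization assigns one label to the whole image. The sketch below assumes a classifier has already produced one score per class (the scores and the label set are invented for illustration) and shows only the final step of turning scores into a label with a softmax. The helper `categorize` is a hypothetical name.

```python
import math


def softmax(scores):
    """Convert raw class scores into probabilities that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def categorize(scores, labels):
    """Return the most probable label and its probability."""
    probs = softmax(scores)
    best = max(range(len(labels)), key=lambda i: probs[i])
    return labels[best], probs[best]


# Hypothetical scores, as if produced by some classifier for one image.
labels = ["CAT", "DOG", "TREE"]
scores = [2.0, 0.5, -1.0]

label, confidence = categorize(scores, labels)
print(label, round(confidence, 3))
```

The output is a single label for the entire image, with no information about where the object is.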
Common Computer Vision Tasks
Scene Recognition:
Is this an outdoor image?
Activity Recognition
Activity:
What is this person doing in this image?
Common Computer Vision Tasks: Detection
Detection:
Where is a car in this image?
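A detector answers "where" with a bounding box per object. As a minimal sketch, the code below represents a box as `(x1, y1, x2, y2)` corner coordinates (an assumed convention) and compares a predicted box with a ground-truth box using intersection over union (IoU), a widely used overlap measure that the slide itself does not mention. The box coordinates are invented.

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Width and height of the overlap region (zero if the boxes are disjoint).
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    intersection = inter_w * inter_h

    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0


# Hypothetical detection of a car and the annotated ground truth.
predicted = (50, 50, 150, 150)
ground_truth = (100, 100, 200, 200)

print(iou(predicted, ground_truth))
```

An IoU of 1 means the boxes coincide and 0 means they do not overlap; evaluation protocols typically count a detection as correct when its IoU with a ground-truth box exceeds a chosen threshold.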
Semantic Segmentation
GRASS, CAT, TREE, SKY
Instance Segmentation
DOG, DOG, CAT
Common Computer Vision Tasks: Segmentation
• Classification: CAT (no spatial extent)
• Semantic Segmentation: GRASS, CAT, TREE, SKY (no objects, just pixels)
• Object Detection: DOG, DOG, CAT (multiple objects)
• Instance Segmentation: DOG, DOG, CAT (multiple objects)
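The difference between semantic and instance segmentation can be sketched with a toy label map. In the example below each pixel stores a `(class, instance_id)` pair, with instance id 0 reserved for background "stuff" such as grass. This encoding and the tiny 3x5 map are assumptions made for illustration only.

```python
# Each pixel holds (class_name, instance_id); id 0 means "no individual object".
D1, D2, C1, G = ("DOG", 1), ("DOG", 2), ("CAT", 1), ("GRASS", 0)

instance_map = [
    [D1, D1, G, D2, D2],
    [D1, D1, G, D2, D2],
    [G,  G,  G, C1, C1],
]


def to_semantic(inst_map):
    """Drop instance ids: every pixel keeps only its class label."""
    return [[cls for cls, _ in row] for row in inst_map]


def count_objects(inst_map):
    """Count distinct (class, id) pairs, ignoring background pixels."""
    return len({pix for row in inst_map for pix in row if pix[1] != 0})


semantic_map = to_semantic(instance_map)

print(semantic_map[0])              # both dogs collapse into the label "DOG"
print(count_objects(instance_map))  # instance labels keep DOG, DOG, CAT apart
```

The semantic map can say which pixels are "DOG" but not how many dogs there are; the instance map keeps that distinction.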
Video Instance Segmentation
Research Paper Presentations (10% Weightage)
Objective
• Learn to systematically introduce a research topic
• Improve teaching and presentation skills
• Engage in critical discussions about research papers
How to Select a Topic?
• Suggested topics.
• Specialized Applications of Segmentation, e.g. medical image segmentation (~3 presentations)
• Vision Foundation Models: Segment Anything Model (SAM) (~2 presentations)
• Efficient Architectures for Computer Vision Applications: State-space Models and Mamba (~4 presentations)
• Conversational LLMs and Vision-Language Models (~2 presentations)
• Image Generation using Diffusion Models (~5 presentations)
• Remote sensing, change detection (~2 presentations)
• Human-centric Vision (~2 presentations)
• All presenters on the same topic should work together to systematically introduce the concepts.
Specialized Applications of Segmentation: 3D Medical Image segmentation
UNETR: Transformers for 3D Medical Image Segmentation, WACV 2022
Remote Sensing Change Detection
Change Detection Methods for Remote Sensing in the Last Decade: A Comprehensive Review.
[Link]
Foundation Models in Vision
Foundational Models Defining a New Era in Vision: A Survey and Outlook
[Link]
Generalizable Localization Models
Segment Anything Model (SAM) [Link]
SAM for Synthetic Embryo Detection, Counting and Segmentation
(without training the model on target dataset or target category)
Panels: Input; Embryo detection & counting (Count=307); Segmentation
Large Language Models
Multi-Modal LLMs
[Link]
Multi-Modal LLMs
Image Generation Using Diffusion Models
Diffusion Models in Vision: A Survey [Link]
“A diffusion model is a deep generative model that is based on two stages, a forward diffusion stage and a reverse diffusion stage. In the forward diffusion stage, the input data is gradually perturbed over several steps by adding Gaussian noise. In the reverse stage, a model is tasked at recovering the original input data by learning to gradually reverse the diffusion process, step by step.”
Forward
Reverse
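The forward stage described in the quote can be sketched in a few lines. The code below assumes a DDPM-style update, x_t = sqrt(1 - beta_t) * x_{t-1} + sqrt(beta_t) * noise, with a linear beta schedule; the schedule, its endpoint values, and the function names are assumptions, not taken from the survey. The reverse stage needs a trained network and is not shown.

```python
import math
import random


def linear_betas(num_steps, beta_start=1e-4, beta_end=0.02):
    """Noise variances that grow linearly over the diffusion steps (assumed schedule)."""
    if num_steps == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def forward_diffusion(x0, betas, rng):
    """Gradually perturb x0 with Gaussian noise; returns every intermediate x_t."""
    trajectory = [list(x0)]
    x = list(x0)
    for beta in betas:
        x = [
            math.sqrt(1.0 - beta) * v + math.sqrt(beta) * rng.gauss(0.0, 1.0)
            for v in x
        ]
        trajectory.append(x)
    return trajectory


def signal_fraction(betas):
    """Product of sqrt(1 - beta_t): the share of x0 left in the mean of x_T."""
    frac = 1.0
    for beta in betas:
        frac *= math.sqrt(1.0 - beta)
    return frac


rng = random.Random(0)          # fixed seed so the run is repeatable
x0 = [1.0, -1.0, 0.5, 0.0]      # stand-in for a flattened image
betas = linear_betas(1000)
trajectory = forward_diffusion(x0, betas, rng)

print(signal_fraction(betas))   # close to 0: x_T is almost pure noise
```

With this schedule almost none of the original signal survives the final step, which is why generation has to be learned as a step-by-step reversal rather than read off from the noisy sample.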
Image Generation (i)
1. Diffusion Models, e.g. for Person Image Synthesis, CVPR 2023
2. Multi-Modal LLM Meets Diffusion Models
[Link]
Image Generation (ii)
3. 3D-aware Image Generation (ICCV 2023)
4. Image Generation for Healthcare Applications (MICCAI 2023)
[Link]
Human-centric Scene Understanding
Examples: pedestrian detection, multi-camera person search, crowd counting, pose estimation, activity recognition
Pedestrian Detection Person Search Crowd Counting Human Pose Estimation
[Link]
ARCHITECTURE DESIGN CHOICES FOR REAL-WORLD VISION APPLICATIONS
• Development of efficient network architectures for image classification, object detection, segmentation, and human pose estimation in images and videos.
Vision Mamba
• Mamba for Medical Image Segmentation
[Link]
Questions?
Survey Outcome
Expected deep learning and CNN background
• Perceptron
• Multi-layer Perceptron
• Backpropagation
• Stochastic gradient descent
• Cross entropy loss
• CNN layer
• Regularization
• Dropout
• Data Augmentation
• Batch normalization
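As a quick self-check on several of these prerequisites, the sketch below trains a single sigmoid neuron (a perceptron-like unit) with stochastic gradient descent on the binary cross entropy loss. The toy AND dataset, the learning rate, and the epoch count are arbitrary choices made for illustration.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(x, w, b):
    """Probability of class 1 for input x under weights w and bias b."""
    return sigmoid(w[0] * x[0] + w[1] * x[1] + b)


def cross_entropy(p, y, eps=1e-12):
    """Binary cross entropy for predicted probability p and label y."""
    return -(y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps))


def mean_loss(data, w, b):
    return sum(cross_entropy(predict(x, w, b), y) for x, y in data) / len(data)


def train(data, lr=0.5, epochs=500, seed=0):
    """SGD: one parameter update per training example, in shuffled order."""
    rng = random.Random(seed)
    w, b = [0.0, 0.0], 0.0
    samples = list(data)
    for _ in range(epochs):
        rng.shuffle(samples)
        for x, y in samples:
            grad = predict(x, w, b) - y  # d(loss)/d(pre-activation)
            w = [wi - lr * grad * xi for wi, xi in zip(w, x)]
            b -= lr * grad
    return w, b


# Toy dataset: logical AND, which is linearly separable.
data = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]

initial_loss = mean_loss(data, [0.0, 0.0], 0.0)
w, b = train(data)
final_loss = mean_loss(data, w, b)

print(initial_loss, final_loss)
```

If the gradient line `predict(x, w, b) - y` is not obvious, it is worth re-deriving it from the sigmoid and the cross entropy loss; that derivation is backpropagation for a one-layer network.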
Summary
• Course Overview
• Introduction and Overview of Computer Vision
• Common Computer Vision tasks