Ninio, J. and Stevens, K. A. (2000) Variations on the Hermann grid: an extinction illusion.
Perception, 29, 1209-1217.
Fundamental matrix
Let p be a point in left image, p’ in right image
l l’
p p’
Epipolar relation
• p maps to epipolar line l’
• p’ maps to epipolar line l
Epipolar mapping described by a 3x3 matrix F
𝑝′𝑇 𝐹𝑝 = 0
Fundamental matrix
This matrix F is called
• the “Essential Matrix”
– when image intrinsic parameters are known
• the “Fundamental Matrix”
– more generally (uncalibrated case)
Can solve for F from point correspondences
• Each (p, p’) pair gives one linear equation in entries of F
𝑝′𝑇 𝐹𝑝 = 0
• F has 9 entries, but really only 7 or 8 degrees of freedom.
Today’s lecture
• Depth Estimation from Stereo Matching (Sparse
correspondence to Dense Correspondence)
CS 4495 Computer Vision – A. Bobick Motion and Optic Flow
Stereo Matching
Stereo image rectification
Stereo image rectification
• Reproject image planes
onto a common plane
parallel to the line
between camera centers
• Pixel motion is horizontal
after this transformation
• Two homographies (3x3
transform), one for each
input image reprojection
➢ C. Loop and Z. Zhang. Computing
Rectifying Homographies for Stereo
Vision. IEEE Conf. Computer Vision
and Pattern Recognition, 1999.
Rectification example
The correspondence problem
• Epipolar geometry constrains our search, but we still have a
difficult correspondence problem.
Fundamental Matrix + Sparse correspondence
Fundamental Matrix + Dense correspondence
SIFT + Fundamental Matrix + RANSAC
Building Rome in a Day
By Sameer Agarwal, Yasutaka Furukawa, Noah Snavely, Ian Simon, Brian Curless, Steven M. Seitz, Richard Szeliski
Communications of the ACM, Vol. 54 No. 10, Pages 105-112
Sparse to Dense Correspodence
Building Rome in a Day
By Sameer Agarwal, Yasutaka Furukawa, Noah Snavely, Ian Simon, Brian Curless, Steven M. Seitz, Richard Szeliski
Communications of the ACM, Vol. 54 No. 10, Pages 105-112
Structure from motion (or SLAM)
• Given a set of corresponding points in two or more
images, compute the camera parameters and the 3D
point coordinates
?
Camera 1
R1,t1 ? Camera 2
R2,t2 ? ?
Camera 3
R3,t3 Slide credit:
Noah Snavely
Structure from motion ambiguity
• If we scale the entire scene by some factor k and, at the same
time, scale the camera matrices by the factor of 1/k, the
projections of the scene points in the image remain exactly the
same:
1
x = PX = P (k X)
k
It is impossible to recover the absolute scale of the scene!
How do we know the scale of image content?
Bundle adjustment
• Non-linear method for refining structure and motion
• Minimizing reprojection error
2
E (P, X) = D(x ij , Pi X j )
m n
i =1 j =1
Xj
P1Xj
x1j x3j
P3Xj
P2Xj x2j
P1
P3
P2
CS 4495 Computer Vision – A. Bobick Motion and Optic Flow
Correspondence problem
Multiple match
hypotheses
satisfy epipolar
constraint, but
which is correct?
Figure from Gee & Cipolla 1999
CS 4495 Computer Vision – A. Bobick Motion and Optic Flow
Correspondence problem
• Beyond the hard constraint of epipolar geometry, there are “soft” constraints to
help identify corresponding points
• Similarity
• Uniqueness
• Ordering
• Disparity gradient
• To find matches in the image pair, we will assume
• Most scene points visible from both views
• Image regions for the matches are similar in appearance
CS 4495 Computer Vision – A. Bobick Motion and Optic Flow
Dense correspondence search
For each epipolar line
For each pixel / window in the left image
• compare with every pixel / window on same epipolar line
in right image
• pick position with minimum match cost (e.g., SSD,
normalized correlation)
Adapted from Li Zhang
CS 4495 Computer Vision – A. Bobick Motion and Optic Flow
Correspondence search with similarity constraint
Left Right
scanline
Matching cost
disparity
• Slide a window along the right scanline and compare
contents of that window with the reference window in
the left image
• Matching cost: SSD or normalized correlation
CS 4495 Computer Vision – A. Bobick Motion and Optic Flow
Correspondence search with similarity constraint
Left Right
scanline
SSD
CS 4495 Computer Vision – A. Bobick Motion and Optic Flow
Correspondence search with similarity constraint
Left Right
scanline
Norm. corr
CS 4495 Computer Vision – A. Bobick Motion and Optic Flow
Correspondence problem
Intensity
profiles
Source: Andrew Zisserman
CS 4495 Computer Vision – A. Bobick Motion and Optic Flow
Correspondence problem
Neighborhoods of corresponding points are
similar in intensity patterns.
Source: Andrew Zisserman
CS 4495 Computer Vision – A. Bobick Motion and Optic Flow
Correlation-based window matching
CS 4495 Computer Vision – A. Bobick Motion and Optic Flow
Correlation-based window matching
CS 4495 Computer Vision – A. Bobick Motion and Optic Flow
Correlation-based window matching
CS 4495 Computer Vision – A. Bobick Motion and Optic Flow
Correlation-based window matching
CS 4495 Computer Vision – A. Bobick Motion and Optic Flow
Correlation-based window matching
???
Textureless regions are
non-distinct; high
ambiguity for matches.
CS 4495 Computer Vision – A. Bobick Motion and Optic Flow
Effect of window size
Source: Andrew Zisserman
CS 4495 Computer Vision – A. Bobick Motion and Optic Flow
Effect of window size
W=3 W = 20
Want window large enough to have sufficient intensity
variation, yet small enough to contain only pixels with
about the same disparity.
Figures from Li Zhang
CS 4495 Computer Vision – A. Bobick Motion and Optic Flow
Left image Right image
CS 4495 Computer Vision – A. Bobick Motion and Optic Flow
Results with window search
Window-based matching Ground truth
(best window size)
CS 4495 Computer Vision – A. Bobick Motion and Optic Flow
Better solutions
• Beyond individual correspondences to estimate disparities:
• Optimize correspondence assignments jointly
• Scanline at a time (e.g. dynamic programming)
• Full 2D grid (e.g. graph cuts)
• Approximate 2D solution (e.g. semi-global matching)
CS 4495 Computer Vision – A. Bobick Motion and Optic Flow
Scanline stereo
• Try to coherently match pixels on the entire scanline
• Different scanlines are still optimized independently
Left image Right image
Robert Collins
CSE486, Penn State
Matching using Epipolar Lines
Left Image Right Image
For a patch in left image
Compare with patches along
same row in right image
Match Score Values
Robert Collins
CSE486, Penn State
Matching using Epipolar Lines
Left Image Right Image
Select patch with highest
match score.
Repeat for all pixels in Match Score Values
left image.
Robert Collins
CSE486, Penn State
Example: 5x5 windows
NCC match score
Computed disparities Ground truth
Black pixels: bad disparity values,
or no matching patch in right image
Robert Collins
CSE486, Penn State
Occlusions: No matches
Left image
Right image
Robert Collins
CSE486, Penn State
Effects of Patch Size
5x5 patches 11x11patches
Smoother in some areas Loss of finer details
Robert Collins
Adding Intra-Scanline Consistency
CSE486, Penn State
So far, each left image patch has been matched
independently along the right epipolar line.
This can lead to errors.
We would like to enforce some consistency
among matches in the same row (scanline).
Robert Collins
CSE486, Penn State
Disparity Space Image
First we introduce the concept of DSI.
The DSI for one row represents pairwise match scores
between patches along that row in the left and right image.
Pixels along left scanline
Pixel i
C(i,j) = Match score
Pixels along right scanline for patch centered
at left pixel i with
patch centered at
Pixel j right pixel j.
Robert Collins
CSE486, Penn State
Disparity Space Image (DSI)
Left Image Right Image
Dissimilarity Values
(1-NCC) or SSD
Robert Collins
CSE486, Penn State
Disparity Space Image (DSI)
Left Image Right Image
Dissimilarity Values
(1-NCC) or SSD
Robert Collins
CSE486, Penn State
Disparity Space Image (DSI)
Left Image Right Image
Dissimilarity Values
(1-NCC) or SSD
Robert Collins
CSE486, Penn State
Disparity Space Image (DSI)
Left Image DSI
Enter each vector of
match scores as a
column in the DSI
Dissimilarity Values
Robert Collins
CSE486, Penn State
Disparity Space Image
Left scanline
Right scanline
Robert Collins
CSE486, Penn State
Disparity Space Image
Left scanline
Invalid entries due to constraint
that disparity <= high value
64 in this case)
Right scanline
Invalid entries due to constraint
that disparity >= low value
(0 in this case)
Robert Collins
CSE486, Penn State
DSI and Scanline Consistency
Assigning disparities to all pixels in left scanline now
amounts to finding a connected path through the DSI
Start
End
Robert Collins
CSE486, Penn State
Lowest Cost Path
We would like to choose the “best” path.
Want one with lowest “cost” (Lowest sum of
dissimilarity scores along the path)
?
?
Robert Collins
CSE486, Penn State
Cox et.al. Stereo Matching
Occluded i-1,j-1 i-1,j
from right
match
Occluded match
Occluded
from left from left
i,j-1 Occluded i,j
Three cases: from right
– Matching patches. Cost = dissimilarity score
– Occluded from right. Cost is some constant value.
– Occluded from left. Cost is some constant value.
C(i,j)= min([ C(i-1,j-1) + dissimilarity(i,j)
C(i-1,j) + occlusionConstant,
C(i,j-1) + occlusionConstant]);
Robert Collins
CSE486, Penn State
Cox et.al. Stereo Matching
Start Recap: want to find lowest
cost path from upper left to
lower right of DSI image.
At each point on the path we
have three choices: step left,
step down, step diagonally.
End
Each choice has a well-defined
cost associated with it.
This problem just screams out for Dynamic Programming!
(which, indeed, is how Cox et.al. solve the problem)
Robert Collins
CSE486, Penn State
Real Scanline Example
DSI DP cost matrix
(cost of optimal path from each point to END)
Every pixel in left column now is marked with
either a disparity value, or an occlusion label.
Proceed for every scanline in left image.
Robert Collins
CSE486, Penn State
Example
Result of DP alg Result without DP (independent pixels)
Result of DP alg. Black pixels = occluded.
Robert Collins
CSE486, Penn State
Occlusion Filling
Simple trick for filling in gaps caused by occlusion.
= left occluded
Fill in left occluded pixels with value from the
nearest valid pixel preceding it in the scanline.
Similarly, for right occluded, look for valid pixel to the right.
Robert Collins
CSE486, Penn State
Example
Result of DP alg with occlusion filling.
Robert Collins
CSE486, Penn State
Example
Result of DP alg with occlusion filling. Result without DP (independent pixels)
Robert Collins
CSE486, Penn State
Example
Result of DP alg with occlusion filling. Ground truth
CS 4495 Computer Vision – A. Bobick Motion and Optic Flow
Stereo with 2D smoothness constraint
• What defines a good stereo correspondence?
1. Match quality
• Want each pixel to find a good match in the other image
2. Smoothness
• If two pixels are adjacent, they should (usually) move about
the same amount
CS 4495 Computer Vision – A. Bobick Motion and Optic Flow
Optimizing for match quality and smoothness (in any direction)
I1 I2 D
W1(i ) W2(i+D(i )) D(i )
E = Edata ( I1 , I 2 , D) + Esmooth ( D)
Edata = (W1 (i ) − W2 (i + D(i)) )
2
Esmooth = (D(i) − D( j ))
neighbors i , j
i
• Energy functions of this form can be minimized using
graph cuts
Y. Boykov, O. Veksler, and R. Zabih, Fast Approximate
Energy Minimization via Graph Cuts, PAMI 2001 Source: Steve Seitz
CS 4495 Computer Vision – A. Bobick Motion and Optic Flow
Results with window search
Window-based matching Ground truth
(best window size)
CS 4495 Computer Vision – A. Bobick Motion and Optic Flow
Better results…
Graph cut method Ground truth
Boykov et al., Fast Approximate Energy Minimization via Graph Cuts,
International Conference on Computer Vision, September 1999.
CS 4495 Computer Vision – A. Bobick Motion and Optic Flow
Semi-global matching
• Approximate the full smoothness optimization by
considering 8 or 16 directions in two or three
passes.
• Optimization looks like scanline, dynamic
programming stereo, but with a 2d notion of
smoothness
Stereo Processing by Semi-Global Matching and Mutual Information. Hirschmuller,
PAMI 2007. 3500+ citations
CS 4495 Computer Vision – A. Bobick Motion and Optic Flow
Semi-global matching
CS 4495 Computer Vision – A. Bobick Motion and Optic Flow
https://vision.middlebury.edu/stereo/eval3/
CS 4495 Computer Vision – A. Bobick Motion and Optic Flow
Stereo Depth Estimation Challenges
• Low-contrast ; textureless image regions
• Occlusions
• Violations of brightness constancy (e.g., specular reflections)
• Really large baselines (foreshortening and appearance change)
• Camera calibration errors
CS 4495 Computer Vision – A. Bobick Motion and Optic Flow
Quizlet
Active stereo with structured light
• Project “structured” light patterns onto the object
• Simplifies the correspondence problem
• Allows us to use only one camera
camera
projector
L. Zhang, B. Curless, and S. M. Seitz. Rapid Shape Acquisition Using Color Structured
Light and Multi-pass Dynamic Programming. 3DPVT 2002
Kinect: Structured infrared light
http://bbzippo.wordpress.com/2010/11/28/kinect-in-infrared/
iPhone X
iPhone 12 switched to lidar
(time of flight)
Self-driving efforts use both lidar and stereo
State of the art lidar