Stats 330: An Introduction to Compressed Sensing
Lecture 1 & 2: agenda
1 Underdetermined systems of linear equations
2 Convex optimization often returns the “correct” answer
3 What is compressed sensing?
Underdetermined Systems of Linear Equations
Underdetermined systems of linear equations
[Figure: b = Ax with a wide (fat) matrix A]

x ∈ R^n unknown
m ≪ n linear equations about x: b = Ax, b ∈ R^m
Would like to recover x
Arises in many fields of science and engineering
Example in genetics: sparse regression
Find the locations on the genome that influence a trait: e.g. cholesterol level
Model
y = Ax + z
y_i: measured cholesterol level of patient i
A: genotype matrix; the ith row encodes the genotype of patient i
e.g. A_{i,j} = number of alleles of type "a" at location j
z_i: environmental factors (not accounted for by genetics) for patient i
Typical dimensions:
# of columns: about 500,000
# of rows: a few thousand
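As a concrete illustration, here is a minimal Python sketch of this measurement model, with toy dimensions far smaller than a real genome-wide study (all names and sizes below are illustrative, not from the lecture):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in dimensions (a real problem has n ~ 500,000 and m ~ a few thousand)
m, n, s = 200, 2000, 10

A = rng.standard_normal((m, n))      # stand-in genotype matrix
x = np.zeros(n)
x[rng.choice(n, size=s, replace=False)] = rng.standard_normal(s)  # sparse effect sizes
z = 0.1 * rng.standard_normal(m)     # environmental noise
y = A @ x + z                        # measured trait values (e.g. cholesterol levels)
```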
Example in reflection seismology (sketch)
Impedance and reflectivity profiles

[Figure: (a) impedance A(z); (b) reflectivity α′(z)]
Probe and seismic trace
[Figure: wavelet probe and resulting seismic trace]
Linear inversion of bandlimited reflection seismograms

1D wave propagation:

A(z) u_tt = (A(z) u_z)_z,  t > 0, z ≥ 0

t is time and z is depth
u(t, z): particle displacement at time t and depth z
A(z): acoustical impedance; the desired unknown

System is probed by a "wavelet" f(t):

u_z(t, z = 0) = f(t)

Reflection seismogram (seismic trace): surface particle velocity

u_t(t, z = 0) = h(t),  0 ≤ t ≤ T

Problem: find A(z) from the given data (f(t), h(t))
Linear inversion

A(z) = 1 + α(z), with α a small perturbation

Linear approximation:

g(t) = f(t) + h(t) = ∫₀ᵗ f(t − τ) α′(τ/2) dτ

so with r(τ) = α′(τ/2) (this is the reflectivity function),

f ⋆ r = g.
This is a deconvolution problem
Bandpass effect
Probing wavelet f is bandpass
Consequence: seismic experiment is missing low and high frequencies
Unknown reflectivity has low and high frequency content
[Figure: wavelet and its Fourier transform]
Ill-posed or undersampled problem
Discrete-time Fourier transform:
f̂[k] = (1/√N) Σ_{t=0}^{N−1} f[t] e^{−i2πkt/N},  0 ≤ k ≤ N − 1
Deconvolution in the Fourier domain
ĝ[k] = f̂[k] r̂[k],  0 ≤ k ≤ N − 1
fˆ[k] ≈ 0 → missing frequencies/equations
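A small numpy sketch of this effect: wherever f̂[k] ≈ 0, the product ĝ[k] = f̂[k] r̂[k] carries no information about r̂[k], so those equations are effectively missing. The band edges below are arbitrary toy choices:

```python
import numpy as np

rng = np.random.default_rng(1)
N = 256

# Toy sparse reflectivity r
r = np.zeros(N)
r[rng.choice(N, size=12, replace=False)] = rng.standard_normal(12)

# Toy bandpass wavelet, specified by its (conjugate-symmetric) spectrum
f_hat = np.zeros(N)
f_hat[5:40] = 1.0
f_hat[-39:-4] = 1.0

g_hat = f_hat * np.fft.fft(r)        # convolution theorem: g-hat[k] = f-hat[k] r-hat[k]

# Naive deconvolution (division by f-hat) fails wherever f-hat[k] is ~ 0
observed = np.abs(f_hat) > 1e-8
print(f"{np.sum(~observed)} of {N} frequencies/equations are missing")
```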
Widefield microscope

Deconvolution/super-resolution in widefield microscopy

[Figure: widefield microscope schematic (UV laser, CCD camera) with I_out = |H|² ⊛ I_in; optical transfer function H, supported on [−k_max, k_max]]

I_out = H ⋆ I_in
Magnetic resonance imaging (sketch)
(Powerful) magnetic field aligns nuclear magnetization of (usually) hydrogen atoms in water in the body

RF fields systematically alter the alignment of this magnetization → hydrogen nuclei produce a rotating magnetic field detectable by the scanner

Make excitation strength space dependent

Goal is to recover the proton density
Model
Discrete-time Fourier transform of f[t1, t2], 0 ≤ t1, t2 ≤ n − 1:

f̂(ω1, ω2) = Σ_{(t1,t2)} f[t1, t2] e^{−i2π(ω1 t1 + ω2 t2)}

MR scan

y(ω1, ω2) = f̂(ω1, ω2) + noise,  for some frequencies (ω1, ω2)
Undersampling in magnetic resonance angiography
[Figure: image in the space domain; radial sampling pattern in the frequency domain]
Wish to speed up MRI by sampling less (very important to widen applicability)
example: 500 × 500 pixel image, i.e. 250,000 unknowns
about 20 radial lines, 500 measurements along each line: 10,000 samples
so ∼ 96% of the equations are missing
Frequency interpolation
[Figure: original Logan−Shepp phantom; a row of the Fourier matrix]

Undersample Nyquist by a factor of 25 at high frequencies!
Underdetermined system

N-pixel image f

y(ω1, ω2) = f̂(ω1, ω2) + z(ω1, ω2)

z is a noise term
Interested in m ≪ N samples
The Netflix problem

Netflix database
About a million users
About 25,000 movies
People rate movies
Sparsely sampled entries

[Figure: Users × Movies rating matrix, with only a few entries (×) observed]

Challenge

Complete the "Netflix matrix"
Many such problems → collaborative filtering
Partially filled out surveys...
Matrix completion

Matrix M ∈ R^{n1×n2}
Observe a subset of entries
Wish to infer the missing entries

[Figure: partially observed matrix; × = observed entry, ? = missing entry]
Underdetermined systems of linear equations
[Figure: b = Ax with a wide matrix A]

When there are fewer equations than unknowns, the fundamental theorem of linear algebra says we cannot find x

In general, this is absolutely correct
Convex Optimization often Returns the Correct Answer
Special structure
If the unknown is assumed to be
sparse (genomics example)
low-rank (Netflix)
then one can often find solutions to these problems by convex optimization
Sparsity in genomics
y = Ax (+z)
n ∼ 500, 000
m ∼ 1, 000
But the number of nonzero components of x is in the tens

x is said to be s-sparse if

‖x‖_{ℓ0} := #{i : x_i ≠ 0} ≤ s
Under some conditions, noiseless recovery by `1 minimization is exact
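For concreteness, the ℓ0 quasi-norm is just a support count; a one-line Python helper (ours, purely illustrative):

```python
import numpy as np

def l0_norm(x, tol=0.0):
    """Support size #{i : x_i != 0}; entries within tol of zero count as zero."""
    return int(np.count_nonzero(np.abs(x) > tol))
```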
ℓ1 minimization

ℓ0 minimization:  min ‖x‖_{ℓ0}  s.t. Ax = b   (combinatorially hard)
ℓ1 minimization:  min ‖x‖_{ℓ1}  s.t. Ax = b   (a linear program)
Why an ℓ1 objective?

The ℓ1 norm is the closest convex approximation to the ℓ0 quasi-norm

[Figure: unit balls of the ℓ0 quasi-norm, the ℓ1 norm, and the ℓ2 norm]

ℓ1 ball: smallest convex ball containing ±e_i = (0, . . . , 0, ±1, 0, . . . , 0)
Linear programming formulation

minimize Σ_i |x_i|
subject to Ax = b

is equivalent to

minimize Σ_i t_i
subject to Ax = b
          −t_i ≤ x_i ≤ t_i

with variables x, t ∈ R^n

x* is a solution ⇔ (x*, t* = |x*|) is a solution
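Here is a sketch of this LP using scipy.optimize.linprog; the helper name l1_min is ours, and the dense constraint matrices make no attempt to scale beyond small problems:

```python
import numpy as np
from scipy.optimize import linprog

def l1_min(A, b):
    """Solve  min ||x||_1  s.t.  Ax = b  via the equivalent LP in (x, t)."""
    m, n = A.shape
    c = np.concatenate([np.zeros(n), np.ones(n)])   # objective: sum_i t_i
    I = np.eye(n)
    # -t_i <= x_i <= t_i, rewritten as  x - t <= 0  and  -x - t <= 0
    A_ub = np.block([[I, -I], [-I, -I]])
    b_ub = np.zeros(2 * n)
    A_eq = np.hstack([A, np.zeros((m, n))])         # Ax = b (t does not enter)
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b,
                  bounds=[(None, None)] * n + [(0, None)] * n)
    assert res.success, res.message
    return res.x[:n]
```

The x-variables are left free while t ≥ 0; at the optimum, t_i = |x_i| as in the equivalence above.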
Sparse regression in genomics

y = Ax + z

Lasso:             min ‖x‖_{ℓ1}  s.t. ‖b − Ax‖_{ℓ2} ≤ ε
Dantzig selector:  min ‖x‖_{ℓ1}  s.t. ‖A*(b − Ax)‖_{ℓ∞} ≤ δ

Under some conditions
accurate recovery/estimation from noisy data
perfect recovery from noiseless data (z = 0)
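For illustration, the penalized form of the lasso, as solved by scikit-learn, stands in for the constrained form on the slide; the two are equivalent for a matched choice of penalty. All sizes and parameters below are toy values:

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(2)
m, n, s = 100, 400, 8
A = rng.standard_normal((m, n))
x = np.zeros(n)
x[rng.choice(n, size=s, replace=False)] = 1.0   # true sparse coefficient vector
y = A @ x + 0.1 * rng.standard_normal(m)        # noisy measurements

# Penalized form: min_w (1/(2m)) ||y - Aw||_2^2 + alpha * ||w||_1
fit = Lasso(alpha=0.05, fit_intercept=False, max_iter=10_000).fit(A, y)
print("estimated support:", np.flatnonzero(np.abs(fit.coef_) > 1e-3))
```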
Sparse deconvolution in seismics

g = f ⋆ r

The real earth is irregularly layered
Consequence: r is an irregular series of spikes (a "sparse spike train")

Recovery by ℓ1 minimization:

min ‖r‖_{ℓ1}  s.t.  f ⋆ r = g

Under some conditions, the recovery is exact
ℓ1 reconstruction from undersampled frequency data

Signal length: n = 256
Data: m = 30 complex-valued Fourier coefficients
Sparsity: s = 15

[Figure: original signal; ℓ1 recovery = perfect!; ℓ2 recovery (fails to recover the spikes)]

(A miniature numerical reproduction follows.)
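This experiment can be reproduced in miniature with the l1_min helper from the linear-programming sketch above, stacking the real and imaginary parts of randomly chosen DFT rows into a real sensing matrix (our convenience choice; at these parameters recovery is typically exact, mirroring the slide):

```python
import numpy as np

rng = np.random.default_rng(3)
n, s, k = 256, 15, 30                        # k complex Fourier coefficients

x = np.zeros(n)
x[rng.choice(n, size=s, replace=False)] = rng.standard_normal(s)

freqs = rng.choice(n, size=k, replace=False)              # random frequency subset
F = np.exp(-2j * np.pi * np.outer(freqs, np.arange(n)) / n)
A = np.vstack([F.real, F.imag])              # 2k real-valued equations
b = A @ x

x_hat = l1_min(A, b)                         # helper defined earlier
print("max recovery error:", np.abs(x_hat - x).max())
```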
Example in MRI

The image is not sparse, but its gradient is sparse

So minimize the ℓ1 norm of the gradient subject to the data constraints:

min ‖x‖_TV := Σ_{t1,t2} |∇x(t1, t2)|   s.t.  Ax = b

[Figure: original Logan−Shepp phantom; naive reconstruction (filtered backprojection); reconstruction by TV minimization + nonnegativity constraint = perfect recovery]

Under some conditions, the recovery is exact
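For concreteness, here is one possible discrete (anisotropic) version of the TV objective; an actual reconstruction minimizes this subject to the data constraints with a convex solver, which we do not reproduce here:

```python
import numpy as np

def tv_norm(img):
    """Anisotropic total variation: sum of absolute horizontal and vertical
    finite differences of a 2-D array."""
    dx = np.diff(img, axis=1)    # horizontal differences
    dy = np.diff(img, axis=0)    # vertical differences
    return np.abs(dx).sum() + np.abs(dy).sum()
```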
Low-rank modeling in collaborative filtering

[Figure: partially observed ratings matrix; × = observed entry, ? = missing entry]

Large number of entries, yet one only sees a few
But the unknown matrix has low rank

Under some conditions, noiseless recovery by convex programming is exact
Nuclear-norm minimization

rank minimization:          min rank(X)     s.t. X_ij = M_ij (M_ij observed)   (combinatorially hard)
nuclear-norm minimization:  min Σ_i σ_i(X)  s.t. X_ij = M_ij (M_ij observed)   (a semidefinite program)

Convex relaxation of the rank minimization program

Ball {X : Σ_i σ_i(X) ≤ 1}: convex hull of the rank-1 matrices xy* obeying ‖xy*‖ ≤ 1
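One classical way to attack this program is singular value thresholding (Cai, Candès, and Shen). The sketch below is a bare-bones iteration; tau, step, and iters are untuned toy parameters, not recommendations:

```python
import numpy as np

def svt_complete(M_obs, mask, tau=5.0, step=1.2, iters=300):
    """Singular value thresholding sketch for noiseless matrix completion.
    M_obs: observed entries filled in, zeros elsewhere.
    mask:  boolean array marking which entries are observed."""
    Y = np.zeros_like(M_obs)
    for _ in range(iters):
        U, sig, Vt = np.linalg.svd(Y, full_matrices=False)
        X = (U * np.maximum(sig - tau, 0.0)) @ Vt   # shrink the singular values
        Y += step * mask * (M_obs - X)              # push X toward the data
    return X
```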
Low-rank regression
Y_ij = M_ij + Z_ij,  (i, j) ∈ E (the set of observed entries)
Under some conditions
accurate matrix recovery/estimation from noisy data
perfect matrix recovery from noiseless data (Z = 0)
What is Compressive Sensing?
"The goal of image compression is to represent the digital model of an object as compactly as possible. One can regard the possibility of digital compression as a failure of sensor design. If it is possible to compress measured data, one might argue that too many measurements were taken."

David Brady
Compressed sensing
Name coined by David Donoho
Has become a label for sparse signal recovery
But really one instance of underdetermined problems
Informs analysis of underdetermined problems
Changes viewpoint about underdetermined problems
Starting point of a general burst of activity in
information theory
signal processing
statistics
some areas of computer science
...
Inspired new areas of research, e.g. low-rank matrix recovery
A little experiment
Raw 5 MegaPixel image
15 Megabytes
A little experiment
JPEG Compressed 5 MegaPixel image
150 Kilobytes
A contemporary paradox

Massive data acquisition
Most of the data is redundant and can be thrown away
Seems enormously wasteful

[Raw: 15 MB → JPEG: 150 KB]
A long-established tradition

Acquire/Sample (A-to-D converter, digital camera)
Compress (signal dependent, nonlinear)

[Figure: acquisition pipeline: sample N coefficients, sparse wavelet transform, compress to M ≪ N, transmit/store, receive, decompress]
Sparsity in signal processing

[Figure: a 1-megapixel image and its wavelet coefficients; zooming in shows that most coefficients are small]

Implication: can discard small coefficients without much perceptual loss
Sparsity and wavelet "compression"

Take a megapixel image
1 Compute 1,000,000 wavelet coefficients
2 Set to zero all but the 25,000 largest coefficients
3 Invert the wavelet transform (sketched in code below)

[Figure: 1 megapixel image vs. its 25k-term approximation]

This principle underlies modern lossy coders
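A sketch of steps 1 to 3, assuming the PyWavelets package (pywt) is available; the wavelet, the decomposition level, and the 2.5% keep-fraction are illustrative choices:

```python
import numpy as np
import pywt  # PyWavelets, an assumed dependency

def wavelet_compress(img, keep_frac=0.025, wavelet="db4", level=4):
    """Keep only the largest keep_frac of wavelet coefficients, then invert."""
    coeffs = pywt.wavedec2(img, wavelet, level=level)           # step 1
    arr, slices = pywt.coeffs_to_array(coeffs)
    k = max(1, int(keep_frac * arr.size))
    thresh = np.sort(np.abs(arr).ravel())[-k]                   # k-th largest magnitude
    arr[np.abs(arr) < thresh] = 0.0                             # step 2
    kept = pywt.array_to_coeffs(arr, slices, output_format="wavedec2")
    return pywt.waverec2(kept, wavelet)                         # step 3
```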
Transform-domain image coding

Sparse representation = good compression
Why? Because there are fewer things to send/store

Canonical image coder
Most of the transformed coefficients are ≈ 0 ⇒ few bits
How many measurements to acquire a sparse signal?

x is s-sparse
Take m random and nonadaptive measurements: y_k = ⟨a_k, x⟩, k = 1, . . . , m; e.g. a_k ∼ N(0, I)
Reconstruct by ℓ1 minimization

First fundamental result
If m ≳ s log n, ℓ1 minimization recovers the original exactly (a quick numerical check follows)

Efficient acquisition is possible by nonadaptive sensing
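The promised check, reusing the l1_min helper from the linear-programming sketch above with Gaussian measurements; at these toy dimensions recovery is typically exact:

```python
import numpy as np

rng = np.random.default_rng(4)
n, s = 400, 10
m = int(2 * s * np.log(n))                   # on the order of s log n

x = np.zeros(n)
x[rng.choice(n, size=s, replace=False)] = rng.standard_normal(s)

A = rng.standard_normal((m, n))              # rows a_k ~ N(0, I)
b = A @ x

x_hat = l1_min(A, b)                         # helper defined earlier
print("exact recovery:", np.allclose(x_hat, x, atol=1e-6))
```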
Motivation: Sampling a Sparse Vector

[Figure: concentrated vector; incoherent measurements]

Signal is local, measurements are global
Each measurement picks up a little information about each component
Triangulate significant components from the measurements
Formalization: relies on uncertainty principles between the sparsity basis and the measurement system
Signals/images may not be exactly sparse
[Figure: image and its table of wavelet coefficients]
Nonadaptive sensing of compressible signals

[Figure: a compressible signal with rapidly decaying coefficients]

Classical viewpoint
Measure everything (all the pixels, all the coefficients)
Store the s largest coefficients
distortion: ‖x − x_s‖_{ℓ2}

Compressed sensing
Take m random measurements: b_k = ⟨a_k, x⟩
Reconstruct by ℓ1 minimization

Second fundamental result
With m ≳ s log n, or even m ≳ s log(n/s):

‖x̂ − x‖_{ℓ2} ≤ ‖x − x_s‖_{ℓ2}

Simultaneous (nonadaptive) acquisition and compression
Optimality
Cannot do essentially better
with fewer measurements
with other reconstruction algorithms
But data are always noisy...

Random sensing with a ∼ N(0, I)

y_k = ⟨a_k, x⟩ + σ z_k,  k = 1, . . . , m
a_k ∼ N(0, I) (say)
z_k iid N(0, 1) (say)

Third fundamental result: C. and Plan ('09)
Recovery via the lasso or the Dantzig selector: with s̄ = m/log(n/m),

‖x̂ − x‖²_{ℓ2} ≲ inf_{1 ≤ s ≤ s̄} ‖x − x_s‖²_{ℓ2} + (s σ²/m) log n

= near-optimal bias-variance trade-off
Result holds more generally
Opportunities
When measurements are
expensive (e.g. fuel cell imaging, near IR imaging)
slow (e.g. MRI)
beyond current capabilities (e.g. wideband analog to digital conversion)
wasteful
missing
...
Compressed sensing may offer a way out
Example of hardware development
New analog-to-digital converter architectures (guest lecture by A. Emami)
New optical devices; cameras, spectroscopes, microscopes
Fast MRI (guest lecture by M. Lustig)
Course agenda and objectives
1 fundamental theoretical and mathematical ideas
2 efficient numerical methods in large-scale convex optimization
3 progress in implementing compressive sensing ideas into acquisition devices
Course objectives
expose students to a novel active field of research
give tools to make theoretical and/or practical contributions