ℓ1-norm Methods for
Convex-Cardinality Problems
• problems involving cardinality
• the ℓ1-norm heuristic
• convex relaxation and convex envelope interpretations
• examples
• recent results
EE364b, Stanford University
ℓ1-norm heuristics for cardinality problems
• cardinality problems arise often, but are hard to solve exactly
• a simple heuristic based on the ℓ1-norm seems to work well
• used for many years, in many fields
– sparse design
– LASSO, robust estimation in statistics
– support vector machine (SVM) in machine learning
– total variation reconstruction in signal processing, geophysics
– compressed sensing
• new theoretical results guarantee the method works, at least for a few
problems
Cardinality
• the cardinality of x ∈ R^n, denoted card(x), is the number of nonzero
components of x
• card is separable; for scalar x, card(x) = 0 if x = 0 and card(x) = 1 if x ≠ 0
• card is quasiconcave on R^n_+ (but not R^n) since
card(x + y) ≥ min{card(x), card(y)}
holds for x, y ⪰ 0
• but otherwise has no convexity properties
• arises in many problems
General convex-cardinality problems
a convex-cardinality problem is one that would be convex, except for the
appearance of card in the objective or constraints
examples (with C, f convex):
• convex minimum cardinality problem:
minimize card(x)
subject to x ∈ C
• convex problem with cardinality constraint:
minimize f (x)
subject to x ∈ C, card(x) ≤ k
Solving convex-cardinality problems
convex-cardinality problem with x ∈ R^n
• if we fix the sparsity pattern of x (i.e., which entries are zero/nonzero)
we get a convex problem
• by solving the 2^n convex problems associated with all possible sparsity
patterns, we can solve the convex-cardinality problem (see the sketch below)
(possibly practical for n ≤ 10; not practical for n > 15 or so . . . )
• general convex-cardinality problem is (NP-) hard
• can solve globally by branch-and-bound
– can work for particular problem instances (with some luck)
– in worst case reduces to checking all (or many of) 2^n sparsity patterns
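as a concrete illustration of the enumeration approach, here is a minimal
brute-force sketch in Python with the cvxpy package, taking C = {x : Ax ⪯ b}
as an illustrative convex set (the function name and data layout are
assumptions, not from the slides); it tries sparsity patterns in order of
increasing cardinality and returns the first feasible one, which therefore
has minimum cardinality:

    import itertools
    import numpy as np
    import cvxpy as cp

    def min_card_by_enumeration(A, b):
        # illustrative C = {x : A x <= b}; solves min card(x) s.t. x in C
        n = A.shape[1]
        for k in range(n + 1):                      # increasing cardinality
            for support in itertools.combinations(range(n), k):
                x = cp.Variable(n)
                constraints = [A @ x <= b]
                constraints += [x[i] == 0 for i in range(n) if i not in support]
                prob = cp.Problem(cp.Minimize(0), constraints)  # feasibility
                prob.solve()
                if prob.status == "optimal":        # feasible pattern found
                    return x.value, k               # first hit has minimum card
        return None, None                           # C itself is infeasible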
Boolean LP as convex-cardinality problem
• Boolean LP:
minimize cT x
subject to Ax ⪯ b, xi ∈ {0, 1}
includes many famous (hard) problems, e.g., 3-SAT, traveling salesman
• can be expressed as
minimize cT x
subject to Ax ⪯ b, card(x) + card(1 − x) ≤ n
since card(x) + card(1 − x) ≤ n ⇐⇒ xi ∈ {0, 1} for all i
(each index has xi ≠ 0 or xi ≠ 1, so card(x) + card(1 − x) ≥ n always,
with equality exactly when every xi ∈ {0, 1})
• conclusion: general convex-cardinality problem is hard
Sparse design
minimize card(x)
subject to x ∈ C
• find sparsest design vector x that satisfies a set of specifications
• zero values of x simplify design, or correspond to components that
aren’t even needed
• examples:
– FIR filter design (zero coefficients reduce required hardware)
– antenna array beamforming (zero coefficients correspond to unneeded
antenna elements)
– truss design (zero coefficients correspond to bars that are not needed)
– wire sizing (zero coefficients correspond to wires that are not needed)
Sparse modeling / regressor selection
fit vector b ∈ R^m as a linear combination of k regressors (chosen from n
possible regressors)
minimize ‖Ax − b‖₂
subject to card(x) ≤ k
• gives k-term model
• chooses subset of k regressors that (together) best fit or explain b
• can solve (in principle) by trying all (n choose k) choices
• variations:
– minimize card(x) subject to ‖Ax − b‖₂ ≤ ε
– minimize ‖Ax − b‖₂ + λ card(x)
Sparse signal reconstruction
• estimate signal x, given
– noisy measurement y = Ax + v, v ∼ N(0, σ²I) (A is known; v is not)
– prior information card(x) ≤ k
• maximum likelihood estimate x̂_ml is the solution of
minimize ‖Ax − y‖₂
subject to card(x) ≤ k
Estimation with outliers
• we have measurements yi = ai^T x + vi + wi, i = 1, . . . , m
• noises vi ∼ N(0, σ²) are independent
• only assumption on w is sparsity: card(w) ≤ k
• B = {i | wi ≠ 0} is the set of bad measurements or outliers
• maximum likelihood estimate of x found by solving
minimize Σ_{i∉B} (yi − ai^T x)²
subject to |B| ≤ k
with variables x and B ⊆ {1, . . . , m}
• equivalent to
minimize ‖y − Ax − w‖₂²
subject to card(w) ≤ k
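as a hedged preview of the ℓ1-norm heuristic developed later in these slides:
replacing card(w) with a penalty γ‖w‖₁ gives a convex robust-regression
problem; a minimal sketch, where the data, γ, and the outlier construction
are illustrative assumptions:

    import numpy as np
    import cvxpy as cp

    m, n = 50, 5
    rng = np.random.default_rng(0)
    A = rng.standard_normal((m, n))
    x_true = rng.standard_normal(n)
    y = A @ x_true + 0.1 * rng.standard_normal(m)
    y[:3] += 10.0                       # a few gross outliers (sparse w)

    x, w = cp.Variable(n), cp.Variable(m)
    gamma = 1.0                         # assumed weight; tune for sparsity of w
    obj = cp.sum_squares(y - A @ x - w) + gamma * cp.norm(w, 1)
    cp.Problem(cp.Minimize(obj)).solve()
    outliers = np.where(np.abs(w.value) > 1e-3)[0]  # estimated bad measurements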
Minimum number of violations
• set of convex inequalities
f1(x) ≤ 0, . . . , fm(x) ≤ 0, x ∈ C
• choose x to minimize the number of violated inequalities:
minimize card(t)
subject to fi(x) ≤ ti, i = 1, . . . , m
x ∈ C, t ≥ 0
• determining whether zero inequalities can be violated is (easy) convex
feasibility problem
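replacing card(t) with 1^T t (its ℓ1-norm surrogate, since t ≥ 0) gives a
convex problem that minimizes the total amount of violation; a minimal sketch,
with linear inequalities ai^T x ≤ bi as an illustrative choice of the fi:

    import numpy as np
    import cvxpy as cp

    m, n = 30, 5
    rng = np.random.default_rng(1)
    A, b = rng.standard_normal((m, n)), rng.standard_normal(m)

    x = cp.Variable(n)
    t = cp.Variable(m, nonneg=True)               # slack per inequality
    prob = cp.Problem(cp.Minimize(cp.sum(t)), [A @ x - b <= t])
    prob.solve()
    num_violated = int(np.sum(A @ x.value - b > 1e-6))  # heuristic count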
Linear classifier with fewest errors
• given data (x1, y1), . . . , (xm, ym) ∈ R^n × {−1, 1}
• we seek a linear (affine) classifier y ≈ sign(w^T x + v)
• a classification error corresponds to yi(w^T xi + v) ≤ 0
• to find w, v that give fewest classification errors:
minimize card(t)
subject to yi(w^T xi + v) + ti ≥ 1, i = 1, . . . , m
with variables w, v, t (we use homogeneity in w, v here)
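applying the same surrogate here (replace card(t) with 1^T t, t ⪰ 0) yields
the familiar hinge-loss problem used in support vector machines; a minimal
sketch, with synthetic data as an illustrative assumption:

    import numpy as np
    import cvxpy as cp

    m, n = 100, 2
    rng = np.random.default_rng(2)
    X = rng.standard_normal((m, n))               # rows are the x_i
    y = np.sign(X @ np.ones(n) + 0.3 * rng.standard_normal(m))

    w, v = cp.Variable(n), cp.Variable()
    margins = cp.multiply(y, X @ w + v)           # y_i (w^T x_i + v)
    prob = cp.Problem(cp.Minimize(cp.sum(cp.pos(1 - margins))))  # hinge loss
    prob.solve()
    errors = int(np.sum(y * (X @ w.value + v.value) <= 0))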
Smallest set of mutually infeasible inequalities
• given a set of mutually infeasible convex inequalities
f1(x) ≤ 0, . . . , fm(x) ≤ 0
• find smallest (cardinality) subset of these that is infeasible
• certificate of infeasibility is g(λ) = inf_x (Σ_{i=1}^m λi fi(x)) ≥ 1, λ ⪰ 0
• to find smallest cardinality infeasible subset, we solve
minimize card(λ)
subject to g(λ) ≥ 1, λ ⪰ 0
(assuming some constraint qualifications)
Portfolio investment with linear and fixed costs
• we use budget B to purchase (dollar) amount xi ≥ 0 of stock i
• trading fee is fixed cost plus linear cost: β card(x) + α^T x
• budget constraint is 1^T x + β card(x) + α^T x ≤ B
• mean return on investment is µ^T x; variance is x^T Σx
• minimize investment variance (risk) with mean return ≥ R_min:
minimize x^T Σx
subject to µ^T x ≥ R_min, x ⪰ 0
1^T x + β card(x) + α^T x ≤ B
Piecewise constant fitting
• fit corrupted x_cor by a piecewise constant signal x̂ with k or fewer jumps
• problem is convex once location (indices) of jumps are fixed
• x̂ is piecewise constant with ≤ k jumps ⇐⇒ card(Dx̂) ≤ k, where
D ∈ R^{(n−1)×n} is the first-order difference matrix, with rows
(1, −1, 0, . . . , 0), (0, 1, −1, 0, . . . , 0), . . . , (0, . . . , 0, 1, −1),
so that (Dx̂)i = x̂i − x̂i+1
• as convex-cardinality problem:
minimize ‖x̂ − x_cor‖₂
subject to card(Dx̂) ≤ k
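the ℓ1 heuristic for this problem replaces card(Dx̂) with ‖Dx̂‖₁, giving total
variation reconstruction; a minimal sketch, where the signal, noise level, and
γ are illustrative assumptions:

    import numpy as np
    import cvxpy as cp

    n = 200
    rng = np.random.default_rng(3)
    x_true = np.concatenate([np.zeros(n // 2), np.ones(n - n // 2)])  # one jump
    x_cor = x_true + 0.1 * rng.standard_normal(n)

    x_hat = cp.Variable(n)
    gamma = 0.5                          # assumed weight; tune to hit <= k jumps
    obj = cp.sum_squares(x_hat - x_cor) + gamma * cp.norm(cp.diff(x_hat), 1)
    cp.Problem(cp.Minimize(obj)).solve()
    jumps = int(np.sum(np.abs(np.diff(x_hat.value)) > 1e-3))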
Piecewise linear fitting
• fit x_cor by a piecewise linear signal x̂ with k or fewer kinks
• as convex-cardinality problem:
minimize ‖x̂ − x_cor‖₂
subject to card(∇²x̂) ≤ k
where ∇² ∈ R^{(n−2)×n} is the second-order difference matrix, with rows
(−1, 2, −1, 0, . . . , 0), . . . , (0, . . . , 0, −1, 2, −1),
so that (∇²x̂)i = −x̂i + 2x̂i+1 − x̂i+2
ℓ1-norm heuristic
• replace card(z) with γ‖z‖₁, or add regularization term γ‖z‖₁ to the
objective
• γ > 0 is parameter used to achieve desired sparsity
(when card appears in constraint, or as term in objective)
• more sophisticated versions use Σi wi|zi| or Σi wi(zi)+ + Σi vi(zi)−,
where w, v are positive weights
Example: Minimum cardinality problem
• start with (hard) minimum cardinality problem
minimize card(x)
subject to x ∈ C
(C convex)
• apply heuristic to get (easy) ℓ1-norm minimization problem
minimize ‖x‖₁
subject to x ∈ C
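a minimal cvxpy sketch of this substitution, again with C = {x : Ax ⪯ b} as an
illustrative convex set (the data are constructed so that C is nonempty):

    import numpy as np
    import cvxpy as cp

    m, n = 20, 10
    rng = np.random.default_rng(4)
    A = rng.standard_normal((m, n))
    x0 = np.zeros(n)
    x0[:3] = rng.standard_normal(3)
    b = A @ x0 + 0.1                     # C contains x0, so it is nonempty

    x = cp.Variable(n)
    cp.Problem(cp.Minimize(cp.norm(x, 1)), [A @ x <= b]).solve()
    card_est = int(np.sum(np.abs(x.value) > 1e-6))   # heuristic cardinality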
Example: Cardinality constrained problem
• start with (hard) cardinality constrained problem (f , C convex)
minimize f (x)
subject to x ∈ C, card(x) ≤ k
• apply heuristic to get (easy) ℓ1-constrained problem
minimize f(x)
subject to x ∈ C, ‖x‖₁ ≤ β
or ℓ1-regularized problem
minimize f(x) + γ‖x‖₁
subject to x ∈ C
with β, γ adjusted so that card(x) ≤ k
Polishing
• use ℓ1 heuristic to find x̂ with required sparsity
• fix the sparsity pattern of x̂
• re-solve the (convex) optimization problem with this sparsity pattern to
obtain final (heuristic) solution
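a minimal sketch of polishing, shown for the objective ‖Ax − b‖₂² as an
illustrative f (the function name and the tolerance are assumptions):

    import numpy as np
    import cvxpy as cp

    def polish(A, b, x_heur, tol=1e-6):
        # keep the sparsity pattern of x_heur, re-solve the convex problem
        n = x_heur.size
        zero_idx = [i for i in range(n) if abs(x_heur[i]) <= tol]
        x = cp.Variable(n)
        constraints = [x[i] == 0 for i in zero_idx]
        cp.Problem(cp.Minimize(cp.sum_squares(A @ x - b)), constraints).solve()
        return x.value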
Interpretation as convex relaxation
• start with
minimize card(x)
subject to x ∈ C, ‖x‖∞ ≤ R
• equivalent to mixed Boolean convex problem
minimize 1^T z
subject to |xi| ≤ Rzi, i = 1, . . . , n
x ∈ C, zi ∈ {0, 1}, i = 1, . . . , n
with variables x, z
• now relax zi ∈ {0, 1} to zi ∈ [0, 1] to obtain
minimize 1^T z
subject to |xi| ≤ Rzi, i = 1, . . . , n
x∈C
0 ≤ zi ≤ 1, i = 1, . . . , n
which is equivalent to
minimize (1/R)‖x‖₁
subject to x ∈ C, ‖x‖∞ ≤ R
(for fixed x the optimal relaxed z is zi = |xi|/R), i.e., the ℓ1 heuristic
• the optimal value of this problem is a lower bound on the optimal value of
the original problem
Interpretation via convex envelope
• convex envelope f env of a function f on set C is the largest convex
function that is an underestimator of f on C
• epi(f^env) = Co(epi(f))
• f^env = (f^*)^* (with some technical conditions)
• for x scalar, |x| is the convex envelope of card(x) on [−1, 1]
• for x ∈ R^n, (1/R)‖x‖₁ is the convex envelope of card(x) on
{z | ‖z‖∞ ≤ R}
Weighted and asymmetric ℓ1 heuristics
• minimize card(x) over convex set C
• suppose we know lower and upper bounds on xi over C
x ∈ C =⇒ li ≤ xi ≤ ui
(best values for these can be found by solving 2n convex problems)
• if ui < 0 or li > 0, then card(xi) = 1 (i.e., xi ≠ 0) for all x ∈ C
• assuming li < 0, ui > 0, convex relaxation and convex envelope
interpretations suggest using
Σ_{i=1}^n ( (xi)+/ui + (xi)−/(−li) )
as surrogate (and also lower bound) for card(x)
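a sketch of this surrogate in cvxpy, where cp.pos and cp.neg give (xi)+ and
(xi)−; the bounds l, u and the constraint set are illustrative assumptions:

    import numpy as np
    import cvxpy as cp

    n = 10
    l, u = -np.ones(n), 2.0 * np.ones(n)     # assumed bounds with l < 0 < u
    x = cp.Variable(n)
    surrogate = cp.sum(cp.multiply(1.0 / u, cp.pos(x))
                       + cp.multiply(1.0 / (-l), cp.neg(x)))
    constraints = [x >= l, x <= u, cp.sum(x) == 1.0]   # illustrative C
    cp.Problem(cp.Minimize(surrogate), constraints).solve()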
Regressor selection
minimize ‖Ax − b‖₂
subject to card(x) ≤ k
• heuristic:
– minimize ‖Ax − b‖₂ + γ‖x‖₁
– find smallest value of γ that gives card(x) ≤ k
– fix associated sparsity pattern (i.e., subset of selected regressors) and
find x that minimizes ‖Ax − b‖₂ (see the sketch below)
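a sketch of the full three-step heuristic, where the γ grid, stopping rule,
and threshold are illustrative assumptions:

    import numpy as np
    import cvxpy as cp

    def select_regressors(A, b, k, tol=1e-6):
        n = A.shape[1]
        x = cp.Variable(n)
        gamma = cp.Parameter(nonneg=True)
        obj = cp.norm(A @ x - b, 2) + gamma * cp.norm(x, 1)
        prob = cp.Problem(cp.Minimize(obj))
        for g in np.logspace(-4, 4, 80):         # smallest workable gamma first
            gamma.value = g
            prob.solve()
            support = np.abs(x.value) > tol
            if support.sum() <= k:               # card(x) <= k reached
                x_pol = cp.Variable(n)           # polish on this pattern
                constr = [x_pol[i] == 0 for i in np.where(~support)[0]]
                cp.Problem(cp.Minimize(cp.norm(A @ x_pol - b, 2)), constr).solve()
                return x_pol.value
        return np.zeros(n)                       # very large gamma gives x = 0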
Example (6.4 in BV book)
• A ∈ R^{10×20}, x ∈ R^{20}, b ∈ R^{10}
• dashed curve: exact optimal (via enumeration)
• solid curve: ℓ1 heuristic with polishing
[plot: card(x) (0 to 10) versus ‖Ax − b‖₂ (0 to 4) for both methods]
Sparse signal reconstruction
• convex-cardinality problem:
minimize ‖Ax − y‖₂
subject to card(x) ≤ k
• ℓ1 heuristic:
minimize ‖Ax − y‖₂
subject to ‖x‖₁ ≤ β
(called LASSO)
• another form: minimize ‖Ax − y‖₂ + γ‖x‖₁
(called basis pursuit denoising)
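a minimal sketch of the second (regularized) form, with the problem sizes and
γ as illustrative assumptions:

    import numpy as np
    import cvxpy as cp

    m, n, k = 40, 100, 5
    rng = np.random.default_rng(5)
    A = rng.standard_normal((m, n))
    x0 = np.zeros(n)
    x0[rng.choice(n, k, replace=False)] = rng.standard_normal(k)
    y = A @ x0 + 0.01 * rng.standard_normal(m)    # noisy measurements

    x = cp.Variable(n)
    gamma = 0.1                          # assumed; sweep to trade fit vs sparsity
    obj = cp.norm(A @ x - y, 2) + gamma * cp.norm(x, 1)
    cp.Problem(cp.Minimize(obj)).solve()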
Example
• signal x ∈ R^n with n = 1000, card(x) = 30
• m = 200 (random) noisy measurements: y = Ax + v, v ∼ N(0, σ²I),
Aij ∼ N(0, 1)
• left: original; right: ℓ1 reconstruction with γ = 10^−3
[plots: original signal (left) and ℓ1 reconstruction (right), amplitudes in
[−1, 1], indices 1 to 1000]
• ℓ2 reconstruction; minimizes ‖Ax − y‖₂ + γ‖x‖₂, where γ = 10^−3
• left: original; right: ℓ2 reconstruction
[plots: original signal (left) and ℓ2 reconstruction (right), same axes as
above]
Some recent theoretical results
• suppose y = Ax, A ∈ R^{m×n}, card(x) ≤ k
• to reconstruct x, clearly need m ≥ k
• if m ≥ n and A is full rank, we can reconstruct x without cardinality
assumption
• when does the ℓ1 heuristic (minimizing ‖x‖₁ subject to Ax = y)
reconstruct x (exactly)?
recent results by Candès, Donoho, Romberg, Tao, . . .
• (for some choices of A) if m ≥ (C log n)k, ℓ1 heuristic reconstructs x
exactly, with overwhelming probability
• C is an absolute constant; valid A’s include
– Aij ∼ N(0, σ²)
– Ax gives Fourier transform of x at m frequencies, chosen from
uniform distribution
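a sketch of the exact-recovery experiment these results describe: noiseless
measurements y = Ax₀ with Gaussian A and sparse x₀, reconstructed by ℓ1
minimization (the sizes are illustrative; with these dimensions recovery
typically succeeds):

    import numpy as np
    import cvxpy as cp

    m, n, k = 100, 300, 10
    rng = np.random.default_rng(6)
    A = rng.standard_normal((m, n))
    x0 = np.zeros(n)
    x0[rng.choice(n, k, replace=False)] = rng.standard_normal(k)
    y = A @ x0                            # noiseless measurements

    x = cp.Variable(n)
    cp.Problem(cp.Minimize(cp.norm(x, 1)), [A @ x == y]).solve()
    print(np.allclose(x.value, x0, atol=1e-5))   # True with high probability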