Agenda
1 Cone programming
2 Convex cones
3 Generalized inequalities
4 Linear programming (LP)
5 Second-order cone programming (SOCP)
6 Semidefinite programming (SDP)
7 Examples
Optimization problem in standard form
minimize f0 (x)
subject to fi(x) ≤ 0, i = 1, . . . , m
hi(x) = 0, i = 1, . . . , p
x ∈ Rn
f0 : Rn → R (objective or cost function)
fi : Rn → R (inequality constraint functionals)
hi : Rn → R (equality constraint functionals)
Terminology
x is feasible if x obeys the constraints
feasible set C: set of all feasible points
optimal value: p⋆ = inf{f0(x) : x ∈ C}
can be −∞; e.g. min log(x), x > 0
by convention, p⋆ = +∞ if C = ∅ (problem infeasible)
optimal solution: x⋆ s.t. f0(x⋆) = p⋆
there may be no optimal solution: e.g. min log(x), x > 0
optimal set: {x ∈ C : f0(x) = p⋆}
Convex optimization problem
Convex optimization problem in standard form
minimize f0 (x)
subject to fi(x) ≤ 0, i = 1, . . . , m
aiT x = bi, i = 1, . . . , p
f0 , f1 , . . . , fm convex
affine equality constraints Ax = b, A ∈ Rp×n
feasible set is convex
Abstract convex optimization problem
minimize f0 (x)
subject to x∈C
f0 convex
C convex
Why convexity?
A convex function has no local minimum that is not global
A convex set is connected and has feasible directions at any point
A convex function is continuous and has some differentiability properties
Convex functions arise prominently in duality
Cone programming I
LP
minimize cT x
subject to Fx + g ≥ 0
Ax = b
Nonlinear programming → nonlinear constraints
Express nonlinearity via generalized inequalities
Orderings of Rn and convex cones
K is a convex cone if
(i) K is convex
(ii) K is a cone (i.e. x ∈ K =⇒ λx ∈ K ∀λ ≥ 0)
K is pointed if
(iii) x ∈ K and − x ∈ K =⇒ x = 0
(K does not contain a straight line through the origin)
Example: K = {x ∈ Rn : x ≥ 0} is a pointed convex cone
Two additional properties of Rn+
(iv) Rn+ is closed
(v) Rn+ has a nonempty interior
Implication: ordering
a ⪰K b ⇐⇒ a − b ∈ K
(i)-(iii) ensure that this is a good ordering
1 reflexivity: a ⪰ a follows from 0 ∈ K
2 antisymmetry: a ⪰ b, b ⪰ a =⇒ a = b (since K is pointed)
3 transitivity: a ⪰ b, b ⪰ c =⇒ a ⪰ c (since K is convex and a cone)
→ compatibility with linear operations
a ⪰ b & λ ≥ 0 =⇒ λa ⪰ λb
a ⪰ b & c ⪰ d =⇒ a + c ⪰ b + d
Good properties of LPs come from these properties
4 closedness: ai ⪰ bi, ai → a, bi → b =⇒ a ⪰ b
5 nonempty interior allows us to define strict inequalities:
a ≻ b ⇐⇒ a − b ∈ int(K)
Examples of cones
Nonnegative orthant Rn+
{x ∈ Rn : x ≥ 0}
Second-order (or Lorentz or ice cream) cone
{x ∈ Rn+1 : √(x1² + . . . + xn²) ≤ xn+1}
Positive semidefinite cone
{X ∈ Sn : X ⪰ 0}
Cone programming II
minimize cT x
subject to Fx + g ⪰K 0
Ax = b
K = Rn+ =⇒ linear programming
Minimize linear functional over an affine slice of a cone
Very fruitful point of view
useful theory (duality)
useful algorithms (interior point methods)
Linear programming (LP)
minimize cT x
subject to Fx + g ≥ 0
Ax = b
Linear objective
Linear equality and inequality constraints
Feasible set is a polyhedron
(figure: level lines cT x = constant over the polyhedron; optimal vertex x*)
Many problems can be formulated as LP’s
Example: Chebyshev approximation
A ∈ Rm×n
b ∈ Rm
minimize ‖Ax − b‖∞ ⇐⇒ minimize maxi=1,...,m |aiT x − bi|
Different from the LS problem: minimize ‖Ax − b‖2
LP formulation (epigraph trick)
⇐⇒ minimize t
subject to |aiT x − bi| ≤ t ∀i
⇐⇒ minimize t
subject to −t ≤ aiT x − bi ≤ t ∀i
optimization variables (x, t) ∈ Rn+1
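For concreteness, a minimal numerical sketch of this epigraph formulation, using the cvxpy modeling package (not part of these notes; sizes and data are made up):

    import numpy as np
    import cvxpy as cp

    # made-up data for an overdetermined system
    rng = np.random.default_rng(0)
    m, n = 30, 10
    A = rng.standard_normal((m, n))
    b = rng.standard_normal(m)

    x = cp.Variable(n)
    t = cp.Variable()
    # epigraph trick: |a_i^T x - b_i| <= t, written as two affine inequalities
    prob = cp.Problem(cp.Minimize(t), [A @ x - b <= t, A @ x - b >= -t])
    prob.solve()
    # the optimal value equals the minimal l-infinity residual
    print(prob.value, np.linalg.norm(A @ x.value - b, np.inf))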
Example: basis pursuit
A ∈ Rm×n
b ∈ Rm
minimize ‖x‖1
subject to Ax = b
LP formulations:
(a) minimize Σi ti
subject to −ti ≤ xi ≤ ti
Ax = b
optimization variables (x, t) ∈ R2n
(b) minimize Σi (xi+ + xi−)
subject to A(x+ − x−) = b
x+, x− ≥ 0
optimization variables (x+, x−) ∈ R2n
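A sketch of formulation (a) in cvxpy (made-up underdetermined data; cp.norm(x, 1) would also work directly, the explicit variables t mirror the LP):

    import numpy as np
    import cvxpy as cp

    rng = np.random.default_rng(0)
    m, n = 10, 30                       # underdetermined: m < n
    A = rng.standard_normal((m, n))
    b = A @ rng.standard_normal(n)      # ensures Ax = b is feasible

    x = cp.Variable(n)
    t = cp.Variable(n)
    # formulation (a): t_i bounds |x_i|
    prob = cp.Problem(cp.Minimize(cp.sum(t)), [-t <= x, x <= t, A @ x == b])
    prob.solve()
    print(prob.value, np.abs(x.value).sum())   # the two should agree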
Second-order cone programming (SOCP)
minimize cT x
subject to ‖Fi x + gi‖2 ≤ ciT x + di, i = 1, . . . , m
Ax = b
‖Fi x + gi‖2 ≤ ciT x + di ⇐⇒ (Fi x + gi, ciT x + di) ∈ Li = {(yi, t) : ‖yi‖ ≤ t}
(hence the name)
SOCP ⇐⇒ minimize cT x
subject to (Fi x + gi, ciT x + di) ∈ Li, i = 1, . . . , m
Ax = b
affine mapping: Fx + g stacks the blocks (Fi x + gi, ciT x + di), i = 1, . . . , m
cone product: K = L1 × L2 × . . . × Lm
(Fi x + gi, ciT x + di) ∈ Li ∀i ⇐⇒ Fx + g ∈ K
∴ SOCP ⇐⇒ minimize cT x
subject to Fx + g ∈ K
Ax = b
this is a cone program
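For instance, a minimal cvxpy sketch of an explicit cone-membership constraint (made-up data): cp.SOC(t, y) encodes ‖y‖2 ≤ t, i.e. (y, t) ∈ L.

    import numpy as np
    import cvxpy as cp

    rng = np.random.default_rng(0)
    n = 5
    c = rng.standard_normal(n)
    x0 = rng.standard_normal(n)

    x = cp.Variable(n)
    t = cp.Variable()
    # minimize c^T x over the ball ||x - x0||_2 <= 1, written via cone membership
    prob = cp.Problem(cp.Minimize(c @ x), [cp.SOC(t, x - x0), t <= 1])
    prob.solve()
    # the optimum is x0 - c/||c|| (the ball constraint is active); check:
    print(np.linalg.norm(x.value - (x0 - c / np.linalg.norm(c))))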
Example: support vector machines
n pairs (xi , yi )
xi ∈ Rp : feature/explanatory variables
yi ∈ {−1, 1}: response/class label
Examples
xi : infrared blood absorption spectrum
yi : person is diabetic or not
SVM model: SVM as a penalized fitting procedure
minβ Σi=1,...,n [1 − yi f(xi)]+ + λ‖β‖2²   (hinge loss)
f(x) = xT β
(figure: the hinge loss [1 − yf(x)]+ as a function of yf(x))
sometimes f(x) = xT β + β0, with the same minimization
SVM: formulation as an SOCP
Variables (β, t) ∈ Rp+n
minimize Σi ti + λ‖β‖2², subject to [1 − yi f(xi)]+ ≤ ti
⇐⇒ minimize Σi ti + λ‖β‖2², subject to yi f(xi) ≥ 1 − ti, ti ≥ 0
this is an SOCP, since SOCP's are more general than QP's and QCQP's
Equivalence:
minimize Σi ti + λu
subject to ‖β‖2² ≤ u
yi f(xi) ≥ 1 − ti
ti ≥ 0
‖β‖2² ≤ u ⇐⇒ ‖β‖2² ≤ ((u+1)/2)² − ((u−1)/2)² ⇐⇒ ‖(β, (u−1)/2)‖2 ≤ (u+1)/2
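A cvxpy sketch of this SVM with synthetic data (λ is a made-up choice; the solver performs the second-order cone reformulation of the squared norm internally):

    import numpy as np
    import cvxpy as cp

    rng = np.random.default_rng(0)
    n, p = 60, 5
    X = rng.standard_normal((n, p))
    beta0 = rng.standard_normal(p)
    y = np.sign(X @ beta0 + 0.1 * rng.standard_normal(n))   # labels in {-1, 1}
    lam = 0.1                                               # made-up penalty level

    beta = cp.Variable(p)
    t = cp.Variable(n)
    # y_i f(x_i) >= 1 - t_i and t_i >= 0 encode the hinge-loss epigraph
    constraints = [cp.multiply(y, X @ beta) >= 1 - t, t >= 0]
    prob = cp.Problem(cp.Minimize(cp.sum(t) + lam * cp.sum_squares(beta)),
                      constraints)
    prob.solve()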
QP ⊂ SOCP ( =⇒ LP ⊂ SOCP)
QCQP
minimize (1/2) xT P0 x + q0T x + r0
subject to (1/2) xT Pi x + qiT x + ri ≤ 0
P0, Pi ⪰ 0
QCQP ⊂ SOCP
quadratic convex inequalities are SOCP-representable
Example: total-variation denoising
Observe
bij = fij + σ zij, 0 < i, j < n
f is the original image
b is a noisy version
Problem: recover the original image (denoise)
Min-TV solution
minimize ‖x‖TV
subject to ‖x − b‖2 ≤ δ
TV norm
‖x‖TV = Σij ‖Dij x‖2,   Dij x = (xi+1,j − xi,j , xi,j+1 − xi,j)
Formulation as an SOCP
minimize Σij tij
subject to ‖Dij x‖2 ≤ tij
‖x − b‖2 ≤ δ
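A cvxpy sketch with its built-in total-variation atom (synthetic image; δ is a rough, made-up noise budget):

    import numpy as np
    import cvxpy as cp

    rng = np.random.default_rng(0)
    n, sigma = 32, 0.1
    f = np.zeros((n, n)); f[8:24, 8:24] = 1.0       # piecewise-constant image
    b = f + sigma * rng.standard_normal((n, n))     # noisy observation
    delta = sigma * n                               # rough guess at ||sigma z||_F

    x = cp.Variable((n, n))
    # cp.tv(x) is sum_ij ||D_ij x||_2 (isotropic total variation)
    prob = cp.Problem(cp.Minimize(cp.tv(x)), [cp.norm(x - b, "fro") <= delta])
    prob.solve()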
Semidefinite programming (SDP)
minimize cT x
subject to F(x) = x1 F1 + . . . + xn Fn − F0 ⪰ 0
Fi ∈ S p (p × p symmetric matrices)
linear matrix inequality (LMI): F(x) ⪰ 0
multiple LMI’s can be combined into one:
Fi(x) ⪰ 0, i = 1, . . . , m ⇐⇒ diag(F1(x), . . . , Fm(x)) ⪰ 0
SOCP ⊂ SDP (but the converse is not true!)
(x, t) ∈ Rn+1 : ‖x‖ ≤ t ⇐⇒ [ tIn  x ; xT  t ] ⪰ 0
SOCP constraints are LMI’s
Hierarchy: LP ⊂ SOCP ⊂ SDP
Many nonlinear convex problems can be cast as SDP’s
Example: minimum-norm problem
minimize ‖A(x)‖
where A(x) = x1 A1 + . . . + xn An − B, with Ai ∈ Rp1×p2, is equivalent to
minimize t
subject to ‖A(x)‖ ≤ t
‖A(x)‖ ≤ t ⇐⇒ [ tIp1  A(x) ; A(x)T  tIp2 ] ⪰ 0
Why? Eigenvalues of this matrix are {t ± σi (A(x))}
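A cvxpy sketch of this LMI with made-up data; cp.bmat assembles the block matrix and >> 0 imposes positive semidefiniteness:

    import numpy as np
    import cvxpy as cp

    rng = np.random.default_rng(0)
    p1, p2, n = 4, 3, 2
    As = [rng.standard_normal((p1, p2)) for _ in range(n)]
    B = rng.standard_normal((p1, p2))

    x = cp.Variable(n)
    t = cp.Variable()
    Ax = sum(x[i] * As[i] for i in range(n)) - B
    M = cp.bmat([[t * np.eye(p1), Ax], [Ax.T, t * np.eye(p2)]])
    prob = cp.Problem(cp.Minimize(t), [M >> 0])
    prob.solve()
    # t at the optimum equals the spectral norm of A(x); check:
    print(prob.value,
          np.linalg.norm(sum(x.value[i] * As[i] for i in range(n)) - B, 2))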
Example: nuclear-norm minimization
minimize ‖X‖∗ = Σi σi(X)
subject to Xij = Bij, (i, j) ∈ Ω ⊂ [p1] × [p2]
This is an SDP (proof, later)
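Taking that reformulation on faith for now, a cvxpy sketch using its nuclear-norm atom on a synthetic low-rank completion instance (the 0/1 mask plays the role of Ω):

    import numpy as np
    import cvxpy as cp

    rng = np.random.default_rng(0)
    p1, p2, r = 12, 10, 2
    B = rng.standard_normal((p1, r)) @ rng.standard_normal((r, p2))  # rank r
    mask = (rng.random((p1, p2)) < 0.6).astype(float)                # observed set

    X = cp.Variable((p1, p2))
    # agree with B on the observed entries, minimize the nuclear norm
    constraints = [cp.multiply(mask, X) == cp.multiply(mask, B)]
    prob = cp.Problem(cp.Minimize(cp.normNuc(X)), constraints)
    prob.solve()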
Stability analysis for dynamical systems
Linear system
dv/dt = v̇(t) = Q v(t),   Q ∈ Rn×n
Main question: is this system stable? i.e. do all trajectories tend to zero as
t → ∞?
Simple sufficient condition: existence of a quadratic Lyapunov function
(i) L(v) = vT X v,  X ≻ 0
(ii) L̇ = (d/dt) L(v(t)) ≤ −α L(v(t)) (α > 0) for any trajectory
This condition gives L(v(t)) = vT(t) X v(t) ≤ exp(−αt) L(v(0)) (Gronwall's
inequality), whence
X ≻ 0 =⇒ v(t) → 0 as t → ∞
Existence of X ≻ 0 and α > 0 provides a certificate of stability
dv/dt = v̇(t) = Q v(t),   L(v) = vT X v,  X ≻ 0
L̇ = (d/dt) vT(t) X v(t) = v̇T X v + vT X v̇ = vT (QT X + XQ) v
i.e. L̇ ≤ −αL ⇐⇒ vT (QT X + XQ + αX) v < 0  ∀v ≠ 0
⇐⇒ QT X + XQ + αX ≺ 0
Conclusion: to certify stability, it suffices to find X obeying
X ≻ 0,  QT X + XQ ≺ 0
If the optimal value of the SDP
minimize t
subject to [ X + tI  0 ; 0  −(QT X + XQ) + tI ] ⪰ 0
is negative, then the system is stable
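A cvxpy sketch of this certificate search (random Q shifted to be stable with high probability; the normalization ‖X‖F ≤ 1 is an added assumption to rule out rescaling X, without which the SDP is unbounded whenever it is strictly feasible):

    import numpy as np
    import cvxpy as cp

    rng = np.random.default_rng(0)
    n = 4
    Q = rng.standard_normal((n, n)) - 4.0 * np.eye(n)  # likely stable by the shift

    X = cp.Variable((n, n), symmetric=True)
    t = cp.Variable()
    constraints = [X + t * np.eye(n) >> 0,
                   -(Q.T @ X + X @ Q) + t * np.eye(n) >> 0,
                   cp.norm(X, "fro") <= 1]  # normalization (added, not in the slide)
    prob = cp.Problem(cp.Minimize(t), constraints)
    prob.solve()
    print("certificate of stability:", prob.value < 0)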
Extension
v̇(t) = Q(t) v(t)
Q(t) ∈ conv{Q1, . . . , Qk}  time-varying
L(v) = vT X v (X ≻ 0) s.t. L̇ ≤ −αL =⇒ stability
Similar calculations show that for all v
vT (Q(t)T X + X Q(t) + αX) v ≤ 0
⇐⇒ Q(t)T X + X Q(t) + αX ≺ 0,  ∀Q(t) ∈ conv{Q1, . . . , Qk}
⇐⇒ QiT X + X Qi + αX ≺ 0,  ∀i = 1, . . . , k
If we can find X such that
X ≻ 0  &  QiT X + X Qi ≺ 0  ∀i = 1, . . . , k
then we have stability
This is an SDP!
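A cvxpy sketch of the corresponding feasibility problem (random vertices Qi, shifted to be stable with high probability; the unit margins I replace the strict inequalities, a standard homogenization and an assumption here):

    import numpy as np
    import cvxpy as cp

    rng = np.random.default_rng(0)
    n, k = 4, 3
    Qs = [rng.standard_normal((n, n)) - 4.0 * np.eye(n) for _ in range(k)]

    X = cp.Variable((n, n), symmetric=True)
    # X >= I and Qi^T X + X Qi <= -I at every vertex of the polytope
    constraints = [X >> np.eye(n)]
    constraints += [Q.T @ X + X @ Q << -np.eye(n) for Q in Qs]
    prob = cp.Problem(cp.Minimize(0), constraints)
    prob.solve()
    print("common Lyapunov certificate found:", prob.status == cp.OPTIMAL)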