Convex Optimization
Stephen Boyd and Lieven Vandenberghe
Revised slides by Stephen Boyd, Lieven Vandenberghe, and Parth Nobel
3. Convex functions
Outline
Convex functions
Operations that preserve convexity
Constructive convex analysis
Perspective and conjugate
Quasiconvexity
Definition
▶ f : Rn → R is convex if dom f is a convex set and for all x, y ∈ dom f , 0 ≤ 𝜃 ≤ 1,
f (𝜃x + (1 − 𝜃)y) ≤ 𝜃f (x) + (1 − 𝜃)f (y)
(figure: a convex function; the chord from (x, f (x)) to (y, f (y)) lies above the graph)
▶ f is concave if −f is convex
▶ f is strictly convex if dom f is convex and for x, y ∈ dom f , x ≠ y, 0 < 𝜃 < 1,
f (𝜃x + (1 − 𝜃)y) < 𝜃f (x) + (1 − 𝜃)f (y)
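a quick numeric spot-check of the definition (a sketch, not part of the original slides; f (x) = x^4 is just an arbitrary convex test function):

import numpy as np

# check f(theta*x + (1-theta)*y) <= theta*f(x) + (1-theta)*f(y) at random points
f = lambda x: x**4
rng = np.random.default_rng(0)
for _ in range(1000):
    x, y = rng.normal(size=2)
    theta = rng.uniform()
    assert f(theta*x + (1 - theta)*y) <= theta*f(x) + (1 - theta)*f(y) + 1e-9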
Examples on R
convex functions:
▶ affine: ax + b on R, for any a, b ∈ R
▶ exponential: eax , for any a ∈ R
▶ powers: x 𝛼 on R++ , for 𝛼 ≥ 1 or 𝛼 ≤ 0
▶ powers of absolute value: |x| p on R, for p ≥ 1
▶ positive part (relu): max{0, x}
concave functions:
▶ affine: ax + b on R, for any a, b ∈ R
▶ powers: x 𝛼 on R++ , for 0 ≤ 𝛼 ≤ 1
▶ logarithm: log x on R++
▶ entropy: −x log x on R++
▶ negative part: min{0, x}
Examples on Rn
convex functions:
▶ affine functions: f (x) = aT x + b
▶ any norm, e.g., the ℓp norms
– ∥x∥_p = (|x1|^p + · · · + |xn|^p)^{1/p} for p ≥ 1
– ∥x∥_∞ = max{|x1 |, . . . , |xn |}
▶ sum of squares: ∥x∥_2^2 = x1^2 + · · · + xn^2
▶ max function: max(x) = max{x1 , x2 , . . . , xn }
▶ softmax or log-sum-exp function: log(exp x1 + · · · + exp xn )
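evaluating log-sum-exp naively overflows for large entries; the standard fix shifts by the maximum, using log ∑_k exp xk = m + log ∑_k exp(xk − m) for any m (a sketch; scipy.special.logsumexp implements the same idea):

import numpy as np

def log_sum_exp(x):
    # log(exp x1 + ... + exp xn), shifted by max(x) to avoid overflow
    m = np.max(x)
    return m + np.log(np.sum(np.exp(x - m)))

print(log_sum_exp(np.array([1000.0, 1000.0])))   # 1000.693..., no overflow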
Examples on Rm×n
▶ X ∈ Rm×n (m × n matrices) is the variable
▶ general affine function has form
f (X) = tr(AT X) + b = ∑_{i=1}^m ∑_{j=1}^n Aij Xij + b
for some A ∈ Rm×n , b ∈ R
▶ spectral norm (maximum singular value) is convex
f (X) = ∥X∥_2 = 𝜎max (X) = (𝜆max (XT X))^{1/2}
▶ log-determinant: for X ∈ Sn++ , f (X) = log det X is concave
Extended-value extension
▶ suppose f is convex on Rn , with domain dom f
▶ its extended-value extension f̃ is the function f̃ : Rn → R ∪ {∞} with
f̃ (x) = f (x) for x ∈ dom f ,   f̃ (x) = ∞ for x ∉ dom f
▶ often simplifies notation; for example, the condition
0 ≤ 𝜃 ≤ 1 =⇒ f̃ (𝜃x + (1 − 𝜃)y) ≤ 𝜃 f̃ (x) + (1 − 𝜃) f̃ (y)
(as an inequality in R ∪ {∞}), means the same as the two conditions
– dom f is convex
– x, y ∈ dom f , 0 ≤ 𝜃 ≤ 1 =⇒ f (𝜃x + (1 − 𝜃)y) ≤ 𝜃f (x) + (1 − 𝜃)f (y)
Restriction of a convex function to a line
▶ f : Rn → R is convex if and only if the function g : R → R,
g(t) = f (x + tv), dom g = {t | x + tv ∈ dom f }
is convex (in t) for any x ∈ dom f , v ∈ Rn
▶ can check convexity of f by checking convexity of functions of one variable
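this suggests a cheap numeric test: sample random lines and look for a chord below the graph; finding one disproves convexity, finding none proves nothing (a sketch):

import numpy as np

def violates_convexity_on_lines(f, n, trials=1000, seed=0):
    # check midpoint convexity of g(t) = f(x + t*v) along random lines
    rng = np.random.default_rng(seed)
    for _ in range(trials):
        x, v = rng.normal(size=n), rng.normal(size=n)
        t1, t2 = rng.normal(size=2)
        g = lambda t: f(x + t*v)
        if g((t1 + t2)/2) > (g(t1) + g(t2))/2 + 1e-9:
            return True    # found a violating chord
    return False

print(violates_convexity_on_lines(lambda z: np.log(np.sum(np.exp(z))), 5))  # False
print(violates_convexity_on_lines(lambda z: -np.sum(z**2), 5))              # True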
Example
▶ f : Sn → R with f (X) = log det X , dom f = Sn++
▶ consider line in Sn given by X + tV , X ∈ Sn++ , V ∈ Sn , t ∈ R
g(t) = log det(X + tV)
= log det(X^{1/2} (I + tX^{−1/2}VX^{−1/2}) X^{1/2})
= log det X + log det(I + tX^{−1/2}VX^{−1/2})
= log det X + ∑_{i=1}^n log(1 + t𝜆i )
where 𝜆i are the eigenvalues of X^{−1/2}VX^{−1/2}
▶ g is concave in t (for any choice of X ∈ Sn++ , V ∈ Sn ); hence f is concave
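the eigenvalue identity for g(t) is easy to confirm numerically (a sketch; a Cholesky factor stands in for X^{1/2}, which preserves the eigenvalues):

import numpy as np

rng = np.random.default_rng(0)
n = 4
G = rng.normal(size=(n, n)); X = G @ G.T + n*np.eye(n)   # X in S^n_++
W = rng.normal(size=(n, n)); V = (W + W.T)/2             # V in S^n
t = 0.1

L = np.linalg.cholesky(X)   # X = L L^T
# L^{-1} V L^{-T} is orthogonally similar to X^{-1/2} V X^{-1/2}
lam = np.linalg.eigvalsh(np.linalg.inv(L) @ V @ np.linalg.inv(L).T)
lhs = np.linalg.slogdet(X + t*V)[1]
rhs = np.linalg.slogdet(X)[1] + np.sum(np.log(1 + t*lam))
print(np.isclose(lhs, rhs))   # True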
First-order condition
▶ f is differentiable if dom f is open and the gradient
∇f (x) = (𝜕f (x)/𝜕x1 , 𝜕f (x)/𝜕x2 , . . . , 𝜕f (x)/𝜕xn ) ∈ Rn
exists at each x ∈ dom f
▶ 1st-order condition: differentiable f with convex domain is convex if and only if
f (y) ≥ f (x) + ∇f (x) T (y − x) for all x, y ∈ dom f
▶ first order Taylor approximation of convex f is a global underestimator of f
(figure: graph of f and its tangent line f (x) + ∇f (x)T (y − x) at (x, f (x)), lying below the graph)
Second-order conditions
▶ f is twice differentiable if dom f is open and the Hessian ∇2 f (x) ∈ Sn ,
∇2 f (x)ij = 𝜕^2 f (x)/𝜕xi 𝜕xj ,   i, j = 1, . . . , n,
exists at each x ∈ dom f
▶ 2nd-order conditions: for twice differentiable f with convex domain
– f is convex if and only if ∇2 f (x) ⪰ 0 for all x ∈ dom f
– if ∇2 f (x) ≻ 0 for all x ∈ dom f , then f is strictly convex
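the PSD condition can be spot-checked numerically at sample points with a finite-difference Hessian (a sketch; quadratic-over-linear, from the next slide, is the test function):

import numpy as np

def hessian_fd(f, x, h=1e-5):
    # central-difference estimate of the Hessian of f at x
    n = x.size
    I, H = np.eye(n), np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            H[i, j] = (f(x + h*I[i] + h*I[j]) - f(x + h*I[i] - h*I[j])
                       - f(x - h*I[i] + h*I[j]) + f(x - h*I[i] - h*I[j])) / (4*h*h)
    return H

f = lambda z: z[0]**2 / z[1]    # quadratic-over-linear, domain z[1] > 0
z0 = np.array([1.3, 0.7])
print(np.linalg.eigvalsh(hessian_fd(f, z0)).min() >= -1e-4)   # True: PSD here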
Examples
▶ quadratic function: f (x) = (1/2)xT Px + qT x + r (with P ∈ Sn )
∇f (x) = Px + q, ∇2 f (x) = P
convex if P ⪰ 0 (concave if P ⪯ 0)
▶ least-squares objective: f (x) = ∥Ax − b∥_2^2
∇f (x) = 2AT (Ax − b), ∇2 f (x) = 2AT A
convex (for any A)
▶ quadratic-over-linear: f (x, y) = x^2/y, y > 0
∇2 f (x, y) = (2/y^3) (y, −x)(y, −x)T ⪰ 0
convex for y > 0
(figure: graph of f (x, y) = x^2/y)
More examples
▶ log-sum-exp: f (x) = log ∑_{k=1}^n exp xk is convex
∇2 f (x) = (1/(1T z)) diag(z) − (1/(1T z)^2) zzT   (zk = exp xk )
▶ to show ∇2 f (x) ⪰ 0, we must verify that vT ∇2 f (x)v ≥ 0 for all v:
vT ∇2 f (x)v = ((∑_k zk vk^2)(∑_k zk ) − (∑_k vk zk )^2) / (∑_k zk )^2 ≥ 0
since (∑_k vk zk )^2 ≤ (∑_k zk vk^2)(∑_k zk ) (from the Cauchy–Schwarz inequality)
▶ geometric mean: f (x) = (∏_{k=1}^n xk )^{1/n} on Rn++ is concave (similar proof as above)
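the log-sum-exp Hessian formula can be checked directly for positive semidefiniteness (a sketch):

import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=6)
z = np.exp(x)
s = z.sum()                                   # 1^T z
H = np.diag(z)/s - np.outer(z, z)/s**2        # Hessian of log-sum-exp at x
print(np.linalg.eigvalsh(H).min() >= -1e-12)  # True: H is PSD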
Epigraph and sublevel set
▶ 𝛼-sublevel set of f : Rn → R is C 𝛼 = {x ∈ dom f | f (x) ≤ 𝛼}
▶ sublevel sets of convex functions are convex sets (but converse is false)
▶ epigraph of f : Rn → R is epi f = {(x, t) ∈ Rn+1 | x ∈ dom f , f (x) ≤ t}
(figure: the epigraph epi f , the region on and above the graph of f )
▶ f is convex if and only if epi f is a convex set
Jensen’s inequality
▶ basic inequality: if f is convex, then for x, y ∈ dom f , 0 ≤ 𝜃 ≤ 1,
f (𝜃x + (1 − 𝜃)y) ≤ 𝜃f (x) + (1 − 𝜃)f (y)
▶ extension: if f is convex and z is a random variable on dom f ,
f (E z) ≤ E f (z)
▶ basic inequality is special case with discrete distribution
prob(z = x) = 𝜃, prob(z = y) = 1 − 𝜃
Example: log-normal random variable
▶ suppose X ∼ N (𝜇, 𝜎^2)
▶ with f (u) = exp u, Y = f (X) is log-normal
▶ we have E f (X) = exp(𝜇 + 𝜎^2/2)
▶ Jensen’s inequality is
f (E X) = exp 𝜇 ≤ E f (X) = exp(𝜇 + 𝜎^2/2)
which indeed holds since exp(𝜎^2/2) > 1
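a Monte Carlo sketch confirming both the closed form and the Jensen gap:

import numpy as np

rng = np.random.default_rng(0)
mu, sigma = 0.5, 1.2
X = rng.normal(mu, sigma, size=10**6)
print(np.exp(X).mean())                 # close to exp(mu + sigma^2/2) = 3.387...
print(np.exp(mu) <= np.exp(X).mean())   # True: f(E X) <= E f(X)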
Example: log-normal random variable
(figure: densities p(X) and p(f (X)), with E X, f (E X), and E f (X) marked; f (E X) ≤ E f (X))
Outline
Convex functions
Operations that preserve convexity
Constructive convex analysis
Perspective and conjugate
Quasiconvexity
Showing a function is convex
methods for establishing convexity of a function f
1. verify definition (often simplified by restricting to a line)
2. for twice differentiable functions, show ∇2 f (x) ⪰ 0
– recommended only for very simple functions
3. show that f is obtained from simple convex functions by operations that preserve convexity
– nonnegative weighted sum
– composition with affine function
– pointwise maximum and supremum
– composition
– minimization
– perspective
you’ll mostly use methods 2 and 3
Nonnegative scaling, sum, and integral
▶ nonnegative multiple: 𝛼f is convex if f is convex, 𝛼 ≥ 0
▶ sum: f1 + f2 convex if f1 , f2 convex
▶ infinite sum: if f1 , f2 , . . . are convex, then the infinite sum ∑_{i=1}^∞ fi is convex
▶ integral: if f (x, 𝛼) is convex in x for each 𝛼 ∈ A, then ∫_{𝛼∈A} f (x, 𝛼) d𝛼 is convex
▶ there are analogous rules for concave functions
Composition with affine function
(pre-)composition with affine function: f (Ax + b) is convex if f is convex
examples
▶ log barrier for linear inequalities
f (x) = −∑_{i=1}^m log(bi − aiT x),   dom f = {x | aiT x < bi , i = 1, . . . , m}
▶ norm approximation error: f (x) = ∥Ax − b∥ (any norm)
Pointwise maximum
if f1 , . . . , fm are convex, then f (x) = max{f1 (x), . . . , fm (x)} is convex
examples
▶ piecewise-linear function: f (x) = max_{i=1,...,m} (aiT x + bi )
▶ sum of r largest components of x ∈ Rn :
f (x) = x [1] + x [2] + · · · + x [r]
(x [i] is ith largest component of x)
proof: f (x) = max{xi1 + xi2 + · · · + xir | 1 ≤ i1 < i2 < · · · < ir ≤ n}
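this atom is available directly in CVXPY (a sketch):

import cvxpy as cp
import numpy as np

x = cp.Variable(5)
f = cp.sum_largest(x, 3)          # x[1] + x[2] + x[3], sum of the 3 largest entries
print(f.is_convex())              # True
v = np.array([4.0, -1.0, 7.0, 2.0, 0.0])
print(np.sort(v)[-3:].sum())      # 13.0 = 7 + 4 + 2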
Pointwise supremum
if f (x, y) is convex in x for each y ∈ A, then g(x) = sup_{y∈A} f (x, y) is convex
examples
▶ distance to farthest point in a set C: f (x) = sup_{y∈C} ∥x − y∥
▶ maximum eigenvalue of symmetric matrix: for X ∈ Sn , 𝜆max (X) = sup_{∥y∥_2=1} yT Xy is convex
▶ support function of a set C: SC (x) = sup_{y∈C} yT x is convex
Partial minimization
▶ the function g(x) = inf_{y∈C} f (x, y) is called the partial minimization of f (w.r.t. y)
▶ if f (x, y) is convex in (x, y) and C is a convex set, then the partial minimization g is convex
examples
▶ f (x, y) = xT Ax + 2xT By + yT Cy with
[ A   B ]
[ BT  C ] ⪰ 0,   C ≻ 0
minimizing over y gives g(x) = inf_y f (x, y) = xT (A − BC^{−1}BT )x
g is convex, hence the Schur complement A − BC^{−1}BT ⪰ 0
▶ distance to a set: dist(x, S) = inf_{y∈S} ∥x − y∥ is convex if S is convex
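a numeric sanity check of the Schur-complement conclusion (a sketch, with a random positive definite block matrix):

import numpy as np

rng = np.random.default_rng(0)
n = 3
M = rng.normal(size=(2*n, 2*n))
M = M @ M.T + 1e-3*np.eye(2*n)                  # [[A, B], [B^T, C]] positive definite
A, B, C = M[:n, :n], M[:n, n:], M[n:, n:]
S = A - B @ np.linalg.inv(C) @ B.T              # Schur complement
print(np.linalg.eigvalsh(S).min() >= -1e-9)     # True: S is PSD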
Composition with scalar functions
▶ composition of g : Rn → R and h : R → R is f (x) = h(g(x)) (written as f = h ◦ g)
▶ composition f is convex if
– g convex, h convex, h̃ nondecreasing
– or g concave, h convex, h̃ nonincreasing
(monotonicity must hold for extended-value extension h̃)
▶ proof (for n = 1, differentiable g, h)
f ′′ (x) = h′′ (g(x)) g′ (x)^2 + h′ (g(x)) g′′ (x)
examples
▶ f (x) = exp g(x) is convex if g is convex
▶ f (x) = 1/g(x) is convex if g is concave and positive
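both examples are one-liners in CVXPY, whose curvature analysis applies exactly these rules (a sketch):

import cvxpy as cp

x = cp.Variable()
print(cp.exp(cp.square(x)).is_convex())     # True: exp of a convex expression
print(cp.inv_pos(cp.sqrt(x)).is_convex())   # True: 1/g with g concave and positive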
General composition rule
▶ composition of g : Rn → Rk and h : Rk → R is f (x) = h(g(x)) = h(g1 (x), g2 (x), . . . , gk (x))
▶ f is convex if h is convex and for each i one of the following holds
– gi convex, h̃ nondecreasing in its ith argument
– gi concave, h̃ nonincreasing in its ith argument
– gi affine
▶ you will use this composition rule constantly throughout this course
▶ you need to commit this rule to memory
Examples
▶ log ∑_{i=1}^m exp gi (x) is convex if gi are convex
▶ f (x) = p(x)^2/q(x) is convex if
– p is nonnegative and convex
– q is positive and concave
▶ composition rule subsumes others, e.g.,
– 𝛼f is convex if f is, and 𝛼 ≥ 0
– sum of convex (concave) functions is convex (concave)
– max of convex functions is convex
– min of concave functions is concave
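a sketch of the general rule at work in CVXPY, including sign-dependent monotonicity (cp.quad_over_lin(p, q) is p^2/q for scalar p):

import cvxpy as cp

x, y = cp.Variable(), cp.Variable()
f1 = cp.log_sum_exp(cp.hstack([cp.square(x), cp.abs(y)]))
print(f1.is_convex())   # True: h convex, nondecreasing per argument; gi convex
f2 = cp.quad_over_lin(cp.abs(x), 1 - cp.abs(y))
print(f2.is_convex())   # True: p = |x| convex and nonnegative, q = 1 - |y| concave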
Outline
Convex functions
Operations that preserve convexity
Constructive convex analysis
Perspective and conjugate
Quasiconvexity
Constructive convexity verification
▶ start with function f given as expression
▶ build parse tree for expression
– leaves are variables or constants
– nodes are functions of child expressions
▶ use composition rule to tag subexpressions as convex, concave, affine, or none
▶ if root node is labeled convex (concave), then f is convex (concave)
▶ extension: tag sign of each expression, and use sign-dependent monotonicity
▶ this is sufficient to show f is convex (concave), but not necessary
▶ this method for checking convexity (concavity) is readily automated
Example
the function
f (x, y) = (x − y)^2 / (1 − max(x, y)),   x < 1,  y < 1
is convex
constructive analysis:
▶ (leaves) x, y, and 1 are affine
▶ max(x, y) is convex; x − y is affine
▶ 1 − max(x, y) is concave
▶ the function u^2/v is convex, and monotone decreasing in v for v > 0
▶ f is the composition of u^2/v with u = x − y, v = 1 − max(x, y), hence convex
Example (from dcp.stanford.edu)
(figure: the DCP analyzer’s parse tree for an expression, with each subexpression tagged by curvature and sign)
Disciplined convex programming
in disciplined convex programming (DCP) users construct convex and concave functions as
expressions using constructive convex analysis
▶ expressions formed from
– variables,
– constants,
– and atomic functions from a library
▶ atomic functions have known convexity, monotonicity, and sign properties
▶ all subexpressions match general composition rule
▶ a valid DCP function is
– convex-by-construction
– ‘syntactically’ convex (can be checked ‘locally’)
▶ convexity depends only on attributes of atomic functions, not their meanings
– e.g., could swap √(·) and ∜(·), or exp(·) and (·)+ , since their attributes match
CVXPY example
(x − y)^2 / (1 − max(x, y)),   x < 1,  y < 1
import cvxpy as cp
x = cp.Variable()
y = cp.Variable()
expr = cp.quad_over_lin(x - y, 1 - cp.maximum(x, y))
expr.curvature # Convex
expr.sign # Positive
expr.is_dcp() # True
(atom quad_over_lin(u,v) includes domain constraint v>0)
DCP is only sufficient
▶ consider the convex function f (x) = √(1 + x^2)
▶ expression f1 = cp.sqrt(1+cp.square(x)) is not DCP
▶ expression f2 = cp.norm(cp.hstack([1, x]), 2) is DCP
▶ CVXPY will not recognize f1 as convex, even though it represents a convex function
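a runnable sketch of the same comparison:

import cvxpy as cp

x = cp.Variable()
f1 = cp.sqrt(1 + cp.square(x))          # sqrt (concave) applied to a convex argument
print(f1.is_dcp())                      # False: violates the composition rule
f2 = cp.norm(cp.hstack([1, x]), 2)      # the same function, written as a norm
print(f2.is_dcp())                      # True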
Outline
Convex functions
Operations that preserve convexity
Constructive convex analysis
Perspective and conjugate
Quasiconvexity
Perspective
▶ the perspective of a function f : Rn → R is the function g : Rn × R → R,
g(x, t) = tf (x/t), dom g = {(x, t) | x/t ∈ dom f , t > 0}
▶ g is convex if f is convex
examples
▶ f (x) = xT x is convex; so g(x, t) = xT x/t is convex for t > 0
▶ f (x) = − log x is convex; so relative entropy g(x, t) = t log t − t log x is convex on R2++
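the relative-entropy example is available in CVXPY as the atom cp.rel_entr(t, x) = t log(t/x) (a sketch):

import cvxpy as cp

x, t = cp.Variable(pos=True), cp.Variable(pos=True)
g = cp.rel_entr(t, x)      # t*log t - t*log x, the perspective of -log
print(g.is_convex())       # True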
Conjugate function
▶ the conjugate of a function f is f ∗ (y) = sup_{x∈dom f} (yT x − f (x))
(figure: f (x) and the line xy; the maximum of yx − f (x) over x is f ∗ (y), and the supporting line with slope y has vertical intercept (0, −f ∗ (y)))
▶ f ∗ is convex (even if f is not)
▶ will be useful in chapter 5
Examples
▶ negative logarithm f (x) = − log x
f ∗ (y) = sup_{x>0} (xy + log x) = −1 − log(−y) for y < 0, and ∞ otherwise
▶ strictly convex quadratic, f (x) = (1/2)xT Qx with Q ∈ Sn++
f ∗ (y) = sup_x (yT x − (1/2)xT Qx) = (1/2) yT Q^{−1}y
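a numeric check of the quadratic conjugate (a sketch; the sup is computed with scipy.optimize.minimize on the negated objective):

import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
n = 3
G = rng.normal(size=(n, n)); Q = G @ G.T + np.eye(n)   # Q in S^n_++
y = rng.normal(size=n)

res = minimize(lambda x: 0.5 * x @ Q @ x - y @ x, np.zeros(n))
print(-res.fun)                            # numeric value of f*(y)
print(0.5 * y @ np.linalg.solve(Q, y))     # closed form (1/2) y^T Q^{-1} y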
Outline
Convex functions
Operations that preserve convexity
Constructive convex analysis
Perspective and conjugate
Quasiconvexity
Quasiconvex functions
▶ f : Rn → R is quasiconvex if dom f is convex and the sublevel sets
S 𝛼 = {x ∈ dom f | f (x) ≤ 𝛼}
are convex for all 𝛼
(figure: a quasiconvex function on R; sublevel sets are intervals, with endpoints marked a, b, c)
▶ f is quasiconcave if −f is quasiconvex
▶ f is quasilinear if it is quasiconvex and quasiconcave
Examples
▶ √|x| is quasiconvex on R
▶ ceil(x) = inf{z ∈ Z | z ≥ x} is quasilinear
▶ log x is quasilinear on R++
▶ f (x1 , x2 ) = x1 x2 is quasiconcave on R2++
▶ linear-fractional function
f (x) = (aT x + b)/(cT x + d),   dom f = {x | cT x + d > 0}
is quasilinear
Example: Internal rate of return
▶ cash flow x = (x0 , . . . , xn ) ; xi is payment in period i (to us if xi > 0)
▶ we assume x0 < 0 (i.e., an initial investment) and x0 + x1 + · · · + xn > 0
▶ net present value (NPV) of cash flow x, for interest rate r, is PV(x, r) = ∑_{i=0}^n (1 + r)^{−i} xi
▶ internal rate of return (IRR) is the smallest interest rate for which PV(x, r) = 0:
IRR(x) = inf{r ≥ 0 | PV(x, r) = 0}
▶ IRR is quasiconcave: superlevel set is intersection of open halfspaces
IRR(x) ≥ R ⇐⇒ ∑_{i=0}^n (1 + r)^{−i} xi > 0 for 0 ≤ r < R
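for a fixed cash flow with one sign change (x0 < 0, then nonnegative payments), PV(x, r) is decreasing in r, so IRR can be computed by bisection (a sketch; irr is a hypothetical helper, not part of the slides):

def irr(x, lo=0.0, hi=10.0, tol=1e-10):
    # bisection on r: PV > 0 at lo and PV < 0 at hi bracket the unique root
    pv = lambda r: sum(xi / (1 + r)**i for i, xi in enumerate(x))
    while hi - lo > tol:
        mid = (lo + hi) / 2
        lo, hi = (mid, hi) if pv(mid) > 0 else (lo, mid)
    return lo

print(irr([-100.0, 40.0, 40.0, 40.0]))   # ~0.0970, a 9.7% internal rate of return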
Properties of quasiconvex functions
▶ modified Jensen inequality: for quasiconvex f
0 ≤ 𝜃 ≤ 1 =⇒ f (𝜃x + (1 − 𝜃)y) ≤ max{f (x), f (y)}
▶ first-order condition: differentiable f with convex domain is quasiconvex if and only if
f (y) ≤ f (x) =⇒ ∇f (x) T (y − x) ≤ 0
(figure: a sublevel set of f with the gradient ∇f (x) at a boundary point x, normal to a supporting hyperplane)
▶ sum of quasiconvex functions is not necessarily quasiconvex