Convex functions
Definition
f : R n → R is convex if dom f is a convex set and
f (θx + (1 − θ)y ) ≤ θf (x) + (1 − θ)f (y )
for all x, y ∈ dom f , and θ ∈ [0, 1].
Convex functions
Definition
f : R n → R is convex if dom f is a convex set and
f (θx + (1 − θ)y ) ≤ θf (x) + (1 − θ)f (y )
for all x, y ∈ dom f , and θ ∈ [0, 1].
◮ f is concave if −f is convex
◮ f is strictly convex if dom f is convex and
f (θx + (1 − θ)y ) < θf (x) + (1 − θ)f (y )
for all x, y ∈ dom f , x 6= y , θ ∈ (0, 1).
Convex function: examples on R
◮ affine: ax + b
◮ exponential: e αx , for any α ∈ R
◮ powers:
◮ x a on R++ , for a ≥ 1 or a ≤ 0
◮ −x a on R++ , for 0 ≤ a ≤ 1
◮ powers of absolute value: |x|p on R, for p ≥ 1
◮ negative logarithm: − log x on R++
◮ x log x on R++
Convex function: affine functions
Affine functions are both convex and concave
◮ affine function in R n :
f (x) = aT x + b
◮ affine function in R m×n :
m X
X n
f (X ) = tr(AT X ) + b = hA, X i + b = Aij Xij + b
i =1 j=1
Convex function: norms
All norms are convex
◮ norms in R n : !1/p
n
X
p
kxkp = |xi |
i =1
for all p ≥ 1
◮ norms in R m×n :
◮ Frobenius norm: kX kF = hX , X i1/2
◮ spectral norm: kX k2 = σmax (X ) = (λmax (X T X ))1/2
Operator norms
Let k · ka and k · kb be norms on R m and R n , respectively. The operator
norm of X ∈ R m×n , induced by k · ka and k · kb , is defined to be
kX ka,b = sup {kXuka | kukb ≤ 1}
◮ Spectral norm (ℓ2 -norm):
kX k2 = kX k2,2 = σmax (X ) = (λmax (X T X ))1/2
◮ Max-row-sum norm:
n
X
kX k∞ = kX k∞,∞ = max |Xij |
i =1,··· ,m
j=1
◮ Max-column-sum norm:
m
X
kX k1 = kX k1,1 = max |Xij |
j=1,··· ,n
i =1
Dual norm
Let k · k be a norm on R n . The associated dual norm, denoted k · k∗ , is
defined as
kzk∗ = sup {z T x | kxk ≤ 1}.
◮ z T x ≤ kxk kzk∗ for all x, z ∈ R n
◮ The dual of the ℓp -norm is the ℓq -norm, where 1/p + 1/q = 1
◮ The dual of the ℓ2 -norm on R m×n is the nuclear norm,
kZ k2∗ = sup {tr(Z T X ) | kX k2 ≤ 1}
= σ1 (Z ) + · · · + σr (Z ) = tr(Z T Z )1/2 ,
where r = rank Z .
Restriction of a convex function to a line
f : R n → R is convex iff g : R → R,
g (t) = f (x + tv ) dom g = {t | x + tv ∈ dom f }
is convex for any x ∈ dom f , v ∈ R n
So we can check the convexity of a function with multiple variables by
checking the convexity of functions of one variable
Restriction of a convex function to a line: example
n
Show that f : S++ → R with f (X ) = log det X is concave.
Proof.
Define g : R → R, g (t) = f (X + tV ) with dom g = {t | X + tV ≻ 0},
for any X ≻ 0 and V ∈ S n .
g (t) = log det(X + tV )
= log det(X 1/2 (I + tX −1/2 VX −1/2 )X 1/2 )
X n
= log(1 + tλi ) + log det X
i =1
where λ1 , · · · , λn are the eigenvalues of X −1/2 VX −1/2 . Hence g is
concave for any X ≻ 0 and V ∈ S n , so is f .
Extended-value extensions
Definition
If f : R n → R is convex, we define its extended-value extension
f˜ : R n → R ∪ {∞} by
˜ f (x) x ∈ dom f
f (x) =
∞ x∈ / dom f
By this notation, the condition
f˜(θx + (1 − θ)y ) ≤ θf˜(x) + (1 − θ)f˜(y ) ∀θ ∈ [0, 1]
is equivalent to the two conditions:
◮ dom f is convex
◮ f (θx + (1 − θ)y ) ≤ θf (x) + (1 − θ)f (y ) ∀θ ∈ [0, 1]
First-order conditions
Theorem (first-order condition)
If f : R n → R is differentiable, then f is convex if and only if dom f is
convex and
f (y ) ≥ f (x) + ∇f (x)T (y − x), ∀x, y ∈ dom f
First-order conditions
Theorem (first-order condition)
If f : R n → R is differentiable, then f is convex if and only if dom f is
convex and
f (y ) ≥ f (x) + ∇f (x)T (y − x), ∀x, y ∈ dom f
◮ local information (gradient) leads to global information (convexity)
◮ f is strictly convex if and only if dom f is convex and
f (y ) > f (x) + ∇f (x)T (y − x), ∀x, y ∈ dom f , x 6= y
First-order condition: proof
Proof.
◮ Suppose f (x) is convex. Then
f (x + θ(y − x)) − f (x) ≤ θ(f (y ) − f (x)), ∀θ ∈ [0, 1]
f (x + θ(y − x)) − f (x)
lim ≤ f (y ) − f (x)
θ→0 θ
T
=⇒ ∇f (x) (y − x) ≤ f (y ) − f (x)
◮ Suppose the first-order condition holds. Let z = θx + (1 − θ)y . Then
f (x) ≥ f (z) + ∇f (z)T (x − z)
f (y ) ≥ f (z) + ∇f (z)T (y − z)
=⇒ θf (x) + (1 − θ)f (y ) ≥ f (z)
which is true for any θ ∈ [0, 1], so f is convex.
Second-order conditions
Theorem (second-order condition)
If f : R n → R is twice differentiable, then f is convex if and only if
dom f is convex and its Hessian is positive semidefinite, i.e.,
∇f (x) 0 ∀x ∈ dom f
Second-order conditions
Theorem (second-order condition)
If f : R n → R is twice differentiable, then f is convex if and only if
dom f is convex and its Hessian is positive semidefinite, i.e.,
∇f (x) 0 ∀x ∈ dom f
◮ if ∇f (x) ≻ 0 for all x ∈ dom f , then f is strictly convex
Second-order conditions: proof
Proof.
◮ Suppose f is convex. Because f is twice differentiable, we have
1 T 2
f (x + δx) = f (x) + ∇f (x)T δx + δx ∇ f (x)δx + R(x; δx)kδxk2
2
where R(x; δx) → 0 as δx → 0. Because f is convex, by the first-order
condition, f (x + δx) ≥ f (x) + ∇f (x)T δx. Hence
δx T ∇2 f (x)δx + R(x; δx)kδxk2 ≥ 0
for any δx. Let δx = ǫd. Taking ǫ → 0 yields d T ∇2 f (x)d ≥ 0 for any d,
thus ∇2 f (x) 0.
◮ Suppose ∇f (x) 0 ∀x ∈ dom f . Then for any x, y ∈ dom f and some
z = θx + (1 − θ)y with θ ∈ [0, 1],
1
f (y ) = f (x) + ∇f (x)T (y − x) + (y − x)T ∇2 f (z)(y − x)
2
≥ f (x) + ∇f (x)T (y − x)
By the first-order condition, f is convex.
Some special cases
◮ quadratic function: f (x) = 21 x T Px + q T x + r , where P ∈ S n
∇f (x) = Px + q ∇2 f (x) = P
convex if P 0
◮ least-squares: f (x) = kAx − bk22
∇f (x) = 2AT (Ax − b) ∇2 f (x) = 2AT A
convex for any A
◮ quadratic-over-linear: f (x, y ) = x 2 /y
T
2 y y
∇2 f (x) = 0
y 3 −x −x
convex for y > 0
Some special cases: log-sum-exp
Pn
log-sum-exp: f (x) = log k=1 exp xk is convex
max{x1 , · · · , xn } ≤ f (x) ≤ max{x1 , · · · , xn } + log n, so f can be viewed as a
differentiable approximation of the max function.
Proof.
1
∇2 f (x) = ((1T z) diag(z) − zz T ) (zk = exp xk )
(1T z)2
n n n
!
1 X X X
=⇒ v T ∇2 f (x)v = T 2 zi zi vi2 − ( zi vi )2
(1 z) i =1 i =1 i =1
1
= (kak22 kbk22 − ha, bi2 ) ≥ 0
(1T z)2
√ √ √ √
where a = ( z1 , · · · , zn ), b = ( z1 v1 , · · · , zn vn ), for any v .
Some special cases: geometric mean
Qn
geometric mean: f (x) = ( k=1 xk )1/n on R++
n
is concave
Proof.
Qn 1/n
xi
∇2 f (x) = − i =1
(n diag2 (q) − qq T ) (qi = 1/xi )
n2 !
Qn 1/n n n n
i =1 xi
T 2
X X 2 2
X 2
=⇒ v ∇ f (x)v = − 1 vi qi − ( qi vi )
n2 i =1 i =1 i =1
Qn 1/n
x
= − i =12 i (kak22 kbk22 − ha, bi2 ) ≤ 0
n
where ai = 1, bi = qi vi for any v , so ∇2 f (x) 0.
Epigraph and sublevel set
α-sublevel set of f : R n → R:
Cα = {x ∈ dom f | f (x) ≤ α}
sublevel sets of convex functions are convex (converse is false)
epigraph of f : R n → R:
epi f = {(x, t) ∈ R n+1 | x ∈ dom f , f (x) ≤ t}
f is convex if and only if epi f is a convex set
Jensen’s inequality and extensions
basic inequality: if f is convex, then for any θ ∈ [0, 1],
f (θx + (1 − θ)y ) ≤ θf (x) + (1 − θ)f (y )
extension: if f is convex, then
f (E[z]) ≤ E[f (z)]
for any random variable z.
Using Jensen’s inequality: deriving Holder’s inequality
θ log a + (1 − θ) log b ≤ log(θa + (1 − θ)b)
=⇒ aθ b 1−θ ≤ θa + (1 − θ)b ∀a, b ≥ 0, θ ∈ [0, 1]
Applying this with
|xi |p |yi |p
a= P p
a= P p
θ = 1/p
j |xj | j |yj |
yields
!1/p !1/q
|x |p |y |p |x |p |yi |p
P i p P i p ≤ Pi +
p j |xj |p q j |yj |p
P
j |xj | j |yj |
and summing over i yields Holder’s inequality.
Operations preserving convexity
◮ nonnegative weighted sum
◮ composition with affine function
◮ pointwise maximum and supremum
◮ composition
◮ minimization
◮ perspective
positive weighted sum & composition with affine function
◮ f is convex =⇒ αf is convex for any α ≥ 0
◮ f1 , f2 are convex =⇒ f1 + f2 is convex (extends to infinite sums,
integrals)
◮ f is convex =⇒ f (Ax + b) is convex
Examples:
◮ log barrier for linear inequalities
m
X
f (x) = − log(bi − aiT x)
i =1
◮ f (x) = kAx + bk
Pointwise maximum
If f1 , · · · , fm are convex, then f (x) = max{f1 (x), · · · , fm (x)} is convex
examples
◮ piecewise-linear function: f (x) = maxi =1,··· ,m (aiT x + bi ) is convex
◮ sum of r largest components of x ∈ R n :
f (x) = x[1] + x[2] + · · · + x[n]
is convex (x[i ] is ith largest component of x)
Pointwise supremum
If f (x, y ) is convex in x for each y ∈ A, then
g (x) = sup f (x, y )
y∈A
is convex
In terms of epigraph:
\
epi g = epi f (·, y )
y∈A
Examples
◮ support function of a set: SC (x) = supy∈C y T x
◮ f (x) = supy∈C kx − y k
◮ λmax (X ) = supkuk2 =1 u T Xu where X ∈ S n is convex
Minimization
If f (x, y ) is convex in (x, y ) and C is a convex nonempty set, then
g (x) = inf f (x, y )
y∈C
is convex
examples
◮ f (x, y ) = x T Ax + 2x T By + y T Cy with
A B
0 C ≻0
BT C
Because
g (x) = inf f (x, y ) = x T (A − BC −1 B T )x
y
is convex, the Schur complement A − BC −1 B T 0
◮ distance to a set: dist(x, S) = inf y∈S kx − y k is convex if S is convex
Minimization: proof
Proof.
Suppose x1 , x2 ∈ dom g . Given ǫ > 0, ∃ y1 , y2 ∈ C such that
f (xi , yi ) ≤ g (xi ) + ǫ. Hence for any θ ∈ [0, 1],
g (θx1 + (1 − θ)x2 ) = inf f (θx1 + (1 − θ)x2 , y )
y∈C
≤ f (θx1 + (1 − θ)x2 , θy1 + (1 − θ)y2 )
≤ θf (x1 , y1 ) + (1 − θ)f (x2 , y2 )
≤ θg (x1 ) + (1 − θ)g (x2 ) + ǫ
which holds for any ǫ > 0. Thus
g (θx1 + (1 − θ)x2 ) ≤ θg (x1 ) + (1 − θ)g (x2 )
Composition with scalar functions
Composition of g : R n → R and h : R → R:
f (x) = h(g (x))
Use:
∇2 f (x) = h′ (g (x))∇2 g (x) + h′′ (g (x))∇g (x)∇g (x)T
f is convex if
◮ g convex, h convex, h̃ nondecreasing
◮ example: exp g (x) is convex if g is convex
◮ g concave, h convex, h̃ nonincreasing
◮ example: 1/g (x) is concave if g is concave and positive
Composition with vector functions
Composition of g : R n → R k and h : R k → R:
f (x) = h(g (x) = h(g1 (x), · · · , gk (x))
when n = 1:
f ′′ (x) = g ′ (x)T ∇2 h(g (x))g ′ (x) + ∇h(g (x))T g ′′ (x)
f is convex if
◮ gi convex, h convex, h̃ nondecreasing in each argument
Pm
◮ example: i =1 exp gi (x) is convex if gi is convex
◮ gi concave, h convex, h̃ nonincreasing in each argument
Pm
◮ example: i =1 log gi (x) is concave if gi is concave and positive
Perspective
The perspective of a function f : R n → R is the function g : R n+1 → R,
x
g (x, t) = tf ( ) dom g = {(x, t) | x/t ∈ dom f , t > 0}
t
f is convex =⇒ g is convex.
For t > 0,
(x, t, s) ∈ epi g ⇐⇒ f (x/t) ≤ s/t ⇐⇒ (x/t, s/t) ∈ epi f
Hence epi g is the inverse image of epi f under the perspective mapping
examples:
◮ f (x) = x T x =⇒ g (x, t) = x T x/t is convex for t > 0
2
◮ f (x) = − log x =⇒ g (x, t) = t log t − t log x is convex on R++
The conjugate function
the conjugate of a function f is
f ∗ (y ) = sup y T x − f (x)
x∈dom f
Figure: Conjugate function
The conjugate function: examples
◮ negative logarithm f (x) = − log x
∗ −1 − log(−y ) y <0
f (y ) = sup xy + log x =
x>0 ∞ otherwise
◮ strictly convex quadratic f (x) = 1/2x T Qx with Q ∈ S++
n
f ∗ (y ) = sup y T x − 1/2x T Qx
x
1
= y T Q −1 y
2
The conjugate function: properties
Properties:
◮ f ∗ is convex
◮ f is convex and closed (i.e., epi f is closed) =⇒ f ∗∗ = f .
◮ Frechel’s inequality: f (x) + f ∗ (y ) ≥ x T y
◮ Example: with f (x) = (1/2)x T Qx with Q ∈ S++
n
, we have
x T y ≤ (1/2)x T Qx + (1/2)y T Q −1 y
Convexity with respect to generalized inequalities
f : R n → R m is K -convex if dom f is convex and
f (θx + (1 − θ)y ) K θf (x) + (1 − θ)f (y ) ∀x, y ∈ dom f , θ ∈ [0, 1]
Convexity with respect to generalized inequalities
f : R n → R m is K -convex if dom f is convex and
f (θx + (1 − θ)y ) K θf (x) + (1 − θ)f (y ) ∀x, y ∈ dom f , θ ∈ [0, 1]
example: f : S m → S m with f (X ) = X 2 is S+m -convex
Proof.
for fixed z ∈ R m , z T X 2 z = kXzk22 is convex in X , i.e.,
z T (θX + (1 − θ)Y )2 z ≤ θz T X 2 z + (1 − θ)z T Y 2 z ∀X , Y ∈ S m
=⇒ (θX + (1 − θ)Y )2 θX 2 + (1 − θ)Y 2
Quasiconvex functions
Definition
A function f (x) is quasiconvex if ∀x, y ∈ dom f ,
f (θx + (1 − θ)y ) ≤ max{f (x), f (y )} ∀θ ∈ [0, 1]
Theorem
f (x) is quasiconvex if and only if every level set of f is convex.
Quasiconvex functions: level sets
Theorem
f (x) is quasiconvex if and only if every level set of f is convex.
Proof.
1. Suppose f is quasiconvex. Suppose x, y ∈ dom f belongs to level set
Sa = {x | f (x) ≤ a}. Then
f (θx + (1 − θ)y ) ≤ max{f (x), f (y )} ≤ a
Thus θx + (1 − θ)y ∈ Sa for all θ ∈ [0, 1], so Sa is convex.
2. Suppose every level set of f is convex. For any x, y ∈ dom f , let
a = max{f (x), f (y )}. Clearly x, y ∈ Sa . Because Sa is convex,
θx + (1 − θ)y ∈ Sa for any θ ∈ [0, 1]. Thus f is quasiconvex.