Numerical Optimization
Lecture 2: Elements of Analysis and Topology
Sangkyun Lee
The content is mainly Nocedal and Wright (2006), Bertsekas (1999), and
Folland (1999). Topics marked with ** are optional.
1
1.1
Analysis
Sequences
A sequence of an infinite number of points in Rn is denoted by {xk }
k=1 =
{x1 , x2 , . . . } = {xk }.
1.2
Convergence of a Sequence, Limit
A sequence of points {xk } in Rn converges to a point x, if for any > 0,
there exists an index K such that
kxk xk2 ,
for all k K.
Then we write
lim xk = x, or xk x as k ,
and call x the limit of {xk }.
Ex. xk = (1 2k , 1/k 2 )T converges to (1, 0)T .
1.3
Subsequence
Given an infinite index set S {1, 2, . . . }, a subsequence of {xk } corresponding to S can be constructed, and is denoted by
{xk }kS .
1.4
Sequence with a Convergence Subsequence, Limit Point
For a sequence {xk } in Rn , if there exists an infinite set of indices S =
{k1 , k2 , . . . , } such that the corresponding subsequence {xk }kS converges
to a point x
Rn , that is,
lim xki = x
,
i
then x
is called an limit point (or, an accumulation point or a cluster point
in the Euclidean space) of {xk }.
Alternatively, we say for any > 0 and all positive integer K, we have
kxk x
k , for some k K.
Ex. a sequence with two limit points x
= (0, 0)T and x
= (1, 1)T :
1
1/2
1
1/4
1
1/8
,
,
,
,
,
,...
1
1/2
1
1/4
1
1/8
Ex. a sequence with an infinite number of limit points: xk = sin k. Every
point in the interval [1, 1] is a limit point.
1.5
Cauchy Sequence
A sequence is said to be a Cauchy sequence if for any > 0, there exists an
integer K such that
kxk xl k2 , for all k K and ` K.
1.6
Boundedness
A set of points in F Rn is bounded if there exists a scalar M > 0 such
that
kxk2 M, x F.
1.7
Convergence of a Sequence (Cont)
A bounded sequence {xk } in Rn converges if and only if it has a unique
limit point.
A sequence {xk } in Rn converges if and only if it is a Cauchy sequence.
(Bolzano-Weierstrass theorem) Every bounded sequence in Rn has at
least one limit point. Proof**:
Every sequence {xk } in Rn has a monotone subsequence.
(Monotone convergence theorem) every monotone sequence has
a finite limit if and only if the sequence is bounded.
1.8
Convergence of a Scalar Sequence
Consider a scalar sequence {tk }, tk R for all k. This sequence is
bounded above: if there exists a scalar u such that tk u for all k.
bounded below: if there exists a scalar ` such that tk ` for all k.
2
nondecreasing: if tk+1 tk for all k.
nonincreasing: if tk+1 tk for all k.
The scalar sequence {tk } converges if (monotone convergence theorem)
it is nondecreasing and bounded above.
it is nonincreasing and bounded below.
If a sequence is bounded below and bounded above, then it is called bounded.
1.9
Properties of Scalar Sequences
limn (xn yn ) = limn xn limn yn .
limn cxn = c limn yn .
limn (xn yn ) = (limn xn )(limn yn ).
limn
xn
yn
limn xn
limn yn
if limn yn 6= 0.
limn xpn = (limn xn )p .
If xn yn for all n N for some positive integer N , then limn xn
limn yn .
(Sandwich theorem) If xn zn yn for all n N for some positive
integer N and limn xn = limn = , then limn zn = .
(LHopital theorem) For functions f and g differentiable on I \ c where
I is an open interval containing c (c can be ), if
f 0 (x)
exists,
xc g 0 (x)
lim f (x) = lim g(x) = 0 or , and lim
xc
xc
f (x)
f 0 (x)
= lim 0
.
xc g(x)
xc g (x)
then lim
1.10
Supremum and Infimum
For a scalar sequence {tk },
Supremum: the smallest real number u: tk u for all k, denoted
by sup{tk }. If there is no such number, then sup{tk } = . By
convention, sup = .
Infimum: the largest real number `: tk ` for all k, denoted by inf{tk }.
If there is no such number, then inf{tk } = . By convention, inf =
.
3
For two scalar sequences {xk } and {yk }, we have (think about the proof!)
inf{xk } + inf{yk } inf{xk + yk }.
sup{xk } + sup{yk } sup{xk + yk }.
1.11
Limsup and Liminf
Define a sequence of suprema as {ui }:
ui := sup{tk |k i}
Then {ui } is an nonincreasing sequence. If bounded below, it converges to
a finite number u
, called lim sup of {tk }, denoted by
lim sup{tk } = lim ui = lim sup{tk |k i}
i
Similarly, define the sequence of infima as {`i }:
`i = inf{tk |k i}.
Then {`i } is nondecreasing. If {`i } is bounded above, it converges to a point
v which is called lim inf of {tk }, denoted by
lim inf{tk } = lim `i = lim inf{tk |k i}.
i
Ex. the sequence 1, 1/2, 1, 1/4, 1, 1/8, . . . has a lim inf of 0 and a lim sup of
1.
Let {xk } and {yk } be scalar sequences. Then,
inf{xk } lim inf k xk lim supk xk sup{xk }.
If lim supk xk (lim inf k xk ) is finite, then it is the largest (smallest, resp.) limit point of {xk }.
{xk } converges if and only if
< lim inf xk = lim sup xk < .
k
Furthermore, the limit of {xk } is equal to the common scalar value of
lim inf k xk and lim supk xk .
If xk yk for all k, then
lim inf xk lim inf yk ,
k
lim sup xk lim sup yk .
k
Finally, we have
lim inf xk + lim inf yk lim inf (xk + yk )
k
lim sup xk + lim sup yk lim sup(xk + yk )
k
Rate of Convergence
Let {xk } be a sequence in Rn that converges to x .
2.1
Q-Linear Convergence
The convergence is Q-linear if there exists a constant r (0, 1) such that
kxk+1 x k2
r (0, 1), for all k sufficiently large.
kxk x k2
Q stands for quotient.
Q-linear convergences implies that the distance to the solution x decreases at each iteration by a constant factor 0 < r < 1.
Ex. xk = 1 + (0.5)k converges to 1 Q-linearly with rate r = 0.5.
2.2
Q-Superlinear Convergence
The convergence is superlinear if
kxk+1 x k2
= 0.
k kxk x k2
lim
Ex. xk = 1 + k k converges superlinearly to 1.
(Hint)
X
xn
x n
ex =
= lim 1 +
.
n
n!
n
n=0
2.3
Q-Quadratic Convergence
The convergence is Q-quadratic if there exists a constant M > 0 (not necessarily < 1) such that
kxk+1 x k2
M for all k sufficiently large.
kxk x k22
k
Ex. xk = 1 + (0.5)2 .
2.4
Remarks
Convergence rate depends on constants r and M , whose values depend
on algorithms and problems. But eventually a Q-quadratic sequence
will always converge faster than a Q-linear sequence.
Q-quadratic (e.g. Newtons under some conditions)
Q-superlinear (e.g. quasi-Newton)
Q-linear (e.g. gradient descent).
5
There are R-rates of convergence (R for root), but well use only
Q-rates in this lecture (therefore well often omit Q).
1.04
1.00
Error
linear
superlinear
quadratic
superlinear
quadratic
1.02
1.0 1.1 1.2 1.3 1.4 1.5
Error
In Q-rates, the error (distance to x ) cannot increase for all sufficiently
large k (in R-rates the error can increase).
2.5
3.0
3.5
4.0
0.0
1.0
0.5
0.5
1.0
Figure 1: Rate of convergence (the three examples in the text).
Figure 2: Inf, sup, liminf, limsup, and convergence illustrated (yk =
exp(k) cos(2k)).
Topology
3.1
Topology**
For a set X, T 2X is a topology on X if
T and X T .
Arbitrary unions of members of T is in T .
Finite intersections of members of T is in T .
Ex. if X 6= , then 2X and {, X} are discrete and trivial topology, resp.
The pair (X, T ) is called a topological space. The members of T are
called open sets, and their complements are called closed sets.
3.2
Metric Space**
A pair M = (X, d) with a set X and a function d : X X R satisfying
the following properties (d is called a metric) is a metric space:
d(x, y) 0 for all x, y X.
d(x, y) = 0 if and only if x = y.
d(x, y) = d(y, x) for all x, y X.
d(x, y) + d(y, z) d(x, z) for all x, y, z X (triangle inequality).
The metric space M = (X, d) with a real
X = Rn and the (EuPn vector space
2
1/2
clidean) metric d(x, y) = kxyk2 = ( i=1 (xi yi ) ) is the n-dimensional
Euclidean space. For convenience, we use Rn to denote the Euclidean space.
3.3
Open Sets of a Metric Space**
Let (X, d) be a metric space. A subset S X is open if for all point x S
we can find a scalar > 0 such that
{y X : d(x, y) < } S.
3.4
Metric-Induced Topology**
The collection of all open sets in a metric space M = (X, d) constitutes a
topology T on X. 1 A topology induced by the Euclidean metric d is called
the Euclidean topology.
1
For a proof, see Theorem 4.7 of https://www.dpmms.cam.ac.uk/~twk/Top.pdf
Topology of the Euclidean Space
4.1
Open Sets
A subset F Rn is open if for every x F we can find a positive number
> 0 such that an open ball of radius centered at x is contained entirely
in F :
B(x, ) := {y Rn : ky xk2 < } F.
**These open sets are the members of the Euclidean topology. That is,
Arbitrary unions of open sets are open.
Finite intersections of open sets are open.
Ex. (infinite intersections of open sets can be closed2 )
1 1
,
= {0}
n=1
n n
4.2
Closed Sets
A set F is closed if its complement F c = Rn \ F is open.
Or, equivalently, if for all possible sequences of points {xk } in F , all limit
points of {xk } are elements in F .
Ex.
F = (0, 1) (2, 10) is an open set of R.
F = [0, 1] [2, 5] is a closed set of R.
F = (0, 1] is neither open nor closed.
Rn and are both open and closed.
**From the properties of open sets, we have
Finite unions of closed sets are closed.
Arbitrary intersections of closed sets are closed.
4.3
Interior
A point x F Rn is an interior point of the set F , if there is an open ball
B(x, ) contained entirely in F .
The interior of a set F , denoted by int F , is the largest open set contained
in F . Or, equivalently, int F is the set of all interior points of F .
Ex. F = {x Rn : xi 0}. int F = {x Rn : xi > 0}.
2
Singleton sets are closed in any metric space.
4.4
Closure
The closure of a set F , denoted by cl F , is the smallest closed set containing
F . That is,
x cl F if lim xk = x for some sequence {xk } of points in F .
k
In other words, cl F is the union of F and all limit points from sequences in
F.
From definitions: int F F , F cl F .
int and cl are idempotent: int int F = int F , cl cl F = cl F .
int and cl are monotone: X Y implies that int X int Y and
cl X cl Y .
F is open iff int F = F .
F is closed iff cl F = F .
If F is a convex set, then int F and cl F are convex.
4.5
Boundary
The boundary of a set F , denoted by bd F , is
bd(F ) = cl F cl F c , F c = Rn \ F.
4.6
Compact Sets
A set F is compact if every sequence {xk } of points in F has at least one
limit point, and all such limit points are in F . If F Rn is closed and
bounded, then F is compact.
4.7
Neighborhood
For a point x Rn , N Rn is called a neighborhood of x, if it is an open set
containing x. A common neighborhood is the ball centered at x with radius
> 0, B(x, ).
For a set F Rn , N is a neighborhood of F if N is open and there exists
> 0 such that
xF B(x, ) N.
References
Bertsekas, D. P. (1999). Nonlinear Programming. Athena Scientific, second
edition.
Folland, G. B. (1999). Real Analysis: Modern Techniques and Their Applications. Wiley Interscience, 2nd edition.
Nocedal, J. and Wright, S. J. (2006). Numerical Optimization. Springer,
second edition.
10