Optimization: an Overview
Moritz Diehl,
Optimization in Engineering Center (OPTEC) & ESAT
K.U. Leuven, Belgium
(some slide material was provided by W. Bangerth, K. Mombaur,
and P. Riede)
Overview of presentation
Optimization: basic definitions and concepts
Introduction to classes of optimization problems
Introduction to Newton type optimization algorithms
What is optimization?
Optimization = search for the best solution
in mathematical terms:
minimization or maximization of an objective function f(x)
depending on variables x, subject to constraints
Equivalence of maximization and minimization problems:
(from now on only minimization)
max f(x) = −min (−f(x)), attained at the same point x*
[Figure: graph of f(x) with maximum at x*, and of −f(x) with minimum at the same x*]
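The equivalence can be checked numerically: maximizing f over a grid returns the same point as minimizing −f. A small sketch (the concave function f(x) = −(x − 1)² is an arbitrary illustrative choice):

```python
# Maximizing f is the same as minimizing -f: same optimizer x*.
def f(x):
    return -(x - 1.0) ** 2  # concave, maximum at x* = 1

xs = [i * 0.01 for i in range(-200, 201)]     # grid on [-2, 2]
x_max = max(xs, key=f)                        # argmax of f
x_min = min(xs, key=lambda x: -f(x))          # argmin of -f
print(x_max == x_min)  # True: both pick the same grid point near 1
```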
Constrained optimization
Often the variable x must satisfy certain constraints, e.g.:
x ≥ 0
x1² + x2² = C
General formulation:
min f(x)  subject to  h(x) = 0,  g(x) ≥ 0
f: objective function / cost function
h: equality constraints
g: inequality constraints
Simple example: Ball hanging on a spring
To find position at rest,
minimize potential energy!
min_{x∈R²}  x1² + x2² + m x2
(spring: x1² + x2²,  gravity: m x2)
subject to  1 + x1 + x2 ≥ 0
and  3 − x1 + x2 ≥ 0
Feasible set
Feasible set = collection of all
points that satisfy all constraints:
Example feasible set is intersection
of grey and blue area
x2 ≥ 0
1 − x1² − x2² ≥ 0
i.e. the inequality constraints
g1(x) := x2 ≥ 0
g2(x) := 1 − x1² − x2² ≥ 0
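A point is feasible exactly when it satisfies every constraint; a small check for the two inequalities above (the example set is the upper half of the unit disc):

```python
# Feasibility test for the example set: x2 >= 0 and 1 - x1^2 - x2^2 >= 0.
def is_feasible(x1, x2, tol=1e-12):
    return x2 >= -tol and 1.0 - x1 ** 2 - x2 ** 2 >= -tol

print(is_feasible(0.0, 0.5))   # True: inside the half-disc
print(is_feasible(0.0, -0.5))  # False: violates x2 >= 0
print(is_feasible(2.0, 0.0))   # False: outside the unit disc
```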
Local and global optima
[Figure: graph of f(x) over x with two local minima, one of which is the global minimum]
Derivatives
First and second derivatives of the objective function or the
constraints play an important role in optimization
The first order derivatives are called the gradient (of the respective function),
and the second order derivatives are called the Hessian matrix.
Optimality conditions (unconstrained)
min f(x),  x ∈ Rⁿ
Assume that f is twice differentiable.
We want to test a point x* for local
optimality.
necessary condition:
∇f(x*) = 0 (stationarity)
sufficient condition:
x* stationary and ∇²f(x*) positive definite
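Both conditions can be checked numerically. A sketch with finite-difference gradients for f(x) = x1² + x2² at its minimizer x* = (0, 0) (the function and point are illustrative choices, not from the slides):

```python
# Check stationarity (grad = 0) at a candidate point via central
# finite differences, for f(x) = x1^2 + x2^2 with minimizer (0, 0).
def f(x):
    return x[0] ** 2 + x[1] ** 2

def grad(f, x, h=1e-6):
    g = []
    for i in range(len(x)):
        xp, xm = list(x), list(x)
        xp[i] += h
        xm[i] -= h
        g.append((f(xp) - f(xm)) / (2 * h))
    return g

x_star = [0.0, 0.0]
stationary = all(abs(gi) < 1e-6 for gi in grad(f, x_star))

# The Hessian of this f is 2*I; for a symmetric 2x2 matrix, positive
# definiteness <=> H[0][0] > 0 and det(H) > 0 (Sylvester's criterion).
H = [[2.0, 0.0], [0.0, 2.0]]
pos_def = H[0][0] > 0 and H[0][0] * H[1][1] - H[0][1] * H[1][0] > 0
print(stationary, pos_def)  # True True: x* is a local minimum
```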
Types of stationary points
(a)-(c): x* is stationary: ∇f(x*) = 0
∇²f(x*) positive definite: local minimum
∇²f(x*) negative definite: local maximum
∇²f(x*) indefinite: saddle point
Ball on a spring without constraints
min_{x∈R²}  x1² + x2² + m x2
[Figure: contour lines of f(x) with the gradient vector]
∇f(x) = (2x1, 2x2 + m)
unconstrained minimum:
0 = ∇f(x*)  ⇒  (x1*, x2*) = (0, −m/2)
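Following the negative gradient downhill reproduces the analytic minimum (0, −m/2). A minimal sketch with m = 1 (an arbitrary choice of mass) and a fixed step length:

```python
# Gradient descent on f(x) = x1^2 + x2^2 + m*x2, with gradient
# (2*x1, 2*x2 + m); the unconstrained minimum is (0, -m/2).
m = 1.0
x = [3.0, 2.0]      # arbitrary start value
alpha = 0.1         # fixed step length
for _ in range(200):
    g = [2 * x[0], 2 * x[1] + m]
    x = [x[0] - alpha * g[0], x[1] - alpha * g[1]]
print(x)  # close to [0.0, -0.5], i.e. (0, -m/2)
```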
Sometimes there are many local minima
e.g. potential energy
of macromolecule
Global optimization is a very hard issue - most algorithms find only
the next local minimum. But there is a favourable special case...
Convex functions
Convex: all connecting lines lie on or above the graph
Non-convex: some connecting lines lie below the graph
Convex feasible sets
Convex: all connecting lines between feasible points lie inside the feasible set
Non-convex: some connecting line between two feasible points leaves the feasible set
Convex problems
A problem is convex if
f(x) is convex and the feasible set is convex
One can show:
For convex problems, every local minimum is also a global minimum.
It is sufficient to find local minima!
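The "connecting lines above the graph" definition can be tested by sampling (a heuristic check, not a proof; the functions x² and sin are illustrative choices):

```python
import math

# Sampled convexity test: f(t*a + (1-t)*b) <= t*f(a) + (1-t)*f(b)
# for many pairs (a, b) and weights t.  Passing all samples does not
# prove convexity, but a single failure disproves it.
def looks_convex(f, lo, hi, n=50):
    pts = [lo + (hi - lo) * i / n for i in range(n + 1)]
    for a in pts:
        for b in pts:
            for t in (0.25, 0.5, 0.75):
                xm = t * a + (1 - t) * b
                if f(xm) > t * f(a) + (1 - t) * f(b) + 1e-12:
                    return False
    return True

print(looks_convex(lambda x: x * x, -2, 2))  # True: x^2 is convex
print(looks_convex(math.sin, -2, 2))         # False: sin is not
```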
Characteristics of optimization problems 1
size / dimension of problem n ,
i.e. number of free variables
continuous or discrete search space
number of minima
Characteristics of optimization problems 2
Properties of the objective function:
type: linear, nonlinear, quadratic ...
smoothness: continuity, differentiability
Existence of constraints
Properties of constraints:
equalities / inequalities
type: simple bounds, linear, nonlinear,
dynamic equations (optimal control)
smoothness
Overview of presentation
Optimization: basic definitions and concepts
Introduction to classes of optimization problems
Introduction to Newton type optimization algorithms
Problem Class 1: Linear Programming (LP)
Linear objective,
linear constraints:
Linear Optimization Problem
(convex)
Example: Logistics Problem
shipment of quantities a1, a2, ..., am of a product from m locations,
to be received at n destinations in quantities b1, b2, ..., bn
shipping costs cij
determine the amounts xij
(origin of linear programming in the Second World War)
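The logistics (transportation) problem can be written as an LP; a sketch of the standard formulation, using the quantities named above:

```latex
\min_{x \ge 0} \; \sum_{i=1}^{m}\sum_{j=1}^{n} c_{ij}\, x_{ij}
\quad \text{s.t.} \quad
\sum_{j=1}^{n} x_{ij} = a_i \;\; (i = 1,\dots,m), \qquad
\sum_{i=1}^{m} x_{ij} = b_j \;\; (j = 1,\dots,n)
```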
Problem Class 2: Quadratic Programming (QP)
Quadratic objective and linear constraints:
min_x  ½ xᵀQ x + cᵀx  subject to linear constraints
Quadratic Optimization Problem
(convex, if Q pos. def.)
Example: Markowitz mean-variance portfolio optimization
quadratic objective: portfolio variance (sum of the variances and
covariances of the individual securities)
linear constraints specify a lower bound for the portfolio return
QPs play an important role as subproblems in nonlinear optimization
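One reason QPs make good subproblems: with only equality constraints, the KKT optimality conditions reduce to a single linear system [Q Aᵀ; A 0][x; λ] = [−c; b]. A sketch with made-up numbers (min ½xᵀQx + cᵀx s.t. x1 + x2 = 1) and a hand-rolled dense solver:

```python
# Equality-constrained QP:  min 1/2 x^T Q x + c^T x  s.t.  A x = b.
# The KKT conditions form one linear system in (x, lambda).
def solve_linear(M, rhs):
    """Gaussian elimination with partial pivoting (small dense systems)."""
    n = len(M)
    M = [row[:] + [rhs[i]] for i, row in enumerate(M)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            fac = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= fac * M[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

Q = [[2.0, 0.0], [0.0, 2.0]]   # positive definite -> convex QP
c = [-2.0, 0.0]
A = [[1.0, 1.0]]               # one constraint: x1 + x2 = 1
b = [1.0]

KKT = [Q[0] + [A[0][0]],
       Q[1] + [A[0][1]],
       A[0] + [0.0]]
sol = solve_linear(KKT, [-c[0], -c[1], b[0]])
x, lam = sol[:2], sol[2]
print(x, lam)  # x close to [1, 0], multiplier lam close to 0
```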
Problem Class 3: Nonlinear Programming (NLP)
Nonlinear Optimization Problem
(in general nonconvex)
TODAY'S MAIN TOPIC!
E.g. the famous nonlinear Rosenbrock
function
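The Rosenbrock function in its common textbook form (the 100/1 coefficients are the standard choice, assumed here since the slide shows only a picture) has its minimum at (1, 1) at the end of a curved valley:

```python
# Rosenbrock "banana" function: f(x, y) = 100*(y - x^2)^2 + (1 - x)^2.
# Global minimum f = 0 at (x, y) = (1, 1).
def rosenbrock(x, y):
    return 100.0 * (y - x * x) ** 2 + (1.0 - x) ** 2

print(rosenbrock(1.0, 1.0))  # 0.0 at the minimizer
print(rosenbrock(0.0, 0.0))  # 1.0
```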
Problem Class 4: Non-smooth optimization
the objective function or the constraints are
non-differentiable or not continuous
Problem Class 5: Integer Programming (IP)
Some or all variables are integer
(e.g. linear integer problems)
Special case: combinatorial optimization
problems -- feasible set is finite
Example: traveling salesman problem
determine fastest/shortest round
trip through n locations
Problem Class 6: Optimal Control
Optimization problems including dynamics in the form of
differential equations; the variables are partly
infinite-dimensional (functions of time), so the problem
itself is infinite dimensional
THIS COURSE'S MAIN TOPIC!
Overview of presentation
Optimization: basic definitions and concepts
Introduction to classes of optimization problems
Introduction to Newton type optimization algorithms
Aim of Newton type optimization algorithms
min f(x),  x ∈ Rⁿ
Find a local minimizer x* of f(x), i.e. a point satisfying
∇f(x*) = 0
Derivative based algorithms
Fundamental underlying structure of most algorithms:
choose start value x0
for i = 0, 1, 2, ...:
determine a search (descent) direction p_i
determine a step length α_i
new iterate: x_{i+1} = x_i + α_i p_i
check convergence
Optimization algorithms differ in the choice of p_i and α_i
Basic algorithm:
Search direction:
choose a descent direction p
(f should decrease along p)
Step length:
solve the 1-d minimization approximately,
e.g. satisfy the Armijo condition
Computation of step length
Dream:
exact line search: α_k = argmin_α f(x_k + α p_k)
In practice:
inexact line search: α_k ≈ argmin_α f(x_k + α p_k)
ensure sufficient decrease, e.g. via the Armijo condition
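A minimal backtracking line search enforcing the Armijo condition f(x + αp) ≤ f(x) + c1·α·∇f(x)ᵀp (the constants c1 = 1e-4 and the halving factor are conventional choices, not from the slides):

```python
# Backtracking line search: shrink alpha until the Armijo condition
#   f(x + alpha*p) <= f(x) + c1 * alpha * grad^T p
# guarantees sufficient decrease along the descent direction p.
def armijo_step(f, x, fx, grad, p, alpha0=1.0, c1=1e-4, shrink=0.5):
    slope = sum(g * pi for g, pi in zip(grad, p))  # directional derivative
    assert slope < 0, "p must be a descent direction"
    alpha = alpha0
    while f([xi + alpha * pi for xi, pi in zip(x, p)]) > fx + c1 * alpha * slope:
        alpha *= shrink
    return alpha

# Example: f(x) = x1^2 + x2^2 from x = (1, 1) along p = -grad.
f = lambda x: x[0] ** 2 + x[1] ** 2
x = [1.0, 1.0]
grad = [2.0, 2.0]
p = [-2.0, -2.0]
alpha = armijo_step(f, x, f(x), grad, p)
print(alpha)  # 0.5: the full step overshoots, one halving suffices
```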
How to compute search direction?
We discuss three algorithms:
Steepest descent method
Newton's method
Newton type methods
Algorithm 1: Steepest descent method
Based on a first order Taylor series approximation of the objective function:
f(x_k + p) ≈ f(x_k) + ∇f(x_k)ᵀ p
maximum descent (for a fixed length of p), if p = −∇f(x_k)
Steepest descent method
Choose the steepest descent search direction and perform an (exact) line search:
p_k = −∇f(x_k),   x_{k+1} = x_k − α_k ∇f(x_k)
the search direction is perpendicular to the level sets of f(x)
[Figure: contour lines with gradient directions]
Convergence of steepest descent method
the steepest descent method has linear convergence,
i.e. ‖x_k − x*‖ ≤ C ‖x_{k−1} − x*‖
the gain is a fixed factor C < 1
convergence can be very slow if C is close to 1
If f(x) = xᵀA x with A positive definite and λmax, λmin the largest and
smallest eigenvalues of A, one can show that
C = (λmax − λmin) / (λmax + λmin)
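The bound can be observed numerically. A sketch on the quadratic f(x) = ½xᵀAx with A = diag(1, 10) (an arbitrary test matrix), where the exact line-search step has the closed form α = gᵀg / gᵀAg and the bound gives C = (10 − 1)/(10 + 1) ≈ 0.818 in the A-norm:

```python
# Steepest descent with exact line search on f(x) = 1/2 x^T A x,
# A = diag(1, 10).  For quadratics the exact step is
# alpha = (g^T g)/(g^T A g), and the error in the A-norm contracts by
# at most C = (l_max - l_min)/(l_max + l_min) = 9/11 per iteration.
a = [1.0, 10.0]                    # eigenvalues of the diagonal A
C = (a[1] - a[0]) / (a[1] + a[0])  # = 9/11 ~ 0.818

def a_norm(x):
    return (a[0] * x[0] ** 2 + a[1] * x[1] ** 2) ** 0.5

x = [10.0, 1.0]                    # start value; the minimizer is x* = 0
ratios = []
for _ in range(20):
    g = [a[0] * x[0], a[1] * x[1]]            # gradient A x
    gg = g[0] ** 2 + g[1] ** 2
    gAg = a[0] * g[0] ** 2 + a[1] * g[1] ** 2
    alpha = gg / gAg                           # exact line search step
    err_before = a_norm(x)
    x = [x[0] - alpha * g[0], x[1] - alpha * g[1]]
    ratios.append(a_norm(x) / err_before)

print(max(ratios) <= C + 1e-9)  # True: every step respects the bound
```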
Example - steepest descent method
f(x, y) = 4 (x − y²)² + (1/100) x² + (1/100) y²
"banana valley" function,
global minimum at x = y = 0
Example - steepest descent method
[Plot: error ‖x_k − x*‖ over iterations k]
Convergence of the steepest descent method:
needs almost 35,000 iterations to come closer than 0.1 to the solution
mean value of the convergence constant C: 0.99995
at (x=4, y=2): λ1 = 0.1, λ2 = 268, hence C = (268 − 0.1)/(268 + 0.1) ≈ 0.9993
Algorithm 2: Newton's Method
Based on a second order Taylor series approximation of f(x):
f(x_k + p) ≈ f(x_k) + ∇f(x_k)ᵀ p + ½ pᵀ ∇²f(x_k) p
Newton direction: p_k = −∇²f(x_k)⁻¹ ∇f(x_k)
Visualization of Newton's method
p_k minimizes the quadratic approximation of the objective:
Q_k(p) = f(x_k) + ∇f(x_k)ᵀ p + ½ pᵀ ∇²f(x_k) p
[Figure: gradient direction vs. Newton direction on the contour lines]
if the quadratic model is good, then take the full step with α_k = 1
Convergence of Newton's method
Newton's method has quadratic convergence,
i.e. ‖x_{k+1} − x*‖ ≤ C ‖x_k − x*‖²
This is very fast close to a solution:
correct digits double in each iteration!
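Quadratic convergence is easy to observe in one dimension. A sketch minimizing f(x) = x² + eˣ (an arbitrary smooth strictly convex function), where the Newton step is x − f′(x)/f″(x):

```python
import math

# Newton's method in 1-D: iterate x <- x - f'(x)/f''(x) to find the
# stationary point of f(x) = x^2 + exp(x), with f' = 2x + e^x and
# f'' = 2 + e^x > 0 (strictly convex, unique minimizer).
x = 1.0
grads = []
for _ in range(8):
    g = 2 * x + math.exp(x)   # f'(x)
    h = 2 + math.exp(x)       # f''(x)
    x = x - g / h             # full Newton step (alpha = 1)
    grads.append(abs(2 * x + math.exp(x)))

print(x)         # ~ -0.3517, the unique minimizer
print(grads[-1]) # ~ 0: gradient vanishes to machine precision
```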
Example - Newtons method
f(x, y) = 4 (x − y²)² + (1/100) x² + (1/100) y²
"banana valley" function,
global minimum at x = y = 0
Example - Newtons method
[Plot: error ‖x_k − x*‖ over iterations k]
Convergence of Newton's method:
fewer than 25 iterations for an accuracy better than 10^-7!
convergence roughly linear for the first 15-20 iterations, since step length α_k < 1
convergence roughly quadratic for the last iterations, with full step length α_k = 1
Comparison of steepest descent and Newton
[Plot: errors ‖x_k − x*‖ for both methods]
For the banana valley example:
Newton's method is much faster than the steepest descent method (by a
factor of about 1000)
Newton's method is superior due to its higher order of convergence
the steepest descent method converges too slowly for practical
applications
Algorithm 3: Newton type methods
In practice, the evaluation of second derivatives
for the Hessian can be difficult!
Idea: approximate the Hessian matrix ∇²f(x_k) by a matrix B_k:
x_{k+1} = x_k − B_k⁻¹ ∇f(x_k),   B_k ≈ ∇²f(x_k)
often the methods ensure that the approximation B_k is positive
definite
these methods are collectively known as Newton type methods
Newton type variants
Notation: x_{k+1} = x_k − B_k⁻¹ ∇f(x_k)
Steepest Descent: B_k = I
Convergence rate: linear
Newton Method: B_k = ∇²f(x_k)
Convergence rate: quadratic
Newton type variants (continued)
BFGS update (Broyden, Fletcher, Goldfarb, Shanno)
Convergence rate: super-linear
For Least-Squares Problems: Gauss-Newton Method
Convergence rate: linear
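A sketch of Gauss-Newton on a one-parameter least-squares fit: residuals r_i(a) = exp(a·t_i) − y_i, Jacobian entries J_i = t_i·exp(a·t_i), and update a ← a − (JᵀJ)⁻¹Jᵀr. The data are synthetic, generated from a_true = 0.5 (an arbitrary choice):

```python
import math

# Gauss-Newton for min_a 1/2 * sum_i r_i(a)^2, r_i(a) = exp(a*t_i) - y_i.
# The Hessian is approximated by J^T J (second-derivative terms dropped),
# so each step is  a <- a - (J^T J)^{-1} J^T r.
a_true = 0.5
ts = [0.0, 0.5, 1.0, 1.5, 2.0]
ys = [math.exp(a_true * t) for t in ts]   # noise-free synthetic data

a = 0.0                                    # start value
for _ in range(20):
    r = [math.exp(a * t) - y for t, y in zip(ts, ys)]
    J = [t * math.exp(a * t) for t in ts]
    JTJ = sum(j * j for j in J)
    JTr = sum(j * ri for j, ri in zip(J, r))
    a = a - JTr / JTJ

print(a)  # ~ 0.5: recovers a_true (zero residuals -> fast convergence)
```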
Summary: Optimization Overview
Optimization problems can be
(un)constrained, (non)convex, (non)linear, (non)smooth,
continuous/integer,(in)finite dimensional, ...
Aim: find local minima of smooth nonlinear problems: ∇f(x*) = 0
Derivative based methods start at an initial guess x0 and iterate
x_{i+1} = x_i + α_i p_i
with search direction p_i and step length α_i.
Three methods:
steepest descent: intuitive, but slow (linear convergence)
Newton's method: very fast (quadratic convergence)
Newton type methods: cheap, fast, and popular, e.g.
BFGS (superlinear)
Gauss-Newton (fast linear convergence)
Attention: Nonlinear vs. Convex Optimization
For nonlinear problems, Newton type algorithms only find local minima, and
"optimal solution" depends on initialization!
Important exception: convex problems
"The great watershed in optimization isn't between linearity and
nonlinearity, but convexity and nonconvexity - R. Tyrrell Rockafellar
Literature
J. Nocedal, S. Wright: Numerical Optimization, Springer, 1999/2006
P. E. Gill, W. Murray, M. H. Wright: Practical Optimization, Academic
Press, 1981
R. Fletcher, Practical Methods of Optimization, Wiley, 1987
D. G. Luenberger: Linear and Nonlinear Programming, Addison-Wesley, 1984