ENGINEERING OPTIMIZATION
Unconstrained Optimization
BITS Pilani
Pilani Campus
Outline
• Unconstrained problems
– Single variable
– Two variables
– Multiple variables
• Unconstrained algorithms
– Ternary search
– Dichotomous and Golden-section
– Gradient descent/ascent method
2
Unconstrained
problems
3
Unconstrained Problems
• An extreme point of a function f(X) defines either a
maximum or a minimum of the function.
• Point 𝐗𝟎 = (x1⁰, x2⁰, …, xn⁰) is a maximum if:
f(𝐗𝟎 + 𝐡) ≤ f(𝐗𝟎)
for all 𝐡 = (h1, h2, …, hn), where |hj| is sufficiently small.
• Point 𝐗𝟎 = (x1⁰, x2⁰, …, xn⁰) is a minimum if:
f(𝐗𝟎 + 𝐡) ≥ f(𝐗𝟎)
for all such 𝐡.
4
[Figure: a curve showing strong maxima, weak maxima, and inflection/saddle points]
5
Single-variable function
• Necessary condition for x0 to be an extreme point of z = f(x):
dz/dx = 0
– The condition is also satisfied at inflection/saddle points; the points obtained from dz/dx = 0 are called stationary points.
6
Single-variable function
• Sufficient condition: Given x0, a stationary point of f(x), if the first (n − 1) derivatives are zero:
dz/dx = d²z/dx² = ⋯ = dⁿ⁻¹z/dxⁿ⁻¹ = 0 and dⁿz/dxⁿ ≠ 0
i. If n is odd, x0 is an inflection point.
ii. If n is even:
• x0 is a minimum if dⁿz/dxⁿ > 0
• x0 is a maximum if dⁿz/dxⁿ < 0
7
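Not in the original slides: a minimal Python/sympy sketch of this higher-order derivative test; classify_stationary_point is a hypothetical helper name.

```python
# Sketch of the higher-order derivative test at a stationary point x0.
# Assumes sympy is available; classify_stationary_point is a made-up name.
import sympy as sp

def classify_stationary_point(f, x, x0, max_order=10):
    """Find the first nonzero derivative at x0 and apply the n-odd/n-even rule."""
    for n in range(2, max_order + 1):
        d = sp.diff(f, x, n).subs(x, x0)
        if d != 0:
            if n % 2 == 1:                    # odd order -> inflection point
                return "inflection"
            return "minimum" if d > 0 else "maximum"
    return "inconclusive"

x = sp.symbols("x")
print(classify_stationary_point(x**4, x, 0))  # minimum: n = 4 (even), d4z/dx4 = 24 > 0
print(classify_stationary_point(x**3, x, 0))  # inflection: n = 3 (odd)
```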
Two-variable
function
https://www.youtube.com/watch?v=_Ffcr98c7EE
10
Two-variable function
• Necessary condition for (x0, y0) to be an extreme
point of z = f(x, y) is:
∂z/∂x = 0 and ∂z/∂y = 0
– satisfied at inflection/saddle points.
What is the sufficient condition?
11
Two-variable function
• Sufficient condition: Given (x0, y0), a stationary point
of f(x, y), then:
• if (∂²z/∂x²)(∂²z/∂y²) − (∂²z/∂x∂y)² > 0:
– Local minimum if ∂²z/∂x² > 0
– Local maximum if ∂²z/∂x² < 0
• if (∂²z/∂x²)(∂²z/∂y²) − (∂²z/∂x∂y)² < 0: saddle point
• if (∂²z/∂x²)(∂²z/∂y²) − (∂²z/∂x∂y)² = 0: inconclusive…
12
Two-variable function
Example: Consider the function:
z = x² + y² − a·xy,  a ∈ ℝ
Compute the values of a for which (0,0) is
– A minimum:
– A saddle point:
13
Two-variable function
Example: Consider the function:
z = x² + y² − a·xy,  a ∈ ℝ
Compute the values of a for which (0,0) is
– A minimum: ∂²z/∂x² =? > 0 and (∂²z/∂x²)(∂²z/∂y²) − (∂²z/∂x∂y)² =? > 0
– A saddle point: (∂²z/∂x²)(∂²z/∂y²) − (∂²z/∂x∂y)² =? < 0
14
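Not in the original slides: a sympy sketch that fills in this check symbolically. It finds the discriminant 4 − a², i.e. (0,0) is a minimum for |a| < 2 and a saddle point for |a| > 2 (the test is inconclusive at a = ±2).

```python
# Second-derivative test for z = x^2 + y^2 - a*x*y at (0, 0), via sympy.
import sympy as sp

x, y, a = sp.symbols("x y a", real=True)
z = x**2 + y**2 - a*x*y

zxx = sp.diff(z, x, 2)                    # 2
zyy = sp.diff(z, y, 2)                    # 2
zxy = sp.diff(z, x, y)                    # -a
D = sp.expand(zxx*zyy - zxy**2)
print(D)                                  # 4 - a**2
print(sp.solve(D > 0, a))                 # (-2 < a) & (a < 2): minimum at (0, 0)
```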
Two-variable function (alternative form)
• Principal minor: Let A be an n×n matrix. A k×k
submatrix of A formed by deleting n−k rows of A, and
the same n−k columns of A, is called a principal
submatrix of A. The determinant of a principal
submatrix of A is called a principal minor of A.
• Example: Compute the principal minors of
a11 a12 a13
A = a21 a22 a23
a31 a32 a33
15
Two-variable function (alternative form)
a11 a12 a13
A = a21 a22 a23
a31 a32 a33
• Solution:
– 1st order: a11 , a22 , and a33
– 2nd order: det[a11 a12; a21 a22], det[a11 a13; a31 a33], and det[a22 a23; a32 a33]
– 3rd order: det A, the determinant of the full 3×3 matrix
17
Two-variable function (alternative form)
• Leading principal minor: Let A be an n×n matrix.
The kth order principal sub-matrix of A obtained by
deleting the last n−k rows and columns of A is called
the kth order leading principal submatrix of A, and its
determinant is called the kth order leading principal
minor of A.
• Example: Compute the leading principal minors of
a11 a12 a13
A = a21 a22 a23
a31 a32 a33
18
Two-variable function (alternative form)
a11 a12 a13
A = a21 a22 a23
a31 a32 a33
• Solution:
– 1st order: a11 (leading), a22, and a33
– 2nd order: det[a11 a12; a21 a22] (leading), det[a11 a13; a31 a33], and det[a22 a23; a32 a33]
– 3rd order: det A (leading)
19
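Not in the original slides: a numpy sketch of both definitions; principal_minors and leading_principal_minors are made-up helper names.

```python
# Enumerate principal minors (delete the same rows/columns) and leading
# principal minors (keep the first k rows/columns) of a square matrix.
from itertools import combinations
import numpy as np

def principal_minors(A):
    n = A.shape[0]
    return [np.linalg.det(A[np.ix_(idx, idx)])
            for k in range(1, n + 1)
            for idx in combinations(range(n), k)]

def leading_principal_minors(A):
    return [np.linalg.det(A[:k, :k]) for k in range(1, A.shape[0] + 1)]

A = np.array([[2.0, -1.0, 0.0],
              [-1.0, 2.0, -1.0],
              [0.0, -1.0, 2.0]])
print(leading_principal_minors(A))   # [2.0, 3.0, 4.0] -> all > 0
```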
Two-variable function
Definiteness of a matrix A: necessary and sufficient conditions.
• Positive definite: all leading principal minors of A are > 0.
• Positive semi-definite: all principal minors of A are ≥ 0.
• Negative definite: the leading principal minors alternate in sign: |A1| < 0, |A2| > 0, |A3| < 0, etc.
• Negative semi-definite: all principal minors of odd order are ≤ 0 and all principal minors of even order are ≥ 0.
• Indefinite: none of the above criteria is satisfied.
20
Two-variable function
Definiteness of A and the corresponding type of stationary point:
• Positive definite (all leading principal minors > 0) ⇒ Minimum.
• Positive semi-definite (all principal minors ≥ 0) ⇒ Inconclusive.
• Negative definite (leading principal minors alternate in sign: |A1| < 0, |A2| > 0, |A3| < 0, etc.) ⇒ Maximum.
• Negative semi-definite (odd-order principal minors ≤ 0, even-order ≥ 0) ⇒ Inconclusive.
• Indefinite (none of the above) ⇒ Saddle.
21
Two-variable function (alternative form)
• Sufficient condition: Given (x0, y0), a stationary point
of f(x, y), define the Hessian matrix H:
H = [∂²z/∂x²   ∂²z/∂x∂y
     ∂²z/∂y∂x  ∂²z/∂y²]
• 1st order leading principal minor: D1 = ∂²z/∂x²
• 2nd order leading principal minor: D2 = det H = (∂²z/∂x²)(∂²z/∂y²) − (∂²z/∂x∂y)(∂²z/∂y∂x)
22
Two-variable function
• Sufficient condition: Given (x0, y0), a stationary point
of f(x, y), then:
– H(x0, y0) positive definite ⇒ minimum.
– A) If and only if all leading principal minors of the matrix are positive, the matrix is positive definite:
– D1 > 0 and D2 > 0, i.e.
– ∂²z/∂x² > 0 and (∂²z/∂x²)(∂²z/∂y²) − (∂²z/∂x∂y)² > 0
23
Two-variable function
• Sufficient condition: Given (x0, y0), a stationary point
of f(x, y), then:
– H(x0, y0) negative definite ⇒ maximum.
– B) If and only if the kth order leading principal minor of the matrix has sign (−1)ᵏ, the matrix is negative definite:
– D1 < 0 and D2 > 0, i.e.
– ∂²z/∂x² < 0 and (∂²z/∂x²)(∂²z/∂y²) − (∂²z/∂x∂y)² > 0
24
Two-variable function
• Sufficient condition: Given (x0, y0), a stationary point
of f(x, y), then:
– H(x0, y0) indefinite ⇒ saddle point.
– C) If some kth order leading principal minor is non-zero but does not fit either of the sign patterns above, the matrix is indefinite:
– Neither A nor B holds and D2 ≠ 0, i.e.
– (∂²z/∂x²)(∂²z/∂y²) − (∂²z/∂x∂y)² < 0
• The test is inconclusive if none of the above holds (D2 = 0).
25
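Not in the original slides: the three cases A-C reduce to a few sign checks on D1 and D2, as in this sketch (classify_2d is a made-up name).

```python
# Classify a stationary point of f(x, y) from the Hessian entries,
# using D1 = zxx and D2 = zxx*zyy - zxy**2.
def classify_2d(zxx, zyy, zxy):
    d2 = zxx*zyy - zxy**2
    if d2 > 0:
        return "minimum" if zxx > 0 else "maximum"   # cases A and B
    if d2 < 0:
        return "saddle"                              # case C
    return "inconclusive"                            # D2 = 0

# z = x^2 + y^2 - a*x*y at (0, 0): zxx = zyy = 2, zxy = -a
print(classify_2d(2, 2, -1))   # minimum (a = 1)
print(classify_2d(2, 2, -3))   # saddle  (a = 3)
```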
Multi-variable function
• Necessary condition for 𝐗𝟎 = (x1, x2, …, xn) to be
an extreme point of z = f(x1, x2, …, xn) = f(𝐗):
∂z/∂x1 = 0, ∂z/∂x2 = 0, …, ∂z/∂xn = 0 at 𝐗𝟎
or
∇z(𝐗𝟎) = 𝟎
– satisfied at inflection/saddle points.
What is the sufficient condition?
26
Multi-variable function
• Sufficient condition: Given 𝐗𝟎 = (x1, x2, …, xn), a
stationary point of z = f(𝐗), form the Hessian matrix H:
H = [∂²z/∂x1²    ∂²z/∂x1∂x2  …  ∂²z/∂x1∂xn
     ∂²z/∂x2∂x1  ∂²z/∂x2²    …  ∂²z/∂x2∂xn
     ⋮           ⋮               ⋮
     ∂²z/∂xn∂x1  ∂²z/∂xn∂x2  …  ∂²z/∂xn²]
27
Definiteness of the matrix A (here, the Hessian H) and the corresponding type of stationary point:
• Positive definite (all leading principal minors > 0) ⇒ Minimum.
• Positive semi-definite (all principal minors ≥ 0) ⇒ Inconclusive.
• Negative definite (leading principal minors alternate in sign: |A1| < 0, |A2| > 0, |A3| < 0, etc.) ⇒ Maximum.
• Negative semi-definite (odd-order principal minors ≤ 0, even-order ≥ 0) ⇒ Inconclusive.
• Indefinite (none of the above) ⇒ Saddle.
28
Example 20-2.
Determine the extreme points of the following functions:
(a) f(𝐗) = x1³ + x2³ − 3x1x2
(b) f(𝐗) = 2x1² + x2² + x3² + 6(x1 + x2 + x3) + 2x1x2x3 [HW]
29
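Not in the original slides: a sympy sketch for part (a). It locates the stationary points (0,0) and (1,1) and classifies them with the Hessian: (0,0) gives D2 = −9 < 0 (saddle) and (1,1) gives D1 = 6 > 0, D2 = 27 > 0 (minimum).

```python
# Stationary points and Hessian test for f = x1^3 + x2^3 - 3*x1*x2.
import sympy as sp

x1, x2 = sp.symbols("x1 x2", real=True)
f = x1**3 + x2**3 - 3*x1*x2

grad = [sp.diff(f, v) for v in (x1, x2)]
points = sp.solve(grad, (x1, x2), dict=True)     # (0, 0) and (1, 1)
H = sp.hessian(f, (x1, x2))

for p in points:
    Hp = H.subs(p)
    print(p, "D1 =", Hp[0, 0], "D2 =", Hp.det())
```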
Example 20-3.
Verify that the function:
f(𝐗) = 2x1x2x3 − 4x1x3 − 2x2x3 + x1² + x2² + x3² − 2x1 − 4x2 + 4x3
has the stationary points:
(0, 3, 1), (0, 1, −1), (1, 2, 0), (2, 1, 1), and (2, 3, −1).
Use the sufficiency condition to identify the extreme points at:
– (1, 2, 0)
– (0, 1, −1)
30
Multi-variable function
• Notes
– Matrices have to be real and symmetric when considering
their definiteness.
– Hessian is real and symmetric.
– All positive definite matrices are positive semi-definite.
– All negative definite matrices are negative semi-definite.
31
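Not in the original slides: since the Hessian is real and symmetric, its definiteness can equivalently be read off the signs of its eigenvalues, as in this sketch (definiteness is a made-up name).

```python
# Eigenvalue view of definiteness for a real symmetric matrix.
import numpy as np

def definiteness(A, tol=1e-12):
    w = np.linalg.eigvalsh(A)             # real eigenvalues, ascending
    if np.all(w > tol):    return "positive definite"
    if np.all(w < -tol):   return "negative definite"
    if np.all(w >= -tol):  return "positive semi-definite"
    if np.all(w <= tol):   return "negative semi-definite"
    return "indefinite"

print(definiteness(np.array([[2.0, -1.0], [-1.0, 2.0]])))  # positive definite
```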
Direct search
methods
32
Direct search methods
• Apply primarily to strictly unimodal single-variable
functions.
• Identify interval of uncertainty known to include the
optimum solution point.
• Locate the optimum by iteratively narrowing the
interval of uncertainty to a desired level of accuracy.
33
Direct search methods
• A unimodal function is one that has only one peak
(maximum) or valley (minimum) in a given
interval.
34
Direct search methods
• Three search algorithms:
– Ternary
– Dichotomous
– Golden-section
• All three seek to maximize a unimodal f(x) in
a ≤ x ≤ b
that includes the optimum point x*.
• Initial interval of uncertainty I0 = [a, b]
35
Direct search methods
[Figure: a unimodal f(x) plotted over an interval from a to b]
Maximize f(x) in: a ≤ x ≤ b, I0 = [a, b]
36
Direct search methods
• If x1 and x2 are on either side of the maximum:
– if f(x1) > f(x2)
– replace xR with x2
[Figure: f(x1) > f(x2), with xL < x1 < x2 < xR]
37
Direct search methods
• If x1 and x2 are both to the right of the maximum:
f(x1) > f(x2)
– replace xR with x2
[Figure: f(x1) > f(x2), with xL < x1 < x2 < xR]
38
Direct search methods
• If x1 and x2 are on either side of the maximum:
– if f(x1) < f(x2)
– replace xL with x1
[Figure: f(x1) < f(x2), with xL < x1 < x2 < xR]
39
Direct search methods
• If x1 and x2 are both to the left of the maximum:
f(x1) < f(x2)
– replace xL with x1
[Figure: f(x1) < f(x2), with xL < x1 < x2 < xR]
40
Direct search methods
• Fundamental idea (**)
Condition | What to do? | New interval
f(x1) > f(x2) | Replace xR with x2 | (xL, x2)
f(x1) < f(x2) | Replace xL with x1 | (x1, xR)
f(x1) = f(x2) | Replace xL with x1 and xR with x2 | (x1, x2)
41
Ternary search
• Divides the interval into three equal parts.
• Use **
• Next interval = 2/3 of current interval.
• Terminate when the interval width is ___.
[Figure: xL, x1, x2, xR splitting the interval into three segments of length L]
42
Ternary search
• Divides the interval into three equal parts.
• Use **
• Next interval = 2/3 of current interval.
• Terminate when the interval width In ≤ Δ.
x* = (xL + xR)/2
[Figure: xL, x1, x2, xR splitting the interval into three segments of length L]
43
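Not in the original slides: a minimal Python sketch of ternary search for the maximum of a unimodal f on [a, b] (ternary_max and delta are made-up names).

```python
# Ternary search: shrink [a, b] to 2/3 of its width per iteration, rule (**).
def ternary_max(f, a, b, delta=1e-6):
    while b - a > delta:                  # terminate when width <= delta
        x1 = a + (b - a) / 3
        x2 = b - (b - a) / 3
        if f(x1) > f(x2):
            b = x2                        # maximum in (a, x2)
        elif f(x1) < f(x2):
            a = x1                        # maximum in (x1, b)
        else:
            a, b = x1, x2                 # maximum in (x1, x2)
    return (a + b) / 2                    # x* = (xL + xR) / 2

print(ternary_max(lambda x: -(x - 2.0)**2, 0.0, 5.0))  # ~2.0
```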
Dichotomous search
• Divides the interval:
– x1 = (xL + xR)/2 − δ/2
– x2 = (xL + xR)/2 + δ/2
• Use **
• I1 = I0/2 + δ/2
• I2 = I1/2 + δ/2 = I0/4 + δ/4 + δ/2
• In = I0/2ⁿ + δ(1 − 1/2ⁿ)
[Figure: x1 and x2 placed δ apart about the midpoint of (xL, xR)]
44
Dichotomous search
In = δ + I0/2ⁿ − δ/2ⁿ
Note: In ≥ δ
• Interval narrowing slows appreciably as In → δ.
• When to stop?
[Figure: x1 and x2 placed δ apart about the midpoint of (xL, xR)]
45
Dichotomous search
In = δ + I0/2ⁿ − δ/2ⁿ
Note: In ≥ δ
• Interval narrowing slows appreciably as In → δ.
• When to stop?
δ + I0/2ⁿ − δ/2ⁿ ≤ Δ
[Figure: x1 and x2 placed δ apart about the midpoint of (xL, xR)]
46
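Not in the original slides: a dichotomous-search sketch under the assumption δ < Δ (dichotomous_max is a made-up name).

```python
# Dichotomous search: probe delta apart around the midpoint, rule (**).
def dichotomous_max(f, a, b, delta=1e-7, Delta=1e-5):
    while b - a > Delta:                  # stop once I_n <= Delta (Delta > delta)
        mid = (a + b) / 2
        x1, x2 = mid - delta / 2, mid + delta / 2
        if f(x1) > f(x2):
            b = x2                        # maximum in (a, x2)
        else:
            a = x1                        # maximum in (x1, b)
    return (a + b) / 2

print(dichotomous_max(lambda x: -(x - 2.0)**2, 0.0, 5.0))  # ~2.0
```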
Golden-section search
• Save computations by reusing discarded values.
• Divides the interval into three parts:
x1 = xR − α(xR − xL)
x2 = xL + α(xR − xL)
0 < α < 1
[Figure: xL, x1, x2, xR on a unimodal f]
47
Golden-section search
• Save computations by reusing discarded values.
• Consider Inew = (xL, x2).
[Figure: the new interval 𝐈𝐧𝐞𝐰 = (xL, x2) highlighted]
48
Golden-section search
• Save computations by reusing discarded values.
• Consider Inew = (xL, x2).
• What should α be so that x2,new = x1?
α = (−1 + √5)/2 ≈ 0.618
[Figure: x1,new and x2,new = x1 inside 𝐈𝐧𝐞𝐰]
49
Golden-section search
• Divides the interval:
x1 = xR − α(xR − xL)
x2 = xL + α(xR − xL)
α = (−1 + √5)/2 ≈ 0.618
• Use **
• Next interval = α times current interval.
• Terminate when In ≤ Δ.
[Figure: xL, x1, x2, xR on a unimodal f]
50
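Not in the original slides: a golden-section sketch (golden_max is a made-up name). Each iteration reuses one interior point and its function value, so only one new evaluation of f is needed per iteration.

```python
# Golden-section search: the interval shrinks by alpha ~ 0.618 per iteration.
ALPHA = (5 ** 0.5 - 1) / 2                # alpha = (-1 + sqrt(5)) / 2

def golden_max(f, a, b, Delta=1e-6):
    x1, x2 = b - ALPHA * (b - a), a + ALPHA * (b - a)
    f1, f2 = f(x1), f(x2)
    while b - a > Delta:
        if f1 > f2:                       # maximum in (a, x2)
            b, x2, f2 = x2, x1, f1        # old x1 becomes the new x2
            x1 = b - ALPHA * (b - a)
            f1 = f(x1)
        else:                             # maximum in (x1, b)
            a, x1, f1 = x1, x2, f2        # old x2 becomes the new x1
            x2 = a + ALPHA * (b - a)
            f2 = f(x2)
    return (a + b) / 2

print(golden_max(lambda x: -(x - 2.0)**2, 0.0, 5.0))  # ~2.0
```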
Golden-section vs dichotomous search
• Golden-section converges more rapidly than
dichotomous. Why?
• Less computation required in Golden-section. Why?
51
Golden-section vs dichotomous search
• In the dichotomous method, narrowing of the uncertainty
interval slows down appreciably as In → δ.
• Golden-section requires half the computations: it recycles
one set of computations from the preceding iteration, so
each iteration needs only one new function evaluation.
53
HW
• Derive the value of the golden ratio (α).
• Do example 21.1-1
54
Notes
A unimodal function f(x) on [a, b] satisfies, for some
unique x* ∈ [a, b]:
– f(x) is strictly increasing (decreasing) for x ≤ x* and strictly
decreasing (increasing) for x ≥ x*.
– x* is the maximum (minimum).
55
Steepest Ascent Method
• For optimizing twice continuously differentiable
functions.
• Generate successive points in the direction of the
gradient of the function.
56
Gradients and level curves
[Figure: level curves with gradient vectors at points P1, P2, P3 in the X–Y plane]
https://math.illinois.edu/research/igl/projects/spring/2019/interactive-visualizations-calculus-and-probability 57
Gradients and level curves
https://mathinsight.org/directional_derivative_gradient_introduction
58
Gradients and level curves
• Gradient is always perpendicular to the level curves.
• When the level curves are close together, the gradient is
large.
• What happens to the gradient at the tops of the
mountains?
59
https://mathinsight.org/directional_derivative_gradient_introduction
60
[Figure: first steepest-ascent point P1 on the level curves] When to stop?
https://mathinsight.org/directional_derivative_gradient_introduction 61
[Figure: steepest-ascent points P1 and P2]
https://mathinsight.org/directional_derivative_gradient_introduction 62
[Figure: steepest-ascent points P1, P2, and P3]
https://mathinsight.org/directional_derivative_gradient_introduction 63
Example 21.1-2
f(x1, x2) = 4x1 + 6x2 − 2x1² − 2x1x2 − 2x2²
64
Example 21.1-2
[Figure: level curves of f, starting point 𝑿𝟎 = (1, 1), gradient 𝛁𝒇(𝑿𝟎) = (−2, 0)]
f(x1, x2) = 4x1 + 6x2 − 2x1² − 2x1x2 − 2x2²
65
Example 21.1-2
[Figure: level curves of f with steepest-ascent points 𝑷𝟎 and 𝑷𝟏]
f(x1, x2) = 4x1 + 6x2 − 2x1² − 2x1x2 − 2x2²
66
Example 21.1-2
[Figure: level curves of f with steepest-ascent points 𝑷𝟎 and 𝑷𝟏, continued]
f(x1, x2) = 4x1 + 6x2 − 2x1² − 2x1x2 − 2x2²
67
Steepest Ascent Method
• f(𝐗) is to be maximized; let 𝐗𝒌 be the current point.
• Next point:
𝐗𝒌+𝟏 = 𝐗𝒌 + rk 𝛁f(𝐗𝒌)
– rk is the optimal step size at 𝐗𝒌.
• rk is the value of r that maximizes the single-variable function:
h(r) = f[𝐗𝒌 + r 𝛁f(𝐗𝒌)]
• Terminates when 𝐗𝒌+𝟏 ≈ 𝐗𝒌.
68
Steepest Descent Method
• f(𝐗) is to be minimized; let 𝐗𝒌 be the current point.
• Next point:
𝐗𝒌+𝟏 = 𝐗𝒌 − rk 𝛁f(𝐗𝒌)
– rk is the optimal step size at 𝐗𝒌.
• rk is the value of r that minimizes the single-variable function:
h(r) = f[𝐗𝒌 − r 𝛁f(𝐗𝒌)]
• Terminates when 𝐗𝒌+𝟏 ≈ 𝐗𝒌.
69
Example
• Minimize:
f(x1, x2) = x1 − x2 + 2x1² + 2x1x2 + x2²
Start at (0,0)
70
Example
• Minimize:
f(x1, x2) = x1 − x2 + 2x1² + 2x1x2 + x2²
Start at (0, 0)
∇f = (1 + 4x1 + 2x2, −1 + 2x1 + 2x2)
∇f(0, 0) = (1, −1)
𝐗𝟐 = (0, 0) − r(1, −1) = (−r, r)
h(r) = −r − r + 2r² − 2r² + r²
    = r² − 2r
    = (r − 1)² − 1
Minimum at r = 1 ⇒ 𝐗𝟐 = (−1, 1) and 𝛁f(−1, 1) = (−1, −1)
71
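Not in the original slides: a numpy sketch that continues these iterations. Because f is quadratic with constant Hessian H, the exact step has the closed form r = gᵀg / gᵀHg (an assumption used here in place of a symbolic line search); the first step reproduces r = 1 and 𝐗𝟐 = (−1, 1), and the iterates converge to the minimum (−1, 3/2).

```python
# Steepest descent on f = x1 - x2 + 2*x1^2 + 2*x1*x2 + x2^2 from (0, 0).
import numpy as np

H = np.array([[4.0, 2.0],                 # constant Hessian of the quadratic f
              [2.0, 2.0]])

def grad(X):
    x1, x2 = X
    return np.array([1 + 4*x1 + 2*x2, -1 + 2*x1 + 2*x2])

X = np.array([0.0, 0.0])
for _ in range(100):
    g = grad(X)
    if np.linalg.norm(g) < 1e-10:         # X_{k+1} ~ X_k: stop
        break
    r = g @ g / (g @ H @ g)               # exact minimizer of h(r) = f(X - r*g)
    X = X - r * g

print(X)                                  # ~ [-1.0, 1.5]
```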
Example
• Minimize: f(x1, x2) = x1 − x2 + 2x1² + 2x1x2 + x2²
72