The Kuhn-Tucker conditions
Hao Zhang
1. General Case
The general form of a constrained optimization problem is:
minimize f(x),   x ∈ R^n
such that g_j(x) ≥ 0,   j = 1, ..., n_g
          h_j(x) = 0,   j = 1, ..., n_e
In general, this problem may have several local minima. Only under
special circumstances are we sure of the existence of a single global minimum.
The necessary conditions for a minimum of the constrained problem are
obtained by using the Lagrange multiplier method.
We start by considering the special case of equality constraints only.
Using the Lagrange multiplier technique, we define the Lagrangian
function
L(x, λ) = f(x) − ∑_{j=1}^{n_e} λ_j h_j(x)
where the λ_j are the unknown Lagrange multipliers.
The necessary conditions for a stationary point are
∂L/∂x_i = ∂f/∂x_i − ∑_{j=1}^{n_e} λ_j ∂h_j/∂x_i = 0,   i = 1, ..., n
∂L/∂λ_j = −h_j(x) = 0,   j = 1, ..., n_e
These conditions, however, apply only at a regular point, that is, at a
point where the gradients of the constraints are linearly independent. If the
constraint gradients are linearly dependent, some constraints can be
removed without affecting the solution. At a regular point, these two sets of
equations provide n + n_e equations for the n_e Lagrange multipliers and the
n coordinates of the stationary point.
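As an illustration of these stationarity conditions, the following sketch solves the system symbolically with SymPy for a small equality-constrained problem of my own choosing (minimize x1² + x2² subject to x1 + x2 = 1); it is not a problem taken from the text:

import sympy as sp

x1, x2, lam = sp.symbols('x1 x2 lam', real=True)
f = x1**2 + x2**2            # illustrative objective
h = x1 + x2 - 1              # single equality constraint h(x) = 0
L = f - lam*h                # Lagrangian L = f - sum_j lam_j*h_j

# the n + n_e stationarity equations dL/dx_i = 0 and dL/dlam_j = 0
eqs = [sp.diff(L, v) for v in (x1, x2, lam)]
print(sp.solve(eqs, [x1, x2, lam], dict=True))   # x1 = x2 = 1/2, lam = 1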
The situation is somewhat more complicated when inequality
constraints are present. To be able to apply the Lagrange multiplier
method we first transform the inequality constraints to equality constraints
by adding slack variables. That is, the inequality constraints are written as
g_j(x) − t_j² = 0,   j = 1, ..., n_g
where tj is a slack variable which measures how far the jth constraint
is from being critical. We can now form a Lagrangian function
L(x, t, λ) = f(x) − ∑_{j=1}^{n_g} λ_j (g_j(x) − t_j²)
Differentiating the Lagrangian function with respect to x, λ, and t, we
obtain
∂L/∂x_i = ∂f/∂x_i − ∑_{j=1}^{n_g} λ_j ∂g_j/∂x_i = 0,   i = 1, ..., n
∂L/∂λ_j = −g_j(x) + t_j² = 0,   j = 1, ..., n_g
∂L/∂t_j = 2 λ_j t_j = 0,   j = 1, ..., n_g
The last two equations imply that when an inequality constraint is
not critical (so that the corresponding slack variable is non-zero) then the
Lagrange multiplier associated with the constraint is zero.
These three equations are the necessary conditions for a stationary
regular point. Note that for inequality constraints a regular point is one
where the gradients of the active constraints are linearly independent.
These conditions are modified slightly to yield the necessary
conditions for a minimum, which are known as the Kuhn-Tucker conditions.
The Kuhn-Tucker conditions may be summarized as follows:
A point x is a local minimum of an inequality-constrained problem
only if a set of nonnegative λj's can be found such that:
1. The stationarity condition
∂L/∂x_i = ∂f/∂x_i − ∑_{j=1}^{n_g} λ_j ∂g_j/∂x_i = 0,   i = 1, ..., n
is satisfied.
2. The corresponding λj is zero if a constraint is not active.
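These two conditions can be checked numerically at a candidate design. The sketch below is a rough helper of my own (finite-difference gradients, ad hoc tolerances), not a routine from the text:

import numpy as np

def num_grad(fun, x, eps=1e-6):
    # central-difference gradient of a scalar function of a vector x
    g = np.zeros(len(x))
    for i in range(len(x)):
        e = np.zeros(len(x)); e[i] = eps
        g[i] = (fun(x + e) - fun(x - e)) / (2*eps)
    return g

def is_kt_point(f, gs, lams, x, tol=1e-4):
    # condition 1: grad f - sum_j lam_j * grad g_j = 0
    resid = num_grad(f, x) - sum(l*num_grad(g, x) for l, g in zip(lams, gs))
    # condition 2: lam_j = 0 whenever g_j is inactive, and all lam_j >= 0
    slack = all(abs(l*g(x)) < tol for l, g in zip(lams, gs))
    return bool(np.all(np.abs(resid) < tol)) and slack and all(l >= -tol for l in lams)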
Figure 1: A geometrical interpretation of the Kuhn-Tucker conditions
for the case of two constraints.
A geometrical interpretation of the Kuhn-Tucker conditions is
illustrated in Fig 1 for the case of two constraints. ∇g1 and ∇g 2 denote
the gradients of the two constraints which are orthogonal to the respective
constraint surfaces. The vector s shows a typical feasible direction that
does not lead immediately to any constraint violation. For the
two-constraint case we get
−∇f = −(λ1 ∇g1 + λ2 ∇g2)
Assume that we want to determine whether point A is a minimum or
not. To improve the design we proceed from A in a direction s that is
usable and feasible.
To be usable, a small move along the direction should decrease the
objective function, so it must form an acute angle with −∇f . To be feasible,
s should form an obtuse angle with −∇g1 and −∇g 2 .
Clearly from Fig. 1, any vector that forms an acute angle with
−∇f will also form an acute angle with either −∇g1 or −∇g2.
Thus, the Kuhn-Tucker conditions mean that no feasible design with a
reduced objective function can be found in the neighborhood of A.
Mathematically, the condition that a direction s be feasible is written
as
s^T ∇g_j ≥ 0,   j ∈ I_A
where I_A is the set of active constraints. Equality is permitted only for linear
or concave constraints. The condition for a usable direction (one that
decreases the objective function) is
s^T ∇f < 0
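Both direction conditions are straightforward to test numerically at a given design; the gradients and the trial direction below are illustrative numbers only:

import numpy as np

grad_f  = np.array([1.0, 2.0])               # gradient of the objective (illustrative)
grad_gs = [np.array([1.0, 0.0]),             # gradients of the active constraints
           np.array([0.0, 1.0])]
s = np.array([-1.0, 0.2])                    # trial direction

feasible = all(s @ g >= 0 for g in grad_gs)  # s^T grad(g_j) >= 0 for j in I_A
usable   = (s @ grad_f) < 0                  # s^T grad(f) < 0
print(feasible, usable)                      # False True for these numbers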
Multiplying the first Kuhn-Tucker condition by s_i and summing over i, we
obtain
s^T ∇f = ∑_{j=1}^{n_g} λ_j s^T ∇g_j
Since every term on the right-hand side is then nonnegative, s^T ∇f < 0 is
impossible if the λ_j's are nonnegative.
If the Kuhn-Tucker conditions are satisfied at a point, it is impossible
to find a direction with a negative slope of the objective function that does
not violate the constraints. In some cases, though, it is possible to move in
a direction that is tangent to the active constraints and perpendicular to
the gradient of the objective, that is,
s^T ∇f = s^T ∇g_j = 0,   j ∈ I_A
The effect of such a move on the objective function and constraints
can be determined only from higher derivatives. In some cases, a move in
this direction could reduce the objective function without violating the
constraints even though the Kuhn-Tucker conditions are met. Therefore,
the Kuhn-Tucker conditions are necessary but not sufficient for optimality.
The Kuhn-Tucker conditions are sufficient when the number of
active constraints is equal to the number of design variables. In this case
the above equation cannot be satisfied with s ≠ 0, because the gradients
∇g_j of the active constraints then span n linearly independent directions.
When the number of active constraints is not equal to the number of
design variables, sufficient conditions for optimality require the second
derivatives of the objective function and constraints. A sufficient condition
for optimality is that the Hessian matrix of the Lagrangian function is
positive definite in the subspace tangent to the active constraints. For
example, if we take the case of equality constraints, the Hessian matrix of
the Lagrangian is
∇²L = ∇²f − ∑_{j=1}^{n_e} λ_j ∇²h_j
The sufficient condition for optimality is that
s^T (∇²L) s > 0
for all s for which s^T ∇h_j = 0,   j = 1, ..., n_e
When inequality constraints are present, the vector s also needs to
be orthogonal to the gradients of the active constraints with positive
Lagrange multipliers. For active constraints with zero Lagrange
multipliers, s must satisfy
s^T ∇g_j ≥ 0   when g_j = 0 and λ_j = 0
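Numerically, the tangent-subspace test can be carried out by restricting the Hessian of the Lagrangian to a basis of the subspace orthogonal to the active-constraint gradients. The helper below is my own sketch; for simplicity it treats all active constraints as having positive multipliers:

import numpy as np

def sufficient_condition(hess_L, active_grads, tol=1e-10):
    # rows of A are the gradients of the active constraints
    A = np.array(active_grads, dtype=float)
    _, sv, vt = np.linalg.svd(A)
    Z = vt[np.sum(sv > tol):].T          # columns of Z span {s : A s = 0}
    if Z.size == 0:                      # as many independent active constraints as variables
        return True
    reduced = Z.T @ hess_L @ Z           # Hessian of L restricted to the tangent subspace
    return bool(np.all(np.linalg.eigvalsh(reduced) > 0))

For active constraints with zero multipliers, the weaker requirement s^T ∇g_j ≥ 0 stated above would have to be checked separately.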
2. Example
Find the minimum of
f = −x1³ − 2x2² + 10x1 − 6 − 2x2³
subject to
g1 = 10 − x1x2 ≥ 0,
g2 = x1 ≥ 0,
g3 = 10 − x2 ≥ 0.
The Kuhn-Tucker conditions are
−3x1² + 10 + λ1x2 − λ2 = 0
−4x2 − 6x2² + λ1x1 + λ3 = 0
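These two equations can be reproduced symbolically; a minimal SymPy sketch:

import sympy as sp

x1, x2, l1, l2, l3 = sp.symbols('x1 x2 lam1 lam2 lam3', real=True)
f = -x1**3 - 2*x2**2 + 10*x1 - 6 - 2*x2**3
g = [10 - x1*x2, x1, 10 - x2]
L = f - (l1*g[0] + l2*g[1] + l3*g[2])
print(sp.diff(L, x1))   # = -3*x1**2 + 10 + lam1*x2 - lam2
print(sp.diff(L, x2))   # = -4*x2 - 6*x2**2 + lam1*x1 + lam3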
We have to check for all possibilities of active constraints.
The simplest case is when no constraints are active, λ1 = λ2 = λ3 = 0.
We get x1 = 1.826, x2 = 0, f = 6.17. The Hessian matrix of the Lagrangian,
∇²L = [[−6x1, λ1], [λ1, −4 − 12x2]],
is clearly negative definite, so this point is a maximum.
We next assume that the first constraint is active, x1x2 = 10, so that x1 ≠ 0;
then g2 is inactive and therefore λ2 = 0. We have two possibilities for the
third constraint. If it is active, we get x1 = 1, x2 = 10, λ1 = −0.7, λ3 = 639.3;
since λ1 is negative, this point is neither a minimum nor a maximum.
If the third constraint is not active, λ3 = 0, and we obtain the following
equations:
−3x1² + 10 + λ1x2 = 0
−4x2 − 6x2² + λ1x1 = 0
x1x2 = 10
The only solution for these equations that satisfies the constraints on
x1 and x2 is
x1=3.847, x2=2.599, λ1=13.24, f=-73.08.
This point satisfies the KT conditions for a minimum. However, the
Hessian of the Lagrangian at that point,
∇²L = [[−23.08, 13.24], [13.24, −35.19]],
is negative definite, so that it cannot satisfy the sufficiency
condition. In fact, an examination of the function f at neighboring points
along x1 x2 = 10 reveals that the point is not a minimum.
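As a numerical check of this case, the three equations above can be solved directly and the Hessian inspected; the starting guess below is my own choice:

import sympy as sp

x1, x2, lam1 = sp.symbols('x1 x2 lam1', real=True)
eqs = [-3*x1**2 + 10 + lam1*x2,      # dL/dx1 = 0
       -4*x2 - 6*x2**2 + lam1*x1,    # dL/dx2 = 0
       x1*x2 - 10]                   # g1 active
sol = sp.nsolve(eqs, [x1, x2, lam1], [4, 3, 12])
print(sol)                           # approx (3.847, 2.599, 13.24)

hessL = sp.Matrix([[-6*x1, lam1], [lam1, -4 - 12*x2]])
print(hessL.subs(list(zip([x1, x2, lam1], sol))).eigenvals())   # both eigenvalues negative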
Next we consider the possibility that g1 is not active, so that λ1=0, and
−3x1² + 10 − λ2 = 0
−4x2 − 6x2² + λ3 = 0
We have already considered the possibility of both λ's being zero, so
we only need to consider the three remaining possibilities: one of these
Lagrange multipliers nonzero, or both nonzero. The first case is
λ2 ≠ 0, λ3 = 0, so that g2 = 0, and we get
x1=0, x2=0, λ2=10, f=-6 or x1=0, x2=-2/3, λ2=10, f=-6.99.
Both points satisfy the KT conditions for a minimum, but not the
sufficiency conditions. In fact, the vectors tangent to the active constraint
(x1 = 0 is the only one) have the form s^T = (0, a), and it is easy to check that
s^T(∇²L)s < 0. It is also easy to check that these points are indeed not minima
by reducing x2 slightly.
The next case is λ2 = 0, λ3 ≠ 0 , so that g3=0. We get
x1=1.826, x2=10, λ3=640, f=-2194
This point satisfies the KT conditions, but it is not a minimum either.
It is easy to check that ∇²L is negative definite in this case, so the
sufficiency condition cannot be satisfied.
Finally, we consider the case in which both g2 and g3 are active, x1 = 0,
x2 = 10, which gives λ2 = 10, λ3 = 640, f = −2206. Now the KT conditions are
satisfied, and the number of active constraints is equal to the number of
design variables, so this point is the minimum.
3. Convex Problems
There is a class of problems, namely convex problems, for which the
Kuhn-Tucker conditions are not only necessary but also sufficient for a
global minimum.
A set of points S is convex whenever the entire line segment
connecting two points that are in S is also in S. That is
If x1, x2 ∈ S, then αx1 + (1 − α)x2 ∈ S,   0 < α < 1
A function is convex if
f[αx2 + (1 − α)x1] ≤ αf(x2) + (1 − α)f(x1),   0 < α < 1
It can be shown that a function of n variables is convex if its matrix
of second derivatives is positive semi-definite.
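For a twice-differentiable function this is easy to check numerically; the quadratic below, f = x1² + x1x2 + x2², is an illustrative choice with a constant Hessian:

import numpy as np

H = np.array([[2.0, 1.0],    # Hessian of x1**2 + x1*x2 + x2**2
              [1.0, 2.0]])
print(np.all(np.linalg.eigvalsh(H) >= 0))   # True, so the function is convex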
A convex optimization problem has a convex objective function and a
convex feasible domain. It can be shown that the feasible domain is
convex if all the inequality constraints gj are concave and the equality
constraints are linear. A convex optimization problem has only one
minimum, and the Kuhn-Tucker conditions are sufficient to establish it.
Most optimization problems encountered in practice cannot be shown
to be convex. However, the theory of convex programming is still very
important in structural optimization, as we often approximate optimization
problems by a series of convex approximations.