INEQUALITY CONSTRAINTS
Introduction of Slack Variables
• Consider the very general situation in which we have a nonlinear objective
function, nonlinear equality, and nonlinear inequality constraints.
• The simplest way to handle inequality constraints is to convert them to equality
constraints using slack variables and then use the Lagrange theory.
• Consider the inequality constraints
h_j(x) ≥ 0,   j = 1, 2, …, r
and define the real-valued slack variables θ_j such that
θ_j² = h_j(x) ≥ 0,   j = 1, 2, …, r
but at the expense of introducing r new variables.
• If we now consider the general problem written as
minimize_x  f(x)                                   (1)
subject to  h_j(x) ≥ 0,   j = 1(1)r                (2)
• Introducing the slack variables:
h_j(x) – θ_j² = 0,   j = 1(1)r
the Lagrangian is written as:
L(x, λ, θ) = f(x) + Σ_{j=1}^{r} λ_j ( h_j(x) – θ_j² )     (3)
• The necessary conditions for an optimum are:
∂L/∂x_i = ∂f/∂x_i + Σ_{j=1}^{r} λ_j ∂h_j/∂x_i = 0,   i = 1(1)n     (4)
∂L/∂λ_j = h_j(x) – θ_j² = 0,   j = 1(1)r                           (5)
∂L/∂θ_j = –2 λ_j θ_j = 0,   j = 1(1)r                              (6)
where all conditions are evaluated at x = x*, λ = λ*, θ = θ*.
• From the last expression (6), it is obvious that either λ_j* = 0 or θ_j* = 0 or both.
• Case 1: λ_j* = 0 , θ_j* ≠ 0
In this case, the constraint h_j(x) ≥ 0 is ignored since h_j(x*) = (θ_j*)² > 0
(i.e. the constraint is not binding).
If all λ_j* = 0 , then (4) implies that ∇f(x*) = 0 , which means that the
solution is the unconstrained minimum.
• Case 2: θ_j* = 0 , λ_j* ≠ 0
In this case, we have h_j(x*) = 0 , which means that the optimal solution is on
the boundary of the j-th constraint.
Since λ_j* ≠ 0 , this implies that ∇f(x*) ≠ 0 and therefore we are not at the
unconstrained minimum.
• Case 3: θ_j* = 0 and λ_j* = 0 for all j.
In this case, we have h_j(x*) = 0 for all j and ∇f(x*) = 0 .
Therefore, the boundary passes through the unconstrained optimum which is
also the constrained optimum.
Example: Now consider the problem
minimize_x  f(x) = ( x – a )² + b
subject to:  x ≥ c
[Figure: sketch of the constrained one-dimensional problem — the parabola f(x) = ( x – a )² + b with minimum value b at x = a , and the feasible region x ≥ c .]
• The location of the minimum depends on whether the unconstrained minimum
lies inside the feasible region.
• If c > a then the minimum lies at x = c , which is the boundary of the feasible
region defined by x ≥ c .
• If c ≤ a then the minimum lies at the unconstrained minimum, x = a .
• Introducing a single slack variable, θ² = x – c ≥ 0 :
x – c – θ² = 0
and we can write the Lagrangian as
L(x, λ, θ) = ( x – a )² + b + λ ( x – c – θ² )
where λ is the Lagrange multiplier.
∂L/∂x = 2 ( x* – a ) + λ* = 0          (7)
∂L/∂λ = x* – c – (θ*)² = 0             (8)
∂L/∂θ = –2 λ* θ* = 0                   (9)
• In general, we need to know how c and a compare.
• Case 1: From (9), assume λ* = 0 , θ* ≠ 0 .
Therefore, from (7), x* = a , and thus from (8), a – c – (θ*)² = 0 , which gives
(θ*)² = a – c ; hence θ* is real only for c ≤ a .
Now since λ* = 0 we have
L(x*, λ*, θ*) = f(x*)   and   ∂f/∂x |_{x = x*} = 0
This tells us that the unconstrained minimum is the constrained minimum.
• Case 2: Now let us assume that λ* ≠ 0 , θ* = 0 .
From (8) we have x * = c and from (7) λ* = 2 ( a – c ) .
Since λ* ≠ 0 , and the previous case covered c ≤ a , this case corresponds to c > a .
• Case 3: For the case λ* = θ* = 0 , (7) tells us that x* – a = 0 and therefore
x* = a .
From (8) we have x* – c = 0 and therefore x* = a = c . The unconstrained
minimum lies on the boundary since, from (7),
∂L/∂x |_{x = x*} = ∂f/∂x |_{x = x*} = 0 .
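The two regimes above can be checked numerically. A minimal sketch using SciPy's bounded scalar minimizer (the values a = 1, b = 2 and the two choices of c are illustrative assumptions; the upper bound c + 100 simply stands in for +∞):

```python
from scipy.optimize import minimize_scalar

def solve(a, b, c):
    """Minimize f(x) = (x - a)**2 + b subject to x >= c."""
    res = minimize_scalar(lambda x: (x - a)**2 + b,
                          bounds=(c, c + 100.0), method="bounded")
    return res.x

a, b = 1.0, 2.0
print(solve(a, b, c=3.0))   # c > a: minimum on the boundary, x* ~ c = 3
print(solve(a, b, c=-1.0))  # c <= a: unconstrained minimum, x* ~ a = 1
```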
Example: As a two dimensional example, consider
minimize_x  f(x) = ( x₁ – 3 )² + 2 ( x₂ – 5 )²
subject to:  g(x) = 2x₁ + 3x₂ – 5 ≤ 0
[Figure: contours of f(x) = ( x₁ – 3 )² + 2 ( x₂ – 5 )² and the feasible region below the line g(x) = 2x₁ + 3x₂ – 5 = 0 .]
• Unless one were to draw a very accurate contour plot, it is hard to find the
minimum by such a graphical method.
• It is obvious from the graph though, that the minimum will lie on the line
g(x) = 0 .
• We introduce a single slack variable, θ , and construct the Lagrangian as
L(x, λ, θ) = ( x₁ – 3 )² + 2 ( x₂ – 5 )² + λ ( 2x₁ + 3x₂ – 5 + θ² ) .
• The inequality constraint was changed to the equality constraint g(x) + θ² = 0 ,
using the slack variable θ² = – g(x) ≥ 0 .
• The necessary conditions become
∂L/∂x₁ = 2 ( x₁* – 3 ) + 2λ* = 0                 (10)
∂L/∂x₂ = 4 ( x₂* – 5 ) + 3λ* = 0                 (11)
∂L/∂θ = 2 θ* λ* = 0                              (12)
∂L/∂λ = 2x₁* + 3x₂* – 5 + (θ*)² = 0              (13)
From (10) and (11):
x₁* = 3 – λ*
x₂* = 5 – (3 ⁄ 4) λ*
substituting these expressions in (13) we have:
2 ( 3 – λ* ) + 3 ( 5 – (3 ⁄ 4) λ* ) – 5 + (θ*)² = 0
16 – (17 ⁄ 4) λ* + (θ*)² = 0 .
If λ* = 0 then θ* will be complex. If θ* = 0 then λ* = 64 ⁄ 17 and therefore
x₁* = –13 ⁄ 17 ,   x₂* = 37 ⁄ 17
θ* = 0 means there is no slack in the constraint, as expected from the plot.
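The same answer can be recovered numerically as a sanity check; a minimal sketch using SciPy's SLSQP solver (note that SciPy's "ineq" convention expects constraints written as fun(x) ≥ 0):

```python
import numpy as np
from scipy.optimize import minimize

f = lambda x: (x[0] - 3)**2 + 2*(x[1] - 5)**2
# g(x) = 2*x1 + 3*x2 - 5 <= 0 is rewritten as 5 - 2*x1 - 3*x2 >= 0 for SLSQP.
cons = [{"type": "ineq", "fun": lambda x: 5 - 2*x[0] - 3*x[1]}]

res = minimize(f, x0=np.zeros(2), method="SLSQP", constraints=cons)
print(res.x)  # approximately (-13/17, 37/17) = (-0.7647, 2.1765)
```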
The Kuhn-Tucker Theorem
• The Kuhn-Tucker theorem gives the necessary conditions for an optimum of a nonlinear
objective function constrained by a set of nonlinear inequality constraints.
• The general problem is written as
minimize_x  f(x) ,   x ∈ ℝⁿ
subject to:  g_i(x) ≤ 0 ,   i = 1, 2, …, r
If we had an equality constraint, then we could introduce two inequality
constraints in its place.
For instance, if it was required that h(x) = 0 , then we could just impose
h(x) ≤ 0 and h(x) ≥ 0 , i.e. – h(x) ≤ 0 .
• Now assume that f(x) and the g_i(x) are differentiable functions; the Lagrangian is:
L(x, λ) = f(x) + Σ_{i=1}^{r} λ_i g_i(x)
The necessary conditions for x* to be the solution to the above problem are:
∂f(x*)/∂x_j + Σ_{i=1}^{r} λ_i* ∂g_i(x*)/∂x_j = 0 ,   j = 1, 2, …, n     (14)
g_i(x*) ≤ 0 ,   i = 1(1)r                                               (15)
λ_i* g_i(x*) = 0 ,   i = 1(1)r                                          (16)
λ_i* ≥ 0 ,   i = 1(1)r                                                  (17)
• These are known as the Kuhn-Tucker stationary conditions; written compactly
as:
∇_x L(x*, λ*) = 0                 (18)
∇_λ L(x*, λ*) = g(x*) ≤ 0         (19)
(λ*)ᵀ g(x*) = 0                   (20)
λ* ≥ 0                            (21)
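Conditions (18)–(21) can be checked mechanically at a candidate point. A minimal sketch, reusing the earlier example point x* = (–13⁄17, 37⁄17) with λ* = 64⁄17 as illustrative data (the function name kt_satisfied is ours):

```python
import numpy as np

def kt_satisfied(x, lam, grad_f, constraints, grads_g, tol=1e-8):
    """Check the Kuhn-Tucker conditions (18)-(21) at the point (x, lam)."""
    g = np.array([gi(x) for gi in constraints])
    grad_L = grad_f(x) + sum(l * gg(x) for l, gg in zip(lam, grads_g))
    return (np.all(np.abs(grad_L) < tol)   # (18) stationarity
            and np.all(g <= tol)           # (19) primal feasibility
            and abs(lam @ g) < tol         # (20) complementary slackness
            and np.all(lam >= -tol))       # (21) dual feasibility

grad_f  = lambda x: np.array([2*(x[0] - 3), 4*(x[1] - 5)])
g1      = lambda x: 2*x[0] + 3*x[1] - 5
grad_g1 = lambda x: np.array([2.0, 3.0])

x_star = np.array([-13/17, 37/17])
lam    = np.array([64/17])
print(kt_satisfied(x_star, lam, grad_f, [g1], [grad_g1]))  # True
```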
• If our problem is one of maximization instead of minimization,
maximize_x  f(x) ,   x ∈ ℝⁿ
subject to:  g_i(x) ≤ 0 ,   i = 1, 2, …, r
we can replace f(x) by – f(x) in the first condition:
– ∂f(x*)/∂x_j + Σ_{i=1}^{r} λ_i* ∂g_i(x*)/∂x_j = 0 ,   j = 1, 2, …, n     (22)
or, equivalently,
∂f(x*)/∂x_j + Σ_{i=1}^{r} ( – λ_i* ) ∂g_i(x*)/∂x_j = 0 ,   j = 1, 2, …, n .   (23)
• The maximization problem is therefore one of changing the sign of λ_i* :
∇_x L(x*, λ*) = 0                 (24)
∇_λ L(x*, λ*) = g(x*) ≤ 0         (25)
(λ*)ᵀ g(x*) = 0                   (26)
λ* ≤ 0                            (27)
Transformation via the Penalty Method
• The Kuhn-Tucker necessary conditions give us a theoretical framework for
dealing with nonlinear optimization.
• From a practical computer-algorithm point of view, however, we are not much further
than we were when we started.
• We require practical methods of solving problems of the form:
minimize_x  f(x) ,   x ∈ ℝⁿ                 (28)
subject to  g_j(x) ≤ 0 ,   j = 1(1)J        (29)
            h_k(x) = 0 ,   k = 1(1)K        (30)
• We introduce a new objective function called the penalty function
P ( x ;R ) = f ( x ) + Ω(R, g(x), h(x))
where the vector R contains the penalty parameters and
Ω(R, g(x), h(x))
is the penalty term.
• The penalty term is a function of R and the constraint functions, g(x), h(x) .
• The purpose of adding this term to the objective function is to penalize the
objective function when an infeasible set of decision variables, x , is chosen.
Use of a parabolic penalty term
• Consider the minimization of an objective function, f(x) with equality
constraints, h(x) .
• We create a penalty function by adding a positive coefficient times the square of
each constraint, that is
minimize_x  P(x; R) = f(x) + Σ_{k=1}^{K} R_k { h_k(x) }² .     (31)
As the penalty parameter R_k → ∞ , more weight is attached to satisfying the k-th
constraint.
If a specific parameter is chosen as zero, say R_k = 0 , then the k-th equality
constraint is ignored.
The user specifies the value of each R_k according to the importance of satisfying
that equality constraint.
Example:
minimize_x  x₁² + x₂²
subject to:  x₂ = 1
We construct a penalty function as:
P(x; R) = x₁² + x₂² + R ( x₂ – 1 )²
and we proceed to minimize P(x; R) for particular values of R .
• We proceed analytically; the first-order necessary conditions for a minimum are
∂P/∂x₁ = 2x₁* = 0   ⇒   x₁* = 0
∂P/∂x₂ = 2x₂* + 2R ( x₂* – 1 ) = 0   ⇒   x₂* = R ⁄ (1 + R)
If we now take the limit as R → ∞ , we have
x₂* = lim_{R→∞} R ⁄ (1 + R) = 1 .
• In a numerical procedure, the value of R would be increased gradually and the
numerical optimization would be performed several times.
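That procedure can be sketched as follows; a minimal sketch in which Nelder-Mead is an arbitrary choice of unconstrained minimizer and the R schedule is illustrative, with each solution warm-starting the next subproblem:

```python
from scipy.optimize import minimize

def P(x, R):
    """Parabolic penalty function for min x1^2 + x2^2 subject to x2 = 1."""
    return x[0]**2 + x[1]**2 + R * (x[1] - 1)**2

x = [0.5, 0.5]  # arbitrary starting point
for R in [1.0, 10.0, 100.0, 1000.0]:
    res = minimize(P, x, args=(R,), method="Nelder-Mead",
                   options={"xatol": 1e-10, "fatol": 1e-12})
    x = res.x  # warm-start the next, more heavily penalized, subproblem
    print(R, x)  # x[1] approaches R/(1+R), tending to 1 as R grows
```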
[Figure: example of the use of a parabolic penalty function — contours of P(x; R) for R = 0 (the unconstrained minimum), R = 1 , R = 2 , and R = ∞ , with the constraint line x₂ = 1 .]
Inequality constrained problems
• Consider the penalty method for inequality constrained problems.
• The general nonlinear objective function with J nonlinear inequality constraints
is written as
minimize_x  f(x) ,   x ∈ ℝⁿ                 (32)
subject to  g_j(x) ≤ 0 ,   j = 1(1)J        (33)
A penalty function can be constructed as
P(x; R) = f(x) + Σ_{i=1}^{J} R_i [ g_i(x) ]² u(g_i)     (34)
where u(g_i) is the step function defined by
u(g_i) = 0  if g_i(x) ≤ 0 ;   u(g_i) = 1  if g_i(x) > 0     (35)
and the penalty parameter is chosen as a positive number, R i > 0 .
• The term [ g_i(x) ]² u(g_i) is sometimes called the bracket operator and is denoted
⟨ g_i(x) ⟩ .
• The step-function is used to ignore the constraint when it is satisfied by the
decision variables and to treat it as a penalty term when it is not satisfied.
• When this type of penalty term is used, the method is referred to as an exterior
penalty method since points outside the feasible region are allowed, but are
penalized.
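A minimal sketch of the penalty term (34)–(35) as code (the names bracket and exterior_penalty are illustrative):

```python
def bracket(g):
    """Bracket operator: g**2 when the constraint g(x) <= 0 is violated, else 0."""
    return g**2 if g > 0 else 0.0

def exterior_penalty(f, constraints, R):
    """Build P(x; R) = f(x) + sum_i R_i * [g_i(x)]**2 * u(g_i), as in (34)."""
    def P(x):
        return f(x) + sum(Ri * bracket(gi(x)) for Ri, gi in zip(R, constraints))
    return P

# Example: f(x) = x**2 with constraint g(x) = 1 - x <= 0 (i.e. x >= 1).
P = exterior_penalty(lambda x: x**2, [lambda x: 1 - x], R=[100.0])
print(P(2.0))  # feasible: penalty inactive, P = 4.0
print(P(0.0))  # infeasible: P = 0 + 100 * 1**2 = 100.0
```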
• As the penalty parameter increases, the feasibility region is “pushed in”.
[Figure: exterior penalty method — the penalty term Ω = Σ_{i=1}^{J} R_i [ g_i(x) ]² u(g_i) is zero where a constraint is satisfied ( g_i ≤ 0 ) and grows quadratically, scaled by R_i , where it is violated ( g_i > 0 ).]
Inverse penalty term
• An alternative method which is commonly used is the inverse penalty method.
• If we have a nonlinear optimization problem written as
minimize_x  f(x) ,   x ∈ ℝⁿ                 (36)
subject to  g_j(x) ≥ 0 ,   j = 1(1)J        (37)
then we can construct a penalty function as
P(x; R) = f(x) + R Σ_{i=1}^{J} 1 ⁄ g_i(x)
where only one penalty parameter is used.
• The method is easy to visualize if we consider the case of only one constraint,
J = 1 . Then the penalty term is simply
Ω = R ⁄ g(x)
where g(x) is the single constraint.
[Figure: the inverse penalty term Ω = R ⁄ g(x) , which rises toward +∞ as g(x) → 0⁺ from the feasible side and is negative for infeasible g(x) < 0 .]
• As can be deduced from the figure, it is important to start with feasible points
only; for this reason, the method is classified as an interior method.
• With exterior penalties the parameter R is steadily increased with R → ∞ in the
limit so as to exclude infeasible points.
• With interior penalties, the parameter R is steadily decreased with R → 0 in the
limit. Otherwise, you may artificially exclude a minimum located on the
boundary.
Example:
Solve the following problem using both an interior and an exterior method.
minimize_x  f(x) = ( x₁ – 4 )² + ( x₂ – 4 )²
subject to:  g(x) = 5 – x₁ – x₂ ≥ 0
First solve using the "bracket operator":
P(x; R) = f(x) + R [ g(x) ]² u( – g(x) )
we have
P(x; R) = ( x₁ – 4 )² + ( x₂ – 4 )² + R ( 5 – x₁ – x₂ )² u( – g(x) )
• Thus, when g(x) < 0 , i.e. the decision variables are infeasible, a penalty of
R ( 5 – x₁ – x₂ )² is applied.
• Proceeding analytically to find the necessary conditions for a minimum, we have
∂P/∂x₁ = 2 ( x₁* – 4 ) + 2R ( 5 – x₁* – x₂* ) ( – 1 ) = 0
∂P/∂x₂ = 2 ( x₂* – 4 ) + 2R ( 5 – x₁* – x₂* ) ( – 1 ) = 0
subtracting these two equations:
2 ( x₁* – 4 ) – 2 ( x₂* – 4 ) = 0   ⇒   x₁* = x₂* .
From the first equation above, we get
( x₁ – 4 ) – R ( 5 – 2x₁ ) = 0
and therefore
x₁ = ( 5R + 4 ) ⁄ ( 2R + 1 )
• Increasing the penalty parameter to ∞ , we have
lim_{R→∞} x₁ = 5 ⁄ 2
and the constrained minimum is:
x* = ( 5 ⁄ 2 , 5 ⁄ 2 )
Since g(x*) = 0 , the constraint is tight.
The unconstrained minimum is at x = ( 4, 4 ) .
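The iterates x₁(R) = ( 5R + 4 ) ⁄ ( 2R + 1 ) can be reproduced numerically; a minimal sketch (BFGS is an arbitrary solver choice here; the squared bracket term is continuously differentiable, so a gradient method is workable):

```python
import numpy as np
from scipy.optimize import minimize

def P(x, R):
    """Exterior (bracket-operator) penalty for the example problem."""
    g = 5 - x[0] - x[1]              # constraint g(x) >= 0
    viol = max(-g, 0.0)              # positive only when infeasible
    return (x[0] - 4)**2 + (x[1] - 4)**2 + R * viol**2

x = np.array([4.0, 4.0])             # start at the (infeasible) unconstrained minimum
for R in [1.0, 10.0, 100.0, 1000.0]:
    x = minimize(P, x, args=(R,), method="BFGS").x
    print(R, x, (5*R + 4) / (2*R + 1))   # numeric x components match the closed form
```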
• Now solve using the inverse penalty:
P(x; R) = f(x) + R [ g(x) ]⁻¹
we have
P(x; R) = ( x₁ – 4 )² + ( x₂ – 4 )² + R ( 5 – x₁ – x₂ )⁻¹
• Whether or not g(x) < 0 , i.e. whether or not the decision variables are infeasible,
a penalty of R ( 5 – x₁ – x₂ )⁻¹ is applied.
• We must make sure that we remain feasible during the execution of any
algorithm we may employ.
• Proceeding analytically to find the necessary conditions for a minimum, we have
∂P/∂x₁ = 2 ( x₁* – 4 ) + R ⁄ ( 5 – x₁* – x₂* )² = 0
∂P/∂x₂ = 2 ( x₂* – 4 ) + R ⁄ ( 5 – x₁* – x₂* )² = 0
subtracting these two equations, we again get x₁* = x₂* , and we also have
4 ( x₁* )³ – 36 ( x₁* )² + 105 x₁* – 100 + R ⁄ 2 = 0
• This equation can be solved for its roots, and the minimum of P(x; R) for
particular values of R can be found.
Minimum for different values of R
      R        x₁* = x₂*      f(x*)
    100        0.5864        23.3053
     10        1.7540        10.0890
      1        2.2340         6.2375
      0.1      2.4113         5.0479
      0.01     2.4714         4.6732
      0.001    2.4909         4.5548
      0        2.5000         4.5000
• For each value of the penalty parameter, an unconstrained optimization problem
must be solved.
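The table can be reproduced directly from the cubic above: for each R, take the real root of 4x³ – 36x² + 105x – 100 + R ⁄ 2 = 0 lying on the feasible side ( x < 5 ⁄ 2 , so that g(x) > 0 ). A minimal sketch:

```python
import numpy as np

for R in [100, 10, 1, 0.1, 0.01, 0.001, 0]:
    roots = np.roots([4.0, -36.0, 105.0, -100.0 + R / 2.0])
    # keep the real root(s) on the feasible side of the constraint x1 + x2 < 5
    feas = [r.real for r in roots if abs(r.imag) < 1e-5 and r.real < 2.5 + 1e-6]
    x = min(feas)
    f = 2.0 * (x - 4.0)**2           # f(x*) with x1* = x2* = x
    print(f"{R:>7} {x:.4f} {f:.4f}")
```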