PART 1: COMPREHENSIVE NOTES & DIAGRAMS
4. Constrained Optimization
a. Introduction to Constrained Optimization Problems
Constrained optimization involves finding the maximum or minimum of an objective function subject
to constraints on the values of the decision variables. The general form is:
Minimize f(x)
Subject to:
g_i(x) ≤ 0 for i = 1, ..., m (inequality constraints)
h_j(x) = 0 for j = 1, ..., p (equality constraints)
and possibly x ∈ Ω (where Ω is the domain, often R^n)
Here, x is a vector of decision variables, f(x) is the objective function, g_i(x) are inequality constraint
functions, and h_j(x) are equality constraint functions.
Why constraints? Real-world problems have limitations (e.g., resource limits, physical laws, design
specifications). Constraints define the feasible set — the set of points x that satisfy all constraints.
The solution must lie in this set.
b. Equality and Inequality Constraints
Equality Constraints: Constraints of the form h_j(x) = 0. These define a surface or manifold in
the search space. The solution must lie on this surface.
o Example: x₁ + x₂ = 10 (a line in 2D)
Inequality Constraints: Constraints of the form g_i(x) ≤ 0 or g_i(x) ≥ 0 (note: ≥ can be
converted to ≤ by multiplying by -1). These define a region (possibly unbounded) in the
search space.
o Example: x₁² + x₂² ≤ 1 (a disk in 2D)
The feasible set is the intersection of the regions defined by all constraints.
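As a quick illustration, feasibility can be checked by evaluating every constraint at a candidate point (the helper below is a sketch of my own, not a library function):

```python
import numpy as np

def is_feasible(x, ineqs, eqs, tol=1e-8):
    """Check g_i(x) <= 0 for all inequalities and h_j(x) = 0 for all equalities."""
    return (all(g(x) <= tol for g in ineqs)
            and all(abs(h(x)) <= tol for h in eqs))

# Feasible set: the part of the line x1 + x2 = 10 inside the disk of radius 8
ineqs = [lambda x: x[0]**2 + x[1]**2 - 64]   # x1^2 + x2^2 <= 64
eqs   = [lambda x: x[0] + x[1] - 10]         # x1 + x2 = 10
print(is_feasible(np.array([5.0, 5.0]), ineqs, eqs))  # True  (50 <= 64, sum = 10)
print(is_feasible(np.array([8.0, 2.0]), ineqs, eqs))  # False (64 + 4 > 64)
```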
c. Linear Programming and Quadratic Programming
Linear Programming (LP): Both the objective function and all constraints are linear functions.
The feasible set is a convex polyhedron. The optimal solution, if it exists, is found at a vertex
of this polyhedron. Solved by the Simplex method or interior-point methods.
Example:
Minimize cᵀx
Subject to: A x ≤ b, x ≥ 0
Quadratic Programming (QP): The objective function is quadratic (convex if the quadratic
form is positive semi-definite) and the constraints are linear. QP problems are common in
finance (portfolio optimization) and control.
Example:
Minimize (1/2) xᵀ Q x + cᵀx
Subject to: A x ≤ b, C x = d
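A small QP of this form can be solved with SciPy's trust-constr method; the problem data below is made up for illustration:

```python
import numpy as np
from scipy.optimize import minimize, LinearConstraint

# Minimize (1/2) x^T Q x + c^T x  subject to  x1 + x2 <= 3
Q = np.array([[2.0, 0.0], [0.0, 2.0]])
c = np.array([-2.0, -5.0])

def objective(x):
    return 0.5 * x @ Q @ x + c @ x

lin_con = LinearConstraint([[1.0, 1.0]], -np.inf, 3.0)  # x1 + x2 <= 3
res = minimize(objective, x0=[0.0, 0.0], method='trust-constr', constraints=[lin_con])
print(res.x)  # close to [0.75, 2.25]; the constraint is active at the optimum
```

Here the unconstrained minimizer (1, 2.5) violates x1 + x2 ≤ 3, so the solution lies on the constraint boundary.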
d. Constrained Optimization Algorithms
1. Lagrange Multipliers (KKT Conditions):
o For equality-constrained problems, the method of Lagrange multipliers states that at
a local minimum, the gradient of the objective is a linear combination of the
gradients of the constraints: ∇f(x*) = Σ λ_j ∇h_j(x*).
o For problems with inequality constraints, the Karush-Kuhn-Tucker (KKT) conditions
generalize Lagrange multipliers. For a point x* to be a local minimum, there must
exist constants λ_i (for inequalities) and μ_j (for equalities) such that:
Stationarity: ∇f(x*) + Σ λ_i ∇g_i(x*) + Σ μ_j ∇h_j(x*) = 0
Primal feasibility: g_i(x*) ≤ 0, h_j(x*) = 0
Dual feasibility: λ_i ≥ 0
Complementary slackness: λ_i g_i(x*) = 0
The KKT conditions are necessary for optimality under certain regularity conditions (constraint
qualifications).
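The conditions can be verified numerically at a known optimum. For the illustrative problem minimize (x₁-2)² + (x₂-2)² subject to x₁ + x₂ ≤ 2, the minimizer is x* = (1, 1) with multiplier λ = 2:

```python
import numpy as np

# minimize f(x) = (x1-2)^2 + (x2-2)^2  subject to  g(x) = x1 + x2 - 2 <= 0
x_star = np.array([1.0, 1.0])
lam = 2.0

grad_f = 2 * (x_star - 2)          # [-2, -2]
grad_g = np.array([1.0, 1.0])

print(grad_f + lam * grad_g)       # stationarity: [0, 0]
print(x_star.sum() - 2)            # primal feasibility: g(x*) = 0 (active)
print(lam >= 0)                    # dual feasibility: True
print(lam * (x_star.sum() - 2))    # complementary slackness: 0
```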
2. Penalty Methods:
o Transform the constrained problem into an unconstrained one by adding a penalty
term to the objective function that increases when constraints are violated.
o For example, for inequality constraints g_i(x) ≤ 0, we can minimize the penalized
objective P(x) = f(x) + μ * Σ [max(0, g_i(x))]², where μ is a positive penalty
parameter. As μ → ∞, the minimizer of the unconstrained problem approaches the
constrained solution.
o Pros: Simple idea, easy to implement. Cons: Ill-conditioning for large μ, and the
solution is only approximate.
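A minimal sketch of this idea; the test problem (minimize x₁² + x₂² subject to x₁ + x₂ ≥ 1, i.e. g(x) = 1 - x₁ - x₂ ≤ 0) is chosen for illustration:

```python
import numpy as np
from scipy.optimize import minimize

def f(x):
    return x[0]**2 + x[1]**2

def g(x):                      # g(x) <= 0 form of x1 + x2 >= 1
    return 1.0 - x[0] - x[1]

def penalized(x, mu):
    return f(x) + mu * max(0.0, g(x))**2

for mu in [1.0, 10.0, 100.0]:
    res = minimize(penalized, x0=[0.0, 0.0], args=(mu,))
    print(mu, res.x)           # approaches [0.5, 0.5] from the infeasible side
```

For this problem the penalized minimizer is x₁ = x₂ = μ/(1 + 2μ), which tends to 0.5 as μ grows but is infeasible (x₁ + x₂ < 1) for every finite μ, illustrating that penalty solutions are only approximate.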
3. Interior Point (Barrier) Methods:
o These methods are used for inequality constraints. They add a barrier function that
prevents the iterate from leaving the feasible region. For example, for
constraints g_i(x) ≤ 0, the logarithmic barrier function is B(x) = - Σ log(-g_i(x)).
o The problem becomes: Minimize f(x) + μ * B(x), where μ is a barrier parameter that
is decreased to zero as the method iterates.
o As μ decreases, the solution of the unconstrained problem approaches the solution
of the constrained problem from the interior of the feasible region (hence the
name).
o Pros: Can handle large problems efficiently, well-suited for convex optimization.
Cons: Requires a feasible starting point, and the barrier function must be chosen
appropriately.
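A minimal log-barrier sketch on the same kind of problem (minimize x₁² + x₂² subject to x₁ + x₂ ≥ 1, an illustrative choice), started from a strictly feasible point:

```python
import numpy as np
from scipy.optimize import minimize

def f(x):
    return x[0]**2 + x[1]**2

def g(x):                      # interior of the feasible region: g(x) < 0
    return 1.0 - x[0] - x[1]

def barrier_obj(x, mu):
    if g(x) >= 0:              # outside the interior: reject the point
        return np.inf
    return f(x) - mu * np.log(-g(x))

x = np.array([1.0, 1.0])       # strictly feasible start
for mu in [1.0, 0.1, 0.01]:
    # Nelder-Mead tolerates the +inf values outside the feasible region
    x = minimize(barrier_obj, x, args=(mu,), method='Nelder-Mead').x
    print(mu, x)               # approaches [0.5, 0.5] from inside the region
```

Each solve is warm-started from the previous one, tracing the so-called central path toward the constrained optimum as μ shrinks.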
e. Implementation using NumPy and SciPy
SciPy's optimize module provides functions for constrained optimization.
Linear Programming: scipy.optimize.linprog
Quadratic Programming: scipy.optimize.minimize with method='trust-constr' or other
methods that handle constraints, or use specialized QP solvers.
General Nonlinear Constraints: scipy.optimize.minimize with methods that support
constraints (e.g., 'SLSQP', 'trust-constr').
Example for Linear Programming:
python
from scipy.optimize import linprog
# Minimize: c^T x
# Subject to: A_ub x <= b_ub, A_eq x = b_eq, bounds on x
c = [-1, 4] # minimize -x1 + 4x2 -> same as maximize x1 - 4x2
A_ub = [[-3, 1], [1, 2]]
b_ub = [6, 4]
bounds = [(None, None), (-3, None)]
result = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds)
print(result)
Example for Nonlinear Constraints (using SLSQP):
python
from scipy.optimize import minimize

# Objective function
def f(x):
    return x[0]**2 + x[1]**2

# Constraint: x1 + x2 >= 1. SciPy's 'ineq' type means fun(x) >= 0,
# so g1 is passed directly.
def g1(x):
    return x[0] + x[1] - 1

cons = ({'type': 'ineq', 'fun': g1},)
x0 = [2.0, 0.0]
res = minimize(f, x0, constraints=cons, method='SLSQP')
print(res.x)  # approximately [0.5, 0.5]
PART 2: HIGH-MARK ESSAY QUESTIONS & ANSWERS
Essay Question 1 (25 Marks)
"Discuss the Karush-Kuhn-Tucker (KKT) conditions for constrained optimization. Your answer
should include the derivation and interpretation of each condition, and explain why these
conditions are necessary for optimality. Use a diagram to illustrate the geometry of the KKT
conditions at a constrained optimum."
Answer:
The Karush-Kuhn-Tucker (KKT) conditions are first-order necessary conditions for a solution to be
optimal in a constrained optimization problem. They generalize the method of Lagrange multipliers
to include inequality constraints.
Consider the problem:
Minimize f(x)
Subject to: g_i(x) ≤ 0 for i=1,...,m and h_j(x)=0 for j=1,...,p.
The KKT conditions for a point x* to be a local minimum are:
1. Stationarity:
∇f(x*) + Σ λ_i ∇g_i(x*) + Σ μ_j ∇h_j(x*) = 0
This condition says that at the optimum, the gradient of the objective function must be a
linear combination of the gradients of the active constraints (both equality and active
inequalities). The coefficients λ_i and μ_j are called KKT multipliers.
2. Primal Feasibility:
g_i(x*) ≤ 0 for all i, and h_j(x*) = 0 for all j.
This ensures that x* is in the feasible set.
3. Dual Feasibility:
λ_i ≥ 0 for all i.
For a minimization problem with constraints written as g_i(x) ≤ 0, the multipliers on
the inequality constraints must be non-negative; a negative multiplier would mean
the objective could be decreased by moving into the feasible region, contradicting
optimality.
4. Complementary Slackness:
λ_i * g_i(x*) = 0 for all i.
This condition says that for each inequality constraint, either the constraint is active (g_i(x*)
= 0) or the corresponding multiplier is zero (or both). If the constraint is inactive, it doesn't
influence the optimum and its multiplier is zero.
Why necessary? Under a constraint qualification (e.g., linear independence of the active constraints'
gradients), if x* is a local minimum, then there exist multipliers such that the KKT conditions hold.
Geometry: At the optimum, the gradient of the objective must be balanced by the gradients
of the active constraints. Specifically, for minimization with constraints g_i(x) ≤ 0,
stationarity gives ∇f(x*) = -Σ λ_i ∇g_i(x*) with λ_i ≥ 0, so -∇f(x*) lies in the cone
spanned by the gradients of the active constraints. If it did not, there would be a
feasible descent direction: we could move so as to decrease the objective without
violating the constraints.
Diagram: Imagine a feasible set bounded by two inequality constraints, with the optimum at
a corner where both are active. There, -∇f(x*) is contained in the cone formed by the two
constraint gradients, which point outward from the feasible region.
Essay Question 2 (25 Marks)
"Compare and contrast penalty methods and interior point methods for handling constraints in
optimization. Your answer should include the algorithmic steps, the advantages and disadvantages
of each, and the types of problems for which each method is best suited."
Answer:
Penalty Methods:
Idea: Transform the constrained problem into an unconstrained one by adding a penalty
term to the objective function that increases when constraints are violated.
Algorithmic Steps:
1. Choose an initial penalty parameter μ and an initial point x0.
2. Solve the unconstrained problem: minimize f(x) + μ * P(x), where P(x) is a penalty
function (e.g., quadratic penalty: P(x) = Σ [max(0, g_i(x))]² + Σ [h_j(x)]²).
3. Increase μ (e.g., μ <- c * μ with c>1) and use the solution from step 2 as the new
initial point.
4. Repeat until convergence (constraint violation is below a tolerance).
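The four steps can be sketched directly; the 10× schedule, the tolerance, and the test problem (minimize (x₁-1)² + (x₂-1)² subject to x₁ + x₂ = 1) are illustrative choices:

```python
import numpy as np
from scipy.optimize import minimize

def f(x):
    return (x[0] - 1)**2 + (x[1] - 1)**2

def h(x):                                  # equality constraint h(x) = 0
    return x[0] + x[1] - 1

mu, x, tol = 1.0, np.array([0.0, 0.0]), 1e-4   # Step 1: initial mu and x0
for _ in range(8):                             # fixed iteration budget
    # Step 2: solve the unconstrained penalized problem, warm-started
    x = minimize(lambda z: f(z) + mu * h(z)**2, x).x
    if abs(h(x)) < tol:                        # Step 4: violation small enough
        break
    mu *= 10.0                                 # Step 3: increase mu
print(x)  # near [0.5, 0.5]
```

For this problem the penalized minimizer is x₁ = x₂ = (1 + μ)/(1 + 2μ), so the constraint violation shrinks like 1/(2μ) as μ grows, which is why μ must become large (and the subproblem ill-conditioned) for high accuracy.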
Advantages: Simple to implement; can handle both equality and inequality constraints; does
not require a feasible starting point.
Disadvantages: The unconstrained problem becomes ill-conditioned as μ becomes large; the
solution is only an approximation of the true solution (unless μ→∞, which is not numerically
feasible).
Best suited for: Problems where a rough solution is acceptable, and for problems with few
constraints that are not too nonlinear.
Interior Point Methods:
Idea: Also known as barrier methods, they add a barrier function that tends to infinity as the
boundary of the feasible set is approached. This keeps the iterates inside the feasible region.
Algorithmic Steps:
1. Start with a feasible point x0 and an initial barrier parameter μ.
2. Solve the unconstrained problem: minimize f(x) + μ * B(x), where B(x) is a barrier
function (e.g., logarithmic barrier: B(x) = - Σ log(-g_i(x)) for inequalities).
3. Decrease μ (e.g., μ <- c * μ with 0<c<1) and use the solution from step 2 as the new
initial point.
4. Repeat until convergence.
Advantages: The iterates remain feasible (if started feasible); the conditioning of the
problem does not deteriorate as μ decreases; efficient for large-scale problems, especially
convex ones.
Disadvantages: Requires a feasible starting point (which can be difficult to find); the barrier
function must be chosen appropriately for the constraints (typically for inequalities only).
Best suited for: Convex optimization problems (especially linear and quadratic programs)
and problems with many inequalities. Widely used in industry.
Comparison:
Penalty methods are easier to implement and can start from an infeasible point, but they
become numerically unstable for large penalty parameters. Interior point methods require a
feasible point but are more stable and efficient for large problems.
Penalty methods approximate the solution from the infeasible region, while interior point
methods approach the solution from the interior of the feasible region.
For problems with equality constraints, penalty methods are often used because interior
point methods are primarily for inequalities.
In practice, modern optimization solvers often use interior point methods for linear and convex
quadratic programming, while penalty methods are sometimes used in nonlinear programming or as
part of augmented Lagrangian methods (which combine penalty and Lagrange multiplier ideas).
Additional Revision Questions
1. What is the feasible set in constrained optimization?
The set of points that satisfy all constraints. It is the region within which the solution must
lie.
2. How do equality constraints affect the feasible set?
They restrict the feasible set to a surface (manifold) of lower dimension than the original
space.
3. What is the role of the Lagrange multipliers in the KKT conditions?
They measure the sensitivity of the optimal objective value to changes in the
constraints: the magnitude of a multiplier is the rate at which the optimal value
would improve if the corresponding constraint were relaxed.
4. Why is complementary slackness important?
It identifies which inequality constraints are active (binding) at the solution. Inactive
constraints do not affect the optimum.
5. Give an example of a penalty function for an inequality constraint g(x) ≤ 0.
Quadratic penalty: P(x) = [max(0, g(x))]².
6. What is a major drawback of penalty methods?
The problem becomes ill-conditioned as the penalty parameter increases, making it hard to
solve the unconstrained problem accurately.
7. What is a barrier function? Give an example.
A function that goes to infinity as the boundary of the feasible set is approached. Example:
Logarithmic barrier for g(x) ≤ 0: B(x) = -log(-g(x)).
8. What is the difference between linear and quadratic programming?
Linear programming has a linear objective and linear constraints. Quadratic programming
has a quadratic objective (and typically linear constraints).
9. How does SciPy's minimize function handle constraints?
Through the constraints argument, which takes a list of dictionaries specifying the constraint
type ('eq' or 'ineq') and a function.
10. What is the significance of the dual feasibility condition (λ_i ≥ 0) in KKT?
For a minimization problem, a negative multiplier would mean the objective could be
reduced by moving from the boundary into the interior of the feasible region,
contradicting optimality. Non-negative multipliers ensure the active constraints
genuinely oppose further decrease of the objective.