PDHG for solving the saddle point formulation of linear programs (LPs)

Saddle point formulation of linear programs (LPs)

Consider the linear program of the form

\begin{aligned} \underset{x}{\text{minimize}} \quad & c^\top x \\  \text{subject to} \quad& Ax \leq b, \end{aligned}

where c, x \in \mathbb{R}^n, A \in \mathbb{R}^{m \times n}, and b \in \mathbb{R}^m. The Lagrangian for this problem is

L(x,y) = c^\top x + y^\top (Ax - b) = y^\top Ax + c^\top x - b^\top y,

where the domain of L is (x, y) \in \mathbb{R}^n \times \mathbb{R}_+^m (elements of y must be non-negative).

In this previous post, we introduced the saddle point theorem for convex optimization. Here is that same saddle point theorem, but for LPs:

Theorem. If (x^*, y^*) \in \mathbb{R}^n \times \mathbb{R}_+^m is a saddle point for the Lagrangian L, then x^* solves the primal problem. Conversely, if x^* is a finite solution to the primal problem, then there is a y^* \in \mathbb{R}_+^m such that (x^*, y^*) is a saddle point for L.

Finding the saddle point of an LP Lagrangian can be written as either of the two (equivalent) formulations:

\begin{aligned} &\text{Primal Formulation:} && \min_{x \in \mathbb{R}^n} \max_{y \in \mathbb{R}_+^m} \left[ y^\top Ax + c^\top x - b^\top y \right], \\  &\text{Dual Formulation:} && \max_{y \in \mathbb{R}_+^m} \min_{x \in \mathbb{R}^n} \left[ y^\top Ax + c^\top x - b^\top y \right].  \end{aligned}

Primal-Dual Hybrid Gradient (PDHG) method

The saddle point formulation of the LP allows us to use primal-dual methods to solve the problem. One such method is called the Primal-Dual Hybrid Gradient (PDHG) method introduced by Chambolle & Pock (2011) (Reference 1). PDHG solves the more general problem

\begin{aligned} \min_{x \in X} \max_{y \in Y} \left\{ \left\langle Kx, y \right\rangle + G(x) - F^*(y) \right\}, \end{aligned}

where X, Y are finite-dimensional real vector spaces, K : X \to Y is a continuous linear operator with induced norm \| K \| = \max \{ \|Kx \|: x \in X \text{ and } \|x\| \leq 1 \}, and G: X \to [0, +\infty] and F^*: Y \to [0, +\infty] are proper, convex, lower-semicontinuous functions.

The PDHG algorithm simply loops over the following 3 steps until convergence:

  1. (Dual step) y^{n+1} = \text{prox}_{\sigma F^*}\left( y^n + \sigma K \bar{x}^n \right).
  2. (Primal step) x^{n+1} = \text{prox}_{\tau G} \left( x^n - \tau K^* y^{n+1}\right).
  3. (Over-relaxation) \bar{x}^{n+1} = x^{n+1} + \theta (x^{n+1} - x^n).

In the above, \tau, \sigma > 0 and \theta \in [0, 1] are hyperparameters to be chosen by the user. (Note: Reference 1 doesn’t have proximal operators but has resolvent operators instead. See this post for why we can rewrite the algorithm with prox operators.) It is also possible to switch the order of the dual and primal steps, which gives the following two steps, looped until convergence (see Reference 2):

  1. x^{n+1} = \text{prox}_{\tau G} \left( x^n - \tau K^* y^n \right).
  2. y^{n+1} = \text{prox}_{\sigma F^*}\left( y^n + \sigma K \left[ x^{n+1} + \theta (x^{n+1} - x^n)\right] \right).
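The loop above can be sketched in code. Below is a minimal numpy implementation of the two-step variant, applied to a toy problem of my own choosing (not from the references): K = I, G(x) = \|x - a\|^2/2, F^*(y) = \|y\|^2/2, whose saddle point can be computed by hand as x^* = y^* = a/2.

```python
import numpy as np

def pdhg(prox_tau_G, prox_sigma_Fstar, K, x0, y0, tau, sigma, theta=1.0, iters=5000):
    """Generic PDHG loop, primal step first (the two-step variant)."""
    x, y = x0.copy(), y0.copy()
    for _ in range(iters):
        x_new = prox_tau_G(x - tau * K.T @ y)         # primal step
        x_bar = x_new + theta * (x_new - x)           # over-relaxation
        y = prox_sigma_Fstar(y + sigma * K @ x_bar)   # dual step
        x = x_new
    return x, y

# Toy instance: K = I, G(x) = ||x - a||^2 / 2, F*(y) = ||y||^2 / 2.
# The inner max gives min_x ||x - a||^2 / 2 + ||x||^2 / 2, so x* = y* = a / 2.
a = np.array([2.0, -4.0, 6.0])
tau = sigma = 0.5                                     # tau * sigma * ||K||^2 = 0.25 < 1
x, y = pdhg(lambda v: (v + tau * a) / (1 + tau),      # prox of tau * G (computed by hand)
            lambda w: w / (1 + sigma),                # prox of sigma * F*
            np.eye(3), np.zeros(3), np.zeros(3), tau, sigma)
print(x)  # close to a / 2 = [1, -2, 3]
```

The prox maps here are closed-form because G and F^* are simple quadratics; for general G and F^*, each prox is itself a small optimization problem.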

PDHG for LPs

Recall the LP problem is equivalent to

\begin{aligned} \min_{x \in \mathbb{R}^n} \max_{y \in \mathbb{R}_+^m} \left[ y^\top Ax + c^\top x - b^\top y \right]. \end{aligned}

To map this problem onto the PDHG problem, we need X \leftarrow \mathbb{R}^n, Y \leftarrow \mathbb{R}_+^m, K(x) = Ax, G(x) = c^\top x, F^*(y) = b^\top y. With this, the PDHG algorithm becomes

  1. x^{n+1} = \text{prox}_{\tau G} \left( x^n - \tau A^\top y^n \right).
  2. y^{n+1} = \text{prox}_{\sigma F^*}\left( y^n + \sigma A \left[ x^{n+1} + \theta (x^{n+1} - x^n)\right] \right).

Note that for h: X \to \mathbb{R} such that h(x) = c^\top x, we have \text{prox}_h (x) = \text{proj}_X (x - c). (See this post for the proof.) Hence, we can replace the prox operators above with Euclidean projection operators:

  1. x^{n+1} = \text{proj}_{\mathbb{R}^n} \left( x^n - \tau A^\top y^n - \tau c \right) = x^n - \tau \left( A^\top y^n + c \right).
  2. y^{n+1} = \text{proj}_{\mathbb{R}_+^m}\left( y^n + \sigma A \left[ x^{n+1} + \theta (x^{n+1} - x^n) \right] - \sigma b \right).
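As a sanity check, the two projected steps above can be run on a tiny hypothetical LP (my own example, not from any reference): minimize x_1 + x_2 subject to -x_1 \leq 0 and -x_2 \leq 0, whose solution is x^* = (0, 0) with dual multipliers y^* = (1, 1) (from the stationarity condition c + A^\top y^* = 0).

```python
import numpy as np

# Toy LP: minimize x1 + x2 subject to -x1 <= 0, -x2 <= 0.
c = np.array([1.0, 1.0])
A = np.array([[-1.0, 0.0], [0.0, -1.0]])
b = np.zeros(2)

tau = sigma = 0.5        # tau * sigma * ||A||^2 = 0.25 < 1
theta = 1.0
x, y = np.ones(2), np.zeros(2)
for _ in range(2000):
    x_new = x - tau * (A.T @ y + c)                   # primal step (projection onto R^n is the identity)
    x_bar = x_new + theta * (x_new - x)               # over-relaxation
    y = np.maximum(0.0, y + sigma * (A @ x_bar - b))  # dual step: project onto R_+^m
    x = x_new
print(x, y)  # x -> (0, 0), y -> (1, 1)
```

Note that the dual iterates stay non-negative by construction, since `np.maximum(0.0, ...)` is exactly the projection onto \mathbb{R}_+^m.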

References:

  1. Chambolle, A., and Pock, T. (2011). A First-Order Primal-Dual Algorithm for Convex Problems with Applications to Imaging.
  2. Pock, T., and Chambolle, A. (2011). Diagonal preconditioning for first order primal-dual algorithms in convex optimization.

Strong duality for linear programs (LPs)

I keep forgetting the possible scenarios for strong duality of linear programs (LPs), so I’m putting it here as a reference.

Theorem (Strong duality of LPs). Consider a primal LP and its dual. There are 4 possibilities:
1. Both primal and dual have no feasible solutions.
2. The primal is infeasible and the dual is unbounded.
3. The dual is infeasible and the primal is unbounded.
4. Both primal and dual have feasible solutions and their optimal values are equal.

Examples of each possibility can be found in Table 4.2 of Reference 1. A proof of this theorem can be found in multiple places (e.g. Reference 2).
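Possibility 4 can be checked numerically on a small hypothetical example (mine, not from the references). The sketch below enumerates the vertices of a two-variable primal (max c^\top x subject to Ax \leq b, x \geq 0) and of its dual (min b^\top y subject to A^\top y \geq c, y \geq 0), and confirms the optimal values coincide:

```python
import numpy as np
from itertools import combinations

def lp_vertices(A_ub, b_ub):
    """Vertices of {v : A_ub @ v <= b_ub} via pairs of active constraints (2D only)."""
    verts = []
    for i, j in combinations(range(len(b_ub)), 2):
        M = A_ub[[i, j]]
        if abs(np.linalg.det(M)) < 1e-12:
            continue                                   # parallel constraints
        v = np.linalg.solve(M, b_ub[[i, j]])
        if np.all(A_ub @ v <= b_ub + 1e-9):            # keep only feasible intersections
            verts.append(v)
    return verts

# Primal: max x1 + x2 s.t. x1 + 2 x2 <= 4, 3 x1 + x2 <= 6, x >= 0.
c = np.array([1.0, 1.0]); b = np.array([4.0, 6.0])
A = np.array([[1.0, 2.0], [3.0, 1.0]])

primal = max(c @ v for v in lp_vertices(np.vstack([A, -np.eye(2)]),
                                        np.concatenate([b, np.zeros(2)])))
dual = min(b @ v for v in lp_vertices(np.vstack([-A.T, -np.eye(2)]),
                                      np.concatenate([-c, np.zeros(2)])))
print(primal, dual)  # both 2.8: the optimal values coincide
```

Vertex enumeration is exponential in general and only used here because the example is two-dimensional; a real solver would be used in practice.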

References:

  1. Duality in Linear Programming.
  2. Williamson, D. P., and Kircher, K. (2014). Lecture 8: Strong duality.

Deriving the dual of a linear program (LP)

Did you know that the dual of a linear program (LP) is also a linear program? I derive the dual LP in this post.

Consider the primal LP

\begin{aligned} \underset{x}{\text{minimize}} \quad & c^\top x \\  \text{subject to} \quad & Ax = b, \\  & Gx \leq h. \end{aligned}

The Lagrangian is

\begin{aligned} L(x, u, v) &= c^\top x + u^\top (Ax - b) + v^\top (Gx - h) \\  &= \left( A^\top u + G^\top v + c \right)^\top x - b^\top u - h^\top v, \end{aligned}

where v \geq 0 and there are no restrictions on u. The Lagrange dual function is

\begin{aligned} g(u, v) &= \inf_x L(x, u, v) \\  &= - b^\top u - h^\top v + \inf_x \left\{ \left( A^\top u + G^\top v + c \right)^\top x \right\} \\  &= \begin{cases} - b^\top u - h^\top v &\text{if } A^\top u + G^\top v + c = 0, \\  -\infty &\text{otherwise}. \end{cases} \end{aligned}

The dual problem is maximizing the dual function subject to the constraints of the variables, i.e.

\begin{aligned} \underset{u, v}{\text{maximize}} \quad & g(u, v) \\  \text{subject to} \quad & v \geq 0. \end{aligned}

or equivalently

\begin{aligned} \underset{u, v}{\text{maximize}} \quad & - b^\top u - h^\top v \\  \text{subject to} \quad & -A^\top u - G^\top v = c, \\  & v \geq 0. \end{aligned}
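The key step in the derivation is that L is constant in x exactly when A^\top u + G^\top v + c = 0, with value -b^\top u - h^\top v. This can be verified numerically on random data: pick multipliers first, then choose c to satisfy the stationarity condition.

```python
import numpy as np

rng = np.random.default_rng(0)
m_eq, m_ineq, n = 2, 3, 4
A, G = rng.normal(size=(m_eq, n)), rng.normal(size=(m_ineq, n))
b, h = rng.normal(size=m_eq), rng.normal(size=m_ineq)

# Choose multipliers, then set c so that A^T u + G^T v + c = 0 holds.
u, v = rng.normal(size=m_eq), rng.uniform(size=m_ineq)   # v >= 0
c = -(A.T @ u + G.T @ v)

def L(x):  # the Lagrangian
    return c @ x + u @ (A @ x - b) + v @ (G @ x - h)

# L is then constant in x, equal to the dual value g(u, v) = -b^T u - h^T v.
vals = [L(rng.normal(size=n)) for _ in range(5)]
print(np.allclose(vals, -b @ u - h @ v))  # True
```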

Special case 1: Standard form LP

Consider the primal LP in standard form:

\begin{aligned} \underset{x}{\text{minimize}} \quad & c^\top x \\  \text{subject to} \quad & Ax \leq b, \\  & x \geq 0. \end{aligned}

To match notation with the previous section, we need A \leftarrow 0, b \leftarrow 0, G \leftarrow \begin{pmatrix} A \\ -I \end{pmatrix}, h \leftarrow \begin{pmatrix} b \\ 0 \end{pmatrix}, where I is the identity matrix. With these substitutions, the dual problem becomes

\begin{aligned} \underset{v}{\text{maximize}} \quad & - h^\top v \\  \text{subject to} \quad & - G^\top v = c, \\  & v \geq 0. \end{aligned}

If we write v = \begin{pmatrix} y \\ z \end{pmatrix} where y is the dual variable corresponding to the inequality Ax \leq b, then the above becomes

\begin{aligned} \underset{y, z}{\text{maximize}} \quad & - b^\top y \\  \text{subject to} \quad & - A^\top y + z = c, \\  & y, z \geq 0. \end{aligned}

z can be viewed as a slack variable, and so the dual LP is equivalent to

\begin{aligned} \underset{y}{\text{maximize}} \quad & - b^\top y \\  \text{subject to} \quad & A^\top y \geq -c, \\  & y \geq 0. \end{aligned}

Special case 2: Standard form LP (maximization)

Consider the primal LP in standard form but where we want to maximize the objective function instead of minimizing it:

\begin{aligned} \underset{x}{\text{maximize}} \quad & c^\top x \\  \text{subject to} \quad & Ax \leq b, \\  & x \geq 0. \end{aligned}

This is equivalent to minimizing the negative of the objective function subject to the same constraints:

\begin{aligned} \underset{x}{\text{minimize}} \quad & (-c)^\top x \\  \text{subject to} \quad & Ax \leq b, \\  & x \geq 0. \end{aligned}

Thus by replacing c by -c in the previous section, the dual LP in this case is

\begin{aligned} \underset{y}{\text{maximize}} \quad & - b^\top y \\  \text{subject to} \quad & A^\top y \geq -(-c), \\  & y \geq 0, \end{aligned}

or equivalently

\begin{aligned} \underset{y}{\text{minimize}} \quad & b^\top y \\  \text{subject to} \quad & A^\top y \geq c, \\  & y \geq 0. \end{aligned}

(This is the version that you’ll see most often on other websites, and is the one currently on the Wikipedia page.)

What is the Charnes-Cooper transformation?

In what follows, bold capital letters represent matrices, lowercase letters in bold represent vectors, and non-bolded characters are scalars.

The Charnes-Cooper transformation (Reference 1) is a transformation that allows us to solve linear-fractional programs by solving a linear program.

A linear program is an optimization problem of the form

\begin{aligned} \underset{\bf x}{\text{maximize}} \quad& {\bf c}^T{\bf x} \\  \text{subject to} \quad& {\bf Ax} \leq {\bf b}, \\  &{\bf x} \geq {\bf 0},\end{aligned}

where {\bf A} \in \mathbb{R}^{m \times n}, {\bf x, c} \in \mathbb{R}^n and {\bf b} \in \mathbb{R}^m. In the above, the inequalities are to be interpreted element-wise. We like linear programs because we have many efficient algorithms for solving them (see this for a preliminary list).

A linear-fractional program is a generalization of the linear program where the objective function is replaced by a ratio of two affine functions, and the optimization is over the same type of constraints:

\begin{aligned} \underset{\bf x}{\text{maximize}} \quad& \frac{{\bf c}^T{\bf x} + \alpha}{{\bf d}^T{\bf x} + \beta} \\  \text{subject to} \quad& {\bf Ax} \leq {\bf b}, \\  &{\bf x} \geq {\bf 0},\end{aligned}

where {\bf d} \in \mathbb{R}^n and \alpha, \beta are scalar constants.

The Charnes-Cooper transformation

The Charnes-Cooper transformation is a change of variables which allows us to transform a linear-fractional program into a linear program. Define new variables

\begin{aligned} {\bf y} = \dfrac{1}{{\bf d}^T {\bf x} + \beta} \cdot {\bf x}, \qquad t = \dfrac{1}{{\bf d}^T {\bf x} + \beta}. \end{aligned}

Under the assumption that the feasible region is non-empty and bounded (and that the denominator {\bf d}^T {\bf x} + \beta is positive on it), Reference 1 showed that the linear-fractional program above is equivalent to the linear program below:

\begin{aligned} \underset{{\bf y}, t}{\text{maximize}} \quad& {\bf c}^T{\bf y} + \alpha t \\  \text{subject to} \quad& {\bf Ay} \leq {\bf b}t, \\  &{\bf d}^T {\bf y} + \beta t = 1, \\  &t \geq 0.\end{aligned}

Given the solution {\bf y} and t, we can reconstruct the solution to our original problem: {\bf x} = {\bf y} / t.
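The change of variables can be verified numerically: for any feasible x with positive denominator, the transformed point (y, t) is feasible for the transformed LP, attains the same objective value, and recovers x = y / t. A small sketch with made-up data:

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 3, 2
A = rng.normal(size=(m, n))
c, d = rng.normal(size=n), rng.normal(size=n)
alpha, beta = rng.normal(), 10.0

# Construct a feasible point: choose x >= 0, then set b so that A x <= b holds
# with positive slack; beta is large enough that d^T x + beta > 0.
x = rng.uniform(size=n)
b = A @ x + rng.uniform(size=m)
denom = d @ x + beta
assert denom > 0

# The Charnes-Cooper change of variables.
y, t = x / denom, 1.0 / denom

# Feasibility and objective values carry over, and x is recoverable.
print(np.all(A @ y <= b * t + 1e-12),            # A y <= b t
      np.isclose(d @ y + beta * t, 1.0),         # d^T y + beta t = 1
      np.isclose(c @ y + alpha * t,              # objectives match
                 (c @ x + alpha) / (d @ x + beta)),
      np.allclose(x, y / t))                     # recover x = y / t
```

This only checks that the transformation preserves feasibility and objective values for a given point; the equivalence of the two optimization problems is the content of Reference 1.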

References:

  1. Charnes, A., and Cooper, W. W. (1962). Programming with linear fractional functionals.
  2. Wikipedia. Linear-fractional programming.