10.8 Combination of Steepest Descent and Newton's Method
Analysis of Quadratic Case
Since the method is not a full Newton method, we can conclude that it possesses only
linear convergence and that the dominating aspects of convergence will be revealed
by an analysis of the method as applied to a quadratic function. Furthermore, as
might be intuitively anticipated, the associated rate of convergence is governed
by the steepest descent part of algorithm (57), and that rate is determined by a
Kantorovich-like ratio defined over the subspace orthogonal to N.
Theorem. (Combined method). Let Q be an n × n symmetric positive definite
matrix, and let $x^* \in E^n$. Define the function

$$E(x) = \tfrac{1}{2}(x - x^*)^T Q (x - x^*)$$

and let $b = Qx^*$. Let B be an n × m matrix of rank m. Starting at an arbitrary
point $x_0$, define the iterative process

a) $u_k = -(B^T Q B)^{-1} B^T g_k$, where $g_k = Qx_k - b$.

b) $z_k = x_k + Bu_k$.

c) $p_k = b - Qz_k$.

d) $x_{k+1} = z_k + \alpha_k p_k$, where $\alpha_k = \dfrac{p_k^T p_k}{p_k^T Q p_k}$.

This process converges to $x^*$, and satisfies

$$E(x_{k+1}) \leq (1 - \delta)E(x_k) \tag{58}$$

where $\delta$, $0 \leq \delta \leq 1$, is the minimum of

$$\frac{(p^T p)^2}{(p^T Q p)(p^T Q^{-1} p)}$$

over all vectors p in the nullspace of $B^T$.
Proof. The algorithm given in the theorem statement is exactly the general
combined algorithm specialized to the quadratic situation. Next we note that

$$
\begin{aligned}
B^T p_k &= B^T Q(x^* - z_k) = B^T Q(x^* - x_k) - B^T Q B u_k \\
&= -B^T g_k + B^T Q B (B^T Q B)^{-1} B^T g_k = 0
\end{aligned} \tag{59}
$$

which merely proves that the gradient at $z_k$ is orthogonal to N. Next we calculate

$$
\begin{aligned}
2E(x_k) - 2E(z_k) &= (x_k - x^*)^T Q (x_k - x^*) - (z_k - x^*)^T Q (z_k - x^*) \\
&= -2u_k^T B^T Q(x_k - x^*) - u_k^T B^T Q B u_k \\
&= -2u_k^T B^T g_k + u_k^T B^T Q B (B^T Q B)^{-1} B^T g_k \\
&= -u_k^T B^T g_k = g_k^T B (B^T Q B)^{-1} B^T g_k
\end{aligned} \tag{60}
$$
Then we compute

$$
\begin{aligned}
2E(z_k) - 2E(x_{k+1}) &= (z_k - x^*)^T Q (z_k - x^*) - (x_{k+1} - x^*)^T Q (x_{k+1} - x^*) \\
&= -2\alpha_k p_k^T Q(z_k - x^*) - \alpha_k^2 p_k^T Q p_k \\
&= 2\alpha_k p_k^T p_k - \alpha_k^2 p_k^T Q p_k \\
&= \alpha_k p_k^T p_k = \frac{(p_k^T p_k)^2}{p_k^T Q p_k}
\end{aligned} \tag{61}
$$
Now using (59) and $p_k = -g_k - QBu_k$ we have

$$
\begin{aligned}
2E(x_k) &= (x_k - x^*)^T Q (x_k - x^*) = g_k^T Q^{-1} g_k \\
&= (p_k^T + u_k^T B^T Q)\, Q^{-1} (p_k + Q B u_k) \\
&= p_k^T Q^{-1} p_k + u_k^T B^T Q B u_k \\
&= p_k^T Q^{-1} p_k + g_k^T B (B^T Q B)^{-1} B^T g_k
\end{aligned} \tag{62}
$$
Adding (60) and (61), and dividing by (62), there results

$$
\frac{E(x_k) - E(x_{k+1})}{E(x_k)}
= \frac{g_k^T B (B^T Q B)^{-1} B^T g_k + (p_k^T p_k)^2 / (p_k^T Q p_k)}{p_k^T Q^{-1} p_k + g_k^T B (B^T Q B)^{-1} B^T g_k}
= \frac{q + p_k^T p_k / (p_k^T Q p_k)}{q + p_k^T Q^{-1} p_k / (p_k^T p_k)}
$$

where $q = g_k^T B (B^T Q B)^{-1} B^T g_k / (p_k^T p_k) \geq 0$. This has the form $(q + a)/(q + b)$ with

$$a = \frac{p_k^T p_k}{p_k^T Q p_k}, \qquad b = \frac{p_k^T Q^{-1} p_k}{p_k^T p_k}.$$

But for any $p_k$ it follows that $a \leq b$, since by the Cauchy–Schwarz inequality
$(p_k^T p_k)^2 \leq (p_k^T Q p_k)(p_k^T Q^{-1} p_k)$. Hence

$$\frac{q + a}{q + b} \geq \frac{a}{b}$$
and thus

$$\frac{E(x_k) - E(x_{k+1})}{E(x_k)} \geq \frac{(p_k^T p_k)^2}{(p_k^T Q p_k)(p_k^T Q^{-1} p_k)}$$

Finally,

$$E(x_{k+1}) \leq \left[1 - \frac{(p_k^T p_k)^2}{(p_k^T Q p_k)(p_k^T Q^{-1} p_k)}\right] E(x_k) \leq (1 - \delta)E(x_k)$$

since $B^T p_k = 0$.
The value $\delta$ associated with the above theorem is related to the eigenvalue
structure of Q. If p were allowed to vary over the whole space, then the Kantorovich
inequality

$$\frac{(p^T p)^2}{(p^T Q p)(p^T Q^{-1} p)} \geq \frac{4aA}{(a + A)^2} \tag{63}$$

where a and A are, respectively, the smallest and largest eigenvalues of Q, gives
explicitly

$$\delta = \frac{4aA}{(a + A)^2}.$$
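The Kantorovich bound is tight: equality holds for an equal mixture of the eigenvectors belonging to a and A. A small numerical check, assuming NumPy, with an illustrative diagonal Q:

```python
import numpy as np

# Check of the Kantorovich inequality (63) on an illustrative matrix.
Q = np.diag([1.0, 2.0, 5.0, 10.0])        # eigenvalues a = 1, ..., A = 10
a, A = 1.0, 10.0
bound = 4 * a * A / (a + A) ** 2          # 4aA/(a+A)^2 = 40/121

Q_inv = np.linalg.inv(Q)

def ratio(p):
    return (p @ p) ** 2 / ((p @ Q @ p) * (p @ Q_inv @ p))

# The inequality holds for every p ...
rng = np.random.default_rng(1)
assert all(ratio(rng.standard_normal(4)) >= bound - 1e-12 for _ in range(1000))

# ... with equality for an equal mix of the extreme eigenvectors.
p_worst = np.array([1.0, 0.0, 0.0, 1.0])
print(ratio(p_worst), bound)
```

For `p_worst` the ratio is $4 / (11 \cdot 1.1) = 40/121$, exactly the bound.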
When p is restricted to the nullspace of $B^T$, the corresponding value of $\delta$ is larger.
In some special cases it is possible to obtain a fairly explicit estimate of $\delta$. Suppose,
for example, that the subspace N were the subspace spanned by m eigenvectors of
Q. Then the subspace in which p is allowed to vary is the space orthogonal to N
and is thus, in this case, the space generated by the other n − m eigenvectors of Q.
In this case, since for p in $N^\perp$ (the space orthogonal to N) both $Qp$ and $Q^{-1}p$ are
also in $N^\perp$, the ratio $\delta$ satisfies

$$\delta = \min_{p \in N^\perp} \frac{(p^T p)^2}{(p^T Q p)(p^T Q^{-1} p)} = \frac{4aA}{(a + A)^2}$$
where now a and A are, respectively, the smallest and largest of the n − m eigen-
values of Q corresponding to $N^\perp$. Thus the convergence ratio (58) reduces to the
familiar form

$$E(x_{k+1}) \leq \left(\frac{A - a}{A + a}\right)^2 E(x_k)$$

where a and A are these special eigenvalues.
where a and A are these special eigenvalues. Thus, if B, or equivalently N , is chosen
to include the eigenvectors corresponding to the most undesirable eigenvalues of
Q, the convergence rate of the combined method will be quite attractive.
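This effect is easy to observe numerically. In the sketch below (NumPy assumed; the data and names are illustrative), Q has eigenvalues 1, 2, 100, 1000 and B spans the eigenvectors of the two worst eigenvalues, so the rate is governed by a = 1, A = 2 alone and the per-step ratio should never exceed $((2-1)/(2+1))^2 = 1/9$:

```python
import numpy as np

# N = span of the eigenvectors for the two worst eigenvalues of Q.
Q = np.diag([1.0, 2.0, 100.0, 1000.0])
b = np.zeros(4)                          # take x* = 0, so E(x) = 1/2 x^T Q x
B = np.eye(4)[:, 2:]                     # eigenvectors for 100 and 1000

def E(x):
    return 0.5 * x @ Q @ x

def step(x):
    g = Q @ x - b
    u = -np.linalg.solve(B.T @ Q @ B, B.T @ g)
    z = x + B @ u
    p = b - Q @ z
    return z + (p @ p) / (p @ Q @ p) * p

x = np.ones(4)
rates = []
for _ in range(10):
    x_next = step(x)
    rates.append(E(x_next) / E(x))       # per-step ratio E(x_{k+1}) / E(x_k)
    x = x_next
print(max(rates))                        # stays below ((A - a)/(A + a))^2 = 1/9
```

Plain steepest descent on the same Q would face the full condition number, with a worst-case ratio of roughly $((1000-1)/(1000+1))^2 \approx 0.996$ per step.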
Applications
The combination of steepest descent and Newton’s method can be applied usefully
in a number of important situations. Suppose, for example, we are faced with a
problem of the form
$$\text{minimize} \quad f(x, y)$$

where $x \in E^n$, $y \in E^m$, and where the second partial derivatives with respect to x
are easily computable but those with respect to y are not. We may then employ
Newton steps with respect to x and steepest descent with respect to y.
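A minimal sketch of this splitting, assuming NumPy; the objective f, the fixed step size s, and all names here are illustrative assumptions (in practice the steepest-descent step in y would use a line search):

```python
import numpy as np

# Hypothetical smooth convex objective with a cheap Hessian in x only:
#   f(x, y) = 1/2 x^T Qx x + x^T y + sum(cosh(y) - 1)
Qx = np.array([[4.0, 1.0], [1.0, 3.0]])  # Hessian of f with respect to x

def f(x, y):
    return 0.5 * x @ Qx @ x + x @ y + np.sum(np.cosh(y) - 1.0)

def grad_x(x, y):
    return Qx @ x + y                    # partial gradient in x

def grad_y(x, y):
    return x + np.sinh(y)                # partial gradient in y

def hess_x(x, y):
    return Qx                            # exact and easy to compute here

x, y = np.ones(2), np.ones(2)
s = 0.2                                  # fixed steepest-descent step in y
for _ in range(200):
    x = x - np.linalg.solve(hess_x(x, y), grad_x(x, y))  # Newton step in x
    y = y - s * grad_y(x, y)                             # gradient step in y
print(np.linalg.norm(grad_x(x, y)), np.linalg.norm(grad_y(x, y)))
```

Since f here is quadratic in x for fixed y, each Newton step minimizes over x exactly, and the slow steepest-descent dynamics are confined to the y variables.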