

we may set

\[ M_1 = 2\gamma\,\|F'(x^*)^{-1}\|, \qquad M_2 = K\,\|F'(x^*)\| \]

and obtain, using (4.7) and (6.5),

\[
(6.7)\qquad
\begin{aligned}
\|F'(x^*)e_+\| &\le (1 + M_1\delta)\,\|r\| + M_2\delta\,\|e_c\| \\
 &\le (1 + M_1\delta)(1 + M_0\delta)\,\eta_c\,\|F'(x^*)e_c\|
   + M_2\delta\,\|F'(x^*)^{-1}\|\,\|F'(x^*)e_c\| \\
 &\le \bigl((1 + M_1\delta)(1 + M_0\delta)\,\eta_c + M_2\delta\,\|F'(x^*)^{-1}\|\bigr)\,\|F'(x^*)e_c\| .
\end{aligned}
\]

Now let δ be small enough so that

\[ (1 + M_1\delta)(1 + M_0\delta)\,\eta + M_2\delta\,\|F'(x^*)^{-1}\| \le \bar\eta \]

and the proof is complete.
Note that the distance δ from the initial iterate to the solution may have
to be smaller for (6.4) to hold than for (6.2). However (6.4) is remarkable in its
assertion that any method that produces a step with a linear residual less than
that of the zero step will reduce the norm of the error if the error is measured
with the weighted norm

\[ \|\cdot\|_* = \|F'(x^*)\,\cdot\,\| . \]

In fact, Theorem 6.1.3 asserts q-linear convergence in the weighted norm if the
initial iterate is sufficiently near x*. This is made precise in Theorem 6.1.4.
The application of Theorem 6.1.3 described in Theorem 6.1.4 differs from
Theorem 6.1.2 in that we do not demand that {η_n} be bounded away from 1
by a sufficiently large amount, just that 1 not be an accumulation point. The
importance of this theorem for implementation is that choices of the sequence
of forcing terms {η_n} (such as η_n = .5 for all n) that try to minimize the
number of inner iterations are completely justified by this result if the initial
iterate is sufficiently near the solution. We make such a choice in one of the
examples considered in § 6.4 and compare it to a modification of a choice from




[69] that decreases η_n rapidly with a view toward minimizing the number of
outer iterations.
Theorem 6.1.4. Let the standard assumptions hold. Then there is δ such
that if x_0 ∈ B(δ) and {η_n} ⊂ [0, η] with η < η̄ < 1, then the inexact Newton
iteration

\[ x_{n+1} = x_n + s_n , \]

where

\[ \|F'(x_n)s_n + F(x_n)\| \le \eta_n\,\|F(x_n)\| , \]

converges q-linearly with respect to ‖·‖_* to x*. Moreover

• if η_n → 0 the convergence is q-superlinear, and

• if η_n ≤ K_η ‖F(x_n)‖^p for some K_η > 0 the convergence is q-superlinear
with q-order 1 + p.
Proof. The proof uses both (6.4) and (6.2). Our first task is to relate the
norms ‖·‖ and ‖·‖_* so that we can estimate δ. Let δ_0 be such that (6.4) holds
for ‖e_c‖ < δ_0.
For all x ∈ R^N,

\[ (6.8)\qquad \|x\|_* \le \|F'(x^*)\|\,\|x\| \le \kappa(F'(x^*))\,\|x\|_* . \]

Note that ‖e‖ < δ_0 if

\[ \|e\|_* < \delta_* = \|F'(x^*)^{-1}\|^{-1}\delta_0 . \]

Set δ = ‖F'(x*)‖^{-1} δ_*. Then ‖e_0‖_* < δ_* if ‖e_0‖ < δ.
Our proof does not rely on the (possibly false) assertions that ‖e_n‖ < δ for
all n or that x_n → x* q-linearly with respect to the unweighted norm. Rather
we note that (6.4) implies that if ‖e_n‖_* < δ_* then

\[ (6.9)\qquad \|e_{n+1}\|_* \le \bar\eta\,\|e_n\|_* < \delta_* \]

and hence x_n → x* q-linearly with respect to the weighted norm. This proves
the first assertion.
To prove the assertions on q-superlinear convergence, note that since
x_n → x*, eventually ‖e_n‖ will be small enough so that the conclusions of
Theorem 6.1.1 hold. The superlinear convergence results then follow exactly
as in the proof of Theorem 6.1.2.
We close this section with some remarks on the two types of results in
Theorems 6.1.2 and 6.1.4. The most significant difference is the norm in which
q-linear convergence takes place. q-linear convergence with respect to the
weighted norm ‖·‖_* is equivalent to q-linear convergence of the sequence of
nonlinear residuals {‖F(x_n)‖}. We state this result as Proposition 6.1.1 and
leave the proof to the reader in Exercise 6.5.2. The implications for the choice of
the forcing terms {η_n} are also different. While Theorem 6.1.2 might lead one
to believe that a small value of η_n is necessary for convergence, Theorem 6.1.4
shows that the sequence of forcing terms need only be kept bounded away from




1. A constant sequence such as η_n = .5 may well suffice for linear convergence,
but at a price of a greater number of outer iterations. We will discuss other
factors in the choice of forcing terms in § 6.3.
Proposition 6.1.1. Let the standard assumptions hold and let x_n → x*.
Then ‖F(x_n)‖ converges q-linearly to 0 if and only if ‖e_n‖_* does.

6.1.3. Errors in the function. In this section, we state two theorems on
inexact local improvement. We leave the proofs, which are direct analogs to
the proofs of Theorems 6.1.1 and 6.1.3, to the reader as exercises. Note that
errors in the derivative, such as those arising from a difference approximation
of the action of F'(x_c) on a vector, can be regarded as part of the inexact
solution of the equation for the Newton step. See [55] or Proposition 6.2.1 and
its proof for an illustration of this point.
Theorem 6.1.5. Let the standard assumptions hold. Then there are δ and
K_I such that if x_c ∈ B(δ), s satisfies

\[ (6.10)\qquad \|F'(x_c)s + F(x_c) + \epsilon(x_c)\| \le \eta_c\,\|F(x_c) + \epsilon(x_c)\| , \]

and x_+ = x_c + s, then

\[ (6.11)\qquad \|e_+\| \le K_I\bigl((\|e_c\| + \eta_c)\,\|e_c\| + \|\epsilon(x_c)\|\bigr) . \]

Theorem 6.1.6. Let the standard assumptions hold. Then there is δ such
that if x_c ∈ B(δ), s satisfies (6.10), x_+ = x_c + s, and η_c ≤ η < η̄ < 1, then
there is K_E such that

\[ (6.12)\qquad \|F'(x^*)e_+\| \le \bar\eta\,\|F'(x^*)e_c\| + K_E\,\|\epsilon(x_c)\| . \]


6.2. Newton-iterative methods
A Newton-iterative method realizes (6.1) with an iterative method for the linear
system for the Newton step, terminating when the relative linear residual is
smaller than η_c (i.e., when (6.1) holds). The name of the method indicates
which linear iterative method is used; for example, Newton-SOR, Newton-CG,
and Newton-GMRES use SOR, CG, and GMRES, respectively, to solve the linear system.
This naming convention is taken from [175] and [145]. These methods have also
been called truncated Newton methods [56], [136] in the context of optimization.
Typically the nonlinear iteration that generates the sequence {x_n} is called
the outer iteration and the linear iteration that generates the approximations
to the steps is called the inner iteration.
In this section we use the l2 norm to measure nonlinear residuals. The
reason for this is that the Krylov methods use the scalar product and the
estimates in Chapters 2 and 3 are in terms of the Euclidean norm. In the
examples discussed in § 6.4 we will scale the l2 norm by a factor of 1/√N so
that the results for the differential and integral equations will be independent
of the computational mesh.




6.2.1. Newton-GMRES. We provide a detailed analysis of the Newton-
GMRES iteration. We begin by discussing the effects of a forward difference
approximation to the action of the Jacobian on a vector.
If the linear iterative method is any one of the Krylov subspace methods
discussed in Chapters 2 and 3 then each inner iteration requires at least one
evaluation of the action of F'(x_c) on a vector. In the case of CGNR and CGNE,
an evaluation of the action of F'(x_c)^T on a vector is also required. In many
implementations [20], [24], the action of F'(x_c) on a vector w is approximated
by a forward difference, (5.15), D_h F(x : w) for some h. It is important to note
that this is entirely different from forming the finite difference Jacobian ∇_h F(x)
and applying that matrix to w. In fact, as pointed out in [20], application
of GMRES to the linear equation for the Newton step with matrix-vector
products approximated by finite differences is the same as the application of
GMRES to the matrix G_h F(x) whose last k − 1 columns are the vectors

\[ v_k = D_h F(x : v_{k-1}) . \]

The sequence {v_k} is formed in the course of the forward-difference GMRES
iteration.
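As a concrete illustration, the following is a minimal MATLAB sketch of such a forward difference directional derivative. The name dirder_fd and the guard max(norm(x),1) for x near zero are assumptions of this sketch; the difference increment is scaled by h‖x‖_2 as in the analysis of Proposition 6.2.1 below, which need not match the scaling used in the actual dirder code in the collection.

function z = dirder_fd(x, w, F, f0, h)
% Sketch of a forward-difference approximation to the action of F'(x) on w,
% i.e., D_h F(x : w).  f0 = F(x) is passed in so it is not recomputed.
  if norm(w) == 0
      z = zeros(size(x));
      return
  end
  del = h*max(norm(x), 1);          % difference increment (assumed scaling)
  z   = (F(x + del*w) - f0)/del;    % first order accurate in h
end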
To illustrate this point, we give the forward-difference GMRES algorithm
for computation of the Newton step, s = −F'(x)^{-1} F(x). Note that matrix-
vector products are replaced by forward difference approximations to the action of F'(x),
that a sequence of approximate steps {s_k} is produced, that b = −F(x), and that the
initial iterate for the linear problem is the zero vector. We give a MATLAB
implementation of this algorithm in the collection of MATLAB codes.
Algorithm 6.2.1. fdgmres(s, x, F, h, η, kmax, ρ)

1. s = 0, r = −F(x), v_1 = r/‖r‖_2, ρ = ‖r‖_2, β = ρ, k = 0

2. While ρ > η‖F(x)‖_2 and k < kmax do

   (a) k = k + 1
   (b) v_{k+1} = D_h F(x : v_k)
       for j = 1, . . . , k
          i. h_{jk} = v_{k+1}^T v_j
          ii. v_{k+1} = v_{k+1} − h_{jk} v_j
   (c) h_{k+1,k} = ‖v_{k+1}‖_2
   (d) v_{k+1} = v_{k+1}/‖v_{k+1}‖_2
   (e) e_1 = (1, 0, . . . , 0)^T ∈ R^{k+1}
       Minimize ‖βe_1 − H_k y^k‖_{R^{k+1}} to obtain y^k ∈ R^k.
   (f) ρ = ‖βe_1 − H_k y^k‖_{R^{k+1}}.

3. s = V_k y^k.
In our MATLAB implementation we solve the least squares problem in
step 2e by the Givens rotation approach used in Algorithm gmres.
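The MATLAB sketch below shows how the pieces of Algorithm fdgmres fit together, using the dirder_fd helper sketched earlier. It is illustrative only: the name fdgmres_sketch is hypothetical, the least squares problem in step 2e is solved with backslash rather than Givens rotations, and no breakdown or restart logic is included.

function [s, rho] = fdgmres_sketch(x, F, h, eta, kmax)
% Sketch of Algorithm 6.2.1: forward-difference GMRES for the Newton step.
  f0  = F(x);
  n   = length(x);
  r   = -f0;
  rho = norm(r);  beta = rho;
  V   = zeros(n, kmax+1);  H = zeros(kmax+1, kmax);
  V(:,1) = r/rho;
  k = 0;  y = zeros(0,1);
  while rho > eta*norm(f0) && k < kmax
      k = k + 1;
      V(:,k+1) = dirder_fd(x, V(:,k), F, f0, h);   % D_h F(x : v_k)
      for j = 1:k                                  % modified Gram-Schmidt
          H(j,k)   = V(:,k+1)'*V(:,j);
          V(:,k+1) = V(:,k+1) - H(j,k)*V(:,j);
      end
      H(k+1,k) = norm(V(:,k+1));
      V(:,k+1) = V(:,k+1)/H(k+1,k);
      e1  = zeros(k+1,1);  e1(1) = beta;
      y   = H(1:k+1,1:k)\e1;                       % least squares, step 2e
      rho = norm(e1 - H(1:k+1,1:k)*y);             % linear residual norm
  end
  s = V(:,1:k)*y;                                  % s = V_k y^k
end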




As should be clear from inspection of Algorithm gmres, the difference
between ∇_h F and G_h F is that the columns are computed by taking directional
derivatives based on two different orthonormal bases: ∇_h F using the basis of
unit vectors in coordinate directions and G_h F the basis for the Krylov space K_k
constructed by Algorithm fdgmres. The action of G_h F on K_k^⊥, the orthogonal
complement of K_k, is not used by the algorithm and, for the purposes of
analysis, could be specified by the action of ∇_h F on any basis for K_k^⊥. Hence
G_h F is also first order accurate in h. Assuming that there is no error in the
evaluation of F we may summarize the observations above in the following
proposition, which is a special case of the more general results in [20].
Proposition 6.2.1. Let the standard assumptions hold and let η ∈ (0, 1).
Then there are C_G, h̄, and δ such that if x ∈ B(δ), h ≤ h̄, and Algo-
rithm fdgmres terminates with k < kmax, then the computed step s satisfies

\[ (6.13)\qquad \|F'(x)s + F(x)\|_2 < (\eta + C_G h)\,\|F(x)\|_2 . \]


Proof. First let δ be small enough so that B(δ) ⊂ Ω and the conclusions
of Lemma 4.3.1 hold. Let {u_j} be any orthonormal basis for R^N such that
u_j = v_j for 1 ≤ j ≤ k, where {v_j}_{j=1}^k is the orthonormal basis for K_k generated
by Algorithm fdgmres.
Consider the linear map defined by its action on {u_j},

\[ B u_j = D_h F(x : u_j) . \]

Note that

\[
D_h F(x : u_j) = \int_0^1 F'(x + t h \|x\|_2 u_j)\,u_j\,dt
 = F'(x)u_j + \int_0^1 \bigl(F'(x + t h \|x\|_2 u_j) - F'(x)\bigr)u_j\,dt .
\]

Since F' is Lipschitz continuous and {u_j} is an orthonormal basis for R^N we
have, with γ̄ = γ(‖x*‖_2 + δ),

\[ (6.14)\qquad \|B - F'(x)\|_2 \le h\|x\|_2\gamma/2 \le h\bar\gamma/2 . \]

Since the linear iteration (fdgmres) terminates in k < kmax iterations, we
have, since B and G_h F agree on K_k,

\[ \|Bs + F(x)\|_2 \le \eta\,\|F(x)\|_2 \]

and therefore

\[ (6.15)\qquad \|F'(x)s + F(x)\|_2 \le \eta\,\|F(x)\|_2 + h\bar\gamma\,\|s\|_2/2 . \]

Assume that

\[ h\bar\gamma \le \|F'(x^*)^{-1}\|_2^{-1}/2 . \]





Then Lemma 4.3.1 and (6.15) imply

\[
\|F'(x^*)^{-1}\|_2^{-1}\|s\|_2/2 \le \|F'(x)^{-1}\|_2^{-1}\|s\|_2
 \le \|F'(x)s\|_2 \le (1 + \eta)\,\|F(x)\|_2 + h\bar\gamma\,\|s\|_2/2 .
\]

Therefore,

\[ (6.16)\qquad \|s\|_2 \le 4(1 + \eta)\,\|F'(x^*)^{-1}\|_2\,\|F(x)\|_2 . \]

Combining (6.16) with (6.15) completes the proof with C_G = 4γ̄(1 + η)‖F'(x*)^{-1}‖_2.
Proposition 6.2.1 implies that a finite difference implementation will not
affect the performance of Newton-GMRES if the steps in the forward difference
approximation of the derivatives are sufficiently small. In an implementation,
therefore, we must specify not only the sequence {η_n}, but also the steps {h_n}
used to compute forward differences. Note also that Proposition 6.2.1 applies
to a restarted GMRES because upon successful termination (6.15) will hold.
We summarize our results so far in the analogs of Theorem 6.1.4 and
Theorem 6.1.2. The proofs are immediate consequences of Theorems 6.1.2,
6.1.4, and Proposition 6.2.1 and are left as (easy) exercises.
Theorem 6.2.1. Assume that the assumptions of Proposition 6.2.1 hold.
Then there are δ and σ̄ such that if x_0 ∈ B(δ) and the sequences {η_n} and {h_n}
satisfy

\[ \sigma_n = \eta_n + C_G h_n \le \bar\sigma , \]

then the forward difference Newton-GMRES iteration

\[ x_{n+1} = x_n + s_n , \]

where s_n is computed by Algorithm fdgmres with arguments

\[ (s_n, x_n, F, h_n, \eta_n, kmax, \rho) , \]

converges q-linearly and s_n satisfies

\[ (6.17)\qquad \|F'(x_n)s_n + F(x_n)\|_2 < \sigma_n\,\|F(x_n)\|_2 . \]

Moreover,

• if σ_n → 0 the convergence is q-superlinear, and

• if σ_n ≤ K_η ‖F(x_n)‖_2^p for some K_η > 0 the convergence is q-superlinear
with q-order 1 + p.
Theorem 6.2.2. Assume that the assumptions of Proposition 6.2.1 hold.
Then there is δ such that if x_0 ∈ B(δ) and the sequences {η_n} and {h_n} satisfy

\[ \sigma_n = \eta_n + C_G h_n \in [0, \bar\sigma] \]

with 0 < σ̄ < 1, then the forward difference Newton-GMRES iteration

\[ x_{n+1} = x_n + s_n , \]




where s_n is computed by Algorithm fdgmres with arguments

\[ (s_n, x_n, F, h_n, \eta_n, kmax, \rho) , \]

converges q-linearly with respect to ‖·‖_* and s_n satisfies (6.17). Moreover

• if σ_n → 0 the convergence is q-superlinear, and

• if σ_n ≤ K_η ‖F(x_n)‖_2^p for some K_η > 0 the convergence is q-superlinear
with q-order 1 + p.
In the case where h_n is approximately the square root of machine roundoff,
σ_n ≈ η_n if η_n is not too small, and Theorem 6.2.2 states that the observed
behavior of the forward difference implementation will be the same as that of
an implementation with exact derivatives.

6.2.2. Other Newton-iterative methods. Other iterative methods may
be used as the solver for the linear equation for the Newton step. Newton-
multigrid [99] is one approach using a stationary iterative method. See also
[6], [111]. If F' is symmetric and positive definite, as it is in the context of
unconstrained minimization [63], Newton-CG is a possible approach.
When storage is restricted or the problem is very large, GMRES may not
be practical. GMRES(m), for a small m, may not converge rapidly enough
to be useful. CGNR and CGNE offer alternatives, but require evaluation of
transpose-vector products. Bi-CGSTAB and TFQMR should be considered in
all cases where there is not enough storage for GMRES(m) to perform well.
If a transpose is needed, as it will be if CGNR or CGNE is used as the
iterative method, a forward difference formulation is not an option because
approximation of the transpose-vector product by differences is not possible.
Computation of ∇_h F(x) and its transpose analytically is one option, as is
differentiation of F automatically or by hand. The reader may explore this
further in Exercise 6.5.13.
The reader may also be interested in experimenting with the other Krylov
subspace methods described in § 3.6, [78], and [12].

6.3. Newton-GMRES implementation
We provide a MATLAB code nsolgm in the collection that implements
a forward-difference Newton-GMRES algorithm. This algorithm requires
different inputs than nsol. As input we must give the initial iterate, the
function, and the vector of termination tolerances, as we did for nsol. In addition,
we must provide a method for forming the sequence of forcing terms {η_n}.
We adjust η as the iteration progresses with a variation of a choice from
[69]. This issue is independent of the particular linear solver and our discussion
is in the general inexact Newton method setting. Setting η to a constant for
the entire iteration is often a reasonable strategy, as we see in § 6.4, but the
choice of that constant depends on the problem. If a constant η is too small,
much effort can be wasted in the initial stages of the iteration. The choice in




[69] is motivated by a desire to avoid such oversolving. Oversolving means
that the linear equation for the Newton step is solved to a precision far beyond
what is needed to correct the nonlinear iteration. As a measure of the degree
to which the nonlinear iteration approximates the solution we begin with

\[ \eta_n^A = \gamma\,\|F(x_n)\|_2^2 / \|F(x_{n-1})\|_2^2 , \]

where γ ∈ (0, 1] is a parameter. If η_n^A is uniformly bounded away from 1,
then setting η_n = η_n^A for n > 0 would guarantee q-quadratic convergence by
Theorem 6.1.1. In this way, the most information possible would be extracted
from the inner iteration. In order to specify the choice at n = 0 and bound
the sequence away from 1 we set

\[
(6.18)\qquad \eta_n^B =
\begin{cases}
\eta_{\max}, & n = 0, \\
\min(\eta_{\max},\, \eta_n^A), & n > 0.
\end{cases}
\]

In (6.18) the parameter η_max is an upper limit on the sequence {η_n}. In [69]
the choices γ = .9 and η_max = .9999 are used.
It may happen that η_n^B is small for one or more iterations while x_n is still
far from the solution. A method of safeguarding was suggested in [69] to avoid
volatile decreases in η_n. The idea is that if η_{n−1} is sufficiently large we do not
let η_n decrease by much more than a factor of η_{n−1}:

\[
(6.19)\qquad \eta_n^C =
\begin{cases}
\eta_{\max}, & n = 0, \\
\min(\eta_{\max},\, \eta_n^A), & n > 0,\ \gamma\eta_{n-1}^2 \le .1, \\
\min(\eta_{\max},\, \max(\eta_n^A,\, \gamma\eta_{n-1}^2)), & n > 0,\ \gamma\eta_{n-1}^2 > .1.
\end{cases}
\]
The constant .1 is somewhat arbitrary. This safeguarding does improve the
performance of the iteration.
There is a chance that the final iterate will reduce ‖F‖ far beyond the
desired level and that the solution of the linear equation for the last step will
be computed to more accuracy than is really needed. This oversolving on the final
step can be controlled by comparing the norm of the current nonlinear residual
‖F(x_n)‖ to the nonlinear residual norm at which the iteration would terminate,

\[ \tau_t = \tau_a + \tau_r\,\|F(x_0)\| , \]

and bounding η_n from below by a constant multiple of τ_t/‖F(x_n)‖. The
algorithm nsolgm does this and uses

\[ (6.20)\qquad \eta_n = \min(\eta_{\max},\, \max(\eta_n^C,\ .5\,\tau_t/\|F(x_n)\|)) . \]
Exercise 6.5.9 asks the reader to modify nsolgm so that other choices of
the sequence {η_n} can be used.
We use dirder to approximate directional derivatives and use the default
value of h = 10^{−7}. In all the examples in this book we use the value γ = .9 as
recommended in [69].
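A minimal MATLAB sketch of the forcing term selection (6.18)-(6.20) is given below. The function name forcing_term and its argument list are assumptions of this sketch, not the interface used inside nsolgm; fnrm and fnrmold are the current and previous nonlinear residual norms, etaold is the previous forcing term, and taut is the termination level τ_t = τ_a + τ_r‖F(x_0)‖.

function eta = forcing_term(n, fnrm, fnrmold, etaold, gamma, etamax, taut)
% Sketch of the safeguarded forcing term choice (6.18)-(6.20).
  if n == 0
      eta = etamax;                                 % first iteration
  else
      etaA = gamma*(fnrm/fnrmold)^2;                % eta_n^A
      if gamma*etaold^2 <= 0.1                      % (6.19): safeguard inactive
          etaC = min(etamax, etaA);
      else                                          % (6.19): safeguard active
          etaC = min(etamax, max(etaA, gamma*etaold^2));
      end
      eta = min(etamax, max(etaC, 0.5*taut/fnrm));  % (6.20): limit oversolving
  end
end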




Algorithm 6.3.1. nsolgm(x, F, τ, η)

1. r_c = r_0 = ‖F(x)‖_2/√N

2. Do while ‖F(x)‖_2/√N > τ_r r_0 + τ_a

   (a) Select η.
   (b) fdgmres(s, x, F, η)
   (c) x = x + s
   (d) Evaluate F(x)
   (e) r_+ = ‖F(x)‖_2/√N, σ = r_+/r_c, r_c = r_+
   (f) If ‖F(x)‖_2/√N ≤ τ_r r_0 + τ_a exit.
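The MATLAB sketch below shows this outer loop written in terms of the hypothetical helpers forcing_term and fdgmres_sketch given earlier; the argument list differs from that of the actual nsolgm and is an assumption of the sketch. For clarity F(x) is re-evaluated inside fdgmres_sketch, whereas the actual code reuses the value computed in the outer loop.

function x = nsolgm_sketch(x, F, tau_r, tau_a, etamax, gamma, h, kmax)
% Sketch of Algorithm 6.3.1 with forcing terms chosen by (6.18)-(6.20).
  N    = length(x);
  fnrm = norm(F(x))/sqrt(N);
  r0   = fnrm;  fnrmold = fnrm;  eta = etamax;
  taut = tau_a + tau_r*r0;                       % termination level
  n = 0;
  while fnrm > tau_r*r0 + tau_a
      eta = forcing_term(n, fnrm, fnrmold, eta, gamma, etamax, taut);
      s   = fdgmres_sketch(x, F, h, eta, kmax);  % inexact Newton step
      x   = x + s;
      fnrmold = fnrm;
      fnrm    = norm(F(x))/sqrt(N);
      n = n + 1;
  end
end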


Note that the cost of an outer iterate is one evaluation of F to compute
the value at the current iterate and other evaluations of F to compute the
forward differences for each inner iterate. Hence if a large value of η can be
used, the cost of the entire iteration can be greatly reduced. If a maximum
of m GMRES iterations is needed (or GMRES(m) is used as a linear solver)
the storage requirements of Algorithm nsolgm are the m + 5 vectors x, F(x),
x + hv, F(x + hv), s, and the Krylov basis {v_k}_{k=1}^m.
Since GMRES forms the inner iterates and makes termination decisions
based on scalar products and l2 norms, we also terminate the outer iteration on
small l2 norms of the nonlinear residuals. However, because of our applications
to differential and integral equations, we scale the l2 norm of the nonlinear
residual by a factor of 1/√N so that constant functions will have norms that
are independent of the computational mesh.
For large and poorly conditioned problems, GMRES will have to be
restarted. We illustrate the consequences of this in § 6.4 and ask the reader to
make the necessary modifications to nsolgm in Exercise 6.5.9.
The preconditioners we use in § 6.4 are independent of the outer iteration.
Since the Newton steps for MF(x) = 0 are the same as those for F(x) = 0, it
is equivalent to precondition the nonlinear equation before calling nsolgm.
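In MATLAB this preconditioning amounts to composing function handles. A two-line sketch, assuming M is a handle that applies the fixed preconditioner to a vector and nsolgm_sketch is the hypothetical driver above, is

Fp = @(x) M(F(x));    % Newton steps for M F(x) = 0 equal those for F(x) = 0
x  = nsolgm_sketch(x0, Fp, tau_r, tau_a, etamax, gamma, h, kmax);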

6.4. Examples for Newton-GMRES
In this section we consider two examples. We revisit the H-equation and solve
a preconditioned nonlinear convection-diffusion equation.
In all the figures we plot the relative nonlinear residual ‖F(x_n)‖_2/‖F(x_0)‖_2
against the number of function evaluations required by all inner and outer
iterations to compute x_n. Counts of function evaluations corresponding to
outer iterations are indicated by circles. From these plots one can compare
not only the number of outer iterations, but also the total cost. This enables
us to directly compare the costs of different strategies for selection of η. Note
that if only a single inner iteration is needed, the total cost of the outer iteration
will be two function evaluations since F(x_c) will be known. One new function
evaluation will be needed to approximate the action of F'(x_c) on a vector




and then F(x_+) will be evaluated to test for termination. We also count
the evaluation of F(x_0) as an additional cost in the evaluation of x_1, which
therefore has a minimum cost of three function evaluations. For each example
we compare a constant value of η_n with the choice given in (6.20) and used as
the default in nsolgm.

6.4.1. Chandrasekhar H-equation. We solve the H-equation on a 100-
point mesh with c = .9 using two schemes for selection of the parameter η.
The initial iterate is the function identically one. We used τ_r = τ_a = 10^{−6} in
this example.
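For reference, a minimal MATLAB sketch of the discretized H-equation residual used here is given below. It assumes the composite midpoint rule discretization used for this problem in Chapter 5; the name heq_residual is hypothetical and the code in the collection is organized differently.

function f = heq_residual(h, c)
% Sketch of the discretized Chandrasekhar H-equation F(h) = 0 on an N-point
% composite midpoint mesh (an assumption; see Chapter 5 for the actual setup).
  N  = length(h);
  mu = ((1:N)' - 0.5)/N;                  % midpoint nodes in (0,1)
  A  = (c/(2*N)) * (mu ./ (mu + mu'));    % A(i,j) = c*mu_i/(2N*(mu_i + mu_j))
  f  = h - 1./(1 - A*h);
end

With F = @(h) heq_residual(h, .9) and initial iterate ones(100,1), this residual can be passed to the Newton-GMRES sketches above.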
In Fig. 6.1 we plot the progress of the iteration using η = .1 with the solid
line and using the sequence given by (6.20) with the dashed line. We set the
parameters in (6.20) to γ = .9 and η_max = .25. This choice of η_max was the
one that did best overall in our experiments on this problem.
We see that the constant η iteration terminates after 12 function evalua-
tions, 4 outer iterations, and roughly 275 thousand floating-point operations.
This is a slightly higher overall cost than the other approach, in which {η_n} is
given by (6.20) and which terminated after 10 function evaluations, 3 outer itera-
tions, and 230 thousand floating-point operations. The chord method with the
different (but very similar for this problem) l∞ termination condition for the
same problem, reported in § 5.6, incurred a cost of 3.3 million floating-point
operations, slower by a factor of over 10. This cost is entirely a result of the
computation and factorization of a single Jacobian.

[Figure 6.1 about here: relative nonlinear residual (log scale) versus function evaluations.]




Fig. 6.1. Newton-GMRES for the H-equation, c = .9.

In Fig. 6.2 the results change for the more ill-conditioned problem with
c = .9999. Here the iteration with η_n = .1 performed slightly better and





[Figure 6.2 about here: relative nonlinear residual (log scale) versus function evaluations.]




Fig. 6.2. Newton-GMRES for the H-equation, c = .9999
[Figure 6.3 about here: relative nonlinear residual (log scale) versus function evaluations.]




Fig. 6.3. Newton-GMRES for the PDE, C = 20.



terminated after 22 function evaluations, 7 outer iterations, and roughly
513 thousand floating-point operations. The iteration with the decreasing {η_n}
given by (6.20) required 23 function evaluations, 7 outer iterations, and 537
thousand floating-point operations.

6.4.2. Convection-diffusion equation. We consider the partial differen-
tial equation

\[ (6.21)\qquad -\nabla^2 u + Cu(u_x + u_y) = f \]




with homogeneous Dirichlet boundary conditions on the unit square (0, 1) ×
(0, 1). f has been constructed so that the exact solution was the discretization
of

\[ 10xy(1 - x)(1 - y)\exp(x^{4.5}) . \]

We set C = 20 and u_0 = 0. As in § 3.7 we discretized (6.21) on a 31 × 31 grid
using centered differences. To be consistent with the second-order accuracy of
the difference scheme we used τ_r = τ_a = h², where h = 1/32.
We compare the same possibilities for {η_n} as in the previous example and
report the results in Fig. 6.3. In (6.20) we set γ = .9 and η_max = .5. The
computation with constant η terminated after 19 function evaluations and 4
outer iterates at a cost of roughly 3 million floating-point operations. The
iteration with {η_n} given by (6.20) required 16 function evaluations, 4 outer
iterations, and 2.6 million floating-point operations. In the computation we
preconditioned (6.21) with the fast Poisson solver fish2d.
In this case, the choice of {η_n} given by (6.20) reduced the number of inner
iterations in the early stages of the iteration and avoided oversolving. In cases
where the initial iterate is very good and only a single Newton iterate might be
needed, however, a large choice of η_max or a constant η may be the more efficient
choice. Examples of such cases include (1) implicit time integration of nonlinear
parabolic partial differential equations or ordinary differential equations, in
which the initial iterate is either the converged solution from the previous
time step or the output of a predictor, and (2) solution of problems in which
the initial iterate is an interpolant of the solution from a coarser mesh.
Preconditioning is crucial for good performance. When preconditioning
was not done, the iteration for C = 20 required more than 80 function
evaluations to converge. In Exercise 6.5.9 the reader is asked to explore this.





6.5. Exercises on inexact Newton methods

6.5.1. Verify (6.8).

6.5.2. Prove Proposition 6.1.1 by showing that if the standard assumptions
hold, 0 < ε < 1, and x is sufficiently near x*, then

\[ \|F(x)\|/(1 + \epsilon) \le \|F'(x^*)e\| \le (1 + \epsilon)\,\|F(x)\| . \]

6.5.3. Verify (6.14).

6.5.4. Prove Theorems 6.2.2 and 6.2.1.

6.5.5. Prove Theorems 6.1.5 and 6.1.6.

6.5.6. Can anything like Proposition 6.2.1 be proved for a finite-difference Bi-
CGSTAB or TFQMR linear solver? If not, what are some possible
implications?

6.5.7. Give an example in which F(x_n) → 0 q-linearly but x_n does not converge
to x* q-linearly.

6.5.8. In the DASPK code for integration of differential algebraic equations,
[23], preconditioned Newton-GMRES is implemented in such a way
that the data needed for the preconditioner is only computed when
needed, which may be less often than with each outer iteration. Discuss
this strategy and when it might be useful. Is there a relation to the
Shamanskii method?

6.5.9. Duplicate the results in § 6.4. For the H-equation experiment with
various values of c, such as c = .5, .99999, .999999, and for the convection-
diffusion equation with various values of C such as C = 1, 10, 40, and see how
preconditioning affects the iteration. Experiment with the sequence of
forcing terms {η_n}. How do the choices η_n = 1/(n + 1), η_n = 10^{−4}
[32], η_n = 2^{1−n} [24], η_n = min(‖F(x_n)‖_2, (n + 2)^{−1}) [56], η_n = .05, and
η_n = .75 affect the results? Present the results of your experiments as
graphical iteration histories.

6.5.10. For the H-equation example in § 6.4, vary the parameter c and see how
the performance of Newton-GMRES with the various choices of {η_n} is
affected. What happens when c = 1? See [119] for an explanation of
this.

6.5.11. Are the converged solutions for the preconditioned and unpreconditioned
convection-diffusion example in § 6.4 equally accurate?

6.5.12. For the convection-diffusion example in § 6.4, how is the performance
affected if GMRES(m) is used instead of GMRES?




6.5.13. Apply Newton-CGNR, Newton-Bi-CGSTAB, or Newton-TFQMR to the
two examples in § 6.4. How will you deal with the need for a transpose in
the case of CGNR? How does the performance (in terms of time, floating-
point operations, and function evaluations) compare to Newton-GMRES?
Experiment with the forcing terms. The MATLAB codes fdkrylov,
which allows you to choose from a variety of forward difference Krylov
methods, and nsola, which we describe more fully in Chapter 8, might
be of use in this exercise.

6.5.14. If you have done Exercise 5.7.25, apply nsolgm to the same two-point
boundary value problem and compare the performance.












Chapter 7

Broyden's method

Quasi-Newton methods maintain approximations of the solution x* and the
Jacobian at the solution F'(x*) as the iteration progresses. If x_c and B_c are
the current approximate solution and Jacobian, then

\[ (7.1)\qquad x_+ = x_c - B_c^{-1} F(x_c) . \]

After the computation of x_+, B_c is updated to form B_+. The construction of
B_+ determines the quasi-Newton method.
The advantage of such methods, as we shall see in § 7.3, is that solution of
equations with the quasi-Newton approximation to the Jacobian is often much
cheaper than using F'(x_n) as the coefficient matrix. In fact, if the Jacobian is
dense, the cost is little more than that of the chord method. The method we
present in this chapter, Broyden's method, is locally superlinearly convergent,
and hence is a very powerful alternative to Newton's method or the chord
method.
In comparison with Newton-iterative methods, quasi-Newton methods
require only one function evaluation for each nonlinear iterate; there is no cost
in function evaluations associated with an inner iteration. Hence, if a good
preconditioner (initial approximation to F'(x*)) can be found, these methods
could have an advantage in terms of function evaluation cost over Newton-
iterative methods.
Broyden's method [26] computes B_+ by

\[ (7.2)\qquad B_+ = B_c + \frac{(y - B_c s)s^T}{s^T s} = B_c + \frac{F(x_+)s^T}{s^T s} . \]

In (7.2) y = F(x_+) − F(x_c) and s = x_+ − x_c.
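A minimal MATLAB sketch of the iteration (7.1)-(7.2) with a dense stored approximation B is given below. The name broyden_sketch, the stopping tolerance tol, and the iteration limit maxit are assumptions of this sketch, and a practical implementation would not store and refactor B in this way (see § 7.3).

function x = broyden_sketch(x, F, B, tol, maxit)
% Sketch of Broyden's method: x and B are the initial iterate and initial
% approximate Jacobian; tol and maxit are an assumed stopping rule.
  fx = F(x);
  for n = 1:maxit
      if norm(fx) <= tol, break; end
      s = -(B\fx);                   % (7.1): quasi-Newton step
      x = x + s;
      fxnew = F(x);
      B = B + (fxnew*s')/(s'*s);     % (7.2): B_+ = B_c + F(x_+) s^T / (s^T s)
      fx = fxnew;
  end
end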
Broyden's method is an example of a secant update. This means that the
updated approximation to F'(x*) satisfies the secant equation

\[ (7.3)\qquad B_+ s = y . \]

In one dimension, (7.3) uniquely specifies the classical secant method. For
equations in several variables (7.3) alone is a system of N equations in N²

unknowns. The role of the secant equation is very deep. See [62] and [64] for
discussion of this topic. In this book we focus on Broyden™s method and our
methods of analysis are not as general as those in [62] and [64]. In this chapter
we will assume that the function values are accurate and refer the reader to [66],
[65], and [114] for discussions of quasi-Newton methods with inaccurate data.
Locally superlinearly convergent secant methods can be designed to maintain
the sparsity pattern, symmetry, or positive de¬niteness, of the approximate
Jacobian, and we refer the reader to [28], [63], [62], and [64] for discussion and
analysis of several such methods. More subtle structural properties can also
be preserved by secant methods. Some examples can be found in [101], [96],
[95], [113], and [116].
Broyden™s method is also applicable to linear equations [29], [82], [84],
[104], [143], [130], Ax = b, where B ≈ A is updated. One can also apply the
method to linear least squares problems [84], [104]. The analysis is somewhat
clearer in the linear case and the plan of this chapter is to present results for
the linear case for both the theory and the examples. One interesting result,
which we will not discuss further, is that for linear problems Broyden™s method
converges in 2N iterations [29], [82], [84], [143].
We will express our results in terms of the error in the Jacobian

E = B ’ F (x— )
(7.4)

and the step s = x+ ’xc . While we could use E = B ’F (x), (7.4) is consistent
with our use of e = x ’ x— . When indexing of the steps is necessary, we will
write
sn = xn+1 ’ xn .
In this chapter we show that if the data x0 and B0 are su¬ciently good,
the Broyden iteration will converge q-superlinearly to the root.

7.1. The Dennis–Moré condition
In this section we consider general iterative methods of the form

\[ (7.5)\qquad x_{n+1} = x_n - B_n^{-1} F(x_n) , \]

where B_n = F'(x*) + E_n ≈ F'(x*) is generated by some method (not necessarily
Broyden's method).
Verification of q-superlinear convergence cannot be done by the direct
approaches used in Chapters 5 and 6. We must relate the superlinear
convergence condition that η_n → 0 in Theorem 6.1.2 to a more subtle, but
easier to verify, condition in order to analyze Broyden's method. This technique
is based on verification of the Dennis–Moré condition [61], [60] on the sequence
of steps {s_n} and errors in the Jacobian {E_n}:

\[ (7.6)\qquad \lim_{n\to\infty} \frac{\|E_n s_n\|}{\|s_n\|} = 0 . \]





The main results in this section, Theorem 7.1.1 and its corollary for linear
problems, are used to prove superlinear convergence for Broyden's method in
this chapter and for many other quasi-Newton methods. See [61], [28], [62], and
[64] for several more examples. Our formulation of the Dennis–Moré result is
a bit less general than that in [60] or [63].
Theorem 7.1.1. Let the standard assumptions hold, let {B_n} be a sequence
of nonsingular N × N matrices, let x_0 ∈ R^N be given, and let {x_n}_{n=1}^∞ be given
by (7.5). Assume that x_n ≠ x* for any n. Then x_n → x* q-superlinearly if
and only if x_n → x* and the Dennis–Moré condition (7.6) holds.
Proof. Since

\[ -F(x_n) = B_n s_n = F'(x^*)s_n + E_n s_n \]

we have

\[ (7.7)\qquad E_n s_n = -F'(x^*)s_n - F(x_n) = -F'(x^*)e_{n+1} + F'(x^*)e_n - F(x_n) . \]

We use the fundamental theorem of calculus and the standard assumptions to
obtain

\[ F'(x^*)e_n - F(x_n) = \int_0^1 \bigl(F'(x^*) - F'(x^* + te_n)\bigr)e_n\,dt \]

and hence

\[ \|F'(x^*)e_n - F(x_n)\| \le \gamma\|e_n\|^2/2 . \]

Therefore, by (7.7),

\[ (7.8)\qquad \|E_n s_n\| \le \|F'(x^*)e_{n+1}\| + \gamma\|e_n\|^2/2 . \]

Now, if x_n → x* q-superlinearly, then for n sufficiently large

\[ (7.9)\qquad \|s_n\|/2 \le \|e_n\| \le 2\|s_n\| . \]

The assumption that x_n ≠ x* for any n implies that the sequence

\[ (7.10)\qquad \nu_n = \|e_{n+1}\|/\|e_n\| \]

is defined, and ν_n → 0 by superlinear convergence of {x_n}. By (7.9) we
have

\[ \|e_{n+1}\| = \nu_n\|e_n\| \le 2\nu_n\|s_n\| , \]

and hence (7.8) implies that

\[ \|E_n s_n\| \le (2\|F'(x^*)\|\nu_n + \gamma\|e_n\|)\,\|s_n\| , \]

which implies the Dennis–Moré condition (7.6).
Conversely, assume that x_n → x* and that (7.6) holds. Let

\[ \mu_n = \frac{\|E_n s_n\|}{\|s_n\|} . \]




We have, since

\[ (7.11)\qquad E_n s_n = (B_n - F'(x^*))s_n = -F(x_n) - F'(x^*)s_n , \]

\[
(7.12)\qquad \|F'(x^*)^{-1}\|^{-1}\|s_n\| \le \|F'(x^*)s_n\| \le \|E_n s_n\| + \|F(x_n)\|
 = \mu_n\|s_n\| + \|F(x_n)\| .
\]

Since μ_n → 0 by assumption,

\[ (7.13)\qquad \mu_n \le \|F'(x^*)^{-1}\|^{-1}/2 \]

for n sufficiently large. We assume now that n is large enough so that (7.13)
holds. Hence,

\[ (7.14)\qquad \|s_n\| \le 2\|F'(x^*)^{-1}\|\,\|F(x_n)\| . \]

Moreover, using the standard assumptions and (7.11) again,

\[
(7.15)\qquad \|F(x_n) + F'(x_n)s_n\| \le \|F(x_n) + F'(x^*)s_n\| + \|(F'(x^*) - F'(x_n))s_n\|
 \le (\mu_n + \gamma\|e_n\|)\,\|s_n\| .
\]

Since x_n → x* by assumption, η_n → 0, where

\[ \eta_n = 2\|F'(x^*)^{-1}\|(\mu_n + \gamma\|e_n\|) . \]

Combining (7.14) and (7.15) implies that

\[ (7.16)\qquad \|F(x_n) + F'(x_n)s_n\| \le \eta_n\|F(x_n)\| , \]

which is the inexact Newton condition. Since η_n → 0, the proof is complete
by Theorem 6.1.2.

7.2. Convergence analysis
Our convergence analysis is based on an approach to infinite-dimensional
problems from [97]. That paper was the basis for [168], [104], and [115], and
we employ notation from all four of these references. We will use the l2 norm
throughout our discussion. We begin with a lemma from [104].
Lemma 7.2.1. Let 0 < θ̂ < 1 and let

\[ \{\theta_n\}_{n=0}^{\infty} \subset (\hat\theta,\, 2 - \hat\theta) . \]

Let {ε_n}_{n=0}^∞ ⊂ R^N be such that

\[ \sum_n \|\epsilon_n\|_2 < \infty , \]

and let {η_n}_{n=0}^∞ be a set of vectors in R^N such that ‖η_n‖_2 is either 1 or 0 for
all n. Let ψ_0 ∈ R^N be given. If {ψ_n}_{n=1}^∞ is given by

\[ (7.17)\qquad \psi_{n+1} = \psi_n - \theta_n(\eta_n^T\psi_n)\eta_n + \epsilon_n , \]

then

\[ (7.18)\qquad \lim_{n\to\infty} \eta_n^T\psi_n = 0 . \]





Proof. We first consider the case in which ε_n = 0 for all n. In that case, we
can use the fact that

\[ \theta_n(2 - \theta_n) > \hat\theta^2 > 0 \]

to show that the sequence {ψ_n} is bounded in the l2-norm by ‖ψ_0‖_2 and, in fact,

\[
\|\psi_{n+1}\|_2^2 \le \|\psi_n\|_2^2 - \theta_n(2 - \theta_n)(\eta_n^T\psi_n)^2
 \le \|\psi_n\|_2^2 - \hat\theta^2(\eta_n^T\psi_n)^2
 \le \|\psi_n\|_2^2 .
\]

Therefore, for any M > 0,

\[
\sum_{n=0}^{M} (\eta_n^T\psi_n)^2 \le \frac{\|\psi_0\|_2^2 - \|\psi_{M+1}\|_2^2}{\hat\theta^2}
 \le \frac{\|\psi_0\|_2^2}{\hat\theta^2} .
\]

We let M → ∞ to obtain

\[ (7.19)\qquad \sum_{n=0}^{\infty} (\eta_n^T\psi_n)^2 < \infty . \]

Convergence of the series in (7.19) implies convergence to zero of the terms in
the series, which is (7.18).
To prove the result for ε_n ≠ 0 we use the inequality

\[ (7.20)\qquad \sqrt{a^2 - b^2} \le a - \frac{b^2}{2a} , \]

which is valid for a > 0 and |b| ≤ a. This inequality is used often in the analysis
of quasi-Newton methods [63]. From (7.20) we conclude that if ψ_n ≠ 0 then

\[
\|\psi_n - \theta_n(\eta_n^T\psi_n)\eta_n\|_2 \le \sqrt{\|\psi_n\|_2^2 - \theta_n(2 - \theta_n)(\eta_n^T\psi_n)^2}
 \le \|\psi_n\|_2 - \frac{\theta_n(2 - \theta_n)(\eta_n^T\psi_n)^2}{2\|\psi_n\|_2} .
\]

Hence if ψ_n ≠ 0

\[ (7.21)\qquad \|\psi_{n+1}\|_2 \le \|\psi_n\|_2 - \frac{\theta_n(2 - \theta_n)(\eta_n^T\psi_n)^2}{2\|\psi_n\|_2} + \|\epsilon_n\|_2 . \]

Hence

\[ (7.22)\qquad (\eta_n^T\psi_n)^2 \le \frac{2\|\psi_n\|_2}{\theta_n(2 - \theta_n)}\bigl(\|\psi_n\|_2 - \|\psi_{n+1}\|_2 + \|\epsilon_n\|_2\bigr) , \]

which holds even if ψ_n = 0.
From (7.21) we conclude that

\[ \|\psi_{n+1}\|_2 \le \mu , \]





where

\[ \mu = \sum_{i=0}^{\infty} \|\epsilon_i\|_2 + \|\psi_0\|_2 . \]

Hence

\[
\sum_{n=0}^{M} (\eta_n^T\psi_n)^2 \le \frac{2\mu}{\hat\theta^2}\sum_{n=0}^{M}\bigl(\|\psi_n\|_2 - \|\psi_{n+1}\|_2 + \|\epsilon_n\|_2\bigr)
 = \frac{2\mu}{\hat\theta^2}\Bigl(\|\psi_0\|_2 - \|\psi_{M+1}\|_2 + \sum_{n=0}^{M}\|\epsilon_n\|_2\Bigr)
 \le \frac{2\mu^2}{\hat\theta^2} .
\]

Hence (7.19) holds and the proof is complete.

7.2.1. Linear problems. We may use Lemma 7.2.1 to prove a result from
[130] on the convergence of Broyden's method for linear problems. The role of
the parameter θ̂ and the sequence {θ_n} in Lemma 7.2.1 is nicely illustrated by
this application of the lemma.
In this section F(x) = Ax − b and B_n = A + E_n. The standard assumptions
in this case reduce to nonsingularity of A. Our first task is to show that
the errors in the Broyden approximations to A do not grow. This property,
called bounded deterioration [28], is an important part of the convergence
analysis. The reader should compare the statement of the result with that
in the nonlinear case.
In the linear case, the Broyden update has a very simple form. We will
consider a modified update

\[ (7.23)\qquad B_+ = B_c + \theta_c\frac{(y - B_c s)s^T}{s^T s} = B_c + \theta_c\frac{(Ax_+ - b)s^T}{s^T s} . \]

If x_c and B_c are the current approximations to x* = A^{-1}b and A, and
s = x_+ − x_c, then

\[ y - B_c s = (Ax_+ - Ax_c) - B_c(x_+ - x_c) = -E_c s \]

and therefore

\[ (7.24)\qquad B_+ = B_c + \theta_c\frac{(y - B_c s)s^T}{\|s\|_2^2} = B_c - \theta_c\frac{(E_c s)s^T}{\|s\|_2^2} . \]

Lemma 7.2.2. Let A be nonsingular, θ_c ∈ [0, 2], and x_c ∈ R^N be given. Let
B_c be nonsingular and let B_+ be formed by the Broyden update (7.23). Then

\[ (7.25)\qquad \|E_+\|_2 \le \|E_c\|_2 . \]





Proof. By subtracting A from both sides of (7.24) we have

\[ (7.26)\qquad E_+ = E_c - \theta_c\frac{(E_c s)s^T}{\|s\|_2^2} = E_c(I - \theta_c P_s) , \]

where P_s is the orthogonal projector

\[ (7.27)\qquad P_s = \frac{ss^T}{\|s\|_2^2} . \]

This completes the proof, as

\[ \|E_+\|_2 \le \|E_c\|_2\,\|I - \theta_c P_s\|_2 \]

and ‖I − θ_c P_s‖_2 ≤ 1 by orthogonality of P_s and the fact that 0 ≤ θ_c ≤ 2.
Now, θ_c can always be selected to make B_+ nonsingular. In [130] one
suggestion was

\[
\theta_c =
\begin{cases}
1, & |\gamma_c| \ge \sigma, \\[4pt]
\dfrac{1 - \mathrm{sign}(\gamma_c)\sigma}{1 - \gamma_c}, & \text{otherwise},
\end{cases}
\]

where

\[ \gamma_c = \frac{(B_c^{-1}y)^T s}{\|s\|_2^2} = \frac{(B_c^{-1}As)^T s}{\|s\|_2^2} \]

and σ ∈ (0, 1) is fixed. However, the results in [130] assume only that the
sequence {θ_n} satisfies the hypotheses of Lemma 7.2.1 for some θ̂ ∈ (0, 1) and
that θ_c is always chosen so that B_+ is nonsingular.
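A minimal MATLAB sketch of this choice of θ_c follows; the function name choose_theta is hypothetical, Bc is the current approximate Jacobian, s the step, y = As for the linear problem, and sigma ∈ (0, 1) is fixed as in the text.

function theta = choose_theta(Bc, y, s, sigma)
% Sketch of the safeguarded choice of theta_c from [130] described above.
  gamma_c = ((Bc\y)'*s)/(s'*s);
  if abs(gamma_c) >= sigma
      theta = 1;
  else
      theta = (1 - sign(gamma_c)*sigma)/(1 - gamma_c);
  end
end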
For linear problems one can show that the Dennis–Moré condition implies
convergence, and hence the assumption that convergence takes place is not
needed. Specifically:
Proposition 7.2.1. Let A be nonsingular. Assume that {θ_n} satisfies
the hypotheses of Lemma 7.2.1 for some θ̂ ∈ (0, 1). If {θ_n} is such that
the matrices B_n obtained by (7.23) are nonsingular and {x_n} is given by the
modified Broyden iteration

\[ (7.28)\qquad x_{n+1} = x_n - B_n^{-1}(Ax_n - b) , \]

then the Dennis–Moré condition implies convergence of {x_n} to x* = A^{-1}b.
We leave the proof as Exercise 7.5.1.
The superlinear convergence result is remarkable in that the initial iterate
need not be near the root.
Theorem 7.2.1. Let A be nonsingular. Assume that {θ_n} satisfies the
hypotheses of Lemma 7.2.1 for some θ̂ ∈ (0, 1). If {θ_n} is such that the matrices B_n
obtained by (7.23) are nonsingular, then the modified Broyden iteration (7.28)
converges q-superlinearly to x* = A^{-1}b for every initial iterate x_0.




Proof. Our approach is to show that for every φ ∈ R^N,

\[ (7.29)\qquad \lim_{n\to\infty} \phi^T\frac{E_n s_n}{\|s_n\|_2} = 0 . \]

Assuming that (7.29) holds, setting φ = e_j, the unit vector in the jth
coordinate direction, shows that the jth component of E_n s_n/‖s_n‖_2 has limit zero.
As j was arbitrary, we have

\[ \lim_{n\to\infty} \frac{E_n s_n}{\|s_n\|_2} = 0 , \]

which is the Dennis–Moré condition. This will imply q-superlinear convergence
by Theorem 7.1.1 and Proposition 7.2.1.
Let φ ∈ R^N be given. Let

\[ P_n = \frac{s_n s_n^T}{\|s_n\|_2^2} . \]

Since for all v ∈ R^N and all n

\[ v^T(E_n^T\phi) = \phi^T(E_n v) \]

and, by (7.26),

\[ E_{n+1}^T\phi = (I - \theta_n P_n)E_n^T\phi , \]

we may invoke Lemma 7.2.1 with

\[ \eta_n = s_n/\|s_n\|_2, \quad \psi_n = E_n^T\phi, \quad \text{and} \quad \epsilon_n = 0 \]

to conclude that

\[ (7.30)\qquad \eta_n^T\psi_n = \frac{(E_n^T\phi)^T s_n}{\|s_n\|_2} = \phi^T\frac{E_n s_n}{\|s_n\|_2} \to 0 . \]

This completes the proof.
The result here is not a local convergence result. It is global in the sense
that q-superlinear convergence takes place independently of the initial choices
of x and B. It is known [29], [82], [84], [143] that the Broyden iteration will
terminate in 2N steps in the linear case.

7.2.2. Nonlinear problems. Our analysis of the nonlinear case is different
from the classical work [28], [63]. As indicated in the preface, we provide a
proof based on ideas from [97], [115], and [104], that does not use the Frobenius
norm and extends to various infinite-dimensional settings.
For nonlinear problems our analysis is local and we set θ_n = 1. However
we must show that the sequence {B_n} of Broyden updates exists and remains
nonsingular. We make a definition that allows us to compactly express this
fact.
Definition 7.2.1. We say that the Broyden sequence ({x_n}, {B_n}) for the
data (F, x_0, B_0) exists if F is defined at x_n for all n, B_n is nonsingular for all
n, and x_{n+1} and B_{n+1} can be computed from x_n and B_n using (7.1) and (7.2).




We will show that if the standard assumptions hold and x_0 and B_0 are
sufficiently good approximations to x* and F'(x*), then the Broyden sequence
exists for the data (F, x_0, B_0) and the Broyden iterates converge q-
superlinearly to x*. The development of the proof is complicated. We first
prove a bounded deterioration result like Lemma 7.2.2. However, in the
nonlinear case the errors in the Jacobian approximation can grow as the
iteration progresses. We then show that if the initial data is sufficiently good,
this growth can be limited and therefore the Broyden sequence exists and
the Broyden iterates converge to x* at least q-linearly. Finally we verify the
Dennis–Moré condition (7.6), which, together with the q-linear convergence
proved earlier, completes the proof of local q-superlinear convergence.

Bounded deterioration. Our first task is to show that bounded deteriora-
tion holds. Unlike the linear case, the sequence {‖E_n‖} need not be monotone
non-increasing. However, the possible increase can be bounded in terms of the
errors in the current and new iterates.
Theorem 7.2.2. Let the standard assumptions hold. Let x_c ∈ Ω and a
nonsingular matrix B_c be given. Assume that

\[ x_+ = x_c - B_c^{-1}F(x_c) = x_c + s \in \Omega \]

and B_+ is given by (7.2). Then

\[ (7.31)\qquad \|E_+\|_2 \le \|E_c\|_2 + \gamma(\|e_c\|_2 + \|e_+\|_2)/2 . \]

Proof. Note that (7.7) implies that

\[
(7.32)\qquad E_c s = -F(x_+) + (F(x_+) - F(x_c) - F'(x^*)s)
 = -F(x_+) + \int_0^1 (F'(x_c + ts) - F'(x^*))s\,dt .
\]

We begin by writing (7.32) as

\[ F(x_+) = -E_c s + \int_0^1 (F'(x_c + ts) - F'(x^*))s\,dt \]

and then using (7.2) to obtain

\[ E_+ = E_c(I - P_s) + \frac{(\Delta_c s)s^T}{\|s\|_2^2} , \]

where P_s is given by (7.27) and

\[ \Delta_c = \int_0^1 (F'(x_c + ts) - F'(x^*))\,dt . \]





Hence, by the standard assumptions,

\[ \|\Delta_c\|_2 \le \gamma(\|e_c\|_2 + \|e_+\|_2)/2 . \]

Just as in the proof of Lemma 7.2.2 for the linear case,

\[ \|E_+\|_2 \le \|E_c\|_2 + \|\Delta_c\|_2 \]
