we may set

$M_1 = 2\gamma\|F'(x^*)^{-1}\|$ and $M_2 = K\|F'(x^*)\|$

and obtain, using (4.7) and (6.5),

$$
\begin{aligned}
\|F'(x^*)e_+\| &\le (1 + M_1\delta)\|r\| + M_2\delta\|e_c\| \\
&\le (1 + M_1\delta)(1 + M_0\delta)\eta_c\|F'(x^*)e_c\| + M_2\delta\|F'(x^*)^{-1}\|\,\|F'(x^*)e_c\| \qquad (6.7)\\
&\le \big((1 + M_1\delta)(1 + M_0\delta)\eta + M_2\delta\|F'(x^*)^{-1}\|\big)\|F'(x^*)e_c\|.
\end{aligned}
$$

Now let $\delta$ be small enough so that

$$(1 + M_1\delta)(1 + M_0\delta)\eta + M_2\delta\|F'(x^*)^{-1}\| \le \bar\eta$$

and the proof is complete.

Note that the distance $\delta$ from the initial iterate to the solution may have to be smaller for (6.4) to hold than for (6.2). However (6.4) is remarkable in its assertion that any method that produces a step with a linear residual less than that of the zero step will reduce the norm of the error if the error is measured with the weighted norm

$$\|\cdot\|_* = \|F'(x^*)\cdot\|.$$

In fact, Theorem 6.1.3 asserts q-linear convergence in the weighted norm if the initial iterate is sufficiently near $x^*$. This is made precise in Theorem 6.1.4.

The application of Theorem 6.1.3 described in Theorem 6.1.4 differs from Theorem 6.1.2 in that we do not demand that $\{\eta_n\}$ be bounded away from 1 by a sufficiently large amount, just that 1 not be an accumulation point. The importance of this theorem for implementation is that choices of the sequence of forcing terms $\{\eta_n\}$ (such as $\eta_n = .5$ for all $n$) that try to minimize the number of inner iterations are completely justified by this result if the initial iterate is sufficiently near the solution. We make such a choice in one of the examples considered in § 6.4 and compare it to a modification of a choice from


[69] that decreases $\eta_n$ rapidly with a view toward minimizing the number of outer iterations.

Theorem 6.1.4. Let the standard assumptions hold. Then there is $\delta$ such that if $x_0 \in \mathcal{B}(\delta)$, $\{\eta_n\} \subset [0, \eta]$ with $\eta < \bar\eta < 1$, then the inexact Newton iteration

$$x_{n+1} = x_n + s_n,$$

where

$$\|F'(x_n)s_n + F(x_n)\| \le \eta_n\|F(x_n)\|,$$

converges q-linearly to $x^*$ with respect to $\|\cdot\|_*$. Moreover

• if $\eta_n \to 0$ the convergence is q-superlinear, and

• if $\eta_n \le K_\eta\|F(x_n)\|^p$ for some $K_\eta > 0$ the convergence is q-superlinear with q-order $1 + p$.

Proof. The proof uses both (6.4) and (6.2). Our first task is to relate the norms $\|\cdot\|$ and $\|\cdot\|_*$ so that we can estimate $\delta$. Let $\delta_0$ be such that (6.4) holds for $\|e_c\| < \delta_0$.

For all $x \in R^N$,

(6.8) $\|x\|_* \le \|F'(x^*)\|\,\|x\| \le \kappa(F'(x^*))\|x\|_*.$

Note that $\|e\| < \delta_0$ if

$$\|e\|_* < \delta_* = \|F'(x^*)^{-1}\|^{-1}\delta_0.$$

Set $\delta = \|F'(x^*)\|^{-1}\delta_*$. Then $\|e_0\|_* < \delta_*$ if $\|e_0\| < \delta$.

Our proof does not rely on the (possibly false) assertions that $\|e_n\| < \delta$ for all $n$ or that $x_n \to x^*$ q-linearly with respect to the unweighted norm. Rather we note that (6.4) implies that if $\|e_n\|_* < \delta_*$ then

(6.9) $\|e_{n+1}\|_* \le \bar\eta\|e_n\|_* < \delta_*$

and hence $x_n \to x^*$ q-linearly with respect to the weighted norm. This proves the first assertion.

To prove the assertions on q-superlinear convergence, note that since $x_n \to x^*$, eventually $\|e_n\|$ will be small enough so that the conclusions of Theorem 6.1.1 hold. The superlinear convergence results then follow exactly as in the proof of Theorem 6.1.2.

We close this section with some remarks on the two types of results in Theorems 6.1.2 and 6.1.4. The most significant difference is the norm in which q-linear convergence takes place. q-linear convergence with respect to the weighted norm $\|\cdot\|_*$ is equivalent to q-linear convergence of the sequence of nonlinear residuals $\{\|F(x_n)\|\}$. We state this result as Proposition 6.1.1 and leave the proof to the reader in Exercise 6.5.2. The implications for the choice of the forcing terms $\{\eta_n\}$ are also different. While Theorem 6.1.2 might lead one to believe that a small value of $\eta_n$ is necessary for convergence, Theorem 6.1.4 shows that the sequence of forcing terms need only be kept bounded away from


1. A constant sequence such as $\eta_n = .5$ may well suffice for linear convergence, but at a price of a greater number of outer iterations. We will discuss other factors in the choice of forcing terms in § 6.3.

Proposition 6.1.1. Let the standard assumptions hold and let $x_n \to x^*$. Then $\|F(x_n)\|$ converges q-linearly to 0 if and only if $\|e_n\|_*$ does.

6.1.3. Errors in the function. In this section, we state two theorems on inexact local improvement. We leave the proofs, which are direct analogs to the proofs of Theorems 6.1.1 and 6.1.3, to the reader as exercises. Note that errors in the derivative, such as those arising from a difference approximation of the action of $F'(x_c)$ on a vector, can be regarded as part of the inexact solution of the equation for the Newton step. See [55] or Proposition 6.2.1 and its proof for an illustration of this point.

Theorem 6.1.5. Let the standard assumptions hold, and let $\epsilon(x)$ denote the error in the evaluation of $F$ at $x$. Then there are $\delta$ and $K_I$ such that if $x_c \in \mathcal{B}(\delta)$, $s$ satisfies

(6.10) $\|F'(x_c)s + F(x_c) + \epsilon(x_c)\| \le \eta_c\|F(x_c) + \epsilon(x_c)\|$

and $x_+ = x_c + s$, then

(6.11) $\|e_+\| \le K_I((\|e_c\| + \eta_c)\|e_c\| + \|\epsilon(x_c)\|).$

Theorem 6.1.6. Let the standard assumptions hold. Then there is $\delta$ such that if $x_c \in \mathcal{B}(\delta)$, $s$ satisfies (6.10), $x_+ = x_c + s$, and $\eta_c \le \eta < \bar\eta < 1$, then there is $K_E$ such that

(6.12) $\|F'(x^*)e_+\| \le \bar\eta\|F'(x^*)e_c\| + K_E\|\epsilon(x_c)\|.$

6.2. Newton-iterative methods

A Newton-iterative method realizes (6.1) with an iterative method for the linear system for the Newton step, terminating when the relative linear residual is smaller than $\eta_c$ (i.e., when (6.1) holds). The name of the method indicates which linear iterative method is used; for example, Newton-SOR, Newton-CG, and Newton-GMRES use SOR, CG, and GMRES, respectively, to solve the linear system. This naming convention is taken from [175] and [145]. These methods have also been called truncated Newton methods [56], [136] in the context of optimization. Typically the nonlinear iteration that generates the sequence $\{x_n\}$ is called the outer iteration and the linear iteration that generates the approximations to the steps is called the inner iteration.

In this section we use the $l^2$ norm to measure nonlinear residuals. The reason for this is that the Krylov methods use the scalar product and the estimates in Chapters 2 and 3 are in terms of the Euclidean norm. In the examples discussed in § 6.4 we will scale the $l^2$ norm by a factor of $1/\sqrt{N}$ so that the results for the differential and integral equations will be independent of the computational mesh.


6.2.1. Newton GMRES. We provide a detailed analysis of the Newton-GMRES iteration. We begin by discussing the effects of a forward difference approximation to the action of the Jacobian on a vector.

If the linear iterative method is any one of the Krylov subspace methods discussed in Chapters 2 and 3, then each inner iteration requires at least one evaluation of the action of $F'(x_c)$ on a vector. In the case of CGNR and CGNE, an evaluation of the action of $F'(x_c)^T$ on a vector is also required. In many implementations [20], [24], the action of $F'(x_c)$ on a vector $w$ is approximated by a forward difference, (5.15), $D_h F(x : w)$ for some $h$. It is important to note that this is entirely different from forming the finite difference Jacobian $\nabla_h F(x)$ and applying that matrix to $w$. In fact, as pointed out in [20], application of GMRES to the linear equation for the Newton step with matrix-vector products approximated by finite differences is the same as the application of GMRES to the matrix $G_h F(x)$ whose last $k - 1$ columns are the vectors

$$v_k = D_h F(x : v_{k-1}).$$

The sequence $\{v_k\}$ is formed in the course of the forward-difference GMRES iteration.
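To make the forward difference product concrete, here is a minimal MATLAB sketch of $D_h F(x : w)$; the name fddirder and the exact increment scaling are simplifications of ours and not the dirder code from the collection.

function z = fddirder(x, w, f, f0)
% FDDIRDER  Forward difference approximation to the action of the
% Jacobian on a vector: z is approximately F'(x)*w.
% f0 = f(x) is passed in so that it need not be recomputed.
% A sketch only; h = 1.e-7 is roughly sqrt(eps), as in section 6.3.
h = 1.e-7;
if norm(w) == 0
    z = zeros(size(x));
    return
end
if norm(x) > 0
    h = h*norm(x);          % scale the increment, as in Prop. 6.2.1
end
u = w/norm(w);              % difference in the unit direction w/||w||
z = norm(w)*(f(x + h*u) - f0)/h;
end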

To illustrate this point, we give the forward-difference GMRES algorithm for computation of the Newton step, $s = -F'(x)^{-1}F(x)$. Note that matrix-vector products are replaced by forward difference approximations to the action of $F'(x)$, that a sequence of approximate steps $\{s_k\}$ is produced, that $b = -F(x)$, and that the initial iterate for the linear problem is the zero vector. We give a MATLAB implementation of this algorithm in the collection of MATLAB codes.

Algorithm 6.2.1. fdgmres($s$, $x$, $F$, $h$, $\eta$, kmax, $\rho$)

1. $s = 0$, $r = -F(x)$, $v_1 = r/\|r\|_2$, $\rho = \|r\|_2$, $\beta = \rho$, $k = 0$

2. While $\rho > \eta\|F(x)\|_2$ and $k <$ kmax do

   (a) $k = k + 1$

   (b) $v_{k+1} = D_h F(x : v_k)$

       for $j = 1, \dots, k$

         i. $h_{jk} = v_{k+1}^T v_j$

         ii. $v_{k+1} = v_{k+1} - h_{jk}v_j$

   (c) $h_{k+1,k} = \|v_{k+1}\|_2$

   (d) $v_{k+1} = v_{k+1}/\|v_{k+1}\|_2$

   (e) $e_1 = (1, 0, \dots, 0)^T \in R^{k+1}$

       Minimize $\|\beta e_1 - H_k y^k\|_{R^{k+1}}$ to obtain $y^k \in R^k$.

   (f) $\rho = \|\beta e_1 - H_k y^k\|_{R^{k+1}}$.

3. $s = V_k y^k$.

In our MATLAB implementation we solve the least squares problem in

step 2e by the Givens rotation approach used in Algorithm gmres.
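A compact MATLAB transcription of Algorithm fdgmres is sketched below. It is not the code from the collection: it solves the least squares problem in step 2e with the backslash operator rather than Givens rotations, calls the fddirder sketch above for the products $D_h F(x : v_k)$, and omits the safeguards (for example against breakdown when $\|v_{k+1}\|_2 = 0$) that a production implementation needs.

function s = fdgmres_sketch(x, f, eta, kmax)
% FDGMRES_SKETCH  Forward-difference GMRES for the inexact Newton
% step, following Algorithm 6.2.1; the linear system is F'(x)s = -F(x).
f0 = f(x);
r = -f0;
rho = norm(r); beta = rho;
V(:,1) = r/rho;                  % Krylov basis v_1, v_2, ...
H = zeros(kmax+1, kmax);         % upper Hessenberg matrix
k = 0; s = zeros(size(x));
while rho > eta*norm(f0) && k < kmax
    k = k + 1;
    V(:,k+1) = fddirder(x, V(:,k), f, f0);    % step 2b: D_h F(x : v_k)
    for j = 1:k                               % Gram-Schmidt loop
        H(j,k) = V(:,k+1)'*V(:,j);
        V(:,k+1) = V(:,k+1) - H(j,k)*V(:,j);
    end
    H(k+1,k) = norm(V(:,k+1));                % step 2c
    V(:,k+1) = V(:,k+1)/H(k+1,k);             % step 2d
    be1 = [beta; zeros(k,1)];                 % beta*e_1 in R^(k+1)
    y = H(1:k+1,1:k)\be1;                     % step 2e: least squares
    rho = norm(be1 - H(1:k+1,1:k)*y);         % step 2f
end
if k > 0
    s = V(:,1:k)*y;                           % step 3: s = V_k y^k
end
end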


As should be clear from inspection of Algorithm gmres, the difference between $\nabla_h F$ and $G_h F$ is that the columns are computed by taking directional derivatives based on two different orthonormal bases: $\nabla_h F$ using the basis of unit vectors in coordinate directions and $G_h F$ the basis for the Krylov space $\mathcal{K}_k$ constructed by algorithm fdgmres. The action of $G_h F$ on $\mathcal{K}_k^\perp$, the orthogonal complement of $\mathcal{K}_k$, is not used by the algorithm and, for the purposes of analysis, could be specified by the action of $\nabla_h F$ on any basis for $\mathcal{K}_k^\perp$. Hence $G_h F$ is also first order accurate in $h$. Assuming that there is no error in the evaluation of $F$ we may summarize the observations above in the following proposition, which is a special case of the more general results in [20].

Proposition 6.2.1. Let the standard assumptions hold. Let $\eta \in (0,1)$. Then there are $C_G$, $\bar{h}$, and $\delta$ such that if $x \in \mathcal{B}(\delta)$, $h \le \bar{h}$, and Algorithm fdgmres terminates with $k <$ kmax, then the computed step $s$ satisfies

(6.13) $\|F'(x)s + F(x)\|_2 < (\eta + C_G h)\|F(x)\|_2.$

Proof. First let $\delta$ be small enough so that $\mathcal{B}(\delta) \subset \Omega$ and the conclusions to Lemma 4.3.1 hold. Let $\{u_j\}$ be any orthonormal basis for $R^N$ such that $u_j = v_j$ for $1 \le j \le k$, where $\{v_j\}_{j=1}^k$ is the orthonormal basis for $\mathcal{K}_k$ generated by Algorithm fdgmres.

Consider the linear map defined by its action on $\{u_j\}$

$$Bu_j = D_h F(x : u_j).$$

Note that

$$D_h F(x : u_j) = \int_0^1 F'(x + th\|x\|_2 u_j)u_j\,dt
= F'(x)u_j + \int_0^1 (F'(x + th\|x\|_2 u_j) - F'(x))u_j\,dt.$$

Since $F'$ is Lipschitz continuous and $\{u_j\}$ is an orthonormal basis for $R^N$ we have, with $\bar\gamma = \gamma(\|x^*\|_2 + \delta)$,

(6.14) $\|B - F'(x)\|_2 \le \gamma h\|x\|_2/2 \le \bar\gamma h/2.$

Since the linear iteration (fdgmres) terminates in $k <$ kmax iterations, we have, since $B$ and $G_h F$ agree on $\mathcal{K}_k$,

$$\|Bs + F(x)\|_2 \le \eta\|F(x)\|_2$$

and therefore

(6.15) $\|F'(x)s + F(x)\|_2 \le \eta\|F(x)\|_2 + \bar\gamma h\|s\|_2/2.$

Assume that

$$\bar{h}\bar\gamma \le \|F'(x^*)^{-1}\|_2^{-1}/2.$$


Then Lemma 4.3.1 and (6.15) imply

$$\|F'(x^*)^{-1}\|_2^{-1}\|s\|_2/2 \le \|F'(x)^{-1}\|_2^{-1}\|s\|_2 \le \|F'(x)s\|_2 \le (1 + \eta)\|F(x)\|_2 + \bar\gamma h\|s\|_2/2.$$

Therefore,

(6.16) $\|s\|_2 \le 4(1 + \eta)\|F'(x^*)^{-1}\|_2\|F(x)\|_2.$

Combining (6.16) with (6.15) completes the proof with $C_G = 4\bar\gamma(1 + \eta)\|F'(x^*)^{-1}\|_2$.

Proposition 6.2.1 implies that a finite difference implementation will not affect the performance of Newton-GMRES if the steps in the forward difference approximation of the derivatives are sufficiently small. In an implementation, therefore, we must specify not only the sequence $\{\eta_n\}$, but also the steps $\{h_n\}$ used to compute forward differences. Note also that Proposition 6.2.1 applies to a restarted GMRES because upon successful termination (6.15) will hold.

We summarize our results so far in the analogs of Theorem 6.1.4 and Theorem 6.1.2. The proofs are immediate consequences of Theorems 6.1.2, 6.1.4, and Proposition 6.2.1 and are left as (easy) exercises.

Theorem 6.2.1. Assume that the assumptions of Proposition 6.2.1 hold. Then there are $\delta$, $\bar\sigma$ such that if $x_0 \in \mathcal{B}(\delta)$ and the sequences $\{\eta_n\}$ and $\{h_n\}$ satisfy

$$\sigma_n = \eta_n + C_G h_n \le \bar\sigma$$

then the forward difference Newton-GMRES iteration

$$x_{n+1} = x_n + s_n$$

where $s_n$ is computed by Algorithm fdgmres with arguments $(s_n, x_n, F, h_n, \eta_n, kmax, \rho)$ converges q-linearly to $x^*$ and $s_n$ satisfies

(6.17) $\|F'(x_n)s_n + F(x_n)\|_2 < \sigma_n\|F(x_n)\|_2.$

Moreover,

• if $\sigma_n \to 0$ the convergence is q-superlinear, and

• if $\sigma_n \le K_\eta\|F(x_n)\|_2^p$ for some $K_\eta > 0$ the convergence is q-superlinear with q-order $1 + p$.

Theorem 6.2.2. Assume that the assumptions of Proposition 6.2.1 hold. Then there is $\delta$ such that if $x_0 \in \mathcal{B}(\delta)$ and the sequences $\{\eta_n\}$ and $\{h_n\}$ satisfy

$$\sigma_n = \eta_n + C_G h_n \in [0, \bar\sigma]$$

with $0 < \bar\sigma < 1$ then the forward difference Newton-GMRES iteration

$$x_{n+1} = x_n + s_n$$


where $s_n$ is computed by Algorithm fdgmres with arguments $(s_n, x_n, F, h_n, \eta_n, kmax, \rho)$ converges q-linearly to $x^*$ with respect to $\|\cdot\|_*$ and $s_n$ satisfies (6.17). Moreover

• if $\sigma_n \to 0$ the convergence is q-superlinear, and

• if $\sigma_n \le K_\eta\|F(x_n)\|_2^p$ for some $K_\eta > 0$ the convergence is q-superlinear with q-order $1 + p$.

In the case where $h_n$ is approximately the square root of machine roundoff, $\sigma_n \approx \eta_n$ if $\eta_n$ is not too small, and Theorem 6.2.2 states that the observed behavior of the forward difference implementation will be the same as that of an implementation with exact derivatives.

6.2.2. Other Newton-iterative methods. Other iterative methods may be used as the solver for the linear equation for the Newton step. Newton-multigrid [99] is one approach using a stationary iterative method. See also [6], [111]. If $F'$ is symmetric and positive definite, as it is in the context of unconstrained minimization [63], Newton-CG is a possible approach.

When storage is restricted or the problem is very large, GMRES may not be practical. GMRES($m$), for a small $m$, may not converge rapidly enough to be useful. CGNR and CGNE offer alternatives, but require evaluation of transpose-vector products. Bi-CGSTAB and TFQMR should be considered in all cases where there is not enough storage for GMRES($m$) to perform well.

If a transpose is needed, as it will be if CGNR or CGNE is used as the iterative method, a forward difference formulation is not an option because approximation of the transpose-vector product by differences is not possible. Computation of $\nabla_h F(x)$ and its transpose analytically is one option, as is differentiation of $F$ automatically or by hand. The reader may explore this further in Exercise 6.5.13.

The reader may also be interested in experimenting with the other Krylov subspace methods described in § 3.6, [78], and [12].

6.3. Newton-GMRES implementation

We provide a MATLAB code nsolgm in the collection that implements a forward-difference Newton-GMRES algorithm. This algorithm requires different inputs than nsol. As input we must give the initial iterate, the function, and the vector of termination tolerances as we did for nsol. In addition, we must provide a method for forming the sequence of forcing terms $\{\eta_n\}$.

We adjust $\eta$ as the iteration progresses with a variation of a choice from [69]. This issue is independent of the particular linear solver and our discussion is in the general inexact Newton method setting. Setting $\eta$ to a constant for the entire iteration is often a reasonable strategy as we see in § 6.4, but the choice of that constant depends on the problem. If a constant $\eta$ is too small, much effort can be wasted in the initial stages of the iteration. The choice in


[69] is motivated by a desire to avoid such oversolving. Oversolving means that the linear equation for the Newton step is solved to a precision far beyond what is needed to correct the nonlinear iteration. As a measure of the degree to which the nonlinear iteration approximates the solution we begin with

$$\eta_n^A = \gamma\|F(x_n)\|_2^2/\|F(x_{n-1})\|_2^2,$$

where $\gamma \in (0,1]$ is a parameter. If $\eta_n^A$ is uniformly bounded away from 1, then setting $\eta_n = \eta_n^A$ for $n > 0$ would guarantee q-quadratic convergence by Theorem 6.1.1. In this way, the most information possible would be extracted from the inner iteration. In order to specify the choice at $n = 0$ and bound the sequence away from 1 we set

(6.18) $\eta_n^B = \begin{cases} \eta_{max}, & n = 0,\\ \min(\eta_{max}, \eta_n^A), & n > 0. \end{cases}$

In (6.18) the parameter $\eta_{max}$ is an upper limit on the sequence $\{\eta_n\}$. In [69] the choices $\gamma = .9$ and $\eta_{max} = .9999$ are used.

It may happen that $\eta_n^B$ is small for one or more iterations while $x_n$ is still far from the solution. A method of safeguarding was suggested in [69] to avoid volatile decreases in $\eta_n$. The idea is that if $\eta_{n-1}$ is sufficiently large we do not let $\eta_n$ decrease by much more than a factor of $\eta_{n-1}$.

(6.19) $\eta_n^C = \begin{cases} \eta_{max}, & n = 0,\\ \min(\eta_{max}, \eta_n^A), & n > 0,\ \gamma\eta_{n-1}^2 \le .1,\\ \min(\eta_{max}, \max(\eta_n^A, \gamma\eta_{n-1}^2)), & n > 0,\ \gamma\eta_{n-1}^2 > .1. \end{cases}$

The constant .1 is somewhat arbitrary. This safeguarding does improve the

performance of the iteration.

There is a chance that the final iterate will reduce $\|F\|$ far beyond the desired level and that the solution of the linear equation for the last step will be far more accurate than is really needed. This oversolving on the final step can be controlled by comparing the norm of the current nonlinear residual $\|F(x_n)\|$ to the nonlinear residual norm at which the iteration would terminate

$$\tau_t = \tau_a + \tau_r\|F(x_0)\|$$

and bounding $\eta_n$ from below by a constant multiple of $\tau_t/\|F(x_n)\|$. The algorithm nsolgm does this and uses

(6.20) $\eta_n = \min(\eta_{max}, \max(\eta_n^C, .5\tau_t/\|F(x_n)\|)).$
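Packaged as MATLAB, the complete selection (6.18)-(6.20) might read as follows; the function and argument names are ours rather than those used inside nsolgm, and a negative etaold is our convention for signaling the first iteration $n = 0$.

function eta = forcing_term(fnrm, fnrmold, etaold, etamax, gamma, taut)
% FORCING_TERM  Safeguarded forcing term in the spirit of (6.18)-(6.20).
% fnrm, fnrmold: ||F(x_n)||, ||F(x_{n-1})||; etaold: previous forcing
% term, negative on the first iteration; taut: termination level tau_t.
if etaold < 0
    eta = etamax;                         % n = 0 case of (6.18), (6.19)
else
    etaA = gamma*(fnrm/fnrmold)^2;        % eta_n^A
    if gamma*etaold^2 <= .1               % safeguard of (6.19)
        eta = min(etamax, etaA);
    else
        eta = min(etamax, max(etaA, gamma*etaold^2));
    end
end
% (6.20): bound eta from below near termination to limit oversolving.
eta = min(etamax, max(eta, .5*taut/fnrm));
end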

Exercise 6.5.9 asks the reader to modify nsolgm so that other choices of the sequence $\{\eta_n\}$ can be used.

We use dirder to approximate directional derivatives and use the default value of $h = 10^{-7}$. In all the examples in this book we use the value $\gamma = .9$ as recommended in [69].


Algorithm 6.3.1. nsolgm($x$, $F$, $\tau$, $\eta$)

1. $r_c = r_0 = \|F(x)\|_2/\sqrt{N}$

2. Do while $\|F(x)\|_2/\sqrt{N} > \tau_r r_0 + \tau_a$

   (a) Select $\eta$.

   (b) fdgmres($s$, $x$, $F$, $\eta$)

   (c) $x = x + s$

   (d) Evaluate $F(x)$

   (e) $r_+ = \|F(x)\|_2/\sqrt{N}$, $\sigma = r_+/r_c$, $r_c = r_+$

   (f) If $\|F(x)\|_2/\sqrt{N} \le \tau_r r_0 + \tau_a$ exit.
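Using the sketches above, the outer loop can be written in a few lines; this is a bare-bones reading of Algorithm 6.3.1 in which the iteration counters, history output, and failure checks of the actual nsolgm are omitted and kmax = 40 is an arbitrary choice of ours.

function x = nsolgm_sketch(x, f, tau, etamax)
% NSOLGM_SKETCH  Inexact Newton-GMRES in the spirit of Algorithm 6.3.1.
% tau = [tau_a, tau_r]; l2 norms are scaled by 1/sqrt(N) as in the text.
N = length(x); gamma = .9; kmax = 40;
eta = -1;                                  % signals n = 0 to forcing_term
fnrm = norm(f(x))/sqrt(N); r0 = fnrm; fnrmold = fnrm;
taut = tau(1) + tau(2)*r0;                 % termination level tau_t
while fnrm > taut
    eta = forcing_term(fnrm, fnrmold, eta, etamax, gamma, taut);
    s = fdgmres_sketch(x, f, eta, kmax);   % inner iteration (step 2b)
    x = x + s;                             % step 2c
    fnrmold = fnrm;
    fnrm = norm(f(x))/sqrt(N);             % steps 2d and 2e
end
end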

Note that the cost of an outer iterate is one evaluation of $F$ to compute the value at the current iterate and other evaluations of $F$ to compute the forward differences for each inner iterate. Hence if a large value of $\eta$ can be used, the cost of the entire iteration can be greatly reduced. If a maximum of $m$ GMRES iterations is needed (or GMRES($m$) is used as a linear solver) the storage requirements of Algorithm nsolgm are the $m + 5$ vectors $x$, $F(x)$, $x + hv$, $F(x + hv)$, $s$, and the Krylov basis $\{v_k\}_{k=1}^m$.

Since GMRES forms the inner iterates and makes termination decisions based on scalar products and $l^2$ norms, we also terminate the outer iteration on small $l^2$ norms of the nonlinear residuals. However, because of our applications to differential and integral equations, we scale the $l^2$ norm of the nonlinear residual by a factor of $1/\sqrt{N}$ so that constant functions will have norms that are independent of the computational mesh.

For large and poorly conditioned problems, GMRES will have to be restarted. We illustrate the consequences of this in § 6.4 and ask the reader to make the necessary modifications to nsolgm in Exercise 6.5.9.

The preconditioners we use in § 6.4 are independent of the outer iteration. Since the Newton steps for $MF(x) = 0$ are the same as those for $F(x) = 0$, it is equivalent to precondition the nonlinear equation before calling nsolgm.
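For example, if pcsolve is a routine applying such an $M$ (fish2d plays this role in § 6.4), the composition can be handed directly to the solver; pcsolve, x0, and the tolerances below are placeholders of ours.

% Precondition the nonlinear equation: solve M F(x) = 0.
pf = @(x) pcsolve(f(x));                  % pcsolve applies M, e.g. fish2d
x = nsolgm_sketch(x0, pf, [1.e-6, 1.e-6], .5);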

6.4. Examples for Newton-GMRES

In this section we consider two examples. We revisit the H-equation and solve a preconditioned nonlinear convection-diffusion equation.

In all the figures we plot the relative nonlinear residual $\|F(x_n)\|_2/\|F(x_0)\|_2$ against the number of function evaluations required by all inner and outer iterations to compute $x_n$. Counts of function evaluations corresponding to outer iterations are indicated by circles. From these plots one can compare not only the number of outer iterations, but also the total cost. This enables us to directly compare the costs of different strategies for selection of $\eta$. Note that if only a single inner iteration is needed, the total cost of the outer iteration will be two function evaluations since $F(x_c)$ will be known. One new function evaluation will be needed to approximate the action of $F'(x_c)$ on a vector


and then $F(x_+)$ will be evaluated to test for termination. We also count the evaluation of $F(x_0)$ as an additional cost in the evaluation of $x_1$, which therefore has a minimum cost of three function evaluations. For each example we compare a constant value of $\eta_n$ with the choice given in (6.20) and used as the default in nsolgm.

6.4.1. Chandrasekhar H-equation. We solve the H-equation on a 100-point mesh with $c = .9$ using two schemes for selection of the parameter $\eta$. The initial iterate is the function identically one. We used $\tau_r = \tau_a = 10^{-6}$ in this example.

In Fig. 6.1 we plot the progress of the iteration using $\eta = .1$ with the solid line and using the sequence given by (6.20) with the dashed line. We set the parameters in (6.20) to $\gamma = .9$ and $\eta_{max} = .25$. This choice of $\eta_{max}$ was the one that did best overall in our experiments on this problem.

We see that the constant $\eta$ iteration terminates after 12 function evaluations, 4 outer iterations, and roughly 275 thousand floating-point operations. This is a slightly higher overall cost than the other approach in which $\{\eta_n\}$ is given by (6.20), which terminated after 10 function evaluations, 3 outer iterations, and 230 thousand floating-point operations. The chord method with the different (but very similar for this problem) $l^\infty$ termination condition for the same problem, reported in § 5.6, incurred a cost of 3.3 million floating-point operations, slower by a factor of over 10. This cost is entirely a result of the computation and factorization of a single Jacobian.

[Fig. 6.1. Newton-GMRES for the H-equation, c = .9. Relative nonlinear residual (log scale) plotted against function evaluations.]

In Fig. 6.2 the results change for the more ill-conditioned problem with $c = .9999$. Here the iteration with $\eta_n = .1$ performed slightly better and


[Fig. 6.2. Newton-GMRES for the H-equation, c = .9999. Relative nonlinear residual (log scale) plotted against function evaluations.]

[Fig. 6.3. Newton-GMRES for the PDE, C = 20. Relative nonlinear residual (log scale) plotted against function evaluations.]

terminated after 22 function evaluations, 7 outer iterations, and roughly 513 thousand floating-point operations. The iteration with the decreasing $\{\eta_n\}$ given by (6.20) required 23 function evaluations, 7 outer iterations, and 537 thousand floating-point operations.

6.4.2. Convection-diffusion equation. We consider the partial differential equation

(6.21) $-\nabla^2 u + Cu(u_x + u_y) = f$


with homogeneous Dirichlet boundary conditions on the unit square $(0,1) \times (0,1)$. $f$ has been constructed so that the exact solution was the discretization of

$$10xy(1 - x)(1 - y)\exp(x^{4.5}).$$

We set $C = 20$ and $u_0 = 0$. As in § 3.7 we discretized (6.21) on a $31 \times 31$ grid using centered differences. To be consistent with the second-order accuracy of the difference scheme we used $\tau_r = \tau_a = h^2$, where $h = 1/32$.

We compare the same possibilities for $\{\eta_n\}$ as in the previous example and report the results in Fig. 6.3. In (6.20) we set $\gamma = .9$ and $\eta_{max} = .5$. The computation with constant $\eta$ terminated after 19 function evaluations and 4 outer iterates at a cost of roughly 3 million floating-point operations. The iteration with $\{\eta_n\}$ given by (6.20) required 16 function evaluations, 4 outer iterations, and 2.6 million floating-point operations. In the computation we preconditioned (6.21) with the fast Poisson solver fish2d.

In this case, the choice of $\{\eta_n\}$ given by (6.20) reduced the number of inner iterations in the early stages of the iteration and avoided oversolving. In cases where the initial iterate is very good and only a single Newton iterate might be needed, however, a large choice of $\eta_{max}$ or constant $\eta$ may be the more efficient choice. Examples of such cases include (1) implicit time integration of nonlinear parabolic partial differential equations or ordinary differential equations in which the initial iterate is either the converged solution from the previous time step or the output of a predictor and (2) solution of problems in which the initial iterate is an interpolant of the solution from a coarser mesh.

Preconditioning is crucial for good performance. When preconditioning was not done, the iteration for $C = 20$ required more than 80 function evaluations to converge. In Exercise 6.5.9 the reader is asked to explore this.


6.5. Exercises on inexact Newton methods

6.5.1. Verify (6.8).

6.5.2. Prove Proposition 6.1.1 by showing that if the standard assumptions hold, $0 < \epsilon < 1$, and $x$ is sufficiently near $x^*$ then

$$\|F(x)\|/(1 + \epsilon) \le \|F'(x^*)e\| \le (1 + \epsilon)\|F(x)\|.$$

6.5.3. Verify (6.14).

6.5.4. Prove Theorems 6.2.2 and 6.2.1.

6.5.5. Prove Theorems 6.1.5 and 6.1.6.

6.5.6. Can anything like Proposition 6.2.1 be proved for a finite-difference Bi-CGSTAB or TFQMR linear solver? If not, what are some possible implications?

6.5.7. Give an example in which $\|F(x_n)\| \to 0$ q-linearly but $x_n$ does not converge to $x^*$ q-linearly.

6.5.8. In the DASPK code for integration of differential algebraic equations, [23], preconditioned Newton-GMRES is implemented in such a way that the data needed for the preconditioner is only computed when needed, which may be less often than with each outer iteration. Discuss this strategy and when it might be useful. Is there a relation to the Shamanskii method?

6.5.9. Duplicate the results in § 6.4. For the H-equation experiment with various values of $c$, such as $c = .5, .99999, .999999$, and for the convection-diffusion equation various values of $C$, such as $C = 1, 10, 40$, and see how preconditioning affects the iteration. Experiment with the sequence of forcing terms $\{\eta_n\}$. How do the choices $\eta_n = 1/(n+1)$, $\eta_n = 10^{-4}$ [32], $\eta_n = 2^{1-n}$ [24], $\eta_n = \min(\|F(x_n)\|_2, (n+2)^{-1})$ [56], $\eta_n = .05$, and $\eta_n = .75$ affect the results? Present the results of your experiments as graphical iteration histories.

6.5.10. For the H-equation example in § 6.4, vary the parameter $c$ and see how the performance of Newton-GMRES with the various choices of $\{\eta_n\}$ is affected. What happens when $c = 1$? See [119] for an explanation of this.

6.5.11. Are the converged solutions for the preconditioned and unpreconditioned convection-diffusion example in § 6.4 equally accurate?

6.5.12. For the convection-diffusion example in § 6.4, how is the performance affected if GMRES($m$) is used instead of GMRES?


6.5.13. Apply Newton-CGNR, Newton-Bi-CGSTAB, or Newton-TFQMR to the two examples in § 6.4. How will you deal with the need for a transpose in the case of CGNR? How does the performance (in terms of time, floating-point operations, function evaluations) compare to Newton-GMRES? Experiment with the forcing terms. The MATLAB codes fdkrylov, which allows you to choose from a variety of forward difference Krylov methods, and nsola, which we describe more fully in Chapter 8, might be of use in this exercise.

6.5.14. If you have done Exercise 5.7.25, apply nsolgm to the same two-point

boundary value problem and compare the performance.


Chapter 7

Broyden's method

Quasi-Newton methods maintain approximations of the solution $x^*$ and the Jacobian at the solution $F'(x^*)$ as the iteration progresses. If $x_c$ and $B_c$ are the current approximate solution and Jacobian, then

(7.1) $x_+ = x_c - B_c^{-1}F(x_c).$

After the computation of $x_+$, $B_c$ is updated to form $B_+$. The construction of $B_+$ determines the quasi-Newton method.

The advantage of such methods, as we shall see in § 7.3, is that solution of equations with the quasi-Newton approximation to the Jacobian is often much cheaper than using $F'(x_n)$ as the coefficient matrix. In fact, if the Jacobian is dense, the cost is little more than that of the chord method. The method we present in this chapter, Broyden's method, is locally superlinearly convergent, and hence is a very powerful alternative to Newton's method or the chord method.

In comparison with Newton-iterative methods, quasi-Newton methods require only one function evaluation for each nonlinear iterate; there is no cost in function evaluations associated with an inner iteration. Hence, if a good preconditioner (initial approximation to $F'(x^*)$) can be found, these methods could have an advantage in terms of function evaluation cost over Newton-iterative methods.

Broyden's method [26] computes $B_+$ by

(7.2) $B_+ = B_c + \dfrac{(y - B_c s)s^T}{s^T s} = B_c + \dfrac{F(x_+)s^T}{s^T s}.$

In (7.2) $y = F(x_+) - F(x_c)$ and $s = x_+ - x_c$.
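A dense-matrix MATLAB sketch of a single Broyden step, (7.1) followed by the update (7.2), is given below; storing and solving with $B$ explicitly is for illustration only, since § 7.3 develops a much cheaper implementation of the update.

function [x, B] = broyden_step(x, B, f)
% BROYDEN_STEP  One quasi-Newton step (7.1) and Broyden update (7.2).
fc = f(x);
s = -B\fc;                % (7.1): s = -B_c^{-1} F(x_c)
x = x + s;
fp = f(x);                % F(x_+)
% (7.2), using y - B_c s = F(x_+) since B_c s = -F(x_c):
B = B + fp*s'/(s'*s);
end

One can verify the secant equation (7.3) directly from the sketch: the updated matrix satisfies $Bs = B_c s + F(x_+) = -F(x_c) + F(x_+) = y$.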

Broyden's method is an example of a secant update. This means that the updated approximation to $F'(x^*)$ satisfies the secant equation

(7.3) $B_+ s = y.$

In one dimension, (7.3) uniquely specifies the classical secant method. For equations in several variables (7.3) alone is a system of $N$ equations in $N^2$


unknowns. The role of the secant equation is very deep. See [62] and [64] for discussion of this topic. In this book we focus on Broyden's method and our methods of analysis are not as general as those in [62] and [64]. In this chapter we will assume that the function values are accurate and refer the reader to [66], [65], and [114] for discussions of quasi-Newton methods with inaccurate data. Locally superlinearly convergent secant methods can be designed to maintain the sparsity pattern, symmetry, or positive definiteness of the approximate Jacobian, and we refer the reader to [28], [63], [62], and [64] for discussion and analysis of several such methods. More subtle structural properties can also be preserved by secant methods. Some examples can be found in [101], [96], [95], [113], and [116].

Broyden's method is also applicable to linear equations [29], [82], [84], [104], [143], [130], $Ax = b$, where $B \approx A$ is updated. One can also apply the method to linear least squares problems [84], [104]. The analysis is somewhat clearer in the linear case and the plan of this chapter is to present results for the linear case for both the theory and the examples. One interesting result, which we will not discuss further, is that for linear problems Broyden's method converges in $2N$ iterations [29], [82], [84], [143].

We will express our results in terms of the error in the Jacobian

(7.4) $E = B - F'(x^*)$

and the step $s = x_+ - x_c$. While we could use $E = B - F'(x)$, (7.4) is consistent with our use of $e = x - x^*$. When indexing of the steps is necessary, we will write

$$s_n = x_{n+1} - x_n.$$

In this chapter we show that if the data $x_0$ and $B_0$ are sufficiently good, the Broyden iteration will converge q-superlinearly to the root.

7.1. The Dennis-Moré condition

In this section we consider general iterative methods of the form

(7.5) $x_{n+1} = x_n - B_n^{-1}F(x_n)$

where $B_n = F'(x^*) + E_n \approx F'(x^*)$ is generated by some method (not necessarily Broyden's method).

Verification of q-superlinear convergence cannot be done by the direct approaches used in Chapters 5 and 6. We must relate the superlinear convergence condition that $\eta_n \to 0$ in Theorem 6.1.2 to a more subtle, but easier to verify, condition in order to analyze Broyden's method. This technique is based on verification of the Dennis-Moré condition [61], [60] on the sequence of steps $\{s_n\}$ and errors in the Jacobian $\{E_n\}$

(7.6) $\lim_{n\to\infty} \dfrac{\|E_n s_n\|}{\|s_n\|} = 0.$


The main result in this section, Theorem 7.1.1 and its corollary for linear problems, are used to prove superlinear convergence for Broyden's method in this chapter and many other quasi-Newton methods. See [61], [28], [62], and [64] for several more examples. Our formulation of the Dennis-Moré result is a bit less general than that in [60] or [63].

Theorem 7.1.1. Let the standard assumptions hold, let $\{B_n\}$ be a sequence of nonsingular $N \times N$ matrices, let $x_0 \in R^N$ be given, and let $\{x_n\}_{n=1}^\infty$ be given by (7.5). Assume that $x_n \ne x^*$ for any $n$. Then $x_n \to x^*$ q-superlinearly if and only if $x_n \to x^*$ and the Dennis-Moré condition (7.6) holds.

Proof. Since

$$-F(x_n) = B_n s_n = F'(x^*)s_n + E_n s_n$$

we have

(7.7) $E_n s_n = -F'(x^*)s_n - F(x_n) = -F'(x^*)e_{n+1} + F'(x^*)e_n - F(x_n).$

We use the fundamental theorem of calculus and the standard assumptions to obtain

$$F'(x^*)e_n - F(x_n) = \int_0^1 (F'(x^*) - F'(x^* + te_n))e_n\,dt$$

and hence

$$\|F'(x^*)e_n - F(x_n)\| \le \gamma\|e_n\|^2/2.$$

Therefore, by (7.7)

(7.8) $\|E_n s_n\| \le \|F'(x^*)e_{n+1}\| + \gamma\|e_n\|^2/2.$

Now, if $x_n \to x^*$ q-superlinearly, then for $n$ sufficiently large

(7.9) $\|s_n\|/2 \le \|e_n\| \le 2\|s_n\|.$

The assumption that $x_n \ne x^*$ for any $n$ implies that the sequence

(7.10) $\nu_n = \|e_{n+1}\|/\|e_n\|$

is defined and $\nu_n \to 0$ by superlinear convergence of $\{x_n\}$. By (7.9) we have

$$\|e_{n+1}\| = \nu_n\|e_n\| \le 2\nu_n\|s_n\|,$$

and hence (7.8) implies that

$$\|E_n s_n\| \le (2\|F'(x^*)\|\nu_n + \gamma\|e_n\|)\|s_n\|$$

which implies the Dennis-Moré condition (7.6).

Conversely, assume that $x_n \to x^*$ and that (7.6) holds. Let

$$\mu_n = \dfrac{\|E_n s_n\|}{\|s_n\|}.$$


We have, since

(7.11) $E_n s_n = (B_n - F'(x^*))s_n = -F(x_n) - F'(x^*)s_n,$

(7.12) $\|F'(x^*)^{-1}\|^{-1}\|s_n\| \le \|F'(x^*)s_n\| \le \|E_n s_n\| + \|F(x_n)\| = \mu_n\|s_n\| + \|F(x_n)\|.$

Since $\mu_n \to 0$ by assumption,

(7.13) $\mu_n \le \|F'(x^*)^{-1}\|^{-1}/2$

for $n$ sufficiently large. We assume now that $n$ is large enough so that (7.13) holds. Hence,

(7.14) $\|s_n\| \le 2\|F'(x^*)^{-1}\|\,\|F(x_n)\|.$

Moreover, using the standard assumptions and (7.11) again,

(7.15) $\|F(x_n) + F'(x_n)s_n\| \le \|F(x_n) + F'(x^*)s_n\| + \|(F'(x^*) - F'(x_n))s_n\| \le (\mu_n + \gamma\|e_n\|)\|s_n\|.$

Since $x_n \to x^*$ by assumption, $\eta_n \to 0$ where

$$\eta_n = 2\|F'(x^*)^{-1}\|(\mu_n + \gamma\|e_n\|).$$

Combining (7.14) and (7.15) implies that

(7.16) $\|F(x_n) + F'(x_n)s_n\| \le \eta_n\|F(x_n)\|,$

which is the inexact Newton condition. Since $\eta_n \to 0$, the proof is complete by Theorem 6.1.2.

7.2. Convergence analysis

Our convergence analysis is based on an approach to infinite-dimensional problems from [97]. That paper was the basis for [168], [104], and [115], and we employ notation from all four of these references. We will use the $l^2$ norm throughout our discussion. We begin with a lemma from [104].

Lemma 7.2.1. Let $0 < \hat\theta < 1$ and let

$$\{\theta_n\}_{n=0}^\infty \subset (\hat\theta, 2 - \hat\theta).$$

Let $\{\epsilon_n\}_{n=0}^\infty \subset R^N$ be such that

$$\sum_n \|\epsilon_n\|_2 < \infty,$$

and let $\{\eta_n\}_{n=0}^\infty$ be a set of vectors in $R^N$ such that $\|\eta_n\|_2$ is either 1 or 0 for all $n$. Let $\psi_0 \in R^N$ be given. If $\{\psi_n\}_{n=1}^\infty$ is given by

(7.17) $\psi_{n+1} = \psi_n - \theta_n(\eta_n^T\psi_n)\eta_n + \epsilon_n$

then

(7.18) $\lim_{n\to\infty} \eta_n^T\psi_n = 0.$


Proof. We first consider the case in which $\epsilon_n = 0$ for all $n$. In that case, we can use the fact that

$$\theta_n(2 - \theta_n) > \hat\theta^2 > 0$$

to show that the sequence $\{\psi_n\}$ is bounded in $l^2$-norm by $\|\psi_0\|_2$ and, in fact,

$$\|\psi_{n+1}\|_2^2 \le \|\psi_n\|_2^2 - \theta_n(2 - \theta_n)(\eta_n^T\psi_n)^2
\le \|\psi_n\|_2^2 - \hat\theta^2(\eta_n^T\psi_n)^2
\le \|\psi_n\|_2^2.$$

Therefore, for any $M > 0$,

$$\sum_{n=0}^M (\eta_n^T\psi_n)^2 \le \dfrac{\|\psi_0\|_2^2 - \|\psi_{M+1}\|_2^2}{\hat\theta^2} \le \dfrac{\|\psi_0\|_2^2}{\hat\theta^2}.$$

We let $M \to \infty$ to obtain

(7.19) $\sum_{n=0}^\infty (\eta_n^T\psi_n)^2 < \infty.$

Convergence of the series in (7.19) implies convergence to zero of the terms in the series, which is (7.18).

To prove the result for $\epsilon_n \ne 0$ we use the inequality

(7.20) $\sqrt{a^2 - b^2} \le a - \dfrac{b^2}{2a},$

which is valid for $a > 0$ and $|b| \le a$. This inequality is used often in the analysis of quasi-Newton methods [63]. From (7.20) we conclude that if $\psi_n \ne 0$ then

$$\|\psi_n - \theta_n(\eta_n^T\psi_n)\eta_n\|_2 \le \sqrt{\|\psi_n\|_2^2 - \theta_n(2 - \theta_n)(\eta_n^T\psi_n)^2}
\le \|\psi_n\|_2 - \dfrac{\theta_n(2 - \theta_n)(\eta_n^T\psi_n)^2}{2\|\psi_n\|_2}.$$

Hence if $\psi_n \ne 0$

(7.21) $\|\psi_{n+1}\|_2 \le \|\psi_n\|_2 - \dfrac{\theta_n(2 - \theta_n)(\eta_n^T\psi_n)^2}{2\|\psi_n\|_2} + \|\epsilon_n\|_2.$

Hence

(7.22) $(\eta_n^T\psi_n)^2 \le \dfrac{2\|\psi_n\|_2}{\theta_n(2 - \theta_n)}\big(\|\psi_n\|_2 - \|\psi_{n+1}\|_2 + \|\epsilon_n\|_2\big),$

which holds even if $\psi_n = 0$.

From (7.21) we conclude that

$$\|\psi_{n+1}\|_2 \le \mu,$$


where

$$\mu = \sum_{i=0}^\infty \|\epsilon_i\|_2 + \|\psi_0\|_2.$$

Hence

$$\sum_{n=0}^M (\eta_n^T\psi_n)^2 \le \dfrac{2\mu}{\hat\theta^2}\sum_{n=0}^M \big(\|\psi_n\|_2 - \|\psi_{n+1}\|_2 + \|\epsilon_n\|_2\big)
= \dfrac{2\mu}{\hat\theta^2}\Big(\|\psi_0\|_2 - \|\psi_{M+1}\|_2 + \sum_{n=0}^M \|\epsilon_n\|_2\Big)
\le \dfrac{2\mu^2}{\hat\theta^2}.$$

Hence (7.19) holds and the proof is complete.

7.2.1. Linear problems. We may use Lemma 7.2.1 to prove a result from [130] on the convergence of Broyden's method for linear problems. The role of the parameter $\hat\theta$ and the sequence $\{\theta_n\}$ in Lemma 7.2.1 is nicely illustrated by this application of the lemma.

In this section $F(x) = Ax - b$ and $B_n = A + E_n$. The standard assumptions in this case reduce to nonsingularity of $A$. Our first task is to show that the errors in the Broyden approximations to $A$ do not grow. This property, called bounded deterioration [28], is an important part of the convergence analysis. The reader should compare the statement of the result with that in the nonlinear case.

In the linear case, the Broyden update has a very simple form. We will consider a modified update

(7.23) $B_+ = B_c + \theta_c\dfrac{(y - B_c s)s^T}{s^T s} = B_c + \theta_c\dfrac{(Ax_+ - b)s^T}{s^T s}.$

If $x_c$ and $B_c$ are the current approximations to $x^* = A^{-1}b$ and $A$ and $s = x_+ - x_c$ then

$$y - B_c s = (Ax_+ - Ax_c) - B_c(x_+ - x_c) = -E_c s$$

and therefore

(7.24) $B_+ = B_c + \theta_c\dfrac{(y - B_c s)s^T}{\|s\|_2^2} = B_c - \theta_c\dfrac{(E_c s)s^T}{\|s\|_2^2}.$

Lemma 7.2.2. Let $A$ be nonsingular, $\theta_c \in [0,2]$, and $x_c \in R^N$ be given. Let $B_c$ be nonsingular and let $B_+$ be formed by the Broyden update (7.23). Then

(7.25) $\|E_+\|_2 \le \|E_c\|_2.$


Proof. By subtracting $A$ from both sides of (7.24) we have

(7.26) $E_+ = E_c - \theta_c\dfrac{(E_c s)s^T}{\|s\|_2^2} = E_c(I - \theta_c P_s),$

where $P_s$ is the orthogonal projector

(7.27) $P_s = \dfrac{ss^T}{\|s\|_2^2}.$

This completes the proof as

$$\|E_+\|_2 \le \|E_c\|_2\|I - \theta_c P_s\|_2$$

and $\|I - \theta_c P_s\|_2 \le 1$ by orthogonality of $P_s$ and the fact that $0 \le \theta_c \le 2$.

Now, $\theta_c$ can always be selected to make $B_+$ nonsingular. In [130] one suggestion was

$$\theta_c = \begin{cases} 1, & |\gamma_c| \ge \sigma,\\[4pt] \dfrac{1 - \mathrm{sign}(\gamma_c)\sigma}{1 - \gamma_c}, & \text{otherwise}, \end{cases}$$

where

$$\gamma_c = \dfrac{(B_c^{-1}y)^T s}{\|s\|_2^2} = \dfrac{(B_c^{-1}As)^T s}{\|s\|_2^2}$$

and $\sigma \in (0,1)$ is fixed. However the results in [130] assume only that the sequence $\{\theta_n\}$ satisfies the hypotheses of Lemma 7.2.1 for some $\hat\theta \in (0,1)$ and that $\theta_c$ is always chosen so that $B_+$ is nonsingular.
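In MATLAB the selection is a few lines; broyden_theta is our name for this fragment, and it assumes a dense $B_c$ so that $B_c^{-1}y$ can be formed with backslash.

function thetac = broyden_theta(Bc, y, s, sigma)
% BROYDEN_THETA  The choice of theta_c suggested in [130]; it keeps
% B_+ nonsingular.  sigma in (0,1) is fixed.
gammac = (Bc\y)'*s/(s'*s);    % gamma_c = (B_c^{-1}y)^T s / ||s||_2^2
if abs(gammac) >= sigma
    thetac = 1;
else
    thetac = (1 - sign(gammac)*sigma)/(1 - gammac);
end
end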

For linear problems one can show that the Dennis-Moré condition implies convergence, and hence the assumption that convergence takes place is not needed. Specifically:

Proposition 7.2.1. Let $A$ be nonsingular. Assume that $\{\theta_n\}$ satisfies the hypotheses of Lemma 7.2.1 for some $\hat\theta \in (0,1)$. If $\{\theta_n\}$ is such that the matrices $B_n$ obtained by (7.23) are nonsingular and $\{x_n\}$ is given by the modified Broyden iteration

(7.28) $x_{n+1} = x_n - B_n^{-1}(Ax_n - b)$

then the Dennis-Moré condition implies convergence of $\{x_n\}$ to $x^* = A^{-1}b$.

We leave the proof as Exercise 7.5.1.

The superlinear convergence result is remarkable in that the initial iterate

need not be near the root.

Theorem 7.2.1. Let $A$ be nonsingular. Assume that $\{\theta_n\}$ satisfies the hypotheses of Lemma 7.2.1 for some $\hat\theta \in (0,1)$. If $\{\theta_n\}$ is such that the matrices $B_n$ obtained by (7.23) are nonsingular, then the modified Broyden iteration (7.28) converges q-superlinearly to $x^* = A^{-1}b$ for every initial iterate $x_0$.


Proof. Our approach is to show that for every $\phi \in R^N$

(7.29) $\lim_{n\to\infty} \phi^T\Big(\dfrac{E_n s_n}{\|s_n\|_2}\Big) = 0.$

Assuming that (7.29) holds, setting $\phi = e_j$, the unit vector in the $j$th coordinate direction, shows that the $j$th component of $\dfrac{E_n s_n}{\|s_n\|_2}$ has limit zero. As $j$ was arbitrary, we have

$$\lim_{n\to\infty} \dfrac{E_n s_n}{\|s_n\|_2} = 0,$$

which is the Dennis-Moré condition. This will imply q-superlinear convergence by Theorem 7.1.1 and Proposition 7.2.1.

Let $\phi \in R^N$ be given. Let

$$P_n = \dfrac{s_n s_n^T}{\|s_n\|_2^2}.$$

Since for all $v \in R^N$ and all $n$

$$v^T(E_n^T\phi) = \phi^T(E_n v)$$

and, by (7.26),

$$E_{n+1}^T\phi = (I - \theta_n P_n)E_n^T\phi,$$

we may invoke Lemma 7.2.1 with

$$\eta_n = s_n/\|s_n\|_2,\quad \psi_n = E_n^T\phi,\quad\text{and}\quad \epsilon_n = 0$$

to conclude that

(7.30) $\eta_n^T\psi_n = \dfrac{(E_n^T\phi)^T s_n}{\|s_n\|_2} = \phi^T\dfrac{E_n s_n}{\|s_n\|_2} \to 0.$

This completes the proof.

The result here is not a local convergence result. It is global in the sense

that q-superlinear convergence takes place independently of the initial choices

of x and B. It is known [29], [82], [84], [143], that the Broyden iteration will

terminate in 2N steps in the linear case.

7.2.2. Nonlinear problems. Our analysis of the nonlinear case is different from the classical work [28], [63]. As indicated in the preface, we provide a proof based on ideas from [97], [115], and [104], that does not use the Frobenius norm and extends to various infinite-dimensional settings.

For nonlinear problems our analysis is local and we set $\theta_n = 1$. However we must show that the sequence $\{B_n\}$ of Broyden updates exists and remains nonsingular. We make a definition that allows us to compactly express this fact.

Definition 7.2.1. We say that the Broyden sequence $(\{x_n\}, \{B_n\})$ for the data $(F, x_0, B_0)$ exists if $F$ is defined at $x_n$ for all $n$, $B_n$ is nonsingular for all $n$, and $x_{n+1}$ and $B_{n+1}$ can be computed from $x_n$ and $B_n$ using (7.1) and (7.2).
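In computational terms, the Broyden sequence is simply the orbit of the broyden_step sketch given after (7.2); a hypothetical driver (f, x0, B0, maxit, and tol are all placeholders of ours) would be:

% Generate the Broyden sequence for the data (F, x0, B0); the sequence
% exists as long as B stays nonsingular and f is defined at each x.
x = x0; B = B0;
for n = 1:maxit
    [x, B] = broyden_step(x, B, f);       % apply (7.1) and (7.2)
    if norm(f(x)) <= tol, break, end
end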


We will show that if the standard assumptions hold and $x_0$ and $B_0$ are sufficiently good approximations to $x^*$ and $F'(x^*)$, then the Broyden sequence exists for the data $(F, x_0, B_0)$ and the Broyden iterates converge q-superlinearly to $x^*$. The development of the proof is complicated. We first prove a bounded deterioration result like Lemma 7.2.2. However in the nonlinear case, the errors in the Jacobian approximation can grow as the iteration progresses. We then show that if the initial data is sufficiently good, this growth can be limited and therefore the Broyden sequence exists and the Broyden iterates converge to $x^*$ at least q-linearly. Finally we verify the Dennis-Moré condition (7.6), which, together with the q-linear convergence proved earlier, completes the proof of local q-superlinear convergence.

Bounded deterioration. Our first task is to show that bounded deterioration holds. Unlike the linear case, the sequence $\{\|E_n\|\}$ need not be monotone non-increasing. However, the possible increase can be bounded in terms of the errors in the current and new iterates.

Theorem 7.2.2. Let the standard assumptions hold. Let $x_c \in \Omega$ and a nonsingular matrix $B_c$ be given. Assume that

$$x_+ = x_c - B_c^{-1}F(x_c) = x_c + s \in \Omega$$

and $B_+$ is given by (7.2). Then

(7.31) $\|E_+\|_2 \le \|E_c\|_2 + \gamma(\|e_c\|_2 + \|e_+\|_2)/2.$

Proof. Note that (7.7) implies that

(7.32) $E_c s = -F(x_+) + (F(x_+) - F(x_c) - F'(x^*)s) = -F(x_+) + \displaystyle\int_0^1 (F'(x_c + ts) - F'(x^*))s\,dt.$

We begin by writing (7.32) as

$$F(x_+) = -E_c s + \int_0^1 (F'(x_c + ts) - F'(x^*))s\,dt$$

and then using (7.2) to obtain

$$E_+ = E_c(I - P_s) + \dfrac{(\Delta_c s)s^T}{\|s\|_2^2}$$

where $P_s$ is given by (7.27) and

$$\Delta_c = \int_0^1 (F'(x_c + ts) - F'(x^*))\,dt.$$


Hence, by the standard assumptions

$$\|\Delta_c\|_2 \le \gamma(\|e_c\|_2 + \|e_+\|_2)/2.$$

Just as in the proof of Lemma 7.2.2 for the linear case,

$$\|E_+\|_2 \le \|E_c\|_2 + \|\Delta_c\|_2$$