cost N times the cost of an evaluation of F because each column of the difference Jacobian requires an evaluation of F to form the difference approximation. Hence the cost of a Newton step may be roughly estimated as N + 1 evaluations of F and O(N³) floating-point operations. In many cases F′ can be computed more efficiently, accurately, and directly than with differences and the analysis above for the cost of a Newton iterate is very pessimistic. See Exercise 5.7.21 for an example. For general nonlinearities or general purpose codes such as the MATLAB code nsol from the collection, there may be little alternative to difference Jacobians. In the future, automatic differentiation (see [94] for a collection of articles on this topic) may provide such an alternative.

For definiteness in the description of Algorithm newton we use an LU factorization of F′. Any other appropriate factorization such as QR or Cholesky could be used as well. The inputs of the algorithm are the initial iterate x, the nonlinear map F, and a vector of termination tolerances τ = (τr, τa) ∈ R².

Algorithm 5.3.1. newton(x, F, τ)
1. r0 = ‖F(x)‖
2. Do while ‖F(x)‖ > τr r0 + τa
   (a) Compute F′(x)
   (b) Factor F′(x) = LU.
   (c) Solve LUs = −F(x)
   (d) x = x + s
   (e) Evaluate F(x).
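A minimal MATLAB sketch may make the structure of Algorithm newton concrete. The function names newton_sketch, f, and fprime are hypothetical conveniences assumed here, not part of the collection; nsol in the collection is organized differently.

    function x = newton_sketch(x, f, fprime, tau)
    % Minimal sketch of Algorithm 5.3.1; f returns F(x), fprime
    % returns the Jacobian F'(x), and tau = [tau_r, tau_a].
    fx = f(x);
    stop = tau(1)*norm(fx) + tau(2);   % tau_r*r0 + tau_a
    while norm(fx) > stop
        [L, U] = lu(fprime(x));        % factor F'(x) = LU
        s = -(U \ (L \ fx));           % solve LU s = -F(x)
        x = x + s;                     % x = x + s
        fx = f(x);                     % evaluate F at the new iterate
    end

Storing F(x) in fx avoids the redundant evaluations that a literal transcription of the while test would incur.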

One approach to reducing the cost of steps 2a and 2b in the Newton iteration is to move them outside of the main loop. This means that the linear approximation of F(x) = 0 that is solved at each iteration has derivative determined by the initial iterate:

    x+ = xc − F′(x0)^{-1} F(xc).

This method is called the chord method. The inputs are the same as for Newton's method.

Algorithm 5.3.2. chord(x, F, τ)
1. r0 = ‖F(x)‖
2. Compute F′(x)
3. Factor F′(x) = LU.


4. Do while ‖F(x)‖ > τr r0 + τa
   (a) Solve LUs = −F(x)
   (b) x = x + s
   (c) Evaluate F(x).
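In the same hypothetical style as the sketch of Algorithm newton above, a sketch of the chord method moves the factorization outside the loop:

    function x = chord_sketch(x, f, fprime, tau)
    % Minimal sketch of Algorithm 5.3.2; the Jacobian is computed
    % and factored once and the factors are reused at every step.
    fx = f(x);
    stop = tau(1)*norm(fx) + tau(2);
    [L, U] = lu(fprime(x));            % factor F'(x0) = LU once
    while norm(fx) > stop
        s = -(U \ (L \ fx));           % reuse the frozen factors
        x = x + s;
        fx = f(x);
    end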

The only difference in implementation from Newton's method is that the computation and factorization of the Jacobian are done before the iteration is begun. The difference in the iteration itself is that an approximation to F′(xc) is used. Similar differences arise if F′(xc) is numerically approximated by differences. We continue in the next section with an analysis of the effects of errors in the function and Jacobian in a general context and then consider the chord method and difference approximation to F′ as applications.

5.4. Errors in the function and derivative

Suppose that F and F′ are computed inaccurately so that F + ε and F′ + Δ are used instead of F and F′ in the iteration. If Δ is sufficiently small, the resulting iteration can return a result that is an O(ε) accurate approximation to x*. This is different from convergence and was called "local improvement" in [65]. These issues are discussed in [201] as well. If, for example, ε is entirely due to floating-point roundoff, there is no reason to expect that ‖F(xn)‖ will ever be smaller than ε in general. We will refer to this phase of the iteration in which the nonlinear residual is no longer being reduced as stagnation of the iteration.

Theorem 5.4.1. Let the standard assumptions hold. Then there are K̄ > 0, δ > 0, and δ1 > 0 such that if xc ∈ B(δ) and ‖Δ(xc)‖ < δ1 then

    x+ = xc − (F′(xc) + Δ(xc))^{-1} (F(xc) + ε(xc))

is defined (i.e., F′(xc) + Δ(xc) is nonsingular) and satisfies

(5.5)    ‖e+‖ ≤ K̄(‖ec‖² + ‖Δ(xc)‖ ‖ec‖ + ‖ε(xc)‖).

Proof. Let δ be small enough so that the conclusions of Lemma 4.3.1 and Theorem 5.1.1 hold. Let

    x+^N = xc − F′(xc)^{-1} F(xc)

and note that

    x+ = x+^N + (F′(xc)^{-1} − (F′(xc) + Δ(xc))^{-1}) F(xc) − (F′(xc) + Δ(xc))^{-1} ε(xc).

Lemma 4.3.1 and Theorem 5.1.1 imply

(5.6)    ‖e+‖ ≤ K‖ec‖² + 2‖F′(xc)^{-1} − (F′(xc) + Δ(xc))^{-1}‖ ‖F′(x*)‖ ‖ec‖ + ‖(F′(xc) + Δ(xc))^{-1}‖ ‖ε(xc)‖.


If

    ‖Δ(xc)‖ ≤ ‖F′(x*)^{-1}‖^{-1}/4

then Lemma 4.3.1 implies that

    ‖Δ(xc)‖ ≤ ‖F′(xc)^{-1}‖^{-1}/2

and the Banach Lemma implies that F′(xc) + Δ(xc) is nonsingular and

    ‖(F′(xc) + Δ(xc))^{-1}‖ ≤ 2‖F′(xc)^{-1}‖ ≤ 4‖F′(x*)^{-1}‖.

Hence

    ‖F′(xc)^{-1} − (F′(xc) + Δ(xc))^{-1}‖ ≤ 8‖F′(x*)^{-1}‖² ‖Δ(xc)‖.

We use these estimates and (5.6) to obtain

    ‖e+‖ ≤ K‖ec‖² + 16‖F′(x*)^{-1}‖² ‖F′(x*)‖ ‖Δ(xc)‖ ‖ec‖ + 4‖F′(x*)^{-1}‖ ‖ε(xc)‖.

Setting

    K̄ = K + 16‖F′(x*)^{-1}‖² ‖F′(x*)‖ + 4‖F′(x*)^{-1}‖

completes the proof.

The remainder of this section is devoted to applications of Theorem 5.4.1.

5.4.1. The chord method. Recall that the chord method is given by

    x+ = xc − F′(x0)^{-1} F(xc).

In the language of Theorem 5.4.1,

    ε(xc) = 0,  Δ(xc) = F′(x0) − F′(xc).

Hence, if xc, x0 ∈ B(δ) ⊂ Ω,

(5.7)    ‖Δ(xc)‖ ≤ γ‖x0 − xc‖ ≤ γ(‖e0‖ + ‖ec‖).

We apply Theorem 5.4.1 to obtain the following result.

Theorem 5.4.2. Let the standard assumptions hold. Then there are KC > 0 and δ > 0 such that if x0 ∈ B(δ) the chord iterates converge q-linearly to x* and

(5.8)    ‖e_{n+1}‖ ≤ KC ‖e0‖ ‖en‖.

Proof. Let δ be small enough so that B(δ) ⊂ Ω and the conclusions of Theorem 5.4.1 hold. Assume that xn ∈ B(δ). Combining (5.7) and (5.5) implies

    ‖e_{n+1}‖ ≤ K̄(‖en‖(1 + γ) + γ‖e0‖)‖en‖ ≤ K̄(1 + 2γ)δ‖en‖.

Hence if δ is small enough so that

    K̄(1 + 2γ)δ = η < 1


then the chord iterates converge q-linearly to x*. Q-linear convergence implies that ‖en‖ < ‖e0‖ and hence (5.8) holds with KC = K̄(1 + 2γ).

Another variation of the chord method is

    x+ = xc − A^{-1} F(xc),

where A ≈ F′(x*). Methods of this type may be viewed as preconditioned nonlinear Richardson iteration. Since

    ‖Δ(xc)‖ = ‖A − F′(xc)‖ ≤ ‖A − F′(x*)‖ + ‖F′(x*) − F′(xc)‖,

if xc ∈ B(δ) ⊂ Ω then

    ‖Δ(xc)‖ ≤ ‖A − F′(x*)‖ + γ‖ec‖ ≤ ‖A − F′(x*)‖ + γδ.

Theorem 5.4.3. Let the standard assumptions hold. Then there are KA > 0, δ > 0, and δ1 > 0, such that if x0 ∈ B(δ) and ‖A − F′(x*)‖ < δ1 then the iteration

    x_{n+1} = xn − A^{-1} F(xn)

converges q-linearly to x* and

(5.9)    ‖e_{n+1}‖ ≤ KA(‖e0‖ + ‖A − F′(x*)‖)‖en‖.

5.4.2. Approximate inversion of F′. Another way to implement chord-type methods is to provide an approximate inverse of F′. Here we replace F′(x)^{-1} by B(x), where the action of B on a vector is less expensive to compute than a solution using the LU factorization. Rather than express the iteration in terms of B^{-1}(x) − F′(x) and use Theorem 5.4.3, one can proceed directly from the definition of approximate inverse.

Note that if B is constant (independent of x) then the iteration

    x+ = xc − BF(xc)

can be viewed as a preconditioned nonlinear Richardson iteration.

We have the following result.

Theorem 5.4.4. Let the standard assumptions hold. Then there are KB > 0, δ > 0, and δ1 > 0, such that if x0 ∈ B(δ) and the matrix-valued function B(x) satisfies

(5.10)    ‖I − B(x)F′(x*)‖ = ρ(x) < δ1

for all x ∈ B(δ) then the iteration

    x_{n+1} = xn − B(xn)F(xn)

converges q-linearly to x* and

(5.11)    ‖e_{n+1}‖ ≤ KB(ρ(xn) + ‖en‖)‖en‖.


Proof. On one hand, this theorem could be regarded as a corollary of the Banach Lemma and Theorem 5.4.1. We give a direct proof.

First, by (5.10) we have

(5.12)    ‖B(x)‖ = ‖B(x)F′(x*)F′(x*)^{-1}‖ ≤ ‖B(x)F′(x*)‖ ‖F′(x*)^{-1}‖ ≤ MB = (1 + δ1)‖F′(x*)^{-1}‖.

Using (5.12) and

    e+ = ec − B(xc)F(xc) = ∫₀¹ (I − B(xc)F′(x* + t ec)) ec dt

       = (I − B(xc)F′(x*)) ec + B(xc) ∫₀¹ (F′(x*) − F′(x* + t ec)) ec dt

we have

    ‖e+‖ ≤ ρ(xc)‖ec‖ + MB γ‖ec‖²/2.

This completes the proof with KB = 1 + MB γ/2.

5.4.3. The Shamanskii method. Alternation of a Newton step with a sequence of chord steps leads to a class of high-order methods, that is, methods that converge q-superlinearly with q-order larger than 2. We follow [18] and name this method for Shamanskii [174], who considered the finite difference Jacobian formulation of the method from the point of view of efficiency. The method itself was analyzed in [190]. Other high-order methods, especially for problems in one dimension, are described in [190] and [146].

We can describe the transition from xc to x+ by

(5.13)    y1 = xc − F′(xc)^{-1} F(xc),
          y_{j+1} = y_j − F′(xc)^{-1} F(y_j)  for 1 ≤ j ≤ m − 1,
          x+ = y_m.

Note that m = 1 is Newton's method and m = ∞ is the chord method with {y_j} playing the role of the chord iterates.

Algorithmically a second loop surrounds the factor/solve step. The inputs for the Shamanskii method are the same as for Newton's method or the chord method except for the addition of the parameter m. We can overwrite xc with the sequence of y_j's, and we apply the termination test after the computation of each y_j.

Algorithm 5.4.1. sham(x, F, τ, m)
1. r0 = ‖F(x)‖
2. Do while ‖F(x)‖ > τr r0 + τa


   (a) Compute F′(x)
   (b) Factor F′(x) = LU.
   (c) For j = 1, . . . , m
       i. Solve LUs = −F(x)
       ii. x = x + s
       iii. Evaluate F(x).
       iv. If ‖F(x)‖ ≤ τr r0 + τa exit.
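A sketch in the same hypothetical style as those above shows the second loop surrounding the factor/solve step:

    function x = sham_sketch(x, f, fprime, tau, m)
    % Minimal sketch of Algorithm 5.4.1; one factorization of
    % F'(x) is reused for up to m steps, with the termination
    % test applied after each inner step.
    fx = f(x);
    stop = tau(1)*norm(fx) + tau(2);
    while norm(fx) > stop
        [L, U] = lu(fprime(x));        % factor F'(x) = LU
        for j = 1:m
            s = -(U \ (L \ fx));
            x = x + s;
            fx = f(x);
            if norm(fx) <= stop
                return                 % exit as in step 2(c)iv
            end
        end
    end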

The convergence result is a simple application of Theorem 5.4.2.

Theorem 5.4.5. Let the standard assumptions hold and let m ≥ 1 be given. Then there are KS > 0 and δ > 0 such that if x0 ∈ B(δ) the Shamanskii iterates converge q-superlinearly to x* with q-order m + 1 and

(5.14)    ‖e_{n+1}‖ ≤ KS ‖en‖^{m+1}.

Proof. Let δ be small enough so that B(δ) ⊂ Ω and that the conclusions of Theorem 5.4.2 hold. Then if xn ∈ B(δ) all the intermediate iterates {y_j} are in B(δ) by Theorem 5.4.2. In fact, if we set y0 = xn, (5.8) implies that for 1 ≤ j ≤ m

    ‖y_j − x*‖ ≤ KC‖xn − x*‖ ‖y_{j−1} − x*‖ = KC‖en‖ ‖y_{j−1} − x*‖ ≤ . . . ≤ KC^j ‖en‖^{j+1}.

Hence x_{n+1} ∈ B(δ). Setting j = m in the above inequality completes the proof.

The advantage of the Shamanskii method over Newton's method is that high q-orders can be obtained with far fewer Jacobian evaluations or factorizations. Optimal choices of m as a function of N can be derived under assumptions consistent with dense matrix algebra [18] by balancing the O(N³) cost of a matrix factorization with the cost of function and Jacobian evaluation. The analysis in [18] and the expectation that, if the initial iterate is sufficiently accurate, only a small number of iterations will be taken, indicate that the chord method is usually the best option for very large problems. Algorithm nsol is based on this idea and, using an idea from [114], computes and factors the Jacobian only until an estimate of the q-factor for the linear convergence rate is sufficiently low.

5.4.4. Difference approximation to F′. Assume that we compute F(x) + ε(x) instead of F(x) and attempt to approximate the action of F′ on a vector by a forward difference. We would do this, for example, when building an approximate Jacobian. Here the jth column of F′(x) is F′(x)e_j, where e_j is the unit vector with jth component 1 and other components 0. A forward difference approximation to F′(x)w would be

    (F(x + hw) + ε(x + hw) − F(x) − ε(x)) / h.


Assume that ‖ε(x)‖ ≤ ε̄. Then

    ‖F′(x)w − (F(x + hw) + ε(x + hw) − F(x) − ε(x))/h‖ = O(h + ε̄/h).

The quantity inside the O-term is minimized when h = √ε̄. This means, for instance, that very small difference steps h can lead to inaccurate results. In the special case where ε(x) is a result of floating-point roundoff in full precision (ε̄ ≈ 10^{-15}), the analysis here indicates that h ≈ 10^{-7} is a reasonable choice. The choice h ≈ 10^{-15} in this case can lead to disaster (try it!). Here we have made the implicit assumption that x and w are of roughly the same size. If this is not the case, h should be scaled to reflect that. The choice

    h = ε̄^{1/2} ‖x‖ / ‖w‖

reflects all the discussion above.
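A sketch of a scaled forward difference directional derivative, in the spirit of the collection's dirder but not identical to it (the function name fd_dirder and the guard at x = 0 are assumptions made here):

    function z = fd_dirder(x, w, f, fx)
    % Forward difference approximation to F'(x)*w with the scaled
    % step discussed in the text; fx = f(x) is passed in so that F
    % is evaluated only once more, at x + h*w.
    if norm(w) == 0
        z = zeros(size(x));
        return
    end
    epsbar = 1.e-7;                       % roughly sqrt of full-precision roundoff
    h = epsbar*max(norm(x), 1)/norm(w);   % h = epsbar*||x||/||w||, guarded at x = 0
    z = (f(x + h*w) - fx)/h;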

A more important consequence of this analysis is that the choice of step size in a forward difference approximation to the action of the Jacobian on a vector must take into account the error in the evaluation of F. This can become very important if part of F is computed from measured data.

If F(x) has already been computed, the cost of a forward difference approximation of F′(x)w is an additional evaluation of F (at the point x + hw). Hence the evaluation of a full finite difference Jacobian would cost N function evaluations, one for each column of the Jacobian. Exploitation of special structure could reduce this cost.

The forward difference approximation to the directional derivative is not a linear operator in w. The reason for this is that the derivative has been approximated, and so linearity has been lost. Because of this we must carefully specify what we mean by a difference approximation to the Jacobian. Definition 5.4.1 below specifies the usual choice.

Definition 5.4.1. Let F be defined in a neighborhood of x ∈ R^N. (∇_h F)(x) is the N × N matrix whose jth column is given by

    (∇_h F)(x)_j = { (F(x + h‖x‖e_j) − F(x)) / (h‖x‖),  x ≠ 0,
                   { (F(he_j) − F(x)) / h,              x = 0.
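A sketch of a forward difference Jacobian in the spirit of Definition 5.4.1 and the collection's diffjac (the name fd_jac and the simple scaling used here are assumptions):

    function J = fd_jac(x, f, fx)
    % Forward difference Jacobian; fx = f(x) is reused, so the
    % full Jacobian costs N additional evaluations of F.
    n = length(x);
    J = zeros(n, n);
    hx = 1.e-7*max(norm(x), 1);       % scaled increment, guarded at x = 0
    for j = 1:n
        xh = x;
        xh(j) = xh(j) + hx;           % perturb the jth coordinate
        J(:, j) = (f(xh) - fx)/hx;    % jth column of the difference Jacobian
    end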

We make a similar definition of the difference approximation of the directional derivative.

Definition 5.4.2. Let F be defined in a neighborhood of x ∈ R^N and let


w ∈ R^N. We have

(5.15)    D_h F(x : w) = { 0,                                        w = 0,
                         { ‖w‖ (F(x + h‖x‖w/‖w‖) − F(x)) / (h‖x‖),   w, x ≠ 0,
                         { ‖w‖ (F(hw/‖w‖) − F(x)) / h,               x = 0, w ≠ 0.

The standard assumptions and the Banach Lemma imply the following result.

Lemma 5.4.1. Let the standard assumptions hold. Then there are δ, ε̄, h̄ > 0 such that if x ∈ B(δ), h ≤ h̄, and ‖ε(x)‖ ≤ ε̄ for all x ∈ B(δ), then ∇_h(F + ε)(x) is nonsingular for all x ∈ B(δ) and there is MF > 0 such that

    ‖F′(x) − ∇_h(F + ε)(x)‖ ≤ MF(h + ε̄/h).

If we assume that the forward difference step has been selected so that h = O(√ε̄), then Δ(x) = O(√ε̄) by Lemma 5.4.1. Theorem 5.4.1 implies the following result.

Theorem 5.4.6. Let the standard assumptions hold. Then there are positive δ, ε̄, and KD such that if xc ∈ B(δ), ‖ε(x)‖ ≤ ε̄ for all x ∈ B(δ), and there is M− > 0 such that

    hc > M− √ε̄,

then

    x+ = xc − ∇_{hc}(F + ε)(xc)^{-1} (F(xc) + ε(xc))

satisfies

    ‖e+‖ ≤ KD(ε̄ + (‖ec‖ + hc)‖ec‖).

Note that Theorem 5.4.6 does not imply that an iteration will converge to x*. Even if ε is entirely due to floating-point roundoff, there is no guarantee that ε(xn) → 0 as xn → x*. Hence, one should expect the sequence {‖en‖} to stagnate and cease to decrease once ‖en‖ ≈ ε̄. Therefore in the early phases of the iteration, while ‖en‖ >> √ε̄, the progress of the iteration will be indistinguishable from Newton's method as implemented with an exact Jacobian. When ‖en‖ ≤ √ε̄, Theorem 5.4.6 says that ‖e_{n+1}‖ = O(ε̄). Hence one will see quadratic convergence until the error in F admits no further reduction.

A difference approximation to a full Jacobian does not use any special structure of the Jacobian and will probably be less efficient than a hand coded analytic Jacobian. If information on the sparsity pattern of the Jacobian is available it is sometimes possible [47], [42], to evaluate ∇_h F with far fewer than N evaluations of F. If the sparsity pattern is such that F′ can be reduced to


a block triangular form, the entire solution process can take advantage of this

structure [59].

For problems in which the Lipschitz constant of F′ is large, the error in the difference approximation will be large, too. Moreover, D_h F(x : w) is not, in general, a linear function of w and it is usually the case that

    (∇_h F(x))w ≠ D_h F(x : w).

If F′(x*) is ill conditioned or the Lipschitz constant of F′ is large, the difference between them may be significant and a difference approximation to a directional derivative, which uses the size of w, is likely to be more accurate than ∇_h F(x)w.

While we used full matrix approximations to numerical Jacobians in all

the examples reported here, they should be used with caution.

5.4.5. The secant method. The results in this section are for single equations f(x) = 0, where f is a real-valued function of one real variable. We assume for the sake of simplicity that there are no errors in the evaluation of f, i.e., ε(x) = 0. The standard assumptions, then, imply that f′(x*) ≠ 0. There is no reason to use a difference approximation to f′ for equations in a single variable because one can use previously computed data to approximate f′(xn) by

    an = (f(xn) − f(x_{n−1})) / (xn − x_{n−1}).

The resulting method is called the secant method and is due to Newton [140], [137]. We will prove that the secant method is locally q-superlinearly convergent if the standard assumptions hold.
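A minimal sketch, with hypothetical names as in the earlier sketches:

    function x = secant_sketch(x, xm, f, tau)
    % Secant method for a scalar equation f(x) = 0; x and xm are
    % the two starting approximations x0 and x_{-1}.
    fx = f(x); fxm = f(xm);
    stop = tau(1)*abs(fx) + tau(2);
    while abs(fx) > stop
        a = (fx - fxm)/(x - xm);      % secant slope a_n
        xm = x; fxm = fx;
        x = x - fx/a;                 % secant step
        fx = f(x);
    end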

In order to start, one needs two approximations to x*, x0 and x_{−1}. The local convergence theory can be easily described by Theorem 5.4.1. Let xc be the current iterate and x− the previous iterate. We have ε(xc) = 0 and

(5.16)    Δ(xc) = (f(xc) − f(x−)) / (xc − x−) − f′(xc).

We have the following theorem.

Theorem 5.4.7. Let the standard assumptions hold with N = 1. Then there is δ > 0 such that if x0, x_{−1} ∈ B(δ) and x0 ≠ x_{−1} then the secant iterates converge q-superlinearly to x*.

Proof. Let δ be small enough so that B(δ) ⊂ Ω and the conclusions of Theorem 5.4.1 hold. Assume that x_{−1}, x0, . . . , xn ∈ B(δ). If xn = x* for some finite n, we are done. Otherwise xn ≠ x_{n−1}. If we set s = xn − x_{n−1} we have by (5.16) and the fundamental theorem of calculus

    Δ(xn) = ∫₀¹ (f′(x_{n−1} + ts) − f′(xn)) dt


and hence

    |Δ(xn)| ≤ γ|x_{n−1} − xn|/2 ≤ γ(|en| + |e_{n−1}|)/2.

Applying Theorem 5.4.1 implies

(5.17)    |e_{n+1}| ≤ K̄((1 + γ/2)|en|² + γ|en||e_{n−1}|/2).

If we reduce δ so that

    K̄(1 + γ)δ = η < 1

then xn → x* q-linearly. Therefore

    |e_{n+1}|/|en| ≤ K̄((1 + γ/2)|en| + γ|e_{n−1}|/2) → 0

as n → ∞. This completes the proof.

The secant method and the classical bisection method (see Exercise 5.7.8) are the basis for the popular method of Brent [17] for the solution of single equations.

5.5. The Kantorovich Theorem

In this section we present the result of Kantorovich [107], [106], which states that if the standard assumptions "almost" hold at a point x0, then there is a root and the Newton iteration converges r-quadratically to that root. This result is of use, for example, in proving that discretizations of nonlinear differential and integral equations have solutions that are near to that of the continuous problem.

We require the following assumption.

Assumption 5.5.1. There are β, η, r̄, and γ with βηγ ≤ 1/2 and x0 ∈ R^N such that

1. F is differentiable at x0,

       ‖F′(x0)^{-1}‖ ≤ β, and ‖F′(x0)^{-1}F(x0)‖ ≤ η.

2. F′ is Lipschitz continuous with Lipschitz constant γ in a ball of radius r̄ ≥ r− about x0, where

       r− = (1 − √(1 − 2βηγ)) / (βγ).

We will let B0 be the closed ball of radius r− about x0:

    B0 = {x | ‖x − x0‖ ≤ r−}.

We do not prove the result in its full generality and refer to [106], [145],

[57], and [58] for a complete proof and for discussions of related results.

However, we give a simplified version of a Kantorovich-like result for the chord

Buy this book from SIAM at http://www.ec-securehost.com/SIAM/FR16.html.

Copyright ©1995 by the Society for Industrial and Applied Mathematics. This electronic version is for personal use and may not be duplicated or distributed.

84 ITERATIVE METHODS FOR LINEAR AND NONLINEAR EQUATIONS

method to show how existence of a solution can be derived from estimates

on the function and Jacobian. Unlike the chord method results in [106],

[145], and [57], Theorem 5.5.1 assumes a bit more than Assumption 5.5.1 but

has a simple proof that uses only the contraction mapping theorem and the

fundamental theorem of calculus and obtains q-linear convergence instead of

r-linear convergence.

Theorem 5.5.1. Let Assumption 5.5.1 hold and let

(5.18)    βγη < 1/2.

Then there is a unique root x* of F in B0, the chord iteration with x0 as the initial iterate converges to x* q-linearly with q-factor βγr−, and xn ∈ B0 for all n. Moreover x* is the unique root of F in the ball of radius

    min( r̄, (1 + √(1 − 2βηγ)) / (βγ) )

about x0.

Proof. We will show that the map

    φ(x) = x − F′(x0)^{-1}F(x)

is a contraction on the set

    S = {x | ‖x − x0‖ ≤ r−}.

By the fundamental theorem of calculus and Assumption 5.5.1 we have, for all x ∈ S,

(5.19)    F′(x0)^{-1}F(x) = F′(x0)^{-1}F(x0) + F′(x0)^{-1} ∫₀¹ F′(x0 + t(x − x0))(x − x0) dt

                         = F′(x0)^{-1}F(x0) + x − x0 + F′(x0)^{-1} ∫₀¹ (F′(x0 + t(x − x0)) − F′(x0))(x − x0) dt

and

    φ(x) − x0 = −F′(x0)^{-1}F(x0) − F′(x0)^{-1} ∫₀¹ (F′(x0 + t(x − x0)) − F′(x0))(x − x0) dt.

Hence,

    ‖φ(x) − x0‖ ≤ η + βγ(r−)²/2 = r−


and φ maps S into itself.

To show that φ is a contraction on S and to prove the q-linear convergence we will show that

    ‖φ(x) − φ(y)‖ ≤ βγr− ‖x − y‖

for all x, y ∈ S. If we do this, the contraction mapping theorem (Theorem 4.2.1) will imply that there is a unique fixed point x* of φ in S and that xn → x* q-linearly with q-factor βγr−. Clearly x* is a root of F because it is a fixed point of φ. We know that βγr− < 1 by our assumption that βηγ < 1/2.

Note that for all x ∈ B0

    φ′(x) = I − F′(x0)^{-1}F′(x) = F′(x0)^{-1}(F′(x0) − F′(x))

and hence

    ‖φ′(x)‖ ≤ βγ‖x − x0‖ ≤ βγr− < 1.

Therefore, for all x, y ∈ B0,

    ‖φ(x) − φ(y)‖ = ‖∫₀¹ φ′(y + t(x − y))(x − y) dt‖ ≤ βγr− ‖x − y‖.

This proves the convergence result and the uniqueness of x* in B0.

To prove the remainder of the uniqueness assertion let

    r+ = (1 + √(1 − 2βηγ)) / (βγ)

and note that r+ > r− because βηγ < 1/2. If r̄ = r−, then we are done by the uniqueness of x* in B0. Hence we may assume that r̄ > r−. We must show that if x is such that

    r− < ‖x − x0‖ < min(r̄, r+)

then F(x) ≠ 0. Letting r = ‖x − x0‖ we have by (5.19)

    ‖F′(x0)^{-1}F(x)‖ ≥ r − η − βγr²/2 > 0

because r− < r < r+. This completes the proof.

The Kantorovich theorem is more precise than Theorem 5.5.1 and has a very different proof. We state the theorem in the standard way [145], [106], [63], using the r-quadratic estimates from [106].

Theorem 5.5.2. Let Assumption 5.5.1 hold. Then there is a unique root x* of F in B0, the Newton iteration with x0 as the initial iterate converges to x*, and xn ∈ B0 for all n. x* is the unique root of F in the ball of radius

    min( r̄, (1 + √(1 − 2βηγ)) / (βγ) )

about x0 and the errors satisfy

(5.20)    ‖en‖ ≤ (2βηγ)^{2^n} / (2^n βγ).


If βηγ < 1/2 then (5.20) implies that the convergence is r-quadratic (see Exercise 5.7.16).

5.6. Examples for Newton's method

In the collection of MATLAB codes we provide an implementation nsol of Newton's method, the chord method, the Shamanskii method, or a combination of all of them based on the reduction in the nonlinear residual. This latter option decides if the Jacobian should be recomputed based on the ratio of successive residuals. If

    ‖F(x+)‖ / ‖F(xc)‖

is below a given threshold, the Jacobian is not recomputed. If the ratio is too large, however, we compute and factor F′(x+) for use in the subsequent chord steps. In addition to the inputs of Algorithm sham, a threshold for the ratio of successive residuals is also input. The Jacobian is recomputed and factored if either the ratio of successive residuals exceeds the threshold 0 < ρ < 1 or the number of iterations without an update exceeds m. The output of the MATLAB implementation of nsol is the solution, the history of the iteration stored as the vector of l∞ norms of the nonlinear residuals, and an error flag. nsol uses diffjac to approximate the Jacobian by ∇_h F with h = 10^{-7}.

While our examples have dense Jacobians, the ideas in the implementation of nsol also apply to situations in which the Jacobian is sparse and direct methods for solving the equation for the Newton step are necessary. One would still want to minimize the number of Jacobian computations (by the methods of [47] or [42], say) and sparse matrix factorizations. In Exercise 5.7.25, which is not easy, the reader is asked to create a sparse form of nsol.

Algorithm 5.6.1. nsol(x, F, τ, m, ρ)
1. rc = r0 = ‖F(x)‖
2. Do while ‖F(x)‖ > τr r0 + τa
   (a) Compute ∇_h F(x) ≈ F′(x)
   (b) Factor ∇_h F(x) = LU.
   (c) is = 0, σ = 0
   (d) Do while is < m and σ ≤ ρ
       i. is = is + 1
       ii. Solve LUs = −F(x)
       iii. x = x + s
       iv. Evaluate F(x)
       v. r+ = ‖F(x)‖, σ = r+/rc, rc = r+
       vi. If ‖F(x)‖ ≤ τr r0 + τa exit.
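A sketch of the hybrid strategy (assuming, for brevity, an analytic Jacobian argument fprime; nsol itself uses a difference Jacobian, and the names here are hypothetical):

    function x = nsol_sketch(x, f, fprime, tau, m, rho)
    % Chord steps with frozen factors until the residual ratio
    % sigma exceeds rho or m steps have been taken; then the
    % Jacobian is recomputed and refactored.
    fx = f(x); rc = norm(fx);
    stop = tau(1)*rc + tau(2);
    while norm(fx) > stop
        [L, U] = lu(fprime(x));       % recompute and refactor
        is = 0; sigma = 0;
        while is < m && sigma <= rho
            is = is + 1;
            s = -(U \ (L \ fx));
            x = x + s;
            fx = f(x);
            rp = norm(fx);
            sigma = rp/rc; rc = rp;   % ratio of successive residuals
            if rp <= stop
                return
            end
        end
    end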

In the MATLAB code the iteration is terminated with an error condition

if σ ≥ 1 at any point. This indicates that the local convergence analysis in this


section is not applicable. In Chapter 8 we discuss how to recover from this. nsol is based on the assumption that a difference Jacobian will be used. In Exercise 5.7.21 you are asked to modify the code to accept an analytic Jacobian. Many times the Jacobian can be computed efficiently by reusing results that are already available from the function evaluation. The parameters m and ρ are assigned the default values of 1000 and .5. With these default parameters the decision to update the Jacobian is made based entirely on the reduction in the norm of the nonlinear residual. nsol allows the user to specify a maximum number of iterations; the default is 40.

The Chandrasekhar H-equation. The Chandrasekhar H-equation, [41], [30],

(5.21)    F(H)(µ) = H(µ) − ( 1 − (c/2) ∫₀¹ µ H(ν) / (µ + ν) dν )^{-1} = 0,

is used to solve exit distribution problems in radiative transfer.

We will discretize the equation with the composite midpoint rule. Here we approximate integrals on [0, 1] by

    ∫₀¹ f(µ) dµ ≈ (1/N) Σ_{j=1}^{N} f(µ_j),

where µ_i = (i − 1/2)/N for 1 ≤ i ≤ N. The resulting discrete problem is

(5.22)    F(x)_i = x_i − ( 1 − (c/2N) Σ_{j=1}^{N} µ_i x_j / (µ_i + µ_j) )^{-1}.

It is known [132] that both (5.21) and the discrete analog (5.22) have two solutions for c ∈ (0, 1). Only one of these solutions has physical meaning, however. Newton's method, with the 0 function or the constant function with value 1 as initial iterate, will find the physically meaningful solution [110]. The standard assumptions hold near either solution [110] for c ∈ [0, 1). The discrete problem has its own physical meaning [41] and we will attempt to solve it to high accuracy. Because H has a singularity at µ = 0, the solution of the discrete problem is not even a first-order accurate approximation to that of the continuous problem.
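A MATLAB sketch of the discrete map (5.22); the function name heq is an assumption made here, and the vectorized construction relies on implicit expansion (older MATLAB would build the matrix with meshgrid or ones):

    function F = heq(x, c)
    % Discretized Chandrasekhar H-equation (5.22) at the midpoint
    % nodes mu_i = (i - 1/2)/N; x is the current iterate.
    N = length(x);
    mu = ((1:N)' - 0.5)/N;
    A = (c/(2*N))*(mu./(mu + mu'));   % A(i,j) = c*mu_i/(2N(mu_i + mu_j))
    F = x - 1./(1 - A*x);

With x = ones(100,1) and c = .9 this is the kind of function handle one would pass to nsol.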

In the computations reported here we set N = 100 and c = .9. We used the function identically one as the initial iterate. We computed all Jacobians with the MATLAB code diffjac. We set the termination parameters τr = τa = 10^{-6}.

We begin with a comparison of Newton's method and the chord method. We present some iteration statistics in both tabular and graphical form. In Table 5.1 we tabulate the iteration counter n, ‖F(xn)‖∞/‖F(x0)‖∞, and the ratio

    Rn = ‖F(xn)‖∞ / ‖F(x_{n−1})‖∞

for n ≥ 1 for both Newton's method and the chord method.


Table 5.1
Comparison of Newton and chord iteration.

                       Newton                            Chord
  n   ‖F(xn)‖∞/‖F(x0)‖∞      Rn        ‖F(xn)‖∞/‖F(x0)‖∞      Rn
  0       1.000e+00                        1.000e+00
  1       1.480e-01      1.480e-01         1.480e-01      1.480e-01
  2       2.698e-03      1.823e-02         3.074e-02      2.077e-01
  3       7.729e-07      2.865e-04         6.511e-03      2.118e-01
  4                                        1.388e-03      2.132e-01
  5                                        2.965e-04      2.136e-01
  6                                        6.334e-05      2.136e-01
  7                                        1.353e-05      2.136e-01
  8                                        2.891e-06      2.136e-01

In Fig. 5.1 the solid curve is a plot of ‖F(xn)‖∞/‖F(x0)‖∞ for Newton's method and the dashed curve a plot of ‖F(xn)‖∞/‖F(x0)‖∞ for the chord method. The concave iteration track is the signature of superlinear convergence; the linear track indicates linear convergence. See Exercise 5.7.15 for a more careful examination of this concavity. The linear convergence of the chord method can also be seen in the convergence of the ratios Rn to a constant value.

For this example the MATLAB flops command returns a fairly accurate indicator of the cost. The Newton iteration required over 8.8 million floating-point operations while the chord iteration required under 3.3 million. In Exercise 5.7.18 you are asked to obtain timings in your environment and will see how the floating-point operation counts are related to the actual timings.

The good performance of the chord method is no accident. If we think of the chord method as a preconditioned nonlinear Richardson iteration with the initial Jacobian playing the role of preconditioner, then the chord method should be much more efficient than Newton's method if the equations for the steps can be solved cheaply. In the case considered here, the cost of a single evaluation and factorization of the Jacobian is far more than the cost of the entire remainder of the chord iteration.

The ideas in Chapters 6 and 7 show how the low iteration cost of the chord method can be combined with the small number of iterations of a superlinearly convergent nonlinear iterative method. In the context of this particular example, the cost of the Jacobian computation can be reduced by analytic evaluation and reuse of data that have already been computed for the evaluation of F. The reader is encouraged to do this in Exercise 5.7.21.

In the remaining examples, we plot, but do not tabulate, the iteration statistics. In Fig. 5.2 we compare the Shamanskii method with m = 2 with Newton's method. The Shamanskii computation required under 6 million floating-point operations, as compared with over 8.8 million for Newton's method. However, for this problem, the simple chord method is the most efficient.


[Figure: semilog plot of the relative nonlinear residual, 10^0 down to 10^{-7}, against iterations 0 through 8.]
Fig. 5.1. Newton and chord methods for H-equation, c = .9.

[Figure: semilog plot of the relative nonlinear residual, 10^0 down to 10^{-7}, against iterations 0 through 4.]
Fig. 5.2. Newton and Shamanskii methods for H-equation, c = .9.

The hybrid approach in nsol with the default parameters would be the chord

method for this problem.

We now reconsider the H-equation with c = .9999. For this problem the Jacobian at the solution is nearly singular [110]. Aside from c, all iteration parameters are the same as in the previous example. While Newton's method converges in 7 iterations and requires 21 million floating-point operations, the chord method requires 188 iterations and 10.6 million floating-point operations. The q-factor for the chord method in this computation was over .96, indicating very slow convergence. The hybrid method in nsol with the default parameters


computes the Jacobian four times during the course of the iteration, converges in 14 iterations, and requires 12 million floating-point operations. In Fig. 5.3 we compare the Newton iteration with the hybrid.

[Figure: semilog plot of the relative nonlinear residual, 10^0 down to 10^{-6}, against iterations 0 through 14.]
Fig. 5.3. Newton and hybrid methods for H-equation, c = .9999.

One should approach plots of convergence histories with some caution. The theory for the chord method asserts that the error norms converge q-linearly to zero. This implies only that the nonlinear residual norms {‖F(xn)‖} converge r-linearly to zero. While the plots indicate q-linear convergence of the nonlinear residuals, the theory does not, in general, support this. In practice, however, plots like those in this section are common.


5.7. Exercises on Newton's method

In some of the exercises here, and in the rest of this part of the book, you will be asked to plot or tabulate iteration statistics. When asked to do this, for each iteration tabulate the iteration counter, the norm of F, and the ratio ‖F(xn)‖/‖F(x_{n−1})‖ for n ≥ 1. A better alternative in the MATLAB environment is to use the semilogy command to plot the norms, as in the sketch below. When one does this, one can visually inspect the plot to determine superlinear convergence (concavity) without explicitly computing the ratios. Use the l∞ norm.
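For example, if a vector resid holds the l∞ norms of the nonlinear residuals (a name assumed here), a two-line script produces the kind of plot used in § 5.6:

    % resid(k) = ||F(x_{k-1})||_inf for k = 1, 2, ...
    semilogy(0:length(resid)-1, resid/resid(1), '-o');
    xlabel('Iterations'); ylabel('Relative Nonlinear Residual');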

5.7.1. A function G is said to be Hölder continuous with exponent α in Ω if ‖G(x) − G(y)‖ ≤ K‖x − y‖^α for all x, y ∈ Ω. Show that if the Lipschitz continuity condition on F′ in the standard assumptions is replaced by Hölder continuity with exponent α > 0 then the Newton iterates converge with q-order 1 + α.

5.7.2. Can the performance of the Newton iteration be improved by a linear change of variables? That is, for nonsingular N × N matrices A and B, can the Newton iterates for F(x) = 0 and AF(Bx) = 0 show any performance difference when started at the same initial iterate? What about the chord method?

5.7.3. Let ∇_h F be given by Definition 5.4.1. Given the standard assumptions, prove that the iteration given by

    x_{n+1} = xn − (∇_{hn} F(xn))^{-1} F(xn)

is locally q-superlinearly convergent if hn → 0. When is it locally q-quadratically convergent?

5.7.4. Suppose F′(xn) is replaced by ∇_{hn} F(xn) in the Shamanskii method. Discuss how hn must be decreased as the iteration progresses to preserve the q-order of m + 1 [174].

5.7.5. In what way is the convergence analysis of the secant method changed if

there are errors in the evaluation of f ?

5.7.6. Prove that the secant method converges r-superlinearly with r-order (√5 + 1)/2. This is easy.

5.7.7. Show that the secant method converges q-superlinearly with q-order (√5 + 1)/2.

5.7.8. The bisection method produces a sequence of intervals (x_l^k, x_r^k) that contain a root of a function of one variable. Given (x_l^k, x_r^k) with f(x_l^k)f(x_r^k) < 0, we replace one of the endpoints with y = (x_l^k + x_r^k)/2 to force f(x_l^{k+1})f(x_r^{k+1}) < 0. If f(y) = 0, we terminate the iteration. Show that the sequence x^k = (x_l^k + x_r^k)/2 is r-linearly convergent if f is continuous and there exists (x_l^0, x_r^0) such that f(x_l^0)f(x_r^0) < 0.


5.7.9. Write a program that solves single nonlinear equations with Newton's method, the chord method, and the secant method. For the secant method, use x_{−1} = .99x0. Apply your program to the following function/initial iterate combinations, tabulate or plot iteration statistics, and explain your results.
1. f(x) = cos(x) − x, x0 = .5
2. f(x) = tan^{-1}(x), x0 = 1
3. f(x) = sin(x), x0 = 3
4. f(x) = x², x0 = .5
5. f(x) = x² + 1, x0 = 10

5.7.10. For scalar functions of one variable a quadratic model of f about a point xc is

    mc(x) = f(xc) + f′(xc)(x − xc) + f″(xc)(x − xc)²/2.

Consider the iterative method that defines x+ as a root of mc. What problems might arise in such a method? Resolve these problems and implement your algorithm. Test it on the problems in Exercise 5.7.9. In what cases is this algorithm superior to Newton's method? For hints and generalizations of this idea to functions of several variables, see [170]. Can the ideas in [170] be applied to some of the high-order methods discussed in [190] for equations in one unknown?

5.7.11. Show that if f is a real function of one real variable, f″ is Lipschitz continuous, and f(x*) = f′(x*) = 0 but f″(x*) ≠ 0 then the iteration

    x_{n+1} = xn − 2f(xn)/f′(xn)

converges locally q-quadratically to x* provided x0 is sufficiently near to x*, but not equal to x* [171].

5.7.12. Show that if f is a real function of one real variable, the standard assumptions hold, f″(x*) = 0 and f″(x) is Lipschitz continuous, then the Newton iteration converges cubically (q-superlinear with q-order 3).

5.7.13. Show that if f is a real function of one real variable, the standard assumptions hold, x0 is sufficiently near x*, and f(x0)f″(x0) > 0, then the Newton iteration converges monotonically to x*. What if f″(x*) = 0? How can you extend this result if f″(x*) = f‴(x*) = . . . = f^{(k)}(x*) = 0 ≠ f^{(k+1)}(x*)? See [131].

5.7.14. How would you estimate the q-order of a q-superlinearly convergent

sequence? Rigorously justify your answer.

5.7.15. If xn → x* q-superlinearly with q-order α > 1, show that log(‖en‖) is a concave function of n in the sense that

    log(‖e_{n+1}‖) − log(‖en‖)


is a decreasing function of n for n sufficiently large. Is this still the case if the convergence is q-superlinear but there is no q-order? What if the convergence is q-linear?

5.7.16. Show that the sequence on the right side of (5.20) is r-quadratically (but not q-quadratically) convergent to 0 if βηγ < 1/2.

5.7.17. Typically (see [89], [184]) the computed LU factorization of F′(x) is the exact LU factorization of a nearby matrix:

    LU = F′(x) + E.

Use the theory in [89], [184] or another book on numerical linear algebra and Theorem 5.4.1 to describe how this factorization error affects the convergence of the Newton iteration. How does the analysis for the QR factorization differ?

5.7.18. Duplicate the results on the H-equation in § 5.6. Try different values of N and c. How does the performance of the methods differ as N and c change? Compare your results to the tabulated values of the H function in [41] or [14]. Compare execution times in your environment. Tabulate or plot iteration statistics. Use the MATLAB cputime command to see how long the entire iteration took for each N, c combination and the MATLAB flops command to get an idea of the number of floating-point operations required by the various methods.

5.7.19. Solve the H-equation with c = 1 and N = 100. Explain your results (see [50] and [157]). Does the trick in Exercise 5.7.11 improve the convergence? Try one of the approaches in [51], [49], or [118] for acceleration of the convergence. You might also look at [90], [170], [75], [35], and [155] for more discussion of this. Try the chord method and compare the results to part 4 of Exercise 5.7.9 (see [52]).

5.7.20. Solve the H-equation with N = 100, c = .9, and c = .9999 with fixed-point iteration. Compare timings and flop counts with Newton's method, the chord method, and the algorithm in nsol.

5.7.21. Compute by hand the Jacobian of F, the discretized nonlinearity from the H-equation in (5.22). Prove that F′ can be computed at a cost of less than twice that of an evaluation of F. Modify nsol to accept this analytic Jacobian rather than evaluate a difference Jacobian. How do the execution times and flop counts change?

5.7.22. Modify dirder by setting the forward difference step h = 10^{-2}. How does this change affect the convergence rate observations that you have made?


5.7.23. Add noise to your function evaluation with a random number generator, such as the MATLAB rand function. If the size of the noise is O(10^{-4}) and you make the appropriate adjustments to dirder, how are the convergence rates different? What is an appropriate termination criterion? At what point does the iteration stagnate?

5.7.24. Assume that the standard assumptions hold, that the cost of a function evaluation is O(N²) floating-point operations, the cost of a Jacobian is O(N) function evaluations, and that x0 is near enough to x* so that the Newton iteration converges q-quadratically to x*. Show that the number n of Newton iterations needed to obtain ‖en‖ ≤ ε‖e0‖ is O(log(ε)) and that the number of floating-point operations required is O(N³ log(ε)). What about the chord method?

5.7.25. Develop a sparse version of nsol for problems with tridiagonal Jacobian. In MATLAB, for example, you could modify diffjac to use the techniques of [47] and nsol to use the sparse matrix factorization in MATLAB. Apply your code to the central difference discretization of the two-point boundary value problem

    −u″ = sin(u) + f(x),  u(0) = u(1) = 0.

Here, the solution is u*(x) = x(1 − x), and

    f(x) = 2 − sin(x(1 − x)).

Let u0 = 0 be the initial iterate.

5.7.26. Suppose you try to find an eigenvector-eigenvalue pair (φ, λ) of an N × N matrix A by solving the system of N + 1 nonlinear equations

(5.23)    F(φ, λ) = ( Aφ − λφ )  =  ( 0 )
                    ( φ^T φ − 1 )    ( 0 )

for the vector x = (φ^T, λ)^T ∈ R^{N+1}. What is the Jacobian of this system? If x = (φ, λ) ∈ R^{N+1} is an eigenvector-eigenvalue pair, when is F′(x) nonsingular? Relate the application of Newton's method to (5.23) to the inverse power method. See [150] for more development of this idea.


Chapter 6

Inexact Newton Methods

Theorem 5.4.1 describes how errors in the derivative/function affect the progress in the Newton iteration. Another way to look at this is to ask how an approximate solution of the linear equation for the Newton step affects the iteration. This was the view taken in [55], where inexact Newton methods in which the step satisfies

(6.1)    ‖F′(xc)s + F(xc)‖ ≤ ηc ‖F(xc)‖

are considered. Any approximate step is accepted provided that the relative residual of the linear equation is small. This is quite useful because conditions like (6.1) are precisely the small linear residual termination conditions for iterative solution of the linear system for the Newton step. Such methods are not new. See [145] and [175], for example, for discussion of these ideas in the context of the classical stationary iterative methods. In most of this chapter we will focus on the approximate solution of the equation for the Newton step by GMRES, but the other Krylov subspace methods discussed in Chapters 2 and 3 and elsewhere can also be used. We follow [69] and refer to the term ηc on the right hand side of (6.1) as the forcing term.
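For orientation, a single inexact Newton step satisfying (6.1) can be sketched with MATLAB's gmres, whose tolerance argument is exactly the relative linear residual in (6.1); the names f and J and the sample values of eta_c and maxit are assumptions made here:

    % One inexact Newton step: gmres returns s with
    % ||J*s + F(xc)|| <= eta_c*||F(xc)||. J may be a matrix or a
    % function handle that applies F'(xc) to a vector.
    eta_c = 0.5;                           % forcing term (sample value)
    maxit = 40;                            % sample iteration limit
    fx = f(x);
    s = gmres(J, -fx, [], eta_c, maxit);   % [] means no restarts
    x = x + s;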

6.1. The basic estimates

We will discuss specific implementation issues in § 6.2. Before that we will give the basic result from [55] to show how a step that satisfies (6.1) affects the convergence. We will present this result in two parts. In § 6.1.1 we give a straightforward analysis that uses the techniques of Chapter 5. In § 6.1.2 we show how the requirements of the simple analysis can be changed to admit a more aggressive (i.e., larger) choice of the parameter η.

6.1.1. Direct analysis. The proof and application of Theorem 6.1.1 should be compared to that of Theorem 5.1.1 and the other results in Chapter 5.

Theorem 6.1.1. Let the standard assumptions hold. Then there are δ and KI such that if xc ∈ B(δ), s satisfies (6.1), and x+ = xc + s then

(6.2)    ‖e+‖ ≤ KI(‖ec‖ + ηc)‖ec‖.


Proof. Let δ be small enough so that the conclusions of Lemma 4.3.1 and Theorem 5.1.1 hold. To prove the first assertion (6.2) note that if²

    r = −F′(xc)s − F(xc)

is the linear residual then

    s + F′(xc)^{-1}F(xc) = −F′(xc)^{-1}r

and

(6.3)    e+ = ec + s = ec − F′(xc)^{-1}F(xc) − F′(xc)^{-1}r.

Now, (6.1), (4.7), and (4.6) imply that

    ‖s + F′(xc)^{-1}F(xc)‖ ≤ ‖F′(xc)^{-1}‖ ηc ‖F(xc)‖ ≤ 4κ(F′(x*)) ηc ‖ec‖.

Hence, using (6.3) and Theorem 5.1.1,

    ‖e+‖ ≤ ‖ec − F′(xc)^{-1}F(xc)‖ + 4κ(F′(x*)) ηc ‖ec‖ ≤ K‖ec‖² + 4κ(F′(x*)) ηc ‖ec‖,

where K is the constant from (5.2). If we set

    KI = K + 4κ(F′(x*)),

the proof is complete.

We summarize the implications of Theorem 6.1.1 for iterative methods in the next result, which is a direct consequence of (6.2) and will suffice to explain most computational observations.

Theorem 6.1.2. Let the standard assumptions hold. Then there are δ and η̄ such that if x0 ∈ B(δ) and {ηn} ⊂ [0, η̄], then the inexact Newton iteration

    x_{n+1} = xn + sn,

where

    ‖F′(xn)sn + F(xn)‖ ≤ ηn ‖F(xn)‖,

converges q-linearly to x*. Moreover
• if ηn → 0 the convergence is q-superlinear, and
• if ηn ≤ Kη‖F(xn)‖^p for some Kη > 0 the convergence is q-superlinear with q-order 1 + p.

² Our definition of r is consistent with the idea that the residual in a linear equation is b − Ax. In [55] r = F′(xc)s + F(xc).


Proof. Let δ be small enough so that (6.2) holds for xc ∈ B(δ). Reduce δ and η̄ if needed so that

    KI(δ + η̄) < 1,

where KI is from (6.2). Then if n ≥ 0 and xn ∈ B(δ) we have

    ‖e_{n+1}‖ ≤ KI(‖en‖ + ηn)‖en‖ ≤ KI(δ + η̄)‖en‖ < ‖en‖.

This proves q-linear convergence with a q-factor of KI(δ + η̄).

If ηn → 0 then q-superlinear convergence follows from the definition. If

    ηn ≤ Kη‖F(xn)‖^p

then we may use (4.7) and (6.2) to conclude

    ‖e_{n+1}‖ ≤ KI(‖en‖^{1−p} + Kη 2^p ‖F′(x*)‖^p)‖en‖^{1+p},

which completes the proof.

6.1.2. Weighted norm analysis. Since KI = O(κ(F′(x*))), one might conclude from Theorem 6.1.2 that if F′(x*) is ill conditioned very small forcing terms must be used. This is not the case and the purpose of this subsection is to describe more accurately the necessary restrictions on the forcing terms. The results here differ from those in § 6.1.1 in that no restriction is put on the sequence {ηn} ⊂ [0, 1) other than requiring that 1 not be an accumulation point.

Theorem 6.1.3. Let the standard assumptions hold. Then there is δ such that if xc ∈ B(δ), s satisfies (6.1), x+ = xc + s, and ηc ≤ η < η̄ < 1, then

(6.4)    ‖F′(x*)e+‖ ≤ η̄ ‖F′(x*)ec‖.

Proof. To prove (6.4) note that Theorem 4.0.1 implies that

    ‖F(xc)‖ ≤ ‖F′(x*)ec‖ + γ‖ec‖²/2.

Since

    ‖ec‖ = ‖F′(x*)^{-1}F′(x*)ec‖ ≤ ‖F′(x*)^{-1}‖ ‖F′(x*)ec‖

we have, with

    M0 = γ‖F′(x*)^{-1}‖/2,

(6.5)    ‖F(xc)‖ ≤ (1 + M0 δ)‖F′(x*)ec‖.

Now,

    F′(x*)e+ = F′(x*)(ec + s) = F′(x*)(ec − F′(xc)^{-1}F(xc) − F′(xc)^{-1}r).


By Theorem 5.1.1,

    ‖F′(x*)(ec − F′(xc)^{-1}F(xc))‖ ≤ K‖F′(x*)‖ ‖ec‖².

Hence,

(6.6)    ‖F′(x*)e+‖ ≤ ‖F′(x*)F′(xc)^{-1}r‖ + K‖F′(x*)‖ ‖ec‖².

Since

    ‖F′(x*)F′(xc)^{-1}r‖ ≤ ‖r‖ + ‖(F′(x*) − F′(xc))F′(xc)^{-1}r‖ ≤ (1 + 2γ‖F′(x*)^{-1}‖ ‖ec‖)‖r‖,