
point operations. Evaluation of F′ by finite differences should be expected to
cost N times the cost of an evaluation of F because each column in F′ requires
an evaluation of F to form the difference approximation. Hence the cost of a
Newton step may be roughly estimated as N + 1 evaluations of F and O(N³)
floating-point operations. In many cases F′ can be computed more efficiently,
accurately, and directly than with differences and the analysis above for the
cost of a Newton iterate is very pessimistic. See Exercise 5.7.21 for an example.
For general nonlinearities or general purpose codes such as the MATLAB code
nsol from the collection, there may be little alternative to difference Jacobians.
In the future, automatic differentiation (see [94] for a collection of articles on
this topic) may provide such an alternative.
For definiteness in the description of Algorithm newton we use an LU
factorization of F′. Any other appropriate factorization such as QR or
Cholesky could be used as well. The inputs of the algorithm are the initial
iterate x, the nonlinear map F, and a vector of termination tolerances
τ = (τr, τa) ∈ R².
Algorithm 5.3.1. newton(x, F, τ)
1. r0 = ‖F(x)‖

2. Do while ‖F(x)‖ > τr r0 + τa
   (a) Compute F′(x)
   (b) Factor F′(x) = LU.
   (c) Solve LUs = −F(x)
   (d) x = x + s
   (e) Evaluate F(x).
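
To make the loop structure concrete, here is a minimal MATLAB sketch of Algorithm
newton. It assumes that f and jac are function handles returning F(x) and the
Jacobian F′(x) (names chosen for illustration; this is not the nsol code from the
collection) and that tau = [τr, τa].

    function x = newton_sketch(x, f, jac, tau)
    % NEWTON_SKETCH  Minimal illustration of Algorithm newton.
    % f   : handle returning F(x); jac : handle returning F'(x) (assumed available)
    % tau : [tau_r, tau_a], the relative/absolute termination tolerances
    fx = f(x);
    r0 = norm(fx, inf);
    while norm(fx, inf) > tau(1)*r0 + tau(2)
        [L, U, P] = lu(jac(x));        % factor F'(x)
        s = U \ (L \ (P*(-fx)));       % solve F'(x) s = -F(x)
        x = x + s;
        fx = f(x);                     % evaluate F at the new iterate
    end
    end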

One approach to reducing the cost of items 2a and 2b in the Newton
iteration is to move them outside of the main loop. This means that the linear
approximation of F(x) = 0 that is solved at each iteration has derivative
determined by the initial iterate,

    x+ = xc − F′(x0)⁻¹ F(xc).

This method is called the chord method. The inputs are the same as for
Newton's method.
Algorithm 5.3.2. chord(x, F, τ)
1. r0 = ‖F(x)‖

2. Compute F′(x)

3. Factor F′(x) = LU.



4. Do while ‖F(x)‖ > τr r0 + τa
   (a) Solve LUs = −F(x)
   (b) x = x + s
   (c) Evaluate F(x).
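
The corresponding MATLAB sketch differs from the one above only in that the
factorization is hoisted out of the loop; again f, jac, and tau are illustrative
placeholders rather than codes from the collection.

    function x = chord_sketch(x, f, jac, tau)
    % CHORD_SKETCH  Minimal illustration of Algorithm chord.
    fx = f(x);
    r0 = norm(fx, inf);
    [L, U, P] = lu(jac(x));            % factor F'(x0) once, before the iteration
    while norm(fx, inf) > tau(1)*r0 + tau(2)
        s = U \ (L \ (P*(-fx)));       % reuse the factors at every step
        x = x + s;
        fx = f(x);
    end
    end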

The only difference in implementation from Newton's method is that the
computation and factorization of the Jacobian are done before the iteration
is begun. The difference in the iteration itself is that an approximation to
F′(xc) is used. Similar differences arise if F′(xc) is numerically approximated
by differences. We continue in the next section with an analysis of the effects
of errors in the function and Jacobian in a general context and then consider
the chord method and difference approximation to F′ as applications.

5.4. Errors in the function and derivative
Suppose that F and F′ are computed inaccurately so that F + ε and F′ + ∆
are used instead of F and F′ in the iteration. If ∆ is sufficiently small, the
resulting iteration can return a result that is an O(ε) accurate approximation
to x∗. This is different from convergence and was called "local improvement"
in [65]. These issues are discussed in [201] as well. If, for example, ε is entirely
due to floating-point roundoff, there is no reason to expect that ‖F(xn)‖ will
ever be smaller than ε in general. We will refer to this phase of the iteration
in which the nonlinear residual is no longer being reduced as stagnation of the
iteration.
Theorem 5.4.1. Let the standard assumptions hold. Then there are
K̄ > 0, δ > 0, and δ1 > 0 such that if xc ∈ B(δ) and ‖∆(xc)‖ < δ1 then

    x+ = xc − (F′(xc) + ∆(xc))⁻¹ (F(xc) + ε(xc))

is defined (i.e., F′(xc) + ∆(xc) is nonsingular) and satisfies

(5.5)  ‖e+‖ ≤ K̄(‖ec‖² + ‖∆(xc)‖ ‖ec‖ + ‖ε(xc)‖).

Proof. Let δ be small enough so that the conclusions of Lemma 4.3.1 and
Theorem 5.1.1 hold. Let

    x+^N = xc − F′(xc)⁻¹ F(xc)

and note that

    x+ = x+^N + (F′(xc)⁻¹ − (F′(xc) + ∆(xc))⁻¹)F(xc) − (F′(xc) + ∆(xc))⁻¹ ε(xc).

Lemma 4.3.1 and Theorem 5.1.1 imply

(5.6)  ‖e+‖ ≤ K‖ec‖² + 2‖F′(xc)⁻¹ − (F′(xc) + ∆(xc))⁻¹‖ ‖F′(x∗)‖ ‖ec‖
             + ‖(F′(xc) + ∆(xc))⁻¹‖ ‖ε(xc)‖.




If
    ‖∆(xc)‖ ≤ ‖F′(x∗)⁻¹‖⁻¹/4
then Lemma 4.3.1 implies that

    ‖∆(xc)‖ ≤ ‖F′(xc)⁻¹‖⁻¹/2

and the Banach Lemma implies that F′(xc) + ∆(xc) is nonsingular and

    ‖(F′(xc) + ∆(xc))⁻¹‖ ≤ 2‖F′(xc)⁻¹‖ ≤ 4‖F′(x∗)⁻¹‖.

Hence

    ‖F′(xc)⁻¹ − (F′(xc) + ∆(xc))⁻¹‖ ≤ 8‖F′(x∗)⁻¹‖² ‖∆(xc)‖.

We use these estimates and (5.6) to obtain

    ‖e+‖ ≤ K‖ec‖² + 16‖F′(x∗)⁻¹‖² ‖F′(x∗)‖ ‖∆(xc)‖ ‖ec‖ + 4‖F′(x∗)⁻¹‖ ‖ε(xc)‖.

Setting

    K̄ = K + 16‖F′(x∗)⁻¹‖² ‖F′(x∗)‖ + 4‖F′(x∗)⁻¹‖

completes the proof.
The remainder of this section is devoted to applications of Theorem 5.4.1.

5.4.1. The chord method. Recall that the chord method is given by

    x+ = xc − F′(x0)⁻¹ F(xc).

In the language of Theorem 5.4.1,

    ε(xc) = 0,   ∆(xc) = F′(x0) − F′(xc).

Hence, if xc, x0 ∈ B(δ) ⊂ Ω,

(5.7)  ‖∆(xc)‖ ≤ γ‖x0 − xc‖ ≤ γ(‖e0‖ + ‖ec‖).

We apply Theorem 5.4.1 to obtain the following result.
Theorem 5.4.2. Let the standard assumptions hold. Then there are
KC > 0 and δ > 0 such that if x0 ∈ B(δ) the chord iterates converge q-linearly
to x∗ and
(5.8)  ‖en+1‖ ≤ KC ‖e0‖ ‖en‖.
Proof. Let δ be small enough so that B(δ) ⊂ Ω and the conclusions of
Theorem 5.4.1 hold. Assume that xn ∈ B(δ). Combining (5.7) and (5.5)
implies

    ‖en+1‖ ≤ K̄(‖en‖(1 + γ) + γ‖e0‖)‖en‖ ≤ K̄(1 + 2γ)δ‖en‖.

Hence if δ is small enough so that

    K̄(1 + 2γ)δ = η < 1




then the chord iterates converge q-linearly to x∗. Q-linear convergence implies
that ‖en‖ < ‖e0‖ and hence (5.8) holds with KC = K̄(1 + 2γ).
Another variation of the chord method is

    x+ = xc − A⁻¹ F(xc),

where A ≈ F′(x∗). Methods of this type may be viewed as preconditioned
nonlinear Richardson iteration. Since

    ‖∆(xc)‖ = ‖A − F′(xc)‖ ≤ ‖A − F′(x∗)‖ + ‖F′(x∗) − F′(xc)‖,

if xc ∈ B(δ) ⊂ Ω then

    ‖∆(xc)‖ ≤ ‖A − F′(x∗)‖ + γ‖ec‖ ≤ ‖A − F′(x∗)‖ + γδ.

Theorem 5.4.3. Let the standard assumptions hold. Then there are
KA > 0, δ > 0, and δ1 > 0, such that if x0 ∈ B(δ) and ‖A − F′(x∗)‖ < δ1 then
the iteration
    xn+1 = xn − A⁻¹ F(xn)
converges q-linearly to x∗ and

(5.9)  ‖en+1‖ ≤ KA(‖e0‖ + ‖A − F′(x∗)‖)‖en‖.


5.4.2. Approximate inversion of F′. Another way to implement chord-
type methods is to provide an approximate inverse of F′. Here we replace
F′(x)⁻¹ by B(x), where the action of B on a vector is less expensive to compute
than a solution using the LU factorization. Rather than express the iteration
in terms of B⁻¹(x) − F′(x) and use Theorem 5.4.3, one can proceed directly
from the definition of approximate inverse.
Note that if B is constant (independent of x) then the iteration

    x+ = xc − BF(xc)

can be viewed as a preconditioned nonlinear Richardson iteration.
We have the following result.
Theorem 5.4.4. Let the standard assumptions hold. Then there are
KB > 0, δ > 0, and δ1 > 0, such that if x0 ∈ B(δ) and the matrix-valued
function B(x) satisfies

(5.10)  ‖I − B(x)F′(x∗)‖ = ρ(x) < δ1

for all x ∈ B(δ), then the iteration

    xn+1 = xn − B(xn)F(xn)

converges q-linearly to x∗ and

(5.11)  ‖en+1‖ ≤ KB(ρ(xn) + ‖en‖)‖en‖.




Proof. On one hand, this theorem could be regarded as a corollary of the
Banach Lemma and Theorem 5.4.1. We give a direct proof.
First, by (5.10) we have

(5.12)  ‖B(x)‖ = ‖B(x)F′(x∗)F′(x∗)⁻¹‖ ≤ ‖B(x)F′(x∗)‖ ‖F′(x∗)⁻¹‖
               ≤ MB = (1 + δ1)‖F′(x∗)⁻¹‖.

Using (5.12) and

    e+ = ec − B(xc)F(xc) = ∫₀¹ (I − B(xc)F′(x∗ + tec))ec dt

       = (I − B(xc)F′(x∗))ec + B(xc) ∫₀¹ (F′(x∗) − F′(x∗ + tec))ec dt

we have
    ‖e+‖ ≤ ρ(xc)‖ec‖ + MB γ‖ec‖²/2.
This completes the proof with KB = 1 + MB γ/2.

5.4.3. The Shamanskii method. Alternation of a Newton step with a
sequence of chord steps leads to a class of high-order methods, that is, methods
that converge q-superlinearly with q-order larger than 2. We follow [18] and
name this method for Shamanskii [174], who considered the finite difference
Jacobian formulation of the method from the point of view of efficiency. The
method itself was analyzed in [190]. Other high-order methods, especially for
problems in one dimension, are described in [190] and [146].
We can describe the transition from xc to x+ by

         y1 = xc − F′(xc)⁻¹ F(xc),

(5.13)   yj+1 = yj − F′(xc)⁻¹ F(yj)  for 1 ≤ j ≤ m − 1,

         x+ = ym.

Note that m = 1 is Newton's method and m = ∞ is the chord method with
{yj} playing the role of the chord iterates.
Algorithmically a second loop surrounds the factor/solve step. The inputs
for the Shamanskii method are the same as for Newton's method or the chord
method except for the addition of the parameter m. Note that we can overwrite
xc with the sequence of yj's and that we apply the termination test after the
computation of each yj.
Algorithm 5.4.1. sham(x, F, τ, m)
1. r0 = ‖F(x)‖

2. Do while ‖F(x)‖ > τr r0 + τa




   (a) Compute F′(x)
   (b) Factor F′(x) = LU.
   (c) For j = 1, . . . , m
       i. Solve LUs = −F(x)
       ii. x = x + s
       iii. Evaluate F(x).
       iv. If ‖F(x)‖ ≤ τr r0 + τa exit.
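
A minimal MATLAB sketch of Algorithm sham follows; as before f, jac, and tau are
illustrative placeholders, and the termination test is applied after each inner step.

    function x = sham_sketch(x, f, jac, tau, m)
    % SHAM_SKETCH  Minimal illustration of Algorithm sham:
    % one Jacobian factorization is reused for up to m steps.
    fx = f(x);
    r0 = norm(fx, inf);
    while norm(fx, inf) > tau(1)*r0 + tau(2)
        [L, U, P] = lu(jac(x));        % factor F'(x)
        for j = 1:m
            s  = U \ (L \ (P*(-fx)));  % chord step with the current factors
            x  = x + s;
            fx = f(x);
            if norm(fx, inf) <= tau(1)*r0 + tau(2)
                return                 % test after each y_j
            end
        end
    end
    end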

The convergence result is a simple application of Theorem 5.4.2.
Theorem 5.4.5. Let the standard assumptions hold and let m ≥ 1 be
given. Then there are KS > 0 and δ > 0 such that if x0 ∈ B(δ) the Shamanskii
iterates converge q-superlinearly to x∗ with q-order m + 1 and

(5.14)  ‖en+1‖ ≤ KS ‖en‖^(m+1).

Proof. Let δ be small enough so that B(δ) ⊂ Ω and that the conclusions
of Theorem 5.4.2 hold. Then if xn ∈ B(δ) all the intermediate iterates {yj}
are in B(δ) by Theorem 5.4.2. In fact, if we set y0 = xn, (5.8) implies that for
1 ≤ j ≤ m

    ‖yj − x∗‖ ≤ KC ‖xn − x∗‖ ‖yj−1 − x∗‖ = KC ‖en‖ ‖yj−1 − x∗‖ ≤ · · · ≤ KC^j ‖en‖^(j+1).

Hence xn+1 ∈ B(δ). Setting j = m in the above inequality completes the
proof.
The advantage of the Shamanskii method over Newton's method is
that high q-orders can be obtained with far fewer Jacobian evaluations or
factorizations. Optimal choices of m as a function of N can be derived
under assumptions consistent with dense matrix algebra [18] by balancing the
O(N³) cost of a matrix factorization with the cost of function and Jacobian
evaluation. The analysis in [18] and the expectation that, if the initial iterate is
sufficiently accurate, only a small number of iterations will be taken, indicate
that the chord method is usually the best option for very large problems.
Algorithm nsol is based on this idea and, using an idea from [114], computes
and factors the Jacobian only until an estimate of the q-factor for the linear
convergence rate is sufficiently low.

5.4.4. Difference approximation to F′. Assume that we compute
F(x) + ε(x) instead of F(x) and attempt to approximate the action of F′
on a vector by a forward difference. We would do this, for example, when
building an approximate Jacobian. Here the jth column of F′(x) is F′(x)ej,
where ej is the unit vector with jth component 1 and other components 0.
A forward difference approximation to F′(x)w would be

    (F(x + hw) + ε(x + hw) − F(x) − ε(x)) / h.



Assume that ‖ε(x)‖ ≤ ε̄. Then

    ‖F′(x)w − (F(x + hw) + ε(x + hw) − F(x) − ε(x))/h‖ = O(h + ε̄/h).

The quantity inside the O-term is minimized when h = √ε̄. This means, for
instance, that very small difference steps h can lead to inaccurate results. In
the special case where ε(x) is a result of floating-point roundoff in full precision
(ε̄ ≈ 10⁻¹⁵), the analysis here indicates that h ≈ 10⁻⁷ is a reasonable choice.
The choice h ≈ 10⁻¹⁵ in this case can lead to disaster (try it!). Here we have
made the implicit assumption that x and w are of roughly the same size. If
this is not the case, h should be scaled to reflect that. The choice

    h = ε̄^(1/2) ‖x‖ / ‖w‖

reflects all the discussion above.
A more important consequence of this analysis is that the choice of step
size in a forward difference approximation to the action of the Jacobian on
a vector must take into account the error in the evaluation of F. This can
become very important if part of F is computed from measured data.
If F(x) has already been computed, the cost of a forward difference
approximation of F′(x)w is an additional evaluation of F (at the point x + hw).
Hence the evaluation of a full finite difference Jacobian would cost N function
evaluations, one for each column of the Jacobian. Exploitation of special
structure could reduce this cost.
The forward difference approximation to the directional derivative is not
a linear operator in w. The reason for this is that the derivative has been
approximated, and so linearity has been lost. Because of this we must
carefully specify what we mean by a difference approximation to the Jacobian.
Definition 5.4.1 below specifies the usual choice.
Definition 5.4.1. Let F be defined in a neighborhood of x ∈ R^N.
(∇h F)(x) is the N × N matrix whose jth column is given by

    (∇h F)(x)j = (F(x + h‖x‖ ej) − F(x)) / (h‖x‖),   x ≠ 0,

    (∇h F)(x)j = (F(h ej) − F(x)) / h,               x = 0.
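
A minimal MATLAB sketch of this choice of difference Jacobian follows. It is an
illustration of Definition 5.4.1 under the assumption that the value fx = F(x) has
already been computed; it is not the diffjac code from the collection.

    function J = diffjac_sketch(x, f, fx, h)
    % DIFFJAC_SKETCH  Forward-difference Jacobian in the sense of Definition 5.4.1.
    % One additional evaluation of f per column; h is the difference increment.
    n  = length(x);
    J  = zeros(n, n);
    hx = h*norm(x);                    % step scaled by ||x||; fall back to h if x = 0
    if hx == 0, hx = h; end
    for j = 1:n
        xp    = x;
        xp(j) = xp(j) + hx;            % perturb the jth coordinate only
        J(:, j) = (f(xp) - fx)/hx;     % jth column of the difference Jacobian
    end
    end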

We make a similar definition of the difference approximation of the
directional derivative.
Definition 5.4.2. Let F be defined in a neighborhood of x ∈ R^N and let




w ∈ R^N. We have

(5.15)  Dh F(x : w) =
        0,                                            w = 0,
        ‖w‖ (F(x + h‖x‖ w/‖w‖) − F(x)) / (h‖x‖),      w, x ≠ 0,
        ‖w‖ (F(h w/‖w‖) − F(x)) / h,                  x = 0, w ≠ 0.
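
The corresponding MATLAB sketch of (5.15) scales the step by ‖x‖ and the
direction to unit norm; f, fx = F(x), and h are again illustrative placeholders
rather than the dirder code itself.

    function d = dirder_sketch(x, w, f, fx, h)
    % DIRDER_SKETCH  Forward-difference directional derivative as in (5.15).
    if norm(w) == 0
        d = zeros(size(x));            % D_h F(x : 0) = 0
        return
    end
    hx = h*norm(x);                    % scale the step by ||x||; use h if x = 0
    if hx == 0, hx = h; end
    xp = x + hx*(w/norm(w));           % perturb in the unit direction w/||w||
    d  = norm(w)*(f(xp) - fx)/hx;      % rescale by ||w||
    end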
The standard assumptions and the Banach Lemma imply the following
result.
Lemma 5.4.1. Let the standard assumptions hold. Then there are δ, ε̄, h̄ >
0 such that if x ∈ B(δ), h ≤ h̄, and ‖ε(x)‖ ≤ ε̄ for all x ∈ B(δ), then
∇h(F + ε)(x) is nonsingular for all x ∈ B(δ) and there is MF > 0 such that

    ‖F′(x) − ∇h(F + ε)(x)‖ ≤ MF(h + ε̄/h).

If we assume that the forward difference step has been selected so that
h = O(√ε̄), then ‖∆(x)‖ = O(√ε̄) by Lemma 5.4.1. Theorem 5.4.1 implies the
following result.
Theorem 5.4.6. Let the standard assumptions hold. Then there are
positive δ, ε̄, and KD such that if xc ∈ B(δ), ‖ε(x)‖ ≤ ε̄ for all x ∈ B(δ),
and there is M− > 0 such that

    hc > M− √ε̄

then
    x+ = xc − ∇hc(F + ε)(xc)⁻¹ (F(xc) + ε(xc))
satisfies
    ‖e+‖ ≤ KD(ε̄ + (‖ec‖ + hc)‖ec‖).
Note that Theorem 5.4.6 does not imply that an iteration will converge to
x∗. Even if ε is entirely due to floating-point roundoff, there is no guarantee
that ε(xn) → 0 as xn → x∗. Hence, one should expect the sequence {‖en‖}
to stagnate and cease to decrease once ‖en‖ ≈ √ε̄. Therefore in the early
phases of the iteration, while ‖en‖ ≫ √ε̄, the progress of the iteration will
be indistinguishable from Newton's method as implemented with an exact
Jacobian. When ‖en‖ ≤ √ε̄, Theorem 5.4.6 says that ‖en+1‖ = O(ε̄). Hence
one will see quadratic convergence until the error in F admits no further
reduction.
A difference approximation to a full Jacobian does not use any special
structure of the Jacobian and will probably be less efficient than a hand coded
analytic Jacobian. If information on the sparsity pattern of the Jacobian is
available it is sometimes possible [47], [42] to evaluate ∇hF with far fewer than
N evaluations of F. If the sparsity pattern is such that F can be reduced to
a block triangular form, the entire solution process can take advantage of this
structure [59].
For problems in which the Lipschitz constant of F′ is large, the error in
the difference approximation will be large, too. Moreover, DhF(x : w) is not,
in general, a linear function of w and it is usually the case that

    (∇hF(x))w ≠ DhF(x : w).

If F′(x∗) is ill conditioned or the Lipschitz constant of F′ is large, the
difference between them may be significant and a difference approximation to
a directional derivative, which uses the size of w, is likely to be more accurate
than ∇hF(x)w.
While we used full matrix approximations to numerical Jacobians in all
the examples reported here, they should be used with caution.

5.4.5. The secant method. The results in this section are for single
equations f(x) = 0, where f is a real-valued function of one real variable.
We assume for the sake of simplicity that there are no errors in the evaluation
of f, i.e., ε(x) = 0. The standard assumptions, then, imply that f′(x∗) ≠ 0.
There is no reason to use a difference approximation to f′ for equations in a
single variable because one can use previously computed data to approximate
f′(xn) by

    an = (f(xn) − f(xn−1)) / (xn − xn−1).

The resulting method is called the secant method and is due to Newton
[140], [137]. We will prove that the secant method is locally q-superlinearly
convergent if the standard assumptions hold.
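
For a single equation the iteration is short enough to state as a MATLAB sketch;
f is a scalar function handle, x and xm1 are the two starting approximations x0
and x−1, and tau = [τr, τa]. The names are chosen for illustration only.

    function x = secant_sketch(x, xm1, f, tau)
    % SECANT_SKETCH  Secant iteration for a scalar equation f(x) = 0.
    fx  = f(x);
    fxm = f(xm1);
    r0  = abs(fx);
    while abs(fx) > tau(1)*r0 + tau(2)
        a   = (fx - fxm)/(x - xm1);    % secant approximation to f'(x_n)
        xm1 = x;  fxm = fx;
        x   = x - fx/a;                % secant step
        fx  = f(x);
    end
    end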
In order to start, one needs two approximations to x∗, x0 and x−1. The
local convergence theory can be easily described by Theorem 5.4.1. Let xc be
the current iterate and x− the previous iterate. We have ε(xc) = 0 and

(5.16)  ∆(xc) = (f(xc) − f(x−)) / (xc − x−) − f′(xc).

We have the following theorem.
Theorem 5.4.7. Let the standard assumptions hold with N = 1. Then
there is δ > 0 such that if x0, x−1 ∈ B(δ) and x0 ≠ x−1 then the secant iterates
converge q-superlinearly to x∗.
Proof. Let δ be small enough so that B(δ) ⊂ Ω and the conclusions of
Theorem 5.4.1 hold. Assume that x−1, x0, . . . , xn ∈ B(δ). If xn = x∗ for some
finite n, we are done. Otherwise xn ≠ xn−1. If we set s = xn − xn−1 we have
by (5.16) and the fundamental theorem of calculus

    ∆(xn) = ∫₀¹ (f′(xn−1 + ts) − f′(xn)) dt




and hence

    |∆(xn)| ≤ γ|xn−1 − xn|/2 ≤ γ(|en| + |en−1|)/2.

Applying Theorem 5.4.1 implies

(5.17)  |en+1| ≤ K̄((1 + γ/2)|en|² + γ|en||en−1|/2).

If we reduce δ so that

    K̄(1 + γ)δ = η < 1

then xn → x∗ q-linearly. Therefore

    |en+1|/|en| ≤ K̄((1 + γ/2)|en| + γ|en−1|/2) → 0

as n → ∞. This completes the proof.
The secant method and the classical bisection method (see Exercise 5.7.8)
are the basis for the popular method of Brent [17] for the solution of single
equations.

5.5. The Kantorovich Theorem
In this section we present the result of Kantorovich [107], [106], which states
that if the standard assumptions "almost" hold at a point x0, then there
is a root and the Newton iteration converges r-quadratically to that root.
This result is of use, for example, in proving that discretizations of nonlinear
differential and integral equations have solutions that are near to that of the
continuous problem.
We require the following assumption.
Assumption 5.5.1. There are β, η, r̄, and γ with βηγ ≤ 1/2 and x0 ∈ R^N
such that
1. F is differentiable at x0,

    ‖F′(x0)⁻¹‖ ≤ β, and ‖F′(x0)⁻¹ F(x0)‖ ≤ η.

2. F′ is Lipschitz continuous with Lipschitz constant γ in a ball of radius
r̄ ≥ r− about x0, where

    r− = (1 − √(1 − 2βηγ)) / (βγ).

We will let B0 be the closed ball of radius r− about x0,

    B0 = {x | ‖x − x0‖ ≤ r−}.

We do not prove the result in its full generality and refer to [106], [145],
[57], and [58] for a complete proof and for discussions of related results.
However, we give a simplified version of a Kantorovich-like result for the chord




method to show how existence of a solution can be derived from estimates
on the function and Jacobian. Unlike the chord method results in [106],
[145], and [57], Theorem 5.5.1 assumes a bit more than Assumption 5.5.1 but
has a simple proof that uses only the contraction mapping theorem and the
fundamental theorem of calculus and obtains q-linear convergence instead of
r-linear convergence.
Theorem 5.5.1. Let Assumption 5.5.1 hold and let

(5.18)  βγη < 1/2.

Then there is a unique root x∗ of F in B0, the chord iteration with x0 as the
initial iterate converges to x∗ q-linearly with q-factor βγr−, and xn ∈ B0 for
all n. Moreover x∗ is the unique root of F in the ball of radius

    min( r̄, (1 + √(1 − 2βηγ)) / (βγ) )

about x0.
Proof. We will show that the map

    φ(x) = x − F′(x0)⁻¹ F(x)

is a contraction on the set

    S = {x | ‖x − x0‖ ≤ r−}.

By the fundamental theorem of calculus and Assumption 5.5.1 we have, for
all x ∈ S,

(5.19)  F′(x0)⁻¹ F(x) = F′(x0)⁻¹ F(x0) + F′(x0)⁻¹ ∫₀¹ F′(x0 + t(x − x0))(x − x0) dt

                      = F′(x0)⁻¹ F(x0) + x − x0
                        + F′(x0)⁻¹ ∫₀¹ (F′(x0 + t(x − x0)) − F′(x0))(x − x0) dt

and

    φ(x) − x0 = −F′(x0)⁻¹ F(x0)
                − F′(x0)⁻¹ ∫₀¹ (F′(x0 + t(x − x0)) − F′(x0))(x − x0) dt.

Hence,

    ‖φ(x) − x0‖ ≤ η + βγr−²/2 = r−




and φ maps S into itself.
To show that φ is a contraction on S and to prove the q-linear convergence
we will show that
    ‖φ(x) − φ(y)‖ ≤ βγr− ‖x − y‖
for all x, y ∈ S. If we do this, the contraction mapping theorem (Theo-
rem 4.2.1) will imply that there is a unique fixed point x∗ of φ in S and
that xn → x∗ q-linearly with q-factor βγr−. Clearly x∗ is a root of F because
it is a fixed point of φ. We know that βγr− < 1 by our assumption that
βηγ < 1/2.
Note that for all x ∈ B0
    φ′(x) = I − F′(x0)⁻¹ F′(x) = F′(x0)⁻¹ (F′(x0) − F′(x))
and hence
    ‖φ′(x)‖ ≤ βγ‖x − x0‖ ≤ βγr− < 1.
Therefore, for all x, y ∈ B0,

    ‖φ(x) − φ(y)‖ = ‖∫₀¹ φ′(y + t(x − y))(x − y) dt‖ ≤ βγr− ‖x − y‖.

This proves the convergence result and the uniqueness of x∗ in B0.
To prove the remainder of the uniqueness assertion let

    r+ = (1 + √(1 − 2βηγ)) / (βγ)

and note that r+ > r− because βηγ < 1/2. If r̄ = r−, then we are done by the
uniqueness of x∗ in B0. Hence we may assume that r̄ > r−. We must show
that if x is such that
    r− < ‖x − x0‖ < min(r̄, r+)
then F(x) ≠ 0. Letting r = ‖x − x0‖ we have by (5.19)
    ‖F′(x0)⁻¹ F(x)‖ ≥ r − η − βγr²/2 > 0
because r− < r < r+. This completes the proof.
The Kantorovich theorem is more precise than Theorem 5.5.1 and has a
very different proof. We state the theorem in the standard way [145], [106],
[63], using the r-quadratic estimates from [106].
Theorem 5.5.2. Let Assumption 5.5.1 hold. Then there is a unique root
x∗ of F in B0, the Newton iteration with x0 as the initial iterate converges to
x∗, and xn ∈ B0 for all n. x∗ is the unique root of F in the ball of radius

    min( r̄, (1 + √(1 − 2βηγ)) / (βγ) )

about x0 and the errors satisfy

(5.20)  ‖en‖ ≤ (2βηγ)^(2^n) / (2^n βγ).




If βηγ < 1/2 then (5.20) implies that the convergence is r-quadratic (see
Exercise 5.7.16).

5.6. Examples for Newton's method
In the collection of MATLAB codes we provide an implementation nsol
of Newton's method, the chord method, the Shamanskii method, or a
combination of all of them based on the reduction in the nonlinear residual.
This latter option decides if the Jacobian should be recomputed based on the
ratio of successive residuals. If

    ‖F(x+)‖ / ‖F(xc)‖

is below a given threshold, the Jacobian is not recomputed. If the ratio is too
large, however, we compute and factor F′(x+) for use in the subsequent chord
steps. In addition to the inputs of Algorithm sham, a threshold for the ratio
of successive residuals is also input. The Jacobian is recomputed and factored
if either the ratio of successive residuals exceeds the threshold 0 < ρ < 1 or
the number of iterations without an update exceeds m. The output of the
MATLAB implementation of nsol is the solution, the history of the iteration
stored as the vector of l∞ norms of the nonlinear residuals, and an error flag.
nsol uses diffjac to approximate the Jacobian by ∇hF with h = 10⁻⁷.
While our examples have dense Jacobians, the ideas in the implementation
of nsol also apply to situations in which the Jacobian is sparse and direct
methods for solving the equation for the Newton step are necessary. One would
still want to minimize the number of Jacobian computations (by the methods
of [47] or [42], say) and sparse matrix factorizations. In Exercise 5.7.25, which
is not easy, the reader is asked to create a sparse form of nsol.
Algorithm 5.6.1. nsol(x, F, τ, m, ρ)
1. rc = r0 = ‖F(x)‖

2. Do while ‖F(x)‖ > τr r0 + τa
   (a) Compute ∇hF(x) ≈ F′(x)
   (b) Factor ∇hF(x) = LU.
   (c) is = 0, σ = 0
   (d) Do while is < m and σ ≤ ρ
       i. is = is + 1
       ii. Solve LUs = −F(x)
       iii. x = x + s
       iv. Evaluate F(x)
       v. r+ = ‖F(x)‖, σ = r+/rc, rc = r+
       vi. If ‖F(x)‖ ≤ τr r0 + τa exit.
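
The following MATLAB sketch shows the hybrid logic of Algorithm nsol; it is an
illustration rather than the nsol code itself, and it calls the diffjac_sketch
routine sketched in § 5.4.4 to form the difference Jacobian.

    function x = nsol_sketch(x, f, tau, m, rho)
    % NSOL_SKETCH  Hybrid Newton/chord iteration in the spirit of Algorithm nsol.
    % The Jacobian is refactored when sigma = r_+/r_c exceeds rho or after m steps.
    fx = f(x);
    rc = norm(fx, inf);  r0 = rc;
    while norm(fx, inf) > tau(1)*r0 + tau(2)
        [L, U, P] = lu(diffjac_sketch(x, f, fx, 1e-7));  % difference Jacobian
        is = 0;  sigma = 0;
        while is < m && sigma <= rho
            is = is + 1;
            s  = U \ (L \ (P*(-fx)));
            x  = x + s;
            fx = f(x);
            rp = norm(fx, inf);  sigma = rp/rc;  rc = rp;
            if rp <= tau(1)*r0 + tau(2)
                return
            end
        end
    end
    end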

In the MATLAB code the iteration is terminated with an error condition
if σ ≥ 1 at any point. This indicates that the local convergence analysis in this
section is not applicable. In Chapter 8 we discuss how to recover from this.
nsol is based on the assumption that a difference Jacobian will be used. In
Exercise 5.7.21 you are asked to modify the code to accept an analytic Jacobian.
Many times the Jacobian can be computed efficiently by reusing results that
are already available from the function evaluation. The parameters m and ρ
are assigned the default values of 1000 and .5. With these default parameters
the decision to update the Jacobian is made based entirely on the reduction in
the norm of the nonlinear residual. nsol allows the user to specify a maximum
number of iterations; the default is 40.

The Chandrasekhar H-equation. The Chandrasekhar H-equation, [41],
[30],

(5.21)  F(H)(µ) = H(µ) − ( 1 − (c/2) ∫₀¹ µ H(ν) / (µ + ν) dν )⁻¹ = 0,

is used to solve exit distribution problems in radiative transfer.
We will discretize the equation with the composite midpoint rule. Here we
approximate integrals on [0, 1] by

    ∫₀¹ f(µ) dµ ≈ (1/N) Σ_{j=1}^{N} f(µj),

where µi = (i − 1/2)/N for 1 ≤ i ≤ N. The resulting discrete problem is

(5.22)  F(x)i = xi − ( 1 − (c/2N) Σ_{j=1}^{N} µi xj / (µi + µj) )⁻¹.


It is known [132] that both (5.21) and the discrete analog (5.22) have two
solutions for c ∈ (0, 1). Only one of these solutions has physical meaning,
however. Newton's method, with the 0 function or the constant function with
value 1 as initial iterate, will find the physically meaningful solution [110].
The standard assumptions hold near either solution [110] for c ∈ [0, 1). The
discrete problem has its own physical meaning [41] and we will attempt to
solve it to high accuracy. Because H′ has a singularity at µ = 0, the solution of
the discrete problem is not even a first-order accurate approximation to that
of the continuous problem.
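
A minimal MATLAB sketch of the discrete residual (5.22) is given below; it is an
illustration rather than the code used for the computations reported here, and it
forms the coefficient matrix explicitly rather than exploiting any structure.

    function F = heq_sketch(x, c)
    % HEQ_SKETCH  Residual of the midpoint-rule discretization (5.22)
    % of the Chandrasekhar H-equation.
    N  = length(x);
    mu = ((1:N)' - 0.5)/N;                 % nodes mu_i = (i - 1/2)/N
    A  = (c/(2*N))*(mu./(mu + mu'));       % A(i,j) = (c/2N) mu_i/(mu_i + mu_j)
    F  = x - 1./(1 - A*x);                 % F(x)_i = x_i - (1 - sum_j A(i,j) x_j)^(-1)
    end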
In the computations reported here we set N = 100 and c = .9. We used the
function identically one as initial iterate. We computed all Jacobians with the
MATLAB code diffjac. We set the termination parameters τr = τa = 10⁻⁶.
We begin with a comparison of Newton's method and the chord method.
We present some iteration statistics in both tabular and graphical form. In
Table 5.1 we tabulate the iteration counter n, ‖F(xn)‖∞/‖F(x0)‖∞, and the
ratio

    Rn = ‖F(xn)‖∞ / ‖F(xn−1)‖∞




Table 5.1
Comparison of Newton and chord iteration.

          Newton                          Chord
 n   ‖F(xn)‖∞/‖F(x0)‖∞    Rn        ‖F(xn)‖∞/‖F(x0)‖∞    Rn
 0   1.000e+00                      1.000e+00
 1   1.480e-01            1.480e-01 1.480e-01            1.480e-01
 2   2.698e-03            1.823e-02 3.074e-02            2.077e-01
 3   7.729e-07            2.865e-04 6.511e-03            2.118e-01
 4                                  1.388e-03            2.132e-01
 5                                  2.965e-04            2.136e-01
 6                                  6.334e-05            2.136e-01
 7                                  1.353e-05            2.136e-01
 8                                  2.891e-06            2.136e-01



for n ≥ 1 for both Newton's method and the chord method. In Fig. 5.1
the solid curve is a plot of ‖F(xn)‖∞/‖F(x0)‖∞ for Newton's method and
the dashed curve a plot of ‖F(xn)‖∞/‖F(x0)‖∞ for the chord method. The
concave iteration track is the signature of superlinear convergence; the linear
track indicates linear convergence. See Exercise 5.7.15 for a more careful
examination of this concavity. The linear convergence of the chord method
can also be seen in the convergence of the ratios Rn to a constant value.
For this example the MATLAB flops command returns a fairly accurate
indicator of the cost. The Newton iteration required over 8.8 million floating-
point operations while the chord iteration required under 3.3 million. In
Exercise 5.7.18 you are asked to obtain timings in your environment and will
see how the floating-point operation counts are related to the actual timings.
The good performance of the chord method is no accident. If we think
of the chord method as a preconditioned nonlinear Richardson iteration with
the initial Jacobian playing the role of preconditioner, then the chord method
should be much more efficient than Newton's method if the equations for the
steps can be solved cheaply. In the case considered here, the cost of a single
evaluation and factorization of the Jacobian is far more than the cost of the
entire remainder of the chord iteration.
The ideas in Chapters 6 and 7 show how the low iteration cost of the
chord method can be combined with the small number of iterations of a
superlinearly convergent nonlinear iterative method. In the context of this
particular example, the cost of the Jacobian computation can be reduced by
analytic evaluation and reuse of data that have already been computed for the
evaluation of F. The reader is encouraged to do this in Exercise 5.7.21.
In the remaining examples, we plot, but do not tabulate, the iteration
statistics. In Fig. 5.2 we compare the Shamanskii method with m = 2 with
Newton's method. The Shamanskii computation required under 6 million
floating-point operations, as compared with over 8.8 for Newton's method.
However, for this problem, the simple chord method is the most efficient.





[Figure: semilogy plot of the Relative Nonlinear Residual against Iterations.]
Fig. 5.1. Newton and chord methods for H-equation, c = .9.
[Figure: semilogy plot of the Relative Nonlinear Residual against Iterations.]
Fig. 5.2. Newton and Shamanskii method for H-equation, c = .9.


The hybrid approach in nsol with the default parameters would be the chord
method for this problem.
We now reconsider the H-equation with c = .9999. For this problem the
Jacobian at the solution is nearly singular [110]. Aside from c, all iteration
parameters are the same as in the previous example. While Newton™s method
converges in 7 iterations and requires 21 million ¬‚oating-point operations, the
chord method requires 188 iterations and 10.6 million ¬‚oating-point operations.
The q-factor for the chord method in this computation was over .96, indicating
very slow convergence. The hybrid method in nsol with the default parameters




computes the Jacobian four times during the course of the iteration, converges
in 14 iterations, and requires 12 million floating-point operations. In Fig. 5.3
we compare the Newton iteration with the hybrid.

[Figure: semilogy plot of the Relative Nonlinear Residual against Iterations.]
Fig. 5.3. Newton and hybrid methods for H-equation, c = .9999.

One should approach plots of convergence histories with some caution. The
theory for the chord method asserts that the error norms converge q-linearly to
zero. This implies only that the nonlinear residual norms {‖F(xn)‖} converge
r-linearly to zero. While the plots indicate q-linear convergence of the nonlinear
residuals, the theory does not, in general, support this. In practice, however,
plots like those in this section are common.





5.7. Exercises on Newton's method
In some of the exercises here, and in the rest of this part of the book, you will
be asked to plot or tabulate iteration statistics. When asked to do this, for
each iteration tabulate the iteration counter, the norm of F, and the ratio
‖F(xn)‖/‖F(xn−1)‖ for n ≥ 1. A better alternative in the MATLAB
environment is to use the semilogy command to plot the norms. When
one does this, one can visually inspect the plot to determine superlinear
convergence (concavity) without explicitly computing the ratios. Use the l∞
norm.
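
For example, if a vector hist holds the l∞ residual norms (hist(k) for iteration
k − 1), such a plot can be produced as follows; hist is a placeholder name.

    % Superlinear convergence appears as a concave track on a semilogy plot.
    semilogy(0:length(hist)-1, hist/hist(1), '-o');
    xlabel('Iterations');
    ylabel('Relative Nonlinear Residual');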
5.7.1. A function G is said to be Hölder continuous with exponent α in Ω if
‖G(x) − G(y)‖ ≤ K‖x − y‖^α for all x, y ∈ Ω. Show that if the Lipschitz
continuity condition on F′ in the standard assumptions is replaced by
Hölder continuity with exponent α > 0 then the Newton iterates converge
with q-order 1 + α.

5.7.2. Can the performance of the Newton iteration be improved by a linear
change of variables? That is, for nonsingular N × N matrices A and
B, can the Newton iterates for F(x) = 0 and AF(Bx) = 0 show any
performance difference when started at the same initial iterate? What
about the chord method?

5.7.3. Let ∇hF be given by Definition 5.4.1. Given the standard assumptions,
prove that the iteration given by

    xn+1 = xn − (∇hn F(xn))⁻¹ F(xn)

is locally q-superlinearly convergent if hn → 0. When is it locally q-
quadratically convergent?

5.7.4. Suppose F′(xn) is replaced by ∇hn F(xn) in the Shamanskii method.
Discuss how hn must be decreased as the iteration progresses to preserve
the q-order of m + 1 [174].

5.7.5. In what way is the convergence analysis of the secant method changed if
there are errors in the evaluation of f?

5.7.6. Prove that the secant method converges r-superlinearly with r-order
(√5 + 1)/2. This is easy.

5.7.7. Show that the secant method converges q-superlinearly with q-order
(√5 + 1)/2.

5.7.8. The bisection method produces a sequence of intervals (x_l^k, x_r^k) that
contain a root of a function of one variable. Given (x_l^k, x_r^k) with
f(x_l^k)f(x_r^k) < 0 we replace one of the endpoints with y = (x_l^k + x_r^k)/2
to force f(x_l^(k+1))f(x_r^(k+1)) < 0. If f(y) = 0, we terminate the iteration.
Show that the sequence x_k = (x_l^k + x_r^k)/2 is r-linearly convergent if f is
continuous and there exists (x_l^0, x_r^0) such that f(x_l^0)f(x_r^0) < 0.





5.7.9. Write a program that solves single nonlinear equations with Newton's
method, the chord method, and the secant method. For the secant
method, use x−1 = .99x0. Apply your program to the following
function/initial iterate combinations, tabulate or plot iteration statistics,
and explain your results.
1. f(x) = cos(x) − x, x0 = .5
2. f(x) = tan⁻¹(x), x0 = 1
3. f(x) = sin(x), x0 = 3
4. f(x) = x², x0 = .5
5. f(x) = x² + 1, x0 = 10

5.7.10. For scalar functions of one variable a quadratic model of f about a point
xc is
    mc(x) = f(xc) + f′(xc)(x − xc) + f″(xc)(x − xc)²/2.
Consider the iterative method that defines x+ as a root of mc. What
problems might arise in such a method? Resolve these problems and
implement your algorithm. Test it on the problems in Exercise 5.7.9. In
what cases is this algorithm superior to Newton's method? For hints and
generalizations of this idea to functions of several variables, see [170]. Can
the ideas in [170] be applied to some of the high-order methods discussed
in [190] for equations in one unknown?

5.7.11. Show that if f is a real function of one real variable, f″ is Lipschitz
continuous, and f(x∗) = f′(x∗) = 0 but f″(x∗) ≠ 0 then the iteration
    xn+1 = xn − 2f(xn)/f′(xn)
converges locally q-quadratically to x∗ provided x0 is sufficiently near to
x∗, but not equal to x∗ [171].

5.7.12. Show that if f is a real function of one real variable, the standard
assumptions hold, f″(x∗) = 0 and f″(x) is Lipschitz continuous, then
the Newton iteration converges cubically (q-superlinear with q-order 3).

5.7.13. Show that if f is a real function of one real variable, the standard
assumptions hold, x0 is sufficiently near x∗, and f(x0)f″(x0) > 0, then
the Newton iteration converges monotonically to x∗. What if f″(x∗) = 0?
How can you extend this result if f″(x∗) = f‴(x∗) = . . . = f^(k)(x∗) = 0 ≠
f^(k+1)(x∗)? See [131].

5.7.14. How would you estimate the q-order of a q-superlinearly convergent
sequence? Rigorously justify your answer.

5.7.15. If xn → x∗ q-superlinearly with q-order α > 1, show that log(‖en‖) is a
concave function of n in the sense that

    log(‖en+1‖) − log(‖en‖)



is a decreasing function of n for n sufficiently large. Is this still the case
if the convergence is q-superlinear but there is no q-order? What if the
convergence is q-linear?

5.7.16. Show that the sequence on the right side of (5.20) is r-quadratically (but
not q-quadratically) convergent to 0 if βηγ < 1/2.

5.7.17. Typically (see [89], [184]) the computed LU factorization of F′(x) is the
exact LU factorization of a nearby matrix,

    LU = F′(x) + E.

Use the theory in [89], [184] or another book on numerical linear algebra
and Theorem 5.4.1 to describe how this factorization error affects the
convergence of the Newton iteration. How does the analysis for the QR
factorization differ?

5.7.18. Duplicate the results on the H-equation in § 5.6. Try different values of
N and c. How does the performance of the methods differ as N and c
change? Compare your results to the tabulated values of the H function
in [41] or [14]. Compare execution times in your environment. Tabulate
or plot iteration statistics. Use the MATLAB cputime command to see
how long the entire iteration took for each N, c combination and the
MATLAB flops command to get an idea for the number of floating-
point operations required by the various methods.

5.7.19. Solve the H-equation with c = 1 and N = 100. Explain your results
(see [50] and [157]). Does the trick in Exercise 5.7.11 improve the
convergence? Try one of the approaches in [51], [49], or [118] for
acceleration of the convergence. You might also look at [90], [170], [75],
[35] and [155] for more discussion of this. Try the chord method and
compare the results to part 4 of Exercise 5.7.9 (see [52]).

5.7.20. Solve the H-equation with N = 100, c = .9 and c = .9999 with fixed-point
iteration. Compare timings and flop counts with Newton's method, the
chord method, and the algorithm in nsol.

5.7.21. Compute by hand the Jacobian of F, the discretized nonlinearity from
the H-equation in (5.22). Prove that F′ can be computed at a cost of
less than twice that of an evaluation of F. Modify nsol to accept this
analytic Jacobian rather than evaluate a difference Jacobian. How do
the execution times and flop counts change?

5.7.22. Modify dirder by setting the forward difference step h = 10⁻². How
does this change affect the convergence rate observations that you have
made?




5.7.23. Add noise to your function evaluation with a random number generator,
such as the MATLAB rand function. If the size of the noise is
O(10⁻⁴) and you make the appropriate adjustments to dirder, how are
the convergence rates different? What is an appropriate termination
criterion? At what point does the iteration stagnate?

5.7.24. Assume that the standard assumptions hold, that the cost of a function
evaluation is O(N²) floating-point operations, the cost of a Jacobian is
O(N) function evaluations, and that x0 is near enough to x∗ so that the
Newton iteration converges q-quadratically to x∗. Show that the number
n of Newton iterations needed to obtain ‖en‖ ≤ ε‖e0‖ is O(log(ε)) and
that the number of floating-point operations required is O(N³ log(ε)).
What about the chord method?

5.7.25. Develop a sparse version of nsol for problems with tridiagonal Jacobian.
In MATLAB, for example, you could modify diffjac to use the
techniques of [47] and nsol to use the sparse matrix factorization in
MATLAB. Apply your code to the central difference discretization of the
two-point boundary value problem

    −u″ = sin(u) + f(x), u(0) = u(1) = 0.

Here, the solution is u∗(x) = x(1 − x), and

    f(x) = 2 − sin(x(1 − x)).

Let u0 = 0 be the initial iterate.

5.7.26. Suppose you try to find an eigenvector-eigenvalue pair (φ, λ) of an N × N
matrix A by solving the system of N + 1 nonlinear equations

(5.23)  F(φ, λ) = ( Aφ − λφ , φᵀφ − 1 )ᵀ = ( 0 , 0 )ᵀ

for the vector x = (φᵀ, λ)ᵀ ∈ R^(N+1). What is the Jacobian of this
system? If x = (φ, λ) ∈ R^(N+1) is an eigenvector-eigenvalue pair, when is
F′(x) nonsingular? Relate the application of Newton's method to (5.23)
to the inverse power method. See [150] for more development of this
idea.








Chapter 6

Inexact Newton Methods




Theorem 5.4.1 describes how errors in the derivative/function affect the
progress in the Newton iteration. Another way to look at this is to ask how
an approximate solution of the linear equation for the Newton step affects the
iteration. This was the view taken in [55], where inexact Newton methods in
which the step satisfies

(6.1)  ‖F′(xc)s + F(xc)‖ ≤ ηc ‖F(xc)‖

are considered. Any approximate step is accepted provided that the relative
residual of the linear equation is small. This is quite useful because conditions
like (6.1) are precisely the small linear residual termination conditions for
iterative solution of the linear system for the Newton step. Such methods are
not new. See [145] and [175], for example, for discussion of these ideas in the
context of the classical stationary iterative methods. In most of this chapter
we will focus on the approximate solution of the equation for the Newton step
by GMRES, but the other Krylov subspace methods discussed in Chapters 2
and 3 and elsewhere can also be used. We follow [69] and refer to the term ηc
on the right-hand side of (6.1) as the forcing term.
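
In MATLAB, for instance, condition (6.1) is exactly the relative residual tolerance
accepted by the gmres function, so one inexact Newton step might be sketched as
follows; f and jac are illustrative handles for F and F′ and the value of eta_c is
an arbitrary choice.

    % One inexact Newton step: the inner GMRES iteration is stopped as soon as
    % ||F'(xc)s + F(xc)|| <= eta_c ||F(xc)||, which is (6.1).
    fxc   = f(xc);
    eta_c = 0.1;                               % forcing term
    s     = gmres(jac(xc), -fxc, [], eta_c);   % gmres tolerance is relative to ||F(xc)||
    xplus = xc + s;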

6.1. The basic estimates
We will discuss specific implementation issues in § 6.2. Before that we will
give the basic result from [55] to show how a step that satisfies (6.1) affects
the convergence. We will present this result in two parts. In § 6.1.1 we give a
straightforward analysis that uses the techniques of Chapter 5. In § 6.1.2 we
show how the requirements of the simple analysis can be changed to admit a
more aggressive (i.e., larger) choice of the parameter η.

6.1.1. Direct analysis. The proof and application of Theorem 6.1.1 should
be compared to that of Theorem 5.1.1 and the other results in Chapter 5.
Theorem 6.1.1. Let the standard assumptions hold. Then there are δ and
KI such that if xc ∈ B(δ), s satisfies (6.1), and x+ = xc + s then

(6.2)  ‖e+‖ ≤ KI(‖ec‖ + ηc)‖ec‖.

Proof. Let δ be small enough so that the conclusions of Lemma 4.3.1 and
Theorem 5.1.1 hold. To prove the first assertion (6.2) note that if²

    r = −F′(xc)s − F(xc)

is the linear residual then

    s + F′(xc)⁻¹ F(xc) = −F′(xc)⁻¹ r

and

(6.3)  e+ = ec + s = ec − F′(xc)⁻¹ F(xc) − F′(xc)⁻¹ r.

Now, (6.1), (4.7), and (4.6) imply that

    ‖s + F′(xc)⁻¹ F(xc)‖ ≤ ‖F′(xc)⁻¹‖ ηc ‖F(xc)‖

                         ≤ 4κ(F′(x∗)) ηc ‖ec‖.

Hence, using (6.3) and Theorem 5.1.1,

    ‖e+‖ ≤ ‖ec − F′(xc)⁻¹ F(xc)‖ + 4κ(F′(x∗)) ηc ‖ec‖

          ≤ K‖ec‖² + 4κ(F′(x∗)) ηc ‖ec‖,

where K is the constant from (5.2). If we set

    KI = K + 4κ(F′(x∗)),

the proof is complete.
We summarize the implications of Theorem 6.1.1 for iterative methods in
the next result, which is a direct consequence of (6.2) and will suffice to explain
most computational observations.
Theorem 6.1.2. Let the standard assumptions hold. Then there are δ and
η̄ such that if x0 ∈ B(δ), {ηn} ⊂ [0, η̄], then the inexact Newton iteration

    xn+1 = xn + sn,

where
    ‖F′(xn)sn + F(xn)‖ ≤ ηn ‖F(xn)‖,
converges q-linearly to x∗. Moreover
• if ηn → 0 the convergence is q-superlinear, and
• if ηn ≤ Kη ‖F(xn)‖^p for some Kη > 0 the convergence is q-superlinear
with q-order 1 + p.

² Our definition of r is consistent with the idea that the residual in a linear equation is b − Ax. In
[55] r = F′(xc)s + F(xc).





Proof. Let δ be small enough so that (6.2) holds for xc ∈ B(δ). Reduce δ
and η̄ if needed so that

    KI(δ + η̄) < 1,

where KI is from (6.2). Then if n ≥ 0 and xn ∈ B(δ) we have

    ‖en+1‖ ≤ KI(‖en‖ + ηn)‖en‖ ≤ KI(δ + η̄)‖en‖ < ‖en‖.

This proves q-linear convergence with a q-factor of KI(δ + η̄).
If ηn → 0 then q-superlinear convergence follows from the definition. If

    ηn ≤ Kη ‖F(xn)‖^p

then we may use (4.7) and (6.2) to conclude

    ‖en+1‖ ≤ KI(‖en‖^(1−p) + Kη 2^p ‖F′(x∗)‖^p) ‖en‖^(1+p)

which completes the proof.

6.1.2. Weighted norm analysis. Since KI = O(κ(F′(x∗))), one might
conclude from Theorem 6.1.2 that if F′(x∗) is ill conditioned very small forcing
terms must be used. This is not the case and the purpose of this subsection
is to describe more accurately the necessary restrictions on the forcing terms.
The results here differ from those in § 6.1.1 in that no restriction is put on
the sequence {ηn} ⊂ [0, 1) other than requiring that 1 not be an accumulation
point.
Theorem 6.1.3. Let the standard assumptions hold. Then there is δ such
that if xc ∈ B(δ), s satisfies (6.1), x+ = xc + s, and ηc ≤ η < η̄ < 1, then

(6.4)  ‖F′(x∗)e+‖ ≤ η̄ ‖F′(x∗)ec‖.

Proof. To prove (6.4) note that Theorem 4.0.1 implies that

    ‖F(xc)‖ ≤ ‖F′(x∗)ec‖ + γ‖ec‖²/2.

Since
    ‖ec‖ = ‖F′(x∗)⁻¹ F′(x∗)ec‖ ≤ ‖F′(x∗)⁻¹‖ ‖F′(x∗)ec‖
we have, with
    M0 = γ‖F′(x∗)⁻¹‖/2,

(6.5)  ‖F(xc)‖ ≤ (1 + M0 δ)‖F′(x∗)ec‖.

Now,

    F′(x∗)e+ = F′(x∗)(ec + s)

             = F′(x∗)(ec − F′(xc)⁻¹ F(xc) − F′(xc)⁻¹ r).

By Theorem 5.1.1

    ‖F′(x∗)(ec − F′(xc)⁻¹ F(xc))‖ ≤ K‖F′(x∗)‖ ‖ec‖².

Hence,

(6.6)  ‖F′(x∗)e+‖ ≤ ‖F′(x∗)F′(xc)⁻¹ r‖ + K‖F′(x∗)‖ ‖ec‖².

Since

    ‖F′(x∗)F′(xc)⁻¹ r‖ ≤ ‖r‖ + ‖(F′(x∗) − F′(xc))F′(xc)⁻¹ r‖

                       ≤ (1 + 2γ‖F′(x∗)⁻¹‖ ‖ec‖)‖r‖,
