
Copyright ©1995 by the Society for Industrial and Applied Mathematics. This electronic version is for personal use and may not be duplicated or distributed.




To Polly H. Thomas, 1906-1994, devoted mother and grandmother












Contents




Preface xi


How to Get the Software xiii

CHAPTER 1. Basic Concepts and Stationary Iterative Methods 3
1.1 Review and notation . . . . . . . . . . . . . . . . . . . . . . . . 3
1.2 The Banach Lemma and approximate inverses . . . . . . . . . . 5
1.3 The spectral radius . . . . . . . . . . . . . . . . . . . . . . . . . 7
1.4 Matrix splittings and classical stationary iterative methods . . 7
1.5 Exercises on stationary iterative methods . . . . . . . . . . . . 10

CHAPTER 2. Conjugate Gradient Iteration 11
2.1 Krylov methods and the minimization property . . . . . . . . . 11
2.2 Consequences of the minimization property . . . . . . . . . . . 13
2.3 Termination of the iteration . . . . . . . . . . . . . . . . . . . . 15
2.4 Implementation . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
2.5 Preconditioning . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
2.6 CGNR and CGNE . . . . . . . . . . . . . . . . . . . . . . . . . 25
2.7 Examples for preconditioned conjugate iteration . . . . . . . . 26
2.8 Exercises on conjugate gradient . . . . . . . . . . . . . . . . . . 30

CHAPTER 3. GMRES Iteration 33
3.1 The minimization property and its consequences . . . . . . . . 33
3.2 Termination . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
3.3 Preconditioning . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
3.4 GMRES implementation: Basic ideas . . . . . . . . . . . . . . . 37
3.5 Implementation: Givens rotations . . . . . . . . . . . . . . . . . 43
3.6 Other methods for nonsymmetric systems . . . . . . . . . . . . 46
3.6.1 Bi-CG. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
3.6.2 CGS. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
3.6.3 Bi-CGSTAB. . . . . . . . . . . . . . . . . . . . . . . . . 50

3.6.4 TFQMR. . . . . . . . . . . . . . . . . . . . . . . . . 51
3.7 Examples for GMRES iteration . . . . . . . . . . . . . . . . . 54
3.8 Examples for CGNR, Bi-CGSTAB, and TFQMR iteration . . . 55
3.9 Exercises on GMRES . . . . . . . . . . . . . . . . . . . . . . . 60

CHAPTER 4. Basic Concepts and Fixed-Point Iteration 65
4.1 Types of convergence . . . . . . . . . . . . . . . . . . . . . . . . 65
4.2 Fixed-point iteration . . . . . . . . . . . . . . . . . . . . . . . . 66
4.3 The standard assumptions . . . . . . . . . . . . . . . . . . . . . 68

CHAPTER 5. Newton's Method 71
5.1 Local convergence of Newton's method . . . . . . . . . . . . . 71
5.2 Termination of the iteration . . . . . . . . . . . . . . . . . . . 72
5.3 Implementation of Newton's method . . . . . . . . . . . . . . 73
5.4 Errors in the function and derivative . . . . . . . . . . . . . . 75
5.4.1 The chord method. . . . . . . . . . . . . . . . . . . . . 76
5.4.2 Approximate inversion of F'. . . . . . . . . . . . . . . . 77
5.4.3 The Shamanskii method. . . . . . . . . . . . . . . . . . 78
5.4.4 Difference approximation to F'. . . . . . . . . . . . . . 79
5.4.5 The secant method. . . . . . . . . . . . . . . . . . . . . 82
5.5 The Kantorovich Theorem . . . . . . . . . . . . . . . . . . . . 83
5.6 Examples for Newton's method . . . . . . . . . . . . . . . . . 86
5.7 Exercises on Newton's method . . . . . . . . . . . . . . . . . . 91

CHAPTER 6. Inexact Newton Methods 95
6.1 The basic estimates . . . . . . . . . . . . . . . . . . . . . . . . . 95
6.1.1 Direct analysis. . . . . . . . . . . . . . . . . . . . . . . . 95
6.1.2 Weighted norm analysis. . . . . . . . . . . . . . . . . . . 97
6.1.3 Errors in the function. . . . . . . . . . . . . . . . . . . . 100
6.2 Newton-iterative methods . . . . . . . . . . . . . . . . . . . . . 100
6.2.1 Newton GMRES. . . . . . . . . . . . . . . . . . . . . . . 101
6.2.2 Other Newton-iterative methods. . . . . . . . . . . . . . 104
6.3 Newton-GMRES implementation . . . . . . . . . . . . . . . . . 104
6.4 Examples for Newton-GMRES . . . . . . . . . . . . . . . . . . 106
6.4.1 Chandrasekhar H-equation. . . . . . . . . . . . . . . . . 107
6.4.2 Convection-diffusion equation. . . . . . . . . . . . . 108
6.5 Exercises on inexact Newton methods . . . . . . . . . . . . . . 110

CHAPTER 7. Broyden's method 113
7.1 The Dennis–Moré condition . . . . . . . . . . . . . . . . . . . 114
7.2 Convergence analysis . . . . . . . . . . . . . . . . . . . . . . . 116
7.2.1 Linear problems. . . . . . . . . . . . . . . . . . . . . . . 118
7.2.2 Nonlinear problems. . . . . . . . . . . . . . . . . . . . . 120
7.3 Implementation of Broyden's method . . . . . . . . . . . . . . 123
7.4 Examples for Broyden's method . . . . . . . . . . . . . . . . . 127




7.4.1 Linear problems. . . . . . . . . . . . . . . . . . . . . . . 127
7.4.2 Nonlinear problems. . . . . . . . . . . . . . . . . . . . . 128
7.5 Exercises on Broyden's method . . . . . . . . . . . . . . . . . 132

CHAPTER 8. Global Convergence 135
8.1 Single equations . . . . . . . . . . . . . . . . . . . . . . . . . . 135
8.2 Analysis of the Armijo rule . . . . . . . . . . . . . . . . . . . . 138
8.3 Implementation of the Armijo rule . . . . . . . . . . . . . . . . 141
8.3.1 Polynomial line searches. . . . . . . . . . . . . . . . . . 142
8.3.2 Broyden's method. . . . . . . . . . . . . . . . . . . . . . 144
8.4 Examples for Newton–Armijo . . . . . . . . . . . . . . . . . . 146
8.4.1 Inverse tangent function. . . . . . . . . . . . . . . . . . 146
8.4.2 Convection-diffusion equation. . . . . . . . . . . . . . . 146
8.4.3 Broyden–Armijo. . . . . . . . . . . . . . . . . . . . . . . 148
8.5 Exercises on global convergence . . . . . . . . . . . . . . . . . . 151

Bibliography 153

Index 163








Preface




This book on iterative methods for linear and nonlinear equations can be used
as a tutorial and a reference by anyone who needs to solve nonlinear systems
of equations or large linear systems. It may also be used as a textbook for
introductory courses in nonlinear equations or iterative methods or as source
material for an introductory course in numerical analysis at the graduate level.
We assume that the reader is familiar with elementary numerical analysis,
linear algebra, and the central ideas of direct methods for the numerical
solution of dense linear systems as described in standard texts such as [7],
[105], or [184].
Our approach is to focus on a small number of methods and treat them
in depth. Though this book is written in a finite-dimensional setting, we
have selected for coverage mostly algorithms and methods of analysis which
extend directly to the infinite-dimensional case and whose convergence can be
thoroughly analyzed. For example, the matrix-free formulation and analysis for
GMRES and conjugate gradient is almost unchanged in an infinite-dimensional
setting. The analysis of Broyden's method presented in Chapter 7 and
the implementations presented in Chapters 7 and 8 are different from the
classical ones and also extend directly to an infinite-dimensional setting. The
computational examples and exercises focus on discretizations of infinite-
dimensional problems such as integral and differential equations.
We present a limited number of computational examples. These examples
are intended to provide results that can be used to validate the reader's own
implementations and to give a sense of how the algorithms perform. The
examples are not designed to give a complete picture of performance or to be
a suite of test problems.
The computational examples in this book were done with MATLAB
(version 4.0a on various SUN SPARCstations and version 4.1 on an Apple
Macintosh Powerbook 180) and the MATLAB environment is an excellent one
for getting experience with the algorithms, for doing the exercises, and for
small-to-medium scale production work (MATLAB is a registered trademark of
The MathWorks, Inc.). MATLAB codes for many of the
algorithms are available by anonymous ftp. A good introduction to the latest


version (version 4.2) of MATLAB is the MATLAB Primer [178]; [43] is also
a useful resource. If the reader has no access to MATLAB or will be solving
very large problems, the general algorithmic descriptions or even the MATLAB
codes can easily be translated to another language.
Parts of this book are based upon work supported by the National
Science Foundation and the Air Force Office of Scientific Research over
several years, most recently under National Science Foundation Grant Nos.
DMS-9024622 and DMS-9321938. Any opinions, findings, and conclusions or
recommendations expressed in this material are those of the author and do not
necessarily reflect the views of the National Science Foundation or of the Air
Force Office of Scientific Research.
Many of my students and colleagues discussed various aspects of this
project with me and provided important corrections, ideas, suggestions, and
pointers to the literature. I am especially indebted to Jim Banoczi, Jeff Butera,
Steve Campbell, Tony Choi, Moody Chu, Howard Elman, Jim Epperson,
Andreas Griewank, Laura Helfrich, Ilse Ipsen, Lea Jenkins, Vickie Kearn,
Belinda King, Debbie Lockhart, Carl Meyer, Casey Miller, Ekkehard Sachs,
Jeff Scroggs, Joseph Skudlarek, Mike Tocci, Gordon Wade, Homer Walker,
Steve Wright, Zhaqing Xue, Yue Zhang, and an anonymous reviewer for their
contributions and encouragement.
Most importantly, I thank Chung-Wei Ng and my parents for over one
hundred and ten years of patience and support.

C. T. Kelley
Raleigh, North Carolina
January, 1998








How to get the software




A collection of MATLAB codes has been written to accompany this book. The
MATLAB codes can be obtained by anonymous ftp from the MathWorks server
ftp.mathworks.com in the directory pub/books/kelley, from the MathWorks
World Wide Web site,
http://www.mathworks.com
or from SIAM's World Wide Web site
http://www.siam.org/books/kelley/kelley.html
One can obtain MATLAB from
The MathWorks, Inc.
24 Prime Park Way
Natick, MA 01760,
Phone: (508) 653-1415
Fax: (508) 653-2997
E-mail: info@mathworks.com
WWW: http://www.mathworks.com








Chapter 1

Basic Concepts and Stationary Iterative Methods




1.1. Review and notation
We begin by setting notation and reviewing some ideas from numerical linear
algebra that we expect the reader to be familiar with. An excellent reference
for the basic ideas of numerical linear algebra and direct methods for linear
equations is [184].
We will write linear equations as

(1.1)  Ax = b,

where A is a nonsingular N × N matrix, b ∈ R^N is given, and

x^* = A^{-1} b ∈ R^N

is to be found.
Throughout this chapter x will denote a potential solution and {x_k}_{k≥0} the
sequence of iterates. We will denote the ith component of a vector x by (x)_i
(note the parentheses) and the ith component of x_k by (x_k)_i. We will rarely
need to refer to individual components of vectors.
In this chapter ‖·‖ will denote a norm on R^N as well as the induced matrix
norm.
Definition 1.1.1. Let ‖·‖ be a norm on R^N. The induced matrix norm
of an N × N matrix A is defined by

‖A‖ = max_{‖x‖=1} ‖Ax‖.

Induced norms have the important property that

‖Ax‖ ≤ ‖A‖ ‖x‖.
Recall that the condition number of A relative to the norm ‖·‖ is

κ(A) = ‖A‖ ‖A^{-1}‖,

where κ(A) is understood to be infinite if A is singular. If ‖·‖ is the l^p norm

‖x‖_p = ( Σ_{i=1}^N |(x)_i|^p )^{1/p}


we will write the condition number as κ_p.
Most iterative methods terminate when the residual

r = b − Ax

is sufficiently small. One termination criterion is

(1.2)  ‖r_k‖ / ‖r_0‖ < τ,

which can be related to the error

e = x − x^*

in terms of the condition number.
Lemma 1.1.1. Let b, x, x_0 ∈ R^N. Let A be nonsingular and let x^* = A^{-1} b. Then

(1.3)  ‖e‖ / ‖e_0‖ ≤ κ(A) ‖r‖ / ‖r_0‖.

Proof. Since

r = b − Ax = −Ae

we have

‖e‖ = ‖A^{-1} A e‖ ≤ ‖A^{-1}‖ ‖Ae‖ = ‖A^{-1}‖ ‖r‖

and

‖r_0‖ = ‖A e_0‖ ≤ ‖A‖ ‖e_0‖.

Hence

‖e‖ / ‖e_0‖ ≤ (‖A^{-1}‖ ‖r‖) / (‖A‖^{-1} ‖r_0‖) = κ(A) ‖r‖ / ‖r_0‖,

as asserted.
The termination criterion (1.2) depends on the initial iterate and may result
in unnecessary work when the initial iterate is good and a poor result when the
initial iterate is far from the solution. For this reason we prefer to terminate
the iteration when
(1.4)  ‖r_k‖ / ‖b‖ < τ.

The two conditions (1.2) and (1.4) are the same when x0 = 0, which is a
common choice, particularly when the linear iteration is being used as part of
a nonlinear solver.




1.2. The Banach Lemma and approximate inverses
The most straightforward approach to an iterative solution of a linear system
is to rewrite (1.1) as a linear fixed-point iteration. One way to do this is to
write Ax = b as

(1.5)  x = (I − A)x + b,

and to define the Richardson iteration

(1.6)  x_{k+1} = (I − A)x_k + b.

We will discuss more general methods in which {x_k} is given by

(1.7)  x_{k+1} = M x_k + c.

In (1.7) M is an N × N matrix called the iteration matrix. Iterative methods of
this form are called stationary iterative methods because the transition from x_k
to x_{k+1} does not depend on the history of the iteration. The Krylov methods
discussed in Chapters 2 and 3 are not stationary iterative methods.
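As a concrete illustration (ours, not the book's), the Richardson iteration (1.6) with the stopping test (1.4) takes only a few lines of MATLAB; the function name richardson and the arguments tau and maxit are illustrative choices.

function [x, k] = richardson(A, b, x, tau, maxit)
% Richardson iteration (1.6): x_{k+1} = (I - A)*x_k + b = x_k + (b - A*x_k),
% terminated by the relative residual test (1.4).
for k = 1:maxit
    r = b - A*x;                   % current residual r_k = b - A*x_k
    if norm(r) < tau*norm(b)       % test (1.4)
        return
    end
    x = x + r;                     % Richardson update
end

Each step costs one matrix-vector product; as discussed in the next section, the iteration converges whenever ‖I − A‖ < 1.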
All our results are based on the following lemma.
Lemma 1.2.1. If M is an N × N matrix with ‖M‖ < 1 then I − M is
nonsingular and

(1.8)  ‖(I − M)^{-1}‖ ≤ 1 / (1 − ‖M‖).
Proof. We will show that I − M is nonsingular and that (1.8) holds by
showing that the series Σ_{l=0}^∞ M^l converges to (I − M)^{-1}.
The partial sums

S_k = Σ_{l=0}^k M^l

form a Cauchy sequence in R^{N×N}. To see this note that for all m > k

‖S_k − S_m‖ ≤ Σ_{l=k+1}^m ‖M^l‖.

Now, ‖M^l‖ ≤ ‖M‖^l because ‖·‖ is a matrix norm that is induced by a vector
norm. Hence

‖S_k − S_m‖ ≤ Σ_{l=k+1}^m ‖M‖^l = ‖M‖^{k+1} (1 − ‖M‖^{m−k}) / (1 − ‖M‖) → 0

as m, k → ∞. Hence the sequence S_k converges, say to S. Since M S_k + I =
S_{k+1}, we must have M S + I = S and hence (I − M)S = I. This proves that
I − M is nonsingular and that S = (I − M)^{-1}.




Noting that

‖(I − M)^{-1}‖ ≤ Σ_{l=0}^∞ ‖M‖^l = (1 − ‖M‖)^{-1}

proves (1.8) and completes the proof.
The following corollary is a direct consequence of Lemma 1.2.1.
Corollary 1.2.1. If ‖M‖ < 1 then the iteration (1.7) converges to
x = (I − M)^{-1} c for all initial iterates x_0.
A consequence of Corollary 1.2.1 is that Richardson iteration (1.6) will
converge if ‖I − A‖ < 1. It is sometimes possible to precondition a linear
equation by multiplying both sides of (1.1) by a matrix B

BAx = Bb

so that convergence of iterative methods is improved. In the context of
Richardson iteration, the matrices B that allow us to apply the Banach lemma
and its corollary are called approximate inverses.
Definition 1.2.1. B is an approximate inverse of A if ‖I − BA‖ < 1.
The following theorem is often referred to as the Banach Lemma.
Theorem 1.2.1. If A and B are N × N matrices and B is an approximate
inverse of A, then A and B are both nonsingular and

(1.9)  ‖A^{-1}‖ ≤ ‖B‖ / (1 − ‖I − BA‖),    ‖B^{-1}‖ ≤ ‖A‖ / (1 − ‖I − BA‖),

and

(1.10)  ‖A^{-1} − B‖ ≤ ‖B‖ ‖I − BA‖ / (1 − ‖I − BA‖),    ‖A − B^{-1}‖ ≤ ‖A‖ ‖I − BA‖ / (1 − ‖I − BA‖).
Proof. Let M = I − BA. By Lemma 1.2.1, I − M = I − (I − BA) = BA is
nonsingular. Hence both A and B are nonsingular. By (1.8)

(1.11)  ‖A^{-1} B^{-1}‖ = ‖(I − M)^{-1}‖ ≤ 1 / (1 − ‖M‖) = 1 / (1 − ‖I − BA‖).

Since A^{-1} = (I − M)^{-1} B, inequality (1.11) implies the first part of (1.9). The
second part follows in a similar way from B^{-1} = A(I − M)^{-1}.
To complete the proof note that

A^{-1} − B = (I − BA)A^{-1},    A − B^{-1} = B^{-1}(BA − I),

and use (1.9).
Richardson iteration, preconditioned with approximate inversion, has the
form

(1.12)  x_{k+1} = (I − BA)x_k + Bb.

If the norm of I − BA is small, then not only will the iteration converge
rapidly, but, as Lemma 1.1.1 indicates, termination decisions based on the




preconditioned residual Bb − BAx will better reflect the actual error. This
method is a very effective technique for solving differential equations, integral
equations, and related problems [15], [6], [100], [117], [111]. Multigrid methods
[19], [99], [126] can also be interpreted in this light. We mention one other
approach, polynomial preconditioning, which tries to approximate A^{-1} by a
polynomial in A [123], [179], [169].
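A hedged sketch of the preconditioned Richardson iteration (1.12) in MATLAB: the approximate inverse B is supplied as a function handle Bfun (for instance a splitting-based sweep or a multigrid cycle); the names Bfun, tau, and maxit are our own and the handle syntax assumes a reasonably recent MATLAB.

function [x, k] = prec_richardson(Bfun, A, b, x, tau, maxit)
% Preconditioned Richardson iteration (1.12):
%   x_{k+1} = (I - B*A)*x_k + B*b = x_k + B*(b - A*x_k).
% Bfun(v) returns B*v, where B is an approximate inverse of A.
bnorm = norm(Bfun(b));             % scale for the preconditioned residual test
for k = 1:maxit
    z = Bfun(b - A*x);             % preconditioned residual B*(b - A*x_k)
    if norm(z) < tau*bnorm
        return
    end
    x = x + z;
end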

1.3. The spectral radius
The analysis in § 1.2 related convergence of the iteration (1.7) to the norm of
the matrix M. However the norm of M could be small in some norms and
quite large in others. Hence the performance of the iteration is not completely
described by ‖M‖. The concept of spectral radius allows us to make a complete
description.
We let σ(A) denote the set of eigenvalues of A.
Definition 1.3.1. The spectral radius of an N × N matrix A is

(1.13)  ρ(A) = max_{λ∈σ(A)} |λ| = lim_{n→∞} ‖A^n‖^{1/n}.


The term on the right-hand side of the second equality in (1.13) is the limit
used by the radical test for convergence of the series Σ_n A^n.
The spectral radius of M is independent of any particular matrix norm of
M. It is clear, in fact, that

(1.14)  ρ(A) ≤ ‖A‖

for any induced matrix norm. The inequality (1.14) has a partial converse that
allows us to completely describe the performance of iteration (1.7) in terms of
spectral radius. We state that converse as a theorem and refer to [105] for a
proof.
Theorem 1.3.1. Let A be an N × N matrix. Then for any ε > 0 there is
a norm ‖·‖ on R^N such that

ρ(A) > ‖A‖ − ε.

A consequence of Theorem 1.3.1, Lemma 1.2.1, and Exercise 1.5.1 is a
characterization of convergent stationary iterative methods. The proof is left
as an exercise.
Theorem 1.3.2. Let M be an N × N matrix. The iteration (1.7) converges
for all c ∈ R^N if and only if ρ(M) < 1.
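As a quick numerical illustration (ours, not from the text), the spectral radius of an iteration matrix can be computed directly and compared with 1; here M is the Jacobi iteration matrix of § 1.4 for a small test matrix.

A = [2 -1 0; -1 2 -1; 0 -1 2];     % small spd test matrix
D = diag(diag(A));
M = eye(3) - D\A;                  % Jacobi iteration matrix M = I - D^{-1}*A
rho = max(abs(eig(M)))             % spectral radius; here about 0.71 < 1,
                                   % so (1.7) with this M converges for every c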

1.4. Matrix splittings and classical stationary iterative methods
There are ways to convert Ax = b to a linear fixed-point iteration that are
different from (1.5). Methods such as Jacobi, Gauss–Seidel, and successive
overrelaxation (SOR) iteration are based on splittings of A of the form

A = A_1 + A_2,




where A_1 is a nonsingular matrix constructed so that equations with A_1 as
coefficient matrix are easy to solve. Then Ax = b is converted to the fixed-
point problem

x = A_1^{-1}(b − A_2 x).

The analysis of the method is based on an estimation of the spectral radius of
the iteration matrix M = −A_1^{-1} A_2.
For a detailed description of the classical stationary iterative methods the
reader may consult [89], [105], [144], [193], or [200]. These methods are usually
less efficient than the Krylov methods discussed in Chapters 2 and 3 or the more
modern stationary methods based on multigrid ideas. However the classical
methods have a role as preconditioners. The limited description in this section
is intended as a review that will set some notation to be used later.
As a first example we consider the Jacobi iteration that uses the splitting

A_1 = D,    A_2 = L + U,

where D is the diagonal of A and L and U are the (strict) lower and upper
triangular parts. This leads to the iteration matrix

M_JAC = −D^{-1}(L + U).

Letting (x_k)_i denote the ith component of the kth iterate we can express Jacobi
iteration concretely as

(1.15)  (x_{k+1})_i = a_{ii}^{-1} ( b_i − Σ_{j≠i} a_{ij} (x_k)_j ).

Note that A_1 is diagonal and hence trivial to invert.
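A minimal MATLAB sketch of the Jacobi iteration (1.15) (illustrative only; the function name and stopping parameters are ours):

function [x, k] = jacobi(A, b, x, tau, maxit)
% Jacobi iteration (1.15): x_{k+1} = D^{-1}*(b - (L+U)*x_k),
% using the splitting A = D + (L + U).
D = diag(diag(A));                 % diagonal part of A
R = A - D;                         % off-diagonal part L + U
for k = 1:maxit
    x = D \ (b - R*x);             % componentwise this is exactly (1.15)
    if norm(b - A*x) < tau*norm(b)
        return
    end
end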
We present only one convergence result for the classical stationary iterative
methods.
Theorem 1.4.1. Let A be an N × N matrix and assume that for all
1 ≤ i ≤ N

(1.16)  0 < Σ_{j≠i} |a_{ij}| < |a_{ii}|.

Then A is nonsingular and the Jacobi iteration (1.15) converges to x^* = A^{-1} b
for all b.
Proof. Note that the ith row sum of M = M_JAC satisfies

Σ_{j=1}^N |m_{ij}| = ( Σ_{j≠i} |a_{ij}| ) / |a_{ii}| < 1.

Hence ‖M_JAC‖_∞ < 1 and the iteration converges to the unique solution of
x = M x + D^{-1} b. Also I − M = D^{-1} A is nonsingular and therefore A is
nonsingular.




Gauss–Seidel iteration overwrites the approximate solution with the new
value as soon as it is computed. This results in the iteration

(x_{k+1})_i = a_{ii}^{-1} ( b_i − Σ_{j<i} a_{ij} (x_{k+1})_j − Σ_{j>i} a_{ij} (x_k)_j ),

the splitting

A_1 = D + L,    A_2 = U,

and iteration matrix

M_GS = −(D + L)^{-1} U.

Note that A_1 is lower triangular, and hence A_1^{-1} y is easy to compute for vectors
y. Note also that, unlike Jacobi iteration, the iteration depends on the ordering
of the unknowns. Backward Gauss–Seidel begins the update of x with the Nth
coordinate rather than the first, resulting in the splitting

A_1 = D + U,    A_2 = L,

and iteration matrix

M_BGS = −(D + U)^{-1} L.

A symmetric Gauss–Seidel iteration is a forward Gauss–Seidel iteration
followed by a backward Gauss–Seidel iteration. This leads to the iteration
matrix

M_SGS = M_BGS M_GS = (D + U)^{-1} L (D + L)^{-1} U.

If A is symmetric then U = L^T. In that event

M_SGS = (D + U)^{-1} L (D + L)^{-1} U = (D + L^T)^{-1} L (D + L)^{-1} L^T.
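A forward Gauss–Seidel sweep can be written with a single lower triangular solve; the following MATLAB sketch (ours, with illustrative names) uses the splitting A_1 = D + L, A_2 = U.

function [x, k] = gauss_seidel(A, b, x, tau, maxit)
% Forward Gauss-Seidel: each sweep solves (D + L)*x_{k+1} = b - U*x_k.
DL = tril(A);                      % D + L
U  = triu(A, 1);                   % strict upper triangle
for k = 1:maxit
    x = DL \ (b - U*x);            % one forward sweep
    if norm(b - A*x) < tau*norm(b)
        return
    end
end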

From the point of view of preconditioning, one wants to write the stationary
method as a preconditioned Richardson iteration. That means that one wants
to find B such that M = I − BA and then use B as an approximate inverse.
For the Jacobi iteration,

(1.17)  B_JAC = D^{-1}.

For symmetric Gauss–Seidel

(1.18)  B_SGS = (D + L^T)^{-1} D (D + L)^{-1}.
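B_SGS never needs to be formed explicitly; it can be applied to a vector with two triangular solves. A sketch (ours), assuming A is symmetric so that U = L^T:

function z = apply_bsgs(A, r)
% Apply the symmetric Gauss-Seidel approximate inverse (1.18),
%   z = (D + L^T)^{-1} * D * (D + L)^{-1} * r,
% with two triangular solves; assumes A is symmetric (U = L^T).
D  = diag(diag(A));
DL = tril(A);                      % D + L
z  = DL \ r;                       % (D + L)^{-1} r
z  = DL' \ (D * z);                % (D + L^T)^{-1} D (D + L)^{-1} r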

The successive overrelaxation iteration modifies Gauss–Seidel by adding a
relaxation parameter ω to construct an iteration with iteration matrix

M_SOR = (D + ωL)^{-1} ((1 − ω)D − ωU).

The performance can be dramatically improved with a good choice of ω but
still is not competitive with Krylov methods. A further disadvantage is that
the choice of ω is often difficult to make. References [200], [89], [193], [8], and
the papers cited therein provide additional reading on this topic.
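For completeness, one SOR sweep in MATLAB (our illustration; the relaxation parameter omega must be supplied by the user):

function [x, k] = sor(A, b, x, omega, tau, maxit)
% SOR iteration: (D + omega*L)*x_{k+1} = ((1-omega)*D - omega*U)*x_k + omega*b,
% whose iteration matrix is M_SOR above.
D = diag(diag(A));
L = tril(A, -1);
U = triu(A, 1);
for k = 1:maxit
    x = (D + omega*L) \ (((1-omega)*D - omega*U)*x + omega*b);
    if norm(b - A*x) < tau*norm(b)
        return
    end
end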




1.5. Exercises on stationary iterative methods
1.5.1. Show that if ρ(M ) ≥ 1 then there are x0 and c such that the iteration
(1.7) fails to converge.

1.5.2. Prove Theorem 1.3.2.

1.5.3. Verify equality (1.18).

1.5.4. Show that if A is symmetric and positive definite (that is, A^T = A and
x^T Ax > 0 for all x ≠ 0) then B_SGS is also symmetric and positive
definite.








Chapter 2

Conjugate Gradient Iteration




2.1. Krylov methods and the minimization property
In the following two chapters we describe some of the Krylov space methods
for linear equations. Unlike the stationary iterative methods, Krylov methods
do not have an iteration matrix. The two such methods that we'll discuss in
depth, conjugate gradient and GMRES, minimize, at the kth iteration, some
measure of error over the affine space

x_0 + K_k,

where x_0 is the initial iterate and the kth Krylov subspace K_k is

K_k = span(r_0, Ar_0, . . . , A^{k-1} r_0)

for k ≥ 1.
The residual is

r = b − Ax.

So {r_k}_{k≥0} will denote the sequence of residuals

r_k = b − Ax_k.

As in Chapter 1, we assume that A is a nonsingular N × N matrix and let

x^* = A^{-1} b.

There are other Krylov methods that are not as well understood as CG or
GMRES. Brief descriptions of several of these methods and their properties
are in § 3.6, [12], and [78].
The conjugate gradient (CG) iteration was invented in the 1950s [103] as a
direct method. It has come into wide use over the last 15 years as an iterative
method and has generally superseded the Jacobi–Gauss–Seidel–SOR family of
methods.
CG is intended to solve symmetric positive definite (spd) systems. Recall
that A is symmetric if A = A^T and positive definite if

x^T Ax > 0 for all x ≠ 0.

In this section we assume that A is spd. Since A is spd we may define a norm
(you should check that this is a norm) by

(2.1)  ‖x‖_A = √(x^T A x).

‖·‖_A is called the A-norm. The development in these notes is different from
the classical work and more like the analysis for GMRES and CGNR in [134].
In this section, and in the section on GMRES that follows, we begin with a
description of what the algorithm does and the consequences of the minimiza-
tion property of the iterates. After that we describe termination criteria,
performance, preconditioning, and at the very end, the implementation.
The kth iterate x_k of CG minimizes

(2.2)  φ(x) = (1/2) x^T A x − x^T b

over x_0 + K_k.
Note that if φ(x̃) is the minimal value (in R^N) then

∇φ(x̃) = A x̃ − b = 0

and hence x̃ = x^*.
Minimizing φ over any subset of R^N is the same as minimizing ‖x − x^*‖_A
over that subset. We state this as a lemma.
Lemma 2.1.1. Let S ⊂ R^N. If x_k minimizes φ over S then x_k also
minimizes ‖x^* − x‖_A = ‖r‖_{A^{-1}} over S.
Proof. Note that

‖x − x^*‖_A^2 = (x − x^*)^T A (x − x^*) = x^T A x − x^T A x^* − (x^*)^T A x + (x^*)^T A x^*.

Since A is symmetric and A x^* = b

−x^T A x^* − (x^*)^T A x = −2 x^T A x^* = −2 x^T b.

Therefore

‖x − x^*‖_A^2 = 2φ(x) + (x^*)^T A x^*.

Since (x^*)^T A x^* is independent of x, minimizing φ is equivalent to minimizing
‖x − x^*‖_A^2 and hence to minimizing ‖x − x^*‖_A.
If e = x − x^* then

‖e‖_A^2 = e^T A e = (A(x − x^*))^T A^{-1} (A(x − x^*)) = ‖b − Ax‖_{A^{-1}}^2

and hence the A-norm of the error is also the A^{-1}-norm of the residual.
We will use this lemma in the particular case that S = x_0 + K_k for some k.




2.2. Consequences of the minimization property
Lemma 2.1.1 implies that since x_k minimizes φ over x_0 + K_k

(2.3)  ‖x^* − x_k‖_A ≤ ‖x^* − w‖_A

for all w ∈ x_0 + K_k. Since any w ∈ x_0 + K_k can be written as

w = Σ_{j=0}^{k-1} γ_j A^j r_0 + x_0

for some coefficients {γ_j}, we can express x^* − w as

x^* − w = x^* − x_0 − Σ_{j=0}^{k-1} γ_j A^j r_0.

Since A x^* = b we have

r_0 = b − A x_0 = A(x^* − x_0)

and therefore

x^* − w = x^* − x_0 − Σ_{j=0}^{k-1} γ_j A^{j+1} (x^* − x_0) = p(A)(x^* − x_0),

where the polynomial

p(z) = 1 − Σ_{j=0}^{k-1} γ_j z^{j+1}

has degree k and satisfies p(0) = 1. Hence

(2.4)  ‖x^* − x_k‖_A = min_{p∈P_k, p(0)=1} ‖p(A)(x^* − x_0)‖_A.

In (2.4) P_k denotes the set of polynomials of degree k.
The spectral theorem for spd matrices asserts that

A = U Λ U^T,

where U is an orthogonal matrix whose columns are the eigenvectors of A and
Λ is a diagonal matrix with the positive eigenvalues of A on the diagonal. Since
U U^T = U^T U = I by orthogonality of U, we have

A^j = U Λ^j U^T.

Hence

p(A) = U p(Λ) U^T.

Define A^{1/2} = U Λ^{1/2} U^T and note that

(2.5)  ‖x‖_A^2 = x^T A x = ‖A^{1/2} x‖_2^2.





Hence, for any x ∈ R^N and any polynomial p,

‖p(A)x‖_A = ‖A^{1/2} p(A) x‖_2 ≤ ‖p(A)‖_2 ‖A^{1/2} x‖_2 = ‖p(A)‖_2 ‖x‖_A.

This, together with (2.4), implies that

(2.6)  ‖x_k − x^*‖_A ≤ ‖x_0 − x^*‖_A min_{p∈P_k, p(0)=1} max_{z∈σ(A)} |p(z)|.
Here σ(A) is the set of all eigenvalues of A.
The following corollary is an important consequence of (2.6).
Corollary 2.2.1. Let A be spd and let {x_k} be the CG iterates. Let k be
given and let p̄_k be any kth degree polynomial such that p̄_k(0) = 1. Then

(2.7)  ‖x_k − x^*‖_A / ‖x_0 − x^*‖_A ≤ max_{z∈σ(A)} |p̄_k(z)|.

We will refer to the polynomial p̄_k as a residual polynomial [185].
Definition 2.2.1. The set of kth degree residual polynomials is

(2.8)  P_k = {p | p is a polynomial of degree k and p(0) = 1}.

In specific contexts we try to construct sequences of residual polynomials,
based on information on σ(A), that make either the middle or the right term
in (2.7) easy to evaluate. This leads to an upper estimate for the number of
CG iterations required to reduce the A-norm of the error to a given tolerance.
One simple application of (2.7) is to show how the CG algorithm can be
viewed as a direct method.
Theorem 2.2.1. Let A be spd. Then the CG algorithm will find the
solution within N iterations.
Proof. Let {λ_i}_{i=1}^N be the eigenvalues of A. As a test polynomial, let

p̄(z) = Π_{i=1}^N (λ_i − z)/λ_i.

p̄ ∈ P_N because p̄ has degree N and p̄(0) = 1. Hence, by (2.7) and the fact
that p̄ vanishes on σ(A),

‖x_N − x^*‖_A ≤ ‖x_0 − x^*‖_A max_{z∈σ(A)} |p̄(z)| = 0.

Note that our test polynomial had the eigenvalues of A as its roots. In
that way we showed (in the absence of all roundoff error!) that CG terminated
in finitely many iterations with the exact solution. This is not as good as it
sounds, since in most applications the number of unknowns N is very large,
and one cannot afford to perform N iterations. It is best to regard CG as an
iterative method. When doing that we seek to terminate the iteration when
some specified error tolerance is reached.




In the two examples that follow we look at some other easy consequences
of (2.7).
Theorem 2.2.2. Let A be spd with eigenvectors {u_i}_{i=1}^N. Let b be a linear
combination of k of the eigenvectors of A

b = Σ_{l=1}^k γ_l u_{i_l}.

Then the CG iteration for Ax = b with x_0 = 0 will terminate in at most k
iterations.
Proof. Let {λ_{i_l}} be the eigenvalues of A associated with the eigenvectors
{u_{i_l}}_{l=1}^k. By the spectral theorem

x^* = Σ_{l=1}^k (γ_l / λ_{i_l}) u_{i_l}.

We use the residual polynomial

p̄(z) = Π_{l=1}^k (λ_{i_l} − z)/λ_{i_l}.

One can easily verify that p̄ ∈ P_k. Moreover, p̄(λ_{i_l}) = 0 for 1 ≤ l ≤ k and
hence

p̄(A)x^* = Σ_{l=1}^k p̄(λ_{i_l}) γ_l / λ_{i_l} u_{i_l} = 0.

So, we have by (2.4) and the fact that x_0 = 0 that

‖x_k − x^*‖_A ≤ ‖p̄(A)x^*‖_A = 0.

This completes the proof.
If the spectrum of A has fewer than N points, we can use a similar technique
to prove the following theorem.
Theorem 2.2.3. Let A be spd. Assume that there are exactly k ≤ N
distinct eigenvalues of A. Then the CG iteration terminates in at most k
iterations.
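Theorem 2.2.3 is easy to observe numerically. The following MATLAB fragment (ours; it uses the built-in pcg, which appeared in MATLAB releases later than the ones used in this book) builds an spd matrix with three distinct eigenvalues and runs CG on it.

N = 90;
d = repmat([1; 4; 9], N/3, 1);
A = diag(d);                       % spd with exactly three distinct eigenvalues
b = randn(N, 1);
[x, flag, relres, iter] = pcg(A, b, 1e-10, N);
% iter is at most 3, up to the effects of rounding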

2.3. Termination of the iteration
In practice we do not run the CG iteration until an exact solution is found, but
rather terminate once some criterion has been satisfied. One typical criterion is
small (say ≤ η) relative residuals. This means that we terminate the iteration
after

(2.9)  ‖b − Ax_k‖_2 ≤ η ‖b‖_2.

The error estimates that come from the minimization property, however, are
based on (2.7) and therefore estimate the reduction in the relative A-norm of
the error.




Our next task is to relate the relative residual in the Euclidean norm to
the relative error in the A-norm. We will do this in the next two lemmas and
then illustrate the point with an example.
Lemma 2.3.1. Let A be spd with eigenvalues λ_1 ≥ λ_2 ≥ . . . ≥ λ_N. Then for
all z ∈ R^N,

(2.10)  ‖A^{1/2} z‖_2 = ‖z‖_A

and

(2.11)  λ_N^{1/2} ‖z‖_A ≤ ‖Az‖_2 ≤ λ_1^{1/2} ‖z‖_A.

Proof. Clearly

‖z‖_A^2 = z^T A z = (A^{1/2} z)^T (A^{1/2} z) = ‖A^{1/2} z‖_2^2

which proves (2.10).
Let u_i be a unit eigenvector corresponding to λ_i. We may write A = U Λ U^T
as

Az = Σ_{i=1}^N λ_i (u_i^T z) u_i.

Hence

λ_N ‖A^{1/2} z‖_2^2 = λ_N Σ_{i=1}^N λ_i (u_i^T z)^2
  ≤ ‖Az‖_2^2 = Σ_{i=1}^N λ_i^2 (u_i^T z)^2
  ≤ λ_1 Σ_{i=1}^N λ_i (u_i^T z)^2 = λ_1 ‖A^{1/2} z‖_2^2.

Taking square roots and using (2.10) complete the proof.
Lemma 2.3.2.

(2.12)  (‖b‖_2 / ‖r_0‖_2) (‖b − Ax_k‖_2 / ‖b‖_2) = ‖b − Ax_k‖_2 / ‖b − Ax_0‖_2 ≤ √κ_2(A) ‖x_k − x^*‖_A / ‖x^* − x_0‖_A

and

(2.13)  ‖b − Ax_k‖_2 / ‖b‖_2 ≤ √κ_2(A) (‖r_0‖_2 / ‖b‖_2) ‖x_k − x^*‖_A / ‖x^* − x_0‖_A.

Proof. The equality on the left of (2.12) is clear and (2.13) follows directly
from (2.12). To obtain the inequality on the right of (2.12), first recall that if
A = U Λ U^T is the spectral decomposition of A and we order the eigenvalues
such that λ_1 ≥ λ_2 ≥ . . . ≥ λ_N > 0, then ‖A‖_2 = λ_1 and ‖A^{-1}‖_2 = 1/λ_N. So
κ_2(A) = λ_1/λ_N.
Therefore, using (2.10) and (2.11) twice,

‖b − Ax_k‖_2 / ‖b − Ax_0‖_2 = ‖A(x^* − x_k)‖_2 / ‖A(x^* − x_0)‖_2 ≤ (λ_1^{1/2} / λ_N^{1/2}) ‖x^* − x_k‖_A / ‖x^* − x_0‖_A

as asserted.
So, to predict the performance of the CG iteration based on termination on
small relative residuals, we must not only use (2.7) to predict when the relative




A-norm error is small, but also use Lemma 2.3.2 to relate small A-norm errors
to small relative residuals.
We consider a very simple example. Assume that x_0 = 0 and that the
eigenvalues of A are contained in the interval (9, 11). If we let p̄_k(z) =
(10 − z)^k / 10^k, then p̄_k ∈ P_k. This means that we may apply (2.7) to get

‖x_k − x^*‖_A ≤ ‖x^*‖_A max_{9≤z≤11} |p̄_k(z)|.

It is easy to see that

max_{9≤z≤11} |p̄_k(z)| = 10^{-k}.

Hence, after k iterations

(2.14)  ‖x_k − x^*‖_A ≤ ‖x^*‖_A 10^{-k}.

So, the size of the A-norm of the error will be reduced by a factor of 10^{-3} when

10^{-k} ≤ 10^{-3},

that is, when

k ≥ 3.

To use Lemma 2.3.2 we simply note that κ_2(A) ≤ 11/9. Hence, after k
iterations we have

‖Ax_k − b‖_2 / ‖b‖_2 ≤ √11 × 10^{-k} / 3.

So, the size of the relative residual will be reduced by a factor of 10^{-3} when

10^{-k} ≤ 3 × 10^{-3} / √11,

that is, when

k ≥ 4.
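These two estimates are easy to test numerically. The fragment below (ours; it uses MATLAB's built-in pcg, whose default initial iterate is x_0 = 0) builds a random spd matrix with eigenvalues in (9, 11) and asks for a 10^{-3} reduction in the relative residual.

N = 200;
[Q, R] = qr(randn(N));             % random orthogonal Q
A = Q * diag(9 + 2*rand(N,1)) * Q';  % spd with eigenvalues in (9, 11)
b = randn(N, 1);
[x, flag, relres, iter] = pcg(A, b, 1e-3, 20);
% iter is typically 3 or 4, consistent with the estimates above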
One can obtain a more precise estimate by using a polynomial other than
p̄_k in the upper estimate for the right-hand side of (2.7). Note that it is always
the case that the spectrum of a spd matrix is contained in the interval [λ_N, λ_1]
and that κ_2(A) = λ_1/λ_N. A result from [48] (see also [45]) that is, in one
sense, the sharpest possible, is

(2.15)  ‖x_k − x^*‖_A ≤ 2 ‖x_0 − x^*‖_A [ (√κ_2(A) − 1) / (√κ_2(A) + 1) ]^k.

In the case of the above example, we can estimate κ_2(A) by κ_2(A) ≤ 11/9.
Hence, since (√x − 1)/(√x + 1) is an increasing function of x on the interval
(1, ∞),

(√κ_2(A) − 1) / (√κ_2(A) + 1) ≤ (√11 − 3) / (√11 + 3) ≈ .05.




Therefore (2.15) would predict a reduction in the size of the A-norm error by
a factor of 10^{-3} when

2 × .05^k < 10^{-3}

or when

k > −log_10(2000) / log_10(.05) ≈ 3.3/1.3 ≈ 2.6,

which also predicts termination within three iterations.
We may have more precise information than a single interval containing
σ(A). When we do, the estimate in (2.15) can be very pessimistic. If the
eigenvalues cluster in a small number of intervals, the condition number can
be quite large, but CG can perform very well. We will illustrate this with an
example. Exercise 2.8.5 also covers this point.
Assume that x_0 = 0 and the eigenvalues of A lie in the two intervals (1, 1.5)
and (399, 400). Based on this information the best estimate of the condition
number of A is κ_2(A) ≤ 400, which, when inserted into (2.15), gives

‖x_k − x^*‖_A / ‖x^*‖_A ≤ 2 × (19/21)^k ≈ 2 × (.91)^k.

This would indicate fairly slow convergence. However, if we use as a residual
polynomial p̄_{3k} ∈ P_{3k}

p̄_{3k}(z) = (1.25 − z)^k (400 − z)^{2k} / ((1.25)^k × 400^{2k}),

it is easy to see that

max_{z∈σ(A)} |p̄_{3k}(z)| ≤ (.25/1.25)^k = (.2)^k,

which is a sharper estimate on convergence. In fact, (2.15) would predict that

‖x_k − x^*‖_A ≤ 10^{-3} ‖x^*‖_A

when 2 × (.91)^k < 10^{-3} or when

k > −log_10(2000) / log_10(.91) ≈ 3.3/.04 = 82.5.

The estimate based on the clustering gives convergence in 3k iterations when

(.2)^k ≤ 10^{-3}

or when

k > −3 / log_10(.2) = 4.3.

Hence (2.15) predicts 83 iterations and the clustering analysis 15 (the smallest
integer multiple of 3 larger than 3 × 4.3 = 12.9).
From the results above one can see that if the condition number of A is near
one, the CG iteration will converge very rapidly. Even if the condition number




is large, the iteration will perform well if the eigenvalues are clustered in a few
small intervals. The transformation of the problem into one with eigenvalues
clustered near one (i.e., easier to solve) is called preconditioning. We used
this term before in the context of Richardson iteration and accomplished the
goal by multiplying A by an approximate inverse. In the context of CG, such
a simple approach can destroy the symmetry of the coefficient matrix and a
more subtle implementation is required. We discuss this in § 2.5.

2.4. Implementation
The implementation of CG depends on the amazing fact that once x_k has been
determined, either x_k = x^* or a search direction p_{k+1} ≠ 0 can be found very
cheaply so that x_{k+1} = x_k + α_{k+1} p_{k+1} for some scalar α_{k+1}. Once p_{k+1} has
been found, α_{k+1} is easy to compute from the minimization property of the
iteration. In fact

(2.16)  dφ(x_k + α p_{k+1}) / dα = 0

for the correct choice of α = α_{k+1}. Equation (2.16) can be written as

p_{k+1}^T A x_k + α p_{k+1}^T A p_{k+1} − p_{k+1}^T b = 0,

leading to

(2.17)  α_{k+1} = p_{k+1}^T (b − A x_k) / (p_{k+1}^T A p_{k+1}) = p_{k+1}^T r_k / (p_{k+1}^T A p_{k+1}).
If x_k = x_{k+1} then the above analysis implies that α = 0. We show that
this only happens if x_k is the solution.
Lemma 2.4.1. Let A be spd and let {x_k} be the conjugate gradient iterates.
Then

(2.18)  r_k^T r_l = 0 for all 0 ≤ l < k.

Proof. Since x_k minimizes φ on x_0 + K_k, we have, for any ξ ∈ K_k,

dφ(x_k + tξ)/dt = ∇φ(x_k + tξ)^T ξ = 0

at t = 0. Recalling that

∇φ(x) = Ax − b = −r

we have

(2.19)  ∇φ(x_k)^T ξ = −r_k^T ξ = 0 for all ξ ∈ K_k.

Since r_l ∈ K_k for all l < k (see Exercise 2.8.1), this proves (2.18).
Now, if x_k = x_{k+1}, then r_k = r_{k+1}. Lemma 2.4.1 then implies that
‖r_k‖_2^2 = r_k^T r_k = r_k^T r_{k+1} = 0 and hence x_k = x^*.
The next lemma characterizes the search direction and, as a side effect,
proves that (if we define p_0 = 0) p_l^T r_k = 0 for all 0 ≤ l < k ≤ n, unless the
iteration terminates prematurely.




Lemma 2.4.2. Let A be spd and let {x_k} be the conjugate gradient iterates.
If x_k ≠ x^* then x_{k+1} = x_k + α_{k+1} p_{k+1} and p_{k+1} is determined up to a scalar
multiple by the conditions

(2.20)  p_{k+1} ∈ K_{k+1},    p_{k+1}^T A ξ = 0 for all ξ ∈ K_k.

Proof. Since K_k ⊂ K_{k+1},

(2.21)  ∇φ(x_{k+1})^T ξ = (A x_k + α_{k+1} A p_{k+1} − b)^T ξ = 0

for all ξ ∈ K_k. (2.19) and (2.21) then imply that for all ξ ∈ K_k,

(2.22)  α_{k+1} p_{k+1}^T A ξ = −(A x_k − b)^T ξ = −∇φ(x_k)^T ξ = 0.

This uniquely specifies the direction of p_{k+1} as (2.22) implies that p_{k+1} ∈ K_{k+1}
is A-orthogonal (i.e., in the scalar product (x, y) = x^T A y) to K_k, a subspace
of dimension one less than K_{k+1}.
The condition p_{k+1}^T A ξ = 0 is called A-conjugacy of p_{k+1} to K_k. Now, any
p_{k+1} satisfying (2.20) can, up to a scalar multiple, be expressed as

p_{k+1} = r_k + w_k

with w_k ∈ K_k. While one might think that w_k would be hard to compute, it
is, in fact, trivial. We have the following theorem.
Theorem 2.4.1. Let A be spd and assume that r_k ≠ 0. Define p_0 = 0.
Then

(2.23)  p_{k+1} = r_k + β_{k+1} p_k for some β_{k+1} and k ≥ 0.

Proof. By Lemma 2.4.2 and the fact that K_k = span(r_0, . . . , r_{k-1}), we need
only verify that a β_{k+1} can be found so that if p_{k+1} is given by (2.23) then

p_{k+1}^T A r_l = 0

for all 0 ≤ l ≤ k − 1.
Let p_{k+1} be given by (2.23). Then for any l ≤ k

p_{k+1}^T A r_l = r_k^T A r_l + β_{k+1} p_k^T A r_l.

If l ≤ k − 2, then r_l ∈ K_{l+1} ⊂ K_{k-1}. Lemma 2.4.2 then implies that

p_{k+1}^T A r_l = 0 for 0 ≤ l ≤ k − 2.

It only remains to solve for β_{k+1} so that p_{k+1}^T A r_{k-1} = 0. Trivially

(2.24)  β_{k+1} = −r_k^T A r_{k-1} / (p_k^T A r_{k-1})

provided p_k^T A r_{k-1} ≠ 0. Since

r_k = r_{k-1} − α_k A p_k



we have

r_k^T r_{k-1} = ‖r_{k-1}‖_2^2 − α_k p_k^T A r_{k-1}.

Since r_k^T r_{k-1} = 0 by Lemma 2.4.1 we have

(2.25)  p_k^T A r_{k-1} = ‖r_{k-1}‖_2^2 / α_k ≠ 0.

This completes the proof.
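Collecting the search direction recursion (2.23) with the formulas (2.17) and (2.24), or with the simplified forms (2.26) and (2.27) derived in the next lemma, gives the familiar CG loop. The MATLAB sketch below is our own condensation for illustration (no preconditioning, no safeguards, illustrative names); it is not the book's own algorithm listing.

function [x, k] = cg_sketch(A, b, x, eta, maxit)
% Bare-bones conjugate gradient for spd A, stopped by (2.9).
r = b - A*x;
rho = r'*r;
p = zeros(size(b));                % p_0 = 0
for k = 1:maxit
    if sqrt(rho) <= eta*norm(b)    % termination test (2.9)
        return
    end
    if k == 1
        p = r;
    else
        beta = rho/rho_old;        % (2.27)
        p = r + beta*p;            % (2.23)
    end
    w = A*p;
    alpha = rho/(p'*w);            % (2.26)
    x = x + alpha*p;
    r = r - alpha*w;
    rho_old = rho;
    rho = r'*r;
end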
The common implementation of conjugate gradient uses a different form
for α_k and β_k than given in (2.17) and (2.24).
Lemma 2.4.3. Let A be spd and assume that r_k ≠ 0. Then

(2.26)  α_{k+1} = ‖r_k‖_2^2 / (p_{k+1}^T A p_{k+1})

and

(2.27)  β_{k+1} = ‖r_k‖_2^2 / ‖r_{k-1}‖_2^2.

Proof. Note that for k ≥ 0

(2.28)  p_{k+1}^T r_{k+1} = r_k^T r_{k+1} + β_{k+1} p_k^T r_{k+1} = 0

by Lemma 2.4.2. An immediate consequence of (2.28) is that p_k^T r_k = 0 and
hence

(2.29)  p_{k+1}^T r_k = (r_k + β_{k+1} p_k)^T r_k = ‖r_k‖_2^2.

Taking scalar products of both sides of

r_{k+1} = r_k − α_{k+1} A p_{k+1}

with p_{k+1} and using (2.29) gives

0 = p_{k+1}^T r_k − α_{k+1} p_{k+1}^T A p_{k+1} = ‖r_k‖_2^2 − α_{k+1} p_{k+1}^T A p_{k+1},

which is equivalent to (2.26).
To get (2.27) note that p_{k+1}^T A p_k = 0 and hence (2.23) implies that

(2.30)  β_{k+1} = −r_k^T A p_k / (p_k^T A p_k).

Also note that

(2.31)  p_k^T A p_k = p_k^T A(r_{k-1} + β_k p_{k-1})
