To Polly H. Thomas, 1906-1994, devoted mother and grandmother


Buy this book from SIAM at http://www.ec-securehost.com/SIAM/FR16.html.

Copyright ©1995 by the Society for Industrial and Applied Mathematics. This electronic version is for personal use and may not be duplicated or distributed.


Contents

Preface xi

How to Get the Software xiii

CHAPTER 1. Basic Concepts and Stationary Iterative Methods 3

1.1 Review and notation . . . . . . . . . . . . . . . . . . . . . . . . 3

1.2 The Banach Lemma and approximate inverses . . . . . . . . . . 5

1.3 The spectral radius . . . . . . . . . . . . . . . . . . . . . . . . . 7

1.4 Matrix splittings and classical stationary iterative methods . . 7

1.5 Exercises on stationary iterative methods . . . . . . . . . . . . 10

CHAPTER 2. Conjugate Gradient Iteration 11

2.1 Krylov methods and the minimization property . . . . . . . . . 11

2.2 Consequences of the minimization property . . . . . . . . . . . 13

2.3 Termination of the iteration . . . . . . . . . . . . . . . . . . . . 15

2.4 Implementation . . . . . . . . . . . . . . . . . . . . . . . . . . . 19

2.5 Preconditioning . . . . . . . . . . . . . . . . . . . . . . . . . . . 22

2.6 CGNR and CGNE . . . . . . . . . . . . . . . . . . . . . . . . . 25

2.7 Examples for preconditioned conjugate iteration . . . . . . . . 26

2.8 Exercises on conjugate gradient . . . . . . . . . . . . . . . . . . 30

CHAPTER 3. GMRES Iteration 33

3.1 The minimization property and its consequences . . . . . . . . 33

3.2 Termination . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35

3.3 Preconditioning . . . . . . . . . . . . . . . . . . . . . . . . . . . 36

3.4 GMRES implementation: Basic ideas . . . . . . . . . . . . . . . 37

3.5 Implementation: Givens rotations . . . . . . . . . . . . . . . . . 43

3.6 Other methods for nonsymmetric systems . . . . . . . . . . . . 46

3.6.1 Bi-CG. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47

3.6.2 CGS. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48

3.6.3 Bi-CGSTAB. . . . . . . . . . . . . . . . . . . . . . . . . 50


3.6.4 TFQMR. . . . . . . . . . . . . . . . . . . . . . . . . . . 51

3.7 Examples for GMRES iteration . . . . . . . . . . . . . . . . . 54

3.8 Examples for CGNR, Bi-CGSTAB, and TFQMR iteration . . . 55

3.9 Exercises on GMRES . . . . . . . . . . . . . . . . . . . . . . . 60

CHAPTER 4. Basic Concepts and Fixed-Point Iteration 65

4.1 Types of convergence . . . . . . . . . . . . . . . . . . . . . . . . 65

4.2 Fixed-point iteration . . . . . . . . . . . . . . . . . . . . . . . . 66

4.3 The standard assumptions . . . . . . . . . . . . . . . . . . . . . 68

CHAPTER 5. Newton's Method 71

5.1 Local convergence of Newton's method . . . . . . . . . . . . . . 71

5.2 Termination of the iteration . . . . . . . . . . . . . . . . . . . . 72

5.3 Implementation of Newton's method . . . . . . . . . . . . . . . 73

5.4 Errors in the function and derivative . . . . . . . . . . . . . . . 75

5.4.1 The chord method. . . . . . . . . . . . . . . . . . . . . . 76

5.4.2 Approximate inversion of F'. . . . . . . . . . . . . . . . 77

5.4.3 The Shamanskii method. . . . . . . . . . . . . . . . . . 78

5.4.4 Difference approximation to F'. . . . . . . . . . . . . . . 79

5.4.5 The secant method. . . . . . . . . . . . . . . . . . . . . 82

5.5 The Kantorovich Theorem . . . . . . . . . . . . . . . . . . . . . 83

5.6 Examples for Newton's method . . . . . . . . . . . . . . . . . . 86

5.7 Exercises on Newton's method . . . . . . . . . . . . . . . . . . 91

CHAPTER 6. Inexact Newton Methods 95

6.1 The basic estimates . . . . . . . . . . . . . . . . . . . . . . . . . 95

6.1.1 Direct analysis. . . . . . . . . . . . . . . . . . . . . . . . 95

6.1.2 Weighted norm analysis. . . . . . . . . . . . . . . . . . . 97

6.1.3 Errors in the function. . . . . . . . . . . . . . . . . . . . 100

6.2 Newton-iterative methods . . . . . . . . . . . . . . . . . . . . . 100

6.2.1 Newton GMRES. . . . . . . . . . . . . . . . . . . . . . . 101

6.2.2 Other Newton-iterative methods. . . . . . . . . . . . . . 104

6.3 Newton-GMRES implementation . . . . . . . . . . . . . . . . . 104

6.4 Examples for Newton-GMRES . . . . . . . . . . . . . . . . . . 106

6.4.1 Chandrasekhar H-equation. . . . . . . . . . . . . . . . . 107

6.4.2 Convection-diffusion equation. . . . . . . . . . . . . . . 108

6.5 Exercises on inexact Newton methods . . . . . . . . . . . . . . 110

CHAPTER 7. Broyden's Method 113

7.1 The Dennis–Moré condition . . . . . . . . . . . . . . . . . . . . 114

7.2 Convergence analysis . . . . . . . . . . . . . . . . . . . . . . . . 116

7.2.1 Linear problems. . . . . . . . . . . . . . . . . . . . . . . 118

7.2.2 Nonlinear problems. . . . . . . . . . . . . . . . . . . . . 120

7.3 Implementation of Broyden's method . . . . . . . . . . . . . . . 123

7.4 Examples for Broyden's method . . . . . . . . . . . . . . . . . . 127


7.4.1 Linear problems. . . . . . . . . . . . . . . . . . . . . . . 127

7.4.2 Nonlinear problems. . . . . . . . . . . . . . . . . . . . . 128

7.5 Exercises on Broyden's method . . . . . . . . . . . . . . . . . . 132

CHAPTER 8. Global Convergence 135

8.1 Single equations . . . . . . . . . . . . . . . . . . . . . . . . . . 135

8.2 Analysis of the Armijo rule . . . . . . . . . . . . . . . . . . . . 138

8.3 Implementation of the Armijo rule . . . . . . . . . . . . . . . . 141

8.3.1 Polynomial line searches. . . . . . . . . . . . . . . . . . 142

8.3.2 Broyden's method. . . . . . . . . . . . . . . . . . . . . . 144

8.4 Examples for Newton–Armijo . . . . . . . . . . . . . . . . . . . 146

8.4.1 Inverse tangent function. . . . . . . . . . . . . . . . . . 146

8.4.2 Convection-diffusion equation. . . . . . . . . . . . . . . 146

8.4.3 Broyden–Armijo. . . . . . . . . . . . . . . . . . . . . . . 148

8.5 Exercises on global convergence . . . . . . . . . . . . . . . . . . 151

Bibliography 153

Index 163


Preface

This book on iterative methods for linear and nonlinear equations can be used

as a tutorial and a reference by anyone who needs to solve nonlinear systems

of equations or large linear systems. It may also be used as a textbook for

introductory courses in nonlinear equations or iterative methods or as source

material for an introductory course in numerical analysis at the graduate level.

We assume that the reader is familiar with elementary numerical analysis,

linear algebra, and the central ideas of direct methods for the numerical

solution of dense linear systems as described in standard texts such as [7],

[105], or [184].

Our approach is to focus on a small number of methods and treat them

in depth. Though this book is written in a ¬nite-dimensional setting, we

have selected for coverage mostly algorithms and methods of analysis which

extend directly to the in¬nite-dimensional case and whose convergence can be

thoroughly analyzed. For example, the matrix-free formulation and analysis for

GMRES and conjugate gradient is almost unchanged in an in¬nite-dimensional

setting. The analysis of Broyden™s method presented in Chapter 7 and

the implementations presented in Chapters 7 and 8 are di¬erent from the

classical ones and also extend directly to an in¬nite-dimensional setting. The

computational examples and exercises focus on discretizations of in¬nite-

dimensional problems such as integral and di¬erential equations.

We present a limited number of computational examples. These examples

are intended to provide results that can be used to validate the reader's own

implementations and to give a sense of how the algorithms perform. The

examples are not designed to give a complete picture of performance or to be

a suite of test problems.

The computational examples in this book were done with MATLAB

(version 4.0a on various SUN SPARCstations and version 4.1 on an Apple

Macintosh Powerbook 180) and the MATLAB environment is an excellent one

for getting experience with the algorithms, for doing the exercises, and for

small-to-medium scale production work.1 MATLAB codes for many of the

algorithms are available by anonymous ftp. A good introduction to the latest

1 MATLAB is a registered trademark of The MathWorks, Inc.


version (version 4.2) of MATLAB is the MATLAB Primer [178]; [43] is also

a useful resource. If the reader has no access to MATLAB or will be solving

very large problems, the general algorithmic descriptions or even the MATLAB

codes can easily be translated to another language.

Parts of this book are based upon work supported by the National

Science Foundation and the Air Force Office of Scientific Research over several years, most recently under National Science Foundation Grant Nos. DMS-9024622 and DMS-9321938. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author and do not necessarily reflect the views of the National Science Foundation or of the Air Force Office of Scientific Research.

Many of my students and colleagues discussed various aspects of this

project with me and provided important corrections, ideas, suggestions, and

pointers to the literature. I am especially indebted to Jim Banoczi, Jeff Butera,

Steve Campbell, Tony Choi, Moody Chu, Howard Elman, Jim Epperson,

Andreas Griewank, Laura Helfrich, Ilse Ipsen, Lea Jenkins, Vickie Kearn,

Belinda King, Debbie Lockhart, Carl Meyer, Casey Miller, Ekkehard Sachs,

Jeff Scroggs, Joseph Skudlarek, Mike Tocci, Gordon Wade, Homer Walker,

Steve Wright, Zhaqing Xue, Yue Zhang, and an anonymous reviewer for their

contributions and encouragement.

Most importantly, I thank Chung-Wei Ng and my parents for over one

hundred and ten years of patience and support.

C. T. Kelley

Raleigh, North Carolina

January, 1998


How to get the software

A collection of MATLAB codes has been written to accompany this book. The

MATLAB codes can be obtained by anonymous ftp from the MathWorks server

ftp.mathworks.com in the directory pub/books/kelley, from the MathWorks

World Wide Web site,

http://www.mathworks.com

or from SIAM's World Wide Web site

http://www.siam.org/books/kelley/kelley.html

One can obtain MATLAB from

The MathWorks, Inc.

24 Prime Park Way

Natick, MA 01760,

Phone: (508) 653-1415

Fax: (508) 653-2997

E-mail: info@mathworks.com

WWW: http://www.mathworks.com


Chapter 1

Basic Concepts and Stationary Iterative Methods

1.1. Review and notation

We begin by setting notation and reviewing some ideas from numerical linear

algebra that we expect the reader to be familiar with. An excellent reference

for the basic ideas of numerical linear algebra and direct methods for linear

equations is [184].

We will write linear equations as

(1.1) Ax = b,

where A is a nonsingular N × N matrix, b ∈ R^N is given, and

x* = A^{-1} b ∈ R^N

is to be found.

Throughout this chapter x will denote a potential solution and {x_k}_{k≥0} the sequence of iterates. We will denote the ith component of a vector x by (x)_i (note the parentheses) and the ith component of x_k by (x_k)_i. We will rarely need to refer to individual components of vectors.

In this chapter ||·|| will denote a norm on R^N as well as the induced matrix norm.

Definition 1.1.1. Let ||·|| be a norm on R^N. The induced matrix norm of an N × N matrix A is defined by

||A|| = max_{||x|| = 1} ||Ax||.

Induced norms have the important property that

||Ax|| ≤ ||A|| ||x||.

Recall that the condition number of A relative to the norm ||·|| is

κ(A) = ||A|| ||A^{-1}||,

where κ(A) is understood to be infinite if A is singular. If ||·|| is the l^p norm

||x||_p = ( Σ_{j=1}^{N} |(x)_j|^p )^{1/p}


ITERATIVE METHODS FOR LINEAR AND NONLINEAR EQUATIONS

we will write the condition number as κ_p.

Most iterative methods terminate when the residual

r = b − Ax

is sufficiently small. One termination criterion is

(1.2) ||r_k|| / ||r_0|| < τ,

which can be related to the error

e = x − x*

in terms of the condition number.

Lemma 1.1.1. Let b, x, x_0 ∈ R^N. Let A be nonsingular and let x* = A^{-1} b. Then

(1.3) ||e|| / ||e_0|| ≤ κ(A) ||r|| / ||r_0||.

Proof. Since

r = b − Ax = −Ae

we have

||e|| = ||A^{-1} Ae|| ≤ ||A^{-1}|| ||Ae|| = ||A^{-1}|| ||r||

and

||r_0|| = ||Ae_0|| ≤ ||A|| ||e_0||.

Hence

||e|| / ||e_0|| ≤ ||A^{-1}|| ||r|| / ( ||A||^{-1} ||r_0|| ) = κ(A) ||r|| / ||r_0||,

as asserted.

The termination criterion (1.2) depends on the initial iterate and may result

in unnecessary work when the initial iterate is good and a poor result when the

initial iterate is far from the solution. For this reason we prefer to terminate

the iteration when

(1.4) ||r_k|| / ||b|| < τ.

The two conditions (1.2) and (1.4) are the same when x_0 = 0, which is a

common choice, particularly when the linear iteration is being used as part of

a nonlinear solver.
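Lemma 1.1.1 and the two stopping criteria are easy to check numerically. The sketch below uses Python with NumPy rather than the book's MATLAB; the matrix, right-hand side, and iterates are illustrative choices, not examples from the book's codes.

```python
import numpy as np

# Illustrative data: any nonsingular A works for Lemma 1.1.1.
A = np.array([[4.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, 2.0])
x_star = np.linalg.solve(A, b)

x0 = np.zeros(2)                    # initial iterate
x = np.array([0.3, 0.7])            # some later iterate

kappa = np.linalg.cond(A, 2)        # kappa_2(A) = ||A||_2 ||A^{-1}||_2
e, e0 = x - x_star, x0 - x_star
r, r0 = b - A @ x, b - A @ x0

lhs = np.linalg.norm(e) / np.linalg.norm(e0)
rhs = kappa * np.linalg.norm(r) / np.linalg.norm(r0)
assert lhs <= rhs + 1e-12           # the bound (1.3)
assert np.allclose(r0, b)           # with x0 = 0, (1.2) and (1.4) coincide
```

Since x_0 = 0 here, r_0 = b and the two criteria agree, as noted above.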


1.2. The Banach Lemma and approximate inverses

The most straightforward approach to an iterative solution of a linear system

is to rewrite (1.1) as a linear fixed-point iteration. One way to do this is to write Ax = b as

(1.5) x = (I − A)x + b,

and to define the Richardson iteration

(1.6) x_{k+1} = (I − A)x_k + b.

We will discuss more general methods in which {x_k} is given by

(1.7) x_{k+1} = M x_k + c.

In (1.7) M is an N × N matrix called the iteration matrix. Iterative methods of this form are called stationary iterative methods because the transition from x_k to x_{k+1} does not depend on the history of the iteration. The Krylov methods

discussed in Chapters 2 and 3 are not stationary iterative methods.

All our results are based on the following lemma.

Lemma 1.2.1. If M is an N × N matrix with ||M|| < 1 then I − M is nonsingular and

(1.8) ||(I − M)^{-1}|| ≤ 1 / (1 − ||M||).

Proof. We will show that I − M is nonsingular and that (1.8) holds by showing that the series converges and that

Σ_{l=0}^{∞} M^l = (I − M)^{-1}.

The partial sums

S_k = Σ_{l=0}^{k} M^l

form a Cauchy sequence in R^{N×N}. To see this note that for all m > k

||S_k − S_m|| ≤ Σ_{l=k+1}^{m} ||M^l||.

Now, ||M^l|| ≤ ||M||^l because ||·|| is a matrix norm that is induced by a vector norm. Hence

||S_k − S_m|| ≤ Σ_{l=k+1}^{m} ||M||^l = ||M||^{k+1} (1 − ||M||^{m−k}) / (1 − ||M||) → 0

as m, k → ∞. Hence the sequence {S_k} converges, say to S. Since M S_k + I = S_{k+1}, we must have M S + I = S and hence (I − M)S = I. This proves that I − M is nonsingular and that S = (I − M)^{-1}.


Noting that

||(I − M)^{-1}|| ≤ Σ_{l=0}^{∞} ||M||^l = (1 − ||M||)^{-1}

proves (1.8) and completes the proof.
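The Neumann series argument can be checked numerically. The sketch below is Python/NumPy rather than the book's MATLAB, and the contraction M is a hypothetical example, not one from the text.

```python
import numpy as np

M = np.array([[0.5, 0.2], [0.1, 0.4]])   # illustrative, with ||M||_2 < 1
norm_M = np.linalg.norm(M, 2)
assert norm_M < 1

# Accumulate the partial sums S_k = sum_{l=0}^{k} M^l.
S = np.zeros((2, 2))
P = np.eye(2)                            # current power M^l
for _ in range(200):
    S += P
    P = P @ M

inv = np.linalg.inv(np.eye(2) - M)
assert np.allclose(S, inv)               # the series sums to (I - M)^{-1}
assert np.linalg.norm(inv, 2) <= 1 / (1 - norm_M) + 1e-12   # the bound (1.8)
```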

The following corollary is a direct consequence of Lemma 1.2.1.

Corollary 1.2.1. If ||M|| < 1 then the iteration (1.7) converges to x = (I − M)^{-1} c for all initial iterates x_0.

A consequence of Corollary 1.2.1 is that Richardson iteration (1.6) will converge if ||I − A|| < 1. It is sometimes possible to precondition a linear

equation by multiplying both sides of (1.1) by a matrix B

BAx = Bb

so that convergence of iterative methods is improved. In the context of

Richardson iteration, the matrices B that allow us to apply the Banach lemma

and its corollary are called approximate inverses.

Definition 1.2.1. B is an approximate inverse of A if ||I − BA|| < 1.

The following theorem is often referred to as the Banach Lemma.

Theorem 1.2.1. If A and B are N × N matrices and B is an approximate inverse of A, then A and B are both nonsingular and

(1.9) ||A^{-1}|| ≤ ||B|| / (1 − ||I − BA||),  ||B^{-1}|| ≤ ||A|| / (1 − ||I − BA||),

and

(1.10) ||A^{-1} − B|| ≤ ||B|| ||I − BA|| / (1 − ||I − BA||),  ||A − B^{-1}|| ≤ ||A|| ||I − BA|| / (1 − ||I − BA||).

Proof. Let M = I − BA. By Lemma 1.2.1, I − M = I − (I − BA) = BA is nonsingular. Hence both A and B are nonsingular. By (1.8),

(1.11) ||A^{-1} B^{-1}|| = ||(I − M)^{-1}|| ≤ 1 / (1 − ||M||) = 1 / (1 − ||I − BA||).

Since A^{-1} = (I − M)^{-1} B, inequality (1.11) implies the first part of (1.9). The second part follows in a similar way from B^{-1} = A (I − M)^{-1}.

To complete the proof note that

A^{-1} − B = (I − BA) A^{-1},  A − B^{-1} = −B^{-1} (I − BA),

and use (1.9).

Richardson iteration, preconditioned with approximate inversion, has the

form

(1.12) x_{k+1} = (I − BA)x_k + Bb.

If the norm of I ’ BA is small, then not only will the iteration converge

rapidly, but, as Lemma 1.1.1 indicates, termination decisions based on the


preconditioned residual Bb − BAx will better reflect the actual error. This method is a very effective technique for solving differential equations, integral equations, and related problems [15], [6], [100], [117], [111]. Multigrid methods [19], [99], [126], can also be interpreted in this light. We mention one other approach, polynomial preconditioning, which tries to approximate A^{-1} by a polynomial in A [123], [179], [169].
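The sketch below, in Python/NumPy rather than the book's MATLAB, runs the preconditioned Richardson iteration (1.12) with a diagonal approximate inverse; the diagonally dominant test matrix is an illustrative choice, not one of the book's examples.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 50
# Illustrative diagonally dominant matrix.
A = 4.0 * np.eye(N) + 0.1 * rng.standard_normal((N, N))
b = rng.standard_normal(N)

B = np.diag(1.0 / np.diag(A))       # a crude diagonal approximate inverse
assert np.linalg.norm(np.eye(N) - B @ A, 2) < 1   # so the Banach Lemma applies

# Preconditioned Richardson iteration (1.12): x_{k+1} = (I - BA) x_k + B b.
x = np.zeros(N)
for _ in range(200):
    x = x - B @ (A @ x - b)         # algebraically equal to (I - BA) x + B b

assert np.allclose(x, np.linalg.solve(A, b), atol=1e-8)
```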

1.3. The spectral radius

The analysis in § 1.2 related convergence of the iteration (1.7) to the norm of

the matrix M. However the norm of M could be small in some norms and quite large in others. Hence the performance of the iteration is not completely described by ||M||. The concept of spectral radius allows us to make a complete

description.

We let σ(A) denote the set of eigenvalues of A.

Definition 1.3.1. The spectral radius of an N × N matrix A is

(1.13) ρ(A) = max_{λ ∈ σ(A)} |λ| = lim_{n→∞} ||A^n||^{1/n}.

The term on the right-hand side of the second equality in (1.13) is the limit used by the radical test for convergence of the series Σ_n A^n.

The spectral radius of M is independent of any particular matrix norm of M. It is clear, in fact, that

(1.14) ρ(A) ≤ ||A||

for any induced matrix norm. The inequality (1.14) has a partial converse that allows us to completely describe the performance of iteration (1.7) in terms of the spectral radius. We state that converse as a theorem and refer to [105] for a proof.

Theorem 1.3.1. Let A be an N × N matrix. Then for any ε > 0 there is a norm ||·|| on R^N such that

ρ(A) > ||A|| − ε.

A consequence of Theorem 1.3.1, Lemma 1.2.1, and Exercise 1.5.1 is a

characterization of convergent stationary iterative methods. The proof is left

as an exercise.

Theorem 1.3.2. Let M be an N × N matrix. The iteration (1.7) converges for all c ∈ R^N if and only if ρ(M) < 1.
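Theorem 1.3.2 can be seen numerically: the hypothetical M below has ||M||_2 > 1, so Lemma 1.2.1 does not apply in the 2-norm, yet ρ(M) < 1 and the iteration (1.7) still converges. A Python/NumPy sketch with illustrative data:

```python
import numpy as np

# rho(M) = 0.5 but ||M||_2 > 10: convergence is governed by the
# spectral radius, as Theorem 1.3.2 states.
M = np.array([[0.5, 10.0], [0.0, 0.5]])
rho = float(np.max(np.abs(np.linalg.eigvals(M))))
assert rho < 1 and np.linalg.norm(M, 2) > 1

c = np.array([1.0, 1.0])
x = np.array([100.0, -100.0])       # arbitrary initial iterate
for _ in range(2000):
    x = M @ x + c

x_limit = np.linalg.solve(np.eye(2) - M, c)   # fixed point (I - M)^{-1} c
assert np.allclose(x, x_limit)
```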

1.4. Matrix splittings and classical stationary iterative methods

There are ways to convert Ax = b to a linear fixed-point iteration that are different from (1.5). Methods such as Jacobi, Gauss–Seidel, and successive overrelaxation (SOR) iteration are based on splittings of A of the form

A = A_1 + A_2,


where A_1 is a nonsingular matrix constructed so that equations with A_1 as coefficient matrix are easy to solve. Then Ax = b is converted to the fixed-point problem

x = A_1^{-1} (b − A_2 x).

The analysis of the method is based on an estimation of the spectral radius of the iteration matrix M = −A_1^{-1} A_2.

For a detailed description of the classical stationary iterative methods the

reader may consult [89], [105], [144], [193], or [200]. These methods are usually

less efficient than the Krylov methods discussed in Chapters 2 and 3 or the more

modern stationary methods based on multigrid ideas. However the classical

methods have a role as preconditioners. The limited description in this section

is intended as a review that will set some notation to be used later.

As a first example we consider the Jacobi iteration that uses the splitting

A_1 = D,  A_2 = L + U,

where D is the diagonal of A and L and U are the (strict) lower and upper triangular parts. This leads to the iteration matrix

M_JAC = −D^{-1} (L + U).

Letting (x_k)_i denote the ith component of the kth iterate we can express Jacobi iteration concretely as

(1.15) (x_{k+1})_i = a_{ii}^{-1} ( b_i − Σ_{j≠i} a_{ij} (x_k)_j ).

Note that A_1 is diagonal and hence trivial to invert.
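A sketch of the Jacobi iteration (1.15) follows, in Python/NumPy (the book's own codes are in MATLAB); the tridiagonal test matrix is illustrative, not one of the book's examples.

```python
import numpy as np

# Illustrative diagonally dominant system.
A = np.array([[4.0, 1.0, 0.0],
              [1.0, 4.0, 1.0],
              [0.0, 1.0, 4.0]])
b = np.array([1.0, 2.0, 3.0])

D = np.diag(np.diag(A))
L_plus_U = A - D                         # strict lower + upper triangles
M_jac = -np.linalg.solve(D, L_plus_U)    # M_JAC = -D^{-1}(L + U)
assert np.linalg.norm(M_jac, np.inf) < 1

x = np.zeros(3)
for _ in range(100):
    # matrix form of the componentwise update (1.15)
    x = np.linalg.solve(D, b - L_plus_U @ x)

assert np.allclose(x, np.linalg.solve(A, b))
```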

We present only one convergence result for the classical stationary iterative

methods.

Theorem 1.4.1. Let A be an N × N matrix and assume that for all 1 ≤ i ≤ N

(1.16) 0 < Σ_{j≠i} |a_{ij}| < |a_{ii}|.

Then A is nonsingular and the Jacobi iteration (1.15) converges to x* = A^{-1} b for all b.

Proof. Note that the ith row sum of M = M_JAC satisfies

Σ_{j=1}^{N} |m_{ij}| = Σ_{j≠i} |a_{ij}| / |a_{ii}| < 1.

Hence ||M_JAC||_∞ < 1 and the iteration converges to the unique solution of x = M x + D^{-1} b. Also I − M = D^{-1} A is nonsingular and therefore A is nonsingular.


Gauss–Seidel iteration overwrites the approximate solution with the new value as soon as it is computed. This results in the iteration

(x_{k+1})_i = a_{ii}^{-1} ( b_i − Σ_{j<i} a_{ij} (x_{k+1})_j − Σ_{j>i} a_{ij} (x_k)_j ),

the splitting

A_1 = D + L,  A_2 = U,

and iteration matrix

M_GS = −(D + L)^{-1} U.

Note that A_1 is lower triangular, and hence A_1^{-1} y is easy to compute for vectors y. Note also that, unlike Jacobi iteration, the iteration depends on the ordering of the unknowns. Backward Gauss–Seidel begins the update of x with the Nth coordinate rather than the first, resulting in the splitting

A_1 = D + U,  A_2 = L,

and iteration matrix

M_BGS = −(D + U)^{-1} L.

A symmetric Gauss–Seidel iteration is a forward Gauss–Seidel iteration followed by a backward Gauss–Seidel iteration. This leads to the iteration matrix

M_SGS = M_BGS M_GS = (D + U)^{-1} L (D + L)^{-1} U.

If A is symmetric then U = L^T. In that event

M_SGS = (D + U)^{-1} L (D + L)^{-1} U = (D + L^T)^{-1} L (D + L)^{-1} L^T.
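The iteration matrices above are easy to form explicitly for a small example. The Python/NumPy sketch below (the book's codes are MATLAB) checks the identity M_SGS = M_BGS M_GS on an illustrative symmetric matrix.

```python
import numpy as np

# Illustrative symmetric matrix, so U = L^T.
A = np.array([[4.0, 1.0, 1.0],
              [1.0, 4.0, 1.0],
              [1.0, 1.0, 4.0]])
D = np.diag(np.diag(A))
L = np.tril(A, -1)
U = np.triu(A, 1)

M_gs  = -np.linalg.solve(D + L, U)            # forward sweep
M_bgs = -np.linalg.solve(D + U, L)            # backward sweep
M_sgs = (np.linalg.solve(D + U, L) @
         np.linalg.solve(D + L, U))           # (D+U)^{-1} L (D+L)^{-1} U

assert np.allclose(M_sgs, M_bgs @ M_gs)       # the symmetric GS identity
assert np.allclose(U, L.T)                    # symmetry of A gives U = L^T
```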

From the point of view of preconditioning, one wants to write the stationary

method as a preconditioned Richardson iteration. That means that one wants

to find B such that M = I − BA and then use B as an approximate inverse. For the Jacobi iteration,

(1.17) B_JAC = D^{-1}.

For symmetric Gauss–Seidel,

(1.18) B_SGS = (D + L^T)^{-1} D (D + L)^{-1}.

The successive overrelaxation iteration modifies Gauss–Seidel by adding a relaxation parameter ω to construct an iteration with iteration matrix

M_SOR = (D + ωL)^{-1} ((1 − ω)D − ωU).

The performance can be dramatically improved with a good choice of ω but still is not competitive with Krylov methods. A further disadvantage is that the choice of ω is often difficult to make. References [200], [89], [193], [8], and the papers cited therein provide additional reading on this topic.
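The dependence on ω can be seen by computing ρ(M_SOR) directly. A Python/NumPy sketch on an illustrative tridiagonal matrix; ω = 1 recovers Gauss–Seidel.

```python
import numpy as np

# Illustrative diagonally dominant matrix.
A = np.array([[4.0, 1.0, 0.0],
              [1.0, 4.0, 1.0],
              [0.0, 1.0, 4.0]])
D = np.diag(np.diag(A))
L = np.tril(A, -1)
U = np.triu(A, 1)

def rho_sor(omega):
    # M_SOR = (D + omega L)^{-1} ((1 - omega) D - omega U)
    M = np.linalg.solve(D + omega * L, (1.0 - omega) * D - omega * U)
    return float(np.max(np.abs(np.linalg.eigvals(M))))

assert rho_sor(1.0) < 1              # omega = 1 is Gauss-Seidel
assert rho_sor(1.9) > rho_sor(1.0)   # a poor omega slows convergence
```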


1.5. Exercises on stationary iterative methods

1.5.1. Show that if ρ(M) ≥ 1 then there are x_0 and c such that the iteration (1.7) fails to converge.

1.5.2. Prove Theorem 1.3.2.

1.5.3. Verify equality (1.18).

1.5.4. Show that if A is symmetric and positive definite (that is, A^T = A and x^T Ax > 0 for all x ≠ 0) then B_SGS is also symmetric and positive definite.


Chapter 2

Conjugate Gradient Iteration

2.1. Krylov methods and the minimization property

In the following two chapters we describe some of the Krylov space methods

for linear equations. Unlike the stationary iterative methods, Krylov methods

do not have an iteration matrix. The two such methods that we'll discuss in depth, conjugate gradient and GMRES, minimize, at the kth iteration, some measure of error over the affine space

x_0 + K_k,

where x_0 is the initial iterate and the kth Krylov subspace K_k is

K_k = span(r_0, A r_0, . . . , A^{k−1} r_0)

for k ≥ 1.

The residual is

r = b − Ax. So {r_k}_{k≥0} will denote the sequence of residuals

r_k = b − A x_k.

As in Chapter 1, we assume that A is a nonsingular N × N matrix and let x* = A^{-1} b.

There are other Krylov methods that are not as well understood as CG or

GMRES. Brief descriptions of several of these methods and their properties

are in § 3.6, [12], and [78].

The conjugate gradient (CG) iteration was invented in the 1950s [103] as a

direct method. It has come into wide use over the last 15 years as an iterative

method and has generally superseded the Jacobi“Gauss“Seidel“SOR family of

methods.

CG is intended to solve symmetric positive definite (spd) systems. Recall that A is symmetric if A = A^T and positive definite if

x^T A x > 0 for all x ≠ 0.


In this section we assume that A is spd. Since A is spd we may define a norm (you should check that this is a norm) by

(2.1) ||x||_A = √(x^T A x).

||·||_A is called the A-norm. The development in these notes is different from the classical work and more like the analysis for GMRES and CGNR in [134]. In this section, and in the section on GMRES that follows, we begin with a description of what the algorithm does and the consequences of the minimization property of the iterates. After that we describe termination criteria, performance, preconditioning, and at the very end, the implementation.

The kth iterate x_k of CG minimizes

(2.2) φ(x) = (1/2) x^T A x − x^T b

over x_0 + K_k.

Note that if φ(x̃) is the minimal value (in R^N) then

∇φ(x̃) = A x̃ − b = 0

and hence x̃ = x*.

Minimizing φ over any subset of R^N is the same as minimizing ||x − x*||_A over that subset. We state this as a lemma.

Lemma 2.1.1. Let S ⊂ R^N. If x_k minimizes φ over S then x_k also minimizes ||x* − x||_A = ||r||_{A^{-1}} over S.

Proof. Note that

||x − x*||_A^2 = (x − x*)^T A (x − x*) = x^T A x − x^T A x* − (x*)^T A x + (x*)^T A x*.

Since A is symmetric and Ax* = b,

−x^T A x* − (x*)^T A x = −2 x^T A x* = −2 x^T b.

Therefore

||x − x*||_A^2 = 2φ(x) + (x*)^T A x*.

Since (x*)^T A x* is independent of x, minimizing φ is equivalent to minimizing ||x − x*||_A^2 and hence to minimizing ||x − x*||_A.

If e = x − x* then

||e||_A^2 = e^T A e = (A(x − x*))^T A^{-1} (A(x − x*)) = ||b − Ax||_{A^{-1}}^2

and hence the A-norm of the error is also the A^{-1}-norm of the residual.

We will use this lemma in the particular case that S = x_0 + K_k for some k.
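The identity ||x − x*||_A^2 = 2φ(x) + (x*)^T A x* from the proof can be verified numerically. A Python/NumPy sketch (not from the book's MATLAB codes) with an illustrative spd matrix:

```python
import numpy as np

rng = np.random.default_rng(1)
A = np.array([[3.0, 1.0], [1.0, 2.0]])   # an illustrative spd matrix
b = np.array([1.0, 1.0])
x_star = np.linalg.solve(A, b)

def phi(x):
    # the quadratic (2.2)
    return 0.5 * x @ A @ x - x @ b

def a_norm_sq(v):
    return v @ A @ v

# ||x - x*||_A^2 = 2 phi(x) + (x*)^T A x* holds for every x
for _ in range(5):
    x = rng.standard_normal(2)
    assert np.isclose(a_norm_sq(x - x_star),
                      2.0 * phi(x) + x_star @ A @ x_star)
```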


2.2. Consequences of the minimization property

Lemma 2.1.1 implies that since x_k minimizes φ over x_0 + K_k,

(2.3) ||x* − x_k||_A ≤ ||x* − w||_A

for all w ∈ x_0 + K_k. Since any w ∈ x_0 + K_k can be written as

w = Σ_{j=0}^{k−1} γ_j A^j r_0 + x_0

for some coefficients {γ_j}, we can express x* − w as

x* − w = x* − x_0 − Σ_{j=0}^{k−1} γ_j A^j r_0.

Since Ax* = b we have

r_0 = b − Ax_0 = A(x* − x_0)

and therefore

x* − w = x* − x_0 − Σ_{j=0}^{k−1} γ_j A^{j+1} (x* − x_0) = p(A)(x* − x_0),

where the polynomial

p(z) = 1 − Σ_{j=0}^{k−1} γ_j z^{j+1}

has degree k and satisfies p(0) = 1. Hence

(2.4) ||x* − x_k||_A = min_{p ∈ P_k, p(0)=1} ||p(A)(x* − x_0)||_A.

In (2.4) P_k denotes the set of polynomials of degree k.

The spectral theorem for spd matrices asserts that

A = U Λ U^T,

where U is an orthogonal matrix whose columns are the eigenvectors of A and Λ is a diagonal matrix with the positive eigenvalues of A on the diagonal. Since U U^T = U^T U = I by orthogonality of U, we have

A^j = U Λ^j U^T.

Hence

p(A) = U p(Λ) U^T.

Define A^{1/2} = U Λ^{1/2} U^T and note that

(2.5) ||x||_A^2 = x^T A x = ||A^{1/2} x||_2^2.


Hence, for any x ∈ R^N and any polynomial p,

||p(A) x||_A = ||A^{1/2} p(A) x||_2 ≤ ||p(A)||_2 ||A^{1/2} x||_2 = ||p(A)||_2 ||x||_A.

This, together with (2.4), implies that

(2.6) ||x_k − x*||_A ≤ ||x_0 − x*||_A min_{p ∈ P_k, p(0)=1} max_{z ∈ σ(A)} |p(z)|.

Here σ(A) is the set of all eigenvalues of A.

The following corollary is an important consequence of (2.6).

Corollary 2.2.1. Let A be spd and let {x_k} be the CG iterates. Let k be given and let p̄_k be any kth degree polynomial such that p̄_k(0) = 1. Then

(2.7) ||x_k − x*||_A / ||x_0 − x*||_A ≤ max_{z ∈ σ(A)} |p̄_k(z)|.

We will refer to the polynomial p̄_k as a residual polynomial [185].

Definition 2.2.1. The set of kth degree residual polynomials is

(2.8) P_k = { p | p is a polynomial of degree k and p(0) = 1 }.

In specific contexts we try to construct sequences of residual polynomials, based on information on σ(A), that make either the middle or the right term in (2.7) easy to evaluate. This leads to an upper estimate for the number of CG iterations required to reduce the A-norm of the error to a given tolerance.

One simple application of (2.7) is to show how the CG algorithm can be

viewed as a direct method.

Theorem 2.2.1. Let A be spd. Then the CG algorithm will ¬nd the

solution within N iterations.

Proof. Let $\{\lambda_i\}_{i=1}^N$ be the eigenvalues of $A$. As a test polynomial, let
$$\bar p(z) = \prod_{i=1}^N (\lambda_i - z)/\lambda_i.$$
$\bar p \in \mathcal{P}_N$ because $\bar p$ has degree $N$ and $\bar p(0) = 1$. Hence, by (2.7) and the fact that $\bar p$ vanishes on $\sigma(A)$,
$$\|x_N - x^*\|_A \le \|x_0 - x^*\|_A \max_{z \in \sigma(A)} |\bar p(z)| = 0.$$

Note that our test polynomial had the eigenvalues of $A$ as its roots. In that way we showed (in the absence of all roundoff error!) that CG terminated in finitely many iterations with the exact solution. This is not as good as it sounds, since in most applications the number of unknowns $N$ is very large, and one cannot afford to perform $N$ iterations. It is best to regard CG as an iterative method. When doing that we seek to terminate the iteration when some specified error tolerance is reached.
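The finite-termination property is easy to observe. The sketch below is a minimal pure-Python CG (using the $\alpha$ and $\beta$ formulas derived later in § 2.4); the tridiagonal test matrix and right-hand side are arbitrary illustrative choices, not from the text.

```python
def cg(A, b, x0, maxit, tol=1e-12):
    """Conjugate gradient for spd A (list of lists); returns iterate and residual norms."""
    n = len(b)
    matvec = lambda v: [sum(A[i][j] * v[j] for j in range(n)) for i in range(n)]
    dot = lambda u, v: sum(ui * vi for ui, vi in zip(u, v))
    x = list(x0)
    r = [bi - yi for bi, yi in zip(b, matvec(x))]
    res_norms = [dot(r, r) ** 0.5]
    p, rho = None, None
    for _ in range(maxit):
        rho_new = dot(r, r)
        if rho_new ** 0.5 <= tol:
            break
        # p_{k+1} = r_k + beta_{k+1} p_k, beta from (2.27)
        p = list(r) if p is None else [ri + (rho_new / rho) * pi for ri, pi in zip(r, p)]
        rho = rho_new
        Ap = matvec(p)
        alpha = rho / dot(p, Ap)          # alpha from (2.26)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * yi for ri, yi in zip(r, Ap)]
        res_norms.append(dot(r, r) ** 0.5)
    return x, res_norms

# Tridiagonal spd test matrix with N = 4
N = 4
A = [[2.0 if i == j else (-1.0 if abs(i - j) == 1 else 0.0)
      for j in range(N)] for i in range(N)]
b = [1.0, 2.0, 3.0, 4.0]
x, res = cg(A, b, [0.0] * N, N)
print(res[-1])   # (numerically) zero within N iterations
```

In exact arithmetic the last residual would be exactly zero by Theorem 2.2.1; in floating point it is at the level of roundoff.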


In the two examples that follow we look at some other easy consequences

of (2.7).

Theorem 2.2.2. Let $A$ be spd with eigenvectors $\{u_i\}_{i=1}^N$. Let $b$ be a linear combination of $k$ of the eigenvectors of $A$,
$$b = \sum_{l=1}^k \gamma_l u_{i_l}.$$
Then the CG iteration for $Ax = b$ with $x_0 = 0$ will terminate in at most $k$ iterations.

Proof. Let $\{\lambda_{i_l}\}$ be the eigenvalues of $A$ associated with the eigenvectors $\{u_{i_l}\}_{l=1}^k$. By the spectral theorem
$$x^* = \sum_{l=1}^k (\gamma_l/\lambda_{i_l})\, u_{i_l}.$$
We use the residual polynomial
$$\bar p(z) = \prod_{l=1}^k (\lambda_{i_l} - z)/\lambda_{i_l}.$$
One can easily verify that $\bar p \in \mathcal{P}_k$. Moreover, $\bar p(\lambda_{i_l}) = 0$ for $1 \le l \le k$ and hence
$$\bar p(A) x^* = \sum_{l=1}^k \bar p(\lambda_{i_l}) \gamma_l/\lambda_{i_l}\, u_{i_l} = 0.$$
So, we have by (2.4) and the fact that $x_0 = 0$ that
$$\|x_k - x^*\|_A \le \|\bar p(A) x^*\|_A = 0.$$
This completes the proof.

If the spectrum of A has fewer than N points, we can use a similar technique

to prove the following theorem.

Theorem 2.2.3. Let $A$ be spd. Assume that there are exactly $k \le N$ distinct eigenvalues of $A$. Then the CG iteration terminates in at most $k$ iterations.
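Theorem 2.2.3 can be observed directly. The following sketch (an illustrative setup, not from the text) runs CG with $x_0 = 0$ on a diagonal spd matrix of size $N = 6$ that has only two distinct eigenvalues; the residual reaches (numerical) zero after two iterations regardless of $N$.

```python
# Diagonal spd matrix: N = 6, but only k = 2 distinct eigenvalues (1 and 5).
diag = [1.0, 1.0, 1.0, 5.0, 5.0, 5.0]
b = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]

dot = lambda u, v: sum(ui * vi for ui, vi in zip(u, v))
x = [0.0] * len(b)
r = list(b)                  # r_0 = b - A x_0 = b since x_0 = 0
p, rho, iters = None, None, 0
while dot(r, r) ** 0.5 > 1e-10:
    rho_new = dot(r, r)
    # p_{k+1} = r_k + beta_{k+1} p_k, beta from (2.27)
    p = list(r) if p is None else [ri + (rho_new / rho) * pi for ri, pi in zip(r, p)]
    rho = rho_new
    Ap = [d * pi for d, pi in zip(diag, p)]   # diagonal matrix-vector product
    alpha = rho / dot(p, Ap)                  # alpha from (2.26)
    x = [xi + alpha * pi for xi, pi in zip(x, p)]
    r = [ri - alpha * Api for ri, Api in zip(r, Ap)]
    iters += 1
print(iters)   # 2
```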

2.3. Termination of the iteration

In practice we do not run the CG iteration until an exact solution is found, but rather terminate once some criterion has been satisfied. One typical criterion is small (say $\le \eta$) relative residuals. This means that we terminate the iteration after
$$\|b - Ax_k\|_2 \le \eta \|b\|_2. \tag{2.9}$$
The error estimates that come from the minimization property, however, are based on (2.7) and therefore estimate the reduction in the relative A-norm of the error.


Our next task is to relate the relative residual in the Euclidean norm to

the relative error in the A-norm. We will do this in the next two lemmas and

then illustrate the point with an example.

Lemma 2.3.1. Let $A$ be spd with eigenvalues $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_N$. Then for all $z \in R^N$,
$$\|A^{1/2} z\|_2 = \|z\|_A \tag{2.10}$$
and
$$\lambda_N^{1/2} \|z\|_A \le \|Az\|_2 \le \lambda_1^{1/2} \|z\|_A. \tag{2.11}$$

Proof. Clearly
$$\|z\|_A^2 = z^T A z = (A^{1/2} z)^T (A^{1/2} z) = \|A^{1/2} z\|_2^2,$$
which proves (2.10).
Let $u_i$ be a unit eigenvector corresponding to $\lambda_i$. We may write $A = U \Lambda U^T$ as
$$Az = \sum_{i=1}^N \lambda_i (u_i^T z)\, u_i.$$
Hence
$$\lambda_N \|A^{1/2} z\|_2^2 = \lambda_N \sum_{i=1}^N \lambda_i (u_i^T z)^2 \le \|Az\|_2^2 = \sum_{i=1}^N \lambda_i^2 (u_i^T z)^2 \le \lambda_1 \sum_{i=1}^N \lambda_i (u_i^T z)^2 = \lambda_1 \|A^{1/2} z\|_2^2.$$
Taking square roots and using (2.10) complete the proof.
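A quick numerical sanity check of (2.11) can be made with a hand-picked 2×2 spd matrix whose eigenvalues are known (here $\lambda_1 = 3$, $\lambda_N = 1$; the test vector is an arbitrary choice, not from the text):

```python
import math

# Check (2.11): sqrt(lambda_N) ||z||_A <= ||Az||_2 <= sqrt(lambda_1) ||z||_A
A = [[2.0, 1.0], [1.0, 2.0]]   # spd, eigenvalues 3 and 1
lam1, lamN = 3.0, 1.0
z = [0.3, -1.7]                # arbitrary test vector

Az = [sum(A[i][j] * z[j] for j in range(2)) for i in range(2)]
norm_Az = math.sqrt(sum(v * v for v in Az))
norm_z_A = math.sqrt(sum(z[i] * Az[i] for i in range(2)))   # ||z||_A = sqrt(z^T A z)

lower = math.sqrt(lamN) * norm_z_A
upper = math.sqrt(lam1) * norm_z_A
print(lower <= norm_Az <= upper)   # True
```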

Lemma 2.3.2.
$$\frac{\|b - Ax_k\|_2}{\|r_0\|_2} = \frac{\|b - Ax_k\|_2}{\|b - Ax_0\|_2} \le \sqrt{\kappa_2(A)}\, \frac{\|x_k - x^*\|_A}{\|x^* - x_0\|_A} \tag{2.12}$$
and
$$\frac{\|b - Ax_k\|_2}{\|b\|_2} \le \frac{\sqrt{\kappa_2(A)}\, \|r_0\|_2}{\|b\|_2}\, \frac{\|x_k - x^*\|_A}{\|x^* - x_0\|_A}. \tag{2.13}$$
Proof. The equality on the left of (2.12) is clear and (2.13) follows directly from (2.12). To obtain the inequality on the right of (2.12), first recall that if $A = U \Lambda U^T$ is the spectral decomposition of $A$ and we order the eigenvalues such that $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_N > 0$, then $\|A\|_2 = \lambda_1$ and $\|A^{-1}\|_2 = 1/\lambda_N$. So $\kappa_2(A) = \lambda_1/\lambda_N$.
Therefore, using (2.10) and (2.11) twice,
$$\frac{\|b - Ax_k\|_2}{\|b - Ax_0\|_2} = \frac{\|A(x^* - x_k)\|_2}{\|A(x^* - x_0)\|_2} \le \sqrt{\frac{\lambda_1}{\lambda_N}}\, \frac{\|x^* - x_k\|_A}{\|x^* - x_0\|_A}$$
as asserted.

So, to predict the performance of the CG iteration based on termination on

small relative residuals, we must not only use (2.7) to predict when the relative


A-norm error is small, but also use Lemma 2.3.2 to relate small A-norm errors

to small relative residuals.

We consider a very simple example. Assume that $x_0 = 0$ and that the eigenvalues of $A$ are contained in the interval $(9, 11)$. If we let $\bar p_k(z) = (10 - z)^k/10^k$, then $\bar p_k \in \mathcal{P}_k$. This means that we may apply (2.7) to get
$$\|x_k - x^*\|_A \le \|x^*\|_A \max_{9 \le z \le 11} |\bar p_k(z)|.$$
It is easy to see that
$$\max_{9 \le z \le 11} |\bar p_k(z)| = 10^{-k}.$$
Hence, after $k$ iterations
$$\|x_k - x^*\|_A \le \|x^*\|_A\, 10^{-k}. \tag{2.14}$$
So, the size of the A-norm of the error will be reduced by a factor of $10^{-3}$ when
$$10^{-k} \le 10^{-3},$$
that is, when $k \ge 3$.
To use Lemma 2.3.2 we simply note that $\kappa_2(A) \le 11/9$. Hence, after $k$ iterations we have
$$\frac{\|Ax_k - b\|_2}{\|b\|_2} \le \sqrt{11} \times 10^{-k}/3.$$
So, the size of the relative residual will be reduced by a factor of $10^{-3}$ when
$$10^{-k} \le 3 \times 10^{-3}/\sqrt{11},$$
that is, when $k \ge 4$.
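The two iteration counts in this example can be recomputed directly; the short check below uses only the bounds stated above (the small slack in the comparison guards against floating-point rounding):

```python
import math

def first_k(threshold):
    """Smallest k with 10^{-k} <= threshold (slack guards rounding)."""
    k = 0
    while 10.0 ** (-k) > threshold + 1e-15:
        k += 1
    return k

k_error = first_k(1e-3)                      # A-norm error cut by 10^{-3}
k_resid = first_k(3e-3 / math.sqrt(11.0))    # relative residual cut by 10^{-3}
print(k_error, k_resid)   # 3 4
```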

One can obtain a more precise estimate by using a polynomial other than $\bar p_k$ in the upper estimate for the right-hand side of (2.7). Note that it is always the case that the spectrum of a spd matrix is contained in the interval $[\lambda_N, \lambda_1]$ and that $\kappa_2(A) = \lambda_1/\lambda_N$. A result from [48] (see also [45]) that is, in one sense, the sharpest possible, is
$$\|x_k - x^*\|_A \le 2 \|x_0 - x^*\|_A \left( \frac{\sqrt{\kappa_2(A)} - 1}{\sqrt{\kappa_2(A)} + 1} \right)^k. \tag{2.15}$$
In the case of the above example, we can estimate $\kappa_2(A)$ by $\kappa_2(A) \le 11/9$. Hence, since $(\sqrt{x} - 1)/(\sqrt{x} + 1)$ is an increasing function of $x$ on the interval $(1, \infty)$,
$$\frac{\sqrt{\kappa_2(A)} - 1}{\sqrt{\kappa_2(A)} + 1} \le \frac{\sqrt{11} - 3}{\sqrt{11} + 3} \approx .05.$$


Therefore (2.15) would predict a reduction in the size of the A-norm error by a factor of $10^{-3}$ when
$$2 \times (.05)^k < 10^{-3}$$
or when
$$k > -\log_{10}(2000)/\log_{10}(.05) \approx 3.3/1.3 \approx 2.6,$$
which also predicts termination within three iterations.

We may have more precise information than a single interval containing

σ(A). When we do, the estimate in (2.15) can be very pessimistic. If the

eigenvalues cluster in a small number of intervals, the condition number can

be quite large, but CG can perform very well. We will illustrate this with an

example. Exercise 2.8.5 also covers this point.

Assume that $x_0 = 0$ and the eigenvalues of $A$ lie in the two intervals $(1, 1.5)$ and $(399, 400)$. Based on this information the best estimate of the condition number of $A$ is $\kappa_2(A) \le 400$, which, when inserted into (2.15), gives
$$\frac{\|x_k - x^*\|_A}{\|x^*\|_A} \le 2 \times (19/21)^k \approx 2 \times (.91)^k.$$
This would indicate fairly slow convergence. However, if we use as a residual polynomial $\bar p_{3k} \in \mathcal{P}_{3k}$,
$$\bar p_{3k}(z) = \frac{(1.25 - z)^k (400 - z)^{2k}}{(1.25)^k \times 400^{2k}},$$
it is easy to see that
$$\max_{z \in \sigma(A)} |\bar p_{3k}(z)| \le (.25/1.25)^k = (.2)^k,$$
which is a sharper estimate on convergence. In fact, (2.15) would predict that
$$\|x_k - x^*\|_A \le 10^{-3} \|x^*\|_A$$
when $2 \times (.91)^k < 10^{-3}$ or when
$$k > -\log_{10}(2000)/\log_{10}(.91) \approx 3.3/.04 = 82.5.$$
The estimate based on the clustering gives convergence in $3k$ iterations when
$$(.2)^k \le 10^{-3}$$
or when
$$k > -3/\log_{10}(.2) = 4.3.$$
Hence (2.15) predicts 83 iterations and the clustering analysis 15 (the smallest integer multiple of 3 larger than $3 \times 4.3 = 12.9$).
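The clustering argument can be checked numerically. The sketch below samples $\bar p_{3k}$ on the two intervals to confirm the bound $(.2)^k$, then recomputes the clustering prediction of 15 iterations (the sample grids are an arbitrary choice):

```python
# Residual polynomial from the clustering example above.
def p3k(z, k):
    return ((1.25 - z) ** k * (400.0 - z) ** (2 * k)) / (1.25 ** k * 400.0 ** (2 * k))

k = 4
# Sample both eigenvalue intervals: [1, 1.5] and [399, 400].
samples = [1.0 + 0.5 * i / 100 for i in range(101)] + \
          [399.0 + i / 100 for i in range(101)]
max_val = max(abs(p3k(z, k)) for z in samples)
print(max_val <= 0.2 ** k)   # True: the bound (.2)^k holds on the samples

# Smallest m with (.2)^m <= 10^{-3}; the analysis predicts 3m CG iterations.
m = 0
while 0.2 ** m > 1e-3:
    m += 1
print(3 * m)   # 15
```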

From the results above one can see that if the condition number of A is near

one, the CG iteration will converge very rapidly. Even if the condition number


is large, the iteration will perform well if the eigenvalues are clustered in a few

small intervals. The transformation of the problem into one with eigenvalues

clustered near one (i.e., easier to solve) is called preconditioning. We used

this term before in the context of Richardson iteration and accomplished the

goal by multiplying A by an approximate inverse. In the context of CG, such

a simple approach can destroy the symmetry of the coefficient matrix and a

more subtle implementation is required. We discuss this in § 2.5.

2.4. Implementation

The implementation of CG depends on the amazing fact that once $x_k$ has been determined, either $x_k = x^*$ or a search direction $p_{k+1} \ne 0$ can be found very cheaply so that $x_{k+1} = x_k + \alpha_{k+1} p_{k+1}$ for some scalar $\alpha_{k+1}$. Once $p_{k+1}$ has been found, $\alpha_{k+1}$ is easy to compute from the minimization property of the iteration. In fact
$$\frac{d\phi(x_k + \alpha p_{k+1})}{d\alpha} = 0 \tag{2.16}$$
for the correct choice of $\alpha = \alpha_{k+1}$. Equation (2.16) can be written as
$$p_{k+1}^T A x_k + \alpha p_{k+1}^T A p_{k+1} - p_{k+1}^T b = 0,$$
leading to
$$\alpha_{k+1} = \frac{p_{k+1}^T (b - Ax_k)}{p_{k+1}^T A p_{k+1}} = \frac{p_{k+1}^T r_k}{p_{k+1}^T A p_{k+1}}. \tag{2.17}$$

If $x_k = x_{k+1}$ then the above analysis implies that $\alpha = 0$. We show that this only happens if $x_k$ is the solution.

Lemma 2.4.1. Let $A$ be spd and let $\{x_k\}$ be the conjugate gradient iterates. Then
$$r_k^T r_l = 0 \quad \text{for all } 0 \le l < k. \tag{2.18}$$

Proof. Since $x_k$ minimizes $\phi$ on $x_0 + \mathcal{K}_k$, we have, for any $\xi \in \mathcal{K}_k$,
$$\frac{d\phi(x_k + t\xi)}{dt} = \nabla\phi(x_k + t\xi)^T \xi = 0$$
at $t = 0$. Recalling that
$$\nabla\phi(x) = Ax - b = -r,$$
we have
$$\nabla\phi(x_k)^T \xi = -r_k^T \xi = 0 \quad \text{for all } \xi \in \mathcal{K}_k. \tag{2.19}$$
Since $r_l \in \mathcal{K}_k$ for all $l < k$ (see Exercise 2.8.1), this proves (2.18).
Now, if $x_k = x_{k+1}$, then $r_k = r_{k+1}$. Lemma 2.4.1 then implies that $\|r_k\|_2^2 = r_k^T r_k = r_k^T r_{k+1} = 0$ and hence $x_k = x^*$.
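Lemma 2.4.1 is easy to observe in floating point: run a few CG steps, store the residuals, and inspect their pairwise inner products. The tridiagonal test system below is an arbitrary illustrative choice, not from the text.

```python
# Run 3 CG steps on a small spd system and check that the stored
# residuals are mutually orthogonal (Lemma 2.4.1), up to rounding.
N = 5
A = [[2.0 if i == j else (-1.0 if abs(i - j) == 1 else 0.0)
      for j in range(N)] for i in range(N)]   # tridiagonal spd test matrix
b = [float(i + 1) for i in range(N)]

dot = lambda u, v: sum(ui * vi for ui, vi in zip(u, v))
matvec = lambda v: [sum(A[i][j] * v[j] for j in range(N)) for i in range(N)]

x = [0.0] * N
r = list(b)                  # x0 = 0, so r0 = b
residuals = [list(r)]
p, rho = None, None
for _ in range(3):
    rho_new = dot(r, r)
    p = list(r) if p is None else [ri + (rho_new / rho) * pi for ri, pi in zip(r, p)]
    rho = rho_new
    Ap = matvec(p)
    alpha = rho / dot(p, Ap)
    x = [xi + alpha * pi for xi, pi in zip(x, p)]
    r = [ri - alpha * Api for ri, Api in zip(r, Ap)]
    residuals.append(list(r))

# r_k^T r_l = 0 for l < k, up to rounding
max_inner = max(abs(dot(residuals[k], residuals[l]))
                for k in range(len(residuals)) for l in range(k))
print(max_inner < 1e-8)   # True
```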

The next lemma characterizes the search direction and, as a side effect, proves that (if we define $p_0 = 0$) $p_l^T r_k = 0$ for all $0 \le l < k \le n$, unless the iteration terminates prematurely.


Lemma 2.4.2. Let $A$ be spd and let $\{x_k\}$ be the conjugate gradient iterates. If $x_k \ne x^*$ then $x_{k+1} = x_k + \alpha_{k+1} p_{k+1}$ and $p_{k+1}$ is determined up to a scalar multiple by the conditions
$$p_{k+1} \in \mathcal{K}_{k+1}, \quad p_{k+1}^T A \xi = 0 \quad \text{for all } \xi \in \mathcal{K}_k. \tag{2.20}$$

Proof. Since $\mathcal{K}_k \subset \mathcal{K}_{k+1}$,
$$\nabla\phi(x_{k+1})^T \xi = (Ax_k + \alpha_{k+1} A p_{k+1} - b)^T \xi = 0 \tag{2.21}$$
for all $\xi \in \mathcal{K}_k$. (2.19) and (2.21) then imply that for all $\xi \in \mathcal{K}_k$,
$$\alpha_{k+1} p_{k+1}^T A \xi = -(Ax_k - b)^T \xi = -\nabla\phi(x_k)^T \xi = 0. \tag{2.22}$$
This uniquely specifies the direction of $p_{k+1}$ as (2.22) implies that $p_{k+1} \in \mathcal{K}_{k+1}$ is A-orthogonal (i.e., in the scalar product $(x, y) = x^T A y$) to $\mathcal{K}_k$, a subspace of dimension one less than $\mathcal{K}_{k+1}$.

The condition $p_{k+1}^T A \xi = 0$ is called A-conjugacy of $p_{k+1}$ to $\mathcal{K}_k$. Now, any $p_{k+1}$ satisfying (2.20) can, up to a scalar multiple, be expressed as
$$p_{k+1} = r_k + w_k$$
with $w_k \in \mathcal{K}_k$. While one might think that $w_k$ would be hard to compute, it is, in fact, trivial. We have the following theorem.

Theorem 2.4.1. Let $A$ be spd and assume that $r_k \ne 0$. Define $p_0 = 0$. Then
$$p_{k+1} = r_k + \beta_{k+1} p_k \quad \text{for some } \beta_{k+1} \text{ and } k \ge 0. \tag{2.23}$$

Proof. By Lemma 2.4.2 and the fact that $\mathcal{K}_k = \mathrm{span}(r_0, \ldots, r_{k-1})$, we need only verify that a $\beta_{k+1}$ can be found so that if $p_{k+1}$ is given by (2.23) then
$$p_{k+1}^T A r_l = 0$$
for all $0 \le l \le k - 1$.
Let $p_{k+1}$ be given by (2.23). Then for any $l \le k$,
$$p_{k+1}^T A r_l = r_k^T A r_l + \beta_{k+1} p_k^T A r_l.$$
If $l \le k - 2$, then $r_l \in \mathcal{K}_{l+1} \subset \mathcal{K}_{k-1}$. Lemma 2.4.2 then implies that
$$p_{k+1}^T A r_l = 0 \quad \text{for } 0 \le l \le k - 2.$$
It only remains to solve for $\beta_{k+1}$ so that $p_{k+1}^T A r_{k-1} = 0$. Trivially
$$\beta_{k+1} = -r_k^T A r_{k-1} / p_k^T A r_{k-1} \tag{2.24}$$
provided $p_k^T A r_{k-1} \ne 0$. Since
$$r_k = r_{k-1} - \alpha_k A p_k$$


we have
$$r_k^T r_{k-1} = \|r_{k-1}\|_2^2 - \alpha_k p_k^T A r_{k-1}.$$
Since $r_k^T r_{k-1} = 0$ by Lemma 2.4.1 we have
$$p_k^T A r_{k-1} = \|r_{k-1}\|_2^2/\alpha_k \ne 0. \tag{2.25}$$
This completes the proof.

The common implementation of conjugate gradient uses a different form for $\alpha_k$ and $\beta_k$ than given in (2.17) and (2.24).
Lemma 2.4.3. Let $A$ be spd and assume that $r_k \ne 0$. Then
$$\alpha_{k+1} = \frac{\|r_k\|_2^2}{p_{k+1}^T A p_{k+1}} \tag{2.26}$$
and
$$\beta_{k+1} = \frac{\|r_k\|_2^2}{\|r_{k-1}\|_2^2}. \tag{2.27}$$

Proof. Note that for $k \ge 0$,
$$p_{k+1}^T r_{k+1} = r_k^T r_{k+1} + \beta_{k+1} p_k^T r_{k+1} = 0 \tag{2.28}$$
by Lemma 2.4.2. An immediate consequence of (2.28) is that $p_k^T r_k = 0$ and hence
$$p_{k+1}^T r_k = (r_k + \beta_{k+1} p_k)^T r_k = \|r_k\|_2^2. \tag{2.29}$$
Taking scalar products of both sides of
$$r_{k+1} = r_k - \alpha_{k+1} A p_{k+1}$$
with $p_{k+1}$ and using (2.29) gives
$$0 = p_{k+1}^T r_k - \alpha_{k+1} p_{k+1}^T A p_{k+1} = \|r_k\|_2^2 - \alpha_{k+1} p_{k+1}^T A p_{k+1},$$
which is equivalent to (2.26).

To get (2.27) note that $p_{k+1}^T A p_k = 0$ and hence (2.23) implies that
$$\beta_{k+1} = \frac{-r_k^T A p_k}{p_k^T A p_k}. \tag{2.30}$$
Also note that
$$p_k^T A p_k = p_k^T A (r_{k-1} + \beta_k p_{k-1}) \tag{2.31}$$