
= p_k^T A r_{k-1} + β_k p_k^T A p_{k-1} = p_k^T A r_{k-1}.

Now combine (2.30), (2.31), and (2.25) to get

β_{k+1} = -α_k r_k^T A p_k / ‖r_{k-1}‖_2^2.





Now take scalar products of both sides of

r_k = r_{k-1} - α_k A p_k

with r_k and use Lemma 2.4.1 to get

‖r_k‖_2^2 = -α_k r_k^T A p_k.

Hence (2.27) holds.
The usual implementation reflects all of the above results. The goal is to
find, for a given ε, a vector x so that ‖b - Ax‖_2 ≤ ε‖b‖_2. The input is the initial
iterate x, which is overwritten with the solution, the right-hand side b, and a
routine which computes the action of A on a vector. We limit the number of
iterations to kmax and return the solution, which overwrites the initial iterate
x, and the residual norm.
Algorithm 2.4.1. cg(x, b, A, ε, kmax)
1. r = b - Ax, ρ_0 = ‖r‖_2^2, k = 1.

2. Do While √ρ_{k-1} > ε‖b‖_2 and k < kmax
   (a) if k = 1 then p = r
       else
       β = ρ_{k-1}/ρ_{k-2} and p = r + βp
   (b) w = Ap
   (c) α = ρ_{k-1}/p^T w
   (d) x = x + αp
   (e) r = r - αw
   (f) ρ_k = ‖r‖_2^2
   (g) k = k + 1

Note that the matrix A itself need not be formed or stored; only a routine
for matrix-vector products is required. Krylov space methods are often called
matrix-free for that reason.
Now, consider the costs. We need to store only the four vectors x, w, p, and r.
Each iteration requires a single matrix-vector product (to compute w = Ap),
two scalar products (one for p^T w and one to compute ρ_k = ‖r‖_2^2), and three
operations of the form ax + y, where x and y are vectors and a is a scalar.
It is remarkable that the iteration can progress without storing a basis for
the entire Krylov subspace. As we will see in the section on GMRES, this is
not the case in general. The spd structure buys quite a lot.
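As a concrete illustration, the following is a minimal MATLAB sketch of Algorithm cg. It is not the pcgsol code from the collection; the names cg_sketch, afun, and tol are ours, and afun is assumed to be a function handle that returns the product of A with a vector.

function [x, resnorms] = cg_sketch(x, b, afun, tol, kmax)
% Minimal sketch of Algorithm cg. afun(v) is assumed to return A*v for an
% spd matrix A; tol plays the role of epsilon in the termination test.
r = b - afun(x);
rho = zeros(kmax, 1);
rho(1) = r' * r;                    % rho(k) holds ||r||_2^2 after k-1 steps
nb = norm(b);
k = 1;
while sqrt(rho(k)) > tol * nb && k < kmax
    if k == 1
        p = r;
    else
        beta = rho(k) / rho(k - 1);
        p = r + beta * p;           % new search direction
    end
    w = afun(p);                    % the single matrix-vector product
    alpha = rho(k) / (p' * w);
    x = x + alpha * p;
    r = r - alpha * w;
    rho(k + 1) = r' * r;
    k = k + 1;
end
resnorms = sqrt(rho(1:k));          % history of residual norms
end

Under these assumptions a typical call would be [x, res] = cg_sketch(zeros(size(b)), b, @(v) A*v, 1.e-6, 100).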

2.5. Preconditioning
To reduce the condition number, and hence improve the performance of the
iteration, one might try to replace Ax = b by another spd system with the
same solution. If M is a spd matrix that is close to A^{-1}, then the eigenvalues




of MA will be clustered near one. However, MA is unlikely to be spd, and
hence CG cannot be applied to the system MAx = Mb.
In theory one avoids this difficulty by expressing the preconditioned
problem in terms of B, where B is spd and A = B^2, and by using a two-sided
preconditioner, S ≈ B^{-1} (so M = S^2). Then the matrix SAS is spd and its
eigenvalues are clustered near one. Moreover the preconditioned system

SASy = Sb

has y^* = S^{-1}x^* as a solution, where Ax^* = b. Hence x^* can be recovered
from y^* by multiplication by S. One might think, therefore, that computing
S (or a subroutine for its action on a vector) would be necessary and that a
matrix-vector multiply by SAS would incur a cost of one multiplication by A
and two by S. Fortunately, this is not the case.
If ŷ_k, r̂_k, and p̂_k are the iterate, residual, and search direction for CG applied
to SAS and we let

x_k = S ŷ_k,   r_k = S^{-1} r̂_k,   p_k = S p̂_k,   and   z_k = S r̂_k,

then one can perform the iteration directly in terms of x_k, A, and M. The
reader should verify that the following algorithm does exactly that. Aside from
the addition of a routine that computes the action of the preconditioner M on
a vector, the input and arguments of pcg are the same as those of Algorithm cg.
Algorithm 2.5.1. pcg(x, b, A, M, ε, kmax)
1. r = b - Ax, ρ_0 = ‖r‖_2^2, k = 1

2. Do While √ρ_{k-1} > ε‖b‖_2 and k < kmax
   (a) z = Mr
   (b) τ_{k-1} = z^T r
   (c) if k = 1 then β = 0 and p = z
       else
       β = τ_{k-1}/τ_{k-2}, p = z + βp
   (d) w = Ap
   (e) α = τ_{k-1}/p^T w
   (f) x = x + αp
   (g) r = r - αw
   (h) ρ_k = r^T r
   (i) k = k + 1

Note that the cost is identical to CG with the addition of
• the application of the preconditioner M in step 2a and

• the additional inner product required to compute τ_{k-1} in step 2b.
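For reference, here is a minimal MATLAB sketch of Algorithm pcg in the same spirit as the cg sketch above. It is not the pcgsol code from the collection; mfun is assumed to be a function handle returning the product of M with a vector, and the names are again ours.

function x = pcg_sketch(x, b, afun, mfun, tol, kmax)
% Minimal sketch of Algorithm pcg. afun(v) returns A*v and mfun(v) returns M*v.
r = b - afun(x);
rho = r' * r;                       % rho = ||r||_2^2
nb = norm(b);
k = 1;
while sqrt(rho) > tol * nb && k < kmax
    z = mfun(r);                    % step 2a: apply the preconditioner
    tau = z' * r;                   % step 2b: the extra inner product
    if k == 1
        p = z;
    else
        beta = tau / tauold;
        p = z + beta * p;
    end
    tauold = tau;
    w = afun(p);
    alpha = tau / (p' * w);
    x = x + alpha * p;
    r = r - alpha * w;
    rho = r' * r;
    k = k + 1;
end
end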




Of these costs, the application of the preconditioner is usually the larger. In
the remainder of this section we briefly mention some classes of preconditioners.
A more complete and detailed discussion of preconditioners is in [8] and a
concise survey with many pointers to the literature is in [12].
Some effective preconditioners are based on deep insight into the structure
of the problem. See [124] for an example in the context of partial differential
equations, where it is shown that certain discretized second-order elliptic
problems on simple geometries can be very well preconditioned with fast
Poisson solvers [99], [188], and [187]. Similar performance can be obtained from
multigrid [99], domain decomposition [38], [39], [40], and alternating direction
preconditioners [8], [149], [193], [194]. We use a Poisson solver preconditioner
in the examples in § 2.7 and § 3.7 as well as for nonlinear problems in § 6.4.2
and § 8.4.2.
One commonly used and easily implemented preconditioner is Jacobi
preconditioning, where M is the inverse of the diagonal part of A. One can also
use other preconditioners based on the classical stationary iterative methods,
such as the symmetric Gauss–Seidel preconditioner (1.18). For applications to
partial differential equations, these preconditioners may be somewhat useful,
but should not be expected to have dramatic effects.
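For example, if the matrix A is available in MATLAB, the action of the Jacobi preconditioner can be passed to a preconditioned solver as a function handle; the names d and mfun below are ours.

d = diag(A);          % diagonal part of A, assumed to have no zero entries
mfun = @(r) r ./ d;   % Jacobi preconditioner: z = M*r with M = D^{-1}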
Another approach is to apply a sparse Cholesky factorization to the
matrix A (thereby giving up a fully matrix-free formulation), discarding
small elements of the factors and/or allowing only a fixed amount of storage
for the factors. Such preconditioners are called incomplete factorization
preconditioners. So if A = LL^T + E, where E is small, the preconditioner
is (LL^T)^{-1}, and its action on a vector is computed by two sparse triangular solves.
We refer the reader to [8], [127], and [44] for more detail.
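In current MATLAB, one way to build such a preconditioner for a sparse spd matrix A is with the built-in ichol function; the drop tolerance below is only illustrative, and mfun is our name for the resulting preconditioner action.

L = ichol(A, struct('type', 'ict', 'droptol', 1.e-3));  % A ~ L*L', small entries dropped
mfun = @(r) L' \ (L \ r);   % action of (L*L')^{-1} by two sparse triangular solves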
One could also attempt to estimate the spectrum of A, find a polynomial
p such that 1 - zp(z) is small on the approximate spectrum, and use p(A) as a
preconditioner. This is called polynomial preconditioning. The preconditioned
system is

p(A)Ax = p(A)b

and we would expect the spectrum of p(A)A to be more clustered near z = 1
than that of A. If an interval containing the spectrum can be found, the
residual polynomial q(z) = 1 - zp(z) of smallest L^∞ norm on that interval
can be expressed in terms of Chebyshev [161] polynomials. Alternatively
q can be selected to solve a least squares minimization problem [5], [163].
The preconditioner p can be recovered directly from q and convergence rate
estimates made. This technique is used to prove the estimate (2.15), for
example. The cost of such a preconditioner, if a polynomial of degree K is
used, is K matrix-vector products for each application of the preconditioner
[5]. The performance gains can be very significant and the implementation is
matrix-free.
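The sketch below illustrates only the mechanics: given coefficients of some polynomial p, however they were chosen, it applies z = p(A)r by Horner's rule at a cost of K matrix-vector products. The routine name, argument list, and coefficient ordering are our own, not those of any code in the collection.

function z = polyprec(r, afun, c)
% Apply z = p(A)*r for p(z) = c(1) + c(2)*z + ... + c(K+1)*z^K by Horner's rule.
% afun(v) returns A*v; the cost is K = length(c)-1 matrix-vector products.
K = length(c) - 1;
z = c(K + 1) * r;
for j = K:-1:1
    z = afun(z) + c(j) * r;        % Horner step: z <- A*z + c(j)*r
end
end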




2.6. CGNR and CGNE
If A is nonsingular and nonsymmetric, one might consider solving Ax = b by
applying CG to the normal equations

(2.32)    A^T Ax = A^T b.

This approach [103] is called CGNR [71], [78], [134]. The reason for this name
is that the minimization property of CG as applied to (2.32) asserts that

‖x^* - x‖_{A^T A}^2 = (x^* - x)^T A^T A(x^* - x)
                    = (Ax^* - Ax)^T (Ax^* - Ax) = (b - Ax)^T (b - Ax) = ‖r‖_2^2

is minimized over x_0 + K_k at each iterate. Hence the name Conjugate Gradient
on the Normal equations to minimize the Residual.
Alternatively, one could solve

(2.33)    AA^T y = b

and then set x = A^T y to solve Ax = b. This approach [46] is now called CGNE
[78], [134]. The reason for this name is that the minimization property of CG
as applied to (2.33) asserts that if y^* is the solution to (2.33) then

‖y^* - y‖_{AA^T}^2 = (y^* - y)^T (AA^T)(y^* - y) = (A^T y^* - A^T y)^T (A^T y^* - A^T y)
                   = ‖x^* - x‖_2^2

is minimized over y_0 + K_k at each iterate. Hence the name Conjugate Gradient
on the Normal equations to minimize the Error.
The advantages of this approach are that all the theory for CG carries over
and the simple implementation for both CG and PCG can be used. There
are three disadvantages that may or may not be serious. The first is that the
condition number of the coefficient matrix A^T A is the square of that of A.
The second is that two matrix-vector products are needed for each CG iterate
since w = A^T Ap = A^T (Ap) in CGNR and w = AA^T p = A(A^T p) in CGNE.
The third, more important, disadvantage is that one must compute the action
of A^T on a vector as part of the matrix-vector product involving A^T A. As we
will see in the chapter on nonlinear problems, there are situations where this
is not possible.
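When a transpose-vector product is available, CGNR needs no new solver: in MATLAB one can compose the two products in a function handle and reuse a CG code, for instance the cg_sketch routine sketched earlier (the names below are ours).

% CGNR: apply CG to A'*A x = A'*b; two matrix-vector products per iteration.
afun = @(v) A' * (A * v);                          % never form A'*A explicitly
x = cg_sketch(zeros(size(b)), A' * b, afun, 1.e-6, 100);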
The analysis with residual polynomials is similar to that for CG. We
consider the case for CGNR; the analysis for CGNE is essentially the same.
As above, when we consider the A^T A norm of the error, we have

‖x^* - x‖_{A^T A}^2 = (x^* - x)^T A^T A(x^* - x) = ‖A(x^* - x)‖_2^2 = ‖r‖_2^2.

Hence, for any residual polynomial p̄_k ∈ P_k,

(2.34)    ‖r_k‖_2 ≤ ‖p̄_k(A^T A)r_0‖_2 ≤ ‖r_0‖_2 max_{z∈σ(A^T A)} |p̄_k(z)|.





There are two major differences between (2.34) and (2.7). The estimate is
in terms of the l^2 norm of the residual, which corresponds exactly to the
termination criterion; hence we need not prove a result like Lemma 2.3.2. Most
significantly, the residual polynomial is to be maximized over the eigenvalues
of A^T A, which is the set of the squares of the singular values of A. Hence the
performance of CGNR and CGNE is determined by the distribution of singular
values.

2.7. Examples for preconditioned conjugate iteration
In the collection of MATLAB codes we provide a code for preconditioned
conjugate gradient iteration. The inputs, described in the comment lines,
are the initial iterate x_0, the right-hand side vector b, MATLAB functions for
the matrix-vector product and (optionally) the preconditioner, and iteration
parameters to specify the maximum number of iterations and the termination
criterion. On return the code supplies the approximate solution x and the
history of the iteration as the vector of residual norms.
We consider the discretization of the partial differential equation

(2.35)    -∇·(a(x, y)∇u) = f(x, y)

on 0 < x, y < 1, subject to homogeneous Dirichlet boundary conditions

u(x, 0) = u(x, 1) = u(0, y) = u(1, y) = 0,   0 < x, y < 1.

One can verify [105] that the differential operator is positive definite in the
Hilbert space sense and that the five-point discretization described below is
positive definite if a > 0 for all 0 ≤ x, y ≤ 1 (Exercise 2.8.10).
We discretize with a five-point centered difference scheme with n^2 points
and mesh width h = 1/(n + 1). The unknowns are

u_{ij} ≈ u(x_i, x_j)

where x_i = ih for 1 ≤ i ≤ n. We set

u_{0j} = u_{(n+1)j} = u_{i0} = u_{i(n+1)} = 0,

to reflect the boundary conditions, and define

α_{ij} = -a(x_i, x_j)h^{-2}/2.

We express the discrete matrix-vector product as

(2.36)    (Au)_{ij} = (α_{ij} + α_{(i+1)j})(u_{(i+1)j} - u_{ij}) - (α_{(i-1)j} + α_{ij})(u_{ij} - u_{(i-1)j})
                    + (α_{i(j+1)} + α_{ij})(u_{i(j+1)} - u_{ij}) - (α_{ij} + α_{i(j-1)})(u_{ij} - u_{i(j-1)})



Buy this book from SIAM at http://www.ec-securehost.com/SIAM/FR16.html.
Copyright ©1995 by the Society for Industrial and Applied Mathematics. This electronic version is for personal use and may not be duplicated or distributed.




27
CONJUGATE GRADIENT ITERATION

for 1 ≤ i, j ≤ n.
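A direct, unoptimized MATLAB transcription of (2.36) might look like the sketch below. It is not the matrix-vector product code from the collection; the name matvec5 and the calling sequence are ours, and the coefficient function a is assumed to accept array arguments elementwise.

function Au = matvec5(u, a, n)
% Sketch of the matrix-vector product (2.36) for the five-point scheme.
% u is an n^2-vector; Au is returned as an n^2-vector.
h = 1 / (n + 1);
[X, Y] = ndgrid((0:n+1) * h, (0:n+1) * h);   % grid including the boundary
alp = -a(X, Y) / (2 * h^2);                  % alpha_ij = -a(x_i, x_j) h^{-2} / 2
U = zeros(n + 2);                            % embed u with zero boundary values
U(2:n+1, 2:n+1) = reshape(u, n, n);
Au = zeros(n);
for i = 2:n+1
    for j = 2:n+1
        Au(i-1, j-1) = (alp(i, j) + alp(i+1, j)) * (U(i+1, j) - U(i, j)) ...
                     - (alp(i-1, j) + alp(i, j)) * (U(i, j) - U(i-1, j)) ...
                     + (alp(i, j+1) + alp(i, j)) * (U(i, j+1) - U(i, j)) ...
                     - (alp(i, j) + alp(i, j-1)) * (U(i, j) - U(i, j-1));
    end
end
Au = Au(:);
end

Under these assumptions a handle such as afun = @(u) matvec5(u, @(x,y) cos(x), 31) provides the operator used in the computations reported below.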
For the MATLAB implementation we convert freely between the representation
of u as a two-dimensional array (with the boundary conditions added),
which is useful for computing the action of A on u and applying fast solvers,
and the representation as a one-dimensional array, which is what pcgsol expects
to see. See the routine fish2d in the collection of MATLAB codes for an
example of how to do this in MATLAB.
For the computations reported in this section we took a(x, y) = cos(x) and
took the right-hand side so that the exact solution was the discretization of

10xy(1 - x)(1 - y) exp(x^{4.5}).

The initial iterate was u_0 = 0.
In the results reported here we took n = 31, resulting in a system with
N = n^2 = 961 unknowns. We expect second-order accuracy from the method
and accordingly we set the termination parameter ε = h^2 = 1/1024. We allowed
up to 100 CG iterations. We will report
our results graphically, plotting ‖r_k‖_2/‖b‖_2 on a semi-log scale.
In Fig. 2.1 the solid line is a plot of ‖r_k‖_2/‖b‖_2 and the dashed line a plot of
‖u^* - u_k‖_A/‖u^* - u_0‖_A. Note that the reduction in ‖r‖ is not monotone. This is
consistent with the theory, which predicts decrease in ‖e‖_A but not necessarily
in ‖r‖ as the iteration progresses. Note that the unpreconditioned iteration is
slowly convergent. This can be explained by the fact that the eigenvalues are
not clustered and

κ(A) = O(1/h^2) = O(n^2) = O(N),

and hence (2.15) indicates that convergence will be slow. The reader is asked
to quantify this in terms of execution times in Exercise 2.8.9. This example
illustrates the importance of a good preconditioner. Even the unpreconditioned
iteration, however, is more efficient than the classical stationary iterative
methods.
For a preconditioner we use a Poisson solver. By this we mean an operator
G such that v = Gw is the solution of the discrete form of

-v_xx - v_yy = w,

subject to homogeneous Dirichlet boundary conditions. The effectiveness of
such a preconditioner has been analyzed in [124] and some of the many ways
to implement the solver efficiently are discussed in [99], [188], [186], and [187].
The properties of CG on the preconditioned problem in the continuous
case have been analyzed in [48]. For many types of domains and boundary
conditions, Poisson solvers can be designed to take advantage of vector and/or
parallel architectures or, in the case of the MATLAB environment used in
this book, designed to take advantage of fast MATLAB built-in functions.
Because of this their execution time is less than a simple count of floating-point
operations would indicate. The fast Poisson solver in the collection of codes,
fish2d, is based on the MATLAB fast Fourier transform, the built-in function fft.





[Fig. 2.1. CG for 2-D elliptic equation. Vertical axis: relative residual and A-norm of error (log scale, 10^1 down to 10^-4); horizontal axis: iterations (0 to 60).]

[Fig. 2.2. PCG for 2-D elliptic equation. Vertical axis: relative residual (log scale, 10^1 down to 10^-4); horizontal axis: iterations (0 to 60).]


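The sketch below is not fish2d; it assembles the discrete Laplacian with MATLAB's sparse tools and applies the Poisson preconditioner by a prefactored sparse Cholesky solve, which is enough to experiment with the interface v = Gw. The names T, Lap, R, and gfun are ours.

% Poisson preconditioner sketch: gfun(w) returns v = Gw, the solution of the
% discrete -v_xx - v_yy = w with zero Dirichlet data on an n x n interior grid.
% A sparse direct solve stands in for the FFT-based fish2d solver.
n = 31;
h = 1 / (n + 1);
e = ones(n, 1);
T = spdiags([-e 2*e -e], -1:1, n, n) / h^2;   % 1-D second difference matrix
Lap = kron(speye(n), T) + kron(T, speye(n));  % 2-D five-point Laplacian
R = chol(Lap);                                % factor once, reuse every iteration
gfun = @(w) R \ (R' \ w);                     % preconditioner action v = G*w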
In Fig. 2.2 the solid line is the graph of ‖r_k‖_2/‖b‖_2 for the preconditioned
iteration and the dashed line for the unpreconditioned. The preconditioned
iteration required 5 iterations for convergence and the unpreconditioned
iteration 52. Not only does the preconditioned iteration converge more
rapidly, but the number of iterations required to reduce the relative residual
by a given amount is independent of the mesh spacing [124]. We caution
the reader that the preconditioned iteration is not as much faster than the




unpreconditioned one as the iteration count would suggest. The MATLAB
flops command indicates that the unpreconditioned iteration required roughly
1.2 million floating-point operations while the preconditioned iteration required
.87 million floating-point operations. Hence, the cost of the preconditioner is
considerable. In the MATLAB environment we used, the execution time of
the preconditioned iteration was about 60% of that of the unpreconditioned.
As we remarked above, this speed is a result of the efficiency of the MATLAB
fast Fourier transform. In Exercise 2.8.11 you are asked to compare execution
times for your own environment.





2.8. Exercises on conjugate gradient
2.8.1. Let {xk } be the conjugate gradient iterates. Prove that rl ∈ Kk for all
l < k.

2.8.2. Let A be spd. Show that there is a spd B such that B^2 = A. Is B
unique?

2.8.3. Let Λ be a diagonal matrix with Λ_{ii} = λ_i and let p be a polynomial.
Prove that ‖p(Λ)‖ = max_i |p(λ_i)|, where ‖·‖ is any induced matrix norm.

2.8.4. Prove Theorem 2.2.3.

2.8.5. Assume that A is spd and that

σ(A) ⊂ (1, 1.1) ∪ (2, 2.2).

Give upper estimates based on (2.6) for the number of CG iterations
required to reduce the A norm of the error by a factor of 10^{-3} and for
the number of CG iterations required to reduce the residual by a factor
of 10^{-3}.

2.8.6. For the matrix A in problem 5, assume that the cost of a matrix-vector
multiply is 4N floating-point multiplies. Estimate the number of floating-point
operations needed to reduce the A norm of the error by a factor of 10^{-3} using
CG iteration.

2.8.7. Let A be a nonsingular matrix with all singular values in the interval
(1, 2). Estimate the number of CGNR/CGNE iterations required to
reduce the relative residual by a factor of 10^{-4}.

2.8.8. Show that if A has constant diagonal then PCG with Jacobi precondi-
tioning produces the same iterates as CG with no preconditioning.

2.8.9. Assume that A is N × N, nonsingular, and spd. If κ(A) = O(N), give
a rough estimate of the number of CG iterates required to reduce the
relative residual to O(1/N ).

2.8.10. Prove that the linear transformation given by (2.36) is symmetric and
positive definite on R^{n^2} if a(x, y) > 0 for all 0 ≤ x, y ≤ 1.

2.8.11. Duplicate the results in § 2.7, for example in MATLAB, by writing the
matrix-vector product routines and using the MATLAB codes pcgsol
and fish2d. What happens as N is increased? How are the performance
and accuracy affected by changes in a(x, y)? Try a(x, y) = .1 + x and
examine the accuracy of the result. Explain your findings. Compare
the execution times on your computing environment (using the cputime
command in MATLAB, for instance).




2.8.12. Use the Jacobi and symmetric Gauss–Seidel iterations from Chapter 1
to solve the elliptic boundary value problem in § 2.7. How does the
performance compare to CG and PCG?

2.8.13. Implement Jacobi (1.17) and symmetric Gauss–Seidel (1.18) precondi-
tioners for the elliptic boundary value problem in § 2.7. Compare the
performance with respect to both computer time and number of itera-
tions to preconditioning with the Poisson solver.

2.8.14. Modify pcgsol so that φ(x) is computed and stored at each iterate
and returned on output. Plot φ(xn ) as a function of n for each of the
examples.

2.8.15. Apply CG and PCG to solve the five-point discretization of

-u_xx(x, y) - u_yy(x, y) + e^{x+y} u(x, y) = 1,   0 < x, y < 1,

subject to the inhomogeneous Dirichlet boundary conditions

u(x, 0) = u(x, 1) = u(1, y) = 0,   u(0, y) = 1,   0 < x, y < 1.

Experiment with different mesh sizes and preconditioners (fast Poisson
solver, Jacobi, and symmetric Gauss–Seidel).








Chapter 3

GMRES Iteration




3.1. The minimization property and its consequences
The GMRES (Generalized Minimum RESidual) iteration was proposed in 1986 in [167]
as a Krylov subspace method for nonsymmetric systems. Unlike CGNR,
GMRES does not require computation of the action of A^T on a vector. This is
a significant advantage in many cases. The use of residual polynomials is made
more complicated because we cannot use the spectral theorem to decompose
A. Moreover, one must store a basis for K_k, and therefore storage requirements
increase as the iteration progresses.
The kth (k ≥ 1) iteration of GMRES is the solution to the least squares
problem

(3.1)    minimize_{x∈x_0+K_k} ‖b - Ax‖_2.
The beginning of this section is much like the analysis for CG. Note that
if x ∈ x_0 + K_k then

x = x_0 + Σ_{j=0}^{k-1} γ_j A^j r_0

and so

b - Ax = b - Ax_0 - Σ_{j=0}^{k-1} γ_j A^{j+1} r_0 = r_0 - Σ_{j=1}^{k} γ_{j-1} A^j r_0.

Hence, if x ∈ x_0 + K_k, then r = p̄(A)r_0 where p̄ ∈ P_k is a residual polynomial.
We have just proved the following result.
Theorem 3.1.1. Let A be nonsingular and let x_k be the kth GMRES
iteration. Then for all p̄_k ∈ P_k

(3.2)    ‖r_k‖_2 = min_{p̄∈P_k} ‖p̄(A)r_0‖_2 ≤ ‖p̄_k(A)r_0‖_2.

From this we have the following corollary.
Corollary 3.1.1. Let A be nonsingular and let x_k be the kth GMRES
iteration. Then for all p̄_k ∈ P_k

(3.3)    ‖r_k‖_2 / ‖r_0‖_2 ≤ ‖p̄_k(A)‖_2.


We can apply the corollary to prove finite termination of the GMRES
iteration.
Theorem 3.1.2. Let A be nonsingular. Then the GMRES algorithm will
find the solution within N iterations.
Proof. The characteristic polynomial of A is p(z) = det(A - zI). p has
degree N, p(0) = det(A) ≠ 0 since A is nonsingular, and so

p̄_N(z) = p(z)/p(0) ∈ P_N

is a residual polynomial. It is well known [141] that p(A) = p̄_N(A) = 0. By
(3.3), r_N = b - Ax_N = 0 and hence x_N is the solution.
In Chapter 2 we applied the spectral theorem to obtain more precise information
on convergence rates. This is not an option for general nonsymmetric
matrices. However, if A is diagonalizable we may use (3.2) to get information
from clustering of the spectrum just like we did for CG. We pay a price in that
we must use complex arithmetic for the only time in this book. Recall that
A is diagonalizable if there is a nonsingular (possibly complex!) matrix V such
that

A = V ΛV^{-1}.

Here Λ is a (possibly complex!) diagonal matrix with the eigenvalues of A on
the diagonal. If A is a diagonalizable matrix and p is a polynomial then

p(A) = V p(Λ)V^{-1}.

A is normal if the diagonalizing transformation V is orthogonal. In that case
the columns of V are the eigenvectors of A and V^{-1} = V^H. Here V^H is the
complex conjugate transpose of V. In the remainder of this section we must
use complex arithmetic to analyze the convergence. Hence we will switch to
complex matrices and vectors. Recall that the scalar product in C^N, the space
of complex N-vectors, is x^H y. In particular, we will use the l^2 norm in C^N.
Our use of complex arithmetic will be implicit for the most part and is needed
only so that we may admit the possibility of complex eigenvalues of A.
We can use the structure of a diagonalizable matrix to prove the following
result.
Theorem 3.1.3. Let A = V ΛV^{-1} be a nonsingular diagonalizable matrix.
Let x_k be the kth GMRES iterate. Then for all p̄_k ∈ P_k

(3.4)    ‖r_k‖_2 / ‖r_0‖_2 ≤ κ_2(V) max_{z∈σ(A)} |p̄_k(z)|.

Proof. Let p̄_k ∈ P_k. We can easily estimate ‖p̄_k(A)‖_2 by

‖p̄_k(A)‖_2 ≤ ‖V‖_2 ‖p̄_k(Λ)‖_2 ‖V^{-1}‖_2 ≤ κ_2(V) max_{z∈σ(A)} |p̄_k(z)|,

as asserted.




It is not clear how one should estimate the condition number of the
diagonalizing transformation if it exists. If A is normal, of course, κ2 (V ) = 1.
As we did for CG, we look at some easy consequences of (3.3) and (3.4).
Theorem 3.1.4. Let A be a nonsingular diagonalizable matrix. Assume
that A has only k distinct eigenvalues. Then GMRES will terminate in at most
k iterations.
Theorem 3.1.5. Let A be a nonsingular normal matrix. Let b be a linear
combination of k of the eigenvectors of A
b = Σ_{l=1}^{k} γ_l u_{i_l}.

Then the GMRES iteration, with x0 = 0, for Ax = b will terminate in at most
k iterations.

3.2. Termination
As is the case with CG, GMRES is best thought of as an iterative method.
The convergence rate estimates for the diagonalizable case will involve κ_2(V),
but will otherwise resemble those for CG. If A is not diagonalizable, rate
estimates have been derived in [139], [134], [192], [33], and [34]. As the set of
nondiagonalizable matrices has measure zero in the space of N × N matrices,
the chances are very high that a computed matrix will be diagonalizable. This
is particularly so for the finite difference Jacobian matrices we consider in
Chapters 6 and 8. Hence we confine our attention to diagonalizable matrices.
As was the case with CG, we terminate the iteration when

(3.5)    ‖r_k‖_2 ≤ η‖b‖_2

for the purposes of this example. We can use (3.3) and (3.4) directly to estimate
the first k such that (3.5) holds without requiring a lemma like Lemma 2.3.2.
Again we look at examples. Assume that A = V ΛV^{-1} is diagonalizable,
that the eigenvalues of A lie in the interval (9, 11), and that κ_2(V) = 100.
We assume that x_0 = 0 and hence r_0 = b. Using the residual polynomial
p̄_k(z) = (10 - z)^k/10^k we find

‖r_k‖_2 / ‖r_0‖_2 ≤ (100)10^{-k} = 10^{2-k}.

Hence (3.5) holds when 10^{2-k} < η, that is, when

k > 2 - log_{10}(η).

Assume that ‖I - A‖_2 ≤ ρ < 1 and let p̄_k(z) = (1 - z)^k. It is a direct
consequence of (3.2) that

(3.6)    ‖r_k‖_2 ≤ ρ^k ‖r_0‖_2.

The estimate (3.6) illustrates the potential benefits of a good approximate
inverse preconditioner.




The convergence estimates for GMRES in the nonnormal case are much
less satisfying than those for CG, CGNR, CGNE, or GMRES in the normal
case. This is a very active area of research and we refer to [134], [33], [120],
[34], and [36] for discussions of, and pointers to additional references on, several
questions related to nonnormal matrices.

3.3. Preconditioning
Preconditioning for GMRES and other methods for nonsymmetric problems is
different from that for CG. There is no concern that the preconditioned system
be spd and hence (3.6) essentially tells the whole story. However, there are two
different ways to view preconditioning. If one can find M such that

‖I - MA‖_2 = ρ < 1,

then applying GMRES to MAx = Mb allows one to apply (3.6) to the
preconditioned system. Preconditioning done in this way is called left
preconditioning. If r = MAx - Mb is the residual for the preconditioned
system, we have, if the product MA can be formed without error,

‖e_k‖_2 / ‖e_0‖_2 ≤ κ_2(MA) ‖r_k‖_2 / ‖r_0‖_2,

by Lemma 1.1.1. Hence, if MA has a smaller condition number than A, we
might expect the relative residual of the preconditioned system to be a better
indicator of the relative error than the relative residual of the original system.
If

‖I - AM‖_2 = ρ < 1,

one can solve the system AMy = b with GMRES and then set x = My. This is
called right preconditioning. The residual of the preconditioned problem is the
same as that of the unpreconditioned problem. Hence, the value of the relative
residuals as estimators of the relative error is unchanged. Right preconditioning
has been used as the basis for a method that changes the preconditioner as the
iteration progresses [166].
One important aspect of implementation is that, unlike PCG, one can
apply the algorithm directly to the system M Ax = M b (or AM y = b). Hence,
one can write a single matrix-vector product routine for M A (or AM ) that
includes both the application of A to a vector and that of the preconditioner.
Most of the preconditioning ideas mentioned in § 2.5 are useful for GMRES
as well. In the examples in § 3.7 we use the Poisson solver preconditioner for
nonsymmetric partial differential equations. Multigrid [99] and alternating
direction [8], [182] methods have similar performance and may be more
generally applicable. Incomplete factorization (LU in this case) preconditioners
can be used [165] as can polynomial preconditioners. Some hybrid algorithms
use the GMRES/Arnoldi process itself to construct polynomial preconditioners
for GMRES or for Richardson iteration [135], [72], [164], [183]. Again we
mention [8] and [12] as good general references for preconditioning.




3.4. GMRES implementation: Basic ideas
Recall that the least squares problem defining the kth GMRES iterate is

minimize_{x∈x_0+K_k} ‖b - Ax‖_2.

Suppose one had a matrix V_k whose columns form an orthonormal basis for K_k.
Then any z ∈ K_k can be written as

z = Σ_{l=1}^{k} y_l v_l^k,

where v_l^k is the lth column of V_k. Hence we can convert (3.1) to a least squares
problem in R^k for the coefficient vector y of z = x - x_0. Since

x - x_0 = V_k y

for some y ∈ R^k, we must have x_k = x_0 + V_k y where y minimizes

‖b - A(x_0 + V_k y)‖_2 = ‖r_0 - AV_k y‖_2.

Hence, our least squares problem in R^k is

(3.7)    minimize_{y∈R^k} ‖r_0 - AV_k y‖_2.

This is a standard linear least squares problem that could be solved by a QR
factorization, say. The problem with such a direct approach is that the matrix-vector
product of A with the basis matrix V_k must be taken at each iteration.
If one uses Gram–Schmidt orthogonalization, however, one can represent
(3.7) very efficiently and the resulting least squares problem requires no extra
multiplications of A with vectors. The Gram–Schmidt procedure for formation
of an orthonormal basis for K_k is called the Arnoldi [4] process. The data are
vectors x_0 and b, a map that computes the action of A on a vector, and a
dimension k. The algorithm computes an orthonormal basis for K_k and stores
it in the columns of V.
Algorithm 3.4.1. arnoldi(x_0, b, A, k, V)
1. Define r_0 = b - Ax_0 and v_1 = r_0/‖r_0‖_2.

2. For i = 1, . . . , k - 1

       v_{i+1} = (Av_i - Σ_{j=1}^{i} ((Av_i)^T v_j)v_j) / ‖Av_i - Σ_{j=1}^{i} ((Av_i)^T v_j)v_j‖_2

If there is never a division by zero in step 2 of Algorithm arnoldi, then
the columns of the matrix V_k are an orthonormal basis for K_k. A division by
zero is referred to as breakdown and happens only if the solution to Ax = b is
in x_0 + K_{k-1}.
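A direct MATLAB transcription of the classical Gram–Schmidt form of Algorithm arnoldi might look like the sketch below; it is not code from the collection, and afun is assumed to be a function handle returning the product of A with a vector.

function V = arnoldi_sketch(x0, b, afun, k)
% Sketch of Algorithm arnoldi: builds an orthonormal basis V for K_k.
% Breakdown (division by zero) is not checked.
r0 = b - afun(x0);
V = zeros(length(b), k);
V(:, 1) = r0 / norm(r0);
for i = 1:k-1
    w = afun(V(:, i));
    w = w - V(:, 1:i) * (V(:, 1:i)' * w);   % classical Gram-Schmidt projection
    V(:, i + 1) = w / norm(w);
end
end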




Lemma 3.4.1. Let A be nonsingular, let the vectors v_j be generated by
Algorithm arnoldi, and let i be the smallest integer for which

Av_i - Σ_{j=1}^{i} ((Av_i)^T v_j)v_j = 0.

Then x = A^{-1}b ∈ x_0 + K_i.
Proof. By hypothesis Av_i ∈ K_i and hence AK_i ⊂ K_i. Since the columns of
V_i are an orthonormal basis for K_i, we have

AV_i = V_i H,

where H is an i × i matrix. H is nonsingular since A is. Setting β = ‖r_0‖_2
and e_1 = (1, 0, . . . , 0)^T ∈ R^i we have

‖r_i‖_2 = ‖b - Ax_i‖_2 = ‖r_0 - A(x_i - x_0)‖_2.

Since x_i - x_0 ∈ K_i there is y ∈ R^i such that x_i - x_0 = V_i y. Since r_0 = βV_i e_1
and V_i has orthonormal columns,

‖r_i‖_2 = ‖V_i(βe_1 - Hy)‖_2 = ‖βe_1 - Hy‖_{R^i},

where ‖·‖_{R^i} denotes the Euclidean norm in R^i.
Setting y = βH^{-1}e_1 proves that r_i = 0 by the minimization property.
The upper Hessenberg structure can be exploited to make the solution of
the least squares problems very efficient [167].
If the Arnoldi process does not break down, we can use it to implement
GMRES in an efficient way. Set h_{ij} = (Av_j)^T v_i. By the Gram–Schmidt
construction, the (k + 1) × k matrix H_k whose entries are h_{ij} is upper Hessenberg,
i.e., h_{ij} = 0 if i > j + 1. The Arnoldi process (unless it terminates prematurely
with a solution) produces matrices {V_k} with orthonormal columns such that

(3.8)    AV_k = V_{k+1} H_k.

Hence, for some y^k ∈ R^k,

r_k = b - Ax_k = r_0 - A(x_k - x_0) = V_{k+1}(βe_1 - H_k y^k).

Hence x_k = x_0 + V_k y^k, where y^k minimizes ‖βe_1 - H_k y‖_2 over R^k. Note that
when y^k has been computed, the norm of r_k can be found without explicitly
forming x_k and computing r_k = b - Ax_k. We have, using the orthogonality of
V_{k+1},

(3.9)    ‖r_k‖_2 = ‖V_{k+1}(βe_1 - H_k y^k)‖_2 = ‖βe_1 - H_k y^k‖_{R^{k+1}}.

The goal of the iteration is to find, for a given ε, a vector x so that

‖b - Ax‖_2 ≤ ε‖b‖_2.

The input is the initial iterate, x, the right-hand side b, and a map that
computes the action of A on a vector. We limit the number of iterations
to kmax and return the solution, which overwrites the initial iterate x, and the
residual norm.




Algorithm 3.4.2. gmresa(x, b, A, ε, kmax, ρ)
1. r = b - Ax, v_1 = r/‖r‖_2, ρ = ‖r‖_2, β = ρ, k = 0

2. While ρ > ε‖b‖_2 and k < kmax do

   (a) k = k + 1
   (b) for j = 1, . . . , k
           h_{jk} = (Av_k)^T v_j
   (c) v_{k+1} = Av_k - Σ_{j=1}^{k} h_{jk} v_j
   (d) h_{k+1,k} = ‖v_{k+1}‖_2
   (e) v_{k+1} = v_{k+1}/‖v_{k+1}‖_2
   (f) e_1 = (1, 0, . . . , 0)^T ∈ R^{k+1}
       Minimize ‖βe_1 - H_k y^k‖_{R^{k+1}} over R^k to obtain y^k.
   (g) ρ = ‖βe_1 - H_k y^k‖_{R^{k+1}}.

3. x_k = x_0 + V_k y^k.

Note that x_k is only computed upon termination and is not needed within
the iteration. It is an important property of GMRES that the basis for the
Krylov space must be stored as the iteration progresses. This means that in
order to perform k GMRES iterations one must store k vectors of length N.
For very large problems this becomes prohibitive and the iteration is restarted
when the available room for basis vectors is exhausted. One way to implement
this is to set kmax to the maximum number m of vectors that one can store,
call GMRES, and explicitly test the residual b - Ax_k if k = m upon termination.
If the norm of the residual is larger than ε‖b‖_2, call GMRES again with x_0 = x_k,
the result from the previous call. This restarted version of the algorithm is
called GMRES(m) in [167]. There is no general convergence theorem for the
restarted algorithm and restarting will slow the convergence down. However,
when it works it can significantly reduce the storage costs of the iteration. We
discuss implementation of GMRES(m) later in this section.
Algorithm gmresa can be implemented very straightforwardly in MATLAB.
Step 2f can be done with a single MATLAB command, the backward
division operator, at a cost of O(k^3) floating-point operations. There are more
efficient ways to solve the least squares problem in step 2f, [167], [197], and we
use the method of [167] in the collection of MATLAB codes. The savings are
slight if k is small relative to N, which is often the case for large problems, and
the simple one-line MATLAB approach can be efficient for such problems.
A more serious problem with the implementation proposed in Algorithm
gmresa is that the vectors v_j may become nonorthogonal as a result of
cancellation errors. If this happens, (3.9), which depends on this orthogonality,
will not hold and the residual and approximate solution could be inaccurate. A
partial remedy is to replace the classical Gram–Schmidt orthogonalization in
Algorithm gmresa with modified Gram–Schmidt orthogonalization. We replace




the loop in step 2c of Algorithm gmresa with

    v_{k+1} = Av_k
    for j = 1, . . . , k
        v_{k+1} = v_{k+1} - (v_{k+1}^T v_j)v_j.

While modified Gram–Schmidt and classical Gram–Schmidt are equivalent in
infinite precision, the modified form is much more likely in practice to maintain
orthogonality of the basis.
We illustrate this point with a simple example from [128], doing the
computations in MATLAB. Let δ = 10^{-7} and define

    A = [ 1   1   1
          δ   δ   0
          δ   0   δ ].

We orthogonalize the columns of A with classical Gram–Schmidt to obtain

    V = [ 1.0000e+00   1.0436e-07   9.9715e-08
          1.0000e-07   1.0456e-14  -9.9905e-01
          1.0000e-07  -1.0000e+00   4.3568e-02 ].

The columns of V are not orthogonal at all. In fact v_2^T v_3 ≈ -.004. For
modified Gram–Schmidt

    V = [ 1.0000e+00   1.0436e-07   1.0436e-07
          1.0000e-07   1.0456e-14  -1.0000e+00
          1.0000e-07  -1.0000e+00   4.3565e-16 ].

Here |v_i^T v_j - δ_{ij}| ≤ 10^{-8} for all i, j.
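The comparison can be reproduced with a few lines of MATLAB; the script below is ours, not part of the collection, and simply orthogonalizes the columns of A both ways and prints V^T V.

delta = 1e-7;
A = [1 1 1; delta delta 0; delta 0 delta];
Vc = zeros(3);  Vm = zeros(3);
for k = 1:3
    vc = A(:, k);  vm = A(:, k);
    for j = 1:k-1
        vc = vc - (A(:, k)' * Vc(:, j)) * Vc(:, j);   % classical: project the original column
        vm = vm - (vm' * Vm(:, j)) * Vm(:, j);        % modified: project the current vector
    end
    Vc(:, k) = vc / norm(vc);
    Vm(:, k) = vm / norm(vm);
end
disp(Vc' * Vc)     % off-diagonal entries show the loss of orthogonality
disp(Vm' * Vm)     % essentially the identity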

The versions we implement in the collection of MATLAB codes use modified
Gram–Schmidt. The outline of our implementation is Algorithm gmresb.
This implementation solves the upper Hessenberg least squares problem using
the MATLAB backward division operator, and is not particularly efficient. We
present a better implementation in Algorithm gmres. However, this version is
very simple and illustrates some important ideas. First, we see that x_k need
only be computed after termination as the least squares residual ρ can be used
to approximate the norm of the residual (they are identical in exact arithmetic).
Second, there is an opportunity to compensate for a loss of orthogonality in
the basis vectors for the Krylov space. One can take a second pass through the
modified Gram–Schmidt process and restore lost orthogonality [147], [160].
Algorithm 3.4.3. gmresb(x, b, A, ε, kmax, ρ)
1. r = b - Ax, v_1 = r/‖r‖_2, ρ = ‖r‖_2, β = ρ, k = 0

2. While ρ > ε‖b‖_2 and k < kmax do

   (a) k = k + 1




   (b) v_{k+1} = Av_k
       for j = 1, . . . , k
           i. h_{jk} = v_{k+1}^T v_j
           ii. v_{k+1} = v_{k+1} - h_{jk} v_j
   (c) h_{k+1,k} = ‖v_{k+1}‖_2
   (d) v_{k+1} = v_{k+1}/‖v_{k+1}‖_2
   (e) e_1 = (1, 0, . . . , 0)^T ∈ R^{k+1}
       Minimize ‖βe_1 - H_k y^k‖_{R^{k+1}} to obtain y^k ∈ R^k.
   (f) ρ = ‖βe_1 - H_k y^k‖_{R^{k+1}}.

3. x_k = x_0 + V_k y^k.

Even if modified Gram–Schmidt orthogonalization is used, one can still
lose orthogonality in the columns of V. One can test for loss of orthogonality
[22], [147], and reorthogonalize if needed or use a more stable means to create
the matrix V [195]. These more complex implementations are necessary if A is
ill conditioned or many iterations will be taken. For example, one can augment
the modified Gram–Schmidt process

• v_{k+1} = Av_k
  for j = 1, . . . , k
      h_{jk} = v_{k+1}^T v_j
      v_{k+1} = v_{k+1} - h_{jk} v_j

• h_{k+1,k} = ‖v_{k+1}‖_2

• v_{k+1} = v_{k+1}/‖v_{k+1}‖_2

with a second pass (reorthogonalization). One can reorthogonalize in every
iteration or only if a test [147] detects a loss of orthogonality. There is nothing
to be gained by reorthogonalizing more than once [147].
The modified Gram–Schmidt process with reorthogonalization looks like

• v_{k+1} = Av_k
  for j = 1, . . . , k
      h_{jk} = v_{k+1}^T v_j
      v_{k+1} = v_{k+1} - h_{jk} v_j

• h_{k+1,k} = ‖v_{k+1}‖_2

• If loss of orthogonality is detected
  For j = 1, . . . , k
      h_{tmp} = v_{k+1}^T v_j
      h_{jk} = h_{jk} + h_{tmp}
      v_{k+1} = v_{k+1} - h_{tmp} v_j

• h_{k+1,k} = ‖v_{k+1}‖_2

• v_{k+1} = v_{k+1}/‖v_{k+1}‖_2





One approach to reorthogonalization is to reorthogonalize in every step.
This doubles the cost of the computation of V and is usually unnecessary.
More efficient and equally effective approaches are based on other ideas. A
variation on a method from [147] is used in [22]. Reorthogonalization is done
after the Gram–Schmidt loop and before v_{k+1} is normalized if

(3.10)    ‖Av_k‖_2 + δ‖v_{k+1}‖_2 = ‖Av_k‖_2

to working precision. The idea is that if the new vector is very small relative
to Av_k then information may have been lost and a second pass through
the modified Gram–Schmidt process is needed. We employ this test in the
MATLAB code gmres with δ = 10^{-3}.
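As an illustration of how the test fits into the orthogonalization step, here is a sketch (our own helper, not the gmres code itself) of one Gram–Schmidt step with the test (3.10); V(:,1:k) holds the current orthonormal basis and afun is assumed to return the product of A with a vector.

function [V, h] = gs_step(V, k, afun)
% One modified Gram-Schmidt step with the reorthogonalization test (3.10).
w = afun(V(:, k));
normAv = norm(w);
h = zeros(k + 1, 1);
for j = 1:k                            % modified Gram-Schmidt pass
    h(j) = w' * V(:, j);
    w = w - h(j) * V(:, j);
end
if normAv + 1e-3 * norm(w) == normAv   % test (3.10) with delta = 1e-3
    for j = 1:k                        % second pass restores lost orthogonality
        htmp = w' * V(:, j);
        h(j) = h(j) + htmp;
        w = w - htmp * V(:, j);
    end
end
h(k + 1) = norm(w);
V(:, k + 1) = w / h(k + 1);
end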
To illustrate the effects of loss of orthogonality and those of reorthogonalization
we apply GMRES to the diagonal system Ax = b where b = (1, 1, 1)^T,
x_0 = (0, 0, 0)^T, and

(3.11)    A = [ .001   0       0
                0      .0011   0
                0      0       10^4 ].

While in infinite precision arithmetic only three iterations are needed to solve
the system exactly, we find in the MATLAB environment that a solution to full
precision requires more than three iterations unless reorthogonalization is applied
after every iteration. In Table 3.1 we tabulate relative residuals as a function
of the iteration counter for classical Gram–Schmidt without reorthogonalization
(CGM), modified Gram–Schmidt without reorthogonalization (MGM),
reorthogonalization based on the test (3.10) (MGM-PO), and reorthogonalization
in every iteration (MGM-FO). While classical Gram–Schmidt fails, the
reorthogonalization strategy based on (3.10) is almost as effective as the much
more expensive approach of reorthogonalizing in every step. The method based
on (3.10) is the default in the MATLAB code gmresa.
The kth GMRES iteration requires a matrix-vector product, k scalar
products, and the solution of the Hessenberg least squares problem in step 2e.
The k scalar products require O(kN) floating-point operations and the cost
of a solution of the Hessenberg least squares problem, by QR factorization or
the MATLAB backward division operator, say, in step 2e of gmresb is O(k^3)
floating-point operations. Hence the total cost of the m GMRES iterations is
m matrix-vector products and O(m^4 + m^2 N) floating-point operations. When
k is not too large and the cost of matrix-vector products is high, a brute-force
solution to the least squares problem using the MATLAB backward
division operator is not terribly inefficient. We provide an implementation
of Algorithm gmresb in the collection of MATLAB codes. This is an appealing
algorithm, especially when implemented in an environment like MATLAB,
because of its simplicity. For large k, however, the brute-force method can be
very costly.



Buy this book from SIAM at http://www.ec-securehost.com/SIAM/FR16.html.
Copyright ©1995 by the Society for Industrial and Applied Mathematics. This electronic version is for personal use and may not be duplicated or distributed.




43
GMRES ITERATION

Table 3.1
Effects of reorthogonalization.

k CGM MGM MGM-PO MGM-FO
0 1.00e+00 1.00e+00 1.00e+00 1.00e+00
1 8.16e-01 8.16e-01 8.16e-01 8.16e-01
2 3.88e-02 3.88e-02 3.88e-02 3.88e-02
3 6.69e-05 6.42e-08 6.42e-08 6.34e-34
4 4.74e-05 3.70e-08 5.04e-24
5 3.87e-05 3.04e-18
6 3.35e-05
7 3.00e-05
8 2.74e-05
9 2.53e-05
10 2.37e-05




3.5. Implementation: Givens rotations
If k is large, implementations using Givens rotations [167], [22], Householder
reflections [195], or a shifted Arnoldi process [197] are much more efficient
than the brute-force approach in Algorithm gmresb. The implementation in
Algorithm gmres and in the MATLAB code collection is from [167]. This
implementation maintains the QR factorization of H_k in a clever way so that
the cost for a single GMRES iteration is O(Nk) floating-point operations. The
O(k^2) cost of the triangular solve and the O(kN) cost of the construction of
x_k are incurred after termination.
A 2 × 2 Givens rotation is a matrix of the form

(3.12)    G = [ c  -s
                s   c ],

where c = cos(θ) and s = sin(θ) for θ ∈ [-π, π]. The orthogonal matrix G rotates
the vector (c, -s)^T, which makes an angle of -θ with the x-axis, through an
angle θ so that it overlaps the x-axis:

G [ c
    -s ] = [ 1
             0 ].


An N × N Givens rotation replaces a 2 × 2 block on the diagonal of the




N × N identity matrix with a 2 × 2 Givens rotation:

(3.13)    G = diag(1, . . . , 1, [ c  -s ; s  c ], 1, . . . , 1).

Our notation is that G_j(c, s) is an N × N Givens rotation of the form (3.13)
with the 2 × 2 Givens rotation in rows and columns j and j + 1.
Givens rotations are used to annihilate single nonzero elements of matrices
in reduction to triangular form [89]. They are of particular value in reducing
Hessenberg matrices to triangular form and thereby solving Hessenberg least
squares problems such as the ones that arise in GMRES. This reduction can be
accomplished in O(N) floating-point operations and hence is far more efficient
than a solution by a singular value decomposition or a reduction based on
Householder transformations. This method is also used in the QR algorithm
for computing eigenvalues [89], [184].
Let H be an N × M (N ≥ M) upper Hessenberg matrix with rank M.
We reduce H to triangular form by first multiplying the matrix by a Givens
rotation that annihilates h_{21} (and, of course, changes h_{11} and the subsequent
columns). We define G_1 = G_1(c_1, s_1) by

(3.14)    c_1 = h_{11}/√(h_{11}^2 + h_{21}^2)  and  s_1 = -h_{21}/√(h_{11}^2 + h_{21}^2).

If we replace H by G_1 H, then the first column of H now has only a single
nonzero element h_{11}. Similarly, we can now apply G_2(c_2, s_2) to H, where

(3.15)    c_2 = h_{22}/√(h_{22}^2 + h_{32}^2)  and  s_2 = -h_{32}/√(h_{22}^2 + h_{32}^2),

and annihilate h_{32}. Note that G_2 does not affect the first column of H.
Continuing in this way and setting

Q = G_M . . . G_1

we see that QH = R is upper triangular.
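A small MATLAB sketch of this reduction (our own helper, not the implementation in the collection): it applies the rotations (3.14)-(3.15) down the subdiagonal of an upper Hessenberg matrix and returns the rotation cosines and sines.

function [H, c, s] = hess_qr(H)
% Reduce an upper Hessenberg H to triangular form by Givens rotations.
% H is assumed to have at least M+1 rows, where M = size(H,2)
% (as in GMRES, where H_k is (k+1) x k).
M = size(H, 2);
c = zeros(M, 1);  s = zeros(M, 1);
for j = 1:M
    nu = sqrt(H(j, j)^2 + H(j+1, j)^2);
    c(j) = H(j, j) / nu;
    s(j) = -H(j+1, j) / nu;
    % apply G_j(c_j, s_j) to rows j and j+1 of the remaining columns
    H(j:j+1, j:M) = [c(j) -s(j); s(j) c(j)] * H(j:j+1, j:M);
end
end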
A straightforward application of these ideas to Algorithm gmres would
solve the least squares problem by computing the product of k Givens rotations
Q, setting g = βQe1 , and noting that

βe1 ’ Hk y k = Q(βe1 ’ Hk y k ) = g ’ Rk y k Rk+1 ,
Rk+1 Rk+1

where Rk is the k + 1 — k triangular factor of the QR factorization of Hk .




In the context of GMRES iteration, however, we can incrementally perform
the QR factorization of H as the GMRES iteration progresses [167]. To see
this, note that if R_k = Q_k H_k and, after orthogonalization, we add the new
column h_{k+2} to H_k, we can update both Q_k and R_k by first multiplying h_{k+2}
by Q_k (that is, applying the first k Givens rotations to h_{k+2}), then computing
the Givens rotation G_{k+1} that annihilates the (k + 2)nd element of Q_k h_{k+2},
and finally setting Q_{k+1} = G_{k+1} Q_k and forming R_{k+1} by augmenting R_k with
G_{k+1} Q_k h_{k+2}.
The MATLAB implementation of Algorithm gmres stores Q_k by storing
the sequences {c_j} and {s_j} and then computing the action of Q_k on a vector
x ∈ R^{k+1} by applying G_j(c_j, s_j) in turn to obtain

Q_k x = G_k(c_k, s_k) . . . G_1(c_1, s_1)x.

We overwrite the upper triangular part of H_k with the triangular part of the
QR factorization of H_k in the MATLAB code. The MATLAB implementation
of Algorithm gmres uses (3.10) to test for loss of orthogonality.
Algorithm 3.5.1. gmres(x, b, A, ε, kmax, ρ)
1. r = b - Ax, v_1 = r/‖r‖_2, ρ = ‖r‖_2, β = ρ,
   k = 0; g = ρ(1, 0, . . . , 0)^T ∈ R^{kmax+1}

2. While ρ > ε‖b‖_2 and k < kmax do

   (a) k = k + 1
   (b) v_{k+1} = Av_k
       for j = 1, . . . , k
           i. h_{jk} = v_{k+1}^T v_j
           ii. v_{k+1} = v_{k+1} - h_{jk} v_j
   (c) h_{k+1,k} = ‖v_{k+1}‖_2
   (d) Test for loss of orthogonality and reorthogonalize if necessary.
   (e) v_{k+1} = v_{k+1}/‖v_{k+1}‖_2
   (f) i. If k > 1 apply Q_{k-1} to the kth column of H.
       ii. ν = √(h_{k,k}^2 + h_{k+1,k}^2).
       iii. c_k = h_{k,k}/ν, s_k = -h_{k+1,k}/ν
            h_{k,k} = c_k h_{k,k} - s_k h_{k+1,k}, h_{k+1,k} = 0
       iv. g = G_k(c_k, s_k)g.
   (g) ρ = |(g)_{k+1}|.

3. Set r_{ij} = h_{ij} for 1 ≤ i, j ≤ k.
   Set (w)_i = (g)_i for 1 ≤ i ≤ k.
   Solve the upper triangular system Ry^k = w.

4. x_k = x_0 + V_k y^k.




We close with an example of an implementation of GMRES(m). This
implementation does not test for success and may, therefore, fail to terminate.
You are asked to repair this in exercise 7. Aside from the parameter m, the
arguments to Algorithm gmresm are the same as those for Algorithm gmres.
Algorithm 3.5.2. gmresm(x, b, A, ε, kmax, m, ρ)
1. gmres(x, b, A, ε, m, ρ)

2. k = m

3. While ρ > ε‖b‖_2 and k < kmax do

   (a) gmres(x, b, A, ε, m, ρ)
   (b) k = k + m

The storage costs of m iterations of gmres or of gmresm are the m + 2
vectors b, x, and {v_k}_{k=1}^m.

3.6. Other methods for nonsymmetric systems
The methods for nonsymmetric linear systems that receive most attention in
this book, GMRES, CGNR, and CGNE, share the properties that they are easy
to implement, can be analyzed by a common residual polynomial approach, and
only terminate if an acceptable approximate solution has been found. CGNR
and CGNE have the disadvantages that a transpose-vector product must be
computed for each iteration and that the coefficient matrix A^T A or AA^T has
condition number the square of that of the original matrix. In § 3.8 we give
an example of how this squaring of the condition number can lead to failure
of the iteration. GMRES needs only matrix-vector products and uses A alone,
but, as we have seen, a basis for Kk must be stored to compute xk . Hence,
the storage requirements increase as the iteration progresses. For large and
ill-conditioned problems, it may be impossible to store enough basis vectors
and the iteration may have to be restarted. Restarting can seriously degrade
performance.
An ideal method would, like CG, only need matrix-vector products, be
based on some kind of minimization principle or conjugacy, have modest
storage requirements that do not depend on the number of iterations needed
for convergence, and converge in N iterations for all nonsingular A. However,
methods based on short-term recurrences such as CG that also satisfy
minimization or conjugacy conditions cannot be constructed for general
matrices [74]. The methods we describe in this section fall short of the ideal, but
can still be quite useful. We discuss only a small subset of these methods and
refer the reader to [12] and [78] for pointers to more of the literature on this
subject. All the methods we present in this section require two matrix-vector
products for each iteration.
Consistent with our implementation of GMRES, we take the view that
preconditioners will be applied externally to the iteration. However, as with
CG, these methods can also be implemented in a manner that incorporates




the preconditioner and uses the residual for the original system to control
termination.
