10

’6

10

’7

10

’8

10

0 5 10 15 20 25

FIGURE 4.11. Absolute error (in solid line) versus the error estimated by (4.73)

(dashed line). The number of iterations is indicated on the x-axis

174 4. Iterative Methods for Solving Linear Systems

4.6.2 A Stopping Test Based on the Residual

A di¬erent stopping criterion consists of continuing the iteration until

r(k) ¤ µ, µ being a ¬xed tolerance. Note that

x ’ x(k) = A’1 b ’ x(k) = A’1 r(k) ¤ A’1 µ.

Considering instead a normalized residual, i.e. stopping the iteration as

soon as r(k) / b ¤ µ, we obtain the following control on the relative

error

A’1 r(k)

x ’ x(k) r(k)

¤ ¤ K(A) ¤ µK(A).

x x b

In the case of preconditioned methods, the residual is replaced by the pre-

conditioned residual, so that the previous criterion becomes

P’1 r(k)

¤ µ,

P’1 r(0)

where P is the preconditioning matrix.

4.7 Applications

In this section we consider two examples arising in electrical network anal-

ysis and structural mechanics which lead to the solution of large sparse

linear systems.

4.7.1 Analysis of an Electric Network

We consider a purely resistive electric network (shown in Figure 4.12, left)

which consists of a connection of n stages S (Figure 4.12, right) through

the series resistances R. The circuit is completed by the driving current

generator I0 and the load resistance RL . As an example, a purely resistive

network is a model of a signal attenuator for low-frequency applications

where capacitive and inductive e¬ects can be neglected. The connecting

points between the electrical components will be referred to henceforth as

nodes and are progressively labeled as drawn in the ¬gure. For n ≥ 1, the

total number of nodes is 4n. Each node is associated with a value of the

electric potential Vi , i = 0, . . . , 4n, which are the unknowns of the problem.

The nodal analysis method is employed to solve the problem. Precisely,

the Kirchho¬ current law is written at any node of the network leading

to the linear system YV = ˜ where V ∈ RN +1 is the vector of nodal

˜˜ ˜

I,

potentials, ˜ ∈ RN +1 is the load vector and the entries of the matrix Y ∈

˜

I

4.7 Applications 175

R4

R

1 2 4 n-1

R5

R1 R3

S S RL

I

R6

R

3 5 n

R2

FIGURE 4.12. Resistive electric network (left) and resistive stage S (right)

R(N +1)—(N +1) , for i, j = 0, . . . , 4n, are given by

±

Gij , for i = j,

˜

Yij =

j∈k(i)

’G , for i = j,

ij

where k(i) is the index set of the neighboring nodes of node i and Gij =

1/Rij is the admittance between node i and node j, provided Rij denotes

the resistance between the two nodes i and j. Since the potential is de¬ned

up to an additive constant, we arbitrarily set V0 = 0 (ground potential). As

a consequence, the number of independent nodes for potential di¬erence

computations is N = 4n ’ 1 and the linear system to be solved becomes

YV = I, where Y ∈ RN —N , V ∈ RN and I ∈ RN are obtained eliminating

the ¬rst row and column in Y and the ¬rst entry in V and ˜ respectively.

˜

˜ I,

The matrix Y is symmetric, diagonally dominant and positive de¬nite. This

last property follows by noting that

N N

˜T˜˜ Gij (Vi ’ Vj )2 ,

Gii Vi2

V YV = +

i=1 i,j=1

˜

which is always a positive quantity, being equal to zero only if V = 0. The

sparsity pattern of Y in the case n = 3 is shown in Figure 4.13 (left) while

the spectral condition number of Y as a function of the number of blocks n

is reported in Figure 4.13 (right). Our numerical computations have been

carried out setting the resistance values equal to 1 „¦ while I0 = 1 A.

In Figure 4.14 we report the convergence history of several non precondi-

tioned iterative methods in the case n = 5 corresponding to a matrix size of

19 — 19. The plots show the Euclidean norms of the residual normalized to

the initial residual. The dashed curve refers to the Gauss-Seidel method, the

dash-dotted line refers to the gradient method, while the solid and circled

lines refer respectively to the conjugate gradient (CG) and SOR method

(with an optimal value of the relaxation parameter ω 1.76 computed

according to (4.19) since Y is block tridiagonal symmetric positive de¬-

nite). The SOR method converges in 109 iterations, while the CG method

converges in 10 iterations.

We have also considered the solution of the system at hand by the conju-

gate gradient (CG) method using the Cholesky version of the ILU(0) and

176 4. Iterative Methods for Solving Linear Systems

5

0 10

2 4

10

4

3

10

6

2

10

8

1

10

10

0

12 10

0 2 4 6 8 10 12 0 5 10 15 20 25 30 35 40 45 50

FIGURE 4.13. Sparsity pattern of Y for n = 3 (left) and spectral condition

number of Y as a function of n (right)

2

10

0

10

’2

10

’4

10

’6

10

’8

10

’10

10

’12

10

’14

10

’16

10

0 50 100 150 200 250

FIGURE 4.14. Convergence history of several non preconditioned iterative meth-

ods

MILU(0) preconditioners, where drop tolerances equal to µ = 10’2 , 10’3

have been chosen for the MILU(0) preconditioner (see Section 4.3.2). Cal-

culations with both preconditioners have been done using the MATLAB

functions cholinc and michol. Table 4.2 shows the convergence iterations

of the method for n = 5, 10, 20, 40, 80, 160 and for the considered values of

µ. We report in the second column the number of nonzero entries in the

Cholesky factor of matrix Y, in the third column the number of iterations

for the CG method without preconditioning to converge, while the columns

ICh(0) and MICh(0) with µ = 10’2 and µ = 10’3 show the same infor-

mation for the CG method using the incomplete Cholesky and modi¬ed

incomplete Cholesky preconditioners, respectively.

The entries in the table are the number of iterations to converge and the

number in the brackets are the nonzero entries of the L-factor of the cor-

responding preconditioners. Notice the decrease of the iterations as µ de-

creases, as expected. Notice also the increase of the number of iterations

with respect to the increase of the size of the problem.

4.7 Applications 177

MICh(0) µ = 10’2 MICh(0) µ = 10’3

n CG ICh(0)

nz

5 114 10 9 (54) 6 (78) 4 (98)

10 429 20 15 (114) 7 (173) 5 (233)

20 1659 40 23 (234) 10 (363) 6 (503)

40 6519 80 36 (474) 14 (743) 7 (1043)

80 25839 160 62 (954) 21 (1503) 10 (2123)

160 102879 320 110 (1914) 34 (3023) 14 (4283)

TABLE 4.2. Convergence iterations for the preconditioned CG method

4.7.2 Finite Di¬erence Analysis of Beam Bending

Consider the beam clamped at the endpoints that is drawn in Figure 4.15

(left). The structure, of length L, is subject to a distributed load P , varying

along the free coordinate x and expressed in [Kgm’1 ]. We assume hence-

forth that the beam has uniform rectangular section, of width r and depth

s, momentum of inertia J = rs3 /12 and Young™s module E, expressed in

[m4 ] and [Kg m’2 ], respectively.

5

10

0

10

’5

10

P(x) 000

111

000

111 111

000

000

111 000

111

111

000 111

000

111

000 111

000

000

111 111

000

111

000 000

111 ’10

111

000 10

111

000

000

111 111

000

111

000 111

000

000

111 111

000

x

000

111 000

111

000

111 000

111

111

000 000

111

000

111 000

111 ’15

000

111 111

000 10

111

000 000

111 n=60 n=110

n=10

u(x) ’20

10

0 20 40 60 80 100 120

FIGURE 4.15. Clamped beam (left); convergence histories for the preconditioned

conjugate gradient method in the solution of system (4.76) (right)

The transverse bending of the beam, under the assumption of small dis-

placements, is governed by the following fourth-order di¬erential equation

(EJu ) (x) = P (x), 0 < x < L, (4.74)

where u = u(x) denotes the vertical displacement. The following boundary

conditions (at the endpoints x = 0 and x = L)

u(0) = u(L) = 0, u (0) = u (L) = 0, (4.75)

model the e¬ect of the two clampings (vanishing displacements and rota-

tions). To solve numerically the boundary-value problem (4.74)-(4.75), we

use the ¬nite di¬erence method (see Section 10.10.1 and Exercise 11 of

Chapter 12).

178 4. Iterative Methods for Solving Linear Systems

With this aim, let us introduce the discretization nodes xj = jh, with

h = L/Nh and j = 0, . . . , Nh , and substitute at each node xj the fourth-

order derivative with an approximation through centered ¬nite di¬erences.

Letting f (x) = P (x)/(EJ), fj = f (xj ) and denoting by ·j the (approx-

imate) nodal displacement of the beam at node xj , the ¬nite di¬erence

discretization of (4.74)-(4.75) is

·j’2 ’ 4·j’1 + 6·j ’ 4·j+1 + ·j+2 = h4 fj , ∀j = 2, . . . , Nh ’ 2,

(4.76)

·0 = ·1 = ·Nh ’1 = ·Nh = 0.

The null displacement boundary conditions in (4.76) that have been im-

posed at the ¬rst and the last two nodes of the grid, require that Nh ≥ 4.

Notice that a fourth-order scheme has been used to approximate the fourth-

order derivative, while, for sake of simplicity, a ¬rst-order approximation

has been employed to deal with the boundary conditions (see Section

10.10.1).

The Nh ’ 3 discrete equations (4.76) yield a linear system of the form

Ax = b where the unknown vector x ∈ RNh ’3 and the load vector

b ∈ RNh ’3 are given respectively by x = (·2 , ·3 , . . . , ·Nh ’2 )T and b =

(f2 , f3 , . . . , fNh ’2 )T , while the coe¬cient matrix A ∈ R(Nh ’3)—(Nh ’3) is

pentadiagonal and symmetric, given by A = pentadiagNh ’3 (1, ’4, 6, ’4, 1).

The matrix A is symmetric and positive de¬nite. Therefore, to solve

system Ax = b, the SSOR preconditioned conjugated gradient method (see

Section 4.21) and the Cholesky factorization method have been employed.

In the remainder of the section, the two methods are identi¬ed by the

symbols (CG) and (CH).

The convergence histories of CG are reported in Figure 4.15 (right),

where the sequences r(k) 2 / b(k) 2 , for the values n = 10, 60, 110, are

plotted, r(k) = b ’ Ax(k) being the residual at the k-th step. The results

have been obtained using Program 20, with toll=10’15 and ω = 1.8 in

(4.22). The initial vector x(0) has been set equal to the null vector.

As a comment to the graphs, it is worth noting that CG has required 7, 33

and 64 iterations to converge, respectively, with a maximum absolute error

of 5 · 10’15 with respect to the solution produced by CH. This latter has an

overall computational cost of 136, 1286 and 2436 ¬‚ops respectively, to be

compared with the corresponding 3117, 149424 and 541647 ¬‚ops of method

CG. As for the performances of the SSOR preconditioner, we remark that

the spectral condition number of matrix A is equal to 192, 3.8 · 105 and

4.5 · 106 , respectively, while the corresponding values in the preconditioned

case are 65, 1.2 · 104 and 1.3 · 105 .

4.8 Exercises 179

4.8 Exercises

1. The spectral radius of the matrix

a 4

B=

0 a

1/m

is ρ(B) = a. Check that if 0 < a < 1, then ρ(B) < 1, while Bm can

2

be greater than 1.

2. Let A ∈ Rn—n be a strictly diagonally dominant matrix by rows. Show

that the Gauss-Seidel method for the solution of the linear system (3.2) is

convergent.

3. Check that the matrix A = tridiag(’1, ±, ’1), with ± ∈ R, has eigenvalues

given by

»j = ± ’ 2 cos(jθ), j = 1, . . . , n

where θ = π/(n + 1) and the corresponding eigenvectors are

qj = [sin(jθ), sin(2jθ), . . . , sin(njθ)]T .

Under which conditions on ± is the matrix positive de¬nite?

[Solution : ± ≥ 2.]

4. Consider the pentadiagonal matrix A = pentadiagn (’1, ’1, 10, ’1, ’1).

Assume n = 10 and A = M + N + D, with D = diag(8, . . . , 8) ∈ R10—10 ,

M = pentadiag10 (’1, ’1, 1, 0, 0) and N = MT . To solve Ax = b, analyze

the convergence of the following iterative methods

(M + D)x(k+1) = ’Nx(k) + b,

(a)

Dx(k+1) = ’(M + N)x(k) + b,

(b)

(M + N)x(k+1) = ’Dx(k) + b.

(c)

[Solution : denoting respectively by ρa , ρb and ρc the spectral radii of the

iteration matrices of the three methods, we have ρa = 0.1450, ρb = 0.5

and ρc = 12.2870 which implies convergence for methods (a) and (b) and

divergence for method (c).]

5. For the solution of the linear system Ax = b with

1 2 3

A= , b= ,

2 3 5

consider the following iterative method

k ≥ 0,

x(k+1) = B(θ)x(k) + g(θ), with x(0) given,

where θ is a real parameter and

’2θ2 + 2θ + 1 ’θ

2θ2 + 2θ + 1 1

1 2

B(θ) = , g(θ) = .

’2θ2 + 2θ + 1 ’θ

2θ2 + 2θ + 1 1

4 2

180 4. Iterative Methods for Solving Linear Systems

Check that the method is consistent ∀ θ ∈ R. Then, determine the values

of θ for which the method is convergent and compute the optimal value

of θ (i.e., the value of the parameter for which the convergence rate is

maximum).

[Solution : the method is convergent i¬ ’1 < θ < 1/2 and the convergence

√

rate is maximum if θ = (1 ’ 3)/2.]

6. To solve the following block linear system

A1 B x b1

= ,

B A2 y b2

consider the two methods

A1 x(k+1) + By(k) = b1 , Bx(k) + A2 y(k+1) = b2 ;

(1)

A1 x(k+1) + By(k) = b1 , Bx(k+1) + A2 y(k+1) = b2 .

(2)

Find su¬cient conditions in order for the two schemes to be convergent for

any choice of the initial data x(0) , y(0) .

[Solution : method (1) is a decoupled system in the unknowns x(k+1) and

y(k+1) . Assuming that A1 and A2 are invertible, method (1) converges if

ρ(A’1 B) < 1 and ρ(A’1 B) < 1. In the case of method (2) we have a coupled

1 2

system to solve at each step in the unknowns x(k+1) and y(k+1) . Solving

formally the ¬rst equation with respect to x(k+1) (which requires A1 to be

invertible) and substituting into the second one we see that method (2) is

convergent if ρ(A’1 BA’1 B) < 1 (again A2 must be invertible).]

2 1

7. Consider the linear system Ax = b with

® ®

62 24 1 8 15 110

23 50 7 14 16

110

A= 4 22 , b= .

6 58 20 110

° 10 12 19 66 3» ° »

110

11 18 25 2 54 110

(1) Check if the Jacobi and Gauss-Seidel methods can be applied to solve

the system. (2) Check if the stationary Richardson method with optimal

parameter can be applied with P = I and P = D, where D is the diagonal

part of A, and compute the corresponding values of ±opt and ρopt .

[Solution : (1): matrix A is neither diagonally dominant nor symmetric

positive de¬nite, so that we must compute the spectral radii of the itera-

tion matrices of the Jacobi and Gauss-Seidel methods to verify if they are

convergent. It turns out that ρJ = 0.9280 and ρGS = 0.3066 which implies

convergence for both methods. (2): in the case P = I all the eigenvalues

of A are positive so that the Richardson method can be applied yielding

±opt = 0.015 and ρopt = 0.6452. If P = D the method is still applicable

and ±opt = 0.8510, ρopt = 0.6407.]

8. Consider the linear system Ax = b with

® ®

57 6 5 23

7 10 8 7 32

A= , b = .

° 6 8 10 9 » ° 33 »

57 9 10 31

4.8 Exercises 181

Analyze the convergence properties of the Jacobi and Gauss-Seidel methods

applied to the system above in their point and block forms (for a 2—2 block

partition of A).

[Solution : both methods are convergent, the block form being the faster

one. Moreover, ρ2 (BJ ) = ρ(BGS ).]

9. To solve the linear system Ax = b, consider the iterative method (4.6),

with P = D + ωF and N = ’βF ’ E, ω and β being real numbers. Check

that the method is consistent only if β = 1 ’ ω. In such a case, express

the eigenvalues of the iteration matrix as a function of ω and determine for

which values of ω the method is convergent, as well as the value of ωopt ,

assuming that A = tridiag10 (’1, 2, ’1).

[Hint : Take advantage of the result in Exercise 3.]

10. Let A ∈ Rn—n be such that A = (1+ω)P’(N+ωP), with P’1 N nonsingular

and with real eigenvalues 1 > »1 ≥ »2 ≥ . . . ≥ »n . Find the values of ω ∈ R

for which the following iterative method

k ≥ 0,

(1 + ω)Px(k+1) = (N + ωP)x(k) + b,

converges ∀x(0) to the solution of the linear system (3.2). Determine also

the value of ω for which the convergence rate is maximum.

[Solution : ω > ’(1 + »n )/2; ωopt = ’(»1 + »n )/2.]

11. Consider the linear system

3 2 2

Ax = b with A = , b= .

’8

2 6

Write the associated functional ¦(x) and give a graphical interpretation of

the solution of the linear system. Perform some iterations of the gradient

method, after proving convergence for it.

12. Check that in the gradient method x(k+2) is not an optimal direction with

respect to r(k) .

13. Show that the coe¬cients ±k and βk in the conjugate gradient method can

be written in the alternative form (4.45).

[Solution: notice Ap(k) = (r(k) ’ r(k+1) )/±k and thus (Ap(k) )T r(k+1) =

’ r(k+1) 2 /±k . Moreover, ±k (Ap(k) )T p(k) = ’ r(k) 2 .]

2 2

14. Prove the three-terms recursive relation (4.46) for the residual in the con-

jugate gradient method.

[Solution: subtract from both sides of Ap(k) = (r(k) ’ r(k+1) )/±k the quan-

tity βk’1 /±k r(k) and recall that Ap(k) = Ar(k) ’ βk’1 Ap(k’1) . Then, ex-

pressing the residual r(k) as a function of r(k’1) one immediately gets the

desired relation.]

5

Approximation of Eigenvalues and

Eigenvectors

In this chapter we deal with approximations of the eigenvalues and eigen-

vectors of a matrix A ∈ Cn—n . Two main classes of numerical methods

exist to this purpose, partial methods, which compute the extremal eigen-

values of A (that is, those having maximum and minimum module), or

global methods, which approximate the whole spectrum of A.

It is worth noting that methods which are introduced to solve the matrix

eigenvalue problem are not necessarily suitable for calculating the matrix

eigenvectors. For example, the power method (a partial method, see Section

5.3) provides an approximation to a particular eigenvalue/eigenvector pair.

The QR method (a global method, see Section 5.5) instead computes the

real Schur form of A, a canonical form that displays all the eigenvalues of

A but not its eigenvectors. These eigenvectors can be computed, starting

from the real Schur form of A, with an extra amount of work, as described

in Section 5.8.2.

Finally, some ad hoc methods for dealing e¬ectively with the special case

where A is a symmetric (n — n) matrix are considered in Section 5.10.

5.1 Geometrical Location of the Eigenvalues

Since the eigenvalues of A are the roots of the characteristic polynomial

pA (») (see Section 1.7), iterative methods must be used for their approxi-

mation when n ≥ 5. Knowledge of eigenvalue location in the complex plane

can thus be helpful in accelerating the convergence of the process.

184 5. Approximation of Eigenvalues and Eigenvectors

A ¬rst estimate is provided by Theorem 1.4,

|»| ¤ A , ∀» ∈ σ(A), (5.1)

for any consistent matrix norm · . Inequality (5.1), which is often quite

rough, states that all the eigenvalues of A are contained in a circle of radius

R A = A centered at the origin of the Gauss plane.

Another result is obtained by extending the Decomposition Property 1.23

to complex-valued matrices.

Theorem 5.1 If A ∈ Cn—n , let

iS = A ’ AH /2

H = A + AH /2 and

be the hermitian and skew-hermitian parts of A, respectively, i being the

imaginary unit. For any » ∈ σ(A)

»min (H) ¤ Re(») ¤ »max (H), »min (S) ¤ Im(») ¤ »max (S). (5.2)

Proof. From the de¬nition of H and S it follows that A = H + iS. Let u ∈ Cn ,

u 2 = 1, be the eigenvector associated with the eigenvalue »; the Rayleigh

quotient (introduced in Section 1.7) reads

» = uH Au = uH Hu + iuH Su. (5.3)

Notice that both H and S are hermitian matrices, whilst iS is skew-hermitian.

Matrices H and S are thus unitarily similar to a real diagonal matrix (see Section

1.7), and therefore their eigenvalues are real. In such a case, (5.3) yields

Re(») = uH Hu, Im(») = uH Su,

3

from which (5.2) follows.

An a priori bound for the eigenvalues of A is given by the following result.

Theorem 5.2 (of the Gershgorin circles) Let A ∈ Cn—n . Then

n n

σ(A) ⊆ SR = Ri , Ri = {z ∈ C : |z ’ aii | ¤ |aij |}. (5.4)

j=1

i=1

j=i

The sets Ri are called Gershgorin circles.

Proof. Let us decompose A as A = D + E, where D is the diagonal part of

A, whilst eii = 0 for i = 1, . . . , n. For » ∈ σ(A) (with » = aii , i = 1, . . . , n),

let us introduce the matrix B» = A ’ »I = (D ’ »I) + E. Since B» is singular,

there exists a non-null vector x ∈ Cn such that B» x = 0. This means that

((D ’ »I) + E) x = 0, that is, passing to the · ∞ norm,

x = ’(D ’ »I)’1 Ex, ¤ (D ’ »I)’1 E

x x ∞,

∞ ∞

5.1 Geometrical Location of the Eigenvalues 185

and thus

n n

|ekj | |akj |

’1

1 ¤ (D ’ »I) E = =

∞

|akk ’ »| |akk ’ »| (5.5)

j=1

j=1

j=k

for a certain k, 1 ¤ k ¤ n. Inequality (5.5) implies » ∈ Rk and thus (5.4). 3

The bounds (5.4) ensure that any eigenvalue of A lies within the union

of the circles Ri . Moreover, since A and AT share the same spectrum,

Theorem 5.2 also holds in the form

n n

σ(A) ⊆ SC = Cj , Cj = {z ∈ C : |z ’ ajj | ¤ |aij |}. (5.6)

i=1

j=1

i=j

The circles Ri in the complex plane are called row circles, and Cj column

circles. The immediate consequence of (5.4) and (5.6) is the following.

Property 5.1 (First Gershgorin theorem) For a given matrix A ∈

Cn—n ,

∀» ∈ σ(A), » ∈ SR SC . (5.7)

The following two location theorems can also be proved (see [Atk89], pp.

588-590 and [Hou75], pp. 66-67).

Property 5.2 (Second Gershgorin theorem) Let

m n

S1 = Ri , S2 = Ri .

i=1 i=m+1

If S1 © S2 = …, then S1 contains exactly m eigenvalues of A, each one being

accounted for with its algebraic multiplicity, while the remaining eigenvalues

are contained in S2 .

Remark 5.1 Properties 5.1 and 5.2 do not exclude the possibility that

there exist circles containing no eigenvalues, as happens for the matrix in

Exercise 1.

De¬nition 5.1 A matrix A ∈ Cn—n is called reducible if there exists a

permutation matrix P such that

B11 B12

PAPT = ,

0 B22

where B11 and B22 are square matrices; A is irreducible if it is not reducible.

186 5. Approximation of Eigenvalues and Eigenvectors

To check if a matrix is reducible, the oriented graph of the matrix can be

conveniently employed. Recall from Section 3.9 that the oriented graph of a

real matrix A is obtained by joining n points (called vertices of the graph)

P1 , . . . , Pn through a line oriented from Pi to Pj if the corresponding

matrix entry aij = 0. An oriented graph is strongly connected if for any

pair of distinct vertices Pi and Pj there exists an oriented path from Pi to

Pj . The following result holds (see [Var62] for the proof).

Property 5.3 A matrix A ∈ Rn—n is irreducible i¬ its oriented graph is

strongly connected.

Property 5.4 (Third Gershgorin theorem) Let A ∈ Cn—n be an irre-

ducible matrix. An eigenvalue » ∈ σ(A) cannot lie on the boundary of SR

unless it belongs to the boundary of every circle Ri , for i = 1, . . . , n.

Example 5.1 Let us consider the matrix

®

10 2 3

A = ’1 2 ’1

° »

01 3

whose spectrum is (to four signi¬cant ¬gures) σ(A) = {9.687, 2.656±i0.693}. The

following values of the norm of A: A 1 = 11, A 2 = 10.72, A ∞ = 15 and

A F = 11.36 can be used in the estimate (5.1). Estimate (5.2) provides instead

1.96 ¤ Re(»(A)) ¤ 10.34, ’2.34 ¤ Im(»(A)) ¤ 2.34, while the row and column

circles are given respectively by R1 = {|z| : |z ’10| ¤ 5}, R2 = {|z| : |z ’2| ¤ 2},

R3 = {|z| : |z ’ 3| ¤ 1} and C1 = {|z| : |z ’ 10| ¤ 1}, C2 = {|z| : |z ’ 2| ¤ 3},

C3 = {|z| : |z ’ 3| ¤ 4}.

In Figure 5.1, for i = 1, 2, 3 the Ri and Ci circles and the intersection SR © SC

(shaded areas) are drawn. In agreement with Property 5.2, we notice that an

eigenvalue is contained in C1 , which is disjoint from C2 and C3 , while the remaining

eigenvalues, thanks to Property 5.1, lie within the set R2 ∪ {C3 © R1 }. •

5.2 Stability and Conditioning Analysis

In this section we introduce some a priori and a posteriori estimates that

are relevant in the stability analysis of the matrix eigenvalue and eigenvec-

tor problem. The presentation follows the guidelines that have been traced

in Chapter 2.

A priori Estimates

5.2.1

Assume that A ∈ Cn—n is a diagonalizable matrix and denote by X =

(x1 , . . . , xn ) ∈ Cn—n the matrix of its right eigenvectors, where xk ∈ Cn

5.2 Stability and Conditioning Analysis 187

Im(z)

R1

C3

C2

C1

R2

R3

23 10 Re(z)

FIGURE 5.1. Row and column circles for matrix A in Example 5.1

for k = 1, . . . , n, such that D = X’1 AX = diag(»1 , . . . , »n ), »i being the

eigenvalues of A, i = 1, . . . , n. Moreover, let E ∈ Cn—n be a perturbation

of A. The following theorem holds.

Theorem 5.3 (Bauer-Fike) Let µ be an eigenvalue of the matrix A+E ∈

Cn—n ; then

min |» ’ µ| ¤ Kp (X) E (5.8)

p

»∈σ(A)

where · p is any matrix p-norm and Kp (X) = X p X’1 is called the

p

condition number of the eigenvalue problem for matrix A.

Proof. We ¬rst notice that if µ ∈ σ(A) then (5.8) is trivially veri¬ed, since

X p X’1 p E p ≥ 0. Let us thus assume henceforth that µ ∈ σ(A). From the

de¬nition of eigenvalue it follows that matrix (A+E’µI) is singular, which means

that, since X is invertible, the matrix X’1 (A + E ’ µI)X = D + X’1 EX ’ µI is

singular. Therefore, there exists a non-null vector x ∈ Cn such that

(D ’ µI) + X’1 EX x = 0.

Since µ ∈ σ(A), the diagonal matrix (D ’ µI) is invertible and the previous

equation can be written in the form

I + (D ’ µI)’1 (X’1 EX) x = 0.

·

Passing to the norm and proceeding as in the proof of Theorem 5.2, we get

p

1 ¤ (D ’ µI)’1 p Kp (X) E p,

from which the estimate (5.8) follows, since

(D ’ µI)’1 = ( min |» ’ µ|)’1 .

p

»∈σ(A)

3

188 5. Approximation of Eigenvalues and Eigenvectors

If A is a normal matrix, from the Schur decomposition theorem (see Section

1.8) it follows that the similarity transformation matrix X is unitary so that

Kp (X) = 1. This implies that

∀µ ∈ σ(A + E), min |» ’ µ| ¤ E p , (5.9)

»∈σ(A)

hence the eigenvalue problem is well-conditioned with respect to the abso-

lute error. This, however, does not prevent the matrix eigenvalue problem

from being a¬ected by signi¬cant relative errors, especially when A has a

widely spread spectrum.

Example 5.2 Let us consider, for 1 ¤ n ¤ 10, the calculation of the eigenvalues

of the Hilbert matrix Hn ∈ Rn—n (see Example 3.2, Chapter 3). It is symmetric

(thus, in particular, normal) and exhibits, for n ≥ 4, a very large condition

number. Let En ∈ Rn—n be a matrix having constant entries equal to · = 10’3 .

We show in Table 5.1 the results of the computation of the minimum in (5.9),

taking p = 2 (that is, En 2 = n·). Notice how the absolute error is decreasing,

since the eigenvalue of minimum module tends to zero, whilst the relative error

is increasing as the size n of the matrix increases, due to the higher sensitivity of

•

“small” eigenvalues with respect to rounding errors.

n Abs. Err. Rel. Err. En 2 K2 (Hn ) K2 (Hn + En )

1 · 10’3 1 · 10’3 1 · 10’3 1 · 10’3

1 1

1.677 · 10’4 1.446 · 10’3 2 · 10’3

2 19.28 19.26

5.080 · 10’7 2.207 · 10’3 4 · 10’3 1.551 · 104 1.547 · 104

4

1.156 · 10’12 3.496 · 10’3 8 · 10’3 1.526 · 1010 1.515 · 1010

8

1.355 · 10’15 4.078 · 10’3 1 · 10’2 1.603 · 1013 1.589 · 1013

10

TABLE 5.1. Relative and absolute errors in the calculation of the eigenvalues of

the Hilbert matrix (using the MATLAB intrinsic function eig). “Abs. Err.” and

“Rel. Err.” denote respectively the absolute and relative errors (with respect to

»)

The Bauer-Fike theorem states that the matrix eigenvalue problem is well-

conditioned if A is a normal matrix. Failure to ful¬l this property, however,

does not necessarily imply that A must exhibit a “strong” numerical sen-

sitivity to the computation of every one of its eigenvalues. In this respect,

the following result holds, which can be regarded as an a priori estimate of

the conditioning of the calculation of a particular eigenvalue of a matrix.

Theorem 5.4 Let A ∈ Cn—n be a diagonalizable matrix; let », x and y

be a simple eigenvalue of A and its associated right and left eigenvectors,

respectively, with x 2 = y 2 = 1. Moreover, for µ > 0, let A(µ) =

A + µE, with E ∈ Cn—n such that E 2 = 1. Denoting by »(µ) and x(µ) the

5.2 Stability and Conditioning Analysis 189

eigenvalue and the corresponding eigenvector of A(µ), such that »(0) = »

and x(0) = x,

‚» 1

(0) ¤ H . (5.10)

|y x|

‚µ

Proof. Let us ¬rst prove that yH x = 0. Setting Y = (y1 , . . . , yn ) = (XH )’1 ,

with yk ∈ Cn for k = 1, . . . , n, it follows that yk A = »k yk , i.e., the rows

H H

of X’1 = YH are left eigenvectors of A. Then, since YH X = I, yi xj = δij H

for i, j = 1, . . . , n, δij being the Kronecker symbol. This result is equivalent to

saying that the eigenvectors {x} of A and the eigenvectors {y} of AH form a

bi-orthogonal set (see (4.67)).

Let us now prove (5.10). Since the roots of the characteristic equation are

continuous functions of the coe¬cients of the characteristic polynomial associated

with A(µ), it follows that the eigenvalues of A(µ) are continuous functions of µ

(see, for instance, [Hen74], p. 281). Therefore, in a neighborhood of µ = 0,

(A + µE)x(µ) = »(µ)x(µ).

Di¬erentiating the previous equation with respect to µ and setting µ = 0 yields

‚x ‚» ‚x

A (0) + Ex = (0)x + » (0),

‚µ ‚µ ‚µ

from which, left-multiplying both sides by yH and recalling that yH is a left

eigenvector of A,

yH Ex

‚»

(0) = H .

‚µ yx

3

Using the Cauchy-Schwarz inequality gives the desired estimate (5.10).

Notice that |yH x| = | cos(θ» )|, where θ» is the angle between the eigenvec-

tors yH and x (both having unit Euclidean norm). Therefore, if these two

vectors are almost orthogonal the computation of the eigenvalue » turns

out to be ill-conditioned. The quantity

1 1

κ(») = = (5.11)

|yH x| | cos(θ» )|

can thus be taken as the condition number of the eigenvalue ». Obviously,

κ(») ≥ 1; when A is a normal matrix, since it is unitarily similar to a

diagonal matrix, the left and right eigenvectors y and x coincide, yielding

κ(») = 1/ x 2 = 1.

2

Inequality (5.10) can be roughly interpreted as stating that perturbations

of the order of δµ in the entries of matrix A induce changes of the order of

δ» = δµ/| cos(θ» )| in the eigenvalue ». If normal matrices are considered,

the calculation of » is a well-conditioned problem; the case of a generic non-

symmetric matrix A can be conveniently dealt with using methods based

on similarity transformations, as will be seen in later sections.

190 5. Approximation of Eigenvalues and Eigenvectors

It is interesting to check that the conditioning of the matrix eigenvalue

problem remains unchanged if the transformation matrices are unitary. To

this end, let U ∈ Cn—n be a unitary matrix and let A = UH AU. Also let

»j be an eigenvalue of A and denote by κj the condition number (5.11).

Moreover, let κj be the condition number of »j when it is regarded as an

eigenvalue of A. Finally, let {xk }, {yk } be the right and left eigenvectors of

A respectively. Clearly, {UH xk }, {UH yk } are the right and left eigenvectors

of A. Thus, for any j = 1, . . . , n,

’1

κj = yj UUH xj

H

= κj ,

from which it follows that the stability of the computation of »j is not

a¬ected by performing similarity transformations using unitary matrices.

It can also be checked that unitary transformation matrices do not change

the Euclidean length and the angles between vectors in Cn . Moreover, the

following a priori estimate holds (see [GL89], p. 317)

f l X’1 AX = X’1 AX + E, with E uK2 (X) A (5.12)

2 2

where f l(M) is the machine representation of matrix M and u is the roundo¬

unit (see Section 2.5). From (5.12) it follows that using nonunitary trans-

formation matrices in the eigenvalue computation can lead to an unstable

process with respect to rounding errors.

We conclude this section with a stability result for the approximation of

the eigenvector associated with a simple eigenvalue. Under the same as-

sumptions of Theorem 5.4, the following result holds (see for the proof,

[Atk89], Problem 6, pp. 649-650).

Property 5.5 The eigenvectors xk and xk (µ) of the matrices A and A(µ) =

A + µE, with xk (µ) 2 = xk 2 = 1 for k = 1, . . . , n, satisfy

µ

xk (µ) ’ xk ¤ + O(µ2 ), ∀k = 1, . . . , n.

E

2 2

minj=k |»k ’ »j |

Analogous to (5.11), the quantity

1

κ(xk ) =

minj=k |»k ’ »j |

can be regarded as being the condition number of the eigenvector xk . Com-

puting xk might be an ill-conditioned operation if some eigenvalues »j are

“very close” to the eigenvalue »k associated with xk .

A posteriori Estimates

5.2.2

The a priori estimates examined in the previous section characterize the

stability properties of the matrix eigenvalue and eigenvector problem. From

5.2 Stability and Conditioning Analysis 191

the implementation standpoint, it is also important to dispose of a pos-

teriori estimates that allow for a run-time control of the quality of the

approximation that is being constructed. Since the methods that will be

considered later are iterative processes, the results of this section can be

usefully employed to devise reliable stopping criteria for these latter.

Theorem 5.5 Let A ∈ Cn—n be an hermitian matrix and let (», x) be

the computed approximations of an eigenvalue/eigenvector pair (», x) of

A. De¬ning the residual as

r = Ax ’ »x, x = 0,

it then follows that

r 2

min |» ’ »i | ¤ . (5.13)

x

»i ∈σ(A) 2

Proof. Since A is hermitian, it admits a system of orthonormal eigenvectors {uk }

n

which can be taken as a basis of C . In particular, x =

n

±i ui with ±i = uH x,

i

i=1

n

±i (»i ’ »)ui . As a consequence

and thus r =

i=1

n n

2

r 2

βi (»i ’ ») , with βi = |±k | /( |±j |2 ).

2 2

= (5.14)

x 2

i=1 j=1

n

3

Since βi = 1, the inequality (5.13) immediately follows from (5.14).

i=1

The estimate (5.13) ensures that a small absolute error corresponds to a

small relative residual in the computation of the eigenvalue of the matrix

A which is closest to ».

Let us now consider the following a posteriori estimate for the eigenvector

x (for the proof, see [IK66], pp. 142-143).

Property 5.6 Under the same assumptions of Theorem 5.5, suppose that

|»i ’ »| ¤ r 2 for i = 1, . . . , m and that |»i ’ »| ≥ δ > 0 for i =

m + 1, . . . , n. Then

r 2

d(x, Um ) ¤ (5.15)

δ

where d(x, Um ) is the Euclidean distance between x and the space Um gen-

erated by the eigenvectors ui , i = 1, . . . , m associated with the eigenvalues

»i of A.

192 5. Approximation of Eigenvalues and Eigenvectors

Notice that the a posteriori estimate (5.15) ensures that a small absolute

error corresponds to a small residual in the approximation of the eigenvec-

tor associated with the eigenvalue of A that is closest to », provided that

the eigenvalues of A are well-separated (that is, if δ is su¬ciently large).

In the general case of a nonhermitian matrix A, an a posteriori estimate

can be given for the eigenvalue » only when the matrix of the eigenvectors

of A is available. We have the following result (for the proof, we refer to

[IK66], p. 146).

Property 5.7 Let A ∈ Cn—n be a diagonalizable matrix, with matrix of

eigenvectors X = [x1 , . . . , xn ]. If, for some µ > 0,

¤ µ x 2,

r 2

then

min |» ’ »i | ¤ µ X’1 X 2.

2

»i ∈σ(A)

This estimate is of little practical use, since it requires the knowledge of all

the eigenvectors of A. Examples of a posteriori estimates that can actually

be implemented in a numerical algorithm will be provided in Sections 5.3.1

and 5.3.2.

5.3 The Power Method

The power method is very good at approximating the extremal eigenvalues

of the matrix, that is, the eigenvalues having largest and smallest module,

denoted by »1 and »n respectively, as well as their associated eigenvectors.

Solving such a problem is of great interest in several real-life applications

(geosysmic, machine and structural vibrations, electric network analysis,

quantum mechanics, . . . ) where the computation of »n (and its associated

eigenvector xn ) arises in the determination of the proper frequency (and

the corresponding fundamental mode) of a given physical system. We shall

come back to this point in Section 5.12.

Having approximations of »1 and »n can also be useful in the analysis of

numerical methods. For instance, if A is symmetric and positive de¬nite,

one can compute the optimal value of the acceleration parameter of the

Richardson method and estimate its error reducing factor (see Chapter

4), as well as perform the stability analysis of discretization methods for

systems of ordinary di¬erential equations (see Chapter 11).

5.3.1 Approximation of the Eigenvalue of Largest Module

Let A ∈ Cn—n be a diagonalizable matrix and let X ∈ Cn—n be the matrix of

its eigenvectors xi , for i = 1, . . . , n. Let us also suppose that the eigenvalues

5.3 The Power Method 193

of A are ordered as

|»1 | > |»2 | ≥ |»3 | . . . ≥ |»n |, (5.16)

where »1 has algebraic multiplicity equal to 1. Under these assumptions,

»1 is called the dominant eigenvalue of matrix A.

Given an arbitrary initial vector q(0) ∈ Cn of unit Euclidean norm, consider

for k = 1, 2, . . . the following iteration based on the computation of powers

of matrices, commonly known as the power method:

z(k) = Aq(k’1)

q(k) = z(k) / z(k) (5.17)

2

ν (k) = (q(k) )H Aq(k) .

Let us analyze the convergence properties of method (5.17). By induction

on k one can check that

Ak q(0)

k ≥ 1.

(k)

q = , (5.18)

Ak q(0) 2

This relation explains the role played by the powers of A in the method.

Because A is diagonalizable, its eigenvectors xi form a basis of Cn ; it is

thus possible to represent q(0) as

n

±i ∈ C,

(0)

q = ±i xi , i = 1, . . . , n. (5.19)

i=1

Moreover, since Axi = »i xi , we have

n k

±i »i

k (0)

±1 »k

Aq = x1 + xi , k = 1, 2, . . . (5.20)

1

± »1

i=2 1

Since |»i /»1 | < 1 for i = 2, . . . , n, as k increases the vector Ak q(0) (and

thus also q(k) , due to (5.18)), tends to assume an increasingly signi¬cant

component in the direction of the eigenvector x1 , while its components in

the other directions xj decrease. Using (5.18) and (5.20), we get

±1 »k (x1 + y(k) ) x1 + y(k)

1

(k)

q = = µk ,

±1 »k (x1 + y(k) ) 2 x1 + y(k) 2

1

where µk is the sign of ±1 »k and y(k) denotes a vector that vanishes as

1

k ’ ∞.

As k ’ ∞, the vector q(k) thus aligns itself along the direction of eigen-

vector x1 , and the following error estimate holds at each step k.

194 5. Approximation of Eigenvalues and Eigenvectors

Theorem 5.6 Let A ∈ Cn—n be a diagonalizable matrix whose eigenvalues

satisfy (5.16). Assuming that ±1 = 0, there exists a constant C > 0 such

that

k

»2

’ x1 ¤C k ≥ 1,

(k)

˜

q , (5.21)

2

»1

where

n k

q(k) Ak q(0) ±i »i

2

(k)

˜

q = = x1 + xi , k = 1, 2, . . . (5.22)

±1 »k ± »1

i=2 1

1

Proof. Since A is diagonalizable, without losing generality, we can pick up the

nonsingular matrix X in such a way that its columns have unit Euclidean length,

that is xi 2 = 1 for i = 1, . . . , n. From (5.20) it thus follows that

n n

k k

±i »i ±i »i

xi ’ x1

x1 + = xi

2 2

±1 »1 ±1 »1

i=2 i=2

1/2 1/2

n n

2 2k k 2

±i »i »2 ±i

¤ ¤ ,

±1 »1 »1 ±1

i=2 i=2

1/2

n

3

2

that is (5.21) with C = (±i /±1 ) .

i=2

Estimate (5.21) expresses the convergence of the sequence q(k) towards x1 .

˜

Therefore the sequence of Rayleigh quotients

H

((˜ (k) )H A˜ (k) )/ q(k) 2

= q(k) Aq(k) = ν (k)

˜

q q 2

will converge to »1 . As a consequence, limk’∞ ν (k) = »1 , and the conver-

gence will be faster when the ratio |»2 /»1 | is smaller.

If the matrix A is real and symmetric it can be proved, always assuming

that ±1 = 0, that (see [GL89], pp. 406-407)

2k

»2

|»1 ’ ν | ¤ |»1 ’ »n | tan (θ0 )

(k) 2

, (5.23)

»1

where cos(θ0 ) = |xT q(0) | = 0. Inequality (5.23) outlines that the conver-

1

gence of the sequence ν (k) to »1 is quadratic with respect to the ratio |»2 /»1 |

(we refer to Section 5.3.3 for numerical results).

We conclude the section by providing a stopping criterion for the iteration

(5.17). For this purpose, let us introduce the residual at step k

r(k) = Aq(k) ’ ν (k) q(k) , k ≥ 1,

H

and, for µ > 0, the matrix µE(k) = ’r(k) q(k) ∈ Cn—n with E(k) = 1.

2

Since

µE(k) q(k) = ’r(k) , k ≥ 1, (5.24)

5.3 The Power Method 195

we obtain A + µE(k) q(k) = ν (k) q(k) . As a result, at each step of the

power method ν (k) is an eigenvalue of the perturbed matrix A + µE(k) .

From (5.24) and from de¬nition (1.20) it also follows that µ = r(k) 2 for

k = 1, 2, . . . . Plugging this identity back into (5.10) and approximating the

partial derivative in (5.10) by the incremental ratio |»1 ’ ν (k) |/µ, we get

r(k) 2

|»1 ’ ν | k ≥ 1,

(k)

, (5.25)

| cos(θ» )|

where θ» is the angle between the right and the left eigenvectors, x1 and y1 ,

associated with »1 . Notice that, if A is an hermitian matrix, then cos(θ» ) =

1, so that (5.25) yields an estimate which is analogue to (5.13).

In practice, in order to employ the estimate (5.25) it is necessary at each

step k to replace | cos(θ» )| with the module of the scalar product between

two approximations q(k) and w(k) of x1 and y1 , computed by the power

method. The following a posteriori estimate is thus obtained

r(k) 2

|»1 ’ ν | k ≥ 1.

(k)

, (5.26)

|(w(k) )H q(k) |

Examples of applications of (5.26) will be provided in Section 5.3.3.

5.3.2 Inverse Iteration

In this section we look for an approximation of the eigenvalue of a matrix

A ∈ Cn—n which is closest to a given number µ ∈ C, where µ ∈ σ(A).

’1

For this, the power iteration (5.17) can be applied to the matrix (Mµ ) =

’1

(A ’ µI) , yielding the so-called inverse iteration or inverse power method.

The number µ is called a shift.

The eigenvalues of M’1 are ξi = (»i ’ µ)’1 ; let us assume that there

µ

exists an integer m such that

|»m ’ µ| < |»i ’ µ|, ∀i = 1, . . . , n and i = m. (5.27)

This amounts to requiring that the eigenvalue »m which is closest to µ has

multiplicity equal to 1. Moreover, (5.27) shows that ξm is the eigenvalue of

M’1 with largest module; in particular, if µ = 0, »m turns out to be the

µ

eigenvalue of A with smallest module.

Given an arbitrary initial vector q(0) ∈ Cn of unit Euclidean norm, for

k = 1, 2, . . . the following sequence is constructed:

(A ’ µI) z(k) = q(k’1)

q(k) = z(k) / z(k) (5.28)

2

σ (k) = (q(k) )H Aq(k) .

196 5. Approximation of Eigenvalues and Eigenvectors

Notice that the eigenvectors of Mµ are the same as those of A since Mµ =

X (Λ ’ µIn ) X’1 , where Λ = diag(»1 , . . . , »n ). For this reason, the Rayleigh

quotient in (5.28) is computed directly on the matrix A (and not on M’1 ). µ

The main di¬erence with respect to (5.17) is that at each step k a linear

system with coe¬cient matrix Mµ = A ’ µI must be solved. For numerical

convenience, the LU factorization of Mµ is computed once for all at k = 1,

so that at each step only two triangular systems are to be solved, with a

cost of the order of n2 ¬‚ops.

Although being more computationally expensive than the power method

(5.17), the inverse iteration has the advantage that it can converge to any

desired eigenvalue of A (namely, the one closest to the shift µ). Inverse iter-

ation is thus ideally suited for re¬ning an initial estimate µ of an eigenvalue

of A, which can be obtained, for instance, by applying the localization tech-

niques introduced in Section 5.1. Inverse iteration can be also e¬ectively

employed to compute the eigenvector associated with a given (approximate)

eigenvalue, as described in Section 5.8.1.

In view of the convergence analysis of the iteration (5.28) we assume that

A is diagonalizable, so that q(0) can be represented in the form (5.19).

Proceeding in the same way as in the power method, we let

n k

±i ξi

(k)

˜

q = xm + xi ,

±m ξm

i=1,i=m

where xi are the eigenvectors of M’1 (and thus also of A), while ±i are as

µ

in (5.19). As a consequence, recalling the de¬nition of ξi and using (5.27),

we get

lim q(k) = xm , lim σ (k) = »m .

˜

k’∞ k’∞

Convergence will be faster when µ is closer to »m . Under the same assump-

tions made for proving (5.26), the following a posteriori estimate can be

obtained for the approximation error on »m

r(k) 2

|»m ’ σ | k ≥ 1,

(k)

, (5.29)

|(w(k) )H q(k) |

where r(k) = Aq(k) ’ σ (k) q(k) and w(k) is the k-th iterate of the inverse

power method to approximate the left eigenvector associated with »m .

5.3.3 Implementation Issues

The convergence analysis of Section 5.3.1 shows that the e¬ectiveness of

the power method strongly depends on the dominant eigenvalues being

well-separated (that is, |»2 |/|»1 | 1). Let us now analyze the behavior of

iteration (5.17) when two dominant eigenvalues of equal module exist (that

is, |»2 | = |»1 |). Three cases must be distinguished:

5.3 The Power Method 197

1. »2 = »1 : the two dominant eigenvalues are coincident. The method

is still convergent, since for k su¬ciently large (5.20) yields

Ak q(0) »k (±1 x1 + ±2 x2 )

1

which is an eigenvector of A. For k ’ ∞, the sequence q(k) (after

˜

a suitable rede¬nition) converges to a vector lying in the subspace

spanned by the eigenvectors x1 and x2 , while the sequence ν (k) still

converges to »1 .

2. »2 = ’»1 : the two dominant eigenvalues are opposite. In this case

the eigenvalue of largest module can be approximated by applying the

power method to the matrix A2 . Indeed, for i = 1, . . . , n, »i (A2 ) =

2

[»i (A)] , so that »2 = »2 and the analysis falls into the previous case,

1 2

where the matrix is now A2 .

3. »2 = »1 : the two dominant eigenvalues are complex conjugate. Here,

undamped oscillations arise in the sequence of vectors q(k) and the

power method is not convergent (see [Wil65], Chapter 9, Section 12).

As for the computer implementation of (5.17), it is worth noting that nor-

malizing the vector q(k) to 1 keeps away from over¬‚ow (when |»1 | > 1) or

under¬‚ow (when |»1 | < 1) in (5.20). We also point out that the requirement

±1 = 0 (which is a priori impossible to ful¬l when no information about

the eigenvector x1 is available) is not essential for the actual convergence

of the algorithm.

Indeed, although it can be proved that, working in exact arithmetic, the

sequence (5.17) converges to the pair (»2 , x2 ) if ±1 = 0 (see Exercise 10),

the arising of (unavoidable) rounding errors ensures that in practice the

vector q(k) contains a non-null component also in the direction of x1 . This

allows for the eigenvalue »1 to “show-up” and the power method to quickly

converge to it.

An implementation of the power method is given in Program 26. Here

and in the following algorithm, the convergence check is based on the a

posteriori estimate (5.26).

Here and in the remainder of the chapter, the input data z0, toll and

nmax are the initial vector, the tolerance for the stopping test and the

maximum admissible number of iterations, respectively. In output, the vec-

tors nu1 and err contain the sequences {ν (k) } and { r(k) 2 /| cos(θ» )|} (see

(5.26)), whilst x1 and niter are the approximation of the eigenvector x1

and the number of iterations taken by the algorithm to converge, respec-

tively.

Program 26 - powerm : Power method

function [nu1,x1,niter,err]=powerm(A,z0,toll,nmax)

198 5. Approximation of Eigenvalues and Eigenvectors

q=z0/norm(z0); q2=q; err=[]; nu1=[]; res=toll+1; niter=0; z=A*q;

while (res >= toll & niter <= nmax)

q=z/norm(z); z=A*q; lam=q™*z; x1=q;

z2=q2™*A; q2=z2/norm(z2); q2=q2™;

y1=q2; costheta=abs(y1™*x1);

if (costheta >= 5e-2),

niter=niter+1; res=norm(z-lam*q)/costheta;

err=[err; res]; nu1=[nu1; lam];

else

disp(™ Multiple eigenvalue ™); break;

end

end

A coding of the inverse power method is provided in Program 27. The

input parameter mu is the initial approximation of the eigenvalue. In output,

the vectors sigma and err contain the sequences {σ (k) } and r(k) 2 /|(w(k) )H

q(k) | (see (5.29)). The LU factorization (with partial pivoting) of the ma-

trix Mµ is carried out using the MATLAB intrinsic function lu.

Program 27 - invpower : Inverse power method

function [sigma,x,niter,err]=invpower(A,z0,mu,toll,nmax)

n=max(size(A)); M=A-mu*eye(n); [L,U,P]=lu(M);

q=z0/norm(z0); q2=q™; err=[]; sigma=[]; res=toll+1; niter=0;

while (res >= toll & niter <= nmax)

niter=niter+1; b=P*q; y=L\b; z=U\y;

q=z/norm(z); z=A*q; lam=q™*z;

b=q2™; y=U™\b; w=L™\y;

q2=(P™*w)™; q2=q2/norm(q2); costheta=abs(q2*q);

if (costheta >= 5e-2),

res=norm(z-lam*q)/costheta; err=[err; res]; sigma=[sigma; lam];

else,

disp(™ Multiple eigenvalue ™); break;

end

x=q;

end

Example 5.3 The matrix A in (5.30)

® ®

’2 ’0.944 ’0.088

15 2 0.393

A= ’3 , V = ’0.312 0.309

1 10 0.919 (5.30)

° » ° »

’2 1 0 0.112 0.013 0.947

has the following eigenvalues (to ¬ve signi¬cant ¬gures): »1 = 14.103, »2 = 10.385

and »3 = 0.512, while the corresponding eigenvectors are the vector columns of

matrix V.

5.3 The Power Method 199

0

0

10

10

’2

10

’2

10

’4

(NS)

10

’4

10

’6

10

’6

10

’8

10

’8

10 (S)

’10

10

’10

10 ’12

10

’14

’12

10

10

0 6 12 18

0 10 20 30 40 50 60 70 80

FIGURE 5.2. Comparison between the a posteriori error estimate and the actual

absolute error for matrix A in (5.30) (left); convergence curves for the power

method applied to matrix A in (5.31) in its symmetric (S) and nonsymmetric

(NS) forms (right)

To approximate the pair (»1 , x1 ), we have run the Program 26 with initial datum

z(0) = (1, 1, 1)T . After 71 iterations of the power method the absolute errors are

|»1 ’ ν (71) | = 7.91 · 10’11 and x1 ’ x1 ’11

(71)

∞ = 1.42 · 10 .

(0)

In a second run, we have used z = x2 + x3 (notice that with this choice

±1 = 0). After 215 iterations the absolute errors are |»1 ’ ν (215) | = 4.26 · 10’14

’14