λi = ∞, if tii ≠ 0, sii = 0.

Exactly as in the matrix eigenvalue problem, the generalized Schur form cannot be explicitly computed, so the counterpart of the real Schur form (5.34) has to be computed. Assuming that the matrices A and B are real, it can be shown that there exist two orthogonal matrices Ũ and Z̃ such that T̃ = Ũ^T A Z̃ is upper quasi-triangular and S̃ = Ũ^T B Z̃ is upper triangular.

This decomposition is known as the generalized real Schur decomposition of the pair (A,B) and can be computed by a suitably modified version of the QR algorithm, known as the QZ iteration, which consists of the following steps (for a more detailed description, see [GL89], Section 7.7, [Dat95], Section 9.3):

1. reduce A and B to upper Hessenberg form and upper triangular form, respectively, i.e., find two orthogonal matrices Q and Z such that A = Q^T A Z is upper Hessenberg and B = Q^T B Z is upper triangular;

2. apply the QR iteration to the matrix AB^{-1} to reduce it to real Schur form.

To save computational resources, the QZ algorithm overwrites the matrices A and B with their upper Hessenberg and triangular forms, and requires 30n^3 flops; an additional cost of 36n^3 operations is incurred if Q and Z are also needed. The method is implemented in the LAPACK library in the subroutine sgges and can be invoked in the MATLAB environment with the command eig(A,B).
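To see the decomposition at work, one can reverse the process: the NumPy sketch below (an illustration we added; the matrices are random, and T is taken fully upper triangular, i.e. the case of real generalized eigenvalues) assembles a pair (A,B) from prescribed triangular factors and recovers the generalized eigenvalues as the diagonal ratios tii/sii.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4

# Prescribed generalized Schur factors: here T is fully upper triangular,
# which corresponds to the case of real generalized eigenvalues.
T = np.triu(rng.standard_normal((n, n)))
S = np.triu(rng.standard_normal((n, n)))
np.fill_diagonal(T, [1.0, -2.0, 0.5, 3.0])
np.fill_diagonal(S, [1.0, 2.0, 4.0, 0.5])   # s_ii != 0: all eigenvalues finite

# Random orthogonal U, Z obtained from QR factorizations.
U, _ = np.linalg.qr(rng.standard_normal((n, n)))
Z, _ = np.linalg.qr(rng.standard_normal((n, n)))

# Pair whose generalized real Schur form is (T, S): T = U^T A Z, S = U^T B Z.
A = U @ T @ Z.T
B = U @ S @ Z.T

# Generalized eigenvalues are the ratios t_ii / s_ii; since B is invertible
# here, they coincide with the eigenvalues of B^{-1} A.
lam = np.diag(T) / np.diag(S)
lam_ref = np.linalg.eigvals(np.linalg.solve(B, A))
print(np.allclose(np.sort(lam), np.sort(lam_ref.real)))   # True
```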

5.9.2 Generalized Real Schur Form of Symmetric-Definite Pencils

A remarkable situation occurs when both A and B are symmetric and one of them, say B, is also positive definite. In such a case the pair (A,B) forms a symmetric-definite pencil, for which the following result holds.

Theorem 5.7 The symmetric-definite pencil (A,B) has real eigenvalues and linearly independent eigenvectors. Moreover, the matrices A and B can be simultaneously diagonalized. Precisely, there exists a nonsingular matrix X ∈ R^{n×n} such that

X^T A X = Λ = diag(λ1, λ2, ..., λn),   X^T B X = In,

where, for i = 1, ..., n, λi are the eigenvalues of the pencil (A,B).

Proof. Since B is symmetric positive definite, it admits a unique Cholesky factorization B = H^T H, where H is upper triangular (see Section 3.4.2). From (5.58) we deduce that Cz = λz with C = H^{-T} A H^{-1}, z = Hx, where (λ, x) is an eigenvalue/eigenvector pair of (A,B).

The matrix C is symmetric; therefore, its eigenvalues are real and a set of orthonormal eigenvectors (y1, ..., yn) = Y exists. As a consequence, letting X = H^{-1} Y allows for simultaneously diagonalizing both A and B, since

X^T A X = Y^T H^{-T} A H^{-1} Y = Y^T C Y = Λ = diag(λ1, ..., λn),
X^T B X = Y^T H^{-T} B H^{-1} Y = Y^T Y = In. ∎

The following QR-Cholesky algorithm computes the eigenvalues λi and the corresponding eigenvectors xi of a symmetric-definite pencil (A,B), for i = 1, ..., n (for more details see [GL89], Section 8.7, [Dat95], Section 9.5):

1. compute the Cholesky factorization B = H^T H;

2. compute C = H^{-T} A H^{-1};

3. for i = 1, ..., n, compute the eigenvalues λi and eigenvectors zi of the symmetric matrix C using the QR iteration. Then construct from the set {zi} an orthonormal set of eigenvectors {yi} (using, for instance, the modified Gram-Schmidt procedure of Section 3.4.3);

4. for i = 1, ..., n, compute the eigenvectors xi of the pencil (A,B) by solving the systems Hxi = yi.
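As an illustration of the steps just listed, here is a minimal NumPy sketch (not the book's MATLAB programs; numpy.linalg.eigh stands in for the QR iteration on the symmetric matrix C):

```python
import numpy as np

def qr_cholesky_eig(A, B):
    """Eigenpairs of the symmetric-definite pencil (A, B): A x = lambda B x."""
    L = np.linalg.cholesky(B)        # B = L L^T, so H = L^T is upper triangular
    H = L.T
    Hinv = np.linalg.inv(H)
    C = Hinv.T @ A @ Hinv            # step 2: C = H^{-T} A H^{-1}, symmetric
    lam, Y = np.linalg.eigh(C)       # step 3: real eigenvalues, orthonormal Y
    X = np.linalg.solve(H, Y)        # step 4: solve H x_i = y_i columnwise
    return lam, X

# A small symmetric-definite test pencil.
A = np.array([[2.0, 1.0], [1.0, 3.0]])
B = np.array([[2.0, 0.5], [0.5, 1.0]])
lam, X = qr_cholesky_eig(A, B)
print(np.allclose(A @ X, B @ X @ np.diag(lam)))   # True
```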


This algorithm requires on the order of 14n^3 flops, and it can be shown (see [GL89], p. 464) that, if λ̂ is a computed eigenvalue, then

λ̂ ∈ σ(H^{-T} A H^{-1} + E),   with ‖E‖2 ≈ u ‖A‖2 ‖B^{-1}‖2.

Thus, the generalized eigenvalue problem in the symmetric-definite case may become unstable with respect to rounding error propagation if B is ill-conditioned. For a stabilized version of the QR-Cholesky method, see [GL89], p. 464 and the references cited therein.

5.10 Methods for Eigenvalues of Symmetric Matrices

In this section we deal with the computation of the eigenvalues of a symmetric matrix A ∈ R^{n×n}. Besides the QR method previously examined, specific algorithms which take advantage of the symmetry of A are available.

Among these, we first consider the Jacobi method, which generates a sequence of matrices orthogonally similar to A and converging to the diagonal Schur form of A. Then, the Sturm sequence and Lanczos procedures are presented, for handling the case of tridiagonal matrices and of large sparse matrices, respectively.

5.10.1 The Jacobi Method

The Jacobi method generates a sequence of matrices A^(k) that are orthogonally similar to the matrix A and converge to a diagonal matrix whose entries are the eigenvalues of A. This is done using the Givens similarity transformations (5.43), as follows.

Given A^(0) = A, for any k = 1, 2, ..., a pair of indices p and q is fixed, with 1 ≤ p < q ≤ n. Next, letting G_pq = G(p, q, θ), the matrix A^(k) = (G_pq)^T A^(k-1) G_pq, orthogonally similar to A, is constructed in such a way that

a_ij^(k) = 0   if (i, j) = (p, q).   (5.59)

Letting c = cos θ and s = sin θ, the procedure for computing the entries of A^(k) that change with respect to those of A^(k-1) can be written as

[ a_pp^(k)   a_pq^(k) ]     [  c  s ]^T  [ a_pp^(k-1)   a_pq^(k-1) ]  [  c  s ]
[ a_pq^(k)   a_qq^(k) ]  =  [ -s  c ]    [ a_pq^(k-1)   a_qq^(k-1) ]  [ -s  c ] .   (5.60)

(k’1) (k’1)

If apq = 0, we can satisfy (5.59) by taking c = 1 and s = 0. If apq =

0, letting t = s/c, (5.60) requires the solution of the following algebraic

228 5. Approximation of Eigenvalues and Eigenvectors

equation

(k’1) (k’1)

’ app

aqq

t + 2·t ’ 1 = 0,

2

·= . (5.61)

(k’1)

2apq

The root t = 1/(η + √(1 + η^2)) is chosen in (5.61) if η ≥ 0, otherwise we take t = -1/(-η + √(1 + η^2)); next, we let

c = 1/√(1 + t^2),   s = ct.   (5.62)
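The rotation parameters (5.61)-(5.62) can be checked directly; the Python sketch below (our own variable names) computes (c, s) from the three entries of the 2×2 block and verifies that the rotated off-diagonal entry vanishes:

```python
import math

def jacobi_cs(app, aqq, apq):
    """Rotation parameters (c, s) annihilating the (p, q) entry,
    as in (5.61)-(5.62)."""
    if apq == 0.0:
        return 1.0, 0.0
    eta = (aqq - app) / (2.0 * apq)
    if eta >= 0.0:
        t = 1.0 / (eta + math.sqrt(1.0 + eta * eta))
    else:
        t = -1.0 / (-eta + math.sqrt(1.0 + eta * eta))
    c = 1.0 / math.sqrt(1.0 + t * t)
    return c, c * t

app, aqq, apq = 4.0, 1.0, 2.0
c, s = jacobi_cs(app, aqq, apq)
# (1,2) entry of G^T A G for the 2x2 block A = [[app, apq], [apq, aqq]],
# with G = [[c, s], [-s, c]]:
new_apq = (c * c - s * s) * apq + c * s * (app - aqq)
print(abs(new_apq) < 1e-14)   # True
```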

To examine the rate at which the off-diagonal entries of A^(k) tend to zero, it is convenient to introduce, for any matrix M ∈ R^{n×n}, the nonnegative quantity

Ψ(M) = ( Σ_{i,j=1, i≠j}^{n} m_ij^2 )^{1/2} = ( ‖M‖F^2 - Σ_{i=1}^{n} m_ii^2 )^{1/2}.   (5.63)
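In other words, Ψ(M) is the Frobenius norm of the off-diagonal part of M; the identity in (5.63) is immediate to confirm numerically (a NumPy sketch):

```python
import numpy as np

def psi(M):
    """Psi(M): Frobenius norm of the off-diagonal part of M, cf. (5.63)."""
    return np.linalg.norm(M - np.diag(np.diag(M)), 'fro')

rng = np.random.default_rng(1)
M = rng.standard_normal((5, 5))
lhs = psi(M)
rhs = np.sqrt(np.linalg.norm(M, 'fro') ** 2 - np.sum(np.diag(M) ** 2))
print(np.isclose(lhs, rhs))   # True
```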

The Jacobi method ensures that Ψ(A^(k)) ≤ Ψ(A^(k-1)) for any k ≥ 1. Indeed, the computation of (5.63) for the matrix A^(k) yields

(Ψ(A^(k)))^2 = (Ψ(A^(k-1)))^2 - 2 (a_pq^(k-1))^2 ≤ (Ψ(A^(k-1)))^2.   (5.64)

The estimate (5.64) suggests that, at each step k, the optimal choice of the indices p and q is the one corresponding to the entry of A^(k-1) such that

|a_pq^(k-1)| = max_{i≠j} |a_ij^(k-1)|.

The computational cost of this strategy is of the order of n^2 flops for the search of the maximum-modulus entry, while the updating step A^(k) = (G_pq)^T A^(k-1) G_pq only requires a cost of the order of n flops, as already noticed in Section 5.6.5. It is thus convenient to resort to the so-called row-cyclic Jacobi method, in which the choice of the indices p and q is made by sweeping the matrix A^(k) row by row according to the following algorithm: for any k = 1, 2, ... and for any i-th row of A^(k) (i = 1, ..., n-1), we set p = i and q = (i+1), ..., n. Each complete sweep requires N = n(n-1)/2 Jacobi transformations. Assuming that |λi - λj| ≥ δ for i ≠ j, it can be shown that the cyclic Jacobi method converges quadratically, that is (see [Wil65], [Wil62])

Ψ(A^(k+N)) ≤ (1/(δ√2)) (Ψ(A^(k)))^2,   k = 1, 2, ...

For further details of the algorithm, we refer to [GL89], Section 8.4.


Example 5.15 Let us apply the cyclic Jacobi method to the Hilbert matrix H4, whose eigenvalues read (to five significant figures) λ1 = 1.5002, λ2 = 1.6914·10^{-1}, λ3 = 6.7383·10^{-3} and λ4 = 9.6702·10^{-5}. Running Program 40 with toll = 10^{-15}, the method converges in 3 sweeps to a matrix whose diagonal entries coincide with the eigenvalues of H4 up to 4.4409·10^{-16}. As for the off-diagonal entries, the values attained by Ψ(H4^(k)) are reported in Table 5.3. •

Sweep   Ψ(H4^(k))
1       5.262·10^{-2}
2       3.824·10^{-5}
3       5.313·10^{-16}

TABLE 5.3. Convergence of the cyclic Jacobi algorithm

Formulae (5.63) and (5.62) are implemented in Programs 38 and 39.

Program 38 - psinorm : Evaluation of Ψ(A)

function [psi]=psinorm(A)
n=max(size(A)); psi=0;
for i=1:(n-1), for j=(i+1):n, psi=psi+A(i,j)^2+A(j,i)^2; end; end; psi=sqrt(psi);

Program 39 - symschur : Evaluation of c and s

function [c,s]=symschur(A,p,q)
if (A(p,q)==0), c=1; s=0; else,
eta=(A(q,q)-A(p,p))/(2*A(p,q));
if (eta >= 0), t=1/(eta+sqrt(1+eta^2));
else, t=-1/(-eta+sqrt(1+eta^2)); end; c=1/sqrt(1+t^2); s=c*t;
end

A coding of the cyclic Jacobi method is implemented in Program 40. This program gets as input parameters the symmetric matrix A ∈ R^{n×n} and a tolerance toll. The program returns a matrix D = G^T A G, G being orthogonal, such that Ψ(D) ≤ toll·‖A‖F, the value of Ψ(D) and the number of sweeps needed to achieve convergence.

Program 40 - cycjacobi : Cyclic Jacobi method for symmetric matrices

function [D,sweep,psi]=cycjacobi(A,toll)
n=max(size(A)); D=A; psiD=norm(A,'fro');
epsi=toll*psiD; psiD=psinorm(D); psi=psiD; sweep=0;
while (psiD > epsi), sweep=sweep+1;
for p=1:(n-1), for q=(p+1):n
[c,s]=symschur(D,p,q); [D]=gacol(D,c,s,1,n,p,q); [D]=garow(D,c,s,p,q,1,n);
end; end; psiD=psinorm(D); psi=[psi; psiD];
end
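For readers working outside MATLAB, a simplified NumPy analogue of the row-cyclic sweep (the rotation is applied by forming G explicitly rather than through gacol/garow; the test matrix is the Hilbert matrix of Example 5.15) reads:

```python
import numpy as np

def cyclic_jacobi(A, tol=1e-12, max_sweeps=50):
    """Row-cyclic Jacobi sweeps; returns the diagonal of the final matrix."""
    D = A.astype(float).copy()
    n = D.shape[0]
    eps = tol * np.linalg.norm(A, 'fro')
    for sweep in range(1, max_sweeps + 1):
        for p in range(n - 1):
            for q in range(p + 1, n):
                if D[p, q] == 0.0:
                    continue
                eta = (D[q, q] - D[p, p]) / (2.0 * D[p, q])
                # root of t^2 + 2*eta*t - 1 = 0, as in (5.61)
                t = np.sign(eta) / (abs(eta) + np.hypot(1.0, eta)) if eta != 0 else 1.0
                c = 1.0 / np.sqrt(1.0 + t * t)
                s = c * t
                G = np.eye(n)
                G[p, p] = G[q, q] = c
                G[p, q] = s
                G[q, p] = -s
                D = G.T @ D @ G
        off = np.linalg.norm(D - np.diag(np.diag(D)), 'fro')
        if off <= eps:
            break
    return np.diag(D), sweep

# Hilbert matrix H4 as in Example 5.15.
H4 = np.array([[1.0 / (i + j + 1) for j in range(4)] for i in range(4)])
d, sweeps = cyclic_jacobi(H4)
print(np.allclose(np.sort(d), np.linalg.eigvalsh(H4)))   # True
```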


5.10.2 The Method of Sturm Sequences

In this section we deal with the calculation of the eigenvalues of a real, symmetric and tridiagonal matrix T. Typical instances of such a problem arise when applying the Householder transformation to a given symmetric matrix A (see Section 5.6.2) or when solving boundary value problems in one spatial dimension (see Section 5.12.1 for an example).

We analyze the method of Sturm sequences, or Givens method, introduced in [Giv54]. For i = 1, ..., n, we denote by di the diagonal entries of T and by bi, i = 1, ..., n-1, the entries of its upper and lower subdiagonals. We shall assume that bi ≠ 0 for any i; otherwise, the computation reduces to problems of lower complexity.

Letting Ti be the principal minor of order i of the matrix T and p0(x) = 1, we define for i = 1, ..., n the following sequence of polynomials pi(x) = det(Ti - xIi):

p1(x) = d1 - x,
pi(x) = (di - x) p_{i-1}(x) - b_{i-1}^2 p_{i-2}(x),   i = 2, ..., n.   (5.65)

It can be checked that pn is the characteristic polynomial of T; the computational cost of its evaluation at a point x is of the order of 2n flops. The sequence (5.65) is called the Sturm sequence, owing to the following result, for whose proof we refer to [Wil65], Chapter 2, Section 47 and Chapter 5, Section 37.

Property 5.11 (of the Sturm sequence) For i = 2, ..., n, the eigenvalues of T_{i-1} strictly separate those of Ti, that is,

λi(Ti) < λ_{i-1}(T_{i-1}) < λ_{i-1}(Ti) < ... < λ2(Ti) < λ1(T_{i-1}) < λ1(Ti).

Moreover, letting, for any real number μ,

S_μ = {p0(μ), p1(μ), ..., pn(μ)},

the number s(μ) of sign changes in S_μ yields the number of eigenvalues of T that are strictly less than μ, with the convention that pi(μ) has sign opposite to that of p_{i-1}(μ) if pi(μ) = 0 (two consecutive elements of the sequence cannot vanish at the same value of μ).
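Property 5.11 translates directly into a short routine; the plain-Python sketch below (our own naming) evaluates the sequence (5.65) and counts sign changes, handling the zero-sign convention explicitly:

```python
def sturm_sign_changes(d, b, mu):
    """s(mu): number of eigenvalues of T = tridiag(b, d, b) strictly
    less than mu, via sign changes in the Sturm sequence (5.65)."""
    changes = 0
    sign_prev = 1                      # sign of p_0(mu) = 1
    p_prev, p_cur = 1.0, d[0] - mu     # p_0 and p_1
    for i in range(len(d)):
        if i > 0:
            p_prev, p_cur = p_cur, (d[i] - mu) * p_cur - b[i - 1] ** 2 * p_prev
        # zero convention: p_i takes the sign opposite to p_{i-1}
        sign_cur = -sign_prev if p_cur == 0.0 else (1 if p_cur > 0 else -1)
        if sign_cur != sign_prev:
            changes += 1
        sign_prev = sign_cur
    return changes

# T = tridiag_4(-1, 2, -1), mu = 3: three eigenvalues below 3 (Example 5.16)
d = [2.0, 2.0, 2.0, 2.0]
b = [-1.0, -1.0, -1.0]
print(sturm_sign_changes(d, b, 3.0))   # 3
```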

Example 5.16 Let T be the tridiagonal part of the Hilbert matrix H4 ∈ R^{4×4}, having entries hij = 1/(i+j-1). The eigenvalues of T are (to five significant figures) λ1 = 1.2813, λ2 = 0.4205, λ3 = -0.1417 and λ4 = 0.1161. Taking μ = 0, Program 41 computes the following Sturm sequence

S_0 = {p0(0), p1(0), p2(0), p3(0), p4(0)} = {1, 1, 0.0833, -0.0458, -0.0089},

from which, applying Property 5.11, one concludes that the matrix T has one eigenvalue less than 0. In the case of the matrix T = tridiag4(-1, 2, -1), with eigenvalues {0.38, 1.38, 2.62, 3.62} (to three significant figures), taking μ = 3 we get

{p0(3), p1(3), p2(3), p3(3), p4(3)} = {1, -1, 0, 1, -1},

which shows that the matrix T has three eigenvalues less than 3, since three sign changes occur. •

The Givens method for the calculation of the eigenvalues of T proceeds as follows. Letting b0 = bn = 0, Theorem 5.2 yields the interval J = [α, β] containing the spectrum of T, where

α = min_{1≤i≤n} [di - (|b_{i-1}| + |bi|)],   β = max_{1≤i≤n} [di + (|b_{i-1}| + |bi|)].

The set J is used as an initial guess in the search for the generic eigenvalue λi of the matrix T, for i = 1, ..., n, using the bisection method (see Chapter 6).

Precisely, given a^(0) = α and b^(0) = β, we let c^(0) = (α + β)/2 and compute s(c^(0)); then, recalling Property 5.11, we let b^(1) = c^(0) if s(c^(0)) > (n - i), otherwise we set a^(1) = c^(0). After r iterations, the value c^(r) = (a^(r) + b^(r))/2 provides an approximation of λi within (|α| + |β|)·2^{-(r+1)}, as shown in (6.9).

A systematic procedure can be set up to store any information about the position within the interval J of the eigenvalues of T that are being computed by the Givens method. The resulting algorithm generates a sequence of neighboring subintervals [a_j^(r), b_j^(r)], for j = 1, ..., n, each of arbitrarily small length and containing one eigenvalue λj of T (for further details, see [BMW67]).
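The whole bisection procedure is compact enough to sketch in full (plain Python; the interval [α, β] is built from the bounds above, the Sturm sign counter is inlined, and the names are ours):

```python
def count_below(d, b, mu):
    # sign changes in the Sturm sequence (5.65) = #{eigenvalues < mu}
    changes, sign_prev = 0, 1
    p_prev, p_cur = 1.0, d[0] - mu
    for i in range(len(d)):
        if i > 0:
            p_prev, p_cur = p_cur, (d[i] - mu) * p_cur - b[i - 1] ** 2 * p_prev
        sign_cur = -sign_prev if p_cur == 0.0 else (1 if p_cur > 0 else -1)
        changes += sign_cur != sign_prev
        sign_prev = sign_cur
    return changes

def givens_eig(d, b, i, tol=1e-10):
    """Approximate lambda_i, eigenvalues numbered in decreasing order."""
    n = len(d)
    bb = [0.0] + list(b) + [0.0]               # b_0 = b_n = 0
    lo = min(d[k] - (abs(bb[k]) + abs(bb[k + 1])) for k in range(n))
    hi = max(d[k] + (abs(bb[k]) + abs(bb[k + 1])) for k in range(n))
    while hi - lo > tol * (abs(lo) + abs(hi)):
        c = (lo + hi) / 2.0
        if count_below(d, b, c) > n - i:       # too many eigenvalues below c
            hi = c
        else:
            lo = c
    return (lo + hi) / 2.0

d = [2.0] * 4
b = [-1.0] * 3
lam2 = givens_eig(d, b, 2)                     # cf. Example 5.17
print(abs(lam2 - 2.6180339887) < 1e-6)         # True
```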

Example 5.17 Let us employ the Givens method to compute the eigenvalue λ2 ≃ 2.62 of the matrix T considered in Example 5.16. Letting toll = 10^{-4} in Program 42, we obtain the results reported in Table 5.4, which demonstrate the convergence of the sequence c^(k) to the desired eigenvalue in 13 iterations. We have denoted, for brevity, s^(k) = s(c^(k)). Similar results are obtained by running Program 42 to compute the remaining eigenvalues of T. •

k   a^(k)    b^(k)   c^(k)    s^(k)      k    a^(k)    b^(k)   c^(k)    s^(k)
0   0        4.000   2.0000   2          7    2.5938   2.625   2.6094   2
1   2.0000   4.000   3.0000   3          8    2.6094   2.625   2.6172   2
2   2.0000   3.000   2.5000   2          9    2.6094   2.625   2.6172   2
3   2.5000   3.000   2.7500   3          10   2.6172   2.625   2.6211   3
4   2.5000   2.750   2.6250   3          11   2.6172   2.621   2.6191   3
5   2.5000   2.625   2.5625   2          12   2.6172   2.619   2.6182   3
6   2.5625   2.625   2.5938   2          13   2.6172   2.618   2.6177   2

TABLE 5.4. Convergence of the Givens method for the calculation of the eigenvalue λ2 of the matrix T in Example 5.16


An implementation of the polynomial evaluation (5.65) is given in Program 41. This program receives in input the vectors dd and bb containing the main diagonal and the upper subdiagonal of T. The output values pi(x) are stored, for i = 0, ..., n, in the vector p.

Program 41 - sturm : Sturm sequence evaluation

function [p]=sturm(dd,bb,x)
n=length(dd); p(1)=1; p(2)=dd(1)-x;
for i=2:n, p(i+1)=(dd(i)-x)*p(i)-bb(i-1)^2*p(i-1); end

A basic implementation of the Givens method is provided in Program 42. In input, ind is the pointer to the sought eigenvalue, while the other parameters are similar to those in Program 41. In output, the values of the elements of the sequences a^(k), b^(k) and c^(k) are returned, together with the required number of iterations niter and the sequence of sign changes s(c^(k)).

Program 42 - givsturm : Givens method using the Sturm sequence

function [ak,bk,ck,nch,niter]=givsturm(dd,bb,ind,toll)
[a,b]=bound(dd,bb); dist=abs(b-a); s=abs(b)+abs(a);
n=length(dd); niter=0; nch=[];
while (dist > (toll*s)),
niter=niter+1; c=(b+a)/2;
ak(niter)=a; bk(niter)=b; ck(niter)=c;
nch(niter)=chcksign(dd,bb,c);
if (nch(niter) > (n-ind)), b=c;
else, a=c; end; dist=abs(b-a); s=abs(b)+abs(a);
end

Program 43 - chcksign : Sign changes in the Sturm sequence

function nch=chcksign(dd,bb,x)

[p]=sturm(dd,bb,x); n=length(dd); nch=0; s=0;

for i=2:(n+1),

if ((p(i)*p(i-1)) <= 0), nch=nch+1; end

if (p(i)==0), s=s+1; end

end

nch=nch-s;

Program 44 - bound : Calculation of the interval J = [α, β]

function [alfa,beta]=bound(dd,bb)

n=length(dd); alfa=dd(1)-abs(bb(1)); temp=dd(n)-abs(bb(n-1));

if (temp < alfa), alfa=temp; end;

for i=2:(n-1),

temp=dd(i)-abs(bb(i-1))-abs(bb(i));


if (temp < alfa), alfa=temp; end;

end

beta=dd(1)+abs(bb(1)); temp=dd(n)+abs(bb(n-1));

if (temp > beta), beta=temp; end;

for i=2:(n-1),

temp=dd(i)+abs(bb(i-1))+abs(bb(i));

if (temp > beta), beta=temp; end;

end

5.11 The Lanczos Method

Let A ∈ R^{n×n} be a symmetric sparse matrix whose (real) eigenvalues are ordered as

λ1 ≥ λ2 ≥ ... ≥ λ_{n-1} ≥ λn.   (5.66)

When n is very large, the Lanczos method [Lan50] described in Section 4.4.3 can be applied to approximate the extremal eigenvalues λn and λ1. It generates a sequence of tridiagonal matrices Hm whose extremal eigenvalues rapidly converge to the extremal eigenvalues of A.

To estimate the convergence of the tridiagonalization process, we introduce the Rayleigh quotient r(x) = (x^T A x)/(x^T x) associated with a nonnull vector x ∈ R^n. The following result, known as the Courant-Fischer theorem, holds (for the proof see [GL89], p. 394):

λ1(A) = max_{x∈R^n, x≠0} r(x),   λn(A) = min_{x∈R^n, x≠0} r(x).

Its application to the matrix Hm = Vm^T A Vm yields

λ1(Hm) = max_{x≠0} ((Vm x)^T A (Vm x))/(x^T x) = max_{‖x‖2=1} r(Vm x) ≤ λ1(A),

λm(Hm) = min_{x≠0} ((Vm x)^T A (Vm x))/(x^T x) = min_{‖x‖2=1} r(Vm x) ≥ λn(A).   (5.67)

At each step of the Lanczos method, the estimates (5.67) provide a lower and an upper bound for the extremal eigenvalues of A. The convergence of the sequences {λ1(Hm)} and {λm(Hm)} to λ1 and λn, respectively, is governed by the following property, for whose proof we refer to [GL89], pp. 475-477.

Property 5.12 Let A ∈ R^{n×n} be a symmetric matrix with eigenvalues ordered as in (5.66) and let u1, ..., un be the corresponding orthonormal eigenvectors. If η1, ..., ηm denote the eigenvalues of Hm, with η1 ≥ η2 ≥ ... ≥ ηm, then

λ1 ≥ η1 ≥ λ1 - (λ1 - λn)(tan φ1)^2 / (T_{m-1}(1 + 2ρ1))^2,

where cos φ1 = |(q^(1))^T u1|, ρ1 = (λ1 - λ2)/(λ2 - λn) and T_{m-1}(x) is the Chebyshev polynomial of degree m-1 (see Section 10.1.1).

A similar result holds, of course, for the convergence estimate of the eigenvalue ηm to λn:

λn ≤ ηm ≤ λn + (λ1 - λn)(tan φn)^2 / (T_{m-1}(1 + 2ρn))^2,

where ρn = (λ_{n-1} - λn)/(λ1 - λ_{n-1}) and cos φn = |(q^(n))^T un|.

A naive implementation of the Lanczos algorithm can be affected by numerical instability due to the propagation of rounding errors. In particular, the Lanczos vectors will not satisfy the mutual orthogonality relation, making the extremal properties (5.67) false. This requires careful programming of the Lanczos iteration, incorporating suitable reorthogonalization procedures as described in [GL89], Sections 9.2.3-9.2.4.

Despite this limitation, the Lanczos method has two relevant features: it preserves the sparsity pattern of the matrix (unlike Householder tridiagonalization), a property that makes it quite attractive when dealing with large matrices; furthermore, it converges to the extremal eigenvalues of A much more rapidly than the power method does (see [Kan66], [GL89], p. 477).

The Lanczos method can be generalized to compute the extremal eigenvalues of an unsymmetric matrix along the same lines as in Section 4.5 for the solution of a linear system. Details on the practical implementation of the algorithm and a theoretical convergence analysis can be found in [LS96] and [Jia95], while some documentation of the latest software can be found in NETLIB/scalapack/readme.arpack (see also the MATLAB command eigs).
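The underlying three-term recurrence is itself short; the NumPy sketch below (no reorthogonalization, a random starting vector, and an illustrative test matrix of our choosing) builds Hm and checks that its extremal eigenvalues lie between λn and λ1, as (5.67) predicts:

```python
import numpy as np

def lanczos_tridiag(A, m, seed=1):
    """Plain Lanczos recurrence (no reorthogonalization): returns H_m."""
    n = A.shape[0]
    rng = np.random.default_rng(seed)
    q = rng.standard_normal(n)
    q /= np.linalg.norm(q)
    q_old = np.zeros(n)
    alpha, beta = [], [0.0]
    for _ in range(m):
        w = A @ q - beta[-1] * q_old
        alpha.append(q @ w)
        w -= alpha[-1] * q
        beta.append(np.linalg.norm(w))
        q_old, q = q, w / beta[-1]
    return (np.diag(alpha)
            + np.diag(beta[1:m], 1)
            + np.diag(beta[1:m], -1))

# Illustrative test matrix: tridiag(-1, 2, -1) of size 200.
n = 200
A = (np.diag(2.0 * np.ones(n))
     + np.diag(-np.ones(n - 1), 1)
     + np.diag(-np.ones(n - 1), -1))
H = lanczos_tridiag(A, 30)
eta = np.linalg.eigvalsh(H)
lam = np.linalg.eigvalsh(A)
print(lam[0] <= eta[0] and eta[-1] <= lam[-1])   # True, as in (5.67)
```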

An implementation of the Lanczos algorithm is provided in Program 45. The input parameter m is the size of the Krylov subspace in the tridiagonalization procedure, while toll is a tolerance monitoring the size of the increment of the computed eigenvalues between two successive iterations. The output vectors lmin, lmax and deltaeig contain the sequences of the approximate extremal eigenvalues and of their increments between successive iterations. Program 42 is invoked for computing the eigenvalues of the tridiagonal matrix Hm.

Program 45 - eiglancz : Extremal eigenvalues of a symmetric matrix

function [lmin,lmax,deltaeig,k]=eiglancz(A,m,toll)
n=size(A,1); V=[0*[1:n]',[1,0*[1:n-1]]'];
beta(1)=0; normb=1; k=1; deltaeig(1)=1;
while k <= m & normb >= eps & deltaeig(k) > toll
 vk = V(:,k+1); w = A*vk-beta(k)*V(:,k);
 alpha(k)= w'*vk; w = w - alpha(k)*vk;
 normb = norm(w,2); beta(k+1)=normb;
 if normb ~= 0
  V=[V,w/normb];
  if k==1
   lmin(1)=alpha; lmax(1)=alpha;
   k=k+1; deltaeig(k)=1;
  else
   d=alpha; b=beta(2:length(beta)-1);
   [ak,bk,ck,nch,niter]=givsturm(d,b,1,toll);
   lmax(k)=(ak(niter)+bk(niter))/2;
   [ak,bk,ck,nch,niter]=givsturm(d,b,k,toll);
   lmin(k)=(ak(niter)+bk(niter))/2;
   deltaeig(k+1)=max(abs(lmin(k)-lmin(k-1)),abs(lmax(k)-lmax(k-1)));
   k=k+1;
  end
 else
  disp('Breakdown');
  d=alpha; b=beta(2:length(beta)-1);
  [ak,bk,ck,nch,niter]=givsturm(d,b,1,toll);
  lmax(k)=(ak(niter)+bk(niter))/2;
  [ak,bk,ck,nch,niter]=givsturm(d,b,k,toll);
  lmin(k)=(ak(niter)+bk(niter))/2;
  deltaeig(k+1)=max(abs(lmin(k)-lmin(k-1)),abs(lmax(k)-lmax(k-1)));
  k=k+1;
 end
end
k=k-1;
return

Example 5.18 Consider the eigenvalue problem for the matrix A ∈ R^{n×n} with n = 100, having diagonal entries equal to 2 and off-diagonal entries equal to -1 on the upper and lower tenth diagonals. Program 45, with m=100 and toll=eps, takes 10 iterations to approximate the extremal eigenvalues of A with an absolute error of the order of the machine precision. •

5.12 Applications

A classical problem in engineering is to determine the proper or natural frequencies of a system (mechanical, structural or electric). Typically, this leads to solving a matrix eigenvalue problem. Two examples coming from structural applications are presented in the forthcoming sections, where the buckling problem of a beam and the study of the free vibrations of a bridge are considered.


5.12.1 Analysis of the Buckling of a Beam

Consider the homogeneous and thin beam of length L shown in Figure 5.4. The beam is simply supported at the ends and is subject to a normal compression load P at x = L. Denote by y(x) the vertical displacement of the beam; the structural constraints demand that y(0) = y(L) = 0.

FIGURE 5.4. A simply supported beam subject to a normal compression load

Let us consider the problem of the buckling of the beam. This amounts to determining the critical load Pcr, i.e. the smallest value of P such that an equilibrium configuration of the beam exists which differs from the rectilinear one. Reaching the condition of critical load is a warning of structural instability, so it is quite important to determine its value accurately.

The explicit computation of the critical load can be worked out under the assumption of small displacements, writing the equilibrium equation for the structure in its deformed configuration (drawn in dashed line in Figure 5.4):

-E (J(x) y″(x)) = Me(x),   0 < x < L,
y(0) = y(L) = 0,   (5.68)

where E is the constant Young's modulus of the beam and Me(x) = P y(x) is the moment of the load P with respect to a generic point of the beam of abscissa x. In (5.68) we allow the moment of inertia J to vary along the beam, which indeed happens if the beam has a nonuniform cross-section.

Equation (5.68) expresses the equilibrium between the external moment Me and the internal moment Mi = -E(Jy″), which tends to restore the rectilinear equilibrium configuration of the beam. If the stabilizing reaction Mi prevails over the destabilizing action Me, the equilibrium of the initial rectilinear configuration is stable. The critical situation (buckling of the beam) clearly arises when Mi = Me.

Assume that J is constant and let α^2 = P/(EJ); solving the boundary value problem (5.68), we get the equation C sin αL = 0, which admits the nontrivial solutions α = kπ/L, k = 1, 2, .... Taking k = 1 yields the value of the critical load Pcr = π^2 EJ/L^2.

To solve numerically the boundary value problem (5.68), it is convenient to introduce, for n ≥ 1, the discretization nodes xj = jh, with h = L/(n+1) and j = 1, ..., n, thus defining the vector of approximate nodal displacements uj at the internal nodes xj (where u0 = y(0) = 0, u_{n+1} = y(L) = 0). Then, using the finite difference method (see Section 12.2), the calculation of the critical load amounts to determining the smallest eigenvalue of the tridiagonal, symmetric and positive definite matrix A = tridiagn(-1, 2, -1) ∈ R^{n×n}.

It can indeed be checked that the finite difference discretization of problem (5.68) by centered differences leads to the matrix eigenvalue problem

Au = α^2 h^2 u,

where u ∈ R^n is the vector of nodal displacements uj. The discrete counterpart of the condition C sin(αL) = 0 requires that P h^2/(EJ) coincide with an eigenvalue of A as P varies.

Denoting by λmin the smallest eigenvalue of A and by Pcr^h the (approximate) value of the critical load, we have Pcr^h = λmin EJ/h^2. Letting θ = π/(n+1), it can be checked (see Exercise 3, Chapter 4) that the eigenvalues of the matrix A are

λj = 2(1 - cos(jθ)),   j = 1, ..., n.   (5.69)

The numerical calculation of λmin has been carried out using the Givens algorithm described in Section 5.10.2, taking n = 10. Running Program 42 with an absolute tolerance equal to the roundoff unit, the solution λmin ≃ 0.081 has been obtained after 57 iterations.
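The value λmin ≃ 0.081 for n = 10 is easy to cross-check against (5.69) (a NumPy sketch):

```python
import numpy as np

n = 10
A = (np.diag(2.0 * np.ones(n))
     + np.diag(-np.ones(n - 1), 1)
     + np.diag(-np.ones(n - 1), -1))
theta = np.pi / (n + 1)
lam_min_formula = 2.0 * (1.0 - np.cos(theta))   # (5.69) with j = 1
lam_min_numeric = np.linalg.eigvalsh(A)[0]
print(round(lam_min_formula, 3), np.isclose(lam_min_formula, lam_min_numeric))
# 0.081 True
```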

It is also interesting to analyze the case where the beam has a nonuniform cross-section since, unlike the previous situation, the value of the critical load is not exactly known a priori. We assume that, for each x ∈ [0, L], the section of the beam is rectangular, with fixed depth a and height σ varying according to the rule

σ(x) = s [1 + (S/s - 1)(x/L - 1)^2],   0 ≤ x ≤ L,

where S and s are the values at the two ends, with S ≥ s > 0. The moment of inertia, as a function of x, is given by J(x) = (1/12) a σ^3(x); proceeding as before, we end up with a system of linear algebraic equations of the form

Ãu = (P/E) h^2 u,

where this time Ã = tridiagn(b, d, b) is a tridiagonal, symmetric and positive definite matrix having diagonal entries di = J(x_{i-1/2}) + J(x_{i+1/2}), for i = 1, ..., n, and off-diagonal entries bi = -J(x_{i+1/2}), for i = 1, ..., n-1.


Assume the following values of the parameters: a = 0.4 [m], s = a, S = 0.5 [m] and L = 10 [m]. To ensure a correct dimensional comparison, we have multiplied the smallest eigenvalue of the matrix A in the uniform case (corresponding to S = s = a) by J̄ = a^4/12, obtaining λmin = 1.7283·10^{-4}. Running Program 42 with n = 10 yields in the nonuniform case the value λmin = 2.243·10^{-4}. This result confirms that the critical load increases for a beam having a wider section at x = 0, that is, the structure enters the instability regime at higher values of the load than in the uniform cross-section case.

5.12.2 Free Dynamic Vibration of a Bridge

We are concerned with the analysis of the free response of a bridge whose schematic structure is shown in Figure 5.5. The number of nodes of the structure is 2n, while the number of beams is 5n. Each horizontal and vertical beam has a mass equal to m, while the diagonal beams have a mass equal to m√2. The stiffness of each beam is represented by the spring constant κ. The nodes labeled "0" and "2n+1" are constrained to the ground.

FIGURE 5.5. Schematic structure of a bridge

Denoting by x and y the vectors of the 2n nodal horizontal and vertical displacements, the free response of the bridge can be studied by solving the generalized eigenvalue problems

Mx = λKx,   My = λKy,   (5.70)

where M = m diag_{2n}(α, b, α, γ, b, γ), with α = 3 + √2, b = (β, ..., β)^T ∈ R^{n-2}, β = 3/2 + √2 and γ = 1 + √2, and where

K = κ [ K11  K12 ]
      [ K12  K11 ]

for a positive constant κ, with K12 = tridiagn(-1, -1, -1) and K11 = tridiagn(-1, d, -1), d = (4, 5, ..., 5, 4)^T ∈ R^n. The diagonal matrix M is the mass matrix, while the symmetric and positive definite matrix K is the stiffness matrix.


FIGURE 5.6. Number of iterations of the Lanczos method and of the inverse power method versus the size 2n of the matrix C. The solid and dash-dotted curves refer to the inverse power method (for λ̃_{2n} and λ̃_{2n-1}, respectively), while the dashed and dotted curves refer to the Lanczos method (again for λ̃_{2n} and λ̃_{2n-1}, respectively)

For k = 1, ..., 2n, we denote by (λk, zk) any eigenvalue/eigenvector pair of (5.70), and call ωk = √λk the natural frequencies and zk the modes of vibration of the bridge. The study of the free vibrations is of primary importance in the design of a structure like a bridge or a multi-story building. Indeed, if the excitation frequency of an external force (vehicles, wind or, even worse, an earthquake) coincides with one of the natural frequencies of the structure, then a condition of resonance occurs and, as a result, dangerously large oscillations may arise.

Let us now deal with the numerical solution of the matrix eigenvalue problem (5.70). For this purpose we introduce the change of variable z = M^{1/2}x (or z = M^{1/2}y), so that each generalized eigenvalue problem in (5.70) can be conveniently reformulated as

Cz = λ̃z,

where λ̃ = 1/λ and the matrix C = M^{-1/2} K M^{-1/2} is symmetric positive definite. This property allows us to use the Lanczos method described in Section 5.11 and also ensures quadratic convergence of the power iterations (see Section 5.11).
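The change of variable is easy to verify on any symmetric-definite pair with diagonal M; the NumPy sketch below uses a small arbitrary pair (not the bridge matrices themselves):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 6
M = np.diag(rng.uniform(1.0, 3.0, n))            # diagonal mass matrix
R = rng.standard_normal((n, n))
K = R @ R.T + n * np.eye(n)                      # SPD stiffness matrix

M_inv_sqrt = np.diag(1.0 / np.sqrt(np.diag(M)))
C = M_inv_sqrt @ K @ M_inv_sqrt                  # C = M^{-1/2} K M^{-1/2}

# M x = lambda K x  <=>  C z = (1/lambda) z with z = M^{1/2} x
lam = np.linalg.eigvals(np.linalg.solve(K, M)).real    # generalized eigenvalues
lam_tilde = np.linalg.eigvalsh(C)                      # eigenvalues of C
print(np.allclose(np.sort(1.0 / lam), lam_tilde))      # True
```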

˜ ˜

We approximate the ¬rst two subdominant eigenvalues »2n and »2n’1

of the matrix C (i.e., its smallest and second smallest eigenvalues) in the

case m = κ = 1 using the de¬‚ation procedure considered in Remark 5.3.

The inverse power iteration and the Lanczos method are compared in the

˜ ˜

computation of »2n and »2n’1 in Figure 5.6.

The results show the superiority of the Lanczos method over the inverse

iterations only when the matrix C is of small size. This is to be ascribed

to the fact that, as n grows, the progressive in¬‚uence of the rounding er-

240 5. Approximation of Eigenvalues and Eigenvectors

rors causes a loss of mutual orthogonality of the Lanczos vectors and, in

turn, an increase in the number of iterations for the method to converge.

Suitable reorthogonalization procedures are thus needed to improve the

performances of the Lanczos iteration as pointed out in Section 5.11.

We conclude the free response analysis of the bridge by showing in Figure 5.7 (in the case n = 5, m = 10 and κ = 1) the modes of vibration z8 and z10, corresponding to the natural frequencies ω8 = 990.42 and ω10 = 2904.59. The MATLAB built-in function eig has been employed to solve the generalized eigenvalue problems (5.70), as explained in Section 5.9.1.

FIGURE 5.7. Modes of vibration corresponding to the natural frequencies ω8 (left) and ω10 (right). The undeformed configuration of the bridge is drawn in dotted line

5.13 Exercises

1. Using the Gershgorin theorems, localize the eigenvalues of the matrix A obtained by setting A = (P^{-1}DP)^T and then a13 = 0, a23 = 0, where D = diag3(1, 50, 100) and

P = [   1   1   1 ]
    [  10  20  30 ]
    [ 100  50  60 ].

[Solution: σ(A) = {-151.84, 80.34, 222.5}.]

2. Localize the spectrum of the matrix

A = [  1  2 -1 ]
    [  2  7  0 ]
    [ -1  0  5 ].

[Solution: σ(A) ⊂ [-2, 9].]


3. Draw the oriented graph of the matrix

A = [  1  3  0 ]
    [  0  2 -1 ]
    [ -1  0  2 ].

4. Check if the following matrices are reducible:

A1 = [ 0 -1  0  0 ]        A2 = [  1  0  0  1 ]
     [ 1  0  0  0 ]             [  2  3 -2  1 ]
     [ 0  0  1  0 ]             [ -1  0 -2  0 ]
     [ 1  0  0  0 ],            [  1 -1  1  4 ].

[Solution: A1, reducible; A2, irreducible.]

5. Provide an estimate of the number of complex eigenvalues of the matrix

A = [ -4    0    0    0.5   0 ]
    [  2    2    4   -3     1 ]
    [  0.5  0   -1    0     0 ]
    [  0.5  0    0.2  3     0 ]
    [ -1    2    0.5  3     4 ].

[Hint: Check that A can be reduced to the form

A = [ M1  M2 ]
    [ 0   M3 ]

where M1 ∈ R^{2×2} and M3 ∈ R^{3×3}. Then, study the eigenvalues of the blocks M1 and M3 using the Gershgorin theorems and check that A has no complex eigenvalues.]

6. Let A ∈ C^{n×n} be a diagonal matrix and let Ã = A + E be a perturbation of A with e_ii = 0 for i = 1, . . . , n. Show that

|λ_i(Ã) − λ_i(A)| ≤ Σ_{j=1}^{n} |e_ij|,   i = 1, . . . , n.    (5.71)

7. Apply estimate (5.71) to the case in which A and E are, for ε ≥ 0, the matrices

A = [ 1  0
      0  2 ],      E = [ 0  ε
                         ε  0 ].

[Solution : σ(A) = {1, 2} and σ(A + E) = {(3 ∓ √(1 + 4ε²))/2}.]
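The closed-form eigenvalues in the solution can be checked against a numerical eigensolver; a short Python sketch (the book itself works in MATLAB):

```python
import numpy as np

# Solution check for Exercise 7: eigenvalues of A + E for a few values of eps.
A = np.diag([1.0, 2.0])
for eps in [0.0, 0.1, 1.0]:
    E = np.array([[0.0, eps], [eps, 0.0]])
    computed = np.sort(np.linalg.eigvalsh(A + E))
    exact = np.sort([(3 - np.sqrt(1 + 4*eps**2)) / 2,
                     (3 + np.sqrt(1 + 4*eps**2)) / 2])
    assert np.allclose(computed, exact)
print("eigenvalues match (3 -/+ sqrt(1 + 4*eps^2))/2")
```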


8. Check that finding the zeros of a polynomial of degree n with real coefficients

p_n(x) = Σ_{k=0}^{n} a_k x^k = a_0 + a_1 x + . . . + a_n x^n,   a_n ≠ 0,  a_k ∈ R, k = 0, . . . , n,

is equivalent to determining the spectrum of the Frobenius matrix C ∈ R^{n×n} associated with p_n (known as the companion matrix)

C = [ −(a_{n−1}/a_n)  −(a_{n−2}/a_n)  . . .  −(a_1/a_n)  −(a_0/a_n)
            1                0        . . .      0            0
            0                1        . . .      0            0
            .                .        .          .            .
            0                0        . . .      1            0 ].    (5.72)

An important consequence of the result above is that, due to Abel's theorem, there exist in general no direct methods for computing the eigenvalues of a given matrix for n ≥ 5.
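The equivalence works in both directions: modern rootfinders compute polynomial zeros precisely as the eigenvalues of (5.72). A Python sketch of the construction (the book's programs are in MATLAB; the polynomial below is my own illustration):

```python
import numpy as np

def companion(coeffs):
    """Frobenius (companion) matrix of p(x) = a_0 + a_1 x + ... + a_n x^n,
    given coeffs = [a_0, a_1, ..., a_n] with a_n != 0, as in (5.72)."""
    a = np.asarray(coeffs, dtype=float)
    n = len(a) - 1
    C = np.zeros((n, n))
    C[0, :] = -a[-2::-1] / a[-1]       # first row: -(a_{n-1}, ..., a_0)/a_n
    C[1:, :-1] = np.eye(n - 1)         # unit subdiagonal
    return C

# zeros of p(x) = x^3 - 6x^2 + 11x - 6 = (x-1)(x-2)(x-3)
C = companion([-6.0, 11.0, -6.0, 1.0])
roots = np.sort(np.linalg.eigvals(C).real)
print(roots)   # close to [1, 2, 3]
```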

9. Show that if the matrix A ∈ C^{n×n} admits the eigenvalue/eigenvector pair (λ, x), then the matrix U^H AU, with U unitary, admits the eigenvalue/eigenvector pair (λ, U^H x). (Similarity transformation using a unitary matrix.)

10. Suppose that all the assumptions needed to apply the power method are satisfied except for the requirement α_1 ≠ 0 (see Section 5.3.1). Show that in such a case the sequence (5.17) converges to the eigenvalue/eigenvector pair (λ_2, x_2). Then, study experimentally the behaviour of the method, computing the pair (λ_1, x_1) for the matrix

A = [  1 −1  2
      −2  0  5
       6 −3  6 ].

For this, use Program 26, taking q^(0) = [1, 1, 1]^T/√3 and q^(0) = w^(0)/‖w^(0)‖_2, respectively, where w^(0) = (1/3)x_2 − (2/3)x_3.

[Solution : λ_1 = 5, λ_2 = 3, λ_3 = −1 and x_1 = [5, 16, 18]^T, x_2 = [1, 6, 4]^T, x_3 = [1, 2, 0]^T.]
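Program 26 is the book's MATLAB power-method code; a minimal Python sketch of the same iteration, applied to the matrix above with the first starting vector, behaves as the solution predicts (the stopping rule and the Rayleigh-quotient estimate are my own simplifications):

```python
import numpy as np

def power_method(A, q0, tol=1e-12, nmax=500):
    """Minimal power-method sketch: dominant eigenvalue via the
    Rayleigh quotient of the normalized iterates."""
    q = q0 / np.linalg.norm(q0)
    lam = 0.0
    for _ in range(nmax):
        z = A @ q
        q_new = z / np.linalg.norm(z)
        lam_new = q_new @ A @ q_new        # Rayleigh quotient
        if abs(lam_new - lam) < tol:
            return lam_new, q_new
        lam, q = lam_new, q_new
    return lam, q

A = np.array([[ 1.0, -1.0, 2.0],
              [-2.0,  0.0, 5.0],
              [ 6.0, -3.0, 6.0]])
lam, q = power_method(A, np.ones(3) / np.sqrt(3))
print(lam)   # approaches the dominant eigenvalue 5
```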

11. Show that the companion matrix associated with the monic polynomial p_n(x) = x^n + a_n x^{n−1} + . . . + a_1 can be written in the following form, alternative to (5.72):

A = [ 0  0  . . .  0  0  −a_1
      1  0  . . .  0  0  −a_2
      0  1  . . .  0  0  −a_3
      .  .   .     .  .    .
      0  0  . . .  1  0  −a_{n−1}
      0  0  . . .  0  1  −a_n ].

12. (From [FF63]) Suppose that a real matrix A ∈ R^{n×n} has its two eigenvalues of maximum module given by λ_1 = ρe^{iθ} and λ_2 = ρe^{−iθ}, with θ ≠ 0. Assume, moreover, that the remaining eigenvalues have modules less than ρ. The power method can then be modified as follows: let q^(0) be a real vector and q^(k) be the vector provided by the power method without normalization. Then, set x_k = q^(k)_{n_0} for some n_0, with 1 ≤ n_0 ≤ n. Prove that

ρ² = (x_k x_{k+2} − x²_{k+1}) / (x_{k−1} x_{k+1} − x²_k) + O((|λ_3|/ρ)^k),

cos(θ) = (ρ x_{k−1} + ρ^{−1} x_{k+1}) / (2 x_k) + O((|λ_3|/ρ)^k).

[Hint : first, show that

x_k = C ρ^k cos(kθ + α) + O((|λ_3|/ρ)^k),

where α depends on the components of the initial vector along the directions of the eigenvectors associated with λ_1 and λ_2.]
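The two formulas can be checked numerically. A Python sketch (the book works in MATLAB) on a matrix built ad hoc: a 2×2 rotation-dilation block, whose eigenvalues are exactly ρe^{±iθ}, coupled with one small real eigenvalue; ρ = 2, θ = 1 and the starting vector are my own choices:

```python
import numpy as np

rho, theta = 2.0, 1.0
A = np.zeros((3, 3))
A[:2, :2] = rho * np.array([[np.cos(theta), -np.sin(theta)],
                            [np.sin(theta),  np.cos(theta)]])
A[2, 2] = 0.1                      # |lambda_3| / rho = 0.05

q = np.array([1.0, 0.3, 1.0])      # q^(0); x_k is its first component (n0 = 1)
x = [q[0]]
for _ in range(20):                # unnormalized power iterations
    q = A @ q
    x.append(q[0])

k = 15
rho2_est = (x[k]*x[k+2] - x[k+1]**2) / (x[k-1]*x[k+1] - x[k]**2)
cos_est  = (rho*x[k-1] + x[k+1]/rho) / (2*x[k])
print(np.sqrt(rho2_est), cos_est)  # approx rho = 2 and cos(1) = 0.5403
```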

13. Apply the modified power method of Exercise 12 to the matrix

A = [ 1 −1/4  1/4
      1   0    0
      0   1    0 ],

and compare the obtained results with those yielded by the standard power method.

14. (Taken from [Dem97]) Apply the QR iteration with double shift to compute the eigenvalues of the matrix

A = [ 0  0  1
      1  0  0
      0  1  0 ].

Run Program 37 setting toll=eps, itmax=100 and comment on the form of the matrix T(iter) obtained after iter iterations of the algorithm.

[Solution : the eigenvalues of A are the solutions of λ³ − 1 = 0, i.e., σ(A) = {1, −1/2 ± (√3/2)i}. After iter=100 iterations, Program 37 yields the matrix

T(100) = [ 0  0 −1
           1  0  0
           0 −1  0 ],

which means that the QR iteration leaves A unchanged (except for sign changes that are not relevant for the computation of the eigenvalues). This is a simple but glaring example of a matrix for which the QR method with double shift fails to converge.]

6

Rootfinding for Nonlinear Equations

This chapter deals with the numerical approximation of the zeros of a real-valued function of one variable, that is:

given f : I = (a, b) ⊆ R → R, find α ∈ C such that f(α) = 0.    (6.1)

The analysis of problem (6.1) in the case of systems of nonlinear equations will be addressed in Chapter 7.

Methods for the numerical approximation of a zero of f are usually iterative. The aim is to generate a sequence of values x^(k) such that

lim_{k→∞} x^(k) = α.

The convergence of the iteration is characterized by the following definition.

Definition 6.1 A sequence {x^(k)} generated by a numerical method is said to converge to α with order p ≥ 1 if

∃C > 0 :  |x^(k+1) − α| / |x^(k) − α|^p ≤ C,  ∀k ≥ k_0,    (6.2)

where k_0 ≥ 0 is a suitable integer. In such a case, the method is said to be of order p. Notice that if p is equal to 1, in order for x^(k) to converge to α it is necessary that C < 1 in (6.2). In such an event, the constant C is called the convergence factor of the method.

Unlike the case of linear systems, the convergence of iterative methods for rootfinding of nonlinear equations depends in general on the choice of the initial datum x^(0). This allows for establishing only local convergence results, that is, results holding for any x^(0) belonging to a suitable neighborhood of the root α. Methods for which convergence to α holds for any choice of x^(0) in the interval I are said to be globally convergent to α.
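Definition 6.1 suggests a practical diagnostic: from the computed errors e_k = |x^(k) − α| one can estimate p as log(e_{k+1}/e_k)/log(e_k/e_{k−1}). A Python sketch (the book's programs are in MATLAB), using Newton's method on f(x) = x² − 2 as an illustration:

```python
import math

def estimated_order(errors):
    """Empirical estimate of the order p in (6.2) from the last
    three errors e_k = |x^(k) - alpha|."""
    num = math.log(errors[-1] / errors[-2])
    den = math.log(errors[-2] / errors[-3])
    return num / den

# Newton's method on f(x) = x^2 - 2 (simple root alpha = sqrt(2))
alpha, x = math.sqrt(2.0), 3.0
errors = []
for _ in range(4):
    x = x - (x*x - 2.0) / (2.0*x)
    errors.append(abs(x - alpha))
print(estimated_order(errors))   # close to 2 (quadratic convergence)
```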

6.1 Conditioning of a Nonlinear Equation

Consider the nonlinear equation f(x) = φ(x) − d = 0 and assume that f is a continuously differentiable function. Let us analyze the sensitivity of finding the roots of f with respect to changes in the datum d.

The problem is well posed only if the function φ is invertible. In such a case, indeed, one gets α = φ^{−1}(d), from which, using the notation of Chapter 2, the resolvent G is φ^{−1}. On the other hand, (φ^{−1})'(d) = 1/φ'(α), so that formula (2.7) for the approximate condition number (relative and absolute) yields

K(d) ≈ |d| / (|α| |f'(α)|),   K_abs(d) ≈ 1/|f'(α)|.    (6.3)

The problem is thus ill-conditioned when f'(α) is "small" and well-conditioned if f'(α) is "large".

The analysis which leads to (6.3) can be generalized to the case in which α is a root of f with multiplicity m > 1 as follows. Expanding φ in a Taylor series around α up to the m-th order term, we get

d + δd = φ(α + δα) = φ(α) + Σ_{k=1}^{m} (φ^(k)(α)/k!) (δα)^k + o((δα)^m).

Since φ^(k)(α) = 0 for k = 1, . . . , m − 1, we obtain

δd = f^(m)(α)(δα)^m / m!,

so that an approximation to the absolute condition number is

K_abs(d) ≈ |m! δd / f^(m)(α)|^{1/m} (1/|δd|).    (6.4)

Notice that (6.3) is the special case of (6.4) with m = 1. From this it also follows that, even if δd is sufficiently small to make |m! δd / f^(m)(α)| < 1, K_abs(d) could nevertheless be a large number. We therefore conclude that the problem of rootfinding for a nonlinear equation is well-conditioned if α is a simple root and |f'(α)| is definitely different from zero, and ill-conditioned otherwise.

Let us now consider the following problem, which is closely connected with the previous analysis. Assume d = 0 and let α be a simple root of f; moreover, for α̂ ≠ α, let f(α̂) = r̂ ≠ 0. We seek a bound for the difference α̂ − α as a function of the residual r̂. Applying (6.3) yields

K_abs(0) ≈ 1/|f'(α)|.

Therefore, letting δx = α̂ − α and δd = r̂ in the definition of K_abs (see (2.5)), we get

|α̂ − α|/|α| ≲ |r̂| / (|f'(α)| |α|),    (6.5)

where the following convention has been adopted: if a ≤ b and b ≃ c, then we write a ≲ c. If α has multiplicity m > 1, using (6.4) instead of (6.3) and proceeding as above, we get

|α̂ − α|/|α| ≲ (m! / (|f^(m)(α)| |α|^m))^{1/m} |r̂|^{1/m}.    (6.6)

These estimates will be useful in the analysis of stopping criteria for iterative methods (see Section 6.5).

A remarkable example of a nonlinear problem arises when f is a polynomial p_n of degree n, in which case it admits exactly n roots α_i, real or complex, each counted with its multiplicity. We want to investigate the sensitivity of the roots of p_n with respect to changes in its coefficients.

To this end, let p̂_n = p_n + q_n, where q_n is a perturbation polynomial of degree n, and let α̂_i be the corresponding roots of p̂_n. A direct use of (6.6) yields, for any root α̂_i, the following estimate

E^i_rel = |α̂_i − α_i|/|α_i| ≲ (m! / (|p_n^(m)(α_i)| |α_i|^m))^{1/m} |q_n(α̂_i)|^{1/m} = S^i,    (6.7)

where m is the multiplicity of the root at hand and q_n(α̂_i) = −p_n(α̂_i) is the "residual" of the polynomial p_n evaluated at the perturbed root.

Remark 6.1 A formal analogy exists between the a priori estimates obtained so far for the nonlinear problem φ(α) = d and those developed in Section 3.1.2 for linear systems, provided that A corresponds to φ and b to d. More precisely, (6.5) is the analogue of (3.9) if δA = 0, and the same holds for (6.7) (for m = 1) if δb = 0.

Example 6.1 Let p_4(x) = (x − 1)^4 and p̂_4(x) = (x − 1)^4 − ε, with 0 < ε ≪ 1. The roots of the perturbed polynomial are simple and satisfy |α̂_i − α_i| = ε^{1/4}, where α_i = 1 are the (coincident) zeros of p_4. They lie, spaced by angles of π/2, on the circle of radius ε^{1/4} centered at z = (1, 0) in the complex plane.

The problem is stable (that is, lim_{ε→0} α̂_i = 1), but it is ill-conditioned since

|α̂_i − α_i|/|α_i| = ε^{1/4},  i = 1, . . . , 4.

For example, if ε = 10^{−4} the relative change is 10^{−1}. Notice that the right-hand side of (6.7) is just ε^{1/4}, so that, in this case, (6.7) becomes an equality. •
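The quartic of this example can be perturbed and solved numerically; a Python sketch (using np.roots in place of the book's MATLAB tools) reproduces the relative change ε^{1/4} = 0.1:

```python
import numpy as np

# Perturbed quartic of Example 6.1: p4(x) = (x-1)^4 - eps, eps = 1e-4
eps = 1.0e-4
coeffs = [1.0, -4.0, 6.0, -4.0, 1.0 - eps]   # x^4 - 4x^3 + 6x^2 - 4x + (1 - eps)
roots = np.roots(coeffs)
rel_change = np.abs(roots - 1.0)   # |alpha_hat - alpha| / |alpha|, alpha = 1
print(rel_change)                  # all equal to eps**0.25 = 0.1
```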

Example 6.2 (Wilkinson) Consider the polynomial

p_10(x) = Π_{k=1}^{10} (x + k) = x^10 + 55x^9 + . . . + 10!.

Let p̂_10 = p_10 + εx^9, with ε = 2^{−23} ≈ 1.2 · 10^{−7}. Let us study the conditioning of finding the roots of p_10. Using (6.7) with m = 1, we report in Table 6.1, for i = 1, . . . , 10, the relative errors E^i_rel and the corresponding estimates S^i.

These results show that the problem is ill-conditioned, since the maximum relative error, for the root α_8 = −8, is three orders of magnitude larger than the corresponding absolute perturbation. Moreover, excellent agreement can be observed between the a priori estimate and the actual relative error. •

 i   E^i_rel          S^i            |  i   E^i_rel         S^i
 1   3.039 · 10^−13   3.285 · 10^−13 |  6   6.956 · 10^−5   6.956 · 10^−5
 2   7.562 · 10^−10   7.568 · 10^−10 |  7   1.589 · 10^−4   1.588 · 10^−4
 3   7.758 · 10^−8    7.759 · 10^−8  |  8   1.984 · 10^−4   1.987 · 10^−4
 4   1.808 · 10^−6    1.808 · 10^−6  |  9   1.273 · 10^−4   1.271 · 10^−4
 5   1.616 · 10^−5    1.616 · 10^−5  | 10   3.283 · 10^−5   3.286 · 10^−5

TABLE 6.1. Relative error and estimated error using (6.7) for the Wilkinson polynomial of degree 10
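The experiment of Table 6.1 can be repeated with a few lines of Python (a sketch; the book uses MATLAB, and the rounding in np.roots adds its own, much smaller, perturbation):

```python
import numpy as np

# Wilkinson-type experiment of Example 6.2: perturb the x^9 coefficient
# of p10(x) = (x+1)(x+2)...(x+10) by eps = 2^-23 and compare the roots.
eps = 2.0 ** -23
p = np.poly(np.arange(-1.0, -11.0, -1.0))   # coefficients, highest degree first
p_pert = p.copy()
p_pert[1] += eps                            # x^9 coefficient: 55 -> 55 + eps
roots = np.sort(np.roots(p_pert).real)
exact = np.arange(-10.0, 0.0)               # -10, -9, ..., -1
rel_err = np.abs((roots - exact) / exact)
print(rel_err.max())    # order 1e-4, as in Table 6.1
```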

6.2 A Geometric Approach to Rootfinding

In this section we introduce the following methods for finding roots: the bisection method, the chord method, the secant method, the false position (or Regula Falsi) method and Newton's method. The order of presentation reflects the growing complexity of the algorithms. In the case of the bisection method, indeed, the only information that is used is the sign of the function f at the end points of each bisection (sub)interval, whilst the remaining algorithms also take into account the values of the function and/or its derivative.

6.2.1 The Bisection Method

The bisection method is based on the following property.

Property 6.1 (theorem of zeros for continuous functions) Given a continuous function f : [a, b] → R such that f(a)f(b) < 0, there exists α ∈ (a, b) such that f(α) = 0.

Starting from I_0 = [a, b], the bisection method generates a sequence of subintervals I_k = [a^(k), b^(k)], k ≥ 0, with I_k ⊂ I_{k−1}, k ≥ 1, enjoying the property that f(a^(k))f(b^(k)) < 0. Precisely, we set a^(0) = a, b^(0) = b and x^(0) = (a^(0) + b^(0))/2; then, for k ≥ 0:

set a^(k+1) = a^(k), b^(k+1) = x^(k) if f(x^(k))f(a^(k)) < 0;

set a^(k+1) = x^(k), b^(k+1) = b^(k) if f(x^(k))f(b^(k)) < 0;

finally, set x^(k+1) = (a^(k+1) + b^(k+1))/2.

FIGURE 6.1. The bisection method: the first two steps (left); convergence history for Example 6.3 (right). The number of iterations and the absolute error as a function of k are reported on the x- and y-axis, respectively

The bisection iteration terminates at the m-th step for which |x^(m) − α| ≤ |I_m| ≤ ε, where ε is a fixed tolerance and |I_m| is the length of I_m. As for the speed of convergence of the bisection method, notice that |I_0| = b − a, while

|I_k| = |I_0|/2^k = (b − a)/2^k,  k ≥ 0.    (6.8)

Denoting by e^(k) = x^(k) − α the absolute error at step k, from (6.8) it follows that |e^(k)| ≤ (b − a)/2^k, k ≥ 0, which implies lim_{k→∞} |e^(k)| = 0. The bisection method is therefore globally convergent. Moreover, to get |x^(m) − α| ≤ ε we must take

m ≥ log_2(b − a) − log_2(ε) = log((b − a)/ε)/log(2) ≈ log((b − a)/ε)/0.6931.    (6.9)

In particular, to gain a significant figure in the accuracy of the approximation of the root (that is, to have |x^(k) − α| = |x^(j) − α|/10), one needs k − j = log_2(10) ≈ 3.32 bisections. This singles out the bisection method as an algorithm of certain, but slow, convergence. We must also point out that the bisection method does not generally guarantee a monotone reduction of the absolute error between two successive iterations; that is, we cannot ensure a priori that

|e^(k+1)| ≤ M_k |e^(k)|,  for any k ≥ 0,    (6.10)

with M_k < 1. For this purpose, consider the situation depicted in Figure 6.1 (left), where clearly |e^(1)| > |e^(0)|. Failure to satisfy (6.10) does not allow for qualifying the bisection method as a method of order 1, in the sense of Definition 6.1.

Example 6.3 Let us check the convergence properties of the bisection method in the approximation of the root α ≈ 0.9062 of the Legendre polynomial of degree 5,

L_5(x) = (x/8)(63x^4 − 70x^2 + 15),

whose roots lie within the interval (−1, 1) (see Section 10.1.2). Program 46 has been run taking a = 0.6, b = 1 (whence L_5(a) · L_5(b) < 0), nmax = 100 and toll = 10^{−10}, reaching convergence in 32 iterations; this agrees with the theoretical estimate (6.9) (indeed, m ≥ 31.8974). The convergence history is reported in Figure 6.1 (right) and shows an (average) reduction of the error by a factor of two, with an oscillating behavior of the sequence {x^(k)}. •
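A simplified Python sketch of this run (Program 46 below is the book's MATLAB code; here the loop stops on the half-length of the interval, so the iteration count differs by one from the 32 reported above):

```python
import math

# Bisection run of Example 6.3: root of the Legendre polynomial L5 in [0.6, 1]
def L5(x):
    return x * (63.0*x**4 - 70.0*x**2 + 15.0) / 8.0

a, b, tol, nit = 0.6, 1.0, 1.0e-10, 0
assert L5(a) * L5(b) < 0             # sign change brackets the root
while (b - a) / 2.0 > tol:
    nit += 1
    c = (a + b) / 2.0
    if L5(a) * L5(c) < 0:            # root in the left half
        b = c
    else:                            # root in the right half
        a = c
x = (a + b) / 2.0
print(nit, x)   # 31 iterations, x close to 0.9061798459
```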

The slow reduction of the error suggests employing the bisection method as an "approaching" technique to the root. Indeed, after a few bisection steps, a reasonable approximation to α is obtained, starting from which a higher-order method can be successfully used for rapid convergence to the solution within the fixed tolerance. An example of such a procedure will be addressed in Section 6.7.1.

The bisection algorithm is implemented in Program 46. The input parameters, here and in the remainder of this chapter, have the following meaning: a and b denote the end points of the search interval, fun is the variable containing the expression of the function f, toll is a fixed tolerance and nmax is the maximum admissible number of steps for the iterative process.

In the output vectors xvect, xdif and fx the sequences {x^(k)}, {|x^(k+1) − x^(k)|} and {f(x^(k))}, for k ≥ 0, are respectively stored, while nit denotes the number of iterations needed to satisfy the stopping criterion. In the case of the bisection method, the code returns as soon as the length of the search interval is less than toll.

Program 46 - bisect : Bisection method

function [xvect,xdif,fx,nit]=bisect(a,b,toll,nmax,fun)
err=toll+1; nit=0; xvect=[]; fx=[]; xdif=[];
while (nit < nmax & err > toll)
  nit=nit+1; c=(a+b)/2; x=c; fc=eval(fun); xvect=[xvect;x];
  fx=[fx;fc]; x=a; if (fc*eval(fun) > 0), a=c; else, b=c; end
  err=abs(b-a); xdif=[xdif;err];
end

6.2.2 The Methods of Chord, Secant and Regula Falsi, and Newton's Method

In order to devise algorithms with better convergence properties than the bisection method, it is necessary to include information from the values attained by f and, possibly, also by its derivative f' (if f is differentiable) or by a suitable approximation of it.

For this purpose, let us expand f in a Taylor series around α and truncate the expansion at first order. The following linearized version of problem (6.1) is obtained:

f(α) = 0 = f(x) + (α − x)f'(ξ),    (6.11)

for a suitable ξ between α and x. Equation (6.11) prompts the following iterative method: for any k ≥ 0, given x^(k), determine x^(k+1) by solving the equation f(x^(k)) + (x^(k+1) − x^(k))q_k = 0, where q_k is a suitable approximation of f'(x^(k)).

The method described here amounts to finding the intersection between the x-axis and the straight line of slope q_k passing through the point (x^(k), f(x^(k))), and thus can be more conveniently set up in the form

x^(k+1) = x^(k) − q_k^{−1} f(x^(k)),  ∀k ≥ 0.

We consider below four particular choices of q_k.

FIGURE 6.2. The first step of the chord method (left) and the first three steps of the secant method (right). For the latter method we set x^(−1) = b and x^(0) = a

The chord method. We let

q_k = q = (f(b) − f(a))/(b − a),  ∀k ≥ 0,

from which, given an initial value x^(0), the following recursive relation is obtained

x^(k+1) = x^(k) − ((b − a)/(f(b) − f(a))) f(x^(k)),  k ≥ 0.    (6.12)

In Section 6.3.1, we shall see that the sequence {x^(k)} generated by (6.12) converges to the root α with order of convergence p = 1.

The secant method. We let

q_k = (f(x^(k)) − f(x^(k−1)))/(x^(k) − x^(k−1)),  ∀k ≥ 0,    (6.13)

from which, given two initial values x^(−1) and x^(0), we obtain the relation

x^(k+1) = x^(k) − ((x^(k) − x^(k−1))/(f(x^(k)) − f(x^(k−1)))) f(x^(k)),  k ≥ 0.    (6.14)

Compared with the chord method, the iterative process (6.14) requires an extra initial point x^(−1) and the corresponding function value f(x^(−1)), as well as, for each k, the computation of the incremental ratio (6.13). The benefit of this increase in computational cost is the higher speed of convergence of the secant method, as stated in the following property, which can be regarded as a first example of a local convergence theorem (for the proof see [IK66], pp. 99-101).

Property 6.2 Let f ∈ C²(J), J being a suitable neighborhood of the root α, and assume that f'(α) ≠ 0. Then, if the initial data x^(−1) and x^(0) are chosen in J sufficiently close to α, the sequence (6.14) converges to α with order p = (1 + √5)/2 ≈ 1.63.
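The iteration (6.14) can be sketched in a few lines of Python (the book's programs are in MATLAB; the test function f(x) = x² − 2 and the starting points are my own illustration):

```python
import math

# Secant iteration (6.14) on f(x) = x^2 - 2, with x^(-1) = 2 and x^(0) = 3;
# the iterates converge superlinearly, with order p = (1 + sqrt(5))/2.
f = lambda x: x*x - 2.0
x_prev, x_curr = 2.0, 3.0
for _ in range(8):
    q = (f(x_curr) - f(x_prev)) / (x_curr - x_prev)   # incremental ratio (6.13)
    x_prev, x_curr = x_curr, x_curr - f(x_curr) / q
print(x_curr)   # converges to sqrt(2) = 1.41421356...
```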

The Regula Falsi (or false position) method. This is a variant of the secant method in which, instead of selecting the secant line through the points (x^(k), f(x^(k))) and (x^(k−1), f(x^(k−1))), we take the one through (x^(k), f(x^(k))) and (x^(k'), f(x^(k'))), k' being the maximum index less than k such that f(x^(k')) · f(x^(k)) < 0. Precisely, once two values x^(−1) and x^(0) have been found such that f(x^(−1)) · f(x^(0)) < 0, we let

x^(k+1) = x^(k) − ((x^(k) − x^(k'))/(f(x^(k)) − f(x^(k')))) f(x^(k)),  k ≥ 0.    (6.15)

Having fixed an absolute tolerance ε, the iteration (6.15) terminates at the m-th step such that |f(x^(m))| < ε. Notice that the sequence of indices k' is nondecreasing; therefore, in order to find at step k the new value of k', it is not necessary to sweep the whole sequence back, but it suffices to stop at the value of k' that was determined at the previous step. We show in Figure 6.3 (left) the first two steps of (6.15) in the special case in which x^(k') coincides with x^(−1) for any k ≥ 0.

The Regula Falsi method, though of the same complexity as the secant method, has linear convergence order (see, for example, [RR78], pp. 339-