
λ_i = t_ii/s_ii, if s_ii ≠ 0,
λ_i = ∞, if t_ii ≠ 0, s_ii = 0.

Exactly as in the matrix eigenvalue problem, the generalized Schur form
cannot be computed explicitly, so the counterpart of the real Schur form
(5.34) has to be computed. Assuming that the matrices A and B are real, it
can be shown that there exist two orthogonal matrices Ũ and Z̃ such that
T̃ = Ũ^T AZ̃ is upper quasi-triangular and S̃ = Ũ^T BZ̃ is upper triangular.
This decomposition is known as the generalized real Schur decomposition
of a pair (A,B) and can be computed by a suitably modified version of the
QR algorithm, known as the QZ iteration, which consists of the following steps
(for a more detailed description, see [GL89], Section 7.7, [Dat95], Section
9.3):
1. reduce A and B into upper Hessenberg form and upper triangular
form, respectively, i.e., find two orthogonal matrices Q and Z such
that Q^T AZ is upper Hessenberg and Q^T BZ is upper triangular;
2. the QR iteration is applied to the matrix AB^{-1} to reduce it to real
Schur form.


To save computational resources, the QZ algorithm overwrites the matrices
A and B with their upper Hessenberg and triangular forms and requires 30n³
flops; an additional cost of 36n³ operations is required if Q and Z are
also needed. The method is implemented in the LAPACK library in the
subroutine sgges and can be invoked in the MATLAB environment with
the command eig(A,B).
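
As a quick cross-check, the eigenvalues returned by eig(A,B) can be compared with the ratios of the diagonal entries of the generalized real Schur form computed by the built-in function qz. A minimal sketch (the 2×2 pencil below is an arbitrary example chosen for illustration, not data from the text):

A = [1 2; 3 4]; B = [2 0; 1 3];
lambda = eig(A,B);               % generalized eigenvalues of the pencil (A,B)
[AA,BB,Q,Z] = qz(A,B,'real');    % Q*A*Z = AA quasi-triangular, Q*B*Z = BB triangular
lambda_qz = diag(AA)./diag(BB);  % ratios t_ii/s_ii (valid here: both eigenvalues are real)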


5.9.2 Generalized Real Schur Form of Symmetric-Definite
Pencils
A remarkable situation occurs when both A and B are symmetric, and one
of them, say B, is also positive definite. In such a case, the pair (A,B) forms
a symmetric-definite pencil for which the following result holds.

Theorem 5.7 The symmetric-definite pencil (A,B) has real eigenvalues
and linearly independent eigenvectors. Moreover, the matrices A and B can
be simultaneously diagonalized. Precisely, there exists a nonsingular matrix
X ∈ R^{n×n} such that

X^T AX = Λ = diag(λ_1, λ_2, ..., λ_n),   X^T BX = I_n,

where, for i = 1, ..., n, the λ_i are the eigenvalues of the pencil (A,B).
Proof. Since B is symmetric positive definite, it admits a unique Cholesky factorization B = H^T H, where H is upper triangular (see Section 3.4.2). From (5.58)
we deduce that Cz = λz with C = H^{-T}AH^{-1}, z = Hx, where (λ, x) is an
eigenvalue/eigenvector pair of (A,B).
The matrix C is symmetric; therefore, its eigenvalues are real and a set of
orthonormal eigenvectors (y_1, ..., y_n) = Y exists. As a consequence, letting X =
H^{-1}Y allows for simultaneously diagonalizing both A and B, since

X^T AX = Y^T H^{-T}AH^{-1}Y = Y^T CY = Λ = diag(λ_1, ..., λ_n),

X^T BX = Y^T H^{-T}BH^{-1}Y = Y^T Y = I_n. □

The following QR-Cholesky algorithm computes the eigenvalues λ_i and
the corresponding eigenvectors x_i of a symmetric-definite pencil (A,B), for
i = 1, ..., n (see for more details [GL89], Section 8.7, [Dat95], Section 9.5;
a minimal MATLAB sketch is given after the stability remark below):
1. compute the Cholesky factorization B = H^T H;
2. compute C = H^{-T}AH^{-1};
3. for i = 1, ..., n, compute the eigenvalues λ_i and eigenvectors z_i of the
symmetric matrix C using the QR iteration. Then construct from the
set {z_i} an orthonormal set of eigenvectors {y_i} (using, for instance,
the modified Gram-Schmidt procedure of Section 3.4.3);
4. for i = 1, ..., n, compute the eigenvectors x_i of the pencil (A,B) by
solving the systems Hx_i = y_i.

This algorithm requires an order of 14n³ flops and it can be shown (see
[GL89], p. 464) that, if λ̂ is a computed eigenvalue, then

λ̂ ∈ σ(H^{-T}AH^{-1} + E),   with ‖E‖_2 ≃ u ‖A‖_2 ‖B^{-1}‖_2.

Thus, the generalized eigenvalue problem in the symmetric-definite case
may become unstable with respect to rounding error propagation if B is
ill-conditioned. For a stabilized version of the QR-Cholesky method, see
[GL89], p. 464 and the references cited therein.
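
A minimal MATLAB sketch of the QR-Cholesky steps listed above, relying on the built-in chol and eig in place of a hand-coded QR iteration (the 2×2 pencil is an arbitrary symmetric-definite example, not data from the text):

A = [2 1; 1 3]; B = [4 1; 1 2];   % assumption: B symmetric positive definite
H = chol(B);                      % step 1: B = H'*H, H upper triangular
C = (H') \ A / H;                 % step 2: C = H^{-T} A H^{-1}, symmetric
[Y,Lambda] = eig((C+C')/2);       % step 3: spectral decomposition (symmetrized for safety)
X = H \ Y;                        % step 4: eigenvectors of the pencil, X = H^{-1} Y
% check: X'*A*X is diagonal (Lambda) and X'*B*X is the identity, up to roundoff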


5.10 Methods for Eigenvalues of Symmetric
Matrices
In this section we deal with the computation of the eigenvalues of a symmetric matrix A ∈ R^{n×n}. Besides the QR method previously examined,
specific algorithms which take advantage of the symmetry of A are available.
Among these, we first consider the Jacobi method, which generates a
sequence of matrices orthogonally similar to A and converging to the diagonal Schur form of A. Then, the Sturm sequence and Lanczos procedures
are presented, for handling the case of tridiagonal matrices and large sparse
matrices, respectively.


5.10.1 The Jacobi Method
The Jacobi method generates a sequence of matrices A(k) that are orthog-
onally similar to matrix A and converge to a diagonal matrix whose entries
are the eigenvalues of A. This is done using the Givens similarity transfor-
mations (5.43) as follows.
Given A^(0) = A, for any k = 1, 2, ..., a pair of indices p and q is fixed,
with 1 ≤ p < q ≤ n. Next, letting G_pq = G(p, q, θ), the matrix A^(k) =
(G_pq)^T A^(k-1) G_pq, orthogonally similar to A, is constructed in such a way
that

a_ij^(k) = 0 if (i, j) = (p, q).   (5.59)

Letting c = cos θ and s = sin θ, the procedure for computing the entries of
A^(k) that are changed with respect to those of A^(k-1) can be written as

[ a_pp^(k)  a_pq^(k) ]   [  c  s ]^T [ a_pp^(k-1)  a_pq^(k-1) ] [  c  s ]
[ a_pq^(k)  a_qq^(k) ] = [ -s  c ]   [ a_pq^(k-1)  a_qq^(k-1) ] [ -s  c ].   (5.60)

If a_pq^(k-1) = 0, we can satisfy (5.59) by taking c = 1 and s = 0. If a_pq^(k-1) ≠
0, letting t = s/c, (5.60) requires the solution of the following algebraic
equation

t² + 2ηt − 1 = 0,   η = (a_qq^(k-1) − a_pp^(k-1)) / (2a_pq^(k-1)).   (5.61)

The root t = 1/(η + √(1 + η²)) is chosen in (5.61) if η ≥ 0, otherwise we
take t = −1/(−η + √(1 + η²)); next, we let

c = 1/√(1 + t²),   s = ct.   (5.62)

To examine the rate at which the off-diagonal entries of A^(k) tend to zero,
it is convenient to introduce, for any matrix M ∈ R^{n×n}, the nonnegative
quantity

Ψ(M) = ( Σ_{i,j=1, i≠j}^{n} m_ij² )^{1/2} = ( ‖M‖_F² − Σ_{i=1}^{n} m_ii² )^{1/2}.   (5.63)


The Jacobi method ensures that Ψ(A^(k)) ≤ Ψ(A^(k-1)) for any k ≥ 1.
Indeed, the computation of (5.63) for the matrix A^(k) yields

(Ψ(A^(k)))² = (Ψ(A^(k-1)))² − 2(a_pq^(k-1))² ≤ (Ψ(A^(k-1)))².   (5.64)


The estimate (5.64) suggests that, at each step k, the optimal choice of the
indices p and q is the one corresponding to the entry of A^(k-1) such that

|a_pq^(k-1)| = max_{i≠j} |a_ij^(k-1)|.

The computational cost of this strategy is of the order of n² flops for the
search of the maximum modulus entry, while the updating step A^(k) =
(G_pq)^T A^(k-1) G_pq requires only a cost of the order of n flops, as already
noticed in Section 5.6.5. It is thus convenient to resort to the so-called row
cyclic Jacobi method, in which the choice of the indices p and q is done by
a row-sweeping of the matrix A^(k) according to the following algorithm: for
any k = 1, 2, ... and for any i-th row of A^(k) (i = 1, ..., n − 1), we set
p = i and q = (i + 1), ..., n. Each complete sweep requires N = n(n − 1)/2
Jacobi transformations. Assuming that |λ_i − λ_j| ≥ δ for i ≠ j, it can be
shown that the cyclic Jacobi method converges quadratically, that is (see
[Wil65], [Wil62])

Ψ(A^(k+N)) ≤ (1/(δ√2)) (Ψ(A^(k)))²,   k = 1, 2, ...

For further details of the algorithm, we refer to [GL89], Section 8.4.

Example 5.15 Let us apply the cyclic Jacobi method to the Hilbert matrix H_4,
whose eigenvalues read (to five significant figures) λ_1 = 1.5002, λ_2 = 1.6914·10^{-1},
λ_3 = 6.7383·10^{-3} and λ_4 = 9.6702·10^{-5}. Running Program 40 with toll =
10^{-15}, the method converges in 3 sweeps to a matrix whose diagonal entries
coincide with the eigenvalues of H_4 up to 4.4409·10^{-16}. As for the off-diagonal
entries, the values attained by Ψ(H_4^(k)) are reported in Table 5.3. •



Sweep   Ψ(H_4^(k))
1       5.262·10^{-2}
2       3.824·10^{-5}
3       5.313·10^{-16}

TABLE 5.3. Convergence of the cyclic Jacobi algorithm


Formulae (5.63) and (5.62) are implemented in Programs 38 and 39.

Program 38 - psinorm : Evaluation of Ψ(A)

function [psi]=psinorm(A)
n=max(size(A)); psi=0;
for i=1:(n-1), for j=(i+1):n, psi=psi+A(i,j)^2+A(j,i)^2; end; end; psi=sqrt(psi);


Program 39 - symschur : Evaluation of c and s

function [c,s]=symschur(A,p,q)
if (A(p,q)==0), c=1; s=0; else,
eta=(A(q,q)-A(p,p))/(2*A(p,q));
if (eta >= 0), t=1/(eta+sqrt(1+eta^2));
else, t=-1/(-eta+sqrt(1+eta^2)); end; c=1/sqrt(1+t^2); s=c*t;
end

A coding of the cyclic Jacobi method is implemented in Program 40.
This program gets as input parameters the symmetric matrix A ∈ R^{n×n}
and a tolerance toll. The program returns a matrix D = G^T AG, G being orthogonal, such that Ψ(D) ≤ toll·‖A‖_F, the value of Ψ(D) and the
number of sweeps needed to achieve convergence.

Program 40 - cycjacobi : Cyclic Jacobi method for symmetric matrices

function [D,sweep,psi]=cycjacobi(A,toll)
n=max(size(A)); D=A; psiD=norm(A,'fro');
epsi=toll*psiD; psiD=psinorm(D); psi=psiD; sweep=0;
while (psiD > epsi), sweep=sweep+1;
for p=1:(n-1), for q=(p+1):n
[c,s]=symschur(D,p,q); [D]=gacol(D,c,s,1,n,p,q); [D]=garow(D,c,s,p,q,1,n);
end; end; psiD=psinorm(D); psi=[psi; psiD];
end

5.10.2 The Method of Sturm Sequences
In this section we deal with the calculation of the eigenvalues of a real,
tridiagonal and symmetric matrix T. Typical instances of such a problem
arise when applying the Householder transformation to a given symmetric
matrix A (see Section 5.6.2) or when solving boundary value problems in
one spatial dimension (see Section 5.12.1 for an example).
We analyze the method of Sturm sequences, or Givens method, introduced
in [Giv54]. For i = 1, ..., n, we denote by d_i the diagonal entries of T and by
b_i, i = 1, ..., n − 1, the elements of the upper and lower subdiagonals of T.
We shall assume that b_i ≠ 0 for any i; otherwise, indeed, the computation
reduces to problems of smaller complexity.
Letting T_i be the principal minor of order i of the matrix T and p_0(x) = 1,
we define for i = 1, ..., n the following sequence of polynomials p_i(x) =
det(T_i − xI_i):

p_1(x) = d_1 − x,
p_i(x) = (d_i − x)p_{i-1}(x) − b²_{i-1} p_{i-2}(x),   i = 2, ..., n.   (5.65)

It can be checked that p_n is the characteristic polynomial of T; the computational cost of its evaluation at a point x is of the order of 2n flops. The
sequence (5.65) is called the Sturm sequence, owing to the following result,
for whose proof we refer to [Wil65], Chapter 2, Section 47 and Chapter 5,
Section 37.

Property 5.11 (of the Sturm sequence) For i = 2, ..., n the eigenvalues
of T_{i-1} strictly separate those of T_i, that is

λ_i(T_i) < λ_{i-1}(T_{i-1}) < λ_{i-1}(T_i) < ... < λ_2(T_i) < λ_1(T_{i-1}) < λ_1(T_i).

Moreover, letting for any real number μ

S_μ = {p_0(μ), p_1(μ), ..., p_n(μ)},

the number s(μ) of sign changes in S_μ yields the number of eigenvalues of T
that are strictly less than μ, with the convention that p_i(μ) has opposite sign
to p_{i-1}(μ) if p_i(μ) = 0 (two consecutive elements in the sequence cannot
vanish at the same value of μ).

Example 5.16 Let T be the tridiagonal part of the Hilbert matrix H_4 ∈ R^{4×4},
having entries h_ij = 1/(i + j − 1). The eigenvalues of T are (to five significant
figures) λ_1 = 1.2813, λ_2 = 0.4205, λ_3 = −0.1417 and λ_4 = 0.1161. Taking μ = 0,
Program 41 computes the following Sturm sequence

S_0 = {p_0(0), p_1(0), p_2(0), p_3(0), p_4(0)} = {1, 1, 0.0833, −0.0458, −0.0089},

from which, applying Property 5.11, one concludes that the matrix T has one eigenvalue less than 0. In the case of the matrix T = tridiag_4(−1, 2, −1), with eigenvalues
{0.38, 1.38, 2.62, 3.62} (to three significant figures), we get, taking μ = 3,

{p_0(3), p_1(3), p_2(3), p_3(3), p_4(3)} = {1, −1, 0, 1, −1},

which shows that the matrix T has three eigenvalues less than 3, since three sign
changes occur. •

The Givens method for the calculation of the eigenvalues of T proceeds as
follows. Letting b_0 = b_n = 0, Theorem 5.2 yields the interval J = [α, β]
which contains the spectrum of T, where

α = min_{1≤i≤n} [d_i − (|b_{i-1}| + |b_i|)],   β = max_{1≤i≤n} [d_i + (|b_{i-1}| + |b_i|)].

The set J is used as an initial guess in the search for a generic eigenvalue
λ_i of the matrix T, for i = 1, ..., n, using the bisection method (see Chapter
6).
Precisely, given a^(0) = α and b^(0) = β, we let c^(0) = (α + β)/2 and
compute s(c^(0)); then, recalling Property 5.11, we let b^(1) = c^(0) if s(c^(0)) >
(n − i), otherwise we set a^(1) = c^(0). After r iterations, the value c^(r) =
(a^(r) + b^(r))/2 provides an approximation of λ_i within (|α| + |β|)·2^{-(r+1)},
as is shown in (6.9).
A systematic procedure can be set up to store any information about
the position within the interval J of the eigenvalues of T that are being
computed by the Givens method. The resulting algorithm generates a sequence of neighboring subintervals [a_j^(r), b_j^(r)], for j = 1, ..., n, each one of
arbitrarily small length and containing one eigenvalue λ_j of T (for further
details, see [BMW67]).

Example 5.17 Let us employ the Givens method to compute the eigenvalue
λ_2 ≃ 2.62 of the matrix T considered in Example 5.16. Letting toll = 10^{-4} in Program
42, we obtain the results reported in Table 5.4, which demonstrate the convergence
of the sequence c^(k) to the desired eigenvalue in 13 iterations. For brevity, we have
denoted s^(k) = s(c^(k)). Similar results are obtained by running Program 42 to
compute the remaining eigenvalues of T. •

k    a^(k)    b^(k)   c^(k)    s^(k)
0    0        4.000   2.0000   2
1    2.0000   4.000   3.0000   3
2    2.0000   3.000   2.5000   2
3    2.5000   3.000   2.7500   3
4    2.5000   2.750   2.6250   3
5    2.5000   2.625   2.5625   2
6    2.5625   2.625   2.5938   2
7    2.5938   2.625   2.6094   2
8    2.6094   2.625   2.6172   2
9    2.6094   2.625   2.6172   2
10   2.6172   2.625   2.6211   3
11   2.6172   2.621   2.6191   3
12   2.6172   2.619   2.6182   3
13   2.6172   2.618   2.6177   2

TABLE 5.4. Convergence of the Givens method for the calculation of the eigenvalue λ_2 of the matrix T in Example 5.16

An implementation of the polynomial evaluation (5.65) is given in Program 41. This program receives in input the vectors dd and bb containing
the main diagonal and the codiagonal of T. The output values p_i(x) are stored,
for i = 0, ..., n, in the vector p.

Program 41 - sturm : Sturm sequence evaluation
function [p]=sturm(dd,bb,x)
n=length(dd); p(1)=1; p(2)=dd(1)-x;
for i=2:n, p(i+1)=(dd(i)-x)*p(i)-bb(i-1)^2*p(i-1); end

A basic implementation of the Givens method is provided in Program
42. In input, ind is the index of the sought eigenvalue, while the other
parameters are similar to those in Program 41. In output, the elements of
the sequences a^(k), b^(k) and c^(k) are returned, together with
the required number of iterations niter and the sequence of sign changes
s(c^(k)).

Program 42 - givsturm : Givens method using the Sturm sequence
function [ak,bk,ck,nch,niter]=givsturm(dd,bb,ind,toll)
[a, b]=bound(dd,bb); dist=abs(b-a); s=abs(b)+abs(a);
n=length(dd); niter=0; nch=[];
while (dist > (toll*s)),
niter=niter+1; c=(b+a)/2;
ak(niter)=a; bk(niter)=b; ck(niter)=c;
nch(niter)=chcksign(dd,bb,c);
if (nch(niter) > (n-ind)), b=c;
else, a=c; end; dist=abs(b-a); s=abs(b)+abs(a);
end


Program 43 - chcksign : Sign changes in the Sturm sequence
function nch=chcksign(dd,bb,x)
[p]=sturm(dd,bb,x); n=length(dd); nch=0; s=0;
for i=2:(n+1),
if ((p(i)*p(i-1)) <= 0), nch=nch+1; end
if (p(i)==0), s=s+1; end
end
nch=nch-s;


Program 44 - bound : Calculation of the interval J = [±, β]
function [alfa,beta]=bound(dd,bb)
n=length(dd); alfa=dd(1)-abs(bb(1)); temp=dd(n)-abs(bb(n-1));
if (temp < alfa), alfa=temp; end;
for i=2:(n-1),
temp=dd(i)-abs(bb(i-1))-abs(bb(i));
if (temp < alfa), alfa=temp; end;
end
beta=dd(1)+abs(bb(1)); temp=dd(n)+abs(bb(n-1));
if (temp > beta), beta=temp; end;
for i=2:(n-1),
temp=dd(i)+abs(bb(i-1))+abs(bb(i));
if (temp > beta), beta=temp; end;
end



5.11 The Lanczos Method
Let A ∈ R^{n×n} be a symmetric sparse matrix, whose (real) eigenvalues are
ordered as

λ_1 ≥ λ_2 ≥ ... ≥ λ_{n-1} ≥ λ_n.   (5.66)

When n is very large, the Lanczos method [Lan50] described in Section
4.4.3 can be applied to approximate the extremal eigenvalues λ_n and λ_1. It
generates a sequence of tridiagonal matrices H_m whose extremal eigenvalues
rapidly converge to the extremal eigenvalues of A.
To estimate the convergence of the tridiagonalization process, we introduce the Rayleigh quotient r(x) = (x^T Ax)/(x^T x) associated with a nonnull
vector x ∈ R^n. The following result, known as the Courant-Fischer theorem,
holds (for the proof see [GL89], p. 394):

λ_1(A) = max_{x∈R^n, x≠0} r(x),   λ_n(A) = min_{x∈R^n, x≠0} r(x).

Its application to the matrix H_m = V_m^T A V_m yields

λ_1(H_m) = max_{x≠0} ((V_m x)^T A (V_m x))/(x^T x) = max_{‖x‖_2=1} r(V_m x) ≤ λ_1(A),

λ_m(H_m) = min_{x≠0} ((V_m x)^T A (V_m x))/(x^T x) = min_{‖x‖_2=1} r(V_m x) ≥ λ_n(A).   (5.67)


At each step of the Lanczos method, the estimates (5.67) provide a lower
and an upper bound for the extremal eigenvalues of A. The convergence of the
sequences {λ_1(H_m)} and {λ_m(H_m)} to λ_1 and λ_n, respectively, is governed
by the following property, for whose proof we refer to [GL89], pp. 475-477.

Property 5.12 Let A ∈ R^{n×n} be a symmetric matrix with eigenvalues
ordered as in (5.66) and let u_1, ..., u_n be the corresponding orthonormal
eigenvectors. If η_1, ..., η_m denote the eigenvalues of H_m, with η_1 ≥ η_2 ≥
... ≥ η_m, then

λ_1 ≥ η_1 ≥ λ_1 − (λ_1 − λ_n)(tan φ_1)² / (T_{m-1}(1 + 2ρ_1))²,

where cos φ_1 = |(q^(1))^T u_1|, ρ_1 = (λ_1 − λ_2)/(λ_2 − λ_n) and T_{m-1}(x) is the
Chebyshev polynomial of degree m − 1 (see Section 10.1.1).

A similar result holds of course for the convergence estimate of the eigenvalue η_m to λ_n:

λ_n ≤ η_m ≤ λ_n + (λ_1 − λ_n)(tan φ_n)² / (T_{m-1}(1 + 2ρ_n))²,

where ρ_n = (λ_{n-1} − λ_n)/(λ_1 − λ_{n-1}) and cos φ_n = |(q^(n))^T u_n|.

A naive implementation of the Lanczos algorithm can be affected by numerical instability due to the propagation of rounding errors. In particular, the
Lanczos vectors will not satisfy the mutual orthogonality relation, making
the extremal properties (5.67) false. This requires careful programming of
the Lanczos iteration, incorporating suitable reorthogonalization procedures as described in [GL89], Sections 9.2.3-9.2.4.
Despite this limitation, the Lanczos method has two relevant features:
it preserves the sparsity pattern of the matrix (unlike Householder tridiag-
onalization), and such a property makes it quite attractive when dealing
with large size matrices; furthermore, it converges to the extremal eigen-
values of A much more rapidly than the power method does (see [Kan66],
[GL89], p. 477).
The Lanczos method can be generalized to compute the extremal eigen-
values of an unsymmetric matrix along the same lines as in Section 4.5
in the case of the solution of a linear system. Details on the practical im-
plementation of the algorithm and a theoretical convergence analysis can
be found in [LS96] and [Jia95], while some documentation of the latest
software can be found in NETLIB/scalapack/readme.arpack (see also the
MATLAB command eigs).

An implementation of the Lanczos algorithm is provided in Program 45.
The input parameter m is the size of the Krylov subspace in the tridiago-
nalization procedure, while toll is a tolerance monitoring the size of the
increment of the computed eigenvalues between two successive iterations.
The output vectors lmin, lmax and deltaeig contain the sequences of the
approximate extremal eigenvalues and of their increments between succes-
sive iterations. Program 42 is invoked for computing the eigenvalues of the
tridiagonal matrix Hm .

Program 45 - eiglancz : Extremal eigenvalues of a symmetric matrix
function [lmin,lmax,deltaeig,k]=eiglancz(A,m,toll)
n=size(A,1); V=[0*[1:n]',[1,0*[1:n-1]]'];
beta(1)=0; normb=1; k=1; deltaeig(1)=1;
while k <= m & normb >= eps & deltaeig(k) > toll
vk = V(:,k+1); w = A*vk-beta(k)*V(:,k);
alpha(k)= w'*vk; w = w - alpha(k)*vk;
normb = norm(w,2); beta(k+1)=normb;
if normb ~= 0
V=[V,w/normb];
if k==1
lmin(1)=alpha; lmax(1)=alpha;
k=k+1; deltaeig(k)=1;
else
d=alpha; b=beta(2:length(beta)-1);
[ak,bk,ck,nch,niter]=givsturm(d,b,1,toll);
lmax(k)=(ak(niter)+bk(niter))/2;
[ak,bk,ck,nch,niter]=givsturm(d,b,k,toll);
lmin(k)=(ak(niter)+bk(niter))/2;
deltaeig(k+1)=max(abs(lmin(k)-lmin(k-1)),abs(lmax(k)-lmax(k-1)));
k=k+1;
end
else
disp('Breakdown');
d=alpha; b=beta(2:length(beta)-1);
[ak,bk,ck,nch,niter]=givsturm(d,b,1,toll);
lmax(k)=(ak(niter)+bk(niter))/2;
[ak,bk,ck,nch,niter]=givsturm(d,b,k,toll);
lmin(k)=(ak(niter)+bk(niter))/2;
deltaeig(k+1)=max(abs(lmin(k)-lmin(k-1)),abs(lmax(k)-lmax(k-1)));
k=k+1;
end
end
k=k-1;
return


Example 5.18 Consider the eigenvalue problem for the matrix A ∈ R^{n×n} with
n = 100, having diagonal entries equal to 2 and off-diagonal entries equal to −1
on the upper and lower tenth diagonals. Program 45, with m = 100 and toll = eps,
takes 10 iterations to approximate the extremal eigenvalues of A with an absolute
error of the order of the machine precision. •




5.12 Applications
A classical problem in engineering is to determine the proper or natural
frequencies of a system (mechanical, structural or electric). Typically, this
leads to solving a matrix eigenvalue problem. Two examples coming from
structural applications are presented in the forthcoming sections where the
buckling problem of a beam and the study of the free vibrations of a bridge
are considered.
5.12.1 Analysis of the Buckling of a Beam
Consider the homogeneous and thin beam of length L shown in Figure
5.4. The beam is simply supported at the ends and is subject to a normal
compression load P at x = L. Denote by y(x) the vertical displacement
of the beam; the structure constraints demand that y(0) = y(L) = 0.

FIGURE 5.4. A simply supported beam subject to a normal compression load

Let us consider the problem of the buckling of the beam. This amounts to
determining the critical load P_cr, i.e., the smallest value of P such that an
equilibrium configuration of the beam exists which differs from the
rectilinear one. Reaching the condition of critical load is a warning of structural
instability, so that it is quite important to determine its value accurately.

The explicit computation of the critical load can be worked out under
the assumption of small displacements, writing the equilibrium equation
for the structure in its deformed configuration (drawn in dashed line in
Figure 5.4)

−E (J(x)y′(x))′ = M_e(x),   0 < x < L,
y(0) = y(L) = 0,   (5.68)

where E is the constant Young's modulus of the beam and M_e(x) = P y(x)
is the moment of the load P with respect to a generic point of the beam
of abscissa x. In (5.68) we are assuming that the moment of inertia
J can vary along the beam, which indeed happens if the beam has a
nonuniform cross-section.
Equation (5.68) expresses the equilibrium between the external moment
M_e and the internal moment M_i = −E(Jy′)′, which tends to restore
the rectilinear equilibrium configuration of the beam. If the stabilizing reaction M_i prevails over the destabilizing action M_e, the equilibrium of the
initial rectilinear configuration is stable. The critical situation (buckling of
the beam) clearly arises when M_i = M_e.
Assume that J is constant and let α² = P/(EJ); solving the boundary value
problem (5.68), we get the equation C sin(αL) = 0, which admits nontrivial
solutions α = kπ/L, k = 1, 2, .... Taking k = 1 yields the value of the
critical load P_cr = π²EJ/L².

To solve numerically the boundary value problem (5.68) it is convenient to introduce, for n ≥ 1, the discretization nodes x_j = jh, with
h = L/(n + 1) and j = 1, ..., n, thus defining the vector of approximate
nodal displacements u_j at the internal nodes x_j (where u_0 = y(0) = 0,
u_{n+1} = y(L) = 0). Then, using the finite difference method (see Section
12.2), the calculation of the critical load amounts to determining the smallest eigenvalue of the tridiagonal, symmetric and positive definite matrix
A = tridiag_n(−1, 2, −1) ∈ R^{n×n}.
It can indeed be checked that the finite difference discretization of problem (5.68) by centered differences leads to the following matrix eigenvalue
problem

Au = α²h²u,

where u ∈ R^n is the vector of nodal displacements u_j. The discrete counterpart of the condition C sin(αL) = 0 requires that Ph²/(EJ) coincides with
an eigenvalue of A as P varies.
Denoting by λ_min the smallest eigenvalue of A and by P_cr^h the (approximate) value of the critical load, respectively, then P_cr^h = (λ_min EJ)/h².
Letting θ = π/(n + 1), it can be checked (see Exercise 3, Chapter 4) that
the eigenvalues of the matrix A are

λ_j = 2(1 − cos(jθ)),   j = 1, ..., n.   (5.69)

The numerical calculation of λ_min has been carried out using the Givens
algorithm described in Section 5.10.2 and assuming n = 10. Running
Program 42 with an absolute tolerance equal to the roundoff unit, the
value λ_min ≃ 0.081 has been obtained after 57 iterations.
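
This computation can be cross-checked against (5.69) with a few MATLAB lines (a minimal sketch; the normalizations L = 1 and E = J = 1 below are arbitrary assumptions, not data from the text):

n = 10; h = 1/(n+1);                        % unit-length beam, L = 1 (assumption)
A = diag(2*ones(n,1)) - diag(ones(n-1,1),1) - diag(ones(n-1,1),-1);
lmin = min(eig(A));                         % smallest eigenvalue of tridiag(-1,2,-1)
lmin_exact = 2*(1-cos(pi/(n+1)));           % formula (5.69) with j = 1, about 0.081
E = 1; J = 1;                               % arbitrary normalization (assumption)
Pcr_h = lmin*E*J/h^2;                       % approximate critical load, about 9.80
Pcr   = pi^2*E*J;                           % exact value pi^2*E*J/L^2, about 9.87
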
It is also interesting to analyze the case where the beam has a nonuniform
cross-section, since the value of the critical load, unlike the previous situation, is not exactly known a priori. We assume that, for each x ∈ [0, L], the
section of the beam is rectangular, with fixed depth a and height σ that
varies according to the rule

σ(x) = s [1 + (S/s − 1)(x/L − 1)²],   0 ≤ x ≤ L,

where S and s are the values at the ends, with S ≥ s > 0. The moment
of inertia, as a function of x, is given by J(x) = (1/12)aσ³(x); proceeding
similarly as before, we end up with a system of linear algebraic equations
of the form

Ãu = (P/E)h²u,

where this time à = tridiag_n(b, d, b) is a tridiagonal, symmetric and positive definite matrix having diagonal entries d_i = J(x_{i-1/2}) + J(x_{i+1/2}), for
i = 1, ..., n, and off-diagonal entries b_i = −J(x_{i+1/2}), for i = 1, ..., n − 1.
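
A minimal sketch of the assembly of à for the variable cross-section (assuming the rule for σ(x) given above, with the parameter values used below in the text):

n = 10; L = 10; h = L/(n+1);
a = 0.4; s = a; S = 0.5;
sigma = @(x) s*(1 + (S/s - 1)*(x/L - 1).^2);   % height of the section
J = @(x) a*sigma(x).^3/12;                      % moment of inertia
xh = h*((1:n+1) - 1/2);                         % half-integer nodes x_{i-1/2}
d = J(xh(1:n)) + J(xh(2:n+1));                  % diagonal entries d_i
b = -J(xh(2:n));                                % off-diagonal entries b_i
Atil = diag(d) + diag(b,1) + diag(b,-1);
lmin = min(eig(Atil));                          % approximates (P_cr/E)*h^2
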
Assume the following values of the parameters: a = 0.4 [m], s = a, S =
0.5 [m] and L = 10 [m]. To ensure a correct dimensional comparison, we
have multiplied by J̄ = a⁴/12 the smallest eigenvalue of the matrix A in the
uniform case (corresponding to S = s = a), obtaining λ_min = 1.7283·10^{-4}.
Running Program 42, with n = 10, yields in the nonuniform case the value
λ_min = 2.243·10^{-4}. This result confirms that the critical load increases
for a beam having a wider section at x = 0; that is, the structure enters
the instability regime at higher values of the load than in the uniform
cross-section case.


5.12.2 Free Dynamic Vibration of a Bridge
We are concerned with the analysis of the free response of a bridge whose
schematic structure is shown in Figure 5.5. The number of nodes of
the structure is equal to 2n, while the number of beams is 5n. Each
horizontal and vertical beam has a mass equal to m, while the diagonal
beams have a mass equal to m√2. The stiffness of each beam is represented
by the spring constant κ. The nodes labeled by “0” and “2n + 1” are
constrained to the ground.

FIGURE 5.5. Schematic structure of a bridge

Denoting by x and y the vectors of the 2n nodal horizontal and vertical
displacements, the free response of the bridge can be studied by solving the
generalized eigenvalue problems

Mx = λKx,   My = λKy,   (5.70)

where M = m·diag_{2n}(α, b, α, γ, b, γ), with α = 3 + √2, b = (β, ..., β)^T ∈
R^{n-2}, β = 3/2 + √2 and γ = 1 + √2, and

K = κ [ K_11  K_12 ]
      [ K_12  K_11 ]

for a positive constant κ, where K_12 = tridiag_n(−1, −1, −1) and K_11 =
tridiag_n(−1, d, −1) with d = (4, 5, ..., 5, 4)^T ∈ R^n. The diagonal matrix
M is the mass matrix, while the symmetric and positive definite matrix K
is the stiffness matrix.
FIGURE 5.6. Number of iterations of the Lanczos method and of the inverse power
method versus the size 2n of the matrix C. The solid and dash-dotted curves
refer to the inverse power method (for λ̃_2n and λ̃_{2n-1}, respectively), while the
dashed and dotted curves refer to the Lanczos method (again for λ̃_2n and
λ̃_{2n-1}, respectively)

For k = 1, ..., 2n we denote by (λ_k, z_k) any eigenvalue/eigenvector pair
of (5.70) and call ω_k = √λ_k the natural frequencies and z_k the modes of
vibration of the bridge. The study of the free vibrations is of primary importance in the design of a structure like a bridge or a multi-story building.
Indeed, if the excitation frequency of an external force (vehicles, wind or,
even worse, an earthquake) coincides with one of the natural frequencies of
the structure, then a condition of resonance occurs and, as a result, dangerously
large oscillations may arise.

Let us now deal with the numerical solution of the matrix eigenvalue
problem (5.70). For this purpose we introduce the change of variable z =
M^{1/2}x (or z = M^{1/2}y), so that each generalized eigenvalue problem in (5.70)
can be conveniently reformulated as

Cz = λ̃z,

where λ̃ = 1/λ and the matrix C = M^{-1/2}KM^{-1/2} is symmetric positive
definite. This property allows us to use the Lanczos method described in
Section 5.11 and also ensures quadratic convergence of the power iterations
(see Section 5.11).
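
A minimal MATLAB sketch of this reformulation, assembling M and K as defined above for m = κ = 1 (an illustrative fragment, not one of the numbered Programs):

n = 5; m = 1; kappa = 1;
al = 3 + sqrt(2); be = 3/2 + sqrt(2); ga = 1 + sqrt(2);
M = m*diag([al, be*ones(1,n-2), al, ga, be*ones(1,n-2), ga]);
d = [4, 5*ones(1,n-2), 4];
K11 = diag(d) - diag(ones(1,n-1),1) - diag(ones(1,n-1),-1);
K12 = -(eye(n) + diag(ones(1,n-1),1) + diag(ones(1,n-1),-1));
K = kappa*[K11 K12; K12 K11];
C = diag(1./sqrt(diag(M))) * K * diag(1./sqrt(diag(M)));  % M^{-1/2} K M^{-1/2}
ltil = sort(eig(C));   % ltil(1:2) are the subdominant pair discussed below
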
We approximate the first two subdominant eigenvalues λ̃_2n and λ̃_{2n-1}
of the matrix C (i.e., its smallest and second smallest eigenvalues) in the
case m = κ = 1 using the deflation procedure considered in Remark 5.3.
The inverse power iteration and the Lanczos method are compared in the
computation of λ̃_2n and λ̃_{2n-1} in Figure 5.6.
The results show the superiority of the Lanczos method over the inverse
iterations only when the matrix C is of small size. This is to be ascribed
to the fact that, as n grows, the progressive influence of the rounding errors causes a loss of mutual orthogonality of the Lanczos vectors and, in
turn, an increase in the number of iterations needed for the method to converge.
Suitable reorthogonalization procedures are thus needed to improve the
performance of the Lanczos iteration, as pointed out in Section 5.11.

We conclude the free response analysis of the bridge by showing in Figure
5.7 (in the case n = 5, m = 10 and κ = 1) the modes of vibration z_8
and z_10 corresponding to the natural frequencies ω_8 = 990.42 and ω_10 =
2904.59. The MATLAB built-in function eig has been employed to solve
the generalized eigenvalue problems (5.70), as explained in Section 5.9.1.
FIGURE 5.7. Modes of vibration corresponding to the natural frequencies ω_8
(left) and ω_10 (right). The undeformed configuration of the bridge is drawn in
dotted line




5.13 Exercises
1. Using the Gershgorin theorems, localize the eigenvalues of the matrix A
which is obtained by setting A = (P^{-1}DP)^T and then a_13 = 0, a_23 = 0, where
D = diag_3(1, 50, 100) and

   P = [   1   1   1 ]
       [  10  20  30 ]
       [ 100  50  60 ].

[Solution: σ(A) = {−151.84, 80.34, 222.5}.]
2. Localize the spectrum of the matrix

   A = [  1  2 −1 ]
       [  2  7  0 ]
       [ −1  0  5 ].

[Solution: σ(A) ⊂ [−2, 9].]
3. Draw the oriented graph of the matrix

   A = [  1  3  0 ]
       [  0  2 −1 ]
       [ −1  0  2 ].


4. Check if the following matrices are reducible.

   A_1 = [  1  0 −1  0 ]       A_2 = [ 0 0 1 0 ]
         [  2  3 −2  1 ]             [ 1 0 0 0 ]
         [ −1  0 −2  0 ]             [ 0 0 0 1 ]
         [  1 −1  1  4 ],            [ 0 1 0 0 ].

[Solution: A_1, reducible; A_2, irreducible.]
5. Provide an estimate of the number of complex eigenvalues of the matrix

   A = [ −4    0    0    0.5  0 ]
       [  2    2    4   −3    1 ]
       [  0.5  0   −1    0    0 ]
       [  0.5  0    0.2  3    0 ]
       [  2    0.5 −1    3    4 ].

[Hint: Check that A can be reduced to the form

   A = [ M_1  M_2 ]
       [  0   M_3 ]

where M_1 ∈ R^{2×2} and M_3 ∈ R^{3×3}. Then, study the eigenvalues of the blocks M_1
and M_3 using the Gershgorin theorems and check that A has no complex
eigenvalues.]
6. Let A ∈ C^{n×n} be a diagonal matrix and let Ã = A + E be a perturbation
of A with e_ii = 0 for i = 1, ..., n. Show that

   |λ_i(Ã) − λ_i(A)| ≤ Σ_{j=1}^{n} |e_ij|,   i = 1, ..., n.   (5.71)



7. Apply estimate (5.71) to the case in which A and E are, for ε ≥ 0, the
matrices

   A = [ 1 0 ]      E = [ 0 ε ]
       [ 0 2 ],         [ ε 0 ].

[Solution: σ(A) = {1, 2} and σ(Ã) = {(3 ∓ √(1 + 4ε²))/2}.]
8. Check that finding the zeros of a polynomial of degree ≤ n with real coefficients

   p_n(x) = Σ_{k=0}^{n} a_k x^k = a_0 + a_1x + ... + a_nx^n,   a_n ≠ 0, a_k ∈ R, k = 0, ..., n,

is equivalent to determining the spectrum of the Frobenius matrix C ∈
R^{n×n} associated with p_n (known as the companion matrix)

   C = [ −(a_{n-1}/a_n)  −(a_{n-2}/a_n)  ...  −(a_1/a_n)  −(a_0/a_n) ]
       [       1               0         ...       0           0     ]
       [       0               1         ...       0           0     ]
       [       ⋮               ⋮          ⋱        ⋮           ⋮     ]
       [       0               0         ...       1           0     ].   (5.72)

An important consequence of the result above is that, due to Abel's theorem, there exist in general no direct methods for computing the eigenvalues
of a given matrix, for n ≥ 5.
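
This equivalence is easy to check numerically; a minimal sketch (the cubic below is an arbitrary example, not from the text):

% p(x) = x^3 - 6x^2 + 11x - 6 = (x-1)(x-2)(x-3)
a = [1 -6 11 -6];              % coefficients in MATLAB ordering (highest first)
C = [6 -11 6; 1 0 0; 0 1 0];   % companion matrix (5.72) of p
disp(sort(eig(C)'));            % eigenvalues of C: 1 2 3
disp(sort(roots(a)'));          % zeros of p: 1 2 3 (roots itself exploits this idea)
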
9. Show that if the matrix A ∈ C^{n×n} admits the eigenvalue/eigenvector pair (λ, x),
then the matrix U^H AU, with U unitary, admits the eigenvalue/eigenvector
pair (λ, U^H x). (Similarity transformation using a unitary matrix.)
10. Suppose that all the assumptions needed to apply the power method are
satisfied except for the requirement α_1 ≠ 0 (see Section 5.3.1). Show that
in such a case the sequence (5.17) converges to the eigenvalue/eigenvector
pair (λ_2, x_2). Then, study experimentally the behaviour of the method,
computing the pair (λ_1, x_1) for the matrix

   A = [  1 −1  2 ]
       [ −2  0  5 ]
       [  6 −3  6 ].

For this, use Program 26, taking q^(0) = 1^T/√3 and q^(0) = w^(0)/‖w^(0)‖_2,
respectively, where w^(0) = (1/3)x_2 − (2/3)x_3.
[Solution: λ_1 = 5, λ_2 = 3, λ_3 = −1 and x_1 = [5, 16, 18]^T, x_2 = [1, 6, 4]^T,
x_3 = [1, 2, 0]^T.]
11. Show that the companion matrix associated with the polynomial p_n(x) =
x^n + a_nx^{n-1} + ... + a_1 can be written in the alternative form (5.72)

   A = [  0   0  ···   0   a_1     ]
       [ −1   0  ···   0   a_2     ]
       [  0  −1  ···   0   a_3     ]
       [  ⋮        ⋱        ⋮      ]
       [  0   0  ···  −1   a_n     ].

12. (From [FF63]) Suppose that a real matrix A ∈ R^{n×n} has two maximum
modulus complex eigenvalues given by λ_1 = ρe^{iθ} and λ_2 = ρe^{-iθ}, with
θ ≠ 0. Assume, moreover, that the remaining eigenvalues have moduli
less than ρ. The power method can then be modified as follows:
let q^(0) be a real vector and q^(k) be the vector provided by the power
method without normalization. Then, set x_k = q^(k)_{n_0} for some n_0, with
1 ≤ n_0 ≤ n. Prove that

   ρ² = (x_k x_{k+2} − x²_{k+1}) / (x_{k-1} x_{k+1} − x²_k) + O((λ_3/ρ)^k),

   cos(θ) = (ρx_{k-1} + ρ^{-1}x_{k+1}) / (2x_k) + O((λ_3/ρ)^k).

[Hint: first, show that

   x_k = C(ρ^k cos(kθ + α)) + O((λ_3/ρ)^k),

where α depends on the components of the initial vector along the directions of the eigenvectors associated with λ_1 and λ_2.]
13. Apply the modified power method of Exercise 12 to the matrix

   A = [ 1  −1/4  1/4 ]
       [ 1    0    0  ]
       [ 0    1    0  ],

and compare the obtained results with those yielded by the standard power
method.
14. (Taken from [Dem97]) Apply the QR iteration with double shift to compute the eigenvalues of the matrix

   A = [ 0 0 1 ]
       [ 1 0 0 ]
       [ 0 1 0 ].

Run Program 37 setting toll = eps, itmax = 100 and comment on the
form of the matrix T^(iter) obtained after iter iterations of the algorithm.
[Solution: the eigenvalues of A are the solutions of λ³ − 1 = 0, i.e.,
σ(A) = {1, −1/2 ± (√3/2)i}. After iter = 100 iterations, Program 37 yields
the matrix

   T^(100) = [ 0  0 −1 ]
             [ 1  0  0 ]
             [ 0 −1  0 ],

which means that the QR iteration leaves A unchanged (except for sign
changes that are not relevant for the computation of the eigenvalues). This is a simple
but glaring example of a matrix for which the QR method with double shift
fails to converge.]
6
Rootfinding for Nonlinear Equations




This chapter deals with the numerical approximation of the zeros of a real-valued function of one variable, that is

given f : I = (a, b) ⊆ R → R, find α ∈ C such that f(α) = 0.   (6.1)

The analysis of problem (6.1) in the case of systems of nonlinear equations
will be addressed in Chapter 7.
Methods for the numerical approximation of a zero of f are usually iterative. The aim is to generate a sequence of values x^(k) such that

lim_{k→∞} x^(k) = α.

The convergence of the iteration is characterized by the following definition.

Definition 6.1 A sequence {x^(k)} generated by a numerical method is
said to converge to α with order p ≥ 1 if

∃C > 0 :  |x^(k+1) − α| / |x^(k) − α|^p ≤ C,   ∀k ≥ k_0,   (6.2)

where k_0 ≥ 0 is a suitable integer. In such a case, the method is said to be
of order p. Notice that if p is equal to 1, in order for x^(k) to converge to
α it is necessary that C < 1 in (6.2). In such an event, the constant C is
called the convergence factor of the method.
Unlike the case of linear systems, the convergence of iterative methods for
rootfinding of nonlinear equations depends in general on the choice of the
initial datum x^(0). This allows for establishing only local convergence results, that is, holding for any x^(0) which belongs to a suitable neighborhood
of the root α. Methods for which convergence to α holds for any choice of
x^(0) in the interval I are said to be globally convergent to α.


6.1 Conditioning of a Nonlinear Equation
Consider the nonlinear equation f(x) = φ(x) − d = 0 and assume that f
is a continuously differentiable function. Let us analyze the sensitivity of
finding the roots of f with respect to changes in the datum d.
The problem is well posed only if the function φ is invertible. In such a
case, indeed, one gets α = φ^{-1}(d), from which, using the notation of Chapter
2, the resolvent G is φ^{-1}. On the other hand, (φ^{-1})′(d) = 1/φ′(α), so that
formula (2.7) for the approximate condition number (relative and absolute)
yields

K(d) ≃ |d| / (|α||f′(α)|),   K_abs(d) ≃ 1/|f′(α)|.   (6.3)

The problem is thus ill-conditioned when f′(α) is “small” and well-conditioned if f′(α) is “large”.
The analysis which leads to (6.3) can be generalized to the case in which
α is a root of f with multiplicity m > 1 as follows. Expanding φ in a Taylor
series around α up to the m-th order term, we get

d + δd = φ(α + δα) = φ(α) + Σ_{k=1}^{m} (φ^(k)(α)/k!)(δα)^k + o((δα)^m).

Since φ^(k)(α) = 0 for k = 1, ..., m − 1, we obtain

δd = f^(m)(α)(δα)^m / m!,

so that an approximation to the absolute condition number is

K_abs(d) ≃ |m!δd / f^(m)(α)|^{1/m} (1/|δd|).   (6.4)

Notice that (6.3) is the special case of (6.4) where m = 1. From this it also
follows that, even if δd is sufficiently small to make |m!δd/f^(m)(α)| < 1,
K_abs(d) could nevertheless be a large number. We therefore conclude that
the problem of rootfinding for a nonlinear equation is well-conditioned if α
is a simple root and |f′(α)| is definitely different from zero, and ill-conditioned
otherwise.
Let us now consider the following problem, which is closely connected
with the previous analysis. Assume d = 0 and let α be a simple root of f;
moreover, for α̂ ≠ α, let f(α̂) = r̂ ≠ 0. We seek a bound for the difference
α̂ − α as a function of the residual r̂. Applying (6.3) yields

K_abs(0) ≃ 1/|f′(α)|.

Therefore, letting δx = α̂ − α and δd = r̂ in the definition of K_abs (see
(2.5)), we get

|α̂ − α|/|α| ≲ |r̂| / (|f′(α)||α|),   (6.5)

where the following convention has been adopted: if a ≤ b and b ≃ c, then
we write a ≲ c. If α has multiplicity m > 1, using (6.4) instead of (6.3) and
proceeding as above, we get

|α̂ − α|/|α| ≲ (m! / (|f^(m)(α)||α|^m))^{1/m} |r̂|^{1/m}.   (6.6)

These estimates will be useful in the analysis of stopping criteria for iterative methods (see Section 6.5).
A remarkable example of a nonlinear problem arises when f is a polynomial
p_n of degree n, in which case it admits exactly n roots α_i, real or complex,
each one counted with its multiplicity. We want to investigate the sensitivity
of the roots of p_n with respect to changes in its coefficients.
To this end, let p̂_n = p_n + q_n, where q_n is a perturbation polynomial of
degree n, and let α̂_i be the corresponding roots of p̂_n. A direct use of (6.6)
yields for any root α_i the following estimate

E^i_rel = |α̂_i − α_i|/|α_i| ≲ (m! / (|p_n^(m)(α_i)||α_i|^m))^{1/m} |q_n(α̂_i)|^{1/m} = S^i,   (6.7)

where m is the multiplicity of the root at hand and q_n(α̂_i) = −p_n(α̂_i) is
the “residual” of the polynomial p_n evaluated at the perturbed root.

Remark 6.1 A formal analogy exists between the a priori estimates so
far obtained for the nonlinear problem φ(α) = d and those developed in
Section 3.1.2 for linear systems, provided that A corresponds to φ and b
to d. More precisely, (6.5) is the analogue of (3.9) if δA = 0, and the same
holds for (6.7) (for m = 1) if δb = 0.

Example 6.1 Let p_4(x) = (x − 1)⁴, and let p̂_4(x) = (x − 1)⁴ − ε, with 0 < ε ≪ 1.
The roots of the perturbed polynomial are simple and equal to α̂_i = α_i + ε^{1/4},
where α_i = 1 are the (coincident) zeros of p_4. They lie at intervals of π/2 on
the circle of radius ε^{1/4} and center z = (1, 0) in the complex plane.
The problem is stable (that is, lim_{ε→0} α̂_i = 1), but is ill-conditioned since

|α̂_i − α_i|/|α_i| = ε^{1/4},   i = 1, ..., 4.

For example, if ε = 10^{-4} the relative change is 10^{-1}. Notice that the right-hand side
of (6.7) is just ε^{1/4}, so that, in this case, (6.7) becomes an equality. •


Example 6.2 (Wilkinson) Consider the following polynomial

p_10(x) = Π_{k=1}^{10}(x + k) = x^{10} + 55x⁹ + ... + 10!.

Let p̂_10 = p_10 + εx⁹, with ε = 2^{-23} ≃ 1.2·10^{-7}. Let us study the conditioning of
finding the roots of p_10. Using (6.7) with m = 1, we report for i = 1, ..., 10 in
Table 6.1 the relative errors E^i_rel and the corresponding estimates S^i.
These results show that the problem is ill-conditioned, since the maximum
relative error, attained for the root α_8 = −8, is three orders of magnitude larger than
the corresponding absolute perturbation. Moreover, excellent agreement can be
observed between the a priori estimate and the actual relative error. •

i    E^i_rel         S^i
1    3.039·10^{-13}  3.285·10^{-13}
2    7.562·10^{-10}  7.568·10^{-10}
3    7.758·10^{-8}   7.759·10^{-8}
4    1.808·10^{-6}   1.808·10^{-6}
5    1.616·10^{-5}   1.616·10^{-5}
6    6.956·10^{-5}   6.956·10^{-5}
7    1.589·10^{-4}   1.588·10^{-4}
8    1.984·10^{-4}   1.987·10^{-4}
9    1.273·10^{-4}   1.271·10^{-4}
10   3.283·10^{-5}   3.286·10^{-5}

TABLE 6.1. Relative error and estimated error using (6.7) for the Wilkinson
polynomial of degree 10
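
The effect described in this example can be reproduced with a few MATLAB lines, using the built-in poly and roots (a minimal sketch, not one of the numbered Programs):

k = 1:10;
p = poly(-k);                       % coefficients of p10(x) = (x+1)(x+2)...(x+10)
ep = 2^(-23);
phat = p; phat(2) = phat(2) + ep;   % perturb the coefficient of x^9
rhat = sort(roots(phat));           % perturbed roots (for this ep they stay real)
r = sort(-k');                      % exact roots, in the order -10, ..., -1
Erel = abs(rhat - r)./abs(r);       % relative errors, to compare with Table 6.1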




6.2 A Geometric Approach to Rootfinding
In this section we introduce the following methods for finding roots: the
bisection method, the chord method, the secant method, the false position
(or Regula Falsi) method and Newton's method. The order of the presentation reflects the growing complexity of the algorithms. In the case of the
bisection method, indeed, the only information that is being used is the sign
of the function f at the end points of any bisection (sub)interval, whilst
the remaining algorithms also take into account the values of the function
and/or its derivative.


6.2.1 The Bisection Method
The bisection method is based on the following property.
Property 6.1 (theorem of zeros for continuous functions) Given a
continuous function f : [a, b] → R such that f(a)f(b) < 0, then ∃α ∈ (a, b)
such that f(α) = 0.

Starting from I_0 = [a, b], the bisection method generates a sequence of
subintervals I_k = [a^(k), b^(k)], k ≥ 0, with I_k ⊂ I_{k-1}, k ≥ 1, and enjoys the
property that f(a^(k))f(b^(k)) < 0. Precisely, we set a^(0) = a, b^(0) = b and
x^(0) = (a^(0) + b^(0))/2; then, for k ≥ 0:

set a^(k+1) = a^(k), b^(k+1) = x^(k) if f(x^(k))f(a^(k)) < 0;
set a^(k+1) = x^(k), b^(k+1) = b^(k) if f(x^(k))f(b^(k)) < 0;
finally, set x^(k+1) = (a^(k+1) + b^(k+1))/2.


FIGURE 6.1. The bisection method: the first two steps (left); convergence history
for Example 6.3 (right). The number of iterations and the absolute error as
a function of k are reported on the x- and y-axis, respectively

The bisection iteration terminates at the m-th step for which |x^(m) − α| ≤
|I_m| ≤ ε, where ε is a fixed tolerance and |I_m| is the length of I_m. As for
the speed of convergence of the bisection method, notice that |I_0| = b − a,
while

|I_k| = |I_0|/2^k = (b − a)/2^k,   k ≥ 0.   (6.8)

Denoting by e^(k) = x^(k) − α the absolute error at step k, from (6.8) it
follows that |e^(k)| ≤ (b − a)/2^k, k ≥ 0, which implies lim_{k→∞} |e^(k)| = 0.
The bisection method is therefore globally convergent. Moreover, to get
|x^(m) − α| ≤ ε we must take

m ≥ log_2(b − a) − log_2(ε) = log((b − a)/ε)/log(2) ≃ log((b − a)/ε)/0.6931.   (6.9)

In particular, to gain a significant figure in the accuracy of the approximation of the root (that is, to have |x^(k) − α| = |x^(j) − α|/10), one needs
k − j = log_2(10) ≃ 3.32 bisections. This singles out the bisection method as
an algorithm of certain, but slow, convergence. We must also point out that
the bisection method does not generally guarantee a monotone reduction
of the absolute error between two successive iterations, that is, we cannot
ensure a priori that

|e^(k+1)| ≤ M_k |e^(k)|, for any k ≥ 0,   (6.10)

with M_k < 1. For this purpose, consider the situation depicted in Figure
6.1 (left), where clearly |e^(1)| > |e^(0)|. Failure to satisfy (6.10) does not
allow for qualifying the bisection method as a method of order 1, in the
sense of Definition 6.1.

Example 6.3 Let us check the convergence properties of the bisection method
in the approximation of the root α ≃ 0.9062 of the Legendre polynomial of degree
5,

L_5(x) = (x/8)(63x^4 − 70x^2 + 15),

whose roots lie within the interval (−1, 1) (see Section 10.1.2). Program 46 has
been run taking a = 0.6, b = 1 (whence L_5(a)·L_5(b) < 0), nmax = 100 and
toll = 10^{-10}, and has reached convergence in 32 iterations; this agrees with
the theoretical estimate (6.9) (indeed, m ≥ 31.8974). The convergence history is
reported in Figure 6.1 (right) and shows an (average) reduction of the error by a
factor of two, with an oscillating behavior of the sequence {x^(k)}. •
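
For instance, the run described above corresponds to a call like the following (a sketch; recall that Program 46 evaluates the string fun with eval, so the expression must use x as the independent variable):

fun = 'x/8*(63*x^4-70*x^2+15)';               % Legendre polynomial L5
[xvect,xdif,fx,nit] = bisect(0.6,1,1e-10,100,fun);
xvect(end), nit                               % approximate root and iteration count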

The slow reduction of the error suggests employing the bisection method
as an “approaching” technique to the root. Indeed, taking a few bisection
steps, a reasonable approximation to α is obtained, starting from which a
higher-order method can be successfully used for a rapid convergence to
the solution within the fixed tolerance. An example of such a procedure
will be addressed in Section 6.7.1.
The bisection algorithm is implemented in Program 46. The input parameters, here and in the remainder of this chapter, have the following
meaning: a and b denote the end points of the search interval, fun is the
variable containing the expression of the function f, toll is a fixed tolerance and nmax is the maximum admissible number of steps for the iterative
process.
In the output vectors xvect, xdif and fx the sequences {x^(k)}, {|x^(k+1) −
x^(k)|} and {f(x^(k))}, for k ≥ 0, are respectively stored, while nit denotes
the number of iterations needed to satisfy the stopping criterion. In the case
of the bisection method, the code returns as soon as the half-length of the
search interval is less than toll.

Program 46 - bisect : Bisection method
function [xvect,xdif,fx,nit]=bisect(a,b,toll,nmax,fun)
err=toll+1; nit=0; xvect=[]; fx=[]; xdif=[];
while (nit < nmax & err > toll)
nit=nit+1; c=(a+b)/2; x=c; fc=eval(fun); xvect=[xvect;x];
fx=[fx;fc]; x=a; if (fc*eval(fun) > 0), a=c; else, b=c; end;
err=abs(b-a); xdif=[xdif;err];
end;



6.2.2 The Methods of Chord, Secant and Regula Falsi, and
Newton's Method
In order to devise algorithms with better convergence properties than the
bisection method, it is necessary to include information from the values
attained by f and, possibly, also by its derivative f′ (if f is differentiable)
or by a suitable approximation of it.
For this purpose, let us expand f in a Taylor series around α and truncate
the expansion at the first order. The following linearized version of problem
(6.1) is obtained:

f(α) = 0 = f(x) + (α − x)f′(ξ),   (6.11)

for a suitable ξ between α and x. Equation (6.11) prompts the following
iterative method: for any k ≥ 0, given x^(k), determine x^(k+1) by solving the
equation f(x^(k)) + (x^(k+1) − x^(k))q_k = 0, where q_k is a suitable approximation of f′(x^(k)).
The method described here amounts to finding the intersection between
the x-axis and the straight line of slope q_k passing through the point
(x^(k), f(x^(k))), and thus can be more conveniently set up in the form

x^(k+1) = x^(k) − q_k^{-1} f(x^(k)),   ∀k ≥ 0.

We consider below four particular choices of q_k.

FIGURE 6.2. The first step of the chord method (left) and the first three steps
of the secant method (right). For the secant method we set x^(-1) = b and x^(0) = a
The chord method. We let

q_k = q = (f(b) − f(a))/(b − a),   ∀k ≥ 0,

from which, given an initial value x^(0), the following recursive relation is
obtained:

x^(k+1) = x^(k) − ((b − a)/(f(b) − f(a))) f(x^(k)),   k ≥ 0.   (6.12)

In Section 6.3.1, we shall see that the sequence {x^(k)} generated by (6.12)
converges to the root α with order of convergence p = 1.
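
A minimal MATLAB sketch of the chord iteration (6.12), written in the style of, but not belonging to, the numbered Programs (fun is a function handle rather than a string here):

function [x,nit]=chord(a,b,x0,toll,nmax,fun)
% Chord method: fixed slope q = (f(b)-f(a))/(b-a)
q = (fun(b)-fun(a))/(b-a);
x = x0; nit = 0; err = toll+1;
while (nit < nmax && err > toll)
  nit = nit+1;
  xnew = x - fun(x)/q;        % iteration (6.12)
  err = abs(xnew-x); x = xnew;
end

For example, the root of Example 6.3 could be sought with chord(0.6,1,0.8,1e-10,100,@(x) x/8*(63*x^4-70*x^2+15)).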

The secant method. We let

q_k = (f(x^(k)) − f(x^(k-1)))/(x^(k) − x^(k-1)),   ∀k ≥ 0,   (6.13)

from which, given two initial values x^(-1) and x^(0), we obtain the following
relation:

x^(k+1) = x^(k) − ((x^(k) − x^(k-1))/(f(x^(k)) − f(x^(k-1)))) f(x^(k)),   k ≥ 0.   (6.14)

If compared with the chord method, the iterative process (6.14) requires
an extra initial point x^(-1) and the corresponding function value f(x^(-1)),
as well as, for any k, computing the incremental ratio (6.13). The benefit
due to the increase in the computational cost is the higher speed of convergence of the secant method, as stated in the following property, which
can be regarded as a first example of a local convergence theorem (for the
proof see [IK66], pp. 99-101).

Property 6.2 Let f ∈ C²(J), J being a suitable neighborhood of the root
α, and assume that f′(α) ≠ 0. Then, if the initial data x^(-1) and x^(0) are
chosen in J sufficiently close to α, the sequence (6.14) converges to α with
order p = (1 + √5)/2 ≃ 1.63.
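
An analogous sketch of the secant iteration (6.14), under the same conventions as the chord fragment above:

function [x,nit]=secant(xm1,x0,toll,nmax,fun)
% Secant method: slope q_k built from the last two iterates, formula (6.14)
x = x0; xold = xm1; fold = fun(xold);
nit = 0; err = toll+1;
while (nit < nmax && err > toll)
  nit = nit+1; f = fun(x);
  xnew = x - f*(x-xold)/(f-fold);   % iteration (6.14)
  err = abs(xnew-x);
  xold = x; fold = f; x = xnew;
end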


The Regula Falsi (or false position) method. This is a variant of
the secant method in which, instead of selecting the secant line through
the values (x^(k), f(x^(k))) and (x^(k-1), f(x^(k-1))), we take the one through
(x^(k), f(x^(k))) and (x^(k′), f(x^(k′))), k′ being the maximum index less than k
such that f(x^(k′))·f(x^(k)) < 0. Precisely, once two values x^(-1) and x^(0)
have been found such that f(x^(-1))·f(x^(0)) < 0, we let

x^(k+1) = x^(k) − ((x^(k) − x^(k′))/(f(x^(k)) − f(x^(k′)))) f(x^(k)),   k ≥ 0.   (6.15)

Having fixed an absolute tolerance ε, the iteration (6.15) terminates at the
m-th step such that |f(x^(m))| < ε. Notice that the sequence of indices k′
is nondecreasing; therefore, in order to find at step k the new value of k′,
it is not necessary to sweep the whole sequence back, but it suffices to stop at
the value of k′ that has been determined at the previous step. We show in
Figure 6.3 (left) the first two steps of (6.15) in the special case in which
x^(k′) coincides with x^(-1) for any k ≥ 0.
The Regula Falsi method, though of the same complexity as the secant
method, has linear convergence order (see, for example, [RR78], pp. 339-
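
A sketch of the Regula Falsi iteration (6.15), in the same hedged style as the previous fragments (the index k′ is tracked through the most recent iterate whose function value has opposite sign):

function [x,nit]=regfalsi(xm1,x0,toll,nmax,fun)
% Regula Falsi: like the secant method, but the second point x^(k') is the
% most recent iterate with function value of sign opposite to f(x^(k))
x = x0; xk1 = xm1;                     % xk1 plays the role of x^(k')
nit = 0; f = fun(x);
while (nit < nmax && abs(f) >= toll)
  nit = nit+1;
  xnew = x - f*(x-xk1)/(f-fun(xk1));   % iteration (6.15)
  fnew = fun(xnew);
  if (fnew*f < 0), xk1 = x; end        % update k': keep the bracketing point
  x = xnew; f = fnew;
end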
