
σ1 ≥ … ≥ σr > σ_{r+1} = … = σp = 0,

then the rank of A is r, the kernel of A is the span of the column vectors of V, {v_{r+1}, …, v_n}, and the range of A is the span of the column vectors of U, {u_1, …, u_r}.

Definition 1.15 Suppose that A ∈ C^{m×n} has rank equal to r and that it admits an SVD of the type U^H A V = Σ. The matrix A† = V Σ† U^H is called the Moore-Penrose pseudo-inverse matrix, where

Σ† = diag(1/σ1, …, 1/σr, 0, …, 0).   (1.11)

The matrix A† is also called the generalized inverse of A (see Exercise 13). Indeed, if rank(A) = n < m, then A† = (A^T A)^{-1} A^T, while if n = m = rank(A), A† = A^{-1}. For further properties of A†, see also Exercise 12.
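In the full-column-rank case the formula A† = (A^T A)^{-1} A^T is easy to check numerically. The following sketch (plain Python; the 3×2 matrix is an arbitrary full-rank example) verifies the identities A†A = I_n and AA†A = A:

```python
def matmul(X, Y):
    # row-by-column product of two matrices stored as lists of rows
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def transpose(X):
    return [list(row) for row in zip(*X)]

def inv2(M):
    # inverse of a 2x2 matrix by the cofactor formula
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

A = [[1.0, 0.0],
     [0.0, 1.0],
     [1.0, 1.0]]          # rank(A) = n = 2 < m = 3

At = transpose(A)
A_dag = matmul(inv2(matmul(At, A)), At)   # A† = (A^T A)^{-1} A^T

AdA = matmul(A_dag, A)                    # should be I_2
AAdA = matmul(A, matmul(A_dag, A))        # should reproduce A
```

Up to rounding, AdA is the 2×2 identity and AAdA coincides with A, as the Penrose identities require.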

1.10 Scalar Product and Norms in Vector Spaces

Very often, to quantify errors or measure distances one needs to compute

the magnitude of a vector or a matrix. For that purpose we introduce in

this section the concept of a vector norm and, in the following one, of a

matrix norm. We refer the reader to [Ste73], [SS90] and [Axe94] for the

proofs of the properties that are reported hereafter.

18 1. Foundations of Matrix Analysis

Definition 1.16 A scalar product on a vector space V defined over K is any map (·, ·) acting from V × V into K which enjoys the following properties:

1. it is linear with respect to the vectors of V, that is,

(γx + λz, y) = γ(x, y) + λ(z, y),  ∀x, z ∈ V, ∀γ, λ ∈ K;

2. it is hermitian, that is, (y, x) is the complex conjugate of (x, y), ∀x, y ∈ V;

3. it is positive definite, that is, (x, x) > 0 ∀x ≠ 0 (in other words, (x, x) ≥ 0, and (x, x) = 0 if and only if x = 0).

In the case V = C^n (or R^n), an example is provided by the classical Euclidean scalar product given by

(x, y) = y^H x = ∑_{i=1}^{n} x_i ȳ_i,

where z̄ denotes the complex conjugate of z.

Moreover, for any given square matrix A of order n and for any x, y ∈ C^n the following relation holds:

(Ax, y) = (x, A^H y).   (1.12)

In particular, since for any matrix Q ∈ C^{n×n}, (Qx, Qy) = (x, Q^H Qy), one gets

Property 1.8 Unitary matrices preserve the Euclidean scalar product, that is, (Qx, Qy) = (x, y) for any unitary matrix Q and for any pair of vectors x and y.

Definition 1.17 Let V be a vector space over K. We say that the map ‖·‖ from V into R is a norm on V if the following axioms are satisfied:

1. (i) ‖v‖ ≥ 0 ∀v ∈ V and (ii) ‖v‖ = 0 if and only if v = 0;

2. ‖αv‖ = |α| ‖v‖ ∀α ∈ K, ∀v ∈ V (homogeneity property);

3. ‖v + w‖ ≤ ‖v‖ + ‖w‖ ∀v, w ∈ V (triangle inequality),

where |α| denotes the absolute value of α if K = R, the modulus of α if K = C.


The pair (V, ‖·‖) is called a normed space. We shall distinguish among norms by a suitable subscript at the margin of the double bar symbol. If the map |·| from V into R enjoys only properties 1(i), 2 and 3, we shall call such a map a seminorm. Finally, we shall call a unit vector any vector of V having unit norm.

An example of a normed space is R^n, equipped for instance with the p-norm (or Hölder norm); the latter is defined, for a vector x of components {x_i}, as

‖x‖_p = ( ∑_{i=1}^{n} |x_i|^p )^{1/p},  for 1 ≤ p < ∞.   (1.13)

Notice that the limit as p goes to infinity of ‖x‖_p exists, is finite, and equals the maximum modulus of the components of x. Such a limit defines in turn a norm, called the infinity norm (or maximum norm), given by

‖x‖_∞ = max_{1≤i≤n} |x_i|.
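Both definitions translate directly into code; the following sketch (plain Python, with an arbitrary vector x) also shows ‖x‖_p approaching ‖x‖_∞ as p grows:

```python
def norm_p(x, p):
    # the p-norm of (1.13), for 1 <= p < infinity
    return sum(abs(xi) ** p for xi in x) ** (1.0 / p)

def norm_inf(x):
    # the infinity (maximum) norm
    return max(abs(xi) for xi in x)

x = [3.0, -4.0, 1.0]
n1 = norm_p(x, 1)        # 3 + 4 + 1 = 8
n2 = norm_p(x, 2)        # sqrt(26)
n_big = norm_p(x, 100)   # already very close to norm_inf(x) = 4
```

For this x, norm_p(x, 100) agrees with norm_inf(x) to many digits, illustrating the limit above.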

When p = 2, from (1.13) the standard definition of the Euclidean norm is recovered,

‖x‖_2 = (x, x)^{1/2} = ( ∑_{i=1}^{n} |x_i|^2 )^{1/2} = (x^T x)^{1/2},

for which the following property holds.

Property 1.9 (Cauchy-Schwarz inequality) For any pair x, y ∈ R^n,

|(x, y)| = |x^T y| ≤ ‖x‖_2 ‖y‖_2,   (1.14)

where equality holds iff y = αx for some α ∈ R.

We recall that the scalar product in R^n can be related to the p-norms introduced over R^n in (1.13) by the Hölder inequality

|(x, y)| ≤ ‖x‖_p ‖y‖_q,  with 1/p + 1/q = 1.
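Both inequalities are easy to observe numerically; in the sketch below (plain Python, arbitrary vectors) the pair p = 3, q = 3/2 satisfies 1/p + 1/q = 1:

```python
def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def norm_p(x, p):
    return sum(abs(xi) ** p for xi in x) ** (1.0 / p)

x = [1.0, -2.0, 3.0]
y = [2.0, 0.5, -1.0]

lhs = abs(dot(x, y))                       # |(x, y)| = 2
cauchy_schwarz = norm_p(x, 2) * norm_p(y, 2)   # Cauchy-Schwarz bound
holder = norm_p(x, 3) * norm_p(y, 1.5)         # Hoelder bound, p = 3, q = 3/2
```

For these vectors |(x, y)| = 2, comfortably below both bounds.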

In the case where V is a finite-dimensional space the following property holds (for a sketch of the proof, see Exercise 14).

Property 1.10 Any vector norm ‖·‖ defined on V is a continuous function of its argument, namely, ∀ε > 0, ∃C > 0 such that if ‖x − x̂‖ ≤ ε then |‖x‖ − ‖x̂‖| ≤ Cε, for any x, x̂ ∈ V.

New norms can be easily built using the following result.


Property 1.11 Let ‖·‖ be a norm of R^n and A ∈ R^{n×n} be a matrix with n linearly independent columns. Then, the function ‖·‖_{A2} acting from R^n into R defined as

‖x‖_{A2} = ‖Ax‖  ∀x ∈ R^n

is a norm of R^n.

Two vectors x, y in V are said to be orthogonal if (x, y) = 0. This statement has an immediate geometric interpretation when V = R^2 since in such a case

(x, y) = ‖x‖_2 ‖y‖_2 cos(ϑ),

where ϑ is the angle between the vectors x and y. As a consequence, if (x, y) = 0 then ϑ is a right angle and the two vectors are orthogonal in the geometric sense.

Definition 1.18 Two norms ‖·‖_p and ‖·‖_q on V are equivalent if there exist two positive constants c_pq and C_pq such that

c_pq ‖x‖_q ≤ ‖x‖_p ≤ C_pq ‖x‖_q  ∀x ∈ V.

In a finite-dimensional normed space all norms are equivalent. In particular, if V = R^n it can be shown that for the p-norms, with p = 1, 2 and ∞, the constants c_pq and C_pq take the values reported in Table 1.1.

c_pq   q=1       q=2       q=∞         C_pq   q=1   q=2      q=∞
p=1    1         1         1           p=1    1     n^{1/2}  n
p=2    n^{-1/2}  1         1           p=2    1     1        n^{1/2}
p=∞    n^{-1}    n^{-1/2}  1           p=∞    1     1        1

TABLE 1.1. Equivalence constants for the main norms of R^n
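The entries of Table 1.1 can be spot-checked numerically; the sketch below (plain Python, a few arbitrary vectors in R^4) verifies the three chains of inequalities for p, q ∈ {1, 2, ∞}:

```python
def norm_1(x):
    return sum(abs(v) for v in x)

def norm_2(x):
    return sum(v * v for v in x) ** 0.5

def norm_inf(x):
    return max(abs(v) for v in x)

n = 4
tol = 1e-12   # guard against rounding in the square roots
vectors = [[1.0, -2.0, 3.0, -4.0], [5.0, 0.0, 0.0, 1.0], [1.0, 1.0, 1.0, 1.0]]

ok = all(
    norm_2(x) <= norm_1(x) <= n ** 0.5 * norm_2(x) + tol and
    norm_inf(x) <= norm_2(x) <= n ** 0.5 * norm_inf(x) + tol and
    norm_inf(x) <= norm_1(x) <= n * norm_inf(x) + tol
    for x in vectors
)
```

The vector of all ones attains the constants C_12 = √n and C_1∞ = n with equality, which is why a small tolerance is used.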

In this book we shall often deal with sequences of vectors and with their convergence. For this purpose, we recall that a sequence of vectors {x^{(k)}} in a vector space V having finite dimension n converges to a vector x, and we write lim_{k→∞} x^{(k)} = x, if

lim_{k→∞} x_i^{(k)} = x_i,  i = 1, …, n,   (1.15)

where x_i^{(k)} and x_i are the components of the corresponding vectors with respect to a basis of V. If V = R^n, due to the uniqueness of the limit of a sequence of real numbers, (1.15) implies also the uniqueness of the limit, if existing, of a sequence of vectors.

We further notice that in a finite-dimensional space all the norms are topologically equivalent in the sense of convergence, namely, given a sequence of vectors {x^{(k)}},

|||x^{(k)}||| → 0 ⇔ ‖x^{(k)}‖ → 0  as k → ∞,

where |||·||| and ‖·‖ are any two vector norms. As a consequence, we can establish the following link between norms and limits.

Property 1.12 Let ‖·‖ be a norm in a finite-dimensional space V. Then

lim_{k→∞} x^{(k)} = x ⇔ lim_{k→∞} ‖x − x^{(k)}‖ = 0,

where x ∈ V and {x^{(k)}} is a sequence of elements of V.

1.11 Matrix Norms

Definition 1.19 A matrix norm is a mapping ‖·‖ : R^{m×n} → R such that:

1. ‖A‖ ≥ 0 ∀A ∈ R^{m×n} and ‖A‖ = 0 if and only if A = 0;

2. ‖αA‖ = |α| ‖A‖ ∀α ∈ R, ∀A ∈ R^{m×n} (homogeneity);

3. ‖A + B‖ ≤ ‖A‖ + ‖B‖ ∀A, B ∈ R^{m×n} (triangle inequality).

Unless otherwise specified we shall employ the same symbol ‖·‖ to denote matrix norms and vector norms.

We can better characterize matrix norms by introducing the concepts of compatible norm and norm induced by a vector norm.

Definition 1.20 We say that a matrix norm ‖·‖ is compatible or consistent with a vector norm ‖·‖ if

‖Ax‖ ≤ ‖A‖ ‖x‖  ∀x ∈ R^n.   (1.16)

More generally, given three norms, all denoted by ‖·‖, albeit defined on R^m, R^n and R^{m×n}, respectively, we say that they are consistent if ∀x ∈ R^n, Ax = y ∈ R^m, A ∈ R^{m×n}, we have that ‖y‖ ≤ ‖A‖ ‖x‖.

In order to single out matrix norms of practical interest, the following property is in general required.

Definition 1.21 We say that a matrix norm ‖·‖ is sub-multiplicative if

‖AB‖ ≤ ‖A‖ ‖B‖  ∀A ∈ R^{n×m}, ∀B ∈ R^{m×q}.   (1.17)

This property is not satisfied by every matrix norm. For example (taken from [GL89]), the norm ‖A‖_Δ = max |a_ij| for i = 1, …, n, j = 1, …, m does not satisfy (1.17) if applied to the matrices

A = B = [ 1  1 ]
        [ 1  1 ],

since 2 = ‖AB‖_Δ > ‖A‖_Δ ‖B‖_Δ = 1.

Notice that, given a certain sub-multiplicative matrix norm ‖·‖_α, there always exists a consistent vector norm. For instance, given any fixed vector y ≠ 0 in C^n, it suffices to define the consistent vector norm as

‖x‖ = ‖x y^H‖_α,  x ∈ C^n.

As a consequence, in the case of sub-multiplicative matrix norms it is no longer necessary to specify explicitly the vector norm with respect to which the matrix norm is consistent.

Example 1.7 The norm

‖A‖_F = ( ∑_{i,j=1}^{n} |a_ij|^2 )^{1/2} = (tr(AA^H))^{1/2}   (1.18)

is a matrix norm called the Frobenius norm (or Euclidean norm in C^{n²}) and is compatible with the Euclidean vector norm ‖·‖_2. Indeed,

‖Ax‖_2^2 = ∑_{i=1}^{n} | ∑_{j=1}^{n} a_ij x_j |^2 ≤ ∑_{i=1}^{n} ( ∑_{j=1}^{n} |a_ij|^2 )( ∑_{j=1}^{n} |x_j|^2 ) = ‖A‖_F^2 ‖x‖_2^2.

Notice that for such a norm ‖I_n‖_F = √n. •
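The computations of Example 1.7 are immediate to reproduce; in the sketch below (plain Python, with an arbitrary real A and x) both the identity ‖I_3‖_F = √3 and the compatibility bound ‖Ax‖_2 ≤ ‖A‖_F ‖x‖_2 are checked:

```python
def frobenius(A):
    # ||A||_F = square root of the sum of squared entries, as in (1.18)
    return sum(a * a for row in A for a in row) ** 0.5

I3 = [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]
fro_I3 = frobenius(I3)                      # equals sqrt(3)

A = [[1.0, 2.0],
     [3.0, 4.0]]
x = [1.0, -1.0]
Ax = [sum(A[i][j] * x[j] for j in range(2)) for i in range(2)]
lhs = sum(v * v for v in Ax) ** 0.5                 # ||Ax||_2
rhs = frobenius(A) * sum(v * v for v in x) ** 0.5   # ||A||_F ||x||_2
```

Here ‖Ax‖_2 = √2 while the bound ‖A‖_F ‖x‖_2 = √30 · √2, so compatibility holds with ample margin.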

In view of the definition of a natural norm, we recall the following theorem.

Theorem 1.1 Let ‖·‖ be a vector norm. The function

‖A‖ = sup_{x≠0} ‖Ax‖/‖x‖   (1.19)

is a matrix norm, called the induced matrix norm or natural matrix norm.


Proof. We start by noticing that (1.19) is equivalent to

‖A‖ = sup_{‖x‖=1} ‖Ax‖.   (1.20)

Indeed, one can define for any x ≠ 0 the unit vector u = x/‖x‖, so that (1.19) becomes

‖A‖ = sup_{‖u‖=1} ‖Au‖ = ‖Aw‖  with ‖w‖ = 1.

This being taken as given, let us check that (1.19) (or, equivalently, (1.20)) is actually a norm, making direct use of Definition 1.19.

1. Since ‖Ax‖ ≥ 0, it follows that ‖A‖ = sup_{‖x‖=1} ‖Ax‖ ≥ 0. Moreover,

‖A‖ = sup_{x≠0} ‖Ax‖/‖x‖ = 0 ⇔ ‖Ax‖ = 0 ∀x ≠ 0,

and ‖Ax‖ = 0 ∀x ≠ 0 if and only if A = 0; therefore ‖A‖ = 0 ⇔ A = 0.

2. Given a scalar α,

‖αA‖ = sup_{‖x‖=1} ‖αAx‖ = |α| sup_{‖x‖=1} ‖Ax‖ = |α| ‖A‖.

3. Finally, the triangle inequality holds. Indeed, by definition of supremum, if x ≠ 0 then

‖Ax‖/‖x‖ ≤ ‖A‖ ⇒ ‖Ax‖ ≤ ‖A‖ ‖x‖,

so that, taking x with unit norm, one gets

‖(A + B)x‖ ≤ ‖Ax‖ + ‖Bx‖ ≤ ‖A‖ + ‖B‖,

from which it follows that ‖A + B‖ = sup_{‖x‖=1} ‖(A + B)x‖ ≤ ‖A‖ + ‖B‖. □

Relevant instances of induced matrix norms are the so-called p-norms, defined as

‖A‖_p = sup_{x≠0} ‖Ax‖_p/‖x‖_p.

The 1-norm and the infinity norm are easily computable since

‖A‖_1 = max_{j=1,…,n} ∑_{i=1}^{m} |a_ij|,   ‖A‖_∞ = max_{i=1,…,m} ∑_{j=1}^{n} |a_ij|,

and they are called the column sum norm and the row sum norm, respectively.

Moreover, we have ‖A‖_1 = ‖A^T‖_∞ and, if A is self-adjoint or real symmetric, ‖A‖_1 = ‖A‖_∞.
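Both formulas translate into one-liners; the sketch below (plain Python, an arbitrary 2×2 matrix) also checks the relation ‖A‖_1 = ‖A^T‖_∞:

```python
def norm_1(A):
    # column sum norm
    m, n = len(A), len(A[0])
    return max(sum(abs(A[i][j]) for i in range(m)) for j in range(n))

def norm_inf(A):
    # row sum norm
    return max(sum(abs(a) for a in row) for row in A)

A = [[1.0, -2.0],
     [3.0, 4.0]]
At = [list(col) for col in zip(*A)]   # transpose

a1 = norm_1(A)       # column sums 4 and 6 -> 6
ainf = norm_inf(A)   # row sums 3 and 7 -> 7
```

As expected, norm_inf(At) returns the same value as norm_1(A), and vice versa.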

A special discussion is deserved by the 2-norm or spectral norm for which

the following theorem holds.


Theorem 1.2 Let σ1(A) be the largest singular value of A. Then

‖A‖_2 = √(ρ(A^H A)) = √(ρ(AA^H)) = σ1(A).   (1.21)

In particular, if A is hermitian (or real and symmetric), then

‖A‖_2 = ρ(A),   (1.22)

while, if A is unitary, ‖A‖_2 = 1.

Proof. Since A^H A is hermitian, there exists a unitary matrix U such that

U^H A^H A U = diag(μ1, …, μn),

where the μi are the (nonnegative) eigenvalues of A^H A. Let y = U^H x; then

‖A‖_2 = sup_{x≠0} ( (A^H Ax, x)/(x, x) )^{1/2} = sup_{y≠0} ( (U^H A^H AUy, y)/(y, y) )^{1/2}
      = sup_{y≠0} ( ∑_{i=1}^{n} μi |y_i|^2 / ∑_{i=1}^{n} |y_i|^2 )^{1/2} = max_{i=1,…,n} |μi|^{1/2},

from which (1.21) follows, thanks to (1.10).

If A is hermitian, the same considerations as above apply directly to A.

Finally, if A is unitary,

‖Ax‖_2^2 = (Ax, Ax) = (x, A^H Ax) = ‖x‖_2^2,

so that ‖A‖_2 = 1. □

As a consequence, the computation of ‖A‖_2 is much more expensive than that of ‖A‖_∞ or ‖A‖_1. However, if only an estimate of ‖A‖_2 is required, the following relations can be profitably employed in the case of square matrices:

max_{i,j} |a_ij| ≤ ‖A‖_2 ≤ n max_{i,j} |a_ij|,

(1/√n) ‖A‖_∞ ≤ ‖A‖_2 ≤ √n ‖A‖_∞,

(1/√n) ‖A‖_1 ≤ ‖A‖_2 ≤ √n ‖A‖_1,

‖A‖_2 ≤ √(‖A‖_1 ‖A‖_∞).

For other estimates of similar type we refer to Exercise 17. Moreover, if A is normal then ‖A‖_2 ≤ ‖A‖_p for any n and all p ≥ 2.
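For a 2×2 matrix the spectral norm is available in closed form (the largest eigenvalue of A^T A follows from its trace and determinant), which makes the estimates above easy to verify; the matrix below is an arbitrary example:

```python
def norm_2_2x2(A):
    # ||A||_2 = sqrt(largest eigenvalue of A^T A), cf. Theorem 1.2
    (a, b), (c, d) = A
    p, q, r = a * a + c * c, a * b + c * d, b * b + d * d   # A^T A = [[p, q], [q, r]]
    lam_max = 0.5 * (p + r + ((p - r) ** 2 + 4 * q * q) ** 0.5)
    return lam_max ** 0.5

A = [[1.0, 2.0],
     [3.0, 4.0]]
n = 2
s = norm_2_2x2(A)                                          # about 5.465
amax = max(abs(v) for row in A for v in row)               # max |a_ij| = 4
a1 = max(abs(A[0][j]) + abs(A[1][j]) for j in range(2))    # ||A||_1 = 6
ainf = max(sum(abs(v) for v in row) for row in A)          # ||A||_inf = 7
```

All four estimates bracket s, and the mixed bound √(‖A‖_1 ‖A‖_∞) = √42 is the sharpest upper bound here.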

Theorem 1.3 Let |||·||| be a matrix norm induced by a vector norm ‖·‖. Then:

1. ‖Ax‖ ≤ |||A||| ‖x‖, that is, |||·||| is a norm compatible with ‖·‖;

2. |||I||| = 1;

3. |||AB||| ≤ |||A||| |||B|||, that is, |||·||| is sub-multiplicative.

Proof. Part 1 of the theorem is already contained in the proof of Theorem 1.1, while part 2 follows from the fact that |||I||| = sup_{x≠0} ‖Ix‖/‖x‖ = 1. Part 3 is simple to check. □

Notice that the p-norms are sub-multiplicative. Moreover, we remark that the sub-multiplicativity property by itself would only allow us to conclude that |||I||| ≥ 1. Indeed, |||I||| = |||I · I||| ≤ |||I|||².

1.11.1 Relation between Norms and the Spectral Radius of a Matrix

We next recall some results that relate the spectral radius of a matrix to matrix norms and that will be widely employed in Chapter 4.

Theorem 1.4 Let ‖·‖ be a consistent matrix norm; then

ρ(A) ≤ ‖A‖  ∀A ∈ C^{n×n}.

Proof. Let λ be an eigenvalue of A and v ≠ 0 an associated eigenvector. As a consequence, since ‖·‖ is consistent, we have

|λ| ‖v‖ = ‖λv‖ = ‖Av‖ ≤ ‖A‖ ‖v‖,

so that |λ| ≤ ‖A‖. □

More precisely, the following property holds (see for the proof [IK66], p. 12, Theorem 3).

Property 1.13 Let A ∈ C^{n×n} and ε > 0. Then, there exists a consistent matrix norm ‖·‖_{A,ε} (depending on ε) such that

‖A‖_{A,ε} ≤ ρ(A) + ε.

As a result, having fixed an arbitrarily small tolerance, there always exists a matrix norm which is arbitrarily close to the spectral radius of A, namely

ρ(A) = inf_{‖·‖} ‖A‖,   (1.23)

the infimum being taken on the set of all the consistent norms.

For the sake of clarity, we notice that the spectral radius is not a matrix norm, since it is not true that ρ(A) = 0 iff A = 0. As an example, any triangular matrix with null diagonal entries clearly has spectral radius equal to zero. Moreover, we have the following result.


Property 1.14 Let A be a square matrix and let ‖·‖ be a consistent norm. Then

lim_{m→∞} ‖A^m‖^{1/m} = ρ(A).
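Both results can be observed on a small triangular example, whose spectral radius is read off the diagonal (the matrix and the choice m = 60 below are arbitrary):

```python
def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def norm_inf(M):
    return max(sum(abs(v) for v in row) for row in M)

A = [[0.5, 0.4],
     [0.0, 0.3]]
rho = 0.5                 # eigenvalues of a triangular matrix lie on its diagonal

# Theorem 1.4: rho(A) <= ||A||_inf
bound_ok = rho <= norm_inf(A)

# Property 1.14: ||A^m||^(1/m) tends to rho(A)
Am = [[1.0, 0.0], [0.0, 1.0]]
m = 60
for _ in range(m):
    Am = matmul(Am, A)
gelfand = norm_inf(Am) ** (1.0 / m)
```

Here ‖A‖_∞ = 0.9 overestimates ρ(A) = 0.5, while the m-th root of ‖A^m‖_∞ is already within about 1% of it.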

1.11.2 Sequences and Series of Matrices

A sequence of matrices {A^{(k)}} ⊂ R^{n×n} is said to converge to a matrix A ∈ R^{n×n} if

lim_{k→∞} ‖A^{(k)} − A‖ = 0.

The choice of the norm does not influence the result since in R^{n×n} all norms are equivalent. In particular, when studying the convergence of iterative methods for solving linear systems (see Chapter 4), one is interested in the so-called convergent matrices, for which

lim_{k→∞} A^k = 0,

0 being the null matrix. The following theorem holds.

Theorem 1.5 Let A be a square matrix; then

lim_{k→∞} A^k = 0 ⇔ ρ(A) < 1.   (1.24)

Moreover, the geometric series ∑_{k=0}^{∞} A^k is convergent iff ρ(A) < 1. In such a case,

∑_{k=0}^{∞} A^k = (I − A)^{-1}.   (1.25)

As a result, if ρ(A) < 1 the matrix I − A is invertible and the following inequalities hold:

1/(1 + ‖A‖) ≤ ‖(I − A)^{-1}‖ ≤ 1/(1 − ‖A‖),   (1.26)

where ‖·‖ is an induced matrix norm such that ‖A‖ < 1.

Proof. Let us prove (1.24). Let ρ(A) < 1; then ∃ε > 0 such that ρ(A) < 1 − ε and thus, thanks to Property 1.13, there exists a consistent matrix norm ‖·‖ such that ‖A‖ ≤ ρ(A) + ε < 1. From the fact that ‖A^k‖ ≤ ‖A‖^k < 1 and from the definition of convergence, it turns out that as k → ∞ the sequence A^k tends to zero. Conversely, assume that lim_{k→∞} A^k = 0 and let λ denote an eigenvalue of A. Then, A^k x = λ^k x, x (≠ 0) being an eigenvector associated with λ, so that lim_{k→∞} λ^k = 0. As a consequence, |λ| < 1, and because this is true for a generic eigenvalue one gets ρ(A) < 1, as desired. Relation (1.25) can be obtained noting first that the eigenvalues of I − A are given by 1 − λ(A), λ(A) being the generic eigenvalue of A. On the other hand, since ρ(A) < 1, we deduce that I − A is nonsingular. Then, from the identity

(I − A)(I + A + … + A^n) = I − A^{n+1},

and taking the limit for n tending to infinity, the thesis follows since

(I − A) ∑_{k=0}^{∞} A^k = I.

Finally, thanks to Theorem 1.3, the equality ‖I‖ = 1 holds, so that

1 = ‖I‖ ≤ ‖I − A‖ ‖(I − A)^{-1}‖ ≤ (1 + ‖A‖) ‖(I − A)^{-1}‖,

giving the first inequality in (1.26). As for the second part, noting that I = I − A + A and multiplying both sides on the right by (I − A)^{-1}, one gets (I − A)^{-1} = I + A(I − A)^{-1}. Passing to the norms, we obtain

‖(I − A)^{-1}‖ ≤ 1 + ‖A‖ ‖(I − A)^{-1}‖,

and thus the second inequality, since ‖A‖ < 1. □

Remark 1.1 The assumption that there exists an induced matrix norm such that ‖A‖ < 1 is justified by Property 1.13, recalling that A is convergent and, therefore, ρ(A) < 1.

Notice that (1.25) suggests an algorithm to approximate the inverse of a matrix by a truncated series expansion.
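A minimal sketch of that algorithm (plain Python; the 2×2 matrix, with ρ(A) ≈ 0.54 < 1, is an arbitrary choice) compares the truncated series ∑_{k=0}^{N} A^k with (I − A)^{-1} computed by the 2×2 inversion formula:

```python
def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

A = [[0.1, 0.2],
     [0.3, 0.4]]          # rho(A) < 1, so the series converges

# truncated Neumann series: S = I + A + A^2 + ... + A^N
S = [[1.0, 0.0], [0.0, 1.0]]
P = [[1.0, 0.0], [0.0, 1.0]]
N = 60
for _ in range(N):
    P = matmul(P, A)
    S = [[S[i][j] + P[i][j] for j in range(2)] for i in range(2)]

# (I - A)^{-1} by the cofactor formula, for comparison
IA = [[1.0 - A[0][0], -A[0][1]], [-A[1][0], 1.0 - A[1][1]]]
det = IA[0][0] * IA[1][1] - IA[0][1] * IA[1][0]
inv = [[IA[1][1] / det, -IA[0][1] / det], [-IA[1][0] / det, IA[0][0] / det]]

err = max(abs(S[i][j] - inv[i][j]) for i in range(2) for j in range(2))
```

The truncation error decays like ρ(A)^N, so 60 terms already reproduce the inverse to near machine precision.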

1.12 Positive Definite, Diagonally Dominant and M-matrices

Definition 1.22 A matrix A ∈ C^{n×n} is positive definite in C^n if the number (Ax, x) is real and positive ∀x ∈ C^n, x ≠ 0. A matrix A ∈ R^{n×n} is positive definite in R^n if (Ax, x) > 0 ∀x ∈ R^n, x ≠ 0. If the strict inequality is replaced by the weak one (≥), the matrix is called positive semidefinite.

Example 1.8 Matrices that are positive definite in R^n are not necessarily symmetric. An instance is provided by matrices of the form

A = [  2      α ]
    [ −2−α    2 ]   (1.27)

for α ≠ −1. Indeed, for any non-null vector x = (x_1, x_2)^T in R^2,

(Ax, x) = 2(x_1^2 + x_2^2 − x_1 x_2) > 0.

Notice that A is not positive definite in C^2. Indeed, if we take a complex vector x we find that the number (Ax, x) is not real-valued in general. •

Definition 1.23 Let A ∈ R^{n×n}. The matrices

A_S = (1/2)(A + A^T),   A_SS = (1/2)(A − A^T)

are respectively called the symmetric part and the skew-symmetric part of A. Obviously, A = A_S + A_SS. If A ∈ C^{n×n}, the definitions modify as follows: A_S = (1/2)(A + A^H) and A_SS = (1/2)(A − A^H).

The following property holds.

Property 1.15 A real matrix A of order n is positive definite iff its symmetric part A_S is positive definite.

Indeed, it suffices to notice that, due to (1.12) and the definition of A_SS, x^T A_SS x = 0 ∀x ∈ R^n. For instance, the matrix in (1.27) has a positive definite symmetric part, since

A_S = (1/2)(A + A^T) = [  2  −1 ]
                       [ −1   2 ].
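The computation above holds for any α; the sketch below (plain Python, with the arbitrary choice α = 3) recovers A_S = [[2, −1], [−1, 2]], obtains its eigenvalues 1 and 3 from trace and determinant, and checks (Ax, x) > 0 on a few sample vectors:

```python
alpha = 3.0
A = [[2.0, alpha],
     [-2.0 - alpha, 2.0]]   # the matrix (1.27), nonsymmetric for alpha != -1

AS = [[(A[i][j] + A[j][i]) / 2.0 for j in range(2)] for i in range(2)]

# eigenvalues of the 2x2 symmetric part from trace and determinant
tr = AS[0][0] + AS[1][1]
det = AS[0][0] * AS[1][1] - AS[0][1] * AS[1][0]
disc = (tr * tr - 4.0 * det) ** 0.5
lam_min, lam_max = (tr - disc) / 2.0, (tr + disc) / 2.0

def quad(x):
    # (Ax, x) = x^T A x
    Ax = [A[0][0] * x[0] + A[0][1] * x[1], A[1][0] * x[0] + A[1][1] * x[1]]
    return Ax[0] * x[0] + Ax[1] * x[1]

samples = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [3.0, -2.0]]
all_positive = all(quad(x) > 0.0 for x in samples)
```

The skew-symmetric part contributes nothing to the quadratic form, which is why the positivity test succeeds even though A itself is nonsymmetric.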

This holds more generally (for the proof see [Axe94]).

Property 1.16 Let A ∈ C^{n×n} (respectively, A ∈ R^{n×n}); if (Ax, x) is real-valued ∀x ∈ C^n, then A is hermitian (respectively, symmetric).

An immediate consequence of the above results is that matrices that are positive definite in C^n satisfy the following characterizing property.

Property 1.17 A square matrix A of order n is positive definite in C^n iff it is hermitian and has positive eigenvalues. Thus, a positive definite matrix is nonsingular.

In the case of positive definite real matrices in R^n, results more specific than those presented so far hold only if the matrix is also symmetric (this is the reason why many textbooks deal only with symmetric positive definite matrices). In particular:

Property 1.18 Let A ∈ R^{n×n} be symmetric. Then, A is positive definite iff one of the following properties is satisfied:

1. (Ax, x) > 0 ∀x ≠ 0 with x ∈ R^n;

2. the eigenvalues of the principal submatrices of A are all positive;

3. the dominant principal minors of A are all positive (Sylvester criterion);

4. there exists a nonsingular matrix H such that A = H^T H.

All the diagonal entries of a positive definite matrix are positive. Indeed, if e_i is the i-th vector of the canonical basis of R^n, then e_i^T A e_i = a_ii > 0. Moreover, it can be shown that if A is symmetric positive definite, the entry with the largest modulus must be a diagonal entry (these last two properties are therefore necessary conditions for a matrix to be positive definite).

We finally notice that if A is symmetric positive definite and A^{1/2} is the only positive definite matrix that is a solution of the matrix equation X² = A, then the norm

‖x‖_A = ‖A^{1/2} x‖_2 = (Ax, x)^{1/2}   (1.28)

defines a vector norm, called the energy norm of the vector x. Related to the energy norm is the energy scalar product given by (x, y)_A = (Ax, y).

Definition 1.24 A matrix A ∈ R^{n×n} is called diagonally dominant by rows if

|a_ii| ≥ ∑_{j=1, j≠i}^{n} |a_ij|,  with i = 1, …, n,

while it is called diagonally dominant by columns if

|a_ii| ≥ ∑_{j=1, j≠i}^{n} |a_ji|,  with i = 1, …, n.

If the inequalities above hold in a strict sense, A is called strictly diagonally dominant (by rows or by columns, respectively).

A strictly diagonally dominant matrix that is symmetric with positive diagonal entries is also positive definite.

Definition 1.25 A nonsingular matrix A ∈ R^{n×n} is an M-matrix if a_ij ≤ 0 for i ≠ j and if all the entries of its inverse are nonnegative.

M-matrices enjoy the so-called discrete maximum principle, that is, if A is an M-matrix and Ax ≤ 0, then x ≤ 0 (where the inequalities are meant componentwise). In this connection, the following result can be useful.

Property 1.19 (M-criterion) Let a matrix A satisfy a_ij ≤ 0 for i ≠ j. Then A is an M-matrix if and only if there exists a vector w > 0 such that Aw > 0.


Finally, M-matrices are related to strictly diagonally dominant matrices by the following property.

Property 1.20 A matrix A ∈ R^{n×n} that is strictly diagonally dominant by rows and whose entries satisfy the relations a_ij ≤ 0 for i ≠ j and a_ii > 0 is an M-matrix.

For further results about M-matrices, see for instance [Axe94] and [Var62].
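A concrete check (plain Python; the matrix [[2, −1], [−1, 2]] is a classical small instance satisfying the hypotheses of Property 1.20) illustrates both the M-criterion with w = (1, 1)^T and the nonnegativity of the inverse required by Definition 1.25:

```python
A = [[2.0, -1.0],
     [-1.0, 2.0]]   # a_ij <= 0 off the diagonal, strictly diagonally dominant

# M-criterion: a vector w > 0 with Aw > 0
w = [1.0, 1.0]
Aw = [A[0][0] * w[0] + A[0][1] * w[1], A[1][0] * w[0] + A[1][1] * w[1]]
criterion = all(v > 0.0 for v in Aw)

# the inverse (cofactor formula) has nonnegative entries
det = A[0][0] * A[1][1] - A[0][1] * A[1][0]          # = 3
Ainv = [[A[1][1] / det, -A[0][1] / det],
        [-A[1][0] / det, A[0][0] / det]]
inverse_nonneg = all(Ainv[i][j] >= 0.0 for i in range(2) for j in range(2))
```

Here Aw = (1, 1) > 0, and A^{-1} = (1/3)[[2, 1], [1, 2]] is entrywise nonnegative, so both characterizations agree.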

1.13 Exercises

1. Let W1 and W2 be two subspaces of R^n. Prove that if V = W1 ⊕ W2, then dim(V) = dim(W1) + dim(W2), while in general

dim(W1 + W2) = dim(W1) + dim(W2) − dim(W1 ∩ W2).

[Hint: Consider a basis for W1 ∩ W2 and first extend it to W1, then to W2, verifying that the basis formed by the set of the obtained vectors is a basis for the sum space.]

2. Check that the following set of vectors,

v_i = (x_1^{i−1}, x_2^{i−1}, …, x_n^{i−1}),  i = 1, 2, …, n,

forms a basis for R^n, x_1, …, x_n being a set of n distinct points of R.

3. Exhibit an example showing that the product of two symmetric matrices may be nonsymmetric.

4. Let B be a skew-symmetric matrix, namely, B^T = −B. Let A = (I + B)(I − B)^{-1} and show that A^{-1} = A^T.

5. A matrix A ∈ C^{n×n} is called skew-hermitian if A^H = −A. Show that the diagonal entries of A must be purely imaginary numbers.

6. Let A, B and A + B be invertible matrices of order n. Show that also A^{-1} + B^{-1} is nonsingular and that

(A^{-1} + B^{-1})^{-1} = A(A + B)^{-1}B = B(A + B)^{-1}A.

[Solution: since A^{-1} + B^{-1} = (I + B^{-1}A)A^{-1}, one gets (A^{-1} + B^{-1})^{-1} = A(I + B^{-1}A)^{-1} = A(B + A)^{-1}B. The second equality is proved similarly by factoring out B and A, respectively from left and right.]

7. Given the nonsymmetric real matrix

A = [  0   1   1 ]
    [  1   0  −1 ]
    [ −1  −1   0 ],

check that it is similar to the diagonal matrix D = diag(1, 0, −1) and find its eigenvectors. Is this matrix normal?

[Solution: the matrix is not normal.]


8. Let A be a square matrix of order n. Check that if P(A) = ∑_{k=0}^{n} c_k A^k and λ(A) are the eigenvalues of A, then the eigenvalues of P(A) are given by λ(P(A)) = P(λ(A)). In particular, prove that ρ(A²) = [ρ(A)]².

9. Prove that a matrix of order n having n distinct eigenvalues cannot be defective. Moreover, prove that a normal matrix cannot be defective.

10. Commutativity of matrix product. Show that if A and B are square matrices that share the same set of eigenvectors, then AB = BA. Prove, by a counterexample, that the converse is false.

11. Let A be a normal matrix whose eigenvalues are λ1, …, λn. Show that the singular values of A are |λ1|, …, |λn|.

12. Let A ∈ C^{m×n} with rank(A) = n. Show that A† = (A^T A)^{-1}A^T enjoys the following properties:

(1) A†A = I_n;  (2) A†AA† = A†, AA†A = A;  (3) if m = n, A† = A^{-1}.

13. Show that the Moore-Penrose pseudo-inverse matrix A† is the only matrix that minimizes the functional

min_{X∈C^{n×m}} ‖AX − I_m‖_F,

where ‖·‖_F is the Frobenius norm.

14. Prove Property 1.10.

[Solution: For any x, x̂ ∈ V show that |‖x‖ − ‖x̂‖| ≤ ‖x − x̂‖. Assuming that dim(V) = n and expanding the vector w = x − x̂ on a basis of V, show that ‖w‖ ≤ C‖w‖_∞, from which the thesis follows by imposing in the first obtained inequality that ‖w‖_∞ ≤ ε.]

15. Prove Property 1.11 in the case A ∈ R^{n×m} with m linearly independent columns.

[Hint: First show that ‖·‖_A fulfills all the properties characterizing a norm: positiveness (A has linearly independent columns, thus if x ≠ 0, then Ax ≠ 0, which proves the thesis), homogeneity and the triangle inequality.]

16. Show that for a rectangular matrix A ∈ R^{m×n}

‖A‖_F² = σ1² + … + σp²,

where p is the minimum between m and n and the σi are the singular values of A, ‖·‖_F being the Frobenius norm.

17. Assuming p, q = 1, 2, ∞, F, recover the following table of equivalence constants c_pq such that ∀A ∈ R^{n×n}, ‖A‖_p ≤ c_pq ‖A‖_q.

c_pq   q=1   q=2   q=∞   q=F
p=1    1     √n    n     √n
p=2    √n    1     √n    1
p=∞    n     √n    1     √n
p=F    √n    √n    √n    1


18. A matrix norm for which ‖A‖ = ‖ |A| ‖ is called an absolute norm, having denoted by |A| the matrix of the absolute values of the entries of A. Prove that ‖·‖_1, ‖·‖_∞ and ‖·‖_F are absolute norms, while ‖·‖_2 is not. Show that for this latter

(1/√n) ‖A‖_2 ≤ ‖ |A| ‖_2 ≤ √n ‖A‖_2.

2

Principles of Numerical Mathematics

The basic concepts of consistency, stability and convergence of a numerical

method will be introduced in a very general context in the ļ¬rst part of

the chapter: they provide the common framework for the analysis of any

method considered henceforth. The second part of the chapter deals with

the computer ļ¬nite representation of real numbers and the analysis of error

propagation in machine operations.

2.1 Well-posedness and Condition Number of a Problem

Consider the following problem: find x such that

F(x, d) = 0,   (2.1)

where d is the set of data on which the solution depends and F is the functional relation between x and d. According to the kind of problem that is represented in (2.1), the variables x and d may be real numbers, vectors or functions. Typically, (2.1) is called a direct problem if F and d are given and x is the unknown, an inverse problem if F and x are known and d is the unknown, and an identification problem when x and d are given while the functional relation F is the unknown (these latter problems will not be covered in this volume).

Problem (2.1) is well posed if it admits a unique solution x which depends continuously on the data. We shall use the terms well posed and stable interchangeably, and we shall deal henceforth only with well-posed problems.

A problem which does not enjoy the property above is called ill posed or unstable, and before undertaking its numerical solution it has to be regularized, that is, it must be suitably transformed into a well-posed problem (see, for instance, [Mor84]). Indeed, it is not appropriate to pretend that the numerical method can cure the pathologies of an intrinsically ill-posed problem.

Example 2.1 A simple instance of an ill-posed problem is finding the number of real roots of a polynomial. For example, the polynomial p(x) = x⁴ − x²(2a − 1) + a(a − 1) exhibits a discontinuous variation of the number of real roots as a continuously varies in the real field. We have, indeed, 4 real roots if a ≥ 1, 2 if a ∈ [0, 1), while no real roots exist if a < 0. •

Continuous dependence on the data means that small perturbations on the data d yield "small" changes in the solution x. Precisely, denoting by δd an admissible perturbation on the data and by δx the consequent change in the solution, in such a way that

F(x + δx, d + δd) = 0,   (2.2)

then

∀η > 0, ∃K(η, d) : ‖δd‖ < η ⇒ ‖δx‖ ≤ K(η, d) ‖δd‖.   (2.3)

The norms used for the data and for the solution may not coincide whenever d and x represent variables of different kinds.

With the aim of making this analysis more quantitative, we introduce the following definition.

Definition 2.1 For problem (2.1) we define the relative condition number to be

K(d) = sup_{δd∈D} ( ‖δx‖/‖x‖ ) / ( ‖δd‖/‖d‖ ),   (2.4)

where D is a neighborhood of the origin and denotes the set of admissible perturbations on the data for which the perturbed problem (2.2) still makes sense. Whenever d = 0 or x = 0, it is necessary to introduce the absolute condition number, given by

K_abs(d) = sup_{δd∈D} ‖δx‖/‖δd‖.   (2.5)

Problem (2.1) is called ill-conditioned if K(d) is "big" for any admissible datum d (the precise meaning of "small" and "big" is going to change depending on the problem at hand).


The property of a problem of being well-conditioned is independent of

the numerical method that is being used to solve it. In fact, it is possible

to generate stable as well as unstable numerical schemes for solving well-

conditioned problems. The concept of stability for an algorithm or for a

numerical method is analogous to that used for problem (2.1) and will be

made precise in the next section.

Remark 2.1 (Ill-posed problems) Even in the case in which the condition number does not exist (formally, it is infinite), it is not necessarily true that the problem is ill-posed. In fact there exist well-posed problems (for instance, the search of multiple roots of algebraic equations, see Example 2.2) for which the condition number is infinite, but which can be reformulated as equivalent problems (that is, having the same solutions) with a finite condition number.

If problem (2.1) admits a unique solution, then there necessarily exists a mapping G, that we call the resolvent, between the sets of the data and of the solutions, such that

x = G(d), that is, F(G(d), d) = 0.   (2.6)

According to this definition, (2.2) yields x + δx = G(d + δd). Assuming that G is differentiable in d and denoting formally by G′(d) its derivative with respect to d (if G : R^n → R^m, G′(d) will be the Jacobian matrix of G evaluated at the vector d), a Taylor expansion of G truncated at first order ensures that

G(d + δd) − G(d) = G′(d)δd + o(‖δd‖)  for δd → 0,

where ‖·‖ is a suitable norm for δd and o(·) is the classical infinitesimal symbol denoting an infinitesimal term of higher order with respect to its argument. Neglecting the infinitesimal of higher order with respect to ‖δd‖, from (2.4) and (2.5) we respectively deduce that

K(d) ≃ ‖G′(d)‖ ‖d‖/‖G(d)‖,   K_abs(d) ≃ ‖G′(d)‖,   (2.7)

the symbol ‖·‖ denoting the matrix norm associated with the vector norm (defined in (1.19)). The estimates in (2.7) are of great practical usefulness in the analysis of problems in the form (2.6), as shown in the forthcoming examples.

Example 2.2 (Algebraic equations of second degree) The solutions to the

algebraic equation x2 ā’ 2px + 1 = 0, with p ā„ 1, are xĀ± = p Ā± p2 ā’ 1. In this

case, F (x, p) = x2 ā’ 2px + 1, the datum d is the coeļ¬cient p, while x is the vector

of components {x+ , xā’ }. As for the condition number, we notice that (2.6) holds

36 2. Principles of Numerical Mathematics

by taking G : R → R², G(p) = {x₊, x₋}. Letting G±(p) = x±, it follows that G′±(p) = 1 ± p/√(p² − 1). Using (2.7) with ‖·‖ = ‖·‖₂ we get

    K(p) ≃ |p| / √(p² − 1),    p > 1.    (2.8)

From (2.8) it turns out that in the case of separated roots (say, if p ≥ √2) problem F(x, p) = 0 is well conditioned. The behavior dramatically changes in the case of multiple roots, that is, when p = 1. First of all, one notices that the function G±(p) = p ± √(p² − 1) is no longer differentiable for p = 1, which makes (2.8) meaningless. On the other hand, equation (2.8) shows that, for p close to 1, the problem at hand is ill conditioned. However, the problem is not ill posed. Indeed, following Remark 2.1, it is possible to reformulate it in an equivalent manner as F(x, t) = x² − ((1 + t²)/t)x + 1 = 0, with t = p + √(p² − 1), whose roots x₊ = t and x₋ = 1/t coincide for t = 1. The change of parameter thus removes the singularity that is present in the former representation of the roots as functions of p. The two roots x₋ = x₋(t) and x₊ = x₊(t) are now indeed regular functions of t in the neighborhood of t = 1, and evaluating the condition number by (2.7) yields K(t) ≃ 1 for any value of t. The transformed problem is thus well conditioned. •
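The conditioning behavior described above is easy to observe numerically. The following sketch (plain Python; the function names are ours, not the book's) compares the first order estimate (2.8) with the amplification actually measured by a small finite perturbation of the datum p, for the root x₋:

```python
import math

def x_minus(p):
    # the root x- = p - sqrt(p^2 - 1) of x^2 - 2px + 1 = 0
    return p - math.sqrt(p * p - 1.0)

def K(p):
    # first order relative condition number, cf. (2.8)
    return abs(p) / math.sqrt(p * p - 1.0)

for p in (2.0, 1.1, 1.0001):
    dp = 1e-7 * p                      # small relative perturbation of the datum
    x, x_pert = x_minus(p), x_minus(p + dp)
    observed = (abs(x_pert - x) / abs(x)) / (dp / p)
    print(f"p = {p}: K(p) = {K(p):9.2e}, observed amplification = {observed:9.2e}")
```

As p approaches 1 both columns blow up like |p|/√(p² − 1), while for separated roots (p ≥ √2) the amplification stays of order one.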

Example 2.3 (Systems of linear equations) Consider the linear system Ax = b, where x and b are two vectors in Rⁿ, while A is the n × n matrix of the real coefficients of the system. Suppose that A is nonsingular; in such a case x is the unknown solution, while the data d are the right-hand side b and the matrix A, that is, d = {bᵢ, aᵢⱼ, 1 ≤ i, j ≤ n}.

Suppose now that we perturb only the right-hand side b. We have d = b, x = G(b) = A⁻¹b, so that G′(b) = A⁻¹, and (2.7) yields

    K(d) ≃ ‖A⁻¹‖ ‖b‖ / ‖A⁻¹b‖ = ‖A⁻¹‖ ‖Ax‖ / ‖x‖ ≤ ‖A‖ ‖A⁻¹‖ = K(A),    (2.9)

where K(A) is the condition number of the matrix A (see Section 3.1.1) and the use of a consistent matrix norm is understood. Therefore, if A is well conditioned, solving the linear system Ax = b is a stable problem with respect to perturbations of the right-hand side b. Stability with respect to perturbations of the entries of A will be analyzed in Section 3.10. •
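A quick numerical check of (2.9) can be sketched as follows (assuming NumPy is available): perturb b, solve for the perturbed solution, and verify that the measured relative amplification never exceeds K(A).

```python
import numpy as np

def amplification(A, b, db):
    # (||dx|| / ||x||) / (||db|| / ||b||), where A(x + dx) = b + db
    x = np.linalg.solve(A, b)
    dx = np.linalg.solve(A, db)
    return (np.linalg.norm(dx) / np.linalg.norm(x)) / \
           (np.linalg.norm(db) / np.linalg.norm(b))

rng = np.random.default_rng(0)
n = 20
A = np.eye(n) + 0.1 * rng.standard_normal((n, n))   # a well conditioned matrix
b = rng.standard_normal(n)
db = 1e-8 * rng.standard_normal(n)
print(amplification(A, b, db), "<=", np.linalg.cond(A, 2))
```

The bound (2.9) is attained only for worst-case directions of b and δb; for random perturbations the observed amplification is usually well below K(A).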

Example 2.4 (Nonlinear equations) Let f : R → R be a function of class C¹ and consider the nonlinear equation

    F(x, d) = f(x) = φ(x) − d = 0,

where φ : R → R is a suitable function and d ∈ R a datum (possibly equal to zero). The problem is well defined only if φ is invertible in a neighborhood of d: in such a case, indeed, x = φ⁻¹(d) and the resolvent is G = φ⁻¹. Since (φ⁻¹)′(d) = [φ′(x)]⁻¹, the first relation in (2.7) yields, for d ≠ 0,

    K(d) ≃ (|d| / |x|) |[φ′(x)]⁻¹|,    (2.10)

while if d = 0 or x = 0 we have

    K_abs(d) ≃ |[φ′(x)]⁻¹|.    (2.11)

The problem is thus ill posed if x is a multiple root of φ(x) − d; it is ill conditioned when φ′(x) is "small" and well conditioned when φ′(x) is "large". We shall further address this subject in Section 6.1. •
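For a concrete instance of (2.10)–(2.11), take φ(x) = x³ (our illustrative choice, not one made in the book): the relative condition number is 1/3 for every d > 0, while the absolute one, |[φ′(x)]⁻¹| = 1/(3x²), blows up as d → 0, where x = 0 becomes a triple root.

```python
def dphi(x):
    # derivative of phi(x) = x**3
    return 3.0 * x * x

def K_rel(d):
    # K(d) ~ (|d|/|x|) * |1/phi'(x)| with x = d**(1/3), cf. (2.10)
    x = d ** (1.0 / 3.0)
    return (abs(d) / abs(x)) / abs(dphi(x))

def K_abs(d):
    # K_abs(d) ~ |1/phi'(x)|, cf. (2.11)
    x = d ** (1.0 / 3.0)
    return 1.0 / abs(dphi(x))

for d in (1.0, 1e-3, 1e-6):
    print(f"d = {d}: K = {K_rel(d):.3f}, K_abs = {K_abs(d):.3e}")
```

The relative error in x = d^(1/3) is one third of the relative error in d, yet tiny absolute perturbations of d near zero produce comparatively huge absolute perturbations of x.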

In view of (2.7), the quantity ‖G′(d)‖ is an approximation of K_abs(d) and is sometimes called the first order absolute condition number. This latter represents the limit of the Lipschitz constant of G (see Section 11.1) as the perturbation on the data tends to zero.

Such a number does not always provide a sound estimate of the condition number K_abs(d). This happens, for instance, when G′ vanishes at a point whilst G is non null in a neighborhood of the same point. For example, take x = G(d) = cos(d) − 1 for d ∈ (−π/2, π/2); we have G′(0) = 0, while K_abs(0) = 2/π.
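This last example can be checked directly (a small sketch in plain Python): the ratio |δx|/|δd| is far from G′(0) = 0 for finite perturbations, and it approaches 2/π at the edge of the admissible interval.

```python
import math

G = lambda d: math.cos(d) - 1.0

# G'(0) = -sin(0) = 0, yet the ratio |G(dd) - G(0)| / |dd| over (-pi/2, pi/2)
# attains values up to 2/pi near the endpoints of the interval
for dd in (math.pi / 2 * 0.999, 1.0, 0.1, 1e-6):
    print(dd, abs(G(dd) - G(0.0)) / dd)
```

Only in the limit δd → 0 does the ratio shrink (like |δd|/2), which is why the first order number G′(0) = 0 misrepresents K_abs(0).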

2.2 Stability of Numerical Methods

We shall henceforth suppose the problem (2.1) to be well posed. A numerical method for the approximate solution of (2.1) will consist, in general, of a sequence of approximate problems

    Fn(xn, dn) = 0,    n ≥ 1,    (2.12)

depending on a certain parameter n (to be defined case by case). The understood expectation is that xn → x as n → ∞, i.e. that the numerical solution converges to the exact solution. For that, it is necessary that dn → d and that Fn "approximates" F as n → ∞. Precisely, if the datum d of problem (2.1) is admissible for Fn, we say that (2.12) is consistent if

    Fn(x, d) = Fn(x, d) − F(x, d) → 0    for n → ∞,    (2.13)

where x is the solution to problem (2.1) corresponding to the datum d. The meaning of this definition will be made precise in the next chapters for each single class of considered problems.

A method is said to be strongly consistent if Fn(x, d) = 0 for any value of n and not only for n → ∞.

In some cases (e.g., when iterative methods are used) problem (2.12) could take the following form

    Fn(xn, xn−1, . . . , xn−q, dn) = 0,    n ≥ q,    (2.14)

where x0, x1, . . . , xq−1 are given. In such a case, the property of strong consistency becomes Fn(x, x, . . . , x, d) = 0 for all n ≥ q.


Example 2.5 Let us consider the following iterative method (known as Newton's method and discussed in Section 6.2.2) for approximating a simple root α of a function f : R → R: given x0,

    xn = xn−1 − f(xn−1) / f′(xn−1),    n ≥ 1.    (2.15)

The method (2.15) can be written in the form (2.14) by setting Fn(xn, xn−1, f) = xn − xn−1 + f(xn−1)/f′(xn−1), and it is strongly consistent since Fn(α, α, f) = 0 for all n ≥ 1.

Consider now the following numerical method (known as the composite midpoint rule, discussed in Section 9.2) for approximating x = ∫ₐᵇ f(t) dt:

    xn = H Σ_{k=1}^{n} f((tk + tk+1)/2),    n ≥ 1,

where H = (b − a)/n and tk = a + (k − 1)H, k = 1, . . . , n + 1. This method is consistent; it is also strongly consistent provided that f is a piecewise linear polynomial.

More generally, all numerical methods obtained from the mathematical problem by truncation of limit operations (such as integrals, derivatives, series, . . . ) are not strongly consistent. •
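The two consistency notions of Example 2.5 can be verified numerically. A minimal sketch (plain Python; the test integrands are our choices): the composite midpoint rule reproduces the integral of a linear f exactly for every n, while for a generic integrand the error vanishes only as n → ∞.

```python
import math

def midpoint(f, a, b, n):
    # composite midpoint rule: H * sum_k f((t_k + t_{k+1}) / 2), H = (b - a)/n
    H = (b - a) / n
    return H * sum(f(a + (k + 0.5) * H) for k in range(n))

# strongly consistent case: f linear, so x_n equals the integral for every n
f = lambda t: 3.0 * t + 2.0
exact_f = 10.0                       # integral of 3t + 2 over [0, 2]
print([midpoint(f, 0.0, 2.0, n) - exact_f for n in (1, 2, 7)])

# merely consistent case: the error is nonzero for finite n and decays like 1/n^2
exact_g = 2.0                        # integral of sin over [0, pi]
print([abs(midpoint(math.sin, 0.0, math.pi, n) - exact_g) for n in (4, 16, 64)])
```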

Recalling what has been previously stated about problem (2.1), in order for the numerical method to be well posed (or stable) we require that, for any fixed n, there exists a unique solution xn corresponding to the datum dn, that the computation of xn as a function of dn is unique and, furthermore, that xn depends continuously on the data, i.e.

    ∀η > 0, ∃Kn(η, dn) : ‖δdn‖ < η ⇒ ‖δxn‖ ≤ Kn(η, dn) ‖δdn‖.    (2.16)

As done in (2.4), we introduce for each problem in the sequence (2.12) the quantities

    Kn(dn) = sup_{δdn ∈ Dn} (‖δxn‖/‖xn‖) / (‖δdn‖/‖dn‖),    K_abs,n(dn) = sup_{δdn ∈ Dn} ‖δxn‖/‖δdn‖,    (2.17)

and then define

    K^num(dn) = lim_{k→∞} sup_{n≥k} Kn(dn),    K^num_abs(dn) = lim_{k→∞} sup_{n≥k} K_abs,n(dn).

We call K^num(dn) the relative asymptotic condition number and K^num_abs(dn) the absolute asymptotic condition number of the numerical method (2.12), corresponding to the datum dn.

The numerical method is said to be well conditioned if K^num is "small" for any admissible datum dn, ill conditioned otherwise. As in (2.6), let us


consider the case where, for each n, the functional relation (2.12) defines a mapping Gn between the sets of the numerical data and the solutions:

    xn = Gn(dn),  that is,  Fn(Gn(dn), dn) = 0.    (2.18)

Assuming that Gn is differentiable, we can obtain from (2.17)

    Kn(dn) ≃ ‖G′n(dn)‖ ‖dn‖ / ‖Gn(dn)‖,    K_abs,n(dn) ≃ ‖G′n(dn)‖.    (2.19)

Example 2.6 (Sum and subtraction) The function f : R² → R, f(a, b) = a + b, is a linear mapping whose gradient is the vector ∇f(a, b) = (1, 1)ᵀ. Using the vector norm ‖·‖₁ defined in (1.13) yields K(a, b) ≃ (|a| + |b|)/|a + b|, from which it follows that summing two numbers of the same sign is a well conditioned operation, being K(a, b) ≃ 1. On the other hand, subtracting two numbers that are almost equal is ill conditioned, since |a + b| ≪ |a| + |b|. This fact, already pointed out in Example 2.2, leads to the cancellation of significant digits whenever numbers can be represented using only a finite number of digits (as in floating-point arithmetic, see Section 2.5). •
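The loss of significant digits is immediate to reproduce in double precision (a plain Python sketch with our own test values): for a = 1 + x and b = −1 with tiny x, the condition number (|a| + |b|)/|a + b| ≈ 2/x, and roughly that many multiples of the machine precision u ≈ 1.1·10⁻¹⁶ show up as relative error in the computed difference.

```python
x = 1e-15
computed = (1.0 + x) - 1.0          # both operations in double precision
rel_err = abs(computed - x) / x
cond = ((1.0 + x) + 1.0) / x        # (|a| + |b|) / |a + b|, about 2e15
print(computed, rel_err)            # roughly 10% of the result is wrong
print(cond * 2.0 ** -53)            # the bound K * u is of comparable size
```

The damage is done when 1 + x is rounded to the nearest double; the subsequent subtraction is exact but merely exposes that rounding error, magnified by the huge condition number.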

Example 2.7 Consider again the problem of computing the roots of a polynomial of second degree analyzed in Example 2.2. When p > 1 (separated roots), such a problem is well conditioned. However, we generate an unstable algorithm if we evaluate the root x₋ by the formula x₋ = p − √(p² − 1). This formula is indeed subject to errors due to numerical cancellation of significant digits (see Section 2.4) that are introduced by the finite arithmetic of the computer. A possible remedy to this trouble consists of computing x₊ = p + √(p² − 1) first, then x₋ = 1/x₊. Alternatively, one can solve F(x, p) = x² − 2px + 1 = 0 using Newton's method (proposed in Example 2.5):

    xn = xn−1 − (xn−1² − 2p xn−1 + 1)/(2 xn−1 − 2p) = fn(p),    n ≥ 1, x0 given.

Applying (2.19) for p > 1 yields Kn(p) ≃ |p|/|xn − p|. To compute K^num(p) we notice that, in the case when the algorithm converges, the solution xn would converge to one of the roots x₊ or x₋; therefore, |xn − p| → √(p² − 1) and thus Kn(p) → K^num(p) ≃ |p|/√(p² − 1), in perfect agreement with the value (2.8) of the condition number of the exact problem.

We can conclude that Newton's method for the search of simple roots of a second order algebraic equation is ill conditioned if |p| is very close to 1, while it is well conditioned in the other cases. •
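The two evaluation strategies for x₋ are compared below (a plain Python sketch). For large p the subtraction p − √(p² − 1) cancels almost all significant digits, while the algebraically equivalent x₋ = 1/x₊ (the two roots have product 1) remains accurate to machine precision.

```python
import math

def x_minus_unstable(p):
    # cancellation: p and sqrt(p^2 - 1) agree in almost all digits for large p
    return p - math.sqrt(p * p - 1.0)

def x_minus_stable(p):
    # no subtraction of nearly equal numbers: x- = 1/x+ since x+ * x- = 1
    return 1.0 / (p + math.sqrt(p * p - 1.0))

p = 1.0e8
print(x_minus_unstable(p))   # heavily contaminated, possibly even exactly 0.0
print(x_minus_stable(p))     # close to 5.0e-9, correct to machine precision
```

This is a change of algorithm, not of problem: both formulas define the same exact root, but only the second is a stable way of computing it.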

The final goal of numerical approximation is, of course, to build, through numerical problems of the type (2.12), solutions xn that "get closer" to the solution of problem (2.1) as n gets larger. This concept is made precise in the next definition.

Definition 2.2 The numerical method (2.12) is convergent iff

    ∀ε > 0, ∃n0(ε), ∃δ(n0, ε) > 0 :
    ∀n > n0(ε), ∀δdn : ‖δdn‖ < δ(n0, ε) ⇒ ‖x(d) − xn(d + δdn)‖ ≤ ε,    (2.20)

where d is an admissible datum for the problem (2.1), x(d) is the corresponding solution and xn(d + δdn) is the solution of the numerical problem (2.12) with datum d + δdn.

To verify the implication (2.20) it suffices to check that, under the same assumptions,

    ‖x(d + δdn) − xn(d + δdn)‖ ≤ ε/2.    (2.21)

Indeed, thanks to (2.3) we have

    ‖x(d) − xn(d + δdn)‖ ≤ ‖x(d) − x(d + δdn)‖ + ‖x(d + δdn) − xn(d + δdn)‖
                         ≤ K(δ(n0, ε), d) ‖δdn‖ + ε/2.

Choosing δdn such that K(δ(n0, ε), d) ‖δdn‖ < ε/2, one obtains (2.20).

Measures of the convergence of xn to x are given by the absolute error and the relative error, respectively defined as

    E(xn) = |x − xn|,    E_rel(xn) = |x − xn| / |x|  (if x ≠ 0).    (2.22)

In the cases where x and xn are matrix or vector quantities, in addition to the definitions in (2.22) (where the absolute values are replaced by suitable norms) it is sometimes useful to introduce the error by component, defined as

    E^c_rel(xn) = max_{i,j} |(x − xn)_{ij}| / |x_{ij}|.    (2.23)
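The three error measures translate directly into code (a trivial sketch in plain Python, with vectors represented as lists for the componentwise error (2.23)):

```python
def abs_error(x, xn):
    return abs(x - xn)                      # E(xn), cf. (2.22)

def rel_error(x, xn):
    return abs(x - xn) / abs(x)             # E_rel(xn), requires x != 0

def rel_error_by_component(X, Xn):
    # componentwise relative error, cf. (2.23); every entry of X must be nonzero
    return max(abs(x - xn) / abs(x) for x, xn in zip(X, Xn))

print(rel_error(2.0, 1.9))                            # 0.05 up to roundoff
print(rel_error_by_component([1.0, 10.0], [0.9, 9.9]))  # 0.1 up to roundoff
```

Note that the componentwise error can differ sharply from the normwise relative error when the entries of x have very different magnitudes.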

2.2.1 Relations between Stability and Convergence

The concepts of stability and convergence are strongly connected. First of all, if problem (2.1) is well posed, a necessary condition in order for the numerical problem (2.12) to be convergent is that it is stable.

Let us thus assume that the method is convergent, and prove that it is stable by finding a bound for ‖δxn‖. We have

    ‖δxn‖ = ‖xn(d + δdn) − xn(d)‖ ≤ ‖xn(d) − x(d)‖ + ‖x(d) − x(d + δdn)‖
          + ‖x(d + δdn) − xn(d + δdn)‖    (2.24)
          ≤ K(δ(n0, ε), d) ‖δdn‖ + ε,

having used (2.3) and (2.21) twice. From (2.24) we can conclude that, for n sufficiently large, ‖δxn‖/‖δdn‖ can be bounded by a constant of the order of K(δ(n0, ε), d), so that the method is stable. Thus, we are interested in stable numerical methods, since only these can be convergent.

The stability of a numerical method becomes a sufficient condition for the numerical problem (2.12) to converge if this latter is also consistent with problem (2.1). Indeed, under these assumptions we have

    ‖x(d + δdn) − xn(d + δdn)‖ ≤ ‖x(d + δdn) − x(d)‖ + ‖x(d) − xn(d)‖
                               + ‖xn(d) − xn(d + δdn)‖.

Thanks to (2.3), the first term at the right-hand side can be bounded by ‖δdn‖ (up to a multiplicative constant independent of δdn). A similar bound holds for the third term, due to the stability property (2.16). Finally, concerning the remaining term, if Fn is differentiable with respect to the variable x, an expansion in a Taylor series gives

    Fn(x(d), d) − Fn(xn(d), d) = (∂Fn/∂x)|_(x,d) (x(d) − xn(d)),

for a suitable x "between" x(d) and xn(d). Assuming also that ∂Fn/∂x is invertible, we get

    x(d) − xn(d) = [(∂Fn/∂x)|_(x,d)]⁻¹ [Fn(x(d), d) − Fn(xn(d), d)].    (2.25)

On the other hand, replacing Fn(xn(d), d) with F(x(d), d) (since both terms are equal to zero) and passing to the norms, we find

    ‖x(d) − xn(d)‖ ≤ ‖[(∂Fn/∂x)|_(x,d)]⁻¹‖ ‖Fn(x(d), d) − F(x(d), d)‖.

Thanks to (2.13) we can thus conclude that ‖x(d) − xn(d)‖ → 0 for n → ∞. The result that has just been proved, although stated in qualitative terms, is a milestone in numerical analysis, known as the equivalence theorem (or Lax–Richtmyer theorem): "for a consistent numerical method, stability is equivalent to convergence". A rigorous proof of this theorem is available in [Dah56] for the case of linear Cauchy problems, or in [Lax65] and in [RM67] for linear well-posed initial value problems.

2.3 A priori and a posteriori Analysis

The stability analysis of a numerical method can be carried out following different strategies:

1. forward analysis, which provides a bound to the variations δxn on the solution due both to perturbations in the data and to errors that are intrinsic to the numerical method;

2. backward analysis, which aims at estimating the perturbations that should be "impressed" to the data of a given problem in order to obtain the results actually computed under the assumption of working in exact arithmetic. Equivalently, given a certain computed solution xn, backward analysis looks for the perturbations δdn on the data such that Fn(xn, dn + δdn) = 0. Notice that, when performing such an estimate, no account at all is taken of the way xn has been obtained (that is, of which method has been employed to generate it).

Forward and backward analyses are two different instances of the so-called a priori analysis. This latter can be applied to investigate not only the stability of a numerical method, but also its convergence. In this case it is referred to as a priori error analysis, which can again be performed using either a forward or a backward technique.

A priori error analysis is distinct from the so-called a posteriori error analysis, which aims at producing an estimate of the error on the grounds of quantities that are actually computed by a specific numerical method. Typically, denoting by xn the computed numerical solution approximating the solution x of problem (2.1), the a posteriori error analysis aims at evaluating the error x − xn as a function of the residual rn = F(xn, d) by means of constants that are called stability factors (see [EEHJ96]).

Example 2.8 For the sake of illustration, consider the problem of finding the zeros α₁, . . . , αn of a polynomial pn(x) = Σ_{k=0}^{n} ak x^k of degree n.

Denoting by p̃n(x) = Σ_{k=0}^{n} ãk x^k a perturbed polynomial whose zeros are α̃i, forward analysis aims at estimating the error between two corresponding zeros αi and α̃i in terms of the variations on the coefficients ak − ãk, k = 0, 1, . . . , n.

On the other hand, let {α̃i} be the approximate zeros of pn (computed somehow). Backward analysis provides an estimate of the perturbations δak which should be impressed to the coefficients so that Σ_{k=0}^{n} (ak + δak) α̃i^k = 0, for a fixed α̃i. The goal of a posteriori error analysis would rather be to provide an estimate of the error αi − α̃i as a function of the residual value pn(α̃i). This analysis will be carried out in Section 6.1. •
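The a posteriori viewpoint of Example 2.8 can be tried on a small polynomial (a sketch assuming NumPy; the stability factor used here, 1/|p′n| at a simple zero, follows from a first order expansion and is our illustrative choice):

```python
import numpy as np

coeffs = np.array([1.0, -3.0, 2.0])        # p(x) = x^2 - 3x + 2, zeros 1 and 2
approx = np.roots(coeffs)                  # the zeros "computed somehow"

for z in np.sort(approx):
    r = np.polyval(coeffs, z)              # a posteriori residual p_n at the zero
    dp = np.polyval(np.polyder(coeffs), z)
    # near a simple zero, error ~ residual / p_n'(computed zero)
    print(z, r, r / dp)
```

The error estimate r/p′n(z) uses only computed quantities, exactly in the spirit of a posteriori analysis: no reference to the exact zeros is needed.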

Example 2.9 Consider the linear system Ax = b, where A ∈ Rⁿˣⁿ is a nonsingular matrix.

For the perturbed system Ãx̃ = b̃, forward analysis provides an estimate of the error x − x̃ in terms of A − Ã and b − b̃, while backward analysis estimates the perturbations δA = (δaij) and δb = (δbi) which should be impressed to the entries of A and b in order to get (A + δA)xn = b + δb, xn being the solution of the linear system (computed somehow). Finally, a posteriori error analysis looks for an estimate of the error x − xn as a function of the residual rn = b − Axn. We will develop this analysis in Section 3.1. •
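For the linear system, both the a posteriori estimate and a backward perturbation can be computed explicitly (a NumPy sketch; the rank-one formula for δA is the classical Rigal–Gaches construction, not something established in this chapter):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 6
A = rng.standard_normal((n, n)) + n * np.eye(n)   # comfortably nonsingular
x_exact = np.ones(n)
b = A @ x_exact
xn = np.linalg.solve(A, b)                        # the solution "computed somehow"

rn = b - A @ xn                                   # residual
# a posteriori: ||x - xn|| <= ||A^{-1}|| * ||rn||, stability factor ||A^{-1}||
bound = np.linalg.norm(np.linalg.inv(A), 2) * np.linalg.norm(rn)
print(np.linalg.norm(x_exact - xn), bound)

# backward: a rank-one dA such that (A + dA) xn = b holds up to roundoff
dA = np.outer(rn, xn) / (xn @ xn)
print(np.linalg.norm((A + dA) @ xn - b))
```

One line of algebra confirms the backward step: (A + δA)xn = Axn + rn (xnᵀxn)/(xnᵀxn) = Axn + rn = b.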

It is important to point out the role played by the a posteriori analysis in devising strategies for adaptive error control. These strategies, by suitably changing the discretization parameters (for instance, the spacing between nodes in the numerical integration of a function or a differential equation), employ the a posteriori analysis in order to ensure that the error does not exceed a fixed tolerance.

A numerical method that makes use of an adaptive error control is called an adaptive numerical method. In practice, a method of this kind applies in the computational process the idea of feedback, by activating, on the grounds of a computed solution, a convergence test which ensures the control of the error within a fixed tolerance. In case the convergence test fails, a suitable strategy for modifying the discretization parameters is automatically adopted in order to enhance the accuracy of the solution to be newly computed, and the overall procedure is iterated until the convergence check is passed.
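The feedback loop just described can be sketched in a few lines (plain Python; the a posteriori estimator, |fine − coarse|/3, relies on the O(H²) behavior of the composite midpoint rule and is our illustrative choice of error indicator):

```python
import math

def adaptive_midpoint(f, a, b, tol, n_max=2**20):
    """Halve the spacing until an a posteriori error estimate passes the test."""
    def mid(n):
        H = (b - a) / n
        return H * sum(f(a + (k + 0.5) * H) for k in range(n))
    n = 1
    coarse = mid(n)
    while True:
        n *= 2
        fine = mid(n)
        # for an O(H^2) rule, the error of `fine` is about (fine - coarse) / 3
        if abs(fine - coarse) / 3.0 <= tol or n >= n_max:
            return fine
        coarse = fine

print(adaptive_midpoint(math.sin, 0.0, math.pi, 1e-8))   # close to 2
```

The convergence test uses only computed quantities; when it fails, the discretization parameter (here the number of subintervals) is refined and the computation repeated, exactly the feedback mechanism described above.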

2.4 Sources of Error in Computational Models

Whenever the numerical problem (2.12) is an approximation to the mathematical problem (2.1), and this latter is in turn a model of a physical problem (which will be shortly denoted by PP), we shall say that (2.12) is a computational model for PP.

In this process the global error, denoted by e, is expressed by the difference between the actually computed solution, xn, and the physical solution, xph, of which x provides a model. The global error e can thus be interpreted as the sum of the error em of the mathematical model, given by x − xph, and the error ec of the computational model, xn − x, that is, e = em + ec (see Figure 2.1).

FIGURE 2.1. Errors in computational models (the diagram relates PP : xph, F(x, d) = 0 and Fn(xn, dn) = 0 through the errors em, ec, en and ea)

The error em will in turn take into account the error of the mathematical model in a strict sense (that is, the extent to which the functional equation (2.1) realistically describes the problem PP) and the error on the data (that is, how accurately d provides a measure of the real physical data). In the same way, ec turns out to be the combination of the numerical discretization error en = xn − x, the error ea introduced by the numerical algorithm, and the roundoff error introduced by the computer during the actual solution of problem (2.12) (see Section 2.5).

In general, we can thus outline the following sources of error:

1. errors due to the model, that can be controlled by a proper choice of the mathematical model;

2. errors in the data, that can be reduced by enhancing the accuracy in the measurement of the data themselves;

3. truncation errors, arising from having replaced, in the numerical model, limits by operations that involve a finite number of steps;

4. rounding errors.

The errors at items 3 and 4 give rise to the computational error. A numerical method will thus be convergent if this error can be made arbitrarily small by increasing the computational effort. Of course, convergence is the primary, albeit not unique, goal of a numerical method, the others being accuracy, reliability and efficiency.

Accuracy means that the errors are small with respect to a fixed tolerance. It is usually quantified by the order of infinitesimal of the error en with respect to the characteristic discretization parameter (for instance, the largest grid spacing between the discretization nodes). By the way, we notice that machine precision does not limit, on theoretical grounds, the accuracy.

Reliability means that it is likely that the global error can be guaranteed to be below a certain tolerance. Of course, a numerical model can be considered to be reliable only if suitably tested, that is, successfully applied to several test cases.

Efficiency means that the computational complexity that is needed to control the error (that is, the amount of operations and the size of the memory required) is as small as possible.

Having encountered the term algorithm several times in this section, we cannot refrain from providing an intuitive description of it. By algorithm we mean a directive that indicates, through elementary operations, all the passages that are needed to solve a specific problem. An algorithm can in turn contain sub-algorithms and must have the feature of terminating after a finite number of elementary operations. As a consequence, the executor of the algorithm (machine or human being) must find within the algorithm itself all the instructions to completely solve the problem at hand (provided that the necessary resources for its execution are available).

For instance, the statement that a polynomial of second degree surely admits two roots in the complex plane does not characterize an algorithm, whereas the formula yielding the roots is an algorithm (provided that the sub-algorithms needed to correctly execute all the operations have been defined in turn).

Finally, the complexity of an algorithm is a measure of its executing time. Calculating the complexity of an algorithm is therefore a part of the analysis of the efficiency of a numerical method. Since several algorithms, with different complexities, can be employed to solve the same problem P, it is useful to introduce the concept of complexity of a problem, this latter meaning the complexity of the algorithm that has minimum complexity among those solving P. The complexity of a problem is typically measured by a parameter directly associated with P. For instance, in the case of the product of two square matrices, the computational complexity can be expressed as a function of a power of the matrix size n (see [Str69]).

2.5 Machine Representation of Numbers

Any machine operation is affected by rounding errors or roundoff. They are due to the fact that on a computer only a finite subset of the set of real numbers can be represented. In this section, after recalling the positional notation of real numbers, we introduce their machine representation.

2.5.1 The Positional System
