σi = √(λi(A^H A)),  i = 1, . . . , n.                              (1.10)

As far as the rank is concerned, if

σ1 ≥ . . . ≥ σr > σr+1 = . . . = σp = 0,

then the rank of A is r, the kernel of A is the span of the column vectors
of V, {vr+1 , . . . , vn }, and the range of A is the span of the column vectors
of U, {u1 , . . . , ur }.

Definition 1.15 Suppose that A ∈ C^{m×n} has rank equal to r and that it
admits an SVD of the type U^H AV = Σ. The matrix A† = VΣ†U^H is called
the Moore-Penrose pseudo-inverse matrix, where

Σ† = diag(1/σ1, . . . , 1/σr, 0, . . . , 0).                       (1.11)

The matrix A† is also called the generalized inverse of A (see Exercise 13).
Indeed, if rank(A) = n < m, then A† = (A^T A)^{-1} A^T, while if n = m =
rank(A), A† = A^{-1}. For further properties of A†, see also Exercise 12.
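As a numerical aside, A† can be assembled directly from the SVD. The sketch
below (assuming NumPy is available; the truncation tolerance tol is an
illustrative choice, not prescribed by the text) compares the construction of
Definition 1.15 with NumPy's built-in pinv:

    import numpy as np

    def pseudo_inverse(A, tol=1e-12):
        # SVD: A = U diag(s) V^H, with s[0] >= s[1] >= ... >= 0
        U, s, Vh = np.linalg.svd(A, full_matrices=False)
        # Sigma^dagger: invert the nonzero singular values, keep the zeros
        s_inv = np.where(s > tol * s[0], 1.0 / s, 0.0)
        # A^dagger = V Sigma^dagger U^H
        return Vh.conj().T @ np.diag(s_inv) @ U.conj().T

    A = np.array([[1.0, 0.0], [0.0, 2.0], [0.0, 0.0]])  # rank 2, m = 3 > n = 2
    Ad = pseudo_inverse(A)
    print(np.allclose(Ad, np.linalg.inv(A.T @ A) @ A.T))  # A† = (A^T A)^{-1} A^T
    print(np.allclose(Ad, np.linalg.pinv(A)))             # True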


1.10 Scalar Product and Norms in Vector Spaces
Very often, to quantify errors or measure distances one needs to compute
the magnitude of a vector or a matrix. For that purpose we introduce in
this section the concept of a vector norm and, in the following one, of a
matrix norm. We refer the reader to [Ste73], [SS90] and [Axe94] for the
proofs of the properties that are reported hereafter.

Definition 1.16 A scalar product on a vector space V defined over K
is any map (·, ·) acting from V × V into K which enjoys the following
properties:

1. it is linear with respect to the vectors of V, that is

   (γx + λz, y) = γ(x, y) + λ(z, y),  ∀x, z ∈ V, ∀γ, λ ∈ K;

2. it is hermitian, that is, (y, x) = \overline{(x, y)}, ∀x, y ∈ V;

3. it is positive definite, that is, (x, x) > 0, ∀x ≠ 0 (in other words,
   (x, x) ≥ 0, and (x, x) = 0 if and only if x = 0).



In the case V = C^n (or R^n), an example is provided by the classical Eu-
clidean scalar product given by

(x, y) = y^H x = Σ_{i=1}^n x_i \bar{y}_i,

where \bar{z} denotes the complex conjugate of z.

Moreover, for any given square matrix A of order n and for any x, y ∈ C^n
the following relation holds:

(Ax, y) = (x, A^H y).                                              (1.12)

In particular, since for any matrix Q ∈ C^{n×n}, (Qx, Qy) = (x, Q^H Qy), one
gets

Property 1.8 Unitary matrices preserve the Euclidean scalar product, that
is, (Qx, Qy) = (x, y) for any unitary matrix Q and for any pair of vectors
x and y.

Definition 1.17 Let V be a vector space over K. We say that the map
‖·‖ from V into R is a norm on V if the following axioms are satisfied:

1. (i) ‖v‖ ≥ 0 ∀v ∈ V and (ii) ‖v‖ = 0 if and only if v = 0;

2. ‖αv‖ = |α| ‖v‖ ∀α ∈ K, ∀v ∈ V (homogeneity property);

3. ‖v + w‖ ≤ ‖v‖ + ‖w‖ ∀v, w ∈ V (triangle inequality),

where |α| denotes the absolute value of α if K = R, the modulus of α if
K = C.

The pair (V, ‖·‖) is called a normed space. We shall distinguish among
norms by a suitable subscript attached to the double bar symbol. If
the map ‖·‖ from V into R enjoys only properties 1(i), 2 and 3, we shall
call such a map a seminorm. Finally, we shall call a unit vector
any vector of V having unit norm.
An example of a normed space is R^n, equipped for instance with the p-norm
(or Hölder norm); the latter is defined for a vector x of components {x_i}
as

‖x‖_p = ( Σ_{i=1}^n |x_i|^p )^{1/p},  for 1 ≤ p < ∞.               (1.13)

Notice that the limit as p goes to infinity of ‖x‖_p exists, is finite, and
equals the maximum modulus of the components of x. Such a limit defines in
turn a norm, called the infinity norm (or maximum norm), given by

‖x‖_∞ = max_{1≤i≤n} |x_i|.
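A minimal check of these definitions (with NumPy and an arbitrary test
vector) shows how ‖x‖_p approaches ‖x‖_∞ as p grows:

    import numpy as np

    x = np.array([3.0, -4.0, 1.0])

    def p_norm(x, p):
        # Hoelder norm: (sum |x_i|^p)^(1/p)
        return np.sum(np.abs(x) ** p) ** (1.0 / p)

    print(p_norm(x, 1), np.linalg.norm(x, 1))   # 8.0
    print(p_norm(x, 2), np.linalg.norm(x, 2))   # sqrt(26)
    for p in (4, 16, 64):
        print(p, p_norm(x, p))                  # tends to max|x_i| = 4
    print(np.linalg.norm(x, np.inf))            # 4.0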

When p = 2, from (1.13) the standard definition of the Euclidean norm is
recovered,

‖x‖_2 = (x, x)^{1/2} = ( Σ_{i=1}^n |x_i|^2 )^{1/2} = (x^T x)^{1/2},

for which the following property holds.

Property 1.9 (Cauchy-Schwarz inequality) For any pair x, y ∈ R^n,

|(x, y)| = |x^T y| ≤ ‖x‖_2 ‖y‖_2,                                  (1.14)

where equality holds iff y = αx for some α ∈ R.
We recall that the scalar product in R^n can be related to the p-norms
introduced over R^n in (1.13) by the Hölder inequality

|(x, y)| ≤ ‖x‖_p ‖y‖_q,  with 1/p + 1/q = 1.
In the case where V is a finite-dimensional space the following property
holds (for a sketch of the proof, see Exercise 14).

Property 1.10 Any vector norm ‖·‖ defined on V is a continuous function
of its argument, namely, ∀ε > 0, ∃C > 0 such that if ‖x − x̂‖ ≤ ε then
|‖x‖ − ‖x̂‖| ≤ Cε, for any x, x̂ ∈ V.


New norms can be easily built using the following result.

Property 1.11 Let ‖·‖ be a norm of R^n and A ∈ R^{n×n} be a matrix with
n linearly independent columns. Then, the function ‖·‖_A acting from R^n
into R defined as

‖x‖_A = ‖Ax‖  ∀x ∈ R^n,

is a norm of R^n.

Two vectors x, y in V are said to be orthogonal if (x, y) = 0. This statement
has an immediate geometric interpretation when V = R^2 since in such a
case

(x, y) = ‖x‖_2 ‖y‖_2 cos(ϑ),

where ϑ is the angle between the vectors x and y. As a consequence, if
(x, y) = 0 then ϑ is a right angle and the two vectors are orthogonal in the
geometric sense.


Definition 1.18 Two norms ‖·‖_p and ‖·‖_q on V are equivalent if there
exist two positive constants c_pq and C_pq such that

c_pq ‖x‖_q ≤ ‖x‖_p ≤ C_pq ‖x‖_q  ∀x ∈ V.




In a finite-dimensional normed space all norms are equivalent. In particular,
if V = R^n it can be shown that for the p-norms, with p = 1, 2, and ∞, the
constants c_pq and C_pq take the values reported in Table 1.1.

c_pq     q=1       q=2       q=∞        C_pq     q=1    q=2       q=∞
p=1      1         1         1          p=1      1      n^{1/2}   n
p=2      n^{-1/2}  1         1          p=2      1      1         n^{1/2}
p=∞      n^{-1}    n^{-1/2}  1          p=∞      1      1         1

TABLE 1.1. Equivalence constants for the main norms of R^n
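These constants are easy to probe numerically. The sketch below (random
vectors via NumPy; the dimension n = 5 and the number of trials are
arbitrary choices) checks the rows p = 1 of both tables:

    import numpy as np

    rng = np.random.default_rng(0)
    n = 5
    for _ in range(1000):
        x = rng.standard_normal(n)
        n1 = np.linalg.norm(x, 1)
        n2 = np.linalg.norm(x, 2)
        ninf = np.linalg.norm(x, np.inf)
        # c_pq ||x||_q <= ||x||_p <= C_pq ||x||_q, for p = 1
        assert n2 <= n1 <= np.sqrt(n) * n2
        assert ninf <= n1 <= n * ninf
    print("equivalence bounds verified on 1000 random vectors")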

In this book we shall often deal with sequences of vectors and with their
convergence. For this purpose, we recall that a sequence of vectors {x^(k)}
in a vector space V having finite dimension n converges to a vector x, and
we write lim_{k→∞} x^(k) = x, if

lim_{k→∞} x_i^(k) = x_i,  i = 1, . . . , n,                        (1.15)

where x_i^(k) and x_i are the components of the corresponding vectors with
respect to a basis of V. If V = R^n, due to the uniqueness of the limit of a

sequence of real numbers, (1.15) also implies the uniqueness of the limit, if
it exists, of a sequence of vectors.
We further notice that in a finite-dimensional space all the norms are topo-
logically equivalent in the sense of convergence, namely, given a sequence
of vectors x^(k),

|||x^(k)||| → 0 ⇔ ‖x^(k)‖ → 0  as k → ∞,

where |||·||| and ‖·‖ are any two vector norms. As a consequence, we can
establish the following link between norms and limits.

Property 1.12 Let ‖·‖ be a norm in a finite-dimensional space V. Then

lim_{k→∞} x^(k) = x ⇔ lim_{k→∞} ‖x − x^(k)‖ = 0,

where x ∈ V and {x^(k)} is a sequence of elements of V.


1.11 Matrix Norms
Definition 1.19 A matrix norm is a mapping ‖·‖ : R^{m×n} → R such that:
1. ‖A‖ ≥ 0 ∀A ∈ R^{m×n} and ‖A‖ = 0 if and only if A = 0;
2. ‖αA‖ = |α| ‖A‖ ∀α ∈ R, ∀A ∈ R^{m×n} (homogeneity);
3. ‖A + B‖ ≤ ‖A‖ + ‖B‖ ∀A, B ∈ R^{m×n} (triangle inequality).




Unless otherwise specified we shall employ the same symbol ‖·‖ to denote
matrix norms and vector norms.
We can better characterize the matrix norms by introducing the concepts
of compatible norm and norm induced by a vector norm.

Definition 1.20 We say that a matrix norm ‖·‖ is compatible or consistent
with a vector norm ‖·‖ if

‖Ax‖ ≤ ‖A‖ ‖x‖  ∀x ∈ R^n.                                          (1.16)

More generally, given three norms, all denoted by ‖·‖, albeit defined on
R^m, R^n and R^{m×n}, respectively, we say that they are consistent if ∀x ∈ R^n,
Ax = y ∈ R^m, A ∈ R^{m×n}, we have that ‖y‖ ≤ ‖A‖ ‖x‖.
In order to single out matrix norms of practical interest, the following
property is in general required.

Definition 1.21 We say that a matrix norm ‖·‖ is sub-multiplicative if
∀A ∈ R^{n×m}, ∀B ∈ R^{m×q}

‖AB‖ ≤ ‖A‖ ‖B‖.                                                    (1.17)



This property is not satisfied by every matrix norm. For example (taken from
[GL89]), the norm ‖A‖_Δ = max |a_ij| for i = 1, . . . , n, j = 1, . . . , m does
not satisfy (1.17) if applied to the matrices

A = B = ( 1  1
          1  1 ),

since 2 = ‖AB‖_Δ > ‖A‖_Δ ‖B‖_Δ = 1.
Notice that, given a certain sub-multiplicative matrix norm ‖·‖_α, there
always exists a consistent vector norm. For instance, given any fixed vector
y ≠ 0 in C^n, it suffices to define the consistent vector norm as

‖x‖ = ‖xy^H‖_α,  x ∈ C^n.

As a consequence, in the case of sub-multiplicative matrix norms it is no
longer necessary to explicitly specify the vector norm with which the
matrix norm is consistent.

Example 1.7 The norm

‖A‖_F = ( Σ_{i,j=1}^n |a_ij|^2 )^{1/2} = ( tr(AA^H) )^{1/2}        (1.18)

is a matrix norm called the Frobenius norm (or Euclidean norm in C^{n^2}) and is
compatible with the Euclidean vector norm ‖·‖_2. Indeed,

‖Ax‖_2^2 = Σ_{i=1}^n | Σ_{j=1}^n a_ij x_j |^2 ≤ Σ_{i=1}^n ( Σ_{j=1}^n |a_ij|^2 )( Σ_{j=1}^n |x_j|^2 ) = ‖A‖_F^2 ‖x‖_2^2.

Notice that for such a norm ‖I_n‖_F = √n.
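A quick numerical sanity check of (1.18) and of the compatibility bound
(random matrix and vector of arbitrary size):

    import numpy as np

    rng = np.random.default_rng(1)
    A = rng.standard_normal((4, 4))
    x = rng.standard_normal(4)

    frob = np.sqrt(np.sum(np.abs(A) ** 2))
    print(np.isclose(frob, np.linalg.norm(A, 'fro')))           # True
    print(np.isclose(frob, np.sqrt(np.trace(A @ A.conj().T))))  # True
    # compatibility with the Euclidean vector norm
    assert np.linalg.norm(A @ x, 2) <= frob * np.linalg.norm(x, 2)
    print(np.linalg.norm(np.eye(4), 'fro'))                     # sqrt(4) = 2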




In view of the definition of a natural norm, we recall the following theorem.

Theorem 1.1 Let ‖·‖ be a vector norm. The function

‖A‖ = sup_{x≠0} ‖Ax‖/‖x‖                                           (1.19)

is a matrix norm called the induced matrix norm or natural matrix norm.

Proof. We start by noticing that (1.19) is equivalent to

‖A‖ = sup_{‖x‖=1} ‖Ax‖.                                            (1.20)

Indeed, one can define for any x ≠ 0 the unit vector u = x/‖x‖, so that (1.19)
becomes

‖A‖ = sup_{‖u‖=1} ‖Au‖ = ‖Aw‖  with ‖w‖ = 1.

This being taken as given, let us check that (1.19) (or, equivalently, (1.20)) is
actually a norm, making direct use of Definition 1.19.

1. If ‖Ax‖ ≥ 0, then it follows that ‖A‖ = sup_{‖x‖=1} ‖Ax‖ ≥ 0. Moreover

‖A‖ = sup_{x≠0} ‖Ax‖/‖x‖ = 0 ⇔ ‖Ax‖ = 0 ∀x ≠ 0,

and ‖Ax‖ = 0 ∀x ≠ 0 if and only if A = 0; therefore ‖A‖ = 0 ⇔ A = 0.

2. Given a scalar α,

‖αA‖ = sup_{‖x‖=1} ‖αAx‖ = |α| sup_{‖x‖=1} ‖Ax‖ = |α| ‖A‖.

3. Finally, the triangle inequality holds. Indeed, by definition of supremum, if
x ≠ 0 then

‖Ax‖/‖x‖ ≤ ‖A‖  ⇒  ‖Ax‖ ≤ ‖A‖ ‖x‖,

so that, taking x with unit norm, one gets

‖(A + B)x‖ ≤ ‖Ax‖ + ‖Bx‖ ≤ ‖A‖ + ‖B‖,

from which it follows that ‖A + B‖ = sup_{‖x‖=1} ‖(A + B)x‖ ≤ ‖A‖ + ‖B‖. ∎
Relevant instances of induced matrix norms are the so-called p-norms, de-
fined as

‖A‖_p = sup_{x≠0} ‖Ax‖_p / ‖x‖_p.

The 1-norm and the infinity norm are easily computable since

‖A‖_1 = max_{j=1,...,n} Σ_{i=1}^m |a_ij|,   ‖A‖_∞ = max_{i=1,...,m} Σ_{j=1}^n |a_ij|,

and they are called the column sum norm and the row sum norm, respec-
tively.
Moreover, we have ‖A‖_1 = ‖A^T‖_∞ and, if A is self-adjoint or real sym-
metric, ‖A‖_1 = ‖A‖_∞.
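Both formulas translate directly into code (NumPy; the test matrix is an
arbitrary choice):

    import numpy as np

    A = np.array([[1.0, -2.0], [3.0, 4.0]])

    one_norm = np.abs(A).sum(axis=0).max()   # maximum column sum
    inf_norm = np.abs(A).sum(axis=1).max()   # maximum row sum
    print(one_norm, np.linalg.norm(A, 1))        # 6.0
    print(inf_norm, np.linalg.norm(A, np.inf))   # 7.0
    # ||A||_1 = ||A^T||_inf
    print(np.isclose(one_norm, np.linalg.norm(A.T, np.inf)))   # True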
A special discussion is deserved by the 2-norm or spectral norm for which
the following theorem holds.

Theorem 1.2 Let σ1(A) be the largest singular value of A. Then

‖A‖_2 = √(ρ(A^H A)) = √(ρ(AA^H)) = σ1(A).                          (1.21)

In particular, if A is hermitian (or real and symmetric), then

‖A‖_2 = ρ(A),                                                      (1.22)

while, if A is unitary, ‖A‖_2 = 1.

Proof. Since A^H A is hermitian, there exists a unitary matrix U such that

U^H A^H AU = diag(µ1, . . . , µn),

where the µi are the (nonnegative) eigenvalues of A^H A. Let y = U^H x; then

‖A‖_2 = sup_{x≠0} √( (A^H Ax, x)/(x, x) ) = sup_{y≠0} √( (U^H A^H AUy, y)/(y, y) )

      = sup_{y≠0} √( Σ_{i=1}^n µi |yi|^2 / Σ_{i=1}^n |yi|^2 ) = max_{i=1,...,n} |µi|^{1/2},

from which (1.21) follows, thanks to (1.10).
If A is hermitian, the same considerations as above apply directly to A.
Finally, if A is unitary,

‖Ax‖_2^2 = (Ax, Ax) = (x, A^H Ax) = ‖x‖_2^2,

so that ‖A‖_2 = 1. ∎


As a consequence, the computation of ‖A‖_2 is much more expensive than
that of ‖A‖_∞ or ‖A‖_1. However, if only an estimate of ‖A‖_2 is required,
the following relations can be profitably employed in the case of square
matrices:

max_{i,j} |a_ij| ≤ ‖A‖_2 ≤ n max_{i,j} |a_ij|,

(1/√n) ‖A‖_∞ ≤ ‖A‖_2 ≤ √n ‖A‖_∞,

(1/√n) ‖A‖_1 ≤ ‖A‖_2 ≤ √n ‖A‖_1,

‖A‖_2 ≤ √( ‖A‖_1 ‖A‖_∞ ).
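The spectral norm and these cheap two-sided estimates can be compared
numerically (a sketch; the random test matrix is an arbitrary choice):

    import numpy as np

    rng = np.random.default_rng(2)
    n = 6
    A = rng.standard_normal((n, n))

    norm2 = np.linalg.norm(A, 2)     # largest singular value of A
    norm1 = np.linalg.norm(A, 1)
    norminf = np.linalg.norm(A, np.inf)
    mx = np.abs(A).max()

    assert mx <= norm2 <= n * mx
    assert norminf / np.sqrt(n) <= norm2 <= np.sqrt(n) * norminf
    assert norm1 / np.sqrt(n) <= norm2 <= np.sqrt(n) * norm1
    assert norm2 <= np.sqrt(norm1 * norminf)
    print("all estimates of ||A||_2 hold; ||A||_2 =", norm2)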

For other estimates of a similar type we refer to Exercise 17. Moreover, if A
is normal then ‖A‖_2 ≤ ‖A‖_p for any n and all p ≥ 2.

Theorem 1.3 Let |||·||| be a matrix norm induced by a vector norm ‖·‖.
Then
1. ‖Ax‖ ≤ |||A||| ‖x‖, that is, |||·||| is a norm compatible with ‖·‖;

2. |||I||| = 1;
3. |||AB||| ≤ |||A||| |||B|||, that is, |||·||| is sub-multiplicative.
Proof. Part 1 of the theorem is already contained in the proof of Theorem 1.1,
while part 2 follows from the fact that |||I||| = sup_{x≠0} ‖Ix‖/‖x‖ = 1. Part 3 is simple
to check. ∎

Notice that the p-norms are sub-multiplicative. Moreover, we remark that
the sub-multiplicativity property by itself would only allow us to conclude
that |||I||| ≥ 1. Indeed, |||I||| = |||I · I||| ≤ |||I|||^2.


1.11.1 Relation between Norms and the Spectral Radius of a
Matrix
We next recall some results that relate the spectral radius of a matrix to
matrix norms and that will be widely employed in Chapter 4.

Theorem 1.4 Let ‖·‖ be a consistent matrix norm; then

ρ(A) ≤ ‖A‖  ∀A ∈ C^{n×n}.

Proof. Let λ be an eigenvalue of A and v ≠ 0 an associated eigenvector. As a
consequence, since ‖·‖ is consistent, we have

|λ| ‖v‖ = ‖λv‖ = ‖Av‖ ≤ ‖A‖ ‖v‖,

so that |λ| ≤ ‖A‖. ∎

More precisely, the following property holds (see for the proof [IK66], p.
12, Theorem 3).

Property 1.13 Let A ∈ C^{n×n} and ε > 0. Then, there exists a consistent
matrix norm ‖·‖_{A,ε} (depending on ε) such that

‖A‖_{A,ε} ≤ ρ(A) + ε.

As a result, having fixed an arbitrarily small tolerance, there always exists
a matrix norm which is arbitrarily close to the spectral radius of A, namely

ρ(A) = inf_{‖·‖} ‖A‖,                                              (1.23)

the infimum being taken on the set of all the consistent norms.
For the sake of clarity, we notice that the spectral radius is a sub-
multiplicative seminorm, since it is not true that ρ(A) = 0 iff A = 0.
As an example, any triangular matrix with null diagonal entries clearly has
spectral radius equal to zero. Moreover, we have the following result.

Property 1.14 Let A be a square matrix and let ‖·‖ be a consistent norm.
Then

lim_{m→∞} ‖A^m‖^{1/m} = ρ(A).
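Property 1.14 is easy to observe numerically. A sketch (the test matrix is
arbitrary; the Frobenius norm used here is consistent):

    import numpy as np

    A = np.array([[0.5, 0.4], [0.1, 0.3]])
    rho = max(abs(np.linalg.eigvals(A)))   # spectral radius

    for m in (1, 5, 20, 80):
        # ||A^m||^(1/m) approaches rho(A) as m grows
        val = np.linalg.norm(np.linalg.matrix_power(A, m), 'fro') ** (1.0 / m)
        print(m, val)
    print("rho(A) =", rho)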



1.11.2 Sequences and Series of Matrices
A sequence of matrices A^(k) ∈ R^{n×n} is said to converge to a matrix
A ∈ R^{n×n} if

lim_{k→∞} ‖A^(k) − A‖ = 0.

The choice of the norm does not influence the result since in R^{n×n} all norms
are equivalent. In particular, when studying the convergence of iterative
methods for solving linear systems (see Chapter 4), one is interested in the
so-called convergent matrices, for which

lim_{k→∞} A^k = 0,

0 being the null matrix. The following theorem holds.

Theorem 1.5 Let A be a square matrix; then

lim_{k→∞} A^k = 0 ⇔ ρ(A) < 1.                                      (1.24)

Moreover, the geometric series Σ_{k=0}^∞ A^k is convergent iff ρ(A) < 1. In such a
case,

Σ_{k=0}^∞ A^k = (I − A)^{-1}.                                      (1.25)

As a result, if ρ(A) < 1 the matrix I − A is invertible and the following
inequalities hold:

1/(1 + ‖A‖) ≤ ‖(I − A)^{-1}‖ ≤ 1/(1 − ‖A‖),                        (1.26)

where ‖·‖ is an induced matrix norm such that ‖A‖ < 1.
Proof. Let us prove (1.24). Let ρ(A) < 1; then ∃ε > 0 such that ρ(A) < 1 − ε,
and thus, thanks to Property 1.13, there exists a consistent matrix norm ‖·‖ such
that ‖A‖ ≤ ρ(A) + ε < 1. From the fact that ‖A^k‖ ≤ ‖A‖^k < 1 and from the
definition of convergence it turns out that as k → ∞ the sequence A^k tends
to zero. Conversely, assume that lim_{k→∞} A^k = 0 and let λ denote an eigenvalue of
A. Then, A^k x = λ^k x, x (≠ 0) being an eigenvector associated with λ, so that
lim_{k→∞} λ^k = 0. As a consequence, |λ| < 1, and because this is true for a generic
eigenvalue one gets ρ(A) < 1, as desired. Relation (1.25) can be obtained noting
first that the eigenvalues of I − A are given by 1 − λ(A), λ(A) being the generic
eigenvalue of A. On the other hand, since ρ(A) < 1, we deduce that I − A is
nonsingular. Then, from the identity

(I − A)(I + A + . . . + A^n) = I − A^{n+1}

and taking the limit for n tending to infinity the thesis follows, since

(I − A) Σ_{k=0}^∞ A^k = I.

Finally, thanks to Theorem 1.3, the equality ‖I‖ = 1 holds, so that

1 = ‖I‖ ≤ ‖I − A‖ ‖(I − A)^{-1}‖ ≤ (1 + ‖A‖) ‖(I − A)^{-1}‖,

giving the first inequality in (1.26). As for the second part, noting that I =
I − A + A and multiplying both sides on the right by (I − A)^{-1}, one gets
(I − A)^{-1} = I + A(I − A)^{-1}. Passing to the norms, we obtain

‖(I − A)^{-1}‖ ≤ 1 + ‖A‖ ‖(I − A)^{-1}‖,

and thus the second inequality, since ‖A‖ < 1. ∎

Remark 1.1 The assumption that there exists an induced matrix norm
such that ‖A‖ < 1 is justified by Property 1.13, recalling that A is conver-
gent and, therefore, ρ(A) < 1.

Notice that (1.25) suggests an algorithm to approximate the inverse of a
matrix by a truncated series expansion.
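A minimal sketch of that idea: if ρ(A) < 1, the partial sums of the series
(1.25) approximate (I − A)^{-1}. (The matrix and the truncation levels below
are illustrative choices.)

    import numpy as np

    def neumann_inverse(A, n_terms):
        # approximates (I - A)^{-1} by I + A + A^2 + ... + A^(n_terms - 1),
        # valid when rho(A) < 1
        S = np.eye(A.shape[0])
        P = np.eye(A.shape[0])
        for _ in range(n_terms - 1):
            P = P @ A
            S = S + P
        return S

    A = np.array([[0.1, 0.2], [0.3, 0.1]])    # rho(A) < 1
    exact = np.linalg.inv(np.eye(2) - A)
    for k in (2, 5, 10, 20):
        print(k, np.linalg.norm(neumann_inverse(A, k) - exact))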



1.12 Positive Definite, Diagonally Dominant and
M-matrices

Definition 1.22 A matrix A ∈ C^{n×n} is positive definite in C^n if the num-
ber (Ax, x) is real and positive ∀x ∈ C^n, x ≠ 0. A matrix A ∈ R^{n×n} is
positive definite in R^n if (Ax, x) > 0 ∀x ∈ R^n, x ≠ 0. If the strict in-
equality is replaced by the weak one (≥), the matrix is called positive
semidefinite.

Example 1.8 Matrices that are positive definite in R^n are not necessarily sym-
metric. An instance is provided by matrices of the form

A = (    2      α
      −2 − α    2 )                                                (1.27)

for α ≠ −1. Indeed, for any non-null vector x = (x1, x2)^T in R^2,

(Ax, x) = 2(x1^2 + x2^2 − x1 x2) > 0.

Notice that A is not positive definite in C^2. Indeed, if we take a complex vector
x we find out that the number (Ax, x) is not real-valued in general.

Definition 1.23 Let A ∈ R^{n×n}. The matrices

AS = (1/2)(A + A^T),   ASS = (1/2)(A − A^T)

are respectively called the symmetric part and the skew-symmetric part
of A. Obviously, A = AS + ASS. If A ∈ C^{n×n}, the definitions modify as
follows: AS = (1/2)(A + A^H) and ASS = (1/2)(A − A^H).

The following property holds.

Property 1.15 A real matrix A of order n is positive definite iff its sym-
metric part AS is positive definite.

Indeed, it suffices to notice that, due to (1.12) and the definition of ASS,
x^T ASS x = 0 ∀x ∈ R^n. For instance, the matrix in (1.27) has a positive
definite symmetric part, since

AS = (1/2)(A + A^T) = (  2  −1
                        −1   2 ).

This holds more generally (for the proof see [Axe94]).

Property 1.16 Let A ∈ C^{n×n} (respectively, A ∈ R^{n×n}); if (Ax, x) is real-
valued ∀x ∈ C^n, then A is hermitian (respectively, symmetric).

An immediate consequence of the above results is that matrices that are
positive definite in C^n satisfy the following characterizing property.

Property 1.17 A square matrix A of order n is positive definite in C^n
iff it is hermitian and has positive eigenvalues. Thus, a positive definite
matrix is nonsingular.
In the case of positive definite real matrices in R^n, results more specific
than those presented so far hold only if the matrix is also symmetric (this is
the reason why many textbooks deal only with symmetric positive definite
matrices). In particular:

Property 1.18 Let A ∈ R^{n×n} be symmetric. Then, A is positive definite
iff one of the following properties is satisfied:

1. (Ax, x) > 0 ∀x ≠ 0 with x ∈ R^n;

2. the eigenvalues of the principal submatrices of A are all positive;

3. the dominant principal minors of A are all positive (Sylvester crite-
rion);

4. there exists a nonsingular matrix H such that A = H^T H.

All the diagonal entries of a positive definite matrix are positive. Indeed,
if e_i is the i-th vector of the canonical basis of R^n, then e_i^T A e_i = a_ii > 0.
Moreover, it can be shown that if A is symmetric positive definite, the
entry with the largest modulus must be a diagonal entry (these last two
properties are therefore necessary conditions for a matrix to be positive
definite).
We finally notice that if A is symmetric positive definite and A^{1/2} is
the only positive definite matrix that is a solution of the matrix equation
X^2 = A, the quantity

‖x‖_A = ‖A^{1/2}x‖_2 = (Ax, x)^{1/2}                               (1.28)

defines a vector norm, called the energy norm of the vector x. Related to
the energy norm is the energy scalar product given by (x, y)_A = (Ax, y).
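A small sketch of the energy norm (1.28), with an arbitrary symmetric
positive definite matrix; note that the form (Ax, x)^{1/2} avoids building
A^{1/2} explicitly:

    import numpy as np

    A = np.array([[4.0, 1.0], [1.0, 3.0]])   # symmetric positive definite
    x = np.array([1.0, -2.0])

    energy = np.sqrt(x @ A @ x)              # (Ax, x)^{1/2}
    # equivalent form through the matrix square root A^{1/2}
    w, V = np.linalg.eigh(A)
    A_half = V @ np.diag(np.sqrt(w)) @ V.T
    print(energy, np.linalg.norm(A_half @ x, 2))   # same value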

Definition 1.24 A matrix A ∈ R^{n×n} is called diagonally dominant by rows
if

|a_ii| ≥ Σ_{j=1, j≠i}^n |a_ij|,  with i = 1, . . . , n,

while it is called diagonally dominant by columns if

|a_ii| ≥ Σ_{j=1, j≠i}^n |a_ji|,  with i = 1, . . . , n.

If the inequalities above hold in a strict sense, A is called strictly diagonally
dominant (by rows or by columns, respectively).

A strictly diagonally dominant matrix that is symmetric with positive di-
agonal entries is also positive definite.

Definition 1.25 A nonsingular matrix A ∈ R^{n×n} is an M-matrix if a_ij ≤ 0
for i ≠ j and if all the entries of its inverse are nonnegative.

M-matrices enjoy the so-called discrete maximum principle, that is, if A is
an M-matrix and Ax ≤ 0, then x ≤ 0 (where the inequalities are meant
componentwise). In this connection, the following result can be useful.

Property 1.19 (M-criterion) Let a matrix A satisfy a_ij ≤ 0 for i ≠ j.
Then A is an M-matrix if and only if there exists a vector w > 0 such that
Aw > 0.

Finally, M-matrices are related to strictly diagonally dominant matrices
by the following property.

Property 1.20 A matrix A ∈ R^{n×n} that is strictly diagonally dominant
by rows and whose entries satisfy the relations a_ij ≤ 0 for i ≠ j and a_ii > 0
is an M-matrix.

For further results about M-matrices, see for instance [Axe94] and [Var62].


1.13 Exercises
1. Let W1 and W2 be two subspaces of R^n. Prove that if V = W1 ⊕ W2, then
dim(V) = dim(W1) + dim(W2), while in general

dim(W1 + W2) = dim(W1) + dim(W2) − dim(W1 ∩ W2).

[Hint: Consider a basis for W1 ∩ W2 and first extend it to W1, then to
W2, verifying that the basis formed by the set of the obtained vectors is a
basis for the sum space.]
2. Check that the following set of vectors

v_i = (x_1^{i−1}, x_2^{i−1}, . . . , x_n^{i−1}),  i = 1, 2, . . . , n,

forms a basis for R^n, x_1, . . . , x_n being a set of n distinct points of R.
3. Exhibit an example showing that the product of two symmetric matrices
may be nonsymmetric.
4. Let B be a skew-symmetric matrix, namely, B^T = −B. Let A = (I + B)(I −
B)^{-1} and show that A^{-1} = A^T.
5. A matrix A ∈ C^{n×n} is called skew-hermitian if A^H = −A. Show that the
diagonal entries of A must be purely imaginary numbers.
6. Let A, B and A + B be invertible matrices of order n. Show that A^{-1} +
B^{-1} is also nonsingular and that

(A^{-1} + B^{-1})^{-1} = A(A + B)^{-1}B = B(A + B)^{-1}A.

[Solution: (A^{-1} + B^{-1})^{-1} = A(I + B^{-1}A)^{-1} = A(B + A)^{-1}B. The sec-
ond equality is proved similarly by factoring out B and A, respectively, from
left and right.]
7. Given the nonsymmetric real matrix

A = (  0   1   1
       1   0  −1
      −1  −1   0 ),

check that it is similar to the diagonal matrix D = diag(1, 0, −1) and find
its eigenvectors. Is this matrix normal?
[Solution: the matrix is not normal.]

8. Let A be a square matrix of order n. Check that if P(A) = Σ_{k=0}^n c_k A^k and
λ(A) are the eigenvalues of A, then the eigenvalues of P(A) are given by
λ(P(A)) = P(λ(A)). In particular, prove that ρ(A^2) = [ρ(A)]^2.
9. Prove that a matrix of order n having n distinct eigenvalues cannot be
defective. Moreover, prove that a normal matrix cannot be defective.
10. Commutativity of matrix product. Show that if A and B are square matri-
ces that share the same set of eigenvectors, then AB = BA. Prove, by a
counterexample, that the converse is false.
11. Let A be a normal matrix whose eigenvalues are λ1, . . . , λn. Show that the
singular values of A are |λ1|, . . . , |λn|.
12. Let A ∈ C^{m×n} with rank(A) = n. Show that A† = (A^T A)^{-1} A^T enjoys the
following properties:

(1) A† A = I_n;  (2) A† AA† = A†, AA† A = A;  (3) if m = n, A† = A^{-1}.

13. Show that the Moore-Penrose pseudo-inverse matrix A† is the only matrix
that minimizes the functional

min_{X∈C^{n×m}} ‖AX − I_m‖_F,

where ‖·‖_F is the Frobenius norm.

14. Prove Property 1.10.
[Solution: For any x, x̂ ∈ V show that |‖x‖ − ‖x̂‖| ≤ ‖x − x̂‖. Assuming
that dim(V) = n and expanding the vector w = x − x̂ on a basis of V,
show that ‖w‖ ≤ C ‖w‖_∞, from which the thesis follows by imposing in
the first obtained inequality that ‖w‖_∞ ≤ ε.]
15. Prove Property 1.11 in the case A ∈ R^{n×m} with m linearly independent
columns.
[Hint: First show that ‖·‖_A fulfills all the properties characterizing a
norm: positivity (A has linearly independent columns, thus if x ≠ 0, then
Ax ≠ 0, which proves the thesis), homogeneity and the triangle inequality.]
16. Show that for a rectangular matrix A ∈ R^{m×n}

‖A‖_F^2 = σ_1^2 + . . . + σ_p^2,

where p is the minimum between m and n, σ_i are the singular values of A
and ‖·‖_F is the Frobenius norm.
17. Assuming p, q = 1, 2, ∞, F, recover the following table of equivalence con-
stants c_pq such that ∀A ∈ R^{n×n}, ‖A‖_p ≤ c_pq ‖A‖_q.

c_pq    q=1    q=2    q=∞    q=F
p=1     1      √n     n      √n
p=2     √n     1      √n     1
p=∞     n      √n     1      √n
p=F     √n     √n     √n     1

18. A matrix norm for which ‖A‖ = ‖ |A| ‖ is called an absolute norm, having
denoted by |A| the matrix of the absolute values of the entries of A. Prove
that ‖·‖_1, ‖·‖_∞ and ‖·‖_F are absolute norms, while ‖·‖_2 is not. Show
that for this latter

(1/√n) ‖A‖_2 ≤ ‖ |A| ‖_2 ≤ √n ‖A‖_2.
2
Principles of Numerical Mathematics




The basic concepts of consistency, stability and convergence of a numerical
method will be introduced in a very general context in the first part of
the chapter: they provide the common framework for the analysis of any
method considered henceforth. The second part of the chapter deals with
the finite computer representation of real numbers and the analysis of error
propagation in machine operations.


2.1 Well-posedness and Condition Number of a
Problem
Consider the following problem: find x such that

F(x, d) = 0,                                                       (2.1)

where d is the set of data on which the solution depends and F is the
functional relation between x and d. According to the kind of problem
that is represented in (2.1), the variables x and d may be real numbers,
vectors or functions. Typically, (2.1) is called a direct problem if F and d
are given and x is the unknown, an inverse problem if F and x are known
and d is the unknown, and an identification problem when x and d are given
while the functional relation F is the unknown (these latter problems will
not be covered in this volume).
Problem (2.1) is well posed if it admits a unique solution x which depends
continuously on the data. We shall use the terms well posed and stable
interchangeably, and we shall deal henceforth only with well-posed
problems.
problems.
A problem which does not enjoy the property above is called ill posed or
unstable, and before undertaking its numerical solution it has to be regular-
ized, that is, suitably transformed into a well-posed problem (see, for
instance, [Mor84]). Indeed, it is not appropriate to expect the numerical
method to cure the pathologies of an intrinsically ill-posed problem.

Example 2.1 A simple instance of an ill-posed problem is finding the number
of real roots of a polynomial. For example, the polynomial p(x) = x^4 − x^2(2a −
1) + a(a − 1) exhibits a discontinuous variation of the number of real roots as a
varies continuously in the real field. We have, indeed, 4 real roots if a ≥ 1, 2 if
a ∈ [0, 1), while no real roots exist if a < 0. •
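A short numerical illustration of this discontinuity (counting the real roots
of p with NumPy; the sample values of a and the imaginary-part tolerance
are our own choices):

    import numpy as np

    def n_real_roots(a, tol=1e-8):
        # p(x) = x^4 - x^2 (2a - 1) + a(a - 1)
        coeffs = [1.0, 0.0, -(2 * a - 1), 0.0, a * (a - 1)]
        roots = np.roots(coeffs)
        return int(np.sum(np.abs(roots.imag) < tol))

    for a in (-0.5, 0.5, 1.5):
        print(a, n_real_roots(a))   # 0, 2 and 4 real roots, respectively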

Continuous dependence on the data means that small perturbations on
the data d yield "small" changes in the solution x. Precisely, denoting by δd
an admissible perturbation on the data and by δx the consequent change
in the solution, in such a way that

F(x + δx, d + δd) = 0,                                             (2.2)

then

∀η > 0, ∃K(η, d) : ‖δd‖ < η ⇒ ‖δx‖ ≤ K(η, d) ‖δd‖.                 (2.3)

The norms used for the data and for the solution may not coincide, when-
ever d and x represent variables of different kinds.
With the aim of making this analysis more quantitative, we introduce the
following definition.

Definition 2.1 For problem (2.1) we define the relative condition number
to be

K(d) = sup_{δd∈D} ( ‖δx‖/‖x‖ ) / ( ‖δd‖/‖d‖ ),                     (2.4)

where D is a neighborhood of the origin and denotes the set of admissible
perturbations on the data for which the perturbed problem (2.2) still makes
sense. Whenever d = 0 or x = 0, it is necessary to introduce the absolute
condition number, given by

K_abs(d) = sup_{δd∈D} ‖δx‖/‖δd‖.                                   (2.5)

Problem (2.1) is called ill-conditioned if K(d) is "big" for any admissible
datum d (the precise meaning of "small" and "big" changes depending on
the problem considered).

The property of a problem of being well-conditioned is independent of
the numerical method that is being used to solve it. In fact, it is possible
to generate stable as well as unstable numerical schemes for solving well-
conditioned problems. The concept of stability for an algorithm or for a
numerical method is analogous to that used for problem (2.1) and will be
made precise in the next section.

Remark 2.1 (Ill-posed problems) Even in the case in which the condi-
tion number does not exist (formally, it is infinite), it is not necessarily true
that the problem is ill-posed. In fact there exist well-posed problems (for
instance, the search of multiple roots of algebraic equations, see Example
2.2) for which the condition number is infinite, but which can be
reformulated as equivalent problems (that is, having the same solutions)
with a finite condition number.

If problem (2.1) admits a unique solution, then there necessarily exists a
mapping G, that we call resolvent, between the sets of the data and of the
solutions, such that

x = G(d),  that is,  F(G(d), d) = 0.                               (2.6)

According to this definition, (2.2) yields x + δx = G(d + δd). Assuming
that G is differentiable in d and denoting formally by G′(d) its derivative
with respect to d (if G : R^n → R^m, G′(d) will be the Jacobian matrix of
G evaluated at the vector d), a Taylor expansion of G truncated at first
order ensures that

G(d + δd) − G(d) = G′(d)δd + o(‖δd‖)  for δd → 0,

where ‖·‖ is a suitable norm for δd and o(·) is the classical infinitesimal
symbol denoting an infinitesimal term of higher order with respect to its
argument. Neglecting the infinitesimal of higher order with respect to ‖δd‖,
from (2.4) and (2.5) we respectively deduce that

K(d) ≈ ‖G′(d)‖ ‖d‖/‖G(d)‖,   K_abs(d) ≈ ‖G′(d)‖,                   (2.7)

the symbol ‖·‖ denoting the matrix norm associated with the vector norm
(defined in (1.19)). The estimates in (2.7) are of great practical usefulness
in the analysis of problems in the form (2.6), as shown in the forthcoming
examples.
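To make (2.7) concrete, here is a sketch that estimates the relative condition
number of a scalar problem x = G(d) through a finite-difference approximation
of G′ (the functions and evaluation points are illustrative choices):

    import numpy as np

    def rel_condition_number(G, d, h=1e-7):
        # K(d) ~ |G'(d)| |d| / |G(d)|, with G' replaced by a centered
        # finite difference
        dG = (G(d + h) - G(d - h)) / (2.0 * h)
        return abs(dG) * abs(d) / abs(G(d))

    # G(d) = sqrt(d) is well conditioned: K(d) = 1/2 for all d > 0
    print(rel_condition_number(np.sqrt, 2.0))                      # ~0.5
    # G(d) = 1/(1 - d) becomes ill conditioned as d approaches 1
    print(rel_condition_number(lambda d: 1.0 / (1.0 - d), 0.999))  # ~999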

Example 2.2 (Algebraic equations of second degree) The solutions to the
algebraic equation x^2 − 2px + 1 = 0, with p ≥ 1, are x± = p ± √(p^2 − 1). In this
case, F(x, p) = x^2 − 2px + 1, the datum d is the coefficient p, while x is the vector
of components {x+, x−}. As for the condition number, we notice that (2.6) holds
by taking G : R → R^2, G(p) = {x+, x−}. Letting G±(p) = x±, it follows that
G′±(p) = 1 ± p/√(p^2 − 1). Using (2.7) with ‖·‖ = ‖·‖_2 we get

K(p) ≈ |p|/√(p^2 − 1),  p > 1.                                     (2.8)

From (2.8) it turns out that in the case of separated roots (say, if p ≥ 2) the
problem F(x, p) = 0 is well conditioned. The behavior dramatically changes in
the case of multiple roots, that is, when p = 1. First of all, one notices that the
function G±(p) = p ± √(p^2 − 1) is no longer differentiable for p = 1, which makes
(2.8) meaningless. On the other hand, (2.8) shows that, for p close to
1, the problem at hand is ill conditioned. However, the problem is not ill posed.
Indeed, following Remark 2.1, it is possible to reformulate it in an equivalent
manner as F(x, t) = x^2 − ((1 + t^2)/t)x + 1 = 0, with t = p + √(p^2 − 1), whose
roots x+ = t and x− = 1/t coincide for t = 1. The change of parameter thus
removes the singularity that is present in the former representation of the roots
as functions of p. The two roots x− = x−(t) and x+ = x+(t) are now indeed
regular functions of t in a neighborhood of t = 1, and evaluating the condition
number by (2.7) yields K(t) ≈ 1 for any value of t. The transformed problem is
thus well conditioned.

Example 2.3 (Systems of linear equations) Consider the linear system Ax
= b, where x and b are two vectors in R^n, while A is the n × n matrix of the
real coefficients of the system. Suppose that A is nonsingular; in such a case the
unknown is the solution x, while the data d are the right-hand side b and the
matrix A, that is, d = {b_i, a_ij, 1 ≤ i, j ≤ n}.
Suppose now that we perturb only the right-hand side b. We have d = b,
x = G(b) = A^{-1}b, so that G′(b) = A^{-1}, and (2.7) yields

K(d) ≈ ‖A^{-1}‖ ‖b‖/‖A^{-1}b‖ = ‖A^{-1}‖ ‖Ax‖/‖x‖ ≤ ‖A‖ ‖A^{-1}‖ = K(A),   (2.9)

where K(A) is the condition number of matrix A (see Section 3.1.1) and the use
of a consistent matrix norm is understood. Therefore, if A is well conditioned,
solving the linear system Ax = b is a stable problem with respect to perturbations
of the right-hand side b. Stability with respect to perturbations of the entries of
A will be analyzed in Section 3.10.

Example 2.4 (Nonlinear equations) Let f : R → R be a function of class
C^1 and consider the nonlinear equation

F(x, d) = f(x) = φ(x) − d = 0,

where φ : R → R is a suitable function and d ∈ R a datum (possibly equal
to zero). The problem is well defined only if φ is invertible in a neighborhood
of d: in such a case, indeed, x = φ^{-1}(d) and the resolvent is G = φ^{-1}. Since
(φ^{-1})′(d) = [φ′(x)]^{-1}, the first relation in (2.7) yields, for d ≠ 0,

K(d) ≈ (|d|/|x|) |[φ′(x)]^{-1}|,                                   (2.10)

while if d = 0 or x = 0 we have

K_abs(d) ≈ |[φ′(x)]^{-1}|.                                         (2.11)

The problem is thus ill posed if x is a multiple root of φ(x) − d; it is ill conditioned
when φ′(x) is "small", well conditioned when φ′(x) is "large". We shall further
address this subject in Section 6.1.

In view of (2.7), the quantity ‖G′(d)‖ is an approximation of K_abs(d) and
is sometimes called the first order absolute condition number. This latter rep-
resents the limit of the Lipschitz constant of G (see Section 11.1) as the
perturbation on the data tends to zero.
Such a number does not always provide a sound estimate of the condition
number K_abs(d). This happens, for instance, when G′ vanishes at a point
whilst G is non-null in a neighborhood of the same point. For example,
take x = G(d) = cos(d) − 1 for d ∈ (−π/2, π/2); we have G′(0) = 0, while
K_abs(0) = 2/π.


2.2 Stability of Numerical Methods
We shall henceforth suppose problem (2.1) to be well posed. A numer-
ical method for the approximate solution of (2.1) will consist, in general,
of a sequence of approximate problems

Fn(xn, dn) = 0,  n ≥ 1,                                            (2.12)

depending on a certain parameter n (to be defined case by case). The
understood expectation is that xn → x as n → ∞, i.e. that the numerical
solution converges to the exact solution. For that, it is necessary that dn →
d and that Fn "approximates" F as n → ∞. Precisely, if the datum d of
problem (2.1) is admissible for Fn, we say that (2.12) is consistent if

Fn(x, d) = Fn(x, d) − F(x, d) → 0  for n → ∞,                      (2.13)

where x is the solution to problem (2.1) corresponding to the datum d.
The meaning of this definition will be made precise in the next chapters
for any single class of considered problems.
A method is said to be strongly consistent if Fn(x, d) = 0 for any value
of n and not only for n → ∞.
In some cases (e.g., when iterative methods are used) problem (2.12)
could take the following form:

Fn(xn, xn−1, . . . , xn−q, dn) = 0,  n ≥ q,                        (2.14)

where x0, x1, . . . , xq−1 are given. In such a case, the property of strong
consistency becomes Fn(x, x, . . . , x, d) = 0 for all n ≥ q.

Example 2.5 Let us consider the following iterative method (known as New-
ton's method and discussed in Section 6.2.2) for approximating a simple root α
of a function f : R → R:

given x0,  xn = xn−1 − f(xn−1)/f′(xn−1),  n ≥ 1.                   (2.15)

The method (2.15) can be written in the form (2.14) by setting Fn(xn, xn−1, f) =
xn − xn−1 + f(xn−1)/f′(xn−1), and it is strongly consistent since Fn(α, α, f) = 0
for all n ≥ 1.
Consider now the following numerical method (known as the composite mid-
point rule, discussed in Section 9.2) for approximating x = ∫_a^b f(t) dt:

xn = H Σ_{k=1}^n f((tk + tk+1)/2),  n ≥ 1,

where H = (b − a)/n and tk = a + (k − 1)H, k = 1, . . . , n + 1. This method
is consistent; it is also strongly consistent provided that f is a piecewise linear
polynomial.
More generally, all numerical methods obtained from the mathematical prob-
lem by truncation of limit operations (such as integrals, derivatives, series, . . . )
are not strongly consistent.
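A short sketch of the composite midpoint rule above (the integrand and the
interval are arbitrary test choices):

    import numpy as np

    def midpoint_rule(f, a, b, n):
        # x_n = H * sum_k f((t_k + t_{k+1}) / 2), with H = (b - a)/n
        H = (b - a) / n
        midpoints = a + H * (np.arange(n) + 0.5)
        return H * np.sum(f(midpoints))

    # integral of cos over [0, pi/2]; the exact value is 1
    for n in (4, 16, 64):
        print(n, midpoint_rule(np.cos, 0.0, np.pi / 2, n))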


Recalling what has been previously stated about problem (2.1), in order
for the numerical method to be well posed (or stable) we require that, for any
fixed n, there exists a unique solution xn corresponding to the datum dn,
that the computation of xn as a function of dn is unique and, furthermore,
that xn depends continuously on the data, i.e.

∀η > 0, ∃Kn(η, dn) : ‖δdn‖ < η ⇒ ‖δxn‖ ≤ Kn(η, dn) ‖δdn‖.          (2.16)

As done in (2.4), we introduce for each problem in the sequence (2.12) the
quantities

Kn(dn) = sup_{δdn∈Dn} ( ‖δxn‖/‖xn‖ ) / ( ‖δdn‖/‖dn‖ ),
K_abs,n(dn) = sup_{δdn∈Dn} ‖δxn‖/‖δdn‖,                            (2.17)

and then define

K^num(dn) = lim_{k→∞} sup_{n≥k} Kn(dn),
K^num_abs(dn) = lim_{k→∞} sup_{n≥k} K_abs,n(dn).

We call K^num(dn) the relative asymptotic condition number of the numer-
ical method (2.12) and K^num_abs(dn) the absolute asymptotic condition
number, corresponding to the datum dn.
The numerical method is said to be well conditioned if K^num is "small"
for any admissible datum dn, and ill conditioned otherwise. As in (2.6), let us
consider the case where, for each n, the functional relation (2.12) defines a
mapping Gn between the sets of the numerical data and the solutions,

xn = Gn(dn),  that is,  Fn(Gn(dn), dn) = 0.                        (2.18)

Assuming that Gn is differentiable, we can obtain from (2.17)

Kn(dn) ≈ ‖G′n(dn)‖ ‖dn‖/‖Gn(dn)‖,   K_abs,n(dn) ≈ ‖G′n(dn)‖.       (2.19)

Example 2.6 (Sum and subtraction) The function f : R^2 → R, f(a, b) =
a + b, is a linear mapping whose gradient is the vector ∇f(a, b) = (1, 1)^T. Using
the vector norm ‖·‖_1 defined in (1.13) yields K(a, b) ≈ (|a| + |b|)/|a + b|, from
which it follows that summing two numbers of the same sign is a well conditioned
operation, being K(a, b) ≈ 1. On the other hand, subtracting two nearly equal
numbers is ill conditioned, since |a + b| ≪ |a| + |b|. This fact, already pointed out in
Example 2.2, leads to the cancellation of significant digits whenever numbers can
be represented using only a finite number of digits (as in floating-point arithmetic,
see Section 2.5).

Example 2.7 Consider again the problem of computing the roots of a polyno-
mial of second degree analyzed in Example 2.2. When p > 1 (separated roots),
such a problem is well conditioned. However, we generate an unstable algorithm
if we evaluate the root x− by the formula x− = p − √(p^2 − 1). This formula is
indeed subject to errors due to numerical cancellation of significant digits (see
Section 2.4) that are introduced by the finite arithmetic of the computer. A pos-
sible remedy to this trouble consists of computing x+ = p + √(p^2 − 1) first,
then x− = 1/x+. Alternatively, one can solve F(x, p) = x^2 − 2px + 1 = 0 using
Newton's method (proposed in Example 2.5):

xn = xn−1 − (x_{n−1}^2 − 2pxn−1 + 1)/(2xn−1 − 2p) = fn(p),  n ≥ 1, x0 given.

Applying (2.19) for p > 1 yields Kn(p) ≈ |p|/|xn − p|. To compute K^num(p)
we notice that, in the case when the algorithm converges, the solution xn would
converge to one of the roots x+ or x−; therefore, |xn − p| → √(p^2 − 1) and thus
Kn(p) → K^num(p) ≈ |p|/√(p^2 − 1), in perfect agreement with the value (2.8) of
the condition number of the exact problem.
We can conclude that Newton's method for the search of simple roots of a
second order algebraic equation is ill conditioned if |p| is very close to 1, while it
is well conditioned in the other cases.
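The instability of the naive formula and the remedy x− = 1/x+ are easy to
reproduce (p is an arbitrary value chosen large enough to trigger
cancellation):

    import math

    p = 1.0e7
    x_plus = p + math.sqrt(p * p - 1.0)

    x_minus_naive = p - math.sqrt(p * p - 1.0)   # digits lost to cancellation
    x_minus_stable = 1.0 / x_plus                # stable alternative

    # the product of the two roots is exactly 1
    print(x_minus_naive * x_plus)    # visibly off from 1
    print(x_minus_stable * x_plus)   # 1.0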

The final goal of numerical approximation is, of course, to build, through
numerical problems of the type (2.12), solutions xn that "get closer" to the
solution of problem (2.1) as n gets larger. This concept is made precise in
the next definition.

Definition 2.2 The numerical method (2.12) is convergent iff

∀ε > 0 ∃n0(ε), ∃δ(n0, ε) > 0 :
∀n > n0(ε), ∀ ‖δdn‖ < δ(n0, ε) ⇒ ‖x(d) − xn(d + δdn)‖ ≤ ε,         (2.20)

where d is an admissible datum for problem (2.1), x(d) is the corre-
sponding solution and xn(d + δdn) is the solution of the numerical problem
(2.12) with datum d + δdn.

To verify the implication (2.20) it suffices to check that, under the same
assumptions,

‖x(d + δdn) − xn(d + δdn)‖ ≤ ε/2.                                  (2.21)

Indeed, thanks to (2.3) we have

‖x(d) − xn(d + δdn)‖ ≤ ‖x(d) − x(d + δdn)‖
+ ‖x(d + δdn) − xn(d + δdn)‖ ≤ K(δ(n0, ε), d) ‖δdn‖ + ε/2.

Choosing δdn such that K(δ(n0, ε), d) ‖δdn‖ < ε/2, one obtains (2.20).

Measures of the convergence of xn to x are given by the absolute error
and the relative error, respectively defined as

E(xn) = |x − xn|,   E_rel(xn) = |x − xn|/|x|  (if x ≠ 0).          (2.22)

In the cases where x and xn are matrix or vector quantities, in addition
to the definitions in (2.22) (where the absolute values are replaced by
suitable norms) it is sometimes useful to introduce the error by component,
defined as

E^c_rel(xn) = max_{i,j} |(x − xn)_ij| / |x_ij|.                    (2.23)


2.2.1 Relations between Stability and Convergence
The concepts of stability and convergence are strongly connected.
First of all, if problem (2.1) is well posed, a necessary condition in order
for the numerical problem (2.12) to be convergent is that it is stable.
Let us thus assume that the method is convergent, and prove that it is
stable by finding a bound for ‖δxn‖. We have

‖δxn‖ = ‖xn(d + δdn) − xn(d)‖ ≤ ‖xn(d) − x(d)‖
+ ‖x(d) − x(d + δdn)‖ + ‖x(d + δdn) − xn(d + δdn)‖                 (2.24)
≤ K(δ(n0, ε), d) ‖δdn‖ + ε,

having used (2.3) and (2.21) twice. From (2.24) we can conclude that, for n
sufficiently large, ‖δxn‖/‖δdn‖ can be bounded by a constant of the order

of K(δ(n0, ε), d), so that the method is stable. Thus, we are interested in
stable numerical methods, since only these can be convergent.
The stability of a numerical method becomes a sufficient condition for
the numerical problem (2.12) to converge if this latter is also consistent
with problem (2.1). Indeed, under these assumptions we have

‖x(d + δdn) − xn(d + δdn)‖ ≤ ‖x(d + δdn) − x(d)‖
+ ‖x(d) − xn(d)‖ + ‖xn(d) − xn(d + δdn)‖.

Thanks to (2.3), the first term at the right-hand side can be bounded by ‖δdn‖
(up to a multiplicative constant independent of δdn). A similar bound holds
for the third term, due to the stability property (2.16). Finally, concerning
the remaining term, if Fn is differentiable with respect to the variable x,
an expansion in a Taylor series gives

Fn(x(d), d) − Fn(xn(d), d) = (∂Fn/∂x)|_(x̄,d) (x(d) − xn(d)),

for a suitable x̄ "between" x(d) and xn(d). Assuming also that ∂Fn/∂x is
invertible, we get

x(d) − xn(d) = [(∂Fn/∂x)|_(x̄,d)]^{-1} [Fn(x(d), d) − Fn(xn(d), d)].   (2.25)

On the other hand, replacing Fn(xn(d), d) with F(x(d), d) (since both terms
are equal to zero) and passing to the norms, we find

‖x(d) − xn(d)‖ ≤ ‖[(∂Fn/∂x)|_(x̄,d)]^{-1}‖ ‖Fn(x(d), d) − F(x(d), d)‖.

Thanks to (2.13) we can thus conclude that ‖x(d) − xn(d)‖ → 0 for n → ∞.
The result that has just been proved, although stated in qualitative terms,
is a milestone in numerical analysis, known as the equivalence theorem (or
Lax-Richtmyer theorem): "for a consistent numerical method, stability is
equivalent to convergence". A rigorous proof of this theorem is available in
[Dah56] for the case of linear Cauchy problems, or in [Lax65] and in [RM67]
for linear well-posed initial value problems.


2.3 A priori and a posteriori Analysis
The stability analysis of a numerical method can be carried out following
different strategies:

1. forward analysis, which provides a bound on the variations ‖δxn‖ of
the solution due both to perturbations in the data and to errors that
are intrinsic to the numerical method;

2. backward analysis, which aims at estimating the perturbations that
should be "impressed" on the data of a given problem in order to
obtain the results actually computed under the assumption of working
in exact arithmetic. Equivalently, given a certain computed solution
xn, backward analysis looks for the perturbations δdn on the data
such that Fn(xn, dn + δdn) = 0. Notice that, when performing such
an estimate, no account at all is taken of the way xn has been
obtained (that is, of which method has been employed to generate it).

Forward and backward analyses are two different instances of the so-
called a priori analysis. This latter can be applied to investigate not only
the stability of a numerical method, but also its convergence. In this case
it is referred to as a priori error analysis, which can again be performed
using either a forward or a backward technique.
A priori error analysis is distinct from the so-called a posteriori error
analysis, which aims at producing an estimate of the error on the grounds
of quantities that are actually computed by a specific numerical method.
Typically, denoting by xn the computed numerical solution, an approximation
to the solution x of problem (2.1), the a posteriori error analysis aims at
evaluating the error x − xn as a function of the residual rn = F(xn, d) by
means of constants that are called stability factors (see [EEHJ96]).

Example 2.8 For the sake of illustration, consider the problem of finding the
zeros α1, . . . , αn of a polynomial pn(x) = Σ_{k=0}^n ak x^k of degree n.
Denoting by p̃n(x) = Σ_{k=0}^n ãk x^k a perturbed polynomial whose zeros are α̃i,
forward analysis aims at estimating the error between two corresponding zeros
αi and α̃i in terms of the variations on the coefficients ak − ãk, k = 0, 1, . . . , n.
On the other hand, let {α̂i} be the approximate zeros of pn (computed some-
how). Backward analysis provides an estimate of the perturbations δak which
should be impressed on the coefficients so that Σ_{k=0}^n (ak + δak) α̂i^k = 0, for
a fixed α̂i. The goal of a posteriori error analysis would rather be to provide an
estimate of the error αi − α̂i as a function of the residual value pn(α̂i).
This analysis will be carried out in Section 6.1.

Example 2.9 Consider the linear system Ax = b, where A ∈ R^{n×n} is a nonsin-
gular matrix.
For the perturbed system Ãx̃ = b̃, forward analysis provides an estimate of
the error x − x̃ in terms of A − Ã and b − b̃, while backward analysis estimates
the perturbations δA = (δa_ij) and δb = (δb_i) which should be impressed on the
entries of A and b in order to get (A + δA)xn = b + δb, xn being the solution of
the linear system (computed somehow). Finally, a posteriori error analysis looks
for an estimate of the error x − xn as a function of the residual rn = b − Axn.
We will develop this analysis in Section 3.1.

It is important to point out the role played by the a posteriori analysis in
devising strategies for adaptive error control. These strategies, by suitably
changing the discretization parameters (for instance, the spacing between

nodes in the numerical integration of a function or a di¬erential equation),
employ the a posteriori analysis in order to ensure that the error does not
exceed a ¬xed tolerance.
A numerical method that makes use of an adaptive error control is called
adaptive numerical method. In practice, a method of this kind applies in the
computational process the idea of feedback, by activating on the grounds of
a computed solution a convergence test which ensures the control of error
within a fixed tolerance. In case the convergence test fails, a suitable strat-
egy for modifying the discretization parameters is automatically adopted
in order to enhance the accuracy of the solution to be newly computed,
and the overall procedure is iterated until the convergence check is passed.


2.4 Sources of Error in Computational Models
Whenever the numerical problem (2.12) is an approximation to the math-
ematical problem (2.1) and this latter is in turn a model of a physical
problem (denoted in short by PP), we shall say that (2.12) is
a computational model for PP.
In this process the global error, denoted by e, is expressed by the dif-
ference between the actually computed solution, xn, and the physical so-
lution, xph, of which x provides a model. The global error e can thus be
interpreted as the sum of the error em of the mathematical model,
given by x − xph, and the error ec of the computational model, xn − x, that
is, e = em + ec (see Figure 2.1).


PP : xph
em e

ec
xn
F (x, d) = 0

en ea

Fn (xn , dn ) = 0

FIGURE 2.1. Errors in computational models

The error em will in turn take into account the error of the mathematical
model in a strict sense (that is, the extent to which the functional equation
(2.1) realistically describes the problem PP) and the error on the data
(that is, how accurately d provides a measure of the real physical

data). In the same way, ec turns out to be the combination of the numerical
discretization error en = xn − x, the error ea introduced by the numerical
algorithm and the roundoff error introduced by the computer during the
actual solution of problem (2.12) (see Section 2.5).
In general, we can thus outline the following sources of error:
1. errors due to the model, that can be controlled by a proper choice of
the mathematical model;
2. errors in the data, that can be reduced by enhancing the accuracy in
the measurement of the data themselves;
3. truncation errors, arising from having replaced in the numerical model
limits by operations that involve a finite number of steps;
4. rounding errors.
The errors in items 3 and 4 give rise to the computational error. A
numerical method will thus be convergent if this error can be made arbi-
trarily small by increasing the computational effort. Of course, convergence
is the primary, albeit not unique, goal of a numerical method, the others
being accuracy, reliability and efficiency.
Accuracy means that the errors are small with respect to a fixed tol-
erance. It is usually quantified by the order of infinitesimal of the error
en with respect to the characteristic discretization parameter (for instance
the largest grid spacing between the discretization nodes). By the way, we
notice that machine precision does not limit, on theoretical grounds, the
accuracy.
Reliability means it is likely that the global error can be guaranteed to be
below a certain tolerance. Of course, a numerical model can be considered
to be reliable only if suitably tested, that is, successfully applied to several
test cases.
Efficiency means that the computational complexity that is needed to
control the error (that is, the amount of operations and the size of the
memory required) is as small as possible.

Having encountered the term algorithm several times in this section, we
cannot refrain from providing an intuitive description of it. By algorithm
we mean a directive that indicates, through elementary operations, all the
passages that are needed to solve a specific problem. An algorithm can in
turn contain sub-algorithms and must have the feature of terminating after
a finite number of elementary operations. As a consequence, the executor
of the algorithm (machine or human being) must find within the algorithm
itself all the instructions to completely solve the problem at hand (provided
that the necessary resources for its execution are available).
For instance, the statement that a polynomial of second degree surely
admits two roots in the complex plane does not characterize an algorithm,

whereas the formula yielding the roots is an algorithm (provided that the
sub-algorithms needed to correctly execute all the operations have been
defined in turn).
Finally, the complexity of an algorithm is a measure of its executing
time. Calculating the complexity of an algorithm is therefore a part of the
analysis of the efficiency of a numerical method. Since several algorithms,
with different complexities, can be employed to solve the same problem P,
it is useful to introduce the concept of complexity of a problem, this latter
meaning the complexity of the algorithm that has minimum complexity
among those solving P. The complexity of a problem is typically measured
by a parameter directly associated with P. For instance, in the case of
the product of two square matrices, the computational complexity can be
expressed as a function of a power of the matrix size n (see [Str69]).



2.5 Machine Representation of Numbers
Any machine operation is affected by rounding errors or roundoff. They are
due to the fact that on a computer only a finite subset of the set of real
numbers can be represented. In this section, after recalling the positional
notation of real numbers, we introduce their machine representation.


2.5.1 The Positional System
