where A(3) = M2 A(2) . Therefore

M2 M1 A = A(3) = U. (3.34)

On the other hand, the matrices M1 and M2 are lower triangular, so their
product is still lower triangular, as is its inverse; thus, from (3.34) one gets

A = (M2 M1 )^{-1} U = LU,

which is the desired factorization of A.

This identity can be generalized as follows. Setting

mk = (0, . . . , 0, mk+1,k , . . . , mn,k )T ∈ Rn


and defining

M_k = \begin{bmatrix}
1      & \dots  & 0          & 0      & \dots  & 0 \\
\vdots & \ddots & \vdots     & \vdots &        & \vdots \\
0      & \dots  & 1          & 0      & \dots  & 0 \\
0      & \dots  & -m_{k+1,k} & 1      & \dots  & 0 \\
\vdots &        & \vdots     & \vdots & \ddots & \vdots \\
0      & \dots  & -m_{n,k}   & 0      & \dots  & 1
\end{bmatrix} = I_n - m_k e_k^T

as the k-th Gaussian transformation matrix, one finds out that

(M_k)_{ip} = \delta_{ip} - (m_k e_k^T)_{ip} = \delta_{ip} - m_{ik}\,\delta_{kp},   i, p = 1, \dots, n.

On the other hand, from (3.31) we have that

a_{ij}^{(k+1)} = a_{ij}^{(k)} - m_{ik}\,\delta_{kk}\, a_{kj}^{(k)} = \sum_{p=1}^{n} \left( \delta_{ip} - m_{ik}\delta_{kp} \right) a_{pj}^{(k)},   i, j = k+1, \dots, n,

or, equivalently,

A^{(k+1)} = M_k A^{(k)}.   (3.35)

As a consequence, at the end of the elimination process the matrices Mk ,
with k = 1, . . . , n − 1, and the matrix U have been generated such that

M_{n-1} M_{n-2} \dots M_1 A = U.

The matrices Mk are unit lower triangular with inverse given by

M_k^{-1} = 2 I_n - M_k = I_n + m_k e_k^T,   (3.36)

where the product (m_i e_i^T)(m_j e_j^T) is equal to the null matrix whenever i ≤ j. As a consequence

A = M_1^{-1} M_2^{-1} \dots M_{n-1}^{-1} U
  = (I_n + m_1 e_1^T)(I_n + m_2 e_2^T) \dots (I_n + m_{n-1} e_{n-1}^T) U
  = \left( I_n + \sum_{i=1}^{n-1} m_i e_i^T \right) U

  = \begin{bmatrix}
    1      & 0      & \dots  & \dots     & 0 \\
    m_{21} & 1      &        &           & \vdots \\
    \vdots & m_{32} & \ddots &           & \vdots \\
    \vdots & \vdots &        & \ddots    & 0 \\
    m_{n1} & m_{n2} & \dots  & m_{n,n-1} & 1
  \end{bmatrix} U.   (3.37)


Defining L = (M_{n-1} M_{n-2} \dots M_1)^{-1} = M_1^{-1} \dots M_{n-1}^{-1}, it follows that

A = LU.

We notice that, due to (3.37), the subdiagonal entries of L are the multi-

pliers mik produced by GEM, while the diagonal entries are equal to one.

Once the matrices L and U have been computed, solving the linear system

consists only of solving successively the two triangular systems

Ly = b

Ux = y.

The computational cost of the factorization process is obviously the same

as that required by GEM.
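For illustration, a minimal MATLAB sketch of this two-stage solution (the function name lu_solve is ours and is not one of the book's programs), assuming the factors L (unit lower triangular) and U (upper triangular) are already available:

function [x] = lu_solve (L, U, b)
% Solve LUx = b through the two triangular systems Ly = b and Ux = y.
n = length(b); y = zeros(n,1); x = zeros(n,1);
for i = 1:n
  y(i) = b(i) - L(i,1:i-1)*y(1:i-1);            % forward substitution
end
for i = n:-1:1
  x(i) = (y(i) - U(i,i+1:n)*x(i+1:n))/U(i,i);   % backward substitution
end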

The following result establishes a link between the leading dominant

minors of a matrix and its LU factorization induced by GEM.

Theorem 3.4 Let A ∈ Rn×n . The LU factorization of A with lii = 1 for
i = 1, . . . , n exists and is unique iff the principal submatrices Ai of A of
order i = 1, . . . , n − 1 are nonsingular.

Proof. The existence of the LU factorization can be proved following the steps

of the GEM. Here we prefer to pursue an alternative approach, which allows for

proving at the same time both existence and uniqueness and that will be used

again in later sections.

Let us assume that the leading minors Ai of A are nonsingular for i = 1, . . . , n − 1
and prove, by induction on i, that under this hypothesis the LU factorization
of A(= An ) with lii = 1 for i = 1, . . . , n, exists and is unique.

The property is obviously true if i = 1. Assume therefore that there exists a
unique LU factorization of Ai−1 of the form Ai−1 = L(i−1) U(i−1) with l_{kk}^{(i−1)} = 1
for k = 1, . . . , i − 1, and show that there exists a unique factorization also for
Ai . We partition Ai by block matrices as

A_i = \begin{bmatrix} A_{i-1} & c \\ d^T & a_{ii} \end{bmatrix}

and look for a factorization of Ai of the form

A_i = L^{(i)} U^{(i)} = \begin{bmatrix} L^{(i-1)} & 0 \\ l^T & 1 \end{bmatrix}
\begin{bmatrix} U^{(i-1)} & u \\ 0^T & u_{ii} \end{bmatrix},   (3.38)

having also partitioned by blocks the factors L(i) and U(i) . Computing the prod-
uct of these two factors and equating by blocks the elements of Ai , it turns out
that the vectors l and u are the solutions to the linear systems L(i−1) u = c,
lT U(i−1) = dT .


On the other hand, since 0 ≠ det(Ai−1 ) = det(L(i−1) ) det(U(i−1) ), the matrices
L(i−1) and U(i−1) are nonsingular and, as a result, u and l exist and are unique.
Thus, there exists a unique factorization of Ai , where uii is the unique solution
of the equation uii = aii − lT u. This completes the induction step of the proof.

It now remains to prove that, if the factorization at hand exists and is unique,
then the first n − 1 leading minors of A must be nonsingular. We shall distinguish
the case where A is singular from the case where it is nonsingular.
Let us start from the second one and assume that the LU factorization of A
with lii = 1 for i = 1, . . . , n, exists and is unique. Then, due to (3.38), we have
Ai = L(i) U(i) for i = 1, . . . , n. Thus

det(Ai ) = det(L(i) ) det(U(i) ) = det(U(i) ) = u11 u22 . . . uii ,   (3.39)

from which, taking i = n and A nonsingular, we obtain u11 u22 . . . unn ≠ 0, and
thus, necessarily, det(Ai ) = u11 u22 . . . uii ≠ 0 for i = 1, . . . , n − 1.

Now let A be a singular matrix and assume that (at least) one diagonal entry

of U is equal to zero. Denote by ukk the null entry of U with minimum index k.

Thanks to (3.38), the factorization can be computed without troubles until the

k + 1-th step. From that step on, since the matrix U(k) is singular, existence and

uniqueness of the vector lT are certainly lost, and, thus, the same holds for the

uniqueness of the factorization. In order for this not to occur before the process

has factorized the whole matrix A, the ukk entries must all be nonzero up to the

index k = n − 1 included, and thus, due to (3.39), all the leading minors Ak must
be nonsingular for k = 1, . . . , n − 1. □

From the above theorem we conclude that, if an Ai , with i = 1, . . . , n − 1,

is singular, then the factorization may either not exist or not be unique.

Example 3.3 Consider the matrices

B = \begin{bmatrix} 1 & 2 \\ 1 & 2 \end{bmatrix},   C = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix},   D = \begin{bmatrix} 0 & 1 \\ 0 & 2 \end{bmatrix}.

According to Theorem 3.4, the singular matrix B, having nonsingular leading
minor B1 = 1, admits a unique LU factorization. The remaining two examples
outline that, if the assumptions of the theorem are not fulfilled, the factorization
may fail to exist or be unique.
Actually, the nonsingular matrix C, with C1 singular, does not admit any
factorization, while the (singular) matrix D, with D1 singular, admits an infinite
number of factorizations of the form D = Lβ Uβ , with

L_β = \begin{bmatrix} 1 & 0 \\ β & 1 \end{bmatrix},   U_β = \begin{bmatrix} 0 & 1 \\ 0 & 2-β \end{bmatrix},   ∀β ∈ R.

In the case where the LU factorization is unique, we point out that, because

det(A) = det(LU) = det(L) det(U) = det(U), the determinant of A is given


by

det(A) = u11 · · · unn .

Let us now recall the following property (referring for its proof to [GL89]

or [Hig96]).

Property 3.2 If A is a matrix diagonally dominant by rows or by columns,

then the LU factorization of A exists. In particular, if A is diagonally dom-

inant by columns, then |lij | ≤ 1 ∀i, j.

In the proof of Theorem 3.4 we exploited the fact that the diagonal entries
of L are equal to 1. In a similar manner, we could have fixed to 1 the

diagonal entries of the upper triangular matrix U, obtaining a variant of

GEM that will be considered in Section 3.3.4.

The freedom in setting either the diagonal entries of L or those of U
implies that several LU factorizations exist which can be obtained one from

the other by multiplication with a suitable diagonal matrix (see Section

3.4.1).

3.3.2 The Effect of Rounding Errors

If rounding errors are taken into account, the factorization process induced

by GEM yields two matrices, L and U, such that LU = A + δA, δA being a

perturbation matrix. The size of such a perturbation can be estimated by

|δA| ≤ \frac{nu}{1 - nu}\, |L|\,|U|,   (3.40)

where u is the roundoff unit (for the proof of this result we refer to [Hig89]).

From (3.40) it is seen that the presence of small pivotal entries can make

the right side of the inequality virtually unbounded, with a consequent loss

of control on the size of the perturbation matrix δA. The interest is thus

in ¬nding out estimates like (3.40) of the form

|δA| ¤ g(u)|A|,

where g(u) is a suitable function of u. For instance, assuming that L and

U have nonnegative entries, then since |L| |U| = |LU| one gets

nu

|L| |U| = |LU| = |A + δA| ¤ |A| + |δA| ¤ |A| + |L| |U|, (3.41)

1 ’ nu

from which the desired bound is achieved by taking g(u) = nu/(1 ’ 2nu).

The technique of pivoting, examined in Section 3.5, keeps the size of the

pivotal entries under control and makes it possible to obtain estimates like

(3.41) for any matrix.


3.3.3 Implementation of LU Factorization

Since L is a lower triangular matrix with diagonal entries equal to 1 and U

is upper triangular, it is possible (and convenient) to store the LU factor-

ization directly in the same memory area that is occupied by the matrix A.

More precisely, U is stored in the upper triangular part of A (including the

diagonal), whilst L occupies the lower triangular portion of A (the diagonal

entries of L are not stored since they are implicitly assumed to be 1).

A coding of the algorithm is reported in Program 4. The output matrix

A contains the overwritten LU factorization.

Program 4 - lu_kji : LU factorization of matrix A. kji version

function [A] = lu_kji (A)

[n,n]=size(A);

for k=1:n-1

A(k+1:n,k)=A(k+1:n,k)/A(k,k);

for j=k+1:n, for i=k+1:n

A(i,j)=A(i,j)-A(i,k)*A(k,j);

end, end

end

This implementation of the factorization algorithm is commonly referred

to as the kji version, due to the order in which the cycles are executed.

In a more appropriate notation, it is called the SAXPY-kji version,
due to the fact that the basic operation of the algorithm, which consists of
multiplying a scalar A by a vector X, summing another vector Y and then
storing the result, is usually called SAXPY (i.e. Scalar A X Plus Y).

The factorization can of course be executed by following a different order.

In general, the forms in which the cycle on index i precedes the cycle on

j are called row-oriented, whilst the others are called column-oriented. As

usual, this terminology refers to the fact that the matrix is accessed by

rows or by columns.

An example of LU factorization, jki version and column-oriented, is given

in Program 5. This version is commonly called GAXPY-jki, since the
basic operation (a matrix-vector product) is called GAXPY, which stands
for Generalized sAXPY (see [DGK84] for further details). In the GAXPY

operation the scalar A of the SAXPY operation is replaced by a matrix.

Program 5 - lu_jki : LU factorization of matrix A. jki version

function [A] = lu_jki (A)

[n,n]=size(A);

for j=1:n

for k=1:j-1, for i=k+1:n

A(i,j)=A(i,j)-A(i,k)*A(k,j);

end, end

for i=j+1:n, A(i,j)=A(i,j)/A(j,j); end

end


3.3.4 Compact Forms of Factorization

Remarkable variants of the LU factorization are the Crout factorization and
the Doolittle factorization, also known as compact forms of the Gauss

elimination method. This name is due to the fact that these approaches

require less intermediate results than the standard GEM to generate the

factorization of A.

Computing the LU factorization of A is formally equivalent to solving

the following nonlinear system of n2 equations

a_{ij} = \sum_{r=1}^{\min(i,j)} l_{ir} u_{rj},   (3.42)

the unknowns being the n2 + n coefficients of the triangular matrices L and
U. If we arbitrarily set n coefficients to 1, for example the diagonal entries
of L or U, we end up with the Doolittle and Crout methods, respectively,
which provide an efficient way to solve system (3.42).

In fact, supposing that the first k − 1 columns of L and U are available
and setting lkk = 1 (Doolittle method), the following equations are obtained
from (3.42)

a_{kj} = \sum_{r=1}^{k-1} l_{kr} u_{rj} + u_{kj},   j = k, . . . , n,

a_{ik} = \sum_{r=1}^{k-1} l_{ir} u_{rk} + l_{ik} u_{kk},   i = k + 1, . . . , n.

Note that these equations can be solved in a sequential way with respect
to the unknowns ukj and lik . From the Doolittle compact method
we thus obtain first the k-th row of U and then the k-th column of L, as
follows: for k = 1, . . . , n

u_{kj} = a_{kj} - \sum_{r=1}^{k-1} l_{kr} u_{rj},   j = k, . . . , n,
                                                                     (3.43)
l_{ik} = \frac{1}{u_{kk}} \left( a_{ik} - \sum_{r=1}^{k-1} l_{ir} u_{rk} \right),   i = k + 1, . . . , n.

The Crout factorization is generated similarly, computing first the k-th
column of L and then the k-th row of U: for k = 1, . . . , n

l_{ik} = a_{ik} - \sum_{r=1}^{k-1} l_{ir} u_{rk},   i = k, . . . , n,

u_{kj} = \frac{1}{l_{kk}} \left( a_{kj} - \sum_{r=1}^{k-1} l_{kr} u_{rj} \right),   j = k + 1, . . . , n,


where we set ukk = 1. Recalling the notations introduced above, the Doolittle
factorization is nothing but the ijk version of GEM.
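Since no program is given for the Crout scheme, a minimal MATLAB sketch following the formulas above may look as follows (the function name crout_lu is ours, not one of the book's programs):

function [L,U] = crout_lu (A)
% Crout compact factorization: L has free diagonal, U has unit diagonal.
[n,n] = size(A); L = zeros(n); U = eye(n);
for k = 1:n
  for i = k:n                                % k-th column of L
    L(i,k) = A(i,k) - L(i,1:k-1)*U(1:k-1,k);
  end
  for j = k+1:n                              % k-th row of U
    U(k,j) = (A(k,j) - L(k,1:k-1)*U(1:k-1,j))/L(k,k);
  end
end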

We provide in Program 6 the implementation of the Doolittle scheme.

Notice that now the main computation is a dot product, so this scheme is

also known as the DOT-ijk version of GEM.

Program 6 - lu_ijk : LU factorization of the matrix A: ijk version

function [A] = lu_ijk (A)

[n,n]=size(A);

for i=1:n

for j=2:i

A(i,j-1)=A(i,j-1)/A(j-1,j-1);

for k=1:j-1, A(i,j)=A(i,j)-A(i,k)*A(k,j); end

end

for j=i+1:n

for k=1:i-1, A(i,j)=A(i,j)-A(i,k)*A(k,j); end

end

end

3.4 Other Types of Factorization

We now address factorizations suitable for symmetric and rectangular ma-

trices.

3.4.1 LDMT Factorization

It is possible to devise other types of factorizations of A by removing the
hypothesis that the diagonal elements of L are equal to one. Specifically, we will
address some variants where the factorization of A is of the form

A = LDMT ,

where L, MT and D are lower triangular, upper triangular and diagonal

matrices, respectively.

After the construction of this factorization, the resolution of the system

can be carried out by solving first the lower triangular system Ly=b, then the
diagonal one Dz=y, and finally the upper triangular system MT x=z, with
a cost of n2 + n flops. In the symmetric case, we obtain M = L and the

LDLT factorization can be computed with half the cost (see Section 3.4.2).
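A minimal MATLAB sketch of this three-stage solve, assuming the factors L, D and M have already been computed (the function name ldm_solve is ours):

function [x] = ldm_solve (L, D, M, b)
% Solve (L*D*M')x = b through Ly = b, Dz = y, M'x = z,
% with L and M' unit triangular.
n = length(b); y = zeros(n,1); x = zeros(n,1);
for i = 1:n
  y(i) = b(i) - L(i,1:i-1)*y(1:i-1);   % unit lower triangular solve
end
z = y./diag(D);                        % diagonal solve
U = M';                                % unit upper triangular factor
for i = n:-1:1
  x(i) = z(i) - U(i,i+1:n)*x(i+1:n);   % unit upper triangular solve
end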

The LDMT factorization enjoys a property analogous to the one in The-
orem 3.4 for the LU factorization. In particular, the following result holds.

Theorem 3.5 If all the principal minors of a matrix A∈ Rn×n are nonzero

then there exist a unique diagonal matrix D, a unique unit lower triangu-

lar matrix L and a unique unit upper triangular matrix MT , such that

A = LDMT .


Proof. By Theorem 3.4 we already know that there exists a unique LU factor-

ization of A with lii = 1 for i = 1, . . . , n. If we set the diagonal entries of D

equal to uii (nonzero because U is nonsingular), then A = LU = LD(D−1 U).
Upon defining MT = D−1 U, the existence of the LDMT factorization follows,
where D−1 U is a unit upper triangular matrix. The uniqueness of the LDMT
factorization is a consequence of the uniqueness of the LU factorization. □

The above proof shows that, since the diagonal entries of D coincide

with those of U, we could compute L, MT and D starting from the LU

factorization of A. It suffices to compute MT as D−1 U. Nevertheless, this

algorithm has the same cost as the standard LU factorization. Likewise,

it is also possible to compute the three matrices of the factorization by

enforcing the identity A=LDMT entry by entry.

3.4.2 Symmetric and Positive Definite Matrices: The

Cholesky Factorization

As already pointed out, the factorization LDMT simplifies considerably
when A is symmetric because in such a case M=L, yielding the so-called
LDLT factorization. The computational cost halves, with respect to the
LU factorization, to about (n3 /3) flops.

As an example, the Hilbert matrix of order 3 admits the following LDLT
factorization

H_3 = \begin{bmatrix} 1 & \frac{1}{2} & \frac{1}{3} \\ \frac{1}{2} & \frac{1}{3} & \frac{1}{4} \\ \frac{1}{3} & \frac{1}{4} & \frac{1}{5} \end{bmatrix}
= \begin{bmatrix} 1 & 0 & 0 \\ \frac{1}{2} & 1 & 0 \\ \frac{1}{3} & 1 & 1 \end{bmatrix}
\begin{bmatrix} 1 & 0 & 0 \\ 0 & \frac{1}{12} & 0 \\ 0 & 0 & \frac{1}{180} \end{bmatrix}
\begin{bmatrix} 1 & \frac{1}{2} & \frac{1}{3} \\ 0 & 1 & 1 \\ 0 & 0 & 1 \end{bmatrix}.

In the case that A is also positive definite, the diagonal entries of D in the

LDLT factorization are positive. Moreover, we have the following result.

Theorem 3.6 Let A ∈ Rn×n be a symmetric and positive definite matrix.
Then, there exists a unique upper triangular matrix H with positive diagonal
entries such that

A = HT H.   (3.44)

This factorization is called the Cholesky factorization and the entries hij of HT
can be computed as follows: h11 = \sqrt{a_{11}} and, for i = 2, . . . , n,

h_{ij} = \left( a_{ij} - \sum_{k=1}^{j-1} h_{ik} h_{jk} \right) / h_{jj},   j = 1, . . . , i - 1,
                                                                            (3.45)
h_{ii} = \left( a_{ii} - \sum_{k=1}^{i-1} h_{ik}^2 \right)^{1/2}.


Proof. Let us prove the theorem proceeding by induction on the size i of the
matrix (as done in Theorem 3.4), recalling that if Ai ∈ Ri×i is symmetric positive
definite, then all its principal submatrices enjoy the same property.
For i = 1 the result is obviously true. Thus, suppose that it holds for i − 1 and
prove that it also holds for i. There exists an upper triangular matrix Hi−1 such
that Ai−1 = H_{i-1}^T H_{i-1} . Let us partition Ai as

A_i = \begin{bmatrix} A_{i-1} & v \\ v^T & α \end{bmatrix},

with α ∈ R+ , v ∈ R^{i-1} , and look for a factorization of Ai of the form

A_i = H_i^T H_i = \begin{bmatrix} H_{i-1}^T & 0 \\ h^T & β \end{bmatrix}
\begin{bmatrix} H_{i-1} & h \\ 0^T & β \end{bmatrix}.

Enforcing the equality with the entries of Ai yields the equations H_{i-1}^T h = v
and h^T h + β^2 = α. The vector h is thus uniquely determined, since H_{i-1}^T is
nonsingular. As for β, due to the properties of determinants

0 < det(A_i) = det(H_i^T) det(H_i) = β^2 (det(H_{i-1}))^2 ,

we can conclude that it must be a real number. As a result, β = \sqrt{α - h^T h} is
the desired diagonal entry and this concludes the inductive argument.
Let us now prove formulae (3.45). The fact that h11 = \sqrt{a_{11}} is an immediate
consequence of the induction argument for i = 1. In the case of a generic i,
relations (3.45)_1 are the forward substitution formulae for the solution of the
linear system H_{i-1}^T h = v = (a_{1i}, a_{2i}, . . . , a_{i-1,i})^T , while formula (3.45)_2 states
that β = \sqrt{α - h^T h}, where α = a_{ii} . □

The algorithm which implements (3.45) requires about (n3 /3) flops and it
turns out to be stable with respect to the propagation of rounding errors.
It can indeed be shown that the computed upper triangular matrix H̃ is such that
H̃T H̃ = A + δA, where δA is a perturbation matrix such that ‖δA‖2 ≤
8n(n + 1)u ‖A‖2 , when the rounding errors are considered and assuming
that 2n(n + 1)u ≤ 1 − (n + 1)u (see [Wil68]).

Also, for the Cholesky factorization it is possible to overwrite the matrix
HT in the lower triangular portion of A, without any further memory stor-
age. By doing so, both A and the factorization are preserved, noting that
A is stored in the upper triangular section since it is symmetric and that
its diagonal entries can be computed as a_{11} = h_{11}^2 , a_{ii} = h_{ii}^2 + \sum_{k=1}^{i-1} h_{ik}^2 ,
i = 2, . . . , n.

An example of implementation of the Cholesky factorization is coded in

Program 7.

Program 7 - chol2 : Cholesky factorization

function [A] = chol2 (A)

[n,n]=size(A);


for k=1:n-1

A(k,k)=sqrt(A(k,k)); A(k+1:n,k)=A(k+1:n,k)/A(k,k);

for j=k+1:n, A(j:n,j)=A(j:n,j)-A(j:n,k)*A(j,k); end

end

A(n,n)=sqrt(A(n,n));

3.4.3 Rectangular Matrices: The QR Factorization

Definition 3.1 A matrix A ∈ Rm×n , with m ≥ n, admits a QR fac-
torization if there exist an orthogonal matrix Q ∈ Rm×m and an upper
trapezoidal matrix R ∈ Rm×n with null rows from the n + 1-th one on,

such that

A = QR. (3.46)

This factorization can be constructed either using suitable transformation

matrices (Givens or Householder matrices, see Section 5.6.1) or using the

Gram-Schmidt orthogonalization algorithm discussed below.

It is also possible to generate a reduced version of the QR factorization

(3.46), as stated in the following result.

Property 3.3 Let A ∈ Rm×n be a matrix of rank n for which a QR fac-
torization is known. Then there exists a unique factorization of A of the
form

A = Q̃R̃   (3.47)

where Q̃ and R̃ are submatrices of Q and R given respectively by

Q̃ = Q(1 : m, 1 : n),   R̃ = R(1 : n, 1 : n).   (3.48)

Moreover, Q̃ has orthonormal column vectors and R̃ is upper triangular
and coincides with the Cholesky factor H of the symmetric positive definite
matrix AT A, that is, AT A = R̃T R̃.
If A has rank n (i.e., full rank), then the column vectors of Q̃ form an
orthonormal basis for the vector space range(A) (defined in (1.5)). As a
consequence, constructing the QR factorization can also be interpreted as
a procedure for generating an orthonormal basis for a given set of vectors.

If A has rank r < n, the QR factorization does not necessarily yield an

orthonormal basis for range(A). However, one can obtain a factorization of

the form

Q^T A P = \begin{bmatrix} R_{11} & R_{12} \\ 0 & 0 \end{bmatrix},


FIGURE 3.1. The reduced factorization. The matrices of the QR factorization
are drawn in dashed lines

where Q is orthogonal, P is a permutation matrix and R11 is a nonsingular

upper triangular matrix of order r.

In general, when using the QR factorization, we shall always refer to its

reduced form (3.47) as it finds a remarkable application in the solution of

overdetermined systems (see Section 3.13).

The matrix factors Q̃ and R̃ in (3.47) can be computed using the Gram-
Schmidt orthogonalization. Starting from a set of linearly independent vec-
tors, x1 , . . . , xn , this algorithm generates a new set of mutually orthogonal
vectors, q1 , . . . , qn , given by

q_1 = x_1 ,

q_{k+1} = x_{k+1} - \sum_{i=1}^{k} \frac{(q_i , x_{k+1})}{(q_i , q_i)}\, q_i ,   k = 1, . . . , n - 1.   (3.49)

Denoting by a1 , . . . , an the column vectors of A, we set q̃_1 = a_1 /‖a_1‖_2
and, for k = 1, . . . , n − 1, compute the column vectors of Q̃ as

q̃_{k+1} = q_{k+1} /‖q_{k+1}‖_2 ,

where

q_{k+1} = a_{k+1} - \sum_{j=1}^{k} (q̃_j , a_{k+1}) q̃_j .

Next, imposing that A = Q̃R̃ and exploiting the fact that Q̃ is orthogonal
(that is, Q̃−1 = Q̃T ), the entries of R̃ can easily be computed. The overall
computational cost of the algorithm is of the order of mn2 flops.
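For illustration, a minimal MATLAB sketch of this classical Gram-Schmidt construction of the reduced factors (the function name qr_cgs is ours; as discussed below, the modified variant of Program 8 is to be preferred in practice):

function [Q,R] = qr_cgs (A)
% Classical Gram-Schmidt: for A (m x n, full rank), A = Q*R with
% orthonormal columns in Q and R upper triangular.
[m,n] = size(A); Q = zeros(m,n); R = zeros(n);
for k = 1:n
  R(1:k-1,k) = Q(:,1:k-1)'*A(:,k);     % coefficients along previous q's
  v = A(:,k) - Q(:,1:k-1)*R(1:k-1,k);  % remove those components
  R(k,k) = norm(v); Q(:,k) = v/R(k,k);
end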

It is also worth noting that if A has full rank, the matrix AT A is sym-
metric and positive definite (see Section 1.9) and thus it admits a unique
Cholesky factorization of the form HT H. On the other hand, since the or-
thogonality of Q̃ implies

HT H = AT A = R̃T Q̃T Q̃R̃ = R̃T R̃,


we conclude that R̃ is actually the Cholesky factor H of AT A. Thus, the
diagonal entries of R̃ are all nonzero only if A has full rank.
The Gram-Schmidt method is of little practical use since the generated
vectors lose their linear independence due to rounding errors. Indeed, in
floating-point arithmetic the algorithm produces very small values of ‖q_{k+1}‖_2
and rkk with a consequent numerical instability and loss of orthogonality
for the matrix Q̃ (see Example 3.4).

These drawbacks suggest employing a more stable version, known as the
modified Gram-Schmidt method. At the beginning of the k + 1-th step, the
projections of the vector a_{k+1} along the vectors q̃_1 , . . . , q̃_k are progressively
subtracted from a_{k+1} . On the resulting vector, the orthogonalization step
is then carried out. In practice, after computing (q̃_1 , a_{k+1}) q̃_1 at the k + 1-th
step, this vector is immediately subtracted from a_{k+1} . As an example, one
lets

a_{k+1}^{(1)} = a_{k+1} - (q̃_1 , a_{k+1}) q̃_1 .

This new vector a_{k+1}^{(1)} is projected along the direction of q̃_2 and the obtained
projection is subtracted from a_{k+1}^{(1)} , yielding

a_{k+1}^{(2)} = a_{k+1}^{(1)} - (q̃_2 , a_{k+1}^{(1)}) q̃_2

and so on, until a_{k+1}^{(k)} is computed.
It can be checked that a_{k+1}^{(k)} coincides with the corresponding vector q_{k+1}
in the standard Gram-Schmidt process, since, due to the orthogonality of
the vectors q̃_1 , q̃_2 , . . . , q̃_k ,

a_{k+1}^{(k)} = a_{k+1} - (q̃_1 , a_{k+1}) q̃_1 - (q̃_2 , a_{k+1} - (q̃_1 , a_{k+1}) q̃_1) q̃_2 + . . .
        = a_{k+1} - \sum_{j=1}^{k} (q̃_j , a_{k+1}) q̃_j .

Program 8 implements the modified Gram-Schmidt method. Notice that
it is not possible to overwrite the computed QR factorization on the ma-
trix A. In general, the matrix R is overwritten on A, whilst Q is stored
separately. The computational cost of the modified Gram-Schmidt method
is of the order of 2mn2 flops.

Program 8 - mod_grams : Modified Gram-Schmidt method

function [Q,R] = mod_grams(A)

[m,n]=size(A);

Q=zeros(m,n); Q(1:m,1) = A(1:m,1); R=zeros(n); R(1,1)=1;

for k = 1:n

R(k,k) = norm (A(1:m,k)); Q(1:m,k) = A(1:m,k)/R(k,k);


for j=k+1:n

R(k,j) = Q(1:m,k)'*A(1:m,j);
A(1:m,j) = A(1:m,j) - Q(1:m,k)*R(k,j);

end

end

Example 3.4 Let us consider the Hilbert matrix H4 of order 4 (see (3.32)). The
matrix Q̃, generated by the standard Gram-Schmidt algorithm, is orthogonal up
to the order of 10−10 , being

I − Q̃T Q̃ = 10^{-10} \begin{bmatrix}
 0.0000 & -0.0000 &  0.0001 & -0.0041 \\
-0.0000 &  0      &  0.0004 & -0.0099 \\
 0.0001 &  0.0004 &  0      & -0.4785 \\
-0.0041 & -0.0099 & -0.4785 &  0
\end{bmatrix}

and ‖I − Q̃T Q̃‖∞ = 4.9247 · 10−11 . Using the modified Gram-Schmidt method,
we would obtain

I − Q̃T Q̃ = 10^{-12} \begin{bmatrix}
 0.0001 & -0.0005 &  0.0069 & -0.2853 \\
-0.0005 &  0      & -0.0023 &  0.0213 \\
 0.0069 & -0.0023 &  0.0002 & -0.0103 \\
-0.2853 &  0.0213 & -0.0103 &  0
\end{bmatrix}

and this time ‖I − Q̃T Q̃‖∞ = 3.1686 · 10−13 .
An improved result can be obtained using, instead of Program 8, the intrinsic
function qr of MATLAB. This function can be properly employed to generate
both the factorization (3.46) as well as its reduced version (3.47).

3.5 Pivoting

As previously pointed out, the GEM process breaks down as soon as a zero

pivotal entry is computed. In such an event, one needs to resort to the so-

called pivoting technique, which amounts to exchanging rows (or columns)

of the system in such a way that non vanishing pivots are obtained.

Example 3.5 Let us go back to matrix (3.33) for which GEM furnishes at the
second step a zero pivotal element. By simply exchanging the second row with
the third one, we can execute one step further of the elimination method, finding
a nonzero pivot. The generated system is equivalent to the original one and it
can be noticed that it is already in upper triangular form. Indeed

A^{(2)} = \begin{bmatrix} 1 & 2 & 3 \\ 0 & -6 & -12 \\ 0 & 0 & -1 \end{bmatrix} = U,

while the transformation matrices are given by

M_1 = \begin{bmatrix} 1 & 0 & 0 \\ -2 & 1 & 0 \\ -7 & 0 & 1 \end{bmatrix},   M_2 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}.

From an algebraic standpoint, a permutation of the rows of A has been performed.
In fact, it now no longer holds that A = M_1^{-1} M_2^{-1} U, but rather A = M_1^{-1} P M_2^{-1} U,
P being the permutation matrix

P = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 1 & 0 \end{bmatrix}.   (3.50)

The pivoting strategy adopted in Example 3.5 can be generalized by look-

ing, at each step k of the elimination procedure, for a nonzero pivotal entry

by searching within the entries of the subcolumn A(k) (k : n, k). For that

reason, it is called partial pivoting (by rows).

From (3.30) it can be seen that a large value of mik (generated for ex-
ample by a small value of the pivot a_{kk}^{(k)} ) might amplify the rounding errors
affecting the entries a_{kj}^{(k)} . Therefore, in order to ensure a better stability,
the pivotal element is chosen as the largest entry (in absolute value) of the column

A(k) (k : n, k) and partial pivoting is generally performed at every step of

the elimination procedure, even if not strictly necessary (that is, even if

nonzero pivotal entries are found).

Alternatively, the searching process could have been extended to the

whole submatrix A(k) (k : n, k : n), ending up with a complete pivoting

(see Figure 3.2). Notice, however, that while partial pivoting requires an

additional cost of about n2 searches, complete pivoting needs about 2n3 /3,

with a considerable increase of the computational cost of GEM.

FIGURE 3.2. Partial pivoting by row (left) or complete pivoting (right). Shaded
areas of the matrix are those involved in the searching for the pivotal entry

Example 3.6 Let us consider the linear system Ax = b with

A = \begin{bmatrix} 10^{-13} & 1 \\ 1 & 1 \end{bmatrix}


and where b is chosen in such a way that x = (1, 1)T is the exact solution.
Suppose we use base 2 and 16 significant digits. GEM without pivoting would
give xMEG = (0.99920072216264, 1)T , while GEM with partial pivoting furnishes
the exact solution up to the 16th digit.

Let us analyze how partial pivoting affects the LU factorization induced
by GEM. At the first step of GEM with partial pivoting, after finding
the entry ar1 of maximum absolute value in the first column, the elementary
permutation matrix P1 which exchanges the first row with the r-th row is
constructed (if r = 1, P1 is the identity matrix). Next, the first Gaussian

transformation matrix M1 is generated and we set A(2) = M1 P1 A(1) . A

similar approach is now taken on A(2) , searching for a new permutation

matrix P2 and a new matrix M2 such that

A(3) = M2 P2 A(2) = M2 P2 M1 P1 A(1) .

Executing all the elimination steps, the resulting upper triangular matrix

U is now given by

U = A(n) = Mn−1 Pn−1 . . . M1 P1 A(1) .   (3.51)

Letting M = Mn−1 Pn−1 . . . M1 P1 and P = Pn−1 . . . P1 , we obtain that
U=MA and, thus, U = (MP−1 )PA. It can easily be checked that the matrix
L = PM−1 is unit lower triangular, so that the LU factorization reads

PA = LU.   (3.52)

One should not be worried by the presence of the inverse of M, since M−1 =
P_1^{-1} M_1^{-1} . . . P_{n-1}^{-1} M_{n-1}^{-1} , and P_i^{-1} = P_i^T while M_i^{-1} = 2In − Mi .

Once L, U and P are available, solving the initial linear system amounts

to solving the triangular systems Ly = Pb and Ux = y. Notice that the

entries of the matrix L coincide with the multipliers computed by LU fac-

torization, without pivoting, when applied to the matrix PA.
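For a given matrix A and right-hand side b, a minimal MATLAB sketch of this solution process, relying on the built-in function lu (which performs the factorization with partial pivoting):

[L,U,P] = lu(A);        % P*A = L*U
y = L \ (P*b);          % forward substitution on Ly = Pb
x = U \ y;              % backward substitution on Ux = y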

If complete pivoting is performed, at the first step of the process, once the
element aqr of largest absolute value in the submatrix A(1 : n, 1 : n) has been found,
we must exchange the first row and column with the q-th row and the

r-th column. This generates the matrix P1 A(1) Q1 , where P1 and Q1 are

permutation matrices by rows and by columns, respectively.

As a consequence, the action of matrix M1 is now such that A(2) =

M1 P1 A(1) Q1 . Repeating the process, at the last step, instead of (3.51) we

obtain

U = A(n) = Mn−1 Pn−1 . . . M1 P1 A(1) Q1 . . . Qn−1 .

In the case of complete pivoting the LU factorization becomes

PAQ = LU,


where Q = Q1 . . . Qn−1 is a permutation matrix accounting for all permu-
tations that have been operated. By construction, matrix L is still lower
triangular, with entries whose absolute value is less than or equal to 1. As happens in

partial pivoting, the entries of L are the multipliers produced by the LU

factorization process without pivoting, when applied to the matrix PAQ.

Program 9 is an implementation of the LU factorization with complete

pivoting. For an efficient computer implementation of the LU factorization

with partial pivoting, we refer to the MATLAB intrinsic function lu.

Program 9 - LUpivtot : LU factorization with complete pivoting

function [L,U,P,Q] = LUpivtot(A,n)

P=eye(n); Q=P; Minv=P;

for k=1:n-1

[Pk,Qk]=pivot(A,k,n); A=Pk*A*Qk;

[Mk,Mkinv]=MGauss(A,k,n);

A=Mk*A; P=Pk*P; Q=Q*Qk;

Minv=Minv*Pk*Mkinv;

end

U=triu(A); L=P*Minv;

function [Mk,Mkinv]=MGauss(A,k,n)

Mk=eye(n);

for i=k+1:n, Mk(i,k)=-A(i,k)/A(k,k); end

Mkinv=2*eye(n)-Mk;

function [Pk,Qk]=pivot(A,k,n)

[y,i]=max(abs(A(k:n,k:n))); [piv,jpiv]=max(y);

ipiv=i(jpiv); jpiv=jpiv+k-1; ipiv=ipiv+k-1;

Pk=eye(n); Pk(ipiv,ipiv)=0; Pk(k,k)=0; Pk(k,ipiv)=1; Pk(ipiv,k)=1;

Qk=eye(n); Qk(jpiv,jpiv)=0; Qk(k,k)=0; Qk(k,jpiv)=1; Qk(jpiv,k)=1;

Remark 3.3 The presence of large pivotal entries is not in itself sufficient
to guarantee accurate solutions, as demonstrated by the following example
(taken from [JM92]). For the linear system Ax = b

\begin{bmatrix} -4000 & 2000 & 2000 \\ 2000 & 0.78125 & 0 \\ 2000 & 0 & 0 \end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} =
\begin{bmatrix} 400 \\ 1.3816 \\ 1.9273 \end{bmatrix}

at the first step the pivotal entry coincides with the diagonal entry −4000
itself. However, executing GEM on such a matrix yields the solution

x = [0.00096365, −0.698496, 0.90042329]^T

whose first component drastically differs from that of the exact solution
x = [1.9273, −0.698496, 0.9004233]^T . The cause of this behaviour should


be ascribed to the wide variations among the system coefficients. Such cases

can be remedied by a suitable scaling of the matrix (see Section 3.12.1).

Remark 3.4 (Pivoting for symmetric matrices) As already noticed,

pivoting is not strictly necessary if A is symmetric and positive definite.
A separate comment is deserved when A is symmetric but not positive
definite, since pivoting could destroy the symmetry of the matrix. This
can be avoided by employing a complete pivoting of the form PAPT , even
though this pivoting can only amount to a reordering of the diagonal

entries of A. As a consequence, the presence on the diagonal of A of small

entries might inhibit the advantages of the pivoting. To deal with matrices

of this kind, special algorithms are needed (like the Parlett-Reid method

[PR70] or the Aasen method [Aas71]) for whose description we refer to

[GL89], and to [JM92] for the case of sparse matrices.

3.6 Computing the Inverse of a Matrix

The explicit computation of the inverse of a matrix can be carried out using

the LU factorization as follows. Denoting by X the inverse of a nonsingular

matrix A∈ Rn—n , the column vectors of X are the solutions to the linear

systems Axi = ei , for i = 1, . . . , n.

Supposing that PA=LU, where P is the partial pivoting permutation

matrix, we must solve 2n triangular systems of the form

Lyi = Pei , Uxi = yi i = 1, . . . , n,

i.e., a succession of linear systems having the same coefficient matrix but
different right-hand sides. The computation of the inverse of a matrix is a
costly procedure which can sometimes be even less stable than GEM (see

[Hig88]).
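A minimal MATLAB sketch of this column-by-column procedure, again relying on the built-in lu for the factorization PA = LU:

% Compute X = A^{-1} by solving A*x_i = e_i, i = 1,...,n, reusing one factorization.
[L,U,P] = lu(A);
n = size(A,1); X = zeros(n);
for i = 1:n
  ei = zeros(n,1); ei(i) = 1;
  yi = L \ (P*ei);            % Ly_i = Pe_i
  X(:,i) = U \ yi;            % Ux_i = y_i
end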

An alternative approach for computing the inverse of A is provided by
the Faddeev or Leverrier formula which, letting B0 = I, recursively computes

α_k = \frac{1}{k}\, \mathrm{tr}(A B_{k-1}),   B_k = -A B_{k-1} + α_k I,   k = 1, 2, . . . , n.

Since Bn = 0, if α_n ≠ 0 we get

A^{-1} = \frac{1}{α_n} B_{n-1} ,

and the computational cost of the method for a full matrix is equal to
(n − 1)n^3 flops (for further details see [FF63], [Bar89]).
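A direct MATLAB transcription of this recursion might read as follows (a sketch, with no attention paid to efficiency or stability; the name faddeev_inv is ours):

function [X] = faddeev_inv (A)
% Inverse of A via the Faddeev-Leverrier recursion
% alpha_k = tr(A*B_{k-1})/k,  B_k = -A*B_{k-1} + alpha_k*I.
n = size(A,1); B = eye(n);
for k = 1:n
  AB = A*B;
  alpha = trace(AB)/k;
  Bprev = B;                  % keep B_{n-1} for the final step
  B = -AB + alpha*eye(n);     % B_n should vanish up to roundoff
end
X = Bprev/alpha;              % A^{-1} = B_{n-1}/alpha_n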


3.7 Banded Systems

Discretization methods for boundary value problems often lead to solving

linear systems with matrices having banded, block or sparse forms. Ex-

ploiting the structure of the matrix allows for a dramatic reduction in the

computational costs of the factorization and of the substitution algorithms.

In the present and forthcoming sections, we shall address special variants

of MEG or LU factorization that are properly devised for dealing with ma-

trices of this kind. For the proofs and a more comprehensive treatment, we

refer to [GL89] and [Hig88] for banded or block matrices, while we refer to

[JM92], [GL81] and [Saa96] for sparse matrices and the techniques for their

storage.

The main result for banded matrices is the following.

Property 3.4 Let A∈ Rn×n . Suppose that there exists a LU factorization

of A. If A has upper bandwidth q and lower bandwidth p, then L has lower

bandwidth p and U has upper bandwidth q.

In particular, notice that the same memory area used for A is enough to

also store its LU factorization. Consider, indeed, that a matrix A having

upper bandwidth q and lower bandwidth p is usually stored in a matrix B
of size (p + q + 1) × n, assuming that

b_{i-j+q+1,j} = a_{ij}

for all the indices i, j that fall into the band of the matrix. For instance, in
the case of the tridiagonal matrix A=tridiag5 (−1, 2, −1) (where q = p = 1),
the compact storage reads

B = \begin{bmatrix} 0 & -1 & -1 & -1 & -1 \\ 2 & 2 & 2 & 2 & 2 \\ -1 & -1 & -1 & -1 & 0 \end{bmatrix}.

The same format can be used for storing the factorization LU of A. It is

clear that this storage format can be quite inconvenient in the case where

only a few bands of the matrix are large. In the limit, if only one column

and one row were full, we would have p = q = n and thus B would be a

full matrix with a lot of zero entries.
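For concreteness, a short MATLAB sketch (the name band_store is ours) that fills B according to the rule above:

function [B] = band_store (A, p, q)
% Compact band storage: B(i-j+q+1,j) = A(i,j) for entries inside the band.
n = size(A,1); B = zeros(p+q+1,n);
for j = 1:n
  for i = max(1,j-q):min(n,j+p)
    B(i-j+q+1,j) = A(i,j);
  end
end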

Finally, we notice that the inverse of a banded matrix is generally full

(as happens for the matrix A considered above).


3.7.1 Tridiagonal Matrices

Consider the particular case of a linear system with nonsingular tridiagonal

matrix A given by

A = \begin{bmatrix}
a_1 & c_1    &        & 0 \\
b_2 & a_2    & \ddots &   \\
    & \ddots & \ddots & c_{n-1} \\
0   &        & b_n    & a_n
\end{bmatrix}.

In such an event, the matrices L and U of the LU factorization of A are

bidiagonal matrices of the form

L = \begin{bmatrix}
1   &        &        & 0 \\
β_2 & 1      &        &   \\
    & \ddots & \ddots &   \\
0   &        & β_n    & 1
\end{bmatrix},
U = \begin{bmatrix}
α_1 & c_1 &        & 0 \\
    & α_2 & \ddots &   \\
    &     & \ddots & c_{n-1} \\
0   &     &        & α_n
\end{bmatrix}.

The coefficients αi and βi can easily be computed by the following relations

α_1 = a_1 ,   β_i = \frac{b_i}{α_{i-1}} ,   α_i = a_i - β_i c_{i-1} ,   i = 2, . . . , n.   (3.53)

This is known as the Thomas algorithm and can be regarded as a particular
instance of the Doolittle factorization, without pivoting. When one is not
interested in storing the coefficients of the original matrix, the entries αi
and βi can be overwritten on A.
The Thomas algorithm can also be extended to solve the whole tridi-
agonal system Ax = f . This amounts to solving two bidiagonal systems
Ly = f and Ux = y, for which the following formulae hold

(Ly = f)   y_1 = f_1 ,   y_i = f_i - β_i y_{i-1} ,   i = 2, . . . , n,   (3.54)

(Ux = y)   x_n = \frac{y_n}{α_n} ,   x_i = (y_i - c_i x_{i+1})/α_i ,   i = n - 1, . . . , 1.   (3.55)

The algorithm requires only 8n − 7 flops: precisely, 3(n − 1) flops for the
factorization (3.53) and 5n − 4 flops for the substitution procedure (3.54)-
(3.55).
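A straightforward MATLAB sketch of (3.53)-(3.55) (the name thomas is ours; Program 13 below gives the division-free variant):

function [x] = thomas (a, b, c, f)
% a: diagonal (length n); b: subdiagonal (b(1) unused); c: superdiagonal
% (c(i) = entry (i,i+1)); f: right-hand side.
n = length(a); alpha = zeros(n,1); beta = zeros(n,1);
alpha(1) = a(1);
for i = 2:n                                          % factorization (3.53)
  beta(i) = b(i)/alpha(i-1);
  alpha(i) = a(i) - beta(i)*c(i-1);
end
y = zeros(n,1); y(1) = f(1);
for i = 2:n, y(i) = f(i) - beta(i)*y(i-1); end       % forward sub. (3.54)
x = zeros(n,1); x(n) = y(n)/alpha(n);
for i = n-1:-1:1, x(i) = (y(i) - c(i)*x(i+1))/alpha(i); end  % backward sub. (3.55)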

As for the stability of the method, if A is a nonsingular tridiagonal matrix

and L and U are the factors actually computed, then

|δA| ≤ (4u + 3u^2 + u^3 )|L| |U|,


where δA is implicitly defined by the relation A + δA = LU while u is the
roundoff unit. In particular, if A is also symmetric and positive definite or
it is an M-matrix, we have

|δA| ≤ \frac{4u + 3u^2 + u^3}{1 - u}\, |A|,

which implies the stability of the factorization procedure in such cases. A

similar result holds even if A is diagonally dominant.

3.7.2 Implementation Issues

An implementation of the LU factorization for banded matrices is shown

in Program 10.

Program 10 - lu_band : LU factorization for a banded matrix

function [A] = lu_band (A,p,q)

[n,n]=size(A);

for k = 1:n-1

for i = k+1:min(k+p,n), A(i,k)=A(i,k)/A(k,k); end

for j = k+1:min(k+q,n)

for i = k+1:min(k+p,n), A(i,j)=A(i,j)-A(i,k)*A(k,j); end

end

end

In the case where n ≫ p and n ≫ q, this algorithm approximately takes
2npq flops, with a considerable saving with respect to the case in which A
is a full matrix.
Similarly, ad hoc versions of the substitution methods can be devised
(see Programs 11 and 12). Their costs are, respectively, of the order of 2np
flops and 2nq flops, always assuming that n ≫ p and n ≫ q.

Program 11 - forw_band : Forward substitution for a banded matrix L

function [b] = forw_band (L, p, b)

[n,n]=size(L);

for j = 1:n

for i=j+1:min(j+p,n); b(i) = b(i) - L(i,j)*b(j); end

end

Program 12 - back_band : Backward substitution for a banded matrix U

function [b] = back_band (U, q, b)

[n,n]=size(U);

for j=n:-1:1

b (j) = b (j) / U (j,j);

for i = max(1,j-q):j-1, b(i)=b(i)-U(i,j)*b(j); end

end


The programs assume that the whole matrix is stored (including also the

zero entries).

Concerning the tridiagonal case, the Thomas algorithm can be imple-

mented in several ways. In particular, when implementing it on computers

where divisions are more costly than multiplications, it is possible (and

convenient) to devise a version of the algorithm without divisions in (3.54)

and (3.55), by resorting to the following form of the factorization

A = LDMT =

\begin{bmatrix}
γ_1^{-1} &          &        & 0 \\
b_2      & γ_2^{-1} &        &   \\
         & \ddots   & \ddots &   \\
0        &          & b_n    & γ_n^{-1}
\end{bmatrix}
\begin{bmatrix}
γ_1 &     &        & 0 \\
    & γ_2 &        &   \\
    &     & \ddots &   \\
0   &     &        & γ_n
\end{bmatrix}
\begin{bmatrix}
γ_1^{-1} & c_1      &        & 0 \\
         & γ_2^{-1} & \ddots &   \\
         &          & \ddots & c_{n-1} \\
0        &          &        & γ_n^{-1}
\end{bmatrix}.

The coefficients γi can be recursively computed by the formulae

γ_i = (a_i - b_i γ_{i-1} c_{i-1})^{-1} ,   for i = 1, . . . , n,

where γ0 = 0, b1 = 0 and cn = 0 have been assumed. The forward and
backward substitution algorithms respectively read

(Ly = f)   y_1 = γ_1 f_1 ,   y_i = γ_i (f_i - b_i y_{i-1}),   i = 2, . . . , n,
                                                                    (3.56)
(Ux = y)   x_n = y_n ,   x_i = y_i - γ_i c_i x_{i+1} ,   i = n - 1, . . . , 1.

In Program 13 we show an implementation of the Thomas algorithm in

the form (3.56), without divisions. The input vectors a, b and c contain

the coefficients of the tridiagonal matrix {ai }, {bi } and {ci }, respectively,

while the vector f contains the components fi of the right-hand side f.

Program 13 - mod_thomas : Thomas algorithm, modified version

function [x] = mod_thomas (a,b,c,f)

n = length(a); b = [0; b]; c = [c; 0]; gamma(1) = 1/a(1);

for i =2:n, gamma(i)=1/(a(i)-b(i)*gamma(i-1)*c(i-1)); end

y (1) = gamma (1) * f (1);

for i = 2:n, y(i)=gamma(i)*(f(i)-b(i)*y(i-1)); end

x (n) = y (n);

for i = n-1:-1:1, x(i)=y(i)-gamma(i)*c(i)*x(i+1); end

3.8 Block Systems

In this section we deal with the LU factorization of block-partitioned matri-

ces, where each block can possibly be of a different size. Our aim is twofold:

optimizing the storage occupation by suitably exploiting the structure of

the matrix and reducing the computational cost of the solution of the sys-

tem.


3.8.1 Block LU Factorization

Let A∈ Rn×n be the following block partitioned matrix

A = \begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix},

where A11 ∈ Rr×r is a nonsingular square matrix whose factorization
L11 D1 R11 is known, while A22 ∈ R(n−r)×(n−r) . In such a case it is possible

to factorize A using only the LU factorization of the block A11 . Indeed, it

is true that

\begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix} =
\begin{bmatrix} L_{11} & 0 \\ L_{21} & I_{n-r} \end{bmatrix}
\begin{bmatrix} D_1 & 0 \\ 0 & Δ_2 \end{bmatrix}
\begin{bmatrix} R_{11} & R_{12} \\ 0 & I_{n-r} \end{bmatrix},

where

L_{21} = A_{21} R_{11}^{-1} D_1^{-1} ,   R_{12} = D_1^{-1} L_{11}^{-1} A_{12} ,   Δ_2 = A_{22} - L_{21} D_1 R_{12} .

If necessary, the reduction procedure can be repeated on the matrix ∆2 ,

thus obtaining a block-version of the LU factorization.

If A11 were a scalar, the above approach would reduce by one the size of

the factorization of a given matrix. Applying this method iteratively yields

an alternative way of performing the Gauss elimination.
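A minimal MATLAB sketch of this one-level block reduction (the name block_lu_step is ours), assuming the factors L11, D1, R11 of the leading r x r block are supplied:

function [L21, R12, Delta2] = block_lu_step (A, r, L11, D1, R11)
% One step of the block LU reduction: A11 = L11*D1*R11 is given, and the
% remaining blocks of the factorization described above are computed.
A12 = A(1:r, r+1:end); A21 = A(r+1:end, 1:r); A22 = A(r+1:end, r+1:end);
L21 = (A21/R11)/D1;            % A21 * R11^{-1} * D1^{-1}
R12 = D1\(L11\A12);            % D1^{-1} * L11^{-1} * A12
Delta2 = A22 - L21*D1*R12;     % block to be factorized at the next step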

We also notice that the proof of Theorem 3.4 can be extended to the

case of block matrices, obtaining the following result.

Theorem 3.7 Let A ∈ Rn×n be partitioned in m × m blocks Aij with
i, j = 1, . . . , m. A admits a unique LU block factorization (with L having
unit diagonal entries) iff the m − 1 dominant principal block minors of A

are nonzero.

Since the block factorization is an equivalent formulation of the standard

LU factorization of A, the stability analysis carried out for the latter holds

for its block-version as well. Improved results concerning the efficient use

in block algorithms of fast forms of matrix-matrix product are dealt with

in [Hig88]. In the forthcoming section we focus solely on block-tridiagonal

matrices.

3.8.2 Inverse of a Block-partitioned Matrix

The inverse of a block matrix can be constructed using the LU factorization

introduced in the previous section. A remarkable application is when A is

a block matrix of the form

A = C + UBV,


where C is a block matrix that is “easy” to invert (for instance, when C

is given by the diagonal blocks of A), while U, B and V take into account

the connections between the diagonal blocks. In such an event A can be

inverted by using the Sherman-Morrison or Woodbury formula

A^{-1} = (C + UBV)^{-1} = C^{-1} - C^{-1} U \left( I + BVC^{-1} U \right)^{-1} BVC^{-1} ,   (3.57)

having assumed that C and I + BVC^{-1} U are two nonsingular matrices.
This formula has several practical and theoretical applications, and is par-
ticularly effective if the connections between blocks are of modest relevance.
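As an illustration, a small MATLAB sketch applying (3.57) for given C, U, B, V of compatible sizes (a plain transcription that ignores how C^{-1} would be applied efficiently in practice):

Cinv = inv(C);                 % assumed cheap to invert (e.g. block diagonal)
k = size(B,1);
Ainv = Cinv - Cinv*U*((eye(k) + B*V*Cinv*U)\(B*V*Cinv));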

3.8.3 Block Tridiagonal Systems

Consider block tridiagonal systems of the form

A_n x = \begin{bmatrix}
A_{11} & A_{12} &           & 0 \\
A_{21} & A_{22} & \ddots    &   \\
       & \ddots & \ddots    & A_{n-1,n} \\
0      &        & A_{n,n-1} & A_{nn}
\end{bmatrix}
\begin{bmatrix} x_1 \\ \vdots \\ \vdots \\ x_n \end{bmatrix} =
\begin{bmatrix} b_1 \\ \vdots \\ \vdots \\ b_n \end{bmatrix},   (3.58)

where Aij are matrices of order ni × nj and xi and bi are column vectors of
size ni , for i, j = 1, . . . , n. We assume that the diagonal blocks are square,
although not necessarily of the same size. For k = 1, . . . , n, set

A_k = \begin{bmatrix}
I_{n_1} &         &         & 0 \\
L_1     & I_{n_2} &         &   \\
        & \ddots  & \ddots  &   \\
0       &         & L_{k-1} & I_{n_k}
\end{bmatrix}
\begin{bmatrix}
U_1 & A_{12} &        & 0 \\
    & U_2    & \ddots &   \\
    &        & \ddots & A_{k-1,k} \\
0   &        &        & U_k
\end{bmatrix}.

Equating for k = n the matrix above with the corresponding blocks of An ,

it turns out that U1 = A11 , while the remaining blocks can be obtained

solving sequentially, for i = 2, . . . , n, the systems Li−1 Ui−1 = Ai,i−1 for
the columns of L and computing Ui = Aii − Li−1 Ai−1,i .
This procedure is well defined only if all the matrices Ui are nonsingular,

which is the case if, for instance, the matrices A1 , . . . , An are nonsingular.

As an alternative, one could resort to factorization methods for banded

matrices, even if this requires the storage of a large number of zero entries

(unless a suitable reordering of the rows of the matrix is performed).

A remarkable instance is when the matrix is block tridiagonal and sym-
metric, with symmetric and positive definite blocks. In such a case (3.58)


takes the form

\begin{bmatrix}
A_{11} & A_{21}^T &           & 0 \\
A_{21} & A_{22}   & \ddots    &   \\
       & \ddots   & \ddots    & A_{n,n-1}^T \\
0      &          & A_{n,n-1} & A_{nn}
\end{bmatrix}
\begin{bmatrix} x_1 \\ \vdots \\ \vdots \\ x_n \end{bmatrix} =
\begin{bmatrix} b_1 \\ \vdots \\ \vdots \\ b_n \end{bmatrix}.

Here we consider an extension to the block case of the Thomas algorithm,

which aims at transforming A into a block bidiagonal matrix. To this pur-

pose, we first have to eliminate the block corresponding to the matrix A21 .
Assume that the Cholesky factorization of A11 is available and denote by
H11 the Cholesky factor. If we multiply the first row of the block system
by H_{11}^{-T} , we find

H_{11} x_1 + H_{11}^{-T} A_{21}^T x_2 = H_{11}^{-T} b_1 .

Letting H_{21} = H_{11}^{-T} A_{21}^T and c_1 = H_{11}^{-T} b_1 , it follows that A_{21} = H_{21}^T H_{11} and
thus the first two rows of the system are

H_{11} x_1 + H_{21} x_2 = c_1 ,

H_{21}^T H_{11} x_1 + A_{22} x_2 + A_{32}^T x_3 = b_2 .

As a consequence, multiplying the first row by H_{21}^T and subtracting it from
the second one, the unknown x1 is eliminated and the following equivalent
equation is obtained

A_{22}^{(1)} x_2 + A_{32}^T x_3 = b_2 - H_{21}^T c_1 ,

with A_{22}^{(1)} = A_{22} - H_{21}^T H_{21} . At this point, the factorization of A_{22}^{(1)} is carried
out and the unknown x2 is eliminated from the third row of the system,
and the same is repeated for the remaining rows of the system. At the end
of the procedure, which requires solving (n - 1) \sum_{j=1}^{n-1} n_j linear systems to
compute the matrices Hi+1,i , i = 1, . . . , n − 1, we end up with the following
block bidiagonal system

\begin{bmatrix}
H_{11} & H_{21} &        & 0 \\
       & H_{22} & \ddots &   \\
       &        & \ddots & H_{n,n-1} \\
0      &        &        & H_{nn}
\end{bmatrix}
\begin{bmatrix} x_1 \\ \vdots \\ \vdots \\ x_n \end{bmatrix} =
\begin{bmatrix} c_1 \\ \vdots \\ \vdots \\ c_n \end{bmatrix},

which can be solved with a (block) back substitution method. If all blocks

have the same size p, then the number of multiplications required by the

algorithm is about (7/6)(n − 1)p^3 flops (assuming both p and n very large).

3.9 Sparse Matrices

3.9 Sparse Matrices

In this section we brie¬‚y address the numerical solution of linear sparse

systems, that is, systems where the matrix A∈ Rn—n has a number of

nonzero entries of the order of n (and not n2 ). We call a pattern of a sparse

matrix the set of its nonzero coe¬cients.

Banded matrices with su¬ciently small bands are sparse matrices. Ob-

viously, for a sparse matrix the matrix structure itself is redundant and it

can be more conveniently substituted by a vector-like structure by means

of matrix compacting techniques, like the banded matrix format discussed

in Section 3.7.

FIGURE 3.3. Pattern of a symmetric sparse matrix (left) and of its associated
graph (right). For the sake of clarity, the loops have not been drawn; moreover,
since the matrix is symmetric, only one of the two sides associated with each
aij ≠ 0 has been reported

For the sake of convenience, we associate with a sparse matrix A an oriented

graph G(A). A graph is a pair (V, X ) where V is a set of p points and X

is a set of q ordered pairs of elements of V that are linked by a line. The

elements of V are called the vertices of the graph, while the connection lines

are called the paths of the graph.

The graph G(A) associated with a matrix A∈ Rm×n can be constructed
by identifying the vertices with the set of the indices from 1 to the maximum
between m and n and supposing that a path exists which connects two
vertices i and j if aij ≠ 0 and is directed from i to j, for i = 1, . . . , m and
j = 1, . . . , n. For a diagonal entry aii ≠ 0, the path joining the vertex i
with itself is called a loop. Since an orientation is associated with each side,
the graph is called oriented (or finite directed). As an example, Figure 3.3
displays the pattern of a symmetric and sparse 12 × 12 matrix, together