
       [ 1    0    0 ]
M2  =  [ 0    1    0 ],
       [ 0  -m32   1 ]

where A(3) = M2 A(2). Therefore

M2 M1 A = A(3) = U. (3.34)

On the other hand, the matrices M1 and M2 are lower triangular, so their
product is still lower triangular, as is their inverse; thus, from (3.34) one gets

A = (M2 M1)^{-1} U = LU,

which is the desired factorization of A.
This identity can be generalized as follows. Setting

mk = (0, . . . , 0, mk+1,k , . . . , mn,k )T ∈ Rn
and defining

       [ 1  ...    0           0   ...  0 ]
       [ .   .     .           .        . ]
       [ 0  ...    1           0   ...  0 ]
Mk  =  [ 0  ...  -m_{k+1,k}    1   ...  0 ]  =  In - mk ek^T
       [ .         .           .    .   . ]
       [ 0  ...  -m_{n,k}      0   ...  1 ]

as the k-th Gaussian transformation matrix, one finds out that

(Mk)_{ip} = δ_{ip} - (mk ek^T)_{ip} = δ_{ip} - m_{ik} δ_{kp},    i, p = 1, . . . , n.

On the other hand, from (3.31) we have that
a_{ij}^{(k+1)} = a_{ij}^{(k)} - m_{ik} a_{kj}^{(k)} = Σ_{p=1}^{n} (δ_{ip} - m_{ik} δ_{kp}) a_{pj}^{(k)},    i, j = k + 1, . . . , n,

or, equivalently,
A(k+1) = Mk A(k) . (3.35)
As a consequence, at the end of the elimination process the matrices Mk ,
with k = 1, . . . , n - 1, and the matrix U have been generated such that

M_{n-1} M_{n-2} · · · M_1 A = U.

The matrices Mk are unit lower triangular with inverse given by

Mk^{-1} = 2 In - Mk = In + mk ek^T,    (3.36)

where (mi ei^T)(mj ej^T) is equal to the null matrix whenever i ≤ j. As a consequence

A = M_1^{-1} M_2^{-1} · · · M_{n-1}^{-1} U

  = (In + m_1 e_1^T)(In + m_2 e_2^T) · · · (In + m_{n-1} e_{n-1}^T) U

  = ( In + Σ_{i=1}^{n-1} m_i e_i^T ) U

      [ 1     0    ...    ...       0 ]
      [ m21   1                     . ]
  =   [ m31   m32   .               . ]  U.                (3.37)
      [ .     .      .      .       0 ]
      [ mn1   mn2   ...  m_{n,n-1}  1 ]

Defining L = (M_{n-1} M_{n-2} · · · M_1)^{-1} = M_1^{-1} · · · M_{n-1}^{-1}, it follows that

A = LU.

We notice that, due to (3.37), the subdiagonal entries of L are the multi-
pliers mik produced by GEM, while the diagonal entries are equal to one.

Once the matrices L and U have been computed, solving the linear system
consists only of solving successively the two triangular systems

Ly = b

Ux = y.

The computational cost of the factorization process is obviously the same
as that required by GEM.
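As a concrete illustration, the two triangular solves can be coded directly, as in the following minimal sketch (hypothetical data, not one of the book's numbered programs; L is unit lower triangular and U upper triangular, as produced by GEM):

L = [1 0 0; 0.5 1 0; 0.25 0.5 1];            % hypothetical factors of A = [4 3 2; 2 4 1; 1 2 5]
U = [4 3 2; 0 2.5 0; 0 0 4.5];
b = [9; 7; 8]; n = 3;                        % right-hand side chosen so that x = (1,1,1)'
y = zeros(n,1); x = zeros(n,1);
for i = 1:n                                  % forward substitution on Ly = b
  y(i) = b(i) - L(i,1:i-1)*y(1:i-1);
end
for i = n:-1:1                               % backward substitution on Ux = y
  x(i) = (y(i) - U(i,i+1:n)*x(i+1:n)) / U(i,i);
end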

The following result establishes a link between the leading principal minors of a matrix and its LU factorization induced by GEM.

Theorem 3.4 Let A ∈ R^{n×n}. The LU factorization of A with lii = 1 for i = 1, . . . , n exists and is unique iff the principal submatrices Ai of A of order i = 1, . . . , n - 1 are nonsingular.
Proof. The existence of the LU factorization can be proved following the steps
of the GEM. Here we prefer to pursue an alternative approach, which allows for
proving at the same time both existence and uniqueness and that will be used
again in later sections.
Let us assume that the leading minors Ai of A are nonsingular for i = 1, . . . , n - 1 and prove, by induction on i, that under this hypothesis the LU factorization of A (= An) with lii = 1 for i = 1, . . . , n, exists and is unique.
The property is obviously true if i = 1. Assume therefore that there exists a unique LU factorization of A_{i-1} of the form A_{i-1} = L^{(i-1)} U^{(i-1)} with l_{kk}^{(i-1)} = 1 for k = 1, . . . , i - 1, and show that there exists a unique factorization also for Ai. We partition Ai by block matrices as
Ai =  [ A_{i-1}   c   ]
      [ d^T      aii  ]

and look for a factorization of Ai of the form

Ai = L^{(i)} U^{(i)} =  [ L^{(i-1)}  0 ]  [ U^{(i-1)}   u  ]
                        [ l^T        1 ]  [ 0^T        uii ],      (3.38)

having also partitioned by blocks the factors L^{(i)} and U^{(i)}. Computing the product of these two factors and equating by blocks the elements of Ai, it turns out that the vectors l and u are the solutions to the linear systems L^{(i-1)} u = c, l^T U^{(i-1)} = d^T.

On the other hand, since 0 ≠ det(A_{i-1}) = det(L^{(i-1)}) det(U^{(i-1)}), the matrices L^{(i-1)} and U^{(i-1)} are nonsingular and, as a result, u and l exist and are unique. Thus, there exists a unique factorization of Ai, where uii is the unique solution of the equation uii = aii - l^T u. This completes the induction step of the proof.
It now remains to prove that, if the factorization at hand exists and is unique, then the first n - 1 leading minors of A must be nonsingular. We shall distinguish the case where A is singular from the case where it is nonsingular.
Let us start from the second one and assume that the LU factorization of A with lii = 1 for i = 1, . . . , n, exists and is unique. Then, due to (3.38), we have Ai = L^{(i)} U^{(i)} for i = 1, . . . , n. Thus

det(Ai) = det(L^{(i)}) det(U^{(i)}) = det(U^{(i)}) = u11 u22 . . . uii,      (3.39)

from which, taking i = n and A nonsingular, we obtain u11 u22 . . . unn ≠ 0, and thus, necessarily, det(Ai) = u11 u22 . . . uii ≠ 0 for i = 1, . . . , n - 1.
Now let A be a singular matrix and assume that (at least) one diagonal entry of U is equal to zero. Denote by ukk the null entry of U with minimum index k. Thanks to (3.38), the factorization can be computed without trouble until the k + 1-th step. From that step on, since the matrix U^{(k)} is singular, existence and uniqueness of the vector l^T are certainly lost, and, thus, so is the uniqueness of the factorization. In order for this not to occur before the process has factorized the whole matrix A, the ukk entries must all be nonzero up to the index k = n - 1 included, and thus, due to (3.39), all the leading minors Ak must be nonsingular for k = 1, . . . , n - 1. □

From the above theorem we conclude that, if an Ai, with i = 1, . . . , n - 1, is singular, then the factorization may either not exist or not be unique.

Example 3.3 Consider the matrices

B = [ 1  2 ]      C = [ 0  1 ]      D = [ 0  1 ]
    [ 1  2 ],         [ 1  0 ],         [ 0  2 ].

According to Theorem 3.4, the singular matrix B, having nonsingular leading minor B1 = 1, admits a unique LU factorization. The remaining two examples outline that, if the assumptions of the theorem are not fulfilled, the factorization may fail to exist or to be unique.
Actually, the nonsingular matrix C, with C1 singular, does not admit any factorization, while the (singular) matrix D, with D1 singular, admits an infinite number of factorizations of the form D = Lβ Uβ, with

Lβ = [ 1  0 ]      Uβ = [ 0    1   ]
     [ β  1 ],          [ 0  2 - β ],      ∀β ∈ R.



In the case where the LU factorization is unique, we point out that, because det(A) = det(LU) = det(L) det(U) = det(U), the determinant of A is given by

det(A) = u11 · · · unn.

Let us now recall the following property (referring for its proof to [GL89] or [Hig96]).

Property 3.2 If A is a matrix diagonally dominant by rows or by columns, then the LU factorization of A exists. In particular, if A is diagonally dominant by columns, then |lij| ≤ 1 ∀i, j.

In the proof of Theorem 3.4 we exploited the fact that the diagonal entries of L are equal to 1. In a similar manner, we could have fixed to 1 the diagonal entries of the upper triangular matrix U, obtaining a variant of GEM that will be considered in Section 3.3.4.
The freedom in setting up either the diagonal entries of L or those of U implies that several LU factorizations exist, each obtainable from the other by multiplication with a suitable diagonal matrix (see Section 3.4.1).


3.3.2 The Effect of Rounding Errors
If rounding errors are taken into account, the factorization process induced by GEM yields two matrices, L and U, such that LU = A + δA, δA being a perturbation matrix. The size of such a perturbation can be estimated by

|δA| ≤ ( nu / (1 - nu) ) |L| |U|,      (3.40)

where u is the roundoff unit (for the proof of this result we refer to [Hig89]). From (3.40) it is seen that the presence of small pivotal entries can make the right side of the inequality virtually unbounded, with a consequent loss of control on the size of the perturbation matrix δA. The interest is thus in finding estimates like (3.40) of the form

|δA| ≤ g(u) |A|,

where g(u) is a suitable function of u. For instance, assuming that L and U have nonnegative entries, then since |L| |U| = |LU| one gets

|L| |U| = |LU| = |A + δA| ≤ |A| + |δA| ≤ |A| + ( nu / (1 - nu) ) |L| |U|,      (3.41)

from which the desired bound is achieved by taking g(u) = nu/(1 - 2nu).
The technique of pivoting, examined in Section 3.5, keeps the size of the
pivotal entries under control and makes it possible to obtain estimates like
(3.41) for any matrix.

3.3.3 Implementation of LU Factorization
Since L is a lower triangular matrix with diagonal entries equal to 1 and U
is upper triangular, it is possible (and convenient) to store the LU factor-
ization directly in the same memory area that is occupied by the matrix A.
More precisely, U is stored in the upper triangular part of A (including the
diagonal), whilst L occupies the lower triangular portion of A (the diagonal
entries of L are not stored since they are implicitly assumed to be 1).
A coding of the algorithm is reported in Program 4. The output matrix
A contains the overwritten LU factorization.
Program 4 - lu_kji : LU factorization of matrix A: kji version

function [A] = lu_kji (A)
[n,n]=size(A);
for k=1:n-1
A(k+1:n,k)=A(k+1:n,k)/A(k,k);
for j=k+1:n, for i=k+1:n
A(i,j)=A(i,j)-A(i,k)*A(k,j);
end, end
end
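The factors overwritten on A by Program 4 can be extracted and checked as in the following minimal usage sketch (hypothetical data, chosen so that no zero pivots arise):

A = [4 3 2; 2 4 1; 1 2 5];       % sample matrix (hypothetical data)
B = lu_kji(A);                   % LU factorization overwritten on a copy of A
L = tril(B,-1) + eye(size(A));   % unit lower triangular factor (unit diagonal not stored)
U = triu(B);                     % upper triangular factor
norm(A - L*U)                    % of the order of roundoff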

This implementation of the factorization algorithm is commonly referred to as the kji version, due to the order in which the cycles are executed. In a more appropriate notation, it is called the SAXPY-kji version, due to the fact that the basic operation of the algorithm, which consists of multiplying a scalar A by a vector X, summing another vector Y and then storing the result, is usually called SAXPY (i.e. Scalar A X Plus Y).
The factorization can of course be executed by following a different order.
In general, the forms in which the cycle on index i precedes the cycle on
j are called row-oriented, whilst the others are called column-oriented. As
usual, this terminology refers to the fact that the matrix is accessed by
rows or by columns.
An example of LU factorization, jki version and column-oriented, is given in Program 5. This version is commonly called GAXPY-jki, since the basic operation (a matrix-vector product) is called GAXPY, which stands for Generalized sAXPY (see [DGK84] for further details). In the GAXPY operation the scalar A of the SAXPY operation is replaced by a matrix.
Program 5 - lu_jki : LU factorization of matrix A: jki version

function [A] = lu_jki (A)
[n,n]=size(A);
for j=1:n
for k=1:j-1, for i=k+1:n
A(i,j)=A(i,j)-A(i,k)*A(k,j);
end, end
for i=j+1:n, A(i,j)=A(i,j)/A(j,j); end
end

3.3.4 Compact Forms of Factorization

Remarkable variants of the LU factorization are the Crout factorization and the Doolittle factorization, also known as compact forms of the Gauss elimination method. This name is due to the fact that these approaches require fewer intermediate results than the standard GEM to generate the factorization of A.
Computing the LU factorization of A is formally equivalent to solving the following nonlinear system of n^2 equations

a_ij = Σ_{r=1}^{min(i,j)} l_ir u_rj,      (3.42)

the unknowns being the n^2 + n coefficients of the triangular matrices L and U. If we arbitrarily set n coefficients to 1, for example the diagonal entries of L or U, we end up with the Doolittle and Crout methods, respectively, which provide an efficient way to solve system (3.42).
In fact, supposing that the first k - 1 columns of L and U are available and setting lkk = 1 (Doolittle method), the following equations are obtained from (3.42)

a_kj = Σ_{r=1}^{k-1} l_kr u_rj + u_kj,      j = k, . . . , n,

a_ik = Σ_{r=1}^{k-1} l_ir u_rk + l_ik u_kk,      i = k + 1, . . . , n.

Note that these equations can be solved in a sequential way with respect to the unknowns u_kj and l_ik. From the Doolittle compact method we thus obtain first the k-th row of U and then the k-th column of L, as follows: for k = 1, . . . , n

u_kj = a_kj - Σ_{r=1}^{k-1} l_kr u_rj,      j = k, . . . , n,
                                                                    (3.43)
l_ik = (1/u_kk) ( a_ik - Σ_{r=1}^{k-1} l_ir u_rk ),      i = k + 1, . . . , n.

The Crout factorization is generated similarly, computing first the k-th column of L and then the k-th row of U: for k = 1, . . . , n

l_ik = a_ik - Σ_{r=1}^{k-1} l_ir u_rk,      i = k, . . . , n,

u_kj = (1/l_kk) ( a_kj - Σ_{r=1}^{k-1} l_kr u_rj ),      j = k + 1, . . . , n,

where we set ukk = 1. Recalling the notations introduced above, the Doolittle factorization is nothing but the ijk version of GEM.
We provide in Program 6 the implementation of the Doolittle scheme. Notice that now the main computation is a dot product, so this scheme is also known as the DOT-ijk version of GEM.
Program 6 - lu_ijk : LU factorization of the matrix A: ijk version
function [A] = lu_ijk (A)
[n,n]=size(A);
for i=1:n
for j=2:i
A(i,j-1)=A(i,j-1)/A(j-1,j-1);
for k=1:j-1, A(i,j)=A(i,j)-A(i,k)*A(k,j); end
end
for j=i+1:n
for k=1:i-1, A(i,j)=A(i,j)-A(i,k)*A(k,j); end
end
end



3.4 Other Types of Factorization
We now address factorizations suitable for symmetric and rectangular ma-
trices.

3.4.1 LDM^T Factorization
It is possible to devise other types of factorizations of A by removing the hypothesis that the elements of L are equal to one. Specifically, we will address some variants where the factorization of A is of the form

A = LDM^T,

where L, M^T and D are lower triangular, upper triangular and diagonal matrices, respectively.
After the construction of this factorization, the resolution of the system can be carried out by solving first the lower triangular system Ly = b, then the diagonal one Dz = y, and finally the upper triangular system M^T x = z, with a cost of n^2 + n flops. In the symmetric case, we obtain M = L and the LDL^T factorization can be computed with half the cost (see Section 3.4.2).
The LDM^T factorization enjoys a property analogous to the one in Theorem 3.4 for the LU factorization. In particular, the following result holds.

Theorem 3.5 If all the principal minors of a matrix A ∈ R^{n×n} are nonzero, then there exist a unique diagonal matrix D, a unique unit lower triangular matrix L and a unique unit upper triangular matrix M^T, such that A = LDM^T.

Proof. By Theorem 3.4 we already know that there exists a unique LU factorization of A with lii = 1 for i = 1, . . . , n. If we set the diagonal entries of D equal to uii (nonzero because U is nonsingular), then A = LU = LD(D^{-1}U). Upon defining M^T = D^{-1}U, which is a unit upper triangular matrix, the existence of the LDM^T factorization follows. The uniqueness of the LDM^T factorization is a consequence of the uniqueness of the LU factorization. □

The above proof shows that, since the diagonal entries of D coincide with those of U, we could compute L, M^T and D starting from the LU factorization of A. It suffices to compute M^T as D^{-1}U. Nevertheless, this algorithm has the same cost as the standard LU factorization. Likewise, it is also possible to compute the three matrices of the factorization by enforcing the identity A = LDM^T entry by entry.
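The first of these two strategies can be sketched as follows (a minimal illustration, not one of the book's numbered programs; it reuses Program 4 and hypothetical data for which no zero pivots arise):

A = [4 3 2; 2 4 1; 1 2 5];       % sample matrix (hypothetical data)
B = lu_kji(A);                   % Program 4: LU factorization overwritten on a copy of A
L = tril(B,-1) + eye(size(A));   % unit lower triangular factor
U = triu(B);                     % upper triangular factor
D  = diag(diag(U));              % diagonal entries of D coincide with those of U
MT = D \ U;                      % M^T = D^{-1} U, unit upper triangular
norm(A - L*D*MT)                 % of the order of roundoff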

3.4.2 Symmetric and Positive Definite Matrices: The Cholesky Factorization

As already pointed out, the LDM^T factorization simplifies considerably when A is symmetric because in such a case M = L, yielding the so-called LDL^T factorization. The computational cost halves, with respect to the LU factorization, to about n^3/3 flops.

As an example, the Hilbert matrix of order 3 admits the following LDL^T factorization

     [ 1    1/2  1/3 ]   [  1   0  0 ] [ 1   0     0    ] [ 1  1/2  1/3 ]
H3 = [ 1/2  1/3  1/4 ] = [ 1/2  1  0 ] [ 0  1/12   0    ] [ 0   1    1  ].
     [ 1/3  1/4  1/5 ]   [ 1/3  1  1 ] [ 0   0   1/180  ] [ 0   0    1  ]
In the case that A is also positive definite, the diagonal entries of D in the LDL^T factorization are positive. Moreover, we have the following result.

Theorem 3.6 Let A ∈ R^{n×n} be a symmetric and positive definite matrix. Then, there exists a unique upper triangular matrix H with positive diagonal entries such that

A = H^T H.      (3.44)

This factorization is called the Cholesky factorization and the entries hij of H^T can be computed as follows: h11 = √a11 and, for i = 2, . . . , n,

h_ij = ( a_ij - Σ_{k=1}^{j-1} h_ik h_jk ) / h_jj,      j = 1, . . . , i - 1,
                                                                              (3.45)
h_ii = ( a_ii - Σ_{k=1}^{i-1} h_ik^2 )^{1/2}.



Proof. Let us prove the theorem proceeding by induction on the size i of the matrix (as done in Theorem 3.4), recalling that if Ai ∈ R^{i×i} is symmetric positive definite, then all its principal submatrices enjoy the same property.
For i = 1 the result is obviously true. Thus, suppose that it holds for i - 1 and prove that it also holds for i. There exists an upper triangular matrix H_{i-1} such that A_{i-1} = H_{i-1}^T H_{i-1}. Let us partition Ai as

Ai =  [ A_{i-1}  v ]
      [ v^T      α ],

with α ∈ R^+, v ∈ R^{i-1}, and look for a factorization of Ai of the form

Ai = H_i^T H_i =  [ H_{i-1}^T  0 ]  [ H_{i-1}  h ]
                  [ h^T        β ]  [ 0^T      β ].

Enforcing the equality with the entries of Ai yields the equations H_{i-1}^T h = v and h^T h + β^2 = α. The vector h is thus uniquely determined, since H_{i-1}^T is nonsingular. As for β, due to the properties of determinants,

0 < det(Ai) = det(H_i^T) det(H_i) = β^2 (det(H_{i-1}))^2,

we can conclude that it must be a real number. As a result, β = √(α - h^T h) is the desired diagonal entry and this concludes the inductive argument.
Let us now prove formulae (3.45). The fact that h11 = √a11 is an immediate consequence of the induction argument for i = 1. In the case of a generic i, relations (3.45)_1 are the forward substitution formulae for the solution of the linear system H_{i-1}^T h = v = (a_{1i}, a_{2i}, . . . , a_{i-1,i})^T, while formulae (3.45)_2 state that β = √(α - h^T h), where α = a_ii. □

The algorithm which implements (3.45) requires about n^3/3 flops and it turns out to be stable with respect to the propagation of rounding errors. It can indeed be shown that the computed upper triangular matrix H̃ is such that H̃^T H̃ = A + δA, where δA is a perturbation matrix such that ‖δA‖_2 ≤ 8n(n + 1)u ‖A‖_2, when the rounding errors are considered and assuming that 2n(n + 1)u ≤ 1 - (n + 1)u (see [Wil68]).
Also, for the Cholesky factorization it is possible to overwrite the matrix H^T in the lower triangular portion of A, without any further memory storage. By doing so, both A and the factorization are preserved, noting that A is stored in the upper triangular section since it is symmetric and that its diagonal entries can be computed as a11 = h11^2, a_ii = h_ii^2 + Σ_{k=1}^{i-1} h_ik^2, i = 2, . . . , n.
An example of implementation of the Cholesky factorization is coded in Program 7.
Program 7 - chol2 : Cholesky factorization
function [A] = chol2 (A)
[n,n]=size(A);

for k=1:n-1
A(k,k)=sqrt(A(k,k)); A(k+1:n,k)=A(k+1:n,k)/A(k,k);
for j=k+1:n, A(j:n,j)=A(j:n,j)-A(j:n,k)*A(j,k); end
end
A(n,n)=sqrt(A(n,n));
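As a quick sanity check (a minimal sketch with hypothetical data, not part of the book's programs), the factor produced by Program 7 can be compared with the one returned by the built-in function chol, which computes the upper triangular factor H:

A = [4 1 1; 1 3 0; 1 0 2];    % sample symmetric positive definite matrix (hypothetical data)
B = chol2(A);                 % Program 7: H^T overwritten in the lower triangle of A
HT = tril(B);                 % lower triangular Cholesky factor H^T
norm(A - HT*HT')              % of the order of roundoff
norm(HT' - chol(A))           % agrees with the built-in upper factor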



3.4.3 Rectangular Matrices: The QR Factorization
De¬nition 3.1 A matrix A ∈ Rm—n , with m ≥ n, admits a QR fac-
torization if there exist an orthogonal matrix Q ∈ Rm—m and an upper
trapezoidal matrix R ∈ Rm—n with null rows from the n + 1-th one on,
such that

A = QR. (3.46)



This factorization can be constructed either using suitable transformation
matrices (Givens or Householder matrices, see Section 5.6.1) or using the
Gram-Schmidt orthogonalization algorithm discussed below.
It is also possible to generate a reduced version of the QR factorization
(3.46), as stated in the following result.

Property 3.3 Let A ∈ R^{m×n} be a matrix of rank n for which a QR factorization is known. Then there exists a unique factorization of A of the form

A = Q̃R̃,      (3.47)

where Q̃ and R̃ are submatrices of Q and R given respectively by

Q̃ = Q(1 : m, 1 : n),      R̃ = R(1 : n, 1 : n).      (3.48)

Moreover, Q̃ has orthonormal column vectors and R̃ is upper triangular and coincides with the Cholesky factor H of the symmetric positive definite matrix A^T A, that is, A^T A = R̃^T R̃.
If A has rank n (i.e., full rank), then the column vectors of Q̃ form an orthonormal basis for the vector space range(A) (defined in (1.5)). As a consequence, constructing the QR factorization can also be interpreted as a procedure for generating an orthonormal basis for a given set of vectors. If A has rank r < n, the QR factorization does not necessarily yield an orthonormal basis for range(A). However, one can obtain a factorization of the form

Q^T A P =  [ R11  R12 ]
           [ 0    0   ],
FIGURE 3.1. The reduced factorization. The matrices of the QR factorization are drawn in dashed lines

where Q is orthogonal, P is a permutation matrix and R11 is a nonsingular
upper triangular matrix of order r.
In general, when using the QR factorization, we shall always refer to its
reduced form (3.47) as it finds a remarkable application in the solution of
overdetermined systems (see Section 3.13).

The matrix factors Q̃ and R̃ in (3.47) can be computed using the Gram-Schmidt orthogonalization. Starting from a set of linearly independent vectors, x1, . . . , xn, this algorithm generates a new set of mutually orthogonal vectors, q1, . . . , qn, given by

q1 = x1,

q_{k+1} = x_{k+1} - Σ_{i=1}^{k} ( (q_i, x_{k+1}) / (q_i, q_i) ) q_i,      k = 1, . . . , n - 1.      (3.49)

Denoting by a1, . . . , an the column vectors of A, we set q̃1 = a1/‖a1‖_2 and, for k = 1, . . . , n - 1, compute the column vectors of Q̃ as

q̃_{k+1} = q_{k+1}/‖q_{k+1}‖_2,

where

q_{k+1} = a_{k+1} - Σ_{j=1}^{k} (q̃_j, a_{k+1}) q̃_j.

Next, imposing that A = Q̃R̃ and exploiting the fact that Q̃ is orthogonal (that is, Q̃^{-1} = Q̃^T), the entries of R̃ can easily be computed. The overall computational cost of the algorithm is of the order of mn^2 flops.
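A direct transcription of this construction reads as follows (a minimal sketch of the standard, unmodified algorithm, not one of the book's numbered programs; A is assumed to have full rank):

function [Q,R] = grams (A)
% Standard Gram-Schmidt orthogonalization: A = Q*R, Q with orthonormal columns
[m,n] = size(A);
Q = zeros(m,n); R = zeros(n);
for k = 1:n
  v = A(1:m,k);
  for j = 1:k-1
    R(j,k) = Q(1:m,j)'*A(1:m,k);     % projection of a_k onto q_j (uses the original a_k)
    v = v - R(j,k)*Q(1:m,j);
  end
  R(k,k) = norm(v); Q(1:m,k) = v/R(k,k);
end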
It is also worth noting that if A has full rank, the matrix A^T A is symmetric and positive definite (see Section 1.9) and thus it admits a unique Cholesky factorization of the form H^T H. On the other hand, since the orthogonality of Q̃ implies

H^T H = A^T A = R̃^T Q̃^T Q̃ R̃ = R̃^T R̃,

we conclude that R̃ is actually the Cholesky factor H of A^T A. Thus, the diagonal entries of R̃ are all nonzero only if A has full rank.
The Gram-Schmidt method is of little practical use since the generated vectors lose their linear independence due to rounding errors. Indeed, in floating-point arithmetic the algorithm produces very small values of ‖q_{k+1}‖_2 and r_kk with a consequent numerical instability and loss of orthogonality for the matrix Q̃ (see Example 3.4).
These drawbacks suggest employing a more stable version, known as the modified Gram-Schmidt method. At the beginning of the k + 1-th step, the projections of the vector a_{k+1} along the vectors q̃1, . . . , q̃k are progressively subtracted from a_{k+1}. On the resulting vector, the orthogonalization step is then carried out. In practice, after computing (q̃1, a_{k+1}) q̃1 at the k + 1-th step, this vector is immediately subtracted from a_{k+1}. As an example, one lets

a_{k+1}^{(1)} = a_{k+1} - (q̃1, a_{k+1}) q̃1.

This new vector a_{k+1}^{(1)} is projected along the direction of q̃2 and the obtained projection is subtracted from a_{k+1}^{(1)}, yielding

a_{k+1}^{(2)} = a_{k+1}^{(1)} - (q̃2, a_{k+1}^{(1)}) q̃2,

and so on, until a_{k+1}^{(k)} is computed.
It can be checked that a_{k+1}^{(k)} coincides with the corresponding vector q_{k+1} in the standard Gram-Schmidt process since, due to the orthogonality of the vectors q̃1, q̃2, . . . , q̃k,

a_{k+1}^{(k)} = a_{k+1} - (q̃1, a_{k+1}) q̃1 - (q̃2, a_{k+1} - (q̃1, a_{k+1}) q̃1) q̃2 - . . .
             = a_{k+1} - Σ_{j=1}^{k} (q̃j, a_{k+1}) q̃j.

Program 8 implements the modified Gram-Schmidt method. Notice that it is not possible to overwrite the computed QR factorization on the matrix A. In general, the matrix R̃ is overwritten on A, whilst Q̃ is stored separately. The computational cost of the modified Gram-Schmidt method is of the order of 2mn^2 flops.
Program 8 - mod_grams : Modified Gram-Schmidt method
function [Q,R] = mod_grams(A)
[m,n]=size(A);
Q=zeros(m,n); Q(1:m,1) = A(1:m,1); R=zeros(n); R(1,1)=1;
for k = 1:n
R(k,k) = norm (A(1:m,k)); Q(1:m,k) = A(1:m,k)/R(k,k);

for j=k+1:n
R(k,j) = Q(1:m,k)' * A(1:m,j);
A (1:m,j) = A (1:m,j) - Q(1:m,k)*R(k,j);
end
end

Example 3.4 Let us consider the Hilbert matrix H4 of order 4 (see (3.32)). The matrix Q̃, generated by the standard Gram-Schmidt algorithm, is orthogonal up to the order of 10^{-10}, being

I - Q̃^T Q̃ = 10^{-10} [  0.0000  -0.0000   0.0001  -0.0041
                      -0.0000   0         0.0004  -0.0099
                       0.0001   0.0004    0       -0.4785
                      -0.0041  -0.0099   -0.4785   0       ]

and ‖I - Q̃^T Q̃‖_∞ = 4.9247 · 10^{-11}. Using the modified Gram-Schmidt method, we would obtain

I - Q̃^T Q̃ = 10^{-12} [  0.0001  -0.0005   0.0069  -0.2853
                      -0.0005   0        -0.0023   0.0213
                       0.0069  -0.0023    0.0002  -0.0103
                      -0.2853   0.0213   -0.0103   0       ]

and this time ‖I - Q̃^T Q̃‖_∞ = 3.1686 · 10^{-13}.
An improved result can be obtained using, instead of Program 8, the intrinsic function qr of MATLAB. This function can be properly employed to generate both the factorization (3.46) as well as its reduced version (3.47).
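For instance (a minimal usage sketch on a hypothetical rectangular matrix built from the Hilbert matrix; the two-argument call qr(A,0) is the economy-size form yielding the reduced factorization):

A = hilb(6); A = A(:,1:4);    % hypothetical 6-by-4 full-rank matrix
[Q,R]   = qr(A);              % full factorization (3.46): Q is 6-by-6, R is 6-by-4
[Qt,Rt] = qr(A,0);            % reduced factorization (3.47): Qt is 6-by-4, Rt is 4-by-4
norm(A - Qt*Rt)               % of the order of roundoff
norm(eye(4) - Qt'*Qt)         % columns of Qt are orthonormal up to roundoff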



3.5 Pivoting
As previously pointed out, the GEM process breaks down as soon as a zero
pivotal entry is computed. In such an event, one needs to resort to the so-
called pivoting technique, which amounts to exchanging rows (or columns)
of the system in such a way that non vanishing pivots are obtained.

Example 3.5 Let us go back to matrix (3.33) for which GEM furnishes at the second step a zero pivotal element. By simply exchanging the second row with the third one, we can execute one step further of the elimination method, finding a nonzero pivot. The generated system is equivalent to the original one and it can be noticed that it is already in upper triangular form. Indeed

       [ 1   2    3  ]
A(2) = [ 0  -6  -12  ] = U,
       [ 0   0   -1  ]

while the transformation matrices are given by

      [  1  0  0 ]         [ 1  0  0 ]
M1 =  [ -2  1  0 ],   M2 = [ 0  1  0 ].
      [ -7  0  1 ]         [ 0  0  1 ]
From an algebraic standpoint, a permutation of the rows of A has been performed. In fact, it now no longer holds that A = M1^{-1} M2^{-1} U, but rather A = M1^{-1} P M2^{-1} U, P being the permutation matrix

    [ 1  0  0 ]
P = [ 0  0  1 ].      (3.50)
    [ 0  1  0 ]



The pivoting strategy adopted in Example 3.5 can be generalized by looking, at each step k of the elimination procedure, for a nonzero pivotal entry by searching within the entries of the subcolumn A^{(k)}(k : n, k). For that reason, it is called partial pivoting (by rows).
From (3.30) it can be seen that a large value of m_ik (generated for example by a small value of the pivot a_kk^{(k)}) might amplify the rounding errors affecting the entries a_kj^{(k)}. Therefore, in order to ensure a better stability, the pivotal element is chosen as the largest entry (in module) of the column A^{(k)}(k : n, k), and partial pivoting is generally performed at every step of the elimination procedure, even if not strictly necessary (that is, even if nonzero pivotal entries are found).
Alternatively, the searching process could have been extended to the whole submatrix A^{(k)}(k : n, k : n), ending up with a complete pivoting (see Figure 3.2). Notice, however, that while partial pivoting requires an additional cost of about n^2 searches, complete pivoting needs about 2n^3/3, with a considerable increase of the computational cost of GEM.

FIGURE 3.2. Partial pivoting by row (left) or complete pivoting (right). Shaded areas of the matrix are those involved in the searching for the pivotal entry



Example 3.6 Let us consider the linear system Ax = b with

A = [ 10^{-13}  1 ]
    [ 1         1 ]

and where b is chosen in such a way that x = (1, 1)^T is the exact solution. Suppose we use base 2 and 16 significant digits. GEM without pivoting would give x_MEG = (0.99920072216264, 1)^T, while GEM plus partial pivoting furnishes the exact solution up to the 16th digit.

Let us analyze how partial pivoting affects the LU factorization induced by GEM. At the first step of GEM with partial pivoting, after finding the entry a_{r1} of maximum module in the first column, the elementary permutation matrix P1 which exchanges the first row with the r-th row is constructed (if r = 1, P1 is the identity matrix). Next, the first Gaussian transformation matrix M1 is generated and we set A^{(2)} = M1 P1 A^{(1)}. A similar approach is now taken on A^{(2)}, searching for a new permutation matrix P2 and a new matrix M2 such that

A^{(3)} = M2 P2 A^{(2)} = M2 P2 M1 P1 A^{(1)}.

Executing all the elimination steps, the resulting upper triangular matrix U is now given by

U = A^{(n)} = M_{n-1} P_{n-1} · · · M1 P1 A^{(1)}.      (3.51)

Letting M = M_{n-1} P_{n-1} · · · M1 P1 and P = P_{n-1} · · · P1, we obtain that U = MA and, thus, U = (MP^{-1}) PA. It can easily be checked that the matrix L = P M^{-1} is unit lower triangular, so that the LU factorization reads

PA = LU.      (3.52)

One should not be worried by the presence of the inverse of M, since M^{-1} = P_1^{-1} M_1^{-1} · · · P_{n-1}^{-1} M_{n-1}^{-1}, and P_i^{-1} = P_i^T while M_i^{-1} = 2 In - M_i.
Once L, U and P are available, solving the initial linear system amounts to solving the triangular systems Ly = Pb and Ux = y. Notice that the entries of the matrix L coincide with the multipliers computed by LU factorization, without pivoting, when applied to the matrix PA.
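In MATLAB the whole process can be carried out as in the following minimal sketch (hypothetical data; the built-in function lu performs the factorization (3.52) with partial pivoting by rows):

A = [1 2 3; 2 4 5; 7 8 9];    % sample nonsingular matrix (hypothetical data)
b = A*ones(3,1);              % right-hand side chosen so that x = (1,1,1)'
[L,U,P] = lu(A);              % PA = LU
y = L \ (P*b);                % forward substitution on Ly = Pb
x = U \ y;                    % backward substitution on Ux = y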
If complete pivoting is performed, at the first step of the process, once the element a_qr of largest module in the submatrix A(1 : n, 1 : n) has been found, we must exchange the first row and column with the q-th row and the r-th column. This generates the matrix P1 A^{(1)} Q1, where P1 and Q1 are permutation matrices by rows and by columns, respectively.
As a consequence, the action of matrix M1 is now such that A^{(2)} = M1 P1 A^{(1)} Q1. Repeating the process, at the last step, instead of (3.51) we obtain

U = A^{(n)} = M_{n-1} P_{n-1} · · · M1 P1 A^{(1)} Q1 · · · Q_{n-1}.

In the case of complete pivoting the LU factorization becomes

PAQ = LU,
where Q = Q1 · · · Q_{n-1} is a permutation matrix accounting for all the permutations that have been operated. By construction, matrix L is still lower triangular, with module entries less than or equal to 1. As happens in partial pivoting, the entries of L are the multipliers produced by the LU factorization process without pivoting, when applied to the matrix PAQ.
Program 9 is an implementation of the LU factorization with complete
pivoting. For an efficient computer implementation of the LU factorization
with partial pivoting, we refer to the MATLAB intrinsic function lu.

Program 9 - LUpivtot : LU factorization with complete pivoting
function [L,U,P,Q] = LUpivtot(A,n)
P=eye(n); Q=P; Minv=P;
for k=1:n-1
[Pk,Qk]=pivot(A,k,n); A=Pk*A*Qk;
[Mk,Mkinv]=MGauss(A,k,n);
A=Mk*A; P=Pk*P; Q=Q*Qk;
Minv=Minv*Pk*Mkinv;
end
U=triu(A); L=P*Minv;

function [Mk,Mkinv]=MGauss(A,k,n)
Mk=eye(n);
for i=k+1:n, Mk(i,k)=-A(i,k)/A(k,k); end
Mkinv=2*eye(n)-Mk;

function [Pk,Qk]=pivot(A,k,n)
[y,i]=max(abs(A(k:n,k:n))); [piv,jpiv]=max(y);
ipiv=i(jpiv); jpiv=jpiv+k-1; ipiv=ipiv+k-1;
Pk=eye(n); Pk(ipiv,ipiv)=0; Pk(k,k)=0; Pk(k,ipiv)=1; Pk(ipiv,k)=1;
Qk=eye(n); Qk(jpiv,jpiv)=0; Qk(k,k)=0; Qk(k,jpiv)=1; Qk(jpiv,k)=1;



Remark 3.3 The presence of large pivotal entries is not in itself sufficient to guarantee accurate solutions, as demonstrated by the following example (taken from [JM92]). For the linear system Ax = b

[ -4000  2000     2000 ] [ x1 ]   [ 400    ]
[  2000  0.78125  0    ] [ x2 ] = [ 1.3816 ]
[  2000  0        0    ] [ x3 ]   [ 1.9273 ]

at the first step the pivotal entry coincides with the diagonal entry -4000 itself. However, executing GEM on such a matrix yields the computed solution

x̂ = [0.00096365, -0.698496, 0.90042329]^T,

whose first component drastically differs from that of the exact solution x = [1.9273, -0.698496, 0.9004233]^T. The cause of this behaviour should be ascribed to the wide variations among the system coefficients. Such cases can be remedied by a suitable scaling of the matrix (see Section 3.12.1).

Remark 3.4 (Pivoting for symmetric matrices) As already noticed, pivoting is not strictly necessary if A is symmetric and positive definite. A separate comment is deserved when A is symmetric but not positive definite, since pivoting could destroy the symmetry of the matrix. This can be avoided by employing a complete pivoting of the form PAP^T, even though this pivoting can only amount to a reordering of the diagonal entries of A. As a consequence, the presence on the diagonal of A of small entries might inhibit the advantages of the pivoting. To deal with matrices of this kind, special algorithms are needed (like the Parlett-Reid method [PR70] or the Aasen method [Aas71]), for whose description we refer to [GL89], and to [JM92] for the case of sparse matrices.



3.6 Computing the Inverse of a Matrix

The explicit computation of the inverse of a matrix can be carried out using the LU factorization as follows. Denoting by X the inverse of a nonsingular matrix A ∈ R^{n×n}, the column vectors of X are the solutions to the linear systems Axi = ei, for i = 1, . . . , n.
Supposing that PA = LU, where P is the partial pivoting permutation matrix, we must solve 2n triangular systems of the form

Lyi = Pei,   Uxi = yi,   i = 1, . . . , n,

i.e., a succession of linear systems having the same coefficient matrix but different right-hand sides. The computation of the inverse of a matrix is a costly procedure which can sometimes be even less stable than GEM (see [Hig88]).
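A minimal sketch of this column-by-column procedure (hypothetical data; the built-in lu provides the pivoted factorization):

A = [4 3 2; 2 4 1; 1 2 5];      % sample nonsingular matrix (hypothetical data)
n = size(A,1);
[L,U,P] = lu(A);                % PA = LU with partial pivoting
X = zeros(n);
for i = 1:n
  ei = zeros(n,1); ei(i) = 1;   % i-th column of the identity matrix
  y = L \ (P*ei);               % Ly_i = Pe_i
  X(:,i) = U \ y;               % Ux_i = y_i
end
% X approximates inv(A)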

An alternative approach for computing the inverse of A is provided by the Faddeev or Leverrier formula, which, letting B0 = I, recursively computes

α_k = (1/k) tr(A B_{k-1}),   B_k = -A B_{k-1} + α_k I,   k = 1, 2, . . . , n.

Since Bn = 0, if α_n ≠ 0 we get

A^{-1} = (1/α_n) B_{n-1},

and the computational cost of the method for a full matrix is equal to (n - 1) n^3 flops (for further details see [FF63], [Bar89]).
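A direct transcription of the recursion (a minimal sketch with hypothetical data, assuming α_n ≠ 0):

A = [4 3 2; 2 4 1; 1 2 5];        % sample nonsingular matrix (hypothetical data)
n = size(A,1);
B = eye(n);                       % B_0 = I
for k = 1:n
  alpha = trace(A*B)/k;           % alpha_k = tr(A B_{k-1})/k
  Blast = B;                      % keep B_{n-1} for the final step
  B = -A*B + alpha*eye(n);        % B_k = -A B_{k-1} + alpha_k I
end
Ainv = Blast/alpha;               % A^{-1} = B_{n-1}/alpha_n, since B_n = 0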

3.7 Banded Systems
Discretization methods for boundary value problems often lead to solving
linear systems with matrices having banded, block or sparse forms. Ex-
ploiting the structure of the matrix allows for a dramatic reduction in the
computational costs of the factorization and of the substitution algorithms.
In the present and forthcoming sections, we shall address special variants of GEM or of the LU factorization that are properly devised for dealing with matrices of this kind. For the proofs and a more comprehensive treatment, we refer to [GL89] and [Hig88] for banded or block matrices, while we refer to [JM92], [GL81] and [Saa96] for sparse matrices and the techniques for their storage.
The main result for banded matrices is the following.

Property 3.4 Let A ∈ R^{n×n}. Suppose that there exists an LU factorization of A. If A has upper bandwidth q and lower bandwidth p, then L has lower bandwidth p and U has upper bandwidth q.



In particular, notice that the same memory area used for A is enough to also store its LU factorization. Consider, indeed, that a matrix A having upper bandwidth q and lower bandwidth p is usually stored in a (p + q + 1) × n matrix B, assuming that

b_{i-j+q+1,j} = a_ij

for all the indices i, j that fall into the band of the matrix. For instance, in the case of the tridiagonal matrix A = tridiag_5(-1, 2, -1) (where q = p = 1), the compact storage reads

    [  0  -1  -1  -1  -1 ]
B = [  2   2   2   2   2 ].
    [ -1  -1  -1  -1   0 ]

The same format can be used for storing the factorization LU of A. It is
clear that this storage format can be quite inconvenient in the case where
only a few bands of the matrix are large. In the limit, if only one column
and one row were full, we would have p = q = n and thus B would be a
full matrix with a lot of zero entries.
Finally, we notice that the inverse of a banded matrix is generally full
(as happens for the matrix A considered above).
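The index mapping above can be realized, for instance, as follows (a minimal sketch; band_store is a hypothetical helper, not one of the book's numbered programs):

function [B] = band_store (A, p, q)
% Compact band storage: B(i-j+q+1,j) = A(i,j) for the entries inside the band
[n,n] = size(A);
B = zeros(p+q+1, n);
for j = 1:n
  for i = max(1,j-q):min(n,j+p)
    B(i-j+q+1,j) = A(i,j);
  end
end

Applied with p = q = 1 to the matrix tridiag_5(-1, 2, -1), it reproduces the 3 × 5 array B shown above.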

3.7.1 Tridiagonal Matrices

Consider the particular case of a linear system with nonsingular tridiagonal matrix A given by

    [ a1   c1              0    ]
    [ b2   a2   .               ]
A = [        .    .    c_{n-1}  ].
    [ 0         b_n     a_n     ]

In such an event, the matrices L and U of the LU factorization of A are bidiagonal matrices of the form

    [ 1               0 ]        [ α1   c1              0    ]
    [ β2   1            ]        [      α2   .               ]
L = [        .    .     ],   U = [            .   c_{n-1}    ].
    [ 0        β_n   1  ]        [ 0              α_n        ]

The coefficients αi and βi can easily be computed by the following relations

α1 = a1,   β_i = b_i/α_{i-1},   α_i = a_i - β_i c_{i-1},   i = 2, . . . , n.      (3.53)

This is known as the Thomas algorithm and can be regarded as a particular instance of the Doolittle factorization, without pivoting. When one is not interested in storing the coefficients of the original matrix, the entries αi and βi can be overwritten on A.
The Thomas algorithm can also be extended to solve the whole tridiagonal system Ax = f. This amounts to solving two bidiagonal systems Ly = f and Ux = y, for which the following formulae hold:

(Ly = f)   y1 = f1,   y_i = f_i - β_i y_{i-1},   i = 2, . . . , n,      (3.54)

(Ux = y)   x_n = y_n/α_n,   x_i = (y_i - c_i x_{i+1})/α_i,   i = n - 1, . . . , 1.      (3.55)

The algorithm requires only 8n - 7 flops: precisely, 3(n - 1) flops for the factorization (3.53) and 5n - 4 flops for the substitution procedure (3.54)-(3.55).
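A direct MATLAB transcription of (3.53)-(3.55) might look as follows (a minimal sketch, not one of the book's numbered programs; a, b, c and f are column vectors holding {a_i}, {b_i}, {c_i} and the right-hand side, with b(1) and c(n) unused):

function [x] = thomas (a, b, c, f)
% Thomas algorithm: factorization (3.53) followed by the solves (3.54)-(3.55)
n = length(a);
alpha = zeros(n,1); beta = zeros(n,1); y = zeros(n,1); x = zeros(n,1);
alpha(1) = a(1);
for i = 2:n
  beta(i)  = b(i)/alpha(i-1);                               % (3.53)
  alpha(i) = a(i) - beta(i)*c(i-1);
end
y(1) = f(1);
for i = 2:n, y(i) = f(i) - beta(i)*y(i-1); end              % forward substitution (3.54)
x(n) = y(n)/alpha(n);
for i = n-1:-1:1, x(i) = (y(i) - c(i)*x(i+1))/alpha(i); end % backward substitution (3.55)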
As for the stability of the method, if A is a nonsingular tridiagonal matrix and L and U are the factors actually computed, then

|δA| ≤ (4u + 3u^2 + u^3) |L| |U|,

where δA is implicitly defined by the relation A + δA = LU, while u is the roundoff unit. In particular, if A is also symmetric and positive definite or it is an M-matrix, we have

|δA| ≤ ( (4u + 3u^2 + u^3)/(1 - u) ) |A|,

which implies the stability of the factorization procedure in such cases. A similar result holds even if A is diagonally dominant.


3.7.2 Implementation Issues
An implementation of the LU factorization for banded matrices is shown
in Program 10.

Program 10 - lu_band : LU factorization for a banded matrix
function [A] = lu_band (A,p,q)
[n,n]=size(A);
for k = 1:n-1
for i = k+1:min(k+p,n), A(i,k)=A(i,k)/A(k,k); end
for j = k+1:min(k+q,n)
for i = k+1:min(k+p,n), A(i,j)=A(i,j)-A(i,k)*A(k,j); end
end
end
In the case where n ≫ p and n ≫ q, this algorithm approximately takes 2npq flops, with a considerable saving with respect to the case in which A is a full matrix.
Similarly, ad hoc versions of the substitution methods can be devised (see Programs 11 and 12). Their costs are, respectively, of the order of 2np flops and 2nq flops, always assuming that n ≫ p and n ≫ q.

Program 11 - forw_band : Forward substitution for a banded matrix L
function [b] = forw_band (L, p, b)
[n,n]=size(L);
for j = 1:n
for i=j+1:min(j+p,n); b(i) = b(i) - L(i,j)*b(j); end
end


Program 12 - back_band : Backward substitution for a banded matrix U
function [b] = back_band (U, q, b)
[n,n]=size(U);
for j=n:-1:1
b (j) = b (j) / U (j,j);
for i = max(1,j-q):j-1, b(i)=b(i)-U(i,j)*b(j); end
end

The programs assume that the whole matrix is stored (including also the
zero entries).
Concerning the tridiagonal case, the Thomas algorithm can be imple-
mented in several ways. In particular, when implementing it on computers
where divisions are more costly than multiplications, it is possible (and
convenient) to devise a version of the algorithm without divisions in (3.54)
and (3.55), by resorting to the following form of the factorization
A = LDM^T =

[ γ1^{-1}               0     ] [ γ1          0  ] [ γ1^{-1}   c1               0    ]
[ b2     γ2^{-1}              ] [     γ2         ] [       γ2^{-1}    .              ]
[           .      .          ] [         .      ] [              .     c_{n-1}      ]
[ 0         b_n    γ_n^{-1}   ] [ 0         γ_n  ] [ 0                 γ_n^{-1}      ]

The coefficients γi can be recursively computed by the formulae

γ_i = (a_i - b_i γ_{i-1} c_{i-1})^{-1},   for i = 1, . . . , n,

where γ0 = 0, b1 = 0 and cn = 0 have been assumed. The forward and backward substitution algorithms respectively read

(Ly = f)   y1 = γ1 f1,   y_i = γ_i (f_i - b_i y_{i-1}),   i = 2, . . . , n,
                                                                                  (3.56)
(Ux = y)   x_n = y_n,   x_i = y_i - γ_i c_i x_{i+1},   i = n - 1, . . . , 1.

In Program 13 we show an implementation of the Thomas algorithm in the form (3.56), without divisions. The input vectors a, b and c contain the coefficients of the tridiagonal matrix {a_i}, {b_i} and {c_i}, respectively, while the vector f contains the components f_i of the right-hand side f.

Program 13 - mod_thomas : Thomas algorithm, modified version
function [x] = mod_thomas (a,b,c,f)
n = length(a); b = [0; b]; c = [c; 0]; gamma(1) = 1/a(1);
for i =2:n, gamma(i)=1/(a(i)-b(i)*gamma(i-1)*c(i-1)); end
y (1) = gamma (1) * f (1);
for i = 2:n, y(i)=gamma(i)*(f(i)-b(i)*y(i-1)); end
x (n) = y (n);
for i = n-1:-1:1, x(i)=y(i)-gamma(i)*c(i)*x(i+1); end



3.8 Block Systems
In this section we deal with the LU factorization of block-partitioned matri-
ces, where each block can possibly be of a different size. Our aim is twofold:
optimizing the storage occupation by suitably exploiting the structure of
the matrix and reducing the computational cost of the solution of the sys-
tem.

3.8.1 Block LU Factorization

Let A ∈ R^{n×n} be the following block partitioned matrix

A = [ A11  A12 ]
    [ A21  A22 ],

where A11 ∈ R^{r×r} is a nonsingular square matrix whose factorization L11 D1 R11 is known, while A22 ∈ R^{(n-r)×(n-r)}. In such a case it is possible to factorize A using only the LU factorization of the block A11. Indeed, it is true that

[ A11  A12 ]   [ L11   0       ] [ D1  0  ] [ R11  R12     ]
[ A21  A22 ] = [ L21   I_{n-r} ] [ 0   Δ2 ] [ 0    I_{n-r} ],

where

L21 = A21 R11^{-1} D1^{-1},   R12 = D1^{-1} L11^{-1} A12,   Δ2 = A22 - L21 D1 R12.

If necessary, the reduction procedure can be repeated on the matrix Δ2, thus obtaining a block-version of the LU factorization.
If A11 were a scalar, the above approach would reduce by one the size of the factorization of a given matrix. Applying this method iteratively yields an alternative way of performing the Gauss elimination.
We also notice that the proof of Theorem 3.4 can be extended to the case of block matrices, obtaining the following result.

Theorem 3.7 Let A ∈ R^{n×n} be partitioned in m × m blocks Aij with i, j = 1, . . . , m. A admits a unique LU block factorization (with L having unit diagonal entries) iff the m - 1 leading principal block minors of A are nonzero.

Since the block factorization is an equivalent formulation of the standard LU factorization of A, the stability analysis carried out for the latter holds for its block-version as well. Improved results concerning the efficient use in block algorithms of fast forms of matrix-matrix product are dealt with in [Hig88]. In the forthcoming section we focus solely on block-tridiagonal matrices.


3.8.2 Inverse of a Block-partitioned Matrix
The inverse of a block matrix can be constructed using the LU factorization
introduced in the previous section. A remarkable application is when A is
a block matrix of the form

A = C + UBV,

where C is a block matrix that is “easy” to invert (for instance, when C
is given by the diagonal blocks of A), while U, B and V take into account
the connections between the diagonal blocks. In such an event A can be
inverted by using the Sherman-Morrison or Woodbury formula

A^{-1} = (C + UBV)^{-1} = C^{-1} - C^{-1} U (I + BVC^{-1}U)^{-1} BVC^{-1},      (3.57)

having assumed that C and I + BVC^{-1}U are two nonsingular matrices. This formula has several practical and theoretical applications, and is particularly effective if the connections between blocks are of modest relevance.
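A minimal numerical sketch of (3.57) (hypothetical small matrices; in practice one would apply C^{-1} blockwise rather than form it explicitly):

n = 4; k = 2;
C = diag([4 3 5 2]);                        % block that is "easy" to invert (here diagonal)
U = rand(n,k); B = eye(k); V = rand(k,n);   % connection terms
A = C + U*B*V;
Ci = inv(C);
Ainv = Ci - Ci*U*((eye(k) + B*V*Ci*U)\(B*V*Ci));   % formula (3.57)
norm(Ainv - inv(A))                         % of the order of roundoff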



3.8.3 Block Tridiagonal Systems

Consider block tridiagonal systems of the form

        [ A11   A12                      0     ] [ x1 ]   [ b1 ]
        [ A21   A22      .                     ] [ .  ]   [ .  ]
An x =  [          .        .     A_{n-1,n}    ] [ .  ] = [ .  ],      (3.58)
        [ 0          A_{n,n-1}    A_{nn}       ] [ xn ]   [ bn ]

where Aij are matrices of order ni × nj and xi and bi are column vectors of size ni, for i, j = 1, . . . , n. We assume that the diagonal blocks are square, although not necessarily of the same size. For k = 1, . . . , n, set

      [ I_{n1}                     0      ] [ U1   A12                    0  ]
      [ L1      I_{n2}                    ] [      U2      .                 ]
Ak =  [            .        .             ] [              .     A_{k-1,k}  ].
      [ 0            L_{k-1}    I_{nk}    ] [ 0                   Uk        ]

Equating for k = n the matrix above with the corresponding blocks of An, it turns out that U1 = A11, while the remaining blocks can be obtained solving sequentially, for i = 2, . . . , n, the systems L_{i-1} U_{i-1} = A_{i,i-1} for the columns of L and computing Ui = Aii - L_{i-1} A_{i-1,i}.
This procedure is well defined only if all the matrices Ui are nonsingular, which is the case if, for instance, the matrices A1, . . . , An are nonsingular.
As an alternative, one could resort to factorization methods for banded matrices, even if this requires the storage of a large number of zero entries (unless a suitable reordering of the rows of the matrix is performed).
A remarkable instance is when the matrix is block tridiagonal and symmetric, with symmetric and positive definite blocks. In such a case (3.58) takes the form

[ A11    A21^T                         0    ] [ x1 ]   [ b1 ]
[ A21    A22      .                         ] [ .  ]   [ .  ]
[           .        .      A_{n,n-1}^T     ] [ .  ] = [ .  ].
[ 0          A_{n,n-1}      A_{nn}          ] [ xn ]   [ bn ]

Here we consider an extension to the block case of the Thomas algorithm, which aims at transforming A into a block bidiagonal matrix. To this purpose, we first have to eliminate the block corresponding to matrix A21. Assume that the Cholesky factorization of A11 is available and denote by H11 the Cholesky factor. If we multiply the first row of the block system by H11^{-T}, we find

H11 x1 + H11^{-T} A21^T x2 = H11^{-T} b1.

Letting H21 = H11^{-T} A21^T and c1 = H11^{-T} b1, it follows that A21 = H21^T H11 and thus the first two rows of the system are

H11 x1 + H21 x2 = c1,
H21^T H11 x1 + A22 x2 + A32^T x3 = b2.

As a consequence, multiplying the first row by H21^T and subtracting it from the second one, the unknown x1 is eliminated and the following equivalent equation is obtained

A22^{(1)} x2 + A32^T x3 = b2 - H21^T c1,

with A22^{(1)} = A22 - H21^T H21. At this point, the factorization of A22^{(1)} is carried out and the unknown x2 is eliminated from the third row of the system, and the same is repeated for the remaining rows of the system. At the end of the procedure, which requires solving Σ_{j=1}^{n-1} n_j linear systems to compute the matrices H_{i+1,i}, i = 1, . . . , n - 1, we end up with the following block bidiagonal system

[ H11   H21                   0      ] [ x1 ]   [ c1 ]
[       H22      .                   ] [ .  ]   [ .  ]
[                .     H_{n,n-1}     ] [ .  ] = [ .  ],
[ 0                    H_{nn}        ] [ xn ]   [ cn ]

which can be solved with a (block) back substitution method. If all the blocks have the same size p, then the number of multiplications required by the algorithm is about (7/6)(n - 1)p^3 flops (assuming both p and n very large).

3.9 Sparse Matrices
In this section we briefly address the numerical solution of linear sparse systems, that is, systems where the matrix A ∈ R^{n×n} has a number of nonzero entries of the order of n (and not n^2). We call the pattern of a sparse matrix the set of its nonzero coefficients.
Banded matrices with sufficiently small bands are sparse matrices. Obviously, for a sparse matrix the matrix structure itself is redundant and it can be more conveniently substituted by a vector-like structure by means of matrix compacting techniques, like the banded matrix format discussed in Section 3.7.

FIGURE 3.3. Pattern of a symmetric sparse matrix (left) and of its associated graph (right). For the sake of clarity, the loops have not been drawn; moreover, since the matrix is symmetric, only one of the two sides associated with each aij ≠ 0 has been reported

For sake of convenience, we associate with a sparse matrix A an oriented graph G(A). A graph is a pair (V, X) where V is a set of p points and X is a set of q ordered pairs of elements of V that are linked by a line. The elements of V are called the vertices of the graph, while the connection lines are called the paths of the graph.
The graph G(A) associated with a matrix A ∈ R^{m×n} can be constructed by identifying the vertices with the set of the indices from 1 to the maximum between m and n and supposing that a path exists which connects two vertices i and j if aij ≠ 0 and is directed from i to j, for i = 1, . . . , m and j = 1, . . . , n. For a diagonal entry aii ≠ 0, the path joining the vertex i with itself is called a loop. Since an orientation is associated with each side, the graph is called oriented (or finite directed). As an example, Figure 3.3 displays the pattern of a symmetric and sparse 12 × 12 matrix, together