
Create an h1 × h1 matrix N1, with N1[i, j] = N[i, j] + N[i, h1 + j]
Recursively compute R1 = M1 · N1
Add R1 to the upper left quadrant of R, i.e., R[i, j] += R1[i, j]
Create an h1 × h1 matrix M2, with M2[i, j] = M[i, h1 + j] + M[h1 + i, h1 + j]
Create an h1 × h1 matrix N2, with N2[i, j] = N[h1 + i, j] + N[h1 + i, h1 + j]
Recursively compute R2 = M2 · N2
Add R2 to the lower right quadrant of R, i.e., R[h1 + i, h1 + j] += R2[i, j]
Create an h1 × h1 matrix M3, with M3[i, j] = M[i, h1 + j] + M[h1 + i, j]
Create an h1 × h1 matrix N3, with N3[i, j] = N[h1 + i, j] − N[i, h1 + j]
Recursively compute R3 = M3 · N3
Add R3 to the upper left quadrant of R, i.e., R[i, j] += R3[i, j]
Subtract R3 from the lower right quadrant of R, i.e., R[h1 + i, h1 + j] −= R3[i, j]
Create an h1 × h1 matrix M4, with M4[i, j] = M[h1 + i, j]
Create an h1 × h1 matrix N4, with N4[i, j] = N[i, j] + N[h1 + i, j]
Recursively compute R4 = M4 · N4
Subtract R4 from the upper left quadrant of R, i.e., R[i, j] −= R4[i, j]
Add R4 to the lower left quadrant of R, i.e., R[h1 + i, j] += R4[i, j]
Create an h1 × h1 matrix M5, with M5[i, j] = M[i, h1 + j]
Create an h1 × h1 matrix N5, with N5[i, j] = N[i, h1 + j] + N[h1 + i, h1 + j]
Recursively compute R5 = M5 · N5
Add R5 to the upper right quadrant of R, i.e., R[i, h1 + j] += R5[i, j]
Subtract R5 from the lower right quadrant of R, i.e., R[h1 + i, h1 + j] −= R5[i, j]
Create an h1 × h1 matrix M6, with M6[i, j] = M[h1 + i, j] − M[h1 + i, h1 + j]
Create an h1 × h1 matrix N6, with N6[i, j] = N[h1 + i, j]
Recursively compute R6 = M6 · N6
Add R6 to the lower right quadrant of R, i.e., R[h1 + i, h1 + j] += R6[i, j]
Subtract R6 from the lower left quadrant of R, i.e., R[h1 + i, j] −= R6[i, j]
Create an h1 × h1 matrix M7, with M7[i, j] = M[i, j] − M[i, h1 + j]
Create an h1 × h1 matrix N7, with N7[i, j] = N[i, h1 + j]
Recursively compute R7 = M7 · N7
Add R7 to the upper right quadrant of R, i.e., R[i, h1 + j] += R7[i, j]
Subtract R7 from the upper left quadrant of R, i.e., R[i, j] −= R7[i, j]
Return the upper left n × n submatrix of R








Algorithm 3.3 Strassen matrix multiplication (rounding down)
Require: Input n × n matrices M and N
if n = 1 then
  Create R a 1 × 1 matrix with entry M[1, 1] · N[1, 1]
  Return R
end if
Let h1 ← ⌊n/2⌋
Let n1 ← 2h1
Create R a n1 × n1 matrix with zero entries
Prepare, perform and post-process 7 recursive calls as in Algorithm 3.2
if n is odd then
  Redefine R as n × n.
  for i from 1 to n1 do
    for j from 1 to n1 do
      Let R[i, j] ← R[i, j] + M[i, n] · N[n, j]
    end for
  end for
  for i from 1 to n1 do
    Let R[i, n] ← Σ_{k=1}^{n} M[i, k] · N[k, n]
    Let R[n, i] ← Σ_{k=1}^{n} M[n, k] · N[k, i]
  end for
  Let R[n, n] ← Σ_{k=1}^{n} M[n, k] · N[k, n]
end if
Return the n × n matrix R






Winograd's formulas for multiplying 2 × 2 matrices also require 7 multiplications, as in Strassen's algorithm, but only 15 additions. Using the same notations, to multiply M (with entries a, b, c, d) by M′ (with entries a′, b′, c′, d′), the formulas are:

\[
\begin{aligned}
S_1 &= c + d, & S_2 &= S_1 - a,\\
S_3 &= a - c, & S_4 &= b - S_2,\\
S_5 &= b' - a', & S_6 &= d' - S_5,\\
S_7 &= d' - b', & S_8 &= S_6 - c',\\
Q_1 &= S_2 \cdot S_6, & Q_2 &= a \cdot a',\\
Q_3 &= b \cdot c', & Q_4 &= S_3 \cdot S_7,\\
Q_5 &= S_1 \cdot S_5, & Q_6 &= S_4 \cdot d',\\
Q_7 &= d \cdot S_8, & &\\
T_1 &= Q_1 + Q_2, & T_2 &= T_1 + Q_4,
\end{aligned}
\qquad\text{and}\qquad
M \cdot M' = \begin{pmatrix} Q_2 + Q_3 & T_1 + Q_5 + Q_6 \\ T_2 - Q_7 & T_2 + Q_5 \end{pmatrix}.
\]

Figure 3.2: Winograd's formulas for matrix multiplication
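The small C function below is a hypothetical sketch, added here for illustration, of the formulas of Figure 3.2; it is not code from the book. It uses double entries to stay self-contained, and the function and variable names are illustrative only.

#include <stdio.h>

/* Multiply the 2x2 matrices M = (a b; c d) and M' = (a' b'; c' d') with
   7 multiplications and 15 additions, following Winograd's formulas. */
void winograd2x2(const double M[4], const double Mp[4], double R[4])
{
  double a = M[0],  b = M[1],  c = M[2],  d = M[3];
  double ap = Mp[0], bp = Mp[1], cp = Mp[2], dp = Mp[3];

  double S1 = c + d,   S2 = S1 - a,  S3 = a - c,   S4 = b - S2;
  double S5 = bp - ap, S6 = dp - S5, S7 = dp - bp, S8 = S6 - cp;

  double Q1 = S2 * S6, Q2 = a * ap, Q3 = b * cp, Q4 = S3 * S7;
  double Q5 = S1 * S5, Q6 = S4 * dp, Q7 = d * S8;

  double T1 = Q1 + Q2, T2 = T1 + Q4;

  R[0] = Q2 + Q3;  R[1] = T1 + Q5 + Q6;
  R[2] = T2 - Q7;  R[3] = T2 + Q5;
}

int main(void)
{
  double M[4] = {1, 2, 3, 4}, Mp[4] = {5, 6, 7, 8}, R[4];
  winograd2x2(M, Mp, R);          /* expected product: (19 22; 43 50) */
  printf("%g %g\n%g %g\n", R[0], R[1], R[2], R[3]);
  return 0;
}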



exhaustive and is here simply to highlight some unexpected technical details
that arise when implementing matrix multiplication. We consider two cases
of matrix multiplication which are cryptographically relevant. One case uses
multiplication of machine-size integers; to avoid overflows when multiplying
numbers, we chose to work in Fp, where p = 46337 is the largest prime
such that 2p² fits in an unsigned 32-bit number. The second case multiplies
Boolean matrices whose sizes are multiples of 32. In this second case,
elementary block matrices of size 32 × 32 are multiplied using Program 3.5.
Both programs are too long to print here and are available on the book's
website.
In both cases, another consideration is to decide whether to use the rounding-up
approach of Algorithm 3.2 or the rounding-down approach of Algorithm 3.3 when
implementing Strassen's algorithm. Both approaches work equally well for
matrices whose dimensions (or numbers of blocks over F2) are powers of two.
Similarly, each approach has a worst case for which the other approach
would yield a much faster alternative. Namely, using the rounding-up approach
for a matrix of dimension of the form 2^t + 1 is a bad idea. Likewise,
using the rounding-down approach for a matrix of dimension 2^t − 1 is
inadequate. Moreover, rounding up costs more than rounding down on average.
Thus, to minimize the adverse consequences of rounding, we implemented
both approaches within each of our matrix multiplication codes, choosing the





rounding-up option only for dimensions congruent to −1 modulo some small
power of 2. This choice allows us to use one rounding-up step instead of four
rounding-down steps when possible. It is not optimal, but gives a nice, easy-to-implement
compromise, and the resulting running times behave reasonably
nicely as a function of the dimension.
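As a purely illustrative sketch (an assumption of this edit, not code from the book), the heuristic above could be implemented as follows; the threshold 8 = 2^3 is an arbitrary choice for the "small power of 2".

/* Hypothetical helper: decide between the rounding-up and rounding-down
   variants for an n x n Strassen product, following the heuristic
   described in the text. */
static int strassen_half_dimension(int n, int *round_up)
{
  *round_up = ((n % 8) == 7);        /* n = -1 (mod 8): pad up once       */
  return *round_up ? (n + 1) / 2     /* h1 for the rounding-up variant    */
                   : n / 2;          /* h1 for the rounding-down variant  */
}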
In both cases, we compare the running time of our implementation of
Strassen's algorithm with an elementary matrix multiplication. Over F2 this
elementary multiplication uses a block-by-block approach in order to make use
of the fast code we have for 32 × 32 matrices. Programs 3.6 and 3.7 contain
the elementary matrix multiplication used as a reference. One drawback of
Program 3.7 is that it performs a modular reduction after each multiplication,
which is quite costly. A better approach is to perform the modular reduction
once, at the end of each scalar product, as in Program 3.8. However, this
limits the modulus that can be achieved with a fixed integer size. Similarly,
in Program 3.6, instead of using the 32 × 32 matrix multiplication routine in a
black-box manner, we can combine the inner optimizations together with the
block matrix structure; see Exercise 4. Note that depending on the specific
machine or compiler, the programs can sometimes be improved by reversing
the order of the two outer loops.
Over F2, pushing the recursion in Strassen's algorithm all the way down
to 32 × 32 matrices, the resulting running times almost perfectly reflect the
asymptotic analysis. The running times are given in seconds in Table 3.2 and
shown on a graph in Figure 3.3. Both sets of measured running times closely follow
the theoretical predictions and can be approximated by curves of the form
t0 · (x/X)^3 and t1 · (x/X)^{log2 7}, where t0 and t1 are the respective running times
of the two multiplication programs on the last data point available, X. Moreover,
Strassen's algorithm quickly becomes more effective than the ordinary
matrix multiplication. It is interesting to remark that near 512 blocks, our
rounding strategy does not behave well, which explains the irregular running
times. Also note that these codes can still be greatly improved. In particular,
enlarging the basic block to 64 or 128 bits would be a very good idea. It is
also possible to improve the basic block multiplication with a time-memory
tradeoff algorithm called the algorithm of four Russians [ADKF70]. This is
especially useful here since each basic block appears in several multiplications.
With these improvements, the cutoff point for Strassen's algorithm would be
higher.
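To make the previous remark concrete, here is a minimal hypothetical sketch of the four Russians idea for a single 32 × 32 Boolean block product. It assumes that each matrix is stored as 32 words, one row per word, with bit j of word i holding entry (i, j); this storage convention, the function name and the 8-bit table size are illustrative choices, not the book's code.

#include <stdint.h>

/* R = M * N over F2, with 8-bit lookup tables precomputed from N. */
void mul32_four_russians(const uint32_t M[32], const uint32_t N[32],
                         uint32_t R[32])
{
  uint32_t T[4][256];
  int g, b, i;

  for (g = 0; g < 4; g++) {          /* group g covers rows 8g..8g+7 of N  */
    T[g][0] = 0;
    for (b = 1; b < 256; b++) {
      int low = b & (-b);            /* lowest set bit of b                */
      int j = 0;
      while ((1 << j) != low) j++;
      T[g][b] = T[g][b ^ low] ^ N[8 * g + j];
    }
  }
  for (i = 0; i < 32; i++) {         /* row i of R = XOR of selected rows  */
    uint32_t m = M[i];
    R[i] = T[0][m & 0xff] ^ T[1][(m >> 8) & 0xff] ^
           T[2][(m >> 16) & 0xff] ^ T[3][(m >> 24) & 0xff];
  }
}

Each row of the result is then assembled from 4 table lookups instead of up to 32 row additions, at the cost of 4 KB of precomputed tables for the right-hand block, which is why the technique pays off when the same basic block appears in several multiplications.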
Over Fp, we experimented with several different values of the cutoff parameter,
namely 32, 64 and 128. It turns out that, with the simple basic implementation
we have, 64 seems to be a reasonable choice for the cutoff value. The running
times are shown on a graph in Figure 3.4. Note that the basic implementation
we are using is not optimized; as a consequence, by writing it more carefully,
it would be possible to speed up the matrix multiplication in Fp by at least an
order of magnitude. The reader can experiment with this on the various computer
algebra systems listed in the Preface.
With our implementation, Strassen's algorithm over Fp also behaves as expected.





Program 3.6 C code for elementary 32n × 32n matrix multiplication over F2
#include <stdio.h>
#include <stdlib.h>

#define WORD unsigned int
#define DIM 32 /* WORDs per 32x32 block; assumed to match Program 3.2 */

#define access(M,i,j,bsize) (&M[((i)*(bsize)*DIM)+((j)*DIM)])

/* External procedure for 32x32 boolean matrix multiplication */
extern void Mul(WORD res[DIM], WORD mat1[DIM], WORD mat2[DIM]);

void matmul(WORD * A, WORD * B, WORD *Res, int bsize) {
int i,j,k,l;
WORD tmp[DIM];

for(i=0;i<DIM*bsize*bsize;i++)
Res[i]=0;

for(i=0;i<bsize;i++)
for(j=0;j<bsize;j++) {
for(k=0;k<bsize;k++) {
Mul(tmp, access(A,i,k,bsize), access(B,k,j,bsize));
for(l=0;l<DIM;l++) {
access(Res,i,j,bsize)[l]^=tmp[l];
}
}
}
}



Num. of blocks (n)   Ordinary mult.   Strassen's mult.
        16                 0.01            < 0.01
        24                 0.02              0.01
        32                 0.04              0.03
        48                 0.13              0.08
        64                 0.29              0.20
        96                 0.98              0.60
       128                 2.32              1.41
       192                 7.88              4.22
       256                18.79              9.94
       384                63.97             29.75
       511               147.88             69.94


Table 3.2: Times for (32n) × (32n) Boolean matrix multiplication








[Figure 3.3: Performance of Strassen's multiplication over F2. Plot of running time (sec.) against dimension (128 to 512) for the ordinary multiplication and for Strassen's multiplication.]





Program 3.7 C code for elementary matrix multiplication over Fp
#include <stdio.h>
#include <stdlib.h>

#define TYPE unsigned short
#define MODULO 46337 /* Also works with primes up to 65521*/
#define access(M,i,j,size) M[(i)+((j)*(size))]

void matmul(TYPE * A, TYPE * B, TYPE *Res, int size) {
int i,j,k;
unsigned int tmp;

for(i=0;i<size;i++)
for(j=0;j<size;j++) {
tmp=0;
for(k=0;k<size;k++) {
tmp=(tmp+access(A,i,k,size)*access(B,k,j,size))%MODULO;
}
access(Res,i,j,size)=tmp;
}
}



For large values of the dimension, however, we can see some cyclic
variations around the theoretical behavior. This probably reflects the fact
that the difference in performance between rounding up and rounding
down is a function of the dimension. It shows once again that the rounding
strategy can be improved. On the contrary, the behavior of the elementary
algorithm is much more surprising. Instead of following a single cubic curve,
the running times seem to follow one cubic curve, then switch to
another and possibly a third. Moreover, for a small number of specific values
of the dimension, the behavior is even worse. This mysterious behavior calls
for an explanation. In fact, it is due to cache effects when reading the matrices
from memory during multiplication. These effects are a side effect of
cache mechanisms, a frequently used tool in computer science. The common
basic idea of cache mechanisms is to keep local copies of data elements in order
to avoid fetching them again when they are requested a second time. This
can be used to speed up all kinds of data accesses. Frequently encountered
applications are disk or webpage accesses. More information about the use of
cache in modern processors can be found in Figure 3.5.
To understand why cache effects arise in our implementation of elementary
multiplication, we can remark that the two innermost loops on j and k completely
read the matrix B and scan the same line of A over and over. On the
one hand, the total size of B expressed in bytes is 2n² for an n × n matrix and





for values of n within a reasonable range, it is comparable to the size of either
the first (up to n ≈ 150) or the second level (up to n ≈ 1000) of memory
cache. On the other hand, the size of a single line of A is 2n and thus should
comfortably fit into the first level of cache. To explain the misbehaviors of
the code, we should look for transitions of B from the first level of cache to
the second level or from the second level of cache to the main memory. This
first cache effect corresponds to the progressive slow-down of the code as the
dimension increases. To explain the second effect, we should also look for
exceptional cases where, for technical reasons, a line of A cannot be kept in the
first level of cache. This happens when many addresses with a large number
of equal low order bits are required at the same time. This phenomenon is
due to the low level details of the implementation of the cache mechanism. In
fact, each memory address can only be stored in a small number of possible
different locations in the cache. These authorized locations are determined
by looking at a number of low order bits (say 20 or a bit less). When the
dimension of the matrix is divisible by a large power of 2, due to the memory
representation we are using, several memory locations may compete for the
same cache locations. In this case, the same line of A needs to be fetched
over and over again from memory. Moreover, reading A in this context is
even slower than reading B from main memory, because B takes advantage of
automatic prefetching mechanisms.
These cache effects are more visible on the faster implementation that uses
fewer modular reductions. The reason is that the memory accesses in this
implementation use up a larger fraction of the running time. Note that the
effect of cache misses greatly depends on the considered computer architecture.
More recent computers seem to perform better in this respect.
All these cache misbehaviors, or at least most of them, can be avoided
by reorganizing the loops and thus the memory accesses. However, this is
not straightforward and it can take a long time for a programmer to find a
working approach. It is extremely interesting to remark that thanks to its
use of a divide-and-conquer approach, Strassen's algorithm neatly avoids this
problem.
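As one illustration of such a reorganization (a hypothetical sketch added here, not the book's solution), the loops of the elementary product can be reordered in (j, k, i) order; with the column-major access() convention of Programs 3.7 and 3.8, every inner loop then walks through contiguous memory. The three #define lines are copied from Program 3.7 for self-containment; the function name and temporary column are illustrative.

#include <stdlib.h>

#define TYPE unsigned short           /* same definitions as in Program 3.7 */
#define MODULO 46337
#define access(M,i,j,size) M[(i)+((j)*(size))]

void matmul_jki(TYPE *A, TYPE *B, TYPE *Res, int size)
{
  int i, j, k;
  unsigned int *col = malloc(size * sizeof(unsigned int));

  for (j = 0; j < size; j++) {
    for (i = 0; i < size; i++) col[i] = 0;
    for (k = 0; k < size; k++) {
      unsigned int b = access(B, k, j, size);
      for (i = 0; i < size; i++)      /* column k of A is read sequentially */
        col[i] = (col[i] + access(A, i, k, size) * b) % MODULO;
    }
    for (i = 0; i < size; i++) access(Res, i, j, size) = (TYPE)col[i];
  }
  free(col);
}

Whether this particular ordering is the best choice depends on the machine, which is exactly the kind of tuning that the divide-and-conquer structure of Strassen's algorithm lets one avoid.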


3.2.2 Asymptotically fast matrix multiplication
After the discovery of Strassen's algorithm, the question of finding an optimal
asymptotic algorithm for matrix multiplication became an essential issue.
Great advances were made and the exponent of matrix multiplication
was lowered considerably. These improvements rely on increasingly complex formulas
for multiplying various tensor forms. In particular, the use of approximate
formulas was essential to lower the number of necessary multiplications. For
example, a method of Bini, Capovani, Romani and Lotti [BCRL79] makes it possible
to multiply a 3 by 2 matrix and a 2 by 3 matrix using only 10 multiplications of
low degree polynomials and leads to a complexity of O(n^{2.7799}). These methods
are discussed in detail in a book by Pan [Pan84].








Program 3.8 C code for matrix mult. over Fp with fewer modular reductions
#include <stdio.h>
#include <stdlib.h>

#define TYPE unsigned short
#define MODULO 46337
#define access(M,i,j,size) M[(i)+((j)*(size))]

void matmul(TYPE * A, TYPE * B, TYPE *Res, int size) {
int i,j,k;
unsigned int tmp;

for(i=0;i<size;i++)
for(j=0;j<size;j++) {
tmp=0;
for(k=0;k<size;k++) {
tmp=(tmp+access(A,i,k,size)*access(B,k,j,size));
if (tmp>=(MODULO*MODULO)) tmp-=MODULO*MODULO; /* keep tmp below MODULO^2 so adding the next product cannot overflow 32 bits */
}
access(Res,i,j,size)=tmp%MODULO;
}
}








[Figure 3.4: Performance of Strassen's multiplication over Fp. Plot of running time (sec.) against dimension (128 to 1024) for the ordinary multiplication, the variant with few modular reductions, and Strassen's multiplication.]





In modern processors, a need for cache mechanisms arose from the fact that
processor speeds have been progressing much faster than memory speeds.
In this context, a CPU cache is a piece of fast memory located as close to
the processor unit as possible. Of course, the size of caches is much smaller
than the size of the main memory of the computer. As a consequence,
it is necessary to devise efficient heuristic techniques to make sure that in
many applications the cache will hold the necessary piece of data when it
is required. This good event is called a cache hit. The bad case, where a
required piece of data is not present in the cache, is called a cache miss.
Each entry in a CPU cache contains both a copy of a data piece from the
main memory and the address of this data in the memory. To avoid using
a large proportion of the fast cache memory simply to store addresses, CPU
caches do not individually store very small pieces of memory. The basic
unit is called a cache line and can contain several integers. To simplify the
management of the CPU cache, the cache lines are usually aligned, i.e., they
can only start at a position in memory whose address is a multiple of the
size of a cache line. Moreover, cache mechanisms cannot use too complex
algorithmic techniques to detect whether a piece of data is already present in the
cache or not. Thus, to simplify the issue, cache lines are often grouped into
several smaller caches depending on the bit patterns of their addresses. These
smaller caches are called banks. This allows the CPU to load more than one
piece of data per clock cycle as long as the data comes from different banks.
Otherwise, there is a bank conflict which slows things down. Similarly, since
many applications use a limited amount of memory, cache accesses often
ignore the higher bits of addresses during the initial data search. This causes
another kind of conflict when two pieces of data located at positions whose
addresses differ by a large power of 2 are needed.
CPU caches are further complicated by the fact that the first level of cache
is usually too small and that a second level of bigger but slower cache is
often added on top of it. They are also usually combined with prefetching
mechanisms which try to predict future memory accesses and load the corresponding
data in advance into the cache; this works quite well with accesses
to regularly spaced memory addresses. In multitasking operating systems,
the conversion of the virtual memory addresses seen by an individual task
into physical memory addresses can become extremely slow if the tables
needed for the conversion cannot be held in the first level of cache. Another
important issue is that cache mechanisms may induce security concerns and
allow an adversarial program to learn secret data, such as cryptographic
keys, which it should not be able to access [AS08, AScKK07, OST06].
Algorithms and computer programs that are designed to perform well in the
presence of cache mechanisms, without relying on the specific cache parameters,
are called cache-oblivious [FLPR99].

Figure 3.5: Principles of cached memory in processors





The best current asymptotic result is the method of Coppersmith and Winograd [CW90], which yields
a matrix multiplication algorithm running in time O(n^{2.376}). This algorithm
has recently been reformulated in a group-theoretic setting by Cohn, Kleinberg,
Szegedy and Umans [CKSU05]. However, no recent advances have been
made concerning the exponent of the complexity of matrix multiplication. A
widely held opinion is that the correct exponent is 2 and that the asymptotic
complexity of matrix multiplication probably is O(n² log(n)^t) for some small
integer t.
From a practical point of view, these algorithms with an asymptotic complexity
better than Strassen's are not applicable. Indeed, they involve extremely
large constant overheads, due to the highly complex basic formulas
that are used. In practice, none of the algorithms with asymptotic complexity
better than Strassen's has been reported as useful. Thus, as far as practical
aspects are considered, and in particular for cryptanalysis, the best currently
achievable exponent for matrix multiplication is log2 7.


3.2.3 Relation to other linear algebra problems
Since all linear algebra problems can be solved using matrix multiplication,
addition and inversion, in order to relate the general complexity of linear algebra
to the complexity of matrix multiplication, it suffices to determine the
complexity of inversion from the complexity of matrix multiplication. Indeed,
addition of n × n matrices can be done in time n², and thus it cannot increase
the exponent in the complexity of linear algebra problems. In this section, we
assume that we are given a matrix multiplication algorithm with complexity
O(n^ω), for some constant 2 < ω ≤ 3, and we would like to show that matrix
inversion can also be performed in time O(n^ω). As in Strassen's algorithm, this
can be achieved by working with 2 × 2 block matrices. Write:

\[ M = \begin{pmatrix} A & B \\ C & D \end{pmatrix}, \qquad (3.2) \]

and remark that when D is invertible, then multiplying M on the right by

\[ N = \begin{pmatrix} Id & 0 \\ -D^{-1}C & Id \end{pmatrix} \qquad (3.3) \]

yields:

\[ MN = \begin{pmatrix} A - BD^{-1}C & B \\ 0 & D \end{pmatrix}. \qquad (3.4) \]

In this equation, A − BD^{-1}C is called the Schur complement of M; in the
sequel, we denote it by S. Since a triangular matrix is easy to invert, indeed:

\[ \begin{pmatrix} U & V \\ 0 & W \end{pmatrix}^{-1} = \begin{pmatrix} U^{-1} & -U^{-1}VW^{-1} \\ 0 & W^{-1} \end{pmatrix}, \qquad (3.5) \]





we can compute the inverse of M as:

\[ M^{-1} = N \begin{pmatrix} S & B \\ 0 & D \end{pmatrix}^{-1}. \qquad (3.6) \]

This equation only requires two inversions of half-size matrices (D and S) and
some matrix multiplications and additions. As a consequence, it can be used
as a basis for a recursive matrix inversion algorithm with asymptotic running
time of the same form as the underlying matrix multiplication algorithm, i.e.,
O(n^ω).
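For concreteness, expanding the product in Equation (3.6) with the help of Equation (3.5) gives the familiar explicit blockwise inversion formula; this worked expansion is an addition of this edit, not a display from the book:

\[
M^{-1} = \begin{pmatrix} S^{-1} & -S^{-1}BD^{-1} \\ -D^{-1}CS^{-1} & D^{-1} + D^{-1}CS^{-1}BD^{-1} \end{pmatrix},
\qquad S = A - BD^{-1}C .
\]

The corresponding running time satisfies a recurrence of the form T(n) = 2 T(n/2) + O(n^ω), whose solution is indeed O(n^ω) for ω > 2.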
The use of the Schur complement is a powerful tool for numerical analysis;
many of its applications in this context are described in Zhang's book [Zha05].
It is also possible to use the Schur complement to invert matrices over
finite fields. Two methods are possible. The first option is to lift the matrix
to a larger field. For example, a matrix modulo a prime p can be lifted
to an integer matrix. However, this approach is costly, since the inverse of
the integer matrix may involve large denominators. The other option is to
compute the Schur complement directly in the finite field. In this case, a specific
difficulty arises: we need to make sure that D is invertible at each step of
the algorithm. Indeed, over the finite field Fq, even a random matrix is not
guaranteed to have full rank. This is most critical when considering matrices
over F2, because a random square matrix of dimension n over F2 has full rank
with probability $\prod_{i=1}^{n} (1 - 2^{-i})$, which is close to 0.29 when n becomes large.




3.3 Gaussian elimination algorithms
When working with matrices, frequently encountered problems are to solve
a linear system of equations or to invert a matrix. In Section 3.2.3 we described
a recursive approach to matrix inversion. Here, we consider a more direct
approach: Gaussian elimination algorithms. Since the problems of solving
systems of equations and of inverting matrices are related, it is convenient to
start with the simpler problem: solving a linear system of equations.
Gaussian elimination works in two phases. During the first phase, called
Gauss's pivoting, we progressively modify the original system of equations
using reversible transforms, in order to turn it into a triangular system of
equations. During the second phase, thanks to the triangular form, it becomes
possible to determine, one at a time, the value of each unknown.
Gaussian elimination is quite simple to describe; however, when writing
down a complete algorithm, some technicalities arise in order to avoid any
division by zero. To avoid these technicalities, we initially write down slightly
incorrect versions of the two phases in order to outline the basic idea. To
simplify the presentation and avoid dealing with the representation of the




linear equations themselves, we simply assume that each equation is given as
an equality between a linear combination of variables xi given on the left-hand
side and a constant value given on the right-hand side. Moreover, we
assume that we have access to elementary routines that compute the addition
or subtraction of two equations, multiply an equation by a given scalar and
access individual coefficients within an equation. The first phase in simplified
form is given as Algorithm 3.4 and the second phase as Algorithm 3.5. The
action of the main ingredient of Algorithm 3.4, pivoting, on the matrix of coefficients
of a linear system is illustrated in Figure 3.6.
Another application of Gaussian elimination is to compute the determinant
of a matrix. Indeed, the first phase does not modify the determinant (up to the
sign changes caused by possible row exchanges) and once we have a triangular
matrix, it suffices to compute the product of the diagonal elements to obtain
the determinant.


Algorithm 3.4 Triangularization of a linear system (simplified, incorrect)
Require: Input linear system of n equations Ei in n unknowns xi
for i from 1 to n − 1 do
  Let P be the non-zero coefficient of xi in Ei
  for j from i + 1 to n do
    Let C be the coefficient of xi in Ej
    Ej ← Ej − (C/P) · Ei
  end for
end for
Output modified upper triangular system of n equations Ei




Algorithm 3.5 Backtracking to solve a triangular system
Require: Input triangular linear system of n equations Ei in n unknowns xi
Create an array X of n elements to store the values of variables xi
for i from n down to 1 do
  Let V be the constant term in equation Ei
  for j from i + 1 to n do
    Let C be the coefficient of xj in Ei
    V ← V − C · X[j]
  end for
  Let P be the non-zero coefficient of xi in Ei
  Let X[i] ← V/P
end for
Output array of determined values X





\[
\begin{pmatrix}
a_{1,1} & * & * & \cdots & * & * & \cdots & * \\
0 & a_{2,2} & * & \cdots & * & * & \cdots & * \\
0 & 0 & a_{3,3} & \cdots & * & * & \cdots & * \\
\vdots & \vdots & \vdots & \ddots & \vdots & \vdots & & \vdots \\
0 & 0 & 0 & \cdots & a_{i,i} & * & \cdots & * \\
0 & 0 & 0 & \cdots & a_{i+1,i} & * & \cdots & * \\
\vdots & \vdots & \vdots & & \vdots & \vdots & & \vdots \\
0 & 0 & 0 & \cdots & a_{n,i} & * & \cdots & *
\end{pmatrix}
\longrightarrow
\begin{pmatrix}
a_{1,1} & * & * & \cdots & * & * & \cdots & * \\
0 & a_{2,2} & * & \cdots & * & * & \cdots & * \\
0 & 0 & a_{3,3} & \cdots & * & * & \cdots & * \\
\vdots & \vdots & \vdots & \ddots & \vdots & \vdots & & \vdots \\
0 & 0 & 0 & \cdots & a_{i,i} & * & \cdots & * \\
0 & 0 & 0 & \cdots & 0 & * & \cdots & * \\
\vdots & \vdots & \vdots & & \vdots & & \text{(updated)} & \\
0 & 0 & 0 & \cdots & 0 & * & \cdots & *
\end{pmatrix}
\]

Figure 3.6: Effect of a pivoting step


Our simplified algorithms are incorrect in general. They work correctly
under several important assumptions. The first of these assumptions is that
the input system has n equations in n unknowns and a unique solution. In
general, this need not be the case: linear systems may have more unknowns
than equations or the reverse. And, as recalled in Chapter 2, even systems
of n equations in n unknowns do not necessarily have a unique solution. If
the kernel is non-zero, they may have none or infinitely many. The next
assumption is that whenever we request the coefficient of xi in equation Ei
at some step of the algorithm, this value is non-zero. This assumption is
highlighted in boldface type in the algorithms.
When the first assumption does not hold, i.e., when the system of equations
does not have full rank, printing out an error message may, at least for now,
be an acceptable solution. However, the second assumption, that no zeros
are encountered as denominators throughout the algorithm, is a technical
assumption that does not reflect any underlying mathematical invariant of
the linear system. In order to better understand its role, let us discuss the
inner working of Algorithm 3.4. This algorithm has a main loop where variable
i is used to successively go through all equations. At each step, we take the
coefficient of xi in the current version of equation Ei. Thanks to the work
already performed by the algorithm, all coefficients of previous variables x1
up to x_{i−1} are zero in Ei. Assuming that the coefficient of xi is non-zero,
the inner loop indexed by j then modifies the system to make xi vanish in
all subsequent equations. It is important to note that this transformation,
called pivoting, is reversible and thus does not change the set of solutions of
the linear system. As a consequence of the successive pivoting steps, after the
algorithm execution, the linear system becomes upper triangular. The non-zero
coefficient of xi in Ei used for pivoting is called the pivot. Taking the i-th
pivot from equation Ei comes naturally when writing down the algorithm, but
it is unessential to its inner working. In truth, for each pivoting step it suffices
to find a pivot, i.e., a non-zero coefficient, for xi in any equation that has not
yet been used for this purpose. In fact, as long as the system of equations is




invertible, at each step of the algorithm it is always possible to ¬nd at least one
equation Ej with j ≥ i such that the coe¬cient of xi in Ej is non-zero. Thus,
by reordering the equations, we can modify Algorithm 3.4 into Algorithm 3.6.
When working over ¬nite ¬elds, this change su¬ces to obtain a fully function
Gaussian elimination algorithm. Over the real or complex ¬elds, this is not
the end of story. Since this is outside of the scope of this book, let us simply
state that in this case the pivot should be chosen with even more care in order
to minimize the loss of accuracy during the algorithm execution.


Algorithm 3.6 Triangularization of a linear system
Require: Input linear system of n equations Ei in n unknowns xi
for i from 1 to n − 1 do
  for j from i to n do
    Let P be the coefficient of xi in Ej
    if P ≠ 0 then
      Exit Loop
    end if
  end for
  if P = 0 then
    Output 'Non-invertible system', Exit Algorithm
  end if
  Exchange equations Ei and Ej
  Let P be the non-zero coefficient of xi in Ei
  for j from i + 1 to n do
    Let C be the coefficient of xi in Ej
    Ej ← Ej − (C/P) · Ei
  end for
end for
Output modified system of n equations Ei



Once the linear system is transformed into triangular form, finding the
value of each variable within the unique solution is a simple matter. Due to
the triangular form of the transformed system, the last equation relates the
value of a single variable, the last one, to a constant. Thus, dividing the
constant in this equation by the coefficient of the last variable xn, we recover
its value. Once the value of xn is known, we can substitute this value in the
previous equation, which only contains xn and x_{n−1}. After the substitution,
we can recover the value of x_{n−1}. Following this approach, we successively
determine x_{n−2} and all other variables. Algorithm 3.5 uses this approach.
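As a concrete illustration, here is a minimal hypothetical sketch combining Algorithm 3.6 (triangularization with row exchanges) and Algorithm 3.5 (back-substitution) over Fp, with p = 46337 as in Program 3.7. The matrix A is stored row-major, both A and b are modified in place, and all names are illustrative; this is not the book's reference implementation.

#define PRIME 46337

static unsigned int inv_mod_p(unsigned int a)
{ /* modular inverse via Fermat's little theorem: a^(p-2) mod p */
  unsigned long long r = 1, base = a;
  unsigned int e = PRIME - 2;
  while (e) {
    if (e & 1) r = (r * base) % PRIME;
    base = (base * base) % PRIME;
    e >>= 1;
  }
  return (unsigned int)r;
}

int solve_mod_p(unsigned int *A, unsigned int *b, unsigned int *x, int n)
{
  int i, j, k;
  for (i = 0; i < n; i++) {                      /* phase 1: triangularization */
    for (j = i; j < n && A[j * n + i] == 0; j++);
    if (j == n) return -1;                       /* no pivot: non-invertible   */
    for (k = 0; k < n; k++) {                    /* exchange equations i and j */
      unsigned int t = A[i * n + k]; A[i * n + k] = A[j * n + k]; A[j * n + k] = t;
    }
    { unsigned int t = b[i]; b[i] = b[j]; b[j] = t; }
    {
      unsigned int pinv = inv_mod_p(A[i * n + i]);
      for (j = i + 1; j < n; j++) {              /* eliminate x_i below row i  */
        unsigned int c = (unsigned int)(((unsigned long long)A[j * n + i] * pinv) % PRIME);
        for (k = i; k < n; k++)
          A[j * n + k] = (unsigned int)((A[j * n + k] +
                         (unsigned long long)(PRIME - c) * A[i * n + k]) % PRIME);
        b[j] = (unsigned int)((b[j] + (unsigned long long)(PRIME - c) * b[i]) % PRIME);
      }
    }
  }
  for (i = n - 1; i >= 0; i--) {                 /* phase 2: back-substitution */
    unsigned long long v = b[i];
    for (j = i + 1; j < n; j++)
      v = (v + (unsigned long long)(PRIME - A[i * n + j]) * x[j] % PRIME) % PRIME;
    x[i] = (unsigned int)((v * inv_mod_p(A[i * n + i])) % PRIME);
  }
  return 0;
}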





3.3.1 Matrix inversion
With Gaussian elimination as a tool to solve invertible linear systems of
equations, we can now turn to the problem of computing the inverse of a matrix.
Remember that M is invertible if and only if there exists a matrix N
such that M N = Id. Here, Id denotes the identity matrix, with a diagonal
of ones and zeros everywhere else, which is the neutral element in the multiplicative
group of n × n matrices. Viewing the matrix N as n column vectors
N^{(i)}, each such vector satisfies a linear equation: M · N^{(i)} = Δ^{(i)}, where Δ^{(i)}
is the vector whose coordinates are 0 everywhere except on line i where the
coefficient is 1. Solving these n linear systems of equations, we thus invert
M. However, from a complexity point of view this approach is not satisfying.
Instead, it is preferable to use the similarity between all these systems and
solve them all at once. Indeed, the sequence of operations in Algorithms 3.5
and 3.6 does not depend on the constant side of the equations. This allows
us to perform these operations in parallel on these n related systems.
To illustrate these ideas in the context of 32 × 32 Boolean matrices as in Section
3.1, we give in Program 3.9 a C code compatible with the representation,
input and output routines given in the matrix multiplication Program 3.2.


3.3.2 Non-invertible matrices
Linear systems of equations encountered in cryptography do not necessarily
have full rank. When faced with such a system, Gaussian elimination is also
very useful. However, we need to change and complete our algorithms. Our
goal is to find whether a given system has solutions and, at least, to compute
a single solution. More generally, for a non-invertible square matrix M, we aim
at computing its kernel Ker(M) and image Im(M).
Since these sets are linear subspaces, they can be very large, or even infinite
when the base field is infinite. Thus, it is not conceivable to represent them
by a list of elements; instead, we would like to obtain efficient representations
such as linear bases for these subspaces.
Remember that a linear system of equations, written in matrix form as
M x = y, with M non-invertible, has the following properties:
• If y does not belong to the image of M, then the equation has no solution.
• If y belongs to the image of M, then the set of solutions is an affine
subspace.
• More precisely, starting from an arbitrary solution x0 of the system, the
set of solutions is x0 + Ker(M).
When solving a single system for a non-invertible matrix M, it is convenient
to write a variation of Algorithms 3.5 and 3.6 to solve the system. When faced
with many systems of equations involving a single matrix M, it is preferable to
first compute global information about the matrix, find its rank and describe








Algorithm 3.7 Matrix inversion
Require: Input n × n matrix M
Initialize matrix N to identity
for i from 1 to n − 1 do
  for j from i to n do
    Let P = M_{j,i}
    if P ≠ 0 then
      Exit Loop
    end if
  end for
  if P = 0 then
    Output 'Non-invertible matrix', Exit Algorithm
  end if
  Exchange lines i and j in M
  Exchange lines i and j in N
  Divide lines i of M and N by P
  for j from i + 1 to n do
    Let C = M_{j,i}
    Subtract C times line i of M from line j of M
    Subtract C times line i of N from line j of N
  end for
end for
Assert: M is upper triangular with a diagonal of 1
for i from n down to 1 do
  for j from 1 to i − 1 do
    Let C = M_{j,i}
    Subtract C times line i of M from line j of M
    Subtract C times line i of N from line j of N
  end for
end for
Assert: M is the identity matrix
Output N the inverse of input matrix M





Program 3.9 Inversion of 32 × 32 matrix over F2
/* Warning: mat is transformed during MatInv */
int MatInv(WORD mat[DIM], WORD inv[DIM])
{
int piv,l,c,k;
WORD val,vali,mask;
/* Start from inv = identity matrix, one set bit per word */
for(piv=0,mask=1;piv<DIM;piv++,mask<<=1)
inv[piv]=mask;
/* Gauss-Jordan elimination: one pivot position per iteration */
for(piv=0,mask=1;piv<DIM;piv++,mask<<=1) {
/* Find a word with a non-zero bit in the pivot position */
for (c=piv;c<DIM;c++) if (mask&mat[c]) break;
if (c>=DIM) return(FALSE);
/* Swap it into place, in both mat and inv */
val=mat[c];mat[c]=mat[piv];mat[piv]=val;
vali=inv[c];inv[c]=inv[piv];inv[piv]=vali;
/* Add (XOR) the pivot word to every other word with that bit set */
for(c=0;c<DIM;c++) if ((c!=piv)&&(mask&mat[c])) {
mat[c]^=val;inv[c]^=vali; }}
return(TRUE);
}



its kernel and image. This global information then allows fast resolution of
the systems involving M. This approach is similar to the computation of a
matrix inverse with Algorithm 3.7.
Let us start with the single system case. Since the system is non-invertible,
we know that running Algorithm 3.6 on it produces the error message "Non-invertible
system." This occurs when no pivot can be found for a variable xi.
However, at that point, since xi has a zero coefficient on all lines from i to
n, the triangularization is in some sense already complete for line i, except
that there is a zero on the diagonal at this position, instead of a 1. Thus,
instead of aborting, we could simply skip to the next iteration of the main
loop. However, if we do not make sure that the current line is going to be
considered when looking for the next pivot, then the backtracking phase may
incorrectly claim that the system does not have a solution. To avoid this, the
simplest approach is to renumber the variables, pushing those without a pivot
toward the end. With this simple modification, all systems of equations can be
written, after renumbering the variables, in a nice triangular form, possibly
with a bunch of zeros at the lower right end of the diagonal. Once such a
system goes as input to Algorithm 3.5, it is easy to modify this algorithm to
find solutions. For all the trailing zeros on the diagonal, if the corresponding
equation has a non-zero constant, the system does not have any solution.
Otherwise, the corresponding variable may be set at random. For the non-zero
diagonal entries, the backtracking algorithm resumes its former behavior. The
modified triangularization and backtracking algorithms (for a single solution)
are given as Algorithms 3.8 and 3.9.
Let us now consider the case of multiple systems involving the same matrix








Algorithm 3.8 Triangularization of a possibly non-invertible system
Require: Input linear system of n equations Ei in n unknowns xi
Create an array Π of n entries, initialized to Π[i] = i.
Let Last = n
for i from 1 to Last do
  for j from i to n do
    Let P be the coefficient of x_{Π[i]} in Ej
    if P ≠ 0 then
      Exit Loop
    end if
  end for
  if P ≠ 0 then
    Exchange equations Ei and Ej
    Let P be the non-zero coefficient of x_{Π[i]} in Ei
    for j from i + 1 to n do
      Let C be the coefficient of x_{Π[i]} in Ej
      Ej ← Ej − (C/P) · Ei
    end for
  else
    Exchange Π[i] and Π[Last]
    Decrement Last
    Decrement i (to re-run the loop on the same i value)
  end if
end for
Output modified system of n equations Ei and renumbering Π of the variables x





Algorithm 3.9 Backtracking of a possibly non-invertible triangular system
Require: Input triangular linear system of n equations Ei in n unknowns xi
Require: Renumbering of the variables Π
Create an array X of n elements to store the values of variables xi
for i from n down to 1 do
  Let V be the constant term in equation Ei
  Let P be the coefficient of x_{Π[i]} in Ei
  if P = 0 then
    if V ≠ 0 then
      Output 'No solution exists', Exit Algorithm
    else
      Assign random value to X[Π[i]]
    end if
  else
    for j from i + 1 to n do
      Let C be the coefficient of x_{Π[j]} in Ei
      V ← V − C · X[Π[j]]
    end for
    Let X[Π[i]] ← V/P
  end if
end for
Output array of determined values X


M. In that case, we need to compute several objects. More precisely, we
need a basis of the kernel of M to recover all solutions of the system, we need
an efficient way to test whether a vector y belongs to the image of M and a
procedure to compute a solution x0 for the system.
The easiest approach is to start by modifying Algorithm 3.7 in order to avoid
any division by zero, using the same renumbering idea as in Algorithms 3.8
and 3.9. At the end of the modified algorithm, the matrix M is transformed,
in renumbered form, into a matrix:

\[ H = \begin{pmatrix} Id & K \\ 0 & 0 \end{pmatrix}. \qquad (3.7) \]

At the same point, matrix N contains a transformation matrix such that
H = N M P_Π, where P_Π is the permutation matrix representing the permutation
Π. Alternatively, reversing the roles of rows and columns, we can similarly
write H = P_Π M N, where:

\[ H = \begin{pmatrix} Id & 0 \\ K & 0 \end{pmatrix}. \qquad (3.8) \]

In that case, the matrix H is a basis for the image of M, with permuted
coordinates, and the final columns of N corresponding to null columns after
multiplication by M form a basis of the kernel of M.





3.3.3 Hermite normal forms

In all the Gaussian elimination algorithms we presented in this section,
we encounter divisions by diagonal elements. When the considered matrix
or system of equations is defined over a field, all the divisions, by non-zero
elements, can be performed without trouble. However, if the matrix is defined
over a ring, we may have difficulties. The typical example is the case of
a system of equations defined over the integers. Such a system does not
necessarily have an integer solution. In that case, it would be nice to compute
the triangularization of the matrix in a way that avoids all non-exact divisions.
When the matrix is invertible this is related to the computation of a Hermite
normal form.


DEFINITION 3.1 An invertible square matrix M with integer coefficients
is said to be in Hermite normal form if it is upper triangular,
with positive elements on its diagonal and if furthermore all non-diagonal elements
are non-negative and smaller than the diagonal element in their column.
Equivalently:

\[ \forall i > j : M_{i,j} = 0 \qquad (3.9) \]
\[ \forall i : M_{i,i} > 0 \qquad (3.10) \]
\[ \forall i < j : 0 \leq M_{i,j} < M_{j,j} \qquad (3.11) \]

Computing Hermite normal forms is reminiscent of both Gaussian elimination
and GCD computations. During the first phase of Gaussian elimination,
at each pivoting step, we choose in the current column the smallest non-zero
entry (in absolute value) as pivot. We move the corresponding row in order
to have this pivot on the diagonal, potentially replace it by its opposite to get
a positive pivot, and remove integer multiples of this row from all rows below
it. We choose the multiples to make sure that the coefficients in the current
column become non-negative and smaller than the coefficient of the pivot.
If all coefficients of transformed rows in this column are zeros, we proceed
to the next column, otherwise we repeat the same computation, choosing a
smaller pivot. During the second phase of Gaussian elimination, we remove
multiples of each row from the rows above it, to make all non-diagonal entries
non-negative and smaller than the diagonal entry in their column. This is
described as Algorithm 3.10. This algorithm can easily be modified to keep
track of the transformations in an auxiliary matrix as in the matrix inversion
Algorithm 3.7. Note that a similar algorithm can be used to transform a
non-invertible matrix into something called a row echelon form. The main
difference with the Hermite normal form is that in columns that do not have
any non-zero pivot, the size of the entries in previous rows cannot be reduced.





Algorithm 3.10 Hermite normal forms
Require: Input invertible n × n matrix M with integer entries
for i from 1 to n − 1 do
  Let done ← false
  while done ≠ true do
    Let P ← 0
    for k from i to n do
      if M_{k,i} ≠ 0 then
        if P = 0 or |M_{k,i}| < |P| then
          Let j ← k; let P ← M_{j,i}
        end if
      end if
    end for
    if P = 0 then
      Output 'Non-invertible system', Exit Algorithm
    end if
    Let done ← true
    Exchange rows M_i and M_j
    if P < 0 then
      Let row M_i ← −M_i; let P ← −P
    end if
    for j from i + 1 to n do
      Let C ← M_{j,i}
      Let row M_j ← M_j − ⌊C/P⌋ · M_i
      if M_{j,i} ≠ 0 then
        Let done ← false
      end if
    end for
  end while
end for
if M_{n,n} = 0 then
  Output 'Non-invertible system', Exit Algorithm
end if
if M_{n,n} < 0 then
  Let row M_n ← −M_n
end if
for i from 2 to n do
  Let P ← M_{i,i}
  for j from 1 to i − 1 do
    Let C ← M_{j,i}
    Let row M_j ← M_j − ⌊C/P⌋ · M_i
  end for
end for
Output Hermite normal form M
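As a small worked example (added here for illustration; it does not appear in the book), applying Algorithm 3.10 to a 2 × 2 integer matrix gives:

\[
M=\begin{pmatrix}4&7\\2&5\end{pmatrix}
\;\xrightarrow{\text{swap rows}}\;
\begin{pmatrix}2&5\\4&7\end{pmatrix}
\;\xrightarrow{R_2\leftarrow R_2-2R_1}\;
\begin{pmatrix}2&5\\0&-3\end{pmatrix}
\;\xrightarrow{R_2\leftarrow -R_2}\;
\begin{pmatrix}2&5\\0&3\end{pmatrix}
\;\xrightarrow{R_1\leftarrow R_1-R_2}\;
\begin{pmatrix}2&2\\0&3\end{pmatrix}.
\]

The result is upper triangular with a positive diagonal and 0 ≤ 2 < 3 above it, and its diagonal product 6 equals |det M|, as expected.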





3.3.3.1 Linear algebra modulo composites and prime powers
When performing linear algebra modulo a composite number N, several
cases can be encountered. If the determinant of the matrix that defines the
system is invertible modulo N, then the solution is unique and can be derived
using Gaussian elimination as in Section 3.3. However, if the determinant is
not invertible modulo N, then Gaussian elimination fails. In fact, without
loss of generality, we can limit ourselves to considering the case where N is
a prime power. Indeed, otherwise we can use the Chinese remainder theorem.
More precisely, when N = N1 · N2 and N1 and N2 are coprime, then any
solution of the linear system modulo N can be obtained by pasting together
a solution modulo N1 and a solution modulo N2.
When N is a prime power, say N = p^e, we can generalize Gaussian elimination
and solve the linear system anyway. The basic idea remains the
same: first we triangularize the linear system, choosing for each pivot a value
which is "as invertible as possible." Formally, this means that if we define the
valuation v(x) of an element x in Z/p^e Z as the multiplicity of p in any representative
for x, we should always choose as pivot the value in a given column
with the smallest valuation. During the second stage, when creating the list
of solutions, we can divide a value y by a pivot x if and only if v(y) ≥ v(x).
When possible, such a division yields p^{v(x)} different solutions.
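As a small illustration of this last point (a worked example added in this edit, not taken from the book), consider the single equation 4x ≡ 8 (mod 2^4). Here v(4) = 2 ≤ v(8) = 3, so dividing is possible:

\[ 4x \equiv 8 \pmod{16} \iff x \equiv 2 \pmod{4}, \]

which yields p^{v(4)} = 2^2 = 4 solutions modulo 16, namely x ∈ {2, 6, 10, 14}.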




3.4 Sparse linear algebra
All the linear algebra algorithms that we have presented up to now deal
with dense matrices represented by their complete lists of entries. However,
there are many applications, both in scientific computing and in cryptography,
where sparse matrices are involved. A sparse matrix is a matrix that contains
a relatively small number of non-zero entries. Very frequently, it takes the
form of a matrix in which each line (or column) only contains a small number
of non-zero entries, compared to the dimension of the matrix. With sparse
matrices, it is possible to represent in computer memory much larger matrices,
by giving for each line (resp. column) the list of positions containing a non-zero
coefficient, together with the value of the coefficient. Indeed, assuming an
average of l entries per line, storing an n × n matrix requires about 2ln numbers
instead of n². When dealing with a sparse linear system of equations, using
plain Gaussian elimination is often a bad idea. Each pivoting step during
Gaussian elimination increases the number of entries in the matrix and after
a relatively small number of steps, the matrix can no longer be considered
as sparse. As a consequence, if the dimension of the initial matrix was large,
Gaussian elimination quickly overflows the available memory. In order to deal
with sparse systems, a different approach is required. Moreover, sparsity is




not a well-behaved mathematical property. In particular, the inverse of a
sparse invertible matrix is not necessarily sparse. As a consequence, the best
we can hope for is an efficient, sparsity-preserving algorithm to solve a single
linear system of equations.
Two main families of algorithms have been devised for sparse systems. One
family, called structured Gaussian elimination, contains variations on the ordinary
Gaussian elimination that choose pivots in order to minimize the loss
of sparsity. The other family uses a totally different approach; it does not try
to modify the input matrix but instead aims at directly finding a solution of
the linear system using only matrix-by-vector multiplications. In this family,
we find Lanczos's and Wiedemann's algorithms.


3.4.1 Iterative algorithms
3.4.1.1 Lanczos's algorithm
Lanczos's algorithm is a famous algorithm which has been devised to find
solutions of linear systems over the real or complex numbers. It is much
easier to describe when we can rely on a notion of convergence. Thus, to
explain this algorithm, we temporarily leave our usual setting, forget about
finite fields and consider a linear equation M y = x over the real numbers. For
simplicity, we assume that M is square and invertible. Moreover, without loss
of generality, we may assume that M is symmetric. Indeed, multiplying the
initial equation by the transpose of M, it can be transformed into an equation
(ᵗM M)y = (ᵗM x), where (ᵗM M) is a symmetric matrix. When a square
n × n matrix M is real, symmetric and invertible, it induces a scalar product
(·|·)_M on the vector space R^n, defined from the usual scalar product by:

\[ (u|v)_M = (u|Mv) = (Mu|v). \qquad (3.12) \]

This scalar product induces a norm \|\cdot\|_M defined as:

\[ \|u\|_M = \sqrt{(u|u)_M}. \qquad (3.13) \]

Over the field of real numbers, Lanczos's algorithm works by first constructing
an orthonormal basis of R^n for the scalar product (·|·)_M, i.e., a family of
vectors v1, v2, . . . , vn such that:

\[ (v_i|v_i)_M = 1 \quad \forall i \in [1 \cdots n] \qquad\text{and}\qquad (v_i|v_j)_M = 0 \quad \forall i \neq j. \qquad (3.14) \]

Then it decomposes the solution y of M y = x over the basis (v_i)_{i \in [1 \cdots n]} using
the decomposition formula:

\[ y = \sum_{i=1}^{n} (y|v_i)_M \, v_i. \qquad (3.15) \]





This decomposition formula can be used to solve the equation because the
coefficients (y|v_i)_M can be computed from x without previous knowledge of y
by remarking that:

\[ (y|v_i)_M = (My|v_i) = (x|v_i). \qquad (3.16) \]
Since the decomposition formula does not cost too much in terms of running
time, because it only requires n scalar products, this gives an efficient algorithm
if and only if the orthonormal basis can be constructed efficiently. A
nice approach works as follows: first choose a random vector w1, then compute
v1 as:

\[ v_1 = \frac{w_1}{\|w_1\|_M}. \qquad (3.17) \]

From v1, we construct the orthonormal basis iteratively, computing for
each i the vector w_i = M · v_{i-1} and letting v_i be the vector obtained by orthonormalizing
w_i, i.e., letting:

\[ \tilde{w}_i = w_i - \sum_{j=1}^{i-1} (w_i|v_j)_M \, v_j \qquad\text{and} \qquad (3.18) \]
\[ v_i = \frac{\tilde{w}_i}{\|\tilde{w}_i\|_M}. \qquad (3.19) \]

At first, this orthonormalization process does not seem efficient, because
a naive implementation requires i scalar products at each step and O(n²)
scalar products for the complete algorithm. However, it is easy to modify the
computation and perform two scalar products at each step. The reason is
that for i > j + 2, the scalar product (w_i|v_j)_M is already 0; as a consequence,
we can rewrite the computation of \tilde{w}_i as:

\[ \tilde{w}_i = w_i - (w_i|v_{i-1})_M \, v_{i-1} - (w_i|v_{i-2})_M \, v_{i-2}. \qquad (3.20) \]

The reason for this simplification is that:

\[ (w_i|v_j)_M = (M v_{i-1}|v_j)_M = (v_{i-1}|M v_j)_M = (v_{i-1}|w_{j+1})_M. \qquad (3.21) \]

Since w_{j+1} is in the vector space spanned by v_1, . . . , v_{j+1}, this coefficient
is already 0 whenever i − 1 > j + 1.
We stop the process when the orthogonalized vector is equal to 0. Clearly,
the sequence of vectors v1, . . . , vk that is generated is an orthonormal family.
However, is it a basis? The answer is yes if the family is large enough, more
precisely, if k = n. Due to the initial random choice of v1, this is the most
frequent case.
Moreover, with most real matrices, Lanczos's algorithm has a very useful
property of convergence; even partial decompositions, which are obtained by
truncating the sum in Equation (3.15), quickly give very good approximations
of y.





Very surprisingly, Lanczos's algorithm can also be applied to linear systems
defined over finite fields. Of course, in finite fields, we cannot rely on convergence
arguments and need to run the algorithm till the end. However, if the
sequence (v_i) really forms a basis of the image vector space of M, everything
remains fine. Thus, Lanczos's algorithm works over finite fields, as long as the
construction of the sequence (v_i) does not abort prematurely. It can abort for
two main reasons. First, as in the real field case, some vector w_k may already
belong to the vector space spanned by the previously obtained vectors in (v_i).
Second, in a finite field, it may happen that \|w_i\|_M = 0 with w_i ≠ 0. This
comes from the fact that in a finite field, all computations need to be performed
modulo the characteristic of the field. To illustrate the problem, take
the row vector x = (1, 1); over the real field, its squared Euclidean norm is \|x\|² = 2;
however, over F2 this value is taken modulo 2 and is thus equal to 0. When the
characteristic is small, this problem occurs frequently and Lanczos's algorithm
needs to be modified into a block algorithm (see Section 3.4.1.3). However,
when the characteristic is a large prime p, this is a very rare event. More
precisely, due to the initial randomization, we can heuristically estimate the
probability of error at each step of the algorithm as 1/p. As a consequence,
the overall probability of failure is roughly n/p. Thus, for large values of p,
this probability is negligible.
When implementing Lanczos's algorithm over finite fields, in order to avoid
the computation of the square roots that appear in the computation of norms,
it is preferable to avoid normalizing the vectors v_i and instead to divide by
their norms where necessary. This is described in Algorithm 3.11.

3.4.1.2 Wiedemann's algorithm
Wiedemann's algorithm is another approach to find solutions of linear systems
using matrix-vector products. However, instead of computing an orthogonal
family of vectors, it aims at reconstructing a minimal polynomial.
Before presenting this algorithm, we need to recall a few facts about square
matrices and minimal polynomials.

3.4.1.2.1 Minimal polynomials of matrices   Given a square matrix A
over a field K and a univariate polynomial f in K[X], it is clear that we can
evaluate f at A, thus computing another matrix f(A) of the same dimension
as A. When f(A) is the zero matrix, we say that f annihilates A. Let I_A be
the set of polynomials of K[X] that annihilate A. In fact, this set I_A is an
ideal of K[X]. Indeed, if f and g both annihilate A, then for all polynomials
α and β, we see that αf + βg also annihilates A. Since I_A is an ideal of
univariate polynomials, if I_A is different from the zero ideal, there exists a
polynomial f_A, unique up to multiplication by a non-zero element of K, that
generates I_A. This polynomial f_A is called the minimal polynomial of A.
It remains to show that I_A is not the zero ideal. This is a simple corollary
of the Cayley-Hamilton theorem:








Algorithm 3.11 Lanczos's algorithm over finite fields
Require: Input vector x and routines for multiplications by M and ᵗM
Let X ← M · x
Initialize vector y to zero
Initialize vector v1 to random
Let w1 ← ᵗM M v1
Let N1 ← (w1|v1)
Let y ← (X|w1) v1/N1
Let v2 ← w1 − (w1|w1) v1/N1
Let w2 ← ᵗM M v2
Let N2 ← (w2|v2)
Let y ← y + (X|w2) v2/N2
for i from 3 to n do
  Let v3 ← w2 − (w2|w1) v1/N1 − (w2|w2) v2/N2
  Let w3 ← ᵗM M v3
  Let N3 ← (w3|v3)
  if N3 = 0 then
    Exit Loop.
  end if
  Let y ← y + (X|w3) v3/N3
  Let v1 ← v2, w1 ← w2, N1 ← N2.
  Let v2 ← v3, w2 ← w3, N2 ← N3.
end for
Let z ← M y
if z = x then
  Output: y is a solution of the system.
else
  Let Z ← M z
  if Z = X then
    Output: z − x is in the kernel of M.
  else
    Output: Something wrong occurred.
  end if
end if





THEOREM 3.1
For any square matrix A, let F_A, the characteristic polynomial of A, be defined
as F_A(X) = det(A − X · Id). Then, F_A annihilates A.


PROOF   See [Lan05].

3.4.1.2.2 Application to linear systems   Writing down the minimal
polynomial of A as:

\[ f_A(X) = \sum_{i=0}^{d} \alpha_i X^i, \qquad (3.22) \]

we can rewrite f_A(A) = 0 as:

\[ \sum_{i=0}^{d} \alpha_i A^i = 0. \qquad (3.23) \]

As a consequence, for any vector b, we find:

\[ \sum_{i=0}^{d} \alpha_i (A^i \cdot b) = 0. \qquad (3.24) \]

This implies that the sequence B defined as B_i = A^i · b satisfies a relation of
linear recurrence:

\[ B_{i+d} = -\frac{1}{\alpha_d} \sum_{j=0}^{d-1} \alpha_j B_{i+j}. \qquad (3.25) \]

If α_0 ≠ 0, then this linear recurrence can be used backward and in particular,
we may write:

\[ B_0 = -\frac{1}{\alpha_0} \sum_{i=1}^{d} \alpha_i B_i. \qquad (3.26) \]

Note that when α_0 = 0, we can factor X out of the minimal polynomial f_A and,
writing A · (f_A/X)(A) = 0, conclude that A is non-invertible. Indeed, otherwise
f_A/X would annihilate A, which by minimality of f_A is not possible.
From this remark, we can derive the basic idea of Wiedemann's algorithm:

• To solve Ax = y, build a sequence y, Ay, . . . , A^i y, . . .

• Find a recurrence relation in the above sequence.

• Remark that the sequence can be expanded on its left by adding x and
use the recurrence backward to recover x.

However, this basic idea, as we just presented it, suffers from a major obstruction.
In order to determine this recurrence relation, we need a number




of vectors at least equal to the length of the relation. Storing all these vectors
requires roughly the same amount of memory as storing a dense matrix
of the same dimensions as A. Of course, this is not acceptable for a sparse
linear algebra algorithm. To avoid this problem, we do not store all the vectors
in the sequence. Instead, we only keep their scalar product with some
fixed vector. Of course, the current vector needs to be kept in memory when
computing the next one; it is only erased after this computation. After computing
enough of these scalar products, roughly twice the dimension of A, we
can use the Berlekamp-Massey algorithm from Chapter 2 to recover the minimal
recurrence relation satisfied by the sequence of scalar products. Expressed as
a polynomial, this recurrence relation divides the minimal polynomial f_A. In
order to get rid of the bad case where we have a proper divisor, it is useful to
study more carefully the relation between these polynomials.
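Before turning to that analysis, the memory-saving trick just described can be illustrated by the following hypothetical sketch (added here, not the book's code): it computes the scalar sequence (u|b), (u|Ab), (u|A²b), . . . modulo the prime 46337 used elsewhere in this chapter, keeping only one vector of the sequence at a time. The routine matvec() is an assumed user-supplied sparse matrix-vector product and all names are illustrative.

#include <stdlib.h>
#include <string.h>

#define PRIME 46337

void wiedemann_sequence(void (*matvec)(const unsigned int *in, unsigned int *out),
                        const unsigned int *b, const unsigned int *u,
                        int n, unsigned int *seq /* 2n entries */)
{
  unsigned int *cur = malloc(n * sizeof(unsigned int));
  unsigned int *next = malloc(n * sizeof(unsigned int));
  int i, k;

  memcpy(cur, b, n * sizeof(unsigned int));
  for (i = 0; i < 2 * n; i++) {
    unsigned long long s = 0;
    for (k = 0; k < n; k++)                   /* seq[i] = (u | A^i b) mod p */
      s = (s + (unsigned long long)u[k] * cur[k]) % PRIME;
    seq[i] = (unsigned int)s;
    if (i + 1 < 2 * n) {                      /* advance to A^(i+1) b       */
      unsigned int *t;
      matvec(cur, next);
      t = cur; cur = next; next = t;
    }
  }
  free(cur);
  free(next);
}

The 2n scalars can then be fed to the Berlekamp-Massey algorithm of Chapter 2; only two length-n vectors are ever kept in memory, as described in the text.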
For this analysis, let us look at the following polynomials:

• f_A, the minimal polynomial of A.

• f_A^b, the minimal polynomial of the sequence S^b of vectors b, Ab, . . . , A^i b, . . .

• f_A^{b,u}, the minimal polynomial of the sequence T^{b,u} of values (b|u), (Ab|u),
. . . , (A^i b|u), . . .

It is clear that f_A annihilates S^b and that f_A^b annihilates T^{b,u}. As a consequence,
f_A^b divides f_A and f_A^{b,u} divides f_A^b. Note that when the constant term
of f_A^b or of f_A^{b,u} is zero, so is the constant term of f_A, and A is non-invertible.
The reverse is false in general. In fact, it is even possible to have f_A^{b,u} = 1, for
example when u = 0.
To solve a system Ay = x, it suffices to compute f_A^x. Indeed, assume that
f_A^x = \sum_{i=0}^{d} \alpha_i X^i with α_0 ≠ 0 and consider the vector:

\[ y = -\frac{1}{\alpha_0} \sum_{i=1}^{d} \alpha_i A^{i-1} x. \qquad (3.27) \]

Multiplying by A, we find that:

\[ Ay = -\frac{1}{\alpha_0} \sum_{i=1}^{d} \alpha_i A^{i} x = x, \qquad (3.28) \]

by definition of the polynomial f_A^x.
However, as said above, directly computing f_A^x would require storing the
sequence S^x and would be too costly in terms of memory. As a consequence,
Wiedemann's algorithm focuses on the computation of f_A^{x,u} for a random
(non-zero) vector u. Moreover, it uses probabilistic arguments to show that



