
Recursively compute R1 = M1 · N1
Add R1 to upper left quadrant of R, i.e., R[i, j] += R1[i, j]
Create a h1 × h1 matrix M2, with M2[i, j] = M[i, h1 + j] + M[h1 + i, h1 + j]
Create a h1 × h1 matrix N2, with N2[i, j] = N[h1 + i, j] + N[h1 + i, h1 + j]
Recursively compute R2 = M2 · N2
Add R2 to lower right quadrant of R, i.e., R[h1 + i, h1 + j] += R2[i, j]
Create a h1 × h1 matrix M3, with M3[i, j] = M[i, h1 + j] + M[h1 + i, j]
Create a h1 × h1 matrix N3, with N3[i, j] = N[h1 + i, j] − N[i, h1 + j]
Recursively compute R3 = M3 · N3
Add R3 to upper left quadrant of R, i.e., R[i, j] += R3[i, j]
Subtract R3 from lower right quadrant of R, i.e., R[h1 + i, h1 + j] −= R3[i, j]
Create a h1 × h1 matrix M4, with M4[i, j] = M[h1 + i, j]
Create a h1 × h1 matrix N4, with N4[i, j] = N[i, j] + N[h1 + i, j]
Recursively compute R4 = M4 · N4
Subtract R4 from upper left quadrant of R, i.e., R[i, j] −= R4[i, j]
Add R4 to lower left quadrant of R, i.e., R[h1 + i, j] += R4[i, j]
Create a h1 × h1 matrix M5, with M5[i, j] = M[i, h1 + j]
Create a h1 × h1 matrix N5, with N5[i, j] = N[i, h1 + j] + N[h1 + i, h1 + j]
Recursively compute R5 = M5 · N5
Add R5 to upper right quadrant of R, i.e., R[i, h1 + j] += R5[i, j]
Subtract R5 from lower right quadrant of R, i.e., R[h1 + i, h1 + j] −= R5[i, j]
Create a h1 × h1 matrix M6, with M6[i, j] = M[h1 + i, j] − M[h1 + i, h1 + j]
Create a h1 × h1 matrix N6, with N6[i, j] = N[h1 + i, j]
Recursively compute R6 = M6 · N6
Add R6 to lower right quadrant of R, i.e., R[h1 + i, h1 + j] += R6[i, j]
Subtract R6 from lower left quadrant of R, i.e., R[h1 + i, j] −= R6[i, j]
Create a h1 × h1 matrix M7, with M7[i, j] = M[i, j] − M[i, h1 + j]
Create a h1 × h1 matrix N7, with N7[i, j] = N[i, h1 + j]
Recursively compute R7 = M7 · N7
Add R7 to upper right quadrant of R, i.e., R[i, h1 + j] += R7[i, j]
Subtract R7 from upper left quadrant of R, i.e., R[i, j] −= R7[i, j]
Return the upper left n × n submatrix of R
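At the scalar level (h1 = 1) the seven products and quadrant updates above can be checked directly on 2 × 2 matrices. The sketch below is not the book's code; the definitions of M1 and N1 fall just before this excerpt, and the combination (a + c) · (a′ + b′) used for R1 is the one consistent with the recombination steps listed.

```c
/* One Strassen step at the scalar level, following the seven products
   R1..R7 and the quadrant updates of Algorithm 3.2.
   Quadrants: a = M[1,1], b = M[1,2], c = M[2,1], d = M[2,2],
   and primed variables for N. The expression for R1, (a+c)*(a'+b'),
   is inferred from the recombination steps (M1, N1 are defined just
   before this excerpt). */
void strassen2x2(int M[2][2], int N[2][2], int R[2][2])
{
    int a = M[0][0], b = M[0][1], c = M[1][0], d = M[1][1];
    int ap = N[0][0], bp = N[0][1], cp = N[1][0], dp = N[1][1];

    int R1 = (a + c) * (ap + bp);
    int R2 = (b + d) * (cp + dp);
    int R3 = (b + c) * (cp - bp);
    int R4 = c * (ap + cp);
    int R5 = b * (bp + dp);
    int R6 = (c - d) * cp;
    int R7 = (a - b) * bp;

    R[0][0] = R1 + R3 - R4 - R7;  /* upper left  */
    R[0][1] = R5 + R7;            /* upper right */
    R[1][0] = R4 - R6;            /* lower left  */
    R[1][1] = R2 - R3 - R5 + R6;  /* lower right */
}
```

Expanding the four combinations symbolically recovers the four entries of the ordinary 2 × 2 product, with 7 multiplications instead of 8.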

Ā© 2009 by Taylor and Francis Group, LLC

Linear algebra 83

Algorithm 3.3 Strassen matrix multiplication (rounding down)

Require: Input n × n matrices M and N
if n = 1 then
  Create R a 1 × 1 matrix with entry M[1, 1] · N[1, 1]
  Return R
end if
Let h1 ←− ⌊n/2⌋
Let n1 ←− 2h1
Create R a n1 × n1 matrix with zero entries
Prepare, perform and post-process 7 recursive calls as in Algorithm 3.2
if n is odd then
  Redefine R as n × n.
  for i from 1 to n1 do
    for j from 1 to n1 do
      Let R[i, j] ←− R[i, j] + M[i, n] · N[n, j]
    end for
  end for
  for i from 1 to n1 do
    Let R[i, n] ←− Σ_{k=1}^{n} M[i, k] · N[k, n]
    Let R[n, i] ←− Σ_{k=1}^{n} M[n, k] · N[k, i]
  end for
  Let R[n, n] ←− Σ_{k=1}^{n} M[n, k] · N[k, n]
end if
Return the n × n matrix R


84 Algorithmic Cryptanalysis

Winograd's formulas for multiplying 2 × 2 matrices also require 7 multiplications, as in Strassen's algorithm, but only 15 additions. Using the same notations, to multiply M by M′, the formulas are:

S1 = c + d,        S2 = S1 − a,
S3 = a − c,        S4 = b − S2,
S5 = b′ − a′,      S6 = d′ − S5,
S7 = d′ − b′,      S8 = S6 − c′,
Q1 = S2 · S6,      Q2 = a · a′,
Q3 = b · c′,       Q4 = S3 · S7,
Q5 = S1 · S5,      Q6 = S4 · d′,
Q7 = d · S8,
T1 = Q1 + Q2,      T2 = T1 + Q4 and

M · M′ = \begin{pmatrix} Q2 + Q3 & T1 + Q5 + Q6 \\ T2 − Q7 & T2 + Q5 \end{pmatrix}.

Figure 3.2: Winograd's formulas for matrix multiplication
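The formulas of Figure 3.2 can likewise be checked at the scalar level. This is an illustrative sketch, not the book's code; primed entries of M′ are held in the `*p` variables.

```c
/* Winograd's 7-multiplication, 15-addition formulas of Figure 3.2,
   applied at the scalar level: M = (a b; c d), M' = (a' b'; c' d'). */
void winograd2x2(int M[2][2], int Mp[2][2], int R[2][2])
{
    int a = M[0][0], b = M[0][1], c = M[1][0], d = M[1][1];
    int ap = Mp[0][0], bp = Mp[0][1], cp = Mp[1][0], dp = Mp[1][1];

    int S1 = c + d,   S2 = S1 - a,  S3 = a - c,   S4 = b - S2;
    int S5 = bp - ap, S6 = dp - S5, S7 = dp - bp, S8 = S6 - cp;

    int Q1 = S2 * S6, Q2 = a * ap, Q3 = b * cp, Q4 = S3 * S7;
    int Q5 = S1 * S5, Q6 = S4 * dp, Q7 = d * S8;

    int T1 = Q1 + Q2, T2 = T1 + Q4;

    R[0][0] = Q2 + Q3;       /* 8 additions for S1..S8, 2 for T1, T2, */
    R[0][1] = T1 + Q5 + Q6;  /* and 5 in the recombination: 15 total  */
    R[1][0] = T2 - Q7;
    R[1][1] = T2 + Q5;
}
```

Counting: S1 to S8 cost 8 additions, T1 and T2 cost 2, and the four output entries cost 5 more, for the advertised total of 15.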

exhaustive and is here simply to highlight some unexpected technical details that arise when implementing matrix multiplication. We consider two cases of matrix multiplication which are cryptographically relevant. One case uses multiplication of machine-size integers; to avoid overflows when multiplying numbers, we chose to work in Fp, where p = 46337 is the largest prime such that 2p² fits in an unsigned 32-bit number. The second case multiplies Boolean matrices whose sizes are multiples of 32. In this second case, elementary block matrices of size 32 × 32 are multiplied using Program 3.5. Both programs are too long to print here and are available on the book's website.

In both cases, another consideration is to decide whether to use the rounding-up approach of Algorithm 3.2 or the rounding-down approach of Algorithm 3.3 when implementing Strassen's algorithm. Both approaches work equally well for matrices whose dimensions (or numbers of blocks over F2) are powers of two. Similarly, each approach has a worst case for which the other approach would yield a much faster alternative. Namely, using the rounding-up approach for a matrix of dimension of the form 2^t + 1 is a bad idea. Likewise, using the rounding-down approach for a matrix of dimension 2^t − 1 is inadequate. Moreover, rounding up costs more than rounding down on average. Thus, to minimize the adverse consequences of rounding, we implemented both approaches within each of our matrix multiplication codes, choosing the


rounding-up option only for dimensions congruent to −1 modulo some small power of 2. This choice allows us to use one rounding-up step instead of four rounding-down steps when possible. It is not optimal, but gives a nice, easy-to-implement compromise, and the resulting running times behave reasonably nicely as a function of the dimension.
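The dimension test behind this compromise can be sketched as a tiny helper; the modulus 8 below is a hypothetical tuning constant, not a value prescribed by the text.

```c
/* Sketch of the rounding rule described above: prefer the rounding-up
   variant only when the dimension is congruent to -1 modulo a small
   power of two (here 8, an assumed tuning constant). */
int prefer_round_up(int n)
{
    return (n % 8) == 7;   /* n == -1 (mod 8) */
}
```

For example, dimensions 7, 15 and 23 would take the rounding-up branch, while 8 and 9 would round down.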

In both cases, we compare the running time of our implementation of Strassen's algorithm with an elementary matrix multiplication. Over F2 this elementary multiplication uses a block-by-block approach in order to make use of the fast code we have for 32 × 32 matrices. Programs 3.6 and 3.7 contain the elementary matrix multiplication used as a reference. One drawback of Program 3.7 is that it performs a modular reduction after each multiplication, which is quite costly. A better approach is to perform the modular reduction once, at the end of each scalar product, as in Program 3.8. However, this limits the size of the modulus that can be handled with a fixed integer size. Similarly, in Program 3.6, instead of using the 32 × 32 matrix multiplication routine in a black-box manner, we can combine the inner optimizations together with the block matrix structure; see Exercise 4. Note that depending on the specific machine or compiler, the programs can sometimes be improved by reversing the order of the two outer loops.
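As a sketch of that loop reversal, here is a hypothetical variant of the elementary multiplication over Fp, using the same column-major access() convention as Programs 3.7 and 3.8, with the two outer loops exchanged; whether it helps depends on the machine and compiler.

```c
/* Variant of the elementary multiplication over F_p (p = 46337) with
   the two outer loops exchanged: the outer loop now walks the columns
   of B and Res. This is an illustrative sketch, not the book's code. */
#define TYPE unsigned short
#define MODULO 46337
#define access(M,i,j,size) M[(i)+((j)*(size))]

void matmul_ji(TYPE *A, TYPE *B, TYPE *Res, int size)
{
    int i, j, k;
    unsigned int tmp;
    for (j = 0; j < size; j++)        /* outer loops exchanged: j first */
        for (i = 0; i < size; i++) {
            tmp = 0;
            for (k = 0; k < size; k++)
                tmp = (tmp + access(A,i,k,size) * access(B,k,j,size)) % MODULO;
            access(Res,i,j,size) = tmp;
        }
}
```

With the column-major layout of the access() macro, fixing j in the outer loop means that Res is written column by column, in consecutive memory order.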

Over F2, pushing the recursion in Strassen's algorithm all the way down to 32 × 32 matrices, the resulting running times almost perfectly reflect the asymptotic analysis. The running times are given in seconds in Table 3.2 and shown on a graph in Figure 3.3. Both sets of measured running times closely follow the theoretical predictions and can be approximated by curves of the form t0 · (x/X)^3 and t1 · (x/X)^{log2 7}, where t0 and t1 are the respective running times of the two multiplication programs on the last data point available: X. Moreover, Strassen's algorithm quickly becomes more effective than the ordinary matrix multiplication. It is interesting to remark that near 512 blocks, our rounding strategy does not behave well, which explains the irregular running times. Also note that these codes can still be greatly improved. In particular, enlarging the basic block to 64 or 128 bits would be a very good idea. It is also possible to improve the basic block multiplication with a time-memory tradeoff algorithm called the algorithm of four Russians [ADKF70]. This is especially useful here since each basic block appears in several multiplications. With these improvements, the cutoff point for Strassen's algorithm would be higher.

Over Fp, we experimented with several different values of the cutoff parameter, namely 32, 64 and 128. It turns out that, with the simple basic implementation we have, 64 seems to be a reasonable choice for the cutoff value. The running times are shown on a graph in Figure 3.4. Note that the basic implementation we are using is not optimized; as a consequence, by writing it more carefully, it would be possible to speed up the matrix multiplication in Fp by at least an order of magnitude. The reader can experiment with this on the various computer algebra systems listed in the Preface.

With our implementation, Strassen's algorithm over Fp also behaves as ex-


Program 3.6 C code for elementary 32n × 32n matrix multiplication over F2

#include <stdio.h>
#include <stdlib.h>

#define WORD unsigned int
#define DIM 32 /* each basic block is a 32x32 Boolean matrix: 32 words of 32 bits */
#define access(M,i,j,bsize) (&M[((i)*(bsize)*DIM)+((j)*DIM)])

/* External procedure for 32x32 boolean matrix multiplication */
extern void Mul(WORD res[DIM], WORD mat1[DIM], WORD mat2[DIM]);

void matmul(WORD * A, WORD * B, WORD *Res, int bsize) {
  int i,j,k,l;
  WORD tmp[DIM];
  for(i=0;i<DIM*bsize*bsize;i++)
    Res[i]=0;
  for(i=0;i<bsize;i++)
    for(j=0;j<bsize;j++) {
      for(k=0;k<bsize;k++) {
        Mul(tmp, access(A,i,k,bsize), access(B,k,j,bsize));
        for(l=0;l<DIM;l++) {
          access(Res,i,j,bsize)[l]^=tmp[l];
        }
      }
    }
}

Num. of blocks (n)   Ordinary mult.   Strassen's mult.
        16                 0.01            < 0.01
        24                 0.02              0.01
        32                 0.04              0.03
        48                 0.13              0.08
        64                 0.29              0.20
        96                 0.98              0.60
       128                 2.32              1.41
       192                 7.88              4.22
       256                18.79              9.94
       384                63.97             29.75
       511               147.88             69.94

Table 3.2: Times for (32n) × (32n) Boolean matrix multiplication


[Graph omitted: running time in seconds (up to 150) against dimension (128 to 512), with an "Ordinary" curve and a lower "Strassen" curve.]

Figure 3.3: Performance of Strassen's multiplication over F2


Program 3.7 C code for elementary matrix multiplication over Fp

#include <stdio.h>
#include <stdlib.h>

#define TYPE unsigned short
#define MODULO 46337 /* Also works with primes up to 65521 */
#define access(M,i,j,size) M[(i)+((j)*(size))]

void matmul(TYPE * A, TYPE * B, TYPE *Res, int size) {
  int i,j,k;
  unsigned int tmp;
  for(i=0;i<size;i++)
    for(j=0;j<size;j++) {
      tmp=0;
      for(k=0;k<size;k++) {
        tmp=(tmp+access(A,i,k,size)*access(B,k,j,size))%MODULO;
      }
      access(Res,i,j,size)=tmp;
    }
}

pected. For large values of the dimension, however, we can see some cyclic variations around the theoretical behavior. This probably reflects the fact that the difference in terms of performance between rounding up and rounding down is a function of the dimension. It shows once again that the rounding strategy can be improved. On the contrary, the behavior of the elementary algorithm is much more surprising. Instead of following a single cubic curve, the running times seem to follow one cubic curve, then switch to another and possibly a third. Moreover, for a small number of specific values of the dimension, the behavior is even worse. This mysterious behavior calls for an explanation. In fact, it is due to cache effects when reading the matrices from memory during multiplication. These effects are a side-effect of cache mechanisms, a frequently used tool in computer science. The common basic idea of cache mechanisms is to keep local copies of data elements in order to avoid fetching them again when they are requested a second time. This can be used to speed up all kinds of data accesses. Frequently encountered applications are disk or webpage accesses. More information about the use of cache in modern processors can be found in Figure 3.5.

To understand why cache effects arise in our implementation of elementary multiplication, we can remark that the two innermost loops on j and k completely read the matrix B and scan the same line of A over and over. On the one hand, the total size of B expressed in bytes is 2n² for an n × n matrix and


for values of n within a reasonable range, it is comparable to the size of either the first (up to n ≈ 150) or the second level (up to n ≈ 1000) of memory cache. On the other hand, the size of a single line of A is 2n and thus should comfortably fit into the first level of cache. To explain the misbehaviors of the code, we should look for transitions of B from the first level of cache to the second level, or from the second level of cache to the main memory. This first cache effect corresponds to the progressive slow-down of the code as the dimension increases. To explain the second effect, we should also look for exceptional cases where, for technical reasons, a line of A cannot be kept in the first level of cache. This happens when many addresses with a large number of equal low-order bits are required at the same time. This phenomenon is due to the low-level details of the implementation of the cache mechanism. In fact, each memory address can only be stored in a small number of possible different locations in the cache. These authorized locations are determined by looking at a number of low-order bits (say 20 or a bit less). When the dimension of the matrix is divisible by a large power of 2, due to the memory representation we are using, several memory locations may compete for the same cache locations. In this case, the same line of A needs to be fetched over and over again from memory. Moreover, reading A in this context is even slower than reading B from main memory, because B takes advantage of automatic prefetching mechanisms.

These cache effects are more visible on the faster implementation that uses fewer modular reductions. The reason is that the memory accesses in this implementation use up a larger fraction of the running time. Note that the effect of cache misses greatly depends on the considered computer architecture. More recent computers seem to perform better in this respect.

All these cache misbehaviors, or at least most of them, can be avoided by reorganizing the loops and thus the memory accesses. However, this is not straightforward and it can take a long time for a programmer to find a working approach. It is extremely interesting to remark that, thanks to its use of a divide-and-conquer approach, Strassen's algorithm neatly avoids this problem.

3.2.2 Asymptotically fast matrix multiplication

After the discovery of Strassen's algorithm, the question of finding an optimal asymptotic algorithm for matrix multiplication became an essential issue. Great advances were made and the exponent of matrix multiplication was lowered a lot. These improvements rely on increasingly complex formulas for multiplying various tensor forms. In particular, the use of approximate formulas was essential to lower the number of necessary multiplications. For example, a method of Bini, Capovani, Romani and Lotti [BCRL79] allows one to multiply a 3 by 2 matrix and a 2 by 3 matrix using only 10 multiplications of low-degree polynomials, and leads to a complexity of O(n^{2.7799}). These methods are discussed in detail in a book by Pan [Pan84]. The best current asymp-


Program 3.8 C code for matrix mult. over Fp with fewer modular reductions

#include <stdio.h>
#include <stdlib.h>

#define TYPE unsigned short
#define MODULO 46337
#define access(M,i,j,size) M[(i)+((j)*(size))]

void matmul(TYPE * A, TYPE * B, TYPE *Res, int size) {
  int i,j,k;
  unsigned int tmp;
  for(i=0;i<size;i++)
    for(j=0;j<size;j++) {
      tmp=0;
      for(k=0;k<size;k++) {
        tmp=(tmp+access(A,i,k,size)*access(B,k,j,size));
        if (tmp>=(MODULO*MODULO)) tmp-=MODULO*MODULO;
      }
      access(Res,i,j,size)=tmp%MODULO;
    }
}


[Graph omitted: running time in seconds (up to 500) against dimension (128 to 1024), with three curves: "Ordinary", "Few reductions" and "Strassen".]

Figure 3.4: Performance of Strassen's multiplication over Fp


In modern processors, the need for cache mechanisms arose from the fact that processor speeds have been progressing much faster than memory speeds. In this context, a CPU cache is a piece of fast memory located as close to the processor unit as possible. Of course, the size of caches is much smaller than the size of the main memory of the computer. As a consequence, it is necessary to devise efficient heuristic techniques to make sure that in many applications the cache will hold the necessary piece of data when it is required. This good event is called a cache hit. The bad case, where a required piece of data is not present in the cache, is called a cache miss.

Each entry in a CPU cache contains both a copy of a data piece from the main memory and the address of this data in the memory. To avoid using a large proportion of the fast cache memory simply to store addresses, CPU caches do not individually store very small pieces of memory. The basic unit is called a cache line and can contain several integers. To simplify the management of the CPU cache, the cache lines are usually aligned, i.e., they can only start at a position in memory whose address is a multiple of the size of a cache line. Moreover, cache mechanisms cannot use overly complex algorithmic techniques to detect whether a piece of data is already present in the cache or not. Thus, to simplify the issue, cache lines are often grouped into several smaller caches depending on the bit patterns of their addresses. These smaller caches are called banks. This allows the CPU to load more than one piece of data per clock cycle as long as the data comes from different banks. Otherwise, there is a bank conflict which slows things down. Similarly, since many applications use a limited amount of memory, cache accesses often ignore the higher bits of addresses during the initial data search. This causes another kind of conflict when two pieces of data located at positions whose addresses differ by a large power of 2 are needed.

CPU caches are further complicated by the fact that the first level of cache is usually too small and that a second level of bigger but slower cache is often added on top of it. They are also usually combined with prefetching mechanisms, which try to predict future memory accesses and load the corresponding data in advance into the cache; this works quite well with accesses to regularly spaced memory addresses. In multitasking operating systems, the conversion of the virtual memory addresses seen by an individual task into physical memory addresses can become extremely slow if the tables needed for the conversion cannot be held in the first level of cache. Another important issue is that cache mechanisms may induce security concerns and allow an adversarial program to learn secret data, such as cryptographic keys, which it should not be able to access [AS08, AScKK07, OST06].

Algorithms and computer programs that are designed to perform well in the presence of cache mechanisms, without depending on their precise parameters, are called cache-oblivious [FLPR99].

Figure 3.5: Principles of cached memory in processors


totic result is the method of Coppersmith and Winograd [CW90], which yields a matrix multiplication algorithm running in time O(n^{2.376}). This algorithm has recently been reformulated in a group-theoretic setting by Cohn, Kleinberg, Szegedy and Umans [CKSU05]. However, no recent advances have been made concerning the exponent of the complexity of matrix multiplication. A widely held opinion is that the correct exponent is 2 and that the asymptotic complexity of matrix multiplication probably is O(n² log(n)^t) for some small integer t.

From a practical point of view, these algorithms with an asymptotic complexity better than Strassen's are not applicable. Indeed, they involve extremely large constant overheads, due to the highly complex basic formulas that are used. In practice, none of the algorithms with asymptotic complexity better than Strassen's has been reported as useful. Thus, as far as practical aspects are considered, and in particular for cryptanalysis, the best currently achievable exponent for matrix multiplication is log2 7.
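The exponent log2 7 comes from the usual divide-and-conquer recurrence for Strassen's algorithm: seven half-size products plus a quadratic number of additions and subtractions at each level:

```latex
T(n) = 7\,T(n/2) + c\,n^2
\quad\Longrightarrow\quad
T(n) = O\!\left(n^{\log_2 7}\right) \approx O\!\left(n^{2.807}\right),
```

since log2 7 ≈ 2.807 > 2, the seven recursive products dominate the quadratic additive work at every level of the recursion.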

3.2.3 Relation to other linear algebra problems

Since all linear algebra problems can be solved using matrix multiplication, addition and inverse, in order to relate the general complexity of linear algebra to the complexity of matrix multiplication, it suffices to determine the complexity of inversion from the complexity of matrix multiplication. Indeed, addition of n × n matrices can be done in time n², and thus it cannot increase the exponent in the complexity of linear algebra problems. In this section, we assume that we are given a matrix multiplication algorithm with complexity O(n^ω), for some constant 2 < ω ≤ 3, and we would like to show that matrix inversion can also be performed in time O(n^ω). As with Strassen's algorithm, this can be obtained by working with 2 × 2 block matrices. Write:

M = \begin{pmatrix} A & B \\ C & D \end{pmatrix},   (3.2)

and remark that when D is invertible, then multiplying M on the right by

N = \begin{pmatrix} Id & 0 \\ -D^{-1}C & Id \end{pmatrix}   (3.3)

yields:

M N = \begin{pmatrix} A - BD^{-1}C & B \\ 0 & D \end{pmatrix}.   (3.4)

In this equation, A − BD^{−1}C is called the Schur complement of M; in the sequel, we denote it by S. Since a triangular matrix is easy to invert, indeed:

\begin{pmatrix} U & V \\ 0 & W \end{pmatrix}^{-1} = \begin{pmatrix} U^{-1} & -U^{-1}VW^{-1} \\ 0 & W^{-1} \end{pmatrix},   (3.5)


we can compute the inverse of M as:

M^{-1} = N \begin{pmatrix} S & B \\ 0 & D \end{pmatrix}^{-1}.   (3.6)

This equation only requires two inversions of half-size matrices (D and S) and some matrix multiplications and additions. As a consequence, it can be used as a basis for a recursive matrix inversion algorithm with asymptotic running time of the same form as the underlying matrix multiplication algorithm, i.e., O(n^ω).
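The claimed O(n^ω) bound follows from the induced recurrence: the two half-size inversions of D and S contribute the factor 2, while the multiplications and additions of Equations (3.3) to (3.6) are absorbed into the constant c:

```latex
I(n) \le 2\,I(n/2) + c\,n^{\omega}
\quad\Longrightarrow\quad
I(n) \le c\,n^{\omega} \sum_{k \ge 0} 2^{k(1-\omega)} = O(n^{\omega}),
```

where the geometric sum converges because ω > 1; since ω > 2, the top-level multiplications dominate the cost of the recursion.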

The use of the Schur complement is a powerful tool for numerical analysis; many of its applications in this context are described in Zhang's book [Zha05].

It is also possible to use the Schur complement to invert matrices over finite fields. Two methods are possible. The first option is to lift the matrix to a larger field. For example, a matrix modulo a prime p can be lifted to an integer matrix. However, this approach is costly, since the inverse of the integer matrix may involve large denominators. The other option is to compute the Schur complement directly in the finite field. In this case, a specific difficulty arises: we need to make sure that D is invertible at each step of the algorithm. Indeed, over the finite field Fq, even a random matrix is not guaranteed to have full rank. This is most critical when considering matrices over F2, because a random square matrix of dimension n over F2 has full rank with probability \prod_{i=1}^{n} (1 − 2^{−i}), which is close to 0.29 when n becomes large.
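This product converges quickly, as a short numerical check confirms (an illustrative computation, not from the book):

```c
#include <math.h>

/* Numerically evaluate prod_{i=1}^{n} (1 - 2^{-i}), the probability that
   a random n x n matrix over F_2 has full rank; as n grows it converges
   to roughly 0.2888. */
double full_rank_prob(int n)
{
    double p = 1.0;
    for (int i = 1; i <= n; i++)
        p *= 1.0 - ldexp(1.0, -i);   /* factor 1 - 2^{-i} */
    return p;
}
```

Already at n = 10 the product is within about 10^{-3} of its limit, so "close to 0.29" applies to all but the smallest dimensions.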

3.3 Gaussian elimination algorithms

When working with matrices, frequently encountered problems are to solve a linear system of equations or to invert a matrix. In Section 3.2.3 we described a recursive approach to matrix inversion. Here, we consider a more direct approach: Gaussian elimination algorithms. Since the problems of solving systems of equations and of inverting matrices are related, it is convenient to start with the simpler problem: solving linear systems of equations.

Gaussian elimination works in two phases. During the first phase, called Gauss's pivoting, we progressively modify the original system of equations using reversible transforms, in order to turn it into a triangular system of equations. During the second phase, thanks to the triangular form, it becomes possible to determine, one at a time, the value of each unknown.

Gaussian elimination is quite simple to describe; however, when writing down a complete algorithm, some technicalities arise in order to avoid any division by zero. To avoid these technicalities, we initially write down slightly incorrect versions of the two phases in order to outline the basic idea. To simplify the presentation and avoid dealing with the representation of the


linear equations themselves, we simply assume that each equation is given as an equality between a linear combination of the variables xi given on the left-hand side and a constant value given on the right-hand side. Moreover, we assume that we have access to elementary routines that compute the addition or subtraction of two equations, multiply an equation by a given scalar, and access individual coefficients within an equation. The first phase in simplified form is given as Algorithm 3.4 and the second phase as Algorithm 3.5. The action of the key ingredient of Algorithm 3.4, pivoting, on the matrix of coefficients of a linear system is illustrated in Figure 3.6.

Another application of Gaussian elimination is to compute the determinant of a matrix. Indeed, the first phase does not modify the determinant and, once we have a triangular matrix, it suffices to compute the product of the diagonal elements to obtain the determinant.

Algorithm 3.4 Triangularization of a linear system (simplified, incorrect)
Require: Input linear system of n equations Ei in n unknowns xi
for i from 1 to n − 1 do
  Let P be the non-zero coefficient of xi in Ei
  for j from i + 1 to n do
    Let C be the coefficient of xi in Ej
    Ej ←− Ej − (C/P) · Ei
  end for
end for
Output modified upper triangular system of n equations Ei

Algorithm 3.5 Backtracking to solve a triangular system
Require: Input triangular linear system of n equations Ei in n unknowns xi
Create an array X of n elements to store the values of variables xi
for i from n down to 1 do
  Let V be the constant term in equation Ei
  for j from i + 1 to n do
    Let C be the coefficient of xj in Ei
    V ←− V − C · X[j]
  end for
  Let P be the non-zero coefficient of xi in Ei
  Let X[i] ←− V /P
end for
Output array of determined values X
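Over Fp with p = 46337, as elsewhere in this chapter, Algorithm 3.5 can be sketched as follows; the flat row-major layout and the Fermat-based division are implementation choices of this sketch, not the book's code.

```c
/* Sketch of Algorithm 3.5 over F_p (p = 46337). The triangular system E
   is stored row-major as n rows of n+1 entries: the coefficient of x_j
   in equation i at E[i*(n+1)+j], the constant term at E[i*(n+1)+n].
   Division by the pivot uses Fermat's little theorem: 1/P = P^(p-2) mod p. */
#define PRIME 46337

static unsigned int inv_mod(unsigned int a)
{
    unsigned long long b = a, r = 1;     /* square-and-multiply a^(p-2) */
    unsigned int e = PRIME - 2;
    while (e) {
        if (e & 1) r = (r * b) % PRIME;
        b = (b * b) % PRIME;
        e >>= 1;
    }
    return (unsigned int)r;
}

void back_substitute(int n, const unsigned int *E, unsigned int *X)
{
    for (int i = n - 1; i >= 0; i--) {
        unsigned long long V = E[i * (n + 1) + n];   /* constant term */
        for (int j = i + 1; j < n; j++)              /* V <- V - C * X[j] */
            V = (V + (unsigned long long)(PRIME - E[i * (n + 1) + j]) * X[j])
                % PRIME;
        X[i] = (unsigned int)((V * inv_mod(E[i * (n + 1) + i])) % PRIME);
    }
}
```

Subtraction modulo p is done as an addition of (p − C) · X[j] to keep every intermediate value non-negative in unsigned arithmetic.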


\begin{pmatrix}
a_{1,1} & \times & \cdots & \times & \times & \cdots & \times \\
0 & a_{2,2} & \cdots & \times & \times & \cdots & \times \\
\vdots & \vdots & \ddots & \vdots & \vdots & & \vdots \\
0 & 0 & \cdots & a_{i,i} & \times & \cdots & \times \\
0 & 0 & \cdots & a_{i+1,i} & \times & \cdots & \times \\
\vdots & \vdots & & \vdots & \vdots & & \vdots \\
0 & 0 & \cdots & a_{n,i} & \times & \cdots & \times
\end{pmatrix}
\longrightarrow
\begin{pmatrix}
a_{1,1} & \times & \cdots & \times & \times & \cdots & \times \\
0 & a_{2,2} & \cdots & \times & \times & \cdots & \times \\
\vdots & \vdots & \ddots & \vdots & \vdots & & \vdots \\
0 & 0 & \cdots & a_{i,i} & \times & \cdots & \times \\
0 & 0 & \cdots & 0 & \text{(updated)} & \cdots & \times \\
\vdots & \vdots & & \vdots & \vdots & & \vdots \\
0 & 0 & \cdots & 0 & \times & \cdots & \times
\end{pmatrix}

Figure 3.6: Effect of a pivoting step

Our simplified algorithms are incorrect in general. They work correctly under several important assumptions. The first of these assumptions is that the input system has n equations in n unknowns and a unique solution. In general, this need not be the case: linear systems may have more unknowns than equations, or the reverse. And, as recalled in Chapter 2, even systems of n equations in n unknowns do not necessarily have a unique solution. If their kernel is non-zero, they may have none or infinitely many. The next assumption is that whenever we request the coefficient of xi in equation Ei at some step of the algorithm, this value is non-zero. This assumption is highlighted in boldface type in the algorithms.

When the first assumption does not hold, i.e., when the system of equations does not have full rank, printing out an error message may, at least for now, be an acceptable solution. However, the second assumption, that no zeros are encountered as denominators throughout the algorithm, is a technical assumption that does not reflect any underlying mathematical invariant of the linear system. In order to better understand its role, let us discuss the inner working of Algorithm 3.4. This algorithm has a main loop where the variable i is used to successively go through all equations. At each step, we take the coefficient of xi in the current version of equation Ei. Thanks to the work already performed by the algorithm, all coefficients of the previous variables x1 up to xi−1 are zero in Ei. Assuming that the coefficient of xi is non-zero, the inner loop indexed by j then modifies the system to make xi vanish in all subsequent equations. It is important to note that this transformation, called pivoting, is reversible and thus does not change the set of solutions of the linear system. As a consequence of the successive pivoting steps, after the algorithm execution, the linear system becomes upper triangular. The non-zero coefficient of xi in Ei used for pivoting is called the pivot. Taking the i-th pivot from equation Ei comes naturally when writing down the algorithm, but it is unessential to its inner working. In truth, for each pivoting step it suffices to find a pivot, i.e., a non-zero coefficient, for xi in any equation that has not yet been used for this purpose. In fact, as long as the system of equations is


invertible, at each step of the algorithm it is always possible to find at least one equation Ej with j ≥ i such that the coefficient of xi in Ej is non-zero. Thus, by reordering the equations, we can modify Algorithm 3.4 into Algorithm 3.6. When working over finite fields, this change suffices to obtain a fully functional Gaussian elimination algorithm. Over the real or complex fields, this is not the end of the story. Since this is outside the scope of this book, let us simply state that in this case the pivot should be chosen with even more care, in order to minimize the loss of accuracy during the algorithm execution.

Algorithm 3.6 Triangularization of a linear system
Require: Input linear system of n equations Ei in n unknowns xi
for i from 1 to n − 1 do
  for j from i to n do
    Let P be the coefficient of xi in Ej
    if P ≠ 0 then
      Exit Loop
    end if
  end for
  if P = 0 then
    Output 'Non-invertible system', Exit Algorithm
  end if
  Exchange equations Ei and Ej
  Let P be the non-zero coefficient of xi in Ei
  for j from i + 1 to n do
    Let C be the coefficient of xi in Ej
    Ej ←− Ej − (C/P) · Ei
  end for
end for
Output modified system of n equations Ei
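A possible rendering of Algorithm 3.6 over Fp (p = 46337); the augmented row-major layout and the Fermat-based division are, again, choices of this sketch rather than the book's.

```c
/* Sketch of Algorithm 3.6 over F_p (p = 46337). E is a row-major
   augmented system: n rows of n+1 entries (coefficients, then the
   constant term), reduced in place to upper triangular form.
   Returns 0 on success, -1 when the system is not invertible. */
#define PRIME 46337

static unsigned int inv_mod(unsigned int a)
{
    unsigned long long b = a, r = 1;     /* a^(p-2) mod p, p prime */
    unsigned int e = PRIME - 2;
    while (e) {
        if (e & 1) r = (r * b) % PRIME;
        b = (b * b) % PRIME;
        e >>= 1;
    }
    return (unsigned int)r;
}

int triangularize(int n, unsigned int *E)
{
    int w = n + 1;
    for (int i = 0; i < n - 1; i++) {
        int piv = -1;
        for (int j = i; j < n; j++)      /* look for a non-zero pivot */
            if (E[j * w + i] != 0) { piv = j; break; }
        if (piv < 0) return -1;          /* non-invertible system */
        if (piv != i)                    /* exchange equations E_i, E_piv */
            for (int k = 0; k < w; k++) {
                unsigned int t = E[i * w + k];
                E[i * w + k] = E[piv * w + k];
                E[piv * w + k] = t;
            }
        unsigned long long pinv = inv_mod(E[i * w + i]);
        for (int j = i + 1; j < n; j++) {    /* E_j <- E_j - (C/P) E_i */
            unsigned long long f = (E[j * w + i] * pinv) % PRIME;
            for (int k = i; k < w; k++)
                E[j * w + k] = (unsigned int)
                    ((E[j * w + k] + (PRIME - f) * E[i * w + k]) % PRIME);
        }
    }
    return (E[(n - 1) * w + (n - 1)] != 0) ? 0 : -1;
}
```

The final check on the last diagonal entry catches the case where all pivoting steps succeed but the system still fails to be invertible.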

Once the linear system is transformed into triangular form, finding the value of each variable within the unique solution is a simple matter. Due to the triangular form of the transformed system, the last equation relates the value of a single variable, the last one, to a constant. Thus, dividing the constant in this equation by the coefficient of the last variable xn, we recover its value. Once the value of xn is known, we can substitute this value into the previous equation, which only contains xn and xn−1. After the substitution, we can recover the value of xn−1. Following this approach, we successively determine xn−2 and all the other variables. Algorithm 3.5 uses this approach.


3.3.1 Matrix inversion

With Gaussian elimination as a tool to solve invertible linear systems of equations, we can now turn to the problem of computing the inverse of a matrix. Remember that M is invertible if and only if there exists a matrix N such that: M N = Id. Here, Id denotes the identity matrix, with a diagonal of ones and zeros everywhere else, which is the neutral element in the multiplicative group of n × n matrices. Viewing the matrix N as n column vectors N^{(i)}, each such vector satisfies a linear equation: M · N^{(i)} = ∆^{(i)}, where ∆^{(i)} is the vector whose coordinates are 0 everywhere except on line i, where the coefficient is 1. Solving these n linear systems of equations, we thus invert M. However, from a complexity point of view, this approach is not satisfying. Instead, it is preferable to use the similarity between all these systems and solve them all at once. Indeed, the sequence of operations in Algorithms 3.5 and 3.6 does not depend on the constant side of the equations. This allows us to perform these operations in parallel on these n related systems.

To illustrate these ideas in the context of 32 × 32 Boolean matrices as in Section 3.1, we give in Program 3.9 a C code compatible with the representation, input and output routines given in the matrix multiplication Program 3.2.

3.3.2 Non-invertible matrices

Linear systems of equations encountered in cryptography do not necessarily have full rank. When faced with such a system, Gaussian elimination is also very useful. However, we need to change and complete our algorithms. Our goal is to find whether a given system has solutions and, at least, to compute a single solution. More generally, for a non-invertible square matrix M, we aim at computing its kernel Ker(M) and image Im(M).

Since these sets are linear subspaces, they can be very large, or even infinite when the base field is infinite. Thus, it is not conceivable to represent them by a list of elements; instead, we would like to obtain efficient representations such as linear bases for these subspaces.

Remember that a linear system of equations, written in matrix form as M x = y, with M non-invertible, has the following properties:

• If y does not belong to the image of M, then the equation has no solution.

• If y belongs to the image of M, then the set of solutions is an affine subspace.

• More precisely, starting from an arbitrary solution x0 of the system, the set of solutions is x0 + Ker(M).

When solving a single system for a non-invertible matrix M, it is convenient to write a variation of Algorithms 3.5 and 3.6 to solve the system. When faced with many systems of equations involving a single matrix M, it is preferable to first compute global information about the matrix: find its rank and describe


Linear algebra 99

Algorithm 3.7 Matrix inversion

Require: Input n × n matrix M
Initialize matrix N to identity
for i from 1 to n − 1 do
for j from i to n do
Let P = Mj,i
if P ≠ 0 then
Exit Loop
end if
end for
if P = 0 then
Output ‘Non-invertible matrix’, Exit Algorithm
end if
Exchange lines i and j in M
Exchange lines i and j in N
Divide lines i of M and N by P
for j from i + 1 to n do
Let C = Mj,i
Subtract C times line i of M from line j of M
Subtract C times line i of N from line j of N
end for
end for
Assert: M is upper triangular with a diagonal of 1
for i from n down to 1 do
for j from 1 to i − 1 do
Let C = Mj,i
Subtract C times line i of M from line j of M
Subtract C times line i of N from line j of N
end for
end for
Assert: M is the identity matrix
Output N the inverse of input matrix M


Program 3.9 Inversion of 32 × 32 matrix over F2

/* Warning: mat is transformed during MatInv */
int MatInv(WORD mat[DIM], WORD inv[DIM])
{
  int piv, c;
  WORD val, vali, mask;

  for (piv = 0, mask = 1; piv < DIM; piv++, mask <<= 1)
    inv[piv] = mask;
  for (piv = 0, mask = 1; piv < DIM; piv++, mask <<= 1) {
    for (c = piv; c < DIM; c++)
      if (mask & mat[c]) break;
    if (c >= DIM) return (FALSE);
    val = mat[c]; mat[c] = mat[piv]; mat[piv] = val;
    vali = inv[c]; inv[c] = inv[piv]; inv[piv] = vali;
    for (c = 0; c < DIM; c++)
      if ((c != piv) && (mask & mat[c])) {
        mat[c] ^= val; inv[c] ^= vali;
      }
  }
  return (TRUE);
}

its kernel and image. This global information then allows fast resolution of the systems involving M. This approach is similar to the computation of a matrix inverse with Algorithm 3.7.

Let us start with the single system case. Since the system is non-invertible, we know that running Algorithm 3.6 on it produces the error message ‘Non-invertible system.’ This occurs when no pivot can be found for a variable xi. However, at that point, since xi has a zero coefficient on all lines from i to n, the triangularization is in some sense already complete for line i, except that there is a zero on the diagonal at this position, instead of a 1. Thus, instead of aborting, we could simply skip to the next iteration of the main loop. However, if we do not make sure that the current line is going to be considered when looking for the next pivot, then the backtracking phase may incorrectly claim that the system does not have a solution. To avoid this, the simplest approach is to renumber the variables, pushing those without a pivot toward the end. With this simple modification, all systems of equations can be written, after renumbering the variables, in a nice triangular form, possibly with a bunch of zeros at the lower right end of the diagonal. Once such a system goes as input to Algorithm 3.5, it is easy to modify this algorithm to find solutions. For all the trailing zeros on the diagonal, if the corresponding equation has a non-zero constant, the system does not have any solution. Otherwise, the corresponding variable may be set at random. For the non-zero diagonal entries, the backtracking algorithm resumes its former behavior. The modified triangularization and backtracking algorithms (for a single solution) are given as Algorithms 3.8 and 3.9.

Let us now consider the case of multiple systems involving the same matrix


Algorithm 3.8 Triangularization of a possibly non-invertible system

Require: Input linear system of n equations Ei in n unknowns xi
Create an array Π of n entries, initialized to Π[i] = i.
Let Last = n
for i from 1 to Last do
for j from i to n do
Let P be the coefficient of xΠ[i] in Ej
if P ≠ 0 then
Exit Loop
end if
end for
if P ≠ 0 then
Exchange equations Ei and Ej
Let P be the non-zero coefficient of xΠ[i] in Ei
for j from i + 1 to n do
Let C be the coefficient of xΠ[i] in Ej
Ej ← Ej − (C/P) · Ei
end for
else
Exchange Π[i] and Π[Last]
Decrement Last
Decrement i (to re-run the loop on the same i value)
end if
end for
Output modified system of n equations Ei and renumbering Π of the variables x


Algorithm 3.9 Backtracking of a possibly non-invertible triangular system

Require: Input triangular linear system of n equations Ei in n unknowns xi
Require: Renumbering of the variables Π
Create an array X of n elements to store the values of variables xi
for i from n down to 1 do
Let V be the constant term in equation Ei
Let P be the coefficient of xΠ[i] in Ei
if P = 0 then
if V ≠ 0 then
Output ‘No solution exists’, Exit Algorithm
else
Assign random value to X[Π[i]]
end if
else
for j from i + 1 to n do
Let C be the coefficient of xΠ[j] in Ei
V ← V − C · X[Π[j]]
end for
Let X[Π[i]] ← V /P
end if
end for
Output array of determined values X

M. In that case, we need to compute several objects. More precisely, we need a basis of the kernel of M to recover all solutions of the system, an efficient way to test whether a vector y belongs to the image of M, and a procedure to compute a solution x0 of the system.

The easiest approach is to start by modifying Algorithm 3.7 in order to avoid any division by zero, using the same renumbering idea as in Algorithms 3.8 and 3.9. At the end of the modified algorithm, the matrix M is transformed, in renumbered form, into a matrix:

H = ( Id K )
    ( 0  0 ).   (3.7)

At the same point, matrix N contains a transformation matrix such that H = N M PΠ, where PΠ is the permutation matrix representing the permutation Π. Alternatively, reversing the roles of rows and columns, we can similarly write H = PΠ M N, where:

H = ( Id 0 )
    ( K  0 ).   (3.8)

In that case, the non-zero columns of H give a basis for the image of M, with permuted coordinates, and the final columns of N, corresponding to the null columns obtained after multiplication by M, form a basis of the kernel of M.


3.3.3 Hermite normal forms

In all the Gaussian elimination algorithms we presented in this section, we encounter divisions by diagonal elements. When the considered matrix or system of equations is defined over a field, all the divisions, by non-zero elements, can be performed without trouble. However, if the matrix is defined over a ring, we may have difficulties. The typical example is the case of a system of equations defined over the integers. Such a system does not necessarily have an integer solution. In that case, it would be nice to compute the triangularization of the matrix in a way that avoids all non-exact divisions. When the matrix is invertible, this is related to the computation of a Hermite normal form.

DEFINITION 3.1 An invertible square matrix M with integer coefficients is said to be in Hermite normal form if it is upper triangular, with positive elements on its diagonal and if furthermore all non-diagonal elements are non-negative and smaller than the diagonal element in their column. Equivalently:

∀i > j : Mi,j = 0   (3.9)
∀i : Mi,i > 0   (3.10)
∀i < j : 0 ≤ Mi,j < Mj,j   (3.11)

Computing Hermite normal forms is reminiscent of both Gaussian elimination and GCD computations. During the first phase of Gaussian elimination, at each pivoting step, we choose in the current column the smallest non-zero entry (in absolute value) as pivot. We move the corresponding row in order to have this pivot on the diagonal, potentially replace it by its opposite to get a positive pivot, and remove integer multiples of this row from all rows below it. We choose the multiples to make sure that the coefficients in the current column become non-negative and smaller than the coefficient of the pivot. If all coefficients of transformed rows in this column are zeros, we proceed to the next column; otherwise, we repeat the same computation, choosing a smaller pivot. During the second phase of Gaussian elimination, we remove multiples of each row from the rows above it, to make all non-diagonal entries non-negative and smaller than the diagonal entry in their column. This is described as Algorithm 3.10. This algorithm can easily be modified to keep track of the transformations in an auxiliary matrix, as in the matrix inversion Algorithm 3.7. Note that a similar algorithm can be used to transform a non-invertible matrix into something called a row echelon form. The main difference with the Hermite normal form is that in columns that do not have any non-zero pivot, the size of the entries in previous rows cannot be reduced.


Algorithm 3.10 Hermite normal forms

Require: Input invertible n × n matrix M with integer entries
for i from 1 to n − 1 do
Let done ← false
while done ≠ true do
Let P ← 0
for k from i to n do
if Mk,i ≠ 0 then
if P = 0 or |Mk,i| < |P| then
Let j ← k; let P ← Mk,i
end if
end if
end for
if P = 0 then
Output ‘Non-invertible system’, Exit Algorithm
end if
Let done ← true
Exchange rows Mi and Mj
if P < 0 then
Let row Mi ← −Mi; let P ← −P
end if
for j from i + 1 to n do
Let C ← ⌊Mj,i/P⌋
Let row Mj ← Mj − C · Mi
if Mj,i ≠ 0 then
Let done ← false
end if
end for
end while
end for
if Mn,n = 0 then
Output ‘Non-invertible system’, Exit Algorithm
end if
if Mn,n < 0 then
Let row Mn ← −Mn
end if
for i from 2 to n do
Let P ← Mi,i
for j from 1 to i − 1 do
Let C ← ⌊Mj,i/P⌋
Let row Mj ← Mj − C · Mi
end for
end for
Output Hermite normal form M


3.3.3.1 Linear algebra modulo composites and prime powers

When performing linear algebra modulo a composite number N, several cases can be encountered. If the determinant of the matrix that defines the system is invertible modulo N, then the solution is unique and can be derived using Gaussian elimination as in Section 3.3. However, if the determinant is not invertible modulo N, then Gaussian elimination fails. In fact, without loss of generality, we can limit ourselves to considering the case where N is a prime power. Indeed, otherwise we can use the Chinese remainder theorem. More precisely, when N = N1 · N2 and N1 and N2 are coprime, then any solution of the linear system modulo N can be obtained by pasting together a solution modulo N1 and a solution modulo N2.

When N is a prime power, say N = p^e, we can generalize Gaussian elimination and solve the linear system anyway. The basic idea remains the same: first, we triangularize the linear system, choosing for each pivot a value which is "as invertible as possible." Formally, this means that if we define the valuation v(x) of an element x in Z/p^e Z as the multiplicity of p in any representative for x, we should always choose as pivot the value in a given column with the smallest valuation. During the second stage, when creating the list of solutions, we can divide a value y by a pivot x if and only if v(y) ≥ v(x). When possible, such a division yields p^{v(x)} different solutions.

3.4 Sparse linear algebra

All the linear algebra algorithms that we have presented up to now deal with dense matrices, represented by their complete lists of entries. However, there are many applications, both in scientific computing and in cryptography, where sparse matrices are involved. A sparse matrix is a matrix that contains a relatively small number of non-zero entries. Very frequently, it takes the form of a matrix in which each line (or column) only contains a small number of non-zero entries, compared to the dimension of the matrix. With sparse matrices, it is possible to represent in computer memory much larger matrices, by giving for each line (resp. column) the list of positions containing a non-zero coefficient, together with the value of the coefficient. Indeed, assuming an average of l entries per line, storing an n × n matrix requires about 2ln numbers instead of n^2. When dealing with a sparse linear system of equations, using plain Gaussian elimination is often a bad idea. Each pivoting step during Gaussian elimination increases the number of entries in the matrix and, after a relatively small number of steps, the matrix can no longer be considered as sparse. As a consequence, if the dimension of the initial matrix was large, Gaussian elimination quickly overflows the available memory. In order to deal with sparse systems, a different approach is required. Moreover, sparsity is not a well-behaved mathematical property. In particular, the inverse of a sparse invertible matrix is not necessarily sparse. As a consequence, the best we can hope for is an efficient, sparsity-preserving algorithm to solve a single linear system of equations.

Two main families of algorithms have been devised for sparse systems. One family, called structured Gaussian elimination, contains variations on the ordinary Gaussian elimination that choose pivots in order to minimize the loss of sparsity. The other family uses a totally different approach; it does not try to modify the input matrix but instead aims at directly finding a solution of the linear system using only matrix-by-vector multiplications. In this family, we find Lanczos's and Wiedemann's algorithms.

3.4.1 Iterative algorithms

3.4.1.1 Lanczos's algorithm

Lanczos's algorithm is a famous algorithm, originally devised to find solutions of linear systems over the real or complex numbers. It is much easier to describe when we can rely on a notion of convergence. Thus, to explain this algorithm, we temporarily leave our usual setting, forget about finite fields and consider a linear equation M y = x over the real numbers. For simplicity, we assume that M is square and invertible. Moreover, without loss of generality, we may assume that M is symmetric. Indeed, multiplying the initial equation by the transpose of M, it can be transformed into an equation (ᵗM M)y = (ᵗM x), where (ᵗM M) is a symmetric matrix. When a square n × n matrix M is real, symmetric and invertible, it induces a scalar product (·|·)M on the vector space R^n, defined from the usual scalar product by:

(u|v)M = (u|M v) = (M u|v).   (3.12)

This scalar product induces a norm ‖·‖M defined as:

‖u‖M = √((u|u)M).   (3.13)

Over the field of real numbers, Lanczos's algorithm works by first constructing an orthonormal basis of R^n for the scalar product (·|·)M, i.e., a family of vectors v1, v2, . . . , vn such that:

(vi|vi)M = 1 ∀i ∈ [1 · · · n] and   (3.14)
(vi|vj)M = 0 ∀i ≠ j.

Then it decomposes the solution y of M y = x over the basis (vi)i∈[1···n] using the decomposition formula:

y = Σ_{i=1}^{n} (y|vi)M vi.   (3.15)


This decomposition formula can be used to solve the equation because the coefficients (y|vi)M can be computed from x, without previous knowledge of y, by remarking that:

(y|vi)M = (M y|vi) = (x|vi).   (3.16)

Since the decomposition formula does not cost too much in terms of running time, because it only requires n scalar products, this gives an efficient algorithm if and only if the orthonormal basis can be constructed efficiently. A nice approach works as follows: first choose a random vector w1, then compute v1 as:

v1 = w1/‖w1‖M.   (3.17)

From v1, we construct the orthonormal basis iteratively, computing for each i, wi = M · vi and letting vi+1 be the vector obtained by orthonormalizing wi, i.e., letting:

w̄i = wi − Σ_{j=1}^{i} (wi|vj)M vj and   (3.18)

vi+1 = w̄i/‖w̄i‖M.   (3.19)

At first, this orthonormalization process does not seem efficient, because a naive implementation requires i scalar products at step i and O(n^2) scalar products for the complete algorithm. However, it is easy to modify the computation and perform only two scalar products at each step. The reason is that for i > j + 1, the scalar product (wi|vj)M is already 0; as a consequence, we can rewrite the computation of w̄i as:

w̄i = wi − (wi|vi)M vi − (wi|vi−1)M vi−1.   (3.20)

The reason for this simplification is that:

(wi|vj)M = (M vi|vj)M = (vi|M vj)M = (vi|wj)M.   (3.21)

Since wj is in the vector space spanned by v1, . . . , vj+1, whenever i > j + 1, this coefficient is already 0.

We stop the process when the orthogonalized vector is equal to 0. Clearly, the sequence of vectors v1, . . . , vk that is generated is an orthonormal family. However, is it a basis? The answer is yes if the family is large enough, more precisely if k = n. Due to the initial random choice of v1, this is the most frequent case.

Moreover, with most real matrices, Lanczos's algorithm has a very useful property of convergence; even partial decompositions, which are obtained by truncating the sum in Equation (3.15), quickly give very good approximations of y.


Very surprisingly, Lanczos's algorithm can also be applied to linear systems defined over finite fields. Of course, in finite fields, we cannot rely on convergence arguments and need to run the algorithm till the end. However, if the sequence (vi) really forms a basis of the image vector space of M, everything remains fine. Thus, Lanczos's algorithm works over finite fields, as long as the construction of the sequence (vi) does not abort prematurely. It can abort for two main reasons. First, as in the real field case, some vector wk may already belong to the vector space spanned by the previously obtained vectors in (vi). Second, in a finite field, it may happen that ‖wi‖M = 0 with wi ≠ 0. This comes from the fact that in a finite field, all computations need to be performed modulo the characteristic of the field. To illustrate the problem, take the row vector x = (1, 1): over the real field, its squared Euclidean norm is ‖x‖^2 = 2; however, over F2 the norm is taken modulo 2 and thus equal to 0. When the characteristic is small, this problem occurs frequently and Lanczos's algorithm needs to be modified into a block algorithm (see Section 3.4.1.3). However, when the characteristic is a large prime p, this is a very rare event. More precisely, due to the initial randomization, we can heuristically estimate the probability of error at each step of the algorithm as 1/p. As a consequence, the overall probability of failure is roughly n/p. Thus, for large values of p, this probability is negligible.

When implementing Lanczos's algorithm over finite fields, in order to avoid the computation of the square roots that appear in the computation of norms, it is preferable to avoid normalizing the vectors vi and instead to divide by their norms where necessary. This is described in Algorithm 3.11.

3.4.1.2 Wiedemann's algorithm

Wiedemann's algorithm is another approach to finding solutions of linear systems using matrix-vector products. However, instead of computing an orthogonal family of vectors, it aims at reconstructing a minimal polynomial. Before presenting this algorithm, we need to recall a few facts about square matrices and minimal polynomials.

3.4.1.2.1 Minimal polynomials of matrices Given a square matrix A over a field K and a univariate polynomial f in K[X], it is clear that we can evaluate f at A, thus computing another matrix f(A) of the same dimension as A. When f(A) is the zero matrix, we say that f annihilates A. Let IA be the set of polynomials of K[X] that annihilate A. In fact, this set IA is an ideal of K[X]. Indeed, if f and g both annihilate A, then for all polynomials α and β, we see that αf + βg also annihilates A. Since IA is an ideal of univariate polynomials, if IA is different from the zero ideal, there exists a polynomial fA, unique up to multiplication by a non-zero element of K, that generates IA. This polynomial fA is called the minimal polynomial of A. It remains to show that IA is not the zero ideal. This is a simple corollary of the Cayley-Hamilton theorem:


Algorithm 3.11 Lanczos's algorithm over finite fields
Require: Input vector x and routines for multiplication by M and ᵗM
Let X ← ᵗM · x
Initialize vector y to zero
Initialize vector v1 at random
Let w1 ← ᵗM M v1
Let N1 ← (w1|v1)
Let y ← (X|v1) v1/N1
Let v2 ← w1 − (w1|w1) v1/N1
Let w2 ← ᵗM M v2
Let N2 ← (w2|v2)
Let y ← y + (X|v2) v2/N2
for i from 3 to n do
Let v3 ← w2 − (w2|w1) v1/N1 − (w2|w2) v2/N2
Let w3 ← ᵗM M v3
Let N3 ← (w3|v3)
if N3 = 0 then
Exit Loop.
end if
Let y ← y + (X|v3) v3/N3
Let v1 ← v2, w1 ← w2, N1 ← N2.
Let v2 ← v3, w2 ← w3, N2 ← N3.
end for
Let z ← M y
if z = x then
Output: y is a solution of the system.
else
Let Z ← ᵗM z
if Z = X then
Output: z − x is in the kernel of M.
else
Output: Something wrong occurred.
end if
end if


THEOREM 3.1
For any square matrix A, let FA, the characteristic polynomial of A, be defined as FA(X) = det(A − X · Id). Then, FA annihilates A.

PROOF See [Lan05].

3.4.1.2.2 Application to linear systems Writing down the minimal polynomial of A as:

fA(X) = Σ_{i=0}^{d} αi X^i,   (3.22)

we can rewrite fA(A) = 0 as:

Σ_{i=0}^{d} αi A^i = 0.   (3.23)

As a consequence, for any vector b, we find:

Σ_{i=0}^{d} αi (A^i · b) = 0.   (3.24)

This implies that the sequence B defined as Bi = A^i · b satisfies a linear recurrence relation:

B_{i+d} = −(1/αd) Σ_{j=0}^{d−1} αj B_{i+j}.   (3.25)

If α0 ≠ 0, then this linear recurrence can be used backward and in particular, we may write:

B0 = −(1/α0) Σ_{i=1}^{d} αi Bi.   (3.26)

Note that when α0 = 0, we can factor X out of the minimal polynomial fA and, writing A · (fA/X)(A) = 0, conclude that A is non-invertible. Indeed, otherwise fA/X would annihilate A, which by minimality of fA is not possible.

From this remark, we can derive the basic idea of Wiedemann's algorithm:

• To solve Ax = y, build the sequence y, Ay, . . . , A^i y, . . .

• Find a recurrence relation in the above sequence.

• Remark that the sequence can be expanded on its left by adding x and use the recurrence backward to recover x.

However, this basic idea, as we just presented it, suffers from a major obstruction. In order to determine this recurrence relation, we need a number of vectors at least equal to the length of the relation. Storing all these vectors requires roughly the same amount of memory as storing a dense matrix of the same dimensions as A. Of course, this is not acceptable for a sparse linear algebra algorithm. To avoid this problem, we do not store all the vectors in the sequence. Instead, we only keep their scalar product with some fixed vector. Of course, the current vector needs to be kept in memory when computing the next one; it is only erased after this computation. After computing enough of these scalar products, roughly twice the dimension of A, we can use the Berlekamp-Massey algorithm from Chapter 2 to recover the minimal recurrence relation satisfied by the sequence of scalar products. Expressed as a polynomial, this recurrence relation divides the minimal polynomial fA. In order to get rid of the bad case where we have a proper divisor, it is useful to study more carefully the relation between these polynomials.

For this analysis, let us look at the following polynomials:

• fA, the minimal polynomial of A.

• fA^b, the minimal polynomial of the sequence S^b of vectors b, Ab, . . . , A^i b, . . .

• fA^{b,u}, the minimal polynomial of the sequence T^{b,u} of scalars (b|u), (Ab|u), . . . , (A^i b|u), . . .

It is clear that fA annihilates S^b and that fA^b annihilates T^{b,u}. As a consequence, fA^b divides fA and fA^{b,u} divides fA^b. Note that when the constant term of fA^{b,u} or of fA^b is zero, so is the constant term of fA and A is non-invertible. The reverse is false in general. In fact, it is even possible to have fA^{b,u} = 1, for example when u = 0.

To solve a system Ay = x, it suffices to compute fA^x. Indeed, assume that fA^x = Σ_{i=0}^{d} αi X^i with α0 ≠ 0 and consider the vector:

y = −(1/α0) Σ_{i=1}^{d} αi A^{i−1} x.   (3.27)

Multiplying by A, we find that:

Ay = −(1/α0) Σ_{i=1}^{d} αi A^i x = x,   (3.28)

by definition of the polynomial fA^x.

x

However, as said above, directly computing fA would require to store the

sequence S b and would be too costly in terms of memory. As a consequence,

b,u

Wiedemannā™s algorithm focuses on the computation of fA for a random

(non-zero) vector u. Moreover, it uses probabilistic arguments to show that
