
Ā© 2009 by Taylor and Francis Group, LLC

Sieve algorithms 141

analyzed precisely by Pomerance in [Pom82] for Dixon's factoring algorithm and allowed improvements of its asymptotic time complexity.

Once trial division is finished, we could move to other generic factoring algorithms such as Pollard's Rho or ECM to factor the reduced norms; however, at this point, it is tempting to use the extra structure shared by our candidates. All of them are of the form a + bα with (a, b) in the sieve zone. A good trick to find, among the candidates, those which are divisible by an element of B is to sieve again. More precisely, we initialize the sieve zone to zero except in locations corresponding to candidates, where we write an identifier for the candidate. During this second sieve, for each point encountered during the lattice walk, we do nothing if the point's location contains a zero; otherwise, we divide the reduced norm of the corresponding candidate, taking multiplicity into account. This approach allows us to factor out all elements of B corresponding to medium-sized primes.

Finally, after another filtering pass, where we keep small enough reduced norms, we complete the factorization using primality tests and either Pollard's Rho (see Chapter 7) or ECM, depending on our choice of factor base B.

Note that, while checking for smoothness, we obtained the factorization of the norm of each candidate. In order to avoid redoing this job later, it is preferable in most cases to store this factorization together with the smooth candidate. This stored factorization is used in the subsequent phases of the index calculus method described in Chapter 15.

4.2.1.2.4 Working with smaller arrays. In order to reduce the required amount of memory, a natural approach is to mimic the idea behind wheel factorization described in Section 4.1.2.1. The basic idea behind wheel factorization is to remove from the memory representation all elements which are known to be useless. With the current problem, we know in advance that a pair (a, b) is useless whenever the GCD of a and b is not one. Thus a good way to reduce the memory usage is to filter out such pairs. Of course, the filtering should not be too complicated, to avoid high runtime penalties. The simplest approach is to remove all pairs where both a and b are even; this reduces the amount of memory by 1/4 without involving any complicated computation. In fact, it is similar to the removal of even integers in Eratosthenes's sieve. Of course, one could push the approach further and remove all pairs with 2, 3, 5, . . . as a common factor between a and b. However, this is much less interesting than wheel factorization. Indeed, in Eratosthenes's sieve, the proportion of integers removed by adding p to the wheel is roughly 1/p; here it is 1/p^2. This new ratio is not so good, since the series formed of the squared inverses of primes converges.

4.2.1.3 The case of polynomials over F2

Some index calculus algorithms need to sieve over pairs (a, b) where a and b are univariate polynomials. In this section, we address the frequent special case where these polynomials are defined over F2. Using polynomials implies a slight rewriting of some of the mathematical objects involved; for example, we need to replace prime numbers by irreducible polynomials. However, these changes are unessential where the sieving process is concerned. Here, we are interested in the direct impact of using polynomials on the sieve algorithm. Interestingly, using polynomials over F2 greatly simplifies the task of sieving. This is the reason why we start with this case. The first remark concerns the sieving zone. Since we are working over F2, any polynomial is its own opposite. This additional symmetry leads us to reduce the sieve area to a square zone: a and b should both belong to the same set of small polynomials. Moreover, it is extremely convenient to define this set by giving an upper bound on the degrees of a and b. However, this greatly limits the flexibility of the memory usage. Taking all polynomials of degree < D, we have 2^(2D) possible pairs (a, b). Writing a and b as polynomials in x, it is easy to remove from the memory representation pairs (a, b) where a and b are both multiples of x. Similarly, it is easy to remove pairs (a, b) where a and b are both multiples of x + 1. As soon as D ≥ 2, removing these pairs lowers the required amount of memory to 9/16 of the initial amount 2^(2D): a fraction 1/4 of the pairs has both coordinates divisible by x, another 1/4 has both divisible by x + 1, and a fraction 1/16 has both divisible by both, which removes 1/4 + 1/4 − 1/16 = 7/16 of the pairs.

From a practical point of view, polynomials over F2 can be efficiently represented by numbers encoded in binary, as long as their degree is smaller than the available number of bits. Assuming that the bits are numbered from right to left starting with bit 0, we use bit i to represent the coefficient of x^i in the polynomial. For example, the low order (rightmost) bit is used to encode the constant term. With this convention, the polynomial x^3 + x + 1 is encoded by the number 11. With this representation, polynomials can be added using the XOR operation on numbers, and they can be multiplied by x using a simple left shift (multiplication by 2 of the corresponding number). Moreover, within this representation, multiples of x and x + 1 are easily recognized. A multiple of x corresponds to an even number and a multiple of x + 1 corresponds to a number whose Hamming weight is even, or equivalently to a number that contains an even number of 1s in its binary representation.

Despite its drawback of limiting the choice in terms of memory used, choosing a sieving zone comprising all polynomials of degree < D is extremely convenient for sieving. First, when representing a polynomial over F2 as an integer, testing whether its degree is < D can be done by comparing the polynomial's representation with 2^D. Since integer comparison is a native operation on modern computers, this is extremely efficient. Second, when sieving, we need to construct all multiples of a given polynomial up to degree D. If we start with a polynomial of degree d, this is achieved by multiplying it by all polynomials up to degree D − d. With another choice of sieving zone, we could not use the additivity of degrees during multiplication and would thus need a more complicated and less efficient approach. When constructing all polynomials up to degree D, we can even speed up the construction of the multiples by using a Gray code to enumerate them quickly.

To make this more precise, assume that we want to mark all pairs of


Binary Gray codes offer a way of enumerating all vectors on n bits, while changing only a single bit between one vector and the next. For example, on 4 bits, we have the following code:

  0000  0001  0011  0010  0110  0111  0101  0100
  1100  1101  1111  1110  1010  1011  1001  1000

which contains the sixteen possibilities. Gray codes are useful whenever manipulating complex objects where changes are costly. In this case, reducing the total number of changes is clearly worthy. In particular, it is possible to use Gray codes to efficiently produce all multiples of a binary polynomial a(x) by all polynomials of degree at most n − 1 by using a Gray code on n bits. We start from an initial multiple equal to 0, i.e., 0 · a(x); assuming that bits are numbered from the right starting at 0, a change on bit c in the Gray code is converted into an addition of x^c · a(x) to the current multiple.

Programming Gray codes on n bits is a simple matter. It suffices to run an arithmetic counter i from 1 to 2^n − 1 that contains the number of the current change. The position of the bit that needs to be changed during the i-th change is the position of the rightmost bit equal to 1 in the binary representation of i. The position of the rightmost bit is easily obtained by shifting a copy of i to the right until the resulting number becomes odd.

Note that Gray codes can easily be generalized to vectors of n elements in a small finite field Z/pZ. For example, for vectors of 3 ternary digits, we obtain the following Gray code:

  000  001  002  012  010  011  021  022  020
  120  121  122  102  100  101  111  112  110
  210  211  212  222  220  221  201  202  200

Ternary Gray codes are slightly less suited to the architecture of present computers, because to implement them, it is preferable to know the decomposition of the change counter i in base 3. As a consequence, they use many reductions modulo 3 and divisions by 3.

Figure 4.4: Gray codes


polynomials (a(x), b(x)) divisible by some element of the smoothness basis: (p(x), α_p(x)), where p(x) is an irreducible polynomial. These pairs satisfy the equation:

  a(x) + b(x) α_p(x) ≡ 0 (mod p(x)).        (4.3)

Using a partial run of Euclid's extended GCD Algorithm 2.2 on the pair of polynomials (α_p(x), p(x)), we can find two different small solutions, (a1, b1) and (a2, b2), where the degrees of a1, a2, b1 and b2 are near the half-degree of p(x). Moreover, any solution (a, b) can be written as a linear combination of these two initial solutions, i.e., there exist two polynomials λ(x) and μ(x) such that:

  a(x) = λ(x) a1(x) + μ(x) a2(x)  and        (4.4)
  b(x) = λ(x) b1(x) + μ(x) b2(x).

In order to exhibit all solutions, it suffices to consider all pairs (λ, μ) with small enough degree. As a consequence, we simply need to nest two independent Gray code enumerations to efficiently enumerate all the (a, b) pairs. This is much simpler than the corresponding algorithm in the case of numbers, as can be seen in Algorithm 4.4. This algorithm also tests whether both coordinates in each pair are multiples of x or x + 1 before marking the pair.

4.2.1.4 The case of numbers

Sieving with numbers is more complicated than sieving with polynomials over F2; we must take into account the geometry of both the sieving zone and the lattice to be sieved. Due to the variety of relative positions of the basis vectors (u, v), many different cases are possible. Assuming that the lattice's basis is Gauss reduced as described in Chapter 10, we know that u is (one of) the shortest vectors in the lattice and that v is (one of) the shortest among the lattice's vectors linearly independent of u. Moreover, |(u|v)| is smaller than ‖u‖^2/2. Thus, u and v are both short and they are nearly orthogonal to each other. This means that considering vectors of the form c_u u + c_v v, for c_u and c_v in given intervals, allows us to cover an almost rectangular looking parallelogram. However, the axes of this parallelogram have no reason whatsoever to be aligned with the axes of the sieving zone. We describe one possible way of marking the multiples in Algorithm 4.5. In this code, we denote by w_a and w_b the coordinates of a vector w. A specific instance of the algorithm is illustrated in Figure 4.5.

4.2.1.5 The case of polynomials in odd characteristic

Finally, the case of polynomials in odd characteristic is a mix of the two previous cases. Either the characteristic is small and the degree of the polynomials quite large, and we proceed as for polynomials over F2, with the small caveat that it is no longer possible to represent polynomials simply as bit vectors. Or the characteristic is larger and we consider only polynomials of small degree,


Algorithm 4.4 Walking the multiples with polynomials

Require: Element of B: (p(x), α_p(x))
Require: Maximal degrees of a(x) and b(x) in the sieving zone, d_a and d_b
  Create initial solutions (a1(x), b1(x)) and (a2(x), b2(x))
  Let d1 ← min(d_a − deg a1, d_b − deg b1)
  Let d2 ← min(d_a − deg a2, d_b − deg b2)
  Let (α(x), β(x)) ← (0, 0)
  for i from 0 to 2^d1 − 1 do
    if i ≠ 0 then
      Let i′ ← i
      Let c ← 0
      while i′ is even do
        Let i′ ← i′/2
        Let c ← c + 1
      end while
      Let α(x) ← α(x) ⊕ x^c a1(x)
      Let β(x) ← β(x) ⊕ x^c b1(x)
    end if
    for j from 0 to 2^d2 − 1 do
      if j ≠ 0 then
        Let j′ ← j
        Let c ← 0
        while j′ is even do
          Let j′ ← j′/2
          Let c ← c + 1
        end while
        Let α(x) ← α(x) ⊕ x^c a2(x)
        Let β(x) ← β(x) ⊕ x^c b2(x)
      end if
      if α(0) ≠ 0 or β(0) ≠ 0 then
        if α(1) ≠ 0 or β(1) ≠ 0 then
          Mark (α(x), β(x))
        end if
      end if
    end for
  end for


Algorithm 4.5 Walking the multiples with numbers

Require: Element of B: (p, α_p)
Require: Sieve limits S_a and S_b
  Create reduced basis (u, v) for lattice
  Modify basis to ensure 0 ≤ u_a < v_a and u_b · v_b ≤ 0
  Assume u_b < 0 ≤ v_b (otherwise adapt boundary conditions in the sequel)
  Let w ← (0, 0)
  repeat
    Let z ← w
    Let w ← w + u
    if w_a ≥ v_a then
      Let w ← w − v
    end if
  until w_b ≤ −S_b
  Let w ← z   {Here w is in lower left corner}
  Let z ← w
  repeat
    Let t ← z   {Start first sieving zone}
    repeat
      if t_a is odd or t_b is odd then
        Mark point corresponding to t
      end if
      Let t ← t + u
    until t_a ≥ S_a or t_b ≤ −S_b
    Let z ← z + v
  until z_a ≥ S_a or z_b ≥ S_b
  Let w ← w − u
  if w_a < 0 then
    Let w ← w + v
  end if
  repeat
    Let z ← w   {Second sieving zone}
    repeat
      if z_a is odd or z_b is odd then
        Mark point corresponding to z
      end if
      Let z ← z + v
    until z_a ≥ S_a or z_b ≥ S_b
    Let w ← w − u
    if w_a < 0 then
      Let w ← w + v
    end if
  until w_b ≥ S_b


Figure 4.5: Illustration of Algorithm 4.5 (the sieving region in the (a, b) plane is covered by a first sieving zone and a second sieving zone)


sometimes linear polynomials. Since we are sieving over large coefficients, this behaves more like the case of Section 4.2.1.4.

4.2.2 Advanced sieving approaches

The main drawback of the basic algorithm is its huge memory requirement. Since it sieves the whole sieving zone at once, it needs enough memory to store this zone, which is impractical in most cases. In order to lift this difficulty, the natural answer is to cut the sieving zone into small fragments and address each in turn. In the literature, two important ways of fragmenting the sieving zone are considered. The simplest, called line sieving, cuts the sieving zone into lines, each line corresponding to a fixed value of one coordinate in the pair (a, b). The other frequent approach looks at two-dimensional fragments obtained by considering points on sub-lattices within the sieving zone. This is called the special-q approach. We now discuss both approaches in detail.

4.2.2.1 Line sieving

Line sieving is a simple and elegant solution to sieve large zones. In fact, it is even conceptually easier than the basic lattice Algorithm 4.3. Indeed, it does not even require a good lattice description of the linear equation a + b α_p^(i) ≡ 0 (mod p). Assume that we sieve by lines corresponding to a fixed value of b. Since our choice of sieving zone has a ≥ 0, for each value of b and for each element (p, α_p^(i)) of the smoothness basis, we simply start with the smallest non-negative representative of b α_p^(i) mod p and tick every p-th integer starting from that point. When sieving several consecutive lines on the same machine, the performance can be greatly improved by remarking that, given the starting point of (p, α_p^(i)) for b, the corresponding starting point for b + 1 can be computed with a single modular addition, which is more efficient than a modular multiplication. This results in Algorithm 4.6 to sieve a set of consecutive lines.

Of course, as with the basic lattice sieve, it is easy to derive variations of this algorithm that simply count the number of divisors of each pair (a, b) or that remove very small elements of B from the sieving loop. Once this is done, we can observe that the innermost operation, which adds to LogSum[a], is the most important as far as running time goes. As a consequence, it is worth optimizing this inner loop. The key fact is that, in most cases, S_a is quite large and thus the total amount of memory we use is larger than the cache size. As a consequence, it is a good idea to take the cache structure into account in order to sieve faster. Assuming that the memory structure is simple, with a single level of cache, this can be done by further segmenting the representation of the a values in memory into several blocks that fit in the cache. For elements of B with a p value smaller than the blocks' size, we can sieve a given block while staying within the cache. For larger values of p,


Algorithm 4.6 Basic line sieve

Require: Smoothness basis B
Require: Sieve limit S_a
Require: Range for b: b_min · · · b_max
Require: Expected value LogVal for smooth candidates
  Create single dimensional array LogSum indexed by [0 · · · S_a − 1]
  Create single dimensional array StartPos indexed by B
  for all P = (p, α_p^(i)) in B do
    StartPos[P] ← b_min α_p^(i) (mod p)
  end for
  for b from b_min to b_max do
    Initialize LogSum to zero
    for all P = (p, α_p^(i)) in B do
      a ← StartPos[P]
      while a ≤ S_a do
        Add log p to LogSum[a]
        a ← a + p
      end while
    end for
    for a from 0 to S_a do
      if LogSum[a] ≥ LogVal then
        Check whether a + bα is really smooth
        If so, Display((a, b), 'corresponds to a smooth value')
      end if
    end for
    for all P = (p, α_p^(i)) in B do
      StartPos[P] ← StartPos[P] + α_p^(i) (mod p)
    end for
  end for


unfortunately, we need to jump between blocks and do not gain. Luckily, the small values of p require more operations than the large ones, which makes this optimization very useful. We illustrate this idea, with two levels of cache, by the C code given in Program 4.3.

4.2.2.2 Special-q lattice sieving

The other frequent way of splitting a sieve into smaller pieces is more technically involved than line sieving. However, it presents the very useful property of guaranteeing that all the pairs being considered are already divisible by a common element. This common element, called the special-q, is usually chosen among the elements of B associated to a large prime value. Each piece of the sieve corresponds to a different special-q. One technicality is the fact that the various pieces are not necessarily disjoint. As a consequence, when using this algorithm for sieving, some solutions are duplicated. These duplicates should be removed using a filtering program. Given a special-q written as (q, α_q), the special-q lattice sieving algorithm focuses on pairs (a, b) divisible by the special-q and thus satisfying the linear condition a + bα_q ≡ 0 (mod q). As explained in the case of the basic lattice sieving, this corresponds to pairs (a, b) on some integer lattice, which can be described by a reduced basis (u_q, v_q). In most cases, the largest coordinate of each vector in the reduced basis is, in absolute value, not too far from √q. When this is not the case, the norm of u_q is much smaller than the norm of v_q and the basis is unbalanced. With such an unbalanced basis, special-q lattice sieving is much less convenient. As a consequence, we omit special-q's with unbalanced bases and can assume from now on that the reduced basis is balanced. More precisely, we fix some constant Λ and keep a given special-q only when the coordinates of the two basis vectors are bounded (in absolute value) by Λ√q. Using this choice allows us to choose the sieving zone within the sublattice independently of the corresponding special-q. We consider pairs (a, b) written as c_u u_q + c_v v_q with (c_u, c_v) in a fixed sieving zone [0 · · · S_u] × [−S_v · · · S_v]. A direct consequence of this choice is that sieving such a sublattice can be done using the basic lattice sieve Algorithm 4.3. The only condition is that all elements of B should be expressed in terms of the new coordinates (c_u, c_v).

In order to express an element (p, α_p^(i)) of B in terms of the new coordinates, let us recall that a + bα belongs to the corresponding lattice whenever the linear condition a + bα_p^(i) ≡ 0 (mod p) is satisfied. Replacing a and b by their expressions in c_u and c_v, we find the condition:

  (c_u u_q^a + c_v v_q^a) + (c_u u_q^b + c_v v_q^b) α_p^(i) ≡ 0 (mod p),  or
  c_u (u_q^a + u_q^b α_p^(i)) + c_v (v_q^a + v_q^b α_p^(i)) ≡ 0 (mod p).        (4.5)

This is a linear equation in c_u and c_v modulo p. As a consequence, we can follow the same approach as in the case of the original equation in a and b


Program 4.3 Line sieving with two levels of cache

#include <stdio.h> /* printf, fflush */

#define SIZE 510000 /* Max. Number of prime pairs */

#define ZONE 9000000 /* Size of each zone, mult of CACHE2 */

#define CACHE1 25000 /* Approximate size of L1 CACHE */

#define CACHE2 750000 /* Approximate size of L2 CACHE */

#define START_ZONE -500 /* Number of first sieve zone */

#define END_ZONE 500 /* Number of last sieve zone */

#define THRESHOLD 12 /* Num. of hits per smooth candidate */

#define START 30 /* Prime pairs below START not sieved */

int root[SIZE], mod[SIZE]; /* Arrays for prime pairs */

char Zone[ZONE]; /* Array for sieving */

int Size; /*True number of pairs */

int Size1, Size2; /*Limits to fit in CACHE1 and CACHE2 */

int main() /* Read() and MulMod() are assumed provided elsewhere */
{ int i,j,offset; int limit2,limit1;
Read();/*Get prime pairs, set Size1/2 to fit CACHE1/2*/

for(i=START;i<Size;i++) {

int r,q; /* Loop shifts primes pairs to first zone */

q=mod[i]; offset=MulMod(START_ZONE,ZONE,q);

r=root[i]-offset; if (r<0) r+=q; root[i]=r; }

for (j=START_ZONE;j<END_ZONE;j++){

for(i=0;i<ZONE;i++) Zone[i]=0; /*Reset sieve zone */

for(limit1=limit2=0;limit2<=ZONE;limit2+=CACHE2) {

for(;limit1<=limit2;limit1+=CACHE1) {

for(i=START;i<Size1;i++) { /* Sieve small*/

int r,m; r=root[i]; m=mod[i];

while (r<limit1) {Zone[r]++; r+=m;}

root[i]=r;}}

for(i=Size1;i<Size2;i++) { /* Sieve medium*/

int r,m; r=root[i]; m=mod[i];

while (r<limit2) { Zone[r]++; r+=m;}

root[i]=r;}}

for(i=START;i<Size2;i++) { /* Shift to next zone */

root[i]=root[i]-ZONE;}

for(i=Size2;i<Size;i++) { /* Sieve large */

int r,m; r=root[i]; m=mod[i];

while (r<ZONE) {Zone[r]++; r+=m;}

root[i]=r-ZONE;}

for(i=0;i<ZONE;i++){ /* Detect and print smooth candidates*/

if (Zone[i]>=THRESHOLD) {printf("F(%d*SZ+%d);\n",j,i);

fflush(stdout); }}}}


in order to write down a good representation of the lattice comprising all the coordinates (c_u, c_v) of divisible elements.

Finally, we see that lattice sieving with special-q does not differ much from basic lattice sieving. We need to reinitialize the representations of the elements of B for each special-q and change the accepting bound on the number of divisors, or on the sum of their logarithms, to account for the additional special-q divisors and their influence on the value of the norms. Except for this minor change, everything else is a verbatim copy of our previous approach. However, the fact that we need to sieve on a relatively small zone for each special-q greatly enhances the sieve performance.

4.2.3 Sieving without sieving

Before closing this chapter on sieving, it is important to mention an alternative algorithm proposed by D. Bernstein in [Ber00] that allows the implementation of index calculus algorithms without using sieving at all. Instead, his algorithm addresses the problem of finding smooth enough numbers within a long list of numbers. A nice property of this algorithm is that, contrary to sieve algorithms, it does not require additional structure² on the numbers themselves. This interesting property could be especially useful for some specific applications of index calculus such as Coppersmith's factorization factory. Note that this specific algorithm is almost dedicated to index calculus algorithms that use numbers; when polynomials come into play, it is easier to use a fast polynomial factorization algorithm. However, Bernstein notes in his paper that a careful comparison is required in the case of polynomials.

The key ingredient in Bernstein's algorithm is to consider a large set of numbers at the same time, to identify the set of primes belonging to the smoothness basis that occur in the decomposition of the product of these numbers, and to use a divide and conquer approach to distribute these primes to smaller and smaller sets until the decomposition of each number is known. The main difficulty of the algorithm is the fact that it requires a two-way divide and conquer approach, to find out which primes are occurring and in which numbers. The algorithm proposed in [Ber00] nests two divide and conquer approaches: the inner one is used to determine the set of primes involved in a particular set of numbers and the outer one to focus on sets of decreasing size. In order to make the inner loop efficient, the given set of primes is preprocessed and represented as a product tree. The product tree is a binary tree whose leaves are labelled by the given primes. Each inner node of the tree is labelled by the product of the labels of its two children. As a consequence, the root of the product tree contains the product of all primes in the given set.

² With sieve algorithms, the additional structure comes from the fact that the numbers are all obtained by evaluating a fixed polynomial at many regularly spaced points.


Given a set of primes P and a set of numbers N, the outer divide and conquer step needs to determine the subset P′ of primes that occur in the decomposition of the numbers in N. First, it computes the product Π_N of all numbers in N and a product tree T_P for P. Then it reduces Π_N modulo the label of the root of the product tree T_P. After that, it pushes this reduced value down the product tree, computing Π_N modulo each label by reducing the modular value of the father node in order to compute the reduced value at each node. When the leaves are reached, it is easy to see that the prime label of a leaf occurs in N if and only if the modular value of Π_N at this leaf is zero. Thus, P′ is simply the subset of P corresponding to the leaves where the modular value of Π_N vanishes.

In order to refine the decomposition, we simply need to split N into two (or more) subsets and recursively apply the same algorithm on each subset together with the prime set P′. At the very end of the algorithm, we know the list of primes for each number and can easily check for smoothness.

In order to improve the underlying number arithmetic, Bernstein proposes a slightly different method of computing the modular values at each leaf of T_P. His approach assumes that 2 does not belong to the initial list of primes, which is not a real constraint since trial dividing by 2 is so easy. Then, when computing the reduced value of a node from the value of its father, instead of doing an exact computation, he computes the modular value up to some power of 2. This improves the arithmetic and of course does not change the final criterion for odd primes: a prime p occurs in the decomposition of N if and only if Π_N reduced modulo p is zero. Since 2 is invertible modulo the odd prime p, multiplying the value Π_N by a power of 2 preserves the final zero. This modification changes non-zero values, but this is irrelevant for our purpose.

A fair conclusion to this section is to remark that determining the impact of Bernstein's algorithm on index calculus methods is currently an open problem. At the time of writing, no publicly announced large index calculus computation has ever made use of this algorithm to improve running times. However, studying its practical impact is an interesting research problem.


Exercises

1. Program a basic implementation of Eratosthenes's sieve using a wheel of perimeter 30.

2. Modify the implementation of Exercise 1 into a segmented sieve, similar to Program 4.2.

3h. Using your answer to Exercise 2 as a template, write a program that generates implementations of Eratosthenes's sieve with larger wheels of fixed perimeter. Adding the primes 7, 11 and 13, write code for wheels of perimeter 210, 2310 and 30030. Time these programs and compare their respective performances.

4. Write a version of Eratosthenes's sieve to deal with wheels of varying perimeter.

5. Combine wheels of varying perimeter with a segmented sieve.

6h. Create a variant of Eratosthenes's sieve for finding irreducible polynomials over F2. Make sure that you are using Gray codes and write a segmented version.

7. An amusing variation of the sieving process aims at constructing "lucky" numbers. It starts with the list of odd integers; consider the second number '3' and cross out every third number, i.e., 5, 11, . . . In this new list, the third number is '7', thus we cross out every seventh number in the list, starting with '19'. Write a basic sieve program for computing lucky numbers. Which optimizations of Eratosthenes's sieve can you adapt to this case?

8h. Choose a prime p and a polynomial P of degree d. Compare the following approaches to determine all the polynomials of the form P + a, for a in Fp, that have d different roots.

(a) Sieving on a line.
(b) Individual factorization of each P + a.
(c) An adapted variant of Bernstein's algorithm.


Chapter 5

Brute force cryptanalysis

5.1 Introductory example: Dictionary attacks

Brute force is one of the less subtle¹ ways of attacking cryptosystems; it simply consists of trying all possible values for the secret to recover, until the correct one is found. Yet, this approach is more tricky than it may seem. To illustrate this point, we start this chapter with a special case of brute force attacks: dictionary attacks. Typically, a dictionary attack is used to find out the password that gives access to a user's account on an information system. Since passwords are used for a large number of purposes, dictionary attacks can be used in a large variety of contexts.

At the first level of dictionary attacks, we find simple password guessing, as often encountered in books and movies. The attacker simply sits at his target's desk, thinks deeply for a little while and, after a couple of unsuccessful trials, comes up with the correct password.

A slightly more sophisticated version is to automate the process and have a tool for submitting many passwords until the correct one is found. For example, this can easily be done when trying to connect to a remote server. This kind of dictionary attack is called an online dictionary attack. For each new tested password, the attacker needs to try to open a new session. One problem (or advantage, depending on the point of view) of online dictionary attacks is that they can be easily detected. For example, smart cards are often protected using a very short 4-digit password, also called a PIN code². Thus, any attacker could easily discover the correct PIN code, simply by trying out the 10,000 different possibilities. Assuming a slow rate of one trial per second, this could be done in a couple of hours. To avoid this attack, smart cards are programmed to detect sequences of consecutive failures. After three failures, they usually stop accepting PIN codes and put themselves in a locked state. To unlock the card, the user needs a different, much longer password.

Offline attacks are a much stronger form of dictionary attacks. In these attacks, the attacker first recovers some side information and then using this

¹ Of course, pointing a gun at the legitimate user and asking for the secret is even less subtle.

² Where PIN stands for Personal Identification Number.


information is able to test on his own whether a given password is correct or not. With offline attacks, the attacker is only limited by the raw computing power he has access to. The side information used by the attacker can be of several different types: it can be the transcript of a protocol execution where the password was used, an entry from a password file, or even some side-channel information obtained from the system during password testing. For our illustration, we focus on the simple case of password file entries.

Password files are used by operating systems or applications that need to check user passwords; they store side information that can be used to verify the correctness of passwords. In their simplest form, password files simply store plaintext copies of passwords. Of course, this is a bad idea, since in that case passwords are directly accessible to any attacker that obtains the password file, regardless of the passwords' length. An equally bad idea is to store passwords encrypted with a secret key algorithm: indeed, regardless of the strength of the algorithm, the secret key needs to be stored somewhere to allow password checking, and if the attacker finds this secret key, he can decrypt all passwords in the file. To avoid these straightforward attacks, the password file can contain either passwords encrypted with a public key algorithm or hashed passwords. It also needs to contain auxiliary information that allows passwords to be tested against the password file. For instance, when using a randomized public key encryption, the random value used for encryption needs to be stored. This makes it possible, at a later point, to reencrypt a given password and to check equality between this new encryption and the stored encryption.

The most common approach is to use hashed passwords; indeed, hashing is much faster than public key encryption. However, there is more to it than simply running passwords through a standard hash function. Indeed, in that case, attackers could prepare in advance a dictionary of hashed passwords and store them in sorted form (see Chapter 6). This would allow using a simple search to discover hashed passwords from the dictionary extremely quickly. To avoid this, the hash function needs to be randomized. A simple approach is to store a hash value of the string formed by concatenating a reasonably short random string together with the user's password. To allow verification, the random string is stored together with the corresponding hash value in the password file. When the random value is quite short and used as a side input that modifies the computation of hash values, it is often called a salt. For example, for decades, Unix-based operating systems used a salted variation of the DES encryption algorithm to hash passwords. In order to avoid any possibility of decryption, the password was used as the key and served to encrypt a fixed message. Even with a relatively short salt, storing hashed dictionaries is highly impractical: indeed, since enough dictionaries are needed to cover a noticeable fraction of all possible salts, the storage requirements quickly become impractical.
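The salted-hash scheme just described can be sketched in a few lines. This is an illustrative sketch only: the function names and the use of SHA-256 are our choices, not those of a specific system, and a real system would rather use a dedicated, deliberately slow password hash.

```python
import hashlib
import os

def hash_password(password, salt=None):
    """Return (salt, digest) where digest = H(salt || password)."""
    if salt is None:
        salt = os.urandom(8)          # the stored random salt
    digest = hashlib.sha256(salt + password.encode()).digest()
    return salt, digest

def check_password(candidate, salt, stored):
    # Recompute the salted hash and compare with the stored value.
    return hashlib.sha256(salt + candidate.encode()).digest() == stored

salt, stored = hash_password("s3cret")
assert check_password("s3cret", salt, stored)
assert not check_password("guess", salt, stored)
```

Because the salt enters the hash computation, an attacker cannot precompute a single dictionary of hashed passwords valid for every password file entry.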

More recently, passwords have been used, in conjunction with public key mechanisms, in order to protect cryptographic protocols between users who only share a short password (see [GL03]). In this context, the goal of the


designers is to make sure that the only efficient attacks need one online attempt to test one possible password and that, under some security hypothesis, no offline attack exists.

In addition to brute force, dictionary attacks may also rely on time-memory tradeoffs. A typical example is Hellman's time-memory tradeoff, described in Chapter 7. Where brute force attacks are concerned, the key ingredient is to combine an efficient dictionary of potential passwords with a very fast implementation of the underlying cryptographic algorithm. In the next section, we illustrate this issue of fast implementations using the famous DES algorithm.

5.2 Brute force and the DES algorithm

The DES, or Data Encryption Standard [DES77], was the first modern block cipher available to a wide audience. It was standardized by the NBS (National Bureau of Standards) in 1977. This block cipher encrypts 64-bit blocks using 56-bit keys. At the time of DES standardization, this keysize was judged to be too small and potentially vulnerable to brute force attacks. However, no successful brute force attack against DES was publicly reported until 1997. At that time, RSA Inc. launched three successive challenges, asking challengers to decrypt ciphertexts encrypted with DES. The final challenge, in 1999, was broken in less than a day, using a mix of hardware and software.

When software brute force attacks against DES are considered, it is essential to use DES implementations that are very efficient. However, implementations that have been written with fast message encryption in mind are far from optimal when used for cryptanalytic purposes. The main reason is that, to optimize the encryption speed, it is often a good idea to assume that keys change once in a while and that message blocks change rapidly. For brute force cryptanalysis, it is exactly the contrary: the known plaintext/ciphertext pairs (or the available ciphertext for ciphertext-only attacks) are fixed, and the test key changes extremely quickly. As a consequence, optimizing encryption algorithms with brute force in mind is a very specific task. In the rest of this section, we describe in detail how this can be done in the case of DES. For the sake of completeness, we start by recalling the description of the DES algorithm.

5.2.1 The DES algorithm

DES is based on the Feistel construction and uses 16 consecutive Feistel rounds. Before applying these rounds, we find an unkeyed initial permutation which moves bits around. After the Feistel rounds, we find an unkeyed final permutation which is the inverse of the initial permutation. In the standard


58 50 42 34 26 18 10 2

60 52 44 36 28 20 12 4

62 54 46 38 30 22 14 6

64 56 48 40 32 24 16 8

57 49 41 33 25 17 9 1

59 51 43 35 27 19 11 3

61 53 45 37 29 21 13 5

63 55 47 39 31 23 15 7

Table 5.1: DES initial permutation

40 8 48 16 56 24 64 32

39 7 47 15 55 23 63 31

38 6 46 14 54 22 62 30

37 5 45 13 53 21 61 29

36 4 44 12 52 20 60 28

35 3 43 11 51 19 59 27

34 2 42 10 50 18 58 26

33 1 41 9 49 17 57 25

Table 5.2: DES ļ¬nal permutation

DES description, individual bits within blocks are numbered from left to right: from 1 to 64 for full blocks, from 1 to 32 for half blocks, or from 1 to t for t-bit subblocks. Using this convention, the initial and final permutations of DES are given in Tables 5.1 and 5.2. This says that the first bit of the permuted input is bit 58 of the original input, that the second bit is bit 50, the ninth bit is bit 60 and so on. It is worth noting that if the inputs are viewed as a concatenation of eight bytes, then the fifth permuted byte is the concatenation of the first bits of all original bytes, the first permuted byte the concatenation of second bits, and similarly for the other bytes. Thus, implementing the input permutation in hardware costs virtually nothing when data is entered on an 8-bit bus: it suffices to consider each bit of the bus as a serial input and to put together each series of incoming bits in a corresponding byte.

After the initial permutation, the 64 bits of plaintext are split into two 32-bit halves L_0 and R_0. The left half L_0 is formed of bits 1 to 32 of the permuted input; the right half R_0 is formed of bits 33 to 64. For each Feistel round, numbered from 1 to 16, DES generates a specific subkey K_i for round i. At round i, the algorithm computes:

    L_i = R_{i−1}  and  R_i = L_{i−1} ⊕ f(R_{i−1}, K_i),    (5.1)

where f is the round function keyed by K_i. After round 16, the concatenation of R_16 and L_16 is passed through the final permutation. Note that on output, the left and right halves are reversed. This is an important feature: thanks to it, DES (with round keys in inverse order) is its own inverse.

16  7 20 21 29 12 28 17
 1 15 23 26  5 18 31 10
 2  8 24 14 32 27  3  9
19 13 30  6 22 11  4 25

Table 5.3: Permutation of the round function
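The self-inverse property of the Feistel construction is easy to check on a toy model. The sketch below uses an arbitrary stand-in round function, not DES's actual f, and a reduced number of rounds; only the structure matters.

```python
def feistel_encrypt(left, right, round_keys, f):
    # One Feistel round: L_i = R_{i-1}, R_i = L_{i-1} XOR f(R_{i-1}, K_i).
    for k in round_keys:
        left, right = right, left ^ f(right, k)
    # Final swap of the halves, as in DES after round 16.
    return right, left

# Arbitrary 32-bit stand-in for DES's round function.
def f(x, k):
    return ((x * 2654435761) ^ k) & 0xFFFFFFFF

keys = [0xDEADBEEF, 0x12345678, 0xCAFEBABE, 0x0F0F0F0F]
l, r = 0x01234567, 0x89ABCDEF
c = feistel_encrypt(l, r, keys, f)
# The same routine, run with the round keys in reverse order,
# decrypts: the construction is its own inverse.
assert feistel_encrypt(*c, list(reversed(keys)), f) == (l, r)
```

Note that this works for any round function f, keyed or not, invertible or not; the final swap of the halves is what makes a single routine serve for both directions.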

To make our description of DES complete, we now need to describe the round function f and the key expansion process which is used to derive the round keys from the 56-bit DES key. The round function can be written as:

    f(x, k) = P ∘ S(E(x) ⊕ k),    (5.2)

where E is a linear expansion from 32 to 48 bits, S a non-linear transform consisting of 8 S-boxes from 6 bits to 4 bits each, and P a bit permutation on 32 bits. The linear expansion E expands its 32 bits of input by duplicating some of them. Using the same bit numbering convention as before, the expansion E is described in Table 5.4. After expansion, the block is XORed with the round subkey, then split into 8 slices of 6 bits each. The first slice, consisting of bits 1 to 6, is used as input for the first S-box, usually called S1. Similarly, the other slices are used as input to the other S-boxes, S2 to S8. The outputs of these S-boxes are then concatenated back into a 32-bit word, starting with S1 and ending with S8. The S-boxes S1 to S8 are described in Tables 5.5 to 5.12.

The reader unaccustomed to the description of DES should take great care with these tables. Indeed, they are presented in a way that emphasizes the presence of four permutations on the set of integers from 0 to 15, embedded within each S-box. However, they are not used in a straightforward manner. Each S-box takes 6 bits of the expanded input XORed with the round subkey. Due to the structure of the expansion, the middle two bits in each set of six are used only once, while the other four are also used in neighboring sets. The S-boxes are used in a way that reflects this important fact. When computing Si on a set of six bits b1 b2 b3 b4 b5 b6, the number obtained from b1 b6 (i.e., 2b1 + b6) determines the row number from 0 to 3 and the number obtained from b2 b3 b4 b5 determines the column number from 0 to 15. The output value of the S-box is then interpreted as a bitstring c1 c2 c3 c4 corresponding to the number 8c1 + 4c2 + 2c3 + c4. After concatenation, the output bits of S1 are numbered from 1 to 4, while the output bits of S8 are numbered from 29 to 32. Reading the S-boxes this way ensures that bits duplicated during the expansion are used once as row indices and once as column indices, while non-duplicated bits are used only as column indices.
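This row/column indexing convention can be made concrete with a short sketch. The helper function below is ours, but the table it reads is exactly S1 from Table 5.5.

```python
# S1, copied from Table 5.5, as four rows of sixteen entries.
S1 = [
    [14, 4, 13, 1, 2, 15, 11, 8, 3, 10, 6, 12, 5, 9, 0, 7],
    [0, 15, 7, 4, 14, 2, 13, 1, 10, 6, 12, 11, 9, 5, 3, 8],
    [4, 1, 14, 8, 13, 6, 2, 11, 15, 12, 9, 7, 3, 10, 5, 0],
    [15, 12, 8, 2, 4, 9, 1, 7, 5, 11, 3, 14, 10, 0, 6, 13],
]

def sbox_lookup(table, six_bits):
    # six_bits is an integer 0..63 holding b1..b6, with b1 most significant.
    b = [(six_bits >> (5 - i)) & 1 for i in range(6)]   # b1..b6
    row = 2 * b[0] + b[5]                               # outer bits b1, b6
    col = (b[1] << 3) | (b[2] << 2) | (b[3] << 1) | b[4]  # middle bits b2..b5
    return table[row][col]                              # 4-bit output c1c2c3c4

assert sbox_lookup(S1, 0b000000) == 14   # row 0, column 0
assert sbox_lookup(S1, 0b111111) == 13   # row 3, column 15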


32  1  2  3  4  5  4  5  6  7  8  9  8  9 10 11
12 13 12 13 14 15 16 17 16 17 18 19 20 21 20 21
22 23 24 25 24 25 26 27 28 29 28 29 30 31 32  1

Table 5.4: Expansion of the round function

14  4 13  1  2 15 11  8  3 10  6 12  5  9  0  7
 0 15  7  4 14  2 13  1 10  6 12 11  9  5  3  8
 4  1 14  8 13  6  2 11 15 12  9  7  3 10  5  0
15 12  8  2  4  9  1  7  5 11  3 14 10  0  6 13

Table 5.5: S-box S1

15 1 8 14 6 11 3 4 9 7 2 13 12 0 5 10

3 13 4 7 15 2 8 14 12 0 1 10 6 9 11 5

0 14 7 11 10 4 13 1 5 8 12 6 9 3 2 15

13 8 10 1 3 15 4 2 11 6 7 12 0 5 14 9

Table 5.6: S-box S2

10  0  9 14  6  3 15  5  1 13 12  7 11  4  2  8
13  7  0  9  3  4  6 10  2  8  5 14 12 11 15  1
13  6  4  9  8 15  3  0 11  1  2 12  5 10 14  7
 1 10 13  0  6  9  8  7  4 15 14  3 11  5  2 12

Table 5.7: S-box S3

7 13 14 3 0 6 9 10 1 2 8 5 11 12 4 15

13 8 11 5 6 15 0 3 4 7 2 12 1 10 14 9

10 6 9 0 12 11 7 13 15 1 3 14 5 2 8 4

3 15 0 6 10 1 13 8 9 4 5 11 12 7 2 14

Table 5.8: S-box S4

 2 12  4  1  7 10 11  6  8  5  3 15 13  0 14  9
14 11  2 12  4  7 13  1  5  0 15 10  3  9  8  6
 4  2  1 11 10 13  7  8 15  9 12  5  6  3  0 14
11  8 12  7  1 14  2 13  6 15  0  9 10  4  5  3

Table 5.9: S-box S5


12  1 10 15  9  2  6  8  0 13  3  4 14  7  5 11
10 15  4  2  7 12  9  5  6  1 13 14  0 11  3  8
 9 14 15  5  2  8 12  3  7  0  4 10  1 13 11  6
 4  3  2 12  9  5 15 10 11 14  1  7  6  0  8 13

Table 5.10: S-box S6

4 11 2 14 15 0 8 13 3 12 9 7 5 10 6 1

13 0 11 7 4 9 1 10 14 3 5 12 2 15 8 6

1 4 11 13 12 3 7 14 10 15 6 8 0 5 9 2

6 11 13 8 1 4 10 7 9 5 0 15 14 2 3 12

Table 5.11: S-box S7

5.2.1.1 Round key expansion

In the DES algorithm, the 56-bit key is extracted from a 64-bit word with bits numbered from 1 to 64, by ignoring the high order bit of each byte, i.e., ignoring bits 8, 16, ..., 64. The remaining bits are split into two halves after a permutation called PC-1, described in Table 5.13. The bits obtained from the first part of the table are stored in a register C_0 and the bits obtained from the second part of the table in a register D_0. At round i, registers C_{i−1} and D_{i−1} are transformed into two new values C_i and D_i using left rotations. The 48-bit round key K_i is obtained by concatenating 24 bits from C_i and 24 bits from D_i; the selected bits are given by PC-2 from Table 5.14, under the convention that bits from C_i are numbered from 1 to 28 and bits from D_i from 29 to 56. To obtain C_i and D_i from C_{i−1} and D_{i−1}, the left rotations rotate by two bits, except when i is 1, 2, 9 or 16, where they rotate by a single bit. As a consequence, the accumulated amount of rotation over the 16 rounds is 28 bits, thus C_16 = C_0 and D_16 = D_0.
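A quick sketch verifies that the rotation amounts do sum to 28, so that both registers return to their initial values after 16 rounds; the names below are ours.

```python
# Per-round rotation amounts: one bit for rounds 1, 2, 9 and 16, two otherwise.
SHIFTS = [1 if i in (1, 2, 9, 16) else 2 for i in range(1, 17)]

def rol28(x, n):
    # Left-rotate a 28-bit register, as done on C and D.
    return ((x << n) | (x >> (28 - n))) & 0xFFFFFFF

c, d = 0x1234567, 0xABCDEF0          # some 28-bit C0 and D0 values
for s in SHIFTS:
    c, d = rol28(c, s), rol28(d, s)

# The rotations total 28 bits, so C16 == C0 and D16 == D0.
assert sum(SHIFTS) == 28
assert (c, d) == (0x1234567, 0xABCDEF0)
```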

5.2.2 Brute force on DES

First, we can note that during known plaintext brute force attacks, the initial and final permutations of DES can safely be ignored, since the permutations can be applied beforehand to plaintext/ciphertext pairs. For ciphertext

13  2  8  4  6 15 11  1 10  9  3 14  5  0 12  7
 1 15 13  8 10  3  7  4 12  5  6 11  0 14  9  2
 7 11  4  1  9 12 14  2  0  6 10 13 15  3  5  8
 2  1 14  7  4 10  8 13 15 12  9  0  3  5  6 11

Table 5.12: S-box S8


57 49 41 33 25 17 9

1 58 50 42 34 26 18

10 2 59 51 43 35 27

19 11 3 60 52 44 36

63 55 47 39 31 23 15

7 62 54 46 38 30 22

14 6 61 53 45 37 29

21 13 5 28 20 12 4

Table 5.13: Permutation PC-1 of the key bits

14 17 11 24 1 5

3 28 15 6 21 10

23 19 12 4 26 8

16 7 27 20 13 2

41 52 31 37 47 55

30 40 51 45 33 48

44 49 39 56 34 53

46 42 50 36 29 32

Table 5.14: Table PC-2 to extract Ki from Ci and Di

only attacks, the input permutation often allows us to express our knowledge about the plaintext distribution in a more compact form. For example, if we know that the plaintext consists of printable ASCII characters, we know that all high order bits are zeros. After the initial permutation, this implies that bits 25 to 32 form a block of eight consecutive zeros. As a consequence, an implementation of DES dedicated to brute force attacks does not need to implement the initial and final permutations. Similarly, several other minor improvements are possible. For testing several keys that differ only in a few bits, it is not necessary to recompute the first rounds of decryption completely. It is also not necessary to decrypt the whole message block for all keys: indeed, if we compute a few bits of the decrypted message and see that they are wrong, the remaining bits are useless.

The most efficient idea for attacking DES by brute force is probably the bitslicing technique proposed by Biham in [Bih97]. With this technique, we view the available computer as a parallel computer that operates in parallel on many single-bit registers. At first, this approach may seem surprising; however, it allows us to replace many slow operations, such as bit permutations and S-box accesses, by much faster operations. Depending on the specific computer used for the computation, we may obtain 32, 64 or even 128 bit operations in parallel. This level of parallelism is used to try several keys simultaneously. As a consequence, to get a fast bitslice implementation of DES, it essentially suffices to express DES efficiently using bit operations. We now study in more


detail how the various basic operations of DES may be implemented using bit operations:

• Bit permutations: Permuting bits on a single-bit computer essentially comes for free. Instead of moving values around, it suffices to wait until the permuted values are used and to directly access the corresponding non-permuted register.

• Expansion: Since the DES expansion is a simple copy of bits, instead of performing the copy, we proceed as explained above for bit permutations and directly read the required bit when needed.

• Xor: Since XOR is a bit by bit operation, it can clearly be performed without any change at the bit level.

• S-boxes: In fact, S-box accesses are the only operations which are not straightforward on a single-bit computer. Each S-box is a function from 6 bits to 4 bits. Alternatively, it can be viewed as four functions from 6 bits to 1 bit. Moreover, it is well known that any bit-valued function can be written using only XOR and logical AND operations. Thus, S-boxes can be computed on single-bit computers without any real difficulty. To optimize bitslice implementations of DES, it is important to express each S-box using as few bit operations as possible. Currently, the best available expressions are those described by Matthew Kwan in [Kwa00].
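The principle of bitslicing can be illustrated in a few lines: a single word-level operation evaluates the same bit expression for as many key candidates as there are bits in a word. The bit expression below is an arbitrary example built from AND and XOR, not an actual DES S-box formula.

```python
import random

# Three "bit registers": bit j of each word belongs to key candidate j,
# so one word-level operation acts on all 64 candidates at once.
MASK = (1 << 64) - 1
x = random.getrandbits(64)
y = random.getrandbits(64)
z = random.getrandbits(64)

# Any bit-valued function can be written with XOR and AND; here we
# evaluate the illustrative expression (x AND y) XOR z for all 64 slices.
result = ((x & y) ^ z) & MASK

# Check, slice by slice, that the parallel evaluation matches the scalar one.
for j in range(64):
    xj, yj, zj = (x >> j) & 1, (y >> j) & 1, (z >> j) & 1
    assert (result >> j) & 1 == (xj & yj) ^ zj
```

A bitslice DES implementation expresses every S-box output bit as such an AND/XOR formula, so that each pass through the cipher tests one word's worth of keys.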

In addition to this bitslicing technique, one should also remark that at the very end of the computation, the internal structure of DES allows us to stop almost two rounds early. Out of a total of 16 rounds, this saves another 10% of the running time. First, the final round can be avoided in most cases: indeed, once the penultimate round has been performed, half of the ciphertext bits for the set of keys being tested are already known. Most of the time, these values do not match the expected ciphertext and the computation can be aborted. Similarly, thanks to the S-box structure, this penultimate round can be computed in small 4-bit chunks. As soon as one chunk does not match the expected value, the corresponding key can safely be discarded. This approach is called an early abort strategy for the brute force computation.

5.3 Brute force as a security mechanism

Like most cryptanalysis tools, brute force can also be used to add security features to computerized systems. The basic idea is to remark that in many computer-assisted tasks, the amount of computation required from a legitimate user is often extremely small or even negligible. For example, testing


the correctness of a password by hashing, as in Section 5.1, takes an extremely small amount of time, especially when compared to the inherent delay of the human-computer interface involved. Indeed, no user really cares whether the password verification is done within a millisecond or a tenth of a second: for practical purposes, this is as good as immediate. Of course, longer delays on the order of a second might be perceived as unbearable. Similarly, when sending out emails, a small delay would be perfectly acceptable. This simple practical remark is the key to the material in this section. During the short delay tolerated by the user, the computer could perform some real work instead of simply waiting. For password testing, for example, one can easily conceive a slow hash function that would take a full tenth of a second to test a password. No user would see the difference. However, from an attacker's point-of-view, any slowdown in the password testing routine slows any dictionary attack attempt by the same factor. Testing a password in a tenth of a second instead of a millisecond is not an issue for the legitimate user, but it costs the attacker a factor of a hundred, either in terms of running time or of computing power.

However, implementing this simple idea is tricky. It is not enough to make the password testing routine slow. For example, in the early days of computer science, wait loops were added to password testing routines. However, since an attacker only needs to remove the wait loops to make his dictionary attack code faster, this approach is essentially useless. Similarly, using unoptimized code for a password testing routine is not a solution: any motivated attacker can sit down and optimize the code on his own. Instead, one should use deeply optimized code implementing an inherently slow function. Such functions have been proposed in the literature under the name of moderately hard functions. They were initially introduced in [DN93]; an interesting variation making use of memory accesses to slow the computation even on fast computers was proposed in [ABW03].
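The idea of an inherently slow function can be sketched by iterating a standard hash. This is only an illustration of the principle; the iteration count and names are our choices, and it is not one of the constructions of [DN93] or [ABW03].

```python
import hashlib

def slow_hash(password, salt, iterations=100_000):
    """Iterate a hash to make each password test deliberately expensive.

    A sketch of the idea only: real systems should use a vetted
    construction such as PBKDF2, bcrypt, scrypt or Argon2."""
    state = salt + password
    for _ in range(iterations):
        state = hashlib.sha256(state).digest()
    return state

# The attacker must pay the full iteration count for every single guess,
# while the legitimate user pays it once per login.
h = slow_hash(b"s3cret", b"salt", iterations=1000)
assert h == slow_hash(b"s3cret", b"salt", iterations=1000)   # deterministic
assert h != slow_hash(b"guess", b"salt", iterations=1000)
```

Unlike a wait loop, the iteration count here is part of the stored-hash definition: removing it changes the function and makes verification fail, so the attacker cannot simply optimize the delay away.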

5.4 Brute force steps in advanced cryptanalysis

Brute force is not necessarily a stand-alone cryptanalytic technique; it can also occur as the final step of a larger cryptanalytic effort. To illustrate this, we propose to look at the differential cryptanalysis of the hash function SHA-0 and its practical implementation. The hash function SHA-0 was proposed in 1993 by NIST, under the name SHA. It was modified in 1995 and the new version is called SHA-1. We use the name SHA-0 to emphasize that we are considering the early version of 1993. In this section, we describe SHA-0 and show how to devise a differential attack to find collisions in this hash function. Finally, we show that the implementation of this differential attack amounts to sampling, using brute force, a large enough set of message pairs.

5.4.1 Description of the SHA hash function family

Since SHA-0 and SHA-1 are very similar, they can easily share a common description. For this common description, we refer to both algorithms using the generic name of SHA. The general architecture of SHA is very similar to other existing hash functions. It consists of two parts: a low-level compression function which underlies the hash function, and a mode of operation which turns it into a full-fledged hash function. The low-level compression function takes as input two values, one of 160 bits and one of 512 bits, and outputs a 160-bit value. It is built from an ad hoc block cipher with 160-bit blocks and 512-bit keys. The block cipher is transformed into a function, rather than a permutation, using a variation of the Davies-Meyer transform. The mode of operation used in SHA is the Merkle-Damgård construction with final strengthening. The idea of the construction is to process the message to be hashed iteratively, by individual 512-bit blocks. Starting from a fixed 160-bit initial value IV and the first block of message, the compression function produces a new 160-bit value. This value is compressed together with the second block of message, and so on.

To make the description more precise, let us see in detail how a message M is processed. The first step is to make sure that M is encoded into a multiple of the block size in a non-ambiguous fashion. This requirement is very important: indeed, if two different messages M and M′ can be encoded into the same sequence of blocks, then they necessarily have the same hash value and the hash function cannot be collision resistant. With SHA, a simple encoding is used. The bitstring representation of M is padded with a single bit set to 1, a sequence of bits set to 0 and finally a 64-bit representation of the bitlength of the unpadded message M. The length of the sequence of zeros is the smallest possible number such that the length of the padded message is a multiple of 512. Thus, the number of zeros lies in the interval between 0 and 511. It is worth noting that with this padding the length of the unpadded message is encoded twice: once explicitly, by the 64-bit number at the end of the padding, and once implicitly, since the original message can be recovered by removing from the padded message the 64 final bits, all trailing zeros and a single one. This double encoding of the length was introduced as a strengthening of the Merkle-Damgård construction. It is usually called MD-strengthening.
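For byte-aligned messages, the padding rule just described can be sketched as follows; `sha_pad` is our name for this illustration.

```python
def sha_pad(message):
    """Pad as in SHA: a 1 bit, zeros, then the 64-bit big-endian bitlength."""
    bitlen = 8 * len(message)
    padded = message + b"\x80"          # a single 1 bit, then seven 0 bits
    while (len(padded) + 8) % 64 != 0:  # leave room for the 64-bit length
        padded += b"\x00"
    return padded + bitlen.to_bytes(8, "big")

assert len(sha_pad(b"")) == 64
assert len(sha_pad(b"a" * 55)) == 64   # 55 bytes: the most one block holds
assert len(sha_pad(b"a" * 56)) == 128  # one byte more forces a second block
assert all(len(sha_pad(b"x" * n)) % 64 == 0 for n in range(200))
```

Since a 512-bit block is 64 bytes and the length field takes 8, a message of up to 55 bytes pads into a single block, and longer messages spill into additional blocks.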

After padding, the message can be divided into 512-bit blocks, numbered M_1, M_2, ..., M_ℓ, where ℓ is the number of blocks. Denoting by F the compression function, the hash value of M is computed as follows:

    h_0 = IV,    (5.3)
    h_i = F(h_{i−1}, M_i)  for all i from 1 to ℓ.

The last value h_ℓ is the hash value of M. The first value h_0 is the fixed initial value. For convenience, the other values h_i are called intermediate hash values. The compression function F of SHA is described in Section 5.4.1.2.
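The iteration of Equation (5.3) can be sketched generically. The compression function below is a stand-in used only to exercise the chaining loop; it is not SHA's.

```python
import hashlib

def merkle_damgard(compress, iv, padded_message, block_size=64):
    """Iterate a compression function over consecutive blocks (Equation 5.3)."""
    h = iv
    for i in range(0, len(padded_message), block_size):
        h = compress(h, padded_message[i:i + block_size])
    return h

# Illustrative stand-in compression function (not SHA's): hash the chaining
# value together with the message block and truncate to 20 bytes (160 bits).
def toy_compress(h, block):
    return hashlib.sha256(h + block).digest()[:20]

iv = b"\x00" * 20
digest = merkle_damgard(toy_compress, iv, b"A" * 128)
assert len(digest) == 20
# Processing block by block is equivalent to the explicit two-step chain.
h1 = toy_compress(iv, b"A" * 64)
assert digest == toy_compress(h1, b"A" * 64)
```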

5.4.1.1 Security properties of the Merkle-Damgård construction

As discussed in Chapter 1, when considering hash functions, we are interested in three security properties: collision resistance, preimage resistance and second preimage resistance.

When considering the Merkle-Damgård construction, it is possible to show that the security of the hash function H can be derived from the security of the compression function F. For example, given a collision for H, it is easy to produce a collision for the compression function F. First, if M and M′ have different lengths, their final blocks M_ℓ and M′_ℓ′ are different. If H(M) = H(M′), we necessarily have:

    F(h_{ℓ−1}, M_ℓ) = F(h′_{ℓ′−1}, M′_ℓ′),    (5.4)

which is a collision for F. Otherwise, M and M′ are different messages of the same length and we look at the largest block index i in the interval [1, ℓ] such that either M_i ≠ M′_i or h_{i−1} ≠ h′_{i−1}. Of course, h_i = h′_i and

    F(h_{i−1}, M_i) = F(h′_{i−1}, M′_i)    (5.5)

is a collision for F. It is important to remark that in general we obtain collisions for F with different values on the h component. As a consequence, proving that F(h, ·) is collision resistant for all fixed values of h is not sufficient to establish collision resistance of H. In relation to this comment, note that some specific differential attacks, called multiblock attacks, require several consecutive calls to F before producing a collision for H. From such a multiblock attack, one can derive a collision on F, but usually not on F(h, ·).

Using a very similar analysis, we can prove that if F is preimage resistant then H is preimage resistant. Indeed, if we can find a preimage M of a target value h, we see that F(h_{ℓ−1}, M_ℓ) = h and obtain a preimage of h for the compression function F.

However, where second preimage resistance is concerned, we obtain a slightly weaker result: any second preimage attack on H can be used to obtain a collision attack on F, but not necessarily a second preimage attack on F.

5.4.1.2 Compression function of SHA

At the core of the compression function of SHA, we find a specific block cipher that encrypts 160-bit blocks using a 512-bit key. In order to be efficient in software, this specific block cipher is mostly based on 32-bit unsigned arithmetic. As a consequence, each 160-bit block is decomposed into five 32-bit values. The input to the block cipher is thus denoted by the vector (A(0), B(0), C(0), D(0), E(0)). The block cipher itself is a generalized Feistel scheme with 80 rounds. At round i, we denote the inner state of the encryption algorithm by (A(i), B(i), C(i), D(i), E(i)). To update the inner state between round i and round i+1, a new value is computed for A(i+1) by mixing together the previous values of A, B, C, D, E and some key material. At the same time, B(i+1), C(i+1), D(i+1) and E(i+1) are obtained as (rotated) copies of A(i), B(i), C(i) and D(i).

Round i   Name   Definition f(i)(x, y, z)            Constant K(i)
 0-19     IF     (x ∧ y) ∨ (¬x ∧ z)                  0x5A827999
20-39     XOR    x ⊕ y ⊕ z                           0x6ED9EBA1
40-59     MAJ    (x ∧ y) ∨ (x ∧ z) ∨ (y ∧ z)         0x8F1BBCDC
60-79     XOR    x ⊕ y ⊕ z                           0xCA62C1D6

Table 5.15: Definition of the round functions and constants

More precisely, for each round, with i varying from 0 to 79, the registers A, B, C, D and E are updated according to the following formulas:

A(i+1) = ROL5(A(i)) + f(i)(B(i), C(i), D(i)) + E(i) + W(i) + K(i),
B(i+1) = A(i),
C(i+1) = ROL30(B(i)),    (5.6)
D(i+1) = C(i) and
E(i+1) = D(i).

In these equations, addition is performed modulo 2^32 and ROL_b(X) denotes the value obtained by rotating the binary representation of the 32-bit unsigned integer X by b bits to the left. Moreover, f(i) is a round function and K(i) a round constant; they both depend on i and are specified in Table 5.15. Finally, W(i) is the contribution of the 512-bit key to round i. The 16 values W(0) to W(15) are simply obtained by decomposing the 512-bit input into 16 consecutive 32-bit words. The next 64 values W(16) to W(79) are obtained through an expansion process. In SHA-0, the expansion is defined by:

    ∀ i, 16 ≤ i < 80:  W(i) = W(i−3) ⊕ W(i−8) ⊕ W(i−14) ⊕ W(i−16).    (5.7)

Note that the only difference between SHA-0 and SHA-1 concerns this expansion. In SHA-1, the expansion becomes:

    W(i) = ROL1(W(i−3) ⊕ W(i−8) ⊕ W(i−14) ⊕ W(i−16)).    (5.8)
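The round primitives just described, rotation, the expansion of Equations (5.7) and (5.8), the round functions and constants of Table 5.15, and the state update of Equation (5.6), can be sketched as follows. This is an illustrative fragment, not a full SHA implementation.

```python
MASK = 0xFFFFFFFF

def rol(x, b):
    # ROL_b: rotate the 32-bit word x left by b bits.
    return ((x << b) | (x >> (32 - b))) & MASK

def expand(w, sha1=False):
    # Equations (5.7)/(5.8): expand 16 input words into 80.
    w = list(w)
    for i in range(16, 80):
        t = w[i - 3] ^ w[i - 8] ^ w[i - 14] ^ w[i - 16]
        w.append(rol(t, 1) if sha1 else t)  # the ROL1 is SHA-1's only change
    return w

# Round functions and constants from Table 5.15.
K = [0x5A827999, 0x6ED9EBA1, 0x8F1BBCDC, 0xCA62C1D6]

def f(i, x, y, z):
    if i < 20:
        return ((x & y) | (~x & z)) & MASK   # IF
    if 40 <= i < 60:
        return (x & y) | (x & z) | (y & z)   # MAJ
    return x ^ y ^ z                          # XOR (rounds 20-39 and 60-79)

def round_step(i, state, w_i):
    # Equation (5.6): one round of the state update.
    a, b, c, d, e = state
    a_new = (rol(a, 5) + f(i, b, c, d) + e + w_i + K[i // 20]) & MASK
    return (a_new, a, rol(b, 30), c, d)

w = [1] + [0] * 15
assert expand(w)[16] == 1              # SHA-0: W(16) = W(13)^W(8)^W(2)^W(0)
assert expand(w, sha1=True)[16] == 2   # SHA-1 rotates the same value by one
assert round_step(0, (0, 0, 0, 0, 0), 0) == (0x5A827999, 0, 0, 0, 0)
```

The last two assertions make the SHA-0/SHA-1 distinction concrete: starting from the same 16 words, the two expansions first diverge at W(16) by a single one-bit rotation.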

After 80 rounds, the block cipher output is (A(80), B(80), C(80), D(80), E(80)). However, despite the fact that SHA is based on a block cipher, as mentioned at the beginning of this section, using the Merkle-Damgård construction directly with a block cipher is not a good idea; see Exercise 5 in Chapter 6. Instead, an additional step, similar to the Davies-Meyer construction, is used to transform this block cipher into a function, rather than a permutation. Recall that the Davies-Meyer construction is a very useful way to turn a cryptographic permutation π into a cryptographic function x → π(x) ⊕ x. With SHA, a variation based on 32-bit addition in place of the exclusive or is used. With this transform, the final output of the compression function is:

    (A(80) + A(0), B(80) + B(0), C(80) + C(0), D(80) + D(0), E(80) + E(0)).

5.4.2 A linear model of SHA-0

In order to study the propagation of differences in SHA-0 and to learn how to control this propagation well enough to construct collisions, it is very useful to start by looking at a fully linearized variation of SHA-0. First, remark that in SHA-0, there are two sources of non-linearity: the f(i) functions in rounds 0 to 19 and in rounds 40 to 59 (respectively IF and MAJ) and the addition modulo 2^32. Indeed, addition modulo 2^32 may involve propagation of carries, and as a consequence it is not linear over F2. The linear model of SHA-0 is obtained by replacing the additions by XOR operations on 32-bit words and by using the XOR function as round function in every round. We also replace addition by XOR in the final Davies-Meyer transform.

With this change, we obtain a completely linear compression function. As a consequence, finding collisions becomes a straightforward task: it suffices to use linear algebra algorithms to construct an element of the kernel of this function. However, this approach does not shed any light on SHA-0 itself. Instead, we are now going to write this linear algebra attack using a specific sparse approach. The advantage is that this approach can then be transformed into a probabilistic attack on SHA-0. The specific approach splits the problem of collision finding into two easier tasks. The first task concentrates on a small number of consecutive rounds, ignoring the expansion of the vector W, yielding local collisions. The second task is to merge together several local collisions while remaining consistent with the expansion.
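To illustrate the generic linear-algebra step, the following sketch finds a nonzero kernel element of an arbitrary matrix over F2, with rows packed into integers. It is a toy routine under assumed conventions, not the sparse approach developed next, and the name `kernel_vector` is illustrative.

```python
def kernel_vector(A_rows, n):
    """Find a nonzero n-bit x with A·x = 0 over F2, or return None.

    A_rows lists the rows of A as integers, bit j holding the entry in
    column j.  Columns of A are inserted one by one into a fully reduced
    basis, each tagged with the combination of unit vectors it came from;
    the first column that reduces to zero yields a kernel element."""
    basis = []  # (pivot_bit, column_value, tag)
    for j in range(n):
        v = 0  # column j of A, packed over the rows
        for i, row in enumerate(A_rows):
            if (row >> j) & 1:
                v |= 1 << i
        t = 1 << j
        for p, bv, bt in basis:  # reduce against existing pivots
            if v & p:
                v ^= bv
                t ^= bt
        if v == 0:
            return t  # these columns of A sum to zero, so A·t = 0
        p = v & -v  # new pivot: lowest set bit
        for k, (bp, bv, bt) in enumerate(basis):  # keep basis reduced
            if bv & p:
                basis[k] = (bp, bv ^ v, bt ^ t)
        basis.append((p, v, t))
    return None

# Toy example: x0 ^ x1 = 0 and x1 ^ x2 = 0 force x = (1, 1, 1).
assert kernel_vector([0b011, 0b110], 3) == 0b111
```

Since the linearized compression function maps 512 message bits to 160 output bits, its matrix always has a large kernel, so this step never fails in that setting.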

5.4.2.1 Local collisions in linearized SHA-0

In order to construct local collisions in the linearized version of SHA-0, it suffices to introduce a change on a single bit of W, to follow the propagation of this change in the subsequent rounds and to use adequate corrections to prevent the initial change from impacting too many bits. More precisely, starting from a common inner state A(i), B(i), C(i), D(i), E(i), assume that we perform in parallel two partial hash computations. In the first computation, we input a message block W1(i) and in the second computation a message block W2(i). We call δ = W1(i) ⊕ W2(i) the difference³ of the two message blocks. To make the analysis simpler, it is useful to assume that the difference δ is on a single bit.

Clearly, by linearity, the difference δ directly propagates in the computation of A(i+1) and we find that A1(i+1) ⊕ A2(i+1) = δ. In the subsequent steps, this initial difference on A(i+1) moves to B(i+2), C(i+3), D(i+4) and E(i+5). However, if we do not make any other changes in the message blocks, secondary changes also occur in A(i+2), A(i+3), ... To control the propagation of the differences, the key idea is to make other changes in W in order to prevent any secondary change in A. We refer to these additional changes as corrections. The first correction prevents the difference in A(i+1) from propagating to A(i+2); the reader can easily check that, by linearity, this is effected by choosing W1(i+1) ⊕ W2(i+1) = ROL5(δ). The second correction prevents the difference in B(i+2) from affecting A(i+3); it consists of choosing W1(i+2) ⊕ W2(i+2) = δ. The third correction prevents the difference in C(i+3) from affecting A(i+4); since C(i+3) is a rotated copy of B(i+2), the correction requires W1(i+3) ⊕ W2(i+3) = ROL30(δ). Finally, the fourth and fifth corrections account for D(i+4) and E(i+5) using W1(i+4) ⊕ W2(i+4) = ROL30(δ) and W1(i+5) ⊕ W2(i+5) = ROL30(δ).

After that fifth correction, we see that the initial difference vanishes and thus the two parallel computations yield the same inner state A(i+6), B(i+6), C(i+6), D(i+6), E(i+6). This propagation of differences is summarized in Figure 5.1. In this figure, A(i)[j] denotes the j-th bit of A(i).
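A local collision of this shape can be checked mechanically on the linearized rounds. The sketch below uses illustrative names (`lin_rounds`, `rol`); it omits the round constants, which cancel in XOR differences, and starts from the standard initial values, although any common inner state would do.

```python
MASK32 = 0xFFFFFFFF

def rol(x, s):
    """Rotate a 32-bit word left by s positions."""
    return ((x << s) | (x >> (32 - s))) & MASK32

def lin_rounds(state, words):
    """Linearized SHA-0 rounds: XOR as round function in every round
    and XOR in place of modular addition (round constants omitted,
    since XOR-ed constants cancel when comparing two computations)."""
    a, b, c, d, e = state
    for w in words:
        a, b, c, d, e = rol(a, 5) ^ b ^ c ^ d ^ e ^ w, a, rol(b, 30), c, d
    return (a, b, c, d, e)

# A perturbation delta on W(i), then the five corrections on
# W(i+1), ..., W(i+5), applied from a common inner state.
delta = 1 << 1  # single-bit difference, here on bit 1
state = (0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476, 0xC3D2E1F0)
w1 = [0] * 6
w2 = [delta, rol(delta, 5), delta,
      rol(delta, 30), rol(delta, 30), rol(delta, 30)]
assert lin_rounds(state, w1) == lin_rounds(state, w2)  # local collision
```

Dropping any of the five corrections, or stopping the computation early, leaves a nonzero difference in the inner state.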

5.4.2.2 Combining local collisions with the message expansion

In order to combine local collisions with the message expansion, it suffices to remark that, thanks to linearity, XORing together the changes and corrections of several local collisions yields a possible pattern of collision. As a consequence, it suffices to put together several local collisions in a way that guarantees that the XOR of all involved changes and corrections is a valid output of the message expansion. With SHA-0, the fact that the expansion is not only linear but also applies in parallel to individual bits of the words W is quite helpful. Indeed, any valid expanded sequence of words W0, ..., W79 can be constructed by pasting together 32 expanded sequences of bits. Since the expansion computes 80 expanded bits from 16 initial bits, it is even easy to exhaustively compute the 2^16 different expanded sequences on a single bit position.
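The single-bit view of the expansion is easy to reproduce. The sketch below applies the SHA-0 recurrence W[i] = W[i-3] ⊕ W[i-8] ⊕ W[i-14] ⊕ W[i-16] restricted to one bit position; the function name is illustrative.

```python
def expand_bit(seed):
    """Expand 16 single-bit values into 80 with the SHA-0 recurrence
    W[i] = W[i-3] ^ W[i-8] ^ W[i-14] ^ W[i-16], restricted to a single
    bit position (SHA-0 has no rotation, so bit positions never mix)."""
    w = list(seed)
    for i in range(16, 80):
        w.append(w[i - 3] ^ w[i - 8] ^ w[i - 14] ^ w[i - 16])
    return w

# All 2**16 expanded single-bit sequences, computed exhaustively.
sequences = [expand_bit([(s >> k) & 1 for k in range(16)])
             for s in range(1 << 16)]
assert len(sequences) == 1 << 16
```

Because the recurrence is XOR-linear, the expansion of an XOR of two seeds is the XOR of their expansions, which is exactly what makes difference patterns composable.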

The key idea is to insert several local collisions in a way that follows such

an expanded sequence of bits. In particular, this means that all changes of

³ This is a standard name in differential cryptanalysis, reminiscent of the fact that in F2, a − b is the same thing as a ⊕ b.


[Figure 5.1: propagation of a perturbation on bit 1 and its corrections]
