© 2009 by Taylor and Francis Group, LLC
Sieve algorithms 141

analyzed precisely by Pomerance in [Pom82] for Dixon's factoring algorithm
and allowed improvements of its asymptotic time complexity.
Once trial division is finished, we could move to other generic factoring al-
gorithms such as Pollard's Rho or ECM to factor the reduced norms; however,
at this point, it is tempting to use the extra structure shared by our candi-
dates. All of them are of the form a + bα with (a, b) in the sieve zone. A good
trick to find, among the candidates, those which are divisible by an element
of B is to sieve again. More precisely, we initialize the sieve zone to zero
except in locations corresponding to candidates, where we write an identifier
for the candidate. During this second sieve, for each point encountered during
the lattice walk, we do nothing if the point's location contains a zero; other-
wise, we divide the reduced norm of the corresponding candidate by the prime
at hand, taking multiplicity into account. This approach allows us to factor
out all elements of B corresponding to medium-sized primes.
Finally, after another filtering pass, where we keep small enough reduced
norms, we complete the factorization using primality tests and either Pollard's
Rho (see Chapter 7) or ECM, depending on our choice of factor base B.
Note that, while checking for smoothness, we obtained the factorization of
the norm of each candidate. In order to avoid redoing this job later, it is
preferable in most cases to store this factorization together with the smooth
candidate. This stored factorization is used in the subsequent phases of the
index calculus method described in Chapter 15.

4.2.1.2.4 Working with smaller arrays. In order to reduce the re-
quired amount of memory, a natural approach is to mimic the idea behind
wheel factorization described in Section 4.1.2.1. The basic idea behind wheel
factorization is to remove from the memory representation all elements which
are known to be useless. With the current problem, we know in advance
that a pair (a, b) is useless whenever the GCD of a and b is not one. Thus a
good way to reduce the memory usage is to filter out such pairs. Of course,
the filtering should not be too complicated, to avoid high runtime penalties.
The simplest approach is to remove all pairs where both a and b are even;
this reduces the amount of memory by 1/4 without involving any complicated
computation. In fact, it is similar to the removal of even integers in Eratos-
thenes's sieve. Of course, one could push the approach further and remove all
pairs with 2, 3, 5, . . . as a common factor between a and b. However, this
is much less interesting than with wheel factorization. Indeed, in Eratosthenes's
sieve, the proportion of integers removed by adding p to the wheel is roughly
1/p; here it is 1/p^2. This new ratio is not so good, since the series
formed of the squared inverses of primes converges.

4.2.1.3 The case of polynomials over F2
Some index calculus algorithms need to sieve over pairs (a, b) where a and
b are univariate polynomials. In this section, we address the frequent special





case where these polynomials are defined over F2. Using polynomials implies
a slight rewriting of some of the mathematical objects involved; for example,
we need to replace prime numbers by irreducible polynomials. However, these
changes are inessential where the sieving process is concerned. Here, we are
interested in the direct impact of using polynomials on the sieve algorithm.
Interestingly, using polynomials over F2 greatly simplifies the task of sieving.
This is the reason why we start with this case. The first remark concerns the
sieving zone. Since we are working over F2, any polynomial is its own opposite.
This additional symmetry leads us to reduce the sieve area to a square zone:
a and b should both belong to the same set of small polynomials. Moreover,
it is extremely convenient to define this set by giving an upper bound on the
degrees of a and b. However, this greatly limits the flexibility of the memory
usage. Taking all polynomials of degree < D, we have 2^(2D) possible pairs (a, b).
Writing a and b as polynomials in x, it is easy to remove from the memory
representation pairs (a, b) where a and b are both multiples of x. Similarly, it
is easy to remove pairs (a, b) where a and b are both multiples of x + 1. As
soon as D ≥ 2, removing these pairs lowers the required amount of memory
to 9/16 of the initial amount 2^(2D).
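The 9/16 figure can be verified by brute force for small D. The sketch below is illustrative; it encodes each polynomial of degree < D as a D-bit integer and relies on the divisibility tests described in the next paragraph (a multiple of x has an even encoding, and a multiple of x + 1 has an even number of 1 bits).

```c
#include <assert.h>

/* 1 if the D-bit encoding v has an odd number of 1 bits */
static int parity(unsigned v) {
    int s = 0;
    while (v) { s ^= v & 1; v >>= 1; }
    return s;
}

/* Count the pairs (a, b) of degree < D that survive the filter:
   remove pairs where a and b are both multiples of x (even encodings)
   or both multiples of x + 1 (even bit-parity). */
static int count_kept(int D) {
    int kept = 0, n = 1 << D;
    for (int a = 0; a < n; a++)
        for (int b = 0; b < n; b++) {
            if ((a & 1) == 0 && (b & 1) == 0) continue;     /* both mult. of x   */
            if (parity(a) == 0 && parity(b) == 0) continue; /* both mult. of x+1 */
            kept++;
        }
    return kept;
}
```

For D = 2, 3, 4 this returns 9, 36 and 144, i.e., exactly 9/16 of 16, 64 and 256, matching the inclusion-exclusion count 2^(2D)·(1 − 1/4 − 1/4 + 1/16).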
From a practical point of view, polynomials over F2 can be efficiently rep-
resented by numbers encoded in binary, as long as their degree is smaller than
the available number of bits. Assuming that the bits are numbered from right
to left starting with bit 0, we use bit i to represent the coefficient of x^i in the
polynomial. For example, the low order bit (rightmost) is used to encode the
constant term. With this convention, the polynomial x^3 + x + 1 is encoded
by the number 11. With this representation, polynomials can be added using
the XOR operation on numbers, and they can be multiplied by x using a simple
left shift (multiplication by 2 of the corresponding number). Moreover, within
this representation, multiples of x and x + 1 are easily recognized. A multiple
of x corresponds to an even number and a multiple of x + 1 corresponds to
a number whose Hamming weight is even, or equivalently to a number that
contains an even number of 1s in its representation.
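These operations translate directly into C on unsigned integers; the following helper functions are a minimal sketch (the names are ours, not the book's).

```c
#include <assert.h>

typedef unsigned poly2;               /* F2[x]: bit i = coefficient of x^i */

static poly2 p2_add(poly2 a, poly2 b)  { return a ^ b; }   /* addition = XOR */
static poly2 p2_mulx(poly2 a)          { return a << 1; }  /* multiply by x  */
static int   p2_mult_of_x(poly2 a)     { return (a & 1) == 0; } /* even number */

/* Multiple of x+1 <=> even Hamming weight */
static int p2_mult_of_x1(poly2 a) {
    int w = 0;
    while (a) { w ^= a & 1; a >>= 1; }
    return w == 0;
}
```

For instance, (x^3 + x + 1) + (x + 1) = x^3 becomes `p2_add(11, 3) == 8`, and x^2 + x = x(x + 1), encoded as 6, passes both divisibility tests.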
Despite its drawback of limiting the choice in terms of used memory, choos-
ing a sieving zone comprising all polynomials of degree < D is extremely
convenient for sieving. First, when representing a polynomial over F2 as an
integer, testing whether the degree is < D can be done by comparing the
polynomial's representation with 2^D. Since integer comparison is a native
operation on modern computers, this is extremely efficient. Second, when
sieving, we need to construct all multiples of a given polynomial up to degree
D. If we start with a polynomial of degree d, this is achieved by multiplying
it by all polynomials up to degree D − d. With another choice of sieving zone,
we could not use the additivity of degrees during multiplication and would thus
need a more complicated and less efficient approach. When constructing all
polynomials up to degree D, we can even speed up the construction of the
multiples by using a Gray code to enumerate them quickly.
To make this more precise, assume that we want to mark all pairs of






Binary Gray codes offer a way of enumerating all vectors on n bits, while
changing only a single bit between one vector and the next. For example, on
4 bits, we have the following code:

0000, 0001, 0011, 0010, 0110, 0111, 0101, 0100,
1100, 1101, 1111, 1110, 1010, 1011, 1001, 1000,

which contains the sixteen possibilities. Gray codes are useful whenever
manipulating complex objects where changes are costly. In this case, reducing
the total number of changes is clearly worthy. In particular, it is possible to
use Gray codes to efficiently produce all multiples of a binary polynomial a(x)
by all polynomials of degree at most n − 1 by using a Gray code on n bits.
We start from an initial multiple equal to 0, i.e., 0 · a(x); assuming that bits
are numbered from the right starting at 0, a change on bit c in the Gray code
is converted into an addition of x^c · a(x) to the current multiple.

Programming Gray codes on n bits is a simple matter. It suffices to run an
arithmetic counter i from 1 to 2^n − 1 that contains the number of the cur-
rent change. The position of the bit that needs to be changed during the
i-th change is the position of the rightmost bit equal to 1 in the binary repre-
sentation of i. The position of the rightmost bit is easily obtained by shifting
a copy of i to the right, until the resulting number becomes odd. Note that
Gray codes can easily be generalized to vectors of n elements in a small finite
field Z/pZ. For example, for vectors of 3 ternary digits, we obtain the following
Gray code:

000, 001, 002, 012, 010, 011, 021, 022, 020,
120, 121, 122, 102, 100, 101, 111, 112, 110,
210, 211, 212, 222, 220, 221, 201, 202, 200.

Ternary Gray codes are slightly less suited to the architecture of present
computers, because to implement them, it is preferable to know the
decomposition of the change counter i in base 3. As a consequence, they
use many reductions modulo 3 and divisions by 3.


Figure 4.4: Gray codes





polynomials (a(x), b(x)) divisible by some element of the smoothness basis:
(p(x), α_p(x)), where p(x) is an irreducible polynomial. These pairs satisfy the
equation:

a(x) + b(x)·α_p(x) ≡ 0 (mod p(x)).                      (4.3)

Using a partial run of Euclid's extended GCD Algorithm 2.2 on the pair of
polynomials (α_p(x), p(x)), we can find two different small solutions, (a1, b1)
and (a2, b2), where the degrees of a1, a2, b1 and b2 are near the half-degree
of p(x). Moreover, any solution (a, b) can be written as a linear combination
of these two initial solutions, i.e., there exist two polynomials λ(x) and µ(x)
such that:

a(x) = λ(x)·a1(x) + µ(x)·a2(x)   and                    (4.4)
b(x) = λ(x)·b1(x) + µ(x)·b2(x).

In order to exhibit all solutions, it suffices to consider all pairs (λ, µ) of small
enough degree. As a consequence, we simply need to nest two independent
Gray code enumerations to efficiently enumerate all the (a, b) pairs. This is
much simpler than the corresponding algorithm in the case of numbers, as can
be seen from Algorithm 4.4. This algorithm also tests whether both coordinates
in each pair are multiples of x or of x + 1 before marking the pair.
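The Gray-code walk used inside each of the two nested enumerations can be sketched in C, assuming polynomials fit in an unsigned int; `gray_multiples` is an illustrative name.

```c
#include <assert.h>

/* Produce all 2^n multiples c(x)*a(x), for deg c < n, via a binary
   Gray code: at step i, the changed bit c is the position of the
   rightmost 1 in i, so we add x^c * a(x) to the running multiple.
   The multiples are stored in out[], starting with 0 * a(x);
   out[] must have room for 2^n entries. */
static void gray_multiples(unsigned a, int n, unsigned out[]) {
    unsigned mult = 0;
    out[0] = 0;
    for (unsigned i = 1; i < (1u << n); i++) {
        unsigned j = i;
        int c = 0;
        while ((j & 1) == 0) { j >>= 1; c++; }  /* rightmost set bit */
        mult ^= a << c;                          /* add x^c * a(x)   */
        out[i] = mult;
    }
}
```

Each step costs one shift-and-XOR instead of a full polynomial multiplication; for a = x + 1 and n = 2 the walk visits 0, x + 1, x^2 + 1, x^2 + x, i.e., all four multiples.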

4.2.1.4 The case of numbers
Sieving with numbers is more complicated than sieving with polynomials
over F2; we must take into account the geometry of both the sieving zone and
the lattice to be sieved. Due to the variety of relative positions of the basis
vectors (u, v), many different cases are possible. Assuming that the lattice's
basis is Gauss reduced as described in Chapter 10, we know that u is (one of)
the shortest vectors in the lattice and that v is (one of) the shortest among the
lattice's vectors linearly independent from u. Moreover, |(u|v)| is smaller than
‖u‖^2/2. Thus, u and v are both short and they are nearly orthogonal to each
other. This means that considering vectors of the form cu·u + cv·v for cu and
cv in given intervals allows us to cover an almost rectangular looking parallelo-
gram. However, the axes of this parallelogram have no reason whatsoever to be
aligned with the axes of the sieving zone. We describe one possible way of
marking the multiples in Algorithm 4.5. In this code, we denote by wa and
wb the coordinates of a vector w. A specific instance of the algorithm is
illustrated in Figure 4.5.

4.2.1.5 The case of polynomials in odd characteristic
Finally, the case of polynomials in odd characteristic is a mix of the two
previous cases. Either the characteristic is small and the degree of polynomials
quite large and we proceed as for polynomials over F2 , with the small caveat
that it is no longer possible to represent polynomials simply as bit vectors. Or
the characteristic is larger and we consider only polynomials of small degree,








Algorithm 4.4 Walking the multiples with polynomials
Require: Element of B: (p(x), α_p(x))
Require: Maximal degrees of a(x) and b(x) in the sieving zone, da and db
  Create initial solutions (a1(x), b1(x)) and (a2(x), b2(x))
  Let d1 ← min(da − deg a1, db − deg b1)
  Let d2 ← min(da − deg a2, db − deg b2)
  Let (α(x), β(x)) ← (0, 0)
  for i from 0 to 2^d1 − 1 do
    if i ≠ 0 then
      Let i′ ← i
      Let c ← 0
      while i′ is even do
        Let i′ ← i′/2
        Let c ← c + 1
      end while
      Let α(x) ← α(x) ⊕ x^c·a1(x)
      Let β(x) ← β(x) ⊕ x^c·b1(x)
    end if
    for j from 0 to 2^d2 − 1 do
      if j ≠ 0 then
        Let j′ ← j
        Let c ← 0
        while j′ is even do
          Let j′ ← j′/2
          Let c ← c + 1
        end while
        Let α(x) ← α(x) ⊕ x^c·a2(x)
        Let β(x) ← β(x) ⊕ x^c·b2(x)
      end if
      if α(0) ≠ 0 or β(0) ≠ 0 then
        if α(1) ≠ 0 or β(1) ≠ 0 then
          Mark (α(x), β(x))
        end if
      end if
    end for
  end for





Algorithm 4.5 Walking the multiples with numbers
Require: Element of B: (p, α_p)
Require: Sieve limits Sa and Sb
  Create reduced basis (u, v) for lattice
  Modify basis to ensure 0 ≤ ua < va and ub · vb ≤ 0
  Assume ub < 0 ≤ vb (otherwise adapt boundary conditions in the sequel)
  Let w ← (0, 0)
  repeat
    Let z ← w
    Let w ← w + u
    if wa ≥ va then
      Let w ← w − v
    end if
  until wb ≤ −Sb
  Let w ← z  {Here w is in lower left corner}
  Let z ← w
  repeat
    Let t ← z  {Start first sieving zone}
    repeat
      if ta is odd or tb is odd then
        Mark point corresponding to t
      end if
      Let t ← t + u
    until ta ≥ Sa or tb ≤ −Sb
    Let z ← z + v
  until za ≥ Sa or zb ≥ Sb
  Let w ← w − u
  if wa < 0 then
    Let w ← w + v
  end if
  repeat
    Let z ← w  {Second sieving zone}
    repeat
      if za is odd or zb is odd then
        Mark point corresponding to z
      end if
      Let z ← z + v
    until za ≥ Sa or zb ≥ Sb
    Let w ← w − u
    if wa < 0 then
      Let w ← w + v
    end if
  until wb ≥ Sb







[Figure: the sieved parallelogram in the (a, b)-plane, split by the a-axis into
a "First Sieving Zone" (below) and a "Second Sieving Zone" (above).]

Figure 4.5: Illustration of Algorithm 4.5





sometimes linear polynomials. Since we are sieving over large coefficients, this
behaves more like the case of Section 4.2.1.4.


4.2.2 Advanced sieving approaches
The main drawback of the basic algorithm is its huge memory requirement.
Since it sieves the whole sieving zone at once, it needs enough memory to store
this zone, which is impractical in most cases. In order to lift this difficulty, the
natural answer is to cut the sieving zone into small fragments and address each
in turn. In the literature, two important ways of fragmenting the sieving zone
are considered. The simplest, called line sieving, cuts the sieving zone into lines,
each line corresponding to a fixed value of one coordinate in the pair (a, b).
The other frequent approach looks at two-dimensional fragments obtained by
considering points on sub-lattices within the sieving zone. This is called the
special-q approach. We now discuss both approaches in detail.

4.2.2.1 Line sieving

Line sieving is a simple and elegant solution to sieve large zones. In fact, it is
even conceptually easier than the basic lattice Algorithm 4.3. Indeed, it does
not even require a good lattice description of the linear equation a + b·α_p^(i) ≡
0 (mod p). Assume that we sieve by lines corresponding to a fixed value
of b. Since our choice of sieving zone has a ≥ 0, for each value of b and
for each element (p, α_p^(i)) of the smoothness basis, we simply start with the
smallest non-negative representative of b·α_p^(i) mod p and tick every p-th integer
starting from that point. When sieving several consecutive lines on the same
machine, the performance can be greatly improved by remarking that, given
the starting point of (p, α_p^(i)) for b, the corresponding starting point for b + 1
can be computed with a single modular addition, which is more efficient than
a modular multiplication. This results in Algorithm 4.6 to sieve a set of
consecutive lines.
Of course, as with the basic lattice sieve, it is easy to derive variations of
this algorithm that simply count the number of divisors of each pair (a, b)
or that remove very small elements of B from the sieving loop. Once this is
done, we can observe that the innermost operation, which adds to LogSum[a],
is the most important as far as running time goes. As a consequence, it is worth
optimizing this inner loop. The key fact is that in most cases, Sa is quite
large and thus the total amount of memory we use is larger than the cache
size. As a consequence, it is a good idea to take the cache structure into
account in order to sieve faster. Assuming that the memory structure is
simple, with a single level of cache, this can be done by further segmenting
the representation of the a values in memory into several blocks that fit in
the cache. For elements of B with a p value smaller than the blocks' size, we
can sieve a given block while staying within the cache. For larger values of p,








Algorithm 4.6 Basic line sieve
Require: Smoothness basis B
Require: Sieve limit Sa
Require: Range for b: bmin · · · bmax
Require: Expected value LogVal for smooth candidates
  Create single dimensional array LogSum indexed by [0 · · · Sa − 1]
  Create single dimensional array StartPos indexed by B
  for all P = (p, α_p^(i)) in B do
    StartPos[P] ← bmin·α_p^(i) (mod p)
  end for
  for b from bmin to bmax do
    Initialize LogSum to zero
    for all P = (p, α_p^(i)) in B do
      a ← StartPos[P]
      while a ≤ Sa do
        Add log p to LogSum[a]
        a ← a + p
      end while
    end for
    for a from 0 to Sa do
      if LogSum[a] ≥ LogVal then
        Check whether a + b·α is really smooth
        If so, Display((a, b), 'corresponds to a smooth value')
      end if
    end for
    for all P = (p, α_p^(i)) in B do
      StartPos[P] ← StartPos[P] + α_p^(i) (mod p)
    end for
  end for





unfortunately, we need to jump between blocks and do not gain. Luckily, the
small values of p require more operations than the large ones, which makes
this optimization very useful. We illustrate this idea with two levels of cache
in the C code given in Program 4.3.

4.2.2.2 Special-q lattice sieving
The other frequent way of splitting a sieve into smaller pieces is more technically
involved than line sieving. However, it presents the very useful property
of guaranteeing that all the pairs being considered are already divisible by
a common element. This common element, called the special-q, is usually
chosen among the elements of B associated to a large prime value. Each piece of
the sieve corresponds to a different special-q. One technicality is the fact that
the various pieces are not necessarily disjoint. As a consequence, when using
this algorithm for sieving, some solutions are duplicated. These duplicates
should be removed using a filtering program. Given a special-q written as
(q, α_q), the special-q lattice sieving algorithm focuses on pairs (a, b) divisible
by the special-q and thus satisfying the linear condition a + b·α_q ≡ 0 (mod q). As
explained in the case of the basic lattice sieving, this corresponds to pairs (a, b)
on some integer lattice, which can be described by a reduced basis (uq, vq).
In most cases, the largest coordinate of each vector in the reduced basis is
in absolute value not too far from √q. When this is not the case, the norm
of uq is much smaller than the norm of vq and the basis is unbalanced. With
such an unbalanced basis, special-q lattice sieving is much less convenient.
As a consequence, we omit special-q's with unbalanced bases and can assume
from now on that the reduced basis is balanced. More precisely, we fix some
constant Λ and keep a given special-q only when the coordinates of the two
basis vectors are bounded (in absolute value) by Λ√q. Using this choice
allows us to choose the sieving zone within the sublattice independently of the
corresponding special-q. We consider pairs (a, b) written as cu·uq + cv·vq with
(cu, cv) in a fixed sieving zone [0 · · · Su] × [−Sv · · · Sv]. A direct consequence of
this choice is that sieving such a sublattice can be done using the basic lattice
sieve Algorithm 4.3. The only condition is that all elements of B should be
expressed in terms of the new coordinates (cu, cv).
In order to express an element (p, α_p^(i)) of B in terms of the new coordinates,
let us recall that a + b·α belongs to the corresponding lattice whenever the
linear condition a + b·α_p^(i) ≡ 0 (mod p) is satisfied. Replacing a and b by their
expressions in cu and cv, we find the condition:

(cu·uq^a + cv·vq^a) + (cu·uq^b + cv·vq^b)·α_p^(i) ≡ 0 (mod p), or      (4.5)
cu·(uq^a + uq^b·α_p^(i)) + cv·(vq^a + vq^b·α_p^(i)) ≡ 0 (mod p).


This is a linear equation in cu and cv modulo p. As a consequence, we can
follow the same approach as in the case of the original equation in a and b





Program 4.3 Line sieving with two levels of cache
#define SIZE 510000 /* Max. Number of prime pairs */
#define ZONE 9000000 /* Size of each zone, mult of CACHE2 */
#define CACHE1 25000 /* Approximate size of L1 CACHE */
#define CACHE2 750000 /* Approximate size of L2 CACHE */
#define START_ZONE -500 /* Number of first sieve zone */
#define END_ZONE 500 /* Number of last sieve zone */
#define THRESHOLD 12 /* Num. of hits per smooth candidate */
#define START 30 /* Prime pairs below START not sieved */
int root[SIZE], mod[SIZE]; /* Arrays for prime pairs */
char Zone[ZONE]; /* Array for sieving */
int Size; /*True number of pairs */
int Size1, Size2; /*Limits to fit in CACHE1 and CACHE2 */

main()
{ int i,j,offset; int limit2,limit1;
Read();/*Get prime pairs, set Size1/2 to fit CACHE1/2*/
for(i=START;i<Size;i++) {
int r,q; /* Loop shifts primes pairs to first zone */
q=mod[i]; offset=MulMod(START_ZONE,ZONE,q);
r=root[i]-offset; if (r<0) r+=q; root[i]=r; }

for (j=START_ZONE;j<END_ZONE;j++){
for(i=0;i<ZONE;i++) Zone[i]=0; /*Reset sieve zone */
for(limit1=limit2=0;limit2<=ZONE;limit2+=CACHE2) {
for(;limit1<=limit2;limit1+=CACHE1) {
for(i=START;i<Size1;i++) { /* Sieve small*/
int r,m; r=root[i]; m=mod[i];
while (r<limit1) {Zone[r]++; r+=m;}
root[i]=r;}}
for(i=Size1;i<Size2;i++) { /* Sieve medium*/
int r,m; r=root[i]; m=mod[i];
while (r<limit2) { Zone[r]++; r+=m;}
root[i]=r;}}
for(i=START;i<Size2;i++) { /* Shift to next zone */
root[i]=root[i]-ZONE;}
for(i=Size2;i<Size;i++) { /* Sieve large */
int r,m; r=root[i]; m=mod[i];
while (r<ZONE) {Zone[r]++; r+=m;}
root[i]=r-ZONE;}
for(i=0;i<ZONE;i++){ /* Detect and print smooth candidates*/
if (Zone[i]>=THRESHOLD) {printf("F(%d*SZ+%d);\n",j,i);
fflush(stdout); }}}}





in order to write down a good representation of the lattice comprising all the
coordinates (cu, cv) of divisible elements.
Finally, we see that lattice sieving with special-q does not differ much from
basic lattice sieving. We need to reinitialize the representations of the elements of
B for each special-q and change the accepting bound on the number of divisors,
or on the sum of their logarithms, to account for the additional special-q divisor
and its influence on the value of the norms. Except for this minor change,
everything else is a verbatim copy of our previous approach. However, the
fact that we need to sieve a relatively small zone for each special-q greatly
enhances the sieve performance.


4.2.3 Sieving without sieving
Before closing this chapter on sieving, it is important to mention an alterna-
tive algorithm proposed by D. Bernstein in [Ber00] that allows the implemen-
tation of index calculus algorithms without using sieving at all. Instead, his
algorithm addresses the problem of finding smooth enough numbers within
a long list of numbers. A nice property of this algorithm is that, contrary
to sieve algorithms, it does not require additional structure² on the numbers
themselves. This interesting property could be especially useful for some spe-
cific applications of index calculus such as Coppersmith's factorization factory.
Note that this specific algorithm is almost dedicated to index calculus algo-
rithms that use numbers; when polynomials come into play, it is easier to use
a fast polynomial factorization algorithm. However, Bernstein notes in his
paper that a careful comparison is required in the case of polynomials.
The key ingredient in Bernstein's algorithm is to consider a large set of
numbers at the same time, to identify the set of primes belonging to the
smoothness basis that occur in the decomposition of the product of these
numbers and to use a divide and conquer approach to distribute these primes
to smaller and smaller sets until the decomposition of each number is known.
The main difficulty of the algorithm is the fact that it requires a two-way
divide and conquer approach, to find out which primes are occurring and in
which numbers. The algorithm proposed in [Ber00] nests two divide and
conquer approaches: the inner one is used to determine the set of primes
involved in a particular set of numbers and the outer one to focus on sets
of decreasing size. In order to make the inner loop efficient, the given set of
primes is preprocessed and represented as a product tree. The product tree is
a binary tree whose leaves are labelled by the given primes. Each inner node
of the tree is labelled by the product of the labels of its two children. As a
consequence, the root of the product tree contains the product of all primes
in the given set.


² With sieve algorithms, the additional structure comes from the fact that the numbers are
all obtained by evaluating a fixed polynomial at many regularly spaced points.





Given a set of primes P and a set of numbers N, the outer divide and
conquer step needs to determine the subset P′ of primes that occur in the
decomposition of the numbers in N. First, it computes the product ΠN of all
numbers in N and a product tree TP for P. Then it reduces ΠN modulo the
label of the root of the product tree TP. After that, it pushes this reduced
value down the product tree, computing ΠN modulo each label by reducing
the modular value of the father node, in order to compute the reduced value
at each node. When the leaves are reached, it is easy to see that the prime
label of a leaf occurs in N if and only if the modular value of ΠN at this leaf
is zero. Thus, P′ is simply the subset of P corresponding to the leaves where
the modular value of ΠN vanishes.
In order to refine the decomposition, we simply need to split N into two
(or more) subsets and recursively apply the same algorithm on each subset
together with the prime set P′. At the very end of the algorithm, we know
the list of primes for each number and can easily check for smoothness.
In order to improve the underlying number arithmetic, Bernstein proposes
a slightly different method of computing the modular values at each leaf of
TP. His approach assumes that 2 does not belong to the initial list of primes,
which is not a real constraint since trial dividing by 2 is so easy. Then, when
computing the reduced value of a node from the value of the father, instead
of doing an exact computation, he computes the modular value up to some
power of 2. This improves the arithmetic and of course does not change the
final criterion for odd primes: a prime p occurs in the decomposition of N if
and only if ΠN reduced modulo p is zero. Since 2 is invertible modulo the
odd prime p, multiplying the value ΠN by a power of 2 preserves the final
zero. This modification changes non-zero values, but this is irrelevant for our
purpose.
A fair conclusion to this section is to remark that determining the impact of
Bernstein's algorithm on index calculus methods is currently an open problem.
At the time of writing, no large publicly announced index calculus computation
has ever made use of this algorithm to improve running times. However,
studying its practical impact is an interesting research problem.







Exercises
1. Program a basic implementation of Eratosthenes's sieve using a wheel
of perimeter 30.
2. Modify the implementation of Exercise 1 into a segmented sieve, similar
to Program 4.2.
3h. Using your answer to Exercise 4.2 as a template, write a program that
generates implementations of Eratosthenes's sieve with larger wheels of
fixed perimeter. Adding the primes 7, 11 and 13, write code for wheels
of perimeter 210, 2310 and 30030. Time these programs and compare
their respective performances.
4. Write a version of Eratosthenes's sieve to deal with wheels of varying
perimeter.
5. Combine wheels of varying perimeter with a segmented sieve.
6h. Create a variant of Eratosthenes's sieve for finding irreducible polyno-
mials over F2. Make sure that you are using Gray codes and write a
segmented version.

7. An amusing variation of the sieving process aims at constructing "lucky"
numbers. It starts with the list of odd integers, considers the second
number '3' and crosses out every third number, i.e., 5, 11, . . . In this new
list, the third number is '7', thus we cross out every seventh number in the
list, starting with '19'. Write a basic sieve program for computing lucky
numbers. Which optimizations of Eratosthenes's sieve can you adapt to
this case?
8h. Choose a prime p and a polynomial P of degree d. Compare the following
approaches to determine all the polynomials of the form P + a, for a in
Fp, that have d different roots.

(a) Sieving on a line.
(b) Individual factorization of each P + a.
(c) An adapted variant of Bernstein's algorithm.




Chapter 5
Brute force cryptanalysis




5.1 Introductory example: Dictionary attacks
Brute force is one of the less subtle¹ ways of attacking cryptosystems; it
simply consists of trying all possible values for the secret to recover, until the
correct one is found. Yet, this approach is more tricky than it may seem.
To illustrate this point, we start this chapter with a special case of brute force
attacks: dictionary attacks. Typically, a dictionary attack is used to find out
the password that gives access to a user's account on an information system.
Since passwords are used for a large number of purposes, dictionary attacks
can be used in a large variety of contexts.
At the first level of dictionary attacks, we find simple password guessing, as
often encountered in books and movies. The attacker simply sits at his target's
desk, thinks deeply for a little while and, after a couple of unsuccessful trials,
comes up with the correct password.
A slightly more sophisticated version is to automate the process and have
a tool for submitting many passwords until the correct one is found. For
example, this can easily be done when trying to connect to a remote server.
This kind of dictionary attack is called an online dictionary attack. For each
new tested password, the attacker needs to try to open a new session. One
problem (or advantage, depending on the point of view) of online dictionary
attacks is that they can be easily detected. For example, smart cards are often
protected using a very short 4-digit password, also called a PIN code². Thus,
any attacker could easily discover the correct PIN code, simply by trying out
the 10,000 different possibilities. Assuming a slow rate of one trial per second,
this could be done in a couple of hours. To avoid this attack, smart cards are
programmed to detect sequences of consecutive failures. After three failures,
they usually stop accepting PIN codes and put themselves in a locked state.
To unlock the card, the user needs a different, much longer password.
Offline attacks are a much stronger form of dictionary attacks. In these
attacks, the attacker first recovers some side information and then, using this

¹ Of course, pointing a gun at the legitimate user and asking for the secret is even less subtle.
² Where PIN stands for Personal Identification Number.




information is able to test on his own whether a given password is correct or
not. With o¬„ine attacks, the attacker is only limited by the raw computing
he has access to. The side information used by the attacker can be of several
di¬erent types: it can be the transcript of a protocol execution where the
password was used, it can be an entry from a password ¬le, or even some
side-channel information obtained from the system during password testing.
For our illustration, we focus on the simple case of password file entries.
Password files are used by operating systems or applications that need to check
user passwords, in order to store side information that can be used to verify the
correctness of passwords. In their simplest form, password files simply store
plaintext copies of passwords. Of course, this is a bad idea, since in that case
passwords are directly accessible to an attacker who obtains the password
file, regardless of the passwords' length. An equally bad idea is to store
passwords encrypted with a secret key algorithm: indeed, regardless of the
strength of the algorithm, the secret key needs to be stored somewhere to allow
password checking, and if the attacker finds this secret key, he can decrypt all
passwords in the file. To avoid these straightforward attacks, the password
file can contain either passwords encrypted with a public key algorithm or
hashed passwords. It also needs to contain auxiliary information that allows
passwords to be tested against the password file. For instance, when using
randomized public key encryption, the random value used for encryption needs
to be stored. This allows at a later point to re-encrypt a given password and
to check equality between this new encryption and the stored encryption.
The most common approach is to use hashed passwords; indeed, it is much
faster than using public key encryption. However, there is more to it than
simply running passwords through a standard hash function. Indeed, in that
case, attackers could prepare in advance a dictionary of hashed passwords and
store them in sorted form (see Chapter 6). This would allow using a simple
search to discover hashed passwords from the dictionary extremely quickly.
To avoid this, the hash function needs to be randomized. A simple approach
is to store a hash value of the string formed by concatenating a reasonably
short random string together with a user's password. To allow verification,
the random string is stored together with the corresponding hash value in the
password file. When the random value is quite short and used as a side input
that modifies the computation of hash values, it is often called a salt. For
example, for decades, Unix-based operating systems used a salted variation
of the DES encryption algorithm to hash passwords. In order to avoid the
possibility of decryption, the password was used as key and served to encrypt
a fixed message. Even with a relatively short salt, storing hashed dictionaries
is highly impractical. Indeed, since enough dictionaries are needed to cover
a noticeable fraction of all possible salts, the storage requirements quickly
become impractical.
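As a concrete sketch of the salted scheme just described, the following stores and checks a salt/hash pair. The hash here is a toy stand-in (64-bit FNV-1a), chosen only to keep the example self-contained; a real system would use a dedicated password hash, and all names are illustrative:

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Toy stand-in for a cryptographic hash: 64-bit FNV-1a.
   A real system would use a dedicated password hash instead. */
static uint64_t toy_hash(const char *data, size_t len) {
    uint64_t h = 1469598103934665603ULL;   /* FNV offset basis */
    for (size_t i = 0; i < len; i++) {
        h ^= (unsigned char)data[i];
        h *= 1099511628211ULL;             /* FNV prime */
    }
    return h;
}

/* Hash the concatenation salt || password, as in the salted scheme. */
static uint64_t salted_hash(const char *salt, const char *pwd) {
    char buf[256];
    snprintf(buf, sizeof(buf), "%s%s", salt, pwd);
    return toy_hash(buf, strlen(buf));
}

/* The password file stores (salt, hash); verification recomputes. */
static int check_password(const char *salt, uint64_t stored,
                          const char *attempt) {
    return salted_hash(salt, attempt) == stored;
}
```

Note how two users with the same password but different salts get different file entries, which is exactly what defeats precomputed hashed dictionaries.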
More recently, passwords have been used, in conjunction with public key
mechanisms, in order to protect cryptographic protocols between users who
only share a short password (see [GL03]). In this context, the goal of the





designers is to make sure that the only efficient attacks need one online
attempt to test one possible password and that, under some security hypoth-
esis, no offline attack exists.
In addition to brute force, dictionary attacks may also rely on time-memory
tradeoffs. A typical example is Hellman's time-memory tradeoff, described in
Chapter 7. Where brute force attacks are concerned, the key ingredient is to
combine an efficient dictionary of potential passwords with a very fast imple-
mentation of the underlying cryptographic algorithm. In the next section, we
illustrate this issue of fast implementations using the famous DES algorithm.




5.2 Brute force and the DES algorithm
The DES, Data Encryption Standard [DES77], was the first modern block
cipher available to a wide audience. It was standardized by the NBS, National
Bureau of Standards, in 1977. This block cipher encrypts 64-bit blocks using
56-bit keys. At the time of DES standardization, this keysize was judged to
be too small and potentially vulnerable to brute force attacks. However, no
successful brute force attack against DES was publicly reported until 1997. At
that time, RSA Inc. launched three successive challenges, asking challengers
to decrypt ciphertexts encrypted with the DES. The final challenge in 1999
was broken in less than a day, using a mix of hardware and software.
When software brute force attacks against DES are considered, it is es-
sential to use DES implementations that are very efficient. However, imple-
mentations that have been written with fast message encryption in mind
are far from optimal when used for cryptanalytic purposes. The main reason
is that to optimize the encryption speed, it is often a good idea to assume that
keys change once in a while and that message blocks change rapidly. For brute
force cryptanalysis, it is exactly the contrary: the known plaintext/ciphertext
pairs (or the available ciphertext for ciphertext only attacks) are fixed, and
the tested key changes extremely quickly. As a consequence, optimizing encryp-
tion algorithms with brute force in mind is a very specific task. In the rest of
this section, we describe in detail how this can be done in the case of DES.
For the sake of completeness, we start by recalling the description of the DES
algorithm.


5.2.1 The DES algorithm
DES is based on the Feistel construction and uses 16 consecutive Feistel
rounds. Before applying these rounds, we find an unkeyed initial permutation
which moves bits around. After the Feistel rounds, we find an unkeyed final
permutation which is the inverse of the initial permutation. In the standard





58 50 42 34 26 18 10 2
60 52 44 36 28 20 12 4
62 54 46 38 30 22 14 6
64 56 48 40 32 24 16 8
57 49 41 33 25 17 9 1
59 51 43 35 27 19 11 3
61 53 45 37 29 21 13 5
63 55 47 39 31 23 15 7


Table 5.1: DES initial permutation

40 8 48 16 56 24 64 32
39 7 47 15 55 23 63 31
38 6 46 14 54 22 62 30
37 5 45 13 53 21 61 29
36 4 44 12 52 20 60 28
35 3 43 11 51 19 59 27
34 2 42 10 50 18 58 26
33 1 41 9 49 17 57 25


Table 5.2: DES final permutation


DES description, individual bits within blocks are numbered from left to right,
from 1 to 64 for full blocks, from 1 to 32 for half blocks or from 1 to t for
t-bit subblocks. Using this convention, the initial and final permutations of
DES are given in Tables 5.1 and 5.2. The first table says that the first bit of
the permuted input is bit 58 of the original input, that the second bit is bit
50, the ninth bit is bit 60 and so on. It is worth noting that if the inputs are
viewed as a concatenation of eight bytes, then the fifth permuted byte is the
concatenation of the first bits of all original bytes, the first permuted byte the
concatenation of second bits and similarly for other bytes. Thus, implementing
the input permutation in hardware costs virtually nothing when data is entered
on an 8-bit bus. It suffices to consider that each bit of the bus is a serial input
and to put together each series of incoming bits in a corresponding byte.
After the initial permutation, the 64 bits of plaintext are split into two
32-bit halves L0 and R0. The left half L0 is formed of bits 1 to 32 of the
permuted input. The right half R0 is formed of bits 33 to 64 of the permuted
input. For each Feistel round, numbered from 1 to 16, DES generates a
specific subkey Ki for round i. At round i, the algorithm computes:

    Li = Ri−1   and   Ri = Li−1 ⊕ f(Ri−1, Ki),                  (5.1)

where f is the round function keyed by Ki. After round 16, the concatenation
of R16 and L16 is passed through the final permutation. Note that on output,





16 7 20 21 29 12 28 17
1 15 23 26 5 18 31 10
2 8 24 14 32 27 3 9
19 13 30 6 22 11 4 25


Table 5.3: Permutation of the round function



the left and right halves are reversed. This is an important feature; thanks to
it, DES (with round keys in inverse order) is its own inverse.
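This self-inverse property is easy to check on a sketch of a generic 16-round Feistel network; the round function `f` below is an arbitrary keyed mixing stand-in, not the real DES round function:

```c
#include <stdint.h>

#define ROUNDS 16

/* Arbitrary stand-in for the keyed round function f(R, K);
   any fixed mixing works, since Feistel inversion never inverts f. */
static uint32_t f(uint32_t r, uint32_t k) {
    uint32_t x = r ^ k;
    x *= 0x9E3779B9u;
    return (x << 7) | (x >> 25);
}

/* One full pass: Li = R(i-1), Ri = L(i-1) ^ f(R(i-1), Ki),
   followed by the final swap of halves, as in DES. */
static void feistel(uint32_t *l, uint32_t *r, const uint32_t keys[ROUNDS]) {
    for (int i = 0; i < ROUNDS; i++) {
        uint32_t tmp = *r;
        *r = *l ^ f(*r, keys[i]);
        *l = tmp;
    }
    /* output is (R16, L16): swap the halves */
    uint32_t tmp = *l; *l = *r; *r = tmp;
}
```

Running `feistel` a second time on its own output, with the key array reversed, returns the original block: each round's XOR with f cancels itself, which is the property mentioned above.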
To make our description of DES complete, we now need to describe the
round function f and the key expansion process which is used to derive the
round keys from the DES 56-bit key. The round function can be written as:

    f(x, k) = P ∘ S(E(x) ⊕ k),                                  (5.2)

where E is a linear expansion from 32 to 48 bits, S a non-linear transform
consisting of 8 S-boxes from 6 bits to 4 bits each, and P a bit permutation
on 32 bits. The linear expansion E expands its 32 bits of input by duplicating
some of them. Using the same bit numbering convention as before, the
expansion E is described in Table 5.4. After expansion, the block is XORed
with the round subkey, then split into 8 slices of 6 bits each. The first slice,
consisting of bits 1 to 6, is used as input for the first S-box, usually called S1.
Similarly, the other slices are used as input to the other S-boxes, S2 to S8.
The outputs of these S-boxes are then concatenated back into a 32-bit word,
starting with S1 and ending with S8. The S-boxes S1 to S8 are described in
Tables 5.5 to 5.12. The reader unaccustomed to the description of DES should
take great care with these tables. Indeed, they are presented in a way that
emphasizes the presence of four permutations on the set of integers from 0
to 15, embedded within each S-box. However, they are not used in a
straightforward manner. Each S-box takes 6 bits of the expanded input XORed
with the round subkey. Due to the structure of the expansion, the middle two
bits in each set of six are only used once, while the other four are also used
in neighbouring sets. The S-boxes are used in a way that reflects this important
fact. When computing Si on a set of six bits b1 b2 b3 b4 b5 b6, the number
obtained from b1 b6 (i.e., 2b1 + b6) is used to determine the row number from
0 to 3 and the number obtained from b2 b3 b4 b5 is used to determine the
column number from 0 to 15. The output value of the S-box is then interpreted
as a bitstring c1 c2 c3 c4 corresponding to the number 8c1 + 4c2 + 2c3 + c4.
After concatenation, the output bits of S1 are numbered from 1 to 4, while the
output bits of S8 are numbered from 29 to 32. Reading the S-boxes in this way
makes sure that bits duplicated during the expansion are used once as row
indices and once as column indices, while non-duplicated bits are only used
as column indices.
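The row/column indexing just described can be sketched as follows; the table is S1 from Table 5.5, stored row by row:

```c
#include <stdint.h>

/* S1 from Table 5.5, stored as four rows of sixteen entries. */
static const uint8_t S1[4][16] = {
    {14,  4, 13,  1,  2, 15, 11,  8,  3, 10,  6, 12,  5,  9,  0,  7},
    { 0, 15,  7,  4, 14,  2, 13,  1, 10,  6, 12, 11,  9,  5,  3,  8},
    { 4,  1, 14,  8, 13,  6,  2, 11, 15, 12,  9,  7,  3, 10,  5,  0},
    {15, 12,  8,  2,  4,  9,  1,  7,  5, 11,  3, 14, 10,  0,  6, 13}
};

/* Apply S1 to a 6-bit input b1 b2 b3 b4 b5 b6 (b1 most significant).
   The outer bits b1 b6 select the row, the middle four the column. */
static uint8_t sbox1(uint8_t x) {
    uint8_t row = ((x >> 4) & 2) | (x & 1);   /* 2*b1 + b6 */
    uint8_t col = (x >> 1) & 0xF;             /* b2 b3 b4 b5 */
    return S1[row][col];
}
```

For instance, the all-zero input selects row 0, column 0, and yields 14.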





32 1 2 3 4 5 4 5 6 7 8 9 8 9 10 11
12 13 12 13 14 15 16 17 16 17 18 19 20 21 20 21
22 23 24 25 24 25 26 27 28 29 28 29 30 31 32 1


Table 5.4: Expansion of the round function


14 4 13 1 2 15 11 8 3 10 6 12 5 9 0 7
0 15 7 4 14 2 13 1 10 6 12 11 9 5 3 8
4 1 14 8 13 6 2 11 15 12 9 7 3 10 5 0
15 12 8 2 4 9 1 7 5 11 3 14 10 0 6 13


Table 5.5: S-box S1

15 1 8 14 6 11 3 4 9 7 2 13 12 0 5 10
3 13 4 7 15 2 8 14 12 0 1 10 6 9 11 5
0 14 7 11 10 4 13 1 5 8 12 6 9 3 2 15
13 8 10 1 3 15 4 2 11 6 7 12 0 5 14 9


Table 5.6: S-box S2

10 0 9 14 6 3 15 5 1 13 12 7 11 4 2 8
13 7 0 9 3 4 6 10 2 8 5 14 12 11 15 1
13 6 4 9 8 15 3 0 11 1 2 12 5 10 14 7
1 10 13 0 6 9 8 7 4 15 14 3 11 5 2 12


Table 5.7: S-box S3

7 13 14 3 0 6 9 10 1 2 8 5 11 12 4 15
13 8 11 5 6 15 0 3 4 7 2 12 1 10 14 9
10 6 9 0 12 11 7 13 15 1 3 14 5 2 8 4
3 15 0 6 10 1 13 8 9 4 5 11 12 7 2 14


Table 5.8: S-box S4

2 12 4 1 7 10 11 6 8 5 3 15 13 0 14 9
14 11 2 12 4 7 13 1 5 0 15 10 3 9 8 6
4 2 1 11 10 13 7 8 15 9 12 5 6 3 0 14
11 8 12 7 1 14 2 13 6 15 0 9 10 4 5 3


Table 5.9: S-box S5





12 1 10 15 9 2 6 8 0 13 3 4 14 7 5 11
10 15 4 2 7 12 9 5 6 1 13 14 0 11 3 8
9 14 15 5 2 8 12 3 7 0 4 10 1 13 11 6
4 3 2 12 9 5 15 10 11 14 1 7 6 0 8 13


Table 5.10: S-box S6

4 11 2 14 15 0 8 13 3 12 9 7 5 10 6 1
13 0 11 7 4 9 1 10 14 3 5 12 2 15 8 6
1 4 11 13 12 3 7 14 10 15 6 8 0 5 9 2
6 11 13 8 1 4 10 7 9 5 0 15 14 2 3 12


Table 5.11: S-box S7


5.2.1.1 Round key expansion
In the DES algorithm, the 56-bit key is extracted from a 64-bit word with
bits numbered from 1 to 64, by ignoring the high order bit of each byte, i.e.,
ignoring bits 8, 16, . . . , 64. The remaining bits are split in two halves after
a permutation called PC-1, described in Table 5.13. The bits obtained from
the first part of the table are stored in a register C0 and the bits obtained
from the second part of the table in a register D0. At round i, registers Ci−1
and Di−1 are transformed into two new values Ci and Di using left rotations.
The 48-bit round key Ki is obtained by concatenating 24 bits from Ci and 24
bits from Di; the selected bits are obtained using PC-2 from Table 5.14, under
the convention that bits from Ci are numbered from 1 to 28 and bits from Di
from 29 to 56. To obtain Ci and Di from Ci−1 and Di−1, the left rotations
rotate two bits to the left, except when i is 1, 2, 9 or 16, where they rotate
by a single bit. As a consequence, the accumulated amount of rotation over
the 16 rounds is 28 bits, thus C16 = C0 and D16 = D0.
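The rotation schedule can be sketched as follows; `rol28` and the driver loop are illustrative helpers:

```c
#include <stdint.h>

/* Left-rotation amounts for rounds 1..16; single-bit rotations occur
   at rounds 1, 2, 9 and 16 (0-based indices 0, 1, 8, 15),
   two-bit rotations everywhere else. */
static const int shifts[16] = {1, 1, 2, 2, 2, 2, 2, 2,
                               1, 2, 2, 2, 2, 2, 2, 1};

/* Rotate a 28-bit register left by n bits (values in the low 28 bits). */
static uint32_t rol28(uint32_t x, int n) {
    return ((x << n) | (x >> (28 - n))) & 0x0FFFFFFFu;
}

/* Apply all 16 rotations; since they total 28 bits, the register
   returns to its initial value: C16 = C0 (and likewise D16 = D0). */
static uint32_t run_schedule(uint32_t c0) {
    uint32_t c = c0;
    for (int i = 0; i < 16; i++)
        c = rol28(c, shifts[i]);
    return c;
}
```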


5.2.2 Brute force on DES
First, we can note that during known plaintext brute force attacks, the ini-
tial and final permutations of DES can safely be ignored, since the permuta-
tions can be applied beforehand to plaintext/ciphertext pairs. For ciphertext


13 2 8 4 6 15 11 1 10 9 3 14 5 0 12 7
1 15 13 8 10 3 7 4 12 5 6 11 0 14 9 2
7 11 4 1 9 12 14 2 0 6 10 13 15 3 5 8
2 1 14 7 4 10 8 13 15 12 9 0 3 5 6 11


Table 5.12: S-box S8





57 49 41 33 25 17 9
1 58 50 42 34 26 18
10 2 59 51 43 35 27
19 11 3 60 52 44 36
63 55 47 39 31 23 15
7 62 54 46 38 30 22
14 6 61 53 45 37 29
21 13 5 28 20 12 4


Table 5.13: Permutation PC-1 of the key bits

14 17 11 24 1 5
3 28 15 6 21 10
23 19 12 4 26 8
16 7 27 20 13 2
41 52 31 37 47 55
30 40 51 45 33 48
44 49 39 56 34 53
46 42 50 36 29 32


Table 5.14: Table PC-2 to extract Ki from Ci and Di


only attacks, the input permutation often allows us to express our knowledge
about the plaintext distribution in a more compact form. For example, if
we know that the plaintext consists of printable ASCII characters, we know
that all high order bits are zeros. After the initial permutation, this implies
that bits 25 to 32 form a block of eight consecutive zeros. As a consequence,
an implementation of DES dedicated to brute force attacks does not need to
implement the initial and final permutations. Similarly, several other minor
improvements are possible. For testing several keys that differ only on a few
bits, it is not necessary to recompute the first rounds of decryption completely.
It is also not necessary to decrypt the whole message block for all keys. In-
deed, if we compute a few bits of the decrypted message and see that they
are wrong, the rest of the bits are useless.
The most efficient idea to attack DES by brute force is probably the bitslic-
ing technique proposed by Biham in [Bih97]. With this technique, we view the
available computer as a parallel computer that operates in parallel on many
single bit registers. At first, this approach may seem surprising; however, it
allows us to replace many slow operations such as bit permutations and S-
box accesses by much faster operations. Depending on the specific computer
used for the computation, we may obtain 32, 64 or even 128 bit operations in
parallel. This level of parallelism is used to try several keys simultaneously.
As a consequence, to get a fast bitslice implementation of DES, it essentially
suffices to express DES efficiently using bit operations. We now study in more





detail how the various basic operations of DES may be implemented using
bit operations:
• Bit permutations: Permuting bits on a single-bit computer essentially
comes for free. Instead of moving values around, it suffices to wait until
the time where the permuted values are used and to directly access the
corresponding non-permuted register.
• Expansion: Since the DES expansion is a simple copy of bits, instead
of doing the copy, we proceed as explained above for bit permutations
and directly read the required bit when needed.
• Xor: Since XOR is a bit by bit operation, it can clearly be done
without any change at the bit level.
• S-boxes: In fact, S-box accesses are the only operations which are not
straightforward on a single bit computer. Each S-box is a function from
6 bits to 4 bits. Alternatively, it can be viewed as four functions from 6
bits to 1 bit. Moreover, it is well known that any bit valued function can
be written using only XOR and logical AND operations. Thus, S-boxes
can be computed on single bit computers without any real difficulty. To
optimize bitslice implementations of DES, it is important to express
each S-box using as few bit operations as possible. Currently, the best
available expressions are those described by Matthew Kwan in [Kwa00].
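As a minimal illustration of the principle (a generic multiplexer gate, not one of the actual DES S-box circuits from [Kwa00]), here is a bitsliced evaluation of IF(x, y, z) = (x ∧ y) ∨ (x̄ ∧ z) using only XOR and AND, on 64 candidate keys at once, one per bit position:

```c
#include <stdint.h>

/* Bitsliced IF: bit j of the result is IF(x_j, y_j, z_j) for lane j.
   The identity IF(x,y,z) = z ^ (x & (y ^ z)) uses only XOR and AND,
   so 64 parallel evaluations cost just 3 word operations. */
static uint64_t bs_if(uint64_t x, uint64_t y, uint64_t z) {
    return z ^ (x & (y ^ z));
}
```

With 64-bit words, one such three-operation sequence performs 64 evaluations, which is the source of the bitslicing speed-up.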
In addition to this bitslicing technique, one should also remark that at
the very end of the computation, the internal structure of the DES allows us
to stop almost two rounds early. Out of a total of 16 rounds, this permits us
to save another 10% of the running time. First, the final round can be avoided
in most cases: indeed, once the penultimate round has been performed, half
of the ciphertext values for the set of keys being tested are already known.
Most of the time, these values do not match the expected ciphertext and the
computation can be aborted. Similarly, thanks to the S-box structure, this
penultimate round can be computed by small 4-bit chunks. As soon as one
chunk does not match the expected value, the corresponding key can safely
be discarded. This approach is called an early abort strategy for the brute
force computation.




5.3 Brute force as a security mechanism
As with most cryptanalysis tools, brute force can also be used to add security
features to computerized systems. The basic idea is to remark that in many
computer-assisted tasks, the amount of computation required from a legiti-
mate user is often extremely small or even negligible. For example, testing





the correctness of a password by hashing, as in Section 5.1, takes an extremely
small amount of time, especially when this amount of time is compared to the
inherent delay of the human-computer interface involved. Indeed, no user re-
ally cares whether the password verification can be done within a millisecond
or a tenth of a second. For practical purposes, this is as good as immediate.
Of course, longer delays of the order of a second might be perceived as un-
bearable. Similarly, when sending out emails, a small delay would be perfectly
acceptable. This simple practical remark is the key to the material in this sec-
tion. During the short delay tolerated by the user, the computer could perform
some real work instead of simply waiting. For password testing, for example,
one can easily conceive a slow hash function that would take a full tenth of a
second to test a password. No user would see the difference. However, from
an attacker's point of view, any slowdown in the password testing routine is
going to slow any dictionary attack attempt by the same factor. Testing
a password in a tenth of a second instead of a millisecond is not an issue for
the legitimate user, but it costs the attacker a factor of a hundred, either in
terms of running time or of computing power.
However, implementing this simple idea is tricky. It is not enough to make
the password testing routine slow. For example, in the early days of computer
science, wait loops were added to password testing routines. However,
since an attacker only needs to remove the wait loops to make his code for
dictionary attacks faster, this approach is essentially useless. Similarly, using
unoptimized code for a password testing routine is not a solution. Any moti-
vated attacker can sit down and optimize the code on his own. Instead, one
should make sure to use deeply optimized code implementing an inherently
slow function. Such functions have been proposed in the literature under
the name of moderately hard functions. They were initially introduced
in [DN93]; an interesting variation making use of memory accesses to slow the
computation even on fast computers was proposed in [ABW03].
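A sketch of the idea behind such inherently slow functions: iterate a fast step many times, so that the legitimate check and every attacker guess pay the same factor. The mixing step and the iteration count below are toy placeholders, not one of the published constructions:

```c
#include <stdint.h>

/* Toy one-way mixing step (an invertible 64-bit mixer); a real
   construction would iterate a cryptographic hash instead. */
static uint64_t step(uint64_t h) {
    h ^= h >> 33; h *= 0xFF51AFD7ED558CCDULL;
    h ^= h >> 33; h *= 0xC4CEB9FE1A85EC53ULL;
    return h ^ (h >> 33);
}

/* Slow hash: absorb the password, then iterate n times.  Every
   dictionary guess is slowed by the same factor n as the legitimate
   check, which is the whole point. */
static uint64_t slow_hash(const char *pwd, uint64_t salt, unsigned n) {
    uint64_t h = salt;
    for (const char *p = pwd; *p; p++)
        h = step(h ^ (unsigned char)*p);
    for (unsigned i = 0; i < n; i++)
        h = step(h);
    return h;
}
```

In practice the iteration count is tuned so that one evaluation takes, say, a tenth of a second on current hardware.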




5.4 Brute force steps in advanced cryptanalysis
Brute force is not necessarily a stand-alone cryptanalytic technique; it can
also occur as the final step of a larger cryptanalytic effort. To illustrate this,
we propose to look at the differential cryptanalysis of the hash function SHA-0
and its practical implementation. The hash function SHA-0 was proposed in
1993 by NIST, under the name SHA. It was modified in 1995 and the new
version is called SHA-1. We use the name SHA-0 to emphasize that we are
considering the early version of 1993. In this section, we describe SHA-0 and
show how to devise a differential attack to find collisions in this hash function.
Finally, we show that the implementation of this differential attack amounts





to sampling, using brute force, a large enough set of message pairs.


5.4.1 Description of the SHA hash function family
Since SHA-0 and SHA-1 are very similar, they can easily share a common
description. For this common description, we refer to both algorithms using
the generic name of SHA. The general architecture of SHA is very similar to
that of other existing hash functions. It consists of two parts, a low-level
compression function which underlies the hash function and a mode of
operation which turns it into a full-fledged hash function. The low-level
compression function takes as input two values, one of 160 bits and one of
512 bits, and outputs a 160-bit value. It is built from an ad hoc block cipher
with 160-bit blocks and 512-bit keys. The block cipher is transformed into a
function, rather than a permutation, using a variation of the Davies-Meyer
transform. The mode of operation used in SHA is the Merkle-Damgård
construction with final strengthening. The idea of the construction is to
process the message to be hashed iteratively by individual 512-bit blocks.
Starting from a fixed initial value IV on 160 bits and the first block of
message, the compression function produces a new 160-bit value. This value
is compressed together with the second block of message and so on.
To make the description more precise, let us see in detail how a message
M is processed. The first step is to make sure that M is encoded into a
multiple of the block size in a non-ambiguous fashion. This requirement is
very important. Indeed, if two different messages M and M' can be encoded
into the same sequence of blocks, then they necessarily have the same hash
value and the hash function cannot be collision resistant. With SHA, a simple
encoding is used. The bitstring representation of M is padded with a single
bit set to 1, a sequence of bits set to 0 and finally a 64-bit representation
of the bitlength of the unpadded message M. The length of the sequence of
zeros is the smallest possible number such that the length of the padded
message is a multiple of 512. Thus, the number of zeros is contained in the
interval between 0 and 511. It is worth noting that with this padding, the
length of the unpadded message is encoded twice, once explicitly by the 64-bit
number at the end of the padding and once implicitly, since the original
message can be recovered by removing from the padded message the 64 final
bits, all trailing zeros and a single one. This double encoding of the length
was introduced as a strengthening of the Merkle-Damgård construction. It is
usually called MD-strengthening.
After padding, the message can be divided into 512-bit blocks, numbered
M1, M2, . . . , Mℓ, where ℓ is the number of blocks. Denoting by F the com-
pression function, the hash value of M is computed as follows:

    h0 = IV,                                                    (5.3)
    hi = F(hi−1, Mi)   for all i from 1 to ℓ.
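The chaining of Equation (5.3) can be sketched as follows; the compression function here is a placeholder with SHA's shape (160-bit chaining value, 512-bit block), not the real F, while the IV constants are SHA's published initial values:

```c
#include <stdint.h>
#include <stddef.h>

/* Placeholder compression function: 160-bit chaining value (five
   32-bit words) and 512-bit block (sixteen 32-bit words) in, 160-bit
   value out.  Any mixing serves to illustrate the chaining. */
static void F(uint32_t h[5], const uint32_t m[16]) {
    for (int i = 0; i < 16; i++) {
        uint32_t t = h[i % 5] ^ m[i];
        t = (t << 5) | (t >> 27);
        h[i % 5] = t + 0x9E3779B9u;
    }
}

/* Merkle-Damgard chaining: h0 = IV, hi = F(h(i-1), Mi); the last
   chaining value is the hash of the (already padded) message. */
static void md_hash(uint32_t h[5], const uint32_t *blocks, size_t nblocks) {
    static const uint32_t IV[5] = {0x67452301u, 0xEFCDAB89u, 0x98BADCFEu,
                                   0x10325476u, 0xC3D2E1F0u};
    for (int i = 0; i < 5; i++) h[i] = IV[i];
    for (size_t b = 0; b < nblocks; b++)
        F(h, blocks + 16 * b);
}
```

Note how a change in any block propagates through every later chaining value, which is what the security arguments below rely on.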





The last value hℓ is the hash value of M. The first value h0 is the fixed initial
value. For convenience, the other values hi are called intermediate hash values.
The compression function F of SHA is described in Section 5.4.1.2.

5.4.1.1 Security properties of the Merkle-Damgård construction
As discussed in Chapter 1, when considering hash functions, we are inter-
ested in three security properties: collision resistance, preimage resistance
and second preimage resistance.
When considering the Merkle-Damgård construction, it is possible to show
that the security of the hash function H can be derived from the security
of the compression function F. For example, given a collision for H, it is
easy to produce a collision for the compression function F. First, if M and
M' have different lengths, their final blocks Mℓ and M'ℓ' are different. If
H(M) = H(M'), we necessarily have:

    F(hℓ−1, Mℓ) = F(h'ℓ'−1, M'ℓ'),                              (5.4)

which is a collision for F. Otherwise, M and M' are different messages of the
same length and we look at the largest block index i in the interval [1, ℓ] such
that either Mi ≠ M'i or hi−1 ≠ h'i−1. Of course, hi = h'i and

    F(hi−1, Mi) = F(h'i−1, M'i)                                 (5.5)

is a collision for F. It is important to remark that in general we obtain
collisions for F with different values on the h component. As a consequence,
proving that F(h, ·) is collision resistant for all fixed values of h is not
sufficient to establish the collision resistance of H. In relation to this
comment, note that some specific differential attacks, called multiblock
attacks, require several consecutive calls to F before producing a collision
for H. From such a multiblock attack, one can derive a collision on F but
usually not on F(h, ·).
Using a very similar analysis, we can prove that if F is preimage resistant
then H is preimage resistant. Indeed, if we can find a preimage M of a target
value h, we see that F(hℓ−1, Mℓ) = h and obtain a preimage of h for the
compression function F.
However, where second preimage resistance is concerned, we obtain a slightly
weaker result. Any second preimage attack on H can be used to obtain a col-
lision attack on F, but not necessarily a second preimage attack on F.

5.4.1.2 Compression function of SHA
At the core of the compression function of SHA, we find a specific block
cipher that encrypts 160-bit blocks using a 512-bit key. In order to be ef-
ficient in software, this specific block cipher is mostly based on 32-bit un-
signed arithmetic. As a consequence, each 160-bit block is decomposed into
five 32-bit values. The input to the block cipher is thus denoted by the vector
A(0), B(0), C(0), D(0), E(0). The block cipher itself is a generalized Feistel





Round i   Name   Function f(i)(x, y, z)               Constant K(i)
 0-19     IF     (x ∧ y) ∨ (x̄ ∧ z)                   0x5A827999
20-39     XOR    x ⊕ y ⊕ z                            0x6ED9EBA1
40-59     MAJ    (x ∧ y) ∨ (x ∧ z) ∨ (y ∧ z)          0x8F1BBCDC
60-79     XOR    x ⊕ y ⊕ z                            0xCA62C1D6


Table 5.15: Definition of the round functions and constants



scheme with 80 rounds. At round i, we denote the inner state of the en-
cryption algorithm by A(i), B(i), C(i), D(i), E(i). To update the inner state
between round i and round i+1, a new value is computed for A(i+1) by mixing
together the previous values of A, B, C, D, E and some key material. At the
same time, B(i+1), C(i+1), D(i+1) and E(i+1) are obtained as (rotated) copies
of A(i), B(i), C(i) and D(i). More precisely, for each round, with i varying
from 0 to 79, the registers A, B, C, D and E are updated according to the
following formulas:

    A(i+1) = ROL5(A(i)) + f(i)(B(i), C(i), D(i)) + E(i) + W(i) + K(i),
    B(i+1) = A(i),
    C(i+1) = ROL30(B(i)),                                       (5.6)
    D(i+1) = C(i),
    E(i+1) = D(i).

In these equations, addition is performed modulo 2^32 and ROLb(X) denotes
the value obtained by rotating the binary representation of the 32-bit unsigned
integer X by b bits to the left. Moreover, f(i) is a round function and K(i) a
round constant; they both depend on i and are specified in Table 5.15. Finally,
W(i) is the contribution of the 512-bit key to round i. The 16 values from
W(0) to W(15) are simply obtained by decomposing the 512-bit input into 16
consecutive 32-bit words. The next 64 values W(16) to W(79) are obtained as
the result of an expansion process. In SHA-0 the expansion is defined by:

    ∀ i : 16 ≤ i < 80,   W(i) = W(i−3) ⊕ W(i−8) ⊕ W(i−14) ⊕ W(i−16).   (5.7)

Note that the only difference between SHA-0 and SHA-1 concerns this expan-
sion. In SHA-1, the expansion becomes:

    W(i) = ROL1(W(i−3) ⊕ W(i−8) ⊕ W(i−14) ⊕ W(i−16)).                  (5.8)

After 80 rounds, the block cipher output is A(80), B(80), C(80), D(80), E(80).
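The two expansions of Equations (5.7) and (5.8) differ only by the rotation; both can be sketched as:

```c
#include <stdint.h>

static uint32_t rol(uint32_t x, int b) {
    return (x << b) | (x >> (32 - b));
}

/* Expand the 16 input words W[0..15] to 80 round words W[0..79].
   With rot1 = 0 this is the SHA-0 expansion (5.7); with rot1 = 1 it
   is the SHA-1 expansion (5.8). */
static void expand(uint32_t W[80], int rot1) {
    for (int i = 16; i < 80; i++) {
        uint32_t t = W[i-3] ^ W[i-8] ^ W[i-14] ^ W[i-16];
        W[i] = rot1 ? rol(t, 1) : t;
    }
}
```

Because the SHA-0 recurrence uses XOR only, it acts independently on each of the 32 bit positions of the words W, a fact exploited below when combining local collisions.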
However, despite the fact that SHA is based on a block cipher, as mentioned
at the beginning of this section, using the Merkle-Damgård construction
directly with a block cipher is not a good idea; see Exercise 5 in Chapter 6.
Instead, an additional step, similar to the Davies-Meyer construction, is used
to transform this block cipher into a function, rather than a permutation.
Recall that the Davies-Meyer construction is a very useful way to turn a
cryptographic permutation π into a cryptographic function x → π(x) ⊕ x.
With SHA, a variation based on 32-bit addition in place of the exclusive or
is used. With this transform, the final output of the compression function is:

    A(80) + A(0), B(80) + B(0), C(80) + C(0), D(80) + D(0), E(80) + E(0).


5.4.2 A linear model of SHA-0
In order to study the propagation of differences in SHA-0 and to learn how
to control this propagation well enough to construct collisions, it is very useful
to start by looking at a fully linearized variation of SHA-0. First, remark that
in SHA-0, there are two sources of non-linearity: the f(i) functions in rounds
0 to 19 and in rounds 40 to 59 (respectively IF and MAJ) and the addition
modulo 2^32. Indeed, addition modulo 2^32 may involve propagation of carries,
and as a consequence it is not linear over F2. The linear model of SHA-0 is
obtained by replacing the additions by XOR operations on 32-bit words and
by using the XOR function as round function in every round. We also replace
addition by XOR in the final Davies-Meyer transform.
With this change, we obtain a completely linear compression function.
As a consequence, finding collisions becomes a straightforward task: it suffices
to use linear algebra algorithms to construct an element of the kernel of this
function. However, this approach does not shed any light on SHA-0 itself.
Instead, we are now going to write this linear algebra attack using a specific
sparse approach. The advantage is that this approach can then be transformed
into a probabilistic attack on SHA-0. The specific approach splits the problem
of collision finding into two easier tasks. The first task concentrates on a
small number of consecutive rounds, ignoring the expansion of the vector W,
yielding local collisions. The second task is to merge together several local
collisions while remaining consistent with the expansion.

5.4.2.1 Local collisions in linearized SHA-0
In order to construct local collisions in the linearized version of SHA-0, it
su¬ces to introduce a change on a single bit of W , to follow the propagation
of this change in the subsequent rounds and using adequate corrections to pre-
vent the initial change from impacting too many bits. More precisely, starting
from a common inner state A(i) , B (i) , C (i) , D(i) , E (i) , assume that we per-
form in parallel two partial hash computations. In the ¬rst computation, we
(i)
input a message block W1 and in the second computation a message block




© 2009 by Taylor and Francis Group, LLC
Brute force cryptanalysis 169
W_2^(i). We call δ = W_1^(i) ⊕ W_2^(i) the difference³ of the two message
blocks. To make the analysis simpler, it is useful to assume that the
difference δ is on a single bit.

Clearly, by linearity, the difference δ directly propagates in the computation
of A^(i+1) and we find that A_1^(i+1) ⊕ A_2^(i+1) = δ. In the subsequent
steps, this initial difference on A^(i+1) moves to B^(i+2), C^(i+3), D^(i+4)
and E^(i+5). However, if we do not make any other changes in the message
blocks, secondary changes also occur in A^(i+2), A^(i+3), ... To control the
propagation of the differences, the key idea is to make other changes in W in
order to prevent any secondary change in A. We refer to these additional
changes as corrections. The first correction prevents the difference in
A^(i+1) from propagating to A^(i+2). The reader can easily check that, by
linearity, this is effected by choosing W_1^(i+1) ⊕ W_2^(i+1) = ROL5(δ). The
second correction prevents the difference in B^(i+2) from affecting A^(i+3);
it consists of choosing W_1^(i+2) ⊕ W_2^(i+2) = δ. The third correction
prevents the difference in C^(i+3) from affecting A^(i+4); since C^(i+3) is a
rotated copy of B^(i+2), the correction requires W_1^(i+3) ⊕ W_2^(i+3) =
ROL30(δ). Finally, the fourth and fifth corrections account for D^(i+4) and
E^(i+5) using W_1^(i+4) ⊕ W_2^(i+4) = ROL30(δ) and W_1^(i+5) ⊕ W_2^(i+5) =
ROL30(δ).

After that fifth correction, we see that the initial difference vanishes and
thus the two parallel computations yield the same inner state

A^(i+6), B^(i+6), C^(i+6), D^(i+6), E^(i+6).

This propagation of differences is summarized in Figure 5.1. In this figure,
A^(i)[j] denotes the j-th bit of A^(i).
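This cancellation after six rounds can be checked mechanically in the linearized model (a self-contained sketch; the function names and the common message words are ours, and the round constant is omitted since it cancels in any difference):

```c
#include <stdint.h>

#define ROL(x, n) (((x) << (n)) | ((x) >> (32 - (n))))

/* One linearized SHA-0 round: additions and the round function both
   replaced by XOR. */
static void lin_round(uint32_t s[5], uint32_t w) {
    uint32_t a = ROL(s[0], 5) ^ s[1] ^ s[2] ^ s[3] ^ s[4] ^ w;
    s[4] = s[3]; s[3] = s[2]; s[2] = ROL(s[1], 30); s[1] = s[0]; s[0] = a;
}

/* Run two computations from a common state, injecting the perturbation
   delta at round i and the five corrections ROL5(delta), delta,
   ROL30(delta), ROL30(delta), ROL30(delta) in the next five rounds;
   return 1 when the inner states coincide again after six rounds. */
static int local_collision_cancels(uint32_t delta) {
    uint32_t corr[6] = { delta, ROL(delta, 5), delta,
                         ROL(delta, 30), ROL(delta, 30), ROL(delta, 30) };
    uint32_t s1[5] = {0}, s2[5] = {0};
    for (int r = 0; r < 6; r++) {
        uint32_t w = 0x5a5a0000u ^ (uint32_t)r;  /* arbitrary common words */
        lin_round(s1, w);
        lin_round(s2, w ^ corr[r]);
    }
    for (int k = 0; k < 5; k++)
        if (s1[k] != s2[k]) return 0;
    return 1;
}
```

In the linearized model the starting state and the common message words are irrelevant: only the differences propagate, so the check succeeds for a perturbation on any of the 32 bit positions.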

5.4.2.2 Combining local collisions with the message expansion
In order to combine local collisions with the message expansion, it suffices
to remark that, thanks to linearity, XORing together the changes and
corrections of several local collisions yields a possible pattern of collision.
As a consequence, it suffices to put together several local collisions in a
way that guarantees that the XOR of all involved changes and corrections is a
valid output of the message expansion. With SHA-0, the fact that the expansion
is not only linear but also applies in parallel to individual bits of the
words W is quite helpful. Indeed, any valid expanded sequence of words
W_0, ..., W_79 can be constructed by pasting together 32 expanded sequences
of bits. Since the expansion computes 80 expanded bits from 16 initial bits,
it is even easy to exhaustively compute the 2^16 different expanded sequences
on a single bit position.
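This bitwise view of the expansion can be sketched as follows (names are ours; the recurrence is the SHA-0 expansion restricted to one bit position, which is legitimate precisely because the expansion contains no rotation):

```c
#include <stdint.h>

/* Expand a single bit position: from its 16 seed bits, derive the
   80-bit sequence using the SHA-0 recurrence
   w[i] = w[i-3] ^ w[i-8] ^ w[i-14] ^ w[i-16].  Enumerating the 2^16
   seeds lists every valid expanded sequence on one bit position. */
static void expand_bits(uint16_t seed, uint8_t w[80]) {
    for (int i = 0; i < 16; i++)
        w[i] = (uint8_t)((seed >> i) & 1);
    for (int i = 16; i < 80; i++)
        w[i] = w[i-3] ^ w[i-8] ^ w[i-14] ^ w[i-16];
}
```

Since the recurrence is linear over F2, the expanded sequence of the XOR of two seeds is the bitwise XOR of their expanded sequences, which is what makes the exhaustive table of 2^16 sequences so convenient.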
The key idea is to insert several local collisions in a way that follows such
an expanded sequence of bits. In particular, this means that all changes of

3 This is a standard name in differential cryptanalysis, reminiscent of the fact that in F2,
a − b is the same thing as a ⊕ b.





[Figure 5.1: propagation of a perturbation on bit 1 and the corresponding corrections (figure not recovered)]