. 9
( 18)


After using Nivasch™s algorithm, we may still want to ¬nd the entrance of
the cycle and obtain a real collision. In that case, we can of course use Algo-
rithm 7.3; however, this algorithm can be improved by using the information
already collected by Nivasch™s algorithm. Indeed, let us consider the state of
the stack in Algorithm 7.4 at the point where the collision is detected. At
that point in time i, we know that the current value x is equal to a value on
the stack Sk = (x, j). Clearly, thanks to the principle of Nivasch™s algorithm,
x is the minimal value of the whole cycle. As a consequence, unless k = 0, we
know that Sk’1 corresponds to a value outside of the cycle. If Sk’1 = (Xt , t),

© 2009 by Taylor and Francis Group, LLC
Birthday-based algorithms for functions 231

this value Xt can be used as a replacement for the start-point X0 in Algo-
rithm 7.3. When k = 0, we use the original start-point X0 . On average, this
allows us to recover the start of the cycle a bit faster.

7.2 Analysis of random functions
The good behavior of Floyd™s algorithm and Brent™s algorithm depends on
the speci¬c properties of the function F used to de¬ne the considered recur-
sive sequences. We already known that some functions such as F (X) = X + 1
(mod 2n ) exhibit extremely bad behaviors. In this section, we present back-
ground information about the case of random functions, also called random
mappings. A very nice survey about this topic is Random Mapping Statis-
tics [FO90] by Flajolet and Odlyzko. It derives a large number of useful facts
and statistics about random mappings and their cryptographic applications.
All the facts stated in this section are extracted from [FO90] and apply to
the graph of a random function on a set of N elements. They are asymptotic
results and give asymptotic equivalents for the various parameters as N tends
to in¬nity.
Each random function F on the set SN of N elements is also viewed as a
directed graph GF whose vertices are the points of SN and whose directed
edges are of the form (v, F (v)), for all vertices v in SN . Each vertex in the
graph has out-degree 1, the in-degree is variable, it can be zero or possibly
a large number. Of course, the average in-degree is equal to the average
out-degree and thus to 1.

7.2.1 Global properties
Connected components. Asymptotically, the number of connected com-
ponents in the graph GF , is log N/2.
We recall that each connected component is obtained by grouping together
all the points that can be reached from a start-point by following the edges
of GF , either in the forward or the backward direction.

Terminal nodes. A terminal node in GF is a vertex v of in-degree 0. Equiv-
alently, it is an element v of SN not in the image of the function F . Asymp-
totically, the number of terminal nodes is N/e, where e = exp(1) is the basis
of natural logarithms.

Image points. An image point is a point in the image of F , asymptotically,
there are (1 ’ 1/e)N image points in GF .

© 2009 by Taylor and Francis Group, LLC
232 Algorithmic Cryptanalysis

Iterate image points. A point v of SN is a k-th iterate image point, if and
only if, there exists an element x of SN such that v = F (k) (x), where F (k)
denotes k consecutive applications of F . Note that a k-th iterate image point
also is a (k ’ 1)-th iterate image point. Asymptotically, the number of k-th
iterate image points is (1 ’ „k )N, where „k is de¬ned by the recursion law:
„0 = 0 and „k+1 = e„k ’1 . (7.6)

Cyclic nodes. A cycle node v of GF is a node that belongs to some cycle of
GF , i.e., v can be obtained as a k-th iterate image of itself for some value of k.
Asymptotically the average number of cyclic nodes in a random mapping is

7.2.2 Local properties
By opposition to global properties which only depend on the graph GF
itself, we now consider local properties obtained when looking at GF from
some random start-point. These local properties are extremely important for
us, since they mostly dictate the behavior of cycle ¬nding algorithms.

Tail and cycle lengths. As before, the tail length is the distance from
the start-point to the cycle obtained when iterating F . Asymptotically, for a
random mapping and a random start-point, the average tail length is πN/8.
In the same conditions, the average cycle length is equal to the average tail
length and also is πN/8. This implies that on average, the total length on
the sequence obtained by recursively applying F to the start-point up to its
¬rst repetition is πN/2.

Component size and tree size. On average the size of the connected
component containing the starting point is 2N/3. Similarly, if we restrict
ourselves to the non-cyclic part of this component and take the tree containing
all points which enter, in the same place, the same cycle as our start-point,
then the average size of this tree is N/3.
Note that these results imply that a large fraction of the points in the graph
GF are grouped in a single connected component. This component is often
called the giant component of the random mapping.

Predecessor size. Another question we can ask is the number of points
which are iterated preimages of our chosen start-point. On average, the num-
ber of such predecessors is πN/8.

7.2.3 Extremal properties
Another important class of properties within the graph of random mappings
is the class of extremal properties.

© 2009 by Taylor and Francis Group, LLC
Birthday-based algorithms for functions 233

Longest paths. The ¬rst extremal property we would like to determine is
the behavior of a random mapping when choosing the worst possible starting
point. How long do cycle ¬nding algorithms take in this worst case? The
good news is that the expected length of the longest tail, longest cycle and
longest path (including both tail and cycle) are of the same order of magnitude

O( N ) as in the average case. Only the constants di¬er.
More precisely, asymptotically the expected length of the longest cycle is
√ √
with c1 ≈ 0.782. The expected length of the longest tail is c2 N with
c1 N√
= 2π log 2 ≈ 1.737. Finally, the expected length of the longest path is
c2 √
c3 N with c3 ≈ 2.415. It is remarked in [FO90] that quite interestingly,
c3 < c1 + c2 . As a consequence, for a non-negligible fraction of random
mappings, the longest tail does not lead to the longest cycle.

Giant component. Other noteworthy properties are the average size of the
giant component and of the largest tree in the graph of a random mapping.
Due to the results on the average component size and average tree size for a
random start-point, we already know that these sizes are respectively larger
than 2N/3 and N/3. Asymptotically, the giant component is of size d1 N and
the largest tree of size d2 N , with d1 ≈ 0.758 and d2 ≈ 0.48.

7.3 Number-theoretic applications
7.3.1 Pollard™s Rho factoring algorithm
Pollard™s Rho is a very practical factoring algorithm introduced in 1975 by
Pollard [Pol75]. This algorithm aims at ¬nding relatively small factors of an
integer N . It heavily uses the algorithms of Floyd or Brent in order to ¬nd
cycles in sequences de¬ned by a recursion formula. With Pollard™s Rho, the
sequence that we consider needs some additional property. More precisely, if
we denote this sequence by X, we require that for any prime factor p of N , the
sequence X (mod p) is also de¬ned by a recursion formula. This limits the
possible choices for the function F used to compute the sequence. To ensure
this property and have correctly de¬ned sequences modulo each prime factor,
a good approach is to choose for F a polynomial. In fact, the most common
choice is to use:
F (x) = x2 + c (mod N ), (7.7)
for some constant c.
This may seem surprising because the analysis of sequences de¬ned by a
recursion formula is done under the assumption that F is a random function,
which x2 + c is not. However, Pollard™s Rho algorithm is used routinely and
works extremely well. So despite the apparent inconsistency, this choice of

© 2009 by Taylor and Francis Group, LLC
234 Algorithmic Cryptanalysis

F is a good one. A partial explanation is given in [FO90]. More precisely,
they remark that the graph of F (x) = x2 + c is very speci¬c, each node but
one (the value c) has either 0 or 2 predecessors. As a consequence, they also
analyze random binary function graphs where each node has in-degree 0 or
2 and ¬nd that the multiplicative constants appearing in the graph statistics
of the previous section are a¬ected by this change but that their orders of
magnitude are preserved.
Once we choose for F a polynomial, we can easily check that for the sequence
de¬ned by:
Xi+1 = F (Xi ) (mod N ), (7.8)
reduction modulo p for any factor p of N gives the same recursion formula
(modulo p instead of modulo N ) for the reduced sequence.
For a prime divisor p of N , let X (p) denote the sequence X (mod p). We
know that X (p) satis¬es the recursion:
(p) (p)
Xi+1 = F (Xi ) (mod p). (7.9)

Using the usual analysis for these sequences, we expect that X (p) cycles with

parameters s and of the order of p. In that case, Floyd™s or Brent™s

algorithm ¬nd a cycle in time O( p), assuming that we can test for collisions
within the sequence X (p) . To see that it su¬ces to remark that:
(p) (p) (p) (p)
” GCD(Xi ’ Xj , N ) = 1.
Xi = Xj (7.10)

Thus, we can e¬ciently test for collisions using GCD computations. Moreover,
when a collision occurs modulo p, there is no special reason to have collisions
modulo the other factors of N . As a consequence, as soon as a collision occurs,
we usually recover a proper factor of N . Pollard™s Rho using Brent™s collision
¬nding algorithm is given as Algorithm 7.5.
Of course, Pollard™s Rho can also be used in conjunction with Floyd™s al-
gorithm. In both cases, one costly part of the method is that each equality
test requires a GCD computation and, thus, becomes expensive. In practice,
this number of GCD computations can be lowered. The idea is to multiply
(p) (p)
together several consecutive values of the form Xi ’ Xj corresponding to
several equality tests modulo p. Then, we compute a single GCD of the re-
sulting product with N . If this GCD is 1, all the equality tests fail. If the
GCD is a proper factor, we are done. However, if the GCD is N , we need to
backtrack and redo the equality tests one at a time, hoping for a proper factor
of N . The reason behind the need to backtrack is that by grouping several
tests together, we increase the probability of having simultaneous collisions
for the various factors of N .
Pollard™s Rho is a very interesting application of cycle ¬nding algorithms,
which call for several speci¬c comments. The ¬rst comment is that with
Pollard™s Rho, there is no need to compute the parameters s and of the
sequence X (p) . Instead, it su¬ces to ¬nd a collision by two values Xi and

© 2009 by Taylor and Francis Group, LLC
Birthday-based algorithms for functions 235

Algorithm 7.5 Pollard™s Rho factoring algorithm
Require: Input number to factor N , parameter c , max. iterations M
Let x ←’ 0
Let y ←’ x
Let trap ←’ 0
Let nexttrap ←’ 1
for i from 1 to M do
Let x ←’ x2 + c mod N
Let f ←’ GCD(x ’ y, N )
if f = 1 then
if f = N then
Output ˜Fails: collision modulo N ™
Output ˜Found f as factor of N ™
end if
end if
if i = nexttrap then
Let trap ←’ nexttrap
Let nexttrap ←’ 2 · trap
Let y ←’ x
end if
end for
Output Failed

© 2009 by Taylor and Francis Group, LLC
236 Algorithmic Cryptanalysis
Xj which can be arbitrarily located. In particular, we do not need to use
Algorithm 7.3 to ¬nd the start of the cycle. A puzzling question concerning
Pollard™s Rho algorithm is whether we can devise a variation of this algorithm
based on a traditional birthday-based algorithm. In other words, given two
lists of numbers L1 and L2 with comparable sizes and an integer N , can we
e¬ciently discover two numbers x1 in L1 and x2 in L2 such that GCD(x2 ’
x1 , N ) = 1. This is a di¬cult problem which was solved by P. Montgomery
in his thesis as a tool to improve elliptic curve factorization [Mon92].

7.3.2 Pollard™s Rho discrete logarithm algorithm
As many factoring algorithms, Pollard™s Rho can be adapted to compute
discrete logarithms in arbitrary cyclic groups. Let G be a multiplicative group
of prime2 order p, let g be a generator of G and let h be an element of G.
Solving the discrete logarithm problem corresponds to ¬nding an integer ±
such that h = g ± . In order to apply a cycle ¬nding algorithm, we choose
a random looking function F on G. To construct the function F , we ¬rst
partition G into three disjoint subsets G1 , G2 and G3 , approximately of the
same size, preferably with 1 not in G1 . Then, we de¬ne F as follows:
 x if x ∈ G1 ,
F (x) = gx if x ∈ G2 , (7.11)
hx if x ∈ G3 .

Then, we de¬ne the recursive sequence X, by letting X0 = 1. Note that if
1 ∈ G1 , 1 is a ¬xed point of F . Thus in that case, we would need to choose
a di¬erent start-point. This is why it is preferable to have 1 ∈ G1 . As usual,

we expect, after O( p) steps, to detect a cycle in F and a collision Xi = Xj .
By itself, this collision does not su¬ce to recover ± and we need to compute
some additional information about the sequence X.
In order to do this, let us consider the function φ from H = Z/pZ — Z/pZ
to G de¬ned by:
(a, b) ’’ g a hb . (7.12)

This function is a surjection onto G and each element of G has p distinct
preimages in H. Moreover, given two distinct elements in H with the same
image in G, we can compute ±. Indeed, if φ(a, b) = φ(a , b ) then g a+±b =
ha +±b and ± = a ’a (mod p).
b ’b
The next step is to lift the sequence X to H and to ¬nd a sequence Y of
elements in H such that X = φ(Y ). This can be done using the following

2 For a general treatment about discrete logarithms and a justi¬cation for restricting our-
selves to groups of prime order, see Chapter 6, Section 6.4.1.

© 2009 by Taylor and Francis Group, LLC
Birthday-based algorithms for functions 237

function G from H to H:
 (2a, 2b) if φ(a, b) ∈ G1 ,
G(a, b) = (a + 1, b) if φ(a, b) ∈ G2 , (7.13)
(a, b + 1) if φ(a, b) ∈ G3 ,

It is easy to verify that φ(G(a, b)) = F (φ(a, b)) for all elements (a, b) in H.
As a consequence, the sequence Y de¬ned by choosing Y0 = (0, 0) together
with the recursion law Yi+1 = G(Yi ) satis¬es X = φ(Y ).
Of course, Y is a sequence on a much larger set than X, as a consequence,
we do not expect to have a collision in the sequence Y as early as in X.
Thus, the collision returned for the sequence X by the cycle ¬nding algorithm
is usually not a collision for Y . As a consequence, we obtain two di¬erent
preimages from the same element of G and can compute ±.
Note that if we have a collision on Y itself, the discrete logarithm can also
be recovered for a slightly higher cost by ¬nding the entrance of the cycle.
The only exception occurs if Y collides on its initial value Y0 .

7.3.3 Pollard™s kangaroos
Together with his Rho method, Pollard introduced another algorithm for
computing discrete logarithms when some additional information is known.
More precisely, he considers the problem g x = h in G and wants to recover x
from g and h, if we know in addition that x belongs to some interval [a, b]. In
this context, using Pollard™s Rho in the group G, without taking in this extra
information, is not always a good idea. In particular, when b ’ a is smaller
than the square root of the size of G, it is more e¬cient to recover x by brute
force, try all values in [a, b]. To deal with this problem, Pollard proposes to
use what he calls kangaroos. This method is also known as Pollard™s lambda
A kangaroo in Pollard™s method is a sequence of elements in G whose log-
arithm increases by successive jumps. In order to apply the method, we need
two kangaroos, a tame one, whose start-point is g a and a wild one, with start-
point is h. The ¬rst kangaroo is called tame, because the logarithm of the
corresponding sequence is always known. To precisely de¬ne the kangaroos,
we need a jump function j that associates to each element of G a positive num-

ber upper bounded by b ’ a. A classical method is to divide G in k subset
of roughly equal sizes S0 , . . . Sk’1 , with k = log2 b ’ a/2 , and for x in Si to
de¬ne j(x) = 2i . Once the jump function j is ¬xed, we let F (x) = x — g j(x) .
The tame kangaroo is the sequence de¬ned by:

T0 = g a (7.14)
Ti+1 = F (Ti ).

© 2009 by Taylor and Francis Group, LLC
238 Algorithmic Cryptanalysis

The wild kangaroo is the sequence de¬ned by:

W0 = h (7.15)
Wi+1 = F (Wi ).

While computing the tame kangaroo sequence, it is important to keep track
of the discrete logarithm of Ti , i.e., of the value a + k=0 j(Tk ). As soon
as the discrete logarithm goes beyond b, we abort the sequence T and recall
the ¬nal position Tn , together with its discrete logarithm ln , i.e., Tn = g ln .
After this ¬nal point is known, we start the wild sequence keeping track of the
sum of the jumps taken by the wild kangaroo. We abort the wild sequence
either when encountering Tn or when the sum of the jumps becomes larger
than b ’ a. In the latter case, the algorithm fails. Otherwise, we have found a
position Wi with corresponding sum si such that Wi = Tn , since Wi = h — g si
and Tn = g ln we conclude that:

h = g ln ’si . (7.16)

As Pollard™s Rho, Pollard™s kangaroo method is a generic algorithm. This

algorithm has a good probability of success and takes time O( b ’ a) to
compute the discrete logarithm of h; for a rigorous analysis, see [MT09].

7.4 A direct cryptographic application in the context of
blockwise security
Blockwise security is a practical concern about the security of modes of
operation for block cipher which was discovered independently by two groups
of researchers. In [BKN04], it is shown that the practical implementation
of secure shell (SSH) which relies on the cipher block chaining (CBC) mode
of operation is insecure against a variation on chosen plaintext attacks. At
Crypto 2002, in [JMV02], the theoretical idea of blockwise security was in-
troduced. Moreover, in the same paper, it is shown that for several common
modes of operation there is a gap between blockwise and ordinary message-
wise security. After this discovery, several papers such as [FJMV04], [BT04]
or [FJP04] studied this paradigm of blockwise security and came up with
blockwise secure modes. The key result is that there exist modes of operation
with blockwise security levels essentially equivalent to usual messagewise se-
curity levels. More precisely, these modes of operation are secure up to the
birthday paradox bound and after that become insecure. In this section, after
some brief reminders about the CBC mode of operation and its security both
as a messagewise mode and as a blockwise mode, we study the behavior of
this mode beyond the birthday paradox bound and show that in the blockwise

© 2009 by Taylor and Francis Group, LLC
Birthday-based algorithms for functions 239

security model, we can use Brent™s algorithm to devise attacks which work
more e¬ciently than their messagewise counterparts.

7.4.1 Blockwise security of CBC encryption
In Section 6.1, we considered the security of CBC encryption, viewed as a
messagewise process, with the plaintext provided as a whole and the corre-
sponding ciphertext returned as a whole. However, in practice, implementing
this approach may become problematic, since the encryption box needs to be
able to internally store whole messages, which can potentially be very long.
For this reason, practitioners often use CBC encryption as a blockwise mode.
In that case, the plaintext message is transmitted block by block, and each
encrypted block is immediately returned. However, as noticed in [BKN04]
and [JMV02], in this blockwise model, CBC encryption is no longer secure.
Indeed, an active attacker may observe the current ciphertext block C (i) before
submitting the next block of plaintext P (i+1) . As a consequence, this allows
him to choose at will the value v which enters EK by letting P (i+1) = v •C (i) .
This property can be used in several ways to show that CBC encryption is
not secure as a blockwise mode of operation, for example by creating a simple
distinguisher able to make the di¬erence between a CBC ciphertext and a
random message of the same length. To create this distinguisher, it su¬ces
to make sure that the same value v is to be encrypted twice in CBC mode.
Thus, a CBC encrypted message re¬‚ects this equality. Of course, in a random
message, the equality has no special reason to hold.
Interestingly, there is a simple countermeasure to ¬x this problem. This is
called Delayed CBC encryption and is proven to be secure in [FMP03]. The
basic idea on Delayed CBC is very simple, once a ciphertext block C (i) has
been computed, instead of returning it immediately, the encryption device
waits until it gets P (i+1) and then returns C (i) . For the last encrypted block,
the encryption device returns it upon reception of a special block indicating
that the message is complete, this special block is not encrypted, it only
serves a bookkeeping purpose. The rationale behind Delayed CBC is that the
attacker can no longer control the block cipher inputs. Thus, Delayed CBC
prevents the simple attack that breaks plain CBC.

7.4.2 CBC encryption beyond the birthday bound
Beyond the security bound of CBC encryption, the birthday paradox comes
into play. As a consequence, among the ciphertext blocks, we expect to ¬nd at
least one collision, say C (i) = C (j) . If we now replace each block of ciphertext
in this equality by its expression, we ¬nd that:

EK (C (i’1) • P (i) ) = EK (C (j’1) • P (j) ), and thus (7.17)
C (i’1) • P (i) = C (j’1) • P (j) since EK is a permutation.

© 2009 by Taylor and Francis Group, LLC
240 Algorithmic Cryptanalysis

We can rewrite this as P (i) • P (j) = C (i’1) • C (j’1) . As a consequence, given
one collision, we learn the XOR of two blocks of plaintext. As the length of
the message grows, more and more collisions are found and thus more and
more information is learned about the plaintext message.
For a cryptanalyst to use this attack, he must be able to e¬ciently detect
collisions among the blocks of ciphertext. One very common technique is to
proceed as in Chapter 6 and sort the ciphertext blocks in order to detect these
collisions. However, this requires either a large amount of main memory or at
least a fast auxiliary storage device. For 64-bit block ciphers, the cryptanalyst
needs to store and sort about 232 blocks or equivalently 235 bytes. In fact,
the situation is slightly worse, since the cryptanalyst also needs to keep track
of the initial block positions when sorting in order to know the plaintext
blocks involved in the equation derived from the collision. If there is not
enough memory to sort a table containing both the ciphertext value and the
original position, an alternative is to keep the unsorted values somewhere on
an auxiliary storage device, to sort the ciphertext blocks without indices and
then to scan the auxiliary device to locate the original position of the colliding
blocks once their common value is known.
Even with this approach, we need 235 bytes of fast memory. Despite the
quick progress of computer hardware, 32 Gbytes is still a large amount of
memory, only available on dedicated computers. Thus, in practice, even when
CBC encryption is used with a 64-bit block cipher, say Triple-DES, putting
the attack into play requires some important computing e¬ort from the crypt-
analyst. For this reason, among others, CBC encryption with Triple-DES is
still widely encountered.

7.4.3 Delayed CBC beyond the birthday bound
After considering the security of messagewise CBC beyond the birthday
bound, we now turn to blockwise security of Delayed CBC encryption beyond
this bound. In order to construct this attack, we are going to use a trick
similar to the one used for attacking ordinary blockwise CBC. More precisely,
whenever we receive a block of ciphertext, we determine the next block of
plaintext as a function of this block and send it back to CBC encryption.
Due to the one block delay, we can only start this from the second block of
plaintext. For the ¬rst plaintext block, we can submit any value of our choice,
such as the all-zeros block. Starting from the second block, we choose our next
plaintext block as:
P (i) = P (i’1) • C (i’2) . (7.18)
Even with Delayed CBC, both values Pi’1 and Ci’2 are e¬ectively known at
the time we need to submit Pi . Now, with this speci¬c choice for plaintext
blocks, the ciphertext values are determined by the following equation:
C (i) = EK (C (i’1) • P (i’1) • C (i’2) ) (7.19)
= EK (EK (P (i’1) • C (i’2) ) • (P (i’1) • C (i’2) )).

© 2009 by Taylor and Francis Group, LLC
Birthday-based algorithms for functions 241

In other words, if we let Zi denote the input to the block cipher in round i,
i.e., Zi = P (i) • C (i’1) , we see that Zi satis¬es a recursion law:

Zi = EK (Zi’1 ) • Zi’1 . (7.20)

Quite interestingly, xoring the input and the output to a block cipher is a
pretty standard construction used to build a pseudo-random function from
a pseudo-random permutation, known as Davies-Meyer construction. This
Davies-Meyer construction is often used in hash functions. For example, it is
encountered in SHA-1. If we de¬ne the function F by F (x) = x • EK (x), the
recursion law becomes Zi = F (Zi’1 ).
Using the analysis of Section 7.2, we know that the sequence Z loops back
to itself after about O(2n/2 ) steps. Moreover, we can consider using a cycle
detection algorithm to discover this loop. Floyd™s algorithm
Remember that to use Floyd™s algorithm, we need to compute in parallel the
sequence Zi and the sequence Wi = Z2i in order to ¬nd an equality Zi = Wi .
Without going further into the details, we see that this is troublesome in our
context of blockwise CBC encryption. Indeed, since we do not control the
initial values C0 used during encryption, there is no way to compute both
sequences Zi and Wi at the same time. Of course, we can always store Z and
compare Zi with Z2i at each step, but this requires a large amount of memory,
which is exactly what we are trying to avoid. Still, note that this approach
would be better than sorting and could even work with slow memory or disk
storage. Brent™s algorithm
On the contrary, with Brent™s algorithm, we immediately see that the loop
we induced in Delayed CBC encryption can easily be detected. Following our
description from Section 7.1.2, whenever the index i is a power of 2, we store
this value Z2t and then compare it to the subsequent values of Zi up to the
next power of two. This can be done without any need to compute the same
sequence twice or to control initial values.
In some sense, detecting a loop is already an attack. Indeed, this can
be seen as a real or random distinguisher. If the output is real, i.e., comes
from a blockwise CBC encryption, we ¬nd a cycle in time O(2n/2 ) with high
probability. If the output is random, then Brent™s algorithm almost never
¬nds a new value equal to the stored value. However, we can strengthen the
attack and do more than this basic distinguisher. In fact, we can use the cycle
to obtain the encryption of any value v of our choice under the underlying
block cipher EK . For this, when storing Z2t , which is a block cipher input, we
t t
also store the corresponding output C (2 ) . Due to the one block delay, C (2 )
is not available immediately, but we store it whenever we receive it. When

© 2009 by Taylor and Francis Group, LLC
242 Algorithmic Cryptanalysis

the collision is detected, we have a new input Zi equal to Z2t . Of course,
this equality between inputs to the block cipher implies equality between the
output. So we know in advance that C (i) = C (2 ) . Thanks to this knowledge,
we can set P (i+1) = C (i) • v and thus obtain the encrypted value EK (v).
This shows that thanks to Brent™s algorithm, Delayed CBC encryption as
a blockwise mode of operation is (slightly) more vulnerable to attacks be-
yond the birthday paradox bound than ordinary CBC encryption used as a
messagewise mode of operation. Nivasch™s algorithm
In this context of blockwise attacks, Nivasch™s algorithm can be very useful.
Indeed, it allows us to reduce the amount of required ciphertext before the
cycle is detected. In fact, using Algorithm 7.4 as presented in Section 7.1, we
are guaranteed to require at most s + 2 blocks of ciphertext. Of course, it
would be interesting to further reduce this number. Clearly, the ¬rst collision
occurs at position s+ and we cannot use fewer than s+ blocks of ciphertexts
to detect a collision. Following Nivasch in [Niv04], it is possible to devise a
variation of Algorithm 7.4 that uses some additional memory and can detect
the cycle with s + (1 + ±) blocks, for arbitrary small values of ±. This
algorithm uses a technique called partitioning and its memory requirements
increase as ± decreases.

7.5 Collisions in hash functions
Another application of cycle detection algorithms is the search for collisions
in hash functions, especially when their output length is too short. For ex-
ample, given a hash function with a 128-bit output, thanks to the birthday
paradox, we can ¬nd a collision with roughly 264 evaluations of the hash func-
tion. However, it is clearly infeasible with current computers to store and
sort the corresponding list of 264 hash values in order to discover the collision.
Thus, it is natural to ask how to use cycle detection algorithms to obtain a
collision. The ¬rst answer is to simply invoke the cycle detection algorithm
on a recursive sequence, where each value is obtained by hashing the previous
value. Yet, this answer is not satisfactory for several reasons. The ¬rst reason
is that this approach can only produce a meaningless collision and output
two random looking messages with the same hash. Of course, it would be
much preferable for a cryptanalyst to obtain two meaningful messages with
di¬erent meanings and the same hash. The second reason is that the cycle
detection algorithms presented in Section 7.1 are inherently sequential. And
while performing 264 operations is feasible with today™s computers, perform-
ing 264 sequential operations on a single computer is a completely di¬erent

© 2009 by Taylor and Francis Group, LLC
Birthday-based algorithms for functions 243


7.5.1 Collisions between meaningful messages
Finding a collision between meaningful messages is a now classical trick,
¬rst introduced in [Yuv79]. The idea is to start by choosing two messages
M and M with di¬erent meanings. Then in each message, we need to ¬nd
a number of places where safe modi¬cations can be performed. Here, we
call safe modi¬cation any change which does not alter the meaning of the
message. There are usually many possible safe modi¬cations in a human
readable message. We could replace a space by a line feed, a non-breaking
space or even two spaces. Similarly, at some points in a message, punctuation
can be added or removed; upper-case and lower-case letters can be exchanged
without altering the message™s meaning. Given t places of safe modi¬cation
for a message M , it is possible to construct a set of 2t di¬erent messages
with the same meaning. Note that for the hash function the 2t messages
are genuinely di¬erent and we obtain 2t random looking3 hash values. If we
proceed in the same way, we also obtain 2t di¬erent copies of M . Clearly,
for a n-bit hash function, with t > n/2 we expect the existence of a collision
between a copy of M and a copy of M .
The same idea of safe modi¬cations can also be used in conjunction with a
cycle ¬nding technique. However, we need some extra care to avoid uninter-
esting collisions between two copies of M or two copies of M . Given t points
of safe modi¬cation of M , for any t-bit number i, we denote by Mi the copy
obtained by setting the speci¬c value of each point of modi¬cation according
to the value of a corresponding bit in the binary expansion of i. Similarly, we
denote by Mi the i-th copy of M . Using this notation, we can construct a
function F from t + 1 to n bits as follows:

Mi if v = 0,
F (v, i) = (7.21)
Mi if v = 1.

Adding a truncation from n bits to t + 1, we can obtain a sequence of hash
values of meaningful messages and look for a collision. However, this direct
approach with t ≈ n/2 as above usually yields a trivial collision. Indeed, if
two hash values collide after truncation to t+1, at the next iteration we hash a
previously encountered message and enter a cycle with a collision between the
hash values of twice the same message. To avoid this bad case, we need more
points of safe modi¬cation, namely we need to choose t = n ’ 1. With this
choice, the entrance of the cycle corresponds to a genuine collision. Moreover,
with probability 1/2 this genuine collision is between a copy of M and a copy
of M . If not, we can simply restart the algorithm with a di¬erent start-point.

3 Unless the hash function is badly broken.

© 2009 by Taylor and Francis Group, LLC
244 Algorithmic Cryptanalysis

7.5.2 Parallelizable collision search
Since cycle ¬nding algorithms are inherently sequential, in order to ¬nd
collisions using a parallel computation, we need a di¬erent approach. This ap-
proach cannot look for a cycle within a single recursively de¬ned sequence, but
instead needs to rely on the computation of several independent sequences.
We already introduced an algorithm of this kind to compute discrete loga-
rithms in a short interval: Pollard™s Kangaroo algorithm (see Section 7.3.3).
However, to use this idea in a parallelizable way, we need to adapt it. When
running on a parallel computer, it is essential to reduce the amount of commu-
nication and synchronization between parallel processes. A common approach
when looking for a speci¬c value or property among a large set of data is to
use a large number of computers to create likely candidates and then to report
these candidates to a dedicated server which performs the ¬nal checking. For
a collision search, we need to report values which are likely to collide. This can
be achieved using the distinguished point technique from [QD90] and [vW96].
Note that in distinguished point algorithms, one node in the parallel com-
puter plays a special role. This node receives distinguished points from all
other nodes, checks for collisions within the set of distinguished points using
a sort based birthday algorithm as in Chapter 6 and pulls back the collisions
from this set of distinguished points to the real values. It is very important to
devise parallel collision search algorithms in a way that minimizes the com-
munication cost, the memory requirement and the computational overhead of
this server node.
In this section, we present one of the numerous possible options for paral-
lelizable collision search on a random function F . We assume that F operates
on a set of N elements and that the parallel computer consists of P identical
parallel nodes and a special server node. We also choose a set of D distin-
guished points. For algorithmic convenience, the distinguished points should
be easy to recognize and easy to generate. A typical example of set of dis-
tinguished points is the set of n-bits integers with d leading zeros. This set
contains D = 2n’d elements in a larger set of cardinality N = 2n . Each distin-
guished point x is easily recognized by the property: 0 ¤ x < 2n’d . Moreover,
with this de¬nition of distinguished points, it is very easy to generate random
distinguished points at will.
With this setting, our algorithm works as follows. First, the server node
allocates a (distinct) distinguished point to each computation node. Given
this distinguished point, the computation node uses it as start-point for Ni-
vasch™s cycle detection Algorithm 7.4, with a small change. Namely, when
Nivasch™s algorithm encounters a distinguished point, it aborts and sends a
report. The report contains three values: the starting distinguished point, the
distinguished end-point and the length of the path between the two points.
The server node maintains a list of reports and detects collisions between
end-points. If a collision occurs between the values taken by F during the
global computation, necessarily this implies the existence of either a cycle on

© 2009 by Taylor and Francis Group, LLC
Birthday-based algorithms for functions 245

one computation node or a collision between two di¬erent nodes. Both pos-
sibilities are detected by our algorithm, in the ¬rst case, Nivasch™s algorithm
detects the cycle, in the second case, the server node discovers a collision
between two end-points. From this collision, between end-points, the real col-
lision is easily recovered. Indeed, assume that there is a path of length L1
from a distinguished point S1 to a end-point P and another path of length L2
from S2 to P . Without loss of generality, we can further assume that L1 ≥ L2 .
Then, starting from S1 , the server node walks the ¬rst path, taking exactly
L1 ’ L2 steps. From that point, it now su¬ces to walk both paths in parallel,
until a collision is reached. Note that, it is always a genuine collision. Indeed,
since E is the ¬rst distinguished point encountered by walking from S1 , S2
cannot lie on the ¬rst path. As a consequence, the two parallel walks start
from di¬erent places and end at the same position. Moreover, they have the
same length and the collision necessarily occurs after taking the same number
of steps on both paths.
Since any collision between two of the computed values √ F is detected,
the number of values that need to be computed remains O( N ), as with se-
quential cycle detection algorithms. As a consequence, with P processors, the
parallelization speed-up is essentially optimal. One caveat is that the com-
puted values are not integrated into the global count until the corresponding
distinguished point is reported. Thus the number of distinguished points
should be adequately chosen. With D distinguished points, the fraction of
distinguished points is N/D. Thus the average length of a path between two
distinguished points is essentially N/D. Note that for some start-points, we
may enter a cycle. In this case, we cannot nicely de¬ne the distance between
two distinguished points. Ignoring this technical di¬culty, we ¬nd that, as
a consequence, running our algorithm on P processors allows it to compute

P N/D points. In order to have O( N ) values of F , we need to choose:

D = O(P N ). (7.22)

Since the server node needs to store P paths, i.e., one per computation node,
the amount of required memory is O(P ). Finally, the running time of the
code that extracts the collision on F from a collision between distinguished
points is upper bounded by the time needed to walk through two paths, i.e.,

O(N/D) = O( N /P ).
Note that with the above description and analysis, the computation is not
perfectly balanced between processors. Indeed, some paths between distin-
guished points are shorter than others. To improve this imbalance, an easy
patch is to serially run several instances of Nivasch™s algorithm on each com-
putation node, in order to average the paths™ length. This increases the pa-
rameter P and as a consequence the memory required on the server node.
Another remark is that when P becomes large, collisions coming from a
cycle within a single path become extremely rare. As a consequence, we
may replace Nivasch™s algorithm by a simpler one, which simply runs from a

© 2009 by Taylor and Francis Group, LLC
246 Algorithmic Cryptanalysis

distinguished point to the next, possibly aborting when it takes too long to
¬nd a distinguished end-point. The only drawback of this approach is that the
exact analysis of the algorithm™s behavior becomes much more cumbersome. Looking for many collisions
Another interesting aspect of the above parallelizable collision search algo-
rithm is that it can be used to e¬ciently construct a large number of collisions.
Note that the sequential cycle ¬nding algorithms are not very e¬cient for con-
struction multiple collisions. Indeed, to construct k collisions, we essentially

need to repeat the cycle ¬nding algorithm k times, for a total cost O(k N ).
On the contrary, with the above parallel algorithm, the birthday paradox con-
tinues to apply and to play for us after the ¬rst collision. As a consequence,

the overall cost to construct k collisions is reduced to O( kN ). However, we
should be careful when choosing the set of distinguished points. In particular,
it is clear that each stored path can yield at most one collision. As a conse-
quence, the parameter P should be chosen larger than the number of desired
collisions k.
An alternative would be to use arbitrary points instead of distinguished
points as start-points for the stored path. However, this approach introduces
a new drawback. Indeed, nothing would prevent us to compute twice the same
path or, more precisely, to compute two paths where one is a subpath of the
other. These two paths would not yield any new collision and would increase
the computational cost without any gain.

7.6 Hellman™s time memory tradeo¬
Hellman™s time memory tradeo¬ is a generic method for recovering the
key K of a block cipher E. This method assumes that the cryptanalyst can
perform a massive precomputation about E beforehand and memorize some
information summarizing this precomputation, in order to help in recovering
the key K at a later stage. The goal is to achieve a non-trivial time mem-
ory tradeo¬ for the late stage of the attack. Two trivial possibilities are to
either memorize nothing and use brute force to recover K, or to memorize a
sorted table containing all pairs (EK (0n ), K) and to perform a table lookup
of EK (0n ) in order to recover K.
Despite the fact the Hellman™s time memory tradeo¬ uses a large amount
of memory, it is presented in this chapter for two main reasons. First, the
behavior of Hellman™s tradeo¬ is deeply connected with the analysis of random
functions. Second, Hellman™s tradeo¬ is extremely useful for extracting a
block cipher key after the blockwise attacks of Section 7.4.
Hellman™s algorithm is based on the analysis of the function F (K) =

© 2009 by Taylor and Francis Group, LLC
Birthday-based algorithms for functions 247

EK (0n ) from the set of possible keys to itself. Note that if the key size and
the block size are di¬erent, this function needs to be adapted. If the key size
is smaller than the blocksize, we need to truncate the result. If the key size is
larger than the blocksize, then we concatenate several encryptions (say choos-
ing the constant blocks, 0n , 0n’1 1, . . . ) and truncate the concatenation if
necessary. Breaking the block cipher E and ¬nding K is clearly equivalent to
computing the inverse of F on the point EK (0n ) obtained from an encryption
of 0n . Thus, Hellman™s tradeo¬ is usually considered to be a chosen plaintext
attack. In fact, it is slightly better than that, since we do not insist on the
choice of 0n . The only requirement is to make sure that there is some ¬xed
value v, for which it is easy to obtain an encryption of EK (v). For exam-
ple, if the adversary knows that users routinely encrypt messages with some
¬xed headers, which is a frequent feature of several word processors, then,
assuming a deterministic encryption mode, chosen plaintext attacks may not
be necessary. Hellman™s algorithm gives another reason why using random-
ized encryption modes is important when using block ciphers. Indeed, with a
randomized mode, the adversary can no longer obtain the encryption of his
chosen ¬xed value and Hellman™s method can no longer be used.

7.6.1 Simpli¬ed case
To understand the key idea behind Hellman™s algorithm, we ¬rst consider
a very special case and assume that F is a cyclic permutation. In real life,
this never happens. However, in this case the analysis is much simpler. With
this unlikely hypothesis, we can start from an arbitrary initial key K0 and
precompute the complete sequence given by Ki+1 = F (Ki ). This sequence
follows the cycle of F and goes in turn through all possible keys for the
block cipher, eventually coming back to K0 after 2k iterations, where k is the
number of key-bits. Every 2l steps, with l = k/2 , we memorize the pair
(Kj·2l , K(j’1)·2l ). In other words, we remember the current key together with
its ancestor located 2l steps before. Once we have collected the 2k’l possible
pairs around the cycle, we sort these pairs by values and keep this sorted
Given access to this array of sorted pairs, we can now invert F quite easily
in 2l steps. Starting from EK (0n ) for an unknown key K, we look up this
value in our array. If it is not present, we apply F and look up again. After t
steps, we ¬nd a value F (t) (EK (0n )) in the array. For this value, we read the
second number in the pair, thus going 2l steps backward in the cycle of F .
Then, we apply F again 2l ’ t ’ 1 times and recover the (unique) preimage
of EK (0n ), i.e., the key K.
In this simpli¬ed case, Hellman™s tradeo¬ needs to memorize 2k’l , i.e.,
2l or 2l+1 with our choice for l, pairs and the running time of the attack
stage (omitting the precomputation phase) is l · 2l , when counting the cost
of the dichotomy lookups in the sorted array. Note that this technique is

© 2009 by Taylor and Francis Group, LLC
248 Algorithmic Cryptanalysis

very similar to the baby step, giant step algorithm described in Section 6.4.2.
Reformulating Hellman™s algorithm in this context, we see that to take a single
baby step backward, we need to take many baby steps forward and a single
giant step backward.

7.6.2 General case
In the general case, F is no longer cyclic and no longer a permutation. More-
over, we expect it to behave like a random mapping. Thus, choosing a random
start-point and applying F iteratively quickly leads to a cycle of square-root
size. As a consequence, if we want to cover more than a negligible fraction of
the keys, we clearly need to use many start-points. Another di¬culty is that
with several start-points, nothing prevents two di¬erent sequences to merge
at some point. In fact, any two starting points in the giant component lead to
merging sequences. At the latest, the sequences merge somewhere in the cycle
corresponding to the giant component. Even worse, ¬nding a preimage for
EK (0n ) is not su¬cient to recover K, since preimages are no longer unique.
In order to bypass all these problems, Hellman™s tradeo¬ requires more time
and more memory than in the simpli¬ed case. The usual balance (neglecting
logarithmic factors) is 22k/3 in terms of both time and memory.
The classical analysis of Hellman™s tradeo¬ assumes that we are computing
sequences of length t for m random start-points. For each sequence, the start-
and end-points are memorized, sorted by end-point values. In order to ¬nd a
preimage by F , the same idea as in the simpli¬ed case is used. From a start-
point, F is applied repeatedly until a memorized end-point is reached. Then
the algorithm jumps back to the corresponding start-point and moves forward
again. Several cases are possible, in the most favorable we discover K, in the
other cases, we fail, either by ¬nding a wrong preimage for EK (0n ), i.e., a
di¬erent key that yields the same encryption, or by not even coming back to
the point EK (0n ). The probability of success is of the order of mt/2k as long
as mt2 remains below 2k . If we go beyond this limit, the number of collisions
between chains becomes very large and the tradeo¬ becomes unworthy. Taking
m = t = 2k/3 , we ¬nd a probability of 2’k/3 with memory requirements
of O(2k/3 ) keys and running time O(k2k/3 ). To improve the probability,
Hellman proposes to use 2k/3 di¬erent choices for the function F , by slightly
changing the way a ciphertext EK (0n ) is converted back into a key. Assuming
independence between the tables generated for each of these functions, the
success probability increases to a constant fraction of the key space with time
and memory requirements 22k/3 (neglecting logarithmic factors).

© 2009 by Taylor and Francis Group, LLC
Birthday-based algorithms for functions 249

1. At the end of Floyd™s algorithm, prove that t is the smallest multiple of
greater than or equal to s.
2h . Given a discrete logarithm problem y = g x with the additional informa-
tion that every bit in odd position in x is a 0, devise a generic algorithm
to e¬ciently obtain x, assuming that there are n unknown bits in x.

3. In this exercise, we consider the simultaneous use of several copies of
Nivasch™s algorithm.
(a) Given a sequence Xi+1 = f (Xi ) and two di¬erent order relations
on elements Xi , run in parallel two copies of Nivasch™s algorithm
de¬ning one with each order relation. We detect the loop as soon
as one of the two copies does. Assuming that the result of each dif-
ferent comparison between unequal values behaves randomly, when
do you expect this detection to happen?
(b) What can be achieved with t di¬erent copies of Nivasch™s algo-
(c) Describe an application where this technique o¬ers a noticeable
4h . What is the expected complexity of ¬nding the cycle™s start given the
complete state of Nivasch™s algorithm, after detecting a collision? Can
this be improved with t copies of the algorithm?

5. Implement the search of hash collisions using a cycle ¬nding algorithm
on a reduced-sized hash function. For example, you can use a truncated
copy of SHA.
6h . Devise a method for searching many collisions (not multicollisions, just
several independent collisions) within a hash function, using a cycle ¬nd-
ing technique. Compare with seach for many collisions using a birthday
paradox approach with memory.
7. Open problem. Find a method to construct multicollisions (at least
a 3-multicollision) using a cycle ¬nding approach.

© 2009 by Taylor and Francis Group, LLC
Chapter 8
Birthday attacks through

In practice, cryptanalytic attacks based on the birthday paradox are often
limited by the amount of memory they require. Indeed, the time and mem-
ory requirements of birthday paradox based attacks are roughly the same
and for practical purposes, memory costs much more than time. Of course,
when looking for collisions involving functions, we can use the techniques of
Chapter 7, however, their range of applicability is limited. In this chapter,
we study other speci¬c instances of the birthday-based attacks for which the
techniques of Chapter 7 cannot be applied, but which, nonetheless can be
tackled using less memory than the generic methods of Chapter 6. These
techniques presented here reduce the required amount of memory, without
signi¬cantly increasing the running times. In fact, even when there is enough
memory available to perform a generic birthday attack, using these variations
may improve performance, due to cache e¬ects.

8.1 Introductory example: Subset sum problems
To serve as an introduction to the techniques presented in this chapter, we
present an algorithm of Schroeppel and Shamir [SS81] ¬rst presented in 1979.
This example studies the problem of ¬nding all the solutions to a given subset
sum problem. We ¬rst recall the de¬nition of a subset sum problem.

DEFINITION 8.1 A subset sum problem consists, given n positive inte-
gers xi and a target sum s, in ¬nding all the solutions to the equation:
ei xi = s, (8.1)

where the ei values are integers equal to either 0 or 1.
The subset sum problem also has an associated decision problem where the
goal is to say whether a solution exists or not.

© 2009 by Taylor and Francis Group, LLC
252 Algorithmic Cryptanalysis

The subset sum problem is also called the knapsack problem.

8.1.1 Preliminaries
Clearly, the subset sum problem can be broken by brute force. It su¬ces
to compute the 2n di¬erent sums. This requires essentially no memory and
has a running time of 2n evaluations of a sum of n terms. One can do better
with a simple application of the birthday paradox: ¬rst split the x values into
two sets of roughly equal sizes, compute L1 the list of all possible sums for
the ¬rst set and L2 the list obtained by subtracting from s all the possible
sums for the second set. Any collision between L1 and L2 corresponds to an
n/2 n
ei xi = s ’ ei xi , (8.2)
i=1 i= n/2 +1

thus yielding a solution to the knapsack problem. As usual, this birthday-
based attack has a runtime O(n2n/2 ) and require to store 2 n/2 partial sums
in memory. For values of n where the brute force attack is at the edge of
possible computations, say n = 80 it can be argued whether the birthday-
based attack is better than the brute force attack. In practice, it is easier to
perform 280 computations in a distributed e¬ort on many small computers or
to ¬nd a large computer with enough memory to store 240 numbers?
With the results of Chapter 7 in mind, it is natural to try solving such
a knapsack problem using a cycle ¬nding algorithm. At ¬rst, this approach
seems reasonable. Indeed, assuming for simplicity that n is even, we can
easily de¬ne a function F on n/2 + 1 bits that maps values starting by 0 to
a sum of the ¬rst n/2 numbers in the knapsack as in the left-hand side of
Equation (8.2). The same function would map values starting by a 1 to the
target value s minus a sum of the last n/2 numbers, as in the right-hand side
of the same equation. Clearly, a random collision on F yields a solution to the
knapsack problem whenever the colliding values di¬er on their ¬rst bits, which
should heuristically occur with probability roughly 1/2. As a consequence, it
is natural to try using cycle ¬nding algorithms to obtain a collision on F .
However, in many cryptographic applications of knapsacks, the values of the
knapsack elements are large numbers, of n bits or more. Otherwise, given
a possible value for the sum s there would be a huge number of solutions.
As a consequence, F is an expanding function which maps n/2 + 1 bits to
approximately n bits. For this reason, it cannot directly be used repeatedly
to construct an ultimately periodic sequence. Instead, we need to use some
function G to truncate the n bits back to n/2 + 1. Of course, from a starting
point y0 we can easily de¬ne a sequence, using the recursion formula:

yi+1 = G(F (yi )). (8.3)

© 2009 by Taylor and Francis Group, LLC
Birthday attacks through quadrisection 253

Clearly, this sequence usually leads to a cycle which can be found using a
cycle detection algorithm. Yet, this cycle does not usually produce any real
collision and cannot yield a knapsack solution.
The reason for this phenomenon is that by writing down the sequence y,
we also implicitly de¬ned another sequence z by:

z0 = F (y0 ) and zi+1 = F (G(zi )). (8.4)

This implicit sequence z also yields a cycle. However, z values are short
numbers on n/2 + 1 bits, while y values are long numbers on n bits. As a con-
sequence, with overwhelming probability the sequences enter a cycle through
a z-collision and not through a y-collision. This means that we eventually ¬nd
an initial collision zi = zj which de¬nes the entrance of the cycle. This colli-
sion of course leads to a collision on the subsequent y values, i.e., yi+1 = yj+1 .
However, both values result from the same sum expression and thus cannot
o¬er a solution to the knapsack problem. It is worth noting that for knapsack
problems based on small knapsack elements, G would no longer be necessary
and this approach would work.

8.1.2 The algorithm of Shamir and Schroeppel
To overcome this di¬culty, Shamir and Schroeppel devised an algorithm
which uses less memory than the birthday attack but still requires some non-
negligible amount of memory. Before presenting the algorithm of Shamir and
Schroeppel, it is useful to ¬rst remember from Section 6.3 that the birthday
attack can be implemented in two di¬erent ways. The ¬rst possibility is to
store a sorted copy of L1 , then to generate on the ¬‚y elements of L2 and
search them by dichotomy in L1 . The second possibility is to sort both L1
and L2 then to read both lists sequentially, advancing in turns either in L1 or
in L2 , until collisions are found. In the generic setting, the second possibility
is the more costly in terms of memory, because two lists1 need to be stored.
However, in order to implement this option, we do not really need to store L1
and L2 , instead we only need to be able to go through L1 (and L2 ) step by step
in increasing order. The key idea introduced by Shamir and Schroeppel shows
that for subset sum problems, we can produce the elements of L1 in increasing
order without fully storing L1 . We also need to produce the elements of L2 in
increasing order. Since L2 is constructed by subtracting a partial subset sum
from s, it means that we need to produce the corresponding partial sums in
decreasing order. This can be achieved through a straightforward adaptation
of the method used with L1 .
Assume for now than we are given a subset of the knapsack elements and
denote them by y1 , . . . , yl . We need a memory e¬cient subroutine to produce

1 Moreover, when n is odd, the second list is twice as long as the ¬rst.

© 2009 by Taylor and Francis Group, LLC
254 Algorithmic Cryptanalysis

all the sums for this subset in either increasing or decreasing order. For this,
we ¬rst split l roughly in half, letting l = l1 + l2 and construct two sets:

ei yi | ∀ (e1 , · · · , el1 ) ∈ {0, 1}l1
Y1 = and (8.5)
ei yi | ∀ (el1 +1 , · · · , el ) ∈ {0, 1}l2
Y2 = .
i=l1 +1

Any knapsack element can be uniquely written as the sum of an element from
Y1 and an element from Y2 .
To solve our problem we ¬rst ask the following question: “Given two such
sums, σ = γ1 + γ2 and σ = γ1 + γ2 what can we say about the relative orders
of σ and σ from the relative orders between the γ values?” Clearly, if γ1 ≥ γ1
and γ2 ≥ γ2 then σ ≥ σ . Similarly, when both γs are smaller, then σ is
smaller. However, when if γ1 ≥ γ1 and γ2 ¤ γ2 , prediction is not easy.
To make good use of this partial information, we ¬rst sort Y2 , then for each
number in Y1 , we add it to the ¬rst element of Y2 and memorize the sum
together with its decomposition. For e¬ciency, we memorize these sums in
a binary search tree. This allows us to retrieve these sums in sorted order.
After this initialization, we proceed as follows, at each step take the smallest
sum from the tree and produce it as the next element of Y1 + Y2 . After that,
we look at the decomposition, update the sum and put it back in the tree.
The update is done by keeping the same number from Y1 and by moving to
the next number in Y2 (which is easy, since Y2 is sorted). After the update,
we put the new value back into the tree, unless the end of Y2 was reached.
Using a self-balancing tree, since the size of tree is the size of Y1 , the cost of
insertion and deletion is guaranteed to be O(log |Y1 |).
To generate the values in decreasing order, two minor changes2 are required,
start from the end of Y2 instead of the beginning and take the last tree element
at each step instead of the ¬rst one. In both versions of the algorithm, each
element of Y1 is paired with each element of Y2 and thus the set Y1 + Y2
is completely constructed. Putting together one copy of algorithm going in
increasing order for Y1 + Y2 with another copy going in decreasing order for
Y3 + Y4 (and thus in increasing order for s ’ Y3 ’ Y4 ) allows us to ¬nd all
occurrences of s in Y1 + Y2 + Y3 + Y4 . From a complexity point-of-view, this
requires a running time O(n2n/2 ) and a memory of O(2n/4 ) integers. The
running time is the same as the running time of the generic birthday attack,
but the required memory is much lower. Note that the constants implicit in
the O notations depend on the value of n modulo 4. The best case occurs
when n is a multiple of 4.

2 Alternatively,we can simply use the exact same algorithm as before, using the reversed
order instead of the natural order on integers.

© 2009 by Taylor and Francis Group, LLC
Birthday attacks through quadrisection 255

Algorithm 8.1 Initialization of Shamir and Schroeppel algorithm
Require: Input knapsack elements y1 , . . . , y
Let 1 ←’ /2 and 2 ←’ ’ 1
Construct an array Y2 containing the 2 2 possible sums of y 1 +1 , . . . , y .
Sort the array Y2
Create an empty AVL-tree T and a set a 2 1 free nodes for this tree structure
for (e1 , · · · , e 1 ) in {0, 1} 1 do
Compute Y1 ←’ i=1 ei yi 1

Take the next free node N
Let Nvalue ←’ Y1 , Nindex ←’ 0 and NsortKey ←’ Y1 + Y2 [0] {Insert
into the ¬elds of N the value Y1 , the index 0 and the sorting key Y1 +Y2 [0]}
Insert N into T {according to the value of the sorting key}
end for
Output T and Y2 for use in Algorithm 8.2

Algorithm 8.2 Get next knapsack sum with Shamir and Schroeppel algo-
Require: Input array Y2 and current state of tree T
If T is empty, Output ˜Finished™
Remove the smallest node of T , let N be this node
Let S ←’ NsortKey
Let Nindex ←’ Nindex + 1
if Nindex < 2 2 then
Let NsortKey ←’ Nvalue + Y2 [Nindex ]
Insert modi¬ed node N back into the tree T
end if
Output sum S and updated tree T

© 2009 by Taylor and Francis Group, LLC
256 Algorithmic Cryptanalysis

Clearly, the algorithm of Shamir and Schroeppel does not make use of the
strong internal structure of the list Y1 , Y2 , Y3 and Y4 which are built by
adding together knapsack elements. However, taking advantage of this extra
structure to either speed up the search for a knapsack solution or further
reduce the required memory is a nice open problem. Not using the internal
structure of Y1 , Y2 , Y3 and Y4 also means that the algorithm of Shamir and
Schroeppel has a large range of applicability. We study it in the sequel of
this chapter together with other algorithms which present the same kind of
tradeo¬ between time and memory.

8.2 General setting for reduced memory birthday at-
The idea of splitting a problem in four parts instead of two for the usual
birthday attacks, may be used in a variety of applications. In the general
setting, we are given four sets L1 , L2 , L3 and L4 of elements from a group G
with group operation , together with a target group value g. The problem
is then to ¬nd all solutions (g1 , g2 , g3 , g4 ) ∈ L1 — L2 — L3 — L4 of the equation:
g1 g2 g3 g4 = g. (8.6)
Assuming that the sets have sizes N1 , N2 , N3 and N4 of the same order
of magnitude, a generic birthday-based attack exists for this problem. This
generic attack ¬rst constructs the two sets:
L = {g1 g2 | ∀ (g1 , g2 ) ∈ L1 — L2 } and (8.7)
g4 )’1 | ∀ (g3 , g4 ) ∈ L3 — L4 ,
L= g (g3
where h’1 denotes the inverse of h in the group G. Then the attack searches
for collisions between L and L . Each collision yields a solution of Equa-
tion (8.6).
The size of L is N1 N2 and the size of L is N3 N4 . Assuming without loss
of generality that N1 N2 ¤ N3 N4 , the time complexity of the generic attack
is O(max(N1 N2 log(N1 N2 ), N3 N4 )) and the memory complexity measured in
number of group elements is N1 N2 . The knapsack example from Section 8.1
shows that for the additive group of integers (Z, +), the memory complexity
can be lowered without increasing the time complexity. More generally, we
can ask for a list of groups where this improvement can be achieved. In
the sequel, we give a list of speci¬c examples where the improvement can be
achieved, followed by a more general treatment and examples of groups where
no known method exists.
Note that without loss of generality, we can always consider that the target
value g is the group identity element. Indeed composing Equation (8.6) with

© 2009 by Taylor and Francis Group, LLC
Birthday attacks through quadrisection 257

g ’1 under the group operation, we ¬nd:

g ’1 ) = 1G .
g1 g2 g3 (g4 (8.8)

g ’1 to
Thus it su¬ces to transform L4 by replacing each element g4 by g4
make g disappear from Equation (8.6).

8.2.1 Xoring bit strings
A simple and natural group to consider in computer science and cryptogra-
phy is the group of n-bit strings, together with the group operation obtained
by bitwise xoring strings. Mathematically, this group can be viewed in many
di¬erent ways, the simplest is to consider it as the n-th fold direct product
Gn where G2 is {0, 1} with addition modulo 2. Alternatively, adding some
more structure, it can also be viewed as the additive group of the ¬nite ¬eld
F2n .
In this group, our generic problem reduces itself to ¬nd all quadruples of
values whose XOR is equal to the speci¬ed target. As usual, we can assume
without loss of generality that the speci¬ed target is the group identity ele-
ment, i.e., the string 0n .
Clearly, with this group, the approach used by Shamir and Schroeppel for
subset sum problems cannot work. Indeed, this approach heavily relies on the
existence of an order ¤ on Z , compatible with the group operation. In Gn , no
such order exists. Instead, we need to use a very di¬erent method introduced
in [CJM02].
The basic idea of the method starts by selecting a subset of t bits among
the n available bits. For simplicity, we choose the t high bits of the n-bit
numbers that encode elements of Gn . For each value x in Gn , we denote by
2 2
[x]t the restriction of x to its t high bits. This restriction is compatible with
the XOR operation, more precisely we have:

[x • y]t = [x]t • [y]t . (8.9)

Thus for any solution (x1 , x2 , x3 , x4 ) of Equation (8.6) rewritten in Gn as:

x1 • x2 • x3 • x4 = 0n (8.10)

we necessarily have:
[x1 • x2 ]t = [x3 • x4 ]t . (8.11)
In other words, the value of the t high bits is the same when xoring the leftmost
elements x1 and x2 , or when xoring the rightmost elements x3 and x4 . Of
course, we do not know in advance this middle value of the t high bits; however,
we know for sure that it is shared between both sides. As a consequence,
denoting by Mt this middle value, we can enumerate all possibilities for Mt and
for each possibility create the list LMt of all pairs (x1 , x2 ) with [x1 •x2 ]t = Mt

© 2009 by Taylor and Francis Group, LLC
258 Algorithmic Cryptanalysis

and the list LMt of pairs (x3 , x4 ) with [x3 • x4 ]t = Mt . Any collision between
these lists for any value of Mt yields a solution of Equation (8.10).
For a ¬xed value of Mt , the expected number of pairs in the ¬rst of these
two lists is N1 N2 · 2’t and the expected numbers of pairs in the second list is
N3 N4 · 2’t . For typical applications, we have N1 ≈ N2 ≈ N3 ≈ N4 ≈ 2n/4 .
Thus, to keep the memory requirements around O(2n/4 ) it is natural to choose
t ≈ n/4. Indeed, with this choice the two middle lists are of approximate size
2n/4 · 2n/4 · 2’t ≈ 2n/4 . Moreover, for each value of Mt ¬nding collisions
between these lists costs O((n/2 ’ t) 2n/2’t ), i.e., O(n 2n/4 ) for our choice of
t. Thus, taking into account the loop on the 2t possibilities for Mt , the total
time for ¬nding collisions is O(n 2n/2 ).
To make sure that the approach is sound, we need to make sure that LMt
and LMt can be constructed e¬ciently. Assuming that L2 is already sorted
with respect to the value of the t high bits of each element, all solutions of
[x1 • x2 ]t = Mt can be found in time O(N1 log N2 ) using a simple variation
of the collision search method with a sorted list. For each element x1 of
L1 , compute [x1 ]t • Mt and search this value by dichotomy in the high bits
of L2 . When N1 ≈ N2 ≈ N3 ≈ N4 ≈ 2n/4 and t ≈ n/4 the time and
memory complexity for building all lists LMt and LMt in turn are O(n2n/2 )
and O(2n/4 ). Note that it is essential to remember that we only need to
store the lists LMt and LMt corresponding to the current value of Mt . The
corresponding pseudo-code for this approach is given as Algorithm 8.3 or
alternatively as Algorithm 8.4
This method is quite di¬erent from the method of Shamir and Schroeppel
but it gives similar performances. Moreover, it removes the need to deal with
balanced trees altogether and, thus, it is simpler to implement.

8.2.2 Generalization to di¬erent groups
In the general setting, the group G from which elements of our four lists are
drawn is an arbitrary group. For now, we assume that this group is abelian.
In that case, group classi¬cation tells us that G is isomorphic to an additive
Z/n1 Z — Z/n2 Z — · · · — Z/nt Z,
where each ni divides ni’1 . As a consequence, it is natural to ¬rst consider
the case where G has this speci¬c form. In the previous section, we dealt with
one extreme example of this case: (Z/2Z)n .
Here, the size of the group G is N , the product of the ni values. As usual, we
expect a solution when the size of the lists is of the order of N 1/4 . Depending
on the distribution of the ni values, several approaches are possible. The ¬rst
approach works when some partial product of the ni values is of the order of
N 1/4 . In that case, we can write G as a direct product G1 — G2 by grouping
all the contributions to this partial product in G1 and the rest in G2 . In this
direct product, G1 is of size around N 1/4 and G2 of size around N 3/4 . Due

© 2009 by Taylor and Francis Group, LLC
Birthday attacks through quadrisection 259

Algorithm 8.3 Generating all solutions to Equation (8.10)
Require: Input four lists L1 , L2 , L3 , L4 of ≈ 2n/4 n-bit values each
Require: Parameter t
Sort L2 and L4
Allocate arrays L and L of size 2n/2’t plus some margin
for Mt from 0 to 2t ’ 1 do
Reinitialize L and L to empty arrays
for all x1 ∈ L1 do
Search ¬rst occurrence x2 of ([x1 ]t • Mt ) in L2 high bits
while [x2 ]t = [x1 ]t • Mt do
Add x1 • x2 to list L {Keeping track of x1 and x2 values}
Skip to next x2 in L2
end while
end for
for all x3 ∈ L3 do
Search ¬rst occurrence x4 of [x3 ]t • Mt in L4 high bits
while [x4 ]t = [x3 ]t • Mt do
Add x3 • x4 to list L {Keeping track of x3 and x4 values}
Skip to next x4 in L4
end while
end for
Sort L and L
Find and output all collisions between L and L as quadruples
(x1 , x2 , x3 , x4 ).
end for

© 2009 by Taylor and Francis Group, LLC
260 Algorithmic Cryptanalysis

Algorithm 8.4 Alternative option to Algorithm 8.3
Require: Input four lists L1 , L2 , L3 , L4 of ≈ 2n/4 n-bit values each
Require: Parameter t
Sort L2 and L4
Allocate arrays L and L of size 2n/2’t plus some margin
for Mt from 0 to 2t ’ 1 do
Reinitialize L and L to empty arrays
for all x1 ∈ L1 do
Search ¬rst occurrence x2 of ([x1 ]t • Mt ) in L2 high bits
while [x2 ]t = [x1 ]t • Mt do
Add x1 • x2 to list L {Keeping track of x1 and x2 values}
Skip to next x2 in L2
end while
end for
Sort L
for all x3 ∈ L3 do
Search ¬rst occurrence x4 of [x3 ]t • Mt in L4 high bits
while [x4 ]t = [x3 ]t • Mt do
Search x3 • x4 by dichotomy in L
if x3 • x4 found then
Output collision as a quadruple (x1 , x2 , x3 , x4 ).
end if
Skip to next x4 in L4
end while
end for
end for

© 2009 by Taylor and Francis Group, LLC
Birthday attacks through quadrisection 261

to this repartition, a straightforward adaptation of Algorithm 8.3 gives the
desired result. It su¬ces to take as middle value Mt the projection of group
elements on the ¬rst coordinate G1 . Then, for each element x1 of the ¬rst list
L1 , with ¬rst coordinate [x1 ]G1 it is easy to ¬nd all elements x2 from L2 with
a G1 -coordinate that satis¬es:

[x1 ]G1 + [x2 ]G1 = Mt . (8.12)

Similarly, we ¬nd pairs (x3 , x4 ) with:

[x3 ]G1 + [x4 ]G1 = ’Mt . (8.13)

To ¬nalize the computation, it su¬ces to search for collisions between G2
coordinates of the values of x1 + x2 and ’(x3 + x4 ) occurring in the middle
When it is not possible to write G as a direct product with a correct size
repartition, the four-list problem needs a di¬erent approach. To illustrate
this, let us consider G = Z/pZ for a prime p. Here, G cannot be written as a
proper direct product. Instead, a simple alternative is to lift the problem to
Z and remark that for any solution:

x1 + x2 + x3 + x4 = 0 (mod p), (8.14)

the integer x1 + x2 + x3 + x4 is either 0, p, 2p or 3p, assuming that each
representative xi is chosen in [0, p ’ 1]. Moreover, the value 0 can only be
reached when x1 = x2 = x3 = x4 = 0. As a consequence, over Z/pZ we can
use three consecutive applications of the approach of Shamir and Schroeppel
described in Section 8.1 and solve the four-list problem without using too
much memory.
The same approach is easily generalized to a direct product with a bad
repartition. In that case, we write G = Z/N1 Z — G2 where N1 is an integer
greater than the size of G to the power 3/4 and G2 is the remaining part
of the group structure. Denoting elements of G as pairs (x, y) and using
the approach of Shamir and Schroeppel, it is easy to create the sublist of
values x1 + x2 in increasing order over the integers and the sublist of x3 + x4
in decreasing values. From these lists, we extract partial solutions with x-
coordinates summing up to either N1 , 2N1 or 3N1 . Checking that in addition
the y coordinates also sum to 0 modulo N2 is a simple matter. Badly presented groups
When the group G is not directly given as in additive form as

Z/n1 Z — Z/n2 Z — · · · — Z/nt Z,

it is not completely clear whether and how a reduced memory birthday para-
dox approach can be followed. Once again, several cases need to be consid-
ered. If the cardinality of G is known and if its factorization is accessible, the

© 2009 by Taylor and Francis Group, LLC


. 9
( 18)