
the cycle and obtain a real collision. In that case, we can of course use Algorithm 7.3; however, this algorithm can be improved by using the information already collected by Nivasch's algorithm. Indeed, let us consider the state of the stack in Algorithm 7.4 at the point where the collision is detected. At that point in time i, we know that the current value x is equal to a value on the stack, S_k = (x, j). Clearly, thanks to the principle of Nivasch's algorithm, x is the minimal value of the whole cycle. As a consequence, unless k = 0, we know that S_{k-1} corresponds to a value outside of the cycle. If S_{k-1} = (X_t, t),

Ā© 2009 by Taylor and Francis Group, LLC

Birthday-based algorithms for functions 231

this value X_t can be used as a replacement for the start-point X_0 in Algorithm 7.3. When k = 0, we use the original start-point X_0. On average, this allows us to recover the start of the cycle a bit faster.
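This improved start recovery can be sketched in a few lines of Python. This is a minimal sketch under our own naming; the stack algorithm follows the description of Algorithm 7.4 above, and cycle_entry plays the role of Algorithm 7.3.

```python
def nivasch(f, x0):
    """Nivasch-style stack algorithm: returns (ell, xt) where ell is the
    cycle length of the sequence x0, f(x0), ... and xt is a start point
    for the cycle-entry search (the value of S_{k-1} when k > 0, else x0)."""
    stack = []                 # pairs (value, index), values increase bottom-up
    x, i = x0, 0
    while True:
        while stack and stack[-1][0] > x:
            stack.pop()
        if stack and stack[-1][0] == x:
            ell = i - stack[-1][1]        # x is the cycle minimum, seen twice
            # unless the match is at the bottom of the stack, the entry
            # below it lies outside the cycle and is a better start point
            xt = stack[-2][0] if len(stack) >= 2 else x0
            return ell, xt
        stack.append((x, i))
        x, i = f(x), i + 1

def cycle_entry(f, ell, start):
    """Entry point of the cycle, as in Algorithm 7.3: advance one pointer
    ell steps ahead, then walk both pointers together until they meet."""
    ahead = start
    for _ in range(ell):
        ahead = f(ahead)
    while start != ahead:
        start, ahead = f(start), f(ahead)
    return start

# toy usage on a quadratic map modulo a small number
f = lambda x: (x * x + 1) % 1007
ell, xt = nivasch(f, 2)
entry = cycle_entry(f, ell, xt)
```

Starting the entry search from xt rather than from X_0 is exactly the saving described above: the walk begins closer to the cycle.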

7.2 Analysis of random functions

The good behavior of Floyd's algorithm and Brent's algorithm depends on the specific properties of the function F used to define the considered recursive sequences. We already know that some functions, such as F(X) = X + 1 (mod 2^n), exhibit extremely bad behavior. In this section, we present background information about the case of random functions, also called random mappings. A very nice survey of this topic is Random Mapping Statistics [FO90] by Flajolet and Odlyzko. It derives a large number of useful facts and statistics about random mappings and their cryptographic applications.

All the facts stated in this section are extracted from [FO90] and apply to the graph of a random function on a set of N elements. They are asymptotic results and give asymptotic equivalents for the various parameters as N tends to infinity.

Each random function F on the set S_N of N elements is also viewed as a directed graph G_F whose vertices are the points of S_N and whose directed edges are of the form (v, F(v)), for all vertices v in S_N. Each vertex in the graph has out-degree 1; the in-degree is variable and can be zero or arbitrarily large. Of course, the average in-degree is equal to the average out-degree and thus to 1.

7.2.1 Global properties

Connected components. Asymptotically, the number of connected components in the graph G_F is (log N)/2.

We recall that each connected component is obtained by grouping together all the points that can be reached from a start-point by following the edges of G_F, either in the forward or the backward direction.

Terminal nodes. A terminal node in G_F is a vertex v of in-degree 0. Equivalently, it is an element v of S_N not in the image of the function F. Asymptotically, the number of terminal nodes is N/e, where e = exp(1) is the base of natural logarithms.

Image points. An image point is a point in the image of F; asymptotically, there are (1 - 1/e)N image points in G_F.


232 Algorithmic Cryptanalysis

Iterate image points. A point v of S_N is a k-th iterate image point if and only if there exists an element x of S_N such that v = F^(k)(x), where F^(k) denotes k consecutive applications of F. Note that a k-th iterate image point also is a (k - 1)-th iterate image point. Asymptotically, the number of k-th iterate image points is (1 - τ_k)N, where τ_k is defined by the recursion law:

τ_0 = 0 and τ_{k+1} = e^{τ_k - 1}.    (7.6)
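A few lines of Python make recursion (7.6) concrete; the function name tau is ours.

```python
import math

def tau(k):
    """tau_k from recursion (7.6): tau_0 = 0, tau_{k+1} = exp(tau_k - 1)."""
    t = 0.0
    for _ in range(k):
        t = math.exp(t - 1.0)
    return t

# k = 1 recovers the image-point fraction 1 - 1/e stated above
print(1 - tau(1))             # fraction of (first) image points
for k in range(5):
    print(k, 1 - tau(k))      # fraction of k-th iterate image points
```

As k grows, τ_k increases towards the fixed point τ = 1 of t = e^{t-1}, so the fraction of k-th iterate image points slowly tends to 0.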

Cyclic nodes. A cyclic node v of G_F is a node that belongs to some cycle of G_F, i.e., v can be obtained as a k-th iterate image of itself for some value of k. Asymptotically, the average number of cyclic nodes in a random mapping is √(πN/2).

7.2.2 Local properties

In contrast to global properties, which only depend on the graph G_F itself, we now consider local properties obtained when looking at G_F from some random start-point. These local properties are extremely important for us, since they mostly dictate the behavior of cycle finding algorithms.

Tail and cycle lengths. As before, the tail length is the distance from the start-point to the cycle obtained when iterating F. Asymptotically, for a random mapping and a random start-point, the average tail length is √(πN/8). In the same conditions, the average cycle length is equal to the average tail length and also is √(πN/8). This implies that on average, the total length of the sequence obtained by recursively applying F to the start-point up to its first repetition is √(πN/2).

Component size and tree size. On average, the size of the connected component containing the start-point is 2N/3. Similarly, if we restrict ourselves to the non-cyclic part of this component and take the tree containing all points which enter the cycle at the same place as our start-point, then the average size of this tree is N/3.

Note that these results imply that a large fraction of the points in the graph

GF are grouped in a single connected component. This component is often

called the giant component of the random mapping.

Predecessor size. Another question we can ask is the number of points which are iterated preimages of our chosen start-point. On average, the number of such predecessors is √(πN/8).
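These averages are easy to check empirically. The sketch below, with parameters of our own choosing (N = 1024, 200 trials), samples random mappings and compares the measured averages with the asymptotic formulas; since the constants are asymptotic, only rough agreement is expected at such a small N.

```python
import math, random

random.seed(1)
N, trials = 1024, 200
tails, cycles, terminals = [], [], []
for _ in range(trials):
    F = [random.randrange(N) for _ in range(N)]    # a random mapping on N points
    terminals.append(N - len(set(F)))              # points with no preimage
    x, seen, i = random.randrange(N), {}, 0
    while x not in seen:                           # iterate to first repetition
        seen[x] = i
        x, i = F[x], i + 1
    tails.append(seen[x])                          # tail length s
    cycles.append(i - seen[x])                     # cycle length ell

avg = lambda v: sum(v) / len(v)
print("terminal nodes:", avg(terminals), "expected ~", N / math.e)
print("tail length   :", avg(tails), "expected ~", math.sqrt(math.pi * N / 8))
print("cycle length  :", avg(cycles), "expected ~", math.sqrt(math.pi * N / 8))
```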

7.2.3 Extremal properties

Another important class of properties within the graph of random mappings

is the class of extremal properties.


Longest paths. The first extremal property we would like to determine is the behavior of a random mapping when choosing the worst possible starting point. How long do cycle finding algorithms take in this worst case? The good news is that the expected lengths of the longest tail, longest cycle and longest path (including both tail and cycle) are of the same order of magnitude O(√N) as in the average case. Only the constants differ.

More precisely, asymptotically the expected length of the longest cycle is c_1 √N with c_1 ≈ 0.782. The expected length of the longest tail is c_2 √N with c_2 = √(2π log 2) ≈ 1.737. Finally, the expected length of the longest path is c_3 √N with c_3 ≈ 2.415. It is remarked in [FO90] that, quite interestingly, c_3 < c_1 + c_2. As a consequence, for a non-negligible fraction of random mappings, the longest tail does not lead to the longest cycle.

Giant component. Other noteworthy properties are the average size of the giant component and of the largest tree in the graph of a random mapping. Due to the results on the average component size and average tree size for a random start-point, we already know that these sizes are respectively larger than 2N/3 and N/3. Asymptotically, the giant component is of size d_1 N and the largest tree of size d_2 N, with d_1 ≈ 0.758 and d_2 ≈ 0.48.

7.3 Number-theoretic applications

7.3.1 Pollard's Rho factoring algorithm

Pollard's Rho is a very practical factoring algorithm introduced in 1975 by Pollard [Pol75]. This algorithm aims at finding relatively small factors of an integer N. It heavily uses the algorithms of Floyd or Brent in order to find cycles in sequences defined by a recursion formula. With Pollard's Rho, the sequence that we consider needs some additional property. More precisely, if we denote this sequence by X, we require that for any prime factor p of N, the sequence X (mod p) is also defined by a recursion formula. This limits the possible choices for the function F used to compute the sequence. To ensure this property and have correctly defined sequences modulo each prime factor, a good approach is to choose for F a polynomial. In fact, the most common choice is to use:

F(x) = x^2 + c (mod N),    (7.7)

for some constant c.

This may seem surprising because the analysis of sequences defined by a recursion formula is done under the assumption that F is a random function, which x^2 + c is not. However, Pollard's Rho algorithm is used routinely and works extremely well. So despite the apparent inconsistency, this choice of


F is a good one. A partial explanation is given in [FO90]. More precisely, they remark that the graph of F(x) = x^2 + c is very specific: each node but one (the value c) has either 0 or 2 predecessors. As a consequence, they also analyze random binary function graphs, where each node has in-degree 0 or 2, and find that the multiplicative constants appearing in the graph statistics of the previous section are affected by this change but that their orders of magnitude are preserved.

Once we choose for F a polynomial, we can easily check that for the sequence defined by:

X_{i+1} = F(X_i) (mod N),    (7.8)

reduction modulo p for any factor p of N gives the same recursion formula (modulo p instead of modulo N) for the reduced sequence.

For a prime divisor p of N, let X^(p) denote the sequence X (mod p). We know that X^(p) satisfies the recursion:

X^(p)_{i+1} = F(X^(p)_i) (mod p).    (7.9)

Using the usual analysis for these sequences, we expect that X^(p) cycles with parameters s and ℓ of the order of √p. In that case, Floyd's or Brent's algorithm finds a cycle in time O(√p), assuming that we can test for collisions within the sequence X^(p). To see that, it suffices to remark that:

X^(p)_i = X^(p)_j  ⟹  GCD(X_i - X_j, N) ≠ 1.    (7.10)

Thus, we can efficiently test for collisions using GCD computations. Moreover, when a collision occurs modulo p, there is no special reason to have collisions modulo the other factors of N. As a consequence, as soon as a collision occurs, we usually recover a proper factor of N. Pollard's Rho using Brent's collision finding algorithm is given as Algorithm 7.5.

Of course, Pollard's Rho can also be used in conjunction with Floyd's algorithm. In both cases, one costly part of the method is that each equality test requires a GCD computation and, thus, becomes expensive. In practice, this number of GCD computations can be lowered. The idea is to multiply together several consecutive values of the form X_i - X_j corresponding to several equality tests modulo p. Then, we compute a single GCD of the resulting product with N. If this GCD is 1, all the equality tests fail. If the GCD is a proper factor, we are done. However, if the GCD is N, we need to backtrack and redo the equality tests one at a time, hoping for a proper factor of N. The reason behind the need to backtrack is that by grouping several tests together, we increase the probability of having simultaneous collisions for the various factors of N.
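The batching trick with backtracking can be sketched as follows; the function name is ours, and in a real implementation the differences would be accumulated on the fly rather than collected in a list.

```python
from math import gcd

def batched_test(diffs, N):
    """Run a batch of equality tests with a single GCD.
    diffs holds the values X_i - X_j (mod N) for the batch.
    Returns 1 (all tests fail), a proper factor of N, or N
    (simultaneous collisions for all prime factors of N)."""
    prod = 1
    for d in diffs:
        prod = (prod * d) % N
    g = gcd(prod, N)
    if g != N:
        return g                   # 1 or a proper factor
    # backtrack: redo the tests one at a time, hoping for a proper factor
    for d in diffs:
        g = gcd(d, N)
        if 1 < g < N:
            return g
    return N

# toy usage with N = 91 = 7 * 13
assert batched_test([3, 14, 5], 91) == 7     # 14 is divisible by 7
assert batched_test([3, 4, 5], 91) == 1      # no test succeeds
```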

Pollard's Rho is a very interesting application of cycle finding algorithms, which calls for several specific comments. The first comment is that with Pollard's Rho, there is no need to compute the parameters s and ℓ of the sequence X^(p). Instead, it suffices to find a collision between two values X^(p)_i and


Algorithm 7.5 Pollard's Rho factoring algorithm

Require: Input number to factor N, parameter c, max. iterations M
  Let x ← 0
  Let y ← x
  Let trap ← 0
  Let nexttrap ← 1
  for i from 1 to M do
    Let x ← x^2 + c mod N
    Let f ← GCD(x − y, N)
    if f ≠ 1 then
      if f = N then
        Output 'Fails: collision modulo N'
        Exit
      else
        Output 'Found f as factor of N'
        Exit
      end if
    end if
    if i = nexttrap then
      Let trap ← nexttrap
      Let nexttrap ← 2 · trap
      Let y ← x
    end if
  end for
  Output Failed


X^(p)_j which can be arbitrarily located. In particular, we do not need to use

Algorithm 7.3 to find the start of the cycle. A puzzling question concerning Pollard's Rho algorithm is whether we can devise a variation of this algorithm based on a traditional birthday-based algorithm. In other words, given two lists of numbers L_1 and L_2 with comparable sizes and an integer N, can we efficiently discover two numbers x_1 in L_1 and x_2 in L_2 such that GCD(x_2 - x_1, N) ≠ 1? This is a difficult problem which was solved by P. Montgomery in his thesis as a tool to improve elliptic curve factorization [Mon92].
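Going back to Algorithm 7.5, it can be transcribed directly in Python; this sketch keeps one GCD per iteration and leaves out the batching trick discussed earlier.

```python
from math import gcd

def pollard_rho(N, c=1, max_iters=1 << 20):
    """Pollard's Rho with Brent-style trap points, following Algorithm 7.5.
    Returns a proper factor of N, or None on failure (collision modulo N
    or iteration budget exhausted); one then retries with another c."""
    x, y = 0, 0
    next_trap = 1
    for i in range(1, max_iters + 1):
        x = (x * x + c) % N
        f = gcd(x - y, N)
        if f != 1:
            return None if f == N else f
        if i == next_trap:          # trap points at powers of two
            y = x
            next_trap *= 2
    return None

# toy usage
assert pollard_rho(91) in (7, 13)           # 91 = 7 * 13
```

Note that Python's math.gcd accepts the possibly negative difference x - y directly.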

7.3.2 Pollard's Rho discrete logarithm algorithm

As many factoring algorithms, Pollard's Rho can be adapted to compute discrete logarithms in arbitrary cyclic groups. Let G be a multiplicative group of prime order p (see the footnote below), let g be a generator of G and let h be an element of G. Solving the discrete logarithm problem corresponds to finding an integer α such that h = g^α. In order to apply a cycle finding algorithm, we choose a random looking function F on G. To construct the function F, we first partition G into three disjoint subsets G_1, G_2 and G_3, approximately of the same size, preferably with 1 not in G_1. Then, we define F as follows:

F(x) = x^2  if x ∈ G_1,
       gx   if x ∈ G_2,        (7.11)
       hx   if x ∈ G_3.

Then, we define the recursive sequence X by letting X_0 = 1. Note that if 1 ∈ G_1, 1 is a fixed point of F. Thus in that case, we would need to choose a different start-point. This is why it is preferable to have 1 ∉ G_1. As usual, we expect, after O(√p) steps, to detect a cycle in F and a collision X_i = X_j. By itself, this collision does not suffice to recover α and we need to compute some additional information about the sequence X.

In order to do this, let us consider the function ϕ from H = Z/pZ × Z/pZ to G defined by:

(a, b) ⟼ g^a h^b.    (7.12)

This function is a surjection onto G and each element of G has p distinct preimages in H. Moreover, given two distinct elements in H with the same image in G, we can compute α. Indeed, if ϕ(a, b) = ϕ(a′, b′) then g^{a + αb} = g^{a′ + αb′} and α = (a′ − a)/(b − b′) (mod p).

The next step is to lift the sequence X to H and to find a sequence Y of elements in H such that X = ϕ(Y). This can be done using the following

Footnote: For a general treatment about discrete logarithms and a justification for restricting ourselves to groups of prime order, see Chapter 6, Section 6.4.1.


function G from H to H:

G(a, b) = (2a, 2b)    if ϕ(a, b) ∈ G_1,
          (a + 1, b)  if ϕ(a, b) ∈ G_2,      (7.13)
          (a, b + 1)  if ϕ(a, b) ∈ G_3.

It is easy to verify that ϕ(G(a, b)) = F(ϕ(a, b)) for all elements (a, b) in H. As a consequence, the sequence Y defined by choosing Y_0 = (0, 0) together with the recursion law Y_{i+1} = G(Y_i) satisfies X = ϕ(Y).

Of course, Y is a sequence on a much larger set than X; as a consequence, we do not expect to have a collision in the sequence Y as early as in X. Thus, the collision returned for the sequence X by the cycle finding algorithm is usually not a collision for Y. As a consequence, we obtain two different preimages of the same element of G and can compute α.
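The whole method fits in a few lines of Python. The group, generator and secret below are toy choices of ours (the subgroup of squares modulo 107, of prime order 53), the partition of G is a crude x mod 3 rule, and the cycle is found with Brent-style trap points.

```python
P, p = 107, 53          # toy group: squares modulo 107, of prime order 53
g = 4
alpha_secret = 29       # the discrete logarithm we pretend not to know
h = pow(g, alpha_secret, P)

def step(x, a, b):
    """Apply F to the group element x and, in parallel, G to its lift (a, b)."""
    if x % 3 == 0:                              # x in G_1: squaring
        return (x * x) % P, (2 * a) % p, (2 * b) % p
    if x % 3 == 1:                              # x in G_2: multiply by g
        return (g * x) % P, (a + 1) % p, b
    return (h * x) % P, a, (b + 1) % p          # x in G_3: multiply by h

# Brent-style cycle finding on X, carrying Y = (a, b) along
x, a, b = 1, 0, 0                               # X_0 = 1 lifts to Y_0 = (0, 0)
tx, ta, tb = x, a, b                            # stored trap value
trap, i = 1, 0
while True:
    x, a, b = step(x, a, b)
    i += 1
    if x == tx and (a, b) != (ta, tb):          # collision in X but not in Y
        alpha = ((ta - a) * pow((b - tb) % p, -1, p)) % p
        break
    if i == 2 * trap:                           # move the trap point
        trap, (tx, ta, tb) = 2 * trap, (x, a, b)

assert pow(g, alpha, P) == h
```

The last line recovers α exactly as in the discussion of ϕ above: from g^{ta} h^{tb} = g^a h^b we get α = (ta − a)/(b − tb) (mod p).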

Note that if we have a collision on Y itself, the discrete logarithm can also be recovered for a slightly higher cost by finding the entrance of the cycle. The only exception occurs if Y collides on its initial value Y_0.

7.3.3 Pollard's kangaroos

Together with his Rho method, Pollard introduced another algorithm for computing discrete logarithms when some additional information is known. More precisely, he considers the problem g^x = h in G and wants to recover x from g and h, knowing in addition that x belongs to some interval [a, b]. In this context, using Pollard's Rho in the group G, without taking this extra information into account, is not always a good idea. In particular, when b - a is smaller than the square root of the size of G, it is more efficient to recover x by brute force, trying all values in [a, b]. To deal with this problem, Pollard proposes to use what he calls kangaroos. This method is also known as Pollard's lambda algorithm.

A kangaroo in Pollard's method is a sequence of elements in G whose logarithm increases by successive jumps. In order to apply the method, we need two kangaroos: a tame one, whose start-point is g^a, and a wild one, whose start-point is h. The first kangaroo is called tame because the logarithm of the corresponding sequence is always known. To precisely define the kangaroos, we need a jump function j that associates to each element of G a positive number upper bounded by √(b − a). A classical method is to divide G into k subsets of roughly equal sizes S_0, ..., S_{k−1}, with k = ⌊log_2(b − a)/2⌋, and for x in S_i to define j(x) = 2^i. Once the jump function j is fixed, we let F(x) = x × g^{j(x)}.

The tame kangaroo is the sequence defined by:

T_0 = g^a,                (7.14)
T_{i+1} = F(T_i).


The wild kangaroo is the sequence defined by:

W_0 = h,                  (7.15)
W_{i+1} = F(W_i).

While computing the tame kangaroo sequence, it is important to keep track of the discrete logarithm of T_i, i.e., of the value a + Σ_{k=0}^{i−1} j(T_k). As soon as the discrete logarithm goes beyond b, we abort the sequence T and recall the final position T_n, together with its discrete logarithm l_n, i.e., T_n = g^{l_n}. After this final point is known, we start the wild sequence, keeping track of the sum of the jumps taken by the wild kangaroo. We abort the wild sequence either when encountering T_n or when the sum of the jumps becomes larger than b − a. In the latter case, the algorithm fails. Otherwise, we have found a position W_i with corresponding sum s_i such that W_i = T_n; since W_i = h × g^{s_i} and T_n = g^{l_n}, we conclude that:

h = g^{l_n − s_i}.    (7.16)

Like Pollard's Rho, Pollard's kangaroo method is a generic algorithm. It has a good probability of success and takes time O(√(b − a)) to compute the discrete logarithm of h; for a rigorous analysis, see [MT09].
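A toy Python sketch of the two kangaroos follows. The group, interval, salt-based jump function and retry-on-failure strategy are all our illustrative choices; in particular, we let the wild kangaroo travel up to l_n − a (the most it could possibly need) rather than exactly b − a before giving up.

```python
P, g = 1019, 2              # toy group: integers modulo the prime 1019
a, b = 300, 500             # x is promised to lie in [a, b]
x_secret = 421
h = pow(g, x_secret, P)

def kangaroo(salt):
    k = max(1, (b - a).bit_length() // 2)
    jump = lambda y: 1 << ((y ^ salt) % k)       # jumps 1, 2, ..., 2^(k-1)
    # tame kangaroo: start at g^a, stop once its logarithm passes b
    t, log_t = pow(g, a, P), a
    while log_t <= b:
        s = jump(t)
        t, log_t = (t * pow(g, s, P)) % P, log_t + s
    # wild kangaroo: start at h, abort once it must have passed the tame one
    w, s = h, 0
    while s <= log_t - a:
        if w == t:
            return log_t - s                     # h * g^s = g^log_t
        j = jump(w)
        w, s = (w * pow(g, j, P)) % P, s + j
    return None                                  # failed: retry with a new salt

x = None
for salt in range(100):                          # re-randomize the jump set
    x = kangaroo(salt)
    if x is not None:
        break
assert pow(g, x, P) == h
```

A single run fails with constant probability, which is why the sketch retries with a fresh jump function until the two paths collide.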

7.4 A direct cryptographic application in the context of

blockwise security

Blockwise security is a practical concern about the security of modes of operation for block ciphers, which was discovered independently by two groups of researchers. In [BKN04], it is shown that the practical implementation of secure shell (SSH), which relies on the cipher block chaining (CBC) mode of operation, is insecure against a variation on chosen plaintext attacks. At Crypto 2002, in [JMV02], the theoretical idea of blockwise security was introduced. Moreover, in the same paper, it is shown that for several common modes of operation there is a gap between blockwise and ordinary messagewise security. After this discovery, several papers such as [FJMV04], [BT04] or [FJP04] studied this paradigm of blockwise security and came up with blockwise secure modes. The key result is that there exist modes of operation with blockwise security levels essentially equivalent to usual messagewise security levels. More precisely, these modes of operation are secure up to the birthday paradox bound and become insecure after that. In this section, after some brief reminders about the CBC mode of operation and its security both as a messagewise mode and as a blockwise mode, we study the behavior of this mode beyond the birthday paradox bound and show that in the blockwise


security model, we can use Brent's algorithm to devise attacks which work more efficiently than their messagewise counterparts.

7.4.1 Blockwise security of CBC encryption

In Section 6.1, we considered the security of CBC encryption viewed as a messagewise process, with the plaintext provided as a whole and the corresponding ciphertext returned as a whole. However, in practice, implementing this approach may become problematic, since the encryption box needs to be able to internally store whole messages, which can potentially be very long. For this reason, practitioners often use CBC encryption as a blockwise mode. In that case, the plaintext message is transmitted block by block, and each encrypted block is immediately returned. However, as noticed in [BKN04] and [JMV02], in this blockwise model, CBC encryption is no longer secure. Indeed, an active attacker may observe the current ciphertext block C^(i) before submitting the next block of plaintext P^(i+1). As a consequence, this allows him to choose at will the value v which enters E_K, by letting P^(i+1) = v ⊕ C^(i). This property can be used in several ways to show that CBC encryption is not secure as a blockwise mode of operation, for example by creating a simple distinguisher able to tell the difference between a CBC ciphertext and a random message of the same length. To create this distinguisher, it suffices to make sure that the same value v is encrypted twice in CBC mode. Thus, a CBC encrypted message reflects this equality. Of course, in a random message, the equality has no special reason to hold.

Interestingly, there is a simple countermeasure to fix this problem. It is called Delayed CBC encryption and is proven secure in [FMP03]. The basic idea behind Delayed CBC is very simple: once a ciphertext block C^(i) has been computed, instead of returning it immediately, the encryption device waits until it gets P^(i+1) and then returns C^(i). For the last encrypted block, the encryption device returns it upon reception of a special block indicating that the message is complete; this special block is not encrypted, it only serves a bookkeeping purpose. The rationale behind Delayed CBC is that the attacker can no longer control the block cipher inputs. Thus, Delayed CBC prevents the simple attack that breaks plain CBC.

7.4.2 CBC encryption beyond the birthday bound

Beyond the security bound of CBC encryption, the birthday paradox comes into play. As a consequence, among the ciphertext blocks, we expect to find at least one collision, say C^(i) = C^(j). If we now replace each block of ciphertext in this equality by its expression, we find that:

E_K(C^(i−1) ⊕ P^(i)) = E_K(C^(j−1) ⊕ P^(j)), and thus    (7.17)
C^(i−1) ⊕ P^(i) = C^(j−1) ⊕ P^(j), since E_K is a permutation.


We can rewrite this as P^(i) ⊕ P^(j) = C^(i−1) ⊕ C^(j−1). As a consequence, given one collision, we learn the XOR of two blocks of plaintext. As the length of the message grows, more and more collisions are found and thus more and more information is learned about the plaintext message.
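This leakage is easy to demonstrate with a toy 16-bit "block cipher" (a random bijection of our own making, standing in for E_K), so that a collision among a few thousand ciphertext blocks is overwhelmingly likely.

```python
import random

random.seed(7)
perm = list(range(1 << 16))
random.shuffle(perm)                      # toy 16-bit block cipher E_K
E = lambda v: perm[v]

iv = random.randrange(1 << 16)
plaintext = [random.randrange(1 << 16) for _ in range(2000)]
cipher, prev = [], iv
for blk in plaintext:                     # CBC: C_i = E_K(C_{i-1} xor P_i)
    prev = E(prev ^ blk)
    cipher.append(prev)

# find a collision C^(i) = C^(j) among the ciphertext blocks
seen, i = {}, None
for j, c in enumerate(cipher):
    if c in seen:
        i = seen[c]
        break
    seen[c] = j

# the collision leaks the XOR of the two corresponding plaintext blocks
c_before_i = cipher[i - 1] if i > 0 else iv
c_before_j = cipher[j - 1]
assert plaintext[i] ^ plaintext[j] == c_before_i ^ c_before_j
```

The final assertion is exactly the relation P^(i) ⊕ P^(j) = C^(i−1) ⊕ C^(j−1) derived above.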

For a cryptanalyst to use this attack, he must be able to efficiently detect collisions among the blocks of ciphertext. One very common technique is to proceed as in Chapter 6 and sort the ciphertext blocks in order to detect these collisions. However, this requires either a large amount of main memory or at least a fast auxiliary storage device. For 64-bit block ciphers, the cryptanalyst needs to store and sort about 2^32 blocks or equivalently 2^35 bytes. In fact, the situation is slightly worse, since the cryptanalyst also needs to keep track of the initial block positions when sorting, in order to know the plaintext blocks involved in the equation derived from the collision. If there is not enough memory to sort a table containing both the ciphertext value and the original position, an alternative is to keep the unsorted values somewhere on an auxiliary storage device, to sort the ciphertext blocks without indices and then to scan the auxiliary device to locate the original position of the colliding blocks once their common value is known.

Even with this approach, we need 2^35 bytes of fast memory. Despite the quick progress of computer hardware, 32 Gbytes is still a large amount of memory, only available on dedicated computers. Thus, in practice, even when CBC encryption is used with a 64-bit block cipher, say Triple-DES, putting the attack into play requires an important computing effort from the cryptanalyst. For this reason, among others, CBC encryption with Triple-DES is still widely encountered.

7.4.3 Delayed CBC beyond the birthday bound

After considering the security of messagewise CBC beyond the birthday bound, we now turn to the blockwise security of Delayed CBC encryption beyond this bound. In order to construct this attack, we are going to use a trick similar to the one used for attacking ordinary blockwise CBC. More precisely, whenever we receive a block of ciphertext, we determine the next block of plaintext as a function of this block and send it back to CBC encryption. Due to the one block delay, we can only start this from the second block of plaintext. For the first plaintext block, we can submit any value of our choice, such as the all-zeros block. Starting from the second block, we choose our next plaintext block as:

P^(i) = P^(i−1) ⊕ C^(i−2).    (7.18)

Even with Delayed CBC, both values P^(i−1) and C^(i−2) are effectively known at the time we need to submit P^(i). Now, with this specific choice for plaintext blocks, the ciphertext values are determined by the following equation:

C^(i) = E_K(C^(i−1) ⊕ P^(i−1) ⊕ C^(i−2))    (7.19)
      = E_K(E_K(P^(i−1) ⊕ C^(i−2)) ⊕ (P^(i−1) ⊕ C^(i−2))).


In other words, if we let Z_i denote the input to the block cipher in round i, i.e., Z_i = P^(i) ⊕ C^(i−1), we see that Z_i satisfies a recursion law:

Z_i = E_K(Z_{i−1}) ⊕ Z_{i−1}.    (7.20)

Quite interestingly, XORing the input and the output of a block cipher is a pretty standard construction used to build a pseudo-random function from a pseudo-random permutation, known as the Davies-Meyer construction. This Davies-Meyer construction is often used in hash functions; for example, it is encountered in SHA-1. If we define the function F by F(x) = x ⊕ E_K(x), the recursion law becomes Z_i = F(Z_{i−1}).

Using the analysis of Section 7.2, we know that the sequence Z loops back to itself after about O(2^{n/2}) steps. Moreover, we can consider using a cycle detection algorithm to discover this loop.

7.4.3.1 Floyd's algorithm

Remember that to use Floyd's algorithm, we need to compute in parallel the sequence Z_i and the sequence W_i = Z_{2i} in order to find an equality Z_i = W_i. Without going further into the details, we see that this is troublesome in our context of blockwise CBC encryption. Indeed, since we do not control the initial value C_0 used during encryption, there is no way to compute both sequences Z_i and W_i at the same time. Of course, we can always store Z and compare Z_i with Z_{2i} at each step, but this requires a large amount of memory, which is exactly what we are trying to avoid. Still, note that this approach would be better than sorting and could even work with slow memory or disk storage.

7.4.3.2 Brent's algorithm

On the contrary, with Brent's algorithm, we immediately see that the loop we induced in Delayed CBC encryption can easily be detected. Following our description from Section 7.1.2, whenever the index i is a power of 2, we store the value Z_{2^t} and then compare it to the subsequent values of Z_i up to the next power of two. This can be done without any need to compute the same sequence twice or to control initial values.
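A toy simulation, with a made-up 16-bit permutation in place of E_K, confirms that the induced sequence Z_i = F(Z_{i−1}), with F(x) = x ⊕ E_K(x), loops after roughly 2^{n/2} steps and that Brent's trap-point strategy catches it.

```python
import random

random.seed(11)
perm = list(range(1 << 16))
random.shuffle(perm)                      # toy 16-bit block cipher E_K
F = lambda z: z ^ perm[z]                 # Davies-Meyer style function

z = F(12345)                              # some reachable value of Z
stored, trap, i = z, 1, 0
while True:
    z = F(z)
    i += 1
    if z == stored:                       # cycle detected: the distinguisher fires
        break
    if i == 2 * trap:                     # move the trap point at powers of two
        trap, stored = 2 * trap, z
print("cycle detected after", i, "steps; compare to 2^(n/2) =", 1 << 8)
```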

In some sense, detecting a loop is already an attack. Indeed, this can be seen as a real-or-random distinguisher. If the output is real, i.e., comes from a blockwise CBC encryption, we find a cycle in time O(2^{n/2}) with high probability. If the output is random, then Brent's algorithm almost never finds a new value equal to the stored value. However, we can strengthen the attack and do more than this basic distinguisher. In fact, we can use the cycle to obtain the encryption of any value v of our choice under the underlying block cipher E_K. For this, when storing Z_{2^t}, which is a block cipher input, we also store the corresponding output C^(2^t). Due to the one block delay, C^(2^t) is not available immediately, but we store it whenever we receive it. When


the collision is detected, we have a new input Z_i equal to Z_{2^t}. Of course, this equality between inputs to the block cipher implies equality between the outputs. So we know in advance that C^(i) = C^(2^t). Thanks to this knowledge, we can set P^(i+1) = C^(i) ⊕ v and thus obtain the encrypted value E_K(v).

This shows that, thanks to Brent's algorithm, Delayed CBC encryption as a blockwise mode of operation is (slightly) more vulnerable to attacks beyond the birthday paradox bound than ordinary CBC encryption used as a messagewise mode of operation.

7.4.3.3 Nivasch's algorithm

In this context of blockwise attacks, Nivasch's algorithm can be very useful. Indeed, it allows us to reduce the amount of required ciphertext before the cycle is detected. In fact, using Algorithm 7.4 as presented in Section 7.1, we are guaranteed to require at most s + 2ℓ blocks of ciphertext. Of course, it would be interesting to further reduce this number. Clearly, the first collision occurs at position s + ℓ and we cannot use fewer than s + ℓ blocks of ciphertext to detect a collision. Following Nivasch in [Niv04], it is possible to devise a variation of Algorithm 7.4 that uses some additional memory and can detect the cycle with s + (1 + α)ℓ blocks, for arbitrarily small values of α. This algorithm uses a technique called partitioning and its memory requirements increase as α decreases.

7.5 Collisions in hash functions

Another application of cycle detection algorithms is the search for collisions in hash functions, especially when their output length is too short. For example, given a hash function with a 128-bit output, thanks to the birthday paradox, we can find a collision with roughly 2^64 evaluations of the hash function. However, it is clearly infeasible with current computers to store and sort the corresponding list of 2^64 hash values in order to discover the collision. Thus, it is natural to ask how to use cycle detection algorithms to obtain a collision. The first answer is to simply invoke the cycle detection algorithm on a recursive sequence, where each value is obtained by hashing the previous value. Yet, this answer is not satisfactory for several reasons. The first reason is that this approach can only produce a meaningless collision and output two random looking messages with the same hash. Of course, it would be much preferable for a cryptanalyst to obtain two meaningful messages with different meanings and the same hash. The second reason is that the cycle detection algorithms presented in Section 7.1 are inherently sequential. And while performing 2^64 operations is feasible with today's computers, performing 2^64 sequential operations on a single computer is a completely different


matter.

7.5.1 Collisions between meaningful messages

Finding a collision between meaningful messages is a now classical trick, first introduced in [Yuv79]. The idea is to start by choosing two messages M and M′ with different meanings. Then in each message, we need to find a number of places where safe modifications can be performed. Here, we call safe modification any change which does not alter the meaning of the message. There are usually many possible safe modifications in a human readable message. We could replace a space by a line feed, a non-breaking space or even two spaces. Similarly, at some points in a message, punctuation can be added or removed; upper-case and lower-case letters can be exchanged without altering the message's meaning. Given t places of safe modification for a message M, it is possible to construct a set of 2^t different messages with the same meaning. Note that for the hash function the 2^t messages are genuinely different and we obtain 2^t random-looking[3] hash values. If we proceed in the same way with M′, we also obtain 2^t different copies of M′. Clearly, for an n-bit hash function, with t > n/2 we expect the existence of a collision between a copy of M and a copy of M′.
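This memory-based variant of the trick can be sketched in a few lines. The code below is an illustrative toy, not the text's exact construction: it uses a truncated SHA-256 as a stand-in for a short n-bit hash, and single-versus-double spaces as the t safe modification points; the function names and parameters (`truncated_hash`, `variants`, `find_collision`, n = 16, t = 12) are choices made here for the example.

```python
import hashlib
from itertools import product

def truncated_hash(msg, n_bits):
    # Toy n-bit hash: truncated SHA-256, a stand-in for a short hash function.
    h = hashlib.sha256(msg.encode()).digest()
    return int.from_bytes(h, "big") >> (256 - n_bits)

def variants(message, t):
    # 2^t same-meaning copies: toggle t separator points between a single
    # and a double space (one possible family of safe modifications).
    words = message.split(" ")
    assert len(words) > t
    for bits in product([0, 1], repeat=t):
        out = []
        for k, w in enumerate(words[:-1]):
            out.append(w)
            out.append("  " if k < t and bits[k] else " ")
        yield "".join(out) + words[-1]

def find_collision(m, m_prime, t, n_bits):
    # Birthday attack: tabulate hashes of copies of m, then scan copies of m'.
    table = {truncated_hash(v, n_bits): v for v in variants(m, t)}
    for v2 in variants(m_prime, t):
        h = truncated_hash(v2, n_bits)
        if h in table:
            return table[h], v2
    return None
```

With t = 12 > n/2 = 8, the two sets of 2^12 copies collide on the 16-bit hash with overwhelming probability, exactly as the t > n/2 analysis above predicts.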

The same idea of safe modifications can also be used in conjunction with a cycle finding technique. However, we need some extra care to avoid uninteresting collisions between two copies of M or two copies of M′. Given t points of safe modification of M, for any t-bit number i, we denote by M_i the copy obtained by setting the specific value of each point of modification according to the value of a corresponding bit in the binary expansion of i. Similarly, we denote by M′_i the i-th copy of M′. Using this notation, we can construct a function F from t + 1 bits to n bits as follows:

               H(M_i)    if v = 0,
    F(v, i) =                                                    (7.21)
               H(M′_i)   if v = 1,

where H denotes the hash function. Adding a truncation from n bits to t + 1, we can obtain a sequence of hash values of meaningful messages and look for a collision. However, this direct approach with t ≈ n/2 as above usually yields a trivial collision. Indeed, if two hash values collide after truncation to t + 1 bits, at the next iteration we hash a previously encountered message and enter a cycle, with a collision between the hash values of twice the same message. To avoid this bad case, we need more points of safe modification, namely we need to choose t = n − 1. With this choice, the entrance of the cycle corresponds to a genuine collision. Moreover, with probability 1/2 this genuine collision is between a copy of M and a copy of M′. If not, we can simply restart the algorithm with a different start-point.
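The t = n − 1 construction can be exercised end to end on a toy scale. The sketch below is an assumption-laden illustration: SHA-256 truncated to n = 20 bits stands in for the hash H, the two sample sentences and the space-toggling `variant` helper are inventions for the example, and Floyd's cycle detection (Algorithm 7.2 territory) is used to locate the cycle entrance.

```python
import hashlib

N = 20                                   # toy hash size n (bits); t = n - 1
M  = ("please transfer one hundred dollars to the account of our trusted "
      "partner before the end of this month as we previously agreed").split()
MP = ("please transfer nine thousand dollars to the account of our new "
      "partner right away and do not mention this payment to anyone").split()

def Hn(msg):
    # n-bit hash: truncated SHA-256 (stand-in for a real hash function).
    h = hashlib.sha256(msg.encode()).digest()
    return int.from_bytes(h, "big") >> (256 - N)

def variant(words, i):
    # t = n - 1 safe modification points: single vs. double space separators.
    out = []
    for k, w in enumerate(words[:-1]):
        out.append(w)
        out.append("  " if (i >> k) & 1 else " ")
    return "".join(out) + words[-1]

def message(state):
    v, i = state >> (N - 1), state & ((1 << (N - 1)) - 1)
    return variant(MP if v else M, i)

def F(state):
    # Equation (7.21) followed by the (here trivial) truncation to t+1 = n bits.
    return Hn(message(state))

def meaningful_collision(max_restarts=100):
    for seed in range(max_restarts):
        x0 = Hn("start-%d" % seed)
        tort, hare = F(x0), F(F(x0))     # Floyd's cycle detection
        while tort != hare:
            tort, hare = F(tort), F(F(hare))
        p, q = x0, tort
        if p == q:
            continue                     # start-point fell on the cycle
        while True:                      # stop at the two cycle-entrance predecessors
            fp, fq = F(p), F(q)
            if fp == fq:
                break
            p, q = fp, fq
        if (p >> (N - 1)) != (q >> (N - 1)):
            return message(p), message(q)    # one copy of M, one of M'
    return None
```

As the text predicts, each run succeeds with probability about 1/2, so a handful of restarts with fresh start-points suffices.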

[3] Unless the hash function is badly broken.


244 Algorithmic Cryptanalysis

7.5.2 Parallelizable collision search

Since cycle finding algorithms are inherently sequential, in order to find collisions using a parallel computation, we need a different approach. This approach cannot look for a cycle within a single recursively defined sequence, but instead needs to rely on the computation of several independent sequences. We already introduced an algorithm of this kind to compute discrete logarithms in a short interval: Pollard's Kangaroo algorithm (see Section 7.3.3). However, to use this idea in a parallelizable way, we need to adapt it. When running on a parallel computer, it is essential to reduce the amount of communication and synchronization between parallel processes. A common approach when looking for a specific value or property among a large set of data is to use a large number of computers to create likely candidates and then to report these candidates to a dedicated server which performs the final checking. For a collision search, we need to report values which are likely to collide. This can be achieved using the distinguished point technique from [QD90] and [vW96].

Note that in distinguished point algorithms, one node in the parallel computer plays a special role. This node receives distinguished points from all other nodes, checks for collisions within the set of distinguished points using a sort-based birthday algorithm as in Chapter 6 and pulls back the collisions from this set of distinguished points to the real values. It is very important to devise parallel collision search algorithms in a way that minimizes the communication cost, the memory requirement and the computational overhead of this server node.

In this section, we present one of the numerous possible options for parallelizable collision search on a random function F. We assume that F operates on a set of N elements and that the parallel computer consists of P identical parallel nodes and a special server node. We also choose a set of D distinguished points. For algorithmic convenience, the distinguished points should be easy to recognize and easy to generate. A typical example of a set of distinguished points is the set of n-bit integers with d leading zeros. This set contains D = 2^(n-d) elements in a larger set of cardinality N = 2^n. Each distinguished point x is easily recognized by the property: 0 ≤ x < 2^(n-d). Moreover, with this definition of distinguished points, it is very easy to generate random distinguished points at will.

With this setting, our algorithm works as follows. First, the server node

allocates a (distinct) distinguished point to each computation node. Given

this distinguished point, the computation node uses it as start-point for Ni-

vaschā™s cycle detection Algorithm 7.4, with a small change. Namely, when

Nivaschā™s algorithm encounters a distinguished point, it aborts and sends a

report. The report contains three values: the starting distinguished point, the

distinguished end-point and the length of the path between the two points.

The server node maintains a list of reports and detects collisions between

end-points. If a collision occurs between the values taken by F during the

global computation, necessarily this implies the existence of either a cycle on

Ā© 2009 by Taylor and Francis Group, LLC

Birthday-based algorithms for functions 245

one computation node or a collision between two diļ¬erent nodes. Both pos-

sibilities are detected by our algorithm, in the ļ¬rst case, Nivaschā™s algorithm

detects the cycle, in the second case, the server node discovers a collision

between two end-points. From this collision, between end-points, the real col-

lision is easily recovered. Indeed, assume that there is a path of length L1

from a distinguished point S1 to a end-point P and another path of length L2

from S2 to P . Without loss of generality, we can further assume that L1 ā„ L2 .

Then, starting from S1 , the server node walks the ļ¬rst path, taking exactly

L1 ā’ L2 steps. From that point, it now suļ¬ces to walk both paths in parallel,

until a collision is reached. Note that, it is always a genuine collision. Indeed,

since E is the ļ¬rst distinguished point encountered by walking from S1 , S2

cannot lie on the ļ¬rst path. As a consequence, the two parallel walks start

from diļ¬erent places and end at the same position. Moreover, they have the

same length and the collision necessarily occurs after taking the same number

of steps on both paths.
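The full node/server protocol can be simulated sequentially on a toy scale. The sketch below is an illustration under stated assumptions, not the book's exact algorithm: a SHA-256-derived mapping on N = 2^24 values stands in for a random function, the nodes run one after another instead of in parallel, and the path-walking replaces Nivasch's algorithm with the simpler "run to the next distinguished point" variant mentioned at the end of this section.

```python
import hashlib

N_BITS, D_ZEROS = 24, 10                  # N = 2^24, distinguished: d leading zeros
DIST_BOUND = 1 << (N_BITS - D_ZEROS)      # x is distinguished iff x < 2^(n-d)
MAX_STEPS = 50 * (1 << D_ZEROS)           # abort paths that seem trapped in a cycle

def F(x):
    # Random-looking mapping on n-bit values (SHA-256 stand-in).
    h = hashlib.sha256(x.to_bytes(4, "big")).digest()
    return int.from_bytes(h, "big") % (1 << N_BITS)

def run_node(start):
    # One computation node: walk from a distinguished start-point to the
    # next distinguished point, and report (start, end, path length).
    x, steps = start, 0
    while True:
        x, steps = F(x), steps + 1
        if x < DIST_BOUND:
            return start, x, steps
        if steps > MAX_STEPS:
            return None                   # probably stuck in a cycle: abort

def locate_collision(s1, l1, s2, l2):
    # Server side: advance along the longer path l1 - l2 steps, then walk
    # both paths in parallel until the genuine collision F(a) = F(b), a != b.
    if l1 < l2:
        s1, l1, s2, l2 = s2, l2, s1, l1
    a, b = s1, s2
    for _ in range(l1 - l2):
        a = F(a)
    if a == b:
        return None
    while True:
        na, nb = F(a), F(b)
        if na == nb:
            return a, b
        a, b = na, nb

def parallel_search(max_nodes=512):
    reports = {}                          # end-point -> (start, length)
    for start in range(max_nodes):        # node j starts from distinguished point j
        rep = run_node(start)
        if rep is None:
            continue
        s, e, length = rep
        if e in reports:                  # end-point collision: localize it
            s0, l0 = reports[e]
            hit = locate_collision(s0, l0, s, length)
            if hit is not None:
                return hit
        reports[e] = (s, length)
    return None
```

With paths of average length N/D = 2^10, an end-point collision typically appears after only a handful of reports, since the birthday bound √N ≈ 2^12 values are quickly accumulated.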

Since any collision between two of the computed values of F is detected, the number of values that need to be computed remains O(√N), as with sequential cycle detection algorithms. As a consequence, with P processors, the parallelization speed-up is essentially optimal. One caveat is that the computed values are not integrated into the global count until the corresponding distinguished point is reported. Thus the number of distinguished points should be adequately chosen. With D distinguished points, the fraction of distinguished points is D/N. Thus the average length of a path between two distinguished points is essentially N/D. Note that for some start-points, we may enter a cycle. In this case, we cannot nicely define the distance between two distinguished points. Ignoring this technical difficulty, we find that, as a consequence, running our algorithm on P processors allows it to compute P · N/D points. In order to have O(√N) values of F, we need to choose:

    D = O(P · √N).                                                (7.22)

Since the server node needs to store P paths, i.e., one per computation node, the amount of required memory is O(P). Finally, the running time of the code that extracts the collision on F from a collision between distinguished points is upper bounded by the time needed to walk through two paths, i.e., O(N/D) = O(√N / P).

Note that with the above description and analysis, the computation is not perfectly balanced between processors. Indeed, some paths between distinguished points are shorter than others. To reduce this imbalance, an easy patch is to serially run several instances of Nivasch's algorithm on each computation node, in order to average the paths' length. This increases the parameter P and as a consequence the memory required on the server node. Another remark is that when P becomes large, collisions coming from a cycle within a single path become extremely rare. As a consequence, we may replace Nivasch's algorithm by a simpler one, which simply runs from a distinguished point to the next, possibly aborting when it takes too long to find a distinguished end-point. The only drawback of this approach is that the exact analysis of the algorithm's behavior becomes much more cumbersome.

7.5.2.1 Looking for many collisions

Another interesting aspect of the above parallelizable collision search algorithm is that it can be used to efficiently construct a large number of collisions. Note that the sequential cycle finding algorithms are not very efficient for constructing multiple collisions. Indeed, to construct k collisions, we essentially need to repeat the cycle finding algorithm k times, for a total cost O(k·√N). On the contrary, with the above parallel algorithm, the birthday paradox continues to apply and to play for us after the first collision. As a consequence, the overall cost to construct k collisions is reduced to O(√(kN)). However, we should be careful when choosing the set of distinguished points. In particular, it is clear that each stored path can yield at most one collision. As a consequence, the parameter P should be chosen larger than the number of desired collisions k.

An alternative would be to use arbitrary points instead of distinguished points as start-points for the stored paths. However, this approach introduces a new drawback. Indeed, nothing would prevent us from computing twice the same path or, more precisely, from computing two paths where one is a subpath of the other. These two paths would not yield any new collision and would increase the computational cost without any gain.

7.6 Hellman's time memory tradeoff

Hellman's time memory tradeoff is a generic method for recovering the key K of a block cipher E. This method assumes that the cryptanalyst can perform a massive precomputation about E beforehand and memorize some information summarizing this precomputation, in order to help in recovering the key K at a later stage. The goal is to achieve a non-trivial time memory tradeoff for the late stage of the attack. Two trivial possibilities are to either memorize nothing and use brute force to recover K, or to memorize a sorted table containing all pairs (E_K(0^n), K) and to perform a table lookup of E_K(0^n) in order to recover K.

Despite the fact that Hellman's time memory tradeoff uses a large amount of memory, it is presented in this chapter for two main reasons. First, the behavior of Hellman's tradeoff is deeply connected with the analysis of random functions. Second, Hellman's tradeoff is extremely useful for extracting a block cipher key after the blockwise attacks of Section 7.4.

Hellman's algorithm is based on the analysis of the function F(K) = E_K(0^n) from the set of possible keys to itself. Note that if the key size and the block size are different, this function needs to be adapted. If the key size is smaller than the block size, we need to truncate the result. If the key size is larger than the block size, then we concatenate several encryptions (say choosing the constant blocks 0^n, 0^(n-1)1, ...) and truncate the concatenation if necessary. Breaking the block cipher E and finding K is clearly equivalent to computing the inverse of F on the point E_K(0^n) obtained from an encryption of 0^n. Thus, Hellman's tradeoff is usually considered to be a chosen plaintext attack. In fact, it is slightly better than that, since we do not insist on the choice of 0^n. The only requirement is to make sure that there is some fixed value v for which it is easy to obtain the encryption E_K(v). For example, if the adversary knows that users routinely encrypt messages with some fixed headers, which is a frequent feature of several word processors, then, assuming a deterministic encryption mode, chosen plaintext attacks may not be necessary. Hellman's algorithm gives another reason why using randomized encryption modes is important when using block ciphers. Indeed, with a randomized mode, the adversary can no longer obtain the encryption of his chosen fixed value and Hellman's method can no longer be used.

7.6.1 Simplified case

To understand the key idea behind Hellman's algorithm, we first consider a very special case and assume that F is a cyclic permutation. In real life, this never happens. However, in this case the analysis is much simpler. With this unlikely hypothesis, we can start from an arbitrary initial key K_0 and precompute the complete sequence given by K_{i+1} = F(K_i). This sequence follows the cycle of F and goes in turn through all possible keys for the block cipher, eventually coming back to K_0 after 2^k iterations, where k is the number of key-bits. Every 2^l steps, with l = ⌊k/2⌋, we memorize the pair (K_{j·2^l}, K_{(j−1)·2^l}). In other words, we remember the current key together with its ancestor located 2^l steps before. Once we have collected the 2^(k−l) possible pairs around the cycle, we sort these pairs by values and keep this sorted array.

Given access to this array of sorted pairs, we can now invert F quite easily in 2^l steps. Starting from E_K(0^n) for an unknown key K, we look up this value in our array. If it is not present, we apply F and look up again. After t steps, we find a value F^(t)(E_K(0^n)) in the array. For this value, we read the second number in the pair, thus going 2^l steps backward in the cycle of F. Then, we apply F again 2^l − t − 1 times and recover the (unique) preimage of E_K(0^n), i.e., the key K.

In this simplified case, Hellman's tradeoff needs to memorize 2^(k−l), i.e., 2^l or 2^(l+1) with our choice for l, pairs and the running time of the attack stage (omitting the precomputation phase) is l · 2^l, when counting the cost of the dichotomy lookups in the sorted array. Note that this technique is very similar to the baby step, giant step algorithm described in Section 6.4.2. Reformulating Hellman's algorithm in this context, we see that to take a single baby step backward, we need to take many baby steps forward and a single giant step backward.
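This simplified case can be checked concretely. Since a genuinely cyclic F never occurs in practice, the sketch below substitutes a toy stand-in of its own choosing: a full-period linear congruential map modulo 2^k (which is a cyclic permutation by the Hull-Dobell theorem), with k = 16 and l = 8.

```python
K, L = 16, 8                     # k key bits; l = k // 2

def F(x):
    # Full-cycle LCG mod 2^k: a toy stand-in for the hypothetical
    # cyclic permutation F of the simplified case.
    return (5 * x + 3) % (1 << K)

def build_table():
    # Walk the full cycle once; every 2^l steps store the pair
    # (current key, ancestor 2^l steps before): 2^(k-l) pairs in total.
    checkpoints = []
    x = 0
    for i in range(1 << K):
        if i % (1 << L) == 0:
            checkpoints.append(x)
        x = F(x)
    # checkpoints[j] = K_{j*2^l}; its ancestor is checkpoints[j-1]
    # (index -1 wraps around the cycle, as it should).
    return {checkpoints[j]: checkpoints[j - 1] for j in range(len(checkpoints))}

def invert(y, table):
    # Walk forward until a memorized key is hit (t steps), jump 2^l steps
    # back, then apply F another 2^l - t - 1 times: the unique preimage of y.
    z, t = y, 0
    while z not in table:
        z, t = F(z), t + 1
    z = table[z]
    for _ in range((1 << L) - t - 1):
        z = F(z)
    return z
```

The forward walk needs at most 2^l − 1 steps before hitting a memorized key, matching the 2^l-step inversion cost claimed above.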

7.6.2 General case

In the general case, F is no longer cyclic and no longer a permutation. Moreover, we expect it to behave like a random mapping. Thus, choosing a random start-point and applying F iteratively quickly leads to a cycle of square-root size. As a consequence, if we want to cover more than a negligible fraction of the keys, we clearly need to use many start-points. Another difficulty is that with several start-points, nothing prevents two different sequences from merging at some point. In fact, any two starting points in the giant component lead to merging sequences. At the latest, the sequences merge somewhere in the cycle corresponding to the giant component. Even worse, finding a preimage for E_K(0^n) is not sufficient to recover K, since preimages are no longer unique. In order to bypass all these problems, Hellman's tradeoff requires more time and more memory than in the simplified case. The usual balance (neglecting logarithmic factors) is 2^(2k/3) in terms of both time and memory.

The classical analysis of Hellman's tradeoff assumes that we are computing sequences of length t for m random start-points. For each sequence, the start- and end-points are memorized, sorted by end-point values. In order to find a preimage by F, the same idea as in the simplified case is used. From a start-point, F is applied repeatedly until a memorized end-point is reached. Then the algorithm jumps back to the corresponding start-point and moves forward again. Several cases are possible: in the most favorable, we discover K; in the other cases, we fail, either by finding a wrong preimage for E_K(0^n), i.e., a different key that yields the same encryption, or by not even coming back to the point E_K(0^n). The probability of success is of the order of mt/2^k as long as mt^2 remains below 2^k. If we go beyond this limit, the number of collisions between chains becomes very large and the tradeoff becomes unworthy. Taking m = t = 2^(k/3), we find a probability of success of 2^(−k/3) with memory requirements of O(2^(k/3)) keys and running time O(k·2^(k/3)). To improve the probability, Hellman proposes to use 2^(k/3) different choices for the function F, by slightly changing the way a ciphertext E_K(0^n) is converted back into a key. Assuming independence between the tables generated for each of these functions, the success probability increases to a constant fraction of the key space with time and memory requirements 2^(2k/3) (neglecting logarithmic factors).
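A single Hellman table can be sketched on a toy scale. This is a hedged illustration, not a full tradeoff: a SHA-256-derived function on k = 20 bits stands in for F(K) = E_K(0^n), only one table (one choice of F) is built, and the chosen parameters m = t = 128 cover only about mt/2^k ≈ 1/64 of the key space, so inversion is demonstrated on a key known to be covered by a chain.

```python
import hashlib

K_BITS = 20          # toy key size k
M_CHAINS = 128       # m start-points
T_LEN = 128          # chain length t

def F(key):
    # Stand-in for K -> E_K(0^n) mapped back into the key space.
    digest = hashlib.sha256(key.to_bytes(4, "big")).digest()
    return int.from_bytes(digest, "big") % (1 << K_BITS)

def build_table():
    # Precomputation: m chains of length t; memorize only (end-point, start-points).
    table = {}
    for start in range(M_CHAINS):
        x = start
        for _ in range(T_LEN):
            x = F(x)
        table.setdefault(x, []).append(start)   # merged chains may share an end
    return table

def invert(y, table):
    # Attack stage: walk forward from y until a memorized end-point is hit,
    # then regenerate the matching chain(s) looking for a preimage of y.
    z = y
    for _ in range(T_LEN):
        if z in table:
            for start in table[z]:
                candidate = start
                for _ in range(T_LEN):
                    if F(candidate) == y:
                        return candidate        # may be a "wrong" preimage,
                    candidate = F(candidate)    # but F(candidate) = y holds
        z = F(z)
    return None
```

Note the failure modes from the text appear directly: a returned key is only guaranteed to satisfy F(K') = y, and keys outside every chain simply make `invert` return None.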


Exercises

1. At the end of Floyd's algorithm, prove that t is the smallest multiple of the cycle's length greater than or equal to s.

2h. Given a discrete logarithm problem y = g^x with the additional information that every bit in odd position in x is a 0, devise a generic algorithm to efficiently obtain x, assuming that there are n unknown bits in x.

3. In this exercise, we consider the simultaneous use of several copies of Nivasch's algorithm.

   (a) Given a sequence X_{i+1} = f(X_i) and two different order relations on elements X_i, run in parallel two copies of Nivasch's algorithm, defining one with each order relation. We detect the loop as soon as one of the two copies does. Assuming that the result of each different comparison between unequal values behaves randomly, when do you expect this detection to happen?

   (b) What can be achieved with t different copies of Nivasch's algorithm?

   (c) Describe an application where this technique offers a noticeable gain.

4h. What is the expected complexity of finding the cycle's start given the complete state of Nivasch's algorithm, after detecting a collision? Can this be improved with t copies of the algorithm?

5. Implement the search for hash collisions using a cycle finding algorithm on a reduced-size hash function. For example, you can use a truncated copy of SHA.

6h. Devise a method for searching for many collisions (not multicollisions, just several independent collisions) within a hash function, using a cycle finding technique. Compare with the search for many collisions using a birthday paradox approach with memory.

7. Open problem. Find a method to construct multicollisions (at least a 3-multicollision) using a cycle finding approach.


Chapter 8

Birthday attacks through quadrisection

In practice, cryptanalytic attacks based on the birthday paradox are often limited by the amount of memory they require. Indeed, the time and memory requirements of birthday paradox based attacks are roughly the same and for practical purposes, memory costs much more than time. Of course, when looking for collisions involving functions, we can use the techniques of Chapter 7; however, their range of applicability is limited. In this chapter, we study other specific instances of birthday-based attacks for which the techniques of Chapter 7 cannot be applied, but which, nonetheless, can be tackled using less memory than the generic methods of Chapter 6. The techniques presented here reduce the required amount of memory, without significantly increasing the running times. In fact, even when there is enough memory available to perform a generic birthday attack, using these variations may improve performance, due to cache effects.

8.1 Introductory example: Subset sum problems

To serve as an introduction to the techniques presented in this chapter, we present an algorithm of Schroeppel and Shamir [SS81], first presented in 1979. This example studies the problem of finding all the solutions to a given subset sum problem. We first recall the definition of a subset sum problem.

DEFINITION 8.1 A subset sum problem consists, given n positive integers x_i and a target sum s, in finding all the solutions to the equation:

    Σ_{i=1}^{n} e_i x_i = s,                                      (8.1)

where the e_i values are integers equal to either 0 or 1.

The subset sum problem also has an associated decision problem where the goal is to say whether a solution exists or not.


The subset sum problem is also called the knapsack problem.

8.1.1 Preliminaries

Clearly, the subset sum problem can be solved by brute force. It suffices to compute the 2^n different sums. This requires essentially no memory and has a running time of 2^n evaluations of a sum of n terms. One can do better with a simple application of the birthday paradox: first split the x values into two sets of roughly equal sizes, compute L_1 the list of all possible sums for the first set and L_2 the list obtained by subtracting from s all the possible sums for the second set. Any collision between L_1 and L_2 corresponds to an equation:

    Σ_{i=1}^{⌊n/2⌋} e_i x_i = s − Σ_{i=⌊n/2⌋+1}^{n} e_i x_i,      (8.2)

thus yielding a solution to the knapsack problem. As usual, this birthday-based attack has a runtime O(n·2^(n/2)) and requires storing 2^⌊n/2⌋ partial sums in memory. For values of n where the brute force attack is at the edge of possible computations, say n = 80, one can argue whether the birthday-based attack is better than the brute force attack. In practice, is it easier to perform 2^80 computations in a distributed effort on many small computers or to find a large computer with enough memory to store 2^40 numbers?
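The split of Equation (8.2) is short to write down in code. The sketch below is a minimal illustration on a tiny instance (the function name and the sample knapsack are choices made here): it tabulates all sums of the first half, then looks up s minus each sum of the second half.

```python
from itertools import product

def knapsack_birthday(xs, s):
    # Birthday attack on a subset sum instance: store all 2^(n/2) sums of
    # the first half, then scan the second half looking up s - sum.
    h = len(xs) // 2
    first, second = xs[:h], xs[h:]
    table = {}
    for bits in product([0, 1], repeat=h):
        t = sum(b * x for b, x in zip(bits, first))
        table.setdefault(t, []).append(bits)
    solutions = []
    for bits in product([0, 1], repeat=len(second)):
        t = sum(b * x for b, x in zip(bits, second))
        for left in table.get(s - t, []):
            solutions.append(left + bits)     # full (e_1, ..., e_n) vector
    return solutions

# All ways to reach 10 from this small instance:
sols = knapsack_birthday([3, 1, 4, 1, 5, 9, 2, 6], 10)
```

Time is O(n·2^(n/2)) and memory O(2^(n/2)) partial sums, exactly the tradeoff that the Schroeppel-Shamir algorithm of the next section improves to O(2^(n/4)) memory.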

With the results of Chapter 7 in mind, it is natural to try solving such a knapsack problem using a cycle finding algorithm. At first, this approach seems reasonable. Indeed, assuming for simplicity that n is even, we can easily define a function F on n/2 + 1 bits that maps values starting by 0 to a sum of the first n/2 numbers in the knapsack, as in the left-hand side of Equation (8.2). The same function would map values starting by a 1 to the target value s minus a sum of the last n/2 numbers, as in the right-hand side of the same equation. Clearly, a random collision on F yields a solution to the knapsack problem whenever the colliding values differ on their first bits, which should heuristically occur with probability roughly 1/2. As a consequence, it is natural to try using cycle finding algorithms to obtain a collision on F. However, in many cryptographic applications of knapsacks, the values of the knapsack elements are large numbers, of n bits or more. Otherwise, given a possible value for the sum s, there would be a huge number of solutions. As a consequence, F is an expanding function which maps n/2 + 1 bits to approximately n bits. For this reason, it cannot be used directly and repeatedly to construct an ultimately periodic sequence. Instead, we need to use some function G to truncate the n bits back to n/2 + 1. Of course, from a starting point y_0 we can easily define a sequence, using the recursion formula:

    y_{i+1} = G(F(y_i)).                                          (8.3)


Clearly, this sequence usually leads to a cycle which can be found using a cycle detection algorithm. Yet, this cycle does not usually produce any real collision and cannot yield a knapsack solution.

The reason for this phenomenon is that by writing down the sequence y, we also implicitly defined another sequence z by:

    z_0 = F(y_0)  and  z_{i+1} = F(G(z_i)).                       (8.4)

This implicit sequence z also yields a cycle. However, y values are short numbers on n/2 + 1 bits, while z values are long numbers on n bits. As a consequence, with overwhelming probability the sequences enter a cycle through a y-collision and not through a z-collision. This means that we eventually find an initial collision y_i = y_j which defines the entrance of the cycle. This collision of course leads to a collision on the subsequent z values, i.e., z_i = z_j. However, both values result from the same sum expression and thus cannot offer a solution to the knapsack problem. It is worth noting that for knapsack problems based on small knapsack elements, G would no longer be necessary and this approach would work.

8.1.2 The algorithm of Shamir and Schroeppel

To overcome this difficulty, Shamir and Schroeppel devised an algorithm which uses less memory than the birthday attack but still requires some non-negligible amount of memory. Before presenting the algorithm of Shamir and Schroeppel, it is useful to first remember from Section 6.3 that the birthday attack can be implemented in two different ways. The first possibility is to store a sorted copy of L_1, then to generate on the fly elements of L_2 and search them by dichotomy in L_1. The second possibility is to sort both L_1 and L_2, then to read both lists sequentially, advancing in turn either in L_1 or in L_2, until collisions are found. In the generic setting, the second possibility is the more costly in terms of memory, because two lists[1] need to be stored. However, in order to implement this option, we do not really need to store L_1 and L_2; instead, we only need to be able to go through L_1 (and L_2) step by step in increasing order. The key idea introduced by Shamir and Schroeppel shows that for subset sum problems, we can produce the elements of L_1 in increasing order without fully storing L_1. We also need to produce the elements of L_2 in increasing order. Since L_2 is constructed by subtracting a partial subset sum from s, it means that we need to produce the corresponding partial sums in decreasing order. This can be achieved through a straightforward adaptation of the method used with L_1.

[1] Moreover, when n is odd, the second list is twice as long as the first.

Assume for now that we are given a subset of the knapsack elements and denote them by y_1, ..., y_l. We need a memory efficient subroutine to produce all the sums for this subset in either increasing or decreasing order. For this, we first split l roughly in half, letting l = l_1 + l_2, and construct two sets:

    Y_1 = { Σ_{i=1}^{l_1} e_i y_i | (e_1, ..., e_{l_1}) ∈ {0,1}^{l_1} }   and     (8.5)

    Y_2 = { Σ_{i=l_1+1}^{l} e_i y_i | (e_{l_1+1}, ..., e_l) ∈ {0,1}^{l_2} }.

Any sum of this subset can be uniquely written as the sum of an element from Y_1 and an element from Y_2.

To solve our problem we first ask the following question: "Given two such sums, σ = γ_1 + γ_2 and σ′ = γ′_1 + γ′_2, what can we say about the relative order of σ and σ′ from the relative orders between the γ values?" Clearly, if γ_1 ≥ γ′_1 and γ_2 ≥ γ′_2, then σ ≥ σ′. Similarly, when both γs are smaller, then σ is smaller. However, if γ_1 ≥ γ′_1 and γ_2 ≤ γ′_2, the prediction is not easy.

To make good use of this partial information, we first sort Y_2, then for each number in Y_1, we add it to the first element of Y_2 and memorize the sum together with its decomposition. For efficiency, we memorize these sums in a binary search tree. This allows us to retrieve these sums in sorted order. After this initialization, we proceed as follows: at each step, take the smallest sum from the tree and produce it as the next element of Y_1 + Y_2. After that, we look at the decomposition and update the sum. The update is done by keeping the same number from Y_1 and by moving to the next number in Y_2 (which is easy, since Y_2 is sorted). After the update, we put the new value back into the tree, unless the end of Y_2 was reached. Using a self-balancing tree, since the size of the tree is the size of Y_1, the cost of insertion and deletion is guaranteed to be O(log |Y_1|).

To generate the values in decreasing order, two minor changes[2] are required: start from the end of Y_2 instead of the beginning and take the last tree element at each step instead of the first one. In both versions of the algorithm, each element of Y_1 is paired with each element of Y_2 and thus the set Y_1 + Y_2 is completely constructed. Putting together one copy of the algorithm going in increasing order for Y_1 + Y_2 with another copy going in decreasing order for Y_3 + Y_4 (and thus in increasing order for s − Y_3 − Y_4) allows us to find all occurrences of s in Y_1 + Y_2 + Y_3 + Y_4. From a complexity point-of-view, this requires a running time O(n·2^(n/2)) and a memory of O(2^(n/4)) integers. The running time is the same as the running time of the generic birthday attack, but the required memory is much lower. Note that the constants implicit in the O notations depend on the value of n modulo 4. The best case occurs when n is a multiple of 4.

[2] Alternatively, we can simply use the exact same algorithm as before, using the reversed order instead of the natural order on integers.


Algorithm 8.1 Initialization of Shamir and Schroeppel algorithm

Require: Input knapsack elements y_1, ..., y_l
  Let l_1 ← ⌊l/2⌋ and l_2 ← l − l_1
  Construct an array Y_2 containing the 2^{l_2} possible sums of y_{l_1+1}, ..., y_l
  Sort the array Y_2
  Create an empty AVL tree T and a set of 2^{l_1} free nodes for this tree structure
  for (e_1, ..., e_{l_1}) in {0,1}^{l_1} do
    Compute Y_1 ← Σ_{i=1}^{l_1} e_i y_i
    Take the next free node N
    Let N_value ← Y_1, N_index ← 0 and N_sortKey ← Y_1 + Y_2[0]  {Insert into the fields of N the value Y_1, the index 0 and the sorting key Y_1 + Y_2[0]}
    Insert N into T  {according to the value of the sorting key}
  end for
  Output T and Y_2 for use in Algorithm 8.2

Algorithm 8.2 Get next knapsack sum with Shamir and Schroeppel algorithm

Require: Input array Y_2 and current state of tree T
  If T is empty, Output 'Finished'
  Remove the smallest node of T, let N be this node
  Let S ← N_sortKey
  Let N_index ← N_index + 1
  if N_index < 2^{l_2} then
    Let N_sortKey ← N_value + Y_2[N_index]
    Insert modified node N back into the tree T
  end if
  Output sum S and updated tree T

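The generator at the heart of Algorithms 8.1 and 8.2, and the merge of the two ordered streams, can be sketched compactly. This is an illustrative adaptation, not a literal transcription: a binary heap plays the role of the AVL tree T (both give the required ordered retrieval with logarithmic updates), and the function names are choices made here.

```python
import heapq

def subset_sums(ys):
    sums = [0]
    for y in ys:
        sums += [t + y for t in sums]
    return sums

def ordered_sums(ys, decreasing=False):
    # Stream every subset sum of ys in sorted order while storing only
    # O(2^(len(ys)/2)) values: the heap stands in for the AVL tree T.
    h = len(ys) // 2
    Y1 = subset_sums(ys[:h])
    Y2 = sorted(subset_sums(ys[h:]), reverse=decreasing)
    sign = -1 if decreasing else 1
    heap = [(sign * (a + Y2[0]), i, 0) for i, a in enumerate(Y1)]
    heapq.heapify(heap)
    while heap:
        key, i, j = heapq.heappop(heap)
        yield sign * key
        if j + 1 < len(Y2):       # same Y1 element, next Y2 element
            heapq.heappush(heap, (sign * (Y1[i] + Y2[j + 1]), i, j + 1))

def count_solutions(xs, s):
    # Merge the increasing stream Y1+Y2 with the increasing stream
    # s - (Y3+Y4) and count equal values (with multiplicities).
    q = len(xs) // 2
    inc = ordered_sums(xs[:q])
    dec = (s - v for v in ordered_sums(xs[q:], decreasing=True))
    count, a, b = 0, next(inc, None), next(dec, None)
    while a is not None and b is not None:
        if a < b:
            a = next(inc, None)
        elif a > b:
            b = next(dec, None)
        else:
            v, ca, cb = a, 0, 0
            while a is not None and a == v:
                ca, a = ca + 1, next(inc, None)
            while b is not None and b == v:
                cb, b = cb + 1, next(dec, None)
            count += ca * cb
    return count
```

Each stream touches all 2^(n/2) sums of its half but keeps only a heap of 2^(n/4) entries, which is where the O(2^(n/4)) memory bound comes from.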

Clearly, the algorithm of Shamir and Schroeppel does not make use of the strong internal structure of the lists Y_1, Y_2, Y_3 and Y_4, which are built by adding together knapsack elements. However, taking advantage of this extra structure to either speed up the search for a knapsack solution or further reduce the required memory is a nice open problem. Not using the internal structure of Y_1, Y_2, Y_3 and Y_4 also means that the algorithm of Shamir and Schroeppel has a large range of applicability. We study it in the sequel of this chapter together with other algorithms which present the same kind of tradeoff between time and memory.

8.2 General setting for reduced memory birthday attacks

The idea of splitting a problem in four parts instead of two for the usual

birthday attacks, may be used in a variety of applications. In the general

setting, we are given four sets L1 , L2 , L3 and L4 of elements from a group G

with group operation , together with a target group value g. The problem

is then to ļ¬nd all solutions (g1 , g2 , g3 , g4 ) ā L1 Ć— L2 Ć— L3 Ć— L4 of the equation:

g1 g2 g3 g4 = g. (8.6)

Assuming that the sets have sizes N1 , N2 , N3 and N4 of the same order

of magnitude, a generic birthday-based attack exists for this problem. This

generic attack ļ¬rst constructs the two sets:

L = {g1 g2 | ā (g1 , g2 ) ā L1 Ć— L2 } and (8.7)

g4 )ā’1 | ā (g3 , g4 ) ā L3 Ć— L4 ,

L= g (g3

where h^{−1} denotes the inverse of h in the group G. Then the attack searches for collisions between L and L′. Each collision yields a solution of Equation (8.6).

The size of L is N1 N2 and the size of L′ is N3 N4 . Assuming without loss of generality that N1 N2 ≤ N3 N4 , the time complexity of the generic attack is O(max(N1 N2 log(N1 N2 ), N3 N4 )) and the memory complexity, measured in number of group elements, is N1 N2 . The knapsack example from Section 8.1

shows that for the additive group of integers (Z, +), the memory complexity

can be lowered without increasing the time complexity. More generally, we can ask in which groups this improvement is possible. In the sequel, we give a list of specific examples where the improvement can be achieved, followed by a more general treatment and examples of groups where no known method exists.
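As a quick illustration of the generic attack described above, here is a Python sketch that takes the group operation and inversion as function arguments. It materializes the full set L in memory and therefore exhibits exactly the quadratic memory cost that the rest of the chapter seeks to avoid; all function names here are illustrative, not from the text.

```python
from itertools import product

def four_list_solutions(L1, L2, L3, L4, op, inv, g):
    """Generic birthday attack for g1*g2*g3*g4 == g over an abstract group.

    `op` is the group operation and `inv` the inversion.  The set
    L = {g1*g2} is built in memory (size N1*N2), then L3 x L4 is streamed:
    this is the quadratic-memory baseline of Equation (8.7)."""
    # Map each product g1*g2 to the pairs that produce it.
    left = {}
    for g1, g2 in product(L1, L2):
        left.setdefault(op(g1, g2), []).append((g1, g2))
    # For each (g3, g4), the matching left value is g * (g3*g4)^-1.
    out = []
    for g3, g4 in product(L3, L4):
        target = op(g, inv(op(g3, g4)))
        for g1, g2 in left.get(target, ()):
            out.append((g1, g2, g3, g4))
    return out
```

For instance, over the additive group Z/11Z one would pass op = lambda a, b: (a + b) % 11 and inv = lambda a: (-a) % 11.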

Note that without loss of generality, we can always consider that the target

value g is the group identity element. Indeed composing Equation (8.6) with

Ā© 2009 by Taylor and Francis Group, LLC

Birthday attacks through quadrisection 257

g ā’1 under the group operation, we ļ¬nd:

g ā’1 ) = 1G .

g1 g2 g3 (g4 (8.8)

g ā’1 to

Thus it suļ¬ces to transform L4 by replacing each element g4 by g4

make g disappear from Equation (8.6).

8.2.1 Xoring bit strings

A simple and natural group to consider in computer science and cryptogra-

phy is the group of n-bit strings, together with the group operation obtained

by bitwise xoring strings. Mathematically, this group can be viewed in many

diļ¬erent ways, the simplest is to consider it as the n-th fold direct product

Gn where G2 is {0, 1} with addition modulo 2. Alternatively, adding some

2

more structure, it can also be viewed as the additive group of the ļ¬nite ļ¬eld

F2n .

In this group, our generic problem reduces to finding all quadruples of values whose XOR is equal to the specified target. As usual, we can assume without loss of generality that the specified target is the group identity element, i.e., the string 0^n.

Clearly, with this group, the approach used by Shamir and Schroeppel for

subset sum problems cannot work. Indeed, this approach heavily relies on the

existence of an order ≤ on Z compatible with the group operation. In G_2^n, no such order exists. Instead, we need to use a very different method introduced in [CJM02].

The basic idea of the method starts by selecting a subset of t bits among

the n available bits. For simplicity, we choose the t high bits of the n-bit

numbers that encode elements of G_2^n. For each value x in G_2^n, we denote by [x]t the restriction of x to its t high bits. This restriction is compatible with the XOR operation; more precisely, we have:

[x ⊕ y]t = [x]t ⊕ [y]t . (8.9)

Thus for any solution (x1 , x2 , x3 , x4 ) of Equation (8.6), rewritten in G_2^n as:

x1 ⊕ x2 ⊕ x3 ⊕ x4 = 0^n (8.10)

we necessarily have:

[x1 ā• x2 ]t = [x3 ā• x4 ]t . (8.11)

In other words, the value of the t high bits is the same when xoring the leftmost

elements x1 and x2 , or when xoring the rightmost elements x3 and x4 . Of

course, we do not know in advance this middle value of the t high bits; however,

we know for sure that it is shared between both sides. As a consequence,

denoting by Mt this middle value, we can enumerate all possibilities for Mt and

for each possibility create the list LMt of all pairs (x1 , x2 ) with [x1 ā•x2 ]t = Mt

Ā© 2009 by Taylor and Francis Group, LLC


and the list L′Mt of pairs (x3 , x4 ) with [x3 ⊕ x4 ]t = Mt . Any collision between

these lists for any value of Mt yields a solution of Equation (8.10).

For a ļ¬xed value of Mt , the expected number of pairs in the ļ¬rst of these

two lists is N1 N2 Ā· 2ā’t and the expected numbers of pairs in the second list is

N3 N4 Ā· 2ā’t . For typical applications, we have N1 ā N2 ā N3 ā N4 ā 2n/4 .

Thus, to keep the memory requirements around O(2n/4 ) it is natural to choose

t ā n/4. Indeed, with this choice the two middle lists are of approximate size

2n/4 Ā· 2n/4 Ā· 2ā’t ā 2n/4 . Moreover, for each value of Mt ļ¬nding collisions

between these lists costs O((n/2 ā’ t) 2n/2ā’t ), i.e., O(n 2n/4 ) for our choice of

t. Thus, taking into account the loop on the 2t possibilities for Mt , the total

time for ļ¬nding collisions is O(n 2n/2 ).

To make sure that the approach is sound, we need to check that LMt and L′Mt can be constructed efficiently. Assuming that L2 is already sorted with respect to the value of the t high bits of each element, all solutions of [x1 ⊕ x2 ]t = Mt can be found in time O(N1 log N2 ) using a simple variation of the collision search method with a sorted list. For each element x1 of L1 , compute [x1 ]t ⊕ Mt and search this value by dichotomy in the high bits of L2 . When N1 ≈ N2 ≈ N3 ≈ N4 ≈ 2^{n/4} and t ≈ n/4, the time and memory complexities for building all lists LMt and L′Mt in turn are O(n 2^{n/2}) and O(2^{n/4}). Note that it is essential to remember that we only need to store the lists LMt and L′Mt corresponding to the current value of Mt . The corresponding pseudo-code for this approach is given as Algorithm 8.3 or, alternatively, as Algorithm 8.4.

This method is quite different from the method of Shamir and Schroeppel, but it gives similar performance. Moreover, it removes the need to deal with balanced trees altogether and is thus simpler to implement.
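For concreteness, the method of [CJM02] described above can be sketched in Python as follows. Dictionaries stand in for the sorted middle lists, which changes constant factors but not the shape of the algorithm; the helper names are ours.

```python
from collections import defaultdict

def pairs_with_high_bits(A, B, Mt, shift):
    """Yield (a, b) in A x B such that the t high bits of a ^ b equal Mt,
    where shift = n - t.  B is indexed by its high bits first."""
    by_high = defaultdict(list)
    for b in B:
        by_high[b >> shift].append(b)
    for a in A:
        for b in by_high[(a >> shift) ^ Mt]:
            yield a, b

def xor_quadrisection(L1, L2, L3, L4, n, t):
    """All (x1, x2, x3, x4) with x1 ^ x2 ^ x3 ^ x4 == 0 over n-bit values,
    in the spirit of Algorithm 8.3: enumerate the shared middle value Mt
    of the t high bits, keeping only the middle lists for the current Mt."""
    shift, out = n - t, []
    for Mt in range(1 << t):
        # Middle list for the left side: sums x1 ^ x2 with high bits Mt.
        left = defaultdict(list)
        for x1, x2 in pairs_with_high_bits(L1, L2, Mt, shift):
            left[x1 ^ x2].append((x1, x2))
        # Right-side pairs with the same high bits, collided against left.
        for x3, x4 in pairs_with_high_bits(L3, L4, Mt, shift):
            for x1, x2 in left[x3 ^ x4]:
                out.append((x1, x2, x3, x4))
    return out
```

With lists of size 2^{n/4} and t = n/4, each iteration holds middle lists of expected size 2^{n/4}, matching the analysis above.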

8.2.2 Generalization to diļ¬erent groups

In the general setting, the group G from which elements of our four lists are

drawn is an arbitrary group. For now, we assume that this group is abelian.

In that case, the classification of finite abelian groups tells us that G is isomorphic to an additive group:

Z/n1 Z Ć— Z/n2 Z Ć— Ā· Ā· Ā· Ć— Z/nt Z,

where each ni divides niā’1 . As a consequence, it is natural to ļ¬rst consider

the case where G has this speciļ¬c form. In the previous section, we dealt with

one extreme example of this case: (Z/2Z)^n.

Here, the size of the group G is N , the product of the ni values. As usual, we expect a solution when the size of the lists is of the order of N^{1/4}. Depending on the distribution of the ni values, several approaches are possible. The first approach works when some partial product of the ni values is of the order of N^{1/4}. In that case, we can write G as a direct product G1 × G2 by grouping all the contributions to this partial product in G1 and the rest in G2 . In this direct product, G1 is of size around N^{1/4} and G2 of size around N^{3/4}. Due

Ā© 2009 by Taylor and Francis Group, LLC


Algorithm 8.3 Generating all solutions to Equation (8.10)

Require: Input four lists L1 , L2 , L3 , L4 of ≈ 2^{n/4} n-bit values each

Require: Parameter t

Sort L2 and L4

Allocate arrays L and L′ of size 2^{n/2−t} plus some margin

for Mt from 0 to 2^t − 1 do

Reinitialize L and L′ to empty arrays

for all x1 ā L1 do

Search ļ¬rst occurrence x2 of ([x1 ]t ā• Mt ) in L2 high bits

while [x2 ]t = [x1 ]t ā• Mt do

Add x1 ā• x2 to list L {Keeping track of x1 and x2 values}

Skip to next x2 in L2

end while

end for

for all x3 ā L3 do

Search ļ¬rst occurrence x4 of [x3 ]t ā• Mt in L4 high bits

while [x4 ]t = [x3 ]t ā• Mt do

Add x3 ⊕ x4 to list L′ {Keeping track of x3 and x4 values}

Skip to next x4 in L4

end while

end for

Sort L and L′

Find and output all collisions between L and L′ as quadruples (x1 , x2 , x3 , x4 ).

end for

Ā© 2009 by Taylor and Francis Group, LLC


Algorithm 8.4 Alternative option to Algorithm 8.3

Require: Input four lists L1 , L2 , L3 , L4 of ≈ 2^{n/4} n-bit values each

Require: Parameter t

Sort L2 and L4

Allocate array L of size 2^{n/2−t} plus some margin

for Mt from 0 to 2^t − 1 do

Reinitialize L to an empty array

for all x1 ā L1 do

Search ļ¬rst occurrence x2 of ([x1 ]t ā• Mt ) in L2 high bits

while [x2 ]t = [x1 ]t ā• Mt do

Add x1 ā• x2 to list L {Keeping track of x1 and x2 values}

Skip to next x2 in L2

end while

end for

Sort L

for all x3 ā L3 do

Search ļ¬rst occurrence x4 of [x3 ]t ā• Mt in L4 high bits

while [x4 ]t = [x3 ]t ā• Mt do

Search x3 ā• x4 by dichotomy in L

if x3 ā• x4 found then

Output collision as a quadruple (x1 , x2 , x3 , x4 ).

end if

Skip to next x4 in L4

end while

end for

end for

Ā© 2009 by Taylor and Francis Group, LLC


to this repartition, a straightforward adaptation of Algorithm 8.3 gives the

desired result. It suļ¬ces to take as middle value Mt the projection of group

elements on the ļ¬rst coordinate G1 . Then, for each element x1 of the ļ¬rst list

L1 , with first coordinate [x1 ]G1 , it is easy to find all elements x2 from L2 with

a G1 -coordinate that satisļ¬es:

[x1 ]G1 + [x2 ]G1 = Mt . (8.12)

Similarly, we ļ¬nd pairs (x3 , x4 ) with:

[x3 ]G1 + [x4 ]G1 = ā’Mt . (8.13)

To ļ¬nalize the computation, it suļ¬ces to search for collisions between G2

coordinates of the values of x1 + x2 and ā’(x3 + x4 ) occurring in the middle

lists.
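This adaptation can be sketched in Python for the special case G = Z/n1Z × Z/n2Z, where Mt ranges over the G1-coordinate and only the middle list for the current Mt is kept. Function and variable names are illustrative.

```python
from collections import defaultdict

def product_group_quadrisection(L1, L2, L3, L4, n1, n2):
    """All quadruples of elements (a, b) of Z/n1 x Z/n2 summing to (0, 0).
    Mt enumerates the shared G1-coordinate of x1 + x2, as in the adaptation
    of Algorithm 8.3; middle lists exist only for the current Mt."""
    out = []
    by_first2 = defaultdict(list)
    for x2 in L2:
        by_first2[x2[0]].append(x2)
    by_first4 = defaultdict(list)
    for x4 in L4:
        by_first4[x4[0]].append(x4)
    for Mt in range(n1):
        left = defaultdict(list)
        for x1 in L1:
            # Pairs with [x1]_{G1} + [x2]_{G1} = Mt, keyed by G2-coordinate.
            for x2 in by_first2[(Mt - x1[0]) % n1]:
                left[(x1[1] + x2[1]) % n2].append((x1, x2))
        for x3 in L3:
            # Pairs with [x3]_{G1} + [x4]_{G1} = -Mt, matched on G2.
            for x4 in by_first4[(-Mt - x3[0]) % n1]:
                for x1, x2 in left[-(x3[1] + x4[1]) % n2]:
                    out.append((x1, x2, x3, x4))
    return out
```

Each solution fixes Mt = (x1 + x2 mod n1), so every quadruple is reported exactly once.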

When it is not possible to write G as a direct product with a correct size

repartition, the four-list problem needs a diļ¬erent approach. To illustrate

this, let us consider G = Z/pZ for a prime p. Here, G cannot be written as a

proper direct product. Instead, a simple alternative is to lift the problem to

Z and remark that for any solution:

x1 + x2 + x3 + x4 = 0 (mod p), (8.14)

the integer x1 + x2 + x3 + x4 is either 0, p, 2p or 3p, assuming that each

representative xi is chosen in [0, p ā’ 1]. Moreover, the value 0 can only be

reached when x1 = x2 = x3 = x4 = 0. As a consequence, over Z/pZ we can

use three consecutive applications of the approach of Shamir and Schroeppel

described in Section 8.1 and solve the four-list problem without using too

much memory.
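The three consecutive applications of Shamir and Schroeppel's approach over Z/pZ can be sketched as follows, assuming non-empty lists. A heap-based generator streams the sums x1 + x2 in increasing order and x3 + x4 in decreasing order, and one merge pass per lifted target value T ∈ {0, p, 2p, 3p} extracts the solutions. A minimal sketch with illustrative names.

```python
import heapq

def stream_sums(A, B, reverse=False):
    """Lazily yield (a + b, (a, b)) over A x B ordered by the sum,
    increasing (or decreasing if reverse=True), Shamir-Schroeppel style:
    one heap node per element of A, so memory is O(|A| + |B|)."""
    sign = -1 if reverse else 1
    B = sorted(B, reverse=reverse)
    heap = [(sign * (a + B[0]), a, 0) for a in A]
    heapq.heapify(heap)
    while heap:
        key, a, i = heapq.heappop(heap)
        yield sign * key, (a, B[i])
        if i + 1 < len(B):
            heapq.heappush(heap, (sign * (a + B[i + 1]), a, i + 1))

def four_list_mod_p(L1, L2, L3, L4, p):
    """Solutions of x1+x2+x3+x4 = 0 (mod p), xi in [0, p-1], by lifting to
    Z: the integer sum must be 0, p, 2p or 3p.  One merge pass per target,
    scanning x1+x2 upward against x3+x4 downward, in low memory."""
    out = []
    for T in (0, p, 2 * p, 3 * p):
        gen_a = stream_sums(L1, L2)                 # increasing x1 + x2
        gen_b = stream_sums(L3, L4, reverse=True)   # decreasing x3 + x4
        a, pa = next(gen_a, (None, None))
        b, pb = next(gen_b, (None, None))
        while a is not None and b is not None:
            if a + b < T:
                a, pa = next(gen_a, (None, None))
            elif a + b > T:
                b, pb = next(gen_b, (None, None))
            else:
                # Collect the full runs of equal sums on both sides, so
                # that every cross combination is reported exactly once.
                run_a, va = [pa], a
                a, pa = next(gen_a, (None, None))
                while a == va:
                    run_a.append(pa)
                    a, pa = next(gen_a, (None, None))
                run_b, vb = [pb], b
                b, pb = next(gen_b, (None, None))
                while b == vb:
                    run_b.append(pb)
                    b, pb = next(gen_b, (None, None))
                for x1, x2 in run_a:
                    for x3, x4 in run_b:
                        out.append((x1, x2, x3, x4))
    return out
```

Since each quadruple has a single integer sum, no solution is reported twice across the four targets.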

The same approach is easily generalized to a direct product with a bad

repartition. In that case, we write G = Z/N1 Z Ć— G2 where N1 is an integer

greater than the size of G to the power 3/4 and G2 is the remaining part

of the group structure. Denoting elements of G as pairs (x, y) and using

the approach of Shamir and Schroeppel, it is easy to create the sublist of

values x1 + x2 in increasing order over the integers and the sublist of values x3 + x4 in decreasing order. From these lists, we extract partial solutions with x-coordinates summing up to either N1 , 2N1 or 3N1 . Checking that, in addition, the y-coordinates also sum to zero in G2 is a simple matter.

8.2.2.1 Badly presented groups

When the group G is not directly given in additive form as

Z/n1 Z Ć— Z/n2 Z Ć— Ā· Ā· Ā· Ć— Z/nt Z,

it is not completely clear whether and how a reduced memory birthday para-

dox approach can be followed. Once again, several cases need to be consid-

ered. If the cardinality of G is known and if its factorization is accessible, the

Ā© 2009 by Taylor and Francis Group, LLC
