
sort, is a straight implementation of the divide and conquer idea. It works by dividing the list into two sublists of half size, sorting the sublists and merging the two sorted sublists into a single sorted list. The basic idea for merging is already described in Section 6.3 as a way of finding collisions between two sorted lists. Writing down the merge sort algorithm from this description is a simple matter. The main difficulty comes from merging: when merging two sorted lists into a single one, it is essential to have some place to store the merged list, because it is very hard to do the merge in place, reusing the storage of the two sublists. As a consequence, merge sort requires extra memory in addition to the memory used by the input array. With a little care, a single auxiliary array of the same size as the input array suffices. One approach is to copy the original array into the auxiliary array, to sort both halves of the auxiliary array and to merge them back into the original. When sorting the halves of the auxiliary array, the roles of original and auxiliary arrays are reversed. Moreover, most of the copy operations can be removed: at each stage after the first one, copying the current original arrays to the auxiliary ones simply rewrites the same data over and over again. In fact, if the array's size is a power of two, it is even better to avoid the copying altogether and directly start by merging. First merge small lists and proceed upward until the final merging is done. With luck, if the number of stages is even, the sorted list already is in the correct array. Otherwise, copy it once from the auxiliary to the main array. However, when the size of the array is not a power of two, the depth of the computation tree is not

© 2009 by Taylor and Francis Group, LLC

200 Algorithmic Cryptanalysis

uniform and making an initial copy is a great simplification. To distinguish the top level, where the auxiliary array is allocated and the initial copy is done, from the other levels, we split the algorithm into two parts: a wrapper (Algorithm 6.8) and the main procedure (Algorithm 6.7).

Algorithm 6.7 Merge sort main procedure
Require: Input an array X[start · · · end] of end − start + 1 elements
Require: Input an auxiliary array Y[start · · · end]
If start ≥ end, the array is already sorted, return
Let mid ←− ⌊(start + end)/2⌋
Recursively merge sort Y[start · · · mid] with auxiliary array X
Recursively merge sort Y[mid + 1 · · · end] with auxiliary array X
(Note the reversal of main and auxiliary array in the calls)
Let start1 ←− start
Let start2 ←− mid + 1
Let merge ←− start
while start1 ≤ mid and start2 ≤ end do
  if Y[start1] ≤ Y[start2] then
    Let X[merge] ←− Y[start1]
    Increment start1
  else
    Let X[merge] ←− Y[start2]
    Increment start2
  end if
  Increment merge
end while
while start1 ≤ mid do
  Let X[merge] ←− Y[start1]
  Increment start1
  Increment merge
end while
while start2 ≤ end do
  Let X[merge] ←− Y[start2]
  Increment start2
  Increment merge
end while
Return sorted array X[start · · · end]

Merge sort is a very efficient algorithm; however, since it requires additional memory, it cannot be used when the array to be sorted fills the computer's main memory. In that case, it is better to use a different sorting algorithm. Note that using a more complicated merging method, it is also possible to


The birthday paradox: Sorting or not? 201

Algorithm 6.8 Merge sort wrapper
Require: Input an array X[start · · · end] of end − start + 1 elements
If start ≥ end, the array is already sorted, return
Create an auxiliary array Y[start · · · end]
Copy X into Y
Call merge sort procedure on X[start · · · end] with auxiliary array Y
Free the memory occupied by Y
Return sorted array X[start · · · end]

devise an in-place variation of the merge sort algorithm, e.g., see [KPT96].
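As a concrete illustration, here is a minimal Python sketch of Algorithms 6.7 and 6.8 (the function names are ours, not the book's): the wrapper makes the single initial copy, and each recursion level swaps the roles of the main and auxiliary arrays.

```python
def merge_sort_main(X, Y, start, end):
    """Sort the contents of Y[start..end] into X[start..end]."""
    if start >= end:
        return
    mid = (start + end) // 2
    # Note the reversal of main and auxiliary arrays in the recursive calls.
    merge_sort_main(Y, X, start, mid)
    merge_sort_main(Y, X, mid + 1, end)
    s1, s2, m = start, mid + 1, start
    while s1 <= mid and s2 <= end:       # merge the two sorted halves of Y
        if Y[s1] <= Y[s2]:
            X[m] = Y[s1]; s1 += 1
        else:
            X[m] = Y[s2]; s2 += 1
        m += 1
    while s1 <= mid:                     # copy the leftover of the first half
        X[m] = Y[s1]; s1 += 1; m += 1
    while s2 <= end:                     # copy the leftover of the second half
        X[m] = Y[s2]; s2 += 1; m += 1

def merge_sort(X):
    """Wrapper (Algorithm 6.8): allocate the auxiliary array and copy once."""
    if len(X) <= 1:
        return
    Y = X[:]                             # the single initial copy
    merge_sort_main(X, Y, 0, len(X) - 1)
```

The role reversal works because, after the initial copy, both arrays hold the same data, so each recursive call can read from one array and write into the other.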

6.3.1.2.2 Quicksort. In order to largely reduce the need for additional memory, quicksort implements the divide and conquer idea in a different way. At each stage, it randomly chooses an element in the list and uses this element, called the pivot, to divide the list into two sublists. The first sublist contains all elements smaller than the pivot and the second sublist the other elements. During the stage, elements are moved around to put the list in a new order, with the first sublist at the beginning, the pivot in the middle and the second sublist at the end. This can be done in linear time, without requiring extra memory. If the two sublists are then independently sorted, it is clear that the list itself becomes sorted. Despite the simplicity of the basic idea, this algorithm presents several complex issues. First of all, the need to choose the pivot at random is essential. If the pivot is chosen in a deterministic way, then it is possible to construct adversarial lists which require quadratic time to be sorted. Second, making the complexity analysis precise is a difficult matter, because the running time depends on the random choices made by the algorithm, and making the bad choice at each stage leads to a quadratic running time. However, with overwhelming probability, the size of the sublists decreases quickly between stages and the running time stays in the range O(N log N). Finally, to avoid bad lists leading to quadratic running time, it is also necessary to specify more precisely what happens when we encounter an element which is equal to the pivot. If we decide that all such elements go to the first (or second) sublist, we are in trouble: in that case, sorting a list containing N copies of the same element takes quadratic time! To avoid this, the simplest approach is to put elements in the first sublist if they occur before the pivot in the original list and to put them in the second sublist otherwise.

Despite these difficulties, quicksort is a very useful practical algorithm. Moreover, it is quite easy to detect bugs which lead to quadratic running time. Indeed, due to the recursive nature of the algorithm, each time the program enters a deeper stage, it adds some elements to the recursion stack. When a bug leads to quadratic behavior, the size of the stack needs to become linear in N, which quickly leads to an overflow. When such an overflow occurs, the program aborts and reports an error. This misbehavior usually suffices to find the bug. Note that the recursive nature of the algorithm also implies that quicksort still requires more than a constant amount of memory in addition to the original array. The additional memory needed is proportional to the depth of the computation tree, i.e., it is of the form O(log N).

Algorithm 6.9 Quicksort
Require: Input an array X[start · · · end] of end − start + 1 elements
If start ≥ end, the array is already sorted, return
Choose a random position pos in [start · · · end]
Copy start to start0
Copy end to end0
Let Y ←− X[pos]
Let X[pos] ←− X[start]
Let i ←− start + 1
while i ≤ end do
  if (X[i] < Y) or (X[i] = Y and i < pos) then
    Let X[start] ←− X[i]
    Increment start
    Increment i
  else
    Exchange X[end] and X[i]
    Decrement end
  end if
end while
Let X[start] ←− Y
Recursively apply quicksort to X[start0 · · · start − 1]
Recursively apply quicksort to X[end + 1 · · · end0]
Return sorted array X[start0 · · · end0]
(At the end of the loop, start = end and the pivot sits at this position, so both recursive calls exclude it.)
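Algorithm 6.9 can be sketched in Python as follows (a hedged illustration; identifier names are ours). The pivot is saved in Y, which opens a "hole" at position start; small elements are shifted into the hole while large ones are swapped to the tail.

```python
import random

def quicksort(X, start=0, end=None):
    """In-place quicksort with a random pivot and positional tie-breaking."""
    if end is None:
        end = len(X) - 1
    if start >= end:
        return
    start0, end0 = start, end
    pos = random.randint(start, end)     # a random pivot position is essential
    Y = X[pos]                           # pivot value
    X[pos] = X[start]                    # open a hole at position start
    i = start + 1
    while i <= end:
        # Equal elements go left exactly when they occur before the pivot.
        if X[i] < Y or (X[i] == Y and i < pos):
            X[start] = X[i]              # move small element into the hole
            start += 1
            i += 1
        else:
            X[end], X[i] = X[i], X[end]  # send large element to the tail
            end -= 1
    X[start] = Y                         # the pivot lands in its final place
    quicksort(X, start0, start - 1)
    quicksort(X, start + 1, end0)
```

Excluding the pivot position from both recursive calls guarantees that each call works on a strictly smaller range, so the recursion terminates even on lists of identical elements.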

6.3.1.2.3 Radix sort. From the descriptions of merge sort and quicksort, it is natural to ponder whether it is possible to devise a sort algorithm that requires neither additional memory nor randomness. Radix sort neatly solves this problem. However, this algorithm is not generic: it requires additional knowledge about the order used for sorting. For many cryptographic applications, this is not a problem at all, but it makes radix sort badly suited in other cases, especially when sorting complex objects, such as, for example, database entries. In fact, there are two different versions of radix sort, which sort by considering bits either from left to right or from right to left. Here, we only consider the left to right version. To explain this version of radix sort, let us assume that we want to sort an array of N unsigned n-bit numbers. We know that in the sorted array, the numbers with a high order bit set to 0 occur before the numbers with a high order bit of 1. Similarly, within each of these subarrays, the numbers should be sorted according to the value of the second high order bit. And so on, until the low order bit is reached. The complexity analysis is simple: each stage, which consists of moving numbers with some bit equal to 0 before numbers with the bit equal to 1, can be performed in linear time. Since there is one stage per bit, the complexity is O(nN). With very large numbers, when n ≫ log N, this is worse than merge sort or quicksort. However, it is not a very frequent case. Moreover, if the numbers are (or look) random, the complexity can be improved with a very simple change: simply abort the recursion when a list of zero or one element is to be sorted. With random numbers, this cuts off the depth of the computation tree around log2 N, thus improving the complexity as required.

Radix sort can also be used with signed numbers. In that case, we should remember that the high order bit is the sign bit. As a consequence, numbers with a high bit set to 1 need to be at the beginning, because they are negative. Thus, reversing the roles of the first and second sublists for this bit is the only change needed to deal with signed numbers. For all other bits, we should proceed as we did with unsigned numbers (assuming that negative numbers are represented using the two's complement convention).

Algorithm 6.10 Radix sort
Require: Input an array X[start · · · end] of end − start + 1 unsigned integers
Require: Bit position b to consider in the integers
If start ≥ end or b < 0, the array is already sorted, return
Copy start to start0
Copy end to end0
while start ≤ end do
  if Bit b of X[start] is 0 then
    Increment start
  else
    Exchange X[end] and X[start]
    Decrement end
  end if
end while
Recursively radix sort X[start0 · · · start − 1] on bit b − 1
Recursively radix sort X[end + 1 · · · end0] on bit b − 1
Return sorted array X[start0 · · · end0]
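Here is a Python sketch of Algorithm 6.10 (names are illustrative), including the early abort on lists of at most one element and the stop when the bits are exhausted:

```python
def radix_sort(X, b, start=0, end=None):
    """Sort X[start..end] in place on bits b, b-1, ..., 0 (MSB first)."""
    if end is None:
        end = len(X) - 1
    if start >= end or b < 0:          # zero/one element, or no bits left
        return
    start0, end0 = start, end
    while start <= end:                # partition on bit b
        if (X[start] >> b) & 1 == 0:
            start += 1                 # already in the 0-part, move on
        else:
            X[end], X[start] = X[start], X[end]
            end -= 1                   # send it to the 1-part at the tail
    radix_sort(X, b - 1, start0, start - 1)   # 0-part, next bit
    radix_sort(X, b - 1, end + 1, end0)       # 1-part, next bit
```

For n-bit numbers, the top-level call is `radix_sort(X, n - 1)`.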

6.3.1.2.4 Heap sort. From a theoretical point-of-view, heap sort is very interesting. First, it is a non-recursive algorithm and thus it can truly achieve in-place sorting: in addition to the array being sorted, it only requires a constant amount of memory. Second, it uses a very nice implicit tree structure as the basis of the algorithm. This tree structure is called a heap; it is a binary tree where the value of each node is larger⁴ than or equal to the values of all nodes in the subtree attached to it. The tree structure is implicit because it does not use any pointers, but instead works with the convention that the left and right children of the element in position i are respectively stored⁵ in positions 2i + 1 and 2i + 2. Whenever the position of a child is beyond the end of the array, it simply means that the child does not exist. For example, when position 2i + 1 is not in the array, then node i is a leaf. When 2i + 1 is the last position in the array, then node i has a single child (on the left).

Using this structure, heap sort works in two phases. During the first phase, it takes the input array with the implicit tree structure and reorders it to satisfy the heap property. This is done by making sure that each element is larger than its children. During the second phase, it progressively reduces the size of the tree by taking out the largest element and placing it where it belongs in the sorted array. During this second phase, one part of the array is used to store a heap tree and the other part to store an already sorted part of the array. At the end of the second phase, the tree part is empty and the array is fully sorted.

To write down the heap sort algorithm, we first need a heap insertion procedure which allows us to correctly add elements to an existing heap. This insertion works as follows: assuming that the subtrees starting in positions 2i + 1 and 2i + 2 are correct heap trees, it inserts an element in position i and, if necessary, moves it down the tree to create a correct heap tree rooted at position i. During the first phase, this procedure is used iteratively on all positions, starting from the end of the original array. Of course, noting that subtrees with a single node are always heap trees, there is nothing to be done for the second half of the array and we may start in the middle. During the second phase, we take the element at the root of the heap tree, i.e., the largest element in the tree, and put it at the beginning of the sorted part. Since this partially breaks the heap structure, we use the insertion procedure again. More precisely, we move the last element of the previous heap to the root and apply the insertion to guarantee the heap property on a new tree, with one fewer element. Both phases of the heap sort algorithm require N calls to the insertion procedure, whose complexity is bounded by the tree depth. As a consequence, the total runtime is O(N log N) as with the other fast sort algorithms.

Despite its good complexity analysis, in practice, heap sort does not perform

very well, especially on large arrays, because it causes many cache misses when

moving elements around during the heap insertion.

⁴There is, of course, a variation of the heap tree where each node is smaller than its successors, but this is not the version we use here.

⁵This assumes that the array starts at position 0; for arrays starting at position 1, we should use positions 2i and 2i + 1 instead.


Algorithm 6.11 Heap sort
Require: Input an array X[0 · · · end − 1] of end elements
First phase: Heap creation
for i from ⌊(end − 2)/2⌋ down to 0 do
  (Note: Subtrees at positions 2i + 1 and 2i + 2 are already correct)
  Call heap insertion on position i with tree size end
end for
Second phase: Sorting
while end > 1 do
  Put largest element at the end: Exchange X[0] and X[end − 1]
  Decrement end
  Call heap insertion on position 0 with (reduced) tree size end
end while
Return sorted array X

Algorithm 6.12 Insertion in heap procedure
Require: Input an array X[0 · · · end − 1] of end elements
Require: Input a position pos where the heap property must be enforced
while 2 · pos + 1 ≤ end − 1 do
  Let val ←− X[pos]
  Let dest ←− pos
  if X[2 · pos + 1] > val then
    Let val ←− X[2 · pos + 1]
    Let dest ←− 2 · pos + 1
  end if
  if 2 · pos + 2 ≤ end − 1 and X[2 · pos + 2] > val then
    Let dest ←− 2 · pos + 2
  end if
  if dest ≠ pos then
    Exchange X[pos] and X[dest]
    Let pos ←− dest {loop continues at lower level of the tree}
  else
    Break from loop {The tree is now a heap}
  end if
end while
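The two algorithms above can be sketched together in Python (our names, not the book's), using the implicit convention that the children of node i sit at positions 2i + 1 and 2i + 2:

```python
def heap_insert(X, pos, size):
    """Sift X[pos] down until the subtree rooted at pos is a heap."""
    while 2 * pos + 1 <= size - 1:
        val, dest = X[pos], pos
        if X[2 * pos + 1] > val:              # left child larger?
            val, dest = X[2 * pos + 1], 2 * pos + 1
        if 2 * pos + 2 <= size - 1 and X[2 * pos + 2] > val:
            dest = 2 * pos + 2                # right child is the largest
        if dest != pos:
            X[pos], X[dest] = X[dest], X[pos]
            pos = dest                        # continue at the lower level
        else:
            break                             # the subtree is now a heap

def heap_sort(X):
    """In-place heap sort (Algorithm 6.11)."""
    end = len(X)
    # First phase: heap creation, from the last internal node upward.
    for i in range((end - 2) // 2, -1, -1):
        heap_insert(X, i, end)
    # Second phase: repeatedly move the maximum to the sorted tail.
    while end > 1:
        X[0], X[end - 1] = X[end - 1], X[0]
        end -= 1
        heap_insert(X, 0, end)
```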


6.3.1.3 Optimality of sort algorithms and beating the bound

Fast algorithms are able to sort N elements in O(N log N ) time and there is

a trivial lower bound of O(N ) on this running time, since sorting an array at

least requires reading the N elements in the array and writing the sorted array

as output. A natural question is to ask whether it is possible to outperform

O(N log N ). In fact, it can be shown that in a generic sense, this is impossible.

However, in some special cases, where additional information is known about

the array and the elements it contains, it is possible to beat this bound and

come up with linear time special purpose sort algorithms.

Let us first explain why O(N log N) is optimal. For that, assume that we need to sort an array of N distinct elements. We also assume that the array is accessed only in a limited way: elements can be compared and moved around, nothing else. For our array of N elements, N! different orders are possible. Moreover, given the sorted array that the algorithm outputs and the computation trail, it is possible to reconstruct the initial order of elements. As a consequence, the algorithm has to learn enough information to be able to fully reconstruct the input order. The number of bits necessary to encode this order is log2(N!) ≈ N log2 N. Since any comparison reveals at most one bit of information, the algorithm requires on average at least N log2 N comparisons. This yields the desired lower bound on the running time.
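The approximation log2(N!) ≈ N log2 N can be checked numerically (this snippet is our illustration, not from the text; it uses the log-gamma function, since log Γ(N + 1) = ln N!):

```python
import math

N = 1_000_000
bits = math.lgamma(N + 1) / math.log(2)   # log2(N!) computed exactly enough
approx = N * math.log2(N)                 # the Stirling-style estimate
# bits is slightly below approx (Stirling: log2(N!) = N log2 N - N log2 e + ...),
# confirming that about N log2 N comparison bits are needed.
```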

To show that this bound can be beaten in some special cases, assume that we want to sort an array of N integers in [1 · · · λN] for some constant λ. Consider Algorithm 6.13. The initialization of the algorithm costs O(λN), the first loop costs O(N), the second loop costs O(λN) for the control and O(N) for the printing. All in all, since λ is a constant, this algorithm's running time is O(N). Clearly, the algorithm outputs numbers in increasing order, since it only writes down the loop variable i. Moreover, it writes i as many times as i appears in X. Thus, the algorithm outputs a sorted copy of array X. It beats the lower bound on sort algorithms because it is able to obtain information about X through an additional channel. This algorithm does not restrict itself to comparisons; in fact, it does not perform any comparisons at all, only (implicit) equality tests.

Algorithm 6.13 Count sort: Beating the sort lower bound
Require: Input an array X of N elements in [1 · · · λN]
Allocate an array M of λN elements initialized to zero
for i from 1 to N do
  Increment M[X[i]]
end for
for i from 1 to λN do
  Print M[i] times the number i
end for
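Algorithm 6.13 translates directly to Python (we return the sorted list instead of printing it, an illustrative change):

```python
def count_sort(X, lam):
    """Sort N integers from [1..lam*N] in O(N) time for constant lam."""
    N = len(X)
    M = [0] * (lam * N + 1)        # counters, indexed 1..lam*N
    for x in X:
        M[x] += 1                  # count occurrences, no comparisons at all
    out = []
    for i in range(1, lam * N + 1):
        out.extend([i] * M[i])     # output i as many times as it appears
    return out
```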


6.3.1.4 Sort algorithms and stability
When using sort algorithms with duplicate elements, we can ask what happens to equal elements: do they remain in the same relative order or can they be moved around? If a sort algorithm always preserves the relative order of copies of the same element, it is called a stable sort algorithm. Since this can sometimes be an important property, it is useful to know which sort algorithms are stable and which are not. Of course, any sort algorithm can be made stable by simply adding the original position to each element and using this value to break ties between identical entries. The problem with this approach is that it requires a potentially large amount of extra memory to store the initial positions. As a consequence, when needed, an intrinsically stable sort is preferable. Among our sort algorithms with quadratic running time, it is easy to check that bubble sort and selection sort are stable. For insertion sort, the sort is stable if and only if the dichotomy search we use always returns the last position where an element can be inserted. We leave this verification as an exercise.

Of course, it would be interesting to use an algorithm which is both fast and stable. Among our selection of sort algorithms, only merge sort has this extra property. Quicksort and radix sort can reasonably easily be made stable if the sorted elements are written in a copy of the initial array. However, the versions that do not need such a copy move elements around too much to satisfy this property.
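The generic stabilization trick mentioned above can be sketched in Python (our helper, not from the text): attach each record's original position and let it break ties, at the cost of O(N) extra memory.

```python
def stable_by_index(records, key, unstable_sort):
    """Make any in-place sort stable by decorating with original positions."""
    # Tuples compare lexicographically: equal keys fall back to the index,
    # which is unique, so payloads are never compared.
    decorated = [(key(r), i, r) for i, r in enumerate(records)]
    unstable_sort(decorated)       # any correct (even unstable) sort works
    return [r for _, _, r in decorated]
```

Because the index is unique, ties on the key are always resolved in favor of the earlier original position, which is exactly the stability property.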

6.3.2 Hash tables
In a book about cryptography, when hash functions are discussed, the reader is usually expected to think about cryptographically strong hash functions, resistant to collisions, preimages and other attacks. This section is an exception to this general rule: here we consider algorithmic hash functions and only require that, for ordinary inputs (not selected by an adversary) to these functions, the outputs of the hash functions are well distributed. When looking for collisions, such a non-cryptographic hash function can be used to filter out most non-colliding pairs of elements. The key requirement is that, given two distinct elements, their hash values are rarely equal. Of course, given two equal elements, their hash values are always equal, so using a hash function to filter pairs cannot miss collisions. In practice, depending on the set of inputs, the hash functions we use can be extremely simple. The most extreme case is encountered with random numbers, say on n bits. There, it suffices to truncate the numbers, keeping b bits only, to get a useful hash function. For this purpose, we can use the low order bits, the high order bits or anything else that fits well with the particular problem being considered. It is clear that this process cannot qualify as a strong cryptographic hash function; for example, finding collisions in this function is straightforward.

To find collisions among n-bit numbers using a non-cryptographic hash function h, we proceed as follows. Assuming that the hash values are b-bit numbers, we create a large hash table of 2^b numbers, which can be indexed by hash values. This table is initialized to some special value ⊥. Then we take numbers of the list one by one, and process number x by writing x in the hash table T at position h(x) when the position still contains the special value ⊥. Otherwise, we test whether x = T[h(x)]; if so, since T[h(x)] is a previous element, we have found a collision. A very important remark is that this approach may miss some collisions. For example, assume that the list of elements contains, in this order, three elements x, y and z with h(x) = h(y) = h(z), y ≠ z and x ≠ y. We also assume that no other element in the list has the same hash value h(x). Then, during the execution of the algorithm, x is written in T[h(x)] since the position h(x) in table T is empty when x is processed. When we encounter y, we see that T[h(y)] is not empty and that y ≠ T[h(y)] (= x), so no collision is detected. Similarly, for z we see that z ≠ T[h(z)] (= x), and do not detect a collision. All in all, we missed the collision between y and z. In order to solve this problem, we should make sure that when a false collision is detected, the corresponding element is not dismissed but stored somewhere. In fact, this is a classical problem when using hash tables for other purposes such as database indexing. In these alternative contexts, collision problems between hash values must be solved and specific collision resolution techniques have been devised. However, in our context, it is often easier to choose the parameter b in order to avoid this problem altogether or, at least, with high probability.

In order to use hash tables for collision finding as in Algorithm 6.14, it is important to choose the size of indexes correctly. Assume that we want to find collisions in a list of L elements using a table indexed by b bits. To avoid collision resolution, we should take care to avoid multicollisions involving 3 or more elements. From the analysis of Section 6.2.1, assuming that the hash values behave as random numbers, such multicollisions appear when:

L^3/6 ≈ 2^(2b).    (6.5)

Forgetting the constant divisor 6, we find that we need to choose b ≈ 1.5 log2 L. Assuming that we search a collision between n-bit numbers and that we have the usual case L ≈ 2^(n/2), this yields b ≈ 3n/4.

Since this value of b is quite large, it is important to refine the analysis, assuming that we are ready to accept some probability of failure. Clearly, if we lower b, 3-multicollisions between hash values are going to occur. These multicollisions make us fail when they hide a real collision on the original numbers, as in the x, y, z example we described earlier. Note that we miss the collision if and only if the bad element occurs first. Since the numbers essentially appear in a random order, we miss the collision with probability 1/3. Thus, preventing 3-multicollisions is not essential; it suffices to prevent 4-multicollisions⁶. Redoing the analysis, we start with:

L^4/24 ≈ 2^(3b),    (6.6)

and we find b ≈ 2n/3. Since b is still quite large compared to n, this approach is mostly useful when n is not too large. A typical example is to consider 40-bit numbers and to take b = 26 or maybe 27. With these parameters, the hash table can fit in memory, even if we use a wasteful representation and take 64 bits to represent each 40-bit number. Indeed, the memory requirements are either 512 Mbytes or 1 Gbyte. A noticeable property of using hash tables compared to sort algorithms is the fact that the list of numbers does not need to be completely available at the start of the algorithm; it can be generated online.
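The parameter choices above follow from the general threshold L^t/t! ≈ 2^((t−1)b) for t-multicollisions; the helper below (ours, not from the text, and ignoring constant factors just as the text does) recovers them numerically:

```python
import math

def min_b(n, t):
    """Smallest b with L^t / t! <= 2^((t-1)*b), for L = 2^(n/2)."""
    L_bits = n / 2                      # log2 L for a birthday-size list
    return math.ceil((t * L_bits - math.log2(math.factorial(t))) / (t - 1))

# For n = 40: avoiding 3-multicollisions needs b around 3n/4 = 30,
# while tolerating them and avoiding only 4-multicollisions needs
# b around 2n/3 = 27, matching the b = 26 or 27 of the text.
```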

Algorithm 6.14 Collision search using hash tables
Require: Input (potentially one by one) a list X of N numbers on n bits
Choose b near 2n/3
Allocate a hash table T[0 · · · 2^b − 1] initialized to ⊥
for i from 1 to N do
  if T[h(X_i)] = ⊥ then
    Let T[h(X_i)] = X_i
  else
    if T[h(X_i)] = X_i then
      Output 'Collision found for value X_i'
    end if
  end if
end for
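Algorithm 6.14 can be sketched in Python, here with the b low-order bits as the non-cryptographic hash (one simple choice among those mentioned above; names are ours):

```python
def find_collisions(numbers, b):
    """Report full-value collisions found through a 2^b-entry hash table."""
    mask = (1 << b) - 1
    T = [None] * (1 << b)           # None plays the role of the symbol ⊥
    collisions = []
    for x in numbers:               # the list may also arrive online
        h = x & mask                # h(x): keep the b low-order bits
        if T[h] is None:
            T[h] = x
        elif T[h] == x:
            collisions.append(x)    # true collision on the full value
        # else: a false collision; this simple version drops x and may
        # therefore miss a later true collision, as discussed in the text
    return collisions
```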

6.3.2.1 Hash tables and cache misses
Using hash tables to search for collisions yields a very simple algorithm which is easily implemented. However, in terms of memory accesses, it has a very bad behavior. Indeed, it reads and writes in a large table in memory at essentially random positions. As we already know, with current memory architectures, this behavior causes many cache misses and slows the program down a lot. It would be nice to be able to change the hash table algorithm in order to minimize this difficulty. We now show how this can be done. For this purpose, let us first define b0 as the largest possible bit size such that a hash table with 2^(b0) entries fits into the memory cache. Of course, if b ≤ b0, Algorithm 6.14

⁶We could even go further if we lower again the probability of success, but this particular choice achieves a nice balance.


does not encounter cache misses and we need no improvement. Thus the interesting case happens for b > b0. Since the main memory is usually not arbitrarily larger than the cache memory, it is safe to also assume that b ≤ 2b0. For example, with b0 = 16, i.e., a 64-Kbyte cache, this means that the main memory is smaller than 4 Gbytes. Under this assumption, we divide each b-bit hash value into two parts, say the b0 low order bits on one side and the b1 = b − b0 other bits on the other side. For simplicity, we decompose the hash function h in the same way and get two smaller hash functions, h0 on b0 bits and h1 on b1 bits. Then, we proceed as follows: in a first phase, for each incoming value X, we simply throw it onto one among 2^(b1) stacks, more precisely onto the stack numbered h1(X). Once this first phase is finished, the initial list of numbers has been divided into 2^(b1) stacks. Within each stack, all numbers share the same value for h1, i.e., they share the top b1 bits of their h values. If the inputs are random numbers, the stacks are roughly of the same size. In a second phase, we simply run Algorithm 6.14 independently on the content of each stack in order to find collisions within these stacks. Of course, this yields the same result as a direct application of Algorithm 6.14 on the original list of numbers. This approach is described in pseudo-code as Algorithm 6.15. Due to our hypothesis on b0, we know that no (or few) cache misses occur during this second phase. Of course, we also need to check for cache misses during the first phase. If we represent each stack as an array and write each new element at the end of the corresponding array, we are simply writing elements in memory in an ordered fashion within each array. If the number of arrays is small enough, this does not cause too many cache misses. To estimate the number of arrays, note that we need to store in cache the current starting position within each stack array, together with a few positions at the current start of each stack. Thus, to be safe, the number of stacks 2^(b1) should remain below some fraction of the cache size. In practice, this means that b1 should be smaller than b0 − 5 or something similar. Since we only assumed b1 ≤ b0, there is a remaining gap. If required, this gap can be filled by using the same idea twice: split the original list into stacks, split each stack into substacks and then search for collisions.
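The two-phase approach can be sketched in Python as follows (an illustration with our names; Python lists grow on demand, so the fixed-size stack arrays and the overflow abort of the pseudo-code are not needed here):

```python
def find_collisions_two_phase(numbers, b0, b1):
    """Scatter on the top b1 hash bits, then search each stack with a
    small 2^b0-entry table that would fit in cache."""
    stacks = [[] for _ in range(1 << b1)]
    for x in numbers:
        h1 = (x >> b0) & ((1 << b1) - 1)   # top b1 bits of the b-bit hash
        stacks[h1].append(x)               # sequential writes, cache-friendly
    collisions = []
    mask0 = (1 << b0) - 1
    for stack in stacks:
        T = [None] * (1 << b0)             # small table, rebuilt per stack
        for x in stack:
            h0 = x & mask0                 # low b0 bits
            if T[h0] is None:
                T[h0] = x
            elif T[h0] == x:
                collisions.append(x)
    return collisions
```

All numbers in one stack agree on the h1 part of the hash, so searching each stack with the h0 part alone finds exactly the same collisions as one big 2^(b0+b1)-entry table.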

6.3.3 Binary trees
Another interesting method to find collisions is the use of binary search trees. A binary search tree is a binary tree that satisfies an important additional property: for any node in the tree, all the nodes in its left subtree have smaller (or equal) values and all the nodes in its right subtree have greater values. Binary search trees are represented explicitly, by storing for each node its value and a pointer to each of its children. Due to this representation, they are quite costly in terms of memory usage. From a runtime point-of-view, all operations are fairly quick. Inserting, removing or searching for an element in the tree costs at most the depth of the tree. Since a tree of depth k may hold 2^(k+1) − 1 entries, the elementary cost is logarithmic in the size of the tree,


Algorithm 6.15 Avoiding cache misses with hash tables
Require: Input a list X of N numbers on n bits
Choose b near 2n/3
Split b as b0 + b1
Create 2^(b1) stacks of numbers represented by arrays of size 1.1N × 2^(−b1)
{with a memory margin of 10 percent}
Allocate a hash table T[0 · · · 2^(b0) − 1]
for i from 1 to N do
  Add X[i] to stack number h1(X[i])
end for
for k from 0 to 2^(b1) − 1 do
  If stack k overflowed, Abort
  Reinitialize hash table T[0 · · · 2^(b0) − 1] to ⊥
  for all x in stack k do
    if T[h0(x)] = ⊥ then
      Let T[h0(x)] = x
    else
      if T[h0(x)] = x then
        Output 'Collision found for value x'
      end if
    end if
  end for
end for


assuming that the tree is reasonably full. As a consequence, the main issue when working with binary trees is to control their depths and to check that they do not grow too much.

The favorable case is when a tree is built by inserting randomly selected elements. In that case, the resulting tree is well balanced, all subtrees are similar and the resulting depth is quite small, i.e., it remains within a constant factor of the logarithm of the size. The worst case occurs when a sorted sequence of elements is inserted one at a time to build the tree. In this case, the tree degenerates into a list and its depth is equal to its size. As a consequence, searching for an element takes linear time in the size of the tree.

Because of these extreme behaviors, binary search trees are not suited to

all applications. For a given application, it is prudent to test the trees by

monitoring their depths to see if problems do occur. Thankfully, there exits a

nice solution to ļ¬x the depth problem with binary search trees, this solution

uses AVL trees, named after their inventors Adelson-Velsky and Landis. These

trees are also called self-balancing trees. The key idea of AVL trees is for each

operation that modiļ¬es the tree, insertion or deletion, to check that the tree

does not become too unbalanced and if necessary to take corrective action.

Note that AVL trees do not try to minimize the tree depth, this would be too

costly, instead they simply keep the tree depth within a factor of the optimal

value by using a simple criteria. The criteria used by self-balancing trees works

by memorizing for each node the diļ¬erence between the depth of its left and

right subtrees. Diļ¬erences of either 0, 1 or ā’1 are allowed. Whenever a larger

diļ¬erence appears, the tree is modiļ¬ed to reduce it to an acceptable value.

The transformations used to modify the tree are called rotations. Two basic

rotations are possible, left and right rotations, they are inverse of each other.

Applying the correct rotation maintains the binary search property of the tree

while rebalancing the left and right subtrees of the rotated node. Depending

on the detailed structures of the subtree, rebalancing steps requires either a

single or a double rotation.
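The basic rotation can be sketched in a few lines; the dictionary-based node representation below is ours, chosen for brevity, and balance-factor bookkeeping is omitted.

```python
# Illustrative sketch of a right rotation, the basic rebalancing step of
# self-balancing trees. Nodes are plain dicts with 'key', 'left', 'right'.
def rotate_right(node):
    """Rotate the subtree rooted at node to the right; return the new root.

    The left child (pivot) moves up and node becomes its right child.
    Re-attaching pivot's old right subtree as node's left subtree
    preserves the binary search ordering.
    """
    pivot = node['left']
    node['left'] = pivot['right']   # re-attach pivot's right subtree
    pivot['right'] = node           # node becomes pivot's right child
    return pivot                    # pivot is the new subtree root
```

A left rotation is the mirror image of this code and undoes it, which matches the statement that the two rotations are inverses of each other.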

Use of binary search trees in cryptography. Because of their memory costs, direct application of binary trees to collision search is not a good idea. These structures are better suited to maintaining collections of data that are modified as time goes on. An interesting example that may be encountered is the case where, after an initial construction, the tree evolves in the following way: at each step, remove a tree element, say the smallest one, and add in a new element. Such an example is given in Chapter 8 in order to construct an improved birthday paradox attack against knapsack cryptosystems.

6.3.3.1 Detailed description of basic binary search trees

Using a binary search tree requires the application of several algorithms. Each algorithm starts from a correctly constructed tree and modifies it into a new, also correct, tree. At the very beginning, the tree is initialized to empty.


To encode a tree, we maintain a pointer which points to the root node of the tree. When the tree is empty, the pointer to the root node is the nil pointer. Each node in the tree is a structure which contains the value stored in the node and two pointers, one to the left child and one to the right child. When a pointer to a child is the nil pointer, there is no child on the corresponding side. A node without children is called a leaf; any other node is called an internal node. Optionally, a node may also contain an additional field to memorize the size of the left subtree connected to this node. With this additional field, it becomes possible to recover the i-th element in the binary search tree in logarithmic time. We describe all algorithms for binary search trees with this optional field in place. In some algorithms, it is necessary to be able to backtrack from an element to its father. This can be done in two different ways: the first one is to maintain for each node an additional pointer that memorizes its father; the second one is, when passing an internal node to a tree manipulation algorithm, to pass it along with the full path from the root to this node. The first method is simpler to implement but requires even more memory to encode the trees. An alternative method is never to pass an internal node directly, but to let the manipulation algorithm search for it, thus determining the access path itself. This alternative method may sometimes require the program to perform the same search twice.
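The node structure just described can be written, for instance, as the following Python class; the field names are illustrative choices, not taken from the text.

```python
# Illustrative encoding of a binary search tree node as described above:
# a value, two child pointers (None plays the role of the nil pointer),
# and the optional field counting the size of the left subtree.
class Node:
    __slots__ = ('value', 'left', 'right', 'left_size')

    def __init__(self, value):
        self.value = value
        self.left = None        # nil pointer: no left child
        self.right = None       # nil pointer: no right child
        self.left_size = 0      # size of the left subtree

    def is_leaf(self):
        return self.left is None and self.right is None
```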

Given a binary search tree, the most important algorithm is probably the search algorithm that tries to locate a given value in the tree. This algorithm is quite simple: starting from the root, it compares the reference value to the value stored in the current node. If they are equal, the search is successful; if the reference value is smaller, the algorithm moves to the left child and continues the search; if it is greater, it moves to the right child. Another very important algorithm is the insertion algorithm. Insertion works in two phases: first it searches for the value to be inserted; if the value is already present in the tree, it adds a copy either right after or right before the existing copy. Otherwise, it adds it at the position where the search ended. Since the insertion algorithm contains a search part, we do not describe the search in pseudo-code by itself but include it as part of the insertion Algorithm 6.16. An important part of the insertion algorithm is the ability to find an empty position right before or after an internal node. For an empty position right before a node, move once to the left and then as many times to the right as needed to get to a nil pointer. This nil pointer is the place we looked for. To find the position right after a node, move once right, and continue with left moves until an empty position is reached. While locating the place where the insertion itself is going to happen, each time we move to the left, the field representing the size of the left subtree needs to be incremented in order to remain at the correct value.

Thanks to the optional field that counts the size of the left subtree, we may also search for the i-th element in the tree (we assume as usual that elements are counted from zero). Once again we start at the root and proceed by moving down the tree. If the size of the left subtree for the current node is


Algorithm 6.16 Insertion in a binary search tree
Require: Input tree root R, free node N and value to insert X
  Set the value of N to X
  Set the children of N to nil
  Set the left subtree size of N to 0
  if R is the nil pointer then
    Replace R by N and return
  end if
  Set Current ← R {First search for insertion position}
  while true do
    if Value of Current node < X then
      if Current node has no right child then
        Set right child of Current node to N
        Return
      else
        Let Right be the right child of Current node
        Set Current ← Right
      end if
    else
      Increment the left subtree size field within Current
      if Current node has no left child then
        Set left child of Current node to N
        Return
      else
        Let Left be the left child of Current node
        Set Current ← Left
      end if
    end if
  end while
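Algorithm 6.16 transcribes almost line for line into Python, using None as the nil pointer; the small node class below is ours.

```python
# Near-literal Python transcription of Algorithm 6.16. Nodes carry a
# value, left/right children, and the left-subtree-size field.
class Node:
    def __init__(self, value):
        self.value, self.left, self.right, self.left_size = value, None, None, 0

def insert(root, node):
    """Insert node into the tree rooted at root; return the (new) root."""
    if root is None:
        return node
    current = root
    while True:                          # search for the insertion position
        if current.value < node.value:
            if current.right is None:
                current.right = node
                return root
            current = current.right
        else:
            current.left_size += 1       # node goes into the left subtree
            if current.left is None:
                current.left = node
                return root
            current = current.left
```

Note that an equal value takes the left branch, so a duplicate lands right before the existing copy, as described in the text.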


s, three cases are possible: if i < s, then we need to look for the i-th element in the left subtree; if i = s, then the current node is the right one; if i > s, then we look for element i − s − 1 in the right subtree.
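The three cases give the following selection routine; the helper used here to precompute the left-subtree sizes when building test trees is our own convenience, not part of the algorithm.

```python
# Sketch of i-th element selection (elements counted from zero), using the
# left-subtree-size field s stored in each node, as described above.
def subtree_size(n):
    return 0 if n is None else subtree_size(n.left) + 1 + subtree_size(n.right)

class Node:
    def __init__(self, value, left=None, right=None):
        self.value, self.left, self.right = value, left, right
        self.left_size = subtree_size(left)   # size of the left subtree

def select(node, i):
    while node is not None:
        s = node.left_size
        if i < s:
            node = node.left          # i-th element is in the left subtree
        elif i == s:
            return node.value         # the current node is the right one
        else:
            i -= s + 1                # look for element i - s - 1 on the right
            node = node.right
    raise IndexError("tree has fewer than i+1 elements")
```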

Finally, we may also delete a given tree node. If the node to delete is a leaf, this is quite easy: it suffices to remove it by replacing the pointer to it in its father node by the nil pointer. After that, we need to backtrack the access path to this node all the way up to the root to correct the size of each left subtree where needed. Similarly, to delete a node with a single child, it suffices to remove the node and link its child directly to its father. If the node to delete has two children, deletion is trickier. First, we need to locate the immediate predecessor (or successor). Then, we swap the values stored in the node to delete and in its predecessor. Finally, we delete the predecessor. Since this predecessor has at most one child, it can be deleted using the simple approach. Note that during the deletion algorithm, the binary search tree is temporarily invalid. The value to be deleted may be stored in incorrect order compared to the rest of the tree. However, when the value is finally removed, the tree becomes correct once again. The deletion algorithm is given as Algorithm 6.17.

Algorithm 6.17 Deletion in a binary search tree
Require: Input node N to delete, with path P of ancestors to the root
  if N is a leaf or has at most one child C then
    Replace pointer to N in its father by a pointer to C or to nil
    for all Ancestors A in path P do
      if N is in the left subtree of A then
        Decrement the left subtree size field within A
      end if
    end for
    Return
  else
    Let M be the left child of N
    Let path Q ← (P ‖ N)
    while M has a right child do
      Let path Q ← (Q ‖ M)
      Replace M by its right child
    end while
    Exchange the value fields of M and N
    Call deletion algorithm on M with path Q {M has at most one child}
  end if
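For reference, here is a compact recursive sketch of the same idea. It is ours, assumes distinct values, and omits the left-subtree-size bookkeeping that the iterative Algorithm 6.17 updates along the access path.

```python
# Recursive sketch of binary-search-tree deletion using the
# swap-with-predecessor strategy described above (distinct values assumed).
class Node:
    def __init__(self, value, left=None, right=None):
        self.value, self.left, self.right = value, left, right

def delete(root, value):
    """Delete the node holding value; return the new subtree root."""
    if root is None:
        return None
    if value < root.value:
        root.left = delete(root.left, value)
    elif value > root.value:
        root.right = delete(root.right, value)
    elif root.left is None:                 # zero or one child: splice out
        return root.right
    elif root.right is None:
        return root.left
    else:                                   # two children: swap with the
        pred = root.left                    # immediate predecessor, then
        while pred.right is not None:       # delete the predecessor, which
            pred = pred.right               # has at most one child
        root.value = pred.value
        root.left = delete(root.left, pred.value)
    return root
```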


6.4 Application to discrete logarithms in generic groups

Collisions can be used to compute discrete logarithms in arbitrary groups, using the baby-step, giant-step method. Before presenting this method, it is useful to first show that computing discrete logarithms in a group whose cardinality is known is, assuming that the factorization of the cardinality is given, no harder than computing discrete logarithms in all subgroups of prime order. The Pohlig-Hellman algorithm is a constructive method to compute discrete logarithms in the whole group from a small number of calls to discrete logarithm computations in the subgroups of prime order.

6.4.1 Pohlig-Hellman algorithm

First, let us recall the definition of discrete logarithm. Given a group G, denoted multiplicatively, and two elements of G, say g and h, computing the discrete logarithm of h in basis g amounts to finding, if it exists, an integer a such that h = g^a in G. Discrete logarithm computations are not necessarily possible for all elements h of G. However, if G is a cyclic group generated by g, then the discrete logarithm is defined for all elements of G. In the sequel, we make this assumption.

Note that, if N denotes the cardinality of G, the discrete logarithm of h in basis g is only determined modulo N, since g^N = 1, the identity element in G. An interesting consequence is that any algorithm able to compute discrete logarithms can be used to obtain N. For simplicity, assume that we know the order of magnitude of N and, more precisely, assume that N lies in the range [N0 + 1, 2N0]. If the discrete logarithm algorithm outputs a normalized value between 0 and N − 1, it suffices to ask for the discrete logarithm of g^{2N0}, say a. Then we know that N divides 2N0 − a and even that N = 2N0 − a. If the discrete logarithm algorithm is allowed to output any of the multiple possible values for the discrete logarithm, choose a random integer b between 0 and some multiple of N0, say 10N0. Then ask for a discrete logarithm of g^b and let a denote the resulting value. Since g^a = g^b, |b − a| is a small multiple of N, possibly 0. If a ≠ b, it is a non-zero multiple. Since there are not many divisors of this multiple in the range [N0 + 1, 2N0], we can easily recover N. However, we need to make sure that the discrete logarithm algorithm does not systematically output a = b. This comes from the fact that we are choosing b at random in a large range. With this strategy, even an adversarial discrete logarithm algorithm cannot systematically determine b and, at some point, it outputs some other value for the discrete logarithm. Finally, even if we do not know a precise range, it is usually possible to find N by computing the GCD of a few multiples obtained as above.

As a consequence, in the context of discrete logarithm computations, it is

reasonable to ask for the group cardinality N in advance. We also assume that


the factorization of N into primes is known⁷, i.e., we have the decomposition:

    N = ∏_{i=1}^{n} p_i^{e_i} .    (6.7)

In the cyclic group G generated by g, there are subgroups G_i of order p_i for 1 ≤ i ≤ n. Moreover, each subgroup G_i is generated by g_i = g^{N/p_i}. In addition, G also contains subgroups G′_i of order p_i^{e_i}, generated by g′_i = g^{N/p_i^{e_i}}. The Pohlig-Hellman algorithm works in two steps. First, it uses e_i calls to the discrete logarithm algorithm in G_i to compute discrete logarithms in G′_i. Second, it pastes these values together to obtain a discrete logarithm in G.

Starting with the second step of the algorithm, recall that we are given g and h in G and want to find a number a, determined modulo N, such that h = g^a. If we raise this equality to the power N/p_i^{e_i} and let H_i = h^{N/p_i^{e_i}}, we obtain a discrete logarithm problem in G′_i, namely H_i = (g′_i)^a. Since G′_i is a group of order p_i^{e_i}, this yields the value of a modulo p_i^{e_i}. Since the p_i^{e_i} are pairwise coprime, we can use the Chinese remainder theorem and obtain a modulo N.

It remains to see how discrete logarithms in G′_i can be computed from e_i calls to the discrete logarithm algorithm in G_i. Of course, it is clear for e_i = 1. To treat the general case, it is useful to introduce a family of groups (G_i^{(j)}) with 1 ≤ j ≤ e_i. Each group G_i^{(j)} has cardinality p_i^j and is generated by g_i^{(j)} = g^{N/p_i^j}. We have G_i = G_i^{(1)} and G′_i = G_i^{(e_i)}. In order to prove our result, it suffices to show by induction on j that discrete logarithms in each group G_i^{(j)} can be computed using j calls to discrete logarithms in G_i. It is clear for j = 1. To compute a discrete logarithm in G_i^{(j+1)}, or equivalently, given z in this group, to compute x such that z = (g_i^{(j+1)})^x, we proceed as follows:

• Raise the equation to the power p_i and remark that

    z^{p_i} = ((g_i^{(j+1)})^{p_i})^x = (g_i^{(j)})^x

  is a discrete logarithm problem in G_i^{(j)}. By induction hypothesis, we learn x_0, the value of x modulo p_i^j, in j calls to discrete logarithms in G_i.

• Write x = x_0 + p_i^j x_1, with 0 ≤ x_1 < p_i, and remark that:

    z / (g_i^{(j+1)})^{x_0} = (g_i^{(j+1)})^{x − x_0} = (g_i^{(j+1)})^{p_i^j x_1} = (g_i^{(1)})^{x_1} .

  Thus x_1 can be obtained by an additional discrete logarithm computation in G_i.

⁷ This is not a real limitation either, because in most practical cases, factoring N is no harder than computing the discrete logarithm.


This concludes the proof. An iterative version of the Pohlig-Hellman method is given as Algorithm 6.18.

Algorithm 6.18 Pohlig-Hellman discrete logarithm algorithm
Require: A group G of cardinality N = ∏_{i=1}^{n} p_i^{e_i}
Require: A group generator g and a group element h
  for i from 1 to n do
    Let a0 ← 0
    Let g0 ← g^{N/p_i}
    for j from 1 to e_i do
      Let h0 ← (h g^{−a0})^{N/p_i^j}
      Call discrete logarithm algorithm in subgroup of order p_i generated by g0 for element h0. Let a denote the result.
      Let a0 ← a0 + p_i^{j−1} a
    end for
    if i = 1 then
      Let A ← a0 and M ← p_1^{e_1}
    else
      Compute the Chinese remainder of A modulo M and a0 modulo p_i^{e_i}; put the result in A
      Let M ← M p_i^{e_i}
    end if
  end for
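A minimal sketch of Algorithm 6.18 for multiplicative groups modulo a prime follows; the brute-force prime-order subroutine stands in for whatever discrete logarithm algorithm is called in the subgroups, and the function names are ours. (The negative third-argument form of `pow` for modular inverses needs Python 3.8 or later.)

```python
# Illustrative Python sketch of Algorithm 6.18 in (Z/mod)*, using brute
# force for the prime-order subgroup logarithms.
def dlog_prime(g0, h0, p, mod):
    """Discrete log in the order-p subgroup generated by g0 (brute force)."""
    acc = 1
    for x in range(p):
        if acc == h0:
            return x
        acc = acc * g0 % mod
    raise ValueError("no logarithm found")

def pohlig_hellman(g, h, mod, factors):
    """Find a with h = g^a, where g has order N = prod(p**e) modulo mod."""
    N = 1
    for p, e in factors:
        N *= p ** e
    A, M = 0, 1
    for p, e in factors:
        a0 = 0
        g0 = pow(g, N // p, mod)          # generator of the order-p subgroup
        for j in range(1, e + 1):
            # strip off the part of the logarithm already known
            h0 = pow(h * pow(g, -a0, mod) % mod, N // p ** j, mod)
            a = dlog_prime(g0, h0, p, mod)
            a0 += p ** (j - 1) * a        # next base-p digit of a mod p^e
        # paste with the Chinese remainder theorem
        pe = p ** e
        A = (A + M * ((a0 - A) * pow(M, -1, pe) % pe)) % (M * pe)
        M *= pe
    return A
```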

6.4.2 Baby-step, giant-step algorithm

Thanks to the Pohlig-Hellman algorithm, we only need to compute discrete logarithms in a group G of prime order p, generated by g. We are given h and search for an integer a in the range [0, p − 1] such that h = g^a. Let r be the integer obtained by rounding up the square root of p. It is clearly possible to write a as a0 + r a1, with 0 ≤ a0 < r and 0 ≤ a1 < r. As a consequence, we have an equality between h g^{−a0} and g^{r a1} in the group G. Conversely, any equality of this form yields the discrete logarithm of h.

From an algorithmic point-of-view, this means that we can find the discrete logarithm we seek by searching for a collision between two lists of r elements each. The first list contains all group elements of the form h g^{−a0}. The second list contains all group elements of the form g^{r a1}. The name of the algorithm comes from the fact that elements of the first list are separated by small steps (or baby steps), corresponding to multiplication by g, and elements of the second list by large steps (or giant steps), corresponding to multiplication by g^r.


Algorithm 6.19 Baby-step, giant-step discrete logarithm algorithm
Require: A group G of prime order p
Require: A group generator g and a group element h
  Let r = ⌈√p⌉
  Create T an array of r group elements
  Create I an array of r integers
  Let k ← h
  for i from 0 to r − 1 do
    Let T[i] ← k and I[i] ← i
    Let k ← k/g
  end for
  Sort T and perform all exchanges simultaneously on I
  Let k ← 1 (neutral element in G)
  Let G ← g^r in G
  for i from 0 to r − 1 do
    Lookup k in T (using a binary search)
    If k is found at position j, return r i + I[j] as discrete logarithm
    Let k ← k · G in G
  end for
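The same method can be sketched for subgroups of (Z/mod)*, using a hash map in place of the sorted array and binary search of Algorithm 6.19; the function name and this data-structure substitution are ours.

```python
# Sketch of baby-step, giant-step in (Z/mod)*, with a dictionary playing
# the role of the sorted arrays T and I of Algorithm 6.19.
import math

def bsgs(g, h, p, mod):
    """Find a in [0, p-1] with g^a = h (mod mod), g of order p."""
    r = math.isqrt(p - 1) + 1            # r = ceil(sqrt(p))
    # baby steps: store h * g^(-a0) for 0 <= a0 < r
    table = {}
    k = h
    ginv = pow(g, -1, mod)
    for a0 in range(r):
        table.setdefault(k, a0)
        k = k * ginv % mod               # one baby step: divide by g
    # giant steps: compare against g^(r*a1)
    G = pow(g, r, mod)
    k = 1
    for a1 in range(r):
        if k in table:
            return (r * a1 + table[k]) % p
        k = k * G % mod                  # one giant step: multiply by g^r
    return None
```

A match h g^{−a0} = g^{r a1} yields a = a0 + r a1, exactly the collision described above.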


Exercises

1. Neglecting leap years and seasonal birthrate irregularities, compute for sets of ten to thirty individuals, possibly writing a program, the probabilities of birthday collisions.

2. Check that CBC-MAC without final reencryption is insecure by showing that if a message containing a single block M(1) has authentication tag t, then the two-block message M(1), M(1) ⊕ t has the same authentication tag.

3h. Write down an explicit chosen message distinguishing attack on CBC encryption, when used beyond the birthday paradox limit, using the technique described in Section 6.1.1.1.

4. Why is the probability for twins to have the same birthday not equal to one?

5h. Consider a hash function obtained by directly applying the Merkle-Damgård construction (Chapter 5, Section 5.4.1) to a permutation family π. The goal of this exercise is to show a weakness of this hash function with respect to the preimage resistance property. This means that starting from an intermediate hash value h and a message block m, the next hash value is h′ = π_m(h).

(a) Show that when π⁻¹ is available (for example, when considering a block cipher), the hashing process can be reversed, i.e., given an intermediate hash value h′ and a message block m, the previous hash value h can be recovered.

(b) Let h0 be the initial value of the hash function. Choose a large set containing short sequences of message blocks. For each such sequence Mi, starting from h0 and considering each block of Mi in turn, we can compute an intermediate hash value which we denote by hi.

(c) Let hF be a target value for the hash function. Similarly choose another large set containing short sequences of message blocks. For each such sequence M′i, starting from hF and considering each block of M′i in turn, we can backward compute an intermediate hash value which we denote by h′i.

6h. Again considering a Merkle-Damgård based hash function H, we now turn to multicollisions. Let A(1) and A(2) be two single-block messages such that H(A(1)) = H(A(2)). Let B(1) and B(2) be two message blocks such that H(A(1) B(1)) = H(A(2) B(2)); construct a 4-collision.


(a) More generally, construct a 2^t-collision on messages of t blocks.

(b) How secure against collision resistance is the hash function obtained by concatenating two Merkle-Damgård hash functions, where each function is on n bits?

7. Check that the dichotomy search given as Algorithm 6.2 always returns the position of the last occurrence of an element. Deduce that the insertion sort Algorithm 6.6 is stable.

8. Any sorting algorithm can be made stable by a generic technique, based on storing pairs containing elements and their initial position. Give the order relation that should be used to achieve this effect.

9h. At first, it may seem that in quicksort and radix sort, the basic partitioning preserves the relative order of elements stored in the stack of small elements and reverses the order of elements in the stack of large elements. Explain why it is not the case. Show how to do this with an auxiliary array of N elements. Conclude by writing a stable version of quicksort and radix sort.


Chapter 7

Birthday-based algorithms for functions

In Chapter 6, we introduced several algorithmic techniques that allow us to apply the birthday paradox to arbitrary lists of randomly distributed objects. These techniques do not take into account the specific details of how these objects are generated. In this chapter, we are going to focus on a more specific problem and look for collisions among objects which are generated by iterating some fixed function. In this specific case, many improvements are possible. This is quite important for cryptanalysis, because there are many applications where this improved setting can be exploited. In particular, there are some very important number theoretic algorithms due to Pollard that use these techniques to factor integers or compute discrete logarithms.

In the most general setting, we are given a set S and a function F from S to itself. Then starting from a point X0 in S, we generate the sequence defined by:

    X_{i+1} = F(X_i).    (7.1)

Our main goal is to find a collision within this sequence, say Xi = Xj. Note that due to the deterministic nature of the above computation, from any such collision, we may generate many. Indeed, whenever Xi = Xj, we find that X_{i+1} = F(Xi) = F(Xj) = X_{j+1} and obtain a new collision. Of course, iterating this remark, we also have X_{i+2} = X_{j+2} and so on.

In order to study this algorithmic problem, it is important to assume that F behaves more or less randomly. Without this assumption, the birthday paradox has no reason to hold. Furthermore, when omitting the assumption, it is easy to build counterexamples. Take the set S = [0 · · · 2^n − 1] of integers modulo 2^n and define F(X) = X + 1 (mod 2^n). Then, for any choice of X0, we cannot find a collision in the sequence X unless we have considered the full set of 2^n elements. In this case, the birthday paradox would predict a collision after 2^{n/2} elements. Thus, we see that it does not apply to this counterexample. In order to theoretically analyze the behavior of such sequences, F is usually modeled as a random function. This model and the corresponding analysis are presented in Section 7.2 below. For now, we simply assume that after some time the sequence X loops back on one of its previous values. Note that since F is a function, not a permutation, nothing guarantees that X loops back on X0. On the contrary, it usually restarts at a later position in the sequence. A traditional way of describing this fact is to draw a picture


[Figure 7.1: Rho shape while iterating a random function — a tail starting at X0 leads into a cycle]

that represents the sequence X by making sure that consecutive elements of the sequence are neighbors on the picture. On the resulting Figure 7.1, we see a shape similar to the letter ρ. This gave its name to Pollard's Rho factoring algorithm. In this figure, we see that the sequence has two distinct parts: a cycle and a tail. If the first collision in X is a collision between Xs and X_{s+ℓ}, then the tail contains elements of the sequence from X0 up to X_{s−1}, and the cycle starts from Xs and its length is ℓ, the period of X.

7.1 Algorithmic aspects

The key question in this chapter is to find collisions in the sequence X or, even better, to determine the respective lengths of the cycle and tail of the sequence. A first solution to this problem is to use classical birthday-based algorithms. For example, we can use a binary search tree to store the elements of the sequence X one at a time as long as there is no duplicate in the tree. The first duplicate occurs when X_{s+ℓ} is inserted in the tree and we find that this value is already present due to Xs. This method has a running time O((s + ℓ) log(s + ℓ)) and needs to store O(s + ℓ) elements. The running time is essentially optimal, since, unless the function F has very specific properties, we need to generate the sequence up to X_{s+ℓ} to detect its cycle. Moreover, this algorithm directly generates both the length ℓ of the cycle and the length s of the tail. However, its storage costs are high. The good news is that there exist alternative algorithms that do not require such a large amount of memory. These algorithms do not directly output s and ℓ but instead find a collision somewhat later in the cycle. However, with a reasonable additional computational cost, it is possible to recover s and ℓ from this late collision.

We start by presenting two simple algorithms that obey an additional restrictive condition. More precisely, these two algorithms access values of the sequence X in a restrictive way, either as inputs to F or in equality tests, but in no other manner. This restrictive use of the sequence's values can be very important because it allows the cycle finding algorithms to be used in


a black-box context where the sequence X can only be accessed indirectly. In particular, this restriction is essential to Pollard's Rho factoring algorithm (see Section 7.3.2). Indeed, in Pollard's Rho factoring, equality tests can only be performed indirectly through GCD computations.

7.1.1 Floyd's cycle finding algorithm

In this first algorithm, we start by defining a new sequence Y which is related to X by the equation:

    Y_i = X_{2i}.    (7.2)

Alternatively, the sequence Y can be defined directly by the formulas:

    Y_0 = X_0 and Y_{i+1} = F(F(Y_i)).    (7.3)

Algorithm 7.1 Floyd's cycle detection algorithm
Require: Input initial sequence value X0, max. iterations M
  Let x ← X0
  Let y ← X0
  for i from 1 to M do
    Let x ← F(x)
    Let y ← F(F(y))
    if x = y then
      Output 'Collision between i and 2i'
      Exit
    end if
  end for
  Output Failed

Once Y is defined, Floyd's algorithm looks for the first index t such that Y_t = X_t. Clearly, this can be done in time O(t) using a very small amount of memory, as shown by the pseudo-code in Algorithm 7.1. Indeed, it suffices to compute the two sequences in parallel until they collide. To analyze the behavior of this algorithm, we need to better understand the relation between the position t of this first collision between X and Y and the parameters s and ℓ of the sequence X.

First, since X_{2t} = X_t, we find that t is a period of X and thus a multiple of the period ℓ. Second, since X_t is in the cycle of X, t ≥ s. In fact, t is the smallest multiple of ℓ greater than (or equal to) s:

    t = ℓ ⌈s/ℓ⌉.    (7.4)


Moreover, once t is known, recovering ℓ is easy: it suffices to compute the sequence X_{t+i} until we loop back to X_t. This happens after exactly ℓ steps.
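Floyd's algorithm and this cycle-length recovery can be sketched together as follows; the function name and the bound on iterations are ours.

```python
# Sketch of Floyd's cycle detection (Algorithm 7.1) followed by recovery
# of the cycle length l: once X_t = X_2t is found, iterate from X_t until
# that value reappears, which takes exactly l steps.
def floyd(F, x0, max_iters=10**6):
    """Return (t, l): t a multiple of the period, l the cycle length."""
    x, y = x0, x0
    for i in range(1, max_iters + 1):
        x = F(x)
        y = F(F(y))                  # y runs twice as fast: y_i = x_{2i}
        if x == y:
            t = i                    # collision between indices i and 2i
            break
    else:
        raise RuntimeError("no collision found")
    l, z = 1, F(x)                   # walk once around the cycle to get l
    while z != x:
        z = F(z)
        l += 1
    return t, l
```

For a sequence with tail length s and cycle length ℓ, the returned t equals ℓ⌈s/ℓ⌉, in accordance with Equation (7.4).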

7.1.2 Brent's cycle finding algorithm

The main drawback of Floyd's algorithm is the fact that each step requires three applications of the function F in order to advance both sequences X and Y. Brent's algorithm is an alternative that only requires the computation of a single copy of the original sequence. The starting point of the method is to remark that once we are given a point X_k inside the cycle, we can find the cycle's length ℓ in exactly ℓ steps, using the same method as when recovering ℓ after Floyd's algorithm. The difficulty is to find a start-point in the cycle. In Brent's algorithm, this is done by trying several start-points until an adequate one is found. In order to minimize the overall cost, the computation steps used to find a potential cycle are shared and also used to compute the next start-point to be tested. More precisely, we consider the sequence of start-points X_1, X_2, X_4, . . . , X_{2^i}, . . . indexed by powers of two. For each start-point X_{2^i}, we compute the sequence of successors X_{2^i + j} until we either find a cycle or reach X_{2^{i+1}}. In the former case, we are done; in the latter, we change to the next start-point X_{2^{i+1}}. This approach is described in pseudo-code as Algorithm 7.2.

Algorithm 7.2 Brent's cycle detection algorithm
Require: Input initial sequence value X0, max. iterations M
  Let x ← X0
  Let y ← x
  Let trap ← 0
  Let nexttrap ← 1
  for i from 1 to M do
    Let x ← F(x)
    if x = y then
      Output 'Collision between trap and i'
      Exit
    end if
    if i = nexttrap then
      Let trap ← nexttrap
      Let nexttrap ← 2 · trap
      Let y ← x
    end if
  end for
  Output Failed


As before, given the parameters s and ℓ of the sequence X, we can analyze the behavior of Brent's method. It succeeds with starting point X_{2^k} if and only if 2^k ≥ max(s, ℓ). Indeed, to find the cycle with this start-point, the point should be in the cycle, i.e., 2^k ≥ s, and the cycle's length ℓ should be at most 2^k. Clearly, Brent's method finds ℓ with complexity O(s + ℓ).

Finally, note that our choice of putting traps in positions of the form 2^k is not the only option. Depending on the specific problem at hand, we could consider another quickly increasing sequence. This could be either a (rounded) geometric progression α^k or something different such as a Fibonacci sequence.
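Algorithm 7.2 can be sketched as follows; note that when the collision 'between trap and i' occurs, the cycle length is exactly i − trap, which the sketch returns directly. The function name is ours.

```python
# Sketch of Brent's method (Algorithm 7.2): "traps" at indices 1, 2, 4, 8,
# ... A collision with the current trap reveals the cycle length directly.
def brent(F, x0, max_iters=10**6):
    """Return the cycle length l of the sequence x0, F(x0), F(F(x0)), ..."""
    x = x0
    y = x0                       # value caught in the current trap
    trap = 0                     # index of the current trap
    nexttrap = 1
    for i in range(1, max_iters + 1):
        x = F(x)
        if x == y:
            return i - trap      # collision between trap and i
        if i == nexttrap:
            trap, nexttrap = i, 2 * i
            y = x                # move the trap to the current point
    raise RuntimeError("no collision found")
```

Each step costs a single application of F, against three for Floyd's algorithm.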

7.1.3 Finding the cycle's start

We have seen that both Floyd's and Brent's algorithms can be used to find the length ℓ of the cycle in the sequence X. However, neither produces the length of the tail s or, equivalently, the position where the cycle starts. From the point-of-view of collision finding, this position is quite important. Indeed, we have:

    F(X_{s−1}) = X_s = X_{s+ℓ} = F(X_{s+ℓ−1}) and X_{s−1} ≠ X_{s+ℓ−1}.    (7.5)

Thus, finding the entrance of the cycle yields a true collision in F, and this is the only place in the sequence X where such a true collision can be found. After either Floyd's or Brent's algorithm, we know a number t ≤ 2(s + ℓ) such that X_t is in the cycle of X. Thus, the start of the cycle is an integer s in the interval [0 · · · t]. From this information, it is possible to obtain s in several different ways. We now present two different approaches to solve this problem.

7.1.3.1 Dichotomy search

Once ℓ is known, it is possible to determine, for any number τ in the interval from 0 to t, whether s ≤ τ by testing if X_τ = X_{τ+ℓ}. Indeed, if the condition holds, X_τ is in the cycle and τ ≥ s; otherwise, τ < s. This test can be performed by computing the sequence X up to τ + ℓ, which requires O(s + ℓ) steps. Thus we can find s using a dichotomy search with complexity O((s + ℓ) log(s + ℓ)). It is interesting to remark that this approach has the same runtime complexity as a generic birthday method that relies on a fast sort algorithm.

7.1.3.2 Direct search

The main problem with the above method is that we need to compute several overlapping parts of the same sequence, thus recomputing the same information many times. To avoid this drawback, we can proceed differently. First compute X_ℓ. Then, from X_0 and X_ℓ, compute in parallel the two sequences X_i and X_{ℓ+i}, testing for equality at each step. When the two sequences collide, we have s = i. This allows us to recover s and obtain a true collision¹ in time O(s + ℓ). This is described as Algorithm 7.3.

Algorithm 7.3 Algorithm for recovering a cycle's start
Require: Start-point X0, cycle length ℓ, max. iterations M
  Let x ← X0
  Let y ← x
  for i from 1 to ℓ do
    Let y ← F(y)
  end for
  if x = y then
    Output 'Cyclic sequence, no real collision'
    Exit
  end if
  for i from 1 to M do
    Let x′ ← F(x)
    Let y′ ← F(y)
    if x′ = y′ then
      Output 'Collision between images of x and y'
      Exit
    end if
    Let x ← x′
    Let y ← y′
  end for
  Output Failed
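Algorithm 7.3 translates directly into Python; the function name is ours, and the sketch returns the tail length s together with the colliding pair, i.e., the true collision of Equation (7.5).

```python
# Sketch of Algorithm 7.3: given the cycle length l, advance a second copy
# of the sequence by l steps, then walk both copies in parallel; they meet
# exactly when both reach the cycle's entrance X_s.
def find_cycle_start(F, x0, l, max_iters=10**6):
    """Return (s, u, v) with F(u) == F(v) and u != v; s is the tail length."""
    x, y = x0, x0
    for _ in range(l):
        y = F(y)                 # y is now l steps ahead of x
    if x == y:
        raise ValueError("cyclic sequence, no real collision")  # s = 0
    for s in range(1, max_iters + 1):
        nx, ny = F(x), F(y)
        if nx == ny:
            return s, x, y       # collision between the images of x and y
        x, y = nx, ny
    raise RuntimeError("no collision found")
```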

As a consequence, we see that finding collisions in recursively defined sequences of this type is much more practical than finding collisions in lists of random objects.

7.1.4 Value-dependent cycle finding

As explained above, both Floyd's and Brent's algorithms detect a cycle in a sequence X, while using X only in a restricted way, either through F or in equality tests on the values taken by the sequence. We know that this restriction is extremely useful in some cases. However, in other cases, the sequence X is directly available. Thus, it is natural to ask whether cycle finding algorithms can be improved using more value-dependent computations.

The surprising answer is that it is indeed possible. A first step in this direction can be achieved using a distinguished point method, as proposed for example in [QD90]. This consists of memorizing values that satisfy a specific property, called distinguished points, and waiting until they come back. The main difficulty here is to choose the right probability of occurrence for distinguished points. If they are too frequent, then we need to store a huge number of values and the algorithm is inefficient. If they are too rare, then the cycle may not contain any distinguished point at all and the technique fails. We discuss the use of distinguished points in more detail in Section 7.5.2.
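As a toy illustration of the distinguished point idea (our own sketch, not the specific method of [QD90]; the choice of "low bits all zero" as the distinguishing property is an arbitrary assumption), one might write:

```python
def detect_cycle_distinguished(F, x0, mask=0x7, max_steps=1 << 20):
    """Toy distinguished-point cycle detection.

    Only values whose low bits are zero (probability roughly
    1/(mask + 1)) are stored; since repeats can only occur inside
    the cycle, the first repeated distinguished point reveals the
    cycle length as the difference of its two indices.  Returns
    None if no distinguished point recurs, e.g., when the cycle
    contains no distinguished point at all."""
    seen = {}  # distinguished value -> index of first occurrence
    x = x0
    for i in range(1, max_steps + 1):
        x = F(x)
        if x & mask == 0:        # distinguished: low bits all zero
            if x in seen:
                return i - seen[x]  # cycle length
            seen[x] = i
    return None
```

The trade-off described above is visible here: a smaller mask stores more points; a larger mask risks a cycle with no distinguished point, in which case the function returns None.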

A stronger way of using the values taken by the sequence in a cycle detection algorithm was proposed, in 2004, by Nivasch in [Niv04]. The main advantage of his algorithm is that it is guaranteed to stop somewhere between Xs+ℓ and Xs+2ℓ−1, i.e., during the second iteration of the sequence cycle. This is particularly useful for sequences where ℓ is small compared to s. The key idea is to maintain a (hopefully) short stack of already encountered values in a way that guarantees cycle detection. To build this stack, Nivasch's algorithm relies on the existence of a total ordering on the set of sequence values and focuses on keeping small values taken by the sequence on the stack. The ultimate goal is to make sure that the smallest value occurring in the cycle is always stored on the stack and is always present for detection during the second iteration around the cycle. In order to achieve that, at any point in time, the stack memorizes the minimum value, say m1, encountered up to that time while computing X, followed by the minimum value (say m2) encountered after m1 in the computation, followed by the minimum value (say m3) encountered after m2, and so on. Clearly, the minimum value of the cycle is one of those, say mk. Indeed, any value of the sequence X smaller than mk is out of the cycle and thus part of the pre-cycle. Since mk occurs after these possibly smaller elements, by definition of our stack, it gets to be stored on the stack and remains there afterward.

From an algorithmic point-of-view, building such a stack is an easy matter. Whenever we compute a new value Xi of the sequence X, we remove all values larger than Xi from the stack and add Xi to the stack. Moreover, since Xi is added at the end of the stack after all larger values have been removed, it is easy to see that the stack is always sorted in increasing order. Thus, erasing larger values can be done very easily by using a dichotomy search on the pre-existing stack and by truncating the stack after the point where Xi needs to be inserted. It is shown in [Niv04] that the expected size of the stack at time i is log i and that the size of the stack remains below O(log i) almost surely. Thus, except in extremely rare cases, the memory requirements of Nivasch's algorithm are small. When considering the time complexity of this algorithm, we need to consider two different kinds of operations: evaluations of the function F and other operations. Indeed, in almost all cases, the computation of values of F dominates the running time. Clearly, since X is computed once for all considered positions, up to at most position s + 2ℓ − 1, this bounds the number of evaluations of F. The dichotomy search within the stack dominates the other operations, and the cost of each search is logarithmic in the size of the stack. Thus, the time complexity of the other operations is bounded by O((s + 2ℓ) log log(s + 2ℓ)). At first, this complexity may appear worse than the complexity of the evaluations of F. However, remember that F operates on numbers of at least O(log(s + 2ℓ)) bits, otherwise the cycle would be shorter. Thus, each evaluation of F costs at least O(log(s + 2ℓ)) and, when accounting for this unit cost, evaluations of F dominate the complexity. Thus, where applicable, Nivasch's algorithm gives a nice improvement over the previous cycle finding algorithms. We give a full description of it in Algorithm 7.4.

Algorithm 7.4 Nivasch's cycle detection algorithm

Require: Initial sequence value X0, max. iterations M
  Create empty stack S
  Let x ← X0
  Add pair (x, 0) to the stack
  for i from 1 to M do
    Let x ← F(x)
    Search x by dichotomy among the (sorted) first components of S
    if a pair (x, j) is found then
      Output 'Collision between i and j'
      Exit
    else
      We know that Sk < x < Sk+1 for some k
      Truncate S after Sk
      Add pair (x, i) at the end of S
    end if
  end for
  Output Failed
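In Python, Algorithm 7.4 can be sketched with the standard bisect module playing the role of the dichotomy search (an illustrative transcription; the names are ours). At the collision, the index difference i − j is exactly the cycle length ℓ:

```python
import bisect

def nivasch_cycle_length(F, x0, M):
    """Nivasch's cycle detection, in the spirit of Algorithm 7.4.

    The stack holds (value, index) pairs and is kept sorted in
    increasing value order by popping every entry larger than the
    new value before pushing it.  The minimum of the cycle survives
    on the stack, so it is found again during the second pass around
    the cycle; the difference of indices is the cycle length."""
    stack = [(x0, 0)]  # always sorted by first component
    x = x0
    for i in range(1, M + 1):
        x = F(x)
        # Dichotomy search on the first components: (x,) sorts just
        # before any pair (x, j), so k points at a match if one exists.
        k = bisect.bisect_left(stack, (x,))
        if k < len(stack) and stack[k][0] == x:
            j = stack[k][1]
            return i - j      # collision between indices i and j
        del stack[k:]         # truncate: remove all entries larger than x
        stack.append((x, i))  # stack stays sorted
    return None               # failed within M iterations
```

Note that the function returns the exact cycle length ℓ, and on a sequence with tail length s it stops no later than step s + 2ℓ − 1, matching the guarantee stated above.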

Another advantage of Nivasch's algorithm compared to Floyd's or Brent's algorithms is that it is guaranteed to directly output the cycle length ℓ, rather than a multiple thereof. Thus, for applications where the exact cycle's length is needed, we can save some additional evaluations of the function.

7.1.4.1 Nivasch and the cycle's start
