
Merge sort. Our first fast algorithm for sorting, called merge
sort, is a straightforward implementation of the divide and conquer idea. It works by
dividing the list into two sublists of half size, sorting the sublists and merging
the two sorted sublists into a single sorted list. The basic idea of merging
was already described in Section 6.3 as a way of finding collisions between two
sorted lists. Writing down the merge sort algorithm from this description
is a simple matter. The main difficulty comes from merging. Indeed, when
merging two sorted lists into a single one, it is essential to have some place to
store the merged list, since it is very hard to do the merge in place, reusing
the storage of the two sublists. As a consequence, merge sort requires extra
memory in addition to the memory used by the input array. With a little
care, a single auxiliary array of the same size as the input array suffices. One
approach is to copy the original array into the auxiliary array, to sort both
halves of the auxiliary array and to merge them back into the original. When
sorting the halves of the auxiliary array, the roles of the original and auxiliary
arrays are reversed. Moreover, most of the copy operations can be removed:
at each stage after the first one, we may see that copying the current
original arrays to the auxiliary ones simply rewrites the same data over and over
again. In fact, if the array's size is a power of two, it is even better to avoid
the copying altogether and directly start by merging. First merge small lists
and proceed upward until the final merging is done. With luck, if the number
of stages is even, the sorted list already is in the correct array. Otherwise,
copy it once from the auxiliary to the main array. However, when the size of
the array is not a power of two, the depth of the computation's tree is not

© 2009 by Taylor and Francis Group, LLC
200 Algorithmic Cryptanalysis

uniform and making an initial copy is a great simplification. To distinguish
the top level, where the allocation of an auxiliary array and the initial
copy are done, from the other levels, we split the algorithm into two parts: a
wrapper (Algorithm 6.8) and the main procedure (Algorithm 6.7).

Algorithm 6.7 Merge sort main procedure
Require: Input an array X[start · · · end] of end − start + 1 elements
Require: Input an auxiliary array Y [start · · · end]
If start ≥ end, the array is already sorted, return
Let mid ←− ⌊(start + end)/2⌋
Recursively merge sort Y [start · · · mid] with auxiliary array X
Recursively merge sort Y [mid + 1 · · · end] with auxiliary array X
(Note the reversal of the main and auxiliary arrays in the calls)
Let start1 ←− start
Let start2 ←− mid + 1
Let merge ←− start
while start1 ≤ mid and start2 ≤ end do
  if Y [start1] ≤ Y [start2] then
    Let X[merge] ←− Y [start1]
    Increment start1
  else
    Let X[merge] ←− Y [start2]
    Increment start2
  end if
  Increment merge
end while
while start1 ≤ mid do
  Let X[merge] ←− Y [start1]
  Increment start1
  Increment merge
end while
while start2 ≤ end do
  Let X[merge] ←− Y [start2]
  Increment start2
  Increment merge
end while
Return sorted array X[start · · · end]

Merge sort is a very efficient algorithm; however, since it requires additional
memory, it cannot be used when the array to be sorted fills the computer's
main memory. In that case, it is better to use a different sorting algorithm.
Note that using a more complicated merging method, it is also possible to


Algorithm 6.8 Merge sort wrapper
Require: Input an array X[start · · · end] of end − start + 1 elements
If start ≥ end, the array is already sorted, return
Create an auxiliary array Y [start · · · end]
Copy X into Y
Call merge sort procedure on X[start · · · end] with auxiliary array Y
Free the memory occupied by Y
Return sorted array X[start · · · end]
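As a concrete illustration, the wrapper/procedure pair of Algorithms 6.7 and 6.8 can be sketched in Python. This is an illustrative translation, not the book's reference code; note how the roles of the main and auxiliary arrays are swapped at each recursive call, so that only the single initial copy is needed:

```python
def merge_sort_proc(X, Y, start, end):
    """Algorithm 6.7: sort into X[start..end], reading from Y[start..end]."""
    if start >= end:
        return
    mid = (start + end) // 2
    # Note the reversal of the main and auxiliary arrays in the calls.
    merge_sort_proc(Y, X, start, mid)
    merge_sort_proc(Y, X, mid + 1, end)
    s1, s2, m = start, mid + 1, start
    while s1 <= mid and s2 <= end:          # merge the two sorted halves
        if Y[s1] <= Y[s2]:
            X[m] = Y[s1]; s1 += 1
        else:
            X[m] = Y[s2]; s2 += 1
        m += 1
    while s1 <= mid:                        # copy the leftover of half 1
        X[m] = Y[s1]; s1 += 1; m += 1
    while s2 <= end:                        # copy the leftover of half 2
        X[m] = Y[s2]; s2 += 1; m += 1

def merge_sort(X):
    """Algorithm 6.8 (wrapper): allocate the auxiliary copy, sort in place."""
    if len(X) <= 1:
        return X
    Y = list(X)  # the single initial copy
    merge_sort_proc(X, Y, 0, len(X) - 1)
    return X
```

Since both recursive calls read from the array that was written at the previous level, the data is never copied again after the initial duplication.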

devise an in-place variation of the merge sort algorithm, e.g., see [KPT96].

Quicksort. In order to largely reduce the need for additional
memory, quicksort implements the divide and conquer idea in a different
way. At each stage, it randomly chooses an element in the list and uses this
element, called the pivot, to divide the list into two sublists. The first sublist
contains all elements smaller than the pivot and the second sublist the other
elements. During the stage, elements are moved around to put the list in a
new order, with the first sublist at the beginning, the pivot in the middle
and the second sublist at the end. This can be done in linear time, without
requiring extra memory. If the two sublists are then independently sorted,
it is clear that the list itself becomes sorted. Despite the simplicity of the
basic idea, this algorithm presents several complex issues. First of all, the
need to choose the pivot at random is essential. If the pivot is chosen in a
deterministic way, then it is possible to construct adversarial lists
which would require quadratic time to be sorted. Second, making the complexity
analysis precise is a difficult matter, because the running time depends
on the random choices made by the algorithm, and making the bad choice at
each stage leads to a quadratic running time. However, with overwhelming
probability, the size of the sublists decreases quickly between stages and the
running time stays in the range O(N log N). Finally, to avoid bad lists leading
to quadratic running time, it is also necessary to specify more precisely
what happens when we encounter an element which is equal to the pivot. If
we decide that all such elements go to the first (or second) sublist, we are in
trouble. Indeed, in that case, sorting a list containing N copies of the same
element takes quadratic time! To avoid this, the simplest approach is to put
elements in the first sublist if they occur before the pivot in the original list
and to put them in the second sublist otherwise.
Despite these difficulties, quicksort is a very useful practical algorithm.
Moreover, it is quite easy to detect bugs which lead to quadratic running time.
Indeed, due to the recursive nature of the algorithm, each time the program
enters a deeper stage, it adds some elements on the recursion stack. When a
bug leads to quadratic behavior, the size of the stack needs to become linear
in N, which quickly leads to an overflow. When such an overflow occurs,


the program aborts and reports an error. This misbehavior usually suffices to
find the bug. Note that the recursive nature of the algorithm also implies that
quicksort still requires more than a constant amount of memory in addition
to the original array. The additional memory needed is proportional to the
depth of the computation tree, i.e., it is of the form O(log N).
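The ideas above can be sketched in Python. As a deliberate simplification of the text's positional tie-break, this sketch uses a three-way partition, which serves the same purpose: elements equal to the pivot are grouped in the middle, so lists with many repeated elements still sort in O(N log N):

```python
import random

def quicksort(X, lo=0, hi=None):
    """Randomized in-place quicksort with a three-way partition.

    This differs in detail from Algorithm 6.9 below, but keeps its two key
    ingredients: a randomly chosen pivot, and a rule for equal elements
    that avoids quadratic behavior on lists of identical values."""
    if hi is None:
        hi = len(X) - 1
    if lo >= hi:
        return X
    pivot = X[random.randint(lo, hi)]   # random pivot choice is essential
    lt, i, gt = lo, lo, hi
    while i <= gt:                      # Dutch-national-flag partition
        if X[i] < pivot:
            X[lt], X[i] = X[i], X[lt]
            lt += 1; i += 1
        elif X[i] > pivot:
            X[i], X[gt] = X[gt], X[i]
            gt -= 1
        else:
            i += 1
    quicksort(X, lo, lt - 1)            # elements < pivot
    quicksort(X, gt + 1, hi)            # elements > pivot
    return X
```

On a list of N equal elements the partition finishes in one linear pass and both recursive calls receive empty ranges, exactly the degenerate case the text warns about.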

Algorithm 6.9 Quicksort
Require: Input an array X[start · · · end] of end − start + 1 elements
If start ≥ end, the array is already sorted, return
Choose a random position pos in [start · · · end]
Copy start to start0
Copy end to end0
Let Y ←− X[pos]
Let X[pos] ←− X[start]
Let i ←− start + 1
while i ≤ end do
  if (X[i] < Y ) or (X[i] = Y and i < pos) then
    Let X[start] ←− X[i]
    Increment start
    Increment i
  else
    Exchange X[end] and X[i]
    Decrement end
  end if
end while
Let X[start] ←− Y
Recursively apply quicksort to X[start0 .. start − 1]
Recursively apply quicksort to X[end .. end0]
Return sorted array X[start0 · · · end0]

Radix sort. From the descriptions of merge sort and quicksort,
it is natural to ponder whether it is possible to devise a sort algorithm that
requires neither additional memory nor randomness. Radix sort neatly solves
this problem. However, this algorithm is not generic and it requires additional
knowledge about the order used for sorting. For many cryptographic
applications, this is not a problem at all, but it makes radix sort badly suited
to other cases, especially when sorting complex objects, such as, for example,
database entries. In fact, there are two different versions of radix sort, which
sort by considering bits either from left to right or from right to left. Here, we
only consider the left to right version. To explain this version of radix sort,
let us assume that we want to sort an array of N unsigned n-bit numbers. We


know that in the sorted array, the numbers with a high order bit set to 0 occur
before the numbers with a high order bit of 1. Similarly, within each of these
subarrays, the numbers should be sorted according to the value of the second
highest order bit. And so on, until the low order bit is reached. The complexity
analysis is simple: each stage, which consists of moving numbers with some bit
equal to 0 before numbers with the bit equal to 1, can be performed in linear
time. Since there is one stage per bit, the complexity is O(nN). With very
large numbers, when n ≫ log N, this is worse than merge sort or quicksort.
However, this case is not very frequent. Moreover, if the numbers are (or
look) random, the complexity can be improved with a very simple change:
simply abort the recursion when a list of zero or one element is to be sorted.
With random numbers, this cuts off the depth of the computation tree around
log2 N, thus improving the complexity as required.
Radix sort can also be used with signed numbers. In that case, we should
remember that the high order bit is the sign bit. As a consequence, numbers
with a high bit set to 1 need to be at the beginning, because they are negative.
Thus, reversing the roles of the first and second sublists for this bit is
the only change needed to deal with signed numbers. For all other bits, we should
proceed as we did with unsigned numbers (assuming that negative numbers
are represented using the two's complement convention).
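A sketch of the left-to-right (unsigned) version in Python, including the early-abort improvement discussed above, may help; it mirrors Algorithm 6.10 but is not the book's reference code:

```python
def radix_sort(X, start=0, end=None, b=None):
    """In-place left-to-right binary radix sort of unsigned integers.

    Partitions X[start..end] on bit b (0 before 1), then recurses on both
    halves with bit b-1.  Recursion aborts on sublists of at most one
    element, the improvement discussed in the text."""
    if end is None:
        end = len(X) - 1
    if b is None:
        b = max(X).bit_length() - 1 if X else 0  # start at the high order bit
    if start >= end or b < 0:
        return X  # zero or one element, or no bits left: already sorted
    lo, hi = start, end
    while lo <= hi:
        if (X[lo] >> b) & 1 == 0:
            lo += 1                        # bit 0: leave in the front part
        else:
            X[lo], X[hi] = X[hi], X[lo]    # bit 1: send to the back part
            hi -= 1
    radix_sort(X, start, lo - 1, b - 1)    # numbers with bit b equal to 0
    radix_sort(X, hi + 1, end, b - 1)      # numbers with bit b equal to 1
    return X
```

Note that no extra array and no randomness is used, at the price of needing to know that the keys are plain unsigned integers.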

Algorithm 6.10 Radix sort
Require: Input an array X[start · · · end] of end − start + 1 unsigned integers
Require: Bit position b to consider in the integers
If start ≥ end, the array is already sorted, return
Copy start to start0
Copy end to end0
while start ≤ end do
  if Bit b of X[start] is 0 then
    Increment start
  else
    Exchange X[end] and X[start]
    Decrement end
  end if
end while
Recursively radix sort X[start0 .. start − 1] on bit b − 1
Recursively radix sort X[end + 1 .. end0] on bit b − 1
Return sorted array X[start0 · · · end0]

Heap sort. From a theoretical point of view, heap sort is very
interesting. First, it is a non-recursive algorithm and thus it can truly achieve


in-place sorting: in addition to the array being sorted, it only requires a
constant amount of memory. Second, it uses a very nice implicit tree structure
as the basis of the algorithm. This tree structure is called a heap; it is a binary
tree where the value of each node is larger⁴ than or equal to the values of all
nodes in the subtree attached to it. The tree structure is implicit because it
does not use any pointers, but instead works with the convention that the left and
right children of the element in position i are respectively stored⁵ in positions
2i + 1 and 2i + 2. Whenever the position of a child is beyond the end of
the array, it simply means that the child does not exist. For example, when
position 2i + 1 is not in the array, then node i is a leaf. When 2i + 1 is the
last position in the array, then node i has a single child (on the left).
Using this structure, heap sort works in two phases. During the first phase,
it takes the input array with the implicit tree structure and reorders it to satisfy
the heap property. This is done by making sure that each element is larger
than its children. During the second phase, it progressively reduces the size
of the tree by taking out the largest element and placing it where it belongs
in the sorted array. During this second phase, one part of the array is used
to store a heap tree and the other part to store an already sorted part of the
array. At the end of the second phase, the tree part is empty and the array
is fully sorted.
To write down the heap sort algorithm, we first need a heap insertion
procedure which allows us to correctly add elements to an existing heap. This
insertion works as follows: assuming that the subtrees starting in positions
2i + 1 and 2i + 2 are correct heap trees, it inserts an element in position i and,
if necessary, moves it down the tree to create a correct heap tree rooted at position i.
During the first phase, this procedure is used iteratively on all positions, starting
from the end of the original array. Of course, remarking that subtrees with
a single node are always heap trees, there is nothing to be done for the second
half of the array and we may start in the middle. During the second phase, we
take the element at the root of the heap tree, i.e., the largest element in the
tree, and put it at the beginning of the sorted part. Since this partially breaks
the heap structure, we use the insertion procedure again. More precisely, we
move the last element of the previous heap to the root and apply the insertion
procedure to guarantee the heap property on a new tree, with one fewer element. Both
phases of the heap sort algorithm require N calls to the insertion procedure,
whose complexity is bounded by the tree depth. As a consequence, the total
runtime is O(N log N) as with the other fast sort algorithms.
Despite its good complexity analysis, in practice heap sort does not perform
very well, especially on large arrays, because it causes many cache misses when
moving elements around during the heap insertion.

⁴There is, of course, a variation of the heap tree where each node is smaller than its
successors, but this is not the version we use here.
⁵This assumes that the array starts at position 0; for arrays starting at position 1, we
should use positions 2i and 2i + 1 instead.


Algorithm 6.11 Heap sort
Require: Input an array X[0 · · · end − 1] of end elements
First phase: Heap creation
for i from end − 2 down to 0 do
  (Note: Subtrees at positions 2i + 1 and 2i + 2 are already correct)
  Call heap insertion on position i with tree size end
end for
Second phase: Sorting
while end > 1 do
  Put largest element at the end: Exchange X[0] and X[end − 1]
  Decrement end
  Call heap insertion on position 0 with (reduced) tree size end
end while
Return sorted array X

Algorithm 6.12 Insertion in heap procedure
Require: Input an array X[0 · · · end − 1] of end elements
Require: Input a position pos where the heap property must be enforced
while 2 · pos + 1 ≤ end − 1 do
  Let val ←− X[pos]
  Let dest ←− pos
  if X[2 · pos + 1] > val then
    Let val ←− X[2 · pos + 1]
    Let dest ←− 2 · pos + 1
  end if
  if 2 · pos + 2 ≤ end − 1 and X[2 · pos + 2] > val then
    Let dest ←− 2 · pos + 2
  end if
  if dest ≠ pos then
    Exchange X[pos] and X[dest]
    Let pos ←− dest {loop continues at lower level of the tree}
  else
    Break from loop {The tree is now a heap}
  end if
end while
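The two procedures can be sketched in Python as follows. This is an illustrative version, not the book's code; it starts the heap-creation loop in the middle of the array, as the text notes that subtrees with a single node are already heaps:

```python
def sift_down(X, pos, end):
    """Heap insertion procedure (in the spirit of Algorithm 6.12): assuming
    the subtrees at 2*pos+1 and 2*pos+2 are heaps, move X[pos] down until
    the subtree rooted at pos is a heap.  X[0..end-1] is the tree."""
    while 2 * pos + 1 <= end - 1:           # while a left child exists
        val, dest = X[pos], pos
        if X[2 * pos + 1] > val:            # left child larger?
            val, dest = X[2 * pos + 1], 2 * pos + 1
        if 2 * pos + 2 <= end - 1 and X[2 * pos + 2] > val:
            dest = 2 * pos + 2              # right child even larger?
        if dest != pos:
            X[pos], X[dest] = X[dest], X[pos]
            pos = dest                      # continue at the lower level
        else:
            break                           # the subtree is now a heap

def heap_sort(X):
    """Heap sort (in the spirit of Algorithm 6.11): build the heap, then
    repeatedly move the largest element to the end of the shrinking tree."""
    end = len(X)
    for i in range(end // 2 - 1, -1, -1):   # leaves are already heaps
        sift_down(X, i, end)
    while end > 1:
        X[0], X[end - 1] = X[end - 1], X[0] # largest element to the end
        end -= 1
        sift_down(X, 0, end)                # restore the heap property
    return X
```

No auxiliary array and no recursion is used: the only state besides the array is a constant number of indices.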

Optimality of sort algorithms and beating the bound

Fast algorithms are able to sort N elements in O(N log N ) time and there is
a trivial lower bound of O(N ) on this running time, since sorting an array at
least requires reading the N elements in the array and writing the sorted array
as output. A natural question is to ask whether it is possible to outperform
O(N log N ). In fact, it can be shown that in a generic sense, this is impossible.
However, in some special cases, where additional information is known about
the array and the elements it contains, it is possible to beat this bound and
come up with linear time special purpose sort algorithms.
Let us first explain why O(N log N) is optimal. For that, assume that we
need to sort an array of N distinct elements. We also assume that the array
is accessed only in a limited way: elements can be compared and moved
around, nothing else. For our array of N elements, N! different orders are
possible. Moreover, given the sorted array that the algorithm outputs and the
computation trail, it is possible to reconstruct the initial order of elements.
As a consequence, the algorithm has to learn enough information to be able to
fully reconstruct the input order. The number of bits necessary to encode this
order is log2(N!) ≈ N log2 N. Since any comparison reveals at most one bit of
information, the algorithm requires on average at least N log2 N comparisons.
This yields the desired lower bound on the running time.
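The approximation log2(N!) ≈ N log2 N used in this argument is easy to check numerically (the exact value is smaller, by roughly N log2 e, which does not affect the asymptotic bound):

```python
import math

# Exact value of log2(N!) (via the log-gamma function) compared with
# the approximation N * log2(N) used in the counting argument.
for N in (10, 1000, 100000):
    exact = math.lgamma(N + 1) / math.log(2)   # log2(N!)
    approx = N * math.log2(N)
    print(N, round(exact), round(approx))
```

By Stirling's formula, log2(N!) = N log2 N − N log2 e + O(log N), so the ratio of the two quantities tends to 1 as N grows.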
To show that this bound can be beaten in some special cases, assume that
we want to sort an array of N integers in [1 · · · λN] for some constant λ.
Consider Algorithm 6.13. The initialization of the algorithm costs O(λN), the
first loop costs O(N), the second loop costs O(λN) for the control and O(N)
for the printing. All in all, since λ is a constant, this algorithm's running time
is O(N). Clearly, the algorithm outputs numbers in increasing order, since it
only writes down the loop variable i. Moreover, it writes i as many times as i
appears in X. Thus, the algorithm outputs a sorted copy of array X. It beats
the lower bound on sort algorithms because it is able to obtain information
about X through an additional channel. This algorithm does not restrict
itself to comparisons; in fact, it does not perform any comparisons at all, only
(implicit) equality tests.

Algorithm 6.13 Count sort: Beating the sort lower bound
Require: Input an array X of N elements in [1 · · · λN]
Allocate an array M of λN elements initialized to zero
for i from 1 to N do
  Increment M[X[i]]
end for
for i from 1 to λN do
  Print M[i] times the number i
end for
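A direct Python transcription of this count sort, returning the sorted copy instead of printing it, can look as follows (illustrative, not the book's code):

```python
def count_sort(X, max_val):
    """Count sort (in the spirit of Algorithm 6.13): sorts N integers in
    [1..max_val] in O(N + max_val) time, using no comparisons at all."""
    M = [0] * (max_val + 1)        # M[v] counts occurrences of value v
    for x in X:
        M[x] += 1
    out = []
    for i in range(1, max_val + 1):
        out.extend([i] * M[i])     # output i as many times as it appears
    return out
```

With max_val = λN for constant λ, both loops are linear in N, beating the comparison lower bound because the values themselves are used as array indices.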

Sort algorithms and stability
When using sort algorithms with duplicate elements, we can ask what happens
to equal elements: do they remain in the same relative order or can
they be moved around? If a sort algorithm always preserves the relative
order of copies of the same element, it is called a stable sort algorithm. Since
this can sometimes be an important property, it is useful to know which sort
algorithms are stable and which are not. Of course, any sort algorithm can
be made stable by simply adding the original position to each element and using
this value to break ties between identical entries. The problem with this
approach is that it requires a potentially large amount of extra memory to store
the initial positions. As a consequence, when needed, an intrinsically stable
sort is preferable. Among our sort algorithms with quadratic running time, it
is easy to check that bubble sort and selection sort are stable. For insertion
sort, the sort is stable if and only if the dichotomy search we use always returns
the last position where an element can be inserted. We leave this verification
as an exercise.
Of course, it would be interesting to use an algorithm which is both fast and
stable. Among our selection of sort algorithms, only merge sort has this extra
property. Quicksort and radix sort can reasonably easily be made stable if
the sorted elements are written in a copy of the initial array. However, the
versions that do not need such a copy move elements around too much to
satisfy this property.

6.3.2 Hash tables
In a book about cryptography, when hash functions are discussed, the reader
is usually expected to think about cryptographically strong hash functions,
with collision resistance, preimage resistance or other strong properties. This section is an
exception to this general rule: here we consider algorithmic hash functions
and only require that, for ordinary inputs (not selected by an adversary) to
these functions, the outputs of the hash functions are well distributed. When
looking for collisions, such a non-cryptographic hash function can be used to
filter out most non-colliding pairs of elements. The key requirement is that,
given two distinct elements, their hash values are rarely equal. Of course,
given two equal elements, their hash values are always equal, so using a hash
function to filter pairs cannot miss collisions. In practice, depending on the
set of inputs, the hash functions we use can be extremely simple. The most
extreme case is encountered with random numbers, say on n bits. There,
it suffices to truncate the numbers, keeping b bits only, to get a useful hash
function. For this purpose, we can use the low order bits, the high order bits
or anything else that fits well with the particular problem being considered.
It is clear that this process cannot qualify as a strong cryptographic hash
function; for example, finding collisions in this function is straightforward.
To find collisions among n-bit numbers using a non-cryptographic hash


function h, we proceed as follows. Assuming that the hash values are b-bit
numbers, we create a large hash table of 2^b numbers, which can be indexed
by hash values. This table is initialized to some special value ⊥. Then we
take numbers of the list one by one, and process number x by writing x in
the hash table T at position h(x) when that position still contains the special
value ⊥. Otherwise, we test whether x = T[h(x)]; if so, since T[h(x)] is
a previously processed element, we have found a collision. A very important remark
is that this approach may miss some collisions. For example, assume that
the list of elements contains, in this order, three elements x, y and z with
h(x) = h(y) = h(z), y = z and x ≠ y. We also assume that no other element
in the list has the same hash value h(x). Then, during the execution of the
algorithm, x is written in T[h(x)], since the position h(x) in table T is empty
when x is processed. When we encounter y, we see that T[h(y)] is not empty
and that y ≠ T[h(y)] (= x), so no collision is detected. Similarly, for z we see
that z ≠ T[h(z)] (= x), and do not detect a collision. All in all, we missed
the collision between y and z. In order to solve this problem, we should
make sure that when a false collision is detected, the corresponding element
is not dismissed but stored somewhere. In fact, this is a classical problem
when using hash tables for other purposes such as database indexing. In these
alternative contexts, collision problems between hash values must be solved
and specific collision resolution techniques have been devised. However, in
our context, it is often easier to choose the parameter b in order to avoid this
problem altogether or, at least, with high probability.
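The procedure just described can be sketched in Python; this illustrative version uses truncation to the b low-order bits as the non-cryptographic hash, as suggested earlier:

```python
def find_collisions(numbers, b):
    """Collision search with a hash table of 2**b entries, as described in
    the text.  The hash h simply keeps the b low-order bits.  Collisions
    hidden behind multicollisions of the hash values can be missed."""
    T = [None] * (1 << b)   # None plays the role of the special value ⊥
    found = []
    for x in numbers:
        h = x & ((1 << b) - 1)
        if T[h] is None:
            T[h] = x        # first element with this hash value
        elif T[h] == x:
            found.append(x) # true collision with a previous element
        # else: false collision, x is dismissed (and may hide a collision)
    return found
```

Running it on [5, 13, 5] with b = 3 reports the repeated 5, while on [8, 16, 16] (all hashing to 0) the collision between the two copies of 16 is hidden behind the earlier 8, the x, y, z phenomenon described above.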
In order to use hash tables for collision finding as in Algorithm 6.14, it is
important to choose the size of indexes correctly. Assume that we want to
find collisions in a list of L elements using a table indexed by b bits. To avoid
collision resolution, we should take care to avoid multicollisions involving 3
or more elements. From the analysis of Section 6.2.1, assuming that the hash
values behave as random numbers, such multicollisions appear when:

L^3/6 ≈ 2^(2b).    (6.5)

Forgetting the constant divisor 6, we find that we need to choose b ≈ 1.5 log2 L.
Assuming that we search for a collision between n-bit numbers and that we have
the usual case L ≈ 2^(n/2), this yields b ≈ 3n/4.
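Both this threshold and the relaxed one derived below are instances of the same pattern, L^k/k! ≈ 2^((k−1)b) for k-multicollisions, so the table size can be computed by a small helper. This generalization and the helper are my own packaging of Equations (6.5) and (6.6), not from the text:

```python
import math

def table_bits(L, k):
    """Smallest integer b with L**k / k! <= 2**((k-1)*b), i.e., the table
    index size making k-multicollisions among L random b-bit hash values
    unlikely (generalizing Equations (6.5) and (6.6))."""
    return math.ceil((k * math.log2(L) - math.log2(math.factorial(k)))
                     / (k - 1))

# 40-bit numbers, L ≈ 2**20 values: avoiding 4-multicollisions
print(table_bits(2**20, 4))   # matches the b = 26 example of the text
```

For k = 3 the same call gives b = 29, close to the 3n/4 = 30 bound above.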
Since this value of b is quite large, it is important to refine the analysis,
assuming that we are ready to accept some probability of failure. Clearly, if
we lower b, 3-multicollisions between hash values are going to occur. These
multicollisions make us fail when they hide a real collision on the original
numbers, as in the x, y, z example we described earlier. Note that we only miss
the collision if the bad element occurs first. Since the numbers essentially
appear in a random order, we miss the collision with probability
1/3. Thus, preventing 3-multicollisions is not essential; it suffices to prevent


4-multicollisions⁶. Redoing the analysis, we start with:

L^4/24 ≈ 2^(3b),    (6.6)

and we find b ≈ 2n/3. Since b is still quite large compared to n, this approach
is mostly useful when n is not too large. A typical example is to consider
40-bit numbers and to take b = 26 or maybe 27. With these parameters, the
hash table can fit in memory, even if we use a wasteful representation and
take 64 bits to represent each 40-bit number. Indeed, the memory requirements
are either 512 Mbytes or 1 Gbyte. A noticeable property of using hash tables
compared to sort algorithms is the fact that the list of numbers does not need
to be completely available at the start of the algorithm; it can be generated
Algorithm 6.14 Collision search using hash tables
Require: Input (potentially one by one) a list X of N numbers on n bits
Choose b near 2n/3
Allocate a hash table T[0 · · · 2^b − 1] initialized to ⊥
for i from 1 to N do
  if T[h(Xi)] = ⊥ then
    Let T[h(Xi)] ←− Xi
  else
    if T[h(Xi)] = Xi then
      Output 'Collision found for value Xi'
    end if
  end if
end for

Hash tables and cache misses
Using hash tables to search for collisions yields a very simple algorithm
which is easily implemented. However, in terms of memory accesses, it has
very bad behavior. Indeed, it reads and writes in a large table in memory
at essentially random positions. As we already know, with current memory
architectures, this behavior causes many cache misses and slows the program
down a lot. It would be nice to be able to change the hash table algorithm in order to
minimize this difficulty. We now show how this can be done. For this purpose,
let us first define b0 as the largest possible bit size such that a hash table with
2^b0 entries fits into the memory cache. Of course, if b ≤ b0, Algorithm 6.14

⁶We could even go further if we lower the probability of success again, but this particular
choice achieves a nice balance.


does not encounter cache misses and we need no improvement. Thus the
interesting case happens for b > b0. Since the main memory is usually not
arbitrarily larger than the cache memory, it is safe to also assume that b ≤ 2b0.
For example, with b0 = 16, i.e., a 64-Kbyte cache, this means that the main
memory is smaller than 4 Gbytes. Under this assumption, we divide each
b-bit hash value into two parts, say the b0 low order bits on one side and the
b1 = b − b0 other bits on the other side. For simplicity, we decompose the
hash function h in the same way and get two smaller hash functions, h0 on b0
bits and h1 on b1 bits. Then, we proceed as follows: in a first phase, for each
incoming value X, we simply throw it on one among 2^b1 stacks, more precisely
on the stack numbered h1(X). Once this first phase is finished, the initial list
of numbers has been divided into 2^b1 stacks. Within each stack, all numbers
share the same value for h1, i.e., they share the top b1 bits of their h values. If
the inputs are random numbers, the stacks are roughly of the same size. In a
second phase, we simply run Algorithm 6.14 independently on the contents of
each stack in order to find collisions within these stacks. Of course, this yields
the same result as a direct application of Algorithm 6.14 on the original list of
numbers. This approach is described in pseudo-code as Algorithm 6.15. Due
to our hypothesis on b0, we know that no (or few) cache misses occur during
this second phase. Of course, we also need to check for cache misses during
the first phase. If we represent each stack as an array and write each new
element at the end of the corresponding array, we are simply writing elements
in memory in an ordered fashion within each array. If the number of arrays
is small enough, this does not cause too many cache misses. To estimate the
number of arrays, note that we need to store in cache the current starting
position within each stack array, together with a few positions at the current
start of each stack. Thus, to be safe, the number of stacks 2^b1 should remain
below some fraction of the cache size. In practice, this means that b1 should
be smaller than b0 − 5 or something similar. Since we only assumed b1 ≤ b0,
there is a remaining gap. If required, this gap can be filled by using the same
idea twice: split the original list into stacks, split each stack into substacks
and then search for collisions.
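The first phase, splitting by the top b1 bits of the hash value, can be sketched in Python as follows (an illustrative fragment; the second phase is then the hash-table search run on each stack with a table of 2^b0 entries):

```python
def split_into_stacks(numbers, b0, b1):
    """First phase of the cache-friendly method: distribute the numbers
    into 2**b1 stacks according to the b1 high-order bits of their
    b = b0 + b1 bit hash value (here: truncation to the low-order bits)."""
    stacks = [[] for _ in range(1 << b1)]
    for x in numbers:
        h = x & ((1 << (b0 + b1)) - 1)  # the b-bit hash value of x
        stacks[h >> b0].append(x)       # h1(x): the b1 high-order bits of h
    return stacks
```

Each stack can then be searched independently with a table of only 2^b0 entries, which by the choice of b0 stays resident in cache.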

6.3.3 Binary trees
Another interesting method to find collisions is the use of binary search
trees. A binary search tree is a binary tree that satisfies an important additional
property: for any node in the tree, all the nodes in its left subtree have
smaller (or equal) values and all the nodes in its right subtree have greater
values. Binary search trees are represented explicitly, by storing for each node
its value and a pointer to each of its children. Due to this representation, they
are quite costly in terms of memory usage. From a runtime point of view, all
operations are fairly quick. Inserting, removing or searching for an element
in the tree costs at most the depth of the tree. Since a tree of depth k may
hold 2^(k+1) − 1 entries, the elementary cost is logarithmic in the size of the tree,


Algorithm 6.15 Avoiding cache misses with hash tables
Require: Input a list X of N numbers on n bits
Choose b near 2n/3
Split b as b0 + b1
Create 2^b1 stacks of numbers represented by arrays of size 1.1 N × 2^(−b1)
{with a memory margin of 10 percent}
Allocate a hash table T[0 · · · 2^b0 − 1]
for i from 1 to N do
  Add X[i] to stack number h1(X[i])
end for
for k from 0 to 2^b1 − 1 do
  If stack k overflowed, Abort
  Reinitialize hash table T[0 · · · 2^b0 − 1] to ⊥
  for all x in stack k do
    if T[h0(x)] = ⊥ then
      Let T[h0(x)] ←− x
    else
      if T[h0(x)] = x then
        Output 'Collision found for value x'
      end if
    end if
  end for
end for


assuming that the tree is reasonably full. As a consequence, the main issue
when working with binary trees is to control their depth and to check that
it does not grow too much.
The favorable case is when a tree is built by inserting randomly selected
elements. In that case, the resulting tree is well balanced, all subtrees are
similar and the resulting depth is quite small, i.e., it remains within a constant factor
of the logarithm of the size. The worst case occurs when a sorted sequence
of elements is inserted one at a time to build the tree. In this case, the tree
degenerates into a list and its depth is equal to its size. As a consequence,
searching for an element takes linear time in the size of the tree.
Because of these extreme behaviors, binary search trees are not suited to
all applications. For a given application, it is prudent to test the trees by
monitoring their depths to see if problems do occur. Thankfully, there exists a
nice solution to fix the depth problem with binary search trees; this solution
uses AVL trees, named after their inventors Adelson-Velsky and Landis. These
trees are also called self-balancing trees. The key idea of AVL trees is, for each
operation that modifies the tree, insertion or deletion, to check that the tree
does not become too unbalanced and, if necessary, to take corrective action.
Note that AVL trees do not try to minimize the tree depth, since this would be too
costly; instead they simply keep the tree depth within a factor of the optimal
value by using a simple criterion. The criterion used by self-balancing trees works
by memorizing for each node the difference between the depths of its left and
right subtrees. Differences of either 0, 1 or −1 are allowed. Whenever a larger
difference appears, the tree is modified to reduce it to an acceptable value.
The transformations used to modify the tree are called rotations. Two basic
rotations are possible, left and right rotations; they are inverse of each other.
Applying the correct rotation maintains the binary search property of the tree
while rebalancing the left and right subtrees of the rotated node. Depending
on the detailed structure of the subtrees, a rebalancing step requires either a
single or a double rotation.
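A single right rotation, the basic transformation just mentioned, can be sketched as follows (our own minimal dict-based rendering, not the book's code); the comment shows why the binary search property survives the rotation:

```python
def rotate_right(node):
    """Right rotation: the left child becomes the new subtree root.

    Keys satisfy  left subtree < pivot < middle < node < right subtree
    both before and after the rotation, so the binary search property
    is preserved while the left subtree moves one level up and the
    right one moves one level down.
    """
    pivot = node["left"]
    node["left"] = pivot["right"]    # the middle subtree changes side
    pivot["right"] = node
    return pivot                     # new root of the rotated subtree

def leaf(v):
    return {"value": v, "left": None, "right": None}

# A left-heavy subtree 3 -> 2 -> 1 (a left spine of depth 3).
unbalanced = {"value": 3,
              "left": {"value": 2, "left": leaf(1), "right": None},
              "right": None}
balanced = rotate_right(unbalanced)
print(balanced["value"], balanced["left"]["value"], balanced["right"]["value"])
# → 2 1 3
```

The left rotation is obtained by exchanging the roles of the "left" and "right" fields.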

Use of binary search trees in cryptography. Because of their memory
costs, direct application of binary trees to collision search is not a good idea.
These structures are better suited to maintaining collections of data that are
modified as time goes by. An interesting example that may be encountered is
the case where, after an initial construction, the tree evolves in the following
way: at each time step, remove a tree element, say the smallest one, and add in a
new element. Such an example is given in Chapter 8 in order to construct an
improved birthday paradox attack against knapsack cryptosystems.

Detailed description of basic binary search trees
Using a binary search tree requires the application of several algorithms.
Each algorithm starts from a correctly constructed tree and modifies it into a
new, also correct, tree. At the very beginning, the tree is initialized as empty.

The birthday paradox: Sorting or not? 213

To encode a tree, we maintain a pointer which points to the root node of the
tree. When the tree is empty, the pointer to the root node is the nil pointer.
Each node in the tree is a structure which contains the value stored in the
node and two pointers, one to the left child and one to the right child. When a
pointer to a child is the nil pointer, there is no child on the corresponding side.
A node without children is called a leaf, any other node is called an internal
node. Optionally, a node may also contain an additional field to memorize
the size of the left subtree connected to this node. With this additional field,
it becomes possible to recover the i-th element in the binary search tree in
logarithmic time. We describe all algorithms for binary search trees with
this optional field in place. In some algorithms, it is necessary to be able to
backtrack from an element to its father. This can be done in two different
ways: the first one is to maintain, for each node, an additional pointer that
memorizes its father; the second one is, when passing an internal node to a
tree manipulation algorithm, to pass it along with the full path from the root
to this node. The first method is simpler to implement but requires even
more memory to encode the trees. An alternative method is never to pass an
internal node directly, but to let the manipulation algorithm search for it, thus
determining the access path itself. This alternative method may sometimes
require the program to perform the same search twice.
Given a binary search tree, the most important algorithm is probably the
search algorithm that tries to locate a given value in the tree. This algorithm
is quite simple: starting from the root, it compares the reference value to the
value stored in the current node. If they are equal, the search is successful;
if the reference value is smaller, the algorithm moves to the left child and
continues the search; if it is greater, it moves to the right child. Another
very important algorithm is the insertion algorithm. Insertion works in two
phases: first, it searches for the value to be inserted; if the value is already
present in the tree, it adds a copy either right after or right before the existing
copy. Otherwise, it adds it at the position where the search ended. Since the
insertion algorithm contains a search part, we do not describe the search in
pseudo-code by itself but include it as part of the insertion Algorithm 6.16.
An important part of the insertion algorithm is the ability to find an empty
position right before or after an internal node. For an empty position right
before a node, move once to the left and then as many times to the right as
needed to get to a nil pointer. That nil pointer marks the place we are looking for. To
find the position right after a node, move once right, and continue with left
moves until an empty position is reached. While locating the place where the
insertion itself is going to happen, each time we move to the left, the field
representing the size of the left subtree needs to be incremented in order to
remain at the correct value.
Thanks to the optional field that counts the size of the left subtree, we
may also search for the i-th element in the tree (we assume as usual that
elements are counted from zero). Once again we start at the root and proceed
by moving down the tree. If the size of the left subtree for the current node is


Algorithm 6.16 Insertion in a binary search tree
Require: Input tree root R, free node N and value to insert X
Set the value of N to X
Set the children of N to nil
Set the left subtree size of N to 0
if R is the nil pointer then
Replace R by N and return
end if
Set Current ← R {First search for insertion position}
while true do
if Value of Current node < X then
if Current node has no right child then
Set right child of Current node to N and return
end if
Let Current ← right child of Current node
else
Increment the left subtree size field within Current
if Current node has no left child then
Set left child of Current node to N and return
end if
Let Current ← left child of Current node
end if
end while


s, three cases are possible: if i < s, then we need to look for the i-th element
in the left subtree; if i = s, then the current node is the right one; if i > s, then
look for element i − s − 1 in the right subtree.
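Insertion maintaining the left-subtree-size field and the i-th element search just described can be sketched together in Python (our own rendering, not the book's code; duplicates are sent to the left here, i.e., right before any existing copy):

```python
class Node:
    def __init__(self, value):
        self.value = value
        self.left = None
        self.right = None
        self.lsize = 0        # size of the left subtree (the optional field)

def insert(root, value):
    node = Node(value)
    if root is None:
        return node
    cur = root
    while True:
        if cur.value < value:
            if cur.right is None:
                cur.right = node
                return root
            cur = cur.right
        else:
            cur.lsize += 1    # we move left: the new element lands there
            if cur.left is None:
                cur.left = node
                return root
            cur = cur.left

def select(root, i):
    """Return the i-th smallest element (counted from zero)."""
    cur = root
    while cur is not None:
        if i < cur.lsize:
            cur = cur.left
        elif i == cur.lsize:
            return cur.value
        else:
            i -= cur.lsize + 1          # skip left subtree and current node
            cur = cur.right
    raise IndexError(i)

root = None
for v in [5, 2, 8, 1, 9, 3]:
    root = insert(root, v)
print([select(root, i) for i in range(6)])   # → [1, 2, 3, 5, 8, 9]
```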
Finally, we may also delete a given tree node. If the node to delete is a
leaf, it is quite easy: it suffices to remove it by replacing the pointer to it
in its father node by the nil pointer. After that, we need to backtrack the
access path to this node all the way up to the root to correct the size of each
left subtree where needed. Similarly, to delete a node with a single child, it
suffices to remove the node and link its child directly to its father. If the node
to delete has two children, deletion is trickier. First, we need to locate
its immediate predecessor (or, symmetrically, its successor). Then, we swap the values stored
in the node to delete and in its predecessor. Finally, we delete the predecessor.
Since this predecessor has at most one child, it can be deleted using the simple
approach. Note that during the deletion algorithm, the binary search tree
is temporarily invalid. The value to be deleted may be stored in incorrect
order compared to the rest of the tree. However, when the value is finally
removed, the tree becomes correct once again. The deletion algorithm is
given as Algorithm 6.17.

Algorithm 6.17 Deletion in a binary search tree
Require: Input node N to delete, with path P of ancestors to the root
if N is a leaf or has a single child C then
Replace the pointer to N in its father by a pointer to C (or to nil for a leaf)
for all ancestors A in path P do
if N is in the left subtree of A then
Decrement the left subtree size field within A
end if
end for
else
Let M be the left child of N
Let path Q ← (P, N)
while M has a right child do
Let path Q ← (Q, M)
Replace M by its right child
end while
Exchange the value fields of M and N
Call the deletion algorithm on M with path Q {M has at most one child}
end if


6.4 Application to discrete logarithms in generic groups
Collisions can be used to compute discrete logarithms in arbitrary groups,
using the baby-step, giant-step method. Before presenting this method, it
is useful to first show that computing discrete logarithms in a group whose
cardinality is known is, assuming that the factorization of the cardinality is
given, no harder than computing discrete logarithms in all subgroups of prime
order. The Pohlig-Hellman algorithm is a constructive method to compute
discrete logarithms in the whole group from a small number of calls to discrete
logarithm computations in the subgroups of prime order.

6.4.1 Pohlig-Hellman algorithm
First, let us recall the definition of discrete logarithm. Given a group G,
denoted multiplicatively, and two elements of G, say g and h, computing the
discrete logarithm of h in basis g amounts to finding, if it exists, an integer
a such that h = g^a in G. Discrete logarithm computations are not
necessarily possible for all elements h of G. However, if G is a cyclic group
generated by g, then the discrete logarithm is defined for all elements of G.
In the sequel, we make this assumption.
Note that, if N denotes the cardinality of G, the discrete logarithm of h in
basis g is only determined modulo N, since g^N = 1, the identity element in
G. An interesting consequence is that any algorithm able to compute discrete
logarithms can be used to obtain N. For simplicity, assume that we know the
order of magnitude of N and, more precisely, assume that N lies in the range
[N0 + 1, 2N0]. If the discrete logarithm algorithm outputs a normalized value
between 0 and N − 1, it suffices to ask for the discrete logarithm of g^{2N0}, say
a. Then we know that N divides 2N0 − a and even that N = 2N0 − a. If
the discrete logarithm algorithm is allowed to output any of the multiple possible values
for the discrete logarithm, choose a random integer b between 0 and some
multiple of N0, say 10N0. Then ask for a discrete logarithm of g^b and let
a denote the resulting value. Since g^a = g^b, |b − a| is a small multiple of
N, possibly 0. If a ≠ b, it is a non-zero multiple. Since there are not many
divisors of this multiple in the range [N0 + 1, 2N0], we can easily recover N.
However, we need to make sure that the discrete logarithm algorithm does not
systematically output a = b. This comes from the fact that we are choosing b
at random in a large range. With this strategy, even an adversarial discrete
logarithm algorithm cannot systematically determine b and, at some point, it
outputs some other value for the discrete logarithm. Finally, even if we do
not know a precise range, it is usually possible to find N by computing the
GCD of a few multiples obtained as above.
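As a toy illustration of the order-recovery trick in the normalized case (all concrete values, the group Z_p^* with p = 101, g = 2 and the brute-force oracle, are our own choices for illustration):

```python
# The group is Z_p^* for p = 101, generated by g = 2, whose order
# N = 100 we pretend to know only through the promise N in
# [N0 + 1, 2*N0] with N0 = 60.
p, g, N0 = 101, 2, 60

def dlog(h):
    """Brute-force discrete logarithm oracle, normalized to [0, N - 1]."""
    x, k = 1, 0
    while x != h:
        x = (x * g) % p
        k += 1
    return k

a = dlog(pow(g, 2 * N0, p))   # discrete logarithm of g^(2*N0)
N = 2 * N0 - a                # N divides 2*N0 - a, and the range forces equality
print(N)                      # → 100
```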
As a consequence, in the context of discrete logarithm computations, it is
reasonable to ask for the group cardinality N in advance. We also assume that


the factorization of N into primes is known⁷, i.e., we have the decomposition:

N = p_1^{e_1} · p_2^{e_2} · · · p_n^{e_n}.    (6.7)

In the cyclic group G generated by g, there are subgroups G_i of order p_i
for 1 ≤ i ≤ n. Moreover, each subgroup G_i is generated by g_i = g^{N/p_i}.
In addition, G also contains subgroups G_i^{(e_i)} of order p_i^{e_i}, generated by
g_i^{(e_i)} = g^{N/p_i^{e_i}}. The Pohlig-Hellman algorithm works in two steps. First, it uses e_i calls to
the discrete logarithm algorithm in G_i to compute discrete logarithms in G_i^{(e_i)}.
Second, it pastes these values together to obtain a discrete logarithm in G.
Starting with the second step of the algorithm, recall that we are given g
and h in G and want to find a number a, determined modulo N, such that
h = g^a. If we raise this equality to the power N/p_i^{e_i} and let H_i = h^{N/p_i^{e_i}},
we obtain a discrete logarithm problem in G_i^{(e_i)}, namely H_i = (g_i^{(e_i)})^a. Since G_i^{(e_i)}
is a group of order p_i^{e_i}, this yields the value of a modulo p_i^{e_i}. Since the p_i^{e_i}
are pairwise coprime, we can use the Chinese remainder theorem and obtain
a modulo N.
It remains to see how discrete logarithms in G_i^{(e_i)} can be computed from
e_i calls to the discrete logarithm algorithm in G_i. Of course, it is clear for
e_i = 1. To treat the general case, it is useful to introduce a family of groups
(G_i^{(j)}) with 1 ≤ j ≤ e_i. Each group G_i^{(j)} has cardinality p_i^j and is generated
by g_i^{(j)} = g^{N/p_i^j}. We have G_i = G_i^{(1)}. In order to prove our
result, it suffices to show by induction on j that discrete logarithms in each
group G_i^{(j)} can be computed using j calls to discrete logarithms in G_i. It is
clear for j = 1. To compute a discrete logarithm in G_i^{(j+1)}, or equivalently,
given z in this group, to compute x such that z = (g_i^{(j+1)})^x, we proceed as follows:

• Raise the equation to the power p_i and remark that

z^{p_i} = ((g_i^{(j+1)})^{p_i})^x = (g_i^{(j)})^x

is a discrete logarithm problem in G_i^{(j)}. By induction hypothesis, we
learn x_0, the value of x modulo p_i^j, in j calls to discrete logarithms in G_i.

• Write x = x_0 + p_i^j · x_1, with 0 ≤ x_1 < p_i, and remark that:

z / (g_i^{(j+1)})^{x_0} = (g_i^{(j+1)})^{x − x_0} = (g_i^{(j+1)})^{p_i^j · x_1} = (g_i^{(1)})^{x_1}.

Thus x_1 can be obtained by an additional discrete logarithm computation in G_i.

⁷ This is not a real limitation either, because in most practical cases, factoring N is no
harder than computing the discrete logarithm.


This concludes the proof. An iterative version of the Pohlig-Hellman method is
given as Algorithm 6.18.

Algorithm 6.18 Pohlig-Hellman discrete logarithm algorithm
Require: A group G of cardinality N = p_1^{e_1} · · · p_n^{e_n}
Require: A group generator g and a group element h
for i from 1 to n do
Let a_0 ← 0
Let g_0 ← g^{N/p_i}
for j from 1 to e_i do
Let h_0 ← (h · g^{−a_0})^{N/p_i^j}
Call the discrete logarithm algorithm in the subgroup of order p_i generated by
g_0 for element h_0. Let a denote the result.
Let a_0 ← a_0 + p_i^{j−1} · a
end for
if i = 1 then
Let A ← a_0 and M ← p_1^{e_1}
else
Compute the Chinese remainder of A modulo M and a_0 modulo p_i^{e_i}.
Put the result in A.
Let M ← M · p_i^{e_i}
end if
end for
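The procedure above can be sketched in Python as follows; the toy group Z_101^* with N = 100 = 2^2 · 5^2, its generator g = 2, and the brute-force subgroup logarithms are our own illustrative choices, not part of the text (the modular inverses via `pow(x, -1, m)` require Python 3.8+):

```python
p, g = 101, 2
N, factorization = 100, [(2, 2), (5, 2)]    # N = product of pi**ei

def subgroup_dlog(g0, h0, q):
    """Brute-force discrete logarithm in the subgroup of order q."""
    x = 1
    for k in range(q):
        if x == h0:
            return k
        x = (x * g0) % p
    raise ValueError("h0 is not in the subgroup generated by g0")

def pohlig_hellman(h):
    A, M = 0, 1
    for pi, ei in factorization:
        # Discrete logarithm modulo pi**ei, one base-pi digit at a time.
        a0 = 0
        g0 = pow(g, N // pi, p)             # generator of the order-pi subgroup
        for j in range(1, ei + 1):
            h0 = pow(h * pow(g, -a0, p), N // pi**j, p)
            a0 += pi**(j - 1) * subgroup_dlog(g0, h0, pi)
        # Paste with the previous moduli by the Chinese remainder theorem.
        m = pi**ei
        A = (A + M * ((a0 - A) * pow(M, -1, m) % m)) % (M * m)
        M *= m
    return A

print(pohlig_hellman(pow(g, 57, p)))   # → 57
```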

6.4.2 Baby-step, giant-step algorithm
Thanks to the Pohlig-Hellman algorithm, we only need to compute discrete
logarithms in a group G of prime order p, generated by g. We are given h and
search for an integer a in the range [0, p − 1] such that h = g^a. Let r be the
integer obtained by rounding up the square root of p. It is clearly possible
to write a as a_0 + r·a_1, with 0 ≤ a_0 < r and 0 ≤ a_1 < r. As a consequence,
we have an equality between h·g^{−a_0} and g^{r·a_1} in the group G. Conversely, any
equality of this form yields the discrete logarithm of h.
From an algorithmic point-of-view, this means that we can find the discrete
logarithm we seek by searching for a collision between two lists of r elements
each. The first list contains all group elements of the form h·g^{−a_0}. The second
list contains all group elements of the form g^{r·a_1}. The name of the algorithm
comes from the fact that elements of the first list are separated by small
steps (or baby steps), corresponding to multiplication by g, and elements of
the second list by large steps (or giant steps), corresponding to multiplication
by g^r.


Algorithm 6.19 Baby-step, giant-step discrete logarithm algorithm
Require: A group G of prime order p
Require: A group generator g and a group element h

Let r = ⌈√p⌉
Create T, an array of r group elements
Create I, an array of r integers
Let k ← h
for i from 0 to r − 1 do
Let T[i] ← k and I[i] ← i
Let k ← k/g
end for
Sort T and perform all exchanges simultaneously on I
Let k ← 1 (neutral element in G)
Let G ← g^r in G
for i from 0 to r − 1 do
Lookup k in T (using a binary search)
If k is found at position j, return r·i + I[j] as the discrete logarithm
Let k ← k · G in G
end for
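A Python sketch of Algorithm 6.19 follows, in the toy group Z_101^* with generator g = 2 of order 100 (our choice for illustration). The sorted array T and the binary search are replaced here by a hash table, which keeps the same O(√order) count of group operations:

```python
from math import isqrt

def baby_step_giant_step(g, h, p, order):
    r = isqrt(order - 1) + 1             # r = ceil(sqrt(order))
    table = {}
    k = h
    g_inv = pow(g, -1, p)
    for a0 in range(r):                  # baby steps: h * g^(-a0)
        table.setdefault(k, a0)
        k = (k * g_inv) % p
    giant = pow(g, r, p)
    k = 1
    for a1 in range(r):                  # giant steps: g^(r*a1)
        if k in table:
            # h * g^(-a0) = g^(r*a1), hence a = a0 + r*a1
            return r * a1 + table[k]
        k = (k * giant) % p
    return None

print(baby_step_giant_step(2, pow(2, 77, 101), 101, 100))   # → 77
```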


Exercises

1. Neglecting leap years and seasonal birthrate irregularities, compute,
for sets of ten to thirty individuals, possibly by writing a program, the
probabilities of birthday collisions.

2. Check that CBC-MAC without final reencryption is insecure, by showing
that if a message containing a single block M^(1) has authentication tag t,
then the two-block message M^(1) ∥ (M^(1) ⊕ t) has the same authentication
tag.
3h . Write down an explicit chosen message distinguishing attack on CBC
encryption, when used beyond the birthday paradox limit, using the
technique described in Section

4. Why is the probability for twins to have the same birthday not equal to
1/365?
5h . Consider a hash function obtained by directly applying the Merkle-
Damgård construction (Chapter 5, Section 5.4.1) to a permutation fam-
ily π. This means that starting from an intermediate hash value h and a
message block m, the next hash value is h′ = π_m(h). The goal of this
exercise is to show a weakness of this hash function with respect to the
preimage resistance property.

(a) Show that when π^{−1} is available (for example, when considering
a block cipher), the hashing process can be reversed, i.e., given an
intermediate hash value h′ and a message block m, the previous
hash value h can be recovered.
(b) Let h_0 be the initial value of the hash function. Choose a large
set containing short sequences of message blocks. For each such
sequence M_i, starting from h_0 and considering each block of M_i in
turn, we can compute an intermediate hash value which we denote
by h_i.
(c) Let h_F be a target value for the hash function. Similarly, choose
another large set containing short sequences of message blocks.
For each such sequence M′_j, starting from h_F and considering each
block of M′_j in turn, we can compute backward an intermediate
hash value which we denote by h′_j.

6h . Again considering a Merkle-Damgård based hash function H, we now
turn to multicollisions. Let A^(1) and A^(2) be two single-block messages
such that H(A^(1)) = H(A^(2)). Let B^(1) and B^(2) be two message blocks
such that H(A^(1) ∥ B^(1)) = H(A^(2) ∥ B^(2)); construct a 4-collision.


(a) More generally, construct a 2^t-collision on messages of t blocks.
(b) How secure with respect to collision resistance is the hash function ob-
tained by concatenating two Merkle-Damgård hash functions, where
each function is on n bits?
7. Check that the dichotomy search given as Algorithm 6.2 always returns
the position of the last occurrence of an element. Deduce that the
insertion sort Algorithm 6.6 is stable.
8. Any sorting algorithm can be made stable by a generic technique, based
on storing pairs containing elements and their initial positions. Give the
order relation that should be used to achieve this effect.
9h . At first, it may seem that in quicksort and radix sort, the basic par-
titioning preserves the relative order of elements stored in the stack of
small elements and reverses the order of elements in the stack of large
elements. Explain why this is not the case. Show how to do this with an
auxiliary array of N elements. Conclude by writing a stable version of
quicksort and radix sort.

Chapter 7
Birthday-based algorithms for functions

In Chapter 6, we introduced several algorithmic techniques that allow us to
apply the birthday paradox to arbitrary lists of randomly distributed objects.
These techniques do not take into account the specific details of how these
objects are generated. In this chapter, we are going to focus on a more specific
problem and look for collisions among objects which are generated by iterating
some fixed function. In this specific case, many improvements are possible.
This is quite important for cryptanalysis, because there are many applications
where this improved setting can be exploited. In particular, there are some
very important number theoretic algorithms due to Pollard that use these
techniques to factor integers or compute discrete logarithms.
In the most general setting, we are given a set S and a function F from S to
itself. Then, starting from a point X_0 in S, we generate the sequence defined by:

X_{i+1} = F(X_i).    (7.1)
Our main goal is to find a collision within this sequence, say X_i = X_j. Note
that due to the deterministic nature of the above computation, from any such
collision, we may generate many. Indeed, whenever X_i = X_j, we find that
X_{i+1} = F(X_i) = F(X_j) = X_{j+1} and obtain a new collision. Of course,
iterating this remark, we also have X_{i+2} = X_{j+2} and so on.
In order to study this algorithmic problem, it is important to assume that
F behaves more or less randomly. Without this assumption, the birthday
paradox has no reason to hold. Furthermore, when omitting the assumption,
it is easy to build counterexamples. Take the set S = [0 · · · 2^n − 1] of integers
modulo 2^n and define F(X) = X + 1 (mod 2^n). Then, for any choice of X_0,
we cannot find a collision in the sequence X_i, unless we have considered the
full set of 2^n elements. In this case, the birthday paradox would predict a
collision after 2^{n/2} elements. Thus, we see that it does not apply to this
counterexample. In order to theoretically analyze the behavior of such sequences,
F is usually modeled as a random function. This model and the correspond-
ing analysis are presented in Section 7.2 below. For now, we simply assume
that after some time the sequence X loops back on one of its previous values.
Note that since F is a function, not a permutation, nothing guarantees that
X loops back on X_0. On the contrary, it usually restarts at a later position
in the sequence. A traditional way of describing this fact is to draw a picture



Figure 7.1: Rho shape while iterating a random function

that represents the sequence X by making sure that consecutive elements of
the sequence are neighbors on the picture. On the resulting Figure 7.1, we see
a shape similar to the letter ρ. This gave its name to Pollard's Rho factoring
algorithm. In this figure, we see that the sequence has two distinct parts: a
cycle and a tail. If the first collision in X is a collision between X_s and X_{s+ℓ},
then the tail contains elements of the sequence from X_0 up to X_{s−1}, and the
cycle starts from X_s and its length is ℓ, the period of X.

7.1 Algorithmic aspects
The key question in this chapter is to find collisions in the sequence X or,
even better, to determine the respective lengths of the cycle and tail of the
sequence. A first solution to this problem is to use classical birthday-based
algorithms. For example, we can use a binary search tree to store the elements
of the sequence X one at a time as long as there is no duplicate in the tree.
The first duplicate occurs when X_{s+ℓ} is inserted in the tree and we find that
this value is already present due to X_s. This method has a running time
O((s + ℓ) log(s + ℓ)) and needs to store O(s + ℓ) elements. The running time is
essentially optimal, since, unless the function F has very specific properties,
we need to generate the sequence up to X_{s+ℓ} to detect its cycle. Moreover,
this algorithm directly generates both the length ℓ of the cycle and the length
s of the tail. However, its storage costs are high. The good news is that
there exist alternative algorithms that do not require such a large amount of
memory. These algorithms do not directly output s and ℓ but instead find a
collision somewhat later in the cycle. However, with a reasonable additional
computational cost, it is possible to recover s and ℓ from this late collision.
We start by presenting two simple algorithms that obey an additional re-
strictive condition. More precisely, these two algorithms access values of the
sequence X in a restrictive way, either as inputs to F or in equality tests,
but in no other manner. This restrictive use of the sequence's values can be
very important because it allows the cycle finding algorithms to be used in

a black-box context where the sequence X can only be accessed indirectly.
In particular, this restriction is essential to Pollard's Rho factoring algorithm
(see Section 7.3.2). Indeed, in Pollard's Rho factoring, equality tests can only
be performed indirectly through GCD computations.

7.1.1 Floyd's cycle finding algorithm
In this first algorithm, we start by defining a new sequence Y which is
related to X by the equation:

Y_i = X_{2i}.    (7.2)

Alternatively, the sequence Y can be defined directly by the formulas:

Y_0 = X_0 and Y_{i+1} = F(F(Y_i)).    (7.3)

Algorithm 7.1 Floyd's cycle detection algorithm
Require: Input initial sequence value X0, max. iterations M
Let x ← X0
Let y ← X0
for i from 1 to M do
Let x ← F(x)
Let y ← F(F(y))
if x = y then
Output 'Collision between i and 2i'
end if
end for
Output Failed

Once Y is defined, Floyd's algorithm looks for the first index t such that
Y_t = X_t. Clearly, this can be done in time O(t) using a very small amount
of memory, as shown by the pseudo-code in Algorithm 7.1. Indeed, it suffices
to compute the two sequences in parallel until they collide. To analyze the
behavior of this algorithm, we need to better understand the relation between
the position t of this first collision between X and Y and the parameters s
and ℓ of the sequence X.
First, since X_{2t} = X_t, we find that t is a period of X and thus a multiple
of the period ℓ. Second, since X_t is in the cycle of X, t ≥ s. In fact, t is the
smallest multiple of ℓ greater than (or equal to) s:

t = ℓ · ⌈s/ℓ⌉.    (7.4)


Moreover, once t is known, recovering ℓ is easy: it suffices to compute the
sequence X_{t+i} until we loop back to X_t. This happens after exactly ℓ steps.
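This two-phase process can be sketched in Python; the quadratic map modulo 1013 is an arbitrary choice of ours standing in for a random-looking function F:

```python
def F(x):
    return (x * x + 1) % 1013   # arbitrary toy function on a finite set

def floyd(x0, max_iter=10**6):
    x, y = x0, x0
    for t in range(1, max_iter + 1):
        x = F(x)                # x holds X_t
        y = F(F(y))             # y holds X_{2t}
        if x == y:
            return t            # first t > 0 with X_t = X_{2t}
    return None

def cycle_length(x0, t):
    """X_t is inside the cycle: iterate from it until it is seen again."""
    x = x0
    for _ in range(t):
        x = F(x)
    y, ell = F(x), 1
    while y != x:
        y = F(y)
        ell += 1
    return ell

t = floyd(0)
ell = cycle_length(0, t)
print(t % ell == 0)             # → True: t is a multiple of the period
```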

7.1.2 Brent's cycle finding algorithm

The main drawback of Floyd's algorithm is the fact that each step requires
three applications of the function F in order to advance both sequences X
and Y. Brent's algorithm is an alternative that only requires the computation
of a single copy of the original sequence. The starting point of the method is
to remark that once we are given a point X_k inside the cycle, we can find the
cycle's length ℓ in exactly ℓ steps, using the same method as when recovering
ℓ after Floyd's algorithm. The difficulty is to find a start-point in the cycle. In
Brent's algorithm, this is done by trying several start-points until an adequate
one is found. In order to minimize the overall cost, the computation steps used
to find a potential cycle are shared and also used to compute the next start-
point to be tested. More precisely, we consider the sequence of start-points
X_1, X_2, X_4, ..., X_{2^i}, ... indexed by powers of two. For each start-point
X_{2^i}, we compute the sequence of successors X_{2^i+j} until we either find a cycle
or reach X_{2^{i+1}}. In the former case, we are done; in the latter, we change
to the next start-point X_{2^{i+1}}. This approach is described in pseudo-code as
Algorithm 7.2.

Algorithm 7.2 Brent's cycle detection algorithm
Require: Input initial sequence value X0, max. iterations M
Let x ← X0
Let y ← x
Let trap ← 0
Let nexttrap ← 1
for i from 1 to M do
Let x ← F(x)
if x = y then
Output 'Collision between trap and i'
end if
if i = nexttrap then
Let trap ← nexttrap
Let nexttrap ← 2 · trap
Let y ← x
end if
end for
Output Failed


As before, given the parameters s and ℓ of the sequence X, we can analyze
the behavior of Brent's method. It succeeds with starting point X_{2^k} if, and
only if, 2^k ≥ max(s, ℓ). Indeed, to find the cycle with this start-point, the
point should be in the cycle, i.e., 2^k ≥ s, and the cycle's length should be
at most 2^k. Clearly, Brent's method finds ℓ with complexity O(s + ℓ).
Finally, note that our choice of putting traps in positions of the form 2^k is
not the only option. Depending on the specific problem at hand, we could con-
sider another quickly increasing sequence. This could be either a (rounded)
geometric progression α^k or something different such as a Fibonacci sequence.
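Brent's trap-based scan can be sketched likewise (again with an arbitrary toy F of our choosing). Note that the gap between the trap position and the collision position is exactly the cycle length, since the trapped value X_trap is necessarily inside the cycle once it recurs:

```python
def F(x):
    return (x * x + 2) % 10007   # arbitrary toy function on a finite set

def brent(x0, max_iter=10**6):
    x, y = x0, x0                   # y holds the current trap value X_trap
    trap, next_trap = 0, 1
    for i in range(1, max_iter + 1):
        x = F(x)                    # x holds X_i
        if x == y:                  # X_i = X_trap: X_trap is in the cycle
            return trap, i - trap   # and i - trap is exactly the period
        if i == next_trap:
            trap, next_trap, y = i, 2 * i, x
    return None

trap, ell = brent(0)
print(ell)                          # the cycle length of the sequence
```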

7.1.3 Finding the cycle's start
We have seen that both Floyd's and Brent's algorithms can be used to find
the length ℓ of the cycle in the sequence X. However, neither produces the length
of the tail s or, equivalently, the position where the cycle starts. From the
point-of-view of collision finding, this position is quite important. Indeed, we have:

F(X_{s−1}) = X_s = X_{s+ℓ} = F(X_{s+ℓ−1}) and X_{s−1} ≠ X_{s+ℓ−1}.    (7.5)

Thus, finding the entrance of the cycle yields a true collision in F, and this is
the only place in the sequence X where such a true collision can be found.
After either Floyd's or Brent's algorithm, we know a number t ≤ 2(s + ℓ)
such that X_t is in the cycle of X. Thus, the start of the cycle is an integer
s in the interval [0 · · · t]. From this information, it is possible to obtain s in
several different ways. We now present two different approaches to solve this
problem.

Dichotomy search
Once ℓ is known, it is possible to test, for any number σ in the interval from 0 to
t, whether s ≤ σ, by checking if X_σ = X_{σ+ℓ}. Indeed, if the condition holds, X_σ
is in the cycle and σ ≥ s; otherwise, σ < s. This test can be performed by
computing the sequence X up to X_{σ+ℓ}, which requires O(s + ℓ) steps. Thus we
can find s using a dichotomy search, with complexity O((s + ℓ) log(s + ℓ)). It
is interesting to remark that this approach has the same runtime complexity
as a generic birthday method that relies on a fast sort algorithm.

Direct search
The main problem with the above method is that we need to compute sev-
eral overlapping parts of the same sequence, thus recomputing the same infor-
mation many times. To avoid this drawback, we can proceed differently. First
compute X_ℓ. Then, from X_0 and X_ℓ, compute in parallel the two sequences X_i
and X_{ℓ+i}, testing for equality at each step. When the two sequences collide,

we have s = i. This allows us to recover s and obtain a true collision¹ in time
O(s + ℓ). This is described as Algorithm 7.3.

Algorithm 7.3 Algorithm for recovering a cycle's start
Require: Start-point X0, cycle length ℓ, max. iterations M
Let x ← X0
Let y ← x
for i from 1 to ℓ do
Let y ← F(y)
end for
if x = y then
Output 'Cyclic sequence, no real collision'
end if
for i from 1 to M do
Let x′ ← F(x)
Let y′ ← F(y)
if x′ = y′ then
Output 'Collision between x and y'
end if
Let x ← x′
Let y ← y′
end for
Output Failed
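The whole chain (find the period ℓ, then walk X_i and X_{i+ℓ} in parallel) can be sketched as follows; the toy F is our own choice, and the memory-hungry dictionary-based period finder merely stands in for Floyd's or Brent's algorithm:

```python
def F(x):
    return (x * x + 3) % 2003   # arbitrary toy function on a finite set

def period(x0):
    """Baseline period finder using a dictionary (large memory cost)."""
    seen, x, i = {}, x0, 0
    while x not in seen:
        seen[x] = i
        x = F(x)
        i += 1
    return i - seen[x]

def cycle_start(x0, ell):
    """Run X_i and X_{i+ell} in parallel; they first meet at i = s."""
    x, y = x0, x0
    for _ in range(ell):
        y = F(y)                # y = X_ell
    if x == y:
        return None             # purely cyclic sequence: s = 0, no real collision
    while True:
        nx, ny = F(x), F(y)
        if nx == ny:
            return x, y         # distinct preimages of X_s: a true collision in F
        x, y = nx, ny

ell = period(0)
x, y = cycle_start(0, ell)      # s > 0 for this particular F and start-point
print(x != y, F(x) == F(y))     # → True True
```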

As a consequence, we see that finding collisions in recursively defined se-
quences of this type is much more practical than finding collisions in lists of
random objects.

7.1.4 Value-dependent cycle finding
As explained above, both Floyd's and Brent's algorithms detect a cycle in
a sequence X, while using X only in a restrictive way, either through F or in
equality tests on the values taken by the sequence. We know that this addi-
tional property is extremely useful in some cases. However, in other cases, the
sequence X is directly available. Thus, it is natural to ask whether cycle find-
ing algorithms can be improved using more value-dependent computations.
The surprising answer is that it is indeed possible. A first step in this di-
rection can be achieved using a distinguished point method, as proposed for
1 Unless s = 0.


example in [QD90]. This consists of memorizing values that satisfy a specific
property, called distinguished points, and waiting until they come back. The
main difficulty here is to choose the right probability of occurrence for distin-
guished points. If they are too frequent, then we need to store a huge number
of values and the algorithm is inefficient. If they are too rare, then the cycle
may not contain any distinguished point at all and the technique fails. We
discuss the use of distinguished points in more detail in Section 7.5.2.
A stronger way of using the values taken by the sequence in a cycle detection
algorithm was proposed, in 2004, by Nivasch in [Niv04]. The main advantage
of his algorithm is that it is guaranteed to stop somewhere between Xs+ℓ
and Xs+2ℓ−1, i.e., during the second iteration of the sequence cycle. This is
particularly useful for sequences where ℓ is small compared to s. The key idea
is to maintain a (hopefully) short stack of already encountered values in a way
that guarantees cycle detection. To build this stack, Nivasch's algorithm relies
on the existence of a total ordering on the set of sequence values and focuses
on keeping small values taken by the sequence on the stack. The ultimate goal
is to make sure that the smallest value occurring in the cycle is always stored
on the stack and is always present for detection during the second iteration
around the cycle. In order to achieve that, at any point in time, the stack
memorizes the minimum value, say m1, encountered up to that time while
computing X, followed by the minimum value (say m2) encountered after m1
in the computation, followed by the minimum value (say m3) encountered
after m2, . . . Clearly, the minimum value of the cycle is one of those, say
mk. Indeed, any value of the sequence X smaller than mk is out of the cycle
and thus part of the pre-cycle. Since mk occurs after these possibly smaller
elements, by definition of our stack, it gets to be stored on the stack and
remains there afterward.
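To make the stack invariant concrete, here is a small Python helper (our own illustration, not part of [Niv04]) that computes the stack of successive minima of a list of values:

```python
def minima_stack(values):
    """Stack of successive minima: m1 is the minimum of everything
    seen so far, m2 the minimum of what comes after m1, and so on.
    Equivalently: pop every stored value larger than the new one,
    then push the new value, so the stack stays strictly increasing."""
    stack = []
    for v in values:
        while stack and stack[-1] > v:
            stack.pop()
        stack.append(v)
    return stack
```

For instance, minima_stack([5, 3, 8, 6, 7, 2, 9, 4]) returns [2, 4]: 2 is the overall minimum and 4 is the minimum among the values that follow it.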
From an algorithmic point of view, building such a stack is an easy matter.
Whenever we compute a new value Xi of the sequence X, we remove all
values larger than Xi from the stack and add Xi to the stack. Moreover, since
Xi is added at the end of the stack, it is easy to see that the stack is always
sorted in increasing order. Thus, erasing larger values can be done very easily
by using a dichotomy search on the pre-existing stack and by truncating the
stack after the point where Xi needs to be inserted. It is shown in [Niv04] that
the expected size of the stack at time i is log i and that the size of the stack
remains below O(log i) almost surely. Thus, except in extremely rare cases,
the memory requirements of Nivasch's algorithm are small. When considering
the time complexity of this algorithm, we need to consider two different kinds
of operations: evaluations of the function F and other operations. Indeed,
in almost all cases, the computation of values of F dominates the running
time. Clearly, since X is computed once for all considered positions, up to at
most position s + 2ℓ − 1, this bounds the number of evaluations of F. The
dichotomy search within the stack dominates the other operations, and the
cost of each search is logarithmic in the size of the stack. Thus, the time
complexity of the other operations is bounded by O((s + 2ℓ) log log(s + 2ℓ)). At


first, this complexity may appear worse than the complexity of the evaluation
of F. However, remember that F operates on numbers of at least O(log(s + 2ℓ))
bits, otherwise the cycle would be shorter. Thus, each evaluation of F costs at
least O(log(s + 2ℓ)) and, when accounting for this unit cost, evaluations of F
dominate the complexity. Thus, where applicable, Nivasch's algorithm gives
a nice improvement of cycle finding algorithms. We give a full description of
it in Algorithm 7.4.

Algorithm 7.4 Nivasch's cycle detection algorithm
Require: Input initial sequence value X0, max. iterations M
Create empty stack S
Let x ←− X0
Add pair (x, 0) to the stack
for i from 1 to M do
  Let x ←− F(x)
  Search x by dichotomy among the (sorted) first components of S
  if a pair (x, j) is found then
    Output 'Collision between positions i and j' and exit
  end if
  The search locates Sk such that Sk < x < Sk+1
  Truncate S after Sk (possibly leaving S empty)
  Add pair (x, i) at the end of S
end for
Output 'Failed'
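A compact Python rendering of Algorithm 7.4 may help; the dichotomy search uses the standard bisect module, and the function name and max_iter bound are our own choices:

```python
import bisect

def nivasch(F, x0, max_iter=1 << 20):
    """Nivasch-style cycle detection: keep a stack of (value, index)
    pairs sorted by increasing value.  When the current value equals a
    stored one, the difference of the two indices is exactly the cycle
    length."""
    stack = [(x0, 0)]
    x = x0
    for i in range(1, max_iter + 1):
        x = F(x)
        # dichotomy search on the first components of the sorted stack
        k = bisect.bisect_left(stack, (x,))
        if k < len(stack) and stack[k][0] == x:
            return i - stack[k][1]  # collision between positions i and j
        del stack[k:]        # truncate: drop all stored values larger than x
        stack.append((x, i))
    return None  # failed within max_iter iterations
```

On the toy sequence F(x) = x + 1 for x < 20 and F(x) = 5 otherwise, which has tail length s = 5 and cycle length ℓ = 16, nivasch(F, 0) returns 16, stopping at step 21 = s + ℓ, inside the guaranteed window [s + ℓ, s + 2ℓ − 1].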

Another advantage of Nivasch's algorithm compared to Floyd's or Brent's
algorithms is that it is guaranteed to directly output the cycle length ℓ, rather
than a multiple thereof. Thus, for applications where the exact cycle length
and the cycle's start are needed, we can obtain an additional gain in function
evaluations. Nivasch

