
additive representation of G can easily be derived. If, furthermore, an isomorphism from G to its additive representation can be computed efficiently enough, we are clearly done. Similarly, when G can be expressed as a direct product G1 × G2 with the right distribution of sizes, we can directly use the first approach we used for groups in additive representation, even when G1 and G2 are given in arbitrary form. Note that when G has a small enough subgroup (smaller than the square root of the size of G), then it is possible to tabulate the discrete logarithms in this subgroup in order to use an additive representation for this part.
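As a toy illustration, the tabulation step might be sketched as follows; the modulus p = 31 and the order-5 subgroup are illustrative small values, not cryptographic sizes.

```python
# Toy parameters: p = 31 is prime and p - 1 = 2 * 3 * 5, so the
# multiplicative group mod p has a subgroup of order 5, small enough
# to tabulate.
p = 31
m = 5                        # order of the small subgroup
h = pow(2, (p - 1) // m, p)  # an element of the order-5 subgroup (h = 2)

# Tabulate discrete logarithms in <h>: a single pass over the subgroup
# stores log_h(y) for every subgroup element y.
dlog_table = {}
y = 1
for i in range(m):
    dlog_table[y] = i
    y = (y * h) % p

# A subgroup element is moved to additive representation by one lookup.
x = pow(h, 3, p)
assert dlog_table[x] == 3
```

With a subgroup smaller than the square root of |G|, this table costs less memory than the birthday attack itself, which is why the tabulation is affordable.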

Nevertheless, in our context, there remain a few bad group presentations which prevent us from using a birthday approach with reduced memory requirement. A typical example is the group G of an elliptic curve over a finite field (see Chapter 14). If the order of the group is prime (or even a product of two primes of comparable sizes), then there is no known algorithm that solves the problem of four lists with a memory requirement lower than the generic birthday-based attack. Another natural example to consider is the case of multiplicative subgroups in finite fields. Of course, if we consider the full multiplicative group and look for naturally occurring random solutions, the order of the group cannot be too large and we are thus in the case where discrete logarithms can be computed efficiently. Indeed, with a group of order N, the subexponential complexity of discrete logarithm computation, even repeated N^{1/4} times to deal with all the elements in the four lists, is negligible compared to the birthday running time of the algorithm, N^{1/2}. On the other hand, with a moderately large subgroup in a large finite field, computing N^{1/4} discrete logarithms with Pollard's Rho is not an option. Similarly, if we consider small subsets of the full multiplicative group and know that a solution has somehow been prearranged, we do not know how to apply the four-list approach. We illustrate this in Section 8.4.2, where we study possible cryptanalysis of plain RSA and plain ElGamal encryptions.

8.2.3 Working with more lists

Clearly, Equation (8.8) can easily be generalized to a different number of lists. With two lists, the algorithmic issue is simple, since the ordinary birthday-based approach is almost optimal. With three lists of roughly equal sizes N^{1/3} in a group of N elements, the best known algorithm is a simple adaptation of the basic birthday approach. Sort the first list L1 and then, for all pairs (g2, g3) of elements in L2 × L3, search for (g2 g3)^{-1} in L1. The required memory is the optimal value N^{1/3}, but the running time is O(N^{2/3} log N).
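The three-list adaptation can be sketched as follows in a toy multiplicative group; the prime and list sizes are illustrative, and a solution is planted so that the search is guaranteed to succeed.

```python
import bisect
import random

# Toy sketch of the three-list algorithm in the multiplicative group
# mod a small prime p; real instances use much larger parameters.
p = 101  # the group Z_p^* has N = 100 elements

def three_list_match(L1, L2, L3, p):
    """Sort L1, then for each (g2, g3) search for (g2*g3)^(-1) in L1."""
    S1 = sorted(L1)
    for g2 in L2:
        for g3 in L3:
            target = pow(g2 * g3 % p, p - 2, p)  # inverse via Fermat
            i = bisect.bisect_left(S1, target)
            if i < len(S1) and S1[i] == target:
                return target, g2, g3
    return None

random.seed(1)
L2 = [random.randrange(1, p) for _ in range(5)]
L3 = [random.randrange(1, p) for _ in range(5)]
# plant a solution so that a matching triple always exists
L1 = [random.randrange(1, p) for _ in range(4)] \
     + [pow(L2[0] * L3[0] % p, p - 2, p)]

g1, g2, g3 = three_list_match(L1, L2, L3, p)
assert g1 * g2 * g3 % p == 1
```

The memory holds only the sorted copy of L1, matching the N^{1/3} bound, while the double loop over L2 × L3 accounts for the N^{2/3} log N running time.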

With more than four lists, the best known algorithm is to transform the problem into a four-list version and proceed. Assuming t lists of roughly equal sizes, we first group the lists into packets of approximately t/4 lists. When t is a multiple of four, we group the first t/4 lists and put all the corresponding sums (or group compositions) into the first list L1 of the target four-list problem. We group the next packets of t/4 lists into L2, L3 and L4. Clearly,

Ā© 2009 by Taylor and Francis Group, LLC

Birthday attacks through quadrisection 263

any solution of the transformed four-list problem is a solution of the original problem with t lists. When t is not a multiple of four, we group the lists in order to balance their sizes as far as possible. However, in that case, the resulting time-memory tradeoff is not as good.
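The regrouping step can be sketched as follows for XOR sums; the eight toy lists and their contents are illustrative.

```python
# Sketch of the regrouping step for t = 8 lists under XOR: merge each
# packet of t/4 = 2 lists into one list of all pairwise sums, yielding
# a four-list instance with the same solutions.
def regroup(lists):
    assert len(lists) % 4 == 0
    k = len(lists) // 4
    packets = [lists[i * k:(i + 1) * k] for i in range(4)]
    merged = []
    for packet in packets:
        sums = [0]
        for L in packet:
            sums = [s ^ x for s in sums for x in L]  # all XOR combinations
        merged.append(sums)
    return merged

lists8 = [[1, 2], [3, 4], [5, 6], [7, 8],
          [9, 10], [11, 12], [13, 14], [15, 0]]
L1, L2, L3, L4 = regroup(lists8)
assert len(L1) == 4  # 2 * 2 combinations per packet
```

Any element of a merged list records a sum over its packet, so a solution of the four-list instance unpacks into a solution with the original t lists.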

8.3 Extensions of the technique

8.3.1 Multiple targets

A natural generalization is to consider the case where, instead of a single acceptable target, there are many. Clearly, if we are given a list L of possible targets, a very easy solution is to move the target taken from the list L from the right-hand side to the left-hand side and obtain a new instance of the original problem with a single target. This instance has one more list, but we already know how to address this case.

However, if the set of targets has some extra structure and can be described more efficiently than by a simple list, it is possible to obtain better approaches. To illustrate this, we start with a simple example. Assume that we are considering a four-list problem with lists L1, L2, L3 and L4 of elements in a group G1 × G2 and that the list of allowed targets is a direct product L' = L'1 × L'2. In other words, a pair (g1, g2) is a valid target if and only if g1 ∈ L'1 and g2 ∈ L'2. Moreover, assume that G1 and G2 have similar sizes, of the order of 2^n, that each of the lists L1 to L4 has size 2^{n/4} and that each of the lists L'1 and L'2 has size 2^{n/2}. With these parameters, the expected number of solutions is 1.

With this specific instance, a first approach is to rewrite the problem as a problem of finding a fixed target within a sum of six lists. In this approach, we view each element g1 of L'1 as a representative for (g1, 0) in G1 × G2 and each element g2 of L'2 as a representative for (0, g2). If, in addition, we construct the two sums of lists L1 + L2 and L3 + L4, we can directly use the ordinary case of four sets, grouping L1 + L2 and L'1 on one side, L3 + L4 and L'2 on the other. We obtain a running time of 2^n and a memory requirement of 2^{n/2}. However, this approach does not take advantage of the extra structure within L', or equivalently of the fact that L'1 only contains elements of G1 and L'2 elements of G2.

In order to improve this case, we now consider a different approach. Here, we first focus on the partial solutions, with a correct value in G1 but not necessarily in G2. Clearly, L'2 is not needed to construct these partial solutions, since any element of L'2 has a zero contribution to the G1 component of sums. The expected number of partial solutions is 2^{n/2}; moreover, using the basic algorithm with one group containing L1, L'1 and the other group containing L2 + L3, L4, we can construct all these solutions in time 2^{3n/4}. In our precise


264 Algorithmic Cryptanalysis

case, the memory requirement is 2^{n/2}, due to the need to store L'1. Once a partial solution is constructed, it is easy to evaluate the G2 component of the corresponding sum. Then, to check whether the partial solution is also correct on G2, it suffices to test whether the opposite of the G2 component belongs to L'2, using a binary search. As a consequence, with this specific set of parameters, using the idea of partial solutions reduces the running time, on a group G1 × G2 of size 2^{2n}, from 2^n down to 2^{3n/4}.

8.3.2 Wagner's extension

Up to now, we have concerned ourselves with finding all solutions to Equation (8.8), usually in the case where a small number of solutions exist. David Wagner [Wag02] asks a slightly different question and looks for a single solution to this equation, especially in cases where a large number of solutions is expected. The surprising conclusion, in this case, is that the algorithms can be adapted to yield a completely different and much more efficient time/memory tradeoff. A precursor of this idea was already present in [CP91].

Let us start by considering the problem of XORing four bitstrings in this new context. We are given four lists L1, L2, L3 and L4, each containing 2^{βn} n-bit numbers for some parameter β ≥ 1/4. On average, we expect 2^{(4β−1)n} solutions. The key idea of Wagner is to restrict the search and only look for solutions (x1, x2, x3, x4) where the restriction of x1 ⊕ x2 (and of x3 ⊕ x4) to a fraction τn of the bits is zero. The expected number of pairs (x1, x2) such that x1 ⊕ x2 is zero on these τn bits is 2^{(2β−τ)n}. From this, we deduce that the expected number of solutions to Equation (8.8) that satisfy the additional condition is 2^{(4β−τ−1)n}. A natural choice of parameters is τ = β = 1/3. In that case, we expect to find one solution (among many) to the four-list problem, using time and memory O(n · 2^{n/3}). From a running time point of view, this is more efficient than the usual O(n · 2^{n/2}) of birthday-based attacks. On the other hand, it only works when much larger lists of numbers are available.
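The restricted four-list search can be sketched as follows; the bit length, the fraction τn and the (slightly enlarged) list sizes are toy values, and since the lists are random a solution is only expected, not guaranteed.

```python
import random
from collections import defaultdict

# Sketch of Wagner's four-list XOR step: keep only pairs x1 ^ x2 (and
# x3 ^ x4) that are zero on the low tau_bits bits, then collide the two
# surviving lists on the remaining bits.
def wagner_four_lists(L1, L2, L3, L4, tau_bits):
    mask = (1 << tau_bits) - 1

    def join(A, B):
        # pairs whose XOR vanishes on the low tau_bits bits
        buckets = defaultdict(list)
        for a in A:
            buckets[a & mask].append(a)
        return [(a ^ b, a, b) for b in B for a in buckets[b & mask]]

    left = join(L1, L2)
    right = join(L3, L4)
    table = {v: (a, b) for v, a, b in left}
    for v, c, d in right:
        if v in table:
            a, b = table[v]
            return a, b, c, d
    return None

random.seed(7)
n, tau_bits = 24, 8
size = 1 << (tau_bits + 1)   # slightly enlarged lists to make success likely
lists = [[random.getrandbits(n) for _ in range(size)] for _ in range(4)]
sol = wagner_four_lists(*lists, tau_bits)
if sol is not None:          # a solution is expected, not guaranteed
    x1, x2, x3, x4 = sol
    assert x1 ^ x2 ^ x3 ^ x4 == 0
```

The bucketing by the low bits plays the role of the sort in the basic birthday approach, so each join costs roughly the size of the lists rather than the number of all pairs.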

Thus, we already see with four lists that asking for a single solution among a large set leads to a better time/memory tradeoff. However, the real power of this technique appears when we consider a larger number of lists. To simplify the exposition of the method, let us assume that we are given 2^t lists of 2^{αn} numbers on n bits. The simplification comes from the fact that when the number of lists is a power of 2, we can more easily organize the computation, since we can group lists by packets of 2, 4, . . . Within these lists, we want to find elements x1, x2, . . . , x_{2^t}, with each xi in the corresponding list Li, whose sum is zero (using a XOR sum). We proceed by using a succession of steps.

In each step, we regroup lists two by two, thus halving the number of lists. Moreover, at each step, we cancel a fraction of the high bits. To balance the computation, it is best to preserve the size of the lists considered at each step, except in the final one, where we are happy to obtain a single solution. Since the original lists contain 2^{αn} elements, it is possible to cancel αn bits at each step while preserving the sizes of the lists. Indeed, combining two lists


together yields 2^{2αn} possibilities, and forcing αn bits to be zeros reduces this number by a factor of 2^{αn}. At the final step, we use the birthday paradox and expect a collision on 2αn bits to occur for lists of 2^{αn} elements. With 2^t lists, we perform t steps and we can work with α ≈ 1/(t + 1). Both from a time and memory point of view, the complexity of this approach is 2^{t+αn}. If we are free to optimize this complexity by choosing t, then using t ≈ √n yields a subexponential algorithm with complexity 2^{2√n}.
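The succession of steps can be sketched as the following tree computation over XOR; the number of lists, their sizes and the chunk of bits canceled per step are toy values, and a solution is only expected on average.

```python
import random
from collections import defaultdict

# Sketch of the tree algorithm for 2^t lists of n-bit values under XOR:
# at each step, merge lists two by two, keeping only combinations whose
# next chunk of low bits cancels.
def ktree_xor(lists, chunk):
    level = [[(x, (x,)) for x in L] for L in lists]  # (value, chosen elements)
    shift = 0
    while len(level) > 2:
        mask = ((1 << chunk) - 1) << shift
        nxt = []
        for A, B in zip(level[::2], level[1::2]):
            buckets = defaultdict(list)
            for v, e in A:
                buckets[v & mask].append((v, e))
            merged = [(va ^ vb, ea + eb)
                      for vb, eb in B for va, ea in buckets[vb & mask]]
            nxt.append(merged)
        level = nxt
        shift += chunk
    # final step: plain collision search on all remaining bits
    A, B = level
    table = {v: e for v, e in A}
    for v, e in B:
        if v in table:
            return table[v] + e
    return None

random.seed(3)
t, chunk, n = 3, 5, 20       # 8 lists of 20-bit values, cancel 5 bits per step
size = 1 << (chunk + 1)      # slightly enlarged lists to make success likely
lists = [[random.getrandbits(n) for _ in range(size)] for _ in range(1 << t)]
sol = ktree_xor(lists, chunk)
if sol is not None:
    xor = 0
    for x in sol:
        xor ^= x
    assert xor == 0
```

Because each merged entry carries the tuple of original elements it came from, the final collision directly yields one element per starting list.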

With this extension, using bitstrings is a very comfortable option: indeed, once a fraction of the bits has been set to zero in all groups at some step, this fraction clearly remains zero through to the end of the algorithm. As already noticed in [Wag02], this can be generalized to many other groups. However, when generalizing Wagner's extension to different groups, we need to take care to avoid the resurgence of non-zero values in unwanted places. Over a product of small groups, things are also well-behaved: it suffices to cancel some of the small groups at each step to be sure that they do not reappear. For integers modulo 2^n, the best option is to cancel low-order bits at each step, since carries only propagate from right to left. Over the integers, this is also the best approach. Note that, in that case, we encounter a new problem: the sum of 2^t numbers on n bits is a number on n + t bits. Luckily, t is small compared to n and this problem is minor. As a consequence, we see that Wagner's extension has essentially the same spectrum of applicability as the original reduced memory birthday paradox attacks.

8.3.3 Related open problems

When searching for collisions between two lists, the available algorithms are essentially optimal. Indeed, it is clear that the running time is bounded from below by the time required to read the input lists. With lists formed of 2^{n/2} elements, we get a lower bound of 2^{n/2} operations, to compare with the running time O(n 2^{n/2}) of a fast sort algorithm. On the contrary, with the algorithms presented in this chapter, we have no such guarantee. As a consequence, it is worth pondering whether these algorithms can be improved. As a special case, it is interesting to consider three lists L1, L2 and L3 of respective sizes N1, N2 and N3. How much does it cost to find a solution of the equation:

x1 ⊕ x2 ⊕ x3 = 0, with x1 ∈ L1, x2 ∈ L2 and x3 ∈ L3. (8.15)

Clearly, the running time should be expressed as a function of N1 N2 N3. Indeed, with random lists of n-bit numbers, we expect a solution when N1 N2 N3 = 2^n, independently of the individual values of N1, N2 and N3. With some specific choices for the sizes of the lists, we can achieve a complexity of O(n 2^{n/2}). For example, when N3 = 1 and N1 = N2 = 2^{n/2}, a plain collision search approach suffices. Similarly, when N1 N2 = 2^{n/2} and N3 = 2^{n/2}, the same approach works. A much more problematic case is N1 = N2 = N3 = 2^{n/3}. In this case, we can sort one list, say L3, and then for each pair (x1, x2) ∈ L1 × L2


search for x1 ⊕ x2 in L3. However, this solution yields a running time larger than 2^{2n/3}. To reduce the running time, we can follow the approach of Wagner and enlarge the lists L1, L2 and L3. For lists of size 2^{αn} with 1/3 ≤ α ≤ 1/2, this leads to a running time of 2^{(1−α)n}. However, the best we can do is 2^{n/2} with N1 = N2 = N3 = 2^{n/2}, which is worse than the basic birthday algorithm.
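The sort-and-scan baseline for three balanced lists can be sketched as follows; the bit length and list sizes are toy values, and a solution is only expected on average.

```python
import random

# Sketch of the quadratic-time three-list XOR search from the text:
# store one list for membership tests, then scan all pairs from the
# other two (a sorted list with binary search works equally well).
def three_list_xor(L1, L2, L3):
    S3 = set(L3)
    for x1 in L1:
        for x2 in L2:
            if x1 ^ x2 in S3:
                return x1, x2, x1 ^ x2
    return None

random.seed(5)
n = 18
size = 1 << 6                     # lists of 2^(n/3) elements
L1 = [random.getrandbits(n) for _ in range(size)]
L2 = [random.getrandbits(n) for _ in range(size)]
L3 = [random.getrandbits(n) for _ in range(size)]
sol = three_list_xor(L1, L2, L3)  # one solution expected on average
if sol is not None:
    x1, x2, x3 = sol
    assert x1 ^ x2 ^ x3 == 0
```

The double loop makes the 2^{2n/3} running time of this baseline explicit, which is exactly what the open problem asks to beat.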

This specific example emphasizes two more general open problems with generalized birthday algorithms. The first open problem is to devise a new algorithm that searches for a unique solution with a better tradeoff between time and memory. Typically, the current best algorithms have time 2^{n/2} and memory 2^{n/4}, so we could afford to use some extra memory in order to go faster. The second open problem concerns Wagner's algorithm to search for one solution among many. In its present form, this algorithm works extremely well when the number of lists is a power of two. However, when this is not the case, its behavior is obtained by rounding the number of lists down to a power of two. In the worst case, when the number of lists is a power of two minus one, this is highly unsatisfactory.

When considering the XOR operation with a large number of lists, it was

noted in [BM97] that Gaussian elimination can be used to solve the problem.

8.3.3.1 An incremental improvement

Looking again at the specific example of three lists, it is possible to slightly improve the basic birthday algorithm when using a specific configuration of the sizes of the lists. Instead of working with |L1| = |L2| = 2^{n/2} and |L3| = 1, we now use |L1| = |L2| = 2^{n/2}/r and |L3| = n/2, with r = √(n/2). With this choice, it is possible to rearrange the computations in order to scan the lists L1 and L2 only once, simultaneously testing each candidate against the list

algebra on the elements of the lists. Viewing each element of the three lists

as a vector in Fn , it is clear that a solution x1 ā• x2 ā• x3 = 0 also satisļ¬es

2

M x1 ā• M x2 ā• M x3 = 0, for any n Ć— n Boolean matrix. Moreover, if M is

invertible, any solution of the second equation is a solution to the original

problem. As a consequence, for any invertible matrix M , we may transform

the three lists by applying M to all their elements, without changing the set

of solutions.

Now, since L3 contains n/2 vectors, it spans a linear subspace of F_2^n of dimension at most n/2. We can easily choose a basis (b1, b2, . . . , bn) of F_2^n such that each vector in L3 belongs to the subspace generated by (b1, b2, . . . , b_{n/2}). We now choose for M the matrix that maps each vector of F_2^n to its coordinates in the basis (b1, b2, . . . , bn). Clearly, each element x3 of L3 is such that M x3 has n/2 trailing zeros, in positions n/2 + 1 to n. Moreover, L3 usually spans a subspace of rank exactly n/2. In that case, b1, b2, . . . , b_{n/2} can simply be chosen as the elements of L3, and then any vector M x3 is zero everywhere except in a single position between 1 and n/2. Once M is chosen, we can use it to transform L1 and L2. After this transformation, it suffices to search for


pairs of elements (x1, x2) such that M x1 and M x2 collide on the last n/2 bits in the new representation. As usual, we expect that a fraction 2^{−n/2} of all pairs satisfies this property. For each such candidate pair, it then suffices to check that the corresponding sum belongs to L3. Since there are 2^{n/2+1}/n such pairs and since it suffices to check that the sum contains a single one bit, this can be done efficiently. All in all, we gain a factor r = √(n/2) compared to the basic birthday algorithm.

8.4 Some direct applications

8.4.1 Noisy Chinese remainder reconstruction

The Chinese remainder theorem, as recalled in Chapter 2, makes it possible to reconstruct an integer x in a given interval [Bl, Bh] from a list of modular values x mod pi, as long as the values pi are mutually coprime and their product is at least Bh − Bl + 1.
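As a reminder of the basic reconstruction, a minimal sketch follows; the moduli and the secret value are illustrative.

```python
from math import prod

# Plain Chinese remainder reconstruction: recover x from its residues
# modulo pairwise coprime moduli whose product exceeds the interval.
def crt(residues, moduli):
    N = prod(moduli)
    x = 0
    for r, p in zip(residues, moduli):
        Ni = N // p
        x += r * Ni * pow(Ni, -1, p)  # pow(a, -1, p): modular inverse
    return x % N

moduli = [5, 7, 9, 11]                 # pairwise coprime, product 3465
secret = 1234
residues = [secret % p for p in moduli]
assert crt(residues, moduli) == secret  # unique since 1234 < 3465
```

The noisy variant below starts from the same setting but allows several candidate residues per modulus.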

The notion of "noisy" Chinese remainder reconstruction covers several possible extensions of this basic problem. One simple extension is presented as Exercise 1. In this section, we present a different extension, which arises from the problem of counting points on elliptic curves. This problem, together with a birthday approach with reduced memory to solve it, was first presented in [JL01].

The goal of the problem is to recover a unique integer x in a range [Bl, Bh] from modular values modulo coprime numbers. The only difference with the basic Chinese remainder theorem is that, instead of having a single candidate for each modular value, we have a set of candidates. Some of these sets may still contain a single element. Note that, by using a preliminary application of the ordinary Chinese remainder theorem, it is possible to regroup all the moduli with a single candidate into their product. Thus, without loss of generality, we assume that we have a list of coprime numbers p0, p1, . . . , pk together with sets Si of allowed modular values modulo pi. The set S0 contains a single element; all the other sets contain at least 2. If for some reason none of the moduli corresponds to a unique modular value, we simply set p0 = 1 (and S0 = {0}).

To solve this problem in the context of point counting algorithms, the original method was to use Atkin's match and sort algorithm. With this method, we use the fact that the elliptic curve whose cardinality is desired can be used to test each candidate and check its validity. This algorithm is a birthday paradox based algorithm of the type discussed in Chapter 6. We now recall a brief description of the algorithm, since this is useful to present the noisy Chinese remainder variation. The first step of the match and sort algorithm is, excluding p0, to sort the coprime moduli by increasing values of |Si|/pi.


After doing this, we consider the smallest possible value of ℓ such that the product of all coprime values from p0 up to pℓ is larger than the length of the interval, Bh − Bl + 1. Combining all the allowed modular values for each pi, we obtain N = ∏_{i=1}^{ℓ} |Si| candidates. Note that N is almost minimal but, depending on the exact values of the moduli pi and of the cardinalities |Si|, it can in some cases be improved, especially when there is a gap between the length of the interval and the product of the moduli.

After choosing a set of moduli, it is possible to obtain the cardinality of the elliptic curve by brute force. It suffices, given an element Q in the additive group of the elliptic curve, to check, for each of the N candidates ci, whether ci Q is the neutral element in the group. Atkin's match and sort refines this

into a birthday-based attack, by splitting the product of moduli into two parts P and P'. To each subproduct, we associate a set of allowed values, respectively S and S'. The decomposition of the product is chosen to balance the sizes of these two sets as much as possible. At this point, it is useful to note that by shifting everything by a well-chosen constant, we can replace the interval [Bl, Bh] by another interval centered around 0, say [−B, B], with B ≤ P P'/2. Once this is done, we can make sure that each candidate value s in S is replaced by a new value s̃ such that: s̃ = s (mod P), s̃ = 0 (mod P') and s̃ ∈ ]−P P'/2, P P'/2]. Similarly, each s' in S' is replaced by a new value s̃' such that: s̃' = s' (mod P'), s̃' = 0 (mod P) and s̃' ∈ [−P P'/2, P P'/2[. Making these replacements, we can rewrite each of the N possible candidates as s̃ + s̃' + λP P' for s̃ and s̃' in the modified sets and λ ∈ {−1, 0, 1}. Finally, it suffices to search for collisions between the sets s̃Q and −(s̃' + λP P')Q on the elliptic curve.

The approach presented in [JL01] no longer makes use of the group of the elliptic curve. Instead, it relies on the fact that the modular information that is obtained at the beginning of point counting algorithms is redundant, and tries to make use of this redundancy to recover the cardinality. Like Atkin's match and sort, it uses a centered interval [−B, B] and it starts by determining a product of moduli larger than the interval length that generates the smallest number of candidates. Instead of splitting this product in two, we now split it into four factors P1, P2, P3 and P4, chosen to balance the sizes of the lists of allowed modular values for each Pi. To each factor Pi, we associate a set Si that contains each of the candidate values modulo Pi. The values stored in Si are normalized as follows: for each si in Si, we make sure that si = 0 (mod Pj) for j ≠ i and that si ∈ [−P1 P2 P3 P4/2, P1 P2 P3 P4/2]. As a consequence, each of the N candidates modulo P1 P2 P3 P4 can be written as s1 + s2 + s3 + s4 + λ · P1 P2 P3 P4, with each si in the corresponding set Si and λ ∈ {−2, −1, 0, 1, 2}.

Let q1, . . . , qℓ be the remaining moduli, i.e., the values among the initial p1, . . . , pk which do not appear in the product P1 P2 P3 P4. To each element si in one of the sets S1, S2 or S3, we can associate a vector V(si) with ℓ coordinates, where the j-th coordinate is obtained by considering si modulo qj. Similarly, assuming that we have guessed the value of λ, we can


construct V(s4 + λ · P1 P2 P3 P4) for each s4 in S4. It now remains to solve a sum-of-four problem with multiple targets, as in Section 8.3.1. More precisely, we are looking for (s1, s2, s3, s4) such that the sum

V(s1) + V(s2) + V(s3) + V(s4 + λ · P1 P2 P3 P4) (8.16)

has an acceptable value modulo each modulus qi. Thus, the set of allowed targets is structured as a direct product. We refer the reader to [JL01] for more details.

8.4.2 Plain RSA and plain ElGamal encryptions

In this section, we describe some birthday-based attacks from [BJN00] against some variations of RSA and ElGamal encryption which do not follow the state-of-the-art recommendations for secure encryption schemes. We call these variations plain RSA and plain ElGamal. It should be clear from the coming discussion that these simplified cryptosystems are insecure and should not be used. We start by defining the simplified systems that we are going to consider.

Plain RSA. Given an RSA public key (N, e), we use this key to directly encrypt random session keys for a block cipher. Let K be such a random key; the corresponding encrypted key simply is K^e (mod N).

Plain ElGamal. Given a large prime p, an element g of order q modulo p, with q much smaller than p, more precisely with (p − 1)/q ≥ 2^m, and an ElGamal public key y = g^x (where x is the corresponding secret key), we encrypt a randomly chosen session key K of at most m bits by choosing a random number r and forming the ciphertext (g^r, K × y^r).

When encrypting a random key K with either plain RSA or plain ElGamal, we do not check for any specific properties of this key. In particular, it is conceivable that K, viewed as an integer, is a product of two integers K1 and K2 of roughly equal size. The birthday paradox based attacks we present here work in this specific case. As a consequence, they are probabilistic attacks which only succeed when a bad key is chosen. We give some data about the probability of this bad event occurring at the end of the section and show that, both asymptotically and in practice, this approach works for a non-negligible fraction of the keys.

8.4.3 Birthday attack on plain RSA

With plain RSA, if the key K can be written as K = K1 K2, the multiplicativity of RSA implies that:

K^e = K1^e K2^e (mod N). (8.17)


Given the encrypted key K^e, it is possible to obtain K1 and K2 from a list L of encrypted small keys (k^e, k) sorted by increasing order of the first component. Indeed, it suffices, for each small key K1 in L, to search for K^e/K1^e in the first component of L. If the search is successful, we clearly obtain an equality

K^e/K1^e = K2^e (mod N), (8.18)

for two small keys, from which we deduce that K = K1 K2.
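On toy parameters, the meet-in-the-middle search might look as follows; the modulus, exponent and key bound are illustrative, far below cryptographic sizes.

```python
from math import gcd

# Toy demonstration of the birthday attack on plain RSA: recover a
# session key that splits as a product of two small factors.
N, e = 11 * 17, 3            # tiny RSA parameters (phi = 160, gcd(e, phi) = 1)
K = 6 * 7                    # a "bad" session key with two small factors
c = pow(K, e, N)             # the encrypted key K^e mod N

B = 16                       # bound on the small factors (2^(n/2) in general)
# table of encrypted small keys (k^e mod N) -> k
table = {pow(k, e, N): k for k in range(1, B + 1)}

recovered = None
for k1 in range(1, B + 1):
    if gcd(k1, N) != 1:
        continue             # such a k1 would even reveal a factor of N
    # search for c / k1^e among the encrypted small keys
    quotient = c * pow(pow(k1, e, N), -1, N) % N
    if quotient in table:
        recovered = k1 * table[quotient]
        break
assert recovered == K
```

Up to logarithmic factors, the table and the scan each cost about 2^{n/2} for an n-bit key, matching the time/memory figures in the text.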

Forgetting about logarithmic factors, we can recover an n-bit key in time 2^{n/2} instead of 2^n. This clearly shows that plain RSA is not a secure cryptosystem. However, this attack is costly in terms of memory and we can ask whether it is possible to lower these memory requirements. The natural option would be to look for K as a product of four smaller keys, K = K1 K2 K3 K4; of course, this covers a smaller fraction of the keys. In this case, we would like to use the algorithms from the present chapter. However, here the group order is unknown and discrete logarithm computations are not feasible, because N is too large to factor. As a consequence, we find ourselves in one of the bad cases presented in Section 8.2.2.1. A nice open problem is to adapt the algorithms to this specific case and recover K using less memory.

8.4.4 Birthday attack on plain ElGamal

With plain ElGamal, instead of directly using multiplicative properties, we first remark that both g and y belong to the (usually unique) subgroup of order q in F_p^*. Thanks to this fact, we are able to derandomize the encryption scheme and get a new value which is a function of the encrypted key only. This value is:

(y^r K)^{(p−1)/q} = (y^{(p−1)/q})^r K^{(p−1)/q} = K^{(p−1)/q} (mod p). (8.19)

Once again, if K = K1 K2, then

K1^{(p−1)/q} K2^{(p−1)/q} = K^{(p−1)/q} (mod p), (8.20)

thus we can attack this scheme exactly as we did in the case of plain RSA.
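A toy run of the derandomization and of the resulting meet-in-the-middle search might look as follows; the parameters are illustrative and are chosen so that the cancellation y^{(p−1)/q} = 1 in Equation (8.19) actually holds (here q divides (p−1)/q), and on such tiny parameters many false candidate pairs appear alongside the true factorization.

```python
import random
from collections import defaultdict

# Toy parameters: p = 101, q = 5, exponent s = (p-1)/q = 20.
p, q = 101, 5
g = pow(2, (p - 1) // q, p)   # an element of the order-q subgroup
x = 17                        # secret key
y = pow(g, x, p)              # public key

K = 3 * 5                     # session key with two small factors
r = random.randrange(1, p - 1)
c1, c2 = pow(g, r, p), K * pow(y, r, p) % p   # plain ElGamal ciphertext

s = (p - 1) // q
derand = pow(c2, s, p)        # removes y^r and leaves K^s mod p
assert derand == pow(K, s, p)

# Meet-in-the-middle on the derandomized value: K1^s * K2^s = K^s.
B = 7
table = defaultdict(list)
for k in range(1, B + 1):
    table[pow(k, s, p)].append(k)

candidates = []
for k1 in range(1, B + 1):
    t = derand * pow(pow(k1, s, p), -1, p) % p
    for k2 in table[t]:
        candidates.append((k1, k2))
# the true factorization is among the (possibly many) candidates
assert any(k1 * k2 == K for k1, k2 in candidates)
```

In a real attack the image group is large enough that candidates are scarce, so the multiplicative relation essentially pins down K1 and K2.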

The key difference here is that the order of the group where we want to solve this multiplicative equation is no longer unknown; indeed, it is equal to (p − 1)/q. Assuming that this number can be factored, we now know the decomposition of the group as discussed in Section 8.2.2.1. From this discussion, it follows that for n-bit keys, if K can be written as K = K1 K2 K3 K4, a product of four numbers of about n/4 bits, we can use the quadrisection approach if there is a small subgroup of size approximately 2^{n/4}. As stated in Section 8.2.2.1, we do not even need to compute logarithms and put the problem into additive form in order to solve it.

However, computing logarithms is a good idea, since it allows us to solve the problem efficiently even when the available subgroup is too large, up to


2^{n/2} elements. Indeed, we have a list of 2^{n/4} small numbers; thus, we need to compute 2^{n/4} discrete logarithms. In our context, the best algorithms available are generic algorithms such as Pollard's Rho. With a group of 2^{n/2} elements, each computation of a discrete logarithm costs 2^{n/4}. As a consequence, the total cost is 2^{n/2} and remains comparable to the running time of the quadrisection approach. Once we have reduced the problem to its additive form, we can, as in Section 8.2.2, lift the problem to Z and solve it efficiently.


Exercises

1h. Let N be a product of many small primes (or prime powers). Let X be a number in the interval [0, B], where B is a known upper bound. Assume that we are given values for X mod p for each prime (power) in N, with the following caveat: most modular values are correct, but some might be incorrect. To parametrize the problem, it is useful to denote by NI the product of the primes (or prime powers) for which the modular value is incorrect.

• Show that if NI is given, we can obtain X as long as N/NI > B.

• Assuming that each prime (or prime power) in N has the same order of magnitude P, we now assume that N contains n terms, i.e., N ≈ P^n, and that NI contains nI terms. Describe a brute force algorithm that searches for X by trying all possibilities for NI. Under which condition do we obtain a unique solution for X?

• An alternative approach is to first construct a value X̃ modulo N that matches all the given modular values. Show that X̃ = X (mod N/NI). Deduce that X̃/N can be approximated by a fraction δ/NI. Can this fraction be obtained without exhaustive search?

• Combining exhaustive search together with the above approach, when can X be recovered and at what cost?

2. Implement the birthday attack using quadrisection against RSA (or ElGamal) described in Section 8.4.2.

3h. The quadrisection algorithms from this chapter have all been presented for commutative groups. Discuss their applicability to the group of 2 × 2 matrices over Fp.

4. Let L1, . . . , Ln be n lists, each containing 2 bitstrings on n bits. Write an efficient method for finding n elements, one from each list, whose sum is zero.


Chapter 9

Fourier and Hadamard-Walsh transforms

9.1 Introductory example: Studying S-boxes

Substitution boxes or S-boxes are very important ingredients in the construction of secret key cryptographic algorithms. Their main role is to break the linearity of these algorithms. Due to the binary nature of computers, most S-boxes act on bits. They take a number of bits as input and produce a number of bits as output. Having S-boxes with good properties is essential to the security of secret-key encryption. For example, it is well known that the security of the DES algorithm drops dramatically when its S-boxes are replaced without care. The cryptanalysis of FEAL in [BS91b, GC91, MY92, TCG92], which is an encryption algorithm similar to DES, illustrates this point. As a consequence, the properties of S-boxes need to be studied with care when choosing them, or when trying to cryptanalyze an encryption algorithm. In particular, the linear and differential properties of S-boxes, which are respectively used by linear [Mat93, Mat94a, Mat94b, TSM94] and differential [BS91a, BS92, BS93] cryptanalysis, should always be determined.

9.1.1 Definitions, notations and basic algorithms

Let S be an S-box with an n-bit input and a t-bit output. The differential characteristics of S are denoted by D_Δ^δ(S) and stand for the number of input pairs (x, y) with input difference Δ and output difference δ, i.e., such that x ⊕ y = Δ and S(x) ⊕ S(y) = δ. The linear characteristics of S are denoted by L_M^m(S) and stand for the difference between the number of n-bit values x such that the two bitwise scalar products (M|x) and (m|S(x)) are equal modulo 2 and the number of values x such that these scalar products are different. The n-bit value M is called the input mask and the t-bit value m the output mask.

Throughout this study, to simplify notations, we identify numbers with the binary bitstrings that represent them. To make the notation explicit, let a and b be two numbers. First, write both a and b in basis 2, i.e., assuming ℓ-bit values, as the sums a = Σ_{i=0}^{ℓ−1} a_i 2^i and b = Σ_{i=0}^{ℓ−1} b_i 2^i, with all values a_i and b_i in {0, 1}. In this context, we identify a with the bitstring a_{ℓ−1} · · · a_1 a_0; we

273

Ā© 2009 by Taylor and Francis Group, LLC

274 Algorithmic Cryptanalysis

also deļ¬ne the bitwise scalar product of a and b as:

ā’1

(a|b) = ai bi mod 2.

i=0

From the definitions, we can directly derive two basic algorithms: one for computing differential characteristics (Algorithm 9.1) and the other for linear characteristics (Algorithm 9.2). For linear characteristics, the algorithm simply enumerates all input masks, all output masks and all input values, computes the scalar products, does some counting and derives each characteristic. The runtime is O(2^{2n+t}) operations on n-bit counters and in terms of memory it suffices to store S and a couple of additional variables, thus requiring t·2^n + O(1) bits of memory (assuming that the characteristics are produced but not stored). For differential characteristics, we proceed slightly differently, by first enumerating all values of the input difference ∆. For each possible difference, we initialize to zero a table of 2^t counters (one for each possible output difference). Then, for each pair (x, x ⊕ ∆) with the prescribed input difference, its output difference S(x) ⊕ S(x ⊕ ∆) is computed and the corresponding counter is incremented. The runtime is O(2^{2n}) and the memory is t·2^n + n·2^t + O(1) bits to store S and the counters (again assuming that the characteristics are not stored). Note that it is straightforward to gain a factor of two using the fact that, due to symmetry, each pair is counted twice, once as (x, x ⊕ ∆) and once as (x ⊕ ∆, x). We also remark that for ∆ = 0 the result is known without any computation, since the 2^n pairs with input difference 0 all have output difference 0. We see that, using these basic algorithms, for any choice of input and output sizes for S, it is easier to compute differential characteristics than linear characteristics.

Algorithm 9.1 Algorithm for computing differential characteristics

Require: Input table S containing 2^n elements of t bits
  Create table Count of 2^t integers
  for ∆ from 0 to 2^n − 1 do
    Set table Count to zero
    for x from 0 to 2^n − 1 do
      Increment Count[S[x] ⊕ S[x ⊕ ∆]]
    end for
    Print table Count: Count[δ]/2^n is the probability of transition from input difference ∆ to output difference δ
  end for
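As a concrete sketch, Algorithm 9.1 can be written in C as follows; the toy 4-bit S-box and the helper name diff_row are ours, chosen for illustration only:

```c
/* A direct C sketch of Algorithm 9.1 for a toy n = t = 4 S-box. */
#define N 4
#define SIZE (1 << N)

/* Made-up example S-box, not one from the text. */
static const int S[SIZE] = { 6,  4, 12,  5,  0,  7,  2, 14,
                             1, 15,  3, 13,  8, 10,  9, 11};

/* Fill Count[delta] with the differential characteristic
   D_Delta^delta(S) for a fixed input difference Delta. */
void diff_row(int Delta, int Count[SIZE]) {
    for (int d = 0; d < SIZE; d++)
        Count[d] = 0;
    for (int x = 0; x < SIZE; x++)
        Count[S[x] ^ S[x ^ Delta]]++;   /* output difference of (x, x^Delta) */
    /* Count[delta]/2^n is the transition probability from Delta to delta. */
}
```

Each row sums to 2^n, and for ∆ ≠ 0 all counts are even, since every unordered pair is counted twice.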

Fourier and Hadamard-Walsh transforms

Algorithm 9.2 Algorithm for computing linear characteristics

Require: Input table S containing 2^n elements of t bits
  for M from 0 to 2^n − 1 do
    for m from 0 to 2^t − 1 do
      Let Count ←− 0
      for x from 0 to 2^n − 1 do
        if (x|M) = (S[x]|m) then
          Increment Count
        end if
      end for
      Print Count, where Count/2^n is the probability of the event (x|M) ⊕ (S[x]|m) = 0
    end for
  end for
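Algorithm 9.2 admits an equally direct C sketch; instead of printing Count, this variant returns the characteristic L_M^m(S) = 2·Count − 2^n, i.e., agreements minus disagreements. The helper names are ours:

```c
/* Naive C sketch of Algorithm 9.2 for one pair of masks (M, m). */
static int parity(unsigned v) {        /* parity of the bits of v */
    int p = 0;
    while (v) { p ^= v & 1; v >>= 1; }
    return p;
}

/* Return L_M^m(S) = 2*Count - 2^n for an n-bit to t-bit S-box S. */
int lin_char(const int *S, int n, int M, int m) {
    int count = 0;
    for (int x = 0; x < (1 << n); x++)
        if (parity(x & M) == parity(S[x] & m))   /* (x|M) = (S[x]|m) ? */
            count++;
    return 2 * count - (1 << n);
}
```

Enumerating all 2^{n+t} mask pairs around this inner loop gives the O(2^{2n+t}) basic algorithm.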

9.1.2 Fast linear characteristics using the Walsh transform

The Walsh (or Hadamard-Walsh) transform is a kind of discrete Fourier transform which has been used for a long time in coding theory. It was applied to cryptography by Xiao and Massey in [XM88]. Among its possible applications, we find the fast computation of linear characteristics. The easiest case happens for S-boxes with a single bit of output, i.e., when t = 1. In that case, it suffices to consider the case of an output mask m equal to 1. Indeed, when m = 0 the result is independent of the S-box and quite uninteresting. From the definition, we have:

    L_M^1(S) = Σ_{(x|M)=S(x)} 1 − Σ_{(x|M)≠S(x)} 1                    (9.1)
             = Σ_x (−1)^{(x|M)} · (−1)^{S(x)} = Σ_x (−1)^{(x|M)} · T(x),

where T(x) is defined as (−1)^{S(x)}. In this form, it is interesting to split T in two halves T_0 and T_1 with inputs on n − 1 bits. More precisely, for all 0 ≤ x < 2^{n−1}, we let:

    T_0(x) = T(x)   and   T_1(x) = T(2^{n−1} + x).                    (9.2)

Similarly, we split S in two halves S_0 and S_1. Using these notations, we see that for input masks M < 2^{n−1}, we have:

    L_M^1(S) = Σ_{x<2^{n−1}} (−1)^{(x|M)} · T_0(x) + Σ_{x<2^{n−1}} (−1)^{(x|M)} · T_1(x)    (9.3)
             = L_M^1(S_0) + L_M^1(S_1).

Likewise, for input masks of the form 2^{n−1} + M with M < 2^{n−1}, we have:

    L_{2^{n−1}+M}^1(S) = Σ_{x<2^{n−1}} (−1)^{(x|M)} · T_0(x) − Σ_{x<2^{n−1}} (−1)^{(x|M)} · T_1(x)
                       = L_M^1(S_0) − L_M^1(S_1).                     (9.4)

Clearly, T_0 and T_1 can also be divided in halves, and so on, until we reach tables of size one where nothing remains to be done. This yields the following algorithm, which receives as input the table T, modifies it in place and outputs a new table, the Walsh transform of T, denoted W(T). At position M in this table, we find the value W_M(T) = L_M^1(S). This algorithm can be written without using recursion, as described in pseudo-code as Algorithm 9.3.

Algorithm 9.3 Walsh transform algorithm

Require: Input table T containing 2^n elements in {−1, 1}
Comment: Variable Sz is the small table size
Comment: Variable Pos is the small table position
  for i from 0 to n − 1 do
    Let Sz ←− 2^i, Pos ←− 0
    while (Pos < 2^n) do
      for j from 0 to Sz − 1 do
        Let Sum ←− T[Pos + j] + T[Pos + Sz + j]
        Let Diff ←− T[Pos + j] − T[Pos + Sz + j]
        T[Pos + j] ←− Sum
        T[Pos + Sz + j] ←− Diff
      end for
      Let Pos ←− Pos + 2·Sz
    end while
  end for
  Output overwritten content of T containing W(T)

We see that the time complexity of the Walsh transform is O(n·2^n) and that the required memory is O(2^n) numbers or O(n·2^n) bits, since the numbers contained in the Walsh transform are integers in the range between −2^n and 2^n. Another important fact about the Walsh transform is that it can easily be inverted by a similar looking algorithm, the inverse Walsh transform Ŵ, described in pseudo-code in Algorithm 9.4.

Note that the only difference between the Walsh transform and its inverse is the fact that in the inverse transform the variables Sum and Diff are divided by 2 before being put back into the table. It is easy to check that each elementary step of size Sz in the inverse Walsh transform is the inverse of the corresponding step in the regular transform. Moreover, the order in which the various elementary steps are performed is irrelevant because each of them


Algorithm 9.4 Inverse Walsh transform algorithm

Require: Input table T containing 2^n integers
Comment: Variable Sz is the small table size
Comment: Variable Pos is the small table position
  for i from 0 to n − 1 do
    Let Sz ←− 2^i, Pos ←− 0
    while (Pos < 2^n) do
      for j from 0 to Sz − 1 do
        Let Sum ←− T[Pos + j] + T[Pos + Sz + j]
        Let Diff ←− T[Pos + j] − T[Pos + Sz + j]
        T[Pos + j] ←− Sum/2
        T[Pos + Sz + j] ←− Diff/2
      end for
      Let Pos ←− Pos + 2·Sz
    end while
  end for
  Output overwritten content of T containing Ŵ(T)

commutes with all the others. As a consequence, we deduce that the two

algorithms are indeed inverses of each other.
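A C rendering of Algorithm 9.4, with the forward transform repeated so the round trip can be checked, might look as follows (a sketch in the style of Program 9.1, not the book's cache-friendly code):

```c
#define TYPE int

/* In place Walsh transform (Algorithm 9.3). */
void Walsh(TYPE *Tab, TYPE size) {
    for (TYPE step = 1; step < size; step <<= 1)
        for (TYPE i1 = 0; i1 < size; i1 += 2 * step)
            for (TYPE i0 = 0; i0 < step; i0++) {
                TYPE i = i1 + i0;
                TYPE sum = Tab[i] + Tab[i + step];
                TYPE diff = Tab[i] - Tab[i + step];
                Tab[i] = sum;
                Tab[i + step] = diff;
            }
}

/* In place inverse Walsh transform (Algorithm 9.4): identical
   butterflies, but Sum and Diff are halved on the way back. */
void InvWalsh(TYPE *Tab, TYPE size) {
    for (TYPE step = 1; step < size; step <<= 1)
        for (TYPE i1 = 0; i1 < size; i1 += 2 * step)
            for (TYPE i0 = 0; i0 < step; i0++) {
                TYPE i = i1 + i0;
                TYPE sum = Tab[i] + Tab[i + step];
                TYPE diff = Tab[i] - Tab[i + step];
                Tab[i] = sum / 2;
                Tab[i + step] = diff / 2;
            }
}
```

Applying Walsh then InvWalsh to any table of size 2^n returns the original table, as the commuting-steps argument above guarantees.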

For general S-boxes with t bits of output, it is also possible to speed up the computation using the Walsh transform. It suffices, for each output mask m, to construct an auxiliary table S_m(x) = (m|S(x)) with a single bit of output, and then to proceed as above. This lowers the time complexity down to O(n·2^{n+t}) compared to O(2^{2n+t}) for the basic algorithm. The memory requirements are essentially multiplied by 1 + n/t, since in addition to S we now need to hold in memory the Walsh transform of one S_m. Using this Walsh transform algorithm, the computation of linear characteristics now compares favorably with the computation of differential characteristics, especially when the output size t is much smaller than the input size n. Indeed, we have to compare O(n·2^{n+t}) with O(2^{2n}).
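A minimal C sketch of this mask-by-mask method might look as follows; the toy parameters and helper names are ours:

```c
/* For each output mask m, build T(x) = (-1)^{(m|S(x))} and apply the
   Walsh transform: entry M of the result is L_M^m(S). Toy n = 3. */
#define N 3
#define SIZE (1 << N)

static int parity(unsigned v) {
    int p = 0;
    while (v) { p ^= v & 1; v >>= 1; }
    return p;
}

static void Walsh(int *Tab, int size) {
    for (int step = 1; step < size; step <<= 1)
        for (int i1 = 0; i1 < size; i1 += 2 * step)
            for (int i0 = 0; i0 < step; i0++) {
                int i = i1 + i0;
                int sum = Tab[i] + Tab[i + step];
                int diff = Tab[i] - Tab[i + step];
                Tab[i] = sum;
                Tab[i + step] = diff;
            }
}

/* Fill L[M] with L_M^m(S) for one fixed output mask m. */
void lin_row(const int *S, int m, int L[SIZE]) {
    for (int x = 0; x < SIZE; x++)
        L[x] = parity(S[x] & m) ? -1 : 1;   /* T(x) = (-1)^{S_m(x)} */
    Walsh(L, SIZE);                          /* L[M] = W_M(T) = L_M^m(S) */
}
```

Looping over the 2^t masks m around lin_row gives the announced O(n·2^{n+t}) total cost.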

9.1.2.1 Basic implementation of the Walsh transform

In this section, we give a basic implementation of Algorithm 9.3 as Program 9.1. One interesting experiment is to measure the speed of this code for tables of increasing sizes. In this experiment, we take all possible sizes from 32 to 2^28 and, in order to reflect the time per element, we repeat the Walsh transform 2^28/S times for a table of size S. We give the observed running times in Table 9.1. For very small tables, the running times are high due to function call overhead. For large tables, we see that there is a progressive slowdown between 2^11 and 2^20, followed by a brutal slowdown after 2^20. This is the result of cache effects for large tables. Cache friendly versions of the Walsh transform are given on the book's website.


Program 9.1 C code for Walsh transform

#define TYPE int
/*In place Walsh transform*/
void Walsh(TYPE *Tab, TYPE size)
{
  TYPE i,i0,i1;
  TYPE step;
  TYPE sum,diff;

  for (step=1;step<size;step<<=1) {
    for (i1=0;i1<size;i1+=2*step) {
      for (i0=0;i0<step;i0++) {
        i=i1+i0;
        sum=Tab[i]+Tab[i+step];diff=Tab[i]-Tab[i+step];
        Tab[i]=sum;Tab[i+step]=diff;
      }
    }
  }
}

Size S        Runtime for 2^28/S executions
32            16.0 s
64             9.4 s
128            6.5 s
256            4.9 s
512            4.3 s
1024           4.1 s
2048           3.9 s
4096           4.0 s
8192           4.1 s
16384          4.5 s
32768          4.6 s
65536          4.8 s
131072         5.0 s
262144         5.1 s
524288         5.3 s
1048576        6.0 s
2097152       11.9 s
4194304       13.5 s
8388608       14.2 s
16777216      14.9 s
33554432      15.4 s
67108864      16.1 s
134217728     16.7 s
268435456     17.3 s

Table 9.1: Timings on Intel Core 2 Duo at 2.4 GHz using gcc 4.3.2


9.1.3 Link between Walsh transforms and differential characteristics

In 1994, in two independent articles, Chabaud and Vaudenay [CV94] and Daemen, Govaerts and Vandewalle [DGV94] discussed the relation between the Walsh transform, or more precisely its square, and the value of differential characteristics. In this section, we illustrate this relation by explicitly describing a Walsh transform based algorithm for computing differential characteristics.

As in the case of linear characteristics, we start by studying S-boxes with a single bit of output. In order to represent these S-boxes, we are going to use polynomials in a multivariate polynomial ring. More precisely, we start from the commutative polynomial ring over Z in n variables X_0, X_1, ..., X_{n−1}. Then we form an ideal I, generated by the n polynomials X_0^2 − 1, X_1^2 − 1, ..., X_{n−1}^2 − 1. Finally, we form the quotient ring:

    K = Z[X_0, X_1, ..., X_{n−1}] / (I).                              (9.5)

In the ring K, we encode an S-box S by the polynomial f_S defined as:

    f_S = Σ_{i=0}^{2^n−1} (−1)^{S(i)} M_i.                            (9.6)

In this expression, M_i denotes the following monomial in K:

    M_i = Π_{j=0}^{n−1} X_j^{B_j(i)},                                 (9.7)

where B_j(i) denotes the j-th bit in the binary representation of i. Note that in this representation, we use the trick of representing S-boxes multiplicatively by 1 and −1 instead of 0 and 1, as we already did in the table T in Section 9.1.2.

In this polynomial representation, we first remark that the multiplication of monomials modulo I is such that M_i = M_j · M_k if and only if i = j ⊕ k. This implies that the coefficient of M_∆ in f_S^2 is:

    Σ_{i⊕j=∆} (−1)^{S(i)} (−1)^{S(j)}.                                (9.8)

Since each term in the sum is equal to 1 when S(i) = S(j) and to −1 when S(i) ≠ S(j), the sum is clearly equal to D_∆^0(S) − D_∆^1(S). Moreover, it is clear from a simple counting argument that D_∆^0(S) + D_∆^1(S) is 2^n. Thus, we easily recover D_∆^0(S) and D_∆^1(S).

As a direct consequence, computing f_S^2 yields the full table of differential characteristics for the single bit output S-box S. Thus, finding a fast multiplication algorithm in K is sufficient to quickly compute the differential characteristics of S.


What we need to do now is a simpler analog of the well-known fact that in the polynomial ring Z[x], fast multiplication can be performed using fast Fourier transform, pointwise multiplication and inverse Fourier transform. To describe fast multiplication in K, a simple way is to proceed by recursion on the number of variables. Assume that we already know how to multiply polynomials in n − 1 variables. To multiply f and g, we first write f as f_0 + X_{n−1} f_1 and g as g_0 + X_{n−1} g_1. With this notation, we have:

    f · g = (f_0 g_0 + f_1 g_1) + X_{n−1} (f_0 g_1 + f_1 g_0).        (9.9)

Denoting the product by h and writing h_0 = f_0 g_0 + f_1 g_1 and h_1 = f_0 g_1 + f_1 g_0, we now remark that:

    h_0 + h_1 = (f_0 + f_1) · (g_0 + g_1)   and                       (9.10)
    h_0 − h_1 = (f_0 − f_1) · (g_0 − g_1).

Viewing f, g and f·g as tables with 2^n entries, where entry i is the coefficient of M_i, and going through the recursion, we find the following relation between the Walsh transforms of f, g and h:

    W(h) = W(f) × W(g),                                               (9.11)

where × denotes pointwise multiplication, i.e.,

    ∀i : W_i(h) = W_i(f) · W_i(g).                                    (9.12)

Then h itself is obtained by computing the inverse Walsh transform of the pointwise product.

In order to compute f_S^2, it suffices to perform a single multiplication in K. More precisely, we compute the Walsh transform of f_S, square it pointwise and apply the inverse Walsh transform. Thus, using this algorithm, computing f_S^2 requires O(n·2^n) bit operations. As a consequence, we build the complete table of differential characteristics for an S-box S with a single bit output using O(n·2^n) bit operations, compared to O(2^{2n}) for the basic Algorithm 9.1.
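The whole pipeline for a single-bit-output S-box can be sketched in C as follows (toy n = 3; the function names are ours):

```c
/* Walsh-squaring method: encode T(i) = (-1)^{S(i)}, Walsh, square
   pointwise, inverse Walsh; entry Delta then equals
   D_Delta^0(S) - D_Delta^1(S), and D^0 + D^1 = 2^n gives both. */
#define N 3
#define SIZE (1 << N)

static void Walsh(int *Tab, int size) {
    for (int step = 1; step < size; step <<= 1)
        for (int i1 = 0; i1 < size; i1 += 2 * step)
            for (int i0 = 0; i0 < step; i0++) {
                int i = i1 + i0;
                int sum = Tab[i] + Tab[i + step];
                int diff = Tab[i] - Tab[i + step];
                Tab[i] = sum;
                Tab[i + step] = diff;
            }
}

static void InvWalsh(int *Tab, int size) {
    for (int step = 1; step < size; step <<= 1)
        for (int i1 = 0; i1 < size; i1 += 2 * step)
            for (int i0 = 0; i0 < step; i0++) {
                int i = i1 + i0;
                int sum = Tab[i] + Tab[i + step];
                int diff = Tab[i] - Tab[i + step];
                Tab[i] = sum / 2;
                Tab[i + step] = diff / 2;
            }
}

void diff_from_walsh(const int *S, int D0[SIZE], int D1[SIZE]) {
    int T[SIZE];
    for (int i = 0; i < SIZE; i++)
        T[i] = S[i] ? -1 : 1;       /* coefficients of f_S */
    Walsh(T, SIZE);
    for (int i = 0; i < SIZE; i++)
        T[i] *= T[i];                /* pointwise squaring */
    InvWalsh(T, SIZE);               /* T[Delta] = D^0 - D^1 */
    for (int d = 0; d < SIZE; d++) {
        D0[d] = (SIZE + T[d]) / 2;
        D1[d] = (SIZE - T[d]) / 2;
    }
}
```

For the parity function S(x) = x_0 ⊕ x_1 ⊕ x_2, every pair with input difference ∆ has output difference parity(∆), which the sketch recovers exactly.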

9.1.3.1 Differential characteristics for general S-boxes

In order to deal with general S-boxes with more than a single bit of output, we need to generalize the above method in order to account for the additional output bits. We now describe two slightly different approaches that apply to this generalized case.

9.1.3.1.1 Theoretical approach The first approach is to look at the problem from a theoretical point-of-view, replacing the function with several bits of output by another function with a single bit of output. The simplest way to proceed is probably to replace the S-box S by its characteristic function


as proposed in [CV94]. Indeed, the characteristic function is a function from n + t bits to a single bit and we can encode it and use the Walsh transform almost as in the case of a single bit output. More precisely, the characteristic function of S is the function χ_S defined as:

    χ_S(y‖x) = 1, if y = S(x), and                                    (9.13)
               0, otherwise.

In this definition, y‖x is a decomposition of the input to χ_S into a lower part x on n bits and an upper part y on t bits.

Then, we use monomials on n + t variables in an enlarged copy of K and define:

    F_S = Σ_{i=0}^{2^{n+t}−1} χ_S(i) M(i) = Σ_{i=0}^{2^n−1} M(i + 2^n · S(i)).    (9.14)

Note that, this time, we do not use a multiplicative encoding with 1 and −1. Using a simple computation, we can check that the coefficient of M(∆ + 2^n · δ) in F_S^2 is exactly D_∆^δ(S). Thus we can compute all the differential characteristics using a Walsh transform followed by a pointwise multiplication and finally an inverse Walsh transform. The time complexity is O((n + t)·2^{n+t}). However, the drawback of this approach is that we need to store in main memory a large table with 2^{n+t} entries.

9.1.3.1.2 Practical variant To avoid this main drawback, we can use a slightly different approach and rely on disk storage instead of main memory to store the 2^{n+t} entries. The goal here is to reduce the amount of main memory needed to the maximum of 2^n and 2^t. The key ingredient is to alternatively view a large table with 2^{n+t} elements either as 2^t columns of length 2^n or 2^n rows of length 2^t. First, to create the table by columns, we enumerate the 2^t possible¹ linear output masks m and define S_m(x) = (m|S(x)) with a single bit of output as in Section 9.1.2. For each value of m, we define f_{S_m}, compute its square and save it to disk as a column containing 2^n elements. Once this is finished, we consider each of the 2^n possible values² for ∆ and read from disk all the numbers f_{S_m}^2(∆) into a table T'_∆ with 2^t elements. We now study the relation between this table and the table T_∆ of all differential characteristics D_∆^δ(S), for fixed input difference ∆. We claim that when the output difference of S is

¹ Clearly, for the zero mask, the computation is independent of S and the constant result can be specified within the algorithm. However, to simplify the exposition, we skip this minor improvement.
² Once again, we could ignore the case where ∆ = 0.


δ, the output difference of S_m is clearly (m|δ). As a consequence:

    T'_∆(m) = D_∆^0(S_m) − D_∆^1(S_m)                                 (9.15)
            = Σ_{(m|δ)=0} D_∆^δ(S) − Σ_{(m|δ)=1} D_∆^δ(S)
            = Σ_δ (−1)^{(m|δ)} D_∆^δ(S) = Σ_δ (−1)^{(m|δ)} T_∆(δ).

Thus, T'_∆ is the Walsh transform of the table T_∆. This shows that by computing an additional inverse Walsh transform on each table T'_∆ we can recover the full table of differential characteristics. The complexity is again O((n + t)·2^{n+t}), but we require less main memory since we use disk space to efficiently transpose the table from columns with fixed m to rows with fixed ∆. In fact, this approach does not even need the distinction between input and output bits; it can be generalized to any Walsh transform computation which overflows the available main memory, by regrouping bits arbitrarily.

When we compare this to the complexity of the basic algorithm, we see that it is very useful when t is small compared to n, but much less when t is almost as large as n. The case where t is larger than n is rarely considered; however, in this extreme case, the basic algorithm is faster than the Walsh based technique described here. Nevertheless, for S-boxes with many algebraic properties, it may happen that even for large t a fast Walsh transform based algorithm exists. This is for example the case of the AES S-box, due to the linear redundancy between its bits described in [FM03]. Another possible application is the computation of generalized differences with more than two inputs. In this case, the cost of the basic algorithm grows to 2^{3n} (with three inputs), while the Walsh based algorithm cost remains at (n + t)·2^{n+t}.

9.1.4 Truncated differential characteristics

Truncated differential cryptanalysis is a variation of differential cryptanalysis introduced in [Knu94]. The basic idea is to study partially known differences, where the difference is fixed on some bits and left unknown on other bits. For this variation, it is possible to devise a variation of the basic algorithm and build truncated differential characteristics by first enumerating an input and an output mask, which describe the input and output bits where the difference is known. Then, for each pair of masks, one enumerates all possible input pairs and increments the counter corresponding to the truncated input and output difference of the pair. This is shown in pseudo-code as Algorithm 9.5. The time complexity of this algorithm is O(max(2^{3n+t}, 2^{2n+2t})); it requires a table of 2^{n+t} elements in terms of memory. With this basic approach, computing all possible truncated differential characteristics costs a lot more than computing ordinary differentials. Before giving an improved algorithm, we first need to count the total number of possible truncated differentials, in order to obtain a lower bound for the complexity. The easiest way


is to remark that in general a truncated differential is defined by specifying for each input and output bit whether it should be zero, one or unknown. Thus, there are 3^{n+t} different values to compute.

Algorithm 9.5 Algorithm for truncated differential characteristics

Require: Input table S containing 2^n elements of t bits
  Create bidimensional array Count of 2^n × 2^t integers
  for InMask from 0 to 2^n − 1 do
    for OutMask from 0 to 2^t − 1 do
      Set table Count to zero
      for x from 0 to 2^n − 1 do
        for y from 0 to 2^n − 1 do
          Increment Count[(x ⊕ y) & InMask, (S[x] ⊕ S[y]) & OutMask]
        end for
      end for
      Print table Count: Count[∆, δ]/2^n is the probability of transition from partial input difference ∆ to partial output difference δ
    end for
  end for

Looking beyond the basic approach, another idea is to compute the truncated differentials from the table of ordinary differentials. More precisely, we can see that any truncated differential is the sum of all the ordinary differentials that are compatible with this truncated differential. Here the word "compatible" is understood in the natural sense and means that when the truncated differential has a 0 or 1 bit in either the input or output mask, the ordinary differentials we sum have the same bit value. When the truncated differential has an undetermined bit, we sum over all possibilities for this bit. As a consequence, to compute a truncated differential with k bits left undetermined, we need to sum 2^k ordinary differentials. For each choice of k positions for the undetermined bits, there are 2^{n+t−k} possible truncated differentials, obtained by choosing the values of the bits in the other positions. As a consequence, the total cost to compute all the truncated differentials with a given set of undetermined bits is 2^{n+t}. Since there are 2^{n+t} possible sets of undetermined bits, the total complexity of this second approach is 2^{2n+2t} = 4^{n+t}.

In order to compute truncated differentials faster, let us first consider the simplest case where we limit the possibility of truncating to a single bit. To make things even simpler, let us assume that the only bit which can be truncated is the first input bit of S. In that case, given a fixed output difference δ and a fixed difference ∆' on the remaining n − 1 input bits, we need to build the two differential characteristics D_{0∆'}^δ(S) and D_{1∆'}^δ(S), where b∆' denotes the n-bit difference obtained by prepending bit b to ∆', to cover the two possible cases where

we do not truncate. We also need to compute a differential where the bit is really truncated:

    D_{∗∆'}^δ(S) = D_{0∆'}^δ(S) + D_{1∆'}^δ(S).                       (9.16)

Here, the notation ∗ indicates that a bit of difference is left unspecified. There are several ways to use Equation (9.16) in order to compute truncated differentials. One possibility, remembering that differential characteristics are computed by a Walsh transform, followed by squaring and an inverse Walsh transform, is to embed the computation of the truncated characteristics within a modified inverse Walsh transform algorithm for the final step. Starting with a single unspecified bit in high-order position, we can deal with it in the last iteration of the inverse Walsh transform. Thus, we now concentrate on this last iteration and find out how it can be modified to compute the additional truncated difference. The required modification is very simple, since it suffices, in addition to Sum and Diff, to store a copy of Sum + Diff, i.e., a copy of the original content of T[Pos + j], somewhere. Doing a similar reasoning on other bits easily yields an extended variation of the inverse Walsh transform that works on tables of size a power of 3. It is then possible to plug this modified inverse Walsh transform into the algorithms of Section 9.1.3.1. Note that, for the practical variant presented in Section 9.1.3.1.2 that uses two inverse Walsh transform steps, both need to be modified, one to allow truncated input differences, the other to permit truncated output differences. One difficulty is that right before calling the modified inverse transform, we need to convert tables of size 2^n (resp. 2^t) into tables of size 3^n (resp. 3^t). This is done by sending the entry in position i with binary decomposition i = Σ_j b_j 2^j to position Σ_j b_j 3^j. Thus, the binary decomposition of i is interpreted in basis 3. All unspecified positions are filled with zeros. A convenient convention is to organize the modified inverse Walsh transform in such a way that for the entry in position i the corresponding truncated difference is obtained by looking at the base 3 decomposition of i. A digit set to 0 means a 0 difference on the corresponding bit, a 1 means 1, and a 2 means ∗ (unknown). From a complexity point-of-view, we first study the runtime of the modified Walsh transform on a table of size 3^ℓ. At first, it seems that the complexity is O(ℓ·3^ℓ). However, taking into account that a large part of the table is initially zero and that it is useless to add/subtract zeros, the complexity can be lowered to O(3^ℓ). To implement this improvement, it is easier to incorporate the binary to base 3 renumbering within the modified inverse Walsh transform itself. With this improvement, the main contribution to the runtime of the complete truncated differential algorithm is the second layer of modified Walsh transforms. Indeed, the first layer performs 2^t transforms on tables of size 3^n, while the second layer does 3^n transforms on tables of size 3^t. Thus, we obtain a total runtime of O(3^{n+t}), which is optimal, since we need to compute 3^{n+t} values.
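The binary-to-base-3 renumbering described above can be sketched as a standalone helper (a hypothetical name, ours):

```c
/* Send i = sum_j b_j 2^j to position sum_j b_j 3^j, i.e., reinterpret
   the binary decomposition of i in basis 3. Unspecified (digit 2)
   positions of the target table are simply never written, i.e., zero. */
unsigned bin_to_base3(unsigned i) {
    unsigned r = 0, p = 1;       /* p runs through powers of 3 */
    while (i) {
        if (i & 1)
            r += p;              /* binary digit b_j becomes base-3 digit b_j */
        p *= 3;
        i >>= 1;
    }
    return r;
}
```

For example, i = 5 = 101 in binary is sent to 1·9 + 0·3 + 1·1 = 10.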

For completeness, we give on the book's website two different C programs computing truncated differentials with this algorithm: one implementing the modified inverse Walsh transform with complexity O(ℓ·3^ℓ), the other, slightly more complicated, with complexity O(3^ℓ). For the sake of simplicity, in these programs everything is stored in main memory. However, for larger tables, it would be worth doing the transpose from columns to rows on disk as explained in Section 9.1.3.1.2.

9.2 Algebraic normal forms of Boolean functions

For a table S with n input bits and a single bit of output, or equivalently for the associated Boolean function f on n bits, another very interesting representation is to write the Boolean function f as a polynomial in n variables over the polynomial ring F_2[x_0, x_1, ..., x_{n−1}]. This representation is called the algebraic normal form of f and has many applications in cryptography and coding theory. In particular, for a cryptanalyst, it is often interesting, in the context of algebraic cryptanalysis, to look at the degree of the algebraic normal form of f.

More precisely, the algebraic normal form of f is obtained by writing:

    f(x_0, ..., x_{n−1}) = Σ_{(a_0,...,a_{n−1})∈F_2^n} g(a_0, ..., a_{n−1}) Π_i x_i^{a_i}.    (9.17)

The function g giving the coefficients of the polynomial in Equation (9.17) is called the Moebius transform of the Boolean function f. Since g is also a Boolean function, Moebius transforms can be computed by using operations on bits only. Compared to Walsh transforms, this gives a very significant advantage and allows us to represent Boolean functions using a single bit of memory per entry, instead of a full integer. It remains to see whether a fast algorithm can work with this representation.

With a Boolean function f of a single bit entry x_0, it is easy to see that:

    g(0) = f(0)   and   g(1) = f(0) ⊕ f(1).                           (9.18)

Moreover, since the transformation that sends (x, y) to (x, x ⊕ y) is its own inverse, we see that for Boolean functions of a single variable, the Moebius transform is an involution on the set of Boolean functions. To generalize to more than one variable, let us consider a Boolean function f on n variables and write:

    f(x_0, ..., x_{n−1}) = f^(0)(x_0, ..., x_{n−2}) ⊕ f^(1)(x_0, ..., x_{n−2}) · x_{n−1},    (9.19)

where f^(0) and f^(1) are two Boolean functions on n − 1 variables. If g^(0) and g^(1) are the Moebius transforms of respectively f^(0) and f^(1), they are related to the Moebius transform g of f by the equations:

    g(x_0, ..., x_{n−2}, 0) = g^(0)(x_0, ..., x_{n−2})   and          (9.20)
    g(x_0, ..., x_{n−2}, 1) = g^(1)(x_0, ..., x_{n−2}).

Moreover, f^(0) and f^(1) are related to f by the equations:

    f^(0)(x_0, ..., x_{n−2}) = f(x_0, ..., x_{n−2}, 0)   and          (9.21)
    f^(1)(x_0, ..., x_{n−2}) = f(x_0, ..., x_{n−2}, 0) ⊕ f(x_0, ..., x_{n−2}, 1).

From these equations, we can devise a Moebius transform algorithm which is very similar to the Walsh transform Algorithm 9.3. This yields Algorithm 9.6, which is its own inverse. It can be implemented as easily as the Walsh transform; an implementation is given as Program 9.2.

Algorithm 9.6 Moebius transform algorithm

Require: Input truth table S of Boolean function f, with 2^n entries
Comment: Variable Sz is the small table size
Comment: Variable Pos is the small table position
  for i from 0 to n − 1 do
    Let Sz ←− 2^i, Pos ←− 0
    while (Pos < 2^n) do
      for j from 0 to Sz − 1 do
        S[Pos + Sz + j] ←− S[Pos + j] ⊕ S[Pos + Sz + j]
      end for
      Let Pos ←− Pos + 2·Sz
    end while
  end for
  Output overwritten content of S containing Moebius transform of f
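Before the word-packed Program 9.2, Algorithm 9.6 can also be rendered naively with one truth-table entry per byte; this sketch (the function name is ours) is its own inverse, as noted above:

```c
/* Bit-per-byte rendering of Algorithm 9.6: in place Moebius transform
   of the truth table S of a Boolean function on n variables. */
void moebius(unsigned char *S, int n) {
    int size = 1 << n;
    for (int step = 1; step < size; step <<= 1)
        for (int pos = 0; pos < size; pos += 2 * step)
            for (int j = 0; j < step; j++)
                S[pos + step + j] ^= S[pos + j];   /* g^(1) = f^(0) xor f^(1) */
}
```

For instance, the constant function 1 on two variables, with truth table {1, 1, 1, 1}, is sent to the ANF coefficient table {1, 0, 0, 0}, i.e., the constant polynomial 1; applying the transform again recovers the truth table.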

9.3 Goldreich-Levin theorem

The Goldreich-Levin theorem [LG89] is a fundamental theorem in the theory of cryptography. In particular, it is essential to the proof by Håstad, Impagliazzo, Levin and Luby [HILL99] that a secure pseudo-random generator can be constructed from any one-way function.

THEOREM 9.1
Let us denote by x a fixed unknown n-bit value and by f a fixed n-bit to t-bit one-way function. Suppose there exists an algorithm A that, given the value of f(x), allows to predict the value of a scalar product (R|x) with probability 1/2 + ε over the choice of R among n-bit strings, using at most T operations. Then there exists an algorithm B which, given f(x), produces in time at most T' a list of at most 4n²/ε² values that contains x with probability at least

Program 9.2 C code for Moebius transform

#define TYPE unsigned int
/*In place Moebius transform*/
void Moebius(TYPE *Tab, TYPE size)
{
  int Wsize;
  TYPE i,i0,i1;
  TYPE step;

  Wsize=size/(8*sizeof(TYPE));
  /*Moebius transform for high order bits, using word ops*/
  for (step=1;step<Wsize;step<<=1) {
    for (i1=0;i1<Wsize;i1+=2*step) {
      for (i0=0;i0<step;i0++) {
        i=i1+i0;
        Tab[i+step]^=Tab[i];
      }
    }
  }
  /*Moebius transform for low order bits, within words*/
  /* Assumes 8*sizeof(TYPE)=32 */
  for(i=0;i<Wsize;i++) {
    TYPE tmp;
    tmp=Tab[i];
    tmp^=(tmp<<16);
    tmp^=(tmp&0xff00ff)<<8;
    tmp^=(tmp&0xf0f0f0f)<<4;
    tmp^=(tmp&0x33333333)<<2;
    tmp^=(tmp&0x55555555)<<1;
    Tab[i]=tmp;
  }
}

1/2. The running time T' is given by:

    T' = (2n²/ε²) · (T + log(2n/ε²) + 2) + Tf.                        (9.22)

Informally, this theorem means that, for a one-way function f, it is not possible to find an efficient algorithm that can, given f(x), predict scalar products of the form (R|x) with a non-negligible advantage. Of course, if we remove the fact that the prediction of (R|x) can be noisy, the theorem becomes a straightforward application of linear algebra methods. Indeed, from n scalar products (R_i|x), we completely recover x as soon as the R_i values are linearly independent over F_2.

Similarly, if we are given a probabilistic algorithm which outputs the value of (R|x) with probability 1/2 + ε for any fixed R, recovering x is extremely easy. Indeed, in that case, by choosing for R a value with a single bit set to 1 and all other bits set to 0, we can learn the exact value of the corresponding bit of x. It suffices to repeat the probabilistic algorithm to obtain many independent predictions for (R|x). After some time, the correct value can be detected through a majority vote (see Chapter 12).
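The majority-vote recovery in this easy case can be sketched with a simulated predictor; the secret value, the deterministic error pattern (wrong on every fourth query, error rate 1/4 < 1/2) and all names are ours, for illustration only:

```c
static int parity(unsigned v) {
    int p = 0;
    while (v) { p ^= v & 1; v >>= 1; }
    return p;
}

static unsigned secret_x = 0x2b;   /* hidden n-bit value, illustration only */
static int query_count = 0;

/* Simulated noisy oracle for (R|x): deliberately answers wrong on
   every fourth query, so any 5 consecutive answers have a majority
   of correct ones. */
int predict(unsigned R) {
    int bit = parity(R & secret_x);
    return (++query_count % 4 == 0) ? bit ^ 1 : bit;
}

/* Recover bit i of x: query with R = e_i (single bit set) an odd
   number of times and take a majority vote. */
int recover_bit(int i, int queries) {
    int ones = 0;
    for (int q = 0; q < queries; q++)
        ones += predict(1u << i);
    return (2 * ones > queries) ? 1 : 0;
}
```

With a genuinely probabilistic predictor of advantage ε, the number of repetitions would instead be chosen so the majority is correct with high probability.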

The difficulty with the proof of the Goldreich-Levin theorem is that, in some sense, we need to hide the fact that we are repeating queries. To predict the value of (R|x) for a fixed R, we first choose k random auxiliary strings R1, . . . , Rk. Then, we call the prediction algorithm A, 2^k times, giving as input f(x) together with a value R′(α) = R ⊕ (⊕_{i=1}^{k} αi·Ri), where α takes all possible values in {0, 1}^k. Since:

k

(R|x) = (R (Ī±)|x) ā• Ī±i (Ri |x), (9.23)

i=1

if we guess the k values of (Ri |x) we deduce 2k diļ¬erent prediction for (R|x).

Note that these predictions are not independent, only pairwise independent, yet this suffices to make the proof go through.
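The identity used here is easy to check by machine. In the sketch below (our own illustration; all names are ours), n-bit strings are packed into an unsigned int, the scalar product (a|b) is the parity of the bitwise AND, and check_identity verifies Equation (9.23) for every α in {0, 1}^k.

```c
/* Parity of the bitwise AND: the scalar product (a|b) over F2. */
static int dot(unsigned int a, unsigned int b)
{
    unsigned int v = a & b;
    int p = 0;
    while (v) { p ^= 1; v &= v - 1; } /* clear lowest set bit */
    return p;
}

/* Check (R|x) = (R'(alpha)|x) xor (xor of alpha_i * (Ri|x))
   for all alpha in {0,1}^k, where R'(alpha) = R xor (xor of the
   Ri selected by the bits of alpha). Returns 1 on success. */
static int check_identity(unsigned int R, unsigned int x,
                          const unsigned int *Ri, int k)
{
    unsigned int alpha;
    for (alpha = 0; alpha < (1u << k); alpha++) {
        unsigned int Rp = R;
        int s = 0, i;
        for (i = 0; i < k; i++)
            if (alpha & (1u << i)) { Rp ^= Ri[i]; s ^= dot(Ri[i], x); }
        if (dot(R, x) != (dot(Rp, x) ^ s))
            return 0;
    }
    return 1;
}
```

Since the scalar product is linear in its first argument, the check succeeds for any choice of R, x and the auxiliary strings.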

Finally, from an algorithmic point of view, we need to compute the difference between the number of predicted 0 values and the number of predicted 1 values. This is precisely obtained by using a Walsh transform on the table of values (R′(α)|x).
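A minimal in-place Walsh transform in the spirit of this chapter (our own sketch, not one of the book's listings) makes this concrete: if the table initially holds +1 for each predicted 0 and −1 for each predicted 1, then after the transform, entry m is the number of positions where the table agrees with the linear form (m|α), minus the number where it disagrees.

```c
#include <stddef.h>

/* In-place Walsh-Hadamard transform of a table of 2^n integers.
   If tab initially holds (-1)^{b(alpha)} for a bit table b, entry m
   of the result is (#alphas with b(alpha) = (m|alpha)) minus
   (#alphas with b(alpha) != (m|alpha)). */
void walsh(int *tab, size_t size)
{
    size_t step, i0, i1;
    for (step = 1; step < size; step <<= 1)
        for (i1 = 0; i1 < size; i1 += 2 * step)
            for (i0 = 0; i0 < step; i0++) {
                int a = tab[i1 + i0], b = tab[i1 + i0 + step];
                tab[i1 + i0] = a + b;
                tab[i1 + i0 + step] = a - b;
            }
}
```

For the bit table of x0 AND x1, encoded as {1, 1, 1, -1}, the transform yields {2, 2, 2, -2}: the function agrees with each of the three non-trivial linear forms at three points out of four, except x0 ⊕ x1, with which it agrees at only one.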

9.4 Generalization of the Walsh transform to Fp

The Walsh transform as presented in Section 9.1.2 is specific to the binary field. However, it is natural to consider whether a similar method is applicable to other finite fields. Since the Walsh transform is an efficient way to


search for approximate linear relations between the input and output of a function, we first need to adapt the formalization of linear relations to larger finite fields. We only consider the case of prime fields Fp. Indeed, any linear relation that holds with good probability over an extension field implies another good linear relation over the corresponding base field. Such an implied relation can be constructed by applying the trace map to the original relation.

Such a generalized Walsh transform can be useful in several applications. A first application is to compute characteristics for the non-Boolean generalization of linear cryptanalysis introduced in [BSV07]. Another, more theoretical application presented in Berbain's PhD thesis [Ber07] concerns the security proof of a generalization of the QUAD family of stream ciphers from [BGP06], based on the evaluation of quadratic polynomials, to non-binary fields. This proof relies on a generalization of the Goldreich-Levin theorem to non-binary fields.

In this section, S is a function from F_p^n to F_p^t and all scalar products are

computed modulo p. We first note a key difference between F2 and the general case Fp. In the general case, a linear expression can take more than two values. As a consequence, it is no longer possible to regroup all cases into a single number. Over F2, this was possible, because for any linear relation, we could split all inputs to f into two sets: the matching entries and the non-matching entries. Since the total number of inputs is fixed and known, the difference between the cardinality of these two sets sufficed to encode all the information we need. Over Fp, we need to distinguish between p different cases. As a consequence, we change our definition accordingly and define L^m_M(S)(α) as the cardinality of the set of inputs x such that:

(M|x) + (m|S(x)) = α (mod p).    (9.24)

We know that for all input and output masks:

Σ_{α=0}^{p−1} L^m_M(S)(α) = p^n,    (9.25)

since there are p^n possible inputs to S. Clearly, we now need p − 1 different numbers for each pair of masks to extract the complete information about linear relations. Moreover, we can remark that there are relations between some pairs of masks. More precisely, for any non-zero constant λ in Fp we have:

L^m_M(S)(α) = L^{λ·m}_{λ·M}(S)(λα),    (9.26)

where λ·m denotes the multiplication of each component in the mask m by λ.
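These counting definitions are easy to check by brute force. The sketch below (our own code, with hypothetical names) takes p = 3 and n = 2, fixes an arbitrary sample function S from F_3^2 to F_3, computes L^m_M(S)(α) by direct enumeration, and verifies both Equation (9.25) and the scaling relation (9.26).

```c
#define P 3 /* prime field size */

/* A fixed sample function S from F_3^2 to F_3 (arbitrary choice). */
static int S(int x0, int x1)
{
    return (x0 * x1 + 2 * x0 + x1) % P;
}

/* L^m_M(S)(alpha): number of inputs x = (x0, x1) with
   M0*x0 + M1*x1 + m*S(x) = alpha mod p. */
static int L(int M0, int M1, int m, int alpha)
{
    int x0, x1, count = 0;
    for (x0 = 0; x0 < P; x0++)
        for (x1 = 0; x1 < P; x1++)
            if ((M0 * x0 + M1 * x1 + m * S(x0, x1)) % P == alpha)
                count++;
    return count;
}

/* Equation (9.25): the counts over all alpha sum to p^n = 9. */
static int check_total(int M0, int M1, int m)
{
    int alpha, sum = 0;
    for (alpha = 0; alpha < P; alpha++)
        sum += L(M0, M1, m, alpha);
    return sum == 9;
}

/* Equation (9.26): scaling both masks and alpha by lam != 0
   leaves the count unchanged. */
static int check_scaling(int M0, int M1, int m, int lam)
{
    int alpha;
    for (alpha = 0; alpha < P; alpha++)
        if (L(M0, M1, m, alpha) !=
            L((lam * M0) % P, (lam * M1) % P, (lam * m) % P,
              (lam * alpha) % P))
            return 0;
    return 1;
}
```

The scaling check succeeds because multiplying the defining congruence by an invertible λ is a bijection between the two solution sets.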

Of course, as with the binary field, we first need to address the basic case where S outputs a single element in Fp. In that case, thanks to the multiplicative property above, we only need to consider output masks of the form m = 1. Indeed, when m = 0 the values L^m_M(S)(α) are independent of S and uninteresting. In this context, in order to compute the linear characteristics


of S, we split x into x0 x′, M into M0 M′ and S into p subfunctions Sx0, one for each fixed value of the first coordinate. Using the definition, we then find:

L^1_M(S)(α) = # { x | (M|x) + S(x) = α (mod p) }    (9.27)
            = # ∪_{x0∈Fp} { x′ | (M′|x′) + S(x0 x′) = α − M0·x0 }
            = # ∪_{x0∈Fp} { x′ | (M′|x′) + Sx0(x′) = α − M0·x0 }
            = Σ_{x0∈Fp} L^1_{M′}(Sx0)(α − M0·x0).
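The splitting identity can also be verified numerically. In this brute-force sketch (our own code, with hypothetical names), we take p = 3 and n = 2, fix the first coordinate x0, and check that the full count equals the sum over x0 of the restricted counts at α − M0·x0.

```c
#define P 3 /* prime field size */

/* Sample function from F_3^2 to F_3 (arbitrary choice). */
static int S2(int x0, int x1) { return (x0 * x1 + x1 + 2 * x0) % P; }

/* Full count with output mask 1:
   #{ (x0,x1) : M0*x0 + M1*x1 + S2(x0,x1) = alpha mod p }. */
static int L_full(int M0, int M1, int alpha)
{
    int x0, x1, c = 0;
    for (x0 = 0; x0 < P; x0++)
        for (x1 = 0; x1 < P; x1++)
            if ((M0 * x0 + M1 * x1 + S2(x0, x1)) % P == alpha) c++;
    return c;
}

/* Restricted count for fixed x0:
   #{ x1 : M1*x1 + S2(x0,x1) = beta mod p }. */
static int L_restricted(int x0, int M1, int beta)
{
    int x1, c = 0;
    for (x1 = 0; x1 < P; x1++)
        if ((M1 * x1 + S2(x0, x1)) % P == beta) c++;
    return c;
}

/* The splitting identity: summing the restricted counts at
   beta = alpha - M0*x0 recovers the full count. */
static int check_split(int M0, int M1, int alpha)
{
    int x0, sum = 0;
    for (x0 = 0; x0 < P; x0++)
        sum += L_restricted(x0, M1, ((alpha - M0 * x0) % P + P) % P);
    return sum == L_full(M0, M1, alpha);
}
```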

From these equations, we could easily derive a Walsh transform that operates on p × p^n values. However, this would not be optimal. Indeed, we have p^n possible input masks, p values of α for each mask and one copy of Equation (9.25) for each mask. As a consequence, it suffices to compute (p − 1) · p^n

numbers to obtain the complete characterization of linear approximations of S. In order to devise such an optimal variation, we proceed as in the binary case, keeping only the computation of L^1_M(S)(α), for non-zero values of α. Of course, this means that whenever L^1_M(S)(0) is needed, we have to replace it using Equation (9.25). To avoid cumbersome constants, it is easier to compute the renormalized values L̄^1_M(S)(α) = L^1_M(S)(α) − p^{n−1}, which are related by:

Σ_{α=0}^{p−1} L̄^m_M(S)(α) = 0.    (9.28)

Moreover, since the restricted function Sx0 only has n − 1 input coordinates, we also have:

L̄^1_M(S)(α) = Σ_{x0∈Fp} L̄^1_{M′}(Sx0)(α − M0·x0).    (9.29)

The simplest way to implement the algorithm is to recompute values of the form L̄^1_M(S)(0) when they are needed, right before the innermost loop of the algorithm. To make the algorithm complete, the only missing step
