
additive representation of G can easily be derived. If, furthermore, an isomorphism from G to its additive representation can be computed efficiently enough, we are clearly done. Similarly, when G can be expressed as a direct product G1 × G2 with the right distribution of sizes, we can directly use the first approach we used for groups in additive representation, even when G1 and G2 are given in arbitrary form. Note that when G has a small enough subgroup (smaller than the square root of the size of G), then it is possible to tabulate the discrete logarithms in this subgroup in order to use an additive representation for this part.
Nevertheless, in our context, there remain a few bad group presentations which prevent us from using a birthday approach with reduced memory requirement. A typical example is the group G of an elliptic curve over a finite field (see Chapter 14). If the order of the group is prime (or even a product of two primes of comparable sizes), then there is no known algorithm to solve the problem of four lists with memory requirement lower than the generic birthday-based attack. Another natural example to consider is the case of multiplicative subgroups in finite fields. Of course, if we consider the full multiplicative group and look for naturally occurring random solutions, the order of the group cannot be too large and we are thus in the case where discrete logarithms can be computed efficiently. Indeed, with a group of order N, the subexponential complexity of discrete logarithm computation, even repeated N^{1/4} times to deal with all the elements in the four lists, is negligible compared to the birthday running time N^{1/2} of the algorithm. On the other hand, with a moderately large subgroup in a large finite field, computing N^{1/4} discrete logarithms with Pollard's Rho is not an option. Similarly, if we consider small sets of the full multiplicative group and know that a solution has somehow been prearranged, we do not know how to apply the four-list approach. We illustrate this in Section 8.4.2, where we study possible cryptanalysis of plain RSA and plain ElGamal encryptions.


8.2.3 Working with more lists
Clearly, Equation (8.8) can easily be generalized to a different number of lists. With two lists, the algorithmic issue is simple since the ordinary birthday-based approach is almost optimal. With three lists of roughly equal sizes N^{1/3} in a group of N elements, the best known algorithm is a simple adaptation of the basic birthday approach. Sort the first list L1 and then, for all pairs (g2, g3) of elements in L2 × L3, search for (g2 g3)^{-1} in L1. The required memory is the optimal value N^{1/3}, but the running time is O(N^{2/3} log N).
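For the XOR case, this three-list procedure fits in a few lines of C. The sketch below is purely illustrative (64-bit elements and names of our choosing); note that in a XOR group each element is its own inverse, so (g2 g3)^{-1} is simply g2 ⊕ g3:

#include <stdint.h>
#include <stdlib.h>

static int cmp64(const void *a, const void *b)
{
  uint64_t x = *(const uint64_t *)a, y = *(const uint64_t *)b;
  return (x > y) - (x < y);
}

/* Search for x1 ^ x2 ^ x3 = 0 with xi in Li; memory N^{1/3}, time ~ n2*n3*log(n1). */
int three_list_xor(uint64_t *L1, size_t n1, const uint64_t *L2, size_t n2,
                   const uint64_t *L3, size_t n3, uint64_t sol[3])
{
  size_t i, j;
  qsort(L1, n1, sizeof(uint64_t), cmp64);    /* only L1 needs to be stored sorted */
  for (i = 0; i < n2; i++)
    for (j = 0; j < n3; j++) {
      uint64_t target = L2[i] ^ L3[j];       /* inverse of the composition under xor */
      if (bsearch(&target, L1, n1, sizeof(uint64_t), cmp64)) {
        sol[0] = target; sol[1] = L2[i]; sol[2] = L3[j];
        return 1;
      }
    }
  return 0;
}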
With more than four lists, the best known algorithm is to transform the problem into a four-list version and proceed. Assuming t lists of roughly equal sizes, we first group the lists into packets of approximately t/4 lists. When t is a multiple of four, we group the first t/4 lists and put all the corresponding sums (or group compositions) into the first list L1 of the target four-list problem. We similarly group the next three packets of t/4 lists into L2, L3 and L4. Clearly,





any solution of the transformed four-list problem is a solution of the original problem with t lists. When t is not a multiple of four, we group the lists so as to balance the sizes of the four resulting lists as far as possible. However, in that case, the resulting time-memory tradeoff is not as good.




8.3 Extensions of the technique
8.3.1 Multiple targets
A natural generalization is to consider the case where, instead of a single acceptable target, there are many. Clearly, if we are given a list L of possible targets, a very easy solution is to move the target taken from the list L from the right-hand side to the left-hand side and obtain a new instance of the original problem with a single target. This instance has one more list, but we already know how to address this case.
However, if the set of targets has some extra structure and can be described more efficiently than by a simple list, it is possible to obtain better approaches. To illustrate this, we start with a simple example. Assume that we are considering a four-list problem with lists L1, L2, L3 and L4 of elements in a group G1 × G2 and that the list of allowed targets is a direct product L = L′1 × L′2. In other words, a pair (g1, g2) is a valid target if and only if g1 ∈ L′1 and g2 ∈ L′2. Moreover, assume that G1 and G2 have similar sizes, of the order of 2^n, that each of the lists L1 to L4 has size 2^{n/4} and that each of the lists L′1 and L′2 has size 2^{n/2}. With these parameters, the expected number of solutions is 1.
With this specific instance, a first approach is to rewrite the problem as a problem of finding a fixed target within a sum of six lists. In this approach, we view each element g1 of L′1 as a representative for (g1, 0) in G1 × G2 and each element g2 of L′2 as a representative for (0, g2). If, in addition, we construct the two sums of lists L1 + L2 and L3 + L4, we can directly use the ordinary case of four sets, grouping L1 + L2 and L′1 on one side, L3 + L4 and L′2 on the other. We obtain a running time of 2^n and a memory requirement of 2^{n/2}. However, this approach does not take advantage of the extra structure within L, or equivalently of the fact that L′1 only contains elements of G1 and L′2 elements of G2.
In order to improve on this, we now consider a different approach. Here, we first focus on the partial solutions, with a correct value in G1 but not necessarily in G2. Clearly, L′2 is not needed to construct these partial solutions, since any element of L′2 has a zero contribution on the G1 component of sums. The expected number of partial solutions is 2^{n/2}; moreover, using the basic algorithm with one group containing L1, L′1 and the other group containing L2 + L3, L4, we can construct all these solutions in time 2^{3n/4}. In our precise





case, the memory requirement is 2^{n/2}, due to the need to store L′1. Once a partial solution is constructed, it is easy to evaluate the G2 component of the corresponding sum. Then, to check whether the partial solution is also correct on G2, it suffices to test whether the opposite of the G2 component belongs to L′2, using a dichotomy search. As a consequence, with this specific set of parameters, using the idea of partial solutions reduces the running time, on a group G1 × G2 of size 2^{2n}, from 2^n down to 2^{3n/4}.


8.3.2 Wagner's extension
Up to now, we have concerned ourselves with finding all solutions to Equation (8.8), usually in the case where a small number of solutions exist. David Wagner [Wag02] asks a slightly different question and looks for a single solution to this equation, especially in cases where a large number of solutions is expected. The surprising conclusion, in this case, is that the algorithms can be adapted to yield a completely different and much more efficient time/memory tradeoff. A precursor of this idea was already present in [CP91].
Let us start by considering the problem of XORing four bitstrings in this new context. We are given four lists L1, L2, L3 and L4, each containing 2^{βn} n-bit numbers for some parameter β ≥ 1/4. On average, we expect 2^{(4β−1)n} solutions. The key idea of Wagner is to restrict the search and only look for solutions (x1, x2, x3, x4) where the restriction of x1 ⊕ x2 (and of x3 ⊕ x4) to a fraction τn of the bits is zero. The expected number of pairs (x1, x2) such that x1 ⊕ x2 is zero on these τn bits is 2^{(2β−τ)n}. From this, we deduce that the expected number of solutions to Equation (8.8) that satisfy the additional condition is 2^{(4β−τ−1)n}. A natural choice of parameters is τ = β = 1/3. In that case, we expect to find one solution (among many) to the four-list problem, using time and memory O(n · 2^{n/3}). From a running time point-of-view, this is more efficient than the usual O(n · 2^{n/2}) of birthday-based attacks. On the other hand, it only works when much larger lists of numbers are available.
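To make the restriction concrete, the following hedged C sketch implements the join applied to (L1, L2), and identically to (L3, L4): both lists are sorted by their τn low bits, then merged, emitting every XOR that vanishes on those bits. The fixed-capacity output buffer and all names are illustrative choices, not taken from [Wag02]:

#include <stdint.h>
#include <stdlib.h>

static uint64_t low_mask;                  /* the tau*n low bits forced to zero */

static int cmp_low(const void *a, const void *b)
{
  uint64_t x = *(const uint64_t *)a & low_mask;
  uint64_t y = *(const uint64_t *)b & low_mask;
  return (x > y) - (x < y);
}

/* Emit all x ^ y, with x in A and y in B, that are zero on the masked bits. */
size_t join_low_bits(uint64_t *A, size_t na, uint64_t *B, size_t nb,
                     uint64_t mask, uint64_t *out, size_t out_cap)
{
  size_t i = 0, j = 0, j2, count = 0;
  low_mask = mask;
  qsort(A, na, sizeof(uint64_t), cmp_low);
  qsort(B, nb, sizeof(uint64_t), cmp_low);
  while (i < na && j < nb) {
    uint64_t la = A[i] & mask, lb = B[j] & mask;
    if (la < lb) i++;
    else if (la > lb) j++;
    else {
      /* the whole block of B entries sharing these low bits pairs with A[i] */
      for (j2 = j; j2 < nb && (B[j2] & mask) == la; j2++)
        if (count < out_cap) out[count++] = A[i] ^ B[j2];
      i++;                                 /* j stays: the next A entry may match too */
    }
  }
  return count;
}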
Thus, we already see with four lists that asking for a single solution among a large set leads to a better time/memory tradeoff. However, the real power of this technique appears when we consider a larger number of lists. To simplify the exposition of the method, let us assume that we are given 2^t lists of 2^{αn} numbers on n bits. The simplification comes from the fact that when the number of lists is a power of 2, we can more easily organize the computation, since we can group lists by packets of 2, 4, ... Within these lists, we want to find elements x1, x2, ..., x_{2^t}, with each xi in the corresponding list Li, whose sum is zero (using a XOR sum). We proceed by using a succession of steps. In each step, we regroup lists two by two, thus halving the number of lists. Moreover, at each step, we cancel a fraction of the high bits. To balance the computation, it is best to preserve the size of the lists considered at each step, except in the final one, where we are happy to obtain a single solution. Since the original lists contain 2^{αn} elements, it is possible to cancel αn bits at each step while preserving the sizes of the lists. Indeed, combining two lists





together yields 2^{2αn} possibilities, and forcing αn bits to be zeros reduces this number by a factor of 2^{αn}. At the final step, we use the birthday paradox and expect a collision on 2αn bits to occur for lists of 2^{αn} elements. With 2^t lists, we perform t steps and we can work with α ≈ 1/(t + 1). Both from a time and memory point-of-view, the complexity of this approach is 2^{t+αn}. If we are free to optimize this complexity by choosing t, then taking t ≈ √n yields a subexponential algorithm with complexity 2^{2√n}.
With this extension, using bitstrings is a very comfortable option: indeed, once a fraction of the bits has been set to zero in all groups at some step, this fraction clearly remains zero throughout the rest of the algorithm. As already noticed in [Wag02], this can be generalized to many other groups. However, when generalizing Wagner's extension to different groups, we need to take care to avoid the resurgence of non-zero values in unwanted places. Over a product of small groups, things are also well-behaved: it suffices to cancel some of the small groups at each step to be sure that they do not reappear. For integers modulo 2^n, the best option is to cancel low order bits at each step, since carries only propagate from right to left. Over the integers, this is also the best approach. Note that, in that case, we encounter a new problem: the sum of 2^t numbers on n bits is a number on n + t bits. Luckily, t is small compared to n and this problem is minor. As a consequence, we see that Wagner's extension has essentially the same spectrum of applicability as the original reduced memory birthday paradox attacks.


8.3.3 Related open problems
When searching for collisions between two lists, the available algorithms are essentially optimal. Indeed, it is clear that the running time is bounded from below by the time required to read the input lists. With lists formed of 2^{n/2} elements, we get a lower bound of 2^{n/2} operations, to compare with the running time O(n 2^{n/2}) of a fast sort algorithm. On the contrary, with the algorithms presented in this chapter, we have no such guarantee. As a consequence, it is worth pondering whether these algorithms can be improved. As a special case, it is interesting to consider three lists L1, L2 and L3 of respective sizes N1, N2 and N3. How much does it cost to find a solution of the equation:

    x1 ⊕ x2 ⊕ x3 = 0, with x1 ∈ L1, x2 ∈ L2 and x3 ∈ L3?    (8.15)

Clearly, the running time should be expressed as a function of N1 N2 N3. Indeed, with random lists of n-bit numbers, we expect a solution when N1 N2 N3 = 2^n, independently of the individual values of N1, N2 and N3. With some specific choices for the sizes of the lists, we can achieve a complexity O(n 2^{n/2}). For example, when N3 = 1 and N1 = N2 = 2^{n/2}, a plain collision search approach suffices. Similarly, when N1 N2 = 2^{n/2} and N3 = 2^{n/2}, the same approach works. A much more problematic case is N1 = N2 = N3 = 2^{n/3}. In this case, we can sort one list, say L3, and then for each pair (x1, x2) ∈ L1 × L2





search for x1 ⊕ x2 in L3. However, this solution yields a running time larger than 2^{2n/3}. To reduce the running time, we can follow the approach of Wagner and enlarge the lists L1, L2 and L3. For lists of size 2^{αn} with 1/3 ≤ α ≤ 1/2, this leads to a running time of 2^{(1−α)n}. However, the best we can do is 2^{n/2} with N1 = N2 = N3 = 2^{n/2}, which is worse than the basic birthday algorithm.
This specific example emphasizes two more general open problems with generalized birthday algorithms. The first open problem is to devise a new algorithm that searches for a unique solution with a better tradeoff between time and memory. Typically, the current best algorithms have time 2^{n/2} and memory 2^{n/4}, so we could afford to use some extra memory in order to go faster. The second open problem concerns Wagner's algorithm to search for one solution among many. In its present form, this algorithm works extremely well when the number of lists is a power of two. However, when this is not the case, its behavior is obtained by rounding the number of lists down to a power of two. In the worst case, when the number of lists is a power of two minus one, this is highly unsatisfactory.
When considering the XOR operation with a large number of lists, it was
noted in [BM97] that Gaussian elimination can be used to solve the problem.

8.3.3.1 An incremental improvement
Looking again at the specific example of three lists, it is possible to slightly improve the basic birthday algorithm when using a specific configuration of the sizes of the lists. Instead of working with |L1| = |L2| = 2^{n/2} and |L3| = 1, we now use |L1| = |L2| = 2^{n/2}/√r and |L3| = r, with r = n/2. With this choice, it is possible to rearrange the computations in order to scan the lists L1 and L2 only once, simultaneously testing each candidate against the list L3. In order to do that, we first need to perform some elementary linear algebra on the elements of the lists. Viewing each element of the three lists as a vector in F_2^n, it is clear that a solution of x1 ⊕ x2 ⊕ x3 = 0 also satisfies M x1 ⊕ M x2 ⊕ M x3 = 0, for any n × n Boolean matrix M. Moreover, if M is invertible, any solution of the second equation is a solution to the original problem. As a consequence, for any invertible matrix M, we may transform the three lists by applying M to all their elements, without changing the set of solutions.
Now, since L3 contains r = n/2 vectors, it spans a linear subspace of F_2^n of dimension at most n/2. We can easily choose a basis (b1, b2, ..., bn) of F_2^n such that each vector in L3 belongs to the subspace generated by (b1, b2, ..., b_{n/2}). We now choose for M the matrix that maps each vector of F_2^n to its coordinates in the basis (b1, b2, ..., bn). Clearly, each element x3 of L3 is such that M x3 has n/2 trailing zeros, in positions n/2 + 1 to n. Moreover, L3 usually spans a subspace of rank exactly n/2; in that case, b1, b2, ..., b_{n/2} can simply be chosen as the elements of L3, and any vector M x3 is then zero everywhere except in a single position between 1 and n/2. Once M is chosen, we can use it to transform L1 and L2. After this transformation, it suffices to search for





pairs of elements (x1, x2) such that M x1 and M x2 collide on the last n/2 bits in the new representation. As usual, we expect that a fraction 2^{−n/2} of all pairs satisfies this property. For each such candidate pair, it then suffices to check that the corresponding sum belongs to L3. Since there are 2^{n/2+1}/n such pairs, and since it suffices to check whether the sum contains a single one bit among its first n/2 positions, this can be done efficiently. All in all, we gain a factor √r = √(n/2) compared to the basic birthday algorithm.




8.4 Some direct applications
8.4.1 Noisy Chinese remainder reconstruction
The Chinese remainder theorem, as recalled in Chapter 2, allows us to reconstruct an integer x in a given interval [Bl, Bh] from a list of modular values x mod pi, as long as the values pi are mutually coprime and their product is at least Bh − Bl + 1.
The notion of "noisy" Chinese remainder reconstruction covers several possible extensions of this basic problem. One simple extension is presented as Exercise 1. In this section, we present a different extension, which arises from the problem of counting points on elliptic curves. This problem, together with a reduced-memory birthday approach to solve it, was first presented in [JL01].
The goal of the problem is to recover a unique integer x in a range [Bl, Bh] from modular values modulo coprime numbers. The only difference with the basic Chinese remainder theorem is that, instead of having a single candidate for each modular value, we have a set of candidates. Some of these sets may still contain a single element. Note that, by using a preliminary application of the ordinary Chinese remainder theorem, it is possible to regroup all the moduli with a single candidate into their product. Thus, without loss of generality, we assume that we have a list of coprime numbers p0, p1, ..., pk together with sets Si of allowed modular values modulo pi. The set S0 contains a single element; all the other sets contain at least 2. If for some reason none of the moduli corresponds to a unique modular value, we simply set p0 = 1 (and S0 = {0}).
To solve this problem in the context of point counting algorithms, the original method was to use Atkin's match and sort algorithm. With this method, we use the fact that the elliptic curve whose cardinality is desired can be used to test each candidate and check its validity. This algorithm is a birthday paradox based algorithm of the type discussed in Chapter 6. We now give a brief description of the algorithm, since this is useful to present the noisy Chinese remainder variation. The first step of the match and sort algorithm is, excluding p0, to sort the coprime moduli by increasing values of |Si|/pi.





After doing this, we consider the smallest possible value of ℓ such that the product of all coprime values from p0 up to p_ℓ is larger than the length of the interval, Bh − Bl + 1. Combining all the allowed modular values for each pi, we obtain N = Π_{i=1}^{ℓ} |Si| candidates. Note that N is almost minimal, but depending on the exact values of the moduli pi and of the cardinalities |Si|, it can in some cases be improved, especially when there is a gap between the length of the interval and the product of the moduli.
After choosing a set of moduli, it is possible to obtain the cardinality of the elliptic curve by brute force. It suffices, given an element Q in the additive group of the elliptic curve, to check, for each of the N candidates ci, whether ci·Q is the neutral element in the group. Atkin's match and sort refines this into a birthday-based attack, by splitting the product of moduli into two parts P and P′. To each subproduct, we associate a set of allowed values, respectively S and S′. The decomposition of the product is chosen to balance the sizes of these two sets as much as possible. At this point, it is useful to note that by shifting everything by a well-chosen constant, we can replace the interval [Bl, Bh] by another interval centered around 0, say [−B, B], with B ≤ PP′/2. Once this is done, we can make sure that each candidate value s in S is replaced by a new value s̃ such that: s̃ = s (mod P), s̃ = 0 (mod P′) and s̃ ∈ ]−PP′/2, PP′/2]. Similarly, each s′ in S′ is replaced by a new value s̃′ such that: s̃′ = s′ (mod P′), s̃′ = 0 (mod P) and s̃′ ∈ [−PP′/2, PP′/2[.
Making these replacements, we can rewrite each of the N possible candidates as s̃ + s̃′ + λPP′, for s̃ and s̃′ in the modified sets and λ ∈ {−1, 0, 1}. Finally, it suffices to search for collisions between the sets s̃·Q and −(s̃′ + λPP′)·Q on the elliptic curve.
The approach presented in [JL01] no longer makes use of the group of the elliptic curve. Instead, it relies on the fact that the modular information that is obtained at the beginning of point counting algorithms is redundant, and it tries to make use of this redundancy to recover the cardinality. Like Atkin's match and sort, it uses a centered interval [−B, B] and it starts by determining a product of moduli larger than the interval length that generates the smallest number of candidates. Instead of splitting this product in two, we now split it into four factors P1, P2, P3 and P4, chosen to balance the sizes of the lists of allowed modular values for each Pi. To each factor Pi, we associate a set Si that contains each of the candidate values modulo Pi. The values stored in Si are normalized as follows: for each si in Si, we make sure that si = 0 (mod Pj) for j ≠ i and that si ∈ [−P1P2P3P4/2, P1P2P3P4/2]. As a consequence, each of the N candidates modulo P1P2P3P4 can be written as s1 + s2 + s3 + s4 + λ·P1P2P3P4, with each si in the corresponding set Si and λ ∈ {−2, −1, 0, 1, 2}.
Let q1, ..., qℓ be the remaining moduli, i.e., the values among the initial p1, ..., pk which do not appear in the product P1P2P3P4. To each element si in one of the sets S1, S2 or S3, we can associate a vector V(si) with ℓ coordinates, where the j-th coordinate is obtained by considering si modulo qj. Similarly, assuming that we have guessed the value of λ, we can





construct V(s4 + λ·P1P2P3P4) for each s4 in S4. It now remains to solve a sum-of-four problem with multiple targets, as in Section 8.3.1. More precisely, we are looking for (s1, s2, s3, s4) such that the sum

    V(s1) + V(s2) + V(s3) + V(s4 + λ·P1P2P3P4)    (8.16)

has an acceptable value modulo each modulus qi. Thus, the set of allowed targets is structured as a direct product. We refer the reader to [JL01] for more details.


8.4.2 Plain RSA and plain ElGamal encryptions
In this section, we describe some birthday-based attacks from [BJN00] against some variations of RSA and ElGamal encryption which do not follow the state-of-the-art recommendations for secure encryption schemes. We call these variations plain RSA and plain ElGamal. It should be clear from the coming discussion that these simplified cryptosystems are insecure and should not be used. We start by defining the simplified systems that we are going to consider.

Plain RSA. Given an RSA public key (N, e), we use this key to directly encrypt random session keys for a block cipher. Let K be such a random key; the corresponding encrypted key simply is K^e (mod N).

Plain ElGamal. Given a large prime p, an element g of order q modulo p, with q much smaller than p (more precisely, with (p − 1)/q ≥ 2^m), and an ElGamal public key y = g^x (where x is the corresponding secret key), we encrypt a randomly chosen session key K of at most m bits by choosing a random number r and forming the ciphertext (g^r, K × y^r).
When encrypting a random key K with either plain RSA or plain ElGamal, we do not check for any specific properties of this key. In particular, it is conceivable that K, viewed as an integer, is a product of two integers K1 and K2 of roughly equal size. The birthday paradox based attacks we present here work in this specific case. As a consequence, they are probabilistic attacks which only succeed when a bad key is chosen. We give some data about the probability of this bad event occurring at the end of the section and show that, both asymptotically and in practice, this approach works for a non-negligible fraction of the keys.


8.4.3 Birthday attack on plain RSA
With plain RSA, if the key K can be written as K = K1 K2, the multiplicativity of RSA implies that:

    K^e = K1^e K2^e (mod N).    (8.17)





Given the encrypted key K^e, it is possible to obtain K1 and K2 from a list L of encrypted small keys (k^e, k), sorted by increasing order of the first component. Indeed, it suffices, for each small key K1 in L, to search for K^e/K1^e in the first component of L. If the search is successful, we clearly obtain an equality

    K^e/K1^e = K2^e (mod N),    (8.18)

for two small keys, from which we deduce that K = K1 K2.
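As an illustration, here is a hedged sketch of this attack using the GMP library; it assumes both factors lie below 2^{half_bits} for a small parameter half_bits, keeps the whole sorted list in memory, and omits deallocation for brevity (all names are ours, not from [BJN00]):

#include <gmp.h>
#include <stdlib.h>
#include <string.h>

typedef struct { mpz_t v; unsigned long k; } entry;

static int cmp_entry(const void *a, const void *b)
{
  return mpz_cmp(((const entry *)a)->v, ((const entry *)b)->v);
}

/* Given c = K^e mod N, search for K = K1*K2 with 1 <= K1, K2 < 2^half_bits. */
int plain_rsa_attack(mpz_t K_out, const mpz_t c, const mpz_t N,
                     const mpz_t e, unsigned int half_bits)
{
  unsigned long bound = 1UL << half_bits, k;
  entry *tab = malloc((bound - 1) * sizeof(entry));
  entry key, *hit;
  mpz_t target;

  mpz_init(target);
  for (k = 1; k < bound; k++) {                    /* list of (k^e mod N, k) */
    mpz_init_set_ui(tab[k - 1].v, k);
    mpz_powm(tab[k - 1].v, tab[k - 1].v, e, N);
    tab[k - 1].k = k;
  }
  qsort(tab, bound - 1, sizeof(entry), cmp_entry); /* sort by first component */
  for (k = 1; k < bound; k++) {                    /* search c/K1^e in the list */
    mpz_set_ui(target, k);
    mpz_powm(target, target, e, N);
    if (!mpz_invert(target, target, N)) continue;  /* gcd(k,N) > 1 would factor N */
    mpz_mul(target, target, c);
    mpz_mod(target, target, N);
    memcpy(&key.v, &target, sizeof(mpz_t));        /* search key aliases target */
    hit = bsearch(&key, tab, bound - 1, sizeof(entry), cmp_entry);
    if (hit != NULL) {
      mpz_set_ui(K_out, k);
      mpz_mul_ui(K_out, K_out, hit->k);            /* K = K1 * K2 */
      return 1;                                    /* cleanup omitted for brevity */
    }
  }
  return 0;
}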
Forgetting about logarithmic factors, we can recover an n-bit key in time 2^{n/2} instead of 2^n. This clearly shows that plain RSA is not a secure cryptosystem. However, this attack is costly in terms of memory and we can ask whether it is possible to lower these memory requirements. The natural option would be to look for K as a product of four smaller keys, K = K1 K2 K3 K4; of course, this covers a smaller fraction of the keys. In this case, we would like to use the algorithms from the present chapter. However, here the group order is unknown and discrete logarithm computations are not feasible, because N is too large to factor. As a consequence, we find ourselves in one of the bad cases presented in Section 8.2.2.1. A nice open problem is to adapt the algorithms to this specific case and recover K using less memory.


8.4.4 Birthday attack on plain ElGamal
With plain ElGamal, instead of directly using multiplicative properties, we first remark that both g and y belong to the (usually unique) subgroup of order q in F_p. Thanks to this fact, we are able to derandomize the encryption scheme and get a new value which is a function of the encrypted key only: since y has order q, raising the second ciphertext component to the power q removes the random factor y^r. This value is:

    (y^r K)^q = (y^q)^r K^q = K^q (mod p).    (8.19)

Once again, if K = K1 K2 then

    K1^q K2^q = K^q (mod p),    (8.20)



thus we can attack this scheme exactly as we did in the case of plain RSA.
The key difference here is that the order of the group where we want to solve this multiplicative equation is no longer unknown: indeed, it is equal to (p − 1)/q. Assuming that this number can be factored, we now know the decomposition of the group, as discussed in Section 8.2.2.1. From this discussion, it follows that for n-bit keys, if K can be written as K = K1 K2 K3 K4, a product of four numbers of about n/4 bits, we can use the quadrisection approach if there is a small subgroup of size approximately 2^{n/4}. As stated in Section 8.2.2.1, we do not even need to compute logarithms and put the problem into additive form in order to solve it.
However, computing logarithms is a good idea, since it allows us to solve the problem efficiently even when the available subgroup is too large, up to 2^{n/2} elements. Indeed, we have a list of 2^{n/4} small numbers. Thus, we need to compute 2^{n/4} discrete logarithms. In our context, the best algorithms available are generic algorithms such as Pollard's rho. With a group of 2^{n/2} elements, each computation of a discrete logarithm costs 2^{n/4}. As a consequence, the total cost is 2^{n/2} and remains comparable to the running time of the quadrisection approach. Once we have reduced the problem to its additive form, we can, as in Section 8.2.2, lift the problem to Z and solve it efficiently.







Exercises
1h. Let N be a product of many small primes (or prime powers). Let X be a number in the interval [0, B], where B is a known upper bound. Assume that we are given values for X mod p for each prime (power) p in N, with the following caveat: most modular values are correct, but some might be incorrect. To parametrize the problem, it is useful to denote by NI the product of the primes (or prime powers) for which the modular value is incorrect.

  • Show that if NI is given, we can obtain X as long as N/NI > B.
  • Assuming that each prime (or prime power) in N has the same order of magnitude P, we now assume that N contains n terms, i.e., N ≈ P^n, and that NI contains nI terms. Describe a brute force algorithm that searches for X by trying all possibilities for NI. Under which condition do we obtain a unique solution for X?
  • An alternative approach is to first construct a value X̃ modulo N that matches all the given modular values. Show that X̃ = X (mod N/NI). Deduce that X̃/N can be approximated by a fraction δ/NI. Can this fraction be obtained without exhaustive search?
  • Combining exhaustive search together with the above approach, when can X be recovered and at what cost?
2. Implement the birthday attack using quadrisection against RSA (or El-
Gamal) described in Section 8.4.2.

3h. The quadrisection algorithms from this chapter have all been presented for commutative groups. Discuss their applicability to the group of 2 × 2 matrices over Fp.
4. Let L1, ..., Ln be n lists, each containing 2 bitstrings of n bits. Write an efficient method for finding n elements, one from each list, whose sum is zero.




Chapter 9
Fourier and Hadamard-Walsh
transforms




9.1 Introductory example: Studying S-boxes
Substitution boxes or S-boxes are very important ingredients in the con-
struction of secret key cryptographic algorithms. Their main role is to break
the linearity of these algorithms. Due to the binary nature of computers, most
S-boxes act on bits. They take a number of bits as input and produce a number
of bits as output. Having S-boxes with good properties is essential to the secu-
rity of secret-key encryption. For example, it is well known that the security of
the DES algorithm drops dramatically when its S-boxes are replaced without
care. The cryptanalysis of FEAL in [BS91b, GC91, MY92, TCG92], which
is an encryption algorithm similar to DES, illustrates this point. As a conse-
quence, the properties of S-boxes need to be studied with care when choosing
them, or when trying to cryptanalyze an encryption algorithm. In particular,
the linear and differential properties of S-boxes, which are respectively used by linear [Mat93, Mat94a, Mat94b, TSM94] and differential [BS91a, BS92, BS93]
cryptanalysis, should always be determined.


9.1.1 Definitions, notations and basic algorithms
Let S be an S-box with an n-bit input and a t-bit output. The differential characteristics of S are denoted by D_Δ^δ(S) and stand for the number of input pairs (x, y) with input difference Δ and output difference δ, i.e., such that x ⊕ y = Δ and S(x) ⊕ S(y) = δ. The linear characteristics of S are denoted by L_M^m(S) and stand for the difference between the number of n-bit values x such that the two bitwise scalar products (M|x) and (m|S(x)) are equal modulo 2 and the number of values x for which these scalar products differ. The n-bit value M is called the input mask and the t-bit value m the output mask.
Throughout this study, to simplify notations, we identify numbers with the binary bitstrings that represent them. To make the notation explicit, let a and b be two numbers. First, write both a and b in basis 2, i.e., assuming ℓ-bit values, as sums Σ_{i=0}^{ℓ−1} a_i 2^i and Σ_{i=0}^{ℓ−1} b_i 2^i, with all values a_i and b_i in {0, 1}. In this context, we identify a with the bitstring a_{ℓ−1} ··· a_1 a_0; we



also define the bitwise scalar product of a and b as:

    (a|b) = Σ_{i=0}^{ℓ−1} a_i b_i mod 2.
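In C, this scalar product is simply the parity of the AND of the two words. The following small helper is a sketch in this spirit (the __builtin_parityll intrinsic is a GCC/Clang assumption; the fallback fold is portable); later sketches in this chapter reuse it:

#include <stdint.h>

/* Bitwise scalar product (a|b): parity of the bitwise AND of a and b. */
static inline int dot(uint64_t a, uint64_t b)
{
#if defined(__GNUC__)
  return __builtin_parityll(a & b);
#else
  uint64_t x = a & b;               /* portable parity fold */
  x ^= x >> 32; x ^= x >> 16; x ^= x >> 8;
  x ^= x >> 4;  x ^= x >> 2;  x ^= x >> 1;
  return (int)(x & 1);
#endif
}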




From the definitions, we can directly derive two basic algorithms: one for computing differential characteristics (Algorithm 9.1) and the other for linear characteristics (Algorithm 9.2). For linear characteristics, the algorithm simply enumerates all input masks, all output masks and all input values, computes the scalar products, does some counting and derives each characteristic. The runtime is O(2^{2n+t}) operations on n-bit counters and, in terms of memory, it suffices to store S and a couple of additional variables, thus requiring t · 2^n + O(1) bits of memory (assuming that the characteristics are produced but not stored). For differential characteristics, we proceed slightly differently, by first enumerating all values of the input difference Δ. For each possible difference, we initialize to zero a table of 2^t counters (one for each possible output difference). Then, for each pair (x, x ⊕ Δ) with the prescribed input difference, its output difference S(x) ⊕ S(x ⊕ Δ) is computed and the corresponding counter is incremented. The runtime is O(2^{2n}) and the memory is t · 2^n + n · 2^t + O(1) bits to store S and the counters (again assuming that the characteristics are not stored). Note that it is straightforward to gain a factor of two using the fact that, due to symmetry, each pair is counted twice, once as (x, x ⊕ Δ) and once as (x ⊕ Δ, x). We also remark that for Δ = 0 the result is known without any computation, since the 2^n pairs with input difference 0 all have output difference 0. We see that, using these basic algorithms, for any choice of input and output sizes for S, it is easier to compute differential characteristics than linear characteristics.


Algorithm 9.1 Algorithm for computing differential characteristics
Require: Input Table S containing 2^n elements on t bits
Create table Count of 2^t integers
for Δ from 0 to 2^n − 1 do
  Set table Count to zero
  for x from 0 to 2^n − 1 do
    Increment Count[S[x] ⊕ S[x ⊕ Δ]]
  end for
  Print table Count: Count[δ]/2^n is the probability of transition from input difference Δ to output difference δ
end for
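Algorithm 9.1 translates almost line for line into C, in the style of Program 9.1 shown later in this chapter. The following sketch assumes t ≤ 8, so that the S-box fits into a byte table, and prints only the nonzero counters:

#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Differential characteristics of an n-bit to t-bit S-box S (t <= 8). */
void diff_table(const uint8_t *S, int n, int t)
{
  uint32_t count[256];
  uint32_t D, x, d;
  for (D = 0; D < (1u << n); D++) {
    memset(count, 0, sizeof(count));
    for (x = 0; x < (1u << n); x++)
      count[S[x] ^ S[x ^ D]]++;   /* output difference of the pair (x, x xor D) */
    for (d = 0; d < (1u << t); d++)
      if (count[d] != 0)
        printf("%lu pairs from input difference %lx to output difference %lx\n",
               (unsigned long)count[d], (unsigned long)D, (unsigned long)d);
  }
}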





Algorithm 9.2 Algorithm for computing linear characteristics
Require: Input Table S containing 2^n elements on t bits
for M from 0 to 2^n − 1 do
  for m from 0 to 2^t − 1 do
    Let Count ← 0
    for x from 0 to 2^n − 1 do
      if (x|M) = (S[x]|m) then
        Increment Count
      end if
    end for
    Print Count, where Count/2^n is the probability of the event (x|M) ⊕ (S[x]|m) = 0
  end for
end for


9.1.2 Fast linear characteristics using the Walsh transform
The Walsh (or Hadamard-Walsh) transform is a kind of discrete Fourier transform which has been used for a long time in coding theory. It was applied to cryptography by Xiao and Massey in [XM88]. Among its possible applications, we find the fast computation of linear characteristics. The easiest case happens for S-boxes with a single bit of output, i.e., when t = 1. In that case, it suffices to consider the case of an output mask m equal to 1. Indeed, when m = 0 the result is independent of the S-box and quite uninteresting. From the definition, we have:

    L^1_M(S) = Σ_{(x|M)=S(x)} 1 − Σ_{(x|M)≠S(x)} 1    (9.1)
             = Σ_x (−1)^{(x|M)} · (−1)^{S(x)} = Σ_x (−1)^{(x|M)} · T(x),

where T(x) is defined as (−1)^{S(x)}. In this form, it is interesting to split T into two halves T0 and T1 with inputs on n − 1 bits. More precisely, for all 0 ≤ x < 2^{n−1}, we let:

    T0(x) = T(x)  and  T1(x) = T(2^{n−1} + x).    (9.2)

Similarly, we split S into two halves S0 and S1. Using these notations, we see that for input masks M < 2^{n−1}, we have:

    L^1_M(S) = Σ_{x<2^{n−1}} (−1)^{(x|M)} · T0(x) + Σ_{x<2^{n−1}} (−1)^{(x|M)} · T1(x)    (9.3)
             = L^1_M(S0) + L^1_M(S1).





Likewise, for input masks of the form 2^{n−1} + M with M < 2^{n−1}, we have:

    L^1_{2^{n−1}+M}(S) = Σ_{x<2^{n−1}} (−1)^{(x|M)} · T0(x) − Σ_{x<2^{n−1}} (−1)^{(x|M)} · T1(x)    (9.4)
                       = L^1_M(S0) − L^1_M(S1).

Clearly, T0 and T1 can also be divided into halves, and so on, until we reach tables of size one where nothing remains to be done. This yields the following algorithm, which receives as input the table T, modifies it in place and outputs a new table, the Walsh transform of T, denoted W(T). At position M in this table, we find the value W_M(T) = L^1_M(S). This algorithm can be written without using recursion, as described in pseudo-code as Algorithm 9.3.


Algorithm 9.3 Walsh transform algorithm
Require: Input Table T containing 2^n elements in {−1, 1}
Comment: Variable Sz is the small table size
Comment: Variable Pos is the small table position
for i from 0 to n − 1 do
  Let Sz ← 2^i, Pos ← 0
  while (Pos < 2^n) do
    for j from 0 to Sz − 1 do
      Let Sum ← T[Pos + j] + T[Pos + Sz + j]
      Let Diff ← T[Pos + j] − T[Pos + Sz + j]
      T[Pos + j] ← Sum
      T[Pos + Sz + j] ← Diff
    end for
    Let Pos ← Pos + 2 · Sz
  end while
end for
Output overwritten content of T containing W(T)



We see that the time complexity of the Walsh transform is O(n · 2^n) and that the required memory is O(2^n) numbers, or O(n · 2^n) bits, since the numbers contained in the Walsh transform are integers in the range between −2^n and 2^n. Another important fact about the Walsh transform is that it can easily be inverted by a similar looking algorithm, the inverse Walsh transform Ŵ, described in pseudo-code in Algorithm 9.4.
Note that the only difference between the Walsh transform and its inverse is the fact that in the inverse transform the variables Sum and Diff are divided by 2 before being put back into the table. It is easy to check that each elementary step of size Sz in the inverse Walsh transform is the inverse of the corresponding step in the regular transform. Moreover, the order in which the various elementary steps are performed is irrelevant because each of them





Algorithm 9.4 Inverse Walsh transform algorithm
Require: Input Table T containing 2^n elements
Comment: Variable Sz is the small table size
Comment: Variable Pos is the small table position
for i from 0 to n − 1 do
  Let Sz ← 2^i, Pos ← 0
  while (Pos < 2^n) do
    for j from 0 to Sz − 1 do
      Let Sum ← T[Pos + j] + T[Pos + Sz + j]
      Let Diff ← T[Pos + j] − T[Pos + Sz + j]
      T[Pos + j] ← Sum/2
      T[Pos + Sz + j] ← Diff/2
    end for
    Let Pos ← Pos + 2 · Sz
  end while
end for
Output overwritten content of T containing Ŵ(T)



commutes with all the others. As a consequence, we deduce that the two algorithms are indeed inverses of each other.
For general S-boxes with t bits of output, it is also possible to speed up the computation using the Walsh transform. It suffices, for each output mask m, to construct an auxiliary table Sm(x) = (m|S(x)) with a single bit of output, and then to proceed as above. This lowers the time complexity to O(n · 2^{n+t}), compared to O(2^{2n+t}) for the basic algorithm. The memory requirements are essentially multiplied by 1 + n/t, since in addition to S we now need to hold in memory the Walsh transform of one Sm. Using this Walsh transform algorithm, the computation of linear characteristics now compares favorably with the computation of differential characteristics, especially when the output size t is much smaller than the input size n. Indeed, we have to compare O(n · 2^{n+t}) with O(2^{2n}).
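Combining the scalar product helper of Section 9.1.1 with the Walsh routine of Program 9.1 below, this fast computation can be sketched as follows (an illustrative driver of our own; after the transform, T[M] holds L^m_M(S)):

#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

int dot(uint64_t a, uint64_t b);   /* parity of a & b, see Section 9.1.1 */
void Walsh(int *Tab, int size);    /* Program 9.1 */

/* Linear characteristics of an n-bit to t-bit S-box: one Walsh transform per m. */
void lin_table(const uint8_t *S, int n, int t)
{
  int size = 1 << n, x, m, M;
  int *T = malloc(size * sizeof(int));
  for (m = 0; m < (1 << t); m++) {
    for (x = 0; x < size; x++)
      T[x] = dot(m, S[x]) ? -1 : 1;   /* multiplicative encoding of Sm(x) = (m|S(x)) */
    Walsh(T, size);
    for (M = 0; M < size; M++)
      printf("input mask %x, output mask %x: %d\n", M, m, T[M]);
  }
  free(T);
}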

9.1.2.1 Basic implementation of the Walsh transform
In this section, we give a basic implementation of Algorithm 9.3 as Program 9.1. One interesting experiment is to measure the speed of this code for tables of increasing sizes. In this experiment, we take all possible sizes from 32 to 2^{28} and, in order to reflect the time per element, we repeat the Walsh transform 2^{28}/S times for a table of size S. We give the observed running times in Table 9.1. For very small tables, the running times are high due to function call overhead. For large tables, we see that there is a progressive slowdown between 2^{11} and 2^{20}, followed by a brutal slowdown after 2^{20}. This is the result of cache effects for large tables. Cache friendly versions of the Walsh transform are given on the book's website.








Program 9.1 C code for Walsh transform
#define TYPE int

/*In place Walsh transform*/
void Walsh(TYPE *Tab, TYPE size)
{
TYPE i,i0,i1; TYPE step;
TYPE sum,diff;
for (step=1;step<size;step<<=1) {
for (i1=0;i1<size;i1+=2*step) {
for (i0=0;i0<step;i0++) {
i=i1+i0;
sum=Tab[i]+Tab[i+step];diff=Tab[i]-Tab[i+step];
Tab[i]=sum;Tab[i+step]=diff;
}
}
}
}
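
For completeness, here is a matching sketch of the inverse transform in the same style; the only change is the halving, which is exact whenever the input table is a genuine Walsh transform (Sum and Diff are then always even):

/* In place inverse Walsh transform, mirroring Program 9.1. */
void InvWalsh(TYPE *Tab, TYPE size)
{
  TYPE i, i0, i1; TYPE step;
  TYPE sum, diff;
  for (step = 1; step < size; step <<= 1) {
    for (i1 = 0; i1 < size; i1 += 2 * step) {
      for (i0 = 0; i0 < step; i0++) {
        i = i1 + i0;
        sum = Tab[i] + Tab[i + step]; diff = Tab[i] - Tab[i + step];
        Tab[i] = sum / 2; Tab[i + step] = diff / 2;
      }
    }
  }
}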




Size S       Runtime for 2^{28}/S executions    Size S       Runtime for 2^{28}/S executions
32           16.0 s                             64           9.4 s
128          6.5 s                              256          4.9 s
512          4.3 s                              1024         4.1 s
2048         3.9 s                              4096         4.0 s
8192         4.1 s                              16384        4.5 s
32768        4.6 s                              65536        4.8 s
131072       5.0 s                              262144       5.1 s
524288       5.3 s                              1048576      6.0 s
2097152      11.9 s                             4194304      13.5 s
8388608      14.2 s                             16777216     14.9 s
33554432     15.4 s                             67108864     16.1 s
134217728    16.7 s                             268435456    17.3 s

Table 9.1: Timings on Intel Core 2 Duo at 2.4 GHz using gcc 4.3.2





9.1.3 Link between Walsh transforms and differential characteristics
In 1994, in two independent articles, Chabaud and Vaudenay [CV94] and Daemen, Govaerts and Vandewalle [DGV94] discussed the relation between the Walsh transform, or more precisely its square, and the value of differential characteristics. In this section, we illustrate this relation by explicitly describing a Walsh transform based algorithm for computing differential characteristics.
As in the case of linear characteristics, we start by studying S-boxes with a single bit of output. In order to represent these S-boxes, we are going to use polynomials in a multivariate polynomial ring. More precisely, we start from the commutative polynomial ring over Z in n variables X0, X1, ..., X_{n−1}. Then we form an ideal I, generated by the n polynomials X0^2 − 1, X1^2 − 1, ..., X_{n−1}^2 − 1. Finally, we form the quotient ring:

    K = Z[X0, X1, ..., X_{n−1}] / (I).    (9.5)

In the ring K, we encode an S-box S by the polynomial fS defined as:

    fS = Σ_{i=0}^{2^n−1} (−1)^{S(i)} M_i.    (9.6)

In this expression, M_i denotes the following monomial in K:

    M_i = Π_{j=0}^{n−1} X_j^{B_j(i)},    (9.7)

where B_j(i) denotes the j-th bit in the binary representation of i. Note that in this representation, we use the trick of representing S-boxes multiplicatively by 1 and −1 instead of 0 and 1, as we already did with the table T in Section 9.1.2.
In this polynomial representation, we first remark that the multiplication of monomials modulo I is such that M_i = M_j · M_k if and only if i = j ⊕ k. This implies that the coefficient of M_Δ in fS^2 is:

    Σ_{i⊕j=Δ} (−1)^{S(i)} (−1)^{S(j)}.    (9.8)

Since each term in the sum is equal to 1 when S(i) = S(j) and to −1 when S(i) ≠ S(j), the sum is clearly equal to D_Δ^0(S) − D_Δ^1(S). Moreover, it is clear from a simple counting argument that D_Δ^0(S) + D_Δ^1(S) = 2^n. Thus, we easily recover D_Δ^0(S) and D_Δ^1(S).
As a direct consequence, computing fS^2 yields the full table of differential characteristics for the single bit output S-box S. Thus, finding a fast multiplication algorithm in K is sufficient to quickly compute the differential characteristics of S.





What we need now is a simpler analog of the well-known fact that in the polynomial ring Z[x], fast multiplication can be performed using a fast Fourier transform, pointwise multiplication and an inverse Fourier transform. To describe fast multiplication in K, a simple way is to proceed by recursion on the number of variables. Assume that we already know how to multiply polynomials in n − 1 variables. To multiply f and g, we first write f as f0 + X_{n−1} f1 and g as g0 + X_{n−1} g1. With this notation, we have:

    f · g = (f0 g0 + f1 g1) + X_{n−1} (f0 g1 + f1 g0).    (9.9)

Denoting the product by h and writing h0 = f0 g0 + f1 g1 and h1 = f0 g1 + f1 g0, we now remark that:

    h0 + h1 = (f0 + f1) · (g0 + g1)    (9.10)
    h0 − h1 = (f0 − f1) · (g0 − g1).

Viewing f, g and f · g as tables with 2^n entries, where entry i is the coefficient of M_i, and going through the recursion, we find the following relation between the Walsh transforms of f, g and h:

    W(h) = W(f) ⊗ W(g),    (9.11)

where ⊗ denotes pointwise multiplication, i.e.,

    ∀i : W_i(h) = W_i(f) · W_i(g).    (9.12)

Then h itself is obtained by computing the inverse Walsh transform of the pointwise product.
In order to compute fS^2, it suffices to perform a single multiplication in K. More precisely, we compute the Walsh transform of fS, square it pointwise and apply the inverse Walsh transform. Thus, using this algorithm, computing fS^2 requires O(n · 2^n) operations. As a consequence, we can build the complete table of differential characteristics for an S-box S with a single bit of output using O(n · 2^n) operations, compared to O(2^{2n}) for the basic Algorithm 9.1.
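Chaining Program 9.1 with the inverse routine sketched in Section 9.1.2.1, the whole computation fits in a few lines of C. This sketch assumes n is small enough that the squared entries, bounded by 2^{2n}, fit into an int:

#include <stdint.h>

void Walsh(int *Tab, int size);      /* Program 9.1 */
void InvWalsh(int *Tab, int size);   /* inverse routine, Section 9.1.2.1 */

/* On return, D[Delta] = D_Delta^0(S) - D_Delta^1(S); since the two counters
   add up to 2^n, we recover D_Delta^0(S) = (D[Delta] + (1 << n)) / 2. */
void diff_by_walsh(const uint8_t *S, int n, int *D)
{
  int size = 1 << n, i;
  for (i = 0; i < size; i++)
    D[i] = S[i] ? -1 : 1;            /* coefficients of f_S */
  Walsh(D, size);
  for (i = 0; i < size; i++)
    D[i] *= D[i];                    /* pointwise square: W(f_S^2) = W(f_S) x W(f_S) */
  InvWalsh(D, size);
}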

9.1.3.1 Differential characteristics for general S-boxes
In order to deal with general S-boxes with more than a single bit of output, we need to generalize the above method in order to account for the additional output bits. We now describe two slightly different approaches that apply to this generalized case.

9.1.3.1.1 Theoretical approach  The first approach is to look at the problem from a theoretical point-of-view, replacing the function with several bits of output by another function with a single bit of output. The simplest way to proceed is probably to replace the S-box S by its characteristic function





as proposed in [CV94]. Indeed, the characteristic function is a function from n + t bits to a single bit and we can encode it and use the Walsh transform almost as in the case of a single bit output. More precisely, the characteristic function of S is the function χS defined as:

    χS(y‖x) = 1 if y = S(x), and 0 otherwise.    (9.13)

In this definition, y‖x is a decomposition of the input to χS into a lower part x on n bits and an upper part y on t bits.
Then, we use monomials in n + t variables in an enlarged copy of K and define:

    F_S = Σ_{i=0}^{2^{n+t}−1} χS(i) M(i) = Σ_{i=0}^{2^n−1} M(i + 2^n · S(i)).    (9.14)


Note that, this time, we do not use a multiplicative encoding with 1 and −1. Using a simple computation, we can check that the coefficient of M(Δ + 2^n · δ) in F_S^2 is exactly D_Δ^δ(S). Thus we can compute all the differential characteristics using a Walsh transform, followed by a pointwise multiplication and finally an inverse Walsh transform. The time complexity is O((n + t) · 2^{n+t}). However, the drawback of this approach is that we need to store in main memory a large table with 2^{n+t} entries.
main memory a large table with 2n+t entries.


9.1.3.1.2 Practical variant  To avoid this main drawback, we can use a slightly different approach and rely on disk storage instead of main memory to store the 2^{n+t} entries. The goal here is to reduce the amount of main memory needed to the maximum of 2^n and 2^t. The key ingredient is to alternately view a large table with 2^{n+t} elements either as 2^t columns of length 2^n or as 2^n rows of length 2^t. First, to create the table by columns, we enumerate the 2^t possible¹ linear output masks m and define Sm(x) = (m|S(x)) with a single bit of output, as in Section 9.1.2. For each value of m, we define fSm, compute its square and save it to disk as a column containing 2^n elements. Once this is finished, we consider each of the 2^n possible values² for Δ and read from disk all the numbers fSm^2(Δ) into a table T′_Δ with 2^t elements. We now study the relation between this table and the table T_Δ of all differential characteristics D_Δ^δ(S), for fixed input difference Δ. We claim that when the output difference of S is



¹ Clearly, for the zero mask, the computation is independent of S and the constant result can be specified within the algorithm. However, to simplify the exposition, we skip this minor improvement.
² Once again, we could ignore the case where Δ = 0.





δ, the output difference of Sm is clearly (m|δ). As a consequence:

    T′_Δ(m) = D_Δ^0(Sm) − D_Δ^1(Sm)    (9.15)
            = Σ_{(m|δ)=0} D_Δ^δ(S) − Σ_{(m|δ)=1} D_Δ^δ(S)
            = Σ_δ (−1)^{(m|δ)} D_Δ^δ(S) = Σ_δ (−1)^{(m|δ)} T_Δ(δ).

Thus, T′_Δ is the Walsh transform of the table T_Δ. This shows that by computing an additional inverse Walsh transform on each table T′_Δ we can recover the full table of differential characteristics. The complexity is again O((n + t) · 2^{n+t}), but we require less main memory, since we use disk space to efficiently transpose the table from columns with fixed m to rows with fixed Δ. In fact, this approach does not even need the distinction between input and output bits: it can be generalized to any Walsh transform computation which overflows the available main memory, by regrouping bits arbitrarily.
When we compare this to the complexity of the basic algorithm, we see that it is very useful when t is small compared to n, but much less so when t is almost as large as n. The case where t is larger than n is rarely considered; however, in this extreme case, the basic algorithm is faster than the Walsh based technique described here. Still, for S-boxes with many algebraic properties, it may happen that even for large t a fast Walsh transform based algorithm exists. This is for example the case of the AES S-box, due to the linear redundancy between its bits described in [FM03]. Another possible application is the computation of generalized differences with more than two inputs. In this case, the cost of the basic algorithm grows to 2^{3n} (with three inputs), while the Walsh based algorithm cost remains at (n + t) · 2^{n+t}.

9.1.4 Truncated differential characteristics
Truncated differential cryptanalysis is a variation of differential cryptanalysis introduced in [Knu94]. The basic idea is to study partially known differences, where the difference is fixed on some bits and left unknown on other bits. For this variation, it is possible to devise a variation of the basic algorithm and build truncated differential characteristics by first enumerating an input and an output mask, which describe the input and output bits where the difference is known. Then, for each pair of masks, one enumerates all possible input pairs and increments the counter corresponding to the truncated input and output difference of the pair. This is shown in pseudo-code as Algorithm 9.5. The time complexity of this algorithm is O(max(2^{3n+t}, 2^{2n+2t})); it requires a table of 2^{n+t} elements in terms of memory. With this basic approach, computing all possible truncated differential characteristics costs a lot more than computing ordinary differentials. Before giving an improved algorithm, we first need to count the total number of possible truncated differentials, in order to obtain a lower bound for the complexity. The easiest way





is to remark that, in general, a truncated differential is defined by specifying for each input and output bit whether it should be zero, one or unknown. Thus, there are 3^{n+t} different values to compute.


Algorithm 9.5 Algorithm for truncated differential characteristics
Require: Input Table S containing 2^n elements on t bits
Create bidimensional array Count of 2^n × 2^t integers
for InMask from 0 to 2^n − 1 do
  for OutMask from 0 to 2^t − 1 do
    Set table Count to zero
    for x from 0 to 2^n − 1 do
      for y from 0 to 2^n − 1 do
        Increment Count[(x ⊕ y) & InMask, (S[x] ⊕ S[y]) & OutMask]
      end for
    end for
    Print table Count: Count[Δ, δ]/2^n is the probability of transition from partial input difference Δ to partial output difference δ
  end for
end for


Looking beyond the basic approach, another idea is to compute the truncated differentials from the table of ordinary differentials. More precisely, we can see that any truncated differential is the sum of all the ordinary differentials that are compatible with it. Here, the word "compatible" is understood in the natural sense and means that when the truncated differential has a 0 or 1 bit in either the input or output mask, the ordinary differentials we sum have the same bit value. When the truncated differential has an undetermined bit, we sum over all possibilities for this bit. As a consequence, to compute a truncated differential with k bits left undetermined, we need to sum 2^k ordinary differentials. For each choice of k positions for the undetermined bits, there are 2^{n+t−k} possible truncated differentials, obtained by choosing the values of the bits in the other positions. As a consequence, the total cost to compute all the truncated differentials with a given set of undetermined bits is 2^{n+t}. Since there are 2^{n+t} possible sets of undetermined bits, the total complexity of this second approach is 2^{2n+2t} = 4^{n+t}.

In order to compute truncated differentials faster, let us first consider the simplest case, where we limit the possibility of truncating to a single bit. To make things even simpler, let us assume that the only bit which can be truncated is the first input bit of S. In that case, given a fixed output difference δ and a fixed difference Δ′ on the remaining n − 1 input bits, we need to build the two differential characteristics D_{0Δ′}^δ(S) and D_{1Δ′}^δ(S) to cover the two possible cases where we do not truncate. We also need to compute a differential where the bit is really truncated:

    D_{∗Δ′}^δ(S) = D_{0Δ′}^δ(S) + D_{1Δ′}^δ(S).    (9.16)

Here, the notation ∗ indicates that a bit of difference is left unspecified.
There are several ways to use Equation (9.16) in order to compute truncated differentials. One possibility, remembering that differential characteristics are computed by a Walsh transform, followed by squaring and an inverse Walsh transform, is to embed the computation of the truncated characteristics within a modified inverse Walsh transform algorithm for the final step. Starting with a single unspecified bit in high-order position, we can deal with it in the last iteration of the inverse Walsh transform. Thus, we now concentrate on this last iteration and find out how it can be modified to compute the additional truncated difference. The required modification is very simple, since it suffices, in addition to Sum/2 and Diff/2, to store a copy of their sum, i.e., a copy of the original content of T[Pos + j], somewhere. Doing a similar reasoning on other bits easily yields an extended variation of the inverse Walsh transform that works on tables of size a power of 3. It is then possible to plug this modified inverse Walsh transform into the algorithms of Section 9.1.3.1. Note that, for the practical variant presented in Section 9.1.3.1.2, which uses two inverse Walsh transform steps, both need to be modified: one to allow truncated input differences, the other to permit truncated outputs. One difficulty is that, right before calling the modified inverse transform, we need to convert tables of size 2^n (resp. 2^t) into tables of size 3^n (resp. 3^t). This is done by sending the entry in position i with binary decomposition i = Σ_j b_j 2^j to position Σ_j b_j 3^j. Thus, the binary decomposition of i is interpreted in basis 3. All unspecified positions are filled with zeros. A convenient convention is to organize the modified inverse Walsh transform in such a way that for the entry in position i, the corresponding truncated difference is obtained by looking at the base 3 decomposition of i: a digit set to 0 means a 0 difference on the corresponding bit, a 1 means 1 and a 2 means ∗ (unknown). From a complexity point-of-view, we first study the runtime of the modified Walsh transform on a table of size 3^ℓ. At first, it seems that the complexity is O(ℓ · 3^ℓ). However, taking into account that a large part of the table is initially zero and that it is useless to add/subtract zeros, the complexity can be lowered to O(3^ℓ). To implement this improvement, it is easier to incorporate the binary to base 3 renumbering within the modified inverse Walsh transform itself. With this improvement, the main contribution to the runtime of the complete truncated differential algorithm is the second layer of modified Walsh transforms. Indeed, the first layer performs 2^t transforms on tables of size 3^n, while the second layer does 3^n transforms on tables of size 3^t. Thus, we obtain a total runtime of O(3^{n+t}), which is optimal, since we need to compute 3^{n+t} values.
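The renumbering itself is a tiny helper; the following fragment of ours is shown standalone, even though, as just explained, it is better folded into the modified transform:

/* Send i = sum_j b_j 2^j to sum_j b_j 3^j, reading the bits of i in order. */
unsigned long bin_to_base3(unsigned long i)
{
  unsigned long r = 0, p = 1;
  for (; i != 0; i >>= 1, p *= 3)
    r += (i & 1) * p;
  return r;
}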
For completeness, we give on the book's website two different C programs computing truncated differentials with this algorithm: one implementing the modified inverse Walsh transform with complexity O(ℓ · 3^ℓ), the other, slightly more complicated, with complexity O(3^ℓ). For the sake of simplicity, in these programs everything is stored in main memory. However, for larger tables, it would be worth doing the transposition from columns to rows on disk, as explained in Section 9.1.3.1.2.




9.2 Algebraic normal forms of Boolean functions
For a table S with n input bits and a single bit of output, or equivalently
for the associated Boolean function f on n bits, another very interesting
representation is to write the Boolean function f as a polynomial in n variables
over the polynomial ring F_2[x_0, x_1, ..., x_{n-1}]. This representation is
called the algebraic normal form of f and has many applications in cryptography
and coding theory. In particular, for a cryptanalyst, it is often interesting, in
the context of algebraic cryptanalysis, to look at the degree of the algebraic
normal form of f.
More precisely, the algebraic normal form of f is obtained by writing:

    f(x_0, ..., x_{n-1}) = Σ_{(a_0,...,a_{n-1}) ∈ F_2^n} g(a_0, ..., a_{n-1}) · Π_i x_i^{a_i}.    (9.17)


The function g giving the coefficients of the polynomial in Equation (9.17) is
called the Moebius transform of the Boolean function f. Since g is also a
Boolean function, Moebius transforms can be computed by using operations
on bits only. Compared to Walsh transforms, this gives a very significant
advantage and allows us to represent Boolean functions using a single bit of
memory per entry, instead of a full integer. It remains to see whether a fast
algorithm can work with this representation.
With a Boolean function f on a single bit entry x_0, it is easy to see that:

    g(0) = f(0)  and  g(1) = f(0) ⊕ f(1).    (9.18)
Moreover, since the transformation that sends (x, y) to (x, x ⊕ y) is its own
inverse, we see that for Boolean functions on a single variable, the Moebius
transform is an involution on the set of Boolean functions. To generalize to
more than one variable, let us consider a Boolean function f on n variables
and write:

    f(x_0, ..., x_{n-1}) = f^(0)(x_0, ..., x_{n-2}) ⊕ f^(1)(x_0, ..., x_{n-2}) · x_{n-1},    (9.19)
where f^(0) and f^(1) are two Boolean functions on n - 1 variables. If g^(0) and
g^(1) are the Moebius transforms of f^(0) and f^(1) respectively, they are related
to the Moebius transform g of f by the equations:

    g(x_0, ..., x_{n-2}, 0) = g^(0)(x_0, ..., x_{n-2})  and    (9.20)
    g(x_0, ..., x_{n-2}, 1) = g^(1)(x_0, ..., x_{n-2}).





Moreover, f^(0) and f^(1) are related to f by the equations:

    f^(0)(x_0, ..., x_{n-2}) = f(x_0, ..., x_{n-2}, 0)  and    (9.21)
    f^(1)(x_0, ..., x_{n-2}) = f(x_0, ..., x_{n-2}, 0) ⊕ f(x_0, ..., x_{n-2}, 1).

From these equations, we can devise a Moebius transform algorithm which is
very similar to the Walsh transform Algorithm 9.3. This yields Algorithm 9.6,
which is its own inverse. It can be implemented quite easily, as Program 9.2 shows.


Algorithm 9.6 Moebius transform algorithm
Require: Input truth table S of Boolean function f, with 2^n entries
Comment: Variable Sz is the small table size
Comment: Variable Pos is the small table position
for i from 0 to n - 1 do
  Let Sz ← 2^i, Pos ← 0
  while (Pos < 2^n) do
    for j from 0 to Sz - 1 do
      S[Pos + Sz + j] ← S[Pos + j] ⊕ S[Pos + Sz + j]
    end for
    Let Pos ← Pos + 2 · Sz
  end while
end for
Output overwritten content of S containing Moebius transform




9.3 Goldreich-Levin theorem
The Goldreich-Levin theorem [LG89] is a fundamental theorem in the theory of
cryptography. In particular, it is essential to the proof by Håstad, Impagliazzo,
Levin and Luby [HILL99] that a secure pseudo-random generator can be
constructed from any one-way function.


THEOREM 9.1
Let us denote by x a fixed unknown n-bit value and denote by f a fixed n-bit
to t-bit one-way function. Suppose there exists an algorithm A that, given the
value of f(x), predicts the value of a scalar product (R|x) with probability
1/2 + ε over the choice of R among n-bit strings, using at most T operations.
Then there exists an algorithm B, which given f(x) produces in time at most T'
a list of at most 4n²/ε² values that contains x with probability at least 1/2.
The running time T' is given by:

    T' = (2n²/ε²) · (T + log(2n/ε²) + 2) + (2n/ε²) · T_f.    (9.22)








Program 9.2 C code for Moebius transform

#define TYPE unsigned int

/* In place Moebius transform; size is the number of truth table
   entries, i.e., 2^n bits packed into words of type TYPE */
void Moebius(TYPE *Tab, TYPE size)
{
  int Wsize;
  TYPE i, i0, i1;
  TYPE step;

  Wsize = size / (8 * sizeof(TYPE));

  /* Moebius transform for high order bits, using word ops */
  for (step = 1; step < Wsize; step <<= 1) {
    for (i1 = 0; i1 < Wsize; i1 += 2 * step) {
      for (i0 = 0; i0 < step; i0++) {
        i = i1 + i0;
        Tab[i + step] ^= Tab[i];
      }
    }
  }
  /* Moebius transform for low order bits, within words */
  /* Assumes 8*sizeof(TYPE) == 32 */
  for (i = 0; i < Wsize; i++) {
    TYPE tmp;
    tmp = Tab[i];
    tmp ^= (tmp << 16);
    tmp ^= (tmp & 0x00ff00ff) << 8;
    tmp ^= (tmp & 0x0f0f0f0f) << 4;
    tmp ^= (tmp & 0x33333333) << 2;
    tmp ^= (tmp & 0x55555555) << 1;
    Tab[i] = tmp;
  }
}
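
As a quick sanity check, here is a small driver, written for this text (it
assumes the definitions of TYPE and Moebius above, with 32-bit words): it
computes the algebraic normal form of the 5-variable function f(x) = x_0·x_1,
whose 32-bit truth table is the single word 0x88888888, and then verifies the
involution property by transforming twice.

#include <stdio.h>

int main(void)
{
  /* Truth table of f(x0,...,x4) = x0*x1: bit i of the word is 1
     exactly when the two low-order bits of i are both set.      */
  TYPE tab[1] = { 0x88888888u };
  Moebius(tab, 32);
  printf("%08x\n", tab[0]); /* expect 00000008: single monomial x0*x1 */
  Moebius(tab, 32);
  printf("%08x\n", tab[0]); /* expect 88888888: the transform is an involution */
  return 0;
}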







Informally, this theorem means that, for a one-way function f, it is not
possible to find an efficient algorithm that, given f(x), predicts scalar
products of the form (R|x) with a non-negligible advantage. Of course, if we
remove the fact that the prediction of (R|x) can be noisy, the theorem becomes
a straightforward application of linear algebra methods. Indeed, from n scalar
products (R_i|x), we completely recover x as soon as the R_i values are
linearly independent over F_2.
Similarly, if we are given a probabilistic algorithm which outputs the value
of (R|x) with probability 1/2 + ε for any fixed R, recovering x is extremely
easy. Indeed, in that case, by choosing for R a value with a single bit set to 1
and all others set to 0, we can learn the exact value of the corresponding bit
of x. It suffices to repeat the probabilistic algorithm to obtain many
independent predictions for (R|x). After some time, the correct value can be
detected through a majority vote (see Chapter 12); a minimal sketch of this
bit-by-bit recovery is given below.
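
The sketch below is illustrative only; the predictor interface is hypothetical
and stands for repeated calls to the probabilistic algorithm on the unit
vector e_j.

/* Recover bit j of x by majority vote over m calls to a predictor
   that outputs (e_j|x) correctly with probability 1/2 + eps; taking
   m of order 1/eps^2 makes the vote reliable.                      */
int recover_bit(int (*predict)(int j), int j, int m)
{
  int ones = 0;
  for (int run = 0; run < m; run++)
    ones += predict(j);          /* each call returns 0 or 1 */
  return 2 * ones > m;           /* majority decision */
}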
The difficulty with the proof of the Goldreich-Levin theorem is that, in some
sense, we need to hide the fact that we are repeating queries. To predict the
value of (R|x) for a fixed R, we first choose k random auxiliary strings R_1,
..., R_k. Then, we call the prediction algorithm A 2^k times, giving as input
f(x) together with a value R'(α) = R ⊕ (⊕_{i=1}^{k} α_i·R_i), where α takes
all possible values in {0,1}^k. Since:

    (R|x) = (R'(α)|x) ⊕ ⊕_{i=1}^{k} α_i·(R_i|x),    (9.23)

if we guess the k values of (R_i|x), we deduce 2^k different predictions for
(R|x). Note that these predictions are not independent, only pairwise
independent, yet this suffices to make the proof go through.
Finally, from an algorithmic point of view, we need to compute, for each of
the 2^k guesses, the difference between the number of predicted 0 values and
the number of predicted 1 values. This is precisely obtained by using a Walsh
transform on the table of predicted values for (R'(α)|x); a sketch follows.
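
Concretely (this sketch and its interface are illustrative, not the book's
code), encoding the 2^k answers of A as ±1 and applying a standard in-place
Walsh transform yields, for every guess β of the k bits (R_i|x), the
difference between the number of derived predictions of 0 and of 1 for (R|x):

/* answers holds the 2^k bits returned by A on the masks R'(alpha);
   votes must have room for 2^k entries.  After the call, votes[beta]
   is (#predictions of 0) - (#predictions of 1) for (R|x) under the
   guess beta of ((R_1|x), ..., (R_k|x)).                            */
void tally_predictions(int k, const int *answers, long *votes)
{
  unsigned long K = 1UL << k;
  for (unsigned long alpha = 0; alpha < K; alpha++)
    votes[alpha] = answers[alpha] ? -1 : 1;      /* (-1)^prediction */
  for (unsigned long sz = 1; sz < K; sz <<= 1)   /* in-place Walsh  */
    for (unsigned long pos = 0; pos < K; pos += 2 * sz)
      for (unsigned long j = 0; j < sz; j++) {
        long sum = votes[pos + j] + votes[pos + sz + j];
        long diff = votes[pos + j] - votes[pos + sz + j];
        votes[pos + j] = sum;
        votes[pos + sz + j] = diff;
      }
}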




9.4 Generalization of the Walsh transform to F_p
The Walsh transform as presented in Section 9.1.2 is specific to the binary
field. However, it is natural to consider whether a similar method is
applicable to other finite fields. Since the Walsh transform is an efficient
way to search for approximate linear relations between the input and output
of a function, we first need to adapt the formalization of linear relations
to larger finite fields. We only consider the case of prime fields F_p.
Indeed, any linear relation that holds with good probability over an extension
field implies another good linear relation over the corresponding base field.
Such an implied relation can be constructed by applying the trace map to the
original relation.
Such a generalized Walsh transform can be useful in several applications. A
first application is to compute characteristics for the non-Boolean
generalization of linear cryptanalysis introduced in [BSV07]. Another, more
theoretical application, presented in Berbain's PhD thesis [Ber07], concerns
the security proof of a generalization of the QUAD family of stream ciphers
from [BGP06], based on the evaluation of quadratic polynomials, to non-binary
fields. This proof relies on a generalization of the Goldreich-Levin theorem
to non-binary fields.
In this section, S is a function from F_p^n to F_p^t and all scalar products
are computed modulo p. We first note a key difference between F_2 and the
general case F_p. In the general case, a linear expression can take more than
two values. As a consequence, it is no longer possible to regroup all cases
into a single number. Over F_2, this was possible because, for any linear
relation, we could split all inputs to f into two sets: the matching entries
and the non-matching entries. Since the total number of inputs is fixed and
known, the difference between the cardinalities of these two sets sufficed to
encode all the information we need. Over F_p, we need to distinguish between
p different cases. As a consequence, we change our definition accordingly and
define L_M^m(S)(α) as the cardinality of the set of inputs x such that:

    (M|x) + (m|S(x)) = α (mod p).    (9.24)

We know that, for all input and output masks:

    Σ_{α=0}^{p-1} L_M^m(S)(α) = p^n,    (9.25)

since there are p^n possible inputs to S. Clearly, we now need p - 1 different
numbers for each pair of masks to extract the complete information about
linear relations. Moreover, we can remark that there are relations between
some pairs of masks. More precisely, for any non-zero constant λ in F_p we
have:

    L_M^m(S)(α) = L_{λ·M}^{λ·m}(S)(λα),    (9.26)

where λ·m denotes the multiplication of each component of the mask m by λ.
Of course, as with the binary field, we first need to address the basic case
where S outputs a single element of F_p. In that case, thanks to the
multiplicative property above, we only need to consider the output mask m = 1.
Indeed, when m = 0 the values L_M^m(S)(α) are independent of S and
uninteresting. In this context, in order to compute the linear characteristics
of S, we split x into x_0||x', M into M_0||M' and S into p subfunctions
S_{x_0}, one for each fixed value of the first coordinate. Using the
definition, we then find:

    L_M^1(S)(α) = #{x | (M|x) + S(x) = α (mod p)}    (9.27)
                = #(∪_{x_0 ∈ F_p} {x' | (M'|x') + S(x_0||x') = α - M_0·x_0})
                = #(∪_{x_0 ∈ F_p} {x' | (M'|x') + S_{x_0}(x') = α - M_0·x_0})
                = Σ_{x_0 ∈ F_p} L_{M'}^1(S_{x_0})(α - M_0·x_0).

From these equations, we could easily derive a Walsh transform that operates
on p · p^n values. However, this would not be optimal. Indeed, we have p^n
possible input masks, p values of α for each mask and one copy of Equation
(9.25) for each mask. As a consequence, it suffices to compute (p - 1) · p^n
numbers to obtain the complete characterization of the linear approximations
of S. In order to devise such an optimal variation, we proceed as in the
binary case, keeping only the computation of L_M^1(S)(α) for non-zero values
of α. Of course, this means that whenever L_M^1(S)(0) is needed, we have to
replace it using Equation (9.25). To avoid cumbersome constants, it is easier
to compute the renormalized values L̄_M^1(S)(α) = L_M^1(S)(α) - p^(n-1),
which are related by:

    Σ_{α=0}^{p-1} L̄_M^m(S)(α) = 0.    (9.28)

Moreover, since the restricted function S_{x_0} only has n - 1 input
coordinates, we also have:

    L̄_M^1(S)(α) = Σ_{x_0 ∈ F_p} L̄_{M'}^1(S_{x_0})(α - M_0·x_0).    (9.29)
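
To make the structure of the recursion concrete, the following C sketch,
written for this text, implements the straightforward (non-optimal) transform
on p · p^n values that follows directly from Equation (9.27), turning one
input coordinate at a time into a mask coordinate; all names and the interface
are chosen for this illustration.

#include <stdlib.h>
#include <string.h>

/* For a small prime p and S: F_p^n -> F_p given as a table of p^n
   digits, fill L (p^n * p entries) with L[M*p + alpha] =
   #{x : (M|x) + S(x) = alpha mod p}.  Runtime O(n * p^(n+2)).     */
void walsh_fp(int p, int n, const int *S, long *L)
{
  long N = 1;
  for (int j = 0; j < n; j++) N *= p;          /* N = p^n */
  /* Empty mask: exact counts L[x*p + alpha] = [S(x) == alpha]. */
  memset(L, 0, (size_t)N * p * sizeof(long));
  for (long x = 0; x < N; x++) L[x * p + S[x]] = 1;

  long *tmp = malloc((size_t)p * p * sizeof(long));
  long stride = 1;
  for (int j = 0; j < n; j++) {      /* coordinate j: x_j -> M_j */
    for (long base = 0; base < N; base += stride * p)
      for (long off = 0; off < stride; off++) {
        long idx = base + off;
        for (int Mj = 0; Mj < p; Mj++)
          for (int a = 0; a < p; a++) {
            long s = 0;
            for (int xj = 0; xj < p; xj++)   /* Equation (9.27) */
              s += L[(idx + xj * stride) * p
                     + ((a - Mj * xj) % p + p) % p];
            tmp[Mj * p + a] = s;
          }
        for (int Mj = 0; Mj < p; Mj++)
          for (int a = 0; a < p; a++)
            L[(idx + Mj * stride) * p + a] = tmp[Mj * p + a];
      }
    stride *= p;
  }
  free(tmp);
}

The optimized variant discussed above keeps only the p - 1 non-zero values of
α for each mask and recovers the value at α = 0 from Equation (9.25) on demand.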

The simplest way to implement the algorithm is to recompute values of the
form L̄_M^1(S)(0) when they are needed, right before the innermost loop of
the algorithm. To make the algorithm complete, the only missing step
