
linear function coincide with the output of f for some fraction p of the 2^k possible inputs. Let y_t = L_f(x_t, x_{t−δ_1}, ..., x_{t−δ_{k−1}}) and remark that when the k-tuple on the right-hand side belongs to the set of inputs where f and L_f agree, then y_t = z_t. Moreover, since y_t is a linear combination of outputs of the same LFSR, with shifted initial values, y_t itself can be produced by the LFSR when it is initialized with the XOR of all these initial values. Assuming for simplicity that the k-tuples (x_t, x_{t−δ_1}, ..., x_{t−δ_{k−1}}) are random and independent, it should be clear that z_t is the image of y_t under a binary symmetric channel with probability p and bias ε = 2p − 1.

Of course, this independence hypothesis is clearly false. However, k-tuples which are generated by LFSRs are well balanced and, despite the dependence between successive k-tuples, attacks based on the binary symmetric channel do work well in practice. Correlation attacks work precisely within this framework: they forget about the exact output function f and simply focus on the bias of the best possible linear approximation.


12.2.2 Maximum likelihood decoding
Given a noisy LFSR with bias ε, a natural question is to ponder whether the real output of the LFSR can be recovered from N bits of noisy output. This question arises both in coding theory and in cryptography. It is very useful to first solve the problem with unlimited computing power. Clearly, if the LFSR is fully specified, then its feedback polynomial is already known and recovering its exact output is equivalent to finding its initial state.
With unlimited computing power, it is possible to compute the LFSR output on N bits for all initial states. Comparing all these sequences to the noisy output z, we can measure the Hamming distance of each candidate to the noisy output, i.e., the number of bits where the sequences differ. For the correct guess, the average Hamming distance is (1 − p)N. For incorrect guesses, assuming that they behave essentially like random strings, we expect an average distance of N/2. The average distance for the correct guess simply comes from the way the noise is generated. However, the exact behavior of incorrect guesses is slightly harder to make explicit. In fact, the difference between the observed stream and an incorrect guess is the XOR of three terms: the LFSR output for the guessed initial state, the LFSR output for the correct initial state and the noise. By linearity, the two LFSR outputs can be grouped into a single LFSR output whose initial state is obtained by XORing the two initial states. As a consequence, the difference is a noisy copy of this LFSR output. Thus, it is almost balanced, i.e., it contains about half 0s and half 1s. This implies that the average distance is N/2 as expected. However, since there are exponentially many incorrect guesses, a few of these bad guesses may be much closer to the observed string.
One important question with maximum likelihood decoding is to determine the parameters for which the correct guess is likely to correspond to the smallest Hamming distance to the noisy output. Indeed, when this happens, it is possible to determine the correct guess by keeping the candidate corresponding to the minimal Hamming distance. Note that in some contexts, a weaker condition may suffice. For example, if it is possible to test whether a likely candidate is indeed correct, then we can be satisfied if the maximum likelihood decoding produces a short list of candidates that contains the correct one.
In practical cryptanalysis, this exhaustive approach can be used to cryptanalyze keystream generators made of several independent LFSRs together with a non-linear output rule. Indeed, in this context, each LFSR has a small internal state compared to the complete cryptosystem and an exhaustive search for the initial state of a single LFSR is much more efficient than a global exhaustive key search. As a consequence, if a correlation of bias ε can be found between the output of the keystream generator and one of the internal LFSRs with a state of length n, then the initial state of this register can be recovered in time N·2^n using N ≈ 1/ε² bits of keystream. In particular, this can be applied to Geffe's generators. This attack, initially proposed in [Sie84, Sie85], is called a correlation attack.
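A direct rendering of this exhaustive attack in Python might look as follows. This is an illustrative sketch only: the tap positions and the way the noisy keystream was produced are placeholders, not material from the book.

def lfsr_stream(state, taps, n, length):
    # Fibonacci LFSR: bit at position i+n is the XOR of the bits at i+t for t in taps
    bits = list(state)
    for i in range(length - n):
        bits.append(sum(bits[i + t] for t in taps) & 1)
    return bits[:length]

def correlation_attack(keystream, taps, n):
    # Time N * 2^n: try every non-zero initial state, keep the closest candidate
    best, best_dist = None, len(keystream) + 1
    for s in range(1, 2 ** n):
        state = [(s >> i) & 1 for i in range(n)]
        out = lfsr_stream(state, taps, n, len(keystream))
        dist = sum(a != b for a, b in zip(out, keystream))  # Hamming distance
        if dist < best_dist:
            best, best_dist = state, dist
    return best, best_dist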

12.2.2.1 Necessary amount of keystream for correlation attacks
In order to determine the amount of keystream required for correlation attacks, we need to determine how close to the observed string a wrong guess can be. Assume that we are observing N bits of keystream, that the bias for the correct guess is ε and that we are testing C incorrect candidates. We need to determine the probability of having an incorrect candidate closer to the observed stream than the correct one. If we are interested in the setting where seemingly valid candidates can be further tested, it is also useful to determine the average number of incorrect candidates closer than the correct one.
On average, the correct candidate agrees with the observed keystream in (1 + ε)N/2 positions, but the exact number can vary around this average. Similarly, on average, each incorrect candidate agrees with the observed keystream in N/2 positions, but the exact number can vary. Thus, to answer our questions, we need to understand these variations around the average.
We first consider a simpler case offering the choice between two candidates, a correct and an incorrect one. In this case, the problem is equivalent to distinguishing between a perfect random generator¹ and a slightly biased generator which outputs 0 with probability (1 + ε)/2. For each of the two generators, we first compute the probability that it outputs T zeros and N − T ones after N measurements. Taking into account the number of possible arrangements of the zeros, this probability can be written as:

\[ P_T(p) = \binom{N}{T} p^T (1-p)^{N-T}. \]

¹That is, a generator which outputs each successive bit independently, with probability 1/2 of giving a 0 and probability 1/2 of giving a 1.





λ      1 − (1/2)·erfc(λ/√2)
0          1/2
0.1        0.5398
0.5        0.6915
1          0.8413
1.5        0.9332
2          0.9772
2.5        0.9938
3          0.9987

Table 12.1: Typical probabilities with binomial distributions


When we observe T zeros, it seems natural to compare P_T(1/2) and P_T((1 + ε)/2) and to keep the candidate corresponding to the largest probability. In fact, this is the best that can be done. When ε is small, P_T(1/2) is the larger probability for T < T_0, where T_0 = (1 + ε/2) · N/2. To determine the probability of success of this strategy, we need to compute sums of P_T(p) for T in intervals of the form [0, T_0] and [T_0, N]. The problem is that there are no closed formulas for these sums. However, when N is large, there is a standard way of approximating the formulas. Without going into the details, the idea is to first view T as a random variable obtained by summing N values 0 or 1 (with probabilities p and 1 − p). By definition, T is said to follow a binomial distribution. Then, the binomial distribution is approximated by a Gaussian (or normal) distribution with average E = (1 − p)N and variance σ² = p(1 − p)N. For a normal distribution, the probability to measure a value smaller than E + λσ is:

\[ Q(E + \lambda\sigma) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\lambda} e^{-u^2/2}\, du. \qquad (12.2) \]

Moreover, with most computer algebra systems, this probability can be computed using the complementary error function, usually denoted by erfc, according to the relation:

\[ Q(E + \lambda\sigma) = 1 - \frac{1}{2}\,\mathrm{erfc}\!\left(\lambda/\sqrt{2}\right). \qquad (12.3) \]

We give some typical values in Table 12.1.
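For instance, the entries of Table 12.1 can be checked with a few lines of Python; this small sketch (added here for illustration) evaluates the right-hand side of Equation (12.3) with the standard math.erfc function:

import math

def success_probability(lam):
    # 1 - (1/2) * erfc(lambda / sqrt(2)), as in Equation (12.3)
    return 1.0 - 0.5 * math.erfc(lam / math.sqrt(2.0))

for lam in (0, 0.1, 0.5, 1, 1.5, 2, 2.5, 3):
    print(f"{lam:>4}: {success_probability(lam):.4f}")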
With this tool, computing the probability of successfully distinguishing the correct and incorrect candidates becomes easy. Write the number of experiments N as (2λ/ε)², i.e., let

\[ \lambda = \frac{1}{2}\sqrt{\epsilon^2 \cdot N} \qquad (12.4) \]

and define a threshold T_0 = (1 + ε/2)N/2. For the incorrect candidate, the average value of T is T_i = N/2 and σ_i = √N/2 = λ/ε. As a consequence, T_0 − T_i = λ²/ε, which is λσ_i. For the correct candidate, we have T_c = (1 + ε)N/2 and σ_c = √(N(1 − ε²))/2. When ε is small, we can conveniently ignore the difference between σ_c and σ_i and approximate T_0 − T_c by −λσ_i. As a consequence, if we predict that the observed distribution is the correct one when T ≥ T_0 and the incorrect one otherwise, the probability of success is obtained by looking up λ in Table 12.1.
The more general case with C incorrect candidates and a correct one can be viewed in several different ways which yield the same threshold T_0. One option is to regroup the candidates in two groups: a bad one that occurs with probability C/(C + 1) and a good one that occurs with probability 1/(C + 1). Using the previous notations, the probability of observing T zeros is P_T(1/2) in the bad case and P_T((1 + ε)/2) in the good case. Using this option, we can apply the Neyman-Pearson lemma, which says that an optimal distinguisher should predict that the good case occurred if, and only if, P_T((1 + ε)/2) > C · P_T(1/2).


12.2.3 Fast correlation attacks
While correlation attacks are very effective against stream ciphers which involve small LFSRs, they quickly become impractical as the size of the target registers grows. With large LFSRs, we would like a different algorithm to exploit the correlation without having to pay with exponential complexity in terms of the LFSR's length. Fast correlation attacks, proposed by Meier and Staffelbach (see [MS89]), offer this possibility.

Note that there exists a much wider variety of correlation attacks than can be presented here. For example, conditional correlation attacks use the fact that some specific configurations of output bits give the value of some linear combination of LFSR bits much more precisely than in the average case. To use this fact, we need parity checks that only use bit positions where these specific configurations occur. For more information, see [LMV05]. Let us also mention iterative correlation attacks [MG90], multipass correlation attacks [ZF06] and vectorial fast correlation attacks [GH05].

12.2.3.1 Binary symmetric channel with repetitions
In order to illustrate one essential idea needed to devise such a correlation attack, we start with an easy case. This easy case can be formulated as a learning problem. In this problem, we try to learn a secret sequence S of length n. To this effect, we are given access to several independent noisy copies of S obtained through a binary symmetric channel with bias ε. How many copies are needed to recover S?

We start by solving the case of a single bit s. After obtaining one measurement of s through the binary symmetric channel, we can predict s with probability p = (1 + ε)/2. With many measurements, the best we can do is to use a majority rule in order to determine our prediction. If we see more 0s than 1s, we predict a 0, and similarly in the opposite case. In the unlikely case where there are as many 0s as 1s, we output a random prediction. Following the results of Section 12.2.2.1, the probability of correctly predicting 0 is thus:

\[ P = \begin{cases} \sum_{t=(N+1)/2}^{N} P_t & \text{for odd } N \text{ and} \\[4pt] \sum_{t=N/2+1}^{N} P_t + \frac{1}{2}\, P_{N/2} & \text{for even } N. \end{cases} \qquad (12.5) \]

Due to the symmetry between the cases s = 0 and s = 1, P is also the overall probability of correct prediction. Once again, this probability can be computed using the complementary error function. The only difference with the previous case comes from the fact that, since we have two opposite biases, we now compute λ = √(Nε²), i.e., rewrite Equation (12.4) without the factor 1/2. In the reverse direction, if we specify the desired probability of success, this fixes λ and we need to make N = (λ/ε)² measurements.

This is usually summarized by saying that a correlation attack with bias ε requires O(ε⁻²) measurements.
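This estimate is easy to check empirically. The following Python sketch (an illustration added here, not from the text) recovers one bit by majority vote using N = (λ/ε)² noisy measurements:

import random

def noisy_copy(bit, eps):
    # Binary symmetric channel: transmit correctly with probability (1 + eps)/2
    return bit if random.random() < (1 + eps) / 2 else 1 - bit

def majority_recover(bit, eps, lam=3.0):
    n = int((lam / eps) ** 2)          # N = (lambda/eps)^2 measurements
    ones = sum(noisy_copy(bit, eps) for _ in range(n))
    if 2 * ones == n:
        return random.randint(0, 1)    # tie: random prediction
    return 1 if 2 * ones > n else 0

eps, trials = 0.05, 100
correct = sum(majority_recover(1, eps) == 1 for _ in range(trials))
print(f"recovered {correct}/{trials} with {(3.0 / eps) ** 2:.0f} measurements each")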

12.2.3.2 A basic attack on LFSRs
With an LFSR-based keystream generator, unless the cryptosystem as a whole presents some unexpected weakness, it is unlikely that the cryptanalyst has access to many noisy copies of the same LFSR output. As a consequence, we cannot directly use the above method. Instead, we need to exploit the redundancy within the LFSR output in order to create such repeated copies of individual bits. Note that we have encountered a similar problem in Chapter 9, Section 9.3 with the Goldreich-Levin theorem. However, with the Goldreich-Levin theorem, we could freely choose the noisy scalar products we wanted to learn. Here, we no longer have this freedom and we need to work with noisy values which are fixed by the LFSR specifications.
Remember that in order to know the complete state of an LFSR it suffices to learn the exact values of n consecutive bits. Since n is a relatively small number, we can first focus on the recovery of a single bit. The key idea is the construction of parity check equations for the LFSR. A parity check equation is simply a systematic linear equality of the form:

\[ x_{i_1} \oplus x_{i_2} \oplus \cdots \oplus x_{i_t} = 0. \qquad (12.6) \]

Here, systematic means that the equation is true regardless of the initial value of the LFSR.

Each such parity check yields a biased prediction for x_{i_1}:

\[ x_{i_1} = z_{i_2} \oplus \cdots \oplus z_{i_t}. \qquad (12.7) \]

Since each value z_{i_2}, ..., z_{i_t} is an approximation for x_{i_2}, ..., x_{i_t} with bias ε and since all these approximations involve independent random choices in the binary symmetric channel model, we can show that we obtain an approximation for x_{i_1} with bias ε^{t−1}. This can be proved by repeated application of the following lemma.


LEMMA 12.1
Given two independent approximations x̂ and ŷ of x and y, with respective biases ε_x and ε_y, then x̂ ⊕ ŷ is an approximation of x ⊕ y with bias ε_x·ε_y.


PROOF   We see that x̂ ⊕ ŷ is a correct approximation of x ⊕ y if x̂ and ŷ are either both correct or both incorrect. The first event, where both are correct, occurs, due to independence, with probability 1/4 · (1 + ε_x)(1 + ε_y). The second event occurs with probability 1/4 · (1 − ε_x)(1 − ε_y). Summing the two probabilities, we find that the total probability of success is:

\[ \frac{1}{4}\cdot\left(2 + 2\,\epsilon_x\epsilon_y\right) = \frac{1}{2}\cdot\left(1 + \epsilon_x\epsilon_y\right). \qquad (12.8) \]

Thus the bias of the combined approximation is ε_x·ε_y.
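The lemma can also be checked numerically; the small sketch below (ours, added for illustration) estimates the bias of x̂ ⊕ ŷ empirically and compares it to the product ε_x·ε_y:

import random

def biased_bit(bit, eps):
    # Noisy copy of bit, correct with probability (1 + eps)/2
    return bit if random.random() < (1 + eps) / 2 else 1 - bit

def empirical_xor_bias(eps_x, eps_y, samples=200_000):
    agree = 0
    for _ in range(samples):
        x, y = random.randint(0, 1), random.randint(0, 1)
        if biased_bit(x, eps_x) ^ biased_bit(y, eps_y) == x ^ y:
            agree += 1
    return 2 * agree / samples - 1     # bias = 2p - 1

print(empirical_xor_bias(0.5, 0.25))   # should be close to 0.5 * 0.25 = 0.125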


By repeatedly applying this lemma, we obtain the bias of any sum of independent approximations simply by multiplying the elementary biases. This can be applied to a parity check involving t bits as above, called a parity check of weight t, to obtain an approximation of one of the involved bits with bias ε^{t−1}. As a consequence, if we obtain a large number of parity checks all involving the same bit x_{i_1}, we can predict this bit with high confidence thanks to a majority vote between the individual predictions. Using the results of Section 12.2.3.1, we see that this number is of the order of ε^{−2t+2}.
To evaluate the performance of this basic attack, it is important to take several parameters into account. The first parameter is, of course, the amount of time needed to evaluate all the parity checks and perform the majority vote. However, the amount of keystream required for the attack is also very important. The other two important parameters are the initial bias ε and the length n of the register. In order to understand the relationship between these parameters, we now study the conditions of existence of enough parity checks to perform the basic attack. For this analysis, we do not consider algorithms for generating the parity checks, merely their existence.
Due to the linearity of the LFSR, for each output bit, there exist fixed constants, which only depend on the feedback polynomial, such that:

\[ x_j = \sum_{i=0}^{n-1} c_i^{(j)} x_i. \qquad (12.9) \]

This allows us to express each output bit as a linear function of the bits contained in the initial value of the LFSR. It is convenient to group the coefficients c_i^{(j)} in a vector c^{(j)} and view the initial value of the LFSR as a vector X.





With these notations, we can rewrite:

\[ x_j = (c^{(j)} \,|\, X). \qquad (12.10) \]

To construct parity checks of weight t, we look for sets of t vectors c^{(j_1)}, ..., c^{(j_t)} such that:

\[ \sum_{i=1}^{t} c^{(j_i)} = 0. \qquad (12.11) \]

Indeed, by linearity of the scalar product, this implies:

\[ \sum_{i=1}^{t} x_{j_i} = \Big(\sum_{i=1}^{t} c^{(j_i)} \,\Big|\, X\Big) = 0. \qquad (12.12) \]


To count the number of parity checks of weight t, all involving some fixed position, say j_1 = 0, we need to choose the remaining positions j_2 to j_t and check that the sum of the n-bit coefficient vectors is 0. Assuming that we have access to a noisy output of the LFSR of length N, each of the above positions is bounded by N, and there are ∏_{i=1}^{t−1} (N − i)/(t − 1)! unordered choices for the t − 1 positions. We expect that a fraction 2^{−n} of these choices sum to zero. As a consequence, the average number of available parity checks for j_1 is:

\[ \binom{N-1}{t-1} \cdot 2^{-n} = \frac{(N-1)!}{2^n\,(t-1)!\,(N-t)!}. \qquad (12.13) \]

With an elementary bias ε, we compute the probability of correct prediction by letting:

\[ \lambda = \sqrt{\frac{(N-1)!\;\epsilon^{2t-2}}{2^n\,(t-1)!\,(N-t)!}} \qquad (12.14) \]

and by looking up λ in Table 12.1.
Note that, given a set of parity checks allowing recovery of x_{j_1}, it is easy to modify it to recover x_{j_1+1} by shifting all indices by 1. Similarly, we can obtain x_{j_1+2}, ..., x_{j_1+n−1}. Of course, since the indices are shifted, the highest required position is no longer N but can go up to N + n − 1. However, since n is normally very small when compared to N, this issue can be neglected.


12.2.4 Algorithmic aspects of fast correlation attacks
The advantage of fast correlation attacks compared to basic correlation attacks is that we no longer need to exhaustively search the 2^n − 1 possible initialization values. Instead, we need to find and evaluate all the necessary parity checks. We address these algorithmic issues in this section.





12.2.4.1 Computing parity checks

The first algorithmic problem we encounter is to find a large enough number of parity checks for fast correlation attacks. This can be done in many ways. However, a common pre-requisite is to express each output bit of a given LFSR as a linear combination of the input bits, i.e., following the notation of Section 12.2.3.2, to compute the vectors c^{(j)}. A very simple way to do this is to initialize n copies of the LFSR, with the same feedback register, with n initial values containing n − 1 zeros and a single 1. In the copy numbered i, the only non-zero initial bit is in position i. By linearity, the n bits of output obtained at time j form the vector c^{(j)}. An easy way to speed up this process is to use a bitslicing approach as in Chapter 5 and apply the LFSR recursion to n-bit words, thus running the n copies in parallel. This is described in pseudo-code as Algorithm 12.1.


Algorithm 12.1 Computing formal expression of LFSR outputs
Require: Input coefficients of LFSR (α_0, ..., α_{n−1})
Require: Bound on stream length N
  Allocate array A of N words of n bits
  Initialize A[0 ··· n − 1] to 0
  for i from 0 to n − 1 do
    Set i-th bit of A[i] to 1
  end for
  for i from n to N − 1 do
    Let A[i] = Σ_{j=0}^{n−1} α_j A[i − n + j]
  end for
  Output array A   {A[i] contains the expression of the i-th bit of LFSR output as a linear combination of bits 0 to n − 1.}
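For concreteness, here is a small Python rendering of Algorithm 12.1, added as an illustrative sketch. Each entry of A is an n-bit integer used as a bitsliced word, so a single XOR of words advances all n formal copies at once:

def lfsr_output_expressions(alpha, N):
    """alpha: list of n feedback coefficients (0/1), alpha[j] multiplying x_{i-n+j}.
    Returns A, where bit i of A[j] is the coefficient of x_i in output bit j."""
    n = len(alpha)
    A = [0] * N
    for i in range(n):            # unit initial values: copy i starts with bit i set
        A[i] = 1 << i
    for i in range(n, N):         # bitsliced LFSR recursion over n-bit words
        w = 0
        for j in range(n):
            if alpha[j]:
                w ^= A[i - n + j]
        A[i] = w
    return A

# Example: x_{i+4} = x_i XOR x_{i+1} for a small 4-bit LFSR
print([bin(w) for w in lfsr_output_expressions([1, 1, 0, 0], 8)])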



Once the n-bit vectors of coefficients are known, we need to form parity checks of weight t, by summing together one fixed vector, corresponding with a position we want to predict, and (t − 1) other vectors corresponding to other positions.




Brute force. The first idea that comes to mind to solve this problem is simply to exhaustively try all possible sets of positions and keep the sets which sum to zero. The running time is \(\binom{N-1}{t-1}\) and can be closely approximated by N^{t−1}/(t − 1)!, by ignoring the difference between N and N − i for small values of i.





Birthday approach. The next idea is to remark that each parity check can be written as an equality between the vectors corresponding to two half parity checks in two smaller lists. Indeed, it is possible to equate the expressions of two sums of t/2 positions each. This method has long been the method of choice for constructing parity checks. Note that each parity check can be split into two parts in many different ways and that we need to apply some care in order to avoid generating the same parity check many times, once for each different split.
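In Python, a minimal sketch of this birthday search for weight-4 parity checks involving position 0 might look as follows (our own illustration, reusing the coefficient vectors A[j] computed by the sketch after Algorithm 12.1; the deduplication via sorted tuples reflects the caution about multiple splits):

from itertools import combinations
from collections import defaultdict

def weight4_parity_checks(A, N):
    """Find sets {0, j, u, v} with A[0] ^ A[j] ^ A[u] ^ A[v] == 0,
    where A[j] is the n-bit coefficient vector of output bit j, as an integer."""
    halves = defaultdict(list)
    for j, k in combinations(range(1, N), 2):   # all half-sums A[j]^A[k]
        halves[A[j] ^ A[k]].append((j, k))
    checks = set()
    for j in range(1, N):
        for (u, v) in halves.get(A[0] ^ A[j], []):
            if len({j, u, v}) == 3:             # positions must be distinct
                checks.add((0,) + tuple(sorted((j, u, v))))
    return checks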


Quadrisection approach. Going further, it is also possible to use the algorithms presented in Chapter 8 to compute parity checks. This does not change the running time compared to the birthday approach; however, it greatly reduces the required amount of memory. Indeed, each parity check can be viewed as a sum of 4 parts, each part containing (almost) t/4 values. Thus, as in [CJM02], we obtain a four set sum problem over F_2^n which can be directly addressed using the algorithms of Chapter 8. As with the birthday approach, each parity check can be split into four parts in many ways and we need to be careful to avoid generating the same parity check many times.


Rho approach. A natural question is to wonder whether we can get rid of memory altogether when generating parity checks. After all, a parity check is nothing but a collision in a function that maps t/2 different positions to the sum of their expressions in terms of the LFSR initial value. Since collisions in functions can be found by cycle finding algorithms, it may appear that such an approach can be used to find parity checks. However, it is not known how to do this, unless we are ready to increase the value of t beyond the value determined in Section 12.2.3.2. Since the bias of parity checks decreases quickly when t increases, cycle finding algorithms are usually not used to construct parity checks.

If we are ready to increase t, we can, as in Section 7.5.1, proceed by using a function defined in the following way. First, any subset of t/2 positions can be mapped to an n-bit string describing the corresponding linear combination of bits of the initial value. This n-bit string can be viewed as an n-bit integer and this number can in turn be interpreted to construct a new set of t/2 positions. Any cycle finding algorithm can obtain a collision for this function. However, the collision may occur in two different places, either when mapping sets to bitstrings or when mapping bitstrings to sets. The first type of collision yields a parity check. Unfortunately, the second type of collision is meaningless. The problem is that when t is too small, there are many more bitstring values than sets of t/2 positions. As a consequence, an overwhelming fraction of the collisions is of the useless type. If we increase t, roughly doubling it, we can make sure that the mapping from bitstrings to sets is injective and thus that all collisions are of the useful type. Note that doubling t makes the rest of the fast correlation attack very inefficient. Thus, it is best to avoid this Rho based approach.

Discrete logarithm based approach. For parity checks involving t = 3 positions, Kuhn and Penzhorn proposed in [PK95] to compute the discrete logarithm of 1 + α^i in basis α in the finite field F_{2^n}, for i smaller than N, the maximal amount of accessible keystream. Here, α is a root of the feedback polynomial of the LFSR under attack. If the discrete logarithm j is also smaller than N, we learn that 1 + α^i + α^j = 0, which can be converted into a parity check x_0 ⊕ x_i ⊕ x_j = 0. An extension of this idea was recently proposed in [DLC07].

12.2.4.2 Improving the basic attack
The basic attack described in Section 12.2.3.2 presents two main difficulties:

• When the length n of the LFSR increases, the number of parity checks decreases and we need to increase t and/or N to implement the attack. This is due to the 2^{−n} factor in Equation (12.13).

• When t increases, the number of necessary parity checks increases exponentially with t and evaluating these parity checks on the observed keystream takes a long time.

It is possible to reduce the impact of these two difficulties by using an approach similar to the method used for the Goldreich-Levin theorem in Section 9.3. We isolate a subset S_0 of n_0 bits in the initial state of the LFSR. These bits are going to be addressed separately using an exhaustive search. As a consequence, when counting and constructing parity checks, we only cancel the n − n_0 other bits. For each acceptable parity check, we also keep the linear expression specifying the contribution of the bits in S_0. To count these modified parity checks, we replace the factor 2^{−n} by 2^{−(n−n_0)} in Equation (12.13). This increases the number of available parity checks and allows us to reduce t and/or N. Note, however, that since the n_0 bits are fixed and cannot be shifted, we cannot use shifted parity checks this time and need to compute them independently n − n_0 times, once for each of the bits that need to be recovered.
Without using the Walsh transform, recovering each of the n − n_0 remaining bits would require evaluating and counting the number of predictions 2^{n_0} times. Instead, we start by regrouping the parity checks that share the same linear expression on S_0. Then, within each group, we partially evaluate these parity checks and keep the difference between the number of 0 and 1 values. This difference is stored in an array of 2^{n_0} elements, in the position given by the binary encoding of the linear expression of the group. After applying a Walsh transform to this array, position i contains the difference between the numbers of 0 and 1 values for the corresponding guess of the bits of S_0. As in the analysis of Section 12.2.2.1, we have 2^{n_0} − 1 bad candidates and a single good candidate. With well-chosen parameters, the position corresponding to the good candidate contains the largest value in absolute value. Learning this position gives the value of the n_0 bits in S_0. Moreover, the sign of this largest value determines the value of the bit for which the parity checks were constructed (see [CJM02] for more details).
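The Walsh transform used here fits in a few lines. The sketch below (ours, for illustration) transforms an array of per-expression counter differences into per-guess counter differences and picks the best guess:

def walsh_transform(a):
    """In-place Walsh-Hadamard transform of a list whose length is a power of 2.
    Entry e holds (#zeros - #ones) for the parity-check group with linear
    expression e; afterwards entry g holds the same difference under guess g."""
    h, n = 1, len(a)
    while h < n:
        for i in range(0, n, 2 * h):
            for j in range(i, i + h):
                x, y = a[j], a[j + h]
                a[j], a[j + h] = x + y, x - y
        h *= 2
    return a

scores = walsh_transform([3, -1, 1, 1])   # toy array for n0 = 2
best = max(range(len(scores)), key=lambda g: abs(scores[g]))
print(scores, "-> best guess:", best)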




12.3 Algebraic attacks
In addition to the binary symmetric channel model, the filtered generator is also well-suited to a modelization by a system of algebraic equations. Indeed, if in the basic expression:

\[ z_t = f(x_t, x_{t-\delta_1}, \cdots, x_{t-\delta_{k-1}}), \qquad (12.15) \]

we replace each bit of x by its linear expression in terms of the initial values x_1, ..., x_n of the LFSR resulting from Algorithm 12.1, we obtain a transformed equation:

\[ z_t = f_t(x_1, x_2, \cdots, x_n). \qquad (12.16) \]

Assuming a known plaintext attack, the cryptanalyst knows the value of z_t and wants to recover the initial state. At this point, two important remarks are needed. First, we can collect a large number of equations, one for each bit of known plaintext. Second, the degree of each polynomial f_t is the same as the degree of f, since f_t results from a linear change of variables.

As a consequence, the filtered generator can be vulnerable to algebraic attacks when its parameters are badly chosen. It is especially important not to have a low degree for f. For example, let us detail what happens when f can be expressed as a quadratic polynomial. In that case, each equation f_t = z_t can be rewritten as:

\[ z_t = \sum_{i=1}^{n} \alpha_i^{(t)} x_i + \sum_{i=1}^{n-1}\sum_{j=i+1}^{n} \beta_{i,j}^{(t)} x_i x_j. \qquad (12.17) \]


Using (re)linearization techniques, we can define y_{i,j} as a new unknown with value equal to x_i x_j and substitute y_{i,j} in the above. This yields a linear equation in n(n + 1)/2 unknowns. Collecting n(n + 1)/2 (or a few more) such equations, it suffices to solve a moderately large linear system of equations to recover the initial state of the LFSR. Equivalently, we can use Algorithm 11.4 on this set of algebraic equations using a degree limit of 2.
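To make the linearization step concrete, here is a small illustrative sketch (ours, not from the book): each equation of the form (12.17) becomes a row over the monomials x_i and y_{i,j}, and the resulting system is solved by Gaussian elimination over GF(2). The input encoding and all names are our own conventions, with variables 0-indexed:

from itertools import combinations

def linearize_and_solve(equations, n):
    # equations: list of (lin, quad, z) encoding z = XOR of x_i for i in lin,
    # XORed with x_i*x_j for (i, j) in quad.
    # Monomial order: x_0..x_{n-1}, then y_{i,j} = x_i*x_j for i < j.
    pairs = list(combinations(range(n), 2))
    ycol = {p: n + k for k, p in enumerate(pairs)}
    ncols = n + len(pairs)
    rows = []
    for lin, quad, z in equations:
        r = z << ncols                  # right-hand side stored in the top bit
        for i in lin:
            r ^= 1 << i
        for p in quad:
            r ^= 1 << ycol[tuple(sorted(p))]
        rows.append(r)
    rank = 0                            # Gaussian elimination over GF(2)
    for c in range(ncols):
        for i in range(rank, len(rows)):
            if (rows[i] >> c) & 1:
                rows[rank], rows[i] = rows[i], rows[rank]
                for k in range(len(rows)):
                    if k != rank and (rows[k] >> c) & 1:
                        rows[k] ^= rows[rank]
                rank += 1
                break
    sol = [None] * n                    # read off the x_i from pivot rows
    for r in rows[:rank]:
        c = (r & -r).bit_length() - 1   # pivot column of this row
        if c < n:
            sol[c] = (r >> ncols) & 1
    return sol

With enough keystream bits, the rows outnumber the n(n + 1)/2 columns and, for a well-behaved system, every x_i column receives a pivot, yielding the initial state directly.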





12.3.1 Predicting an annihilator polynomial
Alternatively, one can also look for linear combinations of the z_t such that the contribution of all terms y_{i,j} vanishes. Unless we are especially unlucky, there are linear terms x_i left and we obtain linear equations in the n initial unknowns. At first, this seems essentially as costly as applying Algorithm 11.4. However, it was shown recently by Rønjom and Helleseth in [RH07] that this alternative can be made extremely efficient. In order to see why, let us briefly recall Wiedemann's algorithm from Chapter 3. In this algorithm, there are two steps: the first one determines the minimal polynomial of some linear sequence and the second one uses this minimal polynomial to invert a linear system. In the case of an algebraic attack against a filtered LFSR, a multiple of the minimal polynomial can be predicted in advance and the first step of Wiedemann's algorithm can be skipped. With a modified second step, it is then possible to directly derive linear equations in the n initial unknowns, without performing linear algebra on a large system of equations.
To understand the method of [RH07], it is best to view the current LFSR state as an element of F_{2^n} and the advance step of the LFSR as a multiplication by some element α in F_{2^n}. If the initial state is X_0, the state at time t is X_t = α^t X_0. With this notation, it is also possible to write the output bit x_t as a trace of βX_t for some constant β in F_{2^n}. We recall that the trace of X in F_{2^n} is defined as:

\[ \mathrm{Tr}(X) = \sum_{i=0}^{n-1} X^{2^i}. \qquad (12.18) \]
Let us now focus on an arbitrary quadratic term x_t x_{t+δ}. We can write:

\[
x_t x_{t+\delta} = \mathrm{Tr}(\beta X_0 \alpha^t)\cdot \mathrm{Tr}(\beta X_0 \alpha^{t+\delta})
= \left(\sum_{i=0}^{n-1}(\beta X_0 \alpha^t)^{2^i}\right)\cdot\left(\sum_{j=0}^{n-1}(\beta X_0 \alpha^{t+\delta})^{2^j}\right)
= \sum_{i=0}^{n-1}\sum_{j=0}^{n-1}(\beta X_0)^{2^i+2^j}\,\alpha^{2^j\delta}\cdot\left(\alpha^{2^i+2^j}\right)^t. \qquad (12.19)
\]

Thus, each quadratic term is a linear combination, with complicated but fixed coefficients, of powers of t. These powers are of the form (α^ℓ)^t, where ℓ has at most two non-zero bits in its binary decomposition. We can regroup the values α^ℓ by considering the difference ∆ = i − j modulo n and form the polynomials:

\[ P_\Delta(X) = \prod_{i=0}^{n-1}\left(X - \alpha^{2^i + 2^{(i+\Delta) \bmod n}}\right). \qquad (12.20) \]

This polynomial is invariant under the action of the Frobenius map; thus, it has coefficients in F_2. Note that P_0(X) regroups the values α^{2^i}, thus it is the feedback polynomial of the considered LFSR.





Following [RH07] and proceeding as with Wiedemann's algorithm, we see that P_∆ annihilates all powers of the values α^{2^i + 2^{i+∆}} in Equation (12.19). As a consequence, the product of all P_∆ for ∆ ≠ 0 annihilates all non-linear terms of Equation (12.19). Unless we are extremely unlucky, the linear terms are not canceled and it suffices to solve a system of n linear equations in n unknowns to recover the initial state X_0.

For monomials of higher degree d, it is possible to proceed similarly, by considering products of terms X − α^ℓ for values of ℓ with d non-zero bits in their binary decomposition.




12.4 Extension to some non-linear shift registers
In order to avoid many of the weaknesses of linear feedback shift registers, non-linear feedback shift registers were recently considered for cryptographic uses. Non-linear shift registers (NLFSRs) are very similar to LFSRs; the only difference is that the new bit entering the register at each step is computed using a non-linear function instead of a linear one. One difficulty with these registers is to make sure that the period is long enough for all initial values. To overcome this difficulty, several approaches are possible. One option is to use registers of small size, to make sure through exhaustive search that all these registers have long enough periods and to combine several of them into the keystream generator. Another option is to consider registers of a specific form that allow the period to be computed.

At first, it seems that algebraic and correlation attacks cannot apply to non-linear shift registers. Indeed, concerning algebraic attacks, we see that the iteration of the low degree feedback function quickly produces equations of very high degree, which cannot be solved by the techniques of Chapter 11. Concerning correlation attacks, getting biased predictions of inner bits of the NLFSR remains easy. However, without linearity, we can no longer construct parity checks.

Yet, in this section, we show that the correlation attacks presented in this chapter can be directly applied to a specific subclass of non-linear shift register based keystream generators. This subclass is formed of keystream generators based on a non-linear shift register whose output is a linear combination of the register's cells.
To make things more precise, we can view such a keystream generator as built around an inner sequence x_i defined from n initial bits x_0 to x_{n−1}, together with a recursion formula:

\[ x_{n+i} = F(x_i, x_{i+1}, \ldots, x_{i+n-1}), \qquad (12.21) \]

for some efficiently computable non-linear function F, for example a low-degree function. The output z_i at time i is obtained by applying a fixed linear function L to the inner sequence. Assuming, for simplicity of exposition, that x_i is always used when computing z_i, we write:

\[ z_i = x_i \oplus L(x_{i+1}, \ldots, x_{i+n-1}). \qquad (12.22) \]

Note that, depending on the choice of L, this construction can be trivially broken. For example, if L = 0, z_i simply is a copy of x_i. In this case, it suffices to collect n consecutive output bits to break the construction. Indeed, using Equation (12.21), the rest of the sequence can then be predicted.
We now show that for any choice of L, this NLFSR based generator can be vulnerable to correlation and algebraic attacks. The key argument is that for each inner bit of the NLFSR, we have two different equations: the non-linear Equation (12.21) and a linear equation which is obtained by rewriting Equation (12.22) as:

\[ x_i = z_i \oplus L(x_{i+1}, \ldots, x_{i+n-1}). \qquad (12.23) \]

Moreover, once the keystream z has been observed, we can substitute Equation (12.23) into itself for various values of i. As a consequence, each inner bit x_i can be expressed as a linear equation L_i(x_0, ..., x_{n−1}) of the initial bits. The constant term in L_i is obtained by linearly combining bits of the output stream z. Substituting each bit of x by the corresponding linear expression in Equation (12.21) produces a low degree algebraic equation in the unknowns x_0, ..., x_{n−1}. Since there are many such equations, we can use the methodology of algebraic attacks.

Similarly, if we take the best linear approximation between the inputs of F and its output, then after replacing in this linear expression each value of x by its linear expression, we obtain a noisy linear equation in the bits of the initial state. Of course, we can apply the correlation methodology to the resulting set of noisy linear equations.




12.5 The cube attack
A third class of attacks against the filtered generator was recently proposed by Dinur and Shamir [DS09]. Called the cube attack, it is a new way to use a well-known identity on multivariate polynomials over F_2 and transform it into an attack against some secret-key ciphers. In particular, this can be applied to the filtered generator.

To explain the underlying identity, let us start with a univariate polynomial F over F_2. When x is in F_2, we know that x² = x. As a consequence, the polynomial f obtained by keeping the remainder of the Euclidean division of F by x² ⊕ x induces the same function as F on F_2. Moreover, f is of degree at most one, i.e., f(x) = ax ⊕ b. We now see that:

\[ f(0) \oplus f(1) = a. \qquad (12.24) \]

In fact, by summing on the two points 0 and 1, we are taking a derivative of f and we can recover the coefficient of the term x in f. Note that this is based on the same basic equation as differential cryptanalysis, but views things from a different angle.
With more unknowns, we proceed similarly. Let F(x_1, ..., x_k) be a polynomial in k unknowns. After accounting for the identities x_i² = x_i for each unknown, we obtain a polynomial f whose partial degree in each unknown is at most one. Factoring x_1 out where possible, we can write:

\[ f(x_1, \cdots, x_k) = a(x_2, \cdots, x_k)\cdot x_1 + b(x_2, \cdots, x_k). \]

As a consequence, the function f(0, x_2, ..., x_k) ⊕ f(1, x_2, ..., x_k) is equal to a(x_2, ..., x_k).

The key idea of the cube attack is to repeat this process until we reach easy-to-solve linear equations. For example, to repeat the approach on x_1, x_2 and x_3, we first write:

\[
\begin{aligned}
f(x_1, \cdots, x_k) ={}& a_7(x_4, \cdots, x_k)\cdot x_1 x_2 x_3 \oplus a_6(x_4, \cdots, x_k)\cdot x_1 x_2 \oplus{}\\
& a_5(x_4, \cdots, x_k)\cdot x_1 x_3 \oplus a_4(x_4, \cdots, x_k)\cdot x_1 \oplus{}\\
& a_3(x_4, \cdots, x_k)\cdot x_2 x_3 \oplus a_2(x_4, \cdots, x_k)\cdot x_2 \oplus{}\\
& a_1(x_4, \cdots, x_k)\cdot x_3 \oplus a_0(x_4, \cdots, x_k).
\end{aligned}
\]

Then, we remark that:

\[
\begin{aligned}
a_7(x_4, \cdots, x_k) ={}& f(0,0,0,x_4,\cdots,x_k) \oplus f(0,0,1,x_4,\cdots,x_k) \oplus{}\\
& f(0,1,0,x_4,\cdots,x_k) \oplus f(0,1,1,x_4,\cdots,x_k) \oplus{}\\
& f(1,0,0,x_4,\cdots,x_k) \oplus f(1,0,1,x_4,\cdots,x_k) \oplus{}\\
& f(1,1,0,x_4,\cdots,x_k) \oplus f(1,1,1,x_4,\cdots,x_k).
\end{aligned}
\]

From this example, the reader can easily see that when we repeat the process with t unknowns, we need to decompose f into 2^t parts and to sum over 2^t points. The first t coordinates of these points cover all the possible values that can be achieved by setting each coordinate to 0 or 1. These points are the vertices of a hypercube in the t-dimensional space.
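As a toy illustration of this summation (our own example, not one from [DS09]), the following Python snippet extracts a_7 for a small polynomial with k = 5, where x_4 and x_5 play the role of secret variables:

from itertools import product

def f(x1, x2, x3, x4, x5):
    # Toy polynomial over GF(2); its coefficient of x1*x2*x3 is x4 ^ x5 ^ 1
    return (x1 & x2 & x3 & (x4 ^ x5 ^ 1)) ^ (x1 & x2 & x4) ^ (x3 & x5) ^ x4

def cube_sum(x4, x5):
    # Sum f over the 2^3 vertices of the hypercube on (x1, x2, x3)
    acc = 0
    for x1, x2, x3 in product((0, 1), repeat=3):
        acc ^= f(x1, x2, x3, x4, x5)
    return acc

for x4, x5 in product((0, 1), repeat=2):
    # The cube sum equals the derived polynomial x4 ^ x5 ^ 1 at every point
    assert cube_sum(x4, x5) == x4 ^ x5 ^ 1
print("cube sums match the derived linear polynomial x4 ^ x5 ^ 1")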
It is shown in [DS09] that for any polynomial f and any set of unknowns S, we can write:

\[ f(x_1, \cdots, x_k) = \Big(\prod_{i\in S} x_i\Big)\, f_S(x_1, \cdots, x_k) + q(x_1, \cdots, x_k), \]

where each monomial in q misses at least one unknown from S and f_S is a polynomial of the variables not in S. When summing on a hypercube induced by S, we obtain a value for the polynomial f_S. In [DS09], f_S is called the super-polynomial of S in f. Here, we instead call it the derived function of f at S. The basic idea of cube attacks is very similar to a generalization of differential cryptanalysis called higher order differential cryptanalysis [Knu94, Lai94, MSK98]. However, it emphasizes a different attack scenario.


12.5.1 Basic scenario for the cube method
In order to use the cube attack, we need to be able to sum many evaluations of the same polynomial along some hypercube. As a consequence, we need a scenario where the attacker controls at least part of the unknowns entering the polynomial. Since this process of summing produces values for a lower degree polynomial derived from the first, it is natural to use this lower degree polynomial to extract information about other unknowns. For this reason, in [DS09], it is proposed to consider systems involving polynomials where some unknowns are controlled by the attacker and some unknowns contain secret key material. To simplify the extraction of information, the easiest is to choose the hypercube we are summing on in such a way that the derived polynomial is linear in the secret unknowns. If this can be done, we obtain a linear equation in the secret key. Repeating this with several hypercubes, we obtain a linear system of equations, from which the secret key can hopefully be recovered.
Since the cube attack is very recent, its scope of application is not yet known. However, this attack seems to have a lot of potential. One very surprising application is already given in [DS09]. This application allows one to break cryptosystems which satisfy some specific criteria in a black box way, without even knowing the full details of these cryptosystems. The idea is, in an initial phase, to choose hypercubes at random and look at the derived function that we obtain, when running test copies of the system for which the attacker controls the secret key. If the derived function is, or seems to be, constant, the hypercube contains too many unknowns. If the derived function is complicated, the hypercube contains too few unknowns. If the derived function is a simple linear function of the secret key, then we have achieved the right balance and found one linear equation to be used later in an attack phase. Note that there are several ways to test the linearity of the derived function. One option is to check that if we add any of the secret key unknowns to the hypercube variables, the resulting derived function becomes a constant.
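The linearity check described above is easy to script. In the sketch below (an illustration with our own conventions: f takes a list of public bits and a list of key bits), adding a key bit to the summation must produce a constant across random keys:

import random
from itertools import product

def cube_sum(f, n_pub, cube, key):
    # XOR of f over all 0/1 assignments to the public variables in `cube`;
    # public variables outside the cube are fixed to 0
    acc = 0
    for vals in product((0, 1), repeat=len(cube)):
        pub = [0] * n_pub
        for i, v in zip(cube, vals):
            pub[i] = v
        acc ^= f(pub, key)
    return acc

def derived_seems_linear(f, n_pub, n_key, cube, trials=16):
    # Test from the text: summing additionally over any single key bit j
    # must yield a value independent of the remaining key bits
    for j in range(n_key):
        ref = None
        for _ in range(trials):
            key = [random.randint(0, 1) for _ in range(n_key)]
            s = 0
            for b in (0, 1):           # also sum over the j-th key bit
                key[j] = b
                s ^= cube_sum(f, n_pub, cube, key)
            if ref is None:
                ref = s
            elif s != ref:
                return False
    return True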
Among the possible applications of the cube attack, it is proposed in [DS09] to consider an extremely large filtered LFSR (on 10,000 unknowns) which mixes IV and secret key variables in a complex way. The only real constraint is that the degree of the output in terms of the variables is at most 16. The cube attack successfully breaks this incredibly large filtered LFSR using 2^20 different IV values and a single bit of output for each. Note that this filtered LFSR is completely out of range for more conventional correlation or algebraic attacks.




12.6 Time memory data tradeoffs
In Chapter 7, we describe some time-memory tradeoffs against block ciphers. Since stream ciphers behave quite differently from block ciphers, it is interesting to revisit these attacks in this different context. Let us start by listing the relevant differences between the two kinds of ciphers. First, instead of combining a key with a block of plaintext to get a block of ciphertext as block ciphers do, stream ciphers generate a pseudo-random sequence from an inner state, usually obtained by mixing together a key and an initialization value. During the generation of the pseudo-random sequence, the inner state takes many different consecutive values, where each value is obtained from the previous one through an update function. At the same time, the output sequence is obtained by applying an output function to each inner state. In order to cryptanalyze the stream cipher, it suffices to recover one value of the inner state at any point in time. Indeed, this suffices to predict the sequel of the pseudo-random stream. Moreover, most of the time, the update function is in fact reversible and recovering a single inner state allows us to find the complete sequence of inner states. From this point, depending on the precise cipher being considered, it might also be possible to recover the encryption key. Thus, with a stream cipher, the goal of an attacker is slightly different; instead of having to directly find the key, it suffices to recover one of the many inner states taken during the encryption process.
From this remark, it is already possible to devise a simple attack that works better than exhaustive search. To do this, we start by defining a function F, which takes as input an inner state and outputs a prefix of the corresponding pseudo-random stream. For the size of the prefix, we choose either the size of the inner state or something slightly larger. Clearly, from some output of the stream, we can deduce the image by F of the corresponding inner states of the stream. As a consequence, if we know how to invert F we can recover the inner state and break the cipher. Moreover, if we know how to invert F only for a small fraction of the states, this already yields an attack. Indeed, by observing the output of the stream cipher, we can not only compute F for the initial inner state, but also for most of the subsequent inner states taken during the encryption process. In fact, we obtain F for all of these inner states except the few final ones, for which we do not have enough keystream available. To simplify the analysis of the attack, one generally assumes that each inner state gives a new independent chance of inverting F. Of course, successive states are not really independent but, from the cryptanalyst's point of view, the simplified modelization is a good rule of thumb. In this model, this basic attack, independently discovered by Babbage and Golic, works as follows. First, during a precomputation step, we compute F on some fraction α of possible inner states and store these values in a sorted table. Then, to recover an unknown state, we obtain the corresponding pseudo-random sequence. From this sequence, we deduce N different values of F for N of the inner states seen during the construction of the pseudo-random sequence (whose length is a little above N). If αN ≈ 1, we expect that one among the F values can be found in our table and we deduce the corresponding inner state.
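To make this concrete, here is a toy Python sketch of the Babbage-Golic attack. All parameters and the update/output functions are ours, chosen only for illustration on a deliberately tiny state space:

import random

STATE_BITS = 16
MASK = (1 << STATE_BITS) - 1

def update(s):                 # toy invertible update function (odd multiplier)
    return ((s * 0x6255) + 0x3619) & MASK

def output(s):                 # toy output function: one bit per state
    return (s ^ (s >> 7) ^ (s >> 3)) & 1

def F(s):                      # prefix of the stream generated from state s
    bits = 0
    for i in range(STATE_BITS):
        bits |= output(s) << i
        s = update(s)
    return bits

# Precomputation: tabulate F on a fraction alpha of the states
alpha = 2 ** -6
table = {}
for _ in range(int(alpha * 2 ** STATE_BITS)):
    s = random.getrandbits(STATE_BITS)
    table[F(s)] = s

# Online phase: observe a bit over 2^7 keystream bits and slide a window,
# so that alpha * N is around 2 and a hit is likely
secret = random.getrandbits(STATE_BITS)
s, stream = secret, []
for _ in range(2 ** 7 + STATE_BITS):
    stream.append(output(s))
    s = update(s)
for t in range(2 ** 7):
    window = sum(b << i for i, b in enumerate(stream[t:t + STATE_BITS]))
    if window in table:
        print(f"recovered inner state at time {t}: {table[window]:#06x}")
        break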
This simple attack illustrates the fact that with stream ciphers, we have many available targets instead of a single one and this helps the cryptanalyst. This was further extended by Biryukov and Shamir in [BS00] into time/memory/data tradeoff attacks. By playing with three parameters instead of two, it is possible to obtain a better compromise. For example, assuming that the space of inner states contains N values, it is possible to recover one state in time N^{2/3} using output sequences of length N^{1/3} and memory of the order of N^{1/3}. This is clearly more efficient than Hellman's usual compromise with time and memory N^{2/3}. For more details, we refer the reader to [BS00].







Exercises
1h. Assume that the same one-time pad key is used twice to encrypt two texts in English, M and M′. Show how both texts can be recovered from the encrypted strings.

2. Work out a scenario where the malleability of messages encrypted by
the one-time pad yields a major security hole.

3h. Consider a filtered LFSR where the input bits to the non-linear function f are consecutive. Assume that f maps k bits to one. Show that the best linear correlation between input and output can be improved by considering several consecutive output bits. Try this approach for randomly selected functions.

4. Consider a Geffe generator based on three LFSRs: a control LFSR C on n bits and two output LFSRs A and B on m bits each. How much does exhaustive search on this generator cost? Given the initial value of C, how can you efficiently recover the initial values of A and B? Conclude that n should not be too small.

5h. Continuing with the Geffe generator, assume that n is too large for exhaustive search. Construct a correlation attack and recover A and B. How much keystream is needed? What happens if A and B use the same feedback polynomial? Once A and B are known, how do you reconstruct C?

6. Recompute the necessary amount of keystream N required for a correlation attack on a filtered LFSR with bias ε if we want the correct candidate to appear among the list of L candidates generating sequences closest to the output stream.

7h. Write a program for computing parity checks. Try the birthday based and the quadrisection approaches. Let n denote the number of bits taken into account in the parity checks. What values of n can you achieve with each approach?

8. Following Section 12.3.1, consider an LFSR on n bits and the output sequence x ⊕ y ⊕ xy, where x and y are two consecutive bits, then construct a linear combination of output bits where all quadratic terms vanish.

9h. This exercise considers non-linear shift registers on n bits. Show that there are 2^n possible shift registers of this sort. Study the possible structure of the oriented graph whose vertices are the inner states and whose arrows are obtained by advancing the NLFSR. Which NLFSRs would you keep for cryptographic purposes? Write a program to enumerate the possibilities and select the good ones.

Finally, here are possible implementation projects on this topic:
i. Write a complete set of programs for fast correlation attacks. Compare the performance of using the Walsh transform or not during the evaluation of parity checks. Look up and implement some advanced correlation attacks from the literature (conditional, iterative, ... correlation attacks).
ii. Generalize the method of Section 12.3.1 to polynomials of arbitrary degree. Write a program that computes, given a feedback polynomial and output function for an LFSR, a linear combination of output bits such that the non-linear terms vanish. Show that for some badly chosen output functions the method can work with short output sequences. Construct such bad examples.
iii. Write a working implementation of the cube attack as presented in [DS09]. Compare this to other attacks against the filtered generator.




Chapter 13
Lattice-based cryptanalysis


Lattice-based cryptanalysis uses lattice reduction algorithms in order to discover short linear relations between integer vectors that give a useful insight into various cryptographic systems. These attacks can roughly be classified into two main classes: direct attacks, where the cryptanalytic problem at hand can directly be expressed as a lattice reduction problem, and Coppersmith's based attacks, which rely on several algorithms which can recover small roots of polynomial systems of equations using lattice reduction.

Due to the complexity of lattice reduction algorithms, a frequent way of analyzing lattice-based attacks consists of assuming the existence of a lattice reduction oracle that solves some lattice problem such as the shortest vector problem (see Chapter 10). Of course, no such oracle exists; however, it is a very convenient heuristic based on the good practical performance of lattice reduction algorithms. In some cases, the analysis can be refined to reflect the proved properties of available lattice reduction algorithms. This can be necessary either to obtain a proved attack or when faced with lattice reduction problems that are out of reach of current lattice reduction programs.




13.1 Direct attacks using lattice reduction
13.1.1 Dependence relations with small coefficients
One very important family of lattice reduction attacks makes use of lattices to find linear relations with relatively small coefficients between numbers or vectors. These relations may be considered over the integers or in some modular ring. Before addressing this problem, it is nice to look at its combinatorial properties and determine for which instances we should expect small linear relations.

13.1.1.1 Combinatorial properties
When searching for a short linear dependency between elements of a family of numbers or vectors, two cases occur. We can either be looking for an "abnormally" small relation whose existence is guaranteed by the specific properties of the family and may, for example, reflect the existence of a trapdoor in a cryptographic scheme, or we may simply desire an ordinary relation whose existence is generically guaranteed. The goal of this section is to analyze the expected size of such generic relations.


LEMMA 13.1
Let v_1, ..., v_n be a family of vectors with integer coefficients in t coordinates, with t < n. Let M denote an upper bound for the absolute values of all coefficients of the various v_i's. Then there exists a non-zero integer relation

\[ \sum_{i=1}^{n} \lambda_i v_i = 0, \qquad (13.1) \]

such that max |λ_i| ≤ B, where B is given by

\[ \log_2 B = t \cdot \frac{\log_2 M + \log_2 n + 1}{n-t}. \qquad (13.2) \]

PROOF   Consider all possible linear combinations

\[ \sum_{i=1}^{n} \mu_i v_i \qquad (13.3) \]

with 0 ≤ μ_i < B. There are B^n such combinations and the resulting vectors have all their coordinates upper bounded by nBM. Since there are fewer than (2nBM)^t such vectors, two distinct combinations have to compute the same value as soon as (2nBM)^t ≤ B^n, which amounts to the relation given in Equation (13.2). This collision gives

\[ \sum_{i=1}^{n} \mu_i v_i = \sum_{i=1}^{n} \mu'_i v_i, \qquad (13.4) \]

with 0 ≤ μ_i < B and 0 ≤ μ'_i < B. After regrouping both sides by subtraction, we find a non-zero relation as per Equation (13.1).

Note that in this proof, we do not invoke the birthday paradox, because the various linear combinations are related and highly dependent.

Of course, for a specific instance of the problem, the shortest dependence relation (say w.r.t. the Euclidean length) can be much shorter than the generic bound stated in the above lemma. A similar lemma exists in the case of modular relations:


LEMMA 13.2
Let v_1, ..., v_n be a family of vectors with integer coefficients in t coordinates, with t < n. Let N be an integer. Then there exists a non-zero integer relation modulo N:

\[ \sum_{i=1}^{n} \lambda_i v_i = 0 \pmod{N}, \qquad (13.5) \]

such that max |λ_i| ≤ min(B, N/2), where B is given by

\[ B = N^{t/n}. \qquad (13.6) \]

PROOF   As before, consider all possible linear combinations

\[ \sum_{i=1}^{n} \mu_i v_i \pmod{N} \qquad (13.7) \]

with 0 ≤ μ_i < B. There are B^n such combinations and the resulting vectors are expressed modulo N. Thus, there are fewer than N^t possible vectors and two distinct combinations have to compute the same value as soon as N^t ≤ B^n, which is equivalent to Equation (13.6). This collision gives

\[ \sum_{i=1}^{n} \mu_i v_i = \sum_{i=1}^{n} \mu'_i v_i \pmod{N}, \qquad (13.8) \]

with 0 ≤ μ_i < B and 0 ≤ μ'_i < B. Once again, this yields a non-zero relation. Moreover, using reduction modulo N, this relation can be expressed with coefficients in the range [−N/2, N/2].

13.1.1.2 Lattice reduction based search for short relations
Given a family of integer vectors on t coordinates v_1, ..., v_n as above, we now form the integer lattice generated by the columns of the following matrix:

\[ L(v) = \begin{pmatrix} Kv_1 & Kv_2 & \cdots & Kv_n \\ 1 & 0 & \cdots & 0 \\ 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 1 \end{pmatrix}, \qquad (13.9) \]

where K is a constant that is determined in the sequel.
Any linear relation as per Equation (13.1) can be mapped into a lattice point:

(0 ··· 0 λ_1 λ_2 ··· λ_n),

whose first t coordinates are equal to 0. Conversely, any lattice point starting with t zeros corresponds to a linear relation. As a consequence, the main restriction when choosing K is that this constant should be large enough to ensure that the first vector in a reduced basis of the lattice has zero components in its upper part consisting of the first t coordinates that correspond to the contributions of the v_i's. Indeed, when the contribution of the first t coordinates is not all zero, then the Euclidean norm of the lattice vector is at least K.

To make the choice of K more precise, it is important to specify the lattice reduction process that is used. With a lattice reduction oracle, it suffices to choose a value K larger than the norm of the expected linear relation. Using the expected size of generic relations as in the previous section, it suffices to take:

\[ K = \sqrt{n}\,(2M)^{\frac{1}{n-t}}. \qquad (13.10) \]


When using the L³ lattice algorithm, K should be multiplied by a safety coefficient 2^{n/2}, to account for the fact that the first vector of the reduced basis may be larger than the shortest lattice vector by a factor up to 2^{n/2}. With this choice for K, the lattice reduction process outputs short vectors whose upper part is guaranteed to be zero and these vectors clearly correspond to linear relations with small coefficients. As pointed out above, the coefficients of the linear relation appear as coordinates of rank t + 1, ..., t + n of the output vector.
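As an illustrative sketch, the lattice of Equation (13.9) can be built and reduced with an off-the-shelf LLL implementation; here we assume the fpylll library (any LLL routine would do), and transpose the construction since fpylll reduces rows rather than columns:

from fpylll import IntegerMatrix, LLL   # assumed available

def short_relation(vectors, K):
    """vectors: list of n integer vectors, each with t coordinates.
    Builds the lattice of Equation (13.9), one generator per row, reduces it
    and returns the lambda coefficients of a relation if one is found."""
    n, t = len(vectors), len(vectors[0])
    A = IntegerMatrix(n, t + n)
    for i, v in enumerate(vectors):
        for j, coord in enumerate(v):
            A[i, j] = K * coord        # upper part, scaled by K
        A[i, t + i] = 1                # identity part carrying the coefficients
    LLL.reduction(A)
    for i in range(A.nrows):           # keep a vector whose upper part is zero
        if all(A[i, j] == 0 for j in range(t)):
            return [A[i, t + k] for k in range(n)]
    return None

# Example: among 3, 5 and 4 the relation 3 + 5 - 2*4 = 0 (or a multiple) appears
print(short_relation([[3], [5], [4]], K=1000))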

13.1.1.3 Generalization to approximate relations

In some cases, instead of searching for exact relations between the vectors, we may look for approximate relations, where Σ_{i=1}^{n} λ_i v_i is no longer 0 but a vector of small norm. A typical application of approximate relations occurs when the values of the vectors v_i correspond to approximations of vectors with real-valued coordinates.

In the case of approximate relations, one option is to choose K = 1. With this choice, output vectors are short but there is no special reason to obtain an upper part formed of zeros. This clearly corresponds to approximate dependencies with small coefficients.

Other choices for K are possible and the choice of K allows us to control the balance between the quality of the approximate relation and the size of the coefficients. When K is larger, the linear relation is more precise but the coefficients are larger.

13.1.1.4 Modular relations

Another possible extension is to consider linear relations which are no longer
exact but, instead, hold modulo some integer N . In other words, we need to
modify the lattice that is considered to deal with modular relations. Assuming
that N is large enough, the answer is very simple and consists of adding to
the lattice basis a few columns that ensure modular reduction. This is done




© 2009 by Taylor and Francis Group, LLC
Lattice-based cryptanalysis 401

by considering the lattice generated by the columns of the following matrix:

\[
\begin{pmatrix}
Kv_1 & Kv_2 & \cdots & Kv_n & KN\,I \\
1 & 0 & \cdots & 0 & 0 \\
0 & 1 & \cdots & 0 & 0 \\
\vdots & \vdots & \ddots & \vdots & \vdots \\
0 & 0 & \cdots & 1 & 0
\end{pmatrix}
\]

where I is a t × t identity matrix. Thanks to the additional columns, a modular
relation \sum_{i=1}^{n} \lambda_i v_i = 0 \pmod{N} corresponds to a lattice point given by the
vector:

\[
(0 \cdots 0 \; \lambda_1 \; \lambda_2 \cdots \lambda_n).
\]

Of course, this is not sufficient: we also need to make sure that such a
vector corresponding to a modular linear relation may appear as the result
of lattice reduction. To discuss this question, it is important to remark that
the above lattice includes short vectors which are not related to the existence
of any linear relation. These vectors can be obtained by multiplying any
of the first n vectors in the initial basis by a factor N and then reducing the
upper part modulo N. This process yields vectors whose coordinates are all
zero, with the exception of a single coordinate with value N. Applying the
above construction to all the v_i's, we get a family of n vectors of norm N that
are mutually orthogonal. Experiments show that, if N is too small, this family
appears in sequence as the first output vectors of a reduced basis, and thus
completely masks any useful information about linear relations.
Clearly, assuming a lattice reduction oracle, we need to check that the
short vector that corresponds to the linear relation coming from Lemma 13.2
is shorter than these systematic vectors of norm N. Since the short vector
associated with the linear relation has norm at most \sqrt{n}\,B, we need to check
that \sqrt{n}\,N^{t/n} < N, or equivalently that:

\[
N > n^{\frac{n}{2(n-t)}}. \tag{13.11}
\]

If the reduction is done using L3, an additional factor of 2^{n/2} is required to
guarantee that the first vector of the reduced basis corresponds to a genuine
linear relation.
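For concreteness, here is a quick numeric check of this bound; the function
and the parameter values below are illustrative, not taken from the text.

```python
# Minimal sketch: evaluate the lower bound on the modulus N from
# Equation (13.11). With a reduction oracle, N must exceed
# n^(n/(2(n-t))); with L3, an extra 2^(n/2) factor is needed.
def modulus_lower_bound(n: int, t: int, use_l3: bool = False) -> float:
    bound = n ** (n / (2 * (n - t)))
    if use_l3:
        bound *= 2 ** (n / 2)
    return bound

for n, t in [(20, 1), (20, 5), (50, 10)]:
    print(f"n={n}, t={t}: N > {modulus_lower_bound(n, t):.1f}")
```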
An interesting special case is the binary case, where N = 2. In this case,
any linear relation with four or more non-zero coefficients yields a vector
whose norm is at least as large as that of the systematic vectors of norm N.
Thus, we cannot hope that a lattice reduction approach finds a linear relation
with more than 3 non-zero coefficients. Moreover, relations with 3 coefficients
or fewer can easily be found using exhaustive search or birthday-based
algorithms, as in the sketch below. For this reason, no known lattice reduction
algorithms are used for attacking binary problems, such as finding short
codewords in linear codes.
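To make the last remark concrete, here is a small birthday-style sketch for
relations modulo 2 of weight at most 3; the vectors are packed into Python
integers and the parameters are illustrative.

```python
# Sketch: birthday search for relations mod 2 with at most three
# non-zero coefficients among n binary vectors of t bits (packed
# as Python ints). Toy parameters, for illustration only.
import random
from itertools import combinations

t, n = 12, 40
v = [random.getrandbits(t) for _ in range(n)]

# index all pairwise XORs, then look each single vector up
index = {}
for i, j in combinations(range(n), 2):
    index.setdefault(v[i] ^ v[j], []).append((i, j))

found = set()
for i, j in index.get(0, []):
    found.add((i, j))                        # weight-2: v_i = v_j
for k in range(n):
    if v[k] == 0:
        found.add((k,))                      # weight-1: v_k = 0
    for i, j in index.get(v[k], []):
        if k not in (i, j):
            found.add(tuple(sorted((i, j, k))))  # weight-3
for rel in sorted(found):
    print("relation among indices:", rel)
```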





13.1.2 Some applications of short dependence relations
13.1.2.1 Knapsack problems
Solving knapsack problems is an interesting special case of finding linear
relations between given numbers. Historically, breaking knapsack problems
was one major application of lattice-based cryptanalysis in the 80s and 90s.
In addition to this historical role, solving a knapsack problem requires more
involved techniques than the general case, because the expected relations have
all their coefficients in {0, 1}. In cryptographic scenarios, we know that such
a relation exists between the given elements of the knapsack a_1, . . . , a_n and
the target sum s = \sum_{i=1}^{n} \epsilon_i a_i. Moreover, we know that the Euclidean norm
of the vector \epsilon that encodes this relation is \sqrt{\alpha n}, where \alpha is the proportion
of ones in the relation. Depending on the considered cryptosystem, \alpha may
or may not be known to the cryptanalyst but, in many practical examples, it
is a part of the cryptographic system itself. Furthermore, \alpha is an important
parameter when trying to analyze the performance of lattice-based attacks
against knapsack problems. When each coefficient of \epsilon is chosen uniformly at
random, the number of ones is close to n/2 with high probability and we can
assume that \alpha = 1/2.
Another important parameter with knapsack problems is their density d,
defined as:

\[
d = \frac{n}{\log_2(\max_i a_i)}. \tag{13.12}
\]

Since this parameter is the ratio between the number of elements in the
knapsack and the number of bits in each element, it controls the expected size
of “parasitical” linear relations of short norm between the a_i, with coefficients
not restricted to {0, 1}.
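For instance, the density of an instance is a one-line computation; the
random instance below is purely illustrative.

```python
# Sketch: density of a knapsack instance, per Equation (13.12).
import math
import random

def knapsack_density(a: list[int]) -> float:
    return len(a) / math.log2(max(a))

n, bits = 40, 100                    # 40 elements of 100 bits each
a = [random.getrandbits(bits) | 1 for _ in range(n)]
print(f"density d = {knapsack_density(a):.3f}")   # close to 40/100
```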
It was shown in [LO85] that, when the density is low, parasitical vectors are
large and, thus, the shortest vector gives the solution to any random instance
of the knapsack problem. If we use the lattice from Equation (13.9), for
vectors on one coordinate only, and if we assume that shortest lattice vectors
can be efficiently computed (even if this is not totally accurate), then low
density means d < 0.6463. Later, in [CJL+92], this condition was improved to
d < 0.9408. In order to reach that bound, either of the two following lattices
(generated by columns) can be used:
\[
\begin{pmatrix}
Ka_1 & Ka_2 & \cdots & Ka_n & -Ks \\
n+1 & -1 & \cdots & -1 & -1 \\
-1 & n+1 & \cdots & -1 & -1 \\
\vdots & \vdots & \ddots & \vdots & \vdots \\
-1 & -1 & \cdots & n+1 & -1 \\
-1 & -1 & \cdots & -1 & n+1
\end{pmatrix}
\qquad
\begin{pmatrix}
Ka_1 & Ka_2 & \cdots & Ka_n & Ks \\
1 & 0 & \cdots & 0 & 1/2 \\
0 & 1 & \cdots & 0 & 1/2 \\
\vdots & \vdots & \ddots & \vdots & \vdots \\
0 & 0 & \cdots & 1 & 1/2
\end{pmatrix}
\]

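As an illustration, the following sketch mounts the attack with the second
lattice above on a toy low-density instance, again assuming fpylll; the 1/2
entries are scaled by 2 to keep the basis integral, which merely rescales the
lattice, and all parameters are illustrative.

```python
# Sketch: toy low-density knapsack attack with the second lattice
# above (the [CJL+92] shape), 1/2 entries scaled by 2 to stay
# integral. The dimension is small enough for plain L3.
import random
from fpylll import IntegerMatrix, LLL

n, bits = 10, 30                 # density 10/30, well below 0.9408
a = [random.getrandbits(bits) | 1 for _ in range(n)]
eps = [random.randint(0, 1) for _ in range(n)]
s = sum(e * ai for e, ai in zip(eps, a))

K = 1 << (bits + n)              # large scaling constant
B = IntegerMatrix(n + 1, n + 1)  # rows = basis vectors
for i in range(n):
    B[i, 0] = 2 * K * a[i]
    B[i, 1 + i] = 2
B[n, 0] = 2 * K * s
for j in range(n):
    B[n, 1 + j] = 1

LLL.reduction(B)
# the solution appears as (0, +/-1, ..., +/-1), entries 2*eps_j - 1
for i in range(n + 1):
    row = [B[i, j] for j in range(n + 1)]
    if row[0] == 0 and all(abs(x) == 1 for x in row[1:]):
        for sgn in (1, -1):      # the global sign is unknown
            cand = [(sgn * x + 1) // 2 for x in row[1:]]
            if sum(c * ai for c, ai in zip(cand, a)) == s:
                print("recovered solution:", cand == eps)
```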
Before we close this section, let us warn the reader about the meaning of
low-density attacks. The inequality d < 0.9408 provides a provable guarantee





that, from a shortest vector of a lattice computed from the problem, one
can, with high probability, solve a random instance of the original knapsack
problem. Note, however, that instances with a larger density can sometimes
still be solved using this approach. Moreover, the critical density given here
is for knapsack problems where the solution is balanced, with about half zeros
and half ones. For knapsacks with an unbalanced solution, i.e., when the
parameter \alpha given above differs from 1/2, the critical density can be much
higher.

13.1.2.2 Polynomial relations
Finding the minimal polynomial of a real algebraic number x of degree d
corresponds to searching for a linear dependency between 1, x, x^2, . . . , x^d.
Since we are working with integer lattices, we choose a large integer K and we
try to find an approximate relation between the closest integers to K, Kx, Kx^2,
. . . , Kx^d. More precisely, we reduce the following lattice, given by columns:

\[
\begin{pmatrix}
K & Kx & Kx^2 & \cdots & Kx^d \\
1 & 0 & 0 & \cdots & 0 \\
0 & 1 & 0 & \cdots & 0 \\
0 & 0 & 1 & \cdots & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
0 & 0 & 0 & \cdots & 1
\end{pmatrix}
\]

The first vector of the reduced lattice can be written as:

\[
(a_0 \; a_1 \; \cdots \; a_d).
\]

We wish to interpret a_0, . . . , a_d as the coefficients of the minimal
polynomial of x, i.e., we want to conclude that a_0 + a_1 x + a_2 x^2 + · · · + a_d x^d = 0.
The most important parameters here are K and d. If d is smaller than the
degree of the minimal polynomial of x, then this technique cannot succeed.
Likewise, if K is too small, then it cannot succeed either. To see this, assume
for example that x is between 0 and 1 and apply Lemma 13.1: this yields a
linear combination of the elements on the first row of the above matrix with
coefficients bounded above by B, where B satisfies:

\[
\log B = \frac{\log K + \log d + 1}{n - 1}.
\]
If K is small, this relation is much more likely to appear as an output of lattice
reduction algorithms than the one corresponding to the minimal polynomial.
In fact, a reasonable choice is to take K ≥ (max_i |a_i|)^{2d}. In other words, K
should be much larger than the expected size of the coefficients of the minimal
polynomial. If d is not exactly known, for example if we only know an upper
bound on the degree of the minimal polynomial of x, then the following trick
can be applied: take the first two or three vectors appearing in the output





reduced lattice, transform them into polynomials and compute their GCD. If
K was large enough, the minimal polynomial of x is usually obtained.
It is very important to know that the heuristic procedure we just described
can give positive results, i.e., it can find a minimal polynomial, but it cannot
give a negative result. Moreover, if the method succeeds, we have a candidate
for the minimal polynomial; we can then check this candidate either numerically,
up to arbitrarily large precision, or formally if the situation permits.
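As an illustration, here is a sketch of this procedure for x = √2 + √3, whose
minimal polynomial is x^4 − 10x^2 + 1; fpylll is assumed as before, and the
value of K is an illustrative choice well above the expected coefficient size.

```python
# Sketch: recovering the minimal polynomial of sqrt(2) + sqrt(3)
# (which is x^4 - 10x^2 + 1) by reducing the lattice above,
# transposed into rows (round(K*x^i), e_i) for fpylll.
from math import sqrt
from fpylll import IntegerMatrix, LLL

x = sqrt(2) + sqrt(3)
d = 4                              # (an upper bound on) the degree
K = 10**12                         # much larger than the coefficients

n = d + 1
B = IntegerMatrix(n, 1 + n)
for i in range(n):
    B[i, 0] = round(K * x**i)
    B[i, 1 + i] = 1

LLL.reduction(B)
coeffs = [B[0, 1 + j] for j in range(n)]
print("candidate coefficients a_0..a_d:", coeffs)
# expected output: +/-(1, 0, -10, 0, 1)
```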

13.1.2.3 NTRU lattices
Another kind of lattice has been an important source of progress for
practical lattice reduction: lattices obtained from the NTRU cryptosystem.
Without entering into the details of this cryptosystem, let us simply state that
the resulting lattices have some very specific properties. In fact, they are very
similar to knapsack lattices, except that we are considering modular knapsacks
on vectors. Depending on the exact version of the NTRU system, these
knapsack instances contain 0, 1 and also −1 values; moreover, the exact
number of 1 and −1 values is usually known. The modulus used in these
knapsacks is very small; for example, 64 is a typical value. In addition, there
is some extra structure coming from the fact that the vectors in this modular
knapsack are rotated copies of each other. Another difference with pure
knapsack systems is that the sum of vectors we are searching for does not
exactly sum to zero, but instead to a vector with a fixed proportion of 1 and
−1 values.
More precisely, starting from a vector with n coordinates, we are considering
a lattice spanned by the columns of a 2n × 2n matrix of the following form:

\[
\begin{pmatrix}
a_1 & a_2 & \cdots & a_n & q & 0 & \cdots & 0 \\
a_2 & a_3 & \cdots & a_1 & 0 & q & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots & \vdots & \vdots & \ddots & \vdots \\
a_n & a_1 & \cdots & a_{n-1} & 0 & 0 & \cdots & q \\
1 & 0 & \cdots & 0 & 0 & 0 & \cdots & 0 \\
0 & 1 & \cdots & 0 & 0 & 0 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots & \vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & 1 & 0 & 0 & \cdots & 0
\end{pmatrix}
\]

This lattice has many interesting properties:

• Starting from any vector in the lattice and rotating the coordinates of
the first and of the second half by the same amount, we obtain another
lattice vector.

• The short vectors corresponding to the NTRU secret key contain a small
number of 1 and −1 values and have a very short norm.

• The initial lattice basis given above already contains short vectors, of
norm q, corresponding to the reduction modulo q rule.
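A sketch of this construction in plain Python follows (column convention as
in the text; transpose the result for row-oriented libraries such as fpylll).
The vector a and the modulus q below are toy illustrative values, not real
NTRU parameters.

```python
# Sketch: assembling the 2n x 2n NTRU-style basis above, with a
# circulant block of rotations of a, a q*I block for reduction
# modulo q, an identity block and a zero block. Toy values only.
def ntru_basis(a: list[int], q: int) -> list[list[int]]:
    n = len(a)
    M = [[0] * (2 * n) for _ in range(2 * n)]
    for i in range(n):
        for j in range(n):
            M[i][j] = a[(i + j) % n]   # row i holds the i-th rotation
        M[i][n + i] = q                # short columns of norm q
        M[n + i][i] = 1                # identity block below
    return M

basis = ntru_basis([3, 1, 4, 1, 5, 9, 2, 6], q=64)
for row in basis:
    print(row)
```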





Despite these extra properties, fully reducing the NTRU lattice in large di-
mension in order to recover the secret key is a real challenge. This shows
that even when working with small numbers, lattice reduction is not an easy
task. A state-of-the-art of lattice reduction methods applied, in particular, to
