
© 2009 by Taylor and Francis Group, LLC

Attacks on stream ciphers 377

linear function coincide with the output of f for some fraction p of the 2^k possible inputs. Let y_t = L_f(x_t, x_{t−δ_1}, ···, x_{t−δ_{k−1}}) and remark that when the k-tuple on the right-hand side belongs to the set of inputs where f and L_f agree, then y_t = z_t. Moreover, since y_t is a linear combination of outputs of the same LFSR, with shifted initial values, y_t itself can be produced by the LFSR when it is initialized with the XOR of all these initial values. Assuming for simplicity that the k-tuples (x_t, x_{t−δ_1}, ···, x_{t−δ_{k−1}}) are random and independent, it should be clear that z_t is the image of y_t under a binary symmetric channel with probability p and bias ε = 2p − 1.

Of course, this independence hypothesis is clearly false. However, k-tuples which are generated by LFSRs are well balanced and, despite the dependence between successive k-tuples, attacks based on the binary symmetric channel do work well in practice. Correlation attacks work precisely within this framework: they forget about the exact output function f and simply focus on the bias ε of the best possible linear approximation.

12.2.2 Maximum likelihood decoding

Given a noisy LFSR with bias ε, a natural question is to ponder whether the real output of the LFSR can be recovered from N bits of noisy output. This question arises both in coding theory and in cryptography. It is very useful to first solve the problem with unlimited computing power. Clearly, if the LFSR is fully specified, then its feedback polynomial is already known and recovering its exact output is equivalent to finding its initial state.

With unlimited computing power, it is possible to compute the LFSR output on N bits for all initial states. Comparing all these sequences to the noisy output z, we can measure the Hamming distance of each candidate to the noisy output, i.e., the number of bits where the sequences differ. For the correct guess, the average Hamming distance is (1 − p)N. For incorrect guesses, assuming that they behave essentially like random strings, we expect an average distance of N/2. The average distance for the correct guess simply comes from the way the noise is generated. However, the exact behavior of incorrect guesses is slightly harder to make explicit. In fact, the difference between the observed stream and an incorrect guess is the XOR of three terms: the LFSR output for the guessed initial state, the LFSR output for the correct initial state and the noise. By linearity, the two LFSR outputs can be grouped into a single LFSR output whose initial state is obtained by XORing the two initial states. As a consequence, the difference is a noisy copy of this LFSR output. Thus, it is almost balanced, i.e., it contains about half 0s and half 1s. This implies that the average distance is N/2 as expected. However, since there are exponentially many incorrect guesses, a few of these bad guesses may be much closer to the observed string.
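This exhaustive decoding is easy to simulate. The following sketch uses illustrative assumptions, not values from the text: an 8-bit register with an assumed primitive feedback polynomial, bias ε = 0.5, and 600 bits of noisy keystream. It enumerates all non-zero initial states and keeps the one closest in Hamming distance to the noisy stream:

```python
import random

def lfsr_bits(taps, state, nbits):
    """Fibonacci LFSR: output state[0], feedback = XOR of tapped cells."""
    state = list(state)
    out = []
    for _ in range(nbits):
        out.append(state[0])
        fb = 0
        for t in taps:
            fb ^= state[t]
        state = state[1:] + [fb]
    return out

def ml_decode(taps, n, z):
    """Exhaustive maximum likelihood decoding: try all 2^n - 1 non-zero
    initial states, keep the output closest to the noisy stream z."""
    best, best_dist = None, len(z) + 1
    for s in range(1, 1 << n):
        state = [(s >> i) & 1 for i in range(n)]
        d = sum(a != b for a, b in zip(lfsr_bits(taps, state, len(z)), z))
        if d < best_dist:
            best, best_dist = state, d
    return best, best_dist

# Assumed parameters: taps for x^8 + x^6 + x^5 + x^4 + 1 (a primitive
# polynomial); noise flips each bit with probability (1 - eps)/2.
random.seed(1)
taps = [0, 4, 5, 6]
secret = [1, 0, 1, 1, 0, 0, 1, 0]
N, eps = 600, 0.5                   # p = (1 + eps)/2 = 0.75
clean = lfsr_bits(taps, secret, N)
noisy = [b ^ (random.random() < (1 - eps) / 2) for b in clean]
guess, dist = ml_decode(taps, 8, noisy)
print(guess == secret, dist)
```

For the correct state the distance concentrates around (1 − p)N = 150, while wrong states stay near N/2 = 300, which is why picking the minimum recovers the secret here.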

One important question with maximum likelihood decoding is to determine the parameters for which the correct guess is likely to correspond to the smallest Hamming distance to the noisy output. Indeed, when this happens, it is


378 Algorithmic Cryptanalysis

possible to determine the correct guess by keeping the candidate corresponding to the minimal Hamming distance. Note that in some contexts, a weaker condition may suffice. For example, if it is possible to test whether a likely candidate is indeed correct, then we can be satisfied if the maximum likelihood decoding produces a short list of candidates that contains the correct one.

In practical cryptanalysis, this exhaustive approach can be used to cryptanalyze keystream generators made of several independent LFSRs together with a non-linear output rule. Indeed, in this context, each LFSR has a small internal state compared to the complete cryptosystem and an exhaustive search for the initial state of a single LFSR is much more efficient than a global exhaustive key search. As a consequence, if a correlation of bias ε can be found between the output of the keystream generator and one of the internal LFSRs with a state of length n, then the initial state of this register can be recovered in time N·2^n using N ≈ 1/ε² bits of keystream. In particular, this can be applied to Geffe's generators. This attack, initially proposed in [Sie84, Sie85], is called a correlation attack.

12.2.2.1 Necessary amount of keystream for correlation attacks

In order to determine the amount of keystream required for correlation attacks, we need to determine how close to the observed string a wrong guess can be. Assume that we are observing N bits of keystream, that the bias for the correct guess is ε and that we are testing C incorrect candidates. We need to determine the probability of having an incorrect candidate closer to the observed stream than the correct one. If we are interested in the setting where seemingly valid candidates can be further tested, it is also useful to determine the average number of incorrect candidates closer than the correct one.

On average, the correct candidate agrees with the observed keystream in (1 + ε)N/2 positions, but the exact number can vary around this average. Similarly, on average, each incorrect candidate agrees with the observed keystream in N/2 positions, but the exact number can vary. Thus, to answer our questions, we need to understand these variations around the average.

We first consider a simpler case offering the choice between two candidates, a correct and an incorrect one. In this case, the problem is equivalent to distinguishing between a perfect random generator¹ and a slightly biased generator which outputs 0 with probability (1 + ε)/2. For each of the two generators, we first compute the probability that it outputs T zeros and N − T ones after N measurements. Taking into account the number of possible arrangements of the zeros, this probability can be written as:

P_T(p) = (N choose T) · p^T · (1 − p)^{N−T}.

¹That is, a generator which outputs each successive bit independently, with probability 1/2 of giving a 0 and probability 1/2 of giving a 1.


λ      1 − (1/2)·erfc(λ/√2)
0      0.5
0.1    0.5398
0.5    0.6915
1      0.8413
1.5    0.9332
2      0.9772
2.5    0.9938
3      0.9987

Table 12.1: Typical probabilities with binomial distributions

When we observe T zeros, it seems natural to compare P_T(1/2) and P_T((1 + ε)/2) and to keep the candidate corresponding to the largest probability. In fact, this is the best that can be done. When ε is small, P_T(1/2) is the larger probability for T < T_0, where T_0 = (1 + ε/2)·N/2. To determine the probability of success of this strategy, we need to compute sums of P_T(p) for T in intervals of the form [0, T_0] and [T_0, N]. The problem is that there are no closed formulas for these sums. However, when N is large, there is a standard way of approximating the formulas. Without going into the details, the idea is to first view T as a random variable obtained by summing N values, each equal to 1 with probability p and to 0 with probability 1 − p. By definition, T is said to follow a binomial distribution. Then, the binomial distribution is approximated by a Gaussian (or normal) distribution with average E = pN and variance σ² = p(1 − p)N. For a normal distribution, the probability of measuring a value smaller than E + λσ is:

Q(E + λσ) = (1/√(2π)) · ∫_{−∞}^{λ} e^{−u²/2} du.    (12.2)

Moreover, with most computer algebra systems, this probability can be computed using the complementary error function, usually denoted by erfc, according to the relation:

Q(E + λσ) = 1 − (1/2)·erfc(λ/√2).    (12.3)

We give some typical values in Table 12.1.
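In practice, Equation (12.3) is directly available through `math.erfc`. This short check (a sketch, not taken from the text) reproduces the entries of Table 12.1:

```python
import math

def tail_prob(lam):
    """Q(E + lam*sigma) of Equation (12.3): 1 - erfc(lam/sqrt(2))/2."""
    return 1.0 - math.erfc(lam / math.sqrt(2.0)) / 2.0

for lam in (0, 0.1, 0.5, 1, 1.5, 2, 2.5, 3):
    print(lam, round(tail_prob(lam), 4))
```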

With this tool, computing the probability of successfully distinguishing the correct and incorrect candidates becomes easy. Write the number of experiments N as (2λ/ε)², i.e., let

λ = (1/2) · ε · √N    (12.4)

and define a threshold T_0 = (1 + ε/2)N/2. For the incorrect candidate, the average value of T is T_i = N/2 and σ_i = √N/2 = λ/ε. As a consequence, T_0 − T_i = λ²/ε, which is λσ_i. For the correct candidate, we have T_c = (1 + ε)N/2 and σ_c = √(N(1 − ε²))/2. When ε is small, we can conveniently ignore the difference between σ_c and σ_i and approximate T_0 − T_c by −λσ_i. As a consequence, if we predict that the observed distribution is the correct one when T ≥ T_0 and the incorrect one otherwise, the probability of success is obtained by looking up λ in Table 12.1.

The more general case with C incorrect candidates and a correct one can be viewed in several different ways which yield the same threshold T_0. One option is to regroup the candidates in two groups: a bad one that occurs with probability C/(C + 1) and a good one that occurs with probability 1/(C + 1). Using the previous notations, the probability of observing T zeros is P_T(1/2) in the bad case and P_T((1 + ε)/2) in the good case. Using this option, we can apply the Neyman-Pearson lemma, which says that an optimal distinguisher should predict that the good case occurred, if and only if, P_T((1 + ε)/2) > C · P_T(1/2).
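This rule is easy to evaluate exactly with binomial probabilities. In the sketch below, the parameters N, ε and C are illustrative assumptions, not values from the text; the code locates the smallest count of zeros for which the Neyman-Pearson test votes for the good case:

```python
import math

def P(T, p, N):
    """Binomial probability of observing T zeros among N bits,
    each bit being 0 with probability p."""
    return math.comb(N, T) * p**T * (1 - p)**(N - T)

# Assumed illustrative parameters.
N, eps, C = 1000, 0.2, 15
p_good = (1 + eps) / 2
# Optimal rule: predict the good case iff P_T(p_good) > C * P_T(1/2).
threshold = min(T for T in range(N + 1) if P(T, p_good, N) > C * P(T, 0.5, N))
print(threshold)
```

The resulting cutoff sits slightly above the two-candidate threshold T_0 = (1 + ε/2)N/2 = 550, reflecting the prior weight C/(C + 1) of the bad candidates.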

12.2.3 Fast correlation attacks

While correlation attacks are very effective against stream ciphers which involve small LFSRs, they quickly become impractical as the size of the target registers grows. With large LFSRs, we would like a different algorithm to exploit the correlation without having to pay an exponential complexity in terms of the LFSR's length. Fast correlation attacks, proposed by Meier and Staffelbach (see [MS89]), offer this possibility.

Note that there exists a much wider variety of correlation attacks than can be presented here. For example, conditional correlation attacks use the fact that some specific configurations of output bits give the value of some linear combination of LFSR bits much more precisely than in the average case. To use this fact, we need parity checks that only use bit positions where these specific configurations occur. For more information, see [LMV05]. Let us also mention iterative correlation attacks [MG90], multipass correlation attacks [ZF06] and vectorial fast correlation attacks [GH05].

12.2.3.1 Binary symmetric channel with repetitions

In order to illustrate one essential idea needed to devise such a correlation attack, we start with an easy case. This easy case can be formulated as a learning problem. In this problem, we try to learn a secret sequence S of length n. To this effect, we are given access to several independent noisy copies of S obtained through a binary symmetric channel with bias ε. How many copies are needed to recover S?

We start by solving the case of a single bit s. After obtaining one measurement of s through the binary symmetric channel, we can predict s with probability p = (1 + ε)/2. With many measurements, the best we can do is to use a majority rule in order to determine our prediction. If we see more 0s than 1s, we predict a 0 and similarly in the opposite case. In the unlikely case where there are as many 0s as 1s, we output a random prediction. Following the results of Section 12.2.2.1, the probability of correctly predicting 0 is thus:

P = Σ_{t=(N+1)/2}^{N} P_t    for odd N, and
P = Σ_{t=N/2+1}^{N} P_t + (1/2)·P_{N/2}    for even N.    (12.5)

Due to the symmetry between the cases s = 0 and s = 1, P is also the overall probability of correct prediction. Once again, this probability can be computed using the complementary error function. The only difference with the previous case comes from the fact that, since we now have two opposite biases, we compute λ = ε·√N, i.e., rewrite Equation (12.4) without the factor 1/2. In the reverse direction, if we specify the desired probability of success, this fixes λ and we need to make N = (λ/ε)² measurements.

This is usually summarized by saying that a correlation attack with bias ε requires O(ε^{−2}) measurements.
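A quick simulation confirms the N ≈ (λ/ε)² rule for majority decoding; the bias and trial counts below are illustrative assumptions, not values from the text:

```python
import random

def majority_recover(bit, eps, N, rng):
    """Recover one bit from N noisy copies (BSC with bias eps)
    by majority vote, breaking ties at random."""
    ones = sum(bit ^ (rng.random() < (1 - eps) / 2) for _ in range(N))
    if 2 * ones == N:
        return rng.randrange(2)
    return 1 if 2 * ones > N else 0

rng = random.Random(42)
eps = 0.1
N = int((3 / eps) ** 2)        # lambda = 3, so ~99.87% success expected
ok = sum(majority_recover(1, eps, N, rng) == 1 for _ in range(200))
print(ok, "out of 200")
```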

12.2.3.2 A basic attack on LFSRs

With an LFSR-based keystream generator, unless the cryptosystem as a whole presents some unexpected weakness, it is unlikely that the cryptanalyst has access to many noisy copies of the same LFSR output. As a consequence, we cannot directly use the above method. Instead, we need to exploit the redundancy within the LFSR output in order to create such repeated copies of individual bits. Note that we have encountered a similar problem in Chapter 9, Section 9.3 with the Goldreich-Levin theorem. However, with the Goldreich-Levin theorem, we could freely choose the noisy scalar products we wanted to learn. Here, we no longer have this freedom and we need to work with noisy values which are fixed by the LFSR specifications.

Remember that in order to know the complete state of an LFSR it suffices to learn the exact values of n consecutive bits. Since n is a relatively small number, we can first focus on the recovery of a single bit. The key idea is the construction of parity check equations for the LFSR. A parity check equation is simply a systematic linear equality of the form:

x_{i_1} ⊕ x_{i_2} ⊕ ··· ⊕ x_{i_t} = 0.    (12.6)

Here, systematic means that the equation is true regardless of the initial value of the LFSR.

Each such parity check yields a biased prediction for x_{i_1}:

x_{i_1} = z_{i_2} ⊕ ··· ⊕ z_{i_t}.    (12.7)

Since each value z_{i_2}, ..., z_{i_t} is an approximation of x_{i_2}, ..., x_{i_t} with bias ε, and since all these approximations involve independent random choices in the binary symmetric channel model, we can show that we obtain an approximation of x_{i_1} with bias ε^{t−1}. This can be proved by repeated application of the following lemma.

LEMMA 12.1
Given two independent approximations x̃ and ỹ of x and y with respective biases ε_x and ε_y, the XOR x̃ ⊕ ỹ is an approximation of x ⊕ y with bias ε_x·ε_y.

PROOF We see that x̃ ⊕ ỹ is a correct approximation of x ⊕ y if x̃ and ỹ are either both correct or both incorrect. The first event, where both are correct, occurs, due to independence, with probability (1/4)·(1 + ε_x)(1 + ε_y). The second event occurs with probability (1/4)·(1 − ε_x)(1 − ε_y). Summing the two probabilities, we find that the total probability of success is:

(1/4) · (2 + 2·ε_x·ε_y) = (1/2) · (1 + ε_x·ε_y).    (12.8)

Thus the bias of the combined approximation is ε_x·ε_y.

By repeatedly applying this lemma, we obtain the bias of any sum of independent approximations simply by multiplying the elementary biases. This can be applied to a parity check involving t bits as above, called a parity check of weight t, to obtain an approximation of one of the involved bits with bias ε^{t−1}. As a consequence, if we obtain a large number of parity checks all involving the same bit x_{i_1}, we can predict this bit with high confidence thanks to a majority vote between the individual predictions. Using the results of Section 12.2.3.1, we see that this number is of the order of ε^{−2t+2}.
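Lemma 12.1 can be checked exhaustively for any small set of biases (the example biases below are arbitrary): enumerating every pattern of correct/incorrect components shows that the bias of the XOR is exactly the product of the individual biases.

```python
from itertools import product

def xor_bias(biases):
    """Exact bias of the XOR of independent noisy bits: sum the
    probability of every error pattern with an even number of errors."""
    correct = 0.0
    for errs in product((0, 1), repeat=len(biases)):
        prob = 1.0
        for e, b in zip(errs, biases):
            prob *= (1 - b) / 2 if e else (1 + b) / 2
        if sum(errs) % 2 == 0:       # even number of errors: XOR is right
            correct += prob
    return 2 * correct - 1

print(xor_bias([0.5, 0.25, 0.2]))    # equals the product 0.5 * 0.25 * 0.2
```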

To evaluate the performance of this basic attack, it is important to take several parameters into account. The first parameter is, of course, the amount of time needed to evaluate all the parity checks and perform the majority vote. However, the amount of keystream required for the attack is also very important. The other two important parameters are the initial bias ε and the length n of the register. In order to understand the relationship between these parameters, we now study the conditions of existence of enough parity checks to perform the basic attack. For this analysis, we do not consider algorithms for generating the parity checks, merely their existence.

Due to the linearity of the LFSR, for each output bit, there exist fixed constants, which only depend on the feedback polynomial, such that:

x_j = Σ_{i=0}^{n−1} c_i^{(j)} x_i.    (12.9)

This allows us to express each output bit as a linear function of the bits contained in the initial value of the LFSR. It is convenient to group the coefficients c_i^{(j)} in a vector c^{(j)} and view the initial value of the LFSR as a vector X.


With these notations, we can rewrite:

x_j = (c^{(j)} | X).    (12.10)

To construct parity checks of weight t, we look for sets of t vectors c^{(j_1)}, ..., c^{(j_t)} such that:

Σ_{i=1}^{t} c^{(j_i)} = 0.    (12.11)

Indeed, by linearity of the scalar product, this implies:

Σ_{i=1}^{t} x_{j_i} = (Σ_{i=1}^{t} c^{(j_i)} | X) = 0.    (12.12)

To count the number of parity checks of weight t, all involving some fixed position, say j_1 = 0, we need to choose the remaining positions j_2 to j_t and check that the sum of the n-bit coefficient vectors is 0. Assuming that we have access to a noisy output of the LFSR of length N, each of the above positions is bounded by N, and there are ∏_{i=1}^{t−1} (N − i)/(t − 1)! unordered choices for the t − 1 positions. We expect that a fraction 2^{−n} of these choices sums to zero. As a consequence, the average number of available parity checks for j_1 is:

(N−1 choose t−1) · 2^{−n} = (N − 1)! / (2^n · (t − 1)! · (N − t)!).    (12.13)

With an elementary bias ε, we compute the probability of correct prediction by letting:

λ = √( (N − 1)! · ε^{2t−2} / (2^n · (t − 1)! · (N − t)!) )    (12.14)

and by looking up λ in Table 12.1.
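Equations (12.13) and (12.14) are easy to evaluate numerically. In the sketch below, the parameter values (n = 40, t = 4, ε = 0.25 and the keystream lengths) are arbitrary illustrations, not values from the text:

```python
import math

def expected_checks(N, n, t):
    """Equation (12.13): average number of weight-t parity checks
    through a fixed position, binom(N-1, t-1) / 2^n."""
    return math.comb(N - 1, t - 1) / 2.0 ** n

def success_lambda(N, n, t, eps):
    """Equation (12.14): lambda = sqrt(#checks * eps^(2t-2))."""
    return math.sqrt(expected_checks(N, n, t)) * eps ** (t - 1)

for N in (2**18, 2**20, 2**22):
    print(N, round(success_lambda(N, 40, 4, 0.25), 2))
```

With these parameters, λ only reaches the comfortable λ ≈ 3 regime of Table 12.1 around N = 2^20 bits of keystream, showing how sharply the attack depends on the available stream length.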

Note that, given a set of parity checks allowing recovery of x_{j_1}, it is easy to modify it to recover x_{j_1+1} by shifting all indices by 1. Similarly, we can obtain x_{j_1+2}, ..., x_{j_1+n−1}. Of course, since the indices are shifted, the highest required position is no longer N but can go up to N + n − 1. However, since n is normally very small when compared to N, this issue can be neglected.

12.2.4 Algorithmic aspects of fast correlation attacks

The advantage of fast correlation attacks compared to basic correlation attacks is that we no longer need to exhaustively search the 2^n − 1 possible initialization values. Instead, we need to find and evaluate all the necessary parity checks. We address these algorithmic issues in this section.


12.2.4.1 Computing parity checks

The first algorithmic problem we encounter is to find a large enough number of parity checks for fast correlation attacks. This can be done in many ways. However, a common pre-requisite is to express each output bit of a given LFSR as a linear combination of the input bits, i.e., following the notation of Section 12.2.3.2, to compute the vectors c^{(j)}. A very simple way to do this is to initialize n copies of the LFSR, with the same feedback polynomial, with n initial values containing n − 1 zeros and a single 1. In the copy numbered i, the only non-zero initial bit is in position i. By linearity, the n bits of output obtained at time j form the vector c^{(j)}. An easy way to speed up this process is to use a bitslicing approach as in Chapter 5 and apply the LFSR recursion to n-bit words, thus running the n copies in parallel. This is described in pseudo-code as Algorithm 12.1.

Algorithm 12.1 Computing formal expression of LFSR outputs
Require: Input coefficients of LFSR (α_0, ..., α_{n−1})
Require: Bound on stream length N
  Allocate array A of N words of n bits
  Initialize A[0 ··· n − 1] to 0
  for i from 0 to n − 1 do
    Set i-th bit of A[i] to 1
  end for
  for i from n to N do
    Let A[i] = ⊕_{j=0}^{n−1} α_j · A[i − n + j]
  end for
  Output array A   {A[i] contains the expression of the i-th bit of LFSR output as a linear combination of bits 0 to n − 1.}
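In Python, the bitsliced trick of Algorithm 12.1 amounts to manipulating n-bit integer masks. The sketch below cross-checks the masks against a direct LFSR run; the feedback polynomial and initial state are arbitrary assumptions:

```python
def lfsr_expressions(alpha, N):
    """Algorithm 12.1: A[i] is an n-bit mask giving output bit i as a
    XOR of initial bits 0..n-1; machine words play the n parallel copies."""
    n = len(alpha)
    A = [1 << i for i in range(n)]           # bit i of A[i] set to 1
    for i in range(n, N):
        v = 0
        for j in range(n):
            if alpha[j]:
                v ^= A[i - n + j]
        A.append(v)
    return A

def lfsr_run(alpha, init, N):
    """Direct run of x_{n+i} = sum_j alpha_j x_{i+j}, for cross-checking."""
    x = list(init)
    n = len(alpha)
    while len(x) < N:
        x.append(sum(a * b for a, b in zip(alpha, x[-n:])) % 2)
    return x

alpha = [1, 0, 1, 0, 0]                      # assumed: x_{i+5} = x_i + x_{i+2}
init = [0, 1, 1, 0, 1]
A = lfsr_expressions(alpha, 64)
mask = sum(b << k for k, b in enumerate(init))
evaluated = [bin(A[i] & mask).count("1") % 2 for i in range(64)]
print(evaluated == lfsr_run(alpha, init, 64))
```

Evaluating a mask on an initial state is just an AND followed by a parity count, which is what makes the later parity-check evaluations cheap.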

Once the n-bit vectors of coefficients are known, we need to form parity checks of weight t, by summing together one fixed vector, corresponding to a position we want to predict, and (t − 1) other vectors corresponding to other positions.

Brute force. The first idea that comes to mind to solve this problem is simply to exhaustively try all possible sets of positions and keep the sets which sum to zero. The running time is (N−1 choose t−1) and can be closely approximated by N^{t−1}/(t − 1)!, ignoring the difference between N and N − i for small values of i.


Birthday approach. The next idea is to remark that each parity check can be written as an equality between the vectors corresponding to two half parity checks in two smaller lists. Indeed, it is possible to equate the expressions of two sums of t/2 positions each. This method has long been the method of choice for constructing parity checks. Note that each parity check can be split into two parts in many different ways and that we need to apply some care in order to avoid generating the same parity check many times, once for each different split.
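A sketch of the birthday approach for weight-4 checks follows; the register and stream length are arbitrary assumptions. Every half-check sum A[a] ^ A[b] is stored in a table, colliding halves are matched, and requiring a < b < c < d keeps exactly one of the three possible splits of each check:

```python
from collections import defaultdict

def expressions(alpha, N):
    """Masks giving each LFSR output bit as a XOR of the n initial bits."""
    n = len(alpha)
    A = [1 << i for i in range(n)]
    for i in range(n, N):
        v = 0
        for j in range(n):
            if alpha[j]:
                v ^= A[i - n + j]
        A.append(v)
    return A

def weight4_checks(A):
    """Birthday search: x_a + x_b + x_c + x_d = 0 iff the half sums
    A[a]^A[b] and A[c]^A[d] collide."""
    halves = defaultdict(list)
    for a in range(len(A)):
        for b in range(a + 1, len(A)):
            halves[A[a] ^ A[b]].append((a, b))
    checks = set()
    for pairs in halves.values():
        for i in range(len(pairs)):
            for j in range(i + 1, len(pairs)):
                (a, b), (c, d) = pairs[i], pairs[j]
                if b < c:                    # canonical split, no duplicates
                    checks.add((a, b, c, d))
    return checks

alpha = [1, 1, 0, 0, 0, 1]                   # assumed length-6 feedback
A = expressions(alpha, 80)
checks = weight4_checks(A)
print(len(checks) > 0,
      all(A[a] ^ A[b] ^ A[c] ^ A[d] == 0 for a, b, c, d in checks))
```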

Quadrisection approach. Going further, it is also possible to use the algorithms presented in Chapter 8 to compute parity checks. This does not change the running time compared to the birthday approach; however, it greatly reduces the required amount of memory. Indeed, each parity check can be viewed as a sum of 4 parts, each part containing (almost) t/4 values. Thus, as in [CJM02], we obtain a four set sum problem over F_{2^n} which can be directly addressed using the algorithms of Chapter 8. As with the birthday approach, each parity check can be split into four parts in many ways and we need to be careful to avoid generating the same parity check many times.

Rho approach. A natural question is to wonder whether we can get rid of memory altogether when generating parity checks. After all, a parity check is nothing but a collision in a function that maps t/2 different positions to the sum of their expressions in terms of the LFSR initial value. Since collisions in functions can be found by cycle finding algorithms, it may appear that such an approach can be used to find parity checks. However, it is not known how to do this, unless we are ready to increase the value of t beyond the value determined in Section 12.2.3.2. Since the bias of parity checks decreases quickly when t increases, cycle finding algorithms are usually not used to construct parity checks.

If we are ready to increase t, we can, as in Section 7.5.1, proceed by using a function defined in the following way. First, any subset of t/2 positions can be mapped to an n-bit string describing the corresponding linear combination of bits of the initial value. This n-bit string can be viewed as an n-bit integer and this number can in turn be interpreted to construct a new set of t/2 positions. Any cycle finding algorithm can obtain a collision for this function. However, the collision may occur in two different places, either when mapping sets to bitstrings or when mapping bitstrings to sets. The first type of collision yields a parity check. Unfortunately, the second type of collision is meaningless. The problem is that when t is too small, there are many more bitstring values than sets of t/2 positions. As a consequence, an overwhelming fraction of the collisions are of the useless type. If we increase t, roughly doubling it, we can make sure that the mapping from bitstrings to sets is injective and thus that all collisions are of the useful type. Note that doubling t makes the rest of the fast correlation attack very inefficient. Thus, it is best to avoid this Rho-based approach.

Discrete logarithm based approach. For parity checks involving t = 3 positions, Kuhn and Penzhorn proposed in [PK95] to compute the discrete logarithm of 1 + α^i in basis α in the finite field F_{2^n}, for i smaller than N, the maximal amount of accessible keystream. Here, α is a root of the feedback polynomial of the LFSR under attack. If the discrete logarithm j is also smaller than N, we learn that 1 + α^i + α^j = 0, which can be converted into a parity check x_0 ⊕ x_i ⊕ x_j = 0. An extension of this idea was recently proposed in [DLC07].

12.2.4.2 Improving the basic attack

The basic attack described in Section 12.2.3.2 presents two main difficulties:

• When the length n of the LFSR increases, the number of parity checks decreases and we need to increase t and/or N to implement the attack. This is due to the 2^{−n} factor in Equation (12.13).

• When t increases, the number of necessary parity checks increases exponentially with t and evaluating these parity checks on the observed keystream takes a long time.

It is possible to reduce the impact of these two difficulties by using an approach similar to the method used for the Goldreich-Levin theorem in Section 9.3. We isolate a subset S_0 of n_0 bits in the initial state of the LFSR. These bits are going to be addressed separately using an exhaustive search. As a consequence, when counting and constructing parity checks, we only cancel the n − n_0 other bits. For each acceptable parity check, we also keep the linear expression specifying the contribution of the bits in S_0. To count these modified parity checks, we replace the factor 2^{−n} by 2^{−(n−n_0)} in Equation (12.13). This increases the number of available parity checks and allows us to reduce t and/or N. Note, however, that since the n_0 bits are fixed and cannot be shifted, we cannot use shifted parity checks this time and need to compute them independently n − n_0 times, once for each of the bits that need to be recovered.

Without using the Walsh transform, recovering each of the n − n_0 remaining bits would require evaluating and counting the number of predictions 2^{n_0} times. Instead, we start by regrouping the parity checks that share the same linear expression on S_0. Then, within each group, we partially evaluate these parity checks and keep the difference between the number of 0 and 1 values. This difference is stored in an array of 2^{n_0} elements, in the position given by the binary encoding of the linear expression of the group. After applying a Walsh transform to this array, position i contains the difference between the numbers of 0 and 1 values for the corresponding guess of the bits of S_0. As in the analysis of Section 12.2.2.1, we have 2^{n_0} − 1 bad candidates and a single good candidate. With well-chosen parameters, the position corresponding to the good candidate contains the largest value in absolute value. Learning this position gives the value of the n_0 bits in S_0. Moreover, the sign of this largest value determines the value of the bit for which the parity checks were constructed (see [CJM02] for more details).
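The counting step can be sketched as follows: once the partial-evaluation differences are stored by the binary encoding of their linear expression, a single in-place fast Walsh transform produces, for every guess of the S_0 bits, the aggregate difference between 0 and 1 predictions. The toy input values below are arbitrary:

```python
def walsh_transform(a):
    """In-place fast Walsh-Hadamard transform; afterwards
    a[i] = sum_j (-1)^popcount(i & j) * a_original[j]."""
    h, n = 1, len(a)                         # len(a) must be a power of two
    while h < n:
        for start in range(0, n, 2 * h):
            for k in range(start, start + h):
                x, y = a[k], a[k + h]
                a[k], a[k + h] = x + y, x - y
        h *= 2
    return a

def naive_transform(a):
    """Quadratic-time reference implementation, for checking."""
    return [sum(v * (-1) ** bin(i & j).count("1") for j, v in enumerate(a))
            for i in range(len(a))]

counts = [3, -1, 4, 1, -5, 9, -2, 6]         # toy group differences, n0 = 3
spectrum = walsh_transform(counts[:])
best_guess = max(range(len(spectrum)), key=lambda i: abs(spectrum[i]))
print(spectrum == naive_transform(counts), best_guess)
```

The index of the entry that is largest in absolute value is the candidate guess for the bits of S_0, and its sign predicts the target bit.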

12.3 Algebraic attacks

In addition to the binary symmetric channel model, the filtered generator is also well-suited to modeling by a system of algebraic equations. Indeed, if in the basic expression:

z_t = f(x_t, x_{t−δ_1}, ···, x_{t−δ_{k−1}}),    (12.15)

we replace each bit of x by its linear expression in terms of the initial values x_1, ..., x_n of the LFSR resulting from Algorithm 12.1, we obtain a transformed equation:

z_t = f_t(x_1, x_2, ···, x_n).    (12.16)

Assuming a known plaintext attack, the cryptanalyst knows the value of z_t and wants to recover the initial state. At this point, two important remarks are needed. First, we can collect a large number of equations, one for each bit of known plaintext. Second, the degree of each polynomial f_t is the same as the degree of f, since f_t results from a linear change of variables.

As a consequence, the filtered generator can be vulnerable to algebraic attacks when its parameters are badly chosen. It is especially important not to have a low degree for f. For example, let us detail what happens when f can be expressed as a quadratic polynomial. In that case, each equation f_t = z_t can be rewritten as:

z_t = Σ_{i=1}^{n} α_i^{(t)} x_i + Σ_{i=1}^{n−1} Σ_{j=i+1}^{n} β_{i,j}^{(t)} x_i x_j.    (12.17)

Using (re)linearization techniques, we can define y_{i,j} as a new unknown with value equal to x_i x_j and substitute y_{i,j} in the above. This yields a linear equation in n(n + 1)/2 unknowns. Collecting n(n + 1)/2 (or a few more) such equations, it suffices to solve a moderately large linear system of equations to recover the initial state of the LFSR. Equivalently, we can use Algorithm 11.4 on this set of algebraic equations using a degree limit of 2.
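A sketch of this substitution follows; the register, the quadratic filter z_t = x_t + x_{t+1}·x_{t+3} and the secret state are arbitrary assumptions. Each bit x_t is replaced by its mask over the initial bits, the product is expanded using x_i² = x_i, and every keystream equation becomes linear in the n + n(n−1)/2 monomials:

```python
from itertools import combinations

n = 5
secret = [1, 0, 1, 1, 0]

# Expression masks (Algorithm 12.1) for the assumed feedback
# x_{i+5} = x_i + x_{i+2}.
A = [1 << i for i in range(n)]
for i in range(n, 70):
    A.append(A[i - 5] ^ A[i - 3])

def bits(mask):
    return [k for k in range(n) if (mask >> k) & 1]

# Monomial order: x_0..x_4, then x_i x_j for i < j (15 unknowns in all).
monos = [(i,) for i in range(n)] + list(combinations(range(n), 2))
col = {m: k for k, m in enumerate(monos)}

def linearized_row(t):
    """Row of the assumed filter z_t = x_t + x_{t+1} x_{t+3} after
    substituting masks: linear over the 15 monomials."""
    row = [0] * len(monos)
    for k in bits(A[t]):                     # linear part of the filter
        row[col[(k,)]] ^= 1
    for u in bits(A[t + 1]):                 # expand the quadratic product
        for v in bits(A[t + 3]):
            m = (u,) if u == v else (min(u, v), max(u, v))
            row[col[m]] ^= 1                 # x_u * x_u collapses to x_u
    return row

mask0 = sum(b << k for k, b in enumerate(secret))
xs = [bin(A[t] & mask0).count("1") % 2 for t in range(70)]
truth = [secret[m[0]] if len(m) == 1 else secret[m[0]] & secret[m[1]]
         for m in monos]
consistent = all(
    sum(r * v for r, v in zip(linearized_row(t), truth)) % 2
    == (xs[t] ^ (xs[t + 1] & xs[t + 3]))
    for t in range(60))
print(consistent)
```

Collecting n(n + 1)/2 or slightly more such rows and running Gaussian elimination over F_2 then recovers the monomial values and, in particular, the initial state.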


12.3.1 Predicting an annihilator polynomial

Alternatively, one can also look for linear combinations of the z_t such that the contribution of all terms y_{i,j} vanishes. Unless we are especially unlucky, there are linear terms x_i left and we obtain linear equations in the n initial unknowns. At first, this seems essentially as costly as applying Algorithm 11.4. However, it was shown recently by Rønjom and Helleseth in [RH07] that this alternative can be made extremely efficient. In order to see why, let us briefly recall Wiedemann's algorithm from Chapter 3. In this algorithm, there are two steps: the first one determines the minimal polynomial of some linear sequence and the second one uses this minimal polynomial to invert a linear system. In the case of an algebraic attack against a filtered LFSR, a multiple of the minimal polynomial can be predicted in advance and the first step of Wiedemann's algorithm can be skipped. With a modified second step, it is then possible to directly derive linear equations in the n initial unknowns, without performing linear algebra on a large system of equations.

To understand the method of [RH07], it is best to view the current LFSR state as an element of F_{2^n} and the advance step of the LFSR as a multiplication by some element α in F_{2^n}. If the initial state is X_0, the state at time t is X_t = α^t X_0. With this notation, it is also possible to write the output bit x_t as a trace of βX_t for some constant β in F_{2^n}. We recall that the trace of X in F_{2^n} is defined as:

Tr(X) = Σ_{i=0}^{n−1} X^{2^i}.    (12.18)

Let us now focus on an arbitrary quadratic term x_t x_{t+δ}. We can write:

x_t x_{t+δ} = Tr(βX_0 α^t) · Tr(βX_0 α^{t+δ})
            = (Σ_{i=0}^{n−1} (βX_0 α^t)^{2^i}) · (Σ_{j=0}^{n−1} (βX_0 α^{t+δ})^{2^j})
            = Σ_{i=0}^{n−1} Σ_{j=0}^{n−1} (βX_0)^{2^i+2^j} · α^{2^j δ} · (α^{2^i+2^j})^t.    (12.19)

Thus, each quadratic term is a linear combination, with complicated but fixed coefficients, of powers of the form (α^ℓ)^t, where ℓ has at most two non-zero bits in its binary decomposition. We can regroup the values α^ℓ by considering the difference Δ = i − j modulo n and form the polynomials:

P_Δ(X) = ∏_{i=0}^{n−1} (X − α^{2^i + 2^{(i+Δ) mod n}}).    (12.20)

This polynomial is invariant under the action of the Frobenius map; thus, it has coefficients in F_2. Note that P_0(X) regroups the values α^{2^i}, thus it is the feedback polynomial of the considered LFSR.


Following [RH07] and proceeding as with Wiedemann's algorithm, we see that P_Δ annihilates all powers of the values α^{2^i + 2^{i+Δ}} in Equation (12.19). As a consequence, the product of all P_Δ for Δ ≠ 0 annihilates all non-linear terms of Equation (12.19). Unless we are extremely unlucky, the linear terms are not canceled and it suffices to solve a system of n linear equations in n unknowns to recover the initial state X_0.

For monomials of higher degree d, it is possible to proceed similarly, by considering products of the terms X − α^ℓ for values of ℓ with d non-zero bits in their binary decomposition.

12.4 Extension to some non-linear shift registers

In order to avoid many of the weaknesses of linear feedback shift registers, non-linear feedback shift registers were recently considered for cryptographic uses. Non-linear shift registers (NLFSRs) are very similar to LFSRs; the only difference is that the new bit entering the register at each step is computed using a non-linear function instead of a linear one. One difficulty with these registers is to make sure that the period is long enough for all initial values. To overcome this difficulty, several approaches are possible. One option is to use registers of small size, to make sure through exhaustive search that all these registers have a long enough period, and to combine several of them into the keystream generator. Another option is to consider registers of a specific form that allow the period to be computed.

At first, it seems that algebraic and correlation attacks cannot apply to
non-linear shift registers. Indeed, concerning algebraic attacks, we see that
the iteration of the low degree feedback function quickly produces equations
of very high degree, which cannot be solved by the techniques of Chapter 11.
Concerning correlation attacks, getting biased predictions of inner bits of the
NLFSR remains easy. However, without linearity, we can no longer construct
parity checks.

Yet, in this section, we show that the correlation attacks presented in this
chapter can be directly applied to a specific subclass of non-linear shift register
based keystream generators. This subclass is formed of keystream generators
based on a non-linear shift register whose output is a linear combination of
the register's cells.

To make things more precise, we can view such a keystream generator as
built around an inner sequence x_i defined from n initial bits x_0 to x_{n−1},
together with a recursion formula:

\[
x_{n+i} = F(x_i, x_{i+1}, \ldots, x_{i+n-1}), \tag{12.21}
\]

for some efficiently computable non-linear function F, for example a low-degree
function. The output z_i at time i is obtained by applying a fixed linear
function L to the inner sequence. Assuming, for simplicity of exposition, that
x_i is always used when computing z_i, we write:

\[
z_i = x_i \oplus L(x_{i+1}, \ldots, x_{i+n-1}). \tag{12.22}
\]

Note that, depending on the choice of L, this construction can be trivially
broken. For example, if L = 0, then z_i simply is a copy of x_i. In this case,
it suffices to collect n consecutive output bits to break the construction.
Indeed, using Equation (12.21), the rest of the sequence can then be predicted.

We now show that for any choice of L, this NLFSR based generator can
be vulnerable to correlation and algebraic attacks. The key argument is that
for each inner bit of the NLFSR, we have two different equations: the non-linear
Equation (12.21) and a linear equation which is obtained by rewriting
Equation (12.22) as:

\[
x_i = z_i \oplus L(x_{i+1}, \ldots, x_{i+n-1}). \tag{12.23}
\]

Moreover, once the keystream z has been observed, we can substitute Equation (12.23)
into itself for various values of i. As a consequence, each inner
bit x_i can be expressed as a linear expression L_i(x_0, . . . , x_{n−1}) of the initial
bits. The constant term in L_i is obtained by linearly combining bits of the
output stream z. Substituting each bit of x by the corresponding linear
expression in Equation (12.21) produces a low degree algebraic equation in the
unknowns x_0, . . . , x_{n−1}. Since there are many such equations, we can use the
methodology of algebraic attacks.

Similarly, if we take the best linear approximation between the inputs of F
and its output, then after replacing in this linear expression each value of x by
its linear expression, we obtain a noisy linear equation in the bits of the initial
state. Of course, we can apply the correlation methodology to the resulting
set of noisy linear equations.
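A small simulation can illustrate this linearization. In the sketch below, the register size, the feedback function F and the taps of L are all illustrative choices, not taken from the text; since the top tap of L is the last cell, Equation (12.23) turns into a forward linear recurrence and every inner bit becomes an affine function of the initial bits, with a constant term built from keystream bits.

```python
import random

n = 8
# Illustrative linear output: z_i = x_i XOR x_{i+2} XOR x_{i+5} XOR x_{i+7}.

def F(w):
    """Illustrative low degree non-linear feedback function."""
    return w[0] ^ w[3] ^ (w[1] & w[4])

random.seed(0)
init = [random.randrange(2) for _ in range(n)]

# Generate the inner sequence and the keystream.
x = list(init)
for i in range(60):
    x.append(F(x[i:i + n]))
z = [x[i] ^ x[i + 2] ^ x[i + 5] ^ x[i + 7] for i in range(50)]

# Attack: since the top tap of L is cell n-1 = 7, Equation (12.23) rewrites as
#   x_{i+7} = z_i XOR x_i XOR x_{i+2} XOR x_{i+5},
# a linear recurrence. Represent each x_m as (mask over x_0..x_7, constant bit).
forms = [(1 << m, 0) for m in range(n)]  # the initial bits are themselves
for i in range(50):
    m = i + 7
    if m >= n:
        mask = forms[i][0] ^ forms[i + 2][0] ^ forms[i + 5][0]
        const = forms[i][1] ^ forms[i + 2][1] ^ forms[i + 5][1] ^ z[i]
        forms.append((mask, const))

def evaluate(form, state):
    """Evaluate an affine form on the initial state."""
    mask, const = form
    return const ^ bin(mask & sum(b << j for j, b in enumerate(state))).count("1") % 2

# Every inner bit is indeed an affine function of the initial state.
assert all(evaluate(forms[m], init) == x[m] for m in range(len(forms)))
```

Substituting these affine forms into Equation (12.21) then yields the low degree equations in x_0, . . . , x_{n−1} used by the algebraic methodology.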

12.5 The cube attack

A third class of attack against the filtered generator was recently proposed
by Dinur and Shamir [DS09]. Called the cube attack, it is a new way to
use a well-known identity on multivariate polynomials over F_2 and transform
it into an attack against some secret-key ciphers. In particular, this can be
applied to the filtered generator.

To explain the underlying identity, let us start with a univariate polynomial
F over F_2. When x is in F_2, we know that x² = x. As a consequence, the
polynomial f obtained by keeping the remainder of the Euclidean division of
F by x² ⊕ x induces the same function as F on F_2. Moreover, f is of degree at
most one, i.e., f(x) = ax ⊕ b. We now see that:

\[
f(0) \oplus f(1) = a. \tag{12.24}
\]

In fact, by summing on the two points 0 and 1, we are taking a derivative
of f and we can recover the coefficient of the term x in f. Note that this
is based on the same basic equation as differential cryptanalysis, but views
things from a different angle.

With more unknowns, we proceed similarly. Let F(x_1, · · · , x_k) be a polynomial
in k unknowns. After accounting for the identities x_i² = x_i for each
unknown, we obtain a polynomial f, whose partial degree in each unknown is
at most one. Factoring x_1 out where possible, we can write:

\[
f(x_1, \cdots, x_k) = a(x_2, \cdots, x_k) \cdot x_1 + b(x_2, \cdots, x_k).
\]

As a consequence, the function f(0, x_2, · · · , x_k) ⊕ f(1, x_2, · · · , x_k) is equal to
a(x_2, · · · , x_k).

The key idea of the cube attack is to repeat this process until we reach
easy-to-solve linear equations. For example, to repeat the approach on x_1, x_2
and x_3, we first write:

\[
\begin{aligned}
f(x_1, \cdots, x_k) = {} & a_7(x_4, \cdots, x_k) \cdot x_1 x_2 x_3 \oplus a_6(x_4, \cdots, x_k) \cdot x_1 x_2 \oplus {} \\
& a_5(x_4, \cdots, x_k) \cdot x_1 x_3 \oplus a_4(x_4, \cdots, x_k) \cdot x_1 \oplus {} \\
& a_3(x_4, \cdots, x_k) \cdot x_2 x_3 \oplus a_2(x_4, \cdots, x_k) \cdot x_2 \oplus {} \\
& a_1(x_4, \cdots, x_k) \cdot x_3 \oplus a_0(x_4, \cdots, x_k).
\end{aligned}
\]

Then, we remark that:

\[
\begin{aligned}
a_7(x_4, \cdots, x_k) = {} & f(0,0,0,x_4,\cdots,x_k) \oplus f(0,0,1,x_4,\cdots,x_k) \oplus {} \\
& f(0,1,0,x_4,\cdots,x_k) \oplus f(0,1,1,x_4,\cdots,x_k) \oplus {} \\
& f(1,0,0,x_4,\cdots,x_k) \oplus f(1,0,1,x_4,\cdots,x_k) \oplus {} \\
& f(1,1,0,x_4,\cdots,x_k) \oplus f(1,1,1,x_4,\cdots,x_k).
\end{aligned}
\]

From this example, the reader can easily see that when we repeat the process
with t unknowns, we need to decompose f into 2^t parts and to sum over 2^t
points. The first t coordinates of these points cover all the possible values
that can be achieved by setting each coordinate to 0 or 1. These points are
the vertices of a hypercube in the t-dimensional space.
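This summation is easy to experiment with. The sketch below, with an illustrative randomly drawn multilinear polynomial, checks that summing f over the 2^3 vertices of the cube on x_1, x_2, x_3 indeed evaluates the coefficient polynomial a_7.

```python
import itertools, random

random.seed(1)
k = 6
# A random multilinear polynomial over F_2: a set of monomials, each a
# frozenset of variable indices (coefficient 1). Illustrative example only.
candidates = [frozenset(m) for r in range(6)
              for m in itertools.combinations(range(k), r)]
f = {m for m in candidates if random.randrange(2)}

def evaluate(poly, point):
    """Evaluate a multilinear polynomial at a point of {0,1}^k."""
    return sum(all(point[i] for i in m) for m in poly) % 2

S = (0, 1, 2)  # cube variables x_1, x_2, x_3 (0-indexed)

def cube_sum(poly, rest):
    """Sum poly over the 2^|S| cube vertices, the other variables fixed as in rest."""
    total = 0
    for bits in itertools.product((0, 1), repeat=len(S)):
        point = list(rest)
        for i, b in zip(S, bits):
            point[i] = b
        total ^= evaluate(poly, point)
    return total

# The coefficient polynomial of x_1 x_2 x_3: keep the monomials divisible by
# the cube monomial and strip the cube variables from them.
f_S = {m - set(S) for m in f if set(S) <= m}

# Summing over the cube evaluates exactly that coefficient polynomial.
for _ in range(20):
    rest = [random.randrange(2) for _ in range(k)]
    assert cube_sum(f, rest) == evaluate(f_S, rest)
```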

It is shown in [DS09] that for any polynomial f and any set of unknowns S,
we can write:

\[
f(x_1, \cdots, x_k) = \Bigl( \prod_{i \in S} x_i \Bigr) f_S(x_1, \cdots, x_k) + q(x_1, \cdots, x_k),
\]

where each monomial in q misses at least one unknown from S and f_S is a
polynomial of the variables not in S. When summing on a hypercube induced

by S, we obtain a value for the polynomial f_S. In [DS09], f_S is called the
super-polynomial of S in f. Here, we instead call it the derived function of
f at S. The basic idea of cube attacks is very similar to a generalization of
differential cryptanalysis called higher order differential cryptanalysis [Knu94,
Lai94, MSK98]. However, it emphasizes a different attack scenario.

12.5.1 Basic scenario for the cube method

In order to use the cube attack, we need to be able to sum many evaluations
of the same polynomial along some hypercube. As a consequence, we need a
scenario where the attacker controls at least part of the unknowns entering
the polynomial. Since this process of summing produces values for a lower
degree polynomial derived from the first, it is natural to use this lower degree
polynomial to extract information about other unknowns. For this reason,
in [DS09], it is proposed to consider systems involving polynomials where
some unknowns are controlled by the attacker and some unknowns contain
secret key material. To simplify the extraction of information, the easiest
approach is to choose the hypercube we are summing on in such a way that the
derived polynomial is linear in the secret unknowns. If this can be done, we obtain a
linear equation in the secret key. Repeating this with several hypercubes, we
obtain a linear system of equations, from which the secret key can hopefully
be recovered.

Since the cube attack is very recent, its scope of application is not yet
known. However, this attack seems to have a lot of potential. One very
surprising application is already given in [DS09]. This application makes it
possible to break cryptosystems which satisfy some specific criteria, in a black
box way, without even knowing the full details of these cryptosystems. The
idea is, in an initial phase, to choose hypercubes at random and look at the
derived function that we obtain when running test copies of the system for which
the attacker controls the secret key. If the derived function is, or seems to
be, constant, the hypercube contains too many unknowns. If the derived
function is complicated, the hypercube contains too few unknowns. If the
derived function is a simple linear function of the secret key, then we have
achieved the right balance and found one linear equation to be used later in
an attack phase. Note that there are several ways to test the linearity of the
derived function. One option is to check that if we add any of the secret key
unknowns to the hypercube variables, the resulting derived function becomes
a constant.
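The following sketch plays out this black box phase on a toy example; the polynomial p below merely stands in for an unknown cipher and is an illustrative choice, not taken from the text. Summing over the cube on the two public variables yields an affine function of the secret bits, which passes a linearity test and directly gives one linear equation.

```python
import itertools

KEY_VARS = 4

def p(v, x):
    """Black-box 'cipher' polynomial in public bits v and secret bits x
    (an illustrative example, not a real cipher)."""
    return (v[0] & v[1] & (x[0] ^ x[2] ^ 1)) ^ (v[0] & x[1] & x[3]) ^ v[1] ^ (x[0] & x[1])

def derived(x):
    """Sum p over the cube on the two public variables v_0, v_1."""
    total = 0
    for v in itertools.product((0, 1), repeat=2):
        total ^= p(v, x)
    return total

# Linearity test: for an affine derived function g, the BLR-style check
# g(a) ^ g(b) ^ g(a ^ b) ^ g(0) == 0 must hold for all pairs of inputs.
zero = (0,) * KEY_VARS
points = list(itertools.product((0, 1), repeat=KEY_VARS))
g0 = derived(zero)
assert all(derived(a) ^ derived(b) ^
           derived(tuple(i ^ j for i, j in zip(a, b))) ^ g0 == 0
           for a in points for b in points)

# Extraction: g(e_i) ^ g(0) is the coefficient of x_i, so this cube yields
# the linear equation x_0 ^ x_2 ^ 1 over the secret key.
coeffs = [derived(tuple(1 if j == i else 0 for j in range(KEY_VARS))) ^ g0
          for i in range(KEY_VARS)]
assert g0 == 1 and coeffs == [1, 0, 1, 0]
```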

Among the possible applications of the cube attack, it is proposed in [DS09]
to consider an extremely large filtered LFSR (on 10,000 unknowns) which
mixes IV and secret key variables in a complex way. The only real constraint
is that the degree of the output in terms of the variables is at most 16. The
cube attack successfully breaks this incredibly large filtered LFSR using 2^20
different IV values and a single bit of output for each. Note that this filtered
LFSR is completely out of range for more conventional correlation or algebraic
attacks.

12.6 Time memory data tradeoffs

In Chapter 7, we describe some time-memory tradeoffs against block ciphers.
Since stream ciphers behave quite differently from block ciphers, it is
interesting to revisit these attacks in this different context. Let us start by
listing the relevant differences between the two kinds of ciphers. First, instead
of combining a key with a block of plaintext to get a block of ciphertext as
block ciphers do, stream ciphers generate a pseudo-random sequence from an
inner state, usually obtained by mixing together a key and an initialization
value. During the generation of the pseudo-random sequence, the inner state
takes many different consecutive values, where each value is obtained from
the previous one through an update function. At the same time, the output
sequence is obtained by applying an output function to each inner state. In
order to cryptanalyze the stream cipher, it suffices to recover one value of the
inner state at any point in time. Indeed, this suffices to predict the rest
of the pseudo-random stream. Moreover, most of the time, the update function
is in fact reversible and recovering a single inner state allows us to find the
complete sequence of inner states. From this point, depending on the precise
cipher being considered, it might also be possible to recover the encryption
key. Thus, with a stream cipher, the goal of an attacker is slightly different;
instead of having to directly find the key, it suffices to recover one of the many
inner states taken during the encryption process.

From this remark, it is already possible to devise a simple attack that works
better than exhaustive search. To do this, we start by defining a function F,
which takes as input an inner state and outputs a prefix of the corresponding
pseudo-random stream. For the size of the prefix, we choose either the size
of the inner state or something slightly larger. Clearly, from some output of the
stream, we can deduce the image by F of the corresponding inner states of
the stream. As a consequence, if we know how to invert F we can recover
the inner state and break the cipher. Moreover, if we know how to invert F
only for a small fraction of the states, this already yields an attack. Indeed,
by observing the output of the stream cipher, we can not only compute F for
the initial inner state, but also for most of the subsequent inner states taken
during the encryption process. In fact, we obtain F for all of these inner
states except the few final ones, for which we do not have enough keystream
available. To simplify the analysis of the attack, one generally assumes that
each inner state gives a new independent chance of inverting F. Of course,
successive states are not really independent, but from the cryptanalyst's point
of view, the simplified model is a good rule of thumb. In this model,

this basic attack, independently discovered by Babbage and Golić, works as
follows. First, during a precomputation step, we compute F on some fraction
α of the possible inner states and store these values in a sorted table. Then,
to recover an unknown state, we obtain the corresponding pseudo-random
sequence. From this sequence, we deduce N different values of F for N of
the inner states seen during the construction of the pseudo-random sequence
(whose length is a little above N). If αN ≈ 1, we expect that one among the
F values can be found in our table, and we deduce the corresponding inner
state.
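The sketch below illustrates these mechanics on a toy cipher whose register size, update and output functions are all illustrative choices, not taken from the text. For simplicity it precomputes F on every state, i.e., α = 1; in the actual attack the table would only hold a fraction α of the states and one of the roughly 1/α available keystream windows would hit it.

```python
n = 12
PREFIX = 16

def next_state(s):
    """Toy Fibonacci LFSR update on n bits (illustrative choice)."""
    bit = bin(s & 0b100000000101).count("1") % 2
    return (s >> 1) | (bit << (n - 1))

def out(s):
    """Non-linear output bit of the current state (illustrative choice)."""
    return (s ^ (s >> 3) ^ ((s >> 1) & (s >> 2))) & 1

def F(s):
    """Map a state to the PREFIX first keystream bits it produces."""
    bits = []
    for _ in range(PREFIX):
        bits.append(out(s))
        s = next_state(s)
    return tuple(bits)

# Precomputation: a table from keystream prefixes back to inner states.
table = {}
for s in range(1, 1 << n):
    table.setdefault(F(s), []).append(s)

# Online phase: observe some keystream produced from a secret state.
secret = 0b101101011001
stream, s = [], secret
for _ in range(64):
    stream.append(out(s))
    s = next_state(s)

# Each window of the stream is the image by F of some inner state;
# looking it up recovers that state, here the one reached at time t.
t = 37
window = tuple(stream[t:t + PREFIX])
candidates = table.get(window, [])
true_state = secret
for _ in range(t):
    true_state = next_state(true_state)
assert true_state in candidates
```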

This simple attack illustrates the fact that with stream ciphers, we have
many available targets instead of a single one, and this helps the cryptanalyst.
This was further extended by Biryukov and Shamir in [BS00] into
time/memory/data tradeoff attacks. By playing with three parameters instead
of two, it is possible to obtain a better compromise. For example,
assuming that the space of inner states contains N values, it is possible to
recover one state in time N^{2/3} using output sequences of length N^{1/3} and
memory of the order of N^{1/3}. This is clearly more efficient than Hellman's
usual compromise with time and memory N^{2/3}. For more details, we refer
the reader to [BS00].

Exercises

1h. Assume that the same one-time pad key is used twice to encrypt two
texts in English, M and M′. Show how both texts can be recovered
from the encrypted strings.

2. Work out a scenario where the malleability of messages encrypted by

the one-time pad yields a major security hole.

3h. Consider a filtered LFSR where the input bits to the non-linear function

f are consecutive. Assume that f maps k bits to one. Show that

the best linear correlation between input and output can be improved

by considering several consecutive output bits. Try this approach for

randomly selected functions.

4. Consider a Geffe generator based on three LFSRs: a control LFSR C
on n bits and two output LFSRs A and B on m bits each. How much
does exhaustive search on this generator cost? Given the initial value of
C, how can you efficiently recover the initial values of A and B? Conclude
that n should not be too small.

5h. Continuing with the Geffe generator, assume that n is too large for
exhaustive search. Construct a correlation attack and recover A and
B. How much keystream is needed? What happens if A and B use
the same feedback polynomial? Once A and B are known, how do you
reconstruct C?

6. Recompute the necessary amount of keystream N required for a correlation
attack on a filtered LFSR with bias ε if we want the correct candidate
to appear among the list of L candidates generating sequences
closest to the output stream.

7h. Write a program for computing parity checks. Try the birthday based
and the quadrisection approaches. Let n denote the number of bits taken
into account in the parity checks. What values of n can you achieve with
each approach?

8. Following Section 12.3.1, consider an LFSR on n bits and the output
sequence x ⊕ y ⊕ xy, where x and y are two consecutive bits, then
construct a linear combination of output bits where all quadratic terms
vanish.

9h. This exercise considers non-linear shift registers on n bits. Show that
there are 2^{2^n} possible shift registers of this sort. Study the possible
structure of the oriented graph whose vertices are the inner states and whose
arrows are obtained by advancing the NLFSR. Which NLFSRs would
you keep for cryptographic purposes? Write a program to enumerate
the possibilities and select the good ones.

Finally, here are possible implementation projects on this topic:

i. Write a complete set of programs for fast correlation attacks. Compare
the performance of using the Walsh transform or not during the
evaluation of parity checks. Look up and implement some advanced correlation
attack from the literature (conditional, iterative, . . . correlation
attacks).

ii. Generalize the method of Section 12.3.1 to polynomials of arbitrary

degree. Write a program that computes, given a feedback polynomial

and output function for an LFSR, a linear combination of output bits

such that the non-linear terms vanish. Show that for some badly chosen

output functions the method can work with short output sequences.

Construct such bad examples.

iii. Write a working implementation of the cube attack as presented in [DS09].
Compare this to other attacks against the filtered generator.

Chapter 13

Lattice-based cryptanalysis

Lattice-based cryptanalysis uses lattice reduction algorithms in order to discover
short linear relations between integer vectors that give a useful insight
into various cryptographic systems. These attacks can roughly be classified into
two main classes: direct attacks, where the cryptanalytic problem at hand can
directly be expressed as a lattice reduction problem, and Coppersmith-based
attacks, which rely on several algorithms that can recover small roots of
polynomial systems of equations using lattice reduction.

Due to the complexity of lattice reduction algorithms, a frequent way of
analyzing lattice-based attacks consists of assuming the existence of a lattice
reduction oracle that solves some lattice problem such as the shortest vector
problem (see Chapter 10). Of course, no such oracle exists; however, it is a
very convenient heuristic based on the good practical performance of lattice
reduction algorithms. In some cases, the analysis can be refined to reflect
the proved properties of available lattice reduction algorithms. This can be
necessary either to obtain a proved attack or when faced with lattice reduction
problems that are out of reach of current lattice reduction programs.

13.1 Direct attacks using lattice reduction

13.1.1 Dependence relations with small coefficients

One very important family of lattice reduction attacks makes use of lattices
to find linear relations with relatively small coefficients between numbers or
vectors. These relations may be considered over the integers or in some modular
ring. Before addressing this problem, it is nice to look at its combinatorial
properties and determine for which instances we should expect small linear
relations.

13.1.1.1 Combinatorial properties

When searching for a short linear dependency between elements of a family of
numbers or vectors, two cases occur. We can either be looking for an "abnormally"
small relation whose existence is guaranteed by the specific properties
of the family and may, for example, reflect the existence of a trapdoor in a
cryptographic scheme, or we may simply desire an ordinary relation whose
existence is generically guaranteed. The goal of this section is to analyze the
expected size of such generic relations.

LEMMA 13.1
Let v_1, . . . , v_n be a family of vectors with integer coefficients in t coordinates
with t < n. Let M denote an upper bound for the absolute values of all
coefficients of the various v_i's. Then there exists a non-zero integer relation

\[
\sum_{i=1}^{n} \lambda_i v_i = 0, \tag{13.1}
\]

such that max |λ_i| ≤ B, where B is given by

\[
\log_2 B = t \cdot \frac{\log_2 M + \log_2 n + 1}{n - t}. \tag{13.2}
\]

PROOF Consider all possible linear combinations

\[
\sum_{i=1}^{n} \mu_i v_i \tag{13.3}
\]

with 0 ≤ μ_i < B. There are B^n such combinations and the resulting vectors
have all their coordinates bounded in absolute value by nBM. Since there are
less than (2nBM)^t such vectors, two distinct combinations have to yield the
same value as soon as (2nBM)^t ≤ B^n, which amounts to the relation given
in Equation (13.2). This collision gives

\[
\sum_{i=1}^{n} \mu_i v_i = \sum_{i=1}^{n} \mu'_i v_i, \tag{13.4}
\]

with 0 ≤ μ_i < B and 0 ≤ μ'_i < B. After regrouping both sides by subtraction,
we find a non-zero relation as per Equation (13.1).

Note that in this proof, we do not invoke the birthday paradox, because the

various linear combinations are related and highly dependent.
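The pigeonhole argument can be checked by brute force on tiny instances. In the illustrative sketch below, t = 1, so the "vectors" are plain integers; a non-zero relation with coefficients bounded by B always shows up.

```python
import itertools, math, random

random.seed(2)
n, t, M = 4, 1, 10
# Bound of Equation (13.2), rounded up to an integer search radius.
log2B = t * (math.log2(M) + math.log2(n) + 1) / (n - t)
B = math.ceil(2 ** log2B)

for _ in range(5):
    v = [random.randrange(1, M + 1) for _ in range(n)]
    found = None
    # Exhaustive search over all coefficient vectors with |lambda_i| <= B.
    for lam in itertools.product(range(-B, B + 1), repeat=n):
        if any(lam) and sum(l * a for l, a in zip(lam, v)) == 0:
            found = lam
            break
    # Lemma 13.1 guarantees that such a relation exists.
    assert found is not None
    assert max(abs(l) for l in found) <= B
```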

Of course, for a specific instance of the problem, the shortest dependence
relation (say w.r.t. the Euclidean length) can be much shorter than the generic
bound stated in the above lemma. A similar lemma exists in the case of
modular relations:

LEMMA 13.2
Let v_1, . . . , v_n be a family of vectors with integer coefficients in t coordinates
with t < n. Let N be an integer. Then there exists a non-zero integer relation
modulo N:

\[
\sum_{i=1}^{n} \lambda_i v_i = 0 \pmod{N}, \tag{13.5}
\]

such that max |λ_i| ≤ min(B, N/2), where B is given by

\[
B = N^{t/n}. \tag{13.6}
\]

PROOF As before, consider all possible linear combinations

\[
\sum_{i=1}^{n} \mu_i v_i \pmod{N} \tag{13.7}
\]

with 0 ≤ μ_i < B. There are B^n such combinations and the resulting vectors
are expressed modulo N. Thus, there are less than N^t possible vectors and
two distinct combinations have to yield the same value as soon as N^t ≤ B^n,
which is equivalent to Equation (13.6). This collision gives

\[
\sum_{i=1}^{n} \mu_i v_i = \sum_{i=1}^{n} \mu'_i v_i \pmod{N}, \tag{13.8}
\]

with 0 ≤ μ_i < B and 0 ≤ μ'_i < B. Once again, this yields a non-zero relation.
Moreover, using reduction modulo N, this relation can be expressed with
coefficients in the range [−N/2, N/2].

13.1.1.2 Lattice reduction based search for short relations

Given a family of integer vectors on t coordinates v_1, . . . , v_n as above, we
now form the integer lattice generated by the columns of the following matrix:

\[
L(v) = \begin{pmatrix}
Kv_1 & Kv_2 & \cdots & Kv_n \\
1 & 0 & \cdots & 0 \\
0 & 1 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & 1
\end{pmatrix}, \tag{13.9}
\]

where K is a constant that is determined in the sequel.

Any linear relation as per Equation (13.1) can be mapped into a lattice
point:

\[
(0 \cdots 0\ \lambda_1\ \lambda_2 \cdots \lambda_n),
\]

whose first t coordinates are equal to 0. Conversely, any lattice point starting
with t zeros corresponds to a linear relation. As a consequence, the main
restriction when choosing K is that this constant should be large enough to
ensure that the first vector in a reduced basis of the lattice has zero components
in its upper part, consisting of the first t coordinates that correspond
to the contributions of the v_i's. Indeed, when the contribution of the first t
coordinates is not all zero, then the Euclidean norm of the lattice vector is at
least K.

To make the choice of K more precise, it is important to specify the lattice
reduction process that is used. With a lattice reduction oracle, it suffices to
choose a value K larger than the norm of the expected linear relation. Using
the expected size of a generic relation as in the previous section, it suffices to
take:

\[
K = \sqrt{n}\, (2nM)^{t/(n-t)}. \tag{13.10}
\]

When using the L3 lattice reduction algorithm, K should be multiplied by a
safety coefficient 2^{n/2}, to account for the fact that the first vector of the reduced
basis may be longer than the shortest lattice vector by a factor up to 2^{n/2}.
With this choice for K, the lattice reduction process outputs short vectors
whose upper part is guaranteed to be zero and these vectors clearly correspond
to linear relations with small coefficients. As pointed out above, the
coefficients of the linear relation appear as coordinates of rank t + 1, · · · , t + n
of the output vector.
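As an illustration, the sketch below applies a deliberately naive, exact-arithmetic LLL reduction to the lattice of Equation (13.9) for three one-coordinate "vectors"; the numbers 5, 8, 13 and the constant K = 1000 are illustrative choices. The first reduced vector has a zero upper part and its remaining coordinates expose the relation 5 + 8 − 13 = 0.

```python
from fractions import Fraction

def gso(B):
    """Gram-Schmidt: mu coefficients and squared norms of the B* vectors."""
    k = len(B)
    mu = [[Fraction(0)] * k for _ in range(k)]
    Bs, norms = [], []
    for i in range(k):
        v = [Fraction(c) for c in B[i]]
        for j in range(i):
            mu[i][j] = sum(Fraction(a) * b for a, b in zip(B[i], Bs[j])) / norms[j]
            v = [a - mu[i][j] * b for a, b in zip(v, Bs[j])]
        Bs.append(v)
        norms.append(sum(a * a for a in v))
    return mu, norms

def lll(B, delta=Fraction(3, 4)):
    """Textbook LLL on integer row vectors (naive: the GSO is recomputed)."""
    B = [list(row) for row in B]
    k = 1
    while k < len(B):
        for j in range(k - 1, -1, -1):        # size reduction of row k
            mu, _ = gso(B)
            q = round(mu[k][j])
            if q:
                B[k] = [a - q * b for a, b in zip(B[k], B[j])]
        mu, norms = gso(B)
        if norms[k] >= (delta - mu[k][k - 1] ** 2) * norms[k - 1]:
            k += 1                            # Lovasz condition holds
        else:
            B[k - 1], B[k] = B[k], B[k - 1]   # swap and backtrack
            k = max(k - 1, 1)
    return B

# Lattice of Equation (13.9); the rows here play the role of the columns there.
a = [5, 8, 13]
K = 1000
basis = [[K * a[i]] + [int(i == j) for j in range(3)] for i in range(3)]
reduced = lll(basis)

lam = reduced[0][1:]
assert reduced[0][0] == 0          # zero upper part: an exact relation
assert any(lam)
assert sum(l * c for l, c in zip(lam, a)) == 0
```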

13.1.1.3 Generalization to approximate relations

In some cases, instead of searching for exact relations between the vectors,
we may look for approximate relations, where \(\sum_{i=1}^{n} \lambda_i v_i\) is no longer 0 but a
vector of small norm. A typical application of approximate relations occurs
when the values of the vectors v_i correspond to approximations of vectors
with real-valued coordinates.

In the case of approximate relations, one option is to choose K = 1. With
this choice, output vectors are short but there is no special reason to obtain
an upper part formed of zeros. This clearly corresponds to approximate
dependencies with small coefficients.

Other choices for K are possible, and the choice of K allows us to control the
balance between the quality of the approximate relation and the size of the
coefficients. When K is larger, the linear relation is more precise but the
coefficients are larger.

13.1.1.4 Modular relations

Another possible extension is to consider linear relations which are no longer
exact but, instead, hold modulo some integer N. In other words, we need to
modify the lattice that is considered to deal with modular relations. Assuming
that N is large enough, the answer is very simple and consists of adding to
the lattice basis a few columns that ensure modular reduction. This is done

by considering the lattice generated by the columns of the following matrix:

\[
\begin{pmatrix}
Kv_1 & Kv_2 & \cdots & Kv_n & KN\,I \\
1 & 0 & \cdots & 0 & 0 \\
0 & 1 & \cdots & 0 & 0 \\
\vdots & \vdots & \ddots & \vdots & \vdots \\
0 & 0 & \cdots & 1 & 0
\end{pmatrix},
\]

where I is a t × t identity matrix. Thanks to the additional columns, a modular
relation \(\sum_{i=1}^{n} \lambda_i v_i = 0 \pmod{N}\) corresponds to a lattice point given by the
vector:

\[
(0 \cdots 0\ \lambda_1\ \lambda_2 \cdots \lambda_n).
\]

Of course, this is not sufficient; we also need to make sure that such a
vector corresponding to a modular linear relation may appear as the result
of lattice reduction. To discuss this question, it is important to remark that
the above lattice includes short vectors which are not related to the existence
of any linear relation. These vectors can be obtained by multiplying any
of the first n vectors in the initial basis by a factor N and then by reducing the
upper part modulo N. This process yields vectors whose coordinates are all zero,
with the exception of a single coordinate with value N. Applying the above
construction to all the v_i's, we get a family of n vectors of norm N that are
mutually orthogonal. Experiments show that, if N is too small, this family
appears in sequence as the first output vectors of a reduced basis, and thus
completely masks any useful information about linear relations.

Clearly, assuming a lattice reduction oracle, we need to check that the
short vector that corresponds to the linear relation coming from Lemma 13.2
is shorter than these systematic vectors of norm N. Since the short vector
associated with the linear relation has norm at most √n · B, we need to check
that √n · N^{t/n} < N, or equivalently that:

\[
N > n^{\frac{n}{2(n-t)}}. \tag{13.11}
\]

If the reduction is done using L3, an additional factor of 2^{n/2} is required to
guarantee that the first vector of the reduced basis corresponds to a genuine
linear relation.

An interesting special case is the binary case where N = 2. In this case,
any linear relation with four or more non-zero coefficients yields a vector with
norm greater than the systematic vectors of norm N. Thus, we cannot hope
that a lattice reduction approach finds a linear relation with more than 3 non-zero
coefficients. Moreover, relations with 3 coefficients or fewer can easily be
found using exhaustive search or birthday based algorithms. For this reason,
lattice reduction algorithms are not used for attacking binary problems,
such as finding short codewords in linear codes.

13.1.2 Some applications of short dependence relations

13.1.2.1 Knapsack problems

Solving knapsack problems is an interesting special case of finding linear
relations between given numbers. Historically, breaking knapsack problems
was one major application of lattice based cryptanalysis in the 80s and 90s.
In addition to this historical role, solving a knapsack problem requires more
involved techniques than the general case, because the expected relations have
all their coefficients in {0, 1}. In cryptographic scenarios, we know that such
a relation exists between the given elements of the knapsack a_1, . . . , a_n and
the target sum s = \(\sum_{i=1}^{n} \epsilon_i a_i\). Moreover, we know that the Euclidean norm
of the vector ε that encodes this relation is √(αn), where α is the proportion
of ones in the relation. Depending on the considered cryptosystem, α may
or may not be known to the cryptanalyst but, in many practical examples, it
is a part of the cryptographic system itself. Furthermore, α is an important
parameter when trying to analyze the performance of lattice-based attacks
against knapsack problems. When each coefficient of ε is chosen uniformly at
random, the number of ones is close to n/2 with high probability and we can
assume that α = 1/2.

Another important parameter of knapsack problems is their density d,
defined as:

\[
d = \frac{n}{\log_2(\max_i a_i)}. \tag{13.12}
\]

Since this parameter is the ratio between the number of elements in the knapsack
and the number of bits in each element, it controls the expected size of
"parasitical" linear relations of short norm between the a_i, with coefficients
not restricted to {0, 1}.
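As a worked example of Equation (13.12), the sketch below (with illustrative parameters) computes the density of two random knapsack instances, one falling below the low-density bounds discussed next and one above.

```python
import math, random

def density(a):
    """d = n / log2(max a_i), Equation (13.12)."""
    return len(a) / math.log2(max(a))

random.seed(3)
n = 64
low  = [random.randrange(2**99, 2**100) for _ in range(n)]   # 100-bit elements
high = [random.randrange(2**49, 2**50) for _ in range(n)]    # 50-bit elements

# 64 elements of 100 bits: d is about 0.64, below even the 0.6463 bound...
assert density(low) < 0.6463
# ... while 64 elements of 50 bits give d about 1.28, above the 0.9408 bound.
assert density(high) > 0.9408
```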

It was shown in [LO85] that, when the density is low, parasitical vectors are
large and, thus, the shortest vector gives the solution to any random instance
of the knapsack problem. If we use the lattice from Equation (13.9), for
vectors on one coordinate only, and if we assume that shortest lattice vectors
can be efficiently computed (even if this is not totally accurate), then low
density means d < 0.6463. Later, in [CJL+ 92], this condition was improved to
d < 0.9408. In order to reach that bound, either of the two following lattices
(generated by columns) can be used:

\[
\begin{pmatrix}
Ka_1 & Ka_2 & \cdots & Ka_n & -Ks \\
n+1 & -1 & \cdots & -1 & -1 \\
-1 & n+1 & \cdots & -1 & -1 \\
\vdots & \vdots & \ddots & \vdots & \vdots \\
-1 & -1 & \cdots & n+1 & -1 \\
-1 & -1 & \cdots & -1 & n+1
\end{pmatrix}
\qquad
\begin{pmatrix}
Ka_1 & Ka_2 & \cdots & Ka_n & Ks \\
1 & 0 & \cdots & 0 & 1/2 \\
0 & 1 & \cdots & 0 & 1/2 \\
\vdots & \vdots & \ddots & \vdots & \vdots \\
0 & 0 & \cdots & 1 & 1/2
\end{pmatrix}
\]

Before we close this section, let us warn the reader about the meaning of the
low-density attacks. The inequality d < 0.9408 provides a provable guarantee
that, from a shortest vector of a lattice computed from the problem, one
can, with high probability, solve a random instance of the original knapsack
problem. Note, however, that in some cases, even knapsacks with larger density
can sometimes be solved using this approach. Moreover, the critical density
given here is for knapsack problems where the solution is balanced, with about
half zeros and half ones. For knapsacks with an unbalanced solution, i.e., when
the parameter α given above differs from 1/2, the critical density can be much
higher.

13.1.2.2 Polynomial relations

Finding the minimal polynomial of a real algebraic number x of degree d corresponds to searching for a linear dependency between 1, x, x^2, ..., x^d. Since we are working with integer lattices, we choose a large integer K and we try to find an approximate relation between the closest integers to K, Kx, Kx^2, ..., Kx^d. More precisely, we reduce the following lattice, given by columns:

    ⎛ K   Kx   Kx^2   ···   Kx^d ⎞
    ⎜ 1    0      0   ···      0 ⎟
    ⎜ 0    1      0   ···      0 ⎟
    ⎜ 0    0      1   ···      0 ⎟
    ⎜ ⋮    ⋮      ⋮    ⋱       ⋮ ⎟
    ⎝ 0    0      0   ···      1 ⎠

The first vector of the reduced lattice can be written as:

    ( a0  a1  ···  ad ).

We wish to interpret a0, ..., ad as the coefficients of the minimal polynomial of x, i.e., we want to conclude that a0 + a1 x + a2 x^2 + ··· + ad x^d = 0.

The most important parameters here are K and d. If d is smaller than the degree of the minimal polynomial of x, then this technique cannot succeed. Likewise, if K is too small, then it cannot succeed either. To see this, assume for example that x is between 0 and 1 and apply Lemma 13.1: this yields a linear combination of the elements on the first row of the above matrix with coefficients bounded above by B, where B satisfies:

    log B = (log K + log d + 1) / (n − 1).

If K is small, this relation is much more likely to appear as an output of lattice reduction algorithms than the one corresponding to the minimal polynomial. In fact, a reasonable choice is to take K ≥ (max |a_i|)^(2d). In other words, K should be much larger than the expected size of the coefficients of the minimal polynomial. If d is not exactly known, for example if we only know an upper bound on the degree of the minimal polynomial of x, then the following trick can be applied: take the first two or three vectors appearing in the output


reduced lattice, transform them into polynomials and compute their GCD. If K was large enough, the minimal polynomial of x is usually obtained.

It is very important to know that the heuristic procedure we just described can give positive results, i.e., it can find a minimal polynomial, but it cannot give a negative result. Moreover, if the method succeeds, we have a candidate for the minimal polynomial; we can then check this candidate either numerically, up to arbitrarily large precision, or formally if the situation permits.
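As an illustration of the whole procedure, the following sketch recovers the minimal polynomial of √2. Since a real lattice reduction routine is beyond the scope of a snippet, a brute-force search over small coefficient vectors stands in for LLL (feasible only for very small d); the choice K = 10^8 is an assumption consistent with the guidance above that K should dominate the expected coefficient size.

```python
import math
from itertools import product

K = 10**8          # scaling constant (assumption: large enough, see text)
x = math.sqrt(2)   # algebraic number of degree 2
d = 2              # known upper bound on the degree

# First row of the lattice: nearest integers to K, Kx, Kx^2, ...
top = [round(K * x**i) for i in range(d + 1)]

def vec(coeffs):
    """Lattice vector for integer coefficients (a0, ..., ad)."""
    first = sum(a * t for a, t in zip(coeffs, top))
    return [first] + list(coeffs)

def norm2(v):
    return sum(c * c for c in v)

# Brute-force stand-in for lattice reduction: search small coefficient
# vectors for a shortest nonzero lattice vector (feasible only for tiny d).
best = None
for coeffs in product(range(-3, 4), repeat=d + 1):
    if all(c == 0 for c in coeffs):
        continue
    v = vec(coeffs)
    if best is None or norm2(v) < norm2(best):
        best = v

print(best[1:])  # -> [-2, 0, 1], i.e., the polynomial x^2 - 2
```

As the text suggests, the output is only a candidate: here it can be checked formally that (√2)^2 − 2 = 0.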

13.1.2.3 NTRU lattices

Another kind of lattice has been an important source of progress for practical lattice reduction: lattices obtained from the NTRU cryptosystem. Without entering into the details of this cryptosystem, let us simply state that the resulting lattices have some very specific properties. In fact, they are very similar to knapsack lattices, except that we are considering modular knapsacks on vectors. Depending on the exact version of the NTRU system, these knapsack instances contain 0, 1 and also −1 values; moreover, the exact number of 1 and −1 values is usually known. The modulus used in these knapsacks is very small; for example, 64 is a typical value. In addition, there is some extra structure coming from the fact that the vectors in this modular knapsack are rotated copies of each other. Another difference with pure knapsack systems is that the sum of vectors we are searching for does not sum exactly to zero, but instead to a vector with a fixed proportion of 1 and −1 values.

More precisely, starting from a vector with n coordinates, we are considering a lattice spanned by the columns of a 2n × 2n matrix of the following form:

    ⎛ a1   a2   ···   an     q   0   ···   0 ⎞
    ⎜ a2   a3   ···   a1     0   q   ···   0 ⎟
    ⎜  ⋮    ⋮    ⋱     ⋮     ⋮   ⋮    ⋱    ⋮ ⎟
    ⎜ an   a1   ···   an−1   0   0   ···   q ⎟
    ⎜  1    0   ···    0     0   0   ···   0 ⎟
    ⎜  0    1   ···    0     0   0   ···   0 ⎟
    ⎜  ⋮    ⋮    ⋱     ⋮     ⋮   ⋮    ⋱    ⋮ ⎟
    ⎝  0    0   ···    1     0   0   ···   0 ⎠

This lattice has many interesting properties:

• Starting from any vector in the lattice and rotating the coordinates of the first and of the second half by the same amount, we obtain another lattice vector.

• The short vectors corresponding to the NTRU secret key contain a small number of 1 and −1 values and have a very short norm.

• The initial lattice basis given above already contains short vectors, of norm q, corresponding to the reduction modulo q rule.
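The first and third properties can be checked mechanically. The sketch below uses toy parameters (n = 8 and q = 64, the typical modulus mentioned above; the values of a are random) and a membership test that follows directly from the column structure. One convention-dependent subtlety, hedged in the comments: with the row layout printed above, "rotating both halves by the same amount" corresponds, in this 0-indexed encoding, to rotating the two halves in opposite index directions, both realizing the same cyclic shift of the underlying NTRU polynomial.

```python
import random

random.seed(1)
q, n = 64, 8        # toy parameters; q = 64 as mentioned in the text
a = [random.randrange(q) for _ in range(n)]

def rot(v, k):
    """Cyclic left-rotation of a list by k positions (k may be negative)."""
    k %= len(v)
    return v[k:] + v[:k]

def in_lattice(w):
    """A vector (w1 | w2) lies in the 2n-dimensional lattice above iff
    w1 is congruent mod q to sum_j w2[j] * (j-th rotation of a)."""
    w1, w2 = w[:n], w[n:]
    return all(
        (w1[i] - sum(w2[j] * a[(i + j) % n] for j in range(n))) % q == 0
        for i in range(n)
    )

# Build an arbitrary lattice vector from integer coefficients c (rotation
# columns) and m (the q-columns).
c = [random.randrange(-2, 3) for _ in range(n)]
m = [random.randrange(-2, 3) for _ in range(n)]
w1 = [sum(c[j] * a[(i + j) % n] for j in range(n)) + q * m[i] for i in range(n)]
w = w1 + c
assert in_lattice(w)

# Same cyclic shift of the underlying polynomial: with this layout it is a
# left-rotation of the first half paired with a right-rotation of the second.
k = 3
assert in_lattice(rot(w[:n], k) + rot(w[n:], -k))

# The basis also visibly contains the norm-q vectors (q, 0, ..., 0 | 0, ..., 0).
assert in_lattice([q] + [0] * (2 * n - 1))
print("rotation and norm-q properties hold")
```

Because the property holds for every lattice vector, the assertions succeed for any choice of c and m, not just the seeded one.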


Despite these extra properties, fully reducing the NTRU lattice in large dimension in order to recover the secret key is a real challenge. This shows that even when working with small numbers, lattice reduction is not an easy task. A state-of-the-art survey of lattice reduction methods applied, in particular, to
