. 3
( 7)


of (2) is trivial.
46 L. Keliher

Remark 1. In what follows, terms such as “¬rst round” and “last round” are
relative to the T rounds under consideration. Single-variable superscripts refer
to individual rounds, e.g., LP t (a, b; kt ) and ELP t (a, b) are LP and ELP values,
respectively, for round t (1 ¤ t ¤ T ). Superscripts of the form [i . . . j] (with
i < j) refer to a sequence of consecutive rounds viewed as a single unit, e.g.,
EDP [1...3] (∆x, ∆y) is an EDP value over rounds 1 . . . 3.

2.2 Provable Security (MELP and MEDP)
Given T ≥ 2 core rounds under consideration, the critical value for linear crypt-
analysis is the maximum expected linear probability (MELP)5 :

ELP [1...T ] (a, b) .
MELP = max (3)
a,b∈{0,1}N \0

The critical value for di¬erential cryptanalysis is the maximum expected di¬er-
ential probability (MEDP):

EDP [1...T ] (∆x, ∆y) .
MEDP = max (4)
∆x,∆y∈{0,1}N \0

For linear cryptanalysis / di¬erential cryptanalysis, the data complexity of an
attack with a given probability of success is proportional to the inverse of the
MELP / MEDP. Therefore provable security can be claimed if this value is
su¬ciently small that the corresponding data complexity is prohibitive [19, 20].

2.3 Linear and Di¬erential Characteristics
In general, for T ≥ 2, it appears to be infeasible to compute the MELP or MEDP
exactly for most SPNs. A traditional method of approximation involves the use
of characteristics.
De¬nition 2. A linear characteristic / di¬erential characteristic for rounds
1 . . . T is a (T +1)-tuple of N -bit masks / di¬erences, „¦ = a1 , a2 , . . . , aT , aT +1 /
„¦ = ∆x1 , ∆x2 , . . . , ∆xT , ∆xT +1 ; we view at / ∆xt and at+1 / ∆xt+1 as in-
put and output masks / di¬erences, respectively, for round t (1 ¤ t ¤ T ). The
corresponding expected linear characteristic probability (ELCP) / expected dif-
ferential characteristic probability (EDCP) is de¬ned as
ELP t (at , at+1 ) /
[1...T ]
ELCP („¦) =
EDP t (∆xt , ∆xt+1 ) .
[1...T ]
EDCP („¦) =

A number of papers (including some by the author) use maximum average linear hull
probability (MALHP), but MELP is more consistent with other related terminology.
Re¬ned Analysis of Bounds Related to Linear and Di¬erential Cryptanalysis 47

Remark 2. For most SPNs, it is feasible to compute the values ELP t (at , at+1 ) /
EDP t (∆xt , ∆xt+1 ) (see Section 2.5), and therefore to compute ELCP [1...T ] („¦) /
EDCP [1...T ] („¦).

Using the Best Characteristic (Practical Security). A best linear / dif-
ferential characteristic is one that maximizes ELCP [1...T ] („¦) / EDCP [1...T ] („¦)
(a best characteristic is not necessarily unique) . There are well-known (and
relatively e¬cient) algorithms for ¬nding best characteristics [17, 21]. Denote
the best linear / di¬erential characteristic by „¦L = a1 , a2 , . . . , aT , aT +1 /
ˆˆ ˆˆ
„¦D = ∆ˆ , ∆ˆ , . . . , ∆ˆ , ∆ˆ
1 2 T T +1
x x x x . The data complexity of linear / di¬eren-
tial cryptanalysis is often estimated by assuming that
MELP = ELP [1...T ] (ˆ1 , aT +1 ) ≈ ELCP [1...T ] („¦L ) /
aˆ (5)
MEDP = EDP [1...T ] (∆ˆ 1 , ∆ˆ T +1 ) ≈ EDCP [1...T ] („¦D ) .
x x (6)
If the resulting data complexity is prohibitive, the cipher is practically secure [13].

2.4 Linear Hulls and Di¬erentials
The concept of linear hulls is due to Nyberg [19]. The counterpart for di¬erential
cryptanalysis is the concept of di¬erentials, due to Lai et al. [14].
De¬nition 3. If T ≥ 2 and a, b ∈ {0, 1}N / ∆x, ∆y ∈ {0, 1}N , then the corre-
sponding linear hull / di¬erential, denoted ALH (a, b)6 / DIFF (∆x, ∆y), is the
set of all linear / di¬erential characteristics for rounds 1 . . . T having a / ∆x
as the ¬rst mask / di¬erence and b / ∆y as the last mask / di¬erence, i.e., all
linear / di¬erential characteristics of the form
„¦ = a, a2 , a3 , . . . , aT , b / „¦ = ∆x, ∆x2 , ∆x3 , . . . , ∆xT , ∆y .

Theorem 1 ([19, 14]). Let a, b ∈ {0, 1}N . Then

ELP [1...T ] (a, b) = ELCP [1...T ] („¦)
„¦∈ALH (a,b)

EDP [1...T ] (∆x, ∆y) = EDCP [1...T ] („¦) .
„¦∈DIFF (∆x,∆y)

It follows from Theorem 1 that the approximation in (5) / (6) does not hold
in general, since ELP [1...T ] (a, b) / EDP [1...T ] (∆x, ∆y) is seen to be the sum
of (a large number of) terms ELCP [1...T ] („¦) / EDCP [1...T ] („¦), and therefore,
in general, the ELCP / EDCP of any characteristic will be strictly less than
the corresponding ELP / EDP value. Further, the MELP / MEDP may not be
equal to (i.e., may be strictly greater than) the ELP / EDP associated with any
best characteristic. This situation may result in an overestimation of the data
complexity”bene¬cial for an attacker, but problematic for a cipher designer.

Nyberg originally used approximate linear hull, hence the abbreviation ALH.
48 L. Keliher

2.5 Active S-Boxes and Branch Numbers
Let L denote the SPN linear transformation represented as an invertible N — N
binary matrix, i.e., if x, y ∈ {0, 1}N are the input and output, respectively, for
the linear transformation, then y = Lx (view x and y as column vectors).
Lemma 2 ([5]). If a ∈ {0, 1}N is a mask applied to the inputs of L, then there
is a unique corresponding mask b ∈ {0, 1}N applied to the outputs, i.e., there
is a mask b such that for all x ∈ {0, 1}N , a • x = b • (Lx). The relationship
between a and b is given by a = L b, where L is the matrix transpose of L.
If ∆x ∈ {0, 1}N is an input di¬erence for L, then ∆y = L(∆x) is the unique
corresponding output di¬erence, i.e., L(x)•L(x•∆x) = ∆y for all x ∈ {0, 1}N .
It follows from Lemma 2 that if at / ∆xt and at+1 / ∆xt+1 are input and out-
put masks / di¬erences for round t, then the resulting input and output masks /
di¬erences for the substitution stage of round t are at / ∆xt and bt = L at+1 /
∆yt = L’1 (∆xt+1 ). Further, at / ∆xt and bt / ∆yt can be naturally partitioned
into input and output masks / di¬erences for each s-box in round t. Enumerate
the s-boxes from left to right as S1 , S2 , . . . , SM , and let the input and output
t t t

masks / di¬erences for Sm be denoted at / ∆xt and bt / ∆ym (1 ¤ m ¤ M ).
t t
m m m
Then from Matsui™s Piling-up Lemma [16],
ELP (a , a LP Sm (at , bt )
t t t+1
)= (7)
m m
EDP (∆x , ∆x DP Sm (∆xt , ∆ym )
t t t+1 t
)= (8)

(here the superscript Sm has the obvious meaning).

De¬nition 4 ([25]). Let „¦ be a T -round linear / di¬erential characteristic for
rounds 1 . . . T . Then „¦ is called consistent if, for each s-box in rounds 1 . . . T ,
the input and output masks / di¬erences determined by „¦ for that s-box are
either both zero or both nonzero.

De¬nition 5 ([1]). Given a consistent linear / di¬erential characteristic, any
s-box for which the resulting input and output masks / di¬erences are nonzero is
called linearly / di¬erentially active (or just active, when the context is clear).
For the remainder of this paper, we limit our consideration to consistent
De¬nition 6. Given a linear / di¬erential characteristic, let v ∈ {0, 1}N be the
input or output mask / di¬erence for the substitution stage of round t. Then
the active s-boxes in round t can be determined from v (without knowing the
corresponding output or input mask / di¬erence). We de¬ne γv to be the M -bit
vector that encodes this pattern of active s-boxes: γv = γ1 γ2 . . . γM , where γi = 1
if the ith s-box is active, and γi = 0 otherwise, for 1 ¤ i ¤ M .
Re¬ned Analysis of Bounds Related to Linear and Di¬erential Cryptanalysis 49

De¬nition 7. Let γ, γ ∈ {0, 1}M . Then

Wl [γ, γ ] = # y ∈ {0, 1}N : γx = γ, γy = γ , where x = L y
ˆ ˆ
Wd [γ, γ ] = # ∆x ∈ {0, 1}N : γ∆x = γ, γ∆y = γ , where ∆y = L(∆x) .
ˆ ˆ

Remark 3. Informally, the value Wl [γ, γ ] / Wd [γ, γ ] represents the number of
ˆ ˆ
ways the linear transformation can “connect” a pattern of active s-boxes in one
round (γ) to a pattern of active s-boxes in the next round (ˆ ).

The di¬usive power of a linear transformation is its ability to force some
minimum number of s-boxes to be active over a sequence of rounds. This is
quanti¬ed in the following de¬nition.
De¬nition 8 ([5]). The linear / di¬erential branch number, Bl / Bd , of an
SPN linear transformation is the minimum number of linearly / di¬erentially
active s-boxes in two consecutive rounds for any nonzero characteristic:

Bl = min wt(γx ) + wt(γy ) : y ∈ {0, 1}N \ 0, x = L y /
Bd = min wt(γ∆x ) + wt(γ∆y ) : ∆x ∈ {0, 1}N \ 0, ∆y = L(∆x) .

Remark 4. It is trivial to show that 2 ¤ Bl , Bd ¤ (M + 1). Hong et al. [6]
prove that Bl = (M + 1) if and only if Bd = (M + 1); in this case, the linear
transformation is called maximally di¬usive.

3 General Analysis of 2-Round MELP / MEDP
In this section we analyze the 2-round MELP and MEDP for a general SPN,
stating results that will be useful in later sections. We focus primarily on the
MELP, as the development for the MEDP is essentially parallel (we point out
the signi¬cant di¬erences in Section 3.1).
Without loss of generality, assume that the linear transformation is omitted
from round 2. Therefore the active s-boxes in round 2 can be determined directly
from an output mask for round 2, without applying Lemma 2.
Let a, b ∈ {0, 1}N \0 be input and output masks, respectively, for round 1 and
round 2, and let f = wt(γa ), = wt(γb ). Enumerate the active s-boxes in round 1
as S1 , S2 , . . . , Sf , and enumerate the active s-boxes in round 2 as S1 , S2 , . . . , S 2 .
1 1 1 2 2

Let ±i be the input mask for Si (derived from a), for 1 ¤ i ¤ f , and let β j

be the output mask for Sj (derived from b), for 1 ¤ j ¤ . The characteristics

in ALH (a, b) have the form a, y, b ; enumerate the distinct “middle” masks
as y1 , y2 , . . . , yW , where W = Wl [γa , γb ]. The yw are input masks for round 2;
denote the corresponding output masks for the substitution stage of round 1 as
x1 , x2 , . . . , xW (xw and yw are related as in the ¬rst part of Lemma 2). For a
given xw (1 ¤ w ¤ W ), let χ(w,i) be the output mask for Si (1 ¤ i ¤ f ), and
50 L. Keliher

for the corresponding yw , let … (w,j) be the input mask for Sj (1 ¤ j ¤ ). It

follows from Theorem 1, De¬nition 2, and (7) that
⎛ ⎞
1 2
⎝ LP Sj (… (w,j) , β j )⎠ . (9)
LP Si (±i , χ(w,i) ) ·
ELP (a, b) =
w=1 i=1 j=1

It is useful to consider the set of vectors (of length f + ) of the form

Vw = χ(w,1) , χ(w,2) , . . . , χ(w,f ) , … (w,1) , … (w,2) , . . . , … (w, , (10)

for 1 ¤ w ¤ W . Each coordinate of Vw is an element of {0, 1}n \ 0 (recall that
n is the s-box input/output size).

Lemma 3. Given a, b ∈ {0, 1}N \ 0 that satisfy wt(γa ) + wt(γb ) = Bl , let
W = Wl [γa , γb ], f = wt(γa ), = wt(γb ), and let χ(w,i) , … (w,j) be de¬ned as
above. Then for ¬xed i (1 ¤ i ¤ f ), the values χ(1,i) , . . . , χ(W,i) are distinct, and
for ¬xed j (1 ¤ j ¤ ), the values … (1,j) , . . . , … (W,j) are distinct. In other words,
for the set of vectors {Vw }W , all the values in any one position are distinct.

Proof. Contained in the proof of Theorem 1 in [23].

Remark 5. Clearly if wt(γa ) + wt(γb ) = Bl , then Wl [γa , γb ] ¤ (2n ’ 1). Further,
the values χ(w,i) and … (w,j) depend only on γa and γb , not on the speci¬c values
of a and b.

Lemma 4. Given a, b ∈ {0, 1}N \ 0 that satisfy wt(γa ) + wt(γb ) > Bl , let W =
Wl [γa , γb ], f = wt(γa ), = wt(γb ), and let χ(w,i) , … (w,j) be de¬ned as above.
Consider the vectors Vw in (10). Select any (f + ’Bl ) vector positions, and ¬x a
value in {0, 1}n \ 0 for each position. Now form the subset of {Vw }W consisting
of those Vw that contain the selected ¬xed values in the speci¬ed positions”denote
this subset by V. Then for each of the Bl vector positions whose values were not
¬xed, all the values in that position are distinct as we range over V.

Proof. Contained in the proof of Theorem 1 in [23].

Remark 6. It follows that the number of vectors in V is at most (2n ’ 1). The
vectors in V depend on γa and γb (not on the speci¬c values of a and b), and
also on the choice of vector positions to be assigned ¬xed values, together with
the particular ¬xed values chosen for those positions.

De¬nition 9. For T = 2 core SPN rounds, a Bl -list is a set of vectors, each of
which has length Bl , that has been derived in one of two ways:
1. by selecting any a, b ∈ {0, 1}N \ 0 satisfying wt(γa ) + wt(γb ) = Bl and
forming the set {Vw }W , as in Lemma 3;
Re¬ned Analysis of Bounds Related to Linear and Di¬erential Cryptanalysis 51

2. by selecting any a, b ∈ {0, 1}N \ 0 satisfying wt(γa ) + wt(γb ) > Bl , forming
any set V according to Lemma 4, and then shortening each vector in V to
length Bl by removing those positions that were assigned ¬xed values.
Let Bl -LIST (i) denote the set of all Bl -lists that are formed by Option i above,
for i = 1, 2, and let

Bl -LIST = Bl -LIST (1) ∪ Bl -LIST (2) .

For any Z ∈ Bl -LIST, let δ(Z) denote the number of vectors in Z.
It follows from Remarks 5 and 6 that δ(Z) ¤ (2n ’ 1) for any Z ∈ Bl -LIST.
For any vector z = ζ 1 , ζ 2 , . . . , ζ Bl in any Bl -list Z, each ζ i is either an output
mask for a particular s-box in round 1, or an input mask for a particular s-box
in round 2. In the former case, let ±i denote a nonzero input mask for the
same s-box, and let LP — (±i , ζ i ) = LP (±i , ζ i ). In the latter case, let ±i denote
a nonzero output mask for the same s-box, and let LP — (±i , ζ i ) = LP (ζ i , ±i );
here ±i is playing the role of one of the β j used earlier, e.g., in (9), and LP — (·, ·)
is the transpose of the s-box LP table. (For simplicity, the speci¬c s-box is now
implicit in the notation.)
De¬nition 10. Let Z ∈ Bl -LIST. Then
⎛ ⎞
⎜ ⎟
LP — (±i , ζ i ) ⎠ .
σ(Z) = ⎝
±1 ,...,±Bl ∈{0,1}n \0
ζ 1 ,...,ζ B ∈Z

The following two theorems are central to this paper.
Theorem 2. The 2-round MELP is lower bounded by

max σ(Z) : Z ∈ Bl -LIST (1) .

Proof. It is easy to see that max σ(Z) : Z ∈ Bl -LIST (1) is exactly equal to

ELP [1...2] (a, b) ,
a,b∈{0,1} \0
wt(γa )+wt(γb )=Bl

which clearly lower bounds the 2-round MELP (see (3)).

Theorem 3. The 2-round MELP is upper bounded by

max {σ(Z) : Z ∈ Bl -LIST } .

Proof. Let M = max {σ(Z) : Z ∈ Bl -LIST}. Given the proof of Theorem 2, it
su¬ces to show that

ELP [1...2] (a, b) ¤ M .
a,b∈{0,1}N \0
wt(γa )+wt(γb )>Bl
52 L. Keliher

Let a, b ∈ {0, 1}N \ 0 such that F = wt(γa ) + wt(γb ) ’ Bl > 0. In keeping
with Lemma 4, isolate F of the active s-boxes to be assigned ¬xed output/input
masks (¬xed output masks for round-1 s-boxes, and ¬xed input masks for round-2
s-boxes), let these ¬xed masks be denoted ζ 1 , . . . , ζ F , and let the corresponding
input/output masks derived from a and b be denoted ±1 , . . . , ±F . Let the masks
derived from a and b for the “non-¬xed” s-boxes be denoted ±1 , . . . , ±Bl . Denote
the Bl -list resulting from this setup by Zζ 1 ,...,ζ F . Then

ELP [1...2] (a, b)
⎛ ⎞
⎝ LP — (±j , ζ j ) ⎠
LP — (±i , ζ i ) ·
i=1 j=1
ζ 1 ,...,ζ F ∈{0,1}n \0 ζ 1 ,...,ζ B ∈Zζ
1 ,...,ζ F
⎛ ⎞
⎜ ⎟

LP — (±j , ζ j ) ⎠
LP (±i , ζ i ) ⎝
i=1 j=1
ζ 1 ,...,ζ F ∈{0,1}n \0 ζ 1 ,...,ζ B ∈Zζ
1 ,...,ζ F
⎛ ⎞
¤ M⎝ LP — (±i , ζ i ) ⎠
ζ 1 ,...,ζ F ∈{0,1}n \0
= M,

where the last equality follows easily from (1).

3.1 Considerations Speci¬c to MEDP
Tailoring the above de¬nitions and results to the MEDP is straightforward: Bd
is substituted for Bl , Wd [ ] for Wl [ ], “di¬erence” for “mask,” DP values for
LP values, and DIFF (·, ·) for ALH (·, ·). As well, the relationship between input
and output di¬erences over the linear transformation is via the second part of
Lemma 2.

4 Lower Bounding the AES 2-Round MELP / MEDP
For the AES, Bl = Bd = 5; this is due to the fact that Bl = Bd = 5 for the 32-bit
linear transformation component of the 128-bit AES linear transformation (see
Figure 1) [5]. Hereafter we refer to Bl -lists or Bd -lists as 5-lists. As noted earlier,
all AES s-boxes are identical. It is not hard to see that computing the MELP /
MEDP for 2 AES rounds is equivalent to computing the MELP / MEDP for the
“reduced” SPN depicted in Figure 2.
To lower bound the 2-round MELP / MEDP for the AES, we compute the
value in Theorem 2 (or its MEDP counterpart) for the SPN in Figure 2. There
are 56 pairs (γ, γ ) ∈ {0, 1}4 — {0, 1}4 for which wt(γ) + wt(ˆ ) = 5; enumerate
these as (γ1 , γ1 ), (γ2 , γ2 ), . . . , (γ56 , γ56 ). A straightforward computation reveals
ˆ ˆ ˆ
that the 5-list associated with each pair (γs , γs ) contains exactly 28 ’ 1 = 255
Re¬ned Analysis of Bounds Related to Linear and Di¬erential Cryptanalysis 53

32’bit LT

Fig. 2. Reduced 2-round AES

LowerBound = 0
For s = 1 to 56
For 1 ¤ ±1 , ±2 , ±3 , ±4 , ±5 ¤ 255
Sum = 0
For w = 1 to 255
Prod = 1
For i = 1 to 5
Prod = Prod — XP — (±i , D[s, w, i])
End For
Sum = Sum + Prod
End For
If (Sum > LowerBound)
LowerBound = Sum
End If
End For
16. End For

Fig. 3. Algorithm for lower bounding the 2-round MELP / MEDP for the AES

vectors. We store all the 5-lists in a 3-dimensional array of bytes, D[·, ·, ·], of size
56 — 255 — 5, such that D[s, ·, ·] contains the 5-list for (γs , γs ) in the obvious way.
Computing the lower bound on the MELP / MEDP reduces to the algorithm in
Figure 3. (We use XP to mean either LP or DP, as appropriate.) The algorithm
as presented is computationally intensive; however, we can make use of the fact
that we are searching for a maximum to incorporate signi¬cant pruning, greatly
reducing the running time.
Using the above algorithm, the lower bound on the 2-round MELP for the
AES is 1.638—2’28 . Since the best upper bound is 48,193,441 ≈ 1.44—2’27 [3, 23],
the 2-round MELP is now known almost exactly. The lower bound on the 2-round
MEDP is 1.656—2’29 , and since the best upper bound is 234 ≈ 1.23—2’28 [3, 23],

the 2-round MEDP is also now known almost exactly.
These lower bounds are important in light of the fact, stated earlier, that
the 4th power of any upper bound on the 2-round MELP / MEDP for the
AES (including the exact value) is an upper bound on the MELP / MEDP for
T ≥ 4 [23, 24]. This is how Park et al. obtain the upper bounds in the last row
54 L. Keliher

of Table 1. However, by constraining the 2-round MELP / MEDP as above, we
see that this approach is essentially exhausted (for the AES).

5 Best AES 2-Round Upper Bounds Not Tight
In this section we show that the best upper bounds on the 2-round MELP and
MEDP for the AES are not tight. First, we state the rationale behind the current
bounds, and then we explain why they cannot be attained.
It is well known that all the nontrivial rows and columns of the LP / DP
table for the AES s-box have the same distribution of values, given in Table 2
for the LP table and Table 3 for the DP table (ρi is a distinct value, and φi is
the frequency with which it occurs) [11, 23].

Table 2. Distribution of LP values for the AES s-box

i 1 2 3 4 5 6 7 8 9
82 72 62 52 42 32 22 12
ρi 0
64 64 64 64 64 64 64 64

φi 5 16 36 24 34 40 36 48 17

Table 3. Distribution of DP values for the AES s-box

i 1 2 3

ρi 1 1
64 128

φi 1 126 129

We again use XP to mean either LP or DP, as appropriate. Consider the
upper bound given in Theorem 3 (or its MEDP counterpart). Let Z ∈ 5-LIST.
Adapting Theorem 1 in [23] to our notation, σ(Z) equals the maximum value
possible for a 5-list if, for some ±1 , . . . , ±5 ∈ {0, 1}8 \ 0, the following two con-
ditions are satis¬ed:

1. For every ζ 1 , . . . , ζ 5 ∈ Z,

XP — (±1 , ζ 1 ) = XP — (±2 , ζ 2 ) = · · · = XP — (±5 , ζ 5 ) .

2. No nonzero XP value is omitted. In other words, for each i ∈ {1 . . . 5}, as we
range over all ζ 1 , . . . , ζ 5 ∈ Z the values XP — (±i , ζ i ) include every nonzero
value in the XP table row or column indexed by ±i . (Obviously a necessary
condition for the omission of a nonzero XP value is that δ(Z) < 255.)

Using the notation of Table 2 / Table 3, the maximum possible value for σ(Z)
ρ5 φi ” this is exactly the best upper bound on the 2-round MELP /
is i
Re¬ned Analysis of Bounds Related to Linear and Di¬erential Cryptanalysis 55

MEDP [3, 23]. However, we have determined that this “worst-case” situation
never occurs.
In brief, we systematically generated all 5-lists in 5-LIST(2) (it is clear from
Section 4 that we don™t need to consider the elements of 5-LIST(1) ). We observed
that δ(Z) < 255 for all Z ∈ 5-LIST(2) (speci¬cally, 251 ¤ δ(Z) ¤ 254). For each
5-list generated, we ran an algorithm which ascertained that there do not exist
masks ±1 , . . . , ±5 ∈ {0, 1}8 \ 0 satisfying both Condition 1 and Condition 2
above. We used aggressive pruning to avoid iterating through all possible values
of the ±i (approximately 240 ) for any 5-list.

6 Modi¬ed Version of KMT2 Algorithm
The KMT2 algorithm (resp. its dual, KMT2-DC), due to Keliher, Meijer, and
Tavares [8, 11], is a general algorithm that can be used to compute an upper
bound on the MELP (resp. MEDP) for each value of T ≥ 2 for any SPN. For
a ¬xed value T ≥ 2, and for all nonzero patterns of active s-boxes in the ¬rst
and last rounds given by γ and γ , KMT2 (resp. KMT2-DC) computes a value
(γ, γ ) such that for all a, b ∈ {0, 1}N \ 0, if γa = γ and γb = γ , then
[1...T ]
UB ˆ ˆ
(a, b) ¤ UB (γ, γ ) (resp. EDP (a, b) ¤ UB (γ, γ )). The
[1...T ] [1...T ] [1...T ] [1...T ]
ELP ˆ ˆ
values in the third row of Table 1 are from KMT2 and KMT2-DC.
The KMT2 / KMT2-DC algorithm works recursively, i.e., for T ≥ 3, the
values UB [1...T ] (γ, γ ) depend on the values UB [1...(T ’1)] (γ, γ ). This allows for a
ˆ ˆ
very simple improvement: For any T ≥ 2, suppose that B is known to be an
upper bound on the MELP / MEDP for that value of T (from some external
source of information). Then if B < UB [1...T ] (γ, γ ), replace UB [1...T ] (γ, γ ) with
ˆ ˆ
B before proceeding to the computations for T + 1. In other words, enhance

Modi¬ed KMT2
-104 Park et al. [23]
log2 (upper bound)




3 4 5 6 7 8 9 10 11 12 13 14 15 16
Number of rounds being approximated (T)

Fig. 4. Results from modi¬ed KMT2 for AES
56 L. Keliher

KMT2 / KMT2-DC by incorporating other upper bounds when those bounds
are superior to the values determined directly by the algorithm.
For the AES, we modi¬ed KMT2 in this fashion by incorporating the upper
bound on the MELP for T ≥ 4 due to Park et al [23]. The results are plotted in
Figure 4. For T ≥ 8, for example, the upper bound on the MELP is improved
to 1.778 — 2’107 . This improvement is slight, but it is an e¬ective “proof of
concept.” For other ciphers, the modi¬ed version of KMT2 / KMT2-DC may
yield much more signi¬cant improvements over existing upper bounds.
Interestingly, modifying KMT2-DC using the upper bound on the AES MEDP
for T ≥ 4 due to Park et al. yielded no improvement over the existing bound for
T ≥ 4. This appears to be an artifact of the simple distribution of DP values for
the AES s-box (LP / DP values play a fundamental role in KMT2 / KMT2-DC).

7 Conclusion
We have carefully analyzed bounds related to linear and di¬erential cryptanal-
ysis for the AES. We present nontrivial lower bounds on the 2-round maximum
expected linear probability (MELP) and maximum expected di¬erential proba-
bility (MEDP), trapping each value in a small interval. We then prove that the
best upper bounds on the 2-round MELP and MEDP are not tight. Finally, we
show how a modi¬ed version of the KMT2 / KMT2-DC algorithm can poten-
tially improve existing upper bounds on the MELP / MEDP for any SPN, and
we use the modi¬ed KMT2 algorithm to tighten the upper bound on the AES
MELP to 1.778 — 2’107 , for T ≥ 8.

1. E. Biham, On Matsui™s linear cryptanalysis, Advances in Cryptology”
EUROCRYPT™94, LNCS 950, pp. 341“355, Springer-Verlag, 1995.
2. E. Biham and A. Shamir, Di¬erential cryptanalysis of DES-like cryptosystems, Ad-
vances in Cryptology”CRYPTO™90, LNCS 537, pp. 2“21, Springer-Verlag, 1991.
3. K. Chun, S. Kim, S. Lee, S.H. Sung, S. Yoon, Di¬erential and linear cryptanalysis
for 2-round SPNs, Information Processing Letters, Vol. 87, pp. 277“282, 2003.
4. J. Daemen, L. Knudsen, and V. Rijmen, The block cipher Square, Fast Software
Encryption (FSE™97), LNCS 1267, pp. 149“165, Springer-Verlag, 1997.
5. J. Daemen and V. Rijmen, The Design of Rijndael: AES”The Advanced Encryp-
tion Standard, Springer-Verlag, 2002.
6. S. Hong, S. Lee, J. Lim, J. Sung, and D. Cheon, Provable security against di¬er-
ential and linear cryptanalysis for the SPN structure, Fast Software Encryption
(FSE 2000), LNCS 1978, pp. 273“283, Springer-Verlag, 2001.
7. J.-S. Kang, S. Hong, S. Lee, O. Yi, C. Park, and J. Lim, Practical and provable
security against di¬erential and linear cryptanalysis for substitution-permutation
networks, ETRI Journal, Vol. 23, No. 4, December 2001.
8. L. Keliher, Linear cryptanalysis of substitution-permutation networks, Ph.D. The-
sis, Queen™s University, Kingston, Canada, 2003.
Re¬ned Analysis of Bounds Related to Linear and Di¬erential Cryptanalysis 57

9. L. Keliher, H. Meijer, and S. Tavares, New method for upper bounding the
maximum average linear hull probability for SPNs, Advances in Cryptology”
EUROCRYPT 2001, LNCS 2045, pp. 420“436, Springer-Verlag, 2001.
10. L. Keliher, H. Meijer, and S. Tavares, Dual of new method for upper bounding the
maximum average linear hull probability for SPNs, Technical Report, IACR ePrint
Archive (http://eprint.iacr.org, Paper # 2001/033), 2001.
11. L. Keliher, H. Meijer, and S. Tavares, Improving the upper bound on the maximum
average linear hull probability for Rijndael, Eighth Annual International Workshop
on Selected Areas in Cryptography (SAC 2001), LNCS 2259, pp. 112“128, Springer-
Verlag, 2001.
12. L. Keliher, H. Meijer, and S. Tavares, Completion of computation of improved
upper bound on the maximum average linear hull probability for Rijndael, Techni-
cal Report, IACR ePrint Archive (http://eprint.iacr.org, Paper # 2004/074),
13. L. Knudsen, Practically secure Feistel ciphers, Fast Software Encryption,
LNCS 809, pp. 211“221, Springer-Verlag, 1994.
14. X. Lai, J. Massey, and S. Murphy, Markov ciphers and di¬erential cryptanaly-
sis, Advances in Cryptology”EUROCRYPT™91, LNCS 547, pp. 17“38, Springer-
Verlag, 1991.
15. C.H. Lim, CRYPTON: A new 128-bit block cipher, The First Advanced Encryption
Standard Candidate Conference, Proceedings, Ventura, California, August 1998.
16. M. Matsui, Linear cryptanalysis method for DES cipher, Advances in Cryptology”
EUROCRYPT™93, LNCS 765, pp. 386“397, Springer-Verlag, 1994.
17. M. Matsui, On correlation between the order of s-boxes and the strength of DES,
Advances in Cryptology”EUROCRYPT™94, LNCS 950, pp. 366“375, Springer-
Verlag, 1995.
18. W. Meier and O. Sta¬elbach, Nonlinearity criteria for cryptographic functions,
Advances in Cryptology”EUROCRYPT™89, LNCS 434, pp. 549“562, Springer-
Verlag, 1990.
19. K. Nyberg, Linear approximation of block ciphers, Advances in Cryptology”
EUROCRYPT™94, LNCS 950, pp. 439“444, Springer-Verlag, 1995.
20. K. Nyberg and L. Knudsen, Provable security against a di¬erential attack, Journal
of Cryptology, Vol. 8, No. 1, pp. 27“37, 1995.
21. K. Ohta, S. Moriai, and K. Aoki, Improving the search algorithm for the best lin-
ear expression, Advances in Cryptology”CRYPTO™95, LNCS 963, pp. 157“170,
Springer-Verlag, 1995.
22. S. Park, S.H. Sung, S. Chee, E-J. Yoon, and J. Lim, On the security of Rijndael-like
structures against di¬erential and linear cryptanalysis, Advances in Cryptology”
ASIACRYPT 2002, LNCS 2501, pp. 176“191, Springer-Verlag, 2002.
23. S. Park, S.H. Sung, S. Lee, J. Lim, Improving the upper bound on the maximum
di¬erential and the maximum linear hull probability for SPN structures and AES,
Fast Software Encryption (FSE 2003), LNCS 2887, pp. 247“260, Springer-Verlag,
24. F. Sano, K. Ohkuma, H. Shimizu, and S. Kawamura, On the security of nested
SPN cipher against the di¬erential and linear cryptanalysis, IEICE Transactions on
Fundamentals of Electronics, Communications and Computer Sciences, Vol. E86-A,
No. 1, pp. 37“46, 2003.
25. S. Vaudenay, On the security of CS-Cipher, Fast Software Encryption (FSE™99),
LNCS 1636, pp. 260“274, Springer-Verlag, 1999.
Some Algebraic Aspects of the Advanced
Encryption Standard

Carlos Cid

Information Security Group,
Royal Holloway, University of London,
Egham, Surrey, TW20 0EX, UK

Abstract. Since being o¬cially selected as the new Advanced Encryp-
tion Standard (AES), Rijndael has continued to receive great attention
and has had its security continuously evaluated by the cryptographic
Rijndael is a cipher with a simple, elegant and highly algebraic struc-
ture. Its selection as the AES has led to a growing interest in the study
of algebraic properties of block ciphers, and in particular algebraic tech-
niques that can be used in their cryptanalysis.
In these notes we will examine some algebraic aspects of the AES
and consider a number of algebraic techniques that could be used in the
analysis of the cipher. In particular, we will focus on the large, though
surprisingly simple, systems of multivariate quadratic equations derived
from the encryption operation, and consider some approaches that could
be used when attempting to solve these systems.
These notes refer to an invited talk given at the Fourth Conference on
the Advanced Encryption Standard (AES4) in May 2004, and are largely
based on [4].

1 Introduction
Rijndael is a block-cipher with a simple and elegant structure. It has been de-
signed to o¬er strong resistance against known attacks, in particular di¬erential
and linear cryptanalysis, while enabling e¬cient implementation on di¬erent
platforms. Given its careful design criteria, it seems unlikely that its security
can be a¬ected by conventional methods of cryptanalysis.
Rijndael has also a highly algebraic structure: the cipher round transforma-
tions are based on simple operations over the ¬nite ¬eld F28 . Its selection as the
AES has therefore led to a growing interest in the study of algebraic proper-
ties of block ciphers, as well as algebraic techniques that can be used in their
cryptanalysis [1, 2, 6, 12, 13].
This new approach in cryptanalysis seems promising. One reason is that con-
ventional methods of cryptanalysis of block-ciphers (e.g. di¬erential and linear
cryptanalysis) are generally based on a “statistical” approach: the attacker at-
tempts to construct probabilistic characteristics through as many rounds of the

H. Dobbertin, V. Rijmen, A. Sowa (Eds.): AES 2004, LNCS 3373, pp. 58“66, 2005.
c Springer-Verlag Berlin Heidelberg 2005
Some Algebraic Aspects of the Advanced Encryption Standard 59

cipher as possible, in order to distinguish the cipher from a random permuta-
tion. Most modern ciphers have been designed with these attacks in mind, and
therefore do not generally have their security a¬ected by them.
In contrast, the so-called algebraic attacks exploit the intrinsic algebraic
structure of a cipher: the attacker expresses the encryption transformation as
a (large) set of multivariate polynomial equations, and subsequently attempts
to solve such a system to recover the encryption key. Algebraic attacks could
open new perspectives in the cryptanalysis of block ciphers.
In these notes we will examine some algebraic aspects of the AES and consider
a number of algebraic techniques that could be used in the analysis of the cipher.

2 The Basic Structure of the AES
Rijndael is a key-iterated block cipher, which alternates key-independent round
transformations and key addition. In the basic version (considered here), the
cipher encrypts 128-bit blocks in 10 rounds, using 128-bit keys. We refer to [9]
for a full description of the cipher.
In these notes we will also consider the Big Encryption System (BES), another
iterated block cipher which was introduced in [13]. BES operates on 128-byte
blocks with 128-byte keys, and has also a very simple algebraic structure: one
round of the cipher consists of inversion, matrix multiplication and key addition,
all operations over F28 .We refer to [13] for the full description of the BES cipher.
Both the AES and the BES use a state vector of bytes, which is transformed
by the basic operations within a round. Furthermore, it is shown in [13] that one
can embed the AES into the BES, and that way obtain an alternative description
of the AES. This relationship between both ciphers may well provide new ways
for the cryptanalysis of the AES.

3 Algebraic Analysis of the AES
Due to the rich algebraic structure of the AES, there is currently a growing
interest in the study of algebraic techniques which could be applied in its crypt-
analysis. These are known as algebraic attacks. Currently there appears to be
two main approaches:

“ Study the system of polynomial equations derived from the cipher;
“ Study the AES underlying algebraic structure.

4 Algebraic Attacks
Unlike most conventional methods of cryptanalysis, the so-called algebraic at-
tacks attempt to exploit the intrinsic algebraic structure of the cipher. More
speci¬cally, the attacker expresses the encryption transformation as a set of
60 C. Cid

multivariate polynomial equations and tries to recover the encryption key by
solving the system.
While in theory most modern block ciphers can be fully described by a sys-
tem of multivariate polynomial equations over a ¬nite ¬eld, for the majority of
the cases such systems prove to be just too complex for any practical purpose.
However, given its algebraic structure, it seems that the AES could be more
vulnerable to such approach.
In [6] Courtois and Pieprzyk exhibit a large, sparse and overde¬ned system of
multivariate quadratic equations over F2 whose solution would recover the AES
encryption key. In the same paper they propose a method called XSL (eXtended
Sparse Linearization), as an attempt to e¬ciently solve the system. Around the
same time, Murphy and Robshaw [13] showed how to express the AES encryption
as a far simpler system of equations over F28 , which is derived from the BES.
If XSL or some of its variants are in fact valid methods, this system should be
faster to solve than the original one over F2 , and in theory, could provide an
e¬cient key-recovery attack.

5 Potential Attack Techniques
Given the AES and BES algebraic formulations, it is clear that an e¬cient
method for the solution of this type of system of multivariate quadratic equa-
tions would provide a key-recovery attack of the AES with potentially very few
plaintext-ciphertext pairs. While the problem of solving generic large systems of
multivariate equations of degree greater than one over a ¬nite ¬eld is known to
be NP-complete, it is conceivable that a technique can be developed which ex-
ploits the particular algebraic structure of the AES and BES systems. Below we
investigate a few approaches which have been proposed for solving such systems.

6 Linearization Methods
The method of linearization is a well-known technique for solving large systems
of multivariate polynomial equations. In this method one considers all monomi-
als in the system as independent variables and tries to solve the system using
linear algebra techniques. In order to apply the method, the number of linearly
independent equations in the system needs to be approximately the same as the
number of terms in the system. When this is not the case, a number of techniques
have been proposed to generate enough LI equations.

6.1 XL Algorithm
In [5] an algorithm for solving systems of multivariate quadratic equations called
XL (standing for eXtended Linearization) is proposed. XL is a simple algorithm:
if A is a system of m quadratic equations fi in n variables over a ¬eld K, and
D ∈ N, one executes the following steps:
Some Algebraic Aspects of the Advanced Encryption Standard 61

1. Multiply: Generate all the products j=1 xij — fi with k ¤ D ’ 2;
2. Linearize: Consider each monomial of degree ¤ D as a new variable and
perform Gaussian elimination on the system obtained in step 1;
3. Solve: Assume that step 2 yields at least one univariate equation. Solve this
4. Repeat: Simplify the equations and repeat the process to ¬nd the values of
the other variables.
The hope is that after few iterations the algorithm will yield a solution for the
In [5] the authors present some estimates for the complexity of the algorithm
for random systems with m ≈ n. In particular, they provide evidence that XL
can solve randomly generated overde¬ned systems of polynomial equations in
subexponential time.
The XL algorithm (as in the form above) is a reasonably new idea, and its
behaviour is not entirely understood yet. When analysing the algorithm, one
must examine two key points:
1. Does the algorithm always terminate?
2. Does the algorithm work as predicted?

Does XL Always Terminate? By applying well-known commutative algebra
techniques (Hilbert Theory), one can show that there are cases for which the
algorithm does not terminate [10]. However, when working over ¬nite ¬elds, this
problem can be avoided by adding to the system the underlying ¬eld equations
xq ’ xi = 0 .

Does XL Work as Predicted? Initially it was suggested that XL could solve
systems of polynomial equations in subexponential time when the number of
equations exceeded the number of variables by a small number. However there
has been strong evidence that some of the heuristics used in the original article
were too optimistic [3, 10].
This discrepancy arises from the fact that one may often overestimate the
number of linearly independent equations generated by the algorithm. There
has been recently few papers studying the XL [3, 10], and one could say that the
algorithm has just started to be better understood now.
In any case, it is widely agreed that application of the XL algorithm against
the polynomial system which arises from the AES (either over F28 or F2 ) does
not provide an e¬cient attack against the cipher.

6.2 Variants of XL
Since the introduction of the XL method, a number of variants have been pro-
posed. These attempt to exploit speci¬c properties of the polynomial systems,
such as how overde¬ned the system is, the order of the ¬eld, etc. Of particular
relevance for the AES is the method proposed in [6] by Courtois and Pieprzyk.
62 C. Cid

XSL is based on the XL method, but uses the sparsity and speci¬c structure
of the equations to mount the attack; instead of multiplying the equations by
all monomials up to certain degree, in the XSL algorithm the equations are
multiplied only by “carefully selected monomials” (we refer to [6] and its earlier
version [7] for a full description of the method). While this has the intention to
create less new terms when generating new equations, it is not entirely clear the
exact criteria used for selecting the monomials.
The system used in [6] to mount the attack has 8000 quadratic equations
and 1600 variables, over F2 (the variables represent the input/output bits). Two
attacks are described in [7]: the ¬rst one ignores the key schedule and therefore
needs 11 known plaintext/ciphertext pairs (for the AES-128); the second attack
uses the key schedule, and in theory could be mounted with a single known
plaintext/ciphertext pair. In [6] it is claimed that the second XSL attack would
have complexity of ≈ 2230 and ≈ 2255 when applied against the 128-bit and
256-bit AES, respectively. So the XSL attack would represent a (theoretical)
successful key-recovery attack against the 256-bit AES.

XSL Attack on the BES. As said earlier, the F28 -system derived from the
BES is much simpler than the F2 -system presented in [6]. In particular, it is far
sparser. This would strongly suggest that the XSL attack is more suited to the
BES system than to the original AES system.
Murphy and Robshaw consider in [13, 14] the consequences of the XSL attack
against the BES. Using the estimates given in [6], they conclude that if XSL is in
fact a valid technique, a key-recovery attack against the AES might be possible
with a work e¬ort of about 2100 encryptions. This would clearly represent a
successful attack against the AES-128.

Accuracy of the XSL Estimates. The XSL algorithm consists basically of
two main steps:
1. The equation generation procedure;
2. The T method at the end of the algorithm.
The ¬rst step corresponds to the multiplication of the initial set of equations
by selected monomials. This is done in similar manner of the XL algorithm.
The T method is used at the end, and in theory would allow the method to
e¬ectively solve the system even when the di¬erence between the total number
of terms and the number of linearly independent equations is reasonably large.
The main issue when considering XSL attacks (in fact, all the XL-based
attacks) against the AES is how accurate the estimates for the number of linearly
independent equations are. As explained above, there is evidence that some of
the heuristics in the original XL paper were too optimistic. In fact, there is even
more concern when considering the XSL algorithm. Additionally, it is not clear
how e¬ective the T method is as a last step of the algorithm. The algorithm is
an ad-hoc method, based on a number of heuristics arguments, and although this
might not invalidate the XSL technique entirely, it makes it harder to formally
Some Algebraic Aspects of the Advanced Encryption Standard 63

examine the algorithm and consider whether the XSL attacks described in [6]
work as claimed.
In fact, we have considered very small versions of BES, with reduced block
length and number of rounds, and smaller ¬eld. We ran a few simulations with
these versions, and it appears that the attacks do not work in the manner pre-
dicted in [6]. Again, while this might not invalidate the XSL technique, it could
raise doubts on whether the method is generally applicable against the AES and
BES systems. It is clear that more research is needed to determine how e¬ective
this technique is against the AES.

7 Computational Algebra Techniques
Solving multivariate polynomial systems is a typical problem studied in Alge-
braic Geometry and Commutative Algebra. The classical algorithm for solving
this type of problem is the Buchberger algorithm for calculating Gr¨bner Bases
(see [8] for de¬nitions and description of the algorithm). The algorithm generates
a basis for the ideal derived from the set of equations, which can then be used
to obtain the solutions.
The complexity of most algorithms used for calculating a Gr¨bner basis of
an ideal is closely related to the total degree of the intermediate polynomials
that are generated during the running of algorithm. In the worst case the Buch-
berger algorithm is known to run in double exponential time. One of the most
e¬cient algorithms known, due to Faug`re [11], appears to be single exponen-
tial. In any case, in practice it is widely believed that Gr¨bner Bases algorithms
cannot be used for e¬ciently solving generic systems with more than a handful
of variables.
However, the type of systems which arise from cryptosystems are often very
structured. In particular, the BES system has a very regular structure. This is
given for j = 0, . . . , 15 and m = 0, . . . , 7 by:

0 = w0,(j,m) + p(j,m) + k0,(j,m) ,
0 = xi,(j,m) wi,(j,m) + 1 for i = 0, . . . , 9,
0 = wi,(j,m) + ki,(j,m) + (j ,m ) ±(j,m),(j ,m ) xi’1,(j ,m ) for i = 1, . . . , 9,
0 = c(j,m) + k10,(j,m) + (j ,m ) β(j,m),(j ,m ) x9,(j ,m ) .

This system could be viewed as an “iterated” system of equations, with blocks
of similar “sub-systems” repeated for every round. One could also use the trans-
formation x ’ x254 as the S-Box inversion to eliminate a number of variables
(the BES system considered above has the simplest form, with only quadratic
and linear equations).
Furthermore, since the system includes the equations relating every variable
with its conjugates, we have the following easy proposition:

Proposition 1. The degree of polynomials occurring in the computation of a
Gr¨bner basis of a BES-type system with n variables is at most n.
64 C. Cid

This is clearly an upper bound, and we expect that in practice the degrees are
much lower. This fact, together with the particular structure of the system, can
be exploited to infer more precise bounds for the complexity of the attack.
One can also seek alternatives for the use of the usual Gr¨bner bases al-
gorithm. There are also a number of common techniques used in cryptanalysis
that could be used in conjunction with computer algebra methods . For example,
one could attempt to adapt the meet-in-the-middle technique and consider two
smaller systems. This has the potential to reduce the complexity of the attack.
One should also note that in practice the attacker is not primarily interested
in the full solution of the system, but rather in the key variables. In fact, in a
“partial key recovery” attack, only few key variables might su¬ce.
Therefore, it is possible that one may be able to use a combination of crypt-
analytic and algebraic techniques (including Linearisation and Gr¨bner Bases)
to mount a successful attack without actually computing the solution of the
entire system.

7.1 The Polynomial Ideal Generated by the BES System
Let S be the system of multivariate quadratic equations derived from the BES
encryption operation, and K a ¬xed encryption key. A closer look at the prop-
erties of the ideal generated by these polynomials may prove to be useful when
attempting to solve the system.
For every plaintext/ciphertext pair (P, C), we have a particular system S(P,C)
and an ideal 1

I(P,C) = S(P,C) ⊆ K[xi,(j,m) , . . . , wi,(j,m) , . . . , ki,(j,m) ].

In fact we are mostly interested in the ideal

I(P,C) = I(P,C) © K[k0 , k1 , . . . , k15 ]

where k0 , k1 , . . . , k15 are the ¬rst key variables (i.e. the original key).
Thus for every key K, we can associate an ideal of F[k0 , k1 , . . . , k15 ] de¬ned
IK = I(P,C) ,


where (P, C) run through all plaintext/ciphertext pairs.
Given a key K, a random plaintext block P , and C such that EK (P ) = C,
the probability that there exists another key K with EK (P ) = C is approx-
imately (1 ’ 1/(e ’ 1)) ∼ 42%. Therefore we expect that in many cases, for
a given plaintext/ciphertext pair (P, C), the K-dimension of the residue class
ring K[k0 , k1 , . . . , k15 ]/I(P,C) is greater than 1 (i.e., the corresponding reduced

Gr¨bner basis should contain polynomials with degree greater than 1).

To avoid inconsistent systems, we will make sure to describe the system in such way
that it does not include “0-inversions” (i.e. use the map x ’ x254 when necessary).
Some Algebraic Aspects of the Advanced Encryption Standard 65

On the other hand, the K-dimension of K[k0 , k1 , . . . , k15 ]/IK is almost cer-
tainly 1. In other words, we expect IK to be of the form

IK =< k0 ’ κ0 , k1 ’ κ1 , . . . , k15 ’ κ15 >

with κi ∈ K. If this is not true, then there are at least two keys K1 and K2 such
EK1 (P ) = EK2 (P )
for every plaintext block P , and K1 and K2 induce the same permutation on the
set of possible plaintext blocks, which would not appear to be the case for the

8 Alternative Approaches
It is clear that an e¬cient method for solving the polynomial systems considered
so far would represent a successful key-recovery attack against the AES. However,
even when the system cannot be solved, other approaches could well be used in
order to mount less ambitious attacks against the cipher. One could examine
common applications of the AES, such as AES-based hash function and MAC
constructions, modes of operation, relation between plaintexts, etc.
At the very least, a cryptanalyst would like to ¬nd a polynomial-time dis-
tinguisher between the cipher and a random permutation. This could be used
either to mount a practical attack or simply to show some structural weakness
of the cipher.
Given the rich algebraic structure of the cipher, it is not inconceivable that an
“algebraic” distinguisher exists. This would most likely exploit the byte-oriented
structure of the cipher and the typical round version of the BES, which consists
of inversion, matrix multiplication and key addition over F28 :

b ’ MB .b’1 + (kB )i

Mathematically, this seems to be the most natural representation of the cipher.
Both the S-Box (inversion on F28 ) and the linear layer are highly structured,
and this could well be exploited in the analysis of the cipher.

9 Conclusion
Rijndael is a cipher with a simple, elegant and highly algebraic structure. Its se-
lection as the AES has led to a growing interest in the study of algebraic proper-
ties of block ciphers, with a particular focus on algebraic techniques that can be
used in their cryptanalysis. One promising approach is to exploit the large, though
surprisingly simple, system of multivariate quadratic equations derived from the
cipher. An e¬cient method for solving this system would represent a successful
key-recovery attack against the AES. While the problem of solving such systems
66 C. Cid

is known to be hard, it is not entirely unlikely that a technique can be developed
which exploits the particular algebraic structure of these particular systems.
Furthermore, it is also possible that the AES algebraic structure could be
exploited on mounting less ambitious attacks. The AES has a rich algebraic
structure, and while many of these properties might not prove to be relevant
in the cryptanalysis, it is not inconceivable that one could ¬nd a novel way to
explore this structure in the analysis of the cipher.

1. Elad Barkan and Eli Biham. In how many ways can you write Rijndael? In Yuliang
Zheng, editor, Advances in Cryptology - ASIACRYPT 2002, volume 2501 of Lecture
Notes in Computer Science, pages 160“175. Springer, 2002.
2. Alex Biryukov and Christophe De Canniere. Block Ciphers and Systems of
Quadratic Equations. In FSE™2003, 2003.
3. Jiun-Ming Chen and Bo-Yin Yang. Theoretical Analysis of XL over Small Fields.
In Proceedings of the 9th Australasian Conference on Information Security and
Privacy, 2004. to appear.
4. Carlos Cid, Sean Murphy, and Matthew Robshaw. Computational and Algebraic
Aspects of the Advanced Encryption Standard. In Proceedings of the Seventh
International Workshop on Computer Algebra in Scienti¬c Computing - CASC
2004, 2004. to appear.
5. Nicolas Courtois, Alexander Klimov, Jacques Patarin, and Adi Shamir. E¬cient
Algorithms for Solving Overde¬ned Systems of Multivariate Polynomial Equations.
In Eurocrypt™2000, pages 392“407. Springer, 2000.
6. Nicolas Courtois and Josef Pieprzyk. Cryptanalysis of Block Ciphers with Overde-
¬ned Systems of Equations. In Yuliang Zheng, editor, Advances in Cryptology -
ASIACRYPT 2002, volume 2501 of Lecture Notes in Computer Science, pages
267“287. Springer, 2002.
7. Nicolas Courtois and Josef Pieprzyk. Cryptanalysis of Block Ciphers with Overde-
¬ned Systems of Equations. Cryptology ePrint Archive, Report 2002/044, 2002.
8. David Cox, John Little, and Donal O™Shea. Ideals, Varieties, and Algorithms.
Undergraduate Texts in Mathematics. Springer, Second edition, 1997.
9. Joan Daemen and Vincent Rijmen. The Design of Rijndael. Springer-Verlag, 2002.
10. Claus Diem. The XL-algorithm and a conjecture from commutative algebra, 2004.
11. Jean-Charles Faug`re. A new e¬cient algorithm for computing Gr¨bner Bases
e o
without reduction to zero F5. In T. Mora, editor, International Symposium on
Symbolic and Algebraic Computation - ISSAC 2002, pages 75“83, July 2002.
12. N. Ferguson, R. Shroeppel, and D. Whiting. A simple algebraic representation
of Rijndael. In Proceedings of Selected Areas in Cryptography, pages 103“111.
Springer-Verlag, 2001.
13. Sean Murphy and Matthew Robshaw. Essential Algebraic Structure within the
AES. In M. Yung, editor, Advances in Cryptology - CRYPTO 2002, volume 2442
of LNCS, pages 1“16. Springer-Verlag, 2002.
14. Sean Murphy and Matthew Robshaw. Comments on the Security of the AES and
the XSL Technique. Electronic Letters, 39:26“38, 2003.
General Principles of Algebraic Attacks and
New Design Criteria for Cipher Components

Nicolas T. Courtois

Axalto Cryptographic Research & Advanced Security,
36-38 rue de la Princesse, BP 45, 78430 Louveciennes Cedex, France

Abstract. This paper is about the design of multivariate public key
schemes, as well as block and stream ciphers, in relation to recent at-
tacks that exploit various types of multivariate algebraic relations. We
survey these attacks focusing on their common fundamental principles
and on how to avoid them. From this we derive new very general design
criteria, applicable for very di¬erent cryptographic components. These
amount to avoiding (if possible) the existence of, in some sense “too sim-
ple” algebraic relations. Though many ciphers that do not satisfy this
new paradigm probably still remain secure, the design of ciphers will
never be the same again.

Keywords: algebraic attacks, polynomial relations, multivariate equa-
tions, ¬nite ¬elds, design of cryptographic primitives, generalised linear
cryptanalysis, multivariate public key encryption and signature schemes,
HFE, Quartz, S¬‚ash, stream ciphers, Boolean functions, combiners with
memory, block ciphers, AES, Rijndael, Serpent, elimination methods,
Gr¨bner bases.

1 Introduction
In this paper we consider a very ambitious question: how to design secure cryp-
tosystems and in particular how to design secure ciphers ? Very little real answers
do exist in this area. However it is possible to learn from our experience, and
formulate some design criteria, resulting on the one hand, from some practical
requirements on cryptographic systems, and on the other hand, from the known
attacks. Doing so we are still not done, and this for two reasons. First of all,
the recommandations do usually con¬‚ict with each other and are not obvious to
balance. Moreover for both practical implementation criteria and security crite-
ria, it is always hard to know and debatable to what extent exactly a system
satis¬es these. Nevertheless, the work on the design criteria is and always was
an important and necessary area of research.

Work supported by the French Ministry of Research RNRT Project “X-CRYPT”.

H. Dobbertin, V. Rijmen, A. Sowa (Eds.): AES 2004, LNCS 3373, pp. 67“83, 2005.
c Springer-Verlag Berlin Heidelberg 2005
68 N.T. Courtois

This paper is about an emergence of a new type of design criteria on various
types of cryptographic primitives. It turns out that many recent attacks on
public key signature and encryption schemes, block and stream ciphers (including
AES) have a common denominator. This common feature is the exploitation (by
various methods) of the existence of various types of algebraic relations that
involve both the inputs and the outputs of some component. We will formulate
the resulting design criteria on the respective components that will be very
similar, if not identical.

2 From Boolean Functions to Algebraic Relations
Most of the current cipher design paradigms can be seen in terms of looking for
in some sense “good” Boolean functions / “good” vectorial functions (S-boxes)
and avoiding “bad” ones. The outputs of cryptosystems (and their components)
should simply not depend on their inputs in a way that is too simple. The de¬-
nition of the word “simple” does naturally vary from one place to another. For
example in the design of stream ciphers, there are many so called “non-linearity”
criteria, dictated by some (not always really practical) attacks. Building ciphers
with such components allows to make sure that many (from real to very theoret-
ical) attacks will not work very well on these ciphers. For example, in [25] Golic
explains the criteria on the Boolean functions that should be used in stream
ciphers. Obviously these criteria, to some extent being necessary in the design
of good ciphers, are by far insu¬cient and nothing guarantees that a cipher
that made out of “good” components will be good itself (i.e. will be secure).
Moreover, using such components is sometimes even perceived (if they are really
very good) as a potential danger (special may mean dangerous). In particular,
many recent attacks in di¬erent areas of cryptography do work in spite of using
very good (sometimes optimal) components w.r.t. aforementioned criteria (for
example highly non-linear components).

2.1 Interesting Special Case: AES S-Box
AES (Rijndael) [19, 20] is precisely a good case to study in this respect. First,
because its security is simply essential, and more importantly, because it pushes
the (aforementioned) philosophy that culminates two decades of research in the
design of modern ciphers to its limits. A general question is, whether it is possible
(and how) to attack ciphers build with highly-nonlinear components (and thus
build with eminently “good” Boolean function. Obviously studying this question
will in most cases not give results being directly applicable to AES, but it gives
us the opportunity to come up with new approaches to attack AES later, as well
as should help us to simply design much better ciphers in the future (that avoid
also the recent attacks).
In [5], Canteaut and Videau study the non-linearity properties of the Inverse
function in GF (2n ) (the only non-linear component of AES) with relation to
linear, di¬erential and higher-order di¬erential attacks. It is exceptional and
General Principles of Algebraic Attacks and New Design Criteria 69

close to optimality, see [5]. On page 6 of [21], the designers of AES say: “[...] The
disadvantage of these boxes is that they have a simple description in GF (2m ), [...]
we are not aware of any vulnerability caused by this property. [...] Should such
a vulnerability exist, one can always replace the Sboxes by Sboxes [...] that are
not algebraic over GF (2m ). [...]” Unfortunately important vulnerability of the
inverse S-box does exist. Historically the idea goes back to the algebraic attacks
on several so called multivariate public key schemes, initiated by Patarin in [41],
greatly improved by Courtois et al. [9, 18], and recently adopted by Faug`re e
and Joux [31]. The seminal idea (due to Patarin) is to study the security of
a cipher component not in terms of Boolean/algebraic functions, but in terms
of Boolean/algebraic relations that involve both inputs and output bits. In
the last two years, this precise idea, has led to a sudden collapse of several
important families of stream ciphers, as demonstrated by Courtois, Meier et al in
[16, 17, 2, 10, 12] and numerous other recent papers. We explain these in Section
4. But does it matter at all for block ciphers ? This will the main subject of this
paper starting from Section 5.

3 From Multivariate Public Key Schemes to General
Algebraic Attacks
At Crypto™95, Jacques Patarin proposes a very interesting attack on the
Matsumoto-Imai public-key cryptosystem of Eurocrypt™98, see [36, 40]. This cryp-
tosystem, at the time considered as very promising, is based on a univariate
transformation, that can be for example X ’ X 3 . This cube function, instead
of being over a ring of numbers modulo some N like with RSA, is over a ¬nite
¬eld, for example GF (280 ). The order of a multiplicative group of GF (280 ) is
known and therefore in many cases, such a power function over a ¬nite ¬eld is,
unlike in RSA, easily invertible. However, the same algebraic structure of this
function can be “concealed” (cf. [36, 40]) when it is written in a new representa-
tion, as a set of multivariate quadratic polynomial functions. It is done in such
a way that it is easy to compute it forwards, and hard backwards, for anyone
that does not known how the system of equations have been generated. Thus,
Matsumoto and Imai construct their public key cryptosystem, see [36] for more
Incidentally, due to the cube function, this cryptosystem have extremely good
properties when considered in terms of Boolean functions, see [39, 5]. Yet, this
did not prevent Jacques Patarin from rather badly breaking this cryptosystem,
at Crypto™95 [40]. He shows that there are simple algebraic relations that re-
late input and output bits of this cryptosystem. More precisely, if the input
is (x0 , . . . , x79 ) and the output is (y0 , . . . , y79 ) there exist bi-linear equations of
type, for example ij ±ij xi yj = 0. Then, Patarin remarks that if such equations
exist, they can be easily found from the public key, and then subsequently they
can be used to decrypt any message: if we substitute a concrete values of y in
these equations they become linear and can be solved to recover the xi .
70 N.T. Courtois

This attack has been generalised by Courtois in [9]. This paper also proposes
a ¬rst “theory” of algebraic attacks on public key schemes1 that we will develop
and explain here. This “theory” is quite simple and can potentially be applied
to many di¬erent situations that arise in cryptanalysis. To achieve this we will
be voluntarily imprecise. Some details vary from one attack to another, and it
should be applicable also to situations that are very di¬erent than the area of
algebraic attacks.
From one point of view, one can think that it applies to more or less all cryp-
tographic attacks. To explain this, let™s consider any attack on any deterministic
one-way function which is described as a set of explicit arithmetic formulae
yi = Fi (x0 , . . . , xn’1 ). The answer x we are looking for is also seen as a set
of equations, though much simpler xi = . . ., which a hypothetical attack would
evaluate to. We wish to look at any deterministic attack as a series of transfor-
mations that starts from (somewhat “complex”) initial equations and eventually
produces somewhat “simpler” ones (containing the solution to the system). Sim-
ilarly, following [9], starting from some notion of complexity that is adapted to
our initial equations, and makes them hard to solve, we can also try to construct
attacks that work exactly in this way. For this, still following [9], we need to
study (and ¬nd) methods that given some initial equations, give hope to gener-
ate some “simpler” equations. With such methods we hope to solve the system,
by successive simpli¬cations. For example, one possible notion of complexity is
the non-linear degree. In Matsumoto-Imai and HFE systems [36, 40, 41] we have
initial equations that are quadratic and our goal will be to ¬nd some simpler,
linear equations. Most attacks known on these systems work in this way, e.g.
[40, 41, 9, 18, 31].
Attacks that work in such a way, may be iterative with many steps, which
makes them hard to understand and study. For example it is far from being clear
what is the complexity of Courtois-Pieprzyk XSL attack on AES [15]. However,
again following [9], what one should study, and what is really interesting, is
what happens in one step of the attack. From the cryptological point of view
the main question will be not what is the exact complexity of an attack, but
rather if the attack is feasible in general (at least in some cases), and even
more importantly, how to completely avoid such attacks. For these questions,
the most important answers may already be given by looking at the beginning
of the attack process. Do we gain something ? Can we by some means gain
anything simpler from the initial equations ? Obviously it is always possible to
combine equations in some way, (and it is very simple for Boolean algebraic
equations over a ¬nite ¬led). However, usually, we obtain other equations that
have nothing special and are in fact more complex than the initial equations.
Following [9], the interesting phenomenon to watch for is a type of “collapse in
the complexity”. For example, we take some multivariate equations of degree 2,
combine them algebraically to get an equation of expected degree 4, but when we

It applies also almost literally to algebraic attacks on block and stream ciphers, but
at the time, nobody really suspected this.
General Principles of Algebraic Attacks and New Design Criteria 71

compute this equation its degree collapses from 4 to 3. Here we gain something,
some simpli¬cation arises. The heuristic is then that, if it can be done once, it can
be done several times and in many cases we end up by obtaining a full working
algebraic attack. In rare cases, it will obviously fail, but we know that designing
systems such that there is no “collapse of complexity” in some sense, will prevent
many attacks, whether they work well, or not. For example, building a cipher
with large random components(e.g. S-boxes) makes such cases of “complexity
collapse” to some degree very unlikely if not impossible, this whatever is our
de¬nition of complexity.
When, as in many cases studied in this paper, the notion of complexity is the
non-linear degree of a multivariate polynomial form of a function, the existence of
“complexity collapse” can be characterised as follows. If an algebraic combination
of the original equations is of lower degree than expected, it means that there
exist a non-trivial and in some sense “simple” (e.g. low degree) function G such

G (x0 , . . . , xn’1 ; F0 (x0 , . . . , xn’1 ), . . . Fm’1 (x0 , . . . , xn’1 )) = 0

If we replace yi = Fi (x0 , . . . , xn’1 ) we get an algebraic relation between input
and output bits:

G (x0 , . . . , xn’1 ; y0 , . . . , ym’1 ) = 0 (—)

In these formulas the xi and the yi may be in GF (2), but may be also in
any other ¬nite ¬eld GF (q). We are at the right point. It turns out that talking
about algebraic relations is more general than considering “a collapse in the
complexity”: algebraic relations may exist, be found and directly be used in an
attack, disregarding the initial complexity of the equations, that in some cases
is within no comparison (much more complex).
Undoubtedly, there are many cases in which the very existence of an alge-
braic “complexity collapse” or/and resulting algebraic relations at some level, is
somewhat trivial and inevitable. There are also many cases in which such occur-
rence can be an isolated phenomenon that does not lead to interesting attacks.
Yet, to make sure that a system resists to large class of possible attacks it is
sensible to avoid such situations whatsoever. (This concerns, as we will explain
later mainly Generalised Linear Cryptanalysis and direct algebraic XSL-type at-
tacks, and potentially other future attacks). Another way of seeing such design
criteria is to say that, in a sense, components of our system (or the whole system
itself) will be “more” indistinguishable from random components (e.g. random
functions or random permutations), and thus less attacks should be possible.
In the following sections we will explain brie¬‚y, how this general paradigm
of algebraic attacks applies to other contexts. This list is not exhaustive, and
we expect that many other areas of cryptographic security can be described in a
meaningful way in terms of “complexity collapse” and/or simple “I/O relations”
with respect to some (not necessarily algebraic/polynomial degree) notion of
72 N.T. Courtois

3.1 How to Build Secure Multivariate Public Key Cryptosystems
Here the conclusion follows immediately: for a trapdoor function to be secure
we need to make sure that there is no multivariate relations such as (—) that
contain less than, let™s say 280 di¬erent monomials (in general, for ¬nite systems,
it is impossible to avoid the existence of algebraic relations, but their size will be
astronomical). In practice, for most systems, if there is no algebraic/multivariate
relations of size less than 240 , there should be no practical algebraic attack on the
system (because we need to be able to recover the equations ¬rst). However, in
some special cases, equations of large sizes can be build directly by a method that
depends on the cipher, and then they can be used by substitution of variables.
Therefore the proposed bound of 280 gives a better guarantee.

4 Algebraic Attacks on Stream Ciphers
The algebraic attacks on stream ciphers have been introduced in 2003 by Cour-
tois and Meier [17, 16]. Since then, the area has known an important research
activity with many interesting contributions, to quote only some, by Armknecht
and Krause [2, 1], Cho and Pieprzyk [6], Courtois [12, 10], Hawkes and Rose [27],
Lee, Kim, Hong, Han and Moon [33], Meier, Carlet and Pasalic [37], and others.
In this paper we only explain the main principle of algebraic attacks on stream
ciphers from [17, 10], and what are the resulting design criteria for components
of such ciphers.
The algebraic attack on stream ciphers is extremely general and applies po-
tentially to all ciphers that have some linear feedback (for example based on
LFSRs or cellular automata). We assume that in our cipher the ¬rst (linear)
component is as follows. Let x = (x0 , . . . , xn’1 ) ∈ GF (q)n be the state of this
component. We assume that the cipher is regularly clocked (some relaxations
are possible, see [17, 16]) and at each clock the linear state x is updated by some
multivariate linear function L. This means that at each clock x becomes L(x),
and if K = (K0 , . . . , Kn’1 ) is the initial state, at time t the state will be called
x(t) and by de¬nition we have x(t) = Lt (K).
Then we assume that the state of the linear component is supplied to the
second “¬lter/combiner” component that outputs the keystream (it may output
one or several bits at a time). This output component can be stateless or stateful:
in the second case it also has internal memory bits that are updated at each clock.
In this case, we have in addition to the linear feedback in the ¬rst component,
a non-linear feedback in the second component (but usually of much smaller
size/importance than the linear feedback).
Let l be the number of memory bits in the second component, that after
(t) (t)
the time t are a0 , . . . , al’1 . In particular, for stateless ¬lters/combiners l is 0,
for example when a Boolean function is used to ¬lter/combine the state bits
of one or several LFSRs. The initial inner state is a(’1) , exists before t = 0,
and can be anything (it is unknown in the attack and algebraic attacks tend to
eliminate all the monomials in the ai ). At each clock t, the combiner outputs m
General Principles of Algebraic Attacks and New Design Criteria 73

(t) (t)
bits y0 , . . . , ym’1 , for t = 0, 1, 2, . . .. For example, if the ciphers uses a single
Boolean function to combine input bits, we have simply m = 1. In general,
the second component can be described as a pair of functions F = (F1 , F2 ) :
GF (2)n+l ’ GF (2)m+l , that given the current state and the input, computes
the next state and the output:
(t+1) (t+1) (t) (t) (t) (t)
, . . . , ym’1 ) = F1 (x0 , . . . , xn’1 , a0 , . . . , al’1 )
F: (t+1) (t+1) (t) (t) (t) (t)
, . . . , al’1 ) = F2 (x0 , . . . , xn’1 , a0 , . . . , al’1 )
The most general form of an algebraic attack on stream ciphers following
closely [10, 12, 17] works as follows.
• We assume that L is known (for example the LFSRs used in the cipher are
known or can be guessed/revovered).
• We consider M consecutive states of the cipher.
• Find (by some method that is very di¬erent for each cipher) one (at least,
but one is enough) multivariate relation G between the state bits xi and
some M consecutive outputs, for example:
G(x0 , x1 , . . . , xn’1 ; y (0) , . . . , y (M ’1) ) = 0
We assume that G is of degree d in the xi (the degree in the yi may also be
important, but usually will not in¬‚uence the total attack complexity).
• By recursive structure of the cipher, for any initial state K and for any t,
the same equation will apply to all consecutive windows of M states
G(Lt (K); y (t) , . . . , y (t+M ’1) ) = 0
• The y (t) , . . . , y (t+M ’1) are replaced by their values known from the observed
output of the cipher.
• Due to the linearity of L, for any t, the degree of these equations is still d.
• For each keystream bit, we get a multivariate equation of degree k in the xi .
• Given many keystream bits, we inevitably obtain a very overde¬ned system
of equations (i.e. great many multivariate equations of degree d in the Ki ).
• To solve these equations we may apply the XL algorithm from Eurocrypt
2000 [43], adapted for this purpose in [16] and other improved elimination
techniques such as computing Gr¨bner bases combined with linear algebra,
see [22, 23]. However, if we dispose of a su¬cient amount of keystream, (which
is frequently not very big, see [17]), all these are not necessary.
• If the amount of keystream available is large enough, we use a so called
linearization method that is particularly simple. There are about T ≈ n d
monomials of degree ¤ d in the n variables Ki (assuming d ¤ n/2). We
consider each of these monomials as a new variable Vj . Given about n + M d
keystream bits, and therefore R = d equations on successive windows of

M bits, we get a system of R ≥ T linear equations with T = n variables Vi
that can be easily solved by Gaussian elimination on a linear system of size
T . The time to solve such a system is T ω with in theory ω ¤ 2.376 [7] but
in practice for small systems, it is believed that one should rather consider
ω that is closer to 3 than to 2.376.
74 N.T. Courtois

4.1 How to Build Secure Stream Ciphers
For stream ciphers in which the second component does not have internal mem-
ory, the case M > 1 does not make a lot of sense, and if we wish the cipher to
avoid algebraic attacks, we get a requirement on the second component that is
identical to our requirement on public key trapdoor functions. There should be
no “simple” algebraic relations between its inputs and outputs such as:

G (x0 , . . . , xk’1 ; y0 , . . . , ym’1 ) = 0 (—)
Similarly, in the general case l ≥ 1 we need to avoid the existence of “not too
complex” equations (that eliminate the internal state bits ai ) of type:

G(x0 , x1 , . . . , xn’1 , y (0) , . . . , y (M ’1) ) = 0 (——)
For stream ciphers however, the notion of “simple” and “complex” equations
changes. It is no longer the total size of these equations (number of monomials)


. 3
( 7)