UHKSL& NFRO% OHDGQML5 HK7

UHKSL& NFRO% OHDGQML5 HK7

UHKSL& NFRO% OHDGQML5 HK7

UHKSL& NFRO% OHDGQML5 HK7

nemeaD naoJ

nemjiR tnecniV

AES Proposal: Rijndael

Joan Daemen, Vincent Rijmen

Joan Daemen Vincent Rijmen

Proton World Int.l Katholieke Universiteit Leuven, ESAT-COSIC

Zweefvliegtuigstraat 10 K. Mercierlaan 94

B-1130 Brussel, Belgium B-3001 Heverlee, Belgium

daemen.j@protonworld.com vincent.rijmen@esat.kuleuven.ac.be

Table of Contents

1. Introduction 4

1.1 Document history 4

2. Mathematical preliminaries 4

8

2.1 The field GF(2 ) 4

2.1.1 Addition 4

2.1.2 Multiplication 5

2.1.3 Multiplication by x 6

8

2.2 Polynomials with coefficients in GF(2 ) 6

2.2.1 Multiplication by x 7

3. Design rationale 8

4. Specification 8

4.1 The State, the Cipher Key and the number of rounds 8

4.2 The round transformation 10

4.2.1 The ByteSub transformation 11

4.2.2 The ShiftRow transformation 11

4.2.3 The MixColumn transformation 12

4.2.4 The Round Key addition 13

4.3 Key schedule 14

4.3.1 Key expansion 14

4.3.2 Round Key selection 15

4.4 The cipher 16

5. Implementation aspects 16

5.1 8-bit processor 16

5.2 32-bit processor 17

5.2.1 The Round Transformation 17

5.2.2 Parallelism 18

5.2.3 Hardware suitability 19

5.3 The inverse cipher 19

5.3.1 Inverse of a two-round Rijndael variant 19

5.3.2 Algebraic properties 20

5.3.3 The equivalent inverse cipher structure 20

5.3.4 Implementations of the inverse cipher 21

6. Performance figures 23

6.1 8-bit processors 23

6.1.1 Intel 8051 23

/

99/90/30 :etaD ,2 noisrev tnemucoD 54 1 :egaP

:srohtuA ODVRSRU3 6($

UHKSL& NFRO% OHDGQML5 HK7

UHKSL& NFRO% OHDGQML5 HK7

UHKSL& NFRO% OHDGQML5 HK7

UHKSL& NFRO% OHDGQML5 HK7

nemeaD naoJ

nemjiR tnecniV

6.1.2 Motorola 68HC08 23

6.2 32-bit processors 24

6.2.1 Optimised ANSI C 24

6.2.2 Java 25

7. Motivation for design choices 25

7.1 The reduction polynomial m(x ) 25

7.2 The ByteSub S-box 26

7.3 The MixColumn transformation 27

7.3.1 Branch number 27

7.4 The ShiftRow offsets 27

7.5 The key expansion 28

7.6 Number of rounds 28

8. Strength against known attacks 30

8.1 Symmetry properties and weak keys of the DES type 30

8.2 Differential and linear cryptanalysis 30

8.2.1 Differential cryptanalysis 30

8.2.2 Linear cryptanalysis 30

8.2.3 Weight of differential and linear trails 31

8.2.4 Propagation of patterns 31

8.3 Truncated differentials 36

8.4 The Square attack 36

8.4.1 Preliminaries 36

8.4.2 The basic attack 36

8.4.3 Extension by an additional round at the end 37

8.4.4 Extension by an additional round at the beginning 37

8.4.5 Working factor and memory requirements for the attacks 38

8.5 Interpolation attacks 38

8.6 Weak keys as in IDEA 38

8.7 Related-key attacks 39

9. Expected strength 39

10. Security goals 39

10.1 Definitions of security concepts 39

10.1.1 The set of possible ciphers for a given block length and key length 39

10.1.2 K-Security 40

10.1.3 Hermetic block ciphers 40

10.2 Goal 40

11. Advantages and limitations 41

11.1 Advantages 41

11.2 Limitations 41

12. Extensions 42

12.1 Other block and Cipher Key lengths 42

12.2 Another primitive based on the same round transformation 42

13. Other functionality 42

13.1 MAC 42

13.2 Hash function 43

13.3 Synchronous stream cipher 43

13.4 Pseudorandom number generator 43

13.5 Self-synchronising stream cipher 43

14. Suitability for ATM, HDTV, B-ISDN, voice and satellite 44

15. Acknowledgements 44

/

99/90/30 :etaD ,2 noisrev tnemucoD 54 2 :egaP

:srohtuA lasoporP SEA

rehpiC kcolB leadnjiR ehT

rehpiC kcolB leadnjiR ehT

rehpiC kcolB leadnjiR ehT

rehpiC kcolB leadnjiR ehT

nemeaD naoJ

nemjiR tnecniV

16. References 44

17. List of Annexes 45

Table of Figures

Figure 1: Example of State (with Nb = 6) and Cipher Key (with Nk = 4) layout.......................... 9

Figure 2: ByteSub acts on the individual bytes of the State..................................................... 11

Figure 3: ShiftRow operates on the rows of the State. ............................................................ 12

Figure 4: MixColumn operates on the columns of the State. ................................................... 13

Figure 5: In the key addition the Round Key is bitwise EXORed to the State. ......................... 13

Figure 6: Key expansion and Round Key selection for Nb = 6 and Nk = 4. ............................. 15

Figure 7: Propagation of activity pattern (in grey) through a single round................................ 32

Figure 8: Propagation of patterns in a single round. ................................................................ 33

Figure 9: Illustration of Theorem 1 with Q = 2. ......................................................................... 34

Figure 10: Illustration of Lemma 1 with one active column in a1. ............................................. 35

Figure 11: Illustration of Theorem 2. ........................................................................................ 35

Figure 12: Complexity of the Square attack applied to Rijndael. ............................................. 38

List of Tables

Table 1: Number of rounds (Nr) as a function of the block and key length. ............................. 10

Table 2: Shift offsets for different block lengths....................................................................... 12

Table 3: Execution time and code size for Rijndael in Intel 8051 assembler. .......................... 23

Table 4: Execution time and code size for Rijndael in Motorola 68HC08 Assembler............... 24

Table 5: Number of cycles for the key expansion .................................................................... 24

Table 6: Cipher (and inverse) performance ............................................................................. 25

Table 7: Performance figures for the cipher execution (Java) ................................................. 25

Table 8: Shift offsets in Shiftrow for the alternative block lengths............................................ 42

/

HWD' QRLVUHY WQHPXFR' HJD3

:srohtuA lasoporP SEA

rehpiC kcolB leadnjiR ehT

rehpiC kcolB leadnjiR ehT

rehpiC kcolB leadnjiR ehT

rehpiC kcolB leadnjiR ehT

nemeaD naoJ

nemjiR tnecniV

1. Introduction

In this document we describe the cipher Rijndael. First we present the mathematical basis

necessary for understanding the specifications followed by the design rationale and the

description itself. Subsequently, the implementation aspects of the cipher and its inverse are

treated. This is followed by the motivations of all design choices and the treatment of the

resistance against known types of attacks. We give our security claims and goals, the

advantages and limitations of the cipher, ways how it can be extended and how it can be used

for functionality other than block encryption/decryption. We conclude with the

acknowledgements, the references and the list of annexes.

Patent Statement: Rijndael or any of its implementations is not and will not be subject to

patents.

1.1 Document history

This is the second version of the Rijndael documentation. The main difference with the first

version is the correction of a number of errors and inconsistencies, the addition of a motivation

for the number of rounds, the addition of some figures in the section on differential and linear

cryptanalysis, the inclusion of Brian Gladman™s performance figures and the specification of

Rijndael extensions supporting block and key lengths of 160 and 224 bits.

2. Mathematical preliminaries

Several operations in Rijndael are defined at byte level, with bytes representing elements in

8

the finite field GF(2 ). Other operations are defined in terms of 4-byte words. In this section we

introduce the basic mathematical concepts needed in the following of the document.

2.1 The field GF(28)

The elements of a finite field [LiNi86] can be represented in several different ways. For any

8

prime power there is a single finite field, hence all representations of GF(2 ) are isomorphic.

Despite this equivalence, the representation has an impact on the implementation complexity.

We have chosen for the classical polynomial representation.

A byte b, consisting of bits b7 b6 b5 b4 b3 b2 b1 b0, is considered as a polynomial with coefficient

in {0,1}:

7 6 5 4 3 2

b7 x + b 6 x + b 5 x + b 4 x + b 3 x + b 2 x + b 1 x + b 0

Example: the byte with hexadecimal value ˜57™ (binary 01010111) corresponds with

polynomial

6 4 2

x +x +x +x+1.

2.1.1 Addition

In the polynomial representation, the sum of two elements is the polynomial with coefficients

that are given by the sum modulo 2 (i.e., 1 + 1 = 0) of the coefficients of the two terms.

/

HWD' QRLVUHY WQHPXFR' HJD3

:srohtuA ODVRSRU3 6($

UHKSL& NFRO% OHDGQML5 HK7

UHKSL& NFRO% OHDGQML5 HK7

UHKSL& NFRO% OHDGQML5 HK7

UHKSL& NFRO% OHDGQML5 HK7

nemeaD naoJ

nemjiR tnecniV

Example: ˜57™ + ˜83™ = ˜D4™, or with the polynomial notation:

6 4 2 7 7 6 4 2

( x + x + x + x + 1 ) + ( x + x + 1) = x + x + x + x .

In binary notation we have: “01010111” + “10000011” = “11010100”. Clearly, the addition

corresponds with the simple bitwise EXOR ( denoted by • ) at the byte level.

All necessary conditions are fulfilled to have an Abelian group: internal, associative, neutral

element (˜00™), inverse element (every element is its own additive inverse) and commutative.

As every element is its own additive inverse, subtraction and addition are the same.

2.1.2 Multiplication

8

In the polynomial representation, multiplication in GF(2 ) corresponds with multiplication of

polynomials modulo an irreducible binary polynomial of degree 8. A polynomial is irreducible if

it has no divisors other than 1 and itself. For Rijndael, this polynomial is called m(x ) and given

by

8 4 3

m(x ) = x + x + x + x + 1

or ˜11B™ in hexadecimal representation.

Example: ˜57™ • ˜83™ = ˜C1™, or:

6 4 2 7 13 11 9 8 7

(x + x + x + x + 1) ( x + x + 1) x +x +x +x +x +

=

7 5 3 2

x +x +x +x +x+

6 4 2

x +x +x +x+1

13 11 9 8 6 5 4 3

x +x +x +x +x +x +x +x +1

=

13 11 9 8 6 5 4 3 8 4 3

x + x + x + x + x + x + x + x + 1 modulo x + x + x + x + 1

7 6

x +x +1

=

Clearly, the result will be a binary polynomial of degree below 8. Unlike for addition, there is no

simple operation at byte level.

The multiplication defined above is associative and there is a neutral element (˜01™). For any

binary polynomial b(x ) of degree below 8, the extended algorithm of Euclid can be used to

compute polynomials a(x ), c(x ) such that

b(x )a(x ) + m(x )c(x ) = 1 .

Hence, a(x ) • b(x ) mod m(x )= 1 or

b’1(x ) = a(x ) mod m(x )

Moreover, it holds that a(x ) • (b(x ) + c(x )) = a(x ) • b(x ) + a(x ) • c(x ).

It follows that the set of 256 possible byte values, with the EXOR as addition and the

8

multiplication defined as above has the structure of the finite field GF(2 ).

/

HWD' QRLVUHY WQHPXFR' HJD3

:srohtuA ODVRSRU3 6($

UHKSL& NFRO% OHDGQML5 HK7

UHKSL& NFRO% OHDGQML5 HK7

UHKSL& NFRO% OHDGQML5 HK7

UHKSL& NFRO% OHDGQML5 HK7

nemeaD naoJ

nemjiR tnecniV

2.1.3 Multiplication by x

If we multiply b(x ) by the polynomial x, we have:

8 7 6 5 4 3 2

b7 x + b 6 x + b 5 x + b 4 x + b 3 x + b 2 x + b 1 x + b 0 x

x • b(x ) is obtained by reducing the above result modulo m(x ). If b7 = 0, this reduction is the

identity operation, If b7 = 1, m(x ) must be subtracted (i.e., EXORed). It follows that

multiplication by x (hexadecimal ˜02™) can be implemented at byte level as a left shift and a

subsequent conditional bitwise EXOR with ˜1B™. This operation is denoted by b = xtime(a).

In dedicated hardware, xtime takes only 4 EXORs. Multiplication by higher powers of x can

be implemented by repeated application of xtime. By adding intermediate results,

multiplication by any constant can be implemented.

Example: ˜57™ • ˜13™ = ˜FE™

˜57™ • ˜02™ = xtime(57) = ˜AE™

˜57™ • ˜04™ = xtime(AE) = ˜47™

˜57™ • ˜08™ = xtime(47) = ˜8E™

˜57™ • ˜10™ = xtime(8E) = ˜07™

˜57™ • ˜13™ = ˜57™ • (˜01™ • ˜02™ • ˜10™ ) = ˜57™ • ˜AE™ • ˜07™ = ˜FE™

2.2 Polynomials with coefficients in GF(28)

8

Polynomials can be defined with coefficients in GF(2 ). In this way, a 4-byte vector

corresponds with a polynomial of degree below 4.

Polynomials can be added by simply adding the corresponding coefficients. As the addition in

8

GF(2 ) is the bitwise EXOR, the addition of two vectors is a simple bitwise EXOR.

8

Multiplication is more complicated. Assume we have two polynomials over GF(2 ):

3 2 3 2

a(x ) = a3 x + a2 x + a1 x + a0 and b(x ) = b3 x + b2 x + b1 x + b0.

Their product c(x ) = a(x )b(x ) is given by

6 5 4 3 2

c(x ) = c6 x + c5 x + c4 x + c3 x + c2 x + c1 x + c0 with

c0 = a0•b0 c4 = a3•b1 • a2•b2 • a1•b3

c1 = a1•b0 • a0•b1 c5 = a3•b2 • a2•b3

c2 = a2•b0 • a1•b1 • a0•b2 c6 = a3•b3

c3 = a3•b0 • a2•b1 • a1•b2 • a0•b3

/

HWD' QRLVUHY WQHPXFR' HJD3

VURKWX$ ODVRSRU3 6($

UHKSL& NFRO% OHDGQML5 HK7

UHKSL& NFRO% OHDGQML5 HK7

UHKSL& NFRO% OHDGQML5 HK7

UHKSL& NFRO% OHDGQML5 HK7

QHPHD' QDR-

QHPML5 WQHFQL9

Clearly, c(x ) can no longer be represented by a 4-byte vector. By reducing c(x ) modulo a

polynomial of degree 4, the result can be reduced to a polynomial of degree below 4. In

4

Rijndael, this is done with the polynomial M(x ) = x + 1. As

i i mod 4

4

x mod x + 1 = x ,

the modular product of a(x ) and b(x ), denoted by d(x ) = a(x ) — b(x ) is given by

3 2

d(x ) = d3 x + d2 x + d1 x + d0 with

d0 = a0•b0 • a3•b1 • a2•b2 • a1•b3

d1 = a1•b0 • a0•b1 • a3•b2 • a2•b3

d2 = a2•b0 • a1•b1 • a0•b2 • a3•b3

d3 = a3•b0 • a2•b1 • a1•b2 • a0•b3

The operation consisting of multiplication by a fixed polynomial a(x ) can be written as matrix

multiplication where the matrix is a circulant matrix. We have

®d 0 ®a0 a1 ®b0

a3 a2

d a a2 b1

a0 a3

1 = 1

d 2 a2 a3 b2

a1 a0

° d 3 » ° a3 a0 » °b3 »

a2 a1

4 8

Note: x + 1 is not an irreducible polynomial over GF(2 ), hence multiplication by a fixed

polynomial is not necessarily invertible. In the Rijndael cipher we have chosen a fixed

polynomial that does have an inverse.

2.2.1 Multiplication by x

If we multiply b(x ) by the polynomial x, we have:

4 3 2

b3 x + b 2 x + b 1 x + b 0 x

x — b(x ) is obtained by reducing the above result modulo 1 + x . This gives

4

3 2

b2 x + b 1 x + b 0 x + b 3

The multiplication by x is equivalent to multiplication by a matrix as above with all ai =˜00™

except a1 =˜01™. Let c(x ) = x —b(x ). We have:

®c0 ®00 01 ®b0

00 00

c 01 00 b1

00 00

1 =

c2 00 00 b2

01 00

° c3 » °00 00» °b3 »

00 01

Hence, multiplication by x, or powers of x, corresponds to a cyclic shift of the bytes inside the

vector.

/

HWD' QRLVUHY WQHPXFR' HJD3

VURKWX$ ODVRSRU3 6($

UHKSL& NFRO% OHDGQML5 HK7

UHKSL& NFRO% OHDGQML5 HK7

UHKSL& NFRO% OHDGQML5 HK7

UHKSL& NFRO% OHDGQML5 HK7

QHPHD' QDR-

QHPML5 WQHFQL9

3. Design rationale

The three criteria taken into account in the design of Rijndael are the following:

• Resistance against all known attacks;

• Speed and code compactness on a wide range of platforms;

• Design simplicity.

In most ciphers, the round transformation has the Feistel Structure. In this structure typically

part of the bits of the intermediate State are simply transposed unchanged to another position.

The round transformation of Rijndael does not have the Feistel structure. Instead, the round

transformation is composed of three distinct invertible uniform transformations, called layers.

By “uniform”, we mean that every bit of the State is treated in a similar way.

The specific choices for the different layers are for a large part based on the application of the

Wide Trail Strategy [Da95] (see Annex ), a design method to provide resistance against linear

and differential cryptanalysis (see Section 8.2). In the Wide Trail Strategy, every layer has its

own function:

guarantees high diffusion over multiple rounds.

The linear mixing layer:

parallel application of S-boxes that have optimum worst-case

The non-linear layer:

nonlinearity properties.

A simple EXOR of the Round Key to the intermediate State.

The key addition layer:

Before the first round, a key addition layer is applied. The motivation for this initial key addition

is the following. Any layer after the last key addition in the cipher (or before the first in the

context of known-plaintext attacks) can be simply peeled off without knowledge of the key and

therefore does not contribute to the security of the cipher. (e.g., the initial and final permutation

in the DES). Initial or terminal key addition is applied in several designs, e.g., IDEA, SAFER

and Blowfish.

In order to make the cipher and its inverse more similar in structure, the linear mixing layer of

the last round is different from the mixing layer in the other rounds. It can be shown that this

does not improve or reduce the security of the cipher in any way. This is similar to the absence

of the swap operation in the last round of the DES.

4. Specification

Rijndael is an iterated block cipher with a variable block length and a variable key length. The

block length and the key length can be independently specified to 128, 192 or 256 bits.

Note: this section is intended to explain the cipher structure and not as an implementation

guideline. For implementation aspects, we refer to Section 5.

4.1 The State, the Cipher Key and the number of rounds

The different transformations operate on the intermediate result, called the State:

Definition: the intermediate cipher result is called the State.

The State can be pictured as a rectangular array of bytes. This array has four rows, the

number of columns is denoted by Nb and is equal to the block length divided by 32.

/

99/90/30 :etaD ,2 noisrev tnemucoD 54 8 :egaP

:srohtuA lasoporP SEA

rehpiC kcolB leadnjiR ehT

rehpiC kcolB leadnjiR ehT

rehpiC kcolB leadnjiR ehT

rehpiC kcolB leadnjiR ehT

nemeaD naoJ

nemjiR tnecniV

The Cipher Key is similarly pictured as a rectangular array with four rows. The number of

columns of the Cipher Key is denoted by Nk and is equal to the key length divided by 32.

These representations are illustrated in Figure 1.

In some instances, these blocks are also considered as one-dimensional arrays of 4-byte

vectors, where each vector consists of the corresponding column in the rectangular array

representation. These arrays hence have lengths of 4, 6 or 8 respectively and indices in the

ranges 0..3, 0..5 or 0..7. 4-byte vectors will sometimes be referred to as words.

Where it is necessary to specify the four individual bytes within a 4-byte vector or word the

notation (a, b, c, d) will be used where a, b, c and d are the bytes at positions 0, 1, 2 and 3

respectively within the column, vector or word being considered.

a 0,0 a 0,1 a 0,2 a 0,3 a 0,4 a 0,5 k 0,0 k 0,1 k 0,2 k 0,3

a 1,0 a 1,1 a 1,2 a 1,3 a 1,4 a 1,5 k 1,0 k 1,1 k 1,2 k 1,3

a 2,0 a 2,1 a 2,2 a 2,3 a 2,4 a 2,5 k 2,0 k 2,1 k 2,2 k 2,3

a 3,0 a 3,1 a 3,2 a 3,3 a 3,4 a 3,5 k 3,0 k 3,1 k 3,2 k 3,3

Figure 1: Example of State (with Nb = 6) and Cipher Key (with Nk = 4) layout.

The input and output used by Rijndael at its external interface are considered to be one-

dimensional arrays of 8-bit bytes numbered upwards from 0 to the 4*Nb’1. These blocks

hence have lengths of 16, 24 or 32 bytes and array indices in the ranges 0..15, 0..23 or 0..31.

The Cipher Key is considered to be a one-dimensional arrays of 8-bit bytes numbered upwards

from 0 to the 4*Nk’1. These blocks hence have lengths of 16, 24 or 32 bytes and array

indices in the ranges 0..15, 0..23 or 0..31.

The cipher input bytes (the “plaintext” if the mode of use is ECB encryption) are mapped onto

the state bytes in the order a0,0, a1,0, a2,0, a3,0, a0,1, a1,1, a2,1, a3,1, a4,1 ... , and the bytes of the

Cipher Key are mapped onto the array in the order k0,0, k1,0, k2,0, k3,0, k0,1, k1,1, k2,1, k3,1, k4,1 ... At

the end of the cipher operation, the cipher output is extracted from the state by taking the state

bytes in the same order.

Hence if the one-dimensional index of a byte within a block is n and the two dimensional index

is (i ,j ), we have:

n = i + 4* j

j = °n / 4 » ;

i = n mod 4 ;

Moreover, the index i is also the byte number within a 4-byte vector or word and j is the index

for the vector or word within the enclosing block.

The number of rounds is denoted by Nr and depends on the values Nb and Nk. It is given in

Table 1.

/

HWD' QRLVUHY WQHPXFR' HJD3

VURKWX$ ODVRSRU3 6($

UHKSL& NFRO% OHDGQML5 HK7

UHKSL& NFRO% OHDGQML5 HK7

UHKSL& NFRO% OHDGQML5 HK7

UHKSL& NFRO% OHDGQML5 HK7

QHPHD' QDR-

QHPML5 WQHFQL9

Nr Nb = 4 Nb = 6 Nb = 8

Nk = 4 10 12 14

Nk = 6 12 12 14

Nk = 8 14 14 14

Table 1: Number of rounds (Nr) as a function of the block and key length.

4.2 The round transformation

The round transformation is composed of four different transformations. In pseudo C notation

we have:

Round(State,RoundKey)

{

ByteSub(State);

ShiftRow(State);

MixColumn(State);

AddRoundKey(State,RoundKey);

}

The final round of the cipher is slightly different. It is defined by:

FinalRound(State,RoundKey)

{

ByteSub(State) ;

ShiftRow(State) ;

AddRoundKey(State,RoundKey);

}

In this notation, the “functions” (Round, ByteSub, ShiftRow, ¦) operate on arrays to which

pointers (State, RoundKey) are provided.

It can be seen that the final round is equal to the round with the MixColumn step removed.

The component transformations are specified in the following subsections.

/

HWD' QRLVUHY WQHPXFR' HJD3

VURKWX$ ODVRSRU3 6($

UHKSL& NFRO% OHDGQML5 HK7

UHKSL& NFRO% OHDGQML5 HK7

UHKSL& NFRO% OHDGQML5 HK7

UHKSL& NFRO% OHDGQML5 HK7

QHPHD' QDR-

QHPML5 WQHFQL9

4.2.1 The ByteSub transformation

The ByteSub Transformation is a non-linear byte substitution, operating on each of the State

bytes independently. The substitution table (or S-box ) is invertible and is constructed by the

composition of two transformations:

8

1. First, taking the multiplicative inverse in GF(2 ), with the representation defined in

Section 2.1. ˜00™ is mapped onto itself.

2. Then, applying an affine (over GF(2) ) transformation defined by:

® y0 ®1 1 ® x0 ®1

0 0 0 1 1 1

y 1 1 x1 1

1 0 0 0 1 1

1

y2 1 1 x2 0

1 1 0 0 0 1

y3 = 1 1 1 1 0 0 0 1 x3 0

+

y4 1 0 x4 0

1 1 1 1 0 0

y5 0 0 x5 1

1 1 1 1 1 0

y 0 0 x6 1

0 1 1 1 1 1

6

y7 0 1 x7 0

°»° »° » ° »

0 0 1 1 1 1

The application of the described S-box to all bytes of the State is denoted by:

ByteSub(State) .

Figure 2 illustrates the effect of the ByteSub transformation on the State.

S-box

a 0,0 a 0,1 a 0,2 a 0,3 a 0,4 a 0,5 b 0,0 b 0,1 b 0,2 b 0,3 b 0,4 b 0,5

a 1,0 a 1,1 a 1,2 a1,3 a 1,4 a 1,5 b 1,0 b 1,1 b 1,2 b1,3 b 1,4 b 1,5

a b

i,j i,j

a 2,0 a 2,1 a 2,2 a 2,3 a 2,4 a 2,5 b 2,0 b 2,1 b 2,2 b 2,3 b 2,4 b 2,5

a 3,0 a 3,1 a 3,2 a 3,3 a 3,4 a 3,5 b 3,0 b 3,1 b 3,2 b 3,3 b 3,4 b 3,5

Figure 2: ByteSub acts on the individual bytes of the State.

The inverse of ByteSub is the byte substitution where the inverse table is applied. This is

obtained by the inverse of the affine mapping followed by taking the multiplicative inverse in

8

GF(2 ).

4.2.2 The ShiftRow transformation

In ShiftRow, the rows of the State are cyclically shifted over different offsets. Row 0 is not

shifted, Row 1 is shifted over C1 bytes, row 2 over C2 bytes and row 3 over C3 bytes.

The shift offsets C1, C2 and C3 depend on the block length Nb. The different values are

specified in Table 2.

/

99/90/30 :etaD ,2 noisrev tnemucoD 54 11 :egaP

VURKWX$ ODVRSRU3 6($

UHKSL& NFRO% OHDGQML5 HK7

UHKSL& NFRO% OHDGQML5 HK7

UHKSL& NFRO% OHDGQML5 HK7

UHKSL& NFRO% OHDGQML5 HK7

QHPHD' QDR-

QHPML5 WQHFQL9

C1 C2 C3

Nb

4 1 2 3

6 1 2 3

8 1 3 4

Table 2: Shift offsets for different block lengths.

The operation of shifting the rows of the State over the specified offsets is denoted by:

ShiftRow(State) .

Figure 3 illustrates the effect of the ShiftRow transformation on the State.

m n o p ... m n o p ...

no shift

j k l cyclic shift by C1 l

k ... h i j

... (1)

d e f ... cyclic shift by fC2 (2) a b c d e

z

w x y z ... ... w x y

cyclic shift by C3 (3)

Figure 3: ShiftRow operates on the rows of the State.

The inverse of ShiftRow is a cyclic shift of the 3 bottom rows over Nb-C1, Nb-C2 and Nb-C3

bytes respectively so that the byte at position j in row i moves to position (j + Nb-Ci) mod Nb.

4.2.3 The MixColumn transformation

8

In MixColumn, the columns of the State are considered as polynomials over GF(2 ) and

4

multiplied modulo x + 1 with a fixed polynomial c(x ), given by

c(x ) = ˜03™ x3 + ˜01™ x2 + ˜01™ x + ˜02™ .

4

This polynomial is coprime to x + 1 and therefore invertible. As described in Section 2.2, this

can be written as a matrix multiplication. Let b(x ) = c(x ) — a(x ),

®b0 ®02 01 ®a 0

03 01

b 01 01 a1

02 03

1 =

b2 01 03 a 2

01 02

°b3 » ° 03 02 » ° a 3 »

01 01

The application of this operation on all columns of the State is denoted by

MixColumn(State).

Figure 4 illustrates the effect of the MixColumn transformation on the State.

/

99/90/30 :etaD ,2 noisrev tnemucoD 54 21 :egaP

VURKWX$ ODVRSRU3 6($

UHKSL& NFRO% OHDGQML5 HK7

UHKSL& NFRO% OHDGQML5 HK7

UHKSL& NFRO% OHDGQML5 HK7

UHKSL& NFRO% OHDGQML5 HK7

QHPHD' QDR-

QHPML5 WQHFQL9

a b

0,j 0,j

a 0,0 a 0,1 a 0,2 a 0,3 a 0,4 a 0,5 b 0,0 b 0,1 b 0,2 b 0,3 b 0,4 b 0,5

— c(x)

a 1,j b 1,j

a 1,0 a 1,1 a 1,2 a 1,3 a 1,4 a 1,5 b 1,0 b 1,1 b 1,2 b 1,3 b 1,4 b 1,5

a 2,0 a 2,1 a 2,2 a 2,3 a 2,4 a 2,5 b 2,0 b 2,1 b 2,2 b 2,3 b 2,4 b 2,5

a b

2,j 2,j

a 3,0 a 3,1 a 3,2 a 3,3 a 3,4 a 3,5 b 3,0 b 3,1 b 3,2 b 3,3 b 3,4 b 3,5

a 3,j b 3,j

Figure 4: MixColumn operates on the columns of the State.

The inverse of MixColumn is similar to MixColumn. Every column is transformed by multiplying

it with a specific multiplication polynomial d(x ), defined by

( ˜03™ x + ˜01™ x + ˜01™ x + ˜02™ ) — d(x ) = ˜01™ .

3 2

It is given by:

3 2

d(x ) = ˜0B™ x + ˜0D™ x + ˜09™ x + ˜0E™ .

4.2.4 The Round Key addition

In this operation, a Round Key is applied to the State by a simple bitwise EXOR. The Round

Key is derived from the Cipher Key by means of the key schedule. The Round Key length is

equal to the block length Nb.

The transformation that consists of EXORing a Round Key to the State is denoted by:

AddRoundKey(State,RoundKey).

This transformation is illustrated in Figure 5.

a 0,0 a 0,1 a 0,2 a 0,3 a 0,4 a 0,5 k 0,0 k 0,1 k 0,2 k 0,3 k 0,4 k 0,5 b 0,0 b 0,1 b 0,2 b 0,3 b 0,4 b 0,5

a 1,0 a 1,1 a 1,2 a 1,3 a 1,4 a 1,5 k 1,0 k 1,1 k 1,2 k 1,3 k 1,4 k 1,5 b 1,0 b 1,1 b 1,2 b 1,3 b 1,4 b 1,5

• =

a 2,0 a 2,1 a 2,2 a 2,3 a 2,4 a 2,5 k 2,0 k 2,1 k 2,2 k 2,3 k 2,4 k 2,5 b 2,0 b 2,1 b 2,2 b 2,3 b 2,4 b 2,5

a 3,0 a 3,1 a 3,2 a 3,3 a 3,4 a 3,5 k 3,0 k 3,1 k 3,2 k 3,3 k 3,4 k 3,5 b 3,0 b 3,1 b 3,2 b 3,3 b 3,4 b 3,5

Figure 5: In the key addition the Round Key is bitwise EXORed to the State.

AddRoundKey is its own inverse.

/

99/90/30 :etaD ,2 noisrev tnemucoD 54 31 :egaP

VURKWX$ ODVRSRU3 6($

UHKSL& NFRO% OHDGQML5 HK7

UHKSL& NFRO% OHDGQML5 HK7

UHKSL& NFRO% OHDGQML5 HK7

UHKSL& NFRO% OHDGQML5 HK7

QHPHD' QDR-

QHPML5 WQHFQL9

4.3 Key schedule

The Round Keys are derived from the Cipher Key by means of the key schedule. This consists

of two components: the Key Expansion and the Round Key Selection. The basic principle is

the following:

• The total number of Round Key bits is equal to the block length multiplied by the

number of rounds plus 1. (e.g., for a block length of 128 bits and 10 rounds, 1408

Round Key bits are needed).

• The Cipher Key is expanded into an Expanded Key.

• Round Keys are taken from this Expanded Key in the following way: the first Round

Key consists of the first Nb words, the second one of the following Nb words, and so

on.

4.3.1 Key expansion

The Expanded Key is a linear array of 4-byte words and is denoted by W[Nb*(Nr+1)]. The

first Nk words contain the Cipher Key. All other words are defined recursively in terms of words

with smaller indices. The key expansion function depends on the value of Nk: there is a

version for Nk equal to or below 6, and a version for Nk above 6.

For Nk ¤ 6, we have:

KeyExpansion(byte Key[4*Nk] word W[Nb*(Nr+1)])

{

for(i = 0; i < Nk; i++)

W[i] = (Key[4*i],Key[4*i+1],Key[4*i+2],Key[4*i+3]);

for(i = Nk; i < Nb * (Nr + 1); i++)

{

temp = W[i - 1];

if (i % Nk == 0)

temp = SubByte(RotByte(temp)) ^ Rcon[i / Nk];

W[i] = W[i - Nk] ^ temp;

}

}

In this description, SubByte(W) is a function that returns a 4-byte word in which each byte is

the result of applying the Rijndael S-box to the byte at the corresponding position in the input

word. The function RotByte(W) returns a word in which the bytes are a cyclic permutation of

those in its input such that the input word (a,b,c,d) produces the output word (b,c,d,a).

It can be seen that the first Nk words are filled with the Cipher Key. Every following word W[i]

is equal to the EXOR of the previous word W[i-1] and the word Nk positions earlier W[i-Nk].

For words in positions that are a multiple of Nk, a transformation is applied to W[i-1] prior to

the EXOR and a round constant is EXORed. This transformation consists of a cyclic shift of

the bytes in a word (RotByte), followed by the application of a table lookup to all four bytes

of the word (SubByte).

/

99/90/30 :etaD ,2 noisrev tnemucoD 54 41 :egaP

VURKWX$ ODVRSRU3 6($

UHKSL& NFRO% OHDGQML5 HK7

UHKSL& NFRO% OHDGQML5 HK7

UHKSL& NFRO% OHDGQML5 HK7

UHKSL& NFRO% OHDGQML5 HK7

QHPHD' QDR-

QHPML5 WQHFQL9

For Nk > 6, we have:

KeyExpansion(byte Key[4*Nk] word W[Nb*(Nr+1)])

{

for(i = 0; i < Nk; i++)

W[i] = (key[4*i],key[4*i+1],key[4*i+2],key[4*i+3]);

for(i = Nk; i < Nb * (Nr + 1); i++)

{

temp = W[i - 1];

if (i % Nk == 0)

temp = SubByte(RotByte(temp)) ^ Rcon[i / Nk];

else if (i % Nk == 4)

temp = SubByte(temp);

W[i] = W[i - Nk] ^ temp;

}

}

The difference with the scheme for Nk ¤ 6 is that for i-4 a multiple of Nk, SubByte is applied

to W[i-1] prior to the EXOR.

The round constants are independent of Nk and defined by:

Rcon[i] = (RC[i],˜00™,˜00™,˜00™)

( i ’ 1)

8

with RC[I] representing an element in GF(2 ) with a value of x so that:

RC[1] = 1 (i.e. ˜01™)

RC[i] = x (i.e. ˜02™) •(RC[i-1]) = x(i-1)

4.3.2 Round Key selection

Round key i is given by the Round Key buffer words W[Nb*i] to W[Nb*(i+1)]. This is

illustrated in Figure 6.

W0 W1 W2 W3 W4 W5 W6 W7 W8 W 9 W 10 W 11 W 12 W 13 W 14 ...

...

Round key 0 Round key 1

Figure 6: Key expansion and Round Key selection for Nb = 6 and Nk = 4.

Note: The key schedule can be implemented without explicit use of the array W[Nb*(Nr+1)].

For implementations where RAM is scarce, the Round Keys can be computed on-the-fly using

a buffer of Nk words with almost no computational overhead.

/

99/90/30 :etaD ,2 noisrev tnemucoD 54 51 :egaP

VURKWX$ ODVRSRU3 6($

UHKSL& NFRO% OHDGQML5 HK7

UHKSL& NFRO% OHDGQML5 HK7

UHKSL& NFRO% OHDGQML5 HK7

UHKSL& NFRO% OHDGQML5 HK7

QHPHD' QDR-

QHPML5 WQHFQL9

4.4 The cipher

The cipher Rijndael consists of

• an initial Round Key addition;

• Nr-1 Rounds;

• a final round.

In pseudo C code, this gives:

Rijndael(State,CipherKey)

{

KeyExpansion(CipherKey,ExpandedKey) ;

AddRoundKey(State,ExpandedKey);

For( i=1 ; i<Nr ; i++ ) Round(State,ExpandedKey + Nb*i) ;

FinalRound(State,ExpandedKey + Nb*Nr);

}

The key expansion can be done on beforehand and Rijndael can be specified in terms of the

Expanded Key.

Rijndael(State,ExpandedKey)

{

AddRoundKey(State,ExpandedKey);

For( i=1 ; i<Nr ; i++ ) Round(State,ExpandedKey + Nb*i) ;

FinalRound(State,ExpandedKey + Nb*Nr);

}

Note: the Expanded Key shall always be derived from the Cipher Key and never be specified

directly. There are however no restrictions on the selection of the Cipher Key itself.

5. Implementation aspects

The Rijndael cipher is suited to be implemented efficiently on a wide range of processors and

in dedicated hardware. We will concentrate on 8-bit processors, typical for current Smart Cards

and on 32-bit processors, typical for PCs.

5.1 8-bit processor

On an 8-bit processor, Rijndael can be programmed by simply implementing the different

component transformations. This is straightforward for RowShift and for the Round Key

addition. The implementation of ByteSub requires a table of 256 bytes.

The Round Key addition, ByteSub and RowShift can be efficiently combined and executed

serially per State byte. Indexing overhead is minimised by explicitly coding the operation for

every State byte.

8

The transformation MixColumn requires matrix multiplication in the field GF(2 ). This can be

implemented in an efficient way. We illustrate it for one column:

Tmp = a[0] ^ a[1] ^ a[2] ^ a[3] ; /* a is a byte array */

Tm = a[0] ^ a[1] ; Tm = xtime(Tm); a[0] ^= Tm ^ Tmp ;

Tm = a[1] ^ a[2] ; Tm = xtime(Tm); a[1] ^= Tm ^ Tmp ;

Tm = a[2] ^ a[3] ; Tm = xtime(Tm); a[2] ^= Tm ^ Tmp ;

Tm = a[3] ^ a[0] ; Tm = xtime(Tm); a[3] ^= Tm ^ Tmp ;

/

99/90/30 :etaD ,2 noisrev tnemucoD 54 61 :egaP

VURKWX$ lasoporP SEA

rehpiC kcolB leadnjiR ehT

rehpiC kcolB leadnjiR ehT

rehpiC kcolB leadnjiR ehT

rehpiC kcolB leadnjiR ehT

QHPHD' QDR-

QHPML5 WQHFQL9

This description is for clarity. In practice, coding is of course done in assembly language. To

prevent timing attacks, attention must be paid that xtime is implemented to take a fixed

number of cycles, independent of the value of its argument. In practice this can be achieved by

using a dedicated table-lookup.

Obviously, implementing the key expansion in a single shot operation is likely to occupy too

much RAM in a Smart Card. Moreover, in most applications, such as debit cards or electronic

purses, the amount of data to be enciphered, deciphered or that is subject to a MAC is

typically only a few blocks per session. Hence, not much performance can be gained by

expanding the key only once for multiple applications of the block cipher.

The key expansion can be implemented in a cyclic buffer of 4*max(Nb,Nk) bytes. The

Round Key is updated in between Rounds. All operations in this key update can be

implemented efficiently on byte level. If the Cipher Key length and the blocks length are equal

or differ by a factor 2, the implementation is straightforward. If this is not the case, an

additional buffer pointer is required.

5.2 32-bit processor

5.2.1 The Round Transformation

The different steps of the round transformation can be combined in a single set of table

lookups, allowing for very fast implementations on processors with word length 32 or above. In

this section, it is explained how this can be done.

We express one column of the round output e in terms of bytes of the round input a. In this

section, ai,j denotes the byte of a in row i and column j, aj denotes the column j of State a. For

the key addition and the MixColumn transformation, we have

® e 0, j ® d 0, j ® k 0, j ®d 0, j ®02 03 01 01 ®c0 , j

e d k d

02 03 01 c1, j

1, j = 01

1, j 1, j 1, j

= • and .

e2 , j d 2 , j k 2 , j d 2 , j 01 01 02 03 c2 , j

d 3, j » 03

° 01 01 02» °c3, j »

e3, j » °d 3, j » ° k 3, j »

° °

For the ShiftRow and the ByteSub transformations, we have:

®c0, j ® b0, j

c b

[]

1, j

=

1, j ’ C1

and bi , j = S ai , j .

c2 , j b2 , j ’ C 2

° c3, j » °b3, j ’ C 3 »

In this expression the column indices must be taken modulo Nb. By substitution, the above

expressions can be combined into:

[ ] ®k

®

®e0, j ®02

03 01 01 S a0, j

[ ] • k

0, j

e

02 03 01 S a1, j ’ C1

1, j = 01 1, j

[ ] k

.

e2, j 01

01 02 03 S a2 , j ’ C 2 2, j

01 01 02» S a

[ ]» °k

° e3, j » ° 03 3, j »

3, j ’ C 3

°

/

HWD' QRLVUHY WQHPXFR' HJD3

VURKWX$ lasoporP SEA

rehpiC kcolB leadnjiR ehT

rehpiC kcolB leadnjiR ehT

rehpiC kcolB leadnjiR ehT

rehpiC kcolB leadnjiR ehT

QHPHD' QDR-

QHPML5 WQHFQL9

The matrix multiplication can be expressed as a linear a combination of vectors:

® e 0, j ® 01 ® k 0, j

®02 ® 03 ® 01

e 01 k

01 02 03

[] [ ] [ ] [ ]

1, j = S a • 1, j .

•S a •S a •S a

e2 , j 03 k 2 , j

01 01 02

1, j ’ C1 2, j ’C2 3, j ’ C 3

0, j

° 03» ° 01» ° 01» °02» ° k 3, j »

° e3, j »

The multiplication factors S[ai,j] of the four vectors are obtained by performing a table lookup

on input bytes ai,j in the S-box table S[256].

We define tables T0 to T3 :

®S[a ] • 02 ®S[a ] • 03 ® S[a ] ® S[ a ]

S[a ] S[ a ] • 02 S[ a ] • 03 S[ a ]

T0 [ a ] = T1[a ] = T3 [a ] =

T2 [a ] = .

S[a ] S[a ] S[a ] • 02 S[a ] • 03

S[ a ] • 03» S[a ] » S[a ] » S[a ] • 02 »

° ° °

°

These are 4 tables with 256 4-byte word entries and make up for 4KByte of total space. Using

these tables, the round transformation can be expressed as:

[][ ][ ][ ]

e j = T0 a0, j • T1 a1, j ’ C1 • T2 a2 , j ’C 2 • T3 a3, j ’ C 3 • k j .

Hence, a table-lookup implementation with 4 Kbytes of tables takes only 4 table lookups and 4

EXORs per column per round.

It can be seen that Ti[a] = RotByte(Ti-1[a]). At the cost of 3 additional rotations per round per

column, the table-lookup implementation can be realised with only one table, i.e., with a total

table size of 1KByte. We have

[] [ ] [ ] [ ]

e j = k j • T0 b0 , j • Rotbyte( T0 b1 , j ’ C 1 • Rotbyte( T0 b2 , j ’ C 2 • R otbyte( T0 b3 , j ’ C 3 )))

The code-size (relevant in applets) can be kept small by including code to generate the tables

instead of the tables themselves.

In the final round, there is no MixColumn operation. This boils down to the fact that the S table

must be used instead of the T tables. The need for additional tables can be suppressed by

extracting the S table from the T tables by masking while executing the final round.

Most operations in the key expansion can be implemented by 32-bit word EXORs. The

additional transformations are the application of the S-box and a cyclic shift over 8-bits. This

can be implemented very efficiently.

5.2.2 Parallelism

It can be seen that there is considerable parallelism in the round transformation. All four

component transformations of the round act in a parallel way on bytes, rows or columns of the

State.

In the table-lookup implementation, all table lookups can in principle be done in parallel. The

EXORs can be done in parallel for the most part also.

/

99/90/30 :etaD ,2 noisrev tnemucoD 54 81 :egaP

VURKWX$ ODVRSRU3 6($

UHKSL& NFRO% OHDGQML5 HK7

UHKSL& NFRO% OHDGQML5 HK7

UHKSL& NFRO% OHDGQML5 HK7

UHKSL& NFRO% OHDGQML5 HK7

QHPHD' QDR-

QHPML5 WQHFQL9

The key expansion is clearly of a more sequential nature: the value of W[i-1] is needed for the

computation of W[i]. However, in most applications where speed is critical, the KeyExpansion

has to be done only once for a large number of cipher executions. In applications where the

Cipher Key changes often (in extremis once per application of the Block Cipher), the key

expansion and the cipher Rounds can be done in parallel..

5.2.3 Hardware suitability

The cipher is suited to be implemented in dedicated hardware. There are several trade-offs

between area and speed possible. Because the implementation in software on general-

purpose processors is already very fast, the need for hardware implementations will very

probably be limited to two specific cases:

• Extremely high speed chip with no area restrictions: the T tables can be hardwired

and the EXORs can be conducted in parallel.

• Compact co-processor on a Smart Card to speed up Rijndael execution: for this

platform typically the S-box and the xtime (or the complete MixColumn) operation

can be hardwired.

5.3 The inverse cipher

In the table-lookup implementation it is essential that the only non-linear step (ByteSub) is the

first transformation in a round and that the rows are shifted before MixColumn is applied. In the

Inverse of a round, the order of the transformations in the round is reversed, and consequently

the non-linear step will end up being the last step of the inverse round and the rows are shifted

after the application of (the inverse of) MixColumn. The inverse of a round can therefore not be

implemented with the table lookups described above.

This implementation aspect has been anticipated in the design. The structure of Rijndael is

such that the sequence of transformations of its inverse is equal to that of the cipher itself, with

the transformations replaced by their inverses and a change in the key schedule. This is

shown in the following subsections.

Note: this identity in structure differs from the identity of components and structure in IDEA

[LaMaMu91].

5.3.1 Inverse of a two-round Rijndael variant

The inverse of a round is given by:

InvRound(State,RoundKey)

{

AddRoundKey(State,RoundKey);

InvMixColumn(State);

InvShiftRow(State);

InvByteSub(State);

}

The inverse of the final round is given by:

InvFinalRound(State,RoundKey)

{

AddRoundKey(State,RoundKey);

InvShiftRow(State);

InvByteSub(State);

}

/

99/90/30 :etaD ,2 noisrev tnemucoD 54 91 :egaP

VURKWX$ ODVRSRU3 6($

UHKSL& NFRO% OHDGQML5 HK7

UHKSL& NFRO% OHDGQML5 HK7

UHKSL& NFRO% OHDGQML5 HK7

UHKSL& NFRO% OHDGQML5 HK7

QHPHD' QDR-

QHPML5 WQHFQL9

The inverse of a two-round variant of Rijndael consists of the inverse of the final round

followed by the inverse of a round, followed by a Round Key Addition. We have:

AddRoundKey(State,ExpandedKey+2*Nb);

InvShiftRow(State);

InvByteSub(State);

AddRoundKey(State,ExpandedKey+Nb);

InvMixColumn(State);

InvShiftRow(State);

InvByteSub(State);

AddRoundKey(State,ExpandedKey);

5.3.2 Algebraic properties

In deriving the equivalent structure of the inverse cipher, we make use of two properties of the

component transformations.

First, the order of ShiftRow and ByteSub is indifferent. ShiftRow simply transposes the bytes

and has no effect on the byte values. ByteSub works on individual bytes, independent of their

position.

Second, the sequence

AddRoundKey(State,RoundKey);

InvMixColumn(State);

can be replaced by:

InvMixColumn(State);

AddRoundKey(State,InvRoundKey);

with InvRoundKey obtained by applying InvMixColumn to the corresponding RoundKey. This is

based on the fact that for a linear transformation A, we have A(x+k)= A(x )+A(k).

5.3.3 The equivalent inverse cipher structure

Using the properties described above, the inverse of the two-round Rijndael variant can be

transformed into:

AddRoundKey(State,ExpandedKey+2*Nb);

InvByteSub(State);

InvShiftRow(State);

InvMixColumn(State);

AddRoundKey(State,I_ExpandedKey+Nb);

InvByteSub(State);

InvShiftRow(State);

AddRoundKey(State,ExpandedKey);

It can be seen that we have again an initial Round Key addition, a round and a final round. The

Round and the final round have the same structure as those of the cipher itself. This can be

generalised to any number of rounds.

/

HWD' QRLVUHY WQHPXFR' HJD3

:srohtuA ODVRSRU3 6($

UHKSL& NFRO% OHDGQML5 HK7

UHKSL& NFRO% OHDGQML5 HK7

UHKSL& NFRO% OHDGQML5 HK7

UHKSL& NFRO% OHDGQML5 HK7

nemeaD naoJ

nemjiR tnecniV

We define a round and the final round of the inverse cipher as follows:

I_Round(State,I_RoundKey)

{

InvByteSub(State);

InvShiftRow(State);

InvMixColumn(State);

AddRoundKey(State,I_RoundKey);

}

I_FinalRound(State,I_RoundKey)

{

InvByteSub(State);

InvShiftRow(State);

AddRoundKey(State,RoundKey0);

}

The Inverse of the Rijndael Cipher can now be expressed as follows:

I_Rijndael(State,CipherKey)

{

I_KeyExpansion(CipherKey,I_ExpandedKey) ;

AddRoundKey(State,I_ExpandedKey+ Nb*Nr);

For( i=Nr-1 ; i>0 ; i-- ) Round(State,I_ExpandedKey+ Nb*i) ;

FinalRound(State,I_ExpandedKey);

}

The key expansion for the Inverse Cipher is defined as follows:

1. Apply the Key Expansion.

2. Apply InvMixColumn to all Round Keys except the first and the last one.

In Pseudo C code, this gives:

I_KeyExpansion(CipherKey,I_ExpandedKey)

{

KeyExpansion(CipherKey,I_ExpandedKey);

for( i=1 ; i < Nr ; i++ )

InvMixColumn(I_ExpandedKey + Nb*i) ;

}

5.3.4 Implementations of the inverse cipher

The choice of the MixColumn polynomial and the key expansion was partly based on cipher

performance arguments. Since the inverse cipher is similar in structure, but uses a MixColumn

transformation with another polynomial and (in some cases) a modified key schedule, a

performance degradation is observed on 8-bit processors.

This asymmetry is due to the fact that the performance of the inverse cipher is considered to

be less important than that of the cipher. In many applications of a block cipher, the inverse

cipher operation is not used. This is the case for the calculation of MACs, but also when the

cipher is used in CFB-mode or OFB-mode.

/

HWD' QRLVUHY WQHPXFR' HJD3

VURKWX$ ODVRSRU3 6($

UHKSL& NFRO% OHDGQML5 HK7

UHKSL& NFRO% OHDGQML5 HK7

UHKSL& NFRO% OHDGQML5 HK7

UHKSL& NFRO% OHDGQML5 HK7

QHPHD' QDR-

QHPML5 WQHFQL9

5.3.4.1 8-bit processors

As explained in Section 4.1, the operation MixColumn can be implemented quite efficiently on

8-bit processors. This is because the coefficients of MixColumn are limited to ˜01™, ˜02™ and ˜03™

and because of the particular arrangement in the polynomial. Multiplication with these

coefficients can be done very efficiently by means of the procedure xtime(). The coefficients

of InvMixColumn are ˜09™, ™0E', ™0B' and ™0D'. In our 8-bit implementation, these multiplications

take significantly more time. A considerable speed-up can be obtained by using table lookups

at the cost of additional tables.

The key expansion operation that generates W is defined in such a way that we can also start

with the last Nk words of Round Key information and roll back to the original Cipher Key. So,

calculation ™on-the-fly' of the Round Keys, starting from an “Inverse Cipher Key”, is still

possible.

5.3.4.2 32-bit processors

The Round of the inverse cipher can be implemented with table lookups in exactly the same

way as the round of the cipher and there is no performance degradation with respect to the

cipher. The look-up tables for the inverse are of course different.

The key expansion for the inverse cipher is slower, because after the key expansion all but two

of the Round Keys are subject to InvMixColumn (cf. Section 5.3.3).

5.3.4.3 Hardware suitability

Because the cipher and its inverse use different transformations, a circuit that implements

Rijndael does not automatically support the computation of the inverse of Rijndael. Still, in a

circuit implementing both Rijndael and its inverse, parts of the circuit can be used for both

functions.

This is for instance the case for the non-linear layer. The S-box is constructed from two

mappings:

S(x ) = f(g(x )),

where g(x ) is the mapping:

x ’ x’ in GF(2 )

1 8

and f(x ) is the affine mapping.

“1 “1 “1 “1

The mapping g(x ) is self-inverse and hence S (x ) = g (f (x )) = g(f (x )). Therefore when we

“1 “1 “1

want both S and S , we need to implement only g, f and f . Since both f and f are very

simple bit-level functions, the extra hardware can be reduced significantly compared to having

two full S-boxes.

Similar arguments apply to the re-use of the xtime transformation in the diffusion layer.

/

HWD' QRLVUHY WQHPXFR' HJD3

VURKWX$ ODVRSRU3 6($

UHKSL& NFRO% OHDGQML5 HK7

UHKSL& NFRO% OHDGQML5 HK7

UHKSL& NFRO% OHDGQML5 HK7

UHKSL& NFRO% OHDGQML5 HK7

QHPHD' QDR-

QHPML5 WQHFQL9

6. Performance figures

6.1 8-bit processors

Rijndael has been implemented in assembly language for two types of microprocessors that

are representative for Smart Cards in use today.

In these implementations the Round Keys are computed in between the rounds of the cipher

(just-in-time calculation of the Round Keys) and therefore the key schedule is repeated for

every cipher execution. This means that there is no extra time required for key set-up or a key

change. There is also no time required for algorithm set-up. We have only implemented the

forward operation of the cipher. Implementation efforts by other people have indicated that the

inverse cipher turns out to be about 30 % slower. This is due to reasons explained in the

section on implementation.

6.1.1 Intel 8051

Rijndael has been implemented on the Intel 8051 microprocessor, using 8051 Development

tools of Keil Elektronik: uVision IDE for Windows and dScope Debugger/Simulator for

Windows.

Execution time for several code sizes is given in Table 3 (1 cycle = 12 oscillator periods).

Key/Block Length Number of Cycles Code length

(128,128) a) 4065 cycles 768 bytes

(128,128) b) 3744 cycles 826 bytes

(128,128) c) 3168 cycles 1016 bytes

(192,128) 4512 cycles 1125 bytes

(256,128) 5221 cycles 1041 bytes

Table 3: Execution time and code size for Rijndael in Intel 8051 assembler.

6.1.2 Motorola 68HC08

Rijndael has been implemented on the Motorola 68HC08 microprocessor using the 68HC08

development tools by P&E Microcomputer Systems, Woburn, MA USA, the IASM08 68HC08

Integrated Assembler and SIML8 68HC08 simulator. Execution time, code size and required

RAM for a number of implementations are given in Table 4 (1 cycle = 1oscillator period). No

optimisation of code length has been attempted for this processor.

/

HWD' QRLVUHY WQHPXFR' HJD3

VURKWX$ ODVRSRU3 6($

UHKSL& NFRO% OHDGQML5 HK7

UHKSL& NFRO% OHDGQML5 HK7

UHKSL& NFRO% OHDGQML5 HK7

UHKSL& NFRO% OHDGQML5 HK7

QHPHD' QDR-

QHPML5 WQHFQL9

Key/Block Length Number of Cycles Required RAM Code length

(128,128) a) 8390 cycles 36 bytes 919 bytes

(192,128) 10780 cycles 44 bytes 1170 bytes

(256,128) 12490 cycles 52 bytes 1135 bytes

Table 4: Execution time and code size for Rijndael in Motorola 68HC08 Assembler.

6.2 32-bit processors

6.2.1 Optimised ANSI C

We have no access to a Pentium Pro computer. Speed estimates for this platform were

originally generated by compiling the code with EGCS (release 1.0.2) and executing it on a

200 MHz Pentium, running Linux. However, since this report was first published further

performance figures have become available and those published by Brian Gladman are

reported below.

The AES CD figures are for ANSI C using the NIST API. The figures reported by Brian

Gladman are for the Pentium Pro and Pentium II processor families using a more efficient

interface. These results were obtained with the Microsoft Visual C++ (version 6) compiler that

provides fast intrinsic rotate instructions. The ability to use these instructions within C code

provides substantial performance gains without incurring significant portability problems since

many C compilers now offer equivalent facilities. The speed figures given in the tables have

been scaled to be those that would apply on the 200MHz Pentium Pro reference platform.

Algorithm set-up takes no time. Key set-up and key change take exactly the same time: the

time to generate the Expanded Key from the Cipher Key. The key set-up for the inverse cipher

takes more time than the key set-up for the cipher itself (cf. Section 5.3.3).

Table 5 lists the number of cycles needed for the key expansion.

# cycles AES CD (ANSI C) Brian Gladman (Visual C++)

-1 -1

(key,block) Rijndael Rijndael Rijndael Rijndael

length

(128,128) 2100 2900 305 1389

(192,128) 2600 3600 277 1595

(256,128) 2800 3800 374 1960

Table 5: Number of cycles for the key expansion

The cipher and its inverse take the same time. The difference in performance that is discussed

in the section on implementation, is only caused by the difference in the key set-up. Table 6

gives the figures for the raw encryption, when implemented in C, without counting the

overhead caused by the AES API.

/

99/90/30 :etaD ,2 noisrev tnemucoD 54 42 :egaP

VURKWX$ ODVRSRU3 6($

UHKSL& NFRO% OHDGQML5 HK7

UHKSL& NFRO% OHDGQML5 HK7

UHKSL& NFRO% OHDGQML5 HK7

UHKSL& NFRO% OHDGQML5 HK7

QHPHD' QDR-

QHPML5 WQHFQL9

(key,block) AES CD (ANSI C) Brian Gladman (Visual C++)

length

speed (Mbits/Sec) # cycles/block speed (Mbits/Sec) # cycles/block

(128,128) 27.0 950 70.5 363

(192,128) 22.8 1125 59.3 432

(256,128) 19.8 1295 51.2 500

Table 6: Cipher (and inverse) performance

6.2.2 Java

We gratefully accepted the generous offer from Cryptix to produce the Java implementation.

Cryptix provides however no performance figures. Our estimates are based on the execution

time of the KAT and MCT code on a 200 MHz Pentium, running Linux. The JDK1.1.1 Java

compiler was used. The performance figures of the Java implementation are given in Table 7.

We cannot provide estimates for the key set-up or algorithm set-up time.

Key/Block length Speed # cycles for Rijndael

(128,128) 1100 Kbit/s 23.0 Kcycles

(192,128) 930 Kbit/s 27.6 Kcycles

(256,128) 790 Kbit/s 32.3 Kcycles

Table 7: Performance figures for the cipher execution (Java)

7. Motivation for design choices

In the following subsections, we will motivate the choice of the specific transformations and

constants. We believe that the cipher structure does not offer enough degrees of freedom to

hide a trap door.

7.1 The reduction polynomial m(x )

8

The polynomial m(x ) (˜11B™) for the multiplication in GF(2 ) is the first one of the list of

irreducible polynomials of degree 8, given in [LiNi86, p. 378].

/

HWD' QRLVUHY WQHPXFR' HJD3

VURKWX$ ODVRSRU3 6($

UHKSL& NFRO% OHDGQML5 HK7

UHKSL& NFRO% OHDGQML5 HK7

UHKSL& NFRO% OHDGQML5 HK7

UHKSL& NFRO% OHDGQML5 HK7

QHPHD' QDR-

QHPML5 WQHFQL9

7.2 The ByteSub S-box

The design criteria for the S-box are inspired by differential and linear cryptanalysis on the one

hand and attacks using algebraic manipulations, such as interpolation attacks, on the other:

1. Invertibility;

2. Minimisation of the largest non-trivial correlation between linear combinations of

input bits and linear combination of output bits;

3. Minimisation of the largest non-trivial value in the EXOR table;

8