Permutation Tables

Jean-S´bastien Coron

e

University of Luxembourg

jean-sebastien.coron@uni.lu

Abstract. We propose and analyse a new countermeasure against Dif-

ferential Power Analysis (DPA) for the AES encryption algorithm, based

on permutation tables. As opposed to existing AES countermeasures, it

does not use random masking. We prove that our new countermeasure is

resistant against ¬rst-order DPA; we also show that it is quite e¬cient

in practice.

1 Introduction

The AES [4] encryption algorithm is with DES the most widely used encryption

algorithm. However, it is easy to see that without modi¬cation, AES is vulnerable

to Di¬erential Power Analysis as introduced by Kocher et al. in [6, 7]. A DPA

attack consists in extracting information about the secret key of a cryptographic

algorithm, by studying the power consumption of the electronic device during the

execution of the algorithm. The attack was ¬rst described on the DES encryption

scheme, but it was soon understood that the attack could easily be extended

to other symmetrical cryptosystems such as the AES, and also to public-key

cryptosystems such as RSA and Elliptic-Curves Cryptosystems [3].

A common technique to protect secret-key algorithms against side-channel

attacks consists in masking all data with a random integer, as suggested in

[2]. The masked data and the random integer are then processed separately and

eventually recombined at the end of the algorithm. An attacker trying to analyse

power consumption at a single point will obtain only random values; therefore,

the implementation will be secure against ¬rst-order Di¬erential Power Analysis

(DPA). In order to obtain valuable information about the key, the attacker must

correlate the power consumption at multiple points during the execution of the

algorithm; this is called a High Order Di¬erential Power Analysis (HO-DPA);

such attack usually requires a much larger number of power consumption curves,

which makes it infeasible in practice if the number of executions is limited (for

example, by using a counter). Many AES countermeasures have been described

based on random masking [1, 5, 9, 10, 12].

In this article we propose a di¬erent countermeasure against DPA for AES,

based on permutation tables. The main di¬erence with existing AES counter-

measures is that it avoids random masking; in practice this can be an advantage

because random masking is subject to numerous patents [7]. We prove that our

countermeasure is resistant against ¬rst-order DPA (like the random masking

countermeasure) and we show that its e¬ciency is comparable to that of the

random masking countermeasure.

It works as follows: at initialisation time a randomised permutation table

is generated in RAM; this permutation table is then applied to the message

and to the key; then all intermediate variables that appear during the course of

the algorithm remain in permuted form; eventually the inverse permutation is

applied to obtain the resulting ciphertext.

We also describe a technique to reduce the RAM consumption of our per-

mutation table countermeasure, at the cost of increasing the running time. Our

technique is based on a compression scheme proposed in [11] for the classical

random masking countermeasure; here we adapt this scheme to permutation ta-

bles. We show that this variant is also secure against ¬rst-order DPA. Finally, we

also provide the result of implementations that show that our countermeasure

is reasonably e¬cient in practice, as it is only roughly four times slower than

the classical masking countermeasure, for a comparable RAM requirement (see

Table 1 in Section 7 for a detailed comparison).

2 The AES encryption algorithm

In this section we recall the main operations involved in the AES algorithm. We

refer to [4] for a full description. AES operates on a 4 — 4 array of bytes si,j ,

termed the state. For encryption, each AES round (except the last) consists of

four stages:

1. AddRoundKey: each byte of the state is xored with the round key ki,j ,

derived from the key schedule:

si,j ← si,j • ki,j

2. SubBytes: each byte of the state is updated using an 8-bit S-box:

si,j ← S(si,j )

3. ShiftRows: the bytes of the state are cyclically shifted in each row by a

certain o¬set; the ¬rst row is left unchanged.

4. MixColumns: the bytes of the state are modi¬ed column by column as

follows:

s′ ← (02 · s0,c ) • (03 · s1,c ) • s2,c • s3,c

0,c

′

s1,c ← s0,c • (02 · s1,c ) • (03 · s2,c ) • s3,c

s′ ← s0,c • s1,c • (02 · s2,c ) • (03 · s3,c )

2,c

s′ ← (03 · s0,c ) • s1,c • s2,c • (02 · s3,c )

3,c

The pseudo-code for AES encryption with a 128-bit key is recalled in Figure

1 in Appendix. The word array w contains the round keys that are generated by

the key-schedule algorithm. We refer to [4] for a more detailed description.

For decryption, each round (except the last) consists in the following opera-

tions:

1. InvShiftRows: is the inverse of the ShiftRows operation. The bytes of the

state are cyclically shifted in each row by a certain o¬set; the ¬rst row is left

unchanged.

2. InvSubBytes: is the inverse of the SubBytes operation. The inverse S-box

S ’1 is applied on each byte of the state.

3. AddRoundKey: the operation is equal to its own inverse.

4. InvMixColumns: is the inverse of the MixColumns operation. The bytes of

the state are modi¬ed column by column as follows:

s′ ← (0e · s0,c ) • (0b · s1,c ) • (0d · s2,c ) • (09 · s3,c )

0,c

s′ ← (09 · s0,c ) • (0e · s1,c ) • (0b · s2,c ) • (0d · s3,c )

1,c

s′ ← (0d · s0,c ) • (09 · s1,c ) • (0e · s2,c ) • (0b · s3,c )

2,c

s′ ← (0b · s0,c ) • (0d · s1,c ) • (09 · s2,c ) • (0e · s3,c )

3,c

The pseudo-code for the inverse cipher is recalled in Figure 2 in Appendix.

Finally, the key-schedule is based on the following operations:

1. SubWord: takes a four-byte input word and applies the S-box S to each of

the four bytes.

2. RotWord: takes a word [a0 , a1 , a2 , a3 ] as input and performs a cyclic per-

mutation to return [a1 , a2 , a3 , a0 ].

3. Xor with Rcon: takes as input a 32-bits word and xor it with the round

constant word array Rcon[i] = [(02)i’1 , 00, 00, 00], for round 1 ¤ i ¤ 10.

We refer to [4] for a full description of the key-schedule.

3 The Permutation Table Countermeasure

In this section we describe our basic countermeasure with permutation tables.

A variant with a time-memory trade-o¬ is described in Section 6.

Our countermeasure consists in using a randomised representation of the

state variables, using two independent 4-bit permutation tables p1 and p2 that

are freshly generated before each new execution of the algorithm. More precisely,

every intermediate byte x = xh xl , where xh is the high nibble and xl is the low

nibble, will be represented in the following form:

P (x) = p2 (xh ) p1 (xl )

This permuted representation is then used throughout the execution of AES.

Eventually the original representation is recovered at the end of the encryption

algorithm, by applying the inverse permutation.

3.1 Generation of Permutation Tables p1 and p2

The 4-bit permutation tables p1 and p2 are generated for each new execution of

the algorithm as follows. Let s0 be a ¬xed 4-bit permutation table; one can take

for example:

s0 = [14, 6, 0, 5, 9, 1, 4, 15, 8, 10, 12, 2, 3, 13, 11, 7]

One de¬nes a sequence of permutations si for i ≥ 0 as follows:

si+1 (x) = s0 (si (x) • ki )

where each ki ∈ {0, 1}4 is randomly generated. The 4-bit permutation p1 is then

de¬ned as p1 := sn for some n (in practice, one can take n = 4). One applies

the same procedure to generate the other table p2 (with independently generated

ki ™s). Every intermediate byte x = xh xl that appear in AES is then represented

as:

P (x) = p2 (xh ) p1 (xl )

Therefore, P is a 8-bit permutation; its storage requires 16 bytes of RAM. In

the following we explain how to use this permuted representation throughout the

AES operations, so that the intermediate data are never manipulated in clear.

3.2 AddRoundKey

Given P (x) and P (y) we explain how to compute P (x•y) without manipulating

x and y directly (since otherwise this would give a straightforward DPA attack).

We de¬ne the following two 8-bit to 4-bit xor-tables; for all x′ , y ′ ∈ {0, 1}4:

XT1 (x′ , y ′ ) := p1 (p’1 (x′ ) • p1 (y ′ ))

’1

4 1

XT2 (x′ , y ′ ) := p2 (p2 (x′ ) • p’1 (y ′ ))

’1

4 2

Those two tables require a total of 256 bytes in memory. Then given p1 (x) and

p1 (y) one can compute p1 (x • y) using:

XT1 (p1 (x), p1 (y)) = p1 (x • y)

4

for all x, y ∈ {0, 1}4. Similarly, we have:

XT2 (p2 (x), p2 (y)) = p2 (x • y)

4

Using those two tables we de¬ne the following function for all x′ , y ′ ∈ {0, 1}8,

where x′ = x′ x′ and y ′ = yh yl :

′ ′

h l

XT8 (x′ , y ′ ) = XT2 (x′ , yh ) XT1 (x′ , yl )

′ ′

4h 4l

Then given P (x) and P (y), one can compute P (x • y) as:

XT8 (P (x), P (y)) = P (x • y) (1)

The AddRoundKey operation can then be implemented as:

s′ ← XT8 (s′ , ki,j )

′

i,j i,j

where s′ = P (si,j ) and ki,j = P (ki,j ). It is therefore necessary to use the

′

i,j

permuted representation for the round keys. We further describe how this is

done by modifying the key-schedule operations (see sections 3.9, 3.10 and 3.11).

3.3 SubBytes

Let S be the AES SBOX. We de¬ne the following randomised permutation S ′ :

S ′ (x) = P (S(P ’1 (x)))

Given P (x), the permuted representation of S(x) is then computed as:

P (S(x)) = S ′ (P (x))

The SubBytes operation on the permuted state variables can then be computed

using the table S ′ as follows:

s′ ← S ′ (s′ )

i,j i,j

The randomised table S ′ (x) requires 256 bytes in RAM.

3.4 ShiftRows

No modi¬cation is required.

3.5 MixColumns

Using 03 · x = (02 · x) • x, the MixColumns operation can be written as follows

(¬rst line):

s′ ← (02 · s0,c ) • (02 · s1,c ) • s1,c • s2,c • s3,c

0,c

Therefore, we must be able to compute P (02·x) from P (x). For any x = xh xl ∈

{0, 1}8, from x = (0 xl ) • (xh 0) we have:

02 · x = (02 · (0 xl )) • (02 · (xh 0))

and using equation (1), we obtain:

P (02 · x) = XT8 P (02 · (0 xl )), P (02 · (xh 0) (2)

Therefore, for all x′ ∈ {0, 1}8 where x′ = x′ x′ , we de¬ne the following two

h l

tables (4-bit to 8-bit):

p’1 (x′ )

ML(x′ ) := P 02 · 0 1

l l

’1

MH(x′ ) := P 02 · p2 (x′ ) 0

h h

which gives for any xl , xh ∈ {0, 1}4:

ML(p1 (xl )) = P 02 · ( 0 xl )

(3)

MH(p2 (xh )) = P 02 · ( xh 0)

Using equations (2) and (3), given P (x) = p2 (xh ) p1 (xl ), we can therefore

compute P (02 · x) as follows:

P (02 · x) = XT8 (ML(p1 (xl )), MH(p2 (xh )))

For simplicity, given P (x) = x′ x′ , we denote:

h l

D2 (x′ x′ ) = XT8 ML(x′ ), MH(x′ )

h l l h

so that for any x ∈ {0, 1}8:

P (02 · x) = D2 (P (x)) (4)

The ¬rst line of the MixColumns operation with permuted data can therefore be

written as:

s′′ ← XT8 D2 (s′ ), XT8 D2 (s′ ), XT8 s′ , XT8 (s′ , s′ )

0,c 0,c 1,c 1,c 2,c 3,c

and the other three lines are implemented analogously. The storage of the two

ML and MH tables requires 32 bits of RAM.

3.6 InvShiftRows

The algorithm remains the same.

3.7 InvSubBytes

This is similar to the SubBytes algorithm: we de¬ne the following randomised

permutation S ′’1 :

S ′’1 (x) = P (S ’1 (P ’1 (x)))

Therefore, the InvSubBytes operation on the permuted state variables is com-

puted using table S ′’1 (x) as follows:

s′ ← S ′’1 (s′ )

i,j i,j

Note that one can generate the randomised table S ′’1 (x) from S(x) only, so

that it is not necessary to store S ’1 (x) in ROM, using the fact that:

S ′’1 = P —¦ S ’1 —¦ P ’1 = (P —¦ S —¦ P ’1 )’1

3.8 InvMixColumns

The ¬rst line of the InvMixColumns operation is as follows:

s′ ← (0e · s0,c ) • (0b · s1,c ) • (0d · s2,c ) • (09 · s3,c )

0,c

We have the following relations in GF(28 ):

0b = 09 • 02, 0d = 0c • 01, 0e = 0c • 02 (5)

Therefore only two tables for computing the multiplication by 09 and 0c are

required, from which multiplication by 0b, 0d and 0e can be computed without

additional tables. More precisely, for all x ∈ {0, 1}4, we build the following 4-bit

to 8-bit tables:

p’1 (x′ )

TL(x′ ) := P 09 · 0

l l

1

TH(x′ ) := P 09 · p’1 (x′ ) 0

2

h h

p’1 (x′ )

RL(x′ ) := P 0c · 0

l l

1

RH(x′ ) := P 0c · p’1 (x′ ) 0

2

h h

Storing those four tables requires 64 bytes in RAM. Then, as in section 3.5,

writing x′ = P (x) = x′ x′ = p2 (xh ) p1 (xl ), we obtain:

h l

P (09 · x) = XT8 TH(x′ ), TL(x′ )

h l

P (0c · x) = XT8 RH(x′ ), RL(x′ )

h l

As previously we denote:

D9 (x′ x′ ) = XT8 TH(x′ ), TL(x′ )

h l h l

Dc (x′ x′ ) = XT8 RH(x′ ), RL(x′ )

h l h l

Using equations (5), we also denote:

Db (x) = XT8 (D9 (x), D2 (x))

Dd (x) = XT8 (Dc (x), x)

De (x) = XT8 (Dc (x), D2 (x))

The ¬rst line of the InvMixColumns operation can then be rewritten as follows:

s′′ ← XT8 De (s′ ), XT8 Db (s′ ), XT8 Dd (s′ ), D9 (s′ )

0,c 0,c 1,c 2,c 3,c

and the other three lines are rewritten analogously.

3.9 SubWord

The SubWord operation on the modi¬ed state variables is implemented like the

SubByte operation.

3.10 RotWord

The RotWord operation remains unmodi¬ed.

3.11 Xor with Rcon

Let

R(i) = 02i’1

for all 1 ¤ i ¤ 10. We have R(0) = 01 and R(i) = 02 · R(i ’ 1) for all 1 ¤ i ¤ 10.

Therefore, letting R′ (i) := P (R(i)), we have:

R′ (i) = P (R(i)) = P (02 · R(i ’ 1)) = D2 (R(i ’ 1))

Therefore the Rcon constant can be computed using the function D2 (x) de¬ned

in section 3.5.

4 Security

In this section we show that our countermeasure in resistant against ¬rst-order

DPA. This is due to the following lemma:

Lemma 1. For a ¬xed key and input message, every intermediate byte that

is computed in the course of the randomised AES algorithm has the uniform

distribution in {0, 1}8 .

Proof. The proof follows directly from the fact that any intermediate AES data

x is represented as P (x), where P (xh xl ) = p2 (xh ) p1 (xl ) is the randomised

permutation. From the construction of p1 , p2 , this implies that P (x) is randomly

distributed in {0, 1}8. “

”

The previous lemma implies that an attacker who makes statistics about

power-consumption at a single point gets only random values; hence, the coun-

termeasure is resistant against ¬rst-order DPA (like the random masking coun-

termeasure). In order to obtain valuable information about the key, the attacker

must correlate the power consumption at multiple points during the execution of

the algorithm; this is called a High Order Di¬erential Power Analysis (HO-DPA);

such attack usually requires a much larger number of power consumption curves,

which makes it infeasible in practice if the number of executions is limited (for

example, by using a counter).

5 A Compression Scheme

A very nice compression scheme for SBOX tables has been proposed in [11]. This

compression scheme works for SBOXes with a random mask; we recall it in this

section.1 Then in Section 6 we show how to adapt it to our permutation table

countermeasure.

Let S(x) for x ∈ {0, 1}8 be a 8-bit to 8-bit SBOX. One de¬nes S1 (x) and

S2 (x) such that S(x) = S2 (x) S1 (x) for all x ∈ {0, 1}8, where S1 (x) and S2 (x)

are 4-bit values. Let r1 , r2 ∈ {0, 1}8 and s ∈ {0, 1}4 be random masks, and let

de¬ne the randomised table:

T (x) = S1 (x • r1 ) • S2 (x • r2 ) • s (6)

which is a 8-bit to 4-bit table. Let x′ = x • (r1 • r2 ) be a masked data; we have

from (6):

S1 (x) = S1 (x′ • r1 • r2 ) = T (x′ • r2 ) • S2 (x′ ) • s

S2 (x) = S2 (x′ • r1 • r2 ) = T (x′ • r1 ) • S1 (x′ ) • s

which gives:

S1 (x) • s = T (x′ • r2 ) • S2 (x′ ) (7)

S2 (x) • s = T (x′ • r1 ) • S1 (x′ ) (8)

Therefore given the masked data x′ one can obtain a masked S1 (x) and

a masked S2 (x), by using the randomised table T . The advantage is that the

size of T is only half the size of a classically randomised SBOX table; here the

storage of T requires only 128 bytes of RAM instead of 256 bytes for a classically

randomised AES SBOX. More precisely, the algorithm is as follows:

InitTable(r1 , r2 , s)

1. Write S(x) = S2 (x) S1 (x)

2. Generate randomised table T (x) = S1 (x•r1 )•S2 (x•r2 )•s for all x ∈ {0, 1}8

TableLookUp(x′ , r1 , r2 , s)

1. Generate a random t ∈ {0, 1}4

2. u ← T (x′ • r2 ) • S2 (x′ )

3. v ← T (x′ • r1 ) • S1 (x′ ) • t

4. Output y = v u ∈ {0, 1}8 and r′ = (s • t) s.

Here we add an additional masking step with t so that the values u and v

are not masked with the same nibble s; from equations (7) and (8), we obtain

that the value returned by TableLookUp is such that y = S(x) • r′ .

It is easy to see that this countermeasure is resistant against ¬rst-order DPA,

as all the variables that appear in the TableLookUp function are uniformly dis-

tributed. Note that the tables S1 and S2 are only stored in ROM; they don™t

need to be randomised because in the TableLookUp algorithm those tables are

accessed at point x′ which is already randomised.

1

In [11] the countermeasure is described in plain English which makes it di¬cult to

understand; here we attempt to provide a more clear description.

It is also easy to see that the countermeasure can be generalised to any SBOX

input and output size. Moreover, one can obtain a better compression factor by

splitting S(x) into more shares; for example, a 8-bit SBOX could be split into

8 tables (one for each output bit); then the resulting randomised T table would

be 8 times smaller, at the cost of increasing the running time for every table

look-up.

6 Time memory Trade-o¬s

In this section we describe three time-memory trade-o¬s. The goal is to re-

duce the RAM requirement of the permutation table countermeasure described

in Section 3, for implementation on low-cost devices, at the cost of increasing

the running time. The main time-memory tradeo¬ is based on the SBOX com-

pression scheme recalled in the previous section. The second idea consists in

using a single XOR table XT1 (x′ , y ′ ) instead of two XOR tables XT1 (x′ , y ′ ) and

4 4

XT2 (x′ , y ′ ) as in Section 3.2. The third idea consists in removing tables TL, TH,

4

RL and RH, by using a “Double-And-Add” approach. In Section 7 we describe

the results of practical implementations of these time-memory trade-o¬s.

6.1 Compressed SBOX

The compression scheme of [11] recalled in the previous section was used for

random masking; here we show how to adapt this compression scheme to our

permutation table countermeasure.

We de¬ne a new permutation P ′ (x) = p′ (xh ) p′ (xl ) where p′ and p′ are

2 1 1 2

4-bit permutations tables which are generated like p1 and p2 . As previously, we

write:

S(x) = S2 (x) S1 (x)

where S1 (x) and S2 (x) are 4-bit nibbles. We then de¬ne a randomised table:

T (x′ ) = p1 S1 P ’1 (x′ ) • p2 S2 P ’1 P ′’1 (x′ ) (9)

The randomised table T (x′ ) is a 8-bit to 4-bit table; therefore, its storage requires

128 bits in memory, instead of 256 bytes for the randomised table S ′ (x′ ) in

Section 3.3. Writing x′ = P (x), we obtain from equation (9):

p1 (S1 (x)) = T (P (x)) • p2 S2 P ’1 P ′’1 (P (x))

This shows that given P (x) we can compute p1 (S1 (x)) using randomised table

T and table S2 stored in ROM. Similarly, writing x′ = P ′ (P (x)), we obtain from

equation (9):

p2 (S2 (x)) = T (P ′ (P (x))) • p1 S1 P ’1 (P ′ (P (x)))

This shows that given P (x) we can compute p2 (S2 (x)) using randomised table

T and table S1 stored in ROM. Therefore, given P (x) we can compute:

P (S(x)) = p2 (S2 (x)) p1 (S1 (x))

using the following operations:

InitTable(P, P ′ ):

1. Write S(x) = S2 (x) S1 (x)

2. Generate table T (x′ ) = p1 S1 P ’1 (x′ ) • p2 S2 P ’1 P ′’1 (x′ ) for all

x′ ∈ {0, 1}8 .

TableLookUp(x′ , T, P, P ′ )

1. u ← T (x′ ) • p2 S2 P ’1 P ′’1 (x′ )

2. v ← T (P ′ (x′ )) • p1 S1 P ’1 (P ′ (x′ ))

3. Output v u

Given the tables P , P ′ , T and denoting F (x′ ) = TableLookUp(x′ , T, P, P ′ ),

the SubBytes operation on the permuted state variables is computed as follows:

s′ ← F (s′ )

i,j i,j

The table T requires 128 bytes in memory, and the additional permutations P ′

and P ′’1 require 32 bytes in memory, so in total 160 bytes are required (instead

of 256 bytes for the randomised table S ′ ). The InvSubBytes operation on the

permuted state variables with compressed inverse SBOX S ’1 is obtained by

replacing S by S ’1 in the previous equations.

6.2 Single XOR table

In Section 3.2 two 8-bit to 4-bit xor-tables are de¬ned. In this section, we show

that it is su¬cient to de¬ne only one 8-bit to 4-bit xor-table. As in Section 3.2,

we de¬ne:

XT1 (x′ , y ′ ) := p1 (p1 (x′ ) • p1 (y ′ ))

’1 ’1

4

We also de¬ne the 4-bit to 4-bit permutations:

p12 (x) = p1 (p’1 (x)) (10)

2

p2 (p’1 (x))

p21 (x) = (11)

1

and we store those two permutation tables in RAM. Then we can de¬ne the

function:

XT2 (x′ , y ′ ) := p21 (XT1 (p12 (x′ ), p12 (y ′ )))

4 4

From equations (10) and (11) we obtain that for all xh , yh ∈ {0, 1}4:

XT2 (p2 (xh ), p2 (yh )) = p21 (XT1 (p1 (xh ), p1 (yh )) = p21 (p1 (xh • yh )) = p2 (xh • yh )

4 4

Therefore, the XT2 function takes the same values as the XT2 table de¬ned in

4 4

Section 3.2. The advantage is that only 16 bytes of RAM are necessary to store

the permutation tables p12 and p21 instead of 128 bytes for the previous XT2 4

table.

Double and Add for InvMixColumns

6.3

The ¬rst line of the InvMixColumns operation is as follows:

s′ ← (0e · s0,c ) • (0b · s1,c ) • (0d · s2,c ) • (09 · s3,c )

0,c

In this section we show how to avoid the four tables TL, TH, RL and RH, using

a Double and Add method. More precisely, using the existing D2 function (see

equation (4)) we de¬ne the following functions:

D4 (x′ ) := D2 (D2 (x′ ))

D8 (x′ ) := D2 (D4 (x′ ))

D9 (x′ ) := XT8 (D8 (x′ ), x′ )

Db (x′ ) := XT8 (D9 (x′ ), D2 (x′ ))

Dc (x′ ) := XT8 (D8 (x′ ), D4 (x′ ))

Dd (x′ ) := XT8 (Dc (x′ ), x′ )

De (x′ ) := XT8 (Dc (x′ ), D2 (x′ ))

Therefore no additional table is required beyond the already de¬ned ML and

MH tables. The ¬rst line of the InvMixColumns operation can then be rewritten

as follows:

s′′ ← XT8 De (s′ ), XT8 Db (s′ ), XT8 Dd (s′ ), D9 (s′ )

0,c 0,c 1,c 2,c 3,c

and the other three lines are rewritten analogously.

6.4 Security

As for our basic permutation table countermeasure, all the intermediate variables

that appear in the time-memory tradeo¬ variants have the uniform distribution:

Lemma 2. For a ¬xed key and input message, every intermediate byte that is

computed in the course of the randomised AES algorithm with any of the three

time-memory trade-o¬s has the uniform distribution in {0, 1}8 .

Therefore, the three previous time-memory tradeo¬s are secure against ¬rst-

order DPA.

7 Implementation

We summarise in Table 1 the timings observed and RAM needed for each AES

operation. The timings are based on an implementation in C on a 2 GHz laptop

under Linux. The RAM requirements are based on Tables 2 and 3 in Appendix.

These timings show that the new countermeasure is reasonably e¬cient, as it is

only roughly four times slower than AES with masking countermeasure, for a

comparable amount of memory. We note that the time-memory tradeo¬s enable

to spare 272 bytes in RAM compared to our basic permutation table counter-

measure.

Countermeasure Timing (µs) RAM (bytes)

AES encryption without countermeasure 2.2 214

AES encryption with masking 4.0 474

AES encryption with basic permutation tables 11 778

AES encryption with permutation tables + trade-o¬s 27 570

AES decryption without countermeasure 4.0 214

AES decryption with masking 9.5 474

AES decryption with basic permutation tables 23 842

AES decryption with permutation tables + trade-o¬s 42 570

Table 1. Timings obtained using a C implementation of AES without countermeasure,

with masking countermeasure, and with proposed countermeasure: basic permutation

table, and with time-memory trade-o¬s, on a 2 GHz laptop under Linux

8 Conclusion

We have described a new countermeasure against DPA for AES, that does not

use random masking. Our countermeasure is provably secure against ¬rst order

DPA, and reasonably e¬cient compared to the classical random masking coun-

termeasure. We have also described three time-memory tradeo¬s to reduce the

RAM requirement; this can be useful for smart-card implementations.

Acknowledgments: Thanks to Christophe Clavier for suggesting the “Double

and Add” approach in Section 6.3.

References

1. M. L. Akkar and Christophe Giraud, An Implementation of DES and AES, Secure

against Some Attacks, proceedings of CHES 2001. Springer-Verlag.

2. S. Chari, C. S. Jutla, J. R. Rao and P. Rohatgi, Towards Sound Approaches to

Counteract Power-Analysis Attacks, Proceedings of Crypto™99.

3. J.S. Coron, Resistance against Di¬erential Power Analysis for Elliptic Curve Cryp-

tosystems, Proceedings of CHES 1999, pp. 292-302.

4. J. Daemen and V. Rijmen, The Design of Rijndael: AES - The Advanced Encryp-

tion Standard. Springer-Verlag, 2002. ISBN 3540425802.

5. J. D. Golic and C. Tymen, Multiplicative Masking and Power Analysis of AES,

Proceedings of CHES 2002.

6. P. Kocher, J. Ja¬e and B. Jun, Di¬erential Power Analysis, Proceedings of

CRYPTO ™99, LNCS.

7. P. Kocher et al, DES and Other Cryptographic processes with Leak Minimization

for smartcards and other cryptosystems. US 6,278,783 B1, Jun. 3, 1999. Available

at: http://www.cryptography.com/technology/dpa/licensing.html.

8. IBM Corporation, Space-e¬cient, side-channel attack resistant table lookups, Ap-

plication Patent 20030044003.

9. E. Oswald, S. Mangard, N. Pramstaller and V. Rijmen, A Side-Channel Analysis

Resistant Description of the AES S-box, Proceedings of FSE 2005, LNCS 3557,

pp. 413-423.

10. E. Oswald and K. Schramm, An E¬cient Masking Scheme for AES Software Im-

plementations. Proceedings of WISA 2005, LNCS 3786, Springer, pp. 292-305, 2006

11. J.R. Rao, P. Rohatgi, H. Scherzer and S. Tinguely, Partitioning attacks: Or How

to rapidly Clone Some GSM Cards, Proceedings of the 2002 IEEE Symposium on

Security and Privacy, 2002.

12. J. Wolkerstorfer, E. Oswald and M. Lamberger, An ASIC implementation of the

AES SBoxes. Proceedings of CT-RSA 2002. Springer-Verlag.

A AES pseudo-code

Cipher(byte in[16], byte out[16], word w[44])

begin

byte state[16]

state=in

AddRoundKey(state,w[0,3])

for round=1 to 9

SubBytes(state)

ShiftRows(state)

MixColumns(state)

AddRoundKey(state,w[round*4,(round+1)*4-1])

end for

SubBytes(state)

ShiftRows(state)

AddRoundKey(state,w[40,43])

end

Fig. 1. Pseudo-code for AES encryption

B List of Tables

We summarise in Table 2 the list of randomised tables required for each opera-

tion. Note that for decryption, we also need table S ′ to compute the key-schedule,

but this table can be discarded at the end of the key-schedule. The RAM re-

quirement for those randomised tables is summarised in Table 3.

InvCipher(byte in[16], byte out[16], word w[44])

begin

byte state[16]

state=in

AddRoundKey(state,w[40,43])

for round=9 downto 1

InvShiftRows(state)

InvSubBytes(state)

AddRoundKey(state,w[round*4,(round+1)*4-1])

InvMixColumns(state)

end for

InvShiftRows(state)

InvSubBytes(state)

AddRoundKey(state,w[0,3])

end

Fig. 2. Pseudo-code for AES decryption

Operation Tables required

P , P ’1 , XT1 , XT2 , S ′ , ML, MH

AES encryption, basic 4 4

P , P , XT4 , XT2 , S ′’1 , ML, MH,RL, RH,TL, TH

1

’1

AES decryption, basic 4

1

tradeo¬s P , P , XT4 , ML, MH, p12 , p21 , P ′ , P ′’1 , T

’1

AES encryption,

tradeo¬s P , P ’1 , XT1 , ML, MH, p12 , p21 , P ′ , P ′’1 , T

AES decryption, 4

Table 2. Randomised tables required for each AES operation, for the basic permutation

table countermeasure, and with the three time-memory tradeo¬s.

Operation RAM (bytes)

’1

Permutations P and P 32

1

Xor-table XT4 128

2

Xor-table XT4 128

′

Randomised SBOX S 256

′’1

Randomised SBOX S 256

Tables ML, MH 32

Tables TL, TH, RL, RH 64

Permutations p12 and p21 16

′ ′’1

Permutations P and P 32

Table T 128

Table 3. Memory requirement for the randomised tables