A new DPA Countermeasure based on
Permutation Tables

Jean-S´bastien Coron
e

University of Luxembourg
jean-sebastien.coron@uni.lu



Abstract. We propose and analyse a new countermeasure against Dif-
ferential Power Analysis (DPA) for the AES encryption algorithm, based
on permutation tables. As opposed to existing AES countermeasures, it
does not use random masking. We prove that our new countermeasure is
resistant against ¬rst-order DPA; we also show that it is quite e¬cient
in practice.


1 Introduction

The AES [4] encryption algorithm is with DES the most widely used encryption
algorithm. However, it is easy to see that without modi¬cation, AES is vulnerable
to Di¬erential Power Analysis as introduced by Kocher et al. in [6, 7]. A DPA
attack consists in extracting information about the secret key of a cryptographic
algorithm, by studying the power consumption of the electronic device during the
execution of the algorithm. The attack was ¬rst described on the DES encryption
scheme, but it was soon understood that the attack could easily be extended
to other symmetrical cryptosystems such as the AES, and also to public-key
cryptosystems such as RSA and Elliptic-Curves Cryptosystems [3].
A common technique to protect secret-key algorithms against side-channel
attacks consists in masking all data with a random integer, as suggested in
[2]. The masked data and the random integer are then processed separately and
eventually recombined at the end of the algorithm. An attacker trying to analyse
power consumption at a single point will obtain only random values; therefore,
the implementation will be secure against ¬rst-order Di¬erential Power Analysis
(DPA). In order to obtain valuable information about the key, the attacker must
correlate the power consumption at multiple points during the execution of the
algorithm; this is called a High Order Di¬erential Power Analysis (HO-DPA);
such attack usually requires a much larger number of power consumption curves,
which makes it infeasible in practice if the number of executions is limited (for
example, by using a counter). Many AES countermeasures have been described
based on random masking [1, 5, 9, 10, 12].
In this article we propose a di¬erent countermeasure against DPA for AES,
based on permutation tables. The main di¬erence with existing AES counter-
measures is that it avoids random masking; in practice this can be an advantage
because random masking is subject to numerous patents [7]. We prove that our
countermeasure is resistant against ¬rst-order DPA (like the random masking
countermeasure) and we show that its e¬ciency is comparable to that of the
random masking countermeasure.
It works as follows: at initialisation time a randomised permutation table
is generated in RAM; this permutation table is then applied to the message
and to the key; then all intermediate variables that appear during the course of
the algorithm remain in permuted form; eventually the inverse permutation is
applied to obtain the resulting ciphertext.
We also describe a technique to reduce the RAM consumption of our per-
mutation table countermeasure, at the cost of increasing the running time. Our
technique is based on a compression scheme proposed in [11] for the classical
random masking countermeasure; here we adapt this scheme to permutation ta-
bles. We show that this variant is also secure against ¬rst-order DPA. Finally, we
also provide the result of implementations that show that our countermeasure
is reasonably e¬cient in practice, as it is only roughly four times slower than
the classical masking countermeasure, for a comparable RAM requirement (see
Table 1 in Section 7 for a detailed comparison).


2 The AES encryption algorithm

In this section we recall the main operations involved in the AES algorithm. We
refer to [4] for a full description. AES operates on a 4 — 4 array of bytes si,j ,
termed the state. For encryption, each AES round (except the last) consists of
four stages:

1. AddRoundKey: each byte of the state is xored with the round key ki,j ,
derived from the key schedule:

si,j ← si,j • ki,j

2. SubBytes: each byte of the state is updated using an 8-bit S-box:

si,j ← S(si,j )

3. ShiftRows: the bytes of the state are cyclically shifted in each row by a
certain o¬set; the ¬rst row is left unchanged.
4. MixColumns: the bytes of the state are modi¬ed column by column as
follows:

s′ ← (02 · s0,c ) • (03 · s1,c ) • s2,c • s3,c
0,c

s1,c ← s0,c • (02 · s1,c ) • (03 · s2,c ) • s3,c
s′ ← s0,c • s1,c • (02 · s2,c ) • (03 · s3,c )
2,c
s′ ← (03 · s0,c ) • s1,c • s2,c • (02 · s3,c )
3,c
The pseudo-code for AES encryption with a 128-bit key is recalled in Figure
1 in Appendix. The word array w contains the round keys that are generated by
the key-schedule algorithm. We refer to [4] for a more detailed description.
For decryption, each round (except the last) consists in the following opera-
tions:
1. InvShiftRows: is the inverse of the ShiftRows operation. The bytes of the
state are cyclically shifted in each row by a certain o¬set; the ¬rst row is left
unchanged.
2. InvSubBytes: is the inverse of the SubBytes operation. The inverse S-box
S ’1 is applied on each byte of the state.
3. AddRoundKey: the operation is equal to its own inverse.
4. InvMixColumns: is the inverse of the MixColumns operation. The bytes of
the state are modi¬ed column by column as follows:
s′ ← (0e · s0,c ) • (0b · s1,c ) • (0d · s2,c ) • (09 · s3,c )
0,c
s′ ← (09 · s0,c ) • (0e · s1,c ) • (0b · s2,c ) • (0d · s3,c )
1,c
s′ ← (0d · s0,c ) • (09 · s1,c ) • (0e · s2,c ) • (0b · s3,c )
2,c
s′ ← (0b · s0,c ) • (0d · s1,c ) • (09 · s2,c ) • (0e · s3,c )
3,c



The pseudo-code for the inverse cipher is recalled in Figure 2 in Appendix.
Finally, the key-schedule is based on the following operations:
1. SubWord: takes a four-byte input word and applies the S-box S to each of
the four bytes.
2. RotWord: takes a word [a0 , a1 , a2 , a3 ] as input and performs a cyclic per-
mutation to return [a1 , a2 , a3 , a0 ].
3. Xor with Rcon: takes as input a 32-bits word and xor it with the round
constant word array Rcon[i] = [(02)i’1 , 00, 00, 00], for round 1 ¤ i ¤ 10.
We refer to [4] for a full description of the key-schedule.

3 The Permutation Table Countermeasure
In this section we describe our basic countermeasure with permutation tables.
A variant with a time-memory trade-o¬ is described in Section 6.
Our countermeasure consists in using a randomised representation of the
state variables, using two independent 4-bit permutation tables p1 and p2 that
are freshly generated before each new execution of the algorithm. More precisely,
every intermediate byte x = xh xl , where xh is the high nibble and xl is the low
nibble, will be represented in the following form:
P (x) = p2 (xh ) p1 (xl )
This permuted representation is then used throughout the execution of AES.
Eventually the original representation is recovered at the end of the encryption
algorithm, by applying the inverse permutation.
3.1 Generation of Permutation Tables p1 and p2
The 4-bit permutation tables p1 and p2 are generated for each new execution of
the algorithm as follows. Let s0 be a ¬xed 4-bit permutation table; one can take
for example:

s0 = [14, 6, 0, 5, 9, 1, 4, 15, 8, 10, 12, 2, 3, 13, 11, 7]

One de¬nes a sequence of permutations si for i ≥ 0 as follows:

si+1 (x) = s0 (si (x) • ki )

where each ki ∈ {0, 1}4 is randomly generated. The 4-bit permutation p1 is then
de¬ned as p1 := sn for some n (in practice, one can take n = 4). One applies
the same procedure to generate the other table p2 (with independently generated
ki ™s). Every intermediate byte x = xh xl that appear in AES is then represented
as:
P (x) = p2 (xh ) p1 (xl )
Therefore, P is a 8-bit permutation; its storage requires 16 bytes of RAM. In
the following we explain how to use this permuted representation throughout the
AES operations, so that the intermediate data are never manipulated in clear.

3.2 AddRoundKey
Given P (x) and P (y) we explain how to compute P (x•y) without manipulating
x and y directly (since otherwise this would give a straightforward DPA attack).
We de¬ne the following two 8-bit to 4-bit xor-tables; for all x′ , y ′ ∈ {0, 1}4:

XT1 (x′ , y ′ ) := p1 (p’1 (x′ ) • p1 (y ′ ))
’1
4 1
XT2 (x′ , y ′ ) := p2 (p2 (x′ ) • p’1 (y ′ ))
’1
4 2

Those two tables require a total of 256 bytes in memory. Then given p1 (x) and
p1 (y) one can compute p1 (x • y) using:

XT1 (p1 (x), p1 (y)) = p1 (x • y)
4

for all x, y ∈ {0, 1}4. Similarly, we have:

XT2 (p2 (x), p2 (y)) = p2 (x • y)
4

Using those two tables we de¬ne the following function for all x′ , y ′ ∈ {0, 1}8,
where x′ = x′ x′ and y ′ = yh yl :
′ ′
h l

XT8 (x′ , y ′ ) = XT2 (x′ , yh ) XT1 (x′ , yl )
′ ′
4h 4l

Then given P (x) and P (y), one can compute P (x • y) as:

XT8 (P (x), P (y)) = P (x • y) (1)
The AddRoundKey operation can then be implemented as:

s′ ← XT8 (s′ , ki,j )

i,j i,j

where s′ = P (si,j ) and ki,j = P (ki,j ). It is therefore necessary to use the

i,j
permuted representation for the round keys. We further describe how this is
done by modifying the key-schedule operations (see sections 3.9, 3.10 and 3.11).

3.3 SubBytes
Let S be the AES SBOX. We de¬ne the following randomised permutation S ′ :

S ′ (x) = P (S(P ’1 (x)))

Given P (x), the permuted representation of S(x) is then computed as:

P (S(x)) = S ′ (P (x))

The SubBytes operation on the permuted state variables can then be computed
using the table S ′ as follows:

s′ ← S ′ (s′ )
i,j i,j

The randomised table S ′ (x) requires 256 bytes in RAM.

3.4 ShiftRows
No modi¬cation is required.

3.5 MixColumns
Using 03 · x = (02 · x) • x, the MixColumns operation can be written as follows
(¬rst line):
s′ ← (02 · s0,c ) • (02 · s1,c ) • s1,c • s2,c • s3,c
0,c

Therefore, we must be able to compute P (02·x) from P (x). For any x = xh xl ∈
{0, 1}8, from x = (0 xl ) • (xh 0) we have:

02 · x = (02 · (0 xl )) • (02 · (xh 0))

and using equation (1), we obtain:

P (02 · x) = XT8 P (02 · (0 xl )), P (02 · (xh 0) (2)

Therefore, for all x′ ∈ {0, 1}8 where x′ = x′ x′ , we de¬ne the following two
h l
tables (4-bit to 8-bit):

p’1 (x′ )
ML(x′ ) := P 02 · 0 1
l l
’1
MH(x′ ) := P 02 · p2 (x′ ) 0
h h
which gives for any xl , xh ∈ {0, 1}4:

ML(p1 (xl )) = P 02 · ( 0 xl )
(3)
MH(p2 (xh )) = P 02 · ( xh 0)

Using equations (2) and (3), given P (x) = p2 (xh ) p1 (xl ), we can therefore
compute P (02 · x) as follows:

P (02 · x) = XT8 (ML(p1 (xl )), MH(p2 (xh )))

For simplicity, given P (x) = x′ x′ , we denote:
h l

D2 (x′ x′ ) = XT8 ML(x′ ), MH(x′ )
h l l h

so that for any x ∈ {0, 1}8:

P (02 · x) = D2 (P (x)) (4)

The ¬rst line of the MixColumns operation with permuted data can therefore be
written as:

s′′ ← XT8 D2 (s′ ), XT8 D2 (s′ ), XT8 s′ , XT8 (s′ , s′ )
0,c 0,c 1,c 1,c 2,c 3,c

and the other three lines are implemented analogously. The storage of the two
ML and MH tables requires 32 bits of RAM.


3.6 InvShiftRows

The algorithm remains the same.


3.7 InvSubBytes

This is similar to the SubBytes algorithm: we de¬ne the following randomised
permutation S ′’1 :
S ′’1 (x) = P (S ’1 (P ’1 (x)))
Therefore, the InvSubBytes operation on the permuted state variables is com-
puted using table S ′’1 (x) as follows:

s′ ← S ′’1 (s′ )
i,j i,j

Note that one can generate the randomised table S ′’1 (x) from S(x) only, so
that it is not necessary to store S ’1 (x) in ROM, using the fact that:

S ′’1 = P —¦ S ’1 —¦ P ’1 = (P —¦ S —¦ P ’1 )’1
3.8 InvMixColumns
The ¬rst line of the InvMixColumns operation is as follows:

s′ ← (0e · s0,c ) • (0b · s1,c ) • (0d · s2,c ) • (09 · s3,c )
0,c

We have the following relations in GF(28 ):

0b = 09 • 02, 0d = 0c • 01, 0e = 0c • 02 (5)

Therefore only two tables for computing the multiplication by 09 and 0c are
required, from which multiplication by 0b, 0d and 0e can be computed without
additional tables. More precisely, for all x ∈ {0, 1}4, we build the following 4-bit
to 8-bit tables:

p’1 (x′ )
TL(x′ ) := P 09 · 0
l l
1

TH(x′ ) := P 09 · p’1 (x′ ) 0
2
h h

p’1 (x′ )
RL(x′ ) := P 0c · 0
l l
1

RH(x′ ) := P 0c · p’1 (x′ ) 0
2
h h

Storing those four tables requires 64 bytes in RAM. Then, as in section 3.5,
writing x′ = P (x) = x′ x′ = p2 (xh ) p1 (xl ), we obtain:
h l

P (09 · x) = XT8 TH(x′ ), TL(x′ )
h l
P (0c · x) = XT8 RH(x′ ), RL(x′ )
h l

As previously we denote:

D9 (x′ x′ ) = XT8 TH(x′ ), TL(x′ )
h l h l
Dc (x′ x′ ) = XT8 RH(x′ ), RL(x′ )
h l h l

Using equations (5), we also denote:

Db (x) = XT8 (D9 (x), D2 (x))
Dd (x) = XT8 (Dc (x), x)
De (x) = XT8 (Dc (x), D2 (x))

The ¬rst line of the InvMixColumns operation can then be rewritten as follows:

s′′ ← XT8 De (s′ ), XT8 Db (s′ ), XT8 Dd (s′ ), D9 (s′ )
0,c 0,c 1,c 2,c 3,c

and the other three lines are rewritten analogously.

3.9 SubWord
The SubWord operation on the modi¬ed state variables is implemented like the
SubByte operation.
3.10 RotWord

The RotWord operation remains unmodi¬ed.


3.11 Xor with Rcon

Let
R(i) = 02i’1
for all 1 ¤ i ¤ 10. We have R(0) = 01 and R(i) = 02 · R(i ’ 1) for all 1 ¤ i ¤ 10.
Therefore, letting R′ (i) := P (R(i)), we have:

R′ (i) = P (R(i)) = P (02 · R(i ’ 1)) = D2 (R(i ’ 1))

Therefore the Rcon constant can be computed using the function D2 (x) de¬ned
in section 3.5.


4 Security

In this section we show that our countermeasure in resistant against ¬rst-order
DPA. This is due to the following lemma:

Lemma 1. For a ¬xed key and input message, every intermediate byte that
is computed in the course of the randomised AES algorithm has the uniform
distribution in {0, 1}8 .

Proof. The proof follows directly from the fact that any intermediate AES data
x is represented as P (x), where P (xh xl ) = p2 (xh ) p1 (xl ) is the randomised
permutation. From the construction of p1 , p2 , this implies that P (x) is randomly
distributed in {0, 1}8. “


The previous lemma implies that an attacker who makes statistics about
power-consumption at a single point gets only random values; hence, the coun-
termeasure is resistant against ¬rst-order DPA (like the random masking coun-
termeasure). In order to obtain valuable information about the key, the attacker
must correlate the power consumption at multiple points during the execution of
the algorithm; this is called a High Order Di¬erential Power Analysis (HO-DPA);
such attack usually requires a much larger number of power consumption curves,
which makes it infeasible in practice if the number of executions is limited (for
example, by using a counter).


5 A Compression Scheme

A very nice compression scheme for SBOX tables has been proposed in [11]. This
compression scheme works for SBOXes with a random mask; we recall it in this
section.1 Then in Section 6 we show how to adapt it to our permutation table
countermeasure.
Let S(x) for x ∈ {0, 1}8 be a 8-bit to 8-bit SBOX. One de¬nes S1 (x) and
S2 (x) such that S(x) = S2 (x) S1 (x) for all x ∈ {0, 1}8, where S1 (x) and S2 (x)
are 4-bit values. Let r1 , r2 ∈ {0, 1}8 and s ∈ {0, 1}4 be random masks, and let
de¬ne the randomised table:

T (x) = S1 (x • r1 ) • S2 (x • r2 ) • s (6)

which is a 8-bit to 4-bit table. Let x′ = x • (r1 • r2 ) be a masked data; we have
from (6):

S1 (x) = S1 (x′ • r1 • r2 ) = T (x′ • r2 ) • S2 (x′ ) • s
S2 (x) = S2 (x′ • r1 • r2 ) = T (x′ • r1 ) • S1 (x′ ) • s

which gives:

S1 (x) • s = T (x′ • r2 ) • S2 (x′ ) (7)
S2 (x) • s = T (x′ • r1 ) • S1 (x′ ) (8)

Therefore given the masked data x′ one can obtain a masked S1 (x) and
a masked S2 (x), by using the randomised table T . The advantage is that the
size of T is only half the size of a classically randomised SBOX table; here the
storage of T requires only 128 bytes of RAM instead of 256 bytes for a classically
randomised AES SBOX. More precisely, the algorithm is as follows:
InitTable(r1 , r2 , s)
1. Write S(x) = S2 (x) S1 (x)
2. Generate randomised table T (x) = S1 (x•r1 )•S2 (x•r2 )•s for all x ∈ {0, 1}8

TableLookUp(x′ , r1 , r2 , s)
1. Generate a random t ∈ {0, 1}4
2. u ← T (x′ • r2 ) • S2 (x′ )
3. v ← T (x′ • r1 ) • S1 (x′ ) • t
4. Output y = v u ∈ {0, 1}8 and r′ = (s • t) s.

Here we add an additional masking step with t so that the values u and v
are not masked with the same nibble s; from equations (7) and (8), we obtain
that the value returned by TableLookUp is such that y = S(x) • r′ .
It is easy to see that this countermeasure is resistant against ¬rst-order DPA,
as all the variables that appear in the TableLookUp function are uniformly dis-
tributed. Note that the tables S1 and S2 are only stored in ROM; they don™t
need to be randomised because in the TableLookUp algorithm those tables are
accessed at point x′ which is already randomised.
1
In [11] the countermeasure is described in plain English which makes it di¬cult to
understand; here we attempt to provide a more clear description.
It is also easy to see that the countermeasure can be generalised to any SBOX
input and output size. Moreover, one can obtain a better compression factor by
splitting S(x) into more shares; for example, a 8-bit SBOX could be split into
8 tables (one for each output bit); then the resulting randomised T table would
be 8 times smaller, at the cost of increasing the running time for every table
look-up.


6 Time memory Trade-o¬s
In this section we describe three time-memory trade-o¬s. The goal is to re-
duce the RAM requirement of the permutation table countermeasure described
in Section 3, for implementation on low-cost devices, at the cost of increasing
the running time. The main time-memory tradeo¬ is based on the SBOX com-
pression scheme recalled in the previous section. The second idea consists in
using a single XOR table XT1 (x′ , y ′ ) instead of two XOR tables XT1 (x′ , y ′ ) and
4 4
XT2 (x′ , y ′ ) as in Section 3.2. The third idea consists in removing tables TL, TH,
4
RL and RH, by using a “Double-And-Add” approach. In Section 7 we describe
the results of practical implementations of these time-memory trade-o¬s.

6.1 Compressed SBOX
The compression scheme of [11] recalled in the previous section was used for
random masking; here we show how to adapt this compression scheme to our
permutation table countermeasure.
We de¬ne a new permutation P ′ (x) = p′ (xh ) p′ (xl ) where p′ and p′ are
2 1 1 2
4-bit permutations tables which are generated like p1 and p2 . As previously, we
write:
S(x) = S2 (x) S1 (x)
where S1 (x) and S2 (x) are 4-bit nibbles. We then de¬ne a randomised table:
T (x′ ) = p1 S1 P ’1 (x′ ) • p2 S2 P ’1 P ′’1 (x′ ) (9)
The randomised table T (x′ ) is a 8-bit to 4-bit table; therefore, its storage requires
128 bits in memory, instead of 256 bytes for the randomised table S ′ (x′ ) in
Section 3.3. Writing x′ = P (x), we obtain from equation (9):
p1 (S1 (x)) = T (P (x)) • p2 S2 P ’1 P ′’1 (P (x))
This shows that given P (x) we can compute p1 (S1 (x)) using randomised table
T and table S2 stored in ROM. Similarly, writing x′ = P ′ (P (x)), we obtain from
equation (9):
p2 (S2 (x)) = T (P ′ (P (x))) • p1 S1 P ’1 (P ′ (P (x)))
This shows that given P (x) we can compute p2 (S2 (x)) using randomised table
T and table S1 stored in ROM. Therefore, given P (x) we can compute:
P (S(x)) = p2 (S2 (x)) p1 (S1 (x))
using the following operations:
InitTable(P, P ′ ):
1. Write S(x) = S2 (x) S1 (x)
2. Generate table T (x′ ) = p1 S1 P ’1 (x′ ) • p2 S2 P ’1 P ′’1 (x′ ) for all
x′ ∈ {0, 1}8 .

TableLookUp(x′ , T, P, P ′ )
1. u ← T (x′ ) • p2 S2 P ’1 P ′’1 (x′ )
2. v ← T (P ′ (x′ )) • p1 S1 P ’1 (P ′ (x′ ))
3. Output v u

Given the tables P , P ′ , T and denoting F (x′ ) = TableLookUp(x′ , T, P, P ′ ),
the SubBytes operation on the permuted state variables is computed as follows:

s′ ← F (s′ )
i,j i,j

The table T requires 128 bytes in memory, and the additional permutations P ′
and P ′’1 require 32 bytes in memory, so in total 160 bytes are required (instead
of 256 bytes for the randomised table S ′ ). The InvSubBytes operation on the
permuted state variables with compressed inverse SBOX S ’1 is obtained by
replacing S by S ’1 in the previous equations.


6.2 Single XOR table

In Section 3.2 two 8-bit to 4-bit xor-tables are de¬ned. In this section, we show
that it is su¬cient to de¬ne only one 8-bit to 4-bit xor-table. As in Section 3.2,
we de¬ne:
XT1 (x′ , y ′ ) := p1 (p1 (x′ ) • p1 (y ′ ))
’1 ’1
4

We also de¬ne the 4-bit to 4-bit permutations:

p12 (x) = p1 (p’1 (x)) (10)
2
p2 (p’1 (x))
p21 (x) = (11)
1

and we store those two permutation tables in RAM. Then we can de¬ne the
function:
XT2 (x′ , y ′ ) := p21 (XT1 (p12 (x′ ), p12 (y ′ )))
4 4

From equations (10) and (11) we obtain that for all xh , yh ∈ {0, 1}4:

XT2 (p2 (xh ), p2 (yh )) = p21 (XT1 (p1 (xh ), p1 (yh )) = p21 (p1 (xh • yh )) = p2 (xh • yh )
4 4

Therefore, the XT2 function takes the same values as the XT2 table de¬ned in
4 4
Section 3.2. The advantage is that only 16 bytes of RAM are necessary to store
the permutation tables p12 and p21 instead of 128 bytes for the previous XT2 4
table.
Double and Add for InvMixColumns
6.3
The ¬rst line of the InvMixColumns operation is as follows:
s′ ← (0e · s0,c ) • (0b · s1,c ) • (0d · s2,c ) • (09 · s3,c )
0,c

In this section we show how to avoid the four tables TL, TH, RL and RH, using
a Double and Add method. More precisely, using the existing D2 function (see
equation (4)) we de¬ne the following functions:
D4 (x′ ) := D2 (D2 (x′ ))
D8 (x′ ) := D2 (D4 (x′ ))
D9 (x′ ) := XT8 (D8 (x′ ), x′ )
Db (x′ ) := XT8 (D9 (x′ ), D2 (x′ ))
Dc (x′ ) := XT8 (D8 (x′ ), D4 (x′ ))
Dd (x′ ) := XT8 (Dc (x′ ), x′ )
De (x′ ) := XT8 (Dc (x′ ), D2 (x′ ))
Therefore no additional table is required beyond the already de¬ned ML and
MH tables. The ¬rst line of the InvMixColumns operation can then be rewritten
as follows:
s′′ ← XT8 De (s′ ), XT8 Db (s′ ), XT8 Dd (s′ ), D9 (s′ )
0,c 0,c 1,c 2,c 3,c

and the other three lines are rewritten analogously.

6.4 Security
As for our basic permutation table countermeasure, all the intermediate variables
that appear in the time-memory tradeo¬ variants have the uniform distribution:
Lemma 2. For a ¬xed key and input message, every intermediate byte that is
computed in the course of the randomised AES algorithm with any of the three
time-memory trade-o¬s has the uniform distribution in {0, 1}8 .
Therefore, the three previous time-memory tradeo¬s are secure against ¬rst-
order DPA.


7 Implementation
We summarise in Table 1 the timings observed and RAM needed for each AES
operation. The timings are based on an implementation in C on a 2 GHz laptop
under Linux. The RAM requirements are based on Tables 2 and 3 in Appendix.
These timings show that the new countermeasure is reasonably e¬cient, as it is
only roughly four times slower than AES with masking countermeasure, for a
comparable amount of memory. We note that the time-memory tradeo¬s enable
to spare 272 bytes in RAM compared to our basic permutation table counter-
measure.
Countermeasure Timing (µs) RAM (bytes)
AES encryption without countermeasure 2.2 214
AES encryption with masking 4.0 474
AES encryption with basic permutation tables 11 778
AES encryption with permutation tables + trade-o¬s 27 570
AES decryption without countermeasure 4.0 214
AES decryption with masking 9.5 474
AES decryption with basic permutation tables 23 842
AES decryption with permutation tables + trade-o¬s 42 570

Table 1. Timings obtained using a C implementation of AES without countermeasure,
with masking countermeasure, and with proposed countermeasure: basic permutation
table, and with time-memory trade-o¬s, on a 2 GHz laptop under Linux



8 Conclusion
We have described a new countermeasure against DPA for AES, that does not
use random masking. Our countermeasure is provably secure against ¬rst order
DPA, and reasonably e¬cient compared to the classical random masking coun-
termeasure. We have also described three time-memory tradeo¬s to reduce the
RAM requirement; this can be useful for smart-card implementations.
Acknowledgments: Thanks to Christophe Clavier for suggesting the “Double
and Add” approach in Section 6.3.


References
1. M. L. Akkar and Christophe Giraud, An Implementation of DES and AES, Secure
against Some Attacks, proceedings of CHES 2001. Springer-Verlag.
2. S. Chari, C. S. Jutla, J. R. Rao and P. Rohatgi, Towards Sound Approaches to
Counteract Power-Analysis Attacks, Proceedings of Crypto™99.
3. J.S. Coron, Resistance against Di¬erential Power Analysis for Elliptic Curve Cryp-
tosystems, Proceedings of CHES 1999, pp. 292-302.
4. J. Daemen and V. Rijmen, The Design of Rijndael: AES - The Advanced Encryp-
tion Standard. Springer-Verlag, 2002. ISBN 3540425802.
5. J. D. Golic and C. Tymen, Multiplicative Masking and Power Analysis of AES,
Proceedings of CHES 2002.
6. P. Kocher, J. Ja¬e and B. Jun, Di¬erential Power Analysis, Proceedings of
CRYPTO ™99, LNCS.
7. P. Kocher et al, DES and Other Cryptographic processes with Leak Minimization
for smartcards and other cryptosystems. US 6,278,783 B1, Jun. 3, 1999. Available
at: http://www.cryptography.com/technology/dpa/licensing.html.
8. IBM Corporation, Space-e¬cient, side-channel attack resistant table lookups, Ap-
plication Patent 20030044003.
9. E. Oswald, S. Mangard, N. Pramstaller and V. Rijmen, A Side-Channel Analysis
Resistant Description of the AES S-box, Proceedings of FSE 2005, LNCS 3557,
pp. 413-423.
10. E. Oswald and K. Schramm, An E¬cient Masking Scheme for AES Software Im-
plementations. Proceedings of WISA 2005, LNCS 3786, Springer, pp. 292-305, 2006
11. J.R. Rao, P. Rohatgi, H. Scherzer and S. Tinguely, Partitioning attacks: Or How
to rapidly Clone Some GSM Cards, Proceedings of the 2002 IEEE Symposium on
Security and Privacy, 2002.
12. J. Wolkerstorfer, E. Oswald and M. Lamberger, An ASIC implementation of the
AES SBoxes. Proceedings of CT-RSA 2002. Springer-Verlag.


A AES pseudo-code



Cipher(byte in[16], byte out[16], word w[44])
begin
byte state[16]
state=in
AddRoundKey(state,w[0,3])
for round=1 to 9
SubBytes(state)
ShiftRows(state)
MixColumns(state)
AddRoundKey(state,w[round*4,(round+1)*4-1])
end for
SubBytes(state)
ShiftRows(state)
AddRoundKey(state,w[40,43])
end


Fig. 1. Pseudo-code for AES encryption




B List of Tables
We summarise in Table 2 the list of randomised tables required for each opera-
tion. Note that for decryption, we also need table S ′ to compute the key-schedule,
but this table can be discarded at the end of the key-schedule. The RAM re-
quirement for those randomised tables is summarised in Table 3.
InvCipher(byte in[16], byte out[16], word w[44])
begin
byte state[16]
state=in
AddRoundKey(state,w[40,43])
for round=9 downto 1
InvShiftRows(state)
InvSubBytes(state)
AddRoundKey(state,w[round*4,(round+1)*4-1])
InvMixColumns(state)
end for
InvShiftRows(state)
InvSubBytes(state)
AddRoundKey(state,w[0,3])
end


Fig. 2. Pseudo-code for AES decryption



Operation Tables required
P , P ’1 , XT1 , XT2 , S ′ , ML, MH
AES encryption, basic 4 4
P , P , XT4 , XT2 , S ′’1 , ML, MH,RL, RH,TL, TH
1
’1
AES decryption, basic 4
1
tradeo¬s P , P , XT4 , ML, MH, p12 , p21 , P ′ , P ′’1 , T
’1
AES encryption,
tradeo¬s P , P ’1 , XT1 , ML, MH, p12 , p21 , P ′ , P ′’1 , T
AES decryption, 4


Table 2. Randomised tables required for each AES operation, for the basic permutation
table countermeasure, and with the three time-memory tradeo¬s.




Operation RAM (bytes)
’1
Permutations P and P 32
1
Xor-table XT4 128
2
Xor-table XT4 128

Randomised SBOX S 256
′’1
Randomised SBOX S 256
Tables ML, MH 32
Tables TL, TH, RL, RH 64
Permutations p12 and p21 16
′ ′’1
Permutations P and P 32
Table T 128

Table 3. Memory requirement for the randomised tables