ńņš. 3 |

One of the problems with security proofs is that they tend to cause a

certain level of complacency about using a cipher. The truth is that a se-

curity proof is only valid in a very precise set of circumstances. It assumes

that the attacker has access to no other information besides the public key

and (possibly) a decryption oracle, and has a perfect implementation of the

cryptosystem. In the real world an attacker may have access to much more

information than the theoretical attacker did in the security proof. Attacks

that make use of information or abilities that are not available to an attacker

in a security proof are often called side-channel attacks.

Some side-channel attacks are only relevant to one particular cryptosys-

tem, but others can be used to attack whole classes of ciphers. Elliptic curve

cryptosystems are often vulnerable to side-channel attacks, and this will be

discussed more thoroughly in Chapter IV. For the moment we will ignore

these general attacks and focus on those attacks that are speciļ¬c to ECIES.

III.3.1. Weak Key Generation Algorithm. On one hand it is easy to

see how a ļ¬‚awed key generation algorithm can reduce the security of a cryp-

tosystem. It is not diļ¬cult to understand how an attacker might be able

to break a scheme if, for example, the private key x was always a particular

number or had a particular form. However, there is a more subtle reason why

key generation is an important part of a cryptosystem, and this has particular

relevance when talking about security proofs.

It is very important that the key generation algorithm has the ability to

produce lots of diļ¬erent key-pairs and each with equal probability. This may

seem a bit contradictory. After all, an attacker will only get to attack the

scheme after the key-pair has been chosen, so why should it matter whether

the key generation algorithm produces one key or lots and lots?

Well, for a start we have assumed that the attacker knows all the ele-

ments of the cryptosystem including the key generation algorithm. If the key

generation algorithm only ever produces one key, or a very small number of

keys, then it is not hard for the attacker to run the key generation algorithm,

ļ¬nd out the private key and break the scheme. The obvious solution to this

problem is to keep the key generation algorithm hidden from the attacker, as

is often the case with real-life instantiations of cryptosystems.

III.3. OTHER ATTACKS AGAINST ECIES 59

It turns out that this is not a useful thing to do, but for quite a subtle

reason. The reason is that the key generation algorithm is a ļ¬xed algorithm

that doesnā™t depend on any secret values, so there are always attackers that

assume something about the key generation algorithm, such as that it al-

ways produces one particular key-pair, and attempt attacks based on this

assumption. If the key generation algorithm produces a large enough number

of diļ¬erent key-pairs, then these attackers will not have a signiļ¬cant success

probability as it is very unlikely that a key-pair that is generated by G will

satisfy the assumption that the attacker makes, but this could be a problem

for key generation algorithm with small ranges.

It is possibly best to think about this in terms of a simple example. If sk

is some ļ¬xed private key for an asymmetric encryption scheme (G, E, D), then

there is an attacker A who will try to decrypt a challenge ciphertext using

the function D(Ā·, sk) regardless of what the public key is or whether it knows

any information about the key generation algorithm. This means that, even

if G is unknown, A will still have a high success probability if G outputs the

key-pair (pk, sk) with a signiļ¬cant probability. Since we are forced to consider

all possible attackers for the scheme (G, E, D), we are forced to conclude that

(G, E, D) is a weak encryption algorithm because an attacker exists that will

break the scheme with high probability even though we do not know what

that attacker is. On the other hand, if G only ever produces (pk, sk) with an

insigniļ¬cant probability, i.e., G has the ability to output lots of diļ¬erent keys

each with equal probability, then (G, E, D) is secure because the attacker A

will not work in all but an insigniļ¬cant fraction of the possible cases.

This demonstrates another weakness of security proofs. Whilst a security

proof tells us about the security of a scheme in general, it doesnā™t tell us about

how the scheme works with one particular key-pair. It is entirely possible for a

scheme to have a good security proof and yet be completely insecure for some

set of keys. The best that a security proof can say is that if one generates a

key-pair using the key generation algorithm, then the probability that that

scheme will be insecure is insigniļ¬cant; it doesnā™t say anything about the

suitability of any one speciļ¬c key-pair that might actually be in use.

III.3.2. Invalid Elliptic Curve Point Attacks. Another assumption that

has been made throughout this section is that all the data are valid. In real

implementations of elliptic curve cryptosystems, elliptic curve points are usu-

ally represented as elements of Fq Ć— Fq or Fq by using some form of point

compression. Whenever an attacker has requested the decryption of a cipher-

text (U, c, r), we have always implicitly assumed that U is a valid point on

the elliptic curve E and an element of the subgroup generated by P . If this is

not explicitly checked, then an attacker can exploit this to break the scheme.

As an example, consider the following simple attack. The attacker ļ¬nds

a point U on E with small prime order p1 , chooses any message m ā M and,

60 III. PROOFS OF SECURITY FOR ECIES

for 0 ā¤ i < p1 , computes

(i) (i)

(k1 ||k2 ) = KD([i]U, l),

(i)

c(i) = Enc(m, k1 ),

(i)

r(i) = MAC(c(i) , k2 ).

Let C (i) = ([i]U, c(i) , r(i) ) and 0 ā¤ j < p1 be such that j ā” x (mod p1 ). Since

[j]U = [x]U , where x is the secret key, C (j) will decrypt to give the message

m. It is very unlikely that any of the other ciphertexts C (i) (with i = j) will

decrypt to give m. Therefore, with only p1 requests to the decryption oracle,

the attacker can ļ¬nd out the value of the private key x (mod p1 ).

If an attacker does this for a series of primes p1 , p2 , . . . , pk such that

p1 p2 . . . pk ā„ q, then it can recover the whole private key x using the Chi-

nese Remainder Theorem (and it will only need to make p1 + p2 + . . . + pk

requests to the decryption oracle to do this).

Of course this attack only works if the elliptic curve contains a number of

points with small prime orders. However, other variants of this type of attack

exist which involve the attacker changing parts of the public parameters, such

as the public key Y , the group generator P or even the elliptic curve E itself.

These variants may work even if the original attack does not. Details of

methods that can be used to check the validity of elliptic curve points and

elliptic curves themselves can be found in Section I.5.

III.3.3. Variable Length Symmetric Keys. Another implicit assump-

tion that is made in the security proofs is that the keys produced by the key

derivation function are of a ļ¬xed length l. This seems like a trivial point, but

it can actually be very important, especially when one combines this with

some details about the way in which most key derivation functions work.

Most key derivation functions compute KD(U, l) by taking the ļ¬rst l bits of

an inļ¬nite pseudo-random sequence produced from the input U . This means

that, for any 0 < l < l, KD(U, l ) is the same as the ļ¬rst l bits of KD(U, l).

This is not a problem for ECIES providing that we use symmetric keys of a

ļ¬xed length l ā“ it becomes a problem, however, if we allow l to vary.

In particular, some early versions of ECIES allowed the symmetric cipher

to be a variable length Vernam cipher (see Section III.1.3). In this case

encryption is given by:

Algorithm III.1: ECIES Encryption (Weak Version)

A message m, public key Y and the length of the MAC

INPUT:

key l .

OUTPUT: A ciphertext (U, c, r).

1. Choose k āR {1, . . . , q}.

2. U ā [k]G.

III.4. ECIES-KEM 61

T ā [k]Y .

3.

(k1 k2 ) ā KD(T, |m| + l ).

4.

Encrypt the message, c = m ā• k1 .

5.

Compute the MAC on the ciphertext, r ā M AC(c, k2 ).

6.

Output (U, c, r).

7.

Decryption is given in the obvious way.

This variant of ECIES was proven to be insecure by Shoup [305]. Suppose

m = m1 ||m2 is a message such that |m1 | > 0 and |m2 | = l . Suppose further

that the ciphertext C = (U, c, r) is an encryption of m and that c = c1 ||c2 ,

where |c2 | = l . It is easy to see that the ciphertext C = (U, c , r ), where

c = c1 ā• ā and r = MAC(c , c2 ā• m2 ),

is a valid encryption of the message m1 ā• ā for any bit-string ā of length

|m1 |. Hence an attacker can check whether a ciphertext C is the encryption

of a message m or not by checking to see if the shortened ciphertext C is

an encryption of the shortened message m1 ā• ā or not. This would allow an

IND-CCA2 attacker to break ECIES.

III.3.4. Using the x-Coordinate. As has already been mentioned in Sec-

tion I.4, in some implementations of ECIES the key derivation function KD

is not applied to the elliptic curve point T but to the x-coordinate of T .

Whilst this is not really a problem for the cipher, since it is easy to see that

computing the x-coordinate of T is roughly as hard as computing the whole

of T , it is a problem for the security proof.

The problem occurs because the decryption of a ciphertext (U, c, r) in

this case is always the same as the decryption of the ciphertext (ā’U, c, r).

Therefore an attacker with CCA2 access to a decryption oracle can always

decrypt the challenge ciphertext C ā— = (U ā— , cā— , rā— ) by requesting the decryp-

tion of (ā’U ā— , cā— , rā— ). Hence the scheme is not secure against CCA2 attackers.

A cryptosystem for which there is some easy way of computing two diļ¬er-

ent ciphertexts that decrypt to give the same message is said to be benignly

malleable [305].

We can remove this problem by deļ¬ning a binary relation ; between

ciphertexts, where C ; C if (1) D(C, sk) = D(C , sk) and (2) it is easy to

ļ¬nd C from C. In this case the decryption oracle must refuse to decrypt not

only the challenge ciphertext C ā— but any ciphertext C such that C ā— ; C .

These ideas are slowly gaining more acceptance in the academic community

[58].

III.4. ECIES-KEM

One problem with hybrid encryption is that one often ends up proving

the same results again and again. We can use the three security proofs for

ECIES in given Section III.2 as an example: each of these had a diļ¬erent

62 III. PROOFS OF SECURITY FOR ECIES

technique for dealing with the asymmetric (elliptic curve) part of the system,

and yet all three used exactly the same arguments about the symmetric part.

Recently Shoup (see [304] and [93]) introduced the concept of a generic

hybrid construction (sometimes called a KEM/DEM construction). Here the

asymmetric and symmetric parts of a hybrid scheme are separated and each

has its own formal security requirements. This allows us to evaluate their

security individually and to āmix and matchā the asymmetric KEMs and the

symmetric DEMs without needing to produce another long, boring security

proof.

III.4.1. KEMs and DEMs. A KEM/DEM construction is made of two

parts: an asymmetric key encapsulation mechanism (or KEM) and a sym-

metric data encapsulation mechanism (or DEM). A KEM takes as input a

public key and produces a random symmetric key of a predetermined length

and an encryption (or encapsulation) of that key. A DEM takes as input a

message and a symmetric key and produces an encryption of that message

under that key. Once again, it is very important to be precise about how we

deļ¬ne a KEM and a DEM.

Definition III.11. A KEM is a triple of algorithms (G, Encap, Decap), where

ā¢ G is a probabilistic key generation algorithm. It takes no input, except

randomness, and outputs a key-pair (pk, sk). Again pk is the public key

and must be distributed to everyone that uses the encryption algorithm,

and sk is the private key and should only be made available to those

parties who have permission to decrypt messages.

ā¢ Encap is a probabilistic algorithm called the encapsulation algorithm.

It takes only the public key as input, and randomness, and outputs a

symmetric key K of some predetermined length and an encapsulation

C1 of K.

ā¢ Decap is a deterministic algorithm called the decapsulation algorithm.

It takes as input the private key sk and an encapsulation C1 . It returns

either a symmetric key K or an error symbol ā„.

A KEM is sound if for all valid key-pairs (pk, sk) and any output (K, C1 ) =

Encap(pk) we have K = Decap(C1 , sk).

It should be fairly easy to see the similarities between KEMs and public

key encryption schemes. The deļ¬nitions are identical except for the fact that

the encapsulation algorithm does not take a message as input but rather

generates its own (the symmetric key).

Definition III.12. A DEM is a pair of algorithms (Enc, Dec), where

ā¢ Enc is a (possibly) probabilistic algorithm called the encryption algo-

rithm. It takes as input a message m and a symmetric key K of some

predetermined length and outputs an encryption C2 of the message m.

III.4. ECIES-KEM 63

ā¢ Dec is a deterministic algorithm called the decryption algorithm. It

takes as input an encryption C2 and a symmetric key K and returns

either a message m or an error symbol ā„.

A DEM is sound if for all keys K of the correct length and messages m

we have Dec(Dec(m, K), K) = m.

In order to form a public-key encryption scheme one combines a KEM

and a DEM, where the symmetric keys that are outputted by the KEM are

the correct size for use by the DEM, in the following manner. Key generation

is given by the key generation algorithm for the KEM, G.

Algorithm III.2: KEM/DEM Encryption

INPUT: A message m and the public key pk.

OUTPUT: A ciphertext C = (C1 , C2 ).

1. (K, C1 ) ā Encap(pk).

2. C2 ā Enc(m, K).

3. Output C = (C1 , C2 ).

Algorithm III.3: KEM/DEM Decryption

INPUT: A ciphertext C = (C1 , C2 ) and the private key sk.

OUTPUT: A message m or an ā˜ā˜Invalid Ciphertextā™ā™ message.

1. K ā Decap(C1 , sk).

2. If K =ā„ then output ā˜ā˜Invalid Ciphertextā™ā™.

3. m ā Dec(C2 , K).

4. If m =ā„ then output ā˜ā˜Invalid Ciphertextā™ā™.

5. Output m.

III.4.2. Security for KEMs and DEMs. The security criteria for KEMs

and DEMs are actually quite similar, and both are closely related to the IND

game that was used to deļ¬ne the security of a public-key encryption scheme.

Both are concerned with the eļ¬ects of a two-stage attacker A = (A1 , A2 ) who

is playing a game against a challenger. For a KEM the attacker is trying to

tell a real symmetric key from a random symmetric key and for a DEM the

attacker is trying to tell if a particular ciphertext is the encryption of one

message or another.

Definition III.13. For a KEM, the indistinguishability IND game for an

attacker A = (A1 , A2 ) consists of four major steps:

1. A challenger generates a random key-pair (pk, sk) by running the key

generation algorithm G.

64 III. PROOFS OF SECURITY FOR ECIES

2. The attacker runs A1 on the input pk. It returns some state informa-

tion s.

3. The challenger generates a valid symmetric key and its encapsulation

(K0 , C ā— ) by running the encapsulation algorithm Encap(pk). It also

generates a random symmetric key K1 of the same length and chooses

a bit Ļ ā {0, 1} uniformly at random.

4. The attacker runs A2 on the input (KĻ , C ā— , pk, s). It returns a guess

Ļ for Ļ.

The attacker wins the game if Ļ = Ļ. The advantage of an attacker in playing

the IND game is deļ¬ned to be

|P r[Ļ = Ļ] ā’ 1/2| .

Attackers that are playing the IND game for a KEM can have either CPA,

CCA1 or CCA2 access to a decapsulation oracle, i.e., an oracle that runs the

decapsulation algorithm for them. For more details see Deļ¬nition III.4. In

order to guarantee that the overall hybrid scheme is secure against IND-

CCA2 attackers we will require that the KEM be secure against IND-CCA2

attackers too.

Definition III.14. For a DEM, the indistinguishability IND game for an

attacker A = (A1 , A2 ) consists of four major steps:

1. A challenger generates a random symmetric key K of an appropriate

size.

2. The attacker runs A1 . It returns two messages m0 and m1 of equal

size, as well as some state information s.

3. The challenger chooses a bit Ļ ā {0, 1} uniformly at random. It com-

putes the challenge ciphertext C ā— = Enc(mĻ , K).

4. The attacker runs A2 on the input (C ā— , s). It returns a guess Ļ for Ļ.

Again, the attacker wins the game if Ļ = Ļ. The advantage of an attacker in

playing the IND game is deļ¬ned to be

|P r[Ļ = Ļ] ā’ 1/2| .

For our purposes, an attacker that is playing the IND game for a DEM can

either have no access to a decryption oracle, in which case the attacker is said

to be passive (PA), or have complete access to a decryption oracle, in which

case the attacker is said to be active (CCA). Obviously an active attacker is

forbidden from requesting the decryption of the challenge ciphertext from the

decryption oracle. Note that in neither of these two cases can the attacker

encrypt messages under the randomly generated symmetric key K. This is a

major diļ¬erence between the symmetric and asymmetric cases.

One of the main points of this approach is that the security of the two com-

ponents can be evaluated independently, as the following theorem of Shoup

[304] demonstrates.

III.4. ECIES-KEM 65

Theorem III.15. If (G, E, D) is an asymmetric encryption scheme composed

of a KEM that is secure against IND-CCA2 attackers and a DEM that is se-

cure against IND-CCA attackers, then (G, E, D) is secure against IND-CCA2

attackers.

The proof of this theorem is similar to parts of the proofs of security for

ECIES that we have already sketched. Suppose there exists an attacker A

that breaks the hybrid scheme in the IND-CCA2 model with a signiļ¬cant

advantage. We attempt to build an IND-CCA2 attacker B that breaks the

KEM. Either B has a signiļ¬cant advantage or A must be attacking the sym-

metric part of the scheme alone, and in that case Aā™s advantage would be the

same even if we replaced the symmetric keys used to encrypt the challenge

ciphertext with randomly generated symmetric keys. We can therefore use

A to build an IND-CCA attacker C that attacks the DEM with signiļ¬cant

advantage. On the other hand, if both the KEM and the DEM, are secure

then the overall hybrid encryption scheme must be secure too.

There is one note of caution however. Granboulan [153] has shown that

it is vitally important that the KEM and the DEM are strictly independent

of each other, i.e., no information used by the KEM (including any element

of the public key or the system parameters) should be made available to the

DEM, or Theorem III.15 will not hold.2 In other words, if a secure KEM

and a secure DEM share some information, then it is possible that the hybrid

scheme formed from them might not be secure.

III.4.3. ECIES-KEM and ECIES-DEM. Both ECIES-KEM and ECIES-

DEM (both of which are deļ¬ned in Section I.4) meet their required security

levels and have proofs of security to attest to this.

The security proof for ECIES-DEM is the same as the argument used

in Section III.2.1 to show that ECIES is secure if the symmetric keys are

randomly generated. Essentially one can prove that ECIES-DEM is secure

provided the underlying symmetric cipher and MAC scheme are secure. See

Section III.1.3 for more details on the security of symmetric primitives.

As one can imagine, the security proof for ECIES-KEM is a little more

complicated, and again we are going to have to make some kind of simplifying

assumption in the security analysis, just as we had to do for ECIES. It should

be easy to see that we can prove the security of ECIES-KEM using either the

random oracle model or the generic group model (see Sections III.2.2 and

III.2.3). In both models the security proof for ECIES involved demonstrating

that either we could break some underlying mathematical problem or the

symmetric keys were indistinguishable from completely random keys. This

is exactly the condition we need to prove in order to prove the security of

ECIES-KEM.

2

Similarly, it is important that the DEM is chosen before the asymmetric key pair is

generated or else elements of the key-pair may inļ¬‚uence the choice of DEM. This would

also invalidate Theorem III.15.

66 III. PROOFS OF SECURITY FOR ECIES

We also managed to show this based on the assumption that the hash

Diļ¬eā“Hellman problem is diļ¬cult to solve, but this does not help us prove

the security of ECIES-KEM. This is because the hash Diļ¬eā“Hellman problem

is exactly the same as the problem of breaking ECIES-KEM with an IND-

CCA2 attacker. So basing a security proof for ECIES-KEM on the hash

Diļ¬eā“Hellman problem would be rather like proving that ECIES-KEM is

diļ¬cult to break whenever it is diļ¬cult to break ECIES-KEM!

ECIES-KEM also suļ¬ers from all of the problems associated with ECIES

discussed in Section III.3, although the use of the cofactor mode (see Sec-

tion I.4) can help stop the invalid elliptic curve point attacks discussed in

Section III.3.2. Nevertheless, ECIES-KEM has already become a popular

algorithm with both academic cryptographers and software developers and

has already been proposed for inclusion in many important cryptographic

standards.

Part 2

Implementation Techniques

CHAPTER IV

Side-Channel Analysis

E. Oswald

Side-channel analysis (SCA) or information leakage analysis (ILA), refers

to a new and emerging type of cryptanalysis that uses leaked side-channel

information from a cryptographic device to determine the secret key. The

traditional cryptographic model consists of two parties, Alice and Bob, who

wish to communicate secretly over an insecure channel. To protect against

a potential eavesdropper, cryptographic algorithms are used to ensure the

secrecy of the communication. In this traditional model, the security of the

system relies solely on the secrecy of the key and the mathematical properties

of the cryptographic algorithm used.

However, in a more practical scenario the communicating parties are

mostly computing devices, such as personal computers, smart cards or mo-

bile phones. On such devices, a cryptographic algorithm is implemented

and executed whenever a cryptographic protocol requires its computation.

Although these devices operate in potentially hostile environments and are

subject to attacks themselves, they are supposed to protect the secret key.

Furthermore, such devices can act as a source of information. Depending on

the data they process, they might consume diļ¬erent amounts of power, emit

diļ¬erent amounts of electromagnetic emanations, produce diļ¬erent types of

noises, need diļ¬erent running times or output diļ¬erent error messages; see

Figure IV.1. If the device is executing a cryptographic algorithm (which uses

a secret key), the leaked information may be directly related to this secret

key. In certain cases, this additional information can be exploited to deduce

information about the secret key. Hence, the security of the whole system

relies on the security of the algorithms and protocols, and on the security of

their implementation.

This chapter is as organized as follows. The introductory part starts with

a brief section on cryptographic hardware in Section IV.1. Active attacks

assume that an attacker can manipulate a device in some way. Such attacks

are brieļ¬‚y sketched in Section IV.2. In passive attacks, which are treated in

Section IV.3, an attacker monitors the side-channels of a device but does not

manipulate the device himself. We introduce the currently used side-channels

in Section IV.3.1 and explain the concepts of simple side-channel analysis and

diļ¬erential side-channel analysis in Sections IV.3.2 and IV.3.3. We conclude

69

70 IV. SIDE-CHANNEL ANALYSIS

e

im

tion T

Execu Eve

Bob

Electromag

netic Eman

ation

Po

we

ion

rC

nat

ma

on

Alice

So

e

E

su

etic m

un

Ti

m

agn

pt

d

trom on

io ti

Po

c

n Ele u

ec

we

Ex

rC

on

su

mp

ti

on

Figure IV.1. A practical scenario where Alice and Bob are

computing devices and Eve can spy on the communication both

directly and via the emanations produced by the devices.

the introductory part of this chapter with a discussion on the attack scenarios

for elliptic curve cryptosystems in Section IV.3.4.

The second part is devoted to side-channel analysis on ECC. In Sec-

tion IV.4 we discuss the application of simple side-channel analysis on imple-

mentations of the elliptic curve scalar point multiplication. In Section IV.5

we apply diļ¬erential side-channel analysis on implementations of the elliptic

curve scalar point multiplication.

Defences against side-channel analysis are discussed separately, in Chap-

ter V.

IV.1. Cryptographic Hardware

Traditionally, the main task of cryptographic hardware was the acceler-

ation of operations frequently used in cryptosystems or the acceleration of

a complete cryptographic algorithm. Nowadays, hardware devices are also

required to store secret keys securely. Especially in applications involving

digital signatures, the secrecy of the private signing key is vital to the secu-

rity of the whole system. A cryptographic device must make key extraction

impossible. Active attacks that target cryptographic devices are commonly

referred to as tamper attacks.

Tamper resistant devices are supposed to make key extraction impossible,

while tamper evident devices should make the key extraction obvious. An ex-

ample of a highly tamper resistant device is IBMā™s 4758 [IBM CoPro]. It is

the only device evaluated against the highest level of tamper resistance (FIPS

140-1[FIPS 140.1], level 4) at the time of writing. Such high-end devices are

costly and therefore usually used in few applications such as ATM machines.

IV.2. ACTIVE ATTACKS 71

Tamper resistant devices that are used for the mass market today are mostly

smart cards. Smart cards have to be cheap and are therefore restricted in

computing power, memory, and of course, mechanisms to counteract tamper

attacks.

IV.1.1. Smart Cards. A smart card usually consists of a microprocessor

(8-bit for low cost cards and up to 32-bit in the newest generation of smart

cards), memory and some interface. Usually, the interface is a serial interface,

for example in the case of banking cards, but also USBā“interfaces or RFā“

interfaces for contactless smart cards are possible. Smart card commands are

coded in Application Protocol Data Units (APDUs).

For several purposes, authentication commands such as internal authenti-

cate or external authenticate are usually available for smart cards with cryp-

tographic functions. The internal authenticate command usually takes some

input data and computes some sort of signature or decryption using a secret

or private key. The external authenticate command even allows the supply of

the key for the cryptographic operation. These two commands are commonly

used for attacks in practice.

IV.2. Active Attacks

Active attacks or tamper attacks have a long history in the ļ¬eld of cryp-

tography. The term active implies that an attacker actively manipulates the

cryptographic device.

IV.2.1. Simple Attacks on Smart Cards. It is relatively simple to ex-

tract the microprocessor from a given smart card. Then, the microprocessor

can be depackaged and placed into a more suitable medium in order to per-

form some experiments.

Various simple types of tamper attacks have been popular for smart cards.

The very ļ¬rst generations of smart cards used an external programming volt-

age to read and write contents to its memory. By cutting of this external

programming voltage, an attacker could freeze the contents of the memory.

Manufacturersā™ test circuits led to other attacks on smart cards. Such test

circuits are used during the testing phase after the fabrication of the smart

card. After the testing phase, the test circuits are disconnected from the

microprocessor. An attacker has to ļ¬nd and repair the disconnected wires to

use the test circuit.

Most of these simpler attacks on smart cards were not only active at-

tacks but were also intrusive. In modern smart cards, the components of

the processor are covered by a protective mesh. This mesh reports when it

is damaged, at which point the smart card destroys its memory contents or

stops functioning.

72 IV. SIDE-CHANNEL ANALYSIS

IV.2.2. Fault Attacks. Here, an attacker tries to induce faulty signals in

the power input (or clock input) of the smart card. Often, this forces a change

in the data in one of the registers. As a consequence, the computation pro-

ceeds with the faulty data. Alternatively, an attacker might try to supply

faulty data directly (see also Section III.3.2). Fault attacks on implementa-

tions of elliptic curve cryptosystems have been introduced in [26] and were

also discussed in [78]. The main idea behind the attacks in this chapter is

the following. By inserting a fault in some register, we force the device to

execute the point multiplication algorithm with values that are not a point on

the given curve but on some other curve instead. This other curve is probably

a cryptographically weaker curve that can be used to calculate the private

key k more easily. As a consequence, a device has to check its input and its

output values for correctness.

IV.3. Passive Attacks

Passive attacks were only recognized in the cryptographic community as

a major threat in 1996, when Paul Kocher published the ļ¬rst article about

timing attacks [205]. In a passive attack, the attacker only uses the standard

functionality of the cryptographic device. In the case of smart cards, these

are the internal and the external authenticate commands. The outputs of

the device are then used for the attack. A device can have diļ¬erent types

of outputs. If such outputs unintentionally deliver information (about the

secret key), then the outputs deliver side-channel information and we call

them side-channel outputs (or side-channels). In the next sections, we present

side-channels that have been exploited in attacks so far.

IV.3.1. Types of Side-Channels. The ļ¬rst oļ¬cial information related to

a side-channel analysis dates back to 1956. Peter Wright reported in [349]

that MI5, the British intelligence agency, was stuck in their eļ¬orts to break

an encryption machine in the Egyptian Embassy in London. The encryption

machine was a rotor machine of the Hagelin type. Breaking the machineā™s

encryption exceeded their computational capabilities. Wright suggested plac-

ing a microphone near that machine to spy on the click-sound the machine

produced. Wright discovered that the sound of the clicks allows the attacker

to determine the position of some of the rotors. This additional information

reduced the computation eļ¬ort to break the cipher, and MI5 was able to spy

on the communication for years.

Timing Leakage : Timing information about the execution of a crypto-

graphic algorithm was the ļ¬rst side-channel that was brought to the atten-

tion of the cryptographic community by Kocher. In [205] he introduced a

technique to exploit even very small timing variations in an attack. Timing

variations often occur because of data-dependent instructions. For example,

IV.3. PASSIVE ATTACKS 73

Evaluation

Scope

GPIB

Data from PC

Probe

Vcc GND

Reset Vpp

Smartcard

Clock IO

Name

Cardnumber

Figure IV.2. This is a typical measurement setup to conduct

power (or EM) measurements. A personal computer is used

to operate the smart card and is connected with the digital

scope. The scope measures the power consumption on the GND

contact with a probe. A second probe, which is connected to

the IOā“contact, is used as a trigger. The measurements are

sent from the scope to the personal computer for evaluation.

in the (original) Montgomery multiplication [254], a ļ¬nal subtraction step is

needed only if the result is exceeding the modulus.

In order to exploit timing information, very precise timing measurements

need to be made.

Power Consumption Leakage : Nowadays, almost all smart card proces-

sors are implemented in CMOS (Complementary Metal-Oxide Silicon) tech-

nology. The dominating factor for the power consumption of a CMOS gate

is the dynamic power consumption [348]. Two types of power consumption

leakage can be observed. The transition count leakage is related to the num-

ber of bits that change their state at a time. The Hamming weight leakage is

related to the number of 1-bits, being processed at a time. The internal cur-

rent ļ¬‚ow of a smart card can be observed on the outside of the smart card by,

for example, putting a small resistor between the ground (GND) of the smart

card and the true ground. The current ļ¬‚owing through the resistor creates a

voltage that can be measured by a digital oscilloscope (see Figure IV.2).

The ļ¬rst practical realization of an attack that makes use of the power

consumption leakage was reported by Kocher et al. in [206]. Since then,

many companies and universities have developed the skills to conduct these

measurements in practice.

74 IV. SIDE-CHANNEL ANALYSIS

Electromagnetic Radiation Leakage : The movement of electrically

charged particles causes electromagnetic leakage. In CMOS technology, cur-

rent ļ¬‚ows whenever the circuit is clocked and gates switch. This current ļ¬‚ow

is visible in the power consumption, and it is also visible as electromagnetic

emanations. There are two main categories for electromagnetic emanations.

Direct emanations result from intentional current ļ¬‚ow. Unintentional ema-

nations are caused by the miniaturization and complexity of modern CMOS

devices.

The ļ¬rst concrete measurements were presented in [139] and [278]. At

the time of writing, it is possible to conduct EM attacks with measurements

taken several meters away from the attacked device [5].

Combining Side-Channels : The combination of several diļ¬erent side-

channels has not been investigated intensively so far. Two sources of side-

channel information that are known to be eļ¬ciently combined are the timing

and the power side-channel. For instance, power measurements can be used

to get precise timing measurements for intermediate operations (if these op-

erations are clearly visible in the power trace). These power measurements

can then be used in timing attacks. This observation has been used in [342]

and [289].

Error Messages : Error-message attacks usually target devices implement-

ing a decryption scheme. It is a common assumption that a device gives a

feedback about whether or not a message could be decrypted successfully. If

an attacker knows the reason why a decryption failed, information about the

secret key can be deduced by sending well-chosen ciphertexts to the device.

These kind of attacks were ļ¬rst introduced by Bleichenbacher in [30].

This article describes a chosen ciphertext attack against the RSA encryp-

tion standard PKCS#1. Other error-message attacks are mentioned in [232]

against RSA-OAEP, [99] and [189] against the NESSIE candidate EPOC-2

and [330] against the padding and subsequent encryption in CBC mode as it

is done in various standards.

IV.3.2. Simple Side-Channel Analysis. In simple side-channel analysis,

an attacker uses the side-channel information from one measurement directly

to determine (parts of) the secret key. Therefore, the side-channel informa-

tion related to the attacked instructions (the signal) needs to be larger than

the side-channel information of the unrelated instructions (the noise). In

order to be able to exploit this information, at least some of the executed in-

structions need to be distinguishable by their side-channel trace. In addition,

the attacked instructions need to have a relatively simple relationship with

the secret key.

IV.3. PASSIVE ATTACKS 75

Unknown/

Data

Uncontrolled

Influences hypothetical

Data

Key

Physical Device Model of the

Physical Device

Key

Physical Sideā’Channel Output Hypothetical Sideā’Channel Output

Statistical Analysis

Decision

Figure IV.3. The idea behind diļ¬erential side-channel analy-

sis: an attacker compares the predictions from his hypothetical

model with the side-channel outputs of the real, physical device.

The question, of whether the signal is clearly visible in the side-channel

trace, often depends on the underlying hardware, but also sometimes on the

implementation itself; see Chapter V.

IV.3.3. Diļ¬erential Side-Channel Analysis. When simple side-channel

analysis are not feasible due to too much noise in the measurements, diļ¬eren-

tial side-channel analysis can be tried. Diļ¬erential side-channel analysis uses

many measurements to ļ¬lter out the interfering noise. Simple side-channel

analysis exploits the relationship between the executed instructions and the

side-channel output. Diļ¬erential side-channel analysis exploits the relation-

ship between the processed data and the side-channel output.

In diļ¬erential side-channel analysis, an attacker uses a so-called hypothet-

ical model of the attacked device. The quality of this model is dependent on

the capabilities of the attacker:

Clever Outsiders: They do not have insider knowledge of the system.

They are assumed to have moderately sophisticated equipment.

Knowledgeable Insiders: They are educated in the subject and have

a varying degree of understanding of the attacked system. They have

sophisticated tools to conduct the measurements.

Funded Organizations: They have teams of specialists that may have

access to insider knowledge about the attacked system, and they have

the most advanced analysis tools.

The model is used to predict the side-channel output of a device. Used as

such a predictor, it can output several diļ¬erent values. These can be values

76 IV. SIDE-CHANNEL ANALYSIS

predicting the leakage of one side-channel for several moments in time, but

they can also be values predicting the leakage of diļ¬erent side-channels. In

case only one single output value is used for an attack, then the attack is

called a ļ¬rst-order attack . If two or more output values for the same side-

channel are used in an attack, then the attack is called a second-order attack

or higher-order attack, respectively.

These predictions are compared to the real, measured side-channel output

of the device. Comparisons are performed by applying statistical methods on

the data. Among others, the most popular methods are the distance-of-mean

test and correlation analysis.

For the distance-of-mean test, the model only needs to predict two val-

ues, low and high, for a certain moment of time in the execution. These

two predictions are used to classify the measurement data. All measurements

for which the prediction stays high are put into one set, and all other mea-

surements are put in a second set. Then, the diļ¬erence between the sets is

calculated by computing the diļ¬erence of their mean values.

For a formal description of this test, we use the following notation. Let ti

denote the ith measurement data (i.e., the ith trace) and let tl and th denote

i i

a trace for which the model predicted a low and a high side-channel leakage,

respectively. A set of traces tl , th and ti is then denoted by T l , T h and T and

ii

ĀÆĀÆ ĀÆ

the average trace of such a set by T l , T h and T . Consequently, the diļ¬erence

trace for the distance-of-mean method is calculated by

ĀÆ ĀÆ

D = T l ā’ T h.

Whenever this diļ¬erence is suļ¬ciently large, or has some other recognizable

properties, it can be used to verify or reject the key hypothesis.

For correlation analysis, the model predicts the amount of side-channel

leakage for a certain moment of time in the execution. These predictions are

correlated to the real side-channel output. To measure this correlation, the

Pearson correlation coeļ¬cient can be used. Let pi denote the prediction of

the model for the ith trace and P set of such predictions. Then the Pearson

correlation coeļ¬cient is given as

E(T ā— P ) ā’ E(T ) ā— E(P )

ā’1 ā¤ C(T, P ) ā¤ 1.

C(T, P ) = ,

Var(T ) ā— Var(P )

Here E(T ) denotes the expectation (average) trace of the set of traces T and

Var(T ) denotes the variance of a set of traces T . If this correlation is high,

i.e., close to +1 or ā’1, then the key hypothesis is assumed to be correct.

To summarize and conclude this section, diļ¬erential side-channel analysis

is possible if an attacker can make a hypothesis over a small part of the secret

key to predict the side-channel output with his model.

IV.3.4. Target Operations in Elliptic Curve Cryptosystems. There

are two operations in commonly used elliptic curve cryptosystems that involve

IV.4. SIMPLE SCA ATTACKS ON POINT MULTIPLICATIONS 77

the private key or an ephemeral (secret) key. We brieļ¬‚y discuss the scenarios

under which these operations are attacked.

ā¢ The modular multiplication of a known value and the private key needs

to be computed in ECDSA. If the multiplication is implemented in such

a way that the multiplier is the private key and the multiplication is

carried out with a variant of the binary algorithm, then this implemen-

tation is, in principle, vulnerable to side-channel analysis. However,

this operation can be implemented securely by not using the private

key as a multiplier.

ā¢ The scalar multiplication of a secret value with a known elliptic curve

point, i.e., the point multiplication [ECC, Section IV.2], needs to be

calculated in all elliptic curve cryptosystems. Also this operation is

usually implemented by some version of the binary algorithm. It has

been shown to be vulnerable to simple and diļ¬erential side-channel

analysis.

Probably the most important elliptic curve protocol is ECDSA. From the

previous considerations it is clear that the operation that is diļ¬cult to secure

is the point multiplication operation. It is also well known [262] that not all

bits of the ephemeral ECDSA key need to be known in order to reconstruct

the private key of the ECDSA. As a result, the point multiplication operation

must be implemented to resist in particular simple side-channel analysis.

IV.4. Simple SCA Attacks on Point Multiplications

In the case of a simple SCA attack on an implementation of a point

multiplication algorithm, the adversary is assumed to be able to monitor the

side-channel leakage of one point multiplication, Q = [k]P , where Q and P

are points on an elliptic curve E and k ā Z is a scalar. The attackerā™s goal is

to learn the key k using the information obtained from carefully observing the

side-channel leakage (e.g. power trace) of a point multiplication. Such a point

multiplication consists of a sequence of point addition, point subtraction and

point doubling operations. Each elliptic curve operation itself consists of a

sequence of ļ¬eld operations. In most implementations, the standard sequence

of ļ¬eld operations in point addition diļ¬ers from that in point doubling. Every

ļ¬eld operation has its unique side-channel trace. Hence, the sequence of ļ¬eld

operations of point addition has a diļ¬erent side-channel pattern than that of

point doubling (see Figures IV.4 and IV.5 for an example with power traces).

IV.4.1. Attacking the Basic Double-and-Add Algorithm. It was al-

ready observed in [87] that the implementation of the simplest form of a

double-and-add algorithm, which is the binary algorithm (see Algorithm IV.1),

is an easy target for simple side-channel analysis.

First we note that the conditional branch (i.e., Step 5 in Algorithm IV.1)

only depends on a single bit of k. Hence, an attacker can inspect the power

78 IV. SIDE-CHANNEL ANALYSIS

7

6

5

4

3

mA

2

1

0

ā’1

ā’2

ā’3

0 500 1000 1500 2000 2500 3000 3500 4000 4500 5000 5500 6000 6500 7000

clock cycle

Figure IV.4. A point addition power trace.

7

6

5

4

3

mA

2

1

0

ā’1

ā’2

ā’3

0 500 1000 1500 2000 2500 3000 3500 4000 4500 5000 5500 6000

clock cycle

Figure IV.5. A point doubling power trace.

trace of the point multiplication algorithm to determine k. We have assumed

that it is possible to distinguish the point addition operation from the point

doubling operation in the power trace. Since a point addition operation can

IV.4. SIMPLE SCA ATTACKS ON POINT MULTIPLICATIONS 79

7

6

5

4

3

mA

2

1

0

ā’1

ā’2

0 0.56 1.08 1.63 2.3 2.89 3.56 4.1 4.69

4

x 10

clock cycle

0

0 1 0

1 0

Figure IV.6. The power consumption trace of an elliptic

curve scalar point multiplication operation that was performed

with a simple double-and-add algorithm. The used scalar can

be read directly from this power consumption trace.

only be caused by a 1 in the bit representation of k, deducing the key is trivial

(see Figure IV.6 for an example) in this naive implementation.

Algorithm IV.1: Right-to-Left Binary Algorithm

ā’1

kj 2j , kj ā {0, 1}.

INPUT: Point P and -bit multiplier k = j=0

OUTPUT: Q = [k]P .

1. Q ā P .

2. If k0 = 1 then R ā P else R ā O.

3. For i = 1 to l ā’ 1 do:

Q ā [2]Q.

4.

If (ki = 1) then R ā R + Q.

5.

6. Return R.

More interesting to attack are algorithms that make use of the rich arith-

metic on elliptic curves. This arithmetic allows the use of other representa-

tions of k, instead of only the binary representation. They oļ¬er some inherent

resistance to straightforward simple SCA because they also make use of point

subtraction. Because point addition and point subtraction only diļ¬er slightly,

they can be implemented in such a way that their side-channel patterns look

80 IV. SIDE-CHANNEL ANALYSIS

alike. Therefore, an attacker cannot determine the key bits by just inspecting

the side-channel trace.

IV.4.2. Attacking Double-Add-and-Subtract Algorithms. The task

of an attacker is to deduce some information about k by using the informa-

tion of the side-channel trace. In particular, an attacker wants to determine

and exploit the relationship between the occurrence of certain sequences of

bits and certain sequences of side-channel patterns. Let X be a random vari-

able that denotes the sequence of patterns in the side-channel trace, i.e., an

AD-sequence (for example X=āDDDā or X=āDADā). Let Y be a random

variable that denotes a sequence of patterns in the digit representation of k,

i.e., a 01-sequence (for example Y = 000 or Y = 01). Then an attacker is

interested in exploiting and calculating the conditional probability

Pr(Y = y ā© X = x)

Pr(Y = y|X = x) = (IV.1)

Pr(X = x)

for many diļ¬erent x and y. Such probabilities can be calculated by using

Markov theory [155].

A Markov process, or in case of a ļ¬nite state space also called Markov

chain, is a rather simple statistical process. In a Markov process, the next

state is dependent on the present state but independent of the way in which

the present state arose from the states before. The transitions between the

states are determined by random variables that are either known or have to

be estimated.

Definition IV.1. Let T be a (kĆ—k) matrix with elements tij , 1 ā¤ i, j ā¤ k. A

random process (X0 , X1 , . . . ) with ļ¬nite state space S = {s1 , . . . , sk } is said to

be a Markov chain with transition matrix T if, for all n all i, j ā {1, . . . , k},

and all i0 , . . . , inā’1 ā {1, . . . , k} we have

Pr(Xn+1 = sj |X0 = si0 , . . . , Xn = si ) =

Pr(Xn+1 = sj |Xn = si ) = tij .

A large class of such Markov processes has the two important properties

of being irreducible and aperiodic.

The ļ¬rst property, i.e., that a process is irreducible, means that all states

can be reached from all other states with a ļ¬nite number of steps. The

second property, i.e., that a process is aperiodic, means that all states are

aperiodic. A state is aperiodic if the period is equal to 1, i.e., the probability

of returning to a state is always positive. These two properties are conditions

for the main theorem of Markov theory. Before we state this theorem we

deļ¬ne the stationary distribution ļ¬rst.

Definition IV.2. Let (X0 , X1 , . . . ) be a Markov chain with state space given

by {s1 , . . . , sk } and transition matrix T . A row vector Ļ = (Ļ1 , . . . , Ļk ) is said

IV.4. SIMPLE SCA ATTACKS ON POINT MULTIPLICATIONS 81

to be a stationary distribution for the Markov chain if it satisļ¬es

n

Ļi ā„ 0 āi, and Ļi = 1, and (IV.2)

i=1

ĻT = Ļ. (IV.3)

Theorem IV.3. For any irreducible and aperiodic Markov chain, there exists

a unique stationary distribution Ļ, and any distribution Āµn of the chain at

time n approaches Ļ as n ā’ ā, regardless of the initial distribution Āµ0 .

This theorem states that, for Markov processes having the properties of

being aperiodic and irreducible, a steady state always exists. By using the

transition matrix T and the steady-state vector, we can calculate the condi-

tional probabilities (IV.1).

Example: Consider a simple double-add-and-subtract algorithm as shown

in Algorithm IV.2.

Algorithm IV.2: Double-Add-and-Subtract Algorithm

ā’1

INPUT: Point P and -bit multiplier k = j=0 kj 2j , kj ā {0, 1}.

OUTPUT: Q = [k]P .

1. R0 ā O, R1 ā P, s ā 0.

2. For i = 0 to l ā’ 1 do:

If ki = 0 then

3.

If s = 11 then R0 ā R0 + R1 .

4.

s ā 0, R1 ā [2]R1 .

5.

If ki = 1 then

6.

If s = 0 then R0 ā R0 + R1 , R1 ā [2]R1 , s ā 1.

7.

If s = 1 then R0 ā R0 ā’ R1 , R1 ā [2]R1 , s ā 11.

8.

If s = 11 then R1 ā [2]R1 .

9.

10. If s = 11 then R0 ā R0 + R1 .

11. Return R0

There are three diļ¬erent states s in this algorithm. The initial state is

always 0. Under the assumption that Pr(ki = 0) = Pr(ki = 1) = 1/2, the

transition matrix for this algorithm is

ļ£« ļ£¶

0.5 0.5 0.5

T = ļ£0.5 0 0 ļ£ø. (IV.4)

0 0.5 0.5

From the transition matrix the steady-state vector, which is (1/2, 1/4, 1/4),

can be calculated. The number of elliptic curve operations that are induced

by an -bit number can be calculated as well. In each state, a point doubling

has to be calculated. Hence, there are point doublings. In addition, in half

82 IV. SIDE-CHANNEL ANALYSIS

of the cases in each state, a point addition has to be calculated. Hence, there

are (1/4 + 1/8 + 1/8) point additions. In total, 3/2 elliptic curve operations

are calculated.

A hidden Markov process is a Markov process in which we can only observe

a sequence of emissions (the AD-sequence), but we do not know the sequence

of states, which are related to the key bits (the 01-sequence) the process went

through. A hidden Markov model can be characterized by the quintuple

(S, O, T, E, s0 ). The notation used here is similar to the one used before:

the transition matrix is denoted by T and the ļ¬nite set of states is called

S. The emissions are denoted by the set O. The emission matrix E is a

|S| Ć— |O| matrix and contains the conditional probability that an emission

symbol of the set E was produced in state S: Eij = Pr(Oj |Si ). The initial

state distribution is denoted by s0 . Given a (hidden) Markov model, the

task of the attacker is to ļ¬nd for a given AD-sequence, a corresponding state

sequence that explains the given AD-sequence best. One approach to tackle

this problem is to choose sub-sequences that are individually most likely, i.e.,

which have the highest conditional probabilities [269].

Example: Suppose we use Algorithm IV.2 to perform a point multiplication

with the scalar k = 560623. As a ļ¬rst step, we calculate suļ¬ciently many

conditional probabilities (see Table IV.1) by using the Markov model that we

derived in the previous example.

Table IV.1. Non-zero conditional probabilities. In this table

we use an abbreviated notation, i.e., we write p(000|DDD)

instead of p(Y = 000|X = DDD). We use the LSB ļ¬rst repre-

sentation.

Pr(000|DDD) = 1/2 Pr(01|DAD) = 1/2 Pr(11|ADAD) = 1/2

Pr(100|DDD) = 1/4 Pr(10|DAD) = 1/4 Pr(10|ADAD) = 1/4

Pr(111|DDD) = 1/4 Pr(11|DAD) = 1/4 Pr(01|ADAD) = 1/4

Pr(001|DDAD) = 1/2 Pr(000|ADDD) = 1/4 Pr(110|ADADAD) = 1/2

Pr(101|DDAD) = 1/4 Pr(100|ADDD) = 1/2 Pr(101|ADADAD) = 1/4

Pr(110|DDAD) = 1/4 Pr(111|ADDD) = 1/4 Pr(011|ADADAD) = 1/4

Table IV.2 shows how the remainder of the attack works. The ļ¬rst row

contains the AD-sequence that we deduced from the power trace. In the

second row, this sequence is split into sub-sequences. In a practical attack,

the length of the sub-sequences will depend on the computational capabilities

of the attacker. However, it will be much larger than in this toy example. In

the third, fourth and ļ¬fth rows, the possible bit-patterns are listed according

to their conditional probabilities.

IV.4. SIMPLE SCA ATTACKS ON POINT MULTIPLICATIONS 83

Table IV.2. k = 11110111101100010001, LSB First Representation

ADADDDADADADDDADADADADDDADDDDAD

ADAD DDAD ADAD DDAD ADAD ADDD ADDD DAD

11 001 11 001 11 100 100 01

10 101 10 101 10 000 000 10

01 110 01 110 01 111 111 11

We are interested in the number of possible scalar values that have to be

tested before the correct k is found. Let denote the number of bits of k and

n the average length of the sub-sequences. In the worst case, if we only take

the non-zero conditional probabilities into account, but not their individual

values, we have to test 0.5 Ć— 33 /2n keys on average. Let m be the number

of sub-sequences, i.e., m = 3 /2n in this example, and g(x, y) = x 2xā’y . If

y

we take the individual conditional probabilities into account, this reduces to

0.5m + 0.5 i=0 0.5i 0.25mā’i 1 + g(m, i) + 2 m

mā’1

j=i+1 g(m, j) g(m, i) keys on

average [270].

The approach that we used in the previous example was to choose sub-

sequences that are individually most likely. This strategy guarantees us ļ¬nd-

ing the key k; however, it does not take sequences of sub-sequences into

account. Another widely used strategy is to ļ¬nd the single best path. A well

known technique for this approach is the Viterbi algorithm and has been used

successfully to attack double-add-and-subtract algorithms [195]. In case of a

randomized version of the algorithm that was discussed in the previous exam-

ple, the Viterbi algorithm determines approximately 88% of all intermediate

states correctly. The remaining and unknown states can be determined by a

meet-in-the-middle technique that is sketched in the following section.

Improvements : Assume that the point Q is given by Q = [k]P , where

k = xb + y denotes the private key k and P the public base-point. Then we

know that for the correct values of x, b and y the equation Q ā’ [y]P = [xb]P

must hold. By computing and comparing the values of the left- and the

right-hand sides of this equation, we can determine k. Hence, if k is almost

determined, the remaining bits can be guessed and veriļ¬ed by checking the

previous equation. This approximately halves the search space for k.

Other Attack Scenarios : Suppose an attacker has the ability to force

a device to use the same scalar k for several point multiplication operations.

Due to the randomization, the same scalar will produce diļ¬erent power traces

that all give information about the used scalar. Combining the information

gained from several traces, a scalar value can usually be determined with very

few (approximately 10) measurements [265], [340].

84 IV. SIDE-CHANNEL ANALYSIS

IV.4.3. The Doubling Attack. This attack relies on the fact that similar

intermediate values are manipulated when working with a point P and its

double [2]P [122]. We assume that an attacker can identify point doubling

operations with identical data in two power traces.

Algorithm IV.3: Double-and-Add-Always Algorithm

ā’1

kj 2j , kj ā {0, 1}.

INPUT: Point P and -bit multiplier k = j=0

OUTPUT: Q = [k]P .

1. R0 ā P .

2. For j = ā’ 2 to 0 by ā’1 do:

R0 ā [2]R0 .

3.

R1ā’kj ā R1ā’kj + P .

4.

5. Return R0 .

Algorithm IV.3 is secure against simple SCA on a ļ¬rst glance as in each

iteration a point doubling and a point addition operation is executed. The

i=j

value of the variable R0 after j+1 iterations is Qj (P ) = [ i=0 2jā’i ]P . Rewrit-

ing this expression in terms of [2]P leads to Qj (P ) = Qjā’1 ([2]P ) + [d ā’j ]P .

Thus, the intermediate result of the algorithm with input P (which is stored

in R0 ) at step j is equal to the intermediate result of the algorithm with input

[2]P at step j ā’ 1 if and only if d ā’j is zero. Hence, we just need to compare

the doubling operation at step j + 1 for P and at step j for [2]P to recover

the bit d ā’j .

IV.5. Diļ¬erential SCA Attacks on Point Multiplications

The ļ¬rst article related to side-channel analysis on public-key cryptosys-

tems was Messerges et al.[245]. Shortly afterwards, Coron published an ar-

ticle speciļ¬cally related towards implementations of elliptic curve cryptosys-

tems [87]. Goubin published an article wherein some of Coronā™s proposed

countermeasures were counteracted by a reļ¬ned attack [151], which was anal-

ysed further in [313].

IV.5.1. Diļ¬erential SCA Without any Knowledge of the Implemen-

tation. Two diļ¬erential SCA attacks have been published which assume no

knowledge about the attacked device.

The SEMD (Single-Exponent Multiple-Data) attack assumes that an at-

tacker can perform the point multiplication operation with the secret value k

and some public value r with several elliptic curve points. Then, for both the

secret and the public values, the conducted measurements, tki and tri , are

averaged. The two average traces are subtracted from each other to produce a

diļ¬erence trace. Whenever this diļ¬erence trace shows a peak, the two values

k and r must have been diļ¬erent as well:

D = TĀÆk ā’ T r.

ĀÆ

IV.5. DIFFERENTIAL SCA ATTACKS ON POINT MULTIPLICATIONS 85

Thus, by comparing the power traces from a known and an unknown scalar

an attacker can learn the unknown scalar.

The MESD (Multiple-Exponent Single-Data) attack works similarly. In

this scenario it is assumed that the attacker can feed the device with multiple

scalar values and one single elliptic curve point. Another assumption is that

the attacker is allowed to perform the point multiplication with the secret

scalar value and the same elliptic curve point several times. The measure-

ments for the secret scalar, tsi , are averaged to get rid of noise. Then, the

attacker determines the bits of the secret value step by step. For each choice

of a new bit, r0 or r1, the attacker calculates how well the choice ļ¬ts the

secret exponent:

ĀÆ

D1 = T s ā’ tr1

ĀÆ

D0 = T s ā’ tr0.

Whichever diļ¬erence trace is zero for a longer time corresponds to the correct

bit.

IV.5.2. Diļ¬erential SCA with Knowledge of the Implementation.

The two other published attacks work according to our general introduc-

tion in Section IV.3.3. The ZEMD (Zero-Exponent Multiple-Data) uses as a

statistical method the distance-of-mean test. The outputs of the model are

therefore a simple classiļ¬cation of the measurement data into two sets. For

one set, the model predicts a high side-channel leakage for a certain moment

in time, T h , while for the other set a low side-channel leakage T l is predicted.

Only if the model predicted the side-channel behaviour correctly, is the clas-

siļ¬cation into the two diļ¬erent sets meaningful and thus a diļ¬erence between

the sets must be visible:

ĀÆ ĀÆ

D = T h ā’ T l.

Coronā™s attack is essentially the same as the ZEMD attack, but it uses

the correlation coeļ¬cient instead of the distance-of-mean test. Both attacks

assume that an attacker knows the internal representation of the elliptic curve

points and that a prediction of the side-channel output is possible.

IV.5.3. Diļ¬erential SCA Using Address Information. As explained

in Section IV.3.3, an attacker tries to include as much information as pos-

sible in her hypothetical model. This information is usually related to the

intermediate data, but it can also be related to the intermediate addresses.

In some devices, loading data from a speciļ¬c address, i.e., performing data =

R[Address], is composed into two stages. First, the physical address is de-

termined given R[Address] and then the data that is stored in this address

are loaded. As addresses also travel over a bus (as data travel over a bus),

diļ¬erent addresses can be expected to leak diļ¬erent information. In [175],

this assumption has been veriļ¬ed for a certain device and an attack based

on it is presented. This attack breaks implementations of EC schemes on

86 IV. SIDE-CHANNEL ANALYSIS

that device, which resist typical diļ¬erential SCA attacks based on predicting

intermediate data by using a Montgomery point multiplication algorithm (see

Algorithm V.2). In Step 4 of this algorithm, data is fetched from a partic-

ular address that depends on a particular bit of the key. Hence, when for

example performing an SEMD attack, the attacker should see a peak in the

side-channel trace whenever two key bits diļ¬er.

IV.5.4. Reļ¬ned Diļ¬erential SCA. If projective coordinates are used in

an implementation, they can be used to randomize the intermediate data;

diļ¬erential SCA as described in the previous sections cannot be used di-

rectly. However, under the assumption that the multiplier is not randomized,

exploiting the properties of so-called special points can lead to an attack;

see [151]. A special point P0 = O is a point having the property that one

of the aļ¬ne or projective coordinates is 0. Hence, randomization does not

aļ¬ect this property.

Suppose the attacker already knows the highest bits d ā’1 , . . . , dj+1 of k. To

attack the next bit dj in Algorithm IV.3, the attacker needs to feed suitable

multiples P1 of P0 into the algorithm. Such a multiple is calculated based on

the (guessed) key bit dj . If this guess is correct, then this multiple is nicely

related to the content of the registers in the (j + 1)th step: If we denote the

value of R1 in the (j + 1)th step by [x]P , then, if dj is correct, the special

point is [xā’1 ]P (we have to assume that x is relatively prime to the order of

P ). Hence, in the (j + 1)th step of the algorithm with input P1 , the point

[xxā’1 ]P0 = P0 is processed. The special property of this point can then be

picked up from several side-channel traces by averaging over the traces.

In [6], an extension to [151] is described. This extended attack is based on

the observation that, even if a point does not have a zero coordinate, some of

the intermediate values that occur during point addition (or point doubling)

might become zero.

CHAPTER V

Defences Against Side-Channel Analysis

M. Joye

V.1. Introduction

This chapter is aimed at studying the resistance of elliptic curve cryp-

tosystems against side-channel analysis. The main operation for elliptic curve

cryptosystems is the point multiplication: Q = [k]P , where the multiplier k is

generally a secret (or private) parameter. In particular, this chapter surveys

various software strategies to prevent side-channel attacks in the computation

of Q = [k]P .

Given the rich mathematical structure of elliptic curves, an endless num-

ber of new countermeasures can be imagined. More importantly, as we will

see, eļ¬cient countermeasures (i.e., with almost no impact on the running time

and/or the memory requirements) are known. The advantage is clearly on

the end-user side (not on the attacker side). We must also keep in mind that

present-day cryptographic devices come equipped with a variety of hardware

countermeasures, already making harder (or impossible) side-channel attacks.

As given in textbooks, the formulĆ¦ for doubling a point or for adding

two (distinct) points on a WeierstraĆ elliptic curve are diļ¬erent. So, for

example, from the distinction between the two operations, a simple power

analysis (i.e., a simple side-channel analysis using power consumption as a

side-channel; cf. Section IV.4) might produce diļ¬erent power traces, revealing

the value of k in the point multiplication algorithm. There are basically three

approaches to circumvent the leakage. This can be achieved by:

1. unifying the addition formulĆ¦ [51, 50] or considering alternative pa-

rameterizations [188, 220, 27];

2. inserting dummy instructions or operations [87, 76];

3. using algorithms that already behave āregularlyā [225, 264, 253, 51,

178, 120].

Even if a point multiplication algorithm is protected against simple side-

channel analysis, it may succumb to the more sophisticated diļ¬erential side-

channel analysis [87, 206] (cf. Section IV.5). Practically, we note that, except

ECIES, very few elliptic curve cryptosystems are susceptible to such attacks

as, usually, the input point is imposed by the system and the multiplier is an

ephemeral parameter, varying at each execution.

87

88 V. DEFENCES AGAINST SIDE-CHANNEL ANALYSIS

In order to thwart diļ¬erential side-channel analysis, the inputs of the

point multiplication algorithm, namely, the base-point P and the multiplier

k, should be randomized [87, 190].

The insertion of random decisions during the execution of the point multi-

plication algorithm also helps in preventing side-channel analysis [271, 156].

Of course, several countermeasures can be combined for providing a higher

security level.

V.2. Indistinguishable Point Addition FormulĆ¦

We start by recalling the explicit group law on elliptic curves (see [ECC,

Section III.2]). Let E denote an elliptic curve given by the WeierstraĆ form

E : y 2 + a1 xy + a3 y = x3 + a2 x2 + a4 x + a6 (V.1)

and let P1 = (x1 , y1 ) and P2 = (x2 , y2 ) denote points on the curve. The

inverse of P1 is ā’P1 = (x1 , ā’y1 ā’ a1 x1 ā’ a3 ). If P1 = ā’P2 , then the sum

P3 = P1 + P2 = (x3 , y3 ) is deļ¬ned as

x3 = Ī»2 + a1 Ī» ā’ a2 ā’ x1 ā’ x2 , y3 = Ī»(x1 ā’ x3 ) ā’ y1 ā’ a1 x3 ā’ a3 (V.2)

ļ£±

with

ļ£“ y1 ā’ y2

ļ£“ when x1 = x2 ,

ļ£²x ā’ x

1 2

Ī»= (V.3)

ļ£“ 3x2 + 2a2 x1 + a4 ā’ a1 y1

ļ£“1

ļ£³ when x1 = x2 .

2y1 + a1 x1 + a3

From Eq. (V.3), it clearly appears that the formulĆ¦ for adding two (dis-

tinct) points and for doubling a point are diļ¬erent.

This section discusses various ways for making indistinguishable the addi-

tion formulĆ¦ on elliptic curves so that a side-channel analysis cannot reveal

what kind of addition is being processed.

V.2.1. Uniļ¬ed Addition FormulĆ¦. In [51], Brier and Joye observed that

the expressions of slope Ī» in Eq. (V.3) can be uniļ¬ed for adding or doubling

points. When P1 = Ā±P2 and provided that P1 and ā’P2 do not have the

same y-coordinate (i.e., provided that y1 = ā’y2 ā’ a1 x2 ā’ a3 ), we can write

Ī» = x1 ā’y2 = x1 ā’y2 Ā· y1 +y2 +a1 x2 +a3 and so obtain after a little algebra

y y

1 ā’x2 1 ā’x2 y1 +y2 +a1 x2 +a3

x2 + x1 x2 + x2 + a2 x1 + a2 x2 + a4 ā’ a1 y1

1 2

Ī»= . (V.4)

y1 + y2 + a1 x2 + a3

Remarkably, this expression for Ī» remains valid when P1 = P2 and can thus be

used for doubling a point. Indeed, replacing (x2 , y2 ) by (x1 , y1 ) in Eq. (V.4),

3x2 +2a2 x +a4 ā’a y

we get Ī» = 1 2y1 +a1 x1 +a3 1 1 .

1

To include the uncovered case y(P1 ) = y(ā’P2 ), we can extend the previous

expression to

(x2 + x1 x2 + x2 + a2 x1 + a2 x2 + a4 ā’ a1 y1 ) + (y1 ā’ y2 )

Ī»= 1 2

(y1 + y2 + a1 x2 + a3 ) + (x1 ā’ x2 )

V.2. INDISTINGUISHABLE POINT ADDITION FORMULĆ 89

for any ā K [50]. For example, when char(K) = 2, we can set = 1 if

y1 + y2 + a1 x2 + a3 + x1 ā’ x2 = 0 and = ā’1 otherwise; when char(K) = 2, we

can respectively set = 1 and = 0. Another strategy consists in randomly

choosing ā K. The extended expressions for Ī» prevent the exceptional

procedure attacks of [179].

Two main classes of elliptic curves are used in cryptography: elliptic

curves over large prime ļ¬elds and elliptic curves over binary ļ¬elds, i.e., ļ¬elds

of characteristic two. Avoiding supersingular curves, an elliptic curve over

F2n can be expressed by the short WeierstraĆ form

Ea2 ,a6 : y 2 + xy = x3 + a2 x2 + a6 .

From Eqs. (V.4) and (V.2), we see that the addition of two points requires

one inversion, four multiplies, and one squaring. Note that the multiplication

(x1 + x2 )(x1 + x2 + a2 ) can also be evaluated as (x1 + x2 )2 + a2 (x1 + x2 ).

Over the large prime ļ¬eld Fp (with p > 3), the WeierstraĆ equation sim-

pliļ¬es to

Ea4 ,a6 : y 2 = x3 + a4 x + a6 .

The addition of two points then requires one inversion, three multiplies, and

two squarings. As inversion in a ļ¬eld of large characteristic is a relatively

costly operation, projective representations of points are preferred [102]. A

projective point P = (XP : YP : ZP ) on the curve satisļ¬es the (projective)

WeierstraĆ equation

Ea4 ,a6 : Y 2 Z = X 3 + a4 XZ 2 + a6 Z 3

and, when ZP = 0, corresponds to the aļ¬ne point P = (XP /ZP , YP /ZP ).

The notation (X : Y : Z) represents equivalent classes: (X : Y : Z) and (X :

Y : Z ) are equivalent projective representations if and only if there exists a

non-zero ļ¬eld element Īø such that X = ĪøX , Y = ĪøY , and Z = ĪøZ . The

only point at inļ¬nity (i.e., with Z = 0) is the identity element O = (0 : 1 : 0),

and the inverse of P1 = (X1 : Y1 : Z1 ) is ā’P1 = (X1 : ā’Y1 : Z1 ). The sum of

two points P1 = (X1 : Y1 : Z1 ) and P2 = (X2 : Y2 : Z2 ) (with P1 , P2 = O and

P1 = ā’P2 ) is given by P3 = (X3 : Y3 : Z3 ) with

X3 = 2F W, Y3 = R(G ā’ 2W ) ā’ L2 , Z3 = 2F 3 ,

where U1 = X1 Z2 , U2 = X2 Z1 , S1 = Y1 Z2 , S2 = Y2 Z1 , Z = Z1 Z2 , T = U1 +U2 ,

M = S1 + S2 , R = T 2 ā’ U1 U2 + a4 Z 2 , F = ZM , L = M F , G = T L,

and W = R2 ā’ G. Therefore, adding two points with the uniļ¬ed formulĆ¦

(Eq. (V.4)) requires 17 multiplies plus 1 multiplication by constant a4 . When

a4 = ā’1, we may write R = (T ā’ Z)(T + Z) ā’ U1 U2 and the number of

multiplies decreases to 16 [51].

90 V. DEFENCES AGAINST SIDE-CHANNEL ANALYSIS

V.2.2. Other Parameterizations. Every elliptic curve is isomorphic to a

WeierstraĆ form. Parameterizations other than the WeierstraĆ form may lead

to faster uniļ¬ed point addition formulĆ¦. This was independently suggested

by Joye and Quisquater [188] and by Liardet and Smart [220]. We illustrate

the topic with elliptic curves deļ¬ned over a large prime ļ¬eld Fp , p > 3.

V.2.2.1. Hessian Form. Most elliptic curves containing a copy of Z/3Z

(and thus with a group order a multiple of 3) can be expressed by the (pro-

jective) Hessian form [309]1

H : X 3 + Y 3 + Z 3 = 3DXY Z . (V.5)

Let P1 = (X1 : Y1 : Z1 ) and P2 = (X2 : Y2 : Z2 ) denote points on the Hessian

curve H. The inverse of P1 is ā’P1 = (Y1 : X1 : Z1 ), and the sum P3 = P1 + P2

is deļ¬ned as (X3 : Y3 : Z3 ) with

X3 = X2 Y12 Z2 ā’ X1 Y22 Z1 , Y3 = X1 Y2 Z2 ā’ X2 Y1 Z1 , Z3 = X2 Y2 Z1 ā’ X1 Y1 Z2

2 2 2 2

ńņš. 3 |