. 3
( 10)


III.3. Other Attacks Against ECIES
One of the problems with security proofs is that they tend to cause a
certain level of complacency about using a cipher. The truth is that a se-
curity proof is only valid in a very precise set of circumstances. It assumes
that the attacker has access to no other information besides the public key
and (possibly) a decryption oracle, and has a perfect implementation of the
cryptosystem. In the real world an attacker may have access to much more
information than the theoretical attacker did in the security proof. Attacks
that make use of information or abilities that are not available to an attacker
in a security proof are often called side-channel attacks.
Some side-channel attacks are only relevant to one particular cryptosys-
tem, but others can be used to attack whole classes of ciphers. Elliptic curve
cryptosystems are often vulnerable to side-channel attacks, and this will be
discussed more thoroughly in Chapter IV. For the moment we will ignore
these general attacks and focus on those attacks that are speci¬c to ECIES.

III.3.1. Weak Key Generation Algorithm. On one hand it is easy to
see how a ¬‚awed key generation algorithm can reduce the security of a cryp-
tosystem. It is not di¬cult to understand how an attacker might be able
to break a scheme if, for example, the private key x was always a particular
number or had a particular form. However, there is a more subtle reason why
key generation is an important part of a cryptosystem, and this has particular
relevance when talking about security proofs.
It is very important that the key generation algorithm has the ability to
produce lots of di¬erent key-pairs and each with equal probability. This may
seem a bit contradictory. After all, an attacker will only get to attack the
scheme after the key-pair has been chosen, so why should it matter whether
the key generation algorithm produces one key or lots and lots?
Well, for a start we have assumed that the attacker knows all the ele-
ments of the cryptosystem including the key generation algorithm. If the key
generation algorithm only ever produces one key, or a very small number of
keys, then it is not hard for the attacker to run the key generation algorithm,
¬nd out the private key and break the scheme. The obvious solution to this
problem is to keep the key generation algorithm hidden from the attacker, as
is often the case with real-life instantiations of cryptosystems.

It turns out that this is not a useful thing to do, but for quite a subtle
reason. The reason is that the key generation algorithm is a ¬xed algorithm
that doesn™t depend on any secret values, so there are always attackers that
assume something about the key generation algorithm, such as that it al-
ways produces one particular key-pair, and attempt attacks based on this
assumption. If the key generation algorithm produces a large enough number
of di¬erent key-pairs, then these attackers will not have a signi¬cant success
probability as it is very unlikely that a key-pair that is generated by G will
satisfy the assumption that the attacker makes, but this could be a problem
for key generation algorithm with small ranges.
It is possibly best to think about this in terms of a simple example. If sk
is some ¬xed private key for an asymmetric encryption scheme (G, E, D), then
there is an attacker A who will try to decrypt a challenge ciphertext using
the function D(·, sk) regardless of what the public key is or whether it knows
any information about the key generation algorithm. This means that, even
if G is unknown, A will still have a high success probability if G outputs the
key-pair (pk, sk) with a signi¬cant probability. Since we are forced to consider
all possible attackers for the scheme (G, E, D), we are forced to conclude that
(G, E, D) is a weak encryption algorithm because an attacker exists that will
break the scheme with high probability even though we do not know what
that attacker is. On the other hand, if G only ever produces (pk, sk) with an
insigni¬cant probability, i.e., G has the ability to output lots of di¬erent keys
each with equal probability, then (G, E, D) is secure because the attacker A
will not work in all but an insigni¬cant fraction of the possible cases.
This demonstrates another weakness of security proofs. Whilst a security
proof tells us about the security of a scheme in general, it doesn™t tell us about
how the scheme works with one particular key-pair. It is entirely possible for a
scheme to have a good security proof and yet be completely insecure for some
set of keys. The best that a security proof can say is that if one generates a
key-pair using the key generation algorithm, then the probability that that
scheme will be insecure is insigni¬cant; it doesn™t say anything about the
suitability of any one speci¬c key-pair that might actually be in use.

III.3.2. Invalid Elliptic Curve Point Attacks. Another assumption that
has been made throughout this section is that all the data are valid. In real
implementations of elliptic curve cryptosystems, elliptic curve points are usu-
ally represented as elements of Fq — Fq or Fq by using some form of point
compression. Whenever an attacker has requested the decryption of a cipher-
text (U, c, r), we have always implicitly assumed that U is a valid point on
the elliptic curve E and an element of the subgroup generated by P . If this is
not explicitly checked, then an attacker can exploit this to break the scheme.
As an example, consider the following simple attack. The attacker ¬nds
a point U on E with small prime order p1 , chooses any message m ∈ M and,

for 0 ¤ i < p1 , computes
(i) (i)
(k1 ||k2 ) = KD([i]U, l),
c(i) = Enc(m, k1 ),
r(i) = MAC(c(i) , k2 ).
Let C (i) = ([i]U, c(i) , r(i) ) and 0 ¤ j < p1 be such that j ≡ x (mod p1 ). Since
[j]U = [x]U , where x is the secret key, C (j) will decrypt to give the message
m. It is very unlikely that any of the other ciphertexts C (i) (with i = j) will
decrypt to give m. Therefore, with only p1 requests to the decryption oracle,
the attacker can ¬nd out the value of the private key x (mod p1 ).
If an attacker does this for a series of primes p1 , p2 , . . . , pk such that
p1 p2 . . . pk ≥ q, then it can recover the whole private key x using the Chi-
nese Remainder Theorem (and it will only need to make p1 + p2 + . . . + pk
requests to the decryption oracle to do this).
Of course this attack only works if the elliptic curve contains a number of
points with small prime orders. However, other variants of this type of attack
exist which involve the attacker changing parts of the public parameters, such
as the public key Y , the group generator P or even the elliptic curve E itself.
These variants may work even if the original attack does not. Details of
methods that can be used to check the validity of elliptic curve points and
elliptic curves themselves can be found in Section I.5.
III.3.3. Variable Length Symmetric Keys. Another implicit assump-
tion that is made in the security proofs is that the keys produced by the key
derivation function are of a ¬xed length l. This seems like a trivial point, but
it can actually be very important, especially when one combines this with
some details about the way in which most key derivation functions work.
Most key derivation functions compute KD(U, l) by taking the ¬rst l bits of
an in¬nite pseudo-random sequence produced from the input U . This means
that, for any 0 < l < l, KD(U, l ) is the same as the ¬rst l bits of KD(U, l).
This is not a problem for ECIES providing that we use symmetric keys of a
¬xed length l “ it becomes a problem, however, if we allow l to vary.
In particular, some early versions of ECIES allowed the symmetric cipher
to be a variable length Vernam cipher (see Section III.1.3). In this case
encryption is given by:

Algorithm III.1: ECIES Encryption (Weak Version)

A message m, public key Y and the length of the MAC
key l .
OUTPUT: A ciphertext (U, c, r).
1. Choose k ∈R {1, . . . , q}.
2. U ← [k]G.

T ← [k]Y .
(k1 k2 ) ← KD(T, |m| + l ).
Encrypt the message, c = m • k1 .
Compute the MAC on the ciphertext, r ← M AC(c, k2 ).
Output (U, c, r).

Decryption is given in the obvious way.
This variant of ECIES was proven to be insecure by Shoup [305]. Suppose
m = m1 ||m2 is a message such that |m1 | > 0 and |m2 | = l . Suppose further
that the ciphertext C = (U, c, r) is an encryption of m and that c = c1 ||c2 ,
where |c2 | = l . It is easy to see that the ciphertext C = (U, c , r ), where
c = c1 • ∆ and r = MAC(c , c2 • m2 ),
is a valid encryption of the message m1 • ∆ for any bit-string ∆ of length
|m1 |. Hence an attacker can check whether a ciphertext C is the encryption
of a message m or not by checking to see if the shortened ciphertext C is
an encryption of the shortened message m1 • ∆ or not. This would allow an
IND-CCA2 attacker to break ECIES.
III.3.4. Using the x-Coordinate. As has already been mentioned in Sec-
tion I.4, in some implementations of ECIES the key derivation function KD
is not applied to the elliptic curve point T but to the x-coordinate of T .
Whilst this is not really a problem for the cipher, since it is easy to see that
computing the x-coordinate of T is roughly as hard as computing the whole
of T , it is a problem for the security proof.
The problem occurs because the decryption of a ciphertext (U, c, r) in
this case is always the same as the decryption of the ciphertext (’U, c, r).
Therefore an attacker with CCA2 access to a decryption oracle can always
decrypt the challenge ciphertext C — = (U — , c— , r— ) by requesting the decryp-
tion of (’U — , c— , r— ). Hence the scheme is not secure against CCA2 attackers.
A cryptosystem for which there is some easy way of computing two di¬er-
ent ciphertexts that decrypt to give the same message is said to be benignly
malleable [305].
We can remove this problem by de¬ning a binary relation ; between
ciphertexts, where C ; C if (1) D(C, sk) = D(C , sk) and (2) it is easy to
¬nd C from C. In this case the decryption oracle must refuse to decrypt not
only the challenge ciphertext C — but any ciphertext C such that C — ; C .
These ideas are slowly gaining more acceptance in the academic community

One problem with hybrid encryption is that one often ends up proving
the same results again and again. We can use the three security proofs for
ECIES in given Section III.2 as an example: each of these had a di¬erent

technique for dealing with the asymmetric (elliptic curve) part of the system,
and yet all three used exactly the same arguments about the symmetric part.
Recently Shoup (see [304] and [93]) introduced the concept of a generic
hybrid construction (sometimes called a KEM/DEM construction). Here the
asymmetric and symmetric parts of a hybrid scheme are separated and each
has its own formal security requirements. This allows us to evaluate their
security individually and to “mix and match” the asymmetric KEMs and the
symmetric DEMs without needing to produce another long, boring security

III.4.1. KEMs and DEMs. A KEM/DEM construction is made of two
parts: an asymmetric key encapsulation mechanism (or KEM) and a sym-
metric data encapsulation mechanism (or DEM). A KEM takes as input a
public key and produces a random symmetric key of a predetermined length
and an encryption (or encapsulation) of that key. A DEM takes as input a
message and a symmetric key and produces an encryption of that message
under that key. Once again, it is very important to be precise about how we
de¬ne a KEM and a DEM.
Definition III.11. A KEM is a triple of algorithms (G, Encap, Decap), where
• G is a probabilistic key generation algorithm. It takes no input, except
randomness, and outputs a key-pair (pk, sk). Again pk is the public key
and must be distributed to everyone that uses the encryption algorithm,
and sk is the private key and should only be made available to those
parties who have permission to decrypt messages.
• Encap is a probabilistic algorithm called the encapsulation algorithm.
It takes only the public key as input, and randomness, and outputs a
symmetric key K of some predetermined length and an encapsulation
C1 of K.
• Decap is a deterministic algorithm called the decapsulation algorithm.
It takes as input the private key sk and an encapsulation C1 . It returns
either a symmetric key K or an error symbol ⊥.
A KEM is sound if for all valid key-pairs (pk, sk) and any output (K, C1 ) =
Encap(pk) we have K = Decap(C1 , sk).
It should be fairly easy to see the similarities between KEMs and public
key encryption schemes. The de¬nitions are identical except for the fact that
the encapsulation algorithm does not take a message as input but rather
generates its own (the symmetric key).
Definition III.12. A DEM is a pair of algorithms (Enc, Dec), where
• Enc is a (possibly) probabilistic algorithm called the encryption algo-
rithm. It takes as input a message m and a symmetric key K of some
predetermined length and outputs an encryption C2 of the message m.

• Dec is a deterministic algorithm called the decryption algorithm. It
takes as input an encryption C2 and a symmetric key K and returns
either a message m or an error symbol ⊥.
A DEM is sound if for all keys K of the correct length and messages m
we have Dec(Dec(m, K), K) = m.
In order to form a public-key encryption scheme one combines a KEM
and a DEM, where the symmetric keys that are outputted by the KEM are
the correct size for use by the DEM, in the following manner. Key generation
is given by the key generation algorithm for the KEM, G.

Algorithm III.2: KEM/DEM Encryption

INPUT: A message m and the public key pk.
OUTPUT: A ciphertext C = (C1 , C2 ).
1. (K, C1 ) ← Encap(pk).
2. C2 ← Enc(m, K).
3. Output C = (C1 , C2 ).

Algorithm III.3: KEM/DEM Decryption

INPUT: A ciphertext C = (C1 , C2 ) and the private key sk.
OUTPUT: A message m or an ˜˜Invalid Ciphertext™™ message.
1. K ← Decap(C1 , sk).
2. If K =⊥ then output ˜˜Invalid Ciphertext™™.
3. m ← Dec(C2 , K).
4. If m =⊥ then output ˜˜Invalid Ciphertext™™.
5. Output m.

III.4.2. Security for KEMs and DEMs. The security criteria for KEMs
and DEMs are actually quite similar, and both are closely related to the IND
game that was used to de¬ne the security of a public-key encryption scheme.
Both are concerned with the e¬ects of a two-stage attacker A = (A1 , A2 ) who
is playing a game against a challenger. For a KEM the attacker is trying to
tell a real symmetric key from a random symmetric key and for a DEM the
attacker is trying to tell if a particular ciphertext is the encryption of one
message or another.
Definition III.13. For a KEM, the indistinguishability IND game for an
attacker A = (A1 , A2 ) consists of four major steps:
1. A challenger generates a random key-pair (pk, sk) by running the key
generation algorithm G.

2. The attacker runs A1 on the input pk. It returns some state informa-
tion s.
3. The challenger generates a valid symmetric key and its encapsulation
(K0 , C — ) by running the encapsulation algorithm Encap(pk). It also
generates a random symmetric key K1 of the same length and chooses
a bit σ ∈ {0, 1} uniformly at random.
4. The attacker runs A2 on the input (Kσ , C — , pk, s). It returns a guess
σ for σ.
The attacker wins the game if σ = σ. The advantage of an attacker in playing
the IND game is de¬ned to be
|P r[σ = σ] ’ 1/2| .
Attackers that are playing the IND game for a KEM can have either CPA,
CCA1 or CCA2 access to a decapsulation oracle, i.e., an oracle that runs the
decapsulation algorithm for them. For more details see De¬nition III.4. In
order to guarantee that the overall hybrid scheme is secure against IND-
CCA2 attackers we will require that the KEM be secure against IND-CCA2
attackers too.
Definition III.14. For a DEM, the indistinguishability IND game for an
attacker A = (A1 , A2 ) consists of four major steps:
1. A challenger generates a random symmetric key K of an appropriate
2. The attacker runs A1 . It returns two messages m0 and m1 of equal
size, as well as some state information s.
3. The challenger chooses a bit σ ∈ {0, 1} uniformly at random. It com-
putes the challenge ciphertext C — = Enc(mσ , K).
4. The attacker runs A2 on the input (C — , s). It returns a guess σ for σ.
Again, the attacker wins the game if σ = σ. The advantage of an attacker in
playing the IND game is de¬ned to be
|P r[σ = σ] ’ 1/2| .
For our purposes, an attacker that is playing the IND game for a DEM can
either have no access to a decryption oracle, in which case the attacker is said
to be passive (PA), or have complete access to a decryption oracle, in which
case the attacker is said to be active (CCA). Obviously an active attacker is
forbidden from requesting the decryption of the challenge ciphertext from the
decryption oracle. Note that in neither of these two cases can the attacker
encrypt messages under the randomly generated symmetric key K. This is a
major di¬erence between the symmetric and asymmetric cases.
One of the main points of this approach is that the security of the two com-
ponents can be evaluated independently, as the following theorem of Shoup
[304] demonstrates.

Theorem III.15. If (G, E, D) is an asymmetric encryption scheme composed
of a KEM that is secure against IND-CCA2 attackers and a DEM that is se-
cure against IND-CCA attackers, then (G, E, D) is secure against IND-CCA2
The proof of this theorem is similar to parts of the proofs of security for
ECIES that we have already sketched. Suppose there exists an attacker A
that breaks the hybrid scheme in the IND-CCA2 model with a signi¬cant
advantage. We attempt to build an IND-CCA2 attacker B that breaks the
KEM. Either B has a signi¬cant advantage or A must be attacking the sym-
metric part of the scheme alone, and in that case A™s advantage would be the
same even if we replaced the symmetric keys used to encrypt the challenge
ciphertext with randomly generated symmetric keys. We can therefore use
A to build an IND-CCA attacker C that attacks the DEM with signi¬cant
advantage. On the other hand, if both the KEM and the DEM, are secure
then the overall hybrid encryption scheme must be secure too.
There is one note of caution however. Granboulan [153] has shown that
it is vitally important that the KEM and the DEM are strictly independent
of each other, i.e., no information used by the KEM (including any element
of the public key or the system parameters) should be made available to the
DEM, or Theorem III.15 will not hold.2 In other words, if a secure KEM
and a secure DEM share some information, then it is possible that the hybrid
scheme formed from them might not be secure.
DEM (both of which are de¬ned in Section I.4) meet their required security
levels and have proofs of security to attest to this.
The security proof for ECIES-DEM is the same as the argument used
in Section III.2.1 to show that ECIES is secure if the symmetric keys are
randomly generated. Essentially one can prove that ECIES-DEM is secure
provided the underlying symmetric cipher and MAC scheme are secure. See
Section III.1.3 for more details on the security of symmetric primitives.
As one can imagine, the security proof for ECIES-KEM is a little more
complicated, and again we are going to have to make some kind of simplifying
assumption in the security analysis, just as we had to do for ECIES. It should
be easy to see that we can prove the security of ECIES-KEM using either the
random oracle model or the generic group model (see Sections III.2.2 and
III.2.3). In both models the security proof for ECIES involved demonstrating
that either we could break some underlying mathematical problem or the
symmetric keys were indistinguishable from completely random keys. This
is exactly the condition we need to prove in order to prove the security of
Similarly, it is important that the DEM is chosen before the asymmetric key pair is
generated or else elements of the key-pair may in¬‚uence the choice of DEM. This would
also invalidate Theorem III.15.

We also managed to show this based on the assumption that the hash
Di¬e“Hellman problem is di¬cult to solve, but this does not help us prove
the security of ECIES-KEM. This is because the hash Di¬e“Hellman problem
is exactly the same as the problem of breaking ECIES-KEM with an IND-
CCA2 attacker. So basing a security proof for ECIES-KEM on the hash
Di¬e“Hellman problem would be rather like proving that ECIES-KEM is
di¬cult to break whenever it is di¬cult to break ECIES-KEM!
ECIES-KEM also su¬ers from all of the problems associated with ECIES
discussed in Section III.3, although the use of the cofactor mode (see Sec-
tion I.4) can help stop the invalid elliptic curve point attacks discussed in
Section III.3.2. Nevertheless, ECIES-KEM has already become a popular
algorithm with both academic cryptographers and software developers and
has already been proposed for inclusion in many important cryptographic
Part 2

Implementation Techniques

Side-Channel Analysis

E. Oswald

Side-channel analysis (SCA) or information leakage analysis (ILA), refers
to a new and emerging type of cryptanalysis that uses leaked side-channel
information from a cryptographic device to determine the secret key. The
traditional cryptographic model consists of two parties, Alice and Bob, who
wish to communicate secretly over an insecure channel. To protect against
a potential eavesdropper, cryptographic algorithms are used to ensure the
secrecy of the communication. In this traditional model, the security of the
system relies solely on the secrecy of the key and the mathematical properties
of the cryptographic algorithm used.
However, in a more practical scenario the communicating parties are
mostly computing devices, such as personal computers, smart cards or mo-
bile phones. On such devices, a cryptographic algorithm is implemented
and executed whenever a cryptographic protocol requires its computation.
Although these devices operate in potentially hostile environments and are
subject to attacks themselves, they are supposed to protect the secret key.
Furthermore, such devices can act as a source of information. Depending on
the data they process, they might consume di¬erent amounts of power, emit
di¬erent amounts of electromagnetic emanations, produce di¬erent types of
noises, need di¬erent running times or output di¬erent error messages; see
Figure IV.1. If the device is executing a cryptographic algorithm (which uses
a secret key), the leaked information may be directly related to this secret
key. In certain cases, this additional information can be exploited to deduce
information about the secret key. Hence, the security of the whole system
relies on the security of the algorithms and protocols, and on the security of
their implementation.
This chapter is as organized as follows. The introductory part starts with
a brief section on cryptographic hardware in Section IV.1. Active attacks
assume that an attacker can manipulate a device in some way. Such attacks
are brie¬‚y sketched in Section IV.2. In passive attacks, which are treated in
Section IV.3, an attacker monitors the side-channels of a device but does not
manipulate the device himself. We introduce the currently used side-channels
in Section IV.3.1 and explain the concepts of simple side-channel analysis and
di¬erential side-channel analysis in Sections IV.3.2 and IV.3.3. We conclude

tion T
Execu Eve

netic Eman


etic m


trom on
io ti

n Ele u


Figure IV.1. A practical scenario where Alice and Bob are
computing devices and Eve can spy on the communication both
directly and via the emanations produced by the devices.

the introductory part of this chapter with a discussion on the attack scenarios
for elliptic curve cryptosystems in Section IV.3.4.
The second part is devoted to side-channel analysis on ECC. In Sec-
tion IV.4 we discuss the application of simple side-channel analysis on imple-
mentations of the elliptic curve scalar point multiplication. In Section IV.5
we apply di¬erential side-channel analysis on implementations of the elliptic
curve scalar point multiplication.
Defences against side-channel analysis are discussed separately, in Chap-
ter V.

IV.1. Cryptographic Hardware
Traditionally, the main task of cryptographic hardware was the acceler-
ation of operations frequently used in cryptosystems or the acceleration of
a complete cryptographic algorithm. Nowadays, hardware devices are also
required to store secret keys securely. Especially in applications involving
digital signatures, the secrecy of the private signing key is vital to the secu-
rity of the whole system. A cryptographic device must make key extraction
impossible. Active attacks that target cryptographic devices are commonly
referred to as tamper attacks.
Tamper resistant devices are supposed to make key extraction impossible,
while tamper evident devices should make the key extraction obvious. An ex-
ample of a highly tamper resistant device is IBM™s 4758 [IBM CoPro]. It is
the only device evaluated against the highest level of tamper resistance (FIPS
140-1[FIPS 140.1], level 4) at the time of writing. Such high-end devices are
costly and therefore usually used in few applications such as ATM machines.

Tamper resistant devices that are used for the mass market today are mostly
smart cards. Smart cards have to be cheap and are therefore restricted in
computing power, memory, and of course, mechanisms to counteract tamper

IV.1.1. Smart Cards. A smart card usually consists of a microprocessor
(8-bit for low cost cards and up to 32-bit in the newest generation of smart
cards), memory and some interface. Usually, the interface is a serial interface,
for example in the case of banking cards, but also USB“interfaces or RF“
interfaces for contactless smart cards are possible. Smart card commands are
coded in Application Protocol Data Units (APDUs).
For several purposes, authentication commands such as internal authenti-
cate or external authenticate are usually available for smart cards with cryp-
tographic functions. The internal authenticate command usually takes some
input data and computes some sort of signature or decryption using a secret
or private key. The external authenticate command even allows the supply of
the key for the cryptographic operation. These two commands are commonly
used for attacks in practice.

IV.2. Active Attacks
Active attacks or tamper attacks have a long history in the ¬eld of cryp-
tography. The term active implies that an attacker actively manipulates the
cryptographic device.

IV.2.1. Simple Attacks on Smart Cards. It is relatively simple to ex-
tract the microprocessor from a given smart card. Then, the microprocessor
can be depackaged and placed into a more suitable medium in order to per-
form some experiments.
Various simple types of tamper attacks have been popular for smart cards.
The very ¬rst generations of smart cards used an external programming volt-
age to read and write contents to its memory. By cutting of this external
programming voltage, an attacker could freeze the contents of the memory.
Manufacturers™ test circuits led to other attacks on smart cards. Such test
circuits are used during the testing phase after the fabrication of the smart
card. After the testing phase, the test circuits are disconnected from the
microprocessor. An attacker has to ¬nd and repair the disconnected wires to
use the test circuit.
Most of these simpler attacks on smart cards were not only active at-
tacks but were also intrusive. In modern smart cards, the components of
the processor are covered by a protective mesh. This mesh reports when it
is damaged, at which point the smart card destroys its memory contents or
stops functioning.

IV.2.2. Fault Attacks. Here, an attacker tries to induce faulty signals in
the power input (or clock input) of the smart card. Often, this forces a change
in the data in one of the registers. As a consequence, the computation pro-
ceeds with the faulty data. Alternatively, an attacker might try to supply
faulty data directly (see also Section III.3.2). Fault attacks on implementa-
tions of elliptic curve cryptosystems have been introduced in [26] and were
also discussed in [78]. The main idea behind the attacks in this chapter is
the following. By inserting a fault in some register, we force the device to
execute the point multiplication algorithm with values that are not a point on
the given curve but on some other curve instead. This other curve is probably
a cryptographically weaker curve that can be used to calculate the private
key k more easily. As a consequence, a device has to check its input and its
output values for correctness.

IV.3. Passive Attacks
Passive attacks were only recognized in the cryptographic community as
a major threat in 1996, when Paul Kocher published the ¬rst article about
timing attacks [205]. In a passive attack, the attacker only uses the standard
functionality of the cryptographic device. In the case of smart cards, these
are the internal and the external authenticate commands. The outputs of
the device are then used for the attack. A device can have di¬erent types
of outputs. If such outputs unintentionally deliver information (about the
secret key), then the outputs deliver side-channel information and we call
them side-channel outputs (or side-channels). In the next sections, we present
side-channels that have been exploited in attacks so far.

IV.3.1. Types of Side-Channels. The ¬rst o¬cial information related to
a side-channel analysis dates back to 1956. Peter Wright reported in [349]
that MI5, the British intelligence agency, was stuck in their e¬orts to break
an encryption machine in the Egyptian Embassy in London. The encryption
machine was a rotor machine of the Hagelin type. Breaking the machine™s
encryption exceeded their computational capabilities. Wright suggested plac-
ing a microphone near that machine to spy on the click-sound the machine
produced. Wright discovered that the sound of the clicks allows the attacker
to determine the position of some of the rotors. This additional information
reduced the computation e¬ort to break the cipher, and MI5 was able to spy
on the communication for years.

Timing Leakage : Timing information about the execution of a crypto-
graphic algorithm was the ¬rst side-channel that was brought to the atten-
tion of the cryptographic community by Kocher. In [205] he introduced a
technique to exploit even very small timing variations in an attack. Timing
variations often occur because of data-dependent instructions. For example,



Data from PC


Reset Vpp
Clock IO



Figure IV.2. This is a typical measurement setup to conduct
power (or EM) measurements. A personal computer is used
to operate the smart card and is connected with the digital
scope. The scope measures the power consumption on the GND
contact with a probe. A second probe, which is connected to
the IO“contact, is used as a trigger. The measurements are
sent from the scope to the personal computer for evaluation.

in the (original) Montgomery multiplication [254], a ¬nal subtraction step is
needed only if the result is exceeding the modulus.
In order to exploit timing information, very precise timing measurements
need to be made.

Power Consumption Leakage : Nowadays, almost all smart card proces-
sors are implemented in CMOS (Complementary Metal-Oxide Silicon) tech-
nology. The dominating factor for the power consumption of a CMOS gate
is the dynamic power consumption [348]. Two types of power consumption
leakage can be observed. The transition count leakage is related to the num-
ber of bits that change their state at a time. The Hamming weight leakage is
related to the number of 1-bits, being processed at a time. The internal cur-
rent ¬‚ow of a smart card can be observed on the outside of the smart card by,
for example, putting a small resistor between the ground (GND) of the smart
card and the true ground. The current ¬‚owing through the resistor creates a
voltage that can be measured by a digital oscilloscope (see Figure IV.2).
The ¬rst practical realization of an attack that makes use of the power
consumption leakage was reported by Kocher et al. in [206]. Since then,
many companies and universities have developed the skills to conduct these
measurements in practice.

Electromagnetic Radiation Leakage : The movement of electrically
charged particles causes electromagnetic leakage. In CMOS technology, cur-
rent ¬‚ows whenever the circuit is clocked and gates switch. This current ¬‚ow
is visible in the power consumption, and it is also visible as electromagnetic
emanations. There are two main categories for electromagnetic emanations.
Direct emanations result from intentional current ¬‚ow. Unintentional ema-
nations are caused by the miniaturization and complexity of modern CMOS
The ¬rst concrete measurements were presented in [139] and [278]. At
the time of writing, it is possible to conduct EM attacks with measurements
taken several meters away from the attacked device [5].

Combining Side-Channels : The combination of several di¬erent side-
channels has not been investigated intensively so far. Two sources of side-
channel information that are known to be e¬ciently combined are the timing
and the power side-channel. For instance, power measurements can be used
to get precise timing measurements for intermediate operations (if these op-
erations are clearly visible in the power trace). These power measurements
can then be used in timing attacks. This observation has been used in [342]
and [289].

Error Messages : Error-message attacks usually target devices implement-
ing a decryption scheme. It is a common assumption that a device gives a
feedback about whether or not a message could be decrypted successfully. If
an attacker knows the reason why a decryption failed, information about the
secret key can be deduced by sending well-chosen ciphertexts to the device.
These kind of attacks were ¬rst introduced by Bleichenbacher in [30].
This article describes a chosen ciphertext attack against the RSA encryp-
tion standard PKCS#1. Other error-message attacks are mentioned in [232]
against RSA-OAEP, [99] and [189] against the NESSIE candidate EPOC-2
and [330] against the padding and subsequent encryption in CBC mode as it
is done in various standards.

IV.3.2. Simple Side-Channel Analysis. In simple side-channel analysis,
an attacker uses the side-channel information from one measurement directly
to determine (parts of) the secret key. Therefore, the side-channel informa-
tion related to the attacked instructions (the signal) needs to be larger than
the side-channel information of the unrelated instructions (the noise). In
order to be able to exploit this information, at least some of the executed in-
structions need to be distinguishable by their side-channel trace. In addition,
the attacked instructions need to have a relatively simple relationship with
the secret key.

Influences hypothetical
Physical Device Model of the

Physical Device

Physical Side’Channel Output Hypothetical Side’Channel Output

Statistical Analysis


Figure IV.3. The idea behind di¬erential side-channel analy-
sis: an attacker compares the predictions from his hypothetical
model with the side-channel outputs of the real, physical device.

The question, of whether the signal is clearly visible in the side-channel
trace, often depends on the underlying hardware, but also sometimes on the
implementation itself; see Chapter V.

IV.3.3. Di¬erential Side-Channel Analysis. When simple side-channel
analysis are not feasible due to too much noise in the measurements, di¬eren-
tial side-channel analysis can be tried. Di¬erential side-channel analysis uses
many measurements to ¬lter out the interfering noise. Simple side-channel
analysis exploits the relationship between the executed instructions and the
side-channel output. Di¬erential side-channel analysis exploits the relation-
ship between the processed data and the side-channel output.
In di¬erential side-channel analysis, an attacker uses a so-called hypothet-
ical model of the attacked device. The quality of this model is dependent on
the capabilities of the attacker:
Clever Outsiders: They do not have insider knowledge of the system.
They are assumed to have moderately sophisticated equipment.
Knowledgeable Insiders: They are educated in the subject and have
a varying degree of understanding of the attacked system. They have
sophisticated tools to conduct the measurements.
Funded Organizations: They have teams of specialists that may have
access to insider knowledge about the attacked system, and they have
the most advanced analysis tools.
The model is used to predict the side-channel output of a device. Used as
such a predictor, it can output several di¬erent values. These can be values

predicting the leakage of one side-channel for several moments in time, but
they can also be values predicting the leakage of di¬erent side-channels. In
case only one single output value is used for an attack, then the attack is
called a ¬rst-order attack . If two or more output values for the same side-
channel are used in an attack, then the attack is called a second-order attack
or higher-order attack, respectively.
These predictions are compared to the real, measured side-channel output
of the device. Comparisons are performed by applying statistical methods on
the data. Among others, the most popular methods are the distance-of-mean
test and correlation analysis.
For the distance-of-mean test, the model only needs to predict two val-
ues, low and high, for a certain moment of time in the execution. These
two predictions are used to classify the measurement data. All measurements
for which the prediction stays high are put into one set, and all other mea-
surements are put in a second set. Then, the di¬erence between the sets is
calculated by computing the di¬erence of their mean values.
For a formal description of this test, we use the following notation. Let ti
denote the ith measurement data (i.e., the ith trace) and let tl and th denote
i i
a trace for which the model predicted a low and a high side-channel leakage,
respectively. A set of traces tl , th and ti is then denoted by T l , T h and T and
¯¯ ¯
the average trace of such a set by T l , T h and T . Consequently, the di¬erence
trace for the distance-of-mean method is calculated by
¯ ¯
D = T l ’ T h.
Whenever this di¬erence is su¬ciently large, or has some other recognizable
properties, it can be used to verify or reject the key hypothesis.
For correlation analysis, the model predicts the amount of side-channel
leakage for a certain moment of time in the execution. These predictions are
correlated to the real side-channel output. To measure this correlation, the
Pearson correlation coe¬cient can be used. Let pi denote the prediction of
the model for the ith trace and P set of such predictions. Then the Pearson
correlation coe¬cient is given as
E(T — P ) ’ E(T ) — E(P )
’1 ¤ C(T, P ) ¤ 1.
C(T, P ) = ,
Var(T ) — Var(P )
Here E(T ) denotes the expectation (average) trace of the set of traces T and
Var(T ) denotes the variance of a set of traces T . If this correlation is high,
i.e., close to +1 or ’1, then the key hypothesis is assumed to be correct.
To summarize and conclude this section, di¬erential side-channel analysis
is possible if an attacker can make a hypothesis over a small part of the secret
key to predict the side-channel output with his model.

IV.3.4. Target Operations in Elliptic Curve Cryptosystems. There
are two operations in commonly used elliptic curve cryptosystems that involve

the private key or an ephemeral (secret) key. We brie¬‚y discuss the scenarios
under which these operations are attacked.
• The modular multiplication of a known value and the private key needs
to be computed in ECDSA. If the multiplication is implemented in such
a way that the multiplier is the private key and the multiplication is
carried out with a variant of the binary algorithm, then this implemen-
tation is, in principle, vulnerable to side-channel analysis. However,
this operation can be implemented securely by not using the private
key as a multiplier.
• The scalar multiplication of a secret value with a known elliptic curve
point, i.e., the point multiplication [ECC, Section IV.2], needs to be
calculated in all elliptic curve cryptosystems. Also this operation is
usually implemented by some version of the binary algorithm. It has
been shown to be vulnerable to simple and di¬erential side-channel
Probably the most important elliptic curve protocol is ECDSA. From the
previous considerations it is clear that the operation that is di¬cult to secure
is the point multiplication operation. It is also well known [262] that not all
bits of the ephemeral ECDSA key need to be known in order to reconstruct
the private key of the ECDSA. As a result, the point multiplication operation
must be implemented to resist in particular simple side-channel analysis.

IV.4. Simple SCA Attacks on Point Multiplications
In the case of a simple SCA attack on an implementation of a point
multiplication algorithm, the adversary is assumed to be able to monitor the
side-channel leakage of one point multiplication, Q = [k]P , where Q and P
are points on an elliptic curve E and k ∈ Z is a scalar. The attacker™s goal is
to learn the key k using the information obtained from carefully observing the
side-channel leakage (e.g. power trace) of a point multiplication. Such a point
multiplication consists of a sequence of point addition, point subtraction and
point doubling operations. Each elliptic curve operation itself consists of a
sequence of ¬eld operations. In most implementations, the standard sequence
of ¬eld operations in point addition di¬ers from that in point doubling. Every
¬eld operation has its unique side-channel trace. Hence, the sequence of ¬eld
operations of point addition has a di¬erent side-channel pattern than that of
point doubling (see Figures IV.4 and IV.5 for an example with power traces).

IV.4.1. Attacking the Basic Double-and-Add Algorithm. It was al-
ready observed in [87] that the implementation of the simplest form of a
double-and-add algorithm, which is the binary algorithm (see Algorithm IV.1),
is an easy target for simple side-channel analysis.
First we note that the conditional branch (i.e., Step 5 in Algorithm IV.1)
only depends on a single bit of k. Hence, an attacker can inspect the power











0 500 1000 1500 2000 2500 3000 3500 4000 4500 5000 5500 6000 6500 7000
clock cycle

Figure IV.4. A point addition power trace.











0 500 1000 1500 2000 2500 3000 3500 4000 4500 5000 5500 6000
clock cycle

Figure IV.5. A point doubling power trace.

trace of the point multiplication algorithm to determine k. We have assumed
that it is possible to distinguish the point addition operation from the point
doubling operation in the power trace. Since a point addition operation can










0 0.56 1.08 1.63 2.3 2.89 3.56 4.1 4.69
x 10
clock cycle
0 1 0
1 0

Figure IV.6. The power consumption trace of an elliptic
curve scalar point multiplication operation that was performed
with a simple double-and-add algorithm. The used scalar can
be read directly from this power consumption trace.

only be caused by a 1 in the bit representation of k, deducing the key is trivial
(see Figure IV.6 for an example) in this naive implementation.

Algorithm IV.1: Right-to-Left Binary Algorithm
kj 2j , kj ∈ {0, 1}.
INPUT: Point P and -bit multiplier k = j=0
OUTPUT: Q = [k]P .
1. Q ← P .
2. If k0 = 1 then R ← P else R ← O.
3. For i = 1 to l ’ 1 do:
Q ← [2]Q.
If (ki = 1) then R ← R + Q.
6. Return R.

More interesting to attack are algorithms that make use of the rich arith-
metic on elliptic curves. This arithmetic allows the use of other representa-
tions of k, instead of only the binary representation. They o¬er some inherent
resistance to straightforward simple SCA because they also make use of point
subtraction. Because point addition and point subtraction only di¬er slightly,
they can be implemented in such a way that their side-channel patterns look

alike. Therefore, an attacker cannot determine the key bits by just inspecting
the side-channel trace.

IV.4.2. Attacking Double-Add-and-Subtract Algorithms. The task
of an attacker is to deduce some information about k by using the informa-
tion of the side-channel trace. In particular, an attacker wants to determine
and exploit the relationship between the occurrence of certain sequences of
bits and certain sequences of side-channel patterns. Let X be a random vari-
able that denotes the sequence of patterns in the side-channel trace, i.e., an
AD-sequence (for example X=“DDD” or X=“DAD”). Let Y be a random
variable that denotes a sequence of patterns in the digit representation of k,
i.e., a 01-sequence (for example Y = 000 or Y = 01). Then an attacker is
interested in exploiting and calculating the conditional probability
Pr(Y = y © X = x)
Pr(Y = y|X = x) = (IV.1)
Pr(X = x)
for many di¬erent x and y. Such probabilities can be calculated by using
Markov theory [155].
A Markov process, or in case of a ¬nite state space also called Markov
chain, is a rather simple statistical process. In a Markov process, the next
state is dependent on the present state but independent of the way in which
the present state arose from the states before. The transitions between the
states are determined by random variables that are either known or have to
be estimated.
Definition IV.1. Let T be a (k—k) matrix with elements tij , 1 ¤ i, j ¤ k. A
random process (X0 , X1 , . . . ) with ¬nite state space S = {s1 , . . . , sk } is said to
be a Markov chain with transition matrix T if, for all n all i, j ∈ {1, . . . , k},
and all i0 , . . . , in’1 ∈ {1, . . . , k} we have
Pr(Xn+1 = sj |X0 = si0 , . . . , Xn = si ) =
Pr(Xn+1 = sj |Xn = si ) = tij .
A large class of such Markov processes has the two important properties
of being irreducible and aperiodic.
The ¬rst property, i.e., that a process is irreducible, means that all states
can be reached from all other states with a ¬nite number of steps. The
second property, i.e., that a process is aperiodic, means that all states are
aperiodic. A state is aperiodic if the period is equal to 1, i.e., the probability
of returning to a state is always positive. These two properties are conditions
for the main theorem of Markov theory. Before we state this theorem we
de¬ne the stationary distribution ¬rst.
Definition IV.2. Let (X0 , X1 , . . . ) be a Markov chain with state space given
by {s1 , . . . , sk } and transition matrix T . A row vector π = (π1 , . . . , πk ) is said

to be a stationary distribution for the Markov chain if it satis¬es
πi ≥ 0 ∀i, and πi = 1, and (IV.2)
πT = π. (IV.3)
Theorem IV.3. For any irreducible and aperiodic Markov chain, there exists
a unique stationary distribution π, and any distribution µn of the chain at
time n approaches π as n ’ ∞, regardless of the initial distribution µ0 .
This theorem states that, for Markov processes having the properties of
being aperiodic and irreducible, a steady state always exists. By using the
transition matrix T and the steady-state vector, we can calculate the condi-
tional probabilities (IV.1).

Example: Consider a simple double-add-and-subtract algorithm as shown
in Algorithm IV.2.

Algorithm IV.2: Double-Add-and-Subtract Algorithm
INPUT: Point P and -bit multiplier k = j=0 kj 2j , kj ∈ {0, 1}.
OUTPUT: Q = [k]P .
1. R0 ← O, R1 ← P, s ← 0.
2. For i = 0 to l ’ 1 do:
If ki = 0 then
If s = 11 then R0 ← R0 + R1 .
s ← 0, R1 ← [2]R1 .
If ki = 1 then
If s = 0 then R0 ← R0 + R1 , R1 ← [2]R1 , s ← 1.
If s = 1 then R0 ← R0 ’ R1 , R1 ← [2]R1 , s ← 11.
If s = 11 then R1 ← [2]R1 .
10. If s = 11 then R0 ← R0 + R1 .
11. Return R0

There are three di¬erent states s in this algorithm. The initial state is
always 0. Under the assumption that Pr(ki = 0) = Pr(ki = 1) = 1/2, the
transition matrix for this algorithm is
« 
0.5 0.5 0.5
T = 0.5 0 0 . (IV.4)
0 0.5 0.5
From the transition matrix the steady-state vector, which is (1/2, 1/4, 1/4),
can be calculated. The number of elliptic curve operations that are induced
by an -bit number can be calculated as well. In each state, a point doubling
has to be calculated. Hence, there are point doublings. In addition, in half

of the cases in each state, a point addition has to be calculated. Hence, there
are (1/4 + 1/8 + 1/8) point additions. In total, 3/2 elliptic curve operations
are calculated.

A hidden Markov process is a Markov process in which we can only observe
a sequence of emissions (the AD-sequence), but we do not know the sequence
of states, which are related to the key bits (the 01-sequence) the process went
through. A hidden Markov model can be characterized by the quintuple
(S, O, T, E, s0 ). The notation used here is similar to the one used before:
the transition matrix is denoted by T and the ¬nite set of states is called
S. The emissions are denoted by the set O. The emission matrix E is a
|S| — |O| matrix and contains the conditional probability that an emission
symbol of the set E was produced in state S: Eij = Pr(Oj |Si ). The initial
state distribution is denoted by s0 . Given a (hidden) Markov model, the
task of the attacker is to ¬nd for a given AD-sequence, a corresponding state
sequence that explains the given AD-sequence best. One approach to tackle
this problem is to choose sub-sequences that are individually most likely, i.e.,
which have the highest conditional probabilities [269].

Example: Suppose we use Algorithm IV.2 to perform a point multiplication
with the scalar k = 560623. As a ¬rst step, we calculate su¬ciently many
conditional probabilities (see Table IV.1) by using the Markov model that we
derived in the previous example.

Table IV.1. Non-zero conditional probabilities. In this table
we use an abbreviated notation, i.e., we write p(000|DDD)
instead of p(Y = 000|X = DDD). We use the LSB ¬rst repre-

Pr(000|DDD) = 1/2 Pr(01|DAD) = 1/2 Pr(11|ADAD) = 1/2
Pr(100|DDD) = 1/4 Pr(10|DAD) = 1/4 Pr(10|ADAD) = 1/4
Pr(111|DDD) = 1/4 Pr(11|DAD) = 1/4 Pr(01|ADAD) = 1/4
Pr(001|DDAD) = 1/2 Pr(000|ADDD) = 1/4 Pr(110|ADADAD) = 1/2
Pr(101|DDAD) = 1/4 Pr(100|ADDD) = 1/2 Pr(101|ADADAD) = 1/4
Pr(110|DDAD) = 1/4 Pr(111|ADDD) = 1/4 Pr(011|ADADAD) = 1/4

Table IV.2 shows how the remainder of the attack works. The ¬rst row
contains the AD-sequence that we deduced from the power trace. In the
second row, this sequence is split into sub-sequences. In a practical attack,
the length of the sub-sequences will depend on the computational capabilities
of the attacker. However, it will be much larger than in this toy example. In
the third, fourth and ¬fth rows, the possible bit-patterns are listed according
to their conditional probabilities.

Table IV.2. k = 11110111101100010001, LSB First Representation

11 001 11 001 11 100 100 01
10 101 10 101 10 000 000 10
01 110 01 110 01 111 111 11

We are interested in the number of possible scalar values that have to be
tested before the correct k is found. Let denote the number of bits of k and
n the average length of the sub-sequences. In the worst case, if we only take
the non-zero conditional probabilities into account, but not their individual
values, we have to test 0.5 — 33 /2n keys on average. Let m be the number
of sub-sequences, i.e., m = 3 /2n in this example, and g(x, y) = x 2x’y . If
we take the individual conditional probabilities into account, this reduces to
0.5m + 0.5 i=0 0.5i 0.25m’i 1 + g(m, i) + 2 m
j=i+1 g(m, j) g(m, i) keys on
average [270].

The approach that we used in the previous example was to choose sub-
sequences that are individually most likely. This strategy guarantees us ¬nd-
ing the key k; however, it does not take sequences of sub-sequences into
account. Another widely used strategy is to ¬nd the single best path. A well
known technique for this approach is the Viterbi algorithm and has been used
successfully to attack double-add-and-subtract algorithms [195]. In case of a
randomized version of the algorithm that was discussed in the previous exam-
ple, the Viterbi algorithm determines approximately 88% of all intermediate
states correctly. The remaining and unknown states can be determined by a
meet-in-the-middle technique that is sketched in the following section.

Improvements : Assume that the point Q is given by Q = [k]P , where
k = xb + y denotes the private key k and P the public base-point. Then we
know that for the correct values of x, b and y the equation Q ’ [y]P = [xb]P
must hold. By computing and comparing the values of the left- and the
right-hand sides of this equation, we can determine k. Hence, if k is almost
determined, the remaining bits can be guessed and veri¬ed by checking the
previous equation. This approximately halves the search space for k.

Other Attack Scenarios : Suppose an attacker has the ability to force
a device to use the same scalar k for several point multiplication operations.
Due to the randomization, the same scalar will produce di¬erent power traces
that all give information about the used scalar. Combining the information
gained from several traces, a scalar value can usually be determined with very
few (approximately 10) measurements [265], [340].

IV.4.3. The Doubling Attack. This attack relies on the fact that similar
intermediate values are manipulated when working with a point P and its
double [2]P [122]. We assume that an attacker can identify point doubling
operations with identical data in two power traces.

Algorithm IV.3: Double-and-Add-Always Algorithm
kj 2j , kj ∈ {0, 1}.
INPUT: Point P and -bit multiplier k = j=0
OUTPUT: Q = [k]P .
1. R0 ← P .
2. For j = ’ 2 to 0 by ’1 do:
R0 ← [2]R0 .
R1’kj ← R1’kj + P .
5. Return R0 .

Algorithm IV.3 is secure against simple SCA on a ¬rst glance as in each
iteration a point doubling and a point addition operation is executed. The
value of the variable R0 after j+1 iterations is Qj (P ) = [ i=0 2j’i ]P . Rewrit-
ing this expression in terms of [2]P leads to Qj (P ) = Qj’1 ([2]P ) + [d ’j ]P .
Thus, the intermediate result of the algorithm with input P (which is stored
in R0 ) at step j is equal to the intermediate result of the algorithm with input
[2]P at step j ’ 1 if and only if d ’j is zero. Hence, we just need to compare
the doubling operation at step j + 1 for P and at step j for [2]P to recover
the bit d ’j .

IV.5. Di¬erential SCA Attacks on Point Multiplications
The ¬rst article related to side-channel analysis on public-key cryptosys-
tems was Messerges et al.[245]. Shortly afterwards, Coron published an ar-
ticle speci¬cally related towards implementations of elliptic curve cryptosys-
tems [87]. Goubin published an article wherein some of Coron™s proposed
countermeasures were counteracted by a re¬ned attack [151], which was anal-
ysed further in [313].
IV.5.1. Di¬erential SCA Without any Knowledge of the Implemen-
tation. Two di¬erential SCA attacks have been published which assume no
knowledge about the attacked device.
The SEMD (Single-Exponent Multiple-Data) attack assumes that an at-
tacker can perform the point multiplication operation with the secret value k
and some public value r with several elliptic curve points. Then, for both the
secret and the public values, the conducted measurements, tki and tri , are
averaged. The two average traces are subtracted from each other to produce a
di¬erence trace. Whenever this di¬erence trace shows a peak, the two values
k and r must have been di¬erent as well:
D = T¯k ’ T r.

Thus, by comparing the power traces from a known and an unknown scalar
an attacker can learn the unknown scalar.
The MESD (Multiple-Exponent Single-Data) attack works similarly. In
this scenario it is assumed that the attacker can feed the device with multiple
scalar values and one single elliptic curve point. Another assumption is that
the attacker is allowed to perform the point multiplication with the secret
scalar value and the same elliptic curve point several times. The measure-
ments for the secret scalar, tsi , are averaged to get rid of noise. Then, the
attacker determines the bits of the secret value step by step. For each choice
of a new bit, r0 or r1, the attacker calculates how well the choice ¬ts the
secret exponent:
D1 = T s ’ tr1
D0 = T s ’ tr0.
Whichever di¬erence trace is zero for a longer time corresponds to the correct
IV.5.2. Di¬erential SCA with Knowledge of the Implementation.
The two other published attacks work according to our general introduc-
tion in Section IV.3.3. The ZEMD (Zero-Exponent Multiple-Data) uses as a
statistical method the distance-of-mean test. The outputs of the model are
therefore a simple classi¬cation of the measurement data into two sets. For
one set, the model predicts a high side-channel leakage for a certain moment
in time, T h , while for the other set a low side-channel leakage T l is predicted.
Only if the model predicted the side-channel behaviour correctly, is the clas-
si¬cation into the two di¬erent sets meaningful and thus a di¬erence between
the sets must be visible:
¯ ¯
D = T h ’ T l.
Coron™s attack is essentially the same as the ZEMD attack, but it uses
the correlation coe¬cient instead of the distance-of-mean test. Both attacks
assume that an attacker knows the internal representation of the elliptic curve
points and that a prediction of the side-channel output is possible.
IV.5.3. Di¬erential SCA Using Address Information. As explained
in Section IV.3.3, an attacker tries to include as much information as pos-
sible in her hypothetical model. This information is usually related to the
intermediate data, but it can also be related to the intermediate addresses.
In some devices, loading data from a speci¬c address, i.e., performing data =
R[Address], is composed into two stages. First, the physical address is de-
termined given R[Address] and then the data that is stored in this address
are loaded. As addresses also travel over a bus (as data travel over a bus),
di¬erent addresses can be expected to leak di¬erent information. In [175],
this assumption has been veri¬ed for a certain device and an attack based
on it is presented. This attack breaks implementations of EC schemes on

that device, which resist typical di¬erential SCA attacks based on predicting
intermediate data by using a Montgomery point multiplication algorithm (see
Algorithm V.2). In Step 4 of this algorithm, data is fetched from a partic-
ular address that depends on a particular bit of the key. Hence, when for
example performing an SEMD attack, the attacker should see a peak in the
side-channel trace whenever two key bits di¬er.
IV.5.4. Re¬ned Di¬erential SCA. If projective coordinates are used in
an implementation, they can be used to randomize the intermediate data;
di¬erential SCA as described in the previous sections cannot be used di-
rectly. However, under the assumption that the multiplier is not randomized,
exploiting the properties of so-called special points can lead to an attack;
see [151]. A special point P0 = O is a point having the property that one
of the a¬ne or projective coordinates is 0. Hence, randomization does not
a¬ect this property.
Suppose the attacker already knows the highest bits d ’1 , . . . , dj+1 of k. To
attack the next bit dj in Algorithm IV.3, the attacker needs to feed suitable
multiples P1 of P0 into the algorithm. Such a multiple is calculated based on
the (guessed) key bit dj . If this guess is correct, then this multiple is nicely
related to the content of the registers in the (j + 1)th step: If we denote the
value of R1 in the (j + 1)th step by [x]P , then, if dj is correct, the special
point is [x’1 ]P (we have to assume that x is relatively prime to the order of
P ). Hence, in the (j + 1)th step of the algorithm with input P1 , the point
[xx’1 ]P0 = P0 is processed. The special property of this point can then be
picked up from several side-channel traces by averaging over the traces.
In [6], an extension to [151] is described. This extended attack is based on
the observation that, even if a point does not have a zero coordinate, some of
the intermediate values that occur during point addition (or point doubling)
might become zero.

Defences Against Side-Channel Analysis

M. Joye

V.1. Introduction
This chapter is aimed at studying the resistance of elliptic curve cryp-
tosystems against side-channel analysis. The main operation for elliptic curve
cryptosystems is the point multiplication: Q = [k]P , where the multiplier k is
generally a secret (or private) parameter. In particular, this chapter surveys
various software strategies to prevent side-channel attacks in the computation
of Q = [k]P .
Given the rich mathematical structure of elliptic curves, an endless num-
ber of new countermeasures can be imagined. More importantly, as we will
see, e¬cient countermeasures (i.e., with almost no impact on the running time
and/or the memory requirements) are known. The advantage is clearly on
the end-user side (not on the attacker side). We must also keep in mind that
present-day cryptographic devices come equipped with a variety of hardware
countermeasures, already making harder (or impossible) side-channel attacks.

As given in textbooks, the formul¦ for doubling a point or for adding
two (distinct) points on a Weierstraß elliptic curve are di¬erent. So, for
example, from the distinction between the two operations, a simple power
analysis (i.e., a simple side-channel analysis using power consumption as a
side-channel; cf. Section IV.4) might produce di¬erent power traces, revealing
the value of k in the point multiplication algorithm. There are basically three
approaches to circumvent the leakage. This can be achieved by:
1. unifying the addition formul¦ [51, 50] or considering alternative pa-
rameterizations [188, 220, 27];
2. inserting dummy instructions or operations [87, 76];
3. using algorithms that already behave “regularly” [225, 264, 253, 51,
178, 120].
Even if a point multiplication algorithm is protected against simple side-
channel analysis, it may succumb to the more sophisticated di¬erential side-
channel analysis [87, 206] (cf. Section IV.5). Practically, we note that, except
ECIES, very few elliptic curve cryptosystems are susceptible to such attacks
as, usually, the input point is imposed by the system and the multiplier is an
ephemeral parameter, varying at each execution.

In order to thwart di¬erential side-channel analysis, the inputs of the
point multiplication algorithm, namely, the base-point P and the multiplier
k, should be randomized [87, 190].
The insertion of random decisions during the execution of the point multi-
plication algorithm also helps in preventing side-channel analysis [271, 156].
Of course, several countermeasures can be combined for providing a higher
security level.

V.2. Indistinguishable Point Addition Formul¦
We start by recalling the explicit group law on elliptic curves (see [ECC,
Section III.2]). Let E denote an elliptic curve given by the Weierstraß form
E : y 2 + a1 xy + a3 y = x3 + a2 x2 + a4 x + a6 (V.1)
and let P1 = (x1 , y1 ) and P2 = (x2 , y2 ) denote points on the curve. The
inverse of P1 is ’P1 = (x1 , ’y1 ’ a1 x1 ’ a3 ). If P1 = ’P2 , then the sum
P3 = P1 + P2 = (x3 , y3 ) is de¬ned as
x3 = »2 + a1 » ’ a2 ’ x1 ’ x2 , y3 = »(x1 ’ x3 ) ’ y1 ’ a1 x3 ’ a3 (V.2)
 y1 ’ y2
 when x1 = x2 ,
x ’ x
1 2
»= (V.3)
 3x2 + 2a2 x1 + a4 ’ a1 y1
 when x1 = x2 .
2y1 + a1 x1 + a3
From Eq. (V.3), it clearly appears that the formul¦ for adding two (dis-
tinct) points and for doubling a point are di¬erent.
This section discusses various ways for making indistinguishable the addi-
tion formul¦ on elliptic curves so that a side-channel analysis cannot reveal
what kind of addition is being processed.
V.2.1. Uni¬ed Addition Formul¦. In [51], Brier and Joye observed that
the expressions of slope » in Eq. (V.3) can be uni¬ed for adding or doubling
points. When P1 = ±P2 and provided that P1 and ’P2 do not have the
same y-coordinate (i.e., provided that y1 = ’y2 ’ a1 x2 ’ a3 ), we can write
» = x1 ’y2 = x1 ’y2 · y1 +y2 +a1 x2 +a3 and so obtain after a little algebra
y y
1 ’x2 1 ’x2 y1 +y2 +a1 x2 +a3

x2 + x1 x2 + x2 + a2 x1 + a2 x2 + a4 ’ a1 y1
1 2
»= . (V.4)
y1 + y2 + a1 x2 + a3
Remarkably, this expression for » remains valid when P1 = P2 and can thus be
used for doubling a point. Indeed, replacing (x2 , y2 ) by (x1 , y1 ) in Eq. (V.4),
3x2 +2a2 x +a4 ’a y
we get » = 1 2y1 +a1 x1 +a3 1 1 .
To include the uncovered case y(P1 ) = y(’P2 ), we can extend the previous
expression to
(x2 + x1 x2 + x2 + a2 x1 + a2 x2 + a4 ’ a1 y1 ) + (y1 ’ y2 )
»= 1 2
(y1 + y2 + a1 x2 + a3 ) + (x1 ’ x2 )

for any ∈ K [50]. For example, when char(K) = 2, we can set = 1 if
y1 + y2 + a1 x2 + a3 + x1 ’ x2 = 0 and = ’1 otherwise; when char(K) = 2, we
can respectively set = 1 and = 0. Another strategy consists in randomly
choosing ∈ K. The extended expressions for » prevent the exceptional
procedure attacks of [179].

Two main classes of elliptic curves are used in cryptography: elliptic
curves over large prime ¬elds and elliptic curves over binary ¬elds, i.e., ¬elds
of characteristic two. Avoiding supersingular curves, an elliptic curve over
F2n can be expressed by the short Weierstraß form

Ea2 ,a6 : y 2 + xy = x3 + a2 x2 + a6 .

From Eqs. (V.4) and (V.2), we see that the addition of two points requires
one inversion, four multiplies, and one squaring. Note that the multiplication
(x1 + x2 )(x1 + x2 + a2 ) can also be evaluated as (x1 + x2 )2 + a2 (x1 + x2 ).
Over the large prime ¬eld Fp (with p > 3), the Weierstraß equation sim-
pli¬es to
Ea4 ,a6 : y 2 = x3 + a4 x + a6 .
The addition of two points then requires one inversion, three multiplies, and
two squarings. As inversion in a ¬eld of large characteristic is a relatively
costly operation, projective representations of points are preferred [102]. A
projective point P = (XP : YP : ZP ) on the curve satis¬es the (projective)
Weierstraß equation

Ea4 ,a6 : Y 2 Z = X 3 + a4 XZ 2 + a6 Z 3

and, when ZP = 0, corresponds to the a¬ne point P = (XP /ZP , YP /ZP ).
The notation (X : Y : Z) represents equivalent classes: (X : Y : Z) and (X :
Y : Z ) are equivalent projective representations if and only if there exists a
non-zero ¬eld element θ such that X = θX , Y = θY , and Z = θZ . The
only point at in¬nity (i.e., with Z = 0) is the identity element O = (0 : 1 : 0),
and the inverse of P1 = (X1 : Y1 : Z1 ) is ’P1 = (X1 : ’Y1 : Z1 ). The sum of
two points P1 = (X1 : Y1 : Z1 ) and P2 = (X2 : Y2 : Z2 ) (with P1 , P2 = O and
P1 = ’P2 ) is given by P3 = (X3 : Y3 : Z3 ) with

X3 = 2F W, Y3 = R(G ’ 2W ) ’ L2 , Z3 = 2F 3 ,

where U1 = X1 Z2 , U2 = X2 Z1 , S1 = Y1 Z2 , S2 = Y2 Z1 , Z = Z1 Z2 , T = U1 +U2 ,
M = S1 + S2 , R = T 2 ’ U1 U2 + a4 Z 2 , F = ZM , L = M F , G = T L,
and W = R2 ’ G. Therefore, adding two points with the uni¬ed formul¦
(Eq. (V.4)) requires 17 multiplies plus 1 multiplication by constant a4 . When
a4 = ’1, we may write R = (T ’ Z)(T + Z) ’ U1 U2 and the number of
multiplies decreases to 16 [51].

V.2.2. Other Parameterizations. Every elliptic curve is isomorphic to a
Weierstraß form. Parameterizations other than the Weierstraß form may lead
to faster uni¬ed point addition formul¦. This was independently suggested
by Joye and Quisquater [188] and by Liardet and Smart [220]. We illustrate
the topic with elliptic curves de¬ned over a large prime ¬eld Fp , p > 3.

V.2.2.1. Hessian Form. Most elliptic curves containing a copy of Z/3Z
(and thus with a group order a multiple of 3) can be expressed by the (pro-
jective) Hessian form [309]1
H : X 3 + Y 3 + Z 3 = 3DXY Z . (V.5)
Let P1 = (X1 : Y1 : Z1 ) and P2 = (X2 : Y2 : Z2 ) denote points on the Hessian
curve H. The inverse of P1 is ’P1 = (Y1 : X1 : Z1 ), and the sum P3 = P1 + P2
is de¬ned as (X3 : Y3 : Z3 ) with
X3 = X2 Y12 Z2 ’ X1 Y22 Z1 , Y3 = X1 Y2 Z2 ’ X2 Y1 Z1 , Z3 = X2 Y2 Z1 ’ X1 Y1 Z2
2 2 2 2


. 3
( 10)