is at most one challenge CH for which there exists a response RSP such that DEC(CMT, CH, RSP) = 1. This property is called

strong soundness.

(3) The protocol is honest-verifier zero knowledge (HVZK), meaning there exists a probabilistic polynomial-time simulator S such that the following two ensembles are computationally indistinguishable:

where W is any function that, given an input in L, returns a witness to its membership in L, and the view is a random variable taking value the verifier's internal coin tosses and the sequence of messages it receives during an interaction between the prover (with the witness as auxiliary input) and the verifier on the common input.


284 Mihir Bellare and Adriana Palacio

Fig. 3. HTP and pHTP. Verifier V of protocol HTP = (P, V) does not include the highlighted portion. The verifier of protocol pHTP does.

If a 3-round argument for an NP-complete language meets the three conditions above, then we refer to it as a canonical argument. In what follows, we assume that we have such canonical arguments. They can be constructed in various ways. For example, a canonical argument can be constructed by modifying the parallel composition of Blum's zero-knowledge protocol for the Hamiltonian circuit problem [5], as described in [11, 12].

THE HADA-TANAKA PROTOCOL. Let a canonical argument for an NP-complete language L be given, and let DEC be the verifier's decision predicate. The Hada-Tanaka protocol HTP = (P, V) is described in Figure 3. Note that V's decision predicate does not include the highlighted portion of its code.

We now observe that the HTP protocol is unsound. More precisely, there

exist canonical arguments such that the HTP protocol based on them does not

have negligible error. This is true for any canonical argument satisfying the extra condition that, for infinitely many common inputs not in L, there exists a commitment for which there is a response to challenge 1 that makes the verifier accept. There are many such canonical arguments. For instance, a canonical argument satisfying this condition results from using an appropriate encoding of group elements in Hada and Tanaka's modification of the parallel composition of Blum's zero-knowledge protocol for the Hamiltonian circuit problem.


285

The Knowledge-of-Exponent Assumptions

Proposition 3. Let HTP be the Hada-Tanaka protocol based on a canonical

argument satisfying the condition stated above. Then there exists a polynomial-

time prover for HTP that can make the verifier accept with probability one for

infinitely many common inputs not in L.

Proof (Proposition 3). Let V be the verifier of the protocol HTP corresponding to the canonical argument. Consider a cheating prover that, given its initial state, selects an exponent uniformly at random and sends the corresponding commitment to verifier V. Upon receiving a challenge (B, X), it checks the consistency of the challenge; if the check fails, it aborts. Otherwise, it computes and sends its response to V. By the assumption about the canonical argument, for infinitely many common inputs not in L there exists an auxiliary input for which this prover makes V accept with probability one.

PROTOCOL PHTP. The above attack can be avoided by modifying the verifier to include the highlighted portion of the code in Figure 3. We call the resulting protocol pHTP and its verifier the pHTP verifier. The following theorem guarantees that the protocol is sound under KEA3, if the DLP is hard.

Theorem 2. If KEA3 holds, the DLA holds, and the underlying protocol is a canonical 3-round argument for an NP-complete language L, then pHTP as defined in Figure 3 is a negligible-error argument for L.

PROOF OF THEOREM 2. The proof is almost identical to that of [12]. For completeness, however, we provide it.

Completeness follows directly from the completeness of the canonical argument. To prove soundness, we proceed by contradiction. Assume that pHTP is not sound, i.e., that there is no negligible function such that the soundness condition in Definition 1 holds with respect to pHTP. We show that the DLP is easy under KEA3.

By the assumption that pHTP is not sound and a result of [2], there exists

a probabilistic polynomial-time algorithm such that the function

is not negligible. Hence there exists a probabilistic polynomial-time algorithm

a polynomial and an infinite set

such that for every

and such that is infinite.

Since takes an auxiliary input we may assume, without loss of generality,

that is deterministic. We also assume that, if is commit-

ment on input when the initial state is for some with

then (There exists a prover for which

2

We note that this set is finite since is a polynomial-time algorithm and

depends only on the first bits of where is the running time of



Fig. 4. Adversary A for the KEA3 problem and adversary J for the DLP, for the proof of Theorem 2.

for every such prover this assumption holds.) We will use this prover to construct an adversary A for the KEA3 problem. By assumption, there exists for it an extractor with negligible error bound. Using the adversary and the extractor, we then present a poly-size family of randomized circuits J and show that it does not have a negligible success bound. By Proposition 1, this shows that the DLP is not hard.

Let K be the set guaranteed by the above; we observe that K is an infinite set. For each member of K, fix an auxiliary input as above. The poly-size family of circuits A is presented in Figure 4. Now, under KEA3, there exists a poly-size family of circuits and a negligible function v such that the former is an extractor for A with error bound v. For each member of K, let the initial state be defined by the corresponding commitment. Using this extractor, we define the poly-size family of circuits J shown in Figure 4. The proof of the following claim is in [4].



Claim 2. For infinitely many there exists such that for every

Claim 2 implies that J does not have a negligible success bound, which, by

Proposition 1, shows that the DLP is not hard, contradicting the assumption

made in the theorem.

ZERO KNOWLEDGE OF PHTP. Having modified HTP, we need to revisit the zero

knowledge. Hada and Tanaka proved that if the canonical argument is HVZK

(property (3) above) then HTP is zero knowledge under KEA1. However, we

observe that pHTP modifies only the verifier, not the prover. Furthermore, only

the decision predicate of the verifier is modified, not the messages it sends. This

means that the view (i.e., the internal coin tosses and the sequence of messages received during an interaction with a prover P) of the verifier of pHTP is identical to that of verifier V of HTP. Thus, zero knowledge of pHTP follows from zero

knowledge of HTP, and in particular is true under the same assumptions, namely

KEA1.

SUMMARY. In summary, pHTP is a 3-round protocol that we have shown is a

negligible-error argument for NP assuming DLA and KEA3, and is ZK assuming

KEA1. Given Proposition 2, this means we have shown that assuming DLA and

KEA3 there exists a 3-round negligible-error ZK argument for NP.

Acknowledgments

Mihir Bellare is supported in part by NSF grants CCR-0098123, ANR-0129617

and CCR-0208842, and by an IBM Faculty Partnership Development Award.

Adriana Palacio is supported in part by an NSF Graduate Research Fellowship.

Proposition 2 is due to Shai Halevi and we thank him for permission to

include it. We thank the Crypto 2004 referees for their comments on the paper.

References

1. B. BARAK. How to go beyond the black-box simulation barrier. Proceedings of the 42nd Symposium on Foundations of Computer Science, IEEE, 2001.

2. M. BELLARE. A note on negligible functions. Journal of Cryptology, Vol. 15, No. 4, pp. 271–284, June 2002.

3. M. BELLARE AND S. MICALI. Non-interactive oblivious transfer and applications. Advances in Cryptology – CRYPTO '89, Lecture Notes in Computer Science Vol. 435, G. Brassard ed., Springer-Verlag, 1989.

4. M. BELLARE AND A. PALACIO. The Knowledge-of-Exponent assumptions and 3-round zero-knowledge protocols. Full version of the current paper, available via http://www-cse.ucsd.edu/users/mihir.

5. M. BLUM. How to prove a theorem so no one else can claim it. Proceedings of the International Congress of Mathematicians, pp. 1444–1451, 1986.

6. G. BRASSARD, D. CHAUM AND C. CRÉPEAU. Minimum disclosure proofs of knowledge. J. Computer and System Sciences, Vol. 37, No. 2, pp. 156–189, October 1988.

7. I. DAMGÅRD. Towards practical public-key cryptosystems provably-secure against chosen-ciphertext attacks. Advances in Cryptology – CRYPTO '91, Lecture Notes in Computer Science Vol. 576, J. Feigenbaum ed., Springer-Verlag, 1991.

8. A. FIAT AND A. SHAMIR. How to prove yourself: Practical solutions to identification and signature problems. Advances in Cryptology – CRYPTO '86, Lecture Notes in Computer Science Vol. 263, A. Odlyzko ed., Springer-Verlag, 1986.

9. O. GOLDREICH AND H. KRAWCZYK. On the Composition of Zero Knowledge Proof Systems. SIAM J. on Computing, Vol. 25, No. 1, pp. 169–192, 1996.

10. S. GOLDWASSER, S. MICALI AND C. RACKOFF. The knowledge complexity of interactive proof systems. SIAM Journal of Computing, Vol. 18, No. 1, pp. 186–208, February 1989.

11. S. HADA AND T. TANAKA. On the existence of 3-round zero-knowledge protocols. Advances in Cryptology – CRYPTO '98, Lecture Notes in Computer Science Vol. 1462, H. Krawczyk ed., Springer-Verlag, 1998. [Preliminary version of [12].]

12. S. HADA AND T. TANAKA. On the existence of 3-round zero-knowledge protocols. Cryptology ePrint Archive: Report 1999/009, March 1999. http://eprint.iacr.org/1999/009/. [Final version of [11].]

13. M. LEPINSKI. On the existence of 3-round zero-knowledge proofs. SM Thesis, MIT, June 2002. http://theory.lcs.mit.edu/~cis/theses/lepinski-masters.ps.

14. M. LEPINSKI AND S. MICALI. On the existence of 3-round zero-knowledge proof systems. MIT LCS Technical Memo. 616, April 2001. http://www.lcs.mit.edu/publications/pubs/pdf/MIT-LCS-TM-616.pdf.

15. M. NAOR. On cryptographic assumptions and challenges. Invited paper and talk, Advances in Cryptology – CRYPTO '03, Lecture Notes in Computer Science Vol. 2729, D. Boneh ed., Springer-Verlag, 2003.

16. K. SAKURAI AND T. ITOH. On the discrepancy between serial and parallel of zero-knowledge protocols. Advances in Cryptology – CRYPTO '92, Lecture Notes in Computer Science Vol. 740, E. Brickell ed., Springer-Verlag, 1992.

A KEA3 Implies KEA1

We recall KEA1, following [12], but applying the same simplifications as we did

for KEA2 so as to merge their two conditions into one:

Assumption 4. [KEA1] Let A and Ā be families of circuits, and let v be a function. We associate to any such A, any Ā, and any choice of inputs the following experiment:

Experiment

If the output of A satisfies the KEA1 relation AND the output of Ā does not recover the exponent, then return 1, else return 0

We let



denote the advantage of A relative to Ā on those inputs. We say that Ā is a kea1-extractor for A with error bound v if the advantage is bounded by v. We say that KEA1 holds if for every poly-size family of circuits A there exists a poly-size family of circuits Ā and a negligible function v such that Ā is a kea1-extractor for A with error bound v.
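The elided experiment and advantage can be restated explicitly. The following is our reconstruction of the standard KEA1 formalization (cf. [12] and the full version [4]); the variable names are ours, and details of quantification may differ from the authors' exact statement:

```latex
\textbf{Experiment } \mathrm{Exp}^{\mathrm{kea1}}_{A,\bar{A}}(q,g,a):\\
\quad (C,Y) \leftarrow A(q,g,g^{a});\qquad c \leftarrow \bar{A}(q,g,g^{a})\\
\quad \text{If } \big(\,Y = C^{a}\ \text{AND}\ g^{c} \neq C\,\big)
      \text{ then return } 1 \text{ else return } 0
```

Here the advantage of A relative to Ā on (q, g, a) is the probability that the experiment returns 1, and Ā is a kea1-extractor for A with error bound v if this advantage is at most v on all inputs.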

Proof (Proposition 2). Let A be an adversary (poly-size family of circuits) for KEA1. We need to show that there exist a negligible function v and a poly-size family of circuits Ā such that Ā is a kea1-extractor for A with error bound v. We begin by constructing from A the following adversary for KEA3:

Adversary

Return (C, Y)

We have assumed KEA3. Thus there exists a negligible function v and an extractor that is a kea3-extractor for the constructed adversary with error bound v. Using it, we now define an extractor for A as follows:

Extractor

Return

We claim that this is a kea1-extractor for A with error bound v. To see this, assume A is successful, meaning its output satisfies the KEA1 relation. Then the constructed KEA3 adversary is successful as well.


Near-Collisions of SHA-0

Eli Biham and Rafi Chen

Computer Science Department

Technion “ Israel Institute of Technology

Haifa 32000, Israel

{biham,rafi_hen}@cs.technion.ac.il

http://www.cs.technion.ac.il/~biham/

Abstract. In this paper we find two near-collisions of the full compression function of SHA-0, in which up to 142 of the 160 bits of the output

are equal. We also find many full collisions of 65-round reduced SHA-0,

which is a large improvement to the best previous result of 35 rounds.

We use the very surprising fact that the messages have many neutral

bits, some of which do not affect the differences for about 15“20 rounds.

We also show that 82-round SHA-0 is much weaker than the (80-round)

SHA-0, although it has more rounds. This fact demonstrates that the

strength of SHA-0 is not monotonous in the number of rounds.

1 Introduction

SHA-0 is a cryptographic hash function, which was issued as a Federal Infor-

mation Processing Standard (FIPS-180) by NIST in 1993 [8]. It is based on the

principles of MD4 [12] and MD5 [13]. The algorithm takes a message of any length up to 2^64 bits and computes a 160-bit hash value. A technical revision,

called SHA-1, which specifies an additional rotate operation to the algorithm,

was issued as FIPS-180-1 [9] in 1995. The purpose of the revision according to

NIST is to improve the security provided by the hash function.

Finding collisions of hash functions is not an easy task. The known cases

of successful finding of collisions (such as the attack on Snefru [14, 2], and the

attack on MD4 [12, 4]) are rare, and use detailed weaknesses of the broken functions. It is widely believed that finding near-collisions (i.e., two messages that hash to almost the same value, with a difference of only a few bits) is as difficult, or almost as difficult, as finding a full collision. The Handbook of Applied

Cryptography [7] defines near-collision resistance by

near-collision resistance. It should be hard to find any two inputs x, x′ such that h(x) and h(x′) differ in only a small number of bits.

and states that it may serve as a certificational property. In some designs of hash

functions, such as SHA-2/224 [10], SHA-2/384 [11], and Tiger [1], designers who wish to allow several hash sizes for their design base the version with the smaller size on the one with the larger size, and discard some of the output bits,

thus showing the confidence of the designers in the difficulty of finding near-

collisions. Near-collisions were also used in the cryptanalysis of MD4 [15,4].

M. Franklin (Ed.): CRYPTO 2004, LNCS 3152, pp. 290“305, 2004.

© International Association for Cryptologic Research 2004



Near-collisions are the simplest example of forbidden relations between outputs

of the hash function. Another proposed forbidden relation of the hash results

is division intractability [5], where finding messages hashed to a divisor of other

hashes should be difficult.

In [3] Chabaud and Joux proposed a theoretical attack on the full SHA-0

with complexity of about 2^61. Using their technique they found a collision of SHA-0 reduced to 35 rounds.

In this paper we improve over the results of [3], and present attacks with

lower complexities. We present collisions of 65-round reduced SHA-0, and near-

collisions of the full compression function of SHA-0 in which up to 142 of the

160 bits of the hash value are equal. We use the very surprising observation that

many bits of the message are neutral bits, i.e., they do not affect the differences of

the intermediate data for 15–20 rounds. We observe that the strength of SHA-0 is

not monotonous, i.e., collisions of 82 rounds are easier to find than of 80 rounds,

and use it in our search for near-collisions. We also present several observations

on variants of SHA-0.

A comparison of Chabaud and Joux's results with our results is given in

Table 1.

Table 2 shows the complexity of finding collisions of reduced and extended

SHA-0, as a function of the number of rounds. The table demonstrates that

the strength of SHA-0 is not monotonous with the number of rounds. In the

complexity calculations we assume that for the extended SHA-0, the additional

rounds after the original 80 rounds are performed with the round function being XOR, like in rounds 60, ..., 79 that precede them. We also assume that the first

22 rounds can be gained for free by using the neutral bits.

A comparison between finding near-collisions using a generic attack and our

attack is given in Table 3. Note that the generic attack hashes a large number

of random messages, all of which are then kept in memory. Due to the birthday paradox, a collision or near-collision is expected with complexity (number of messages) about



where k is the Hamming weight of the difference. As this attack is generic, it uses no special properties of SHA-0, and thus cannot be used to gain insight into its design.
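To make the generic bound concrete, here is a small back-of-the-envelope sketch (our own approximation, not a formula taken from the paper): by the birthday paradox, a set of N random 160-bit hash values is expected to contain two values within Hamming distance k of each other once N is roughly the square root of 2^160 divided by the number of 160-bit values of weight at most k. The function names are illustrative.

```python
# Hedged sketch: rough birthday-paradox complexity of a generic
# near-collision search on a 160-bit hash. Our own approximation,
# not a formula taken from the paper.
from math import comb, isqrt

def ball(n, k):
    """Number of n-bit strings of Hamming weight at most k."""
    return sum(comb(n, i) for i in range(k + 1))

def generic_complexity(n=160, k=0):
    """Approximate number of hashed messages needed before some pair
    differs in at most k bits (k = 0 is a full collision, ~2^80)."""
    return isqrt((1 << n) // ball(n, k))
```

For k = 0 this recovers the usual 2^80 birthday bound; for larger k the bound shrinks substantially, yet the attack remains generic and teaches nothing about the design.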

This paper is organized as follows: Section 2 describes the SHA-0 algorithm,

and a few notations. Section 3 describes the attack of Chabaud and Joux. Our

improved attack is presented in Section 4. Two pairs of near-collisions of the

compression function of SHA-0 and full collision of 65-round reduced SHA-0 are

given in Section 5. Section 6 describes small variations of SHA-0 that largely

affect its security. Finally, Section 7 summarizes the paper.

2 Description of SHA-0

SHA-0 hashes messages of any length in blocks of 512 bits, and produces a

message digest of 160 bits.

1. The message is padded with a single bit '1', followed by 0–511 '0' bits, followed by a 64-bit representation of the message length, where the number of zeroes is selected to ensure that the total length of the padded message is a multiple of 512 bits. The padded message is divided into 512-bit blocks.

2. A 5-word buffer is initialized to the hexadecimal values 67452301, EFCDAB89, 98BADCFE, 10325476, and C3D2E1F0.



3. Each block in turn is subjected to the compression function, along with the current value of the buffer. The output is a new value for the buffer.

4. The final value of the buffer is the output of the hash function.

The compression function is:

1. Divide the 512-bit block into 16 32-bit words W_0, ..., W_15.

2. Expand the 16 words to 80 words W_0, ..., W_79 by the recurrence equation

W_t = W_{t-3} XOR W_{t-8} XOR W_{t-14} XOR W_{t-16}, 16 ≤ t ≤ 79. (1)

We denote expansion of a block to 80 words by this equation by exp(·), and note that exp(·) is linear, i.e., exp(M XOR M′) = exp(M) XOR exp(M′).

3. Divide the buffer into the five registers A, B, C, D, and E.

4. Iterate the following round function 80 times:

TEMP = (A <<< 5) + f_t(B, C, D) + E + W_t + K_t; E = D; D = C; C = B <<< 30; B = A; A = TEMP,

where the functions f_t and constants K_t used in each round are described in Table 4.

5. The output of the compression function is the word-wise sum modulo 2^32 of the input buffer and the values of A, B, C, D, and E after the last round.

In the remainder of the paper we consider only 512-bit messages and only the

first application of the compression function. We denote the bit of by

and similarly we denote the bits of and by

and We also use the notation to denote the output of

in round and denotes the bit of
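The expansion and round function described above can be sketched in code. This is our illustrative implementation, assuming the standard FIPS-180 round constants and initial values (SHA-0 differs from SHA-1 only in that the expansion recurrence (1) has no rotation); it is a sketch for clarity, not a verified reference implementation, and omits padding.

```python
# Hedged sketch of the SHA-0 compression function (FIPS-180, 1993).
MASK = 0xFFFFFFFF

def rotl(x, n):
    """Rotate a 32-bit word left by n positions."""
    return ((x << n) | (x >> (32 - n))) & MASK

def f(t, b, c, d):
    """Round functions: IF (0-19), XOR (20-39), MAJ (40-59), XOR (60-79)."""
    if t < 20: return (b & c) | (~b & d)
    if t < 40: return b ^ c ^ d
    if t < 60: return (b & c) | (b & d) | (c & d)
    return b ^ c ^ d

K = [0x5A827999, 0x6ED9EBA1, 0x8F1BBCDC, 0xCA62C1D6]  # one constant per 20 rounds
IV = [0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476, 0xC3D2E1F0]

def expand(words):
    """Expand 16 words to 80 by W[t] = W[t-3]^W[t-8]^W[t-14]^W[t-16] (no rotate)."""
    W = list(words)
    for t in range(16, 80):
        W.append(W[t - 3] ^ W[t - 8] ^ W[t - 14] ^ W[t - 16])
    return W

def compress(h, block16):
    """One application of the compression function on a 16-word block."""
    W = expand(block16)
    a, b, c, d, e = h
    for t in range(80):
        a, b, c, d, e = ((rotl(a, 5) + f(t, b, c, d) + e + W[t] + K[t // 20]) & MASK,
                         a, rotl(b, 30), c, d)
    return [(x + y) & MASK for x, y in zip(h, (a, b, c, d, e))]
```

Complementing, for example, bit 1 of one input word and recomputing `compress` shows directly how a single disturbance propagates through the registers.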



3 Description of Chabaud and Joux Attack

In the attack of Chabaud and Joux [3] messages are constructed with specific

differences, such that the effect of the differences of the messages on the difference

of the registers A, ..., E can be canceled within a few rounds. The cancellation

is performed by applying correcting patterns by additional differences in the

messages.

The attack is initiated by a selection of a difference that is later used as the

difference of the two colliding messages. The difference is selected with various

disturbances and corrections, where the corrections are additional differences

used to correct the differences caused by the disturbances. The disturbances are

always selected in bit 1 of the message words. Due to the rotations by 5 and

30 bits in the round function, corrections are made in bits 1, 6, and 31 of the

words. These disturbances and corrections are aimed to limit the evolution of

differences to other bits. The result is that, in an expected run, A and A′ can only differ in bit 1, and each time they differ, they cause differences in the other registers in the following rounds, which are then corrected by differences of the messages.

A disturbance starts by setting bit 1 in one of the input words of M′ as the complement of the corresponding bit of M. We now show how applying a

correction sequence on bits 6, 1, 31, 31, 31 on the following words may cancel the differences at the end of the sequence. Suppose the initial disturbance is in word i. This input difference causes registers A and A′ to differ at bit 1. On

each consequent round the difference moves to the next register (B, C, D or E), while the corrections of bits 6, 1, 31, 31, 31 in the corresponding input words, respectively, keep registers A and A′ equal in these rounds. After this sequence of a single disturbance and five corrections, the registers' contents are equal. By

generating M′ from M by applying this mask, and calculating the difference of A and A′ at each round, we can get the differences described in Table 5 with a non-negligible probability. The table describes a disturbance with a '0'-to-'1' difference and the required corrections. A similar disturbance and corrections can be applied for a '1'-to-'0' difference. One notation in the table refers to a change



where a bit is '0' in W and '1' in W′. The other notation means that there is a change either from '0' to '1' or from '1' to '0'.

Let D be a vector of 80 words, which corresponds to the 80 rounds of the compression function. Each word in the vector is set to '1' if there is a disturbance in the corresponding round, and is set to '0' otherwise. We call this vector the disturbance vector. Since getting a collision for the full function requires five

correcting rounds, full collisions require the last five words of the disturbance vector to be zero (but for near-collisions this property is not required). Let D^(i) be the vector of 80 words received by prepending i zero words to the first 80−i words of D (i.e., a non-cyclic shift operation of the words). Then, the corrections are made in bit 6 in the rounds which correspond to non-zero words in D^(1), in bit 1 in D^(2), and in bit 31 in D^(3), D^(4), and D^(5).

Thus, the expansion of the message difference to 80 rounds can be written in the form

where the shift operator denotes a shift of each word of the vector separately. In addition, since the difference is expanded by the linear feedback shift register of Equation (1), the disturbance vector is also generatable by this linear feedback shift register. See [3] for additional details on the attack, and the additional required constraints.
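The disturbance-plus-corrections pattern can be illustrated by a small helper that turns a disturbance vector into the per-word XOR difference between M and M′. This is our sketch of the pattern described above (bit 1 at the disturbance round, bit 6 one round later, bit 1 two rounds later, bit 31 three, four, and five rounds later); the names are illustrative, and the carry-sign details of the real attack are ignored.

```python
# Hedged sketch: per-round XOR mask W ^ W' derived from a disturbance
# vector d (d[i] = 1 iff there is a disturbance in round i), following
# the Chabaud-Joux correction pattern. Illustrative, not the paper's code.
def difference_mask(d):
    mask = [0] * len(d)
    for i, bit in enumerate(d):
        if not bit:
            continue
        # (round offset, bit position): disturbance then five corrections.
        for offset, pos in ((0, 1), (1, 6), (2, 1), (3, 31), (4, 31), (5, 31)):
            if i + offset < len(mask):
                mask[i + offset] ^= 1 << pos
    return mask
```

Applying the mask word-wise, W′_t = W_t XOR mask[t], yields the second message of the pair; overlapping disturbances combine by XOR, matching the linearity of the expansion.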

We expect that the value of the difference of A after each round be given by the corresponding word of the disturbance vector if all the corrections succeed (i.e., only disturbances in the current round affect the difference after the round). Thus, we obtain the vector of the expected values of these differences, which we denote by Δ (note that the indices of Δ are 1, ..., 80, rather than 0, ..., 79).

As the correction process is probabilistic, and assuming each disturbance has

the same probability for correction, we are interested in the disturbance vector

with the least Hamming weight for getting the least search complexity (but

note that the correction probabilities vary, and depend on the round functions used in the correction rounds).

4 Our Improved Attack

Our attack is based on the attack of Chabaud and Joux with enhancements that

increase the probability of finding collisions and near-collisions.

The main idea is to start the collision search from some intermediate round, thus eliminating the probabilistic behavior of prior rounds. In order to start the collision search from round r, we build a pair of messages M and M′ with the chosen difference and with the two additional properties described below.

Before we describe these properties we wish to make the following definitions:



Definition 1. Given the difference of two messages, the attack of Chabaud and Joux defines the expected differences of the values of register A in each round. We say that a pair of messages conforms to Δ_r if the difference of A equals the expected value in every round up to r (which means that the differences at the output of the first rounds 0, ..., r are as expected).

Definition 2. Let M and M′ be a pair of messages that conforms to Δ_r for some r. We say that the j-th bit of the messages is a neutral bit with respect to M and M′ if the pair of messages received by complementing the j-th bits of M and M′ also conforms to Δ_r. We say that the pair of the i-th and j-th bits is neutral with respect to M and M′ if all the pairs of messages received by complementing any subset of these bits in both messages M and M′ also conform to Δ_r. We say that a set S of bits is neutral with respect to M and M′ if all pairs of messages received by complementing any subset of the bits in S in both messages M and M′ also conform to Δ_r. We say that a subset S of the bits of the messages is a 2-neutral set with respect to M and M′ if every bit in S is neutral, and every pair of bits in S is neutral.

We denote the size of the maximal 2-neutral set (for given messages and round r) by n(r). We are now ready to describe the two additional properties:

1. The message pair conforms to Δ_r. Having the required sequence of differences of A implies that all other differences (i.e., of the registers B, C, D, and E) are also as required.

2. The message pair has a large-enough 2-neutral set of bits. We expect that a

large fraction of the subsets of the bits in the 2-neutral set are also neutral.

Given a pair of messages with these properties, we can construct a set of 2^{n(r)} message pairs by complementing subsets of the bits of the 2-neutral set. Since a large fraction of these pairs conforms to Δ_r, while the probability for random pairs is much smaller, it is advisable to use these pairs for the attack.

How are r and the message pair determined? Starting the search from round r, we can calculate the probability p_r, the product of the per-round probabilities p_i of successful corrections over all the rounds i ≥ r, given messages that conform to Δ_{r−1} (where p_i is the probability of successful corrections in round i, or 1 if no correction is performed). When the disturbance vector has zeroes at the last five rounds, p_r is the probability for getting a collision (otherwise, a near-collision is expected). The number of conforming pairs we need to test is expected to be about 1/p_r. Since every subset of neutral bits can be used, we can try 2^{n(r)} pairs using these bits. Thus, we should select r that satisfies 2^{n(r)} ≥ 1/p_r. In fact, we select the largest r that satisfies this inequality.
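The selection rule above can be sketched as a simple scan, assuming arrays p and n holding p_r and n(r) for each candidate round; this is an illustration of the rule, not code from the paper.

```python
# Hedged sketch: pick the largest starting round r with 2^{n(r)} >= 1/p_r,
# i.e., enough neutral-bit subsets to cover the expected number of trials.
def choose_r(p, n):
    """p[r]: probability that all corrections from round r on succeed;
       n[r]: size of the 2-neutral set available at round r."""
    best = None
    for r in range(len(p)):
        if 2 ** n[r] >= 1 / p[r]:
            best = r
    return best
```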



4.1 Finding 2-Neutral Sets of Bits of a Given Pair

The following algorithm finds a 2-neutral set of bits. The input to the algorithm is a pair of messages M, M′ with a difference that conforms to Δ_r. The algorithm generates 512 candidate pairs by complementing single bits in M, M′ (leaving their difference unchanged). Let e_j denote a message whose value has a single bit '1', and 511 bits '0', where the bit '1' is in the j-th location. The candidate pairs can then be written as (M XOR e_j, M′ XOR e_j). Each candidate pair is tested to conform to Δ_r. If a candidate pair conforms to Δ_r, then bit j is a neutral bit.

In order to find a 2-neutral set of bits we define a graph whose vertices correspond to the neutral bits. We then add an edge for each pair of bits whose simultaneous complementation does not affect conformance. This graph describes all the bits whose complementation does not affect conformance, and all the pairs of these bits whose simultaneous complementation does not affect conformance. We are now interested in finding the maximal clique (or an almost maximal clique) in this graph, i.e., the maximal subset of vertices in which any vertex is connected to every other vertex of the subset by an edge. Although in general finding a maximal clique is an NP-complete problem, in our case finding a large enough clique is not difficult, as many vertices are connected to all other vertices by edges.
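The neutral-bit test and the graph construction can be sketched as follows. Messages are modeled as 512-bit integers, and `conforms` is a placeholder for running the compression function on both messages and comparing the register-A differences against the expected values up to round r; all names are illustrative, not from the paper.

```python
# Hedged sketch of the neutral-bit search of Section 4.1.
from itertools import combinations

def flip(msg, j):
    """Complement bit j of a message modeled as a 512-bit integer."""
    return msg ^ (1 << j)

def neutral_bits(M, delta, r, conforms):
    """Bits whose complementation in both M and M ^ delta keeps conformance."""
    return [j for j in range(512)
            if conforms(flip(M, j), flip(M, j) ^ delta, r)]

def neutral_pairs(bits, M, delta, r, conforms):
    """Graph edges: pairs of neutral bits that are simultaneously neutral."""
    edges = []
    for i, j in combinations(bits, 2):
        M2 = flip(flip(M, i), j)
        if conforms(M2, M2 ^ delta, r):
            edges.append((i, j))
    return edges
```

A large clique can then be grown greedily starting from the vertices of highest degree, which, as noted above, is easy in practice.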

We are now ready to make some very important observations, on which the

success of our attack is based:

Observation 1 When we perform a search with the set of 2^{n(r)} message pairs, about 1/8 of the pairs (i.e., about 2^{n(r)−3} pairs) conform to Δ_r.

Let p_{r1,r2} be the probability that a pair that conforms to Δ_{r1} also conforms to Δ_{r2}, and notice that it equals the product of the correction probabilities of rounds r1 + 1, ..., r2.

Observation 2 Let r1 and r2 be some rounds with r1 < r2. By trying the generated message pairs, we get the expected number of pairs conforming to Δ_{r1}, but surprisingly a larger-than-expected fraction of the pairs that conform to Δ_{r1} also conform to Δ_{r2}, a number we would otherwise expect to get only with a larger set of pairs.

In the actual attack we improve the algorithm further by searching for pairs of non-neutral bits whose simultaneous complementation creates pairs that also conform to Δ_r (and similarly search for triplets of bits, or larger sets of bits).

Using this method we obtain a larger number of neutral "bits" that can be used for our analysis at higher rounds.



An example of a pair of messages with its neutral set of bits is given in Table 6. In this example the size of the neutral set is 40.

In particular, the quadruplet 229 137 108 71 consists of bits of rounds 7, 4, 3,

and 2, so the changes at round 2 are successfully corrected by the changes in the other rounds, and the difference is unaffected for 20 rounds; even from round 7 there are 15 additional rounds whose difference is not affected.

Observation 3 In many cases pairs of bits that are simultaneously neutral, though each bit on its own is not, lie five bit positions apart in consecutive words. Similarly, triplets (and quartets, etc.) of non-neutral bits whose simultaneous complementation is neutral are of the same form, i.e., chains of bits shifted by five positions in consecutive words. We call such sets of bits simultaneous-neutral sets, and in the case of pairs of bits, simultaneous-neutral pairs.

4.2 Finding a Pair with a Larger 2-Neutral Set

For the attack, we are interested in finding a message pair with a maximal 2-

neutral set of bits. Assume that we are already given a pair conforming to Δ_r. We now modify this pair slightly in order to get another pair that conforms to Δ_r with a larger 2-neutral set of bits.

This algorithm takes the given message pair as a base, modifies it in a certain

way that we describe later, and calls the algorithm that finds the 2-neutral set



of the new pair. If the size of this set is larger than the set of the base pair, the

base pair is replaced by the new pair, and the algorithm restarts with the new

pair as the base.

By modifying the current message pair we create a new pair that hopefully conforms to Δ_r. The modifications are made in bits that maximize the probability of success. In order to create a new conforming pair, we modify several neutral bits (and simultaneous-neutral sets of bits), and check whether the resultant pair conforms to Δ_r.

In some cases we can improve further. In rounds where bit 1 differs, the carry from bit 1 to the next bit can create a difference in that bit. The probability for this carry to make this difference is 1/2. In such a case the difference is not as expected, and thus the new pair does not conform to Δ_r.

Observation 4 If the difference of the carry is changed, the change can be canceled by complementing the corresponding bits, or by complementing other bits in the message that affect the carry indirectly.

Such bits are also and (which affect and then after the

rotate operation), or and or and

for other small offsets. Each such complementation has probability 1/2 to cancel the difference in the carry.

This algorithm can be simplified as follows: The algorithm takes as an input a message and modifies a few subsequent bits in several subsequent words, with the shift of five bits as mentioned above. For example, the modified bits cover all (non-empty) subsets of a few such bit positions. Then, the pattern of modification is shifted by all 31 possible rotations. Finally, we proceed and make the same analysis starting from the next bit position, and so on. The modification process ends when the algorithm has covered the whole message. This simplification lacks consideration of some optimizations and details given earlier, whose incorporation is vital for an optimized implementation.

4.3 Increasing the Number of Conforming Rounds

In order to start the search at a higher round we need to construct a pair that conforms to Δ_{r′} where r′ > r. This pair is constructed using the last pair with the maximal number of neutral bits we have. The pair undergoes small modifications of the form described above. Once a pair that conforms to Δ_{r′} is found, we use the algorithms described in Subsections 4.1 and 4.2 to find a 2-neutral set, and then to find a pair with the largest 2-neutral set.

4.4 Final Search

After computing the 2-neutral set, we start the final search by complementing sequentially every subset of the bits in the 2-neutral set (a total of 2^{n(r)} trials). Since a large fraction of the resulting pairs of messages conforms to Δ_r, the search effectively starts at round r. If, in addition, 2^{n(r)} ≥ 1/p_r, then we expect


300 Eli Biham and Rafi Chen

to find a collision or a near-collision, depending on the expected difference after 80 rounds. If the expected difference is zero after some other number of rounds, then we expect to find a collision (or a near-collision) of round-reduced (or extended) SHA-0.
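The final search can be sketched as follows. This is a hypothetical illustration, not the authors' code: the message is modeled as a list of 32-bit words, neutral bits as (word, bit) position pairs, and `conforms` stands in for the check that the modified pair still follows the characteristic far enough.

```python
from itertools import chain, combinations

def all_subsets(bits):
    # All 2^k subsets of the k bit positions in the 2-neutral set.
    return chain.from_iterable(combinations(bits, r) for r in range(len(bits) + 1))

def flip_bits(msg_words, positions):
    # positions are (word_index, bit_index) pairs; flipping neutral bits
    # keeps the pair conforming up to the current round with high probability.
    out = list(msg_words)
    for w, b in positions:
        out[w] ^= 1 << b
    return out

def final_search(msg_words, neutral_bits, conforms):
    # `conforms` is a hypothetical predicate testing whether the modified
    # pair still follows the differential characteristic far enough.
    for subset in all_subsets(neutral_bits):
        candidate = flip_bits(msg_words, subset)
        if conforms(candidate):
            return candidate
    return None
```

With a 2-neutral set of 40 bits, as reported below, this loop enumerates up to 2^40 candidates, each of which already conforms to a high round with large probability.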

5 Results

In our search we used a disturbance vector that is optimized for finding 82-round collisions (thus also near-collisions of 80 rounds). This vector is not suitable for finding full collisions of 80 rounds, as it has two disturbances in the last five rounds. However, its corresponding 80-round probability is much higher than the probability of a disturbance vector that allows a full collision. Although this vector cannot provide full collisions, it can lead to collisions of 65-round reduced SHA-0 and of 82-round extended SHA-0. A probability summary for each set of 20 consecutive rounds (i.e., the IF, XOR, MAJ, XOR rounds) is described in Table 7 (in rounds 80 and 81 the probability is 1 under an additional condition on the last rounds). Using our technique of neutral bits, the overall complexity is considerably reduced. Our algorithm finds a 2-neutral set with

40 neutral and simultaneously-neutral bits (see Table 6); thus we expect to find near-collisions of the compression function after 73 rounds in two computation days on a PC. Our actual findings (using an earlier set of neutral bits) are near-collisions of the compression function with a difference of only three bits after 76 rounds (pairs that still conform to the characteristic), which are also near-collisions of the full compression function (but do not conform to the characteristic in the last rounds), and full collisions of 65-round reduced SHA-0. The near-collisions were found after about a day of computation for each pair, equivalent to a search of considerably higher complexity. Finding 65-round collisions takes about half an hour.

Two such pairs of messages (in 32-bit hex words) are:

1. AC418FC2 415D5A54 6FFA5AA9
   5EE5A5F5 7621F42D 8AE2F4CA F7ACF74B
   B144B4E1 5164DF45 C61AD50C D5833699
   6F0BB389 B6468AC5 4D4323F9 86088694

   AC418FC2 415D5A54 6FFA5AAB
   5EE5A5B5 7621F42F 0AE2F4CA 77ACF74B
   3144B4E3 5164DF05 C61AD50C 558336D9
   EF0BB38B B6468AC7 CD4323B9 06088696


Near-Collisions of SHA-0 301

2. F0722904 009D8999 5AFB3337
   37D5D6A8 9E843D80 69229FB9 06D589AA
   4AD89B67 CFCCCD2C A9BAE20D 6F18C150
   43F89DA4 2E54FE2E AE7B7A15 80A09D3D

   F0722904 009D8999 5AFB3335
   37D5D6E8 9E843D82 E9229FB9 86D589AA
   CAD89B65 CFCCCD6C A9BAE20D EF18C110
   C3F89DA6 2E54FE2C 2E7B7A55 00A09D3F

The differences of the results of hashing the messages of each pair with the full SHA-0 are described in Table 8, along with the number of differing bits. Tables 9 and 10 show detailed information on the evolution of the differences in each round of the compression function, including the expanded messages, their differences, the register differences, the probability of conformance of each round (in log form), and the rounds where the values collide, or the number of differing bits of the five registers. Both messages collide after 65 rounds, and have only small differences afterwards. If we consider SHA-0 reduced to 76 rounds, our results show a near-collision with a difference of only three bits before the feed-forward, and differences of three and four bits after the feed-forward for the two pairs.

6 SHA-0 Variants

In this section we analyze some variants of SHA-0 that show strengths and

weaknesses of the hash function.

6.1 Increasing the Number of Rounds

There are disturbance vectors that lead to a collision after 82 rounds whose probability is considerably larger than the probability of the best disturbance vector that leads to an 80-round collision. Therefore, increasing the number of rounds of SHA-0 from 80 to 82 would make it much easier to find collisions.


6.2 Different Order of Functions

Modifying the order of the functions can reduce the complexity of the attack. For example, if the order were IF, XOR, MAJ, XOR, ..., IF, XOR, MAJ, XOR, where the function changes in each round, the restrictions caused by two consecutive IF rounds would be removed, and thus disturbance vectors with much higher probabilities could be chosen.

6.3 SHA-1

In SHA-1, Equation (1) is replaced by

W_i = (W_{i-3} XOR W_{i-8} XOR W_{i-14} XOR W_{i-16}) <<< 1,

where <<< denotes rotation to the left by one bit. This makes the mixing of the message bits much more effective. Since the techniques used in this paper rely on the properties inherited from Equation (1), the presented attacks are not applicable to SHA-1.
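The difference between the two expansions is a single rotation. The sketch below (our illustration) implements both recurrences and exhibits the property the attacks exploit: in SHA-0 a one-bit input difference stays in the same bit position throughout the expansion, whereas SHA-1's rotation spreads it across positions.

```python
def rotl(x, s):
    # Rotate a 32-bit word left by s bits.
    return ((x << s) | (x >> (32 - s))) & 0xFFFFFFFF

def expand(words, rotate):
    # Message expansion: W[i] = W[i-3] ^ W[i-8] ^ W[i-14] ^ W[i-16],
    # rotated left by one bit in SHA-1 (rotate=True), not rotated in SHA-0.
    W = list(words)
    for i in range(16, 80):
        w = W[i - 3] ^ W[i - 8] ^ W[i - 14] ^ W[i - 16]
        W.append(rotl(w, 1) if rotate else w)
    return W
```

Because the SHA-0 recurrence uses only XOR and no rotation, each bit position of the expanded words depends only on the same bit position of the 16 input words.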

7 Summary

In this paper we described how to find near-collisions of SHA-0 using the surprising existence of many neutral bits. The near-collisions were found within a day on our PC. Our technique also improves the complexity of finding full collisions of SHA-0, but we concentrated on near-collisions due to the very low complexity of finding them. The observation that the strength of SHA-0 is not monotonic in the number of rounds is used here to find near-collisions of 80 rounds by applying the much more efficient attack on SHA-0 extended to 82 rounds. We expect that finding full collisions will take a month of computation time, and intend to check this in the continuation of our research. Due to the additional rotate operation, the results of this paper are not applicable to SHA-1.

References

1. Ross Anderson, Eli Biham, Tiger: A Fast New Hash Function, proceedings of Fast Software Encryption, LNCS 1039, pp. 89–97, Springer-Verlag, 1996.
2. Eli Biham, Adi Shamir, Differential Cryptanalysis of Snefru, Khafre, REDOC-II, LOKI and Lucifer, Advances in Cryptology, proceedings of CRYPTO '91, LNCS 576, pp. 156–171, 1992.
3. Florent Chabaud, Antoine Joux, Differential Collisions in SHA-0, Advances in Cryptology, proceedings of CRYPTO '98, LNCS 1462, pp. 56–71, Springer-Verlag, 1999.
4. Hans Dobbertin, Cryptanalysis of MD4, Journal of Cryptology, Vol. 11, pp. 253–271, 1998.
5. Rosario Gennaro, Shai Halevi, Tal Rabin, Secure Hash-and-Sign Signatures Without the Random Oracle, Advances in Cryptology, proceedings of EUROCRYPT '99, LNCS 1592, pp. 123–139, 1999.


6. Antoine Joux, private communications, 2004.

7. Alfred Menezes, Paul van Oorschot, Scott Vanstone, Handbook of Applied Cryptography, CRC Press, 1997.
8. National Institute of Standards and Technology, Secure Hash Standard, Federal Information Processing Standards Publication, FIPS-180, May 1993.
9. National Institute of Standards and Technology, Secure Hash Standard, Federal Information Processing Standards Publication, FIPS-180-1, April 1995.
10. National Institute of Standards and Technology, FIPS 180-2 Secure Hash Standard, Change Notice 1, Federal Information Processing Standards Publication, FIPS-180-2, December 2003.
11. National Institute of Standards and Technology, Secure Hash Standard, Federal Information Processing Standards Publication, FIPS-180-2, August 2002.
12. Ron Rivest, The MD4 Message-Digest Algorithm, Network Working Group, Request for Comments 1186, October 1990.
13. Ron Rivest, The MD5 Message-Digest Algorithm, Network Working Group, Request for Comments 1321, April 1992.
14. Ralph Merkle, A Fast Software One-Way Hash Function, Journal of Cryptology, Vol. 3, No. 1, pp. 43–58, 1990.
15. Serge Vaudenay, On the Need for Multipermutation: Cryptanalysis of MD4 and SAFER, proceedings of Fast Software Encryption, Second International Workshop, LNCS 1008, pp. 286–297, Springer-Verlag, 1995.


Multicollisions in Iterated Hash Functions.

Application to Cascaded Constructions

Antoine Joux

DCSSI Crypto Lab

51, Bd de Latour-Maubourg

75700 Paris 07 SP, France

antoine.joux@m4x.org

Abstract. In this paper, we study the existence of multicollisions in iterated hash functions. We show that finding multicollisions, i.e., r-tuples of messages that all hash to the same value, is not much harder than finding ordinary collisions, i.e., pairs of messages, even for extremely large values of r. More precisely, the ratio of the complexities of the attacks is approximately equal to the logarithm of r. Then, using large multi-

collisions as a tool, we solve a long standing open problem and prove

that concatenating the results of several iterated hash functions in or-

der to build a larger one does not yield a secure construction. We also

discuss the potential impact of our attack on several published schemes.

Quite surprisingly, for subtle reasons, the schemes we study happen to

be immune to our attack.

1 Introduction

One-way hash functions are widely used cryptographic primitives that operate on messages of almost arbitrary length1 and output a fixed-size value. Cryptographic hash functions should satisfy many security properties, such as the infeasibility of recovering an associated message from a given hash. However, the main security requirement for a hash function is its collision resistance. Informally, given a good hash function, no attacker should be able to find a pair of different messages M and M' leading to identical hash values. It is a well-known fact that all hash functions suffer from a generic birthday-paradox-based attack. More precisely, if H is a hash function that outputs n-bit values, then among the hash values of about 2^{n/2} different messages, there exists a collision with non-negligible probability. For this reason, hash functions that output values smaller than 160 bits are considered deprecated. Yet, in the past, 128-bit hash functions were proposed and for legacy reasons they are still encountered in applications.

In practice, building a cryptographic function with an input of variable size is not a simple task. For this reason, most hash functions are based on an iterated construction that makes use of a so-called compression function, whose inputs have fixed sizes. Examples of such a construction are Snefru [7], MD4 [12], MD5 [13] and SHA [9]. In this paper, we specifically study one-way hash functions built by iterating a compression function.

1 The length is often bounded by a very large number such as 2^64. However, this is irrelevant for the attacks presented here.

M. Franklin (Ed.): CRYPTO 2004, LNCS 3152, pp. 306“316, 2004.

© International Association for Cryptologic Research 2004


Our main goal is to solve a long-standing open problem: is the concatenation of two independent hash values more secure than a single hash value? This question is of general interest and has appeared in many contexts. As far as we know, this construction first appeared as a generic transform in the PhD thesis of B. Preneel [10] and was called cascading. It was presented there as a means to increase the security level at the cost of a decreased performance.

In fact, this idea of cascading hash functions is likely to be encountered in applications. For example, a construction called SHA-1x was used at some point in PGP and involves the computation of two SHA-1 values with a different set of initial constants. Similarly, the authors of RIPEMD [4] propose optional extensions of their hash functions to 256- and 320-bit values. In this case, the use of two hash computations is extremely efficient, since the original 128- and 160-bit algorithms already involve two parallel computations whose results are normally added together. Yet, according to [4], one should not expect to improve the security level with these constructions, since unwanted dependencies between two slightly different instances of the same hash function may yield unforeseen attacks. In the same vein, a length-doubling transform is suggested in the hash function chapter of [14], together with a warning that, while no attacks are known, several people have serious reservations about the construct.

As a consequence, the security of cascading hash functions is not very clear. Roughly, the cryptographic folklore states that the construction is good when two "independent" hash functions are cascaded. Clearly, this is true for random oracles, and the generalization seems natural. For reference about this folklore knowledge, the interested reader may look up Fact 9-27 in [6], which states that such a cascade is secure and that one could hope for a security of the order of the product of the security of the initial hash functions. However, we show in Section 4 that this construction is in fact insecure whenever an iterated hash function is involved in the cascading. Even cascading a 160-bit iterated hash function and a 160-bit random oracle does not really increase security above the initial 2^80 level for collision resistance and above 2^160 for preimage (or second preimage) resistance.

In order to solve this problem and prove that cascading two hash values is in fact insecure, we first address the simpler question of constructing multicollisions in an iterated hash function. This notion of multicollisions was first used by Merkle in [8] to study the security of a hash function based on DES. A related security property, namely r-collision freeness, has been suggested as a useful tool for building efficient cryptographic primitives. It was used for the micropayment scheme Micromint of Rivest and Shamir [11], for identification schemes by Girault and Stern in [5], and for signature schemes by Brickell et al. in [1]. The intuition behind this problem is that constructing r different messages with the same hash value should be much harder than constructing only two such messages. Once again, this is true when using random oracles. However, when iterated hash functions are involved, this intuition is false and multicollisions can be easily constructed.

The paper is organized as follows. In Section 2 we recall some basic facts about iterated hash functions and the possible security properties of hash functions. In Section 3, we describe the basic attack for constructing multicollisions in iterated hash functions. In Section 4, we use this attack as a tool and show that the security obtained when cascading several hash values is far from optimal. Unintuitively, this attack works even when two completely unrelated hash functions are cascaded, and does not stem from any unforeseen correlation between similar hash functions. Finally, in Section 5, we study the impact of our construction on several concrete schemes that rely on cascading or multicollision resistance. Very surprisingly, in all published examples, we encounter some obstruction which prevents the attack from working.

2 Basic Facts About Iterated Hash Functions

An iterated hash function H is built by iterating a basic compression function. The compression function f takes two inputs, a chaining variable and a message block, and outputs the next value of the chaining variable. Before processing, the message is first padded and split into elementary blocks. The padding itself is generally performed by appending a single '1' bit, followed by as many '0' bits as needed. To avoid some attacks, the binary encoding of the message length can also be added to complete the padding. This is called Merkle-Damgård strengthening [8,3]. Once the padded message is split into blocks M_1, ..., M_k, the chaining variable h_0 is set to some fixed initial value IV and the iteration is performed. To summarize, the hashing process works as follows:

Pad the original message and split it into k blocks M_1, ..., M_k.
Set h_0 to the initial value IV.
For i from 1 to k, let h_i = f(h_{i-1}, M_i).
Output H(M) = h_k.
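This iteration is easy to express in code. The following sketch is ours, not from the paper: it assumes a toy compression function built from SHA-256, a 64-byte block, and Merkle-Damgård strengthening in the padding.

```python
import hashlib

BLOCK = 64          # message block size in bytes (toy choice)
IV = b"\x00" * 32   # fixed initial chaining value

def compress(h, block):
    # Toy compression function built from SHA-256 (illustration only).
    return hashlib.sha256(h + block).digest()

def pad(msg):
    # Append a single 1-bit, then zero bits, then the 64-bit message length
    # in bits (Merkle-Damgaard strengthening).
    length = len(msg)
    msg += b"\x80"
    msg += b"\x00" * ((-len(msg) - 8) % BLOCK)
    return msg + (8 * length).to_bytes(8, "big")

def iterated_hash(msg):
    msg = pad(msg)
    h = IV
    for i in range(0, len(msg), BLOCK):
        h = compress(h, msg[i:i + BLOCK])  # h_i = f(h_{i-1}, M_i)
    return h
```

Real designs fix their own compression function, block size and padding details, but all share this chaining structure, which is what the attacks below exploit.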

Given such an iterated hash function, defining its security is a tricky matter. Ideally, the hash function is often seen as a concrete substitute for random oracles in cryptographic constructions. Of course, it is well known (see [2]) that this extreme level of security is in fact impossible to reach. Thus, the security level of a hash function is usually characterized by considering "easier" security goals. The most frequently encountered goal is the impossibility for a bounded adversary to find a collision in the hash function. We recall that a collision is a pair of different messages M and M' such that H(M) = H(M'). Due to the birthday paradox, there is a generic attack that finds collisions after about 2^{n/2} evaluations of the hash function, where n is the size in bits of the hash values. The attack works by randomly choosing messages and computing their hash values until a collision occurs. Typically, with iterated hash functions, the size of message blocks is often larger than the size of the hash values themselves, and this attack usually works on the compression function itself. Other important security goals for hash functions are preimage resistance and second-preimage resistance. An attack against preimage resistance is an attack that, given some target value h, finds a message M such that H(M) = h. An attack against second-preimage resistance, given a message M, finds another message M' such that H(M') = H(M). The best generic attacks against these security goals cost about 2^n evaluations of the function H.
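The generic birthday attack is easy to demonstrate on a deliberately truncated hash. Here we use a 24-bit stand-in (our toy choice), so about 2^12 trials suffice; against a real n-bit hash the same loop would need about 2^{n/2} evaluations.

```python
import hashlib

N_BYTES = 3  # 24-bit toy hash, so ~2^12 trials suffice on average

def H(msg):
    # Stand-in for an n-bit hash: SHA-256 truncated to 24 bits.
    return hashlib.sha256(msg).digest()[:N_BYTES]

def find_collision():
    # Generic birthday attack: hash distinct messages and store them
    # until two different messages share a hash value.
    seen = {}
    counter = 0
    while True:
        m = str(counter).encode()
        h = H(m)
        if h in seen:
            return seen[h], m
        seen[h] = m
        counter += 1

a, b = find_collision()
```

The memory used by the table can be traded away with cycle-finding techniques, but the 2^{n/2} work factor is inherent.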

The notion of collision can easily be generalized to that of an r-fold collision (or, for short, an r-collision). An r-collision is simply an r-tuple of messages M_1, ..., M_r such that H(M_1) = H(M_2) = ... = H(M_r). Assuming as above that the hash values behave almost randomly, finding an r-collision could be done by hashing about 2^{n(r-1)/r} messages. When r becomes large, this tends to 2^n. Due to this fact, relying on r-collision freeness in cryptographic constructions seems a good way to gain more security without increasing the size of the hash functions. This is very tempting in some applications such as identification schemes [5] and signature schemes [1]. The next section demonstrates that, in fact, r-collisions in iterated hash functions are not much harder to construct than ordinary collisions, even for very large values of r.
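Numerically, this estimate grows toward the cost of exhaustive search very quickly. A one-line helper (our illustration) for the standard birthday-style estimate 2^{n(r-1)/r}:

```python
def r_collision_cost(n, r):
    # Expected number of hash evaluations to find an r-collision in a
    # random n-bit hash function: about 2^(n*(r-1)/r).
    return 2.0 ** (n * (r - 1) / r)

# For a 160-bit hash: a 2-collision costs ~2^80 evaluations and a
# 4-collision already ~2^120; the cost approaches 2^160 as r grows.
```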

3 Constructing Multicollisions

In this section, we show that constructing multicollisions in iterated hash functions can be done quite efficiently. More precisely, constructing 2^t-collisions costs only t times as much as building ordinary 2-collisions. Before describing the attack, let us remark that the padding process can be ignored as long as we consider collisions between messages of the same length. Indeed, in that case, the blocks of padding are identical. Moreover, if the intermediate hash chaining values collide at some point in the hash computation of two messages, the following values remain equal as soon as the ends of the messages are identical. Thus, on messages of the same length, collisions without the padding clearly lead to collisions with the padding.

For simplicity of exposition, we assume that the size of the message blocks is bigger than the size of the hash (and chaining) values. However, the attack can be easily generalized to the other case. We also assume that we can access a collision-finding machine C that, given as input a chaining value h, outputs two different blocks B and B' such that f(h, B) = f(h, B'). This collision-finding machine may use the generic birthday attack or any specific attack based on a weakness of f. The most relevant property is that C should work properly for all chaining values2. To illustrate the basic idea, we first show how 4-collisions can be obtained with two calls to C. Starting from the initial value IV, we use a first call to C to obtain two different blocks, B_1 and B'_1, that yield a collision, i.e., f(IV, B_1) = f(IV, B'_1). Let h_1 denote this common value and, using a second call to C, find two other blocks B_2 and B'_2 such that f(h_1, B_2) = f(h_1, B'_2). Putting these two steps together, we obtain the following 4-collision: (B_1, B_2), (B_1, B'_2), (B'_1, B_2) and (B'_1, B'_2) all hash to the same value.

We now claim that this basic idea can be extended to much larger collisions by using more calls to the machine C. More precisely, using t calls, we can build 2^t-collisions in H. The attack works as follows:

2 Or at least on a fixed proportion of them.


Let h_0 be equal to the initial value IV of H.
For i from 1 to t do:
  Call C(h_{i-1}) and find B_i and B'_i such that f(h_{i-1}, B_i) = f(h_{i-1}, B'_i).
  Let h_i = f(h_{i-1}, B_i).
Pad and output the 2^t messages of the form (b_1, b_2, ..., b_t, Padding), where each b_i is one of the two blocks B_i or B'_i.

Clearly, the 2^t different messages built as above all reach the same final value. In fact, they have an even stronger property. Namely, all the intermediate hash values are equal, since all of the hashing processes go through the same chain of values h_1, h_2, ..., h_t. A schematic representation of these messages together with their common intermediate hash values is drawn in Figure 1.

Fig. 1. Schematic representation of multicollision construction
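The construction can be exercised end to end on a toy compression function. In this sketch (our illustration, not the paper's code), f truncates SHA-256 to 24 bits so that the collision machine C, a plain birthday attack, is cheap, and t = 3 calls yield a 2^3 = 8-way multicollision.

```python
import hashlib
from itertools import product

H_BYTES = 3  # 24-bit toy chaining value: each collision costs ~2^12 work

def f(h, block):
    # Toy compression function: SHA-256 of (chaining value || block), truncated.
    return hashlib.sha256(h + block).digest()[:H_BYTES]

def machine_C(h):
    # Birthday-attack collision machine: two distinct blocks colliding under h.
    seen = {}
    counter = 0
    while True:
        block = counter.to_bytes(8, "big")
        out = f(h, block)
        if out in seen:
            return seen[out], block
        seen[out] = block
        counter += 1

def multicollision(iv, t):
    # Joux's construction: t calls to C give a 2^t-collision.
    pairs, h = [], iv
    for _ in range(t):
        b, b2 = machine_C(h)
        pairs.append((b, b2))
        h = f(h, b)  # both b and b2 lead to the same h_i
    return [b"".join(blocks) for blocks in product(*pairs)], h

def chain(iv, msg, blen=8):
    # Recompute the chaining values over a message, for verification.
    h = iv
    for i in range(0, len(msg), blen):
        h = f(h, msg[i:i + blen])
    return h

IV = b"\x00" * H_BYTES
messages, final = multicollision(IV, 3)
```

All eight messages traverse the same chain of intermediate values h_1, h_2, h_3, exactly as depicted in Figure 1.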

Some generalizations. If f works on message blocks which are smaller than the chaining values, the natural way to proceed is to group a few consecutive applications of f. For example, we can consider the function f'(h, B_1||B_2) = f(f(h, B_1), B_2), which composes two rounds of the compression function. As soon as the total size of the input blocks exceeds the size of one chaining value, we can apply the original attack to the composed compression function.

Another generalization is to build 2^t-collisions from a 2-collision attack machine C that works only on a fixed proportion α of the chaining values. Of course, this is not the case with the generic birthday attack; however, it may happen with some specific attacks. In that case, the basic attack described above only works with probability α^t. Indeed, if any of the h_i does not belong to the set of chaining values that C can attack, we are in trouble. However, this bad behavior can be easily corrected by inserting a randomization step between two consecutive applications of C. Namely, after finding B_i and B'_i such that f(h_{i-1}, B_i) = f(h_{i-1}, B'_i), choose a random block R and let:

h_i = f(f(h_{i-1}, B_i), R).

If h_i fails to be in the scope of C, change R to get another candidate. Altogether, this randomization technique leads to a global complexity of the attack of the order of t/α calls to C.

4 On the Security of Cascaded Hash Functions

A natural construction to build large hash values is to concatenate several smaller hashes. For example, given two hash functions F and G, it seems reasonable, given a message M, to form the large hash value F(M)||G(M). In this construction, F and G can either be two completely different hash functions or two slightly different instances of the same hash function3. If F and G are good iterated hash functions with no attack better than the generic birthday paradox attack, we claim that the hash function F||G obtained by concatenating F and G is not really more secure than F or G by itself. Moreover, this result applies to collision resistance, preimage resistance and second-preimage resistance.

4.1 Collision Resistance

Assume that F outputs an n_F-bit hash value and G an n_G-bit value. Then, with respect to collision resistance, the security level of F is 2^{n_F/2} and the level of G is 2^{n_G/2}. If F||G were a good hash function, the complexity of the best collision attack would be 2^{(n_F+n_G)/2}. We claim that there exists a much better attack which finds collisions on F||G with complexity of the order of (n_G/2)·2^{n_F/2} if n_F ≥ n_G (respectively (n_F/2)·2^{n_G/2} if n_G > n_F). Assuming that n_F ≥ n_G, the attack works as follows.

First, using the multicollision algorithm of Section 3 with t equal to n_G/2 rounded up, construct a 2^{n_G/2}-collision on F. This costs n_G/2 calls to the basic birthday paradox attack on the compression function of F, i.e., about (n_G/2)·2^{n_F/2} operations. This yields 2^{n_G/2} different messages with the same hash value on the F side. Since G outputs n_G-bit values, we can perform a direct application of the birthday paradox on this set of 2^{n_G/2} elements and, with reasonable probability, expect that a collision occurs among the hashes of these messages by G. To increase the probability of success, it suffices to increase the value of t and add a few more calls to the basic attack on F.

Note that when evaluating the complexity of the attack, one must take into account the contribution of applying G to 2^{n_G/2} different messages of n_G/2 blocks each. With a naive implementation, this would cost (n_G/2)·2^{n_G/2} calls to the compression function of G. However, using the tree structure of the messages, this can be reduced to about 2^{n_G/2} evaluations, assuming that the compression functions of F and G operate on the same size of blocks. Otherwise, it is necessary to add some padding between blocks in order to resynchronize the two functions.

A very important fact about this attack is that it does not require G to be an iterated hash function. Any hash function will do, and this attack on cascaded hashes works even when G is replaced by a random oracle4. Since a random oracle is independent from any function, this shows that the folklore knowledge about cascading hash functions is false. Thus, at least in that case, cascading two good and independent hash functions does not significantly improve collision resistance.

3 E.g., two instances of SHA-1 with different constants.
4 The only difference in that case is that the evaluations of G on the 2^{n_G/2} messages can no longer be simplified. As a consequence, assuming that the cost of calling the random oracle G is linear in the size of the message, the contribution of G to the complexity becomes (n_G/2)·2^{n_G/2}.
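At toy sizes the full cascade attack runs in well under a second. In this sketch (ours, not the paper's code) F is a 24-bit iterated hash and G an unrelated 12-bit hash; a 2^13-multicollision on F contains more messages than G has outputs, so a collision on G, and hence on F||G, is guaranteed by the pigeonhole principle rather than left to chance.

```python
import hashlib
from itertools import product

F_BYTES = 3   # F is a 24-bit iterated hash
G_BITS = 12   # G is an unrelated 12-bit hash (need not be iterated)

def f(h, block):
    # Compression function of F: truncated SHA-256 (toy choice).
    return hashlib.sha256(h + block).digest()[:F_BYTES]

def F(msg, iv=b"\x00" * F_BYTES, block_len=8):
    h = iv
    for i in range(0, len(msg), block_len):
        h = f(h, msg[i:i + block_len])
    return h

def G(msg):
    # Unrelated short hash, playing the role of the second function.
    return int.from_bytes(hashlib.md5(msg).digest(), "big") & ((1 << G_BITS) - 1)

def birthday_collision(h):
    # Collision machine C for f under chaining value h (~2^12 work).
    seen, c = {}, 0
    while True:
        b = c.to_bytes(8, "big")
        out = f(h, b)
        if out in seen:
            return seen[out], b
        seen[out] = b
        c += 1

def attack():
    # Step 1: a 2^13-multicollision on F (13 calls to the collision machine).
    # 2^13 > 2^G_BITS, so step 2 must succeed by the pigeonhole principle.
    pairs, h = [], b"\x00" * F_BYTES
    for _ in range(13):
        b, bp = birthday_collision(h)
        pairs.append((b, bp))
        h = f(h, b)
    # Step 2: among these messages (all with equal F), find a G-collision.
    seen = {}
    for choice in product(*pairs):
        m = b"".join(choice)
        g = G(m)
        if g in seen:
            return seen[g], m
        seen[g] = m

m1, m2 = attack()
```

Note that G here is not iterated at all, mirroring the observation above that the attack works even against a random oracle on the G side.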


4.2 Preimage and Second-Preimage Resistance

Concerning preimage resistance, it is already known that, at least in some cases, the cascade of two hash functions is no stronger than the weakest hash function. Indeed, assume that we are hashing messages from a relatively small set, say a set of 2^k messages. Clearly, the best generic attack to find a preimage in that case is to perform exhaustive search on the set of messages, which costs 2^k steps. Assume that the output of each of the two hash functions being cascaded is larger than k bits and that, on this set of messages, one of the two hash functions, say F, has a shortcut attack. Then, we can clearly use this attack to recover a candidate preimage. Once this is done, it suffices to check that this candidate is also a preimage for the other function. The new attack presented in this section deals with a different case, where the entropy of the message space is much larger. It shows that even then the cascaded hash is no more secure than F itself.

Assume again that F outputs an n_F-bit hash value and G an n_G-bit value. Then, with respect to preimage resistance, the security level of F is 2^{n_F} and the level of G is 2^{n_G}. Indeed, the best known generic algorithm to break preimage resistance is to try random messages until the expected hash value is reached. This amounts to exhaustive search on the set of possible hash values. If F||G were a good hash function, the complexity of this exhaustive search attack would be 2^{n_F+n_G}. As with collision resistance, there exists a much better attack which finds a preimage on F||G with complexity of the order of 2^{n_F} if n_F ≥ n_G (respectively 2^{n_G} if n_G > n_F). Assuming that n_F ≥ n_G, the attack works as follows.

First, using the multicollision algorithm of Section 3 with t equal to n_G, construct a 2^{n_G}-collision on F. This costs n_G calls to the basic birthday paradox attack on the compression function of F, i.e., about n_G·2^{n_F/2} operations. Then, search for an additional block that maps the last chaining value to the target value of F. Note that when looking for this additional block, we need to compute the output of the complete F function, including the padding of the message. However, this is a simple matter. After this last step, we obtain 2^{n_G} different messages with the expected hash value on the F side. Since G outputs n_G-bit values, we expect that, with constant probability, at least one of these messages also matches the expected value on the G side. Once again, the probability of success can be improved by adding a few more steps to the attack. Note that this attack on preimage resistance does not require G to be an iterated hash function either. As before, it also works when G is replaced by a random oracle.

Clearly, the above attack finds a preimage for the target value which is essentially random. As a consequence, it can be applied directly, without any change, when a second preimage is requested.

4.3 Extensions and Open Problems

Given these attacks, it is natural to ask whether they generalize to three or more concatenated hash values. In this section, we focus on the possibility of generalizing the collision search attack. We show that it does generalize and that the generalization is almost straightforward. Indeed, assume that H is a third hash function on n_H bits; then, using the above attack on F||G a number of times, say n_H/2 times, it is possible, as in Section 3, to build a 2^{n_H/2}-collision on F||G. Among these messages, we expect a collision on H. All in all, this yields a simultaneous collision on F, G and H. When n_F = n_G = n_H = n, the expression of the complexity simplifies and is of the order of n^2·2^{n/2}. More generally, a simultaneous collision on k different iterated hash functions can be found with complexity of the order of n^{k-1}·2^{n/2}. Thus, the security of such a construction stays within a polynomial factor of the security of a single good iterated hash function of the same size. Similarly, variants of the attack on preimage resistance can be adapted to the case of k different hash functions. However, since they are more complicated, we do not present them here. One possible variant is described in the appendix.

Another generalization of the above attack is also worth noting. In [14], B. Schneier described a different way of building a long hash from a hash function F. In this method, F(M) is concatenated with G(F(M)||M), where G is a second function. At first view, this is more complicated than the F(M)||G(M) construction. However, the very same attack can be applied. Indeed, when a 2^t-collision is found on F(M), this fixes F(M) in the first half of the big hash and also the copy of F(M) in the call to G; thus, a collision on the G part is expected exactly as before. The preimage attack also works as before. We leave open the problem of finding a related construction, making several calls to hash functions, with security higher than that of a single hash function with respect to collision resistance.

One can also study a related question: how does the security of the concatenated hash F||G behave when F and G have non-generic attacks better than the birthday paradox collision search? In that case, can F||G be significantly more secure than the best of F and G?

For the sake of simplicity, assume once again that n_F = n_G. Then, if F has a collision-finding algorithm C as in Section 3 with complexity below the birthday bound, and G has no shortcut attack better than the birthday paradox, the security of F||G is essentially the same as the security of G itself. On the other hand, if G also admits a shortcut attack (as in Section 3), it is unclear whether the two shortcut attacks may be used together to improve the composed attack against F||G. Yet, some other types of attacks against G can be integrated into a better composed attack on F||G. To give an example, let g denote the compression function of G. Assume that there exists a shortcut attack which, given a large set of N chaining values, finds a message block B and two indices i and j such that g(h_i, B) = g(h_j, B), in time N. Clearly, such a merging attack could be used to turn an N-collision on F into a full collision on F||G. Thus, it is safer to assume that F||G is essentially as secure as the best of F and G, no more.

5 Potential Applications

While the ideas of cascaded constructions and multicollisions are frequently encountered in the cryptographic folklore, they are somewhat avoided in published papers. As a consequence, we were not able to find a single research paper that can be cryptanalyzed using the attacks presented here. In this section, we describe some published constructions which were likely candidates and explain why the attacks fail.

Cascaded hash functions. Among the frequently encountered hash function,

RIPEMD is the most suited to the cascaded construction. Indeed, the basic

algorithm already consists of two separate hash computations which are put to-

gether at the end of the round function. Thus, using the result of the two chains

to form a longer hash would be both natural and efficient. In fact, the authors

of RIPEMD propose in [4] optional extensions to 256 and 320 bits by using this

idea. These extensions are not fully specified, but a sketch is given. In this sketch,

the authors of RIPEMD recommend to add to the basic cascade some interaction

between the two parallel compression functions. More precisely, they propose to

swap one register from the first chain and its counterpart in the second chain

after each round of the compression function (5 rounds in RIPEMD-160 and

4 rounds in RIPEMD-128). This interaction was introduced as an additional

security measure and, with respect to our attack, this countermeasure is very
effective and completely voids it.
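The effect of this register swap can be seen on a toy model. The sketch below is a simplified stand-in with four 8-bit registers per chain, not the actual RIPEMD round functions; swapping one register after each round makes each chain's output depend on the other chain's state, so the combined state can no longer be attacked as two independent iterated hashes.

```python
import hashlib

def toy_round(state, block, tag):
    """One toy round: mix a four-register state with the message block."""
    h = hashlib.sha256(tag + bytes(state) + block).digest()
    return list(h[:4])  # four 8-bit registers

def two_chain_compress(state_a, state_b, block, swap=True):
    """Two parallel chains; optionally swap one register after each round."""
    for r in range(4):
        state_a = toy_round(state_a, block, b"A%d" % r)
        state_b = toy_round(state_b, block, b"B%d" % r)
        if swap:
            # the RIPEMD-style countermeasure: exchange one register
            state_a[0], state_b[0] = state_b[0], state_a[0]
    return state_a, state_b
```

Without the swap, collisions could be built chain by chain; with it, the full eight-register state must be attacked at once.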

Use of multicollisions. Among the published constructions that make use of
multicollisions, one can cite the micropayment scheme MicroMint [11], the iden-
tification scheme of Girault and Stern [5] and the signature scheme of Brickell
et al. [1]. In these three applications, multicollisions are indeed used; how-
ever, in the proposed instances of the schemes, no iterated hash function with
a small internal memory is used. Instead, one encounters either a block-cipher-
based compression function with a small block size but without iteration, or a
truncation of an iterated hash function with a relatively large output size. In
both cases, our attack is inapplicable. In the first case, the required iteration is
not available; in the second, the attack needs collisions on the full internal states
of the hash function, rather than on the truncated states.

6 Conclusion

In this paper, we have shown that multicollisions in iterated hash functions are
not really harder to find than ordinary collisions. This yields the first effective
attack against a natural construction that extends the size of hash values by
concatenating several independent results. While considered suspect by some,
especially when used with related hash functions, this construction had never
been attacked before. The cryptanalysis we presented here yields attacks against
collision resistance, preimage resistance and second preimage resistance. As a
consequence, it leaves open the problem of constructing secure hash functions
with variable output length, which is an important primitive to instantiate some
cryptographic paradigms such as the full domain hash.

Another important theoretical result is the fact that iterated hash functions

cannot be used as entropy-smoothing functions on arbitrary sets of inputs. De-

vising good cryptographic entropy-smoothing functions would be a nice topic for

future research.


Multicollisions in Iterated Hash Functions 315

References

1. E. Brickell, D. Pointcheval, S. Vaudenay, and M. Yung. Design validation for
discrete logarithm based signature schemes. In PKC'2000, volume 1751 of Lecture
Notes in Computer Science, pages 276–292. Springer-Verlag, 2000.
2. R. Canetti, O. Goldreich, and S. Halevi. The random oracle methodology, revisited.
In Proc. 30th Annual ACM Symposium on Theory of Computing (STOC), pages
209–218, 1998.
3. I. Damgård. A design principle for hash functions. In Advances in Cryptology
– Crypto'89, volume 435 of Lecture Notes in Computer Science, pages 416–427.
Springer-Verlag, 1989.
4. H. Dobbertin, A. Bosselaers, and B. Preneel. RIPEMD-160, a strengthened version
of RIPEMD. In Fast Software Encryption, volume 1039 of Lecture Notes in
Computer Science, pages 71–82. Springer-Verlag, 1996.
5. M. Girault and J. Stern. On the length of cryptographic hash-values used in
identification schemes. In Advances in Cryptology – Crypto'94, volume 839 of
Lecture Notes in Computer Science, pages 202–215. Springer-Verlag, 1994.
6. A. Menezes, P. van Oorschot, and S. Vanstone. Handbook of Applied Cryptography.
CRC Press, 1997. Available online: http://www.cacr.math.uwaterloo.ca/hac.
7. R. Merkle. A fast software one-way hash function. Journal of Cryptology,
3(1):43–58, 1990.
8. R. C. Merkle. One way hash functions and DES. In Advances in Cryptology
– Crypto'89, volume 435 of Lecture Notes in Computer Science, pages 428–446.
Springer-Verlag, 1989.
9. Secure hash standard. Federal Information Processing Standard Publication 180-1,
1995.
10. B. Preneel. Analysis and design of cryptographic hash functions. PhD thesis,
Katholieke Universiteit Leuven, January 1993.
11. R. Rivest and A. Shamir. PayWord and MicroMint – two simple micropayment
schemes. CryptoBytes, 2(1):7–11, Spring 1996.
12. R. L. Rivest. The MD4 message digest algorithm. In Advances in Cryptology
– Crypto'90, volume 537 of Lecture Notes in Computer Science, pages 303–311.
Springer-Verlag, 1991.
13. R. L. Rivest. The MD5 message-digest algorithm. Network Working Group Request
for Comments: 1321, April 1992.
14. B. Schneier. Applied Cryptography. John Wiley & Sons, second edition, 1996.

A Preimage Resistance

with Many Hash Functions Cascaded

While the attack against collision resistance described in section 4.1 is easily
generalized to k cascaded hash functions and yields an attack with complexity of
the order of 2^{n/2}, up to polynomial factors in n and k, assuming that each
function outputs n bits, this is not the case for the attack

on preimage resistance. Indeed, the attack we described in section 4.2 is not

straightforward to generalize. The goal of this section is to present a variant

of the attack that can be easily generalized. The drawback is that this variant

is slightly more complicated than the initial attack. In this variant, each hash


function is attacked in two steps. Each of these steps is constructed in a way

that ensures compatibility with the previous hash functions. The first step within

each hash function is to find two different sequences of blocks that, through the

iterated hash process, send the initial value back to itself. The cost of this

amounts to twice exhaustive search on the set of possible chaining values. The

second step is to find a terminating sequence of blocks that sends this chaining

value to the target value for the current hash function. This costs about one

exhaustive search on the same set.
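These two steps can be sketched on a tiny instance. In the Python sketch below (a 12-bit toy state; all parameters and names are our own illustration, not the paper's), the loop search finds two one-block sequences fixing the initial value, and the terminating search finds a block reaching the target value.

```python
import hashlib

STATE_BITS = 12                      # toy 12-bit chaining value
MASK = (1 << STATE_BITS) - 1

def f(h: int, block: bytes) -> int:
    """Toy compression function on a 12-bit state."""
    data = h.to_bytes(2, "big") + block
    return int.from_bytes(hashlib.sha256(data).digest()[:2], "big") & MASK

def find_loop_blocks(h0: int, count: int = 2):
    """Exhaustive search for `count` distinct blocks B with f(h0, B) == h0."""
    loops, i = [], 0
    while len(loops) < count:
        b = i.to_bytes(3, "big")
        if f(h0, b) == h0:
            loops.append(b)
        i += 1
    return loops

def find_terminating_block(h0: int, target: int) -> bytes:
    """Exhaustive search for one block sending the state h0 to the target."""
    i = 0
    while True:
        b = i.to_bytes(3, "big")
        if f(h0, b) == target:
            return b
        i += 1

def preimage_family(h0: int, target: int):
    """Two looping blocks plus a final block: 2^(T-1) preimages of length T."""
    B1, B2 = find_loop_blocks(h0)
    last = find_terminating_block(h0, target)
    return B1, B2, last
```

Any choice of T−1 blocks among the two loops, followed by the terminating block, hashes to the target, which is exactly the family of messages used below.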

When processing the first hash function, we look for single-block sequences. More-

over, the terminating block should be correctly padded. A slight technical prob-

lem is that padding requires a priori knowledge of the final message size. How-

ever, we show at the end of this section that this size can be fixed in advance

before launching the attack. Let T denote this size; we then consider as inputs
to the first hash function the messages formed of T−1 blocks, each chosen
among the two basic blocks that send the initial value to itself, followed by one
final block which sends it to the target value. A representation of these messages
is given in Fig. 2.

Fig. 2. First step of the preimage attack

With the second hash function, the looping sequences are constructed by
concatenating blocks chosen among the two (B and B′) that make the first
hash function loop (a few more blocks can be added to increase the probability
of finding good sequences). Clearly, when applying one of these sequences, both
the first and the second hash functions go back to their initial values. The
final sequence is constructed by concatenating many copies of the looping blocks
B or B′ and a single instance of the final block, in order to send the second hash

function to its expected destination. Clearly, such a sequence also sends the first

hash function to its target value. The advantage of this attack compared to that

of section 4.2 is that additional hash functions can be processed by iterating

the previous procedure. A notable exception is the computation of the last hash

function which requires no looping part and can thus be simplified. The total

runtime is clearly bounded by a polynomial times the cost of exhaustive

search. The length T of the message can be easily predetermined and is of the

order of


Adaptively Secure Feldman VSS

and Applications to Universally-Composable

Threshold Cryptography

Masayuki Abe1 and Serge Fehr2,*

1 NTT Laboratories, Japan
abe@isl.ntt.co.jp
2 CWI, Amsterdam, The Netherlands
fehr@cwi.nl

Abstract. We propose the first distributed discrete-log key generation

(DLKG) protocol from scratch which is adaptively-secure in the non-

erasure model, and at the same time completely avoids the use of in-

teractive zero-knowledge proofs. As a consequence, the protocol can be

proven secure in a universally-composable (UC) like framework which

prohibits rewinding. We prove the security in what we call the single-

inconsistent-player UC model, which guarantees arbitrary composition

as long as all protocols are executed by the same players. As an applica-

tion, we propose a fully UC threshold Schnorr signature scheme.

Our results are based on a new adaptively-secure Feldman VSS scheme.

Although adaptive security was already addressed by Feldman in the

original paper, the scheme requires secure communication, secure era-

sure, and either a linear number of rounds or digital signatures to re-

solve disputes. Our scheme overcomes all of these shortcomings, but on

the other hand requires some restriction on the corruption behavior of

the adversary, which however disappears in some applications including

our new DLKG protocol.

We also propose several new adaptively-secure protocols, which may find

other applications, like a sender non-committing encryption scheme, a

distributed trapdoor-key generation protocol for Pedersen's commitment

scheme, or distributed-verifier proofs for proving relations among com-

mitments or even any NP relations in general.

1 Introduction

A distributed key generation protocol is an essential component in threshold

cryptography. It allows a set of players to jointly generate a key pair, (pk, sk),

that follows the distribution defined by the target cryptosystem, without the

need for a trusted party. While the public-key pk is output in clear, the corre-

sponding secret-key sk remains hidden and is maintained in a shared manner

* Research was carried out while at the Centre for Advanced Computing – Algorithms
and Cryptography, Department of Computing, Macquarie University, Australia.

M. Franklin (Ed.): CRYPTO 2004, LNCS 3152, pp. 317–334, 2004.
© International Association for Cryptologic Research 2004


318 Masayuki Abe and Serge Fehr

among the players via a secret sharing scheme. This should allow the play-

ers to later use sk without explicitly having to reconstruct it. The distributed

key-generation for discrete-log based schemes, DLKG in short, amounts to the
joint generation of a random group element y as public-key and a sharing of its
discrete-log (DL) x as secret-key with regard to some given base g. A
DLKG protocol must remain secure in the presence of a malicious adversary who
may corrupt up to a minority of the players and make them behave in an
arbitrary way. Informally, it is required that, for any adversary, y must be uniformly
distributed, and the adversary must learn nothing about x beyond y = g^x.
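For concreteness, here is a minimal sketch of the standard (statically secure) Feldman VSS that underlies these protocols, over a deliberately tiny group. The parameters p, q, g and all function names are our own illustration, not the paper's notation; real deployments use cryptographically sized groups.

```python
import random

# Toy group (assumed for illustration): p = 2q + 1 is a safe prime and
# g generates the order-q subgroup of Z_p^*.
p, q, g = 2579, 1289, 4

def share(secret, t, n):
    """Feldman VSS dealing: Shamir-share `secret` with threshold t among
    n players and broadcast commitments C_j = g^{a_j} to the coefficients."""
    coeffs = [secret] + [random.randrange(q) for _ in range(t - 1)]
    shares = {i: sum(c * pow(i, j, q) for j, c in enumerate(coeffs)) % q
              for i in range(1, n + 1)}
    commitments = [pow(g, c, p) for c in coeffs]
    return shares, commitments

def verify(i, s_i, commitments):
    """Player i checks g^{s_i} == prod_j C_j^{i^j} (mod p)."""
    lhs = pow(g, s_i, p)
    rhs = 1
    for j, C in enumerate(commitments):
        rhs = rhs * pow(C, pow(i, j, q), p) % p
    return lhs == rhs

def reconstruct(points):
    """Lagrange interpolation at 0 over Z_q from t points {i: s_i}."""
    secret = 0
    for i, s_i in points.items():
        num = den = 1
        for j in points:
            if j != i:
                num = num * (-j) % q
                den = den * (i - j) % q
        secret = (secret + s_i * num * pow(den, q - 2, q)) % q
    return secret
```

Note that the first commitment equals g^secret, so Feldman VSS leaks the "public key" of the shared value; this is exactly why it fits DLKG, where g^x is meant to be public anyway.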

DLKG was first addressed by Pedersen in [14]. Gennaro et al. pointed out that

Pedersen's scheme is not secure against a rushing adversary (and even against a

non-rushing adversary) and proposed a new (statically) secure scheme [12]. Then

Frankel et al. and Canetti et al. introduced in [11] and [7], respectively, adaptively

secure schemes in the erasure model, and Jarecki and Lysyanskaya improved the

schemes to work in the non-erasure model and to remain secure under concurrent

composition [13].

These DLKG protocols which are secure against an adaptive adversary rely

heavily on the use of interactive zero-knowledge proofs. This poses the question

whether this is an inherent phenomenon for adaptively secure DLKG. We an-

swer this question in the negative. Concretely, we propose an adaptively-secure

distributed key-generation protocol from scratch which completely avoids the

use of interactive zero-knowledge proofs. As a consequence, the protocol can be

and is proven secure in a relaxed version of Canetti's universally-composable

(UC) framework [4], which prohibits rewinding. We show the usefulness of our

distributed key-generation protocol by showing how it gives rise to a (fully) UC

threshold Schnorr signature scheme. To the best of our knowledge, this is the

first threshold scheme proven secure in the UC framework.

The relaxed UC framework, which we call the single-inconsistent-player (SIP)

UC framework, coincides with the original UC framework, except that the sim-

ulator is allowed to fail in case the adversary corrupts some designated player

which is chosen at random from the set of all players and announced to (and

only to) the simulator. This relaxation still allows for a powerful composition

theorem in that protocols may be arbitrarily composed, as long as all subsidiary

protocols involve the same set of players.

We stress once more that this relaxation only applies to the proposed dis-

tributed key-generation protocol but not to its application for the threshold

Schnorr signature scheme.

Our DLKG protocol (and thus the threshold Schnorr signature scheme) is

based on a new adaptively-secure version of Feldman's famous (statically se-

cure) VSS scheme. Although adaptive security was already addressed by Feld-

man in the original paper [10], and besides the well known standard Feldman

VSS scheme he also proposed an adaptively-secure version, the proposed scheme

has several shortcomings: (1) it requires the players to be able to reliably erase

data, (2) it either proceeds over a linear number of rounds or otherwise needs to

incorporate signatures as we will point out, and (3) it requires secure
communication channels (or expensive non-committing encryption schemes). We propose a

new variant of Feldman's VSS scheme which overcomes all of these limitations.

Even though the proposed scheme is not fully adaptively secure but requires

some restriction on the corruption behavior of the adversary, this restriction is

acceptable in that it disappears in the above applications to threshold cryptog-

raphy.

Furthermore, as building blocks for the above schemes or as related construc-

tions, we also propose a sender non-committing encryption scheme, a new adap-

tively secure distributed trapdoor-key generation protocol for Pedersen's com-

mitment scheme, as well as adaptively secure distributed-verifier zero-knowledge

proofs, which all may very well find other applications. Finally, in the full ver-

sion of this paper [3], we propose several additional applications and/or related

adaptively-secure constructions of independent interest: a simple modification

of Feldman's adaptively-secure VSS scheme which overcomes (1) and (2) above,

though not (3), but is fully adaptively-secure, an adaptively secure version of

Pedersen's VSS scheme as a committed VSS, a threshold version of the DSS

signature scheme in the UC model, a threshold version of the Cramer-Shoup

cryptosystem in the SIP UC model, and a common-reference-string generator

with applications to zero-knowledge proofs in the UC model.

The paper is organized as follows. Section 2 reviews the model we are con-

sidering. It includes a short introduction to the UC framework of Canetti and

the new SIP UC framework. In Sect. 3 we recall Feldman's statically and adap-

tively secure VSS schemes, and we point out an obstacle in the dispute resolu-

tion phase of the adaptive scheme, before we construct our version in Sect. 4.

Finally, Sect. 5 shows the applications to adaptively-secure DLKG, universally-

composable threshold cryptography and distributed-verifier proofs.

Due to space limitations, many definitions and proofs could only be sketched