Quantum Information Theory

and

Applications to Quantum Cryptography

By

Nikolaos P. Papadakos

B.Sc. in Mathematics, University of Athens, 2001

Supervisor

Professor Abbas Edalat

Individual Study Option,

on the First Term of the Academic year 2001-2002,

for the M.Sc. in Advanced Computing

Department of Computing

Imperial College of Science, Technology and Medicine

University of London, United Kingdom

Contents

List of Figures iii

Acknowledgements iv

About this Individual Study Option v

1 Physics and information theory 1

1.1 Classical physics and information theory . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1

1.1.1 Shannon entropy and relevant measures of classical information theory . . . . . . . . . . . 2

1.1.2 Basic properties of Shannon entropy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3

1.2 Quantum physics and information theory . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5

1.2.1 Basic features of quantum mechanics relevant to information theory . . . . . . . . . . . . 5

1.2.2 The language of quantum information theory: quantum thermodynamics . . . . . . . . . 7

1.2.3 Von Neumann entropy and relevant measures of quantum information theory . . . . . . . 9

1.2.4 A great discrepancy between classical and quantum information theory: quantum entanglement . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11

1.2.5 Basic properties of von Neumann entropy . . . . . . . . . . . . . . . . . . . . . . . . . . . 11

1.2.6 Measurement in quantum physics and information theory . . . . . . . . . . . . . . . . . . 14

2 Some important results of information theory 16

2.1 Accessible information . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16

2.1.1 Accessible classical information: Fano's inequality . . . . . . . . . . . . . . . . . . . . . . 16

2.1.2 Accessible quantum information: quantum Fano inequality and the Holevo bound . . . . 17

2.2 Data processing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18

2.2.1 Classical data processing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18

2.2.2 Quantum data processing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19

3 Real-life applications of information theory 20

3.1 Data compression . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20

3.1.1 Shannon's noiseless channel coding theorem . . . . . . . . . . . . . . . . . . . . . . . . . 20

3.1.2 Schumacher's quantum noiseless channel coding theorem . . . . . . . . . . . . . . . . . . 21

3.2 Information over noisy channels . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22

3.2.1 Error correction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22

3.2.2 Classical information over noisy classical channels . . . . . . . . . . . . . . . . . . . . . . 24

3.2.3 Classical information over noisy quantum channels . . . . . . . . . . . . . . . . . . . . . . 25

3.2.4 Quantum information over noisy quantum channels . . . . . . . . . . . . . . . . . . . . . . 26

4 Practical quantum cryptography 27

4.1 Theoretical principles of quantum cryptography . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27

4.1.1 The BB84 quantum key distribution protocol . . . . . . . . . . . . . . . . . . . . . . . . . 28

4.1.2 Information reconciliation and privacy amplification . . . . . . . . . . . . . . . . . . . . . 29

4.1.3 Privacy and security . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29

4.1.4 A provable secure version of the BB84 protocol . . . . . . . . . . . . . . . . . . . . . . . . 30

4.2 A commercial quantum cryptographic device . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30

4.2.1 Experimental quantum key distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31

4.2.2 Commercial quantum cryptography . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31


Summary 33

Appendices 35

A Omitted proofs 35

A.1 Special diagonal transformation of normal matrices . . . . . . . . . . . . . . . . . . . . . . . . . . 35

A.2 Projective measurements are equivalent to an addition of unitary operations . . . . . . . . . . . . 36

B Distance measures of information 37

B.1 Classical information distance: Hamming . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37

B.2 Quantum information distance: Fidelity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37

Bibliography 38


List of Figures

1.1 Relationships between different entropies. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3

1.2 A summary of basic information-relevant features of quantum mechanics. . . . . . . . . . . . . 7

1.3 Transition from classical to quantum thermodynamics. . . . . . . . . . . . . . . . . . . . . . . . . 9

3.1 Quantum data compression. The compression operation C^n compresses a quantum source ρ stored in n log d qubits into nS(ρ) qubits. The source is accurately recovered via the decompression operation D^n. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22

3.2 The noisy channel coding problem for classical messages. It is required that every one of the 2^{nR} possible messages should be sent uncorrupted through the channel with high probability. . . . . . 25

4.1 Alice and Bob communicating with the fear of Eve stealing information. . . . . . . . . . . . . . . 27

4.2 The setup of the BB84 quantum key distribution protocol. The quantum line is used for the distribution of the key, while the classical line is used for discussing security topics and for the transmission of encrypted messages. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28

4.3 Typical system for quantum cryptography using polarization coding (LD: laser diode, BS: beamsplitter, F: neutral density filter, PBS: polarizing beam splitter, λ/2: half waveplate, APD: avalanche photodiode). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31

S.1 Summary of classical and quantum information theory. . . . . . . . . . . . . . . . . . . . . . . . . 33

S.2 Summary of quantum cryptography. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34


Acknowledgements

I am very pleased to thank my supervisor Professor Abbas Edalat, whose important comments not only helped me improve the presentation of this Individual Study Option, but who, by suggesting that I solve all 64 exercises of the last two chapters of [1, chap.11,12], contributed dramatically to my understanding and strengthened my self-confidence in the field. I am indebted to him for his effort in transforming the present material into something more than just an Individual Study Option.


About this Individual Study Option

In this Individual Study Option the concepts of classical and quantum information theory are presented, with

some real-life applications, such as data compression, information transmission over noisy channels and quantum

cryptography.

Abstract

In each chapter the following topics are covered:

1. The relation of information theory to physics is studied, and by using the notion of Shannon's entropy, measures of information are introduced. Then their basic properties are proven. Moreover, important information-relevant features of quantum physics, like the impossibility of distinguishing or cloning states in general, are studied. In order to obtain a quantum measure of information, an introduction to quantum thermodynamics is given, with a special focus on explaining the utility of the density matrix. After this, von Neumann entropy and its related measures are defined. In this context a major discrepancy between classical and quantum information theory is presented: quantum entanglement. The basic properties of von Neumann entropy are proven, and an information-theoretic interpretation of quantum measurement is given.

2. The amount of accessible information is bounded by Fano's inequality in the classical case, and by its quantum analogue and Holevo's bound in the quantum case. In addition, classical and quantum data processing are discussed.

3. Furthermore, for real-life applications, data compression is studied via Shannon's classical and Schumacher's quantum noiseless channel coding theorems. Another application is the transmission of classical information over noisy channels. For this, a summary of classical and quantum error correction is given, and then Shannon's classical noisy channel and the Holevo-Schumacher-Westmoreland quantum noisy channel coding theorems are studied. The present state of the transmission of quantum information over quantum channels is summarized.

4. A practical application of the aforementioned is presented: quantum cryptography. In this context BB84, a quantum key distribution protocol, is demonstrated and its security is discussed. The current experimental status of quantum key distribution is summarized, and the possibility of a commercial device realizing quantum cryptography is presented.

Special viewpoints emphasized

The above material is in close connection with the last two chapters of [1, chap.11,12], and error correction is a summary of the results given in chapter 10 of [1]. From this book Figures 1.1, 3.1, 3.2 and S.1 were extracted. However, the treatment of information theory was much influenced by [2], and some results given therein, like for example equation (3.1), were extended to (3.2). Of equal impact was [3] concerning error correction, and [3, 4] concerning quantum cryptography. Note that Figure 4.2 was extracted from [4]. Of course some notions which were vital to this Individual Study Option were taken from almost all the chapters of [1].

However, there were some topics not well explained or sometimes not sufficiently emphasized. The most important tool of quantum information theory, the density matrix, is misleadingly described in [1, p.98]: "the density matrix [...] is mathematically equivalent to the state vector approach, but it provides a much more convenient language for thinking about commonly encountered scenarios in quantum mechanics". The density matrix is not equivalent to the state vector approach, and it is much more than just a convenient language. The former is a tool for quantum thermodynamics, a field where it is impossible to use the latter. Rephrasing it,


quantum thermodynamics is the only possible language. Wishing to augment understanding of this meaningful tool, subsection 1.2.2 is devoted to it, and is close to the viewpoint of [5, p.295-307], which is a classical textbook for quantum mechanics.

In this Individual Study Option some topics are presented from a different perspective. There is an attempt to emphasize information-theoretic discrepancies between classical and quantum information theory, the greatest of which is perhaps quantum entanglement. A very special demonstration of this difference is given in subsection 1.2.4. It should also be noted that a particular information-theoretic meaning of measurement in physics is presented in subsection 1.2.6.

Concerning the different views presented in this text, nothing could be more beneficial than the 64 exercises of the last two chapters of [1, chap.11,12]. As an example, a delicate presentation of measures of information is given in page 1, due to exercise 11.2 [1, p.501], and subsection 1.2.4 on quantum entanglement was inspired by exercise 11.14 [1, p.514]. On some occasions special properties were proved in order to solve the exercises. Such a property is the preservation of quantum entropy by unitary operations (property 2 of von Neumann entropy, in page 11), which was needed to solve, for example, exercises 11.19 and 11.20 [1, p.517,518]. For these last two exercises some special lemmas were proven in appendices A.1 and A.2. It is important to notice that of the above mentioned exercises of [1, chap.11,12], only half are presented here.

Notation

Finally, about the mathematical notation involved: the symbol ≜ should be interpreted as "defined by", and the symbol ≡ as "to be identified with".


Chapter 1

Physics and information theory

Since its invention, computer science [6] was considered a branch of mathematics, in contrast to information theory [7, 8], which was viewed through its physical realization; quoting Rolf Landauer [9], "information is physical". The last decades changed the landscape, and both computers and information are now mostly approached through their physical implementations [10, 11, 12, 13]. This view is not only more natural, but in the case of quantum laws it gives very exciting results and sometimes an intriguing view of what information is or can be. Such an understanding could never be inspired by mathematics alone. Moreover, there is a possibility that the inverse relation exists between physics and information [14, 15], or, quoting Steane [3], one could find new methods of studying physics in "the ways that nature allows, and prevents, information to be expressed and manipulated, rather than [the ways it allows] particles to move". Such a program is still in its infancy; however, one relevant application is presented in subsection 1.2.6. In this chapter the well-established information theory based on classical physics is presented in section 1.1, and the corresponding up-to-date known results for quantum physics are analyzed in section 1.2.

1.1 Classical physics and information theory

In classical physics all entities have certain properties which can be known up to a desired accuracy. This fact gives a simple pattern to store and transmit information, by assigning information content to each of the properties a physical object can have. For example, storage can be realized by writing on a paper, where information lies upon each letter, or on a magnetic disk, where information, in binary digits (bits), is represented by each of the two spin states a magnetic dipole can have. Concerning transmission, speech is one example, where each sound corresponds to a piece of information; a second example is an electronic signal on a wire, where each state of the electricity is related to some piece of information. Unfortunately, in everyday life such simple patterns are non-economical and unreliable. This is because communication is realized by physical entities which are imperfect, and hence they can be influenced by environmental noise, resulting in information distortion.

Concerning the economical transmission of information, one can see that the naive pattern of information assignment presented in the last paragraph is not always an optimal choice. This is because a message, in the English language for example, contains symbols with different occurrence frequencies. Looking at this text, for example, one immediately notes that the occurrence probability of the letter a, p_a, is much greater than that of the exclamation mark, p_!. According to the naive assignment, English language symbols are encoded to codewords of identical length l; the average space needed to store a is l p_a and that of the exclamation mark l p_!, and since p_a > p_!, a lot of space is wasted on the letter a. In order to present how an encoding scheme can be economical, consider a four-letter alphabet A, B, C, D, with occurrence probabilities p_A = 3/4, p_B = 1/8, p_C = p_D = 1/16, and the subsequent assignment of bits: A → 0, B → 10, C → 110, and D → 111. A message of n symbols, using this encoding, has on average n (p_A + 2p_B + 3p_C + 3p_D) = n (3/4 + 2·1/8 + 3·1/16 + 3·1/16) = 11n/8 bits, instead of the 2n which would be needed if somebody just mapped each letter to a two-bit codeword.
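The arithmetic of the four-letter example can be checked with a short Python sketch. It is an illustration only, using a prefix-free code with the codeword lengths 1, 2, 3, 3 discussed above:

```python
from math import log2

# Four-letter source from the example above: p_A = 3/4, p_B = 1/8,
# p_C = p_D = 1/16, encoded with a prefix-free code of lengths 1, 2, 3, 3.
probs = {"A": 3 / 4, "B": 1 / 8, "C": 1 / 16, "D": 1 / 16}
code = {"A": "0", "B": "10", "C": "110", "D": "111"}

# Average number of bits per symbol under this encoding: 11/8 = 1.375.
avg_len = sum(probs[s] * len(code[s]) for s in probs)

# Shannon entropy of the source, the theoretical lower bound (see 1.1.1).
entropy = -sum(p * log2(p) for p in probs.values())

print(avg_len)  # 1.375, instead of 2 bits for fixed-length codewords
```

The variable-length code spends 11/8 bits per symbol on average, beating the 2 bits of the naive fixed-length assignment, while the source entropy (about 1.19 bits) bounds how far any such compression can go.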

The topics discussed in the last two paragraphs give rise to the most important information-theoretic question: what are the minimal resources needed to communicate reliably? An answer to this question can be given by abstractly quantifying information in relation to the physical resources needed to carry it. Motivated by the previously demonstrated four-letter alphabet example, probabilities are going to be used for such an abstraction. One now defines a function H, quantifying a piece of information I, exhibiting the following reasonable properties:

1. H (I) is a function only of the probability of occurrence p of the information I, thus H (I) ≡ H (p).

2. H is a smooth function.

3. The resources needed for two independent pieces of information with individual probabilities p, q > 0 are the sum of the resources needed for each alone, or in mathematical language H (pq) = H (p) + H (q).

The second and third properties imply that H is a logarithmic function, and by setting q = 1 in the third it is immediate to see that H (1) = 0. Hence H (p) = k log_a p, where k and a are constants to be determined (refer to the comments after equations (1.2) and (1.5)). This means that the average of the resources needed, when one of a mutually exclusive set of pieces of information with probabilities p_1, . . . , p_n occurs, is

H (p_1, . . . , p_n) = k Σ_i p_i log_a p_i. (1.1)

It should be noted that probability is not the only way of quantifying information [14, 15, 16, 17, 18, 19].

1.1.1 Shannon entropy and relevant measures of classical information theory

The function Q found in (1.1) is known in physics as entropy, and measures the order of a speci¬c statistical

system. Of course one interesting physical system is an n-bit computer memory, and if all the possible cases of

data entry are described by a random variable X with probabilities p1, . . . , p2n , then the computer™s memory

should have an entropy given by

H (X) ≡ H (p1, . . . , p2n ) ’ px log px . (1.2)

x

Here a modi¬ed version of (1.1) is used, with log ≡ log2 and k 1, an assignment to be veri¬ed after equation

(1.5). Equation (1.2) is known in information theory as the Shannon™s entropy [7]. There are two complementary

ways of understanding Shannon™s entropy. It can be considered as a measure of uncertainty before learning the

value of a physical information or the information gain after learning it.
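Definition (1.2) translates directly into code. The following minimal Python function, an illustration rather than part of the source text, computes H(X) from a list of probabilities:

```python
from math import log2

def shannon_entropy(probs):
    """H(X) = -sum_x p_x log2 p_x; zero-probability outcomes contribute nothing."""
    return -sum(p * log2(p) for p in probs if p > 0)

print(shannon_entropy([0.5, 0.5]))   # 1.0: a fair bit, maximal uncertainty
print(shannon_entropy([1.0]))        # 0.0: a definite state carries no surprise
print(shannon_entropy([0.25] * 4))   # 2.0: two fair bits
```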

The Shannon entropy gives rise to other important measures of information. One such is the relative entropy, which is defined by

H (p_x || q_x) ≜ Σ_x p_x log (p_x / q_x) = −H (X) − Σ_x p_x log q_x, (1.3)

and is a measure of the distance between two probability distributions. This is because it can be proven that

H (p_x || q_x) ≥ 0, (1.4)
H (p_x || q_x) = 0 ⇔ ∀x : p_x = q_x.

Of course it is not a metric, because as one can check H (p_x || q_x) = H (q_x || p_x) is not always true. The relative

entropy is often useful, not in itself, but because it helps in finding important results. One such result is derived using the last equality in (1.3) together with (1.4), taking q_x to be the uniform distribution over the 2^n states of an n-bit memory:

H (X) ≤ log 2^n = n, (1.5)
H (X) = log 2^n = n ⇔ ∀i : p_i = 1/2^n,

which justifies the selection of k = −1 in (1.2), because in the optimal case and in the absence of noise, the maximum physical resources needed to transmit or to store an n-bit word should not exceed n. One can also see from (1.2) that

H (X) ≥ 0, (1.6)
H (X) = 0 ⇔ system X is in a definite state (p = 1).

Other important results are deduced from relative entropy, and concern useful entropic quantities such as the joint entropy, the entropy of X conditional on knowing Y, and the common or mutual information of X and Y. These entropies are correspondingly defined by the subsequent intuitive relations

H (X, Y) ≜ − Σ_{xy} p_{xy} log p_{xy}, (1.7)
H (X|Y) ≜ H (X, Y) − H (Y), (1.8)
H (X : Y) ≜ H (X) + H (Y) − H (X, Y) = H (X) − H (X|Y), (1.9)

and can be represented in the 'entropy Venn diagram' as shown in Figure 1.1.

"¢

#"¢

¨¤

! ¤¢

©¨¦¤£¢

§¥ ¡

Figure 1.1: Relationships between di¬erent entropies.

1.1.2 Basic properties of Shannon entropy

It is worth mentioning here the basic properties of Shannon entropy:

1. H (X, Y) = H (Y, X), H (X : Y) = H (Y : X).

2. H (Y|X) ≥ 0, and thus by the second equality of (1.9), H (X : Y) ≤ H (Y), with equality if and only if Y is a function of X, Y = f (X).

3. H (X) ≤ H (X, Y), with equality if and only if Y is a function of X.

4. Subadditivity: H (X, Y) ≤ H (X) + H (Y), with equality if and only if X and Y are independent random variables.

5. H (Y|X) ≤ H (Y), and thus by the second equality of (1.9), H (X : Y) ≥ 0, with equality in each if and only if X and Y are independent random variables.

6. Strong subadditivity: H (X, Y, Z) + H (Y) ≤ H (X, Y) + H (Y, Z), with equality if and only if Z → Y → X forms a Markov chain.

7. Conditioning reduces entropy: H (X|Y, Z) ≤ H (X|Y).

8. Chaining rule for conditional entropies: let X_1, . . . , X_n and Y be any set of random variables; then H (X_1, . . . , X_n|Y) = Σ_{i=1}^{n} H (X_i|Y, X_1, . . . , X_{i−1}).

9. Concavity of the entropy: suppose p_i > 0 are probabilities with Σ_i p_i = 1, and q^{(i)} are probability distributions; then H (Σ_i p_i q^{(i)}) ≥ Σ_i p_i H (q^{(i)}), with equality if and only if the q^{(i)} are identical.

The various relationships between entropies may mostly be derived from the 'entropy Venn diagram' shown in Figure 1.1. Such figures are not completely reliable as a guide to the properties of entropy, but they provide a useful mnemonic for remembering the various definitions and properties of entropy. The proofs of the above mentioned properties follow.
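Before the proofs, several of these properties can be verified numerically on a small example; the joint distribution below is hypothetical:

```python
from math import log2

# A hypothetical joint distribution p(x, y) of two correlated bits.
pxy = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.3}

def H(dist):
    """Shannon entropy of a distribution given as {outcome: probability}."""
    return -sum(p * log2(p) for p in dist.values() if p > 0)

# Marginal distributions p(x) and p(y).
px, py = {}, {}
for (x, y), p in pxy.items():
    px[x] = px.get(x, 0.0) + p
    py[y] = py.get(y, 0.0) + p

Hx, Hy, Hxy = H(px), H(py), H(pxy)
mutual = Hx + Hy - Hxy            # H(X : Y), definition (1.9)

assert Hxy >= max(Hx, Hy)         # property 3: H(X) <= H(X, Y)
assert Hxy <= Hx + Hy             # property 4: subadditivity
assert mutual >= 0                # property 5: H(X : Y) >= 0
```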

Proof


1. Obvious from the definitions of joint entropy (1.7) and mutual information (1.9).

2. Based on the definition of conditional entropy (1.8),

H (Y|X) ≜ H (X, Y) − H (X) = − Σ_{xy} p (x, y) log p (x, y) + Σ_x p (x) log p (x)
= − Σ_{xy} p (x, y) log p (x, y) + Σ_{xy} p (x, y) log p (x) = − Σ_{xy} p (x, y) log [p (x, y) / p (x)],

and since p (x, y) ≤ p (x), every term of the last sum is non-negative; equality holds if and only if, for every pair, p (x, y) = p (x) or p (x, y) = 0, that is, Y = f (X).

3. It is immediately proven from property 2 and the definition of conditional entropy (1.8).

4. Following the subsequent steps of calculation,

H (p (x, y) || p (x) p (y)) = −H (p (x, y)) − Σ_{xy} p (x, y) log p (x) p (y)
= −H (p (x, y)) − Σ_{xy} p (x, y) log p (x) − Σ_{xy} p (x, y) log p (y)
= −H (p (x, y)) + H (p (x)) + H (p (y)),

where the second equality of (1.3) was used. The result follows directly from equation (1.4).

5. It is easily derived from subadditivity (property 4) and the definition of conditional entropy (1.8).

6. First note that, by the definition of the joint entropy (1.7) and some algebra, H (X, Y, Z) + H (Y) − H (X, Y) − H (Y, Z) = Σ_{xyz} p (x, y, z) log [p (x, y) p (y, z) / (p (x, y, z) p (y))]. Then, using the fact that log x ≤ (x − 1) / ln 2 for all positive x, with equality achieved if and only if x = 1, the following can be concluded:

Σ_{xyz} p (x, y, z) log [p (x, y) p (y, z) / (p (x, y, z) p (y))] ≤ (1 / ln 2) Σ_{xyz} p (x, y, z) [p (x, y) p (y, z) / (p (x, y, z) p (y)) − 1]
= (1 / ln 2) [Σ_y p (y) p (y) / p (y) − 1] = (1 / ln 2) (1 − 1) = 0,

with equality if and only if p (x, y) p (y, z) / (p (x, y, z) p (y)) = 1 ⇔ p (z|y) = p (z|x, y) ⇔ X → Y → Z is a Markov chain, q.e.d.

7. From strong subadditivity (property 6) it is straightforward that H (X, Y, Z) − H (Y, Z) ≤ H (X, Y) − H (Y), and from the definition of conditional entropy (1.8) the result is obvious.

8. First the result is proven for n = 2, using the definition of conditional entropy (1.8):

H (X_1, X_2|Y) = H (X_1, X_2, Y) − H (Y) = H (X_1, X_2, Y) − H (X_1, Y) + H (X_1, Y) − H (Y)
= H (X_2|Y, X_1) + H (X_1|Y).

Now induction is going to be used to prove it for every n. Assume that the result holds for n; then, using the result for n = 2, H (X_1, . . . , X_{n+1}|Y) = H (X_2, . . . , X_{n+1}|Y, X_1) + H (X_1|Y), and applying the inductive hypothesis to the first term on the right hand side gives

H (X_1, . . . , X_{n+1}|Y) = Σ_{i=2}^{n+1} H (X_i|Y, X_1, . . . , X_{i−1}) + H (X_1|Y) = Σ_{i=1}^{n+1} H (X_i|Y, X_1, . . . , X_{i−1}),

q.e.d.


9. The concavity of Shannon entropy will be deduced from the concavity of von Neumann's entropy in (1.38). However, here it is demonstrated that the binary entropy (H_bin (p) ≜ H (p, 1 − p)) is strictly concave,

H_bin (p x_1 + (1 − p) x_2) ≥ p H_bin (x_1) + (1 − p) H_bin (x_2),

where 0 ≤ p, x_1, x_2 ≤ 1 and equality holds only in the trivial cases x_1 = x_2, or p = 0, or p = 1. This follows from the derivatives of the binary entropy: since dH_bin (p) / dp = log (1 − p) − log p and, for p ≠ 0, 1, d²H_bin (p) / dp² = −1 / [p (1 − p) ln 2] < 0, the binary entropy is strictly concave on (0, 1), so the chord joining (x_1, H_bin (x_1)) and (x_2, H_bin (x_2)) lies strictly below the curve except at its endpoints. Moreover, dH_bin (p) / dp = 0 ⇔ p = 1/2, hence the maximum is reached at p = 1/2.
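The strict concavity of the binary entropy and the location of its maximum can be checked numerically (the mixture parameters below are arbitrary illustrative values):

```python
from math import log2

def h_bin(p):
    """Binary entropy H_bin(p) = H(p, 1 - p)."""
    return 0.0 if p in (0.0, 1.0) else -p * log2(p) - (1 - p) * log2(1 - p)

# Strict concavity: for a non-trivial mixture the curve lies above the chord.
p, x1, x2 = 0.3, 0.2, 0.9
assert h_bin(p * x1 + (1 - p) * x2) > p * h_bin(x1) + (1 - p) * h_bin(x2)

# The maximum is reached at p = 1/2.
assert h_bin(0.5) == 1.0
assert all(h_bin(x) < 1.0 for x in (0.1, 0.3, 0.49, 0.7, 0.9))
```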

Some additional notes on the properties of Shannon entropy

Concluding the properties of Shannon entropy, it should be noted that the mutual information is neither always subadditive nor always superadditive. A counterexample for subadditivity is the case where X and Y are independent identically distributed random variables taking the values 0 or 1 with probability one half. Let Z = X ⊕ Y, where ⊕ is addition modulo 2; then H (X, Y : Z) = 1, while H (X : Z) + H (Y : Z) = 0, that is,

H (X, Y : Z) > H (X : Z) + H (Y : Z).

The counterexample concerning superadditivity is the case of a random variable X_1 taking the values 0 or 1 with probability one half, and X_2 ≜ Y_1 ≜ Y_2 ≜ X_1. Then H (X_1 : Y_1) + H (X_2 : Y_2) = 2, while H (X_1, X_2 : Y_1, Y_2) = 1, which means that

H (X_1 : Y_1) + H (X_2 : Y_2) > H (X_1, X_2 : Y_1, Y_2).
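The XOR counterexample can be computed explicitly; the helper below (an illustrative sketch) evaluates mutual informations from a joint distribution over outcome tuples:

```python
from math import log2
from itertools import product

def H(dist):
    return -sum(p * log2(p) for p in dist.values() if p > 0)

def mutual(joint, keep_a, keep_b):
    """H(A : B) = H(A) + H(B) - H(A, B), where keep_a / keep_b select which
    tuple positions of the joint outcomes form A and B."""
    def marginal(keep):
        out = {}
        for outcome, p in joint.items():
            key = tuple(outcome[i] for i in keep)
            out[key] = out.get(key, 0.0) + p
        return out
    return H(marginal(keep_a)) + H(marginal(keep_b)) - H(marginal(keep_a + keep_b))

# X, Y i.i.d. fair bits and Z = X xor Y; outcomes are (x, y, z) triples.
joint = {(x, y, x ^ y): 0.25 for x, y in product((0, 1), repeat=2)}

assert mutual(joint, (0, 1), (2,)) == 1.0   # H(X, Y : Z) = 1
assert mutual(joint, (0,), (2,)) == 0.0     # H(X : Z) = 0
assert mutual(joint, (1,), (2,)) == 0.0     # H(Y : Z) = 0: no subadditivity
```

Z alone reveals nothing about X or Y individually, yet the pair (X, Y) determines Z completely.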

1.2 Quantum physics and information theory

Quantum theory is another very important area of physics, which is used to describe the elementary particles that make up our world. The laws and the intuition of quantum theory are totally different from the classical case. To be more specific, quantum theory is considered counter-intuitive; quoting Richard Feynman, "nobody really understands quantum mechanics". However, quantum physics offers new phenomena and properties which can change people's view of information. These properties are investigated in this section.

1.2.1 Basic features of quantum mechanics relevant to information theory

Mathematically, quantum entities are represented by Hilbert space vectors |ψ⟩, usually normalized, ⟨ψ|ψ⟩ = 1. Quantum systems evolve unitarily; that is, if a system is initially in a state |ψ_1⟩, it becomes later another state |ψ_2⟩ after a unitary operation:

U |ψ_1⟩ = |ψ_2⟩. (1.10)

Unitary operations are reversible, since U U† = I, and previous states can be reconstructed by |ψ_1⟩ = U† |ψ_2⟩. What is important about such operations is that, because ⟨ψ_2|ψ_2⟩ = ⟨ψ_1| U†U |ψ_1⟩ = ⟨ψ_1| I |ψ_1⟩ = 1, normalization, which soon will be interpreted as probability, is conserved. The measurement of the properties of these objects is described by a collection of operators {M_m}. The quantum object will be found in the m-th state with probability

p (m) = ⟨ψ| M_m† M_m |ψ⟩, (1.11)

and after this measurement it is going to be in a definite state, possibly different from the starting one:

M_m |ψ⟩ / √(⟨ψ| M_m† M_m |ψ⟩). (1.12)

Of course m should be viewed as the measurement outcome, hence as information extracted from a physical system. This way information content can be assigned to each state of a quantum object. As a practical example, to the energy states of an atom one can map four numbers, or to the two polarization states of a photon one can map two numbers, say 0 and 1. The last case is similar to the bits of classical information theory, but because a photon is a quantum entity they are named qubits (quantum bits). It should be stressed here that the measurement operators should satisfy the completeness relation Σ_m M_m† M_m = I, which results in Σ_m p (m) = Σ_m ⟨ψ| M_m† M_m |ψ⟩ = 1, as instructed by probability theory. However, since in general ⟨ψ| M_m† M_m |ψ⟩ < 1, the map (1.12) is not unitary, and looking at equation (1.12) one understands that measurement is an irreversible operation.
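Equations (1.11) and (1.12) can be illustrated for the simplest case, a projective computational-basis measurement of one qubit. The Python sketch below is hypothetical, representing |ψ⟩ = a|0⟩ + b|1⟩ by its list of amplitudes:

```python
from math import sqrt

def measure_probabilities(state):
    """p(m) = <psi| M_m' M_m |psi> for the projectors M_m = |m><m| (eq. 1.11)."""
    return [abs(amp) ** 2 for amp in state]

def post_measurement_state(state, m):
    """M_m |psi> / sqrt(p(m)): the definite state left after outcome m (eq. 1.12)."""
    p = abs(state[m]) ** 2
    return [state[i] / sqrt(p) if i == m else 0.0 for i in range(len(state))]

psi = [1 / sqrt(2), 1j / sqrt(2)]        # an equal superposition of |0> and |1>
probs = measure_probabilities(psi)       # [0.5, 0.5]
after = post_measurement_state(psi, 1)   # collapse onto |1>, up to a phase

assert abs(sum(probs) - 1.0) < 1e-12     # completeness: sum_m p(m) = 1
assert after[0] == 0.0                   # the |0> component is irreversibly lost
```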

What is very interesting about quantum entities is that they can either be in a definite state or in a superposition of states! Mathematically this is written

|ψ⟩ = Σ_s c_s |s⟩.

Using the language of quantum theory, |ψ⟩ is in state |s⟩ with probability c_s* c_s, and because of normalization the total probability of measuring |ψ⟩ is

⟨ψ|ψ⟩ = Σ_s c_s* c_s = 1. (1.13)

The states |s⟩ are usually orthonormal to each other, hence

⟨s|ψ⟩ = c_s. (1.14)

Although being simultaneously in many states sounds weird, quantum information can be very powerful in computing. Suppose some quantum computer takes as input quantum objects which are in a superposition of multiple states; then the output is going to be quantum objects which of course are going to be in multiple states too. This way one can have many calculations done in only one computational step! However, careful extraction of the results is needed [13, 20, 21, 22, 23], because a quantum measurement yields only one answer from the superposition of multiple states, as equation (1.12) instructs, and the further information is lost. Then one can have incredible results, like for example calculating discrete logarithms and factorizing numbers in polynomial time [20, 21], or searching an unsorted database of N objects with only √N iterations [22, 23]!

In contrast to classical information, which under perfect conditions can be known up to a desired accuracy, quantum information is sometimes ambiguous. This is because one cannot reliably distinguish non-orthogonal states. Assume for a while that such a distinction is possible for two non-orthogonal states |ψ_1⟩ and |ψ_2⟩, with a collection of measurement operators {M_m}. Then, according to this assumption, some of the measurement operators give reliable information on whether the measured quantity is |ψ_1⟩ or |ψ_2⟩, and by collecting them together the following two distinguishing POVM elements can be defined:

E_i ≜ Σ_{m measuring i} M_m† M_m, i = 1, 2.

The assumption that these states can be reliably distinguished is expressed mathematically as

⟨ψ_i| E_i |ψ_i⟩ = 1, i = 1, 2. (1.15)

Since Σ_{i=1,2} E_i = I, it follows that Σ_{i=1,2} ⟨ψ_1| E_i |ψ_1⟩ = 1. Because the operator E_1 reliably measures the first state, ⟨ψ_1| E_1 |ψ_1⟩ = 1, hence the other term must be

⟨ψ_1| E_2 |ψ_1⟩ = 0. (1.16)

Suppose |ψ_2⟩ is decomposed into |ψ_1⟩ and a state orthogonal to |ψ_1⟩, say |ψ_⊥⟩; then |ψ_2⟩ = α |ψ_1⟩ + β |ψ_⊥⟩. Of course |α|² + |β|² = 1, and |β| < 1 because |ψ_1⟩ and |ψ_2⟩ are not orthogonal. Using the last decomposition and (1.16), ⟨ψ_2| E_2 |ψ_2⟩ = |β|² ⟨ψ_⊥| E_2 |ψ_⊥⟩ ≤ |β|² < 1, which is in contradiction with (1.15).
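A concrete instance of this impossibility, with the illustrative choice |ψ_1⟩ = |0⟩ and |ψ_2⟩ = (|0⟩ + |1⟩)/√2, can be sketched as follows:

```python
from math import sqrt

# |psi1> = |0> and |psi2> = (|0> + |1>)/sqrt(2): non-orthogonal, overlap 1/sqrt(2).
psi1 = [1.0, 0.0]
psi2 = [1 / sqrt(2), 1 / sqrt(2)]

def inner(u, v):
    """<u|v> for amplitude lists."""
    return sum(complex(a).conjugate() * b for a, b in zip(u, v))

# By (1.16), a candidate E_2 must be supported on the complement of |psi1>,
# i.e. on |psi_perp> = |1>.  It then detects |psi2> only with probability
# |beta|^2 = |<1|psi2>|^2:
beta = inner([0.0, 1.0], psi2)
p_detect = abs(beta) ** 2

assert abs(inner(psi1, [0.0, 1.0])) == 0   # E_2 never misidentifies |psi1>
assert p_detect < 1                        # ... but it misses |psi2> half the time,
assert abs(p_detect - 0.5) < 1e-12         # violating assumption (1.15)
```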


In what concerns the results of the last paragraph, it should additionally be mentioned that information gain implies disturbance. Let |ψ⟩ and |φ⟩ be non-orthogonal states, and without loss of generality assume that a unitary process is used to obtain information with the aid of an ancilla system |u⟩. Assuming that such a process does not disturb the system, then in both cases one obtains

|ψ⟩ ⊗ |u⟩ → |ψ⟩ ⊗ |v⟩,
|φ⟩ ⊗ |u⟩ → |φ⟩ ⊗ |v′⟩.

Then one would like |v⟩ and |v′⟩ to be different, in order to acquire information about the states. However, since inner products are preserved under unitary transformations, ⟨v|v′⟩ ⟨ψ|φ⟩ = ⟨u|u⟩ ⟨ψ|φ⟩ ⇒ ⟨v|v′⟩ = ⟨u|u⟩ = 1, and hence |v⟩ and |v′⟩ are identical. Thus distinguishing between |ψ⟩ and |φ⟩ must inevitably disturb at least one of these states.

However, at least theoretically, there is always a way of distinguishing orthonormal states. Suppose the states |i⟩ are orthonormal; then it is straightforward to define the set of operators M_i ≜ |i⟩⟨i|, plus the operator M_0 ≜ √(I − Σ_{i≠0} |i⟩⟨i|), which together satisfy the completeness relation. Now if the state |i⟩ is prepared, then p (i) = ⟨i| M_i† M_i |i⟩ = 1; thus the states are reliably distinguished.

Another very surprising result is the prohibition of copying arbitrary quantum states. This is known as the no-cloning theorem [24, 25], and it can be proven very easily. Suppose it were possible to have a quantum photocopying machine, which would take as input a quantum white paper $|w\rangle$ and a state to be copied. The quantum photocopying machine should be realized by a unitary transformation $U$, and if somebody tries to photocopy two states $|\psi\rangle$ and $|\phi\rangle$, it should naturally work as follows:
$$U(|w\rangle \otimes |\psi\rangle) = |\psi\rangle \otimes |\psi\rangle,$$
$$U(|w\rangle \otimes |\phi\rangle) = |\phi\rangle \otimes |\phi\rangle.$$
Taking the inner product of these relations gives $\langle\psi|\phi\rangle = \langle\psi|\phi\rangle^2$, thus $\langle\psi|\phi\rangle = 0$ or $\langle\psi|\phi\rangle = 1$, hence $|\psi\rangle = |\phi\rangle$ or $|\psi\rangle$ and $|\phi\rangle$ are orthogonal. This means that cloning is allowed only for orthogonal states! Thus one can at least construct a device, built from quantum circuits, to copy orthogonal states. For example, if $|\psi\rangle$ and $|\phi\rangle$ are orthogonal, then there exists a unitary transformation $U$ such that $U|\psi\rangle = |0\rangle$ and $U|\phi\rangle = |1\rangle$. Then by applying the FANOUT quantum gate, which maps the input to $|00\rangle$ if the input qubit was $|0\rangle$ and to $|11\rangle$ if it was $|1\rangle$, and further applying $U^\dagger \otimes U^\dagger$ to the tensor product of the outcoming qubits, either the state $|\psi\rangle$ is copied, finally giving $|\psi\psi\rangle$, or $|\phi\rangle$, giving $|\phi\phi\rangle$.
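The FANOUT-based copier for orthogonal states can be sketched numerically. In the following NumPy snippet (the helper name `copy_attempt` is ours, not from the text) the basis states $|0\rangle$ and $|1\rangle$ are cloned exactly, while the non-orthogonal state $|+\rangle = (|0\rangle + |1\rangle)/\sqrt{2}$ comes out entangled rather than copied, with fidelity only $1/2$ to the would-be clone:

```python
import numpy as np

# FANOUT (CNOT) gate: |00> -> |00>, |10> -> |11>  (copies basis states)
CNOT = np.array([[1, 0, 0, 0],
                 [0, 1, 0, 0],
                 [0, 0, 0, 1],
                 [0, 0, 1, 0]], dtype=complex)

ket0 = np.array([1, 0], dtype=complex)
ket1 = np.array([0, 1], dtype=complex)

def copy_attempt(psi):
    """Feed |psi> with a blank |0> into FANOUT and return the two-qubit output."""
    return CNOT @ np.kron(psi, ket0)

# Orthogonal states |0>, |1> are cloned perfectly:
clone0_ok = np.allclose(copy_attempt(ket0), np.kron(ket0, ket0))  # |00>
clone1_ok = np.allclose(copy_attempt(ket1), np.kron(ket1, ket1))  # |11>

# The non-orthogonal state |+> is NOT cloned:
plus = (ket0 + ket1) / np.sqrt(2)
out_plus = copy_attempt(plus)          # entangled (|00> + |11>)/sqrt(2)
target = np.kron(plus, plus)           # what a true clone would look like
fidelity = abs(np.vdot(target, out_plus)) ** 2   # 1/2, not 1
```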

The basic information-relevant features of quantum mechanics analyzed in this subsection are summarized in Figure 1.2.

Basic features of quantum mechanics

1. Reversibility of quantum operations

2. Irreversibility of measurement

3. Probabilistic outcomes

4. Superposition of states

5. Distinguishability of orthogonal states by operators

6. Non-distinguishability of non-orthogonal states

7. Information gain implies disturbance

8. Non-cloning of non-orthogonal states

Figure 1.2: A summary of the basic information-relevant features of quantum mechanics.

1.2.2 The language of quantum information theory: quantum thermodynamics

As in the case of classical information theory, thermodynamics should be studied in order to abstract quantum information. As already stated in the last paragraphs, quantum mechanics has some sort of built-in probabilities. However this is not enough. The probabilistic features of quantum states are different from those of classical thermodynamics. The difference is demonstrated by taking two totally independent events $A$ and $B$.

Then the probability of either of them occurring would be $P_{AB}^{\text{classical}} = P(A) + P(B)$. In contrast, in quantum mechanics the wave functions should be added, and probabilities are obtained by calculating the inner product. More analytically,
$$P_{AB}^{\text{quantum}} = (\langle\psi_A| + \langle\psi_B|)(|\psi_A\rangle + |\psi_B\rangle) = (\langle\psi_A|\psi_A\rangle + \langle\psi_B|\psi_B\rangle) + 2\operatorname{Re}\langle\psi_A|\psi_B\rangle \qquad (1.17)$$
$$= P(A) + P(B) + 2\operatorname{Re}\langle\psi_A|\psi_B\rangle \overset{\text{in general}}{\neq} P_{AB}^{\text{classical}}.$$

Moreover, if somebody decides to encode some information with quantum objects, then he is going to be interested in the probabilities of occurrence of the alphabet, exactly the same way as in section 1.1, and he would never want to mix them up with the probabilities already built into quantum entities. For this reason a quantum version of thermodynamics is needed. In the sequel the name thermodynamical probabilities is used to distinguish the statistical mixture of several quantum states from the quantum probabilities occurring by observation of a quantum state.

The basic tool of quantum thermodynamics is the density matrix, and its simplest case is when the quantum state occurs with thermodynamical probability $p = 1$. Then by (1.14) $\langle t|\psi\rangle\langle\psi|s\rangle = c_t c_s^*$, and therefore
$$\rho \equiv |\psi\rangle\langle\psi|$$
is the natural matrix generalization of a quantum vector state. This is the definition of the density matrix of a pure state, in contradistinction to a mixture of states, where several states occur with probabilities $0 < p < 1$. Each element of this matrix is
$$\rho_{ts} \equiv \langle t|\psi\rangle\langle\psi|s\rangle = c_t c_s^*,$$
and by equation (1.13), where the normalization of a vector was defined, the density matrix is correspondingly normalized by
$$\operatorname{tr}\rho = \sum_s \rho_{ss} = \sum_s \langle s|\psi\rangle\langle\psi|s\rangle = \sum_s c_s c_s^* = 1. \qquad (1.18)$$
Moreover it is straightforward that unitary evolution (1.10) is described by
$$U|\psi\rangle\langle\psi|U^\dagger = U\rho U^\dagger, \qquad (1.19)$$
the probability of measuring the $m$-th state (1.11) is given by
$$p(m) = \langle\psi| M_m^\dagger M_m |\psi\rangle = \sum_s \langle\psi| M_m^\dagger |s\rangle\langle s| M_m |\psi\rangle = \sum_s \langle s| M_m |\psi\rangle\langle\psi| M_m^\dagger |s\rangle = \operatorname{tr}\left(M_m \rho M_m^\dagger\right), \qquad (1.20)$$
and the state after measurement (1.12) is obtained by
$$\frac{M_m |\psi\rangle}{\sqrt{\langle\psi| M_m^\dagger M_m |\psi\rangle}} \cdot \frac{\langle\psi| M_m^\dagger}{\sqrt{\langle\psi| M_m^\dagger M_m |\psi\rangle}} = \frac{M_m \rho M_m^\dagger}{\operatorname{tr}\left(M_m \rho M_m^\dagger\right)}. \qquad (1.21)$$

Suppose now there is a collection of quantum states $|\psi_i\rangle$ occurring with thermodynamical probabilities $p_i$; then the mixed density operator is simply defined as
$$\rho \equiv \sum_i p_i |\psi_i\rangle\langle\psi_i|. \qquad (1.22)$$
This construction is precisely what was expected at the beginning of this subsection, because the thermodynamical probabilities are real numbers and can be added, in contrast to the situation of quantum probabilities for a state vector (1.17). The generalizations of normalization (1.18), unitary evolution (1.19), probability of measurement (1.20) and measurement (1.21) for the mixed density matrix are
$$\operatorname{tr}\rho = \sum_i p_i \operatorname{tr}\left(|\psi_i\rangle\langle\psi_i|\right) = 1,$$
$$\sum_i p_i U|\psi_i\rangle\langle\psi_i|U^\dagger = U\rho U^\dagger,$$
$$p(m) = \operatorname{tr}\left(M_m \rho M_m^\dagger\right),$$
$$\frac{M_m \rho M_m^\dagger}{\operatorname{tr}\left(M_m \rho M_m^\dagger\right)}.$$
The above results are pretty obvious for the first, the second and the fourth, but some algebra of probabilities is needed for the third (refer to [1, p.99] for details). The unitary evolution of the system can be generalized by quantum operations
$$\mathcal{E}(\rho) = \sum_i E_i \rho E_i^\dagger,$$
where $\sum_i E_i^\dagger E_i = I$. Quantum operations are sometimes referred to in the literature [2] as superoperators.
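The bookkeeping of (1.18)–(1.22) is easy to check numerically. A minimal NumPy sketch (variable names are ours) for a qubit mixture measured in the computational basis:

```python
import numpy as np

ket0 = np.array([1, 0], dtype=complex)
ket1 = np.array([0, 1], dtype=complex)

# Mixed state: |0> with thermodynamical probability 3/4, |1> with 1/4.
rho = 0.75 * np.outer(ket0, ket0.conj()) + 0.25 * np.outer(ket1, ket1.conj())

# Projective measurement in the computational basis: M_m = |m><m|.
M = [np.outer(ket0, ket0.conj()), np.outer(ket1, ket1.conj())]

# p(m) = tr(M_m rho M_m^dagger), as in (1.20)
p = [np.trace(Mm @ rho @ Mm.conj().T).real for Mm in M]

# Post-measurement state for outcome 0: M_0 rho M_0^dagger / p(0), as in (1.21)
rho_post0 = M[0] @ rho @ M[0].conj().T / p[0]
```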

Quantum thermodynamics, as just described, can be viewed through a transition from classical to quantum thermodynamics. Suppose an ensemble of $n$ particles in equilibrium is given. For this ensemble assume that the $i$-th particle is located at $x_i$, has velocity $v_i$ and energy $E_i(x_i, v_i)$. Classical thermodynamics says that the $i$-th particle has probability
$$p_i = \frac{e^{-\beta E_i}}{\sum_j e^{-\beta E_j}} \qquad (1.23)$$
to be in this state, where $\beta = \frac{1}{k_B T}$, with $k_B$ Boltzmann's constant and $T$ the temperature. Now if the particles were described by quantum mechanics, the $i$-th one would have an eigenenergy $E_i$, given by the solution of the Schrödinger equation $H|\psi_i\rangle = E_i|\psi_i\rangle$, where $H$ is the Hamiltonian of the system. With the help of equation (1.23), the density matrix as defined in (1.22) can be written as
$$\rho = \sum_i p_i |\psi_i\rangle\langle\psi_i| = \sum_i \frac{|\psi_i\rangle e^{-\beta E_i} \langle\psi_i|}{\sum_j e^{-\beta E_j}} = \frac{e^{-\beta H}}{\operatorname{tr} e^{-\beta H}},$$
which is a generalization of the classical thermodynamical probability (1.23). This transition is illustrated in Figure 1.3.
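The thermal density matrix $\rho = e^{-\beta H}/\operatorname{tr} e^{-\beta H}$ can be computed directly from the eigendecomposition of $H$. A small sketch, using a two-level toy Hamiltonian of our own choosing (units where $k_B = 1$):

```python
import numpy as np

# Two-level system with energies E_0 = 0, E_1 = 1, at temperature T = 1.
H = np.diag([0.0, 1.0])
beta = 1.0

# rho = e^{-beta H} / tr e^{-beta H}, via the eigendecomposition of H
E, V = np.linalg.eigh(H)
boltz = np.exp(-beta * E)
rho = (V * boltz) @ V.conj().T / boltz.sum()   # V diag(boltz) V^dagger, normalized

# Thermodynamical occupation probabilities p_i = e^{-beta E_i} / sum_j e^{-beta E_j}
p = boltz / boltz.sum()
```

Since this $H$ is diagonal, the diagonal of $\rho$ reproduces the classical probabilities (1.23) exactly.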

[Figure 1.3 contrasts the statistical behavior of classical particles, each having velocity $v_i$, position $x_i$ and energy $E_i(v_i, x_i)$ with probability $p_i = e^{-\beta E_i}/\sum_j e^{-\beta E_j}$, with the statistical behavior of quantum particles, each at state $|\psi_i\rangle$ given by $H|\psi_i\rangle = E_i|\psi_i\rangle$ with probability $p_i = e^{-\beta E_i}/\operatorname{tr} e^{-\beta H}$.]

Figure 1.3: Transition from classical to quantum thermodynamics.

1.2.3 Von Neumann entropy and relevant measures of quantum information theory

Wishing to describe quantum information, one could use a quantum version of entropy. In order to justify its mathematical definition, recall how Shannon entropy was given in equation (1.2), and assume that a density matrix $\rho$ is diagonalized by $\rho = \sum_x \lambda_x |x\rangle\langle x|$. Then the quantum entropy of $\rho$ is naturally defined as
$$S(\rho) \equiv -\sum_x \lambda_x \log \lambda_x. \qquad (1.24)$$
Translating this into the mathematical formalism developed in the last subsection, the von Neumann entropy is defined as
$$S(\rho) \equiv -\operatorname{tr}(\rho \log \rho). \qquad (1.25)$$
The last formula is often used for proving theoretical results, and equation (1.24) is used for calculations. As an example, the von Neumann entropy of $\rho = p|0\rangle\langle 0| + (1-p)\frac{(|0\rangle + |1\rangle)(\langle 0| + \langle 1|)}{2}$ is found to be
$$S(\rho) = S\begin{pmatrix} p + \frac{1-p}{2} & \frac{1-p}{2} \\ \frac{1-p}{2} & \frac{1-p}{2} \end{pmatrix} = -\lambda_1 \log \lambda_1 - \lambda_2 \log \lambda_2,$$
where $\lambda_1 = \frac{1 + \sqrt{1 + 2p^2 - 2p}}{2}$ and $\lambda_2 = \frac{1 - \sqrt{1 + 2p^2 - 2p}}{2}$ are the eigenvalues of the corresponding matrix. Surprisingly $S(\rho) \neq H(p, 1-p)$, even though the same probabilities were assigned to both of them! This shows that the quantum probabilities are not expelled by quantum thermodynamics. The equality could only hold if the probabilities written in Shannon's entropy are the eigenvalues of the density matrix.
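The gap $S(\rho) \neq H(p, 1-p)$ in the example above can be verified numerically. A sketch (helper names are ours; entropies in bits, i.e. log base 2), taking $p = 1/2$:

```python
import numpy as np

def shannon(*probs):
    """Shannon entropy in bits of a probability list."""
    ps = np.array([q for q in probs if q > 0])
    return float(-np.sum(ps * np.log2(ps)))

def vn_entropy(rho):
    """Von Neumann entropy in bits, computed from the eigenvalues as in (1.24)."""
    lam = np.linalg.eigvalsh(rho)
    lam = lam[lam > 1e-12]
    return float(-np.sum(lam * np.log2(lam)))

p = 0.5
ket0 = np.array([1.0, 0.0])
unnorm_plus = np.array([1.0, 1.0])          # |0> + |1>
rho = p * np.outer(ket0, ket0) + (1 - p) * np.outer(unnorm_plus, unnorm_plus) / 2

S = vn_entropy(rho)       # von Neumann entropy of the mixture
H = shannon(p, 1 - p)     # Shannon entropy of the mixing probabilities
gap = H - S               # strictly positive: the two states are not orthogonal
```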

Following the same path as for classical information theory, in the quantum case it is straightforward to define the joint entropy, the relative entropy of $\rho$ to $\sigma$, the entropy of $A$ conditional on knowing $B$, and the common or mutual information of $A$ and $B$. Each case is correspondingly
$$S(A, B) \equiv -\operatorname{tr}\left(\rho^{AB} \log \rho^{AB}\right), \qquad (1.26a)$$
$$S(\rho\|\sigma) \equiv \operatorname{tr}(\rho \log \rho) - \operatorname{tr}(\rho \log \sigma), \qquad (1.26b)$$
$$S(A|B) \equiv S(A, B) - S(B), \qquad (1.26c)$$
$$S(A : B) \equiv S(A) + S(B) - S(A, B) = S(A) - S(A|B). \qquad (1.26d)$$

One can see that there are a lot of similarities between Shannon's and von Neumann's entropy. As such, one can prove a result resembling equation (1.4),
$$S(\rho\|\sigma) \ge 0, \qquad (1.27a)$$
$$S(\rho\|\sigma) = 0 \Leftrightarrow \rho = \sigma, \qquad (1.27b)$$
which is known as Klein's inequality. This inequality provides evidence of why the von Neumann relative entropy is close to the notion of a metric. What is also important is that it can be used, as in the classical case, to prove something corresponding to equation (1.5) of Shannon's entropy:
$$S(\rho) \le \log d, \qquad (1.28a)$$
$$S(\rho) = \log d \Leftrightarrow \rho = \frac{1}{d} I. \qquad (1.28b)$$
In addition to this, from the definition (1.25) it follows that
$$S(\rho) \ge 0, \qquad (1.29a)$$
$$S(\rho) = 0 \Leftrightarrow \rho \text{ is pure}, \qquad (1.29b)$$
which resembles equation (1.6).

One can also prove that, supposing some states $\rho_i$ with probabilities $p_i$ have support on orthogonal subspaces, then
$$S\left(\sum_i p_i \rho_i\right) = H(p_i) + \sum_i p_i S(\rho_i).$$
Directly from this relation the joint entropy theorem can be proven: supposing that $p_i$ are probabilities, $|i\rangle$ are orthogonal states of a system $A$, and $\rho_i$ is a set of density matrices of another system $B$, then
$$S\left(\sum_i p_i |i\rangle\langle i| \otimes \rho_i\right) = H(p_i) + \sum_i p_i S(\rho_i). \qquad (1.30)$$
Using the definition of von Neumann entropy (1.25) and the above theorem for the case where $\rho_i = \sigma$ for every $i$, and letting $\rho$ be a density matrix with eigenvalues $p_i$ and eigenvectors $|i\rangle$, the entropy of a tensor product $\rho \otimes \sigma$ is found to be
$$S(\rho \otimes \sigma) = S(\rho) + S(\sigma). \qquad (1.31)$$
Another interesting result can be derived from the Schmidt decomposition: if a composite system $AB$ is in a pure state, its subsystems $A$ and $B$ have density matrices with equal eigenvalues, and by (1.24)
$$S(A) = S(B). \qquad (1.32)$$
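Additivity under tensor products (1.31) is easy to confirm numerically; a sketch on two arbitrarily chosen diagonal states (entropies in bits):

```python
import numpy as np

def vn_entropy(rho):
    """Von Neumann entropy in bits, from the eigenvalues as in (1.24)."""
    lam = np.linalg.eigvalsh(rho)
    lam = lam[lam > 1e-12]
    return float(-np.sum(lam * np.log2(lam)))

rho = np.diag([0.7, 0.3])            # a qubit state
sigma = np.diag([0.5, 0.25, 0.25])   # a qutrit state, S(sigma) = 1.5 bits
joint = np.kron(rho, sigma)          # the tensor product rho (x) sigma

# S(rho (x) sigma) - (S(rho) + S(sigma)) should vanish
additive_gap = vn_entropy(joint) - (vn_entropy(rho) + vn_entropy(sigma))
```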


1.2.4 A great discrepancy between classical and quantum information theory: quantum entanglement

The tools developed in the above subsection can help reveal a great discrepancy between classical and quantum information theory: entanglement. Concerning the nomenclature, in quantum mechanics two states are named entangled if they cannot be written as a tensor product of other states. For the demonstration of the aforementioned discrepancy, let for example a composite system $AB$ be in an entangled pure state $|AB\rangle$; then, because of entanglement, in the Schmidt decomposition it should be written as a sum of more than one term,
$$|AB\rangle = \sum_{i \in I} \lambda_i |i_A\rangle |i_B\rangle, \text{ with } |I| > 1, \qquad (1.33)$$
where $|i_A\rangle$ and $|i_B\rangle$ are orthonormal bases. The corresponding density matrix is obviously $\rho^{AB} = |AB\rangle\langle AB| = \sum_{i,j \in I} \lambda_i \lambda_j |i_A\rangle |i_B\rangle \langle j_A| \langle j_B|$. As usual, the density matrix of the subsystem $B$ can be found by tracing out system $A$,
$$\rho^B = \operatorname{tr}_A \rho^{AB} = \sum_{i,j,k \in I} \lambda_i \lambda_j \langle k_A|i_A\rangle |i_B\rangle \langle j_B| \langle j_A|k_A\rangle = \sum_{i \in I} \lambda_i^2 |i_B\rangle \langle i_B|.$$
Now because of the assumption $|I| > 1$ in (1.33), and the fact that the $|i_B\rangle$ are orthonormal basis states which cannot be collected together into a single projector, subsystem $B$ is not pure. Thus by (1.29a) and (1.29b), $S(B) > 0$; $AB$ is pure, thus by (1.29b) $S(A, B) = 0$, and obviously by (1.26c) $S(A|B) < 0$. The last steps can be repeated backwards, and the conclusion which can be drawn is that a pure composite system $AB$ is entangled if and only if $S(A|B) < 0$.

Of course in classical information theory conditional entropy could only be $H(X|Y) \ge 0$ (property 2 in subsection 1.1.2), and that is obviously the reason why entangled states did not exist there at all! This is an exclusive feature of quantum information theory. A very intriguing, or better to say a very entangled, feature, which until now is not well understood by physicists. However it has incredible applications, such as quantum cryptography, which will be the main topic of chapter 4. Concerning the nomenclature, entangled states are named after the fact that $S(A|B) < 0 \overset{(1.26\text{c})}{\Longleftrightarrow} S(A, B) < S(B)$, which means that in quantum mechanics the ignorance about a system $B$ can be greater than the ignorance about both $A$ and $B$! This suggests some correlation between these two systems.

How can nature have such a peculiar property? Imagine a simple pair of quantum particles, with two possible states each, $|0\rangle$ and $|1\rangle$. Then a possible formulation of entanglement is the state $|\psi\rangle = \frac{|0\rangle \otimes |0\rangle}{\sqrt{2}} + \frac{|1\rangle \otimes |1\rangle}{\sqrt{2}}$. After a measurement $M_m$ of the first particle, for example, according to (1.12),
$$M_m |\psi\rangle = 1 \cdot (|m\rangle \otimes |m\rangle) + 0 \cdot (|1-m\rangle \otimes |1-m\rangle) = |m\rangle \otimes |m\rangle,$$
hence both particles collapse to the state $|m\rangle$. This example sheds light on the quantum property whereby ignorance of both particles is greater than the ignorance of one of them, since perfect knowledge about one implies perfect knowledge about the other.
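The defining computation of this subsection, $S(A|B) < 0$ for an entangled pure state, can be reproduced for $(|00\rangle + |11\rangle)/\sqrt{2}$. A NumPy sketch (the block-wise partial trace below is valid for this two-qubit ordering):

```python
import numpy as np

def vn_entropy(rho):
    """Von Neumann entropy in bits, from the eigenvalues as in (1.24)."""
    lam = np.linalg.eigvalsh(rho)
    lam = lam[lam > 1e-12]
    return float(-np.sum(lam * np.log2(lam)))

# Entangled pure state |AB> = (|00> + |11>)/sqrt(2)
psi = np.zeros(4)
psi[0] = psi[3] = 1 / np.sqrt(2)
rho_AB = np.outer(psi, psi)

# Partial trace over A: sum the two 2x2 diagonal blocks (A is the first qubit)
rho_B = rho_AB[:2, :2] + rho_AB[2:, 2:]

S_AB = vn_entropy(rho_AB)      # 0: the joint state is pure
S_B = vn_entropy(rho_B)        # 1 bit: the subsystem is completely mixed
S_A_given_B = S_AB - S_B       # -1 < 0: impossible in classical information theory
```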

1.2.5 Basic properties of von Neumann entropy

The basic properties of von Neumann entropy, which can be compared to the properties of Shannon entropy discussed in subsection 1.1.2, are:

1. $S(A, B) = S(B, A)$, $S(A : B) = S(B : A)$.

2. Unitary operations preserve entropy: $S\left(U\rho U^\dagger\right) = S(\rho)$.

3. Subadditivity: $S(A, B) \le S(A) + S(B)$.

4. $S(A, B) \ge |S(A) - S(B)|$ (triangle or Araki-Lieb inequality).

5. Strict concavity of the entropy: Suppose there are probabilities $p_i \ge 0$ and corresponding density matrices $\rho_i$; then $S\left(\sum_i p_i \rho_i\right) \ge \sum_i p_i S(\rho_i)$, and $S\left(\sum_i p_i \rho_i\right) = \sum_i p_i S(\rho_i) \Leftrightarrow$ the $\rho_i$ for which $p_i > 0$ are all identical.

6. Upper bound of a mixture of states: Suppose $\rho = \sum_i p_i \rho_i$, where $p_i$ are probabilities and $\rho_i$ the corresponding density matrices; then $S(\rho) \le \sum_i p_i S(\rho_i) + H(p_i)$, and $S(\rho) = \sum_i p_i S(\rho_i) + H(p_i) \Leftrightarrow$ the $\rho_i$ have support on orthogonal subspaces.

7. Strong subadditivity: $S(A, B, C) + S(B) \le S(A, B) + S(B, C)$, or equivalently $S(A) + S(B) \le S(A, C) + S(B, C)$.

8. Conditioning reduces entropy: $S(A|B, C) \le S(A|B)$.

9. Discarding quantum systems never increases mutual information: Suppose $ABC$ is a composite quantum system; then $S(A : B) \le S(A : B, C)$.

10. Trace-preserving quantum operations never increase mutual information: Suppose $AB$ is a composite quantum system and $\mathcal{E}$ is a trace-preserving quantum operation on system $B$. Let $S(A : B)$ denote the mutual information between systems $A$ and $B$ before $\mathcal{E}$ is applied to system $B$, and $S(A' : B')$ the mutual information after $\mathcal{E}$ is applied to system $B$. Then $S(A' : B') \le S(A : B)$.

11. Relative entropy is jointly convex in its arguments: Let $0 \le \lambda \le 1$; then
$$S(\lambda A_1 + (1 - \lambda) A_2 \,\|\, \lambda B_1 + (1 - \lambda) B_2) \le \lambda S(A_1\|B_1) + (1 - \lambda) S(A_2\|B_2).$$

12. The relative entropy is monotonic: $S\left(\rho^A \| \sigma^A\right) \le S\left(\rho^{AB} \| \sigma^{AB}\right)$.

Proof

1. Obvious from the definition of joint entropy (1.26a) and mutual entropy (1.26d).

2. Let $U$ be a unitary matrix; then $S\left(U\rho U^\dagger\right) \equiv -\operatorname{tr}\left(U\rho U^\dagger \log\left(U\rho U^\dagger\right)\right) = -\operatorname{tr}\left(U\rho U^\dagger U (\log\rho) U^\dagger\right)$, where the fact that $U\rho U^\dagger$ and $\rho$ are similar, and hence have the same eigenvalues, was employed. Furthermore $U$ is unitary, hence $UU^\dagger = I$, and the proof is concluded: $S\left(U\rho U^\dagger\right) = -\operatorname{tr}\left(U\rho(\log\rho)U^\dagger\right) \overset{\operatorname{tr}(AB) = \operatorname{tr}(BA)}{=} -\operatorname{tr}\left(U^\dagger U\rho\log\rho\right) \overset{U^\dagger U = I}{=} -\operatorname{tr}(\rho\log\rho) = S(\rho)$.

3. Refer to [1, p.516].

4. Assume a fictitious state $R$ purifying the system $\rho^{AB}$; then by applying subadditivity (property 3), $S(R) + S(A) \ge S(A, R)$. Because $\rho^{ABR}$ is pure, by (1.32) $S(R) = S(A, B)$ and $S(A, R) = S(B)$. Combining the last two equations with the last inequality, $S(A, B) \ge S(B) - S(A)$. Moreover, because $S(A, B) = S(B, A)$, $A$ and $B$ can be interchanged, giving $S(A, B) \ge S(A) - S(B)$, thus $S(A, B) \ge |S(A) - S(B)|$. It is obvious by (1.31) that equality holds if $\rho^{AR} = \rho^A \otimes \rho^R$. This is hard to interpret because the system $R$ was artificially introduced. Another way to obtain the equality condition is by assuming $\rho^{AB}$ has a spectral decomposition $\rho^{AB} = \sum_{ik} \lambda_{ik} |i_A\rangle\langle i_A| \otimes |k_B\rangle\langle k_B|$, and obviously $\rho^A = \operatorname{tr}_B \rho^{AB} = \sum_i \left(\sum_k \lambda_{ik}\right) |i_A\rangle\langle i_A|$ and $\rho^B = \sum_k \left(\sum_i \lambda_{ik}\right) |k_B\rangle\langle k_B|$; then one can write
$$S(A, B) = S(B) - S(A) \Leftrightarrow \sum_{ik} \lambda_{ik} \log \lambda_{ik} = \sum_{ik} \lambda_{ik} \log \frac{\sum_j \lambda_{jk}}{\sum_m \lambda_{im}} \Leftrightarrow \lambda_{ik} = \frac{\sum_j \lambda_{jk}}{\sum_m \lambda_{im}}.$$
Summing over $k$ in the last equation,
$$\sum_k \lambda_{ik} = \frac{\sum_{kj} \lambda_{jk}}{\sum_m \lambda_{im}} = \frac{1}{\sum_m \lambda_{im}} \Leftrightarrow \left(\sum_k \lambda_{ik}\right)^2 = 1 \Leftrightarrow \sum_k \lambda_{ik} = 1,$$
where the last relation holds because $\lambda_{ik} \ge 0$. Now combining the last two outcomes,
$$S(A, B) = S(B) - S(A) \Leftrightarrow \lambda_{ik} = \sum_j \lambda_{jk} \equiv \lambda_k.$$
Rephrasing this result, the equality condition holds if and only if the matrices $\operatorname{tr}_B(|i\rangle\langle i|)$ have a common eigenbasis and the matrices $\operatorname{tr}_A(|i\rangle\langle i|)$ have orthogonal support. As an example of the above comment, consider the systems $\rho^A = |0\rangle\langle 0|$ and $\rho^B = \frac{1}{2}|0\rangle\langle 0| + \frac{1}{2}|1\rangle\langle 1|$; then the entropy of each is $S(A) = 0$ and $S(B) = 1$, and finally the joint system $\rho^{AB} = \rho^A \otimes \rho^B = \frac{1}{2}|00\rangle\langle 00| + \frac{1}{2}|01\rangle\langle 01|$ has entropy $S(A, B) = 1$.


5. Introduce an auxiliary system $B$, whose state has an orthonormal basis $|i\rangle$ corresponding to the states $\rho_i$ of system $A$. The joint system will have the state $\rho^{AB} = \sum_i p_i \rho_i \otimes |i\rangle\langle i|$, and its entropy, according to the joint entropy theorem (1.30), is $S(A, B) = H(p_i) + \sum_i p_i S(\rho_i)$. Moreover the entropy of each subsystem will be $S(A) = S\left(\sum_i p_i \rho_i\right)$ and $S(B) = S\left(\sum_i p_i |i\rangle\langle i|\right) = H(p_i)$, and by applying subadditivity (property 3), $S(A, B) \le S(A) + S(B)$, q.e.d. Concerning the equality conditions, assume $\rho = \rho_i$ for every $i$; then by calculating $\rho^{AB} = \sum_i p_i \rho_i \otimes |i\rangle\langle i| = \rho \otimes \sum_i p_i |i\rangle\langle i| = \rho^A \otimes \rho^B$, equality follows from (1.31). Conversely, if $S\left(\sum_i p_i \rho_i\right) = \sum_i p_i S(\rho_i)$ and there were at least one density matrix not equal to the others, say $\sigma \equiv \rho_j$, then
$$S(q\rho + p\sigma) = qS(\rho) + pS(\sigma), \qquad (1.34)$$
where the quantities $q \equiv \sum_{i \ne j} p_i$, $p \equiv p_j$ and $\rho \equiv \rho_i$ for $i \ne j$ were defined. If the density matrices $\rho, \sigma$ have spectral decompositions, say $\rho = \sum_i \lambda_i |i_\rho\rangle\langle i_\rho|$ and $\sigma = \sum_j \kappa_j |j_\sigma\rangle\langle j_\sigma|$, it is easy to find out that
$$qS(\rho) + pS(\sigma) = -\sum_m q\lambda_m \log(q\lambda_m) - \sum_m p\kappa_m \log(p\kappa_m). \qquad (1.35)$$
This was the right-hand side of (1.34). To get the left-hand side, assume that the matrix $q\rho + p\sigma$ has a spectral decomposition $q\rho + p\sigma = \sum_m \mu_m |m_{\rho\sigma}\rangle\langle m_{\rho\sigma}|$; one can then find unitary matrices connecting the bases, $|i_\rho\rangle = \sum_m u_{im} |m_{\rho\sigma}\rangle$ and $|j_\sigma\rangle = \sum_m w_{jm} |m_{\rho\sigma}\rangle$. This implies that
$$S(q\rho + p\sigma) = -\sum_n \langle n_{\rho\sigma}| \left[ q\sum_{iml} \lambda_i u_{im} u_{il}^* + p\sum_{jml} \kappa_j w_{jm} w_{jl}^* \right] |m_{\rho\sigma}\rangle\langle l_{\rho\sigma}| \log(q\rho + p\sigma) |n_{\rho\sigma}\rangle \qquad (1.36)$$
$$= -\sum_n (q\lambda_n + p\kappa_n) \log(q\lambda_n + p\kappa_n).$$
In the last step the fact that $u_{ij}$ and $w_{ij}$ are unitary ($UU^\dagger = I \Rightarrow \sum_j u_{ij} u_{lj}^* = \delta_{il}$) was employed. Taking the left- and right-hand sides of equation (1.34), as found in (1.36) and (1.35) correspondingly, it is simple to verify that
$$\sum_m q\lambda_m \log\left(1 + \frac{p\kappa_m}{q\lambda_m}\right) + \sum_m p\kappa_m \log\left(1 + \frac{q\lambda_m}{p\kappa_m}\right) = 0.$$
The fact that $\log\left(1 + \frac{p\kappa_m}{q\lambda_m}\right), \log\left(1 + \frac{q\lambda_m}{p\kappa_m}\right) \ge 0$ implies that both summations are greater than or equal to zero, and the last equality leaves no other case than both of them being zero. This can only happen if, for the non-zero $\lambda_m$, the corresponding $\kappa_m$ are null. An alternative proof that von Neumann entropy is concave can be presented by defining $f(p) \equiv S(p\rho + (1 - p)\sigma)$. Then by calculus, if $f''(p) \le 0$, then $f$ is concave, that is, for $0 \le p \le 1$, $f(px + (1 - p)y) \ge pf(x) + (1 - p)f(y)$. Selecting $x = 1$, $y = 0$ gives $f(p) \ge pf(1) + (1 - p)f(0)$, which, according to the definition of $f$, implies that $S(p\rho + (1 - p)\sigma) \ge pS(\rho) + (1 - p)S(\sigma)$.

6. Refer to [1, p.518,519].

7. For the proof refer to [1, p.519-521]. The fact that these inequalities are equivalent will be presented here. If $S(R) + S(B) \le S(R, C) + S(B, C)$ holds, then by introducing an auxiliary system $A$ which purifies the system $RBC$, $S(R) = S(A, B, C)$ and $S(R, C) = S(A, B)$, so the last inequality becomes $S(A, B, C) + S(B) \le S(A, B) + S(B, C)$. Conversely, if $S(R, B, C) + S(B) \le S(R, B) + S(B, C)$ holds, by again inserting a system $A$ purifying the system $RBC$, because $S(R, B, C) = S(A)$ and $S(R, B) = S(A, C)$, the last inequality becomes $S(A) + S(B) \le S(A, C) + S(B, C)$. From this, another equivalent way to write strong subadditivity is $0 \le S(C|A) + S(C|B)$, or $S(A) + S(B) - S(A, B) + S(A) + S(C) - S(A, C) \le 2S(A) \Leftrightarrow S(A : B) + S(A : C) \le 2S(A)$. This inequality corresponds to the second property of Shannon entropy, where $H(X : Y) \le H(X)$, which is not always true in quantum information theory. For example, the composite system $|AB\rangle = \frac{1}{\sqrt{2}}|00\rangle + \frac{1}{\sqrt{2}}|11\rangle$ has entropy $S(A, B) = 0$ because it is pure, and $\rho^A = \frac{1}{2}|0\rangle\langle 0| + \frac{1}{2}|1\rangle\langle 1|$, thus $S(A) = 1$ and similarly $S(B) = 1$. Hence $S(A : B) > S(A)$.

8. Refer to [1, p.523].

9. Refer to [1, p.523].

10. Refer to [1, p.523].

11. For the proof refer to [1, p.520]. Concerning this property it should be emphasized that joint convexity implies convexity in each input; this is obvious by selecting $B_1 = B_2 = B$ or $A_1 = A_2 = A$. The converse is not true. For example, the function $f(x, y) = -x^2 e^y$ is concave in $x$, because $\frac{\partial^2}{\partial x^2} f = -2e^y \le 0$ for every $x, y$, and concave in $y$, because $\frac{\partial^2}{\partial y^2} f = -x^2 e^y \le 0$ for every $x, y$. However $f\left(\frac{1}{2} \cdot 2 + \frac{1}{2} \cdot 0,\ \frac{1}{2}(-2) + \frac{1}{2} \cdot 0\right) = f(1, -1) \approx -0.37$, which is less than $\frac{1}{2} f(2, -2) + \frac{1}{2} f(0, 0) \approx -0.27$, so $f$ is not jointly concave; by the same argument, convexity in each input does not imply joint convexity.

12. Refer to [1, p.524,525].

Using strict concavity to prove other properties of von Neumann entropy

Strict concavity (property 5) of von Neumann entropy can be used to prove (1.28b), and moreover that the completely mixed state $\frac{1}{d}I$ on a space of $d$ dimensions is the unique state of maximal entropy. To do this the following result is stated: for a $d \times d$ normal matrix $A$ there exists a set of unitary matrices $U_i^{(A)}$ such that
$$\sum_{i=1}^{d} U_i^{(A)} A U_i^{(A)\dagger} = \operatorname{tr}(A)\, I. \qquad (1.37)$$
For a proof refer to appendix A.1 (the proof of a more general proposition is due to Abbas Edalat). To prove the uniqueness of $\frac{1}{d}I$ as a maximal state, take $A \equiv \rho$ any quantum state; then
$$S(\rho) = \sum_{i=1}^{d} \frac{1}{d} S(\rho) \overset{\text{property 2}}{=} \sum_{i=1}^{d} \frac{1}{d} S\left(U_i^{(\rho)} \rho U_i^{(\rho)\dagger}\right) \overset{\text{property 5}}{\le} S\left(\sum_{i=1}^{d} \frac{1}{d} U_i^{(\rho)} \rho U_i^{(\rho)\dagger}\right) \overset{(1.37)}{=} S\left(\frac{1}{d}I\right).$$
Hence by strict concavity (property 5) any state $\rho$ has entropy less than or equal to that of the completely mixed state $\frac{1}{d}I$, and in order for them to be equal the states should be identical.

Using von Neumann entropy, a proof of the concavity of Shannon entropy can be provided. Let $p_i$ and $q^{(i)}$ (with components $q_j^{(i)}$) be probability distributions. Then
$$H\left(\sum_i p_i q^{(i)}\right) = S\left(\sum_i p_i \sum_j q_j^{(i)} |j\rangle\langle j|\right) \overset{\text{property 5}}{\ge} \sum_i p_i S\left(\sum_j q_j^{(i)} |j\rangle\langle j|\right) = \sum_i p_i H\left(q^{(i)}\right), \qquad (1.38)$$
with equality if and only if the $\sum_j q_j^{(i)} |j\rangle\langle j|$ are the same, that is, the $q^{(i)}$ are identical.

1.2.6 Measurement in quantum physics and information theory

As already noted in the introduction of the present section, quantum physics seems very puzzling. This is because there are two types of evolution a quantum system can undergo: unitary evolution and measurement. One understands that the first one is needed to preserve probability during evolution (see subsection 1.2.1). Then why is a second type of evolution needed? Information theory can explain this, using the following results:

1. Projective measurement can increase entropy. This is derived using strict concavity. Let $P$ be a projector and $Q \equiv I - P$ the complementary projector; then there exist unitary matrices $U_1, U_2$ and a probability $p$ such that for all $\rho$, $P\rho P + Q\rho Q = pU_1 \rho U_1^\dagger + (1 - p)U_2 \rho U_2^\dagger$ (refer to A.2 for a proof), thus
$$S(P\rho P + Q\rho Q) = S\left(pU_1 \rho U_1^\dagger + (1 - p)U_2 \rho U_2^\dagger\right) \overset{\text{property 5}}{\ge} pS\left(U_1 \rho U_1^\dagger\right) + (1 - p)S\left(U_2 \rho U_2^\dagger\right) \qquad (1.39)$$
$$\overset{\text{property 2}}{=} pS(\rho) + (1 - p)S(\rho) = S(\rho).$$
Because of strict concavity, equality holds if and only if $P\rho P + Q\rho Q = \rho$.


2. General measurement can decrease entropy. One can convince oneself of this by considering a qubit in the state $\rho = \frac{1}{2}|0\rangle\langle 0| + \frac{1}{2}|1\rangle\langle 1|$, which is not pure, thus $S(\rho) > 0$, measured using the measurement matrices $M_1 = |0\rangle\langle 0|$ and $M_2 = |0\rangle\langle 1|$. If the result of the measurement is unknown, then the state of the system afterwards is $\rho' = M_1 \rho M_1^\dagger + M_2 \rho M_2^\dagger = |0\rangle\langle 0|$, which is pure, hence $S(\rho') = 0 < S(\rho)$.

3. Unitary evolution preserves entropy. This was already seen as property 2 of von Neumann entropy.

Now one should remember the information-theoretic interpretation of entropy given throughout this chapter: entropy is the amount of knowledge one has about a system. Result 3 shows that if only unitary evolutions were present in quantum theory, then no knowledge about any physical system could ever be gained! One is relieved to see that knowledge can decrease or increase by measurements, as seen in results 1 and 2, and of course that is what measurements were meant to do in the first place.
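Results 1 and 2 can both be demonstrated on single-qubit examples from the text. A NumPy sketch (entropies in bits; helper names are ours):

```python
import numpy as np

def vn_entropy(rho):
    """Von Neumann entropy in bits, from the eigenvalues as in (1.24)."""
    lam = np.linalg.eigvalsh(rho)
    lam = lam[lam > 1e-12]
    return float(-np.sum(lam * np.log2(lam)))

ket0 = np.array([1, 0], dtype=complex)
ket1 = np.array([0, 1], dtype=complex)
plus = (ket0 + ket1) / np.sqrt(2)

# Result 1: projecting the pure state |+> onto the computational basis.
P = np.outer(ket0, ket0.conj())
Q = np.eye(2) - P
rho_pure = np.outer(plus, plus.conj())
rho_proj = P @ rho_pure @ P + Q @ rho_pure @ Q   # entropy rises from 0 to 1 bit

# Result 2: the measurement M1 = |0><0|, M2 = |0><1| maps the maximally
# mixed state to the pure state |0><0|, so entropy drops from 1 bit to 0.
rho_mixed = 0.5 * np.eye(2, dtype=complex)
M1 = np.outer(ket0, ket0.conj())
M2 = np.outer(ket0, ket1.conj())
rho_after = M1 @ rho_mixed @ M1.conj().T + M2 @ rho_mixed @ M2.conj().T

S_before_1, S_after_1 = vn_entropy(rho_pure), vn_entropy(rho_proj)
S_before_2, S_after_2 = vn_entropy(rho_mixed), vn_entropy(rho_after)
```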


Chapter 2

Some important results of information theory

Some very important results of information theory, which will be useful in the development of the next two chapters, concern the amount of accessible information and how data can be processed. These are presented in section 2.1 and section 2.2 respectively, both for the classical and the quantum case.

2.1 Accessible information

Information is not always perfectly known, for example during a transmission over a noisy channel there is a

possibility of information loss. This means that obtaining upper bounds of accessible information can be very

useful in practice. These upper bounds are calculated in subsection 2.1.1 for the classical and in subsection

2.1.2 for the quantum case.

2.1.1 Accessible classical information: Fano's inequality

Of major importance in classical information theory is the amount of information that can be extracted about a random variable $X$ based on the knowledge of another random variable $Y$. That is given as an upper bound on $H(X|Y)$, which is calculated next. Suppose $\tilde{X} \equiv f(Y)$ is some function which is used as the best guess for $X$. Let $p_e \equiv p(X \ne \tilde{X})$ be the probability that this guess is incorrect. Then an 'error' random variable can be defined,
$$E \equiv \begin{cases} 1, & X \ne \tilde{X}, \\ 0, & X = \tilde{X}, \end{cases}$$
thus $H(E) = H(p_e)$. Using the chaining rule for conditional entropies, $H(E, X|Y) = H(E|X, Y) + H(X|Y)$ (Shannon entropy, property 8); however, $E$ is completely determined once $X$ and $Y$ are known, so $H(E|X, Y) = 0$ and hence
$$H(E, X|Y) = H(X|Y). \qquad (2.1)$$
Applying the chaining rule for conditional entropies (Shannon entropy, property 8) again, but for different variables,
$$H(E, X|Y) = H(X|E, Y) + H(E|Y), \qquad (2.2)$$
and further, because conditioning reduces entropy (Shannon entropy, property 7), $H(E|Y) \le H(E) = H(p_e)$, whence by (2.1) and (2.2)
$$H(X|Y) = H(E, X|Y) \le H(X|E, Y) + H(p_e). \qquad (2.3)$$
Finally, $H(X|E, Y)$ should be bounded as follows, after some algebra:
$$H(X|E, Y) = p(E = 0) H(X|E = 0, Y) + p(E = 1) H(X|E = 1, Y) \le p(E = 0) \cdot 0 + p_e \log(|X| - 1) = p_e \log(|X| - 1).$$
This relation, with the help of (2.3), is known as Fano's inequality:
$$H(p_e) + p_e \log(|X| - 1) \ge H(X|Y). \qquad (2.4)$$
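Fano's inequality (2.4) can be checked on a small joint distribution (the numbers below are an arbitrary example of ours); the best guess $\tilde{X} = f(Y)$ picks the most likely $x$ for each $y$:

```python
import numpy as np

def H(ps):
    """Shannon entropy in bits of a (possibly 2-D) probability array."""
    ps = np.asarray(ps, dtype=float).ravel()
    ps = ps[ps > 0]
    return float(-np.sum(ps * np.log2(ps)))

# A joint distribution p(x, y) over |X| = 3 values of X and 2 values of Y.
pxy = np.array([[0.30, 0.05],
                [0.10, 0.25],
                [0.10, 0.20]])
py = pxy.sum(axis=0)

# Conditional entropy H(X|Y) = H(X, Y) - H(Y)
H_X_given_Y = H(pxy) - H(py)

# Best guess Xtilde = f(Y): the most likely x for each y.
guess = pxy.argmax(axis=0)
p_correct = sum(pxy[guess[y], y] for y in range(2))
p_e = 1 - p_correct

# Fano: H(p_e) + p_e * log(|X| - 1) >= H(X|Y)
fano_lhs = H([p_e, 1 - p_e]) + p_e * np.log2(3 - 1)
```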


2.1.2 Accessible quantum information: quantum Fano inequality and the Holevo bound

Quantum Fano inequality

There exists a relation analogous to (2.4) in quantum information theory, named the quantum Fano inequality:
$$S(\rho, \mathcal{E}) \le H(F(\rho, \mathcal{E})) + (1 - F(\rho, \mathcal{E})) \log\left(d^2 - 1\right). \qquad (2.5)$$
Here $F(\rho, \mathcal{E})$ is the entanglement fidelity of a quantum operation, defined in (B.1); for more details refer to appendix B.2. In the above equation the entropy exchange of the operation $\mathcal{E}$ upon $\rho$ was introduced,
$$S(\rho, \mathcal{E}) \equiv S(R', Q'),$$
which is a measure of the noise caused by $\mathcal{E}$ on a quantum system $Q$ ($\rho \equiv \rho^Q$), purified by $R$. The prime notation is used to indicate the states after the application of $\mathcal{E}$. Note that the entropy exchange does not depend upon the way in which the initial state of $Q$ is purified by $R$. This is because any two purifications of $Q$ into $RQ$ are related by a unitary operation on the system $R$ [1, p.111], and because of von Neumann entropy, property 2.

The quantum Fano inequality (2.5) is proven by taking an orthonormal basis $|i\rangle$ for the system $RQ$, chosen so that the first state in the set is $|1\rangle = |RQ\rangle$. Forming the quantities $p_i \equiv \langle i| \rho^{R'Q'} |i\rangle$, it follows from (1.39) that $S(R', Q') \le H(p_1, \ldots, p_{d^2})$, and with some simple algebra $H(p_1, \ldots, p_{d^2}) = H(p_1) + (1 - p_1) H\left(\frac{p_2}{1 - p_1}, \ldots, \frac{p_{d^2}}{1 - p_1}\right) \le H(p_1) + (1 - p_1)\log\left(d^2 - 1\right)$, and since $p_1 \equiv F(\rho, \mathcal{E})$, q.e.d.

The Holevo bound

Another result giving an upper bound on accessible quantum information is the Holevo bound [26]:
$$H(X : Y) \le S(\rho) - \sum_x p_x S(\rho_x), \qquad (2.6)$$
where $\rho = \sum_x p_x \rho_x$. The right-hand side of this inequality is useful in quantum information theory, and hence it is given a special name: the Holevo $\chi$ quantity. Concerning its proof, assume that someone, named P, prepares a quantum information system $Q$ with states $\rho_X$, where $X = 0, \ldots, n$, having probabilities $p_0, \ldots, p_n$. The quantum information $Q$ is going to be measured by another person, M, using POVM elements $\{E_y\} = \{E_0, \ldots, E_m\}$ on the state, with outcome $Y$. The state of the total system before measurement will then be
$$\rho^{PQM} = \sum_x p_x |x\rangle\langle x| \otimes \rho_x \otimes |0\rangle\langle 0|,$$
where the tensor product is in the order $PQM$. The matrix $|0\rangle\langle 0|$ represents the initial state of the measurement system, which holds before getting any information. The measurement is described by an operator $\mathcal{E}$, which acts on each state $\sigma$ of $Q$ by measuring it with the POVM elements $\{E_y\}$ and storing the outcome on M. This can be expressed by
$$\mathcal{E}(\sigma \otimes |0\rangle\langle 0|) = \sum_y \sqrt{E_y}\, \sigma \sqrt{E_y} \otimes |y\rangle\langle y|.$$
The quantum operation $\mathcal{E}$ is trace preserving. To see this, first notice that it is made up of operation elements $\sqrt{E_y} \otimes U_y$, where $U_y |y'\rangle \equiv |y' + y\rangle$, with $+$ the modulo $m + 1$ addition. Of course $U_y$ is unitary, since it is a map taking the basis vector $|y'\rangle$ to another basis vector $|y' + y\rangle$, and hence it is a change of basis from one basis to a cyclic permutation of the same basis. Now because the $E_y$ are POVM elements, $I = \sum_y E_y = \sum_y \sqrt{E_y}^\dagger \sqrt{E_y} \overset{U_y^\dagger U_y = I}{\Longrightarrow} \sum_y \sqrt{E_y}^\dagger \sqrt{E_y} \otimes U_y^\dagger U_y = I$, q.e.d.

Subsequently, primes are used to denote states after the application of $\mathcal{E}$, and unprimed notation for states before its application. Note now that $S(P : Q) = S(P : Q, M)$, since M is initially uncorrelated with P and Q, and because by applying the operator $\mathcal{E}$ it is not possible to increase mutual information (von Neumann entropy, property 10), $S(P : Q, M) \ge S(P' : Q', M')$; moreover, discarding quantum systems never increases mutual information (von Neumann entropy, property 9), so $S(P' : Q', M') \ge S(P' : M')$. Putting these results together,
$$S(P' : M') \le S(P : Q). \qquad (2.7)$$
The last equation, with a little algebra, is understood to be the Holevo bound. The right-hand side of (2.7) can be found by noting that $\rho^{PQ} = \sum_x p_x |x\rangle\langle x| \otimes \rho_x$, hence $S(P) = H(p_x)$, $S(Q) = S(\rho)$, and $S(P, Q) = H(p_x) + \sum_x p_x S(\rho_x)$ (von Neumann entropy, property 6), thus
$$S(P : Q) = S(P) + S(Q) - S(P, Q) = S(\rho) - \sum_x p_x S(\rho_x).$$
Now the left-hand side of (2.7) is found by noting that after the measurement
$$\rho^{P'Q'M'} = \sum_{xy} p_x |x\rangle\langle x| \otimes \sqrt{E_y}\, \rho_x \sqrt{E_y} \otimes |y\rangle\langle y|.$$
Tracing out the system $Q$, and using the observation that the joint distribution $p(x, y)$ for the pair $(X, Y)$ satisfies $p(x, y) = p_x p(y|x) = p_x \operatorname{tr}(\rho_x E_y) = p_x \operatorname{tr}\left(\sqrt{E_y}\, \rho_x \sqrt{E_y}\right)$, it is straightforward to see that $\rho^{P'M'} = \sum_{xy} p(x, y) |x\rangle\langle x| \otimes |y\rangle\langle y|$, whence
$$S(P' : M') = S(P') + S(M') - S(P', M') = H(X) + H(Y) - H(X, Y) = H(X : Y),$$
q.e.d.
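The Holevo bound (2.6) can be probed numerically for a two-state ensemble (an illustrative choice of states and POVM, not from the text): for pure states the $\chi$ quantity reduces to $S(\rho)$, and a computational-basis measurement extracts strictly less mutual information than $\chi$.

```python
import numpy as np

def vn_entropy(rho):
    """Von Neumann entropy in bits, from the eigenvalues as in (1.24)."""
    lam = np.linalg.eigvalsh(rho)
    lam = lam[lam > 1e-12]
    return float(-np.sum(lam * np.log2(lam)))

def H(ps):
    """Shannon entropy in bits of a probability array."""
    ps = np.asarray(ps, dtype=float).ravel()
    ps = ps[ps > 0]
    return float(-np.sum(ps * np.log2(ps)))

# Ensemble: |0> and |+> prepared with probability 1/2 each.
ket0 = np.array([1.0, 0.0])
plus = np.array([1.0, 1.0]) / np.sqrt(2)
states = [np.outer(ket0, ket0), np.outer(plus, plus)]
px = [0.5, 0.5]

rho = sum(p * s for p, s in zip(px, states))
chi = vn_entropy(rho) - sum(p * vn_entropy(s) for p, s in zip(px, states))

# Measure in the computational basis: POVM elements E_y = |y><y|.
E = [np.diag([1.0, 0.0]), np.diag([0.0, 1.0])]
pxy = np.array([[px[x] * np.trace(E[y] @ states[x]).real for y in range(2)]
                for x in range(2)])
mutual = H(pxy.sum(axis=1)) + H(pxy.sum(axis=0)) - H(pxy)   # H(X : Y)
```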

2.2 Data processing

As is widely known, besides being stored and transmitted, information is also processed. In subsection 2.2.1 data processing is defined for the classical case and the homonymous inequality is proven. An analogous definition and inequality are demonstrated for the quantum case in subsection 2.2.2.

2.2.1 Classical data processing

Classical data processing can be described in mathematical terms by a chain of random variables

X_1 → X_2 → X_3 → · · · → X_n, (2.8)

where X_i is the i-th step of processing. Of course each step depends only on the information gained from the previous one, that is p(X_{n+1} = x_{n+1} | X_n = x_n, . . . , X_1 = x_1) = p(X_{n+1} = x_{n+1} | X_n = x_n), which defines a Markov chain. But as already accentuated, information is a physical entity which can be distorted by noise. Thus if X → Y → Z is a Markov chain representing an information process, one can prove

H(X) ≥ H(X : Y) ≥ H(X : Z), (2.9)

which is known as the data processing inequality. This inequality gives mathematical expression to the following physical truth: if a system described by a random variable X is subjected to noise, producing Y, then further data processing cannot increase the mutual information between the output and the original information X; once information is lost, it cannot be restored. It is worth mentioning that in a data processing chain X → Y → Z, information a system Z shares with X must be information which Z also shares with Y; the information is 'pipelined' from X through Y to Z. This is described by the data pipelining inequality

H(Z : Y) ≥ H(Z : X).

This is derived from (2.9) by noting that X → Y → Z is a Markov chain if and only if

p(Z = z | Y = y, X = x) = p(Z = z | Y = y)
⇔ p(X = x, Y = y, Z = z) / p(X = x, Y = y) = p(Y = y, Z = z) / p(Y = y)
⇔ p(X = x, Y = y, Z = z) / p(Y = y, Z = z) = p(X = x, Y = y) / p(Y = y)
⇔ p(X = x | Y = y, Z = z) = p(X = x | Y = y),

if and only if Z → Y → X is a Markov chain too. In the above proof there is no problem with null probabilities in the denominators, because then every probability involved would be null, and the proven result would still hold.
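Both inequalities can be verified numerically for a concrete chain. The sketch below (assuming NumPy; the binary symmetric channels and their flip probabilities are illustrative choices) models X → Y → Z as a uniform bit passed through two noisy channels and checks (2.9) together with the pipelining inequality:

```python
import numpy as np

def H(p):
    """Shannon entropy in bits of a distribution (any shape)."""
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 1e-12]
    return float(-np.sum(p * np.log2(p)))

def mutual(pab):
    """H(A : B) from a joint distribution matrix p(a, b)."""
    return H(pab.sum(axis=1)) + H(pab.sum(axis=0)) - H(pab)

def bsc(eps):
    """Binary symmetric channel: row a gives p(b | a)."""
    return np.array([[1 - eps, eps], [eps, 1 - eps]])

px = np.array([0.5, 0.5])   # X: a uniform bit
Tyx = bsc(0.10)             # first noisy step,  X -> Y
Tzy = bsc(0.15)             # second noisy step, Y -> Z

# Joint distributions implied by the Markov chain X -> Y -> Z.
pxy = px[:, None] * Tyx                # p(x, y) = p(x) p(y|x)
pxz = px[:, None] * (Tyx @ Tzy)        # p(x, z): channels compose
pyz = pxy.sum(axis=0)[:, None] * Tzy   # p(y, z) = p(y) p(z|y)

assert H(px) >= mutual(pxy) >= mutual(pxz)   # data processing (2.9)
assert mutual(pyz) >= mutual(pxz)            # data pipelining
```

Each extra noisy step strictly shrinks the mutual information here (about 0.53 bits after one channel, 0.24 after two), matching the "once lost, never restored" reading of (2.9).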


2.2.2 Quantum data processing

The quantum analogue of data processing (2.8) is described by a chain of quantum operations

ρ → E_1(ρ) → (E_2 ∘ E_1)(ρ) → · · · → (E_n ∘ · · · ∘ E_2 ∘ E_1)(ρ).

In this model each step of the process is obtained by the application of a quantum operator. Defining the coherent information

I(ρ, E) ≡ S(E(ρ)) − S(ρ, E), (2.10)

where S(ρ, E) is the entropy exchange, it can be proven that

S(ρ) ≥ I(ρ, E_1) ≥ I(ρ, E_2 ∘ E_1), (2.11)

which corresponds to the classical data processing inequality (2.9).
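Inequality (2.11) can be checked numerically for a concrete channel. The sketch below (assuming NumPy; the depolarizing channel, its parameter, and the maximally mixed input are illustrative choices) computes S(ρ, E) as the entropy of the joint output on a reference-plus-system purification, and verifies that the coherent information shrinks under composition:

```python
import numpy as np

def S(rho):
    """Von Neumann entropy in bits."""
    ev = np.linalg.eigvalsh(rho)
    ev = ev[ev > 1e-12]
    return float(-np.sum(ev * np.log2(ev)))

def kraus_depol(p):
    """Kraus operators of the qubit depolarizing channel."""
    I = np.eye(2); X = np.array([[0, 1], [1, 0]])
    Y = np.array([[0, -1j], [1j, 0]]); Z = np.diag([1.0, -1.0])
    return [np.sqrt(1 - 3*p/4) * I] + [np.sqrt(p/4) * K for K in (X, Y, Z)]

def apply_to_Q(kraus, rho_rq):
    """Apply a channel to the Q half of a two-qubit state on R ⊗ Q."""
    return sum(np.kron(np.eye(2), K) @ rho_rq @ np.kron(np.eye(2), K).conj().T
               for K in kraus)

def coherent_info(rho_rq_out):
    """I(ρ, E) = S(E(ρ)) − S(ρ, E): Q-output entropy minus joint RQ entropy."""
    rho_q = rho_rq_out.reshape(2, 2, 2, 2).trace(axis1=0, axis2=2)
    return S(rho_q) - S(rho_rq_out)

# Purify ρ = I/2 with a Bell state |Φ+> on R ⊗ Q.
bell = np.array([1, 0, 0, 1]) / np.sqrt(2)
rho_rq = np.outer(bell, bell.conj())
rho = 0.5 * np.eye(2)

E1 = kraus_depol(0.2)
out1 = apply_to_Q(E1, rho_rq)   # after E1
out2 = apply_to_Q(E1, out1)     # after E2 ∘ E1 (same channel applied twice)

I1, I2 = coherent_info(out1), coherent_info(out2)
print(S(rho), I1, I2)           # 1.0, ≈0.15, ≈−0.27
assert S(rho) >= I1 >= I2       # quantum data processing (2.11)
```

Note that, unlike classical mutual information, coherent information can become negative, as it does here after the second application of the channel.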


Chapter 3

Real-life applications of information theory

After a theoretical presentation of information theory one should look at its real-life applications. Two extremely useful applications are data compression, discussed in section 3.1, and transmission of information over noisy channels, the main topic of section 3.2.

3.1 Data compression

Nowadays data compression is a widely applied procedure; everybody uses .zip archives, listens to .mp3 music, watches videos in .mpeg format and exchanges photos in .jpg files. Although in all these cases special techniques depending on the type of data are used, the general philosophy underlying data compression is inherited from Shannon's noiseless channel coding theorem [7, 8], discussed in subsection 3.1.1. In the quantum case data compression was theoretically shown to be possible in 1995 by Schumacher [27] and is presented in subsection 3.1.2.

3.1.1 Shannon's noiseless channel coding theorem

Shannon's main idea was to estimate the physical resources needed to represent an information content, which as already seen in section 1.1 are related to entropy. As a simple example of how the theorem works, consider a binary alphabet to be compressed. In this alphabet assume that 0 occurs with probability 1 − p and 1 with probability p. Then if n-bit strings are formed, they will contain approximately (1 − p)n zero bits and pn one bits, with very high probability related to the magnitude of n, according to the law of large numbers. Since among all the possible strings these are the most likely, they are usually named typical sequences. Counting all the arrangements of ones and zeros, there are in total (n over np) such strings. Using Stirling's approximation n! ≈ (n/e)^n, one finds that

(n over np) = n! / [(np)! (n(1−p))!] ≈ (n/e)^n / [(np/e)^{np} (n(1−p)/e)^{n(1−p)}] (3.1)
= n^n / [(np)^{np} (n(1−p))^{n(1−p)}]
= 2^{n log n − np log(np) − n(1−p) log(n(1−p))}
= 2^{n[−p log p − (1−p) log(1−p)]} = 2^{nH(p)}.
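The accuracy of this estimate is easy to test: for large n, (1/n) log₂ (n over np) approaches the binary entropy H(p). A quick check, using Python's exact big-integer binomial coefficient (the values of n and p are illustrative):

```python
import math

def H(p):
    """Binary entropy in bits."""
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

n, p = 10_000, 0.1
k = round(n * p)
# log2 of the exact number of strings with k ones among n bits.
log_typical = math.log2(math.comb(n, k))
print(log_typical / n, H(p))   # per-bit rates nearly coincide
assert abs(log_typical / n - H(p)) < 0.01
```

The per-bit gap shrinks like (log n)/n, which is exactly the polynomial correction Stirling's approximation discards.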

Generalizing the above argument to an alphabet of k letters x_i ∈ X with occurrence probabilities p(x_i), the number of typical sequences can be calculated using combinatorics as before, and is found to be

(n over np(x_1), np(x_2), . . . , np(x_{k−1})) = n! / ∏_{x∈X} (np(x))! ≈ 2^{nH(X)}, (3.2)

where the letters need not be single symbols but may also be sequences of symbols, like words of the English language. Obviously the probability of any such typical sequence occurring is approximately 2^{−nH(X)}. This approximate probability


gives a very intuitive way of understanding the mathematical terminology of Shannon's noiseless theorem, where a sequence y = x_{i1} x_{i2} x_{i3} · · · x_{in} is defined to be ε-typical if the probability of its occurrence satisfies

2^{−n(H(X)+ε)} ≤ p(y) ≤ 2^{−n(H(X)−ε)}.

The set of all such sequences is denoted T(n, ε). Another very useful way of writing this result is

| (1/n) log [1 / p(x_{i1} x_{i2} x_{i3} · · · x_{in})] − H(X) | ≤ ε. (3.3)

Now the theorem of typical sequences can be stated

Theorem of typical sequences:

1. Fix ε > 0. Then for any δ > 0, for sufficiently large n, the probability that a sequence is ε-typical is at least 1 − δ.

2. For any fixed ε > 0 and δ > 0, for sufficiently large n, (1 − δ) 2^{n(H(X)−ε)} ≤ |T(n, ε)| ≤ 2^{n(H(X)+ε)}.

3. Let S(n) be a collection of at most 2^{nR} sequences of length n, where R < H(X) is fixed. Then for any δ > 0 and for sufficiently large n, Σ_{y∈S(n)} p(y) ≤ δ.

The above theorem is proven by using the law of large numbers. Moreover, Shannon's noiseless channel coding theorem is just an application of the theorem of typical sequences. Shannon considered a compression scheme which is simply a map of an n-bit sequence y = x_{i1} x_{i2} x_{i3} · · · x_{in} to another one of nR bits, denoted by C_n(y). Of course in such a compression scheme an inverse map D_n (with D_n ∘ C_n = id_{X^n}) should exist, which is naturally named a decompression scheme. However the set of typical sequences is, in non-trivial cases, only a subset of all the possible sequences, and this leads the schemes to fail when they are asked to map the complementary subset, known as the set of atypical sequences. Further nomenclature may thus be added by saying that a compression-decompression scheme (C_n, D_n) is said to be reliable if the probability that D_n(C_n(y)) = y approaches one as n approaches infinity. It is time to state the theorem.

Shannon's noiseless channel coding theorem:

Assume an alphabet X; then if R > H(X) is chosen, there exists a reliable compression-decompression scheme of rate R. Conversely, if R < H(X), any such scheme will not be reliable.

This theorem reveals a remarkable operational interpretation of the entropy rate H(X): it is just the minimal physical resource necessary and sufficient to reliably transmit data. Finally it should be stressed that one can obtain a perfectly reliable compression-decompression scheme simply by extending the maps to atypical sequences; in that case there is merely a high probability that no more than nR resources are needed to carry the information.
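The high-probability statement behind the theorem can be illustrated by simulation: almost every sampled sequence is ε-typical in the sense of (3.3), so a scheme that assigns codewords only to the roughly 2^{n(H(X)+ε)} typical sequences succeeds with probability approaching one. A sketch with illustrative parameters (p, n, ε, and the number of trials are arbitrary choices):

```python
import math
import random

def H(p):
    """Binary entropy in bits."""
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

p, n, eps, trials = 0.1, 2000, 0.05, 1000
random.seed(0)   # reproducible sampling

typical = 0
for _ in range(trials):
    # Sample an n-bit Bernoulli(p) string; only the number of ones matters.
    ones = sum(random.random() < p for _ in range(n))
    # -(1/n) log2 p(y) for a string with `ones` ones, as in (3.3).
    rate = -(ones * math.log2(p) + (n - ones) * math.log2(1 - p)) / n
    typical += abs(rate - H(p)) <= eps

print(typical / trials)   # fraction of eps-typical samples, close to 1
assert typical / trials > 0.9
```

Increasing n drives the typical fraction toward 1 for any fixed ε, which is exactly part 1 of the theorem of typical sequences and the reason a rate just above H(X) suffices.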

3.1.2 Schumacher's quantum noiseless channel coding theorem