Quantum Information Theory
and
Applications to Quantum Cryptography




By

Nikolaos P. Papadakos
B.Sc. in Mathematics, University of Athens, 2001




Supervisor
Professor Abbas Edalat




Individual Study Option,
on the First Term of the Academic year 2001-2002,
for the M.Sc. in Advanced Computing




Department of Computing
Imperial College of Science, Technology and Medicine
University of London, United Kingdom
Contents

List of Figures iii

Acknowledgements iv

About this Individual Study Option v

1 Physics and information theory 1
1.1 Classical physics and information theory . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.1.1 Shannon entropy and relevant measures of classical information theory . . . . . . . . . . . 2
1.1.2 Basic properties of Shannon entropy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.2 Quantum physics and information theory . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
1.2.1 Basic features of quantum mechanics relevant to information theory . . . . . . . . . . . . 5
1.2.2 The language of quantum information theory: quantum thermodynamics . . . . . . . . . 7
1.2.3 Von Neumann entropy and relevant measures of quantum information theory . . . . . . . 9
1.2.4 A great discrepancy between classical and quantum information theory: quantum entanglement . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
1.2.5 Basic properties of von Neumann entropy . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
1.2.6 Measurement in quantum physics and information theory . . . . . . . . . . . . . . . . . . 14

2 Some important results of information theory 16
2.1 Accessible information . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
2.1.1 Accessible classical information: Fano's inequality . . . . . . . . . . . . . . . . . . . . . . 16
2.1.2 Accessible quantum information: quantum Fano inequality and the Holevo bound . . . . 17
2.2 Data processing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
2.2.1 Classical data processing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
2.2.2 Quantum data processing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19

3 Real-life applications of information theory 20
3.1 Data compression . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
3.1.1 Shannon's noiseless channel coding theorem . . . . . . . . . . . . . . . . . . . . . . . . . . 20
3.1.2 Schumacher's quantum noiseless channel coding theorem . . . . . . . . . . . . . . . . . . . 21
3.2 Information over noisy channels . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
3.2.1 Error correction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
3.2.2 Classical information over noisy classical channels . . . . . . . . . . . . . . . . . . . . . . 24
3.2.3 Classical information over noisy quantum channels . . . . . . . . . . . . . . . . . . . . . . 25
3.2.4 Quantum information over noisy quantum channels . . . . . . . . . . . . . . . . . . . . . . 26

4 Practical quantum cryptography 27
4.1 Theoretical principles of quantum cryptography . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
4.1.1 The BB84 quantum key distribution protocol . . . . . . . . . . . . . . . . . . . . . . . . . 28
4.1.2 Information reconciliation and privacy amplification . . . . . . . . . . . . . . . . . . . . . 29
4.1.3 Privacy and security . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
4.1.4 A provable secure version of the BB84 protocol . . . . . . . . . . . . . . . . . . . . . . . . 30
4.2 A commercial quantum cryptographic device . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
4.2.1 Experimental quantum key distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
4.2.2 Commercial quantum cryptography . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31


Summary 33

Appendices 35

A Omitted proofs 35
A.1 Special diagonal transformation of normal matrices . . . . . . . . . . . . . . . . . . . . . . . . . . 35
A.2 Projective measurements are equivalent to an addition of unitary operations . . . . . . . . . . . . 36

B Distance measures of information 37
B.1 Classical information distance: Hamming . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
B.2 Quantum information distance: Fidelity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37

Bibliography 38




List of Figures

1.1 Relationships between different entropies. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.2 A summary of basic information-relevant features of quantum mechanics. . . . . . . . . . . . . 7
1.3 Transition from classical to quantum thermodynamics. . . . . . . . . . . . . . . . . . . . . . . . . 9

3.1 Quantum data compression. The compression operation Cn compresses a quantum source ρ stored
in n log d qubits into nS (ρ) qubits. The source is accurately recovered via the decompression
operation Dn . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
3.2 The noisy channel coding problem for classical messages. It is required that every one of the 2^{nR}
possible messages should be sent uncorrupted through the channel with high probability. . . . . . 25

4.1 Alice and Bob communicating with the fear of Eve stealing information. . . . . . . . . . . . . . . 27
4.2 The setup of BB84 quantum key distribution protocol. The quantum line is used for the distri-
bution of the key, while the classical line for discussing security topics and the transmission of
encrypted messages. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
4.3 Typical system for quantum cryptography using polarization coding (LD: laser diode, BS: beam-
splitter, F: neutral density filter, PBS: polarizing beam splitter, λ/2: half waveplate, APD:
avalanche photodiode). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
S.1 Summary of classical and quantum information theory. . . . . . . . . . . . . . . . . . . . . . . . . 33
S.2 Summary of quantum cryptography. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34




Acknowledgements

I am very pleased to thank my supervisor, Professor Abbas Edalat, whose insightful comments not only
helped me improve the presentation of this Individual Study Option, but who also, by suggesting that I solve
all 64 exercises of the last two chapters of [1, chap. 11, 12], contributed dramatically to my understanding and
strengthened my confidence in the field. I am indebted to him for his effort in transforming the present
material into something more than just an Individual Study Option.




About this Individual Study Option

In this Individual Study Option the concepts of classical and quantum information theory are presented, with
some real-life applications, such as data compression, information transmission over noisy channels and quantum
cryptography.

Abstract
In each chapter the following topics are covered:

1. The relation of information theory to physics is studied and, using the notion of Shannon entropy,
measures of information are introduced. Then their basic properties are proven. Moreover, important
information-relevant features of quantum physics, such as the impossibility of distinguishing or cloning
arbitrary states, are studied. In order to obtain a quantum measure of information, an introduction to quantum
thermodynamics is given, with a special focus on explaining the utility of the density matrix. After this,
von Neumann entropy and its related measures are defined. In this context a major discrepancy between
classical and quantum information theory is presented: quantum entanglement. The basic properties of
von Neumann entropy are proven and an information-theoretic interpretation of quantum measurement
is given.
2. The amount of accessible information is bounded by Fano's inequality in the classical case, and by its
quantum analogue and Holevo's bound in the quantum case. In addition, classical and quantum
data processing are discussed.
3. Furthermore, for real-life applications, data compression is studied via Shannon's classical and Schumacher's
quantum noiseless channel coding theorems. Another application is the transmission of classical information
over noisy channels. For this, a summary of classical and quantum error correction is given, and then
Shannon's classical noisy channel and the Holevo-Schumacher-Westmoreland quantum noisy channel coding
theorems are studied. The present state of transmission of quantum information over quantum channels
is summarized.
4. A practical application of the aforementioned is presented: quantum cryptography. In this context
BB84, a quantum key distribution protocol, is demonstrated and its security is discussed. The current
experimental status of quantum key distribution is summarized and the possibility of a commercial device
realizing quantum cryptography is presented.

Special viewpoints emphasized
The above material is in close connection with the last two chapters of [1, chap. 11, 12], and error correction is a
summary of the results given in chapter 10 of [1]. From this book Figures 1.1, 3.1, 3.2 and S.1 were extracted.
However, the treatment of information theory was much influenced by [2], and some results given therein, such as
equation (3.1), were extended to (3.2). Of equal impact was [3] concerning error correction, and [3, 4] concerning
quantum cryptography. Note that Figure 4.2 was extracted from [4]. Of course, some notions which were vital
to this Individual Study Option were taken from almost all the chapters of [1].
However, there were some topics not well explained or sometimes not sufficiently emphasized. The most
important tool of quantum information theory, the density matrix, is misleadingly introduced in [1, p.98]: "the
density matrix [...] is mathematically equivalent to the state vector approach, but it provides a much more
convenient language for thinking about commonly encountered scenarios in quantum mechanics". The density
matrix is not equivalent to the state vector approach and is much more than just a convenient language. The
former is a tool for quantum thermodynamics, a field where it is impossible to use the latter; rephrasing it,
quantum thermodynamics is the only possible language there. In order to deepen the understanding of this meaningful
tool, subsection 1.2.2 is devoted to it, and it is close to the viewpoint of [5, p.295-307], which is a classical textbook
of quantum mechanics.
In this Individual Study Option some topics are presented from a different perspective. There is an attempt to
emphasize information-theoretic discrepancies between classical and quantum information theory, the greatest of
which is perhaps quantum entanglement. A very special demonstration of this difference is given in subsection
1.2.4. It should be noted that a particular information-theoretic meaning of measurement in physics is
presented in subsection 1.2.6.
Concerning the different views presented in this text, nothing could be more beneficial than the 64 exercises
of the last two chapters of [1, chap. 11, 12]. As an example, a delicate presentation of measures of information is
given on page 1, due to exercise 11.2 [1, p.501], and subsection 1.2.4 on quantum entanglement was inspired by
exercise 11.14 [1, p.514]. On some occasions special properties were proven in order to solve the exercises. Such
a property is the preservation of quantum entropy by unitary operations (property 2 of von Neumann entropy,
on page 11), which was needed to solve, for example, exercises 11.19 and 11.20 [1, p.517,518]. For these last two
exercises some special lemmas are proven in appendices A.1 and A.2. It is important to note that of the above
mentioned exercises of [1, chap. 11, 12] only half are presented here.

Notation
Finally, concerning the mathematical notation involved, the symbol ≜ should be interpreted as "defined by", and
the symbol ≡ as "to be identified with".




Chapter 1

Physics and information theory

Since its invention, computer science [6] has been considered a branch of mathematics, in contrast to information
theory [7, 8], which was viewed through its physical realization; quoting Rolf Landauer [9], "information is physical".
The last decades changed the landscape, and both computers and information are now mostly approached through their
physical implementations [10, 11, 12, 13]. This view is not only more natural, but in the case of quantum
laws it gives very exciting results and sometimes an intriguing view of what information is or can be. Such an
understanding could never be inspired by mathematics alone. Moreover, there is a possibility that the inverse
relation between physics and information exists [14, 15], or, quoting Steane [3], one could find new methods of
studying physics by "the ways that nature allows, and prevents, information to be expressed and manipulated,
rather than [the ways it allows] particles to move". Such a program is still in its infancy; however, one relevant
application is presented in subsection 1.2.6. In this chapter the well-established information theory based on
classical physics is presented in section 1.1, and the corresponding up-to-date results for quantum physics
are analyzed in section 1.2.


1.1 Classical physics and information theory
In classical physics all entities have definite properties which can be known up to any desired accuracy. This fact
gives a simple pattern for storing and transmitting information, by assigning information content to each of the properties
a physical object can have. For example, storage can be realized by writing on paper, where information lies
in each letter, or on a magnetic disk, where information, in binary digits (bits), is represented by each of the
two spin states a magnetic dipole can have. Concerning transmission, speech is one example, where each
sound corresponds to a piece of information; a second example is an electronic signal on a wire, where each state
of the electric current is related to some piece of information. Unfortunately, in everyday life such simple patterns
are uneconomical and unreliable. This is because communication is realized by physical entities which are
imperfect, and hence can be influenced by environmental noise, resulting in information distortion.
Concerning the economical transmission of information, one can see that the naive pattern of information
assignment presented in the last paragraph is not always an optimal choice. This is because a message, in
the English language for example, contains symbols with different occurrence frequencies. Looking at this text, for
example, one immediately notes that the occurrence probability of the letter a, p_a, is much greater than that
of the exclamation mark, p_!. According to the naive assignment, English language symbols are encoded into codewords of
identical length l; the average space needed to store a is l p_a and that of the exclamation mark l p_!, and since p_a > p_!,
a lot of space is wasted on the letter a. In order to see how an encoding scheme can be economical, consider
a four-letter alphabet A, B, C, D, with occurrence probabilities p_A = 3/4, p_B = 1/8, p_C = p_D = 1/16, and the subsequent
assignment of bits: A → 0, B → 10, C → 110, and D → 111. A message of n symbols, using this encoding, needs
on average n (p_A + 2p_B + 3p_C + 3p_D) = n (3/4 + 2·1/8 + 3·1/16 + 3·1/16) = (11/8) n bits instead of the 2n which would be needed
if one simply mapped each letter to a two-bit codeword.
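
As a quick sanity check of this arithmetic, the following short Python sketch (illustrative only; the variable names are not part of the original text) computes the average codeword length of this prefix code and the Shannon entropy of the source distribution, which by equation (1.2) below is the ultimate compression limit:

    import math

    probs = {"A": 3/4, "B": 1/8, "C": 1/16, "D": 1/16}
    code  = {"A": "0", "B": "10", "C": "110", "D": "111"}   # prefix-free code used above

    # average number of bits per symbol under this encoding
    avg_len = sum(p * len(code[s]) for s, p in probs.items())

    # Shannon entropy of the source (see eq. (1.2)), the optimal average length
    entropy = -sum(p * math.log2(p) for p in probs.values())

    print(avg_len)   # 1.375 = 11/8 bits, instead of 2 bits for a fixed-length code
    print(entropy)   # approximately 1.186 bits: a still better code is in principle possible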
The topics discussed in the last two paragraphs give rise to the most important information-theoretic
question: what are the minimal resources needed to communicate reliably? An answer to this question can be
given by abstractly quantifying information in relation to the physical resources needed to carry it. Motivated
by the four-letter alphabet example demonstrated above, probabilities are going to be used for such an
abstraction. One now defines a function H, quantifying a piece of information I, exhibiting the following
reasonable properties:

1. H(I) is a function only of the probability of occurrence p of the information I, thus H(I) ≡ H(p).
2. H is a smooth function.
3. The resources needed for two independent pieces of information with individual probabilities p, q > 0 are the sum
of the resources needed for each alone, or in mathematical language H(pq) = H(p) + H(q).

The second and third properties imply that H is a logarithmic function, and by setting q = 1 in the third it
is immediate that H(1) = 0. Hence H(p) = k log_a p, where k and a are constants to be determined (refer
to the comments after equations (1.2) and (1.5)). This means that the average of the resources needed when one of a
mutually exclusive set of pieces of information with probabilities p_1, ..., p_n occurs is

    H(p_1, ..., p_n) = k Σ_i p_i log_a p_i.    (1.1)

It should be noted that probability is not the only way of quantifying information [14, 15, 16, 17, 18, 19].

1.1.1 Shannon entropy and relevant measures of classical information theory
The function H found in (1.1) is known in physics as entropy, and measures the disorder of a specific statistical
system. One interesting physical system is an n-bit computer memory, and if all the possible cases of
data entry are described by a random variable X with probabilities p_1, ..., p_{2^n}, then the computer's memory
has an entropy given by

    H(X) ≡ H(p_1, ..., p_{2^n}) ≜ −Σ_x p_x log p_x.    (1.2)

Here a modified version of (1.1) is used, with log ≡ log_2 and k ≜ −1, an assignment to be justified after equation
(1.5). Equation (1.2) is known in information theory as the Shannon entropy [7]. There are two complementary
ways of understanding Shannon entropy: it can be considered either as a measure of the uncertainty before learning the
value of a piece of physical information, or as the information gained after learning it.
The Shannon entropy gives rise to other important measures of information. One such is the relative entropy,
which is defined by

    H(p_x ‖ q_x) ≜ Σ_x p_x log (p_x / q_x) = −H(X) − Σ_x p_x log q_x,    (1.3)

and is a measure of the distance between two probability distributions. This is because it can be proven that

    H(p_x ‖ q_x) ≥ 0,    (1.4)
    H(p_x ‖ q_x) = 0 ⇔ ∀x : p_x = q_x.

Of course it is not a metric because, as one can check, H(p_x ‖ q_x) = H(q_x ‖ p_x) does not always hold. The relative
entropy is often useful not in itself, but because it helps in deriving important results. One such result is obtained using
the last equality in (1.3) together with (1.4): for a memory of n bits,

    H(X) ≤ log 2^n = n,    (1.5)
    H(X) = log 2^n = n ⇔ ∀i : p_i = 1/2^n,

which justifies the selection of k = −1 in (1.2), because in the optimal case and in the absence of noise, the maximum
physical resources needed to transmit or to store an n-bit word should not exceed n. One can also see from
(1.2) that

    H(X) ≥ 0,    (1.6)
    H(X) = 0 ⇔ the system X is in a definite state (p = 1).

Other important results are deduced from the relative entropy, and concern useful entropic quantities such as the
joint entropy, the entropy of X conditional on knowing Y, and the common or mutual information of X and
Y. These entropies are correspondingly defined by the following intuitive relations

    H(X, Y) ≜ −Σ_{xy} p_{xy} log p_{xy},    (1.7)
    H(X|Y) ≜ H(X, Y) − H(Y),    (1.8)
    H(X : Y) ≜ H(X) + H(Y) − H(X, Y) = H(X) − H(X|Y),    (1.9)

and can be represented in the 'entropy Venn diagram' shown in Figure 1.1.

"¢
 #"¢




¨¤
 ! ¤¢
 ©¨¦¤£¢ 
§¥ ¡




Figure 1.1: Relationships between different entropies.
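
To make the definitions (1.7)-(1.9) concrete, the following Python sketch (illustrative only; the toy joint distribution is an arbitrary choice, not from the original text) evaluates the joint, conditional and mutual entropies of two correlated binary variables:

    import numpy as np

    def H(p):
        """Shannon entropy (base 2) of a probability array, ignoring zero entries."""
        p = np.asarray(p, dtype=float).ravel()
        p = p[p > 0]
        return float(-np.sum(p * np.log2(p)))

    # a toy joint distribution p(x, y) on two binary variables
    p_xy = np.array([[0.4, 0.1],
                     [0.1, 0.4]])

    H_XY = H(p_xy)                     # joint entropy, eq. (1.7)
    H_X  = H(p_xy.sum(axis=1))         # marginal entropies
    H_Y  = H(p_xy.sum(axis=0))
    H_X_given_Y = H_XY - H_Y           # conditional entropy, eq. (1.8)
    I_XY = H_X + H_Y - H_XY            # mutual information, eq. (1.9)

    print(H_XY, H_X_given_Y, I_XY)     # approximately 1.72, 0.72, 0.28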


1.1.2 Basic properties of Shannon entropy
It is worth mentioning here the basic properties of Shannon entropy:

1. H(X, Y) = H(Y, X), H(X : Y) = H(Y : X).
2. H(Y|X) ≥ 0 and thus, by the second equality of (1.9), H(X : Y) ≤ H(Y), with equality if and only if Y
is a function of X, Y = f(X).
3. H(X) ≤ H(X, Y), with equality if and only if Y is a function of X.
4. Subadditivity: H(X, Y) ≤ H(X) + H(Y), with equality if and only if X and Y are independent
random variables.
5. H(Y|X) ≤ H(Y) and thus, by the second equality of (1.9), H(X : Y) ≥ 0, with equality in each if and
only if X and Y are independent variables.
6. Strong subadditivity: H(X, Y, Z) + H(Y) ≤ H(X, Y) + H(Y, Z), with equality if and only if Z →
Y → X forms a Markov chain.
7. Conditioning reduces entropy: H(X|Y, Z) ≤ H(X|Y).
8. Chaining rule for conditional entropies: Let X_1, ..., X_n and Y be any set of random variables; then
H(X_1, ..., X_n|Y) = Σ_{i=1}^{n} H(X_i|Y, X_1, ..., X_{i−1}).
9. Concavity of the entropy: Suppose p_i ≥ 0 are probabilities and q^(i) are probability distributions; then
H(Σ_i p_i q^(i)) ≥ Σ_i p_i H(q^(i)), with equality if and only if the q^(i) are identical.

The various relationships between entropies may mostly be derived from the 'entropy Venn diagram' shown
in Figure 1.1. Such figures are not completely reliable as a guide to the properties of entropy, but they provide a
useful mnemonic for remembering the various definitions and properties of entropy. The proofs of the above
mentioned properties follow.
Proof


1. Obvious from the definition of joint entropy (1.7) and mutual entropy (1.9).
2. Based on the definition of conditional entropy (1.8),

    H(Y|X) ≜ H(X, Y) − H(X) = −Σ_{xy} p(x, y) log p(x, y) + Σ_x p(x) log p(x)
           = −Σ_{xy} p(x, y) log p(x, y) + Σ_{xy} p(x, y) log p(x)
           = −Σ_{xy} p(x, y) log [p(x, y) / p(x)],

and since p(x, y)/p(x) = p(y|x) ≤ 1, every term of the last sum is non-negative, hence H(Y|X) ≥ 0.
Equality holds if and only if p(y|x) is either 0 or 1 for every x, y, that is Y = f(X).

3. It is immediately proven from property 2 and the definition of conditional entropy (1.8).
4. Following the subsequent steps of calculation,

    H(p(x, y) ‖ p(x) p(y)) ≜ −H(p(x, y)) − Σ_{xy} p(x, y) log [p(x) p(y)]
        = −H(p(x, y)) − Σ_{xy} p(x, y) log p(x) − Σ_{xy} p(x, y) log p(y)
        = −H(p(x, y)) + H(p(x)) + H(p(y)),

where the second equality of (1.3) was used. The result follows directly from equation (1.4).
5. It is easily derived from subadditivity (property 4) and the definition of conditional entropy (1.8).
6. First note that, by the definition of the joint entropy (1.7) and some algebra,
H(X, Y, Z) + H(Y) − H(X, Y) − H(Y, Z) = Σ_{x,y,z} p(x, y, z) log [p(x, y) p(y, z) / (p(x, y, z) p(y))].
Then, using the fact that log x ≤ (x − 1)/ln 2 for all positive x, with equality achieved if and only if x = 1, the following can be concluded:

    Σ_{xyz} p(x, y, z) log [p(x, y) p(y, z) / (p(x, y, z) p(y))]
        ≤ (1/ln 2) Σ_{xyz} p(x, y, z) [p(x, y) p(y, z) / (p(x, y, z) p(y)) − 1]
        = (1/ln 2) [Σ_y p(y) p(y) / p(y) − 1] = (1/ln 2) [Σ_y p(y) − 1] = 0,

with equality if and only if p(x, y) p(y, z) / (p(x, y, z) p(y)) = 1 ⇔ p(z|y) = p(z|x, y) ⇔ X → Y → Z is a Markov chain, q.e.d.

7. From strong subadditivity (property 6) it is straightforward that H(X, Y, Z) − H(Y, Z) ≤ H(X, Y) − H(Y),
and from the definition of conditional entropy (1.8) the result is obvious.
8. First the result is proven for n = 2, using the definition of conditional entropy (1.8):

    H(X_1, X_2|Y) = H(X_1, X_2, Y) − H(Y) = H(X_1, X_2, Y) − H(X_1, Y) + H(X_1, Y) − H(Y)
                  = H(X_2|Y, X_1) + H(X_1|Y).

Now induction is used to prove it for every n. Assume that the result holds for n; then, using the case
n = 2, H(X_1, ..., X_{n+1}|Y) = H(X_2, ..., X_{n+1}|Y, X_1) + H(X_1|Y), and applying the
inductive hypothesis to the first term on the right hand side gives

    H(X_1, ..., X_{n+1}|Y) = Σ_{i=2}^{n+1} H(X_i|Y, X_1, ..., X_{i−1}) + H(X_1|Y) = Σ_{i=1}^{n+1} H(X_i|Y, X_1, ..., X_{i−1}),

q.e.d.




9. The concavity of Shannon entropy will be deduced from the concavity of von Neumann entropy in (1.38).
However, here it is demonstrated that the binary entropy (H_bin(p) ≜ H(p, 1 − p)) is strictly concave,

    H_bin(p x_1 + (1 − p) x_2) ≥ p H_bin(x_1) + (1 − p) H_bin(x_2),

where 0 ≤ p, x_1, x_2 ≤ 1 and equality holds only in the trivial cases x_1 = x_2, p = 0, or p = 1. This is easily
proved by using the fact that the logarithmic function is increasing and −p(1 − x_1) ≥ −(1 − p x_1), hence

    H_bin(p x_1 + (1 − p) x_2) ≜ −(p x_1 + (1 − p) x_2) log (p x_1 + (1 − p) x_2)
        − [1 − (p x_1 + (1 − p) x_2)] log [1 − (p x_1 + (1 − p) x_2)]
        ≥ −p x_1 log x_1 − (1 − p) x_2 log x_2 − p (1 − x_1) log (1 − p x_1) − (1 − p)(1 − x_2) log [1 − (1 − p) x_2]
        = p H_bin(x_1) + (1 − p) H_bin(x_2).

The strictness of the concavity is seen by noting that only in the trivial cases can inequalities such as
log p x_1 ≤ log (p x_1 + (1 − p) x_2) become equalities. Finally, concerning the binary entropy, since
(d/dp) H_bin(p) = −log p + log (1 − p) = 0 ⇔ p = 1/2, and for p ≠ 0, 1, (d²/dp²) H_bin(p) = −1/p − 1/(1 − p) < 0,
the maximum is reached at p = 1/2.

Some additional notes on the properties of Shannon entropy
Concluding the properties of Shannon entropy, it should be noted that the mutual information is neither always
subadditive nor always superadditive. A counterexample for subadditivity is the case where X and Y are independent
identically distributed random variables taking the values 0 or 1 with probability one half. Let Z = X ⊕ Y, where
⊕ denotes addition modulo 2; then H(X, Y : Z) = 1, while H(X : Z) + H(Y : Z) = 0, that is,

    H(X, Y : Z) > H(X : Z) + H(Y : Z).

The counterexample for superadditivity is the case of a random variable X_1 taking the values 0 or 1
with probability one half, and X_2 ≜ Y_1 ≜ Y_2 ≜ X_1. Then H(X_1 : Y_1) + H(X_2 : Y_2) = 2, while
H(X_1, X_2 : Y_1, Y_2) = 1, which means that

    H(X_1 : Y_1) + H(X_2 : Y_2) > H(X_1, X_2 : Y_1, Y_2).

1.2 Quantum physics and information theory
Quantum theory is another very important area of physics, used to describe the elementary particles that
make up our world. The laws and the intuition of quantum theory are totally different from the classical case.
To be more specific, quantum theory is considered counter-intuitive, or, quoting Richard Feynman, "nobody
really understands quantum mechanics". However, quantum physics offers new phenomena and properties which
can change people's view of information. These properties are investigated in this section.

1.2.1 Basic features of quantum mechanics relevant to information theory
Mathematically, quantum entities are represented by Hilbert space vectors |ψ⟩, usually normalized: ⟨ψ|ψ⟩ = 1.
Quantum systems evolve unitarily; that is, if a system is initially in a state |ψ_1⟩, it later becomes another state
|ψ_2⟩ after a unitary operation

    U|ψ_1⟩ = |ψ_2⟩.    (1.10)

Unitary operations are reversible, since U U† = I, and previous states can be reconstructed by |ψ_1⟩ = U†|ψ_2⟩.
What is important about such operations is that, because ⟨ψ_2|ψ_2⟩ = ⟨ψ_1|U†U|ψ_1⟩ = ⟨ψ_1|I|ψ_1⟩ = 1, normalization,
which will soon be interpreted as probability, is conserved. The measurement of the properties of these
objects is described by a collection of operators {M_m}. The quantum object will be found in the m-th state
with probability

    p(m) = ⟨ψ|M_m† M_m|ψ⟩,    (1.11)

and after this measurement it is going to be in a definite state, possibly different from the starting one,

    M_m|ψ⟩ / √(⟨ψ|M_m† M_m|ψ⟩).    (1.12)

Of course m should be viewed as the measurement outcome, hence information extracted from a physical system.
This way information content can be assigned to each state of a quantum object. As a practical example, to
the energy states of an atom one can map, say, four numbers, or to the two polarization states of a photon one can
map two numbers, say 0 and 1. The last case is similar to bits of classical information theory, but because
a photon is a quantum entity they are named qubits (quantum bits). It should be stressed here that the
measurement operators should satisfy the completeness relation Σ_m M_m† M_m = I, which results in Σ_m p(m) =
Σ_m ⟨ψ|M_m† M_m|ψ⟩ = 1, as instructed by probability theory. However, this implies that in general ⟨ψ|M_m† M_m|ψ⟩ ≠ 1 and,
looking at equation (1.12), one understands that measurement is an irreversible operation.
What is very interesting about quantum entities is that they can either be in a definite state or in a
superposition of states! Mathematically this is written

    |ψ⟩ = Σ_s c_s |s⟩.

Using the language of quantum theory, |ψ⟩ is in state |s⟩ with probability c_s* c_s, and because of normalization
the total probability of measuring |ψ⟩ is

    ⟨ψ|ψ⟩ = Σ_s c_s* c_s = 1.    (1.13)

The states |s⟩ are usually orthonormal to each other, hence

    ⟨s|ψ⟩ = c_s.    (1.14)

Although being simultaneously in many states sounds weird, quantum information can be very powerful in
computing. Suppose some quantum computer takes as input quantum objects which are in a superposition
of multiple states; then the output is going to consist of quantum objects which of course are in multiple
states too. This way one can have many calculations done in only one computational step! However, careful
extraction of the results is needed [13, 20, 21, 22, 23], because a quantum measurement has as outcome only one
answer from the superposition of multiple states, as equation (1.12) instructs, and further information is lost.
One can then have incredible results, like for example calculating discrete logarithms and factorizing numbers in
polynomial time [20, 21], or searching an unsorted database of N objects with only √N iterations [22, 23]!
In contrast to classical information, which under perfect conditions can be known up to any desired accuracy,
quantum information is sometimes ambiguous. This is because one cannot reliably distinguish non-orthogonal states.
Assume for a while that such a distinction is possible for two non-orthogonal states |ψ_1⟩ and |ψ_2⟩
and a collection of measurement operators {M_m}. Then, according to this assumption, some of the measuring
operators give reliable information on whether the measured quantity is |ψ_1⟩ or |ψ_2⟩, and by collecting them together
the following two distinguishing POVM elements can be defined:

    E_i ≜ Σ_{M_m measuring i} M_m† M_m,  i = 1, 2.

The assumption that these states can be reliably distinguished is expressed mathematically as

    ⟨ψ_i|E_i|ψ_i⟩ = 1,  i = 1, 2.    (1.15)

Since Σ_{i=1,2} E_i = I it follows that Σ_{i=1,2} ⟨ψ_1|E_i|ψ_1⟩ = 1. Because the operator E_1 reliably measures the first
state, ⟨ψ_1|E_1|ψ_1⟩ = 1, hence the other term must be

    ⟨ψ_1|E_2|ψ_1⟩ = 0.    (1.16)

Suppose |ψ_2⟩ is decomposed into |ψ_1⟩ and a state orthogonal to |ψ_1⟩, say |ψ_⊥⟩; then |ψ_2⟩ = α|ψ_1⟩ + β|ψ_⊥⟩.
Of course |α|² + |β|² = 1 and |β| < 1, because |ψ_1⟩ and |ψ_2⟩ are not orthogonal. Using the last decomposition
and (1.16), ⟨ψ_2|E_2|ψ_2⟩ = |β|² ⟨ψ_⊥|E_2|ψ_⊥⟩ ≤ |β|² < 1, which is in contradiction with (1.15).
Concerning the results of the last paragraph, it should additionally be mentioned that information gain
implies disturbance. Let |ψ⟩ and |φ⟩ be non-orthogonal states and, without loss of generality, assume that a unitary
process is used to obtain information about them with the aid of an ancilla system |u⟩. Assuming that such a process does
not disturb the system, then in both cases one obtains

    |ψ⟩ ⊗ |u⟩ → |ψ⟩ ⊗ |v⟩,
    |φ⟩ ⊗ |u⟩ → |φ⟩ ⊗ |v'⟩.

One would like |v⟩ and |v'⟩ to be different, in order to acquire information about the states. However, since
inner products are preserved under unitary transformations, ⟨v|v'⟩⟨ψ|φ⟩ = ⟨u|u⟩⟨ψ|φ⟩ ⇒ ⟨v|v'⟩ = ⟨u|u⟩ = 1,
and hence |v⟩ and |v'⟩ are identical. Thus distinguishing between |ψ⟩ and |φ⟩ must inevitably disturb at least one of these
states.

However, at least in principle, there is always a way of distinguishing orthonormal states. Suppose the states |i⟩
are orthonormal; then it is straightforward to define the set of operators M_i ≜ |i⟩⟨i|, plus the operator
M_0 ≜ √(I − Σ_{i≠0} |i⟩⟨i|), which satisfy the completeness relation. Now, if the state |i⟩ is prepared, then p(i) =
⟨i|M_i† M_i|i⟩ = 1, thus the states are reliably distinguished.
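
A small Python sketch (illustrative only) of the measurement rule (1.11) makes the contrast explicit: a computational-basis measurement identifies the orthogonal states |0⟩ and |1⟩ with certainty, but gives only probabilistic information about the non-orthogonal state |+⟩:

    import numpy as np

    def measurement_probs(psi, operators):
        """p(m) = <psi| M_m^dagger M_m |psi> for a list of measurement operators, eq. (1.11)."""
        return [float(np.real(psi.conj() @ (M.conj().T @ M) @ psi)) for M in operators]

    ket0 = np.array([1.0, 0.0])
    ket1 = np.array([0.0, 1.0])
    plus = (ket0 + ket1) / np.sqrt(2)          # non-orthogonal to |0>

    # projective measurement in the computational basis: M_0 = |0><0|, M_1 = |1><1|
    M = [np.outer(ket0, ket0), np.outer(ket1, ket1)]

    print(measurement_probs(ket0, M))   # [1.0, 0.0]  -> |0> identified with certainty
    print(measurement_probs(ket1, M))   # [0.0, 1.0]  -> |1> identified with certainty
    print(measurement_probs(plus, M))   # [0.5, 0.5]  -> |+> cannot be reliably told apart from |0>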
Another very surprising result is the prohibition of copying arbitrary quantum states. This is known as the
no-cloning theorem [24, 25] and it can be proven very easily. Suppose it were possible to have a quantum photocopying
machine, which takes as input a quantum 'blank paper' |w⟩ and a state to be copied. The quantum
photocopying machine should be realized by a unitary transformation U, and if somebody tries to photocopy
two states |ψ⟩ and |φ⟩, it should naturally work as follows:

    U(|w⟩ ⊗ |ψ⟩) = |ψ⟩ ⊗ |ψ⟩,
    U(|w⟩ ⊗ |φ⟩) = |φ⟩ ⊗ |φ⟩.

Now, taking the inner product of these relations, ⟨ψ|φ⟩ = ⟨ψ|φ⟩², thus ⟨ψ|φ⟩ = 0 or ⟨ψ|φ⟩ = 1, hence |ψ⟩ = |φ⟩
or |ψ⟩ and |φ⟩ are orthogonal. This means that cloning is allowed only for orthogonal states! Thus one can at least
construct a device, out of quantum circuits, to copy orthogonal states. For example, if |ψ⟩ and |φ⟩
are orthogonal, then there exists a unitary transformation U such that U|ψ⟩ = |0⟩ and U|φ⟩ = |1⟩. Then, by
applying the FANOUT quantum gate, which maps the input to |00⟩ if the input qubit was |0⟩ and to |11⟩ if
it was |1⟩, and further applying U† ⊗ U† to the tensor product of the outcoming qubits, either the state |ψ⟩
will be copied, finally giving |ψψ⟩, or |φ⟩, giving |φφ⟩.
The basic information-relevant features of quantum mechanics, analyzed in this subsection, are summarized
in Figure 1.2.

Basic features of quantum mechanics
1. Reversibility of quantum operations
2. Irreversibility of measurement
3. Probabilistic outcomes
4. Superposition of states
5. Distinguishability of orthogonal states by operators
6. Non-distinguishability of non-orthogonal states
7. Information gain implies disturbance
8. Non-cloning of non-orthogonal states

Figure 1.2: A summary of the basic information-relevant features of quantum mechanics.


1.2.2 The language of quantum information theory: quantum thermodynamics
As in the case of classical information theory, thermodynamics should be studied in order to abstract quantum
information. As already stated in the previous paragraphs, quantum mechanics has some sort of built-in
probabilities. However, this is not enough: the probabilistic features of quantum states are different from those
of classical thermodynamics. The difference is demonstrated by taking two totally independent events A and B.
Then the classical probability of either of them occurring would be P_AB^classical = P(A) + P(B). In contrast, in quantum
mechanics the wave functions should be added, and probabilities are obtained by calculating the inner product.
More analytically,

    P_AB^quantum = (⟨ψ_A| + ⟨ψ_B|)(|ψ_A⟩ + |ψ_B⟩) = (⟨ψ_A|ψ_A⟩ + ⟨ψ_B|ψ_B⟩) + 2 Re⟨ψ_A|ψ_B⟩    (1.17)
                 = P(A) + P(B) + 2 Re⟨ψ_A|ψ_B⟩ ≠ P_AB^classical  in general.
Moreover, if somebody decides to encode information in quantum objects, then he is going to be interested
in the probabilities of occurrence of the alphabet, exactly as was done in section 1.1, and he would
never like to mix these up with the probabilities already present in quantum entities. For this reason a quantum version
of thermodynamics is needed. In the sequel the name thermodynamical probabilities is used to distinguish the
statistical mixture of several quantum states from the quantum probabilities arising from the observation of a
quantum state.
The basic tool of quantum thermodynamics is the density matrix, and its simplest case occurs when a quantum
state occurs with thermodynamical probability p = 1. Then, by (1.14), ⟨t|ψ⟩⟨ψ|s⟩ = c_t c_s* and therefore

    ρ ≜ |ψ⟩⟨ψ|

is the natural matrix generalization of a quantum state vector. This is the definition of the density matrix of a
pure state, in contradistinction to a mixture of states, where several states occur with probabilities 0 < p < 1. Each
element of this matrix is

    ρ_{ts} ≜ ⟨t|ψ⟩⟨ψ|s⟩ = c_t c_s*,

and, by equation (1.13) where the normalization of the state vector was defined, the density matrix is correspondingly
normalized by

    tr ρ = Σ_s ρ_{ss} = Σ_s ⟨s|ψ⟩⟨ψ|s⟩ = Σ_s c_s c_s* = 1.    (1.18)

Moreover, it is straightforward that unitary evolution (1.10) is described by

    U|ψ⟩⟨ψ|U† = UρU†,    (1.19)

the probability of measuring the m-th state (1.11) is given by

    p(m) = ⟨ψ|M_m† M_m|ψ⟩ = Σ_s ⟨ψ|M_m†|s⟩⟨s|M_m|ψ⟩ = Σ_s ⟨s|M_m|ψ⟩⟨ψ|M_m†|s⟩ = tr(M_m ρ M_m†),    (1.20)

and the state after measurement (1.12) is obtained by

    [M_m|ψ⟩ / √(⟨ψ|M_m† M_m|ψ⟩)] [⟨ψ|M_m† / √(⟨ψ|M_m† M_m|ψ⟩)] = M_m ρ M_m† / tr(M_m ρ M_m†).    (1.21)
Suppose now that there is a collection of quantum states |ψ_i⟩, occurring with thermodynamical probabilities p_i;
then the mixed density operator is simply defined as

    ρ ≜ Σ_i p_i |ψ_i⟩⟨ψ_i|.    (1.22)

This construction is precisely what was expected in the beginning of this subsection, because the thermodynamical
probabilities are real numbers and can be added, in contrast to the situation of quantum probabilities
for a state vector (1.17). The generalizations of normalization (1.18), unitary evolution (1.19), probability of
measurement (1.20) and the post-measurement state (1.21) for the mixed density matrix are

    tr ρ = Σ_i p_i tr(|ψ_i⟩⟨ψ_i|) = 1,
    U (Σ_i p_i |ψ_i⟩⟨ψ_i|) U† = UρU†,
    p(m) = tr(M_m ρ M_m†),
    M_m ρ M_m† / tr(M_m ρ M_m†).
The above results are pretty obvious for the first, the second and the fourth, but some algebra of probabilities
is needed for the third (refer to [1, p.99] for details). The unitary evolution of the system can be generalized by
quantum operations

    E(ρ) = Σ_i E_i ρ E_i†,

where Σ_i E_i† E_i = I. Quantum operations are sometimes referred to in the literature [2] as superoperators.
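
To make the formalism concrete, here is a small Python sketch (illustrative only; the ensemble and the channel are arbitrary choices, not from the original text) that builds the mixed density matrix (1.22), checks its normalization, and applies a quantum operation E(ρ) = Σ_i E_i ρ E_i†, verifying that the trace is preserved when Σ_i E_i† E_i = I:

    import numpy as np

    ket0 = np.array([[1.0], [0.0]])
    plus = np.array([[1.0], [1.0]]) / np.sqrt(2)

    # mixed state: |0> with probability 3/4, |+> with probability 1/4, eq. (1.22)
    rho = 0.75 * (ket0 @ ket0.conj().T) + 0.25 * (plus @ plus.conj().T)
    print(np.trace(rho))                      # 1.0, normalization as in (1.18)

    # a quantum operation: phase flip with probability q (operation elements E_0, E_1)
    q = 0.3
    Z = np.array([[1.0, 0.0], [0.0, -1.0]])
    E = [np.sqrt(1 - q) * np.eye(2), np.sqrt(q) * Z]

    assert np.allclose(sum(Ei.conj().T @ Ei for Ei in E), np.eye(2))   # completeness relation
    rho_out = sum(Ei @ rho @ Ei.conj().T for Ei in E)
    print(np.trace(rho_out))                  # still 1.0: the operation is trace preserving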
Quantum thermodynamics, as just described, can be viewed through a transition from classical to quantum
thermodynamics. Suppose an ensemble of n particles in equilibrium is given. For this ensemble assume that the
i-th particle is located at x_i, has velocity v_i and energy E_i(x_i, v_i). Classical thermodynamics says that the
i-th particle has probability

    p_i = e^{−βE_i} / Σ_j e^{−βE_j}    (1.23)

of being in this state, where β = 1/(k_B T), with k_B Boltzmann's constant and T the temperature. Now, if the particles
were described by quantum mechanics, the i-th one would have an eigenenergy E_i given by the solution
of the Schrödinger equation H|ψ_i⟩ = E_i|ψ_i⟩, where H is the Hamiltonian of the system. With the help of
equation (1.23), the density matrix as defined in (1.22) can be written as

    ρ = Σ_i p_i |ψ_i⟩⟨ψ_i| = Σ_i [e^{−βE_i} / Σ_j e^{−βE_j}] |ψ_i⟩⟨ψ_i| = e^{−βH} / tr e^{−βH},

which is a generalization of the classical thermodynamical probability (1.23). This transition is illustrated in Figure
1.3.
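
A short numerical illustration of the last formula (a sketch, with an arbitrarily chosen two-level Hamiltonian): the Gibbs state e^{−βH}/tr e^{−βH} computed directly from the matrix exponential coincides with the ensemble Σ_i p_i |ψ_i⟩⟨ψ_i| built from the eigenvectors and the Boltzmann weights (1.23):

    import numpy as np
    from scipy.linalg import expm

    beta = 2.0
    H = np.array([[0.0, 0.3],
                  [0.3, 1.0]])               # an arbitrary two-level Hamiltonian

    # Gibbs state directly from the matrix exponential
    rho_gibbs = expm(-beta * H) / np.trace(expm(-beta * H))

    # the same state assembled from eigenstates with Boltzmann weights, eq. (1.23)
    energies, states = np.linalg.eigh(H)
    weights = np.exp(-beta * energies) / np.sum(np.exp(-beta * energies))
    rho_ensemble = sum(w * np.outer(v, v.conj())
                       for w, v in zip(weights, states.T))

    print(np.allclose(rho_gibbs, rho_ensemble))   # True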

[Figure 1.3 contrasts the statistical behavior of classical particles (the i-th particle at x_i, with velocity v_i and
energy E_i(v_i, x_i), occurring with probability p_i = e^{−βE_i} / Σ_j e^{−βE_j}) with the statistical behavior of quantum
particles (the i-th particle in state |ψ_i⟩, given by H|ψ_i⟩ = E_i|ψ_i⟩, occurring with probability p_i = e^{−βE_i} / tr e^{−βH}).]



Figure 1.3: Transition from classical to quantum thermodynamics.



1.2.3 Von Neumann entropy and relevant measures of quantum information theory

Wishing to describe quantum information, one can use a quantum version of entropy. In order to justify its
mathematical definition, recall how Shannon entropy was given in equation (1.2), and assume that a density
matrix ρ is diagonalized as ρ = Σ_x λ_x |x⟩⟨x|. Then the quantum entropy of ρ is naturally defined as

    S(ρ) ≜ −Σ_x λ_x log λ_x.    (1.24)

Translated into the mathematical formalism developed in the last subsection, the von Neumann entropy is
defined as

    S(ρ) ≜ −tr(ρ log ρ).    (1.25)
The last formula is often used for proving theoretical results, while equation (1.24) is used for calculations. As an
example, the von Neumann entropy of ρ = p|0⟩⟨0| + (1 − p)(|0⟩ + |1⟩)(⟨0| + ⟨1|)/2 is found to be

    S(ρ) = S( [ p + (1−p)/2    (1−p)/2 ]
              [ (1−p)/2        (1−p)/2 ] ) = −λ_1 log λ_1 − λ_2 log λ_2,

where λ_1 = [1 + √(1 − 2p + 2p²)]/2 and λ_2 = [1 − √(1 − 2p + 2p²)]/2 are the eigenvalues of the corresponding matrix.
Surprisingly, S(ρ) ≠ H(p, 1 − p), even though the same probabilities were assigned to both of them! This shows
that the quantum probabilities are not expelled by quantum thermodynamics. The equality could hold only if the
probabilities appearing in Shannon's entropy were the eigenvalues of the density matrix.
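
The following Python sketch (illustrative only) evaluates this example for one value of p, confirming both the eigenvalue formula above and the inequality S(ρ) ≠ H(p, 1 − p):

    import numpy as np

    def entropy(eigvals):
        """Entropy (base 2) from a list of eigenvalues / probabilities."""
        lam = np.array([x for x in eigvals if x > 1e-12])
        return float(-np.sum(lam * np.log2(lam)))

    p = 0.25
    ket0 = np.array([1.0, 0.0])
    plus = np.array([1.0, 1.0]) / np.sqrt(2)

    rho = p * np.outer(ket0, ket0) + (1 - p) * np.outer(plus, plus)

    S_rho = entropy(np.linalg.eigvalsh(rho))          # von Neumann entropy, eq. (1.24)
    lam = (1 + np.sqrt(1 - 2*p + 2*p**2)) / 2         # eigenvalue formula from the text
    S_formula = entropy([lam, 1 - lam])
    H_shannon = entropy([p, 1 - p])                    # Shannon entropy of the mixing probabilities

    print(S_rho, S_formula)    # both approximately 0.48
    print(H_shannon)           # approximately 0.81, different from S(rho)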
Following the same path as in classical information theory, in the quantum case it is straightforward to
define the joint entropy, the relative entropy of ρ to σ, the entropy of A conditional on knowing B, and the
common or mutual information of A and B. Each case is correspondingly

    S(A, B) ≜ −tr(ρ^{AB} log ρ^{AB}),    (1.26a)
    S(ρ‖σ) ≜ tr(ρ log ρ) − tr(ρ log σ),    (1.26b)
    S(A|B) ≜ S(A, B) − S(B),    (1.26c)
    S(A : B) ≜ S(A) + S(B) − S(A, B) = S(A) − S(A|B).    (1.26d)

One can see that there are a lot of similarities between Shannon's and von Neumann's entropy. For instance, one can
prove a result reminiscent of equation (1.4),

    S(ρ‖σ) ≥ 0,    (1.27a)
    S(ρ‖σ) = 0 ⇔ ρ = σ,    (1.27b)

known as Klein's inequality. This inequality provides evidence of why the von Neumann relative entropy is
close to the notion of a metric. What is also important is that it can be used, as in the classical case, to prove
something corresponding to equation (1.5) of Shannon's entropy,

    S(ρ) ≤ log d,    (1.28a)
    S(ρ) = log d ⇔ ρ = (1/d) I.    (1.28b)

In addition to this, from the definition (1.25) it follows that

    S(ρ) ≥ 0,    (1.29a)
    S(ρ) = 0 ⇔ ρ is pure,    (1.29b)

which resembles equation (1.6).
One can also prove that, supposing some states ρ_i, with probabilities p_i, have support on orthogonal subspaces,
then

    S(Σ_i p_i ρ_i) = H(p_i) + Σ_i p_i S(ρ_i).

Directly from this relation the joint entropy theorem can be proven: supposing that p_i are probabilities,
|i⟩ are orthogonal states of a system A, and ρ_i is any set of density matrices of another system B, then

    S(Σ_i p_i |i⟩⟨i| ⊗ ρ_i) = H(p_i) + Σ_i p_i S(ρ_i).    (1.30)

Using the definition of von Neumann entropy (1.25) and the above mentioned theorem for the case where ρ_i = σ
for every i, and letting ρ be a density matrix with eigenvalues p_i and eigenvectors |i⟩, the entropy of a tensor
product ρ ⊗ σ is found to be

    S(ρ ⊗ σ) = S(ρ) + S(σ).    (1.31)

Another interesting result can be derived from the Schmidt decomposition: if a composite system AB is in a pure
state, its subsystems A and B have density matrices with equal eigenvalues, and by (1.24)

    S(A) = S(B).    (1.32)
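
As a quick numerical check of (1.31) and (1.32), the following Python sketch (illustrative only; the particular states are arbitrary choices) verifies additivity of the entropy for a product state and the equality of subsystem entropies for a pure bipartite state:

    import numpy as np

    def S(rho):
        """Von Neumann entropy (base 2) of a density matrix, eq. (1.24)."""
        lam = np.linalg.eigvalsh(rho)
        lam = lam[lam > 1e-12]
        return float(-np.sum(lam * np.log2(lam)))

    # two single-qubit mixed states
    rho   = np.diag([0.7, 0.3])
    sigma = np.diag([0.9, 0.1])
    print(np.isclose(S(np.kron(rho, sigma)), S(rho) + S(sigma)))   # True, eq. (1.31)

    # a pure entangled state of AB: sqrt(0.8)|00> + sqrt(0.2)|11>
    psi = np.zeros(4); psi[0], psi[3] = np.sqrt(0.8), np.sqrt(0.2)
    rho_AB = np.outer(psi, psi)
    rho_A = np.trace(rho_AB.reshape(2, 2, 2, 2), axis1=1, axis2=3)   # trace out B
    rho_B = np.trace(rho_AB.reshape(2, 2, 2, 2), axis1=0, axis2=2)   # trace out A
    print(np.isclose(S(rho_A), S(rho_B)))                            # True, eq. (1.32)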


1.2.4 A great discrepancy between classical and quantum information theory:
quantum entanglement
The tools developed in the above subsection can help reveal a great discrepancy between classical and quantum
information theory: entanglement. Concerning the nomenclature, in quantum mechanics two systems are said to be
entangled if their joint state cannot be written as a tensor product of states of the subsystems. For the demonstration
of the aforementioned discrepancy, let a composite system AB be in an entangled pure state |AB⟩; then, because
of entanglement, its Schmidt decomposition must contain more than one term,

    |AB⟩ = Σ_{i∈I} λ_i |i_A⟩|i_B⟩, with |I| > 1,    (1.33)

where |i_A⟩ and |i_B⟩ are orthonormal bases. The corresponding density matrix is obviously ρ^{AB} = |AB⟩⟨AB| =
Σ_{i,j∈I} λ_i λ_j |i_A⟩|i_B⟩⟨j_A|⟨j_B|. As usual, the density matrix of the subsystem B can be found by tracing out
system A,

    ρ^B = tr_A ρ^{AB} = Σ_{i,j,k∈I} λ_i λ_j ⟨k_A|i_A⟩⟨j_A|k_A⟩ |i_B⟩⟨j_B| = Σ_{i∈I} λ_i² |i_B⟩⟨i_B|.

Now, because of the assumption |I| > 1 in (1.33) and the fact that the |i_B⟩ are orthonormal and cannot be collected
together into a tensor product, the subsystem B is not pure. Thus by (1.29) S(B) > 0;
AB is pure, thus by (1.29b) S(A, B) = 0, and obviously by (1.26c) S(A|B) < 0. The last steps can be repeated
backwards, and the conclusion which can be drawn is that a pure composite system AB is entangled if and only
if S(A|B) < 0.

Of course, in classical information theory the conditional entropy can only be H(X|Y) ≥ 0 (property 2 in
subsection 1.1.2), and that is obviously the reason why entangled states do not exist there at all! This is an exclusive
feature of quantum information theory; a very intriguing, or better to say a very entangled, feature, which until
now is not well understood by physicists. However, it has incredible applications, such as quantum cryptography,
which will be the main topic of chapter 4. Concerning the nomenclature, entangled states are named after
the fact that S(A|B) < 0 ⇔ S(A, B) < S(B) (by (1.26c)), which means that in quantum mechanics the ignorance
about a system B alone can be greater than the ignorance about both A and B together! This indicates some correlation
between the two systems.
How can nature have such a peculiar property? Imagine a simple pair of quantum particles, with two possible
states each, |0⟩ and |1⟩. Then a possible realization of entanglement is the state |ψ⟩ = (|0⟩⊗|0⟩)/√2 + (|1⟩⊗|1⟩)/√2.
After a measurement M_m of the first particle, for example, according to (1.12)

    M_m|ψ⟩ → 1·(|m⟩ ⊗ |m⟩) + 0·(|1−m⟩ ⊗ |1−m⟩) = |m⟩ ⊗ |m⟩,

hence both particles collapse to the state |m⟩. This example sheds light on the quantum property whereby the
ignorance about one of the particles is greater than the ignorance about both of them together, since perfect
knowledge about one implies perfect knowledge about the other.
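
The negativity of the conditional entropy for this state is easy to verify numerically; the sketch below (illustrative only) computes S(A|B) = S(A, B) − S(B) for the entangled state above and, for comparison, for an uncorrelated product state:

    import numpy as np

    def S(rho):
        lam = np.linalg.eigvalsh(rho)
        lam = lam[lam > 1e-12]
        return float(-np.sum(lam * np.log2(lam)))

    def partial_trace_A(rho_AB):
        # trace out the first qubit of a two-qubit density matrix
        return np.trace(rho_AB.reshape(2, 2, 2, 2), axis1=0, axis2=2)

    # entangled state (|00> + |11>)/sqrt(2)
    bell = np.array([1.0, 0.0, 0.0, 1.0]) / np.sqrt(2)
    rho_AB = np.outer(bell, bell)
    print(S(rho_AB) - S(partial_trace_A(rho_AB)))      # S(A|B) = 0 - 1 = -1 < 0: entangled

    # uncorrelated product state |0><0| (x) I/2
    rho_prod = np.kron(np.diag([1.0, 0.0]), np.eye(2) / 2)
    print(S(rho_prod) - S(partial_trace_A(rho_prod)))  # S(A|B) = 1 - 1 = 0 >= 0: not entangled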

1.2.5 Basic properties of von Neumann entropy
The basic properties of von Neumann entropy, which can be compared to the properties of Shannon entropy
discussed in subsection 1.1.2, are:

1. S (A, B) = S (B, A) , S (A : B) = S (B : A) .

2. Unitary operations preserve entropy: S(UρU†) = S(ρ).
3. Subadditivity: S(A, B) ≤ S(A) + S(B).
4. S(A, B) ≥ |S(A) − S(B)| (triangle or Araki-Lieb inequality).
5. Strict concavity of the entropy: Suppose there are probabilities p_i ≥ 0 and corresponding density
matrices ρ_i; then S(Σ_i p_i ρ_i) ≥ Σ_i p_i S(ρ_i), and S(Σ_i p_i ρ_i) = Σ_i p_i S(ρ_i) ⇔ the ρ_i for which p_i > 0 are all
identical.

6. Upper bound for a mixture of states: Suppose ρ = Σ_i p_i ρ_i, where p_i are probabilities and ρ_i the corresponding
density matrices; then S(ρ) ≤ Σ_i p_i S(ρ_i) + H(p_i), and S(ρ) = Σ_i p_i S(ρ_i) + H(p_i) ⇔ the ρ_i have
support on orthogonal subspaces.
7. Strong subadditivity: S(A, B, C) + S(B) ≤ S(A, B) + S(B, C), or equivalently S(A) + S(B) ≤
S(A, C) + S(B, C).
8. Conditioning reduces entropy: S(A|B, C) ≤ S(A|B).
9. Discarding quantum systems never increases mutual information: Suppose ABC is a composite
quantum system; then S(A : B) ≤ S(A : B, C).
10. Trace-preserving quantum operations never increase mutual information: Suppose AB is a
composite quantum system and E is a trace-preserving quantum operation on system B. Let S(A : B)
denote the mutual information between systems A and B before E is applied to system B, and S(A : B')
the mutual information after E is applied to system B. Then S(A : B') ≤ S(A : B).
11. Relative entropy is jointly convex in its arguments: let 0 ≤ λ ≤ 1, then

    S(λA_1 + (1 − λ)A_2 ‖ λB_1 + (1 − λ)B_2) ≤ λS(A_1‖B_1) + (1 − λ)S(A_2‖B_2).

12. The relative entropy is monotonic: S(ρ^A ‖ σ^A) ≤ S(ρ^{AB} ‖ σ^{AB}).

Proof

1. Obvious from the definition of joint entropy (1.26a) and mutual entropy (1.26d).
2. Let U be a unitary matrix; then S(UρU†) ≜ −tr(UρU† log UρU†) = −tr(UρU† U(log ρ)U†), where the
fact that UρU† and ρ are similar matrices, and hence have the same eigenvalues, was employed. Furthermore,
U is unitary, hence U†U = I, and using tr(AB) = tr(BA) the proof is concluded: S(UρU†) = −tr(Uρ(log ρ)U†) =
−tr(U†Uρ log ρ) = −tr(ρ log ρ) = S(ρ).
3. Refer to [1, p.516].
4. Assume a fictitious state R purifying the system ρ^{AB}; then, by applying subadditivity (property 3),
S(R) + S(A) ≥ S(A, R). Because ρ^{ABR} is pure, by (1.32) S(R) = S(A, B) and S(A, R) = S(B).
Combining the last two equations with the last inequality, S(A, B) ≥ S(B) − S(A). Moreover, because
S(A, B) = S(B, A), A and B can be interchanged, so that S(A, B) ≥ S(A) − S(B), thus
S(A, B) ≥ |S(A) − S(B)|. It is obvious by (1.31) that equality holds if ρ^{AR} = ρ^A ⊗ ρ^R. This condition is hard
to interpret, because the system R was artificially introduced. Another way to obtain the equality condition
is by assuming that ρ^{AB} has a spectral decomposition ρ^{AB} = Σ_{ik} λ_{ik} |i_A⟩⟨i_A| ⊗ |k_B⟩⟨k_B|, so that
ρ^A = tr_B ρ^{AB} = Σ_i (Σ_k λ_{ik}) |i_A⟩⟨i_A| and ρ^B = Σ_k (Σ_i λ_{ik}) |k_B⟩⟨k_B|; then one can write

    S(A, B) = S(B) − S(A) ⇔ Σ_{ik} λ_{ik} log λ_{ik} = Σ_{ik} λ_{ik} log [Σ_j λ_{jk} / Σ_m λ_{im}] ⇔ λ_{ik} = Σ_j λ_{jk} / Σ_m λ_{im}.

Summing over k in the last equation,

    Σ_k λ_{ik} = 1 / Σ_m λ_{im} ⇔ (Σ_k λ_{ik})² = 1 ⇔ Σ_k λ_{ik} = 1,

where the last relation holds because λ_{ik} ≥ 0. Now, combining the last two outcomes,

    S(A, B) = S(B) − S(A) ⇔ λ_{ik} = Σ_j λ_{jk}, that is λ_{ik} is independent of i.

Rephrasing this result, the equality condition holds if and only if the matrices tr_B(|i⟩⟨i|) have a common
eigenbasis and the matrices tr_A(|i⟩⟨i|) have orthogonal support. As an example of the above comment,
consider the systems ρ^A = |0⟩⟨0| and ρ^B = (1/2)|0⟩⟨0| + (1/2)|1⟩⟨1|; then the entropy of each is S(A) = 0 and
S(B) = 1, and finally the joint system ρ^{AB} = ρ^A ⊗ ρ^B = (1/2)|00⟩⟨00| + (1/2)|01⟩⟨01| has entropy S(A, B) = 1,
so that indeed S(A, B) = S(B) − S(A).

5. Introduce an auxiliary system B, whose state has an orthonormal basis |i⟩ corresponding to the states ρ_i
of system A. The joint system then has the state ρ^{AB} = Σ_i p_i ρ_i ⊗ |i⟩⟨i|, and its entropy, according to the
joint entropy theorem (1.30), is S(A, B) = H(p_i) + Σ_i p_i S(ρ_i). Moreover, the entropy of each subsystem is
S(A) = S(Σ_i p_i ρ_i) and S(B) = S(Σ_i p_i |i⟩⟨i|) = H(p_i), and by applying subadditivity (property
3), S(A, B) ≤ S(A) + S(B), q.e.d. Concerning the equality conditions, assume ρ = ρ_i for every i; then,
by calculating ρ^{AB} = Σ_i p_i ρ_i ⊗ |i⟩⟨i| = ρ ⊗ Σ_i p_i |i⟩⟨i| = ρ^A ⊗ ρ^B, equality follows from (1.31).
Conversely, if S(Σ_i p_i ρ_i) = Σ_i p_i S(ρ_i), suppose there is at least one density matrix which is not
equal to the others, say σ ≜ ρ_j; then

    S(qρ + pσ) = qS(ρ) + pS(σ),    (1.34)

where the following quantities were defined: q ≜ Σ_{i≠j} p_i, p ≜ p_j and ρ ≜ ρ_i for i ≠ j. Let the density
matrices ρ, σ have spectral decompositions, say ρ = Σ_i λ_i |i⟩_ρ⟨i|_ρ and σ = Σ_j κ_j |j⟩_σ⟨j|_σ. It is easy to find
out that

    qS(ρ) + pS(σ) = −Σ_m qλ_m log(qλ_m) − Σ_m pκ_m log(pκ_m).    (1.35)

This was the right hand side of (1.34). To get the left hand side, assume that the matrix qρ + pσ has a
spectral decomposition qρ + pσ = Σ_m μ_m |m⟩_{ρσ}⟨m|_{ρσ}; one can find unitary matrices connecting these bases,
|i⟩_ρ = Σ_m u_{im}|m⟩_{ρσ} and |j⟩_σ = Σ_m w_{jm}|m⟩_{ρσ}. This implies that

    S(qρ + pσ) = −Σ_n ⟨n|_{ρσ} [q Σ_{iml} λ_i u_{im} u_{il}* |m⟩_{ρσ}⟨l|_{ρσ} + p Σ_{jml} κ_j w_{jm} w_{jl}* |m⟩_{ρσ}⟨l|_{ρσ}] |n⟩_{ρσ}
                 × log ⟨n|_{ρσ} [q Σ_{iml} λ_i u_{im} u_{il}* |m⟩_{ρσ}⟨l|_{ρσ} + p Σ_{jml} κ_j w_{jm} w_{jl}* |m⟩_{ρσ}⟨l|_{ρσ}] |n⟩_{ρσ}    (1.36)
               = −Σ_n (q Σ_i λ_i |u_{in}|² + p Σ_j κ_j |w_{jn}|²) log (q Σ_i λ_i |u_{in}|² + p Σ_j κ_j |w_{jn}|²)
               = −Σ_n (qλ_n + pκ_n) log (qλ_n + pκ_n).

In the last step the fact that u_{ij} and w_{ij} are unitary (UU† = I ⇒ Σ_j u_{ij} u_{lj}* = δ_{il}) was employed. Taking
the left and right hand sides of equation (1.34), as found in (1.36) and (1.35) correspondingly, it is simple
to verify that

    Σ_m qλ_m log[1 + pκ_m/(qλ_m)] + Σ_m pκ_m log[1 + qλ_m/(pκ_m)] = 0.

Since both logarithms are non-negative, both summations are greater than or equal to zero, and the last equality leaves no
other possibility than both of them being zero. This can only happen if, for the non-zero λ_m, the corresponding
κ_m are null. An alternative proof that the von Neumann entropy is concave can be given by
defining f(p) ≜ S(pρ + (1 − p)σ). Then, by calculus, if f''(p) ≤ 0 then f is concave, that is, for 0 ≤ p ≤ 1,
f(px + (1 − p)y) ≥ pf(x) + (1 − p)f(y). Selecting x = 1, y = 0, f(p) ≥ pf(1) + (1 − p)f(0), which,
according to the definition of f, implies that S(pρ + (1 − p)σ) ≥ pS(ρ) + (1 − p)S(σ).
6. Refer to [1, p.518,519].
7. For the proof refer to [1, p.519-521]. The fact that the two inequalities are equivalent is presented
here. If S(R) + S(B) ≤ S(R, C) + S(B, C) holds, then by introducing an auxiliary system A which
purifies the system RBC, S(R) = S(A, B, C) and S(R, C) = S(A, B), so the last inequality becomes
S(A, B, C) + S(B) ≤ S(A, B) + S(B, C). Conversely, if S(R, B, C) + S(B) ≤ S(R, B) + S(B, C) holds, then by
again inserting a system A purifying the system RBC, because S(R, B, C) = S(A) and S(R, B) = S(A, C),
the last inequality becomes S(A) + S(B) ≤ S(A, C) + S(B, C). From this, another equivalent form of
strong subadditivity is 0 ≤ S(C|A) + S(C|B), or S(A) + S(B) − S(A, B) + S(A) + S(C) − S(A, C) ≤
2S(A) ⇔ S(A : B) + S(A : C) ≤ 2S(A). This inequality corresponds to the second property of Shannon
entropy, where H(X : Y) ≤ H(Y), which is not always true in quantum information theory. For
example, the composite system |AB⟩ = (1/√2)|00⟩ + (1/√2)|11⟩ has entropy S(A, B) = 0 because it is pure, and
ρ^A = (1/2)|0⟩⟨0| + (1/2)|1⟩⟨1|, thus S(A) = 1 and similarly S(B) = 1. Hence S(A : B) = 2 > S(A).
8. Refer to [1, p.523].
9. Refer to [1, p.523].
10. Refer to [1, p.523].
11. For the proof refer to [1, p.520]. Concerning this property it should be emphasized that joint convexity
implies convexity in each argument separately; this is obvious by selecting B_1 = B_2 = B or A_1 = A_2 = A. The converse
is not true. For example, choose f(x, y) = x² e^y, which is convex in x because ∂²f/∂x² = 2e^y ≥ 0 for
every x, y, and convex in y because ∂²f/∂y² = x²e^y ≥ 0 for every x, y. However,
f((1/2)·0 + (1/2)·2, (1/2)·2 + (1/2)·0) = f(1, 1) = e ≈ 2.72, which is greater than
(1/2)f(0, 2) + (1/2)f(2, 0) = 2, so f is not jointly convex.

12. Refer to [1, p.524,525].

Using strict concavity to prove other properties of von Neumann entropy
Strict concavity (property 5) of von Neumann entropy can be used to prove (1.28b), and moreover that the
completely mixed state (1/d)I on a space of d dimensions is the unique state of maximal entropy. To do this, the
following result is stated: for a d × d normal matrix A there exists a set of unitary matrices U_i^(A) such that

    Σ_{i=1}^{d} U_i^(A) A U_i^(A)† = tr(A) I.    (1.37)

For a proof refer to appendix A.1 (the proof of a more general proposition is due to Abbas Edalat). To prove
the uniqueness of (1/d)I as the state of maximal entropy, take A ≡ ρ, any quantum state; then

    S(ρ) = Σ_{i=1}^{d} (1/d) S(ρ) = Σ_{i=1}^{d} (1/d) S(U_i^(ρ) ρ U_i^(ρ)†)   [by property 2]
         ≤ S(Σ_{i=1}^{d} (1/d) U_i^(ρ) ρ U_i^(ρ)†)   [by property 5]
         = S((1/d) I)   [by (1.37)].

Hence, by strict concavity (property 5), any state ρ has entropy less than or equal to that of the completely mixed state
(1/d)I, and in order for the entropies to be equal the states must be identical.

Using von Neumann entropy, a proof of the concavity of Shannon entropy can be provided. Let p_i be
probabilities and q^(i) probability distributions. Then

    H(Σ_i p_i q^(i)) = S(Σ_i p_i Σ_j q_j^(i) |j⟩⟨j|) ≥ Σ_i p_i S(Σ_j q_j^(i) |j⟩⟨j|) = Σ_i p_i H(q^(i)),    (1.38)

where property 5 was used, with equality if and only if the Σ_j q_j^(i) |j⟩⟨j| are the same, that is, the distributions
q^(i) are identical.

1.2.6 Measurement in quantum physics and information theory
As already noted in the introduction of the present section, quantum physics seems very puzzling. This is
because there are two types of evolution a quantum system can undergo: unitary evolution and measurement. One
understands that the first is needed to preserve probability during evolution (see subsection 1.2.1). Then
why is a second type of evolution needed? Information theory can explain this, using the following results:

1. Projective measurement can increase entropy. This is derived using strict concavity. Let P be a projector
and Q ≜ I − P the complementary projector; then there exist unitary matrices U_1, U_2 and a probability
p such that for all ρ, PρP + QρQ = pU_1ρU_1† + (1 − p)U_2ρU_2† (refer to appendix A.2 for a proof), thus

    S(PρP + QρQ) = S(pU_1ρU_1† + (1 − p)U_2ρU_2†) ≥ pS(U_1ρU_1†) + (1 − p)S(U_2ρU_2†)   [by property 5]    (1.39)
                 = pS(ρ) + (1 − p)S(ρ) = S(ρ)   [by property 2].

Because of strict concavity, equality holds if and only if PρP + QρQ = ρ.

2. General measurement can decrease entropy. One can be convinced of this by considering a qubit in the state
ρ = (1/2)|0⟩⟨0| + (1/2)|1⟩⟨1|, which is not pure, thus S(ρ) > 0, and which is measured using the measurement
operators M_1 = |0⟩⟨0| and M_2 = |0⟩⟨1|. If the result of the measurement is unknown, then the state of the
system afterwards is ρ' = M_1ρM_1† + M_2ρM_2† = |0⟩⟨0|, which is pure, hence S(ρ') = 0 < S(ρ).
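
The following Python sketch (illustrative only) checks this example, and also result 1 for a projective measurement on a pure superposition state:

    import numpy as np

    def S(rho):
        lam = np.linalg.eigvalsh(rho)
        lam = lam[lam > 1e-12]
        return float(-np.sum(lam * np.log2(lam)))

    P = np.array([[1.0, 0.0], [0.0, 0.0]])     # |0><0|
    Q = np.eye(2) - P                           # |1><1|

    # result 1: projective measurement (outcome unknown) can increase entropy
    plus = np.array([1.0, 1.0]) / np.sqrt(2)
    rho_pure = np.outer(plus, plus)
    print(S(rho_pure), S(P @ rho_pure @ P + Q @ rho_pure @ Q))   # 0.0 -> 1.0

    # result 2: general measurement can decrease entropy
    rho_mixed = np.eye(2) / 2
    M1 = np.array([[1.0, 0.0], [0.0, 0.0]])    # |0><0|
    M2 = np.array([[0.0, 1.0], [0.0, 0.0]])    # |0><1|
    rho_after = M1 @ rho_mixed @ M1.conj().T + M2 @ rho_mixed @ M2.conj().T
    print(S(rho_mixed), S(rho_after))           # 1.0 -> 0.0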
3. Unitary evolution preserves entropy. This has already been seen as property 2 of the von Neumann entropy.

Now one should remember the information-theoretic interpretation of entropy given throughout this chapter:
entropy measures the uncertainty about a system, that is, the information one can gain about it. Result 3 instructs
that if only unitary evolution were present in quantum theory, then no knowledge of any physical system could
ever be gained! One is relieved to see that the entropy can decrease or increase through measurements, as seen in
results 1 and 2, and of course that is what measurements were meant for in the first place.




Chapter 2

Some important results of information
theory

Some very important results derived by information theory, which will be useful for the development of the
next two chapters, concern the amount of accessible information and how data can be processed. These are
presented correspondingly in section 2.1 and in section 2.2, both for the classical and the quantum case.


2.1 Accessible information
Information is not always perfectly known, for example during a transmission over a noisy channel there is a
possibility of information loss. This means that obtaining upper bounds of accessible information can be very
useful in practice. These upper bounds are calculated in subsection 2.1.1 for the classical and in subsection
2.1.2 for the quantum case.

2.1.1 Accessible classical information: Fano's inequality
Of major importance in classical information theory is the amount of information that can be extracted from a random variable X based on knowledge of another random variable Y. This is quantified by an upper bound on the conditional entropy H(X|Y), which is derived next. Suppose X̃ ≡ f(Y) is some function of Y used as the best guess for X, and let p_e ≡ p(X̃ ≠ X) be the probability that this guess is incorrect. Then an 'error' random variable can be defined by

E ≡ 1 if X̃ ≠ X, and E ≡ 0 if X̃ = X,

thus H(E) = H(p_e). Using the chaining rule for conditional entropies, H(E, X|Y) = H(E|X, Y) + H(X|Y) (Shannon entropy, property 8); however E is completely determined once X and Y are known, so H(E|X, Y) = 0 and hence

H(E, X|Y) = H(X|Y).   (2.1)

Applying the chaining rule for conditional entropies (Shannon entropy, property 8) again, but for different variables,

H(E, X|Y) = H(X|E, Y) + H(E|Y),   (2.2)

and further, because conditioning reduces entropy (Shannon entropy, property 7), H(E|Y) ≤ H(E) = H(p_e), whence by (2.1) and (2.2)

H(X|Y) = H(E, X|Y) ≤ H(X|E, Y) + H(p_e).   (2.3)

Finally H(X|E, Y) is bounded as follows: when E = 0 the guess is correct, so X = f(Y) is determined by Y and H(X|E = 0, Y) = 0, while when E = 1, X can take at most |X| − 1 values, so H(X|E = 1, Y) ≤ log(|X| − 1). Therefore

H(X|E, Y) = p(E = 0) H(X|E = 0, Y) + p(E = 1) H(X|E = 1, Y) ≤ p(E = 0) · 0 + p_e log(|X| − 1) = p_e log(|X| − 1).

This relation together with (2.3) gives what is known as Fano's inequality:

H(p_e) + p_e log(|X| − 1) ≥ H(X|Y).   (2.4)
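A brief numerical sanity check of Fano's inequality (2.4) may be helpful. The sketch below (an added illustration; the alphabet sizes and the random joint distribution are arbitrary) forms the best guess X̃ = f(Y) as the most likely value of X for each y and verifies (2.4).

    import numpy as np

    rng = np.random.default_rng(0)

    def H(p):
        # Shannon entropy (base 2) of a probability vector
        p = np.asarray(p, dtype=float).ravel()
        p = p[p > 0]
        return float(-np.sum(p * np.log2(p)))

    # A random joint distribution p(x, y) with |X| = 4 and |Y| = 3
    pxy = rng.random((4, 3)); pxy /= pxy.sum()
    py = pxy.sum(axis=0)

    H_X_given_Y = H(pxy) - H(py)                      # H(X|Y) = H(X, Y) - H(Y)

    guess = pxy.argmax(axis=0)                        # best guess f(y): most likely x given y
    p_e = 1.0 - sum(pxy[guess[y], y] for y in range(pxy.shape[1]))

    H_pe = H([p_e, 1.0 - p_e])
    fano_bound = H_pe + p_e * np.log2(pxy.shape[0] - 1)
    print(H_X_given_Y, "<=", fano_bound)              # Fano's inequality (2.4)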


2.1.2 Accessible quantum information: quantum Fano inequality and the Holevo
bound
Quantum Fano inequality
There exists a relation analogous to (2.4) in quantum information theory, named the quantum Fano inequality:

S(ρ, E) ≤ H(F(ρ, E)) + (1 − F(ρ, E)) log(d² − 1),   (2.5)

where F(ρ, E) is the entanglement fidelity of a quantum operation, defined in (B.1) (for more details refer to appendix B.2), and d is the dimension of the quantum system. In the above equation the entropy exchange of the operation E upon ρ was introduced,

S(ρ, E) ≡ S(R′, Q′),

which is a measure of the noise caused by E on a quantum system Q (ρ ≡ ρ^Q) purified by R. The prime notation indicates the states after the application of E. Note that the entropy exchange does not depend upon the way in which the initial state of Q is purified by R. This is because any two purifications of Q into RQ are related by a unitary operation on the system R [1, p.111], and because of von Neumann entropy, property 2.
Quantum Fano inequality (2.5) is proven by taking an orthonormal basis |i⟩ for the system RQ, chosen so that the first state in the basis is |1⟩ = |RQ⟩, the purification of ρ. Forming the quantities p_i ≡ ⟨i| ρ^{R′Q′} |i⟩, it follows from (1.39) that S(R′, Q′) ≤ H(p_1, ..., p_{d²}), and with some simple algebra H(p_1, ..., p_{d²}) = H(p_1) + (1 − p_1) H(p_2/(1 − p_1), ..., p_{d²}/(1 − p_1)) ≤ H(p_1) + (1 − p_1) log(d² − 1); since p_1 = F(ρ, E), q.e.d.

The Holevo bound
Another result giving an upper bound on accessible quantum information is the Holevo bound [26]:

H(X : Y) ≤ S(ρ) − Σ_x p_x S(ρ_x),   (2.6)

where ρ = Σ_x p_x ρ_x. The right-hand side of this inequality is so useful in quantum information theory that it is given a special name: the Holevo χ quantity. Concerning the proof, assume that someone, named P, prepares a quantum information system Q in the states ρ_X, where X = 0, ..., n, with probabilities p_0, ..., p_n. The system Q is then measured by another person, M, using the POVM elements {E_y} = {E_0, ..., E_m}, giving an outcome Y. The state of the total system before the measurement is

ρ^{PQM} = Σ_x p_x |x⟩⟨x| ⊗ ρ_x ⊗ |0⟩⟨0|,

where the tensor product is taken in the order PQM. The matrix |0⟩⟨0| represents the initial state of the measurement system, before any information has been obtained. The measurement is described by a quantum operation E which acts on each state σ of Q by measuring it with the POVM elements {E_y} and storing the outcome on M. This can be expressed by

E(σ ⊗ |0⟩⟨0|) = Σ_y √E_y σ √E_y ⊗ |y⟩⟨y|.

The quantum operation E is trace preserving. To see this, first notice that it is made up of the operation elements √E_y ⊗ U_y, where U_y |y′⟩ ≡ |y′ + y⟩, with + denoting addition modulo m + 1 (the dimension of the measurement register M). Of course U_y is unitary, since it maps each basis vector |y′⟩ to another basis vector |y′ + y⟩ and hence is a change of basis onto a cyclic permutation of the same basis, so U_y†U_y = I. Now, because the {E_y} form a POVM, Σ_y E_y = I, and therefore

Σ_y (√E_y ⊗ U_y)† (√E_y ⊗ U_y) = Σ_y E_y ⊗ U_y†U_y = I ⊗ I = I,

q.e.d.
Subsequently, primes are used to denote states after the application of E, and unprimed notation is used for states before its application. Note now that S(P : Q) = S(P : Q, M), since M is initially uncorrelated with P and Q, and, because applying the operation E cannot increase mutual information (von Neumann entropy, property 10), S(P : Q, M) ≥ S(P′ : Q′, M′); moreover, discarding the system Q′ cannot increase mutual information, so S(P′ : Q′, M′) ≥ S(P′ : M′). Putting these results together,

S(P′ : M′) ≤ S(P : Q).   (2.7)


The last equation, with a little algebra, is seen to be the Holevo bound. The right-hand side of (2.7) is found by noting that ρ^{PQ} = Σ_x p_x |x⟩⟨x| ⊗ ρ_x, hence S(P) = H(p_x), S(Q) = S(ρ), and S(P, Q) = H(p_x) + Σ_x p_x S(ρ_x) (von Neumann entropy, property 6), thus

S(P : Q) = S(P) + S(Q) − S(P, Q) = S(ρ) − Σ_x p_x S(ρ_x).

The left-hand side of (2.7) is found by noting that after the measurement

ρ^{P′Q′M′} = Σ_{xy} p_x |x⟩⟨x| ⊗ √E_y ρ_x √E_y ⊗ |y⟩⟨y|;

tracing out the system Q and using the observation that the joint distribution p(x, y) of the pair (X, Y) satisfies p(x, y) = p_x p(y|x) = p_x tr(ρ_x E_y) = p_x tr(√E_y ρ_x √E_y), it is straightforward to see that ρ^{P′M′} = Σ_{xy} p(x, y) |x⟩⟨x| ⊗ |y⟩⟨y|, whence

S(P′ : M′) = S(P′) + S(M′) − S(P′, M′) = H(X) + H(Y) − H(X, Y) = H(X : Y),

q.e.d.
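As a small numerical illustration of the Holevo bound (2.6), the sketch below computes the Holevo χ quantity for an ensemble of two non-orthogonal pure states and compares it with the mutual information obtained from a computational-basis measurement; the ensemble {|0⟩, |+⟩} with equal probabilities and the chosen measurement are illustrative assumptions, not taken from the text.

    import numpy as np

    def S(rho):
        # von Neumann entropy in bits
        w = np.linalg.eigvalsh(rho)
        w = w[w > 1e-12]
        return float(-np.sum(w * np.log2(w)))

    def H(p):
        p = np.asarray(p, dtype=float).ravel()
        p = p[p > 0]
        return float(-np.sum(p * np.log2(p)))

    zero = np.array([1.0, 0.0])
    plus = np.array([1.0, 1.0]) / np.sqrt(2)
    states = [np.outer(zero, zero), np.outer(plus, plus)]   # rho_x for x = 0, 1
    probs = [0.5, 0.5]                                      # p_x
    rho = sum(p * r for p, r in zip(probs, states))

    chi = S(rho) - sum(p * S(r) for p, r in zip(probs, states))   # Holevo chi

    # H(X : Y) for a projective measurement in the computational basis
    E = [np.diag([1.0, 0.0]), np.diag([0.0, 1.0])]                # POVM elements E_y
    pxy = np.array([[px * np.trace(Ey @ rx).real for Ey in E]
                    for px, rx in zip(probs, states)])            # p(x, y) = p_x tr(rho_x E_y)
    HXY = H(pxy.sum(axis=1)) + H(pxy.sum(axis=0)) - H(pxy)

    print(HXY, "<=", chi)    # the Holevo bound (2.6)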


2.2 Data processing
As is widely known, information, apart from being stored and transmitted, is also processed. In subsection 2.2.1 data processing is defined for the classical case and the corresponding data processing inequality is proven. An analogous definition and inequality are demonstrated for the quantum case in subsection 2.2.2.

2.2.1 Classical data processing
Classical data processing can be described in mathematical terms by a chain of random variables

X_1 → X_2 → X_3 → ··· → X_n,   (2.8)

where X_i is the i-th step of processing. Of course each step depends only on the information gained from the previous one, that is, p(X_{n+1} = x_{n+1} | X_n = x_n, ..., X_1 = x_1) = p(X_{n+1} = x_{n+1} | X_n = x_n), which defines a Markov chain. But, as already emphasized, information is a physical entity which can be distorted by noise. Thus if X → Y → Z is a Markov chain representing an information process, one can prove

H(X) ≥ H(X : Y) ≥ H(X : Z),   (2.9)

which is known as the data processing inequality. This inequality gives mathematical form to the following physical truth: if a system described by a random variable X is subjected to noise, producing Y, then further data processing cannot increase the mutual information between the output of the process and the original information X; once information is lost, it cannot be restored. It is worth mentioning that in a data processing chain X → Y → Z, the information a system Z shares with X must be information which Z also shares with Y; the information is 'pipelined' from X through Y to Z. This is described by the data pipelining inequality

H(Z : Y) ≥ H(Z : X).

This follows by applying (2.9) to the reversed chain, after noting that X → Y → Z is a Markov chain if and only if

p(Z = z | Y = y, X = x) = p(Z = z | Y = y)
⇔ p(X = x, Y = y, Z = z) / p(X = x, Y = y) = p(Y = y, Z = z) / p(Y = y)
⇔ p(X = x, Y = y, Z = z) / p(Y = y, Z = z) = p(X = x, Y = y) / p(Y = y)
⇔ p(X = x | Y = y, Z = z) = p(X = x | Y = y),

if and only if Z → Y → X is a Markov chain too. In the above derivation there is no problem with null probabilities in the denominators, because in that case every probability involved would be null, and the result would still hold.
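The data processing inequality (2.9) can be checked numerically as well. The sketch below builds a Markov chain X → Y → Z from randomly generated stochastic matrices (an illustrative construction, not taken from the text) and compares H(X), H(X : Y) and H(X : Z).

    import numpy as np

    rng = np.random.default_rng(1)

    def H(p):
        p = np.asarray(p, dtype=float).ravel()
        p = p[p > 0]
        return float(-np.sum(p * np.log2(p)))

    def mutual_information(pxy):
        # H(X : Y) = H(X) + H(Y) - H(X, Y)
        return H(pxy.sum(axis=1)) + H(pxy.sum(axis=0)) - H(pxy)

    px = rng.random(3); px /= px.sum()                                  # p(x)
    T_yx = rng.random((3, 4)); T_yx /= T_yx.sum(axis=1, keepdims=True)  # rows are p(y|x)
    T_zy = rng.random((4, 5)); T_zy /= T_zy.sum(axis=1, keepdims=True)  # rows are p(z|y)

    pxy = px[:, None] * T_yx     # p(x, y)
    pxz = pxy @ T_zy             # p(x, z) = sum_y p(x, y) p(z|y), by the Markov property

    print(H(px), ">=", mutual_information(pxy), ">=", mutual_information(pxz))   # (2.9)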



2.2.2 Quantum data processing
The quantum analogue of the data processing chain (2.8) is described by a chain of quantum operations

ρ → E_1(ρ) → (E_2 ∘ E_1)(ρ) → ··· → (E_n ∘ ··· ∘ E_2 ∘ E_1)(ρ).

In this model each step of the process is obtained by the application of a quantum operation. Defining the coherent information

I(ρ, E) ≡ S(E(ρ)) − S(ρ, E),   (2.10)

it can be proven that

S(ρ) ≥ I(ρ, E_1) ≥ I(ρ, E_2 ∘ E_1),   (2.11)

which corresponds to the classical data processing inequality (2.9).
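A hedged numerical sketch of (2.10) and (2.11): the code below uses the qubit depolarizing channel, an illustrative choice not discussed in the text, together with the standard fact that for a trace-preserving operation with operation elements {E_i} the entropy exchange equals S(W), where W_ij = tr(E_i ρ E_j†); the input state and channel parameter are likewise arbitrary.

    import numpy as np

    def S(rho):
        w = np.linalg.eigvalsh(rho)
        w = w[w > 1e-12]
        return float(-np.sum(w * np.log2(w)))

    I2 = np.eye(2, dtype=complex)
    X = np.array([[0, 1], [1, 0]], dtype=complex)
    Y = np.array([[0, -1j], [1j, 0]])
    Z = np.array([[1, 0], [0, -1]], dtype=complex)

    def depolarizing(p):
        # Operation (Kraus) elements of the qubit depolarizing channel
        return [np.sqrt(1 - 3 * p / 4) * I2, np.sqrt(p / 4) * X,
                np.sqrt(p / 4) * Y, np.sqrt(p / 4) * Z]

    def compose(K2, K1):
        # Operation elements of E2 o E1 are all products of elements of E2 and E1
        return [b @ a for b in K2 for a in K1]

    def apply(K, rho):
        return sum(k @ rho @ k.conj().T for k in K)

    def entropy_exchange(K, rho):
        # S(rho, E) = S(W) with W_ij = tr(E_i rho E_j^dag), valid for trace-preserving E
        W = np.array([[np.trace(ki @ rho @ kj.conj().T) for kj in K] for ki in K])
        return S(W)

    def coherent_information(K, rho):
        return S(apply(K, rho)) - entropy_exchange(K, rho)      # definition (2.10)

    rho = np.eye(2, dtype=complex) / 2        # maximally mixed qubit, an arbitrary input
    E1 = depolarizing(0.2)
    print(S(rho), ">=", coherent_information(E1, rho),
          ">=", coherent_information(compose(E1, E1), rho))     # inequality (2.11)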




Chapter 3

Real-life applications of information
theory

After a theoretical presentation of information theory, one should look at its real-life applications. Two extremely useful applications are data compression, discussed in section 3.1, and the transmission of information over noisy channels, which is the main topic of section 3.2.


3.1 Data compression
Nowadays data compression is a widely applied procedure: everybody uses .zip archives, listens to .mp3 music, watches videos in .mpeg format and exchanges photos in .jpg files. Although in all these cases special techniques depending on the type of data are used, the general philosophy underlying data compression is inherited from Shannon's noiseless channel coding theorem [7, 8], discussed in subsection 3.1.1. In the quantum case, data compression was theoretically shown to be possible in 1995 by Schumacher [27], and is presented in subsection 3.1.2.

3.1.1 Shannon's noiseless channel coding theorem
Shannon's main idea was to estimate the physical resources needed to represent a given information content, which as already seen in section 1.1 are related to entropy. As a simple example of how the theorem works, consider a binary alphabet to be compressed, in which 0 occurs with probability 1 − p and 1 with probability p. If n-bit strings are formed, then by the law of large numbers they will contain approximately (1 − p)n zero bits and pn one bits with very high probability, related to the magnitude of n. Since, among all possible strings, these are the most likely, they are usually named typical sequences. Counting all the arrangements of ones and zeros, there are in total n!/[(np)! (n(1 − p))!] such strings. Using Stirling's approximation n! ≈ (n/e)^n, one finds that

n!/[(np)! (n(1 − p))!] ≈ (n/e)^n / [(np/e)^{np} (n(1 − p)/e)^{n(1−p)}]
 = n^n / [(np)^{np} (n(1 − p))^{n(1−p)}]
 = 2^{n log n − np log(np) − n(1−p) log(n(1−p))}
 = 2^{n[−p log p − (1−p) log(1−p)]} = 2^{nH(p)}.   (3.1)
Generalizing the above argument to an alphabet of k letters x_i ∈ X with occurrence probabilities p(x_i), the number of typical sequences can easily be calculated using combinatorics as before; it is the multinomial coefficient

n! / Π_{x∈X} (np(x))! ≈ 2^{nH(X)}.   (3.2)

Here the letters need not be single symbols; they can also be sequences of symbols, such as words of the English language. Obviously the probability of occurrence of such a typical sequence is approximately 2^{−nH(X)}. This approximate probability
gives a very intuitive way of understanding the mathematical terminology of Shannon's noiseless theorem, where a sequence y = x_{i1} x_{i2} x_{i3} ··· x_{in} is defined to be ε-typical if the probability of its occurrence satisfies

2^{−n(H(X)+ε)} ≤ p(y) ≤ 2^{−n(H(X)−ε)}.

The set of all such sequences is denoted T(n, ε). Another very useful way of writing this condition is

|(1/n) log(1 / p(x_{i1} x_{i2} x_{i3} ··· x_{in})) − H(X)| ≤ ε.   (3.3)
The theorem of typical sequences can now be stated.

Theorem of typical sequences:

1. Fix ε > 0. Then for any δ > 0 and for sufficiently large n, the probability that a sequence is ε-typical is at least 1 − δ.
2. For any fixed ε > 0 and δ > 0, for sufficiently large n, (1 − δ) 2^{n(H(X)−ε)} ≤ |T(n, ε)| ≤ 2^{n(H(X)+ε)}.
3. Let S(n) be a collection of size at most 2^{nR} of length-n sequences, where R < H(X) is fixed. Then for any δ > 0 and for sufficiently large n, Σ_{y∈S(n)} p(y) ≤ δ.
The above theorem is proven using the law of large numbers, and Shannon's noiseless channel coding theorem is then just an application of it. Shannon considered a compression scheme which is simply a map C_n taking a length-n sequence y = x_{i1} x_{i2} x_{i3} ··· x_{in} to a sequence of nR bits, C_n(y). Of course in such a compression scheme an inverse map D_n (with D_n ∘ C_n = id_{X^n}) should exist, which is naturally named the decompression scheme. However, the set of typical sequences is, in non-trivial cases, only a subset of all possible sequences, and this leads to failure of the schemes when they are called upon to map the complementary subset, known as the set of atypical sequences. Further nomenclature may be added by saying that a compression-decompression scheme (C_n, D_n) is reliable if the probability that D_n(C_n(y)) = y approaches one as n approaches infinity. It is now time to state the theorem.

Shannon's noiseless channel coding theorem:
Assume an alphabet X. If R > H(X) is chosen, then there exists a reliable compression-decompression scheme of rate R. Conversely, if R < H(X), any such scheme will not be reliable.

This theorem reveals a remarkable operational interpretation of the entropy rate H(X): it is just the minimal physical resource necessary and sufficient to reliably transmit data. Finally it should be stressed that one can obtain a perfectly reliable compression-decompression scheme simply by extending the maps to the atypical sequences; in that case there is only a high probability that no more than nR resources are needed to carry the information.
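The counting behind (3.1) and part 1 of the theorem of typical sequences can be illustrated numerically; in the sketch below the values p = 0.1, n = 1000 and ε = 0.05 are arbitrary choices made only for the demonstration.

    import numpy as np
    from math import comb, log2

    p, n, eps = 0.1, 1000, 0.05
    Hp = -p * log2(p) - (1 - p) * log2(1 - p)           # binary entropy H(p)

    # Exponent of the number of strings with exactly np ones, compared with H(p), cf. (3.1)
    exact = comb(n, int(n * p))
    print(log2(exact) / n, "~", Hp)

    # Total probability carried by the epsilon-typical sequences, cf. theorem part 1
    ks = np.arange(n + 1)                               # number of ones in a string
    log_prob = ks * np.log2(p) + (n - ks) * np.log2(1 - p)          # log2 p(y) for such a string
    counts = np.array([comb(n, int(k)) for k in ks], dtype=float)   # strings with k ones
    typical = np.abs(-log_prob / n - Hp) <= eps                     # condition (3.3)
    print(np.sum(counts[typical] * 2.0 ** log_prob[typical]))       # approaches 1 as n grows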

3.1.2 Schumacher's quantum noiseless channel coding theorem
