L(γ, D) ≥ L ’ .

Explain why

lγ (tj ) ’ lγ (tj’1 ) ¤

and deduce that, if tj’1 ¤ t ¤ t ¤ tj , then

lγ (t ) ’ lγ (t) ¤

for all 1 ¤ j ¤ n.

(ii) Use part (i) to show that lγ : [a, b] ’ R is continuous.

Exercise 9.5.9. [This a just a simple observation obscured by notation.]

We use the hypotheses and notation of Exercise 9.5.7. Let L be the length

of γ. Show that, if γ : [a, b] ’ Rm is injective, then setting θ = lγ we

have θ : [a, b] ’ [0, L] a bijective continuous map. Explain why this means

that θ’1 is continuous (see the proof of Lemma 5.6.7 if necessary). If we set

„ = γ —¦ θ’1 , show that

l„ (s) = s

for 0 ¤ s ¤ L. We say that the curve „ is the curve γ ˜reparameterised by

arc length™.

If we de¬ne Lγ : R ’ R by

for t ¤ a,

Lγ (t) = lγ (a)

Lγ (t) = lγ (t) for a < t < b,

for b ¤ t,

Lγ (t) = lγ (b)

then, since Lγ is an increasing function, we can de¬ne the Riemann-Stieltjes

integral

b

g(t) dLγ (t)

a

228 A COMPANION TO ANALYSIS

for any continuous function g : [a, b] ’ R. It follows that, if f : Rm ’ R is

continuous, we can de¬ne the ˜integral along the curve™ by

b

f (x) ds = f (γ(t)) dLγ (t).

a

γ

Note that, if the curve „ is the curve γ ˜reparameterised by arc length™ in

the sense of Example 9.5.9, then

L

f (x) ds = f („ (t)) dt.

0

„

We have de¬ned an integral along a curve, but we have not shown how

to calculate it. If γ is su¬ciently smooth, we can proceed as follows.

Exercise 9.5.10. Suppose x, y : [a, b] ’ R are continuous functions, >0

and A, B are real numbers such that

|(x(t) ’ x(s)) ’ A(t ’ s)|, |(y(t) ’ y(s)) ’ B(t ’ s)| ¤

for all t, s ∈ [a, b]. Show that, if γ : [a, b] ’ R2 is the curve given by

γ(t) = (x(t), y(t)), then

(min(|A| ’ , 0))2 + (min(|B| ’ , 0))2 (t ’ s)2 ¤ γ(t) ’ γ(s) 2

¤ (|A| + )2 + (|B| + )2 (t ’ s)2

for all t, s ∈ [a, b]. Deduce that γ is recti¬able and

1/2

(min(|A| ’ , 0))2 + (min(|B| ’ , 0))2 (b ’ a) ¤ length(γ)

1/2

¤ (|A| + )2 + (|B| + )2 (b ’ a).

Exercise 9.5.11. This exercise uses the ideas of the previous exercise to-

gether with the mean value inequality. Suppose that x, y : [a, b] ’ R

have continuous derivatives (with the usual conventions about left and right

derivatives at end points). Show that the curve γ : [a, b] ’ R2 given by

γ(t) = (x(t), y(t)) is recti¬able and, by considering the behaviour of

lγ (s) ’ lγ (t)

s’t

as s ’ t, show that lγ is everywhere di¬erentiable on [a, b] with

lγ (t) = (x (t)2 + y (t)2 )1/2 .

Explain why your proof does not apply to the counterexample in part (ii)

of Exercise 9.5.5.

229

Please send corrections however trivial to twk@dpmms.cam.ac.uk

Figure 9.1: Arc length via polygonal paths

Exercise 9.5.12. Use Exercise 9.4.10 to compute the lengths of the curves

γ 1 to γ 6 de¬ned on page 224.

Using Exercise 9.4.10, we see that, if x, y : [a, b] ’ R2 have continuous

derivatives (with the usual conventions about left and right derivatives at end

points), and we consider the curve γ : [a, b] ’ R2 given by γ(t) = (x(t), y(t)),

then, if f : R2 ’ R is continuous,

b

f (x(t), y(t))(x (t)2 + y (t))1/2 dt,

f (x) ds =

a

γ

a result coinciding with that we would expect from mathematical methods

courses.

Exercise 9.5.13. Extend this result to smooth curves γ : [a, b] ’ Rm , giving

as much (or as little) detail as seems to you desirable.

This seems highly satisfactory until we read more advanced texts than

this and discover that, instead of using the sophisticated general ideas of

this section, these advanced texts use a rigorised version of the mathematical

methods approach. Why do they do this?

There are various reasons, but one of the most important is that the

approaches developed here do not apply in higher dimensions. More specif-

ically, we obtained arc length by ˜approximating by polygonal paths™ as in

Figure 9.1.

In 1890, H. A. Schwarz10 published an example showing that any naive at-

tempt to ¬nd the area of a surface by ˜approximating by polyhedral surfaces™

must fail. The example provides a good exercise in simple three dimensional

geometry.

10

The Schwarz of the Cauchy-Schwarz inequality.

230 A COMPANION TO ANALYSIS

Figure 9.2: Part of Schwarz™s polyhedral approximation

Figure 9.3: Two more views of Schwarz

Exercise 9.5.14. (Schwarz™s counterexample.) Split a 1 by 2π rectangle

into mn rectangles each m’1 by 2πn’1 , and split each of these into four

triangles by means of the two diagonals. Bend the large rectangle into a

cylinder of height 1 and circumference 2π, and use the vertices of the 4mn

triangles as the vertices of an inscribed polyhedron with 4mn ¬‚at triangular

faces11 . We call the area of the resulting polyhedron A(m, n).

In Figure 9.2 we show one of the nm rectangles ABCD with diagonals

meeting at X before and after bending. In our discussion, we shall refer only

to the system after bending. Let W be the mid point of the arc AB, Y the

mid point of the chord AB and Z the mid point of the line BD as shown in

Figure 9.3.

By observing that W A = XZ, or otherwise, show that the area of the tri-

angle AXC is (2m)’1 sin(π/n). Show that Y W has length 2(sin(π/2n))2 and

deduce, or prove otherwise, that the triangle XAB has area sin(π/n)((2m) ’1 +

4(sin(π/2n))2 ). Conclude that

1/2

π π 2

2

A(m, n) = n sin 1 + 1 + 16m sin .

n 2n

11

The last two sentences are copied directly from Billingsley™s splendid Probability and

Measure [6], since I cannot see how to give a clearer description than his.

231

Please send corrections however trivial to twk@dpmms.cam.ac.uk

If we choose nj ’ ∞ and mj ’ ∞ in such a way that m2 /nj ’ » as

j

j ’ ∞, what happens to A(mj , nj )? Can you choose nj ’ ∞ and mj ’ ∞

so that A(mj , nj ) ’ ∞ as j ’ ∞? Can you choose nj ’ ∞ and mj ’ ∞

so that A(mj , nj ) is bounded but does not converge? Can you choose nj ’ ∞

and mj ’ ∞ so that A(mj , nj ) ’ π?

Exercise 9.5.15. Explain in words and without using calculations why, if

we ¬x n, A(m, n) ’ ∞ as m ’ ∞. (Unless you can give a simple ge-

ometric account of what is going on, you have not understood Schwarz™s

example.) Deduce directly that we can choose nj ’ ∞ and mj ’ ∞ such

that A(mj , nj ) ’ ∞ as j ’ ∞. [Thus, if we simply want to show that the

naive approach fails, we do not need the calculations of the previous exercise.

However, if we want more information (for example, necessary and su¬cient

conditions for A(mj , nj ) ’ 2π) then we must do the calculations.]

Once the reader has grasped the point of Exercise 9.5.15, she may feel

that it is obvious what is wrong. However, the problem is not that we cannot

see what the area of a cylinder ought to be, but that we cannot think of a

simple de¬nition of area which will apply to general surfaces in the same

way that our de¬nition of length applied to general curves. Up to now, all

attempts to produce a de¬nition of area to parallel our de¬nition of length

have failed12 .

If a surface is su¬ciently well behaved, it is more or less clear how to frame

a notion of approximation by polyhedra which excludes Schwarz™s example.

But, if a surface is su¬ciently well behaved, we can produce a rigorous de¬ni-

tion of its area by tidying up the standard mathematical methods de¬nition.

If we are going to do this for the area of a surface and its higher dimensional

analogues, it seems a waste of time to have a special theory for length. Life,

as the saying goes, is too short to stu¬ an olive. The next exercise gives the

standard development.

Exercise 9.5.16. (Standard treatment of line integrals.) We deal only

with curves γ : [a, b] ’ Rm which are continuously di¬erentiable (so that,

writing

γ(t) = (γ1 (t), γ2 (t), . . . , γm (t)),

we know that γj : [a, b] ’ R exists and is continuous). If f : Rm ’ R is

continuous, we de¬ne

b

f (γ1 (t), γ2 (t), . . . , γm (t))(γ1 (t)2 + γ2 (t)2 + · · · + γm (t)2 )1/2 dt.

f (x) ds =

a

γ

12

There are other ways of tackling this problem. Once again I refer the reader to the

circle of ideas associated with the name of Hausdor¬.

232 A COMPANION TO ANALYSIS

Suppose that θ : [c, d] ’ [a, b] is a surjective function with continuous

derivative. Explain why „ = γ —¦ θ is a continuously di¬erentiable function

from [c, d] to Rm . Show that, if θ is injective, then

f (x) ds = f (x) ds,

„ γ

but give an example to show that, if θ is not injective, this may not hold.

We de¬ne the length of γ to be 1 ds.

γ

Chapter 10

Metric spaces

Sphere packing ™

10.1

(In this section and the next we are much more interested in ideas than

rigour. We shall use methods and results which go well beyond the scope of

this book.)

Human beings prefer order to disorder. Asked to pack a crate of oranges,

we do not throw them in at random, but try to pack them in a regular

pattern. But choosing patterns requires insight and sometimes we have no

insight. Consider the problem of packing n dimensional balls in a very large

box. (We shall take our balls and boxes to be open, but it should be clear

that such details do not matter.)

As an example of a regular packing, let the balls have radius 1/2 and let

us adopt a cubical packing so that the centre of each ball is one of integer

points Zn . Is this reasonably e¬cient or not? To answer this question it is

helpful to know the volume Vn (r) of an n dimensional ball of radius r.

Lemma 10.1.1. (i) If we write Vn = Vn (1), then Vn (r) = Vn rn .

(ii) If a > 0 and I[0,a] : [0, ∞) ’ R is de¬ned by I[0,a] (t) = 1 if t ∈ [0, a],

I[0,a] (t) = 0, otherwise, then

∞

I[0,a] ( r ) dV = nVn I[0,a] (r)rn’1 dr.

Rn 0

k

(iii) If f : [0, ∞) ’ R is given by f = »j I[0,aj ] , then

j=1

∞

f (r)rn’1 dr.

f ( r ) dV = nVn

Rn 0

(iv) If f : [0, ∞) ’ R is such that f (r)r n+1 ’ 0 as r ’ ∞, then

∞

f (r)rn’1 dr.

f ( r ) dV = nVn

Rn 0

233

234 A COMPANION TO ANALYSIS

(v) Taking f (r) = exp(’r 2 /2) in (ii), we obtain

∞

n/2

rn’1 exp(’r2 /2) dr,

(2π) = nVn

0

and so

πn n!22n π n’1

V2n = , V2n’1 = .

n! (2n)!

Sketch proof. (i) Use similarity.

(ii) This is a restatement of (i).

(iii) Use linearity of the integral.

(iv) Use an approximation argument.

(v) Using repeated integration,

∞ ∞ ∞

exp(’(x2 + x2 + · · · + x2 )/2) dx1 dx2 . . . dxn

···

f ( r ) dV = 1 2 n

Rn ’∞ ’∞ ’∞

∞ ∞ ∞

exp(’x2 /2) dx1 exp(’x2 /2) dx2 · · · exp(’x2 /2) dxn = (2π)n/2 .

= 1 2 n

’∞ ’∞ ’∞

On the other hand, integration by parts gives

∞ ∞

n’1 2

rn’3 exp(’r2 /2) dr,

exp(’r /2) dr = (n ’ 2)

r

0 0

so

∞

n/2

rn’1 exp(’r2 /2) dr

(2π) = nVn

0

∞

rn’3 exp(’r2 /2) dr = (2π)(n’2)/2 nVn /Vn’2

= n(n ’ 2)Vn

0

and Vn = (2π)Vn’2 /n. The stated results follow by induction.

Remark: Since we are not seeking rigour in this section, I have only sketched

a proof. However, few mathematicians would demand more proof for this

result. As I emphasise in Appendix C, we need rigour when we make general

statements which must apply to objects we have not yet even imagined. Here,

we need a result about a speci¬c property of a speci¬c object (the volume of

an n-dimension sphere), so we can have much more con¬dence in our sketched

proof.

If we use a cubical packing, the proportion of space occupied by balls is

2 Vn (think about the ball center q radius 1/2 inside the cube n (qj ’

’n

j=1

235

Please send corrections however trivial to twk@dpmms.cam.ac.uk

1/2, qj ’ 1/2)) and Lemma 10.1.1 shows that this proportion drops very

rapidly indeed as n becomes large.

What happens if we ˜just place balls wherever we can™. Suppose we are

trying to ¬ll the cube

C = {x : ’N ’ 1/2 < xj < N + 1/2 [1 ¤ j ¤ n]}

with non-intersecting balls of radius 1/2, and that we have managed to place

m such balls

Sk = {x : x ’ yk < 1/2} [1 ¤ k ¤ m].

Now consider the balls with the same centres and doubled radius

σk = {x : x ’ yk < 1} [1 ¤ k ¤ m].

A little thought shows that, if y lies inside the cube

C = {x : ’N < xj < N [1 ¤ j ¤ n]}

and does not lie in any σk , then the ball

S = {x : x ’ y < 1/2}

lies in C and does not intersect any Sk , so we may add another ball to our

collection. Such a point y will always exist if

m

Vol C > Vol σk

k=1

m m

σk , and so C \ σk = …). Thus if

(since then Vol C > Vol k=1 k=1

Vol C > mVn

we can add extra balls and so, by just ¬lling up available spaces, without

following any pattern, we can ¬nd at least Vol C /Vn disjoint balls of radius

1/2 in C. Since the volume of a ball of radius 1/2 is 2’n Vn the proportion of

1

space occupied by balls is 2’n Vol C / Vol C = 2’n (1 + 2N )’n so, for N large,

the proportion of space occupied by balls is essentially 2’n , an astounding

gain in e¬ciency over cubical packing.

What morals should we draw? The most immediate is that we do not

know what an n-dimensional ball looks like! (Here, ˜we™ refers to the writer

and most of his audience. Specialists in topics like the geometry of Banach

spaces do have substantial insight.) The second is that, when we have little

insight, trying to impose order on things may well be counter productive.

236 A COMPANION TO ANALYSIS

Shannon™s theorem ™

10.2

It costs money to send messages. In the old days, telegrams might be charged

for by the word. Nowadays, we pay for communication channels which carry

so many bits per second (so we are charged by the bit). Suppose that I can

a¬ord to send three bits of information to my partner. Then we can make

up eight words 000, 001, . . . and use them to send eight di¬erent messages.

For example, 000 could mean ˜send money™, 001 ˜come at once™, 010 ˜sell all

shares™, 011 ˜¬‚ee the country™ and so on. Unfortunately, bits are sometimes

received wrongly, so 011 could be sent and 001 received with unfortunate

consequences. We might, therefore, decide to have only two messages 000

(˜all well™) and 111 (˜¬‚ee at once™) so that, if at most one digit was received

wrongly, my partner could still take the correct action.

Exercise 10.2.1. Explain why the last statement is true.

If I can a¬ord to send 100 bits, then we can still decide to only have two

messages 0000 . . . 0 and 1111 . . . 1 but, if mistakes are rare, this is extremely

wasteful. How many di¬erent messages can we send and still be reasonably

con¬dent that we can tell which message was transmitted even when errors

occur? This section gives a simple but very useful model for the problem

and solves it.

Consider X = {0, 1}n as a vector space over F2 1 . Each element x ∈

{0, 1}n may be considered as a ˜word™ so X contains 2n possible words. In

transmitting a word over a noisy channel it may become corrupted to

x+e

where each coordinate ej of the random error e takes the value 1 with prob-

ability p and the value 0 with probability 1 ’ p, independent of the other

coordinates [0 < p < 1/2].

Exercise 10.2.2. Why do we not have to consider p > 1/2? Why is it

hopeless to consider p = 1/2?

If we choose q such that 1/2 > q > p > 0, then the law of large numbers

tells us that, provided n is large enough, it is very unlikely that more than

qn of the j are non-zero2 We may, therefore, simplify our problem to one in

which we know that at most qn of the j are non-zero.

It is natural to introduce the following notion of the distance d(x, y)

between two words (the ˜Hamming distance™).

1

This means that when we add two vectors x and y the jth component of x + y has

the form xj + yj where 0 + 0 = 1 + 1 = 0 and 1 + 0 = 0 + 1 = 1.

2

We prove this result directly in Exercise K.173.

237

Please send corrections however trivial to twk@dpmms.cam.ac.uk

De¬nition 10.2.3. If x, y ∈ X, we write

n

|xi ’ yi |.

d(x, y) =

j=1

Suppose that we only transmit the words y1 , y2 , . . . ym and that the

balls

Sk = {x : d(x, yk ) < qn} [1 ¤ k ¤ m],

do not intersect. Since there are no more than qn errors, the received message

will lie in one of the balls Sj , say, and the transmitted message will have been

yj . Thus our system allows us to communicate m distinct messages correctly

in spite of the random noise e.

This sounds impressive, but is not very useful unless m is reasonably

large. How large can m be? This is clearly a variant of our orange packing

problem. The natural way to proceed is to de¬ne the volume of a subset E

of X to be the number of points in E. Reusing the ideas of our previous

argument, we consider the balls with the same centres and doubled radius

σk = {x : d(x, yk ) < 2qn} [1 ¤ k ¤ m].

If y does not lie in any σk , then the ball

S = {x : d(x, y) < qn}

does not intersect any Sk , so we may add another ball to our collection. Such

a point y will always exist if

m

Vol X > Vol σk ,

k=1

that is, if

m

2n > Vol σk .

k=1

The only problem that faces us, is to estimate the volume of a ball in this

context.

The key turns out to be a simple but, to many people, initially surprising

result.

238 A COMPANION TO ANALYSIS

Lemma 10.2.4. Suppose 0 < » < 1/2.

(i) If 1 ¤ r ¤ »n then

n » n

¤ .

r’1 1’» r

(ii) There exists a constant A(») such that a ball of radius »n in X has

n n

volume at most A(») and at least .

[»n] [»n]

Proof. (i) Observe that

r

n n r »

n

¤

= = .

1 ’ r’1

r’1 n+1’r 1’»

r n

(ii) Thus, if r» = [»n], the greatest integer less than »n, we have

n

Volume ball radius »n =

r

r¤»n

r» ’r

» n

¤

1’» r»

r¤r»

∞ m

» n

¤

1’» r»

m=0

1’» n

= .

1 ’ 2» [»n]

Taking A(») = (1 ’ »)/(1 ’ 2»), we have the required result.

Exercise 10.2.5. (i) If 1 > » > 1/2, ¬nd a good estimate for the volume of

a ball of radius »n in X when n is large.

(ii) Lemma 10.2.4 says, in e¬ect, that the volume of an appropriate ball in

X is concentrated near its ˜surface™ (that is those points whose distance from

the centre is close to the radius). To what extent is this true for ordinary

balls in Rm when m is large?

In part (i) of Lemma 10.2.6 we recall a simple form of Stirling™s formula

(obtained in Exercise 9.2.7 (iii)). In the rest of the lemma we use it to obtain

estimates of the volume V (», n) of a ball of radius »n in X = {0, 1}n when

n is large. We use the notation

log b

loga b =

log a

where a, b > 0. (The reader probably knows that log a b is called ˜the loga-

rithm of b to base a™. She should check the relation aloga b = b.)

239

Please send corrections however trivial to twk@dpmms.cam.ac.uk

Lemma 10.2.6. (i) loge N ! = N loge N + N + θ(N )N , where θ(N ) ’ 0 as

N ’ ∞.

(ii) If 0 < » < 1/2, then

n’1 loge V (», n) ’ ’» loge » ’ (1 ’ ») loge (1 ’ »)

as n ’ ∞.

(iii) If 0 < » < 1/2, then

n’1 log2 V (», n) ’ ’» log2 » ’ (1 ’ ») log2 (1 ’ »)

as n ’ ∞.

Proof. (i) This is Exercise 9.2.7 (iii).

(ii) In what follows we shall be replacing a real number y by an integer

m with |m ’ y| ¤ 1. Observe that, if f is well behaved, the mean value

inequality gives |f (y) ’ f (m)| ¤ sup|t’y|¤1 |f (t)|.

By Lemma 10.2.4, part (i) of the present lemma, and the remark just

made, there exist θ1 (n), θ2 (n) ’ 0, as n ’ ∞ such that

n

n’1 loge V (», n) = n’1 loge

[»n]

= n’1 loge n! ’ loge (n ’ [»n])! ’ loge [»n]!

= n’1 n loge n + n ’ (n ’ [»n]) loge (n ’ [»n]) ’ [»n] ’ [»n] loge [»n] + θ1 (n)

= n’1 n loge n + n ’ (n ’ »n) loge (n ’ »n) ’ »n ’ »n loge »n + θ2 (n)

= n’1 n loge n + n ’ (1 ’ »)n(loge (1 ’ ») + loge n) ’ »n(loge » + loge n) + θ2 (n)

= ’» loge » ’ (1 ’ ») loge (1 ’ ») + θ2 (n),

and this is the required result.

(iii) Just apply the de¬nition of log 2 .

Let us write H(0) = H(1) = 0 and

H(») = ’» log2 » ’ (1 ’ ») log2 (1 ’ »)

for 0 < » < 1.

Exercise 10.2.7. Prove the following results and then sketch the graph of

H.

(i) H(s) = H(1 ’ s) for all s ∈ [0, 1].

(ii) H : [0, 1] ’ R is continuous.

(iii) H is twice di¬erentiable on (0, 1) with H (t) < 0 for all t ∈ (0, 1).

(iv) 0 ¤ H(t) ¤ 1 for all t ∈ [0, 1]. Identify the maximum and minimum

points.

(v) H(t)/t ’ 1 as t ’ 0 through positive values.

240 A COMPANION TO ANALYSIS

Our discussion has shown that we can ¬nd 2n(1’H(2q)) code words all a

Hamming distance at least q n apart. We can immediately interpret this

result in terms of our original problem.

Lemma 10.2.8. Consider our noisy channel. If p, the probability of error

in a single bit, is less than 1/4, then, choosing p < p < 1/4, we can transmit,

with error rate as small as we please, so that information is passed at a rate

of 1 ’ H(2p ) times that of our original channel.

Shannon pushed the argument a little bit further. Let · > 0 be small

and let N be such that

N — Vol ball radius qn ≈ · Vol X.

Choose N points yj at random in X and let them be code words. There is

no reason to expect that the balls

Sk = {x : d(x, yk ) < qn} [1 ¤ k ¤ N ],

do not intersect (and excellent reasons for supposing that they will).

Exercise 10.2.9. (This requires some experience with probability.) Write

Yij = 1 if Si and Sj intersect [i = j], Yij = 0, otherwise. What is EYij ?

What is E 1¤j<i¤N Yij ? If q is small show that E 1¤j<i¤N Yij is large.

What does this tell you about the number of intersecting balls?

If the balls intersect, then, even if less than qn errors are made, in trans-

mission, the received message may belong to two or more balls. In such a

case we simply agree that our system has failed. How probable is such a

failure? Suppose we transmit y1 and receive z = y1 + e. Since the remaining

yj with 2 ¤ j ¤ N , have been chosen at random, independently of yj , the

probability that z lies in N Sj has nothing to do with how we de¬ned z

j=2

and is just

Vol( N Sj ) N

Vol Sj

j=2 j=2

¤ ≈ ·.

Vol X Vol X

Thus the probability of failure of the type discussed in this paragraph can

be kept at any speci¬ed level · whilst still allowing roughly · Vol S1 / Vol X

code words.

Exercise 10.2.10. How can you reconcile this last result with Exercise 10.2.9?

We have thus obtained the full Shannon™s theorem.

241

Please send corrections however trivial to twk@dpmms.cam.ac.uk

Theorem 10.2.11. (Shannon.) Consider our noisy channel. If p, the

probability of error in a single bit, is less than 1/2, then, choosing p < p <

1/2, we can transmit, with error rate as small as we please, so that informa-

tion is passed at a rate of 1 ’ H(p ) times that of our original channel.

Exercise 10.2.12. (For the knowledgeable only.) The statement ˜We have

thus obtained the full Shannon™s theorem™ is a bit optimistic since there are

quite a lot of loose ends to tie up. Tie them up.

In the 50 or so years since Shannon proved his theorem, no one has

produced non-random codes to match his random codes for long code words.

However, non-random methods are beginning to catch up. For our ¬rst

problem of sphere packing in Rn , the best known packings in high dimensions

are lattice (so very highly ordered) packings. The experts believe that this

re¬‚ects our ignorance rather than the truth.

10.3 Metric spaces

We discovered our proof of Shannon™s theorem (at least in the form of

Lemma 10.2.8) by drawing heavily on analogies with distance and volume in

Rn . Many proofs in analysis and elsewhere have been discovered by exploit-

ing analogies between the usual distance in Rn and ˜things that look more

or less like distance™.

Recalling that ˜once is a trick, twice is a method, thrice a theorem and

four times a theory™, we seek to codify this insight.

Our ¬rst try is to model the properties of Euclidean distance set out in

Lemma 4.1.4.

De¬nition 10.3.1. Let V be a real vector space and N : V ’ R a function.

We write x = N (x) and say that is a norm if the following conditions

hold.

(i) x ≥ 0 for all x ∈ V ,

(ii) If x = 0, then x = 0,

(iii) If » ∈ R and x ∈ V , then »x = |»| x .

(iv) (The triangle inequality) If x, y ∈ V , then x + y ¤ x + y .

·

Note that some mathematicians prefer to write instead of . The

˜·™ acts as a placeholder.

Exercise 10.3.2. Check that the usual Euclidean norm on Rm is indeed a

norm.

242 A COMPANION TO ANALYSIS

The appropriate de¬nition when V is a vector space over C, rather than

R, runs similarly.

De¬nition 10.3.3. Let V be a complex vector space and N : V ’ R a

function. We write x = N (x) and say that is a norm if the following

conditions hold.

(i) x ≥ 0 for all x ∈ V ,

(ii) If x = 0, then x = 0,

(iii) If » ∈ C and x ∈ V , then »x = |»| x .

(iv) (The triangle inequality) If x, y ∈ V , then x + y ¤ x + y .

We say that (V, ) is a normed space.

However, we wish to have a notion of distance which applies to spaces

which do not have a vector space structure. We seek a de¬nition modelled

on those properties of Euclidean distance which do not refer to vector space

structures. (Thus you should both compare and contrast De¬nition 10.3.1.)

De¬nition 10.3.4. We say that (X, d) is a metric space if X is a set and

d : X 2 ’ R is a function with the following properties:-

(i) d(x, y) = 0 if and only if x = y.

(ii) d(x, y) = d(y, x) for all x, y ∈ X.

(iii) (The triangle inequality) d(x, z) ¤ d(x, y)+d(y, z) for all x, y, z ∈ X.

Exercise 10.3.5. Show that, if (X, d) is a metric space, then d(x, y) ≥ 0 for

all x, y ∈ X.

It is a remarkable fact that the general notion of a metric space was ¬rst

introduced by Fr´chet in 1906 and the general notion of a normed space by

e

Banach in 1932! (We shall see one reason for the late emergence of the notion

of a norm in Theorem 10.4.6.)

Exercise 10.3.6. If is a norm over the (real or complex) vector space

V and we set d(x, y) = x ’ y , show that (V, d) is a metric space.

The conditions of De¬nition 10.3.4 are sometimes called the axioms for

metric space. However, they are not axioms in the same sense as the Funda-

mental Axiom of Analysis which asserts a fundamental principle of argument

which is to be accepted without further proof3 . Instead they try to isolate the

common properties of an interesting class of mathematical objects. (Com-

pare the de¬nition of birds as ˜oviparous, warm-blooded, amniotic vertebrates

which have their anterior extremities transformed into wings. Metacarpus

3

Within the context of a particular system. The axioms of one system may be theorems

in a another. We shall discuss this further in Section 14.3.

243

Please send corrections however trivial to twk@dpmms.cam.ac.uk

and ¬ngers carry feathers or quills. There is an intertarsal joint and not

more than 4 toes, of which the ¬rst is a hallux.™ Thus penguins are and

dodos were birds but dragon¬‚ies and bats are not.)

At ¬rst sight, axiom systems like that for metric systems seem merely

methods for economising on the number of theorems we have to prove. A

theorem which applies to all metric spaces will not need to be proved for

each individually. However, a successful axiom system should be a powerful

research tool by suggesting appropriate questions. (You will learn more about

a penguin by comparing it to a hawk than by comparing it to a frog.) Thus

if we have a ˜space on which we do analysis™ we can ask ˜can we make it into

a metric space?™. If it is a metric space we can then compare and contrast

it with other metric spaces that we know and this may well suggest new

theorems.

Although concepts like metric space are intended to apply to important

natural systems, we often study ˜toy examples™ in order to gain insight into

what can happen in such a system. Here is such a ˜toy example™.

Exercise 10.3.7. We work in Rm with the usual Euclidean norm. Show that

if d(x, y) = x + y when x = y and d(x, x) = 0, then (R m , d) is a metric

space.

This metric is called the British Railway non-stop metric. To get from A

to B in the fastest time, we travel via London (the origin). Here is another

version of the same idea which we call the British Railway stopping metric.

Exercise 10.3.8. We work in Rm with the usual Euclidean norm. If there

exists a real » such that »x = y, we write d(x, y) = x ’ y . Otherwise, we

set d(x, y) = x + y . Show that (Rm , d) is a metric space.

Much of the ˜mere algebra™ which we did for the Euclidean metric car-

ries over with hardly any change to the general metric case. Compare the

following de¬nition with De¬nition 4.1.8.

De¬nition 10.3.9. Let (X, d) be a metric space. If an ∈ X for each n ≥ 1

and a ∈ X, then we say that an ’ a if, given > 0, we can ¬nd an n0 ( )

such that

for all n ≥ n0 ( ).

d(an , a) <

We call a the limit of an .

Exercise 10.3.10. (i) Let d1 be the British Railway non-stop metric on Rm

de¬ned in Exercise 10.3.7. Show that xn ’ x in (Rm , d1 ) if and only if

244 A COMPANION TO ANALYSIS

(A) there exists an N such that xn = x for all n ≥ N , or

(B) x = 0 and xn ’ 0 as n ’ ∞.

(ii) Find and prove the corresponding result for the British Railway stop-

ping metric of Exercise 10.3.8.

If (V, ) is a normed space, we say that xn ’ x in (V, ) if xn ’ x in

the derived metric. The following result on limits in such spaces is an easy

generalisation of Lemma 4.1.9.

Lemma 10.3.11. Let V be a real vector space and a norm on V .

(i) The limit is unique. That is, if an ’ a and an ’ b as n ’ ∞, then

a = b.

(ii) If an ’ a as n ’ ∞ and n(1) < n(2) < n(3) . . . , then an(j) ’ a as

j ’ ∞.

(iii) If an = c for all n, then an ’ c as n ’ ∞.

(iv) If an ’ a and bn ’ b as n ’ ∞, then an + bn ’ a + b.

(v) Suppose that an ∈ V , a ∈ V , »n ∈ R, and » ∈ R. If an ’ a and

»n ’ », then »n an ’ »a.

Proof. Left to the reader. The reader who looks for the proof of Lemma 4.1.9

will ¬nd herself referred backwards to the proof of Lemma 1.2.2, but will ¬nd

the proofs for a general norm on a general vector space just as easy as the

proofs for the Euclidean norm in one-dimension.

Exercise 10.3.12. What changes are necessary (if any) in the statements

and proofs of Lemma 10.3.11 if we make V be a complex vector space?

If we consider a general metric, then we have no algebraic structure and

the result corresponding to Lemma 10.3.11 has far fewer parts.

Lemma 10.3.13. Let (X, d) be a metric space.

(i) The limit is unique. That is, if an ’ a and an ’ b as n ’ ∞, then

a = b.

(ii) If an ’ a as n ’ ∞ and n(1) < n(2) < n(3) . . . , then an(j) ’ a as

j ’ ∞.

(iii) If an = c for all n, then an ’ c as n ’ ∞.

The proof is again left to the reader.

The material on open and closed sets from Section 4.2 goes through es-

sentially unchanged (and proofs are therefore left to the reader).

De¬nition 10.3.14. Let (X, d) be a metric space. A set F ⊆ X is closed if

whenever xn ∈ F for each n and xn ’ x as n ’ ∞ then x ∈ F .

A set U ⊆ X is open if, whenever x ∈ U , there exists an > 0 such that,

whenever d(x, y) < , we have y ∈ U .

245

Please send corrections however trivial to twk@dpmms.cam.ac.uk

Example 10.3.15. Let (X, d) be a metric space. Let x ∈ X and r > 0.

(i) The set B(x, r) = {y ∈ X : d(x, y) < r} is open.

¯

(ii) The set B(x, r) = {y ∈ X : d(x, y) ¤ r} is closed.

¯

We call B(x, r) the open ball of radius r and centre x. We call B(x, r)

the closed ball of radius r and centre x.

Lemma 10.3.16. Let (X, d) be a metric space. A subset U of X is open if

and only if each point of U is the centre of an open ball lying entirely within

U.

Exercise 10.3.17. Find the open balls for the two British rail metrics. (For

all but one point there is a di¬erence between balls of small and large ra-

dius.) Show that the open sets for the British Railway stopping metric of

Exercise 10.3.8 consist of sets of the form A ∪ B where A is empty or

A = {x : x < δ} (here is the Euclidean norm and δ > 0) and B

is the union of sets of the form {»u : a < » < b} where 0 < a < b and

u = 0.

Find the open sets for the British Railway non-stop metric of Exercise 10.3.7.

De¬nition 10.3.18. The set N is a neighbourhood of the point x if we can

¬nd an r > 0 such that B(x, r) ⊆ N .

Lemma 10.3.19. Let (X, d) be a metric space. A subset U of X is open if

and only if its complement X \ U is closed.

Lemma 10.3.20. Let (X, d) be a metric space. Consider the collection „ of

open sets in X.

(i) … ∈ „ , X ∈ „ .

(ii) If U± ∈ „ for all ± ∈ A, then ±∈A U± ∈ „ .

(iii) If U1 , U2 , . . . , Un ∈ „ , then n Uj ∈ „ .

j=1

Lemma 10.3.21. Let (X, d) be a metric space. Consider the collection F

of closed sets in X.

(i) … ∈ F, X ∈ F.

(ii) If F± ∈ F for all ± ∈ A, then ±∈A F± ∈ F.

(iii) If F1 , F2 , . . . , Fn ∈ F, then n Fj ∈ F.

j=1

The new de¬nition of continuity only breaks the chain of translations

slightly because it involves two metric spaces.

De¬nition 10.3.22. Let (X, d) and (Z, ρ) be metric spaces. We say that a

function f : X ’ Z is continuous at some point x ∈ X if, given > 0, we

can ¬nd a δ( , x) > 0 such that, if y ∈ X and d(x, y) < δ( , x), we have

ρ(f (x), f (y)) < .

246 A COMPANION TO ANALYSIS

If f is continuous at every point x ∈ X, we say that f is a continuous

function on X.

The reader may feel that De¬nition 4.2.14 is more general than De¬ni-

tion 10.3.22 because it involves a set E. The following remark shows that

this is not so.

Lemma 10.3.23. Let (X, d) be a metric space and let E ⊆ X. Let dE :

E 2 ’ R be given by dE (u, v) = d(u, v) whenever u, v ∈ E. Then (E, dE ) is a

metric space.

We conclude this section with more results taken directly from Section 4.2

Lemma 10.3.24. Let (X, d) and (Z, ρ) be metric spaces and suppose that

the function f : X ’ Z is continuous at x ∈ X. Then, if xn ∈ E and

xn ’ x, it follows that f (xn ) ’ f (x).

Lemma 10.3.25. Let (X, d) and (Z, ρ) be metric spaces. The function f :

X ’ Z is continuous if and only if f ’1 (O) is open whenever O is open.

Lemma 10.3.26. Let (X, d), (Y, θ) and (Z, ρ) be metric spaces. If f : X ’

Y and g : Y ’ Z are continuous, then so is their composition g —¦ f .

10.4 Norms and the interaction of algebra

and analysis

Starting from one metric we can produce a wide variety of ˜essentially equiv-

alent metrics™.

Exercise 10.4.1. Let (X, d) be a metric space.

(i) If we de¬ne d1 (x, y) = min(1, d(x, y)), show that (X, d1 ) is a metric

space. Show further that, if xn ∈ X [n ≥ 0], then d1 (xn , x0 ) ’ 0 as n ’ ∞

if and only if d(xn , x0 ) ’ 0.

(ii) If we de¬ne d2 (x, y) = d(x, y)2 , show that (X, d2 ) is a metric space.

Show further that, if xn ∈ X [n ≥ 0], then d2 (xn , x0 ) ’ 0 as n ’ ∞ if and

only if d(xn , x0 ) ’ 0.

(iii) Let ± > 0 and de¬ne d± (x, y) = d(x, y)± . For which values of ± is

(X, d± ) always a metric space? For those values of ±, is it always true that,

if xn ∈ X [n ≥ 0], then d± (xn , x) ’ 0 as n ’ ∞ if and only if d(xn , x) ’ 0?

Prove your statements.

(iv) (This is trivial but explains why we had to be careful in the statement

of (iii).) Give an example of metric space (X, d) with X an in¬nite set such

that (X, d± ) is a metric space for all ± > 0.

247

Please send corrections however trivial to twk@dpmms.cam.ac.uk

Because norms re¬‚ect an underlying vector space structure they are much

more constrained.

Lemma 10.4.2. Let 1 and 2 be norms on a vector space V . Then the

following two statements are equivalent.

(a) There exist K, L > 0 such that K x 1 ≥ x 2 ≥ L x 1 for all

x∈V.

(b) If xn ∈ V [n ≥ 0], then xn ’ x0 1 ’ 0 as n ’ ∞ if and only if

xn ’ x0 2 ’ 0.

Proof. It is easy to see that (a) implies (b), since, for example, if xn ’x0 ’

2

0, then

¤ L’1 xn ’ x0

xn ’ x0 ’ 0,

1 2

and so xn ’ x0 1 ’ 0 as n ’ ∞.

To see that (b) implies (a), suppose that (a) is false. Without loss of

generality, suppose that there is no K > 0 such that K x 1 ≥ x 2 for all

x ∈ V . Then we can ¬nd yn ∈ V such that

> n2 y n 1 .

yn 2

’1

Setting x0 = 0 and xn = n’1 yn for n ≥ 1, we obtain

1 yn

xn ’ x0 = 1/n ’ 0

= xn

1 1

and

xn ’ x0 = xn > n,

2 2

so xn ’ x0 0 as n ’ 0. Thus (b) is false if (a) is false

2

Exercise 10.4.3. Let d1 and d2 be metrics on a space X. Consider the

following two statements.

(a) There exist K, L > 0 such that Kd1 (x, y) ≥ d2 (x, y) ≥ Ld1 (x, y) for

all x, y ∈ X.

(b) If xn ∈ X [n ≥ 0] then d1 (xn , x0 ) ’ 0 as n ’ ∞ if and only if

d2 (xn , x0 ) ’ 0.

Show that (a) implies (b) but that (b) does not imply (a).

Although we shall not make much use of this, the concepts just introduced

have speci¬c names.

248 A COMPANION TO ANALYSIS

De¬nition 10.4.4. (i) Let K ≥ 0. If (X, d) and (Y, ρ) are metric spaces, we

say that a function f : X ’ Y is Lipschitz with constant K if ρ(f (x), f (y)) ¤

Kd(x, y) for all x, y ∈ X.

(ii) We say that two metrics d1 and d2 on a space X are Lipschitz equiv-

alent if there exist K, L > 0 such that Kd1 (x, y) ≥ d2 (x, y) ≥ Ld1 (x, y) for

all x, y ∈ X.

Exercise 10.4.5. (i) If (X, d) and (Y, ρ) are metric spaces, show that any

Lipschitz function f : X ’ Y is continuous. Give an example to show that

the converse is false.

(ii) Show that Lipschitz equivalence is indeed an equivalence relation be-

tween metric spaces.

The following result is a less trivial example of the interaction between

algebraic and metric structure.

Theorem 10.4.6. All norms on Rn are Lipschitz equivalent.

Proof. We consider the standard Euclidean norm given by

1/2

n

x2

x= j

j=1

and show that it is equivalent to an arbitrary norm —.

One of the required inequalities is easy to obtain. If we write ei for the

unit vector along the xi axis we have,

n n

¤ |xj | ej

x = xj ej

— —

j=1 j=1

—

¤ n max |xr | max er ¤ (n max er — ) x .

—

1¤r¤n 1¤r¤n 1¤r¤n

We have thus shown the existence of a K such that K x ≥ x — .

To obtain the other inequality we use an argument from analysis4 . Ob-

serve that, if we give Rn and R the usual Euclidean metric and de¬ne

f : Rn ’ R by

f (x) = x — ,

then, using the triangle inequality for and the result of the previous

—

paragraph,

|f (x) ’ f (y)| = | x ’ y —| ¤ x ’ y ¤K x’y ,

— —

4

Experts may be interested in Exercises K.184 and K.185. Non-experts should ignore

this footnote.

249

Please send corrections however trivial to twk@dpmms.cam.ac.uk

and so f is continuous. Now the unit sphere

S(0, 1) = {x ∈ Rn : x = 1}

is closed and bounded for the usual Euclidean metric and so, by Theo-

rem 4.3.4, f is bounded on S(0, 1). In particular, we can ¬nd k ∈ S(0, 1)

where f attains its minimum, that is to say,

f (k) ¤ f (x)

for all x with x = 1.

Since k = 1, we have k = 0 and thus k > 0. Let us set L = k — .

—

If x = 0, then x ’1 x ∈ S(0, 1) so

’1 ’1 ’1

x) ≥ f (k) = k

x x = x x = f( x = L,

— — —

≥ L x . The case x = 0 is trivial, so we have established

and so x —

Kx ≥ x ≥L x

—

for all x ∈ Rn as required.

Exercise 10.4.7. (i) Let aj ∈ R for 1 ¤ j ¤ n and write

n

aj |xj |.

x =

w

j=1

State and prove necessary and su¬cient conditions for w to be a norm on

R.n

be the usual Euclidean norm on Rn . Show that, if n ≥ 2, then

(ii) Let

given any K we can ¬nd a norm — and points y1 and y2 such that

y1 > K y1 and y2 > K y2 — .

—

Does this result remain true if n = 1? Give reasons.

The fact that all norms on a ¬nite dimensional space are, essentially,

the same meant that, in Chapters 4 and 6 and elsewhere, we did not really

need to worry about which norm we used. By the same token, it obscured

the importance of the appropriate choice of norm in approaching problems in

analysis. However, once we consider in¬nite dimensional spaces, the situation

is entirely di¬erent.

250 A COMPANION TO ANALYSIS

Exercise 10.4.8. Consider s00 the space of real sequences a = (an )∞ such

n=1

that all but ¬nitely many of the an are zero.

(i) Show that if we use the natural de¬nitions of addition and scalar

multiplication

(an ) + (bn ) = (an + bn ), »(an ) = (»an )

then s00 is a vector space.

(ii) Show that the following de¬nitions all give norms on s00 .

= max |an |,

a ∞

n≥1

= max |nan |,

a w

n≥1

∞

|an |,

a =

1

n=1

1/2

∞

|an |2

a = ,

2

n=1

∞

n|an |.

a =

u

n=1

(iii) For each of the twenty ¬ve possible pairs of norms A and B from

part (ii), establish whether or not there exists a K such that K a A ≥ a B

for all a ∈ s00 . Show that none of the norms are Lipschitz equivalent.

(iv) Find a family of norms ± on s00 [± > 0] such that ± and β

are not Lipschitz equivalent if ± = β.

When we study an algebraic structure we also study those maps which

preserve algebraic structure. Thus, if we have a map θ : G ’ H be-

tween groups, we want θ to preserve group multiplication, that is, we want

θ(xy) = θ(x)θ(y). Such maps are called homomorphisms. Similarly, when

we study an analytic structure, we also study those maps which preserve an-

alytic structure. Consider for example a map f between metric spaces which

preserves ˜sequences tending to limits™. We then have

xn ’ x0 implies f (xn ) ’ f (x0 )

and study continuous functions.

If we study objects which have linked analytic and algebraic structures,

we will thus wish to study maps between them which preserve both algebraic

and analytic structures. For normed vector spaces this means that we wish

to study continuous linear maps. This fact is obscured by the fact that linear

maps between ¬nite dimensional vector spaces are automatically continuous.

251

Please send corrections however trivial to twk@dpmms.cam.ac.uk

Exercise 10.4.9. Let U and V be ¬nite dimensional vector spaces with norms

V . Show that any linear map T : U ’ V is automatically con-

U and

tinuous.

However, as we shall see in Exercise 10.4.14, when we deal with in¬nitely

dimensional vector spaces, not all linear maps need be continuous.

Here is a simple but important characterisation of those linear maps which

are continuous.

Lemma 10.4.10. Let (U, U ) and (V, V ) be normed vector spaces.

Then a linear map T : U ’ V is continuous if and only if there exists a K

such that T u V ¤ K u U for all u ∈ U .

Exercise 10.4.11. Prove Lemma 10.4.10. [One possible proof follows that

of Lemma 10.4.2 very closely.]

Exercise 10.4.12. By observing that the space of polynomials of degree n

or less has ¬nite dimension, or otherwise, prove the following result. There

exists a constant Cn such that

sup |P (t)| ¤ Cn sup |P (t)|

t∈[0,1] t∈[0,1]

for all real polynomials P of degree n or less.

State, with proof, whether we can ¬nd a C independent of n such that

sup |P (t)| ¤ C sup |P (t)|

t∈[0,1] t∈[0,1]

for all real polynomials P .

State, with proof, whether we can ¬nd a constant An such that of n such

that

sup |P (t)| ¤ An sup |P (t)|

t∈[0,1] t∈[0,1]

for all real polynomials P of degree n or less.

The close link between Lemma 10.4.10 and Lemma 10.4.2 is highlighted

in the next exercise.

Exercise 10.4.13. Let 1 and 2 be two norms on a vector space U .

(i) Show that the following statements are equivalent.

(a) If xn ’ x0 1 ’ 0, then xn ’ x0 2 ’ 0.

1 ) ’ (U,

(b) The identity map I : (U, 2 ) from U with norm 1

to U with norm 2 is continuous.

252 A COMPANION TO ANALYSIS

(c) There exists a K such that K u 1 ≥ u 2 for all u ∈ U .

(ii) Write down, and prove equivalent, three similar statements (a) , (b)

and (c) where (c) is the statement

(c) There exist K and L with K > L > 0 such that K u 1 ≥ u 2 ≥

L u 1 for all u ∈ U .

Here are some examples of continuous and discontinuous linear maps5 .

Exercise 10.4.14. Consider the vector space s00 de¬ned in Exercise 10.4.8

and the norms

= max |an |,

a ∞

n≥1

= max |nan |,

a w

n≥1

∞

|an |,

a =

1

n=1

1/2

∞

|an |2

a = ,

2

n=1

∞

n|an |.

a =

u

n=1

given there.

(i) For each of the twenty ¬ve possible pairs of norms A and B

A) ’

listed above state, with reasons, whether the identity map I : (s00 ,

(s00 , B ) from s00 with norm A to s00 with norm B is continuous.

(ii) Show that the map T : s00 ’ R de¬ned by T a = ∞ aj is linear.

j=1

If we give R the usual Euclidean norm and s00 one of the ¬ve norms listed

above, state, with reasons, whether T is continuous.

(iii) Show that the map S : s00 ’ s00 de¬ned by Sa = b with bj = jaj

is linear. For each of the twenty ¬ve possible pairs of norms A and B

listed above, state, with reasons, whether S, considered as a map from s 00

with norm A to s00 with norm B , is continuous.

Once we have Lemma 10.4.10, the way is open to an extension of the idea

of an operator norm investigated in Section 6.2.

5

If the reader returns to this example after studying the chapter on completeness,

she may note that the normed spaces given here are not complete. The question of the

existence of discontinuous linear maps between complete normed vector spaces involves

more advanced ideas, notably the axiom of choice. However, the consensus is that it

is unreasonable to expect all linear maps between complete normed vector spaces to be

continuous, in the same way, and for much the same reasons, as it is unreasonable to

expect all sets in R3 to have volume (see page 172).

253

Please send corrections however trivial to twk@dpmms.cam.ac.uk

De¬nition 10.4.15. Let U and V be vector spaces with norms and

U

V . If ± : U ’ V is a continuous linear map, then we set

± = sup ±x .

V

U ¤1

x

Exercise 10.4.16. (i) Let U = V = s00 and let a U = a V = ∞ |an |. n=1

’1

Show that if T : U ’ V is de¬ned by T a = b with bj = (1 ’ j )aj then T is

a continuous linear map. However, there does not exist an a ∈ U with a = 0

such that T a V = T a U . [See also Exercise 11.1.16.]

(ii) If U and V are ¬nite dimensional normed vector spaces and T : U ’

V is linear can we always ¬nd an a ∈ U with a = 0 such that T a V =

T a U ? Give reasons.

In exactly the way we proved Lemma 6.2.6, we can prove the following

results.

Exercise 10.4.17. Let U and V be vector spaces with norms U and V

and let ±, β : U ’ V be continuous linear maps.

(i) If x ∈ U , then ±x V ¤ ± x U .

(ii) ± ≥ 0.

(iv) If ± = 0, then ± = 0.

(v) If » ∈ R, then »± is a continuous linear map and »± = |»| ± .

(vi) (The triangle inequality) ± + β is a continuous linear map and

±+β ¤ ± + β .

W and γ : V ’ W is a continuous

(vii) If W is vector spaces with norm

linear map, then γ± is a continuous linear map and γ± ¤ γ ± .

We restate part of Exercise 10.4.17 in the language of this chapter.

Lemma 10.4.18. Let U and V be vector spaces with norms U and V.

The space L(U, V ) of continuous linear maps is a vector space and the oper-

ator norm is a norm on L(U, V ).

Although we shall not develop the theme further, we note that we now

have an appropriate de¬nition for di¬erentiation of functions between general

normed vector spaces.

De¬nition 10.4.19. Let U and V be vector spaces with norms U and

V . Suppose that E is a subset of of U and x a point such that there exists

a δ > 0 with B(x, δ) ⊆ E. We say that f : E ’ V is di¬erentiable at x, if

we can ¬nd a continuous linear map ± : U ’ V such that, when h ∈ B(x, δ),

f (x + h) = f (x) + ±h + (x, h) h U

254 A COMPANION TO ANALYSIS

where (x, h) V ’ 0 as h U ’ 0. We write ± = Df (x) or ± = f (x).

If E is open and f is di¬erentiable at each point of E, we say that f is

di¬erentiable on E.

Notice that the only important change from De¬nition 6.1.4 is that we

speci¬cally demand that ± = Df (x) is a continuous linear function.

The best way to understand what is going on is probably to do the next

exercise.

Exercise 10.4.20. State and prove the appropriate extension of the chain

rule given as Lemma 6.2.10.

There is no di¬culty in extending all our work on di¬erentiation from

¬nite dimensional to general normed spaces. The details are set out in

Dieudonn´™s book ([13], Chapter VIII).

e

Exercise 10.4.21. The title of this section was ˜Norms and the interaction

of algebra and analysis™ but much of it was really pure algebra in the sense

that it did not use the fundamental axiom of analysis. Go back through the

section and note which results are genuine results of analysis and which are

just algebra disguised as analysis.

Geodesics ™

10.5

This section introduces an idea which is important in more advanced work

but which will not be used elsewhere in this book. The discussion will be

informal.

Suppose we wish to build a road joining two points of the plane. The cost

per unit length of road will depend on the nature of the terrain. If the cost

of building a short stretch of length δs near a point (x, y) is (to ¬rst order)

g(x, y)δs then, subject to everything being well behaved, the cost of road “

will be

g(x, y) ds.

“

The reader may use whatever de¬nition of line integral she is comfortable

with. In particular she does not need to have read Section 9.5. It is tempting

to de¬ne the distance d(A, B) between two points A and B in the plane as

d(A, B) = inf g(x, y) ds : “ joining A and B .

“

If we include amongst our conditions that g is continuous and g(x, y) > 0

everywhere, we see that d is a metric.

255

Please send corrections however trivial to twk@dpmms.cam.ac.uk

Plausible statement 10.5.1. (i) d(A, B) ≥ 0 for all A and B.

(ii) d(A, B) = 0 implies A = B.

(iii) d(A, B) = d(B, A) for all A and B.

(iv) d(A, C) ¤ d(A, B) + d(B, C) for all A, B and C.

I have carefully used ˜inf™ rather than ˜min™ in the de¬nition of d(A, B).

One problem is that I have not speci¬ed the kind of “ that is permissible.

(Notice that the argument needed to obtain part (iv) of Statement 10.5.1

means that we cannot just use ˜smooth™.) However, it can be shown that, if

we choose a suitable class for the possible “ and a suitably well behaved g,

then the minimum is attained. If “0 is a path from A to B such that

g(x, y) ds = d(A, B),

“0

we call “0 a geodesic. (The geodesic need not be unique, consider road

building for two towns at diametrically opposite points of a circular marsh.)

If any two points are joined by a geodesic, then d(A, B) is the ˜length of the

geodesic path joining A and B™ where length refers to the metric d and not

to the Euclidean metric.

Let us try to use these ideas to ¬nd a metric on the upper half-plane

H = {z ∈ C : z > 0}

which is invariant under M¨bius transformation. (If you have not met M¨bius

o o

transformations skip to after Exercise 10.5.4, where I restate the problem.)

More precisely, we want a metric d such that, if T is a M¨bius map mapping H

o

bijectively to itself, then d(T z1 , T z2 ) = d(z1 , z2 ) for all z1 , z2 ∈ H. Of course,

no such metric might exist, but we shall see where the question leads.

Lemma 10.5.2. The set H of M¨bius maps T such that T |H : H ’ H is

o

bijective is a subgroup of the group M of all M¨bius transformations. The

o

subgroup H is generated by the transformations Ta with a ∈ R, D» with »

real and » > 0 and J, where

Ta (z) =a + z

D» (z) =»z

J(z) = ’ z ’1

Sketch proof. The fact that H is a subgroup of M follows from easy general

principles (see Exercise 10.5.3). We check directly that (if a is real and » > 0)

Ta , D» and J lie in H. Thus the composition of elements of the stated type

will also lie in H.

256 A COMPANION TO ANALYSIS

We now wish to identify H. If T ∈ H, then T is a well behaved bijective

map on C— = C ∪ {∞} and must take the boundary of H to the boundary

of H. Thus, writing R for the real axis, we know that T takes R ∪ {∞}

to R ∪ {∞}. Thus, if T (∞) = ∞, then T (∞) = a with a real and we set

M1 = JT’a . If T (∞) = ∞ we take M1 to be the identity map. In either case

M1 T ∈ H and M1 T (∞) = ∞. We now observe that M1 T (0) ∈ R ∪ {∞},

by our previous argument, and that M1 T (0) = ∞, since M¨bius maps are

o

bijections. Thus M1 T (0) = b with b real. Set M2 = T’b . We have M2 M1 T ∈

H, M2 M1 T (0) = 0 and M2 M1 T (∞) = ∞. But any M¨bius map M which

o

¬xes 0 and ∞ must have the form M (z) = µz for some µ = 0. If M takes H

to H, then µ is real and positive. Thus M2 M1 T = D» for some » > 0 and

’1 ’1

T = M1 M2 D» . We have shown that any element H is the composition of

maps of the stated type and the lemma follows.

(The remark on page 234 applies here too. We know so much about M¨bius

o

transformations that the argument above is more a calculation than a proof.

However, for more complicated conformal maps than are generally found in

undergraduate courses, arguments involving ˜boundaries™ may be mislead-

ing.)

Exercise 10.5.3. (Most readers will already know the content of this exer-

cise.) Let X be a set and S(X) the collection of bijections of X. Show that

S(X) is a group under the operation of composition. If G is a subgroup of

S(X), Y a subset of X and we de¬ne

K = {f ∈ G : f (Y ) = Y },

show that K is a subgroup of G.

Exercise 10.5.4. If |a| < 1 and Ma is the M¨bius map Ma given by

o

z’a

Ma z = ,

a— z ’ 1

show that |Ma eiθ | = 1 for all real θ. Deduce, stating any properties of M¨bius

o

transforms that you need, that Ma maps the unit disc D = {z : |z| < 1} to

itself and interchanges 0 and a. Show that the set H of M¨bius maps T such

o

that T |H : D ’ D is bijective is a subgroup of the group M of all M¨bius o

transformations generated by the transformations Ma with |a| < 1 and the

rotations.

Lemma 10.5.2 reduces our problem to one of ¬nding a metric d on the half

plane H = {z : z > 0} such that d(T z1 , T z2 ) = d(z1 , z2 ) for all z1 , z2 ∈ H,

257

Please send corrections however trivial to twk@dpmms.cam.ac.uk

whenever T is one of the transformations Ta with a ∈ R, D» with » > 0 and

J de¬ned by

Ta (z) =a + z

D» (z) =»z

J(z) = ’ z ’1 .

We try using the ideas of this section and de¬ning

d(z1 , z2 ) = inf g(z) ds : “ joining z1 and z2 ,

“

for some appropriate strictly positive function g : H ’ R. (Throughout, we

write z = x + iy and identify the plane R2 with C in the usual manner.)

Arguing informally, this suggests that, to ¬rst order,

d(z + δz, z) = g(z)δs = g(z)|(z + δz) ’ z|.

If T is one of the transformations we are considering, then we must have, to

¬rst order,

g(z)|(z + δz) ’ z| = d(z + δz, z) = d(T (z + δz), T z)

= g(T z)|T (z + δz) ’ T (z)| = g(T z)|T (z)||δz|,

and so

g(z) = g(T z)|T (z)|.

Taking T = Ta , we obtain g(z) = g(z + a) for all real a so g(x + iy) = h(y)

for some well behaved function h : (0, ∞) ’ (0, ∞).

Taking T = D» (z), we obtain g(z) = »g(»z) for all real strictly positive ».

Thus h(y) = »h(»y) for all », y > 0. Taking » = 1/y, we obtain h(y) = Ay

for some A > 0. The factor A merely scales everything, so we take A = 1

and decide to experiment with g(x + iy) = 1/y.

Exercise 10.5.5. Verify that, if g(x + iy) = 1/y and T = J, then equation

holds.

Exercise 10.5.6. Suppose that γ : [0, 1] ’ H is well behaved and T ∈ H.

˜

Let γ = T —¦ γ (that is, γ (z) = T (γ(z))). If “ is the path described by γ and “

˜ ˜

is the path described by γ show, using whatever de¬nition of the line integral

˜

that you wish, that

inf g(z) ds = inf g(z) ds

˜

“ “

for T = Ta , T = D» and T = J. Conclude that the equality holds for all

T ∈ H.

258 A COMPANION TO ANALYSIS

Exercise 10.5.6 shows that, if we set

1

d(z1 , z2 ) = inf ds : “ joining z1 and z2 ,

y

“

then we do, indeed, get an invariant metric (that is, a metric with d(T z 1 , T z2 ) =

d(z1 , z2 ) for all z1 , z2 ∈ H whenever T ∈ H).

To ¬nd out more about this metric we need to ¬nd the geodesics and to do

this we use the methods of the calculus of variation described in Section 8.4.

We shall use the methods in a purely exploratory manner with no attempt

at rigour. (Notice that, even if we did things rigorously, we would only

get necessary conditions.) Suppose that we wish to ¬nd the path “0 which

1

minimises “ y ds among all paths “ from z1 = x1 + iy1 to z2 = x2 + iy2 with

x1 < x2 and y1 , y2 > 0. It seems reasonable to look for a path given by

z = x + iy(x) where y : [x1 , x2 ] ’ (0, ∞) is well behaved. We thus wish to

minimise

x2 x2

1

(1 + y (x)2 )1/2 dx = G(y(x), y (x)) dx

y(x)

x1 x1

where G(v, w) = v ’1 (1 + w2 )1/2 . The Euler-Lagrange equation gives, via

Exercise 8.4.9 (i),

G(y(x), y (x)) ’ y (x)G,2 (y(x), y (x)) = c,

where c is a constant. Writing the equation more explicitly, we get

(1 + y 2 )1/2 y2

’ = c,

y(1 + y 2 )1/2

y

so that

2

1 = cy(1 + y )1/2 .

Setting a = c’1 , we obtain

2

a2 = y 2 (1 + y )

so that, solving formally,

y dy

= dx

(a2 ’ y 2 )1/2

and

’(a2 ’ y 2 )1/2 = x ’ b,

259

Please send corrections however trivial to twk@dpmms.cam.ac.uk

for some constant b ∈ R. Thus

(x ’ b)2 + y 2 = a2 .

We expect the geodesic to be the arc of a circle with its centre on the real

axis passing through the two points z1 and z2 .

Exercise 10.5.7. Suppose (x1 , y1 ), (x2 , y2 ) ∈ R2 and x1 = x2 . Show that

there is one and only one circle with its centre on the real axis passing through

the two points (x1 , y1 ) and (x2 , y2 ).

What happens if z1 = z2 ? One way of guessing is to consider the

geodesic path between z1 and z2 + δ where δ is real and non-zero. If we let

δ ’ 0, the appropriate circle arcs approach a straight line joining z1 and z2

so we would expect this to be the solution.

Exercise 10.5.8. Attack the geodesic problem by considering paths given by

z = x(y) + iy where y : [y1 , y2 ] ’ R is well behaved. (The di¬culties are

mainly notational, the formulae of the variational calculus assume that y is

a function of x and it requires a certain amount of clear headedness to deal

with the case when x is a function of y. You should be able to make use

of Exercise 8.4.9 (ii).) Check, by choosing appropriate constants, that your

solutions include straight lines perpendicular to the x-axis.

Now that we know (or at least guess) what the geodesics are we can see

(at least if we know a little about M¨bius maps) a di¬erent way of showing

o

this.

Exercise 10.5.9. We work in C— = C ∪ {∞}.