<<

. 8
( 19)



>>


L(γ, D) ≥ L ’ .

Explain why

lγ (tj ) ’ lγ (tj’1 ) ¤

and deduce that, if tj’1 ¤ t ¤ t ¤ tj , then

lγ (t ) ’ lγ (t) ¤

for all 1 ¤ j ¤ n.
(ii) Use part (i) to show that lγ : [a, b] ’ R is continuous.

Exercise 9.5.9. [This a just a simple observation obscured by notation.]
We use the hypotheses and notation of Exercise 9.5.7. Let L be the length
of γ. Show that, if γ : [a, b] ’ Rm is injective, then setting θ = lγ we
have θ : [a, b] ’ [0, L] a bijective continuous map. Explain why this means
that θ’1 is continuous (see the proof of Lemma 5.6.7 if necessary). If we set
„ = γ —¦ θ’1 , show that

l„ (s) = s

for 0 ¤ s ¤ L. We say that the curve „ is the curve γ ˜reparameterised by
arc length™.

If we de¬ne Lγ : R ’ R by

for t ¤ a,
Lγ (t) = lγ (a)
Lγ (t) = lγ (t) for a < t < b,
for b ¤ t,
Lγ (t) = lγ (b)

then, since Lγ is an increasing function, we can de¬ne the Riemann-Stieltjes
integral
b
g(t) dLγ (t)
a
228 A COMPANION TO ANALYSIS

for any continuous function g : [a, b] ’ R. It follows that, if f : Rm ’ R is
continuous, we can de¬ne the ˜integral along the curve™ by
b
f (x) ds = f (γ(t)) dLγ (t).
a
γ

Note that, if the curve „ is the curve γ ˜reparameterised by arc length™ in
the sense of Example 9.5.9, then
L
f (x) ds = f („ (t)) dt.
0


We have de¬ned an integral along a curve, but we have not shown how
to calculate it. If γ is su¬ciently smooth, we can proceed as follows.
Exercise 9.5.10. Suppose x, y : [a, b] ’ R are continuous functions, >0
and A, B are real numbers such that
|(x(t) ’ x(s)) ’ A(t ’ s)|, |(y(t) ’ y(s)) ’ B(t ’ s)| ¤
for all t, s ∈ [a, b]. Show that, if γ : [a, b] ’ R2 is the curve given by
γ(t) = (x(t), y(t)), then
(min(|A| ’ , 0))2 + (min(|B| ’ , 0))2 (t ’ s)2 ¤ γ(t) ’ γ(s) 2

¤ (|A| + )2 + (|B| + )2 (t ’ s)2
for all t, s ∈ [a, b]. Deduce that γ is recti¬able and
1/2
(min(|A| ’ , 0))2 + (min(|B| ’ , 0))2 (b ’ a) ¤ length(γ)
1/2
¤ (|A| + )2 + (|B| + )2 (b ’ a).
Exercise 9.5.11. This exercise uses the ideas of the previous exercise to-
gether with the mean value inequality. Suppose that x, y : [a, b] ’ R
have continuous derivatives (with the usual conventions about left and right
derivatives at end points). Show that the curve γ : [a, b] ’ R2 given by
γ(t) = (x(t), y(t)) is recti¬able and, by considering the behaviour of
lγ (s) ’ lγ (t)
s’t
as s ’ t, show that lγ is everywhere di¬erentiable on [a, b] with
lγ (t) = (x (t)2 + y (t)2 )1/2 .
Explain why your proof does not apply to the counterexample in part (ii)
of Exercise 9.5.5.
229
Please send corrections however trivial to twk@dpmms.cam.ac.uk




Figure 9.1: Arc length via polygonal paths

Exercise 9.5.12. Use Exercise 9.4.10 to compute the lengths of the curves
γ 1 to γ 6 de¬ned on page 224.

Using Exercise 9.4.10, we see that, if x, y : [a, b] ’ R2 have continuous
derivatives (with the usual conventions about left and right derivatives at end
points), and we consider the curve γ : [a, b] ’ R2 given by γ(t) = (x(t), y(t)),
then, if f : R2 ’ R is continuous,
b
f (x(t), y(t))(x (t)2 + y (t))1/2 dt,
f (x) ds =
a
γ


a result coinciding with that we would expect from mathematical methods
courses.

Exercise 9.5.13. Extend this result to smooth curves γ : [a, b] ’ Rm , giving
as much (or as little) detail as seems to you desirable.

This seems highly satisfactory until we read more advanced texts than
this and discover that, instead of using the sophisticated general ideas of
this section, these advanced texts use a rigorised version of the mathematical
methods approach. Why do they do this?
There are various reasons, but one of the most important is that the
approaches developed here do not apply in higher dimensions. More specif-
ically, we obtained arc length by ˜approximating by polygonal paths™ as in
Figure 9.1.
In 1890, H. A. Schwarz10 published an example showing that any naive at-
tempt to ¬nd the area of a surface by ˜approximating by polyhedral surfaces™
must fail. The example provides a good exercise in simple three dimensional
geometry.
10
The Schwarz of the Cauchy-Schwarz inequality.
230 A COMPANION TO ANALYSIS




Figure 9.2: Part of Schwarz™s polyhedral approximation




Figure 9.3: Two more views of Schwarz

Exercise 9.5.14. (Schwarz™s counterexample.) Split a 1 by 2π rectangle
into mn rectangles each m’1 by 2πn’1 , and split each of these into four
triangles by means of the two diagonals. Bend the large rectangle into a
cylinder of height 1 and circumference 2π, and use the vertices of the 4mn
triangles as the vertices of an inscribed polyhedron with 4mn ¬‚at triangular
faces11 . We call the area of the resulting polyhedron A(m, n).
In Figure 9.2 we show one of the nm rectangles ABCD with diagonals
meeting at X before and after bending. In our discussion, we shall refer only
to the system after bending. Let W be the mid point of the arc AB, Y the
mid point of the chord AB and Z the mid point of the line BD as shown in
Figure 9.3.
By observing that W A = XZ, or otherwise, show that the area of the tri-
angle AXC is (2m)’1 sin(π/n). Show that Y W has length 2(sin(π/2n))2 and
deduce, or prove otherwise, that the triangle XAB has area sin(π/n)((2m) ’1 +
4(sin(π/2n))2 ). Conclude that
1/2
π π 2
2
A(m, n) = n sin 1 + 1 + 16m sin .
n 2n
11
The last two sentences are copied directly from Billingsley™s splendid Probability and
Measure [6], since I cannot see how to give a clearer description than his.
231
Please send corrections however trivial to twk@dpmms.cam.ac.uk

If we choose nj ’ ∞ and mj ’ ∞ in such a way that m2 /nj ’ » as
j
j ’ ∞, what happens to A(mj , nj )? Can you choose nj ’ ∞ and mj ’ ∞
so that A(mj , nj ) ’ ∞ as j ’ ∞? Can you choose nj ’ ∞ and mj ’ ∞
so that A(mj , nj ) is bounded but does not converge? Can you choose nj ’ ∞
and mj ’ ∞ so that A(mj , nj ) ’ π?
Exercise 9.5.15. Explain in words and without using calculations why, if
we ¬x n, A(m, n) ’ ∞ as m ’ ∞. (Unless you can give a simple ge-
ometric account of what is going on, you have not understood Schwarz™s
example.) Deduce directly that we can choose nj ’ ∞ and mj ’ ∞ such
that A(mj , nj ) ’ ∞ as j ’ ∞. [Thus, if we simply want to show that the
naive approach fails, we do not need the calculations of the previous exercise.
However, if we want more information (for example, necessary and su¬cient
conditions for A(mj , nj ) ’ 2π) then we must do the calculations.]
Once the reader has grasped the point of Exercise 9.5.15, she may feel
that it is obvious what is wrong. However, the problem is not that we cannot
see what the area of a cylinder ought to be, but that we cannot think of a
simple de¬nition of area which will apply to general surfaces in the same
way that our de¬nition of length applied to general curves. Up to now, all
attempts to produce a de¬nition of area to parallel our de¬nition of length
have failed12 .
If a surface is su¬ciently well behaved, it is more or less clear how to frame
a notion of approximation by polyhedra which excludes Schwarz™s example.
But, if a surface is su¬ciently well behaved, we can produce a rigorous de¬ni-
tion of its area by tidying up the standard mathematical methods de¬nition.
If we are going to do this for the area of a surface and its higher dimensional
analogues, it seems a waste of time to have a special theory for length. Life,
as the saying goes, is too short to stu¬ an olive. The next exercise gives the
standard development.
Exercise 9.5.16. (Standard treatment of line integrals.) We deal only
with curves γ : [a, b] ’ Rm which are continuously di¬erentiable (so that,
writing
γ(t) = (γ1 (t), γ2 (t), . . . , γm (t)),
we know that γj : [a, b] ’ R exists and is continuous). If f : Rm ’ R is
continuous, we de¬ne
b
f (γ1 (t), γ2 (t), . . . , γm (t))(γ1 (t)2 + γ2 (t)2 + · · · + γm (t)2 )1/2 dt.
f (x) ds =
a
γ
12
There are other ways of tackling this problem. Once again I refer the reader to the
circle of ideas associated with the name of Hausdor¬.
232 A COMPANION TO ANALYSIS



Suppose that θ : [c, d] ’ [a, b] is a surjective function with continuous
derivative. Explain why „ = γ —¦ θ is a continuously di¬erentiable function
from [c, d] to Rm . Show that, if θ is injective, then

f (x) ds = f (x) ds,
„ γ

but give an example to show that, if θ is not injective, this may not hold.
We de¬ne the length of γ to be 1 ds.
γ
Chapter 10

Metric spaces

Sphere packing ™
10.1
(In this section and the next we are much more interested in ideas than
rigour. We shall use methods and results which go well beyond the scope of
this book.)
Human beings prefer order to disorder. Asked to pack a crate of oranges,
we do not throw them in at random, but try to pack them in a regular
pattern. But choosing patterns requires insight and sometimes we have no
insight. Consider the problem of packing n dimensional balls in a very large
box. (We shall take our balls and boxes to be open, but it should be clear
that such details do not matter.)
As an example of a regular packing, let the balls have radius 1/2 and let
us adopt a cubical packing so that the centre of each ball is one of integer
points Zn . Is this reasonably e¬cient or not? To answer this question it is
helpful to know the volume Vn (r) of an n dimensional ball of radius r.
Lemma 10.1.1. (i) If we write Vn = Vn (1), then Vn (r) = Vn rn .
(ii) If a > 0 and I[0,a] : [0, ∞) ’ R is de¬ned by I[0,a] (t) = 1 if t ∈ [0, a],
I[0,a] (t) = 0, otherwise, then

I[0,a] ( r ) dV = nVn I[0,a] (r)rn’1 dr.
Rn 0
k
(iii) If f : [0, ∞) ’ R is given by f = »j I[0,aj ] , then
j=1

f (r)rn’1 dr.
f ( r ) dV = nVn
Rn 0
(iv) If f : [0, ∞) ’ R is such that f (r)r n+1 ’ 0 as r ’ ∞, then

f (r)rn’1 dr.
f ( r ) dV = nVn
Rn 0

233
234 A COMPANION TO ANALYSIS

(v) Taking f (r) = exp(’r 2 /2) in (ii), we obtain

n/2
rn’1 exp(’r2 /2) dr,
(2π) = nVn
0

and so
πn n!22n π n’1
V2n = , V2n’1 = .
n! (2n)!

Sketch proof. (i) Use similarity.
(ii) This is a restatement of (i).
(iii) Use linearity of the integral.
(iv) Use an approximation argument.
(v) Using repeated integration,
∞ ∞ ∞
exp(’(x2 + x2 + · · · + x2 )/2) dx1 dx2 . . . dxn
···
f ( r ) dV = 1 2 n
Rn ’∞ ’∞ ’∞
∞ ∞ ∞
exp(’x2 /2) dx1 exp(’x2 /2) dx2 · · · exp(’x2 /2) dxn = (2π)n/2 .
= 1 2 n
’∞ ’∞ ’∞

On the other hand, integration by parts gives
∞ ∞
n’1 2
rn’3 exp(’r2 /2) dr,
exp(’r /2) dr = (n ’ 2)
r
0 0

so

n/2
rn’1 exp(’r2 /2) dr
(2π) = nVn
0

rn’3 exp(’r2 /2) dr = (2π)(n’2)/2 nVn /Vn’2
= n(n ’ 2)Vn
0

and Vn = (2π)Vn’2 /n. The stated results follow by induction.
Remark: Since we are not seeking rigour in this section, I have only sketched
a proof. However, few mathematicians would demand more proof for this
result. As I emphasise in Appendix C, we need rigour when we make general
statements which must apply to objects we have not yet even imagined. Here,
we need a result about a speci¬c property of a speci¬c object (the volume of
an n-dimension sphere), so we can have much more con¬dence in our sketched
proof.
If we use a cubical packing, the proportion of space occupied by balls is
2 Vn (think about the ball center q radius 1/2 inside the cube n (qj ’
’n
j=1
235
Please send corrections however trivial to twk@dpmms.cam.ac.uk

1/2, qj ’ 1/2)) and Lemma 10.1.1 shows that this proportion drops very
rapidly indeed as n becomes large.
What happens if we ˜just place balls wherever we can™. Suppose we are
trying to ¬ll the cube

C = {x : ’N ’ 1/2 < xj < N + 1/2 [1 ¤ j ¤ n]}

with non-intersecting balls of radius 1/2, and that we have managed to place
m such balls

Sk = {x : x ’ yk < 1/2} [1 ¤ k ¤ m].

Now consider the balls with the same centres and doubled radius

σk = {x : x ’ yk < 1} [1 ¤ k ¤ m].

A little thought shows that, if y lies inside the cube

C = {x : ’N < xj < N [1 ¤ j ¤ n]}

and does not lie in any σk , then the ball

S = {x : x ’ y < 1/2}

lies in C and does not intersect any Sk , so we may add another ball to our
collection. Such a point y will always exist if
m
Vol C > Vol σk
k=1
m m
σk , and so C \ σk = …). Thus if
(since then Vol C > Vol k=1 k=1

Vol C > mVn

we can add extra balls and so, by just ¬lling up available spaces, without
following any pattern, we can ¬nd at least Vol C /Vn disjoint balls of radius
1/2 in C. Since the volume of a ball of radius 1/2 is 2’n Vn the proportion of
1
space occupied by balls is 2’n Vol C / Vol C = 2’n (1 + 2N )’n so, for N large,
the proportion of space occupied by balls is essentially 2’n , an astounding
gain in e¬ciency over cubical packing.
What morals should we draw? The most immediate is that we do not
know what an n-dimensional ball looks like! (Here, ˜we™ refers to the writer
and most of his audience. Specialists in topics like the geometry of Banach
spaces do have substantial insight.) The second is that, when we have little
insight, trying to impose order on things may well be counter productive.
236 A COMPANION TO ANALYSIS

Shannon™s theorem ™
10.2
It costs money to send messages. In the old days, telegrams might be charged
for by the word. Nowadays, we pay for communication channels which carry
so many bits per second (so we are charged by the bit). Suppose that I can
a¬ord to send three bits of information to my partner. Then we can make
up eight words 000, 001, . . . and use them to send eight di¬erent messages.
For example, 000 could mean ˜send money™, 001 ˜come at once™, 010 ˜sell all
shares™, 011 ˜¬‚ee the country™ and so on. Unfortunately, bits are sometimes
received wrongly, so 011 could be sent and 001 received with unfortunate
consequences. We might, therefore, decide to have only two messages 000
(˜all well™) and 111 (˜¬‚ee at once™) so that, if at most one digit was received
wrongly, my partner could still take the correct action.
Exercise 10.2.1. Explain why the last statement is true.
If I can a¬ord to send 100 bits, then we can still decide to only have two
messages 0000 . . . 0 and 1111 . . . 1 but, if mistakes are rare, this is extremely
wasteful. How many di¬erent messages can we send and still be reasonably
con¬dent that we can tell which message was transmitted even when errors
occur? This section gives a simple but very useful model for the problem
and solves it.
Consider X = {0, 1}n as a vector space over F2 1 . Each element x ∈
{0, 1}n may be considered as a ˜word™ so X contains 2n possible words. In
transmitting a word over a noisy channel it may become corrupted to
x+e
where each coordinate ej of the random error e takes the value 1 with prob-
ability p and the value 0 with probability 1 ’ p, independent of the other
coordinates [0 < p < 1/2].
Exercise 10.2.2. Why do we not have to consider p > 1/2? Why is it
hopeless to consider p = 1/2?
If we choose q such that 1/2 > q > p > 0, then the law of large numbers
tells us that, provided n is large enough, it is very unlikely that more than
qn of the j are non-zero2 We may, therefore, simplify our problem to one in
which we know that at most qn of the j are non-zero.
It is natural to introduce the following notion of the distance d(x, y)
between two words (the ˜Hamming distance™).
1
This means that when we add two vectors x and y the jth component of x + y has
the form xj + yj where 0 + 0 = 1 + 1 = 0 and 1 + 0 = 0 + 1 = 1.
2
We prove this result directly in Exercise K.173.
237
Please send corrections however trivial to twk@dpmms.cam.ac.uk

De¬nition 10.2.3. If x, y ∈ X, we write
n
|xi ’ yi |.
d(x, y) =
j=1


Suppose that we only transmit the words y1 , y2 , . . . ym and that the
balls

Sk = {x : d(x, yk ) < qn} [1 ¤ k ¤ m],

do not intersect. Since there are no more than qn errors, the received message
will lie in one of the balls Sj , say, and the transmitted message will have been
yj . Thus our system allows us to communicate m distinct messages correctly
in spite of the random noise e.
This sounds impressive, but is not very useful unless m is reasonably
large. How large can m be? This is clearly a variant of our orange packing
problem. The natural way to proceed is to de¬ne the volume of a subset E
of X to be the number of points in E. Reusing the ideas of our previous
argument, we consider the balls with the same centres and doubled radius

σk = {x : d(x, yk ) < 2qn} [1 ¤ k ¤ m].

If y does not lie in any σk , then the ball

S = {x : d(x, y) < qn}

does not intersect any Sk , so we may add another ball to our collection. Such
a point y will always exist if
m
Vol X > Vol σk ,
k=1

that is, if
m
2n > Vol σk .
k=1

The only problem that faces us, is to estimate the volume of a ball in this
context.
The key turns out to be a simple but, to many people, initially surprising
result.
238 A COMPANION TO ANALYSIS

Lemma 10.2.4. Suppose 0 < » < 1/2.
(i) If 1 ¤ r ¤ »n then
n » n
¤ .
r’1 1’» r
(ii) There exists a constant A(») such that a ball of radius »n in X has
n n
volume at most A(») and at least .
[»n] [»n]
Proof. (i) Observe that
r
n n r »
n
¤
= = .
1 ’ r’1
r’1 n+1’r 1’»
r n

(ii) Thus, if r» = [»n], the greatest integer less than »n, we have
n
Volume ball radius »n =
r
r¤»n
r» ’r
» n
¤
1’» r»
r¤r»
∞ m
» n
¤
1’» r»
m=0
1’» n
= .
1 ’ 2» [»n]
Taking A(») = (1 ’ »)/(1 ’ 2»), we have the required result.
Exercise 10.2.5. (i) If 1 > » > 1/2, ¬nd a good estimate for the volume of
a ball of radius »n in X when n is large.
(ii) Lemma 10.2.4 says, in e¬ect, that the volume of an appropriate ball in
X is concentrated near its ˜surface™ (that is those points whose distance from
the centre is close to the radius). To what extent is this true for ordinary
balls in Rm when m is large?
In part (i) of Lemma 10.2.6 we recall a simple form of Stirling™s formula
(obtained in Exercise 9.2.7 (iii)). In the rest of the lemma we use it to obtain
estimates of the volume V (», n) of a ball of radius »n in X = {0, 1}n when
n is large. We use the notation
log b
loga b =
log a
where a, b > 0. (The reader probably knows that log a b is called ˜the loga-
rithm of b to base a™. She should check the relation aloga b = b.)
239
Please send corrections however trivial to twk@dpmms.cam.ac.uk

Lemma 10.2.6. (i) loge N ! = N loge N + N + θ(N )N , where θ(N ) ’ 0 as
N ’ ∞.
(ii) If 0 < » < 1/2, then
n’1 loge V (», n) ’ ’» loge » ’ (1 ’ ») loge (1 ’ »)
as n ’ ∞.
(iii) If 0 < » < 1/2, then
n’1 log2 V (», n) ’ ’» log2 » ’ (1 ’ ») log2 (1 ’ »)
as n ’ ∞.
Proof. (i) This is Exercise 9.2.7 (iii).
(ii) In what follows we shall be replacing a real number y by an integer
m with |m ’ y| ¤ 1. Observe that, if f is well behaved, the mean value
inequality gives |f (y) ’ f (m)| ¤ sup|t’y|¤1 |f (t)|.
By Lemma 10.2.4, part (i) of the present lemma, and the remark just
made, there exist θ1 (n), θ2 (n) ’ 0, as n ’ ∞ such that
n
n’1 loge V (», n) = n’1 loge
[»n]
= n’1 loge n! ’ loge (n ’ [»n])! ’ loge [»n]!
= n’1 n loge n + n ’ (n ’ [»n]) loge (n ’ [»n]) ’ [»n] ’ [»n] loge [»n] + θ1 (n)
= n’1 n loge n + n ’ (n ’ »n) loge (n ’ »n) ’ »n ’ »n loge »n + θ2 (n)
= n’1 n loge n + n ’ (1 ’ »)n(loge (1 ’ ») + loge n) ’ »n(loge » + loge n) + θ2 (n)
= ’» loge » ’ (1 ’ ») loge (1 ’ ») + θ2 (n),
and this is the required result.
(iii) Just apply the de¬nition of log 2 .
Let us write H(0) = H(1) = 0 and
H(») = ’» log2 » ’ (1 ’ ») log2 (1 ’ »)
for 0 < » < 1.
Exercise 10.2.7. Prove the following results and then sketch the graph of
H.
(i) H(s) = H(1 ’ s) for all s ∈ [0, 1].
(ii) H : [0, 1] ’ R is continuous.
(iii) H is twice di¬erentiable on (0, 1) with H (t) < 0 for all t ∈ (0, 1).
(iv) 0 ¤ H(t) ¤ 1 for all t ∈ [0, 1]. Identify the maximum and minimum
points.
(v) H(t)/t ’ 1 as t ’ 0 through positive values.
240 A COMPANION TO ANALYSIS

Our discussion has shown that we can ¬nd 2n(1’H(2q)) code words all a
Hamming distance at least q n apart. We can immediately interpret this
result in terms of our original problem.

Lemma 10.2.8. Consider our noisy channel. If p, the probability of error
in a single bit, is less than 1/4, then, choosing p < p < 1/4, we can transmit,
with error rate as small as we please, so that information is passed at a rate
of 1 ’ H(2p ) times that of our original channel.

Shannon pushed the argument a little bit further. Let · > 0 be small
and let N be such that

N — Vol ball radius qn ≈ · Vol X.

Choose N points yj at random in X and let them be code words. There is
no reason to expect that the balls

Sk = {x : d(x, yk ) < qn} [1 ¤ k ¤ N ],

do not intersect (and excellent reasons for supposing that they will).

Exercise 10.2.9. (This requires some experience with probability.) Write
Yij = 1 if Si and Sj intersect [i = j], Yij = 0, otherwise. What is EYij ?
What is E 1¤j<i¤N Yij ? If q is small show that E 1¤j<i¤N Yij is large.
What does this tell you about the number of intersecting balls?

If the balls intersect, then, even if less than qn errors are made, in trans-
mission, the received message may belong to two or more balls. In such a
case we simply agree that our system has failed. How probable is such a
failure? Suppose we transmit y1 and receive z = y1 + e. Since the remaining
yj with 2 ¤ j ¤ N , have been chosen at random, independently of yj , the
probability that z lies in N Sj has nothing to do with how we de¬ned z
j=2
and is just

Vol( N Sj ) N
Vol Sj
j=2 j=2
¤ ≈ ·.
Vol X Vol X
Thus the probability of failure of the type discussed in this paragraph can
be kept at any speci¬ed level · whilst still allowing roughly · Vol S1 / Vol X
code words.

Exercise 10.2.10. How can you reconcile this last result with Exercise 10.2.9?

We have thus obtained the full Shannon™s theorem.
241
Please send corrections however trivial to twk@dpmms.cam.ac.uk

Theorem 10.2.11. (Shannon.) Consider our noisy channel. If p, the
probability of error in a single bit, is less than 1/2, then, choosing p < p <
1/2, we can transmit, with error rate as small as we please, so that informa-
tion is passed at a rate of 1 ’ H(p ) times that of our original channel.

Exercise 10.2.12. (For the knowledgeable only.) The statement ˜We have
thus obtained the full Shannon™s theorem™ is a bit optimistic since there are
quite a lot of loose ends to tie up. Tie them up.

In the 50 or so years since Shannon proved his theorem, no one has
produced non-random codes to match his random codes for long code words.
However, non-random methods are beginning to catch up. For our ¬rst
problem of sphere packing in Rn , the best known packings in high dimensions
are lattice (so very highly ordered) packings. The experts believe that this
re¬‚ects our ignorance rather than the truth.


10.3 Metric spaces
We discovered our proof of Shannon™s theorem (at least in the form of
Lemma 10.2.8) by drawing heavily on analogies with distance and volume in
Rn . Many proofs in analysis and elsewhere have been discovered by exploit-
ing analogies between the usual distance in Rn and ˜things that look more
or less like distance™.
Recalling that ˜once is a trick, twice is a method, thrice a theorem and
four times a theory™, we seek to codify this insight.
Our ¬rst try is to model the properties of Euclidean distance set out in
Lemma 4.1.4.

De¬nition 10.3.1. Let V be a real vector space and N : V ’ R a function.
We write x = N (x) and say that is a norm if the following conditions
hold.
(i) x ≥ 0 for all x ∈ V ,
(ii) If x = 0, then x = 0,
(iii) If » ∈ R and x ∈ V , then »x = |»| x .
(iv) (The triangle inequality) If x, y ∈ V , then x + y ¤ x + y .

·
Note that some mathematicians prefer to write instead of . The
˜·™ acts as a placeholder.

Exercise 10.3.2. Check that the usual Euclidean norm on Rm is indeed a
norm.
242 A COMPANION TO ANALYSIS

The appropriate de¬nition when V is a vector space over C, rather than
R, runs similarly.

De¬nition 10.3.3. Let V be a complex vector space and N : V ’ R a
function. We write x = N (x) and say that is a norm if the following
conditions hold.
(i) x ≥ 0 for all x ∈ V ,
(ii) If x = 0, then x = 0,
(iii) If » ∈ C and x ∈ V , then »x = |»| x .
(iv) (The triangle inequality) If x, y ∈ V , then x + y ¤ x + y .
We say that (V, ) is a normed space.

However, we wish to have a notion of distance which applies to spaces
which do not have a vector space structure. We seek a de¬nition modelled
on those properties of Euclidean distance which do not refer to vector space
structures. (Thus you should both compare and contrast De¬nition 10.3.1.)

De¬nition 10.3.4. We say that (X, d) is a metric space if X is a set and
d : X 2 ’ R is a function with the following properties:-
(i) d(x, y) = 0 if and only if x = y.
(ii) d(x, y) = d(y, x) for all x, y ∈ X.
(iii) (The triangle inequality) d(x, z) ¤ d(x, y)+d(y, z) for all x, y, z ∈ X.

Exercise 10.3.5. Show that, if (X, d) is a metric space, then d(x, y) ≥ 0 for
all x, y ∈ X.

It is a remarkable fact that the general notion of a metric space was ¬rst
introduced by Fr´chet in 1906 and the general notion of a normed space by
e
Banach in 1932! (We shall see one reason for the late emergence of the notion
of a norm in Theorem 10.4.6.)

Exercise 10.3.6. If is a norm over the (real or complex) vector space
V and we set d(x, y) = x ’ y , show that (V, d) is a metric space.

The conditions of De¬nition 10.3.4 are sometimes called the axioms for
metric space. However, they are not axioms in the same sense as the Funda-
mental Axiom of Analysis which asserts a fundamental principle of argument
which is to be accepted without further proof3 . Instead they try to isolate the
common properties of an interesting class of mathematical objects. (Com-
pare the de¬nition of birds as ˜oviparous, warm-blooded, amniotic vertebrates
which have their anterior extremities transformed into wings. Metacarpus
3
Within the context of a particular system. The axioms of one system may be theorems
in a another. We shall discuss this further in Section 14.3.
243
Please send corrections however trivial to twk@dpmms.cam.ac.uk

and ¬ngers carry feathers or quills. There is an intertarsal joint and not
more than 4 toes, of which the ¬rst is a hallux.™ Thus penguins are and
dodos were birds but dragon¬‚ies and bats are not.)
At ¬rst sight, axiom systems like that for metric systems seem merely
methods for economising on the number of theorems we have to prove. A
theorem which applies to all metric spaces will not need to be proved for
each individually. However, a successful axiom system should be a powerful
research tool by suggesting appropriate questions. (You will learn more about
a penguin by comparing it to a hawk than by comparing it to a frog.) Thus
if we have a ˜space on which we do analysis™ we can ask ˜can we make it into
a metric space?™. If it is a metric space we can then compare and contrast
it with other metric spaces that we know and this may well suggest new
theorems.
Although concepts like metric space are intended to apply to important
natural systems, we often study ˜toy examples™ in order to gain insight into
what can happen in such a system. Here is such a ˜toy example™.

Exercise 10.3.7. We work in Rm with the usual Euclidean norm. Show that
if d(x, y) = x + y when x = y and d(x, x) = 0, then (R m , d) is a metric
space.

This metric is called the British Railway non-stop metric. To get from A
to B in the fastest time, we travel via London (the origin). Here is another
version of the same idea which we call the British Railway stopping metric.

Exercise 10.3.8. We work in Rm with the usual Euclidean norm. If there
exists a real » such that »x = y, we write d(x, y) = x ’ y . Otherwise, we
set d(x, y) = x + y . Show that (Rm , d) is a metric space.

Much of the ˜mere algebra™ which we did for the Euclidean metric car-
ries over with hardly any change to the general metric case. Compare the
following de¬nition with De¬nition 4.1.8.

De¬nition 10.3.9. Let (X, d) be a metric space. If an ∈ X for each n ≥ 1
and a ∈ X, then we say that an ’ a if, given > 0, we can ¬nd an n0 ( )
such that

for all n ≥ n0 ( ).
d(an , a) <

We call a the limit of an .

Exercise 10.3.10. (i) Let d1 be the British Railway non-stop metric on Rm
de¬ned in Exercise 10.3.7. Show that xn ’ x in (Rm , d1 ) if and only if
244 A COMPANION TO ANALYSIS

(A) there exists an N such that xn = x for all n ≥ N , or
(B) x = 0 and xn ’ 0 as n ’ ∞.
(ii) Find and prove the corresponding result for the British Railway stop-
ping metric of Exercise 10.3.8.
If (V, ) is a normed space, we say that xn ’ x in (V, ) if xn ’ x in
the derived metric. The following result on limits in such spaces is an easy
generalisation of Lemma 4.1.9.
Lemma 10.3.11. Let V be a real vector space and a norm on V .
(i) The limit is unique. That is, if an ’ a and an ’ b as n ’ ∞, then
a = b.
(ii) If an ’ a as n ’ ∞ and n(1) < n(2) < n(3) . . . , then an(j) ’ a as
j ’ ∞.
(iii) If an = c for all n, then an ’ c as n ’ ∞.
(iv) If an ’ a and bn ’ b as n ’ ∞, then an + bn ’ a + b.
(v) Suppose that an ∈ V , a ∈ V , »n ∈ R, and » ∈ R. If an ’ a and
»n ’ », then »n an ’ »a.
Proof. Left to the reader. The reader who looks for the proof of Lemma 4.1.9
will ¬nd herself referred backwards to the proof of Lemma 1.2.2, but will ¬nd
the proofs for a general norm on a general vector space just as easy as the
proofs for the Euclidean norm in one-dimension.
Exercise 10.3.12. What changes are necessary (if any) in the statements
and proofs of Lemma 10.3.11 if we make V be a complex vector space?
If we consider a general metric, then we have no algebraic structure and
the result corresponding to Lemma 10.3.11 has far fewer parts.
Lemma 10.3.13. Let (X, d) be a metric space.
(i) The limit is unique. That is, if an ’ a and an ’ b as n ’ ∞, then
a = b.
(ii) If an ’ a as n ’ ∞ and n(1) < n(2) < n(3) . . . , then an(j) ’ a as
j ’ ∞.
(iii) If an = c for all n, then an ’ c as n ’ ∞.
The proof is again left to the reader.
The material on open and closed sets from Section 4.2 goes through es-
sentially unchanged (and proofs are therefore left to the reader).
De¬nition 10.3.14. Let (X, d) be a metric space. A set F ⊆ X is closed if
whenever xn ∈ F for each n and xn ’ x as n ’ ∞ then x ∈ F .
A set U ⊆ X is open if, whenever x ∈ U , there exists an > 0 such that,
whenever d(x, y) < , we have y ∈ U .
245
Please send corrections however trivial to twk@dpmms.cam.ac.uk

Example 10.3.15. Let (X, d) be a metric space. Let x ∈ X and r > 0.
(i) The set B(x, r) = {y ∈ X : d(x, y) < r} is open.
¯
(ii) The set B(x, r) = {y ∈ X : d(x, y) ¤ r} is closed.
¯
We call B(x, r) the open ball of radius r and centre x. We call B(x, r)
the closed ball of radius r and centre x.
Lemma 10.3.16. Let (X, d) be a metric space. A subset U of X is open if
and only if each point of U is the centre of an open ball lying entirely within
U.
Exercise 10.3.17. Find the open balls for the two British rail metrics. (For
all but one point there is a di¬erence between balls of small and large ra-
dius.) Show that the open sets for the British Railway stopping metric of
Exercise 10.3.8 consist of sets of the form A ∪ B where A is empty or
A = {x : x < δ} (here is the Euclidean norm and δ > 0) and B
is the union of sets of the form {»u : a < » < b} where 0 < a < b and
u = 0.
Find the open sets for the British Railway non-stop metric of Exercise 10.3.7.
De¬nition 10.3.18. The set N is a neighbourhood of the point x if we can
¬nd an r > 0 such that B(x, r) ⊆ N .
Lemma 10.3.19. Let (X, d) be a metric space. A subset U of X is open if
and only if its complement X \ U is closed.
Lemma 10.3.20. Let (X, d) be a metric space. Consider the collection „ of
open sets in X.
(i) … ∈ „ , X ∈ „ .
(ii) If U± ∈ „ for all ± ∈ A, then ±∈A U± ∈ „ .
(iii) If U1 , U2 , . . . , Un ∈ „ , then n Uj ∈ „ .
j=1

Lemma 10.3.21. Let (X, d) be a metric space. Consider the collection F
of closed sets in X.
(i) … ∈ F, X ∈ F.
(ii) If F± ∈ F for all ± ∈ A, then ±∈A F± ∈ F.
(iii) If F1 , F2 , . . . , Fn ∈ F, then n Fj ∈ F.
j=1

The new de¬nition of continuity only breaks the chain of translations
slightly because it involves two metric spaces.
De¬nition 10.3.22. Let (X, d) and (Z, ρ) be metric spaces. We say that a
function f : X ’ Z is continuous at some point x ∈ X if, given > 0, we
can ¬nd a δ( , x) > 0 such that, if y ∈ X and d(x, y) < δ( , x), we have
ρ(f (x), f (y)) < .
246 A COMPANION TO ANALYSIS

If f is continuous at every point x ∈ X, we say that f is a continuous
function on X.
The reader may feel that De¬nition 4.2.14 is more general than De¬ni-
tion 10.3.22 because it involves a set E. The following remark shows that
this is not so.
Lemma 10.3.23. Let (X, d) be a metric space and let E ⊆ X. Let dE :
E 2 ’ R be given by dE (u, v) = d(u, v) whenever u, v ∈ E. Then (E, dE ) is a
metric space.
We conclude this section with more results taken directly from Section 4.2
Lemma 10.3.24. Let (X, d) and (Z, ρ) be metric spaces and suppose that
the function f : X ’ Z is continuous at x ∈ X. Then, if xn ∈ E and
xn ’ x, it follows that f (xn ) ’ f (x).
Lemma 10.3.25. Let (X, d) and (Z, ρ) be metric spaces. The function f :
X ’ Z is continuous if and only if f ’1 (O) is open whenever O is open.
Lemma 10.3.26. Let (X, d), (Y, θ) and (Z, ρ) be metric spaces. If f : X ’
Y and g : Y ’ Z are continuous, then so is their composition g —¦ f .


10.4 Norms and the interaction of algebra
and analysis
Starting from one metric we can produce a wide variety of ˜essentially equiv-
alent metrics™.
Exercise 10.4.1. Let (X, d) be a metric space.
(i) If we de¬ne d1 (x, y) = min(1, d(x, y)), show that (X, d1 ) is a metric
space. Show further that, if xn ∈ X [n ≥ 0], then d1 (xn , x0 ) ’ 0 as n ’ ∞
if and only if d(xn , x0 ) ’ 0.
(ii) If we de¬ne d2 (x, y) = d(x, y)2 , show that (X, d2 ) is a metric space.
Show further that, if xn ∈ X [n ≥ 0], then d2 (xn , x0 ) ’ 0 as n ’ ∞ if and
only if d(xn , x0 ) ’ 0.
(iii) Let ± > 0 and de¬ne d± (x, y) = d(x, y)± . For which values of ± is
(X, d± ) always a metric space? For those values of ±, is it always true that,
if xn ∈ X [n ≥ 0], then d± (xn , x) ’ 0 as n ’ ∞ if and only if d(xn , x) ’ 0?
Prove your statements.
(iv) (This is trivial but explains why we had to be careful in the statement
of (iii).) Give an example of metric space (X, d) with X an in¬nite set such
that (X, d± ) is a metric space for all ± > 0.
247
Please send corrections however trivial to twk@dpmms.cam.ac.uk

Because norms re¬‚ect an underlying vector space structure they are much
more constrained.

Lemma 10.4.2. Let 1 and 2 be norms on a vector space V . Then the
following two statements are equivalent.
(a) There exist K, L > 0 such that K x 1 ≥ x 2 ≥ L x 1 for all
x∈V.
(b) If xn ∈ V [n ≥ 0], then xn ’ x0 1 ’ 0 as n ’ ∞ if and only if
xn ’ x0 2 ’ 0.

Proof. It is easy to see that (a) implies (b), since, for example, if xn ’x0 ’
2
0, then

¤ L’1 xn ’ x0
xn ’ x0 ’ 0,
1 2


and so xn ’ x0 1 ’ 0 as n ’ ∞.
To see that (b) implies (a), suppose that (a) is false. Without loss of
generality, suppose that there is no K > 0 such that K x 1 ≥ x 2 for all
x ∈ V . Then we can ¬nd yn ∈ V such that

> n2 y n 1 .
yn 2

’1
Setting x0 = 0 and xn = n’1 yn for n ≥ 1, we obtain
1 yn


xn ’ x0 = 1/n ’ 0
= xn
1 1


and

xn ’ x0 = xn > n,
2 2


so xn ’ x0 0 as n ’ 0. Thus (b) is false if (a) is false
2


Exercise 10.4.3. Let d1 and d2 be metrics on a space X. Consider the
following two statements.
(a) There exist K, L > 0 such that Kd1 (x, y) ≥ d2 (x, y) ≥ Ld1 (x, y) for
all x, y ∈ X.
(b) If xn ∈ X [n ≥ 0] then d1 (xn , x0 ) ’ 0 as n ’ ∞ if and only if
d2 (xn , x0 ) ’ 0.
Show that (a) implies (b) but that (b) does not imply (a).

Although we shall not make much use of this, the concepts just introduced
have speci¬c names.
248 A COMPANION TO ANALYSIS

De¬nition 10.4.4. (i) Let K ≥ 0. If (X, d) and (Y, ρ) are metric spaces, we
say that a function f : X ’ Y is Lipschitz with constant K if ρ(f (x), f (y)) ¤
Kd(x, y) for all x, y ∈ X.
(ii) We say that two metrics d1 and d2 on a space X are Lipschitz equiv-
alent if there exist K, L > 0 such that Kd1 (x, y) ≥ d2 (x, y) ≥ Ld1 (x, y) for
all x, y ∈ X.
Exercise 10.4.5. (i) If (X, d) and (Y, ρ) are metric spaces, show that any
Lipschitz function f : X ’ Y is continuous. Give an example to show that
the converse is false.
(ii) Show that Lipschitz equivalence is indeed an equivalence relation be-
tween metric spaces.
The following result is a less trivial example of the interaction between
algebraic and metric structure.
Theorem 10.4.6. All norms on Rn are Lipschitz equivalent.
Proof. We consider the standard Euclidean norm given by
1/2
n
x2
x= j
j=1

and show that it is equivalent to an arbitrary norm —.
One of the required inequalities is easy to obtain. If we write ei for the
unit vector along the xi axis we have,
n n
¤ |xj | ej
x = xj ej
— —
j=1 j=1

¤ n max |xr | max er ¤ (n max er — ) x .

1¤r¤n 1¤r¤n 1¤r¤n

We have thus shown the existence of a K such that K x ≥ x — .
To obtain the other inequality we use an argument from analysis4 . Ob-
serve that, if we give Rn and R the usual Euclidean metric and de¬ne
f : Rn ’ R by
f (x) = x — ,
then, using the triangle inequality for and the result of the previous

paragraph,
|f (x) ’ f (y)| = | x ’ y —| ¤ x ’ y ¤K x’y ,
— —
4
Experts may be interested in Exercises K.184 and K.185. Non-experts should ignore
this footnote.
249
Please send corrections however trivial to twk@dpmms.cam.ac.uk

and so f is continuous. Now the unit sphere

S(0, 1) = {x ∈ Rn : x = 1}

is closed and bounded for the usual Euclidean metric and so, by Theo-
rem 4.3.4, f is bounded on S(0, 1). In particular, we can ¬nd k ∈ S(0, 1)
where f attains its minimum, that is to say,

f (k) ¤ f (x)

for all x with x = 1.
Since k = 1, we have k = 0 and thus k > 0. Let us set L = k — .

If x = 0, then x ’1 x ∈ S(0, 1) so
’1 ’1 ’1
x) ≥ f (k) = k
x x = x x = f( x = L,
— — —


≥ L x . The case x = 0 is trivial, so we have established
and so x —


Kx ≥ x ≥L x



for all x ∈ Rn as required.

Exercise 10.4.7. (i) Let aj ∈ R for 1 ¤ j ¤ n and write
n
aj |xj |.
x =
w
j=1


State and prove necessary and su¬cient conditions for w to be a norm on
R.n

be the usual Euclidean norm on Rn . Show that, if n ≥ 2, then
(ii) Let
given any K we can ¬nd a norm — and points y1 and y2 such that


y1 > K y1 and y2 > K y2 — .



Does this result remain true if n = 1? Give reasons.

The fact that all norms on a ¬nite dimensional space are, essentially,
the same meant that, in Chapters 4 and 6 and elsewhere, we did not really
need to worry about which norm we used. By the same token, it obscured
the importance of the appropriate choice of norm in approaching problems in
analysis. However, once we consider in¬nite dimensional spaces, the situation
is entirely di¬erent.
250 A COMPANION TO ANALYSIS

Exercise 10.4.8. Consider s00 the space of real sequences a = (an )∞ such
n=1
that all but ¬nitely many of the an are zero.
(i) Show that if we use the natural de¬nitions of addition and scalar
multiplication
(an ) + (bn ) = (an + bn ), »(an ) = (»an )
then s00 is a vector space.
(ii) Show that the following de¬nitions all give norms on s00 .
= max |an |,
a ∞
n≥1

= max |nan |,
a w
n≥1

|an |,
a =
1
n=1
1/2

|an |2
a = ,
2
n=1

n|an |.
a =
u
n=1

(iii) For each of the twenty ¬ve possible pairs of norms A and B from
part (ii), establish whether or not there exists a K such that K a A ≥ a B
for all a ∈ s00 . Show that none of the norms are Lipschitz equivalent.
(iv) Find a family of norms ± on s00 [± > 0] such that ± and β
are not Lipschitz equivalent if ± = β.
When we study an algebraic structure we also study those maps which
preserve algebraic structure. Thus, if we have a map θ : G ’ H be-
tween groups, we want θ to preserve group multiplication, that is, we want
θ(xy) = θ(x)θ(y). Such maps are called homomorphisms. Similarly, when
we study an analytic structure, we also study those maps which preserve an-
alytic structure. Consider for example a map f between metric spaces which
preserves ˜sequences tending to limits™. We then have
xn ’ x0 implies f (xn ) ’ f (x0 )
and study continuous functions.
If we study objects which have linked analytic and algebraic structures,
we will thus wish to study maps between them which preserve both algebraic
and analytic structures. For normed vector spaces this means that we wish
to study continuous linear maps. This fact is obscured by the fact that linear
maps between ¬nite dimensional vector spaces are automatically continuous.
251
Please send corrections however trivial to twk@dpmms.cam.ac.uk

Exercise 10.4.9. Let U and V be ¬nite dimensional vector spaces with norms
V . Show that any linear map T : U ’ V is automatically con-
U and
tinuous.
However, as we shall see in Exercise 10.4.14, when we deal with in¬nitely
dimensional vector spaces, not all linear maps need be continuous.
Here is a simple but important characterisation of those linear maps which
are continuous.
Lemma 10.4.10. Let (U, U ) and (V, V ) be normed vector spaces.
Then a linear map T : U ’ V is continuous if and only if there exists a K
such that T u V ¤ K u U for all u ∈ U .
Exercise 10.4.11. Prove Lemma 10.4.10. [One possible proof follows that
of Lemma 10.4.2 very closely.]
Exercise 10.4.12. By observing that the space of polynomials of degree n
or less has ¬nite dimension, or otherwise, prove the following result. There
exists a constant Cn such that

sup |P (t)| ¤ Cn sup |P (t)|
t∈[0,1] t∈[0,1]

for all real polynomials P of degree n or less.
State, with proof, whether we can ¬nd a C independent of n such that

sup |P (t)| ¤ C sup |P (t)|
t∈[0,1] t∈[0,1]

for all real polynomials P .
State, with proof, whether we can ¬nd a constant An such that of n such
that

sup |P (t)| ¤ An sup |P (t)|
t∈[0,1] t∈[0,1]

for all real polynomials P of degree n or less.
The close link between Lemma 10.4.10 and Lemma 10.4.2 is highlighted
in the next exercise.
Exercise 10.4.13. Let 1 and 2 be two norms on a vector space U .
(i) Show that the following statements are equivalent.
(a) If xn ’ x0 1 ’ 0, then xn ’ x0 2 ’ 0.
1 ) ’ (U,
(b) The identity map I : (U, 2 ) from U with norm 1
to U with norm 2 is continuous.
252 A COMPANION TO ANALYSIS

(c) There exists a K such that K u 1 ≥ u 2 for all u ∈ U .
(ii) Write down, and prove equivalent, three similar statements (a) , (b)
and (c) where (c) is the statement
(c) There exist K and L with K > L > 0 such that K u 1 ≥ u 2 ≥
L u 1 for all u ∈ U .
Here are some examples of continuous and discontinuous linear maps5 .
Exercise 10.4.14. Consider the vector space s00 de¬ned in Exercise 10.4.8
and the norms
= max |an |,
a ∞
n≥1

= max |nan |,
a w
n≥1

|an |,
a =
1
n=1
1/2

|an |2
a = ,
2
n=1

n|an |.
a =
u
n=1

given there.
(i) For each of the twenty ¬ve possible pairs of norms A and B
A) ’
listed above state, with reasons, whether the identity map I : (s00 ,
(s00 , B ) from s00 with norm A to s00 with norm B is continuous.
(ii) Show that the map T : s00 ’ R de¬ned by T a = ∞ aj is linear.
j=1
If we give R the usual Euclidean norm and s00 one of the ¬ve norms listed
above, state, with reasons, whether T is continuous.
(iii) Show that the map S : s00 ’ s00 de¬ned by Sa = b with bj = jaj
is linear. For each of the twenty ¬ve possible pairs of norms A and B
listed above, state, with reasons, whether S, considered as a map from s 00
with norm A to s00 with norm B , is continuous.

Once we have Lemma 10.4.10, the way is open to an extension of the idea
of an operator norm investigated in Section 6.2.
5
If the reader returns to this example after studying the chapter on completeness,
she may note that the normed spaces given here are not complete. The question of the
existence of discontinuous linear maps between complete normed vector spaces involves
more advanced ideas, notably the axiom of choice. However, the consensus is that it
is unreasonable to expect all linear maps between complete normed vector spaces to be
continuous, in the same way, and for much the same reasons, as it is unreasonable to
expect all sets in R3 to have volume (see page 172).
253
Please send corrections however trivial to twk@dpmms.cam.ac.uk

De¬nition 10.4.15. Let U and V be vector spaces with norms and
U
V . If ± : U ’ V is a continuous linear map, then we set

± = sup ±x .
V
U ¤1
x

Exercise 10.4.16. (i) Let U = V = s00 and let a U = a V = ∞ |an |. n=1
’1
Show that if T : U ’ V is de¬ned by T a = b with bj = (1 ’ j )aj then T is
a continuous linear map. However, there does not exist an a ∈ U with a = 0
such that T a V = T a U . [See also Exercise 11.1.16.]
(ii) If U and V are ¬nite dimensional normed vector spaces and T : U ’
V is linear can we always ¬nd an a ∈ U with a = 0 such that T a V =
T a U ? Give reasons.
In exactly the way we proved Lemma 6.2.6, we can prove the following
results.
Exercise 10.4.17. Let U and V be vector spaces with norms U and V
and let ±, β : U ’ V be continuous linear maps.
(i) If x ∈ U , then ±x V ¤ ± x U .
(ii) ± ≥ 0.
(iv) If ± = 0, then ± = 0.
(v) If » ∈ R, then »± is a continuous linear map and »± = |»| ± .
(vi) (The triangle inequality) ± + β is a continuous linear map and
±+β ¤ ± + β .

W and γ : V ’ W is a continuous
(vii) If W is vector spaces with norm
linear map, then γ± is a continuous linear map and γ± ¤ γ ± .
We restate part of Exercise 10.4.17 in the language of this chapter.
Lemma 10.4.18. Let U and V be vector spaces with norms U and V.
The space L(U, V ) of continuous linear maps is a vector space and the oper-
ator norm is a norm on L(U, V ).
Although we shall not develop the theme further, we note that we now
have an appropriate de¬nition for di¬erentiation of functions between general
normed vector spaces.
De¬nition 10.4.19. Let U and V be vector spaces with norms U and
V . Suppose that E is a subset of of U and x a point such that there exists
a δ > 0 with B(x, δ) ⊆ E. We say that f : E ’ V is di¬erentiable at x, if
we can ¬nd a continuous linear map ± : U ’ V such that, when h ∈ B(x, δ),
f (x + h) = f (x) + ±h + (x, h) h U
254 A COMPANION TO ANALYSIS

where (x, h) V ’ 0 as h U ’ 0. We write ± = Df (x) or ± = f (x).
If E is open and f is di¬erentiable at each point of E, we say that f is
di¬erentiable on E.
Notice that the only important change from De¬nition 6.1.4 is that we
speci¬cally demand that ± = Df (x) is a continuous linear function.
The best way to understand what is going on is probably to do the next
exercise.
Exercise 10.4.20. State and prove the appropriate extension of the chain
rule given as Lemma 6.2.10.
There is no di¬culty in extending all our work on di¬erentiation from
¬nite dimensional to general normed spaces. The details are set out in
Dieudonn´™s book ([13], Chapter VIII).
e
Exercise 10.4.21. The title of this section was ˜Norms and the interaction
of algebra and analysis™ but much of it was really pure algebra in the sense
that it did not use the fundamental axiom of analysis. Go back through the
section and note which results are genuine results of analysis and which are
just algebra disguised as analysis.


Geodesics ™
10.5
This section introduces an idea which is important in more advanced work
but which will not be used elsewhere in this book. The discussion will be
informal.
Suppose we wish to build a road joining two points of the plane. The cost
per unit length of road will depend on the nature of the terrain. If the cost
of building a short stretch of length δs near a point (x, y) is (to ¬rst order)
g(x, y)δs then, subject to everything being well behaved, the cost of road “
will be

g(x, y) ds.


The reader may use whatever de¬nition of line integral she is comfortable
with. In particular she does not need to have read Section 9.5. It is tempting
to de¬ne the distance d(A, B) between two points A and B in the plane as

d(A, B) = inf g(x, y) ds : “ joining A and B .


If we include amongst our conditions that g is continuous and g(x, y) > 0
everywhere, we see that d is a metric.
255
Please send corrections however trivial to twk@dpmms.cam.ac.uk

Plausible statement 10.5.1. (i) d(A, B) ≥ 0 for all A and B.
(ii) d(A, B) = 0 implies A = B.
(iii) d(A, B) = d(B, A) for all A and B.
(iv) d(A, C) ¤ d(A, B) + d(B, C) for all A, B and C.
I have carefully used ˜inf™ rather than ˜min™ in the de¬nition of d(A, B).
One problem is that I have not speci¬ed the kind of “ that is permissible.
(Notice that the argument needed to obtain part (iv) of Statement 10.5.1
means that we cannot just use ˜smooth™.) However, it can be shown that, if
we choose a suitable class for the possible “ and a suitably well behaved g,
then the minimum is attained. If “0 is a path from A to B such that

g(x, y) ds = d(A, B),
“0

we call “0 a geodesic. (The geodesic need not be unique, consider road
building for two towns at diametrically opposite points of a circular marsh.)
If any two points are joined by a geodesic, then d(A, B) is the ˜length of the
geodesic path joining A and B™ where length refers to the metric d and not
to the Euclidean metric.
Let us try to use these ideas to ¬nd a metric on the upper half-plane

H = {z ∈ C : z > 0}

which is invariant under M¨bius transformation. (If you have not met M¨bius
o o
transformations skip to after Exercise 10.5.4, where I restate the problem.)
More precisely, we want a metric d such that, if T is a M¨bius map mapping H
o
bijectively to itself, then d(T z1 , T z2 ) = d(z1 , z2 ) for all z1 , z2 ∈ H. Of course,
no such metric might exist, but we shall see where the question leads.
Lemma 10.5.2. The set H of M¨bius maps T such that T |H : H ’ H is
o
bijective is a subgroup of the group M of all M¨bius transformations. The
o
subgroup H is generated by the transformations Ta with a ∈ R, D» with »
real and » > 0 and J, where

Ta (z) =a + z
D» (z) =»z
J(z) = ’ z ’1

Sketch proof. The fact that H is a subgroup of M follows from easy general
principles (see Exercise 10.5.3). We check directly that (if a is real and » > 0)
Ta , D» and J lie in H. Thus the composition of elements of the stated type
will also lie in H.
256 A COMPANION TO ANALYSIS

We now wish to identify H. If T ∈ H, then T is a well behaved bijective
map on C— = C ∪ {∞} and must take the boundary of H to the boundary
of H. Thus, writing R for the real axis, we know that T takes R ∪ {∞}
to R ∪ {∞}. Thus, if T (∞) = ∞, then T (∞) = a with a real and we set
M1 = JT’a . If T (∞) = ∞ we take M1 to be the identity map. In either case
M1 T ∈ H and M1 T (∞) = ∞. We now observe that M1 T (0) ∈ R ∪ {∞},
by our previous argument, and that M1 T (0) = ∞, since M¨bius maps are
o
bijections. Thus M1 T (0) = b with b real. Set M2 = T’b . We have M2 M1 T ∈
H, M2 M1 T (0) = 0 and M2 M1 T (∞) = ∞. But any M¨bius map M which
o
¬xes 0 and ∞ must have the form M (z) = µz for some µ = 0. If M takes H
to H, then µ is real and positive. Thus M2 M1 T = D» for some » > 0 and
’1 ’1
T = M1 M2 D» . We have shown that any element H is the composition of
maps of the stated type and the lemma follows.

(The remark on page 234 applies here too. We know so much about M¨bius
o
transformations that the argument above is more a calculation than a proof.
However, for more complicated conformal maps than are generally found in
undergraduate courses, arguments involving ˜boundaries™ may be mislead-
ing.)

Exercise 10.5.3. (Most readers will already know the content of this exer-
cise.) Let X be a set and S(X) the collection of bijections of X. Show that
S(X) is a group under the operation of composition. If G is a subgroup of
S(X), Y a subset of X and we de¬ne

K = {f ∈ G : f (Y ) = Y },

show that K is a subgroup of G.

Exercise 10.5.4. If |a| < 1 and Ma is the M¨bius map Ma given by
o
z’a
Ma z = ,
a— z ’ 1

show that |Ma eiθ | = 1 for all real θ. Deduce, stating any properties of M¨bius
o
transforms that you need, that Ma maps the unit disc D = {z : |z| < 1} to
itself and interchanges 0 and a. Show that the set H of M¨bius maps T such
o
that T |H : D ’ D is bijective is a subgroup of the group M of all M¨bius o
transformations generated by the transformations Ma with |a| < 1 and the
rotations.

Lemma 10.5.2 reduces our problem to one of ¬nding a metric d on the half
plane H = {z : z > 0} such that d(T z1 , T z2 ) = d(z1 , z2 ) for all z1 , z2 ∈ H,
257
Please send corrections however trivial to twk@dpmms.cam.ac.uk

whenever T is one of the transformations Ta with a ∈ R, D» with » > 0 and
J de¬ned by
Ta (z) =a + z
D» (z) =»z
J(z) = ’ z ’1 .
We try using the ideas of this section and de¬ning

d(z1 , z2 ) = inf g(z) ds : “ joining z1 and z2 ,


for some appropriate strictly positive function g : H ’ R. (Throughout, we
write z = x + iy and identify the plane R2 with C in the usual manner.)
Arguing informally, this suggests that, to ¬rst order,
d(z + δz, z) = g(z)δs = g(z)|(z + δz) ’ z|.
If T is one of the transformations we are considering, then we must have, to
¬rst order,
g(z)|(z + δz) ’ z| = d(z + δz, z) = d(T (z + δz), T z)
= g(T z)|T (z + δz) ’ T (z)| = g(T z)|T (z)||δz|,
and so
g(z) = g(T z)|T (z)|.
Taking T = Ta , we obtain g(z) = g(z + a) for all real a so g(x + iy) = h(y)
for some well behaved function h : (0, ∞) ’ (0, ∞).
Taking T = D» (z), we obtain g(z) = »g(»z) for all real strictly positive ».
Thus h(y) = »h(»y) for all », y > 0. Taking » = 1/y, we obtain h(y) = Ay
for some A > 0. The factor A merely scales everything, so we take A = 1
and decide to experiment with g(x + iy) = 1/y.
Exercise 10.5.5. Verify that, if g(x + iy) = 1/y and T = J, then equation
holds.
Exercise 10.5.6. Suppose that γ : [0, 1] ’ H is well behaved and T ∈ H.
˜
Let γ = T —¦ γ (that is, γ (z) = T (γ(z))). If “ is the path described by γ and “
˜ ˜
is the path described by γ show, using whatever de¬nition of the line integral
˜
that you wish, that

inf g(z) ds = inf g(z) ds
˜
“ “

for T = Ta , T = D» and T = J. Conclude that the equality holds for all
T ∈ H.
258 A COMPANION TO ANALYSIS

Exercise 10.5.6 shows that, if we set
1
d(z1 , z2 ) = inf ds : “ joining z1 and z2 ,
y


then we do, indeed, get an invariant metric (that is, a metric with d(T z 1 , T z2 ) =
d(z1 , z2 ) for all z1 , z2 ∈ H whenever T ∈ H).
To ¬nd out more about this metric we need to ¬nd the geodesics and to do
this we use the methods of the calculus of variation described in Section 8.4.
We shall use the methods in a purely exploratory manner with no attempt
at rigour. (Notice that, even if we did things rigorously, we would only
get necessary conditions.) Suppose that we wish to ¬nd the path “0 which
1
minimises “ y ds among all paths “ from z1 = x1 + iy1 to z2 = x2 + iy2 with
x1 < x2 and y1 , y2 > 0. It seems reasonable to look for a path given by
z = x + iy(x) where y : [x1 , x2 ] ’ (0, ∞) is well behaved. We thus wish to
minimise
x2 x2
1
(1 + y (x)2 )1/2 dx = G(y(x), y (x)) dx
y(x)
x1 x1

where G(v, w) = v ’1 (1 + w2 )1/2 . The Euler-Lagrange equation gives, via
Exercise 8.4.9 (i),

G(y(x), y (x)) ’ y (x)G,2 (y(x), y (x)) = c,

where c is a constant. Writing the equation more explicitly, we get

(1 + y 2 )1/2 y2
’ = c,
y(1 + y 2 )1/2
y
so that
2
1 = cy(1 + y )1/2 .

Setting a = c’1 , we obtain
2
a2 = y 2 (1 + y )

so that, solving formally,
y dy
= dx
(a2 ’ y 2 )1/2
and

’(a2 ’ y 2 )1/2 = x ’ b,
259
Please send corrections however trivial to twk@dpmms.cam.ac.uk

for some constant b ∈ R. Thus

(x ’ b)2 + y 2 = a2 .

We expect the geodesic to be the arc of a circle with its centre on the real
axis passing through the two points z1 and z2 .
Exercise 10.5.7. Suppose (x1 , y1 ), (x2 , y2 ) ∈ R2 and x1 = x2 . Show that
there is one and only one circle with its centre on the real axis passing through
the two points (x1 , y1 ) and (x2 , y2 ).
What happens if z1 = z2 ? One way of guessing is to consider the
geodesic path between z1 and z2 + δ where δ is real and non-zero. If we let
δ ’ 0, the appropriate circle arcs approach a straight line joining z1 and z2
so we would expect this to be the solution.
Exercise 10.5.8. Attack the geodesic problem by considering paths given by
z = x(y) + iy where y : [y1 , y2 ] ’ R is well behaved. (The di¬culties are
mainly notational, the formulae of the variational calculus assume that y is
a function of x and it requires a certain amount of clear headedness to deal
with the case when x is a function of y. You should be able to make use
of Exercise 8.4.9 (ii).) Check, by choosing appropriate constants, that your
solutions include straight lines perpendicular to the x-axis.
Now that we know (or at least guess) what the geodesics are we can see
(at least if we know a little about M¨bius maps) a di¬erent way of showing
o
this.
Exercise 10.5.9. We work in C— = C ∪ {∞}.

<<

. 8
( 19)



>>