
Proof. Most mathematicians would simply write
|e(z) − e_{n−1}(z)| = |∑_{j=n}^{∞} z^j/j!| ≤ ∑_{j=n}^{∞} |z|^j/j!
                    = (|z|^n/n!) ∑_{k=0}^{∞} |z|^k/((n+1)(n+2)···(n+k))
                    ≤ (|z|^n/n!) ∑_{k=0}^{∞} (|z|/(n+1))^k
                    = (|z|^n/n!) (n+1)/(n+1−|z|) = (n+1)|z|^n/((n+1−|z|) n!)
                    ≤ 2|z|^n/n!.

Exercise 5.4.9. A particularly cautious mathematician might prove Lemma 5.4.8
as follows. Set e_m(z) = ∑_{j=0}^{m} z^j/j!. Show that, if m ≥ n, then

|e_m(z) − e_{n−1}(z)| ≤ (n+1)|z|^n/((n+1−|z|) n!).

Deduce that

|e(z) − e_{n−1}(z)| ≤ |e(z) − e_m(z)| + |e_m(z) − e_{n−1}(z)| ≤ |e(z) − e_m(z)| + (n+1)|z|^n/((n+1−|z|) n!).

By allowing m → ∞, obtain the required result.
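The bound of Lemma 5.4.8 is easy to test numerically. The following sketch (my own illustration, not part of the text) compares |e(z) − e_{n−1}(z)| with 2|z|^n/n! for a few values of n, using the library exponential as a stand-in for e(z); the chosen z is small enough that the lemma's smallness condition on |z| holds for every n used.

```python
import cmath
from math import factorial

def e_partial(z, m):
    """e_m(z) = sum_{j=0}^{m} z^j / j!  (partial sum of the exponential series)."""
    return sum(z**j / factorial(j) for j in range(m + 1))

z = 1.3 + 0.7j
for n in range(2, 10):
    tail = abs(cmath.exp(z) - e_partial(z, n - 1))   # |e(z) - e_{n-1}(z)|
    bound = 2 * abs(z)**n / factorial(n)             # the bound of Lemma 5.4.8
    print(n, tail <= bound, tail, bound)
```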

We now switch our attention to the restriction of e to R. The results we
expect now come tumbling out.

Exercise 5.4.10. Consider e : R → R given by e(x) = ∑_{j=0}^{∞} x^j/j!.
(i) Using Lemma 5.4.8, show that |e(h) − 1 − h| ≤ h² for |h| < 1/2.
Deduce that e is differentiable at 0 with derivative 1.
(ii) Explain why e(x + h) − e(x) = e(x)(e(h) − 1). Deduce that e is
everywhere differentiable with e′(x) = e(x).
(iii) Show that e(x) ≥ 1 for x ≥ 0 and, by using the relation e(−x)e(x) =
1, or otherwise, show that e(x) > 0 for all x ∈ R.
(iv) Explain why e is a strictly increasing function.
(v) Show that e(x) ≥ x for x ≥ 0 and deduce that e(x) → ∞ as x → ∞.
Show also that e(x) → 0 as x → −∞.
(vi) Use (v) and the intermediate value theorem to show that e(x) = y
has a solution for all y > 0.
(vii) Use (iv) to show that e(x) = y has at most one solution for all y > 0.
Conclude that e is a bijective map of R to R++ = {x ∈ R : x > 0}.
(viii) By modifying the proof of (v), or otherwise, show that P(x)e(−x) →
0 as x → ∞ whenever P is a polynomial. [We say 'exponential beats polynomial'.]
(ix) By using (viii), or otherwise, show that e is not equal to any function
of the form P/Q with P and Q polynomials. [Thus e is a genuinely new
function.]

When trying to prove familiar properties of a familiar function, it is probably
wise to use a slightly unfamiliar notation. However, as the reader will
have realised from the start, the function e is our old friend exp. We shall
revert to the mild disguise in the next section but we use standard notation
for the rest of this one.

Exercise 5.4.11. (i) Check that R is an Abelian group under addition. Show
that R++ = {x ∈ R : x > 0} is an Abelian group under multiplication. Show
that exp : (R, +) → (R++, ×) is an isomorphism.
(ii) [Needs a little more familiarity with groups] Show that R \ {0} is an
Abelian group under multiplication. By considering the order of the element
−1 ∈ R \ {0}, or otherwise, show that the groups (R, +) and (R \ {0}, ×)
are not isomorphic.

We can turn Plausible Statement 5.4.1 into a theorem.

Theorem 5.4.12. The general solution of the equation

y′(x) = y(x),                                   (⋆)

where y : R → R is a differentiable function, is

y(x) = a exp(x)

with a ∈ R.

Proof. It is clear that y(x) = a exp(x) is a solution of (⋆). We must prove
there are no other solutions. To this end, observe that, if y satisfies (⋆), then

d/dx (exp(−x)y(x)) = y′(x) exp(−x) − y(x) exp(−x) = 0,

so, by the mean value theorem, exp(−x)y(x) is a constant function. Thus
exp(−x)y(x) = a and y(x) = a exp(x) for some a ∈ R.
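As a quick sanity check (my own illustration, not part of the text), one can integrate y′ = y numerically from y(0) = a and compare with a exp(x); any standard scheme will do, here a crude Euler step.

```python
import math

def euler_solve(a, x_end, steps=100000):
    """Crude Euler integration of y' = y with y(0) = a."""
    h = x_end / steps
    y = a
    for _ in range(steps):
        y += h * y
    return y

a, x_end = 2.5, 1.0
print(euler_solve(a, x_end))   # numerical solution at x_end
print(a * math.exp(x_end))     # the solution a*exp(x) of Theorem 5.4.12
```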

Exercise 5.4.13. State and prove the appropriate generalisation of Theorem 5.4.12 to cover the equation

y′(x) = by(x)

with b a real constant.

Here is another consequence of Lemma 5.4.8.

Exercise 5.4.14. (e is irrational.) Suppose, if possible, that e = exp 1 is
rational. Then exp 1 = m/n for some positive integers m and n. Explain why,
if N ≥ n,

N!(exp 1 − ∑_{j=0}^{N} 1/j!)

must be a non-zero integer and so

N!(exp 1 − ∑_{j=0}^{N} 1/j!) ≥ 1.

Use Lemma 5.4.8 to obtain a contradiction.
Show, similarly, that ∑_{r=1}^{∞} (−1)^{r+1}/(2r−1)! is irrational.
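The contradiction is visible numerically: by Lemma 5.4.8 the quantity N!(exp 1 − ∑_{j=0}^{N} 1/j!) tends to 0, so it cannot remain ≥ 1. A small sketch of my own, using the library value of e as a stand-in for exp 1:

```python
from math import e, factorial

for N in range(1, 12):
    partial = sum(1 / factorial(j) for j in range(N + 1))
    gap = factorial(N) * (e - partial)  # would be a positive integer if e were m/n with n <= N
    print(N, gap)                       # instead it shrinks towards 0, roughly like 1/(N+1)
```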


Most mathematicians draw diagrams frequently, both on paper and in
their heads.⁶ However, these diagrams are merely sketches. To see this,
quickly sketch a graph of exp x.

⁶ Little of this activity appears in books and papers, partly because, even today, adding
diagrams to printed work is non-trivial. It is also possible that it is the process of drawing
(or watching the process of drawing) which aids comprehension rather than the finished
product.

Exercise 5.4.15. Choosing appropriate scales, draw an accurate graph of
exp on the interval [0, 100]. Does it look like your quick sketch?

We conclude this section with a result which is a little off our main track
but whose proof provides an excellent example of the use of dominated convergence (Theorem 5.3.3).

Exercise 5.4.16. We work in C. Show that if we write

(1 + z/n)^n = ∑_{j=0}^{n} a_j(n) z^j,

then a_j(n)z^j → z^j/j! as n → ∞ and |a_j(n)z^j| ≤ |z^j|/j! for all n and all j.
Use dominated convergence to conclude that

(1 + z/n)^n → e(z)

as n → ∞, for all z ∈ C.
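A numerical sketch (my own illustration) of the convergence in Exercise 5.4.16, again using the library exponential in place of e(z):

```python
import cmath

z = 2.0 - 1.0j
for n in (10, 100, 1000, 10000):
    approx = (1 + z / n) ** n
    print(n, abs(approx - cmath.exp(z)))  # error shrinks roughly like 1/n
```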


5.5 The trigonometric functions

In the previous section we considered the simple differential equation y′(x) =
y(x). What happens if we consider the differential equation y′′(x) + y(x) = 0?

Exercise 5.5.1. Proceeding along the lines of Plausible Statement 5.4.1,
show that it is reasonable to conjecture that the general solution of the equation

y′′(x) + y(x) = 0,                                   (⋆⋆)

where y : R → R is a well behaved function, is

y(x) = a ∑_{j=0}^{∞} (−1)^j x^{2j}/(2j)! + b ∑_{j=0}^{∞} (−1)^j x^{2j+1}/(2j+1)!

with a, b ∈ R.
A little experimentation reveals what is going on.
Exercise 5.5.2. We work in C. If we write

c(z) = (e(iz) + e(−iz))/2,   s(z) = (e(iz) − e(−iz))/(2i),

show carefully that

c(z) = ∑_{j=0}^{∞} (−1)^j z^{2j}/(2j)!,   s(z) = ∑_{j=0}^{∞} (−1)^j z^{2j+1}/(2j+1)!.

We can use the fact that e(z + w) = e(z)e(w) to obtain a collection of
useful formulae for s and c.
Exercise 5.5.3. Show that if z, w ∈ C then
(i) s(z + w) = s(z)c(w) + c(z)s(w),
(ii) c(z + w) = c(z)c(w) − s(z)s(w),
(iii) s(z)² + c(z)² = 1,
(iv) s(−z) = −s(z), c(−z) = c(z).
We now switch our attention to the restriction of s and c to R.
Exercise 5.5.4. Consider c, s : R → R given by c(x) = ∑_{j=0}^{∞} (−1)^j x^{2j}/(2j)!
and s(x) = ∑_{j=0}^{∞} (−1)^j x^{2j+1}/(2j+1)!.
(i) Using the remainder estimate in the alternating series test (second paragraph
of Lemma 5.2.1), or otherwise, show that |c(h) − 1| ≤ h²/2 and
|s(h) − h| ≤ |h|³/6 for |h| < 1. Deduce that c and s are differentiable at
0 with c′(0) = 0, s′(0) = 1.
(ii) Using the addition formulae of Exercise 5.5.3 (i) and (ii) to evaluate
c(x + h) and s(x + h), show that c and s are everywhere differentiable with
c′(x) = −s(x), s′(x) = c(x).

Suppose that a group of mathematicians who did not know the trigonometric
functions were to investigate our functions c and s defined by power
series. Careful calculation and graphing would reveal that, incredible as it
seemed, c and s appeared to be periodic!

Exercise 5.5.5. (i) By using the estimate for the error in the alternating series
test, show that

c(x) > 0 for all 0 ≤ x ≤ 1.

(ii) By using a minor modification of these ideas, or otherwise, show that c(2) <
0. Explain carefully why this means that there must exist an a with 1 < a < 2
such that c(a) = 0.
(iii) In this part and what follows we make use of the formulae obtained
in Exercise 5.5.3 which tell us that

s(x + y) = s(x)c(y) + c(x)s(y),   c(x + y) = c(x)c(y) − s(x)s(y),
c(x)² + s(x)² = 1,   s(−x) = −s(x),   c(−x) = c(x)

for all x, y ∈ R. Show that, if c(a′) = 0 and c(a′′) = 0, then s(a′ − a′′) = 0.
Use the fact that s(0) = 0 and s′(x) = c(x) > 0 for 0 ≤ x ≤ 1 to show that
s(x) > 0 for 0 < x ≤ 1. Conclude that, if a′ and a′′ are distinct zeros of
c, then |a′ − a′′| > 1. Deduce that c(x) = 0 has exactly one solution with
0 ≤ x ≤ 2. We call this solution a.
(iv) By considering derivatives, show that s is strictly increasing on [0, a].
Conclude that s(a) > 0 and deduce that s(a) = 1. Show that

s(x + a) = c(x),   c(x + a) = −s(x)

for all x and that c and s are periodic with period 4a (that is, s(x + 4a) = s(x)
and c(x + 4a) = c(x) for all x).
(v) Show that s is strictly increasing on [−a, a], and strictly decreasing
on [a, 3a].
(vi) If u and v are real numbers with u² + v² = 1, show that there
is exactly one solution to the pair of equations

c(x) = u,   s(x) = v

with 0 ≤ x < 4a.

At this point we tear off the thin disguise of our characters and write
exp(z) = e(z), sin(z) = s(z), cos(z) = c(z) and a = π/2.
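Exercise 5.5.5 can be brought to life numerically: bisecting c on [1, 2] (using a long partial sum of the series for c as a stand-in) locates the zero a, and 2a, 4a come out as the familiar π and 2π. A sketch of my own:

```python
from math import factorial, pi

def c(x, terms=40):
    """Partial sum of the series for c; 40 terms is ample for the x used here."""
    return sum((-1)**j * x**(2*j) / factorial(2*j) for j in range(terms))

lo, hi = 1.0, 2.0          # c(1) > 0 and c(2) < 0, so the zero a lies between them
for _ in range(60):
    mid = (lo + hi) / 2
    if c(mid) > 0:
        lo = mid
    else:
        hi = mid
a = (lo + hi) / 2
print(a, pi / 2)           # a agrees with pi/2
print(4 * a, 2 * pi)       # the period 4a is 2*pi
```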

Exercise 5.5.6. We work in R. Show that, if |u| ≤ 1, there is exactly one
θ with 0 ≤ θ ≤ π such that cos θ = u.
Using the Cauchy-Schwarz inequality (Lemma 4.1.2) show that, if x and
y are non-zero vectors in R^m, then there is exactly one θ with 0 ≤ θ ≤ π
such that

cos θ = x · y/(‖x‖ ‖y‖).

We call θ the angle between x and y.

Exercise 5.5.7. We work in C and use the usual disguises except that we
write a = π/2.
(i) Show that e has period 2πi in the sense that

e(z + 2πi) = e(z)

for all z, and that

e(z + w) = e(z)

for all z if and only if w = 2nπi for some n ∈ Z. State corresponding results
for s, c : C → C.
(ii) If x and y are real, show that

e(x + iy) = e(x)(c(y) + is(y)).

(iii) If w ≠ 0, show that there are unique real numbers r and y with r > 0
and 0 ≤ y < 2π such that

w = re(iy).

(iv) If w ∈ C, find all solutions of w = re(iy) with r and y real and r ≥ 0.

The traditional statement of Exercise 5.5.7 (iii) says that z = re^{iθ} where
r = |z| and θ is real. However, we have not defined powers yet so, for the
moment, this must merely be considered as a useful mnemonic. (We will
discuss the matter further in Exercise 5.7.9.)
It may be objected that our definitions of sine and cosine ignore their
geometric origins. Later (see Exercises K.169 and K.170) I shall give a more
'geometric' treatment but the following points are worth noting.
The trigonometric functions did not arise as part of classical axiomatic
Euclidean geometry, but as part of practical geometry (mainly astronomy).
An astronomer is happy to consider the sine of 20 degrees, but a classical
geometer would simply note that it is impossible to construct an angle of
20 degrees using ruler and compass. Starting with our axioms for R we can
obtain a model of classical geometry, but the reverse is not true.
The natural 'practical geometric' treatment of angle does not use radians.
Our use of radians has nothing to do with geometric origins and everything
to do with the equation (written in radians)

d/dx sin cx = c cos cx.

Mathematicians measure angles in radians because, for them, sine is a function
of analysis; everyone else measures angles in degrees because, for them,
sine is a function used in practical geometry.
In the natural 'practical geometric' treatment of angle it is usual to confine
oneself to positive angles less than two right angles (or indeed one right
angle). When was the last time you heard a navigator shouting 'turn
−20 degrees left' or 'up 370 degrees'? The extension of sine from a function
on [0, π/2] to a function on R and the corresponding extension of the notion
of angle is a product of 'analytic' and not 'geometric' thinking.
Since much of this book is devoted to stressing the importance of a 'geometric
approach' to the calculus of several variables, I do not wish to downplay
the geometric meaning of sine. However, we should treat sine both as
a geometric object and a function of analysis. In this context it matters
little whether we start with a power series definition of sine and end up
with the parametric description of the unit circle as the path described by
the point (sin θ, cos θ) as θ runs from 0 to 2π or (as we shall do in Exercises
K.169 and K.170) we start with the Cartesian description of the circle
as x² + y² = 1 and end up with a power series for sine.
Exercise 5.5.8. Write down the main properties of cosh and sinh that you
know. Starting with a tentative solution of the differential equation y′′ = y,
write down appropriate definitions and prove the stated properties in the style
of this section. Distinguish between those properties which hold for cosh and
sinh as functions from C to C and those which hold for cosh and sinh as
functions from R to R.


5.6 The logarithm

In this section we shall make use of the one dimensional chain rule.
Lemma 5.6.1. Suppose that f : R → R is differentiable at x with derivative
f′(x), that g : R → R is differentiable at y with derivative g′(y) and that
f(x) = y. Then g ∘ f is differentiable at x with derivative f′(x)g′(y).

In traditional notation, d/dx g(f(x)) = f′(x)g′(f(x)). We divide the proof
into two parts.

Lemma 5.6.2. Suppose that the hypotheses of Lemma 5.6.1 hold and, in
addition, f′(x) ≠ 0. Then the conclusion of Lemma 5.6.1 holds.

Proof. Since

(f(x + h) − f(x))/h → f′(x) ≠ 0,

we can find a δ > 0 such that

(f(x + h) − f(x))/h ≠ 0

for 0 < |h| < δ and so, in particular, f(x + h) − f(x) ≠ 0 for 0 < |h| < δ.
Thus, if 0 < |h| < δ, we may write

(g(f(x + h)) − g(f(x)))/h = ((g(f(x + h)) − g(f(x)))/(f(x + h) − f(x))) × ((f(x + h) − f(x))/h).    (⋆)

Now f is differentiable and so continuous at x, so f(x + h) − f(x) → 0 as
h → 0. It follows, by using standard theorems on limits (which the reader
should identify explicitly), that

(g(f(x + h)) − g(f(x)))/h → g′(f(x))f′(x)

as h → 0 and we are done.

Unfortunately the proof of Lemma 5.6.2 does not work in general for
Lemma 5.6.1, since we then have no guarantee that f(x + h) − f(x) ≠ 0, even
for small h, and so we cannot use equation (⋆).⁷ We need a separate proof
for this case.

Lemma 5.6.3. Suppose that the hypotheses of Lemma 5.6.1 hold and, in
addition, f′(x) = 0. Then the conclusion of Lemma 5.6.1 holds.

I outline a proof in the next exercise, leaving the details to the reader.
⁷ Hardy's Pure Mathematics says 'The proof of [the chain rule] requires a little care' and
carries the rueful footnote 'The proofs in many text-books (and in the first three editions
of this book) are inaccurate'. This is the point that the text-books overlooked.

Exercise 5.6.4. We prove Lemma 5.6.3 by reductio ad absurdum. To this
end, suppose that the hypotheses of the lemma hold but the conclusion is false.
(i) Explain why we can find an ε > 0 and a sequence h_n → 0 such that
h_n ≠ 0 and

|(g(f(x + h_n)) − g(f(x)))/h_n| > ε

for each n ≥ 0.
(ii) Explain why f(x + h_n) ≠ f(x) for each n ≥ 0.
(iii) Use the method of proof of Lemma 5.6.2 to derive a contradiction.
The rather ugly use of reductio ad absurdum in Exercise 5.6.4 can be
avoided by making explicit use of the ideas of Exercise K.23.
Note that, in this section, we only use the special case of the chain rule
given in Lemma 5.6.2. I believe that the correct way to look at the chain rule
is by adopting the ideas of Chapter 6 and attacking it directly as we shall do
in Lemma 6.2.10. We now move on to the main subject of this section.
Since e : R → R++ is a bijection (indeed, by Exercise 5.4.11, a group
isomorphism) it is natural to look at its inverse. Let us write l(x) = e⁻¹(x)
for x ∈ (0, ∞) = R++. Some of the properties of l are easy to obtain. (Here
and later we use the properties of the function e obtained in Exercise 5.4.10.)
Exercise 5.6.5. (i) Explain why l : (0, ∞) → R is a bijection.
(ii) Show that l(xy) = l(x) + l(y) for all x, y > 0.
(iii) Show that l is a strictly increasing function.
Exercise 5.6.6. No one who went to school after 1960 can really appreciate
the immense difference between the work involved in hand multiplication
without logarithms and hand multiplication if we are allowed to use logarithms.
The invention of logarithms was an important contribution to the
scientific revolution. When Henry Briggs (who made a key simplification)
visited Baron Napier (who invented the idea) 'almost one quarter of an hour
was spent, each beholding [the] other . . . with admiration before one word
was spoke, at last Mr Briggs began.
'My lord, I have undertaken this long Journey purposely to see your Person,
and to know by what Engine of Wit or Ingenuity you came first to think
of this most excellent Help unto Astronomy, viz., the Logarithms; but, my
Lord, being by you found out, I wonder nobody else found it out before, when
now known it is so easy.' (Quotation from 9.E.3 of [16].)
(i) As Briggs realised, calculations become a little easier if we use log₁₀
defined by

log₁₀ x = l(x)/l(10)

for x > 0. Show that log₁₀ xy = log₁₀ x + log₁₀ y for all x, y > 0 and that
log₁₀(10^r x) = r + log₁₀ x.
(ii) Multiply 1.3245 by 8.7893, correct to five significant figures, without
using a calculator.
(iii) To multiply 1.3245 by 8.7893 using logarithms, one looked up log₁₀ 1.3245
and log₁₀ 8.7893 in a table of logarithms. This was quick and easy, giving

log₁₀ 1.3245 ≈ 0.1220520,   log₁₀ 8.7893 ≈ 0.9439543.

A hand addition, which the reader should do, gave

log₁₀(1.3245 × 8.7893) = log₁₀ 1.3245 + log₁₀ 8.7893
                        ≈ 0.1220520 + 0.9439543 = 1.0660063.

A quick and easy search in a table of logarithms (or, still easier, a table of
inverse logarithms, the so-called antilogarithms) showed that

log₁₀ 1.164144 ≈ 0.0660052,   log₁₀ 1.164145 ≈ 0.0660089

so that

log₁₀ 11.64144 ≈ 1.0660052,   log₁₀ 11.64145 ≈ 1.0660089

and, correct to five significant figures, 1.3245 × 8.7893 = 11.6414.
(iv) Repeat the exercise with numbers of your own choosing. You may
use the 'log₁₀' (often just called 'log') function on your calculator and the
'inverse log₁₀' (often called '10^x') but you must do the multiplication and
addition by hand. Notice that you need one (or, if you are being careful, two)
extra figures in your calculations than there are significant figures in
your answers.
[There are some additional remarks in Exercises 5.7.7 and K.85.]
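For readers who want to replay part (iii) by machine, here is a small sketch of my own using the library log₁₀ in place of a printed table:

```python
from math import log10

x, y = 1.3245, 8.7893
lx, ly = log10(x), log10(y)
print(round(lx, 7), round(ly, 7))   # the two table look-ups
s = lx + ly                         # the hand addition
print(round(s, 7))
print(10 ** s, x * y)               # the 'antilogarithm' of the sum recovers the product
```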

Other properties require a little more work.

Lemma 5.6.7. (i) The function l : (0, ∞) → R is continuous.
(ii) The function l is everywhere differentiable with

l′(x) = 1/x.

Proof. (i) We wish to show that l is continuous at some point x ∈ (0, ∞).
To this end, let δ > 0 be given. Since l is increasing, we know that, if

e(l(x) + δ) > y > e(l(x) − δ),

we have

l(e(l(x) + δ)) > l(y) > l(e(l(x) − δ))

and so

l(x) + δ > l(y) > l(x) − δ.

Now e is strictly increasing, so we can find η(δ) > 0 such that

e(l(x) + δ) > x + η(δ) > x = e(l(x)) > x − η(δ) > e(l(x) − δ).

Combining the results of the two previous sentences, we see that, if |x − y| <
η(δ), then |l(x) − l(y)| < δ. Since δ was arbitrary, l is continuous at x.
(ii) We shall use the result that, if g is never zero and g(x + h) → a as
h → 0, then, if a ≠ 0, 1/g(x + h) → 1/a as h → 0. Observe that, since l is
continuous, we have

l(x + h) − l(x) → 0

and so

(l(x + h) − l(x))/h = (l(x + h) − l(x))/(e(l(x + h)) − e(l(x))) → 1/e′(l(x)) = 1/e(l(x)) = 1/x

as h → 0.
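A one-line numerical check of Lemma 5.6.7 (ii) (my own illustration, with the library logarithm standing in for l):

```python
from math import log

x = 3.7
for h in (1e-2, 1e-4, 1e-6):
    print((log(x + h) - log(x)) / h, 1 / x)  # the difference quotient approaches 1/x
```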
By using the ideas of parts (iv), (v) and (vi) of Exercise 5.4.10 together
with parts (i) and (iii) of Exercise 5.6.5 and both parts of Lemma 5.6.7, we
get the following general result.
Exercise 5.6.8. (One dimensional inverse function theorem.) Suppose
that f : [a, b] → [c, d] is continuous and f is differentiable on (a, b) with
f′(x) > 0 for all x ∈ (a, b) and f(a) = c, f(b) = d. Show that f is a bijection,
that f⁻¹ : [c, d] → [a, b] is continuous and that f⁻¹ is differentiable on (c, d)
with

(f⁻¹)′(x) = 1/f′(f⁻¹(x)).
We shall give a different proof of this result in a more general (and, I would
claim, more instructive) context in Theorem 13.1.13. Traditionally, the one
dimensional inverse function theorem is illustrated, as in Figure 5.1, by taking
the graph y = f(x) with tangent shown at (f⁻¹(x₀), x₀) and reflecting in the
angle bisector of the x and y axes to obtain the graph y = f⁻¹(x) with
tangent shown at (x₀, f⁻¹(x₀)).
Although the picture is suggestive, this is one of those cases where (at
the level of proof we wish to use) a simple picture is inadequate.




Figure 5.1: The one dimensional inverse function theorem

Exercise 5.6.9. Go through Exercise 5.6.8 and note where you used the
mean value theorem and the intermediate value theorem.

Exercise 5.6.10. (i) Write A = {x ∈ Q : 2 ≥ x ≥ 1} and B = {x ∈
Q : 4 ≥ x ≥ 1}. Define f : A → B by f(x) = x². Show that f is strictly
increasing on A, that f(1) = 1 and f(2) = 4, that f is differentiable on A
with f′(x) ≥ 2 for all x ∈ A and that f : A → B is injective yet f is not
surjective.
(ii) Define f : Q → Q by

f(x) = x + 1   for x < 0, x² > 2,
f(x) = x       for x² < 2,
f(x) = x − 1   for x > 0, x² > 2.

Show that f(x) → −∞ as x → −∞, that f(x) → ∞ as x → ∞, that f
is everywhere differentiable with f′(x) = 1 for all x and that f : Q → Q is
surjective yet f is not injective.⁸

⁸ These examples do not exhaust the ways in which Figure 5.1 is an inadequate guide
to what can happen without the fundamental axiom of analysis [32].

Initially we defined the exponential and trigonometric functions as maps
C → C although we did not make much use of this (they are very important
in more advanced work) and switched rapidly to maps R → R. We did
nothing of this sort for the logarithm.
The most obvious attempt to define a complex logarithm fails at the first
hurdle. We showed that, working over R, the map exp : R → (0, ∞) is
bijective, so that we could define log as the inverse function. However, we
know (see Exercise 5.5.7) that, working over C, the map exp : C → C \ {0}
is surjective but not injective, so no inverse function exists.
Exercise 5.6.11. By using the fact that exp 2πi = 1 = exp 0, show that
there cannot exist a function L : C \ {0} → C with L(exp z) = z for all
z ∈ C.
However, a one-sided inverse can exist.
Exercise 5.6.12. (i) If we set L₀(r exp iθ) = log r + iθ for r > 0 and 2π >
θ ≥ 0, show that L₀ : C \ {0} → C is a well defined function with exp(L₀(z)) =
z for all z ∈ C \ {0}.
(ii) Let n be an integer. If we set Lₙ(r exp iθ) = L₀(r exp iθ) + 2πin, show
that Lₙ : C \ {0} → C is a well defined function with exp(Lₙ(z)) = z for all
z ∈ C \ {0}.
(iii) If we set M(r exp iθ) = log r + iθ for r > 0 and 3π > θ ≥ π, show
that M : C \ {0} → C is a well defined function with exp(M(z)) = z for all
z ∈ C \ {0}.
The functions Lₙ and M in the last exercise are not continuous everywhere
and it is natural to ask if there is a continuous function L : C \ {0} → C
with exp(L(z)) = z for all z ∈ C \ {0}. The reader should convince herself,
by trying to define L(exp iθ) and considering what happens as θ runs from 0
to 2π, that this is not possible. The next exercise crystallises the ideas.
Exercise 5.6.13. Suppose, if possible, that there exists a continuous L :
C \ {0} → C with exp(L(z)) = z for all z ∈ C \ {0}.
(i) If θ is real, show that L(exp(iθ)) = i(θ + 2πn(θ)) for some n(θ) ∈ Z.
(ii) Define f : R → R by

f(θ) = (1/(2π)) ((L(exp iθ) − L(1))/i − θ).

Show that f is a well defined continuous function, that f(θ) ∈ Z for all θ ∈ R,
that f(0) = 0 and that f(2π) = −1.
(iii) Show that the statements made in the last sentence of (ii) are incompatible
with the intermediate value theorem and deduce that no function
can exist with the supposed properties of L.
(iv) Discuss informally what connection, if any, the discussion above has
with the existence of the international date line.

Exercise 5.6.13 is not an end but a beginning of much important mathematics.
In due course it will be necessary for the reader to understand both
the formal proof that, and the informal reasons why, no continuous L can
exist.


5.7 Powers

How should we define a^b for a > 0 and b any real number? Most people
would say that we should first define a^b for b rational and then extend 'by
continuity' to non-rational b. This can be done, even with the few tools at
our disposal, but it requires hard work to define a^b this way and still more
hard work to obtain its properties. When we have more powerful tools at
our disposal (uniform convergence and the associated theorems) we shall see
how to make this programme work in Exercises K.227 to K.229 but, even
then, it requires careful thought.
There are, I think, various reasons why the direct approach is hard.
(1) The first point is mainly psychological. We need to consider a^b as a
function of two variables a and b. When we define a^n, we think of the integer
n as fixed and a as varying, and the same is true when we define a^b with b
rational. However, when we want to define a^b 'by continuity', we think of a
as fixed and b as varying.
(2) The second point is mathematical. The fact that a function is continuous
on the rationals does not mean that it has a continuous extension
to the reals.⁹ Consider our standard example, the function f : Q → Q of
Example 1.1.3. We know that f is continuous but there is no continuous
function F : R → R with F(x) = f(x) for x ∈ Q.

Exercise 5.7.1. (i) Prove this statement by observing that, if F is continuous,
F(x_n) → F(2^{1/2}) whenever x_n → 2^{1/2}, or otherwise.
(ii) Find a function g : Q → Q which is differentiable with continuous
derivative such that there is a continuous function G : R → R with G(x) =
g(x) for x ∈ Q but any such function G is not everywhere differentiable.

However, the fact that I think something is hard does not prove that it
is hard. I suggest that the reader try it for herself. (She may well succeed;
all that is required is perseverance and a cool head. I simply claim that the
exercise is hard, not that it is impossible.)

⁹ I can still remember being scolded by my research supervisor for making this particular
mistake. (The result is true if we replace 'continuity' by 'uniform continuity'. See
Exercise K.56.)

Assuming that the reader agrees with me, can we find another approach?
We obtained the exponential and trigonometric functions as the solutions of
differential equations. How does this approach work here? The natural choice
of differential equation, if we wish to obtain y(x) = x^α, is

x y′(x) = α y(x)                                   (⋆)

(Here α is real and y : (0, ∞) → (0, ∞).)
Tentative solution. We can rewrite (⋆) as

y′(x)/y(x) − α/x = 0.

Using the properties of the logarithm and the chain rule, this gives

d/dx (log y(x) − α log x) = 0

so, by the mean value theorem,

log y(x) − α log x = C

where C is a constant. Applying the exponential function and taking A =
exp C, we obtain

y(x) = A exp(α log x)

where A is a constant.
Exercise 5.7.2. Check, by using the chain rule, that y(x) = A exp(α log x)
is indeed a solution of (⋆).
This suggests very strongly indeed that we should define x^α = exp(α log x).
In order to avoid confusion, we adopt our usual policy of light disguise
and investigate the properties of the functions r_α : (0, ∞) → (0, ∞) defined
by r_α(x) = exp(α log x) [α real].
Exercise 5.7.3. (Index laws.) If α, β ∈ R, show that
(i) r_{α+β}(x) = r_α(x) r_β(x) for all x > 0.
(ii) r_{αβ}(x) = r_α(r_β(x)) for all x > 0.
Exercise 5.7.4. (Consistency.) Suppose that n, p and q are integers with
n ≥ 0 and q > 0. Show that
(i) r₁(x) = x for all x > 0.
(ii) r_{n+1}(x) = x r_n(x) for all x > 0.
(iii) r_n(x) = x × x × ··· × x (n factors) for all x > 0.
(iv) r_{−n}(x) = 1/r_n(x) for all x > 0.
(v) r_q(r_{p/q}(x)) = r_p(x) for all x > 0.
Explain briefly why this means that writing r_{p/q}(x) = x^{p/q} is consistent
with your previous school terminology.
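A quick numerical sanity check (a sketch of my own) of the definition r_α(x) = exp(α log x) and of the index laws of Exercise 5.7.3:

```python
from math import exp, log

def r(alpha, x):
    """r_alpha(x) = exp(alpha * log x), defined for x > 0."""
    return exp(alpha * log(x))

x, a, b = 2.3, 0.7, -1.9
print(r(a + b, x), r(a, x) * r(b, x))  # index law (i)
print(r(a * b, x), r(a, r(b, x)))      # index law (ii)
print(r(3, x), x * x * x)              # consistency with integer powers
print(r(0.5, x) ** 2, x)               # r_{1/2} is the square root
```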

Exercise 5.7.5. Suppose that α is real. Show that
(i) r_α(xy) = r_α(x) r_α(y) for all x, y > 0.
(ii) r₀(x) = 1 for all x > 0.
(iii) r_α is everywhere differentiable and x r_α′(x) = α r_α(x) and r_α′(x) =
α r_{α−1}(x) for all x > 0.

Exercise 5.7.6. (i) If x > 0 is fixed, show that r_α(x) is a differentiable
function of α with

d/dα r_α(x) = r_α(x) log x.

(ii) If α > 0 and α is kept fixed, show that r_α(x) is an increasing function
of x. What happens if α < 0?
(iii) If x > 1 and x is kept fixed, show that r_α(x) is an increasing function
of α. What happens if 0 < x < 1?
(iv) If we write e = exp 1, show that exp x = r_x(e) (or, in more familiar
terms, exp x = e^x).

Exercise 5.7.7. Take two rulers A and B marked in centimeters (or some
other convenient unit) and lay them marked edge to marked edge. If we slide
the point marked 0 on ruler B until it is opposite the point marked x on ruler
A, then the point marked y on ruler B will be opposite the point marked x + y
on ruler A. We have invented an adding machine.
Now produce a new ruler A′ by renaming the point marked x as 10^x (thus
the point marked 0 on A becomes the point marked 1 on A′ and the point
marked 3 on A becomes the point marked 1000 on A′). Obtain B′ from B in
the same way. If we slide the point marked 1 on ruler B′ until it is opposite
the point marked 10^x on ruler A′, then the point marked 10^y on ruler B′ will
be opposite the point marked 10^{x+y} on ruler A′. Explain why, if a, b > 0 and
we slide the point marked 1 on ruler B′ until it is opposite the point marked
a on ruler A′, then the point marked b on ruler B′ will be opposite the point
marked ab on ruler A′. We have invented a multiplying machine.
(i) How would you divide a by b using this machine?
(ii) Does the number 10 play an essential role in the device?
(iii) Draw a line segment CD of some convenient length to represent the
ruler A′. If C corresponds to 1 and D to 10, draw, as accurately as you can,
the points corresponding to 2, 3, . . . , 9 (see the sketch after this exercise).
The device we have described was invented by Oughtred some years after
Napier's discovery of the logarithm and forms the basis for the 'slide rule'.
From 1860 to 1960 the slide rule was the emblem of the mathematically competent
engineer. It allowed fast and reasonably accurate 'back of an envelope'
calculations.
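For part (iii), the positions are proportional to log₁₀ of the number marked. A tiny sketch of my own, computing them for a segment CD of an assumed convenient length of 25 cm:

```python
from math import log10

length_cm = 25.0   # an assumed length for the segment CD
for k in range(2, 10):
    print(k, round(length_cm * log10(k), 2))  # distance of the mark for k from C, in cm
```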

Exercise 5.7.8. By imitating the argument of Exercise 5.6.13, show that
there is no continuous function S : C → C with S(z)² = z for all z ∈ C.
(In other words, we cannot define a well behaved square root function on the
complex plane.)

Exercise 5.7.9. Exercise 5.7.8 shows, I think, that we cannot hope to extend
our definition of r_α(x) with x real and strictly positive and α real to
some well behaved r_α(z) with α and z both complex. We can, however, extend
our definition to the case when x is still real and strictly positive but we
allow α to be complex. Our definition remains the same

r_α(x) = exp(α log x)

but only some of our previous statements carry over.
(i) If α, β ∈ C, show that r_{α+β}(x) = r_α(x) r_β(x) for all x > 0. Thus
part (i) of Exercise 5.7.3 carries over.
(ii) Explain carefully why the statement in part (ii) of Exercise 5.7.3

r_{αβ}(x) ?= r_α(r_β(x))

makes no sense (within the context of this question) if we allow α and β to
range freely over C. Does it make sense and is it true if β ∈ R and α ∈ C?
Does it make sense and is it true if α ∈ R and β ∈ C?
(iii) Find which parts of Exercises 5.7.5 and 5.7.6 continue to make sense
in the more general context of this question and prove them.
(iv) Show that, if u and v are real and e = exp(1), then exp(u + iv) =
r_{u+iv}(e). We have thus converted the mnemonic

exp(z) = e^z

into a genuine equality.

Exercise 5.7.10. According to a well known story¹⁰, the Harvard mathematician
Benjamin Peirce chalked the formula

e^{iπ} + 1 = 0

on the board and addressed his students as follows.

Gentlemen, that is surely true, it is absolutely paradoxical; we
cannot understand it, and we do not know what it means, but we
have proved it, and therefore we know it must be the truth.

(i) In the context of this chapter, what information is conveyed by the
formula

exp(iπ) + 1 = 0?

(What does exp mean, what does π mean and what does exp(iπ) mean?)
(ii) In the context of this chapter, what information is conveyed by the
formula

e^{iπ} + 1 = 0?

There is a superb discussion of the problem of defining x^α in Klein's
Elementary Mathematics from an Advanced Standpoint [28].

¹⁰ Repeated in Martin Gardner's Mathematical Diversions. See also Exercise K.89.


5.8 The fundamental theorem of algebra

It is in the nature of a book like this that much of our time is occupied in
proving results which the 'physicist in the street' would consider obvious. In
this section we prove a result which is less obvious.

Theorem 5.8.1. (The fundamental theorem of algebra.) Suppose that
n ≥ 1, a₀, a₁, . . . , a_n ∈ C and a_n ≠ 0. Then the equation

a_n z^n + a_{n−1} z^{n−1} + ··· + a₀ = 0

has at least one root in C.

In other words, every polynomial has a root in C.
If the reader believes that this is obvious, then she should stop reading
at this point and write down the 'obvious argument'. In fact, Leibniz and
other mathematicians doubted the truth of the result. Although d'Alembert,
Euler and Lagrange offered proofs of the result, they were unsatisfactory and
the first satisfactory discussion is due to Gauss¹¹.
The first point to realise is that the 'fundamental theorem of algebra' is
in fact a theorem of analysis!

Exercise 5.8.2. Suppose z = u + iv with u, v ∈ R. If z² − 2 = 0, show that

u² − v² = 2,   uv = 0

and deduce that v = 0, u² = 2.
If we write

Q + iQ = {x + iy : x, y ∈ Q},

show that the equation

z² − 2 = 0

has no solution with z ∈ Q + iQ.

Since Q + iQ and C = R + iR share the same algebraic structure, Exercise 5.8.2
shows that the truth of Theorem 5.8.1 must depend in some way on
the fundamental axiom of analysis. We shall use Theorem 4.3.4, which states
that any continuous function on a closed bounded set in R^n has a minimum,
to establish the following key step of our proof.

Lemma 5.8.3. If P is a polynomial, then there exists a z₀ ∈ C such that

|P(z)| ≥ |P(z₀)|

for all z ∈ C.

We then complete the proof by establishing the following lemma.

Lemma 5.8.4. If P is a non-constant polynomial and |P| attains a minimum
at z₀, then P(z₀) = 0.

Clearly, Lemmas 5.8.3 and 5.8.4 together imply Theorem 5.8.1. Our
proofs of the two lemmas make use of simple results given in the next exercise.

¹¹ See [29], Chapter 19, section 4 and Chapter 25, sections 1 and 2.

Exercise 5.8.5. (i) Let P(z) = ∑_{j=0}^{n} a_j z^j with n ≥ 1 and a_n ≠ 0. Show
that, if we set R₀ = 2n|a_n|^{−1}(1 + max_{0≤j≤n} |a_j|), then, whenever |z| ≥ R₀,

|a_j|/|z|^{n−j} ≤ |a_j|/R₀ ≤ |a_n|/(2n)

for all 0 ≤ j ≤ n − 1. Hence, or otherwise, show that

|a_n + ∑_{j=0}^{n−1} a_j/z^{n−j}| ≥ |a_n|/2

and so

|∑_{j=0}^{n} a_j z^j| ≥ |a_n||z|^n/2

for all |z| ≥ R₀.
(ii) By using the result of (i), show that, given any real number K ≥ 0,
we can find an R(K) > 0 such that |P(z)| ≥ K whenever |z| ≥ R(K).
(iii) Let Q(z) = ∑_{j=k}^{n} b_j z^j with n ≥ k ≥ 1 and b_k ≠ 0. Show that there
exists an η₀ > 0 such that

|∑_{j=k+1}^{n} b_j z^j| ≤ |b_k||z|^k/2

for all |z| ≤ η₀.
Proof of Lemma 5.8.3. We wish to show that, if P is any polynomial, then
|P| has a minimum. If P is a constant polynomial there is nothing to prove,
so we may suppose that P(z) = ∑_{j=0}^{n} a_j z^j with n ≥ 1 and a_n ≠ 0. By
Exercise 5.8.5 (ii), we can find an R > 0 such that |P(z)| ≥ |P(0)| + 1
whenever |z| ≥ R.
Identifying C with R² in the usual way, we observe that

D̄_R = {z ∈ C : |z| ≤ R}

is a closed bounded set and that the function |P| : C → R is continuous.
Thus we may use Theorem 4.3.4, which states that a continuous function on
a closed bounded set attains its minimum, to show the existence of a z₀ ∈ D̄_R
with |P(z₀)| ≤ |P(z)| for all z ∈ D̄_R.
We note, in particular, that |P(z₀)| ≤ |P(0)|. Thus, if |z| ≥ R, then

|P(z)| ≥ |P(0)| + 1 > |P(0)| ≥ |P(z₀)|.

It follows that |P(z₀)| ≤ |P(z)| for all z ∈ C, as required.

Exercise 5.8.6. Define f : C → R by f(z) = −|z|². Show that f attains a
minimum on every set

D̄_R = {z ∈ C : |z| ≤ R}

but has no minimum on C. Explain briefly why the proof above works for |P|
but not for f.
We must now show that, if z0 gives the minimum value of the modulus |P |
of a non-constant polynomial P , then P (z0 ) = 0. We start with a collection
of remarks intended to simplify the algebra.
Exercise 5.8.7. (i) Let P be a non-constant polynomial whose modulus |P|
has a minimum at z₀. Show that if Q(z) = P(z + z₀), then Q is a non-constant
polynomial whose modulus |Q| has a minimum at 0. Show further
that, if Q(0) = 0, then P(z₀) = 0.
(ii) Let Q be a non-constant polynomial whose modulus |Q| has a minimum
at 0. Show that, for an appropriate φ ∈ R, to be defined, the function
R(z) = e^{iφ}Q(z) has R(0) real and positive¹². Show that R is a non-constant
polynomial whose modulus |R| has a minimum at 0 and that, if R(0) = 0,
then Q(0) = 0.
(iii) Let R be a non-constant polynomial whose modulus |R| has a minimum
at 0 and such that R(0) is real and positive. Explain why we have

R(z) = a₀ + ∑_{j=k}^{n} a_j z^j

where a₀ is real and positive, k ≥ 1 and a_k ≠ 0. Set S(z) = R(e^{iψ}z). Show
that, for an appropriate ψ ∈ R, to be defined,

S(z) = b₀ + ∑_{j=k}^{n} b_j z^j

where b₀ is real and positive, k ≥ 1 and b_k is real and strictly negative (that
is, b_k < 0).
Most mathematicians would consider the results of Exercise 5.8.7 to be
trivial and use a phrase like 'Without loss of generality we may suppose that
z₀ = 0 and P(z) = a₀ + ∑_{j=k}^{n} a_j z^j where a₀ is real and positive, k ≥ 1 and
a_k is real and strictly negative' or (better) 'By considering e^{iφ}P(e^{iψ}(z − z₀))
we may suppose that z₀ = 0 and P(z) = a₀ + ∑_{j=k}^{n} a_j z^j where a₀ is real and
positive, k ≥ 1 and a_k is real and strictly negative'.

¹² That is to say, non-negative.

Proof of Lemma 5.8.4. We want to show that if P is a non-constant polynomial
and z₀ gives a minimum of |P|, then P(z₀) = 0. Without loss of
generality, we may suppose that z₀ = 0 and P(z) = a₀ + ∑_{j=k}^{n} a_j z^j where a₀
is real and positive, k ≥ 1 and a_k is real and strictly negative. If a₀ = 0 then
P(0) = 0 and we are done. We suppose that a₀ is strictly positive and seek
a contradiction.
By Exercise 5.8.5, we can find an η₀ > 0 such that

|∑_{j=k+1}^{n} a_j z^j| ≤ |a_k z^k|/2

for all |z| ≤ η₀. Now choose η₁, a real number with 0 < η₁ ≤ η₀ and
a₀ > |a_k|η₁^k/2 (η₁ = min(η₀, 1, −a₀/(2a_k)) will do). Remembering that a₀ is
real and strictly positive and a_k is real and strictly negative, we see that,
whenever η is real and 0 < η < η₁, we have

|P(η)| = |a₀ + ∑_{j=k}^{n} a_j η^j| ≤ |a₀ + a_k η^k| + |∑_{j=k+1}^{n} a_j η^j|
       ≤ |a₀ + a_k η^k| + |a_k η^k|/2 = a₀ + a_k η^k − a_k η^k/2 = a₀ + a_k η^k/2 < P(0),

contradicting the statement that 0 is a minimum for |P|. The result follows
by reductio ad absurdum.
The proof of Theorem 5.8.1 may look a little complicated but really it
only amounts to a fleshing out of the following sketch argument.
Outline proof of Theorem 5.8.1. Let P be a non-constant polynomial. Since
|P(z)| → ∞ as |z| → ∞, |P| must attain a minimum. By translation, we may
suppose that the minimum occurs at 0. If P(0) ≠ 0, then

P(z) = a₀ + ∑_{j=k}^{n} a_j z^j

with k ≥ 1 and a₀, a_k ≠ 0. Close to zero,

P(z) ≈ a₀ + a_k z^k.

Choosing an appropriate φ, we have |a₀ + a_k(e^{iφ}η)^k| < |a₀| whenever η is
small and strictly positive, contradicting the statement that |P| attains a
minimum at 0. The result follows by reductio ad absurdum.
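The minimisation idea behind the proof can be tried out numerically. The sketch below (my own illustration, with an assumed example polynomial) evaluates |P| on a grid covering a large disc and keeps the grid point where |P| is smallest; by Lemmas 5.8.3 and 5.8.4 the true minimiser of |P| is a root, so the grid minimiser sits near a root and |P| there is already small.

```python
def P(z):
    """An example polynomial, z**3 - 2*z + 2 (my own choice for illustration)."""
    return z**3 - 2*z + 2

# Exercise 5.8.5(ii) guarantees |P| is large outside a big enough disc,
# so searching the disc |z| <= R with R = 3 is enough for this P.
R, N = 3.0, 400
best, best_val = 0j, abs(P(0j))
for i in range(-N, N + 1):
    for j in range(-N, N + 1):
        z = complex(i * R / N, j * R / N)
        val = abs(P(z))
        if val < best_val:
            best, best_val = z, val
print(best, best_val)   # a point close to one of the roots of P, with |P| correspondingly small
```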
Exercise 5.8.8. Give an explicit value for φ in the outline proof just sketched.

Exercise 5.8.9. We say that z₀ is a local minimum of a function G : C → R
if we can find a δ > 0 such that G(z) ≥ G(z₀) for all z with |z − z₀| < δ.
Show that if P is a non-constant polynomial and z₀ is a local minimum of
|P|, then P(z₀) = 0.

We have already used the strategy of looking for a minimum (or maximum)
and then considering the behaviour of the function near that 'extreme'
point in our proof of Rolle's theorem (Theorem 4.4.4). Another example occurs
in Exercise K.30 if the reader wishes to try it and other examples will
crop up in this book. The method is very powerful but we must be careful to
establish that an extreme point actually exists (see, as a warning example,
the discussion beginning on page 199 of a counterexample due, essentially,
to Weierstrass). Notice that our proof required the ability to 'look in all
directions'. The minimum had to be in the open set

D_R = {z ∈ C : |z| < R}

and not merely in the set

D̄_R = {z ∈ C : |z| ≤ R}.

Exercise 5.8.10. This exercise recalls material that is probably familiar from
algebra. We work in C.
(i) Show, by induction on the degree of P, or otherwise, that if P is a
non-constant polynomial and λ ∈ C, then there exists a polynomial Q and
an r ∈ C such that

P(z) = (z − λ)Q(z) + r.

(ii) If P is a non-constant polynomial and λ ∈ C is such that P(λ) = 0,
then there is a polynomial Q such that

P(z) = (z − λ)Q(z).

(iii) Use the fundamental theorem of algebra and induction on the degree
n to show that any polynomial P of degree n can be written in the form

P(z) = a ∏_{j=1}^{n} (z − λ_j).

(iv) Show that a polynomial of degree n can have at most n distinct roots.
What is the minimum number of distinct roots it can have?

(v) If P has real coefficients, show¹³ that P(z)* = P(z*) and deduce that,
if λ is a root of P, so is λ*.
(vi) Use part (v) and induction to show that, if P is a polynomial with
real coefficients, then P can be written in the form

P(z) = a ∏_{j=1}^{m} Q_j(z)

where a ∈ R and, for each j, either Q_j(z) = z + a_j with a_j ∈ R, or Q_j(z) =
z² + a_j z + b_j with a_j, b_j ∈ R.
In the days before mathematicians acquired our present confidence with
complex numbers, the fundamental theorem of algebra was given the less general
statement that any polynomial with real coefficients could be written as
the product of linear and quadratic terms with real coefficients.
It is natural to ask if this restricted result which does not mention complex
numbers can be proved without using complex numbers. Gauss's first proof of
the restricted result used complex numbers but he later gave a second proof
without using complex numbers which depends only on the fact that a real
polynomial of odd degree must have a root (Exercise 1.6.4) and so uses the
fundamental axiom in the form of the intermediate value theorem. As might
be expected, his proof and its modern successors are rather subtle. The reader
is advised to wait until she has studied the rudiments of Galois theory before
pursuing these ideas further.
Exercise 5.8.11. Let P(z) = ∑_{j=0}^{n} a_j z^j be a non-constant polynomial with
a root at z₀.
(i) Explain why we can find an η₀ > 0 such that P(z) ≠ 0 for all z with
0 < |z − z₀| < η₀.
(ii) If 0 < η < η₀, use the fact that a continuous function on a closed
bounded set is bounded and attains its bounds to show that there is a δ(η) > 0
such that |P(z)| ≥ δ(η) > 0 for all z with |z − z₀| = η.
(iii) Continuing with the notations and assumptions of (ii), show that if
Q(z) is a polynomial with |P(z) − Q(z)| < δ(η)/2 for all z with |z − z₀| ≤ η,
then |Q| has a local minimum (and so Q has a root) z₁ with |z₁ − z₀| < η.
(iv) Show that, given any δ > 0, we can find an ε > 0 (depending on δ, n,
a₀, a₁, . . . , a_n) such that, if |a_j − b_j| < ε for 0 ≤ j ≤ n, then ∑_{j=0}^{n} b_j z^j has
at least one root z₁ with |z₀ − z₁| < δ.
[Note that this result is not true if we work over R. The equation x² = 0
has a real root at 0 but x² + ε = 0 has no real roots if ε > 0, however small ε
may be.]

¹³ We write z* for the complex conjugate of z. Thus, if x and y are real, (x + iy)* = x − iy.
Some authors use z̄.

Exercise 5.8.12. (This exercise requires countability and a certain willingness
to think like an algebraist.)
It is sometimes said that we have to introduce R in order to provide
equations like x² − 2 = 0 with a root. A little thought shows that this is too
simple a view of the matter. Recall that a system (F, +, ×) satisfying all the
axioms set out in Axioms A except axioms P1 to P4 (the axioms of order) is
called a field. If (F, +, ×) is a field and G ⊆ F is such that
(a) 0, 1, −1 ∈ G,
(b) if x, y ∈ G, then x + y, xy ∈ G,
(c) if x ∈ G and x ≠ 0, then x⁻¹ ∈ G,
then we say that G is a subfield of F. It is easy to see that a subfield is itself
a field. In this exercise we show that there is a countable subfield L of C
containing Q and such that, if a₀, a₁, . . . , a_n ∈ L, with a_n ≠ 0, then we can
find a, λ₁, . . . , λ_n ∈ L such that

∑_{j=0}^{n} a_j z^j = a ∏_{j=1}^{n} (z − λ_j)

for all z ∈ L. In other words, every polynomial with coefficients in L has all
its roots in L. Here are the steps in the proof.
(i) If K is a countable subfield of C, show that the set of polynomials
of degree n with coefficients in K is countable. Deduce that the set of
polynomials P(K) with coefficients in K is countable. Show also that the set
Z(K) of roots in C of polynomials in P(K) is countable.
(ii) If K is a subfield of C and ω ∈ C, we write K(ω) for the set of
numbers P(ω)/Q(ω) with P, Q ∈ P(K) and Q(ω) ≠ 0. Show that K(ω) is a
subfield of C containing K and ω. If K is countable, show that K(ω) is.
(iii) Let K be a subfield of C and ω = (ω₁, ω₂, . . . ) where ω_j ∈ C. Set
K₀ = K and define K_n = K_{n−1}(ω_n) for all n ≥ 1. If we set K(ω) = ⋃_{n=0}^{∞} K_n,
show that K(ω) is a subfield of C containing K and ω_j for each j ≥ 1. If K
is countable, show that K(ω) is.
(iv) Let K be a countable subfield of C (we could take K = Q). Set
K₀ = K. Show by induction, using part (iii), that we may define inductively
a sequence K_n of countable subfields of C such that K_n contains Z(K_{n−1}) for
each n ≥ 1. If we set L = ⋃_{n=0}^{∞} K_n, show that L is a countable subfield of C
such that every polynomial with coefficients in L has all its roots in L.
[We say that fields like L are 'algebraically closed'. The work we have
had to do to obtain an 'algebraically closed' L from K shows the fundamental
theorem of algebra in a remarkable light. Although R is not algebraically
closed, adjoining a single root i of a single equation z² + 1 = 0 to form
R(i) = C produces an algebraically closed field!]
Chapter 6

Differentiation

6.1 Preliminaries
This section is as much propaganda as technical mathematics and, as with
much propaganda, most points are made more than once.
We have already looked briefly at differentiation of functions f : R → R.
Unfortunately, nature is not one dimensional and we must consider the more
general case of a function f : R^m → R^p. The definition of the derivative in
terms of the limit of some ratio is not available since we cannot divide by
vectors.
The first solution that mathematicians found to this problem is via 'directional
derivatives' or, essentially equivalently, via 'partial derivatives'. We
shall give formal definitions later but the idea is to reduce a many dimensional
problem to a collection of one dimensional problems by only examining
changes in one direction at a time. Suppose, for example, that f : R^m → R
is well behaved. If we wish to examine how f behaves near x we choose a unit
vector u and look at f_u(t) = f(x + tu) with t ∈ R. The function f_u : R → R
is 'one dimensional' and we may look at its derivative

f_u′(x) = lim_{h→0} (f(x + hu) − f(x))/h.

By choosing m unit vectors u_j at right angles and looking at the associated
'directional derivatives' f_{u_j}′(x) we can obtain a picture of the way in which
f changes.
But to echo Maxwell

. . . the doctrine of Vectors . . . is a method of thinking and not,
at least for the present generation, a method of saving thought.
It does not, like some more popular mathematical methods, encourage
the hope that mathematicians may give their minds a
holiday, by transferring all their work to their pens. It calls on us
at every step to form a mental image of the geometrical features
represented by the symbols, so that in studying geometry by this
method we have our minds engaged with geometrical ideas, and
are not permitted to call ourselves geometers when we are only
arithmeticians. (Page 951, [38])

Is there a less 'coordinate bound' and more 'geometric' way of looking at
differentiation in many dimensions? If we are prepared to spend a little time
and effort acquiring new habits of thought, the answer is yes.
The original discoverers of the calculus thought of differentiation as the
process of finding a tangent. If f : R → R is well behaved then the tangent
at x is the line y = b + a(t − x) which touches the curve y = f(t) at (x, f(x)),
that is, the 'line which most resembles f close to x'. In other words

f(t) = b + a(t − x) + small error

close to x. If we think a little harder about the nature of the 'smallest error'
possible we see that it 'ought to decrease faster than linear', that is

f(t) = b + a(t − x) + E(t)|t − x|

with E(t) → 0 as t → x.

Exercise 6.1.1. Suppose that f : R → R. Show that the following two
statements are equivalent.
(i) (f(t) − f(x))/(t − x) → a as t → x.
(ii) f(t) = f(x) + a(t − x) + E(t)|t − x| with E(t) → 0 as t → x.
Rewriting our equations slightly, we see that f is differentiable at x if

f(t) − f(x) = a(t − x) + E(t)|t − x|

with E(t) → 0 as t → x. A final rewrite now gives: f is differentiable at x if

f(x + h) − f(x) = ah + ε(h)|h|,

where ε(h) → 0 as h → 0. The derivative f′(x) = a is the slope of the
tangent at x.
The obvious extension to well behaved functions f : R^m → R is to consider
the tangent plane at (x, f(x)). Just as the equation of a non-vertical
line through the origin in R × R is y = bt, so the equation of an appropriate
plane (or 'hyperplane' if the reader prefers) in R^m × R is y = α(x) where
α : R^m → R is linear. This suggests that we say that f is differentiable at x
if

f(x + h) − f(x) = α(h) + ε(h)‖h‖,

where ε(h) → 0 as h → 0. It is natural to call α the derivative of f at x.
Finally, if we consider f : R^m → R^p, the natural flow of our argument
suggests that we say that f is differentiable at x if we can find a linear map
α : R^m → R^p such that

f(x + h) = f(x) + α(h) + ε(h)‖h‖,

where ε(h) → 0 as h → 0. It is natural to call α the derivative of f at x.
Important note: It is indeed natural to call α the derivative of f at x.
Unfortunately, it is not consistent with our old definition in the case m =
p = 1. If f : R → R, then, with our new definition, the derivative is the map
t ↦ f′(x)t but, with our old, the derivative is the number f′(x).
From the point of view we have adopted, the key observation of the one
dimensional differential calculus is that well behaved curves, however complicated
they may be globally, behave locally like straight lines, i.e. like the
simplest curves we know. The key observation of multidimensional calculus
is that well behaved functions, however complicated they may be globally,
behave locally like linear maps, i.e. like the simplest functions we know. It
is this observation, above all, which justifies the immense amount of time
spent studying linear algebra, that is to say, studying the behaviour of linear
maps.
I shall assume that the reader has done a course on linear algebra and
is familiar with the definition and lemma that follow. (Indeed, I have
already assumed familiarity with the notion of a linear map.)
Definition 6.1.2. We say that a function (or map) α : R^m → R^p is linear
if

α(λx + µy) = λα(x) + µα(y)

for all x, y ∈ R^m and λ, µ ∈ R.
We shall often write αx = α(x).

Lemma 6.1.3. Each linear map α : R^m → R^p is associated with a unique
p × m real matrix A = (a_{ij}) such that, if αx = y, then

y_i = ∑_{j=1}^{m} a_{ij} x_j.                          (†)

Conversely, each p × m real matrix A = (a_{ij}) is associated with a unique linear
map α : R^m → R^p by the equation (†).
We shall call A the matrix of α with respect to the standard bases. The
point to notice is that, if we take different coordinate axes, we get different
matrices associated with the same linear map.
From time to time, particularly in some of the exercises, we shall use other
facts about linear maps. The reader should not worry too much if some of
these facts are unfamiliar but she should worry if all of them are.
We now repeat the discussion of differentiation with marginally more
generality and precision.
A function is continuous if it is locally approximately constant. A function
is differentiable if it is locally approximately linear. More precisely, a function
is continuous at a point x if it is locally approximately constant, with an error
which decreases to zero, as we approach x. A function is differentiable at a
point x if it is locally approximately linear, with an error which decreases to
zero faster than linearly, as we approach x.
Definition 6.1.4. Suppose that E is a subset of R^m and x a point such that
there exists a δ > 0 with the ball B(x, δ) ⊆ E. We say that f : E → R^p is
differentiable at x if we can find a linear map α : R^m → R^p such that, when
‖h‖ < δ,

f(x + h) = f(x) + αh + ε(x, h)‖h‖,                  (⋆)

where ε(x, h) → 0 as h → 0. We write α = Df(x) or α = f′(x).
If E is open and f is differentiable at each point of E, we say that f is
differentiable on E.

Needless to say, the centre of the definition is the formula (⋆) and the
reader should concentrate on understanding the rôle of each term in that
formula. The rest of the definition is just supporting waffle. Formula (⋆) is
sometimes written in the form

(f(x + h) − f(x) − αh)/‖h‖ → 0

as h → 0.
Of course, we need to complete Definition 6.1.4 by showing that α is
unique.

Lemma 6.1.5. (i) Let γ : R^m → R^p be a linear map and ε : R^m → R^p a
function with ε(h) → 0 as h → 0. If

γh = ε(h)‖h‖,

then γ = 0, the zero map.
(ii) There is at most one α satisfying the conditions of Definition 6.1.4.

Proof. (i) There are many different ways of setting out this simple proof.
Here is one. Let x ∈ R^m. If η > 0, we have

γx = η⁻¹γ(ηx) = η⁻¹ε(ηx)‖ηx‖ = ε(ηx)‖x‖

and so

‖γx‖ = ‖ε(ηx)‖‖x‖ → 0

as η → 0 through values η > 0. Thus ‖γx‖ = 0 and γx = 0 for all x ∈ R^m.
In other words, γ = 0.
(ii) Suppose that we can find linear maps α_j : R^m → R^p such that, when
‖h‖ < δ,

f(x + h) = f(x) + α_j h + ε_j(x, h)‖h‖,

where ε_j(x, h) → 0 as h → 0 [j = 1, 2].
Subtracting, we see that

(α₁ − α₂)h = ε(x, h)‖h‖

where

ε(x, h) = ε₂(x, h) − ε₁(x, h)

for ‖h‖ < δ. Since

‖ε(x, h)‖ ≤ ‖ε₁(x, h)‖ + ‖ε₂(x, h)‖ → 0

as h → 0, we can apply part (i) to obtain α₁ = α₂.
The coordinate free approach can be taken only so far, and to calculate
we need to know the matrix A of α = Df(x) with respect to the standard
bases. To find A we have recourse to directional derivatives.

Definition 6.1.6. Suppose that E is a subset of R^m and that we have a
function g : E → R. Suppose further that x ∈ E and u is a unit vector such
that there exists a δ > 0 with x + hu ∈ E for all |h| < δ. We can now define
a function G from the open interval (−δ, δ) to R by setting G(t) = g(x + tu).
If G is differentiable at 0, we say that g has a directional derivative at x in
the direction u of value G′(0).

Exercise 6.1.7. Suppose that E is a subset of R^m and that we have a
function g : E → R. Suppose further that x ∈ E and u is a unit vector such
that there exists a δ > 0 with x + hu ∈ E for all |h| < δ. Show that g has a
directional derivative at x in the direction u of value a if and only if

(g(x + tu) − g(x))/t → a

as t → 0.
We are interested in the directional derivatives along the unit vectors e_j in
the directions of the coordinate axes. The reader is almost certainly familiar
with these under the name of 'partial derivatives'.
Definition 6.1.8. Suppose that E is a subset of R^m and that we have a
function g : E → R. If we give R^m the standard basis e₁, e₂, . . . , e_m (where
e_j is the vector with jth entry 1 and all other entries 0), then the directional
derivative of g at x in the direction e_j is called a partial derivative and written
g_{,j}(x).
The recipe for computing g_{,j}(x) is thus: 'differentiate g(x₁, x₂, . . . , x_j, . . . , x_m)
with respect to x_j, treating all the x_i with i ≠ j as constants'.
The reader would probably prefer me to say that g_{,j}(x) is the partial
derivative of g with respect to x_j and write

g_{,j}(x) = ∂g/∂x_j (x).
I shall use this notation from time to time, but, as I point out in Appendix E,
there are cultural differences between the way that applied mathematicians
and pure mathematicians think of partial derivatives, so I prefer to use a
different notation.
The reader should also know a third notation for partial derivatives:

D_j g = g_{,j}.

This 'D' notation is more common than the 'comma' notation and is to be
preferred if you only use partial derivatives occasionally or if you only deal
with functions f : R^n → R. The 'comma' notation is used in Tensor Analysis
and is convenient in the kind of formulae which appear in Section 7.2.
If E is a subset of R^m and we have a function g : E → R^p then we can
write

g(t) = (g₁(t), g₂(t), . . . , g_p(t))ᵀ

and obtain functions g_i : E → R with partial derivatives (if they exist) g_{i,j}(x)
(or, in more standard notation, ∂g_i/∂x_j (x)). The proof of the next lemma just
consists of dismantling the notation so laboriously constructed in the last
few paragraphs.

Lemma 6.1.9. Let f be as in Definition 6.1.4. If we use standard coordinates
then, if f is differentiable at x, its partial derivatives f_{i,j}(x) exist and
the matrix of the derivative Df(x) is the Jacobian matrix (f_{i,j}(x)) of partial
derivatives.

Proof. Left as a strongly recommended but simple exercise for the reader.

Notice that, if f : R → R, the matrix of Df(x) is the 1 × 1 Jacobian matrix
(f′(x)). Notice also that Lemma 6.1.9 provides an alternative proof of the
uniqueness of the derivative (Lemma 6.1.5 (ii)).
It is customary to point out that the existence of the partial derivatives
does not imply the differentiability of the function (see Example 7.3.14
below) but the main objections to over-reliance on partial derivatives are
that it makes formulae cumbersome and stifles geometric intuition. Let your
motto be 'coordinates and matrices for calculation, vectors and linear
maps for understanding'.
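Lemma 6.1.9 is easy to check numerically for a concrete map. The following sketch (my own illustration, with an assumed example map f : R² → R²) estimates the Jacobian matrix by finite differences and then verifies that f(x + h) − f(x) is well approximated by (Df(x))h for a small h.

```python
import math

def f(x, y):
    """An assumed example map f : R^2 -> R^2, used only for illustration."""
    return (x * math.exp(y), math.sin(x) + x * y)

def jacobian(x, y, d=1e-6):
    """Estimate the Jacobian matrix (f_{i,j}) at (x, y) by finite differences."""
    f0, fx, fy = f(x, y), f(x + d, y), f(x, y + d)
    return [[(fx[i] - f0[i]) / d, (fy[i] - f0[i]) / d] for i in range(2)]

x, y = 0.8, -0.3
J = jacobian(x, y)
hx, hy = 1e-3, 2e-3
f0, f1 = f(x, y), f(x + hx, y + hy)
linear = [J[i][0] * hx + J[i][1] * hy for i in range(2)]
for i in range(2):
    print(f1[i] - f0[i], linear[i])   # the two numbers in each line agree closely
```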


6.2 The operator norm and the chain rule
We shall need some method of measuring the 'size' of a linear map. The
reader is unlikely to have come across this in a standard 'abstract algebra'
course, since algebraists dislike using 'metric notions' which do not generalise
from R to more general fields.
Our first idea might be to use some sort of measure like

‖α‖ = max_{i,j} |a_{ij}|

where (a_{ij}) is the matrix of α with respect to the standard bases. However,
‖α‖ has no geometric meaning.

Exercise 6.2.1. Show by example that ‖α‖ may depend on the coordinate
axes chosen.

Even if we insist that our method of measuring the size of a linear map
shall have a geometric meaning, this does not give a unique method. The
following chain of ideas gives one method which is natural and standard.

Lemma 6.2.2. If α : R^m → R^p is linear, there exists a constant K(α) such
that

‖αx‖ ≤ K(α)‖x‖

for all x ∈ R^m.
