
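\[
\begin{aligned}
\left| e(z) - \sum_{j=0}^{n-1} \frac{z^j}{j!} \right|
&= \left| \sum_{j=n}^{\infty} \frac{z^j}{j!} \right|
\le \sum_{j=n}^{\infty} \frac{|z|^j}{j!} \\
&= \frac{|z|^n}{n!} \sum_{k=0}^{\infty} \frac{|z|^k}{(n+1)(n+2)\cdots(n+k)} \\
&\le \frac{|z|^n}{n!} \sum_{k=0}^{\infty} \left( \frac{|z|}{n+1} \right)^k \\
&\le \frac{|z|^n}{n!} \, \frac{1}{1 - \frac{|z|}{n+1}}
= \frac{(n+1)|z|^n}{(n+1-|z|)\,n!} \\
&\le \frac{2|z|^n}{n!}.
\end{aligned}
\]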

96 A COMPANION TO ANALYSIS

Exercise 5.4.9. A particularly cautious mathematician might prove Lemma 5.4.8 as follows. Set $e_m(z) = \sum_{j=0}^{m} z^j/j!$. Show that, if $m \ge n$, then
\[ |e_m(z) - e_{n-1}(z)| \le \frac{(n+1)|z|^n}{(n+1-|z|)\,n!}. \]
Deduce that
\[ |e(z) - e_{n-1}(z)| \le |e(z) - e_m(z)| + |e_m(z) - e_{n-1}(z)| \le |e(z) - e_m(z)| + \frac{(n+1)|z|^n}{(n+1-|z|)\,n!}. \]
By allowing $m \to \infty$, obtain the required result.
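The bound of Exercise 5.4.9 is easy to test numerically. The following Python sketch is an illustration only and proves nothing: the names `partial_exp`, `remainder_bound` and `check` are ours, and the library function `cmath.exp` is used as a stand-in for $e(z)$, which is an assumption at this stage of the argument.

```python
import cmath
from math import factorial

def partial_exp(z, m):
    """Return e_m(z) = sum_{j=0}^{m} z^j / j!."""
    total = 0j
    term = 1 + 0j  # the j = 0 term z^0 / 0!
    for j in range(m + 1):
        total += term
        term *= z / (j + 1)  # turn z^j / j! into z^(j+1) / (j+1)!
    return total

def remainder_bound(z, n):
    """Return (n+1)|z|^n / ((n+1-|z|) n!), which needs |z| < n + 1."""
    r = abs(z)
    if r >= n + 1:
        raise ValueError("the bound needs |z| < n + 1")
    return (n + 1) * r**n / ((n + 1 - r) * factorial(n))

def check(z, n):
    """Return (actual error, bound) for |e(z) - e_{n-1}(z)|."""
    actual = abs(cmath.exp(z) - partial_exp(z, n - 1))
    return actual, remainder_bound(z, n)
```

For instance, `check(1 + 1j, 5)` returns a pair whose first entry (the observed error) should not exceed the second (the bound).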

We now switch our attention to the restriction of e to R. The results we

expect now come tumbling out.

Exercise 5.4.10. Consider $e : \mathbb{R} \to \mathbb{R}$ given by $e(x) = \sum_{j=0}^{\infty} x^j/j!$.
(i) Using Lemma 5.4.8, show that $|e(h) - 1 - h| \le h^2$ for $|h| < 1/2$. Deduce that $e$ is differentiable at 0 with derivative 1.
(ii) Explain why $e(x+h) - e(x) = e(x)(e(h) - 1)$. Deduce that $e$ is everywhere differentiable with $e'(x) = e(x)$.
(iii) Show that $e(x) \ge 1$ for $x \ge 0$ and, by using the relation $e(-x)e(x) = 1$, or otherwise, show that $e(x) > 0$ for all $x \in \mathbb{R}$.
(iv) Explain why $e$ is a strictly increasing function.
(v) Show that $e(x) \ge x$ for $x \ge 0$ and deduce that $e(x) \to \infty$ as $x \to \infty$. Show also that $e(x) \to 0$ as $x \to -\infty$.
(vi) Use (v) and the intermediate value theorem to show that $e(x) = y$ has a solution for all $y > 0$.
(vii) Use (iv) to show that $e(x) = y$ has at most one solution for all $y > 0$. Conclude that $e$ is a bijective map of $\mathbb{R}$ to $\mathbb{R}^{++} = \{x \in \mathbb{R} : x > 0\}$.
(viii) By modifying the proof of (v), or otherwise, show that $P(x)e(-x) \to 0$ as $x \to \infty$. [We say ‘exponential beats polynomial’.]
(ix) By using (viii), or otherwise, show that $e$ is not equal to any function of the form $P/Q$ with $P$ and $Q$ polynomials. [Thus $e$ is a genuinely new function.]

When trying to prove familiar properties of a familiar function, it is probably wise to use a slightly unfamiliar notation. However, as the reader will have realised from the start, the function $e$ is our old friend exp. We shall revert to the mild disguise in the next section but we use standard notation for the rest of this one.


Please send corrections however trivial to twk@dpmms.cam.ac.uk

Exercise 5.4.11. (i) Check that $\mathbb{R}$ is an Abelian group under addition. Show that $\mathbb{R}^{++} = \{x \in \mathbb{R} : x > 0\}$ is an Abelian group under multiplication. Show that $\exp : (\mathbb{R}, +) \to (\mathbb{R}^{++}, \times)$ is an isomorphism.
(ii) [Needs a little more familiarity with groups] Show that $\mathbb{R} \setminus \{0\}$ is an Abelian group under multiplication. By considering the order of the element $-1 \in \mathbb{R} \setminus \{0\}$, or otherwise, show that the groups $(\mathbb{R}, +)$ and $(\mathbb{R} \setminus \{0\}, \times)$ are not isomorphic.

We can turn Plausible Statement 5.4.1 into a theorem.

Theorem 5.4.12. The general solution of the equation
\[ y'(x) = y(x), \tag{$\star$} \]
where $y : \mathbb{R} \to \mathbb{R}$ is a differentiable function, is
\[ y(x) = a \exp(x) \]
with $a \in \mathbb{R}$.

Proof. It is clear that $y(x) = a \exp(x)$ is a solution of $(\star)$. We must prove there are no other solutions. To this end, observe that, if $y$ satisfies $(\star)$, then
\[ \frac{d}{dx}\bigl(\exp(-x)y(x)\bigr) = y'(x)\exp(-x) - y(x)\exp(-x) = 0 \]
so, by the mean value theorem, $\exp(-x)y(x)$ is a constant function. Thus $\exp(-x)y(x) = a$ and $y(x) = a\exp(x)$ for some $a \in \mathbb{R}$.

Exercise 5.4.13. State and prove the appropriate generalisation of Theorem 5.4.12 to cover the equation
\[ y'(x) = by(x) \]
with $b$ a real constant.

Here is another consequence of Lemma 5.4.8.

Exercise 5.4.14. ($e$ is irrational.) Suppose, if possible, that $e = \exp 1$ is rational. Then $\exp 1 = m/n$ for some positive integers $m$ and $n$. Explain why, if $N \ge n$,
\[ N!\left( \exp 1 - \sum_{j=0}^{N} \frac{1}{j!} \right) \]
must be a non-zero integer and so
\[ N!\left| \exp 1 - \sum_{j=0}^{N} \frac{1}{j!} \right| \ge 1. \]
Use Lemma 5.4.8 to obtain a contradiction.
Show, similarly, that $\sum_{r=1}^{\infty} \frac{(-1)^{r+1}}{(2r-1)!}$ is irrational.
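The size of the quantity in Exercise 5.4.14 can be inspected with exact rational arithmetic. In the sketch below (Python, with our own names `partial_sum` and `scaled_gap`) a long partial sum stands in for $\exp 1$; this is an assumption made purely for illustration, and it slightly underestimates the true gap. Lemma 5.4.8 with $z = 1$ suggests that the scaled gap lies strictly between 0 and $2/(N+1)$, so it cannot be an integer.

```python
from fractions import Fraction
from math import factorial

def partial_sum(N):
    """Exact value of sum_{j=0}^{N} 1/j! as a fraction."""
    return sum(Fraction(1, factorial(j)) for j in range(N + 1))

def scaled_gap(N, proxy_terms=60):
    """Return N! * (E - sum_{j<=N} 1/j!), with E a long partial sum used as a proxy for exp 1."""
    if proxy_terms <= N:
        raise ValueError("proxy_terms must exceed N")
    E = partial_sum(proxy_terms)  # assumption: stands in for exp 1
    return factorial(N) * (E - partial_sum(N))
```

Evaluating `scaled_gap(N)` for a few values of `N` shows a fraction that shrinks roughly like $1/(N+1)$ and never has denominator 1.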

Most mathematicians draw diagrams frequently both on paper and in their heads⁶. However, these diagrams are merely sketches. To see this, quickly sketch a graph of $\exp x$.

Exercise 5.4.15. Choosing appropriate scales, draw an accurate graph of

exp on the interval [0, 100]. Does it look like your quick sketch?

We conclude this section with a result which is a little off our main track but whose proof provides an excellent example of the use of dominated convergence (Theorem 5.3.3).

Exercise 5.4.16. We work in $\mathbb{C}$. Show that if we write
\[ \left( 1 + \frac{z}{n} \right)^n = \sum_{j=0}^{\infty} a_j(n) z^j \]
then $a_j(n) z^j \to z^j/j!$ as $n \to \infty$ and $|a_j(n) z^j| \le |z^j|/j!$ for all $n$ and all $j$. Use dominated convergence to conclude that
\[ \left( 1 + \frac{z}{n} \right)^n \to e(z) \]
as $n \to \infty$, for all $z \in \mathbb{C}$.
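Both claims of Exercise 5.4.16 can be watched numerically. The sketch below is in Python; the names `a` and `compound` are ours, the formula $a_j(n) = \binom{n}{j}/n^j$ comes from the binomial theorem, and `cmath.exp` is assumed to agree with $e(z)$.

```python
import cmath
from math import comb, factorial

def a(j, n):
    """Coefficient of z^j in (1 + z/n)^n; it vanishes for j > n."""
    return comb(n, j) / n**j if j <= n else 0.0

def compound(z, n):
    """Return (1 + z/n)^n."""
    return (1 + z / n) ** n
```

One would expect `a(j, n)` never to exceed `1 / factorial(j)` and `compound(z, n)` to approach `cmath.exp(z)` slowly, with an error of order $1/n$.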

5.5 The trigonometric functions ♥

In the previous section we considered the simple differential equation $y'(x) = y(x)$. What happens if we consider the differential equation $y''(x) + y(x) = 0$?

⁶ Little of this activity appears in books and papers, partly because, even today, adding diagrams to printed work is non-trivial. It is also possible that it is the process of drawing (or watching the process of drawing) which aids comprehension rather than the finished product.


Exercise 5.5.1. Proceeding along the lines of Plausible Statement 5.4.1, show that it is reasonable to conjecture that the general solution of the equation
\[ y''(x) + y(x) = 0, \tag{$\star\star$} \]
where $y : \mathbb{R} \to \mathbb{R}$ is a well behaved function, is
\[ y(x) = a \sum_{j=0}^{\infty} \frac{(-1)^j x^{2j}}{(2j)!} + b \sum_{j=0}^{\infty} \frac{(-1)^j x^{2j+1}}{(2j+1)!} \]
with $a, b \in \mathbb{R}$.

A little experimentation reveals what is going on.

Exercise 5.5.2. We work in $\mathbb{C}$. If we write
\[ c(z) = \frac{e(iz) + e(-iz)}{2}, \quad s(z) = \frac{e(iz) - e(-iz)}{2i}, \]
show carefully that
\[ c(z) = \sum_{j=0}^{\infty} \frac{(-1)^j z^{2j}}{(2j)!}, \quad s(z) = \sum_{j=0}^{\infty} \frac{(-1)^j z^{2j+1}}{(2j+1)!}. \]

We can use the fact that $e(z + w) = e(z)e(w)$ to obtain a collection of useful formulae for $s$ and $c$.

Exercise 5.5.3. Show that if $z, w \in \mathbb{C}$ then
(i) $s(z + w) = s(z)c(w) + c(z)s(w)$,
(ii) $c(z + w) = c(z)c(w) - s(z)s(w)$,
(iii) $s(z)^2 + c(z)^2 = 1$,
(iv) $s(-z) = -s(z)$, $c(-z) = c(z)$.
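The power series of Exercise 5.5.2 and the identities of Exercise 5.5.3 can be compared numerically. The Python sketch below uses our own names (`c_series`, `s_series`, `c_exp`, `s_exp`), truncates the series after a fixed number of terms, and assumes that `cmath.exp` agrees with $e$; agreement to rounding error is evidence, not proof.

```python
import cmath

def c_series(z, terms=40):
    """Partial sum of sum_j (-1)^j z^(2j) / (2j)!."""
    total = 0j
    term = 1 + 0j  # the j = 0 term
    for j in range(terms):
        total += term
        term *= -z * z / ((2 * j + 1) * (2 * j + 2))
    return total

def s_series(z, terms=40):
    """Partial sum of sum_j (-1)^j z^(2j+1) / (2j+1)!."""
    total = 0j
    term = complex(z)  # the j = 0 term
    for j in range(terms):
        total += term
        term *= -z * z / ((2 * j + 2) * (2 * j + 3))
    return total

def c_exp(z):
    """c(z) = (e(iz) + e(-iz)) / 2."""
    return (cmath.exp(1j * z) + cmath.exp(-1j * z)) / 2

def s_exp(z):
    """s(z) = (e(iz) - e(-iz)) / (2i)."""
    return (cmath.exp(1j * z) - cmath.exp(-1j * z)) / (2j)
```

Trying a complex argument such as `0.7 - 1.3j` is instructive, since nothing in the identities requires $z$ to be real.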

We now switch our attention to the restriction of s and c to R.

Exercise 5.5.4. Consider $c, s : \mathbb{R} \to \mathbb{R}$ given by $c(x) = \sum_{j=0}^{\infty} (-1)^j x^{2j}/(2j)!$ and $s(x) = \sum_{j=0}^{\infty} (-1)^j x^{2j+1}/(2j+1)!$.
(i) Using the remainder estimate in the alternating series test (second paragraph of Lemma 5.2.1), or otherwise, show that $|c(h) - 1| \le h^2/2$ and $|s(h) - h| \le |h|^3/6$ for $|h| < 1$. Deduce that $c$ and $s$ are differentiable at 0 with $c'(0) = 0$, $s'(0) = 1$.
(ii) Using the addition formulae of Exercise 5.5.3 (i) and (ii) to evaluate $c(x + h)$ and $s(x + h)$, show that $c$ and $s$ are everywhere differentiable with $c'(x) = -s(x)$, $s'(x) = c(x)$.


Suppose that a group of mathematicians who did not know the trigonometric functions were to investigate our functions $c$ and $s$ defined by power series. Careful calculation and graphing would reveal that, incredible as it seemed, $c$ and $s$ appeared to be periodic!

Exercise 5.5.5. (i) By using the estimate for error in the alternating series test, show that
\[ c(x) > 0 \text{ for all } 0 \le x \le 1. \]
(ii) By using a minor modification of these ideas, or otherwise, show that $c(2) < 0$. Explain carefully why this means that there must exist an $a$ with $1 < a < 2$ such that $c(a) = 0$.
(iii) In this part and what follows we make use of the formulae obtained in Exercise 5.5.3 which tell us that
\[ s(x + y) = s(x)c(y) + c(x)s(y), \quad c(x + y) = c(x)c(y) - s(x)s(y), \]
\[ c(x)^2 + s(x)^2 = 1, \quad s(-x) = -s(x), \quad c(-x) = c(x) \]
for all $x, y \in \mathbb{R}$. Show that, if $c(a') = 0$ and $c(a'') = 0$, then $s(a' - a'') = 0$. Use the fact that $s(0) = 0$ and $s'(x) = c(x) > 0$ for $0 \le x \le 1$ to show that $s(x) > 0$ for $0 < x \le 1$. Conclude that, if $a'$ and $a''$ are distinct zeros of $c$, then $|a' - a''| > 1$. Deduce that $c(x) = 0$ has exactly one solution with $0 \le x \le 2$. We call this solution $a$.
(iv) By considering derivatives, show that $s$ is strictly increasing on $[0, a]$. Conclude that $s(a) > 0$ and deduce that $s(a) = 1$. Show that
\[ s(x + a) = c(x), \quad c(x + a) = -s(x) \]
for all $x$ and that $c$ and $s$ are periodic with period $4a$ (that is, $s(x + 4a) = s(x)$ and $c(x + 4a) = c(x)$ for all $x$).
(v) Show that $s$ is strictly increasing on $[-a, a]$, and strictly decreasing on $[a, 3a]$.
(vi) If $u$ and $v$ are real numbers with $u^2 + v^2 = 1$, show that there is exactly one solution to the pair of equations
\[ c(x) = u, \quad s(x) = v \]
with $0 \le x < 4a$.

At this point we tear off the thin disguise of our characters and write $\exp(z) = e(z)$, $\sin z = s(z)$, $\cos(z) = c(z)$ and $a = \pi/2$.
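The hypothetical mathematicians of Exercise 5.5.5 could locate the zero $a$ by bisection, using only the power series. The Python sketch below does this; the function names and the fixed truncation of the series are our choices, and the comparison with `math.pi` is made only to illustrate the identification $a = \pi/2$.

```python
import math

def c(x, terms=30):
    """Partial sum of the series for c(x)."""
    total, term = 0.0, 1.0
    for j in range(terms):
        total += term
        term *= -x * x / ((2 * j + 1) * (2 * j + 2))
    return total

def s(x, terms=30):
    """Partial sum of the series for s(x)."""
    total, term = 0.0, x
    for j in range(terms):
        total += term
        term *= -x * x / ((2 * j + 2) * (2 * j + 3))
    return total

def zero_of_c(lo=1.0, hi=2.0, tol=1e-14):
    """Bisection for the zero of c in [lo, hi]; needs c(lo) > 0 > c(hi)."""
    if not (c(lo) > 0 > c(hi)):
        raise ValueError("c must change sign on [lo, hi]")
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if c(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2
```

With `a = zero_of_c()`, one can then compare `2 * a` with `math.pi`, `s(a)` with 1, and `c(x + 4 * a)` with `c(x)` for a few values of `x`.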


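Exercise 5.5.6. We work in $\mathbb{R}$. Show that, if $|u| \le 1$, there is exactly one $\theta$ with $0 \le \theta \le \pi$ such that $\cos\theta = u$.
Using the Cauchy-Schwarz inequality (Lemma 4.1.2) show that, if $\mathbf{x}$ and $\mathbf{y}$ are non-zero vectors in $\mathbb{R}^m$, then there is exactly one $\theta$ with $0 \le \theta \le \pi$ such that
\[ \cos\theta = \frac{\mathbf{x} \cdot \mathbf{y}}{\|\mathbf{x}\| \, \|\mathbf{y}\|}. \]
We call $\theta$ the angle between $\mathbf{x}$ and $\mathbf{y}$.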
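The definition of angle in Exercise 5.5.6 translates directly into a computation. In the Python sketch below the function name `angle` is ours and `math.acos` is assumed to return the unique $\theta \in [0, \pi]$ with $\cos\theta = u$. The clamp is there because rounding can push the quotient marginally outside $[-1, 1]$, even though the Cauchy-Schwarz inequality forbids this for exact arithmetic.

```python
import math

def angle(x, y):
    """Angle in [0, pi] between non-zero vectors x and y of equal length."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same dimension")
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    if norm_x == 0 or norm_y == 0:
        raise ValueError("the angle is only defined for non-zero vectors")
    u = dot / (norm_x * norm_y)
    u = max(-1.0, min(1.0, u))  # guard against rounding error only
    return math.acos(u)
```

For example, `angle((1, 0), (0, 1))` should be close to $\pi/2$.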

Exercise 5.5.7. We work in $\mathbb{C}$ and use the usual disguises except that we write $a = \pi/2$.
(i) Show that $e$ has period $2\pi i$ in the sense that
\[ e(z + 2\pi i) = e(z) \]
for all $z$ and
\[ e(z + w) = e(z) \]
for all $z$ if and only if $w = 2n\pi i$ for some $n \in \mathbb{Z}$. State corresponding results for $s, c : \mathbb{C} \to \mathbb{C}$.
(ii) If $x$ and $y$ are real, show that
\[ e(x + iy) = e(x)(c(y) + is(y)). \]
(iii) If $w \ne 0$, show that there are unique real numbers $r$ and $y$ with $r > 0$ and $0 \le y < 2\pi$ such that
\[ w = re(iy). \]
(iv) If $w \in \mathbb{C}$, find all solutions of $w = re(iy)$ with $r$ and $y$ real and $r \ge 0$.

The traditional statement of Exercise 5.5.7 (iii) says that $z = re^{i\theta}$ where $r = |z|$ and $\theta$ is real. However, we have not defined powers yet so, for the moment, this must merely be considered as a useful mnemonic. (We will discuss the matter further in Exercise 5.7.9.)

It may be objected that our definitions of sine and cosine ignore their geometric origins. Later (see Exercises K.169 and K.170) I shall give a more ‘geometric’ treatment but the following points are worth noting.

The trigonometric functions did not arise as part of classical axiomatic Euclidean geometry, but as part of practical geometry (mainly astronomy). An astronomer is happy to consider the sine of 20 degrees, but a classical geometer would simply note that it is impossible to construct an angle of 20 degrees using ruler and compass. Starting with our axioms for $\mathbb{R}$ we can obtain a model of classical geometry, but the reverse is not true.

The natural ‘practical geometric’ treatment of angle does not use radians. Our use of radians has nothing to do with geometric origins and everything to do with the equation (written in radians)
\[ \frac{d}{dx} \sin cx = c \cos cx. \]
Mathematicians measure angles in radians because, for them, sine is a function of analysis; everyone else measures angles in degrees because, for them, sine is a function used in practical geometry.

In the natural ‘practical geometric’ treatment of angle it is usual to confine oneself to positive angles less than two right angles (or indeed one right angle). When was the last time you heard a navigator shouting ‘turn $-20$ degrees left’ or ‘up 370 degrees’? The extension of sine from a function on $[0, \pi/2]$ to a function on $\mathbb{R}$ and the corresponding extension of the notion of angle is a product of ‘analytic’ and not ‘geometric’ thinking.

Since much of this book is devoted to stressing the importance of a ‘geometric approach’ to the calculus of several variables, I do not wish to downplay the geometric meaning of sine. However, we should treat sine both as a geometric object and a function of analysis. In this context it matters little whether we start with a power series definition of sine and end up with the parametric description of the unit circle as the path described by the point $(\sin\theta, \cos\theta)$ as $\theta$ runs from 0 to $2\pi$ or (as we shall do in Exercises K.169 and K.170) we start with the Cartesian description of the circle as $x^2 + y^2 = 1$ and end up with a power series for sine.

Exercise 5.5.8. Write down the main properties of cosh and sinh that you know. Starting with a tentative solution of the differential equation $y'' = y$, write down appropriate definitions and prove the stated properties in the style of this section. Distinguish between those properties which hold for cosh and sinh as functions from $\mathbb{C}$ to $\mathbb{C}$ and those which hold for cosh and sinh as functions from $\mathbb{R}$ to $\mathbb{R}$.

5.6 The logarithm ♥

In this section we shall make use of the one dimensional chain rule.

Lemma 5.6.1. Suppose that $f : \mathbb{R} \to \mathbb{R}$ is differentiable at $x$ with derivative $f'(x)$, that $g : \mathbb{R} \to \mathbb{R}$ is differentiable at $y$ with derivative $g'(y)$ and that $f(x) = y$. Then $g \circ f$ is differentiable at $x$ with derivative $f'(x)g'(y)$.

In traditional notation $\frac{d}{dx} g(f(x)) = f'(x)g'(f(x))$. We divide the proof into two parts.

Lemma 5.6.2. Suppose that the hypotheses of Lemma 5.6.1 hold and, in addition, $f'(x) \ne 0$. Then the conclusion of Lemma 5.6.1 holds.

Proof. Since
\[ \frac{f(x + h) - f(x)}{h} \to f'(x) \ne 0 \]
we can find a $\delta > 0$ such that
\[ \frac{f(x + h) - f(x)}{h} \ne 0 \]
for $0 < |h| < \delta$ and so, in particular, $f(x + h) - f(x) \ne 0$ for $0 < |h| < \delta$. Thus if $0 < |h| < \delta$, we may write
\[ \frac{g(f(x + h)) - g(f(x))}{h} = \frac{g(f(x + h)) - g(f(x))}{f(x + h) - f(x)} \cdot \frac{f(x + h) - f(x)}{h}. \tag{$\star$} \]
Now $f$ is differentiable and so continuous at $x$, so $f(x + h) - f(x) \to 0$ as $h \to 0$. It follows, by using standard theorems on limits (which the reader should identify explicitly), that
\[ \frac{g(f(x + h)) - g(f(x))}{h} \to g'(f(x))f'(x) \]
as $h \to 0$ and we are done.

Unfortunately the proof of Lemma 5.6.2 does not work in general for Lemma 5.6.1 since we then have no guarantee that $f(x + h) - f(x) \ne 0$, even for small $h$, and so we cannot use equation $(\star)$⁷. We need a separate proof for this case.

Lemma 5.6.3. Suppose that the hypotheses of Lemma 5.6.1 hold and, in addition, $f'(x) = 0$. Then the conclusion of Lemma 5.6.1 holds.

I outline a proof in the next exercise, leaving the details to the reader.

⁷ Hardy’s Pure Mathematics says ‘The proof of [the chain rule] requires a little care’ and carries the rueful footnote ‘The proofs in many text-books (and in the first three editions of this book) are inaccurate’. This is the point that the text-books overlooked.


Exercise 5.6.4. We prove Lemma 5.6.3 by reductio ad absurdum. To this end, suppose that the hypotheses of the lemma hold but the conclusion is false.
(i) Explain why we can find an $\varepsilon > 0$ and a sequence $h_n \to 0$ such that $h_n \ne 0$ and
\[ \left| \frac{g(f(x + h_n)) - g(f(x))}{h_n} \right| > \varepsilon \]
for each $n \ge 0$.
(ii) Explain why $f(x + h_n) \ne f(x)$ for each $n \ge 0$.
(iii) Use the method of proof of Lemma 5.6.2 to derive a contradiction.

The rather ugly use of reductio ad absurdum in Exercise 5.6.4 can be

avoided by making explicit use of the ideas of Exercise K.23.

Note that, in this section, we only use the special case of the chain rule

given in Lemma 5.6.2. I believe that the correct way to look at the chain rule

is by adopting the ideas of Chapter 6 and attacking it directly as we shall do

in Lemma 6.2.10. We now move on to the main subject of this section.

Since $e : \mathbb{R} \to \mathbb{R}^{++}$ is a bijection (indeed by Exercise 5.4.11 a group isomorphism) it is natural to look at its inverse. Let us write $l(x) = e^{-1}(x)$ for $x \in (0, \infty) = \mathbb{R}^{++}$. Some of the properties of $l$ are easy to obtain. (Here and later we use the properties of the function $e$ obtained in Exercise 5.4.10.)

Exercise 5.6.5. (i) Explain why $l : (0, \infty) \to \mathbb{R}$ is a bijection.
(ii) Show that $l(xy) = l(x) + l(y)$ for all $x, y > 0$.
(iii) Show that $l$ is a strictly increasing function.

Exercise 5.6.6. No one who went to school after 1960 can really appreciate the immense difference between the work involved in hand multiplication without logarithms and hand multiplication if we are allowed to use logarithms. The invention of logarithms was an important contribution to the scientific revolution. When Henry Briggs (who made a key simplification) visited Baron Napier (who invented the idea) ‘almost one quarter of an hour was spent, each beholding [the] other . . . with admiration before one word was spoke, at last Mr Briggs began.
‘My lord, I have undertaken this long Journey purposely to see your Person, and to know by what Engine of Wit or Ingenuity you came first to think of this most excellent Help unto Astronomy, viz., the Logarithms; but, my Lord, being by you found out, I wonder nobody else found it out before, when now known it is so easy.’ (Quotation from 9.E.3 of [16].)

(i) As Briggs realised, calculations become a little easier if we use $\log_{10}$ defined by
\[ \log_{10} x = l(x)/l(10) \]
for $x > 0$. Show that $\log_{10} xy = \log_{10} x + \log_{10} y$ for all $x, y > 0$ and that $\log_{10} 10^r x = r + \log_{10} x$.

(ii) Multiply 1.3245 by 8.7893, correct to five significant figures, without using a calculator.
(iii) To multiply 1.3245 by 8.7893 using logarithms, one looked up $\log_{10} 1.3245$ and $\log_{10} 8.7893$ in a table of logarithms. This was quick and easy, giving
\[ \log_{10} 1.3245 \approx 0.1220520, \quad \log_{10} 8.7893 \approx 0.9439543. \]
A hand addition, which the reader should do, gave
\[ \log_{10}(1.3245 \times 8.7893) = \log_{10} 1.3245 + \log_{10} 8.7893 \approx 0.1220520 + 0.9439543 = 1.0660063. \]
A quick and easy search in a table of logarithms (or, still easier, a table of inverse logarithms, the so called antilogarithms) showed that
\[ \log_{10} 1.164144 \approx 0.0660052, \quad \log_{10} 1.164145 \approx 0.0660089 \]
so that
\[ \log_{10} 11.64144 \approx 1.0660052, \quad \log_{10} 11.64145 \approx 1.0660089 \]
and, correct to five significant figures, $1.3245 \times 8.7893 = 11.6414$.

(iv) Repeat the exercise with numbers of your own choosing. You may use the ‘$\log_{10}$’ (often just called ‘log’) function on your calculator and the ‘inverse $\log_{10}$’ (often called ‘$10^x$’) but you must do the multiplication and addition by hand. Notice that you need one (or, if you are being careful, two) more extra figures in your calculations than there are significant figures in your answers.
[There are some additional remarks in Exercises 5.7.7 and K.85.]
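The table-lookup procedure of Exercise 5.6.6 can be imitated by rounding logarithms to seven figures before adding them. The Python sketch below is a modern stand-in, not the historical method: the name `multiply_with_logs` is ours, and `math.log10` together with `round` plays the part of the printed table.

```python
import math

def multiply_with_logs(x, y, figures=7):
    """Multiply positive x and y by adding base-10 logarithms rounded to `figures` decimals."""
    if x <= 0 or y <= 0:
        raise ValueError("logarithms need positive numbers")
    log_x = round(math.log10(x), figures)        # 'look up' log10 x
    log_y = round(math.log10(y), figures)        # 'look up' log10 y
    log_product = round(log_x + log_y, figures)  # the hand addition
    return log_product, 10.0 ** log_product      # the antilogarithm
```

Calling `multiply_with_logs(1.3245, 8.7893)` gives a logarithm close to 1.0660063 and a product close to 11.6414, in line with part (iii).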

Other properties require a little more work.

Lemma 5.6.7. (i) The function $l : (0, \infty) \to \mathbb{R}$ is continuous.
(ii) The function $l$ is everywhere differentiable with
\[ l'(x) = \frac{1}{x}. \]

Proof. (i) We wish to show that $l$ is continuous at some point $x \in (0, \infty)$. To this end, let $\delta > 0$ be given. Since $l$ is increasing, we know that, if
\[ e(l(x) + \delta) > y > e(l(x) - \delta), \]
we have
\[ l\bigl(e(l(x) + \delta)\bigr) > l(y) > l\bigl(e(l(x) - \delta)\bigr) \]
and so
\[ l(x) + \delta > l(y) > l(x) - \delta. \]
Now $e$ is strictly increasing, so we can find $\eta(\delta) > 0$ such that
\[ e(l(x) + \delta) > x + \eta(\delta) > x = e(l(x)) > x - \eta(\delta) > e(l(x) - \delta). \]
Combining the results of the two previous sentences, we see that, if $|x - y| < \eta(\delta)$, then $|l(x) - l(y)| < \delta$. Since $\delta$ was arbitrary, $l$ is continuous at $x$.

(ii) We shall use the result that, if $g$ is never zero and $g(x + h) \to a$ as $h \to 0$, then, if $a \ne 0$, $1/g(x + h) \to 1/a$ as $h \to 0$. Observe that, since $l$ is continuous, we have
\[ l(x + h) - l(x) \to 0 \]
and so
\[ \frac{l(x + h) - l(x)}{h} = \frac{l(x + h) - l(x)}{e(l(x + h)) - e(l(x))} \to \frac{1}{e'(l(x))} = \frac{1}{e(l(x))} = \frac{1}{x} \]
as $h \to 0$.

By using the ideas of parts (iv), (v) and (vi) of Exercise 5.4.10 together

with parts (i) and (iii) of Exercise 5.6.5 and both parts of Lemma 5.6.7, we

get the following general result.

Exercise 5.6.8. (One dimensional inverse function theorem.) Suppose that $f : [a, b] \to [c, d]$ is continuous and $f$ is differentiable on $(a, b)$ with $f'(x) > 0$ for all $x \in (a, b)$ and $f(a) = c$, $f(b) = d$. Show that $f$ is a bijection, that $f^{-1} : [c, d] \to [a, b]$ is continuous and that $f^{-1}$ is differentiable on $(c, d)$ with
\[ (f^{-1})'(x) = \frac{1}{f'(f^{-1}(x))}. \]
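The formula of Exercise 5.6.8 can be checked on a concrete example. The Python sketch below takes $f(x) = x^3 + x$ on $[0, 2]$ (our choice of example, as are all the function names), inverts it by bisection, and compares a central difference quotient of $f^{-1}$ with $1/f'(f^{-1}(y))$.

```python
def f(x):
    """Example function: continuous and strictly increasing on [0, 2]."""
    return x**3 + x

def f_prime(x):
    """Derivative of the example function; strictly positive everywhere."""
    return 3 * x * x + 1

def inverse(y, a=0.0, b=2.0, tol=1e-13):
    """Solve f(x) = y on [a, b] by bisection, relying on f being increasing."""
    if not (f(a) <= y <= f(b)):
        raise ValueError("y must lie between f(a) and f(b)")
    lo, hi = a, b
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if f(mid) < y:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def inverse_derivative_numeric(y, h=1e-5):
    """Central difference approximation to the derivative of the inverse at y."""
    return (inverse(y + h) - inverse(y - h)) / (2 * h)
```

At `y = 3.0`, for example, the two quantities `inverse_derivative_numeric(3.0)` and `1 / f_prime(inverse(3.0))` should agree to several decimal places.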

We shall give a different proof of this result in a more general (and, I would claim, more instructive) context in Theorem 13.1.13. Traditionally, the one dimensional inverse function theorem is illustrated, as in Figure 5.1, by taking the graph $y = f(x)$ with tangent shown at $(f^{-1}(x_0), x_0)$ and reflecting in the angle bisector of the $x$ and $y$ axes to obtain the graph $y = f^{-1}(x)$ with tangent shown at $(x_0, f^{-1}(x_0))$.
Although the picture is suggestive, this is one of those cases where (at the level of proof we wish to use) a simple picture is inadequate.


Figure 5.1: The one dimensional inverse function theorem

Exercise 5.6.9. Go through Exercise 5.6.8 and note where you used the

mean value theorem and the intermediate value theorem.

Exercise 5.6.10. (i) Write $A = \{x \in \mathbb{Q} : 2 \ge x \ge 1\}$ and $B = \{x \in \mathbb{Q} : 4 \ge x \ge 1\}$. Define $f : A \to B$ by $f(x) = x^2$. Show that $f$ is strictly increasing on $A$, that $f(1) = 1$ and $f(2) = 4$, that $f$ is differentiable on $A$ with $f'(x) \ge 2$ for all $x \in A$ and that $f : A \to B$ is injective yet $f$ is not surjective.
(ii) Define $f : \mathbb{Q} \to \mathbb{Q}$ by
\[
f(x) =
\begin{cases}
x + 1 & \text{for } x < 0,\ x^2 > 2, \\
x & \text{for } x^2 < 2, \\
x - 1 & \text{for } x > 0,\ x^2 > 2.
\end{cases}
\]
Show that $f(x) \to -\infty$ as $x \to -\infty$, that $f(x) \to \infty$ as $x \to \infty$, that $f$ is everywhere differentiable with $f'(x) = 1$ for all $x$ and that $f : \mathbb{Q} \to \mathbb{Q}$ is surjective yet $f$ is not injective⁸.

⁸ These examples do not exhaust the ways in which Figure 5.1 is an inadequate guide to what can happen without the fundamental axiom of analysis [32].

Initially we defined the exponential and trigonometric functions as maps $\mathbb{C} \to \mathbb{C}$ although we did not make much use of this (they are very important in more advanced work) and switched rapidly to maps $\mathbb{R} \to \mathbb{R}$. We did nothing of this sort for the logarithm.

The most obvious attempt to define a complex logarithm fails at the first hurdle. We showed that, working over $\mathbb{R}$, the map $\exp : \mathbb{R} \to (0, \infty)$ is bijective, so that we could define log as the inverse function. However, we know (see Exercise 5.5.7) that, working over $\mathbb{C}$, the map $\exp : \mathbb{C} \to \mathbb{C} \setminus \{0\}$ is surjective but not injective, so no inverse function exists.

Exercise 5.6.11. By using the fact that $\exp 2\pi i = 1 = \exp 0$, show that there cannot exist a function $L : \mathbb{C} \setminus \{0\} \to \mathbb{C}$ with $L(\exp z) = z$ for all $z \in \mathbb{C}$.

However, a one-sided inverse can exist.

Exercise 5.6.12. (i) If we set $L_0(r \exp i\theta) = \log r + i\theta$ for $r > 0$ and $2\pi > \theta \ge 0$, show that $L_0 : \mathbb{C} \setminus \{0\} \to \mathbb{C}$ is a well defined function with $\exp(L_0(z)) = z$ for all $z \in \mathbb{C} \setminus \{0\}$.
(ii) Let $n$ be an integer. If we set $L_n(r \exp i\theta) = L_0(r \exp i\theta) + 2\pi i n$, show that $L_n : \mathbb{C} \setminus \{0\} \to \mathbb{C}$ is a well defined function with $\exp(L_n(z)) = z$ for all $z \in \mathbb{C} \setminus \{0\}$.
(iii) If we set $M(r \exp i\theta) = \log r + i\theta$ for $r > 0$ and $3\pi > \theta \ge \pi$, show that $M : \mathbb{C} \setminus \{0\} \to \mathbb{C}$ is a well defined function with $\exp(M(z)) = z$ for all $z \in \mathbb{C} \setminus \{0\}$.
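The one-sided inverses $L_n$ of Exercise 5.6.12 are easy to experiment with. In the Python sketch below the function `L` is our own construction: it uses `cmath.phase`, which returns an argument in $(-\pi, \pi]$, and reduces it modulo $2\pi$ to land in $[0, 2\pi)$. Rounding may misbehave for arguments that are negative but extremely close to 0, so this is an illustration under that caveat rather than a careful implementation.

```python
import cmath
import math

def L(z, n=0):
    """Branch L_n(z) = log|z| + i(theta + 2 pi n) with 0 <= theta < 2 pi."""
    if z == 0:
        raise ValueError("no logarithm of 0")
    theta = cmath.phase(z) % (2 * math.pi)  # move (-pi, pi] into [0, 2 pi)
    return complex(math.log(abs(z)), theta + 2 * math.pi * n)
```

Evaluating `L` at two points just above and just below the positive real axis shows the imaginary part jumping by nearly $2\pi$, which is the discontinuity discussed next.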

The functions $L_n$ and $M$ in the last exercise are not continuous everywhere and it is natural to ask if there is a continuous function $L : \mathbb{C} \setminus \{0\} \to \mathbb{C}$ with $\exp(L(z)) = z$ for all $z \in \mathbb{C} \setminus \{0\}$. The reader should convince herself, by trying to define $L(\exp i\theta)$ and considering what happens as $\theta$ runs from 0 to $2\pi$, that this is not possible. The next exercise crystallises the ideas.

Exercise 5.6.13. Suppose, if possible, that there exists a continuous $L : \mathbb{C} \setminus \{0\} \to \mathbb{C}$ with $\exp(L(z)) = z$ for all $z \in \mathbb{C} \setminus \{0\}$.
(i) If $\theta$ is real, show that $L(\exp(i\theta)) = i(\theta + 2\pi n(\theta))$ for some $n(\theta) \in \mathbb{Z}$.
(ii) Define $f : \mathbb{R} \to \mathbb{R}$ by
\[ f(\theta) = \frac{1}{2\pi} \left( \frac{L(\exp i\theta) - L(1)}{i} - \theta \right). \]
Show that $f$ is a well defined continuous function, that $f(\theta) \in \mathbb{Z}$ for all $\theta \in \mathbb{R}$, that $f(0) = 0$ and that $f(2\pi) = -1$.
(iii) Show that the statements made in the last sentence of (ii) are incompatible with the intermediate value theorem and deduce that no function can exist with the supposed properties of $L$.
(iv) Discuss informally what connection, if any, the discussion above has with the existence of the international date line.


Exercise 5.6.13 is not an end but a beginning of much important mathematics. In due course it will be necessary for the reader to understand both the formal proof that, and the informal reasons why, no continuous $L$ can exist.

5.7 Powers ♥

How should we define $a^b$ for $a > 0$ and $b$ any real number? Most people would say that we should first define $a^b$ for $b$ rational and then extend ‘by continuity’ to non-rational $b$. This can be done, even with the few tools at our disposal, but it requires hard work to define $a^b$ this way and still more hard work to obtain its properties. When we have more powerful tools at our disposal (uniform convergence and the associated theorems) we shall see how to make this programme work in Exercises K.227 to K.229 but, even then, it requires careful thought.

There are, I think, various reasons why the direct approach is hard.

(1) The first point is mainly psychological. We need to consider $a^b$ as a function of two variables $a$ and $b$. When we define $a^n$, we think of the integer $n$ as fixed and $a$ as varying and the same is true when we define $a^b$ with $b$ rational. However, when we want to define $a^b$ ‘by continuity’, we think of $a$ as fixed and $b$ as varying.
(2) The second point is mathematical. The fact that a function is continuous on the rationals does not mean that it has a continuous extension to the reals⁹. Consider our standard example, the function $f : \mathbb{Q} \to \mathbb{Q}$ of Example 1.1.3. We know that $f$ is continuous but there is no continuous function $F : \mathbb{R} \to \mathbb{R}$ with $F(x) = f(x)$ for $x \in \mathbb{Q}$.

Exercise 5.7.1. (i) Prove this statement by observing that, if $F$ is continuous, $F(x_n) \to F(2^{-1/2})$ whenever $x_n \to 2^{-1/2}$, or otherwise.
(ii) Find a function $g : \mathbb{Q} \to \mathbb{Q}$ which is differentiable with continuous derivative such that there is a continuous function $G : \mathbb{R} \to \mathbb{R}$ with $G(x) = g(x)$ for $x \in \mathbb{Q}$ but any such function $G$ is not everywhere differentiable.

However, the fact that I think something is hard does not prove that it

is hard. I suggest that the reader try it for herself. (She may well succeed,

all that is required is perseverance and a cool head. I simply claim that the

exercise is hard, not that it is impossible.)

⁹ I can still remember being scolded by my research supervisor for making this particular mistake. (The result is true if we replace ‘continuity’ by ‘uniform continuity’. See Exercise K.56.)


Assuming that the reader agrees with me, can we find another approach? We obtained the exponential and trigonometric functions as the solution of differential equations. How does this approach work here? The natural choice of differential equation, if we wish to obtain $y(x) = x^\alpha$, is
\[ xy'(x) = \alpha y(x). \tag{$\star$} \]
(Here $\alpha$ is real and $y : (0, \infty) \to (0, \infty)$.)

Tentative solution. We can rewrite $(\star)$ as
\[ \frac{y'(x)}{y(x)} - \frac{\alpha}{x} = 0. \]
Using the properties of logarithm and the chain rule, this gives
\[ \frac{d}{dx}\bigl(\log y(x) - \alpha \log x\bigr) = 0 \]
so, by the mean value theorem,
\[ \log y(x) - \alpha \log x = C \]
where $C$ is constant. Applying the exponential function and taking $A = \exp C$, we obtain
\[ y(x) = A \exp(\alpha \log x) \]
where $A$ is a constant.

Exercise 5.7.2. Check, by using the chain rule, that $y(x) = A \exp(\alpha \log x)$ is indeed a solution of $(\star)$.

This suggests very strongly indeed that we should define $x^\alpha = \exp(\alpha \log x)$. In order to avoid confusion, we adopt our usual policy of light disguise and investigate the properties of functions $r_\alpha : (0, \infty) \to (0, \infty)$ defined by $r_\alpha(x) = \exp(\alpha \log x)$ [$\alpha$ real].

Exercise 5.7.3. (Index laws.) If $\alpha, \beta \in \mathbb{R}$, show that
(i) $r_{\alpha+\beta}(x) = r_\alpha(x) r_\beta(x)$ for all $x > 0$.
(ii) $r_{\alpha\beta}(x) = r_\alpha(r_\beta(x))$ for all $x > 0$.
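The definition $r_\alpha(x) = \exp(\alpha \log x)$ and the index laws of Exercise 5.7.3 can be tried out directly. In the Python sketch below the name `r` is ours, and `math.exp` and `math.log` are assumed to play the roles of $e$ and $l$; equality is only expected up to rounding error.

```python
import math

def r(alpha, x):
    """r_alpha(x) = exp(alpha * log x) for real alpha and x > 0."""
    if x <= 0:
        raise ValueError("r_alpha is only defined for x > 0")
    return math.exp(alpha * math.log(x))
```

For example, `r(0.5, 2.0)` should be close to `math.sqrt(2.0)`, and `r(a + b, x)` close to `r(a, x) * r(b, x)` for any real `a`, `b` and positive `x`.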

Exercise 5.7.4. (Consistency.) Suppose that $n$, $p$ and $q$ are integers with $n \ge 0$ and $q > 0$. Show that
(i) $r_1(x) = x$ for all $x > 0$.
(ii) $r_{n+1}(x) = x r_n(x)$ for all $x > 0$.
(iii) $r_n(x) = \overbrace{x \times x \times \cdots \times x}^{n}$ for all $x > 0$.
(iv) $r_{-n}(x) = \dfrac{1}{r_n(x)}$ for all $x > 0$.
(v) $r_q(r_{p/q}(x)) = r_p(x)$ for all $x > 0$.
Explain briefly why this means that writing $r_{p/q}(x) = x^{p/q}$ is consistent with your previous school terminology.

Exercise 5.7.5. Suppose that $\alpha$ is real. Show that
(i) $r_\alpha(xy) = r_\alpha(x) r_\alpha(y)$ for all $x, y > 0$.
(ii) $r_0(x) = 1$ for all $x > 0$.
(iii) $r_\alpha$ is everywhere differentiable and $x r_\alpha'(x) = \alpha r_\alpha(x)$ and $r_\alpha'(x) = \alpha r_{\alpha-1}(x)$ for all $x > 0$.

Exercise 5.7.6. (i) If $x > 0$ is fixed, show that $r_\alpha(x)$ is a differentiable function of $\alpha$ with
\[ \frac{d}{d\alpha} r_\alpha(x) = r_\alpha(x) \log x. \]
(ii) If $\alpha > 0$ and $\alpha$ is kept fixed, show that $r_\alpha(x)$ is an increasing function of $x$. What happens if $\alpha < 0$?
(iii) If $x > 1$ and $x$ is kept fixed, show that $r_\alpha(x)$ is an increasing function of $\alpha$. What happens if $0 < x < 1$?
(iv) If we write $e = \exp 1$ show that $\exp x = r_x(e)$ (or, in more familiar terms, $\exp x = e^x$).

Exercise 5.7.7. Take two rulers $A$ and $B$ marked in centimeters (or some other convenient unit) and lay them marked edge to marked edge. If we slide the point marked 0 on ruler $B$ until it is opposite the point marked $x$ on ruler $A$, then the point marked $y$ on ruler $B$ will be opposite the point marked $x + y$ on ruler $A$. We have invented an adding machine.
Now produce a new ruler $A'$ by renaming the point marked $x$ as $10^x$ (thus the point marked 0 on $A$ becomes the point marked 1 on $A'$ and the point marked 3 on $A$ becomes the point marked 1000 on $A'$). Obtain $B'$ from $B$ in the same way. If we slide the point marked 1 on ruler $B'$ until it is opposite the point marked $10^x$ on ruler $A'$, then the point marked $10^y$ on ruler $B'$ will be opposite the point marked $10^{x+y}$ on ruler $A'$. Explain why, if $a, b > 0$ and we slide the point marked 1 on ruler $B'$ until it is opposite the point marked $a$ on ruler $A'$, then the point marked $b$ on ruler $B'$ will be opposite the point marked $ab$ on ruler $A'$. We have invented a multiplying machine.
(i) How would you divide $a$ by $b$ using this machine?
(ii) Does the number 10 play an essential role in the device?


(iii) Draw a line segment $CD$ of some convenient length to represent the ruler $A'$. If $C$ corresponds to 1 and $D$ to 10, draw, as accurately as you can, the points corresponding to $2, 3, \ldots, 9$.
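For part (iii) of Exercise 5.7.7 the mark for $k$ sits at a distance proportional to $\log_{10} k$ from $C$. The Python sketch below tabulates these distances; the name `slide_rule_marks` and the default length of 10 units are our assumptions.

```python
import math

def slide_rule_marks(length=10.0):
    """Distance from C of the marks 1, 2, ..., 10 on a segment CD of the given length."""
    return {k: length * math.log10(k) for k in range(1, 11)}
```

Since the distances add when the numbers multiply, the entry for 6 should be the sum of the entries for 2 and 3, which is the principle of the multiplying machine.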

The device we have described was invented by Oughtred some years after Napier’s discovery of the logarithm and forms the basis for the ‘slide rule’. From 1860 to 1960 the slide rule was the emblem of the mathematically competent engineer. It allowed fast and reasonably accurate ‘back of an envelope’ calculations.

Exercise 5.7.8. By imitating the argument of Exercise 5.6.13 show that there is no continuous function $S : \mathbb{C} \to \mathbb{C}$ with $S(z)^2 = z$ for all $z \in \mathbb{C}$. (In other words, we cannot define a well behaved square root function on the complex plane.)

Exercise 5.7.9. Exercise 5.7.8 shows, I think, that we can not hope to ex-

tend our deļ¬nition of rĪ± (x) with x real and strictly positive and Ī± real to

some well behaved rĪ± (z) with Ī± and z both complex. We can, however, ex-

tend our deļ¬nition to the case when x is still real and strictly positive but we

allow Ī± to be complex. Our deļ¬nition remains the same

rĪ± (x) = exp(Ī± log x)

but only some of our previous statements carry over.

(i) If $\alpha, \beta \in \mathbb{C}$, show that $r_{\alpha+\beta}(x) = r_\alpha(x)\, r_\beta(x)$ for all $x > 0$. Thus part (i) of Exercise 5.7.3 carries over.

(ii) Explain carefully why the statement in part (ii) of Exercise 5.7.3

\[ r_{\alpha\beta}(x) \stackrel{?}{=} r_\alpha(r_\beta(x)) \]

makes no sense (within the context of this question) if we allow $\alpha$ and $\beta$ to range freely over $\mathbb{C}$. Does it make sense and is it true if $\beta \in \mathbb{R}$ and $\alpha \in \mathbb{C}$? Does it make sense and is it true if $\alpha \in \mathbb{R}$ and $\beta \in \mathbb{C}$?

(iii) Find which parts of Exercises 5.7.5 and 5.7.6 continue to make sense in the more general context of this question and prove them.

(iv) Show that, if $u$ and $v$ are real and $e = \exp(1)$, then $\exp(u + iv) = r_{u+iv}(e)$. We have thus converted the mnemonic

\[ \exp(z) = e^z \]

into a genuine equality.
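The identities in parts (i) and (iv) can be spot-checked numerically. The following is only a sketch: the function name `r` and the sample values of $\alpha$, $\beta$, $x$, $u$ and $v$ are choices made for this illustration, and a floating point check is of course no substitute for the proofs asked for.

```python
import cmath
import math


def r(alpha: complex, x: float) -> complex:
    """Return r_alpha(x) = exp(alpha * log x) for real x > 0 and complex alpha."""
    if x <= 0:
        raise ValueError("x must be real and strictly positive")
    return cmath.exp(alpha * math.log(x))


# Arbitrary sample values, chosen only for illustration.
alpha, beta, x = 0.5 + 2j, -1.25 + 0.75j, 3.0

# Part (i): r_{alpha+beta}(x) should agree with r_alpha(x) * r_beta(x).
addition_law_gap = abs(r(alpha + beta, x) - r(alpha, x) * r(beta, x))
print("addition law gap:", addition_law_gap)

# Part (iv): exp(u + iv) should agree with r_{u+iv}(e), where e = exp(1).
u, v = 0.3, 1.7
mnemonic_gap = abs(cmath.exp(complex(u, v)) - r(complex(u, v), math.e))
print("exp(z) versus e^z gap:", mnemonic_gap)
```

Both gaps should be of the order of rounding error. Note that `r` refuses non-positive `x`, in line with the restriction to real and strictly positive $x$ in the exercise.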


Please send corrections however trivial to twk@dpmms.cam.ac.uk

Exercise 5.7.10. According to a well known story¹⁰, the Harvard mathematician Benjamin Peirce chalked the formula

\[ e^{i\pi} + 1 = 0 \]

on the board and addressed his students as follows.

    Gentlemen, that is surely true, it is absolutely paradoxical; we cannot understand it, and we do not know what it means, but we have proved it, and therefore we know it must be the truth.

(i) In the context of this chapter, what information is conveyed by the formula

\[ \exp(i\pi) + 1 = 0? \]

(What does $\exp$ mean, what does $\pi$ mean and what does $\exp(i\pi)$ mean?)

(ii) In the context of this chapter, what information is conveyed by the formula

\[ e^{i\pi} + 1 = 0? \]

There is a superb discussion of the problem of defining $x^\alpha$ in Klein’s Elementary Mathematics from an Advanced Standpoint [28].

5.8 The fundamental theorem of algebra ♥

It is in the nature of a book like this that much of our time is occupied in proving results which the ‘physicist in the street’ would consider obvious. In this section we prove a result which is less obvious.

Theorem 5.8.1. (The fundamental theorem of algebra.) Suppose that $n \ge 1$, $a_0, a_1, \dots, a_n \in \mathbb{C}$ and $a_n \neq 0$. Then the equation

\[ a_n z^n + a_{n-1} z^{n-1} + \dots + a_0 = 0 \]

has at least one root in $\mathbb{C}$.

In other words, every non-constant polynomial has a root in $\mathbb{C}$.

If the reader believes that this is obvious, then she should stop reading at this point and write down the ‘obvious argument’. In fact, Leibniz and other mathematicians doubted the truth of the result. Although d’Alembert, Euler and Lagrange offered proofs of the result, they were unsatisfactory and the first satisfactory discussion is due to Gauss¹¹.

¹⁰ Repeated in Martin Gardner’s Mathematical Diversions. See also Exercise K.89.

The first point to realise is that the ‘fundamental theorem of algebra’ is in fact a theorem of analysis!

Exercise 5.8.2. Suppose $z = u + iv$ with $u, v \in \mathbb{R}$. If $z^2 - 2 = 0$, show that

\[ u^2 - v^2 = 2, \qquad uv = 0 \]

and deduce that $v = 0$, $u^2 = 2$.

If we write

\[ \mathbb{Q} + i\mathbb{Q} = \{x + iy : x, y \in \mathbb{Q}\}, \]

show that the equation

\[ z^2 - 2 = 0 \]

has no solution with $z \in \mathbb{Q} + i\mathbb{Q}$.

Since $\mathbb{Q} + i\mathbb{Q}$ and $\mathbb{C} = \mathbb{R} + i\mathbb{R}$ share the same algebraic structure, Exercise 5.8.2 shows that the truth of Theorem 5.8.1 must depend in some way on the fundamental axiom of analysis. We shall use Theorem 4.3.4, which states that any continuous function on a closed bounded set in $\mathbb{R}^n$ has a minimum, to establish the following key step of our proof.

Lemma 5.8.3. If $P$ is a polynomial, then there exists a $z_0 \in \mathbb{C}$ such that

\[ |P(z)| \ge |P(z_0)| \]

for all $z \in \mathbb{C}$.

We then complete the proof by establishing the following lemma.

Lemma 5.8.4. If $P$ is a non-constant polynomial and $|P|$ attains a minimum at $z_0$, then $P(z_0) = 0$.

Clearly, Lemmas 5.8.3 and 5.8.4 together imply Theorem 5.8.1. Our

proofs of the two lemmas make use of simple results given in the next exercise.

¹¹ See [29], Chapter 19, section 4 and Chapter 25 sections 1 and 2.


Exercise 5.8.5. (i) Let $P(z) = \sum_{j=0}^{n} a_j z^j$ with $n \ge 1$ and $a_n \neq 0$. Show that, if we set $R_0 = 2n|a_n|^{-1}(1 + \max_{0 \le j \le n} |a_j|)$, then, whenever $|z| \ge R_0$,

\[ \frac{|a_j|}{|z|^{n-j}} \le \frac{|a_j|}{R_0} \le \frac{|a_n|}{2n} \]

for all $0 \le j \le n - 1$. Hence, or otherwise, show that

\[ \left| a_n + \sum_{j=0}^{n-1} \frac{a_j}{z^{n-j}} \right| \ge \frac{|a_n|}{2} \]

and so

\[ \left| \sum_{j=0}^{n} a_j z^j \right| \ge \frac{|a_n||z|^n}{2} \]

for all $|z| \ge R_0$.

(ii) By using the result of (i), show that, given any real number $K \ge 0$, we can find an $R(K) > 0$ such that $|P(z)| \ge K$ whenever $|z| \ge R(K)$.

(iii) Let $Q(z) = \sum_{j=k}^{n} b_j z^j$ with $n \ge k \ge 1$ and $b_k \neq 0$. Show that there exists an $\eta_0 > 0$ such that

\[ \left| \sum_{j=k+1}^{n} b_j z^j \right| \le \frac{|b_k||z|^k}{2} \]

for all $|z| \le \eta_0$.
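A numerical experiment can make the bound in part (i) of Exercise 5.8.5 more concrete. The sketch below assumes a particular cubic (the list `coeffs` is a hypothetical example, not taken from the text), computes $R_0$ from the formula in the exercise and samples $|P(z)|/(|a_n||z|^n)$ on three circles of radius at least $R_0$. Sampling finitely many points proves nothing, but the smallest ratio found should not fall below $1/2$.

```python
import cmath
import math


def poly(coeffs, z):
    """Evaluate sum_j coeffs[j] * z**j by Horner's rule."""
    total = 0j
    for a in reversed(coeffs):
        total = total * z + a
    return total


def radius_R0(coeffs):
    """R_0 = 2n |a_n|^{-1} (1 + max_j |a_j|), as in part (i)."""
    n = len(coeffs) - 1
    return 2 * n * (1 + max(abs(a) for a in coeffs)) / abs(coeffs[-1])


# Hypothetical example: a_0, a_1, a_2, a_3 of a cubic with a_3 != 0.
coeffs = [3 - 1j, 0, -2j, 1 + 1j]
n = len(coeffs) - 1
R0 = radius_R0(coeffs)

# Sample |P(z)| / (|a_n| |z|^n) on circles of radius R0, 1.5*R0 and 4*R0.
worst_ratio = math.inf
for scale in (1.0, 1.5, 4.0):
    for degrees in range(360):
        z = scale * R0 * cmath.exp(1j * math.radians(degrees))
        ratio = abs(poly(coeffs, z)) / (abs(coeffs[-1]) * abs(z) ** n)
        worst_ratio = min(worst_ratio, ratio)

print("R0 =", R0)
print("smallest sampled ratio =", worst_ratio)
```

In practice the sampled ratio is much closer to $1$ than to $1/2$, which reflects how generous the choice of $R_0$ is.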

Proof of Lemma 5.8.3. We wish to show that, if $P$ is any polynomial, then $|P|$ has a minimum. If $P$ is a constant polynomial there is nothing to prove, so we may suppose that $P(z) = \sum_{j=0}^{n} a_j z^j$ with $n \ge 1$ and $a_n \neq 0$. By Exercise 5.8.5 (ii), we can find an $R > 0$ such that $|P(z)| \ge |P(0)| + 1$ whenever $|z| \ge R$.

Identifying $\mathbb{C}$ with $\mathbb{R}^2$ in the usual way, we observe that

\[ \bar{D}_R = \{z \in \mathbb{C} : |z| \le R\} \]

is a closed bounded set and that the function $|P| : \mathbb{C} \to \mathbb{R}$ is continuous. Thus we may use Theorem 4.3.4, which states that a continuous function on a closed bounded set attains its minimum, to show the existence of a $z_0 \in \bar{D}_R$ with $|P(z_0)| \le |P(z)|$ for all $z \in \bar{D}_R$.

We note, in particular, that $|P(z_0)| \le |P(0)|$. Thus, if $|z| \ge R$, then

\[ |P(z)| \ge |P(0)| + 1 > |P(0)| \ge |P(z_0)|. \]

It follows that $|P(z_0)| \le |P(z)|$ for all $z \in \mathbb{C}$ as required.


Exercise 5.8.6. Define $f : \mathbb{C} \to \mathbb{R}$ by $f(z) = -|z|^2$. Show that $f$ attains a minimum on every set

\[ \bar{D}_R = \{z \in \mathbb{C} : |z| \le R\} \]

but has no minimum on $\mathbb{C}$. Explain briefly why the proof above works for $|P|$ but not for $f$.

We must now show that, if z0 gives the minimum value of the modulus |P |

of a non-constant polynomial P , then P (z0 ) = 0. We start with a collection

of remarks intended to simplify the algebra.

Exercise 5.8.7. (i) Let $P$ be a non-constant polynomial whose modulus $|P|$ has a minimum at $z_0$. Show that if $Q(z) = P(z + z_0)$, then $Q$ is a non-constant polynomial whose modulus $|Q|$ has a minimum at $0$. Show further that, if $Q(0) = 0$, then $P(z_0) = 0$.

(ii) Let $Q$ be a non-constant polynomial whose modulus $|Q|$ has a minimum at $0$. Show that, for an appropriate $\phi \in \mathbb{R}$, to be defined, the function $R(z) = e^{i\phi} Q(z)$ has $R(0)$ real and positive¹². Show that $R$ is a non-constant polynomial whose modulus $|R|$ has a minimum at $0$ and that, if $R(0) = 0$, then $Q(0) = 0$.

(iii) Let $R$ be a non-constant polynomial whose modulus $|R|$ has a minimum at $0$ and such that $R(0)$ is real and positive. Explain why we have

\[ R(z) = a_0 + \sum_{j=k}^{n} a_j z^j \]

where $a_0$ is real and positive, $k \ge 1$ and $a_k \neq 0$. Set $S(z) = R(e^{i\psi} z)$. Show that, for an appropriate $\psi \in \mathbb{R}$, to be defined,

\[ S(z) = b_0 + \sum_{j=k}^{n} b_j z^j \]

where $b_0$ is real and positive, $k \ge 1$ and $b_k$ is real and strictly negative (that is $b_k < 0$).

Most mathematicians would consider the results of Exercise 5.8.7 to be trivial and use a phrase like ‘Without loss of generality we may suppose that $z_0 = 0$ and $P(z) = a_0 + \sum_{j=k}^{n} a_j z^j$ where $a_0$ is real and positive, $k \ge 1$ and $a_k$ is real and strictly negative’ or (better) ‘By considering $e^{i\phi} P(e^{i\psi}(z - z_0))$ we may suppose that $z_0 = 0$ and $P(z) = a_0 + \sum_{j=k}^{n} a_j z^j$ where $a_0$ is real and positive, $k \ge 1$ and $a_k$ is real and strictly negative’.

¹² That is to say, non-negative.


Proof of Lemma 5.8.4. We want to show that if $P$ is a non-constant polynomial and $z_0$ gives a minimum of $|P|$, then $P(z_0) = 0$. Without loss of generality we may suppose that $z_0 = 0$ and $P(z) = a_0 + \sum_{j=k}^{n} a_j z^j$ where $a_0$ is real and positive, $k \ge 1$ and $a_k$ is real and strictly negative. If $a_0 = 0$ then $P(0) = 0$ and we are done. We suppose that $a_0$ is strictly positive and seek a contradiction.

By Exercise 5.8.5 (iii), we can find an $\eta_0 > 0$ such that

\[ \left| \sum_{j=k+1}^{n} a_j z^j \right| \le \frac{|a_k z^k|}{2} \]

for all $|z| \le \eta_0$. Now choose $\eta_1$, a real number with $0 < \eta_1 \le \eta_0$ and $a_0 > |a_k|\eta_1^k/2$ ($\eta_1 = \min(\eta_0, 1, -a_0/(2a_k))$ will do). Remembering that $a_0$ is real and strictly positive and $a_k$ is real and strictly negative, we see that, whenever $\eta$ is real and $0 < \eta < \eta_1$, we have

\[
\begin{aligned}
|P(\eta)| &= \left| a_0 + \sum_{j=k}^{n} a_j \eta^j \right| \le |a_0 + a_k \eta^k| + \left| \sum_{j=k+1}^{n} a_j \eta^j \right| \\
&\le |a_0 + a_k \eta^k| + |a_k \eta^k|/2 = a_0 + a_k \eta^k - a_k \eta^k/2 = a_0 + a_k \eta^k/2 < P(0),
\end{aligned}
\]

contradicting the statement that $0$ is a minimum for $|P|$. The result follows by reductio ad absurdum.
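The chain of inequalities in the proof can be watched at work on a concrete polynomial. The sketch below assumes the hypothetical example $P(z) = 1 - z^2 + 5z^3 + (2+i)z^4$, so that $k = 2$, $a_0 = 1$ and $a_k = -1$. The value used for $\eta_0$ is one admissible choice (an assumption of this sketch, not the text's), obtained from the crude estimate $|\sum_{j>k} a_j z^j| \le |z|^{k+1} \sum_{j>k} |a_j|$ valid for $|z| \le 1$.

```python
def poly(coeffs, z):
    """Evaluate sum_j coeffs[j] * z**j by Horner's rule."""
    total = 0j
    for a in reversed(coeffs):
        total = total * z + a
    return total


# Hypothetical example with a_0 > 0, a_1 = 0 and a_2 < 0, so k = 2.
coeffs = [1.0, 0.0, -1.0, 5.0, 2 + 1j]
k = 2
a0, ak = coeffs[0], coeffs[k]

# One admissible eta_0 (an assumption of this sketch): for |z| <= eta_0 <= 1
# the tail sum_{j>k} a_j z^j is at most |a_k z^k| / 2 in modulus.
tail = sum(abs(a) for a in coeffs[k + 1:])
eta0 = min(1.0, abs(ak) / (2 * tail))

# The explicit choice of eta_1 suggested in the proof.
eta1 = min(eta0, 1.0, -a0 / (2 * ak))


def check(eta):
    """Return (|P(eta)|, a_0 + a_k eta^k / 2) for real eta."""
    return abs(poly(coeffs, eta)), a0 + ak * eta ** k / 2


for fraction in (0.9, 0.5, 0.1):
    eta = fraction * eta1
    value, bound = check(eta)
    print(eta, value, bound, value <= bound < a0)
```

Each printed row should end in `True`: moving a short distance along the positive real axis strictly decreases $|P|$ below $P(0) = a_0$, which is exactly the contradiction used in the proof.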

The proof of Theorem 5.8.1 may look a little complicated but really it only amounts to a fleshing out of the following sketch argument.

Outline proof of Theorem 5.8.1. Let $P$ be a non-constant polynomial. Since $|P(z)| \to \infty$ as $|z| \to \infty$, $|P|$ must attain a minimum. By translation, we may suppose that the minimum occurs at $0$. If $P(0) \neq 0$, then

\[ P(z) = a_0 + \sum_{j=k}^{n} a_j z^j \]

with $k \ge 1$ and $a_0, a_k \neq 0$. Close to zero,

\[ P(z) \approx a_0 + a_k z^k. \]

Choosing an appropriate $\phi$, we have $|a_0 + a_k (e^{i\phi}\eta)^k| < |a_0|$ whenever $\eta$ is small and strictly positive, contradicting the statement that $|P|$ attains a minimum at $0$. The result follows by reductio ad absurdum.

Exercise 5.8.8. Give an explicit value for $\phi$ in the outline proof just sketched.


Exercise 5.8.9. We say that $z_0$ is a local minimum of a function $G : \mathbb{C} \to \mathbb{R}$ if we can find a $\delta > 0$ such that $G(z) \ge G(z_0)$ for all $z$ with $|z - z_0| < \delta$. Show that if $P$ is a non-constant polynomial and $z_0$ is a local minimum of $|P|$, then $P(z_0) = 0$.

We have already used the strategy of looking for a minimum (or maximum) and then considering the behaviour of the function near that ‘extreme’ point in our proof of Rolle’s theorem (Theorem 4.4.4). Another example occurs in Exercise K.30 if the reader wishes to try it and other examples will crop up in this book. The method is very powerful but we must be careful to establish that an extreme point actually exists (see, as a warning example, the discussion beginning on page 199 of a counterexample due, essentially, to Weierstrass). Notice that our proof required the ability to ‘look in all directions’. The minimum had to be in the open set

\[ D_R = \{z \in \mathbb{C} : |z| < R\} \]

and not merely in the set

\[ \bar{D}_R = \{z \in \mathbb{C} : |z| \le R\}. \]

Exercise 5.8.10. This exercise recalls material that is probably familiar from algebra. We work in $\mathbb{C}$.

(i) Show, by induction on the degree of $P$, or otherwise, that if $P$ is a non-constant polynomial and $\lambda \in \mathbb{C}$, then there exists a polynomial $Q$ and an $r \in \mathbb{C}$ such that

\[ P(z) = (z - \lambda)Q(z) + r. \]

(ii) If $P$ is a non-constant polynomial and $\lambda \in \mathbb{C}$ is such that $P(\lambda) = 0$, then there is a polynomial $Q$ such that

\[ P(z) = (z - \lambda)Q(z). \]

(iii) Use the fundamental theorem of algebra and induction on the degree $n$ to show that any polynomial $P$ of degree $n$ can be written in the form

\[ P(z) = a \prod_{j=1}^{n} (z - \lambda_j). \]

(iv) Show that a polynomial of degree $n$ can have at most $n$ distinct roots. What is the minimum number of distinct roots it can have?


(v) If $P$ has real coefficients show¹³ that $P(z)^* = P(z^*)$ and deduce that, if $\lambda$ is a root of $P$, so is $\lambda^*$.

(vi) Use part (v) and induction to show that, if $P$ is a polynomial with real coefficients, then $P$ can be written in the form

\[ P(z) = a \prod_{j=1}^{m} Q_j(z) \]

where $a \in \mathbb{R}$ and, for each $j$, either $Q_j(z) = z + a_j$ with $a_j \in \mathbb{R}$, or $Q_j(z) = z^2 + a_j z + b_j$ with $a_j, b_j \in \mathbb{R}$.
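Part (i) of Exercise 5.8.10 has a computational counterpart in synthetic division (Horner's scheme). The sketch below is one possible implementation, with the function name `divide_by_linear` and the coefficient ordering (increasing degree) chosen for this illustration. It returns the coefficients of $Q$ and the remainder $r$ in $P(z) = (z - \lambda)Q(z) + r$, and the remainder is $P(\lambda)$, which is the content of part (ii).

```python
def divide_by_linear(coeffs, lam):
    """Divide P(z) = sum_j coeffs[j] * z**j (degree >= 1) by (z - lam).

    Returns (q, r) with P(z) = (z - lam) * Q(z) + r, where q lists the
    coefficients of Q in increasing degree and r equals P(lam).
    """
    n = len(coeffs) - 1
    q = [0j] * n
    carry = 0j
    # Work downwards from the leading coefficient: b_{j-1} = a_j + lam * b_j.
    for j in range(n, 0, -1):
        carry = coeffs[j] + lam * carry
        q[j - 1] = carry
    r = coeffs[0] + lam * carry
    return q, r


# P(z) = z^3 - 1 has a root at 1, so the remainder vanishes there.
q_root, r_root = divide_by_linear([-1, 0, 0, 1], 1)
print(q_root, r_root)

# At lam = 2 the remainder is P(2).
q_other, r_other = divide_by_linear([-1, 0, 0, 1], 2)
print(q_other, r_other)
```

Repeating the division at each root in turn is the inductive step behind the factorisation in part (iii), although finding the roots in the first place is where the fundamental theorem of algebra is needed.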

In the days before mathematicians acquired our present confidence with complex numbers, the fundamental theorem of algebra was given the less general statement that any polynomial with real coefficients could be written as the product of linear and quadratic terms with real coefficients.

It is natural to ask if this restricted result which does not mention complex numbers can be proved without using complex numbers. Gauss’s first proof of the restricted result used complex numbers but he later gave a second proof without using complex numbers which depends only on the fact that a real polynomial of odd degree must have a root (Exercise 1.6.4) and so uses the fundamental axiom in the form of the intermediate value theorem. As might be expected, his proof and its modern successors are rather subtle. The reader is advised to wait until she has studied the rudiments of Galois theory before pursuing these ideas further.

Exercise 5.8.11. Let $P(z) = \sum_{j=0}^{n} a_j z^j$ be a non-constant polynomial with a root at $z_0$.

(i) Explain why we can find an $\eta_0 > 0$ such that $P(z) \neq 0$ for all $z$ with $0 < |z - z_0| < \eta_0$.

(ii) If $0 < \eta < \eta_0$, use the fact that a continuous function on a closed bounded set is bounded and attains its bounds to show that there is a $\delta(\eta) > 0$ such that $|P(z)| \ge \delta(\eta) > 0$ for all $z$ with $|z - z_0| = \eta$.

(iii) Continuing with the notations and assumptions of (ii), show that if $Q(z)$ is a polynomial with $|P(z) - Q(z)| < \delta(\eta)/2$ for all $z$ with $|z - z_0| \le \eta$, then $|Q|$ has a local minimum (and so $Q$ has a root) $z_1$ with $|z_1 - z_0| < \eta$.

(iv) Show that given any $\delta > 0$, we can find an $\epsilon > 0$ (depending on $\delta$, $n$, $a_0, a_1, \dots, a_n$) such that, if $|a_j - b_j| < \epsilon$ for $0 \le j \le n$, then $\sum_{j=0}^{n} b_j z^j$ has at least one root $z_1$ with $|z_0 - z_1| < \delta$.

[Note that this result is not true if we work over $\mathbb{R}$. The equation $x^2 = 0$ has a real root at $0$ but $x^2 + \epsilon = 0$ has no real roots if $\epsilon > 0$ however small $\epsilon$ may be.]
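The contrast between $\mathbb{R}$ and $\mathbb{C}$ in the bracketed note can be seen in a two-line computation. The sketch below (the helper name `nearby_root` is an assumption made for this illustration) perturbs $x^2 = 0$ to $x^2 + \epsilon = 0$ and uses the complex square root to exhibit a root of modulus $\sqrt{\epsilon}$, which is as close to $0$ as we please although it never lies on the real line.

```python
import cmath
import math


def nearby_root(eps: float) -> complex:
    """Return one complex root of z**2 + eps = 0 for real eps > 0."""
    return cmath.sqrt(-eps)


for eps in (1e-2, 1e-4, 1e-6):
    root = nearby_root(eps)
    residual = abs(root ** 2 + eps)
    # The root is purely imaginary, with distance sqrt(eps) from 0.
    print(eps, root, residual, abs(root), math.sqrt(eps))
```

The other root is the negative of the one returned, so both roots of the perturbed equation approach the double root $0$ of the original as $\epsilon \to 0$.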

¹³ We write $z^*$ for the complex conjugate of $z$. Thus, if $x$ and $y$ are real, $(x + iy)^* = x - iy$. Some authors use $\bar{z}$.


Exercise 5.8.12. (This exercise requires countability and a certain willingness to think like an algebraist.)

It is sometimes said that we have to introduce $\mathbb{R}$ in order to provide equations like $x^2 - 2 = 0$ with a root. A little thought shows that this is too simple a view of the matter. Recall that a system $(F, +, \times)$ satisfying all the axioms set out in Axioms A except axioms P1 to P4 (the axioms of order) is called a field. If $(F, +, \times)$ is a field and $G \subseteq F$ is such that

(a) $0, 1, -1 \in G$,
(b) if $x, y \in G$, then $x + y, xy \in G$,
(c) if $x \in G$ and $x \neq 0$, then $x^{-1} \in G$,

then we say that $G$ is a subfield of $F$. It is easy to see that a subfield is itself a field. In this exercise we show that there is a countable subfield $L$ of $\mathbb{C}$ containing $\mathbb{Q}$ and such that, if $a_0, a_1, \dots, a_n \in L$, with $a_n \neq 0$, then we can find $a, \lambda_1, \dots, \lambda_n \in L$ such that

\[ \sum_{j=0}^{n} a_j z^j = a \prod_{k=1}^{n} (z - \lambda_k) \]

for all $z \in L$. In other words, every polynomial with coefficients in $L$ has all its roots in $L$. Here are the steps in the proof.

(i) If $K$ is a countable subfield of $\mathbb{C}$, show that the set of polynomials with degree $n$ with coefficients in $K$ is countable. Deduce that the set of polynomials $\mathcal{P}(K)$ with coefficients in $K$ is countable. Show also that the set $\mathcal{Z}(K)$ of roots in $\mathbb{C}$ of polynomials in $\mathcal{P}(K)$ is countable.

(ii) If $K$ is a subfield of $\mathbb{C}$ and $\omega \in \mathbb{C}$, we write $K(\omega)$ for the set of numbers $P(\omega)/Q(\omega)$ with $P, Q \in \mathcal{P}(K)$ and $Q(\omega) \neq 0$. Show that $K(\omega)$ is a subfield of $\mathbb{C}$ containing $K$ and $\omega$. If $K$ is countable, show that $K(\omega)$ is.

(iii) Let $K$ be a subfield of $\mathbb{C}$ and $\boldsymbol{\omega} = (\omega_1, \omega_2, \dots)$ where $\omega_j \in \mathbb{C}$. Set $K_0 = K$ and define $K_n = K_{n-1}(\omega_n)$ for all $n \ge 1$. If we set $K(\boldsymbol{\omega}) = \bigcup_{n=0}^{\infty} K_n$, show that $K(\boldsymbol{\omega})$ is a subfield of $\mathbb{C}$ containing $K$ and $\omega_j$ for each $j \ge 1$. If $K$ is countable, show that $K(\boldsymbol{\omega})$ is.

(iv) Let $K$ be a countable subfield of $\mathbb{C}$ (we could take $K = \mathbb{Q}$). Set $K_0 = K$. Show by induction, using part (iii), that we may define inductively a sequence $K_n$ of countable subfields of $\mathbb{C}$ such that $K_n$ contains $\mathcal{Z}(K_{n-1})$ for each $n \ge 1$. If we set $L = \bigcup_{n=0}^{\infty} K_n$, show that $L$ is a countable subfield of $\mathbb{C}$ such that every polynomial with coefficients in $L$ has all its roots in $L$.

[We say that fields like $L$ are ‘algebraically closed’. The work we have had to do to obtain an ‘algebraically closed’ $L$ from $K$ shows the fundamental theorem of algebra in a remarkable light. Although $\mathbb{R}$ is not algebraically closed, adjoining a single root $i$ of a single equation $z^2 + 1 = 0$ to form $\mathbb{R}(i) = \mathbb{C}$ produces an algebraically closed field!]

Chapter 6

Differentiation

6.1 Preliminaries

This section is as much propaganda as technical mathematics and, as with

much propaganda, most points are made more than once.

We have already looked briefly at differentiation of functions $f : \mathbb{R} \to \mathbb{R}$. Unfortunately, nature is not one dimensional and we must consider the more general case of a function $f : \mathbb{R}^m \to \mathbb{R}^p$. The definition of the derivative in terms of the limit of some ratio is not available since we cannot divide by vectors.

The first solution that mathematicians found to this problem is via ‘directional derivatives’ or, essentially equivalently, via ‘partial derivatives’. We shall give formal definitions later but the idea is to reduce a many dimensional problem to a collection of one dimensional problems by only examining changes in one direction at a time. Suppose, for example, that $f : \mathbb{R}^m \to \mathbb{R}$ is well behaved. If we wish to examine how $f$ behaves near $\mathbf{x}$ we choose a unit vector $\mathbf{u}$ and look at $f_{\mathbf{u}}(t) = f(\mathbf{x} + t\mathbf{u})$ with $t \in \mathbb{R}$. The function $f_{\mathbf{u}} : \mathbb{R} \to \mathbb{R}$ is ‘one dimensional’ and we may look at its derivative

\[ f'_{\mathbf{u}}(\mathbf{x}) = \lim_{h \to 0} \frac{f(\mathbf{x} + h\mathbf{u}) - f(\mathbf{x})}{h}. \]

By choosing $m$ unit vectors $\mathbf{u}_j$ at right angles and looking at the associated ‘directional derivatives’ $f'_{\mathbf{u}_j}(\mathbf{x})$ we can obtain a picture of the way in which $f$ changes.

But to echo Maxwell

    . . . the doctrine of Vectors . . . is a method of thinking and not, at least for the present generation, a method of saving thought. It does not, like some more popular mathematical methods, encourage the hope that mathematicians may give their minds a holiday, by transferring all their work to their pens. It calls on us at every step to form a mental image of the geometrical features represented by the symbols, so that in studying geometry by this method we have our minds engaged with geometrical ideas, and are not permitted to call ourselves geometers when we are only arithmeticians. (Page 951, [38])

Is there a less ‘coordinate bound’ and more ‘geometric’ way of looking at differentiation in many dimensions? If we are prepared to spend a little time and effort acquiring new habits of thought, the answer is yes.

The original discoverers of the calculus thought of differentiation as the process of finding a tangent. If $f : \mathbb{R} \to \mathbb{R}$ is well behaved then the tangent at $x$ is the line $y = b + a(t - x)$ which touches the curve $y = f(t)$ at $(x, f(x))$, that is, the ‘line which most resembles $f$ close to $x$’. In other words

\[ f(t) = b + a(t - x) + \text{small error} \]

close to $x$. If we think a little harder about the nature of the ‘smallest error’ possible we see that it ‘ought to decrease faster than linear’, that is

\[ f(t) = b + a(t - x) + E(t)|t - x| \]

with $E(t) \to 0$ as $t \to x$.

Exercise 6.1.1. Suppose that $f : \mathbb{R} \to \mathbb{R}$. Show that the following two statements are equivalent.

(i) $\dfrac{f(t) - f(x)}{t - x} \to a$ as $t \to x$.

(ii) $f(t) = f(x) + a(t - x) + E(t)|t - x|$ with $E(t) \to 0$ as $t \to x$.

Rewriting our equations slightly, we see that $f$ is differentiable at $x$ if

\[ f(t) - f(x) = a(t - x) + E(t)|t - x| \]

with $E(t) \to 0$ as $t \to x$. A final rewrite now gives that $f$ is differentiable at $x$ if

\[ f(x + h) - f(x) = ah + \epsilon(h)|h| \]

where $\epsilon(h) \to 0$ as $h \to 0$. The derivative $f'(x) = a$ is the slope of the tangent at $x$.
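The requirement that the error term tends to zero can be watched numerically. The sketch below is an illustration only: the choice of $f = \sin$, the point $x = 1$ and the deliberately wrong slope are assumptions of the example. It solves $f(x + h) - f(x) = ah + \epsilon(h)|h|$ for $\epsilon(h)$ and prints it for shrinking $h$, once with the true slope $a = \cos 1$ and once with a slope that is off by $0.1$.

```python
import math


def error_term(f, a, x, h):
    """Return epsilon(h), where f(x + h) - f(x) = a*h + epsilon(h)*|h|."""
    return (f(x + h) - f(x) - a * h) / abs(h)


x = 1.0
slope = math.cos(x)          # the derivative of sin at x
wrong_slope = slope + 0.1    # a deliberately incorrect candidate

for h in (1e-1, 1e-2, 1e-3, -1e-3):
    good = error_term(math.sin, slope, x, h)
    bad = error_term(math.sin, wrong_slope, x, h)
    print(h, good, bad)
```

With the true slope the error term shrinks roughly in proportion to $|h|$, whereas with the wrong slope it stays near $\pm 0.1$. Only one value of $a$ can make $\epsilon(h) \to 0$, which is the one dimensional case of the uniqueness discussed later in the section.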

The obvious extension to well behaved functions $f : \mathbb{R}^m \to \mathbb{R}$ is to consider the tangent plane at $(\mathbf{x}, f(\mathbf{x}))$. Just as the equation of a non-vertical line through the origin in $\mathbb{R} \times \mathbb{R}$ is $y = bt$, so the equation of an appropriate plane (or ‘hyperplane’ if the reader prefers) in $\mathbb{R}^m \times \mathbb{R}$ is $y = \alpha(\mathbf{x})$ where $\alpha : \mathbb{R}^m \to \mathbb{R}$ is linear. This suggests that we say that $f$ is differentiable at $\mathbf{x}$ if

\[ f(\mathbf{x} + \mathbf{h}) - f(\mathbf{x}) = \alpha(\mathbf{h}) + \epsilon(\mathbf{h})\|\mathbf{h}\|, \]

where $\epsilon(\mathbf{h}) \to 0$ as $\|\mathbf{h}\| \to 0$. It is natural to call $\alpha$ the derivative of $f$ at $\mathbf{x}$.

Finally, if we consider $f : \mathbb{R}^m \to \mathbb{R}^p$, the natural flow of our argument suggests that we say that $f$ is differentiable at $\mathbf{x}$ if we can find a linear map $\alpha : \mathbb{R}^m \to \mathbb{R}^p$ such that

\[ f(\mathbf{x} + \mathbf{h}) = f(\mathbf{x}) + \alpha(\mathbf{h}) + \epsilon(\mathbf{h})\|\mathbf{h}\| \]

where $\|\epsilon(\mathbf{h})\| \to 0$ as $\|\mathbf{h}\| \to 0$. It is natural to call $\alpha$ the derivative of $f$ at $\mathbf{x}$.

Important note: It is indeed natural to call $\alpha$ the derivative of $f$ at $\mathbf{x}$. Unfortunately, it is not consistent with our old definition in the case $m = p = 1$. If $f : \mathbb{R} \to \mathbb{R}$, then, with our new definition, the derivative is the map $t \mapsto f'(x)t$ but, with our old, the derivative is the number $f'(x)$.

From the point of view we have adopted, the key observation of the one dimensional differential calculus is that well behaved curves, however complicated they may be globally, behave locally like straight lines i.e. like the simplest curves we know. The key observation of multidimensional calculus is that well behaved functions, however complicated they may be globally, behave locally like linear maps i.e. like the simplest functions we know. It is this observation, above all, which justifies the immense amount of time spent studying linear algebra, that is to say, studying the behaviour of linear maps.

I shall assume that the reader has done a course on linear algebra and is familiar with the definition and lemma that follow. (Indeed, I have already assumed familiarity with the notion of a linear map.)

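Definition 6.1.2. We say that a function (or map) $\alpha : \mathbb{R}^m \to \mathbb{R}^p$ is linear if

\[ \alpha(\lambda\mathbf{x} + \mu\mathbf{y}) = \lambda\alpha(\mathbf{x}) + \mu\alpha(\mathbf{y}) \]

for all $\mathbf{x}, \mathbf{y} \in \mathbb{R}^m$ and $\lambda, \mu \in \mathbb{R}$.

We shall often write $\alpha\mathbf{x} = \alpha(\mathbf{x})$.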

Lemma 6.1.3. Each linear map $\alpha : \mathbb{R}^m \to \mathbb{R}^p$ is associated with a unique $p \times m$ real matrix $A = (a_{ij})$ such that if $\alpha\mathbf{x} = \mathbf{y}$ then

\[ y_i = \sum_{j=1}^{m} a_{ij} x_j \qquad (\dagger) \]

Conversely each $p \times m$ real matrix $A = (a_{ij})$ is associated with a unique linear map $\alpha : \mathbb{R}^m \to \mathbb{R}^p$ by the equation $(\dagger)$.

We shall call $A$ the matrix of $\alpha$ with respect to the standard bases. The point to notice is that, if we take different coordinate axes, we get different matrices associated with the same linear map.
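The correspondence between a linear map and its matrix with respect to the standard bases can be made concrete as follows. This is a sketch under stated assumptions: vectors are plain Python lists, the names `matrix_of` and `apply_matrix` are invented for the illustration, and the map `alpha` from $\mathbb{R}^3$ to $\mathbb{R}^2$ is a hypothetical example. The $j$th column of the matrix is the image of the $j$th standard basis vector, and applying the matrix row by row recovers the map.

```python
def matrix_of(alpha, m):
    """Matrix (a_ij) of a linear map alpha: R^m -> R^p w.r.t. the standard bases.

    Column j of the matrix is alpha applied to the j-th standard basis vector.
    """
    columns = []
    for j in range(m):
        e_j = [1.0 if i == j else 0.0 for i in range(m)]
        columns.append(alpha(e_j))
    p = len(columns[0])
    return [[columns[j][i] for j in range(m)] for i in range(p)]


def apply_matrix(A, x):
    """Compute y_i = sum_j a_ij * x_j for each row i of A."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in A]


def alpha(x):
    """A hypothetical linear map R^3 -> R^2, used only as an example."""
    return [2 * x[0] - x[2], x[0] + 3 * x[1]]


A = matrix_of(alpha, 3)
x = [1.0, 2.0, 3.0]
print(A)
print(apply_matrix(A, x), alpha(x))
```

The two vectors printed on the last line should agree. Choosing a different basis in `matrix_of` would give a different matrix for the same `alpha`, which is the point made in the paragraph above.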

From time to time, particularly in some of the exercises, we shall use other

facts about linear maps. The reader should not worry too much if some of

these facts are unfamiliar but she should worry if all of them are.

We now repeat the discussion of differentiation with marginally more generality and precision.

A function is continuous if it is locally approximately constant. A function is differentiable if it is locally approximately linear. More precisely, a function is continuous at a point $\mathbf{x}$ if it is locally approximately constant, with an error which decreases to zero, as we approach $\mathbf{x}$. A function is differentiable at a point $\mathbf{x}$ if it is locally approximately linear, with an error which decreases to zero faster than linearly, as we approach $\mathbf{x}$.

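Definition 6.1.4. Suppose that $E$ is a subset of $\mathbb{R}^m$ and $\mathbf{x}$ a point such that there exists a $\delta > 0$ with the ball $B(\mathbf{x}, \delta) \subseteq E$. We say that $f : E \to \mathbb{R}^p$ is differentiable at $\mathbf{x}$ if we can find a linear map $\alpha : \mathbb{R}^m \to \mathbb{R}^p$ such that, when $\|\mathbf{h}\| < \delta$,

\[ f(\mathbf{x} + \mathbf{h}) = f(\mathbf{x}) + \alpha\mathbf{h} + \epsilon(\mathbf{x}, \mathbf{h})\|\mathbf{h}\|, \]

where $\|\epsilon(\mathbf{x}, \mathbf{h})\| \to 0$ as $\|\mathbf{h}\| \to 0$. We write $\alpha = Df(\mathbf{x})$ or $\alpha = f'(\mathbf{x})$.

If $E$ is open and $f$ is differentiable at each point of $E$, we say that $f$ is differentiable on $E$.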

Needless to say, the centre of the definition is the displayed formula and the reader should concentrate on understanding the rôle of each term in that formula. The rest of the definition is just supporting waffle. The formula is sometimes written in the form

\[ \frac{\|f(\mathbf{x} + \mathbf{h}) - f(\mathbf{x}) - \alpha\mathbf{h}\|}{\|\mathbf{h}\|} \to 0 \]

as $\|\mathbf{h}\| \to 0$.

Of course, we need to complete Definition 6.1.4 by showing that $\alpha$ is unique.

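Lemma 6.1.5. (i) Let $\gamma : \mathbb{R}^m \to \mathbb{R}^p$ be a linear map and $\epsilon : \mathbb{R}^m \to \mathbb{R}^p$ a function with $\|\epsilon(\mathbf{h})\| \to 0$ as $\|\mathbf{h}\| \to 0$. If

\[ \gamma\mathbf{h} = \epsilon(\mathbf{h})\|\mathbf{h}\| \]

then $\gamma = 0$, the zero map.

(ii) There is at most one $\alpha$ satisfying the conditions of Definition 6.1.4.

Proof. (i) There are many different ways of setting out this simple proof. Here is one. Let $\mathbf{x} \in \mathbb{R}^m$. If $\eta > 0$, we have

\[ \gamma\mathbf{x} = \eta^{-1}\gamma(\eta\mathbf{x}) = \eta^{-1}\epsilon(\eta\mathbf{x})\|\eta\mathbf{x}\| = \epsilon(\eta\mathbf{x})\|\mathbf{x}\| \]

and so

\[ \|\gamma\mathbf{x}\| = \|\epsilon(\eta\mathbf{x})\|\,\|\mathbf{x}\| \to 0 \]

as $\eta \to 0$ through values $\eta > 0$. Thus $\|\gamma\mathbf{x}\| = 0$ and $\gamma\mathbf{x} = \mathbf{0}$ for all $\mathbf{x} \in \mathbb{R}^m$. In other words, $\gamma = 0$.

(ii) Suppose that we can find linear maps $\alpha_j : \mathbb{R}^m \to \mathbb{R}^p$ such that, when $\|\mathbf{h}\| < \delta$,

\[ f(\mathbf{x} + \mathbf{h}) = f(\mathbf{x}) + \alpha_j\mathbf{h} + \epsilon_j(\mathbf{x}, \mathbf{h})\|\mathbf{h}\|, \]

where $\|\epsilon_j(\mathbf{x}, \mathbf{h})\| \to 0$ as $\|\mathbf{h}\| \to 0$ [$j = 1, 2$].

Subtracting, we see that

\[ (\alpha_1 - \alpha_2)\mathbf{h} = \epsilon(\mathbf{x}, \mathbf{h})\|\mathbf{h}\| \]

where

\[ \epsilon(\mathbf{x}, \mathbf{h}) = \epsilon_2(\mathbf{x}, \mathbf{h}) - \epsilon_1(\mathbf{x}, \mathbf{h}) \]

for $\|\mathbf{h}\| < \delta$. Since

\[ \|\epsilon(\mathbf{x}, \mathbf{h})\| \le \|\epsilon_1(\mathbf{x}, \mathbf{h})\| + \|\epsilon_2(\mathbf{x}, \mathbf{h})\| \to 0 \]

as $\|\mathbf{h}\| \to 0$, we can apply part (i) to obtain $\alpha_1 = \alpha_2$.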

The coordinate free approach can be taken only so far, and to calculate we need to know the matrix $A$ of $\alpha = Df(\mathbf{x})$ with respect to the standard bases. To find $A$ we have recourse to directional derivatives.

Definition 6.1.6. Suppose that $E$ is a subset of $\mathbb{R}^m$ and that we have a function $g : E \to \mathbb{R}$. Suppose further that $\mathbf{x} \in E$ and $\mathbf{u}$ is a unit vector such that there exists a $\delta > 0$ with $\mathbf{x} + h\mathbf{u} \in E$ for all $|h| < \delta$. We can now define a function $G$ from the open interval $(-\delta, \delta)$ to $\mathbb{R}$ by setting $G(t) = g(\mathbf{x} + t\mathbf{u})$. If $G$ is differentiable at $0$, we say that $g$ has a directional derivative at $\mathbf{x}$ in the direction $\mathbf{u}$ of value $G'(0)$.


Exercise 6.1.7. Suppose that $E$ is a subset of $\mathbb{R}^m$ and that we have a function $g : E \to \mathbb{R}$. Suppose further that $\mathbf{x} \in E$ and $\mathbf{u}$ is a unit vector such that there exists a $\delta > 0$ with $\mathbf{x} + h\mathbf{u} \in E$ for all $|h| < \delta$. Show that $g$ has a directional derivative at $\mathbf{x}$ in the direction $\mathbf{u}$ of value $a$ if and only if

\[ \frac{g(\mathbf{x} + t\mathbf{u}) - g(\mathbf{x})}{t} \to a \]

as $t \to 0$.

We are interested in the directional derivatives along the unit vectors $\mathbf{e}_j$ in the directions of the coordinate axes. The reader is almost certainly familiar with these under the name of ‘partial derivatives’.

Definition 6.1.8. Suppose that $E$ is a subset of $\mathbb{R}^m$ and that we have a function $g : E \to \mathbb{R}$. If we give $\mathbb{R}^m$ the standard basis $\mathbf{e}_1, \mathbf{e}_2, \dots, \mathbf{e}_m$ (where $\mathbf{e}_j$ is the vector with $j$th entry $1$ and all other entries $0$), then the directional derivative of $g$ at $\mathbf{x}$ in the direction $\mathbf{e}_j$ is called a partial derivative and written $g_{,j}(\mathbf{x})$.

The recipe for computing $g_{,j}(\mathbf{x})$ is thus, ‘differentiate $g(x_1, x_2, \dots, x_j, \dots, x_m)$ with respect to $x_j$ treating all the $x_i$ with $i \neq j$ as constants’.

The reader would probably prefer me to say that $g_{,j}(\mathbf{x})$ is the partial derivative of $g$ with respect to $x_j$ and write

\[ g_{,j}(\mathbf{x}) = \frac{\partial g}{\partial x_j}(\mathbf{x}). \]

I shall use this notation from time to time, but, as I point out in Appendix E, there are cultural differences between the way that applied mathematicians and pure mathematicians think of partial derivatives, so I prefer to use a different notation.

The reader should also know a third notation for partial derivatives.

\[ D_j g = g_{,j}. \]

This ‘D’ notation is more common than the ‘comma’ notation and is to be preferred if you only use partial derivatives occasionally or if you only deal with functions $f : \mathbb{R}^n \to \mathbb{R}$. The ‘comma’ notation is used in Tensor Analysis and is convenient in the kind of formulae which appear in Section 7.2.

If $E$ is a subset of $\mathbb{R}^m$ and we have a function $g : E \to \mathbb{R}^p$ then we can write

\[ g(\mathbf{t}) = \begin{pmatrix} g_1(\mathbf{t}) \\ g_2(\mathbf{t}) \\ \vdots \\ g_p(\mathbf{t}) \end{pmatrix} \]

and obtain functions $g_i : E \to \mathbb{R}$ with partial derivatives (if they exist) $g_{i,j}(\mathbf{x})$ (or, in more standard notation, $\frac{\partial g_i}{\partial x_j}(\mathbf{x})$). The proof of the next lemma just consists of dismantling the notation so laboriously constructed in the last few paragraphs.

Lemma 6.1.9. Let $f$ be as in Definition 6.1.4. If we use standard coordinates, then, if $f$ is differentiable at $\mathbf{x}$, its partial derivatives $f_{i,j}(\mathbf{x})$ exist and the matrix of the derivative $Df(\mathbf{x})$ is the Jacobian matrix $(f_{i,j}(\mathbf{x}))$ of partial derivatives.

Proof. Left as a strongly recommended but simple exercise for the reader.

Notice that, if $f : \mathbb{R} \to \mathbb{R}$, the matrix of $Df(x)$ is the $1 \times 1$ Jacobian matrix $(f'(x))$. Notice also that Lemma 6.1.9 provides an alternative proof of the uniqueness of the derivative (Lemma 6.1.5 (ii)).

It is customary to point out that the existence of the partial derivatives does not imply the differentiability of the function (see Example 7.3.14 below) but the main objections to over-reliance on partial derivatives are that it makes formulae cumbersome and stifles geometric intuition. Let your motto be ‘coordinates and matrices for calculation, vectors and linear maps for understanding’.
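In the spirit of ‘coordinates and matrices for calculation’, here is a small numerical check of Lemma 6.1.9 against Definition 6.1.4. Everything specific in it is an assumption made for illustration: the example map $f(x_1, x_2) = (x_1^2 x_2, \sin x_1 + x_2)$, its Jacobian worked out by hand, and the direction along which $\mathbf{h}$ shrinks. The function `error_ratio` computes $\|f(\mathbf{x} + \mathbf{h}) - f(\mathbf{x}) - Df(\mathbf{x})\mathbf{h}\| / \|\mathbf{h}\|$, which should tend to zero with $\|\mathbf{h}\|$.

```python
import math


def f(x):
    """Hypothetical example map f: R^2 -> R^2."""
    return [x[0] ** 2 * x[1], math.sin(x[0]) + x[1]]


def jacobian(x):
    """Matrix (f_{i,j}(x)) of partial derivatives, computed by hand for this f."""
    return [[2 * x[0] * x[1], x[0] ** 2],
            [math.cos(x[0]), 1.0]]


def norm(v):
    """Euclidean norm of a vector given as a list."""
    return math.sqrt(sum(c * c for c in v))


def error_ratio(x, h):
    """Return ||f(x + h) - f(x) - Df(x)h|| / ||h||."""
    J = jacobian(x)
    fx = f(x)
    fxh = f([xi + hi for xi, hi in zip(x, h)])
    Jh = [sum(J[i][j] * h[j] for j in range(len(h))) for i in range(len(fx))]
    return norm([fxh[i] - fx[i] - Jh[i] for i in range(len(fx))]) / norm(h)


x = [1.0, 2.0]
for t in (1e-1, 1e-2, 1e-3):
    print(t, error_ratio(x, [t, -2 * t]))
```

The printed ratios should shrink roughly in proportion to `t`. A single direction of approach is tested here, so the output is evidence for, not a proof of, differentiability at `x`.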

6.2 The operator norm and the chain rule

We shall need some method of measuring the ‘size’ of a linear map. The reader is unlikely to have come across this in a standard ‘abstract algebra’ course, since algebraists dislike using ‘metric notions’ which do not generalise from $\mathbb{R}$ to more general fields.

Our first idea might be to use some sort of measure like

\[ \|\alpha\|' = \max |a_{ij}| \]

where $(a_{ij})$ is the matrix of $\alpha$ with respect to the standard bases. However $\|\alpha\|'$ has no geometric meaning.

Exercise 6.2.1. Show by example that $\|\alpha\|'$ may depend on the coordinate axes chosen.

Even if we insist that our method of measuring the size of a linear map

shall have a geometric meaning, this does not give a unique method. The

following chain of ideas gives one method which is natural and standard.


Lemma 6.2.2. If $\alpha : \mathbb{R}^m \to \mathbb{R}^p$ is linear, there exists a constant $K(\alpha)$ such that

\[ \|\alpha\mathbf{x}\| \le K(\alpha)\|\mathbf{x}\| \]

for all $\mathbf{x} \in \mathbb{R}^m$.
