
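find a ‘good’ value, we can use the crudest inequalities.

If we write $\mathbf{y} = \alpha\mathbf{x}$, we have
\[
\|\alpha\mathbf{x}\| = \|\mathbf{y}\| \le \sum_{i=1}^{p} |y_i|
\le \sum_{i=1}^{p}\sum_{j=1}^{m} |a_{ij}||x_j|
\le \sum_{i=1}^{p}\sum_{j=1}^{m} |a_{ij}|\,\|\mathbf{x}\|.
\]
The required result follows on putting $K(\alpha) = \sum_{i=1}^{p}\sum_{j=1}^{m} |a_{ij}|$.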
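The bound just obtained is easy to test numerically. The following is a minimal sketch, not part of the original text: the matrix `A` is a hypothetical example and the helper names are our own. It computes $K(\alpha)$ as the sum of the absolute values of the matrix entries and checks that the ratio of the norm of $\alpha\mathbf{x}$ to the norm of $\mathbf{x}$ never exceeds it on random samples.

```python
import math
import random

def apply(a, x):
    """Return the vector A x for a p-by-m matrix given as a list of rows."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in a]

def norm(v):
    """Euclidean norm of a vector."""
    return math.sqrt(sum(t * t for t in v))

def crude_bound(a):
    """K(alpha): the sum of |a_ij| over all entries of the matrix."""
    return sum(abs(a_ij) for row in a for a_ij in row)

# Hypothetical 2-by-3 example matrix (an assumption for illustration).
A = [[1.0, -2.0, 0.5],
     [0.0, 3.0, -1.0]]
K = crude_bound(A)

random.seed(0)
worst_ratio = 0.0
for _ in range(1000):
    x = [random.uniform(-1.0, 1.0) for _ in range(3)]
    if norm(x) > 0:
        worst_ratio = max(worst_ratio, norm(apply(A, x)) / norm(x))

print(K, worst_ratio)  # the second number should not exceed the first
```

The sampled ratio is typically well below $K(\alpha)$, which reflects how crude the inequalities used in the proof are.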

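Exercise 6.2.3. Use Lemma 6.2.2 to estimate $\|\alpha\mathbf{x} - \alpha\mathbf{y}\|$ and hence deduce that every linear map $\alpha : \mathbb{R}^m \to \mathbb{R}^p$ is continuous. (This exercise takes longer to pose than to do.)

Lemma 6.2.2 tells us that $\{\|\alpha\mathbf{x}\| : \|\mathbf{x}\| \le 1\}$ is a non-empty subset of $\mathbb{R}$ bounded above by $K(\alpha)$ and so has a supremum.

Definition 6.2.4. If $\alpha : \mathbb{R}^m \to \mathbb{R}^p$ is a linear map, then
\[ \|\alpha\| = \sup_{\|\mathbf{x}\| \le 1} \|\alpha\mathbf{x}\|. \]

Exercise 6.2.5. If $\alpha$ is as in Definition 6.2.4, show that the three quantities
\[
\sup_{\|\mathbf{x}\| \le 1} \|\alpha\mathbf{x}\|, \quad
\sup_{\|\mathbf{x}\| = 1} \|\alpha\mathbf{x}\|, \quad\text{and}\quad
\sup_{\mathbf{x} \ne \mathbf{0}} \frac{\|\alpha\mathbf{x}\|}{\|\mathbf{x}\|}
\]
are well defined and equal.

The ‘operator norm’ just defined in Definition 6.2.4 has many pleasant properties.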


Please send corrections however trivial to twk@dpmms.cam.ac.uk

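Lemma 6.2.6. Let $\alpha, \beta : \mathbb{R}^m \to \mathbb{R}^p$ be linear maps.

(i) If $\mathbf{x} \in \mathbb{R}^m$ then $\|\alpha\mathbf{x}\| \le \|\alpha\|\,\|\mathbf{x}\|$.

(ii) $\|\alpha\| \ge 0$.

(iii) If $\|\alpha\| = 0$ then $\alpha = 0$.

(iv) If $\lambda \in \mathbb{R}$ then $\|\lambda\alpha\| = |\lambda|\,\|\alpha\|$.

(v) (The triangle inequality) $\|\alpha + \beta\| \le \|\alpha\| + \|\beta\|$.

(vi) If $\gamma : \mathbb{R}^p \to \mathbb{R}^q$ is linear, then $\|\gamma\alpha\| \le \|\gamma\|\,\|\alpha\|$.

Proof. I will prove parts (i) and (vi) leaving the equally easy remaining parts as an essential exercise for the reader.

(i) If $\mathbf{x} = \mathbf{0}$, we observe that $\alpha\mathbf{0} = \mathbf{0}$ and so
\[ \|\alpha\mathbf{0}\| = \|\mathbf{0}\| = 0 \le 0 = \|\alpha\|\,0 = \|\alpha\|\,\|\mathbf{0}\| \]
as required.

If $\mathbf{x} \ne \mathbf{0}$, we set $\mathbf{u} = \|\mathbf{x}\|^{-1}\mathbf{x}$. Since
\[ \|\mathbf{u}\| = \|\mathbf{x}\|^{-1}\|\mathbf{x}\| = 1 \]
we have $\|\alpha\mathbf{u}\| \le \|\alpha\|$ and so
\[ \|\alpha\mathbf{x}\| = \|\alpha(\|\mathbf{x}\|\mathbf{u})\| = \|(\|\mathbf{x}\|\alpha\mathbf{u})\| = \|\mathbf{x}\|\,\|\alpha\mathbf{u}\| \le \|\alpha\|\,\|\mathbf{x}\| \]
as required.

(vi) If $\|\mathbf{x}\| \le 1$ then, using part (i) twice,
\[ \|\gamma\alpha(\mathbf{x})\| = \|\gamma(\alpha(\mathbf{x}))\| \le \|\gamma\|\,\|\alpha(\mathbf{x})\| \le \|\gamma\|\,\|\alpha\|\,\|\mathbf{x}\| \le \|\gamma\|\,\|\alpha\|. \]
It follows that
\[ \|\gamma\alpha\| = \sup_{\|\mathbf{x}\| \le 1} \|\gamma\alpha(\mathbf{x})\| \le \|\gamma\|\,\|\alpha\|. \]

Exercise 6.2.7. (i) Write down a linear map $\alpha : \mathbb{R}^2 \to \mathbb{R}^2$ such that $\alpha \ne 0$ but $\alpha^2 = 0$.

(ii) Show that we cannot replace the inequality (vi) in Lemma 6.2.6 by an equality.

(iii) Show that we cannot replace the inequality (v) in Lemma 6.2.6 by an equality.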

130 A COMPANION TO ANALYSIS

Exercise 6.2.8. (i) Suppose that $\alpha : \mathbb{R} \to \mathbb{R}$ is a linear map and that its matrix with respect to the standard bases is $(a)$. Show that
\[ \|\alpha\| = |a|. \]

(ii) Suppose that $\alpha : \mathbb{R}^m \to \mathbb{R}$ is a linear map and that its matrix with respect to the standard bases is $(a_1\ a_2\ \dots\ a_m)$. By using the Cauchy-Schwarz inequality (Lemma 4.1.2) and the associated conditions for equality (Exercise 4.1.5 (i)) show that
\[ \|\alpha\| = \Bigl(\sum_{j=1}^{m} a_j^2\Bigr)^{1/2}. \]

Although the operator norm is, in principle, calculable (see Exercises K.98 to K.101) the reader is warned that, except in special cases, there is no simple formula for the operator norm and it is mainly used as a theoretical tool. Should we need to have some idea of its size, extremely rough estimates will often suffice.

Exercise 6.2.9. Suppose that $\alpha : \mathbb{R}^m \to \mathbb{R}^p$ is a linear map and that its matrix with respect to the standard bases is $A = (a_{ij})$. Show that
\[ \max_{i,j} |a_{ij}| \le \|\alpha\| \le pm \max_{i,j} |a_{ij}|. \]
By using the Cauchy-Schwarz inequality, show that
\[ \|\alpha\| \le \Bigl(\sum_{i=1}^{p}\sum_{j=1}^{m} a_{ij}^2\Bigr)^{1/2}. \]
Show that this inequality implies the corresponding inequality in the previous paragraph.
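To see how rough these estimates are, it may help to look at one case where the operator norm can be computed. The sketch below is our own illustration, not part of the text. It assumes the standard fact (not proved in this section) that the operator norm of a real matrix $A$ is the square root of the largest eigenvalue of $A^T A$, and it uses a hypothetical $2 \times 2$ matrix for which that eigenvalue has a closed form.

```python
import math

def operator_norm_2x2(a):
    """Largest singular value of a real 2-by-2 matrix [[a11, a12], [a21, a22]].

    Uses the closed form for the larger eigenvalue of A^T A in terms of its
    trace t and determinant d (a standard fact, assumed here).
    """
    (a11, a12), (a21, a22) = a
    t = a11 * a11 + a12 * a12 + a21 * a21 + a22 * a22   # trace of A^T A
    d = (a11 * a22 - a12 * a21) ** 2                    # determinant of A^T A
    disc = max(t * t - 4.0 * d, 0.0)                    # guard against rounding
    return math.sqrt((t + math.sqrt(disc)) / 2.0)

A = [[3.0, 0.0], [4.0, 5.0]]        # hypothetical example matrix, p = m = 2
entries = [abs(v) for row in A for v in row]

max_entry = max(entries)                             # lower bound
op = operator_norm_2x2(A)                            # operator norm
frobenius = math.sqrt(sum(v * v for v in entries))   # Cauchy-Schwarz bound
crude = 2 * 2 * max_entry                            # pm * max |a_ij|

print(max_entry, op, frobenius, crude)
```

For this matrix the four quantities appear in increasing order, as the exercise predicts, with the Cauchy-Schwarz bound much closer to the true norm than the bound $pm \max |a_{ij}|$.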

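We now return to differentiation. Suppose that $\mathbf{f} : \mathbb{R}^m \to \mathbb{R}^p$ and $\mathbf{g} : \mathbb{R}^p \to \mathbb{R}^q$ are differentiable. What can we say about their composition $\mathbf{g} \circ \mathbf{f}$? To simplify the algebra let us suppose that $\mathbf{f}(\mathbf{0}) = \mathbf{0}$, $\mathbf{g}(\mathbf{0}) = \mathbf{0}$ (so $\mathbf{g} \circ \mathbf{f}(\mathbf{0}) = \mathbf{0}$) and ask about the differentiability of $\mathbf{g} \circ \mathbf{f}$ at $\mathbf{0}$. Suppose that the derivative of $\mathbf{f}$ at $\mathbf{0}$ is $\alpha$ and the derivative of $\mathbf{g}$ at $\mathbf{0}$ is $\beta$. Then
\[ \mathbf{f}(\mathbf{h}) \approx \alpha\mathbf{h} \]
when $\mathbf{h}$ is small ($\mathbf{h} \in \mathbb{R}^m$) and
\[ \mathbf{g}(\mathbf{k}) \approx \beta\mathbf{k} \]
when $\mathbf{k}$ is small ($\mathbf{k} \in \mathbb{R}^p$). It ought, therefore, to be true that
\[ \mathbf{g}(\mathbf{f}(\mathbf{h})) \approx \beta(\alpha\mathbf{h}) \]
i.e. that
\[ \mathbf{g} \circ \mathbf{f}(\mathbf{h}) \approx (\beta\alpha)\mathbf{h} \]
when $\mathbf{h}$ is small ($\mathbf{h} \in \mathbb{R}^m$). In other words $\mathbf{g} \circ \mathbf{f}$ is differentiable at $\mathbf{0}$.

We have been led to formulate the chain rule.

Lemma 6.2.10. (The chain rule.) Let $U$ be a neighbourhood of $\mathbf{x}$ in $\mathbb{R}^m$, and $V$ a neighbourhood of $\mathbf{y}$ in $\mathbb{R}^p$. Suppose that $\mathbf{f} : U \to V$ is differentiable at $\mathbf{x}$ with derivative $\alpha$, that $\mathbf{g} : V \to \mathbb{R}^q$ is differentiable at $\mathbf{y}$ with derivative $\beta$ and that $\mathbf{f}(\mathbf{x}) = \mathbf{y}$. Then $\mathbf{g} \circ \mathbf{f}$ is differentiable at $\mathbf{x}$ with derivative $\beta\alpha$.

In more condensed notation
\[ D(\mathbf{g} \circ \mathbf{f})(\mathbf{x}) = D\mathbf{g}(\mathbf{f}(\mathbf{x}))\,D\mathbf{f}(\mathbf{x}), \]
or, equivalently,
\[ D(\mathbf{g} \circ \mathbf{f})(\mathbf{x}) = (D\mathbf{g}) \circ \mathbf{f}(\mathbf{x})\,D\mathbf{f}(\mathbf{x}). \]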

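Proof. We know that
\[ \mathbf{f}(\mathbf{x} + \mathbf{h}) = \mathbf{f}(\mathbf{x}) + \alpha\mathbf{h} + \boldsymbol{\epsilon}_1(\mathbf{h})\|\mathbf{h}\| \]
and
\[ \mathbf{g}(\mathbf{f}(\mathbf{x}) + \mathbf{k}) = \mathbf{g}(\mathbf{f}(\mathbf{x})) + \beta\mathbf{k} + \boldsymbol{\epsilon}_2(\mathbf{k})\|\mathbf{k}\| \]
where $\|\boldsymbol{\epsilon}_1(\mathbf{h})\| \to 0$ as $\|\mathbf{h}\| \to 0$ and $\|\boldsymbol{\epsilon}_2(\mathbf{k})\| \to 0$ as $\|\mathbf{k}\| \to 0$. It follows that
\[
\begin{aligned}
\mathbf{g} \circ \mathbf{f}(\mathbf{x} + \mathbf{h}) &= \mathbf{g}(\mathbf{f}(\mathbf{x} + \mathbf{h})) \\
&= \mathbf{g}(\mathbf{f}(\mathbf{x}) + \alpha\mathbf{h} + \boldsymbol{\epsilon}_1(\mathbf{h})\|\mathbf{h}\|)
\end{aligned}
\]
so, taking $\mathbf{k} = \alpha\mathbf{h} + \boldsymbol{\epsilon}_1(\mathbf{h})\|\mathbf{h}\|$, we have
\[
\begin{aligned}
\mathbf{g} \circ \mathbf{f}(\mathbf{x} + \mathbf{h}) &= \mathbf{g}(\mathbf{f}(\mathbf{x})) + \beta(\alpha\mathbf{h} + \boldsymbol{\epsilon}_1(\mathbf{h})\|\mathbf{h}\|) + \boldsymbol{\epsilon}_2(\alpha\mathbf{h} + \boldsymbol{\epsilon}_1(\mathbf{h})\|\mathbf{h}\|)\,\|\alpha\mathbf{h} + \boldsymbol{\epsilon}_1(\mathbf{h})\|\mathbf{h}\|\| \\
&= \mathbf{g} \circ \mathbf{f}(\mathbf{x}) + \beta\alpha\mathbf{h} + \boldsymbol{\eta}(\mathbf{h})\|\mathbf{h}\|
\end{aligned}
\]
with
\[ \boldsymbol{\eta}(\mathbf{h}) = \boldsymbol{\eta}_1(\mathbf{h}) + \boldsymbol{\eta}_2(\mathbf{h}) \]
where
\[ \boldsymbol{\eta}_1(\mathbf{h})\|\mathbf{h}\| = \beta\boldsymbol{\epsilon}_1(\mathbf{h})\|\mathbf{h}\| \]
and
\[ \boldsymbol{\eta}_2(\mathbf{h})\|\mathbf{h}\| = \boldsymbol{\epsilon}_2(\alpha\mathbf{h} + \boldsymbol{\epsilon}_1(\mathbf{h})\|\mathbf{h}\|)\,\|\alpha\mathbf{h} + \boldsymbol{\epsilon}_1(\mathbf{h})\|\mathbf{h}\|\|. \]
All we have to do is to show that $\|\boldsymbol{\eta}_1(\mathbf{h})\|$ and $\|\boldsymbol{\eta}_2(\mathbf{h})\|$, and so $\|\boldsymbol{\eta}(\mathbf{h})\| = \|\boldsymbol{\eta}_1(\mathbf{h}) + \boldsymbol{\eta}_2(\mathbf{h})\|$, tend to zero as $\|\mathbf{h}\| \to 0$. We observe first that
\[ \|\boldsymbol{\eta}_1(\mathbf{h})\|\mathbf{h}\|\| \le \|\beta\|\,\|\boldsymbol{\epsilon}_1(\mathbf{h})\|\mathbf{h}\|\| = \|\beta\|\,\|\boldsymbol{\epsilon}_1(\mathbf{h})\|\,\|\mathbf{h}\| \]
so $\|\boldsymbol{\eta}_1(\mathbf{h})\| \le \|\beta\|\,\|\boldsymbol{\epsilon}_1(\mathbf{h})\| \to 0$ as $\|\mathbf{h}\| \to 0$. Next we observe that
\[
\begin{aligned}
\|\boldsymbol{\eta}_2(\mathbf{h})\|\mathbf{h}\|\| &= \|\boldsymbol{\epsilon}_2(\alpha\mathbf{h} + \boldsymbol{\epsilon}_1(\mathbf{h})\|\mathbf{h}\|)\|\,\|\alpha\mathbf{h} + \boldsymbol{\epsilon}_1(\mathbf{h})\|\mathbf{h}\|\| \\
&\le \|\boldsymbol{\epsilon}_2(\alpha\mathbf{h} + \boldsymbol{\epsilon}_1(\mathbf{h})\|\mathbf{h}\|)\|\,(\|\alpha\mathbf{h}\| + \|\boldsymbol{\epsilon}_1(\mathbf{h})\|\,\|\mathbf{h}\|) \\
&\le \|\boldsymbol{\epsilon}_2(\alpha\mathbf{h} + \boldsymbol{\epsilon}_1(\mathbf{h})\|\mathbf{h}\|)\|\,(\|\alpha\| + \|\boldsymbol{\epsilon}_1(\mathbf{h})\|)\,\|\mathbf{h}\|,
\end{aligned}
\]
so that
\[ \|\boldsymbol{\eta}_2(\mathbf{h})\| \le \|\boldsymbol{\epsilon}_2(\alpha\mathbf{h} + \boldsymbol{\epsilon}_1(\mathbf{h})\|\mathbf{h}\|)\|\,(\|\alpha\| + \|\boldsymbol{\epsilon}_1(\mathbf{h})\|) \to 0 \]
as $\|\mathbf{h}\| \to 0$ and we are done.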
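A numerical sanity check of the chain rule can make the statement more concrete. The following sketch is our own illustration under stated assumptions: the maps `f` and `g` are hypothetical examples from $\mathbb{R}^2$ to $\mathbb{R}^2$, their Jacobian matrices are computed by hand, and the derivative of the composition is approximated by central differences rather than found exactly.

```python
import math

def f(x, y):
    """A sample map R^2 -> R^2 (hypothetical example)."""
    return (x * y, x + math.sin(y))

def g(u, v):
    """A second sample map R^2 -> R^2 (hypothetical example)."""
    return (u * u + v, math.exp(u) * v)

def Df(x, y):
    """Jacobian matrix of f, computed by hand."""
    return [[y, x], [1.0, math.cos(y)]]

def Dg(u, v):
    """Jacobian matrix of g, computed by hand."""
    return [[2.0 * u, 1.0], [math.exp(u) * v, math.exp(u)]]

def matmul(a, b):
    """Product of two 2-by-2 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def jacobian_fd(func, x, y, step=1e-6):
    """Central-difference approximation to the Jacobian of func at (x, y)."""
    xp, xm = func(x + step, y), func(x - step, y)
    yp, ym = func(x, y + step), func(x, y - step)
    return [[(xp[i] - xm[i]) / (2 * step), (yp[i] - ym[i]) / (2 * step)]
            for i in range(2)]

def g_after_f(x, y):
    """The composition g o f."""
    return g(*f(x, y))

x0, y0 = 0.7, -0.3
chain = matmul(Dg(*f(x0, y0)), Df(x0, y0))   # Dg(f(x)) Df(x)
numeric = jacobian_fd(g_after_f, x0, y0)     # approximation to D(g o f)(x)
max_gap = max(abs(chain[i][j] - numeric[i][j])
              for i in range(2) for j in range(2))
print(max_gap)  # expected to be small, limited by the finite-difference error
```

Note the order of the matrix product: the Jacobian of $\mathbf{g}$ is evaluated at $\mathbf{f}(\mathbf{x})$ and multiplies the Jacobian of $\mathbf{f}$ from the left, matching the derivative $\beta\alpha$ in the lemma.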

Students sometimes say that the proof of the chain rule is difficult but they really mean that it is tedious. It is simply a matter of showing that the error terms $\boldsymbol{\eta}_1(\mathbf{h})\|\mathbf{h}\|$ and $\boldsymbol{\eta}_2(\mathbf{h})\|\mathbf{h}\|$, which ought to be small, actually are. Students also forget the artificiality of the standard proofs of the one dimensional chain rule (see the discussion of Lemma 5.6.2; any argument which Hardy got wrong cannot be natural). The multidimensional argument forces us to address the real nature of the chain rule.

The next result is very simple but I would like to give two different proofs.

Lemma 6.2.11. Let $U$ be a neighbourhood of $\mathbf{x}$ in $\mathbb{R}^n$. Suppose that $\mathbf{f}, \mathbf{g} : U \to \mathbb{R}^m$ are differentiable at $\mathbf{x}$. Then $\mathbf{f} + \mathbf{g}$ is differentiable at $\mathbf{x}$ with
\[ D(\mathbf{f} + \mathbf{g})(\mathbf{x}) = D\mathbf{f}(\mathbf{x}) + D\mathbf{g}(\mathbf{x}). \]

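Direct proof. By definition
\[ \mathbf{f}(\mathbf{x} + \mathbf{h}) = \mathbf{f}(\mathbf{x}) + D\mathbf{f}(\mathbf{x})\mathbf{h} + \boldsymbol{\epsilon}_1(\mathbf{h})\|\mathbf{h}\| \]
and
\[ \mathbf{g}(\mathbf{x} + \mathbf{h}) = \mathbf{g}(\mathbf{x}) + D\mathbf{g}(\mathbf{x})\mathbf{h} + \boldsymbol{\epsilon}_2(\mathbf{h})\|\mathbf{h}\| \]
where $\|\boldsymbol{\epsilon}_1(\mathbf{h})\| \to 0$ and $\|\boldsymbol{\epsilon}_2(\mathbf{h})\| \to 0$ as $\|\mathbf{h}\| \to 0$. Thus
\[
\begin{aligned}
(\mathbf{f} + \mathbf{g})(\mathbf{x} + \mathbf{h}) &= \mathbf{f}(\mathbf{x} + \mathbf{h}) + \mathbf{g}(\mathbf{x} + \mathbf{h}) \\
&= \mathbf{f}(\mathbf{x}) + D\mathbf{f}(\mathbf{x})\mathbf{h} + \boldsymbol{\epsilon}_1(\mathbf{h})\|\mathbf{h}\| + \mathbf{g}(\mathbf{x}) + D\mathbf{g}(\mathbf{x})\mathbf{h} + \boldsymbol{\epsilon}_2(\mathbf{h})\|\mathbf{h}\| \\
&= (\mathbf{f} + \mathbf{g})(\mathbf{x}) + (D\mathbf{f}(\mathbf{x}) + D\mathbf{g}(\mathbf{x}))\mathbf{h} + \boldsymbol{\epsilon}_3(\mathbf{h})\|\mathbf{h}\|
\end{aligned}
\]
with
\[ \boldsymbol{\epsilon}_3(\mathbf{h}) = \boldsymbol{\epsilon}_1(\mathbf{h}) + \boldsymbol{\epsilon}_2(\mathbf{h}). \]
Since
\[ \|\boldsymbol{\epsilon}_3(\mathbf{h})\| \le \|\boldsymbol{\epsilon}_1(\mathbf{h})\| + \|\boldsymbol{\epsilon}_2(\mathbf{h})\| \to 0 + 0 = 0, \]
as $\|\mathbf{h}\| \to 0$, we are done.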

Our second proof depends on a series of observations.

Lemma 6.2.12. A linear map $\alpha : \mathbb{R}^n \to \mathbb{R}^m$ is everywhere differentiable with derivative $\alpha$.

Proof. Observe that
\[ \alpha(\mathbf{x} + \mathbf{h}) = \alpha\mathbf{x} + \alpha\mathbf{h} + \boldsymbol{\epsilon}(\mathbf{h})\|\mathbf{h}\|, \]
where $\boldsymbol{\epsilon}(\mathbf{h}) = \mathbf{0}$, and apply the definition.

As the reader can see, the result and proof are trivial, but they take some getting used to. In one dimension the result says that the map given by $x \mapsto ax$ has derivative $x \mapsto ax$ (or that the tangent to the line $y = ax$ is the line $y = ax$ itself, or that the derivative of the linear map with $1 \times 1$ matrix $(a)$ is the linear map with matrix $(a)$).

Exercise 6.2.13. Show that the constant map $\mathbf{f}_{\mathbf{c}} : \mathbb{R}^n \to \mathbb{R}^m$, given by $\mathbf{f}_{\mathbf{c}}(\mathbf{x}) = \mathbf{c}$ for all $\mathbf{x}$, is everywhere differentiable with derivative the zero linear map.

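Lemma 6.2.14. Let $U$ be a neighbourhood of $\mathbf{x}$ in $\mathbb{R}^n$ and $V$ a neighbourhood of $\mathbf{y}$ in $\mathbb{R}^m$. Suppose that $\mathbf{f} : U \to \mathbb{R}^p$ is differentiable at $\mathbf{x}$ and $\mathbf{g} : V \to \mathbb{R}^q$ is differentiable at $\mathbf{y}$. Then $U \times V$ is a neighbourhood of $(\mathbf{x}, \mathbf{y})$ in $\mathbb{R}^{n+m}$ and the function $(\mathbf{f}, \mathbf{g}) : U \times V \to \mathbb{R}^{p+q}$ given by
\[ (\mathbf{f}, \mathbf{g})(\mathbf{u}, \mathbf{v}) = (\mathbf{f}(\mathbf{u}), \mathbf{g}(\mathbf{v})) \]
is differentiable at $(\mathbf{x}, \mathbf{y})$ with derivative $(D\mathbf{f}(\mathbf{x}), D\mathbf{g}(\mathbf{y}))$ where we write
\[ (D\mathbf{f}(\mathbf{x}), D\mathbf{g}(\mathbf{y}))(\mathbf{h}, \mathbf{k}) = (D\mathbf{f}(\mathbf{x})\mathbf{h}, D\mathbf{g}(\mathbf{y})\mathbf{k}). \]

Proof. We leave some details (such as verifying that $U \times V$ is a neighbourhood of $(\mathbf{x}, \mathbf{y})$) to the reader. The key to the proof is the remark that $\|(\mathbf{h}, \mathbf{k})\| \ge \|\mathbf{h}\|, \|\mathbf{k}\|$. Observe that, if we write
\[ \mathbf{f}(\mathbf{x} + \mathbf{h}) = \mathbf{f}(\mathbf{x}) + D\mathbf{f}(\mathbf{x})\mathbf{h} + \boldsymbol{\epsilon}_1(\mathbf{h})\|\mathbf{h}\| \]
and
\[ \mathbf{g}(\mathbf{y} + \mathbf{k}) = \mathbf{g}(\mathbf{y}) + D\mathbf{g}(\mathbf{y})\mathbf{k} + \boldsymbol{\epsilon}_2(\mathbf{k})\|\mathbf{k}\|, \]
we have
\[ (\mathbf{f}, \mathbf{g})((\mathbf{x}, \mathbf{y}) + (\mathbf{h}, \mathbf{k})) = (\mathbf{f}, \mathbf{g})(\mathbf{x}, \mathbf{y}) + (D\mathbf{f}(\mathbf{x}), D\mathbf{g}(\mathbf{y}))(\mathbf{h}, \mathbf{k}) + \boldsymbol{\epsilon}(\mathbf{h}, \mathbf{k})\|(\mathbf{h}, \mathbf{k})\| \]
where
\[ \boldsymbol{\epsilon}(\mathbf{h}, \mathbf{k})\|(\mathbf{h}, \mathbf{k})\| = (\boldsymbol{\epsilon}_1(\mathbf{h})\|\mathbf{h}\|, \mathbf{0}) + (\mathbf{0}, \boldsymbol{\epsilon}_2(\mathbf{k})\|\mathbf{k}\|). \]
Using the last equation, we obtain
\[
\begin{aligned}
\|\boldsymbol{\epsilon}(\mathbf{h}, \mathbf{k})\|\,\|(\mathbf{h}, \mathbf{k})\| &= \|\boldsymbol{\epsilon}(\mathbf{h}, \mathbf{k})\|(\mathbf{h}, \mathbf{k})\|\| = \|(\boldsymbol{\epsilon}_1(\mathbf{h})\|\mathbf{h}\|, \mathbf{0}) + (\mathbf{0}, \boldsymbol{\epsilon}_2(\mathbf{k})\|\mathbf{k}\|)\| \\
&\le \|(\boldsymbol{\epsilon}_1(\mathbf{h})\|\mathbf{h}\|, \mathbf{0})\| + \|(\mathbf{0}, \boldsymbol{\epsilon}_2(\mathbf{k})\|\mathbf{k}\|)\| \le \|\boldsymbol{\epsilon}_1(\mathbf{h})\|\,\|(\mathbf{h}, \mathbf{k})\| + \|\boldsymbol{\epsilon}_2(\mathbf{k})\|\,\|(\mathbf{h}, \mathbf{k})\|.
\end{aligned}
\]
Thus
\[ \|\boldsymbol{\epsilon}(\mathbf{h}, \mathbf{k})\| \le \|\boldsymbol{\epsilon}_1(\mathbf{h})\| + \|\boldsymbol{\epsilon}_2(\mathbf{k})\| \to 0 + 0 = 0 \]
as $\|(\mathbf{h}, \mathbf{k})\| \to 0$.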

Exercise 6.2.15. If $\mathbf{h} \in \mathbb{R}^n$ and $\mathbf{k} \in \mathbb{R}^m$, show that
\[ \|(\mathbf{h}, \mathbf{k})\|^2 = \|\mathbf{h}\|^2 + \|\mathbf{k}\|^2 \]
and
\[ \|\mathbf{h}\| + \|\mathbf{k}\| \ge \|(\mathbf{h}, \mathbf{k})\| \ge \|\mathbf{h}\|, \|\mathbf{k}\|. \]

Exercise 6.2.16. Consider the situation described in Lemma 6.2.14. Write down the Jacobian matrix of partial derivatives for $(\mathbf{f}, \mathbf{g})$ in terms of the Jacobian matrices for $\mathbf{f}$ and $\mathbf{g}$.

We can now give a second proof of Lemma 6.2.11 using the chain rule.

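Second proof of Lemma 6.2.11. Let $\alpha : \mathbb{R}^n \to \mathbb{R}^{2n}$ be the map given by
\[ \alpha(\mathbf{x}) = (\mathbf{x}, \mathbf{x}) \]
and $\beta : \mathbb{R}^{2m} \to \mathbb{R}^m$ be the map given by
\[ \beta(\mathbf{x}, \mathbf{y}) = \mathbf{x} + \mathbf{y}. \]
Then, using the notation of Lemma 6.2.14,
\[ \mathbf{f} + \mathbf{g} = \beta \circ (\mathbf{f}, \mathbf{g}) \circ \alpha. \]
But $\alpha$ and $\beta$ are linear, so using the chain rule (Lemma 6.2.10), we see that $\mathbf{f} + \mathbf{g}$ is differentiable at $\mathbf{x}$ and
\[ D(\mathbf{f} + \mathbf{g})(\mathbf{x}) = \beta \circ D(\mathbf{f}, \mathbf{g})(\mathbf{x}, \mathbf{x}) \circ \alpha = D\mathbf{f}(\mathbf{x}) + D\mathbf{g}(\mathbf{x}). \]

If we only used this idea to prove Lemma 6.2.11 it would hardly be worth it but it is frequently easiest to show that a complicated function is differentiable by expressing it as the composition of simpler differentiable functions. (How else would one prove that $x \mapsto \sin(\exp(1 + x^2))$ is differentiable?)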

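Exercise 6.2.17. (i) Show that the function $J : \mathbb{R}^n \times \mathbb{R}^n \to \mathbb{R}$ given by the scalar product
\[ J(\mathbf{u}, \mathbf{v}) = \mathbf{u} \cdot \mathbf{v} \]
is everywhere differentiable with
\[ DJ(\mathbf{x}, \mathbf{y})(\mathbf{h}, \mathbf{k}) = \mathbf{x} \cdot \mathbf{k} + \mathbf{y} \cdot \mathbf{h}. \]

(ii) Let $U$ be a neighbourhood of $\mathbf{x}$ in $\mathbb{R}^n$. Suppose that $\mathbf{f}, \mathbf{g} : U \to \mathbb{R}^m$ are differentiable at $\mathbf{x}$. Show, using the chain rule, that $\mathbf{f} \cdot \mathbf{g}$ is differentiable at $\mathbf{x}$ with
\[ D(\mathbf{f} \cdot \mathbf{g})(\mathbf{x})\mathbf{h} = \mathbf{f}(\mathbf{x}) \cdot (D(\mathbf{g})(\mathbf{x})\mathbf{h}) + (D(\mathbf{f})(\mathbf{x})\mathbf{h}) \cdot \mathbf{g}(\mathbf{x}). \]

(iii) Let $U$ be a neighbourhood of $\mathbf{x}$ in $\mathbb{R}^n$. Suppose that $\mathbf{f} : U \to \mathbb{R}^m$ and $\lambda : U \to \mathbb{R}$ are differentiable at $\mathbf{x}$. State and prove an appropriate result about the function $\lambda\mathbf{f}$ given by
\[ (\lambda\mathbf{f})(\mathbf{u}) = \lambda(\mathbf{u})\mathbf{f}(\mathbf{u}). \]

(iv) If you have met the vector product$^1$ $\mathbf{u} \wedge \mathbf{v}$ of two vectors $\mathbf{u}, \mathbf{v} \in \mathbb{R}^3$, state and prove an appropriate theorem about the vector product of differentiable functions.

(v) Let $U$ be a neighbourhood of $\mathbf{x}$ in $\mathbb{R}^n$. Suppose that $f : U \to \mathbb{R}$ is non-zero on $U$ and differentiable at $\mathbf{x}$. Show that $1/f$ is differentiable at $\mathbf{x}$ and find $D(1/f)(\mathbf{x})$.

$^1$ Question: What do you get if you cross a mountaineer with a mosquito? Answer: You can’t. One is a scaler and the other is a vector.

6.3 The mean value inequality in higher dimensions

So far our study of differentiation in higher dimensions has remained on the level of mere algebra. (The definition of the operator norm used the supremum and so lay deeper but we could have avoided this at the cost of using a less natural norm.) The next result is a true theorem of analysis.

Theorem 6.3.1. (The mean value inequality.) Suppose that $U$ is an open set in $\mathbb{R}^m$ and that $\mathbf{f} : U \to \mathbb{R}^p$ is differentiable. Consider the straight line segment
\[ L = \{(1 - t)\mathbf{a} + t\mathbf{b} : 0 \le t \le 1\} \]
joining $\mathbf{a}$ and $\mathbf{b}$. If $L \subseteq U$ (i.e. $L$ lies entirely within $U$) and $\|D\mathbf{f}(\mathbf{x})\| \le K$ for all $\mathbf{x} \in L$, then
\[ \|\mathbf{f}(\mathbf{a}) - \mathbf{f}(\mathbf{b})\| \le K\|\mathbf{a} - \mathbf{b}\|. \]

Proof. Before starting the proof, it is helpful to note that, since $U$ is open, we can find an $\eta > 0$ such that the extended straight line segment
\[ \{(1 - t)\mathbf{a} + t\mathbf{b} : -\eta \le t \le 1 + \eta\} \subseteq U. \]
We shall prove our many dimensional mean value inequality from the one dimensional version (Theorem 1.7.1, or if the reader prefers, the slightly sharper Theorem 4.4.1). To this end, observe that, if $\mathbf{f}(\mathbf{b}) - \mathbf{f}(\mathbf{a}) = \mathbf{0}$, there is nothing to prove. We may thus assume that $\mathbf{f}(\mathbf{b}) - \mathbf{f}(\mathbf{a}) \ne \mathbf{0}$ and consider
\[ \mathbf{u} = \frac{\mathbf{f}(\mathbf{b}) - \mathbf{f}(\mathbf{a})}{\|\mathbf{f}(\mathbf{b}) - \mathbf{f}(\mathbf{a})\|}, \]
the unit vector in the direction $\mathbf{f}(\mathbf{b}) - \mathbf{f}(\mathbf{a})$. If we now define $g : (-\eta, 1 + \eta) \to \mathbb{R}$ by
\[ g(t) = \mathbf{u} \cdot \bigl(\mathbf{f}((1 - t)\mathbf{a} + t\mathbf{b}) - \mathbf{f}(\mathbf{a})\bigr), \]
we see, by using the chain rule or direct calculation, that $g$ is continuous and differentiable on $(-\eta, 1 + \eta)$ with
\[ g'(t) = \mathbf{u} \cdot (D\mathbf{f}((1 - t)\mathbf{a} + t\mathbf{b})(\mathbf{b} - \mathbf{a})). \]
Using the Cauchy-Schwarz inequality (Lemma 4.1.2) and the definition of the operator norm (Definition 6.2.4), we have
\[
\begin{aligned}
|g'(t)| &\le \|\mathbf{u}\|\,\|D\mathbf{f}((1 - t)\mathbf{a} + t\mathbf{b})(\mathbf{b} - \mathbf{a})\| \\
&= \|D\mathbf{f}((1 - t)\mathbf{a} + t\mathbf{b})(\mathbf{b} - \mathbf{a})\| \\
&\le \|D\mathbf{f}((1 - t)\mathbf{a} + t\mathbf{b})\|\,\|\mathbf{b} - \mathbf{a}\| \\
&\le K\|\mathbf{a} - \mathbf{b}\|
\end{aligned}
\]
for all $t \in (0, 1)$. Thus, by the one dimensional mean value inequality,
\[ \|\mathbf{f}(\mathbf{a}) - \mathbf{f}(\mathbf{b})\| = |g(1) - g(0)| \le K\|\mathbf{a} - \mathbf{b}\| \]
as required.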
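The mean value inequality can be tested on a simple example. The sketch below is our own illustration, not part of the text. It assumes the hypothetical map $\mathbf{f}(x, y) = (\sin x, \cos y)$ on $U = \mathbb{R}^2$, whose Jacobian matrix is diagonal with entries $\cos x$ and $-\sin y$; the operator norm of a diagonal matrix is the largest absolute value of its diagonal entries, so we may take $K = 1$ on every line segment.

```python
import math
import random

def f(p):
    """Hypothetical example map R^2 -> R^2 with operator norm of Df at most 1."""
    x, y = p
    return (math.sin(x), math.cos(y))

def dist(p, q):
    """Euclidean distance between two points of R^2."""
    return math.hypot(p[0] - q[0], p[1] - q[1])

K = 1.0                 # bound on the operator norm of Df, valid everywhere
random.seed(1)
violations = 0
for _ in range(10000):
    a = (random.uniform(-5, 5), random.uniform(-5, 5))
    b = (random.uniform(-5, 5), random.uniform(-5, 5))
    # A tiny tolerance absorbs floating-point rounding.
    if dist(f(a), f(b)) > K * dist(a, b) + 1e-12:
        violations += 1

print(violations)  # the inequality predicts that no pair violates the bound
```

Because the plane is convex, every segment joining two sample points lies inside $U$, so the hypothesis $L \subseteq U$ holds automatically here.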

Exercise 6.3.2. (i) Prove the statement of the first sentence in the proof just given.

(ii) If $g$ is the function defined in the proof just given, show, giving all the details, that $g$ is continuous and differentiable on $(-\eta, 1 + \eta)$ with
\[ g'(t) = \mathbf{u} \cdot \bigl(D\mathbf{f}((1 - t)\mathbf{a} + t\mathbf{b})(\mathbf{b} - \mathbf{a})\bigr). \]
You should give two versions of the proof, the first using the chain rule (Lemma 6.2.10) and the second using direct calculation.

If we have already gone to the trouble of proving the one-dimensional mean value inequality it seems sensible to make use of it in proving the multidimensional version. However, we could have proved the multidimensional theorem directly without making a one-dimensional detour.

Exercise 6.3.3. (i) Reread the proof of Theorem 1.7.1.

(ii) We now start the direct proof of Theorem 6.3.1. As before observe that we can find an $\eta > 0$ such that
\[ \{(1 - t)\mathbf{a} + t\mathbf{b} : -\eta \le t \le 1 + \eta\} \subseteq U, \]
but now consider $\mathbf{F} : (-\eta, 1 + \eta) \to \mathbb{R}^p$ given by
\[ \mathbf{F}(t) = \mathbf{f}((1 - t)\mathbf{a} + t\mathbf{b}) - \mathbf{f}(\mathbf{a}). \]
Explain why the theorem will follow if we can show that, given any $\epsilon > 0$, we have
\[ \|\mathbf{F}(1) - \mathbf{F}(0)\| \le K\|\mathbf{a} - \mathbf{b}\| + \epsilon. \]

(iii) Suppose, if possible, that there exists an $\epsilon > 0$ such that
\[ \|\mathbf{F}(1) - \mathbf{F}(0)\| \ge K\|\mathbf{a} - \mathbf{b}\| + \epsilon. \]
Show by a lion hunting argument that there exist a $c \in [0, 1]$ and $u_n, v_n \in [0, 1]$ with $u_n < v_n$ such that $u_n, v_n \to c$ and
\[ \|\mathbf{F}(v_n) - \mathbf{F}(u_n)\| \ge (K\|\mathbf{a} - \mathbf{b}\| + \epsilon)(v_n - u_n). \]

(iv) Show from the definition of differentiability that there exists a $\delta > 0$ such that
\[ \|\mathbf{F}(t) - \mathbf{F}(c)\| < (K\|\mathbf{a} - \mathbf{b}\| + \epsilon/2)|t - c| \]
whenever $|t - c| < \delta$ and $t \in [0, 1]$.

(v) Prove Theorem 6.3.1 by reductio ad absurdum.

One of the principal uses we made of the one dimensional mean value theorem was to show that a function on an open interval with zero derivative was necessarily constant. The reader should do both parts of the following easy exercise and reflect on them.

Exercise 6.3.4. (i) Let $U$ be an open set in $\mathbb{R}^m$ such that given any $\mathbf{a}, \mathbf{b} \in U$ we can find a finite sequence of points $\mathbf{a} = \mathbf{a}_0, \mathbf{a}_1, \dots, \mathbf{a}_{k-1}, \mathbf{a}_k = \mathbf{b}$ such that each line segment
\[ \{(1 - t)\mathbf{a}_{j-1} + t\mathbf{a}_j : 0 \le t \le 1\} \subseteq U \]
[$1 \le j \le k$]. Show that, if $\mathbf{f} : U \to \mathbb{R}^p$ is everywhere differentiable on $U$ with $D\mathbf{f}(\mathbf{x}) = 0$, it follows that $\mathbf{f}$ is constant.

(ii) We work in $\mathbb{R}^2$. Let $U_1$ be the open disc of radius 1 centre $(-2, 0)$ and $U_2$ be the open disc of radius 1 centre $(2, 0)$. Set $U = U_1 \cup U_2$. Define $f : U \to \mathbb{R}$ by $f(\mathbf{x}) = -1$ for $\mathbf{x} \in U_1$, $f(\mathbf{x}) = 1$ for $\mathbf{x} \in U_2$. Show that $f$ is everywhere differentiable on $U$ with $D(f)(\mathbf{x}) = 0$ but $f$ is not constant.

The reader may ask if we can obtain an improvement to our mean value inequality by some sort of equality along the lines of Theorem 4.4.1. The answer is a clear no.

Exercise 6.3.5. Let $\mathbf{f} : \mathbb{R} \to \mathbb{R}^2$ be given by $\mathbf{f}(t) = (\cos t, \sin t)^T$. Compute the Jacobian matrix of partial derivatives for $\mathbf{f}$ and show that $\mathbf{f}(0) = \mathbf{f}(2\pi)$ but $D\mathbf{f}(t) \ne 0$ for all $t$.

(Although Exercise K.102 is not a counter example it points out another problem which occurs when we work in many dimensions.)

It is fairly obvious that we cannot replace the line segment $L$ in Theorem 6.3.1 by other curves without changing the conclusion.

Exercise 6.3.6. Let
\[ U = \{\mathbf{x} \in \mathbb{R}^2 : \|\mathbf{x}\| > 1\} \setminus \{(x, 0)^T : x \le 0\}. \]
If we take $\theta(\mathbf{x})$ to be the unique solution of
\[ \cos(\theta(\mathbf{x})) = \frac{x}{(x^2 + y^2)^{1/2}}, \quad \sin(\theta(\mathbf{x})) = \frac{y}{(x^2 + y^2)^{1/2}}, \quad -\pi < \theta(\mathbf{x}) < \pi \]
for $\mathbf{x} = (x, y)^T \in U$, show that $\theta : U \to \mathbb{R}$ is everywhere differentiable with $\|D\theta(\mathbf{x})\| < 1$. (The amount of work involved in proving this depends quite strongly on how clever you are in exploiting radial symmetry.) Show, however, that if $\mathbf{a} = (-1, 10^{-1})^T$, $\mathbf{b} = (-1, -10^{-1})^T$, then
\[ |\theta(\mathbf{a}) - \theta(\mathbf{b})| > \|\mathbf{a} - \mathbf{b}\|. \]

It is clear (though we shall not prove it, and, indeed, cannot yet state it without using concepts which we have not formally defined) that the correct generalisation when $L$ is not a straight line will run as follows. ‘If $L$ is a well behaved path lying entirely within $U$ and $\|D\mathbf{f}(\mathbf{x})\| \le K$ for all $\mathbf{x} \in L$ then $\|\mathbf{f}(\mathbf{a}) - \mathbf{f}(\mathbf{b})\| \le K \times \text{length}\ L$’.

Chapter 7

Local Taylor theorems

7.1 Some one dimensional Taylor theorems

By definition, a function $f : \mathbb{R} \to \mathbb{R}$ which is continuous at 0 looks like a constant function near 0, in the sense that
\[ f(t) = f(0) + \epsilon(t) \]
where $\epsilon(t) \to 0$ as $t \to 0$. By definition, again, a function $f : \mathbb{R} \to \mathbb{R}$ which is differentiable at 0 looks like a linear function near 0, in the sense that
\[ f(t) = f(0) + f'(0)t + \epsilon(t)|t| \]
where $\epsilon(t) \to 0$ as $t \to 0$. The next exercise establishes the non-trivial theorem that a function $f : \mathbb{R} \to \mathbb{R}$, which is $n$ times differentiable in a neighbourhood of 0 and has $f^{(n)}$ continuous at 0, looks like a polynomial of degree $n$ near 0, in the sense that
\[ f(t) = f(0) + f'(0)t + \frac{f''(0)}{2!}t^2 + \dots + \frac{f^{(n)}(0)}{n!}t^n + \epsilon(t)|t|^n \]
where $\epsilon(t) \to 0$ as $t \to 0$.

This exercise introduces several ideas which we use repeatedly in this chapter so the reader should do it carefully.

Exercise 7.1.1. In this exercise we consider functions $f, g : (-a, a) \to \mathbb{R}$ where $a > 0$.

(i) If $f$ and $g$ are differentiable with $f'(t) \le g'(t)$ for all $0 \le t < a$ and $f(0) = g(0)$, explain why $f(t) \le g(t)$ for all $0 \le t < a$.

(ii) If $|f'(t)| \le |t|^r$ for all $t \in (-a, a)$ and $f(0) = 0$, show that $|f(t)| \le |t|^{r+1}/(r + 1)$ for all $|t| < a$.

(iii) If $g$ is $n$ times differentiable with $|g^{(n)}(t)| \le M$ for all $t \in (-a, a)$ and $g(0) = g'(0) = \dots = g^{(n-1)}(0) = 0$, show that
\[ |g(t)| \le \frac{M|t|^n}{n!} \]
for all $|t| < a$.

(iv) If $g$ is $n$ times differentiable in $(-a, a)$ and $g(0) = g'(0) = \dots = g^{(n)}(0) = 0$, show, using (iii), that, if $g^{(n)}$ is continuous at 0, then
\[ |g(t)| \le \frac{\eta(t)|t|^n}{n!} \]
where $\eta(t) \to 0$ as $t \to 0$.

(v) If $f$ is $n$ times differentiable with $|f^{(n)}(t)| \le M$ for all $t \in (-a, a)$, show that
\[ \Bigl| f(t) - \sum_{j=0}^{n-1} \frac{f^{(j)}(0)}{j!} t^j \Bigr| \le \frac{M|t|^n}{n!} \]
for all $|t| < a$.

(vi) If $f$ is $n$ times differentiable in $(-a, a)$, show that, if $f^{(n)}$ is continuous at 0, then
\[ \Bigl| f(t) - \sum_{j=0}^{n} \frac{f^{(j)}(0)}{j!} t^j \Bigr| \le \frac{\eta(t)|t|^n}{n!} \]
where $\eta(t) \to 0$ as $t \to 0$.

Restating parts (v) and (vi) of Exercise 7.1.1 we get two similar looking but distinct theorems.

Theorem 7.1.2. (A global Taylor’s theorem.) If $f : (-a, a) \to \mathbb{R}$ is $n$ times differentiable with $|f^{(n)}(t)| \le M$ for all $t \in (-a, a)$, then
\[ \Bigl| f(t) - \sum_{j=0}^{n-1} \frac{f^{(j)}(0)}{j!} t^j \Bigr| \le \frac{M|t|^n}{n!}. \]

Theorem 7.1.3. (The local Taylor’s theorem.) If $f : (-a, a) \to \mathbb{R}$ is $n$ times differentiable and $f^{(n)}$ is continuous at 0, then
\[ f(t) = \sum_{j=0}^{n} \frac{f^{(j)}(0)}{j!} t^j + \epsilon(t)|t|^n \]
where $\epsilon(t) \to 0$ as $t \to 0$.

We shall obtain other and more precise global Taylor theorems in the course of the book (see Exercise K.49 and Theorem 8.3.20) but Theorem 7.1.2 is strong enough for the following typical applications.

Exercise 7.1.4. (i) Consider a differentiable function $e : \mathbb{R} \to \mathbb{R}$ which obeys the differential equation $e'(t) = e(t)$ with the initial condition $e(0) = 1$. Quote a general theorem which tells you that, if $a > 0$, there exists an $M$ with $|e(t)| \le M$ for $|t| \le a$. Show that
\[ \Bigl| e(t) - \sum_{j=0}^{n-1} \frac{t^j}{j!} \Bigr| \le \frac{M|t|^n}{n!} \]
for all $|t| < a$. Deduce that
\[ \sum_{j=0}^{n-1} \frac{t^j}{j!} \to e(t) \]
as $n \to \infty$, and so
\[ e(t) = \sum_{j=0}^{\infty} \frac{t^j}{j!} \]
for all $t$.

(ii) Consider differentiable functions $s, c : \mathbb{R} \to \mathbb{R}$ which obey the differential equations $s'(t) = c(t)$, $c'(t) = -s(t)$ with the initial conditions $s(0) = 0$, $c(0) = 1$. Show that
\[ s(t) = \sum_{j=0}^{\infty} \frac{(-1)^j t^{2j+1}}{(2j + 1)!} \]
for all $t$ and obtain a similar result for $c$.
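The estimate in part (i) can be watched at work numerically. The following sketch is our own illustration and rests on two assumptions: that the function $e$ of the exercise is the usual exponential (here Python's `math.exp`), and that we may take $M = \exp(a)$ as the bound on $|e(t)|$ for $|t| \le a$. It compares the actual error of the partial sums with the bound $M|t|^n/n!$.

```python
import math

def taylor_partial_sum(t, n):
    """Sum of t**j / j! for j = 0, ..., n - 1."""
    total, term = 0.0, 1.0
    for j in range(n):
        total += term
        term *= t / (j + 1)     # turns t**j / j! into t**(j+1) / (j+1)!
    return total

a = 2.0
M = math.exp(a)     # assumed bound for |e(t)| on |t| <= a
t = 1.5             # a sample point with |t| < a

rows = []
for n in range(1, 16):
    error = abs(math.exp(t) - taylor_partial_sum(t, n))
    bound = M * abs(t) ** n / math.factorial(n)
    rows.append((n, error, bound))

for n, error, bound in rows:
    print(n, error, bound)
```

Both columns shrink rapidly because $n!$ eventually dominates $|t|^n$, which is the observation that turns the global Taylor estimate into the power series for $e$.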

However, in this chapter we are interested in the local behaviour of functions and therefore in the local Taylor theorem. The distinction between local and global Taylor expansion is made in the following very important example of Cauchy.

Example 7.1.5. Consider the function $F : \mathbb{R} \to \mathbb{R}$ defined by
\[
\begin{aligned}
F(0) &= 0, \\
F(x) &= \exp(-1/x^2) \quad\text{otherwise.}
\end{aligned}
\]

(i) Prove by induction, using the standard rules of differentiation, that $F$ is infinitely differentiable at all points $x \ne 0$ and that, at these points,
\[ F^{(n)}(x) = P_n(1/x)\exp(-1/x^2) \]
where $P_n$ is a polynomial which need not be found explicitly.

(ii) Explain why $x^{-1}P_n(1/x)\exp(-1/x^2) \to 0$ as $x \to 0$.

(iii) Show by induction, using the definition of differentiation, that $F$ is infinitely differentiable at 0 with $F^{(n)}(0) = 0$ for all $n$. [Be careful to get this part of the argument right.]

(iv) Show that
\[ F(x) = \sum_{j=0}^{\infty} \frac{F^{(j)}(0)}{j!} x^j \]
if and only if $x = 0$. (The reader may prefer to say that ‘The Taylor expansion of $F$ is only valid at 0’.)

(v) Why does part (iv) not contradict the local Taylor theorem (Theorem 7.1.3)?

[We give a different counterexample making use of uniform convergence in Exercise K.226.]
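A few numbers show how flat $F$ is at the origin. The sketch below is our own illustration, not part of the text. It evaluates $F(x)/x^n$ for the sample exponent $n = 10$ (an arbitrary choice) at a few small values of $x$, which is consistent with the claim that $F$ vanishes at 0 faster than any power of $x$ while remaining strictly positive away from 0.

```python
import math

def F(x):
    """Cauchy's function: 0 at the origin, exp(-1/x^2) elsewhere."""
    return 0.0 if x == 0 else math.exp(-1.0 / (x * x))

n = 10                              # sample exponent, chosen for illustration
xs = [0.5, 0.2, 0.1, 0.05]
ratios = [F(x) / x ** n for x in xs]

for x, r in zip(xs, ratios):
    print(x, F(x), r)
```

Every Taylor polynomial of $F$ about 0 is identically zero, yet $F(x) > 0$ for each $x \ne 0$ in the table, so the error term of the local Taylor theorem is the whole of $F$.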

Example 7.1.6. Show that, if we define $E : \mathbb{R} \to \mathbb{R}$ by
\[
\begin{aligned}
E(x) &= 0 \quad\text{if } x \le 0, \\
E(x) &= \exp(-1/x^2) \quad\text{otherwise,}
\end{aligned}
\]
then $E$ is an infinitely differentiable function with $E(x) = 0$ for $x \le 0$ and $E(x) > 0$ for $x > 0$.

Cauchy gave his example to show that we cannot develop the calculus algebraically but must use $\epsilon$, $\delta$ techniques. In later courses the reader will see that his example encapsulates a key difference between real and complex analysis. If the reader perseveres further with mathematics she will also find the function $E$ playing a useful rôle in distribution theory and differential geometry.

A simple example of the use of the local Taylor theorem is given by the proof of (a version of) L’Hôpital’s rule in the next exercise.

Exercise 7.1.7. If $f, g : (-a, a) \to \mathbb{R}$ are $n$ times differentiable and
\[ f(0) = f'(0) = \dots = f^{(n-1)}(0) = g(0) = g'(0) = \dots = g^{(n-1)}(0) = 0 \]
but $g^{(n)}(0) \ne 0$ then, if $f^{(n)}$ and $g^{(n)}$ are continuous at 0, it follows that
\[ \frac{f(t)}{g(t)} \to \frac{f^{(n)}(0)}{g^{(n)}(0)} \]
as $t \to 0$.
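A concrete instance of this version of L’Hôpital’s rule may be helpful. The sketch below is our own illustration with hypothetical choices $f(t) = 1 - \cos t$ and $g(t) = t^2$, so that $n = 2$, $f(0) = f'(0) = g(0) = g'(0) = 0$, $f''(0) = 1$ and $g''(0) = 2$. The exercise then predicts that $f(t)/g(t)$ tends to $1/2$.

```python
import math

def f(t):
    """Hypothetical numerator: f(0) = f'(0) = 0 and f''(0) = 1."""
    return 1.0 - math.cos(t)

def g(t):
    """Hypothetical denominator: g(0) = g'(0) = 0 and g''(0) = 2."""
    return t * t

limit = 1.0 / 2.0                       # f''(0) / g''(0)
ts = [0.1, 0.01, 0.001]
gaps = [abs(f(t) / g(t) - limit) for t in ts]

for t, gap in zip(ts, gaps):
    print(t, f(t) / g(t), gap)
```

The gap shrinks roughly like $t^2$ here. Much smaller values of $t$ are best avoided in floating point, since the subtraction in $1 - \cos t$ loses accuracy.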

It should be pointed out that the local Taylor theorems of this chapter (and the global ones proved elsewhere) are deep results which depend on the fundamental axiom. The fact that we use mean value theorems to prove them is thus not surprising: we must use the fundamental axiom or results derived from it in the proof.

(Most of my readers will be prepared to accept my word for the statements made in the previous paragraph. Those who are not will need to work through the next exercise. The others may skip it.)

Exercise 7.1.8. Explain why we can find a sequence of irrational numbers $a_n$ such that $4^{-n-1} < a_n < 4^{-n}$. We write $I_0 = \{x \in \mathbb{Q} : x > a_0\}$ and
\[ I_n = \{x \in \mathbb{Q} : a_n < x < a_{n-1}\} \]
[$n = 1, 2, 3, \dots$]. Check that, if $x \in I_n$, then $4^{-n-1} < x < 4^{-n+1}$ [$n \ge 1$].

We define $f : \mathbb{Q} \to \mathbb{Q}$ by $f(0) = 0$ and $f(x) = 8^{-n}$ if $|x| \in I_n$ [$n \ge 0$]. In what follows we work in $\mathbb{Q}$.

(i) Show that
\[ \frac{f(h) - f(0)}{h} \to 0 \]
as $h \to 0$. Conclude that $f$ is differentiable at 0 with $f'(0) = 0$.

(ii) Explain why $f$ is everywhere differentiable with $f'(x) = 0$ for all $x$. Conclude that $f$ is infinitely differentiable with $f^{(r)} = 0$ for all $r \ge 1$.

(iii) Show that
\[ \frac{f(h) - f(0)}{h^2} \to \infty \]
as $h \to 0$. Conclude that, if we write
\[ f(h) = f(0) + f'(0)h + \frac{f''(0)}{2!}h^2 + \epsilon(h)h^2, \]
then $\epsilon(h) \nrightarrow 0$ as $h \to 0$. Thus the local Taylor theorem (Theorem 7.1.3) is false for $\mathbb{Q}$.

7.2 Some many dimensional local Taylor theorems

In the previous section we used mean value inequalities to investigate the local behaviour of well behaved functions $f : \mathbb{R} \to \mathbb{R}$. We now use the same ideas to investigate the local behaviour of well behaved functions $f : \mathbb{R}^n \to \mathbb{R}$. It turns out that, once we understand what happens when $n = 2$, it is easy to extend the results to general $n$ and this will be left to the reader.

Here is our first example.

Lemma 7.2.1. We work in $\mathbb{R}^2$ and write $\mathbf{0} = (0, 0)$.

(i) Suppose $\delta > 0$, and that $f : B(\mathbf{0}, \delta) \to \mathbb{R}$ has partial derivatives $f_{,1}$ and $f_{,2}$ with $|f_{,1}(x, y)|, |f_{,2}(x, y)| \le M$ for all $(x, y) \in B(\mathbf{0}, \delta)$. If $f(0, 0) = 0$, then
\[ |f(x, y)| \le 2M(x^2 + y^2)^{1/2} \]
for all $(x, y) \in B(\mathbf{0}, \delta)$.

(ii) Suppose $\delta > 0$, and that $g : B(\mathbf{0}, \delta) \to \mathbb{R}$ has partial derivatives $g_{,1}$ and $g_{,2}$ in $B(\mathbf{0}, \delta)$. Suppose that $g_{,1}$ and $g_{,2}$ are continuous at $(0, 0)$ and $g(0, 0) = g_{,1}(0, 0) = g_{,2}(0, 0) = 0$. Then writing
\[ g((h, k)) = \epsilon(h, k)(h^2 + k^2)^{1/2} \]
we have $\epsilon(h, k) \to 0$ as $(h^2 + k^2)^{1/2} \to 0$.

Proof. (i) Observe that the one dimensional mean value inequality applied to the function $t \mapsto f(x, t)$ gives
\[ |f(x, y) - f(x, 0)| \le M|y| \]
whenever $(x, y) \in B(\mathbf{0}, \delta)$ and the same inequality applied to the function $s \mapsto f(s, 0)$ gives
\[ |f(x, 0) - f(0, 0)| \le M|x| \]
whenever $(x, 0) \in B(\mathbf{0}, \delta)$. We now apply a taxicab argument (the idea behind the name is that a New York taxicab which wishes to get from $(0, 0)$ to $(x, y)$ will be forced by the grid pattern of streets to go from $(0, 0)$ to $(x, 0)$ and thence to $(x, y)$) to obtain
\[
\begin{aligned}
|f(x, y)| &= |f(x, y) - f(0, 0)| = |(f(x, y) - f(x, 0)) + (f(x, 0) - f(0, 0))| \\
&\le |f(x, y) - f(x, 0)| + |f(x, 0) - f(0, 0)| \le M|y| + M|x| \\
&\le 2M(x^2 + y^2)^{1/2}
\end{aligned}
\]
for all $(x, y) \in B(\mathbf{0}, \delta)$.

(ii) Let $\epsilon > 0$ be given. By the definition of continuity, we can find a $\delta_1(\epsilon)$ such that $\delta > \delta_1(\epsilon) > 0$ and
\[ |g_{,1}(x, y)|, |g_{,2}(x, y)| \le \epsilon/2 \]
for all $(x, y) \in B(\mathbf{0}, \delta_1(\epsilon))$. By part (i), this means that
\[ |g(x, y)| \le \epsilon(x^2 + y^2)^{1/2} \]
for all $(x, y) \in B(\mathbf{0}, \delta_1(\epsilon))$ and this gives the desired result.

Theorem 7.2.2. (Continuity of partial derivatives implies differentiability.) Suppose $\delta > 0$, $\mathbf{x} = (x, y) \in \mathbb{R}^2$, $B(\mathbf{x}, \delta) \subseteq E \subseteq \mathbb{R}^2$ and that $f : E \to \mathbb{R}$. If the partial derivatives $f_{,1}$ and $f_{,2}$ exist in $B(\mathbf{x}, \delta)$ and are continuous at $\mathbf{x}$, then, writing
\[ f(x + h, y + k) = f(x, y) + f_{,1}(x, y)h + f_{,2}(x, y)k + \epsilon(h, k)(h^2 + k^2)^{1/2}, \]
we have $\epsilon(h, k) \to 0$ as $(h^2 + k^2)^{1/2} \to 0$. (In other words, $f$ is differentiable at $\mathbf{x}$.)

Proof. By translation, we may suppose that $\mathbf{x} = \mathbf{0}$. Now set
\[ g(x, y) = f(x, y) - f(0, 0) - f_{,1}(0, 0)x - f_{,2}(0, 0)y. \]
We see that $g$ satisfies the hypotheses of part (ii) of Lemma 7.2.1. Thus $g$ satisfies the conclusions of part (ii) of Lemma 7.2.1 and our theorem follows.

Although this is not one of the great theorems of all time, it occasionally provides a useful short cut for proving functions differentiable$^1$. The following easy extensions are left to the reader.

$^1$ I emphasise the word occasionally. Usually, results like the fact that the differentiable function of a differentiable function is differentiable give a faster and more satisfactory proof.

Theorem 7.2.3. (i) Suppose $\delta > 0$, $\mathbf{x} \in \mathbb{R}^m$, $B(\mathbf{x}, \delta) \subseteq E \subseteq \mathbb{R}^m$ and that $f : E \to \mathbb{R}$. If the partial derivatives $f_{,1}, f_{,2}, \dots, f_{,m}$ exist in $B(\mathbf{x}, \delta)$ and are continuous at $\mathbf{x}$, then $f$ is differentiable at $\mathbf{x}$.

(ii) Suppose $\delta > 0$, $\mathbf{x} \in \mathbb{R}^m$, $B(\mathbf{x}, \delta) \subseteq E \subseteq \mathbb{R}^m$ and that $\mathbf{f} : E \to \mathbb{R}^p$. If the partial derivatives $f_{i,j}$ exist in $B(\mathbf{x}, \delta)$ and are continuous at $\mathbf{x}$ [$1 \le i \le p$, $1 \le j \le m$], then $\mathbf{f}$ is differentiable at $\mathbf{x}$.

Similar ideas to those used in the proof of Theorem 7.2.2 give our next result which we shall therefore prove more expeditiously. We write
\[ f_{,ij}(\mathbf{x}) = (f_{,j})_{,i}(\mathbf{x}), \]
or, in more familiar notation,
\[ f_{,ij} = \frac{\partial^2 f}{\partial x_i\,\partial x_j}. \]

Theorem 7.2.4. (Second order Taylor series.) Suppose $\delta > 0$, $\mathbf{x} = (x, y) \in \mathbb{R}^2$, $B(\mathbf{x}, \delta) \subseteq E \subseteq \mathbb{R}^2$ and that $f : E \to \mathbb{R}$. If the partial derivatives $f_{,1}$, $f_{,2}$, $f_{,11}$, $f_{,12}$, $f_{,22}$ exist in $B(\mathbf{x}, \delta)$ and $f_{,11}$, $f_{,12}$, $f_{,22}$ are continuous at $\mathbf{x}$, then writing
\[
\begin{aligned}
f((x + h, y + k)) = {} & f(x, y) + f_{,1}(x, y)h + f_{,2}(x, y)k \\
& + (f_{,11}(x, y)h^2 + 2f_{,12}(x, y)hk + f_{,22}(x, y)k^2)/2 + \epsilon(h, k)(h^2 + k^2),
\end{aligned}
\]
we have $\epsilon(h, k) \to 0$ as $(h^2 + k^2)^{1/2} \to 0$.

Proof. By translation, we may suppose that $\mathbf{x} = \mathbf{0}$. By considering
\[ f(h, k) - f(0, 0) - f_{,1}(0, 0)h - f_{,2}(0, 0)k - (f_{,11}(0, 0)h^2 + 2f_{,12}(0, 0)hk + f_{,22}(0, 0)k^2)/2, \]
we may suppose that
\[ f(0, 0) = f_{,1}(0, 0) = f_{,2}(0, 0) = f_{,11}(0, 0) = f_{,12}(0, 0) = f_{,22}(0, 0) = 0. \]
If we do this, our task reduces to showing that
\[ \frac{f(h, k)}{h^2 + k^2} \to 0 \]
as $(h^2 + k^2)^{1/2} \to 0$.

To this end, observe that, if $\epsilon > 0$, the continuity of the given partial derivatives at $(0, 0)$ tells us that we can find a $\delta_1(\epsilon)$ such that $\delta > \delta_1(\epsilon) > 0$ and
\[ |f_{,11}(h, k)|, |f_{,12}(h, k)|, |f_{,22}(h, k)| \le \epsilon \]
for all $(h, k) \in B(\mathbf{0}, \delta_1(\epsilon))$. Using the mean value inequality in the manner of Lemma 7.2.1, we have
\[ |f_{,1}(h, k) - f_{,1}(h, 0)| \le \epsilon|k| \]
and
\[ |f_{,1}(h, 0) - f_{,1}(0, 0)| \le \epsilon|h| \]
and a taxicab argument gives
\[
\begin{aligned}
|f_{,1}(h, k)| &= |f_{,1}(h, k) - f_{,1}(0, 0)| = |(f_{,1}(h, k) - f_{,1}(h, 0)) + (f_{,1}(h, 0) - f_{,1}(0, 0))| \\
&\le |f_{,1}(h, k) - f_{,1}(h, 0)| + |f_{,1}(h, 0) - f_{,1}(0, 0)| \le \epsilon(|k| + |h|)
\end{aligned}
\]
for all $(h, k) \in B(\mathbf{0}, \delta_1(\epsilon))$. (Or we could have just applied Lemma 7.2.1 with $f$ replaced by $f_{,1}$.) The mean value inequality also gives
\[ |f_{,2}(0, k)| = |f_{,2}(0, k) - f_{,2}(0, 0)| \le \epsilon|k|. \]

Now, applying the taxicab argument again, using the mean value inequality and the estimates of the first paragraph, we get
\[
\begin{aligned}
|f(h, k)| &= |f(h, k) - f(0, 0)| = |(f(h, k) - f(0, k)) + (f(0, k) - f(0, 0))| \\
&\le |f(h, k) - f(0, k)| + |f(0, k) - f(0, 0)| \\
&\le \sup_{0 \le s \le 1} |f_{,1}(sh, k)||h| + \sup_{0 \le t \le 1} |f_{,2}(0, tk)||k| \\
&\le \epsilon(|k| + |h|)|h| + \epsilon|k|^2 \\
&\le 3\epsilon(h^2 + k^2).
\end{aligned}
\]
Since $\epsilon$ was arbitrary, the result follows.
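The conclusion of Theorem 7.2.4 can be observed numerically. The sketch below is our own illustration with a hypothetical function $f(x, y) = e^x \sin y$ and base point $(0.3, 0.4)$, both chosen arbitrarily. The partial derivatives are written down by hand, and the error term $\epsilon(h, k)$ is computed along the direction $(1, -0.5)$ for shrinking step sizes.

```python
import math

def f(x, y):
    """Hypothetical smooth example function."""
    return math.exp(x) * math.sin(y)

x0, y0 = 0.3, 0.4                       # arbitrary base point
ex, s, c = math.exp(x0), math.sin(y0), math.cos(y0)
f1, f2 = ex * s, ex * c                 # first partial derivatives at (x0, y0)
f11, f12, f22 = ex * s, ex * c, -ex * s # second partial derivatives at (x0, y0)

def epsilon(h, k):
    """Error term of the second order expansion, divided by h^2 + k^2."""
    quadratic = (f(x0, y0) + f1 * h + f2 * k
                 + (f11 * h * h + 2 * f12 * h * k + f22 * k * k) / 2)
    return (f(x0 + h, y0 + k) - quadratic) / (h * h + k * k)

scales = [0.1, 0.01, 0.001]
eps_values = [epsilon(sc, -0.5 * sc) for sc in scales]

for sc, e in zip(scales, eps_values):
    print(sc, e)
```

For a function this smooth the error term shrinks roughly in proportion to the step size, which is stronger than the theorem requires; the theorem itself only asserts that it tends to zero.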

Exercise 7.2.5. Set out the proof of Theorem 7.2.4 in the style of the proof of Theorem 7.2.2.

We have the following important corollary.

Theorem 7.2.6. (Symmetry of the second partial derivatives.) Suppose $\delta > 0$, $\mathbf{x} = (x, y) \in \mathbb{R}^2$, $B(\mathbf{x}, \delta) \subseteq E \subseteq \mathbb{R}^2$ and that $f : E \to \mathbb{R}$. If the partial derivatives $f_{,1}$, $f_{,2}$, $f_{,11}$, $f_{,12}$, $f_{,21}$, $f_{,22}$ exist in $B(\mathbf{x}, \delta)$ and are continuous at $\mathbf{x}$, then $f_{,12}(\mathbf{x}) = f_{,21}(\mathbf{x})$.

Proof. By Theorem 7.2.4, we have
\[
\begin{aligned}
f(x + h, y + k) = {} & f(x, y) + f_{,1}(x, y)h + f_{,2}(x, y)k \\
& + (f_{,11}(x, y)h^2 + 2f_{,12}(x, y)hk + f_{,22}(x, y)k^2)/2 + \epsilon_1(h, k)(h^2 + k^2)
\end{aligned}
\]
with $\epsilon_1(h, k) \to 0$ as $(h^2 + k^2)^{1/2} \to 0$. But, interchanging the rôle of first and second variable, Theorem 7.2.4 also tells us that
\[
\begin{aligned}
f(x + h, y + k) = {} & f(x, y) + f_{,1}(x, y)h + f_{,2}(x, y)k \\
& + (f_{,11}(x, y)h^2 + 2f_{,21}(x, y)hk + f_{,22}(x, y)k^2)/2 + \epsilon_2(h, k)(h^2 + k^2)
\end{aligned}
\]
with $\epsilon_2(h, k) \to 0$ as $(h^2 + k^2)^{1/2} \to 0$.

Comparing the two Taylor expansions for $f(x + h, y + k)$, we see that
\[ f_{,12}(x, y)hk - f_{,21}(x, y)hk = (\epsilon_2(h, k) - \epsilon_1(h, k))(h^2 + k^2) = \epsilon_3(h, k)(h^2 + k^2) \]
with $\epsilon_3(h, k) \to 0$ as $(h^2 + k^2)^{1/2} \to 0$. Taking $h = k$ and dividing by $h^2$ we have
\[ f_{,12}(x, y) - f_{,21}(x, y) = 2\epsilon_3(h, h) \to 0 \]
as $h \to 0$, so $f_{,12}(x, y) - f_{,21}(x, y) = 0$ as required.

It is possible to produce plausible arguments for the symmetry of second partial derivatives. Here are a couple.

(1) If $f$ is a multinomial, i.e. $f(x, y) = \sum_{p=0}^{P}\sum_{q=0}^{Q} a_{p,q}x^p y^q$, then $f_{,12} = f_{,21}$. But smooth functions are very close to being polynomial, so we would expect the result to be true in general.

(2) Although we cannot interchange limits in general, it is plausible, that if $f$ is well behaved, then
\[
\begin{aligned}
f_{,12}(x, y) &= \lim_{h \to 0}\lim_{k \to 0} h^{-1}k^{-1}\bigl(f(x + h, y + k) - f(x + h, y) - f(x, y + k) + f(x, y)\bigr) \\
&= \lim_{k \to 0}\lim_{h \to 0} h^{-1}k^{-1}\bigl(f(x + h, y + k) - f(x + h, y) - f(x, y + k) + f(x, y)\bigr) \\
&= f_{,21}(x, y).
\end{aligned}
\]
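The double difference quotient in argument (2) is easy to experiment with. The sketch below is our own illustration with a hypothetical smooth function $f(x, y) = x^3 y^2 + \sin(xy)$, whose mixed partial derivative is computed by hand. Taking one step much smaller than the other imitates, crudely, the two orders of taking limits; it is a numerical experiment and not a proof.

```python
import math

def f(x, y):
    """Hypothetical smooth example function."""
    return x ** 3 * y ** 2 + math.sin(x * y)

def mixed_exact(x, y):
    """The mixed second partial derivative of f, computed by hand."""
    return 6 * x * x * y + math.cos(x * y) - x * y * math.sin(x * y)

def double_difference(x, y, h, k):
    """The quotient (f(x+h,y+k) - f(x+h,y) - f(x,y+k) + f(x,y)) / (h k)."""
    return (f(x + h, y + k) - f(x + h, y) - f(x, y + k) + f(x, y)) / (h * k)

x0, y0 = 1.0, 2.0
exact = mixed_exact(x0, y0)
k_first = double_difference(x0, y0, h=1e-4, k=1e-6)   # k much smaller than h
h_first = double_difference(x0, y0, h=1e-6, k=1e-4)   # h much smaller than k

print(exact, k_first, h_first)
```

For this well behaved function both quotients land close to the same value. The step sizes are deliberately not too small, since the four-term difference in the numerator suffers from cancellation in floating point.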

However, these are merely plausible arguments. They do not make clear the rôle of the continuity of the second derivative (in Example 7.3.18 we shall see that the result may fail for discontinuous second partial derivatives). More fundamentally, they are algebraic arguments and, as the use of the mean value theorem indicates, the result is one of analysis. The same kind of argument which shows that the local Taylor theorem fails over $\mathbb{Q}$ (see Example 7.1.8) shows that it fails over $\mathbb{Q}^2$ and that the symmetry of partial derivatives fails with it (see [33]).

If we use the $D$ notation, Theorem 7.2.6 states that (under appropriate conditions)

\[
D_1 D_2 f = D_2 D_1 f.
\]

If we write $D_{ij} = D_i D_j$, as is often done, we get

\[
D_{12} f = D_{21} f.
\]

What happens if a function has higher partial derivatives? It is not hard

to guess and prove the appropriate theorem.


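Exercise 7.2.7. Suppose $\delta > 0$, $\mathbf{x} \in \mathbb{R}^m$, $B(\mathbf{x}, \delta) \subseteq E \subseteq \mathbb{R}^m$ and that $f : E \to \mathbb{R}$. Show that, if all the partial derivatives $f_{,j}$, $f_{,jk}$, $f_{,ijk}$, . . . up to the $n$th order exist in $B(\mathbf{x}, \delta)$ and are continuous at $\mathbf{x}$, then, writing

\[
\begin{aligned}
f(\mathbf{x} + \mathbf{h}) = {} & f(\mathbf{x}) + \sum_{j=1}^{m} f_{,j}(\mathbf{x})h_j + \frac{1}{2!} \sum_{j=1}^{m} \sum_{k=1}^{m} f_{,jk}(\mathbf{x})h_j h_k + \frac{1}{3!} \sum_{j=1}^{m} \sum_{k=1}^{m} \sum_{l=1}^{m} f_{,jkl}(\mathbf{x})h_j h_k h_l \\
& + \cdots + \text{sum up to $n$th powers} + \epsilon(\mathbf{h})\|\mathbf{h}\|^n,
\end{aligned}
\]

we have $\epsilon(\mathbf{h}) \to 0$ as $\|\mathbf{h}\| \to 0$.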

Notice that you do not have to prove results like

\[
f_{,jkl}(\mathbf{x}) = f_{,ljk}(\mathbf{x}) = f_{,klj}(\mathbf{x}) = f_{,lkj}(\mathbf{x}) = f_{,jlk}(\mathbf{x}) = f_{,kjl}(\mathbf{x})
\]

since they follow directly from Theorem 7.2.6.

Applying Exercise 7.2.7 to the components $f_i$ of a function $\mathbf{f}$, we obtain our full many dimensional Taylor theorem.

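Theorem 7.2.8 (The local Taylor’s theorem). Suppose $\delta > 0$, $\mathbf{x} \in \mathbb{R}^m$, $B(\mathbf{x}, \delta) \subseteq E \subseteq \mathbb{R}^m$ and that $\mathbf{f} : E \to \mathbb{R}^p$. If all the partial derivatives $f_{i,j}$, $f_{i,jk}$, $f_{i,jkl}$, . . . exist in $B(\mathbf{x}, \delta)$ and are continuous at $\mathbf{x}$, then, writing

\[
\begin{aligned}
f_i(\mathbf{x} + \mathbf{h}) = {} & f_i(\mathbf{x}) + \sum_{j=1}^{m} f_{i,j}(\mathbf{x})h_j + \frac{1}{2!} \sum_{j=1}^{m} \sum_{k=1}^{m} f_{i,jk}(\mathbf{x})h_j h_k \\
& + \frac{1}{3!} \sum_{j=1}^{m} \sum_{k=1}^{m} \sum_{l=1}^{m} f_{i,jkl}(\mathbf{x})h_j h_k h_l \\
& + \cdots + \text{sum up to $n$th powers} + \epsilon_i(\mathbf{h})\|\mathbf{h}\|^n,
\end{aligned}
\]

we have $\epsilon_i(\mathbf{h}) \to 0$ as $\|\mathbf{h}\| \to 0$.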

The reader will remark that Theorem 7.2.8 bristles with subscripts, contrary to our announced intention of seeking a geometric, coordinate free view. However, it is very easy to restate the main formula of Theorem 7.2.8 in a coordinate free way as

\[
\mathbf{f}(\mathbf{x} + \mathbf{h}) = \mathbf{f}(\mathbf{x}) + \alpha_1(\mathbf{h}) + \alpha_2(\mathbf{h}, \mathbf{h}) + \cdots + \alpha_n(\mathbf{h}, \mathbf{h}, \ldots, \mathbf{h}) + \boldsymbol{\epsilon}(\mathbf{h})\|\mathbf{h}\|^n,
\]

where $\alpha_k : \mathbb{R}^m \times \mathbb{R}^m \times \cdots \times \mathbb{R}^m \to \mathbb{R}^p$ is linear in each variable (i.e. a $k$-linear function) and symmetric (i.e. interchanging any two variables leaves the value of $\alpha_k$ unchanged).

Anyone who feels that the higher derivatives are best studied using coordinates should reflect that, if $\mathbf{f} : \mathbb{R}^3 \to \mathbb{R}^3$ is well behaved, then the ‘third derivative behaviour’ of $\mathbf{f}$ at a single point is apparently given by the $3 \times 3 \times 3 \times 3 = 81$ numbers $f_{i,jkl}(\mathbf{x})$. By symmetry (see Theorem 7.2.6) only 30 of the numbers are distinct but these 30 numbers are independent (consider polynomials in three variables for which the total degree of each term is 3). How can we understand the information carried by an array of 30 real numbers?
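The two counts quoted above can be checked by brute-force enumeration. The sketch below is only an illustration; the function names `count_all` and `count_distinct` are ours. It counts index tuples $(i, j, k, l)$ directly, and then counts them again after identifying any two tuples whose derivative indices $j, k, l$ differ only by a permutation.

```python
from itertools import combinations_with_replacement, product


def count_all(m, p, order):
    """Number of index tuples (i, j_1, ..., j_order), i in 1..p, each j in 1..m."""
    return sum(1 for _ in product(range(p), *[range(m)] * order))


def count_distinct(m, p, order):
    """The same count once permutations of the derivative indices are identified."""
    unordered = sum(1 for _ in combinations_with_replacement(range(m), order))
    return p * unordered


print(count_all(3, 3, 3))       # 81
print(count_distinct(3, 3, 3))  # 30
```

Each unordered choice of three indices from $\{1, 2, 3\}$ corresponds to one monomial of total degree 3 in three variables, which is the observation behind the independence remark.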

Exercise 7.2.9. (i) Verify the statements in the last paragraph. How large an array is required to give the ‘third derivative behaviour’ of a well behaved function $\mathbf{f} : \mathbb{R}^4 \to \mathbb{R}^4$ at a point? How large an array is required to give the ‘fourth derivative behaviour’ of a well behaved function $\mathbf{f} : \mathbb{R}^3 \to \mathbb{R}^3$ at a point?

(ii) (Ignore this if the notation is not familiar.) Consider a well behaved function $\mathbf{f} : \mathbb{R}^3 \to \mathbb{R}^3$. How large an array is required to give $\operatorname{curl} \mathbf{f} = \nabla \times \mathbf{f}$ and $\operatorname{div} \mathbf{f} = \nabla \cdot \mathbf{f}$? How large an array is required to give $D\mathbf{f}$?

In many circumstances $\operatorname{curl} \mathbf{f}$ and $\operatorname{div} \mathbf{f}$ give the physically interesting part of $D\mathbf{f}$ but physicists also use

\[
(\mathbf{a} \cdot \nabla)\mathbf{f} = \Bigl( \sum_{j=1}^{3} a_j f_{1,j},\ \sum_{j=1}^{3} a_j f_{2,j},\ \sum_{j=1}^{3} a_j f_{3,j} \Bigr).
\]

How large an array is required to give $(\mathbf{a} \cdot \nabla)\mathbf{f}$ for all $\mathbf{a} \in \mathbb{R}^3$?

In subjects like elasticity the description of nature requires the full Jacobian matrix $(f_{i,j})$ and the treatment of differentiation used is closer to that of the pure mathematician.

Most readers will be happy to finish this section here². However, some of them³ will observe that in our coordinate free statement of the local Taylor’s theorem the ‘second derivative behaviour’ is given by a bilinear map $\alpha_2 : \mathbb{R}^m \times \mathbb{R}^m \to \mathbb{R}^p$ and we defined derivatives in terms of linear maps.

Let us be more precise. We suppose $\mathbf{f}$ is a well behaved function on an open set $U \subseteq \mathbb{R}^m$ taking values in $\mathbb{R}^p$. If we write $\mathcal{L}(E, F)$ for the space of linear maps from a finite dimensional vector space $E$ to a vector space $F$ then, for each fixed $\mathbf{x} \in U$, we have $D\mathbf{f}(\mathbf{x}) \in \mathcal{L}(\mathbb{R}^m, \mathbb{R}^p)$. Thus, allowing $\mathbf{x}$ to vary freely, we see that we have a function

\[
D\mathbf{f} : U \to \mathcal{L}(\mathbb{R}^m, \mathbb{R}^p).
\]

² The rest of this section is marked with a ♥.

³ Boas notes that ‘There is a test for identifying some of the future professional mathematicians at an early age. These are students who instantly comprehend a sentence beginning “Let $X$ be an ordered quintuple $(a, T, \pi, \sigma, B)$ where . . . ”. They are even more promising if they add, “I never really understood it before.” ’ ([8] page 231.)


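We now observe that $\mathcal{L}(\mathbb{R}^m, \mathbb{R}^p)$ is a finite dimensional vector space over $\mathbb{R}$ of dimension $mp$, in other words, $\mathcal{L}(\mathbb{R}^m, \mathbb{R}^p)$ can be identified with $\mathbb{R}^{mp}$. We know how to define the derivative of a well behaved function $\mathbf{g} : U \to \mathbb{R}^{mp}$ at $\mathbf{x}$ as a function

\[
D\mathbf{g}(\mathbf{x}) \in \mathcal{L}(\mathbb{R}^m, \mathbb{R}^{mp})
\]

so we know how to define the derivative of $D\mathbf{f}$ at $\mathbf{x}$ as a function

\[
D(D\mathbf{f})(\mathbf{x}) \in \mathcal{L}(\mathbb{R}^m, \mathcal{L}(\mathbb{R}^m, \mathbb{R}^p)).
\]

We have thus shown how to define the second derivative $D^2\mathbf{f}(\mathbf{x}) = D(D\mathbf{f})(\mathbf{x})$.

But $D^2\mathbf{f}(\mathbf{x})$ lies in $\mathcal{L}(\mathbb{R}^m, \mathcal{L}(\mathbb{R}^m, \mathbb{R}^p))$ and $\alpha_2$ lies in the space $\mathcal{E}(\mathbb{R}^m, \mathbb{R}^m; \mathbb{R}^p)$ of bilinear maps from $\mathbb{R}^m \times \mathbb{R}^m$ to $\mathbb{R}^p$. How, the reader may ask, can we identify $\mathcal{L}(\mathbb{R}^m, \mathcal{L}(\mathbb{R}^m, \mathbb{R}^p))$ with $\mathcal{E}(\mathbb{R}^m, \mathbb{R}^m; \mathbb{R}^p)$? Fortunately this question answers itself with hardly any outside intervention.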

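Exercise 7.2.10. Let $E$, $F$ and $G$ be finite dimensional vector spaces over $\mathbb{R}$. We write $\mathcal{E}(E, F; G)$ for the space of bilinear maps $\alpha : E \times F \to G$. Define

\[
(\Theta(\alpha)(\mathbf{u}))(\mathbf{v}) = \alpha(\mathbf{u}, \mathbf{v})
\]

for all $\alpha \in \mathcal{E}(E, F; G)$, $\mathbf{u} \in E$ and $\mathbf{v} \in F$.

(i) Show that $\Theta(\alpha)(\mathbf{u}) \in \mathcal{L}(F, G)$.

(ii) Show that, if $\mathbf{v}$ is fixed,

\[
\bigl(\Theta(\alpha)(\lambda_1 \mathbf{u}_1 + \lambda_2 \mathbf{u}_2)\bigr)(\mathbf{v}) = \bigl(\lambda_1 \Theta(\alpha)(\mathbf{u}_1) + \lambda_2 \Theta(\alpha)(\mathbf{u}_2)\bigr)(\mathbf{v})
\]

and deduce that

\[
\Theta(\alpha)(\lambda_1 \mathbf{u}_1 + \lambda_2 \mathbf{u}_2) = \lambda_1 \Theta(\alpha)(\mathbf{u}_1) + \lambda_2 \Theta(\alpha)(\mathbf{u}_2)
\]

for all $\lambda_1, \lambda_2 \in \mathbb{R}$ and $\mathbf{u}_1, \mathbf{u}_2 \in E$. Conclude that $\Theta(\alpha) \in \mathcal{L}(E, \mathcal{L}(F, G))$.

(iii) By arguments similar in spirit to those of (ii), show that $\Theta : \mathcal{E}(E, F; G) \to \mathcal{L}(E, \mathcal{L}(F, G))$ is linear.

(iv) Show that if $(\Theta(\alpha)(\mathbf{u}))(\mathbf{v}) = \mathbf{0}$ for all $\mathbf{u} \in E$, $\mathbf{v} \in F$, then $\alpha = 0$. Deduce that $\Theta$ is injective.

(v) By computing the dimensions of $\mathcal{E}(E, F; G)$ and $\mathcal{L}(E, \mathcal{L}(F, G))$, show that $\Theta$ is an isomorphism.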
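Programmers will recognise the map defined in this exercise as ‘currying’: a function of two arguments is turned into a function of one argument that returns a function of the other. The sketch below is a hedged illustration only. The bilinear map `alpha` from $\mathbb{R}^2 \times \mathbb{R}^3$ to $\mathbb{R}^2$ and the helpers `add` and `scale` are hypothetical choices of ours, and checking linearity at one sample point is not a proof of part (ii).

```python
def alpha(u, v):
    # A hypothetical bilinear map R^2 x R^3 -> R^2, chosen for illustration.
    return (u[0] * v[0] + 2 * u[1] * v[2], 3 * u[0] * v[1] - u[1] * v[0])


def theta(bilinear):
    """Curry a two-argument map: theta(bilinear)(u)(v) == bilinear(u, v)."""
    def outer(u):
        def inner(v):
            return bilinear(u, v)
        return inner
    return outer


def add(a, b):
    # Componentwise sum of two tuples of equal length.
    return tuple(s + t for s, t in zip(a, b))


def scale(c, a):
    # Scalar multiple of a tuple.
    return tuple(c * s for s in a)


u1, u2, v = (1, 2), (-3, 5), (4, 0, -1)
lam1, lam2 = 2, -7

# Left-hand side of (ii): apply theta(alpha) to the linear combination first.
lhs = theta(alpha)(add(scale(lam1, u1), scale(lam2, u2)))(v)
# Right-hand side of (ii): combine the two values afterwards.
rhs = add(scale(lam1, theta(alpha)(u1)(v)), scale(lam2, theta(alpha)(u2)(v)))
print(lhs, rhs)
```

Integer inputs are used so that the two sides can be compared exactly, without rounding error.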

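Since our definition of $\Theta$ does not depend on a choice of basis, we say that $\Theta$ gives a natural isomorphism of $\mathcal{E}(E, F; G)$ and $\mathcal{L}(E, \mathcal{L}(F, G))$. If we use this isomorphism to identify $\mathcal{E}(E, F; G)$ and $\mathcal{L}(E, \mathcal{L}(F, G))$ then $D^2\mathbf{f}(\mathbf{x}) \in \mathcal{E}(\mathbb{R}^m, \mathbb{R}^m; \mathbb{R}^p)$. If we treat the higher derivatives in the same manner, the central formula of the local Taylor theorem takes the satisfying form

\[
\mathbf{f}(\mathbf{x} + \mathbf{h}) = \mathbf{f}(\mathbf{x}) + D\mathbf{f}(\mathbf{x})(\mathbf{h}) + \frac{1}{2!} D^2\mathbf{f}(\mathbf{x})(\mathbf{h}, \mathbf{h}) + \cdots + \frac{1}{n!} D^n\mathbf{f}(\mathbf{x})(\mathbf{h}, \mathbf{h}, \ldots, \mathbf{h}) + \boldsymbol{\epsilon}(\mathbf{h})\|\mathbf{h}\|^n.
\]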

For more details, consult sections 11 and 13 of chapter VIII of Dieudonné’s Foundations of Modern Analysis [13] where the higher derivatives are dealt with in a coordinate free way. Like Hardy’s book [23], Dieudonné’s is a masterpiece but in a very different tradition⁴.

7.3 Critical points

In this section we mix informal and formal argument, deliberately using words like ‘well behaved’ without defining them. Our object is to use the local Taylor formula to produce results about maxima, minima and related objects.

Let $U$ be an open subset of $\mathbb{R}^m$ containing $\mathbf{0}$. We are interested in the behaviour of a well behaved function $f : U \to \mathbb{R}$ near $\mathbf{0}$.

Since $f$ is well behaved, the first order local Taylor theorem (which reduces to the definition of differentiation) gives

\[
f(\mathbf{h}) = f(\mathbf{0}) + \alpha\mathbf{h} + \epsilon(\mathbf{h})\|\mathbf{h}\|
\]

where $\epsilon(\mathbf{h}) \to 0$ as $\|\mathbf{h}\| \to 0$ and $\alpha = Df(\mathbf{0})$ is a linear map from $\mathbb{R}^m$ to $\mathbb{R}$. By a very simple result of linear algebra, we can choose a set of orthogonal coordinates so that $\alpha(x_1, x_2, \ldots, x_m) = ax_1$ with $a \geq 0$.

Exercise 7.3.1. If $\alpha : \mathbb{R}^m \to \mathbb{R}$ is linear show that, with respect to any particular chosen orthogonal coordinates,

\[
\alpha(x_1, x_2, \ldots, x_m) = a_1 x_1 + a_2 x_2 + \cdots + a_m x_m
\]

for some $a_j \in \mathbb{R}$. Deduce that there is a vector $\mathbf{a}$ such that $\alpha\mathbf{x} = \mathbf{a} \cdot \mathbf{x}$ for all $\mathbf{x} \in \mathbb{R}^m$. Conclude that we can choose a set of orthogonal coordinates so that $\alpha(x_1, x_2, \ldots, x_m) = ax_1$ with $a \geq 0$.

In applied mathematics we write $\mathbf{a} = \nabla f$. A longer, but very instructive, proof of the result of this exercise is given in Exercise K.31.

In the coordinate system just chosen

\[
f(h_1, h_2, \ldots, h_m) = f(\mathbf{0}) + ah_1 + \epsilon(\mathbf{h})\|\mathbf{h}\|
\]

where $\epsilon(\mathbf{h}) \to 0$ as $\|\mathbf{h}\| \to 0$. Thus, speaking informally, if $a \neq 0$ the ‘contour lines’ $f(\mathbf{h}) = c$ close to $\mathbf{0}$ will look like parallel ‘hyperplanes’ perpendicular to the $x_1$ axis. Figure 7.1 illustrates the case $m = 2$. In particular, our contour lines look like those describing a side of a hill but not its peak.

Figure 7.1: Contour lines when the derivative is not zero.

Using our informal insight we can prove a formal lemma.

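Lemma 7.3.2. Let $U$ be an open subset of $\mathbb{R}^m$ containing $\mathbf{x}$. Suppose that $f : U \to \mathbb{R}$ is differentiable at $\mathbf{x}$. If $f(\mathbf{x}) \geq f(\mathbf{y})$ for all $\mathbf{y} \in U$ then $Df(\mathbf{x}) = 0$ (more precisely, $Df(\mathbf{x})\mathbf{h} = 0$ for all $\mathbf{h} \in \mathbb{R}^m$).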

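Proof. There is no loss in generality in supposing $\mathbf{x} = \mathbf{0}$. Suppose that $Df(\mathbf{0}) \neq 0$. Then we can find an orthogonal coordinate system and a strictly positive real number $a$ such that $Df(\mathbf{0})(h_1, h_2, \ldots, h_m) = ah_1$. Thus, from the definition of the derivative,

\[
f(h_1, h_2, \ldots, h_m) = f(\mathbf{0}) + ah_1 + \epsilon(\mathbf{h})\|\mathbf{h}\|
\]

where $\epsilon(\mathbf{h}) \to 0$ as $\|\mathbf{h}\| \to 0$.

Choose $\eta > 0$ such that, whenever $\|\mathbf{h}\| < \eta$, we have $\mathbf{h} \in U$ and $|\epsilon(\mathbf{h})| < a/2$. Now choose any real $h$ with $0 < h < \eta$. If we set $\mathbf{h} = (h, 0, 0, \ldots, 0)$, we have

\[
f(\mathbf{h}) = f(\mathbf{0}) + ah + \epsilon(\mathbf{h})h > f(\mathbf{0}) + ah - ah/2 = f(\mathbf{0}) + ah/2 > f(\mathbf{0}),
\]

which contradicts the hypothesis that $f(\mathbf{0}) \geq f(\mathbf{y})$ for all $\mathbf{y} \in U$.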

The distinctions made in the following deļ¬nition are probably familiar to

the reader.

Definition 7.3.3. Let $E$ be a subset of $\mathbb{R}^m$ containing $\mathbf{x}$ and let $f$ be a function from $E$ to $\mathbb{R}$.

⁴ See the quotation from Boas in the previous footnote.


(i) We say that $f$ has a global maximum at $\mathbf{x}$ if $f(\mathbf{x}) \geq f(\mathbf{y})$ for all $\mathbf{y} \in E$.

(ii) We say that $f$ has a strict global maximum at $\mathbf{x}$ if $f(\mathbf{x}) > f(\mathbf{y})$ for all $\mathbf{y} \in E$ with $\mathbf{x} \neq \mathbf{y}$.

(iii) We say that $f$ has a local maximum (respectively a strict local maximum) at $\mathbf{x}$ if there exists an $\eta > 0$ such that the restriction of $f$ to $E \cap B(\mathbf{x}, \eta)$ has a global maximum (respectively a strict global maximum) at $\mathbf{x}$.

(iv) If we can find an $\eta > 0$ such that $E \supseteq B(\mathbf{x}, \eta)$ and $f$ is differentiable at $\mathbf{x}$ with $Df(\mathbf{x}) = 0$, we say that $\mathbf{x}$ is a critical or stationary point⁵ of $f$.

It is usual to refer to the point $\mathbf{x}$ where $f$ takes a (global or local) maximum as a (global or local) maximum and this convention rarely causes confusion. When mathematicians omit the words local or global in referring to a maximum they usually mean the local version (but this convention, which I shall follow, is not universal).

Here are some easy exercises involving these ideas.

Exercise 7.3.4. (i) Let $U$ be an open subset of $\mathbb{R}^m$ containing $\mathbf{x}$. Suppose that $f : U \to \mathbb{R}$ is differentiable on $U$ and that $Df$ is continuous at $\mathbf{x}$. Show that, if $f$ has a local maximum at $\mathbf{x}$, then $Df(\mathbf{x}) = 0$.

(ii) Suppose that $f : \mathbb{R}^m \to \mathbb{R}$ is differentiable everywhere and $E$ is a closed subset of $\mathbb{R}^m$ containing $\mathbf{x}$. Show that, even if $\mathbf{x}$ is a global maximum of the restriction of $f$ to $E$, it need not be true that $Df(\mathbf{x}) = 0$. [Hint: We have already met this fact when we thought about Rolle’s theorem.] Explain informally why the proof of Lemma 7.3.2 fails in this case.

(iii) State the definitions corresponding to Definition 7.3.3 that we need to deal with minima.

(iv) Let $E$ be any subset of $\mathbb{R}^m$ containing $\mathbf{y}$ and let $f$ be a function from $E$ to $\mathbb{R}$. If $\mathbf{y}$ is both a global maximum and a global minimum for $f$ show that $f$ is constant. What can you say if we replace the word ‘global’ by ‘local’?

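We saw above how $f$ behaved locally near $\mathbf{0}$ if $Df(\mathbf{0}) \neq 0$. What can we say if $Df(\mathbf{0}) = 0$? In this case, the second order Taylor expansion gives

\[
f(\mathbf{h}) = f(\mathbf{0}) + \beta(\mathbf{h}, \mathbf{h}) + \epsilon(\mathbf{h})\|\mathbf{h}\|^2
\]

where

\[
\beta(\mathbf{h}, \mathbf{h}) = \frac{1}{2} \sum_{i=1}^{m} \sum_{j=1}^{m} f_{,ij}(\mathbf{0})h_i h_j
\]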

⁵ In other words a stationary point is one where the ground is flat. Since flat ground drains badly, the stationary points we meet in hill walking tend to be boggy. Thus we encounter boggy ground at the top of hills and when crossing passes as well as at lowest points (at least in the UK, other countries may be drier or have better draining soils).


Figure 7.2: Contour lines when the derivative is zero but the second derivative

is non-singular

and $\epsilon(\mathbf{h}) \to 0$ as $\|\mathbf{h}\| \to 0$. We write $\beta = \frac{1}{2} D^2 f$ and call the matrix $K = (f_{,ij}(\mathbf{0}))$ the Hessian matrix. As we noted in the previous section, the symmetry of the second partial derivatives (Theorem 7.2.6) tells us that the Hessian matrix is a symmetric matrix and the associated bilinear map $D^2 f$ is symmetric. It follows from a well known result in linear algebra (see e.g. Exercise K.30) that $\mathbb{R}^m$ has an orthonormal basis of eigenvectors of $K$. Choosing coordinate axes along those vectors, we obtain

\[
D^2 f(\mathbf{h}, \mathbf{h}) = \sum_{i=1}^{m} \lambda_i h_i^2
\]

where the $\lambda_i$ are the eigenvalues associated with the eigenvectors.

In the coordinate system just chosen

\[
f(h_1, h_2, \ldots, h_m) = f(\mathbf{0}) + \frac{1}{2} \sum_{i=1}^{m} \lambda_i h_i^2 + \epsilon(\mathbf{h})\|\mathbf{h}\|^2
\]

where $\epsilon(\mathbf{h}) \to 0$ as $\|\mathbf{h}\| \to 0$. Thus, speaking informally, if all the $\lambda_i$ are non-zero, the ‘contour lines’ $f(\mathbf{h}) = c$ close to $\mathbf{0}$ will look like ‘quadratic hypersurfaces’ (that is, $m$ dimensional versions of conics). Figure 7.2 illustrates the two possible contour patterns when $m = 2$. The first type of pattern is that of a summit (if the contour lines are for increasing heights as we approach $\mathbf{0}$) or a bottom (lowest point)⁶ (if the contour lines are for decreasing heights as we approach $\mathbf{0}$). The second is that of a pass (often called a saddle). Notice that, for merchants, wishing to get from one valley to another, the pass is the highest point in their journey but, for mountaineers, wishing to get from one mountain to another, the pass is the lowest point.

⁶ The English language is rich in synonyms for highest points (summits, peaks, crowns, . . . ) but has few for lowest points. This may be because the English climate ensures that most lowest points are under water.


When looking at Figure 7.2 it is important to realise that the difference in heights of successive contour lines is not constant. In effect we have drawn contour lines at heights $f(\mathbf{0})$, $f(\mathbf{0}) + \eta$, $f(\mathbf{0}) + 2^2\eta$, $f(\mathbf{0}) + 3^2\eta$, . . . , $f(\mathbf{0}) + n^2\eta$.
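To see why heights of this form suit a quadratic picture, consider the simplest model case $f(\mathbf{h}) = f(\mathbf{0}) + \frac{1}{2}\lambda(h_1^2 + h_2^2)$ with $\lambda > 0$, whose contours are circles. The sketch below is a hedged illustration under that assumption (the model function, the value of `eta` and the name `contour_radius` are ours, not the book's). It computes the radii of the contours at heights $n^2\eta$ above the centre.

```python
import math


def contour_radius(height_above_centre, lam=1.0):
    """Radius r solving (lam / 2) * r**2 == height_above_centre."""
    return math.sqrt(2.0 * height_above_centre / lam)


eta = 0.01
# Contours at heights eta, 4 eta, 9 eta, 16 eta, 25 eta above the centre.
radii = [contour_radius(n * n * eta) for n in range(1, 6)]
gaps = [outer - inner for inner, outer in zip(radii, radii[1:])]
print(radii)
print(gaps)
```

In this model case the radius at height $n^2\eta$ is $n\sqrt{2\eta/\lambda}$, so successive contours are equally spaced on the page even though the height differences grow.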

Exercise 7.3.5. (i) Redraw Figure 7.2 with contour lines at heights $f(\mathbf{0})$, $f(\mathbf{0}) + \eta$, $f(\mathbf{0}) + 2\eta$, $f(\mathbf{0}) + 3\eta$, . . . , $f(\mathbf{0}) + n\eta$.

(ii) What (roughly speaking) can you say about the difference in heights of successive contour lines in Figure 7.1?

Using our informal insight we can prove a formal lemma.

Lemma 7.3.6. Let $U$ be an open subset of $\mathbb{R}^m$ containing $\mathbf{x}$. Suppose that $f : U \to \mathbb{R}$ has second order partial derivatives on $U$ and these partial derivatives are continuous at $\mathbf{x}$. If $Df(\mathbf{x}) = 0$ and $D^2 f(\mathbf{x})$ is non-singular then

(i) $f$ has a minimum at $\mathbf{x}$ if and only if $D^2 f(\mathbf{x})$ is positive definite.

(ii) $f$ has a maximum at $\mathbf{x}$ if and only if $D^2 f(\mathbf{x})$ is negative definite.

The conditions of the second sentence of the hypothesis ensure that we have a local second order Taylor expansion. In most applications $f$ will be much better behaved than this. We say that $D^2 f(\mathbf{x})$ is positive definite if all the associated eigenvalues (that is, all the eigenvalues of the Hessian matrix) are strictly positive and that $D^2 f(\mathbf{x})$ is negative definite if all the associated eigenvalues are strictly negative.

Exercise 7.3.7. Prove Lemma 7.3.6 following the style of the proof of Lemma 7.3.2.

It is a non-trivial task to tell whether a given Hessian is positive or negative definite.

Exercise 7.3.8. Let $f(x, y) = x^2 + 6xy + y^2$. Show that $Df(0, 0) = 0$, that all the entries in the Hessian matrix $K$ at $(0, 0)$ are positive and that $K$ is non-singular but that $D^2 f(0, 0)$ is neither positive definite nor negative definite. (So $(0, 0)$ is a saddle point.)

Exercise K.105 gives one method of resolving the problem.
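For a function of two variables the eigenvalue test can be carried out by hand, since the eigenvalues of a symmetric $2 \times 2$ matrix have a closed form. The following is a minimal sketch of that special case only (it is not the method of Exercise K.105, and the names `symmetric_2x2_eigenvalues` and `classify` are ours). It is applied to the Hessian of $f(x, y) = x^2 + 6xy + y^2$, whose entries are $f_{,11} = 2$, $f_{,12} = f_{,21} = 6$ and $f_{,22} = 2$.

```python
import math


def symmetric_2x2_eigenvalues(a, b, d):
    """Eigenvalues of the symmetric matrix [[a, b], [b, d]], smallest first."""
    mean = (a + d) / 2.0
    radius = math.hypot((a - d) / 2.0, b)
    return mean - radius, mean + radius


def classify(a, b, d):
    """Classify [[a, b], [b, d]] using the definitions in the text."""
    if a * d - b * b == 0:
        return "singular"
    smallest, largest = symmetric_2x2_eigenvalues(a, b, d)
    if smallest > 0:
        return "positive definite"
    if largest < 0:
        return "negative definite"
    return "neither"


print(symmetric_2x2_eigenvalues(2, 6, 2))  # (-4.0, 8.0)
print(classify(2, 6, 2))                   # neither
```

All four entries of this Hessian are positive, yet one eigenvalue is negative, which is the point of Exercise 7.3.8: the signs of the entries tell us very little.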

Because it is non-trivial to use the Hessian to determine whether a singular point, that is, a point $\mathbf{x}$ where $Df(\mathbf{x}) = 0$, is a maximum, a minimum or neither, mathematicians frequently seek short cuts.

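Exercise 7.3.9. Suppose that $f : \mathbb{R}^m \to \mathbb{R}$ is continuous, that $f(\mathbf{x}) \to 0$ as $\|\mathbf{x}\| \to \infty$ and that $f(\mathbf{x}) > 0$ for all $\mathbf{x} \in \mathbb{R}^m$.

(i) Explain why there exists an $R > 0$ such that $f(\mathbf{x}) < f(\mathbf{0})$ for all $\|\mathbf{x}\| \geq R$.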


(ii) Explain why there exists an $\mathbf{x}_0$ with $\|\mathbf{x}_0\| \leq R$ and $f(\mathbf{x}_0) \geq f(\mathbf{x})$ for all $\|\mathbf{x}\| \leq R$.

(iii) Explain why $f(\mathbf{x}_0) \geq f(\mathbf{x})$ for all $\mathbf{x} \in \mathbb{R}^m$.

(iv) If $f$ is everywhere differentiable and has exactly one singular point $\mathbf{y}_0$ show that $f$ attains a global maximum at $\mathbf{y}_0$.

(v) In statistics we frequently wish to maximise functions of the form

\[
f(a, b) = \exp\Bigl( -\sum_{i=1}^{k} (y_i - at_i - b)^2 \Bigr),
\]

with $\sum_{i=1}^{k} t_i = 0$. Use the results above to find the values of $a$ and $b$ which maximise $f$. (Of course, this result can be obtained without calculus but most people do it this way.)

Mathematicians with a good understanding of the topic they are investigating can use insight as a substitute for rigorous verification, but intuition may lead us astray.

Exercise 7.3.10. Four towns lie on the vertices of a square of side $a$. What is the shortest total length of a system of roads joining all four towns? (The answer is given in Exercise K.107, but try to find the answer first before looking it up.)

The following are standard traps for the novice and occasional traps for

the experienced.

(1) Critical points need not be maxima or minima.

(2) Local maxima and minima need not be global maxima or minima.

(3) Maxima and minima may occur on the boundary and may then not be critical points. [We may restate this more exactly as follows. Suppose $f : E \to \mathbb{R}$. Unless $E$ is open, $f$ may take a maximum value at a point $\mathbf{e} \in E$ such that we cannot find any $\delta > 0$ with $B(\mathbf{e}, \delta) \subseteq E$. However well $f$ is behaved, the argument of Lemma 7.3.2 will fail. For a specific instance see Exercise 7.3.4.]

(4) A function need not have a maximum or minimum. [Consider $f : U \to \mathbb{R}$ given by $f(x, y) = x$ where $U = B(\mathbf{0}, 1)$ or $U = \mathbb{R}^2$.]

Exercise 7.3.11. Find the maxima and minima of the function $f : \mathbb{R}^2 \to \mathbb{R}$ given by

\[
f(x, y) = y^2 - x^3 - ax
\]

in the region $\{(x, y) : x^2 + y^2 \leq 1\}$. Your answer will depend on the constant $a$.


Figure 7.3: Light paths in an ellipse

Matters are further complicated by the fact that different kinds of problems call for different kinds of solutions. The engineer seeks a global minimum to the cost of a process. On the other hand if we drop a handful of ball bearings on the ground they will end up at local minima (lowest points) and most people suspect that evolutionary, economic and social changes all involve local maxima and minima. Finally, although we like to think of many physical processes as minimising some function, it is often the case they are really stationarising (finding critical points for) that function. We like to say that light takes a shortest path, but, if you consider a bulb $A$ at the centre of an ellipse, light is reflected back to $A$ from $B$ and $B'$, the two closest points on the ellipse, and from $C$ and $C'$, the two furthest points (see Figure 7.3).
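The ellipse example can be made concrete. A ray from the centre returns to the centre exactly when it meets the ellipse at right angles, that is, when the distance from the centre to the point of the ellipse is stationary. The sketch below is an illustration under assumptions of our own: the ellipse is parametrised as $(a\cos t, b\sin t)$ with hypothetical semi-axes $a = 2$, $b = 1$, and the derivative of the distance is computed by hand.

```python
import math


def distance(t, a=2.0, b=1.0):
    """Distance from the centre to the point (a cos t, b sin t) of the ellipse."""
    return math.hypot(a * math.cos(t), b * math.sin(t))


def distance_derivative(t, a=2.0, b=1.0):
    # d/dt of sqrt(a^2 cos^2 t + b^2 sin^2 t), computed by hand.
    return (b * b - a * a) * math.sin(t) * math.cos(t) / distance(t, a, b)


# The four candidate points: the ends of the major and minor axes.
for quarter in range(4):
    t = quarter * math.pi / 2.0
    print(round(distance(t), 6), abs(distance_derivative(t)) < 1e-12)
```

The derivative vanishes at all four points, but only the two at distance $b$ are minima of the path length; the two at distance $a$ are maxima, so light that ‘takes a shortest path’ would never use them.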

We have said that, if $f : \mathbb{R}^2 \to \mathbb{R}$ has a Taylor expansion in the neighbourhood of a point, then (ignoring the possibility that the Hessian is singular) the contour map will look like that in Figures 7.1 or 7.2. But it is very easy to imagine other contour maps and the reader may ask what happens if the local contour map does not look like that in Figures 7.1 or 7.2. The answer is that the appropriate Taylor expansion has failed and therefore the hypotheses which ensure the appropriate Taylor expansion must themselves have failed.

Exercise 7.3.12. Suppose that $f : \mathbb{R}^2 \to \mathbb{R}$ is given by $f(0, 0) = 0$ and

\[
f(r\cos\theta, r\sin\theta) = rg(\theta)
\]

when $r > 0$, where $g : \mathbb{R} \to \mathbb{R}$ is periodic with period $2\pi$. [Informally, we define $f$ using polar coordinates.] Show that, if $g(-\theta) = -g(\theta)$ for all $\theta$, then $f$ has directional derivatives (see Definition 6.1.6) in all directions at $(0, 0)$.

If we choose $g(\theta) = \sin\theta$, we obtain a contour map like Figure 7.1, but, if $g(\theta) = \sin 3\theta$, we obtain something very different.
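A quick numerical experiment shows how different the second choice is. The sketch below is only an illustration, assuming the polar definition above with $g(\theta) = \sin 3\theta$; the names `f_polar`, `g3` and `directional_quotient` are ours. It computes the difference quotient of $f$ at the origin along three unit directions and compares the diagonal value with what a linear map would predict from the two axis directions.

```python
import math


def f_polar(x, y, g):
    """f(r cos t, r sin t) = r g(t) for r > 0, with f(0, 0) = 0."""
    r = math.hypot(x, y)
    if r == 0.0:
        return 0.0
    return r * g(math.atan2(y, x))


def g3(theta):
    return math.sin(3.0 * theta)


def directional_quotient(u, v, h):
    """(f(hu, hv) - f(0, 0)) / h for the choice g(theta) = sin 3 theta."""
    return (f_polar(u * h, v * h, g3) - f_polar(0.0, 0.0, g3)) / h


s = math.sqrt(0.5)
along_x = directional_quotient(1.0, 0.0, 1e-3)
along_y = directional_quotient(0.0, 1.0, 1e-3)
diagonal = directional_quotient(s, s, 1e-3)
print(along_x, along_y, diagonal)
# If f were differentiable at the origin, the diagonal value would have to be:
print(s * along_x + s * along_y)
```

The quotient along the diagonal is close to $\sin(3\pi/4) = 1/\sqrt{2}$, while linearity would force $-1/\sqrt{2}$, so the directional derivatives do not fit together into a linear map.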

Exercise 7.3.13. We continue with the notation of Exercise 7.3.12.


(i) If $g(\theta) = \sin\theta$, find $f(x, y)$ and sketch the contour lines $f(x, y) = h, 2h, 3h, \ldots$ with $h$ small.

(ii) If $g(\theta) = \sin 3\theta$, show that

\[
f(x, y) = \frac{y(3x^2 - y^2)}{x^2 + y^2}
\]

for $(x, y) \neq \mathbf{0}$. Sketch the contour lines $f(x, y) = h, 2h, 3h, \ldots$ with $h$ small.

Example 7.3.14. If

\[
f(x, y) = \frac{y(3x^2 - y^2)}{x^2 + y^2} \quad \text{for } (x, y) \neq (0, 0), \qquad f(0, 0) = 0,
\]

then $f$ is differentiable except at $(0, 0)$, is continuous everywhere, has directional derivatives in all directions at $(0, 0)$ but is not differentiable at $(0, 0)$.

Proof. By standard results on differentiation (the chain rule, product rule and so on), $f$ is differentiable (and so continuous) except, perhaps, at $(0, 0)$. If $u^2 + v^2 = 1$ we have

\[
\frac{f(uh, vh) - f(0, 0)}{h} \to v(3u^2 - v^2)
\]

as $h \to 0$, so $f$ has directional derivatives in all directions at $(0, 0)$. Since

\[
|f(x, y) - f(0, 0)| \leq \frac{4(\max(|x|, |y|))^3}{(\max(|x|, |y|))^2} = 4\max(|x|, |y|) \to 0
\]

as $(x^2 + y^2)^{1/2} \to 0$, $f$ is continuous at $(0, 0)$.

Suppose $f$ were differentiable at $(0, 0)$. Then

\[
f(h, k) = f(0, 0) + Ah + Bk + \epsilon(h, k)(h^2 + k^2)^{1/2}
\]

with $\epsilon(h, k) \to 0$ as $(h^2 + k^2)^{1/2} \to 0$, and $A = f_{,1}(0, 0)$, $B = f_{,2}(0, 0)$. The calculations of the previous paragraph with $v = 0$ show that $f_{,1}(0, 0) = 0$ and the same calculations with $u = 0$ show that $f_{,2}(0, 0) = -1$. Thus

\[
f(h, k) + k = \epsilon(h, k)(h^2 + k^2)^{1/2}
\]

and

\[
\frac{f(h, k) + k}{(h^2 + k^2)^{1/2}} \to 0
\]
