
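Proof. Since our object is merely to show that some $K(\alpha)$ exists and not to find a ‘good’ value, we can use the crudest inequalities.
If we write $\mathbf{y} = \alpha\mathbf{x}$, we have
\[
\|\alpha\mathbf{x}\| = \|\mathbf{y}\| \le \sum_{i=1}^{p} |y_i|
\le \sum_{i=1}^{p}\sum_{j=1}^{m} |a_{ij}||x_j|
\le \sum_{i=1}^{p}\sum_{j=1}^{m} |a_{ij}|\,\|\mathbf{x}\|.
\]
The required result follows on putting $K(\alpha) = \sum_{i=1}^{p}\sum_{j=1}^{m} |a_{ij}|$.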
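
The crude bound is easy to test numerically. The following is only an illustrative sketch, not part of the argument: the matrix `A` and the helper names `apply`, `norm` and `crude_bound` are assumptions made for the example. It compares $\|\alpha\mathbf{x}\|/\|\mathbf{x}\|$ on random vectors with $K(\alpha) = \sum_{i,j}|a_{ij}|$.

```python
import math
import random

def apply(A, x):
    """Return the image of x under the linear map with matrix A."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]

def norm(v):
    """Euclidean norm."""
    return math.sqrt(sum(t * t for t in v))

def crude_bound(A):
    """K(alpha): the sum of the absolute values of all matrix entries."""
    return sum(abs(a) for row in A for a in row)

A = [[1.0, -2.0, 0.5], [3.0, 0.0, -1.0]]   # an arbitrary 2 x 3 example
K = crude_bound(A)

rng = random.Random(0)
worst_ratio = 0.0
for _ in range(1000):
    x = [rng.uniform(-1.0, 1.0) for _ in range(3)]
    if norm(x) > 0:
        worst_ratio = max(worst_ratio, norm(apply(A, x)) / norm(x))

print(K, worst_ratio)
```

The largest ratio observed never exceeds `K`, and is usually a good deal smaller, which is consistent with the remark that no attempt has been made to find a ‘good’ value.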

Exercise 6.2.3. Use Lemma 6.2.2 to estimate $\|\alpha\mathbf{x} - \alpha\mathbf{y}\|$ and hence deduce that every linear map $\alpha : \mathbb{R}^m \to \mathbb{R}^p$ is continuous. (This exercise takes longer to pose than to do.)

Lemma 6.2.2 tells us that $\{\|\alpha\mathbf{x}\| : \|\mathbf{x}\| \le 1\}$ is a non-empty subset of $\mathbb{R}$ bounded above by $K(\alpha)$ and so has a supremum.

Definition 6.2.4. If $\alpha : \mathbb{R}^m \to \mathbb{R}^p$ is a linear map, then
\[
\|\alpha\| = \sup_{\|\mathbf{x}\| \le 1} \|\alpha\mathbf{x}\|.
\]


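Exercise 6.2.5. If $\alpha$ is as in Definition 6.2.4, show that the three quantities
\[
\sup_{\|\mathbf{x}\| \le 1} \|\alpha\mathbf{x}\|, \qquad
\sup_{\|\mathbf{x}\| = 1} \|\alpha\mathbf{x}\|, \qquad\text{and}\qquad
\sup_{\mathbf{x} \ne \mathbf{0}} \frac{\|\alpha\mathbf{x}\|}{\|\mathbf{x}\|}
\]
are well defined and equal.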

The ‘operator norm’ just defined in Definition 6.2.4 has many pleasant properties.
Please send corrections however trivial to twk@dpmms.cam.ac.uk

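Lemma 6.2.6. Let $\alpha, \beta : \mathbb{R}^m \to \mathbb{R}^p$ be linear maps.
(i) If $\mathbf{x} \in \mathbb{R}^m$ then $\|\alpha\mathbf{x}\| \le \|\alpha\|\,\|\mathbf{x}\|$.
(ii) $\|\alpha\| \ge 0$.
(iii) If $\|\alpha\| = 0$ then $\alpha = 0$.
(iv) If $\lambda \in \mathbb{R}$ then $\|\lambda\alpha\| = |\lambda|\,\|\alpha\|$.
(v) (The triangle inequality) $\|\alpha + \beta\| \le \|\alpha\| + \|\beta\|$.
(vi) If $\gamma : \mathbb{R}^p \to \mathbb{R}^q$ is linear, then $\|\gamma\alpha\| \le \|\gamma\|\,\|\alpha\|$.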

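Proof. I will prove parts (i) and (vi) leaving the equally easy remaining parts as an essential exercise for the reader.
(i) If $\mathbf{x} = \mathbf{0}$, we observe that $\alpha\mathbf{0} = \mathbf{0}$ and so
\[
\|\alpha\mathbf{0}\| = \|\mathbf{0}\| = 0 \le 0 = \|\alpha\|\,0 = \|\alpha\|\,\|\mathbf{0}\|
\]
as required.
If $\mathbf{x} \ne \mathbf{0}$, we set $\mathbf{u} = \|\mathbf{x}\|^{-1}\mathbf{x}$. Since
\[
\|\mathbf{u}\| = \|\mathbf{x}\|^{-1}\|\mathbf{x}\| = 1
\]
we have $\|\alpha\mathbf{u}\| \le \|\alpha\|$ and so
\[
\|\alpha\mathbf{x}\| = \|\alpha(\|\mathbf{x}\|\mathbf{u})\| = \|(\|\mathbf{x}\|\alpha\mathbf{u})\| = \|\mathbf{x}\|\,\|\alpha\mathbf{u}\| \le \|\alpha\|\,\|\mathbf{x}\|
\]
as required.
(vi) If $\|\mathbf{x}\| \le 1$ then, using part (i) twice,
\[
\|\gamma\alpha(\mathbf{x})\| = \|\gamma(\alpha(\mathbf{x}))\| \le \|\gamma\|\,\|\alpha(\mathbf{x})\| \le \|\gamma\|\,\|\alpha\|\,\|\mathbf{x}\| \le \|\gamma\|\,\|\alpha\|.
\]
It follows that
\[
\|\gamma\alpha\| = \sup_{\|\mathbf{x}\| \le 1} \|\gamma\alpha(\mathbf{x})\| \le \|\gamma\|\,\|\alpha\|.
\]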




Exercise 6.2.7. (i) Write down a linear map $\alpha : \mathbb{R}^2 \to \mathbb{R}^2$ such that $\alpha \ne 0$ but $\alpha^2 = 0$.
(ii) Show that we cannot replace the inequality (vi) in Lemma 6.2.6 by an equality.
(iii) Show that we cannot replace the inequality (v) in Lemma 6.2.6 by an equality.
130 A COMPANION TO ANALYSIS

Exercise 6.2.8. (i) Suppose that $\alpha : \mathbb{R} \to \mathbb{R}$ is a linear map and that its matrix with respect to the standard bases is $(a)$. Show that
\[
\|\alpha\| = |a|.
\]
(ii) Suppose that $\alpha : \mathbb{R}^m \to \mathbb{R}$ is a linear map and that its matrix with respect to the standard bases is $(a_1\ a_2\ \dots\ a_m)$. By using the Cauchy-Schwarz inequality (Lemma 4.1.2) and the associated conditions for equality (Exercise 4.1.5 (i)) show that
\[
\|\alpha\| = \Bigl(\sum_{j=1}^{m} a_j^2\Bigr)^{1/2}.
\]

Although the operator norm is, in principle, calculable (see Exercises K.98 to K.101) the reader is warned that, except in special cases, there is no simple formula for the operator norm and it is mainly used as a theoretical tool. Should we need to have some idea of its size, extremely rough estimates will often suffice.

Exercise 6.2.9. Suppose that $\alpha : \mathbb{R}^m \to \mathbb{R}^p$ is a linear map and that its matrix with respect to the standard bases is $A = (a_{ij})$. Show that
\[
\max_{i,j} |a_{ij}| \le \|\alpha\| \le pm \max_{i,j} |a_{ij}|.
\]
By using the Cauchy-Schwarz inequality, show that
\[
\|\alpha\| \le \Bigl(\sum_{i=1}^{p}\sum_{j=1}^{m} a_{ij}^2\Bigr)^{1/2}.
\]
Show that this inequality implies the corresponding inequality in the previous paragraph.
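
To get a feeling for how rough these estimates are, one can approximate the operator norm of a small matrix by brute force. The sketch below is an illustration under stated assumptions, not a proof: the $2 \times 2$ matrix `A` is an arbitrary choice, and `operator_norm_2d` is a hypothetical helper that samples unit vectors on the circle, so it approximates the supremum from below.

```python
import math

def operator_norm_2d(A, steps=20000):
    """Approximate sup ||A u|| over unit vectors u in R^2 by sampling the circle."""
    best = 0.0
    for k in range(steps):
        t = 2.0 * math.pi * k / steps
        u = (math.cos(t), math.sin(t))
        image = [row[0] * u[0] + row[1] * u[1] for row in A]
        best = max(best, math.sqrt(sum(c * c for c in image)))
    return best

A = [[1.0, 2.0], [3.0, 4.0]]
p, m = len(A), len(A[0])

largest_entry = max(abs(a) for row in A for a in row)            # max |a_ij|
root_sum_squares = math.sqrt(sum(a * a for row in A for a in row))
estimate = operator_norm_2d(A)

# The four numbers should appear in increasing order.
print(largest_entry, estimate, root_sum_squares, p * m * largest_entry)
```

For this matrix the root-sum-of-squares bound is very close to the sampled value, while the bound $pm\max|a_{ij}|$ is off by a factor of about three.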
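
We now return to differentiation. Suppose that $\mathbf{f} : \mathbb{R}^m \to \mathbb{R}^p$ and $\mathbf{g} : \mathbb{R}^p \to \mathbb{R}^q$ are differentiable. What can we say about their composition $\mathbf{g} \circ \mathbf{f}$? To simplify the algebra let us suppose that $\mathbf{f}(\mathbf{0}) = \mathbf{0}$, $\mathbf{g}(\mathbf{0}) = \mathbf{0}$ (so $\mathbf{g} \circ \mathbf{f}(\mathbf{0}) = \mathbf{0}$) and ask about the differentiability of $\mathbf{g} \circ \mathbf{f}$ at $\mathbf{0}$. Suppose that the derivative of $\mathbf{f}$ at $\mathbf{0}$ is $\alpha$ and the derivative of $\mathbf{g}$ at $\mathbf{0}$ is $\beta$. Then
\[
\mathbf{f}(\mathbf{h}) \approx \alpha\mathbf{h}
\]
when $\mathbf{h}$ is small ($\mathbf{h} \in \mathbb{R}^m$) and
\[
\mathbf{g}(\mathbf{k}) \approx \beta\mathbf{k}
\]
when $\mathbf{k}$ is small ($\mathbf{k} \in \mathbb{R}^p$). It ought, therefore, to be true that
\[
\mathbf{g}(\mathbf{f}(\mathbf{h})) \approx \beta(\alpha\mathbf{h})
\]
i.e. that
\[
\mathbf{g} \circ \mathbf{f}(\mathbf{h}) \approx (\beta\alpha)\mathbf{h}
\]
when $\mathbf{h}$ is small ($\mathbf{h} \in \mathbb{R}^m$). In other words $\mathbf{g} \circ \mathbf{f}$ is differentiable at $\mathbf{0}$.
We have been led to formulate the chain rule.
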
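Lemma 6.2.10. (The chain rule.) Let $U$ be a neighbourhood of $\mathbf{x}$ in $\mathbb{R}^m$, and $V$ a neighbourhood of $\mathbf{y}$ in $\mathbb{R}^p$. Suppose that $\mathbf{f} : U \to V$ is differentiable at $\mathbf{x}$ with derivative $\alpha$, that $\mathbf{g} : V \to \mathbb{R}^q$ is differentiable at $\mathbf{y}$ with derivative $\beta$ and that $\mathbf{f}(\mathbf{x}) = \mathbf{y}$. Then $\mathbf{g} \circ \mathbf{f}$ is differentiable at $\mathbf{x}$ with derivative $\beta\alpha$.
In more condensed notation
\[
D(\mathbf{g} \circ \mathbf{f})(\mathbf{x}) = D\mathbf{g}(\mathbf{f}(\mathbf{x}))\,D\mathbf{f}(\mathbf{x}),
\]
or, equivalently,
\[
D(\mathbf{g} \circ \mathbf{f})(\mathbf{x}) = (D\mathbf{g}) \circ \mathbf{f}(\mathbf{x})\,D\mathbf{f}(\mathbf{x}).
\]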
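Proof. We know that
\[
\mathbf{f}(\mathbf{x} + \mathbf{h}) = \mathbf{f}(\mathbf{x}) + \alpha\mathbf{h} + \boldsymbol{\epsilon}_1(\mathbf{h})\|\mathbf{h}\|
\]
and
\[
\mathbf{g}(\mathbf{f}(\mathbf{x}) + \mathbf{k}) = \mathbf{g}(\mathbf{f}(\mathbf{x})) + \beta\mathbf{k} + \boldsymbol{\epsilon}_2(\mathbf{k})\|\mathbf{k}\|
\]
where $\|\boldsymbol{\epsilon}_1(\mathbf{h})\| \to 0$ as $\|\mathbf{h}\| \to 0$ and $\|\boldsymbol{\epsilon}_2(\mathbf{k})\| \to 0$ as $\|\mathbf{k}\| \to 0$. It follows that
\[
\mathbf{g} \circ \mathbf{f}(\mathbf{x} + \mathbf{h}) = \mathbf{g}(\mathbf{f}(\mathbf{x} + \mathbf{h}))
= \mathbf{g}(\mathbf{f}(\mathbf{x}) + \alpha\mathbf{h} + \boldsymbol{\epsilon}_1(\mathbf{h})\|\mathbf{h}\|)
\]
so, taking $\mathbf{k} = \alpha\mathbf{h} + \boldsymbol{\epsilon}_1(\mathbf{h})\|\mathbf{h}\|$, we have
\begin{align*}
\mathbf{g} \circ \mathbf{f}(\mathbf{x} + \mathbf{h})
&= \mathbf{g}(\mathbf{f}(\mathbf{x})) + \beta(\alpha\mathbf{h} + \boldsymbol{\epsilon}_1(\mathbf{h})\|\mathbf{h}\|) + \boldsymbol{\epsilon}_2(\alpha\mathbf{h} + \boldsymbol{\epsilon}_1(\mathbf{h})\|\mathbf{h}\|)\,\|\alpha\mathbf{h} + \boldsymbol{\epsilon}_1(\mathbf{h})\|\mathbf{h}\|\| \\
&= \mathbf{g} \circ \mathbf{f}(\mathbf{x}) + \beta\alpha\mathbf{h} + \boldsymbol{\eta}(\mathbf{h})\|\mathbf{h}\|
\end{align*}
with
\[
\boldsymbol{\eta}(\mathbf{h}) = \boldsymbol{\eta}_1(\mathbf{h}) + \boldsymbol{\eta}_2(\mathbf{h})
\]
where
\[
\boldsymbol{\eta}_1(\mathbf{h})\|\mathbf{h}\| = \beta\boldsymbol{\epsilon}_1(\mathbf{h})\|\mathbf{h}\|
\]
and
\[
\boldsymbol{\eta}_2(\mathbf{h})\|\mathbf{h}\| = \boldsymbol{\epsilon}_2(\alpha\mathbf{h} + \boldsymbol{\epsilon}_1(\mathbf{h})\|\mathbf{h}\|)\,\|\alpha\mathbf{h} + \boldsymbol{\epsilon}_1(\mathbf{h})\|\mathbf{h}\|\|.
\]
All we have to do is to show that $\|\boldsymbol{\eta}_1(\mathbf{h})\|$ and $\|\boldsymbol{\eta}_2(\mathbf{h})\|$, and so $\|\boldsymbol{\eta}(\mathbf{h})\| = \|\boldsymbol{\eta}_1(\mathbf{h}) + \boldsymbol{\eta}_2(\mathbf{h})\|$, tend to zero as $\|\mathbf{h}\| \to 0$. We observe first that
\[
\|\boldsymbol{\eta}_1(\mathbf{h})\|\mathbf{h}\|\| \le \|\beta\|\,\|\boldsymbol{\epsilon}_1(\mathbf{h})\|\mathbf{h}\|\| = \|\beta\|\,\|\boldsymbol{\epsilon}_1(\mathbf{h})\|\,\|\mathbf{h}\|
\]
so $\|\boldsymbol{\eta}_1(\mathbf{h})\| \le \|\beta\|\,\|\boldsymbol{\epsilon}_1(\mathbf{h})\| \to 0$ as $\|\mathbf{h}\| \to 0$. Next we observe that
\begin{align*}
\|\boldsymbol{\eta}_2(\mathbf{h})\|\mathbf{h}\|\|
&= \|\boldsymbol{\epsilon}_2(\alpha\mathbf{h} + \boldsymbol{\epsilon}_1(\mathbf{h})\|\mathbf{h}\|)\|\,\|\alpha\mathbf{h} + \boldsymbol{\epsilon}_1(\mathbf{h})\|\mathbf{h}\|\| \\
&\le \|\boldsymbol{\epsilon}_2(\alpha\mathbf{h} + \boldsymbol{\epsilon}_1(\mathbf{h})\|\mathbf{h}\|)\|\,(\|\alpha\mathbf{h}\| + \|\boldsymbol{\epsilon}_1(\mathbf{h})\|\,\|\mathbf{h}\|) \\
&\le \|\boldsymbol{\epsilon}_2(\alpha\mathbf{h} + \boldsymbol{\epsilon}_1(\mathbf{h})\|\mathbf{h}\|)\|\,(\|\alpha\| + \|\boldsymbol{\epsilon}_1(\mathbf{h})\|)\,\|\mathbf{h}\|,
\end{align*}
so that
\[
\|\boldsymbol{\eta}_2(\mathbf{h})\| \le \|\boldsymbol{\epsilon}_2(\alpha\mathbf{h} + \boldsymbol{\epsilon}_1(\mathbf{h})\|\mathbf{h}\|)\|\,(\|\alpha\| + \|\boldsymbol{\epsilon}_1(\mathbf{h})\|) \to 0
\]
as $\|\mathbf{h}\| \to 0$ and we are done.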
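
The chain rule says that the matrix of $D(\mathbf{g} \circ \mathbf{f})(\mathbf{x})$ is the product of the matrices of $D\mathbf{g}(\mathbf{f}(\mathbf{x}))$ and $D\mathbf{f}(\mathbf{x})$. A rough numerical check is sketched below. Everything in it is an assumption made for illustration: the maps `f` and `g`, the point `x`, and the helper `jacobian`, which uses central differences and therefore only approximates the derivative.

```python
import math

def jacobian(F, x, step=1e-6):
    """Central-difference approximation to the matrix of DF(x)."""
    columns = []
    for j in range(len(x)):
        forward = list(x)
        backward = list(x)
        forward[j] += step
        backward[j] -= step
        columns.append([(a - b) / (2.0 * step)
                        for a, b in zip(F(forward), F(backward))])
    # columns[j][i] approximates dF_i/dx_j; transpose so rows are indexed by i.
    return [list(row) for row in zip(*columns)]

def matmul(A, B):
    """Matrix product of two matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def f(x):   # an example map R^2 -> R^3
    return [x[0] * x[1], math.sin(x[0]), x[1] ** 2]

def g(y):   # an example map R^3 -> R^2
    return [math.exp(y[0]) + y[1] * y[2], y[0] - y[2]]

def g_after_f(x):
    return g(f(x))

x = [0.3, -0.7]
lhs = jacobian(g_after_f, x)                      # approximates D(g o f)(x)
rhs = matmul(jacobian(g, f(x)), jacobian(f, x))   # approximates Dg(f(x)) Df(x)
error = max(abs(lhs[i][j] - rhs[i][j]) for i in range(2) for j in range(2))
print(error)
```

The two matrices agree up to the accuracy of the finite differences, which is all such an experiment can show.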

Students sometimes say that the proof of the chain rule is difficult but they really mean that it is tedious. It is simply a matter of showing that the error terms $\boldsymbol{\eta}_1(\mathbf{h})\|\mathbf{h}\|$ and $\boldsymbol{\eta}_2(\mathbf{h})\|\mathbf{h}\|$, which ought to be small, actually are. Students also forget the artificiality of the standard proofs of the one dimensional chain rule (see the discussion of Lemma 5.6.2; any argument which Hardy got wrong cannot be natural). The multidimensional argument forces us to address the real nature of the chain rule.
The next result is very simple but I would like to give two different proofs.

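Lemma 6.2.11. Let $U$ be a neighbourhood of $\mathbf{x}$ in $\mathbb{R}^n$. Suppose that $\mathbf{f}, \mathbf{g} : U \to \mathbb{R}^m$ are differentiable at $\mathbf{x}$. Then $\mathbf{f} + \mathbf{g}$ is differentiable at $\mathbf{x}$ with $D(\mathbf{f} + \mathbf{g})(\mathbf{x}) = D\mathbf{f}(\mathbf{x}) + D\mathbf{g}(\mathbf{x})$.

Direct proof. By definition
\[
\mathbf{f}(\mathbf{x} + \mathbf{h}) = \mathbf{f}(\mathbf{x}) + D\mathbf{f}(\mathbf{x})\mathbf{h} + \boldsymbol{\epsilon}_1(\mathbf{h})\|\mathbf{h}\|
\]
and
\[
\mathbf{g}(\mathbf{x} + \mathbf{h}) = \mathbf{g}(\mathbf{x}) + D\mathbf{g}(\mathbf{x})\mathbf{h} + \boldsymbol{\epsilon}_2(\mathbf{h})\|\mathbf{h}\|
\]
where $\|\boldsymbol{\epsilon}_1(\mathbf{h})\| \to 0$ and $\|\boldsymbol{\epsilon}_2(\mathbf{h})\| \to 0$ as $\|\mathbf{h}\| \to 0$. Thus
\begin{align*}
(\mathbf{f} + \mathbf{g})(\mathbf{x} + \mathbf{h}) &= \mathbf{f}(\mathbf{x} + \mathbf{h}) + \mathbf{g}(\mathbf{x} + \mathbf{h}) \\
&= \mathbf{f}(\mathbf{x}) + D\mathbf{f}(\mathbf{x})\mathbf{h} + \boldsymbol{\epsilon}_1(\mathbf{h})\|\mathbf{h}\| + \mathbf{g}(\mathbf{x}) + D\mathbf{g}(\mathbf{x})\mathbf{h} + \boldsymbol{\epsilon}_2(\mathbf{h})\|\mathbf{h}\| \\
&= (\mathbf{f} + \mathbf{g})(\mathbf{x}) + (D\mathbf{f}(\mathbf{x}) + D\mathbf{g}(\mathbf{x}))\mathbf{h} + \boldsymbol{\epsilon}_3(\mathbf{h})\|\mathbf{h}\|
\end{align*}
with
\[
\boldsymbol{\epsilon}_3(\mathbf{h}) = \boldsymbol{\epsilon}_1(\mathbf{h}) + \boldsymbol{\epsilon}_2(\mathbf{h}).
\]
Since
\[
\|\boldsymbol{\epsilon}_3(\mathbf{h})\| \le \|\boldsymbol{\epsilon}_1(\mathbf{h})\| + \|\boldsymbol{\epsilon}_2(\mathbf{h})\| \to 0 + 0 = 0,
\]
as $\|\mathbf{h}\| \to 0$, we are done.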

Our second proof depends on a series of observations.

Lemma 6.2.12. A linear map $\alpha : \mathbb{R}^n \to \mathbb{R}^m$ is everywhere differentiable with derivative $\alpha$.

Proof. Observe that
\[
\alpha(\mathbf{x} + \mathbf{h}) = \alpha\mathbf{x} + \alpha\mathbf{h} + \boldsymbol{\epsilon}(\mathbf{h})\|\mathbf{h}\|,
\]
where $\boldsymbol{\epsilon}(\mathbf{h}) = \mathbf{0}$, and apply the definition.

As the reader can see, the result and proof are trivial, but they take some getting used to. In one dimension the result says that the map given by $x \mapsto ax$ has derivative $x \mapsto ax$ (or that the tangent to the line $y = ax$ is the line $y = ax$ itself, or that the derivative of the linear map with $1 \times 1$ matrix $(a)$ is the linear map with matrix $(a)$).

Exercise 6.2.13. Show that the constant map $\mathbf{f}_{\mathbf{c}} : \mathbb{R}^n \to \mathbb{R}^m$, given by $\mathbf{f}_{\mathbf{c}}(\mathbf{x}) = \mathbf{c}$ for all $\mathbf{x}$, is everywhere differentiable with derivative the zero linear map.

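Lemma 6.2.14. Let $U$ be a neighbourhood of $\mathbf{x}$ in $\mathbb{R}^n$ and $V$ a neighbourhood of $\mathbf{y}$ in $\mathbb{R}^m$. Suppose that $\mathbf{f} : U \to \mathbb{R}^p$ is differentiable at $\mathbf{x}$ and $\mathbf{g} : V \to \mathbb{R}^q$ is differentiable at $\mathbf{y}$. Then $U \times V$ is a neighbourhood of $(\mathbf{x}, \mathbf{y})$ in $\mathbb{R}^{n+m}$ and the function $(\mathbf{f}, \mathbf{g}) : U \times V \to \mathbb{R}^{p+q}$ given by
\[
(\mathbf{f}, \mathbf{g})(\mathbf{u}, \mathbf{v}) = (\mathbf{f}(\mathbf{u}), \mathbf{g}(\mathbf{v}))
\]
is differentiable at $(\mathbf{x}, \mathbf{y})$ with derivative $(D\mathbf{f}(\mathbf{x}), D\mathbf{g}(\mathbf{y}))$ where we write
\[
(D\mathbf{f}(\mathbf{x}), D\mathbf{g}(\mathbf{y}))(\mathbf{h}, \mathbf{k}) = (D\mathbf{f}(\mathbf{x})\mathbf{h}, D\mathbf{g}(\mathbf{y})\mathbf{k}).
\]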
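
Proof. We leave some details (such as verifying that $U \times V$ is a neighbourhood of $(\mathbf{x}, \mathbf{y})$) to the reader. The key to the proof is the remark that $\|(\mathbf{h}, \mathbf{k})\| \ge \|\mathbf{h}\|, \|\mathbf{k}\|$. Observe that, if we write
\[
\mathbf{f}(\mathbf{x} + \mathbf{h}) = \mathbf{f}(\mathbf{x}) + D\mathbf{f}(\mathbf{x})\mathbf{h} + \boldsymbol{\epsilon}_1(\mathbf{h})\|\mathbf{h}\|
\]
and
\[
\mathbf{g}(\mathbf{y} + \mathbf{k}) = \mathbf{g}(\mathbf{y}) + D\mathbf{g}(\mathbf{y})\mathbf{k} + \boldsymbol{\epsilon}_2(\mathbf{k})\|\mathbf{k}\|,
\]
we have
\[
(\mathbf{f}, \mathbf{g})((\mathbf{x}, \mathbf{y}) + (\mathbf{h}, \mathbf{k})) = (\mathbf{f}, \mathbf{g})(\mathbf{x}, \mathbf{y}) + (D\mathbf{f}(\mathbf{x}), D\mathbf{g}(\mathbf{y}))(\mathbf{h}, \mathbf{k}) + \boldsymbol{\epsilon}(\mathbf{h}, \mathbf{k})\|(\mathbf{h}, \mathbf{k})\|
\]
where
\[
\boldsymbol{\epsilon}(\mathbf{h}, \mathbf{k})\|(\mathbf{h}, \mathbf{k})\| = \boldsymbol{\epsilon}_1(\mathbf{h})\|\mathbf{h}\| + \boldsymbol{\epsilon}_2(\mathbf{k})\|\mathbf{k}\|.
\]
Using the last equation, we obtain
\begin{align*}
\|\boldsymbol{\epsilon}(\mathbf{h}, \mathbf{k})\|\,\|(\mathbf{h}, \mathbf{k})\|
&= \|(\boldsymbol{\epsilon}(\mathbf{h}, \mathbf{k})\|(\mathbf{h}, \mathbf{k})\|)\|
= \|(\boldsymbol{\epsilon}_1(\mathbf{h})\|\mathbf{h}\| + \boldsymbol{\epsilon}_2(\mathbf{k})\|\mathbf{k}\|)\| \\
&\le \|(\boldsymbol{\epsilon}_1(\mathbf{h})\|\mathbf{h}\|)\| + \|(\boldsymbol{\epsilon}_2(\mathbf{k})\|\mathbf{k}\|)\|
\le \|\boldsymbol{\epsilon}_1(\mathbf{h})\|\,\|(\mathbf{h}, \mathbf{k})\| + \|\boldsymbol{\epsilon}_2(\mathbf{k})\|\,\|(\mathbf{h}, \mathbf{k})\|.
\end{align*}
Thus
\[
\|\boldsymbol{\epsilon}(\mathbf{h}, \mathbf{k})\| \le \|\boldsymbol{\epsilon}_1(\mathbf{h})\| + \|\boldsymbol{\epsilon}_2(\mathbf{k})\| \to 0 + 0 = 0
\]
as $\|(\mathbf{h}, \mathbf{k})\| \to 0$.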

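Exercise 6.2.15. If $\mathbf{h} \in \mathbb{R}^n$ and $\mathbf{k} \in \mathbb{R}^m$, show that
\[
\|(\mathbf{h}, \mathbf{k})\|^2 = \|\mathbf{h}\|^2 + \|\mathbf{k}\|^2
\]
and
\[
\|\mathbf{h}\| + \|\mathbf{k}\| \ge \|(\mathbf{h}, \mathbf{k})\| \ge \|\mathbf{h}\|, \|\mathbf{k}\|.
\]

Exercise 6.2.16. Consider the situation described in Lemma 6.2.14. Write down the Jacobian matrix of partial derivatives for $(\mathbf{f}, \mathbf{g})$ in terms of the Jacobian matrices for $\mathbf{f}$ and $\mathbf{g}$.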

We can now give a second proof of Lemma 6.2.11 using the chain rule.

Second proof of Lemma 6.2.11. Let $\alpha : \mathbb{R}^n \to \mathbb{R}^{2n}$ be the map given by
\[
\alpha(\mathbf{x}) = (\mathbf{x}, \mathbf{x})
\]
and $\beta : \mathbb{R}^{2m} \to \mathbb{R}^m$ be the map given by
\[
\beta(\mathbf{x}, \mathbf{y}) = \mathbf{x} + \mathbf{y}.
\]
Then, using the notation of Lemma 6.2.14,
\[
\mathbf{f} + \mathbf{g} = \beta \circ (\mathbf{f}, \mathbf{g}) \circ \alpha.
\]
But $\alpha$ and $\beta$ are linear, so using the chain rule (Lemma 6.2.10), we see that $\mathbf{f} + \mathbf{g}$ is differentiable at $\mathbf{x}$ and
\[
D(\mathbf{f} + \mathbf{g})(\mathbf{x}) = \beta \circ D(\mathbf{f}, \mathbf{g})(\mathbf{x}, \mathbf{x}) \circ \alpha = D\mathbf{f}(\mathbf{x}) + D\mathbf{g}(\mathbf{x}).
\]

If we only used this idea to prove Lemma 6.2.11 it would hardly be worth it but it is frequently easiest to show that a complicated function is differentiable by expressing it as the composition of simpler differentiable functions. (How else would one prove that $x \mapsto \sin(\exp(1 + x^2))$ is differentiable?)
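
As a small illustration of the last remark, the derivative of $x \mapsto \sin(\exp(1 + x^2))$ can be assembled from the derivatives of its three simple pieces and compared with a difference quotient. This is only a sketch: the function names and the sample point are assumptions made for the example, and the finite difference merely approximates the derivative.

```python
import math

def composite(x):
    """x -> sin(exp(1 + x^2)), a composition of three simple maps."""
    return math.sin(math.exp(1.0 + x * x))

def chain_rule_derivative(x):
    """Multiply the derivatives of the three pieces, as the chain rule says."""
    inner = 1.0 + x * x          # derivative 2x
    middle = math.exp(inner)     # derivative exp(inner)
    return math.cos(middle) * middle * 2.0 * x

def central_difference(F, x, step=1e-6):
    return (F(x + step) - F(x - step)) / (2.0 * step)

point = 0.5
exact = chain_rule_derivative(point)
approx = central_difference(composite, point)
print(exact, approx)
```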

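Exercise 6.2.17. (i) Show that the function $J : \mathbb{R}^n \times \mathbb{R}^n \to \mathbb{R}$ given by the scalar product
\[
J(\mathbf{u}, \mathbf{v}) = \mathbf{u} \cdot \mathbf{v}
\]
is everywhere differentiable with
\[
DJ(\mathbf{x}, \mathbf{y})(\mathbf{h}, \mathbf{k}) = \mathbf{x} \cdot \mathbf{k} + \mathbf{y} \cdot \mathbf{h}.
\]
(ii) Let $U$ be a neighbourhood of $\mathbf{x}$ in $\mathbb{R}^n$. Suppose that $\mathbf{f}, \mathbf{g} : U \to \mathbb{R}^m$ are differentiable at $\mathbf{x}$. Show, using the chain rule, that $\mathbf{f} \cdot \mathbf{g}$ is differentiable at $\mathbf{x}$ with
\[
D(\mathbf{f} \cdot \mathbf{g})(\mathbf{x})\mathbf{h} = \mathbf{f}(\mathbf{x}) \cdot (D(\mathbf{g})(\mathbf{x})\mathbf{h}) + (D(\mathbf{f})(\mathbf{x})\mathbf{h}) \cdot \mathbf{g}(\mathbf{x}).
\]
(iii) Let $U$ be a neighbourhood of $\mathbf{x}$ in $\mathbb{R}^n$. Suppose that $\mathbf{f} : U \to \mathbb{R}^m$ and $\lambda : U \to \mathbb{R}$ are differentiable at $\mathbf{x}$. State and prove an appropriate result about the function $\lambda\mathbf{f}$ given by
\[
(\lambda\mathbf{f})(\mathbf{u}) = \lambda(\mathbf{u})\mathbf{f}(\mathbf{u}).
\]
(iv) If you have met the vector product¹ $\mathbf{u} \wedge \mathbf{v}$ of two vectors $\mathbf{u}, \mathbf{v} \in \mathbb{R}^3$, state and prove an appropriate theorem about the vector product of differentiable functions.
(v) Let $U$ be a neighbourhood of $\mathbf{x}$ in $\mathbb{R}^n$. Suppose that $f : U \to \mathbb{R}$ is non-zero on $U$ and differentiable at $\mathbf{x}$. Show that $1/f$ is differentiable at $\mathbf{x}$ and find $D(1/f)(\mathbf{x})$.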


6.3 The mean value inequality in higher dimensions
So far our study of differentiation in higher dimensions has remained on the level of mere algebra. (The definition of the operator norm used the supremum and so lay deeper but we could have avoided this at the cost of using a less natural norm.) The next result is a true theorem of analysis.

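Theorem 6.3.1. (The mean value inequality.) Suppose that $U$ is an open set in $\mathbb{R}^m$ and that $\mathbf{f} : U \to \mathbb{R}^p$ is differentiable. Consider the straight line segment
\[
L = \{(1 - t)\mathbf{a} + t\mathbf{b} : 0 \le t \le 1\}
\]
joining $\mathbf{a}$ and $\mathbf{b}$. If $L \subseteq U$ (i.e. $L$ lies entirely within $U$) and $\|D\mathbf{f}(\mathbf{x})\| \le K$ for all $\mathbf{x} \in L$, then
\[
\|\mathbf{f}(\mathbf{a}) - \mathbf{f}(\mathbf{b})\| \le K\|\mathbf{a} - \mathbf{b}\|.
\]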

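Proof. Before starting the proof, it is helpful to note that, since $U$ is open, we can find an $\eta > 0$ such that the extended straight line segment
\[
\{(1 - t)\mathbf{a} + t\mathbf{b} : -\eta \le t \le 1 + \eta\} \subseteq U.
\]
We shall prove our many dimensional mean value inequality from the one dimensional version (Theorem 1.7.1, or if the reader prefers, the slightly sharper Theorem 4.4.1). To this end, observe that, if $\mathbf{f}(\mathbf{b}) - \mathbf{f}(\mathbf{a}) = \mathbf{0}$, there is nothing to prove. We may thus assume that $\mathbf{f}(\mathbf{b}) - \mathbf{f}(\mathbf{a}) \ne \mathbf{0}$ and consider
\[
\mathbf{u} = \frac{\mathbf{f}(\mathbf{b}) - \mathbf{f}(\mathbf{a})}{\|\mathbf{f}(\mathbf{b}) - \mathbf{f}(\mathbf{a})\|},
\]
the unit vector in the direction $\mathbf{f}(\mathbf{b}) - \mathbf{f}(\mathbf{a})$. If we now define $g : (-\eta, 1 + \eta) \to \mathbb{R}$ by
\[
g(t) = \mathbf{u} \cdot \bigl(\mathbf{f}((1 - t)\mathbf{a} + t\mathbf{b}) - \mathbf{f}(\mathbf{a})\bigr),
\]
we see, by using the chain rule or direct calculation, that $g$ is continuous and differentiable on $(-\eta, 1 + \eta)$ with
\[
g'(t) = \mathbf{u} \cdot (D\mathbf{f}((1 - t)\mathbf{a} + t\mathbf{b})(\mathbf{b} - \mathbf{a})).
\]
Using the Cauchy-Schwarz inequality (Lemma 4.1.2) and the definition of the operator norm (Definition 6.2.4), we have
\begin{align*}
|g'(t)| &\le \|\mathbf{u}\|\,\|D\mathbf{f}((1 - t)\mathbf{a} + t\mathbf{b})(\mathbf{b} - \mathbf{a})\| \\
&= \|D\mathbf{f}((1 - t)\mathbf{a} + t\mathbf{b})(\mathbf{b} - \mathbf{a})\| \\
&\le \|D\mathbf{f}((1 - t)\mathbf{a} + t\mathbf{b})\|\,\|\mathbf{b} - \mathbf{a}\| \\
&\le K\|\mathbf{a} - \mathbf{b}\|
\end{align*}
for all $t \in (0, 1)$. Thus, by the one dimensional mean value inequality,
\[
\|\mathbf{f}(\mathbf{a}) - \mathbf{f}(\mathbf{b})\| = |g(1) - g(0)| \le K\|\mathbf{a} - \mathbf{b}\|
\]
as required.

¹ Question: What do you get if you cross a mountaineer with a mosquito? Answer: You can't. One is a scaler and the other is a vector.
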
Exercise 6.3.2. (i) Prove the statement of the first sentence in the proof just given.
(ii) If $g$ is the function defined in the proof just given, show, giving all the details, that $g$ is continuous and differentiable on $(-\eta, 1 + \eta)$ with
\[
g'(t) = \mathbf{u} \cdot \bigl(D\mathbf{f}((1 - t)\mathbf{a} + t\mathbf{b})(\mathbf{b} - \mathbf{a})\bigr).
\]
You should give two versions of the proof, the first using the chain rule (Lemma 6.2.10) and the second using direct calculation.

If we have already gone to the trouble of proving the one-dimensional mean value inequality it seems sensible to make use of it in proving the multidimensional version. However, we could have proved the multidimensional theorem directly without making a one-dimensional detour.

Exercise 6.3.3. (i) Reread the proof of Theorem 1.7.1.
(ii) We now start the direct proof of Theorem 6.3.1. As before observe that we can find an $\eta > 0$ such that
\[
\{(1 - t)\mathbf{a} + t\mathbf{b} : -\eta \le t \le 1 + \eta\} \subseteq U,
\]
but now consider $\mathbf{F} : (-\eta, 1 + \eta) \to \mathbb{R}^p$ given by
\[
\mathbf{F}(t) = \mathbf{f}((1 - t)\mathbf{a} + t\mathbf{b}) - \mathbf{f}(\mathbf{a}).
\]
Explain why the theorem will follow if we can show that, given any $\epsilon > 0$, we have
\[
\|\mathbf{F}(1) - \mathbf{F}(0)\| \le K\|\mathbf{a} - \mathbf{b}\| + \epsilon.
\]
(iii) Suppose, if possible, that there exists an $\epsilon > 0$ such that
\[
\|\mathbf{F}(1) - \mathbf{F}(0)\| \ge K\|\mathbf{a} - \mathbf{b}\| + \epsilon.
\]
Show by a lion hunting argument that there exist a $c \in [0, 1]$ and $u_n, v_n \in [0, 1]$ with $u_n < v_n$ such that $u_n, v_n \to c$ and
\[
\|\mathbf{F}(v_n) - \mathbf{F}(u_n)\| \ge (K\|\mathbf{a} - \mathbf{b}\| + \epsilon)(v_n - u_n).
\]
(iv) Show from the definition of differentiability that there exists a $\delta > 0$ such that
\[
\|\mathbf{F}(t) - \mathbf{F}(c)\| < (K\|\mathbf{a} - \mathbf{b}\| + \epsilon/2)|t - c|
\]
whenever $|t - c| < \delta$ and $t \in [0, 1]$.
(v) Prove Theorem 6.3.1 by reductio ad absurdum.

One of the principal uses we made of the one dimensional mean value theorem was to show that a function on an open interval with zero derivative was necessarily constant. The reader should do both parts of the following easy exercise and reflect on them.

Exercise 6.3.4. (i) Let $U$ be an open set in $\mathbb{R}^m$ such that given any $\mathbf{a}, \mathbf{b} \in U$ we can find a finite sequence of points $\mathbf{a} = \mathbf{a}_0, \mathbf{a}_1, \dots, \mathbf{a}_{k-1}, \mathbf{a}_k = \mathbf{b}$ such that each line segment
\[
\{(1 - t)\mathbf{a}_{j-1} + t\mathbf{a}_j : 0 \le t \le 1\} \subseteq U
\]
[$1 \le j \le k$]. Show that, if $\mathbf{f} : U \to \mathbb{R}^p$ is everywhere differentiable on $U$ with $D\mathbf{f}(\mathbf{x}) = 0$, it follows that $\mathbf{f}$ is constant.
(ii) We work in $\mathbb{R}^2$. Let $U_1$ be the open disc of radius 1 centre $(-2, 0)$ and $U_2$ be the open disc of radius 1 centre $(2, 0)$. Set $U = U_1 \cup U_2$. Define $f : U \to \mathbb{R}$ by $f(\mathbf{x}) = -1$ for $\mathbf{x} \in U_1$, $f(\mathbf{x}) = 1$ for $\mathbf{x} \in U_2$. Show that $f$ is everywhere differentiable on $U$ with $D(f)(\mathbf{x}) = 0$ but $f$ is not constant.

The reader may ask if we can obtain an improvement to our mean value inequality by some sort of equality along the lines of Theorem 4.4.1. The answer is a clear no.

Exercise 6.3.5. Let $\mathbf{f} : \mathbb{R} \to \mathbb{R}^2$ be given by $\mathbf{f}(t) = (\cos t, \sin t)^T$. Compute the Jacobian matrix of partial derivatives for $\mathbf{f}$ and show that $\mathbf{f}(0) = \mathbf{f}(2\pi)$ but $D\mathbf{f}(t) \ne 0$ for all $t$.
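
A quick numerical look at this example may help before doing the exercise. The sketch below is an illustration only (the helper names and the sampling grid are assumptions): it evaluates the entries $(-\sin t, \cos t)$ of the Jacobian matrix on a grid and compares the end points of the curve.

```python
import math

def f(t):
    return (math.cos(t), math.sin(t))

def derivative(t):
    # Entries of the 2 x 1 Jacobian matrix of f at t.
    return (-math.sin(t), math.cos(t))

start, end = f(0.0), f(2.0 * math.pi)
gap = math.hypot(end[0] - start[0], end[1] - start[1])

# Euclidean length of the derivative at 1001 sample points in [0, 2*pi].
speeds = [math.hypot(*derivative(2.0 * math.pi * k / 1000)) for k in range(1001)]
print(gap, min(speeds), max(speeds))
```

Up to rounding, the end points coincide while the derivative has length 1 at every sample point, so no equality of the form $\mathbf{f}(2\pi) - \mathbf{f}(0) = 2\pi D\mathbf{f}(c)$ can hold.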

(Although Exercise K.102 is not a counter example it points out another problem which occurs when we work in many dimensions.)
It is fairly obvious that we cannot replace the line segment $L$ in Theorem 6.3.1 by other curves without changing the conclusion.

Exercise 6.3.6. Let
\[
U = \{\mathbf{x} \in \mathbb{R}^2 : \|\mathbf{x}\| > 1\} \setminus \{(x, 0)^T : x \le 0\}.
\]
If we take $\theta(\mathbf{x})$ to be the unique solution of
\[
\cos(\theta(\mathbf{x})) = \frac{x}{(x^2 + y^2)^{1/2}}, \qquad
\sin(\theta(\mathbf{x})) = \frac{y}{(x^2 + y^2)^{1/2}}, \qquad
-\pi < \theta(\mathbf{x}) < \pi
\]
for $\mathbf{x} = (x, y)^T \in U$, show that $\theta : U \to \mathbb{R}$ is everywhere differentiable with $\|D\theta(\mathbf{x})\| < 1$. (The amount of work involved in proving this depends quite strongly on how clever you are in exploiting radial symmetry.) Show, however, that if $\mathbf{a} = (-1, 10^{-1})^T$, $\mathbf{b} = (-1, -10^{-1})^T$, then
\[
|\theta(\mathbf{a}) - \theta(\mathbf{b})| > \|\mathbf{a} - \mathbf{b}\|.
\]
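
The numbers in this exercise can be inspected directly. The sketch below is an illustration, not a solution: it assumes that the standard library function `math.atan2` computes the angle $\theta$ defined above on $U$, and it uses the formula $\|D\theta(\mathbf{x})\| = 1/\|\mathbf{x}\|$, which the reader is asked to justify as part of the exercise.

```python
import math

def theta(x, y):
    # atan2(y, x) is the angle in (-pi, pi] with cos = x/r and sin = y/r.
    return math.atan2(y, x)

def gradient_norm(x, y):
    # grad theta = (-y, x) / (x^2 + y^2), so its Euclidean norm is 1/r.
    return 1.0 / math.hypot(x, y)

a = (-1.0, 0.1)
b = (-1.0, -0.1)

angle_gap = abs(theta(*a) - theta(*b))
distance = math.hypot(a[0] - b[0], a[1] - b[1])
print(gradient_norm(*a), angle_gap, distance)
```

The angle changes by almost $2\pi$ although the two points are only $0.2$ apart: the segment joining them crosses the removed half-line, so it does not lie in $U$ and Theorem 6.3.1 does not apply.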

It is clear (though we shall not prove it, and, indeed, cannot yet state it without using concepts which we have not formally defined) that the correct generalisation when $L$ is not a straight line will run as follows. ‘If $L$ is a well behaved path lying entirely within $U$ and $\|D\mathbf{f}(\mathbf{x})\| \le K$ for all $\mathbf{x} \in L$ then $\|\mathbf{f}(\mathbf{a}) - \mathbf{f}(\mathbf{b})\| \le K \times \text{length } L$’.

Chapter 7

Local Taylor theorems

7.1 Some one dimensional Taylor theorems
By definition, a function $f : \mathbb{R} \to \mathbb{R}$ which is continuous at 0 looks like a constant function near 0, in the sense that
\[
f(t) = f(0) + \epsilon(t)
\]
where $\epsilon(t) \to 0$ as $t \to 0$. By definition, again, a function $f : \mathbb{R} \to \mathbb{R}$ which is differentiable at 0 looks like a linear function near 0, in the sense that
\[
f(t) = f(0) + f'(0)t + \epsilon(t)|t|
\]
where $\epsilon(t) \to 0$ as $t \to 0$. The next exercise establishes the non-trivial theorem that a function $f : \mathbb{R} \to \mathbb{R}$, which is $n$ times differentiable in a neighbourhood of 0 and has $f^{(n)}$ continuous at 0, looks like a polynomial of degree $n$ near 0, in the sense that
\[
f(t) = f(0) + f'(0)t + \frac{f''(0)}{2!}t^2 + \dots + \frac{f^{(n)}(0)}{n!}t^n + \epsilon(t)|t|^n
\]
where $\epsilon(t) \to 0$ as $t \to 0$.
This exercise introduces several ideas which we use repeatedly in this chapter so the reader should do it carefully.

Exercise 7.1.1. In this exercise we consider functions $f, g : (-a, a) \to \mathbb{R}$ where $a > 0$.
(i) If $f$ and $g$ are differentiable with $f'(t) \le g'(t)$ for all $0 \le t < a$ and $f(0) = g(0)$, explain why $f(t) \le g(t)$ for all $0 \le t < a$.
(ii) If $|f'(t)| \le |t|^r$ for all $t \in (-a, a)$ and $f(0) = 0$, show that $|f(t)| \le |t|^{r+1}/(r + 1)$ for all $|t| < a$.
(iii) If $g$ is $n$ times differentiable with $|g^{(n)}(t)| \le M$ for all $t \in (-a, a)$ and $g(0) = g'(0) = \dots = g^{(n-1)}(0) = 0$, show that
\[
|g(t)| \le \frac{M|t|^n}{n!}
\]
for all $|t| < a$.
(iv) If $g$ is $n$ times differentiable in $(-a, a)$ and $g(0) = g'(0) = \dots = g^{(n)}(0) = 0$, show, using (iii), that, if $g^{(n)}$ is continuous at 0, then
\[
|g(t)| \le \frac{\eta(t)|t|^n}{n!}
\]
where $\eta(t) \to 0$ as $t \to 0$.
(v) If $f$ is $n$ times differentiable with $|f^{(n)}(t)| \le M$ for all $t \in (-a, a)$, show that
\[
\Bigl| f(t) - \sum_{j=0}^{n-1} \frac{f^{(j)}(0)}{j!} t^j \Bigr| \le \frac{M|t|^n}{n!}
\]
for all $|t| < a$.
(vi) If $f$ is $n$ times differentiable in $(-a, a)$, show that, if $f^{(n)}$ is continuous at 0, then
\[
\Bigl| f(t) - \sum_{j=0}^{n} \frac{f^{(j)}(0)}{j!} t^j \Bigr| \le \frac{\eta(t)|t|^n}{n!}
\]
where $\eta(t) \to 0$ as $t \to 0$.

Restating parts (v) and (vi) of Exercise 7.1.1 we get two similar looking but distinct theorems.

Theorem 7.1.2. (A global Taylor's theorem.) If $f : (-a, a) \to \mathbb{R}$ is $n$ times differentiable with $|f^{(n)}(t)| \le M$ for all $t \in (-a, a)$, then
\[
\Bigl| f(t) - \sum_{j=0}^{n-1} \frac{f^{(j)}(0)}{j!} t^j \Bigr| \le \frac{M|t|^n}{n!}.
\]

Theorem 7.1.3. (The local Taylor's theorem.) If $f : (-a, a) \to \mathbb{R}$ is $n$ times differentiable and $f^{(n)}$ is continuous at 0, then
\[
f(t) = \sum_{j=0}^{n} \frac{f^{(j)}(0)}{j!} t^j + \epsilon(t)|t|^n
\]
where $\epsilon(t) \to 0$ as $t \to 0$.
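
The global bound of Theorem 7.1.2 can be tried out on a function whose derivatives are all bounded by $M = 1$. The following sketch uses $f = \cos$ as an assumed example; the helper names are hypothetical and the small tolerance only allows for floating point rounding.

```python
import math

def cos_derivative_at_zero(j):
    """The j-th derivative of cos at 0 cycles through 1, 0, -1, 0."""
    return (1.0, 0.0, -1.0, 0.0)[j % 4]

def taylor_partial_sum(t, n):
    """Sum of f^(j)(0) t^j / j! for j = 0, ..., n - 1 with f = cos."""
    return sum(cos_derivative_at_zero(j) * t ** j / math.factorial(j)
               for j in range(n))

def remainder_bound(t, n, M=1.0):
    """The bound M |t|^n / n! of the global Taylor theorem."""
    return M * abs(t) ** n / math.factorial(n)

checks = []
for t in (-2.0, -0.5, 0.3, 1.5):
    for n in range(1, 11):
        error = abs(math.cos(t) - taylor_partial_sum(t, n))
        checks.append(error <= remainder_bound(t, n) + 1e-12)

print(all(checks))
```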

We shall obtain other and more precise global Taylor theorems in the course of the book (see Exercise K.49 and Theorem 8.3.20) but Theorem 7.1.2 is strong enough for the following typical applications.

Exercise 7.1.4. (i) Consider a differentiable function $e : \mathbb{R} \to \mathbb{R}$ which obeys the differential equation $e'(t) = e(t)$ with the initial condition $e(0) = 1$. Quote a general theorem which tells you that, if $a > 0$, there exists an $M$ with $|e(t)| \le M$ for $|t| \le a$. Show that
\[
\Bigl| e(t) - \sum_{j=0}^{n-1} \frac{t^j}{j!} \Bigr| \le \frac{M|t|^n}{n!}
\]
for all $|t| < a$. Deduce that
\[
\sum_{j=0}^{n-1} \frac{t^j}{j!} \to e(t)
\]
as $n \to \infty$, and so
\[
e(t) = \sum_{j=0}^{\infty} \frac{t^j}{j!}
\]
for all $t$.
(ii) Consider differentiable functions $s, c : \mathbb{R} \to \mathbb{R}$ which obey the differential equations $s'(t) = c(t)$, $c'(t) = -s(t)$ with the initial conditions $s(0) = 0$, $c(0) = 1$. Show that
\[
s(t) = \sum_{j=0}^{\infty} \frac{(-1)^j t^{2j+1}}{(2j + 1)!}
\]
for all $t$ and obtain a similar result for $c$.

However, in this chapter we are interested in the local behaviour of functions and therefore in the local Taylor theorem. The distinction between local and global Taylor expansion is made in the following very important example of Cauchy.

Example 7.1.5. Consider the function $F : \mathbb{R} \to \mathbb{R}$ defined by
\[
F(x) = \begin{cases} 0 & \text{if } x = 0, \\ \exp(-1/x^2) & \text{otherwise.} \end{cases}
\]
(i) Prove by induction, using the standard rules of differentiation, that $F$ is infinitely differentiable at all points $x \ne 0$ and that, at these points,
\[
F^{(n)}(x) = P_n(1/x)\exp(-1/x^2)
\]
where $P_n$ is a polynomial which need not be found explicitly.
(ii) Explain why $x^{-1}P_n(1/x)\exp(-1/x^2) \to 0$ as $x \to 0$.
(iii) Show by induction, using the definition of differentiation, that $F$ is infinitely differentiable at 0 with $F^{(n)}(0) = 0$ for all $n$. [Be careful to get this part of the argument right.]
(iv) Show that
\[
F(x) = \sum_{j=0}^{\infty} \frac{F^{(j)}(0)}{j!} x^j
\]
if and only if $x = 0$. (The reader may prefer to say that ‘The Taylor expansion of $F$ is only valid at 0’.)
(v) Why does part (iv) not contradict the local Taylor theorem (Theorem 7.1.3)?
[We give a different counterexample making use of uniform convergence in Exercise K.226.]
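
The flatness of Cauchy's function at 0 is striking when tabulated. The sketch below is an illustration only; the exponent 10 and the sample points are arbitrary assumptions. It computes $F(x)/x^{10}$ for a few small $x$: the quotients collapse towards 0, as the local Taylor theorem with all coefficients zero requires, although $F(x) > 0$ for every $x \ne 0$.

```python
import math

def F(x):
    """Cauchy's example: 0 at the origin and exp(-1/x^2) elsewhere."""
    return 0.0 if x == 0 else math.exp(-1.0 / (x * x))

sample_points = (0.5, 0.2, 0.1, 0.05)
ratios = [F(x) / x ** 10 for x in sample_points]

for x, ratio in zip(sample_points, ratios):
    print(x, F(x), ratio)
```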

Example 7.1.6. Show that, if we define $E : \mathbb{R} \to \mathbb{R}$ by
\[
E(x) = \begin{cases} 0 & \text{if } x \le 0, \\ \exp(-1/x^2) & \text{otherwise,} \end{cases}
\]
then $E$ is an infinitely differentiable function with $E(x) = 0$ for $x \le 0$ and $E(x) > 0$ for $x > 0$.

Cauchy gave his example to show that we cannot develop the calculus algebraically but must use $\epsilon$, $\delta$ techniques. In later courses the reader will see that his example encapsulates a key difference between real and complex analysis. If the reader perseveres further with mathematics she will also find the function $E$ playing a useful rôle in distribution theory and differential geometry.
A simple example of the use of the local Taylor theorem is given by the proof of (a version of) L'Hôpital's rule in the next exercise.

Exercise 7.1.7. If $f, g : (-a, a) \to \mathbb{R}$ are $n$ times differentiable and
\[
f(0) = f'(0) = \dots = f^{(n-1)}(0) = g(0) = g'(0) = \dots = g^{(n-1)}(0) = 0
\]
but $g^{(n)}(0) \ne 0$ then, if $f^{(n)}$ and $g^{(n)}$ are continuous at 0, it follows that
\[
\frac{f(t)}{g(t)} \to \frac{f^{(n)}(0)}{g^{(n)}(0)}
\]
as $t \to 0$.

It should be pointed out that the local Taylor theorems of this chapter (and the global ones proved elsewhere) are deep results which depend on the fundamental axiom. The fact that we use mean value theorems to prove them is thus not surprising; we must use the fundamental axiom or results derived from it in the proof.
(Most of my readers will be prepared to accept my word for the statements made in the previous paragraph. Those who are not will need to work through the next exercise. The others may skip it.)

Exercise 7.1.8. Explain why we can find a sequence of irrational numbers $a_n$ such that $4^{-n-1} < a_n < 4^{-n}$. We write $I_0 = \{x \in \mathbb{Q} : x > a_0\}$ and
\[
I_n = \{x \in \mathbb{Q} : a_n < x < a_{n-1}\}
\]
[$n = 1, 2, 3, \dots$]. Check that, if $x \in I_n$, then $4^{-n-1} < x < 4^{-n+1}$ [$n \ge 1$].
We define $f : \mathbb{Q} \to \mathbb{Q}$ by $f(0) = 0$ and $f(x) = 8^{-n}$ if $|x| \in I_n$ [$n \ge 0$]. In what follows we work in $\mathbb{Q}$.
(i) Show that
\[
\frac{f(h) - f(0)}{h} \to 0
\]
as $h \to 0$. Conclude that $f$ is differentiable at 0 with $f'(0) = 0$.
(ii) Explain why $f$ is everywhere differentiable with $f'(x) = 0$ for all $x$. Conclude that $f$ is infinitely differentiable with $f^{(r)} = 0$ for all $r \ge 1$.
(iii) Show that
\[
\frac{f(h) - f(0)}{h^2} \to \infty
\]
as $h \to 0$. Conclude that, if we write
\[
f(h) = f(0) + f'(0)h + \frac{f''(0)}{2!}h^2 + \epsilon(h)h^2,
\]
then $\epsilon(h) \nrightarrow 0$ as $h \to 0$. Thus the local Taylor theorem (Theorem 7.1.3) is false for $\mathbb{Q}$.

7.2 Some many dimensional local Taylor theorems
In the previous section we used mean value inequalities to investigate the local behaviour of well behaved functions $f : \mathbb{R} \to \mathbb{R}$. We now use the same ideas to investigate the local behaviour of well behaved functions $f : \mathbb{R}^n \to \mathbb{R}$. It turns out that, once we understand what happens when $n = 2$, it is easy to extend the results to general $n$ and this will be left to the reader.
Here is our first example.

Lemma 7.2.1. We work in $\mathbb{R}^2$ and write $\mathbf{0} = (0, 0)$.
(i) Suppose $\delta > 0$, and that $f : B(\mathbf{0}, \delta) \to \mathbb{R}$ has partial derivatives $f_{,1}$ and $f_{,2}$ with $|f_{,1}(x, y)|, |f_{,2}(x, y)| \le M$ for all $(x, y) \in B(\mathbf{0}, \delta)$. If $f(0, 0) = 0$, then
\[
|f(x, y)| \le 2M(x^2 + y^2)^{1/2}
\]
for all $(x, y) \in B(\mathbf{0}, \delta)$.
(ii) Suppose $\delta > 0$, and that $g : B(\mathbf{0}, \delta) \to \mathbb{R}$ has partial derivatives $g_{,1}$ and $g_{,2}$ in $B(\mathbf{0}, \delta)$. Suppose that $g_{,1}$ and $g_{,2}$ are continuous at $(0, 0)$ and $g(0, 0) = g_{,1}(0, 0) = g_{,2}(0, 0) = 0$. Then writing
\[
g((h, k)) = \epsilon(h, k)(h^2 + k^2)^{1/2}
\]
we have $\epsilon(h, k) \to 0$ as $(h^2 + k^2)^{1/2} \to 0$.

Proof. (i) Observe that the one dimensional mean value inequality applied to the function $t \mapsto f(x, t)$ gives
\[
|f(x, y) - f(x, 0)| \le M|y|
\]
whenever $(x, y) \in B(\mathbf{0}, \delta)$ and the same inequality applied to the function $s \mapsto f(s, 0)$ gives
\[
|f(x, 0) - f(0, 0)| \le M|x|
\]
whenever $(x, 0) \in B(\mathbf{0}, \delta)$. We now apply a taxicab argument (the idea behind the name is that a New York taxicab which wishes to get from $(0, 0)$ to $(x, y)$ will be forced by the grid pattern of streets to go from $(0, 0)$ to $(x, 0)$ and thence to $(x, y)$) to obtain
\begin{align*}
|f(x, y)| &= |f(x, y) - f(0, 0)| = |(f(x, y) - f(x, 0)) + (f(x, 0) - f(0, 0))| \\
&\le |f(x, y) - f(x, 0)| + |f(x, 0) - f(0, 0)| \le M|y| + M|x| \\
&\le 2M(x^2 + y^2)^{1/2}
\end{align*}
for all $(x, y) \in B(\mathbf{0}, \delta)$.
(ii) Let $\epsilon > 0$ be given. By the definition of continuity, we can find a $\delta_1(\epsilon)$ such that $\delta > \delta_1(\epsilon) > 0$ and
\[
|g_{,1}(x, y)|, |g_{,2}(x, y)| \le \epsilon/2
\]
for all $(x, y) \in B(\mathbf{0}, \delta_1(\epsilon))$. By part (i), this means that
\[
|g(x, y)| \le \epsilon(x^2 + y^2)^{1/2}
\]
for all $(x, y) \in B(\mathbf{0}, \delta_1(\epsilon))$ and this gives the desired result.

Theorem 7.2.2. (Continuity of partial derivatives implies differentiability.) Suppose $\delta > 0$, $\mathbf{x} = (x, y) \in \mathbb{R}^2$, $B(\mathbf{x}, \delta) \subseteq E \subseteq \mathbb{R}^2$ and that $f : E \to \mathbb{R}$. If the partial derivatives $f_{,1}$ and $f_{,2}$ exist in $B(\mathbf{x}, \delta)$ and are continuous at $\mathbf{x}$, then, writing
\[
f(x + h, y + k) = f(x, y) + f_{,1}(x, y)h + f_{,2}(x, y)k + \epsilon(h, k)(h^2 + k^2)^{1/2},
\]
we have $\epsilon(h, k) \to 0$ as $(h^2 + k^2)^{1/2} \to 0$. (In other words, $f$ is differentiable at $\mathbf{x}$.)

Proof. By translation, we may suppose that $\mathbf{x} = \mathbf{0}$. Now set
\[
g(x, y) = f(x, y) - f(0, 0) - f_{,1}(0, 0)x - f_{,2}(0, 0)y.
\]
We see that $g$ satisfies the hypotheses of part (ii) of Lemma 7.2.1. Thus $g$ satisfies the conclusions of part (ii) of Lemma 7.2.1 and our theorem follows.


Although this is not one of the great theorems of all time, it occasionally provides a useful short cut for proving functions differentiable¹. The following easy extensions are left to the reader.

Theorem 7.2.3. (i) Suppose $\delta > 0$, $\mathbf{x} \in \mathbb{R}^m$, $B(\mathbf{x}, \delta) \subseteq E \subseteq \mathbb{R}^m$ and that $f : E \to \mathbb{R}$. If the partial derivatives $f_{,1}, f_{,2}, \dots, f_{,m}$ exist in $B(\mathbf{x}, \delta)$ and are continuous at $\mathbf{x}$, then $f$ is differentiable at $\mathbf{x}$.
(ii) Suppose $\delta > 0$, $\mathbf{x} \in \mathbb{R}^m$, $B(\mathbf{x}, \delta) \subseteq E \subseteq \mathbb{R}^m$ and that $\mathbf{f} : E \to \mathbb{R}^p$. If the partial derivatives $f_{i,j}$ exist in $B(\mathbf{x}, \delta)$ and are continuous at $\mathbf{x}$ [$1 \le i \le p$, $1 \le j \le m$], then $\mathbf{f}$ is differentiable at $\mathbf{x}$.

¹ I emphasise the word occasionally. Usually, results like the fact that the differentiable function of a differentiable function is differentiable give a faster and more satisfactory proof.

Similar ideas to those used in the proof of Theorem 7.2.2 give our next result which we shall therefore prove more expeditiously. We write
\[
f_{,ij}(\mathbf{x}) = (f_{,j})_{,i}(\mathbf{x}),
\]
or, in more familiar notation,
\[
f_{,ij} = \frac{\partial^2 f}{\partial x_i\,\partial x_j}.
\]

Theorem 7.2.4. (Second order Taylor series.) Suppose $\delta > 0$, $\mathbf{x} = (x, y) \in \mathbb{R}^2$, $B(\mathbf{x}, \delta) \subseteq E \subseteq \mathbb{R}^2$ and that $f : E \to \mathbb{R}$. If the partial derivatives $f_{,1}$, $f_{,2}$, $f_{,11}$, $f_{,12}$, $f_{,22}$ exist in $B(\mathbf{x}, \delta)$ and $f_{,11}$, $f_{,12}$, $f_{,22}$ are continuous at $\mathbf{x}$, then writing
\begin{align*}
f((x + h, y + k)) = {}& f(x, y) + f_{,1}(x, y)h + f_{,2}(x, y)k \\
& + (f_{,11}(x, y)h^2 + 2f_{,12}(x, y)hk + f_{,22}(x, y)k^2)/2 + \epsilon(h, k)(h^2 + k^2),
\end{align*}
we have $\epsilon(h, k) \to 0$ as $(h^2 + k^2)^{1/2} \to 0$.

Proof. By translation, we may suppose that $\mathbf{x} = \mathbf{0}$. By considering
\[
f(h, k) - f(0, 0) - f_{,1}(0, 0)h - f_{,2}(0, 0)k - (f_{,11}(0, 0)h^2 + 2f_{,12}(0, 0)hk + f_{,22}(0, 0)k^2)/2,
\]
we may suppose that
\[
f(0, 0) = f_{,1}(0, 0) = f_{,2}(0, 0) = f_{,11}(0, 0) = f_{,12}(0, 0) = f_{,22}(0, 0) = 0.
\]
If we do this, our task reduces to showing that
\[
\frac{f(h, k)}{h^2 + k^2} \to 0
\]
as $(h^2 + k^2)^{1/2} \to 0$.
To this end, observe that, if $\epsilon > 0$, the continuity of the given partial derivatives at $(0, 0)$ tells us that we can find a $\delta_1(\epsilon)$ such that $\delta > \delta_1(\epsilon) > 0$ and
\[
|f_{,11}(h, k)|, |f_{,12}(h, k)|, |f_{,22}(h, k)| \le \epsilon
\]
for all $(h, k) \in B(\mathbf{0}, \delta_1(\epsilon))$. Using the mean value inequality in the manner of Lemma 7.2.1, we have
\[
|f_{,1}(h, k) - f_{,1}(h, 0)| \le \epsilon|k|
\]
and
\[
|f_{,1}(h, 0) - f_{,1}(0, 0)| \le \epsilon|h|
\]
and a taxicab argument gives
\begin{align*}
|f_{,1}(h, k)| &= |f_{,1}(h, k) - f_{,1}(0, 0)| = |(f_{,1}(h, k) - f_{,1}(h, 0)) + (f_{,1}(h, 0) - f_{,1}(0, 0))| \\
&\le |f_{,1}(h, k) - f_{,1}(h, 0)| + |f_{,1}(h, 0) - f_{,1}(0, 0)| \le \epsilon(|k| + |h|)
\end{align*}
for all $(h, k) \in B(\mathbf{0}, \delta_1(\epsilon))$. (Or we could have just applied Lemma 7.2.1 with $f$ replaced by $f_{,1}$.) The mean value inequality also gives
\[
|f_{,2}(0, k)| = |f_{,2}(0, k) - f_{,2}(0, 0)| \le \epsilon|k|.
\]
Now, applying the taxicab argument again, using the mean value inequality and the estimates of the first paragraph, we get
\begin{align*}
|f(h, k)| &= |f(h, k) - f(0, 0)| = |(f(h, k) - f(0, k)) + (f(0, k) - f(0, 0))| \\
&\le |f(h, k) - f(0, k)| + |f(0, k) - f(0, 0)| \\
&\le \sup_{0 \le s \le 1} |f_{,1}(sh, k)||h| + \sup_{0 \le t \le 1} |f_{,2}(0, tk)||k| \\
&\le \epsilon(|k| + |h|)|h| + \epsilon|k|^2 \\
&\le 3\epsilon(h^2 + k^2).
\end{align*}
Since $\epsilon$ was arbitrary, the result follows.
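
The statement of Theorem 7.2.4 can be watched at work on a concrete function. The sketch below is an illustration under stated assumptions: the function $f(x, y) = e^x \sin y$, the base point and the direction of approach are arbitrary choices, and the partial derivatives are written out by hand. It prints $|\epsilon(h, k)|$ for three shrinking increments.

```python
import math

def f(x, y):
    return math.exp(x) * math.sin(y)

def second_order_taylor(x, y, h, k):
    """Second order Taylor polynomial of f about (x, y), using exact partials."""
    ex, s, c = math.exp(x), math.sin(y), math.cos(y)
    f1, f2 = ex * s, ex * c                     # f_{,1}, f_{,2}
    f11, f12, f22 = ex * s, ex * c, -ex * s     # f_{,11}, f_{,12}, f_{,22}
    return (f(x, y) + f1 * h + f2 * k
            + (f11 * h * h + 2.0 * f12 * h * k + f22 * k * k) / 2.0)

def epsilon(x, y, h, k):
    """The error term of Theorem 7.2.4 divided by h^2 + k^2."""
    return (f(x + h, y + k) - second_order_taylor(x, y, h, k)) / (h * h + k * k)

x0, y0 = 0.3, 0.4
sizes = [abs(epsilon(x0, y0, s, -2.0 * s)) for s in (1e-1, 1e-2, 1e-3)]
print(sizes)
```

For this smooth example the values shrink roughly in proportion to the size of $(h, k)$; the theorem itself only promises that they tend to zero.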

Exercise 7.2.5. Set out the proof of Theorem 7.2.4 in the style of the proof of Theorem 7.2.2.

We have the following important corollary.

Theorem 7.2.6. (Symmetry of the second partial derivatives.) Suppose $\delta > 0$, $\mathbf{x} = (x, y) \in \mathbb{R}^2$, $B(\mathbf{x}, \delta) \subseteq E \subseteq \mathbb{R}^2$ and that $f : E \to \mathbb{R}$. If the partial derivatives $f_{,1}$, $f_{,2}$, $f_{,11}$, $f_{,12}$, $f_{,21}$, $f_{,22}$ exist in $B(\mathbf{x}, \delta)$ and are continuous at $\mathbf{x}$, then $f_{,12}(\mathbf{x}) = f_{,21}(\mathbf{x})$.

Proof. By Theorem 7.2.4, we have
\begin{align*}
f(x + h, y + k) = {}& f(x, y) + f_{,1}(x, y)h + f_{,2}(x, y)k \\
& + (f_{,11}(x, y)h^2 + 2f_{,12}(x, y)hk + f_{,22}(x, y)k^2)/2 + \epsilon_1(h, k)(h^2 + k^2)
\end{align*}
with $\epsilon_1(h, k) \to 0$ as $(h^2 + k^2)^{1/2} \to 0$. But, interchanging the rôle of first and second variable, Theorem 7.2.4 also tells us that
\begin{align*}
f(x + h, y + k) = {}& f(x, y) + f_{,1}(x, y)h + f_{,2}(x, y)k \\
& + (f_{,11}(x, y)h^2 + 2f_{,21}(x, y)hk + f_{,22}(x, y)k^2)/2 + \epsilon_2(h, k)(h^2 + k^2)
\end{align*}
with $\epsilon_2(h, k) \to 0$ as $(h^2 + k^2)^{1/2} \to 0$.
Comparing the two Taylor expansions for $f(x + h, y + k)$, we see that
\[
f_{,12}(x, y)hk - f_{,21}(x, y)hk = (\epsilon_2(h, k) - \epsilon_1(h, k))(h^2 + k^2) = \epsilon_3(h, k)(h^2 + k^2)
\]
with $\epsilon_3(h, k) \to 0$ as $(h^2 + k^2)^{1/2} \to 0$. Taking $h = k$ and dividing by $h^2$ we have
\[
f_{,12}(x, y) - f_{,21}(x, y) = 2\epsilon_3(h, h) \to 0
\]
as $h \to 0$, so $f_{,12}(x, y) - f_{,21}(x, y) = 0$ as required.

It is possible to produce plausible arguments for the symmetry of second partial derivatives. Here are a couple.
(1) If $f$ is a multinomial, i.e. $f(x, y) = \sum_{p=0}^{P}\sum_{q=0}^{Q} a_{p,q}x^p y^q$, then $f_{,12} = f_{,21}$. But smooth functions are very close to being polynomial, so we would expect the result to be true in general.
(2) Although we cannot interchange limits in general, it is plausible that, if $f$ is well behaved, then
\begin{align*}
f_{,12}(x, y) &= \lim_{h \to 0}\lim_{k \to 0} h^{-1}k^{-1}(f(x + h, y + k) - f(x + h, y) - f(x, y + k) + f(x, y)) \\
&= \lim_{k \to 0}\lim_{h \to 0} h^{-1}k^{-1}(f(x + h, y + k) - f(x + h, y) - f(x, y + k) + f(x, y)) \\
&= f_{,21}(x, y).
\end{align*}
However, these are merely plausible arguments. They do not make clear the rôle of the continuity of the second derivative (in Example 7.3.18 we shall see that the result may fail for discontinuous second partial derivatives). More fundamentally, they are algebraic arguments and, as the use of the mean value theorem indicates, the result is one of analysis. The same kind of argument which shows that the local Taylor theorem fails over $\mathbb{Q}$ (see Exercise 7.1.8) shows that it fails over $\mathbb{Q}^2$ and that the symmetry of partial derivatives fails with it (see [33]).
If we use the $D$ notation, Theorem 7.2.6 states that (under appropriate conditions)
\[
D_1 D_2 f = D_2 D_1 f.
\]
If we write $D_{ij} = D_i D_j$, as is often done, we get
\[
D_{12} f = D_{21} f.
\]
What happens if a function has higher partial derivatives? It is not hard to guess and prove the appropriate theorem.
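
Exercise 7.2.7. Suppose $\delta > 0$, $\mathbf{x} \in \mathbb{R}^m$, $B(\mathbf{x}, \delta) \subseteq E \subseteq \mathbb{R}^m$ and that $f : E \to \mathbb{R}$. Show that, if all the partial derivatives $f_{,j}$, $f_{,jk}$, $f_{,ijk}$, ... up to the $n$th order exist in $B(\mathbf{x}, \delta)$ and are continuous at $\mathbf{x}$, then, writing
\begin{align*}
f(\mathbf{x} + \mathbf{h}) = {}& f(\mathbf{x}) + \sum_{j=1}^{m} f_{,j}(\mathbf{x})h_j + \frac{1}{2!}\sum_{j=1}^{m}\sum_{k=1}^{m} f_{,jk}(\mathbf{x})h_j h_k + \frac{1}{3!}\sum_{j=1}^{m}\sum_{k=1}^{m}\sum_{l=1}^{m} f_{,jkl}(\mathbf{x})h_j h_k h_l \\
& + \dots + \text{sum up to $n$th powers} + \epsilon(\mathbf{h})\|\mathbf{h}\|^n,
\end{align*}
we have $\epsilon(\mathbf{h}) \to 0$ as $\|\mathbf{h}\| \to 0$.
Notice that you do not have to prove results like
\[
f_{,jkl}(\mathbf{x}) = f_{,ljk}(\mathbf{x}) = f_{,klj}(\mathbf{x}) = f_{,lkj}(\mathbf{x}) = f_{,jlk}(\mathbf{x}) = f_{,kjl}(\mathbf{x})
\]
since they follow directly from Theorem 7.2.6.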
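Applying Exercise 7.2.7 to the components $f_i$ of a function $\mathbf{f}$, we obtain
our full many dimensional Taylor theorem.
Theorem 7.2.8 (The local Taylor's theorem). Suppose $\delta > 0$, $\mathbf{x} \in \mathbb{R}^m$,
$B(\mathbf{x}, \delta) \subseteq E \subseteq \mathbb{R}^m$ and that $\mathbf{f} : E \to \mathbb{R}^p$. If all the partial derivatives $f_{i,j}$,
$f_{i,jk}$, $f_{i,jkl}$, ... exist in $B(\mathbf{x}, \delta)$ and are continuous at $\mathbf{x}$, then, writing

\begin{align*}
f_i(\mathbf{x} + \mathbf{h}) = f_i(\mathbf{x}) &+ \sum_{j=1}^{m} f_{i,j}(\mathbf{x}) h_j + \frac{1}{2!} \sum_{j=1}^{m} \sum_{k=1}^{m} f_{i,jk}(\mathbf{x}) h_j h_k \\
&+ \frac{1}{3!} \sum_{j=1}^{m} \sum_{k=1}^{m} \sum_{l=1}^{m} f_{i,jkl}(\mathbf{x}) h_j h_k h_l \\
&+ \dots + \text{sum up to $n$th powers} + \epsilon_i(\mathbf{h}) \|\mathbf{h}\|^n,
\end{align*}

we have $\|\boldsymbol{\epsilon}(\mathbf{h})\| \to 0$ as $\|\mathbf{h}\| \to 0$.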
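The content of the local Taylor theorem can be seen numerically in the simplest non-trivial case $n = 2$, $m = 2$, $p = 1$. The sketch below is an illustration under our own assumptions: the function $e^x \cos y$, the expansion point $(0, 0)$ and the directions tested are hypothetical choices, and the partial derivatives are entered by hand.

```python
import math

def f(x, y):
    # Sample well behaved function R^2 -> R (an illustrative assumption).
    return math.exp(x) * math.cos(y)

def taylor2(h1, h2):
    """Second order Taylor polynomial of f about (0, 0).

    At the origin f = 1, f_{,1} = 1, f_{,2} = 0,
    f_{,11} = 1, f_{,12} = f_{,21} = 0 and f_{,22} = -1.
    """
    return 1.0 + h1 + 0.5 * (h1 * h1 - h2 * h2)

def epsilon(h1, h2):
    """The error term defined by f(h) = taylor2(h) + epsilon(h) ||h||^2."""
    norm_squared = h1 * h1 + h2 * h2
    return (f(h1, h2) - taylor2(h1, h2)) / norm_squared

for t in (1e-1, 1e-2, 1e-3):
    print(f"h = ({t:g}, {-t:g}): epsilon(h) = {epsilon(t, -t):.3e}")
```

The printed values shrink roughly in proportion to $\|\mathbf{h}\|$, which is what we expect when the third order partial derivatives also exist.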
The reader will remark that Theorem 7.2.8 bristles with subscripts, con-
trary to our announced intention of seeking a geometric, coordinate free view.
However, it is very easy to restate the main formula of Theorem 7.2.8 in a
coordinate free way as

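\[ \mathbf{f}(\mathbf{x} + \mathbf{h}) = \mathbf{f}(\mathbf{x}) + \alpha_1(\mathbf{h}) + \alpha_2(\mathbf{h}, \mathbf{h}) + \dots + \alpha_n(\mathbf{h}, \mathbf{h}, \dots, \mathbf{h}) + \boldsymbol{\epsilon}(\mathbf{h}) \|\mathbf{h}\|^n, \]

where $\alpha_k : \mathbb{R}^m \times \mathbb{R}^m \times \dots \times \mathbb{R}^m \to \mathbb{R}^p$ is linear in each variable (i.e. a $k$-linear
function) and symmetric (i.e. interchanging any two variables leaves
the value of $\alpha_k$ unchanged).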
Anyone who feels that the higher derivatives are best studied using coordinates
should reflect that, if $\mathbf{f} : \mathbb{R}^3 \to \mathbb{R}^3$ is well behaved, then the
‘third derivative behaviour’ of $\mathbf{f}$ at a single point is apparently given by
the $3 \times 3 \times 3 \times 3 = 81$ numbers $f_{i,jkl}(\mathbf{x})$. By symmetry (see Theorem 7.2.6)
only 30 of the numbers are distinct but these 30 numbers are independent
(consider polynomials in three variables for which the total degree of each
term is 3). How can we understand the information carried by an array of
30 real numbers?

Exercise 7.2.9. (i) Verify the statements in the last paragraph. How large
an array is required to give the ‘third derivative behaviour’ of a well behaved
function $\mathbf{f} : \mathbb{R}^4 \to \mathbb{R}^4$ at a point? How large an array is required to give the
‘fourth derivative behaviour’ of a well behaved function $\mathbf{f} : \mathbb{R}^3 \to \mathbb{R}^3$ at a
point?
(ii) (Ignore this if the notation is not familiar.) Consider a well behaved
function $\mathbf{f} : \mathbb{R}^3 \to \mathbb{R}^3$. How large an array is required to give $\operatorname{curl} \mathbf{f} = \nabla \times \mathbf{f}$
and $\operatorname{div} \mathbf{f} = \nabla \cdot \mathbf{f}$? How large an array is required to give $D\mathbf{f}$?
In many circumstances $\operatorname{curl} \mathbf{f}$ and $\operatorname{div} \mathbf{f}$ give the physically interesting part
of $D\mathbf{f}$ but physicists also use

\[ (\mathbf{a} \cdot \nabla)\mathbf{f} = \Bigl( \sum_{j=1}^{3} a_j f_{1,j}, \ \sum_{j=1}^{3} a_j f_{2,j}, \ \sum_{j=1}^{3} a_j f_{3,j} \Bigr). \]

How large an array is required to give $(\mathbf{a} \cdot \nabla)\mathbf{f}$ for all $\mathbf{a} \in \mathbb{R}^3$?
In subjects like elasticity the description of nature requires the full Jacobian
matrix $(f_{i,j})$ and the treatment of differentiation used is closer to that
of the pure mathematician.
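The count of 30 quoted before Exercise 7.2.9 can be checked by brute force. The sketch below assumes only that symmetry lets us treat the lower indices of $f_{i,jkl}$ as an unordered collection; the function name `independent_derivative_count` is our own invention.

```python
from itertools import combinations_with_replacement

def independent_derivative_count(m, p, order):
    """Number of distinct partial derivatives of the given order for f: R^m -> R^p
    once the lower indices are regarded as unordered (symmetry of partials)."""
    # An unordered choice of `order` indices from {1, ..., m}, repeats allowed.
    index_multisets = list(combinations_with_replacement(range(1, m + 1), order))
    return p * len(index_multisets)

raw_count = 3 ** 4  # every index tuple (i, j, k, l) with entries in {1, 2, 3}
symmetric_count = independent_derivative_count(m=3, p=3, order=3)
print(raw_count, symmetric_count)
```

Each of the three components contributes one number for every way of choosing three indices from $\{1, 2, 3\}$ without regard to order.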

Most readers will be happy to finish this section here². However, some of
them³ will observe that in our coordinate free statement of the local Taylor's
theorem the ‘second derivative behaviour’ is given by a bilinear map $\alpha_2 :
\mathbb{R}^m \times \mathbb{R}^m \to \mathbb{R}^p$ and we defined derivatives in terms of linear maps.
Let us be more precise. We suppose $\mathbf{f}$ is a well behaved function on an
open set $U \subseteq \mathbb{R}^m$ taking values in $\mathbb{R}^p$. If we write $L(E, F)$ for the space
of linear maps from a finite dimensional vector space $E$ to a vector space $F$
then, for each fixed $\mathbf{x} \in U$, we have $D\mathbf{f}(\mathbf{x}) \in L(\mathbb{R}^m, \mathbb{R}^p)$. Thus, allowing $\mathbf{x}$ to
vary freely, we see that we have a function

\[ D\mathbf{f} : U \to L(\mathbb{R}^m, \mathbb{R}^p). \]
2
The rest of this section is marked with a ♥.
3
Boas notes that ‘There is a test for identifying some of the future professional
mathematicians at an early age. These are students who instantly comprehend a sentence
beginning “Let X be an ordered quintuple (a, T, π, σ, B) where . . . ”. They are even more
promising if they add, “I never really understood it before.”’ ([8] page 231.)

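We now observe that $L(\mathbb{R}^m, \mathbb{R}^p)$ is a finite dimensional vector space over $\mathbb{R}$
of dimension $mp$, in other words, $L(\mathbb{R}^m, \mathbb{R}^p)$ can be identified with $\mathbb{R}^{mp}$. We
know how to define the derivative of a well behaved function $\mathbf{g} : U \to \mathbb{R}^{mp}$
at $\mathbf{x}$ as a function

\[ D\mathbf{g}(\mathbf{x}) \in L(\mathbb{R}^m, \mathbb{R}^{mp}) \]

so we know how to define the derivative of $D\mathbf{f}$ at $\mathbf{x}$ as a function

\[ D(D\mathbf{f})(\mathbf{x}) \in L(\mathbb{R}^m, L(\mathbb{R}^m, \mathbb{R}^p)). \]

We have thus shown how to define the second derivative $D^2\mathbf{f}(\mathbf{x}) = D(D\mathbf{f})(\mathbf{x})$.
But $D^2\mathbf{f}(\mathbf{x})$ lies in $L(\mathbb{R}^m, L(\mathbb{R}^m, \mathbb{R}^p))$ and $\alpha_2$ lies in the space $\mathcal{E}(\mathbb{R}^m, \mathbb{R}^m; \mathbb{R}^p)$
of bilinear maps from $\mathbb{R}^m \times \mathbb{R}^m$ to $\mathbb{R}^p$. How, the reader may ask, can we
identify $L(\mathbb{R}^m, L(\mathbb{R}^m, \mathbb{R}^p))$ with $\mathcal{E}(\mathbb{R}^m, \mathbb{R}^m; \mathbb{R}^p)$? Fortunately this question
answers itself with hardly any outside intervention.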
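Exercise 7.2.10. Let $E$, $F$ and $G$ be finite dimensional vector spaces over
$\mathbb{R}$. We write $\mathcal{E}(E, F; G)$ for the space of bilinear maps $\alpha : E \times F \to G$.
Define

\[ (\Theta(\alpha)(\mathbf{u}))(\mathbf{v}) = \alpha(\mathbf{u}, \mathbf{v}) \]

for all $\alpha \in \mathcal{E}(E, F; G)$, $\mathbf{u} \in E$ and $\mathbf{v} \in F$.
(i) Show that $\Theta(\alpha)(\mathbf{u}) \in L(F, G)$.
(ii) Show that, if $\mathbf{v}$ is fixed,

\[ \bigl( \Theta(\alpha)(\lambda_1 \mathbf{u}_1 + \lambda_2 \mathbf{u}_2) \bigr)(\mathbf{v}) = \bigl( \lambda_1 \Theta(\alpha)(\mathbf{u}_1) + \lambda_2 \Theta(\alpha)(\mathbf{u}_2) \bigr)(\mathbf{v}) \]

and deduce that

\[ \Theta(\alpha)(\lambda_1 \mathbf{u}_1 + \lambda_2 \mathbf{u}_2) = \lambda_1 \Theta(\alpha)(\mathbf{u}_1) + \lambda_2 \Theta(\alpha)(\mathbf{u}_2) \]

for all $\lambda_1, \lambda_2 \in \mathbb{R}$ and $\mathbf{u}_1, \mathbf{u}_2 \in E$. Conclude that $\Theta(\alpha) \in L(E, L(F, G))$.
(iii) By arguments similar in spirit to those of (ii), show that $\Theta : \mathcal{E}(E, F; G) \to
L(E, L(F, G))$ is linear.
(iv) Show that if $(\Theta(\alpha)(\mathbf{u}))(\mathbf{v}) = 0$ for all $\mathbf{u} \in E$, $\mathbf{v} \in F$, then $\alpha = 0$.
Deduce that $\Theta$ is injective.
(v) By computing the dimensions of $\mathcal{E}(E, F; G)$ and $L(E, L(F, G))$, show
that $\Theta$ is an isomorphism.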
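Programmers will recognise the map of Exercise 7.2.10 as ‘currying’: a function of two arguments is traded for a function which returns a function. The sketch below makes this concrete under our own assumptions: vectors are Python lists, the bilinear map is given by a hypothetical coefficient array `coeffs[i][j][k]`, and the names `make_bilinear` and `theta` are ours.

```python
def make_bilinear(coeffs):
    """Return alpha with alpha(u, v)_k = sum over i, j of coeffs[i][j][k] u_i v_j."""
    def alpha(u, v):
        dim_g = len(coeffs[0][0])
        return [
            sum(coeffs[i][j][k] * u[i] * v[j]
                for i in range(len(u)) for j in range(len(v)))
            for k in range(dim_g)
        ]
    return alpha

def theta(alpha):
    """Send the bilinear map alpha to the map u -> (v -> alpha(u, v))."""
    return lambda u: (lambda v: alpha(u, v))

# Hypothetical coefficients for a bilinear map R^2 x R^2 -> R^1.
alpha = make_bilinear([[[1.0], [2.0]], [[0.0], [-1.0]]])
u1, u2, v = [1.0, 2.0], [3.0, -1.0], [0.5, 4.0]
lam1, lam2 = 2.0, -3.0

# Numerical instance of the identity in part (ii).
combined = [lam1 * a + lam2 * b for a, b in zip(u1, u2)]
lhs = theta(alpha)(combined)(v)
rhs = [lam1 * a + lam2 * b
       for a, b in zip(theta(alpha)(u1)(v), theta(alpha)(u2)(v))]
print(lhs, rhs)
```

Nothing in `theta` refers to a basis; coordinates enter only through our choice of how to store `alpha`.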
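Since our definition of $\Theta$ does not depend on a choice of basis, we say that
$\Theta$ gives a natural isomorphism of $\mathcal{E}(E, F; G)$ and $L(E, L(F, G))$. If we use
this isomorphism to identify $\mathcal{E}(E, F; G)$ and $L(E, L(F, G))$ then $D^2\mathbf{f}(\mathbf{x}) \in
\mathcal{E}(\mathbb{R}^m, \mathbb{R}^m; \mathbb{R}^p)$. If we treat the higher derivatives in the same manner, the
central formula of the local Taylor theorem takes the satisfying form

\[ \mathbf{f}(\mathbf{x} + \mathbf{h}) = \mathbf{f}(\mathbf{x}) + D\mathbf{f}(\mathbf{x})(\mathbf{h}) + \frac{1}{2!} D^2\mathbf{f}(\mathbf{x})(\mathbf{h}, \mathbf{h}) + \dots + \frac{1}{n!} D^n\mathbf{f}(\mathbf{x})(\mathbf{h}, \mathbf{h}, \dots, \mathbf{h}) + \boldsymbol{\epsilon}(\mathbf{h}) \|\mathbf{h}\|^n. \]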
For more details, consult sections 11 and 13 of chapter VIII of Dieudonné's
Foundations of Modern Analysis [13] where the higher derivatives are dealt
with in a coordinate free way. Like Hardy's book [23], Dieudonné's is a
masterpiece but in a very different tradition⁴.


7.3 Critical points
In this section we mix informal and formal argument, deliberately using
words like ‘well behaved’ without defining them. Our object is to use the
local Taylor formula to produce results about maxima, minima and related
objects.
Let $U$ be an open subset of $\mathbb{R}^m$ containing $\mathbf{0}$. We are interested in the
behaviour of a well behaved function $f : U \to \mathbb{R}$ near $\mathbf{0}$.
Since $f$ is well behaved, the first order local Taylor theorem (which reduces
to the definition of differentiation) gives

\[ f(\mathbf{h}) = f(\mathbf{0}) + \alpha\mathbf{h} + \epsilon(\mathbf{h}) \|\mathbf{h}\| \]

where $\epsilon(\mathbf{h}) \to 0$ as $\|\mathbf{h}\| \to 0$ and $\alpha = Df(\mathbf{0})$ is a linear map from $\mathbb{R}^m$ to $\mathbb{R}$.
By a very simple result of linear algebra, we can choose a set of orthogonal
coordinates so that $\alpha(x_1, x_2, \dots, x_m) = a x_1$ with $a \geq 0$.

Exercise 7.3.1. If $\alpha : \mathbb{R}^m \to \mathbb{R}$ is linear show that, with respect to any
particular chosen orthogonal coordinates,

\[ \alpha(x_1, x_2, \dots, x_m) = a_1 x_1 + a_2 x_2 + \dots + a_m x_m \]

for some $a_j \in \mathbb{R}$. Deduce that there is a vector $\mathbf{a}$ such that $\alpha\mathbf{x} = \mathbf{a} \cdot \mathbf{x}$ for all
$\mathbf{x} \in \mathbb{R}^m$. Conclude that we can choose a set of orthogonal coordinates so that
$\alpha(x_1, x_2, \dots, x_m) = a x_1$ with $a \geq 0$.
In applied mathematics we write $\mathbf{a} = \nabla f$. A longer, but very instructive,
proof of the result of this exercise is given in Exercise K.31.
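For $m = 2$ the change of coordinates in Exercise 7.3.1 can be written down explicitly. The following sketch assumes a non-zero vector $\mathbf{a}$ and takes the first new axis along $\mathbf{a}$; the helper names and the sample numbers are hypothetical.

```python
import math

def adapted_coordinates(a):
    """Orthonormal basis (e1, e2) of R^2 with e1 pointing along a (a non-zero)."""
    norm_a = math.hypot(a[0], a[1])
    e1 = (a[0] / norm_a, a[1] / norm_a)
    e2 = (-e1[1], e1[0])  # e1 turned through a right angle
    return norm_a, e1, e2

def dot(u, v):
    return u[0] * v[0] + u[1] * v[1]

a = (3.0, 4.0)  # alpha(x) = 3 x_1 + 4 x_2, a made-up example
norm_a, e1, e2 = adapted_coordinates(a)

x = (2.0, -1.0)
new_x1, new_x2 = dot(x, e1), dot(x, e2)  # coordinates of x in the new system

# In the new coordinates alpha depends on the first coordinate only.
print(dot(a, x), norm_a * new_x1)
```

The two printed numbers agree up to rounding, and the constant multiplying the new first coordinate is $\|\mathbf{a}\| \geq 0$.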

In the coordinate system just chosen

\[ f(h_1, h_2, \dots, h_m) = f(\mathbf{0}) + a h_1 + \epsilon(\mathbf{h}) \|\mathbf{h}\| \]

where $\epsilon(\mathbf{h}) \to 0$ as $\|\mathbf{h}\| \to 0$. Thus, speaking informally, if $a \neq 0$ the ‘contour
lines’ $f(\mathbf{h}) = c$ close to $\mathbf{0}$ will look like parallel ‘hyperplanes’ perpendicular to
the $x_1$ axis. Figure 7.1 illustrates the case $m = 2$. In particular, our contour
lines look like those describing a side of a hill but not its peak.

Figure 7.1: Contour lines when the derivative is not zero.

Using our informal insight we can prove a formal lemma.

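Lemma 7.3.2. Let $U$ be an open subset of $\mathbb{R}^m$ containing $\mathbf{x}$. Suppose that
$f : U \to \mathbb{R}$ is differentiable at $\mathbf{x}$. If $f(\mathbf{x}) \geq f(\mathbf{y})$ for all $\mathbf{y} \in U$ then
$Df(\mathbf{x}) = 0$ (more precisely, $Df(\mathbf{x})\mathbf{h} = 0$ for all $\mathbf{h} \in \mathbb{R}^m$).

Proof. There is no loss in generality in supposing $\mathbf{x} = \mathbf{0}$. Suppose that
$Df(\mathbf{0}) \neq 0$. Then we can find an orthogonal coordinate system and a strictly
positive real number $a$ such that $Df(\mathbf{0})(h_1, h_2, \dots, h_m) = a h_1$. Thus, from
the definition of the derivative,

\[ f(h_1, h_2, \dots, h_m) = f(\mathbf{0}) + a h_1 + \epsilon(\mathbf{h}) \|\mathbf{h}\| \]

where $\epsilon(\mathbf{h}) \to 0$ as $\|\mathbf{h}\| \to 0$.
Choose $\eta > 0$ such that, whenever $\|\mathbf{h}\| < \eta$, we have $\mathbf{h} \in U$ and $|\epsilon(\mathbf{h})| <
a/2$. Now choose any real $h$ with $0 < h < \eta$. If we set $\mathbf{h} = (h, 0, 0, \dots, 0)$, we
have

\[ f(\mathbf{h}) = f(\mathbf{0}) + ah + \epsilon(\mathbf{h}) h > f(\mathbf{0}) + ah - ah/2 = f(\mathbf{0}) + ah/2 > f(\mathbf{0}), \]

contradicting the hypothesis that $f(\mathbf{0}) \geq f(\mathbf{y})$ for all $\mathbf{y} \in U$.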



4
See the quotation from Boas in the previous footnote.

The distinctions made in the following definition are probably familiar to
the reader.

Definition 7.3.3. Let $E$ be a subset of $\mathbb{R}^m$ containing $\mathbf{x}$ and let $f$ be a
function from $E$ to $\mathbb{R}$.
(i) We say that $f$ has a global maximum at $\mathbf{x}$ if $f(\mathbf{x}) \geq f(\mathbf{y})$ for all
$\mathbf{y} \in E$.
(ii) We say that $f$ has a strict global maximum at $\mathbf{x}$ if $f(\mathbf{x}) > f(\mathbf{y})$ for
all $\mathbf{y} \in E$ with $\mathbf{x} \neq \mathbf{y}$.
(iii) We say that $f$ has a local maximum (respectively a strict local maximum)
at $\mathbf{x}$ if there exists an $\eta > 0$ such that the restriction of $f$ to $E \cap B(\mathbf{x}, \eta)$
has a global maximum (respectively a strict global maximum) at $\mathbf{x}$.
(iv) If we can find an $\eta > 0$ such that $E \supseteq B(\mathbf{x}, \eta)$ and $f$ is differentiable
at $\mathbf{x}$ with $Df(\mathbf{x}) = 0$, we say that $\mathbf{x}$ is a critical or stationary point⁵ of $f$.
It is usual to refer to the point $\mathbf{x}$ where $f$ takes a (global or local) maximum
as a (global or local) maximum and this convention rarely causes confusion.
When mathematicians omit the words local or global in referring to
maximum they usually mean the local version (but this convention, which I
shall follow, is not universal).
Here are some easy exercises involving these ideas.
Exercise 7.3.4. (i) Let $U$ be an open subset of $\mathbb{R}^m$ containing $\mathbf{x}$. Suppose
that $f : U \to \mathbb{R}$ is differentiable on $U$ and that $Df$ is continuous at $\mathbf{x}$. Show
that, if $f$ has a local maximum at $\mathbf{x}$, then $Df(\mathbf{x}) = 0$.
(ii) Suppose that $f : \mathbb{R}^m \to \mathbb{R}$ is differentiable everywhere and $E$ is a
closed subset of $\mathbb{R}^m$ containing $\mathbf{x}$. Show that, even if $\mathbf{x}$ is a global maximum
of the restriction of $f$ to $E$, it need not be true that $Df(\mathbf{x}) = 0$. [Hint: We
have already met this fact when we thought about Rolle's theorem.] Explain
informally why the proof of Lemma 7.3.2 fails in this case.
(iii) State the definitions corresponding to Definition 7.3.3 that we need
to deal with minima.
(iv) Let $E$ be any subset of $\mathbb{R}^m$ containing $\mathbf{y}$ and let $f$ be a function from
$E$ to $\mathbb{R}$. If $\mathbf{y}$ is both a global maximum and a global minimum for $f$ show that
$f$ is constant. What can you say if we replace the word ‘global’ by ‘local’?
5
In other words a stationary point is one where the ground is flat. Since flat ground
drains badly, the stationary points we meet in hill walking tend to be boggy. Thus we
encounter boggy ground at the top of hills and when crossing passes as well as at lowest
points (at least in the UK, other countries may be drier or have better draining soils).

We saw above how $f$ behaved locally near $\mathbf{0}$ if $Df(\mathbf{0}) \neq 0$. What can we
say if $Df(\mathbf{0}) = 0$? In this case, the second order Taylor expansion gives

\[ f(\mathbf{h}) = f(\mathbf{0}) + \beta(\mathbf{h}, \mathbf{h}) + \epsilon(\mathbf{h}) \|\mathbf{h}\|^2 \]

where

\[ \beta(\mathbf{h}, \mathbf{h}) = \frac{1}{2} \sum_{i=1}^{m} \sum_{j=1}^{m} f_{,ij}(\mathbf{0}) h_i h_j \]

and $\epsilon(\mathbf{h}) \to 0$ as $\|\mathbf{h}\| \to 0$. We write $\beta = \frac{1}{2} D^2 f$ and call the matrix
$K = (f_{,ij}(\mathbf{0}))$ the Hessian matrix. As we noted in the previous section,
the symmetry of the second partial derivatives (Theorem 7.2.6) tells us that
the Hessian matrix is a symmetric matrix and the associated bilinear map
$D^2 f$ is symmetric. It follows from a well known result in linear algebra (see
e.g. Exercise K.30) that $\mathbb{R}^m$ has an orthonormal basis of eigenvectors of $K$.
Choosing coordinate axes along those vectors, we obtain

\[ D^2 f(\mathbf{h}, \mathbf{h}) = \sum_{i=1}^{m} \lambda_i h_i^2 \]

where the $\lambda_i$ are the eigenvalues associated with the eigenvectors.

Figure 7.2: Contour lines when the derivative is zero but the second derivative
is non-singular

In the coordinate system just chosen

\[ f(h_1, h_2, \dots, h_m) = f(\mathbf{0}) + \frac{1}{2} \sum_{i=1}^{m} \lambda_i h_i^2 + \epsilon(\mathbf{h}) \|\mathbf{h}\|^2 \]

where $\epsilon(\mathbf{h}) \to 0$ as $\|\mathbf{h}\| \to 0$. Thus, speaking informally, if all the $\lambda_i$ are
non-zero, the ‘contour lines’ $f(\mathbf{h}) = c$ close to $\mathbf{0}$ will look like ‘quadratic
hypersurfaces’ (that is $m$ dimensional versions of conics). Figure 7.2 illustrates
the two possible contour patterns when $m = 2$. The first type of pattern is
that of a summit (if the contour lines are for increasing heights as we approach
$\mathbf{0}$) or a bottom (lowest point)⁶ (if the contour lines are for decreasing
heights as we approach $\mathbf{0}$). The second is that of a pass (often called a saddle).
Notice that, for merchants, wishing to get from one valley to another,
the pass is the highest point in their journey but, for mountaineers, wishing
to get from one mountain to another, the pass is the lowest point.
6
The English language is rich in synonyms for highest points (summits, peaks, crowns,
. . . ) but has few for lowest points. This may be because the English climate ensures that
most lowest points are under water.

When looking at Figure 7.2 it is important to realise that the difference
in heights of successive contour lines is not constant. In effect we have drawn
contour lines at heights $f(\mathbf{0})$, $f(\mathbf{0}) + \eta$, $f(\mathbf{0}) + 2^2\eta$, $f(\mathbf{0}) + 3^2\eta$, ..., $f(\mathbf{0}) + n^2\eta$.

Exercise 7.3.5. (i) Redraw Figure 7.2 with contour lines at heights $f(\mathbf{0})$,
$f(\mathbf{0}) + \eta$, $f(\mathbf{0}) + 2\eta$, $f(\mathbf{0}) + 3\eta$, ..., $f(\mathbf{0}) + n\eta$.
(ii) What (roughly speaking) can you say about the difference in heights
of successive contour lines in Figure 7.1?

Using our informal insight we can prove a formal lemma.

Lemma 7.3.6. Let $U$ be an open subset of $\mathbb{R}^m$ containing $\mathbf{x}$. Suppose that $f :
U \to \mathbb{R}$ has second order partial derivatives on $U$ and these partial derivatives
are continuous at $\mathbf{x}$. If $Df(\mathbf{x}) = 0$ and $D^2 f(\mathbf{x})$ is non-singular then
(i) $f$ has a minimum at $\mathbf{x}$ if and only if $D^2 f(\mathbf{x})$ is positive definite.
(ii) $f$ has a maximum at $\mathbf{x}$ if and only if $D^2 f(\mathbf{x})$ is negative definite.

The conditions of the second sentence of the hypothesis ensure that we
have a local second order Taylor expansion. In most applications $f$ will be
much better behaved than this. We say that $D^2 f(\mathbf{x})$ is positive definite if all
the associated eigenvalues (that is all the eigenvalues of the Hessian matrix)
are strictly positive and that $D^2 f(\mathbf{x})$ is negative definite if all the associated
eigenvalues are strictly negative.
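When $m = 2$ the eigenvalues of the Hessian can be written down in closed form, so the test of Lemma 7.3.6 is easy to mechanise. The sketch below is ours rather than part of the formal development: it takes the three distinct entries of a symmetric $2 \times 2$ Hessian as input, and the sample functions in the comments are made-up examples with a critical point at the origin.

```python
import math

def symmetric_2x2_eigenvalues(k11, k12, k22):
    """Eigenvalues (smaller first) of the symmetric matrix [[k11, k12], [k12, k22]]."""
    mean = 0.5 * (k11 + k22)
    radius = math.hypot(0.5 * (k11 - k22), k12)
    return mean - radius, mean + radius

def classify(k11, k12, k22):
    """Classify a critical point from the entries of its 2 x 2 Hessian."""
    low, high = symmetric_2x2_eigenvalues(k11, k12, k22)
    if low > 0:
        return "minimum"   # positive definite
    if high < 0:
        return "maximum"   # negative definite
    if low < 0 < high:
        return "saddle"    # eigenvalues of both signs
    return "singular Hessian: Lemma 7.3.6 does not apply"

print(classify(2.0, 0.0, 2.0))    # f(x, y) = x^2 + y^2
print(classify(-2.0, 1.0, -3.0))  # f(x, y) = -x^2 + xy - (3/2) y^2
print(classify(2.0, 5.0, 2.0))    # f(x, y) = x^2 + 5xy + y^2
```

The last example shows that a Hessian whose entries are all positive need not be positive definite.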

Exercise 7.3.7. Prove Lemma 7.3.6 following the style of the proof of Lemma 7.3.2.

It is a non-trivial task to tell whether a given Hessian is positive or negative
definite.

Exercise 7.3.8. Let $f(x, y) = x^2 + 6xy + y^2$. Show that $Df(0, 0) = 0$, that
all the entries in the Hessian matrix $K$ at $(0, 0)$ are positive and that $K$
is non-singular but that $D^2 f(0, 0)$ is neither positive definite nor negative
definite. (So $(0, 0)$ is a saddle point.)

Exercise K.105 gives one method of resolving the problem.
Because it is non-trivial to use the Hessian to determine whether a singular
point, that is a point $\mathbf{x}$ where $Df(\mathbf{x}) = 0$, is a maximum, a minimum
or neither, mathematicians frequently seek short cuts.

Exercise 7.3.9. Suppose that $f : \mathbb{R}^m \to \mathbb{R}$ is continuous, that $f(\mathbf{x}) \to 0$ as
$\|\mathbf{x}\| \to \infty$ and that $f(\mathbf{x}) > 0$ for all $\mathbf{x} \in \mathbb{R}^m$.
(i) Explain why there exists an $R > 0$ such that $f(\mathbf{x}) < f(\mathbf{0})$ for all
$\|\mathbf{x}\| \geq R$.
(ii) Explain why there exists an $\mathbf{x}_0$ with $\|\mathbf{x}_0\| \leq R$ and $f(\mathbf{x}_0) \geq f(\mathbf{x})$ for
all $\|\mathbf{x}\| \leq R$.
(iii) Explain why $f(\mathbf{x}_0) \geq f(\mathbf{x})$ for all $\mathbf{x} \in \mathbb{R}^m$.
(iv) If $f$ is everywhere differentiable and has exactly one singular point
$\mathbf{y}_0$ show that $f$ attains a global maximum at $\mathbf{y}_0$.
(v) In statistics we frequently wish to maximise functions of the form

\[ f(a, b) = \exp\Bigl( -\sum_{i=1}^{k} (y_i - a t_i - b)^2 \Bigr), \]

with $\sum_{i=1}^{k} t_i = 0$. Use the results above to find the values of $a$ and $b$ which
maximise $f$. (Of course, this result can be obtained without calculus but most
people do it this way.)
Mathematicians with a good understanding of the topic they are investigating
can use insight as a substitute for rigorous verification, but intuition
may lead us astray.
Exercise 7.3.10. Four towns lie on the vertices of a square of side $a$. What
is the shortest total length of a system of roads joining all four towns? (The
answer is given in Exercise K.107, but try to find the answer first before
looking it up.)
The following are standard traps for the novice and occasional traps for
the experienced.
(1) Critical points need not be maxima or minima.
(2) Local maxima and minima need not be global maxima or minima.
(3) Maxima and minima may occur on the boundary and may then not
be critical points. [We may restate this more exactly as follows. Suppose
$f : E \to \mathbb{R}$. Unless $E$ is open, $f$ may take a maximum value at a point $\mathbf{e} \in E$
such that we cannot find any $\delta > 0$ with $B(\mathbf{e}, \delta) \subseteq E$. However well $f$ is
behaved, the argument of Lemma 7.3.2 will fail. For a specific instance see
Exercise 7.3.4.]
(4) A function need not have a maximum or minimum. [Consider $f :
U \to \mathbb{R}$ given by $f(x, y) = x$ where $U = B(\mathbf{0}, 1)$ or $U = \mathbb{R}^2$.]
Exercise 7.3.11. Find the maxima and minima of the function $f : \mathbb{R}^2 \to \mathbb{R}$
given by

\[ f(x, y) = y^2 - x^3 - ax \]

in the region $\{(x, y) : x^2 + y^2 \leq 1\}$.
Your answer will depend on the constant $a$.




Figure 7.3: Light paths in an ellipse

Matters are further complicated by the fact that different kinds of problems
call for different kinds of solutions. The engineer seeks a global minimum
to the cost of a process. On the other hand if we drop a handful of ball bearings
on the ground they will end up at local minima (lowest points) and most
people suspect that evolutionary, economic and social changes all involve local
maxima and minima. Finally, although we like to think of many physical
processes as minimising some function, it is often the case they are really
stationarising (finding critical points for) that function. We like to say that
light takes a shortest path, but, if you consider a bulb $A$ at the centre of an
ellipse, light is reflected back to $A$ from $B$ and $B'$, the two closest points on
the ellipse, and from $C$ and $C'$, the two furthest points (see Figure 7.3).
We have said that, if $f : \mathbb{R}^2 \to \mathbb{R}$ has a Taylor expansion in the neighbourhood
of a point, then (ignoring the possibility that the Hessian is singular)
the contour map will look like that in Figures 7.1 or 7.2. But it is very
easy to imagine other contour maps and the reader may ask what happens
if the local contour map does not look like that in Figures 7.1 or 7.2. The
answer is that the appropriate Taylor expansion has failed and therefore the
hypotheses which ensure the appropriate Taylor expansion must themselves
have failed.
Exercise 7.3.12. Suppose that $f : \mathbb{R}^2 \to \mathbb{R}$ is given by $f(0, 0) = 0$ and

\[ f(r \cos\theta, r \sin\theta) = r g(\theta) \]

when $r > 0$, where $g : \mathbb{R} \to \mathbb{R}$ is periodic with period $2\pi$. [Informally, we
define $f$ using polar coordinates.] Show that, if $g(-\theta) = -g(\theta)$ for all $\theta$, then
$f$ has directional derivatives (see Definition 6.1.6) in all directions at $(0, 0)$.
If we choose $g(\theta) = \sin\theta$, we obtain a contour map like Figure 7.1, but,
if $g(\theta) = \sin 3\theta$, we obtain something very different.
Exercise 7.3.13. We continue with the notation of Exercise 7.3.12.
(i) If $g(\theta) = \sin\theta$, find $f(x, y)$ and sketch the contour lines $f(x, y) =
h, 2h, 3h, \dots$ with $h$ small.
(ii) If $g(\theta) = \sin 3\theta$, show that

\[ f(x, y) = \frac{y(3x^2 - y^2)}{x^2 + y^2} \]

for $(x, y) \neq \mathbf{0}$. Sketch the contour lines $f(x, y) = h, 2h, 3h, \dots$ with $h$
small.
Example 7.3.14. If

\begin{align*}
f(x, y) &= \frac{y(3x^2 - y^2)}{x^2 + y^2} \quad \text{for } (x, y) \neq (0, 0), \\
f(0, 0) &= 0,
\end{align*}

then $f$ is differentiable except at $(0, 0)$, is continuous everywhere, has directional
derivatives in all directions at $(0, 0)$ but is not differentiable at $(0, 0)$.
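Before reading the proof, the reader may like to see the phenomenon numerically. The sketch below is an informal experiment under our own assumptions (the step $h = 10^{-6}$ and the three directions are arbitrary choices): it estimates directional derivatives of the function of Example 7.3.14 at the origin and compares the value in a diagonal direction with what linearity would force.

```python
import math

def f(x, y):
    """The function of Example 7.3.14, with f(0, 0) = 0."""
    if x == 0.0 and y == 0.0:
        return 0.0
    return y * (3 * x * x - y * y) / (x * x + y * y)

def directional_derivative(theta, h=1e-6):
    """Difference quotient (f(uh, vh) - f(0, 0)) / h with (u, v) = (cos theta, sin theta)."""
    u, v = math.cos(theta), math.sin(theta)
    return (f(u * h, v * h) - f(0.0, 0.0)) / h

d_x = directional_derivative(0.0)             # direction (1, 0)
d_y = directional_derivative(math.pi / 2)     # direction (0, 1)
d_diag = directional_derivative(math.pi / 4)  # direction (1, 1) / sqrt(2)

# If f were differentiable at the origin, d_diag would equal (d_x + d_y) / sqrt(2).
print(d_x, d_y, d_diag, (d_x + d_y) / math.sqrt(2))
```

The last two printed numbers have opposite signs, so the directional derivatives do not fit together into a linear map.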
Proof. By standard results on differentiation (the chain rule, product rule
and so on), $f$ is differentiable (and so continuous) except, perhaps, at $(0, 0)$.
If $u^2 + v^2 = 1$ we have

\[ \frac{f(uh, vh) - f(0, 0)}{h} \to v(3u^2 - v^2) \]

as $h \to 0$, so $f$ has directional derivatives in all directions at $(0, 0)$. Since

\[ |f(x, y) - f(0, 0)| \leq \frac{4(\max(|x|, |y|))^3}{(\max(|x|, |y|))^2} = 4 \max(|x|, |y|) \to 0 \]

as $(x^2 + y^2)^{1/2} \to 0$, $f$ is continuous at $(0, 0)$.
Suppose $f$ were differentiable at $(0, 0)$. Then

\[ f(h, k) = f(0, 0) + Ah + Bk + \epsilon(h, k)(h^2 + k^2)^{1/2} \]

with $\epsilon(h, k) \to 0$ as $(h^2 + k^2)^{1/2} \to 0$, and $A = f_{,1}(0, 0)$, $B = f_{,2}(0, 0)$. The
calculations of the previous paragraph with $v = 0$ show that $f_{,1}(0, 0) = 0$
and the same calculations with $u = 0$ show that $f_{,2}(0, 0) = -1$. Thus

\[ f(h, k) + k = \epsilon(h, k)(h^2 + k^2)^{1/2} \]

and

\[ \frac{f(h, k) + k}{(h^2 + k^2)^{1/2}} \to 0 \]