

(We could prove this rigorously, but a chain is as strong as its weakest link and the real difficulties lie elsewhere.)
We are therefore led to the following plausible statement.

Plausible statement 12.4.5. The equation

y''(t) + a(t)y'(t) + b(t)y(t) = f(t),   y(0) = y(1) = 0

has the solution

y(t) = ∫₀¹ f(s)G(s, t) ds.

We summarise the results of our informal argument in the next theorem
which we then prove formally.

Theorem 12.4.6. Suppose that f, a, b : [0, 1] → R are continuous. By Exercise 12.3.11, there exists a unique twice differentiable y₁ : [0, 1] → R such that

y₁''(t) + a(t)y₁'(t) + b(t)y₁(t) = 0,   y₁(0) = 0, y₁'(0) = 1

and a unique twice differentiable y₂ : [0, 1] → R such that

y₂''(t) + a(t)y₂'(t) + b(t)y₂(t) = 0,   y₂(1) = 0, y₂'(1) = 1.

We make the following

key assumption: y₁(1) ≠ 0.

If we set W(t) = y₂'(t)y₁(t) − y₂(t)y₁'(t), then W is never zero and we may define G : [0, 1]² → R by

G(s, t) = y₁(t)y₂(s)W(s)⁻¹ for 0 ≤ t ≤ s,
G(s, t) = y₂(t)y₁(s)W(s)⁻¹ for s ≤ t ≤ 1.

With this notation, G is continuous, so we can define

y(t) = ∫₀¹ G(s, t)f(s) ds.

The function y, so defined, is twice differentiable and satisfies

y''(t) + a(t)y'(t) + b(t)y(t) = f(t)

together with the conditions y(0) = y(1) = 0. Moreover this solution is unique.

Proof. We have already established the contents of the first paragraph. The proof of the continuity of G is left to the reader (see Exercise 12.4.7 (iii) for a general result from which this follows). To show that y is differentiable and satisfies the equation, we observe that

y(t) = ∫₀ᵗ G(s, t)f(s) ds + ∫ₜ¹ G(s, t)f(s) ds
     = ∫₀ᵗ y₂(t)y₁(s)W(s)⁻¹f(s) ds + ∫ₜ¹ y₁(t)y₂(s)W(s)⁻¹f(s) ds
     = y₂(t) ∫₀ᵗ y₁(s)W(s)⁻¹f(s) ds + y₁(t) ∫ₜ¹ y₂(s)W(s)⁻¹f(s) ds.

[Experience shows that people get into frightful muddles over this calculation. At each stage you must ask yourself 'is s bigger than t or vice versa and what does this mean for the definition of G(s, t)?']
We now use the product rule and the fundamental theorem of the calculus to show that y is twice differentiable with

y'(t) = y₂'(t) ∫₀ᵗ y₁(s)W(s)⁻¹f(s) ds + y₂(t)y₁(t)W(t)⁻¹f(t)
      + y₁'(t) ∫ₜ¹ y₂(s)W(s)⁻¹f(s) ds − y₁(t)y₂(t)W(t)⁻¹f(t)
      = y₂'(t) ∫₀ᵗ y₁(s)W(s)⁻¹f(s) ds + y₁'(t) ∫ₜ¹ y₂(s)W(s)⁻¹f(s) ds

and

y''(t) = y₂''(t) ∫₀ᵗ y₁(s)W(s)⁻¹f(s) ds + y₂'(t)y₁(t)W(t)⁻¹f(t)
       + y₁''(t) ∫ₜ¹ y₂(s)W(s)⁻¹f(s) ds − y₁'(t)y₂(t)W(t)⁻¹f(t)
       = y₂''(t) ∫₀ᵗ y₁(s)W(s)⁻¹f(s) ds + y₁''(t) ∫ₜ¹ y₂(s)W(s)⁻¹f(s) ds
       + (y₂'(t)y₁(t) − y₁'(t)y₂(t))W(t)⁻¹f(t)
       = y₂''(t) ∫₀ᵗ y₁(s)W(s)⁻¹f(s) ds + y₁''(t) ∫ₜ¹ y₂(s)W(s)⁻¹f(s) ds + f(t),

so that

y''(t) + a(t)y'(t) + b(t)y(t)
= (y₁''(t) + a(t)y₁'(t) + b(t)y₁(t)) ∫ₜ¹ y₂(s)W(s)⁻¹f(s) ds
+ (y₂''(t) + a(t)y₂'(t) + b(t)y₂(t)) ∫₀ᵗ y₁(s)W(s)⁻¹f(s) ds + f(t)
= 0 × ∫ₜ¹ y₂(s)W(s)⁻¹f(s) ds + 0 × ∫₀ᵗ y₁(s)W(s)⁻¹f(s) ds + f(t) = f(t),

as required.
To prove uniqueness, suppose that u₁ and u₂ are solutions of the equation satisfying u₁(0) = u₂(0) = u₁(1) = u₂(1) = 0. If we set u = u₁ − u₂ then, by linearity,

u''(t) + a(t)u'(t) + b(t)u(t) = 0,   u(0) = 0, u(1) = 0.

Suppose u'(0) = λ. Then u and λy₁ both satisfy

w''(t) + a(t)w'(t) + b(t)w(t) = 0,   w(0) = 0, w'(0) = λ

so, by the uniqueness results of the previous section, u = λy₁. But y₁(1) ≠ 0 and u(1) = 0, so λ = 0 and u = 0. Thus u₁ = u₂ as required.
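The construction is easy to check numerically. Here is a minimal Python sketch (ours, not part of the original text) for the special case a = b = 0, where y₁(t) = t, y₂(t) = t − 1 and W(t) = y₂'(t)y₁(t) − y₂(t)y₁'(t) = 1; with f = 1 the problem y'' = f, y(0) = y(1) = 0 has exact solution t(t − 1)/2.

    import numpy as np

    def greens_solution(f, n=2001):
        # y(t) = integral over [0,1] of G(s,t) f(s) ds, with y1(t) = t,
        # y2(t) = t - 1 and W = 1 (the case a = b = 0, so y'' = f).
        t = np.linspace(0.0, 1.0, n)
        y1, y2 = t, t - 1.0
        fs = f(t)
        dt = t[1] - t[0]
        y = np.empty_like(t)
        for i, ti in enumerate(t):
            # G(s,t) = y2(t)y1(s)/W(s) for s <= t, y1(t)y2(s)/W(s) for t <= s
            g = np.where(t <= ti, y2[i] * y1, y1[i] * y2) * fs
            y[i] = (g[:-1] + g[1:]).sum() * dt / 2   # trapezoidal rule in s
        return t, y

    t, y = greens_solution(np.ones_like)
    print(np.max(np.abs(y - t * (t - 1) / 2)))   # ~1e-15: agrees with t(t-1)/2

The error is at the level of rounding, since the trapezoidal rule is exact for this piecewise linear integrand.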
Exercise 12.4.7. Let A and B be subsets of some metric space and consider a function f : A ∪ B → R.
(i) Show that, if f is not continuous at x, then at least one of the following two statements must be true.
(α) We can find aₙ ∈ A with aₙ → x and f(aₙ) ↛ f(x).
(β) We can find bₙ ∈ B with bₙ → x and f(bₙ) ↛ f(x).
(ii) Give an example where f|A and f|B are continuous, but f is not.
(iii) Show that, if A and B are closed, then the continuity of f|A and f|B implies the continuity of f.
[For further remarks along these lines see Exercise K.178.]
The proof of Theorem 12.4.6 is rather disappointing, since it uses only rather elementary results and gives no hint as to how to proceed in more general circumstances. (In Exercise K.278, I outline a proof which involves slightly less calculation and slightly harder theorems, but it still does not get to the heart of the matter.) However, the general idea of using 'delta functions' or 'impulses' to study ordinary and, particularly, partial differential equations is very important in both pure and applied mathematics. (The acoustics of concert halls are tested by letting off starting pistols and recording the results.)

From the point of view of the pure mathematician, one of the chief advantages of the Green's function method is that it converts problems on differentiation to problems on integration, with the advantages pointed out on page 306.
In the terminology of more advanced work, we have shown how differential operators like y ↦ Sy, where

Sy(t) = y''(t) + a(t)y'(t) + b(t)y(t),

can be linked with better behaved integral operators like

f ↦ Tf with Tf(t) = ∫₀¹ G(s, t)f(s) ds.

Note that we have shown STf = f for f continuous, but note also that, if f is merely continuous, Sf need not be defined. The Green's function G is an example of an integral kernel¹. More formally, if we write

Jf(s) = ∫₀¹ K(s, t)f(t) dt,

then J is called an integral operator with kernel K.
We end with an example to show that things really do go awry if our key assumption fails.

Example 12.4.8. The equation

y''(t) + π²y(t) = t

has no solution satisfying y(0) = y(1) = 0.

Proof. Suppose that y satisfies the equation

y''(t) + π²y(t) = t
¹ We shall not discuss these matters much further, but most of the new words in this last paragraph are well worth dropping.

    You must lie among the daisies and discourse in novel phrases of your complicated state of mind,
    The meaning doesn't matter if it's only idle chatter of a transcendental kind.
    And everyone will say,
    As you walk your mystic way,
    'If this young man expresses himself in terms too deep for me,
    Why, what a very singularly deep young man this deep young man must be!'
        (Gilbert and Sullivan, Patience)

and satisfies y(0) = 0. If we set

w(t) = π⁻²t + ((y'(0) − π⁻²)/π) sin πt,

then, by direct calculation,

w''(t) + π²w(t) = t

and w(0) = 0 = y(0), w'(0) = y'(0) so, by the uniqueness results of the previous section, y = w. In particular, y(1) = w(1) = π⁻² ≠ 0.
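A two-line numerical check (a sketch of ours, not in the original text): whatever trial value of y'(0) we feed into the formula for w, the value w(1) comes out as π⁻², never 0.

    import math

    # w(t) = t/pi^2 + c sin(pi t) with c = (y'(0) - 1/pi^2)/pi, as in the proof
    for slope in (-1.0, 0.0, 2.5):                # assumed trial values of y'(0)
        c = (slope - math.pi ** -2) / math.pi
        w = lambda t, c=c: t / math.pi ** 2 + c * math.sin(math.pi * t)
        print(w(1.0))                             # always 1/pi^2 = 0.1013..., never 0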
Chapter 13

Inverse and implicit functions

13.1 The inverse function theorem
We start with an exercise giving a very simple example of a technique that
mathematicians have used for centuries.
Exercise 13.1.1. We work in R.
(i) Suppose x and y are close to 1. If (x + η)² = y, show that

η ≈ (y − x²)/2.

(ii) This suggests the following method for finding the positive square root of y. Take x₀ close to the square root of y (for example x₀ = 1) and define xₙ inductively by xₙ = xₙ₋₁ + (y − xₙ₋₁²)/2. We expect xₙ to converge to the positive square root of y.
(a) Try this for y = 1.21 and for y = 0.81.
(b) Try this for y = 100.
(iii) Sketch x ↦ x − x²/2 for 1/2 ≤ x ≤ 3/2. Show that, if |y − 1| ≤ 1/2 and we define Tx by

Tx = x + (y − x²)/2,

then |Tx − 1| ≤ 1/2 whenever |x − 1| ≤ 1/2. Show that T : [1/2, 3/2] → [1/2, 3/2] is a contraction mapping and deduce that the proposed method for finding square roots works if |y − 1| ≤ 1/2.
(iv) Find a reasonable approximation to the positive square root of 150 by observing that

(150)^{1/2} = 12(150/144)^{1/2}

and using the method proposed in part (ii).
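For readers who want to try parts (ii) and (iv) without a pencil, here is a minimal Python sketch (ours, not part of the text) of the proposed iteration.

    def sqrt_iter(y, x0=1.0, steps=60):
        # x_n = x_{n-1} + (y - x_{n-1}^2)/2, the iteration of part (ii)
        x = x0
        for _ in range(steps):
            x = x + (y - x * x) / 2
        return x

    print(sqrt_iter(1.21), sqrt_iter(0.81))   # 1.1 and 0.9: here |y - 1| <= 1/2
    print(sqrt_iter(100.0))                   # diverges (to -inf): y too far from 1
    print(12 * sqrt_iter(150 / 144))          # part (iv): ~12.2474 ~ 150**0.5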


The discussion that follows may be considered as a natural extension of the ideas above. We work in Rⁿ with the usual Euclidean norm.

Lemma 13.1.2. Consider a function f : Rᵐ → Rᵐ such that f(0) = 0. Suppose that there exists a δ > 0 and an η with 1 > η > 0 such that

‖(f(x) − f(y)) − (x − y)‖ ≤ η‖x − y‖

for all ‖x‖, ‖y‖ ≤ δ. Then, if ‖y‖ ≤ (1 − η)δ, there exists one and only one solution of the equation

f(x) = y

with ‖x‖ < δ. Further, if we denote this solution by g(y), we have

‖g(y) − y‖ ≤ η(1 − η)⁻¹‖y‖.
Since Lemma 13.1.2 is most important to us when η is small, there is no harm in concentrating our attention on this case. Lemma 13.1.2 is then an instance of the

Slogan: Anything which is close to the identity has an inverse.

(This is a slogan, not a theorem. See, for example, Exercise 13.1.7.) The inequality

‖(f(x) − f(y)) − (x − y)‖ ≤ η‖x − y‖

is then seen as a perturbation of the trivial equality

‖(I(x) − I(y)) − (x − y)‖ = 0 = 0‖x − y‖.

If we try to solve the equation f(x) = y by the method of Exercise 13.1.1, we are led to the observation that, if such a solution exists, with value u, say, then, if x is close to u,

u − x ≈ f(u) − f(x) = y − f(x),

and so

u ≈ x + (y − f(x)).

This suggests that we should start with an x₀ close to the solution of the equation (for example x₀ = y) and define xₙ inductively by xₙ = Txₙ₋₁, where

Tx = x + (y − f(x)).

If we add the contraction mapping theorem as a further ingredient, we arrive at the following proof of Lemma 13.1.2.

Proof of Lemma 13.1.2. We set

X = B̄(0, δ) = {x ∈ Rᵐ : ‖x‖ ≤ δ}.

Since X is closed in Rᵐ, we know that (X, ‖·‖) is complete.
Let ‖y‖ ≤ (1 − η)δ. If we set

Tx = x + (y − f(x)),

then (since f(0) = 0)

‖Tx‖ ≤ ‖y‖ + ‖x − f(x)‖
     = ‖y‖ + ‖(f(x) − f(0)) − (x − 0)‖
     ≤ ‖y‖ + η‖x − 0‖
     ≤ (1 − η)δ + η‖x‖ ≤ δ,

whenever x ∈ X. Thus T is a well defined function T : X → X.
If x, x′ ∈ X, then

‖Tx − Tx′‖ = ‖(x − x′) − (f(x) − f(x′))‖ ≤ η‖x − x′‖,

by the hypotheses of the lemma. Thus T is a contraction mapping and has a unique fixed point u ∈ X such that

u = Tu = u + (y − f(u)),

or, equivalently,

f(u) = y,

as required. To obtain the final inequality, we return to the original proof of the contraction mapping theorem (Theorem 12.1.3). Observe, as we did there, that, if x ∈ X,

‖Tⁿx − Tⁿ⁻¹x‖ ≤ η‖Tⁿ⁻¹x − Tⁿ⁻²x‖ ≤ ηⁿ⁻¹‖Tx − x‖

and so

‖Tⁿx − x‖ ≤ Σⱼ₌₁ⁿ ‖Tʲx − Tʲ⁻¹x‖ ≤ Σⱼ₌₁ⁿ ηʲ⁻¹‖Tx − x‖ ≤ (1 − η)⁻¹‖Tx − x‖.

Since Tⁿx → u,

‖u − x‖ ≤ ‖u − Tⁿx‖ + ‖Tⁿx − x‖
        ≤ ‖u − Tⁿx‖ + (1 − η)⁻¹‖Tx − x‖ → (1 − η)⁻¹‖Tx − x‖

as n → ∞, and it follows that

‖u − x‖ ≤ (1 − η)⁻¹‖Tx − x‖.

If we take x = y, the last inequality takes the form

‖u − y‖ ≤ (1 − η)⁻¹‖Ty − y‖ = (1 − η)⁻¹‖y − f(y)‖
        = (1 − η)⁻¹‖(y − 0) − (f(y) − f(0))‖
        ≤ η(1 − η)⁻¹‖y − 0‖ = η(1 − η)⁻¹‖y‖,

as required.
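The proof is constructive and easy to run. The Python sketch below is our own illustration: the map f(x) = x + sin(x)/10 is a hypothetical example satisfying the hypothesis with η = 1/10, and we solve f(x) = y by iterating Tx = x + (y − f(x)) from x₀ = y, exactly as in the proof.

    import numpy as np

    def local_inverse(f, y, steps=50):
        # Iterate T x = x + (y - f(x)); a contraction when f is near the identity.
        x = y.copy()
        for _ in range(steps):
            x = x + (y - f(x))
        return x

    f = lambda x: x + np.sin(x) / 10          # hypothetical map with f(0) = 0
    y = np.array([0.3, -0.2])
    x = local_inverse(f, y)
    print(x, np.linalg.norm(f(x) - y))        # residual ~ 1e-16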

Exercise 13.1.3. Let (X, d) be a metric space (not necessarily complete) and T : X → X a mapping such that d(Tx, Ty) ≤ Kd(x, y) for all x, y ∈ X and some K < 1. Suppose that T has a fixed point w. If x₀ ∈ X and we define xₙ inductively by xₙ₊₁ = Txₙ, show that d(x₀, w) ≤ (1 − K)⁻¹d(x₀, x₁).

Lemma 13.1.2 provides the core of the proof of our next result. Here, and for the rest of the chapter, we use, in addition to the Euclidean norm on Rᵐ, the operator norm on the space of linear maps L(Rᵐ, Rᵐ).

Lemma 13.1.4. Consider a function f : Rᵐ → Rᵐ such that f(0) = 0 and there exists a δ₀ > 0 such that f is differentiable in the open ball B(0, δ₀). If Df is continuous at 0 and Df(0) = I (the identity map), then we can find a δ₁ with δ₀ ≥ δ₁ > 0 and a ρ > 0 such that, if ‖y‖ ≤ ρ, there exists one and only one solution of the equation

f(x) = y

with ‖x‖ < δ₁. Further, if we denote this solution by g(y), the function g is differentiable at 0 with Dg(0) = I.

Proof. Since Df is continuous at 0 and Df(0) = I, we can find a δ₁ > 0 such that δ₀ > δ₁ > 0 and

‖Df(w) − I‖ < 1/2

for ‖w‖ ≤ δ₁. Applying the mean value inequality to the function h = f − I, we obtain

‖(f(x) − f(y)) − (x − y)‖ = ‖h(x) − h(y)‖ ≤ ‖x − y‖ sup_{‖w‖≤δ₁} ‖Dh(w)‖ = ‖x − y‖ sup_{‖w‖≤δ₁} ‖Df(w) − I‖ ≤ ‖x − y‖/2

for all ‖x‖, ‖y‖ ≤ δ₁. Setting ρ = δ₁/2, we see, by Lemma 13.1.2, that, if ‖y‖ ≤ ρ, there exists one and only one solution of the equation

f(x) = y

with ‖x‖ < δ₁. For the rest of the proof we denote this solution by g(y).
To discuss the behaviour of g near 0, we echo the discussion of the first paragraph. Let η > 0 be given. Since Df is continuous at 0 and Df(0) = I, we can find a δ(η) > 0 such that δ₁ > δ(η) > 0 and

‖Df(w) − I‖ < η

for ‖w‖ ≤ δ(η). By exactly the same reasoning as in the first paragraph,

‖(f(x) − f(y)) − (x − y)‖ ≤ η‖x − y‖

for all ‖x‖, ‖y‖ ≤ δ(η). The last sentence of Lemma 13.1.2 now tells us that, if ‖y‖ ≤ (1 − η)δ(η), the unique solution of the equation

f(x) = y

with ‖x‖ < δ(η), which we have already agreed at the end of the previous paragraph to denote by g(y), satisfies

‖g(y) − y‖ ≤ η(1 − η)⁻¹‖y‖.

In less roundabout language,

‖g(y) − y‖ ≤ η(1 − η)⁻¹‖y‖

whenever ‖y‖ ≤ (1 − η)δ(η). Since f(0) = 0, we have g(0) = 0 and

g(y) − g(0) = Iy + ε(y)‖y‖

with ε(y) → 0 as y → 0. Thus g is differentiable at 0 with derivative I.

Exercise 13.1.5. In the second paragraph of the proof just given it says 'By exactly the same reasoning as in the first paragraph'. Fill in the details.

Exercise 13.1.6. The only point of proving Lemma 13.1.2 was to expose the inner workings of the proof of Lemma 13.1.4. Prove Lemma 13.1.4 directly. (The proof is essentially the same, but combining the proofs of the two lemmas is more economical.)

The next exercise shows that simply knowing that f is differentiable at 0 with derivative I is not enough to give the existence of a local inverse.

Exercise 13.1.7. (i) Let f : R → R be given by f(x) = x + x² sin(1/x) for x ≠ 0, f(0) = 0. Show, by using the definition of differentiability, or otherwise, that f is differentiable at 0 with f'(0) = 1. By considering the maxima and minima of f, or otherwise, show that there exists a sequence of non-zero yₙ → 0 such that the equation f(x) = yₙ has more than one solution.
(ii) Let f : R → R be given by

f(x) = 1 for x > 1,
f(x) = 1/n for 1/n ≥ x > 1/(n + 1), n a strictly positive integer,
f(0) = 0,
f(x) = −f(−x) for x < 0.

Show that f is differentiable at 0 with f'(0) = 1 but that there exists a sequence of non-zero yₙ → 0 such that the equation f(x) = yₙ has no solution.

It might be thought that Lemma 13.1.4 refers to a very special situation, but we can now apply another

Slogan: Anything which works for the identity will work, in a suitable form, for invertible elements. Moreover, the general case can be obtained from the special case of the identity.

Lemma 13.1.8. Consider a function f : Rᵐ → Rᵐ such that f(0) = 0 and there exists a δ₀ > 0 such that f is differentiable in the open ball B(0, δ₀). If Df is continuous at 0 and Df(0) is invertible, then we can find a δ₁ with δ₀ ≥ δ₁ > 0 and a ρ > 0 such that, if ‖y‖ ≤ ρ, there exists one and only one solution of the equation

f(x) = y

with ‖x‖ < δ₁. Further, if we denote this solution by g(y), the function g is differentiable at 0 with Dg(0) = Df(0)⁻¹.

Proof. Set α = Df(0) and

F(x) = α⁻¹f(x).

The proof now runs along totally predictable and easy lines.

By using standard chain rules (or simple direct calculations), the function F : Rᵐ → Rᵐ satisfies F(0) = 0 and F is differentiable in the open ball B(0, δ₀) with derivative DF given by

DF(x) = α⁻¹Df(x).

Thus DF is continuous at 0 and DF(0) = I. It follows, by Lemma 13.1.4, that there exists a δ₁ with δ₀ ≥ δ₁ > 0 and a ρ̃ > 0 such that, if ‖ỹ‖ ≤ ρ̃, there exists one and only one solution of the equation

F(x) = ỹ

with ‖x‖ < δ₁. Further, if we denote this solution by G(ỹ), the function G is differentiable at 0 with DG(0) = I.
This equation can be rewritten as

α⁻¹f(x) = ỹ.

Taking y = αỹ, and writing

g(t) = G(α⁻¹t),

we can rewrite the conclusion of the last paragraph as follows. If ‖α⁻¹y‖ ≤ ρ̃, there exists one and only one solution of the equation

f(x) = y

with ‖x‖ < δ₁. Further, if we denote this solution by g(y), the function g is differentiable at 0 with Dg(0) = α⁻¹ = Df(0)⁻¹.
If we set ρ = ‖α⁻¹‖⁻¹ρ̃ then, whenever ‖y‖ ≤ ρ, we have

‖α⁻¹y‖ ≤ ‖α⁻¹‖‖y‖ ≤ ρ̃

and we have obtained all the conclusions of the lemma.
A simple translation argument transforms Lemma 13.1.8 into an apparently more general form.
Lemma 13.1.9. Consider a function f : Rᵐ → Rᵐ and a w ∈ Rᵐ such that there exists a δ₀ > 0 such that f is differentiable in the open ball B(w, δ₀). If Df is continuous at w and Df(w) is invertible, then we can find a δ₁ with δ₀ ≥ δ₁ > 0 and a ρ > 0 such that, if ‖y − f(w)‖ ≤ ρ, there exists one and only one solution of the equation

f(x) = y

with ‖x − w‖ < δ₁. Further, if we denote this solution by g(y), the function g is differentiable at f(w) with Dg(f(w)) = Df(w)⁻¹.

Exercise 13.1.10. Prove Lemma 13.1.9 from Lemma 13.1.8.
We have now done most of the hard work of this section, but Lemma 13.1.9 can be made much more useful if we strengthen its hypotheses.

Slogan: Analysis is done not at points but on open sets.

Our first two results are preliminary.
Lemma 13.1.11. (i) We write ι for the identity map ι : Rⁿ → Rⁿ. If α : Rⁿ → Rⁿ is linear and ‖ι − α‖ < 1, then α is invertible.
(ii) Suppose β, γ : Rⁿ → Rⁿ are linear and β is invertible. If ‖β − γ‖ < ‖β⁻¹‖⁻¹, then γ is invertible.
(iii) Consider a function f : Rᵐ → Rᵐ which is differentiable on an open set U. If Df is continuous and invertible at a point u₀ ∈ U, then we can find an open set B ⊆ U with u₀ ∈ B such that Df is invertible at every point of B.
Proof. (i) Since we are working with finite dimensional vector spaces, α is invertible if and only if it is injective, and α is injective if and only if ker α = {0}. If x ≠ 0, then

‖αx‖ = ‖x − (ι − α)x‖ ≥ ‖x‖ − ‖(ι − α)x‖ ≥ ‖x‖ − ‖ι − α‖‖x‖ > 0,

and so αx ≠ 0. The result follows.
(ii) Observe that

‖ι − β⁻¹γ‖ = ‖β⁻¹(β − γ)‖ ≤ ‖β⁻¹‖‖β − γ‖ < 1

and so β⁻¹γ is invertible. Write θ = (β⁻¹γ)⁻¹ and observe that (θβ⁻¹)γ = θ(β⁻¹γ) = ι. Thus, since we are dealing with finite dimensional spaces, γ is invertible with inverse θβ⁻¹.
(iii) By the continuity of Df, we can find an open set B ⊆ U with u₀ ∈ B such that

‖Df(u₀) − Df(u)‖ < ‖Df(u₀)⁻¹‖⁻¹

for all u ∈ B. The stated result follows from part (ii).
(Both Lemma 13.1.11 and its proof may be classified as 'quick and dirty'. A more leisurely approach, which reveals much more of what is actually going on, is given in Exercise K.281.)
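One standard quantitative way to see part (i), not the route taken in the text, is the Neumann series: if ‖ι − α‖ < 1 then Σₖ (ι − α)ᵏ converges to α⁻¹. A small numerical sketch of ours:

    import numpy as np

    rng = np.random.default_rng(0)
    E = 0.1 * rng.standard_normal((3, 3))       # perturbation with ||E|| < 1
    A = np.eye(3) - E                           # so ||I - A|| = ||E|| < 1
    S, term = np.eye(3), np.eye(3)
    for _ in range(200):                        # partial sums of sum_k E^k
        term = term @ E
        S += term
    print(np.max(np.abs(S - np.linalg.inv(A)))) # ~1e-16: S is the inverse of A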
Lemma 13.1.12. Consider a function f : Rᵐ → Rᵐ which is differentiable on an open set U. If Df is continuous and invertible at every point of U, then f(U) is open.

Proof. Suppose w ∈ U. Since U is open, we can find a δ₀ > 0 such that the open ball B(w, δ₀) is a subset of U. By hypothesis, Df is continuous at w and Df(w) is invertible. Thus, by Lemma 13.1.9, we can find a δ₁ with δ₀ ≥ δ₁ > 0 and a ρ > 0 such that, if ‖y − f(w)‖ ≤ ρ, there exists a solution of the equation

f(x) = y

with ‖x − w‖ < δ₁. It follows that

B(f(w), ρ) ⊆ f(B(w, δ₀)) ⊆ f(U).

We have shown that f(U) is open.

Theorem 13.1.13. (Inverse function theorem.) Consider a function f : Rᵐ → Rᵐ which is differentiable on an open set U. Suppose, further, that Df is continuous at every point of U, that w ∈ U and that Df(w) is invertible. Then we can find an open set B ⊆ U with w ∈ B and an open set V such that
(i) f|B : B → V is bijective,
(ii) (f|B)⁻¹ : V → B is differentiable with

D(f|B)⁻¹(f(u)) = (Df(u))⁻¹

for all u ∈ B.

Proof. Suppose w ∈ U. By Lemma 13.1.11, we can find a δ₀ > 0 such that the open ball B(w, δ₀) is a subset of U and Df(u) is invertible at every point u ∈ B(w, δ₀).
We now use the same argument which we used in Lemma 13.1.12. We know that Df is continuous at w and Df(w) is invertible. Thus, by Lemma 13.1.9, we can find a δ₁ with δ₀ ≥ δ₁ > 0 and a ρ > 0 such that, if ‖y − f(w)‖ ≤ ρ, there exists one and only one solution of the equation

f(x) = y

with ‖x − w‖ < δ₁. Set B = B(w, δ₁) and apply Lemma 13.1.12 and Lemma 13.1.9.

A slight strengthening of Theorem 13.1.13 is given in Exercise K.288. The following cluster of easy exercises is intended to illuminate various aspects of the inverse function theorem.

Exercise 13.1.14. Suppose U and V are open subsets of Rᵐ and f : U → V, g : V → U are such that g ∘ f is the identity map on U. Show that, if f is differentiable at u ∈ U and g is differentiable at f(u), then (Df)(u) and (Dg)(f(u)) are invertible and

(Dg)(f(u)) = ((Df)(u))⁻¹.
Exercise 13.1.15. (A traditional examination question.) Let f : R → R be given by f(x) = x³. Show that f is bijective but f'(0) = 0 (so the derivative of f at 0 is not invertible).
Exercise 13.1.16. Consider the open interval (−4, 4). Find f((−4, 4)) for the functions f : R → R given by
(i) f(x) = sin x,
(ii) f(x) = sin 10⁻²x,
(iii) f(x) = x²,
(iv) f(x) = x³ − x.
In each case comment briefly on the relation of your result to Lemma 13.1.12.
Exercise 13.1.17. Let

U = {(x, y) ∈ R² : 1 < x² + y² < 2}

and define f : U → R² by

f(x, y) = ((x² − y²)/(x² + y²)^{1/2}, 2xy/(x² + y²)^{1/2}).

(i) Show that U is open.
(ii) Show that f is differentiable on U and that Df is continuous and invertible at every point of U.
(iii) By using polar coordinates, discover why f is defined as it is. Show that f(U) = U but f is not injective.
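In polar coordinates this map sends (r, θ) to (r, 2θ), so it wraps the annulus around itself twice; in particular f(−x, −y) = f(x, y). A quick numerical illustration (ours, not the text's):

    import math

    def f(x, y):
        r = math.hypot(x, y)
        return ((x * x - y * y) / r, 2 * x * y / r)

    r, th = 1.2, 0.7                             # a point of the annulus 1 < r^2 < 2
    p = f(r * math.cos(th), r * math.sin(th))
    q = f(-r * math.cos(th), -r * math.sin(th))  # the antipodal point
    print(p, q)                                  # identical images: f is not injective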
Once the ideas behind the proof of the inverse function theorem are understood, it can be condensed into a very short argument. In Dieudonné's account the essential content of both this section and the next is stated and proved in more general form in about two pages ([13] Chapter X, Theorem 10.2.1). We leave it to the reader to undertake this condensation¹. The usual approach to the inverse function theorem uses the contraction mapping theorem in a rather more subtle way than the approach adopted here. I outline the alternative approach in Appendix F.
¹ There is a Cambridge story about an eminent algebraic geometer who presented his subject entirely without diagrams. However, from time to time, when things got difficult, he would hide part of the blackboard, engage in some rapid but hidden chalk work, rub out the result and continue.

13.2 The implicit function theorem ♥
The contents of this section really belong to a first course in differential geometry. However, there is an old mathematical tradition called 'pass the parcel' by which lecturers assume that all the hard but necessary preliminary work has been done 'in a previous course'². In accordance with this tradition, lecturers in differential geometry frequently leave the proof of the implicit function theorem in the hands of an, often mythical, earlier lecturer in analysis. Even when the earlier lecturer actually exists, this has the effect of first exposing the students to a proof of a result whose use they do not understand and then making them use a result whose proof they have forgotten.
My advice to the reader, as often in this book, is not to take this section too seriously. It is far more important to make sure that you are confident of the meaning and proof of the inverse function theorem than that you worry about the details of this section.
Consider the function h : R² → R given by h(x, y) = x² + y². We know that the contour (or level) line

x² + y² = 1

can be represented by a graph

y = (1 − x²)^{1/2}

close to the point (x, y) = (0, 1). We also know that this representation fails close to the point (x, y) = (1, 0), but that near that point we can use the representation

x = (1 − y²)^{1/2}.
Leaping rapidly to a conclusion, we obtain the following slogan.

Slogan: If f : R² → R behaves well in a neighbourhood of a point (x₀, y₀), then at least one of the following two statements must be true.
(a) There exists a δ > 0 and a well behaved bijective function g : (−δ, δ) → R such that g(0) = y₀ and f(x + x₀, g(x)) = f(x₀, y₀) for all x ∈ (−δ, δ).
(b) There exists a δ > 0 and a well behaved bijective function g : (−δ, δ) → R such that g(0) = x₀ and f(g(y), y + y₀) = f(x₀, y₀) for all y ∈ (−δ, δ).
² Particular topics handled in this way include determinants, the Jordan normal form, uniqueness of prime factorisation and various important inequalities.




Figure 13.1: Problems at a saddle

Exercise 13.2.1. Consider the contour x³ − y² = 0. Show that we can find g₁ : R → R and g₂ : R → R such that

x³ − g₁(x)² = 0 for all x ∈ R,
g₂(y)³ − y² = 0 for all y ∈ R,

but g₁ is differentiable everywhere and g₂ is not. Explain in simple terms why this is the case.

One way of looking at our slogan is to consider a walker on a hill whose height is given by f(x, y) at a point (x, y) on a map. The walker seeks to walk along a path of constant height. She is clearly going to have problems if she starts at a strict maximum, since a step in any direction takes her downward. A similar difficulty occurs at a strict minimum. A different problem occurs at a saddle point (see Figure 13.1). It appears from the picture that, if f is well behaved, there are not one but two paths of constant height passing through a saddle point and this will create substantial difficulties³. However, these are the only points which present problems.
Another way to look at our slogan is to treat it as a problem in the calculus (our arguments will, however, continue to be informal). Suppose that

f(x, g(x)) = f(x₀, y₀).

Assuming that everything is well behaved, we can differentiate with respect to x, obtaining

f,₁(x, g(x)) + g'(x)f,₂(x, g(x)) = 0
³ Remember Buridan's ass which, placed between two equally attractive bundles of hay, starved to death because it was unable to find a reason for starting on one bundle rather than the other.

and so, provided that f,₂(x, y) ≠ 0,

g'(x) = −f,₁(x, g(x))/f,₂(x, g(x)).

Thus, provided that f is sufficiently well behaved and

f,₂(x₀, y₀) ≠ 0,

our earlier work on differential equations tells us that there exists a local solution for g.
The two previous paragraphs tend to confirm the truth of our slogan and show that there exist local contour lines in the neighbourhood of any point (x₀, y₀) where at least one of f,₁(x₀, y₀) and f,₂(x₀, y₀) does not vanish. We shall not seek to establish the global existence of contour lines⁴. Instead we seek to extend the ideas of our slogan to higher dimensions.
We argue informally, assuming good behaviour as required. Consider a function f : Rᵐ × Rⁿ → Rⁿ. Without loss of generality, we may suppose that f(0, 0) = 0. An appropriate generalisation of the questions considered above is to ask about solutions to the equation

f(x, g_h(x)) = h    (1)

where h is fixed. (Note that x ∈ Rᵐ, g_h(x) ∈ Rⁿ and h ∈ Rⁿ. Since everything is local, we suppose h is small and we only consider (x, g_h(x)) close to (0, 0).) Before proceeding to the next paragraph, the reader should convince herself that the question we have asked is a natural one.
The key step in resolving this problem is to rewrite it. Define f̃, g̃ : Rᵐ × Rⁿ → Rᵐ × Rⁿ by

f̃(x, y) = (x, f(x, y)),
g̃(x, y) = (x, g_y(x)).

If equation (1) holds, then

f̃(g̃(x, h)) = f̃(x, g_h(x)) = (x, f(x, g_h(x))) = (x, h),

and so

f̃(g̃(x, h)) = (x, h).    (2)
⁴ Obviously the kind of ideas considered in Section 12.3 will play an important role in such an investigation. One obvious problem is that, when we look at a contour in one location, there is no way of telling if it will not pass through a saddle point at another.

Conversely, if equation (2) holds, then equation (1) follows.
To solve equation (2) we need to use the inverse mapping results of the previous section and, to use those results, we need to know if Df̃(x, y) is invertible at a given point (x, y). Now

Df̃(x, y)(u, v) = Df(x, y)(u, v) + u,

and so Df̃(x, y) is invertible if and only if the linear map α : Rⁿ → Rⁿ, given by

αv = Df(x, y)(0, v),

is invertible. (We present two proofs of this last statement in the next two exercises. Both exercises take some time to state and hardly any time to do.)
Exercise 13.2.2. In this exercise we use column vectors. Thus we write

f̃((x, y)ᵀ) = (x, f(x, y))ᵀ,   g̃((x, y)ᵀ) = (x, g_y(x))ᵀ.

Suppose that Df((x, y)ᵀ) has matrix C with respect to the standard basis. Explain why C is an n × (n + m) matrix, which we can therefore write as

C = (B A),

with A an n × n matrix and B an n × m matrix.
Show that Df̃((x, y)ᵀ) has matrix

E = ( I  0 )
    ( B  A ),

where 0 is the m × n matrix consisting entirely of zeros and I is the m × m identity matrix. By considering det E, or otherwise, show that E is invertible if and only if A is. Show that A is the matrix of the linear map α : Rⁿ → Rⁿ given by

αv = Df((x, y)ᵀ)(0, v)ᵀ.
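A quick numerical sanity check of this exercise (ours; the blocks are randomly made up): the determinant of the block lower triangular matrix E equals that of A, so E is invertible exactly when A is.

    import numpy as np

    rng = np.random.default_rng(1)
    m, n = 3, 2
    Amat = rng.standard_normal((n, n))
    Bmat = rng.standard_normal((n, m))
    E = np.block([[np.eye(m), np.zeros((m, n))],
                  [Bmat, Amat]])
    print(np.linalg.det(E), np.linalg.det(Amat))   # equal values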
Exercise 13.2.3. Let W be the direct sum of two subspaces U and V. If γ : W → V is a linear map, show that the map α : V → V, given by αv = γv for all v ∈ V, is linear.
Now define γ̃ : W → W by γ̃(u + v) = γ(u + v) + u for u ∈ U, v ∈ V. Show that γ̃ is a well defined linear map, that ker(γ̃) = ker(α) and that γ̃(W) = U + α(V). Conclude that γ̃ is invertible if and only if α is.

We can now pull the strands together.

Theorem 13.2.4. (Implicit function theorem.) Consider a function f : Rᵐ × Rⁿ → Rⁿ which is differentiable on an open set U. Suppose further that Df is continuous at every point of U, that (x₀, y₀) ∈ U and that the linear map α : Rⁿ → Rⁿ given by

αt = Df(x₀, y₀)(0, t)

is invertible. Then we can find an open set B₁ in Rᵐ, with x₀ ∈ B₁, and an open set B₂ in Rⁿ, with f(x₀, y₀) ∈ B₂, such that, if z ∈ B₂, there exists a differentiable map g_z : B₁ → Rⁿ with (x, g_z(x)) ∈ U and

f(x, g_z(x)) = z

for all x ∈ B₁.

(We give a slight improvement on this result in Theorem 13.2.9, but Theorem 13.2.4 contains the essential result.)
Proof. Define f̃ : Rᵐ × Rⁿ → Rᵐ × Rⁿ by

f̃(x, y) = (x, f(x, y)).

If (x, y) ∈ U, then Df̃(x, y) exists and

Df̃(x, y)(s, t) = Df(x, y)(s, t) + s.

In particular, since α is invertible, Df̃(x₀, y₀) is invertible. It follows, by the inverse function theorem (Theorem 13.1.13), that we can find an open set B ⊆ U with (x₀, y₀) ∈ B and an open set V such that
(i) f̃|B : B → V is bijective, and
(ii) (f̃|B)⁻¹ : V → B is differentiable.
Let us define G : V → Rᵐ and g : V → Rⁿ by

(f̃|B)⁻¹(v) = (G(v), g(v))

for all v ∈ V. We observe that g is everywhere differentiable on V. By definition,

(x, z) = f̃|B((f̃|B)⁻¹(x, z))
       = f̃|B(G(x, z), g(x, z))
       = (G(x, z), f(G(x, z), g(x, z)))

and so

G(x, z) = x,

and

z = f(x, g(x, z))

for all (x, z) ∈ V.
We know that V is open and (x₀, f(x₀, y₀)) ∈ V, so we can find an open set B₁ in Rᵐ, with x₀ ∈ B₁, and an open set B₂ in Rⁿ, with f(x₀, y₀) ∈ B₂, such that B₁ × B₂ ⊆ V. Setting

g_z(x) = g(x, z)

for all x ∈ B₁ and z ∈ B₂, we have the required result.
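The theorem is as constructive as the inverse function theorem it rests on. As a concrete sketch (our own toy example, with f(x, y) = x² + y² near (x₀, y₀) = (0, 1)), the kind of iteration used in this chapter, rescaled by the invertible α = f,₂(0, 1) = 2, computes g_z numerically:

    def g(x, z, steps=100):
        # Solve f(x, y) = z for y near y0 = 1, where f(x, y) = x^2 + y^2:
        # iterate y <- y - (f(x, y) - z)/2, dividing by alpha = f_{,2}(0, 1) = 2.
        y = 1.0
        for _ in range(steps):
            y = y - (x * x + y * y - z) / 2.0
        return y

    print(g(0.1, 1.0))   # 0.99499 = (1 - 0.01)**0.5: the circle as a graph
    print(g(0.0, 1.1))   # 1.04881 = 1.1**0.5: nearby level sets, as in the theorem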

Remark 1: We obtained the implicit function theorem from the inverse function theorem by introducing new functions f̃ and g̃. There is, however, no reason why we should not obtain the implicit function theorem directly, by following the same kind of method as we used to prove the inverse function theorem. Here is the analogue of Lemma 13.1.2.

Lemma 13.2.5. Consider a function f : Rᵐ × Rⁿ → Rⁿ such that f(0, 0) = 0. Suppose that there exists a δ > 0 and an η with 1 > η > 0 such that

‖(f(x, s) − f(x, t)) − (s − t)‖ ≤ η‖s − t‖

for all ‖x‖, ‖s‖, ‖t‖ ≤ δ [x ∈ Rᵐ, s, t ∈ Rⁿ]. Then, if ‖h‖ ≤ (1 − η)δ, there exists one and only one solution of the equation

f(x, u) = h

with ‖u‖ < δ. Further, if we denote this solution by g_h(x), we have

‖g_h(x) − h‖ ≤ η(1 − η)⁻¹‖h‖.

Exercise 13.2.6. Prove Lemma 13.2.5. Sketch, giving as much or as little detail as you wish, the steps from Lemma 13.2.5 to Theorem 13.2.4. You should convince yourself that the inverse function theorem and the implicit function theorem are fingers of the same hand.

Remark 2: In the introduction to this section we obtained contours as the solutions of an ordinary differential equation of the form

g'(x) = −f,₁(x, g(x))/f,₂(x, g(x)).

The situation for the general implicit function theorem is genuinely more complicated. Suppose, for example, that we have a function f : R⁴ → R² and we wish to find (u, v) : R² → R² so that (at least locally)

f(x, y, u(x, y), v(x, y)) = h,

that is,

f₁(x, y, u(x, y), v(x, y)) = h₁,
f₂(x, y, u(x, y), v(x, y)) = h₂.

On differentiating, we obtain

f₁,₁(x, y, u(x, y), v(x, y)) + f₁,₃(x, y, u(x, y), v(x, y)) ∂u/∂x + f₁,₄(x, y, u(x, y), v(x, y)) ∂v/∂x = 0,
f₁,₂(x, y, u(x, y), v(x, y)) + f₁,₃(x, y, u(x, y), v(x, y)) ∂u/∂y + f₁,₄(x, y, u(x, y), v(x, y)) ∂v/∂y = 0,
f₂,₁(x, y, u(x, y), v(x, y)) + f₂,₃(x, y, u(x, y), v(x, y)) ∂u/∂x + f₂,₄(x, y, u(x, y), v(x, y)) ∂v/∂x = 0,
f₂,₂(x, y, u(x, y), v(x, y)) + f₂,₃(x, y, u(x, y), v(x, y)) ∂u/∂y + f₂,₄(x, y, u(x, y), v(x, y)) ∂v/∂y = 0,

so we are faced with a system of simultaneous partial differential equations. (For a very slightly different view of the matter, see Exercise E.2.)
Remark 3: We proved Theorem 13.2.4 under the assumption that the linear map α : Rⁿ → Rⁿ given by

αt = Df(x₀, y₀)(0, t)

is invertible.
This condition gives unnecessary prominence to the last n coordinates. To get round this, we introduce a new definition.
Definition 13.2.7. We say that a linear map β : Rⁿ⁺ᵐ → Rⁿ has full rank if β is surjective.

The next, very easy, lemma gives some context.

Lemma 13.2.8. Suppose that a linear map β : Rⁿ⁺ᵐ → Rⁿ has matrix B with respect to standard coordinates. (We use column vectors.) Then the following conditions are equivalent.
(i) β has full rank.
(ii) B has n linearly independent columns.
(iii) By permuting the coordinate axes, we can ensure that the last n columns of B are linearly independent.
(iv) By permuting the coordinate axes, we can ensure that the linear map α : Rⁿ → Rⁿ given by

αt = β(0, t)

is invertible.
Proof. To see that (i) implies (ii), observe that

rank β = dim βRⁿ⁺ᵐ = dim span{b₁, b₂, ..., bₙ₊ₘ},

where bⱼ is the jth column of B. If β has full rank then, since any spanning set contains a basis, n of the bⱼ must be linearly independent.
It is clear that (ii) implies (iii). To see that (iii) implies (iv), observe that, if α has matrix A with respect to standard coordinates after our permutation, then A is an n × n matrix whose columns are the last n columns of B and so linearly independent. Thus A is invertible and so α is.
To see that (iv) implies (i), observe that

{b₁, b₂, ..., bₙ₊ₘ} ⊇ {a₁, a₂, ..., aₙ},

where aₖ is the kth column of A, the matrix of α with respect to standard coordinates. Thus

Rⁿ ⊇ span{b₁, b₂, ..., bₙ₊ₘ} ⊇ span{a₁, a₂, ..., aₙ} = Rⁿ,

so βRⁿ⁺ᵐ = Rⁿ and β has full rank.
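A small numerical illustration of the lemma (ours; the matrix is made up): a 2 × 5 matrix whose first four columns are all multiples of one another still has full rank if its last column breaks the pattern, and permuting columns then makes the trailing 2 × 2 block invertible.

    import numpy as np

    B = np.array([[1., 2., 3., 4., 5.],
                  [2., 4., 6., 8., 11.]])      # beta : R^5 -> R^2
    print(np.linalg.matrix_rank(B))            # 2, so beta has full rank
    A = B[:, [3, 4]]                           # last two columns, after permuting
    print(np.linalg.det(A))                    # 4.0 != 0: alpha is invertible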
The appropriate modification of Theorem 13.2.4 is now obvious.
Theorem 13.2.9. Consider a function f : Rᵐ⁺ⁿ → Rⁿ which is differentiable on an open set U. Suppose further that Df is continuous at every point of U, that w₀ ∈ U and that Df(w₀) has full rank. Then we can permute the coordinate axes in such a way that, if we write w₀ = (x₀, y₀) ∈ Rᵐ × Rⁿ, we can find an open set B₁ in Rᵐ, with x₀ ∈ B₁, and an open set B₂ in Rⁿ, with f(w₀) ∈ B₂, such that, if z ∈ B₂, there exists a differentiable map g_z : B₁ → Rⁿ with (x, g_z(x)) ∈ U and

f(x, g_z(x)) = z

for all x ∈ B₁.

Remark 4: In Appendix C, I emphasised the importance of a large stock of examples, but I talked mainly about examples of badly behaved functions. It is also true that mathematicians find it hard to lay their hands on a sufficiently diverse collection of well behaved functions. In this book we started with the polynomials. To these we added the function x ↦ 1/x and got the rational functions x ↦ P(x)/Q(x) with P and Q polynomials. We then added power series and solutions of differential equations. The inverse function theorem gives us a new collection of interesting functions and the implicit function theorem greatly extends this collection. The implicit function theorem is particularly important in giving us interesting examples of well behaved functions f : Rⁿ → R when n ≥ 2.


13.3 Lagrange multipliers ♥
Suppose that t : R⁴ → R and f : R⁴ → R² are well behaved functions and we wish to maximise t(x) subject to the condition f(x) = h.
This will involve a fair amount of calculation, so it seems reasonable to simplify our task as much as possible. One obvious simplification is to translate so that h = (0, 0) and f(0, 0, 0, 0) = (0, 0). We now wish to investigate whether x = (0, 0, 0, 0) maximises t(x) subject to the condition f(x) = (0, 0).
A more interesting simplification is to observe that, since we are assuming good behaviour, Df(0) will have full rank. We can therefore find invertible linear maps α : R⁴ → R⁴ and β : R² → R² such that βDf(0)α has matrix

J = ( 1 0 0 0 )
    ( 0 1 0 0 )

with respect to the standard basis (and using the column vector representation).
If we set

F(x) = β(f(αx)),  T(x) = t(αx),

then our problem becomes that of investigating whether x = (0, 0, 0, 0) maximises T(x), subject to the condition F(x) = (0, 0). By the implicit function theorem, we can find well behaved u, v : R² → R such that u(0, 0) = v(0, 0) = 0 and (at least if z and w are small)

F(u(z, w), v(z, w), z, w) = (0, 0).

By the chain rule, DF(0) has matrix J with respect to the standard basis (using the column vector representation) and so

u,₁(0, 0) = u,₂(0, 0) = v,₁(0, 0) = v,₂(0, 0) = 0.

Before proceeding further the reader should do the next exercise.

Exercise 13.3.1. Explain why our conditions on F tell us that, near (0, 0, 0, 0),

F(x, y, z, w) ≈ (x, y).

Suppose that, in fact, F(x, y, z, w) = (x, y) exactly. Find u and v. What conditions on T ensure that x = (0, 0, 0, 0) maximises T(x) subject to the condition F(x) = (0, 0)?

Since (at least locally) F(u(z, w), v(z, w), z, w) = (0, 0), our problem reduces to studying whether (z, w) = (0, 0) maximises τ(z, w) = T(u(z, w), v(z, w), z, w). Since every maximum is a stationary point, we first ask if (z, w) = (0, 0) is a stationary point, that is, if

∂τ/∂z (0, 0) = ∂τ/∂w (0, 0) = 0.

By the chain rule,

∂τ/∂z (z, w) = T,₁(u(z, w), v(z, w), z, w) ∂u/∂z (z, w) + T,₂(u(z, w), v(z, w), z, w) ∂v/∂z (z, w) + T,₃(u(z, w), v(z, w), z, w),

and so

∂τ/∂z (0, 0) = T,₃(0, 0, 0, 0).

A similar calculation gives

∂τ/∂w (0, 0) = T,₄(0, 0, 0, 0),

so (0, 0) is a stationary point of τ if and only if

T,₃(0, 0, 0, 0) = T,₄(0, 0, 0, 0) = 0.

We have found a necessary condition for (0, 0, 0, 0) to maximise T(x) subject to F(x) = 0, and so for (0, 0, 0, 0) to maximise t(x) subject to f(x) = 0.
In order to make this condition usable, we must translate it back into the terms of our original problem, and this is most easily done by expressing it in coordinate-free notation. Observe that, if we write

l(x) = T(x) − T,₁(0)F₁(x) − T,₂(0)F₂(x),

then

l,₁(0) = T,₁(0) − T,₁(0)F₁,₁(0) − T,₂(0)F₂,₁(0) = T,₁(0) − T,₁(0) = 0,
l,₂(0) = T,₂(0) − T,₁(0)F₁,₂(0) − T,₂(0)F₂,₂(0) = T,₂(0) − T,₂(0) = 0,
l,₃(0) = T,₃(0) − T,₁(0)F₁,₃(0) − T,₂(0)F₂,₃(0) = T,₃(0),
l,₄(0) = T,₄(0) − T,₁(0)F₁,₄(0) − T,₂(0)F₂,₄(0) = T,₄(0),

so, if (0, 0, 0, 0) is a maximum,

D(T − T,₁(0)F₁ − T,₂(0)F₂)(0) = 0.

Thus, if (0, 0, 0, 0) gives a maximum and θ : R² → R is the linear map given by θ(x, y) = T,₁(0)x + T,₂(0)y, we have

D(T − θF)(0) = 0.

Using the definitions of T and F, this gives

D(tα − θβfα)(0) = 0,

so, by the chain rule,

(Dt(0) − θβDf(0))α = 0.

Since α is invertible, this is equivalent to saying

Dt(0) − θβDf(0) = 0,

and, since β is invertible, this, in turn, is equivalent to saying that there exists a linear map φ : R² → R such that

Dt(0) − φDf(0) = 0,

so, using the chain rule once again,

D(t − φf)(0) = 0.

Writing φ(x, y) = λ₁x + λ₂y, we see that x = (0, 0, 0, 0) maximises t(x), subject to the condition f(x) = (0, 0), only if we can find λ₁ and λ₂ such that t − λ₁f₁ − λ₂f₂ has (0, 0, 0, 0) as a stationary point.
The argument generalises easily to any number of dimensions.

Exercise 13.3.2. Prove Lemma 13.3.3.

Lemma 13.3.3. (Lagrangian necessity.) Consider functions f : Rᵐ⁺ⁿ → Rⁿ and t : Rᵐ⁺ⁿ → R which are differentiable on an open set U. Suppose further that Df is continuous and of full rank at every point of U. If h ∈ Rⁿ, then, if z ∈ U is such that
(i) f(z) = h, and
(ii) t(z) ≥ t(x) for all x ∈ U such that f(x) = h,
then it follows that there exists a λ ∈ Rⁿ such that the function t − λ·f : Rᵐ⁺ⁿ → R has z as a stationary point.
Exercise 13.3.4. Suppose that f : R² → R and t : R² → R are very well behaved. Convince yourself that, in this case, Lemma 13.3.3 reads as follows. A man takes the path f(x, y) = h over a hill of height t(x, y). He reaches the highest point of the path at a point (x₀, y₀) where the path and the contour line t(x, y) = t(x₀, y₀) have a common tangent. Generalise to higher dimensions.
Lemma 13.3.3 gives us a recipe for finding the maximum of t(x), subject to the conditions x ∈ U and f(x) = h, when t : Rⁿ⁺ᵐ → R and f : Rⁿ⁺ᵐ → Rⁿ are well behaved functions and U is open.
(1) Set

L(λ, x) = t(x) − λ·f(x)

with λ ∈ Rⁿ. (L is called the Lagrangian.)
(2) For each fixed λ, find the set E(λ) of stationary points of L(λ, ) lying within U. (The previous sentence uses the notational convention described in the paragraph labelled Abuse of Language on page 421⁵.)
(3) Now vary λ and find all λ* and x* ∈ E(λ*) such that f(x*) = h. The maximum, if it exists, will lie among these x*.
Put like this, the whole procedure has the same unreal quality that instructions on how to ride a bicycle would have for someone who had only seen a photograph of a bicycle. I suspect that the only way to learn how to use Lagrange's method is to use it. I expect that most of my readers will indeed be familiar with Lagrange's method and, in any case, this is a book on the theory of analysis. However, I include one simple example of the method in practice.
Example 13.3.5. Find the circular cylinder of greatest volume with given surface area. (That is, find the optimum dimensions for a tin of standard shape.)
⁵ This convention was thoroughly disliked by several of the readers of my manuscript but, as one of them remarked, 'No really satisfactory notation exists'. The reader should feel free to use her own notation. In particular she may prefer to use a placeholder and write L(λ, ·).

Calculation. Let the height of the cylinder be h and the radius be r. We wish to maximise the volume

V(r, h) = πr²h,

subject to keeping the surface area

A(r, h) = 2πr² + 2πrh

fixed, with A(r, h) = A [r, h > 0], say.
The instructions for step (1) tell us to form the Lagrangian

L(λ, r, h) = V(r, h) − λA(r, h) = πr²h − λ(2πr² + 2πrh).

The instructions for step (2) tell us to seek stationary values for L(λ, , ) when λ is fixed. We thus seek to solve

∂L/∂r (λ, r, h) = 0,  ∂L/∂h (λ, r, h) = 0,

that is,

2πrh − λ(4πr + 2πh) = 0,  πr² − 2λπr = 0,

or, simplifying,

rh − λ(2r + h) = 0,  r − 2λ = 0.

Thus, if λ is fixed and positive, L(λ, , ) has a unique stationary point given by r = r_λ = 2λ, h = h_λ = 4λ. (If λ ≤ 0, there are no stationary points consistent with our restrictions r > 0, h > 0.)
We now proceed to step (3), which requires us to find λ such that

A = A(r_λ, h_λ),

that is,

A = A(2λ, 4λ) = 24πλ².

This has a unique solution with λ > 0. We know, on general grounds (see Exercise 13.3.6), that a maximum exists so, since Lagrange's method must produce the maximum point, and produces only one point in this case, this point must be the maximum.
The neatest way of writing our solution (and one that we could have obtained directly by eliminating λ from the equations) is that the cylinder of maximum volume with given surface area has height twice its radius.
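The conclusion is easy to cross-check by brute force. A short Python sketch (ours; we take A = 24π so that λ = 1, r = 2, h = 4 in the calculation above):

    import math

    A = 24 * math.pi
    best_V, best_r = 0.0, None
    for k in range(1, 3464):                        # grid over 0 < r < (A/2pi)**0.5
        r = k / 1000
        h = (A - 2 * math.pi * r * r) / (2 * math.pi * r)   # enforce A(r, h) = A
        V = math.pi * r * r * h
        if V > best_V:
            best_V, best_r = V, r
    print(best_r, best_V)                           # r = 2.0, V = 16*pi = 50.265...
    print((A - 8 * math.pi) / (4 * math.pi))        # h at r = 2: equals 4 = 2r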

It must be admitted that the standard tin does not follow our prescription very closely. (Presumably the cost of materials is relatively small compared to other costs. We must also remember that, near a maximum, small changes produce very small effects, so the penalties for deviating are not high.)
Exercise 13.3.6. We work with the notation and hypotheses introduced in the calculations in Example 13.3.5. Choose r₀ and h₀ such that r₀, h₀ > 0 and A(r₀, h₀) = A. Show that there exists an R > 1 such that, if r > R or R⁻¹ > r > 0 and A(r, h) = A, then V(r, h) ≤ V(r₀, h₀)/2. Show that

K = {(r, h) : A(r, h) = A, R ≥ r ≥ R⁻¹, h > 0}

is a closed bounded set in R². Deduce that V(r, h) attains a maximum on K and conclude that V(r, h) attains a maximum on

{(r, h) : A(r, h) = A, r > 0, h > 0}.

Exercise 13.3.7. Suppose that f : R² → R and t : R² → R are given by f(x, y) = y and t(x, y) = y² − x². We seek the maximum value of t(x, y) subject to f(x, y) = a. Show that L(λ, ) = t − λf is stationary at (0, λ/2), and that the only value of λ which gives f(0, λ/2) = a is λ = 2a. [We again use the notation described on page 421.]
(i) Show that (0, a) does indeed maximise t(x, y) subject to f(x, y) = a.
(ii) Show, however, that L(2a, ) = t − 2af does not have a maximum at (0, a).
(iii) Draw a diagram in the manner of Exercise 13.3.4 to illustrate what is happening.

Exercise 13.3.8. Suppose that f : R² → R and t : R² → R are given by f(x, y) = x − αy² and t(x, y) = (x − 1)² + y². We seek maxima and minima of t subject to f(x, y) = 0. Show that L(λ, ) = t − λf is stationary at (1 + λ/2, 0), and that the only value of λ which gives f(1 + λ/2, 0) = 0 is λ = −2.
(i) Show that, if α < 1/2, (0, 0) minimises t(x, y) subject to f(x, y) = 0. Show, moreover, that L(−2, ) = t + 2f has a strict minimum at 0.
(ii) Show that, if α = 1/2, (0, 0) minimises t(x, y) subject to f(x, y) = 0. Show that L(−2, ) has a minimum at 0 but this is not a strict minimum.
(iii) Show that, if α > 1/2, (0, 0) maximises t(x, y) subject to f(x, y) = 0. Show, however, that L(−2, ) does not have a maximum at 0.
(iv) Draw a diagram in the manner of Exercise 13.3.4 to illustrate (as far as you can, do not overdo it) what is happening.

The following remark is entirely trivial but extremely useful.

Lemma 13.3.9. (Lagrangian sufficiency.) Consider a set X and functions f : X → Rⁿ and t : X → R. Set

L(λ, x) = t(x) − λ·f(x)

for all λ ∈ Rⁿ and x ∈ X.
Suppose h ∈ Rⁿ, λ* ∈ Rⁿ and x* ∈ X are such that f(x*) = h and

L(λ*, x*) ≥ L(λ*, x)

for all x ∈ X. Then

t(x*) ≥ t(x)

for all x ∈ X such that f(x) = h.
Proof. The proof is shorter than the statement. Observe that, under the given hypotheses,

t(x*) = t(x*) − λ*·f(x*) + λ*·h = L(λ*, x*) + λ*·h
      ≥ L(λ*, x) + λ*·h = t(x) − λ*·f(x) + λ*·h = t(x)

for all x ∈ X such that f(x) = h, as required.
It is worth noting that Lemma 13.3.9 (our Lagrangian sufficiency condition) does not require f or t to be well behaved or make any demands on the underlying space X. This is particularly useful since, if the reader notes where Lagrange's method is used, she will find that it is often used in circumstances much more general than those envisaged in Lemma 13.3.3 (our Lagrangian necessity condition).
However, the very simple examples given in Exercises 13.3.7 and 13.3.8 show that there is a very big gap between our Lagrangian necessity and sufficiency conditions, even when we restrict ourselves to well behaved functions on finite dimensional spaces. In the first three chapters of his elegant text Optimization Under Constraints [48], Whittle considers how this gap can be closed. It turns out that in certain very important practical cases (in particular, those occurring in linear programming) the sufficient condition is also necessary but that, from the point of view of the present book, these cases have very special features.
In general, the Lagrangian method is very effective in suggesting the correct answer, but, when that answer has been found, it is often easier to prove correctness by some other method (compare the treatment of geodesics in Section 10.5, where we found Theorem 10.5.11 by one method but proved it by another).

The alternative strategy, once the Lagrangian method has produced a possible solution, is to claim that 'it is obvious from physical considerations that this is indeed the correct solution'. This works well provided at least one of the following conditions applies:
(1) you are answering an examination question, or
(2) you have genuine physical insight, or
(3) you are reasonably lucky.
It is, of course, characteristic of bad luck that it strikes at the most inconvenient time.
Chapter 14

Completion

14.1 What is the correct question?
We cannot do interesting analysis on the rationals but we can on the larger space of real numbers. On the whole, we cannot do interesting analysis on metric spaces which are not complete. Given such an incomplete metric space, can we find a larger complete metric space which contains it? Our first version of this question might run as follows.
Question A: If (X, d) is a metric space, can we find a complete metric space (Z, δ) such that Z ⊇ X and d(u, v) = δ(u, v) for all u, v ∈ X?
Most mathematicians prefer to ask this question in a slightly different way.
Question A′: If (X, d) is a metric space, can we find a complete metric space (Y, d̃) and a map θ : X → Y such that d̃(θu, θv) = d(u, v)?
Question A asks us to build a complete metric space containing (X, d). Question A′ asks us to build a complete metric space containing a perfect copy of (X, d). Since mathematicians cannot distinguish between the original space (X, d) and a perfect copy, they consider the two questions to be equivalent. However, both from a philosophical and a technical point of view, it is easier to handle Question A′.

Exercise 14.1.1. Convince yourself either that Question A′ is equivalent to Question A or that Question A′ is more appropriate than Question A.
For the moment, we stick to Question A. A little thought shows that it
does not have a unique answer.
Exercise 14.1.2. Consider the open interval X = (0, 1) on R with the usual metric d.
(i) Show that (X, d) is not complete.
(ii) Show that, if we take Z = [0, 1] with the usual metric δ, then (Z, δ) answers Question A.
(iii) Show that, if we take Y = R with the usual metric δ, then (Y, δ) answers Question A.

We therefore reformulate Question A as follows.
Question B: If (X, d) is a metric space, can we find a most economical complete metric space (Z, ρ) such that Z ⊇ X and d(u, v) = ρ(u, v) for all u, v ∈ X?
We can formulate an appropriate meaning for 'most economical' with the aid of the notion of density.

Definition 14.1.3. If (X, d) is a metric space, we say that a subset E is dense in X if, given any x ∈ X, we can find xₙ ∈ E with d(xₙ, x) → 0 as n → ∞.

Exercise 14.1.4. Show that the following statements about a subset E of a metric space (X, d) are equivalent.
(i) E is dense in X.
(ii) Given x ∈ X and ε > 0, we can find a y ∈ E such that d(x, y) < ε.
(iii) If F is a closed subset of X with F ⊇ E, then F = X.

The notion of density is widely used in analysis since, in some sense, a dense subset of a metric space acts as a 'skeleton' for the space. For the moment, let us see why it provides the appropriate definition of 'most economical' in the context of this chapter.

Lemma 14.1.5. Suppose (X, d), (Y, ρ) and (Y′, ρ′) are metric spaces with the following properties.
(i) (Y, ρ) and (Y′, ρ′) are complete.
(ii) There exist maps θ : X → Y and θ′ : X → Y′ such that ρ(θu, θv) = ρ′(θ′u, θ′v) = d(u, v).
(iii) θX is dense in Y and θ′X is dense in Y′.
Then we can find a bijection φ : Y → Y′ such that φθ = θ′ and ρ′(φw, φz) = ρ(w, z) for all w, z ∈ Y.

Exercise 14.1.6. (i) Convince yourself that Lemma 14.1.5 can be roughly restated in terms of Question A as follows. Suppose (X, d) is a metric space and (Z, δ) and (Z′, δ′) are complete metric spaces such that X ⊆ Z, X ⊆ Z′, δ(u, v) = δ′(u, v) = d(u, v) for all u, v ∈ X, and X is dense in (Z, δ) and in (Z′, δ′). Then (Z, δ) and (Z′, δ′) have the same metric structure and X sits in both metric spaces in the same way. [Note, you must be able to describe the purpose of the statement φθ = θ′ in this account.]
(ii) Convince yourself that the questions involved in Lemma 14.1.5 are better treated in the language of Question A′ than in the language of Question A.

Proof of Lemma 14.1.5. If y ∈ Y then, since θX is dense in (Y, ρ), we can find xₙ ∈ X with ρ(θxₙ, y) → 0. Since θxₙ converges, it forms a Cauchy sequence in (Y, ρ). But

ρ′(θ′xₙ, θ′xₘ) = d(xₙ, xₘ) = ρ(θxₙ, θxₘ),

so θ′xₙ is a Cauchy sequence in (Y′, ρ′) and so converges to a limit x̃, say.
Suppose zₙ ∈ X with ρ(θzₙ, y) → 0. By the argument of the previous paragraph, θ′zₙ is a Cauchy sequence in (Y′, ρ′) and so converges to a limit z̃, say. We wish to show that x̃ = z̃. To do this, observe that

ρ′(x̃, z̃) ≤ ρ′(x̃, θ′xₙ) + ρ′(z̃, θ′zₙ) + ρ′(θ′xₙ, θ′zₙ)
         = ρ′(x̃, θ′xₙ) + ρ′(z̃, θ′zₙ) + ρ(θxₙ, θzₙ)
         ≤ ρ′(x̃, θ′xₙ) + ρ′(z̃, θ′zₙ) + ρ(θxₙ, y) + ρ(θzₙ, y)
         → 0 + 0 + 0 + 0 = 0,

as n → ∞. Thus ρ′(x̃, z̃) = 0 and x̃ = z̃. We have shown that we can define φy unambiguously by φy = x̃.
We have now defined φ : Y → Y′. To show that ρ′(φw, φz) = ρ(w, z), choose wₙ ∈ X and zₙ ∈ X such that ρ(θwₙ, w), ρ(θzₙ, z) → 0 as n → ∞. Then ρ(θzₙ, θwₙ) → ρ(z, w) as n → ∞. (See Exercise 14.1.7 if necessary.) By definition, ρ′(θ′zₙ, φz) → 0 and ρ′(θ′wₙ, φw) → 0, so ρ′(θ′zₙ, θ′wₙ) → ρ′(φz, φw). Since

ρ′(θ′zₙ, θ′wₙ) = d(zₙ, wₙ) = ρ(θzₙ, θwₙ),

it follows that ρ′(φw, φz) = ρ(w, z).
To show that φθ = θ′, choose any x ∈ X. If we set xₙ = x for each n, we see that ρ(θxₙ, θx) = 0 → 0 and ρ′(θ′xₙ, θ′x) = 0 → 0 as n → ∞ so, by definition, φθx = θ′x. Since x was arbitrary, it follows that φθ = θ′.
Finally, we must show that φ is a bijection. There are several ways of approaching this. We choose an indirect but natural approach. Observe that, interchanging the roles of Y and Y′, the work done so far also shows that there is a map φ̃ : Y′ → Y such that φ̃θ′ = θ and ρ(φ̃w, φ̃z) = ρ′(w, z) for all w, z ∈ Y′. Since

(φ̃φ)θ = φ̃(φθ) = φ̃θ′ = θ,

we have φ̃φ(y) = y for all y ∈ θX. Now θX is dense in Y so, if y ∈ Y, we can find yₙ ∈ θX such that ρ(yₙ, y) → 0. Thus

ρ(φ̃φ(y), y) ≤ ρ(φ̃φ(y), φ̃φ(yₙ)) + ρ(φ̃φ(yₙ), yₙ) + ρ(yₙ, y)
            = ρ(φ̃φ(y), φ̃φ(yₙ)) + ρ(yₙ, y) = ρ′(φ(y), φ(yₙ)) + ρ(yₙ, y)
