(We could prove this rigorously, but a chain is as strong as its weakest link and the real difficulties lie elsewhere.)

We are therefore led to the following plausible statement.


Please send corrections however trivial to twk@dpmms.cam.ac.uk

Plausible statement 12.4.5. The equation

  y″(t) + a(t)y′(t) + b(t)y(t) = f(t),  y(0) = y(1) = 0

has the solution

  y(t) = ∫₀¹ f(s)G(s, t) ds.

We summarise the results of our informal argument in the next theorem

which we then prove formally.

Theorem 12.4.6. Suppose that f, a, b : [0, 1] → ℝ are continuous. By Exercise 12.3.11, there exists a unique twice differentiable y₁ : [0, 1] → ℝ such that

  y₁″(t) + a(t)y₁′(t) + b(t)y₁(t) = 0,  y₁(0) = 0,  y₁′(0) = 1,

and a unique twice differentiable y₂ : [0, 1] → ℝ such that

  y₂″(t) + a(t)y₂′(t) + b(t)y₂(t) = 0,  y₂(1) = 0,  y₂′(1) = 1.

We make the following key assumption: y₁(1) ≠ 0.

If we set W(t) = y₂′(t)y₁(t) − y₂(t)y₁′(t), then W is never zero and we may define G : [0, 1]² → ℝ by

  G(s, t) = y₁(t)y₂(s)W(s)⁻¹ for 0 ≤ t ≤ s,
  G(s, t) = y₂(t)y₁(s)W(s)⁻¹ for s ≤ t ≤ 1.

With this notation, G is continuous, so we can define

  y(t) = ∫₀¹ G(s, t)f(s) ds.

The function y, so defined, is twice differentiable and satisfies

  y″(t) + a(t)y′(t) + b(t)y(t) = f(t)

together with the conditions y(0) = y(1) = 0. Moreover this solution is unique.


Proof. We have already established the contents of the first paragraph. The proof of the continuity of G is left to the reader (see Exercise 12.4.7 (iii) for a general result from which this follows). To show that y is differentiable and satisfies the stated equation, we observe that

  y(t) = ∫₀ᵗ G(s, t)f(s) ds + ∫ₜ¹ G(s, t)f(s) ds
       = ∫₀ᵗ y₂(t)y₁(s)W(s)⁻¹f(s) ds + ∫ₜ¹ y₁(t)y₂(s)W(s)⁻¹f(s) ds
       = y₂(t) ∫₀ᵗ y₁(s)W(s)⁻¹f(s) ds + y₁(t) ∫ₜ¹ y₂(s)W(s)⁻¹f(s) ds.

[Experience shows that people get into frightful muddles over this calculation. At each stage you must ask yourself 'is s bigger than t or vice versa and what does this mean for the definition of G(s, t)?']

We now use the product rule and the fundamental theorem of the calculus to show that y is twice differentiable with

  y′(t) = y₂′(t) ∫₀ᵗ y₁(s)W(s)⁻¹f(s) ds + y₂(t)y₁(t)W(t)⁻¹f(t)
          + y₁′(t) ∫ₜ¹ y₂(s)W(s)⁻¹f(s) ds − y₁(t)y₂(t)W(t)⁻¹f(t)
        = y₂′(t) ∫₀ᵗ y₁(s)W(s)⁻¹f(s) ds + y₁′(t) ∫ₜ¹ y₂(s)W(s)⁻¹f(s) ds

and

  y″(t) = y₂″(t) ∫₀ᵗ y₁(s)W(s)⁻¹f(s) ds + y₂′(t)y₁(t)W(t)⁻¹f(t)
          + y₁″(t) ∫ₜ¹ y₂(s)W(s)⁻¹f(s) ds − y₁′(t)y₂(t)W(t)⁻¹f(t)
        = y₂″(t) ∫₀ᵗ y₁(s)W(s)⁻¹f(s) ds + y₁″(t) ∫ₜ¹ y₂(s)W(s)⁻¹f(s) ds
          + (y₂′(t)y₁(t) − y₁′(t)y₂(t))W(t)⁻¹f(t)
        = y₂″(t) ∫₀ᵗ y₁(s)W(s)⁻¹f(s) ds + y₁″(t) ∫ₜ¹ y₂(s)W(s)⁻¹f(s) ds + f(t),


so that

  y″(t) + a(t)y′(t) + b(t)y(t)
    = (y₁″(t) + a(t)y₁′(t) + b(t)y₁(t)) ∫ₜ¹ y₂(s)W(s)⁻¹f(s) ds
      + (y₂″(t) + a(t)y₂′(t) + b(t)y₂(t)) ∫₀ᵗ y₁(s)W(s)⁻¹f(s) ds + f(t)
    = 0 × ∫ₜ¹ y₂(s)W(s)⁻¹f(s) ds + 0 × ∫₀ᵗ y₁(s)W(s)⁻¹f(s) ds + f(t) = f(t),

as required.

To prove uniqueness, suppose that u₁ and u₂ are solutions satisfying u₁(0) = u₂(0) = u₁(1) = u₂(1) = 0. If we set u = u₁ − u₂ then, by linearity,

  u″(t) + a(t)u′(t) + b(t)u(t) = 0,  u(0) = 0,  u(1) = 0.

Suppose u′(0) = λ. Then u and λy₁ both satisfy

  w″(t) + a(t)w′(t) + b(t)w(t) = 0,  w(0) = 0,  w′(0) = λ,

so, by the uniqueness results of the previous section, u = λy₁. But y₁(1) ≠ 0 and u(1) = 0, so λ = 0 and u = 0. Thus u₁ = u₂ as required.
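The Green's function formula can be checked numerically in the simplest special case a = b = 0 (an illustration added here, not part of the text): then y₁(t) = t, y₂(t) = t − 1, W ≡ 1, and G(s, t) = t(s − 1) for t ≤ s, G(s, t) = s(t − 1) for s ≤ t. For f ≡ 1 the problem y″ = 1, y(0) = y(1) = 0 has exact solution y(t) = t(t − 1)/2, which the integral reproduces:

```python
# Numerical sanity check of Theorem 12.4.6 in the special case a = b = 0,
# where the equation is y'' = f with y(0) = y(1) = 0.  Then y1(t) = t,
# y2(t) = t - 1, W == 1, and the Green's function is:

def G(s, t):
    # G(s, t) = t(s - 1) for t <= s and s(t - 1) for s <= t.
    return t * (s - 1) if t <= s else s * (t - 1)

def solve(f, t, n=20000):
    # Trapezoidal approximation to y(t) = integral_0^1 G(s, t) f(s) ds.
    h = 1.0 / n
    total = 0.5 * (G(0.0, t) * f(0.0) + G(1.0, t) * f(1.0))
    for k in range(1, n):
        s = k * h
        total += G(s, t) * f(s)
    return total * h

# For f == 1 the exact solution of y'' = 1, y(0) = y(1) = 0 is y(t) = t(t-1)/2.
f = lambda s: 1.0
errors = [abs(solve(f, t) - t * (t - 1) / 2) for t in (0.25, 0.5, 0.75)]
print(max(errors))  # tiny discretisation error
```

The boundary conditions come for free: G(s, 0) = G(s, 1) = 0 for every s, so the integral vanishes at the endpoints whatever f is.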

Exercise 12.4.7. Let A and B be subsets of some metric space and consider a function f : A ∪ B → ℝ.

(i) Show that, if f is not continuous at x, then at least one of the following two statements must be true.
  (α) We can find aₙ ∈ A with aₙ → x and f(aₙ) ↛ f(x).
  (β) We can find bₙ ∈ B with bₙ → x and f(bₙ) ↛ f(x).

(ii) Give an example where f|_A and f|_B are continuous, but f is not.

(iii) Show that, if A and B are closed, then the continuity of f|_A and f|_B implies the continuity of f.

[For further remarks along these lines see Exercise K.178.]

The proof of Theorem 12.4.6 is rather disappointing, since it uses only rather elementary results and gives no hint as to how to proceed in more general circumstances. (In Exercise K.278, I outline a proof which involves slightly less calculation and slightly harder theorems, but it still does not get to the heart of the matter.) However, the general idea of using 'delta functions' or 'impulses' to study ordinary and, particularly, partial differential equations is very important in both pure and applied mathematics. (The acoustics of concert halls are tested by letting off starting pistols and recording the results.)


From the point of view of the pure mathematician, one of the chief advantages of the Green's function method is that it converts problems on differentiation to problems on integration, with the advantages pointed out on page 306.

In the terminology of more advanced work, we have shown how differential operators like y ↦ Sy, where

  Sy(t) = y″(t) + a(t)y′(t) + b(t)y(t),

can be linked with better behaved integral operators like

  f ↦ Tf  with  Tf(t) = ∫₀¹ G(s, t)f(s) ds.

Note that we have shown STf = f for f continuous, but note also that, if f is merely continuous, Sf need not be defined. The Green's function G is an example of an integral kernel¹. More formally, if we write

  Jf(s) = ∫₀¹ K(s, t)f(t) dt,

then J is called an integral operator with kernel K.

We end with an example to show that things really do go awry if our key assumption fails.

Example 12.4.8. The equation

  y″(t) + π²y(t) = t

has no solution satisfying y(0) = y(1) = 0.

Proof. Suppose that y satisfies the equation

  y″(t) + π²y(t) = t

¹We shall not discuss these matters much further, but most of the new words in this last paragraph are well worth dropping.

  You must lie among the daisies and discourse in novel phrases of your complicated state of mind,
  The meaning doesn't matter if it's only idle chatter of a transcendental kind.
  And everyone will say,
  As you walk your mystic way,
  'If this young man expresses himself in terms too deep for me,
  Why, what a very singularly deep young man this deep young man must be!'

  (Gilbert and Sullivan, Patience)


and satisfies y(0) = 0. If we set

  w(t) = π⁻²t + ((y′(0) − π⁻²)/π) sin πt,

then, by direct calculation,

  w″(t) + π²w(t) = t

and w(0) = 0 = y(0), w′(0) = y′(0), so, by the uniqueness results of the previous section, y = w. In particular, y(1) = w(1) = π⁻² ≠ 0.
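The direct calculation can be confirmed numerically (a sketch added here, not part of the text): whatever value λ we take for y′(0), the function w below satisfies w″ + π²w = t and w(1) = π⁻², so no choice of initial slope produces y(1) = 0.

```python
import math

# Check by finite differences that w(t) = pi^-2 t + ((lam - pi^-2)/pi) sin(pi t)
# satisfies w'' + pi^2 w = t, w(0) = 0, and that w(1) = pi^-2 for every lam.

def w(t, lam):
    return t / math.pi ** 2 + (lam - math.pi ** -2) / math.pi * math.sin(math.pi * t)

for lam in (-1.0, 0.0, 2.5):
    assert abs(w(0.0, lam)) < 1e-12                       # w(0) = 0
    assert abs(w(1.0, lam) - math.pi ** -2) < 1e-12       # w(1) = 1/pi^2, not 0
    h = 1e-5
    for t in (0.2, 0.5, 0.8):
        # central second difference approximates w'' at t
        w2 = (w(t + h, lam) - 2 * w(t, lam) + w(t - h, lam)) / h ** 2
        assert abs(w2 + math.pi ** 2 * w(t, lam) - t) < 1e-4

print("w'' + pi^2 w = t and w(1) = 1/pi^2, whatever w'(0) is")
```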

Chapter 13

Inverse and implicit functions

13.1 The inverse function theorem

We start with an exercise giving a very simple example of a technique that

mathematicians have used for centuries.

Exercise 13.1.1. We work in ℝ.

(i) Suppose x and y are close to 1. If (x + η)² = y, show that

  η ≈ (y − x²)/2.

(ii) This suggests the following method for finding the positive square root of y. Take x₀ close to the square root of y (for example x₀ = 1) and define xₙ inductively by xₙ = xₙ₋₁ + (y − xₙ₋₁²)/2. We expect xₙ to converge to the positive square root of y.
  (a) Try this for y = 1.21 and for y = 0.81.
  (b) Try this for y = 100.

(iii) Sketch x − x²/2 for 1/2 ≤ x ≤ 3/2. Show that, if |y − 1| ≤ 1/2 and we define Tx by

  Tx = x + (y − x²)/2,

then |Tx − 1| ≤ 1/2 whenever |x − 1| ≤ 1/2. Show that T : [1/2, 3/2] → [1/2, 3/2] is a contraction mapping and deduce that the proposed method for finding square roots works if |y − 1| ≤ 1/2.

(iv) Find a reasonable approximation to the positive square root of 150 by observing that

  (150)^{1/2} = 12(150/144)^{1/2}

and using the method proposed in part (ii).
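The iteration of parts (ii) and (iv) can be run directly (a quick sketch, not part of the text):

```python
# Exercise 13.1.1's iteration: x_n = x_{n-1} + (y - x_{n-1}^2)/2, started
# at x0 = 1, converges to sqrt(y) when y is close enough to 1.

def sqrt_iter(y, x0=1.0, steps=60):
    x = x0
    for _ in range(steps):
        x = x + (y - x * x) / 2
    return x

# (ii)(a): y close to 1 works.
for y in (1.21, 0.81):
    assert abs(sqrt_iter(y) ** 2 - y) < 1e-12

# (iv): sqrt(150) = 12 * sqrt(150/144), and 150/144 is close to 1.
root150 = 12 * sqrt_iter(150 / 144)
assert abs(root150 ** 2 - 150) < 1e-9
print(root150)  # approximately 12.2474
```

For y = 100, as part (ii)(b) is designed to reveal, the same iteration leaves the interval at the first step and diverges: the contraction argument of part (iii) really does need |y − 1| ≤ 1/2.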


The discussion that follows may be considered as a natural extension of the ideas above. We work in ℝⁿ with the usual Euclidean norm.

Lemma 13.1.2. Consider a function f : ℝᵐ → ℝᵐ such that f(0) = 0. Suppose there exists a δ > 0 and an η with 1 > η > 0 such that

  ‖(f(x) − f(y)) − (x − y)‖ ≤ η‖x − y‖

for all ‖x‖, ‖y‖ ≤ δ. Then, if ‖y‖ ≤ (1 − η)δ, there exists one and only one solution of the equation

  f(x) = y

with ‖x‖ < δ. Further, if we denote this solution by g(y), we have

  ‖g(y) − y‖ ≤ η(1 − η)⁻¹‖y‖.

Since Lemma 13.1.2 is most important to us when η is small, there is no harm in concentrating our attention on this case. Lemma 13.1.2 is then an instance of the

Slogan: Anything which is close to the identity has an inverse.

(This is a slogan, not a theorem. See, for example, Exercise 13.1.7.) The inequality

  ‖(f(x) − f(y)) − (x − y)‖ ≤ η‖x − y‖

is then seen as a perturbation of the trivial equality

  ‖(I(x) − I(y)) − (x − y)‖ = 0 = 0‖x − y‖.

If we try to solve the equation f(x) = y by the method of Exercise 13.1.1, we are led to the observation that, if such a solution exists, with value u, say, then, if x is close to u,

  u − x ≈ f(u) − f(x) = y − f(x),

and so

  u ≈ x + (y − f(x)).

This suggests that we should start with an x₀ close to the solution (for example x₀ = y) and define xₙ inductively by xₙ = Txₙ₋₁, where

  Tx = x + (y − f(x)).

If we add the contraction mapping theorem as a further ingredient, we arrive at the following proof of Lemma 13.1.2.


Proof of Lemma 13.1.2. We set

  X = B̄(0, δ) = {x ∈ ℝᵐ : ‖x‖ ≤ δ}.

Since X is closed in ℝᵐ, we know that X, with the usual Euclidean metric, is complete.

Let ‖y‖ ≤ (1 − η)δ. If we set

  Tx = x + (y − f(x)),

then (since f(0) = 0)

  ‖Tx‖ ≤ ‖y‖ + ‖x − f(x)‖
       ≤ ‖y‖ + ‖(f(x) − f(0)) − (x − 0)‖
       ≤ ‖y‖ + η‖x − 0‖
       ≤ (1 − η)δ + ηδ = δ,

whenever x ∈ X. Thus T is a well defined function T : X → X.

If x, x′ ∈ X, then

  ‖Tx − Tx′‖ = ‖(x − x′) − (f(x) − f(x′))‖ ≤ η‖x − x′‖,

by the hypotheses of the lemma. Thus T is a contraction mapping and has a unique fixed point u ∈ X such that

  u = Tu = u + (y − f(u)),

or, equivalently,

  f(u) = y,

as required. To obtain the final inequality, we return to the original proof of the contraction mapping theorem (Theorem 12.1.3). Observe, as we did there, that, if x ∈ X,

  ‖Tⁿx − Tⁿ⁻¹x‖ ≤ η‖Tⁿ⁻¹x − Tⁿ⁻²x‖ ≤ ⋯ ≤ ηⁿ⁻¹‖Tx − x‖

and so

  ‖Tⁿx − x‖ ≤ Σⱼ₌₁ⁿ ‖Tʲx − Tʲ⁻¹x‖ ≤ Σⱼ₌₁ⁿ ηʲ⁻¹‖Tx − x‖ ≤ (1 − η)⁻¹‖Tx − x‖.

Since Tⁿx → u,

  ‖u − x‖ ≤ ‖u − Tⁿx‖ + ‖Tⁿx − x‖
          ≤ ‖u − Tⁿx‖ + (1 − η)⁻¹‖Tx − x‖ → (1 − η)⁻¹‖Tx − x‖


as n → ∞, it follows that

  ‖u − x‖ ≤ (1 − η)⁻¹‖Tx − x‖.

If we take x = y, the last inequality takes the form

  ‖u − y‖ ≤ (1 − η)⁻¹‖Ty − y‖ = (1 − η)⁻¹‖y − f(y)‖
          = (1 − η)⁻¹‖(y − 0) − (f(y) − f(0))‖
          ≤ η(1 − η)⁻¹‖y − 0‖ = η(1 − η)⁻¹‖y‖,

as required.
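The whole proof can be watched in action on a concrete near-identity map (an assumed example, not from the text): f below differs from the identity by a perturbation with Lipschitz constant η = 0.1, and iterating Tx = x + (y − f(x)) from x₀ = y solves f(x) = y and respects the bound ‖g(y) − y‖ ≤ η(1 − η)⁻¹‖y‖.

```python
import math

# A map on R^2 that is close to the identity: the perturbation
# 0.1*(sin x2, sin x1) has Lipschitz constant eta = 0.1.
def f(x):
    return (x[0] + 0.1 * math.sin(x[1]), x[1] + 0.1 * math.sin(x[0]))

def solve(y, steps=200):
    # Iterate Tx = x + (y - f(x)), starting at x0 = y as in the text.
    x = y
    for _ in range(steps):
        fx = f(x)
        x = (x[0] + y[0] - fx[0], x[1] + y[1] - fx[1])
    return x

y = (0.3, -0.2)
u = solve(y)
fu = f(u)
assert abs(fu[0] - y[0]) < 1e-12 and abs(fu[1] - y[1]) < 1e-12

# The bound ||g(y) - y|| <= eta (1 - eta)^-1 ||y|| from the lemma:
dist = math.hypot(u[0] - y[0], u[1] - y[1])
assert dist <= (0.1 / 0.9) * math.hypot(*y) + 1e-12
print(u)
```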

Exercise 13.1.3. Let (X, d) be a metric space (not necessarily complete) and T : X → X a mapping such that d(Tx, Ty) ≤ Kd(x, y) for all x, y ∈ X and some K < 1. Suppose that T has a fixed point w. If x₀ ∈ X and we define xₙ inductively by xₙ₊₁ = Txₙ, show that d(x₀, w) ≤ (1 − K)⁻¹d(x₀, x₁).

Lemma 13.1.2 provides the core of the proof of our next result. Here, and for the rest of the chapter, we use, in addition to the Euclidean norm on ℝᵐ, the operator norm on the space of linear maps L(ℝᵐ, ℝᵐ).

Lemma 13.1.4. Consider a function f : ℝᵐ → ℝᵐ such that f(0) = 0 and there exists a δ₀ > 0 such that f is differentiable in the open ball B(0, δ₀). If Df is continuous at 0 and Df(0) = I (the identity map), then we can find a δ₁ with δ₀ ≥ δ₁ > 0 and a ρ > 0 such that, if ‖y‖ ≤ ρ, there exists one and only one solution of the equation

  f(x) = y

with ‖x‖ < δ₁. Further, if we denote this solution by g(y), the function g is differentiable at 0 with Dg(0) = I.

Proof. Since Df is continuous at 0 and Df(0) = I, we can find a δ₁ such that δ₀ > δ₁ > 0 and

  ‖Df(w) − I‖ < 1/2

for ‖w‖ ≤ δ₁. Applying the mean value inequality to the function h = f − I, we obtain

  ‖(f(x) − f(y)) − (x − y)‖ = ‖h(x) − h(y)‖ ≤ ‖x − y‖ sup_{‖w‖≤δ₁} ‖Dh(w)‖
                            = ‖x − y‖ sup_{‖w‖≤δ₁} ‖Df(w) − I‖ ≤ ‖x − y‖/2


for all ‖x‖, ‖y‖ ≤ δ₁. Setting ρ = δ₁/2, we see, by Lemma 13.1.2, that, if ‖y‖ ≤ ρ, there exists one and only one solution of the equation

  f(x) = y

with ‖x‖ < δ₁. For the rest of the proof we denote this solution by g(y).

To discuss the behaviour of g near 0, we echo the discussion of the first paragraph. Let η > 0 be given. Since Df is continuous at 0 and Df(0) = I, we can find a δ(η) > 0 such that δ₁ > δ(η) > 0 and

  ‖Df(w) − I‖ < η

for ‖w‖ ≤ δ(η). By exactly the same reasoning as in the first paragraph,

  ‖(f(x) − f(y)) − (x − y)‖ ≤ η‖x − y‖

for all ‖x‖, ‖y‖ ≤ δ(η). The last sentence of Lemma 13.1.2 now tells us that, if ‖y‖ ≤ (1 − η)δ(η), the unique solution of the equation

  f(x) = y

with ‖x‖ < δ(η), which we have already agreed at the end of the previous paragraph to denote by g(y), satisfies ‖g(y) − y‖ ≤ η(1 − η)⁻¹‖y‖. In less roundabout language,

  ‖g(y) − y‖ ≤ η(1 − η)⁻¹‖y‖

whenever ‖y‖ ≤ (1 − η)δ(η). Since f(0) = 0, we have g(0) = 0 and

  g(y) − g(0) = Iy + ε(y)‖y‖

with ‖ε(y)‖ → 0 as ‖y‖ → 0. Thus g is differentiable at 0 with derivative I.

Exercise 13.1.5. In the second paragraph of the proof just given it says 'By exactly the same reasoning as in the first paragraph'. Fill in the details.

Exercise 13.1.6. The only point of proving Lemma 13.1.2 was to expose the inner workings of the proof of Lemma 13.1.4. Prove Lemma 13.1.4 directly. (The proof is essentially the same, but combining the proofs of the two lemmas is more economical.)


The next exercise shows that simply knowing that f is differentiable at 0 with derivative I is not enough to give the existence of a local inverse.

Exercise 13.1.7. (i) Let f : ℝ → ℝ be given by f(x) = x + x² sin(1/x) for x ≠ 0, f(0) = 0. Show, by using the definition of differentiability, or otherwise, that f is differentiable at 0 with f′(0) = 1. By considering the maxima and minima of f, or otherwise, show that there exists a sequence of non-zero yₙ → 0 such that the equation f(x) = yₙ has more than one solution.

(ii) Let f : ℝ → ℝ be given by

  f(x) = 1 for x > 1,
  f(x) = 1/n for 1/n ≥ x > 1/(n + 1), n a strictly positive integer,
  f(0) = 0,
  f(x) = −f(−x) for x < 0.

Show that f is differentiable at 0 with f′(0) = 1 but that there exists a sequence of non-zero yₙ → 0 such that the equation f(x) = yₙ has no solution.
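Part (i) can be seen numerically (a sketch added here, not part of the text). The oscillation x² sin(1/x) makes f dip below earlier values; the code exhibits two distinct solutions of f(x) = y near the first oscillation, and the same effect repeats at scale 1/(2πn), which is where the sequence yₙ → 0 comes from.

```python
import math

# f(x) = x + x^2 sin(1/x) has f'(0) = 1 but is not injective near 0.
def f(x):
    return x + x * x * math.sin(1 / x) if x != 0 else 0.0

def bisect(a, b, y, steps=200):
    # Find a root of f(x) - y in [a, b], assuming a sign change.
    fa = f(a) - y
    for _ in range(steps):
        m = (a + b) / 2
        if (f(m) - y) * fa <= 0:
            b = m
        else:
            a, fa = m, f(m) - y
    return (a + b) / 2

y = 0.159
assert (f(0.15) - y) * (f(0.16) - y) < 0   # sign change: a root in (0.15, 0.16)
assert (f(0.17) - y) * (f(0.20) - y) < 0   # and another in (0.17, 0.20)
r1 = bisect(0.15, 0.16, y)
r2 = bisect(0.17, 0.20, y)
assert abs(f(r1) - y) < 1e-12 and abs(f(r2) - y) < 1e-12 and r2 - r1 > 0.01
print(r1, r2)  # two distinct solutions of f(x) = y
```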

It might be thought that Lemma 13.1.4 refers to a very special situation,

but we can now apply another

Slogan: Anything which works for the identity will work, in a suitable form,

for invertible elements. Moreover, the general case can be obtained from the

special case of the identity.

Lemma 13.1.8. Consider a function f : ℝᵐ → ℝᵐ such that f(0) = 0 and there exists a δ₀ > 0 such that f is differentiable in the open ball B(0, δ₀). If Df is continuous at 0 and Df(0) is invertible, then we can find a δ₁ with δ₀ ≥ δ₁ > 0 and a ρ > 0 such that, if ‖y‖ ≤ ρ, there exists one and only one solution of the equation

  f(x) = y

with ‖x‖ < δ₁. Further, if we denote this solution by g(y), the function g is differentiable at 0 with Dg(0) = Df(0)⁻¹.

Proof. Set α = Df(0) and

  F(x) = α⁻¹f(x).

The proof now runs along totally predictable and easy lines.


By using standard chain rules (or simple direct calculations), the function F : ℝᵐ → ℝᵐ satisfies F(0) = 0 and F is differentiable in the open ball B(0, δ₀) with derivative DF given by

  DF(x) = α⁻¹Df(x).

Thus DF is continuous at 0 and DF(0) = I. It follows, by Lemma 13.1.4, that there exists a δ₁ with δ₀ ≥ δ₁ > 0 and a ρ̃ > 0 such that, if ‖ỹ‖ ≤ ρ̃, there exists one and only one solution of the equation

  F(x) = ỹ

with ‖x‖ < δ₁. Further, if we denote this solution by G(ỹ), the function G is differentiable at 0 with DG(0) = I.

The equation F(x) = ỹ can be rewritten as

  α⁻¹f(x) = ỹ.

Taking y = αỹ, and writing

  g(t) = G(α⁻¹t),

we can rewrite the conclusion of the last paragraph as follows. If ‖α⁻¹y‖ ≤ ρ̃, there exists one and only one solution of the equation

  f(x) = y

with ‖x‖ < δ₁. Further, if we denote this solution by g(y), the function g is differentiable at 0 with Dg(0) = α⁻¹ = Df(0)⁻¹.

If we set ρ = ‖α⁻¹‖⁻¹ρ̃, then, whenever ‖y‖ ≤ ρ, we have

  ‖α⁻¹y‖ ≤ ‖α⁻¹‖ ‖y‖ ≤ ρ̃,

and we have obtained all the conclusions of the lemma.

A simple translation argument transforms Lemma 13.1.8 into an apparently more general form.

Lemma 13.1.9. Consider a function f : ℝᵐ → ℝᵐ and a w ∈ ℝᵐ such that there exists a δ₀ > 0 such that f is differentiable in the open ball B(w, δ₀). If Df is continuous at w and Df(w) is invertible, then we can find a δ₁ with δ₀ ≥ δ₁ > 0 and a ρ > 0 such that, if ‖y − f(w)‖ ≤ ρ, there exists one and only one solution of the equation

  f(x) = y

with ‖x − w‖ < δ₁. Further, if we denote this solution by g(y), the function g is differentiable at f(w) with Dg(f(w)) = Df(w)⁻¹.


Exercise 13.1.10. Prove Lemma 13.1.9 from Lemma 13.1.8.

We have now done most of the hard work of this section, but Lemma 13.1.9 can be made much more useful if we strengthen its hypotheses.

Slogan: Analysis is done not at points but on open sets.

Our first two results are preliminary.

Lemma 13.1.11. (i) We write ι for the identity map ι : ℝⁿ → ℝⁿ. If α : ℝⁿ → ℝⁿ is linear and ‖ι − α‖ < 1, then α is invertible.

(ii) Suppose β, γ : ℝⁿ → ℝⁿ are linear and β is invertible. If ‖β − γ‖ < ‖β⁻¹‖⁻¹, then γ is invertible.

(iii) Consider a function f : ℝᵐ → ℝᵐ which is differentiable on an open set U. If Df is continuous and invertible at a point u₀ ∈ U, then we can find an open set B ⊆ U with u₀ ∈ B such that Df is invertible at every point of B.

Proof. (i) Since we are working with finite dimensional vector spaces, α is invertible if and only if it is injective, and α is injective if and only if ker α = {0}. If x ≠ 0 then

  ‖αx‖ = ‖x − (ι − α)x‖ ≥ ‖x‖ − ‖(ι − α)x‖ ≥ ‖x‖ − ‖ι − α‖ ‖x‖ > 0,

and so αx ≠ 0. The result follows.

(ii) Observe that

  ‖ι − β⁻¹γ‖ = ‖β⁻¹(β − γ)‖ ≤ ‖β⁻¹‖ ‖β − γ‖ < 1,

and so, by part (i), β⁻¹γ is invertible. Write θ = (β⁻¹γ)⁻¹ and observe that (θβ⁻¹)γ = θ(β⁻¹γ) = ι. Thus, since we are dealing with finite dimensional spaces, γ is invertible with inverse θβ⁻¹.

(iii) By the continuity of Df we can find an open set B ⊆ U with u₀ ∈ B such that

  ‖Df(u₀) − Df(u)‖ < ‖Df(u₀)⁻¹‖⁻¹

for all u ∈ B. The stated result follows from part (ii).

(Both Lemma 13.1.11 and its proof may be classified as 'quick and dirty'. A more leisurely approach, which reveals much more of what is actually going on, is given in Exercise K.281.)
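Part (i) of Lemma 13.1.11 is effectively constructive: when ‖ι − α‖ < 1, the Neumann series Σₖ (ι − α)ᵏ converges to α⁻¹. A small pure-Python check on 2 × 2 matrices (an illustrative example, with an assumed numerical matrix; the book's proof is norm-based and works in any finite dimension):

```python
# Inverting a matrix close to the identity by summing powers of I - a.

def mat_mul(p, q):
    return [[sum(p[i][k] * q[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def neumann_inverse(a, terms=200):
    # Sum of (I - a)^k for k = 0, 1, ..., terms - 1.
    i2 = [[1.0, 0.0], [0.0, 1.0]]
    d = [[i2[r][c] - a[r][c] for c in range(2)] for r in range(2)]  # I - a
    total = [row[:] for row in i2]
    power = [row[:] for row in i2]
    for _ in range(terms - 1):
        power = mat_mul(power, d)
        total = [[total[r][c] + power[r][c] for c in range(2)] for r in range(2)]
    return total

a = [[1.1, -0.2], [0.05, 0.85]]        # close to the identity: ||I - a|| < 1
inv = neumann_inverse(a)
prod = mat_mul(a, inv)
assert abs(prod[0][0] - 1) < 1e-9 and abs(prod[1][1] - 1) < 1e-9
assert abs(prod[0][1]) < 1e-9 and abs(prod[1][0]) < 1e-9
print(prod)  # approximately the identity matrix
```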

Lemma 13.1.12. Consider a function f : ℝᵐ → ℝᵐ which is differentiable on an open set U. If Df is continuous and invertible at every point of U, then f(U) is open.


Proof. Suppose w ∈ U. Since U is open, we can find a δ₀ > 0 such that the open ball B(w, δ₀) is a subset of U. By hypothesis, Df is continuous at w and Df(w) is invertible. Thus, by Lemma 13.1.9, we can find a δ₁ with δ₀ ≥ δ₁ > 0 and a ρ > 0 such that, if ‖y − f(w)‖ ≤ ρ, there exists a solution of the equation

  f(x) = y

with ‖x − w‖ < δ₁. It follows that

  B(f(w), ρ) ⊆ f(B(w, δ₀)) ⊆ f(U).

We have shown that f(U) is open.

Theorem 13.1.13. (Inverse function theorem.) Consider a function f : ℝᵐ → ℝᵐ which is differentiable on an open set U. Suppose, further, that Df is continuous at every point of U, that w ∈ U and Df(w) is invertible. Then we can find an open set B ⊆ U with w ∈ B and an open set V such that

(i) f|_B : B → V is bijective,

(ii) f|_B⁻¹ : V → B is differentiable with

  D(f|_B⁻¹)(f(u)) = (Df(u))⁻¹

for all u ∈ B.

Proof. Suppose w ∈ U. By Lemma 13.1.11, we can find a δ₀ > 0 such that the open ball B(w, δ₀) is a subset of U and Df(u) is invertible at every point u ∈ B(w, δ₀).

We now use the same argument which we used in Lemma 13.1.12. We know that Df is continuous at w and Df(w) is invertible. Thus, by Lemma 13.1.9, we can find a δ₁ with δ₀ ≥ δ₁ > 0 and a ρ > 0 such that, if ‖y − f(w)‖ ≤ ρ, there exists one and only one solution of the equation

  f(x) = y

with ‖x − w‖ < δ₁. Set B = B(w, δ₁) and apply Lemma 13.1.12 and Lemma 13.1.9.

A slight strengthening of Theorem 13.1.13 is given in Exercise K.288. The

following cluster of easy exercises is intended to illuminate various aspects of

the inverse function theorem.


Exercise 13.1.14. Suppose U and V are open subsets of ℝᵐ and f : U → V, g : V → U are such that g ∘ f is the identity map on U. Show that if f is differentiable at u ∈ U and g is differentiable at f(u), then (Df)(u) and (Dg)(f(u)) are invertible and

  (Dg)(f(u)) = ((Df)(u))⁻¹.

Exercise 13.1.15. (A traditional examination question.) Let f : ℝ → ℝ be given by f(x) = x³. Show that f is bijective but f′(0) = 0 (so the derivative of f at 0 is not invertible).

Exercise 13.1.16. Consider the open interval (−4, 4). Find f((−4, 4)) for the functions f : ℝ → ℝ given by

(i) f(x) = sin x,
(ii) f(x) = sin 10⁻²x,
(iii) f(x) = x²,
(iv) f(x) = x³ − x.

In each case comment briefly on the relation of your result to Lemma 13.1.12.

Exercise 13.1.17. Let

  U = {(x, y) ∈ ℝ² : 1 < x² + y² < 2}

and define f : U → ℝ² by

  f(x, y) = ( (x² − y²)/(x² + y²)^{1/2}, 2xy/(x² + y²)^{1/2} ).

(i) Show that U is open.

(ii) Show that f is differentiable on U and that Df is continuous and invertible at every point of U.

(iii) By using polar coordinates, discover why f is defined as it is. Show that f(U) = U but f is not injective.
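A quick numerical companion to part (iii) (a sketch added here, not part of the text): in polar coordinates f sends (r, θ) to (r, 2θ), so f(−x, −y) = f(x, y) and f cannot be injective on the annulus, even though Df is invertible at every point of U.

```python
import math

def f(x, y):
    r = math.hypot(x, y)
    return ((x * x - y * y) / r, 2 * x * y / r)

x, y = 1.1, 0.4                      # a point of U: 1 < x^2 + y^2 < 2
assert 1 < x * x + y * y < 2

# Antipodal points have the same image, so f is not injective.
p, q = f(x, y), f(-x, -y)
assert abs(p[0] - q[0]) < 1e-12 and abs(p[1] - q[1]) < 1e-12

# f preserves r and doubles the polar angle.
r, t = math.hypot(x, y), math.atan2(y, x)
fx, fy = f(x, y)
assert abs(math.hypot(fx, fy) - r) < 1e-12
assert abs(math.atan2(fy, fx) - 2 * t) < 1e-12
print(p)
```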

Once the ideas behind the proof of the inverse function theorem are understood, it can be condensed into a very short argument. In Dieudonné's account the essential content of both this section and the next is stated and proved in more general form in about two pages ([13] Chapter X, Theorem 10.2.1). We leave it to the reader to undertake this condensation¹. The usual approach to the inverse function theorem uses the contraction mapping theorem in a rather more subtle way than the approach adopted here. I outline the alternative approach in Appendix F.

¹There is a Cambridge story about an eminent algebraic geometer who presented his subject entirely without diagrams. However, from time to time, when things got difficult, he would hide part of the blackboard, engage in some rapid but hidden chalk work, rub out the result and continue.


13.2 The implicit function theorem ♥

The contents of this section really belong to a first course in differential geometry. However, there is an old mathematical tradition called 'pass the parcel' by which lecturers assume that all the hard but necessary preliminary work has been done 'in a previous course'². In accordance with this tradition, lecturers in differential geometry frequently leave the proof of the implicit function theorem in the hands of an, often mythical, earlier lecturer in analysis. Even when the earlier lecturer actually exists, this has the effect of first exposing the students to a proof of a result whose use they do not understand and then making them use a result whose proof they have forgotten.

My advice to the reader, as often in this book, is not to take this section too seriously. It is far more important to make sure that you are confident of the meaning and proof of the inverse function theorem than that you worry about the details of this section.

Consider the function h : ℝ² → ℝ given by h(x, y) = x² + y². We know that the contour (or level) line

  x² + y² = 1

can be represented by a graph

  y = (1 − x²)^{1/2}

close to the point (x, y) = (0, 1). We also know that this representation fails close to the point (x, y) = (1, 0) but that near that point we can use the representation

  x = (1 − y²)^{1/2}.

Leaping rapidly to a conclusion, we obtain the following slogan.

Slogan: If f : ℝ² → ℝ behaves well in a neighbourhood of a point (x₀, y₀), then at least one of the following two statements must be true.

(a) There exists a δ > 0 and a well behaved bijective function g : (−δ, δ) → ℝ such that g(0) = y₀ and f(x + x₀, g(x)) = f(x₀, y₀) for all x ∈ (−δ, δ).

(b) There exists a δ > 0 and a well behaved bijective function g : (−δ, δ) → ℝ such that g(0) = x₀ and f(g(y), y + y₀) = f(x₀, y₀) for all y ∈ (−δ, δ).

²Particular topics handled in this way include determinants, the Jordan normal form, uniqueness of prime factorisation and various important inequalities.


Figure 13.1: Problems at a saddle

Exercise 13.2.1. Consider the contour x³ − y² = 0. Show that we can find g₁ : ℝ → ℝ and g₂ : ℝ → ℝ such that

  x³ − g₁(x)² = 0 for all x ∈ ℝ,
  g₂(y)³ − y² = 0 for all y ∈ ℝ,

but g₁ is differentiable everywhere and g₂ is not. Explain in simple terms why this is the case.

One way of looking at our slogan is to consider a walker on a hill whose height is given by f(x, y) at a point (x, y) on a map. The walker seeks to walk along a path of constant height. She is clearly going to have problems if she starts at a strict maximum, since a step in any direction takes her downward. A similar difficulty occurs at a strict minimum. A different problem occurs at a saddle point (see Figure 13.1). It appears from the picture that, if f is well behaved, there are not one but two paths of constant height passing through a saddle point, and this will create substantial difficulties³. However, these are the only points which present problems.

Another way to look at our slogan is to treat it as a problem in the calculus (our arguments will, however, continue to be informal). Suppose that

  f(x, g(x)) = f(x₀, y₀).

Assuming that everything is well behaved, we can differentiate with respect to x, obtaining

  f,₁(x, g(x)) + g′(x)f,₂(x, g(x)) = 0

³Remember Buridan's ass which, placed between two equally attractive bundles of hay, starved to death because it was unable to find a reason for starting on one bundle rather than the other.


and so, provided that f,₂(x, g(x)) ≠ 0,

  g′(x) = −f,₁(x, g(x)) / f,₂(x, g(x)).

Thus, provided that f is sufficiently well behaved and f,₂(x₀, y₀) ≠ 0, our earlier work on differential equations tells us that there exists a local solution for g.
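The contour ODE can be integrated in a concrete case (an illustration added here, not part of the text's argument): for f(x, y) = x² + y² it reads g′(x) = −x/g(x), and starting from (0, 1) the numerical solution traces the circle x² + y² = 1, as it should.

```python
import math

# g'(x) = -f_1/f_2 = -x/g for f(x, y) = x^2 + y^2; crude RK4 integration.
def gprime(x, g):
    return -x / g

def integrate(x_end, steps=1000):
    x, g, h = 0.0, 1.0, x_end / steps   # start at the contour point (0, 1)
    for _ in range(steps):
        k1 = gprime(x, g)
        k2 = gprime(x + h / 2, g + h * k1 / 2)
        k3 = gprime(x + h / 2, g + h * k2 / 2)
        k4 = gprime(x + h, g + h * k3)
        g += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
        x += h
    return g

for x_end in (0.3, 0.6):
    g = integrate(x_end)
    # The numerical contour stays on the level set f = 1.
    assert abs(x_end ** 2 + g ** 2 - 1) < 1e-9
    assert abs(g - math.sqrt(1 - x_end ** 2)) < 1e-9
print(integrate(0.6))  # approximately 0.8
```

Pushing x_end towards 1 would drive g towards 0 and the right-hand side −x/g towards a singularity: exactly the failure of f,₂ ≠ 0 at the point (1, 0) discussed above.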

The two previous paragraphs tend to confirm the truth of our slogan and show that there exist local contour lines in the neighbourhood of any point (x₀, y₀) where at least one of f,₁(x₀, y₀) and f,₂(x₀, y₀) does not vanish. We shall not seek to establish the global existence of contour lines⁴. Instead we seek to extend the ideas of our slogan to higher dimensions.

We argue informally, assuming good behaviour as required. Consider a function f : ℝᵐ × ℝⁿ → ℝⁿ. Without loss of generality, we may suppose that f(0, 0) = 0. An appropriate generalisation of the questions considered above is to ask about solutions to the equation

  f(x, g_h(x)) = h,  (1)

where h is fixed. (Note that x ∈ ℝᵐ, g_h(x) ∈ ℝⁿ and h ∈ ℝⁿ. Since everything is local, we suppose h small and we only consider (x, g_h(x)) close to (0, 0).) Before proceeding to the next paragraph, the reader should convince herself that the question we have asked is a natural one.

The key step in resolving this problem is to rewrite it. Define f̃, g̃ : ℝᵐ × ℝⁿ → ℝᵐ × ℝⁿ by

  f̃(x, y) = (x, f(x, y)),
  g̃(x, y) = (x, g_y(x)).

If equation (1) holds, then

  f̃(g̃(x, h)) = f̃(x, g_h(x)) = (x, f(x, g_h(x))) = (x, h),

and so

  f̃(g̃(x, h)) = (x, h).  (2)

⁴Obviously the kind of ideas considered in Section 12.3 will play an important role in such an investigation. One obvious problem is that, when we look at a contour in one location, there is no way of telling if it will not pass through a saddle point at another.


Conversely, if equation (2) holds, then equation (1) follows.

To solve equation (2) we need to use the inverse mapping results of the previous section and, to use those results, we need to know if Df̃(x, y) is invertible at a given point (x, y). Now

  Df̃(x, y)(u, v) = Df(x, y)(u, v) + u

(identifying ℝᵐ × ℝⁿ with the direct sum ℝᵐ ⊕ ℝⁿ, so that u stands for (u, 0)), and so Df̃(x, y) is invertible if and only if the linear map α : ℝⁿ → ℝⁿ, given by

  αv = Df(x, y)(0, v),

is invertible. (We present two proofs of this last statement in the next two exercises. Both exercises take some time to state and hardly any time to do.)

Exercise 13.2.2. In this exercise we use column vectors, writing (x; y) for the column vector with x ∈ ℝᵐ on top of y ∈ ℝⁿ. Thus

  f̃(x; y) = (x; f(x; y)),  g̃(x; y) = (x; g_y(x)).

Suppose that Df(x; y) has matrix C with respect to the standard basis. Explain why C is an n × (n + m) matrix, which we can therefore write as

  C = (B A),

with A an n × n matrix and B an n × m matrix.

Show that Df̃(x; y) has matrix

  E = ( I 0 )
      ( B A ),

where 0 is the m × n matrix consisting entirely of zeros and I is the m × m identity matrix. By considering det E, or otherwise, show that E is invertible if and only if A is. Show that A is the matrix of the linear map α : ℝⁿ → ℝⁿ given by

  αv = Df(x; y)(0; v).
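The determinant hint can be checked in a small case (m = n = 2, with an assumed numerical example; this is an illustration of the block-triangular fact det E = det A, not a proof):

```python
# For the block lower triangular matrix E = [[I, 0], [B, A]],
# det E = det A, so E is invertible exactly when A is.

def det(m):
    # Determinant by Gaussian elimination with partial pivoting.
    m = [row[:] for row in m]
    n, sign, d = len(m), 1.0, 1.0
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        if abs(m[p][c]) < 1e-14:
            return 0.0
        if p != c:
            m[c], m[p] = m[p], m[c]
            sign = -sign
        d *= m[c][c]
        for r in range(c + 1, n):
            factor = m[r][c] / m[c][c]
            for k in range(c, n):
                m[r][k] -= factor * m[c][k]
    return sign * d

A = [[2.0, 1.0], [0.5, 3.0]]
B = [[4.0, -1.0], [2.0, 0.0]]
E = [[1.0, 0.0, 0.0, 0.0],
     [0.0, 1.0, 0.0, 0.0],
     [B[0][0], B[0][1], A[0][0], A[0][1]],
     [B[1][0], B[1][1], A[1][0], A[1][1]]]
assert abs(det(E) - det(A)) < 1e-12
print(det(E))  # det E = det A = 5.5
```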

Exercise 13.2.3. Let W be the direct sum of two subspaces U and V. If γ : W → V is a linear map, show that the map α : V → V, given by αv = γv for all v ∈ V, is linear.

Now define γ̃ : W → W by γ̃(u + v) = γ(u + v) + u for u ∈ U, v ∈ V. Show that γ̃ is a well defined linear map, that ker(γ̃) = ker(α) and that γ̃(W) = U + α(V). Conclude that γ̃ is invertible if and only if α is.


We can now pull the strands together.

Theorem 13.2.4. (Implicit function theorem.) Consider a function f : ℝᵐ × ℝⁿ → ℝⁿ which is differentiable on an open set U. Suppose further that Df is continuous at every point of U, that (x₀, y₀) ∈ U and that the linear map α : ℝⁿ → ℝⁿ given by

  αt = Df(x₀, y₀)(0, t)

is invertible. Then we can find an open set B₁ in ℝᵐ, with x₀ ∈ B₁, and an open set B₂ in ℝⁿ, with f(x₀, y₀) ∈ B₂, such that, if z ∈ B₂, there exists a differentiable map g_z : B₁ → ℝⁿ with (x, g_z(x)) ∈ U and

  f(x, g_z(x)) = z

for all x ∈ B₁.

(We give a slight improvement on this result in Theorem 13.2.9, but Theorem 13.2.4 contains the essential result.)

Proof. Define f̃ : ℝᵐ × ℝⁿ → ℝᵐ × ℝⁿ by

  f̃(x, y) = (x, f(x, y)).

If (x, y) ∈ U then Df̃(x, y) exists and

  Df̃(x, y)(s, t) = Df(x, y)(s, t) + s.

In particular, since α is invertible, Df̃(x₀, y₀) is invertible. It follows, by the inverse function theorem (Theorem 13.1.13), that we can find an open set B ⊆ U with (x₀, y₀) ∈ B and an open set V such that

(i) f̃|_B : B → V is bijective, and

(ii) f̃|_B⁻¹ : V → B is differentiable.

Let us define G : V → ℝᵐ and g : V → ℝⁿ by

  f̃|_B⁻¹(v) = (G(v), g(v))

for all v ∈ V. We observe that g is everywhere differentiable on V. By definition,

  (x, z) = f̃|_B(f̃|_B⁻¹(x, z))
         = f̃|_B(G(x, z), g(x, z))
         = (G(x, z), f(G(x, z), g(x, z)))


and so

  G(x, z) = x  and  z = f(x, g(x, z))

for all (x, z) ∈ V.

We know that V is open and (x₀, f(x₀, y₀)) ∈ V, so we can find an open set B₁ in ℝᵐ, with x₀ ∈ B₁, and an open set B₂ in ℝⁿ, with f(x₀, y₀) ∈ B₂, such that B₁ × B₂ ⊆ V. Setting

  g_z(x) = g(x, z)

for all x ∈ B₁ and z ∈ B₂, we have the required result.
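A one-dimensional instance of the theorem (m = n = 1, with an assumed example function, not one from the text) can be solved by the same contraction iteration that lies behind the proofs: for f(x, y) = y + y³ − x we have f(0, 0) = 0 and α = (∂f/∂y)(0, 0) = 1, so iterating u ← u + (z − f(x, u)) near the origin produces g_z(x).

```python
# Solving f(x, y) = z for y = g_z(x), where f(x, y) = y + y^3 - x.

def f(x, y):
    return y + y ** 3 - x

def g(z, x, steps=200):
    # Near (0, 0) the map u -> u + (z - f(x, u)) is a contraction.
    u = 0.0
    for _ in range(steps):
        u = u + (z - f(x, u))
    return u

for x in (-0.2, 0.0, 0.3):
    for z in (-0.1, 0.0, 0.15):
        u = g(z, x)
        assert abs(f(x, u) - z) < 1e-12   # f(x, g_z(x)) = z
print(g(0.15, 0.3))
```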

Remark 1: We obtained the implicit function theorem from the inverse function theorem by introducing new functions f̃ and g̃. There is, however, no reason why we should not obtain the implicit function theorem directly by following the same kind of method as we used to prove the inverse function theorem. Here is the analogue of Lemma 13.1.2.

Lemma 13.2.5. Consider a function f : Rm+n → Rn such that f(0, 0) = 0. Suppose that there exists a δ > 0 and an η with 1 > η > 0 such that

‖(f(x, s) − f(x, t)) − (s − t)‖ ≤ η‖s − t‖

for all ‖x‖, ‖s‖, ‖t‖ ≤ δ [x ∈ Rm, s, t ∈ Rn]. Then, if ‖h‖ ≤ (1 − η)δ, there exists one and only one solution of the equation

f(x, u) = h

with ‖u‖ < δ. Further, if we denote this solution by gh(x), we have

‖gh(x) − h‖ ≤ η(1 − η)⁻¹‖h‖.
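The existence half of Lemma 13.2.5 comes from the same contraction iteration as in the inverse function theorem: start from u = 0 and repeatedly set u ← u + (h − f(x, u)); the displayed inequality makes this map a contraction. A hedged numerical sketch (the function f below is an invented example satisfying the hypothesis with η = 0.2, not one from the text):

```python
import math

def f(x, u):
    # Invented example: (f(x, s) - f(x, t)) - (s - t) = 0.2(sin s - sin t),
    # so the lemma's hypothesis holds with eta = 0.2.  Also f(0, 0) = 0.
    return u + 0.2 * math.sin(u) + 0.1 * x

def g(x, h, steps=100):
    """Fixed-point iteration u <- u + (h - f(x, u)) from the proof sketch."""
    u = 0.0
    for _ in range(steps):
        u = u + (h - f(x, u))
    return u

x, h = 0.3, 0.5
u = g(x, h)
assert abs(f(x, u) - h) < 1e-12   # u solves f(x, u) = h
print("solution u =", u)
```

Each iteration shrinks the error by a factor of about η, which is the geometric convergence the Cauchy-sequence argument exploits.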

Exercise 13.2.6. Prove Lemma 13.2.5. Sketch, giving as much or as little detail as you wish, the steps from Lemma 13.2.5 to Theorem 13.2.4. You should convince yourself that the inverse function theorem and the implicit function theorem are fingers of the same hand.


Remark 2: In the introduction to this section we obtained contours as the solutions of an ordinary differential equation of the form

g′(x) = −f,1(x, g(x))/f,2(x, g(x)).
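For a concrete invented example (not from the text), take f(x, y) = x² + y², whose contour f = 1 through (0, 1) is y = √(1 − x²). Integrating g′(x) = −f,1/f,2 = −x/g(x) numerically recovers that contour:

```python
import math

def g_prime(x, g):
    # -f_{,1}(x, g)/f_{,2}(x, g) for f(x, y) = x^2 + y^2.
    return -(2.0 * x) / (2.0 * g)

# Classical fourth-order Runge-Kutta from x = 0, g(0) = 1, out to x = 0.6.
x, g, hstep = 0.0, 1.0, 1e-3
while x < 0.6 - 1e-12:
    k1 = g_prime(x, g)
    k2 = g_prime(x + hstep / 2, g + hstep * k1 / 2)
    k3 = g_prime(x + hstep / 2, g + hstep * k2 / 2)
    k4 = g_prime(x + hstep, g + hstep * k3)
    g += hstep * (k1 + 2 * k2 + 2 * k3 + k4) / 6
    x += hstep

# The contour through (0, 1) is y = sqrt(1 - x^2), so g(0.6) should be 0.8.
assert abs(g - math.sqrt(1 - 0.6 ** 2)) < 1e-8
print("g(0.6) =", g)
```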

The situation for the general implicit function theorem is genuinely more complicated. Suppose, for example, that we have a function f : R4 → R2 and we wish to find (u, v) : R2 → R2 so that (at least locally)

f(x, y, u(x, y), v(x, y)) = h,

that is

f1(x, y, u(x, y), v(x, y)) = h1,
f2(x, y, u(x, y), v(x, y)) = h2.

On differentiating, we obtain

f1,1(x, y, u(x, y), v(x, y)) + f1,3(x, y, u(x, y), v(x, y)) ∂u/∂x + f1,4(x, y, u(x, y), v(x, y)) ∂v/∂x = 0,
f1,2(x, y, u(x, y), v(x, y)) + f1,3(x, y, u(x, y), v(x, y)) ∂u/∂y + f1,4(x, y, u(x, y), v(x, y)) ∂v/∂y = 0,
f2,1(x, y, u(x, y), v(x, y)) + f2,3(x, y, u(x, y), v(x, y)) ∂u/∂x + f2,4(x, y, u(x, y), v(x, y)) ∂v/∂x = 0,
f2,2(x, y, u(x, y), v(x, y)) + f2,3(x, y, u(x, y), v(x, y)) ∂u/∂y + f2,4(x, y, u(x, y), v(x, y)) ∂v/∂y = 0,

so we are faced with a system of simultaneous partial differential equations. (For a very slightly different view of the matter see Exercise E.2.)

Remark 3: We proved Theorem 13.2.4 under the assumption that the linear map α : Rn → Rn given by

αt = Df(x0, y0)(0, t)

is invertible. This condition gives unnecessary prominence to the last n coordinates. To get round this, we introduce a new definition.

Definition 13.2.7. We say that a linear map β : Rn+m → Rn has full rank if β is surjective.

The next, very easy, lemma gives some context.


Lemma 13.2.8. Suppose that a linear map β : Rn+m → Rn has matrix B with respect to standard coordinates. (We use column vectors.) Then the following conditions are equivalent.
(i) β has full rank.
(ii) B has n linearly independent columns.
(iii) By permuting the coordinate axes, we can ensure that the last n columns of B are linearly independent.
(iv) By permuting the coordinate axes, we can ensure that the linear map α : Rn → Rn given by

αt = β(0, t)

is invertible.

Proof. To see that (i) implies (ii), observe that

rank β = dim βRn+m = dim span{b1, b2, . . . , bn+m}

where bj is the jth column of B. If β has full rank then, since any spanning set contains a basis, n of the bj must be linearly independent.

It is clear that (ii) implies (iii). To see that (iii) implies (iv), observe that, if α has matrix A with respect to standard coordinates after our permutation, then A is an n × n matrix whose columns are the last n columns of B and so linearly independent. Thus A is invertible and so α is.

To see that (iv) implies (i), observe that

{b1, b2, . . . , bn+m} ⊇ {a1, a2, . . . , an}

where ak is the kth column of A, the matrix of α with respect to standard coordinates. Thus

Rn ⊇ span{b1, b2, . . . , bn+m} ⊇ span{a1, a2, . . . , an} = Rn,

so βRn+m = Rn and β has full rank.
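Conditions (ii) and (iii) are easy to check concretely. A small sketch (the 2 × 5 matrix B below, with n = 2 and m = 3, is an invented example, not one from the text):

```python
# An arbitrary 2 x 5 matrix B (n = 2, m = 3); entries invented for illustration.
B = [[1.0, 2.0, 0.0, 1.0, 3.0],
     [2.0, 4.0, 1.0, 0.0, 1.0]]

def det2(c1, c2):
    # Determinant of the 2 x 2 matrix with columns c1 and c2.
    return c1[0] * c2[1] - c1[1] * c2[0]

cols = list(zip(*B))   # the columns b1, ..., b5 of B

# (ii): B has n = 2 linearly independent columns ...
assert any(det2(cols[i], cols[j]) != 0
           for i in range(5) for j in range(i + 1, 5))

# ... so, as in (iii), we may permute coordinates so that the LAST two
# columns are independent, e.g. by moving columns 2 and 3 to the end:
perm = [0, 1, 4, 2, 3]
last_two = [cols[perm[-2]], cols[perm[-1]]]
assert det2(*last_two) != 0   # (iv): alpha t = beta(0, t) is invertible
print("A has columns", perm[-2], "and", perm[-1], "of B; det =", det2(*last_two))
```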

The appropriate modification of Theorem 13.2.4 is now obvious.

Theorem 13.2.9. Consider a function f : Rm+n → Rn which is differentiable on an open set U. Suppose further that Df is continuous at every point of U, that w0 ∈ U and that Df(w0) has full rank. Then we can permute the coordinate axes in such a way that, if we write w0 = (x0, y0) ∈ Rm × Rn, we can find an open set B1 in Rm, with x0 ∈ B1, and an open set B2 in Rn, with f(w0) ∈ B2, such that, if z ∈ B2, there exists a differentiable map gz : B1 → Rn with (x, gz(x)) ∈ U and

f(x, gz(x)) = z

for all x ∈ B1.


Remark 4: In Appendix C, I emphasised the importance of a large stock of examples, but I talked mainly about examples of badly behaved functions. It is also true that mathematicians find it hard to lay their hands on a sufficiently diverse collection of well behaved functions. In this book we started with the polynomials. To these we added the function x → 1/x and got the rational functions x → P(x)/Q(x) with P and Q polynomials. We then added power series and solutions of differential equations. The inverse function theorem gives us a new collection of interesting functions and the implicit function theorem greatly extends this collection. The implicit function theorem is particularly important in giving us interesting examples of well behaved functions f : Rn → R when n ≥ 2.

13.3 Lagrange multipliers

Suppose that t : R4 → R and f : R4 → R2 are well behaved functions and we wish to maximise t(x) subject to the condition f(x) = h.

This will involve a fair amount of calculation, so it seems reasonable to simplify our task as much as possible. One obvious simplification is to translate so that h = (0, 0) and f(0, 0, 0, 0) = (0, 0). We now wish to investigate whether x = (0, 0, 0, 0) maximises t(x) subject to the condition f(x) = (0, 0).

A more interesting simplification is to observe that, since we are assuming good behaviour, Df(0) will have full rank. We can therefore find invertible linear maps α : R4 → R4 and β : R2 → R2 such that βDf(0)α has matrix

J = ( 1 0 0 0 )
    ( 0 1 0 0 )

with respect to the standard basis (and using the column vector representation).

If we set

F(x) = β(f(αx)), T(x) = t(αx)

then our problem becomes that of investigating whether x = (0, 0, 0, 0) maximises T(x), subject to the condition F(x) = (0, 0). By the implicit function theorem, we can find well behaved u, v : R2 → R such that u(0, 0) = v(0, 0) = 0 and (at least if z and w are small)

F(u(z, w), v(z, w), z, w) = (0, 0).

By the chain rule, DF(0) has matrix J with respect to the standard basis (using the column vector representation) and so

u,1(0, 0) = u,2(0, 0) = v,1(0, 0) = v,2(0, 0) = 0.


Before proceeding further the reader should do the next exercise.

Exercise 13.3.1. Explain why our conditions on F tell us that near (0, 0, 0, 0)

F(x, y, z, w) ≈ (x, y).

Suppose that, in fact, F(x, y, z, w) = (x, y) exactly. Find u and v. What conditions on T ensure that x = (0, 0, 0, 0) maximises T(x) subject to the condition F(x) = (0, 0)?

Since (at least locally) F(u(z, w), v(z, w), z, w) = (0, 0), our problem reduces to studying whether (z, w) = (0, 0) maximises τ(z, w) = T(u(z, w), v(z, w), z, w). Since every maximum is a stationary point, we first ask if (z, w) = (0, 0) is a stationary point, that is, if

∂τ/∂z(0, 0) = ∂τ/∂w(0, 0) = 0.

By the chain rule,

∂τ/∂z(z, w) = T,1(u(z, w), v(z, w), z, w) ∂u/∂z(z, w)
            + T,2(u(z, w), v(z, w), z, w) ∂v/∂z(z, w)
            + T,3(u(z, w), v(z, w), z, w),

and so

∂τ/∂z(0, 0) = T,3(0, 0, 0, 0).

A similar calculation gives

∂τ/∂w(0, 0) = T,4(0, 0, 0, 0),

so (0, 0) is a stationary point of τ if and only if

T,3(0, 0, 0, 0) = T,4(0, 0, 0, 0) = 0.

We have found a necessary condition for (0, 0, 0, 0) to maximise T(x) subject to F(x) = 0, and so for (0, 0, 0, 0) to maximise t(x) subject to f(x) = 0.

In order to make this condition usable, we must translate it back into the terms of our original problem and this is most easily done by expressing it in coordinate-free notation. Observe that, if we write

l(x) = T(x) − T,1(0)F1(x) − T,2(0)F2(x)

then

l,1(0) = T,1(0) − T,1(0)F1,1(0) − T,2(0)F2,1(0) = T,1(0) − T,1(0) = 0,
l,2(0) = T,2(0) − T,1(0)F1,2(0) − T,2(0)F2,2(0) = T,2(0) − T,2(0) = 0,
l,3(0) = T,3(0) − T,1(0)F1,3(0) − T,2(0)F2,3(0) = T,3(0),
l,4(0) = T,4(0) − T,1(0)F1,4(0) − T,2(0)F2,4(0) = T,4(0),

so, if (0, 0, 0, 0) is a maximum,

D(T − T,1(0)F1 − T,2(0)F2)(0) = 0.

Thus, if (0, 0, 0, 0) gives a maximum and θ : R2 → R is the linear map given by θ(x, y) = T,1(0)x + T,2(0)y, we have

D(T − θF)(0) = 0.

Using the definitions of T and F, this gives

D(tα − θβf α)(0) = 0,

so by the chain rule

(Dt(0) − θβDf(0))α = 0.

Since α is invertible, this is equivalent to saying

Dt(0) − θβDf(0) = 0,

and, since β is invertible, this, in turn, is equivalent to saying that there exists a linear map φ : R2 → R such that

Dt(0) − φDf(0) = 0,

so, using the chain rule once again,

D(t − φf)(0) = 0.

Writing φ(x, y) = λ1x + λ2y, we see that x = (0, 0, 0, 0) maximises t(x), subject to the condition f(x) = (0, 0), only if we can find λ1 and λ2 such that t − λ1f1 − λ2f2 has (0, 0, 0, 0) as a stationary point.

The argument generalises easily to any number of dimensions.

Exercise 13.3.2. Prove Lemma 13.3.3.


Lemma 13.3.3. (Lagrangian necessity.) Consider functions f : Rm+n → Rn and t : Rm+n → R which are differentiable on an open set U. Suppose further that Df is continuous and of full rank at every point of U. If h ∈ Rn, then, if z ∈ U is such that
(i) f(z) = h, and
(ii) t(z) ≥ t(x) for all x ∈ U such that f(x) = h,
then it follows that there exists a λ ∈ Rn such that the function t − λ · f : Rm+n → R has z as a stationary point.

Exercise 13.3.4. Suppose that f : R2 → R and t : R2 → R are very well behaved. Convince yourself that, in this case, Lemma 13.3.3 reads as follows. A man takes the path f(x, y) = h over a hill of height t(x, y). He reaches the highest point of the path at a point (x0, y0) where the path and the contour line t(x, y) = t(x0, y0) have a common tangent. Generalise to higher dimensions.

Lemma 13.3.3 gives us a recipe for finding the maximum of t(x), subject to the conditions x ∈ U and f(x) = h, when t : Rn+m → R and f : Rn+m → Rn are well behaved functions and U is open.
(1) Set

L(λ, x) = t(x) − λ · f(x)

with λ ∈ Rn. (L is called the Lagrangian.)
(2) For each fixed λ, find the set E(λ) of stationary points of L(λ, ) lying within U. (The previous sentence uses the notational convention described in the paragraph labelled Abuse of Language on Page 421⁵.)
(3) Now vary λ and find all λ* and x* ∈ E(λ*) such that f(x*) = h. The maximum, if it exists, will lie among these x*.
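The three steps can be followed literally for an invented problem (not from the text): maximise t(x, y) = x + y subject to f(x, y) = x² + y² = 2. A hedged sketch:

```python
# Step (1): Lagrangian L(lam, x, y) = t(x, y) - lam * f(x, y)
#         = x + y - lam * (x^2 + y^2).
# Step (2): for fixed lam != 0, the stationary points of L(lam, ) solve
#   1 - 2*lam*x = 0 and 1 - 2*lam*y = 0, so E(lam) = {(1/(2 lam), 1/(2 lam))}.
def E(lam):
    return (1 / (2 * lam), 1 / (2 * lam))

# Step (3): vary lam until the stationary point satisfies f(x, y) = 2.
# Here f(E(lam)) = 1/(2 lam^2) = 2 forces lam = +1/2 or -1/2.
candidates = [E(0.5), E(-0.5)]
for (x, y) in candidates:
    assert abs(x * x + y * y - 2) < 1e-12   # the constraint holds

# The maximum, if it exists, lies among the candidates; here it is (1, 1).
best = max(candidates, key=lambda p: p[0] + p[1])
assert best == (1.0, 1.0)
print("maximiser:", best)
```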

Put like this, the whole procedure has the same unreal quality that instructions on how to ride a bicycle would have for someone who had only seen a photograph of a bicycle. I suspect that the only way to learn how to use Lagrange's method is to use it. I expect that most of my readers will indeed be familiar with Lagrange's method and, in any case, this is a book on the theory of analysis. However, I include one simple example of the method in practice.

Example 13.3.5. Find the circular cylinder of greatest volume with given surface area. (That is, find the optimum dimensions for a tin of standard shape.)

⁵ This convention was thoroughly disliked by several of the readers of my manuscript but, as one of them remarked, 'No really satisfactory notation exists'. The reader should feel free to use her own notation. In particular she may prefer to use a placeholder and write L(λ, ·).


Calculation. Let the height of the cylinder be h and the radius be r. We wish to maximise the volume

V(r, h) = πr²h,

subject to keeping the surface area

A(r, h) = 2πr² + 2πrh

fixed, with A(r, h) = A [r, h > 0], say.

The instructions for step (1) tell us to form the Lagrangian

L(λ, r, h) = V(r, h) − λA(r, h) = πr²h − λ(2πr² + 2πrh).

The instructions for step (2) tell us to seek stationary values for L(λ, , ) when λ is fixed. We thus seek to solve

∂L/∂r(λ, r, h) = 0, ∂L/∂h(λ, r, h) = 0,

that is

2πrh − λ(4πr + 2πh) = 0, πr² − 2λπr = 0,

or, simplifying,

rh − λ(2r + h) = 0, r − 2λ = 0.

Thus, if λ is fixed and positive, L(λ, , ) has a unique stationary point given by r = rλ = 2λ, h = hλ = 4λ. (If λ ≤ 0, there are no stationary points consistent with our restrictions r > 0, h > 0.)

We now proceed to step (3) which requires us to find λ such that

A = A(rλ, hλ),

that is

A = A(2λ, 4λ) = 24πλ².

This has a unique solution with λ > 0. We know, on general grounds (see Exercise 13.3.6), that a maximum exists so, since Lagrange's method must produce the maximum point, and produces only one point in this case, this point must be the maximum.

The neatest way of writing our solution (and one that we could have obtained directly by eliminating λ from the equations above) is that the cylinder of maximum volume with given surface area has height twice its radius.
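A quick numerical check of the conclusion (a sketch, with the surface area A = 24π chosen so that λ = 1, r = 2, h = 4):

```python
import math

# Among cylinders of surface area A = 24*pi, volume should be maximised
# when the height is twice the radius (here r = 2, h = 4).
A = 24 * math.pi

def volume(r):
    # Solve A(r, h) = A for h, then return V(r, h); reject h <= 0.
    h = (A - 2 * math.pi * r * r) / (2 * math.pi * r)
    return math.pi * r * r * h if h > 0 else float("-inf")

# Crude grid search over the radius.
best_r = max((0.001 * k for k in range(1, 4000)), key=volume)
h = (A - 2 * math.pi * best_r ** 2) / (2 * math.pi * best_r)
assert abs(best_r - 2.0) < 0.01     # r = 2*lambda = 2
assert abs(h - 2 * best_r) < 0.05   # height twice the radius
print("r =", best_r, "h =", h)
```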

It must be admitted that the standard tin does not follow our prescription very closely. (Presumably the cost of materials is relatively small compared to other costs. We must also remember that near a maximum small changes produce very small effects, so the penalties for deviating are not high.)

Exercise 13.3.6. We work with the notation and hypotheses introduced in the calculations in Example 13.3.5. Choose r0 and h0 such that r0, h0 > 0 and A(r0, h0) = A. Show that there exists an R > 1 such that, if r > R or R⁻¹ > r > 0 and A(r, h) = A, then V(r, h) ≤ V(r0, h0)/2. Show that

K = {(r, h) : A(r, h) = A, R ≥ r ≥ R⁻¹, h > 0}

is a closed bounded set in R2. Deduce that V(r, h) attains a maximum on K and conclude that V(r, h) attains a maximum on

{(r, h) : A(r, h) = A, r > 0, h > 0}.

Exercise 13.3.7. Suppose that f : R2 → R and t : R2 → R are given by f(x, y) = y and t(x, y) = y² − x². We seek the maximum value of t(x, y) subject to f(x, y) = a. Show that L(λ, ) = t − λf is stationary at (0, λ/2), and that the only value of λ which gives f(0, λ/2) = a is λ = 2a. [We again use the notation described on Page 421.]
(i) Show that (0, a) does indeed maximise t(x, y) subject to f(x, y) = a.
(ii) Show, however, that L(2a, ) = t − 2af does not have a maximum at (0, a).
(iii) Draw a diagram in the manner of Exercise 13.3.4 to illustrate what is happening.
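Parts (i) and (ii) are easy to confirm numerically, say for a = 1 (a hedged sketch, not part of the exercise):

```python
# Exercise 13.3.7 with a = 1: t(x, y) = y^2 - x^2, f(x, y) = y.
a = 1.0
t = lambda x, y: y * y - x * x
L = lambda x, y: t(x, y) - 2 * a * y          # L(2a, ) = t - 2af

# (i) On the constraint y = a, t = a^2 - x^2 is maximised at x = 0.
assert all(t(x, a) <= t(0, a) for x in [-1 + 0.01 * k for k in range(201)])

# (ii) But L(2a, ) has NO maximum at (0, a): moving in y increases it.
assert L(0, a + 0.1) > L(0, a)
print("constrained max at (0, a), yet L(2a, ) is not maximised there")
```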

Exercise 13.3.8. Suppose that f : R2 → R and t : R2 → R are given by f(x, y) = x − αy² and t(x, y) = (x − 1)² + y². We seek maxima and minima of t subject to f(x, y) = 0. Show that L(λ, ) = t − λf is stationary at (1 + λ/2, 0), and that the only value of λ which gives f(1 + λ/2, 0) = 0 is λ = −2.
(i) Show that, if α < 1/2, (0, 0) minimises t(x, y) subject to f(x, y) = 0. Show, moreover, that L(−2, ) = t + 2f has a strict minimum at 0.
(ii) Show that, if α = 1/2, (0, 0) minimises t(x, y) subject to f(x, y) = 0. Show that L(−2, ) has a minimum at 0 but this is not a strict minimum.
(iii) Show that, if α > 1/2, (0, 0) maximises t(x, y) subject to f(x, y) = 0. Show, however, that L(−2, ) does not have a maximum at 0.
(iv) Draw a diagram in the manner of Exercise 13.3.4 to illustrate (as far as you can, do not overdo it) what is happening.
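The dichotomy between the cases can be seen by sampling t along the constraint x = αy², where t = 1 + (1 − 2α)y² + α²y⁴ (a sketch, with the values α = 0 and α = 1 chosen for illustration):

```python
# Exercise 13.3.8: behaviour of t = (x-1)^2 + y^2 on the constraint
# x = alpha * y^2, near the candidate point (0, 0) where t = 1.
def t_on_constraint(alpha, y):
    x = alpha * y * y
    return (x - 1) ** 2 + y * y

ys = [0.01 * k for k in range(-20, 21) if k != 0]   # small y != 0

# alpha = 0 < 1/2: (0, 0) is a strict local minimum on the constraint.
assert all(t_on_constraint(0.0, y) > 1 for y in ys)

# alpha = 1 > 1/2: (0, 0) is a strict local maximum on the constraint.
assert all(t_on_constraint(1.0, y) < 1 for y in ys)
print("alpha < 1/2 gives a minimum at (0, 0); alpha > 1/2 a maximum")
```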

The following remark is entirely trivial but extremely useful.


Lemma 13.3.9. (Lagrangian sufficiency.) Consider a set X and functions f : X → Rn and t : X → R. Set

L(λ, x) = t(x) − λ · f(x)

for all λ ∈ Rn and x ∈ X.
Suppose h ∈ Rn, λ* ∈ Rn, and x* ∈ X are such that f(x*) = h and

L(λ*, x*) ≥ L(λ*, x)

for all x ∈ X. Then

t(x*) ≥ t(x)

for all x ∈ X such that f(x) = h.

Proof. The proof is shorter than the statement. Observe that, under the given hypotheses,

t(x*) = t(x*) − λ* · f(x*) + λ* · h = L(λ*, x*) + λ* · h
      ≥ L(λ*, x) + λ* · h = t(x) − λ* · f(x) + λ* · h = t(x)

for all x ∈ X such that f(x) = h, as required.
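Since the lemma makes no demands on X, f or t, it can be checked even on a discrete set. An invented illustration (not from the text), with X a finite grid in R², t(x, y) = x + y, f(x, y) = x² + y², h = 2, and multiplier λ* = 1/2:

```python
# Lagrangian sufficiency on a finite grid: if x* maximises L(lam*, ) over
# ALL of X and f(x*) = h, then x* maximises t over {x in X : f(x) = h}.
lam = 0.5
t = lambda p: p[0] + p[1]
f = lambda p: p[0] ** 2 + p[1] ** 2
L = lambda p: t(p) - lam * f(p)

X = [(0.02 * i, 0.02 * j) for i in range(-150, 151) for j in range(-150, 151)]
xstar = (1.0, 1.0)
assert f(xstar) == 2.0

# Hypothesis: x* maximises L(lam*, ) over the whole of X ...
assert all(L(p) <= L(xstar) for p in X)

# ... conclusion: x* maximises t among grid points with f(p) = h = 2.
assert all(t(p) <= t(xstar) for p in X if abs(f(p) - 2.0) < 1e-9)
print("Lagrangian sufficiency verified on the grid")
```

Here L(λ*, (x, y)) = 1 − ((x − 1)² + (y − 1)²)/2, so the unconstrained maximum at (1, 1) is easy to see directly.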

It is worth noting that Lemma 13.3.9 (our Lagrangian sufficiency condition) does not require f or t to be well behaved or make any demands on the underlying space X. This is particularly useful since, if the reader notes where Lagrange's method is used, she will find that it is often used in circumstances much more general than those envisaged in Lemma 13.3.3 (our Lagrangian necessity condition).

However, the very simple examples given in Exercises 13.3.7 and 13.3.8 show that there is a very big gap between our Lagrangian necessity and sufficiency conditions even when we restrict ourselves to well behaved functions on finite dimensional spaces. In the first three chapters of his elegant text Optimization Under Constraints [48] Whittle considers how this gap can be closed. It turns out that in certain very important practical cases (in particular those occurring in linear programming) the sufficient condition is also necessary but that, from the point of view of the present book, these cases have very special features.

In general, the Lagrangian method is very effective in suggesting the correct answer but, when that answer has been found, it is often easier to prove correctness by some other method (compare the treatment of geodesics in Section 10.5 where we found Theorem 10.5.11 by one method but proved it by another).


The alternative strategy, once the Lagrangian method has produced a possible solution, is to claim that 'it is obvious from physical considerations that this is indeed the correct solution'. This works well provided at least one of the following conditions applies:
(1) you are answering an examination question, or
(2) you have genuine physical insight, or
(3) you are reasonably lucky.
It is, of course, characteristic of bad luck that it strikes at the most inconvenient time.

Chapter 14

Completion

14.1 What is the correct question?

We cannot do interesting analysis on the rationals but we can on the larger space of real numbers. On the whole, we cannot do interesting analysis on metric spaces which are not complete. Given such an incomplete metric space, can we find a larger complete metric space which contains it? Our first version of this question might run as follows.

Question A: If (X, d) is a metric space, can we find a complete metric space (Z, δ) such that Z ⊇ X and d(u, v) = δ(u, v) for all u, v ∈ X?

Most mathematicians prefer to ask this question in a slightly different way.

Question A′: If (X, d) is a metric space, can we find a complete metric space (Y, d̃) and a map θ : X → Y such that d̃(θu, θv) = d(u, v)?

Question A asks us to build a complete metric space containing (X, d). Question A′ asks us to build a complete metric space containing a perfect copy of (X, d). Since mathematicians cannot distinguish between the original space (X, d) and a perfect copy, they consider the two questions to be equivalent. However, both from a philosophical and a technical point of view, it is easier to handle Question A′.

Exercise 14.1.1. Convince yourself either that Question A′ is equivalent to Question A or that Question A′ is more appropriate than Question A.

For the moment, we stick to Question A. A little thought shows that it does not have a unique answer.

Exercise 14.1.2. Consider the open interval X = (0, 1) on R with the usual metric d.
(i) Show that (X, d) is not complete.
(ii) Show that, if we take Z = [0, 1] with the usual metric δ, then (Z, δ) answers Question A.
(iii) Show that, if we take Z = R with the usual metric δ, then (Z, δ) answers Question A.

We therefore reformulate Question A as follows.

Question B: If (X, d) is a metric space, can we find a most economical complete metric space (Z, ρ) such that Z ⊇ X and d(u, v) = ρ(u, v) for all u, v ∈ X?

We can formulate an appropriate meaning for 'most economical' with the aid of the notion of density.

Definition 14.1.3. If (X, d) is a metric space, we say that a subset E is dense in X if, given any x ∈ X, we can find xn ∈ E with d(xn, x) → 0 as n → ∞.

Exercise 14.1.4. Show that the following statements about a subset E of a metric space (X, d) are equivalent.
(i) E is dense in X.
(ii) Given x ∈ X and ε > 0, we can find a y ∈ E such that d(x, y) < ε.
(iii) If F is a closed subset of X with F ⊇ E, then F = X.
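For intuition, a standard example (not from the text) is the set of dyadic rationals k/2ⁿ, which is dense in R with the usual metric: truncating the binary expansion of x after n places gives xn ∈ E with d(xn, x) ≤ 2⁻ⁿ → 0. A small sketch:

```python
import math

# The dyadic rationals k/2^n are dense in R:
# x_n = floor(x * 2^n) / 2^n satisfies |x_n - x| <= 2^(-n) -> 0.
x = math.pi
for n in range(1, 40):
    xn = math.floor(x * 2 ** n) / 2 ** n     # a dyadic rational
    assert abs(xn - x) <= 2.0 ** (-n)
print("dyadic approximations converge to pi")
```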

The notion of density is widely used in analysis since, in some sense, a dense subset of a metric space acts as a 'skeleton' for the space. For the moment, let us see why it provides the appropriate definition of 'most economical' in the context of this chapter.

Lemma 14.1.5. Suppose (X, d), (Y, ρ) and (Y′, ρ′) are metric spaces with the following properties.
(i) (Y, ρ) and (Y′, ρ′) are complete.
(ii) There exist maps θ : X → Y and θ′ : X → Y′ such that ρ(θu, θv) = ρ′(θ′u, θ′v) = d(u, v).
(iii) θX is dense in Y and θ′X is dense in Y′.
Then we can find a bijection φ : Y → Y′ such that φθ = θ′ and ρ′(φw, φz) = ρ(w, z) for all w, z ∈ Y.

Exercise 14.1.6. (i) Convince yourself that Lemma 14.1.5 can be roughly restated in terms of Question A as follows. Suppose (X, d) is a metric space and (Z, δ) and (Z′, δ′) are complete metric spaces such that X ⊆ Z, X ⊆ Z′, δ(u, v) = δ′(u, v) = d(u, v) for all u, v ∈ X and X is dense in (Z, δ) and in (Z′, δ′). Then (Z, δ) and (Z′, δ′) have the same metric structure and X sits in both metric spaces in the same way. [Note, you must be able to describe the purpose of the statement φθ = θ′ in this account.]
(ii) Convince yourself that the questions involved in Lemma 14.1.5 are better treated in the language of Question A′ than in the language of Question A.

Proof of Lemma 14.1.5. If y ∈ Y, then, since θX is dense in (Y, ρ), we can find xn ∈ X with ρ(θxn, y) → 0. Since θxn converges, it forms a Cauchy sequence in (Y, ρ). But

ρ′(θ′xn, θ′xm) = d(xn, xm) = ρ(θxn, θxm),

so θ′xn is a Cauchy sequence in (Y′, ρ′) and so converges to a limit x̃, say.

Suppose zn ∈ X with ρ(θzn, y) → 0. By the argument of the previous paragraph, θ′zn is a Cauchy sequence in (Y′, ρ′) and so converges to a limit z̃, say. We wish to show that x̃ = z̃. To do this, observe that

ρ′(x̃, z̃) ≤ ρ′(x̃, θ′xn) + ρ′(z̃, θ′zn) + ρ′(θ′xn, θ′zn)
         = ρ′(x̃, θ′xn) + ρ′(z̃, θ′zn) + ρ(θxn, θzn)
         ≤ ρ′(x̃, θ′xn) + ρ′(z̃, θ′zn) + ρ(θxn, y) + ρ(θzn, y)
         → 0 + 0 + 0 + 0 = 0

as n → ∞. Thus ρ′(x̃, z̃) = 0 and x̃ = z̃. We have shown that we can define φy unambiguously by φy = x̃.

We have now defined φ : Y → Y′. To show that ρ′(φw, φz) = ρ(w, z), choose wn ∈ X and zn ∈ X such that ρ(θwn, w), ρ(θzn, z) → 0 as n → ∞. Then ρ(θzn, θwn) → ρ(z, w) as n → ∞. (See Exercise 14.1.7 if necessary.) By definition, ρ′(θ′zn, φz) → 0 and ρ′(θ′wn, φw) → 0, so ρ′(θ′zn, θ′wn) → ρ′(φz, φw). Since

ρ′(θ′zn, θ′wn) = d(zn, wn) = ρ(θzn, θwn),

it follows that ρ′(φw, φz) = ρ(w, z).

To show that φθ = θ′, choose any x ∈ X. If we set xn = x for each n, we see that ρ(θxn, θx) = 0 → 0 and ρ′(θ′xn, θ′x) = 0 → 0 as n → ∞ so, by definition, φθx = θ′x. Since x was arbitrary, it follows that φθ = θ′.

Finally, we must show that φ is a bijection. There are several ways of approaching this. We choose an indirect but natural approach. Observe that, interchanging the roles of Y and Y′, the work done so far also shows that there is a map φ̃ : Y′ → Y such that φ̃θ′ = θ and ρ(φ̃w, φ̃z) = ρ′(w, z) for all w, z ∈ Y′. Since

(φ̃φ)θ = φ̃(φθ) = φ̃θ′ = θ,

we have φ̃φ(y) = y for all y ∈ θX. Now θX is dense in Y, so, if y ∈ Y, we can find yn ∈ θX such that ρ(yn, y) → 0. Thus

ρ(φ̃φ(y), y) ≤ ρ(φ̃φ(y), φ̃φ(yn)) + ρ(φ̃φ(yn), yn) + ρ(yn, y)
            = ρ(φ̃φ(y), φ̃φ(yn)) + ρ(yn, y) = ρ′(φ(y), φ(yn)) + ρ(yn, y)