

as (h^2 + k^2)^{1/2} → 0. Setting k = h, we get
\[
2^{1/2} = \frac{h + h}{(h^2 + h^2)^{1/2}} = \frac{f(h, h) + h}{(h^2 + h^2)^{1/2}} \to 0
\]
as h → 0, which is absurd. Thus f is not differentiable at (0, 0).
(We give a stronger result in Exercise C.8 and a weaker but slightly easier
result in Exercise 7.3.16.)
Exercise 7.3.15. Write down the details behind the first sentence of our
proof of Example 7.3.14. You will probably wish to quote Lemma 6.2.11 and
Exercise 6.2.17.
Exercise 7.3.16. If
\[
f(x, y) = \frac{xy}{(x^2 + y^2)^{1/2}} \quad \text{for } (x, y) \neq (0, 0), \qquad f(0, 0) = 0,
\]
show that f is differentiable except at (0, 0), is continuous at (0, 0) and has
partial derivatives f_{,1}(0, 0) and f_{,2}(0, 0) at (0, 0) but has directional
derivatives in no other directions at (0, 0). Discuss your results briefly using
the ideas of Exercise 7.3.12.
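
A numerical experiment is no substitute for the proof asked for in Exercise 7.3.16, but it may help the reader to see what is going on. The sketch below (the function name quotient and the step size 1e-8 are our own illustrative choices) evaluates the difference quotient (f(tu, tv) − f(0, 0))/t for small positive and negative t, first along the x-axis and then along the diagonal.

```python
import math

def f(x, y):
    """f(x, y) = xy / (x^2 + y^2)^(1/2) away from the origin, f(0, 0) = 0."""
    if x == 0.0 and y == 0.0:
        return 0.0
    return x * y / math.hypot(x, y)

def quotient(u, v, t):
    """Difference quotient (f(tu, tv) - f(0, 0)) / t along the direction (u, v)."""
    return f(t * u, t * v) / t

# Along the x-axis both one-sided quotients vanish.
axis_right = quotient(1.0, 0.0, 1e-8)
axis_left = quotient(1.0, 0.0, -1e-8)

# Along the diagonal the one-sided quotients have opposite signs,
# so they cannot share a limit as t -> 0.
c = 1 / math.sqrt(2)
diag_right = quotient(c, c, 1e-8)
diag_left = quotient(c, c, -1e-8)

print(axis_right, axis_left, diag_right, diag_left)
```

Along a unit direction (u, v) the quotient works out to uv times the sign of t, which is why the two one-sided values agree only when uv = 0.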
A further exercise on the ideas just used is given as Exercise K.108.
Emboldened by our success, we could well guess immediately a suitable
function to look for in the context of Theorem 7.2.6.
Exercise 7.3.17. Suppose that f : R^2 → R is given by f(0, 0) = 0 and
\[
f(r\cos\theta, r\sin\theta) = r^2 \sin 4\theta,
\]
for r > 0. Show that
\[
f(x, y) = \frac{4xy(x^2 - y^2)}{x^2 + y^2}
\]
for (x, y) ≠ 0. Sketch the contour lines f(x, y) = h, 2^2 h, 3^2 h, . . . and
compare the result with Figure 7.2.
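
Before proving the identity of Exercise 7.3.17 the reader may like to check it numerically. The following sketch compares the polar and Cartesian expressions on a small grid of sample points; the grid and the names f_polar, f_cartesian and max_gap are our own assumptions, chosen only for illustration.

```python
import math

def f_polar(r, theta):
    """The definition f(r cos t, r sin t) = r^2 sin 4t."""
    return r * r * math.sin(4 * theta)

def f_cartesian(x, y):
    """The claimed closed form, valid away from the origin."""
    return 4 * x * y * (x * x - y * y) / (x * x + y * y)

# Largest discrepancy between the two expressions over the sample points.
max_gap = 0.0
for i in range(1, 6):
    r = 0.5 * i
    for k in range(24):
        theta = 2 * math.pi * k / 24
        x, y = r * math.cos(theta), r * math.sin(theta)
        max_gap = max(max_gap, abs(f_polar(r, theta) - f_cartesian(x, y)))

print(max_gap)
```

Agreement up to rounding error at finitely many points proves nothing, of course, but a disagreement would have told us that we had miscopied the formula.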
Exercise 7.3.18. Suppose that
\[
f(x, y) = \frac{xy(x^2 - y^2)}{x^2 + y^2} \quad \text{for } (x, y) \neq (0, 0), \qquad f(0, 0) = 0.
\]
Please send corrections however trivial to twk@dpmms.cam.ac.uk

(i) Compute f_{,1}(0, y), for y ≠ 0, by using standard results of the calculus.
(ii) Compute f_{,1}(0, 0) directly from the definition of the derivative.
(iii) Find f_{,2}(x, 0) for all x.
(iv) Compute f_{,12}(0, 0) and f_{,21}(0, 0).
(v) Show that f has first and second partial derivatives everywhere but
f_{,12}(0, 0) ≠ f_{,21}(0, 0).
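
The asymmetry in Exercise 7.3.18 can be seen in a rough finite-difference experiment. The sketch below is only a numerical illustration under our own assumptions (central differences with the hypothetical step sizes 1e-6 and 1e-3); it does not replace the computation from the definition asked for in the exercise.

```python
def f(x, y):
    """f(x, y) = xy(x^2 - y^2)/(x^2 + y^2) away from the origin, f(0, 0) = 0."""
    if x == 0.0 and y == 0.0:
        return 0.0
    return x * y * (x * x - y * y) / (x * x + y * y)

def d1(x, y, h=1e-6):
    """Central-difference estimate of the partial derivative in x."""
    return (f(x + h, y) - f(x - h, y)) / (2 * h)

def d2(x, y, h=1e-6):
    """Central-difference estimate of the partial derivative in y."""
    return (f(x, y + h) - f(x, y - h)) / (2 * h)

k = 1e-3
# Differentiate the x-derivative in the y direction at the origin ...
d_dy_of_d1 = (d1(0.0, k) - d1(0.0, -k)) / (2 * k)
# ... and the y-derivative in the x direction at the origin.
d_dx_of_d2 = (d2(k, 0.0) - d2(-k, 0.0)) / (2 * k)

print(d_dy_of_d1, d_dx_of_d2)
```

The two estimates come out close to −1 and +1 respectively, so the order of differentiation matters at the origin.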
It is profoundly unfortunate that Example 7.3.14 and Exercise 7.3.18 seem
to act on some examiners like catnip on a cat. Multi-dimensional calculus
leads towards differential geometry and infinite dimensional calculus
(functional analysis). Both subjects depend on understanding objects which we
know to be well behaved but which our limited geometric intuition makes it
hard for us to comprehend. Counterexamples, such as the ones just produced,
which depend on functions having some precise degree of differentiability are
simply irrelevant.
At the beginning of this section we used a first order local Taylor
expansion and results on linear maps to establish the behaviour of a well
behaved function f near a point x where Df(x) ≠ 0. We then used a second order
local Taylor expansion and results on bilinear maps to establish the behaviour
of a well behaved function f near a point x where Df(x) = 0 on condition
that D^2 f(x) was non-singular. Why should we stop here?
It is not the case that we can restrict ourselves to functions f for which
D^2 f(x) is non-singular at all points.
Exercise 7.3.19. (i) Let A(t) be a 3 × 3 real symmetric matrix with A(t) =
(a_ij(t)). Suppose that the entries a_ij : R → R are continuous. Explain why
det A : R → R is continuous. By using an expression for det A in terms
of the eigenvalues of A, show that, if A(0) is positive definite and A(1) is
negative definite, then there must exist a c ∈ (0, 1) with A(c) singular.
(ii) Let m be an odd positive integer, U an open subset of R^m and γ :
[0, 1] → U a continuous map. Suppose that f : U → R has continuous
second order partial derivatives on U, that f attains a local minimum at γ(0)
and a local maximum at γ(1). Show that there exists a c ∈ [0, 1] such that
D^2 f(γ(c)) is singular.
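
Part (i) of Exercise 7.3.19 can be watched in action. In the sketch below the matrices P and Q and the straight-line path A(t) = (1 − t)P + tQ are hypothetical choices of ours; the determinant is positive at t = 0 and negative at t = 1, and a simple bisection (relying on the continuity of det A) locates a c at which A(c) is singular to within rounding error.

```python
def det3(m):
    """Determinant of a 3 x 3 matrix given as a list of rows."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

P = [[2.0, 1.0, 0.0], [1.0, 2.0, 1.0], [0.0, 1.0, 2.0]]      # positive definite
Q = [[-1.0, 0.0, 0.0], [0.0, -1.0, 0.0], [0.0, 0.0, -1.0]]   # negative definite

def A(t):
    """Straight-line path of symmetric matrices from P (t = 0) to Q (t = 1)."""
    return [[(1 - t) * P[r][s] + t * Q[r][s] for s in range(3)] for r in range(3)]

# Bisection keeping det A(lo) > 0 >= det A(hi) at every step.
lo, hi = 0.0, 1.0
for _ in range(60):
    mid = (lo + hi) / 2
    if det3(A(mid)) > 0:
        lo = mid
    else:
        hi = mid
c = (lo + hi) / 2

print(c, det3(A(c)))
```

The sign change is available because the dimension 3 is odd: the determinant of a negative definite 3 × 3 matrix is a product of three negative eigenvalues.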
There is nothing special about the choice of m odd in Exercise 7.3.19.
We do the case m = 2 in Exercise K.106 and ambitious readers may wish to
attack the general case themselves (however, it is probably only instructive if
you make the argument watertight). Exercise K.43 gives a slightly stronger
result when m = 1.
However, it is only when Df(x) vanishes and D^2 f(x) is singular at the
same point x that we have problems and we can readily convince ourselves
(note this is not the same as proving) that this is rather unusual.
164 A COMPANION TO ANALYSIS

Exercise 7.3.20. Let f : R → R be given by f(x) = ax^3 + bx^2 + cx + d
with a, b, c, d real. Show that there is a y with f'(y) = f''(y) = 0 if and
only if one of the following two conditions holds:- a ≠ 0 and b^2 = 3ac, or
a = b = c = 0.
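
The condition in Exercise 7.3.20 is easy to test by machine on a finite range of integer coefficients. The sketch below is an illustration under our own assumptions (the function names and the coefficient ranges are hypothetical); it uses exact rational arithmetic to compare a direct search for a common zero of f' and f'' with the stated condition.

```python
from fractions import Fraction

def has_degenerate_critical_point(a, b, c):
    """True if some y has f'(y) = f''(y) = 0 for f(x) = ax^3 + bx^2 + cx + d."""
    a, b, c = Fraction(a), Fraction(b), Fraction(c)
    if a != 0:
        y = -b / (3 * a)                      # the only zero of f''(x) = 6ax + 2b
        return 3 * a * y * y + 2 * b * y + c == 0
    # a = 0: f''(x) = 2b is constant, so we need b = 0 and then f'(x) = c = 0
    return b == 0 and c == 0

def stated_condition(a, b, c):
    """The condition given in the exercise."""
    return (a != 0 and b * b == 3 * a * c) or (a == 0 and b == 0 and c == 0)

# Compare the two tests on a small box of integer coefficients.
agree = all(
    has_degenerate_critical_point(a, b, c) == stated_condition(a, b, c)
    for a in range(-4, 5) for b in range(-6, 7) for c in range(-6, 7)
)
print(agree)
```

Only a thin set of the triples (a, b, c) tested satisfies the condition, which fits the informal claim that a vanishing derivative together with a singular second derivative is unusual.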

Faced with this kind of situation mathematicians tend to use the word
generic and say 'in the generic case, the Hessian is non-singular at the critical
points'. This is a useful way of thinking but we must remember that:-
(1) If we leave the word generic undefined, any sentence containing the
word generic is, strictly speaking, meaningless.
(2) In any case, if we look at any particular function, it ceases to be
generic. (A generic function is one without any particular properties. Any
particular function that we look at has the particular property that we are
interested in it.)
(3) The generic case may be a lot worse than we expect. Most mathematicians
would agree that the generic function f : R → R is unbounded on every
interval (a, b) with a < b, that the generic bounded function f : R → R is
discontinuous at every point and that the generic continuous function f : R → R
is nowhere differentiable. We should have said something more precise like
'the generic 3 times differentiable function f : R^n → R has a non-singular
Hessian at its critical points'.
So far in this section we have looked at stationary points of f by studying
the local behaviour of the function. In this we have remained true to our
17th and 18th century predecessors. In a paper entitled On Hills and Dales,
Maxwell7 raises our eyes from the local and shows us the prospect of a global
theory.

Plausible statement 7.3.21. (Hill and dale theorem.) Suppose the
surface of the moon has a finite number S of summits, B of bottoms and
P of passes (all heights being measured from the moon's centre). Then

S + B − P = 2.

Plausible Proof. By digging out pits and piling up soil we may ensure that
all the bottoms are at the same height, that all the passes are at different
heights, but all higher than the bottoms, and that all the summits are at the
same height which is greater than the height of any pass. Now suppose that
it begins to rain and that the water level rises steadily (and that the level is
the same for each body of water). We write L(h) for the number of lakes (a
lake is the largest body of water that a swimmer can cover without going on
to dry land), I(h) for the number of islands (an island is the largest body of
dry land that a walker can cover without going into the water) and P(h) for
the number of passes visible when the height of the water is h.

7 Maxwell notes that he was anticipated by Cayley.

Figure 7.4: A pass vanishes under water

When the rain has just begun and the height h_0, say, of the water is
higher than the bottoms, but lower than the lowest pass, we have
\[
L(h_0) = B, \quad I(h_0) = 1, \quad P(h_0) = P. \tag{1}
\]
(Observe that there is a single body of dry land that a walker can get to
without going into the water so I(h_0) = 1 even if the man in the street would
object to calling the surface of the moon with a few puddles an island.) Every
time the water rises just high enough to drown a pass, then either
(a) two arms of a lake join so an island appears, a pass vanishes and the
number of lakes remains the same, or
(b) two lakes come together so the number of lakes diminishes by one, a
pass vanishes and the number of islands remains the same.
We illustrate this in Figure 7.4. In either case, we see that
\[
I(h) - L(h) + P(h) \text{ remains constant}
\]
and so, by equation (1),
\[
I(h) - L(h) + P(h) = I(h_0) - L(h_0) + P(h_0) = 1 - B + P. \tag{2}
\]
When the water is at a height h_1, higher than the highest pass but lower
than the summits, we have
\[
L(h_1) = 1, \quad I(h_1) = S, \quad P(h_1) = 0. \tag{3}
\]
(Though the man in the street would now object to us calling something a
lake when it is obviously an ocean with S isolated islands.) Using equations
(2) and (3), we now have
\[
1 - B + P = I(h_1) - L(h_1) + P(h_1) = S - 1
\]
and so B + S − P = 2.




Figure 7.5: One- and two-holed doughnuts

Exercise 7.3.22. State and provide plausible arguments for plausible results
corresponding to Plausible Statement 7.3.21 when the moon is in the shape
of a one-holed doughnut, two-holed doughnut and an n-holed doughnut (see
Figure 7.5).
Notice that local information about the nature of a function at special
points provides global 'topological' information about the number of holes in
a doughnut.
If you know Euler's theorem (memory jogger 'V-E+F=2'), can you connect
it with this discussion?
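
For readers who would like to experiment with the memory jogger, here is a small sketch that counts V − E + F for a surface described by a list of faces. The encoding of faces as tuples of vertex labels, the function name euler_characteristic and the three example surfaces (a tetrahedron, a cube and a 3 by 3 grid of squares with opposite sides glued to form a one-holed doughnut) are all our own illustrative assumptions.

```python
def euler_characteristic(faces):
    """V - E + F for a surface given as a list of faces (cycles of vertex labels)."""
    vertices = {v for face in faces for v in face}
    edges = set()
    for face in faces:
        for i, v in enumerate(face):
            w = face[(i + 1) % len(face)]     # next vertex round the face
            edges.add(frozenset((v, w)))      # an edge is an unordered pair
    return len(vertices) - len(edges) + len(faces)

tetrahedron = [(0, 1, 2), (0, 1, 3), (0, 2, 3), (1, 2, 3)]
cube = [(0, 1, 2, 3), (4, 5, 6, 7), (0, 1, 5, 4),
        (1, 2, 6, 5), (2, 3, 7, 6), (3, 0, 4, 7)]

# A grid of n x n squares with opposite sides identified (a torus).
n = 3
torus = [((i, j), ((i + 1) % n, j), ((i + 1) % n, (j + 1) % n), (i, (j + 1) % n))
         for i in range(n) for j in range(n)]

print(euler_characteristic(tetrahedron),
      euler_characteristic(cube),
      euler_characteristic(torus))
```

The two sphere-like surfaces give the same count, while the doughnut gives a different one; the reader may care to compare this with whatever emerged from Exercise 7.3.22.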

Exercise 7.3.23. The function f : R^2 → R is well behaved (say 3 times
differentiable). We have f(x, y) = 0 for x^2 + y^2 = 1 and f(x, y) > 0 for
x^2 + y^2 < 1. State and provide a plausible argument for a plausible result
concerning the number of maxima, minima and saddle points (x, y) for f
with x^2 + y^2 < 1.

I find the plausible argument just used very convincing but it is not clear
how we would go about converting it into an argument from first principles
(in effect, from the fundamental axiom of analysis). Here are some of the
problems we must face.
(1) Do contour lines actually exist (that is do the points (x, y) with
f(x, y) = h actually lie on nice curves)8? We shall answer this question
locally by the implicit function theorem (Theorem 13.2.4) and our discussion
of the solution of differential equations in Section 12.3 will shed some light
on the global problem.
8 The reader will note that though we have used contour lines as a heuristic
tool we have not used them in proofs. Note that, in specific cases, we do not
need a general theorem to tell us that contour lines exist. For example, the
contour lines of f(x, y) = a^{−2} x^2 + b^{−2} y^2 are given parametrically by
(x, y) = (a h^{1/2} cos θ, b h^{1/2} sin θ) for h ≥ 0.

(2) 'The largest body of water that a swimmer can cover without going
on to dry land' is a vivid but not a mathematical expression. In later work
this problem is resolved by giving a formal definition of a connected set.
(3) Implicit in our argument is the idea that a loop divides a sphere
into two parts. A result called the Jordan curve theorem gives the formal
statement of this idea but the proof turns out to be unexpectedly hard.
Another, less important, problem is to show that the hypothesis that
there are only a 'finite number S of summits, B of bottoms and P of passes'
applies to an interesting variety of cases. It is certainly not the case that a
function f : R → R will always have only a finite number of maxima in a
closed bounded interval. In the same way, it is not true that a moon need
have only a finite number of summits.

Exercise 7.3.24. Reread Example 7.1.5. Define f : R → R by
\[
f(x) = (\cos(1/x) - 1) \exp(-1/x^2) \quad \text{if } x \neq 0, \qquad f(0) = 0.
\]
Show that f is infinitely differentiable everywhere and that f has an infinite
number of distinct strict local maxima in the interval [−1, 1].
(Exercise K.42 belongs to the same circle of ideas.)

The answer, once again, is to develop a suitable notion of genericity but
we shall not do so here.
Some will say that there is no need to answer these questions since
the plausible argument which establishes Plausible Statement 7.3.21 is in
some sense 'obviously correct'. I would reply that the reason for attacking
these questions is their intrinsic interest. Plausible Statement 7.3.21 and the
accompanying discussion are the occasion for us to ask these questions, not
the reason for trying to answer them. I would add that we cannot claim to
understand Maxwell's result fully unless we can see either how it generalises
to higher dimensions or why it does not.
Students often feel that multidimensional calculus is just a question of
generalising results from one dimension to many. Maxwell's result shows that
the change from one to many dimensions introduces genuinely new phenomena,
whose existence cannot be guessed from a one dimensional perspective.
Chapter 8

The Riemann integral

8.1 Where is the problem?
Everybody knows what area is, but then everybody knows what honey tastes
like. But does honey taste the same to you as it does to me? Perhaps the
question is unanswerable but, for many practical purposes, it is sufficient
that we agree on what we call honey. In the same way, it is important that,
when two mathematicians talk about area, they should agree on the answers
to the following questions:-
(1) Which sets E actually have area?
(2) When a set E has area, what is that area?
One of the discoveries of 20th century mathematics is that decisions on (1)
and (2) are linked in rather subtle ways to the question:-
(3) What properties should area have?
As an indication of the ideas involved, consider the following desirable
properties for area.
(a) Every bounded set E in R^2 has an area |E| with |E| ≥ 0.
(b) Suppose that E is a bounded set in R^2. If E is congruent to F (that
is E can be obtained from F by translation and rotation), then |E| = |F|.
(c) Any square E of side a has area |E| = a^2.
(d) If E_1, E_2, . . . are disjoint bounded sets in R^2 whose union
$F = \bigcup_{j=1}^{\infty} E_j$ is also bounded, then
$|F| = \sum_{j=1}^{\infty} |E_j|$ (so 'the whole is equal to the sum of
its parts').

Exercise 8.1.1. Suppose that conditions (a) to (d) all hold.
(i) Let A be a bounded set in R^2 and B ⊆ A. By writing A = B ∪ (A \ B)
and using condition (d) together with other conditions, show that |A| ≥ |B|.
(ii) By using (i) and condition (c), show that, if A is a non-empty bounded
open set in R^2, then |A| > 0.


We now show that assuming all of conditions (a) to (d) leads to a con-
tradiction. We start with an easy remark.

Exercise 8.1.2. If 0 ≤ x, y < 1, write x ∼ y whenever x − y ∈ Q. Show
that if x, y, z ∈ [0, 1) we have
(i) x ∼ x,
(ii) x ∼ y implies y ∼ x,
(iii) x ∼ y and y ∼ z together imply x ∼ z.
(In other words, ∼ is an equivalence relation.)
Write

[x] = {y ∈ [0, 1) : y ∼ x}.

(In other words, write [x] for the equivalence class of x.) By quoting the
appropriate theorem or direct proof, show that
(iv) $\bigcup_{x \in [0,1)} [x] = [0, 1)$,
(v) if x, y ∈ [0, 1), then either [x] = [y] or [x] ∩ [y] = ∅.

We now consider a set A which contains exactly one element from each
equivalence class.

Exercise 8.1.3. (This is easy.) Show that if t ∈ [0, 1) then the equation

t≡a+q mod 1

has exactly one solution with a ∈ A, q rational and q ∈ [0, 1).
[Here t ≡ x + q mod 1 means t − x − q ∈ Z.]

We are now in a position to produce our example. It will be easiest to
work in C identified with R^2 in the usual way and to define

E = {r exp 2πia : 1 > r > 0, a ∈ A}.

Since Q is countable, it follows that its subset Q ∩ [0, 1) is countable and we
can write

Q ∩ [0, 1) = {q_j : j ≥ 1}

with q_1, q_2, . . . all distinct. Set

E_j = {r exp 2πi(a + q_j) : 1 > r > 0, a ∈ A}.

Exercise 8.1.4. Suppose that conditions (a) to (d) all hold.
(i) Describe the geometric relation of E and E_j. Deduce that |E| = |E_j|.
(ii) Use Exercise 8.1.3 to show that E_j ∩ E_k = ∅ if j ≠ k.
(iii) Use Exercise 8.1.3 to show that
\[
\bigcup_{j=1}^{\infty} E_j = U
\]
where U = {z : 0 < |z| < 1}.
(iv) Deduce that
\[
\sum_{j=1}^{\infty} |E_j| = |U|.
\]
Show from Exercise 8.1.1 (ii) that 0 < |U|.
(v) Show that (i) and (iv) lead to a contradiction if |E| = 0 and if |E| > 0.
Thus (i) and (iv) lead to a contradiction whatever we assume. It follows that
conditions (a) to (d) cannot all hold simultaneously.

Exercise 8.1.5. Define E and E_q as subsets of R^2 without using complex
numbers.

The example just given is due to Vitali. It might be hoped that the
problems raised by Vitali's example are due to the fact that condition (d)
involves infinite sums. This hope is dashed by the following theorem of
Banach and Tarski.

Theorem 8.1.6. The unit ball in R^3 can be decomposed into a finite number
of pieces which may be reassembled, using only translation and rotation, to
form 2 disjoint copies of the unit ball.

Exercise 8.1.7. Use Theorem 8.1.6 to show that the following four conditions
are not mutually consistent.
(a) Every bounded set E in R^3 has a volume |E| with |E| ≥ 0.
(b) Suppose that E is a bounded set in R^3. If E is congruent to F (that
is E can be obtained from F by translation and rotation), then |E| = |F|.
(c) Any cube E of side a has volume |E| = a^3.
(d) If E_1 and E_2 are disjoint bounded sets in R^3, then
|E_1 ∪ E_2| = |E_1| + |E_2|.

The proof of Theorem 8.1.6, which is a lineal descendant of Vitali's
example, is too long to be given here. It is beautifully and simply explained in
a book [46] devoted entirely to ideas generated by the result of Banach and
Tarski1.
The examples of Vitali and Banach and Tarski show that if we want a
well behaved notion of area we will have to say that only certain sets have
area. Since the notion of an integral is closely linked to that of area ('the
integral is the area under the curve'), this means that we must accept that
only certain functions have integrals. It also means that we must make
sure that our definition does not allow paradoxes of the type discussed here.


8.2 Riemann integration
In this section we introduce a notion of the integral due to Riemann. For
the moment we only attempt to define our integral for bounded functions on
bounded intervals.
Let f : [a, b] → R be a function such that there exists a K with |f(x)| ≤ K
for all x ∈ [a, b]. [To see the connection with 'the area under the curve' it is
helpful to suppose initially that 0 ≤ f(x) ≤ K. However, all the definitions
and proofs work more generally for −K ≤ f(x) ≤ K. The point is discussed
in Exercise K.114.] A dissection (also called a partition) D of [a, b] is a finite
subset of [a, b] containing the end points a and b. By convention, we write

D = {x_0, x_1, . . . , x_n} with a = x_0 ≤ x_1 ≤ x_2 ≤ · · · ≤ x_n = b.

We define the upper sum and lower sum associated with D by
\[
S(f, D) = \sum_{j=1}^{n} (x_j - x_{j-1}) \sup_{x \in [x_{j-1}, x_j]} f(x),
\]
\[
s(f, D) = \sum_{j=1}^{n} (x_j - x_{j-1}) \inf_{x \in [x_{j-1}, x_j]} f(x).
\]
[Observe that, if the integral $\int_a^b f(t)\,dt$ exists, then the upper sum ought to
provide an upper bound and the lower sum a lower bound for that integral.]
1 In more advanced work it is observed that our discussion depends on a
principle called the 'axiom of choice'. It is legitimate to doubt this principle.
However, anyone who doubts the axiom of choice but believes that every set has
volume resembles someone who disbelieves in Father Christmas but believes in
flying reindeer.

Exercise 8.2.1. (i) Suppose that a ≤ c ≤ b. If D = {a, b} and D' =
{a, c, b}, show that

S(f, D) ≥ S(f, D') ≥ s(f, D') ≥ s(f, D).

(ii) Let c ≠ a, b. Show by examples that, in (i), we can have either
S(f, D) = S(f, D') or S(f, D) > S(f, D').
(iii) Suppose that a ≤ c ≤ b and D is a dissection. Show that, if D' =
D ∪ {c}, then

S(f, D) ≥ S(f, D') ≥ s(f, D') ≥ s(f, D).

(iv) Suppose that D and D' are dissections with D' ⊇ D. Show, using
(iii), or otherwise, that

S(f, D) ≥ S(f, D') ≥ s(f, D') ≥ s(f, D).

The result of Exercise 8.2.1 (iv) is so easy that it hardly requires proof.
None the less it is so important that we restate it as a lemma.

Lemma 8.2.2. If D and D' are dissections with D' ⊇ D then

S(f, D) ≥ S(f, D') ≥ s(f, D') ≥ s(f, D).

The next lemma is again hardly more than an observation but it is the
key to the proper treatment of the integral.

Lemma 8.2.3 (Key integration property). If f : [a, b] → R is bounded
and D1 and D2 are two dissections, then

S(f, D1 ) ≥ S(f, D1 ∪ D2 ) ≥ s(f, D1 ∪ D2 ) ≥ s(f, D2 ).

The inequalities tell us that, whatever dissection you pick and whatever
dissection I pick, your lower sum cannot exceed my upper sum. There is no
way we can put a quart into a pint pot2 and the Banach-Tarski phenomenon
is avoided.
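
A small numerical check of the chain of inequalities in Lemma 8.2.3 may be reassuring. The sketch below assumes, purely for illustration, the function f(x) = x^2 on [0, 1] and two hypothetical dissections D1 and D2. Since this f is increasing, the supremum and infimum over each subinterval are attained at the right and left end points, so the sums can be computed exactly with rational arithmetic; the helper functions would not be correct for a general bounded f.

```python
from fractions import Fraction as Fr

def upper_sum(f, D):
    """S(f, D) for an INCREASING f: the sup on [u, v] is f(v)."""
    pts = sorted(D)
    return sum((v - u) * f(v) for u, v in zip(pts, pts[1:]))

def lower_sum(f, D):
    """s(f, D) for an INCREASING f: the inf on [u, v] is f(u)."""
    pts = sorted(D)
    return sum((v - u) * f(u) for u, v in zip(pts, pts[1:]))

def f(x):
    return x * x

D1 = {Fr(0), Fr(1, 2), Fr(1)}              # 'my' dissection
D2 = {Fr(0), Fr(1, 3), Fr(2, 3), Fr(1)}    # 'your' dissection
D = D1 | D2                                # the common refinement

# S(f, D1) >= S(f, D1 u D2) >= s(f, D1 u D2) >= s(f, D2)
chain = [upper_sum(f, D1), upper_sum(f, D), lower_sum(f, D), lower_sum(f, D2)]
print(chain)
```

Note that neither D1 nor D2 contains the other; it is the passage to the common refinement that allows an upper sum for one dissection to be compared with a lower sum for the other.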
Since S(f, D) ≥ −(b − a)K for all dissections D we can define the upper
integral as I^*(f) = inf_D S(f, D). We define the lower integral similarly as
I_*(f) = sup_D s(f, D). The inequalities tell us that these concepts behave
well.

Lemma 8.2.4. If f : [a, b] → R is bounded, then I^*(f) ≥ I_*(f).
[Observe that, if the integral $\int_a^b f(t)\,dt$ exists, then the upper integral
ought to provide an upper bound and the lower integral a lower bound for
that integral.]
2 Or a litre into a half litre bottle. Any reader tempted to interpret such
pictures literally is directed to part (iv) of Exercise K.171.

If I^*(f) = I_*(f), we say that f is Riemann integrable and we write
\[
\int_a^b f(x)\,dx = I^*(f).
\]
We write R[a, b] or sometimes just R for the set of Riemann integrable
functions on [a, b].

Exercise 8.2.5. If k ∈ R show that the constant function given by f(t) = k
for all t is Riemann integrable and
\[
\int_a^b k\,dx = k(b - a).
\]

The following lemma provides a convenient criterion for Riemann
integrability.

Lemma 8.2.6. (i) A bounded function f : [a, b] → R is Riemann integrable
if and only if, given any ε > 0, we can find a dissection D with

S(f, D) − s(f, D) < ε.

(ii) A bounded function f : [a, b] → R is Riemann integrable with integral
I if and only if, given any ε > 0, we can find a dissection D with

S(f, D) − s(f, D) < ε and |S(f, D) − I| ≤ ε.

Proof. (i) We need to prove necessity and sufficiency. To prove necessity,
suppose that f is Riemann integrable with Riemann integral I (so that I =
I^*(f) = I_*(f)). If ε > 0 then, by the definition of I^*(f), we can find a
dissection D_1 such that

I + ε/2 > S(f, D_1) ≥ I.

Similarly, by the definition of I_*(f), we can find a dissection D_2 such that

I ≥ s(f, D_2) > I − ε/2.

Setting D = D_1 ∪ D_2 and using Lemmas 8.2.2 and 8.2.3, we have

I + ε/2 > S(f, D_1) ≥ S(f, D) ≥ s(f, D) ≥ s(f, D_2) > I − ε/2,

so S(f, D) − s(f, D) < ε as required.
To prove sufficiency suppose that, given any ε > 0, we can find a dissection
D with

S(f, D) − s(f, D) < ε.

Using the definition of the upper and lower integrals I^*(f) and I_*(f) together
with the fact that I^*(f) ≥ I_*(f) (a consequence of our key Lemma 8.2.3), we
already know that

S(f, D) ≥ I^*(f) ≥ I_*(f) ≥ s(f, D),

so we may conclude that ε ≥ I^*(f) − I_*(f) ≥ 0. Since ε is arbitrary, we have
I^*(f) − I_*(f) = 0 so I^*(f) = I_*(f) as required.
(ii) Left to the reader.
Exercise 8.2.7. Prove part (ii) of Lemma 8.2.6.
Many students are tempted to use Lemma 8.2.6 (ii) as the definition of
the Riemann integral. The reader should reflect that, without the key inequality
of Lemma 8.2.3, it is not even clear that such a definition gives a unique value
for I. (This is only the first of a series of nasty problems that arise if we
attempt to develop the theory without first proving that inequality, so I
strongly advise the reader not to take this path.) We give another equivalent
definition of the Riemann integral in Exercise K.113.
It is reasonably easy to show that the Riemann integral has the properties
which are normally assumed in elementary calculus.
Lemma 8.2.8. If f, g : [a, b] → R are Riemann integrable, then so is f + g
and
\[
\int_a^b f(x) + g(x)\,dx = \int_a^b f(x)\,dx + \int_a^b g(x)\,dx.
\]

Proof. Let us write $I(f) = \int_a^b f(x)\,dx$ and $I(g) = \int_a^b g(x)\,dx$.
Suppose ε > 0 is given. By the definition of the Riemann integral, we can find
dissections D_1 and D_2 of [a, b] such that

I(f) + ε/4 > S(f, D_1) ≥ I(f) ≥ s(f, D_1) > I(f) − ε/4 and
I(g) + ε/4 > S(g, D_2) ≥ I(g) ≥ s(g, D_2) > I(g) − ε/4.

If we set D = D_1 ∪ D_2, then our key inequality (Lemma 8.2.3) and the
definition of I^*(f) tell us that

I(f) + ε/4 > S(f, D_1) ≥ S(f, D) ≥ I(f).

Using this and corresponding results, we obtain

I(f) + ε/4 > S(f, D) ≥ I(f) ≥ s(f, D) > I(f) − ε/4 and
I(g) + ε/4 > S(g, D) ≥ I(g) ≥ s(g, D) > I(g) − ε/4.

Now
\[
\begin{aligned}
S(f + g, D) &= \sum_{j=1}^{n} (x_j - x_{j-1}) \sup_{x \in [x_{j-1}, x_j]} (f(x) + g(x)) \\
&\le \sum_{j=1}^{n} (x_j - x_{j-1}) \Bigl( \sup_{x \in [x_{j-1}, x_j]} f(x) + \sup_{x \in [x_{j-1}, x_j]} g(x) \Bigr) \\
&= S(f, D) + S(g, D)
\end{aligned}
\]
and similarly s(f + g, D) ≥ s(f, D) + s(g, D). Thus, using the final inequalities
of the last paragraph,

I(f) + I(g) + ε/2 > S(f, D) + S(g, D) ≥ S(f + g, D)
≥ s(f + g, D) ≥ s(f, D) + s(g, D) > I(f) + I(g) − ε/2.

Thus S(f + g, D) − s(f + g, D) < ε and |S(f + g, D) − (I(f) + I(g))| < ε,
so the result follows from Lemma 8.2.6 (ii).

Exercise 8.2.9. How would you explain (NB explain, not prove) to someone
who had not done calculus but had a good grasp of geometry why the result
\[
\int_a^b f(x) + g(x)\,dx = \int_a^b f(x)\,dx + \int_a^b g(x)\,dx
\]
is true for well behaved functions? (I hope that you will agree with me that,
obvious as this result now seems to us, the first mathematicians to grasp this
fact had genuine insight.)

Exercise 8.2.10. (i) If f : [a, b] → R is bounded and D is a dissection of
[a, b], show that S(−f, D) = −s(f, D).
(ii) If f : [a, b] → R is Riemann integrable, show that −f is Riemann
integrable and
\[
\int_a^b (-f(x))\,dx = -\int_a^b f(x)\,dx.
\]
(iii) If λ ∈ R, λ ≥ 0, f : [a, b] → R is bounded and D is a dissection of
[a, b], show that S(λf, D) = λS(f, D).
(iv) If λ ∈ R, λ ≥ 0 and f : [a, b] → R is Riemann integrable, show that
λf is Riemann integrable and
\[
\int_a^b \lambda f(x)\,dx = \lambda \int_a^b f(x)\,dx.
\]
(v) If λ ∈ R and f : [a, b] → R is Riemann integrable, show that λf is
Riemann integrable and
\[
\int_a^b \lambda f(x)\,dx = \lambda \int_a^b f(x)\,dx.
\]
Combining Lemma 8.2.8 with Exercise 8.2.10, we get the following result.

Lemma 8.2.11. If λ, µ ∈ R and f, g : [a, b] → R are Riemann integrable,
then λf + µg is Riemann integrable and
\[
\int_a^b \lambda f(x) + \mu g(x)\,dx = \lambda \int_a^b f(x)\,dx + \mu \int_a^b g(x)\,dx.
\]

In the language of linear algebra, R[a, b] (the set of Riemann integrable
functions on [a, b]) is a vector space and the integral is a linear functional
(i.e. a linear map from R[a, b] to R).

Exercise 8.2.12. (i) If E is a subset of [a, b], we define the indicator
function I_E : [a, b] → R by I_E(x) = 1 if x ∈ E, I_E(x) = 0 otherwise. Show
directly from the definition that, if a ≤ c ≤ d ≤ b, then I_{[c,d]} is Riemann
integrable and
\[
\int_a^b I_{[c,d]}(x)\,dx = d - c.
\]
(ii) If a ≤ c ≤ d ≤ b, we say that the intervals (c, d), (c, d], [c, d), [c, d] all
have length d − c. If I(j) is a subinterval of [a, b] of length |I(j)| and λ_j ∈ R
show that the step function $\sum_{j=1}^{n} \lambda_j I_{I(j)}$ is Riemann integrable and
\[
\int_a^b \sum_{j=1}^{n} \lambda_j I_{I(j)}\,dx = \sum_{j=1}^{n} \lambda_j |I(j)|.
\]


Exercise 8.2.13. (i) If f, g : [a, b] → R are bounded functions with f(t) ≥
g(t) for all t ∈ [a, b] and D is a dissection of [a, b], show that S(f, D) ≥
S(g, D).
(ii) If f, g : [a, b] → R are Riemann integrable functions with f(t) ≥ g(t)
for all t ∈ [a, b], show that
\[
\int_a^b f(x)\,dx \ge \int_a^b g(x)\,dx.
\]
(iii) Suppose that f : [a, b] → R is a Riemann integrable function, K ∈ R
and f(t) ≥ K for all t ∈ [a, b]. Show that
\[
\int_a^b f(x)\,dx \ge K(b - a).
\]
State and prove a similar result involving upper bounds.
(iv) Suppose that f : [a, b] → R is a Riemann integrable function, K ∈ R,
K ≥ 0 and |f(t)| ≤ K for all t ∈ [a, b]. Show that
\[
\left| \int_a^b f(x)\,dx \right| \le K(b - a).
\]
Although part (iv) is weaker than part (iii), it generalises more easily and
we shall use it frequently in the form
|integral| ≤ length × sup.
Exercise 8.2.14. (i) Let M be a positive real number and f : [a, b] → R
a function with |f(t)| ≤ M for all t ∈ [a, b]. Show that |f(s)^2 − f(t)^2| ≤
2M |f(s) − f(t)| and deduce that
\[
\sup_{x \in [a,b]} f(x)^2 - \inf_{x \in [a,b]} f(x)^2 \le 2M \Bigl( \sup_{x \in [a,b]} f(x) - \inf_{x \in [a,b]} f(x) \Bigr).
\]
(ii) Let f : [a, b] → R be a bounded function with |f(t)| ≤ M for all
t ∈ [a, b]. Show that, if D is a dissection of [a, b],
S(f^2, D) − s(f^2, D) ≤ 2M (S(f, D) − s(f, D)).
Deduce that, if f is Riemann integrable, so is f^2.
(iii) By using the formula fg = (1/4)((f + g)^2 − (f − g)^2), or otherwise,
deduce that if f, g : [a, b] → R are Riemann integrable, so is fg (the
product of f and g). (Compare Exercise 1.2.6.)
Exercise 8.2.15. (i) Consider a function f : [a, b] → R. We define f_+, f_− :
[a, b] → R by
f_+(t) = f(t), f_−(t) = 0 if f(t) ≥ 0,
f_+(t) = 0, f_−(t) = −f(t) if f(t) ≤ 0.
Check that f(t) = f_+(t) − f_−(t) and |f(t)| = f_+(t) + f_−(t).
(ii) If f : [a, b] → R is bounded and D is a dissection of [a, b], show that
S(f, D) − s(f, D) ≥ S(f_+, D) − s(f_+, D) ≥ 0.
(iii) If f : [a, b] → R is Riemann integrable, show that f_+ and f_− are
Riemann integrable.
(iv) If f : [a, b] → R is Riemann integrable, show that |f| is Riemann
integrable and
\[
\int_a^b |f(x)|\,dx \ge \left| \int_a^b f(x)\,dx \right|.
\]

Exercise 8.2.16. In each of Exercises 8.2.10, 8.2.14 and 8.2.15 we used
a roundabout route to our result. For example, in Exercise 8.2.14 we first
proved that f^2 is Riemann integrable whenever f is and then used this
result to prove that fg is Riemann integrable whenever f and g are. It is
natural to ask whether we can give a direct proof in each case. The reader
should try to do so. (In my opinion, the direct proofs are not much harder,
though they do require more care in writing out.)
Exercise 8.2.17. (i) Suppose that a ≤ c ≤ b and that f : [a, b] → R is a
bounded function. Consider a dissection D_1 of [a, c] given by
D_1 = {x_0, x_1, . . . , x_m} with a = x_0 ≤ x_1 ≤ x_2 ≤ · · · ≤ x_m = c,
and a dissection D_2 of [c, b] given by
D_2 = {x_{m+1}, x_{m+2}, . . . , x_n} with c = x_{m+1} ≤ x_{m+2} ≤ x_{m+3} ≤ · · · ≤ x_n = b.
If D is the dissection of [a, b] given by
D = {x_0, x_1, . . . , x_n},
show that S(f, D) = S(f|_{[a,c]}, D_1) + S(f|_{[c,b]}, D_2). (Here f|_{[a,c]} means the
restriction of f to [a, c].)
(ii) Show that f ∈ R[a, b] if and only if f|_{[a,c]} ∈ R[a, c] and f|_{[c,b]} ∈ R[c, b].
Show also that, if f ∈ R[a, b], then
\[
\int_a^b f(x)\,dx = \int_a^c f|_{[a,c]}(x)\,dx + \int_c^b f|_{[c,b]}(x)\,dx.
\]
In a very slightly less precise and very much more usual notation we write
\[
\int_a^b f(x)\,dx = \int_a^c f(x)\,dx + \int_c^b f(x)\,dx.
\]

There is a standard convention that we shall follow which says that, if
b ≥ a and f is Riemann integrable on [a, b], we define
\[
\int_b^a f(x)\,dx = -\int_a^b f(x)\,dx.
\]

Exercise 8.2.18. Suppose β ≥ α and f is Riemann integrable on [α, β].
Show that if a, b, c ∈ [α, β] then
\[
\int_a^b f(x)\,dx = \int_a^c f(x)\,dx + \int_c^b f(x)\,dx.
\]

[Note that a, b and c may occur in any of six orders.]
However, this convention must be used with caution.
Exercise 8.2.19. Suppose that b ≥ a, λ, µ ∈ R, and f and g are Riemann
integrable. Which of the following statements are always true and which are
not? Give a proof or counterexample. If the statement is not always true,
find an appropriate correction and prove it.
(i) $\int_b^a \lambda f(x) + \mu g(x)\,dx = \lambda \int_b^a f(x)\,dx + \mu \int_b^a g(x)\,dx$.
(ii) If f(x) ≥ g(x) for all x ∈ [a, b], then $\int_b^a f(x)\,dx \ge \int_b^a g(x)\,dx$.

Riemann was unable to show that all continuous functions were integrable
(we have a key concept that Riemann did not and we shall be able to fill
this gap in the next section). He did, however, have the result of the next
exercise. (Note that an increasing function need not be continuous. Consider
the Heaviside function H : R → R given by H(x) = 0 for x < 0, H(x) = 1
for x ≥ 0.)
Exercise 8.2.20. Suppose f : [a, b] → R is increasing. Let N be a strictly
positive integer and consider the dissection
D = {x_0, x_1, . . . , x_N} with x_j = a + j(b − a)/N.
Show that
\[
S(f, D) = \sum_{j=1}^{N} f(x_j)(b - a)/N,
\]
find s(f, D) and deduce that
S(f, D) − s(f, D) = (f(b) − f(a))(b − a)/N.
Conclude that f is Riemann integrable.
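
The telescoping in Exercise 8.2.20 can be observed directly. The sketch below assumes, purely as an illustration, an increasing function on [0, 1] with a jump at 1/2 (our own choice, in the spirit of the Heaviside function) and computes the upper and lower sums on the uniform dissection with exact rational arithmetic. The function sums_on_uniform_dissection is hypothetical and relies on f being increasing.

```python
from fractions import Fraction as Fr

def f(x):
    """An increasing function with a jump at 1/2 (illustrative choice)."""
    return x + (1 if x >= Fr(1, 2) else 0)

def sums_on_uniform_dissection(f, a, b, N):
    """Upper and lower sums of an INCREASING f on x_j = a + j(b - a)/N."""
    xs = [a + j * (b - a) / N for j in range(N + 1)]
    width = (b - a) / N
    S = sum(f(xs[j]) * width for j in range(1, N + 1))      # sup is f(x_j)
    s = sum(f(xs[j - 1]) * width for j in range(1, N + 1))  # inf is f(x_{j-1})
    return S, s

a, b = Fr(0), Fr(1)
gaps = {}
for N in (10, 100, 1000):
    S, s = sums_on_uniform_dissection(f, a, b, N)
    gaps[N] = S - s          # should equal (f(b) - f(a))(b - a)/N

print(gaps)
```

The gap between the upper and lower sums shrinks like 1/N despite the discontinuity, so the criterion of Lemma 8.2.6 (i) can be met for any given tolerance by taking N large enough.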

Using Lemma 8.2.11 this gives the following result.
Lemma 8.2.21. If f : [a, b] → R can be written as f = f_1 − f_2 with f_1, f_2 :
[a, b] → R increasing, then f is Riemann integrable.
At first sight, Lemma 8.2.21 looks rather uninteresting but, in fact, it
covers most of the functions we normally meet.
Exercise 8.2.22. (i) If
f_1(t) = 0, f_2(t) = −t^2 if t < 0,
f_1(t) = t^2, f_2(t) = 0 if t ≥ 0,
show that f_1 and f_2 are increasing functions with t^2 = f_1(t) − f_2(t).
(ii) Show that, if f : [a, b] → R has only a finite number of local maxima
and minima, then it can be written in the form f = f_1 − f_2 with f_1, f_2 :
[a, b] → R increasing.
Functions which are the difference of two increasing functions are discussed
in Exercise K.158, Exercises K.162 to K.166 and more generally in
the next chapter as ‘functions of bounded variation’. We conclude this section
with an important example of Dirichlet.
Exercise 8.2.23. If $f : [0, 1] \to \mathbb{R}$ is given by
\[ f(x) = 1 \text{ when $x$ is rational}, \qquad f(x) = 0 \text{ when $x$ is irrational}, \]
show that, whenever $D$ is a dissection of $[0, 1]$, we have $S(f, D) = 1$ and
$s(f, D) = 0$. Conclude that $f$ is not Riemann integrable.
Exercise 8.2.24. (i) If $f$ is as in Exercise 8.2.23, show that
\[ \frac{1}{N}\sum_{r=1}^{N} f(r/N) = 1 \quad\text{and so}\quad \frac{1}{N}\sum_{r=1}^{N} f(r/N) \to 1 \text{ as } N \to \infty. \]
(ii) Let $g : [0, 1] \to \mathbb{R}$ be given by
\[ g(r/2^n) = 1 \text{ when } 1 \leq r \leq 2^n - 1,\ n \geq 1, \text{ and $r$ and $n$ are integers}, \]
\[ g(s/3^n) = -1 \text{ when } 1 \leq s \leq 3^n - 1,\ n \geq 1, \text{ and $s$ and $n$ are integers}, \]
\[ g(x) = 0 \text{ otherwise}. \]
Discuss the behaviour of
\[ \frac{1}{N}\sum_{r=1}^{N} g(r/N) \]
as $N \to \infty$ in as much detail as you consider desirable.
182 A COMPANION TO ANALYSIS

8.3 Integrals of continuous functions
The key to showing that continuous functions are integrable, which we have
and Riemann did not, is the notion of uniform continuity and the theorem
(Theorem 4.5.5) which tells us that a continuous function on a closed
bounded subset of $\mathbb{R}^n$, and so, in particular, on a closed interval, is uniformly
continuous³.

Theorem 8.3.1. Any continuous function $f : [a, b] \to \mathbb{R}$ is Riemann integrable.

Proof. If $b = a$ the result is obvious, so suppose $b > a$. We shall show that $f$
is Riemann integrable by using the standard criterion given in Lemma 8.2.6.
To this end, suppose that $\epsilon > 0$ is given. Since a continuous function on a
closed bounded interval is uniformly continuous, we can find a $\delta > 0$ such
that
\[ |f(x) - f(y)| \leq \frac{\epsilon}{b - a} \quad\text{whenever } x, y \in [a, b] \text{ and } |x - y| < \delta. \]
Choose an integer $N > (b - a)/\delta$ and consider the dissection
\[ D = \{x_0, x_1, \dots, x_N\} \quad\text{with } x_j = a + j(b - a)/N. \]
If $x, y \in [x_j, x_{j+1}]$, then $|x - y| \leq (b - a)/N < \delta$ and so
\[ |f(x) - f(y)| \leq \frac{\epsilon}{b - a}. \]
It follows that
\[ \sup_{x \in [x_j, x_{j+1}]} f(x) - \inf_{x \in [x_j, x_{j+1}]} f(x) \leq \frac{\epsilon}{b - a} \]
for all $0 \leq j \leq N - 1$ and so
\begin{align*}
S(f, D) - s(f, D) &= \sum_{j=0}^{N-1} (x_{j+1} - x_j)\Bigl(\sup_{x \in [x_j, x_{j+1}]} f(x) - \inf_{x \in [x_j, x_{j+1}]} f(x)\Bigr) \\
&\leq \sum_{j=0}^{N-1} \frac{b - a}{N}\,\frac{\epsilon}{b - a} = \epsilon,
\end{align*}
as required.
³ This is a natural way to proceed but Exercise K.118 shows that it is not the only one.

Slight extensions of this result are given in Exercise I.11. In Exercise K.122
we consider a rather different way of looking at integrals of continuous
functions.
Although there are many functions which are integrable besides the continuous
functions, there are various theorems on integration which demand
that the functions involved be continuous or even better behaved. Most of
the results of this section have this character.

Lemma 8.3.2. If $f : [a, b] \to \mathbb{R}$ is continuous, $f(t) \geq 0$ for all $t \in [a, b]$ and
\[ \int_a^b f(t)\,dt = 0, \]
it follows that $f(t) = 0$ for all $t \in [a, b]$.

Proof. If $f$ is a positive continuous function which is not identically zero, then
we can find an $x \in [a, b]$ with $f(x) > 0$. Setting $\epsilon = f(x)/2$, the continuity
of $f$ tells us that there exists a $\delta > 0$ such that $|f(x) - f(y)| < \epsilon$ whenever
$|x - y| \leq \delta$ and $y \in [a, b]$. We observe that
\[ f(y) \geq f(x) - |f(x) - f(y)| > f(x) - \epsilon = f(x)/2 \]
whenever $|x - y| \leq \delta$ and $y \in [a, b]$. If we define $h : [a, b] \to \mathbb{R}$ by $h(y) =
f(x)/2$ whenever $|x - y| \leq \delta$ and $y \in [a, b]$ and $h(y) = 0$ otherwise, then
$f(t) \geq h(t)$ for all $t \in [a, b]$ and so
\[ \int_a^b f(t)\,dt \geq \int_a^b h(t)\,dt > 0. \]




Exercise 8.3.3. (i) Let $a \leq c \leq b$. Give an example of a Riemann integrable
function $f : [a, b] \to \mathbb{R}$ such that $f(t) \geq 0$ for all $t \in [a, b]$ and
\[ \int_a^b f(t)\,dt = 0, \]
but $f(c) \neq 0$.
(ii) If $f : [a, b] \to \mathbb{R}$ is Riemann integrable, $f(t) \geq 0$ for all $t \in [a, b]$ and
\[ \int_a^b f(t)\,dt = 0, \]
show that $f(t) = 0$ at every point $t \in [a, b]$ where $f$ is continuous.

(iii) We say that $f : [a, b] \to \mathbb{R}$ is right continuous at $t \in [a, b]$ if
$f(s) \to f(t)$ as $s \to t$ through values of $s$ with $b \geq s > t$. Suppose $f$ is Riemann
integrable and is right continuous at every point $t \in [a, b]$. Show that if
$f(t) \geq 0$ for all $t \in [a, b]$ and
\[ \int_a^b f(t)\,dt = 0, \]
it follows that $f(t) = 0$ for all $t \in [a, b]$ with at most one exception. Give an
example to show that this exception may occur.

The reader should have little difficulty in proving the following useful
related results.

Exercise 8.3.4. (i) If $f : [a, b] \to \mathbb{R}$ is continuous and
\[ \int_a^b f(t)g(t)\,dt = 0, \]
whenever $g : [a, b] \to \mathbb{R}$ is continuous, show that $f(t) = 0$ for all $t \in [a, b]$.
(ii) If $f : [a, b] \to \mathbb{R}$ is continuous and
\[ \int_a^b f(t)g(t)\,dt = 0, \]
whenever $g : [a, b] \to \mathbb{R}$ is continuous and $g(a) = g(b) = 0$, show that $f(t) = 0$
for all $t \in [a, b]$. (We prove a slightly stronger result in Lemma 8.4.7.)

We now prove the fundamental theorem of the calculus which links the
processes of integration and differentiation. Since the result is an important
one it is worth listing the properties of the integral that we use in the proof.

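Lemma 8.3.5. Suppose $\lambda, \mu \in \mathbb{R}$, $f, g : [\alpha, \beta] \to \mathbb{R}$ are Riemann integrable
and $a, b, c \in [\alpha, \beta]$. The following results hold.
(i) $\int_a^b 1\,dt = b - a$.
(ii) $\int_a^b \lambda f(t) + \mu g(t)\,dt = \lambda \int_a^b f(t)\,dt + \mu \int_a^b g(t)\,dt$.
(iii) $\int_a^b f(t)\,dt + \int_b^c f(t)\,dt = \int_a^c f(t)\,dt$.
(iv) $\left|\int_a^b f(t)\,dt\right| \leq |b - a| \sup_{0 \leq \theta \leq 1} |f(a + \theta(b - a))|$.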

The reader should run through these results in her mind and make sure
that she can prove them (note that a, b and c can be in any order).
Theorem 8.3.6. (The fundamental theorem of the calculus.) Suppose
that $f : (a, b) \to \mathbb{R}$ is a continuous function and that $u \in (a, b)$. If we
set
\[ F(t) = \int_u^t f(x)\,dx, \]
then $F$ is differentiable on $(a, b)$ and $F'(t) = f(t)$ for all $t \in (a, b)$.
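Proof. Observe that, if $t + h \in (a, b)$ and $h \neq 0$ then
\begin{align*}
\left|\frac{F(t + h) - F(t)}{h} - f(t)\right| &= \frac{1}{|h|}\left|\int_u^{t+h} f(x)\,dx - \int_u^t f(x)\,dx - hf(t)\right| \\
&= \frac{1}{|h|}\left|\int_t^{t+h} f(x)\,dx - \int_t^{t+h} f(t)\,dx\right| \\
&= \frac{1}{|h|}\left|\int_t^{t+h} (f(x) - f(t))\,dx\right| \\
&\leq \sup_{0 \leq \theta \leq 1} |f(t + \theta h) - f(t)| \to 0
\end{align*}
as $h \to 0$ since $f$ is continuous at $t$. (Notice that $f(t)$ remains constant as $x$
varies.)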
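The final inequality of the proof can be watched at work numerically. This is only an illustrative sketch under our own assumptions: the names `local_average` and `sampled_sup_deviation` are hypothetical, the integral is approximated by the midpoint rule, and the supremum is replaced by a maximum over finitely many sample points.

```python
import math


def local_average(f, t, h, n=1000):
    """Approximate (1/h) * integral of f over [t, t+h] by the midpoint rule.

    Negative h is allowed: the sample points then run to the left of t.
    """
    return sum(f(t + (k + 0.5) * h / n) for k in range(n)) / n


def sampled_sup_deviation(f, t, h, n=1000):
    """Sampled stand-in for the sup over 0 <= theta <= 1 of |f(t + theta*h) - f(t)|."""
    return max(abs(f(t + k * h / n) - f(t)) for k in range(n + 1))


if __name__ == "__main__":
    t = 0.7
    for h in (0.1, 0.01, 0.001, -0.001):
        # (F(t+h) - F(t))/h equals the local average of f over [t, t+h].
        error = abs(local_average(math.cos, t, h) - math.cos(t))
        bound = sampled_sup_deviation(math.cos, t, h)
        print(f"h={h:+.3f}  error={error:.2e}  bound={bound:.2e}")
```

Here `local_average(f, t, h)` plays the role of $(F(t+h) - F(t))/h$, so the printed error is the left hand side of the chain of equalities and the printed bound is the right hand side.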
Exercise 8.3.7. (i) Using the idea of the integral as the area under a curve,
draw diagrams illustrating the proof of Theorem 8.3.6.
(ii) Point out, explicitly, each use of Lemma 8.3.5 in our proof of Theorem 8.3.6.
(iii) Let $H$ be the Heaviside function $H : \mathbb{R} \to \mathbb{R}$ given by $H(x) = 0$ for
$x < 0$, $H(x) = 1$ for $x \geq 0$. Calculate $F(t) = \int_0^t H(x)\,dx$ and show that $F$ is
not differentiable at 0. Where does our proof of Theorem 8.3.6 break down?
(iv) Let $f(0) = 1$, $f(t) = 0$ otherwise. Calculate $F(t) = \int_0^t f(x)\,dx$ and
show that $F$ is differentiable at 0 but $F'(0) \neq f(0)$. Where does our proof of
Theorem 8.3.6 break down?
Exercise 8.3.8. Suppose that $f : (a, b) \to \mathbb{R}$ is a function such that $f$ is
Riemann integrable on every interval $[c, d] \subseteq (a, b)$. Let $u \in (a, b)$. If we set
\[ F(t) = \int_u^t f(x)\,dx, \]
show that $F$ is continuous on $(a, b)$ and that, if $f$ is continuous at some point
$t \in (a, b)$, then $F$ is differentiable at $t$ and $F'(t) = f(t)$.

Sometimes we think of the fundamental theorem in a slightly different way.

Theorem 8.3.9. Suppose that $f : (a, b) \to \mathbb{R}$ is continuous, that $u \in (a, b)$
and $c \in \mathbb{R}$. Then there is a unique solution to the differential equation
$g'(t) = f(t)$ [$t \in (a, b)$] such that $g(u) = c$.

Exercise 8.3.10. Prove Theorem 8.3.9. Make clear how you use Theorem 8.3.6
and the mean value theorem. Reread section 1.1.

We call the solutions of $g'(t) = f(t)$ indefinite integrals (or, simply, integrals)
of $f$.
Yet another version of the fundamental theorem is given by the next
theorem.

Theorem 8.3.11. Suppose that $g : (\alpha, \beta) \to \mathbb{R}$ has continuous derivative
and $[a, b] \subseteq (\alpha, \beta)$. Then
\[ \int_a^b g'(t)\,dt = g(b) - g(a). \]

Proof. Define $U : (\alpha, \beta) \to \mathbb{R}$ by
\[ U(t) = \int_a^t g'(x)\,dx - g(t) + g(a). \]
By the fundamental theorem of the calculus and earlier results on differentiation,
$U$ is everywhere differentiable with
\[ U'(t) = g'(t) - g'(t) = 0 \]
so, by the mean value theorem, $U$ is constant. But $U(a) = 0$, so $U(t) = 0$
for all $t$ and, in particular, $U(b) = 0$ as required.

[Remark: In one dimension, Theorems 8.3.6, 8.3.9 and 8.3.11 are so closely
linked that mathematicians tend to refer to them all as ‘The fundamental
theorem of the calculus’. However they generalise in different ways.
(1) Theorem 8.3.6 shows that, under suitable circumstances, we can recover
a function from its ‘local average’ (see Exercise K.130).
(2) Theorem 8.3.9 says that we can solve a certain kind of differential
equation. We shall obtain substantial generalisations of this result in
Section 12.2.
(3) Theorem 8.3.11 links the value of the derivative $f'$ on the whole of
$[a, b]$ with the value of $f$ on the boundary (that is to say, the set $\{a, b\}$). If
you have done a mathematical methods course you will already have seen a
similar idea expressed by the divergence theorem
\[ \int_V \nabla \cdot \mathbf{u}\,dV = \int_{\partial V} \mathbf{u} \cdot d\mathbf{S}. \]
This result and similar ones like Stokes’ theorem turn out to be special cases
of a master theorem⁴ which links the behaviour of the derivative of a certain
mathematical object over the whole of some body with the behaviour of the
object on the boundary of that body.]
Theorems 8.3.6 and 8.3.11 show that (under appropriate circumstances)
integration and differentiation are inverse operations and that the theories
of differentiation and integration are subsumed in the greater theory of the
calculus. Under appropriate circumstances, if the graph of $F$ has tangent
with slope $f(x)$ at $x$, then
\begin{align*}
\text{area under the graph of slope of tangent of } F &= \text{area under the graph of } f \\
&= \int_a^b f(x)\,dx = \int_a^b F'(x)\,dx = F(b) - F(a).
\end{align*}

Exercise 8.3.12. Most books give a slightly stronger version of Theorem 8.3.11
in the following form.
If $f : [a, b] \to \mathbb{R}$ has continuous derivative, then
\[ \int_a^b f'(t)\,dt = f(b) - f(a). \]
Explain what this means (you will need to talk about ‘left’ and ‘right’ derivatives)
and prove it.
Recalling the chain rule (Lemma 6.2.10) which tells us that $(\Phi \circ g)'(t) =
g'(t)\Phi'(g(t))$, the same form of proof gives us a very important theorem.
Theorem 8.3.13. (Change of variables for integrals.) Suppose that $f :
(\alpha, \beta) \to \mathbb{R}$ is continuous and $g : (\gamma, \delta) \to \mathbb{R}$ is differentiable with continuous
derivative. Suppose further that $g((\gamma, \delta)) \subseteq (\alpha, \beta)$. Then, if $c, d \in (\gamma, \delta)$, we
have
\[ \int_{g(c)}^{g(d)} f(s)\,ds = \int_c^d f(g(x))g'(x)\,dx. \]
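A quick numerical check of the change of variables formula may help fix ideas. The sketch below is not from the text: `integrate` and `both_sides` are hypothetical helper names, the integrals are approximated by the midpoint rule, and the choice $f = \exp$, $g = \sin$ on $[0, 4]$ is ours, picked because $g$ is not injective there.

```python
import math


def integrate(f, lo, hi, n=10000):
    """Composite midpoint rule; a negative step handles the case hi < lo."""
    width = (hi - lo) / n
    return sum(f(lo + (k + 0.5) * width) for k in range(n)) * width


def both_sides(f, g, g_prime, c, d):
    """Return approximations to the two sides of the change of variables formula."""
    left = integrate(f, g(c), g(d))
    right = integrate(lambda x: f(g(x)) * g_prime(x), c, d)
    return left, right


if __name__ == "__main__":
    # g = sin is not injective on [0, 4]; the two sides should still agree.
    left, right = both_sides(math.exp, math.sin, math.cos, 0.0, 4.0)
    print(left, right)
```

Since $\sin 4 < 0$, the left hand integral here runs from $0$ down to $\sin 4$, so the example also exercises the convention that the limits may occur in either order.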

⁴ Arnol'd calls it the Newton-Leibniz-Gauss-Green-Ostrogradskii-Stokes-Poincaré
theorem but most mathematicians call it the generalised Stokes’ theorem or just Stokes’
theorem.

Exercise 8.3.14. (i) Prove Theorem 8.3.13 by considering
\[ U(t) = \int_{g(c)}^{g(t)} f(s)\,ds - \int_c^t f(g(x))g'(x)\,dx. \]
(ii) Derive Theorem 8.3.11 from Theorem 8.3.13 by choosing $f$ appropriately.
(iii) Strengthen Theorem 8.3.13 along the lines of Exercise 8.3.12.
(iv) (An alternative proof.) If $f$ is as in Theorem 8.3.13 explain why we
can find an $F : (\alpha, \beta) \to \mathbb{R}$ with $F' = f$. Obtain Theorem 8.3.13 by applying
the chain rule to $F'(g(x))g'(x) = f(g(x))g'(x)$.
Because the proof of Theorem 8.3.13 is so simple and because the main use
of the result in elementary calculus is to evaluate integrals, there is a tendency
to underestimate the importance of this result. However, it is important for
later developments that the reader has an intuitive grasp of this result.
Exercise 8.3.15. (i) Suppose that $f : \mathbb{R} \to \mathbb{R}$ is the constant function $f(t) =
K$ and that $g : \mathbb{R} \to \mathbb{R}$ is the linear function $g(t) = \lambda t + \mu$. Show by direct
calculation that
\[ \int_{g(c)}^{g(d)} f(s)\,ds = \int_c^d f(g(x))g'(x)\,dx, \]
and describe the geometric content of this result in words.
(ii) Suppose now that $f : \mathbb{R} \to \mathbb{R}$ and $g : \mathbb{R} \to \mathbb{R}$ are well behaved
functions. By splitting $[c, d]$ into small intervals on which $f$ is ‘almost constant’
and $g$ is ‘almost linear’, give a heuristic argument for the truth of
Theorem 8.3.13. To see how this heuristic argument can be converted into a
rigorous one, consult Exercise K.118.
Exercise 8.3.16. There is one peculiarity in our statement of Theorem 8.3.13
which is worth noting. We do not demand that $g$ be bijective. Suppose that
$f : \mathbb{R} \to \mathbb{R}$ is continuous and $g(t) = \sin t$. Show that, by choosing different
intervals $(c, d)$, we obtain
\begin{align*}
\int_0^{\sin\alpha} f(s)\,ds &= \int_0^{\alpha} f(\sin x)\cos x\,dx \\
&= \int_0^{\alpha + 2\pi} f(\sin x)\cos x\,dx = \int_0^{\pi - \alpha} f(\sin x)\cos x\,dx.
\end{align*}
Explain what is going on.
The extra flexibility given by allowing $g$ not to be bijective is one we are
usually happy to sacrifice in the interests of generalising Theorem 8.3.13.

Exercise 8.3.17. The following exercise is traditional.
(i) Show that integration by substitution, using $x = 1/t$, gives
\[ \int_a^b \frac{dx}{1 + x^2} = \int_{1/b}^{1/a} \frac{dt}{1 + t^2} \]
when $b > a > 0$.
(ii) If we set $a = -1$, $b = 1$ in the formula of (i), we obtain
\[ \int_{-1}^{1} \frac{dx}{1 + x^2} \overset{?}{=} -\int_{-1}^{1} \frac{dt}{1 + t^2}. \]
Explain this apparent failure of the method of integration by substitution.
(iii) Write the result of (i) in terms of $\tan^{-1}$ and prove it using standard
trigonometric identities.
In sections 5.4 and 5.6 we gave a treatment of the exponential and logarithmic
functions based on differentiation. The reader may wish to look at
Exercise K.126 in which we use integration instead.
Another result which can be proved in much the same manner as
Theorems 8.3.11 and 8.3.13 is the lemma which justifies integration by
parts. (Recall the notation $[h(x)]_a^b = h(b) - h(a)$.)

Lemma 8.3.18. Suppose that $f : (\alpha, \beta) \to \mathbb{R}$ has continuous derivative and
$g : (\alpha, \beta) \to \mathbb{R}$ is continuous. Let $G : (\alpha, \beta) \to \mathbb{R}$ be an indefinite integral of
$g$. Then, if $[a, b] \subseteq (\alpha, \beta)$, we have
\[ \int_a^b f(x)g(x)\,dx = [f(x)G(x)]_a^b - \int_a^b f'(x)G(x)\,dx. \]
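As a sanity check on the integration by parts formula, here is a small numerical sketch. It is our own illustration rather than anything in the text: the helper names are hypothetical, the integrals are midpoint rule approximations, and the example $f(x) = x$, $g = \cos$, $G = \sin$ on $[0, 1]$ is chosen because the exact value $\sin 1 + \cos 1 - 1$ is easy to compute by hand.

```python
import math


def integrate(f, lo, hi, n=10000):
    """Composite midpoint rule on [lo, hi]."""
    width = (hi - lo) / n
    return sum(f(lo + (k + 0.5) * width) for k in range(n)) * width


def by_parts_sides(f, f_prime, g, G, a, b):
    """Return approximations to both sides of the integration by parts formula."""
    left = integrate(lambda x: f(x) * g(x), a, b)
    boundary = f(b) * G(b) - f(a) * G(a)          # [f(x) G(x)] from a to b
    right = boundary - integrate(lambda x: f_prime(x) * G(x), a, b)
    return left, right


if __name__ == "__main__":
    left, right = by_parts_sides(lambda x: x, lambda x: 1.0, math.cos, math.sin, 0.0, 1.0)
    print(left, right, math.sin(1.0) + math.cos(1.0) - 1.0)
```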

Exercise 8.3.19. (i) Obtain Lemma 8.3.18 by differentiating an appropriate
$U$ in the style of the proofs of Theorems 8.3.11 and 8.3.13. Quote
carefully the results that you use.
(ii) Obtain Lemma 8.3.18 by integrating both sides of the equality $(uv)' =
u'v + uv'$ and choosing appropriate $u$ and $v$. Quote carefully the results that
you use.
(iii) Strengthen Lemma 8.3.18 along the lines of Exercise 8.3.12.
Integration by parts gives a global Taylor theorem with a form that is
easily remembered and proved for examination.
Theorem 8.3.20. (A global Taylor’s theorem with integral remainder.)
If $f : (u, v) \to \mathbb{R}$ is $n$ times continuously differentiable and $0 \in (u, v)$,
then
\[ f(t) = \sum_{j=0}^{n-1} \frac{f^{(j)}(0)}{j!} t^j + R_n(f, t) \]
where
\[ R_n(f, t) = \frac{1}{(n - 1)!} \int_0^t (t - x)^{n-1} f^{(n)}(x)\,dx. \]
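The theorem can be tested numerically on a function whose derivatives are known. The following sketch is an illustration under our own assumptions: it takes $f = \exp$ (so that every derivative is again $\exp$), uses hypothetical helper names, and approximates the remainder integral by the midpoint rule.

```python
import math


def integrate(f, lo, hi, n=4000):
    """Composite midpoint rule; a negative step handles the case hi < lo."""
    width = (hi - lo) / n
    return sum(f(lo + (k + 0.5) * width) for k in range(n)) * width


def taylor_polynomial_exp(t, n):
    """Sum over j = 0, ..., n-1 of f^(j)(0) t^j / j! for f = exp."""
    return sum(t ** j / math.factorial(j) for j in range(n))


def integral_remainder_exp(t, n):
    """R_n(exp, t) = (1/(n-1)!) * integral from 0 to t of (t - x)^(n-1) exp(x) dx."""
    integrand = lambda x: (t - x) ** (n - 1) * math.exp(x)
    return integrate(integrand, 0.0, t) / math.factorial(n - 1)


if __name__ == "__main__":
    for t in (1.0, -2.0):
        for n in (1, 3, 6):
            total = taylor_polynomial_exp(t, n) + integral_remainder_exp(t, n)
            print(t, n, total, math.exp(t))
```

For each $n$ the polynomial plus the remainder should reproduce $\exp(t)$ up to the error of the numerical integration, for negative $t$ as well as positive.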

Exercise 8.3.21. By integration by parts, show that
\[ R_n(f, t) = -\frac{f^{(n-1)}(0)}{(n - 1)!} t^{n-1} + R_{n-1}(f, t). \]
Use repeated integration by parts to obtain Theorem 8.3.20.
Exercise 8.3.22. Reread Example 7.1.5. If $F$ is as in that example, identify
$R_{n-1}(F, t)$.
Exercise 8.3.23. If $f : (-a, a) \to \mathbb{R}$ is $n$ times continuously differentiable
with $|f^{(n)}(t)| \leq M$ for all $t \in (-a, a)$, show that
\[ \left| f(t) - \sum_{j=0}^{n-1} \frac{f^{(j)}(0)}{j!} t^j \right| \leq \frac{M|t|^n}{n!}. \]
Explain why this result is slightly weaker than that of Exercise 7.1.1 (v).
There are several variants of Theorem 8.3.20 with different expressions for
$R_n(f, t)$ (see, for example, Exercise K.49 (vi)). However, although the theory
of the Taylor expansion is very important (see, for example, Exercise K.125
and Exercise K.266), these global theorems are not much used in relation to
specific functions outside the examination hall. We discuss two of the reasons
why at the end of Section 11.5. In Exercises 11.5.20 and 11.5.22 I suggest
that it is usually easier to obtain Taylor series by power series solutions rather
than by using theorems like Theorem 8.3.20. In Exercise 11.5.23 I suggest
that power series are often not very suitable for numerical computation.


8.4 First steps in the calculus of variations ♥
The most famous early problem in the calculus of variations is that of the
brachistochrone. It asks for the equation $y = f(x)$ of the wire down which a
frictionless particle with initial velocity $v$ will slide from one point $(a, \alpha)$ to
another $(b, \beta)$ (so $f(a) = \alpha$, $f(b) = \beta$, $a \neq b$ and $\alpha > \beta$) in the shortest time.
It turns out that the time taken by the particle is
\[ J(f) = \frac{1}{(2g)^{1/2}} \int_a^b \left( \frac{1 + f'(x)^2}{\kappa - f(x)} \right)^{1/2} dx \]
where $\kappa = v^2/(2g) + \alpha$ and $g$ is the acceleration due to gravity.

Exercise 8.4.1. If you know sufficient mechanics, verify this. (Your argument
will presumably involve arc length which has not yet been mentioned in
this book.)
This is a problem of minimising which is very different from those dealt
with in elementary calculus. Those problems ask us to choose a point $x_0$ from
a one-dimensional space which minimises some function $g(x)$. In section 7.3
we considered problems in which we sought to choose a point $\mathbf{x}_0$ from an
$n$-dimensional space which minimises some function $g(\mathbf{x})$. Here we seek to
choose a function $f_0$ from an infinite dimensional space to minimise a function
$J(f)$ of functions $f$.
Exercise 8.4.2. In the previous sentence we used the words ‘infinite dimensional’
somewhat loosely. However we can make precise statements along the
same lines.
(i) Show that the collection $\mathcal{P}$ of polynomials $P$ with $P(0) = P(1) = 0$
forms a vector space over $\mathbb{R}$ with the obvious operations. Show that $\mathcal{P}$ is
infinite dimensional (in other words, has no finite spanning set).
(ii) Show that the collection $\mathcal{E}$ of infinitely differentiable functions $f :
[0, 1] \to \mathbb{R}$ with $f(0) = f(1)$ forms a vector space over $\mathbb{R}$ with the obvious
operations. Show that $\mathcal{E}$ is infinite dimensional.
John Bernoulli published the brachistochrone problem as a challenge in
1696. Newton, Leibniz, L’Hôpital, John Bernoulli and James Bernoulli all
found solutions within a year⁵. However, it is one thing to solve a particular
problem and quite another to find a method of attack for the general class
of problems to which it belongs. Such a method was developed by Euler
and Lagrange. We shall see that it does not resolve all difficulties but it
represents a marvelous leap of imagination.
We begin by proving that, under certain circumstances, we can interchange
the order of integration and differentiation. (We will extend the
result in Theorem 11.4.21.)
Theorem 8.4.3. (Differentiation under the integral.) Let $(a', b') \times
(c', d') \supseteq [a, b] \times [c, d]$. Suppose that $g : (a', b') \times (c', d') \to \mathbb{R}$ is continuous
and that the partial derivative $g_{,2}$ exists and is continuous. Then writing
$G(y) = \int_a^b g(x, y)\,dx$ we have $G$ differentiable on $(c, d)$ with
\[ G'(y) = \int_a^b g_{,2}(x, y)\,dx. \]
⁵ They were giants in those days. Newton had retired from mathematics and submitted
his solution anonymously. ‘But’ John Bernoulli said ‘one recognises the lion by his paw.’

This result is more frequently written as
\[ \frac{d}{dy} \int_a^b g(x, y)\,dx = \int_a^b \frac{\partial g}{\partial y}(x, y)\,dx, \]
and interpreted as ‘the $d$ clambers through the integral and curls up’. If we
use the $D$ notation we get
\[ G'(y) = \int_a^b D_2 g(x, y)\,dx. \]
It may, in the end, be more helpful to note that $\int_a^b g(x, y)\,dx$ is a function of
the single variable $y$, but $g(x, y)$ is a function of the two variables $x$ and $y$.
Proof. We use a proof technique which is often useful in this kind of situation
(we have already used a simple version in Theorem 8.3.6, when we proved
the fundamental theorem of the calculus).
We first put everything under one integral sign. Suppose $y, y + h \in (c, d)$
and $h \neq 0$. Then
\begin{align*}
\left| \frac{G(y + h) - G(y)}{h} - \int_a^b g_{,2}(x, y)\,dx \right| &= \frac{1}{|h|} \left| G(y + h) - G(y) - \int_a^b h g_{,2}(x, y)\,dx \right| \\
&= \frac{1}{|h|} \left| \int_a^b g(x, y + h) - g(x, y) - h g_{,2}(x, y)\,dx \right|.
\end{align*}
In order to estimate the last integral we use the simple result (Exercise 8.2.13 (iv))
\[ |\text{integral}| \leq \text{length} \times \sup \]
which gives us
\[ \frac{1}{|h|} \left| \int_a^b g(x, y + h) - g(x, y) - h g_{,2}(x, y)\,dx \right| \leq \frac{b - a}{|h|} \sup_{x \in [a, b]} |g(x, y + h) - g(x, y) - h g_{,2}(x, y)|. \]

We expect $|g(x, y + h) - g(x, y) - h g_{,2}(x, y)|$ to be small when $h$ is small because
the definition of the partial derivative tells us that $g(x, y + h) - g(x, y) \approx
h g_{,2}(x, y)$. In such circumstances, the mean value theorem is frequently useful.
In this case, setting $f(t) = g(x, y + t) - g(x, y) - t g_{,2}(x, y)$, so that
$f'(t) = g_{,2}(x, y + t) - g_{,2}(x, y)$, the mean value theorem tells us that
\[ |f(h)| = |f(h) - f(0)| \leq |h| \sup_{0 \leq \theta \leq 1} |f'(\theta h)| \]
and so
\[ |g(x, y + h) - g(x, y) - h g_{,2}(x, y)| \leq |h| \sup_{0 \leq \theta \leq 1} |g_{,2}(x, y + \theta h) - g_{,2}(x, y)|. \]

There is one further point to notice. Since we are taking a supremum
over all $x \in [a, b]$, we shall need to know, not merely that we can make
$|g_{,2}(x, y + \theta h) - g_{,2}(x, y)|$ small at a particular $x$ by taking $h$ sufficiently
small, but that we can make $|g_{,2}(x, y + \theta h) - g_{,2}(x, y)|$ uniformly small for
all $x$. However, we know that $g_{,2}$ is continuous on $[a, b] \times [c, d]$ and that a
function which is continuous on a closed bounded set is uniformly continuous
and this will enable us to complete the proof.
Let $\epsilon > 0$. By Theorem 4.5.5, $g_{,2}$ is uniformly continuous on $[a, b] \times [c, d]$
and so we can find a $\delta(\epsilon) > 0$ such that
\[ |g_{,2}(x, y) - g_{,2}(u, v)| \leq \epsilon/(b - a) \]
whenever $((x - u)^2 + (y - v)^2)^{1/2} < \delta(\epsilon)$ and $(x, y), (u, v) \in [a, b] \times [c, d]$. It follows
that, if $y, y + h \in (c, d)$ and $|h| < \delta(\epsilon)$, then
\[ \sup_{0 \leq \theta \leq 1} |g_{,2}(x, y + \theta h) - g_{,2}(x, y)| \leq \epsilon/(b - a) \]
for all $x \in [a, b]$. Putting all our results together, we have shown that
\[ \left| \frac{G(y + h) - G(y)}{h} - \int_a^b g_{,2}(x, y)\,dx \right| \leq \epsilon \]
whenever $y, y + h \in (c, d)$ and $0 < |h| < \delta(\epsilon)$ and the result follows.
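The conclusion of Theorem 8.4.3 is easy to test numerically. The sketch below is an illustration under assumptions of our own: we take $g(x, y) = \sin(xy)$ on $[0, 1]$ (so $g_{,2}(x, y) = x\cos(xy)$), use hypothetical helper names, approximate the integrals by the midpoint rule and approximate $G'(y)$ by a central difference.

```python
import math


def integrate(f, lo, hi, n=2000):
    """Composite midpoint rule on [lo, hi]."""
    width = (hi - lo) / n
    return sum(f(lo + (k + 0.5) * width) for k in range(n)) * width


def G(y):
    """G(y) = integral from 0 to 1 of g(x, y) dx for the illustrative g(x, y) = sin(x y)."""
    return integrate(lambda x: math.sin(x * y), 0.0, 1.0)


def G_prime_by_formula(y):
    """Integral from 0 to 1 of g_{,2}(x, y) = x cos(x y), as the theorem predicts."""
    return integrate(lambda x: x * math.cos(x * y), 0.0, 1.0)


def G_prime_by_difference(y, h=1e-4):
    """Central difference approximation to G'(y)."""
    return (G(y + h) - G(y - h)) / (2.0 * h)


if __name__ == "__main__":
    for y in (0.5, 2.0):
        print(y, G_prime_by_formula(y), G_prime_by_difference(y))
```

For this particular $g$ one can also integrate by hand: $G(y) = (1 - \cos y)/y$ for $y \neq 0$, which gives an independent check on both columns of output.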
Exercise 8.4.4. Because I have tried to show where the proof comes from,
the proof above is not written in a very economical way. Rewrite it more
economically.
A favourite examiner’s variation on the theme of Theorem 8.4.3 is given in
Exercise K.132.
Exercise 8.4.5. In what follows we will use a slightly different version of
Theorem 8.4.3.
Suppose $g : [a, b] \times [c, d] \to \mathbb{R}$ is continuous and that the partial derivative $g_{,2}$
exists and is continuous. Then, writing $G(y) = \int_a^b g(x, y)\,dx$, we have $G$
differentiable on $[c, d]$ with
\[ G'(y) = \int_a^b g_{,2}(x, y)\,dx. \]
Explain what this means in terms of left and right derivatives and prove
it.

The method of Euler and Lagrange applies to the following class of problems.
Suppose that $F : \mathbb{R}^3 \to \mathbb{R}$ has continuous second partial derivatives.
We consider the set $\mathcal{A}$ of functions $f : [a, b] \to \mathbb{R}$ which are differentiable
with continuous derivative and are such that $f(a) = \alpha$ and $f(b) = \beta$. We
write
\[ J(f) = \int_a^b F(t, f(t), f'(t))\,dt \]
and seek to minimise $J$, that is to find an $f_0 \in \mathcal{A}$ such that
\[ J(f_0) \leq J(f) \]
whenever $f \in \mathcal{A}$.
In section 7.3, when we asked if a particular point $\mathbf{x}_0$ from an $n$-dimensional
space minimised $g : \mathbb{R}^n \to \mathbb{R}$, we examined the behaviour of $g$ close to $\mathbf{x}_0$. In
other words, we looked at $g(\mathbf{x}_0 + \eta\mathbf{u})$ when $\mathbf{u}$ was an arbitrary vector and $\eta$
was small. The idea of Euler and Lagrange is to look at
\[ G_h(\eta) = J(f_0 + \eta h) \]
where $h : [a, b] \to \mathbb{R}$ is differentiable with continuous derivative and is such
that $h(a) = 0$ and $h(b) = 0$ (we shall call the set of such functions $\mathcal{E}$). We
observe that $G_h$ is a function from $\mathbb{R}$ and that $G_h$ has a minimum at 0 if $J$
is minimised by $f_0$. This observation, combined with some very clever, but
elementary, calculus gives the celebrated Euler-Lagrange equation.
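To see the function $G_h$ in a concrete case, consider the following sketch. Everything specific in it is an assumption made for illustration and does not come from the text: we take $F(t, y, p) = p^2$, so that $J(f) = \int_a^b f'(t)^2\,dt$, we take $f_0$ to be the straight line joining the end points, and we take the perturbation $h(t) = \sin(\pi(t - a)/(b - a))$, which vanishes at $a$ and $b$. The integral is approximated by the midpoint rule.

```python
import math

A, B = 0.0, 1.0           # the interval [a, b]
ALPHA, BETA = 1.0, 3.0    # the prescribed end values f(a) and f(b)


def integrate(f, lo, hi, n=2000):
    """Composite midpoint rule on [lo, hi]."""
    width = (hi - lo) / n
    return sum(f(lo + (k + 0.5) * width) for k in range(n)) * width


def f0_prime(t):
    """Derivative of the straight line through (A, ALPHA) and (B, BETA)."""
    return (BETA - ALPHA) / (B - A)


def h_prime(t):
    """Derivative of h(t) = sin(pi (t - A)/(B - A)), a perturbation with h(A) = h(B) = 0."""
    return math.pi / (B - A) * math.cos(math.pi * (t - A) / (B - A))


def G_h(eta):
    """G_h(eta) = J(f0 + eta h) for the illustrative J(f) = integral of f'(t)^2."""
    return integrate(lambda t: (f0_prime(t) + eta * h_prime(t)) ** 2, A, B)


if __name__ == "__main__":
    for eta in (-0.5, -0.1, 0.0, 0.1, 0.5):
        print(eta, G_h(eta))
```

For this choice a direct calculation gives $G_h(\eta) = 4 + \eta^2\pi^2/2$, so the printed values should have their minimum at $\eta = 0$, in line with the observation above.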

Theorem 8.4.6. Suppose that $F : \mathbb{R}^3 \to \mathbb{R}$ has continuous second partial
derivatives. Consider the set $\mathcal{A}$ of functions $f : [a, b] \to \mathbb{R}$ which are differentiable
with continuous derivative and are such that $f(a) = \alpha$ and $f(b) = \beta$.
We write
\[ J(f) = \int_a^b F(t, f(t), f'(t))\,dt. \]
If $f \in \mathcal{A}$ is such that
\[ J(f) \leq J(g) \]
whenever $g \in \mathcal{A}$ then
\[ F_{,2}(t, f(t), f'(t)) = \frac{d}{dt} F_{,3}(t, f(t), f'(t)). \]

Proof. We use the notation of the paragraph preceding the statement of
the theorem. If $h \in \mathcal{E}$ (that is to say $h : [a, b] \to \mathbb{R}$ is differentiable with
continuous derivative and is such that $h(a) = 0$ and $h(b) = 0$) then the chain
rule tells us that the function $g_h : \mathbb{R}^2 \to \mathbb{R}$ given by
\[ g_h(\eta, t) = F(t, f(t) + \eta h(t), f'(t) + \eta h'(t)) \]
has continuous partial derivative
\[ g_{h,1}(\eta, t) = h(t)F_{,2}(t, f(t) + \eta h(t), f'(t) + \eta h'(t)) + h'(t)F_{,3}(t, f(t) + \eta h(t), f'(t) + \eta h'(t)). \]
Thus by Theorem 8.4.3, we may differentiate under the integral to show that
$G_h$ is differentiable everywhere with
\[ G_h'(\eta) = \int_a^b h(t)F_{,2}(t, f(t) + \eta h(t), f'(t) + \eta h'(t)) + h'(t)F_{,3}(t, f(t) + \eta h(t), f'(t) + \eta h'(t))\,dt. \]

If $f$ minimises $J$, then 0 minimises $G_h$ and so $G_h'(0) = 0$. We deduce that
\begin{align*}
0 &= \int_a^b h(t)F_{,2}(t, f(t), f'(t)) + h'(t)F_{,3}(t, f(t), f'(t))\,dt \\
&= \int_a^b h(t)F_{,2}(t, f(t), f'(t))\,dt + \int_a^b h'(t)F_{,3}(t, f(t), f'(t))\,dt.
\end{align*}
Using integration by parts and the fact that $h(a) = h(b) = 0$ we obtain
\[ \int_a^b h'(t)F_{,3}(t, f(t), f'(t))\,dt = [h(t)F_{,3}(t, f(t), f'(t))]_a^b - \int_a^b h(t)\frac{d}{dt}F_{,3}(t, f(t), f'(t))\,dt \]
