

Proof. We prove the smooth case and indicate the changes for the real analytic
case. The proof will use an algorithm.
Note first that by (50.10) (by (50.12) in the real analytic case) the characteristic
polynomial

\[
\begin{aligned}
P(A(t))(\lambda) &= \det(\lambda I - A(t)) \\
&= \lambda^n - a_1(t)\lambda^{n-1} + a_2(t)\lambda^{n-2} - \dots + (-1)^n a_n(t) \\
&= \sum_{i=0}^{n} (-1)^i \operatorname{tr}(\Lambda^i A(t))\,\lambda^{n-i}
\end{aligned}
\tag{1}
\]

is smoothly solvable (real analytically solvable), with smooth (real analytic) roots
$\lambda_1(t), \dots, \lambda_n(t)$ on the whole parameter interval.
Case 1: distinct eigenvalues. If $A(0)$ has some eigenvalues distinct, then one
can reorder them in such a way that for $i_0 = 0 < 1 \le i_1 < i_2 < \dots < i_k < n = i_{k+1}$
we have

\[
\lambda_1(0) = \dots = \lambda_{i_1}(0) < \lambda_{i_1+1}(0) = \dots = \lambda_{i_2}(0) < \dots < \lambda_{i_k+1}(0) = \dots = \lambda_n(0).
\]

50.14 50. Applications to perturbation theory of operators 547

For $t$ near 0 we still have

\[
\lambda_1(t), \dots, \lambda_{i_1}(t) < \lambda_{i_1+1}(t), \dots, \lambda_{i_2}(t) < \dots < \lambda_{i_k+1}(t), \dots, \lambda_n(t).
\]

For $j = 1, \dots, k+1$ we consider the subspaces

\[
V_t^{(j)} = \bigoplus_{i=i_{j-1}+1}^{i_j} \{v \in V : (A(t) - \lambda_i(t))v = 0\}.
\]

Then each $V_t^{(j)}$ runs through a smooth (real analytic) vector subbundle of the
trivial bundle $(-\varepsilon, \varepsilon) \times V \to (-\varepsilon, \varepsilon)$, which admits a smooth (real analytic) framing
$e_{i_{j-1}+1}(t), \dots, e_{i_j}(t)$. We have $V = \bigoplus_{j=1}^{k+1} V_t^{(j)}$ for each $t$.
In order to prove this statement, note that

\[
V_t^{(j)} = \ker\bigl((A(t) - \lambda_{i_{j-1}+1}(t)) \circ \dots \circ (A(t) - \lambda_{i_j}(t))\bigr),
\]

so $V_t^{(j)}$ is the kernel of a smooth (real analytic) vector bundle homomorphism $B(t)$
of constant rank (even of constant dimension of the kernel), and thus is a smooth
(real analytic) vector subbundle. This together with a smooth (real analytic) frame
field can be shown as follows: Choose a basis of $V$, constant in $t$, such that $A(0)$
is diagonal. Then by the elimination procedure one can construct a basis for the
kernel of $B(0)$. For $t$ near 0, the elimination procedure (with the same choices)
then gives a basis of the kernel of $B(t)$; the elements of this basis are then smooth
(real analytic) in $t$ for $t$ near 0.
From the last result it follows that it suffices to find smooth (real analytic)
eigenvectors in each subbundle $V^{(j)}$ separately, expanded in the smooth (real analytic)
frame field. But in this frame field the vector subbundle looks again like a constant
vector space. So feed each of these parts ($A$ restricted to $V^{(j)}$, as matrix with
respect to the frame field) into case 2 below.
Case 2: All eigenvalues at 0 are equal. So suppose that $A(t) : V \to V$ is
Hermitian with all eigenvalues at $t = 0$ equal to $\frac{a_1(0)}{n}$, see (1).
Eigenvectors of $A(t)$ are also eigenvectors of $A(t) - \frac{a_1(t)}{n} I$, so we may replace $A(t)$ by
$A(t) - \frac{a_1(t)}{n} I$ and assume that for the characteristic polynomial (1) we have $a_1 = 0$,
or assume without loss that $\lambda_i(0) = 0$ for all $i$, and so $A(0) = 0$.
If $A(t) = 0$ for all $t$ we choose the eigenvectors constant.
Otherwise, let $A_{ij}(t) = t A^{(1)}_{ij}(t)$. From (1) we see that the characteristic polynomial
of the Hermitian matrix $A^{(1)}(t)$ is $P_1(t)$ in the notation of (50.8), thus $m(a_i) \ge i$
for $2 \le i \le n$, which also follows from (50.5).
The eigenvalues of $A^{(1)}(t)$ are the roots of $P_1(t)$, which may be chosen in a smooth
way, since they again satisfy the condition of theorem (50.10). In the real analytic
case we just have to invoke (50.12). Note that eigenvectors of $A^{(1)}$ are also
eigenvectors of $A$. If the eigenvalues are still all equal, we apply the same procedure

548 Chapter X. Further Applications 50.16

again, until they are not all equal: we arrive at this situation by the assumption of
the theorem in the smooth case, and automatically in the real analytic case. Then
we apply case 1.
This algorithm shows that one may choose the eigenvectors $x_i(t)$ of $A(t)$ in a
smooth (real analytic) way, locally in $t$. It remains to extend this to the whole
parameter interval.
If some eigenvalues coincide locally, then they coincide on the whole of $\mathbb R$, by the
assumption. The corresponding eigenspaces then form a smooth (real analytic) vector
bundle over $\mathbb R$, by case 1, since those eigenvalues which meet in isolated points are
different after application of case 2.
So we get $V = \bigoplus_j W_t^{(j)}$, where the $W_t^{(j)}$ are smooth (real analytic) sub vector bundles of
$V \times \mathbb R$, whose dimension is the generic multiplicity of the corresponding smooth (real
analytic) eigenvalue function. It suffices to find global orthonormal smooth (real
analytic) frames for each of these; such a frame exists since the vector bundle is smoothly
(real analytically) trivial, by using parallel transport with respect to a smooth (real
analytic) Hermitian connection.

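50.15. Example. (see [Rellich, 1937, section 2]) That the last result cannot be
improved is shown by the following example which rotates a lot:

\[
x_+(t) := \begin{pmatrix} \cos\frac1t \\ \sin\frac1t \end{pmatrix}, \qquad
x_-(t) := \begin{pmatrix} -\sin\frac1t \\ \cos\frac1t \end{pmatrix}, \qquad
\lambda_\pm(t) = \pm e^{-\frac{1}{t^2}},
\]

\[
A(t) := (x_+(t), x_-(t))
\begin{pmatrix} \lambda_+(t) & 0 \\ 0 & \lambda_-(t) \end{pmatrix}
(x_+(t), x_-(t))^{-1}
= e^{-\frac{1}{t^2}}
\begin{pmatrix} \cos\frac2t & \sin\frac2t \\ \sin\frac2t & -\cos\frac2t \end{pmatrix}.
\]

Here $t \mapsto A(t)$ and $t \mapsto \lambda_\pm(t)$ are smooth, whereas the eigenvectors cannot be
chosen continuously.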
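
The following is a small numerical sketch of this example, not part of the original argument. It assumes the closed form $A(t) = e^{-1/t^2}\bigl(\begin{smallmatrix}\cos(2/t) & \sin(2/t)\\ \sin(2/t) & -\cos(2/t)\end{smallmatrix}\bigr)$ with $A(0) = 0$; the function names are hypothetical. It checks that $x_+(t)$ is an eigenvector for $\lambda_+(t)$, and that $x_+$ turns by a right angle between two nearby parameter values close to 0.

```python
import math

def rellich_matrix(t):
    """A(t) = exp(-1/t^2) * [[cos(2/t), sin(2/t)], [sin(2/t), -cos(2/t)]], A(0) = 0."""
    if t == 0.0:
        return [[0.0, 0.0], [0.0, 0.0]]
    scale = math.exp(-1.0 / t**2)
    c, s = math.cos(2.0 / t), math.sin(2.0 / t)
    return [[scale * c, scale * s], [scale * s, -scale * c]]

def x_plus(t):
    """The eigenvector x_+(t) = (cos(1/t), sin(1/t)) for t != 0."""
    return (math.cos(1.0 / t), math.sin(1.0 / t))

def residual(t):
    """Max-norm of A(t) x_+(t) - lambda_+(t) x_+(t)."""
    a, v = rellich_matrix(t), x_plus(t)
    lam = math.exp(-1.0 / t**2)
    av = (a[0][0] * v[0] + a[0][1] * v[1], a[1][0] * v[0] + a[1][1] * v[1])
    return max(abs(av[0] - lam * v[0]), abs(av[1] - lam * v[1]))

def eigvec_dot(t1, t2):
    """Inner product of x_+(t1) and x_+(t2)."""
    a, b = x_plus(t1), x_plus(t2)
    return a[0] * b[0] + a[1] * b[1]

# Two parameter values near 0 at which 1/t differs by pi/2.
t1 = 1.0 / (2.0 * math.pi * 10)
t2 = 1.0 / (2.0 * math.pi * 10 + math.pi / 2.0)
```

Since $1/t$ sweeps through every angle infinitely often as $t \to 0$, such pairs `t1`, `t2` exist in every neighborhood of 0, which is the mechanism behind the failure of continuity.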

50.16. Theorem. Let $t \mapsto A(t)$ be a smooth curve of unbounded self-adjoint
operators in a Hilbert space with common domain of definition and compact resolvent.
Then the eigenvalues of $A(t)$ may be arranged increasingly ordered in such a way
that each eigenvalue is continuous, and they can be rearranged in such a way that
they become $C^1$-functions.
Suppose, moreover, that no two of the continuous eigenvalues meet of infinite order
at any $t \in \mathbb R$ if they are not equal. Then the eigenvalues and the eigenvectors can
be chosen smoothly in $t$ on the whole parameter domain.
If, on the other hand, $t \mapsto A(t)$ is a real analytic curve of unbounded self-adjoint
operators in a Hilbert space with common domain of definition and with compact
resolvent, then the eigenvalues and the eigenvectors can be chosen real analytically in $t$,
on the whole parameter domain.

The real analytic version of this theorem is due to [Rellich, 1940], see also [Kato,
1976, VII, 3.9]; the smooth version is due to [Alekseevsky, Kriegl, Losik, Michor,
1996]. The proof follows the lines of the latter paper.


That $A(t)$ is a smooth curve of unbounded operators means the following: There is
a dense subspace $V$ of the Hilbert space $H$ such that $V$ is the domain of definition
of each $A(t)$ and such that $A(t)^* = A(t)$ with the same domains $V$, where the
adjoint operator $A(t)^*$ is defined by $\langle A(t)u, v\rangle = \langle u, A(t)^* v\rangle$ for all $v$ for which the
left hand side is bounded as functional in $u \in V \subset H$. Moreover, we require that
$t \mapsto \langle A(t)u, v\rangle$ is smooth for each $u \in V$ and $v \in H$. This implies that $t \mapsto A(t)u$ is
smooth $\mathbb R \to H$ for each $u \in V$ by (2.3). Similarly for the real analytic case, by (7.4).
The first part of the proof will show that $t \mapsto A(t)$ smooth implies that the resolvent
$(A(t) - z)^{-1}$ is smooth in $t$ and $z$ jointly, and mainly this is used later in the proof.
It is well known, and in the proof we will show, that if for some $(t, z)$ the resolvent
$(A(t) - z)^{-1}$ is compact, then it is compact for all $t \in \mathbb R$ and all $z$ in the resolvent set of $A(t)$.

Proof. We shall prove the smooth case and indicate the changes for the real
analytic case.
For each $t$ consider the norm $\|u\|_t^2 := \|u\|^2 + \|A(t)u\|^2$ on $V$. Since $A(t) = A(t)^*$
is closed, $(V, \|\ \|_t)$ is also a Hilbert space with inner product $\langle u, v\rangle_t := \langle u, v\rangle +
\langle A(t)u, A(t)v\rangle$. All these norms are equivalent since $(V, \|\ \|_t + \|\ \|_s) \to (V, \|\ \|_t)$
is continuous and bijective, so an isomorphism by the open mapping theorem. Then
$t \mapsto \langle u, v\rangle_t$ is smooth for fixed $u, v \in V$, and by the multilinear uniform boundedness
principle (5.18), the mapping $t \mapsto \langle\ ,\ \rangle_t$ is smooth into the space of bounded
bilinear forms; in the real analytic case we use (11.14) instead. By the exponential
law (3.12) the mapping $(t, u) \mapsto \|u\|_t^2$ is smooth from $\mathbb R \times (V, \|\ \|_s) \to \mathbb R$ for each
fixed $s$. In the real analytic case we use (11.18) instead. Thus, all Hilbert norms
$\|\ \|_t$ are equivalent, locally uniformly in $t$, since $\{\|u\|_t : |t| \le K, \|u\|_s \le 1\}$ is bounded by $L_{K,s}$ in $\mathbb R$, so
$\|u\|_t \le L_{K,s}\|u\|_s$ for all $|t| \le K$. Moreover, each $A(s)$ is a globally defined operator
$(V, \|\ \|_t) \to H$ with closed graph and is thus bounded, and by using again the
(multi)linear uniform boundedness principle (5.18) (or (11.14) in the real analytic
case) as above we see that $s \mapsto A(s)$ is smooth (real analytic) $\mathbb R \to L((V, \|\ \|_t), H)$.

If for some $(t, z) \in \mathbb R \times \mathbb C$ the bounded operator $A(t) - z : V \to H$ is invertible, then
this is true locally and $(t, z) \mapsto (A(t) - z)^{-1} : H \to V$ is smooth since inversion is
smooth on Banach spaces.
Since each $A(t)$ is Hermitian the global resolvent set $\{(t, z) \in \mathbb R \times \mathbb C : (A(t) - z) :
V \to H \text{ is invertible}\}$ is open, contains $\mathbb R \times (\mathbb C \setminus \mathbb R)$, and hence is connected.
Moreover $(A(t) - z)^{-1} : H \to H$ is a compact operator for some (equivalently any)
$(t, z)$ if and only if the inclusion $i : V \to H$ is compact, since $i = (A(t) - z)^{-1} \circ
(A(t) - z) : V \to H \to H$.
Let us fix a parameter $s$. We choose a simple smooth curve $\gamma$ in the resolvent set
of $A(s)$ for fixed $s$.
(1) Claim. For $t$ near $s$, there are $C^1$-functions $t \mapsto \lambda_i(t) : 1 \le i \le N$ which
    parameterize all eigenvalues (repeated according to their multiplicity) of
    $A(t)$ in the interior of $\gamma$. If no two of the generically different eigenvalues
    meet of infinite order they can be chosen smoothly.


By replacing $A(s)$ by $A(s) - z_0$ if necessary we may assume that 0 is not an eigenvalue
of $A(s)$. Since the global resolvent set is open, no eigenvalue of $A(t)$ lies on $\gamma$ or
equals 0, for $t$ near $s$. Since

\[
-\frac{1}{2\pi i}\int_\gamma (A(t) - z)^{-1}\,dz =: P(t, \gamma)
\]

is a smooth curve of projections (onto the direct sum of all eigenspaces corresponding
to eigenvalues in the interior of $\gamma$) with finite dimensional ranges, the ranks (i.e.
dimension of the ranges) must be constant: it is easy to see that the (finite) rank
cannot fall locally, and it cannot increase, since the distance in $L(H, H)$ of $P(t, \gamma)$ to
the subset of operators of rank $\le N = \operatorname{rank}(P(s, \gamma))$ is continuous in $t$ and is either
0 or 1. So for $t$ near $s$, there are equally many eigenvalues in the interior, and we
may call them $\mu_i(t) : 1 \le i \le N$ (repeated with multiplicity). Let us denote by
$e_i(t) : 1 \le i \le N$ a corresponding system of eigenvectors of $A(t)$. Then by the
residue theorem we have

\[
\sum_{i=1}^{N} \mu_i(t)^p\, e_i(t)\langle e_i(t), \ \cdot\ \rangle = -\frac{1}{2\pi i}\int_\gamma z^p (A(t) - z)^{-1}\,dz,
\]

which is smooth in $t$ near $s$, as a curve of operators in $L(H, H)$ of rank $N$, since 0
is not an eigenvalue.
(2) Claim. Let $t \mapsto T(t) \in L(H, H)$ be a smooth curve of operators of rank
    $N$ in Hilbert space such that $T(0)T(0)(H) = T(0)(H)$. Then $t \mapsto \operatorname{tr}(T(t))$
    is smooth (real analytic). Note that this implies that $T$ is smooth (real analytic)
    into the space of operators of trace class by (2.3) or (2.14.4) (by (10.3) and
    (9.4) in the real analytic case), since all bounded linear functionals are of
    the form $A \mapsto \operatorname{tr}(AB)$ for bounded $B$, see e.g. (52.33).
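Let $F := T(0)(H)$. Then $T(t) = (T_1(t), T_2(t)) : H \to F \oplus F^\perp$ and the image of
$T(t)$ is the space

\[
\begin{aligned}
T(t)(H) &= \{(T_1(t)(x), T_2(t)(x)) : x \in H\} \\
&= \{(T_1(t)(x), T_2(t)(x)) : x \in F\} \quad \text{for } t \text{ near } 0 \\
&= \{(y, S(t)(y)) : y \in F\}, \quad \text{where } S(t) := T_2(t) \circ (T_1(t)|F)^{-1}.
\end{aligned}
\]

Note that $S(t) : F \to F^\perp$ is smooth (real analytic) in $t$ by finite dimensional
inversion for $T_1(t)|F : F \to F$. Now

\[
\begin{aligned}
\operatorname{tr}(T(t)) &= \operatorname{tr}\left(
\begin{pmatrix} 1 & 0 \\ -S(t) & 1 \end{pmatrix}
\begin{pmatrix} T_1(t)|F & T_1(t)|F^\perp \\ T_2(t)|F & T_2(t)|F^\perp \end{pmatrix}
\begin{pmatrix} 1 & 0 \\ S(t) & 1 \end{pmatrix}\right) \\
&= \operatorname{tr}\left(
\begin{pmatrix} T_1(t)|F & T_1(t)|F^\perp \\ 0 & -S(t)T_1(t)|F^\perp + T_2(t)|F^\perp \end{pmatrix}
\begin{pmatrix} 1 & 0 \\ S(t) & 1 \end{pmatrix}\right) \\
&= \operatorname{tr}\left(
\begin{pmatrix} T_1(t)|F & T_1(t)|F^\perp \\ 0 & 0 \end{pmatrix}
\begin{pmatrix} 1 & 0 \\ S(t) & 1 \end{pmatrix}\right), \quad \text{since rank} = N \\
&= \operatorname{tr}
\begin{pmatrix} T_1(t)|F + (T_1(t)|F^\perp)S(t) & T_1(t)|F^\perp \\ 0 & 0 \end{pmatrix} \\
&= \operatorname{tr}\bigl(T_1(t)|F + (T_1(t)|F^\perp)S(t) : F \to F\bigr),
\end{aligned}
\]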


which visibly is smooth (real analytic) since $F$ is finite dimensional.
From the claim (2) we now may conclude that

\[
\sum_{i=-n}^{m} \lambda_i(t)^p = -\frac{1}{2\pi i}\operatorname{tr}\int_\gamma z^p (A(t) - z)^{-1}\,dz
\]

is smooth (real analytic) for $t$ near $s$.
Thus, the Newton polynomial mapping $s^N(\lambda_{-n}(t), \dots, \lambda_m(t))$ is smooth (real
analytic), so also the elementary symmetric polynomial $\sigma^N(\lambda_{-n}(t), \dots, \lambda_m(t))$ is
smooth, and thus $\{\mu_i(t) : 1 \le i \le N\}$ is the set of roots of a polynomial with
smooth (real analytic) coefficients. By theorem (50.11), there is an arrangement
of these roots such that they become differentiable. If no two of the generically
different ones meet of infinite order, by theorem (50.10) there is even a smooth
arrangement. In the real analytic case, by theorem (50.12) the roots may be arranged
in a real analytic way.
To see that in the general smooth case they are even $C^1$ note that the images of
the projections $P(t, \gamma)$ of constant rank for $t$ near $s$ describe the fibers of a smooth
vector bundle. The restriction of $A(t)$ to this bundle, viewed in a smooth framing,
becomes a smooth curve of symmetric matrices, for which by Rellich's result (50.17)
below the eigenvalues can be chosen $C^1$. This finishes the proof of claim (1).
(3) Claim. Let $t \mapsto \lambda_i(t)$ be a differentiable eigenvalue of $A(t)$, defined on
    some interval. Then

\[
|\lambda_i(t_1) - \lambda_i(t_2)| \le (1 + |\lambda_i(t_2)|)\bigl(e^{a|t_1 - t_2|} - 1\bigr)
\]

    holds for a continuous positive function $a = a(t_1, t_2)$ which is independent
    of the choice of the eigenvalue.
For fixed $t$ near $s$ take all roots $\lambda_j$ which meet $\lambda_i$ at $t$, order them differentiably near
$t$, and consider the projector $P(t, \gamma)$ onto the joint eigenspaces for only those roots
(where $\gamma$ is a simple smooth curve containing only $\lambda_i(t)$ in its interior, of all the
eigenvalues at $t$). Then the image of $u \mapsto P(u, \gamma)$, for $u$ near $t$, describes a smooth
finite dimensional vector subbundle of $\mathbb R \times H$, since its rank is constant. For each $u$
choose an orthonormal system of eigenvectors $v_j(u)$ of $A(u)$ corresponding to these
$\lambda_j(u)$. They form a (not necessarily continuous) framing of this bundle. For any
sequence $t_k \to t$ there is a subsequence such that each $v_j(t_k) \to w_j(t)$ where $w_j(t)$
is again an orthonormal system of eigenvectors of $A(t)$ for the eigenspace of $\lambda_i(t)$.
Now consider

\[
\frac{A(t) - \lambda_i(t)}{t_k - t}\,v_i(t_k) + \frac{A(t_k) - A(t)}{t_k - t}\,v_i(t_k) - \frac{\lambda_i(t_k) - \lambda_i(t)}{t_k - t}\,v_i(t_k) = 0,
\]

take the inner product of this with $w_i(t)$, note that then the first summand vanishes,
and let $t_k \to t$ to obtain

\[
\lambda_i'(t) = \langle A'(t)w_i(t), w_i(t)\rangle \quad \text{for an eigenvector } w_i(t) \text{ of } A(t) \text{ with eigenvalue } \lambda_i(t).
\]


This implies, where $V_t = (V, \|\ \|_t)$,

\[
\begin{aligned}
|\lambda_i'(t)| &\le \|A'(t)\|_{L(V_t, H)}\,\|w_i(t)\|_{V_t}\,\|w_i(t)\|_H \\
&= \|A'(t)\|_{L(V_t, H)}\,\sqrt{\|w_i(t)\|_H^2 + \|A(t)w_i(t)\|_H^2} \\
&= \|A'(t)\|_{L(V_t, H)}\,\sqrt{1 + \lambda_i(t)^2} \le a + a|\lambda_i(t)|,
\end{aligned}
\]

for a constant $a$ which is valid for a compact interval of $t$'s, since $t \mapsto \|\ \|_t^2$ is
smooth on $V$. By Gronwall's lemma (see e.g. [Dieudonné, 1960]) this
implies claim (3).
By the following arguments we can conclude that all eigenvalues may be numbered
as $\lambda_i(t)$ for $i$ in $\mathbb N$ or $\mathbb Z$ in such a way that they are $C^1$, or $C^\infty$ under the stronger
assumption, or real analytic in the real analytic case, in $t \in \mathbb R$. Note first that by
claim (3) no eigenvalue can go off to infinity in finite time since it may increase at
most exponentially. Let us first number all eigenvalues of $A(0)$ increasingly.
We claim that for one eigenvalue (say $\lambda_0(0)$) there exists a $C^1$ (or $C^\infty$ or real
analytic) extension to all of $\mathbb R$; namely the set of all $t \in \mathbb R$ with a $C^1$ (or $C^\infty$ or
real analytic) extension of $\lambda_0$ on the segment from 0 to $t$ is open and closed. Open
follows from claim (1). If this interval does not reach infinity, from claim (3) it
follows that $(t, \lambda_0(t))$ has an accumulation point $(s, x)$ at the end $s$. Clearly
$x$ is an eigenvalue of $A(s)$, and by claim (1) the eigenvalues passing through $(s, x)$
can be arranged $C^1$ (or $C^\infty$ or real analytic), and thus $\lambda_0(t)$ converges to $x$ and
can be extended $C^1$ (or $C^\infty$ or real analytic) beyond $s$.
By the same argument we can extend iteratively all eigenvalues $C^1$ (or $C^\infty$ or real
analytic) to all $t \in \mathbb R$: if an eigenvalue meets an already chosen one, the proof of (50.11) shows
that we may pass through it coherently. In the smooth case look at (50.10) instead,
and in the real analytic case look at the proof of (50.12).
Now we start to choose the eigenvectors smoothly, under the stronger assumption
in the smooth case, and in the real analytic case. Let us consider again eigenvalues
$\{\lambda_i(t) : 1 \le i \le N\}$ contained in the interior of a smooth curve $\gamma$ for $t$ in an open
interval $I$. Then $V_t := P(t, \gamma)(H)$ is the fiber of a smooth (real analytic) vector
bundle of dimension $N$ over $I$. We choose a smooth framing of this bundle, and
then use the proof of theorem (50.14) to choose smooth (real analytic) sub vector
bundles whose fibers over $t$ are the eigenspaces of the eigenvalues with their
generic multiplicity. By the same arguments as in (50.14) we then get global vector
sub bundles with fibers the eigenspaces of the eigenvalues with their generic
multiplicities, and thus smooth (real analytic) eigenvectors for all eigenvalues.

50.17. Result. ([Rellich, 1969, page 43], see also [Kato, 1976, II, 6.8]). Let $A(t)$
be a $C^1$-curve of (finite dimensional) symmetric matrices. Then the eigenvalues
can be chosen $C^1$ in $t$, on the whole parameter interval.

This result is best possible for the degree of continuous differentiability, as is shown
by the example in [Alekseevsky, Kriegl, Losik, Michor, 1996, 7.4].


51. The Nash-Moser Inverse Function Theorem

This section treats the hard implicit function theorem of Nash and Moser following
[Hamilton, 1982], in full generality and in condensed form, but with all details. The
main difficulty of the proof of the hard implicit function theorem is the following:
By trying to use the Newton iteration procedure for a nonlinear partial differential
equation one quickly finds out that 'loss of derivatives' occurs and one cannot reach
the situation where the Banach fixed point theorem is directly applicable. Using
smoothing operators after each iteration step one can estimate higher derivatives
by lower ones and finally apply the fixed point theorem.
The core of this presentation is the following: one proves the theorem in a Fréchet
space of exponentially decreasing sequences in a Banach space, where the smoothing
operators take a very simple form: essentially just cutting the sequences at some
index. The statement carries over to certain direct summands which respect 'bounded
losses of derivatives', and one can organize these estimates into the concept
of tame mappings and thus apply the result to more general situations. However,
checking that the mappings and also the inverses of their linearizations in a certain
problem are tame mappings (a priori estimates) is usually very difficult. We do not
give any applications, in view of our remarks before.

51.1. Remark. Let $f : E \supseteq U \to V \subseteq E$ be a diffeomorphism. Then differentiation
of $f^{-1} \circ f = \mathrm{Id}$ and $f \circ f^{-1} = \mathrm{Id}$ at $x$ and $f(x)$ yields, using the chain rule,
that $f'(x)$ is invertible with inverse $(f^{-1})'(f(x))$, and hence $x \mapsto f'(x)^{-1}$ is smooth
as well.
The inverse function theorem for Banach spaces assumes the invertibility of the
derivative only at one point. Openness of $GL(E)$ in $L(E)$ then implies local
invertibility, and smoothness of $\operatorname{inv} : GL(E) \to GL(E)$ implies the smoothness of
$x \mapsto f'(x)^{-1}$.
Beyond Banach spaces we do not have openness of $GL(E)$ in $L(E)$, as the following
example shows.

51.2. Example. Let $E := C^\infty(\mathbb R, \mathbb R)$ and $P : E \to E$ be given by $P(f)(t) :=
f(t) - t\,f(t)\,f'(t)$. Since multiplication with smooth functions and taking derivatives
are continuous linear maps, $P$ is a polynomial of degree 2. Its derivative is given by

\[
P'(f)(h)(t) = h(t) - t\,h(t)\,f'(t) - t\,f(t)\,h'(t).
\]

In particular, the derivative $P'(0)$ is the identity, hence invertible. However, at the
constant functions $f_n = \frac1n$ the derivative $P'(f_n)$ is not injective, since for $h_k(t) := t^k$
we have

\[
P'(f_n)(h_k)(t) = t^k - t \cdot 0 \cdot t^k - t \cdot \tfrac1n \cdot k \cdot t^{k-1} = t^k \cdot \bigl(1 - \tfrac{k}{n}\bigr),
\]

so $h_n$ is in the kernel of $P'(f_n)$.
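
The kernel computation can be checked with exact polynomial arithmetic. The sketch below is only an illustration: it represents polynomials in $t$ as coefficient lists (index = power of $t$), takes $f_n$ to be the constant $1/n$, and evaluates $P'(f)(h) = h - t\,h\,f' - t\,f\,h'$. All helper names are hypothetical.

```python
from fractions import Fraction

def trim(p):
    """Drop trailing zero coefficients; the zero polynomial is []."""
    p = list(p)
    while p and p[-1] == 0:
        p.pop()
    return p

def add(a, b):
    n = max(len(a), len(b))
    return trim([(a[i] if i < len(a) else 0) + (b[i] if i < len(b) else 0)
                 for i in range(n)])

def mul(a, b):
    if not a or not b:
        return []
    out = [Fraction(0)] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return trim(out)

def diff(a):
    """Derivative with respect to t."""
    return trim([i * a[i] for i in range(1, len(a))])

def neg(a):
    return [-x for x in a]

T = [Fraction(0), Fraction(1)]  # the polynomial t

def dP(f, h):
    """P'(f)(h) = h - t*h*f' - t*f*h'."""
    return add(h, neg(add(mul(T, mul(h, diff(f))), mul(T, mul(f, diff(h))))))

def const(n):
    """The constant function f_n = 1/n."""
    return [Fraction(1, n)]

def monomial(k):
    """The function h_k(t) = t^k."""
    return [Fraction(0)] * k + [Fraction(1)]
```

For example, `dP(const(3), monomial(3))` is the zero polynomial, while `dP(const(3), monomial(1))` is $\frac23 t$, in agreement with the factor $1 - \frac{k}{n}$.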

Let us give an even more natural and geometric example:

51.3. Example. Let $M$ be a compact smooth manifold. For $\operatorname{Diff}(M)$ we have
shown that the 1-parameter subgroup of $\operatorname{Diff}(M)$ with initial tangent vector $X \in
T_{\mathrm{Id}}\operatorname{Diff}(M) = \mathfrak X(M)$ is given by the flow $\operatorname{Fl}^X$ of $X$, see (43.1). Thus, the exponential
mapping $\operatorname{Exp} : T_{\mathrm{Id}}\operatorname{Diff}(M) \to \operatorname{Diff}(M)$ is given by $X \mapsto \operatorname{Fl}^X_1$.

The derivative $T_0\exp : T_eG = T_0(T_eG) \to T_{\exp(0)}(G) = T_eG$ at 0 of the exponential
mapping $\exp : \mathfrak g = T_eG \to G$ is given by

\[
T_0\exp(X) := \tfrac{d}{dt}\big|_{t=0}\exp(tX) = \tfrac{d}{dt}\big|_{t=0}\operatorname{Fl}^{tX}(1, e) = \tfrac{d}{dt}\big|_{t=0}\operatorname{Fl}^{X}(t, e) = X_e.
\]

Thus, $T_0\exp = \mathrm{Id}_{\mathfrak g}$. In finite dimensions the inverse function theorem now implies
that $\exp : \mathfrak g \to G$ is a local diffeomorphism.
What is the corresponding situation for $G = \operatorname{Diff}(M)$? We take the simplest
compact manifold (without boundary), namely $M = S^1 = \mathbb R/2\pi\mathbb Z$. Since the
natural quotient mapping $p : \mathbb R \to \mathbb R/2\pi\mathbb Z = S^1$ is a covering map we can lift
each diffeomorphism $f : S^1 \to S^1$ to a diffeomorphism $\tilde f : \mathbb R \to \mathbb R$. This lift
is uniquely determined by its initial value $\tilde f(0) \in p^{-1}(f([0]))$, which can be chosen
freely up to adding an element of $2\pi\mathbb Z$. A smooth
mapping $\tilde f : \mathbb R \to \mathbb R$ projects to a smooth mapping $f : S^1 \to S^1$ if and only if
$\tilde f(t + 2\pi) \in \tilde f(t) + 2\pi\mathbb Z$. Since $2\pi\mathbb Z$ is discrete, $\tilde f(t + 2\pi) - \tilde f(t)$ has to be $2\pi n$ for
some $n \in \mathbb Z$ not depending on $t$. In order that a diffeomorphism $\tilde f : \mathbb R \to \mathbb R$ factors
to a diffeomorphism $f : S^1 \to S^1$ the constant $n$ has to be $+1$ or $-1$. So we finally
obtain an isomorphism $\{f \in \operatorname{Diff}(\mathbb R) : f(t + 2\pi) - f(t) = \pm 2\pi\}/2\pi\mathbb Z \cong \operatorname{Diff}(S^1)$. In
particular, we have diffeomorphisms $R_\theta$ given by translations with $\theta \in S^1$ (in the
picture $S^1 \subseteq \mathbb C$ these are just the rotations by the angle $\theta$).

Claim. Let $f \in \operatorname{Diff}(S^1)$ be fixed point free and in the image of $\exp$. Then $f$ is
conjugate to some translation $R_\theta$.
We have to construct a diffeomorphism $g : S^1 \to S^1$ such that $f = g^{-1} \circ R_\theta \circ g$.
Since $p : \mathbb R \to \mathbb R/2\pi\mathbb Z = S^1$ is a covering map it induces an isomorphism $T_tp :
\mathbb R \to T_{p(t)}S^1$. In the picture $S^1 \subseteq \mathbb C$ this isomorphism is given by $s \mapsto s\,p(t)^\perp$,
where $p(t)^\perp$ is the normal vector obtained from $p(t) \in S^1$ via rotation by $\pi/2$.
Thus, the vector fields on $S^1$ can be identified with the smooth functions $S^1 \to \mathbb R$
or, by composing with $p : \mathbb R \to S^1$, with the $2\pi$-periodic functions $X : \mathbb R \to \mathbb R$.
Let us first remark that the constant vector field $X^\theta \in \mathfrak X(S^1)$, $s \mapsto \theta$, has as flow
$\operatorname{Fl}^{X^\theta} : (t, \varphi) \mapsto \varphi + t \cdot \theta$. Hence $\exp(X^\theta) = \operatorname{Fl}^{X^\theta}_1 = R_\theta$.

Let $f = \exp(X)$ and suppose $g \circ f = R_\theta \circ g$. Then $g \circ \operatorname{Fl}^X_t = \operatorname{Fl}^{X^\theta}_t \circ g$ for $t = 1$.
Let us assume that this is true for all $t$. Then differentiating at $t = 0$ yields
$Tg(X_x) = X^\theta_{g(x)}$ for all $x \in S^1$. If we consider $g$ as diffeomorphism $\mathbb R \to \mathbb R$ this
means that $g'(t) \cdot X(t) = \theta$ for all $t \in \mathbb R$. Since $f$ was assumed to be fixed point free
the vector field $X$ is nowhere vanishing. Otherwise, there would be a stationary
point $x \in S^1$. So the condition on $g$ is equivalent to $g(t) = g(0) + \int_0^t \frac{\theta}{X(s)}\,ds$. We
take this as definition of $g$, where $g(0) := 0$, and where $\theta$ will be chosen such that
$g$ factors to an (orientation preserving) diffeomorphism on $S^1$, i.e. $\theta\int_t^{t+2\pi}\frac{ds}{X(s)} =
g(t + 2\pi) - g(t) = 2\pi$. Since $X$ is $2\pi$-periodic this is true for $\theta = 2\pi\big/\int_0^{2\pi}\frac{ds}{X(s)}$. Since
the flow of a transformed vector field is nothing else but the transformed flow we
obtain that $g(\operatorname{Fl}^X(t, x)) = \operatorname{Fl}^{X^\theta}(t, g(x))$, and hence $g \circ f = R_\theta \circ g$.


In order to show that $\exp : \mathfrak X(S^1) \to \operatorname{Diff}(S^1)$ is not locally surjective, it hence
suffices to find fixed point free diffeomorphisms $f$ arbitrarily close to the identity
which are not conjugate to translations. For this consider the translations $R_{2\pi/n}$
and modify them inside the interval $(0, \frac{2\pi}{n})$ such that the resulting diffeomorphism
$f$ satisfies $f(\frac{\pi}{n}) \notin \frac{3\pi}{n} + 2\pi\mathbb Z$. Then $f^n$ maps 0 to $2\pi$, and thus the induced
diffeomorphism on $S^1$ has $[0]$ as fixed point. If $f$ were conjugate to a translation,
the same would be true for $f^n$, hence the translation would have a fixed point and
hence would have to be the identity. So $f^n$ must be the identity on $S^1$, which is
impossible, since $f(\frac{\pi}{n}) \notin \frac{3\pi}{n} + 2\pi\mathbb Z$.
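Let us find out the reason for this break-down of the inverse function theorem. For
this we calculate the derivative of $\exp$ at the constant vector field $X := X^{2\pi/k}$:

\[
\exp'(X)(Y)(x) = \tfrac{d}{ds}\big|_{s=0}\exp(X + sY)(x)
= \tfrac{d}{ds}\big|_{s=0}\operatorname{Fl}^{X+sY}(1, x) = \int_0^1 Y\bigl(x + \tfrac{2t\pi}{k}\bigr)\,dt,
\]

where we have differentiated the defining equation for $\operatorname{Fl}^{X+sY}$ to obtain

\[
\begin{aligned}
\tfrac{\partial}{\partial t}\tfrac{\partial}{\partial s}\big|_{s=0}\operatorname{Fl}^{X+sY}(t, x)
&= \tfrac{\partial}{\partial s}\big|_{s=0}\tfrac{\partial}{\partial t}\operatorname{Fl}^{X+sY}(t, x) \\
&= \tfrac{\partial}{\partial s}\big|_{s=0}(X + sY)(\operatorname{Fl}^{X+sY}(t, x)) \\
&= Y(\operatorname{Fl}^{X}(t, x)) + X'(\dots) \\
&= Y\bigl(x + t\tfrac{2\pi}{k}\bigr) + 0,
\end{aligned}
\]

and the initial condition $\operatorname{Fl}^{X+sY}(0, x) = x$ gives

\[
\tfrac{\partial}{\partial s}\big|_{s=0}\operatorname{Fl}^{X+sY}(t, x) = \int_0^t Y\bigl(x + \tfrac{2\pi\tau}{k}\bigr)\,d\tau.
\]

If we take $x \mapsto \sin(kx)$ as $Y$ then $\exp'(X)(Y) = 0$, so $\exp'(X)$ is not injective, and
since $X$ can be chosen arbitrarily near to 0 we have that $\exp$ is not locally injective.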
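
As a numerical illustration of the last formula, the following sketch approximates $\int_0^1 Y(x + \frac{2\pi t}{k})\,dt$ by the midpoint rule. The function name, the step count and the sample values of $k$ and $x$ are assumptions made only for this illustration.

```python
import math

def dexp_at_rotation(Y, k, x, steps=2000):
    """Midpoint-rule approximation of the integral of Y(x + 2*pi*t/k) over t in [0, 1]."""
    h = 1.0 / steps
    return h * sum(Y(x + 2.0 * math.pi * (j + 0.5) * h / k) for j in range(steps))

k = 3
in_kernel = dexp_at_rotation(lambda x: math.sin(k * x), k, 0.7)   # Y(x) = sin(kx)
not_in_kernel = dexp_at_rotation(lambda x: 1.0, k, 0.7)           # Y constant 1
```

The first value vanishes up to rounding because $t \mapsto \sin(kx + 2\pi t)$ runs through exactly one full period, while a constant field $Y$ is mapped to itself.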

So we may conclude that a necessary assumption for an inverse function theorem
beyond Banach spaces is the invertibility of $f'(x)$ not only for one point $x$ but for
a whole neighborhood.
For Banach spaces one then uses that $x \mapsto f'(x)^{-1}$ is continuous (or even smooth),
which follows directly from the smoothness of $\operatorname{inv} : GL(E) \to GL(E)$, see (51.1).
However, for Fréchet spaces the following example shows that $\operatorname{inv}$ is not even
continuous (for the $c^\infty$-topology).

51.4. Example. Let $s$ be the Fréchet space of all fast falling sequences, i.e.
$s := \{(x_k)_k \in \mathbb R^{\mathbb N} : \|(x_k)_k\|_n := \sup\{(1 + k)^n|x_k| : k \in \mathbb N\} < \infty \text{ for all } n \in \mathbb N\}$.
Next we consider a curve $c : \mathbb R \to GL(s)$ defined by

\[
c(t)((x_k)_k) := ((1 - h_0(t))x_0, \dots, (1 - h_k(t))x_k, \dots),
\]


where $h_k(t) := (1 - 2^{-k})\,h(kt)$ for an $h \in C^\infty(\mathbb R, \mathbb R)$ which will be chosen appropriately.
Then $c(t) \in GL(s)$ provided $h(0) = 0$ and $\operatorname{supp} h$ is compact, since then the factors
$1 - h_k(t)$ are equal to 1 for almost all $k$. The inverse is given by multiplying with
$1/(1 - h_k(t))$, which exists provided $h(\mathbb R) \subseteq [0, 1]$.
Let us show next that $\operatorname{inv} \circ c : \mathbb R \to GL(s) \subseteq L(s)$ is not even continuous. For this
take $x \in s$, and consider

\[
c(\tfrac1k)^{-1}(x) = \Bigl(\dots; \frac{1}{1 - h_k(\frac1k)}\,x_k; \dots\Bigr) = (?, \dots, ?; 2^k x_k; ?, \dots),
\]

provided $h(1) = 1$. Let $x$ be defined by $x_k := 2^{-k}$, then
$\|c(\tfrac1k)^{-1}(x) - c(0)^{-1}(x)\|_0 \ge 1 - 2^{-k} \not\to 0$.
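
The lower bound comes from a single coordinate and can be reproduced exactly. The sketch below uses only the assumptions $h(0) = 0$ and $h(1) = 1$ (no particular bump function is fixed), and computes the $k$-th coordinate of $c(\frac1k)^{-1}(x) - c(0)^{-1}(x)$ for $x_j = 2^{-j}$; the function name is hypothetical.

```python
from fractions import Fraction

def kth_gap(k, h_at_1=Fraction(1)):
    """k-th coordinate of c(1/k)^{-1}(x) - c(0)^{-1}(x) for x_j = 2^{-j}, k >= 1.

    Uses h_k(1/k) = (1 - 2^{-k}) * h(1) and h_k(0) = 0 (because h(0) = 0).
    """
    x_k = Fraction(1, 2**k)
    h_k = (1 - Fraction(1, 2**k)) * h_at_1
    return x_k / (1 - h_k) - x_k

gaps = [kth_gap(k) for k in range(1, 21)]
```

Each entry equals $1 - 2^{-k}$, so the sup-norm distance stays close to 1 although $\frac1k \to 0$.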
It remains to show that $c : \mathbb R \to GL(s)$ is continuous or even smooth. Since
smoothness of a curve depends only on the bounded sets, and boundedness in
$GL(E) \subseteq L(E, E)$ can be tested pointwise because of the uniform boundedness
theorem (5.18), it is enough to show that $\operatorname{ev}_x \circ c : \mathbb R \to GL(s) \to s$ is smooth.
Boundedness in a locally convex space can be tested by the continuous linear
functionals, so it would be enough to show that $\lambda \circ \operatorname{ev}_x \circ c : \mathbb R \to GL(s) \to s \to \mathbb R$ is
smooth for all $\lambda \in s^*$. We want to use the particular functionals given by the
coordinate projections $\lambda_k : (x_k)_k \mapsto x_k$. These, however, do not generate the bornology,
but if $B \subseteq s$ is bounded, then so is $\bigcap_{k \in \mathbb N}\lambda_k^{-1}(\lambda_k(B))$. In fact, let $B$ be bounded.
Then for every $n \in \mathbb N$ there exists a constant $C_n$ such that $(1 + k)^n|x_k| \le C_n$ for all
$k$ and all $x = (x_k)_k \in B$. Then every $y \in \lambda_k^{-1}(\lambda_k(x))$ (i.e., $\lambda_k(y) = \lambda_k(x)$) satisfies
the same inequality for the given $k$, and hence $\bigcap_{k \in \mathbb N}\lambda_k^{-1}(\lambda_k(B))$ is bounded as well.
Obviously, $\lambda_k \circ \operatorname{ev}_x \circ c$ is smooth with derivatives $(\lambda_k \circ \operatorname{ev}_x \circ c)^{(p)}(t) = (1 - h_k)^{(p)}(t)\,x_k$.
Let $c^p(t)$ be the sequence with these coordinates. We claim that $c^p$ has values in $s$
and is (locally) bounded. So take an $n \in \mathbb N$ and consider

\[
\|c^p(t)\|_n = \sup_k\,(1 + k)^n\,|(1 - h_k)^{(p)}(t)\,x_k|.
\]

We have $(1 - h_k)^{(p)}(t) = 1^{(p)} - (1 - 2^{-k})\,k^p\,h^{(p)}(kt)$, and hence this factor is bounded
by $1 + k^p\|h^{(p)}\|_\infty$. Since $(1 + k)^n(1 + \|h^{(p)}\|_\infty k^p)|x_k|$ is by assumption on $x$ bounded
we have that $\sup_t\|c^p(t)\|_n < \infty$.
Now it is a general argument that if we are given locally bounded curves $c^p : \mathbb R \to s$
such that $\lambda_k \circ c^0$ is smooth with derivatives $(\lambda_k \circ c^0)^{(p)} = \lambda_k \circ c^p$, then $c^0$ is smooth
with derivatives $c^p$.
In fact, we consider for $c = c^0$ the following expression

\[
\frac1t\,\lambda_k\Bigl(\frac{c(t) - c(0)}{t} - c^1(0)\Bigr) = \frac1t\Bigl(\frac{\lambda_k(c(t)) - \lambda_k(c(0))}{t} - \lambda_k(c^1(0))\Bigr),
\]

which is by the classical mean value theorem contained in $\{\frac12\lambda_k(c^2(s)) : s \in
[0, t]\}$. Thus, taking for $B$ the bounded set $\{\frac12 c^2(s) : s \in [0, 1]\}$, we conclude
that $\bigl(\frac{c(t) - c(0)}{t} - c^1(0)\bigr)/t$ is contained in the bounded set $\bigcap_{k \in \mathbb N}\lambda_k^{-1}(\lambda_k(B))$,
and hence $\frac{c(t) - c(0)}{t} \to c^1(0)$. Doing the same for $c = c^k$ shows that $c^0$ is smooth
with derivatives $c^k$.

From this we conclude that in order to obtain an inverse function theorem we
have to assume, besides local invertibility of the derivative, also that $x \mapsto f'(x)^{-1}$ is
smooth. That this is still not enough is shown by the following example:

51.5. Example. Let $E := C^\infty(\mathbb R, \mathbb R)$ and consider the map $\exp_* : E \to E$ given by
$\exp_*(f)(t) := \exp(f(t))$. Then one can show that $\exp_*$ is smooth. Its (directional)
derivative is given by

\[
(\exp_*)'(f)(h)(t) = \tfrac{\partial}{\partial s}\big|_{s=0}\,e^{(f + sh)(t)} = h(t) \cdot e^{f(t)},
\]

so $(\exp_*)'(f)$ is multiplication by $\exp_*(f)$. The inverse of $(\exp_*)'(f)$ is the
multiplication operator with $\frac{1}{\exp_*(f)} = \exp_*(-f)$, and hence $f \mapsto (\exp_*)'(f)^{-1}$ is smooth
as well. But the image of $\exp_*$ consists of positive functions only, whereas the curve
$c : t \mapsto (s \mapsto 1 - ts)$ is a smooth curve in $E = C^\infty(\mathbb R, \mathbb R)$ through $\exp_*(0) = 1$, and
$c(t)$ is not positive for all $t \ne 0$ (take $s := \frac1t$).

So we will need additional assumptions. The idea of the proof is to use that a
Fréchet space is built up from Banach spaces as projective limit, to solve the inverse
function theorem for the building blocks, and to try to approximate in that way an
inverse to the original function. In order to guarantee that such a process converges,
we need (a priori) estimates for the seminorms, and hence we have to fix the basis
of seminorms on our spaces.

51.6. Definition. A Fréchet space is called graded, if it is provided with a fixed
increasing basis of its continuous seminorms. A linear map $T$ between graded
Fréchet spaces $(E, (p_k)_k)$ and $(F, (q_k)_k)$ is called tame of degree $d$ and base $b$ if

\[
\forall n \ge b\ \exists C_n \in \mathbb R\ \forall x \in E : q_n(Tx) \le C_n\,p_{n+d}(x).
\]

Recall that $T$ is continuous if and only if

\[
\forall n\ \exists m\ \exists C_n \in \mathbb R\ \forall x \in E : q_n(Tx) \le C_n\,p_m(x).
\]

Two gradings are called tame equivalent of degree $r$ and base $b$ if and only if the
identity is tame of degree $r$ and base $b$ in both directions.

51.7. Examples. Let $M$ be a compact manifold. Then $C^\infty(M, \mathbb R)$ is a graded
Fréchet space, where we consider as $k$-th norm the supremum of all derivatives
of order less or equal to $k$. In order that this definition makes sense, we can
embed $M$ as closed submanifold into some $\mathbb R^n$. Choosing a tubular neighborhood
$p : \mathbb R^n \supseteq U \to M$ we obtain an extension operator $p^* : C^\infty(M, \mathbb R) \to C^\infty(U, \mathbb R)$, and
on the latter space the operator norms of derivatives $f^{(k)}(x)$ for $f \in C^\infty(U, \mathbb R)$ make
sense.


Another way to give sense to the definition is to consider the vector bundle $J^k(M, \mathbb R)$
of $k$-jets of functions $f : M \to \mathbb R$. Its fiber over $x \in M$ consists of all "Taylor
polynomials" of functions $f \in C^\infty(M, \mathbb R)$. We obtain an injection of $C^\infty(M, \mathbb R)$
into the space of sections of $J^k(M, \mathbb R)$ by associating to $f \in C^\infty(M, \mathbb R)$ the section
having the Taylor polynomial of $f$ at a point $x \in M$. So it remains to define a
norm $p_k$ on the space $C^\infty(M \leftarrow J^k(M, \mathbb R))$ of sections. This is just the supremum
norm, if we consider some metric on the vector bundle $J^k(M, \mathbb R) \to M$.
Another method of choosing seminorms would be to take a finite atlas and a
partition of unity subordinated to the charts and use the supremum norms of the
derivatives of the chart representations.
A second example of a graded Fréchet space, closely related to the first one, is the
space $s(E)$ of fast falling sequences in a Banach space $E$, i.e.

\[
s(E) := \{(x_k)_k \in E^{\mathbb N} : \|(x_k)_k\|_n := \sup\{(1 + k)^n\|x_k\| : k \in \mathbb N\} < \infty \text{ for all } n \in \mathbb N\}.
\]

A modification of this is the space $\Sigma(E)$ of very fast falling sequences in a Banach
space $E$, i.e.

\[
\Sigma(E) := \{(x_k)_k \in E^{\mathbb N} : \|(x_k)_k\|_n := \sum_{k \in \mathbb N} e^{nk}\|x_k\| < \infty \text{ for all } n \in \mathbb N\}.
\]

51.8. Examples.
(1). Let $T : s(E) \to s(E)$ be the multiplication operator with a polynomial $p$, i.e.,
$T((x_k)_k) := (p(k)x_k)_k$.
We claim that $T$ is tame of degree $d := \deg(p)$ and base 0. For this we estimate as
follows:

\[
\begin{aligned}
\|T((x_k)_k)\|_n &= \sup\{(1 + k)^n\,|p(k)|\,\|x_k\| : k \in \mathbb N\} \\
&\le C_n \sup\{(1 + k)^{n+d}\|x_k\| : k \in \mathbb N\} = C_n\|(x_k)_k\|_{n+d},
\end{aligned}
\]

where $d$ is the degree of $p$ and $C_n := \sup\{\frac{|p(k)|}{(1+k)^d} : k \in \mathbb N\}$. Note that $C_n < \infty$,
since $k \mapsto (1 + k)^d$ is not vanishing on $\mathbb N$, and the limit of the quotient for $k \to \infty$
is the coefficient of $p$ of degree $d$.
This shows that $s(E)$ is tamely equivalent to the same space, where the seminorms
are replaced by $\sum_k (1 + k)^n\|x_k\|$. In fact, the sums are larger than the suprema.
Conversely, $\sum_k (1 + k)^n\|x_k\| \le \sum_k (1 + k)^{-2}(1 + k)^{n+2}\|x_k\| \le \sum_k (1 + k)^{-2}\,\|x\|_{n+2}$,
showing that the identity in the reverse direction is tame of degree 2 and base 0.
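
The estimate in (1) can be tested on a finitely supported sequence. The sketch below takes $E = \mathbb R$ (an assumption for illustration), uses exact rational arithmetic, and restricts the supremum defining the constant to the support of the sample sequence; the polynomial, the sequence and all function names are hypothetical choices.

```python
from fractions import Fraction

def poly_value(coeffs, k):
    """Evaluate p(k) = sum_i coeffs[i] * k**i."""
    return sum(c * k**i for i, c in enumerate(coeffs))

def sup_norm(x, n):
    """n-th seminorm sup_k (1+k)^n |x_k| of a finitely supported sequence x."""
    return max((1 + k)**n * abs(xk) for k, xk in enumerate(x))

def multiply(coeffs, x):
    """The operator T: (x_k)_k -> (p(k) x_k)_k."""
    return [poly_value(coeffs, k) * xk for k, xk in enumerate(x)]

def tame_constant(coeffs, length):
    """sup of |p(k)| / (1+k)^d over 0 <= k < length, where d = deg p."""
    d = len(coeffs) - 1
    return max(Fraction(abs(poly_value(coeffs, k)), (1 + k)**d)
               for k in range(length))

p = [1, -3, 2]                               # p(k) = 2k^2 - 3k + 1, so d = 2
x = [Fraction(1, 2**k) for k in range(30)]   # sample sequence x_k = 2^{-k}
C = tame_constant(p, len(x))
checks = [sup_norm(multiply(p, x), n) <= C * sup_norm(x, n + 2) for n in range(6)]
```

The same constant `C` works for every `n`, which reflects that the bound on $|p(k)|/(1+k)^d$ does not depend on the index of the seminorm.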
(2). Let $T : \Sigma(E) \to \Sigma(E)$ be the multiplication operator with an exponential function, i.e., $T((x_k)_k) := (a^k x_k)_k$.
We claim that $T$ is tame of some degree and base 0. For this we estimate as follows:

\[ \|T((x_k)_k)\|_n = \sum_{k \in \mathbb{N}} e^{nk} a^k \|x_k\| = \sum_{k \in \mathbb{N}} e^{(n + \log(a))k} \|x_k\| \le \sum_{k \in \mathbb{N}} e^{(n+d)k} \|x_k\| = \|(x_k)_k\|_{n+d}, \]

51.10 51. The Nash-Moser inverse function theorem 559

where $d$ is any integer greater than or equal to $\log(a)$. Note, however, that $T$ is not well defined on $s(E)$ for $a > 1$, and this is the reason to consider the space $\Sigma(E)$.
Note furthermore that, as before, one could equally well replace the sum by the corresponding supremum in the definition of $\Sigma(E)$; one only has to use that $\sum_{k \in \mathbb{N}} e^{-k} = \frac{1}{1 - 1/e} < \infty$.

(3). As a similar example we consider a linear differential operator $D$ of degree $d$, i.e., a local operator (the values $Df$ depend at $x$ only on the germ of $f$ at $x$) which is locally given in the form $Df = \sum_{|\alpha| \le d} g_\alpha \cdot \partial^\alpha f$, with smooth coefficient functions $g_\alpha \in C^\infty(M, \mathbb{R})$ on a compact manifold $M$.
Then $D : C^\infty(M, \mathbb{R}) \to C^\infty(M, \mathbb{R})$ is tame of degree $d$ and base 0. In fact, by the product rule we can write the $k$-th derivative of $Df$ as a linear combination of partial derivatives of the $g_\alpha$ and derivatives of order up to $k + d$ of $f$.
(4). Now we give an example of a non-tame linear map. For this consider $T : C^\infty([0, 1], \mathbb{R}) \to C^\infty([-1, 1], \mathbb{R})$ given by $Tf(t) := f(t^2)$. It was shown in the proof of (25.2) that the image of $T$ consists exactly of the space $C^\infty_{\mathrm{even}}([-1, 1], \mathbb{R})$ of even functions. Since $(Tf)^{(n)}(t) = f^{(n)}(t^2)(2t)^n + \sum_{0 < 2k \le n} c^n_k f^{(n-k)}(t^2) t^{n-2k}$ with some $c^n_k \in \mathbb{Z}$, we have that $T$ is tame of order 0 and degree 0. But the inverse is not tame since $(Tf)^{(2n)}(0)$ is proportional to $f^{(n)}(0)$, hence in order to estimate the $n$-th derivative of $T^{-1}g$ we need the $2n$-th derivative of $g$.

51.9. Definition. A graded Fréchet space $F$ is called tame if there exists some Banach space $E$ such that $F$ is a tame direct summand in $\Sigma(E)$, i.e. there are tame linear mappings $i : F \to \Sigma(E)$ and $p : \Sigma(E) \to F$ with $p \circ i = \mathrm{Id}_F$.

Our next aim is to show that instead of $\Sigma(E)$ we can equally well use $s(E)$. For this we consider a measured space $(X, \mu)$ and a measurable positive weight function $w : X \to \mathbb{R}$ and define

\[ L^1(X, \mu, w) := \Big\{ f \in L^1(X, \mu) : \|f\|_n := \int_X e^{n\, w(x)} |f(x)| \, d\mu(x) < \infty \Big\}. \]

51.10. Proposition. Every space $L^1(X, \mu, w)$ is a tame Fréchet space.

Proof. Let $X_k := \{x \in X : k \le w(x) < k + 1\}$. Then the $X_k$ form a countable disjoint covering of $X$ by measurable sets. Let $\chi_k$ be the characteristic function of $X_k$, and let $R : L^1(X, \mu, w) \to \Sigma(L^1(X, \mu))$ and $L : \Sigma(L^1(X, \mu)) \to L^1(X, \mu, w)$ be defined by $Rf := (\chi_k \cdot f)_k$ and $L((f_k)_k) := \sum_k \chi_k \cdot f_k$. Then obviously $L \circ R = \mathrm{Id}$.
The linear map $R$ is well-defined and tame of degree 0 and base 0, since

\[ \|Rf\|_n = \sum_k e^{nk} \|\chi_k f\|_1 = \sum_k \int_{X_k} e^{nk} |f| \, d\mu \le \sum_k \int_{X_k} e^{w(x) n} |f(x)| \, d\mu(x) = \int_X e^{w(x) n} |f(x)| \, d\mu(x) = \|f\|_n. \]


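Finally, $L$ is a well-defined linear map, which is tame of degree 0 and base 0, since

\[ \begin{aligned}
\|L((f_k)_k)\|_n &= \int_X e^{n\, w(x)} \Big| \sum_k \chi_k f_k(x) \Big| \, d\mu(x) = \sum_k \int_{X_k} e^{n\, w(x)} |f_k(x)| \, d\mu(x) \\
&\le \sum_k \int_{X_k} e^{n(k+1)} |f_k(x)| \, d\mu(x) \le \sum_k \int_X e^{n(k+1)} |f_k(x)| \, d\mu(x) \\
&= e^n \sum_k e^{nk} \|f_k\|_1 = e^n \|(f_k)_k\|_n.
\end{aligned} \]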

51.11. Corollary. For every Banach space $E$ the space $s(E)$ is a tame Fréchet space.

Proof. This result follows immediately from the proposition (51.10) above, if one replaces $L^1(X, \mu, w)$ by the vector valued function space $L^1(X, \mu, w; E)$ and similarly the space $L^1(X, \mu)$ by the Banach space $L^1(X, \mu; E)$.

Now let us show the converse direction:

51.12. Proposition. For every Banach space E the space Σ(E) is a tame direct
summand of s(E).

Proof. We define $R : \Sigma(E) \to s(E)$ and $L : s(E) \to \Sigma(E)$ by $R((x_k)_k) := (y_k)_k$, where $y_{[e^k]} := x_k$ and 0 otherwise, and $L((y_k)_k) := (y_{[e^k]})_k$. The map $R$ is well-defined, linear and tame, since
\[ \|(y_k)_k\|_n := \sum_k (1 + k)^n \|y_k\| = \sum_j (1 + [e^j])^n \|x_j\| \le \sum_j (2e^j)^n \|x_j\| = 2^n \|(x_j)_j\|_n. \]
The map $L$ is well-defined, linear and tame, since
\[ \|(x_k)_k\|_n := \sum_k e^{kn} \|x_k\| = \sum_k e^{kn} \|y_{[e^k]}\| \le \sum_k (1 + [e^k])^n \|y_{[e^k]}\| \le \sum_j (1 + j)^n \|y_j\| = \|y\|_n. \]
Obviously, $L \circ R = \mathrm{Id}$.
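The maps $R$ and $L$ of this proof are easy to experiment with on finite sequences. The following sketch assumes $E = \mathbb{R}$, uses the sum version of the seminorms on $s(E)$, and introduces the hypothetical names `idx`, `R`, `L`, `norm_sigma` and `norm_s_sum`; it is meant as an illustration only.

```python
import math

def idx(k):
    """The index [e^k], i.e. the integer part of e^k."""
    return math.floor(math.exp(k))

def R(x):
    """R((x_k)_k) := (y_j)_j with y_[e^k] := x_k and 0 otherwise."""
    y = [0.0] * (idx(len(x) - 1) + 1)
    for k, xk in enumerate(x):
        y[idx(k)] = xk
    return y

def L(y, length):
    """L((y_j)_j) := (y_[e^k])_k, restricted to the first `length` entries."""
    return [y[idx(k)] for k in range(length)]

def norm_sigma(x, n):
    """Seminorm sum_k e^{nk} |x_k| on Sigma(R)."""
    return sum(math.exp(n * k) * abs(xk) for k, xk in enumerate(x))

def norm_s_sum(y, n):
    """Seminorm sum_j (1 + j)^n |y_j| on s(R) (sum version)."""
    return sum((1 + j) ** n * abs(yj) for j, yj in enumerate(y))

x = [1.0, -0.5, 0.25, 2.0, -3.0]                 # sample element of Sigma(R)
y = [1.0 / (1 + j) ** 2 for j in range(55)]      # sample element of s(R)

left_inverse = L(R(x), len(x)) == x              # L o R = Id
R_is_tame = all(norm_s_sum(R(x), n) <= 2**n * norm_sigma(x, n) * (1 + 1e-12)
                for n in range(4))
L_is_tame = all(norm_sigma(L(y, 5), n) <= norm_s_sum(y, n) * (1 + 1e-12)
                for n in range(4))
```

The factor `1 + 1e-12` only guards against rounding; both estimates hold term by term, since $e^j \le 1 + [e^j] \le 2e^j$.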

51.13. Definition. A non-linear map $f : E \supseteq U \to F$ between graded Fréchet spaces is called tame of degree $r$ and base $b$ if it is continuous and every point in $U$ has a neighborhood $V$ such that
\[ \forall n \ge b \ \exists C_n \ \forall x \in V : \|f(x)\|_n \le C_n (1 + \|x\|_{n+r}). \]

Remark. Every continuous map from a graded Fréchet space into a Banach space is tame.
For fixed $x_0 \in U$ choose a constant $C > \|f(x_0)\|$ and let $V := \{x : \|f(x)\| < C\}$. Then $V$ is an open neighborhood of $x_0$, and for all $n$ and all $x \in V$ we have $\|f(x)\|_n = \|f(x)\| \le C \le C(1 + \|x\|_n)$.
Every continuous map from a finite dimensional space into a graded Fréchet space is tame.
Choose a compact neighborhood $V$ of $x_0$. Let $C_n := \max\{\|f(x)\|_n : x \in V\}$. Then $\|f(x)\|_n \le C_n \le C_n(1 + \|x\|)$.
It is easily checked that the composite of tame linear maps is tame. In fact
\[ \|f(g(x))\|_n \le C(1 + \|g(x)\|_{n+r}) \le C(1 + C(1 + \|x\|_{n+r+s})) \le C(1 + \|x\|_{n+r+s}) \]
for all $x$ in an appropriately chosen neighborhood and $n \ge b_f$ and $n + r \ge b_g$.


51.14. Proposition. The definition of tameness of degree $r$ is coherent with the one for linear maps, but the base may change.

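Proof. Let first $f$ be linear and tame as a non-linear map. In particular, we have locally around 0
\[ \|f(x)\|_n \le C(1 + \|x\|_{n+r}) \quad \text{for all } n \ge b. \]
If we increase $b$, we may assume that the 0-neighborhood is of the form $\{x : \|x\|_{b+r} \le \varepsilon\}$ for some $\varepsilon > 0$. For $y \ne 0$ let $x := \frac{\varepsilon}{\|y\|_{b+r}} \, y$, i.e., $\|x\|_{b+r} = \varepsilon$. Thus, $\|f(x)\|_n \le C(1 + \|x\|_{n+r})$. By linearity of $f$, we get

\[ \|f(y)\|_n = \frac{\|y\|_{b+r}}{\varepsilon} \, \|f(x)\|_n \le C \, \frac{\|y\|_{b+r}}{\varepsilon} \, (1 + \|x\|_{n+r}) = C \Big( \frac{\|y\|_{b+r}}{\varepsilon} + \|y\|_{n+r} \Big). \]

Since $\|y\|_{b+r} \le \|y\|_{n+r}$ for $b \le n$ we get

\[ \|f(y)\|_n \le C \Big( \frac{1}{\varepsilon} + 1 \Big) \|y\|_{n+r}. \]

Conversely, let $f$ be a tame linear map. Then the inequality

\[ \|f(x)\|_n \le C \|x\|_{n+r} \le C(1 + \|x\|_{n+r}) \quad \text{for all } n \ge b \]

is true.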

Definition. A function $f$ of two variables is called tame of bi-degree $(r, s)$ and base $b$ if locally

\[ \forall n \ge b \ \exists C \ \forall x, y : \|f(x, y)\|_n \le C(1 + \|x\|_{n+r} + \|y\|_{n+s}); \]

and similarly for functions in several variables.

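51.15. Lemma. Let $f : U \times E \to F$ be linear in the second variable and tame of base $b$ and degree $(r, s)$ in a $\|\ \|_{b+r} \times \|\ \|_{b+s}$-neighborhood. Then we have

\[ \forall n \ge b \ \exists C : \|f(x)h\|_n \le C(\|h\|_{n+s} + \|x\|_{n+r} \|h\|_{b+s}) \]

for all $x$ in a $\|\ \|_{b+r}$-neighborhood and all $h$.
If $f : U \times E_1 \times E_2 \to F$ is tame of base $b$ and degree $(r, s, t)$ in a $\|\ \|_{b+r} \times \|\ \|_{b+s} \times \|\ \|_{b+t}$-neighborhood, then we have

\[ \|f(x)(h, k)\|_n \le C(\|h\|_{n+s} \|k\|_{b+t} + \|h\|_{b+s} \|k\|_{n+t} + \|x\|_{n+r} \|h\|_{b+s} \|k\|_{b+t}) \]

for all $x$ in a $\|\ \|_{b+r}$-neighborhood and all $h$ and $k$.

Proof. For arbitrary $h$ let $\bar h := \frac{\varepsilon}{\|h\|_{b+s}} \, h$. Then

\[ \|f(x)\bar h\|_n \le C(1 + \|x\|_{n+r} + \|\bar h\|_{n+s}). \]

Hence

\[ \begin{aligned}
\|f(x)h\|_n &= \frac{\|h\|_{b+s}}{\varepsilon} \, \|f(x)\bar h\|_n \le \frac{\|h\|_{b+s}}{\varepsilon} \, C \Big( 1 + \|x\|_{n+r} + \frac{\varepsilon}{\|h\|_{b+s}} \|h\|_{n+s} \Big) \\
&\le \frac{C \|h\|_{b+s}}{\varepsilon} + \frac{C \|h\|_{b+s}}{\varepsilon} \|x\|_{n+r} + C \|h\|_{n+s} \\
&\le C \Big( \frac{1}{\varepsilon} + 1 \Big) \|h\|_{n+s} + \frac{C}{\varepsilon} \|x\|_{n+r} \|h\|_{b+s}.
\end{aligned} \]
The second part is proved analogously.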

51.16. Proposition. Interpolation formula for $\Sigma(E)$.

\[ \|x\|_n \cdot \|x\|_m \le \|x\|_{n-r} \cdot \|x\|_{m+r} \quad \text{for } 0 \le r \le n \le m. \]

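Proof. Let us first consider the special case, where $n = m$ and $r = 1$. Then

\[ \begin{aligned}
\|x\|_{n-1} \cdot \|x\|_{n+1} - \|x\|_n^2 &= \sum_k e^{(n-1)k} \|x_k\| \sum_l e^{(n+1)l} \|x_l\| - \sum_k e^{nk} \|x_k\| \sum_l e^{nl} \|x_l\| \\
&= \sum_k (e^{(n-1)k} e^{(n+1)k} - e^{2nk}) \|x_k\|^2 \\
&\quad + \sum_{k < l} (e^{(n-1)k} e^{(n+1)l} + e^{(n+1)k} e^{(n-1)l} - 2e^{n(k+l)}) \|x_k\| \, \|x_l\|.
\end{aligned} \]

In both subsummands the expression in brackets is non-negative: the first one vanishes, and

\[ e^{(n-1)k} e^{(n+1)l} + e^{(n+1)k} e^{(n-1)l} - 2e^{n(k+l)} = e^{n(k+l)} (e^{l-k} + e^{k-l} - 2) = 2e^{n(k+l)} (\cosh(l - k) - 1) \ge 0. \]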

By transitivity, it is enough to show the general case for $r = 1$. Without loss of generality we may assume $x \ne 0$. Then this case is equivalent to
\[ \frac{\|x\|_n}{\|x\|_{n-1}} \le \frac{\|x\|_{m+1}}{\|x\|_m} \quad \text{for } n \le m. \]
Again by transitivity it is enough to show this for $m = n$.
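The interpolation formula can be tested numerically on finitely supported sequences. The sketch below assumes $E = \mathbb{R}$ and a randomly chosen sample sequence; the names `norm_sigma` and `interpolation_holds` are hypothetical helpers introduced only for this illustration.

```python
import math
import random

def norm_sigma(x, n):
    """Seminorm ||x||_n = sum_k e^{nk} |x_k| on Sigma(R), for a finite sequence."""
    return sum(math.exp(n * k) * abs(xk) for k, xk in enumerate(x))

def interpolation_holds(x, n, m, r, rel_tol=1e-12):
    """Check ||x||_n ||x||_m <= ||x||_{n-r} ||x||_{m+r} up to rounding."""
    lhs = norm_sigma(x, n) * norm_sigma(x, m)
    rhs = norm_sigma(x, n - r) * norm_sigma(x, m + r)
    return lhs <= rhs * (1.0 + rel_tol)

random.seed(0)  # fixed seed, so that the sample is reproducible
x = [random.uniform(-1.0, 1.0) * math.exp(-k * k) for k in range(8)]

# All admissible triples 0 <= r <= n <= m in a small range.
checks = [interpolation_holds(x, n, m, r)
          for n in range(0, 4) for m in range(n, 5) for r in range(0, n + 1)]
```

For $r = 0$ both sides agree, which is why a small relative tolerance is used instead of a strict comparison.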

51.17. The Nash-Moser inverse function theorem. Let $E$ and $F$ be tame Fréchet spaces and let $f : E \supseteq U \to F$ be a tame smooth map. Suppose $f'$ has a tame smooth family $\Psi$ of inverses. Then $f$ is locally bijective, and the inverse of $f$ is a tame smooth map.

The proof will take the rest of this section.

51.18. Proposition. Let $E$ and $F$ be tame Fréchet spaces and let $f : E \supseteq U \to F$ be a tame smooth map. Suppose $f'$ has a tame smooth family $\Psi$ of linear left inverses. Then $f$ is locally injective.


51.19. Proposition. Let $E$ and $F$ be tame Fréchet spaces and let $f : E \supseteq U \to F$ be a smooth tame map. Suppose $f'$ has a tame smooth family $\Psi$ of linear right inverses. Then $f$ is locally surjective (and locally has a smooth right inverse).

By a tame smooth mapping $f$ we will for the moment understand an infinitely often Gâteaux differentiable map, for which the derivatives $f^{(n)}(x)$ are multilinear and are tame as maps $U \times E^n \to F$.
By a tame smooth family of (one-sided) inverses of $f'$ we understand a family $(\Psi(x))_{x \in U} : F \to E$ of (one-sided) inverses of $(f'(x))_{x \in U}$, which gives a tame smooth map $\Psi^\wedge : U \times F \to E$.
Let us start with some preparatory remarks for the proofs. Contrary to good manners the symbol $C$ will almost never denote the same constant, not even within the same inequality. This constant may depend on the index $n$ of the norm but not on any argument of the norms.
For all three proofs we may assume that the initial values are $f : 0 \mapsto 0$ (apply translations in the domain and the codomain).

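Claim. We may assume that $E = \Sigma(B)$ and $F = \Sigma(C)$.
First for (51.18). In fact, $E$ and $F$ are direct summands in such spaces $\Sigma(B)$ and $\Sigma(C)$. We extend $f$ to a smooth tame mapping $\tilde f : \Sigma(B) \supseteq \tilde U \to \Sigma(B \times C) \cong \Sigma(B) \times \Sigma(C)$, by setting $\tilde U := p^{-1}(U)$, where $p : \Sigma(B) \to E$ is the retraction, and $\tilde f := (\mathrm{Id} - p, f \circ p)$. Note that $(\mathrm{Id} - p)$ preserves exactly that part which gets annihilated by $f \circ p$. More precisely, injectivity of $\tilde f$ implies that of $f$. In fact, $f(x) = f(y)$ implies $x = p(x)$, $y = p(y)$, and hence $(\mathrm{Id} - p)(x) = 0 = (\mathrm{Id} - p)(y)$, and so $\tilde f(x) = \tilde f(y)$. Since $\tilde f'(\tilde x)(h) = ((\mathrm{Id} - p)(h), f'(p(\tilde x)) \cdot p(h))$, let $\tilde\Psi(\tilde x) := (\mathrm{Id} - p) \circ \mathrm{pr}_1 + \Psi(p(\tilde x)) \circ \mathrm{pr}_2$. Then $\tilde\Psi(\tilde x) \circ \tilde f'(\tilde x) = (\mathrm{Id} - p) \circ (\mathrm{Id} - p) + \Psi(p(\tilde x)) \circ f'(p(\tilde x)) \circ p = \mathrm{Id}$.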
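Now for (51.19). Here we extend $f$ to a smooth tame mapping $\tilde f : \Sigma(B \times C) \cong \Sigma(B) \times \Sigma(C) \supseteq \tilde U \to \Sigma(C)$, by setting $\tilde U := p^{-1}(U) \times \Sigma(C)$ and $\tilde f := (f \circ p) \oplus (\mathrm{Id} - q)$, where $p : \Sigma(B) \to E$ and $q : \Sigma(C) \to F$ are the retractions. Since $\tilde f'(\tilde x, \tilde y) = f'(p(\tilde x)) \circ p \oplus (\mathrm{Id} - q)$, let $\tilde\Psi(\tilde x, \tilde y) : \Sigma(C) \to \Sigma(B) \times \Sigma(C)$ be defined by $\tilde\Psi(\tilde x, \tilde y)(k) := (\Psi(p(\tilde x))(q(k)), (\mathrm{Id} - q)(k))$, i.e. $\tilde\Psi(\tilde x, \tilde y) := (\Psi(p(\tilde x)) \circ q, (\mathrm{Id} - q))$. Then

\[ \begin{aligned}
\tilde f'(\tilde x, \tilde y) \circ \tilde\Psi(\tilde x, \tilde y) &= ((f \circ p)'(\tilde x) \oplus (\mathrm{Id} - q)'(\tilde y)) \circ (\Psi(p(\tilde x)) \circ q, (\mathrm{Id} - q)) \\
&= f'(p(\tilde x)) \circ p \circ \Psi(p(\tilde x)) \circ q + (\mathrm{Id} - q) \circ (\mathrm{Id} - q) \\
&= q + (\mathrm{Id} - 2q + q^2) = \mathrm{Id}.
\end{aligned} \]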

Claim. We may assume that $x \mapsto f(x)$, $(x, h) \mapsto f'(x)h$, $(x, h) \mapsto f''(x)(h, h)$ and $(x, k) \mapsto \Psi(x)k$ satisfy tame estimates of degree $2r$ in $x$, of degree $r$ in $h$ and 0 in $k$ (for some $r$) and base 0 on the set $\{\|x\|_0 \le 1\}$.
Consider on $\Sigma(B)$ the linear operators $\nabla^p$ which are defined by $(\nabla^p x)_k := e^{pk} x_k$. Then $\|\nabla^p x\|_n = \|x\|_{n+p}$. If $f$ satisfies $\|f(x)\|_n \le C(1 + \|x\|_{n+s})$ on $\|x\|_a \le \delta$ for $n \ge b$ then $\tilde f := \nabla^q \circ f \circ \nabla^{-p}$ satisfies $\|\tilde f(x)\|_m = \|f(\nabla^{-p} x)\|_{m+q} \le C(1 + \|\nabla^{-p} x\|_{m+q+s}) = C(1 + \|x\|_{m+q+s-p})$ on $\|x\|_{a-p} \le \delta$ for $m \ge b - q$.
Choosing $q$ and $p$ sufficiently large, we may assume that $f$, $f'$, $f''$, and $\Psi$ satisfy tame estimates of base 0 (choose $q$ large in comparison to $b$) on $\{x : \|x\|_0 \le \delta\}$ (choose $p$ large in comparison to $a$). Furthermore, we may achieve that $(x, k) \mapsto \Psi(x)k$ is tame of order 0 in $k$ (since by linearity we don't need $p$ for the neighborhood, which is now global, but we have to choose it so that $m + q + s - p \le m$), but we cannot achieve that this is also true for $f'$. Now take $r$ sufficiently large such that the degrees are dominated by $2r$ and $r$, and finally replace $f$ by $x \mapsto f(cx)$ to obtain $\delta = 1$.

Claim. On $\|x\|_{2r} \le 1$ we have for all $n \ge 0$ a $C_n > 0$ such that

\[ \begin{aligned}
\|f(x)\|_n &\le C_n \|x\|_{n+2r}, \\
\|f'(x)h\|_n &\le C_n (\|h\|_{n+r} + \|x\|_{n+2r} \|h\|_r), \\
\|f''(x)(h_1, h_2)\|_n &\le C_n (\|h_1\|_{n+r} \|h_2\|_r + \|h_1\|_r \|h_2\|_{n+r} + \|x\|_{n+2r} \|h_1\|_r \|h_2\|_r), \\
\|\Psi(x)k\|_n &\le C_n (\|k\|_n + \|x\|_{n+2r} \|k\|_0).
\end{aligned} \]

The 2nd, 3rd and 4th inequality follow from the corresponding tameness and (51.15), since the neighborhood is given by a norm with index higher than base + degree. For the first inequality one would expect $\|f(x)\|_n \le C(1 + \|x\|_{n+2r})$, but since $f(0) = 0$ one can drop the 1, which follows from integration of the second estimate:
\[ \|f(x)\|_n = \Big\| f(0) + \int_0^1 f'(tx)x \, dt \Big\|_n \le C(\|x\|_{n+r} + \|x\|_{n+2r} \|x\|_r). \]

Since $\|x\|_{n+r} \le \|x\|_{n+2r}$ and $\|x\|_r \le \|x\|_{2r} \le 1$ we are done.

Proof of (51.18). The idea comes from the 1-dimensional situation, where $f(x) = f(y)$ implies by the mean value theorem that there exists an $r \in [x, y] := \{tx + (1 - t)y : 0 \le t \le 1\}$ with $f'(r) = \frac{f(x) - f(y)}{x - y} = 0$.

51.20. Sublemma. There exists a $\delta > 0$ such that for $\|x_j\|_{2r} \le \delta$ we have $\|x_1 - x_0\|_0 \le C \|f(x_1) - f(x_0)\|_0$. In particular, we have that $f$ is injective on $\{x : \|x\|_{2r} \le \delta\}$.

Proof. Using the Taylor formula
\[ f(x_1) = f(x_0) + f'(x_0)(x_1 - x_0) + \int_0^1 (1 - t) f''(x_0 + t(x_1 - x_0))(x_1 - x_0)^2 \, dt \]

and $\Psi(x_0) \circ f'(x_0) = \mathrm{Id}$, we obtain that $x_1 - x_0 = \Psi(x_0)(k)$, where

\[ k := f(x_1) - f(x_0) - \int_0^1 (1 - t) f''(x_0 + t(x_1 - x_0))(x_1 - x_0)^2 \, dt. \]

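For $\|x_j\|_{2r} \le 1$ we can use the tame estimates of $f''$ and interpolation to get

\[ \begin{aligned}
\|f''(x_0 + t(x_1 - x_0))(x_1 - x_0)^2\|_n &\le C \big( \|x_1 - x_0\|_{n+r} \|x_1 - x_0\|_r + (\|x_1\|_{n+2r} + \|x_0\|_{n+2r}) \|x_1 - x_0\|_r^2 \big) \\
&\le C \big( \|x_1 - x_0\|_{n+2r} \|x_1 - x_0\|_0 + (\|x_1\|_{n+2r} + \|x_0\|_{n+2r}) \|x_1 - x_0\|_{2r} \|x_1 - x_0\|_0 \big) \\
&\le C \big( (\|x_1\|_{n+2r} + \|x_0\|_{n+2r}) \|x_1 - x_0\|_0 + (\|x_1\|_{n+2r} + \|x_0\|_{n+2r}) \, 2\delta \, \|x_1 - x_0\|_0 \big) \\
&\le C (\|x_1\|_{n+2r} + \|x_0\|_{n+2r}) \|x_1 - x_0\|_0.
\end{aligned} \]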

Using the tame estimate

\[ \|\Psi(x_0)k\|_0 \le C \|k\|_0 (1 + \|x_0\|_{2r}) \le C \|k\|_0, \]

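we thus get

\[ \begin{aligned}
\|x_1 - x_0\|_0 &= \|\Psi(x_0)k\|_0 \le C \|k\|_0 \\
&\le C \Big( \|f(x_1) - f(x_0)\|_0 + \tfrac{1}{2} C (\|x_1\|_{2r} + \|x_0\|_{2r}) \|x_1 - x_0\|_0 \Big) \\
&\le C \big( \|f(x_1) - f(x_0)\|_0 + \|x_1 - x_0\|_r^2 \big) \\
&\le C \big( \|f(x_1) - f(x_0)\|_0 + \|x_1 - x_0\|_{2r} \cdot \|x_1 - x_0\|_0 \big).
\end{aligned} \]

Now use $\|x_1 - x_0\|_{2r} \le \|x_1\|_{2r} + \|x_0\|_{2r} \le 2\delta$ to obtain

\[ \|x_1 - x_0\|_0 \le C (\|f(x_1) - f(x_0)\|_0 + 2\delta \|x_1 - x_0\|_0). \]

Taking $\delta < \frac{1}{2C}$ yields the result.

51.21. Corollary. Let $\|x_j\|_{2r} \le \delta$ with $\delta$ as before. Then for $n \ge 0$ we have

\[ \|x_1 - x_0\|_n \le C \big( \|f(x_1) - f(x_0)\|_n + (\|x_1\|_{n+2r} + \|x_0\|_{n+2r}) \|f(x_1) - f(x_0)\|_0 \big). \]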

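Proof. As before we have

\[ \|f''(x_0 + t(x_1 - x_0))(x_1 - x_0)^2\|_n \le C (\|x_1\|_{n+2r} + \|x_0\|_{n+2r}) \|x_1 - x_0\|_0. \]

Since $\Psi$ is tame we obtain now

\[ \begin{aligned}
\|x_1 - x_0\|_n &= \Big\| \Psi(x_0) \Big( f(x_1) - f(x_0) - \int_0^1 (1 - t) f''(x_0 + t(x_1 - x_0))(x_1 - x_0)^2 \, dt \Big) \Big\|_n \\
&\le \big\| \Psi(x_0) (f(x_1) - f(x_0)) \big\|_n + \Big\| \Psi(x_0) \int_0^1 (1 - t) f''(x_0 + t(x_1 - x_0))(x_1 - x_0)^2 \, dt \Big\|_n \\
&\le C \big( \|f(x_1) - f(x_0)\|_n + \|x_0\|_{n+2r} \cdot \|f(x_1) - f(x_0)\|_0 \big) \\
&\quad + C \Big( \Big\| \int_0^1 (1 - t) f''(x_0 + t(x_1 - x_0))(x_1 - x_0)^2 \, dt \Big\|_n + \|x_0\|_{n+2r} \Big\| \int_0^1 (1 - t) f''(x_0 + t(x_1 - x_0))(x_1 - x_0)^2 \, dt \Big\|_0 \Big) \\
&\le C \Big( \|f(x_1) - f(x_0)\|_n + (\|x_1\|_{n+2r} + \|x_0\|_{n+2r}) \|f(x_1) - f(x_0)\|_0 \\
&\quad + (\|x_1\|_{n+2r} + \|x_0\|_{n+2r}) \underbrace{\|x_1 - x_0\|_0}_{\le C \|f(x_1) - f(x_0)\|_0} + (\|x_1\|_{n+2r} + \|x_0\|_{n+2r}) \cdot \underbrace{(\|x_1\|_{2r} + \|x_0\|_{2r}) \|x_1 - x_0\|_0}_{\le 2\delta C \|f(x_1) - f(x_0)\|_0} \Big) \\
&\le C \big( \|f(x_1) - f(x_0)\|_n + (\|x_1\|_{n+2r} + \|x_0\|_{n+2r}) \|f(x_1) - f(x_0)\|_0 \big).
\end{aligned} \]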

Proof of (51.19). As in (51.18) we may assume that the initial condition is $f : 0 \mapsto 0$ and that $E = \Sigma(B)$ and $F = \Sigma(C)$.
The idea of the proof is to solve the equation $f(x) = y$ via a differential equation for a curve $t \mapsto x(t)$ whose image under $f$ joins 0 and $y$ affinely. More precisely we consider the parameterization $t \mapsto h(t)\, y$ of the segment $[0, y]$, where $h(t) := 1 - e^{-ct}$ is a smooth increasing function with $h(0) = 0$ and $\lim_{t \to +\infty} h(t) = 1$. Differentiation of $f(x(t)) = h(t)\, y$ yields $f'(x(t)) \cdot x'(t) = h'(t)\, y$ and (if $f'(x)$ is invertible) that $x'(t) = c\, \Psi(x(t)) \cdot e^{-ct} y$. Substituting $e^{-ct} y = (1 - h(t))\, y = y - f(x(t))$ gives

\[ x'(t) = c\, \Psi(x(t)) \cdot (y - f(x(t))). \]
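In finite dimensions this differential equation can be integrated directly, which gives some feeling for the method. The following sketch is a one-dimensional illustration under assumptions that are not part of the text: the sample map $f(x) = x + x^3$, its inverse derivative $\Psi(x) = 1/(1 + 3x^2)$, the explicit Euler scheme, and the names `f`, `psi` and `flow`.

```python
import math

def f(x):
    """Sample map with f(0) = 0 (an assumption for this sketch)."""
    return x + x**3

def psi(x):
    """Psi(x) := f'(x)^{-1} for the sample map above."""
    return 1.0 / (1.0 + 3.0 * x * x)

def flow(y, c=1.0, dt=1e-3, T=20.0):
    """Explicit Euler steps for x'(t) = c Psi(x(t)) (y - f(x(t))), x(0) = 0."""
    x = 0.0
    for _ in range(int(round(T / dt))):
        x += dt * c * psi(x) * (y - f(x))
    return x

y = 2.0
x_end = flow(y)                 # approximates the solution of f(x) = y
x_mid = flow(y, T=1.0)          # along the curve, f(x(t)) is close to h(t) y
h_mid = 1.0 - math.exp(-1.0)    # h(1) for c = 1
```

Here `x_end` approximates the solution $x = 1$ of $f(x) = 2$, and `f(x_mid)` stays close to $h(1)\, y$ up to the discretization error of the Euler scheme. In the Fréchet setting no such direct integration is available, which is the reason for the modifications discussed next.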

In Fréchet spaces (like $\Sigma(B)$) we cannot guarantee that this differential equation with initial condition $x(0) = 0$ has a solution. The subspaces $B_t := \{(x_k)_k \in \Sigma(B) : x_k = 0 \text{ for } k > t\}$ however are Banach spaces (isomorphic to finite products of $B$), and they are direct summands with the obvious projections. So the idea is to modify the differential equation in such a way that for finite $t$ it factors over $B_t$ and to prove that the solution of the modified equation still converges for $t \to \infty$ to a solution $x_\infty$ of $f(x_\infty) = y$. Since $t$ is a non-discrete parameter we have to consider the spaces $B_t$ as a continuous family of Banach spaces, and so we have to find a family $(\sigma_t)_{t \in \mathbb{R}}$ of projections (called smoothing operators). For this we take a smooth function $\sigma : \mathbb{R} \to [0, 1]$ with $\sigma(t) = 0$ for $t \le 0$ and $\sigma(t) = 1$ for $t \ge 1$. Then we set $\sigma_t(x)(k) := \sigma(t - k) \cdot x(k)$.
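We have to show that $\sigma_t \to \mathrm{Id}$; more precisely, we show the following.

Claim. For $n \ge m$ there exists a $c_{n,m}$ such that $\|\sigma_t x\|_n \le c_{n,m} e^{(n-m)t} \|x\|_m$ and $\|(1 - \sigma_t)x\|_m \le c_{n,m} e^{(m-n)t} \|x\|_n$.
Recall that $\|x\|_n := \sum_k e^{nk} \|x_k\|$. Since $\|(\sigma_t x)_k\| \le \|x_k\|$ for all $t$ and $k$, and $(\sigma_t x)_k = 0$ for $t \le k$ and $((1 - \sigma_t)x)_k = 0$ for $t \ge k + 1$, we have

\[ \begin{aligned}
\|\sigma_t x\|_n &= \sum_k e^{nk} \|(\sigma_t x)_k\| \le \sum_{k \le t} e^{nk} \|x_k\| \le \sum_{k \le t} e^{(n-m)k} e^{mk} \|x_k\| \le e^{(n-m)t} \|x\|_m, \\
\|(1 - \sigma_t)x\|_m &\le \sum_{k \ge t-1} e^{mk} \|x_k\| \le \sum_{k \ge t-1} e^{(m-n)k} e^{nk} \|x_k\| \le e^{n-m} e^{(m-n)t} \|x\|_n.
\end{aligned} \]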

Now we modify our differential equation by projecting the arguments of the defining function to $B_t$, i.e.

\[ x'(t) = c\, \Psi(\sigma_t(x(t))) \cdot (\sigma_t(y - f(x(t)))) \quad \text{with } x(0) = 0. \]

Thus, our modified differential equation factors for finite $t$ over some Banach space. The following sublemma now provides us with local solutions.

51.22. Sublemma. If a function $f : F \supseteq U \to F$ factors via smooth maps over a Banach space $E$ (i.e., $f = g \circ h$, where $h : F \supseteq U \to W \subseteq E$ and $g : E \supseteq W \to F$ are smooth maps), then the differential equation $y'(t) = f(y(t))$ has locally unique solutions depending continuously (smoothly) on the initial condition $y_0 \in U$.

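[Diagram: the commutative triangle $f = g \circ h$, with $h : F \supseteq U \to W \subseteq E$ and $g : E \supseteq W \to F$.]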
Proof. Suppose $y$ is a solution of the differential equation $y' = f \circ y$ with initial condition $y(0) = y_0$, or equivalently $y(t) = y_0 + \int_0^t f(y(s)) \, ds$. The idea is to consider the curve $x := h \circ y$ in the Banach space $E$. Thus,
\[ x(t) = h\Big( y_0 + \int_0^t g(h(y(s))) \, ds \Big) = h\Big( y_0 + \int_0^t g(x(s)) \, ds \Big). \]

Now conversely, if $x$ is a solution of this integral equation, then $t \mapsto y(t) := y_0 + \int_0^t g(x(s)) \, ds$ is a solution of the original integral equation and hence also of the differential equation, since $x(t) = h(y_0 + \int_0^t g(x(s)) \, ds) = h(y(t))$, and so $y(t) = y_0 + \int_0^t g(x(s)) \, ds = y_0 + \int_0^t g(h(y(s))) \, ds = y_0 + \int_0^t f(y(s)) \, ds$.
In order to show that $x$ exists, we consider the map
\[ k : x \mapsto \Big( t \mapsto h\Big( y_0 + \int_0^t g(x(s)) \, ds \Big) \Big) \]

and show that it is a contraction.
Since $h$ is smooth we can find a seminorm $\|\ \|_q$ on $F$, a $C > 0$ and an $\eta > 0$ such that
\[ \|h(y_1) - h(y_0)\| \le C \|y_1 - y_0\|_q \quad \text{for all } \|y_j\|_q \le \eta. \]


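Furthermore, since $g$ is smooth we find a constant $C > 0$ and $\theta > 0$ such that

\[ \|g(x_1) - g(x_0)\|_q \le C \|x_1 - x_0\| \quad \text{for all } \|x_j\| \le \theta. \]

We may assume that $h(0) = 0$, that $\|g(0)\|_q \le C$ and that $\theta \le 1$. So we have

\[ \|h(y)\| \le C \|y\|_q \ \text{ for all } \|y\|_q \le \eta \quad \text{and} \quad \|g(x)\|_q \le 2C \ \text{ for all } \|x\| \le \theta. \]

Let $\tilde U := \{y_0 \in F : \|y_0\|_q \le \delta\}$, let $\tilde V := \{x \in C([0, \varepsilon], E) : \|x(t)\| \le \theta \text{ for all } t\}$, and let $k : F \times C([0, \varepsilon], E) \supseteq \tilde U \times \tilde V \to C([0, \varepsilon], E)$ be given by

\[ k(y_0, x)(t) := h\Big( y_0 + \int_0^t g(x(s)) \, ds \Big). \]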

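Then $k$ is continuous with values in $\tilde V$ and is a $C^2 \varepsilon$-contraction with respect to $x$. In fact, $\|y_0 + \int_0^t g(x(s)) \, ds\|_q \le \|y_0\|_q + \varepsilon \sup\{\|g(x(s))\|_q : s\} \le \delta + 2C\varepsilon \le \eta$ for sufficiently small $\delta$ and $\varepsilon$. So $\|k(y_0, x)(t)\| \le C\eta \le \theta$ for sufficiently small $\eta$. Hence, $k(y_0, x) \in \tilde V$. Furthermore,

\[ \|k(y_0, x_1)(t) - k(y_0, x_0)(t)\| \le C \Big\| \int_0^t (g(x_1(s)) - g(x_0(s))) \, ds \Big\|_q \le C \varepsilon \sup\{\|g(x_1(s)) - g(x_0(s))\|_q : s\} \]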

