438 10. Orthogonal Polynomials in Approximation Theory

Therefore, looking at the first and last equality, we get

f(x_j) = Σ_{k=0}^{N−1} f_k e^{i(k−N/2)jh} = Σ_{k=0}^{N−1} f_k W_N^{−(k−N/2)j},  j = 0, ..., N − 1. (10.55)

The mapping {f(x_j)} → {f_k} described by (10.52) is called the Discrete
Fourier Transform (DFT), while the mapping (10.55) from {f_k} to {f(x_j)}
is called the inverse transform (IDFT). Both DFT and IDFT can be written
in matrix form as {f_k} = T{f(x_j)} and {f(x_j)} = C{f_k}, where T ∈
C^{N×N}, C denotes the inverse of T and

T_{kj} = (1/N) W_N^{(k−N/2)j},   k, j = 0, ..., N − 1,

C_{jk} = W_N^{−(k−N/2)j},   j, k = 0, ..., N − 1.
A naive implementation of the matrix-vector computation in the DFT and
IDFT would require N² operations. Using the FFT (Fast Fourier Transform)
algorithm, only O(N log₂ N) flops are needed, provided that N is a
power of 2, as will be explained in Section 10.9.2.
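The shifted-index transform pair above translates into a few lines of NumPy; the sketch below is an illustrative cross-check of the matrices T and C (it is not the book's code; the book's MATLAB versions are Programs 88 and 89), verifying that the IDFT inverts the DFT.

```python
import numpy as np

def dft_shifted(fx):
    """Discrete coefficients f_k = (1/N) sum_j f(x_j) W_N^{(k-N/2) j}."""
    N = len(fx)
    j = np.arange(N)
    W = np.exp(-2j * np.pi / N)          # W_N = e^{-i 2 pi / N}
    return np.array([np.sum(fx * W ** ((k - N / 2) * j)) / N for k in range(N)])

def idft_shifted(fk):
    """Nodal values f(x_j) = sum_k f_k W_N^{-(k-N/2) j}, cf. (10.55)."""
    N = len(fk)
    k = np.arange(N)
    W = np.exp(-2j * np.pi / N)
    return np.array([np.sum(fk * W ** (-(k - N / 2) * jj)) for jj in range(N)])

# round trip on the nodes x_j = j h, h = 2*pi/N
N = 16
x = 2 * np.pi * np.arange(N) / N
f = np.exp(np.sin(x))                    # arbitrary smooth periodic sample
roundtrip_err = np.max(np.abs(idft_shifted(dft_shifted(f)) - f))
```

Since C is the inverse of T, the round-trip error is at the level of machine precision.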

The function Π_F f ∈ S_N introduced in (10.53) is the solution of the
minimization problem ‖f − Π_F f‖_N ≤ ‖f − g‖_N for any g ∈ S_N, where
‖·‖_N = (·,·)_N^{1/2} is a discrete norm for S_N. In the case where f is periodic
with all its derivatives up to order s (s ≥ 1), an error estimate analogous
to that for Chebyshev and Legendre interpolation holds

‖f − Π_F f‖_{L²(0,2π)} ≤ C N^{−s} ‖f‖_s,

and also

max_{x∈[0,2π]} |f(x) − Π_F f(x)| ≤ C N^{1/2−s} ‖f‖_s.

In a similar manner, we also have

|(f, v_N) − (f, v_N)_N| ≤ C N^{−s} ‖f‖_s ‖v_N‖

for any v_N ∈ S_N and, in particular, setting v_N = 1, we have the following
error for the quadrature formula (10.51)

|∫_0^{2π} f(x) dx − h Σ_{j=0}^{N−1} f(x_j)| ≤ C N^{−s} ‖f‖_s

(see [CHQZ88], Chapter 2, for the proof).
Notice that h Σ_{j=0}^{N−1} f(x_j) is nothing else than the composite trapezoidal
rule for approximating the integral ∫_0^{2π} f(x) dx. Therefore, such a formula

turns out to be extremely accurate when dealing with periodic and smooth functions.
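This spectral accuracy of the trapezoidal rule on periodic integrands is easy to observe; in the sketch below (illustrative, with f(x) = exp(sin x) chosen arbitrarily) doubling the number of nodes takes the error from modest to machine-precision level.

```python
import numpy as np

f = lambda x: np.exp(np.sin(x))   # smooth, 2*pi-periodic integrand (arbitrary choice)

def trap_periodic(N):
    # h * sum_{j=0}^{N-1} f(x_j): composite trapezoidal rule for a periodic f
    h = 2 * np.pi / N
    return h * np.sum(f(h * np.arange(N)))

ref = trap_periodic(4096)         # reference value, exact to machine precision
err_8 = abs(trap_periodic(8) - ref)
err_16 = abs(trap_periodic(16) - ref)
```

For a function that is periodic with all its derivatives, the estimate above holds for every s, which explains the faster-than-algebraic decay seen here.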

Programs 88 and 89 provide an implementation of the DFT and IDFT. The
input parameter f is a string containing the function f to be transformed
while fc is a vector of size N containing the values fk .

Program 88 - dft : Discrete Fourier transform

function fc = dft(N,f)
h = 2*pi/N; x = [0:h:2*pi*(1-1/N)]; fx = eval(f); wn = exp(-i*h);
for k=0:N-1
  s = 0;
  for j=0:N-1
    s = s + fx(j+1)*wn^((k-N/2)*j);
  end
  fc(k+1) = s/N;
end

Program 89 - idft : Inverse discrete Fourier transform

function fv = idft(N,fc)
h = 2*pi/N; wn = exp(-i*h);
for k=0:N-1
  s = 0;
  for j=0:N-1
    s = s + fc(j+1)*wn^(-k*(j-N/2));
  end
  fv(k+1) = s;
end

10.9.1 The Gibbs Phenomenon
Consider the discontinuous function f(x) = x/π for x ∈ [0, π] and equal to
x/π − 2 for x ∈ (π, 2π], and compute its DFT using Program 88. The
interpolate Π_F f is shown in Figure 10.3 (above) for N = 8, 16, 32. Notice the
spurious oscillations around the point of discontinuity of f, whose maximum
amplitude, however, tends to a finite limit. The arising of these oscillations
is known as the Gibbs phenomenon and is typical of functions with isolated
jump discontinuities; it affects the behavior of the truncated Fourier series
not only in the neighborhood of the discontinuity but also over the entire
interval, as can be clearly seen in the figure. The convergence rate of the
truncated series for functions with jump discontinuities is linear in N^{−1} at
every given non-singular point of the interval of definition of the function
(see [CHQZ88], Section 2.1.4).
FIGURE 10.3. Above: Fourier interpolate of the sawtooth function (thick solid
line) for N = 8 (dash-dotted line), 16 (dashed line) and 32 (thin solid line).
Below: the same information is plotted in the case of the Lanczos smoothing

Since the Gibbs phenomenon is related to the slow decay of the Fourier
coefficients of a discontinuous function, smoothing procedures can be profitably
employed to attenuate the higher-order Fourier coefficients. This can
be done by multiplying each coefficient f_k by a factor σ_k such that σ_k is a
decreasing function of k. An example is provided by the Lanczos smoothing

σ_k = sin(2(k − N/2)(π/N)) / (2(k − N/2)(π/N)),   k = 0, ..., N − 1. (10.56)

The effect of applying the Lanczos smoothing to the computation of the
DFT of the above function f is represented in Figure 10.3 (below), which
shows that the oscillations have almost completely disappeared.
For a deeper analysis of this subject we refer to [CHQZ88], Chapter 2.
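The effect of (10.56) is easy to reproduce; the sketch below is an illustrative NumPy version (using a truncated discrete Fourier series of the sawtooth in centered indexing m = k − N/2, rather than Programs 88-89) that compares the maximum overshoot with and without the Lanczos factors.

```python
import numpy as np

N = 64
j = np.arange(N)
xj = 2 * np.pi * j / N
f = np.where(xj <= np.pi, xj / np.pi, xj / np.pi - 2)   # sawtooth of Section 10.9.1

m = np.arange(-N // 2, N // 2)          # centered wave numbers m = k - N/2
c = np.array([np.sum(f * np.exp(-1j * k * xj)) / N for k in m])
sigma = np.sinc(2 * m / N)              # Lanczos factors (10.56): sin(y)/y form

xf = np.linspace(0, 2 * np.pi, 2001)    # fine evaluation grid
E = np.exp(1j * np.outer(xf, m))
plain = (E @ c).real                    # Fourier approximation, no smoothing
smoothed = (E @ (sigma * c)).real       # with Lanczos smoothing

overshoot_plain = plain.max()           # > 1: Gibbs overshoot near x = pi
overshoot_smoothed = smoothed.max()
```

The unsmoothed approximation overshoots the maximum value 1 of f near the discontinuity; damping the coefficients by σ_k visibly reduces the peak.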

10.9.2 The Fast Fourier Transform
As pointed out in the previous section, computing the discrete Fourier
transform (DFT) or its inverse (IDFT) as a matrix-vector product would
require N² operations. In this section we illustrate the basic steps of the
Cooley-Tukey algorithm [CT65], commonly known as the Fast Fourier Transform
(FFT). The computation of a DFT of order N is split into DFTs of
order p_0, ..., p_m, where {p_i} are the prime factors of N. If N is a power of
2, the computational cost has the order of N log₂ N flops.
A recursive algorithm to compute the DFT when N is a power of 2
is described in the following. Let f = (f_i)^T, i = 0, ..., N − 1, and set
p(x) = Σ_{j=0}^{N−1} f_j x^j. Then, computing the DFT of the vector f amounts to

evaluating p(W_N^{k−N/2}) for k = 0, ..., N − 1. Let us introduce the polynomials

p_e(x) = f_0 + f_2 x + ... + f_{N−2} x^{N/2−1},
p_o(x) = f_1 + f_3 x + ... + f_{N−1} x^{N/2−1}.

Notice that

p(x) = p_e(x²) + x p_o(x²),

from which it follows that the computation of the DFT of f can be carried
out by evaluating the polynomials p_e and p_o at the points W_N^{2(k−N/2)}, k =
0, ..., N − 1. Since

W_N^{2(k−N/2)} = W_N^{2k−N} = W_N^{2k} exp(i2π) = W_{N/2}^k,

it turns out that we must evaluate p_e and p_o at the principal roots of unity
of order N/2. In this manner the DFT of order N is rewritten in terms
of two DFTs of order N/2; of course, we can recursively apply again this
procedure to p_o and p_e. The process is terminated when the degree of the
last generated polynomials is equal to one.
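The even/odd splitting above maps directly onto a recursive radix-2 FFT. Here is a compact Python sketch of the unshifted transform X_k = Σ_j f_j W_N^{kj} (the index shift by N/2 used in (10.55) is omitted for clarity), checked against numpy.fft:

```python
import numpy as np

def fft_rec(f):
    """Radix-2 Cooley-Tukey FFT of a vector whose length is a power of 2:
    evaluates p(x) = sum_j f_j x^j at the N-th roots of unity W_N^k."""
    N = len(f)
    if N == 1:
        return np.asarray(f, dtype=complex)
    even = fft_rec(f[0::2])      # DFT of p_e, order N/2
    odd = fft_rec(f[1::2])       # DFT of p_o, order N/2
    W = np.exp(-2j * np.pi * np.arange(N // 2) / N)
    # p(W_N^k) = p_e(W_{N/2}^k) + W_N^k p_o(W_{N/2}^k); the second half of
    # the spectrum follows from W_N^{k+N/2} = -W_N^k
    return np.concatenate([even + W * odd, even - W * odd])

rng = np.random.default_rng(0)
f = rng.standard_normal(32)
fft_err = np.max(np.abs(fft_rec(f) - np.fft.fft(f)))
```

Each level performs O(N) work and the recursion depth is log₂ N, which gives the O(N log₂ N) flop count quoted above.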

In Program 90 we propose a simple implementation of the FFT recursive
algorithm. The input parameters are the vector f, containing the NN values
f_k, and NN itself, which must be a power of 2.

Program 90 - fftrec : FFT algorithm in the recursive version

function [fftv] = fftrec(f,NN)
N = length(f); w = exp(-2*pi*sqrt(-1)/N);
if N == 2
  fftv = f(1)+w.^[-NN/2:NN-1-NN/2]*f(2);
else
  a1 = f(1:2:N); b1 = f(2:2:N);
  a2 = fftrec(a1,NN); b2 = fftrec(b1,NN);
  for k=-NN/2:NN-1-NN/2
    fftv(k+1+NN/2) = a2(k+1+NN/2) + b2(k+1+NN/2)*w^k;
  end
end

Remark 10.4 An FFT procedure can also be set up when N is not a power
of 2. The simplest approach consists of adding some zero samples to the
original sequence {f_i} in such a way as to obtain a total number of N = 2^p
values. This technique, however, does not necessarily yield the correct result.
Therefore, an effective alternative is based on partitioning the Fourier
matrix C into subblocks of smaller size. Practical FFT implementations
can handle both strategies (see, for instance, the fft package available in
MATLAB).

10.10 Approximation of Function Derivatives
A problem which is often encountered in numerical analysis is the approximation
of the derivative of a function f(x) on a given interval [a, b].
A natural approach to it consists of introducing in [a, b] n + 1 nodes
{x_k, k = 0, ..., n}, with x_0 = a, x_n = b and x_{k+1} = x_k + h, k = 0, ..., n − 1,
where h = (b − a)/n. Then, we approximate f′(x_i) using the nodal values
f(x_k) as

Σ_{k=−m}^{m} α_k u_{i−k} = (1/h) Σ_{k=−m′}^{m′} β_k f(x_{i−k}), (10.57)

where {α_k}, {β_k} ∈ R are coefficients to be determined and u_k
is the desired approximation to f′(x_k).
A non-negligible issue in the choice of scheme (10.57) is the computational
efficiency. Regarding this concern, it is worth noting that, if m ≠ 0,
determining the values {u_i} requires the solution of a linear system.
The set of nodes which are involved in constructing the derivative of f at
a certain node is called a stencil. The band of the matrix associated with
system (10.57) increases as the stencil gets larger.

10.10.1 Classical Finite Difference Methods
The simplest way to generate a formula like (10.57) consists of resorting to
the definition of the derivative. If f′(x_i) exists, then

f′(x_i) = lim_{h→0⁺} (f(x_i + h) − f(x_i))/h. (10.58)

Replacing the limit with the incremental ratio, with h finite, yields the
approximation

u_i^{FD} = (f(x_{i+1}) − f(x_i))/h,   0 ≤ i ≤ n − 1. (10.59)

Relation (10.59) is a special instance of (10.57) setting m = 0, α_0 = 1,
m′ = 1, β_{−1} = 1, β_0 = −1, β_1 = 0.
The right side of (10.59) is called the forward finite difference and the
approximation that is being used corresponds to replacing f′(x_i) with
the slope of the straight line passing through the points (x_i, f(x_i)) and
(x_{i+1}, f(x_{i+1})), as shown in Figure 10.4.
To estimate the error that is made, it suffices to expand f in Taylor
series, obtaining

f(x_{i+1}) = f(x_i) + h f′(x_i) + (h²/2) f″(ξ_i),   with ξ_i ∈ (x_i, x_{i+1}).

We assume henceforth that f has the required regularity, so that

f′(x_i) − u_i^{FD} = −(h/2) f″(ξ_i). (10.60)

FIGURE 10.4. Finite difference approximation of f′(x_i): backward (solid line),
forward (pointed line) and centred (dashed line)

Obviously, instead of (10.58) we could employ a centred incremental
ratio, obtaining the following approximation

u_i^{CD} = (f(x_{i+1}) − f(x_{i−1}))/(2h),   1 ≤ i ≤ n − 1. (10.61)

Scheme (10.61) is a special instance of (10.57) setting m = 0, α_0 = 1,
m′ = 1, β_{−1} = 1/2, β_0 = 0, β_1 = −1/2.

The right side of (10.61) is called the centred finite difference and geometrically
amounts to replacing f′(x_i) with the slope of the straight line passing
through the points (x_{i−1}, f(x_{i−1})) and (x_{i+1}, f(x_{i+1})) (see Figure 10.4).
Resorting again to Taylor series, we get

f′(x_i) − u_i^{CD} = −(h²/6) f‴(ξ_i). (10.62)

Formula (10.61) thus provides a second-order approximation to f′(x_i) with
respect to h.
Finally, with a similar procedure, we can derive a backward finite difference
scheme, where

u_i^{BD} = (f(x_i) − f(x_{i−1}))/h,   1 ≤ i ≤ n, (10.63)

which is affected by the following error

f′(x_i) − u_i^{BD} = (h/2) f″(ξ_i). (10.64)

The values of the parameters in (10.57) are m = 0, α_0 = 1, m′ = 1 and
β_{−1} = 0, β_0 = 1, β_1 = −1.
Higher-order schemes, as well as finite difference approximations of higher-order
derivatives of f, can be constructed using Taylor expansions of
higher order. A remarkable example is the approximation of f″; if f ∈
C⁴([a, b]) we easily get

f″(x_i) = (f(x_{i+1}) − 2f(x_i) + f(x_{i−1}))/h²
  − (h²/24)[f⁽⁴⁾(x_i + θ_i h) + f⁽⁴⁾(x_i − ω_i h)],   0 < θ_i, ω_i < 1.

The following centred finite difference scheme can thus be derived

u_i = (f(x_{i+1}) − 2f(x_i) + f(x_{i−1}))/h², (10.65)

which is affected by the error

f″(x_i) − u_i = −(h²/24)[f⁽⁴⁾(x_i + θ_i h) + f⁽⁴⁾(x_i − ω_i h)]. (10.66)

Formula (10.65) provides a second-order approximation to f″(x_i) with
respect to h.
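The error laws (10.60), (10.62) and (10.66) can be observed numerically; the sketch below (illustrative, with f = sin and the evaluation point chosen arbitrarily) halves h and checks that the forward difference error drops by a factor of about 2 and the two centred errors by about 4.

```python
import numpy as np

f, df, d2f = np.sin, np.cos, lambda x: -np.sin(x)
x0 = 1.0

def errs(h):
    fd = (f(x0 + h) - f(x0)) / h                       # forward (10.59)
    cd = (f(x0 + h) - f(x0 - h)) / (2 * h)             # centred (10.61)
    cd2 = (f(x0 + h) - 2 * f(x0) + f(x0 - h)) / h**2   # second derivative (10.65)
    return abs(fd - df(x0)), abs(cd - df(x0)), abs(cd2 - d2f(x0))

e1 = errs(1e-2)
e2 = errs(5e-3)
ratios = [a / b for a, b in zip(e1, e2)]   # expected roughly [2, 4, 4]
```

The ratios 2, 4, 4 are exactly the signatures of first- and second-order convergence with respect to h.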

10.10.2 Compact Finite Differences
More accurate approximations are provided by using the following formula
(which we call compact differences)

α u_{i−1} + u_i + α u_{i+1} = (β/(2h))(f_{i+1} − f_{i−1}) + (γ/(4h))(f_{i+2} − f_{i−2}) (10.67)

for i = 2, ..., n − 1. We have set, for brevity, f_i = f(x_i).
The coefficients α, β and γ are to be determined in such a way that the
relations (10.67) yield values u_i that approximate f′(x_i) up to the highest
order with respect to h. For this purpose, the coefficients are selected in
such a way as to minimize the consistency error (see Section 2.2)

σ_i(h) = α f_{i−1}^{(1)} + f_i^{(1)} + α f_{i+1}^{(1)}
  − [(β/(2h))(f_{i+1} − f_{i−1}) + (γ/(4h))(f_{i+2} − f_{i−2})], (10.68)

which comes from "forcing" f to satisfy the numerical scheme (10.67). For
brevity, we set f_i^{(k)} = f^{(k)}(x_i), k = 1, 2, ...
Precisely, assuming that f ∈ C⁵([a, b]) and expanding it in a Taylor
series around x_i, we find

f_{i±1} = f_i ± h f_i^{(1)} + (h²/2) f_i^{(2)} ± (h³/6) f_i^{(3)} + (h⁴/24) f_i^{(4)} ± (h⁵/120) f_i^{(5)} + O(h⁶),

f_{i±1}^{(1)} = f_i^{(1)} ± h f_i^{(2)} + (h²/2) f_i^{(3)} ± (h³/6) f_i^{(4)} + (h⁴/24) f_i^{(5)} + O(h⁵).

Substituting into (10.68) we get

σ_i(h) = (2α + 1 − (β + γ)) f_i^{(1)} + (h²/6)(6α − β − 4γ) f_i^{(3)}
  + (h⁴/120)(10α − β − 16γ) f_i^{(5)} + O(h⁶).

Second-order methods are obtained by equating to zero the coefficient of
f_i^{(1)}, i.e., if 2α + 1 = β + γ, while we obtain schemes of order 4 by equating
to zero also the coefficient of f_i^{(3)}, yielding 6α = β + 4γ and, finally, methods
of order 6 are obtained by setting to zero also the coefficient of f_i^{(5)}, i.e.,
10α = β + 16γ.
The linear system formed by these last three equations has a nonsingular
matrix. Thus, there exists a unique scheme of order 6 that corresponds to
the following choice of the parameters

α = 1/3,   β = 14/9,   γ = 1/9, (10.69)

while there exist infinitely many methods of second and fourth order.
Among these infinite methods, a popular scheme has coefficients α = 1/4,
β = 3/2 and γ = 0. Schemes of higher order can be generated at the
expense of further expanding the computational stencil.
Traditional finite difference schemes correspond to setting α = 0 and
allow for computing explicitly the approximant of the first derivative of f
at a node, in contrast with compact schemes, which in any case require the
solution of a linear system of the form Au = Bf (where the notation has the
obvious meaning).
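In the periodic case the system Au = Bf is circulant, and a few lines of NumPy suffice to test the scheme. The sketch below (an illustration, not the book's Program 91) applies (10.67) with the sixth-order coefficients (10.69) to f(x) = sin x and checks the expected O(h⁶) error decay.

```python
import numpy as np

def compact_derivative(fx, h, alpha, beta, gamma):
    """Solve (10.67) with periodic wrap-around: A u = rhs."""
    n = len(fx)
    A = np.eye(n) + alpha * (np.eye(n, k=1) + np.eye(n, k=-1))
    A[0, -1] = alpha
    A[-1, 0] = alpha                       # periodicity in the corner entries
    rhs = (beta / (2 * h)) * (np.roll(fx, -1) - np.roll(fx, 1)) \
        + (gamma / (4 * h)) * (np.roll(fx, -2) - np.roll(fx, 2))
    return np.linalg.solve(A, rhs)

def max_err(n):
    h = 2 * np.pi / n
    x = h * np.arange(n)
    u = compact_derivative(np.sin(x), h, 1/3, 14/9, 1/9)   # coefficients (10.69)
    return np.max(np.abs(u - np.cos(x)))

e16, e32 = max_err(16), max_err(32)
order = np.log2(e16 / e32)    # should be close to 6
```

Halving h divides the error by roughly 2⁶ = 64, confirming the sixth-order accuracy of the choice (10.69).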
To make the system solvable, it is necessary to provide values for the variables
u_i with i < 0 and i > n. A particularly favorable instance is that
where f is a periodic function of period b − a, in which case u_{i+n} = u_i
for any i ∈ Z. In the nonperiodic case, system (10.67) must be supplied
with suitable relations at the nodes near the boundary of the approximation
interval. For example, the first derivative at x_0 can be computed using the
relation

u_0 + α u_1 = (1/h)(A f_1 + B f_2 + C f_3 + D f_4),

and requiring that

A = −(3 + α + 2D)/2,   B = 2 + 3D,   C = −(1 − α + 6D)/2,

in order for the scheme to be at least second-order accurate (see [Lel92]
for the relations to enforce in the case of higher-order methods). Finally,
we notice that, for any given order of accuracy, compact schemes have a
stencil smaller than the one of standard finite differences.

Program 91 provides an implementation of the compact finite difference
schemes (10.67) for the approximation of the derivative of a given function f
which is assumed to be periodic on the interval [a, b). The input parameters
alpha, beta and gamma contain the coefficients of the scheme, a and b are
the endpoints of the interval, f is a string containing the expression of f
and n denotes the number of subintervals in which [a, b] is partitioned. The
output vectors u and x contain the computed approximate values u_i and
the node coordinates. Notice that setting alpha=gamma=0 and beta=1 we
recover the centered finite difference approximation (10.61).

Program 91 - compdiff : Compact difference schemes

function [u, x] = compdiff(alpha,beta,gamma,a,b,n,f)
h = (b-a)/(n+1); x = [a:h:b]; fx = eval(f);
A = eye(n+2)+alpha*diag(ones(n+1,1),1)+alpha*diag(ones(n+1,1),-1);
% right-hand side at the interior nodes
rhs = 0.5*beta/h*(fx(4:n+1)-fx(2:n-1))+0.25*gamma/h*(fx(5:n+2)-fx(1:n-2));
if gamma == 0
  rhs = [0.5*beta/h*(fx(3)-fx(1)), rhs, 0.5*beta/h*(fx(n+2)-fx(n))];
  rhs = [0.5*beta/h*(fx(2)-fx(n+1)), rhs, 0.5*beta/h*(fx(2)-fx(n+1))];
else
  % boundary entries obtained by periodicity
  rhs = [0.5*beta/h*(fx(3)-fx(1))+0.25*gamma/h*(fx(4)-fx(n+1)), rhs, ...
         0.5*beta/h*(fx(n+2)-fx(n))+0.25*gamma/h*(fx(2)-fx(n-1))];
  rhs = [0.5*beta/h*(fx(2)-fx(n+1))+0.25*gamma/h*(fx(3)-fx(n)), rhs, ...
         0.5*beta/h*(fx(2)-fx(n+1))+0.25*gamma/h*(fx(3)-fx(n))];
end
% periodicity conditions in the first and last rows of A
A(1,1:n+2) = zeros(1,n+2);
A(1,1) = 1; A(1,2) = alpha; A(1,n+1) = alpha;
A(n+2,1:n+2) = zeros(1,n+2);
A(n+2,n+2) = 1; A(n+2,n+1) = alpha; A(n+2,2) = alpha;
u = A \ rhs';

Example 10.4 Let us consider the approximate evaluation of the derivative of
the function f(x) = sin(x) on the interval [0, 2π]. Figure 10.5 shows the
logarithm of the maximum nodal errors for the second-order centered finite
difference scheme (10.61) and of the fourth and sixth-order compact difference schemes
introduced above, as a function of p = log(n).






FIGURE 10.5. Maximum nodal errors for the second-order centred finite difference
scheme (solid line) and for the fourth (dashed line) and sixth-order (dotted
line) compact difference schemes as functions of p = log(n)

Another nice feature of compact schemes is that they maximize the range
of well-resolved waves, as we are going to explain. Assume that f is a real
and periodic function on [0, 2π], that is, f(0) = f(2π). Using the same
notation as in Section 10.9, we let N be an even positive integer and set
h = 2π/N. Then replace f by its truncated Fourier series

f_N(x) = Σ_{k=−N/2}^{N/2−1} f_k e^{ikx}.

Since the function f is real-valued, f_k = f̄_{−k} for k = 1, ..., N/2 and
f_0 = f̄_0. For the sake of convenience, introduce the normalized wave number
w_k = kh = 2πk/N and perform a scaling of the coordinates setting s = x/h.
As a consequence, we get

f_N(x(s)) = Σ_{k=−N/2}^{N/2−1} f_k e^{iksh} = Σ_{k=−N/2}^{N/2−1} f_k e^{iw_k s}. (10.70)

Taking the first derivative of (10.70) with respect to s yields a function
whose Fourier coefficients are f′_k = i w_k f_k. We can thus estimate the
approximation error on (f_N)′ by comparing the exact coefficients f′_k with the
corresponding ones obtained by an approximate derivative, in particular,
by comparing the exact wave number w_k with the approximate one, say
w_{k,app}.

Let us neglect the subscript k and perform the comparison over the whole
interval [0, π) where w_k is varying. It is clear that methods based on the
Fourier expansion have w_app = w if w ≠ π (w_app = 0 if w = π). The family
of schemes (10.67) is instead characterized by the wave number

w_app(z) = [a sin(z) + (b/2) sin(2z) + (c/3) sin(3z)] / [1 + 2α cos(z) + 2β cos(2z)],   z ∈ [0, π)

(see [Lel92]; here a, b, c, α and β denote the coefficients of the general
pentadiagonal compact scheme of [Lel92], which includes (10.67) as a special
case). Figure 10.6 displays a comparison among wave numbers of
several schemes, of compact and non-compact type.
The range of values for which the wave number computed by the numerical
scheme adequately approximates the exact wave number is the set of
well-resolved waves. As a consequence, if w_min is the smallest well-resolved
wave, the difference 1 − w_min/π represents the fraction of waves that are
unresolved by the numerical scheme. As can be seen in Figure 10.6, the
standard finite difference schemes approximate correctly the exact wave
number only for small wave numbers.
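For the tridiagonal schemes (10.67), the formula above reduces to w_app(z) = (β sin z + (γ/2) sin 2z)/(1 + 2α cos z), which follows by applying (10.67) to e^{ikx}. The sketch below (illustrative) compares, at a fairly high normalized wave number, the second-order centred difference (for which w_app = sin z) with the fourth-order compact scheme α = 1/4, β = 3/2, γ = 0 quoted in the text:

```python
import numpy as np

def w_centered(z):
    # modified wave number of the centred difference (10.61)
    return np.sin(z)

def w_compact4(z):
    # modified wave number of (10.67) with alpha=1/4, beta=3/2, gamma=0
    alpha, beta, gamma = 1/4, 3/2, 0.0
    return (beta * np.sin(z) + (gamma / 2) * np.sin(2 * z)) / (1 + 2 * alpha * np.cos(z))

z = 2.0                                   # fairly high normalized wave number
err_cd = abs(w_centered(z) - z)
err_c4 = abs(w_compact4(z) - z)
```

Both curves are accurate for small z, but the compact scheme tracks the exact wave number much farther into [0, π), which is precisely the "wider range of well-resolved waves" claimed above.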







FIGURE 10.6. Computed wave numbers for centred finite differences (10.61) (a)
and for compact schemes of fourth (b), sixth (c) and tenth (d) order, compared
with the exact wave number (the straight line). On the x axis the normalized
coordinate s is represented

10.10.3 Pseudo-Spectral Derivative
An alternative way for numerical differentiation consists of approximating
the first derivative of a function f with the exact first derivative of the
polynomial Π_n f interpolating f at the nodes {x_0, ..., x_n}.
Exactly as happens for Lagrange interpolation, using equally spaced
nodes does not yield stable approximations to the first derivative of f for
n large. For this reason, we limit ourselves to considering the case where
the nodes are nonuniformly distributed according to the Gauss-Lobatto-Chebyshev
formula.

For simplicity, assume that I = [a, b] = [−1, 1] and, for n ≥ 1, take in
I the Gauss-Lobatto-Chebyshev nodes as in (10.21). Then, consider the
Lagrange interpolating polynomial Π_{GL} f introduced in Section 10.3. We
define the pseudo-spectral derivative of f ∈ C⁰(I) to be the derivative of
the polynomial Π_{GL} f

D_n f = (Π_{GL} f)′ ∈ P_{n−1}(I).

The error made in replacing f′ with D_n f is of exponential type, that is, it
only depends on the smoothness of the function f. More precisely, there
exists a constant C > 0, independent of n, such that

‖f′ − D_n f‖_w ≤ C n^{1−m} ‖f‖_{m,w}, (10.71)

for any m ≥ 2 such that the norm ‖f‖_{m,w}, introduced in (10.23), is finite.
Recalling (10.19) and using (10.27) yields

(D_n f)(x̄_i) = Σ_{j=0}^{n} f(x̄_j) l̄′_j(x̄_i),   i = 0, ..., n, (10.72)

so that the pseudo-spectral derivative at the interpolation nodes can be
computed knowing only the nodal values of f and of l̄′_j. These values can
be computed once and for all and stored in a matrix D ∈ R^{(n+1)×(n+1)}, D_{ij} =
l̄′_j(x̄_i) for i, j = 0, ..., n, called a pseudo-spectral differentiation matrix.
Relation (10.72) can thus be cast in matrix form as f′ = Df, letting
f = [f(x̄_i)] and f′ = [(D_n f)(x̄_i)] for i = 0, ..., n.
The entries of D have the following explicit form (see [CHQZ88], p. 69)

D_lj =
  (d_l/d_j) (−1)^{l+j} / (x̄_l − x̄_j),   l ≠ j,
  −x̄_j / (2(1 − x̄_j²)),                 1 ≤ l = j ≤ n − 1,
  −(2n² + 1)/6,                          l = j = 0,
  (2n² + 1)/6,                           l = j = n,                    (10.73)

where the coefficients d_l have been defined in Section 10.3 (see also Example
5.13 concerning the approximation of the multiple eigenvalue λ = 0 of D).
To compute the pseudo-spectral derivative of a function f over the generic
interval [a, b], we only have to resort to the change of variables considered
in Remark 10.3.
The second-order pseudo-spectral derivative can be computed as the
product of the matrix D and the vector f′, that is, f″ = Df′, or by
directly applying the matrix D² to the vector f.
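The matrix (10.73) is easy to assemble. The sketch below builds it in NumPy for the nodes ordered as x̄_i = cos(iπ/n) with d_0 = d_n = 2 and d_l = 1 otherwise (an assumed convention; the signs of the diagonal corner entries depend on the node ordering, so here the diagonal is filled through the equivalent "vanishing row sum" identity to stay consistent with the off-diagonal entries), and then checks that D differentiates a polynomial of degree ≤ n exactly.

```python
import numpy as np

def cheb_diff_matrix(n):
    """Pseudo-spectral differentiation matrix on x_i = cos(i*pi/n), i = 0..n."""
    x = np.cos(np.pi * np.arange(n + 1) / n)
    d = np.ones(n + 1)
    d[0] = d[-1] = 2.0
    D = np.zeros((n + 1, n + 1))
    for l in range(n + 1):
        for j in range(n + 1):
            if l != j:
                D[l, j] = (d[l] / d[j]) * (-1) ** (l + j) / (x[l] - x[j])
    # diagonal from the identity sum_j D[l, j] = 0 (the derivative of 1 is 0)
    np.fill_diagonal(D, -D.sum(axis=1))
    return x, D

x, D = cheb_diff_matrix(8)
err_poly = np.max(np.abs(D @ x**3 - 3 * x**2))   # exact for degree <= n
```

Since D_n reproduces the derivative of any polynomial of degree up to n, the residual here is pure roundoff.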

10.11 Transforms and Their Applications
In this section we provide a short introduction to the most relevant integral
transforms and discuss their basic analytical and numerical properties.

10.11.1 The Fourier Transform
Definition 10.1 Let L¹(R) denote the space of real or complex functions
defined on the real line such that

∫_{−∞}^{+∞} |f(t)| dt < +∞.

For any f ∈ L¹(R) its Fourier transform is a complex-valued function
F = F[f] defined as

F(ν) = ∫_{−∞}^{+∞} f(t) e^{−i2πνt} dt.

Should the independent variable t denote time, then ν would have the
meaning of frequency. Thus, the Fourier transform is a mapping that to a
function of time (typically, real-valued) associates a complex-valued function
of frequency.
The following result provides the conditions under which an inversion formula
exists that allows us to recover the function f from its Fourier transform
F (for the proof see [Rud83], p. 199).

Property 10.3 (inversion theorem) Let f be a given function in L¹(R),
F ∈ L¹(R) be its Fourier transform and g be the function defined by

g(t) = ∫_{−∞}^{+∞} F(ν) e^{i2πνt} dν,   t ∈ R. (10.74)

Then g ∈ C⁰(R), with lim_{|x|→∞} g(x) = 0, and f(t) = g(t) almost everywhere
in R (i.e., for any t unless possibly a set of zero measure).

The integral at the right-hand side of (10.74) is to be meant in the Cauchy
principal value sense, i.e., we let

∫_{−∞}^{∞} F(ν) e^{i2πνt} dν = lim_{a→∞} ∫_{−a}^{a} F(ν) e^{i2πνt} dν

and we call it the inverse Fourier transform or inversion formula of the
Fourier transform. The mapping that associates to the complex function
F the generating function f will be denoted by F^{−1}[F], i.e., F = F[f] iff
f = F^{−1}[F].
Let us briefly summarize the main properties of the Fourier transform and
its inverse.

1. F and F^{−1} are linear operators, i.e.,

F[αf + βg] = αF[f] + βF[g],   ∀α, β ∈ C,
F^{−1}[αF + βG] = αF^{−1}[F] + βF^{−1}[G],   ∀α, β ∈ C; (10.75)

2. scaling: if α is any nonzero real number and f_α is the function f_α(t) =
f(αt), then

F[f_α] = (1/|α|) F^α,

where F^α(ν) = F(ν/α);

3. duality: let f(t) be a given function and F(ν) be its Fourier transform.
Then the function g(t) = F(−t) has a Fourier transform given
by f(ν). Thus, once an associated function-transform pair is found,
another dual pair is automatically generated. An application of this
property is provided by the pair r(t)-F[r] in Example 10.5;

4. parity: if f(t) is a real even function then F(ν) is real and even, while
if f(t) is a real and odd function then F(ν) is imaginary and odd.
This property allows one to work only with nonnegative frequencies;

5. convolution and product: for any given functions f, g ∈ L¹(R), we have

F[f ∗ g] = F[f]F[g],   F[fg] = F ∗ G, (10.76)

where the convolution integral of two functions φ and ψ is given by

(φ ∗ ψ)(t) = ∫_{−∞}^{+∞} φ(τ) ψ(t − τ) dτ. (10.77)

Example 10.5 We provide two examples of the computation of the Fourier
transforms of functions that are typically encountered in signal processing.
Let us first consider the square wave (or rectangular) function r(t) defined as

r(t) = A if −T/2 ≤ t ≤ T/2,   r(t) = 0 otherwise,

where T and A are two given positive numbers. Its Fourier transform F[r] is the function

F(ν) = ∫_{−T/2}^{T/2} A e^{−i2πνt} dt = AT sin(πνT)/(πνT),   ν ∈ R,

where AT is the area of the rectangular function.
Let us then consider the sawtooth function

s(t) = 2At/T if −T/2 ≤ t ≤ T/2,   s(t) = 0 otherwise,

whose DFT is shown in Figure 10.3 and whose Fourier transform F[s] is the function

F(ν) = i (AT/(πνT)) [cos(πνT) − sin(πνT)/(πνT)],   ν ∈ R,

which is purely imaginary since s is an odd real function. Notice also that the
functions r and s have a finite support whereas their transforms have an infinite
support (see Figure 10.7). In signal theory this corresponds to saying that the
transform has an infinite bandwidth.
FIGURE 10.7. Fourier transforms of the rectangular (left) and the sawtooth
(right) functions
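The closed form F(ν) = AT sin(πνT)/(πνT) is easy to cross-check by quadrature; in the sketch below (illustrative, with A = 2 and T = 3 picked arbitrarily) the transform integral over [−T/2, T/2] is approximated by the composite trapezoidal rule and compared with the exact expression.

```python
import numpy as np

A, T = 2.0, 3.0
t = np.linspace(-T / 2, T / 2, 4001)

def F_quad(nu):
    # trapezoidal approximation of the integral of r(t) e^{-i 2 pi nu t}
    y = A * np.exp(-2j * np.pi * nu * t)
    h = t[1] - t[0]
    return h * (y.sum() - 0.5 * (y[0] + y[-1]))

def F_exact(nu):
    return A * T * np.sin(np.pi * nu * T) / (np.pi * nu * T)

nus = [0.3, 1.1, 2.7]
transform_err = max(abs(F_quad(nu) - F_exact(nu)) for nu in nus)
```

The slow 1/ν decay of the sinc envelope is the quantitative expression of the "infinite bandwidth" of a compactly supported signal.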

Example 10.6 The Fourier transform of a sinusoidal function is of paramount
interest in signal and communication systems. To start with, consider the constant
function f(t) = A for a given A ∈ R. Since it has an infinite time duration, its
Fourier transform F[A] is the function

F(ν) = lim_{a→∞} ∫_{−a}^{a} A e^{−i2πνt} dt = A lim_{a→∞} sin(2πνa)/(πν),

where the integral above is again the Cauchy principal value of the corresponding
integral over (−∞, ∞). It can be proved that the limit exists and is unique in the
sense of distributions (see Section 12.4), yielding

F(ν) = Aδ(ν), (10.78)

where δ is the so-called Dirac mass, i.e., a distribution that satisfies

∫_{−∞}^{+∞} δ(ξ) φ(ξ) dξ = φ(0) (10.79)

for any function φ continuous at the origin. From (10.78) we see that the transform
of a function with infinite time duration has a finite bandwidth.
Let us now consider the computation of the Fourier transform of the function
f(t) = A cos(2πν₀t), where ν₀ is a fixed frequency. Recalling Euler's formula

cos(θ) = (e^{iθ} + e^{−iθ})/2,   θ ∈ R,

and applying (10.78) twice, we get

F[A cos(2πν₀t)] = (A/2) δ(ν − ν₀) + (A/2) δ(ν + ν₀),

which shows that the spectrum of a sinusoidal function with frequency ν₀ is
centred around ±ν₀ (notice that the transform is even and real since the same
holds for the function f(t)).

It is worth noting that in real life there do not exist functions (i.e., signals)
with infinite duration or bandwidth. Actually, if f(t) is a function whose
value may be considered as "negligible" outside of some interval (t_a, t_b),
then we can assume that the effective duration of f is the length Δt =
t_b − t_a. In a similar manner, if F(ν) is the Fourier transform of f and
it happens that F(ν) may be considered as "negligible" outside of some
interval (ν_a, ν_b), then the effective bandwidth of f is Δν = ν_b − ν_a. Referring
to Figure 10.7, we clearly see that the effective bandwidth of the rectangular
function can be taken as (−10, 10).

10.11.2 (Physical) Linear Systems and Fourier Transform
Mathematically speaking, a physical linear system (LS) can be regarded
as a linear operator S that enjoys the linearity property (10.75). Denoting
by i(t) and u(t) an admissible input function for S and its corresponding
output function, respectively, the LS can be represented as u(t) = S(i(t))
or S : i → u. A special category of LS are the so-called shift-invariant (or
time-invariant) linear systems (ILS), which satisfy the property

S(i(t − t₀)) = u(t − t₀),   ∀t₀ ∈ R,

and for any admissible input function i.
Let S be an ILS system and let f and g be two admissible input functions
for S with w = S(g). An immediate consequence of the linearity and shift-invariance
is that

S((f ∗ g)(t)) = (f ∗ S(g))(t) = (f ∗ w)(t), (10.80)

where ∗ is the convolution operator defined in (10.77).
Assume we take as input function the impulse function δ(t) introduced in
the previous section, and denote by h(t) = S(δ(t)) the corresponding output
through S (usually referred to as the system impulse response function).
Property (10.79) implies that for any function φ, (φ ∗ δ)(t) = φ(t), so that,
recalling (10.80) and taking φ(t) = i(t), we have

u(t) = S(i(t)) = S((i ∗ δ)(t)) = (i ∗ S(δ))(t) = (i ∗ h)(t).

Thus, S can be completely described through its impulse response function.
Equivalently, we can pass to the frequency domain by means of the first
relation in (10.76), obtaining

U(ν) = I(ν)H(ν), (10.81)

where I, U and H are the Fourier transforms of i(t), u(t) and h(t), respectively;
H is the so-called system transfer function.
Relation (10.81) plays a central role in the analysis of linear time-invariant
systems, as it is simpler to deal with the system transfer function than with
the corresponding impulse response function, as demonstrated in the following
example.

Example 10.7 (ideal low-pass filter) An ideal low-pass filter is an ILS
characterized by the transfer function

H(ν) = 1 if |ν| ≤ ν₀/2,   H(ν) = 0 otherwise.

Using the duality property, the impulse response function F^{−1}[H] is

h(t) = ν₀ sin(πν₀t)/(πν₀t).

Given an input signal i(t) with Fourier transform I(ν), the corresponding output
u(t) has a spectrum given by (10.81)

U(ν) = I(ν)H(ν) = I(ν) if |ν| ≤ ν₀/2,   U(ν) = 0 otherwise.

The effect of the filter is to cut off the input frequencies that lie outside the
window |ν| ≤ ν₀/2. •
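Relation (10.81) suggests a direct discrete implementation of the filter: transform the samples, zero the bins with |ν| > ν₀/2, transform back. The sketch below is a hypothetical NumPy version on a two-tone signal (frequencies 3 and 40 Hz, cutoff ν₀/2 = 10 Hz, all chosen arbitrarily):

```python
import numpy as np

N = 256                                  # samples over one second
t = np.arange(N) / N
signal = np.cos(2 * np.pi * 3 * t) + 0.5 * np.cos(2 * np.pi * 40 * t)

S = np.fft.fft(signal)
freqs = np.fft.fftfreq(N, d=1 / N)       # bin frequencies in Hz
S[np.abs(freqs) > 10] = 0.0              # H(nu): keep only |nu| <= nu0/2 = 10 Hz
filtered = np.fft.ifft(S).real

residual = np.max(np.abs(filtered - np.cos(2 * np.pi * 3 * t)))
```

Since both tones fall exactly on DFT bins, the filtered output recovers the low-frequency tone to machine precision, which is the discrete analogue of multiplying by the transfer function H(ν).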

The input/output functions i(t) and u(t) usually denote signals, and
the linear system described by H(ν) is typically a communication system.
Therefore, as pointed out at the end of Section 10.11.1, we are legitimated
in assuming that both i(t) and u(t) have a finite effective duration. In
particular, referring to i(t), we suppose i(t) = 0 if t ∉ [0, T₀). Then, the
computation of the Fourier transform of i(t) yields

I(ν) = ∫_0^{T₀} i(t) e^{−i2πνt} dt.

Letting Δt = T₀/n for n ≥ 1, and approximating the integral above by the
composite trapezoidal formula (9.14), we get

I(ν) ≈ Δt Σ_{k=0}^{n−1} i(kΔt) e^{−i2πνkΔt}.

It can be proved (see, e.g., [Pap62]) that I(ν)/Δt is the Fourier transform
of the so-called sampled signal

i_s(t) = Σ_{k=0}^{n−1} i(kΔt) δ(t − kΔt),

where δ(t − kΔt) is the Dirac mass at kΔt. Then, using the convolution
and the duality properties of the Fourier transform, we get

Ĩ(ν) = Σ_k I(ν − k/Δt), (10.82)

which amounts to replacing I(ν) by its periodic repetition with period
1/Δt. Let J_Δt = [−1/(2Δt), 1/(2Δt)]; then, it suffices to compute (10.82) for ν ∈
J_Δt. This can be done numerically by introducing a uniform discretization
of J_Δt with frequency step ν₀ = 1/(mΔt) for m ≥ 1. By doing so, the
computation of I(ν) requires evaluating the following m + 1 discrete Fourier
transforms (DFT)

I(jν₀) = Δt Σ_{k=0}^{n−1} i(kΔt) e^{−i2πjν₀kΔt},   j = −m/2, ..., m/2.

For an efficient computation of each DFT in the formula above it is crucial
to use the FFT algorithm described in Section 10.9.2.

10.11.3 The Laplace Transform
The Laplace transform can be employed to solve ordinary differential equations
with constant coefficients as well as partial differential equations.

Definition 10.2 Let f ∈ L¹_loc([0, ∞)), i.e., f ∈ L¹([0, T]) for any T > 0.
Let s = σ + iω be a complex variable. The Laplace integral of f is defined as

∫_0^∞ f(t) e^{−st} dt = lim_{T→∞} ∫_0^T f(t) e^{−st} dt.

If this integral exists for some s, it turns out to be a function of s; then,
the Laplace transform L[f] of f is the function

L(s) = ∫_0^∞ f(t) e^{−st} dt.

The following relation between the Laplace and Fourier transforms holds

L(s) = F[e^{−σt} f̃(t)],

where f̃(t) = f(t) if t ≥ 0, while f̃(t) = 0 if t < 0.

Example 10.8 The Laplace transform of the unit step function f(t) = 1 if t > 0,
f(t) = 0 otherwise, is given by

$$ L(s) = \int_0^{\infty} e^{-st}\, dt = \frac{1}{s}. $$

We notice that the Laplace integral exists if σ > 0.

In Example 10.8 the convergence region of the Laplace integral is the half-plane
{Re(s) > 0} of the complex field. This property is quite general, as
stated by the following result.

Property 10.4 If the Laplace transform exists for s = s̄, then it exists
for all s with Re(s) > Re(s̄). Moreover, let E be the set of the real parts
of s such that the Laplace integral exists and denote by λ the infimum of
E. If λ happens to be finite, the Laplace integral exists in the half-plane
Re(s) > λ. If λ = −∞ then it exists for all s ∈ C; λ is called the abscissa
of convergence.

We recall that the Laplace transform enjoys properties completely analogous
to those of the Fourier transform. The inverse Laplace transform is
denoted formally by L⁻¹ and is such that

$$ f(t) = L^{-1}[L(s)]. $$

Example 10.9 Let us consider the ordinary differential equation y′(t) + ay(t) =
g(t) with y(0) = y0. Multiplying by e^{−st}, integrating between 0 and ∞ and passing
to the Laplace transform, yields

$$ sY(s) - y_0 + aY(s) = G(s). \qquad (10.83) $$

Should G(s) be easily computable, (10.83) would furnish Y(s) and then, by applying
the inverse Laplace transform, the unknown function y(t). For instance,
if g(t) is the unit step function, we obtain

$$ y(t) = L^{-1}\!\left[ \frac{1}{a}\left( \frac{1}{s} - \frac{1}{s+a} \right) + \frac{y_0}{s+a} \right] = \frac{1}{a}\left(1 - e^{-at}\right) + y_0\, e^{-at}. $$
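The closed-form solution obtained in Example 10.9 can be checked numerically; in the following Python sketch (not from the book; the values of a and y0 are arbitrary illustrative choices) the ODE residual of y(t) is verified to vanish up to the finite-difference error:

```python
import numpy as np

# Sketch: check that y(t) = (1/a)(1 - e^{-at}) + y0*e^{-at}
# solves y' + a*y = 1 with y(0) = y0 (a, y0 are illustrative values).
a, y0 = 2.0, 3.0
t = np.linspace(0.0, 5.0, 2001)
y = (1.0 / a) * (1.0 - np.exp(-a * t)) + y0 * np.exp(-a * t)

dy = np.gradient(y, t)               # second-order finite differences
residual = dy + a * y - 1.0          # should vanish up to O(h^2)
print(np.max(np.abs(residual[1:-1])))
```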

For an extensive presentation and analysis of the Laplace transform see,
e.g., [Tit37]. In the next section we describe a discrete version of the Laplace
transform, known as the Z-transform.

10.11.4 The Z-Transform
Definition 10.3 Let f be a given function, defined for any t ≥ 0, and let
∆t > 0 be a given time step. The function

$$ Z(z) = \sum_{n=0}^{\infty} f(n\Delta t)\, z^{-n}, \qquad z \in \mathbb{C}, \qquad (10.84) $$

is called the Z-transform of the sequence {f(n∆t)} and is denoted by
Z[f(n∆t)].

The parameter ∆t is the sampling time step of the sequence of samples
f(n∆t). The infinite sum (10.84) converges if

$$ |z| > R = \limsup_{n \to \infty} \sqrt[n]{|f(n\Delta t)|}. $$
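As an illustration (not from the text), for the samples f(n∆t) = e^{−n∆t} the series (10.84) is geometric, with R = e^{−∆t}, and sums to z/(z − e^{−∆t}) for |z| > R; a short Python check:

```python
import math

# Illustrative check: for f(n*dt) = exp(-n*dt) the Z-transform series
# is geometric, R = exp(-dt), and Z(z) = z/(z - exp(-dt)) for |z| > R.
dt = 0.1
R = math.exp(-dt)

def Z_partial(z, N):
    """Partial sum sum_{n=0}^{N-1} f(n*dt) z^{-n} of the Z-transform series."""
    return sum(math.exp(-n * dt) * z ** (-n) for n in range(N))

z = 2.0                                       # |z| > R: convergence
closed_form = z / (z - math.exp(-dt))
print(abs(Z_partial(z, 200) - closed_form))   # truncation error is negligible
```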

It is possible to deduce the Z-transform from the Laplace transform as
follows. Denoting by f0 (t) the piecewise constant function such that f0 (t) =
f(n∆t) for t ∈ (n∆t, (n + 1)∆t), the Laplace transform L[f0] of f0 is the function

$$ L(s) = \sum_{n=0}^{\infty} \int_{n\Delta t}^{(n+1)\Delta t} f(n\Delta t)\, e^{-st}\, dt
        = \sum_{n=0}^{\infty} f(n\Delta t)\, \frac{e^{-ns\Delta t} - e^{-(n+1)s\Delta t}}{s}
        = \frac{1 - e^{-s\Delta t}}{s} \sum_{n=0}^{\infty} f(n\Delta t)\, e^{-ns\Delta t}. $$

The discrete Laplace transform Z^d[f0] of f0 is the function

$$ Z^{d}(s) = \sum_{n=0}^{\infty} f(n\Delta t)\, e^{-ns\Delta t}. $$

Then, the Z-transform of the sequence {f(n∆t), n = 0, 1, . . .} coincides
with the discrete Laplace transform of f0 up to the change of variable
z = e^{s∆t}. The Z-transform enjoys properties (linearity, scaling,
convolution and product) similar to those already seen in the continuous case.
The inverse Z-transform is denoted by Z⁻¹ and is defined as

$$ f(n\Delta t) = Z^{-1}[Z(z)]. $$

The practical computation of Z⁻¹ can be carried out by resorting to classical
techniques of complex analysis (for example, using the Laurent expansion
or the Cauchy theorem for residue evaluation), coupled with an
extensive use of tables (see, e.g., [Pou96]).

10.12 The Wavelet Transform
This technique, originally developed in the area of signal processing, has
subsequently been extended to many different branches of approximation
theory, including the solution of differential equations. It is based on the
so-called wavelets, which are functions generated by an elementary wavelet
through translations and dilations. We shall limit ourselves to a brief introduction
to univariate wavelets and their transform in both the continuous
and discrete cases, referring to [DL92], [Dau88] and to the references cited
therein for a detailed presentation and analysis.

10.12.1 The Continuous Wavelet Transform
Any function

$$ h_{s,\tau}(t) = \frac{1}{\sqrt{s}}\, h\!\left(\frac{t - \tau}{s}\right), \qquad t \in \mathbb{R}, \qquad (10.85) $$

that is obtained from a reference function h ∈ L²(ℝ) by means of translations
by a translation factor τ and dilations by a positive scaling factor s is called
a wavelet. The function h is called an elementary wavelet.
Its Fourier transform, written in terms of ω = 2πν, is

$$ H_{s,\tau}(\omega) = \sqrt{s}\, H(s\omega)\, e^{-i\omega\tau}, \qquad (10.86) $$

where i denotes the imaginary unit and H(ω) is the Fourier transform of
the elementary wavelet. A dilation t/s (s > 1) in the real domain therefore
produces a contraction sω in the frequency domain, so the factor
1/s plays the role of the frequency ν in the Fourier transform (see Section
10.11.1). In wavelet theory s is usually referred to as the scale. Formula
(10.86) is known as the filter of the wavelet transform.

Definition 10.4 Given a function f ∈ L²(ℝ), its continuous wavelet transform
W_f = W[f] is a decomposition of f(t) onto a wavelet basis {h_{s,τ}(t)},
that is

$$ W_f(s, \tau) = \int_{-\infty}^{+\infty} f(t)\, \overline{h_{s,\tau}(t)}\, dt, \qquad (10.87) $$

where the overline denotes complex conjugation.
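A minimal numerical sketch of (10.87) follows (not the book's code; the real-valued "Mexican hat" elementary wavelet h(x) = (1 − x²)e^{−x²/2} and the test signal are illustrative choices):

```python
import numpy as np

# Sketch: evaluate the continuous wavelet transform (10.87) by quadrature,
# using the Mexican-hat elementary wavelet (illustrative choice).
def h(x):
    return (1.0 - x**2) * np.exp(-x**2 / 2.0)

t = np.linspace(-20.0, 20.0, 4001)
dt = t[1] - t[0]
f = np.exp(-(t - 3.0)**2)          # a bump centred at t = 3

def W(s, tau):
    # W_f(s, tau) = int f(t) * conj(h_{s,tau}(t)) dt; h is real here
    w = h((t - tau) / s) / np.sqrt(s)
    return dt * np.sum(f * w)

# |W_f| is largest when the translation tau sits on the bump of the signal
taus = [-5.0, 0.0, 3.0, 8.0]
vals = [abs(W(1.0, tau)) for tau in taus]
```

The transform magnitude peaks at τ = 3, i.e., where the wavelet overlaps the localized feature of f.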
When t denotes the time variable, the wavelet transform of f(t) is a function
of the two variables s (scale) and τ (time shift); as such, it is a representation
of f in the time-scale space and is usually referred to as the time-scale
joint representation of f. The time-scale representation is the analogue of
the time-frequency representation introduced in Fourier analysis. The
latter representation has an intrinsic limitation: the product of the resolution
in time ∆t and the resolution in frequency ∆ω must satisfy the
following constraint (Heisenberg inequality)
$$ \Delta t\, \Delta\omega \ge \frac{1}{2}, \qquad (10.88) $$
which is the counterpart of the Heisenberg uncertainty principle in quantum
mechanics. This inequality states that a signal cannot be represented as
a point in the time-frequency space. We can only determine its position
within a rectangle of area ∆t∆ω in the time-frequency space.
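The bound in (10.88) is attained by Gaussian signals; the following sketch (an illustration, assuming the spreads are measured as the second-moment widths used later in this section for ∆ω) verifies this numerically:

```python
import numpy as np

# Sketch: the Gaussian g(t) = exp(-t^2/2) attains Dt*Dw = 1/2 exactly.
t = np.linspace(-10.0, 10.0, 4001)
g = np.exp(-t**2 / 2.0)

# time spread as a normalized second moment
Dt = np.sqrt(np.sum(t**2 * g**2) / np.sum(g**2))

# the Fourier transform of a Gaussian is a Gaussian: G(w) = sqrt(2*pi) e^{-w^2/2}
w = t                                  # reuse the same grid for frequencies
G = np.sqrt(2.0 * np.pi) * np.exp(-w**2 / 2.0)
Dw = np.sqrt(np.sum(w**2 * G**2) / np.sum(G**2))

print(Dt * Dw)                         # 0.5 up to discretization error
```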
The wavelet transform (10.87) can be rewritten in terms of the Fourier
transform F (ω) of f as
$$ W_f(s, \tau) = \frac{\sqrt{s}}{2\pi} \int_{-\infty}^{+\infty} F(\omega)\, \overline{H(s\omega)}\, e^{i\omega\tau}\, d\omega, $$


which shows that the wavelet transform is a bank of wavelet filters characterized
by different scales. More precisely, if the scale is small the wavelet
is concentrated in time and the wavelet transform provides a detailed description
of f(t) (which is the signal). Conversely, if the scale is large, the
wavelet transform is able to resolve only the large-scale details of f. Thus,
the wavelet transform can be regarded as a bank of multiresolution filters.

The theoretical properties of this transform do not depend on the particular
elementary wavelet that is considered. Hence, specific bases of wavelets
can be derived for specific applications. Some examples of elementary
wavelets are reported below.

Example 10.10 (Haar wavelets) These functions can be obtained by choosing
as the elementary wavelet the Haar function defined as

$$ h(x) = \begin{cases} 1 & \text{if } x \in (0, \tfrac{1}{2}), \\ -1 & \text{if } x \in (\tfrac{1}{2}, 1), \\ 0 & \text{otherwise}. \end{cases} $$

Its Fourier transform is the complex-valued function

$$ H(\omega) = \frac{4i}{\omega}\, e^{-i\omega/2} \sin^2(\omega/4)
             = \frac{2i}{\omega}\, e^{-i\omega/2} \left(1 - \cos(\omega/2)\right), $$
whose modulus is symmetric with respect to the origin (see Figure 10.8). The
bases that are obtained from this wavelet are not used in practice due to their
ineffective localization properties in the frequency domain.
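The closed form above can be checked against a direct quadrature of the transform (an illustrative sketch, not from the text):

```python
import numpy as np

# Check of H(w) = (4i/w) e^{-iw/2} sin^2(w/4) against direct quadrature
# of int_0^1 h(x) e^{-iwx} dx for the Haar function h.
def H_closed(w):
    return 4j * np.exp(-1j * w / 2.0) * np.sin(w / 4.0)**2 / w

def H_quad(w, n=20000):
    x = np.linspace(0.0, 1.0, n + 1)
    h = np.where(x < 0.5, 1.0, -1.0)          # Haar function on [0, 1]
    g = h * np.exp(-1j * w * x)
    dx = x[1] - x[0]
    return dx * (np.sum(g) - 0.5 * (g[0] + g[-1]))   # trapezoidal rule

w = 3.0
print(abs(H_closed(w) - H_quad(w)))           # small quadrature error
```

The modulus |H(ω)| is even in ω, which is the symmetry visible in Figure 10.8.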







FIGURE 10.8. The Haar wavelet (left) and the modulus of its Fourier transform (right)

Example 10.11 (Morlet wavelets) The Morlet wavelet is defined as follows
(see [MMG87])

$$ h(x) = e^{i\omega_0 x}\, e^{-x^2/2}. $$

Thus, it is a complex-valued function whose real part has a real positive Fourier
transform, symmetric with respect to the origin, given by

$$ H(\omega) = \sqrt{\frac{\pi}{2}} \left( e^{-(\omega - \omega_0)^2/2} + e^{-(\omega + \omega_0)^2/2} \right). $$
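This transform can be checked numerically (an illustrative sketch, not from the text; ω0 = 2 is an arbitrary choice):

```python
import numpy as np

# Check: the Fourier transform of the real part cos(w0*x)*exp(-x^2/2)
# of the Morlet wavelet is sqrt(pi/2)*(exp(-(w-w0)^2/2) + exp(-(w+w0)^2/2)).
w0 = 2.0
x = np.linspace(-12.0, 12.0, 8001)
dx = x[1] - x[0]
re_h = np.cos(w0 * x) * np.exp(-x**2 / 2.0)

def H_num(w):
    # the integrand is real and even, so the transform is a cosine integral
    return dx * np.sum(re_h * np.cos(w * x))

def H_exact(w):
    return np.sqrt(np.pi / 2.0) * (np.exp(-(w - w0)**2 / 2.0)
                                   + np.exp(-(w + w0)**2 / 2.0))

print(abs(H_num(2.0) - H_exact(2.0)))   # small discretization error
```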

We point out that the presence of the dilation factor allows the wavelets
to easily handle possible discontinuities or singularities in f. Indeed, using
multiresolution analysis, the signal, properly divided into frequency
bandwidths, can be processed at each frequency by suitably tuning the
scale factor of the wavelets.

FIGURE 10.9. The real part of the Morlet wavelet (left) and the real part of the
corresponding Fourier transforms (right) for ω0 = 1 (solid line), ω0 = 2.5 (dashed
line) and ω0 = 5 (dotted line)

Recalling what was already pointed out in Section 10.11.1, the time localization
of the wavelet gives rise to a filter with infinite bandwidth. In
particular, defining the bandwidth ∆ω of the wavelet filter as

$$ \Delta\omega = \left( \int_{-\infty}^{+\infty} \omega^2 |H(\omega)|^2\, d\omega \Big/ \int_{-\infty}^{+\infty} |H(\omega)|^2\, d\omega \right)^{1/2}, $$

the bandwidth of the wavelet filter with scale equal to s is

$$ \Delta\omega_s = \left( \int_{-\infty}^{+\infty} \omega^2 |H(s\omega)|^2\, d\omega \Big/ \int_{-\infty}^{+\infty} |H(s\omega)|^2\, d\omega \right)^{1/2} = \frac{\Delta\omega}{s}. $$

Consequently, the quality factor Q of the wavelet filter, defined as the ratio
between the bandwidth of the filter and its central frequency 1/s, is independent
of s, since

$$ Q = \frac{\Delta\omega_s}{1/s} = \Delta\omega, $$

provided that (10.88) holds. At low frequencies, corresponding to large
values of s, the wavelet filter has a small bandwidth and a large temporal
width (called the window), with a low resolution. Conversely, at high frequencies
the filter has a large bandwidth and a small temporal window with a high
resolution. Thus, the resolution furnished by the wavelet analysis increases
with the frequency of the signal. This property of adaptivity makes the
wavelets a crucial tool in the analysis of unsteady signals or signals with
fast transients, for which the standard Fourier analysis turns out to be inadequate.

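The scaling law ∆ω_s = ∆ω/s can be verified numerically; the Gaussian transfer function used in this sketch is an illustrative choice, not from the text:

```python
import numpy as np

# Sketch: check Delta_omega_s = Delta_omega / s, taking H(w) = exp(-w^2/2)
# as an illustrative filter transfer function.
w = np.linspace(-30.0, 30.0, 12001)

def bandwidth(s):
    Hs2 = np.exp(-(s * w)**2)                # |H(s*w)|^2 for the Gaussian H
    return np.sqrt(np.sum(w**2 * Hs2) / np.sum(Hs2))

ratio = bandwidth(2.0) / bandwidth(1.0)
print(ratio)                                  # 0.5, i.e. Delta_omega / 2
```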
10.12.2 Discrete and Orthonormal Wavelets
The continuous wavelet transform maps a function of one variable into a
two-dimensional representation in the time-scale domain. In many applications
this description is excessively rich. Resorting to discrete wavelets is
an attempt to represent a function using a finite (and small) number of parameters.
A discrete wavelet is a continuous wavelet that is generated by using
discrete scale and translation factors. For s0 > 1, denote by s = s0^j the
scale factors; the translation factors usually depend on the scale factors by
setting τ = kτ0 s0^j, with τ0 ∈ ℝ. The corresponding discrete wavelet is

$$ h_{j,k}(t) = s_0^{-j/2}\, h\!\left(s_0^{-j}\left(t - k\tau_0 s_0^{j}\right)\right) = s_0^{-j/2}\, h\!\left(s_0^{-j} t - k\tau_0\right). $$

The scale factor s0^j corresponds to the magnification or the resolution of
the observation, while the translation factor τ0 is related to the location where the
observations are made. If one looks at very small details, the magnification
must be large, which corresponds to a large negative index j. In this case the
step of translation is small and the wavelet is very concentrated around the
observation point. For large and positive j, the wavelet is spread out and
large translation steps are used.
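As an illustration (assumptions: s0 = 2, τ0 = 1 and the Haar elementary wavelet, i.e., the classical dyadic choice), the resulting discrete wavelets h_{j,k}(t) = 2^{−j/2} h(2^{−j}t − k) form an orthonormal set, which can be checked numerically:

```python
import numpy as np

# Sketch (dyadic choice s0 = 2, tau0 = 1, Haar elementary wavelet):
# h_{j,k}(t) = 2^{-j/2} h(2^{-j} t - k) are orthonormal in L^2.
def haar(x):
    return np.where((x >= 0.0) & (x < 0.5), 1.0,
                    np.where((x >= 0.5) & (x < 1.0), -1.0, 0.0))

def h_jk(t, j, k):
    return 2.0**(-j / 2.0) * haar(2.0**(-j) * t - k)

t = np.linspace(-4.0, 8.0, 120001)
dt = t[1] - t[0]

def inner(u, v):
    return dt * np.sum(u * v)        # quadrature for the L^2 inner product

n00 = inner(h_jk(t, 0, 0), h_jk(t, 0, 0))   # unit norm
c01 = inner(h_jk(t, 0, 0), h_jk(t, 0, 1))   # zero: disjoint supports
c10 = inner(h_jk(t, 0, 0), h_jk(t, 1, 0))   # zero: different scales
print(n00, c01, c10)
```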

