
in Section 7.2.7.
The criteria for selecting the parameters αk that have been discussed in Section 7.2.3 can now be usefully employed to devise globally convergent methods. Property 7.5 ensures that there exist values of αk satisfying (7.31), (7.33) or (7.31), (7.32).
Let us then assume that a sequence of iterates x(k), generated by a descent method for a given x(0), converges to a vector x*. This vector will not be, in general, a critical point for f. The following result gives some conditions on the directions d(k) which ensure that the limit x* of the sequence is also a critical point of f.

Property 7.7 (Convergence) Let f : Rn → R be a continuously differentiable function, and assume that there exists L > 0 such that

‖∇f(x) − ∇f(y)‖₂ ≤ L ‖x − y‖₂.

Then, if x(k) is a sequence generated by a gradient-like method which fulfills (7.31) and (7.33), one (and only one) of the following events can occur:

1. ∇f(x(k)) = 0 for some k;

2. lim_{k→∞} f(x(k)) = −∞;

3. lim_{k→∞} ∇f(x(k))T d(k) / ‖d(k)‖₂ = 0.

Thus, except in the pathological cases where the directions d(k) become too large or too small with respect to ∇f(x(k)) or, even, are orthogonal to ∇f(x(k)), any limit of the sequence x(k) is a critical point of f.
The convergence result for the sequence x(k) can also be extended to the
sequence f (x(k) ). Indeed, the following result holds.

Property 7.8 Let x(k) be a convergent sequence generated by a gradient-
like method, i.e., such that any limit of the sequence is also a critical point
of f. If the sequence x(k) is bounded, then ∇f(x(k)) tends to zero as k → ∞.
For the proofs of the above results, see [Wol69] and [Wol71].


7.2.7 Secant-like methods
In quasi-Newton methods the Hessian matrix H is replaced by a suitable approximation. Precisely, the generic iterate is

x(k+1) = x(k) − Bk⁻¹ ∇f(x(k)) = x(k) + s(k).

Assume that f : Rn → R is of class C2 on an open convex set D ⊂ Rn. In such a case, H is symmetric and, as a consequence, approximants Bk of H ought to be symmetric. Moreover, if Bk were symmetric at a point x(k), we would also like the next approximant Bk+1 to be symmetric at x(k+1) = x(k) + s(k).
To generate Bk+1 starting from Bk, consider the Taylor expansion

∇f(x(k)) = ∇f(x(k+1)) + Bk+1(x(k) − x(k+1)),

from which we get

Bk+1 s(k) = y(k),   with y(k) = ∇f(x(k+1)) − ∇f(x(k)).

Using again a series expansion of B, we end up with the following first-order approximation of H

Bk+1 = Bk + (y(k) − Bk s(k)) cT / (cT s(k)),   (7.42)

where c ∈ Rn and having assumed that cT s(k) ≠ 0. We notice that taking c = s(k) yields Broyden's method, already discussed in Section 7.1.4 in the case of systems of nonlinear equations.
Since (7.42) does not guarantee that Bk+1 is symmetric, it must be suitably modified. A way of constructing a symmetric approximant Bk+1 consists of choosing c = y(k) − Bk s(k) in (7.42), assuming that (y(k) − Bk s(k))T s(k) ≠ 0. By so doing, the following symmetric first-order approximation is obtained

Bk+1 = Bk + (y(k) − Bk s(k))(y(k) − Bk s(k))T / ((y(k) − Bk s(k))T s(k)).   (7.43)

From a computational standpoint, disposing of an approximation for H is not completely satisfactory, since the inverse of the approximation of H appears in the iterative methods that we are dealing with. Using the Sherman-Morrison formula (3.57), with Ck = Bk⁻¹, yields the following recursive formula for the computation of the inverse

Ck+1 = Ck + (s(k) − Ck y(k))(s(k) − Ck y(k))T / ((s(k) − Ck y(k))T y(k)),   k = 0, 1, . . .   (7.44)

having assumed that y(k) ≠ B s(k), where B is a symmetric nonsingular matrix, and that (s(k) − Ck y(k))T y(k) ≠ 0.
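For the reader who wishes to experiment with the update (7.44), a minimal MATLAB sketch of the resulting secant-like iteration is reported below. It is an illustration, not one of the Programs of this book: the gradient handle grad, the data x0, C0, tol, kmax and the safeguard threshold are assumptions, and the correction is simply skipped when the denominator (s(k) − Ck y(k))T y(k) is too small (see the remark that follows).

% Illustrative secant-like iteration based on the inverse update (7.44).
% grad returns the gradient of f; x0, C0, tol, kmax are assumed input data.
function [x,k] = secantlike(grad,x0,C0,tol,kmax)
x = x0; C = C0; g = grad(x);
for k = 1:kmax
  s = -C*g;                           % s(k) = -C_k grad f(x(k))
  x = x + s;                          % x(k+1) = x(k) + s(k)
  gnew = grad(x); y = gnew - g;       % y(k) = grad f(x(k+1)) - grad f(x(k))
  w = s - C*y; den = w'*y;
  if abs(den) > 1e-8*norm(w)*norm(y)  % skip the update when (7.44) is unsafe
    C = C + (w*w')/den;               % symmetric rank-one correction of C_k
  end
  g = gnew;
  if norm(g) <= tol, return, end
end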
An algorithm that employs the approximations (7.43) or (7.44) is potentially unstable when (s(k) − Ck y(k))T y(k) ≈ 0, due to rounding errors. For this reason, it is convenient to set up the previous scheme in a more stable form. To this end, instead of (7.42), we introduce the approximation

Bk+1^(1) = Bk + (y(k) − Bk s(k)) cT / (cT s(k)),

then we define Bk+1^(2) as being the symmetric part

Bk+1^(2) = (Bk+1^(1) + (Bk+1^(1))T) / 2.
The procedure can be iterated as follows

Bk+1^(2j+1) = Bk+1^(2j) + (y(k) − Bk+1^(2j) s(k)) cT / (cT s(k)),
                                                                        (7.45)
Bk+1^(2j+2) = (Bk+1^(2j+1) + (Bk+1^(2j+1))T) / 2,

with k = 0, 1, . . . and having set Bk+1^(0) = Bk. It can be shown that the limit of (7.45) as j tends to infinity is

lim_{j→∞} Bk+1^(j) = Bk+1 = Bk + [(y(k) − Bk s(k)) cT + c (y(k) − Bk s(k))T] / (cT s(k))
                               − [(y(k) − Bk s(k))T s(k) / (cT s(k))²] c cT,   (7.46)

having assumed that cT s(k) ≠ 0. If c = s(k), the method employing (7.46) is known as the symmetric Powell-Broyden method. Denoting by BSPB the corresponding matrix Bk+1, it can be shown that BSPB is the unique solution to the problem:

find B̄ such that ‖B̄ − B‖F is minimized,

where B̄ s(k) = y(k) and ‖·‖F is the Frobenius norm.
As for the error made approximating H(x(k+1)) with BSPB, it can be proved that

‖BSPB − H(x(k+1))‖F ≤ ‖Bk − H(x(k))‖F + 3L ‖s(k)‖,

where it is assumed that H is Lipschitz continuous, with Lipschitz constant L, and that the iterates x(k+1) and x(k) belong to D.

To deal with the particular case in which the Hessian matrix is not only symmetric but also positive definite, we refer to [DS83], Section 9.2.



7.3 Constrained Optimization
The simplest case of constrained optimization can be formulated as follows. Given f : Rn → R,

minimize f(x), with x ∈ Ω ⊂ Rn.   (7.47)

More precisely, the point x* is said to be a global minimizer in Ω if it satisfies (7.47), while it is a local minimizer if ∃R > 0 such that

f(x*) ≤ f(x),   ∀x ∈ B(x*; R) ⊂ Ω.

Existence of solutions to problem (7.47) is, for instance, ensured by the Weierstrass theorem, in the case in which f is continuous and Ω is a closed and bounded set. Under the assumption that Ω is a convex set, the following optimality conditions hold.

Property 7.9 Let Ω ⊂ Rn be a convex set, x* ∈ Ω and f ∈ C1(B(x*; R)), for a suitable R > 0. Then:

1. if x* is a local minimizer of f then

∇f(x*)T (x − x*) ≥ 0,   ∀x ∈ Ω;   (7.48)

2. moreover, if f is convex on Ω (see (7.21)) and (7.48) is satisfied, then x* is a global minimizer of f.

We recall that f : Ω → R is a strongly convex function if ∃ρ > 0 such that

f[αx + (1 − α)y] ≤ αf(x) + (1 − α)f(y) − α(1 − α)ρ ‖x − y‖₂²,   (7.49)

∀x, y ∈ Ω and ∀α ∈ [0, 1]. The following result holds.

Property 7.10 Let Ω ⊂ Rn be a closed and convex set and f be a strongly convex function in Ω. Then there exists a unique local minimizer x* ∈ Ω.
Throughout this section, we refer to [Avr76], [Ber82], [CCP70], [Lue73] and [Man69] for the proofs of the quoted results and further details.

A remarkable instance of (7.47) is the following problem: given f : Rn → R,

minimize f(x), under the constraint that h(x) = 0,   (7.50)

where h : Rn → Rm, with m ≤ n, is a given function of components h1, . . . , hm. The analogues of critical points in problem (7.50) are called the regular points.

Definition 7.2 A point x* ∈ Rn, such that h(x*) = 0, is said to be regular if the column vectors of the Jacobian matrix Jh(x*) are linearly independent, having assumed that hi ∈ C1(B(x*; R)), for a suitable R > 0 and i = 1, . . . , m.
Our aim now is to convert problem (7.50) into an unconstrained minimization problem of the form (7.2), to which the methods introduced in Section 7.2 can be applied.
For this purpose, we introduce the Lagrangian function L : Rn+m → R

L(x, λ) = f(x) + λT h(x),

where the vector λ is called the Lagrange multiplier. Moreover, let us denote by JL the Jacobian matrix associated with L, but where the partial derivatives are only taken with respect to the variables x1, . . . , xn. The link between (7.2) and (7.50) is then expressed by the following result.

Property 7.11 Let x* be a local minimizer for (7.50) and suppose that, for a suitable R > 0, f, hi ∈ C1(B(x*; R)), for i = 1, . . . , m. Then there exists a unique vector λ* ∈ Rm such that JL(x*, λ*) = 0.
Conversely, assume that x* ∈ Rn satisfies h(x*) = 0 and that, for a suitable R > 0 and i = 1, . . . , m, f, hi ∈ C2(B(x*; R)). Let HL be the matrix of entries ∂²L/∂xi∂xj for i, j = 1, . . . , n. If there exists a vector λ* ∈ Rm such that JL(x*, λ*) = 0 and

zT HL(x*, λ*) z > 0   ∀z ≠ 0 with ∇h(x*)T z = 0,

then x* is a strict local minimizer of (7.50).
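As a small numerical illustration of the first-order condition JL(x*, λ*) = 0 (the data below are a hypothetical toy problem, not taken from the text), consider minimizing f(x) = x1² + x2² under h(x) = x1 + x2 − 1 = 0, whose minimizer x* = (1/2, 1/2)T has multiplier λ* = −1:

% Check of J_L(x*,lambda*) = 0 for the toy problem f(x)=x1^2+x2^2, h(x)=x1+x2-1.
h     = @(x) x(1) + x(2) - 1;
gradL = @(x,lambda) [2*x(1) + lambda; 2*x(2) + lambda];  % derivatives of L in x only
xstar = [0.5; 0.5]; lstar = -1;
disp(norm(gradL(xstar,lstar)))   % vanishes up to roundoff
disp(h(xstar))                   % the constraint is satisfied as well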



The last class of problems that we are going to deal with includes the case where inequality constraints are also present, i.e.: given f : Rn → R,

minimize f(x), under the constraint that h(x) = 0 and g(x) ≤ 0,   (7.51)

where h : Rn → Rm, with m ≤ n, and g : Rn → Rr are two given functions. It is understood that g(x) ≤ 0 means gi(x) ≤ 0 for i = 1, . . . , r. Inequality constraints give rise to some extra formal complication with respect to the case previously examined, but do not prevent converting the solution of (7.51) into the minimization of a suitable Lagrangian function.
In particular, Definition 7.2 becomes

Definition 7.3 Assume that hi, gj ∈ C1(B(x*; R)) for a suitable R > 0 with i = 1, . . . , m and j = 1, . . . , r, and denote by J(x*) the set of indices j such that gj(x*) = 0. A point x* ∈ Rn such that h(x*) = 0 and g(x*) ≤ 0 is said to be regular if the column vectors of the Jacobian matrix Jh(x*) together with the vectors ∇gj(x*), j ∈ J(x*), form a set of linearly independent vectors.

Finally, an analogue of Property 7.11 holds, provided that the following Lagrangian function is used

M(x, λ, µ) = f(x) + λT h(x) + µT g(x)

instead of L, and that further assumptions on the constraints are made.
For the sake of simplicity, we report in this case only the following necessary condition for optimality of problem (7.51) to hold.

Property 7.12 Let x* be a regular local minimizer for (7.51) and suppose that, for a suitable R > 0, f, hi, gj ∈ C1(B(x*; R)) with i = 1, . . . , m, j = 1, . . . , r. Then, there exist only two vectors λ* ∈ Rm and µ* ∈ Rr such that JM(x*, λ*, µ*) = 0 with µ*j ≥ 0 and µ*j gj(x*) = 0 ∀j = 1, . . . , r.



7.3.1 Kuhn-Tucker Necessary Conditions for Nonlinear
Programming
In this section we recall some results, known as Kuhn-Tucker conditions [KT51], that ensure in general the existence of a local solution for the nonlinear programming problem. Under suitable assumptions they also guarantee the existence of a global solution. Throughout this section we suppose that a minimization problem can always be reformulated as a maximization one.
314 7. Nonlinear Systems and Numerical Optimization

Let us consider the general nonlinear programming problem:

given f : Rn → R, maximize f(x), subject to

gi(x) ≤ bi,   i = 1, . . . , l,
gi(x) ≥ bi,   i = l + 1, . . . , k,            (7.52)
gi(x) = bi,   i = k + 1, . . . , m,
x ≥ 0.

A vector x that satisfies the constraints above is called a feasible solution of (7.52) and the set of the feasible solutions is called the feasible region. We assume henceforth that f, gi ∈ C1(Rn), i = 1, . . . , m, and define the sets I= = {i : gi(x*) = bi}, I≠ = {i : gi(x*) ≠ bi}, J= = {i : x*i = 0}, J> = {i : x*i > 0}, having denoted by x* a local maximizer of f. We associate with (7.52) the following Lagrangian

L(x, λ) = f(x) + Σ_{i=1}^{m} λi [bi − gi(x)] − Σ_{i=m+1}^{m+n} λi x_{i−m}.

The following result can be proved.

Property 7.13 (Kuhn-Tucker conditions I and II) If f has a constrained local maximum at the point x = x*, it is necessary that a vector λ* ∈ Rm+n exists such that (first Kuhn-Tucker condition)

∇x L(x*, λ*) ≤ 0,

where strict equality holds for every component i ∈ J>. Moreover (second Kuhn-Tucker condition)

(∇x L(x*, λ*))T x* = 0.


The other two necessary Kuhn-Tucker conditions are as follows.

Property 7.14 Under the same hypothesis as in Property 7.13, the third Kuhn-Tucker condition requires that

[∇λ L(x*, λ*)]i ≥ 0,   i = 1, . . . , l,
[∇λ L(x*, λ*)]i ≤ 0,   i = l + 1, . . . , k,
[∇λ L(x*, λ*)]i = 0,   i = k + 1, . . . , m.

Moreover (fourth Kuhn-Tucker condition)

(∇λ L(x*, λ*))T λ* = 0.

It is worth noticing that the Kuhn-Tucker conditions hold provided that the vector λ* exists. To ensure this, it is necessary to introduce a further geometric condition that is known as constraint qualification (see [Wal75], p. 48).
We conclude this section with the following fundamental theorem, which establishes when the Kuhn-Tucker conditions become also sufficient for the existence of a global maximizer for f.

Property 7.15 Assume that the function f in (7.52) is a concave function (i.e., −f is convex) in the feasible region. Suppose also that the point (x*, λ*) satisfies all the Kuhn-Tucker necessary conditions and that the functions gi for which λ*i > 0 are convex while those for which λ*i < 0 are concave. Then f(x*) is the constrained global maximum of f for problem (7.52).


7.3.2 The Penalty Method
The basic idea of this method is to eliminate, partly or completely, the constraints in order to transform the constrained problem into an unconstrained one. This new problem is characterized by the presence of a parameter that yields a measure of the accuracy at which the constraint is actually imposed.
Let us consider the constrained problem (7.50), assuming that we are searching for the solution x* only in Ω ⊂ Rn. Suppose that such a problem admits at least one solution in Ω and write it in the following penalized form

minimize Lα(x) for x ∈ Ω,   (7.53)

where

Lα(x) = f(x) + (1/2) α ‖h(x)‖₂².

The function Lα : Rn → R is called the penalized Lagrangian, and α is called the penalty parameter. It is clear that if the constraint were exactly satisfied then minimizing f would be equivalent to minimizing Lα.
The penalty method is an iterative technique for solving (7.53).
For k = 0, 1, . . . , until convergence, one must solve the sequence of problems

minimize Lαk(x) with x ∈ Ω,   (7.54)

where {αk} is a monotonically increasing sequence of positive penalty parameters such that αk → ∞ as k → ∞. As a consequence, after choosing αk, at each step of the penalty process we have to solve a minimization problem with respect to the variable x, leading to a sequence of values x*k, solutions to (7.54). By doing so, the objective function Lαk(x) tends to infinity, unless h(x) is equal to zero.

The minimization problems can then be solved by one of the methods introduced in Section 7.2. The following property ensures the convergence of the penalty method in the form (7.53).

Property 7.16 Assume that f : Rn → R and h : Rn → Rm, with m ≤ n, are continuous functions on a closed set Ω ⊂ Rn and suppose that the sequence of penalty parameters αk > 0 is monotonically divergent. Finally, let x*k be the global minimizer of problem (7.54) at step k. Then, taking the limit as k → ∞, the sequence x*k converges to x*, which is a global minimizer of f in Ω and satisfies the constraint h(x*) = 0.
Regarding the selection of the parameters αk, it can be shown that large values of αk make the minimization problem in (7.54) ill-conditioned, thus making its solution quite prohibitive unless the initial guess is particularly close to x*. On the other hand, the sequence αk must not grow too slowly, since this would negatively affect the overall convergence of the method.
A choice that is commonly made in practice is to pick a not too large value of α0 and then set αk = βαk−1 for k > 0, where β is an integer number between 4 and 10 (see [Ber82]). Finally, the starting point for the numerical method used to solve the minimization problem (7.54) can be set equal to the last computed iterate.
The penalty method is implemented in Program 63. This requires as
input parameters the functions f, h, an initial value alpha0 for the penalty
parameter and the number beta.

Program 63 - lagrpen : Penalty method
function [x,vinc,nit]=lagrpen(x0,alpha0,beta,f,h,toll)
x = x0; [r,c]=size(h); vinc = 0;
for i=1:r, vinc = max(vinc,eval(h(i,1:c))); end
norm2h=['(',h(1,1:c),')^2'];
for i=2:r, norm2h=[norm2h,'+(',h(i,1:c),')^2']; end
alpha = alpha0; options(1)=0; options(2)=toll*0.1; nit = 0;
while vinc > toll
g=[f,'+0.5*',num2str(alpha,16),'*',norm2h];
[x]=fmins(g,x,options);
vinc=0; nit = nit + 1;
for i=1:r, vinc = max(vinc,eval(h(i,1:c))); end
alpha=alpha*beta;
end


Example 7.6 Let us employ the penalty method to compute the minimizer of f(x) = 100(x2 − x1²)² + (1 − x1)² under the constraint h(x) = (x1 + 0.5)² + (x2 + 0.5)² − 0.25 = 0. The crosses in Figure 7.3 denote the sequence of iterates computed by Program 63 starting from x(0) = (1, 1)T and choosing α0 = 0.1, β = 6. The method converges in 12 iterations to the value x = (−0.2463, −0.0691)T, satisfying the constraint up to a tolerance of 10⁻⁴. •
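A possible call of Program 63 reproducing the setting of this example is sketched below. Since lagrpen builds the penalized functional from strings evaluated in the components x(1), x(2), the objective and the constraint are encoded accordingly; this calling sequence is an illustration consistent with the input list of the program, not a transcript from the reference.

% Illustrative call of Program 63 with the data of Example 7.6.
f = '100*(x(2)-x(1)^2)^2+(1-x(1))^2';      % objective written in the variable x
h = '(x(1)+0.5)^2+(x(2)+0.5)^2-0.25';      % equality constraint h(x)=0
x0 = [1;1]; alpha0 = 0.1; beta = 6; toll = 1e-4;
[x,vinc,nit] = lagrpen(x0,alpha0,beta,f,h,toll)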

FIGURE 7.3. Convergence history of the penalty method in Example 7.6



7.3.3 The Method of Lagrange Multipliers
A variant of the penalty method makes use, instead of Lα(x) in (7.53), of the augmented Lagrangian function Gα : Rm × Rn → R given by

Gα(x, λ) = f(x) + λT h(x) + (1/2) α ‖h(x)‖₂²,   (7.55)
where λ ∈ Rm is a Lagrange multiplier. Clearly, if x* is a solution to problem (7.50), then it will also be a solution to (7.55), but with the advantage, with respect to (7.53), of disposing of the further degree of freedom λ. The penalty method applied to (7.55) reads: for k = 0, 1, . . . , until convergence, solve the sequence of problems

minimize Gαk(x, λk) for x ∈ Ω,   (7.56)

where {λk} is a bounded sequence of unknown vectors in Rm, and the parameters αk are defined as above (notice that if λk were zero, then we would recover method (7.54)).
Property 7.16 also holds for method (7.56), provided that the multipliers are assumed to be bounded. Notice that the existence of the minimizer of (7.56) is not guaranteed, even in the case where f has a unique global minimizer (see Example 7.7). This circumstance can be overcome by adding further non quadratic terms to the augmented Lagrangian function (e.g., of the form ‖h‖₂^p, with p large).

Example 7.7 Let us find the minimizer of f(x) = −x⁴ under the constraint x = 0. Such a problem clearly admits the solution x* = 0. If, instead, one considers the augmented Lagrangian function

Gαk(x, λk) = −x⁴ + λk x + (1/2) αk x²,

one finds that it no longer admits a minimum at x = 0, though vanishing there, for any αk different from zero.

As far as the choice of the multipliers is concerned, the sequence of vectors λk is typically assigned by the following formula

λk+1 = λk + αk h(x(k)),

where λ0 is a given value while the sequence of αk can be set a priori or modified during run-time.

As for the convergence properties of the method of Lagrange multipliers, the following local result holds.

Property 7.17 Assume that x* is a regular strict local minimizer of (7.50) and that:
1. f, hi ∈ C2(B(x*; R)) with i = 1, . . . , m and for a suitable R > 0;
2. the pair (x*, λ*) satisfies zT HG0(x*, λ*) z > 0, ∀z ≠ 0 such that Jh(x*)T z = 0;
3. ∃ᾱ > 0 such that HGᾱ(x*, λ*) > 0.

Then, there exist three positive scalars δ, γ and M such that, for any pair (λ, α) ∈ V = {(λ, α) ∈ Rm+1 : ‖λ − λ*‖₂ < δα, α ≥ ᾱ}, the problem

minimize Gα(x, λ), with x ∈ B(x*; γ),

admits a unique solution x(λ, α), differentiable with respect to its arguments. Moreover, ∀(λ, α) ∈ V

‖x(λ, α) − x*‖₂ ≤ M ‖λ − λ*‖₂.

Under further assumptions (see [Ber82], Proposition 2.7), it can be proved that the Lagrange multipliers method converges. Moreover, if αk → ∞ as k → ∞, then

lim_{k→∞} ‖λk+1 − λ*‖₂ / ‖λk − λ*‖₂ = 0,

and the convergence of the method is more than linear. In the case where the sequence αk has an upper bound, the method converges linearly.

Finally, we notice that, unlike the penalty method, it is no longer necessary that the sequence of αk tends to infinity. This, in turn, limits the ill-conditioning of problem (7.56) as αk grows. Another advantage concerns the convergence rate of the method, which turns out to be independent of the growth rate of the penalty parameter in the case of the Lagrange multipliers technique. This, of course, implies a considerable reduction of the computational cost.
The method of Lagrange multipliers is implemented in Program 64. Compared with Program 63, this further requires as input the initial value lambda0 of the multiplier.

Program 64 - lagrmult : Method of Lagrange multipliers

function [x,vinc,nit]=lagrmult(x0,lambda0,alpha0,beta,f,h,toll)
x = x0; [r,c]=size(h); vinc = 0; lambda = lambda0;
for i=1:r, vinc = max(vinc,eval(h(i,1:c))); end
norm2h=['(',h(1,1:c),')^2'];
for i=2:r, norm2h=[norm2h,'+(',h(i,1:c),')^2']; end
alpha = alpha0; options(1)=0; options(2)=toll*0.1; nit = 0;
while vinc > toll
lh=['(',h(1,1:c),')*',num2str(lambda(1))];
for i=2:r, lh=[lh,'+(',h(i,1:c),')*',num2str(lambda(i))];
end
g=[f,'+0.5*',num2str(alpha,16),'*',norm2h,'+',lh];
[x]=fmins(g,x,options);
vinc=0; nit = nit + 1;
for i=1:r, vinc = max(vinc,eval(h(i,1:c))); end
alpha=alpha*beta;
for i=1:r, lambda(i)=lambda(i)+alpha*eval(h(i,1:c)); end
end



Example 7.8 We use the method of Lagrange multipliers to solve the problem presented in Example 7.6. Set λ0 = 10 and leave the remaining parameters unchanged. The method converges in 6 iterations and the crosses in Figure 7.4 show the iterates computed by Program 64. The constraint is here satisfied up to machine precision.
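The corresponding call of Program 64 differs from the one sketched after Example 7.6 only in the additional multiplier initialization; with the same strings f and h one may write, for instance (again an illustrative sketch):

% Illustrative call of Program 64 with the data of Example 7.8.
lambda0 = 10;
[x,vinc,nit] = lagrmult([1;1],lambda0,0.1,6,f,h,1e-4)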




7.4 Applications
The two applications of this section are concerned with nonlinear systems
arising in the simulation of the electric potential in a semiconductor device
and in the triangulation of a two-dimensional polygon.






FIGURE 7.4. Convergence history for the method of Lagrange multipliers in
Example 7.8


7.4.1 Solution of a Nonlinear System Arising from
Semiconductor Device Simulation
Let us consider the nonlinear system in the unknown u ∈ Rn

F(u) = Au + φ(u) − b = 0,   (7.57)

where A = (λ/h)² tridiagn(−1, 2, −1), for h = 1/(n+1), φi(u) = 2K sinh(ui) for i = 1, . . . , n, where λ and K are two positive constants and b ∈ Rn is a given vector. Problem (7.57) arises in the numerical simulation of semiconductor devices in microelectronics, where u and b represent the electric potential and the doping profile, respectively.
In Figure 7.5 (left) we show schematically the particular device considered in the numerical example, a p-n junction diode of unit normalized length, subject to an external bias ∆V = Vb − Va, together with the doping profile of the device, normalized to 1 (right). Notice that bi = b(xi), for i = 1, . . . , n, where xi = ih. The mathematical model of the problem at hand comprises a nonlinear Poisson equation for the electric potential and two continuity equations of advection-diffusion type, as those addressed in Chapter 12, for the current densities. For the complete derivation of the model and its analysis see, for instance, [Mar86] and [Jer96].
Solving system (7.57) corresponds to finding the minimizer in Rn of the function f : Rn → R defined as

f(u) = (1/2) uT Au + 2K Σ_{i=1}^{n} cosh(ui) − bT u.   (7.58)

FIGURE 7.5. Scheme of a semiconductor device (left); doping profile (right)

It can be checked (see Exercise 5) that for any u, v ∈ Rn with u ≠ v and for any λ ∈ (0, 1)

λf(u) + (1 − λ)f(v) − f(λu + (1 − λ)v) > (1/2) λ(1 − λ) ‖u − v‖A²,

where ‖·‖A denotes the energy norm introduced in (1.28). This implies that f(u) is a uniformly convex function in Rn, that is, it strictly satisfies (7.49) with ρ = 1/2.
Property 7.10 ensures, in turn, that the function in (7.58) admits a unique minimizer u* ∈ Rn, and it can be shown (see Theorem 14.4.3, p. 503 of [OR70]) that there exists a sequence {αk} such that the iterates of the damped Newton method introduced in Section 7.2.6 converge to u* ∈ Rn (at least) superlinearly.

Thus, using the damped Newton method for solving system (7.57) leads to the following sequence of linearized problems:

given u(0) ∈ Rn, ∀k ≥ 0 solve

[A + 2K diagn(cosh(ui(k)))] δu(k) = b − (Au(k) + φ(u(k))),   (7.59)

then set u(k+1) = u(k) + αk δu(k).
Let us now address two possible choices of the acceleration parameters αk. The first one has been proposed in [BR81] and is

αk = 1 / (1 + ρk ‖F(u(k))‖),   k = 0, 1, . . . ,   (7.60)

where ‖·‖ denotes a vector norm, for instance ‖·‖ = ‖·‖∞, and the coefficients ρk ≥ 0 are suitable acceleration parameters picked in such a way that the descent condition ‖F(u(k) + αk δu(k))‖∞ < ‖F(u(k))‖∞ is satisfied (see [BR81] for the implementation details of the algorithm).

We notice that, as ‖F(u(k))‖∞ → 0, (7.60) yields αk → 1, thus recovering the full (quadratic) convergence of Newton's method. Otherwise, as typically happens in the first iterations, ‖F(u(k))‖∞ ≫ 1 and αk is quite close to zero, with a strong reduction of the Newton variation (damping). As an alternative to (7.60), the sequence {αk} can be generated using the simpler formula, suggested in [Sel84], Chapter 7,

αk = 2^(−i(i−1)/2),   k = 0, 1, . . . ,   (7.61)

where i is the first integer in the interval [1, Itmax] such that the descent condition above is satisfied, Itmax being the maximum admissible number of damping cycles for any Newton iteration (fixed equal to 10 in the numerical experiments).
As a comparison, both the damped and the standard Newton's methods have been implemented, the former with both choices (7.60) and (7.61) for the coefficients αk. In the case of Newton's method, we have set αk = 1 in (7.59) for any k ≥ 0.
The numerical examples have been performed with n = 49, bi = −1 for i ≤ n/2 and the remaining values bi equal to 1. Moreover, we have taken λ² = 1.67 · 10⁻⁴, K = 6.77 · 10⁻⁶ and fixed the first n/2 components of the initial vector u(0) equal to Va and the remaining ones equal to Vb, where Va = 0 and Vb = 10.
The tolerance on the maximum change between two successive iterates, which monitors the convergence of the damped Newton method (7.59), has been set equal to 10⁻⁴.
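For completeness, a compact MATLAB sketch of the damped Newton iteration (7.59) equipped with the damping rule (7.61) is reported below. It is not one of the Programs of this book; the variable names and the stopping test mirror the data of the experiment described above and should be regarded as assumptions.

% Damped Newton sketch for system (7.57) with the damping strategy (7.61).
n = 49; h = 1/(n+1); lambda2 = 1.67e-4; K = 6.77e-6;
A = (lambda2/h^2)*(2*eye(n) - diag(ones(n-1,1),1) - diag(ones(n-1,1),-1));
b = [-ones(floor(n/2),1); ones(n-floor(n/2),1)];
u = [zeros(floor(n/2),1); 10*ones(n-floor(n/2),1)];   % Va = 0, Vb = 10
F = @(u) A*u + 2*K*sinh(u) - b;
itmax = 10; toll = 1e-4;
for k = 1:500
  J  = A + 2*K*diag(cosh(u));        % Jacobian of F at u(k), as in (7.59)
  du = J\(-F(u));
  alpha = 1;
  for i = 1:itmax                    % damping cycles, rule (7.61)
    alpha = 2^(-i*(i-1)/2);
    if norm(F(u+alpha*du),inf) < norm(F(u),inf), break, end
  end
  u = u + alpha*du;
  if norm(alpha*du,inf) <= toll, break, end
end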

FIGURE 7.6. Absolute error (left) and damping parameters αk (right). The error curve for standard Newton's method is denoted by (1), while (2) and (3) refer to damped Newton's method with the choices (7.61) and (7.60) for the coefficients αk, respectively

Figure 7.6 (left) shows the log-scale absolute error for the three algorithms as functions of the iteration number. Notice the rapid convergence of the damped Newton method (8 and 10 iterations in the case of (7.60) and (7.61), respectively), compared with the extremely slow convergence of the standard Newton's method (192 iterations). Moreover, it is interesting to analyze in Figure 7.6 (right) the plot of the sequences of parameters αk as functions of the iteration number.
The starred and the circled curves refer to the choices (7.60) and (7.61) for the coefficients αk, respectively. As previously observed, the αk's start from very small values and converge quickly to 1 as the damped Newton method (7.59) enters the attraction region of the minimizer u*.



7.4.2 Nonlinear Regularization of a Discretization Grid
In this section we go back to the problem of regularizing a discretization grid that has been introduced in Section 3.14.2. There, we considered the technique of barycentric regularization, which leads to solving a linear system, typically of large size and featuring a sparse coefficient matrix.
In this section we address two alternative techniques, denoted as regularization by edges and by areas. The main difference with respect to the method described in Section 3.14.2 lies in the fact that these new approaches lead to systems of nonlinear equations.

Using the notation of Section 3.14.2, for each pair of nodes xj, xk ∈ Zi, denote by ljk the edge on the boundary ∂Pi of Pi which connects them and by xjk the midpoint of ljk, while for each triangle T ∈ Pi we denote by xb,T the centroid of T. Moreover, let ni = dim(Zi) and denote for any geometric entity (side or triangle) by | · | its measure in R1 or R2.
In the case of regularization by edges, we let

xi = ( Σ_{ljk∈∂Pi} xjk |ljk| ) / |∂Pi|,   ∀xi ∈ Nh,   (7.62)

while in the case of regularization by areas, we let

xi = ( Σ_{T∈Pi} xb,T |T| ) / |Pi|,   ∀xi ∈ Nh.   (7.63)


In both regularization procedures we assume that xi = xi^(∂D) if xi ∈ ∂D, that is, the nodes lying on the boundary of the domain D are fixed. Letting n = N − Nb be the number of internal nodes, relation (7.62) amounts to solving the following two systems of nonlinear equations for the coordinates {xi} and {yi} of the internal nodes, with i = 1, . . . , n
xi − ( Σ_{ljk∈∂Pi} (1/2)(xj + xk)|ljk| ) / ( Σ_{ljk∈∂Pi} |ljk| ) = 0,
                                                                       (7.64)
yi − ( Σ_{ljk∈∂Pi} (1/2)(yj + yk)|ljk| ) / ( Σ_{ljk∈∂Pi} |ljk| ) = 0.

Similarly, (7.63) leads to the following nonlinear systems, for i = 1, . . . , n

xi − ( Σ_{T∈Pi} (1/3)(x1,T + x2,T + x3,T)|T| ) / ( Σ_{T∈Pi} |T| ) = 0,
                                                                       (7.65)
yi − ( Σ_{T∈Pi} (1/3)(y1,T + y2,T + y3,T)|T| ) / ( Σ_{T∈Pi} |T| ) = 0,

where xs,T = (xs,T, ys,T), for s = 1, 2, 3, are the coordinates of the vertices of each triangle T ∈ Pi. Notice that the nonlinearity of systems (7.64) and (7.65) is due to the presence of the terms |ljk| and |T|.
Both systems (7.64) and (7.65) can be cast in the form (7.1), denoting, as usual, by fi the i-th nonlinear equation of the system, for i = 1, . . . , n. The complex functional dependence of fi on the unknowns makes it prohibitive to use Newton's method (7.4), which would require the explicit computation of the Jacobian matrix JF.
A convenient alternative is provided by the nonlinear Gauss-Seidel method (see [OR70], Chapter 7), which generalizes the corresponding method proposed in Chapter 4 for linear systems and can be formulated as follows (a schematic outline is sketched below). Denote by zi, for i = 1, . . . , n, either of the unknowns xi or yi. Given the initial vector z(0) = (z1(0), . . . , zn(0))T, for k = 0, 1, . . . until convergence, solve

fi(z1(k+1), . . . , zi−1(k+1), ξ, zi+1(k), . . . , zn(k)) = 0,   i = 1, . . . , n,   (7.66)

then set zi(k+1) = ξ. Thus, the nonlinear Gauss-Seidel method converts problem (7.1) into the successive solution of n scalar nonlinear equations.
In the case of system (7.64), each of these equations is linear in the unknown zi(k+1) (since ξ does not explicitly appear in the bracketed term on the right side of (7.64)). This allows for its exact solution in one step.
In the case of system (7.65), equation (7.66) is genuinely nonlinear with respect to ξ, and it is solved by taking one step of a fixed-point iteration.
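In schematic terms, one possible MATLAB outline of the iteration (7.66) is the following; here f(i,z) is assumed to return the i-th equation evaluated at z, and scalarsolve stands for whatever scalar solver is adopted for the unknown (the exact one-step solution in case (7.64), one fixed-point step in case (7.65)). The fragment is illustrative and not one of the Programs of this book.

% Outline of the nonlinear Gauss-Seidel iteration (7.66); f and scalarsolve are
% placeholders supplied by the user (assumptions of this sketch); z is a column vector.
function z = nlgs(f,z,toll,kmax,scalarsolve)
n = length(z);
for k = 1:kmax
  zold = z;
  for i = 1:n        % sweep over the unknowns, using the most recent values
    z(i) = scalarsolve(@(xi) f(i,[z(1:i-1); xi; z(i+1:n)]), z(i));
  end
  if norm(z-zold,inf) <= toll, return, end
end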
The nonlinear Gauss-Seidel method (7.66) has been implemented in MATLAB to solve systems (7.64) and (7.65) in the case of the initial triangulation shown in Figure 7.7 (left). Such a triangulation covers the external region of a two-dimensional wing section of type NACA 2316. The grid contains NT = 534 triangles and n = 198 internal nodes.

The algorithm reached convergence in 42 iterations for both kinds of regularization, having used as stopping criterion the test ‖z(k+1) − z(k)‖∞ ≤ 10⁻⁴. In Figure 7.7 (right) the discretization grid obtained after the regularization by areas is shown (a similar result has been provided by the regularization by edges). Notice the higher uniformity of the triangles with respect to those of the starting grid.




FIGURE 7.7. Triangulation before (left) and after (right) the regularization




7.5 Exercises
1. Prove (7.8) for the m-step Newton-SOR method.
[Hint: use the SOR method for solving a linear system Ax = b with A = D − E − F and express the k-th iterate as a function of the initial datum x(0), obtaining

x(k+1) = x(0) + (M^(k+1) − I)x(0) + (M^k + . . . + I)B⁻¹b,

where B = ω⁻¹(D − ωE) and M = B⁻¹ω⁻¹[(1 − ω)D + ωF]. Since B⁻¹A = I − M and

(I + . . . + M^k)(I − M) = I − M^(k+1),

then (7.8) follows by suitably identifying the matrix and the right-hand side of the system.]
2. Prove that using the gradient method for minimizing f(x) = x² with the directions p(k) = −1 and the parameters αk = 2^(−k+1) does not yield the minimizer of f.
3. Show that for the steepest descent method applied to minimizing a quadratic functional f of the form (7.35) the following inequality holds

f(x(k+1)) ≤ ( (λmax − λmin) / (λmax + λmin) )² f(x(k)),

where λmax, λmin are the eigenvalues of maximum and minimum module, respectively, of the matrix A that appears in (7.35).
[Hint: proceed as done for (7.38).]
4. Check that the parameters αk of Exercise 2 do not fulfill the conditions (7.31) and (7.32).
5. Consider the function f : Rn → R introduced in (7.58) and check that it is uniformly convex on Rn, that is

λf(u) + (1 − λ)f(v) − f(λu + (1 − λ)v) > (1/2) λ(1 − λ) ‖u − v‖A²

for any u, v ∈ Rn with u ≠ v and 0 < λ < 1.
[Hint: notice that cosh(·) is a convex function.]
6. To solve the nonlinear system

−(1/81) cos x1 + (1/9) x2² + (1/3) sin x3 = x1,
(1/3) sin x1 + (1/3) cos x3 = x2,
−(1/9) cos x1 + (1/3) x2 + (1/6) sin x3 = x3,

use the fixed-point iteration x(n+1) = Ψ(x(n)), where x = (x1, x2, x3)T and Ψ(x) is the left-hand side of the system. Analyze the convergence of the iteration to compute the fixed point α = (0, 1/3, 0)T.
[Solution: the fixed-point method is convergent since ‖Ψ'(α)‖∞ = 1/2.]
7. Using Program 50 implementing Newton's method, determine the global maximizer of the function

f(x) = e^(−x²/2) − (1/4) cos(2x)

and analyze the performance of the method (input data: xv=1; toll=1e-6; nmax=500). Solve the same problem using the following fixed-point iteration

x(k+1) = g(xk),   with g(x) = sin(2x) [ e^(x²/2) (x sin(2x) + 2 cos(2x)) − 2 ] / [ 2 (x sin(2x) + 2 cos(2x)) ].

Analyze the performance of this second scheme, both theoretically and experimentally, and compare the results obtained using the two methods.
[Solution: the function f has a global maximum at x = 0. This point is a multiple zero for f'. Thus, Newton's method is only linearly convergent. Conversely, the proposed fixed-point method is third-order convergent.]
8
Polynomial Interpolation




This chapter is addressed to the approximation of a function which is known through its nodal values.
Precisely, given m + 1 pairs (xi, yi), the problem consists of finding a function Φ = Φ(x) such that Φ(xi) = yi for i = 0, . . . , m, yi being some given values; we then say that Φ interpolates {yi} at the nodes {xi}. We speak about polynomial interpolation if Φ is an algebraic polynomial, trigonometric approximation if Φ is a trigonometric polynomial, or piecewise polynomial interpolation (or spline interpolation) if Φ is only locally a polynomial.
The numbers yi may represent the values attained at the nodes xi by a function f that is known in closed form, as well as experimental data. In the former case, the approximation process aims at replacing f with a simpler function to deal with, in particular in view of its numerical integration or differentiation. In the latter case, the primary goal of approximation is to provide a compact representation of the available data, whose number is often quite large.
Polynomial interpolation is addressed in Sections 8.1 and 8.2, while piecewise polynomial interpolation is introduced in Sections 8.3, 8.4 and 8.5. Finally, univariate and parametric splines are addressed in Sections 8.6 and 8.7. Interpolation processes based on trigonometric or algebraic orthogonal polynomials will be considered in Chapter 10.

8.1 Polynomial Interpolation
Let us consider n + 1 pairs (xi, yi). The problem is to find a polynomial Πm ∈ Pm, called an interpolating polynomial, such that

Πm(xi) = am xi^m + . . . + a1 xi + a0 = yi,   i = 0, . . . , n.   (8.1)

The points xi are called interpolation nodes. If n ≠ m the problem is over- or under-determined and will be addressed in Section 10.7.1. If n = m, the following result holds.

Theorem 8.1 Given n+1 distinct points x0 , . . . , xn and n+1 correspond-
ing values y0 , . . . , yn , there exists a unique polynomial Πn ∈ Pn such that
Πn (xi ) = yi for i = 0, . . . , n.
Proof. To prove existence, let us use a constructive approach, providing an expression for Πn. Denoting by {li}, i = 0, . . . , n, a basis for Pn, Πn admits a representation on such a basis of the form Πn(x) = Σ_{i=0}^{n} bi li(x) with the property that

Πn(xi) = Σ_{j=0}^{n} bj lj(xi) = yi,   i = 0, . . . , n.   (8.2)

If we define

li ∈ Pn :  li(x) = ∏_{j=0, j≠i}^{n} (x − xj)/(xi − xj),   i = 0, . . . , n,   (8.3)

then li(xj) = δij and we immediately get from (8.2) that bi = yi.
The polynomials {li, i = 0, . . . , n} form a basis for Pn (see Exercise 1). As a consequence, the interpolating polynomial exists and has the following form (called the Lagrange form)

Πn(x) = Σ_{i=0}^{n} yi li(x).   (8.4)

To prove uniqueness, suppose that another interpolating polynomial Ψm of degree m ≤ n exists, such that Ψm(xi) = yi for i = 0, . . . , n. Then, the difference polynomial Πn − Ψm vanishes at n + 1 distinct points xi and thus coincides with the null polynomial. Therefore, Ψm = Πn.
An alternative approach to prove existence and uniqueness of Πn is provided in Exercise 2. □

It can be checked that (see Exercise 3)

Πn(x) = Σ_{i=0}^{n} yi ωn+1(x) / ((x − xi) ω'n+1(xi)),   (8.5)

where ωn+1 is the nodal polynomial of degree n + 1 defined as

ωn+1(x) = ∏_{i=0}^{n} (x − xi).   (8.6)

Formula (8.4) is called the Lagrange form of the interpolating polynomial,
while the polynomials li (x) are the characteristic polynomials. In Figure
8.1 we show the characteristic polynomials l2 (x), l3 (x) and l4 (x), in the
case of degree n = 6, on the interval [-1,1] where equally spaced nodes are
taken, including the end points.




FIGURE 8.1. Lagrange characteristic polynomials

Notice that |li (x)| can be greater than 1 within the interpolation interval.
If yi = f (xi ) for i = 0, . . . , n, f being a given function, the interpolating
polynomial Πn (x) will be denoted by Πn f (x).
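A direct MATLAB evaluation of the Lagrange form (8.4) through the characteristic polynomials (8.3) can be sketched as follows (an illustration only; Program 65 in Section 8.2 implements the Newton form actually advocated later in this chapter; indices start from 1 because of MATLAB conventions).

% Illustrative evaluation of the Lagrange form (8.4) at the abscissae z.
function p = lagrval(x,y,z)
n = length(x); p = zeros(size(z));     % here n is the number of nodes
for i = 1:n
  li = ones(size(z));                  % characteristic polynomial l_i, see (8.3)
  for j = [1:i-1, i+1:n]
    li = li.*(z-x(j))/(x(i)-x(j));
  end
  p = p + y(i)*li;                     % accumulate y_i l_i(x) as in (8.4)
end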


8.1.1 The Interpolation Error
In this section we estimate the interpolation error that is made when re-
placing a given function f with its interpolating polynomial Πn f at the
nodes x0 , x1 , . . . , xn (for further results, we refer the reader to [Wen66],
[Dav63]).

Theorem 8.2 Let x0 , x1 , . . . , xn be n+1 distinct nodes and let x be a point
belonging to the domain of a given function f . Assume that f ∈ C n+1 (Ix ),
where Ix is the smallest interval containing the nodes x0 , x1 , . . . , xn and x.
Then the interpolation error at the point x is given by

En(x) = f(x) − Πn f(x) = (f^(n+1)(ξ) / (n + 1)!) ωn+1(x),   (8.7)
where ξ ∈ Ix and ωn+1 is the nodal polynomial of degree n + 1.

Proof. The result is obviously true if x coincides with any of the interpolation nodes. Otherwise, define, for any t ∈ Ix, the function G(t) = En(t) − ωn+1(t)En(x)/ωn+1(x). Since f ∈ C^(n+1)(Ix) and ωn+1 is a polynomial, then G ∈ C^(n+1)(Ix) and it has n + 2 distinct zeros in Ix, since

G(xi) = En(xi) − ωn+1(xi)En(x)/ωn+1(x) = 0,   i = 0, . . . , n,
G(x) = En(x) − ωn+1(x)En(x)/ωn+1(x) = 0.

Then, thanks to the mean value theorem, G' has n + 1 distinct zeros and, by recursion, G^(j) admits n + 2 − j distinct zeros. As a consequence, G^(n+1) has a unique zero, which we denote by ξ. On the other hand, since En^(n+1)(t) = f^(n+1)(t) and ωn+1^(n+1)(x) = (n + 1)!, we get

G^(n+1)(t) = f^(n+1)(t) − ((n + 1)! / ωn+1(x)) En(x),

which, evaluated at t = ξ, gives the desired expression for En(x). □


8.1.2 Drawbacks of Polynomial Interpolation on Equally
Spaced Nodes and Runge's Counterexample
In this section we analyze the behavior of the interpolation error (8.7) as n tends to infinity. For this purpose, for any function f ∈ C0([a, b]), define its maximum norm

‖f‖∞ = max_{x∈[a,b]} |f(x)|.   (8.8)


Then, let us introduce a lower triangular matrix X of infinite size, called the interpolation matrix on [a, b], whose entries xij, for i, j = 0, 1, . . . , represent points of [a, b], with the assumption that on each row the entries are all distinct.
Thus, for any n ≥ 0, the (n + 1)-th row of X contains n + 1 distinct values that we can identify as nodes, so that, for a given function f, we can uniquely define an interpolating polynomial Πn f of degree n at those nodes (any polynomial Πn f depends on X, as well as on f).
Having fixed f and an interpolation matrix X, let us define the interpolation error

En,∞(X) = ‖f − Πn f‖∞,   n = 0, 1, . . .   (8.9)

Next, denote by pn* ∈ Pn the best approximation polynomial, for which

En* = ‖f − pn*‖∞ ≤ ‖f − qn‖∞   ∀qn ∈ Pn.

The following comparison result holds (for the proof, see [Riv74]).

Property 8.1 Let f ∈ C0([a, b]) and X be an interpolation matrix on [a, b]. Then

En,∞(X) ≤ En* (1 + Λn(X)),   n = 0, 1, . . .   (8.10)

where Λn(X) denotes the Lebesgue constant of X, defined as

Λn(X) = ‖ Σ_{j=0}^{n} |lj^(n)| ‖∞,   (8.11)

and where lj^(n) ∈ Pn is the j-th characteristic polynomial associated with the (n + 1)-th row of X, that is, satisfying lj^(n)(xnk) = δjk, j, k = 0, 1, . . .

Since En* does not depend on X, all the information concerning the effects of X on En,∞(X) must be looked for in Λn(X). Although there exists an interpolation matrix X* such that Λn(X) is minimized, it is not in general a simple task to determine its entries explicitly. We shall see in Section 10.3 that the zeros of the Chebyshev polynomials provide on the interval [−1, 1] an interpolation matrix with a very small value of the Lebesgue constant.
On the other hand, for any possible choice of X, there exists a constant C > 0 such that (see [Erd61])

Λn(X) > (2/π) log(n + 1) − C,   n = 0, 1, . . .

This property shows that Λn(X) → ∞ as n → ∞. This fact has important consequences: in particular, it can be proved (see [Fab14]) that, given an interpolation matrix X on an interval [a, b], there always exists a continuous function f in [a, b] such that Πn f does not converge uniformly (that is, in the maximum norm) to f. Thus, polynomial interpolation does not allow for approximating every continuous function, as demonstrated by the following example.

Example 8.1 (Runge's counterexample) Suppose we approximate the following function

f(x) = 1/(1 + x²),   −5 ≤ x ≤ 5,   (8.12)

using Lagrange interpolation on equally spaced nodes. It can be checked that some points x exist within the interpolation interval such that

lim_{n→∞} |f(x) − Πn f(x)| ≠ 0.

In particular, Lagrange interpolation diverges for |x| > 3.63 . . . This phenomenon is particularly evident in the neighborhood of the end points of the interpolation interval, as shown in Figure 8.2, and is due to the choice of equally spaced nodes. We shall see in Chapter 10 that resorting to suitably chosen nodes will allow for uniform convergence of the interpolating polynomial to the function f to hold. •
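The experiment can be reproduced with a few MATLAB instructions, for instance through the built-in functions polyfit and polyval (a sketch for illustration; with the larger degrees polyfit may also issue a warning about the conditioning of the underlying least-squares matrix).

% Runge's counterexample: interpolation of f(x)=1/(1+x^2) on equally spaced nodes.
f = @(x) 1./(1+x.^2);
z = linspace(-5,5,501);                    % fine grid for measuring the error
for n = [5 10]
  x = linspace(-5,5,n+1);                  % n+1 equally spaced nodes
  p = polyfit(x,f(x),n);                   % interpolating polynomial of degree n
  err = max(abs(f(z)-polyval(p,z)));
  fprintf('n = %d, max error = %f\n', n, err)
end

Consistently with Figure 8.2, the maximum error grows with n and is largest near the end points of the interval.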




FIGURE 8.2. Lagrange interpolation on equally spaced nodes for the function f(x) = 1/(1 + x²): the interpolating polynomials Π5 f and Π10 f are shown in dotted and dashed lines, respectively

8.1.3 Stability of Polynomial Interpolation
Let us consider a set of function values f̃(xi) which is a perturbation of the data f(xi) relative to the nodes xi, with i = 0, . . . , n, in an interval [a, b]. The perturbation may be due, for instance, to the effect of rounding errors, or may be caused by an error in the experimental measure of the data.
Denoting by Πn f̃ the interpolating polynomial on the set of values f̃(xi), we have

‖Πn f − Πn f̃‖∞ = max_{a≤x≤b} | Σ_{j=0}^{n} (f(xj) − f̃(xj)) lj(x) |
               ≤ Λn(X) max_{i=0,...,n} |f(xi) − f̃(xi)|.

As a consequence, small changes on the data give rise to small changes on the interpolating polynomial only if the Lebesgue constant is small. This constant plays the role of the condition number for the interpolation problem.
As previously noticed, Λn grows as n → ∞ and in particular, in the case of Lagrange interpolation on equally spaced nodes, it can be proved that (see [Nat65])

Λn(X) ≈ 2^(n+1) / (e n log n),

where e ≈ 2.7183 is Napier's number. This shows that, for n large, this form of interpolation can become unstable. Notice also that so far we have completely neglected the errors generated by the interpolation process in constructing Πn f. However, it can be shown that the effect of such errors is generally negligible (see [Atk89]).
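The growth of Λn can also be observed numerically. The following sketch (given for illustration and not part of the reference) estimates the Lebesgue constant of equally spaced nodes on [−1, 1] by sampling the function Σj |lj(x)| of (8.11) on a fine grid; for instance, lebesgue(21) returns a value of the order of 10^4, in agreement with the value Λ21 ≈ 24000 quoted in Example 8.2 below.

% Numerical estimate of the Lebesgue constant (8.11) for equally spaced nodes.
function L = lebesgue(n)
x = linspace(-1,1,n+1); z = linspace(-1,1,2001); s = zeros(size(z));
for j = 1:n+1
  lj = ones(size(z));
  for i = [1:j-1, j+1:n+1]
    lj = lj.*(z-x(i))/(x(j)-x(i));     % characteristic polynomial l_j
  end
  s = s + abs(lj);                     % accumulate |l_j(x)| on the sampling grid
end
L = max(s);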



FIGURE 8.3. Instability of Lagrange interpolation. In solid line Π21 f, on unperturbed data, in dashed line Π21 f̃, on perturbed data, for Example 8.2

Example 8.2 On the interval [−1, 1] let us interpolate the function f(x) = sin(2πx) at 22 equally spaced nodes xi. Next, we generate a perturbed set of values f̃(xi) of the function evaluations f(xi) = sin(2πxi) with max_{i=0,...,21} |f(xi) − f̃(xi)| ≈ 9.5 · 10⁻⁴. In Figure 8.3 we compare the polynomials Π21 f and Π21 f̃: notice how the difference between the two interpolating polynomials, around the end points of the interpolation interval, is quite larger than the impressed perturbation (actually, ‖Π21 f − Π21 f̃‖∞ ≈ 2.1635 and Λ21 ≈ 24000). •



8.2 Newton Form of the Interpolating Polynomial
The Lagrange form (8.4) of the interpolating polynomial is not the most convenient from a practical standpoint. In this section we introduce an alternative form characterized by a cheaper computational cost. Our goal is the following:
given n + 1 pairs {xi, yi}, i = 0, . . . , n, we want to represent Πn (with Πn(xi) = yi for i = 0, . . . , n) as the sum of Πn−1 (with Πn−1(xi) = yi for i = 0, . . . , n − 1) and a polynomial of degree n which depends on the nodes xi and on only one unknown coefficient. We thus set

Πn(x) = Πn−1(x) + qn(x),   (8.13)

where qn ∈ Pn. Since qn(xi) = Πn(xi) − Πn−1(xi) = 0 for i = 0, . . . , n − 1, it must necessarily be that

qn(x) = an(x − x0) . . . (x − xn−1) = an ωn(x).

To determine the unknown coefficient an, suppose that yi = f(xi), i = 0, . . . , n, where f is a suitable function, not necessarily known in explicit form. Since Πn f(xn) = f(xn), from (8.13) it follows that

an = (f(xn) − Πn−1 f(xn)) / ωn(xn).   (8.14)

The coefficient an is called the n-th Newton divided difference and is generally denoted by

an = f[x0, x1, . . . , xn]   (8.15)

for n ≥ 1. As a consequence, (8.13) becomes

Πn f(x) = Πn−1 f(x) + ωn(x) f[x0, x1, . . . , xn].   (8.16)

If we let y0 = f(x0) = f[x0] and ω0 = 1, by recursion on n we can obtain from (8.16) the following formula

Πn f(x) = Σ_{k=0}^{n} ωk(x) f[x0, . . . , xk].   (8.17)

Uniqueness of the interpolating polynomial ensures that the above expression yields the same interpolating polynomial generated by the Lagrange form. Form (8.17) is commonly known as the Newton divided difference formula for the interpolating polynomial.

Program 65 provides an implementation of Newton's formula. The input vectors x and y contain the interpolation nodes and the corresponding functional evaluations of f, respectively, while the vector z contains the abscissae where the polynomial Πn f is to be evaluated. This polynomial is stored in the output vector f.

Program 65 - interpol : Lagrange polynomial using Newton's formula
function [f] = interpol (x,y,z)
[m n] = size(y);
for j = 1:m
a (:,1) = y (j,:)';
for i = 2:n
a (i:n,i) = ( a(i:n,i-1)-a(i-1,i-1) )./(x(i:n)-x(i-1))';
end
f(j,:) = a(n,n).*(z-x(n-1)) + a(n-1,n-1);
for i = 2:n-1
f(j,:) = f(j,:).*(z-x(n-i))+a(n-i,n-i);
end
end
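As a quick usage check (the data below are an illustrative assumption), one may verify that Program 65 reproduces exactly, up to roundoff, a polynomial of degree not greater than the number of nodes minus one:

% Illustrative call of Program 65: a cubic is reproduced exactly by its interpolant.
x = [0 1 2 3]; y = x.^3 - 2*x + 1;          % nodes and values of f(x) = x^3-2x+1
z = linspace(0,3,7);                        % evaluation abscissae
f = interpol(x,y,z);
max(abs(f - (z.^3 - 2*z + 1)))              % of the order of machine precision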

8.2.1 Some Properties of Newton Divided Differences
The n-th divided difference f[x0, . . . , xn] = an can be further characterized by noticing that it is the coefficient of x^n in Πn f. Isolating such a coefficient from (8.5) and equating it with the corresponding coefficient in the Newton formula (8.17), we end up with the following explicit representation

f[x0, . . . , xn] = Σ_{i=0}^{n} f(xi) / ω'n+1(xi).   (8.18)

This formula has remarkable consequences:

1. the value attained by the divided difference is invariant with respect to permutations of the indexes of the nodes. This instance can be profitably employed when stability problems suggest exchanging the indexes (for example, if x is the point where the polynomial must be computed, it is convenient to introduce a permutation of the indexes such that |x − xk| ≤ |x − xk−1| with k = 1, . . . , n);

2. if f = αg + βh for some α, β ∈ R, then

f[x0, . . . , xn] = αg[x0, . . . , xn] + βh[x0, . . . , xn];

3. if f = gh, the following formula (called the Leibniz formula) holds (see [Die93])

f[x0, . . . , xn] = Σ_{j=0}^{n} g[x0, . . . , xj] h[xj, . . . , xn];

4. an algebraic manipulation of (8.18) (see Exercise 7) yields the following recursive formula for computing divided differences

f[x0, . . . , xn] = (f[x1, . . . , xn] − f[x0, . . . , xn−1]) / (xn − x0),   n ≥ 1.   (8.19)

Program 66 implements the recursive formula (8.19). The evaluations of f at the interpolation nodes x are stored in vector y, while the output matrix d (lower triangular) contains the divided differences, which are stored in the following form

x0   f[x0]
x1   f[x1]   f[x0, x1]
x2   f[x2]   f[x1, x2]   f[x0, x1, x2]
...
xn   f[xn]   f[xn−1, xn]   f[xn−2, xn−1, xn]   . . .   f[x0, . . . , xn]

The coefficients involved in the Newton formula are the diagonal entries of the matrix.

Program 66 - dividif : Newton divided differences

function [d]=dividif(x,y)
[n,m]=size(y);
if n == 1, n = m; end
n = n-1; d = zeros (n+1,n+1); d (:,1) = y';
for j = 2:n+1
for i = j:n+1
d (i,j) = ( d (i-1,j-1)-d (i,j-1))/(x (i-j+1)-x (i));
end
end


Using (8.19), n(n + 1) sums and n(n + 1)/2 divisions are needed to gen-
erate the whole matrix. If a new evaluation of f were available at a new
node xn+1 , only the calculation of a new row of the matrix would be re-
quired (f [xn , xn+1 ], . . . , f [x0 , x1 , . . . , xn+1 ]). Thus, in order to construct
Πn+1 f from Πn f , it su¬ces to add to Πn f the term an+1 ωn+1 (x), with a
computational cost of (n + 1) divisions and 2(n + 1) sums. For the sake of
notational simplicity, we write below Dr fi = f [xi , xi+1 , . . . , xr ].


Example 8.3 In Table 8.1 we show the divided differences on the interval (0,2) for the function f(x) = 1 + sin(3x). The values of f and the corresponding divided differences have been computed using 16 significant figures, although only the first 5 figures are reported. If the value of f were available at the node x = 0.2, updating the divided difference table would require only computing the entries denoted by italics in Table 8.1.



xi     f(xi)     f[xi, xi−1]     D2 fi     D3 fi     D4 fi     D5 fi     D6 fi
0      1.0000
0.2    1.5646    2.82
