[Figure: plot panels of Gradient Norm against Iterations.]

Figure 4.2: BFGS–Armijo for Discrete Control Problem




Buy this book from SIAM at http://www.ec-securehost.com/SIAM/FR18.html.
Copyright ©1999 by the Society for Industrial and Applied Mathematics. This electronic version is for personal use and may not be duplicated or distributed.





and allocated 50 vectors to Algorithm bfgsopt, there was no longer a benefit to using the
good Hessian. In fact, as is clear from Figure 4.3, the poor Hessian produced a more rapidly
convergent iteration.

[Figure: four panels plotting Gradient Norm and Function Value against Iterations, for the better Hessian (top row) and the poor Hessian (bottom row).]

Figure 4.3: BFGS–Armijo for Discrete Control Problem: Poor Initial Iterate



4.5 Exercises on BFGS
4.5.1. Use the secant method (4.2) with initial data x_{−1} = 1 and x_0 = 0.9 to minimize f(x) = x⁴.
Explain the convergence of the iteration.

4.5.2. Prove Lemma 4.1.1. It might help to use the secant equation.

4.5.3. Prove Lemma 4.1.4.

4.5.4. Verify (4.13) and compute the constant C.

4.5.5. Prove (4.25).

4.5.6. As an exercise in character building, implement Algorithm bfgsrec nonrecursively.

4.5.7. Show how the Sherman–Morrison formula can be used to implement the SR1 update in
such a way that only one vector need be stored for each iterate.

4.5.8. State and prove a local convergence theorem for DFP and/or PSB.

4.5.9. Implement the DFP and PSB update and compare their performance with BFGS on the
examples from §4.4.



4.5.10. Show that, for positive definite quadratic problems, the BFGS method with an exact line
search (i.e., one that finds the minimum of f in the search direction) is the same as CG
[201], [200].
4.5.11. Prove Theorem 4.3.2.








Chapter 5

Simple Bound Constraints

5.1 Problem Statement
The goal of this chapter is to show how the techniques of Chapters 2, 3, and 4 can be used to
solve a simple constrained optimization problem. The algorithm we suggest at the end in §5.5.3
is a useful extension of the BFGS–Armijo algorithm from Chapter 4. We will continue this line
of development when we solve noisy problems in Chapter 7.
Let {L_i}_{i=1}^N and {U_i}_{i=1}^N be sequences of real numbers such that

(5.1)    −∞ < L_i < U_i < +∞.

The bound constrained optimization problem is to find a local minimizer x* of a function f of
N variables subject to the constraint that

(5.2)    x* ∈ Ω = {x ∈ R^N | L_i ≤ (x)_i ≤ U_i}.

By this we mean that x* satisfies

(5.3)    f(x*) ≤ f(x) for all x ∈ Ω near x*.

It is standard to express this problem as

(5.4)    min_{x∈Ω} f(x)

or as min_Ω f. The set Ω is called the feasible set and a point in Ω is called a feasible point.
Because the set Ω is compact there is always a solution to our minimization problem [229].
The inequalities L_i ≤ (x)_i ≤ U_i are called inequality constraints or simply constraints.
We will say that the ith constraint is active at x ∈ Ω if either (x)_i = L_i or (x)_i = U_i. If the
ith constraint is not active we will say that it is inactive. The set of indices i such that the ith
constraint is active (inactive) will be called the set of active (inactive) indices at x.
We will write A(x) and I(x) for the active and inactive sets at x.


5.2 Necessary Conditions for Optimality
For a continuously differentiable function of one variable, the necessary conditions for uncon-
strained optimality at x* are simply f'(x*) = 0 and, if f is twice continuously differentiable,
f''(x*) ≥ 0. A bound constrained problem in one variable restricts the domain of f to an
interval [a, b], and the necessary conditions must be changed to admit the possibility that the

minimizer is one of the endpoints. If x* = a is a local minimizer, then it need not be the case
that f'(a) = 0; however, because a is a local minimizer, f(x) ≥ f(a) for all a ≤ x sufficiently
near a. Hence f'(a) ≥ 0. Nothing, however, can be said about f''. Similarly, if x* = b is a
local minimizer, f'(b) ≤ 0. If f is differentiable on [a, b] (i.e., on an open set containing [a, b]),
then the necessary conditions for all three possibilities, x* = a, x* = b, and a < x* < b, can be
neatly expressed by the following theorem.
Theorem 5.2.1. Let f be a continuously differentiable function of one variable on the
interval [a, b]. Let x* be a local minimum of f on [a, b]. Then

(5.5)    f'(x*)(x − x*) ≥ 0 for all x ∈ [a, b]

and, if f is twice continuously differentiable on [a, b],

(5.6)    f''(x*)(x* − a)(b − x*) ≥ 0.


The analogue of (5.5) is expressed by the idea of stationarity.

Definition 5.2.1. A point x* ∈ Ω is stationary for problem (5.4) if

(5.7)    ∇f(x*)^T (x − x*) ≥ 0 for all x ∈ Ω.


As in the unconstrained case, stationary points are said to satisfy the first-order necessary
conditions.
The fact that optimality implies stationarity is proved with Taylor's theorem just as it was in
the unconstrained case.

Theorem 5.2.2. Let f be continuously differentiable on Ω and let x* be a solution of problem
(5.4). Then x* is a stationary point for problem (5.4).

Proof. Let x* be a solution of problem (5.4) and let y ∈ Ω. As Ω is convex, the line segment
joining x* and y is entirely in Ω. Hence, the function

φ(t) = f(x* + t(y − x*))

is defined for t ∈ [0, 1] and has a local minimum at t = 0. Therefore, by Theorem 5.2.1,

0 ≤ φ'(0) = ∇f(x*)^T (y − x*)

as asserted.
The case of a function of a single variable is less useful in explaining the role of the second
derivative. However, we can get a complete picture by looking at functions of two variables.
To illustrate the ideas we let N = 2 and let f be a twice Lipschitz continuously differentiable
function on Ω = [0, 1] × [0, 1]. If x* is a solution of (5.4) and no constraints are active, then
∇²f(x*) is positive semidefinite by the same arguments used in the unconstrained case. If one
or more constraints are active, however, then, just as in the one variable case, one cannot draw
conclusions about the positivity of ∇²f(x*). Suppose the minimizer is at x* = (ξ, 0) with
0 < ξ < 1. While nothing can be said about ∂²f(x*)/∂x_2², the function φ(t) = f(t, 0), defined
on [0, 1], must satisfy

φ''(ξ) = ∂²f(x*)/∂x_1² ≥ 0.

Hence, second partials in directions corresponding to the inactive constraints must be nonnega-
tive, while nothing can be said about directions corresponding to active constraints.



We define the reduced Hessian to help make this idea precise.

Definition 5.2.2. Let f be twice differentiable at x ∈ Ω. The reduced Hessian ∇²_R f(x) is
the matrix whose entries are

(5.8)    (∇²_R f(x))_{ij} = { δ_{ij} if i ∈ A(x) or j ∈ A(x);  (∇²f(x))_{ij} otherwise }.
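The construction in (5.8) is easy to carry out directly. Here is a minimal NumPy sketch, with the active index set passed in as a list (how that set is computed from the bounds is discussed below; the function name is ours):

```python
import numpy as np

def reduced_hessian(hess, active):
    """Reduced Hessian of Definition 5.2.2: rows and columns indexed by
    active constraints are replaced by the corresponding rows and columns
    of the identity; the rest of the matrix is left unchanged."""
    R = np.array(hess, dtype=float)
    for i in active:
        R[i, :] = 0.0
        R[:, i] = 0.0
        R[i, i] = 1.0
    return R
```

For example, with A(x) = {0} the first row and column of the Hessian are replaced by those of the identity, leaving the inactive block intact.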


We can now present the second-order necessary conditions.

Theorem 5.2.3. Let f be twice Lipschitz continuously differentiable and let x* be the solution
of problem (5.4). Then the reduced Hessian ∇²_R f(x*) is positive semidefinite.

Proof. Assume that there are M inactive indices and N − M active indices. We partition
x ∈ Ω, reordering the variables if needed, into x = (ξ, ζ) with ξ corresponding to the inactive
indices and ζ to the active. The map

φ(ξ) = f(ξ, ζ*)

has an unconstrained local minimizer at ξ* ∈ R^M and hence ∇²φ(ξ*) is positive semidefinite. Since
the reduced Hessian can be written as

∇²_R f(x*) = [ ∇²φ(ξ*)  0 ;  0  I ]

if the variables are partitioned in this way, the proof is complete.
We let P denote the projection onto Ω, that is, the map that takes x into the nearest point (in
the l² norm) in Ω to x. We have that

(5.9)    P(x)_i = { L_i if (x)_i ≤ L_i;  (x)_i if L_i < (x)_i < U_i;  U_i if (x)_i ≥ U_i }.
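In code the projection (5.9) is a componentwise clip; a one-line NumPy sketch (bounds may be scalars or vectors thanks to broadcasting):

```python
import numpy as np

def project(x, lower, upper):
    """Componentwise projection P of (5.9) onto the box [lower, upper]."""
    return np.minimum(np.maximum(x, lower), upper)
```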

Theorem 5.2.4 states our final necessary condition; we defer the proof to §5.4.4.

Theorem 5.2.4. Let f be continuously differentiable. A point x* ∈ Ω is stationary for
problem (5.4) if and only if

(5.10)    x* = P(x* − λ∇f(x*))

for all λ ≥ 0.
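Theorem 5.2.4 gives a directly computable test for stationarity. As a small numerical illustration (using the quadratic f(x) = ‖x − (−1, 1/2)^T‖²/2 on [0, 1]², the same example that reappears in §5.5.1), the point x* = (0, 1/2)^T satisfies (5.10) for every λ ≥ 0:

```python
import numpy as np

# f(x) = ||x - c||^2 / 2 on [0,1]^2, so grad f(x) = x - c.
c = np.array([-1.0, 0.5])
grad = lambda x: x - c
P = lambda z: np.clip(z, 0.0, 1.0)   # projection (5.9) onto [0,1]^2

xstar = np.array([0.0, 0.5])
# (5.10): x* = P(x* - lam * grad f(x*)) for all lam >= 0
for lam in [0.0, 0.1, 1.0, 10.0]:
    assert np.allclose(xstar, P(xstar - lam * grad(xstar)))
```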


5.3 Sufficient Conditions
With the definition of the reduced Hessian in hand, the sufficient conditions are easy to formulate.
We begin by strengthening the notion of stationarity. If x* is stationary, i ∈ I(x*), and e_i is a
unit vector in the ith coordinate direction, then x* ± t e_i ∈ Ω for all t sufficiently small. Since

d f(x* ± t e_i)/dt |_{t=0} = ±∇f(x*)^T e_i ≥ 0,

therefore

(∇f(x*))_i = 0 for all i ∈ I(x*).

We will use the concept of nondegenerate stationary point or strict complementarity in our
formulation of the sufficient conditions.



Definition 5.3.1. A point x* ∈ Ω is a nondegenerate stationary point for problem (5.4) if
x* is a stationary point and

(5.11)    (∇f(x*))_i ≠ 0 for all i ∈ A(x*).

If x* is also a solution of problem (5.4) we say that x* is a nondegenerate local minimizer.
Our nondegeneracy condition is also referred to as strict complementarity.
If S is any set of indices define

(P_S x)_i = { (x)_i, i ∈ S;  0, i ∉ S }.

Nondegeneracy is important not only in the formulation of sufficient conditions but also in
the design of termination criteria. The first step in the use of nondegeneracy is Lemma 5.3.1.

Lemma 5.3.1. Let x* be a nondegenerate stationary point. Assume that A = A(x*) is not
empty. Then there is σ > 0 such that

∇f(x*)^T (x − x*) = ∇f(x*)^T P_A (x − x*) ≥ σ ‖P_A (x − x*)‖

for all x ∈ Ω.

Proof. If i ∈ A then nondegeneracy and stationarity imply that there is σ > 0 such that
either

(x*)_i = L_i and (∇f(x*))_i ≥ σ    or    (x*)_i = U_i and (∇f(x*))_i ≤ −σ.

If x ∈ Ω then for all i ∈ A,

(∇f(x*))_i (x − x*)_i ≥ σ|(x − x*)_i|.

Therefore, since ‖x‖₁ ≥ ‖x‖₂,

∇f(x*)^T P_A (x − x*) ≥ σ ‖P_A (x − x*)‖,

as asserted.
For a nondegenerate stationary point the sufficiency conditions are very similar to the un-
constrained case.

Theorem 5.3.2. Let x* ∈ Ω be a nondegenerate stationary point for problem (5.4). Let
f be twice differentiable in a neighborhood of x* and assume that the reduced Hessian at x*
is positive definite. Then x* is a solution of problem (5.4) (and hence a nondegenerate local
minimizer).

Proof. Let x ∈ Ω and define φ(t) = f(x* + t(x − x*)). We complete the proof by showing
that either (i) φ'(0) > 0 or (ii) φ'(0) = 0, φ''(0) > 0. Let e = x − x* and note that

φ'(0) = ∇f(x*)^T e = ∇f(x*)^T (P_A e + P_I e).

Stationarity implies that ∇f(x*)^T P_I e = 0. If P_A e ≠ 0 then nondegeneracy implies that

∇f(x*)^T P_A e > 0

and hence (i) holds. If P_A e = 0 then

φ''(0) = (x − x*)^T P_I ∇²f(x*) P_I (x − x*) = (x − x*)^T ∇²_R f(x*) (x − x*) > 0,

proving (ii).



5.4 The Gradient Projection Algorithm
The gradient projection algorithm is the natural extension of the steepest descent algorithm to
bound constrained problems. It shares all the advantages and disadvantages of that algorithm.
Our approach follows that of [18]. Given a current iterate x_c the new iterate is

x₊ = P(x_c − λ∇f(x_c)),

where λ is a steplength parameter given by the Armijo rule or some other line search scheme.
In this section we will restrict our attention to the simplest form of the Armijo rule. In order to
implement any line search scheme, we must specify what we mean by sufficient decrease. For
λ > 0 define

(5.12)    x(λ) = P(x − λ∇f(x)).

For bound constrained problems we will express the sufficient decrease condition for line searches
(compare with (3.4)) as

(5.13)    f(x(λ)) − f(x) ≤ −(α/λ) ‖x − x(λ)‖².

As with (3.4), α is a parameter and is typically set to 10⁻⁴ [84].
The general algorithmic description follows in Algorithm 5.4.1.

Algorithm 5.4.1. gradproj(x, f, nmax)

1. For n = 1, . . . , nmax

   (a) Compute f and ∇f; test for termination.
   (b) Find the least integer m such that (5.13) holds for λ = β^m.
   (c) x = x(λ).

2. If n = nmax and the termination test is failed, signal failure.
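The whole iteration fits in a few lines of NumPy. The sketch below uses the projection (5.9) as a componentwise clip, the Armijo rule with λ = β^m, and the termination test (5.18) from the next subsection; the cap on backtracking steps is our own safeguard, not part of the book's algorithm:

```python
import numpy as np

def gradproj(x, f, grad, lower, upper, nmax=100,
             alpha=1e-4, beta=0.5, tau_a=1e-8, tau_r=1e-8):
    """Sketch of Algorithm gradproj (5.4.1): gradient projection
    with the simple Armijo rule."""
    P = lambda z: np.clip(z, lower, upper)           # projection (5.9)
    r0 = np.linalg.norm(x - P(x - grad(x)))          # r_0 = ||x_0 - x_0(1)||
    for n in range(nmax):
        g = grad(x)
        if np.linalg.norm(x - P(x - g)) <= tau_a + tau_r * r0:   # (5.18)
            return x
        lam = 1.0
        for m in range(60):                          # lambda = beta^m
            xt = P(x - lam * g)                      # x(lambda), (5.12)
            # sufficient decrease condition (5.13)
            if f(xt) - f(x) <= -(alpha / lam) * np.linalg.norm(x - xt) ** 2:
                break
            lam *= beta
        x = xt
    return x
```

On the quadratic f(x) = ‖x − c‖²/2 with c outside the box, a single projected gradient step with λ = 1 already lands on the solution P(c).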

The next step is to elaborate on the termination criterion.


5.4.1 Termination of the Iteration
The termination criterion for unconstrained optimization that we have used previously must be
modified if we are to properly take the constraints into account. ∇f need not be zero at the
solution, but a natural substitute is to terminate the iteration if the difference between x and x(1)
is small. As in the case of unconstrained optimization or nonlinear equations, we must invoke
the sufficient conditions to show that such a termination criterion will accurately measure the
error.
As usual, we let e = x − x*.
We begin with a lemma that connects the active and inactive sets at a nondegenerate local
minimizer with nearby points.

Lemma 5.4.1. Let f be twice continuously differentiable on Ω and let x* be a nondegenerate
stationary point for problem (5.4). Let λ ∈ (0, 1]. Then for x sufficiently near x*,

1. A(x) ⊂ A(x*) and (x)_i = (x*)_i for all i ∈ A(x).

2. A(x(λ)) = A(x*) and (x(λ))_i = (x*)_i for all i ∈ A(x*).



Proof. Let

A* = A(x*), I* = I(x*), A = A(x), and I = I(x).

Let

δ₁ = min_{i∈I*} {(U_i − (x*)_i), ((x*)_i − L_i), (U_i − L_i)/2}.

If i ∈ I* and ‖e‖ < δ₁ then L_i < (x)_i < U_i. Hence,

I* ⊂ I,

proving the first assertion that A ⊂ A*. Moreover, since

‖e‖ < δ₁ ≤ min_i {(U_i − L_i)/2},

then (x)_i = (x*)_i for all i ∈ A.
Now let A^λ and I^λ be the active and inactive sets for x(λ) = P(x − λ∇f(x)). Let i ∈ A*.
By Lemma 5.3.1 and continuity of ∇f there is δ₂ such that if ‖e‖ < δ₂ then

(∇f(x* + e))_i (x − x*)_i > σ|(x − x*)_i|/2.

Therefore, if

‖e‖ < δ₃ < min(σ/2, δ₂),

then i ∈ A^λ and (x(λ))_i = (x*)_i. Hence A* ⊂ A^λ.
It remains to prove that A^λ ⊂ A*. By definition of P we have

‖P(x) − P(y)‖ ≤ ‖x − y‖

for all x, y ∈ R^N. Continuity of ∇²f implies that ∇f is Lipschitz continuous. We let L denote
the Lipschitz constant of ∇f in Ω. By stationarity and Theorem 5.2.4,

x* = x*(λ) = P(x* − λ∇f(x*)),

and, therefore,

(5.14)    ‖x* − x(λ)‖ = ‖P(x* − λ∇f(x*)) − P(x − λ∇f(x))‖ ≤ ‖e‖ + λ‖∇f(x*) − ∇f(x)‖ ≤ (1 + Lλ)‖e‖.

If there is i ∈ A^λ ∩ I* then we must have

(5.15)    ‖x* − x(λ)‖ ≥ min_{i∈I*} {(U_i − (x*)_i), ((x*)_i − L_i)} ≥ δ₁.

However, if

‖e‖ < δ₄ = min(δ₃, δ₁/(1 + L))

then (5.14) implies that (5.15) cannot hold. This completes the proof.
Theorem 5.4.2. Let f be twice continuously differentiable on Ω and let x* be a nondegenerate
stationary point for problem (5.4). Assume that the sufficient conditions hold at x*. Then there are
δ and M such that if ‖e‖ < δ and A(x) = A(x*) then

(5.16)    ‖e‖/M ≤ ‖x − x(1)‖ ≤ M‖e‖.




Proof. Again we let L denote the Lipschitz constant of ∇f in Ω and

A* = A(x*), I* = I(x*), A = A(x), and I = I(x).

Using stationarity we obtain

x − x(1) = e − (x(1) − x*(1)),

so that

‖x − x(1)‖ ≤ ‖e‖ + ‖P(x − ∇f(x)) − P(x* − ∇f(x*))‖
  ≤ 2‖e‖ + ‖∇f(x) − ∇f(x*)‖ ≤ (2 + L)‖e‖.

Hence, the right inequality in (5.16) holds.
To verify the left inequality in (5.16) we apply Lemma 5.4.1. Let δ₁ be such that ‖e‖ < δ₁
implies that the conclusions of Lemma 5.4.1 hold for λ = 1. The lemma implies that

(5.17)    (x − x(1))_i = { (∇f(x))_i, i ∈ I*;  (e)_i = 0, i ∈ A* }.

The remaining case is if i ∈ I = I*. The sufficiency conditions imply that there is µ > 0
such that

u^T P_{I*} ∇²f(x*) P_{I*} u ≥ µ ‖P_{I*} u‖²

for all u ∈ R^N. Hence, there is δ₂ so that if ‖e‖ < δ₂ then

u^T P_{I*} ∇²f(x) P_{I*} u ≥ µ ‖P_{I*} u‖²/2

for all u ∈ R^N.
Therefore, since e = P_{I*} e,

‖P_{I*}(x − x(1))‖ ‖e‖ ≥ e^T P_{I*}(x − x(1)) = ∫₀¹ e^T P_{I*} ∇²f(x* + te) e dt
  = ∫₀¹ e^T P_{I*} ∇²f(x* + te) P_{I*} e dt ≥ µ ‖P_{I*} e‖²/2.

Therefore, ‖x − x(1)‖ ≥ min(1, µ/2)‖e‖ and setting M = max{2 + L, 1, 2/µ} completes
the proof.
Following the unconstrained case, we formulate a termination criterion based on relative and
absolute reductions in the measure of stationarity ‖x − x(1)‖. Given r₀ = ‖x₀ − x₀(1)‖ and
relative and absolute tolerances τ_r and τ_a, the termination criterion for Algorithm gradproj is

(5.18)    ‖x − x(1)‖ ≤ τ_a + τ_r r₀.

5.4.2 Convergence Analysis
The convergence analysis is more complicated than that for the steepest descent algorithm
because of the care that must be taken with the constraints. Our analysis begins with several
preliminary lemmas.

Lemma 5.4.3. For all x, y ∈ Ω,

(5.19)    (y − x(λ))^T (x(λ) − x + λ∇f(x)) ≥ 0.





Proof. By definition of P,

‖x(λ) − x + λ∇f(x)‖ ≤ ‖y − x + λ∇f(x)‖

for all y ∈ Ω. Hence t = 0 is a local minimum for

φ(t) = ‖(1 − t)x(λ) + ty − x + λ∇f(x)‖²/2

and, therefore,

0 ≤ φ'(0) = (y − x(λ))^T (x(λ) − x + λ∇f(x))

as asserted.
We will most often use the equivalent form of (5.19)

(5.20)    (x − x(λ))^T (y − x(λ)) ≤ λ∇f(x)^T (y − x(λ)).

Setting y = x in (5.20), we state Corollary 5.4.4.

Corollary 5.4.4. For all x ∈ Ω and λ ≥ 0,

(5.21)    ‖x − x(λ)‖² ≤ λ∇f(x)^T (x − x(λ)).



An important result in any line search analysis is that the steplengths remain bounded away
from 0.

Theorem 5.4.5. Assume that ∇f is Lipschitz continuous with Lipschitz constant L. Let
x ∈ Ω. Then the sufficient decrease condition (5.13) holds for all λ such that

(5.22)    0 < λ ≤ 2(1 − α)/L.


Proof. We begin with the fundamental theorem of calculus. Setting y = x − x(λ) we have

f(x − y) − f(x) = f(x(λ)) − f(x) = −∫₀¹ ∇f(x − ty)^T y dt.

Hence,

(5.23)    f(x(λ)) = f(x) + ∇f(x)^T (x(λ) − x) − ∫₀¹ (∇f(x − ty) − ∇f(x))^T y dt.

Rearranging terms in (5.23) gives

(5.24)    λ(f(x) − f(x(λ))) = λ∇f(x)^T (x − x(λ)) + λE,

where

E = ∫₀¹ (∇f(x − ty) − ∇f(x))^T y dt

and hence

|E| ≤ L ‖x − x(λ)‖²/2.


So

(5.25)    λ(f(x) − f(x(λ))) ≥ λ∇f(x)^T (x − x(λ)) − λL ‖x − x(λ)‖²/2.

Therefore, using Corollary 5.4.4 we obtain

λ(f(x) − f(x(λ))) ≥ (1 − λL/2) ‖x − x(λ)‖²,

which completes the proof.
The consequence for the Armijo rule is that the line search will terminate when

β^m ≤ 2(1 − α)/L ≤ β^{m−1},

if not before. Hence, a lower bound for the steplengths is

(5.26)    λ̄ = 2β(1 − α)/L.

Theorem 5.4.6. Assume that ∇f is Lipschitz continuous with Lipschitz constant L. Let
{x_n} be the sequence generated by the gradient projection method. Then every limit point of
the sequence is a stationary point.

Proof. Since the sequence {f(x_n)} is decreasing and f is bounded from below on Ω, f(x_n)
has a limit f*. The sufficient decrease condition, as in the proof of Theorem 3.2.4, and (5.26)
imply that

‖x_n − x_{n+1}‖² ≤ λ(f(x_n) − f(x_{n+1}))/α ≤ (f(x_n) − f(x_{n+1}))/α → 0

as n → ∞.
Now let y ∈ Ω and n ≥ 0. By (5.20) we have

∇f(x_n)^T (x_n − y) = ∇f(x_n)^T (x_{n+1} − y) + ∇f(x_n)^T (x_n − x_{n+1})
  ≤ λ_n^{−1} (x_n − x_{n+1})^T (x_{n+1} − y) + ∇f(x_n)^T (x_n − x_{n+1}).

Therefore,

∇f(x_n)^T (x_n − y) ≤ ‖x_n − x_{n+1}‖ (λ_n^{−1} ‖x_{n+1} − y‖ + ‖∇f(x_n)‖),

and hence, by (5.26),

(5.27)    ∇f(x_n)^T (x_n − y) ≤ ‖x_n − x_{n+1}‖ (λ̄^{−1} ‖x_{n+1} − y‖ + ‖∇f(x_n)‖).

If x_{n_l} → x* is a convergent subsequence of {x_n}, then we may take limits in (5.27) as
l → ∞ and complete the proof.

5.4.3 Identification of the Active Set
The gradient projection iteration has the remarkable property that if it converges to a nondegen-
erate local minimizer, then the active set A_n of x_n is the same as A* after only finitely many
iterations.

Theorem 5.4.7. Assume that f is Lipschitz continuously differentiable and that the gradient
projection iterates {x_n} converge to a nondegenerate local minimizer x*. Then there is n₀ such
that A(x_n) = A(x*) for all n ≥ n₀.

Proof. Let λ̄ be the lower bound for the steplength. Let δ be such that the conclusions of
Lemma 5.4.1 hold for λ = λ̄ (and hence for all λ ≥ λ̄). Let n₀ be such that ‖e_n‖ < δ for all
n ≥ n₀ − 1 and the proof is complete.



5.4.4 A Proof of Theorem 5.2.4
We close this section with a proof of Theorem 5.2.4. We define a nonsmooth function

(5.28)    F(x) = x − P(x − ∇f(x)).

Using (5.12),

F(x) = x − x(1).

We now prove Theorem 5.2.4.

Proof. Corollary 5.4.4 states that

‖x* − x*(λ)‖² ≤ λ∇f(x*)^T (x* − x*(λ)).

If we set x = x*(λ) in the definition of stationarity (5.7) we have

∇f(x*)^T (x* − x*(λ)) ≤ 0

and hence x* = x*(λ).
Conversely assume that x* = x*(λ) for all λ > 0. This implies that x* is left invariant by
the gradient projection iteration and is therefore a stationary point.
By setting λ = 1 we obtain a simple consequence of Theorem 5.2.4.

Corollary 5.4.8. Let f be a Lipschitz continuously differentiable function on Ω. Then if
x* is stationary then F(x*) = 0.





5.5 Superlinear Convergence
Once the gradient projection iteration has identified the active constraints, P_{A(x*)} x* is known.
At that point the minimization problem for P_I x* is unconstrained and, in principle, any super-
linearly convergent method for unconstrained optimization could then be used.
The problem with this idea is, of course, that determining when the active set has been
identified is possible only after the problem has been solved, and an error in estimating the active
set can have devastating effects on convergence. In this section we discuss two approaches: one,
based on Newton's method, is presented only as a local method; the other is a BFGS–Armijo
method similar to Algorithm bfgsopt.
We will begin with the development of the local theory for the projected Newton method
[19]. This analysis illustrates the important problem of estimation of the active set. As with the
unconstrained minimization problem, the possibility of negative curvature makes this method
difficult to globalize (but see §5.6 for pointers to the literature on trust region methods). Following
the approach in §4.2 we describe a projected BFGS–Armijo scheme in §5.5.3.


5.5.1 The Scaled Gradient Projection Algorithm
One might think that the theory developed in §5.4 applies equally well to iterations of the form

x₊ = P(x_c − λH_c^{−1}∇f(x_c)),

where H_c is spd. This is not the case, as the following simple example illustrates. Let N = 2,
L_i = 0, and U_i = 1 for all i. Let

f(x) = ‖x − (−1, 1/2)^T‖²/2;



then the only local minimizer for (5.3) is x* = (0, 1/2)^T. Let x_c = (0, 0)^T (not a local mini-
mizer!); then ∇f(x_c) = (1, −1/2)^T. If

H_c^{−1} = [ 2 1 ; 1 2 ]

then H_c^{−1} (and hence H_c) is spd, and

H_c^{−1}∇f(x_c) = [ 2 1 ; 1 2 ] (1, −1/2)^T = (3/2, 0)^T.

Therefore, for all λ > 0,

x_c(λ) = P((0, 0)^T − λ(3/2, 0)^T) = P((−3λ/2, 0)^T) = (0, 0)^T = x_c.

The reason that x_c(λ) = x_c for all λ > 0 is that the search direction for the unconstrained
problem has been rotated by H_c^{−1} to be orthogonal to the direction of decrease in the inactive
directions for the constrained problem. Hence, unlike the unconstrained case, positivity of the
model Hessian is not sufficient and we must be able to estimate the active set and model the
reduced Hessian (rather than the Hessian) if we expect to improve convergence.
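The counterexample above is easy to verify numerically; the few lines below reproduce it and confirm that the scaled projected step never moves the iterate:

```python
import numpy as np

# f(x) = ||x - (-1, 1/2)||^2 / 2 on [0,1]^2, x_c = (0,0).
P = lambda z: np.clip(z, 0.0, 1.0)          # projection onto [0,1]^2
grad_fc = np.array([1.0, -0.5])             # grad f(x_c)
Hinv = np.array([[2.0, 1.0], [1.0, 2.0]])   # H_c^{-1}, spd

d = Hinv @ grad_fc                          # scaled direction = (3/2, 0)
xc = np.zeros(2)
for lam in [0.1, 1.0, 10.0]:
    # x_c(lambda) = P(x_c - lam * H_c^{-1} grad f(x_c)) = x_c for all lam > 0
    assert np.allclose(P(xc - lam * d), xc)
```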
The solution proposed in [19] is to underestimate the inactive set in a careful way and
therefore maintain a useful spd approximate reduced Hessian. For x ∈ Ω and

0 ≤ ε < min(U_i − L_i)/2,

we define A^ε(x), the ε-active set at x, by

(5.29)    A^ε(x) = {i | U_i − (x)_i ≤ ε or (x)_i − L_i ≤ ε}.

And let I^ε(x), the ε-inactive set, be the complement of A^ε(x).
Given 0 ≤ ε_c < min(U_i − L_i)/2, x_c, and an spd matrix H_c, we model the reduced Hessian
with R_c, the matrix with entries

(5.30)    R_c = P_{A^{ε_c}(x_c)} + P_{I^{ε_c}(x_c)} H_c P_{I^{ε_c}(x_c)},  i.e.,
          (R_c)_{ij} = { δ_{ij} if i ∈ A^{ε_c}(x_c) or j ∈ A^{ε_c}(x_c);  (H_c)_{ij} otherwise }.

When the explicit dependence on x_c, ε_c, and H_c is important we will write

R(x_c, ε_c, H_c).

So, for example,

∇²_R f(x_c) = R(x_c, 0, ∇²f(x_c)).
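The ε-active set (5.29) and the model matrix (5.30) translate directly into code. A short sketch (the function names are ours):

```python
import numpy as np

def eps_active(x, lower, upper, eps):
    """The eps-active set A^eps(x) of (5.29)."""
    return [i for i in range(len(x))
            if upper[i] - x[i] <= eps or x[i] - lower[i] <= eps]

def scaled_reduced_model(x, lower, upper, eps, H):
    """R(x, eps, H) of (5.30): identity on A^eps(x), H on I^eps(x)."""
    R = np.array(H, dtype=float)
    for i in eps_active(x, lower, upper, eps):
        R[i, :] = 0.0
        R[:, i] = 0.0
        R[i, i] = 1.0
    return R
```

With ε = 0 this reduces to the reduced Hessian of Definition 5.2.2; with ε > 0 components within ε of a bound are also treated as active.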

Given 0 < ε < min(U_i − L_i)/2 and an spd H, define

x_{H,ε}(λ) = P(x − λR(x, ε, H)^{−1}∇f(x)).

It requires proof that

f(x_{H,ε}(λ)) < f(x)

for λ sufficiently small. We prove more and show that the sufficient decrease condition

(5.31)    f(x_{H,ε}(λ)) − f(x) ≤ −α∇f(x)^T (x − x_{H,ε}(λ))


holds for sufficiently small λ.
Lemma 5.5.1. Let x ∈ Ω, 0 < ε < min(U_i − L_i)/2, and let H be spd with smallest and largest
eigenvalues 0 < λ_s ≤ λ_l. Let ∇f be Lipschitz continuous on Ω with Lipschitz constant L. Then
there is λ̄(ε, H) such that (5.31) holds for all

(5.32)    λ ≤ λ̄(ε, H).

Proof. The proof will show that

∇f(x)^T (x − x_{H,ε}(λ)) ≥ min(1, λ_l^{−1}) ∇f(x)^T (x − x(λ))

and then use the method of proof from Theorem 5.4.5. We do this by writing

∇f(x)^T (x − x_{H,ε}(λ)) = (P_{A^ε(x)}∇f(x))^T (x − x_{H,ε}(λ)) + (P_{I^ε(x)}∇f(x))^T (x − x_{H,ε}(λ))

and considering the two terms on the right side separately.
We begin by looking at (P_{A^ε(x)}∇f(x))^T (x − x_{H,ε}(λ)). Note that

(x_{H,ε}(λ))_i = (x(λ))_i for i ∈ A^ε(x)

and, therefore,

(5.33)    (P_{A^ε(x)}∇f(x))^T (x − x_{H,ε}(λ)) = (P_{A^ε(x)}∇f(x))^T (x − x(λ)).

We will need to show that

(5.34)    (P_{A^ε(x)}∇f(x))^T (x − x(λ)) ≥ 0.

Now assume that

(5.35)    λ < λ̄₁ = min(U_i − L_i) / (2 max_{x∈Ω} ‖∇f(x)‖_∞).

Since A(x) ⊂ A^ε(x) we can investigate the contributions of A(x) and A^ε(x) ∩ I(x) separately.
If i ∈ A(x) then (5.35) implies that either (x − x(λ))_i = λ(∇f(x))_i or (x − x(λ))_i = 0. In
either case (x − x(λ))_i (∇f(x))_i ≥ 0. If i ∈ A^ε(x) ∩ I(x) and (x − x(λ))_i ≠ λ(∇f(x))_i, then
it must be the case that i ∈ A(x(λ)) and therefore we must still have (x − x(λ))_i (∇f(x))_i ≥ 0.
Hence (5.34) holds.
Now if i ∈ I^ε(x) then, by definition,

L_i + ε ≤ (x)_i ≤ U_i − ε

and, hence, if

(5.36)    λ ≤ λ̄₂ = ε / (max_{x∈Ω} ‖R(x, ε, H)^{−1}∇f(x)‖_∞)

then i is in the inactive set for both x_{H,ε}(λ) and x(λ). Therefore, if (5.36) holds then

(5.37)    (P_{I^ε(x)}∇f(x))^T (x − x_{H,ε}(λ)) = λ(P_{I^ε(x)}∇f(x))^T H^{−1} P_{I^ε(x)}∇f(x)
            ≥ λ^{−1} λ_l^{−1} ‖P_{I^ε(x)}(x − x(λ))‖²
            = λ_l^{−1} (P_{I^ε(x)}∇f(x))^T (x − x(λ)).

Hence, using Corollary 5.4.4, (5.33), (5.34), and (5.37), we obtain

(5.38)    ∇f(x)^T (x − x_{H,ε}(λ)) = (P_{A^ε(x)}∇f(x))^T (x − x_{H,ε}(λ)) + (P_{I^ε(x)}∇f(x))^T (x − x_{H,ε}(λ))
            ≥ (P_{A^ε(x)}∇f(x))^T (x − x(λ)) + λ_l^{−1} (P_{I^ε(x)}∇f(x))^T (x − x(λ))
            ≥ min(1, λ_l^{−1}) ∇f(x)^T (x − x(λ)) ≥ (min(1, λ_l^{−1})/λ) ‖x − x(λ)‖².

The remainder of the proof is almost identical to that for Theorem 5.4.5. The fundamental
theorem of calculus and the Lipschitz continuity assumption imply that

f(x_{H,ε}(λ)) − f(x) ≤ −∇f(x)^T (x − x_{H,ε}(λ)) + L ‖x − x_{H,ε}(λ)‖².

We apply (5.38) and obtain

f(x_{H,ε}(λ)) − f(x) ≤ −(1 − Lλ max(1, λ_l)) ∇f(x)^T (x − x_{H,ε}(λ)),

which implies (5.31) if 1 − Lλ max(1, λ_l) ≥ α, which will follow from

(5.39)    λ ≤ λ̄₃ = (1 − α) / (max(1, λ_l)L).

This completes the proof with λ̄ = min(λ̄₁, λ̄₂, λ̄₃).
An algorithm based on these ideas is the scaled gradient projection algorithm. The name
comes from the scaling matrix H that is used to compute the direction. The inputs are the initial
iterate, the vectors of upper and lower bounds u and l, the relative-absolute residual tolerance
vector τ = (τ_r, τ_a), and a limit on the number of iterations. Left unstated in the algorithmic
description are the manner in which the parameter ε is computed and the way in which the
approximate Hessians are constructed.

Algorithm 5.5.1. sgradproj(x, f, τ, nmax)

1. For n = 1, . . . , nmax

   (a) Compute f and ∇f; test for termination using (5.18).
   (b) Compute ε and an spd H.
   (c) Solve R(x, ε, H)d = −∇f(x).
   (d) Find the least integer m such that (5.31) holds for λ = β^m.
   (e) x = x_{H,ε}(λ).

2. If n = nmax and the termination test is failed, signal failure.

If our model reduced Hessians remain uniformly positive definite, a global convergence
result completely analogous to Theorem 3.2.4 holds.

Theorem 5.5.2. Let ∇f be Lipschitz continuous with Lipschitz constant L. Assume that the
matrices H_n are symmetric positive definite and that there are κ̄ and λ̄_l such that κ(H_n) ≤ κ̄
and ‖H_n‖ ≤ λ̄_l for all n. Assume that there is ε̄ > 0 such that ε̄ ≤ ε_n < min(U_i − L_i)/2 for
all n.



Then

(5.40)    lim_{n→∞} ‖x_n − x_n(1)‖ = 0,

and hence any limit point of the sequence of iterates produced by Algorithm sgradproj is a
stationary point.
In particular, if x_{n_l} → x* is any convergent subsequence of {x_n}, then x* = x*(1). If x_n
converges to a nondegenerate local minimizer x*, then the active set of x_n is the same as that of
x* after finitely many iterations.

Proof. With Lemma 5.5.1 and its proof in hand, the proof follows the outline of the proof of
Theorem 5.4.6. We invite the reader to work through it in exercise 5.8.3.

5.5.2 The Projected Newton Method
The requirement in the hypothesis of Theorem 5.5.2 that the sequence {ε_n} be bounded away
from zero is used to guarantee that the steplengths λ_n are bounded away from zero. This is
needed because ε appears in the numerator in (5.36). However, once the active set has been
identified and one is near enough to a nondegenerate local minimizer for the reduced Hessians to
be spd, one is solving an unconstrained problem. Moreover, once near enough to that minimizer,
the convergence theory for Newton's method will hold. Then one can, in principle, set ε_n = 0
and the iteration will be q-quadratically convergent. In this section we discuss an approach from
[19] for making a transition from the globally convergent regime described in Theorem 5.5.2 to
the locally convergent setting where Newton's method converges rapidly.
If the initial iterate x_0 is sufficiently near a nondegenerate local minimizer x∗ and we take

    H_n = ∇²_R f(x_n)

in Algorithm sgradproj, then the resulting projected Newton method will take full steps (i.e.,
λ = 1) and, if ε_n is chosen with care, converge q-quadratically to x∗.
A specific form of the recommendation from [19], which we use here, is

(5.41)    ε_n = min(‖x_n − x_n(1)‖, min_i(U_i − L_i)/2).
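In an implementation, (5.41) is a one-line computation once the projected gradient point x(1) = P(x − ∇f(x)) is available. The following Python sketch is illustrative only (the book's codes are in MATLAB); the function name and argument list are our own, with P realized by clipping onto the box.

```python
import numpy as np

def eps_n(x, g, lo, hi):
    """epsilon_n from (5.41): min(||x_n - x_n(1)||, min_i(U_i - L_i)/2),
    where x(1) = P(x - grad f(x)) and P clips onto {lo <= x <= hi}."""
    x1 = np.clip(x - g, lo, hi)
    return min(np.linalg.norm(x - x1), np.min(hi - lo) / 2.0)
```

Far from a stationary point the first argument of the min dominates, keeping ε_n bounded away from zero; near a stationary point ε_n shrinks with the projected gradient, as the transition argument requires.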

Note that while x_n is far from a stationary point and the reduced Hessian is spd, ε_n will be
bounded away from zero and Theorem 5.5.2 will be applicable. The convergence result is like
Theorem 2.3.3 for local convergence but makes the strong assumption that H_n is spd (valid near
x∗, of course) in order to get a global result.

Algorithm projnewt is the formal description of the projected Newton algorithm. It is a
bit more than just a specific instance of Algorithm gradproj. Keep in mind that if the initial
iterate is far from x∗ and the reduced Hessian is not spd, then the line search (and hence the
entire iteration) may fail. The algorithm tests for this. This possibility of indefiniteness is the
weakness in any line search method that uses ∇²f when far from the minimizer. The inputs to
Algorithm projnewt are the same as those for Algorithm gradproj. The algorithm exploits
the fact that

(5.42)    R(x, ε, ∇²f(x)) = R(x, ε, ∇²_R f(x)),

which follows from A(x) ⊂ A_ε(x).
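Concretely, the projection P, the ε-active set, and the projections P_A and P_I can be realized as clipping and 0–1 diagonal matrices. This Python sketch is illustrative (the book's codes are in MATLAB) and the function names are our own.

```python
import numpy as np

def proj(x, lo, hi):
    """P(x): pointwise projection onto the box {lo <= x <= hi}."""
    return np.clip(x, lo, hi)

def eps_active(x, lo, hi, eps):
    """Boolean mask of the eps-active set A_eps(x): components within
    eps of either bound."""
    return (x - lo <= eps) | (hi - x <= eps)

def projections(x, lo, hi, eps):
    """Diagonal projections P_A (onto the eps-active set) and P_I = I - P_A."""
    PA = np.diag(eps_active(x, lo, hi, eps).astype(float))
    return PA, np.eye(len(x)) - PA
```

Since A(x) corresponds to ε = 0, the containment A(x) ⊂ A_ε(x) behind (5.42) is immediate from the mask definition.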
Algorithm 5.5.2. projnewt(x, f, τ, nmax)

1. For n = 1, . . . , nmax

   (a) Compute f and ∇f; test for termination using (5.18).
   (b) Set ε = ‖x − x(1)‖.






BOUND CONSTRAINTS 101

   (c) Compute and factor R = R(x, ε, ∇²_R f(x)). If R is not spd, terminate with a failure
       message.
   (d) Solve Rd = −∇f(x_c).
   (e) Find the least integer m such that (5.13) holds for λ = β^m.
   (f) x = x(λ).

2. If n = nmax and the termination test is failed, signal failure.
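A dense Python sketch of Algorithm projnewt is below. It is illustrative only (the book's codes are in MATLAB): the function names are ours, the reduced Hessian is formed explicitly as P_A + P_I ∇²f P_I, and a standard Armijo sufficient-decrease test on the projected path stands in for (5.13).

```python
import numpy as np

def projected_newton(x, f, grad, hess, lo, hi, tol=1e-8, nmax=100,
                     alpha=1e-4, beta=0.5):
    """Illustrative sketch of Algorithm projnewt for bound constraints."""
    P = lambda z: np.clip(z, lo, hi)
    for _ in range(nmax):
        g = grad(x)
        pg = x - P(x - g)                        # x - x(1)
        if np.linalg.norm(pg) <= tol:
            return x
        eps = np.linalg.norm(pg)                 # step (b): eps = ||x - x(1)||
        act = (x - lo <= eps) | (hi - x <= eps)  # eps-active set
        PI = np.diag((~act).astype(float))
        PA = np.eye(len(x)) - PI
        R = PA + PI @ hess(x) @ PI               # reduced Hessian
        np.linalg.cholesky(R)                    # step (c): raises if R not spd
        d = np.linalg.solve(R, -g)               # step (d)
        lam, fx = 1.0, f(x)
        for _ in range(50):                      # step (e): Armijo backtracking
            xl = P(x + lam * d)
            if f(xl) - fx <= -(alpha / lam) * np.linalg.norm(x - xl) ** 2:
                break
            lam *= beta
        x = xl                                   # step (f)
    return x
```

On a simple quadratic with the unconstrained minimizer outside the box, the iteration identifies the active set and takes a full step, consistent with Theorem 5.5.3.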


Theorem 5.5.3. Let x∗ be a nondegenerate local minimizer. Then if x_0 is sufficiently near
to x∗ and A(x_0) = A(x∗), then the projected Newton iteration, with ε_n = ‖x_n − x_n(1)‖, will
converge q-quadratically to x∗.
Proof. Our assumption that the active set has been identified, i.e.,

    A(x_c) = A(x_+) = A(x∗),

implies that

    P_{A(x_c)} e_c = P_{A(x_c)} e_+ = 0.

Hence, we need only estimate P_{I(x_c)} e_+ to prove the result.
Let

    δ∗ = min_{i∈I(x∗)} min(|(x∗)_i − U_i|, |(x∗)_i − L_i|) > 0.

We reduce ‖e‖ if necessary so that

    ‖e‖ ≤ δ∗/M,

where M is the constant in Theorem 5.4.2. We may then apply Theorem 5.4.2 to conclude that
both ε_c < δ∗ and ‖e_c‖ < δ∗. Then any index i ∈ A_{ε_c}(x_c) must also be in A(x_c) = A(x∗).
Hence

(5.43)    A_{ε_c}(x_c) = A(x_c) = A(x∗).

From this we have

(5.44)    R(x_c, ε_c, ∇²_R f(x_c)) = ∇²_R f(x_c).

Hence, for ‖e_c‖ sufficiently small the projected Newton iteration is

    x_+ = P(x_c − (∇²_R f(x_c))^{−1} ∇f(x_c)).

By the fundamental theorem of calculus,

(5.45)    ∇f(x_c) = ∇f(x∗) + ∇²f(x_c)e_c + E_1,

where

    E_1 = ∫₀¹ (∇²f(x∗ + t e_c) − ∇²f(x_c)) e_c dt

and hence ‖E_1‖ ≤ K_1 ‖e_c‖² for some K_1 > 0.
By the necessary conditions,

(5.46)    P_{I(x_c)} ∇f(x∗) = P_{I(x∗)} ∇f(x∗) = 0.

By the fact that I(x_c) = I(x∗), we have the equivalent statements

(5.47)    e_c = P_{I(x_c)} e_c and P_{A(x_c)} e_c = 0.



Therefore, combining (5.45), (5.46), (5.47),

(5.48)    P_{I(x_c)} ∇f(x_c) = P_{I(x_c)} ∇²f(x_c) P_{I(x_c)} e_c + P_{I(x_c)} E_1
                             = P_{A(x_c)} e_c + P_{I(x_c)} ∇²f(x_c) P_{I(x_c)} e_c + P_{I(x_c)} E_1
                             = ∇²_R f(x_c) e_c + P_{I(x_c)} E_1.

So, by definition of ∇²_R,

    P_{I(x_c)} (∇²_R f(x_c))^{−1} ∇f(x_c) = (∇²_R f(x_c))^{−1} P_{I(x_c)} ∇f(x_c) = e_c + E_2,

where ‖E_2‖ ≤ K_2 ‖e_c‖² for some K_2 > 0.
Since P_{I(x_c)} P w = P P_{I(x_c)} w for all w ∈ R^N,

    P_{I(x_c)} x_+ = P_{I(x_c)} P(x_c − (∇²_R f(x_c))^{−1} ∇f(x_c))

                   = P P_{I(x_c)} (x_c − (∇²_R f(x_c))^{−1} ∇f(x_c)) = P(x∗ − E_2).

Therefore, ‖e_+‖ ≤ K_2 ‖e_c‖² as asserted.

5.5.3 A Projected BFGS–Armijo Algorithm
We can apply the structured quasi-Newton updating scheme from §4.3 with

(5.49)    C(x) = P_{A_ε(x)}

and update an approximation to the part of the model Hessian that acts on the inactive set. In
this way we can hope to maintain a positive definite model reduced Hessian with, say, a BFGS
update. So if our model reduced Hessian is

    R = C(x) + A,

we can use (4.42) to update A (with A_0 = P_{I_{ε_0}(x_0)}, for example), as long as the active set does
not change. If one begins the iteration near a nondegenerate local minimizer with an accurate
approximation to the Hessian, then one would expect, based on Theorem 5.5.3, that the active
set would remain constant and that the iteration would converge q-superlinearly.
However, if the initial data is far from a local minimizer, the active set can change with each
iteration and the update must be designed to account for this. One way to do this is to use a
projected form of the BFGS update of A from (4.42),

(5.50)    A_+ = P_{I_+} A_c P_{I_+} + (y# y#ᵀ)/(y#ᵀ s) − P_{I_+} ((A_c s)(A_c s)ᵀ)/(sᵀ A_c s) P_{I_+},

with

    y# = P_{I_+}(∇f(x_+) − ∇f(x_c)).

Here I_+ = I_{ε_+}(x_+). This update carries as much information as possible from the previous
model reduced Hessian while taking care about proper approximation of the active set. As in
the unconstrained case, if y#ᵀ s ≤ 0 we can either skip the update or reinitialize A to P_I.
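A dense version of (5.50) is straightforward to write down. The Python below is an illustrative sketch (the book's codes are in MATLAB); the boolean-mask interface and the choice to reinitialize to P_{I_+} on curvature failure are our own.

```python
import numpy as np

def projected_bfgs_update(A, s, gplus, gc, inactive_plus):
    """One update of the model reduced Hessian block A via (5.50).
    inactive_plus is a boolean mask for the set I_+."""
    PI = np.diag(inactive_plus.astype(float))    # P_{I+}
    y = PI @ (gplus - gc)                        # y# = P_{I+}(grad f(x+) - grad f(x_c))
    ys = float(y @ s)
    if ys <= 0:                                  # y#^T s <= 0: reinitialize to P_{I+}
        return PI.copy()
    As = A @ s
    return (PI @ A @ PI + np.outer(y, y) / ys
            - PI @ np.outer(As, As) @ PI / float(s @ As))
```

When P_{I_+} s = s, the update satisfies the projected secant condition A_+ s = y#, just as in the unconstrained case.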
A is not spd if any constraints are active. However, if we demand that A be symmetric and
positive semidefinite, a generalized inverse A† exists. We have

(5.51)    (P_A + A)^{−1} = P_A + A†.



If A(x_c) = A(x_+), any of the low-storage methods from Chapter 4 can be used to update A†.
In this case we have

(5.52)    A†_+ = (I − (s y#ᵀ)/(y#ᵀ s)) A†_c (I − (y# sᵀ)/(y#ᵀ s)) + (s sᵀ)/(y#ᵀ s).

Since s = P_{I_+} s if A(x_+) = A(x_c), we could replace s by s# = P_{I_+} s in (5.52).
If the set of active constraints changes, then (5.52) is no longer true and we cannot replace
s by s#. One approach is to store A, update it as a full matrix, and refactor with each nonlinear
iteration. This is very costly for even moderately large problems. Another approach, used in
[250], is to reinitialize A = P_I whenever the active set changes. The problem with this is that
in the terminal phase of the iteration, when most of the active set has been identified, too much
information is lost when A is reinitialized.

In this book we suggest an approach based on the recursive BFGS update that does not
discard information corresponding to that part of the inactive set that is not changed. The idea is
that even if the active set has changed, we can still maintain an approximate generalized inverse
with

(5.53)    A†_+ = (I − (s# y#ᵀ)/(y#ᵀ s#)) P_{I_+} A†_c P_{I_+} (I − (y# s#ᵀ)/(y#ᵀ s#)) + (s# s#ᵀ)/(y#ᵀ s#).
The formulation we use in the MATLAB code bfgsbound is based on (5.52) and Algorithm
bfgsrec. Algorithm bfgsrecb stores the sequences {y#_k} and {s#_k} and uses Algorithm
bfgsrec and the projection P_{I_+} to update A† as the iteration progresses. The data
are the same as for Algorithm bfgsrec with the addition of

    P_{I_n} = P_{I_{ε_n}(x_n)}.

Note that the sequences {y#_k} and {s#_k} are changed early in the call and then the unconstrained
algorithm bfgsrec is used to do most of the work.
Algorithm 5.5.3. bfgsrecb(n, {s#_k}, {y#_k}, A†_0, d, P_{I_n})

1. d = P_{I_n} d.

2. If n = 0, d = A†_0 d; return.

3. α = s#ᵀ_{n−1} d / y#ᵀ_{n−1} s#_{n−1}; d = d − α y#_{n−1}.

4. Call bfgsrec(n − 1, {s#_k}, {y#_k}, A†_0, d).

5. d = d + (α − (y#ᵀ_{n−1} d / y#ᵀ_{n−1} s#_{n−1})) s#_{n−1}.

6. d = P_{I_n} d.
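The recursion translates directly into Python. This is an illustrative sketch (the book's implementation is the MATLAB code bfgsbound); A†_0 is passed as a matrix-vector product, and steps 3–5 of bfgsrecb mirror the outermost level of bfgsrec applied to the sequences {s#_k}, {y#_k}.

```python
import numpy as np

def bfgsrec(n, S, Y, A0dag, d):
    """Recursive product d <- H_n d for the inverse BFGS update
    (Algorithm bfgsrec); A0dag applies the initial inverse approximation."""
    if n == 0:
        return A0dag(d)
    s, y = S[n - 1], Y[n - 1]
    alpha = (s @ d) / (y @ s)
    d = bfgsrec(n - 1, S, Y, A0dag, d - alpha * y)
    return d + (alpha - (y @ d) / (y @ s)) * s

def bfgsrecb(n, S, Y, A0dag, d, PIn):
    """Projected variant (Algorithm bfgsrecb), steps 1-6."""
    d = PIn @ d                                   # step 1
    if n == 0:
        return A0dag(d)                           # step 2
    s, y = S[n - 1], Y[n - 1]
    alpha = (s @ d) / (y @ s)                     # step 3
    d = bfgsrec(n - 1, S, Y, A0dag, d - alpha * y)  # step 4
    d = d + (alpha - (y @ d) / (y @ s)) * s       # step 5
    return PIn @ d                                # step 6
```

With s# = y# the update leaves the initial approximation unchanged, which gives a quick sanity check on the recursion.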

The projected BFGS–Armijo algorithm that we used in the example problems in §5.7 is
based on Algorithm bfgsrecb. Note that we reinitialize ns to zero (i.e., reinitialize A to P_I)
when y#ᵀ_{ns} s ≤ 0. We found experimentally that this was better than skipping the update.
Algorithm 5.5.4. bfgsoptb(x, f, τ, u, l)

1. ns = n = 0; pg_0 = pg = x − P(x − ∇f(x))

2. ε = min(min_i(U_i − L_i)/2, ‖pg‖); A = A_ε(x); I = I_ε(x); A†_0 = P_I

3. While ‖pg‖ > τ_a + τ_r ‖pg_0‖

   (a) d = −∇f(x); call bfgsrecb(ns, {s#_k}, {y#_k}, A†_0, d, P_I)
   (b) d = −P_A ∇f(x) + d
   (c) Find the least integer m such that (5.13) holds for λ = β^m. Set s#_{ns} = P_I(x(λ) − x)
   (d) xp = x(λ); y#_{ns} = P_I(∇f(xp) − ∇f(x))
   (e) If y#ᵀ_{ns} s#_{ns} > 0 then ns = ns + 1, else ns = 0
   (f) x = xp; pg = x − P(x − ∇f(x))
   (g) ε = min(min_i(U_i − L_i)/2, ‖pg‖); A = A_ε(x); I = I_ε(x)
   (h) n = n + 1
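Putting the pieces together, the following Python sketch mirrors Algorithm bfgsoptb on a dense problem. It is illustrative only (the book's implementation is the MATLAB code bfgsbound): the iterative two-loop recursion below is equivalent to calling Algorithm bfgsrecb with A†_0 = P_I, and the parameter names and the Armijo test standing in for (5.13) are our own.

```python
import numpy as np

def bfgsoptb(x, f, grad, lo, hi, tau_a=1e-8, tau_r=1e-8,
             alpha=1e-4, beta=0.5, nmax=200):
    """Illustrative projected BFGS-Armijo sketch (cf. Algorithm bfgsoptb)."""
    P = lambda z: np.clip(z, lo, hi)
    S, Y = [], []                                  # {s#_k}, {y#_k}; ns = len(S)
    pg = x - P(x - grad(x))
    pg0 = np.linalg.norm(pg)
    for _ in range(nmax):
        if np.linalg.norm(pg) <= tau_a + tau_r * pg0:
            break
        eps = min(np.min(hi - lo) / 2.0, np.linalg.norm(pg))
        act = (x - lo <= eps) | (hi - x <= eps)    # A_eps(x)
        PI = np.diag((~act).astype(float))
        PA = np.eye(len(x)) - PI
        g = grad(x)
        # two-loop recursion: equivalent to bfgsrecb with A0^dagger = P_I
        d = PI @ (-g)
        alphas = []
        for s, y in zip(reversed(S), reversed(Y)):
            a = (s @ d) / (y @ s)
            alphas.append(a)
            d = d - a * y
        d = PI @ d
        for s, y, a in zip(S, Y, reversed(alphas)):
            d = d + (a - (y @ d) / (y @ s)) * s
        d = PI @ d
        d = -PA @ g + d                            # step (b)
        lam, fx = 1.0, f(x)
        for _ in range(50):                        # Armijo, standing in for (5.13)
            xp = P(x + lam * d)
            if f(xp) - fx <= -(alpha / lam) * np.linalg.norm(x - xp) ** 2:
                break
            lam *= beta
        s = PI @ (xp - x)                          # s#_{ns}
        y = PI @ (grad(xp) - grad(x))              # y#_{ns}
        if y @ s > 0:
            S.append(s); Y.append(y)               # ns = ns + 1
        else:
            S, Y = [], []                          # reinitialize A to P_I
        x = xp
        pg = x - P(x - grad(x))
    return x
```

On a quadratic whose unconstrained minimizer violates one bound, the iteration drives the projected gradient to zero at the constrained minimizer.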

Theorem 4.1.3 can be applied directly once the active set has been identified and a good
initial approximation to the reduced Hessian is available. The reader is invited to construct the
(easy!) proof in exercise 5.8.6.

Theorem 5.5.4. Let x∗ be a nondegenerate local minimizer. Then if x_0 is sufficiently near
to x∗, A(x_0) = A(x∗), and A_0 is sufficiently near to P_{I(x∗)} ∇²f(x∗) P_{I(x∗)}, then the projected
BFGS iteration, with ε_n = ‖x_n − x_n(1)‖, will converge q-superlinearly to x∗.
A global convergence result for this projected BFGS algorithm can be derived by combining
Theorems 5.5.2 and 4.1.9.

Theorem 5.5.5. Let ∇f be Lipschitz continuous on Ω. Assume that the matrices H_n are
constructed with the projected BFGS method (5.50) and satisfy the assumptions of Theorem 5.5.2.
Then (5.40) and the conclusions of Theorem 5.5.2 hold.

Moreover, if x∗ is a nondegenerate local minimizer such that there is n_0 such that A(x_n) =
A(x∗) for all n ≥ n_0, H_{n_0} is spd, and the set

    D = {x | f(x) ≤ f(x_{n_0}) and A(x) = A(x∗)}

is convex, then the projected BFGS–Armijo algorithm converges q-superlinearly to x∗.


5.6 Other Approaches
Our simple projected BFGS method is effective for small to medium sized problems and for very
large problems that are discretizations of infinite-dimensional problems that have the appropriate
compactness properties. The example in §4.4.2 nicely illustrates this point. For other kinds of
large problems, however, more elaborate methods are needed, and we present some pointers to
the literature in this section.
The limited memory BFGS method for unconstrained problems described in [44] and [176]
has also been extended to bound constrained problems [42], [280]. More general work on line
search methods for bound constrained problems can be found in [47], [194], and [42].
Very general theories have been developed for convergence of trust region methods for
bound constrained problems. The notion of Cauchy decrease can, for example, be replaced by
the decrease from a gradient projection step for the quadratic model [191], [259], [66]. One
could look for minima of the quadratic model along the projection path [63], [64], or attempt to
project the solution of an unconstrained model using the reduced Hessian [162].
A completely different approach can be based on interior point methods. This is an active
research area and the algorithms are not, at least at this moment, easy to implement or analyze.
This line of research began with [57] and [58]. We refer the reader to [86] and [266] for more
recent accounts of this aspect of the field and to [140] and [79] for some applications to control
problems and an account of the difficulties in infinite dimensions.






[Figure: four panels, "Gradient Projection" (top) and "Projected BFGS" (bottom), each plotting
the projected gradient norm and function value against iterations; the gradient projection runs
for about 2000 iterations, the projected BFGS for about 40.]

Figure 5.1: Solution to Constrained Parameter ID Problem



[Figure: plot titled "Solution to bound constrained control problem", with values between
roughly 0.5 and 1.4 over a grid of 2000 points.]

Figure 5.2: Solution to Discrete Control Problem: First Example
