[Figure omitted: two panels plotting function value and gradient norm against iterations (0 to 20).]

Figure 4.2: BFGS–Armijo for Discrete Control Problem

Buy this book from SIAM at http://www.ec-securehost.com/SIAM/FR18.html.

Copyright ©1999 by the Society for Industrial and Applied Mathematics. This electronic version is for personal use and may not be duplicated or distributed.

BFGS METHOD 85

and allocated 50 vectors to Algorithm bfgsopt, there was no longer a benefit to using the good Hessian. In fact, as is clear from Figure 4.3, the poor Hessian produced a more rapidly convergent iteration.

[Figure omitted: four panels ("Better Hessian" and "Poor Hessian") plotting function value and gradient norm against iterations.]

Figure 4.3: BFGS–Armijo for Discrete Control Problem: Poor Initial Iterate

4.5 Exercises on BFGS

4.5.1. Use the secant method (4.2) with initial data x_{-1} = 1 and x_0 = .9 to minimize f(x) = x^4. Explain the convergence of the iteration.

4.5.2. Prove Lemma 4.1.1. It might help to use the secant equation.

4.5.3. Prove Lemma 4.1.4.

4.5.4. Verify (4.13) and compute the constant C.

4.5.5. Prove (4.25).

4.5.6. As an exercise in character building, implement Algorithm bfgsrec nonrecursively.

4.5.7. Show how the Sherman–Morrison formula can be used to implement the SR1 update in such a way that only one vector need be stored for each iterate.

4.5.8. State and prove a local convergence theorem for DFP and/or PSB.

4.5.9. Implement the DFP and PSB update and compare their performance with BFGS on the

examples from §4.4.


86 ITERATIVE METHODS FOR OPTIMIZATION

4.5.10. Show that, for positive definite quadratic problems, the BFGS method with an exact line search (i.e., one that finds the minimum of f in the search direction) is the same as CG [201], [200].

4.5.11. Prove Theorem 4.3.2.


Chapter 5

Simple Bound Constraints

5.1 Problem Statement

The goal of this chapter is to show how the techniques of Chapters 2, 3, and 4 can be used to solve a simple constrained optimization problem. The algorithm we suggest at the end in §5.5.3 is a useful extension of the BFGS–Armijo algorithm from Chapter 4. We will continue this line of development when we solve noisy problems in Chapter 7.

Let {L_i}_{i=1}^N and {U_i}_{i=1}^N be sequences of real numbers such that

(5.1)  −∞ < L_i < U_i < +∞.

The bound constrained optimization problem is to find a local minimizer x* of a function f of N variables subject to the constraint that

(5.2)  x* ∈ Ω = {x ∈ R^N | L_i ≤ (x)_i ≤ U_i}.

By this we mean that x* satisfies

(5.3)  f(x*) ≤ f(x) for all x ∈ Ω near x*.

It is standard to express this problem as

(5.4)  min_{x∈Ω} f(x)

or as min_Ω f. The set Ω is called the feasible set and a point in Ω is called a feasible point. Because the set Ω is compact there is always a solution to our minimization problem [229].

The inequalities L_i ≤ (x)_i ≤ U_i are called inequality constraints or simply constraints. We will say that the ith constraint is active at x ∈ Ω if either (x)_i = L_i or (x)_i = U_i. If the ith constraint is not active we will say that it is inactive. The set of indices i such that the ith constraint is active (inactive) will be called the set of active (inactive) indices at x. We will write A(x) and I(x) for the active and inactive sets at x.
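As a concrete illustration (a NumPy sketch of ours, not part of the original text; the function names are assumptions), the active and inactive index sets can be computed componentwise:

```python
import numpy as np

def active_set(x, lower, upper):
    """Indices i with (x)_i = L_i or (x)_i = U_i, i.e., the active set A(x)."""
    x = np.asarray(x, dtype=float)
    return np.where((x == lower) | (x == upper))[0]

def inactive_set(x, lower, upper):
    """Indices with L_i < (x)_i < U_i, i.e., the inactive set I(x)."""
    x = np.asarray(x, dtype=float)
    return np.where((x != lower) & (x != upper))[0]
```

In floating point one would normally test a tolerance rather than exact equality with the bounds; the ε-active sets of §5.5.1 formalize exactly that idea.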

5.2 Necessary Conditions for Optimality

For a continuously differentiable function of one variable, the necessary conditions for unconstrained optimality at x* are simply f′(x*) = 0 and, if f is twice continuously differentiable, f″(x*) ≥ 0. A bound constrained problem in one variable restricts the domain of f to an interval [a, b], and the necessary conditions must be changed to admit the possibility that the




minimizer is one of the endpoints. If x* = a is a local minimizer, then it need not be the case that f′(a) = 0; however, because a is a local minimizer, f(x) ≥ f(a) for all a ≤ x sufficiently near a. Hence f′(a) ≥ 0. Nothing, however, can be said about f″. Similarly, if x* = b is a local minimizer, f′(b) ≤ 0. If f is differentiable on [a, b] (i.e., on an open set containing [a, b]), then the necessary conditions for all three possibilities, x* = a, x* = b, and a < x* < b, can be neatly expressed by the following theorem.

Theorem 5.2.1. Let f be a continuously differentiable function of one variable on the interval [a, b]. Let x* be a local minimum of f on [a, b]. Then

(5.5)  f′(x*)(x − x*) ≥ 0 for all x ∈ [a, b]

and, if f is twice continuously differentiable on [a, b],

(5.6)  f″(x*)(x* − a)(b − x*) ≥ 0.

The analogue of (5.5) is expressed by the idea of stationarity.

Definition 5.2.1. A point x* ∈ Ω is stationary for problem (5.4) if

(5.7)  ∇f(x*)^T (x − x*) ≥ 0 for all x ∈ Ω.

As in the unconstrained case, stationary points are said to satisfy the first-order necessary conditions. The fact that optimality implies stationarity is proved with Taylor's theorem just as it was in the unconstrained case.

Theorem 5.2.2. Let f be continuously differentiable on Ω and let x* be a solution of problem (5.4). Then x* is a stationary point for problem (5.4).

Proof. Let x* be a solution of problem (5.4) and let y ∈ Ω. As Ω is convex, the line segment joining x* and y is entirely in Ω. Hence, the function

φ(t) = f(x* + t(y − x*))

is defined for t ∈ [0, 1] and has a local minimum at t = 0. Therefore, by Theorem 5.2.1,

0 ≤ φ′(0) = ∇f(x*)^T (y − x*)

as asserted.

The case of a function of a single variable is less useful in explaining the role of the second derivative. However, we can get a complete picture by looking at functions of two variables. To illustrate the ideas we let N = 2 and let f be a twice Lipschitz continuously differentiable function on Ω = [0, 1] × [0, 1]. If x* is a solution of (5.4) and no constraints are active, then ∇²f(x*) is positive semidefinite by the same arguments used in the unconstrained case. If one or more constraints are active, however, then, just as in the one variable case, one cannot draw conclusions about the positivity of ∇²f(x*). Suppose the minimizer is at x* = (ξ, 0) with 0 < ξ < 1. While nothing can be said about ∂²f(x*)/∂x_2², the function φ(t) = f(t, 0), defined on [0, 1], must satisfy

φ″(ξ) = ∂²f(x*)/∂x_1² ≥ 0.

Hence, second partials in directions corresponding to the inactive constraints must be nonnegative, while nothing can be said about directions corresponding to active constraints.


BOUND CONSTRAINTS 89

We define the reduced Hessian to help make this idea precise.

Definition 5.2.2. Let f be twice differentiable at x ∈ Ω. The reduced Hessian ∇_R²f(x) is the matrix whose entries are

(5.8)  (∇_R²f(x))_{ij} = { δ_{ij} if i ∈ A(x) or j ∈ A(x); (∇²f(x))_{ij} otherwise. }
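In NumPy terms (a sketch of ours, not from the text), (5.8) replaces the active rows and columns of the Hessian by those of the identity:

```python
import numpy as np

def reduced_hessian(hess, active):
    """Form the reduced Hessian of (5.8): delta_ij when i or j is active,
    the ordinary Hessian entry otherwise."""
    R = np.array(hess, dtype=float)
    for i in active:
        R[i, :] = 0.0          # zero the active row and column ...
        R[:, i] = 0.0
        R[i, i] = 1.0          # ... and put a 1 on the diagonal
    return R
```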

We can now present the second-order necessary conditions.

Theorem 5.2.3. Let f be twice Lipschitz continuously differentiable and let x* be the solution of problem (5.4). Then the reduced Hessian ∇_R²f(x*) is positive semidefinite.

Proof. Assume that there are M inactive indices and N − M active indices. We partition x ∈ Ω, reordering the variables if needed, into x = (ξ, ζ) with ξ corresponding to the inactive indices and ζ to the active. The map

φ(ξ) = f(ξ, ζ*)

has an unconstrained local minimizer at ξ* ∈ R^M and hence ∇²φ(ξ*) is positive semidefinite. Since the reduced Hessian can be written as

∇_R²f(x*) = ( ∇²φ(ξ*)  0 ; 0  I )

if the variables are partitioned in this way, the proof is complete.

We let P denote the projection onto Ω, that is, the map that takes x into the nearest point (in the l² norm) in Ω to x. We have that

(5.9)  P(x)_i = { L_i if (x)_i ≤ L_i; (x)_i if L_i < (x)_i < U_i; U_i if (x)_i ≥ U_i. }
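Since Ω is a box, P is a componentwise clip; a one-line NumPy sketch (ours, not the book's code):

```python
import numpy as np

def project(x, lower, upper):
    """The projection P of (5.9): clip each component into [L_i, U_i]."""
    return np.minimum(np.maximum(np.asarray(x, dtype=float), lower), upper)
```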

Theorem 5.2.4 states our final necessary condition; we defer the proof to §5.4.4.

Theorem 5.2.4. Let f be continuously differentiable. A point x* ∈ Ω is stationary for problem (5.4) if and only if

(5.10)  x* = P(x* − λ∇f(x*))

for all λ ≥ 0.

5.3 Sufficient Conditions

With the definition of the reduced Hessian in hand, the sufficient conditions are easy to formulate. We begin by strengthening the notion of stationarity. If x* is stationary, i ∈ I(x*), and e_i is a unit vector in the ith coordinate direction, then x* ± t e_i ∈ Ω for all t sufficiently small. Since

d f(x* ± t e_i)/dt = ±∇f(x*)^T e_i ≥ 0,

therefore

(∇f(x*))_i = 0 for all i ∈ I(x*).

We will use the concept of nondegenerate stationary point or strict complementarity in our formulation of the sufficient conditions.


90 ITERATIVE METHODS FOR OPTIMIZATION

Definition 5.3.1. A point x* ∈ Ω is a nondegenerate stationary point for problem (5.4) if x* is a stationary point and

(5.11)  (∇f(x*))_i ≠ 0 for all i ∈ A(x*).

If x* is also a solution of problem (5.4) we say that x* is a nondegenerate local minimizer. Our nondegeneracy condition is also referred to as strict complementarity.

If S is any set of indices define

(P_S x)_i = { (x)_i, i ∈ S; 0, i ∉ S. }

Nondegeneracy is important not only in the formulation of sufficient conditions but also in the design of termination criteria. The first step in the use of nondegeneracy is Lemma 5.3.1.

Lemma 5.3.1. Let x* be a nondegenerate stationary point. Assume that A = A(x*) is not empty. Then there is σ > 0 such that

∇f(x*)^T (x − x*) = ∇f(x*)^T P_A (x − x*) ≥ σ ‖P_A (x − x*)‖

for all x ∈ Ω.

Proof. If i ∈ A then nondegeneracy and stationarity imply that there is σ > 0 such that either

(x*)_i = L_i and (∇f(x*))_i ≥ σ,  or  (x*)_i = U_i and (∇f(x*))_i ≤ −σ.

If x ∈ Ω then for all i ∈ A,

(∇f(x*))_i (x − x*)_i ≥ σ |(x − x*)_i|.

Therefore, since ‖x‖_1 ≥ ‖x‖_2,

∇f(x*)^T P_A (x − x*) ≥ σ ‖P_A (x − x*)‖,

as asserted.

For a nondegenerate stationary point the sufficiency conditions are very similar to the unconstrained case.

Theorem 5.3.2. Let x* ∈ Ω be a nondegenerate stationary point for problem (5.4). Let f be twice differentiable in a neighborhood of x* and assume that the reduced Hessian at x* is positive definite. Then x* is a solution of problem (5.4) (and hence a nondegenerate local minimizer).

Proof. Let x ∈ Ω and define φ(t) = f(x* + t(x − x*)). We complete the proof by showing that either (i) φ′(0) > 0 or (ii) φ′(0) = 0, φ″(0) > 0. Let e = x − x* and note that

φ′(0) = ∇f(x*)^T e = ∇f(x*)^T (P_A e + P_I e).

Stationarity implies that ∇f(x*)^T P_I e = 0. If P_A e ≠ 0 then nondegeneracy implies that

∇f(x*)^T P_A e > 0

and hence (i) holds. If P_A e = 0 then

φ″(0) = (x − x*)^T P_I ∇²f(x*) P_I (x − x*) = (x − x*)^T ∇_R²f(x*) (x − x*) > 0,

proving (ii).



5.4 The Gradient Projection Algorithm

The gradient projection algorithm is the natural extension of the steepest descent algorithm to bound constrained problems. It shares all the advantages and disadvantages of that algorithm. Our approach follows that of [18]. Given a current iterate x_c the new iterate is

x_+ = P(x_c − λ∇f(x_c)),

where λ is a steplength parameter given by the Armijo rule or some other line search scheme.

In this section we will restrict our attention to the simplest form of the Armijo rule. In order to implement any line search scheme, we must specify what we mean by sufficient decrease. For λ > 0 define

(5.12)  x(λ) = P(x − λ∇f(x)).

For bound constrained problems we will express the sufficient decrease condition for line searches (compare with (3.4)) as

(5.13)  f(x(λ)) − f(x) ≤ −(α/λ) ‖x − x(λ)‖².

As with (3.4), α is a parameter and is typically set to 10^{−4} [84].

The general algorithmic description follows in Algorithm 5.4.1.

Algorithm 5.4.1. gradproj(x, f, nmax)

1. For n = 1, . . . , nmax
   (a) Compute f and ∇f; test for termination.
   (b) Find the least integer m such that (5.13) holds for λ = β^m.
   (c) x = x(λ).

2. If n = nmax and the termination test is failed, signal failure.
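The steps above can be sketched in NumPy as follows (our own minimal realization, not the book's code; parameter names and defaults are assumptions; the termination test anticipates (5.18) of the next subsection):

```python
import numpy as np

def gradproj(x, f, grad, lower, upper, nmax=200,
             alpha=1e-4, beta=0.5, tau_a=1e-8, tau_r=1e-6):
    """Gradient projection with an Armijo line search.

    Step: x(lam) = P(x - lam * grad f(x)); sufficient decrease is
    f(x(lam)) - f(x) <= -(alpha/lam) * ||x - x(lam)||**2, cf. (5.13).
    """
    P = lambda z: np.minimum(np.maximum(z, lower), upper)
    x = P(np.asarray(x, dtype=float))
    r0 = None
    for _ in range(nmax):
        g = grad(x)
        r = np.linalg.norm(x - P(x - g))      # stationarity measure ||x - x(1)||
        r0 = r if r0 is None else r0
        if r <= tau_a + tau_r * r0:           # termination, cf. (5.18)
            return x
        lam, fx = 1.0, f(x)
        for _ in range(60):                   # least m with (5.13) for lam = beta**m
            xl = P(x - lam * g)
            if f(xl) - fx <= -(alpha / lam) * np.linalg.norm(x - xl) ** 2:
                break
            lam *= beta
        x = xl
    return x
```

On a convex quadratic over a box the iteration identifies the active bounds and stops once ‖x − x(1)‖ meets the tolerance.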

The next step is to elaborate on the termination criterion.

5.4.1 Termination of the Iteration

The termination criterion for unconstrained optimization that we have used previously must be modified if we are to properly take the constraints into account. ∇f need not be zero at the solution, but a natural substitute is to terminate the iteration if the difference between x and x(1) is small. As in the case of unconstrained optimization or nonlinear equations, we must invoke the sufficient conditions to show that such a termination criterion will accurately measure the error.

As usual, we let e = x − x*.

We begin with a lemma that connects the active and inactive sets at a nondegenerate local

minimizer with nearby points.

Lemma 5.4.1. Let f be twice continuously differentiable on Ω and let x* be a nondegenerate stationary point for problem (5.4). Let λ ∈ (0, 1]. Then for x sufficiently near x*,

1. A(x) ⊂ A(x*) and (x)_i = (x*)_i for all i ∈ A(x).

2. A(x(λ)) = A(x*) and (x(λ))_i = (x*)_i for all i ∈ A(x*).



Proof. Let

A* = A(x*), I* = I(x*), A = A(x), and I = I(x).

Let

δ_1 = min_{i∈I*} {(U_i − (x*)_i), ((x*)_i − L_i), (U_i − L_i)/2}.

If i ∈ I* and ‖e‖ < δ_1 then L_i < (x)_i < U_i. Hence,

I* ⊂ I,

proving the first assertion that A ⊂ A*. Moreover, since

‖e‖ < δ_1 ≤ min_i {(U_i − L_i)/2},

then (x)_i = (x*)_i for all i ∈ A.

Now let A^λ and I^λ be the active and inactive sets for x(λ) = P(x − λ∇f(x)). Let i ∈ A*. By Lemma 5.3.1 and continuity of ∇f there is δ_2 such that if ‖e‖ < δ_2 then

(∇f(x* + e))_i (x − x*)_i > σ |(x − x*)_i| / 2.

Therefore, if

‖e‖ < δ_3 < min(σ/2, δ_2),

then i ∈ A^λ and (x(λ))_i = (x*)_i. Hence A* ⊂ A^λ.

It remains to prove that A^λ ⊂ A*. By definition of P we have

‖P(x) − P(y)‖ ≤ ‖x − y‖

for all x, y ∈ R^N. Continuity of ∇²f implies that ∇f is Lipschitz continuous. We let L denote the Lipschitz constant of ∇f in Ω. By stationarity and Theorem 5.2.4,

x* = x*(λ) = P(x* − λ∇f(x*)),

and, therefore,

(5.14)  ‖x* − x(λ)‖ = ‖P(x* − λ∇f(x*)) − P(x − λ∇f(x))‖ ≤ ‖e‖ + λ ‖∇f(x*) − ∇f(x)‖ ≤ (1 + Lλ)‖e‖.

If there is i ∈ A^λ ∩ I* then we must have

(5.15)  ‖x* − x(λ)‖ ≥ min_{i∈I*} {(U_i − (x*)_i), ((x*)_i − L_i)} ≥ δ_1.

However, if

‖e‖ < δ_4 = min(δ_3, δ_1/(1 + L)),

then (5.14) implies that (5.15) cannot hold. This completes the proof.

Theorem 5.4.2. Let f be twice continuously differentiable on Ω and let x* be a nondegenerate stationary point for problem (5.4). Assume that the sufficient conditions hold at x*. Then there are δ and M such that if ‖e‖ < δ and A(x) = A(x*) then

(5.16)  ‖e‖/M ≤ ‖x − x(1)‖ ≤ M ‖e‖.



Proof. Again we let L denote the Lipschitz constant of ∇f in Ω and

A* = A(x*), I* = I(x*), A = A(x), and I = I(x).

Using stationarity we obtain

‖x − x(1)‖ = ‖e − (x(1) − x*(1))‖
≤ ‖e‖ + ‖P(x − ∇f(x)) − P(x* − ∇f(x*))‖
≤ 2‖e‖ + ‖∇f(x) − ∇f(x*)‖ ≤ (2 + L)‖e‖.

Hence, the right inequality in (5.16) holds.

To verify the left inequality in (5.16) we apply Lemma 5.4.1. Let δ_1 be such that ‖e‖ < δ_1 implies that the conclusions of Lemma 5.4.1 hold for λ = 1. The lemma implies that

(5.17)  (x − x(1))_i = { (∇f(x))_i, i ∈ I*; (e)_i = 0, i ∈ A*. }

The remaining case is i ∈ I = I*. The sufficiency conditions imply that there is μ > 0 such that

u^T P_{I*} ∇²f(x*) P_{I*} u ≥ μ ‖P_{I*} u‖²

for all u ∈ R^N. Hence, there is δ_2 so that if ‖e‖ < δ_2 then

u^T P_{I*} ∇²f(x) P_{I*} u ≥ μ ‖P_{I*} u‖² / 2

for all u ∈ R^N. Therefore, since e = P_{I*} e,

e^T P_{I*} (x − x(1)) = ∫_0^1 e^T P_{I*} ∇²f(x* + te) e dt = ∫_0^1 e^T P_{I*} ∇²f(x* + te) P_{I*} e dt ≥ μ ‖P_{I*} e‖² / 2,

so, by the Cauchy–Schwarz inequality, ‖x − x(1)‖ ≥ (μ/2)‖e‖. Therefore, ‖x − x(1)‖ ≥ min(1, μ/2)‖e‖, and setting M = max{2 + L, 1, 2/μ} completes the proof.

Following the unconstrained case, we formulate a termination criterion based on relative and absolute reductions in the measure of stationarity ‖x − x(1)‖. Given r_0 = ‖x_0 − x_0(1)‖ and relative and absolute tolerances τ_r and τ_a, the termination criterion for Algorithm gradproj is

(5.18)  ‖x − x(1)‖ ≤ τ_a + τ_r r_0.

5.4.2 Convergence Analysis

The convergence analysis is more complicated than that for the steepest descent algorithm because of the care that must be taken with the constraints. Our analysis begins with several preliminary lemmas.

Lemma 5.4.3. For all x, y ∈ Ω,

(5.19)  (y − x(λ))^T (x(λ) − x + λ∇f(x)) ≥ 0.



Proof. By definition of P,

‖x(λ) − x + λ∇f(x)‖ ≤ ‖y − x + λ∇f(x)‖

for all y ∈ Ω. Hence t = 0 is a local minimum for

φ(t) = ‖(1 − t)x(λ) + ty − x + λ∇f(x)‖² / 2

and, therefore,

0 ≤ φ′(0) = (y − x(λ))^T (x(λ) − x + λ∇f(x))

as asserted.

We will most often use the equivalent form of (5.19),

(5.20)  (x − x(λ))^T (y − x(λ)) ≤ λ∇f(x)^T (y − x(λ)).

Setting y = x in (5.20), we state Corollary 5.4.4.

Corollary 5.4.4. For all x ∈ Ω and λ ≥ 0,

(5.21)  ‖x − x(λ)‖² ≤ λ∇f(x)^T (x − x(λ)).

An important result in any line search analysis is that the steplengths remain bounded away from 0.

Theorem 5.4.5. Assume that ∇f is Lipschitz continuous with Lipschitz constant L. Let x ∈ Ω. Then the sufficient decrease condition (5.13) holds for all λ such that

(5.22)  0 < λ ≤ 2(1 − α)/L.

Proof. We begin with the fundamental theorem of calculus. Setting y = x − x(λ) we have

f(x − y) − f(x) = f(x(λ)) − f(x) = −∫_0^1 ∇f(x − ty)^T y dt.

Hence,

(5.23)  f(x(λ)) = f(x) + ∇f(x)^T (x(λ) − x) − ∫_0^1 (∇f(x − ty) − ∇f(x))^T y dt.

Rearranging terms in (5.23) gives

(5.24)  λ(f(x) − f(x(λ))) = λ∇f(x)^T (x − x(λ)) + λE,

where

E = ∫_0^1 (∇f(x − ty) − ∇f(x))^T y dt

and hence

|E| ≤ L ‖x − x(λ)‖² / 2.



So

(5.25)  λ(f(x) − f(x(λ))) ≥ λ∇f(x)^T (x − x(λ)) − λL ‖x − x(λ)‖² / 2.

Therefore, using Corollary 5.4.4 we obtain

λ(f(x) − f(x(λ))) ≥ (1 − λL/2) ‖x − x(λ)‖²,

which completes the proof.

The consequence for the Armijo rule is that the line search will terminate when

β^m ≤ 2(1 − α)/L ≤ β^{m−1},

if not before. Hence, a lower bound for the steplengths is

(5.26)  λ̄ = 2β(1 − α)/L.

Theorem 5.4.6. Assume that ∇f is Lipschitz continuous with Lipschitz constant L. Let {x_n} be the sequence generated by the gradient projection method. Then every limit point of the sequence is a stationary point.

Proof. Since the sequence {f(x_n)} is decreasing and f is bounded from below on Ω, f(x_n) has a limit f*. The sufficient decrease condition, as in the proof of Theorem 3.2.4, and (5.26) imply that

‖x_n − x_{n+1}‖² ≤ λ(f(x_n) − f(x_{n+1}))/α ≤ (f(x_n) − f(x_{n+1}))/α → 0

as n → ∞.

Now let y ∈ Ω and n ≥ 0. By (5.20) we have

∇f(x_n)^T (x_n − y) = ∇f(x_n)^T (x_{n+1} − y) + ∇f(x_n)^T (x_n − x_{n+1})
≤ λ_n^{−1} (x_n − x_{n+1})^T (x_{n+1} − y) + ∇f(x_n)^T (x_n − x_{n+1}).

Therefore, by (5.26),

(5.27)  ∇f(x_n)^T (x_n − y) ≤ ‖x_n − x_{n+1}‖ (λ̄^{−1} ‖x_{n+1} − y‖ + ‖∇f(x_n)‖).

If x_{n_l} → x* is a convergent subsequence of {x_n}, then we may take limits in (5.27) as l → ∞ and complete the proof.

5.4.3 Identification of the Active Set

The gradient projection iteration has the remarkable property that if it converges to a nondegenerate local minimizer, then the active set A_n of x_n is the same as A* after only finitely many iterations.

Theorem 5.4.7. Assume that f is Lipschitz continuously differentiable and that the gradient projection iterates {x_n} converge to a nondegenerate local minimizer x*. Then there is n_0 such that A(x_n) = A(x*) for all n ≥ n_0.

Proof. Let λ̄ be the lower bound for the steplength. Let δ be such that the conclusions of Lemma 5.4.1 hold for λ = λ̄ (and hence for all λ ≥ λ̄). Let n_0 be such that ‖e_n‖ < δ for all n ≥ n_0 − 1 and the proof is complete.



5.4.4 A Proof of Theorem 5.2.4

We close this section with a proof of Theorem 5.2.4. We define a nonsmooth function

(5.28)  F(x) = x − P(x − ∇f(x)).

Using (5.12),

F(x) = x − x(1).

We now prove Theorem 5.2.4.

Proof. Corollary 5.4.4 states that

‖x* − x*(λ)‖² ≤ λ∇f(x*)^T (x* − x*(λ)).

If we set x = x*(λ) in the definition of stationarity (5.7) we have

∇f(x*)^T (x* − x*(λ)) ≤ 0

and hence x* = x*(λ).

Conversely assume that x* = x*(λ) for all λ > 0. This implies that x* is left invariant by the gradient projection iteration and is therefore a stationary point.

By setting λ = 1 we obtain a simple consequence of Theorem 5.2.4.

Corollary 5.4.8. Let f be a Lipschitz continuously differentiable function on Ω. Then if x* is stationary then F(x*) = 0.

5.5 Superlinear Convergence

Once the gradient projection iteration has identified the active constraints, P_{A(x*)} x* is known. At that point the minimization problem for P_I x* is unconstrained and, in principle, any superlinearly convergent method for unconstrained optimization could then be used.

The problem with this idea is, of course, that determining when the active set has been identified is possible only after the problem has been solved, and an error in estimating the active set can have devastating effects on convergence. In this section we discuss two approaches: one, based on Newton's method, is presented only as a local method; the other is a BFGS–Armijo method similar to Algorithm bfgsopt.

We will begin with the development of the local theory for the projected Newton method [19]. This analysis illustrates the important problem of estimation of the active set. As with the unconstrained minimization problem, the possibility of negative curvature makes this method difficult to globalize (but see §5.6 for pointers to the literature on trust region methods). Following the approach in §4.2 we describe a projected BFGS–Armijo scheme in §5.5.3.

5.5.1 The Scaled Gradient Projection Algorithm

One might think that the theory developed in §5.4 applies equally well to iterations of the form

x_+ = P(x_c − λH_c^{−1}∇f(x_c)),

where H_c is spd. This is not the case, as the following simple example illustrates. Let N = 2, L_i = 0, and U_i = 1 for all i. Let

f(x) = ‖x − (−1, 1/2)^T‖² / 2;



then the only local minimizer for (5.3) is x* = (0, 1/2)^T. Let x_c = (0, 0)^T (not a local minimizer!); then ∇f(x_c) = (1, −1/2)^T. If

H_c^{−1} = ( 2 1 ; 1 2 ),

then H_c^{−1} (and hence H_c) is spd, and

H_c^{−1}∇f(x_c) = ( 2 1 ; 1 2 ) (1, −1/2)^T = (3/2, 0)^T.

Therefore, for all λ > 0,

x_c(λ) = P( (0, 0)^T − λH_c^{−1}(1, −1/2)^T ) = P( (−3λ/2, 0)^T ) = (0, 0)^T = x_c.

The reason that x_c(λ) = x_c for all λ > 0 is that the search direction for the unconstrained problem has been rotated by H_c^{−1} to be orthogonal to the direction of decrease in the inactive directions for the constrained problem. Hence, unlike the unconstrained case, positivity of the model Hessian is not sufficient, and we must be able to estimate the active set and model the reduced Hessian (rather than the Hessian) if we expect to improve convergence.
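The failure in the example above is easy to confirm numerically (our own transcription; nothing here goes beyond the data already given):

```python
import numpy as np

# x_c = (0,0), grad f(x_c) = (1, -1/2), H_c^{-1} = [[2, 1], [1, 2]] on [0,1]^2.
P = lambda z: np.clip(z, 0.0, 1.0)
g = np.array([1.0, -0.5])
Hinv = np.array([[2.0, 1.0], [1.0, 2.0]])
d = Hinv @ g                                   # scaled direction (3/2, 0)
for lam in (0.01, 0.1, 1.0, 10.0):
    x_lam = P(-lam * d)                        # x_c(lam) = P(x_c - lam * Hinv * g)
    assert np.array_equal(x_lam, np.zeros(2))  # the iterate never moves
```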

The solution proposed in [19] is to underestimate the inactive set in a careful way and therefore maintain a useful spd approximate reduced Hessian. For x ∈ Ω and

0 ≤ ε < min(U_i − L_i)/2,

we define A_ε(x), the ε-active set at x, by

(5.29)  A_ε(x) = {i | U_i − (x)_i ≤ ε or (x)_i − L_i ≤ ε}.

And let I_ε(x), the ε-inactive set, be the complement of A_ε(x).

Given 0 ≤ ε_c < min(U_i − L_i)/2, x_c, and an spd matrix H_c, we model the reduced Hessian with R_c, the matrix with entries

(5.30)  R_c = P_{A_{ε_c}(x_c)} + P_{I_{ε_c}(x_c)} H_c P_{I_{ε_c}(x_c)}, i.e., (R_c)_{ij} = { δ_{ij} if i ∈ A_{ε_c}(x_c) or j ∈ A_{ε_c}(x_c); (H_c)_{ij} otherwise. }

When the explicit dependence on x_c, ε_c, and H_c is important we will write

R(x_c, ε_c, H_c).

So, for example,

∇_R²f(x_c) = R(x_c, 0, ∇²f(x_c)).

Given 0 < ε < min(U_i − L_i)/2 and an spd H, define

x_{H,ε}(λ) = P(x − λR(x, ε, H)^{−1}∇f(x)).
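In code (again a sketch of ours, with assumed names), (5.29)–(5.30) amount to thresholded bound distances followed by the same row/column replacement used for the reduced Hessian:

```python
import numpy as np

def eps_active(x, lower, upper, eps):
    """The eps-active set A_eps(x) of (5.29)."""
    x = np.asarray(x, dtype=float)
    return np.where((upper - x <= eps) | (x - lower <= eps))[0]

def scaled_model(x, lower, upper, eps, H):
    """R(x, eps, H) of (5.30): identity on A_eps(x), H on the eps-inactive block."""
    R = np.array(H, dtype=float)
    A = eps_active(x, lower, upper, eps)
    R[A, :] = 0.0
    R[:, A] = 0.0
    R[A, A] = 1.0
    return R
```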

It requires proof that

f(x_{H,ε}(λ)) < f(x)

for λ sufficiently small. We prove more and show that the sufficient decrease condition

(5.31)  f(x_{H,ε}(λ)) − f(x) ≤ −α∇f(x)^T (x − x_{H,ε}(λ))



holds for sufficiently small λ.

Lemma 5.5.1. Let x ∈ Ω, 0 < ε < min(U_i − L_i)/2, and let H be spd with smallest and largest eigenvalues 0 < λ_s ≤ λ_l. Let ∇f be Lipschitz continuous on Ω with Lipschitz constant L. Then there is λ̄(ε, H) such that (5.31) holds for all

(5.32)  λ ≤ λ̄(ε, H).

Proof. The proof will show that

∇f(x)^T (x − x_{H,ε}(λ)) ≥ λ_l^{−1} ∇f(x)^T (x − x(λ))

and then use the method of proof from Theorem 5.4.5. We do this by writing

∇f(x)^T (x − x_{H,ε}(λ)) = (P_{A_ε(x)}∇f(x))^T (x − x_{H,ε}(λ)) + (P_{I_ε(x)}∇f(x))^T (x − x_{H,ε}(λ))

and considering the two terms on the right side separately.

We begin by looking at (P_{A_ε(x)}∇f(x))^T (x − x_{H,ε}(λ)). Note that

(x_{H,ε}(λ))_i = (x(λ))_i for i ∈ A_ε(x)

and, therefore,

(5.33)  (P_{A_ε(x)}∇f(x))^T (x − x_{H,ε}(λ)) = (P_{A_ε(x)}∇f(x))^T (x − x(λ)).

We will need to show that

(5.34)  (P_{A_ε(x)}∇f(x))^T (x − x(λ)) ≥ 0.

Now assume that

(5.35)  λ < λ̄_1 = min(U_i − L_i) / (2 max_{x∈Ω} ‖∇f(x)‖_∞).

Since A(x) ⊂ A_ε(x) we can investigate the contributions of A(x) and A_ε(x) ∩ I(x) separately. If i ∈ A(x) then (5.35) implies that either (x − x(λ))_i = λ(∇f(x))_i or (x − x(λ))_i = 0. In either case (x − x(λ))_i (∇f(x))_i ≥ 0. If i ∈ A_ε(x) ∩ I(x) and (x − x(λ))_i ≠ λ(∇f(x))_i, then it must be the case that i ∈ A(x(λ)) and therefore we must still have (x − x(λ))_i (∇f(x))_i ≥ 0. Hence (5.34) holds.

Now if i ∈ I_ε(x) then, by definition,

L_i + ε ≤ (x)_i ≤ U_i − ε

and, hence, if

(5.36)  λ ≤ λ̄_2 = ε / (max_{x∈Ω} ‖R(x, ε, H)^{−1}∇f(x)‖_∞),

then i is in the inactive set for both x_{H,ε}(λ) and x(λ). Therefore, if (5.36) holds then

(5.37)  (P_{I_ε(x)}∇f(x))^T (x − x_{H,ε}(λ)) = λ(P_{I_ε(x)}∇f(x))^T H^{−1} P_{I_ε(x)}∇f(x)
≥ λ_l^{−1} λ^{−1} ‖P_{I_ε(x)}(x − x(λ))‖²
= λ_l^{−1} (P_{I_ε(x)}∇f(x))^T (x − x(λ)).



Hence, using Corollary 5.4.4, (5.33), (5.34), and (5.37), we obtain

(5.38)  ∇f(x)^T (x − x_{H,ε}(λ)) = (P_{A_ε(x)}∇f(x))^T (x − x_{H,ε}(λ)) + (P_{I_ε(x)}∇f(x))^T (x − x_{H,ε}(λ))
≥ (P_{A_ε(x)}∇f(x))^T (x − x(λ)) + λ_l^{−1} (P_{I_ε(x)}∇f(x))^T (x − x(λ))
≥ min(1, λ_l^{−1}) ∇f(x)^T (x − x(λ)) ≥ (min(1, λ_l^{−1})/λ) ‖x − x(λ)‖².

The remainder of the proof is almost identical to that for Theorem 5.4.5. The fundamental theorem of calculus and the Lipschitz continuity assumption imply that

f(x_{H,ε}(λ)) − f(x) ≤ −∇f(x)^T (x − x_{H,ε}(λ)) + L ‖x − x_{H,ε}(λ)‖².

We apply (5.38) and obtain

f(x_{H,ε}(λ)) − f(x) ≤ −(1 − Lλ max(1, λ_l)) ∇f(x)^T (x − x_{H,ε}(λ)),

which implies (5.31) if 1 − Lλ max(1, λ_l) ≥ α, which will follow from

(5.39)  λ ≤ λ̄_3 = (1 − α) / (max(1, λ_l) L).

This completes the proof with λ̄ = min(λ̄_1, λ̄_2, λ̄_3).

An algorithm based on these ideas is the scaled gradient projection algorithm. The name comes from the scaling matrix H that is used to compute the direction. The inputs are the initial iterate, the vectors of upper and lower bounds u and l, the relative–absolute residual tolerance vector τ = (τ_r, τ_a), and a limit on the number of iterations. Left unstated in the algorithmic description are the manner in which the parameter ε is computed and the way in which the approximate Hessians are constructed.

Algorithm 5.5.1. sgradproj(x, f, τ, nmax)

1. For n = 1, . . . , nmax
   (a) Compute f and ∇f; test for termination using (5.18).
   (b) Compute ε and an spd H.
   (c) Solve R(x, ε, H_c)d = −∇f(x_c).
   (d) Find the least integer m such that (5.13) holds for λ = β^m.
   (e) x = x(λ).

2. If n = nmax and the termination test is failed, signal failure.

If our model reduced Hessians remain uniformly positive definite, a global convergence result completely analogous to Theorem 3.2.4 holds.

Theorem 5.5.2. Let ∇f be Lipschitz continuous with Lipschitz constant L. Assume that the matrices H_n are symmetric positive definite and that there are κ̄ and λ̄_l such that κ(H_n) ≤ κ̄ and ‖H_n‖ ≤ λ̄_l for all n. Assume that there is ε̄ > 0 such that ε̄ ≤ ε_n < min(U_i − L_i)/2 for all n.



Then

(5.40)  lim_{n→∞} ‖x_n − x_n(1)‖ = 0,

and hence any limit point of the sequence of iterates produced by Algorithm sgradproj is a stationary point.

In particular, if x_{n_l} → x* is any convergent subsequence of {x_n}, then x* = x*(1). If x_n converges to a nondegenerate local minimizer x*, then the active set of x_n is the same as that of x* after finitely many iterations.

Proof. With Lemma 5.5.1 and its proof in hand, the proof follows the outline of the proof of Theorem 5.4.6. We invite the reader to work through it in exercise 5.8.3.

5.5.2 The Projected Newton Method

The requirement in the hypothesis of Theorem 5.5.2 that the sequence {εn} be bounded away from zero is used to guarantee that the steplengths λn are bounded away from zero. This is needed because ε appears in the numerator in (5.36). However, once the active set has been identified and one is near enough to a nondegenerate local minimizer for the reduced Hessians to be spd, one is solving an unconstrained problem. Moreover, once near enough to that minimizer, the convergence theory for Newton's method will hold. Then one can, in principle, set εn = 0 and the iteration will be q-quadratically convergent. In this section we discuss an approach from [19] for making a transition from the globally convergent regime described in Theorem 5.5.2 to the locally convergent setting where Newton's method converges rapidly.

If the initial iterate x0 is sufficiently near a nondegenerate local minimizer x∗ and we take
Hn = ∇²_R f(xn)
in Algorithm sgradproj, then the resulting projected Newton method will take full steps (i.e., λ = 1) and, if εn is chosen with care, converge q-quadratically to x∗.

A specific form of the recommendation from [19], which we use here, is
(5.41) εn = min(‖xn − xn(1)‖, min(Ui − Li)/2).
Note that while xn is far from a stationary point and the reduced Hessian is spd, εn will be bounded away from zero and Theorem 5.5.2 will be applicable. The convergence result is like Theorem 2.3.3 for local convergence but makes the strong assumption that Hn is spd (valid near x∗, of course) in order to get a global result.

Algorithm projnewt is the formal description of the projected Newton algorithm. It is a bit more than just a specific instance of Algorithm gradproj. Keep in mind that if the initial iterate is far from x∗ and the reduced Hessian is not spd, then the line search (and hence the entire iteration) may fail. The algorithm tests for this. This possibility of indefiniteness is the weakness in any line search method that uses ∇²f when far from the minimizer. The inputs to Algorithm projnewt are the same as those for Algorithm gradproj. The algorithm exploits the fact that
(5.42) R(x, ε, ∇²f(x)) = R(x, ε, ∇²_R f(x)),
which follows from A(x) ⊂ Aε(x).

Algorithm 5.5.2. projnewt(x, f, τ, nmax)
1. For n = 1, . . . , nmax
   (a) Compute f and ∇f; test for termination using (5.18).
   (b) Set ε = ‖x − x(1)‖.


BOUND CONSTRAINTS 101

   (c) Compute and factor R = R(x, ε, ∇²_R f(x)). If R is not spd, terminate with a failure message.
   (d) Solve Rd = −∇f(xc).
   (e) Find the least integer m such that (5.13) holds for λ = β^m.
   (f) x = x(λ).
2. If n = nmax and the termination test is failed, signal failure.
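A minimal projected Newton iteration in the sense of Theorem 5.5.3, with εn = ‖xn − xn(1)‖ and full steps, can be sketched for a bound constrained quadratic. The Armijo safeguard and the spd test of Algorithm projnewt are omitted, so this is only a local-convergence illustration; the problem data below are an arbitrary example, not from the text.

```python
import numpy as np

# f(x) = 0.5 x'Qx - b'x on the box [0,1]^2; the unconstrained minimizer (2, 0.5)
# lies outside the box, so the constrained minimizer is (1, 0.5) with one
# active constraint.
lo, hi = np.zeros(2), np.ones(2)
Q = np.array([[2.0, 0.0], [0.0, 2.0]])
b = np.array([4.0, 1.0])
grad = lambda x: Q @ x - b
P = lambda x: np.clip(x, lo, hi)

def projnewt(x, maxit=20, tol=1e-12):
    """Projected Newton sketch: eps_n = ||x - x(1)||, full steps (lambda = 1)."""
    for _ in range(maxit):
        g = grad(x)
        x1 = P(x - g)                       # x(1)
        eps = np.linalg.norm(x - x1)
        if eps <= tol:
            break
        active = (x - lo <= eps) | (hi - x <= eps)
        PI = np.diag((~active).astype(float))
        PA = np.diag(active.astype(float))
        H_R = PA + PI @ Q @ PI              # reduced Hessian; Q = Hessian of f
        # by (5.42)/(5.44), R(x, eps, H_R) = H_R once the active set settles
        d = np.linalg.solve(H_R, -g)
        x = P(x + d)                        # full step, lambda = 1
    return x
```

Starting from the interior point (0.5, 0.5), the iteration lands on the constrained minimizer and the ε-active set collapses onto the true active set.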

Theorem 5.5.3. Let x∗ be a nondegenerate local minimizer. Then if x0 is sufficiently near to x∗ and A(x0) = A(x∗), then the projected Newton iteration, with εn = ‖xn − xn(1)‖, will converge q-quadratically to x∗.

Proof. Our assumption that the active set has been identified, i.e.,
A(xc) = A(x+) = A(x∗),
implies that
PA(xc)ec = PA(xc)e+ = 0.
Hence, we need only estimate PI(xc)e+ to prove the result.

Let
δ∗ = min_{i∈I(x∗)} min(|(x∗)i − Ui|, |(x∗)i − Li|) > 0.
We reduce ‖ec‖ if necessary so that
‖ec‖ ≤ δ∗/M,
where M is the constant in Theorem 5.4.2. We may then apply Theorem 5.4.2 to conclude that both εc < δ∗ and ‖ec‖ < δ∗. Then any index i ∈ Aεc(xc) must also be in A(xc) = A(x∗).

Hence
(5.43) Aεc(xc) = A(xc) = A(x∗).
From this we have
(5.44) R(xc, εc, ∇²_R f(xc)) = ∇²_R f(xc).

Hence, for ‖ec‖ sufficiently small the projected Newton iteration is
x+ = P(xc − (∇²_R f(xc))⁻¹ ∇f(xc)).

By the fundamental theorem of calculus,
(5.45) ∇f(xc) = ∇f(x∗) + ∇²f(xc)ec + E1,
where
E1 = ∫₀¹ (∇²f(x∗ + t ec) − ∇²f(xc)) ec dt,
and hence ‖E1‖ ≤ K1‖ec‖² for some K1 > 0.

By the necessary conditions,
(5.46) PI(xc)∇f(x∗) = PI(x∗)∇f(x∗) = 0.
By the fact that I(xc) = I(x∗), we have the equivalent statements
(5.47) ec = PI(xc)ec and PA(xc)ec = 0.


Therefore, combining (5.45), (5.46), and (5.47),
(5.48) PI(xc)∇f(xc) = PI(xc)∇²f(xc)PI(xc)ec + PI(xc)E1
       = PA(xc)ec + PI(xc)∇²f(xc)PI(xc)ec + PI(xc)E1
       = ∇²_R f(xc)ec + PI(xc)E1.

So, by definition of ∇²_R,
PI(xc)(∇²_R f(xc))⁻¹∇f(xc) = (∇²_R f(xc))⁻¹PI(xc)∇f(xc) = ec + E2,
where ‖E2‖ ≤ K2‖ec‖² for some K2 > 0.

Since PI(xc)Pw = PPI(xc)w for all w ∈ RN,
PI(xc)x+ = PI(xc)P(xc − (∇²_R f(xc))⁻¹∇f(xc))
         = PPI(xc)(xc − (∇²_R f(xc))⁻¹∇f(xc)) = P(x∗ − E2).
Therefore ‖e+‖ ≤ K2‖ec‖² as asserted.

5.5.3 A Projected BFGS–Armijo Algorithm
We can apply the structured quasi-Newton updating scheme from §4.3 with
(5.49) C(x) = PAε(x)
and update an approximation to the part of the model Hessian that acts on the inactive set. In this way we can hope to maintain a positive definite model reduced Hessian with, say, a BFGS update. So if our model reduced Hessian is
R = C(x) + A,
we can use (4.42) to update A (with A0 = PIε0(x0), for example), as long as the active set does not change. If one begins the iteration near a nondegenerate local minimizer with an accurate approximation to the Hessian, then one would expect, based on Theorem 5.5.3, that the active set would remain constant and that the iteration would converge q-superlinearly.

However, if the initial data is far from a local minimizer, the active set can change with each iteration and the update must be designed to account for this. One way to do this is to use a projected form of the BFGS update of A from (4.42),
(5.50) A+ = PI+ Ac PI+ + (y# y#T)/(y#T s) − PI+ ((Ac s)(Ac s)T)/(sT Ac s) PI+,
with
y# = PI+(∇f(x+) − ∇f(xc)).
Here I+ = Iε+(x+). This update carries as much information as possible from the previous model reduced Hessian while taking care about proper approximation of the active set. As in the unconstrained case, if y#T s ≤ 0 we can either skip the update or reinitialize A to PI.

A is not spd if any constraints are active. However, we can demand that A be symmetric, though possibly indefinite, and a generalized inverse A† exists. We have
(5.51) (PA + A)⁻¹ = PA + A†.
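Identity (5.51) is easy to check numerically. In the sketch below A is taken to be symmetric and supported on the inactive set (A = PI M PI), with A† its Moore–Penrose generalized inverse; the index sets and the matrix M are arbitrary choices for the test, not data from the text.

```python
import numpy as np

# Inactive set {0, 1}, active set {2}; P_A, P_I are diagonal 0/1 projections.
PA = np.diag([0.0, 0.0, 1.0])
PI = np.diag([1.0, 1.0, 0.0])

# A symmetric matrix supported on the inactive set: A = P_I M P_I.
M = np.array([[4.0, 1.0, 0.0],
              [1.0, 3.0, 0.0],
              [0.0, 0.0, 7.0]])
A = PI @ M @ PI

lhs = np.linalg.inv(PA + A)         # (P_A + A)^(-1)
rhs = PA + np.linalg.pinv(A)        # P_A + A-dagger (Moore-Penrose)
```

Since PA + A is block diagonal with identity on the active indices, its inverse splits exactly as (5.51) predicts.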


If A(xc) = A(x+), any of the low-storage methods from Chapter 4 can be used to update A†. In this case we have
(5.52) A†+ = (I − (s y#T)/(y#T s)) A†c (I − (y# sT)/(y#T s)) + (s sT)/(y#T s).
Since s = PI+ s if A(x+) = A(xc), we could replace s by s# = PI+ s in (5.52).

If the set of active constraints changes, then (5.52) is no longer true and we cannot replace s by s#. One approach is to store A, update it as a full matrix, and refactor with each nonlinear iteration. This is very costly for even moderately large problems. Another approach, used in [250], is to reinitialize A = PI whenever the active set changes. The problem with this is that in the terminal phase of the iteration, when most of the active set has been identified, too much information is lost when A is reinitialized.

In this book we suggest an approach based on the recursive BFGS update that does not discard information corresponding to that part of the inactive set that is not changed. The idea is that even if the active set has changed, we can still maintain an approximate generalized inverse with
(5.53) A†+ = (I − (s# y#T)/(y#T s#)) PI+ A†c PI+ (I − (y# s#T)/(y#T s#)) + (s# s#T)/(y#T s#).
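Update (5.52) can be written as a one-line function. A useful sanity check is the inverse secant property A†+ y# = s, which (5.52) enforces exactly; the function below is a sketch for illustration, not the bfgsbound implementation.

```python
import numpy as np

def inv_bfgs_update(Adag, s, y):
    """Inverse BFGS update (5.52) for the generalized inverse A-dagger,
    valid when the active set does not change (so s = P_I+ s).
    Requires the curvature condition y's > 0."""
    rho = y @ s
    n = len(s)
    V = np.eye(n) - np.outer(s, y) / rho
    return V @ Adag @ V.T + np.outer(s, s) / rho
```

Multiplying out shows (I − s y#T/ρ) annihilates the A†c term when applied to y#, leaving exactly s.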

The formulation we use in the MATLAB code bfgsbound is based on (5.52) and Algorithm bfgsrec. Algorithm bfgsrecb stores the sequences {y#k} and {s#k} and uses Algorithm bfgsrec and the projection PI+ to update A† as the iteration progresses. The data are the same as for Algorithm bfgsrec with the addition of
PIn = PIεn(xn).
Note that the sequences {y#k} and {s#k} are changed early in the call and then the unconstrained algorithm bfgsrec is used to do most of the work.

Algorithm 5.5.3. bfgsrecb(n, {s#k}, {y#k}, A†0, d, PIn)
1. d = PIn d.
2. If n = 0, d = A†0 d; return.
3. α = (s#n−1)T d / (y#n−1)T s#n−1; d = d − α y#n−1.
4. Call bfgsrec(n − 1, {s#k}, {y#k}, A†0, d).
5. d = d + (α − (y#n−1)T d / (y#n−1)T s#n−1) s#n−1.
6. d = PIn d.
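Algorithm bfgsrecb translates almost line for line. Here is a sketch, with the unconstrained recursion bfgsrec written recursively as in Chapter 4; the sequences S and Y are Python lists of the stored s# and y# vectors.

```python
import numpy as np

def bfgsrec(n, S, Y, A0dag, d):
    """Unconstrained recursive application of the inverse BFGS matrix
    (Algorithm bfgsrec from Chapter 4)."""
    if n == 0:
        return A0dag @ d
    s, y = S[n - 1], Y[n - 1]
    alpha = (s @ d) / (y @ s)
    d = d - alpha * y
    d = bfgsrec(n - 1, S, Y, A0dag, d)
    return d + (alpha - (y @ d) / (y @ s)) * s

def bfgsrecb(n, S, Y, A0dag, d, PIn):
    """Algorithm bfgsrecb: project, do the outermost update by hand,
    then let bfgsrec do most of the work."""
    d = PIn @ d
    if n == 0:
        d = A0dag @ d
    else:
        s, y = S[n - 1], Y[n - 1]
        alpha = (s @ d) / (y @ s)
        d = d - alpha * y
        d = bfgsrec(n - 1, S, Y, A0dag, d)
        d = d + (alpha - (y @ d) / (y @ s)) * s
    return PIn @ d
```

With PIn = I and n = 1, this reproduces (5.52) applied to d, which is exactly what the recursion is designed to do.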

The projected BFGS–Armijo algorithm that we used in the example problems in §5.7 is based on Algorithm bfgsrecb. Note that we reinitialize ns to zero (i.e., reinitialize A to PI) when (y#ns)T s ≤ 0. We found experimentally that this was better than skipping the update.

Algorithm 5.5.4. bfgsoptb(x, f, τ, u, l)
1. ns = n = 0; pg0 = pg = x − P(x − ∇f(x))
2. ε = min(min(Ui − Li)/2, ‖pg‖); A = Aε(x); I = Iε(x); A†0 = PI
3. While ‖pg‖ > τa + τr‖pg0‖


   (a) d = −∇f(x); Call bfgsrecb(ns, {s#k}, {y#k}, A†0, d, PI)
   (b) d = −PA∇f(x) + d
   (c) Find the least integer m such that (5.13) holds for λ = β^m. Set s#ns = PI(x(λ) − x)
   (d) xp = x(λ); y#ns = PI(∇f(xp) − ∇f(x))
   (e) If (y#ns)T s#ns > 0 then ns = ns + 1, else ns = 0
   (f) x = xp; pg = x − P(x − ∇f(x))
   (g) ε = min(min(Ui − Li)/2, ‖pg‖); A = Aε(x); I = Iε(x)
   (h) n = n + 1
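The loop of Algorithm bfgsoptb can be sketched compactly. For brevity, this sketch stores A† as a full matrix and reinitializes it to PI whenever the active set changes (the variant the text attributes to [250]) rather than calling bfgsrecb; the tolerances and the test problem below are illustrative choices, not the book's.

```python
import numpy as np

def bfgsoptb_sketch(x, f, grad, lo, hi, tol=1e-8, maxit=100):
    """Projected BFGS-Armijo sketch with a full-matrix A-dagger that is
    reinitialized to P_I when the active set changes."""
    P = lambda z: np.clip(z, lo, hi)
    m = len(x)
    pg0 = np.linalg.norm(x - P(x - grad(x)))
    Adag, old_active = None, None
    for _ in range(maxit):
        g = grad(x)
        if np.linalg.norm(x - P(x - g)) <= tol * (1.0 + pg0):
            break
        eps = min(np.min(hi - lo) / 2.0, np.linalg.norm(x - P(x - g)))
        active = (x - lo <= eps) | (hi - x <= eps)
        PI = np.diag((~active).astype(float))
        PA = np.diag(active.astype(float))
        if Adag is None or np.any(active != old_active):
            Adag = PI.copy()                      # reinitialize A-dagger = P_I
        old_active = active
        d = -(PA + Adag) @ g                      # step via (5.51)
        lam, fx = 1.0, f(x)
        xt = P(x + lam * d)
        while f(xt) > fx - (1e-4 / lam) * np.dot(x - xt, x - xt) and lam > 1e-12:
            lam *= 0.5                            # Armijo backtracking, cf. (5.13)
            xt = P(x + lam * d)
        s = PI @ (xt - x)
        y = PI @ (grad(xt) - g)
        if y @ s > 0:                             # else skip the update
            V = np.eye(m) - np.outer(s, y) / (y @ s)
            Adag = V @ Adag @ V.T + np.outer(s, s) / (y @ s)   # update (5.52)
        x = xt
    return x
```

On a small bound constrained quadratic this identifies the active set and stops at the constrained minimizer in a handful of iterations.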

Theorem 4.1.3 can be applied directly once the active set has been identified and a good initial approximation to the reduced Hessian is available. The reader is invited to construct the (easy!) proof in exercise 5.8.6.

Theorem 5.5.4. Let x∗ be a nondegenerate local minimizer. Then if x0 is sufficiently near to x∗, A(x0) = A(x∗), and A0 is sufficiently near to PI(x∗)∇²f(x∗)PI(x∗), then the projected BFGS iteration, with εn = ‖xn − xn(1)‖, will converge q-superlinearly to x∗.

A global convergence result for this projected BFGS algorithm can be derived by combining Theorems 5.5.2 and 4.1.9.
Theorem 5.5.5. Let ∇f be Lipschitz continuous on Ω. Assume that the matrices Hn are constructed with the projected BFGS method (5.50) and satisfy the assumptions of Theorem 5.5.2. Then (5.40) and the conclusions of Theorem 5.5.2 hold.
Moreover, if x∗ is a nondegenerate local minimizer such that there is n0 such that A(xn) = A(x∗) for all n ≥ n0, Hn0 is spd, and the set
D = {x | f(x) ≤ f(xn0) and A(x) = A(x∗)}
is convex, then the projected BFGS–Armijo algorithm converges q-superlinearly to x∗.

5.6 Other Approaches
Our simple projected BFGS method is effective for small to medium sized problems and for very large problems that are discretizations of infinite-dimensional problems that have the appropriate compactness properties. The example in §4.4.2 nicely illustrates this point. For other kinds of large problems, however, more elaborate methods are needed, and we present some pointers to the literature in this section.

The limited memory BFGS method for unconstrained problems described in [44] and [176]

has also been extended to bound constrained problems [42], [280]. More general work on line

search methods for bound constrained problems can be found in [47], [194], and [42].

Very general theories have been developed for convergence of trust region methods for

bound constrained problems. The notion of Cauchy decrease can, for example, be replaced by

the decrease from a gradient projection step for the quadratic model [191], [259], [66]. One

could look for minima of the quadratic model along the projection path [63], [64], or attempt to

project the solution of an unconstrained model using the reduced Hessian [162].

A completely different approach can be based on interior point methods. This is an active research area and the algorithms are not, at least at this moment, easy to implement or analyze. This line of research began with [57] and [58]. We refer the reader to [86] and [266] for more recent accounts of this aspect of the field and to [140] and [79] for some applications to control problems and an account of the difficulties in infinite dimensions.


[Figure 5.1: Solution to Constrained Parameter ID Problem. Four panels plot function value and projected gradient norm against iterations for the gradient projection algorithm (roughly 2000 iterations) and the projected BFGS algorithm (roughly 40 iterations).]

[Figure 5.2: Solution to Discrete Control Problem: First Example. Plot of the computed bound constrained control, with values between roughly 0.5 and 1.4 on a grid of 2000 points.]