Figure 7.3: Third Weber Example [plots of function value and simplex gradient norm against the number of function evaluations]

differencing. This, of course, is consistent with the theory, which does not claim that implicit
filtering is a global optimizer.

7.6.2 Parameter ID
We consider the parameter ID example from §1.6.2 using the data from §2.6.1. Recall that in this
example we use as data the values of the exact solution for c = k = 1 at the points t_i = i/100
for 1 ≤ i ≤ 100. The initial iterate was (5, 5)^T; the sequence of scales was {2^{-k}}_{k=1}^{12}. Implicit
filtering, like the globally convergent algorithms in the first part of the book, is fairly insensitive
to the choice of initial iterate, as we will see when we revisit this example in §8.5.2.
We report on both low (rtol = atol = 10^{-3}, Figure 7.4) and high (rtol = atol = 10^{-6},
Figure 7.5) accuracy computations. Note that after 200 function evaluations the function
reduction from the central difference BFGS form of implicit filtering flattens out in both plots at
roughly the expected level of O(tol) while the other methods have not. This effect, which is
not uncommon, is one reason for our preference for the BFGS central difference form of the
algorithm.
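
To make the role of the integration tolerances concrete, the following MATLAB sketch builds a least-squares misfit of the kind used in this example. It is not the book's code: the damped oscillator model u'' + cu' + ku = 0 and the sample points come from §1.6.2 and §2.6.1, but the initial data (u(0) = 10, u'(0) = 0), the use of ode15s, the replacement of the exact data by a tight-tolerance solve, and the names pidobj and harmonic are assumptions made for this illustration.

function err = pidobj(x, tol)
% Least-squares parameter ID misfit; x = [c; k], tol = rtol = atol.
c = x(1); k = x(2);
t = (1:100)'/100;                       % sample points t_i = i/100
u0 = [10; 0];                           % assumed initial data u(0), u'(0)
data = harmonic(1, 1, t, u0, 1.0e-12);  % stand-in for the exact solution, c = k = 1
u = harmonic(c, k, t, u0, tol);         % model output at the working tolerance
err = sum((u - data).^2);
end

function u = harmonic(c, k, t, u0, tol)
% First component of the solution of u'' + c*u' + k*u = 0 at the points t.
opts = odeset('RelTol', tol, 'AbsTol', tol);
rhs = @(s, y) [y(2); -c*y(2) - k*y(1)];
[~, Y] = ode15s(rhs, [0; t], u0, opts);
u = Y(2:end, 1);                        % drop the value at s = 0
end

Calling pidobj([5; 5], 1.0e-3) and pidobj([5; 5], 1.0e-6) corresponds to the low and high accuracy settings of Figures 7.4 and 7.5; since the tolerance enters the objective through the integrator, the attainable reduction in the function value is limited to roughly O(tol), which is the flattening visible in the plots.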

7.6.3 Convex Quadratics
The performance of the central difference BFGS form of implicit filtering should be very good,
since (see exercises 7.7.1 and 7.7.2) the difference approximation of the gradient is exact. We
would expect this good performance to persist in the perturbed case. We illustrate this with results
on two problems, both given by (6.12). One is an unperturbed problem (a_j = b_j = c_j = 0
for all j) where H is a diagonal matrix with (H)_{ii} = 1/(2i) for 1 ≤ i ≤ N. The other is a
perturbed problem with

    ξ_0 = (sin(1), sin(2), . . . , sin(N))^T, ξ_1 = 0, ξ_2 = (1, . . . , 1)^T,

    a_1 = a_2 = .01, a_3 = 0, b_1 = (1, . . . , 1)^T, b_2 = 0, and c_1 = c_2 = 10π.





Figure 7.4: Parameter ID, tol = 10^{-3} [central and forward difference results: function value and simplex gradient norm against the number of function evaluations]




Figure 7.5: Parameter ID, tol = 10^{-6} [central and forward difference results: function value and simplex gradient norm against the number of function evaluations]







Figure 7.6: Unperturbed Quadratic, N = 4 [central and forward difference results: function value and simplex gradient norm against the number of function evaluations]




Figure 7.7: Unperturbed Quadratic, N = 32 [central and forward difference results: function value and simplex gradient norm against the number of function evaluations]







Figure 7.8: Perturbed Quadratic, N = 4 [central and forward difference results: function value and simplex gradient norm against the number of function evaluations]




Figure 7.9: Perturbed Quadratic, N = 32 [central and forward difference results: function value and simplex gradient norm against the number of function evaluations]





If a_3 ≠ 0 then f may not return the same value when called with the same argument twice.
The reader is invited to explore the consequences of this in exercise 7.7.3.
The performance of the algorithms in this part of the book sometimes depends on the size of
the problem much more strongly than that of the Newton-based methods in Part I. In the case of implicit
filtering, that dependence is mostly a result of the cost of evaluating the simplex gradient. To
illustrate this we consider our quadratic problems for N = 4 (Figures 7.6 and 7.8) and N = 32
(Figures 7.7 and 7.9).
For all the quadratic examples the initial iterate was

    x0 = (1, 2, . . . , N)^T / (10N)

and the sequence of scales was {2^{-k}}_{k=0}^{10}.
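
For reference, the data specified here can be set up in a few MATLAB lines; the objective itself is given by (6.12), which is not reproduced in this section, so only the quantities defined above are constructed.

N = 32;                      % or N = 4 for the smaller examples
H = diag(1./(2*(1:N)));      % unperturbed Hessian data, (H)_ii = 1/(2i)
x0 = (1:N)'/(10*N);          % initial iterate (1, 2, ..., N)^T/(10N)
scales = 2.^(-(0:10));       % sequence of scales 2^(-k), k = 0, ..., 10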



7.7 Exercises on Implicit Filtering
7.7.1. Let S be a nonsingular simplex. Show that D_C(f : S) = ∇f(x_1) if f is a quadratic
   function.

7.7.2. How would you expect forward and centered difference implicit filtering to perform when
   applied to f(x) = x^T x? Would the performance be independent of dimension? Test your
   expectation with numerical experimentation.

7.7.3. Use implicit filtering to minimize the perturbed quadratic function with nonzero values of
   a_3.

7.7.4. Try to solve the Lennard–Jones problem with implicit filtering for various values of M
   and various initial iterates. Compare your best results with those in [142], [40], and [210].
   Are you doing any better than you did in exercise 6.4.3?

7.7.5. Show that Theorems 5.5.4 and 5.5.5 hold if the projected BFGS update is implemented
   using (7.9) and (7.10). How would these formulas affect an implementation like bfgsrecb,
   which is designed for problems in which full matrices cannot be stored?








Chapter 8

Direct Search Algorithms

In this chapter we discuss the class of direct search algorithms. These methods use values of
f taken from a set of sample points and use that information to continue the sampling. Unlike
implicit filtering, these methods do not explicitly use approximate gradient information. We will
focus on three such methods: the Nelder–Mead simplex algorithm [204], the multidirectional
search method [85], [261], [262], and the Hooke–Jeeves algorithm [145]. Each of these can be
analyzed using the simplex gradient techniques from Chapter 6. We will not discuss the very
general results based on the taxonomies of direct search methods from [263], [174], and [179] or
the recent research on the application of these methods to bound [173] or linear [175] constraints.
We include at the end of this chapter a short discussion of methods based on surrogate models
and a brief account of a very different search method, the DIRECT algorithm [150]. These two
final topics do not lead to algorithms that are easy to implement, and our discussions will be
very general with pointers to the literature.


8.1 The Nelder–Mead Algorithm
8.1.1 Description and Implementation
The Nelder–Mead [204] simplex algorithm maintains a simplex S of approximations to an
optimal point. In this algorithm the vertices {x_j}_{j=1}^{N+1} are sorted according to the objective
function values

(8.1)    f(x_1) ≤ f(x_2) ≤ · · · ≤ f(x_{N+1}).

x_1 is called the best vertex and x_{N+1} the worst. If several vertices have the same objective
value as x_1, the best vertex is not uniquely defined, but this ambiguity has little effect on the
performance of the algorithm.
The algorithm attempts to replace the worst vertex x_{N+1} with a new point of the form

(8.2)    x(µ) = (1 + µ)x̄ − µx_{N+1},

where x̄ is the centroid of the convex hull of {x_i}_{i=1}^{N},

(8.3)    x̄ = (1/N) Σ_{i=1}^{N} x_i.

The value of µ is selected from a sequence

    −1 < µ_ic < 0 < µ_oc < µ_r < µ_e


by rules that we formally describe in Algorithm nelder. Our formulation of the algorithm
allows for termination if either f(x_{N+1}) − f(x_1) is sufficiently small or a user-specified number
of function evaluations has been expended.

Algorithm 8.1.1. nelder(S, f, τ, kmax)

1. Evaluate f at the vertices of S and sort the vertices of S so that (8.1) holds.

2. Set fcount = N + 1.

3. While f(x_{N+1}) − f(x_1) > τ

   (a) Compute x̄, (8.3), x(µ_r), (8.2), and f_r = f(x(µ_r)). fcount = fcount + 1.
   (b) Reflect: If fcount = kmax then exit. If f(x_1) ≤ f_r < f(x_N), replace x_{N+1} with
       x(µ_r) and go to step 3g.
   (c) Expand: If fcount = kmax then exit. If f_r < f(x_1) then compute f_e = f(x(µ_e)).
       fcount = fcount + 1. If f_e < f_r, replace x_{N+1} with x(µ_e); otherwise replace
       x_{N+1} with x(µ_r). Go to step 3g.
   (d) Outside Contraction: If fcount = kmax then exit. If f(x_N) ≤ f_r < f(x_{N+1}),
       compute f_c = f(x(µ_oc)). fcount = fcount + 1. If f_c ≤ f_r replace x_{N+1} with
       x(µ_oc) and go to step 3g; otherwise go to step 3f.
   (e) Inside Contraction: If fcount = kmax then exit. If f_r ≥ f(x_{N+1}) compute
       f_c = f(x(µ_ic)). fcount = fcount + 1. If f_c < f(x_{N+1}), replace x_{N+1} with
       x(µ_ic) and go to step 3g; otherwise go to step 3f.
   (f) Shrink: If fcount ≥ kmax − N, exit. For 2 ≤ i ≤ N + 1: set x_i = x_1 − (x_i − x_1)/2;
       compute f(x_i).
   (g) Sort: Sort the vertices of S so that (8.1) holds.

A typical sequence [169] of candidate values for µ is

    {µ_r, µ_e, µ_oc, µ_ic} = {1, 2, 1/2, −1/2}.

Figure 8.1 is an illustration of the options in two dimensions. The vertices labeled 1, 2, and
3 are those of the original simplex.
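
The logic of step 3 of Algorithm nelder can be condensed into a short MATLAB sketch. This is only an illustration of the branches above, not the book's implementation; the name nmstep, the columnwise storage of the simplex, and the omission of the fcount bookkeeping and the termination tests are our simplifications.

function [V, fv] = nmstep(V, fv, f)
% One pass through step 3 of Algorithm nelder.  V is N x (N+1) with the
% vertices as columns; fv is a row vector of the corresponding values.
% Both are assumed already sorted so that (8.1) holds.
mu_r = 1; mu_e = 2; mu_oc = 0.5; mu_ic = -0.5;  % typical values [169]
N = size(V, 1);
xbar = mean(V(:, 1:N), 2);                      % centroid (8.3)
newpt = @(mu) (1 + mu)*xbar - mu*V(:, N+1);     % trial point (8.2)
xr = newpt(mu_r); fr = f(xr);
if fv(1) <= fr && fr < fv(N)                    % reflect
    V(:, N+1) = xr; fv(N+1) = fr;
elseif fr < fv(1)                               % expand
    xe = newpt(mu_e); fe = f(xe);
    if fe < fr, V(:, N+1) = xe; fv(N+1) = fe;
    else, V(:, N+1) = xr; fv(N+1) = fr; end
elseif fr < fv(N+1)                             % outside contraction
    xc = newpt(mu_oc); fc = f(xc);
    if fc <= fr, V(:, N+1) = xc; fv(N+1) = fc;
    else, [V, fv] = shrink(V, fv, f); end
else                                            % inside contraction
    xc = newpt(mu_ic); fc = f(xc);
    if fc < fv(N+1), V(:, N+1) = xc; fv(N+1) = fc;
    else, [V, fv] = shrink(V, fv, f); end
end
[fv, idx] = sort(fv); V = V(:, idx);            % restore the order (8.1)
end

function [V, fv] = shrink(V, fv, f)
% Shrink step 3f, with the sign convention used in the text.
for i = 2:size(V, 2)
    V(:, i) = V(:, 1) - (V(:, i) - V(:, 1))/2;
    fv(i) = f(V(:, i));
end
end

A driver would call [V, fv] = nmstep(V, fv, f) until fv(end) - fv(1) falls below τ or the budget of function evaluations is spent, as in the while loop of Algorithm nelder.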
The Nelder–Mead algorithm is not guaranteed to converge, even for smooth problems [89],
[188]. The failure mode is stagnation at a nonoptimal point. In §8.1.3 we will present some
examples from [188] that illustrate this failure. However, the performance of the Nelder–Mead
algorithm in practice is generally good [169], [274]. The shrink step is rare in practice and we
will assume in the analysis in §8.1.2 that shrinks do not occur. In that case, while a Nelder–Mead
iterate may not result in a reduction in the best function value, the average value

    f̄ = (1/(N + 1)) Σ_{j=1}^{N+1} f(x_j)

will be reduced.



Figure 8.1: Nelder–Mead Simplex and New Points [diagram: original vertices 1, 2, 3 with the reflection (r), expansion (e), outside contraction (oc), and inside contraction (ic) points]


8.1.2 Sufficient Decrease and the Simplex Gradient
Our study of the Nelder–Mead algorithm is based on the simple ideas in §3.1. We will denote
the vertices of the simplex S^k at the kth iteration by {x_j^k}_{j=1}^{N+1}. We will simplify notation by
suppressing explicit mention of S^k in what follows by denoting

    V^k = V(S^k), δ^k = δ(f : S^k), K^k = K(S^k), and D^k(f) = D(f : S^k).

If V^0 is nonsingular then V^k is nonsingular for all k > 0 [169]. Hence if S^0 is nonsingular so
is S^k for all k and hence D^k(f) is defined for all k.
We formalize this by assuming that our sequence of simplices satisfies the following assumption.
Assumption 8.1.1. For all k,
• S^k is nonsingular.
• The vertices satisfy (8.1).
• f̄^{k+1} < f̄^k, where f̄^k is the average of the values of f over the vertices of S^k.

Assumption 8.1.1 is satisfied by the Nelder–Mead sequence if no shrink steps are taken
and the initial simplex directions are linearly independent [169]. The Nelder–Mead algorithm
demands that the average function value improve, but no control is possible on which value is
improved, and the simplex condition number can become unbounded.



We can define a sufficient decrease condition for search algorithms that is analogous to the
sufficient decrease condition for steepest descent and related algorithms (3.2). We will ask that
the k + 1st iteration satisfy

(8.4)    f̄^{k+1} − f̄^k < −α ||D^k f||^2.

Here α > 0 is a small parameter. Our choice of sufficient decrease condition is motivated by the
smooth case and steepest descent, where (3.2) and the lower bound λ̄ on λ from Lemma 3.2.3
lead to

    f(x^{k+1}) − f(x^k) ≤ −λ̄ α ||∇f(x^k)||^2,

which is a smooth form of (8.4). Unlike the smooth case, however, we have no descent direction
and must incorporate λ̄ into α. This leads to the possibility that if the simplex diameter is much
smaller than ||D^k f||, (8.4) could fail on the first iterate. We address this problem with the scaling

    α = α_0 σ_+(S^0) / ||D^0 f||.

A typical choice in line search methods, which we use in our numerical results, is α_0 = 10^{-4}.
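
As a small illustration, the test (8.4) with this scaling can be coded in a few lines. The function and argument names are ours, and the caller is assumed to track the averaged vertex values f̄^k and the simplex gradients.

function ok = sufficient_decrease(fbar_new, fbar_old, Dkf, sigma_plus_S0, D0f)
% fbar_new, fbar_old : averaged vertex values at iterations k+1 and k
% Dkf, D0f           : simplex gradients at iterations k and 0
% sigma_plus_S0      : largest oriented length of the initial simplex
alpha0 = 1.0e-4;                          % typical line search value from the text
alpha  = alpha0*sigma_plus_S0/norm(D0f);  % scaling fixed at the first simplex
ok = (fbar_new - fbar_old) < -alpha*norm(Dkf)^2;
end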
The convergence result for smooth functions follows easily from Lemma 6.2.1.
Theorem 8.1.1. Let a sequence of simplices satisfy Assumption 8.1.1 and let the assumptions
of Lemma 6.2.1 hold, with the Lipschitz constants K^k uniformly bounded. Assume that {f̄^k} is
bounded from below. Then if (8.4) holds for all but finitely many k and

    lim_{k→∞} σ_+(S^k)κ(V^k) = 0,

then any accumulation point of the simplices is a critical point of f.
Proof. The boundedness from below of {f̄^k} and (8.4) imply that f̄^{k+1} − f̄^k → 0. Assumption
8.1.1 and (8.4) imply that lim_{k→∞} D^k f = 0. Hence (6.2) implies

    lim_{k→∞} ||∇f(x_1^k)|| ≤ lim_{k→∞} ( Kκ(V^k)σ_+(S^k) + ||D^k f|| ) = 0.

Hence, if x* is any accumulation point of the sequence {x_1^k} then ∇f(x*) = 0. This completes
the proof since κ(V^k) ≥ 1 and therefore σ_+(S^k) → 0.
The result for noisy functions that satisfy (6.1) with f_s smooth reflects the fact that
the resolution is limited by the size of φ. In fact, if σ_+(S^k) is much smaller than ||φ||_{S^k}, no
information on f_s can be obtained by evaluating f at the vertices of S^k, and once σ_+(S^k) is
smaller than ||φ||_{S^k}^{1/2} no conclusions on ∇f_s can be drawn. If, however, the noise decays to zero
sufficiently rapidly near the optimal point, the conclusions of Theorem 8.1.1 still hold.
Theorem 8.1.2. Let a sequence of simplices satisfy Assumption 8.1.1 and let the assumptions
of Lemma 6.2.2 hold with the Lipschitz constants K_s^k uniformly bounded. Assume that {f̄^k} is
bounded from below. Then if (8.4) holds for all but finitely many k and if

(8.5)    lim_{k→∞} κ(V^k) ( σ_+(S^k) + ||φ||_{S^k} / σ_+(S^k) ) = 0,

then any accumulation point of the simplices is a critical point of f_s.
Proof. Our assumptions, as in the proof of Theorem 8.1.1, imply that D^k f → 0. Recall that
Lemma 6.2.2 implies that

(8.6)    ||D^k f_s|| ≤ ||D^k f|| + K^k κ(V^k) ( σ_+(S^k) + ||φ||_{S^k} / σ_+(S^k) ),

and the sequence {K^k} is bounded because {K_s^k} is. Hence, by (8.5), D^k f_s → 0 as k → ∞.





Figure 8.2: Unmodified Nelder–Mead, (τ, θ, φ) = (1, 15, 10) [plots against the iteration index of the function differences, maximum oriented length, simplex gradient norm, and simplex condition number]


8.1.3 McKinnon's Examples
In this set of three examples from [188], N = 2 and

    f(x) = θφ|(x)_1|^τ + (x)_2 + (x)_2^2   if (x)_1 ≤ 0,
    f(x) = θ(x)_1^τ + (x)_2 + (x)_2^2      if (x)_1 > 0.

The examples in [188] consider the parameter sets

    (τ, θ, φ) = (3, 6, 400), (2, 6, 60), or (1, 15, 10).

The initial simplex was

    x_1 = (1, 1)^T, x_2 = (λ_+, λ_−)^T, x_3 = (0, 0)^T, where λ_± = (1 ± √33)/8.

With this data, the Nelder–Mead iteration will stagnate at the origin, which is not a critical point
for f. The stagnation mode is repeated inside contractions that leave the best point (which is not
a minimizer) unchanged.
We terminated the iteration when the difference between the best and worst function values
was < 10^{-8}.
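
A minimal MATLAB sketch of this objective follows; the function name mckinnon and the argument order are our choices for illustration.

function fx = mckinnon(x, tau, theta, phi)
% Objective from [188].  The function is continuous everywhere and, for
% tau = 1, is not differentiable along the line x(1) = 0.
if x(1) <= 0
    fx = theta*phi*abs(x(1))^tau + x(2) + x(2)^2;
else
    fx = theta*x(1)^tau + x(2) + x(2)^2;
end
end

With, for example, tau = 1, theta = 15, phi = 10, the initial simplex of the text has columns (1, 1)^T, (λ_+, λ_−)^T, and (0, 0)^T, e.g. V0 = [1, (1+sqrt(33))/8, 0; 1, (1-sqrt(33))/8, 0].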
We illustrate the behavior of the Nelder–Mead algorithm in Figures 8.2, 8.3, and 8.4. In all
the figures we plot, as functions of the iteration index, the difference between the best and worst
function values, σ_+, the maximum oriented length, the norm of the simplex gradient, and the l^2
condition number of the matrix of simplex directions. In all three problems stagnation is evident
from the behavior of the simplex gradients. Note also how the simplex condition number is
growing rapidly.






Figure 8.3: Unmodified Nelder–Mead, (τ, θ, φ) = (2, 6, 60) [plots against the iteration index of the function differences, maximum oriented length, simplex gradient norm, and simplex condition number]




Figure 8.4: Unmodified Nelder–Mead, (τ, θ, φ) = (3, 6, 400) [plots against the iteration index of the function differences, simplex diameter, simplex gradient norm, and simplex condition number]





8.1.4 Restarting the Nelder–Mead Algorithm
When the Nelder–Mead iteration stagnates, a restart with the same best point and a different set
of directions can sometimes help. In order to formulate a restart scheme, one must first develop
a strategy for detecting stagnation. One might think that a large simplex condition number would
suffice for this. However [204], the ability of the Nelder–Mead simplices to drastically vary their
shape is an important feature of the algorithm, and looking at the simplex condition alone would
lead to poor results. Failure of (8.4), however, seems to indicate that something is wrong, and we
will use that as our stagnation detector.
Having detected stagnation, one must modify the simplex. Simply performing a shrink
step is not effective. The method we advocate here, from [155], is the oriented restart. The
motivation is that if the simplex gradient can be trusted to lie in the correct orthant in R^N, a new,
smaller simplex with orthogonal edges oriented with that orthant should direct the iteration in
a productive direction.
We propose performing an oriented restart when (8.4) fails but f̄^{k+1} − f̄^k < 0. This means
replacing the current simplex with vertices {x_j}_{j=1}^{N+1}, ordered so that (8.1) holds, with a new
smaller simplex having vertices (before ordering!) {y_j}_{j=1}^{N+1} with y_1 = x_1 and

(8.7)    y_j = y_1 − β_{j−1} e_{j−1} for 2 ≤ j ≤ N + 1,

where, for 1 ≤ l ≤ N, e_l is the lth coordinate vector,

    β_l = (1/2) σ_−(S^k) sign((D^k f)_l)   if (D^k f)_l ≠ 0,
    β_l = (1/2) σ_−(S^k)                   if (D^k f)_l = 0,

and (D^k f)_l is the lth component of D^k f. If D^k f = 0 we assume that the Nelder–Mead iteration
would have been terminated at iteration k because there is no difference between the best and worst
values.
So, before ordering, the new simplex has the same first point as the old. The diameter of the
new simplex has not been increased, since the diameter of the new simplex is at most σ_+(S^k).
Moreover all edge lengths have been reduced, so after reordering σ_+(S^{k+1}) ≤ σ_−(S^k). As for
κ, after the oriented shrink, but before reordering, κ(V) = 1. After reordering, of course, the
best point may no longer be x_1. In any case the worst-case bound on κ is

(8.8)    κ(V^{k+1}) ≤ (1 + √N)^2.

In any case, the new simplex is well conditioned.
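
A minimal MATLAB sketch of (8.7) is given below; the function name and calling sequence are ours, with Dkf standing for the simplex gradient D^k f and sigma_minus for the shortest oriented length σ_−(S^k) of the stagnated simplex.

function V = oriented_restart(x1, Dkf, sigma_minus)
% Build the new (unordered) simplex of (8.7): column j holds vertex y_j,
% with y_1 = x_1 and y_{l+1} = y_1 - beta_l * e_l for l = 1, ..., N.
N = numel(x1);
V = repmat(x1(:), 1, N + 1);
for l = 1:N
    if Dkf(l) ~= 0
        beta = 0.5*sigma_minus*sign(Dkf(l));
    else
        beta = 0.5*sigma_minus;
    end
    V(l, l + 1) = V(l, l + 1) - beta;
end
end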
Returning to the McKinnon examples, we find that an oriented restart did remedy stagnation
for the smooth examples. The graphs in Figures 8.5, 8.6, and 8.7 report the same data as for the
unmodified algorithm, with stars on the plots denoting oriented restarts.
For the smoothest example, (τ, θ, φ) = (3, 6, 400), the modified form of Nelder–Mead took a
single oriented restart at the 21st iteration. For the less smooth of these two, (τ, θ, φ) = (2, 6, 60),
a single restart was taken on the 19th iteration. As one can see from Figures 8.6 and 8.7 the
restart had an immediate effect on the simplex gradient norm and overcame the stagnation.
For the nonsmooth example, (τ, θ, φ) = (1, 15, 10), in Figure 8.5, the modified algorithm
terminated with failure after restarting on the 44th, 45th, and 46th iterations. Since the objective
is not smooth at the stagnation point, this is the best we can expect and is far better than the
behavior of the unmodified algorithm, which stagnates with no warning of the failure.







Figure 8.5: Modified Nelder–Mead, (τ, θ, φ) = (1, 15, 10) [plots against the iteration index of the function differences, maximum oriented length, simplex gradient norm, and simplex condition number]




Figure 8.6: Modified Nelder–Mead, (τ, θ, φ) = (2, 6, 60) [plots against the iteration index of the function differences, maximum oriented length, simplex gradient norm, and simplex condition number]





Figure 8.7: Modified Nelder–Mead, (τ, θ, φ) = (3, 6, 400) [plots against the iteration index of the function differences, maximum oriented length, simplex gradient norm, and simplex condition number]



8.2 Multidirectional Search
8.2.1 Description and Implementation
One way to address the possible ill-conditioning in the Nelder–Mead algorithm is to require that
the condition numbers of the simplices be bounded. The multidirectional search (MDS) method
[85], [261], [262] does this by making each new simplex congruent to the previous one. The
results in this section, mostly taken from [29], show that MDS has convergence properties like
those of implicit filtering.
In the special case of equilateral simplices, V^k is a constant multiple of V^0 and the simplex
condition number is constant. If the simplices are not equilateral, then κ(V) may vary depending
on which vertex is called x_1, but (6.6) will hold in any case.
Figure 8.8 illustrates the two-dimensional case for two types of simplices. Beginning with
the ordered simplex S^c with vertices x_1, x_2, x_3, one first attempts a reflection step, leading to a
simplex S^r with vertices x_1, r_2, r_3.
If the best function value of the vertices of S^r is better than the best value f(x_1) in S^c, S^r is
(provisionally) accepted and expansion is attempted. The expansion step differs from that in
the Nelder–Mead algorithm because N new points are needed to make the new, larger simplex
similar to the old one. The expansion simplex S^e has vertices x_1, e_2, e_3 and is accepted over S^r
if the best function value of the vertices of S^e is better than the best in S^r. If the best function
value of the vertices of S^r is not better than the best in S^c, then the simplex is contracted and
the new simplex has vertices x_1, c_2, c_3. After the new simplex is identified, the vertices are
reordered to create the new ordered simplex S^+.
Similar to the Nelder–Mead algorithm, there are expansion and contraction parameters µ_e
and µ_c. Typical values for these are 2 and 1/2.
Figure 8.8: MDS Simplices and New Points [diagrams of an equilateral and a right simplex with the reflected (r_2, r_3), expanded (e_2, e_3), and contracted (c_2, c_3) vertices]

Algorithm 8.2.1. mds(S, f, τ, kmax)


1. Evaluate f at the vertices of S and sort the vertices of S so that (8.1) holds.

2. Set fcount = N + 1.

3. While f(x_{N+1}) − f(x_1) > τ

   (a) Reflect: If fcount = kmax then exit.
       For j = 2, . . . , N + 1: r_j = x_1 − (x_j − x_1); compute f(r_j); fcount = fcount + 1.
       If f(x_1) > min_j {f(r_j)} then go to step 3b, else go to step 3c.
   (b) Expand:
       i. For j = 2, . . . , N + 1: e_j = x_1 − µ_e(x_j − x_1); compute f(e_j); fcount = fcount + 1.
       ii. If min_j {f(r_j)} > min_j {f(e_j)} then
           for j = 2, . . . , N + 1: x_j = e_j
           else
           for j = 2, . . . , N + 1: x_j = r_j
       iii. Go to step 3d.
   (c) Contract: For j = 2, . . . , N + 1: x_j = x_1 + µ_c(x_j − x_1); compute f(x_j).
   (d) Sort: Sort the vertices of S so that (8.1) holds.

If the function values at the vertices of S^c are known, then the cost of computing S^+ is 2N
additional evaluations. Just as with the Nelder–Mead algorithm, the expansion step is optional
but has been observed to improve performance.
The extension of MDS to bound constrained and linearly constrained problems is not trivial.
We refer the reader to [173] and [175] for details.
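
The body of step 3 of Algorithm mds can be sketched in MATLAB as follows. This is not the book's implementation; the name mdsstep, the columnwise storage, and the omission of the fcount bookkeeping, the termination test, and any caching of recently sampled points are simplifications for illustration (implicit expansion, MATLAB R2016b or later, is assumed).

function [V, fv] = mdsstep(V, fv, f)
% One reflection/expansion/contraction cycle.  V is N x (N+1) with the
% vertices as columns; fv is a row vector of values; both are sorted so
% that (8.1) holds.
mu_e = 2; mu_c = 0.5;                    % typical expansion/contraction values
M = size(V, 2);                          % M = N + 1 vertices
x1 = V(:, 1);
R = x1 - (V(:, 2:M) - x1);               % reflected vertices r_j
fr = zeros(1, M - 1);
for j = 1:M-1, fr(j) = f(R(:, j)); end
if fv(1) > min(fr)                       % reflection found a better point
    E = x1 - mu_e*(V(:, 2:M) - x1);      % expanded vertices e_j
    fe = zeros(1, M - 1);
    for j = 1:M-1, fe(j) = f(E(:, j)); end
    if min(fr) > min(fe)                 % accept the expansion simplex
        V(:, 2:M) = E; fv(2:M) = fe;
    else                                 % keep the reflection simplex
        V(:, 2:M) = R; fv(2:M) = fr;
    end
else                                     % contract toward the best vertex
    V(:, 2:M) = x1 + mu_c*(V(:, 2:M) - x1);
    for j = 2:M, fv(j) = f(V(:, j)); end
end
[fv, idx] = sort(fv); V = V(:, idx);     % restore the order (8.1)
end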

8.2.2 Convergence and the Simplex Gradient
Assume that the simplices are either equilateral or right simplices (having one vertex from which
all N edges are at right angles). In those cases, as pointed out in [262], the possible vertices
created by expansion and reflection steps form a regular lattice of points. If the MDS simplices
remain bounded, only finitely many reflections and expansions are possible before every point
on that lattice has been visited and a contraction to a new maximal simplex size must take place.


This exhaustion of a lattice takes place under more general conditions [262] but is most clear for
the equilateral and right simplex cases.
Theorem 6.2.9 implies that infinitely many contractions and convergence of the simplex
diameters to zero imply convergence of the simplex gradient to zero. The similarity of Theorem
6.2.9 to Lemma 6.2.2, and of Theorem 8.2.1, the convergence result for multidirectional
search, to Theorem 8.1.2, is no accident. The Nelder–Mead iteration, which is more aggressive
than the multidirectional search iteration, requires far stronger assumptions (well conditioning
and sufficient decrease) for convergence, but the ideas are the same. Theorems 6.2.9 and 8.2.1
can be used to extend the results in [262] to the noisy case. The observation in [85] that one
can apply any heuristic or machine-dependent idea to improve performance, say, by exploring
far away points on spare processors (the “speculative function evaluations” of [46]) without
affecting the analysis is still valid here.
Theorem 8.2.1. Let f satisfy (6.1) and assume that the set

    {x | f(x) ≤ f(x_1^0)}

is bounded. Assume that the simplex shape is such that

(8.9)    lim_{k→∞} σ_+(S^k) = 0.

Let B^k be a ball of radius 2σ_+(S^k) about x_1^k. Then if

    lim_{k→∞} ||φ||_{B^k} / σ_+(S^k) = 0,

then every limit point of the vertices is a critical point of f_s.
Recall that if the simplices are equilateral or right simplices, then (8.9) holds (see exercise 8.6.2).


8.3 The Hooke–Jeeves Algorithm
8.3.1 Description and Implementation
The Hooke–Jeeves algorithm is like implicit filtering in that the objective is evaluated on a stencil
and the function values are used to compute a search direction. However, unlike implicit filtering,
there are only finitely many possible search directions and only qualitative information about
the function values is used.
The algorithm begins with a base point x and pattern size h, which is like the scale in implicit
filtering. In the next phase of the algorithm, called the exploratory move in [145], the function is
sampled at successive perturbations of the base point in the search directions {v_j}, where v_j is
the jth column of a direction matrix V. In [145] and our MATLAB implementation V = I. The
current best value fcb = f(xcb) and best point xcb are recorded and returned. xcb is initialized
to x. The sampling is managed by first evaluating f at xcb + v_j and only testing xcb − v_j
if f(xcb + v_j) ≥ f(xcb). The exploratory phase will either produce a new base point or fail
(meaning that xcb = x). Note that this phase depends on the ordering of the coordinates of x.
Applying a permutation to x could change the output of the exploration.
If the exploratory phase has succeeded, the search direction is

(8.10)    dHJ = xcb − x

and the new base point is xcb. The subtle part of the algorithm begins here. Rather than center
the next exploration at xcb, which would use some of the same points that were examined in


the previous exploration, the Hooke–Jeeves pattern move step is aggressive and tries to move
further. The algorithm centers the next exploratory move at

    xC = x + 2dHJ = xcb + dHJ.

If this second exploratory move fails to improve upon f(xcb), then an exploratory move with
xcb as the center is tried. If that fails, h is reduced, x is set to xcb, and the process is started over.
Note that when h has just been set, the base point and the center of the stencil for the exploratory
moves are the same, but afterward they are not.
If, after the first exploratory move, xcb = x (i.e., as it will be if x is the best point in the
pattern), then x is left unchanged and h is reduced.
Therefore, whenever h is reduced, the stencil centered at x has x itself as the best point.
This is exactly the situation that led to a shrink in the MDS algorithm and, as you might expect,
will enable us to prove a convergence result like those in the previous sections. In [145] h was
simply multiplied by a constant factor. Our description in Algorithm hooke follows the model
of implicit filtering and uses a sequence of scales. The choice of perturbation directions could be
generalized to any simplex shape, not just the right simplices used in [145].
Figure 8.9 illustrates the idea for N = 2. The base point x lies at the center of the stencil. If

    f(x_1^+) < f(x), f(x_2^+) < f(x), f(x_1^−) ≥ f(x), and f(x_2^−) ≥ f(x),

then the new base point xb will be located above and to the right of x. The next exploratory
move will be centered at xC, which is the center of the stencil in the upper right corner of the
figure.
The reader, especially one who plans to implement this method, must be mindful that points
may be sampled more than once. For example, in the figure, if the exploratory move centered
at xC fails, f will be evaluated for the second time at the four points in the stencil centered
at xb unless the algorithm is implemented to avoid this. The MDS method is also at risk of
sampling points more than once. The implementations of Hooke–Jeeves and MDS in our suite
of MATLAB codes keep the most recent 4N iterations in memory to guard against this. This
reevaluation is much less likely for the Nelder–Mead and implicit filtering methods. One should
also be aware that the Hooke–Jeeves algorithm, like Nelder–Mead, does not have the natural
parallelism that implicit filtering and MDS do.
One could implement a variant of the Hooke–Jeeves iteration by using xC = x + dHJ
instead of xC = x + 2dHJ and shrinking the size of the simplex on stencil failure. This is the
discrete form of the classical coordinate descent algorithm [180] and can also be analyzed by
the methods of this section (see [279] for a different view).
Our implementation follows the model of implicit filtering as well as the description in
[145]. We begin with the exploratory phase, which uses a base point xb, base function value
fb = f(xb), and stencil center xC. Note that in the algorithm xb = xC for the first exploration
and xC = xb + dHJ thereafter. Algorithm hjexplore takes a base point and a scale and returns
a direction and the value at the trial point x + d. We let V = I be the matrix of coordinate
directions, but any nonsingular matrix of search directions could be used. The status flag sf is
used to signal failure and trigger a shrink step.

Algorithm 8.3.1. hjexplore(xb, xC, f, h, sf)

1. fb = f(xb); d = 0; sf = 0; xcb = xb; fcb = f(xb); xt = xC

2. for j = 1, . . . , N: p = xt + hv_j; if f(p) ≥ fb then p = xt − hv_j;
   if f(p) < fb then xt = xcb = p; fb = f(xcb)

3. if xcb ≠ xb; sf = 1; xb = xcb
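
A compact MATLAB sketch of the exploratory move is given below. It is not the book's hooke.m; the calling sequence, in which the direction matrix V is passed in and the updated base point, its value, and the flag sf are returned, is an assumption made for this illustration.

function [xb, fb, sf] = hjexplore(xb, xC, f, h, V)
% One exploratory move about the stencil center xC (column vectors).
% sf = 1 if a better point than the base point xb was found, 0 otherwise.
N = numel(xb);
if nargin < 5, V = eye(N); end      % coordinate directions, as in [145]
fb = f(xb); sf = 0; xcb = xb; xt = xC;
for j = 1:N
    p = xt + h*V(:, j);
    fp = f(p);
    if fp >= fb                     % try the opposite perturbation
        p = xt - h*V(:, j); fp = f(p);
    end
    if fp < fb
        xt = p; xcb = p; fb = fp;
    end
end
if any(xcb ~= xb)                   % the exploration succeeded
    sf = 1; xb = xcb;
end
end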



Figure 8.9: Hooke–Jeeves Pattern and New Points [diagram of the stencil centered at the base point x and the next stencil centered at xC after a successful exploration]


The exploration is coupled to the pattern move to complete the algorithm for a single value
of the scale. The inputs for Algorithm hjsearch are an initial iterate x, the function, and the
scale. On output, a point x is returned for which the exploration has failed. There are other
considerations, such as the budget for function evaluations, that should trigger a return from the
exploratory phase in a good implementation. In our MATLAB code hooke.m we pay attention
to the number of function evaluations and change in the function value as part of the decision to
return from the exploratory phase.

Algorithm 8.3.2. hjsearch(x, f, h)

1. xb = x; xC = x; sf = 1

2. Call hjexplore(x, xC, f, h, sf)

3. While sf = 1

   (a) d = x − xb; xb = x; xC = x + d
   (b) Call hjexplore(x, xC, f, h, sf);
       If sf = 0; xC = x; Call hjexplore(x, xC, f, h, sf)


