- Here we deal with the problem of (locally) minimizing a function of one
real variable. These methods also play a role as parts of multidimensional minimizers,
where they determine a steplength along a proposed direction of descent.
Two of the methods were originally used only in that context, namely the
Goldstein-Armijo descent test and the computation of the Powell-Wolfe step.
We use them here with the ''directions'' +1 and -1 for iteratively finding a
stationary point, whereas the first two methods identify a point which in addition satisfies
at least the second order necessary optimality conditions f'(x*) = 0 and
f''(x*) >= 0.
- Method 1: Golden section search: This begins with four points a < b < c < d whose
function values are in the position ''high, low, low, high'', i.e.
min{f(b),f(c)} < min{f(a),f(d)}, hence enclosing at least one local minimizer,
and whose abscissas are in a special position, namely at relative distances
    0, ρ^2, ρ, 1
from left to right. With ρ = (sqrt(5)-1)/2 this makes it possible to remove one of the endpoints
(the left one if f(b) > f(c) and the right one otherwise) and to recompute only one new
interior point in order to obtain exactly the same situation as before. This is because
    ρ = 1 - ρ^2 .
The new point lies at relative position 1-ρ^3 resp. ρ^3.
Each step reduces the interval length by the factor ρ, and the step is repeated until the
interval length has been reduced sufficiently; see the sketch below.
(ρ is the so called ''golden section'' ratio, known to artists and
architects.)
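
The following minimal Python sketch (ours, not part of the package; the names
golden_section, f, a, b, eps are assumptions) illustrates the recursion; it assumes
that [a,b] already encloses a local minimizer:

    import math

    def golden_section(f, a, b, eps=1e-8):
        # rho = (sqrt(5)-1)/2; interior points sit at relative positions rho^2 and rho
        rho = (math.sqrt(5.0) - 1.0) / 2.0
        x1 = a + (1.0 - rho) * (b - a)        # relative position rho^2 = 1 - rho
        x2 = a + rho * (b - a)                # relative position rho
        f1, f2 = f(x1), f(x2)
        while b - a > eps:
            if f1 > f2:                       # remove the left endpoint
                a, x1, f1 = x1, x2, f2
                x2 = a + rho * (b - a)        # only one new function value needed
                f2 = f(x2)
            else:                             # remove the right endpoint
                b, x2, f2 = x2, x1, f1
                x1 = a + (1.0 - rho) * (b - a)
                f1 = f(x1)
        return 0.5 * (a + b)

For example, golden_section(lambda x: (x - 1.0)**2, 0.0, 3.0) returns a value near 1.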
- Method 2: Successive quadratic interpolation. We begin with three points (a,f(a)), (b,f(b)), (c,f(c))
in the position a < b < c, f(a) > f(b), f(b) < f(c).
Next we compute the quadratic polynomial p2
which interpolates these data and its unique minimizer, say d. We now intend to remove either a
or c from our data list and to add (d,f(d)) as the new point, since there exists a local minimizer
of f whose distance from d is O(|c-a|^2). This follows from the error formula for the
derivative of the interpolating polynomial,
    f'(x) - p2'(x) = (f'''(ξ)/6)( (x-a)(x-b) + (x-b)(x-c) + (x-a)(x-c) )
                     + f[a,b,c,x,x] (x-a)(x-b)(x-c) ,
where f[a,b,c,x,x] is a bounded divided difference: evaluating at x = d, where p2'(d) = 0,
gives |f'(d)| = O(|c-a|^2).
But we want to keep this replacement process stable. Hence, if one of the abscissas
a, b, c is nearer to d than l = (c-a)/100, we compute two more points,
    max{a+l, d-l} and min{d+l, c-l} ,
and then select from the existing data three neighboring points in the
same position as before. This reduces the length of the interval at least by l, hence guarantees
at least linear interval reduction. After at most three steps the interval length is reduced to
O(|c-a|^2). If these safeguard steps never occurred, the order of convergence would be 1.83.
Hence this is much faster than golden section search (provided the theory applies, i.e.
f is three times continuously differentiable); see the sketch below.
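
A minimal Python sketch of this iteration (ours; it uses a simplified safeguard that
probes into the larger subinterval instead of computing the two extra points described above):

    def quad_interp_min(f, a, b, c, tol=1e-8, maxit=100):
        # requires a < b < c with f(a) > f(b) < f(c)
        fa, fb, fc = f(a), f(b), f(c)
        for _ in range(maxit):
            if c - a <= tol:
                break
            # minimizer of the parabola through (a,fa), (b,fb), (c,fc)
            num = (b - a)**2 * (fb - fc) - (b - c)**2 * (fb - fa)
            den = (b - a) * (fb - fc) - (b - c) * (fb - fa)
            d = b - 0.5 * num / den
            l = (c - a) / 100.0
            d = min(max(d, a + l), c - l)     # keep d safely inside [a, c]
            if abs(d - b) < l:                # too close to b: probe the larger half
                d = b + l if c - b > b - a else b - l
            fd = f(d)
            # keep three neighboring points which again bracket a minimizer
            if d < b:
                if fd < fb: b, c, fb, fc = d, b, fd, fb
                else:       a, fa = d, fd
            else:
                if fd < fb: a, b, fa, fb = b, d, fb, fd
                else:       c, fc = d, fd
        return b

For a proper bracket the denominator den is strictly negative, so the parabola opens
upward and d is well defined.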
- Method 3: Goldstein-Armijo stepsize computation (also known as ''backtracking'').
Given d in {-1,+1} such that f'(x)d > 0, we compute the largest
σ in {γ, γ/2, γ/4, ...} such that
    f(x) - f(x-σd) >= δ σ f'(x)d
and replace x by x - σd. It is easy to show that there is a lower bound for σ,
namely
    σ >= min{ γ , (1-δ) |f'(x)| / sup f'' } ,
provided f is twice continuously differentiable and bounded from below with compact level sets.
Hence the method produces a sequence every accumulation point of which is a stationary point of f.
We use γ = |f'(x)| here.
In order to speed up the process we additionally check the minimizer of the parabola in the
variable σ defined by the value f(x) and the slope -f'(x)d at σ = 0 and the value f(x-γd) at σ = γ
(if this minimizer lies in [c1 γ, c2 γ]) as a candidate for the next value of σ; see the sketch below.
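
One step of this computation in Python (a sketch; the parabola-based candidate for σ
is omitted here, and all names are ours):

    def armijo_step(f, fprime, x, delta=0.25, maxhalf=60):
        # choose d in {-1,+1} with f'(x)*d > 0, so that x - sigma*d is a descent step
        g = fprime(x)
        if g == 0.0:
            return x                          # already stationary
        d = 1.0 if g > 0.0 else -1.0
        gamma = abs(g)                        # gamma = |f'(x)| as in the text
        fx = f(x)
        sigma = gamma
        for _ in range(maxhalf):
            # descent test: f(x) - f(x - sigma*d) >= delta*sigma*f'(x)*d
            if fx - f(x - sigma * d) >= delta * sigma * gamma:
                return x - sigma * d
            sigma *= 0.5
        return x - sigma * d                  # limiting precision reached

Note that f'(x)d = |f'(x)| = gamma here, which is used in the test.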
- Method 4: The Powell-Wolfe stepsize: This is similar in spirit to ''backtracking'' but
begins with an initial search for a point z = x - σd, in the direction of descent, such that f'(z)d < 0.
Then by a combination of bisection and interpolation a point between x and z is sought such that
    f(x) - f(x-σd) >= δ σ f'(x)d
and
    |f'(x-σd)| <= κ |f'(x)d| .
Here 0 < δ < κ < 1 and preferably δ < 1/2.
(This has the effect that finally the interpolation replacement of σ is always successful,
which speeds up the process considerably.) Then x - σd replaces x and the process is repeated;
see the sketch below.
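
A minimal Python sketch of one Powell-Wolfe step (ours; plain bisection replaces the
bisection/interpolation combination, the constants are examples, and the initial search
assumes f is bounded below in the descent direction):

    def powell_wolfe_step(f, fprime, x, delta=0.25, kappa=0.45, maxit=100):
        g = fprime(x)
        if g == 0.0:
            return x                          # already stationary
        d = 1.0 if g > 0.0 else -1.0          # direction with f'(x)*d > 0
        fx, slope = f(x), abs(g)              # slope = f'(x)*d = |f'(x)|
        # initial search: enlarge the step until f'(z)*d < 0 or the descent test fails
        hi = 1.0
        while (fx - f(x - hi * d) >= delta * hi * slope
               and fprime(x - hi * d) * d > 0.0):
            hi *= 2.0
        lo = 0.0                              # descent test holds trivially at sigma = 0
        for _ in range(maxit):
            sigma = 0.5 * (lo + hi)
            z = x - sigma * d
            if fx - f(z) < delta * sigma * slope:
                hi = sigma                    # step too long: decrease
            elif abs(fprime(z)) <= kappa * slope:
                return z                      # both Powell-Wolfe conditions hold
            elif fprime(z) * d < 0.0:
                hi = sigma                    # already beyond a local minimizer
            else:
                lo = sigma
        return x - lo * d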
- The methods terminate ''successfully'' if the interval has been reduced to eps times its
original length, resp. if two successive x-values differ by less than eps*(|x|+eps);
see the test below. There are further termination reasons, namely
- too many function values used (10000)
- limiting precision in the interval reduction reached
- limiting precision in the function evaluation reached
- limiting precision in x reached
- limiting accuracy in the derivative evaluation reached.
These additional termination cases signal that eps has been chosen too small. They all
assume that the function is evaluated to full double precision.
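
For instance, the x-based success test can be written as (a sketch, names ours):

    def x_converged(x_new, x_old, eps):
        # successive x-values agree within the mixed tolerance eps*(|x| + eps)
        return abs(x_new - x_old) < eps * (abs(x_new) + eps)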