The gradient method

- In this application we search for a local minimizer x_opt of a smooth nonlinear
function f on R^n. This means
f(x_opt) <= f(x) for all x in some neighborhood of x_opt.
In fact the method only tries to find a stationary point of f, i.e.
a zero of ∇f(x). Should f be locally convex there, this will indeed
be a minimizer; convexity, however, is not needed for this method.
We need f in C1, since the gradient is used explicitly,
but second partial derivatives do not enter this method in any way.
For a bit of background theory look here.
- The gradient method is the simplest method of this type.
Given an initial guess x_0 one iterates
x_{k+1} = x_k - σ_k*d_k, where d_k = ∇f(x_k).
The stepsize σ_k is computed here using the method of Al-Baali
and Fletcher.
This is a bracketing method using quadratic and cubic interpolation that aims at satisfying
the strong Powell-Wolfe conditions (we omit the index k):
|∇f(x_next)'d| <= κ*|∇f(x)'d|
f(x) - f(x_next) >= σ*δ*|∇f(x)'d|.
Here δ and κ are two parameters with default
values 0.01 and 0.8. It must always hold that
0 < δ < κ < 1.
A very small κ means an almost exact one-dimensional minimization in the direction -d_k.
In order to compute a first guess for σ_k we use the minimizer of the quadratic polynomial in σ
defined by the values of f for σ=0 and σ=1 and the directional derivative
-∇f(x)'d at σ=0.
One can show that this estimate tends to the first local minimizer along the ray and satisfies the strong Powell-Wolfe conditions provided
0 < δ < 1/2 < κ < 1.
For a convex quadratic it is exact.
(A minimal sketch of such an iteration is given after this list.)
- The convergence of this method, measured in the norm ||H^(1/2)(x_k - x_opt)||_2, is
Q-linear with the asymptotic rate
(cond(H)-1)/(cond(H)+1), where H is the Hessian of f at the solution, assumed to be strictly positive definite.
(This rate is illustrated numerically in the second sketch below.)
- In order to cope with possible trouble, a number of termination criteria are tested: besides the obvious ones
(too many steps to reach the desired precision (4000 here), irrecoverable error return from the user's
function) these are
- ∇f(x_k) = 0 exactly
- ||x_{k+1}-x_k|| <= εx*(||x_k||+εx) and ||∇f(x_{k+1})|| <= εg
(the two ''successful'' termination cases). Here εg = εx.
- ||∇f(x_k)||^2 <= εf*(|f(x_k)|+εf), εf = 10^(-12)
(the directional derivative is too small to make a stepsize search meaningful)
- failure of the stepsize algorithm (usually strong ill-conditioning or a too small κ)
- more than n successive changes in f smaller than 10^(-12)*(|f|+10^(-12)).
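
The following is a minimal Python sketch of such a gradient iteration, intended only to make the description above concrete. It is not the code used by this application: the bracketing is done by simple doubling and bisection instead of the quadratic and cubic interpolation of Al-Baali and Fletcher, only a subset of the termination tests is implemented, and all names (wolfe_step, grad_method, eps_x, eps_f, ...) are chosen just for this example. The defaults δ=0.01 and κ=0.8 are the values quoted above.

import numpy as np

def wolfe_step(f, grad, x, d, delta=0.01, kappa=0.8, sigma0=1.0, max_iter=30):
    """Search a stepsize sigma satisfying the strong Powell-Wolfe conditions
    for the step x -> x - sigma*d, where d = grad f(x) (so -d is a descent
    direction).  Plain bracketing by doubling and bisection (a simplification
    of the interpolation-based method described in the text)."""
    f0 = f(x)
    g0d = grad(x) @ d                    # directional derivative along d; > 0 unless x is stationary
    lo, hi, sigma = 0.0, np.inf, sigma0
    for _ in range(max_iter):
        xn = x - sigma * d
        gnd = grad(xn) @ d
        if f0 - f(xn) < sigma * delta * g0d or gnd < -kappa * g0d:
            hi = sigma                   # step too long: decrease condition fails or minimum overshot
        elif gnd > kappa * g0d:
            lo = sigma                   # step too short: f still falls steeply along -d
        else:
            return sigma                 # both strong Powell-Wolfe conditions hold
        sigma = 0.5 * (lo + hi) if np.isfinite(hi) else 2.0 * lo
    return sigma                         # bracketing failed: return the last trial stepsize

def grad_method(f, grad, x0, eps_x=1e-6, eps_g=None, eps_f=1e-12, max_steps=4000):
    """Gradient method x_{k+1} = x_k - sigma_k*grad f(x_k) with simplified
    versions of the termination tests listed above."""
    eps_g = eps_x if eps_g is None else eps_g
    x = np.asarray(x0, dtype=float)
    small = 0                            # counts successive almost-unchanged f values
    for _ in range(max_steps):
        d = grad(x)
        if np.linalg.norm(d) == 0.0:
            return x, "gradient exactly zero"
        if np.linalg.norm(d) ** 2 <= eps_f * (abs(f(x)) + eps_f):
            return x, "directional derivative too small for a stepsize search"
        sigma = wolfe_step(f, grad, x, d)
        x_new = x - sigma * d
        small = small + 1 if abs(f(x) - f(x_new)) <= 1e-12 * (abs(f(x)) + 1e-12) else 0
        if small > x.size:
            return x_new, "more than n successive tiny changes in f"
        if (np.linalg.norm(x_new - x) <= eps_x * (np.linalg.norm(x) + eps_x)
                and np.linalg.norm(grad(x_new)) <= eps_g):
            return x_new, "converged"
        x = x_new
    return x, "maximum number of steps reached"

A call grad_method(f, grad, x0) returns the last iterate together with a string naming the termination reason.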
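
As a small, equally hypothetical illustration of the convergence rate stated above, the experiment below runs this sketch on the convex quadratic f(x) = 0.5*x'Hx with H = diag(1,10), so cond(H) = 10 and the predicted asymptotic contraction factor in the norm ||H^(1/2)x||_2 is (10-1)/(10+1) ≈ 0.82. It reuses wolfe_step from the sketch above; the small κ makes the line search nearly exact.

import numpy as np

H = np.diag([1.0, 10.0])                 # cond(H) = 10
f = lambda x: 0.5 * x @ H @ x            # minimizer x_opt = 0
grad = lambda x: H @ x

x = np.array([10.0, 1.0])
errs = []
for _ in range(25):
    errs.append(np.sqrt(x @ H @ x))      # = ||H^(1/2)(x_k - x_opt)||_2
    x = x - wolfe_step(f, grad, x, grad(x), kappa=0.05) * grad(x)

ratios = [e1 / e0 for e0, e1 in zip(errs, errs[1:])]
print("observed contraction per step:", ratios[-1])
print("predicted rate (cond-1)/(cond+1):", 9.0 / 11.0)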

Input

- You have the choice between 14 predefined functions, for which you can read a short
description of the examples here.
You may also define a function f of your own,
depending on n variables x(1),...,x(n),
and you may specify the initial guess x_0.
- Either you use the standard parameter settings for the termination criteria or you specify them
yourself. For convenience these have been reduced to εx, δ and κ, setting εg = εx.
- For the predefined cases you have the possibility to replace the standard initial point by
another one. The required dimension can be read from the input form.
- You have the possibility to request a contour plot of f around the best point found,
with two of the n variables varying and the other components of x fixed at their
current values. You can specify the size of the region. Be careful! These functions often show
rapid growth, so that on a large region you may ''see'' nothing useful.

Output

- The CPU time used. If this appears as zero, it means less than 0.01 seconds.
- The termination reason.
- The best values computed, f(x_opt) and the components x(1),...,x(n) of x_opt,
together with the gradient components.
- The number of function and gradient evaluations used.
- Two plots in semi-logarithmic format showing f_k - f_opt
and ||∇f(x_k)||.
- The contour plot, if requested.