The gradient method


 
  • In this application we search for a local minimizer x_opt of a smooth nonlinear function f on R^n. This means
    f(x_opt) <= f(x) for all x in some neighborhood of x_opt. Strictly speaking, the method only tries to find a stationary point of f, i.e. a zero of ∇f(x). Should f be locally convex there, this is indeed a minimizer; convexity, however, is not needed for this method. We require f in C^1, since the gradient is used explicitly, but second partial derivatives never enter the method. For a bit of theory concerning the background look here.
  • The gradient method is the simplest method of this type. Given an initial guess x_0 one iterates
    x_{k+1} = x_k - σ_k d_k .
    Here d_k = ∇f(x_k). The stepsize σ_k is computed using the stepsize method of Al-Baali and Fletcher, a bracketing method based on quadratic and cubic interpolation that aims at satisfying the strong Powell-Wolfe conditions (we omit the index k):
    |∇f(x_next)'d| <= κ*|∇f(x)'d|
    f(x) - f(x_next) >= σ*δ*|∇f(x)'d| .
    Here δ and κ are two parameters with default values 0.01 and 0.8. They must always satisfy
    0 < δ < κ < 1 .
    A very small κ enforces an almost exact one-dimensional minimization in the direction -d_k. A first guess for σ_k is computed as the minimizer of the quadratic polynomial in σ defined by the values of f at σ = 0 and σ = 1 and the directional derivative -∇f(x)'d at σ = 0. One can show that this estimate tends to the first local minimizer along the ray and satisfies the strong Powell-Wolfe conditions provided
    0 < δ < 1/2 < κ < 1 .
    For a convex quadratic it is exact. (A minimal sketch of such an iteration is given after this list.)
  • The convergence of this method (measured in the norm ||H^{1/2} · ||_2) is Q-linear with the asymptotic rate
    (cond(H)-1)/(cond(H)+1) , where H is the Hessian of f at the solution, assumed to be strictly positive definite. (A small numerical illustration of this rate follows after this list.)
  • In order to cope with possible trouble, a number of termination criteria are tested (see also the sketch after this list). Besides the obvious ones (too many steps to reach the desired precision, i.e. 4000 here, or an irrecoverable error return from the user's function) these are
    • ∇f(x_k) = 0 exactly
    • ||x_{k+1} - x_k|| <= ε_x*(||x_k|| + ε_x) and ||∇f(x_{k+1})|| <= ε_g (the two ''successful'' termination cases). Here ε_g = ε_x.
    • ||∇f(x_k)||^2 <= ε_f*(|f(x_k)| + ε_f), ε_f = 10^{-12} (directional derivative too small to make a stepsize search sensible)
    • failure of the stepsize algorithm (usually strong ill-conditioning or a too small κ)
    • more than n successive changes in f smaller than 10^{-12}*(|f| + 10^{-12}).
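The following Python sketch shows one way such a gradient iteration could look. It is not the code behind this page: SciPy's generic strong-Wolfe line search (scipy.optimize.line_search) stands in for the Al-Baali/Fletcher routine, the names gradient_method and first_step_guess are invented for the illustration, and only the small-step test of the termination criteria is included.

    # Minimal sketch (not this page's implementation) of the gradient method
    # with a strong-Wolfe stepsize; c1 and c2 play the roles of δ and κ.
    import numpy as np
    from scipy.optimize import line_search

    def first_step_guess(phi0, phi1, dphi0):
        # Minimizer of the quadratic q with q(0)=phi0, q(1)=phi1, q'(0)=dphi0,
        # i.e. the first guess for σ described above (exact for a convex
        # quadratic). Shown only to illustrate the formula; SciPy's routine
        # chooses its own trial step.
        curv = phi1 - phi0 - dphi0
        return -dphi0 / (2.0 * curv) if curv > 0.0 else 1.0

    def gradient_method(f, grad, x0, delta=0.01, kappa=0.8,
                        eps_x=1e-8, max_iter=4000):
        x = np.asarray(x0, dtype=float)
        fx, g = f(x), grad(x)
        for _ in range(max_iter):
            d = -g                                  # steepest-descent direction
            # strong Powell-Wolfe step:
            #   |∇f(x+σd)'d| <= κ|∇f(x)'d|  and  f(x)-f(x+σd) >= σδ|∇f(x)'d|
            sigma = line_search(f, grad, x, d, gfk=g, old_fval=fx,
                                c1=delta, c2=kappa)[0]
            if sigma is None:                       # stepsize search failed
                break
            x_new = x + sigma * d
            if np.linalg.norm(x_new - x) <= eps_x * (np.linalg.norm(x) + eps_x):
                x = x_new
                break
            x = x_new
            fx, g = f(x), grad(x)
        return x

    # e.g. on the convex quadratic f(x) = 0.5*x'Hx with H = diag(1, 100)
    H = np.diag([1.0, 100.0])
    xopt = gradient_method(lambda x: 0.5 * x @ H @ x,
                           lambda x: H @ x, x0=[1.0, 1.0])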
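As a hedged numerical illustration of this rate (the condition number 100 is an assumed example value, not taken from this page):

    # Asymptotic rate (cond(H)-1)/(cond(H)+1) and the implied number of
    # iterations per additional decimal digit of accuracy, for cond(H)=100.
    import math
    cond_H = 100.0
    rate = (cond_H - 1.0) / (cond_H + 1.0)              # ≈ 0.980
    iters_per_digit = math.log(10.0) / -math.log(rate)  # ≈ 115 iterations
    print(rate, iters_per_digit)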
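The termination tests of the last item could be collected as below. This is only a sketch under assumed tolerances (eps_x = 1e-8 is a placeholder, the page does not state its default), the small-step and small-gradient conditions are interpreted as one joint test, and the stepsize-failure and user-error cases are left to the driving loop.

    # Sketch of the termination tests listed above (tolerances are assumptions).
    import numpy as np

    def check_termination(x_old, x_new, f_old, f_new, g_new, k, n, n_small,
                          eps_x=1e-8, eps_f=1e-12, max_iter=4000):
        # Returns (reason or None, updated count of successive tiny f-changes).
        eps_g = eps_x                                   # the page sets ε_g = ε_x
        if k >= max_iter:
            return "too many steps", n_small
        if not np.any(g_new):
            return "gradient exactly zero", n_small
        if (np.linalg.norm(x_new - x_old) <= eps_x * (np.linalg.norm(x_old) + eps_x)
                and np.linalg.norm(g_new) <= eps_g):
            return "success: small step and small gradient", n_small
        if g_new @ g_new <= eps_f * (abs(f_new) + eps_f):
            return "directional derivative too small for a stepsize search", n_small
        n_small = n_small + 1 if abs(f_old - f_new) < eps_f * (abs(f_new) + eps_f) else 0
        if n_small > n:
            return "more than n successive tiny changes in f", n_small
        return None, n_small                            # continue iterating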
 

Input

 
  • You have the choice between 14 predefined functions, for which you can read a short description of the examples here. You may also define a function f of your own, depending on the n variables x(1),...,x(n). In addition, you have the choice to specify the initial guess x_0.
  • Either you use the standard parameter settings for the termination criteria or you specify them yourself. For convenience these have been reduced to ε_x, δ and κ, with ε_g set equal to ε_x.
  • For the predefined cases you have the possibility to replace the standard initial point by another one. The required dimension can be read off from the input form.
  • You can request a contour plot of f around the best point found, with two of the n variables varying and the other components of x fixed at their current values. You can specify the size of the region. Be careful: these functions often grow so rapidly that on a large region you may ''see'' nothing useful.
 

Output

 
  • The CPU time used. If this appears as zero, it means less than 0.01 seconds.
  • The termination reason.
  • The computed best values f(x_opt) and x(1)_opt,...,x(n)_opt, and the gradient components.
  • The number of function and gradient evaluations used.
  • Two plots in semi-logarithmic format showing f_k - f_opt and ||∇f(x_k)||.
  • The contour plot, if requested.
 

To the input form

 

30.08.2017