Minimization using Newton descent


 
  • For a little bit of theory concerning the background, look here.
  • Newton's method for solving
    ∇f(x) = 0 converges locally to any stationary point, provided the Hessian is invertible there. Here it is modified in such a way that only second-order necessary stationary points are found: points where ∇f(x) = 0 and the Hessian is at least positive semidefinite.
  • This is enforced using three features:
    • a regularization of the Hessian which makes it positive definite and thus yields a direction of descent d for f:
      (H + α G) d = -∇f(x)
    • The use of directions of negative curvature z which satisfy
      z^T ∇f(x) <= 0 and z^T H z < 0
      in cases of nonconvexity.
    • A step size control which enforces a descent in f that, by construction, is bounded from below by
      max { ||∇f(x)||^2 , |z^T H z|^2 } × C
      with some constant C > 0 (existent, but unknown in practice).
  • These two directions (and the correction matrix G) are obtained using the Bunch-Parlett decomposition of the Hessian H. For more information about it, look here. A minimal sketch of how d and z might be computed is given after this list.
  • Given an initial vector x_0 we iterate
    either
    x_{k+1} = x_k + σ_k d_k
    or
    x_{k+1} = x_k + σ_k z_k.
    The choice depends on which direction promises the better descent. Specifically, we use z if
    |z^T g|/||z|| + (1/2) |z^T H z|/||z||^2 > |d^T g|/||d||
    where g = ∇f(x_k).
  • d is the solution of
    (H + α P L L^T P^T) d = -∇f(x).
    The regularization parameter α is computed from the input parameter regul such that the eigenvalues of the block-diagonal part D of the Bunch-Parlett decomposition of H are not less than regul*||D||. If d is used, then the stepsize is computed with a method described by Al-Baali and Fletcher, which aims at fulfilling the strong Powell-Wolfe conditions
    |∇f(x_next)^T d| <= κ * |∇f(x)^T d|
    f(x) - f(x_next) >= σ * δ * |∇f(x)^T d|
    (a sketch of this acceptance test is given after the list).

    δ and κ are under your control. The defaults are 0.01 and 0.9, respectively.
  • The direction z, if used, is composed from eigenvectors of D corresponding to all negative eigenvalues of D. In this case the stepsize is computed by backtracking, with an initial stepsize obtained by cubic interpolation using φ(0), φ'(0), φ''(0) and φ(σ_0), where
    φ(σ) = f(x + σ z)
    and σ_0 is obtained by a search for a region of ascent of f (see the corresponding sketch after this list).
  • Gradients are computed analytically, with the exception of testcase 14, and the Hessian matrices are computed analytically for all testcases 1 to 13. For case 14 the gradient is computed with a 6th-order accurate difference formula. The Hessian, if computed numerically, is obtained by forward finite differences of the gradient with subsequent symmetrization (see the sketch after this list).
  • If d is used, then the first trial stepsize is always σ = 1. Hence this method converges globally and with second order locally, provided the Hessian is sufficiently well conditioned at the solution (that means "last regularization shift = 0" in the output).
  • Termination occurs, besides the obvious cases "too many steps required" and "error flag in evaluating f is set" (for example, if a sqrt(-1) would have occurred), in the following cases; a sketch of these tests is given after this list:
    • "no acceptable step size found": the stepsize algorithm found no stepsize which would have changed x significantly. This occurs with imprecise gradients (testcase 14 with a small epsx, for example) or with very ill-conditioned cases near the solution.
    • "correction d too small":
      ||d|| < 10^{-16} * (||x|| + 10^{-16})
      This means that no significant change in x can be obtained.
    • "grad f sufficiently small":
      ||∇f(x)|| <= max(epsx, 10^{-16} * cond(H))
      and
      ||x - x_next|| <= epsx * (||x|| + epsx)
      and no direction of negative curvature exists.
    • "small directional derivative: f flat":
      |d^T ∇f(x)| < 10^{-13} * (|f(x)| + 10^{-16}) (a line search makes no sense)
    • "no significant changes in f": for at least n successive steps the relative change in the function values has been less than 10^{-12}.
 
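To make the construction of the two directions concrete, here is a minimal Python sketch. It uses a symmetric eigendecomposition of H in place of the Bunch-Parlett factorization employed by the actual code; the function names and the default value of regul are illustrative assumptions, while the selection rule is the one quoted above.

  import numpy as np

  def descent_directions(H, g, regul=1e-8):
      # Regularized Newton direction d and (if H is indefinite) a direction
      # of negative curvature z. An eigendecomposition of H stands in for
      # the Bunch-Parlett factorization used by the actual code.
      lam, V = np.linalg.eigh(H)                    # H = V diag(lam) V^T
      scale = np.max(np.abs(lam))
      floor = regul * scale if scale > 0.0 else regul
      lam_reg = np.maximum(lam, floor)              # lift small/negative eigenvalues
      d = -V @ ((V.T @ g) / lam_reg)                # solves (H + correction) d = -g

      z = None
      if lam[0] < 0.0:                              # H indefinite
          neg = lam < 0.0
          z = V[:, neg] @ (V[:, neg].T @ (-g))      # -g projected onto the negative eigenspace
          if np.linalg.norm(z) <= 1e-14 * max(np.linalg.norm(g), 1.0):
              z = V[:, 0].copy()                    # fall back to the most negative eigenvector
          if z @ g > 0.0:                           # enforce z^T grad f(x) <= 0
              z = -z
      return d, z

  def prefer_negative_curvature(d, z, g, H):
      # Selection rule from the text: use z if
      #   |z^T g|/||z|| + (1/2)|z^T H z|/||z||^2 > |d^T g|/||d||
      if z is None:
          return False
      nz, nd = np.linalg.norm(z), np.linalg.norm(d)
      return abs(z @ g) / nz + 0.5 * abs(z @ H @ z) / nz**2 > abs(d @ g) / nd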
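The strong Powell-Wolfe conditions can be checked directly. The sketch below shows only this acceptance test with the defaults δ = 0.01 and κ = 0.9; it is not the Al-Baali/Fletcher bracketing procedure that the program actually uses to find such a stepsize.

  import numpy as np

  def powell_wolfe_ok(f, grad, x, d, sigma, delta=0.01, kappa=0.9):
      # Acceptance test for a trial stepsize sigma along d:
      #   |grad f(x + sigma d)^T d| <= kappa * |grad f(x)^T d|          (curvature)
      #   f(x) - f(x + sigma d)    >= sigma * delta * |grad f(x)^T d|   (sufficient descent)
      gd = grad(x) @ d
      x_next = x + sigma * d
      curvature_ok = abs(grad(x_next) @ d) <= kappa * abs(gd)
      descent_ok = f(x) - f(x_next) >= sigma * delta * abs(gd)
      return curvature_ok and descent_ok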
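For the negative-curvature direction, the backtracking starts from an initial stepsize obtained by cubic interpolation. The following sketch builds the cubic model from φ(0), φ'(0), φ''(0) and φ(σ_0) and returns its minimizer; the fallback for a nonpositive cubic coefficient is an assumption of this sketch, not part of the description above.

  import numpy as np

  def cubic_initial_step(phi0, dphi0, ddphi0, sigma0, phi_sigma0):
      # Minimizer of the cubic model interpolating phi(0), phi'(0), phi''(0)
      # and phi(sigma_0), where phi(sigma) = f(x + sigma z).
      # Along z we have phi'(0) <= 0 and phi''(0) < 0, so the model has an
      # interior minimizer only if the cubic coefficient a is positive.
      a = (phi_sigma0 - phi0 - dphi0 * sigma0 - 0.5 * ddphi0 * sigma0**2) / sigma0**3
      if a <= 0.0:
          return sigma0                              # fallback: keep sigma_0 (assumption)
      disc = ddphi0**2 - 12.0 * a * dphi0            # >= 0 since dphi0 <= 0 and a > 0
      return (-ddphi0 + np.sqrt(disc)) / (6.0 * a)   # root with positive model curvature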
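The numerically computed Hessian (forward differences of the gradient, then symmetrization) can be sketched as follows; the difference stepsize h is an assumption.

  import numpy as np

  def hessian_fd(grad, x, h=1e-6):
      # Forward finite differences of the gradient, column by column,
      # followed by symmetrization.
      n = x.size
      g0 = grad(x)
      H = np.empty((n, n))
      for j in range(n):
          xj = x.copy()
          xj[j] += h
          H[:, j] = (grad(xj) - g0) / h
      return 0.5 * (H + H.T)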
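Finally, the pointwise termination tests listed above could be coded as below; the tests "no acceptable step size found" and "no significant changes in f" need line-search and iteration history and are therefore omitted from this sketch.

  import numpy as np

  def check_termination(x, x_next, d, g, f_val, cond_H, epsx, has_neg_curv):
      # Pointwise termination tests with the constants quoted above.
      eps = 1e-16
      if np.linalg.norm(d) < eps * (np.linalg.norm(x) + eps):
          return "correction d too small"
      if (np.linalg.norm(g) <= max(epsx, eps * cond_H)
              and np.linalg.norm(x - x_next) <= epsx * (np.linalg.norm(x) + epsx)
              and not has_neg_curv):
          return "grad f sufficiently small"
      if abs(d @ g) < 1e-13 * (abs(f_val) + eps):
          return "small directional derivative: f flat"
      return None                                    # continue iterating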

Input

 
  • You have the choice between 14 prepared testcases, for which there exists a more detailed description here, or you can define a piece of code for computing f(x) yourself. The latter is subject to automatic differentiation using the code JAKEF, hence you must obey some special limitations; in this case you must also specify the initial point explicitly. For the prepared cases you also have the choice of using an initial point different from the automatically provided one.
  • You can use the default parameter settings or your own choices for epsx, δ, κ, regul and maxstep. If you choose a very small regul, then, due to the implied bad conditioning of the regularized Hessian, either the stepsize selection or even the computation of d might fail. You may experiment with this in the nonconvex testcases.
  • Finally, you may request a contour plot of f around the minimizer, with two variables varying and the others fixed at their optimal values.
 

Output

 
  • CPU-time (zero means "less than 0.01 seconds")
  • The termination reason
  • The computed optimal values f(x_opt) and x_opt(1), ..., x_opt(n), and the gradient.
  • The number of function and gradient evaluations required. The number of Hessian evaluations is equal to the number of steps done.
  • The condition number of the last (regularized) Hessian in the Frobenius norm.
  • Two plots in semi-logarithmic scale, showing f - f_opt and ||∇f(x)|| over the step numbers.
  • The contourplot, if required.
 

