Minimization using Newton descent

- For a bit of theory concerning the background, look here.
- Newton's method for solving
  ∇f(x) = 0
  converges (locally) to any stationary point,
  provided the Hessian is invertible there.
  Here it is modified in such a way that only second order necessary
  stationary points are found: points where ∇f(x) = 0
  and the Hessian is at least positive semidefinite.
- This is enforced using three features:
  - a regularization of the Hessian which makes it positive definite
    and yields a direction of descent d for f:
    (H + α G) d = -∇f(x)
  - the use of directions of negative curvature z, which satisfy
    z^T ∇f(x) <= 0 and z^T H z < 0,
    in cases of nonconvexity.
  - a stepsize control which enforces a descent in f that by
    construction is bounded from below by
    max{ ||∇f(x)||^2 , |z^T H z|^2 } × C
    with some constant C > 0 (existent, but unknown in practice).
- These two directions (and the correction matrix G) are obtained
  using the Bunch-Parlett decomposition of the Hessian H; see the sketch
  below. For more information about it, look here.
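  A rough sketch of how the two directions could be formed is given below.
  This is not the code behind this page: scipy's LDL^T factorization
  (Bunch-Kaufman pivoting) stands in for the Bunch-Parlett decomposition,
  and the composition of z from the negative-eigenvalue eigenvectors of D
  is one simple choice among several.

    import numpy as np
    from scipy.linalg import ldl

    def newton_directions(H, g, regul=1e-8):
        # Factor H = P L D L^T P^T with symmetric pivoting (Bunch-Kaufman here,
        # standing in for the Bunch-Parlett decomposition of the text).
        # ldl returns lu = P L with the permutation applied, so H = lu @ D @ lu.T.
        lu, D, _ = ldl(H)

        # Eigenvalues/eigenvectors of the block diagonal part D (1x1 and 2x2 blocks).
        evals, evecs = np.linalg.eigh(D)

        # Shift alpha so that all eigenvalues of D + alpha*I are >= regul*||D||,
        # then solve (H + alpha * P L L^T P^T) d = -grad f(x).
        floor = regul * np.linalg.norm(D)
        alpha = max(0.0, floor - evals.min())
        d = np.linalg.solve(lu @ (D + alpha * np.eye(len(g))) @ lu.T, -g)

        # Direction of negative curvature from the negative eigenvalues of D
        # (one simple composition; the page does not spell out the exact one):
        # with lu.T @ z = w and w spanned by negative-eigenvalue eigenvectors of D,
        # z^T H z = w^T D w < 0.
        z = None
        neg = evals < 0.0
        if neg.any():
            w = evecs[:, neg].sum(axis=1)
            z = np.linalg.solve(lu.T, w)
            if z @ g > 0.0:
                z = -z          # enforce z^T grad f(x) <= 0
        return d, z, alpha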
- Given an initial vector x_0 we iterate
  either
  x_(k+1) = x_k + σ_k d_k
  or
  x_(k+1) = x_k + σ_k z_k.
  The choice depends on which of the two promises the better descent.
  Specifically, we use z here if
  |z^T g|/||z|| + (1/2) |z^T H z|/||z||^2 > |d^T g|/||d||
  (with g = ∇f(x)), as checked in the snippet below.
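  A minimal helper writing out this test (illustrative, not the page's code):

    import numpy as np

    def use_negative_curvature(g, H, d, z):
        # True if the negative-curvature direction z promises the better descent
        # according to the criterion quoted above; otherwise the Newton-type d is taken.
        lhs = abs(z @ g) / np.linalg.norm(z) + 0.5 * abs(z @ H @ z) / np.linalg.norm(z)**2
        rhs = abs(d @ g) / np.linalg.norm(d)
        return lhs > rhs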
- d is the solution of
  (H + α P L L^T P^T) d = -∇f(x).
  The regularization parameter α is computed from the input
  parameter regul such that the eigenvalues of the block diagonal part
  D of the Bunch-Parlett decomposition of H are not less than
  regul*||D||.
  If d is used, then the stepsize is computed with a stepsize
  method described by Al-Baali and Fletcher, aiming at fulfilling the strong
  Powell-Wolfe conditions
  |∇f(x_next)^T d| <= κ * |∇f(x)^T d|
  f(x) - f(x_next) >= σ * δ * |∇f(x)^T d|
  (a stand-in sketch follows below).
  δ and κ are under your control; the defaults are 0.01 and 0.9.
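  The Al-Baali/Fletcher routine itself is not reproduced here. As a stand-in,
  the sketch below uses scipy's strong-Wolfe line search, with the δ and κ of
  the text playing the roles of the parameters c1 and c2; the Rosenbrock
  function and the steepest-descent direction are only placeholders.

    import numpy as np
    from scipy.optimize import line_search, rosen, rosen_der

    x = np.array([-1.2, 1.0])
    g = rosen_der(x)
    d = -g      # placeholder descent direction; the page would use the regularized Newton d
    # Strong (Powell-)Wolfe conditions: delta ~ c1 = 0.01, kappa ~ c2 = 0.9.
    sigma = line_search(rosen, rosen_der, x, d, gfk=g, c1=0.01, c2=0.9)[0]
    if sigma is not None:       # None would mean: no acceptable stepsize found
        x_next = x + sigma * d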
- The direction z, if used, is composed from eigenvectors of D
  corresponding to all negative eigenvalues of D. In this case
  the stepsize is computed by backtracking from an initial stepsize
  obtained by cubic interpolation using φ(0), φ'(0), φ''(0)
  and φ(σ_0), with
  φ(σ) = f(x + σ z)
  and σ_0 obtained by a search for a region of
  ascent of f (see the sketch below).
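  A sketch of this stepsize rule under simplifying assumptions: σ_0 is taken
  as given, and the search for a region of ascent as well as the safeguards
  of the actual code are omitted.

    import numpy as np

    def negative_curvature_stepsize(f, x, z, g, H, sigma0, shrink=0.5, max_backtracks=30):
        # phi(sigma) = f(x + sigma*z); cubic model from phi(0), phi'(0), phi''(0), phi(sigma0).
        phi0, dphi0, ddphi0 = f(x), z @ g, z @ H @ z
        a = (f(x + sigma0 * z) - phi0 - dphi0 * sigma0 - 0.5 * ddphi0 * sigma0**2) / sigma0**3
        if abs(a) > 1e-16:
            # stationary points of the cubic: 3a s^2 + phi''(0) s + phi'(0) = 0;
            # the '+' root is its local minimizer
            disc = max(ddphi0**2 - 12.0 * a * dphi0, 0.0)
            sigma = (-ddphi0 + np.sqrt(disc)) / (6.0 * a)
        else:
            sigma = -dphi0 / ddphi0   # quadratic model (phi''(0) < 0 for a curvature direction)
        if sigma <= 0.0:
            sigma = sigma0            # fall back to the bracketing step
        # plain backtracking until a decrease in f is obtained
        for _ in range(max_backtracks):
            if f(x + sigma * z) < phi0:
                return sigma
            sigma *= shrink
        return None                   # "no acceptable step size found"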
- Gradients are computed analytically with the exception of testcase 14,
  and the Hessian matrices are computed analytically for all testcases 1 to
  13. For case 14 the gradient is computed with a 6th order accurate
  difference formula. The Hessian, if computed numerically, is obtained
  by a forward finite difference of the gradient with subsequent
  symmetrization (see the sketch below).
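  The page does not print its difference formulas; below is a sketch using a
  standard 6th-order central difference for the gradient and a forward
  difference of the gradient, followed by symmetrization, for the Hessian.
  The stepsizes h are illustrative.

    import numpy as np

    def gradient_fd6(f, x, h=1e-3):
        # 6th order accurate central difference approximation of grad f(x)
        n = len(x)
        g = np.zeros(n)
        for i in range(n):
            e = np.zeros(n); e[i] = h
            g[i] = (f(x + 3*e) - f(x - 3*e)
                    - 9.0 * (f(x + 2*e) - f(x - 2*e))
                    + 45.0 * (f(x + e) - f(x - e))) / (60.0 * h)
        return g

    def hessian_fd(grad, x, h=1e-6):
        # forward finite difference of the gradient, then symmetrization
        n = len(x)
        g0 = grad(x)
        H = np.empty((n, n))
        for i in range(n):
            e = np.zeros(n); e[i] = h
            H[:, i] = (grad(x + e) - g0) / h
        return 0.5 * (H + H.T)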
- If d is used, then the first trial stepsize is always σ = 1.
  Hence this method converges globally, and locally of second order if
  the Hessian is sufficiently well conditioned at the solution
  (that means "last regularization shift=0" in the output).
- Termination occurs, besides the obvious cases "too many steps required"
  and "error flag in evaluating f is set" (for example if a sqrt(-1) would have
  occurred), in the following cases (collected in the sketch below):
  - "no acceptable step size found":
    the stepsize algorithm found no stepsize which
    would have changed x significantly.
    This occurs with imprecise gradients (testcase 14 with a small epsx,
    for example) or with very ill-conditioned cases near the solution.
  - "correction d too small":
    ||d|| < 10^(-16) (||x|| + 10^(-16)).
    This would imply that no significant change in x can
    be obtained.
  - "grad f sufficiently small":
    ||∇f(x)|| <= max(epsx, 10^(-16)*cond(H))
    and
    ||x - x_next|| <= epsx (||x|| + epsx)
    and no direction of negative curvature exists.
  - "small directional derivative: f flat":
    |d^T ∇f(x)| < 10^(-13) (|f(x)| + 10^(-16))
    (a line search makes no sense).
  - "no significant changes in f": for at least n
    successive steps the relative change in the function values has been less
    than 10^(-12).
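  The tests above, collected in one illustrative helper (thresholds as quoted
  in the text; cond_H, epsx, the negative-curvature flag and the counter of
  flat steps are assumed to be supplied by the surrounding iteration):

    import numpy as np

    def termination_reason(x, x_next, d, f_x, g, cond_H, epsx,
                           has_neg_curvature, flat_steps, n):
        if np.linalg.norm(d) < 1e-16 * (np.linalg.norm(x) + 1e-16):
            return "correction d too small"
        if (np.linalg.norm(g) <= max(epsx, 1e-16 * cond_H)
                and np.linalg.norm(x - x_next) <= epsx * (np.linalg.norm(x) + epsx)
                and not has_neg_curvature):
            return "grad f sufficiently small"
        if abs(d @ g) < 1e-13 * (abs(f_x) + 1e-16):
            return "small directional derivative: f flat"
        if flat_steps >= n:   # relative change in f below 1e-12 for n successive steps
            return "no significant changes in f"
        return None           # keep iterating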

Input

- You have the choice between 14 prepared testcases, for which there exists
  a more detailed description here,
  or you define a piece of code for computing f(x) yourself.
  This code is subject to automatic differentiation using the code JAKEF,
  hence you must obey some special restrictions.
  In this case you must also specify the initial point explicitly.
  For the prepared cases you also have the choice of an initial point
  different from the automatically provided one.
- You can use the default parameter settings or make your own choices
  for epsx, δ, κ, regul, maxstep.
  If you choose a very small regul, then due to the implied bad
  conditioning of the regularized Hessian either the stepsize selection or
  even the computation of d might fail. You may experiment with this
  in the nonconvex testcases.
- Finally, you may request a contour plot of f
  around the minimizer, with two variables varying and the others fixed at their
  optimal values (a plotting sketch follows below).
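  Such a plot could be produced along the following lines. This is
  illustrative matplotlib code, not the plotting routine of this page; the
  indices i, j and the plot radius are free choices.

    import numpy as np
    import matplotlib.pyplot as plt

    def contour_around_minimizer(f, x_opt, i=0, j=1, radius=1.0, m=100):
        # vary x[i] and x[j] around the minimizer, keep all other components fixed
        u = np.linspace(x_opt[i] - radius, x_opt[i] + radius, m)
        v = np.linspace(x_opt[j] - radius, x_opt[j] + radius, m)
        Z = np.empty((m, m))
        for a in range(m):
            for b in range(m):
                x = x_opt.copy()
                x[i], x[j] = u[b], v[a]
                Z[a, b] = f(x)
        plt.contour(u, v, Z, levels=30)
        plt.plot(x_opt[i], x_opt[j], "r+")
        plt.xlabel("x[%d]" % i); plt.ylabel("x[%d]" % j)
        plt.show()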

Output

- CPU-time (zero means "less than 0.01 seconds")
- The termination reason
- The computed optimal values f(x_opt) and
  x(1)_opt, ..., x(n)_opt, and the
  gradient.
- The number of function and gradient evaluations required.
  The number of Hessian evaluations is equal to the number of steps done.
- The condition number of the last (regularized) Hessian in the
  Frobenius norm.
- Two plots on a semi-logarithmic scale, showing f - f_opt
  and ||∇f(x)|| versus the step number (see the sketch below).
- The contour plot, if requested.
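  For completeness, a sketch of how the two convergence plots could be drawn
  from recorded iteration histories (f_hist, grad_norm_hist and f_opt are
  assumed to have been collected during the run):

    import matplotlib.pyplot as plt

    def convergence_plots(f_hist, grad_norm_hist, f_opt):
        steps = range(len(f_hist))
        fig, (ax1, ax2) = plt.subplots(1, 2)
        ax1.semilogy(steps, [max(fk - f_opt, 1e-16) for fk in f_hist])
        ax1.set_title("f - f_opt"); ax1.set_xlabel("step")
        ax2.semilogy(steps, grad_norm_hist)
        ax2.set_title("||grad f||"); ax2.set_xlabel("step")
        plt.show()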