Minimization by quadratic approximation
NEWUOA by M.J.D. Powell


 
  • In many applications one wants to solve a minimization (or maximization) problem where explicit computation of derivatives of the objective function is impossible, although the existence of such derivatives is theoretically guaranteed. In such situations the methods of "derivative-free optimization", DFO for short, are useful. Replacing analytical derivatives by finite-difference approximations is often problematic, especially if the function values are not given to very high precision, as is often the case in computations involving simulations of continuous processes. If the second derivatives are known to exist, then second-order approximations are justified.
  • Here we use a method of M.J.D. Powell, namely his code NEWUOA, which uses a quadratic model of the objective function; the minimizer of this model under a trust region constraint serves as the next approximation of the minimizer. The quadratic model is obtained by interpolation of function data on a grid in general position, which guarantees unique solvability of the corresponding linear system. For the case n=2 you may imagine a triangle, with the three vertices and the midpoints of the edges as interpolation points. By a linear transformation to the standard triangle with vertices (0,0), (1,0), (0,1) one obtains a simple linear system fixing the six parameters of a quadratic interpolant in two variables; invertibility of the linear transformation then allows translation back to the general position. In general the model has the form
    fappr(x) = f(x0) + g^T (x - x0) + (1/2) (x - x0)^T H (x - x0) .
    g is an approximation for the gradient and H for the Hessian of f at x0 .
    x0 is the best point found so far.
    This model is then minimized over a ball of radius Δ, the so-called "trust region radius", which is itself computed adaptively; the user gives an initial and a final value for it. The minimizer of the model is a candidate for the next best approximation of f's minimizer. It is accepted if the new function value is sufficiently smaller than f(x0), compared with the decrease predicted by the quadratic model. If this is not achieved (for example because the function deviates strongly from the quadratic model), then the trust region radius is decreased and the minimization of the model is repeated; with very good agreement the radius may also be increased. The new point replaces one of the old interpolation points, which allows updating techniques in the linear system solves. If the interpolation points become too widely spread (specifically, if their distance from the current best point becomes larger than 2Δ), then points "far away" are replaced by ones nearer to the newest point, in such a way that invertibility of the linear systems involved in the interpolation model is preserved.
  • A complete quadratic model needs (n+1)(n+2)/2 pairs (x, f(x)), and this becomes prohibitively expensive for larger n. Hence it is possible to use a smaller number of new interpolation points, chosen only around the next best approximation point, namely a number in the range [2n+1, (n+1)(n+2)/2], which of course is not sufficient to fix the Hessian completely if chosen below the upper bound. The computation of H+, the next value of the Hessian approximation, is then accomplished using the principle of minimal change (measured in the Frobenius norm) relative to the previous value H, i.e. one solves
    ||H+ - H||_F = min   w.r.t. H+
    subject to a symmetry condition and the interpolation conditions of the model (i.e. linear equality constraints). This results in a linear system of equations which is solved using an explicit inverse that is updated from step to step. The result is a Hessian approximation similar to the so-called "PSB update", a special quasi-Newton method from the field of differentiable minimization.
  • Termination of the computation occurs if the trust region radius becomes smaller than the input parameter rhomin, which should be a measure of the precision in x desired by the user; if the maximum allowed number of function evaluations (set by the user) has been exceeded before reaching this; or if, exceptionally, the trust region computation did not produce a decrease in the quadratic model, which can occur in the presence of excessive roundoff due to ill-conditioning.
  • A detailed description of the method appeared in
    M.J.D. Powell: "The NEWUOA software for unconstrained optimization without derivatives", in "Large-Scale Nonlinear Optimization" eds. G. Di Pillo and M. Roma, Springer 2006, pages 255-297.
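The trust-region acceptance logic described above can be sketched in a few lines. This is an illustration only, not Powell's implementation: the subproblem is solved here by the simple Cauchy (steepest-descent) step rather than NEWUOA's far more careful procedure, and the threshold eta and the factors 2 and 0.5 are illustrative choices.

```python
import numpy as np

def trust_region_step(f, x0, f0, g, H, delta, eta=0.1):
    """One step of a generic trust-region scheme (sketch only).

    The model minimizer inside the ball ||s|| <= delta is approximated
    by the Cauchy step along -g; assumes g is nonzero."""
    gn = np.linalg.norm(g)
    gHg = g @ H @ g
    # Step length: exact line minimum of the model along -g, capped at the ball
    alpha = delta / gn if gHg <= 0 else min(gn**2 / gHg, delta / gn)
    s = -alpha * g
    pred = -(g @ s + 0.5 * s @ H @ s)       # decrease predicted by the model
    f_new = f(x0 + s)
    rho = (f0 - f_new) / pred if pred > 0 else -1.0
    if rho >= eta:                          # sufficient actual decrease: accept
        x0, f0 = x0 + s, f_new
        if rho > 0.7:                       # very good agreement: widen the region
            delta *= 2.0
    else:                                   # poor agreement: shrink the region
        delta *= 0.5
    return x0, f0, delta
```

On a purely quadratic function the model is exact, so the step is accepted and the radius grows.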
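The least-change principle behind the Hessian update can be illustrated for the simplest case of a single secant condition H+ s = y: the symmetric minimal-change solution is then exactly the classical PSB update mentioned above. NEWUOA itself imposes the interpolation conditions of the model instead and works with an updated explicit inverse rather than forming H+ directly, so this is a simplified analogue, not Powell's formula.

```python
import numpy as np

def psb_update(H, s, y):
    """Powell symmetric Broyden (PSB) update.

    Among all symmetric matrices H_plus with H_plus @ s == y, returns
    the one closest to H in the Frobenius norm."""
    r = y - H @ s                 # residual of the secant condition
    ss = s @ s
    return (H
            + (np.outer(r, s) + np.outer(s, r)) / ss
            - (r @ s) * np.outer(s, s) / ss**2)
```

One can verify directly that the result is symmetric and satisfies the secant condition exactly.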
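The three termination tests listed above can be summarized schematically as follows; the function and variable names are illustrative, not NEWUOA's internal ones.

```python
def termination_reason(delta, rhomin, nfev, maxfun, model_decrease):
    """Return a stopping reason, or None to keep iterating (sketch)."""
    if delta < rhomin:
        return "trust region radius below rhomin (desired precision reached)"
    if nfev >= maxfun:
        return "maximum allowed number of function evaluations exceeded"
    if model_decrease <= 0.0:
        return "no decrease in the quadratic model (likely severe roundoff)"
    return None
```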
 

Input

 
  • You have the choice between 14 predefined functions, for which you can read a short description of the examples here. You may also define a function f of your own, depending on n variables x(1),...,x(n). You also have the choice to specify the initial guess x0.
  • You may use the default values for some algorithmic parameters or specify your own choices. These parameters are:
    rhobeg, the first value of the trust region radius (not too small!); Powell recommends about 10 percent of the maximum expected change in a component of x.
    rhomin, the smallest trust region radius allowed; it stands for the desired precision in the solution x.
    maxfun, the maximum allowed number of function evaluations; recommendation: at least 30*n*npkt.
    npkt, the number of new evaluation points to be chosen after each successful trust region step.
    iprint, the amount of intermediate output.
  • For the predefined cases you have the possibility to replace the standard initial point by another one. The required dimension can be found on the input form.
  • You have the possibility to request a contour plot of f around the best point found, with two of the n variables varying and the other components of x fixed at their current values. You can specify the size of the region. Be careful: often these functions grow rapidly, so that on a large region you may "see" nothing useful.
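NEWUOA itself is not contained in SciPy, but the role of parameters such as rhomin and maxfun can be illustrated with SciPy's (different, also derivative-free) Powell direction-set method, whose options xtol and maxfev play roughly analogous roles. The Rosenbrock test function below is a standard example, not necessarily one of the 14 predefined functions of this page.

```python
import numpy as np
from scipy.optimize import minimize

def rosenbrock(x):
    """Classical Rosenbrock function; minimum f = 0 at x = (1, 1)."""
    return (1.0 - x[0])**2 + 100.0 * (x[1] - x[0]**2)**2

x0 = np.array([-1.2, 1.0])                  # standard starting point
res = minimize(rosenbrock, x0, method='Powell',
               options={'xtol': 1e-10,      # rough analogue of rhomin
                        'maxfev': 5000})    # analogue of maxfun
print(res.x, res.fun, res.nfev)
```

The result object reports the best point, the best function value, and the number of function evaluations, mirroring the output items listed below.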
 

Output

 
  • The protocol of the intermediate results, if required.
  • The total cpu time. Zero means "less than 0.01 seconds".
  • The termination reason.
  • The best values f(xopt) and x(1)opt,...,x(n)opt.
  • The number of function calls.
  • Four plots: f - fopt and the Euclidean length of g (the approximate gradient in the quadratic model), both on a semi-logarithmic scale, as well as the current number of function evaluations and the actual trust region radius, all plotted against the step number.
  • The contour plot, if required.
 


11.12.2013