Active set methods / BFGS-GRG method

 

Directly to the input form: predefined testcases

 

Directly to the input form: user defined testcase

 
  • Here you find a code for solving a constrained optimization problem of the form f(x) = min, h(x) = 0, g(x) >= 0 using the active set method. For basic facts on optimization problems of this kind,
    look here.
  • ''Active set method'' means a method where the current approximations to the solution always satisfy the constraints and typically move along boundary manifolds of the feasible set, i.e. some of the inequality constraints are at their lower bound (gi(x) = 0) and the equality constraints are, of course, always satisfied.
  • In the following the dimensions are n, p and m for x, h, g respectively, and J is a subset of {1,...,m} such that
    gi(x) = 0 for i in J.
    We use the short notation
    N = ( ∇h , ∇gJ )(x)
    for the matrix whose columns are built from the gradients of all constraints satisfied with equality. Of course J also depends on x, but we omit this in order to simplify the notation.
    The code assumes that N always has full column rank and gives up if this is not the case.
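    For illustration only (this is not part of the solver itself), a minimal numpy sketch of how N could be assembled from the constraint gradients and its column rank checked; grad_h_list, grad_g_list and the 0-based active index set J are hypothetical names:

        import numpy as np

        def build_N(x, grad_h_list, grad_g_list, J):
            """Assemble N = (∇h, ∇g_J)(x): one column per equality-constraint
            gradient and one per active inequality-constraint gradient."""
            cols = [gh(x) for gh in grad_h_list] + [grad_g_list[i](x) for i in J]
            N = np.column_stack(cols)
            if np.linalg.matrix_rank(N) < N.shape[1]:   # full column rank required
                raise RuntimeError("rankdeficient matrix N occurs")
            return N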
  • The idea of the active set method is to reduce the general problem to one which is only equality constrained (locally, with changes of the active set allowed of course). The equality constrained problem is then solved by a Newton type method until a local solution is obtained or a change of the active set is required.
  • This Newton type method consists of two parts: in the first step a correction direction d is computed using a modified Kuhn-Tucker system which reads
    H d + N u = ∇f(x)
    Nᵀ d = 0 ,

    where the right hand side ''0'' of the second equation comes from the fact that the active constraints have value 0.
    The vector u consists of Lagrange multiplier estimates for the equality constraints and the active inequality constraints, hence u = [ μ , λJ ] . Clearly, if d = 0 and λJ >= 0 then a local solution is obtained (since we restrict ourselves to finding first order necessary points for optimality, the Kuhn-Tucker points). The special form of H which is used here allows us to decompose the solution of this linear system into three or four smaller systems, such that for example u can be computed independently of d, see below.
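    A dense reference solve of this system (a sketch only; the actual code uses the decomposition into smaller systems mentioned above) could look as follows, assuming H, N and ∇f(x) are given as numpy arrays:

        import numpy as np

        def kkt_step(H, N, grad_f):
            """Solve  H d + N u = ∇f(x),  Nᵀ d = 0  as one block system."""
            n, k = N.shape
            K = np.block([[H, N], [N.T, np.zeros((k, k))]])
            rhs = np.concatenate([grad_f, np.zeros(k)])
            sol = np.linalg.solve(K, rhs)
            return sol[:n], sol[n:]        # d and u = [μ, λ_J]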
  • In the second step a new value for x is determined. d is tangential to the current boundary manifold. If there are only linear constraints, we search for the next point x+ as
    x+ = x - σ d ,
    moving along this ray until either a local minimizer of f along this ray is found or a new constraint becomes active. In both cases we stop the search at the corresponding point. If f is linear, then always the second case applies, since otherwise the objective function is not bounded from below on the feasible set. If f is quadratic and convex, then the first case is possible. If f is ''general'', then the exact unidimensional minimization is not practicable and we use a backtracking scheme in order to obtain ''sufficient descent'', i.e.
    f(x) - f(x+) >= δ σ dᵀ∇f(x) .
    The stepsize σ is chosen from the sequence {1, 1/2, 1/4, ...}. δ is an algorithmic parameter to be chosen, satisfying 0 < δ < 1/2 (otherwise the ''optimal'' step could be prohibited). We use δ = 0.01 here.
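    A minimal sketch of this backtracking rule (assuming f is a callable, grad_f_x = ∇f(x), and the ray is x - σ d as above; the lower bound sigma_min is only an illustrative cutoff):

        import numpy as np

        def backtrack(f, x, d, grad_f_x, delta=0.01, sigma_min=1e-10):
            """Halve σ over {1, 1/2, 1/4, ...} until the sufficient-descent test
            f(x) - f(x - σ d) >= δ σ dᵀ∇f(x) is satisfied."""
            fx, dirder = f(x), d @ grad_f_x     # dᵀ∇f(x) > 0 for a descent ray x - σ d
            sigma = 1.0
            while sigma >= sigma_min:
                if fx - f(x - sigma * d) >= delta * sigma * dirder:
                    return sigma
                sigma *= 0.5
            raise RuntimeError("no stepsize found for sufficient decrease")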
  • If there are nonlinear constraints, then
    x - σ d will not necessarily be feasible and we add a further correction to it:
    the linear ray is now replaced by a pointwise computed nonlinear one, which looks like
    x+(σ) = x - σ d + N(x) c(σ) ,
    where c(σ) is computed such that
    ( h, gJ )(x+(σ)) = 0 .
    This results in a nonlinear system of equations for c(σ) which itself is solved by iteration. This is known as a ''restoration'' in the optimization literature. We try to solve this to full precision, and if this is not reached after 10 iterations (another algorithmic parameter, fixed here), we reduce σ and try again. From Taylor's theorem it follows that this will finally succeed.
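    A sketch of such a restoration by a simplified Newton iteration (assuming cons_act(x) returns the vector (h, gJ)(x) and N = N(x) has full column rank; the iteration limit 10 matches the text, the tolerance is illustrative):

        import numpy as np

        def restore(x, sigma, d, N, cons_act, max_restor=10, tol=1e-12):
            """Compute x+(σ) = x - σ d + N c(σ) with (h, g_J)(x+(σ)) = 0 by a
            simplified Newton iteration for c with the fixed 'Jacobian' NᵀN."""
            L = np.linalg.cholesky(N.T @ N)       # NᵀN is s.p.d. for full-rank N
            c = np.zeros(N.shape[1])
            for _ in range(max_restor):
                xp = x - sigma * d + N @ c
                r = cons_act(xp)
                if np.linalg.norm(r) <= tol:
                    return xp                     # restoration succeeded
                # simplified Newton step: NᵀN Δc = -r, solved with the Cholesky factor
                c -= np.linalg.solve(L.T, np.linalg.solve(L, r))
            return None                           # caller reduces σ and tries again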
  • The matrix H above plays a special role here: if all functions are linear, then we use the identity matrix. Because of the stepsize technique this results in a variant of the Simplex algorithm for the LP problem. If f is quadratic and strictly convex, then we take its exact Hessian as H, and otherwise the exact Hessian plus a suitable multiple of the identity matrix to make it positive definite. In all other cases we use the BFGS approximation to the projected Hessian of the Lagrangian function of the problem, with possible reinitialization by a multiple of the (scaled) identity matrix in case of too much ill-conditioning. Here QHQᵀ is a block matrix of the form
    
            [ O  O   O ]
            [ O  I   O ]
            [ O  O   B ] 
         

    where B is the projected BFGS part. The I-part is present in case of an inactivation in the current step; the identity then has the dimension of the number of inactivated constraints. The Q-matrix used comes from the QR decomposition of N: QN is upper triangular. In order to be able to do this efficiently, a full BFGS update is also maintained and used as long as the active set changes or d is not small. Thereby we don't lose second order information accumulated during previous steps.
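    As a generic illustration of the BFGS part (not this solver's exact update, which works on the projected quantities and includes the reinitialization logic described above), a standard BFGS update with a curvature safeguard might look like:

        import numpy as np

        def bfgs_update(B, s, y, eps=1e-8):
            """Standard BFGS update of a Hessian approximation B from a step s
            and gradient difference y; skipped if the curvature condition fails."""
            sy = s @ y
            if sy <= eps * np.linalg.norm(s) * np.linalg.norm(y):
                return B                          # skip: no usable curvature information
            Bs = B @ s
            return B - np.outer(Bs, Bs) / (s @ Bs) + np.outer(y, y) / sy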
  • Typically, far from the solution some of the computed inequality multipliers λJ have a wrong sign. In this case a change of the active set is indicated: we delete several of the inequality constraints from the active set (those with the algebraically smallest multipliers, determined by another algorithmic parameter) and compute d for a larger constraint manifold. This is possible due to the special choice we make for H. This increases ||d|| and hence the possible decrease of f in this direction.
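    A sketch of this multiple inactivation rule (the threshold factor c2u is the algorithmic parameter described in the input section below; the default 0.1 is only illustrative):

        def inactivate(J, lam_J, c2u=0.1):
            """Drop every active inequality constraint whose multiplier satisfies
            λ_i <= c2u * λ_min, where λ_min = min λ_i < 0."""
            lam_min = min(lam_J)
            if lam_min >= 0:
                return list(J)                    # all multipliers nonnegative: nothing to do
            return [j for j, lam in zip(J, lam_J) if lam > c2u * lam_min]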
  • One can show that this results in a finite process for LP and strictly convex QP and in superlinear convergence in the general case (provided the local solution satisfies the strong second order sufficiency conditions).
  • We allow an infeasible initial point. In this case the method begins by minimizing the L1-norm of (h, g(-)), i.e. we solve
    ∑i=1,...,p |hi(x)| − ∑i=1,...,m min{0, gi(x)} = 0
    by a variant of Newton's method. If this fails, the code ''gives up''.
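    A sketch of this phase-1 merit function (assuming h(x) and g(x) return numpy arrays of the constraint values; it is zero exactly at feasible points):

        import numpy as np

        def l1_infeasibility(x, h, g):
            """L1-norm of (h, g(-)): ∑|h_i(x)| − ∑ min{0, g_i(x)}."""
            hx, gx = np.asarray(h(x)), np.asarray(g(x))
            return np.abs(hx).sum() - np.minimum(0.0, gx).sum()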
  • Besides the successful termination, indicated as
    ''kuhn-tucker conditions satisfied''
    there are a lot more termination reasons:
    ''lp not bounded from below'',
    ''restoration seems to diverge'' (possibly due to ill-conditioning of N),
    ''no feasible end point found in unidimensional search'' (extremely narrow feasible set),
    ''restoration converges too slowly'' (possibly due to ill-conditioning of N),
    ''restoration does not converge for sig = sig_min = 10⁻¹⁰'' (maybe a nonsmooth problem?),
    ''no stepsize found for sufficient decrease'' (direction d spoiled by roundoff?),
    ''qp-problem: regularization becomes too large'' (regularization would be larger than 0.1 × ||H||),
    ''rankdeficient matrix N occurs'',
    ''more than maxit steps'',
    ''d too small'' (no significant change in x can be obtained),
    ''directional derivative in the roundoff level'' (of f: no further significant reduction of f can be obtained),
    ''slow convergence with bad conditioning'' (f changes in the order of the square root of the machine precision and simultaneously the conditioning of the projected Hessian is larger than 1000, but we are almost primal and dual feasible).
  • You may use predefined testcases for your experiments from here or define your own problem.
 

There are two different input pages: predefined and user defined.

Input: predefined testcases

 
  • Select a problem number from the testcases. The problem number always looks like hs followed by a number, sometimes followed by scal (= scaled). You must write the problem number exactly as seen there. The scaled versions allow you to study the effects of bad or good scaling of a problem, which in real life applications is of utmost importance.
  • In some of these testcases the standard initial point makes trouble in phase 1: finding an initial feasible point. Here you find a list of feasible initial points: list of feasible initial points for the nonlinear testcases.
  • You may choose the amount of output: final results only or a short report of intermediate steps.
  • You may use the predefined initial point, but can also provide your own choice for this. It may be interesting to play with different initial points.
  • There is a default initialization of some algorithmic parameters, but you may also make your own choice here:
    1. itermax: the maximum number of steps allowed. Internally there is the bound itermax <= 1000.
    2. epsx: The parameter describing ''sufficient precision'': we terminate if
      ||projected gradient|| <= epsx*(1+epsx*|f|) or ||x-xprevious|| <= epsx*(1+||x||).
      Important: epsx > 0 !
    3. delmin: a bound for the allowed error in the constraints (since we allow nonlinear constraints, you cannot hope for exact zeros here).
    4. rho, rho1: 1/rho is the bound for the allowed condition number of N. If this is violated, we ''give up''. We measure this quite roughly from the diagonal of the R-part in the QR decomposition of N. 1/rho1 is a bound for the condition number of the BFGS approximation. We measure it using only the diagonal part of the Bunch-Parlett decomposition of B. If this bound is violated, a reinitialization is done with a multiple of the unit matrix.
    5. c2u: we use multiple inactivation (the block structure of QᵀHQ allows this): if
      λmin = min{ λi : i in J } < 0
      then all constraints with
      λi <= c2u × λmin
      will be inactivated. They become strictly positive subsequently.
  • You can require a contourplot of the exact l1 penalty function of the problem (with two components of x varying and the other components and the multiplier estimates u fixed):
    f(x) + ∑i=1,...,p (|ui|+1) |hi(x)| + ∑i=1,...,m (ui+1) max{0, −gi(x)}
    In this case you specify the components and the width of the rectangle around the current point. You can also require that this is cut with respect to possibly existing bound constraints.
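    For illustration, a sketch of evaluating this penalty function at a point (f, h, g callables and the fixed multiplier estimates u_h, u_g are assumed inputs):

        import numpy as np

        def l1_penalty(x, f, h, g, u_h, u_g):
            """Exact l1 penalty: f(x) + ∑(|u_i|+1)|h_i(x)| + ∑(u_i+1) max{0, -g_i(x)}."""
            hx, gx = np.asarray(h(x)), np.asarray(g(x))
            pen_h = ((np.abs(np.asarray(u_h)) + 1.0) * np.abs(hx)).sum()
            pen_g = ((np.asarray(u_g) + 1.0) * np.maximum(0.0, -gx)).sum()
            return f(x) + pen_h + pen_g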
 

Input: user defined testcase

 
  • You first specify the dimensions of your problem: n, the number of variables, p = nh, the number of equality constraints (can be zero), and m = ng, the number of inequality constraints (can also be zero, but nh + ng > 0 !).
  • Then you specify the objective function fx and the functions describing the equality constraints hxi and the inequality constraints gxi, as described here.
  • You specify a name for your testcase. This is empty or alphanumeric, with no blanks.
  • You specify your initial guess xstart.
  • You declare the properties of your problem: LP, QP or general?
  • You may choose the amount of output: final results only or a short protocol of all intermediate results.
  • There is a default initialization of some algorithmic parameters, but you may also make your own choice here:
    1. itermax: the maximum number of steps allowed. Internally there is the bound itermax <= 1000.
    2. epsx: The parameter describing ''sufficient precision'': we terminate if
      ||projected gradient|| <= epsx*(1+epsx*|f|) or ||x-xprevious|| <= epsx*(1+||x||).
      Important: epsx > 0 !
    3. delmin: a bound for the allowed error in the constraints (since we allow nonlinear constraints, you cannot hope for exact zeros here).
    4. rho, rho1: 1/rho is the bound for the allowed condition number of N. If this is violated, we ''give up''. We measure this quite roughly from the diagonal of R in the QR decomposition of N. 1/rho1 is a bound for the condition number of the BFGS approximation. We measure it using only the diagonal part of the Bunch-Parlett decomposition of B. If this bound is violated, a reinitialization is done with a multiple of the unit matrix.
    5. c2u: we use multiple inactivation (the block structure of QᵀHQ allows this): if
      λmin = min{ λi : i in J } < 0
      then all constraints with
      λi <= c2u × λmin
      will be inactivated in the next step. They become strictly positive subsequently.
  • You can require a contourplot of the exact l1 penalty function of the problem (with two components of x varying and the other components and the multiplier estimates u fixed):
    f(x) + ∑i=1,...,p (|ui|+1) |hi(x)| + ∑i=1,...,m (ui+1) max{0, −gi(x)}
    In this case you specify the components and the width of the rectangle around the current point. You can also require that this is cut with respect to possibly existing bound constraints.
 

Output

 
  • You obtain the termination reason, the best solution computed so far and statistics about the amount of work done. cputime=0 means ''less than 0.01 seconds''.
  • If you required a protocol of the iteration, then you get a list with the following columns:
    f(x) = objective function value,
    projgrad = norm of the projected gradient (on the current manifold),
    infeas = L1-norm of (h, g(-)),
    uminsc = the smallest negative multiplier divided by the norm of the corresponding gradient,
    dirder = dᵀ∇f(x),
    sigma = current σ,
    sigma* = T if a constraint has been hit and included in the active set, = F otherwise,
    #restor = the number of simplified Newton steps required for restoration, including the failed ones (for fixed σ we try at most 10 such steps),
    phase = 1: normal minimization, phase = -1: initial search for a feasible point.
  • a plot of f - fmin
  • and a semilogarithmic plot of the norm of the projected gradient and the dual infeasibility.
  • The contourplot, if required.
 

To the input form: predefined testcases

 

To the input form: user defined testcase

 
Back to the top!

07.02.2018