Active set methods / BFGS-GRG method

- Here you find a code for solving a constrained optimization problem of the form
f(x) = min,   h(x) = 0,   g(x) >= 0
using the active set method.
For basic facts on optimization problems of this kind look
here.
- ''active set method'' means a method where the current approximations
to the solution always satisfy the constraints and typically move along
boundary manifolds of the feasible set, i.e. some of the inequality constraints
are at their lower bound and the equality constraints are, of course, always
satisfied.
- In the following the dimensions are n, p and m for x, h and g,
respectively, and J is a subset of {1,...,m} such that
gi(x) = 0 for i in J.
We use the short notation
N = ( ∇h , ∇gJ )(x)
for the matrix whose columns are built from the gradients of all
constraints satisfied with equality. Of course J
also depends on x, but we omit this in order to simplify the
notation.
The code assumes that N always has full column rank
and gives up if this is not the case.
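For illustration, assembling N and a rough rank test via a QR decomposition could be sketched as follows (Python/NumPy; the names grad_h, grad_g and the tolerance are illustrative and not part of the code described here):

    import numpy as np

    def active_constraint_matrix(grad_h, grad_g, x, J, tol=1e-12):
        """Build N = (grad h, grad g_J)(x) column by column and check full column rank.
        grad_h(x) has shape (n, p), grad_g(x) has shape (n, m);
        J lists the indices of the inequality constraints active at x."""
        N = np.hstack([grad_h(x), grad_g(x)[:, J]])
        R = np.linalg.qr(N, mode='r')          # rough rank test via the diagonal of R
        r = np.abs(np.diag(R))
        if r.min() <= tol * r.max():
            raise RuntimeError("rankdeficient matrix N occurs")
        return N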
- The idea of the active set method is to reduce the general problem
to one which is only equality constrained (locally, with changes
of the active set allowed of course). The equality constrained
problem is then solved by a Newton type method until a local solution
is obtained or a change of the active set is required.
- This Newton type method consists of two parts: in the first step
a correction direction d is computed using a modified
Kuhn-Tucker system which reads
H d + N u = ∇f(x)
Nᵀ d = 0 ,
where the right hand side ''0'' of the second equation comes
from the fact that the active constraints have value 0.
The vector u consists of Lagrange multiplier estimates
for the equality constraints and the active inequality constraints,
hence u = [ μ , λJ ] .
Clearly, if d = 0 and λJ >= 0,
then a local solution is obtained (since we restrict ourselves to
finding first order necessary points for optimality, the Kuhn-Tucker points).
The special form of H used here makes it possible to decompose the solution
of this linear system into three resp. four smaller systems, such that for
example u can be computed independently of d, see below.
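To illustrate this decomposition, here is a minimal null-space sketch (Python/NumPy) of solving the modified Kuhn-Tucker system via a QR decomposition of N; it shows the idea only, not the exact sequence of smaller systems used by the code:

    import numpy as np

    def kkt_step(H, N, grad_f):
        """Null-space sketch for  H d + N u = grad_f,  N^T d = 0.
        Q from a QR decomposition of N splits R^n into the range of N
        (columns Y) and its orthogonal complement (columns Z)."""
        n, k = N.shape
        Q, R = np.linalg.qr(N, mode='complete')   # N = Q[:, :k] @ R[:k, :]
        Y, Z = Q[:, :k], Q[:, k:]
        # tangential step: (Z^T H Z) dz = Z^T grad_f,  d = Z dz  (so N^T d = 0)
        dz = np.linalg.solve(Z.T @ H @ Z, Z.T @ grad_f)
        d = Z @ dz
        # multipliers from the range-space equations: R u = Y^T (grad_f - H d);
        # with the special block form of H described above, Y^T H d vanishes,
        # so u is in fact independent of d
        u = np.linalg.solve(R[:k, :], Y.T @ (grad_f - H @ d))
        return d, u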
- In the second step a new value for x is determined.
d is tangential to the current boundary manifold. If there are only
linear constraints, we search the next x+ as
x+ = x - σ d ,
moving along this ray until either a local minimizer of f along this
ray is found or a new constraint becomes active. In both cases we stop the search
at the corresponding point. If f is linear, then the second case always
applies, since otherwise the objective function would not be bounded from below on the
feasible set. If f is quadratic and convex, then the first case is possible.
If f is ''general'', then the exact unidimensional minimization is not
practicable and we use a backtracking scheme in order to obtain ''sufficient descent'', i.e.
f(x) - f(x+) >= δ σ dᵀ∇f(x) .
The stepsize σ is chosen from the sequence {1, 1/2, 1/4, ...}. δ
is an algorithmic parameter to be chosen, satisfying 0 < δ < 1/2;
otherwise the ''optimal'' step could be prohibited. We use δ = 0.01 here.
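A minimal sketch of this backtracking test (Python/NumPy; hitting a new constraint is not handled here, and sigma_min is an illustrative safeguard):

    import numpy as np

    def backtrack(f, x, d, grad_fx, delta=0.01, sigma_min=1e-10):
        """Halve sigma in {1, 1/2, 1/4, ...} until the sufficient-descent test
        f(x) - f(x - sigma*d) >= delta * sigma * d^T grad_f(x) holds."""
        fx = f(x)
        slope = d @ grad_fx        # positive if x - sigma*d is a descent direction
        sigma = 1.0
        while sigma >= sigma_min:
            if fx - f(x - sigma * d) >= delta * sigma * slope:
                return x - sigma * d, sigma
            sigma *= 0.5
        raise RuntimeError("no stepsize found for sufficient decrease")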
- If there are nonlinear constraints, then
x - σ d will not necessarily be feasible, and we add a further correction to it:
the linear ray is now replaced by a pointwise computed nonlinear one of the form
x+(σ) = x - σ d + N(x) c(σ) ,
where c(σ) is computed such that
( h , gJ )(x+(σ)) = 0 .
This results in a nonlinear system of equations for c(σ), which itself is solved by iteration.
This is known as ''restoration'' in the optimization literature.
We try to solve this to full precision, and if this is not reached after 10 (another
algorithmic parameter, fixed here) iterations, we reduce σ and try again.
From Taylor's theorem it follows that this finally succeeds.
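A sketch of such a restoration by simplified Newton iteration (Python/NumPy; the fixed matrix NᵀN serves as Jacobian approximation, and the name c_active and the tolerance are illustrative):

    import numpy as np

    def restore(c_active, x, d, sigma, N, max_steps=10, tol=1e-12):
        """Find c with  c_active(x - sigma*d + N @ c) = 0,  where c_active
        returns the values of (h, g_J) at a point.  The Jacobian of the
        system with respect to c is approximated by the fixed matrix N^T N."""
        NtN = N.T @ N
        c = np.zeros(N.shape[1])
        for steps in range(max_steps):
            x_plus = x - sigma * d + N @ c
            r = c_active(x_plus)
            if np.linalg.norm(r) <= tol:
                return x_plus, steps         # restoration succeeded
            c -= np.linalg.solve(NtN, r)     # simplified Newton correction
        return None, max_steps               # failure: the caller reduces sigma and retries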
- The matrix H above plays a special role here: if all functions are linear,
then we use the identity matrix. Because of the stepsize technique this results in a
variant of the Simplex algorithm for the LP problem.
If f is quadratic and strictly convex, then we take its exact Hessian as H,
and otherwise the exact Hessian plus a suitable multiple of the identity matrix to
make it positive definite. In all other cases we use the BFGS approximation to the
projected Hessian of the Lagrangian function of the problem, with possible reinitialization
by a multiple of the (scaled) identity matrix in case of too much ill-conditioning, and
QHQᵀ is a block matrix of the form
[ O O O ]
[ O I O ]
[ O O B ]
where B is the projected BFGS part. The I-part is present in case of an
inactivation in the current step; the identity then has the dimension of the number of
inactivated constraints. The matrix Q comes from the QR decomposition of N:
QN is upper triangular. In order to be able to do this efficiently, a full
BFGS update is also maintained and used as long as the active set changes or d is not small.
Thereby we don't lose second order information accumulated during previous steps.
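The full BFGS update mentioned here follows the standard formula; a minimal sketch (Python/NumPy; the curvature safeguard shown is a common illustrative choice, not necessarily the one used in this code):

    import numpy as np

    def bfgs_update(B, s, y):
        """Standard BFGS update of the Hessian approximation B with the step
        s = x(+) - x and the difference y of the Lagrangian gradients."""
        Bs = B @ s
        sBs = s @ Bs
        ys = y @ s
        if ys <= 1e-8 * sBs:                 # too little curvature: skip the update
            return B
        return B - np.outer(Bs, Bs) / sBs + np.outer(y, y) / ys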
- Typically, far from the solution some of the computed inequality multipliers λJ
have the wrong sign. In this case a change of the active set
is indicated: we delete several of the inequality constraints from the active set
(those with the algebraically smallest multipliers, determined by another algorithmic
parameter, see c2u below)
and compute d for a larger constraint manifold. This is possible due to the special choice
we make for H. It increases ||d|| and hence the possible decrease of f in this
direction.
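As a sketch, this multiple inactivation rule (made precise by the parameter c2u described in the input sections below) can be written as:

    import numpy as np

    def constraints_to_inactivate(J, lam_J, c2u=0.5):
        """Return the active inequalities to be dropped: all i in J with
        lambda_i <= c2u * lambda_min, provided lambda_min < 0.
        (c2u = 0.5 is an illustrative value, not the code's default.)"""
        lam_J = np.asarray(lam_J, dtype=float)
        if lam_J.size == 0 or lam_J.min() >= 0.0:
            return []                        # all multipliers nonnegative: keep the active set
        lam_min = lam_J.min()
        return [i for i, lam in zip(J, lam_J) if lam <= c2u * lam_min]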
- One can show that this results in a finite process for LP and strictly convex QP and in
superlinear convergence in the general case (provided the local solution satisfies
the strong second order sufficiency conditions).
- We allow an infeasible initial point. In this case the method begins by minimizing the
L1-norm of (h, g⁻), i.e. we solve
∑i=1,...,p |hi(x)| - ∑i=1,...,m min{0, gi(x)} = 0
by a variant of Newton's method.
If this fails, the code ''gives up''.
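For illustration, this infeasibility measure is simply (a sketch; h and g return the arrays of constraint values):

    import numpy as np

    def l1_infeasibility(h, g, x):
        """L1 norm of (h, g⁻):  sum_i |h_i(x)| - sum_i min{0, g_i(x)}.
        Phase 1 drives this value to zero to obtain a feasible starting point."""
        hx = np.asarray(h(x), dtype=float)
        gx = np.asarray(g(x), dtype=float)
        return np.sum(np.abs(hx)) - np.sum(np.minimum(0.0, gx))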
- Besides the successful termination, indicated as
''kuhn-tucker conditions satisfied'',
there are several other termination reasons:
''lp not bounded from below'',
''restoration seems to diverge'' (possibly due to ill-conditioning of N),
''no feasible end point found in unidimensional search'' (extremely narrow feasible set),
''restoration converges too slowly'' (possibly due to ill-conditioning of N),
''restoration does not converge for sig=sig_min=10^-10'' (maybe a nonsmooth problem?),
''no stepsize found for sufficient decrease'' (direction d spoiled by roundoff?),
''qp-problem: regularization becomes too large'' (regularization would be larger than 0.1 × ||H||),
''rankdeficient matrix N occurs'',
''more than maxit steps'',
''d too small'' (no significant change in x can be obtained),
''directional derivative in the roundoff level'' (of f: no further significant reduction of f
can be obtained),
''slow convergence with bad conditioning'' (f changes on the order of the square root
of the machine precision, and simultaneously the condition number of the projected Hessian is larger than 1000,
but we are almost primal and dual feasible).
- You may use predefined testcases for your experiments from
here or define your
own problem.

Input: predefined testcases

- Select a problem number from
testcases.
The problem number always looks like hs followed by a number,
sometimes followed by scal (=scaled). You must
write the problem number exactly as it appears there. The scaled versions allow you
to study the effects of bad or good scaling of a problem, which in real
life applications is of utmost importance.
- In some of these testcases the standard initial point causes trouble
in phase 1 (finding an initial feasible point).
Here you find a
list of feasible initial points for the
nonlinear testcases.
- You may choose the amount of output: final results only or a
short report of intermediate steps.
- You may use the predefined initial point, but you can also provide your own
choice. It may be interesting to play with different initial points.
- There is a default initialization of some algorithmic parameters, but you may
make your own choice here as well:
- itermax: the maximum number of steps allowed. Internally there is
the bound itermax <= 1000 .
- epsx: The parameter describing ''sufficient precision'': we terminate if
||projected gradient|| <= epsx*(1+epsx*|f|) or
||x-xprevious|| <= epsx*(1+||x||).
Important: epsx > 0 !
- delmin: A bound for the allowed error in the constraints
(since we allow nonlinear constraints you cannot hope for exact zeros here).
- rho, rho1:
1/rho is the bound for the allowed condition number of N.
If this bound is violated, we ''give up''. We measure it quite roughly from the
diagonal of the R-part in the QR decomposition of N.
1/rho1 is a bound for the condition number of the BFGS approximation.
We measure it using only the diagonal part of the Bunch-Parlett decomposition of B.
If this bound is violated, a reinitialization is done with a multiple of the
unit matrix.
- c2u: we use multiple inactivation (the block structure of QHQᵀ allows this):
if
λmin = min{ λi : i in J } < 0 ,
then all constraints with
λi <= c2u × λmin
will be inactivated. They become strictly positive subsequently.
- You can require a contour plot of the exact
l1-penalty function of the problem (with two components of x varying and the other
components and the multiplier estimates u fixed):
f(x) + ∑i=1,...,p (|ui|+1)·|hi(x)| + ∑i=1,...,m (ui+1)·max{0, -gi(x)} .
In this case you specify the components and the width of the rectangle around the current point.
You can also require that this is cut with respect to possibly existing bound constraints.
A minimal sketch of this penalty evaluation is given directly below.
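A sketch of the penalty value that is plotted (Python/NumPy; the argument names are illustrative, with u_h and u_g the fixed multiplier estimates for the equalities and inequalities):

    import numpy as np

    def l1_penalty(f, h, g, u_h, u_g, x):
        """Exact l1 penalty used for the contour plot:
        f(x) + sum_i (|u_h[i]|+1)*|h_i(x)| + sum_i (u_g[i]+1)*max{0, -g_i(x)}."""
        hx = np.asarray(h(x), dtype=float)
        gx = np.asarray(g(x), dtype=float)
        return (f(x)
                + np.sum((np.abs(np.asarray(u_h)) + 1.0) * np.abs(hx))
                + np.sum((np.asarray(u_g) + 1.0) * np.maximum(0.0, -gx)))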

Input: user defined testcase

- You first specify the dimensions of your problem:
n, the number of variables,
p=nh, the number of equality constraints (can be zero), and m=ng,
the number of inequality constraints, which can also be zero (but nh+ng > 0 !).
- Then you specify the objective function fx and the functions describing the
equality constraints hxi and the inequality constraints gxi as described
here.
- You specify a name for your testcase. This is empty or alphanumeric with no blanks.
- You specify your initial guess xstart.
- You declare the properties of your problem: LP, QP or general?
- You can choose the amount of output: final results only or a short
protocol of all intermediate results.
- There is a default initialization of some algorithmic parameters, but you may
make your own choice here as well:
- itermax: the maximum number of steps allowed. Internally there is
the bound itermax <= 1000
- epsx: The parameter describing ''sufficient precision'': we terminate if
||projected gradient|| <= epsx*(1+epsx*|f|) or
||x-xprevious|| <= epsx*(1+||x||).
Important: epsx > 0 !
- delmin: A bound for the allowed error in the constraints
(since we allow nonlinear constraints you cannot hope for exact zeros here).
- rho, rho1:
1/rho is the bound for the allowed condition number of N.
If this bound is violated, we ''give up''. We measure it quite roughly from the
diagonal of R in the QR decomposition of N.
1/rho1 is a bound for the condition number of the BFGS approximation.
We measure it using only the diagonal part of the Bunch-Parlett decomposition of B.
If this bound is violated, a reinitialization is done with a multiple of the
unit matrix.
- c2u: we use multiple inactivation (the block structure of QHQᵀ allows this):
if
λmin = min{ λi : i in J } < 0 ,
then all constraints with
λi <= c2u × λmin
will be inactivated in the next step. They become strictly positive subsequently.
- You can require a contour plot of the exact
l1-penalty function of the problem (with two components of x varying and the other
components and the multiplier estimates u fixed):
f(x) + ∑i=1,...,p (|ui|+1)·|hi(x)| + ∑i=1,...,m (ui+1)·max{0, -gi(x)} .
In this case you specify the components and the width of the rectangle around the current point.
You can also require that this is cut with respect to possibly existing bound constraints.

Output

- You obtain the termination reason, the best solution computed so far,
and statistics about the amount of work done.
cputime=0 means ''less than 0.01 seconds''.
- If you required a protocol of the iteration, then you get a list as follows:
f(x) = objective function value, projgrad = norm of the projected gradient (on the current manifold),
infeas = L1-norm of (h, g⁻), uminsc = the smallest negative multiplier divided by the
norm of the corresponding gradient,
dirder = dᵀ∇f(x), sigma = current σ, sigma* = T if a
constraint has been hit and included in the active set, = F otherwise,
#restor = the number of simplified Newton steps required for restoration, including the failed ones.
For fixed σ we try at most 10 such steps.
phase=1 : normal minimization, phase=-1 : initial search for a feasible point.
- a plot of f - fmin
- and a semilogarithmic plot of the norm of the projected gradient and the dual infeasibility.
- The contour plot, if required.