Parameter identification for ordinary differential equations
- In the following, y, f and G denote functions with n components, and y'
denotes componentwise differentiation with respect to t. Typically there is a
''free'' real variable t with the meaning ''time'' and a set of r parameters
p which determine f.
- In many applications one has a process whose time evolution is described by
a system of ordinary differential equations
y = y(t,p),
y' = f(t,y,p) on a time interval ta <= t <= tb,
with p unknown. An initial value y0 for y(ta,p) may be known or sought too,
and some or all final values of y may be prescribed.
Some (or possibly all) components of y are known only via a set of
measured data
(ti, yi,j), 1 <= i <= me, j in J,
with J a subset of {1,...,n}.
What is required is the value of p. This is the so-called parameter
identification problem. (The structure of f, in contrast, must of course be
known.)
For example, the predefined test case here is the scalar problem
y' = p1 y + p2, y(0) = y0,
with p1, p2 and y0 as the variables to be determined.
- In another possible application the initial values of y may be known and
final values prescribed, and the parameter p is sought which satisfies these
conditions. In that case no measurement values need be present. This type of
problem is known in optimization as the computation of a feasible point.
- This problem is cast into an optimization problem:

Minimize  ∑ over i=1,...,me and j in J of  (yi,j - yj(ti,p))^2

with respect to p and additional parameters (for example the values of y at
ta) which may make the solution of the ode unique. Additional prescriptions
on the solution or the parameters are added to this problem as side
constraints. If there are no measurement points, then the objective function
is set identically to zero and the optimizer deals only with the constraints
on y and p. A minimal sketch of this least-squares formulation is given
below.
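To make this concrete, here is a minimal single-shooting sketch of the
least-squares formulation for the predefined test case y' = p1 y + p2,
written with SciPy rather than the DASSL/DONLP2 pair this page actually
runs; all variable names, tolerances and data are illustrative assumptions.

import numpy as np
from scipy.integrate import solve_ivp
from scipy.optimize import least_squares

def f(t, y, p1, p2):
    # right-hand side of the test ode y' = p1*y + p2
    return p1 * y + p2

def residuals(x, t_data, y_data):
    # x = (p1, p2, y0); returns y_j(t_i, p) - y_{i,j} for all data points
    p1, p2, y0 = x
    sol = solve_ivp(f, (t_data[0], t_data[-1]), [y0], args=(p1, p2),
                    t_eval=t_data, rtol=1e-8, atol=1e-10)
    return sol.y[0] - y_data

# synthetic "measurements" from the true values p1=-2, p2=1, y0=4
t_data = np.linspace(0.0, 1.0, 10)
y_true = 3.5 * np.exp(-2.0 * t_data) + 0.5
y_data = y_true + 0.01 * np.random.default_rng(0).standard_normal(10)

fit = least_squares(residuals, x0=[-1.0, 0.5, 3.0], args=(t_data, y_data))
print(fit.x)   # estimates of (p1, p2, y0)

The multiple-shooting formulation described next replaces the single initial
value by values of y on a grid, coupled by continuity constraints.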
- Here we provide an even more general concept, known from the ''multiple
shooting'' solution of boundary value problems: the function y is considered
on a fixed predefined grid covering the measurement points,
Tk, k=1,...,m, with T1 = ta, Tm = tb,
and the values of y at T1 = ta, ..., Tm-1 are also optimization variables.
One can now impose conditions on y at these grid points Tk and also on some
values of y at the endpoint Tm. For example, one may prescribe a total of n
values of y at the first and the last grid point, which means parameter
identification for a boundary value problem. Each step of the optimization
process now requires the solution of the corresponding ordinary differential
equation, given the current value of p and the additional initial values.
Since we use an SQP based optimizer we also need the derivatives of y with
respect to the initial values and the parameters p, which we compute here
via the so-called variational equations:

(d y(t,p)/d p)'  = fy(t,y,p) (d y(t,p)/d p) + fp(t,y,p),   d y(Tk,p)/d p  = 0,
(d y(t,p)/d y0)' = fy(t,y,p) (d y(t,p)/d y0),              d y(Tk,p)/d y0 = I.

Don't panic: these variational equations are generated here automatically
for you; a hand-written sketch for the scalar test case follows below.
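For the scalar test case y' = p1 y + p2 one has fy = p1 and fp = (y, 1), so
the variational equations can be written out by hand. The following SciPy
sketch (illustrative names, not the automatically generated code) integrates
the ode together with its sensitivities on one subinterval:

import numpy as np
from scipy.integrate import solve_ivp

def augmented_rhs(t, z, p1, p2):
    # state z = (y, dy/dp1, dy/dp2, dy/dy0)
    y, s_p1, s_p2, s_y0 = z
    return [p1 * y + p2,        # the ode itself
            p1 * s_p1 + y,      # (dy/dp1)' = fy*(dy/dp1) + fp1
            p1 * s_p2 + 1.0,    # (dy/dp2)' = fy*(dy/dp2) + fp2
            p1 * s_y0]          # (dy/dy0)' = fy*(dy/dy0)

# initial conditions at Tk: y = y0[k], dy/dp = 0, dy/dy0 = I (here scalar 1)
z0 = [4.0, 0.0, 0.0, 1.0]
sol = solve_ivp(augmented_rhs, (0.0, 0.3), z0, args=(-2.0, 1.0),
                rtol=1e-8, atol=1e-10)
y_end, dy_dp1, dy_dp2, dy_dy0 = sol.y[:, -1]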
- This results in a set of m-1 initial value problems to be solved for every
optimization step (and hence for every evaluation of the sum of squares of
deviations and of the constraints). Each of these m-1 initial value problems
has dimension (n+1+r) × n (the ode itself plus the variational equations).
In order to distinguish these partial solutions we use the notation
y[k](t,p). The initial value problems hence read

y[k]' = f(t, y[k], p),  Tk <= t <= Tk+1,
y[k](Tk, p) = y0[k],    k = 1,...,m-1.

The optimization variables are p and y0[k], k=1,...,m-1; a sketch of these
subinterval integrations and the continuity conditions follows below.
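A sketch of these m-1 subinterval integrations and the resulting continuity
conditions, for the predefined grid 0, 0.3, 0.6, 1.0 (SciPy with assumed
names; the page itself hands such constraints to DONLP2):

import numpy as np
from scipy.integrate import solve_ivp

T = [0.0, 0.3, 0.6, 1.0]   # fixed grid T1,...,Tm

def shoot(x):
    # integrate the m-1 = 3 subinterval IVPs; return their endpoint values
    p1, p2 = x[0], x[1]
    y0 = x[2:]             # y0[k] at T1, T2, T3
    ends = []
    for k in range(3):
        sol = solve_ivp(lambda t, y: [p1 * y[0] + p2],
                        (T[k], T[k + 1]), [y0[k]], rtol=1e-8, atol=1e-10)
        ends.append(sol.y[0, -1])
    return np.array(ends)

def continuity(x):
    # equality constraints y[k](T(k+1)) - y0[k+1] = 0 for k = 1, 2
    return shoot(x)[:2] - x[3:]

x = np.array([-2.0, 1.0, 4.0, 2.4208, 1.5542])   # rough guesses
print(continuity(x))     # approximately zero at a continuous solution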
- Normally, continuity of the solution over the whole interval is required,
but you can fix values or prescribe jump conditions for every component of y
at the grid points. You should however be aware that the multipoint boundary
value problem may have no solution if you prescribe more than n fixed values.
- In addition, it is possible to prescribe conditions on the parameters p in
the form of lower bounds, upper bounds, fixed values for the components and
linear relations between them (again as lower bounds, upper bounds or
equality constraints). Typical such conditions are
0 <= pi <= 1,  ∑i pi = 1,
as in the sketch below.
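Such conditions translate directly into bounds and a linear equality
constraint for the optimizer. A hedged SciPy sketch of this example
(placeholder objective, assumed names; the page passes such conditions to
DONLP2 instead):

import numpy as np
from scipy.optimize import minimize

r = 3
objective = lambda p: np.sum((p - 0.5) ** 2)                # placeholder
bounds = [(0.0, 1.0)] * r                                   # 0 <= p_i <= 1
simplex = {'type': 'eq', 'fun': lambda p: np.sum(p) - 1.0}  # sum p_i = 1
res = minimize(objective, x0=np.full(r, 1.0 / r),
               bounds=bounds, constraints=[simplex], method='SLSQP')
print(res.x)   # here (1/3, 1/3, 1/3)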
- The initial value solver used here is DASSL by Linda Petzold, available
through netlib/ode, which uses the ode in the form
G(t,y,y',p) = 0
and would also allow algebraic conditions on y over the time interval.
However, in order to avoid the need to provide consistent initial values on
the fixed grid, we require that G has the form
G(t,y,y',p) = -y' + f(t,y,p).
This makes the computation of consistent values trivial, as the sketch below
shows.
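Written as a plain function, this residual form looks as follows (DASSL
itself is Fortran; the Python shape is only illustrative):

import numpy as np

def f(t, y, p):
    # the predefined test case y' = p1*y + p2
    return p[0] * y + p[1]

def G(t, y, yprime, p):
    # G(t,y,y',p) = -y' + f(t,y,p); a DASSL-type solver drives this to zero
    return -yprime + f(t, y, p)

# consistent initial values at a grid point Tk are trivial:
y0k = np.array([4.0]); p = np.array([-2.0, 1.0])
yprime0 = f(0.0, y0k, p)     # y'(Tk) = f(Tk, y0[k], p), no nonlinear solve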
- If problems occur in the integration process, the error messages of DASSL
apply; they refer to the integration on one subinterval. The most important
are
- cannot store more than 50000 data items from integration
- more than 50000 steps
- repeated error test failures
- integration cannot be continued at T=... with h=...
These mean that either your precision requests tolrel and smally (see the
input page) are too strong, or a moving singularity occurred during the run
in a component of the solution, for example due to a too crude initial guess
of the initial values for the optimizer. In this case either decrease the
precision requirement or change the initial guesses.
- The optimizer used is DONLP2; see the section on constrained optimization.
Its most important exit messages which apply here are the following.
- The ''X'' of the optimizer is the ''p'', the ode parameter, of course, plus
the values of y at the grid points T1,...,Tm-1.
- ''MORE THAN MAXIT ITERATION STEPS'': too much work was required to reach
the required precision. Possibly a very ill-conditioned or badly scaled
problem with many restarts.
- ''STEPSIZESELECTION: NO ACCEPTABLE STEPSIZE IN [SIGSM,SIGLA]'': typically
the effect of nonsmooth behaviour, severe ill-conditioning or too large
penalty weights. Since DASSL uses adaptive stepsizes, its output necessarily
behaves nonsmoothly. Even though high precision can be requested in the
integration process, this will, especially near the optimal solution, lead
to such a message, since the optimizer assumes smooth behaviour up to second
order differentiability. Hence this will be the most frequent terminal
message, and it does not indicate a failure. Indeed, since the gradients are
computed here internally, coding errors are excluded.
- ''KT-CONDITIONS SATISFIED, NO FURTHER CORRECTION COMPUTED'': the best
successful exit.
- ''KT-CONDITIONS SATISFIED, COMPUTED CORRECTION SMALL'': essentially the
same as the foregoing.
- ''STEPSIZESELECTION: X ALMOST FEASIBLE, DIR. DERIV. VERY SMALL'': a very
flat penalty function φ; continuing the iteration seems meaningless, maybe
your precision requirements were too strong.
- ''KT-CONDITIONS (RELAXED) SATISFIED, SINGULAR POINT'': if the full SQP step
is taken, we automatically relax the parameter epsx by a factor of 100, the
other termination conditions being unchanged; hence this usually also
indicates a successful termination.
- ''VERY SLOW PRIMAL PROGRESS, SINGULAR OR ILLCONDITONED PROBLEM'': x is
sufficiently feasible and does not change significantly over several steps.
- ''CORRECTION VERY SMALL, ALMOST FEASIBLE BUT SINGULAR POINT'':
||d|| < epsx*0.01*min(1,||x||) during the full SQP step, i.e. the correction
is much smaller than the required precision.
- ''MAX(N,10) SMALL DIFFERENCES IN PENALTY FUNCTION, TERMINATE'': φ does not
change significantly any more.
- Of course the different precision requirements depend on each other: how
accurately to do the integration (and, depending on this, how accurately to
solve the implicit equations of a single integration step), and how
accurately to solve the optimization problem. As default we propose
epsx = max(tolrel^(1/2), 10^-4) as the precision for the optimizer, with
tolrel the relative precision requirement for the integrator. This is
because tolrel describes the error in the function values, and in the
so-called ''horizontal'' direction (the direction tangential to the boundary
of the feasible set) the objective function will no longer decrease
significantly at a distance from the optimizer of the order of the square
root of this error. Hence tolrel should be chosen in the range
10^-5,...,10^-10. The sketch below restates this default rule as code.
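Restated as code, this proposed default reads (a sketch of the stated rule
only, not the actual initialization code):

def default_epsx(tolrel):
    # epsx = max(sqrt(tolrel), 1e-4), with tolrel in [1e-10, 1e-5]
    return max(tolrel ** 0.5, 1.0e-4)

print(default_epsx(1.0e-8))   # 1e-4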
Input: Predefined cases
- There is a simple predefined problem
y' = p1 y + p2, y(0) = y0, 0 <= t <= 1
(i.e. n=1, r=2) with the ''true'' solution
y(t,p) = (y0 + p2/p1) exp(p1 t) - p2/p1 for p1 ≠ 0
and y(t,p) = y0 + p2 t otherwise.
We use the problem with m=4 grid points, namely
0, 0.3, 0.6, 1.0.
As default we use p1=-2, p2=1, y0=4 for generating the ''true'' values of y,
i.e.
y(t,p) = (7/2) exp(-2 t) + 1/2
on a grid of equidistant values tmess,i, but you can change these three
values. Then you choose a level of perturbation u in [0,1] and the number me
of ''measurement'' points. We then compute the true solution of the initial
value problem and perturb the values y by the formula
ymess = y + 2 (1/2 - ξ) YA u,
where YA is the largest absolute value of a generated y and ξ is an
internally generated pseudorandom number in ]0,1[. In this way we simulate
measurement errors; the error level hence is absolute, namely 100u percent
w.r.t. YA. (A sketch of this data generation follows after this list.)
The original true coefficients and solution are used as the initial guess
for the optimizer. (Hence, if you take u=0, the optimizer will stop
immediately with these values verified.)
We fix no values of y but require continuity (and hence C1 continuity) over
[0,1]; hence we have 5 optimization variables (the three values of y at
t=0, 0.3, 0.6 and the two parameters of the ode) and two nonlinear equality
constraints (continuity at t=0.3 and 0.6).
You can now check how much the perturbation changes the input values via the
given optimization criterion.
In addition you may change the default settings of some parameters of the
ode solver and the optimizer as follows:
- You have the choice to limit the maximal attainable order, e.g. to 2 in
order to use A-stable integrators only.
- You may set the two parameters tolrel and smally controlling the precision
delivered by the integrator. This results in reltol_i = tolrel and
abstol_i = tolrel*smally for all components of y. DASSL uses the internal
requirement for the local error
err_i <= reltol_i |y_i| + abstol_i.
- You specify the maximal allowed stepsize hmax to be taken (which of course
must be considerably smaller than 1.0).
- You may also specify the initial stepsize to be tried, which of course
must not be larger than hmax.
- And finally, you may change p1, p2, y0 used for the data generation.
- You may change the error control of the optimizer: the required precision
epsx for the solution, the required precision delmin for the constraints,
the maximal allowed deviation of the constraints from their nominal values
tau0, and the amount of output io. We recommend io=0 (no intermediate
output) or io=1 (one line of output showing the optimizer's progress), since
for io=2,3 you need to know a lot about DONLP2 in order to understand the
enormous amount of output you will get.
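The data generation described above, restated as a short sketch (Python with
assumed names; the page uses its own internal pseudorandom generator):

import numpy as np

def true_solution(t, p1, p2, y0):
    # y(t,p) = (y0 + p2/p1) exp(p1 t) - p2/p1 for p1 != 0, else y0 + p2 t
    if p1 != 0.0:
        return (y0 + p2 / p1) * np.exp(p1 * t) - p2 / p1
    return y0 + p2 * t

me, u = 10, 0.05                       # number of measurements, error level
p1, p2, y0 = -2.0, 1.0, 4.0            # the default true values
t_mess = np.linspace(0.0, 1.0, me)     # equidistant measurement points
y = true_solution(t_mess, p1, p2, y0)
YA = np.max(np.abs(y))                 # largest absolute generated value
xi = np.random.default_rng(0).uniform(size=me)   # pseudorandom in [0,1)
y_mess = y + 2.0 * (0.5 - xi) * YA * u # absolute error, 100u % w.r.t. YA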
Output: predefined case
- A small table showing you the input for the
optimizer.
- The number of function evaluations used by the optimizer.
- The three error criteria for the optimizer: Lagrangian violation
(must be taken relative to the gradient of the objective function, the sum of squares of deviations),
the feasibility violation and the dual feasibility violation.
- The total computing time.
- The computed optimal parameters: the first r values are the ode parameters
p; the grid values of y follow in their natural order.
- A table of the computed Lagrange multipliers.
- Estimates of the conditioning of the constraint gradients and of the
Hessian of the (augmented) Lagrangian function. These values are of interest
for judging the sensitivity of the problem.
- From the side of the ode solver DASSL you get information about the cumulative
effort of the whole process: number of integration steps, number of rejected steps in
the control of the local error, number of steps with trouble in the Newton solver,
number of evaluations of G(t,y,y',p) and its Jacobian, and the last
integration order used.
- In addition you get a plot of:
- the step sizes used (in log scale),
- the local error estimates (in log scale),
- the actual order used,
- the computed solution of the ode together with the data.
- A plot showing the progress of the optimizer: the norm of the gradient of
the Lagrangian function (''grad l'', in red) and the value of the penalty
term, a weighted sum of the constraint violations (''psi'', in green).
- !!!! It may occur that a log-scale plot displays no values, or that a
message like ''all points undefined'' appears. This means that the errors
are so small that, with the default settings of the graphics program used,
they are all considered as zero. This may occur with very strong precision
requirements.
- Should the integrator or the optimizer issue warning or error messages,
you get these as a protocol.
Questions: Predefined case?!
- Generate a table which relates the level of perturbation of the solution
to the perturbation of the coefficients. How does this change if you make
the solution flatter or steeper?
- Why can you not relax your integration precision arbitrarily without
breaking down the whole process? Think about the fact that DASSL internally
needs a Jacobian of the variational equations, computed by finite
differences.
- Whereas the variational equations are generated using automatic
differentiation, their evaluation involves the numerical values of y. What
influence may this have on the precision of the gradients required by the
optimizer?
- If you work with low precision requirements for the integrator, how does
this influence the behaviour of the optimizer, and why?
Input: user defined case
- Here you have the possibility to define a problem yourself. This requires:
- Specification of n, m, r: n is the number of components of y and of
G(t,y,y',p), m is the number of fixed grid points other than the measurement
points, and r is the number of parameters p.
- You write a piece of code for computing the components of G. Details are
given on the input form and in the help text; an illustrative sketch follows
after this list.
- You specify the number of measurement points, the number and indices of
the components of y for which you have data, and these data. It is possible
to specify the case of no measurements here, i.e. anzmess=0 and mmess=0.
- For graphical output you can specify two components of y (for which you
have data) to be plotted.
- You may change default settings of parameters for the ode solver and the optimizer as follows:
- You have the choice to limit the maximal attainable order, e.g. 2 for BDF in order to use A-stable
integrators only.
- You specify the maximal allowed stepsize to be taken (which of course must
be considerably smaller than the length of the whole interval).
- You also specify the initial stepsize to be tried.
- In order to specify your bound for the local discretization error you
specify two parameters tolrel and smally, resulting in reltol_i = tolrel and
abstol_i = smally*tolrel for all i.
Important: 0 < tolrel < 1 and smally > 0 !
The internal bound for the local discretization error is then
err_i <= reltol_i * |ycomputed,i| + abstol_i.
- You specify the type of constraints on the solution y on the fixed grid.
This is done using a table of number pairs (type, value). In the input form
you find the detailed instructions for this.
- Equally, you specify possible restrictions on the ode parameters p
(bounds, fixed values or linear relations). How to do this is also found on
the input form.
- And, last but not least, you must specify initial guesses for all
optimization variables, namely the ode parameters p and the values of y on
the grid Tk, k=1,...,m-1.
- You may change the error control of the optimizer: the required precision
epsx for the solution, the required precision delmin for the constraints,
the maximal allowed deviation of the constraints from their nominal values
tau0, and the amount of output io. We recommend io=0 (no intermediate
output) or io=1 (one line of output showing the optimizer's progress), since
for io=2,3 you need to know a lot about DONLP2 in order to understand the
enormous amount of output you will get.
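As an illustration of the user-supplied part, here is a hypothetical G for a
two-component system (n=2, r=2), a damped oscillator y1' = y2,
y2' = -p1 y1 - p2 y2; the input form expects its own syntax, so this Python
shape is only a sketch:

def G(t, y, yprime, p):
    # componentwise G(t,y,y',p) = -y' + f(t,y,p)
    return [-yprime[0] + y[1],
            -yprime[1] - p[0] * y[0] - p[1] * y[1]]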
Output: user defined case
- This is the same as for the predefined case.
Questions: user defined case?!
- How might you manage to get an impression of the sensitivity of your
computed ode parameters w.r.t. the error necessarily inherent in your data?
- The questions for the predefined case concerning the precision control
apply here too.