Parameter identification for ordinary differential equations

 
  • In the following, y, f and G denote functions with n components, and y' denotes componentwise differentiation. Typically there will be a ''free'' real variable t with the meaning ''time'' and a set of r parameters p which determine f.
  • In many applications one has a process whose time evolution is described by a system of ordinary differential equations
    y = y(t,p)
    y' = f(t,y,p)
    on a time interval ta <= t <= tb
    with p unknown. An initial value y0 for y(ta,p) may be known or may be sought as well. Also some or all final values of y might be prescribed. Some (or possibly all) components of y are known only via a set of measured data
    (ti, yi,j), 1 <= i <= me, j in J, with J a subset of {1,...,n}.
    What is required is the value of p. This is the so-called parameter identification problem. (The structure of f, in contrast, must of course be known.) For example, the predefined test case here is the scalar problem
    y' = p1 y + p2 , y(0) = y0
    with p1, p2 and y0 as variables to be determined.
  • In another possible application the initial values of y might be known and final values are prescribed, and the parameter p is sought which satisfies these. In that case no measurement values need be present. This type of problem is known as the computation of a feasible point in optimization.
  • This problem is cast into an optimization problem
    Minimize  sum_{i=1,...,me; j in J} ( yi,j - yj(ti,p) )^2
    with respect to p and additional parameters (for example the values of y at ta), which may make the solution of the ode unique. Additional prescriptions on the solution or the parameters are added to this problem as side constraints. If there are no measurement points, then the objective function is set identically to zero and the optimizer deals only with the constraints on y and p.
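    As an illustration, a minimal sketch of this objective in Python; the names y_model, t_mess, y_mess and J are hypothetical, since the web form assembles this function internally:

      import numpy as np

      # Sum of squared deviations over the measurement times t_mess and the
      # measured components J; y_model(t, p) returns the n solution components.
      def objective(p, t_mess, y_mess, J, y_model):
          residuals = [y_mess[i, j] - y_model(t_mess[i], p)[j]
                       for i in range(len(t_mess)) for j in J]
          return np.sum(np.square(residuals))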
  • Here we provide an even more general concept, known from the ''multiple shooting'' solution of boundary value problems: the function y is considered on a fixed predefined grid covering the measurement points,
    Tk, k=1,...,m, with T1 = ta, Tm = tb,
    and the values of y at T1 = ta, ..., Tm-1 are also optimization variables. One can now impose conditions on y at these grid points Tk and also on some values of y at the endpoint Tm. For example, one may prescribe a total of n values of y at the first and the last grid point, which means parameter identification for a boundary value problem. Each step of the optimization process now requires the solution of the corresponding ordinary differential equation, given the current value of p and the additional initial values. Since we use an SQP-based optimizer we also need the derivatives of y with respect to the initial values and the parameters p, which we compute here via the so-called variational equations:
    (d y(t,p)/dp)' = fy(t,y,p) (d y(t,p)/dp) + fp(t,y,p),   d y(Tk,p)/dp = 0,
    (d y(t,p)/dy0)' = fy(t,y,p) (d y(t,p)/dy0),   d y(Tk,p)/dy0 = I.
    Don't panic: these variational equations are generated here automatically for you.
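    For the scalar predefined problem y' = p1 y + p2 these variational equations can be written out and integrated alongside y. A minimal sketch in Python, using scipy's solve_ivp purely for illustration (the actual computation here uses DASSL):

      import numpy as np
      from scipy.integrate import solve_ivp

      # Augmented system: y together with s1 = dy/dp1, s2 = dy/dp2, s3 = dy/dy0.
      def rhs_aug(t, z, p1, p2):
          y, s1, s2, s3 = z
          return [p1 * y + p2,    # y'  = f(t,y,p)
                  p1 * s1 + y,    # s1' = f_y s1 + f_p1, with f_p1 = y
                  p1 * s2 + 1.0,  # s2' = f_y s2 + f_p2, with f_p2 = 1
                  p1 * s3]        # s3' = f_y s3, s3(0) = 1

      sol = solve_ivp(rhs_aug, (0.0, 1.0), [4.0, 0.0, 0.0, 1.0],
                      args=(-2.0, 1.0), rtol=1e-8, atol=1e-10)
      y_end, dy_dp1, dy_dp2, dy_dy0 = sol.y[:, -1]  # solution and gradients at t=1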
  • This results in a set of m-1 initial value problems to be solved for every optimization step (and hence for every evaluation of the sum of squares of deviations and of the constraints). Each of these m-1 initial value problems hence has the dimension (n+1+r) × n.
    In order to distinguish these partial solutions we use the notation y[k](t,p). These initial value problems read
    y[k]' = f(t,y[k],p), Tk <= t <= Tk+1, y[k](Tk,p) = y0[k], k=1,...,m-1.
    Optimization variables are p and y0[k], k=1,...,m-1.
  • Normally, continuity of the solution over the whole interval is required, but you can fix values or impose jump conditions for every component of y at the grid points. You should, however, be aware that the multipoint boundary value problem may have no solution if you prescribe more than n fixed values. A sketch of the resulting continuity constraints follows below.
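    A minimal sketch of these continuity constraints, with hypothetical names (grid holds T1,...,Tm, y0_nodes holds the guesses y0[1],...,y0[m-1], and integrate propagates an initial value across one subinterval):

      import numpy as np

      # Continuity defects y[k](T(k+1), p) - y0[k+1]; the optimizer drives
      # them to zero, enforcing a continuous solution on [ta, tb].
      def continuity_defects(p, y0_nodes, grid, integrate):
          return np.array([integrate(grid[k], grid[k + 1], y0_nodes[k], p)
                           - y0_nodes[k + 1]
                           for k in range(len(y0_nodes) - 1)])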
  • In addition, it is possible to prescribe conditions on the parameters p in the form of lower bounds, upper bounds, fixed values for the components, and linear relations between them (again lower bounds, upper bounds or equality constraints). Typical such conditions are
    0 <= pi <= 1 , sum_i pi = 1 .
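    Expressed for illustration with scipy objects (DONLP2 itself receives these conditions in its own input format):

      import numpy as np
      from scipy.optimize import Bounds, LinearConstraint

      r = 3                                                  # number of parameters
      box = Bounds(np.zeros(r), np.ones(r))                  # 0 <= p_i <= 1
      simplex = LinearConstraint(np.ones((1, r)), 1.0, 1.0)  # sum_i p_i = 1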
  • The initial value solver used here is DASSL by Linda Petzold, available through netlib/ode, which accepts the ode in the implicit form
    G(t,y,y',p) = 0
    and hence would also allow algebraic conditions on y over the time interval. However, in order to avoid the need to provide consistent initial values on the fixed grid, we require that G has the form
    G(t,y,y',p) = -y' + f(t,y,p) .
    This makes the computation of consistent values trivial.
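    For the scalar test problem this residual form reads, as a minimal Python sketch (the signature merely imitates what an implicit integrator expects; DASSL itself is a Fortran code):

      # G(t, y, y', p) = -y' + f(t, y, p) with f(t, y, p) = p1*y + p2.
      # Consistent initial values are trivial: take yprime0 = f(t0, y0, p).
      def residual(t, y, yprime, p):
          return -yprime + (p[0] * y + p[1])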
  • If problems occur in the integration process, the error messages of DASSL will apply. These refer to the integration on one subinterval. The most important are
    1. cannot store more than 50000 data items from integration
    2. more than 50000 steps
    3. repeated error test failures
    4. integration can not be continued at T=... with h=...
    This means that either your precision requests tolrel and smally (see the input page) are too strong, or that during the run a moving singularity occurred in a component of the solution, for example due to a too crude initial guess of the initial values for the optimizer. In this case either decrease the precision requirement or change the initial guesses.
  • The optimizer used is DONLP2; see the section on constrained optimization. Its most important exit messages which apply here are
    1. The ''X'' of the optimizer is of course the ''p'', the ode parameters, plus the values of y at the grid points T1,...,Tm-1.
    2. ''MORE THAN MAXIT ITERATION STEPS'': too much work required in order to reach the required precision. Possibly a very ill-conditioned or badly scaled problem with many restarts.
    3. ''STEPSIZESELECTION: NO ACCEPTABLE STEPSIZE IN [SIGSM,SIGLA]'': typically the effect of nonsmooth behaviour, severe ill-conditioning or too large penalty weights. Since DASSL uses adaptive stepsizes, its output necessarily behaves nonsmoothly. Even though high precision can be requested in the integration process, this will lead to such a message especially near the optimal solution, since the optimizer assumes smooth behaviour up to second-order differentiability.
      Hence this will often be the terminal message, and it does not indicate a failure.

      Indeed, since the gradients are computed here internally, coding errors are excluded.
    4. ''KT-CONDITIONS SATISFIED, NO FURTHER CORRECTION COMPUTED'' : the best successful exit.
    5. ''KT-CONDITIONS SATISFIED, COMPUTED CORRECTION SMALL'': essentially the same as the foregoing.
    6. ''STEPSIZESELECTION: X ALMOST FEASIBLE, DIR. DERIV. VERY SMALL'': a very flat penalty function φ, continuing the iteration seems meaningless, maybe your precision requirements were too strong.
    7. ''KT-CONDITIONS (RELAXED) SATISFIED, SINGULAR POINT'': if the full SQP step is taken, we automatically relax the parameter epsx by a factor of 100, the other termination conditions being unchanged; hence this also usually indicates a successful termination.
    8. ''VERY SLOW PRIMAL PROGRESS, SINGULAR OR ILLCONDITONED PROBLEM'': x is sufficiently feasible and does not change significantly over several steps.
    9. ''CORRECTION VERY SMALL, ALMOST FEASIBLE BUT SINGULAR POINT'': ||d|| < epsx*0.01*min(1,||x||) during the full SQP step, i.e. correction much smaller than required precision.
    10. ''MAX(N,10) SMALL DIFFERENCES IN PENALTY FUNCTION,TERMINATE'': φ doesn't change significantly any more.
  • Of course the different precision requirements are interdependent: how accurately to do the integration (and, depending on this, how accurately to solve the implicit equations of a single integration step), and how accurately to solve the optimization problem. As default we propose epsx = max(tolrel^(1/2), 10^-4) as the precision for the optimizer, with tolrel the relative precision requirement for the integrator. This is because tolrel describes the error in the function values, and in the so-called ''horizontal'' direction (the direction tangential to the boundary of the feasible set) the objective function no longer decreases significantly within a distance from the optimum of the order of the square root of this error. Hence tolrel should be chosen in the range 10^-5, ..., 10^-10.
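    For example, with tolrel = 10^-8 this rule gives epsx = 10^-4. As a minimal sketch (the web form applies this default for you):

      import math
      tolrel = 1e-8                          # relative integration tolerance
      epsx = max(math.sqrt(tolrel), 1e-4)    # proposed optimizer precision epsx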
 

There are two different input forms: Predefined and User defined.

 

Input: Predefined cases

 
  • There is a simple predefined problem
    y' = p1 y + p2 , y(0) = y0, 0 <= t <= 1
    (i.e. n=1, r=2) with the ''true'' solution
    y(t,p) = (y0 + p2/p1) exp(p1 t) - p2/p1 for p1 ≠ 0
    and
    y(t,p) = y0 + p2 t otherwise.
    We use the problem with the grid of m=4 points
    0, 0.3, 0.6, 1.0 .
    As default we use p1=-2, p2=1, y0=4 for generating the ''true'' values of y, i.e.
    y(t,p) = (7/2) exp(-2 t) + (1/2)
    on a grid of equidistant measurement times tmess,i, but you can change these three values. Then you choose a level of perturbation u in [0,1] and the number me of ''measurement'' points. We then compute the true solution of the initial value problem and perturb the values y by the formula
    ymess = y + 2*(1/2 - ξ)*YA*u
    where YA is the largest absolute value of a generated y and ξ is an internally generated pseudorandom number in ]0,1[. In this way we simulate measurement errors. The error level hence is absolute, namely 100 u percent w.r.t. YA. The original true coefficients and solution are used as initial guess for the optimizer. (Hence, if you take u=0, the optimizer will stop immediately with these values verified.) We fix no values of y but require continuity (and hence C1 continuity) over [0,1]; hence we have 5 optimization variables (the three values of y at t = 0, 0.3, 0.6 and the two parameters of the ode) and two nonlinear equality constraints (continuity at t = 0.3 and t = 0.6). A sketch of this data generation follows below.
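    A minimal Python sketch of this data generation (names are illustrative):

      import numpy as np

      rng = np.random.default_rng()              # pseudorandom generator
      p1, p2, y0, u, me = -2.0, 1.0, 4.0, 0.05, 20
      t = np.linspace(0.0, 1.0, me)              # equidistant measurement times
      y = (y0 + p2/p1) * np.exp(p1 * t) - p2/p1  # true solution (p1 != 0)
      YA = np.max(np.abs(y))                     # largest absolute value of y
      xi = rng.uniform(0.0, 1.0, me)             # xi in ]0,1[
      ymess = y + 2.0 * (0.5 - xi) * YA * u      # absolute error, 100 u % of YA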
    You can now check how much the perturbation changes the values identified via the given optimization criterion. In addition you may change the default settings of some parameters of the ode solver and the optimizer as follows:
    1. You have the choice to limit the maximal attainable order, e.g. to 2 in order to use A-stable integrators only.
    2. You may set the two parameters tolrel and smally controlling the precision delivered by the integrator. This results in reltol_i = tolrel, abstol_i = tolrel*smally for all components of y. DASSL uses the internal requirement for the local error
      err_i <= reltol_i * |y_i| + abstol_i
      (see the sketch after this list).
    3. You specify the maximal allowed stepsize hmax to be taken (which of course must be considerably smaller than 1.0).
    4. You also may specify the initial stepsize to be tried, which of course must not be larger than hmax.
    5. And finally, you may change p1, p2, y0 used for the data generation.
    6. You may change the error control of the optimizer: the required precision epsx for the solution, the required precision delmin for the constraints, the maximal allowed deviation of the constraints from their nominal values tau0, and the amount of output io. We recommend io=0 (no intermediate output) or io=1 (one line of output showing the optimizer's progress), since for io=2,3 you need to know a lot about DONLP2 in order to understand the enormous amount of output you will get.
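    The tolerance mapping of item 2, as a minimal sketch:

      tolrel, smally = 1e-6, 1e-2   # the two user inputs
      reltol = tolrel               # relative tolerance for every component of y
      abstol = tolrel * smally      # absolute tolerance for every component of y
      # DASSL accepts a step when err_i <= reltol*|y_i| + abstol for all i.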
 

Output: predefined case

 
  • A small table showing you the input for the optimizer.
  • The number of function evaluations used by the optimizer.
  • The three error criteria for the optimizer: Lagrangian violation (must be taken relative to the gradient of the objective function, the sum of squares of deviations), the feasibility violation and the dual feasibility violation.
  • The total computing time.
  • The computed optimal parameters: the first r values are the ode parameters p and the grid values of y follow in the natural enumeration.
  • A table of the computed Lagrange multipliers.
  • Estimates of the conditioning of the constraints gradients and of the Hessian of the (augmented) Lagrangian function. These values are of interest for judging the sensitivity of the problem.
  • From the side of the ode solver DASSL you get information about the cumulative effort of the whole process: number of integration steps, number of rejected steps in the control of the local error, number of steps with trouble in the Newton solver, number of evaluations of G(t,y,y',p) and its Jacobian, and the last integration order used.
  • In addition you get a plot of:
    1. the step sizes used (in log form),
    2. the local error estimate (in log form),
    3. the actual order used,
    4. the computed solution of the ode together with the data,
    5. a plot showing the progress of the optimizer: the norm of the gradient of the Lagrangian function (''grad l'', in red) and the value of the penalty term, a weighted sum of the constraint violations (''psi'', in green).
  • !!!! It may occur that a log scale displays no values or that a message like ''all points undefined'' appears. This means that the errors are so small that, with the default settings of the graphics program used, they are all considered to be zero. This might occur with very strong precision requirements.
  • Should the integrator or the optimizer issue warnings or error messages, you get these as a protocol.
 

Questions: Predefined case?!

 
  • Generate a table which relates the level of perturbation of the solution to the perturbation of the coefficients. How does this change if you make the solution flatter or steeper?
  • Why may you not relax your integration precision arbitrarily without breaking down the whole process? Think about the fact that DASSL internally needs a Jacobian of the variational equations, computed by finite differences.
  • Whereas the variational equations are generated using automatic differentiation, their evaluation involves the numerical values of y. What influence may this have on the precision of the gradients required by the optimizer?
  • If you work with low precision requirements on the integrator, how does this influence the behaviour of the optimizer, and why?
 

To the input form: predefined case

 

Input: user defined case

 
  • Here you have the possibility to define a problem yourself. This requires:
    1. Specification of n, m, r: n is the number of components of y and of G(t,y,y',p), m is the number of fixed grid points (not counting the measurement points), and r is the number of parameters p.
    2. You write a piece of code for computing the components of G. Details are given on the input form and in the help text; a sketch follows after this list.
    3. You specify the number of measurement points, the number and indices of the components of y for which you have data, and these data. It is possible to specify the case of no measurements here, i.e. anzmess=0 and mmess=0.
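    As an illustration only (in Python, whereas the input form expects its own code conventions), the residual G for a hypothetical user problem, a damped oscillator y1' = y2, y2' = -p1 y1 - p2 y2 with n=2 and r=2:

      # G(t, y, y', p) = -y' + f(t, y, p), componentwise:
      def G(t, y, yprime, p):
          return [-yprime[0] + y[1],                   # G1 = -y1' + y2
                  -yprime[1] - p[0]*y[0] - p[1]*y[1]]  # G2 = -y2' - p1*y1 - p2*y2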
  • For the graphical output you can specify two components of y (for which you have data) to be plotted.
  • You may change default settings of parameters for the ode solver and the optimizer as follows:
    1. You have the choice to limit the maximal attainable order, e.g. 2 for BDF in order to use A-stable integrators only.
    2. You specify the maximal allowed stepsize to be taken (which of course must be considerably smaller than the length of the whole interval).
    3. You also specify the initial stepsize to be tried.
    4. In order to specify your bound for the local discretization error you specify two parameters tolrel and smally, resulting in
      reltol_i = tolrel and abstol_i = smally*tolrel for all i. Important: 0 < tolrel < 1 and smally > 0 ! The internal bound for the local discretization error is then
      err_i <= reltol_i * |y_computed,i| + abstol_i .
    5. You specify the type of constraints on the solution y on the fixed grid. This is done using a table of number pairs type,value. In the input form you find the detailed instructions for this.
    6. Equally, you specify possible restrictions on the ode parameters p (bounds, fixed values or linear relations). How to do this is also to be found on the input form.
    7. And, last but not least, you must specify initial guesses for all optimization variables, namely the ode parameters p and the values of y on the grid of the Tk, k=1,...,m-1.
    8. You may change the error control of the optimizer: the required precision epsx for the solution, the required precision delmin for the constraints, the maximal allowed deviation of the constraints from their nominal values tau0, and the amount of output io. We recommend io=0 (no intermediate output) or io=1 (one line of output showing the optimizer's progress), since for io=2,3 you need to know a lot about DONLP2 in order to understand the enormous amount of output you will get.
 

Output: user defined case

 
  • This is the same as for the predefined case.
 

Questions: user defined case?!

 
  • How might you manage to get an impression of the sensitivity of your computed ode parameters w.r.t. the errors necessarily inherent in your data?
  • The questions for the predefined case concerning the precision control apply here too.
 

To the input form: user defined case

 
 