The Gauss-Newton method

Directly to the input form: exponential fit

Directly to the input form: user defined case

 
  • The Gauss-Newton method solves nonlinear least squares problems. In abstract terms this means minimizing the Euclidean length of a vector function F with m components with respect to the variable vector x with n components, where we assume here that m >= n. In applications the situation is similar to many applications of linear least squares: for a set of data (ti,yi) one seeks the parameter vector x of a function model f(x;t) such that the sum of squares of the deviations f(x;ti) - yi is minimal. The components of F are made up from these deviations (or ''residuals''). Hence
    S(x) = (1/2) * ||F(x)||_2^2
    and
    Fi(x) = f(x;ti) - yi.
    The function f(x;t) depends nonlinearly on x.
  • This method tries to find a zero of the gradient of S(x), which of course is only a necessary optimality condition. Since this is achieved by a monotonic decrease of S, with a decrease depending on the norm of the gradient, it is hoped that indeed a minimizer is found. Newton's method itself could also be used for finding a zero of ∇S(x). This would require the computation of the Hessian matrix of S (the matrix of second partial derivatives), which reads, using the Jacobian JF(x) of the vector function F,
    ∇2 S(x) = JF(x)^T JF(x) + Σ_{i=1}^{m} Fi(x) * ∇2 Fi(x).
    Far from the minimizer this could be indefinite and fail to produce a direction of descent for S. This, besides the necessity of computing the second partial derivatives of F, is the reason why the Gauss-Newton direction is often preferred.
  • Here only the always positive semidefinite part of the Hessian is used. Indeed we make the stronger assumption that the Jacobian JF is always of full rank, which implies that JF(x)^T JF(x) is always positive definite. The direction of descent for S is defined as
    d(x) = -(JF(x)^T JF(x))^(-1) JF(x)^T F(x)
  • d(x) is not computed from this formula but rather by solving the linear least squares problem
    ||F(x) + JF(x)*d||_2^2 = min_d .
    This is done using numerically stable techniques; here we use the Householder QR decomposition of JF.
  • In order to guarantee convergence, at least to a zero of the gradient, it is in addition necessary to decrease S sufficiently along this direction. This requires the computation of a step length τ. The next iterate is then obtained as
    x_new = x_old + τ·d.
  • This step size is computed by a combination of interpolation of S(x+τ·d) and reduction of the interpolation interval (which always has the left endpoint 0), where the data S(x) and (d/dτ)S(x+τ·d) at τ=0 are used. One can show that this always enforces convergence, provided the full rank assumption for JF is satisfied on the initial level set.
  • The order of convergence of this process is linear (only in the case F(x*) = 0, the perfect fit (interpolation), is it quadratic). The rate depends on the magnitude of the optimal residual and is of the form C*||F(x*)|| < 1, where the constant C also depends on the condition of JF.
  • The numerical code used here is NLSCON from the Elib of ZIB-Berlin.
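The scheme described in the bullets above (QR-based computation of the Gauss-Newton direction, followed by a step-length reduction that enforces monotone descent of S) can be sketched in a few lines. This is a minimal Python/NumPy illustration, not the NLSCON code: simple step halving stands in for NLSCON's interpolation-based step-length strategy, and the names gauss_newton, maxstep and sigmin are chosen here for illustration only.

```python
import numpy as np

def gauss_newton(F, J, x0, rtol=1e-10, maxstep=50, sigmin=1e-12):
    """Damped Gauss-Newton sketch: F is the residual vector function,
    J its Jacobian, x0 the initial guess."""
    x = np.asarray(x0, dtype=float)
    for _ in range(maxstep):
        Fx = F(x)
        Q, R = np.linalg.qr(J(x))            # Householder QR of the Jacobian
        d = np.linalg.solve(R, -(Q.T @ Fx))  # d minimizes ||F(x) + J(x)*d||_2
        S = 0.5 * (Fx @ Fx)
        tau = 1.0                            # step length
        while tau >= sigmin:                 # halve until S decreases
            Fn = F(x + tau * d)
            if 0.5 * (Fn @ Fn) < S:
                break
            tau *= 0.5
        x = x + tau * d
        if np.linalg.norm(tau * d) <= rtol * (1.0 + np.linalg.norm(x)):
            break
    return x
```

For a perfect-fit problem (F(x*) = 0) this iteration converges quadratically near the solution, in agreement with the convergence statement above; for large optimal residuals only linear convergence can be expected.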
 

There are two input forms: one with a predefined exponential fitting problem and one where the user can define the fit himself.

 

Input: the predefined exponential fit

 
  • Here the data model f(x;t) is a sum of exponentials such that
    Fi(x) = a0 + a1*exp(b1*ti) + ... + an*exp(bn*ti) - yi  with  x = (a0,a1,b1,...,an,bn)^T.
  • You specify the number of exponentials n.
  • You specify the 2n+1 ''true'' coefficients x = (a0,a1,b1,...,an,bn)^T.
  • You specify an interval [a,b] for t, in which the data will be generated equidistantly.
  • You specify the number m of data points. Important: 2n+1 <= m <= 200 !
  • A variable level which is used for the generation of artificial ''measurement errors''. Important: 0 <= level <= 1 ! Now m data points (ti,yi) are computed using the model with the ''true'' coefficients, and afterwards the values yi are perturbed using random numbers ri in [0,1] according to the formula
    yi = yi,orig + (ri - 1/2) * level * 2 * max_i |yi,orig| .
    Subsequently the model is fitted to the perturbed data. Attention: don't hope for wonders! Fitting by exponentials is a notoriously badly conditioned problem. This becomes clear if you consider it as a problem of parameter identification for a linear differential equation with constant coefficients, with the solution given by a discrete and noisy set of data.
  • You have the possibility to change some of the parameters of the code:
    1. the required precision of the solution vector rtol in [1.0d-12, 1.0d-2]
    2. the minimal step size allowed for the length reduction of the full Gauss-Newton step sigmin in the range [1.0d-12, 1.0d-2]
    3. the maximal condition number of the Jacobian before regularization (by truncation) occurs: maxcond in the range [2, 10^16]
    4. the maximal number of steps allowed maxstep >= n
    5. the amount of printed output iprint: 0 = none, 1 = only final result, 2 = short iteration protocol plus final result, 3 = detailed iteration protocol (not recommended)
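The generation of artificial measurements described above can be sketched as follows, assuming the perturbation formula given for level. This is a Python/NumPy illustration, not the code behind the input form; the function name make_perturbed_data is hypothetical.

```python
import numpy as np

def make_perturbed_data(x, a, b, m, level, rng=None):
    """Generate m equidistant data points on [a, b] for the model
    f(x;t) = a0 + a1*exp(b1*t) + ... + an*exp(bn*t),
    x = (a0, a1, b1, ..., an, bn), then perturb them according to
    yi = yi,orig + (ri - 1/2) * level * 2 * max_i |yi,orig|."""
    rng = np.random.default_rng() if rng is None else rng
    t = np.linspace(a, b, m)
    y = np.full(m, x[0], dtype=float)          # constant term a0
    for aj, bj in zip(x[1::2], x[2::2]):       # pairs (aj, bj)
        y += aj * np.exp(bj * t)
    r = rng.random(m)                          # random numbers in [0, 1]
    y_pert = y + (r - 0.5) * level * 2 * np.max(np.abs(y))
    return t, y_pert
```

Note that the perturbation is bounded in magnitude by level * max_i |yi,orig|, so level = 0 reproduces the exact model values.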
 

Output: the predefined exponential fit

 
  • CPU time used.
  • A table with the original and the computed (perturbed) coefficients.
  • The sum of squares of data perturbations.
  • The sum of squares of residuals after fitting.
  • A plot which shows the perturbed data and the fit function.
 

Questions: exponential fit

 
  • How does an increase of the perturbation influence the performance of the method?
  • How sensitively do the coefficients react to an increase of the perturbation level for different choices of the exponents?
  • What happens if two of your chosen exponents are nearly equal, or if the decay is very slow?
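As a hint for the last question: when two exponents are nearly equal, the corresponding columns of the Jacobian become nearly linearly dependent, so JF numerically loses full rank and the condition number explodes. A small Python/NumPy sketch (the parameter values are arbitrary illustrative choices):

```python
import numpy as np

t = np.linspace(0.0, 1.0, 50)

def jac(a1, b1, a2, b2):
    """Jacobian of f(x;t) = a0 + a1*exp(b1*t) + a2*exp(b2*t)
    with respect to x = (a0, a1, b1, a2, b2) at the data points t."""
    return np.column_stack([
        np.ones_like(t),            # df/da0
        np.exp(b1 * t),             # df/da1
        a1 * t * np.exp(b1 * t),    # df/db1
        np.exp(b2 * t),             # df/da2
        a2 * t * np.exp(b2 * t),    # df/db2
    ])

print(np.linalg.cond(jac(1.0, -1.0, 1.0, -5.0)))   # well separated exponents
print(np.linalg.cond(jac(1.0, -1.0, 1.0, -1.01)))  # nearly equal: much larger
```

The second condition number is dramatically larger than the first, which is why the truncation threshold maxcond matters for such fits.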
 

To the input form: exponential fit

 

Input: user defined case

 
  • Here you can define the model function f(x;t) yourself.
  • You have the possibility to fit your own data, in which case your input for x serves as the initial guess, or to have the data generated internally. In the latter case your input for x serves as the ''true'' parameter value; data are computed from this using your model function and then perturbed using random perturbations. The perturbed data are subsequently used for fitting.
  • You specify the dimension np of x. Important: np <= 20 !
  • You specify the model function f(x;t) by a piece of code depending on x(1),...,x(np) and t. This must obey the rules of FORTRAN and the restrictions of JAKEF.
  • You specify the initial vector x(1),...,x(np) with np entries.
  • The Jacobian is produced analytically here, internally, using the code JAKEF from NETLIB.
  • You have the choice to specify your data explicitly or to let the program generate artificial data using your specifications. In the latter case you define:
    1. An interval [a,b] from which the variable t is drawn equidistantly.
    2. The number m of data points. Important: np <= m <= 200 !
    3. An error level level. Important: 0 <= level <= 1 ! m data points (ti, yi,orig = f(x;ti)) are computed, and subsequently the data yi are computed from
      yi = yi,orig + (ri - 1/2) * level * 2 * max_i |yi,orig|
      where the ri are random numbers in [0,1]. These data are then fitted.
    Direct input of data points:
    1. You specify their number m. Important: np <= m <= 200 !
    2. You type the data in the prepared textarea, the data items t(i),y(i)
      separated by comma or blank, not necessarily one per line, for example
      0,0 1,1
      2,1.4 ..
    3. Finally you have the possibility to change some of the parameters of the code:
      1. the required precision of the solution vector rtol in [1.0d-12, 1.0d-2]
      2. the minimal step size allowed for the length reduction of the full Gauss-Newton step sigmin in the range [1.0d-12, 1.0d-2]
      3. the maximal condition number of the Jacobian before regularization (by truncation) occurs: maxcond in the range [2, 10^16]
      4. the maximal number of steps allowed maxstep >= n
      5. the amount of printed output iprint: 0 = none, 1 = only final result, 2 = short iteration protocol plus final result, 3 = detailed iteration protocol (not recommended)
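The free-format input convention described above (t(i), y(i) pairs separated by comma or blank, not necessarily one pair per line) can be read with a small tolerant parser. The following is a Python sketch of that convention, not the actual parser behind the form:

```python
import re

def parse_data(text):
    """Parse t(i), y(i) pairs separated by commas or whitespace,
    not necessarily one pair per line, e.g. "0,0 1,1\n2,1.4"."""
    tokens = [tok for tok in re.split(r"[,\s]+", text) if tok]
    if len(tokens) % 2 != 0:
        raise ValueError("odd number of values: incomplete (t, y) pair")
    values = [float(tok) for tok in tokens]
    return values[0::2], values[1::2]   # t list, y list
```

Splitting on runs of commas and whitespace makes the layout of the textarea irrelevant, as the description requires.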
 

Output: user defined fit

 
  • CPU time used by the minimizer.
  • A table of the original (resp. initial) coefficients and the fitted ones.
  • The sum of squares of the generated errors (if applicable).
  • The sum of squares of residuals for the final solution.
  • A plot which shows the fit data and the fit function.
 

Questions: user defined case

 
  • What do you observe if the fit is ''bad'' (residuals resp. perturbations are large)?
  • When do you observe large perturbations in the coefficients?
  • How do these effects depend on the model function?
 

To the input form: user defined case

 
 
Back to the top!

26.08.2020