The Levenberg-Marquardt method


 
  • The Levenberg-Marquardt method solves nonlinear least squares problems using a so-called ''trust region'' concept. It can of course also be used for solving a system of nonlinear equations. In a nonlinear least squares problem one seeks a ''fit function'' or ''model function'' f(t,x), with x representing a set of parameters, such that for a given set of data (t_i, y_i) the sum of squared residuals S(x) = ||F(x)||_2^2 becomes minimal. The components of the vector F(x) are the residuals F_i(x) = f(t_i,x) - y_i. The function f(t,x) depends nonlinearly on the vector of parameters x (and of course may also depend nonlinearly on t, but these values are fixed here).
  • Given an initial guess x_0, the process is iterative: x_{k+1} = x_k + d_k.
    Here d_k is the solution of an inequality constrained linear least squares problem:
    ||F(x_k) + J_F(x_k) d||_2 = min_d   subject to   ||D_k d|| <= δ_k . (*)
    D_k is a diagonal matrix which serves as a weighting of the corrections. In principle the norm appearing in the constraint could differ from the Euclidean one, but here we use the Euclidean norm there as well. D_{i,i} is chosen internally as the maximum of one and the length of the i-th column of J_F.
  • The linear least squares problem is solved using the QR decomposition of J_F.
  • J_F is the Jacobian of F; in this application we approximate it by the simplest finite differences.
  • The solution of the constrained problem can in this case be completely characterized by the following theorem of Sorensen (SINUM 19 (1982), 409-426)
    (we omit the index k in the following):
    d is a solution of (*) if and only if the following two properties hold:
    • λ >= 0, ||D d|| <= δ and λ(δ - ||D d||) = 0
    • (J_F(x)^T J_F(x) + λ D^2) d = - J_F(x)^T F(x) (**)
    This means that either the unconstrained least squares solution solves the problem or a parameter λ must be found which solves (**) with ||D d|| equal to δ. Hence one first solves the unconstrained least squares problem. If that solution violates the norm constraint, one computes λ such that (**) holds. Equation (**) is equivalent to the linear least squares problem obtained from the original one by appending sqrt(λ)D to J_F and a vector of n zeros to F, and it gives d as a function of λ. So δ - ||D d(λ)|| = 0 becomes a scalar nonlinear equation in λ which can be solved rather efficiently by updating techniques for the QR decomposition. λ is known as the ''regularization parameter''. This approach also allows a rank-deficient J_F in the problem.
  • The so-called ''trust region radius'' δ_k is computed adaptively, depending on the difference between the true decrease
    ||F(x_k)||_2 - ||F(x_{k+1})||_2
    and the expected decrease obtained from the linearized model,
    ||F(x_k)||_2 - ||F(x_k) + J_F(x_k) d_k||_2 .
    The variable ρ in the graphical output (plot number 3) is the quotient of these two. With ρ near one (here: >= 0.75), or with ρ between 0.25 and 0.75 and the regularization parameter λ = 0, δ_k is increased; with ρ between 0.25 and 0.75 and λ > 0, or with 10^-4 <= ρ < 0.25, it stays fixed; and with ρ < 10^-4 it is decreased and the step repeated. In theory, without rounding effects, this always ends successfully as long as x_k is not a stationary point of S.
  • This is a linearly convergent iteration with the same properties as the Gauss-Newton method with stepsize one, as long as λ stays zero. If λ stays nonzero throughout the iteration, the convergence will be very slow.
  • The termination of the iteration is controlled by three parameters: xtol, ftol and gtol. These are bounds for the relative change in x, in ||F|| and in the so-called ''scaled gradient'', the latter measuring the maximal cosine of the angle between the vector F and the columns of its Jacobian J_F. If one of these bounds is reached, termination occurs. There is an ''emergency exit'' if a maximum number of function evaluations is reached. The defaults are 10^-7, 10^-10, 10^-7 and 10000, respectively.
  • The numerical code used here is lmdif by Jorge J. Moré (from netlib/minpack).
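The λ-search characterized by (**) and the radius update rule can be sketched in a few lines of Python. This is an illustrative sketch under simplifying assumptions, not the lmdif code: λ is located by plain bisection instead of lmdif's refined scalar iteration with QR updating, and the growth/shrink factors 2 and 1/2 in the radius update are assumptions of this sketch (only the thresholds 0.75, 0.25 and 10^-4 are taken from the text above).

```python
import numpy as np

def lm_step(F, J, delta, tol=1e-10):
    """One trust-region step: solve min_d ||F + J d||_2 s.t. ||D d|| <= delta.

    First tries the unconstrained solution (lambda = 0); if it violates the
    constraint, finds lambda > 0 with ||D d(lambda)|| = delta by bisection,
    where d(lambda) solves the least squares problem augmented by
    sqrt(lambda) D and zeros, i.e. (J^T J + lambda D^2) d = -J^T F.
    """
    m, n = J.shape
    # scaling: max(1, length of the i-th column of J), as described above
    D = np.maximum(1.0, np.linalg.norm(J, axis=0))

    def d_of(lam):
        A = np.vstack([J, np.sqrt(lam) * np.diag(D)])
        b = np.concatenate([-F, np.zeros(n)])
        return np.linalg.lstsq(A, b, rcond=None)[0]

    d = np.linalg.lstsq(J, -F, rcond=None)[0]    # unconstrained attempt
    if np.linalg.norm(D * d) <= delta:
        return d, 0.0
    lo, hi = 0.0, 1.0
    while np.linalg.norm(D * d_of(hi)) > delta:  # bracket lambda from above
        hi *= 10.0
    while hi - lo > tol * max(1.0, hi):          # bisect on delta - ||D d(lambda)||
        lam = 0.5 * (lo + hi)
        if np.linalg.norm(D * d_of(lam)) > delta:
            lo = lam
        else:
            hi = lam
    lam = 0.5 * (lo + hi)
    return d_of(lam), lam

def update_radius(delta, rho, lam):
    """Radius update with the thresholds from the text; the factors 2 and
    1/2 are assumptions of this sketch (lmdif uses different factors)."""
    if rho >= 0.75 or (0.25 <= rho < 0.75 and lam == 0.0):
        return 2.0 * delta, True       # enlarge radius, accept the step
    if (0.25 <= rho < 0.75 and lam > 0.0) or 1e-4 <= rho < 0.25:
        return delta, True             # keep radius, accept the step
    return 0.5 * delta, False          # shrink radius, repeat the step
```

Note that `d_of` solves the augmented problem by dense `lstsq` for clarity; the efficiency claim in the text rests on reusing one QR decomposition of J_F across all λ-values.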
 

Input

 
  • You specify the number of parameters n (the dimension of x). Important: 1 <= n <= 10!
  • The number m of your data points. Important: 1 <= m <= 100 and n <= m!
  • The data points: m entries as pairs of numbers t(i), y(i), where at least n of the t(i) must be pairwise different. ''Multiple measurements'' (duplicate t(i)) are, however, allowed.
  • A possible shift for the t(i): tti = t(i) - shift. Attention: the fit function f depends on tti and not on t(i). Shift = 0 must be specified as such.
  • The fit function f as a piece of code ending with a statement
    yt = an expression.
    This must depend on the x(k), k = 1,...,n, and on tti.
  • The initial values for x(k), k=1,...,n.
  • You can choose your own values for xtol, gtol, ftol and the maximum number of function evaluations if you wish.
 
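Since the numerical code behind this page is lmdif, the same kind of computation can be reproduced with SciPy, whose optimize.leastsq wraps exactly this MINPACK routine. Everything below (the data, the exponential model, the shift and the tolerance values) is a made-up example that merely mirrors the input conventions described above.

```python
import numpy as np
from scipy.optimize import leastsq

# hypothetical data points (t(i), y(i)); roughly y = 2*exp(-t), rounded
t = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0])
y = np.array([2.00, 1.21, 0.74, 0.45, 0.27, 0.16, 0.10])

shift = 0.0
tti = t - shift              # the fit function depends on tti, not on t(i)

def residuals(x):
    # model f(tti, x) = x(1)*exp(x(2)*tti); residuals F_i = f(tti_i, x) - y_i
    return x[0] * np.exp(x[1] * tti) - y

x0 = np.array([1.0, -1.0])   # initial values for x(k)
x, ier = leastsq(residuals, x0,
                 xtol=1e-7, ftol=1e-10, gtol=1e-7, maxfev=10000)
print("fitted parameters:", x)   # should come out close to (2, -1)
```

The return code ier reports the termination reason (values 1 to 4 indicate that one of the ftol/xtol/gtol criteria was met), analogous to the text output described below.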

Output

 
  • You get several plots:
  • The first one shows the log10 of the difference ||F(x_k)|| - ||F_min||. Here -99 means ''zero''.
  • The second one shows the ''scaled gradient'' (red) and δ_k (green), both in the decadic logarithm. Here -20 means ''zero''.
  • The third one shows ρ (green) and λ (red), again on a logarithmic scale, with -20 representing zero and -infinity, respectively.
  • The fourth one shows your data points and the fit function.
  • You get a text output showing your parameter choice, the termination reason and the solution.
 


 

09.09.2016