The Levenberg-Marquardt method: nonlinear equations

Directly to the input form

 
  • The Levenberg-Marquardt method solves nonlinear systems of equations
    F(x) = 0 with F: |Rn -> |Rn using a so called ''trust region'' concept in minimizing
    S(x) = ||F(x)||22 .
  • Given an initial guess x0 the process is iterative: xk+1 = xk+dk.
    Here dk is the solution of an inequality constrained linear least squares problem:
    ||F(xk) + JF(xk) d ||2 = mind subject to ||Dkd|| <= δk . (*)
    Dk is a diagonal matrix which serves as a weighting of the corrections. In principle the norm appearing in the constraint could be different from the euclidean one, but here we also use the euclidean norm. Di,i is chosen internally as the maximum of one and the length of the i-th column of JF.
  • The linear least squares problem is solved using the QR decomposition of JF.
  • JF is the Jacobian of F and in this application we compute it analytically using JAKEF from your function input.
  • The solution of the constrained problem can in this case be completely characterized by the following theorem of Sorensen (SINUM 19, (1982), 409-426)
    (we omit the index k in the following)
    d is a solution of (*) if and only if the following two properties hold
    • λ >= 0 and ||d|| <= δ and λ(δ-||d||) = 0
    • (JFTJF(x)+λ D2)d = - JFTF(x) (**)
    This means that either the unconstrained least squares solution solves the problem or a parameter λ must be found which solves (**) with the norm of d equal to δ. Hence one solves firstly the unconstrained least squares problem. If the solution does violate the norm constraint then one computes λ such that (**) holds. This equation is equivalent to the linear least squares problem obtained from the original one by appending sqrt(λ)D to JF and a vector of n zeroes to F and gives d as a function of λ. So δ-||d(λ)|| = 0 becomes a scalar nonlinear equation in λ which can be solved rather efficiently by updating techniques for the QR decomposition. λ is known as the ''regularization parameter''. This allows a rank deficient JF in the problem.
  • The so called ''trust region radius'' δk is computed adaptively depending on the difference between the true decrease
    ||F(xk)||2-||F(xk+1)||2
    and the expected decrease, obtained from the linearized model
    ||F(xk)||2-||F(xk) + JF(xk)dk ||2
    The variable ρ in graphical output (plot number 4) is the quotient of these two. With ρ near one (here: >= 0.75) or with ρ between 0.25 and 0.75 and the regularization parameter λ =0 (plot number 3) , δk becomes increased, with ρ between 0.25 and 0.75 and λ > 0 or 10-4 <= ρ < 0.25 it stays fixed and with ρ < 10-4 it becomes diminished and the step repeated. In theory,without rounding effects, this ends always successfully as long as x is not a stationary point of S.
  • This is for a solvable system with regular Jacobian a quadratically convergent method (up to the steplength computation identical to the classical Newton's method) as long as λ stays zero. If λ stays nonzero all over the iteration, then the convergence can be very slow.
  • The termination of the iteration is controlled by three parameters: xtol, ftol and gtol. These are bounds for the relative change in x, ||F|| and the so called ''scaled gradient'', this is the maximal angle between the vector F and the columns of its Jacobian JF. If one of these is reached, termination occurs. There is an ''emergency exit'', if a maximum number of function evaluations is reached. Defaults are 10-7, 10-10, 10-7, 10000 respectively.
  • The numerical code used here is lmder of Jorge J. More (from netlib/minpack).
 

Input

 
  • You have the possibility to play with 6 prepared examples, of which a description is to be found here or you define a problem of your own. If this latter case
    1. You specify the number of parameters n (the dimension of x). Important: 1 <= n <= 100 !
    2. The function F as a piece of FORTRAN code. This must depend on the x(k), k=1,...,n .
  • You always specify initial values for x(k), k=1,...,n.
  • You can choose your own values for xtol, gtol , ftol and the maximum number of function evaluations if you want so or use the defaults. You may specify the print frequency of selected variables of x, this is the variable nprint and the computation of the initial trust region radius δ0 which is either factor*||D0x0|| if this is nonzero or factor itself.
 

Output

 
  • You get several plots:
  • The first one shows the log10 of the difference ||F(xk)||-||Fmin||. Here -99 means "zero".
  • The second shows the gradient norm (red) and δk (green) , both in the decadic logarithm. Here ''-20'' means ''zero''.
  • The third one shows λ and the fourth ρ (red), again in a logarithmic scale, with ''-20'' representing zero resp. -infinity.
  • You get a text output showing your parameter choice, the termination reason and the solution.
 

To the input form

 
Back to the top!

01.08.2013