- The Levenberg-Marquardt method solves nonlinear least squares problems
using a so-called ''trust region'' concept.
It can of course also be used for solving a system of nonlinear equations.
In a nonlinear least squares problem one seeks a ''fit function'' or
''model function'' f(t,x), with x representing a set of
parameters, such that for a given set of data
(t_i, y_i) the sum of squared residuals
S(x) = ||F(x)||_2^2 becomes minimal. The components of
the vector F(x) are the residuals F_i(x) = f(t_i, x) - y_i.
The function f(t,x) depends nonlinearly on the vector
of parameters x (and of course may also depend nonlinearly on t, but these
values are fixed here). A small sketch of these definitions follows below.
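
  As an illustration, here is a minimal Python sketch of these definitions,
  using a hypothetical two-parameter exponential model (neither the model nor
  the function names come from the code used on this page):

    import numpy as np

    def f(t, x):
        # hypothetical model function f(t,x) = x_1 * exp(-x_2 * t)
        return x[0] * np.exp(-x[1] * t)

    def F(x, t, y):
        # residual vector: F_i(x) = f(t_i, x) - y_i
        return f(t, x) - y

    def S(x, t, y):
        # sum of squared residuals: S(x) = ||F(x)||_2^2
        r = F(x, t, y)
        return r @ r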
- Given an initial guess x_0 the process is iterative:
x_{k+1} = x_k + d_k.
Here d_k is the solution of an inequality constrained linear
least squares problem:
    min_d ||F(x_k) + J_F(x_k) d||_2   subject to   ||D_k d|| <= δ_k .   (*)
D_k is a diagonal matrix which serves as a weighting of the
corrections. In principle the norm appearing in the constraint could be different from
the Euclidean one, but here we also use the Euclidean norm. D_ii is chosen internally as
the maximum of one and the length of the i-th column of J_F (see the sketch below).
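
  This scaling can be written down directly; a sketch, assuming J is the
  Jacobian as a NumPy array and storing the diagonal of D as a vector
  (lmdif additionally keeps the largest value seen over the whole iteration,
  which this sketch omits):

    import numpy as np

    def scaling(J):
        # D_ii = max(1, ||i-th column of J_F||_2), diagonal stored as a vector
        return np.maximum(1.0, np.linalg.norm(J, axis=0))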
- The linear least squares problem is solved using the QR decomposition of J_F, as sketched below.
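
  For the unconstrained subproblem min_d ||F + J_F d||_2 this amounts to the
  following sketch (assuming J_F has full column rank; the rank-deficient case
  is handled via the regularization described below):

    import numpy as np
    from scipy.linalg import qr, solve_triangular

    def gauss_newton_step(J, Fx):
        # solve min_d ||F + J d||_2 via the QR decomposition J = Q R
        Q, R = qr(J, mode='economic')
        return solve_triangular(R, -Q.T @ Fx)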
- J_F is the Jacobian of F, and in this application
we approximate it by the simplest (forward) finite differences; a sketch follows below.
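
  A forward difference approximation of J_F might look as follows (a
  simplified sketch; the step size choice in lmdif's fdjac2 differs in
  details):

    import numpy as np

    def jacobian_fd(F, x, *args):
        # column-wise forward differences: J[:, j] ~ (F(x + h*e_j) - F(x)) / h
        eps = np.sqrt(np.finfo(float).eps)
        Fx = np.asarray(F(x, *args))
        J = np.empty((Fx.size, x.size))
        for j in range(x.size):
            h = eps * max(abs(x[j]), 1.0)
            xh = x.copy()
            xh[j] += h
            J[:, j] = (np.asarray(F(xh, *args)) - Fx) / h
        return J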
- The solution of the constrained problem can in this case be completely
characterized by the following theorem of Sorensen (SINUM 19 (1982), 409-426);
we omit the index k in the following.
d is a solution of (*) if and only if there is a λ such that the following two properties hold:
  - λ >= 0, ||D d|| <= δ, and λ(δ - ||D d||) = 0
  - (J_F^T J_F + λ D^2) d = -J_F^T F   (**)
This means that either the unconstrained least squares solution solves the problem, or a
parameter λ must be found which solves (**) with ||D d||
equal to δ.
Hence one first solves the unconstrained least squares problem. If its solution violates the norm
constraint, then one computes λ such that (**) holds. This equation
is equivalent to the linear least squares problem obtained from the original one by appending sqrt(λ) D
to J_F and a vector of n zeros to F, and it gives d as
a function of λ.
So δ - ||D d(λ)|| = 0 becomes a scalar nonlinear equation in λ,
which can be solved rather efficiently by updating techniques for the QR decomposition.
λ is known as the ''regularization parameter''.
This also allows a rank deficient J_F in the problem. A sketch of this computation follows below.
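
  A simplified sketch of this λ search, solving the appended least squares
  problem from scratch and locating λ by bisection (the production code
  instead updates the QR factorization and uses a faster iteration for λ):

    import numpy as np

    def d_of_lambda(J, Fx, D, lam):
        # least squares problem with sqrt(lam)*D appended to J and
        # n zeros appended to F; equivalent to (J^T J + lam D^2) d = -J^T F
        n = J.shape[1]
        A = np.vstack([J, np.sqrt(lam) * np.diag(D)])
        b = np.concatenate([-Fx, np.zeros(n)])
        return np.linalg.lstsq(A, b, rcond=None)[0]

    def find_lambda(J, Fx, D, delta, lam_max=1e10):
        # bisection for ||D d(lam)|| = delta; ||D d(lam)|| decreases in lam
        lo, hi = 0.0, lam_max
        for _ in range(100):
            lam = 0.5 * (lo + hi)
            if np.linalg.norm(D * d_of_lambda(J, Fx, D, lam)) > delta:
                lo = lam   # step still too long: increase lam
            else:
                hi = lam   # step short enough: decrease lam
        return hi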
- The so-called ''trust region radius'' δ_k is computed adaptively,
depending on the comparison of the true decrease
    ||F(x_k)||_2 - ||F(x_{k+1})||_2
with the expected decrease obtained from the linearized model,
    ||F(x_k)||_2 - ||F(x_k) + J_F(x_k) d_k||_2 .
The variable ρ in the graphical output (plot number 3)
is the quotient of these two. With ρ near one (here: ρ >= 0.75),
or with ρ between 0.25 and 0.75 and
the regularization parameter λ = 0, δ_k is increased;
with ρ between 0.25 and 0.75 and λ > 0, or with 10^-4 <= ρ < 0.25,
it stays fixed; and with ρ < 10^-4 it is diminished and the step repeated
(see the sketch below).
In theory, without rounding effects, this always ends successfully as long as x_k is not
a stationary point of S.
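
  In pseudocode, the update of δ_k might read as follows (the growth and
  shrink factors are illustrative choices, not taken from the text):

    def update_radius(delta, rho, lam):
        # returns (new delta, whether the step is accepted)
        if rho >= 0.75 or (rho >= 0.25 and lam == 0.0):
            return 2.0 * delta, True    # good agreement: enlarge the radius
        if rho >= 1.0e-4:
            return delta, True          # acceptable: keep the radius fixed
        return 0.5 * delta, False       # poor agreement: shrink, repeat step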
- As long as λ stays zero, this is a linearly convergent iteration with the same properties as
the Gauss-Newton method with stepsize one. If λ stays nonzero over the whole iteration,
then the convergence will be very slow.
- The termination of the iteration is controlled by three parameters: xtol, ftol and gtol.
These are bounds for the relative change in x, the relative change in ||F||, and the so-called
''scaled gradient'', i.e. the largest cosine of the angle between the vector F and the
columns of its Jacobian J_F. If one of these bounds is reached, termination occurs
(a sketch of these tests follows below).
There is an ''emergency exit'' if a maximum number of function evaluations is reached.
The defaults are 10^-7, 10^-10, 10^-7, and 10000, respectively.
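
  The three tests could be sketched as follows (simplified with respect to
  the exact formulas in lmdif; all names here are illustrative):

    import numpy as np

    def check_termination(x_old, x_new, Fnorm_old, Fnorm_new, J, Fx,
                          xtol=1e-7, ftol=1e-10, gtol=1e-7):
        # relative change in x
        if np.linalg.norm(x_new - x_old) <= xtol * np.linalg.norm(x_new):
            return True
        # relative change in ||F||
        if abs(Fnorm_old - Fnorm_new) <= ftol * Fnorm_new:
            return True
        # scaled gradient: largest cosine between F and the columns of J
        denom = np.linalg.norm(J, axis=0) * np.linalg.norm(Fx)
        cosines = np.abs(J.T @ Fx) / np.where(denom > 0.0, denom, 1.0)
        return float(np.max(cosines)) <= gtol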
- The numerical code used here is lmdif by Jorge J. Moré (from netlib/minpack); a sketch of calling it from Python follows.
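
  lmdif is also the backend of scipy.optimize.leastsq (when no analytic
  Jacobian is supplied), so the whole procedure can be tried out with a few
  lines of Python; the data and model here are made up:

    import numpy as np
    from scipy.optimize import leastsq

    t = np.array([0.0, 1.0, 2.0, 3.0, 4.0])      # hypothetical data
    y = np.array([2.0, 1.2, 0.7, 0.45, 0.25])

    def residuals(x, t, y):
        return x[0] * np.exp(-x[1] * t) - y

    x0 = np.array([1.0, 1.0])
    x_opt, ier = leastsq(residuals, x0, args=(t, y),
                         xtol=1e-7, ftol=1e-10, gtol=1e-7, maxfev=10000)
    print(x_opt, ier)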
The input form expects the following:
- You specify the number of parameters n (the dimension of x).
Important: 1 <= n <= 10!
- The number m of your data points.
Important: 1 <= m <= 100 and n <= m!
- The data points: m entries as pairs of numbers t(i), y(i),
where at least n of the t(i) must be pairwise different;
''multiple measurements'' (duplicate t(i)) are allowed.
- A possible shift for t(i):
tti = t(i) - shift. Attention: the fit function f depends on
tti and not on t(i). A shift of 0 must be specified explicitly as such.
- The fit function f as a piece of code ending with a statement of the form
yt = <expression>.
This expression must depend on
x(k), k = 1,...,n, and on tti.
- The initial values for x(k), k=1,...,n.
- You can choose your own values for xtol, gtol, ftol and the maximum number
of function evaluations if you wish. A complete example of these inputs follows below.
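
  Put together, the input items above correspond to the following (a
  hypothetical example, written as Python for concreteness; the actual input
  form uses the syntax described above):

    import numpy as np
    from scipy.optimize import leastsq

    n = 2                                       # number of parameters, 1 <= n <= 10
    t = np.array([1.0, 2.0, 3.0, 4.0, 5.0])    # m = 5 data points, n <= m <= 100
    y = np.array([0.9, 0.55, 0.4, 0.28, 0.2])  # measured values
    shift = 3.0                                 # tti = t(i) - shift
    tti = t - shift

    def fit(x, tti):
        # corresponds to the statement ''yt = <expression>'' of the input form
        yt = x[0] * np.exp(x[1] * tti)          # hypothetical model
        return yt

    x0 = np.array([0.5, -0.5])                  # initial values for x(k)
    x_opt, ier = leastsq(lambda x: fit(x, tti) - y, x0,
                         xtol=1e-7, ftol=1e-10, gtol=1e-7, maxfev=10000)
    print(x_opt, ier)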