The Newton-GMRES method


 
  • The Newton-GMRES method is meant for the solution of large nonlinear systems of equations F(x) = 0, F: ℝ^n → ℝ^n.
  • It is a combination of the ideas of Newton's method and of the GMRES method for linear systems, in the form: first linearize F, then relax the precision.
  • Given an approximation x_k for the zero x*, Newton's method requires solving the correction equation
    J_F(x_k) d_k = -F(x_k)
    and then computing the new iterate as
    x_{k+1} = x_k + σ_k d_k
    where the stepsize σ_k > 0 is chosen such that
    ||F(x_k)|| - ||F(x_{k+1})|| >= δ σ_k ||F(x_k)||
    for some δ ∈ ]0,1/2[. One can show that for a twice differentiable F with a nonsingular Jacobian at x* such a stepsize always exists, and that eventually the full stepsize σ_k = 1 becomes admissible, so that this technique combines quadratic convergence with globalized convergence. We use a fixed δ = 0.001 in this application. σ_k is found by first trying σ = 1 and then halving repeatedly until the descent condition is satisfied.
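The damped Newton step just described can be sketched in a few lines of Python (a minimal illustration, not the code behind this page; the tiny 2x2 test system is invented for the example):

```python
import numpy as np

def damped_newton_step(F, J, x, delta=1e-3, sigma_min=1e-6):
    """One damped Newton step: solve J(x) d = -F(x), then halve the
    stepsize sigma until the descent test
        ||F(x)|| - ||F(x + sigma*d)|| >= delta * sigma * ||F(x)||
    holds (delta = 0.001 as in the text)."""
    Fx = F(x)
    normFx = np.linalg.norm(Fx)
    d = np.linalg.solve(J(x), -Fx)           # correction equation, solved exactly here
    sigma = 1.0                              # try the full step first
    # descent test rearranged: ||F(x + sigma*d)|| <= (1 - delta*sigma) * ||F(x)||
    while np.linalg.norm(F(x + sigma * d)) > (1.0 - delta * sigma) * normFx:
        sigma *= 0.5                         # halve until descent is achieved
        if sigma < sigma_min:
            raise RuntimeError("stepsize underflow: no sufficient descent")
    return x + sigma * d, sigma

# invented test problem: F(x) = (x0^2 - 2, x0*x1 - 1), zero at (sqrt(2), 1/sqrt(2))
F = lambda x: np.array([x[0]**2 - 2.0, x[0] * x[1] - 1.0])
J = lambda x: np.array([[2.0 * x[0], 0.0], [x[1], x[0]]])
x = np.array([2.0, 2.0])
for _ in range(20):
    if np.linalg.norm(F(x)) < 1e-12:
        break                                # converged, stop before stagnation
    x, sigma = damped_newton_step(F, J, x)
```

Near the solution the full step σ = 1 passes the test immediately, which is what restores quadratic convergence.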
  • However, solving the correction equation to full accuracy might be too expensive or even impossible in practice for very large n. Hence it is an obvious idea to solve this equation only approximately, leading to the class of inexact Newton methods. This opens the possibility of solving the equation for d_k using iterative methods and terminating these after obtaining ''sufficient precision''.
  • In order to maintain a proof of global convergence it suffices to reduce the residual
    r(d) = J_F(x_k) d + F(x_k)
    to
    ||r(d)|| <= ρ ||F(x_k)|| with 0 < ρ < 1,
    since then by Taylor's theorem (using 0 < σ <= 1)
    ||F(x_k + σd)|| = ||F(x_k) + σ J_F(x_k) d + O(||σd||²)||
    = ||F(x_k) - σ F(x_k) + σ (J_F(x_k) d + F(x_k)) + O(||σd||²)||
    <= (1 - σ) ||F(x_k)|| + σ ρ ||F(x_k)|| + O(||σd||²)
    = (1 - σ(1-ρ)) ||F(x_k)|| + O(||σd||²)

    and since under the conditions above ||d|| = O(||F(x_k)||), descent can be obtained for ||F(x_k + σd)|| as before. With a fixed ρ > 0, however, quadratic convergence is lost and one obtains linear convergence. The rate improves with decreasing ρ, which in turn increases the effort for computing the correction direction d.
  • Here ρ is the user-defined variable reduc on the input form. One may play with it in order to observe its influence on the convergence rate.
  • As iterative solver for the correction equation we use the GMRES method, which gives the overall method its name. As is well known, GMRES can behave quite irregularly if the eigenvalues of the Jacobian J_F(x) cluster around zero, and this limits the power of the approach. This can be alleviated by a good preconditioner for GMRES, but since preconditioning is heavily problem-dependent we work here without it. As usual, we provide GMRES with the possibility of a restart after fewer than n steps (the variable restart on the input form serves this purpose), and we also allow the user to set a bound for the norm reduction in the modified Gram-Schmidt orthogonalization of the new column of the ON-system V, which controls when reorthogonalization is performed.
  • For details on GMRES see its page in the chapter on linear equations.
  • In order to apply the method as described so far, one needs the nonzero entries (and the sparsity structure) of J_F(x) explicitly. This again may be prohibitive in practical circumstances, and hence the next step in the design of the overall method is the replacement of J_F(x_k) d by a finite difference approximation, which makes the overall method a derivative-free inexact Newton method. The simplest approximation is the forward difference quotient
    J_F(x_k) d ≈ (F(x_k + τ d) - F(x_k)) / τ
    with an error O(τ). We also provide better approximations, namely the symmetric difference quotient and a formula with error O(τ⁶). These are described in the chapter ''Interpolation/Approximation'' under ''Numerical differentiation''.
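The three difference quotients can be sketched as follows. The sixth-order variant shown here applies two Richardson extrapolation steps to the symmetric quotient, which uses 6 values of F; it is assumed, not verified, that this matches the formula referenced above:

```python
import numpy as np

def jacvec_fd(F, x, v, h=1e-5, difftype="forward"):
    """Directional derivative J_F(x) v from values of F alone.
    'forward'  : (F(x+h v) - F(x)) / h            error O(h),   1 extra value
    'symmetric': (F(x+h v) - F(x-h v)) / (2 h)    error O(h^2), 2 extra values
    'sixth'    : two Richardson steps on the symmetric quotient,
                 error O(h^6), 6 extra values."""
    if difftype == "forward":
        return (F(x + h * v) - F(x)) / h
    sym = lambda s: (F(x + s * v) - F(x - s * v)) / (2.0 * s)
    if difftype == "symmetric":
        return sym(h)
    # Richardson extrapolation: D(h), D(h/2), D(h/4) -> O(h^4) -> O(h^6)
    d1, d2, d3 = sym(h), sym(h / 2.0), sym(h / 4.0)
    e1 = (4.0 * d2 - d1) / 3.0      # O(h^4)
    e2 = (4.0 * d3 - d2) / 3.0      # O(h^4)
    return (16.0 * e2 - e1) / 15.0  # O(h^6)

# check against the exact Jacobian of the invented F(x) = (x0^2, x0*x1)
F = lambda x: np.array([x[0]**2, x[0] * x[1]])
x = np.array([1.0, 2.0]); v = np.array([1.0, 1.0])
exact = np.array([2.0, 3.0])        # J_F(x) v with J = [[2 x0, 0], [x1, x0]]
```

Note the trade-off: higher accuracy costs proportionally more evaluations of F per GMRES step.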
 

Input

 
  • For the nonlinear system
    F(x) = (F_1(x_1,...,x_n), ..., F_n(x_1,...,x_n)) = 0
    there is a predefined case with selectable dimension (in {60,1200}), namely the standard discretization of the nonlinear two-point boundary value problem
    y''(t) = γ sin(π y(t)) + t(1-t), 0 <= t <= 1, y(0) = 1, y(1) = 2
    by finite differences of second-order consistency. This is well defined for γ <= 1/2, but for larger values it becomes ill-defined. You may set γ as you wish, but don't complain if it does not work! If you know a little more about the theory: think about the eigenvalues of the tridiagonal matrix (the Jacobian of F) with the typical row
    (0, ..., 0, -1, 2 + γ π cos(π y_j)/(n+1)², -1, 0, ..., 0).
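For readers who want to experiment offline, the predefined discretization might be sketched as below. Scaling each equation by h² = 1/(n+1)² produces exactly the tridiagonal Jacobian row quoted above (this is our reconstruction from the description, not the code of this page):

```python
import numpy as np

def bvp_residual(y, gamma):
    """F(y) for the second-order finite-difference discretization of
    y'' = gamma*sin(pi*y) + t(1-t), y(0)=1, y(1)=2, on n interior points
    with h = 1/(n+1); each equation is multiplied by h^2."""
    n = y.size
    h = 1.0 / (n + 1)
    t = np.arange(1, n + 1) * h
    yl = np.concatenate(([1.0], y[:-1]))     # left neighbour, boundary y(0) = 1
    yr = np.concatenate((y[1:], [2.0]))      # right neighbour, boundary y(1) = 2
    # -(y_{j-1} - 2 y_j + y_{j+1}) + h^2 (gamma sin(pi y_j) + t(1-t)) = 0
    return -(yl - 2.0 * y + yr) + h**2 * (gamma * np.sin(np.pi * y) + t * (1.0 - t))

def bvp_jacobian(y, gamma):
    """Tridiagonal Jacobian with rows (-1, 2 + gamma*pi*cos(pi*y_j)*h^2, -1)."""
    n = y.size
    h = 1.0 / (n + 1)
    J = np.diag(2.0 + gamma * np.pi * np.cos(np.pi * y) * h**2)
    J += np.diag(-np.ones(n - 1), 1) + np.diag(-np.ones(n - 1), -1)
    return J

# quick check: plain Newton on the case n = 60, gamma = 0.5
y = np.linspace(1.0, 2.0, 62)[1:-1]          # linear initial guess, 60 interior points
for _ in range(10):
    y -= np.linalg.solve(bvp_jacobian(y, 0.5), bvp_residual(y, 0.5))
```

For γ <= 1/2 the diagonal perturbation γπ cos(π y_j) h² stays smaller than the smallest eigenvalue of the discrete Laplacian, which is why the Jacobian remains nonsingular there.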
  • You also have the possibility to provide your own test case by writing code that computes F yourself. In this case we relieve you of the burden of providing the Jacobian by using numerical derivatives along the correction direction d. You have the choice between three types of differences: the ordinary forward difference, the symmetric one and an extrapolation of order six of the symmetric difference quotient. This is controlled by the input variable difftype.
  • In the case of a self-defined problem the input is the dimension n, four indices I in the range {1,...,n} for which you will see the graph of x_i, i ∈ I, over the step number k, and the initial point (x_{0,1},...,x_{0,n}).
  • The required precision ε in the norm of F. We require
    ||F(x)||_2 <= ε ||diag(J_F)(x)||
    for the current point for successful termination.
  • The restart parameter for GMRES. restart = n means that you require the (theoretically) exact solution of the correction equation, but this makes little sense since the effort would then be much larger than for a direct solver.
  • The maximum number of Newton iterations allowed. For GMRES we limit the number of iterations to 3n for each Newton step.
  • The minimal value allowed for the stepsize σ in the descent test of the damped Newton method. We use the sequence 2^{-i}, i = 0,1,2,..., for σ. A very small σ means no progress in x and hence is of little use, but for a bad initial guess values considerably smaller than 1 occur often.
  • The parameter reduc (ρ above) which controls how accurate GMRES must solve the system. A very small reduc makes the method more look like the true Newton method, but may cause much effort or even a failure of GMRES.
  • The parameter reorthfac. This controls the occurrence of reorthogonalization in GMRES: let v = V_i e_i be the last column of the ON-system obtained up to step i in GMRES. Then we compute J_F(x) v and orthogonalize it w.r.t. V_i using the modified Gram-Schmidt method, obtaining a vector w. If
    ||w|| <= reorthfac × ||v||
    then the orthogonalization is repeated once w.r.t. all previous columns of V_i. The next orthogonal vector then is w normalized to length 1. If reorthfac is chosen near 1, this doubles the computational effort. On the other hand, a small reorthfac allows degradation of the orthogonality and may badly influence the behaviour of GMRES (whose theory strongly depends on it).
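The reorthogonalization test can be sketched as follows (a minimal sketch of one modified Gram-Schmidt sweep with a single conditional repetition; function and variable names are ours):

```python
import numpy as np

def mgs_orthogonalize(V, v, reorthfac=0.5):
    """Orthogonalize v against the orthonormal columns of V by modified
    Gram-Schmidt; if the norm drops below reorthfac * ||v||, the sweep
    is repeated once (the test described in the text)."""
    w = v.copy()
    h = np.zeros(V.shape[1])
    for j in range(V.shape[1]):              # first MGS sweep
        h[j] = V[:, j] @ w
        w -= h[j] * V[:, j]
    if np.linalg.norm(w) <= reorthfac * np.linalg.norm(v):
        for j in range(V.shape[1]):          # one repeated sweep
            c = V[:, j] @ w
            h[j] += c
            w -= c * V[:, j]
    nw = np.linalg.norm(w)
    return w / nw, h, nw                     # new unit column, coefficients, norm

# invented example: v nearly in span(V) triggers the reorthogonalization
V, _ = np.linalg.qr(np.random.default_rng(1).standard_normal((50, 10)))
v = V @ np.ones(10) + 1e-8 * np.random.default_rng(2).standard_normal(50)
q, h, nw = mgs_orthogonalize(V, v)
```

Without the repeated sweep, the returned column would lose several digits of orthogonality in exactly this near-dependent situation.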
  • The parameter difftype: you can choose between three types of numerical differentiation for computing J_F(x) v from values of F alone. Available are the forward difference quotient, the symmetric one and a sixth-order extrapolation of the symmetric one, requiring 1, 2 or 6 additional evaluations of F, respectively.
  • You might want to see, in addition to the graphical output, a printed table of some components of x and of ||F||. Since the number of steps might be quite large, you can decide to see only every k-th step.
  • You may also choose to see the final solution printed, or in graphical representation only.
  • You might see the decay of the residual in the last GMRES run in order to assess the influence of the restart parameter or the eigenvalue distribution of the Jacobian (if you know something about it).
 

Output

 
  • Three 2D-plots showing the sequence x_i for the selected indices i ∈ I and ||F|| over the step number k, and x_i, i = 1,...,n, over i. If required, a plot of ||r(d)|| over the iterations in the last GMRES run.
  • A table of intermediate results if you did require this.
  • A table of the final solution if you did require this.
 

Questions?!

 
  • In the predefined case, when does the method break down if you increase n or γ? Maybe you can guess why?
  • What can you say about the interplay of the three parameters restart, reduc and reorthfac on the behaviour of the method?
  • Does the choice of difftype have much influence on the behaviour of the method?
 


 

13.09.2017