The BFGS (quasi-Newton) method


 
  • In this application we search for a local minimizer x_opt of a smooth nonlinear function f on R^n. This means
    f(x_opt) <= f(x) for all x from some neighborhood of x_opt. Strictly speaking, the method only tries to find a stationary point of f, i.e. a zero of ∇f(x). Should f be locally convex there, then indeed this will be a minimizer. Convexity, however, is not needed for this method. We need f in C^1, and the gradient is used explicitly, but second partial derivatives never enter this method. For a bit of theory concerning the background, look here.
  • The general structure of the method is as follows (see the illustrative sketch at the end of this list): Given an initial guess x_0 (which need not be a ''good'' one) and an arbitrary positive definite symmetric matrix A_0, one proceeds iteratively:
    x_{k+1} = x_k - σ_k d_k.
    Here d_k solves the linear system
    A_k d_k = ∇f(x_k).
    Since A_k is positive definite, d_k is a direction of ascent of f, hence the negative sign in the computation of x_{k+1}. The next matrix is then obtained by a rank-two update. Here we use the BFGS formula (discovered independently by Broyden, Fletcher, Goldfarb and Shanno, using different approaches, in 1970):
    A_{k+1} = A_k + (y_k y_k^T)/(y_k^T s_k) - (A_k s_k s_k^T A_k)/(s_k^T A_k s_k).
    A_{k+1} satisfies the so-called secant relation
    A_{k+1} s_k = y_k,
    with
    s_k = x_{k+1} - x_k,  y_k = ∇f(x_{k+1}) - ∇f(x_k).
  • One can show that the sequence of matrices defined this way stays uniformly positive definite (i.e. there exist constants l_1, l_2 > 0 such that
    l_1 < λ(A_k) < l_2 (*) for all k and all eigenvalues λ of the matrices), provided f is uniformly convex on R^n (i.e. the eigenvalues of the Hessian of f also satisfy (*) for all x).
  • The linear system for d_k is solved using the Cholesky decomposition of the matrix, which is updated in every step, making this an O(n^2) process. Positive definiteness of A_{k+1} follows from positive definiteness of A_k and the additional requirement
    y_k^T s_k > 0,
    which is obtained by proper selection of σ_k. For this we use the method published by Al-Baali and Fletcher. This is a bracketing method using quadratic and cubic interpolation, aimed at satisfying the strong Powell-Wolfe conditions (we omit the index k):
    |∇f(x_next)^T d| <= κ × |∇f(x)^T d|,
    f(x) - f(x_next) >= σ δ |∇f(x)^T d|.
    Here δ and κ are two parameters with default values 0.01 and 0.9. It must always hold that
    0 < δ < κ < 1.
    A very small κ means an almost exact one-dimensional minimization in the direction -d_k. We use the minimizer of the quadratic polynomial in σ defined by the values of f for σ=0 and σ=1 and the directional derivative -∇f(x)^T d at σ=0 in order to compute a first guess for σ_k. One can show that this estimate tends to one and satisfies the strong Powell-Wolfe conditions provided
    0 < δ < 1/2 < κ < 1.
    This, combined with the Broyden-Dennis-Moré property of {A_k},
    lim_{k→∞} ||(A_k - ∇²f(x_opt)) d_k|| / ||d_k|| = 0,
    results in the superlinear convergence of the method:
    lim_{k→∞} ||x_{k+1} - x_opt|| / ||x_k - x_opt|| = 0.
  • Since we do not require f to be uniformly convex, or convex at all, we need to control the growth of the matrices and of their inverses. This is possible quite cheaply here, using recursions for the traces of A_k and of its inverse derived from the update formula. If the condition number of A_k grows beyond preset bounds (more precisely, if the product of the two traces, which is an upper bound for the condition number, grows beyond some larger bound), then we issue a ''restart'': A_k is, like A_0, chosen as a scaled version of the unit matrix, the scaling factor being computed as
    ||∇f(x+dx) - ∇f(x)|| / ||dx||
    for some small dx.
  • If the norm of the term y y^T/(y^T s) in the update becomes larger than 1/sqrt(macheps) and y^T s < 0.2 s^T A s, then we use a modification proposed by Powell for the vector y: instead of y we use
    z = θ y + (1-θ) A s  with  θ = 0.8 s^T A s/(s^T A s - y^T s),
    which guarantees z^T s = 0.2 s^T A s > 0. These are the ''mod'' steps reported in the results.
  • There are several possible reasons for termination: besides the obvious ones (too many steps required to reach the desired precision, and failure in the function evaluation despite reduction of the steplength) these are:
    • change in the current x less than ε_x × (||x|| + ε_x) and
      ||∇f(x)|| <= min(ε_g, macheps × trace(A) × trace(inv(A))/n^2).
      This is the ''best'' type of exit.
    • d_k = 0, which means that the gradient suddenly became zero.
    • reduction of the stepsize below sigsm = 10^{-10}, possibly a consequence of severe ill-conditioning of the Hessian or otherwise of a much too small κ.
    • directional derivative along the current direction at the roundoff level of f (continuation makes no sense).
    • failure of the stepsize selection immediately after a restart; again an effect of severe ill-conditioning or of a too small κ.
    • more than 4 successive relative changes of f smaller than ε_f = 10^{-12} (continuation makes no sense).
    The termination reason is reported in the output.
  • Gradients are computed analytically here, with the exception of the predefined test case 14.
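  • For illustration, the following is a minimal sketch (in Python/NumPy) of one iteration as described above. It is not the program behind this page: the names bfgs_step and simple_line_search are invented for this sketch, the crude backtracking line search stands in for the Al-Baali/Fletcher bracketing procedure, np.linalg.solve replaces the Cholesky factorization updated per step, and the 1/sqrt(macheps) test and the trace-based restart logic are omitted.

        # Sketch of one BFGS step: solve A d = grad f(x), choose the step
        # length starting from the quadratic-model first guess, then apply
        # the (possibly damped) rank-two update.
        import numpy as np

        def simple_line_search(f, grad, x, d, delta=0.01, sig_small=1e-10):
            # First guess: minimizer of the quadratic polynomial matching f at
            # sigma = 0 and sigma = 1 and the slope -grad(x)^T d at sigma = 0.
            fx = f(x)
            slope = grad(x) @ d          # grad f(x)^T d > 0, since d is an ascent direction
            f1 = f(x - d)
            denom = 2.0 * (f1 - fx + slope)
            sigma = slope / denom if denom > 0.0 else 1.0
            sigma = min(max(sigma, sig_small), 1.0)
            # Crude backtracking on the sufficient-decrease condition
            # f(x) - f(x - sigma*d) >= sigma * delta * |grad f(x)^T d|;
            # the curvature condition with kappa is not checked in this sketch.
            while f(x - sigma * d) > fx - delta * sigma * slope:
                sigma *= 0.5
                if sigma < sig_small:
                    raise RuntimeError("step length reduced below sigsm")
            return sigma

        def bfgs_step(f, grad, x, A, delta=0.01):
            g = grad(x)
            d = np.linalg.solve(A, g)    # A_k d_k = grad f(x_k)
            sigma = simple_line_search(f, grad, x, d, delta)
            x_next = x - sigma * d       # x_{k+1} = x_k - sigma_k d_k
            s = x_next - x
            y = grad(x_next) - g
            As = A @ s
            sAs = s @ As
            # Powell's modification: if y^T s is too small relative to s^T A s,
            # replace y by z = theta*y + (1-theta)*A s, giving z^T s = 0.2 s^T A s > 0.
            if y @ s < 0.2 * sAs:
                theta = 0.8 * sAs / (sAs - y @ s)
                y = theta * y + (1.0 - theta) * As
            # Rank-two BFGS update: A + y y^T/(y^T s) - A s s^T A/(s^T A s).
            A_next = A + np.outer(y, y) / (y @ s) - np.outer(As, As) / sAs
            return x_next, A_next

        if __name__ == "__main__":
            # Small usage example on the Rosenbrock function (a classical test problem).
            f = lambda x: 100.0 * (x[1] - x[0]**2)**2 + (1.0 - x[0])**2
            grad = lambda x: np.array([-400.0 * x[0] * (x[1] - x[0]**2) - 2.0 * (1.0 - x[0]),
                                       200.0 * (x[1] - x[0]**2)])
            x, A = np.array([-1.2, 1.0]), np.eye(2)
            for k in range(200):
                x, A = bfgs_step(f, grad, x, A)
                if np.linalg.norm(grad(x)) < 1e-8:
                    break
            print(k + 1, x)              # x should approach the minimizer (1, 1)

    In this sketch the damping with θ is applied whenever y^T s < 0.2 s^T A s, which is the usual form of Powell's modification; the application additionally requires the update term to become large before modifying y, as described above.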
 

Input

 
  • You have the choice between 14 predefined functions, for which you can read a short description of the examples here. You may also define a function f of your own, depending on n variables x(1),...,x(n). You also have the choice to specify the initial guess x_0.
  • Either you use the standard parameter settings for the termination criteria, or you specify these yourself. For convenience we have reduced these to ε_x, δ and κ, setting ε_g = ε_x; ε_g is then adapted relative to ||∇f(x_0)||. This appears in the protocol.
  • For the predefined cases you have the possibility to replace the standard initial point by another one. The required dimension can be seen on the input form.
  • You have the possibility to request a contour plot of f around the best point found, with two of the n variables varying and the other components of x fixed at their current values. You can specify the size of the region. Be careful! Often these functions grow rapidly, so that on a large region you may ''see'' nothing useful.
 

Output

 
  • The CPU time used. If this appears as zero, it means less than 0.01 seconds.
  • The termination reason.
  • The computed best values f(x_opt) and x(1)_opt,...,x(n)_opt and the gradient components.
  • The number of function evaluations used.
  • Two plots in semi-logarithmic format showing f_k - f_opt and ||∇f(x_k)||.
  • The contour plot, if requested.
 


 

07.06.2016