The method of conjugate gradients -- CG

- The method of conjugate gradients, originally developed for solving systems
  of linear equations with a symmetric positive definite coefficient matrix, can be used almost
  unchanged for solving unconstrained minimization problems.
  This means we seek x_opt such that
  f(x_opt) <= f(x) for all x in some neighborhood of x_opt.
  Strictly speaking, the method only tries to find a stationary point of f, i.e.
  a zero of ∇f(x). Should f be locally convex there, this is indeed
  a minimizer; convexity, however, is not needed for the method.
  We require f ∈ C^1, and the gradient is used explicitly,
  but second partial derivatives do not enter the method in any way.
  For a little background theory, look here.
- The general structure of the method is as follows (a small code sketch of the
  recurrence is given after this list): given an initial guess x_0,
  we proceed iteratively as
  x_{k+1} = x_k - σ_k d_k.
  Here d_k is computed recursively from
  d_k = ∇f(x_k) if mod(k,n) = 0 and
  d_k = α_k ∇f(x_k) + β_k d_{k-1}
  otherwise, with
  α_k = 1 - ∇f(x_k)^T d_{k-1} / ||∇f(x_{k-1})||^2,
  β_k = ( ||∇f(x_k)|| / ||∇f(x_{k-1})|| )^2.
  This is the variant of Fletcher and Reeves with a correction term by Lippold, which makes d_k
  strongly gradient related, i.e.
  0 <= γ_1 ||∇f(x_k)|| < ||d_k|| < γ_2 ||∇f(x_k)|| for all k
  and ∇f(x_k)^T d_k = ||∇f(x_k)||^2,
  independently of the quality of the stepsize selection.
  We use the restart variant, although for convex problems convergence can also be shown for the variant without
  reinitializing d_k with the gradient every n-th step. For the restart
  variant, n-step quadratic convergence can be shown for strict second order minimizers.
- For the determination of the stepsize we use the method published by Al-Baali and Fletcher.
  This is a bracketing method using quadratic and cubic interpolation, aiming at satisfying
  the strong Powell-Wolfe conditions (we omit the index k):
  |∇f(x_next)^T d| <= κ |∇f(x)^T d|,
  f(x) - f(x_next) >= σ δ |∇f(x)^T d|.
  Here δ and κ are two parameters with default
  values 0.01 and 0.8. It must always hold that
  0 < δ < κ < 1.
  A very small κ means an almost exact one-dimensional minimization in the direction -d_k.
  To compute a first guess for σ_k we use the minimizer of the quadratic polynomial in σ
  defined by the values of f for σ=0 and σ=1 and the directional derivative
  -∇f(x)^T d at σ=0.
  One can show that this estimate tends to the first strict local minimum on the ray and satisfies the strong Powell-Wolfe conditions provided
  0 < δ < 1/2 < κ < 1.
  For a convex quadratic it is exact, however. (A simplified sketch of such a line search follows after this list.)
- Provided the level set of f for x_0 is bounded and contains only finitely many stationary points,
  this method converges to one of them. If this is a strong local minimizer with positive definite Hessian, convergence is
  n-step quadratic and hence R-superlinear.
- However, for large problems one would not like to iterate that much.
  Also, due to deviations of σ_k from the optimal value and roundoff effects, which can be quite
  severe here, the directions may be of rather poor quality. Hence, if this method is applied in practice for large
  n, preconditioning must be introduced (replacing the gradient by a preconditioned one).
  Preconditioning, however, is largely problem dependent, and for this reason we do not provide it.
- There are several possible reasons for termination: besides the obvious ones (too many steps
  required to reach the desired precision, and failure in a function evaluation despite reduction of the steplength),
  these are (a sketch of some of these tests follows after this list):
  - change in the current x less than ε_x*(||x|| + ε_x)
    and ||∇f(x)|| less than max(ε_g, macheps).
    This is the "best" type of exit. Here ε_g = ε_x.
  - ∇f(x_k) = 0, which means that the gradient suddenly became zero.
  - reduction of the stepsize below sigsm = 10^{-10}; possibly a sign of severe ill-conditioning
    of the Hessian or of a much too small κ.
  - directional derivative along the current direction at the roundoff level of f
    (continuation makes no sense).
  - norm of d_k less than the roundoff level of x_k.
  - more than n successive relative changes of f smaller than ε_f = 10^{-12}
    (continuation makes no sense).
- The termination reason is reported in the output.
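
The direction recurrence and the overall iteration translate directly into code. The following is a minimal Python sketch, not the code used by this page: the names cg_direction and cg_minimize are illustrative, line_search stands for a stepsize routine satisfying the strong Powell-Wolfe conditions (a simplified stand-in is sketched in the next block), and only a simplified stopping test on the change in x is included.

```python
import numpy as np

def cg_direction(g_k, g_prev, d_prev, k, n):
    """Direction recurrence of the Fletcher-Reeves variant with Lippold's
    correction term, restarted with the plain gradient every n-th step."""
    if d_prev is None or k % n == 0:
        return g_k.copy()                                       # restart: d_k = grad f(x_k)
    beta = np.dot(g_k, g_k) / np.dot(g_prev, g_prev)            # ( ||g_k|| / ||g_{k-1}|| )^2
    alpha = 1.0 - np.dot(g_k, d_prev) / np.dot(g_prev, g_prev)  # Lippold's correction
    return alpha * g_k + beta * d_prev

def cg_minimize(f, grad, x0, max_iter=500, eps_x=1e-8):
    """Overall iteration x_{k+1} = x_k - sigma_k * d_k (illustrative sketch)."""
    x = np.asarray(x0, dtype=float)
    n = x.size
    g_prev = d_prev = None
    for k in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) == 0.0:                            # gradient exactly zero
            break
        d = cg_direction(g, g_prev, d_prev, k, n)
        sigma = line_search(f, grad, x, d)                      # strong Powell-Wolfe stepsize
        x_new = x - sigma * d
        if np.linalg.norm(x_new - x) <= eps_x * (np.linalg.norm(x) + eps_x):
            x = x_new
            break
        x, g_prev, d_prev = x_new, g, d
    return x
```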
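The stepsize rules can be illustrated the same way. The sketch below computes the initial guess from the quadratic model described above and then adjusts σ until the strong Powell-Wolfe conditions hold. It is a deliberately simplified stand-in, not the bracketing method of Al-Baali and Fletcher, which uses quadratic and cubic interpolation instead of plain halving and doubling.

```python
import numpy as np

def initial_sigma(f, grad, x, d):
    """First guess for sigma: minimizer of the quadratic in sigma interpolating
    f at sigma = 0 and sigma = 1 and the slope -grad f(x)^T d at sigma = 0."""
    phi0, phi1 = f(x), f(x - d)
    dphi0 = -np.dot(grad(x), d)               # directional derivative at sigma = 0
    curv = phi1 - phi0 - dphi0                # curvature coefficient of the quadratic
    return -dphi0 / (2.0 * curv) if curv > 0 else 1.0

def line_search(f, grad, x, d, delta=0.01, kappa=0.8, max_trials=30):
    """Crude stand-in for the Al-Baali/Fletcher line search: start from the
    quadratic guess and shrink/enlarge sigma until both conditions hold."""
    slope0 = abs(np.dot(grad(x), d))          # |grad f(x)^T d|, equals ||grad f(x)||^2 here
    sigma = max(initial_sigma(f, grad, x, d), 1e-16)
    f0 = f(x)
    for _ in range(max_trials):
        x_new = x - sigma * d
        dphi = -np.dot(grad(x_new), d)        # phi'(sigma) along the ray x - sigma*d
        if f0 - f(x_new) < sigma * delta * slope0:
            sigma *= 0.5                      # insufficient decrease: shrink the step
        elif dphi < -kappa * slope0:
            sigma *= 2.0                      # still descending steeply: enlarge the step
        elif dphi > kappa * slope0:
            sigma *= 0.5                      # overshot a minimum: shrink the step
        else:
            return sigma                      # both strong Powell-Wolfe conditions hold
    return sigma
```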
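Two of the termination tests from the list above can also be written out explicitly. The sketch below is illustrative only; in particular, the measure used for the relative change of f is an assumption and not taken from this page.

```python
import numpy as np

MACHEPS = np.finfo(float).eps

def best_exit(x_old, x_new, g_new, eps_x, eps_g=None):
    """The "best" exit: change in x below eps_x*(||x|| + eps_x) and gradient
    norm below max(eps_g, macheps); eps_g defaults to eps_x as described above."""
    eps_g = eps_x if eps_g is None else eps_g
    small_step = np.linalg.norm(x_new - x_old) < eps_x * (np.linalg.norm(x_old) + eps_x)
    small_grad = np.linalg.norm(g_new) < max(eps_g, MACHEPS)
    return small_step and small_grad

def stagnation(f_history, n, eps_f=1e-12):
    """More than n successive relative changes of f below eps_f; the relative
    change is measured here as |f_k - f_{k+1}| / max(|f_k|, 1) (an assumption)."""
    if len(f_history) < n + 2:
        return False
    recent = f_history[-(n + 2):]
    changes = [abs(a - b) / max(abs(a), 1.0) for a, b in zip(recent[:-1], recent[1:])]
    return all(c < eps_f for c in changes)
```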

Input

- You have the choice between 14 predefined functions, for which you can read a short
  description of the examples here.
  You may also define a function f of your own,
  depending on n variables x(1),...,x(n),
  and you can specify the initial guess x_0.
- Either you use the standard parameter settings for the termination criteria or you specify these
  yourself. For convenience these have been reduced to ε_x, δ, κ, with ε_g = ε_x.
- For the predefined cases you have the possibility to replace the standard initial point by
another one. The required dimension can be drawn from the input form.
- You have the choice to use the standard parameters of the code or to choose
  δ, κ, ε_x
  in the allowed ranges.
- You have the possibility to request a contour plot of f around the best point found,
  with two of the n variables varying and the other components of x fixed at their
  current values. You can specify the size of the region. Be careful! Often these functions show
  rapid growth, such that on a large region you may "see" nothing useful.

Output

- CPU time used. If this appears as zero, it means less than 0.01 seconds.
- The termination reason.
- The computed best values f(x_opt) and
  x(1)_opt,...,x(n)_opt, and the gradient components.
- The number of function evaluations used.
- Two plots in semi-logarithmic format showing f_k - f_opt
  and ||∇f(x_k)||.
- The contour plot, if requested.