Linear least squares: QR decomposition and normal equations


 
  • A linear system
    A p = b
    with more equations than unknowns typically has no solution, and hence one searches for a ''quasi-solution''. This might be the p minimizing the residual vector
    r = A p - b
    in some norm. Linear least squares uses the euclidean norm, which is the simplest case. The solution is unique if A has full column rank, which we assume here. For the more general case see the singular value decomposition (SVD). In the full rank case the solution can be written down explicitly as
    p = (A^T A)^(-1) A^T b
    or
    (A^T A) p = A^T b  <=>  A^T (A p - b) = A^T r = 0 ,
    the so called ''normal equations'', saying that the optimal residual is perpendicular to the columns of A.
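As a minimal sketch (NumPy assumed, with made-up data), the normal equations can be solved directly, and the orthogonality A^T r = 0 of the optimal residual checked numerically:

```python
import numpy as np

# Hypothetical overdetermined system: 4 equations, 2 unknowns.
A = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0],
              [1.0, 4.0]])
b = np.array([2.1, 3.9, 6.2, 7.8])

# Normal equations: (A^T A) p = A^T b.
p = np.linalg.solve(A.T @ A, A.T @ b)

# The optimal residual r = A p - b is perpendicular to the columns of A.
r = A @ p - b
print(p)          # fitted parameters
print(A.T @ r)    # numerically a zero vector
```

Note that forming A^T A squares the condition number; this sketch is for illustration only, not a recommended solution method.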
  • The direct application of this representation is often subject to severe ill-conditioning, leading to destructive roundoff propagation, and hence other solution methods are applied, see below. ''Ill-conditioning'' means that small variations in b or A give large variations in p.
  • A result in this direction compares the solution of the original problem
    ||A p - b|| = min over p
    with that of the perturbed problem
    ||A~ p~ - b~|| = min over p~
    and reads as follows:
    1. Let
       κ = ||Σ|| ||Σ^(-1)||
       with the singular value decomposition
       A = U Σ V^T
       (κ is the condition number of A).
    2. E = A - A~ ,  δ(b) = ||b - b~||
    3. Assume that, with δ(A) = ||E||/||A||, there holds
           2 κ δ(A) < 1 .
    4. Then, with r the minimal residual of the original problem,
       ||p - p~|| / ||p|| <= ( κ / (1 - 2 κ δ(A)) ) × ( δ(A) + ( δ(b) + κ δ(A) ||r|| ) / ( ||b||^2 - ||r||^2 )^(1/2) )

    This means that the condition number of the so called normal equations (κ^2) appears here only multiplied with the norm of the residual. If κ ||r|| <= 1, the sensitivity of the least squares problem is essentially identical to the sensitivity of a (solvable) linear system A p = b, whereas a large-residual problem can be very ill-conditioned.
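The bound can be tried out numerically. A sketch under simplifying assumptions (perturb only b, so E = 0 and δ(A) = 0, on a made-up example; NumPy assumed):

```python
import numpy as np

rng = np.random.default_rng(0)

# A fixed full-rank test matrix and a made-up right hand side.
A = np.vander(np.linspace(-1.0, 1.0, 20), 5, increasing=True)
b = rng.normal(size=20)

p, *_ = np.linalg.lstsq(A, b, rcond=None)
r = A @ p - b
s = np.linalg.svd(A, compute_uv=False)
kappa = s[0] / s[-1]                      # condition number of A

# Perturb only b; with delta(A) = 0 the bound simplifies to
# kappa * ||b - b~|| / sqrt(||b||^2 - ||r||^2).
db = 1e-6 * rng.normal(size=20)
pt, *_ = np.linalg.lstsq(A, b + db, rcond=None)

actual = np.linalg.norm(p - pt) / np.linalg.norm(p)
bound = kappa * np.linalg.norm(db) / np.sqrt(
    np.linalg.norm(b)**2 - np.linalg.norm(r)**2)
print(actual, "<=", bound)
```

Here sqrt(||b||^2 - ||r||^2) = ||A p||, since the optimal residual is perpendicular to A p.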
  • The situation ''more equations than unknowns'' mentioned above appears quite often in applications. Imagine you have a set of measured data (x_i, y_i), i = 1,...,N,
    and a ''model'' for the dependence of y on x, that is an ''Ansatz'' with given functions φ_j(x)
    y = F(x;p) = p_1 φ_1(x) + ... + p_n φ_n(x) + ε(x)
    which depends linearly on the unknown parameters p_j, j = 1,..,n, and where you have reason to assume that ε(x_i) is ''small noise''. Inserting the data into this model with ε(x_i) neglected leads to the situation above with an N × n matrix A with the elements
        A_(i,j) = φ_j(x_i)  and  b_i = y_i .
  • Using the euclidean norm results in minimization of the sum of squares
    Σ_(i=1)^N ( F(x_i;p) - y_i )^2
    w.r.t. p.
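The construction of A from an Ansatz might be sketched as follows (made-up data and a hypothetical quadratic model; NumPy assumed):

```python
import numpy as np

# Made-up measured data (x_i, y_i), i = 1..N.
x = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 2.5])
y = np.array([1.1, 1.8, 3.2, 5.1, 7.9, 11.2])

# Ansatz F(x;p) = p_1*1 + p_2*x + p_3*x^2, i.e. three basis functions phi_j.
basis = [lambda t: np.ones_like(t), lambda t: t, lambda t: t**2]

# Design matrix A with A[i, j] = phi_j(x_i), and b_i = y_i.
A = np.column_stack([phi(x) for phi in basis])
p, res, rank, sv = np.linalg.lstsq(A, y, rcond=None)

# The sum of squares sum_i (F(x_i;p) - y_i)^2 that is minimized:
print(np.sum((A @ p - y)**2))
```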
  • The method of least squares also has a justification in statistics: if you assume that ''small noise'' above means normally and independently distributed random variables ε(x_i) with mean zero and variance independent of i, then it turns out that the expectation value of the computed p is the ''true'' parameter of the model.
  • The QR decomposition of the matrix provides one way to diminish the influence of the ill-conditioning present in the normal equations. It uses the fact that the euclidean length is invariant under orthogonal transformations and hence applies such transformations with the aim of transforming the system A p = b into the form
    (Q A)^T = ( R^T , O^T )
    R p = c_1
    O p = c_2

    where R is triangular and square, O is a zero matrix of N-n rows, and c = Q b. Since the multiplication with Q does not change the rank, R will be nonsingular. Hence the incompatibility of the system is contained in the second equation only, with ||c_2|| being the length of the optimal residual. The optimal parameter p is obtained simply by back substitution in the first equation. Formally this is equivalent to writing
    A = Q^T ( R^T , O^T )^T
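Using NumPy's QR routine, this solution path might look as follows (note that `np.linalg.qr` returns A = Q R, i.e. its Q is the transpose of the Q used in the text; made-up data):

```python
import numpy as np

A = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0],
              [1.0, 4.0]])
b = np.array([2.1, 3.9, 6.2, 7.8])
N, n = A.shape

# Full QR: A = Q Rfull with Rfull = (R^T, O^T)^T, Q of size N x N.
Q, Rfull = np.linalg.qr(A, mode='complete')
R = Rfull[:n, :]          # square triangular part
c = Q.T @ b               # c = (c_1; c_2) in the notation of the text
c1, c2 = c[:n], c[n:]

# Back substitution in R p = c_1 (here via a general solver) gives the
# least squares solution; ||c_2|| is the length of the optimal residual.
p = np.linalg.solve(R, c1)
print(p)
print(np.linalg.norm(c2))  # equals ||A p - b||
```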
  • The matrix Q is in this application not computed explicitly, but kept in factorized form as the product of n (n-1 in the case N = n) so called Householder matrices. These are special orthogonal matrices performing a reflection on a hyperplane with normal u_i:
    I - 2 u_i u_i^T / (u_i^T u_i) ,  i = 1,..,n .
    The vector u_i is the normal of this hyperplane and can be determined easily from the job of step i in this decomposition: annihilate the elements (i+1,i),...,(N,i) in the (current) matrix. If you think of these elements as a column, then for example
    (8, 3, 1, -1, 4, -3)^T  ->  (-10, 0, 0, 0, 0, 0)^T
    with
    u = (18, 3, 1, -1, 4, -3)^T .
  • Given these normals u_i, any desired operation with the matrix Q, which is the product of the Householder matrices, can be accomplished without ever forming Q explicitly.
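The reflection formula and the example vector above can be checked directly (a small NumPy sketch):

```python
import numpy as np

def householder_apply(u, x):
    """Apply the reflection I - 2 u u^T / (u^T u) to the vector x."""
    return x - 2.0 * (u @ x) / (u @ u) * u

# The example from the text: annihilate all but the first component.
x = np.array([8.0, 3.0, 1.0, -1.0, 4.0, -3.0])

# Normal of the reflecting hyperplane: u = x + sign(x_1) * ||x|| * e_1
# (the sign choice avoids cancellation); here ||x|| = 10.
u = x.copy()
u[0] += np.sign(x[0]) * np.linalg.norm(x)    # u = (18, 3, 1, -1, 4, -3)^T
print(householder_apply(u, x))               # -> (-10, 0, 0, 0, 0, 0)^T
```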
  • One can show that the computation and application of these transformations is numerically backward stable in the sense that the computed solution p is the exact solution of a problem which is a perturbation of the original system of the order of the machine roundoff.
  • It should be obvious that the choice of the ''Ansatz'' heavily influences the condition number of A and hence the sensitivity of the coefficients. The application here allows you to test this, for example using different bases of the n-dimensional linear space of polynomials of maximum degree n-1 (e.g. monomial, Legendre and Chebyshev). The following short table, with the abscissas x_i equidistant in [-1,1], demonstrates this quite well. For the monomial basis on intervals like [1,5] the situation becomes even worse!
    The table shows the reciprocal condition number of A, i.e. 1/κ.
      N   n   Monomial        Chebyshev
     10   5   0.5982810E-001  0.6058193E+000
     10  10   0.2161485E-003  0.6858268E-001
     20   5   0.5834324E-001  0.5135544E+000
     20  10   0.7595056E-003  0.4190469E+000
     30   5   0.5688019E-001  0.4786713E+000
     30  10   0.8214191E-003  0.4422010E+000
     30  15   0.8670296E-005  0.2740692E+000
     40   5   0.5599882E-001  0.4613035E+000
     40  10   0.8290337E-003  0.4275510E+000
     40  15   0.9995193E-005  0.3545616E+000
     40  20   0.9708376E-007  0.1313730E+000
     50   5   0.5542571E-001  0.4509568E+000
     50  10   0.8266143E-003  0.4072196E+000
     50  15   0.1050676E-004  0.3782863E+000
     50  20   0.1163482E-006  0.2202498E+000
    100   5   0.5418598E-001  0.4298162E+000
    100  10   0.8072133E-003  0.3451954E+000
    100  15   0.1084858E-004  0.3424422E+000
    100  20   0.1387583E-006  0.3193107E+000
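Numbers of this kind can be reproduced approximately with NumPy's Vandermonde helpers (a sketch; the exact values may differ slightly from the table depending on the scaling conventions used):

```python
import numpy as np
from numpy.polynomial import chebyshev

def rcond(A):
    """Reciprocal condition number 1/kappa via the singular values."""
    s = np.linalg.svd(A, compute_uv=False)
    return s[-1] / s[0]

N, n = 30, 10
x = np.linspace(-1.0, 1.0, N)      # equidistant abscissas in [-1, 1]

# Monomial basis 1, x, ..., x^(n-1) vs Chebyshev basis T_0, ..., T_(n-1).
A_mono = np.vander(x, n, increasing=True)
A_cheb = chebyshev.chebvander(x, n - 1)

print(rcond(A_mono))   # small: badly conditioned
print(rcond(A_cheb))   # orders of magnitude larger
```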

  • This program gives you three choices: either you specify a problem explicitly by the coefficients A and b; or you play with some linear least squares models which are predefined here (for example it might be instructive to play with a polynomial model using different intervals on the real axis); or finally you specify an ''Ansatz'' of your own and work either with your own data or with artificially generated ones.
 

Input

    Case 1: explicit specification of matrix and right hand side:
 
  • row and column number of the matrix
  • the coefficients of A with the elements of b appended, rowwise
    Case 2: predefined models with data generation:
 
  • You choose a model.
  • You specify an interval [a,b] for the ''free'' variable x.
  • You specify the number N of data points.
  • You also specify the amount of generated ''noise''. Random numbers r_i, uniformly (''rectangularly'') distributed in [0,1], produce perturbations in the interval [-errlev, errlev]·y_max_abs, where errlev can be chosen. That means that the data are disturbed with an absolute error which is proportional to the largest function value of F. errlev = 0 is possible and shows the effect of roundoff errors on the computation.
  • The abscissas x_i are generated equidistantly in [a,b].
  • Then y_i is computed as y_i = F(x_i; p_1,...) + 2·(½ - r_i)·errlev·y_max_abs.
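This generation rule might be sketched as follows (a hypothetical model F and made-up parameters; NumPy assumed):

```python
import numpy as np

rng = np.random.default_rng(1)

def generate_data(F, p, a, b, N, errlev):
    """Equidistant abscissas in [a,b]; uniform noise of relative size errlev."""
    x = np.linspace(a, b, N)
    y_exact = F(x, p)
    y_max_abs = np.max(np.abs(y_exact))
    r = rng.uniform(size=N)                       # r_i uniform in [0, 1)
    # Perturbation 2*(1/2 - r_i)*errlev*y_max_abs lies in
    # [-errlev, errlev] * y_max_abs.
    y = y_exact + 2.0 * (0.5 - r) * errlev * y_max_abs
    return x, y

# Example: a made-up quadratic model with 3 percent noise.
F = lambda x, p: p[0] + p[1] * x + p[2] * x**2
x, y = generate_data(F, [1.0, -2.0, 0.5], -1.0, 1.0, 50, 0.03)
```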
  • You have the choice between four models with 5 or 6 coefficients.
    1. A polynomial using the monomials 1, x, x^2, ..., x^5 as basis. You have the additional choice of solving the problem after a linear transformation of [a,b] to [-1,1].
    2. A polynomial with the Legendre polynomials L_0(x),...,L_5(x) as basis. These are
      L_0(x) = 1 ,  L_1(x) = x ,  L_2(x) = ½ (3x^2 - 1) , ...
    3. A trigonometric model with the basis 1, sin(c_1 π x), cos(c_2 π x), sin(c_3 π x), cos(c_4 π x).
    4. An exponential model with fixed exponents: 1, exp(c_i x), i = 1,2,3,4.
    5. In the last two cases you may choose the constants c_i yourself.
  • You specify the ''true'' coefficients p.
    Case 3: A model of your own:
 
  • In this case you must write a piece of code, following FORTRAN conventions, which computes one row of A for a given ''instance'' x. Input here is x and n; output is one row of A. Row and column numbers are specified separately.
  • With the ''Ansatz'' so defined you either may generate artificial data (in this case the abscissas are chosen equidistantly in a given interval and you specify the ''true'' coefficients and a perturbation level as in case 2) or
  • you fit given own discrete data.
 

Output

 
  • In case 1 you get the solution vector p and the euclidean length of the residual. You also get the reciprocal condition number 1/κ.
  • In case 2 you get:
      • your chosen ''true'' coefficients p_0,...,p_n;
      • the computed coefficients, showing the influence of roundoff and/or generated noise, both for the solution via the normal equations and via the QR decomposition;
      • a plot showing the discrete data points and the approximating function F(x;p) with p from the QR solution;
      • the reciprocal condition number 1/κ.
  • The same output as in case 2 is also provided for case 3, an ''Ansatz'' of your own.
 

Questions ?!

 
  • If a small errlev generates large deviations in p: what do you conclude from this?
  • In which cases do the results from the normal equations and from the QR solution differ most, and why? Can you see a connection between these differences and the reciprocal condition number?
  • What role does the transformation to [-1,1] play in the case of a polynomial model?
  • What difference do you observe between the monomial and the Legendre basis?
  • What do you expect if, in case 2 with the exponential model, you choose some c_i = 0?
  • How does the upper bound for the change in the solution given above compare with the one you can observe here, if you first set errlev to zero and then to some realistic value of the magnitude of a few percent (like measurement errors)?
 


 

20.01.2014