Linear least squares: QR decomposition and normal equations
- A linear system
A p = b
with more equations than unknowns will typically have no solution,
and hence one searches for a ''quasi-solution''. This might be the
minimizer of the residual vector
r = A p - b
in some norm. Linear least squares uses the Euclidean norm, which is
the simplest case. The solution will be unique if A has full column
rank, which we assume here. For the more general case see the
singular value decomposition (SVD).
In the full rank case the solution can be explicitly written
down as
p = (A^T A)^{-1} A^T b
or
(A^T A) p = A^T b  <=>  A^T (A p - b) = A^T r = 0 ,
the so-called ''normal equations'', saying that the optimal residual is perpendicular to the columns of A.
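- As a minimal illustration of the normal equations (a Python/NumPy sketch, not part of the program described here; matrix and right hand side are made up):

    import numpy as np

    # Overdetermined system: N = 5 equations, n = 2 unknowns (made-up data)
    A = np.array([[1.0, 0.0],
                  [1.0, 1.0],
                  [1.0, 2.0],
                  [1.0, 3.0],
                  [1.0, 4.0]])
    b = np.array([1.1, 1.9, 3.2, 3.9, 5.1])

    # Normal equations: (A^T A) p = A^T b
    p = np.linalg.solve(A.T @ A, A.T @ b)

    # The optimal residual r = A p - b is perpendicular to the columns of A
    r = A @ p - b
    print(p, A.T @ r)   # A^T r should vanish up to roundoff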
- The direct application of this representation is often subject to
severe ill-conditioning, leading to destructive roundoff propagation,
and hence other solution methods are applied, see below.
''Ill-conditioning'' means that small variations in b or A
give large variations in p.
- A result in this direction asks for the difference between the solutions of the
original problem
||A p - b|| = min_p
and the perturbed one
||Ã p̃ - b̃|| = min_p̃ ,
which is as follows:
  - Let
    κ = ||Σ|| ||Σ^{-1}||
    with the singular value decomposition
    A = U Σ V^T
    (κ is the condition number of A).
  - E = A - Ã , δ(b) = ||b - b̃|| .
  - Assume that with δ(A) = ||E||/||A|| there holds
    2 κ δ(A) < 1 .
  - Then, with r the minimal residual of the original problem,
    ||p - p̃|| / ||p|| <= ( κ / (1 - 2 κ δ(A)) ) × ( δ(A) + ( δ(b) + κ δ(A) ||r|| ) / ( ||b||^2 - ||r||^2 )^{1/2} ) .
This means that the condition number of the so-called normal equations (κ^2)
appears here only multiplied by the norm of the residual, and that if κ ||r|| <= 1
the sensitivity of the least squares problem is essentially identical to the
sensitivity of a (solvable) linear system A p = b, whereas a large residual problem
can be very ill-conditioned.
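- The bound can be checked numerically; a hedged Python/NumPy sketch (the test matrix, the perturbation size, and the use of the spectral norm are assumptions for illustration):

    import numpy as np

    rng = np.random.default_rng(0)
    N, n = 20, 4
    A = rng.standard_normal((N, n))
    b = rng.standard_normal(N)

    p, *_ = np.linalg.lstsq(A, b, rcond=None)   # unperturbed solution
    r = A @ p - b                               # minimal residual

    s = np.linalg.svd(A, compute_uv=False)
    kappa = s[0] / s[-1]                        # condition number of A

    E  = 1e-8 * rng.standard_normal((N, n))     # perturbation of A
    db = 1e-8 * rng.standard_normal(N)          # perturbation of b
    pt, *_ = np.linalg.lstsq(A + E, b + db, rcond=None)

    dA = np.linalg.norm(E, 2) / np.linalg.norm(A, 2)   # delta(A)
    dB = np.linalg.norm(db)                            # delta(b)
    lhs = np.linalg.norm(p - pt) / np.linalg.norm(p)
    rhs = kappa / (1 - 2*kappa*dA) * (
        dA + (dB + kappa*dA*np.linalg.norm(r))
             / np.sqrt(np.linalg.norm(b)**2 - np.linalg.norm(r)**2))
    print(lhs <= rhs, lhs, rhs)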
- The above-mentioned situation ''more equations than unknowns'' appears
quite often in applications. Imagine you have a set of measured data
(x_i, y_i), i = 1,...,N,
and a ''model'' for the dependence of y on x, that is,
an ''Ansatz'' with given functions φ_j(x),
y = F(x;p) = p_1 φ_1(x) + ... + p_n φ_n(x) + ε(x) ,
which depends linearly on the unknown parameters p_j, j = 1,...,n, and where you
have reason to assume that ε(x_i) is ''small noise''.
Inserting the data into this model with ε(x_i) neglected leads to the situation above with
an N × n matrix A with the elements
A_{i,j} = φ_j(x_i) and b_i = y_i .
- Using the Euclidean norm results in minimization of the sum of squares
∑_{i=1}^{N} ( F(x_i;p) - y_i )^2
w.r.t. p.
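- To make the construction of A concrete, a small Python/NumPy sketch (the basis functions φ_j and the data are invented for illustration):

    import numpy as np

    # Made-up data (x_i, y_i), i = 1,...,N
    x = np.linspace(-1.0, 1.0, 20)
    y = 0.5 - 1.5*x + 2.0*x**2 \
        + 0.01*np.random.default_rng(1).standard_normal(x.size)

    # Ansatz F(x;p) = p_1 phi_1(x) + p_2 phi_2(x) + p_3 phi_3(x)
    phis = [lambda t: np.ones_like(t), lambda t: t, lambda t: t**2]

    # Design matrix A_{i,j} = phi_j(x_i), right hand side b_i = y_i
    A = np.column_stack([phi(x) for phi in phis])
    p, *_ = np.linalg.lstsq(A, y, rcond=None)
    print(p)   # close to the ''true'' coefficients (0.5, -1.5, 2.0)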
- The method of least squares also has a justification in statistics: if you
assume that ''small noise'' above means normally and independently distributed random
variables ε(x_i) with mean zero and a variance independent of i,
then it turns out that the expected value of the computed p is the ''true'' parameter
in the model.
- The QR decomposition of the matrix provides one way to diminish the influence of the ill-conditioning
present in the normal equations. It uses the fact that the Euclidean length is invariant under orthonormal
transformations and hence applies such transformations with the aim of transforming the system
A p = b into the form
(Q A)^T = ( R^T, O^T ) ,
R p = c_1 ,
O p = c_2 ,
where R is triangular and square, O is a zero matrix of N-n rows, and c = Q b
(split accordingly into c_1 and c_2). Since the multiplication with Q does not change the rank, R will be
nonsingular. Hence the incompatibility of the system sits in the second equation only, with ||c_2||
being the length of the optimal residual. The optimal parameter p is obtained simply by
back substitution in the first equation. Formally this is equivalent to writing
A = Q^T ( R^T, O^T )^T .
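- A sketch of this solution path with NumPy's built-in QR (which returns the ''economy'' form A = Q R, with Q of size N × n having orthonormal columns; the data are made up):

    import numpy as np

    rng = np.random.default_rng(2)
    A = rng.standard_normal((8, 3))
    b = rng.standard_normal(8)

    # Reduced QR: A = Q R with R upper triangular and square
    Q, R = np.linalg.qr(A)

    # c_1 = Q^T b; solving R p = c_1 stands in for explicit back substitution
    c1 = Q.T @ b
    p = np.linalg.solve(R, c1)

    # The residual length equals ||c_2|| = sqrt(||b||^2 - ||c_1||^2)
    print(np.linalg.norm(A @ p - b),
          np.sqrt(np.linalg.norm(b)**2 - np.linalg.norm(c1)**2))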
- In this application the matrix Q is not computed explicitly,
but kept in factorized form as the product of n (in the case N = n, of n-1) so-called
Householder matrices. These are special orthonormal matrices performing a reflection at a
hyperplane with normal u_i:
I - 2 u_i u_i^T / (u_i^T u_i) , i = 1,...,n .
The vector u_i is the normal of this hyperplane and can be determined easily
from the job of step i in this decomposition: annihilate the elements (i+1,i),...,(N,i)
of the (current) matrix. If you think of these elements as a column, then for example
(8,3,1,-1,4,-3)^T -> (-10,0,0,0,0,0)^T
with
u = (18,3,1,-1,4,-3)^T .
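- This reflection can be verified in a few lines (Python/NumPy sketch):

    import numpy as np

    x = np.array([8.0, 3.0, 1.0, -1.0, 4.0, -3.0])

    # u = x + sign(x_1) ||x|| e_1; here ||x|| = 10, so u = (18,3,1,-1,4,-3)^T
    u = x.copy()
    u[0] += np.sign(x[0]) * np.linalg.norm(x)

    # Apply H x = x - 2 u (u^T x)/(u^T u) without forming H explicitly
    Hx = x - 2.0 * u * (u @ x) / (u @ u)
    print(Hx)   # (-10, 0, 0, 0, 0, 0) up to roundoff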
- Given those normals u_i, any desired operation with the matrix Q, which is
the product of the Householder matrices, can be accomplished.
- One can show that the computation and the application of these transformations is numerically backward
stable in the sense that the computed solution p is the exact solution of a problem which is
a perturbation of the original system of the order of machine roundoff.
- It should be obvious that the choice of the ''Ansatz'' heavily influences the
condition number of A and hence the sensitivity of the coefficients. The
application here allows you to test this, for example using different bases of the
n-dimensional linear space of polynomials of maximum degree n-1
(e.g. monomial, Legendre and Chebyshev).
The following short table, with the abscissas x_i equidistant in
[-1,1], demonstrates this quite well; a sketch for reproducing numbers of
this kind follows the table. For the monomial basis on intervals
like [1,5] the situation becomes even worse!
The table shows the reciprocal condition number of A, i.e. 1/κ.
N | n | Monomial | Chebyshev |
10 | 5 | 0.5982810E-001 | 0.6058193E+000 |
10 | 10 | 0.2161485E-003 | 0.6858268E-001 |
20 | 5 | 0.5834324E-001 | 0.5135544E+000 |
20 | 10 | 0.7595056E-003 | 0.4190469E+000 |
30 | 5 | 0.5688019E-001 | 0.4786713E+000 |
30 | 10 | 0.8214191E-003 | 0.4422010E+000 |
30 | 15 | 0.8670296E-005 | 0.2740692E+000 |
40 | 5 | 0.5599882E-001 | 0.4613035E+000 |
40 | 10 | 0.8290337E-003 | 0.4275510E+000 |
40 | 15 | 0.9995193E-005 | 0.3545616E+000 |
40 | 20 | 0.9708376E-007 | 0.1313730E+000 |
50 | 5 | 0.5542571E-001 | 0.4509568E+000 |
50 | 10 | 0.8266143E-003 | 0.4072196E+000 |
50 | 15 | 0.1050676E-004 | 0.3782863E+000 |
50 | 20 | 0.1163482E-006 | 0.2202498E+000 |
100 | 5 | 0.5418598E-001 | 0.4298162E+000 |
100 | 10 | 0.8072133E-003 | 0.3451954E+000 |
100 | 15 | 0.1084858E-004 | 0.3424422E+000 |
100 | 20 | 0.1387583E-006 | 0.3193107E+000 |
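- Numbers of this kind can be reproduced with a Python/NumPy sketch like the following (np.polynomial supplies Vandermonde-type matrices for both bases; exact agreement with the table depends on details such as column scaling):

    import numpy as np
    from numpy.polynomial import chebyshev, polynomial

    for N, n in [(10, 5), (40, 20)]:
        x = np.linspace(-1.0, 1.0, N)
        A_mon = polynomial.polyvander(x, n - 1)  # basis 1, x, ..., x^{n-1}
        A_che = chebyshev.chebvander(x, n - 1)   # basis T_0, ..., T_{n-1}
        for name, A in [("monomial", A_mon), ("Chebyshev", A_che)]:
            s = np.linalg.svd(A, compute_uv=False)
            print(N, n, name, s[-1] / s[0])      # reciprocal condition 1/kappa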
- This program gives you three choices: either you specify a problem explicitly by
the coefficients of A and b; or you play with some linear least squares
models which are predefined here (for example it might be instructive to play with a
polynomial model using different intervals on the real axis); or, finally, you specify an ''Ansatz''
of your own and work either with your own data or with artificially generated ones.
Input

Case 1: explicit specification of matrix and right hand side:
- row and column number of the matrix
- the coefficients of A, with the elements of b appended, row-wise
Case 2: predefined models with data generation:
- You choose a model.
- You specify an interval [a,b] for the ''free'' variable x.
- You specify the number of data points N.
- You also specify the amount of generated ''noise''. This is a set of random numbers r_i,
uniformly distributed in [0,1], producing perturbations in the interval
[-errlev, errlev] · y_max_abs,
where errlev can be chosen. That means that the data are disturbed with an absolute error
which is proportional to the largest absolute function value of F.
errlev = 0 is possible and shows the effect of roundoff errors on the computation.
- The abscissas x_i are generated equidistantly in [a,b].
- Then y_i is computed as
y_i = F(x_i; p_1,...) + 2·(½ - r_i)·errlev·y_max_abs ;
a sketch of this step follows the list.
- You have the choice between four models with 5 or 6 coefficients:
  - A polynomial using the monomials 1, x, x^2, ..., x^5
    as basis. You have the additional choice of solving the problem after a linear
    transformation of [a,b] to [-1,1].
  - A polynomial with the Legendre polynomials L_0(x),...,L_5(x)
    as basis. These are
    L_0(x) = 1, L_1(x) = x,
    L_2(x) = ½(3x^2 - 1), ...
  - A trigonometric model with the basis 1, sin(c_1 π x),
    cos(c_2 π x), sin(c_3 π x), cos(c_4 π x).
  - An exponential model with the fixed basis functions 1, exp(c_i x), i = 1,...,4.
- In the last two cases you may choose the constants c_i yourself.
- You specify the ''true'' coefficients p.
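- The data generation of this case can be mimicked as follows (Python/NumPy sketch; the model F, its coefficients and errlev are placeholder choices):

    import numpy as np

    def generate_data(F, p, a, b, N, errlev, seed=0):
        """Equidistant abscissas in [a,b]; uniform noise scaled by max |F|."""
        rng = np.random.default_rng(seed)
        x = np.linspace(a, b, N)
        y_exact = F(x, p)
        y_max_abs = np.max(np.abs(y_exact))
        r = rng.uniform(0.0, 1.0, N)                 # r_i uniform in [0,1]
        y = y_exact + 2.0*(0.5 - r)*errlev*y_max_abs
        return x, y

    # Example: quadratic model with 5 percent noise (illustrative values)
    x, y = generate_data(lambda t, p: p[0] + p[1]*t + p[2]*t**2,
                         [1.0, -2.0, 0.5], -1.0, 1.0, 30, 0.05)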
Case 3: a model of your own:
- In this case you must write a piece of code, following FORTRAN conventions,
which computes one row of A for a given ''instance'' x.
Input here are x and n, output is one row of A.
Row and column numbers are specified separately. An illustrative sketch follows this list.
- With the ''Ansatz'' defined this way you either may generate artificial data
(in this case the abscissas are chosen equidistantly in a given interval
and you specify the ''true'' coefficients and a perturbation level as in case 2)
or
- you fit your own given discrete data.
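- The program itself expects this routine in FORTRAN; purely to illustrate what it computes, a Python analogue for an invented exponential ''Ansatz'' might look like this (function name and constants are hypothetical):

    import numpy as np

    def row_of_A(x, n, c=(0.0, -1.0, 1.0, 2.0)):
        """One row of A for the Ansatz exp(c_1 x), ..., exp(c_n x);
        c_1 = 0 makes the first basis function the constant 1."""
        return np.array([np.exp(c[j] * x) for j in range(n)])

    # Stacking the rows for all abscissas yields the full N x n matrix A
    xs = np.linspace(0.0, 1.0, 10)
    A = np.vstack([row_of_A(xi, 4) for xi in xs])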
Output
- In case 1 you get the solution vector p and the Euclidean length of the residual.
You also get the reciprocal condition number 1/κ.
- In case 2 you get
  - your chosen ''true'' coefficients p_1,...,p_n,
  - the computed coefficients, showing the influence of roundoff and/or generated noise,
    for both solution paths, the normal equations and the QR decomposition,
  - a plot showing the discrete data points and the approximating function F(x;p),
    with p from the QR solution path,
  - the reciprocal condition number 1/κ.
- The same output as in case 2 is also provided for case 3, an ''Ansatz'' of your own.
Questions ?!
- If a small errlev generates large deviations in p: what do you conclude
from this?
- In which cases do the results from the normal equations and from the QR solution path differ most,
and why? Can you see a connection between these differences and the reciprocal
condition number?
- Which role does the transformation to [-1,1] play in the case of a polynomial model?
- Which differences do you observe between the monomial and the Legendre basis?
- What do you expect if, in case 2 with the exponential model, you choose some c_i = 0?
- How does the upper bound for the change in the solution given above compare
with the one you can observe here, if you first set errlev to zero and then
to some realistic value of the magnitude of a few percent (like measurement errors)?