- Any real or complex matrix can be decomposed as
A = U S V^H
where S is a matrix of the same shape as A,
with nonnegative entries on the diagonal and zeros everywhere else,
the so-called singular values,
and U and V unitary. Of course U has as many rows as A,
whereas V has as many rows as A has columns; both U and V are square.
The superscript ''H'' means ''complex conjugate and transpose''.
The columns of U are the so-called left singular vectors and those of
V the right singular vectors. In many texts you will see this simply under
the name SVD.
There is also an ''economy'' form of the SVD in which the S part is square like V,
omitting a zero block of S and the columns of U which correspond to this
zero block.
Here we use the full form.
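A minimal numerical sketch of the two forms, assuming Python with numpy (not part of
the original text); the flag full_matrices of np.linalg.svd switches between the full
and the economy form:

```python
import numpy as np

A = np.random.rand(5, 3) + 1j * np.random.rand(5, 3)

# full form: U is 5x5, V is 3x3, S has the shape of A
U, s, Vh = np.linalg.svd(A, full_matrices=True)      # Vh is V^H
S = np.zeros(A.shape)
np.fill_diagonal(S, s)
print(np.allclose(U @ S @ Vh, A))                    # A = U S V^H

# economy form: the zero block of S and the matching columns of U are dropped
Ue, se, Vhe = np.linalg.svd(A, full_matrices=False)
print(Ue.shape, Vhe.shape)                           # (5, 3) (3, 3)
print(np.allclose(Ue @ np.diag(se) @ Vhe, A))
```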
- If one uses a suitable consistent ordering, then the columns of
U resp. V can be obtained as eigenvectors of A A^H resp.
A^H A, and the nonzero entries of S
as the square roots of the nonzero eigenvalues of A^H A.
But this is not the way they are computed in practice, not only because of the
numerical effort involved but also because this form of computation might
introduce roundoff errors far above the natural perturbation sensitivity
of these data.
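As a quick check of this relation (not a recommended computational route, as just said),
here is a small sketch assuming Python with numpy:

```python
import numpy as np

A = np.random.rand(5, 3)
s = np.linalg.svd(A, compute_uv=False)       # singular values, descending

w = np.linalg.eigvalsh(A.conj().T @ A)       # eigenvalues of A^H A, ascending
print(np.allclose(np.sqrt(w[::-1]), s))      # their square roots are the s(i)
```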
- Rather, one works directly on A, using simple unitary operations
from the left
and the right with the aim of transforming it to diagonal form.
This is of course an iterative process, i.e. one has
U_k^H A V_k → S as k → ∞.
The nonnegativity
condition on the diagonal of S is realized by corresponding sign changes in
U or V.
The transformations used can be plane rotations or more complicated ones,
for example Householder reflectors.
There are different possibilities to choose from; here we follow the approach
proposed by Kahan, Golub and Reinsch.
- In the following we assume that the number of rows m of A
is not smaller than the number of columns n.
- There is a preliminary transformation to upper bidiagonal form J (only the
elements on the main diagonal and on the first superdiagonal are nonzero).
This is achieved by applying Householder matrices (as in the QR decomposition)
from the left and the right with the aim of producing zeros in column one, row one,
column two, row two, ... in this order.
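The following sketch, assuming Python with numpy and restricted to the real case for
simplicity, illustrates this bidiagonalization; the helper names householder and
bidiagonalize are ours, not library routines:

```python
import numpy as np

def householder(x):
    """Return v, beta such that (I - beta*v*v^T) x = (alpha, 0, ..., 0)^T
    with |alpha| = ||x||."""
    v = x.astype(float)
    alpha = -np.copysign(np.linalg.norm(x), x[0])    # sign chosen to avoid cancellation
    v[0] -= alpha
    beta = 0.0 if np.allclose(v, 0.0) else 2.0 / np.dot(v, v)
    return v, beta

def bidiagonalize(A):
    """Reduce a real m x n matrix (m >= n) to upper bidiagonal form by
    Householder reflectors: at step k the left reflector zeroes column k
    below the diagonal, the right reflector zeroes row k beyond the
    superdiagonal."""
    B = A.astype(float)
    m, n = B.shape
    for k in range(n):
        v, beta = householder(B[k:, k])                       # left reflector
        B[k:, k:] -= beta * np.outer(v, v @ B[k:, k:])
        if k < n - 2:
            v, beta = householder(B[k, k+1:])                 # right reflector
            B[k:, k+1:] -= beta * np.outer(B[k:, k+1:] @ v, v)
    return B

A = np.random.rand(6, 4)
B = bidiagonalize(A)
print(np.allclose(np.linalg.svd(A, compute_uv=False),
                  np.linalg.svd(B, compute_uv=False)))        # singular values unchanged
print(np.max(np.abs(np.tril(B, -1))), np.max(np.abs(np.triu(B, 2))))  # ~ roundoff
```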
- Now the singular values are the square roots of the eigenvalues of J^H J.
This is a tridiagonal hermitian matrix, hence the QL method with Wilkinson
shift would always be convergent. But, as before, we want to operate directly
on J. Here the Francis theorem comes into play:
Let T be an upper Hessenberg matrix with no vanishing subdiagonal element
(for J this means that no superdiagonal element vanishes),
let W and Q both be unitary, and let
W^H T W and Q^H T Q again be upper
Hessenberg. Further let W and
Q share a common first column. Then
W^H T W = D^H Q^H T Q D
with a unitary diagonal D, i.e. they are essentially identical.
This clearly applies to a tridiagonal
hermitian matrix, here T = J^H J. The diagonal similarity transformation
is irrelevant for our purposes, hence we can forget about it. We therefore
compute a first Givens rotation that annihilates the subdiagonal element (2,1)
of T - μ I, with μ the Wilkinson shift for T, and apply it
to J from the right. This destroys the bidiagonal form, which we
restore by further
Givens rotations applied from the left and the right until the undesired
element outside the bidiagonal form is driven out. Then we have a new
bidiagonal matrix J_+, and of course
T_+ = J_+^H J_+ is tridiagonal and unitarily
similar to T. From the structure of the Givens rotations applied from
the right it follows that their product, which generates the transition from
T to T_+, has the same first column as the first rotation alone,
and hence T_+ is the result of a QR step with shift μ.
From the convergence of this it follows that the
element (n-1,n) of J tends to zero in the course of the iteration.
Hence, once it has been reduced to the roundoff level, we can reduce the dimension
and proceed in the same way.
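Here is a sketch of one such implicitly shifted step on a real upper bidiagonal matrix,
again assuming Python with numpy; the function names are illustrative only:

```python
import numpy as np

def givens(a, b):
    """Return c, s with [[c, s], [-s, c]] applied to (a, b) giving (r, 0)."""
    if b == 0.0:
        return 1.0, 0.0
    r = np.hypot(a, b)
    return a / r, b / r

def golub_kahan_step(B):
    """One implicitly shifted QR step on a real upper bidiagonal n x n
    matrix B (n >= 2, no zero superdiagonal entries); returns a new matrix."""
    B = B.astype(float)
    n = B.shape[0]
    d, e = np.diag(B), np.diag(B, 1)
    # Wilkinson shift mu from the trailing 2x2 block of T = B^T B
    t11 = d[-2]**2 + (e[-3]**2 if n > 2 else 0.0)
    t22 = d[-1]**2 + e[-2]**2
    t12 = d[-2] * e[-2]
    delta = 0.5 * (t11 - t22)
    mu = t22 - t12**2 / (delta + np.copysign(np.hypot(delta, t12), delta))
    y, z = d[0]**2 - mu, d[0] * e[0]            # first rotation from T - mu*I
    for k in range(n - 1):
        c, s = givens(y, z)
        G = np.array([[c, s], [-s, c]])
        B[:, k:k+2] = B[:, k:k+2] @ G.T         # rotation from the right
        y, z = B[k, k], B[k+1, k]               # bulge below the diagonal
        c, s = givens(y, z)
        G = np.array([[c, s], [-s, c]])
        B[k:k+2, :] = G @ B[k:k+2, :]           # rotation from the left removes it
        if k < n - 2:
            y, z = B[k, k+1], B[k, k+2]         # bulge above the superdiagonal
    return B

d = np.array([4.0, 3.0, 2.0, 1.0]); e = np.array([1.0, 1.0, 1.0])
J = np.diag(d) + np.diag(e, 1)
J1 = golub_kahan_step(J)
print(np.allclose(np.linalg.svd(J, compute_uv=False),
                  np.linalg.svd(J1, compute_uv=False)))   # singular values unchanged
print(abs(J[-2, -1]), abs(J1[-2, -1]))                    # (n-1,n) element is expected to shrink
```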
- The SVD has multiple applications; one of them is the computation
of the Moore-Penrose pseudoinverse A^# of A.
The Moore-Penrose pseudoinverse is uniquely determined by the four requirements
- A A^# = (A A^#)^H
- A^# A = (A^# A)^H
- A (A^# A) = A
- A^# (A A^#) = A^#
- It is easy to verify that
A^# = V S^# U^H
with S^# obtained from S by transposing it and replacing its nonzero elements
by their reciprocal values satisfies these four rules.
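A small sketch, assuming Python with numpy, that builds A^# this way and checks the four
rules (note the transposed shape of S^#):

```python
import numpy as np

A = np.random.rand(5, 3)
U, s, Vh = np.linalg.svd(A, full_matrices=True)

Sp = np.zeros((A.shape[1], A.shape[0]))        # S^# has the transposed shape of S
np.fill_diagonal(Sp, 1.0 / s)                  # reciprocals of the nonzero singular values
Ap = Vh.conj().T @ Sp @ U.conj().T             # A^# = V S^# U^H

print(np.allclose(Ap, np.linalg.pinv(A)))      # agrees with numpy's pinv
print(np.allclose(A @ Ap, (A @ Ap).conj().T),  # the four Moore-Penrose conditions
      np.allclose(Ap @ A, (Ap @ A).conj().T),
      np.allclose(A @ Ap @ A, A),
      np.allclose(Ap @ A @ Ap, Ap))
```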
- In linear least squares problems
one minimizes
||r||_2 = ||Ax - b||_2
for given A and right-hand side b with respect to x.
In this application the SVD is the ''ultima ratio''
if A is very ill-conditioned or even rank deficient.
In the rank deficient case the minimizer is not unique, and the SVD gives
the minimizer of minimal euclidean length, which is unique, as
x* = A^# b
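A sketch of the rank deficient case, assuming Python with numpy; both np.linalg.pinv and
np.linalg.lstsq deliver the minimum-norm minimizer:

```python
import numpy as np

# rank-deficient example: third column = column 1 + column 2
A = np.random.rand(6, 2)
A = np.hstack([A, A[:, :1] + A[:, 1:2]])
b = np.random.rand(6)

x_star = np.linalg.pinv(A) @ b                     # x* = A^# b
x_lstsq = np.linalg.lstsq(A, b, rcond=None)[0]     # also the minimum-norm minimizer
print(np.allclose(x_star, x_lstsq))

# any other minimizer has the same residual but larger euclidean length
null = np.array([1.0, 1.0, -1.0])                  # A @ null = 0
x_other = x_star + 0.5 * null
print(np.allclose(A @ x_other - b, A @ x_star - b))
print(np.linalg.norm(x_other) > np.linalg.norm(x_star))
```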
- In real life a rank deficient case will not reveal itself by s(i) = 0
but by some ''very small'' singular values on the order of the machine precision
times the matrix norm. Here one introduces the concept of a ''numerical rank'':
set
s(i) := 0 if
s(i) <= α max_j {s(j)}
with a user defined value for α.
Then the numerical rank is the number of nonzero s(i).
This is a kind of regularization of the problem, and the least squares
solution now depends on α.
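A sketch of the numerical rank idea, assuming Python with numpy; the helper
truncated_svd_solve and the chosen values of α are illustrative only:

```python
import numpy as np

def truncated_svd_solve(A, b, alpha):
    """Least squares via the SVD, treating singular values below
    alpha * s_max as exact zeros (the 'numerical rank' idea)."""
    U, s, Vh = np.linalg.svd(A, full_matrices=False)
    keep = s > alpha * s[0]
    rank = np.count_nonzero(keep)                     # numerical rank
    x = Vh[keep].conj().T @ ((U[:, keep].conj().T @ b) / s[keep])
    return x, rank

# nearly rank-deficient: third column is almost column 1 + column 2
A = np.random.rand(6, 2)
A = np.hstack([A, A[:, :1] + A[:, 1:2] + 1e-13 * np.random.rand(6, 1)])
b = np.random.rand(6)

x1, r1 = truncated_svd_solve(A, b, alpha=1e-15)   # keeps the tiny singular value
x2, r2 = truncated_svd_solve(A, b, alpha=1e-10)   # drops it
print(r1, r2)                                     # e.g. 3 2
print(np.linalg.norm(x1), np.linalg.norm(x2))     # the solution depends on alpha
```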
- The condition number of a full rank rectangular matrix is defined as the quotient of
its largest and smallest singular values.
- The sensitivity of the singular values of a matrix is similar to the
sensitivity of the eigenvalues of a hermitian matrix: if A and B
are two conforming matrices, then their singular values s_{A,i}, s_{B,i}
in the natural (descending) ordering satisfy
|s_{A,i} - s_{B,i}| <= ||A-B||
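A quick numerical check of this inequality, assuming Python with numpy and using the
spectral norm on the right-hand side:

```python
import numpy as np

A = np.random.rand(6, 4)
B = A + 1e-3 * np.random.rand(6, 4)        # a small perturbation of A

sA = np.linalg.svd(A, compute_uv=False)    # both returned in descending order
sB = np.linalg.svd(B, compute_uv=False)
print(np.max(np.abs(sA - sB)) <= np.linalg.norm(A - B, 2))   # expected: True
```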
- The sensitivity of a linear least squares problem can be expressed via the condition
number so defined: let
||Ax - a|| = min_x , ||By - b|| = min_y
be two linear least squares problems with solutions x*, y*,
κ = cond(A), β = ||B-A||/||A||,
and assume
2 κ β < 1 .
Then
||x*-y*||/||x*|| <= (1/(1-2 κ β)) (κ β + κ (||b-a|| + κ β ||Ax*-a||)/(||a||^2 - ||Ax*-a||^2)^{1/2})
Observe that the condition number κ^2 of the matrix of the
normal equations appears here only in combination with the (hopefully) small residual norm.
This is the main reason why solving least squares problems via the normal equations is
considered bad practice.
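The following sketch, assuming Python with numpy and using a Vandermonde matrix as an
illustrative ill-conditioned example, contrasts an SVD-based solver with the normal
equations:

```python
import numpy as np

# ill-conditioned least squares problem: polynomial (monomial) basis on [0, 1]
m, n = 20, 8
t = np.linspace(0.0, 1.0, m)
A = np.vander(t, n, increasing=True)
x_true = np.ones(n)
b = A @ x_true                                          # consistent, tiny residual

x_svd = np.linalg.lstsq(A, b, rcond=None)[0]            # SVD-based solution
x_ne = np.linalg.solve(A.T @ A, A.T @ b)                # normal equations

print(np.linalg.cond(A), np.linalg.cond(A.T @ A))       # kappa versus kappa^2
print(np.linalg.norm(x_svd - x_true) / np.linalg.norm(x_true))
print(np.linalg.norm(x_ne - x_true) / np.linalg.norm(x_true))   # typically much larger
```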
- Another application of the SVD is the following:
imagine you have a point set in 3D space which should lie on a plane but
doesn't due to data perturbations, and you would like to reconstruct that plane.
Then a good idea is to define as ''best'' guess the plane for which the sum of
squares
of the orthogonal distances to the points of the set is minimal.
This plane is obtained from the SVD of the matrix with the rows
(x_i, y_i, z_i, 1).
The column of V which corresponds to (one of) the smallest singular
value(s), normalized such that the vector of its
first three components (the normal of the plane) has euclidean length one,
gives the coefficients of the Hesse normal form of this plane.
That means one first solves ||Ac|| = min_c subject to ||c|| = 1 and
afterwards normalizes c as required.
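A sketch of this plane fit, assuming Python with numpy; the plane 2x - y - z + 1 = 0 and
the noise level are made up for illustration:

```python
import numpy as np

# noisy points near the plane z = 1 + 2x - y, i.e. 2x - y - z + 1 = 0
rng = np.random.default_rng(0)
xy = rng.random((200, 2))
z = 1.0 + 2.0 * xy[:, 0] - xy[:, 1] + 1e-2 * rng.standard_normal(200)

A = np.column_stack([xy[:, 0], xy[:, 1], z, np.ones(200)])   # rows (x_i, y_i, z_i, 1)
U, s, Vh = np.linalg.svd(A)
c = Vh[-1]                            # right singular vector of the smallest singular value
c /= np.linalg.norm(c[:3])            # unit normal: Hesse normal form n.x + d = 0
print(c)                              # approx. (2, -1, -1, 1)/sqrt(6), up to sign
```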