34
votes

How does it actually reduce noise..can you suggest some nice tutorials?

5
Rather off-topic question, if you want theory, then go to Wikipedia - they've got basic description and references. If you want help on particular programming subject then restate the question (i.e. how to use Lapack to obtain SVD of hermitian matrix, etc).Anonymous
Please don't close. This is far more programming-related than some of the touchy-feely questions that have been on this site.Jason S
After some thinking I have to agree, and remove the -1 :)Anonymous

5 Answers

50
votes

SVD can be understood from a geometric sense for square matrices as a transformation on a vector.

Consider a square n x n matrix M multiplying a vector v to produce an output vector w:

w = M*v

The singular value decomposition M is the product of three matrices M=U*S*V, so w=U*S*V*v. U and V are orthonormal matrices. From a geometric transformation point of view (acting upon a vector by multiplying it), they are combinations of rotations and reflections that do not change the length of the vector they are multiplying. S is a diagonal matrix which represents scaling or squashing with different scaling factors (the diagonal terms) along each of the n axes.

So the effect of left-multiplying a vector v by a matrix M is to rotate/reflect v by M's orthonormal factor V, then scale/squash the result by a diagonal factor S, then rotate/reflect the result by M's orthonormal factor U.

One reason SVD is desirable from a numerical standpoint is that multiplication by orthonormal matrices is an invertible and extremely stable operation (condition number is 1). SVD captures any ill-conditioned-ness in the diagonal scaling matrix S.

18
votes

One way to use SVD to reduce noise is to do the decomposition, set components that are near zero to be exactly zero, then re-compose.

Here's an online tutorial on SVD.

You might want to take a look at Numerical Recipes.

8
votes

Singular value decomposition is a method for taking an nxm matrix M and "decomposing" it into three matrices such that M=USV. S is a diagonal square (the only nonzero entries are on the diagonal from top-left to bottom-right) matrix containing the "singular values" of M. U and V are orthogonal, which leads to the geometric understanding of SVD, but that isn't necessary for noise reduction.

With M=USV, we still have the original matrix M with all its noise intact. However, if we only keep the k largest singular values (which is easy, since many SVD algorithms compute a decomposition where the entries of S are sorted in nonincreasing order), then we have an approximation of the original matrix. This works because we assume that the small values are the noise, and that the more significant patterns in the data will be expressed through the vectors associated with larger singular values.

In fact, the resulting approximation is the most accurate rank-k approximation of the original matrix (has the least squared error).

6
votes

To answer to the tittle question: SVD is a generalization of eigenvalues/eigenvectors to non-square matrices. Say, $X \in N \times p$, then the SVD decomposition of X yields X=UDV^T where D is diagonal and U and V are orthogonal matrices. Now X^TX is a square matrice, and the SVD decomposition of X^TX=VD^2V where V is equivalent to the eigenvectors of X^TX and D^2 contains the eigenvalues of X^TX.

4
votes

SVD can also be used to greatly ease global (i.e. to all observations simultaneously) fitting of an arbitrary model (expressed in an formula) to data (with respect to two variables and expressed in a matrix).
For example, data matrix A = D * MT where D represents the possible states of a system and M represents its evolution wrt some variable (e.g. time).
By SVD, A(x,y) = U(x) * S * VT(y) and therefore D * MT = U * S * VT
then D = U * S * VT * MT+ where the "+" indicates a pseudoinverse.
One can then take a mathematical model for the evolution and fit it to the columns of V, each of which are a linear combination the components of the model (this is easy, as each column is a 1D curve). This obtains model parameters which generate M? (the ? indicates it is based on fitting).
M * M?+ * V = V? which allows residuals R * S2 = V - V? to be minimized, thus determining D and M.

Pretty cool, eh?

The columns of U and V can also be inspected to glean information about the data; for example each inflection point in the columns of V typically indicates a different component of the model.

Finally, and actually addressing your question, it is import to note that although each successive singular value (element of the diagonal matrix S) with its attendant vectors U and V does have lower signal to noise, the separation of the components of the model in these "less important" vectors is actually more pronounced. In other words, if the data is described by a bunch of state changes that follow a sum of exponentials or whatever, the relative weights of each exponential get closer together in the smaller singular values. In other other words the later singular values have vectors which are less smooth (noisier) but in which the change represented by each component are more distinct.