You probably already figured this out, but I'll post a short description anyway.
First, let me describe the two techniques in general terms.
PCA basically takes a dataset and figures out how to "transform" it (i.e. project it into a new space, usually of lower dimension). It essentially gives you a new representation of the same data. This new representation has some useful properties. For instance, each dimension of the new space is associated with the amount of variance it explains, i.e. you can essentially order the variables output by PCA by how important they are in terms of the original representation. Another property is the fact that linear correlation is removed from the PCA representation.
SVD is a way to factorize a matrix. Given a matrix $M$ (e.g. for data, it could be an $n$ by $m$ matrix, for $n$ datapoints, each of dimension $m$), you get $U, S, V = \text{SVD}(M)$ where $M = USV^T$, $S$ is a diagonal matrix, and both $U$ and $V$ are orthogonal matrices (meaning the columns & rows are orthonormal; or equivalently $UU^T = I$ & $VV^T = I$).
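As a quick sanity check of the factorization, here is a minimal sketch using NumPy's `np.linalg.svd`. The random 6 by 4 matrix and the variable names are only illustrative assumptions. Note that NumPy returns the singular values as a vector and returns $V^T$ rather than $V$.

```python
import numpy as np

rng = np.random.default_rng(0)
M = rng.normal(size=(6, 4))  # illustrative data: n = 6 datapoints, each of dimension m = 4

# full_matrices=True gives a square U (n x n) and a square V^T (m x m)
U, s, Vt = np.linalg.svd(M, full_matrices=True)

# s is a 1-D array of singular values; place it on the diagonal of an n x m matrix S
S = np.zeros(M.shape)
S[:len(s), :len(s)] = np.diag(s)

reconstruction_ok = np.allclose(M, U @ S @ Vt)   # M = U S V^T
U_orthogonal = np.allclose(U @ U.T, np.eye(6))   # U U^T = I
V_orthogonal = np.allclose(Vt.T @ Vt, np.eye(4)) # V V^T = I
print(reconstruction_ok, U_orthogonal, V_orthogonal)
```

The singular values in `s` come back sorted from largest to smallest, which is what makes "cutting off the lower ones" straightforward.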
The entries of $S$ are called the singular values of $M$. You can think of SVD as dimensionality reduction for matrices, since you can cut off the lower singular values (i.e. set them to zero), destroying the "lower parts" of the matrices upon multiplying them, and get an approximation to $M$. In other words, just keep the top $k$ singular values (and the top $k$ vectors in $U$ and $V$), and you have a "dimensionally reduced" version (representation) of the matrix.
Mathematically, this gives you the best rank $k$ approximation to $M$, essentially like a reduction to $k$ dimensions. (see this answer for more).
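To make the truncation concrete, here is a small sketch, again assuming NumPy. The helper name `rank_k_approximation` and the random test matrix are hypothetical. It keeps the top $k$ singular values and vectors, and checks the known fact that the Frobenius-norm error of this approximation equals the square root of the sum of the squared discarded singular values.

```python
import numpy as np

def rank_k_approximation(M, k):
    """Keep only the top-k singular values/vectors of M (hypothetical helper)."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

rng = np.random.default_rng(0)
M = rng.normal(size=(8, 5))  # illustrative data
k = 2
M_k = rank_k_approximation(M, k)

# Compare the actual error with the value predicted from the dropped singular values
s = np.linalg.svd(M, compute_uv=False)
frobenius_error = np.linalg.norm(M - M_k)      # Frobenius norm is the default for 2-D input
expected_error = np.sqrt(np.sum(s[k:] ** 2))
print(np.linalg.matrix_rank(M_k), np.isclose(frobenius_error, expected_error))
```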
So Question 1
I understand the general premise of dimensionality reduction as bringing data to a lower dimension - But
a) how do SVD and PCA do this, and b) how do they differ in their approach
The answer is that, for this purpose, they are essentially the same: PCA amounts to an SVD of the mean-centered data matrix.
To see this, I suggest reading the following posts on the CV and math stack exchange sites:
Let me summarize the answer: essentially, SVD can be used to compute PCA.
PCA is closely related to the eigenvectors and eigenvalues of the covariance matrix of the data. Essentially, by taking the data matrix, centering it (subtracting each column's mean), computing its SVD, and then squaring the singular values and dividing by $n-1$, you end up with the eigenvalues of the covariance matrix of the data. The corresponding eigenvectors are the right singular vectors, i.e. the columns of $V$.
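The relationship can be checked numerically. The following is a sketch under stated assumptions: NumPy, a synthetic correlated dataset, and the usual $n-1$ normalization of the sample covariance (which is what `np.cov` uses by default). It compares PCA computed through the SVD of the centered data with the eigendecomposition of the covariance matrix.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 200, 3
X = rng.normal(size=(n, m)) @ rng.normal(size=(m, m))  # synthetic, correlated features

# Route 1: SVD of the centered data matrix
Xc = X - X.mean(axis=0)
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
variances_from_svd = s ** 2 / (n - 1)

# Route 2: eigendecomposition of the covariance matrix
C = np.cov(Xc, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(C)                 # eigh returns ascending order
eigvals, eigvecs = eigvals[::-1], eigvecs[:, ::-1]   # reorder to descending

# Eigenvectors are only defined up to sign, so compare |dot product| column by column
alignment = np.abs(np.sum(Vt.T * eigvecs, axis=0))

# Projecting onto the principal axes removes linear correlation
scores = Xc @ Vt.T
scores_cov = np.cov(scores, rowvar=False)

print(np.allclose(variances_from_svd, eigvals), np.allclose(alignment, 1.0))
```

The covariance of `scores` comes out diagonal, with the explained variances on the diagonal, which is the "linear correlation is removed" property of the PCA representation.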
Question 2
maybe if you can explain what the results of each technique is telling me, so for a) SVD - what are singular values b) PCA - "proportion of variance"
These eigenvectors (the right singular vectors of the SVD of the centered data, or the principal components of the PCA) form the axes of the new space into which one transforms the data.
The eigenvalues (closely related to the squares of the data matrix SVD singular values) hold the variance explained by each component. Often, people want to retain say 95% of the variance of the original data, so if they originally had $m$-dimensional data, they reduce it to $d$-dimensional data by choosing the smallest $d$ such that the $d$ largest eigenvalues account for at least 95% of the total variance. This keeps as much information as possible, while retaining as few useless dimensions as possible.
In other words, these values (variance explained) essentially tell us the importance of each principal component (PC), in terms of its usefulness in reconstructing the original (high-dimensional) data. Since each PC forms an axis in the new space (constructed via linear combinations of the old axes in the original space), the variance explained tells us the relative importance of each of the new dimensions.
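The "keep 95% of the variance" rule can be sketched in a few lines of plain Python. The function name `components_to_keep` and the example eigenvalues are made up for illustration. In practice the eigenvalues would come from the covariance matrix (or from the squared singular values).

```python
from itertools import accumulate

def components_to_keep(eigenvalues, target=0.95):
    """Smallest d such that the d largest eigenvalues explain at least `target` of the variance."""
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    for d, running in enumerate(accumulate(ordered), start=1):
        if running / total >= target:
            return d
    return len(ordered)

# Made-up eigenvalues; cumulative shares are 40%, 65%, 80%, 90%, 96%, 100%
example_eigenvalues = [4.0, 2.5, 1.5, 1.0, 0.6, 0.4]
d = components_to_keep(example_eigenvalues, target=0.95)
print(d)
```

With these example values, five of the six components are needed to reach 95%, so only one dimension is dropped. Real data with strongly correlated features typically allows a much larger reduction.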
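As a bonus, note that SVD can also be used to compute eigendecompositions of symmetric positive semidefinite matrices, where the singular values are the eigenvalues and the singular vectors are the eigenvectors. Since the covariance matrix is such a matrix, SVD can also be used to compute PCA in a different way, namely by decomposing the covariance matrix directly. See this post for details.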
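Here is a brief sketch of that second route, assuming NumPy and a synthetic dataset. It applies the SVD to the covariance matrix itself and checks that the result agrees with an eigendecomposition.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4)) @ rng.normal(size=(4, 4))  # synthetic, correlated features
C = np.cov(X, rowvar=False)  # symmetric positive semidefinite covariance matrix

# SVD applied directly to the covariance matrix
U, s, Vt = np.linalg.svd(C)

# Reference: eigenvalues from a symmetric eigensolver, reordered to descending
eigvals = np.linalg.eigvalsh(C)[::-1]

singular_values_match = np.allclose(s, eigvals)
# For this kind of matrix the left and right singular vectors coincide: C = U diag(s) U^T
is_eigendecomposition = np.allclose(C, U @ np.diag(s) @ U.T)
print(singular_values_match, is_eigendecomposition)
```

Here the singular values are the variances directly, with no squaring or scaling, because the covariance matrix already has the squared scale built in.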