My question is: in the case of having a matrix we want to do PCA on, where the number of features greatly outnumbers the number of trials, why doesn't prcomp behave as expected (or am I missing something)?
Below is a summary of the issue, full code is here, compressed 7MB data source is here (is 55MB uncompressed), target image is here.
My exact situation is that I have a matrix p by n matrix X (p = features, n = trials) where the trials are photos taken of faces, and the features are the pixels in the photos (so a 32256 by 148 matrix). What I want to do is find the principal component score vectors of that matrix. Since finding the covariance matrix XX^T is too expensive, an easy solution is to find the eigenvectors (v_i) of X^TX and transform them by X (Xv_i) more info.
XTX <- t(X) %*% X # missing the 1 / n - 1 for cov matrix b/c we normalize later anyway
eigen <- eigen(XTX)
eigenvectors.XTX.col <- eigen$vectors
principal.component.scores <- apply(eigenvectors.XTX.col, 2, function(c) {
normalize.vector(X %*% matrix(c, ncol = 1))
})
The principal component scores are eigenfaces in my case, and can be used to successfully reconstruct the target face as seen here: http://cl.ly/image/260w0N0u0Z3y (refer to my full code for how)
Passing X to prcomp should do something equivalent, but has a different result than the above homegrown way:
pca <- prcomp(X)
pca$x # right size, but wrong pc scores
The result of using pca$x
in reconstructing the face is not total crap, but much worse: http://cl.ly/image/2p19360u2P43
I also checked that using prcomp
on t(x)
yielded a different rotation matrix, so prcomp
is doing something fancy, but something mysterious under the hood. I know from here that prcomp
is using SVD to calculate the principal component loading vectors instead of eigen decomposition, but that should not be leading to any errors here (or so I think...).
What is the correct way of using the built in prcomp method, there must be a way, right?
princomp
, but as I was surprised to find in the answer below, it is not the case that you need more observations than variables (this would be impossible in some disciplines like genomics, or even here with facial recognition). Rather the issue here is some annoying defaults inprcomp
as well as a lack of normalization. Sorry for the lame problem in my question! - polpetti