2 votes

Bottom-line question: when I use my computed PCA projection matrix P to project a vector v into the other (possibly lower-dimensional) space, should I first subtract from v the mean of the vectors that were used to create the covariance matrix whose principal eigenvectors form the projection matrix P?

A derived question: if the answer to the question above is "yes", then when I project a "reduced" vector back to the original space, should I add the same mean back to it at the end?

Now the detailed question, including the steps that may cause confusion:

The PCA flow goes as follows:

  1. Take m vectors of length d and compute their covariance matrix. Since the element in position (i,j) is the covariance of the i'th and j'th dimensions across all m vectors, we can obtain the target (dxd) covariance matrix by subtracting the mean from all the vectors, placing the mean-subtracted vectors as the columns of a (dxm) matrix A, and computing C = AA'.

  2. Compute the d eigenvalues and eigenvectors of C, and for some pre-selected k, create a matrix P of size (kxd) whose rows are the k eigenvectors corresponding to the largest eigenvalues, in descending order of eigenvalue.

  3. For any vector v of the original dimension d that we want to project down to the (possibly reduced) dimension k, compute the product u = Pv, which produces a vector of dimension k.

  4. For any vector u that was already projected down to dimension k, if we want to project it back (after the possible loss of information) to the original dimension d, compute the product v = P'u, which produces a vector of dimension d. (A code sketch of the whole flow follows this list.)
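To make the steps concrete, here is a minimal NumPy sketch of the flow exactly as described above (the names X, A, P and so on are just illustrative; the two lines ending in question marks are the points the question below asks about):

    import numpy as np

    rng = np.random.default_rng(0)
    d, m, k = 5, 100, 2
    X = rng.normal(size=(d, m))            # m sample vectors as columns

    # Step 1: mean-subtract and form the (dxd) covariance matrix.
    mean = X.mean(axis=1, keepdims=True)   # (dx1) mean vector
    A = X - mean                           # mean-subtracted columns
    C = A @ A.T                            # (see the comments below about the missing 1/m)

    # Step 2: top-k eigenvectors of C as the rows of P, by descending eigenvalue.
    eigvals, eigvecs = np.linalg.eigh(C)   # eigh, since C is symmetric
    order = np.argsort(eigvals)[::-1]
    P = eigvecs[:, order[:k]].T            # (kxd)

    # Step 3: project a new vector v down to k dimensions.
    v = rng.normal(size=(d, 1))
    u = P @ v                              # ...or should it be P @ (v - mean)?

    # Step 4: project u back up to d dimensions.
    v_back = P.T @ u                       # ...or should it be P.T @ u + mean?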

The question is:

  • In step (3), should we first subtract from v the mean we computed in step (1)?
  • In step (4), should we add the mean from step (1) back to v at the end?
In step 1 you forgot to divide AA' by m. (Joseph Artsimovich)
I agree, but it is just a constant; it should not affect the eigenvectors or the order of their corresponding eigenvalues. (SalatYerakot)
Moreover, I think you should divide by sqrt(m), as in the covariance formula. (SalatYerakot)
No, I believe it's m, unless you are re-scaling the samples themselves, in which case it's sqrt(m). For PCA compression it indeed doesn't matter, as the eigenvalues are only used to sort the eigenvectors, which are unaffected by this constant factor. (Joseph Artsimovich)
You were right, it's m, not sqrt(m). (SalatYerakot)
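A quick NumPy check of the comments' point (a sketch, assuming random data, which has distinct eigenvalues almost surely): dividing C by m rescales the eigenvalues but not the eigenvectors, so the ordering is unaffected.

    import numpy as np

    rng = np.random.default_rng(1)
    A = rng.normal(size=(4, 50))
    A = A - A.mean(axis=1, keepdims=True)
    m = A.shape[1]

    w1, V1 = np.linalg.eigh(A @ A.T)        # without the 1/m factor
    w2, V2 = np.linalg.eigh(A @ A.T / m)    # with it

    print(np.allclose(w1 / m, w2))               # True: eigenvalues just scale by 1/m
    print(np.allclose(np.abs(V1), np.abs(V2)))   # True: same eigenvectors up to sign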

2 Answers

2 votes
  • In step (3), should we first subtract from v the mean we computed in step (1)?
  • In step (4), should we add the mean from step (1) back to v at the end?

Yes and yes.
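Here is a minimal sketch of both points, assuming NumPy and taking k = d so that the round trip is exact and easy to verify:

    import numpy as np

    rng = np.random.default_rng(2)
    d, m = 4, 30
    X = rng.normal(size=(d, m))
    mean = X.mean(axis=1, keepdims=True)
    A = X - mean

    eigvals, eigvecs = np.linalg.eigh(A @ A.T / m)
    order = np.argsort(eigvals)[::-1]
    P = eigvecs[:, order].T          # k = d here, so P is square and orthogonal

    v = rng.normal(size=(d, 1))
    u = P @ (v - mean)               # step (3): subtract the mean first
    v_back = P.T @ u + mean          # step (4): add the mean back at the end

    print(np.allclose(v, v_back))    # True: exact reconstruction when k = d

With k < d the reconstruction is only an approximation, but the mean handling is identical.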

Also note that you can compute the covariance matrix in one pass over the samples if necessary, with:

COV = E[vv'] - E[v]E[v]'
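For instance, a single-pass accumulation along those lines might look like this (a sketch, assuming NumPy; note that this form can lose precision when the mean is large relative to the variance):

    import numpy as np

    def one_pass_cov(samples):
        # samples: an iterable of length-d vectors.
        n, s, ss = 0, None, None
        for v in samples:
            v = np.asarray(v, dtype=float).reshape(-1, 1)
            if s is None:
                s = np.zeros_like(v)               # running sum of v
                ss = np.zeros((len(v), len(v)))    # running sum of v v'
            n += 1
            s += v
            ss += v @ v.T
        mean = s / n
        return ss / n - mean @ mean.T              # E[vv'] - E[v]E[v]'

    # Sanity check against np.cov (bias=True divides by m, matching the E[.] form):
    X = np.random.default_rng(3).normal(size=(3, 20))
    print(np.allclose(one_pass_cov(X.T), np.cov(X, bias=True)))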
2 votes

Found the answer in some tutorials. Sharing it here so everyone can enjoy it...

According to this very nice and friendly PCA tutorial, the mean should indeed be subtracted in (3) and added in (4). It also appears to be described in the same way in the very famous and classic eigenfaces paper.

Here are some screenshots from the PCA tutorial that make it clear:

Projecting to the new space: [screenshot from the tutorial]

Projecting back to the original space: [screenshot from the tutorial]