Bottom-line question: when I use my computed PCA projection matrix P to project a vector v into the other space (possibly of lower dimension), should I first subtract from v the mean of the vectors that were used to build the covariance matrix whose principal eigenvectors form the projection matrix P?
A follow-up question: if the answer to the question above is "yes", then when I project a "reduced" vector back to the original space, should I add the same mean back to it at the end?
Now the detailed question, including the steps that may cause confusion:
The PCA flow is as follows:
1. Take m vectors of length d and compute their covariance matrix. Since the element in position (i,j) is the covariance between the i'th and the j'th dimensions across all m vectors, we can get the (dxd) covariance matrix by subtracting the mean from every vector, placing the mean-subtracted vectors as the columns of a matrix A of size (dxm), and computing the product C = AA' (up to a scaling factor such as 1/(m-1), which does not change the eigenvectors).
2. Compute the d eigenvalues and eigenvectors of C and, for some pre-selected k, create a matrix P of size (kxd) whose rows are the k eigenvectors corresponding to the largest eigenvalues, in descending order of eigenvalue.
3. For any vector v of the original dimension d that we want to project to the possibly reduced dimension k, compute the product u = Pv, which yields a vector of dimension k.
4. For any vector u that was already projected to the possibly reduced dimension k, if we want to project it back (after a possible loss of data) to the original dimension d, compute the product v = P'u, which yields a vector of dimension d.
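To make the first two steps concrete, here is a minimal NumPy sketch. The random data, the sizes d, m, k and all variable names are assumptions made for illustration only. It builds C from the mean-subtracted matrix A (using the 1/(m-1) scaling so it can be compared with np.cov) and then forms P from the top-k eigenvectors.

```python
import numpy as np

# Hypothetical example data: m sample vectors of length d, stored as columns,
# shifted so that their mean is clearly non-zero.
rng = np.random.default_rng(0)
d, m, k = 4, 50, 2
X = rng.normal(size=(d, m)) + 5.0

# Covariance step: subtract the mean, then C = AA' (scaled by 1/(m-1)).
mean = X.mean(axis=1, keepdims=True)   # shape (d, 1)
A = X - mean                           # shape (d, m)
C = A @ A.T / (m - 1)                  # shape (d, d)

# np.cov treats rows as variables and columns as observations by default,
# and also normalizes by (m - 1), so it should agree with C.
cov_matches = np.allclose(C, np.cov(X))

# Eigen-decomposition step: eigh returns eigenvalues in ascending order for a
# symmetric matrix, so reorder to descending and keep the first k.
eigvals, eigvecs = np.linalg.eigh(C)
order = np.argsort(eigvals)[::-1][:k]
P = eigvecs[:, order].T                # shape (k, d), eigenvectors as rows

# The rows of P are orthonormal, so P P' is the (k x k) identity.
rows_orthonormal = np.allclose(P @ P.T, np.eye(k))

print(cov_matches, rows_orthonormal, P.shape)
```

Dropping the 1/(m-1) factor rescales the eigenvalues but leaves the eigenvectors, and therefore P, unchanged up to sign.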
The question is whether:
- in step (3), we should first subtract from v the mean we computed in step (1)?
- in step (4), we should finally add to v the mean we computed in step (1)?
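One way to examine these two questions numerically is to run both variants on the same data and compare the reconstruction error. The sketch below is only an illustration under assumed conditions: synthetic data with a non-zero mean, and hypothetical helper names (build_P, mean_sq_error). The "centered" variant subtracts the mean before projecting and adds it back after back-projecting; the "uncentered" variant applies P and P' directly to the raw vectors.

```python
import numpy as np

# Hypothetical data: d = 3 dimensions with different spreads and a non-zero mean.
rng = np.random.default_rng(1)
d, m = 3, 200
scale = np.array([[3.0], [1.0], [0.2]])
offset = np.array([[10.0], [-4.0], [2.0]])
X = rng.normal(size=(d, m)) * scale + offset   # sample vectors as columns

# Build the covariance matrix and its eigenvectors from mean-subtracted data.
mean = X.mean(axis=1, keepdims=True)
A = X - mean
C = A @ A.T / (m - 1)
eigvals, eigvecs = np.linalg.eigh(C)
order = np.argsort(eigvals)[::-1]              # descending eigenvalues


def build_P(k):
    """Return the (k x d) matrix whose rows are the top-k eigenvectors."""
    return eigvecs[:, order[:k]].T


def mean_sq_error(X_rec):
    """Average squared distance between original and reconstructed vectors."""
    return float(np.mean(np.sum((X_rec - X) ** 2, axis=0)))


# With k = d, P is orthogonal, so the centered round trip is lossless.
P_full = build_P(d)
full_roundtrip_exact = np.allclose(P_full.T @ (P_full @ (X - mean)) + mean, X)

# With k = 1, compare the two variants.
P1 = build_P(1)
err_centered = mean_sq_error(P1.T @ (P1 @ (X - mean)) + mean)
err_uncentered = mean_sq_error(P1.T @ (P1 @ X))

print(full_roundtrip_exact, err_centered, err_uncentered)
```

Averaged over the vectors used to build C, the uncentered error equals the centered error plus the squared norm of the part of the mean that lies outside the span of the kept eigenvectors, so it can never be smaller than the centered error.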