4 votes

I don't understand one point about PCA. Does PCA return the directions that maximize the variance for each feature? I mean, will it return a component for each feature of our original space, and will only the k largest components be used as axes for the new subspace? So if I'm in a 50-D space and 49 features have strong variance, can I just pass to a 49-D space? I'm speaking in plain English of course, nothing formal or technical.

Thanks

Yes, but I mean: if we look at the covariance matrix algorithm, the number of eigenvectors and eigenvalues returned is N, and only k of them are kept as final dimensions. What is that N? It should be the number of columns of our data matrix X... so? Edit: so in 2 dimensions, 2 components are returned, as many as the number of features. – rollotommasi
The input features are analyzed using PCA to discover all the top orthogonal dynamics, which are explicitly not a one-to-one mapping from the input features... so your 50-D input features may well get reduced to just 3 dimensions... often none of the input features is purely just one of the PCA output dimensions... think of a handful of pencils thrown onto the ground: on the flat 2D surface you only have two possible dimensions, so all those pencils sent into a PCA will result in just those two vectors, represented by X and Y... each PCA output is orthogonal, i.e., independent of the others. – Scott Stensland
Here is a confirmation: there is one component for each feature, but PCA lets you prioritize them using eigenvalues! deeplearning4j.org/eigenvector#covariance – rollotommasi
To really understand PCA I suggest you avoid just throwing your data into some library call... instead, roll up your sleeves and write your own PCA from scratch (see the sketch below). Sure, later, when you are in production, resort to someone else's library, but not during the learning stages. The PCA algorithm is not magic and you can write it yourself in a few pages of code. – Scott Stensland
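Following the comment's suggestion, here is a minimal from-scratch PCA sketch in NumPy. It is not from the thread: the random data, the variable names, and the 90% variance threshold are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 50))        # 200 samples, 50 features (assumed toy data)

# 1. Center the data: PCA works on deviations from the mean.
Xc = X - X.mean(axis=0)

# 2. Covariance matrix of the features: shape (50, 50),
#    so N = number of columns of X, as discussed in the comments.
C = np.cov(Xc, rowvar=False)

# 3. Eigendecomposition: N eigenvalue/eigenvector pairs, one per feature dimension.
eigvals, eigvecs = np.linalg.eigh(C)  # eigh because C is symmetric

# 4. Sort descending: each eigenvalue is the variance along its component.
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# 5. Smallest k whose components explain at least 90% of the total variance
#    (the threshold is a common convention, not prescribed by the thread).
explained = np.cumsum(eigvals) / eigvals.sum()
k = int(np.searchsorted(explained, 0.90)) + 1

# 6. Project onto the top-k components to get the reduced k-D data.
X_reduced = Xc @ eigvecs[:, :k]
print(k, X_reduced.shape)
```

Note how step 3 returns exactly N components; the dimensionality reduction only happens in step 6, when you keep the top k of them.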

1 Answer

5 votes

If your original data has 50 dimensions, then PCA will return 50 principal components. It is up to you to choose the subset of k principal components that explains the most variance, typically at least 90% of the total. The PCA software you use will usually report how much variance is explained by each principal component, so just add up the variances and select the top k that get you to 90% of the total. See this PCA tutorial:

In general, we would like to choose the smallest K such that 0.85 to 0.99 (equivalently, 85% to 99%) of the total variance is explained, where these values follow from PCA best practices.

... When we say that PCA can reduce dimensionality, we mean that PCA can compute principal components and the user can choose the smallest number K of them that explain 0.95 of the variance. A subjectively satisfactory result would be when K is small relative to the original number of features D.
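To make the selection step concrete, here is a short sketch using scikit-learn; the library choice, the toy data, and the 90% threshold are assumptions for illustration, not part of the quoted tutorial.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 50))  # 200 samples in a 50-D feature space (toy data)

# Fit PCA with all components: one principal component per original dimension.
pca = PCA().fit(X)

# explained_variance_ratio_ is the fraction of total variance per component.
cumulative = np.cumsum(pca.explained_variance_ratio_)

# Smallest k whose components together explain at least 90% of the variance.
k = int(np.searchsorted(cumulative, 0.90)) + 1
print(f"keep k = {k} of {X.shape[1]} components")

# Refit with the chosen k and project the data into the k-D subspace.
X_reduced = PCA(n_components=k).fit_transform(X)
print(X_reduced.shape)  # (200, k)
```

scikit-learn can also do the selection in one step: passing a fraction such as `PCA(n_components=0.90)` keeps just enough components to explain 90% of the variance.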