Picking up from where we left...
So I can use linalg.eig or linalg.svd to compute the PCA. Each one returns different Principal Components/Eigenvectors and Eigenvalues when they're fed the same data (I'm currently using the Iris dataset).
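For reference, a minimal sketch of the kind of comparison I mean, using NumPy. The data here is random stand-in data, not Iris, and feeding `X.T @ X` to `eig` while feeding `X` itself to `svd` is an assumption made for illustration (`eig` needs a square matrix), not necessarily the exact setup behind the numbers below.

```python
import numpy as np

# Stand-in data: 150 samples x 4 features (random, NOT the Iris measurements).
rng = np.random.default_rng(0)
X = rng.normal(size=(150, 4))

# eig needs a square matrix, so this sketch assumes it is given X.T @ X.
eig_vals, eig_vecs = np.linalg.eig(X.T @ X)

# svd works on the rectangular data matrix directly.
U, s, Vt = np.linalg.svd(X, full_matrices=False)

# eig does not guarantee any ordering; svd returns singular values descending.
eig_sorted = np.sort(eig_vals.real)[::-1]

print("eig of X.T @ X:         ", eig_sorted)
print("singular values of X:   ", s)
print("squared singular values:", s ** 2)
```

Under these assumptions the raw outputs differ in scale and possibly order, while the squared singular values of `X` line up with the eigenvalues of `X.T @ X`.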
Looking here or any other tutorial with the PCA applied to the Iris dataset, I'll find that the eigenvalues are [2.9108 0.9212 0.1474 0.0206]. The eig method gives me a different set of eigenvalues/vectors to work with, which I don't mind, except that the tutorial eigenvalues, once summed, equal the number of dimensions (4) and can be used to find how much each component contributes to the total variance.
With the eigenvalues returned by linalg.eig, I can't do that. For example, the values returned are [9206.53059607 314.10307292 12.03601935 3.53031167]. The proportion of variance in this case would be [0.96542969 0.03293797 0.00126214 0.0003702]. This other page says: "The proportion of the variation explained by a component is just its eigenvalue divided by the sum of the eigenvalues."
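To make the arithmetic concrete, here is a small sketch that applies that rule to both sets of eigenvalues quoted above. The helper name `explained_variance_ratio` is made up for this example; the numbers are copied from this post, not recomputed from the dataset.

```python
import numpy as np

def explained_variance_ratio(eigenvalues):
    """Divide each eigenvalue by the sum of all eigenvalues (hypothetical helper)."""
    eigenvalues = np.asarray(eigenvalues, dtype=float)
    return eigenvalues / eigenvalues.sum()

# Eigenvalues quoted in the tutorials.
tutorial_vals = [2.9108, 0.9212, 0.1474, 0.0206]
# Eigenvalues I got back from linalg.eig.
eig_vals = [9206.53059607, 314.10307292, 12.03601935, 3.53031167]

tutorial_ratio = explained_variance_ratio(tutorial_vals)
eig_ratio = explained_variance_ratio(eig_vals)

print("sum of tutorial eigenvalues:", np.sum(tutorial_vals))
print("tutorial proportions:", tutorial_ratio)
print("eig proportions:     ", eig_ratio)
```

The same division is applied in both cases; only the input eigenvalues differ, which is where the two sets of proportions diverge.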
Since the variance explained by each dimension should be constant (I think), these proportions are wrong. So, if I use the values returned by svd(), which are the values used in all tutorials, I can get the correct percentage of variation from each dimension, but I'm wondering why the values returned by eig can't be used like that.
I assume the results returned are still a valid way to project the variables, so is there a way to transform them so that I can get the correct proportion of variance explained by each variable? In other words, can I use the eig method and still have the proportion of variance for each variable? Additionally, could this mapping be applied to the eigenvalues alone, so that I can have both the real eigenvalues and the normalized ones?
Sorry for the long writeup btw. Here's a (::) for having gotten this far. Assuming you didn't just read this line.