0
votes

I have 2D data (I have a zero mean normalized data). I know the covariance matrix, eigenvalues and eigenvectors of it. I want to decide whether to reduce the dimension to 1 or not (I use principal component analysis, PCA). How can I decide? Is there any methodology for it?

I am looking sth. like if you look at this ratio and if this ratio is high than it is logical to go on with dimensionality reduction.

PS 1: Does PoV (Proportion of variation) stands for it?

PS 2: Here is an answer: https://stats.stackexchange.com/questions/22569/pca-and-proportion-of-variance-explained does it a criteria to test it?

2

2 Answers

0
votes

PoV (Proportion of variation) represents how much information of data will remain relatively to using all of them. It may be used for that purpose. If POV is high than less information will be lose.

0
votes

You want to sort your eigenvalues by magnitude then pick the highest 1 or 2 values. Eigenvalues with a very small relative value can be considered for exclusion. You can then translate data values and using only the top 1 or 2 eigenvectors you'll get dimensions for plotting results. This will give a visual representation of the PCA split. Also check out scikit-learn for more on PCA. Precisions, recalls, F1-scores will tell you how well it works

from http://sebastianraschka.com/Articles/2014_pca_step_by_step.html...

Step 1: 3D Example

"For our simple example, where we are reducing a 3-dimensional feature space to a 2-dimensional feature subspace, we are combining the two eigenvectors with the highest eigenvalues to construct our d×kd×k-dimensional eigenvector matrix WW.

matrix_w = np.hstack((eig_pairs[0][1].reshape(3,1),
                  eig_pairs[1][1].reshape(3,1)))
print('Matrix W:\n', matrix_w)

>>>Matrix W:
[[-0.49210223 -0.64670286]
[-0.47927902 -0.35756937]
[-0.72672348  0.67373552]]"

Step 2: 3D Example

" In the last step, we use the 2×32×3-dimensional matrix WW that we just computed to transform our samples onto the new subspace via the equation y=W^T×x

transformed = matrix_w.T.dot(all_samples)
assert transformed.shape == (2,40), "The matrix is not 2x40 dimensional."