0 votes

[Figure: PCA pseudocode referenced below; d replaces D at step 5, Eq. 29 is the projection and Eq. 30 the reconstruction]

In machine learning, PCA is used to reduce the dimensionality of training data. However, from the above picture, I can't see where the reduction actually happens.

The input data x_i has D dimensions: x_i ∈ R^D.

The output data x still has D dimensions: x ∈ R^D.

Comment from sascha: Be sure to check where d replaces D, especially in step 5! So the idea is: instead of using all D components, use only d < D.
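For reference, in the standard PCA formulation the two equations the answers below refer to usually take a form like this (a sketch in the usual notation, which may not match the figure's exact symbols; here $m$ is the sample mean and $E_d$ is the $D \times d$ matrix whose columns are the $d$ dominant eigenvectors of the covariance matrix):

$$y = E_d^\top (x_i - m) \in \mathbb{R}^d \qquad \text{(projection, Eq. 29)}$$
$$\hat{x}_i = E_d\, y + m \in \mathbb{R}^D \qquad \text{(reconstruction, Eq. 30)}$$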

2 Answers

2 votes

The crucial element here is a misunderstanding of what the output is. In this pseudocode the output is y (equation 29), not x (equation 30), so you do reduce your data to d dimensions. The final equation just shows that, if you would like to move back to the original space, you can do so (obviously the data is recovered with some error, since a lot of information was dropped when going down to d dimensions).
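A minimal runnable sketch (using scikit-learn on a made-up data matrix X; the names X, Y and d are illustrative, not the pseudocode's) makes the dimension bookkeeping concrete: fit_transform returns the d-dimensional y of equation 29, and inverse_transform maps it back to D dimensions as in equation 30, with a nonzero reconstruction error.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))      # 200 samples, D = 10 dimensions (toy data)

d = 3                               # keep only d < D components
pca = PCA(n_components=d)

Y = pca.fit_transform(X)            # the reduced output (equation 29)
X_hat = pca.inverse_transform(Y)    # back-projection to the original space (equation 30)

print(Y.shape)                      # (200, 3)  -> this is where the reduction happens
print(X_hat.shape)                  # (200, 10) -> same shape as X, but only an approximation
print(((X - X_hat) ** 2).mean())    # reconstruction error from the dropped components
```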

1 vote

The important thing to understand when using PCA is the covariance matrix C(x) and its spectral decomposition. The eigenvalues and eigenvectors obtained from that decomposition are what is used to reduce the dimensionality.

For a D-dimensional training set, we get D eigenvalues and their corresponding eigenvectors. But in practice (especially in image-related applications) the data dimensions are highly correlated; in other words, many eigenvalues are close to zero and their eigenvectors are effectively redundant basis vectors. So discarding those vectors from the basis doesn't result in significant information loss.

Now, if you want to reduce the dimension of your input data from the original D to d < D, you project the input data onto the d dominant eigenvectors (those with the d largest eigenvalues). Eq. 29 gives the projection of the input data into the d-dimensional space, and Eq. 30 is used to reconstruct the original data; the reconstruction error depends on d (the number of eigenvectors kept).
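Here is a short NumPy sketch of that pipeline (centering, covariance matrix, spectral decomposition, projection onto the d dominant eigenvectors, reconstruction); the function name and data are made up for illustration.

```python
import numpy as np

def pca_project_reconstruct(X, d):
    """Project X (n_samples x D) onto its d dominant eigenvectors, then reconstruct."""
    m = X.mean(axis=0)                    # sample mean
    Xc = X - m                            # centered data
    C = np.cov(Xc, rowvar=False)          # D x D covariance matrix C(x)
    eigvals, eigvecs = np.linalg.eigh(C)  # spectral decomposition (eigenvalues ascending)
    order = np.argsort(eigvals)[::-1]     # re-order eigenvalues descending
    E_d = eigvecs[:, order[:d]]           # D x d basis: the d dominant eigenvectors
    Y = Xc @ E_d                          # projection into d dimensions (Eq. 29)
    X_hat = Y @ E_d.T + m                 # reconstruction in the original D dimensions (Eq. 30)
    retained = eigvals[order[:d]].sum() / eigvals.sum()   # fraction of variance kept
    return Y, X_hat, retained

# Illustrative usage: 500 correlated samples in D = 20 dimensions, reduced to d = 5.
rng = np.random.default_rng(1)
X = rng.normal(size=(500, 20)) @ rng.normal(size=(20, 20))
Y, X_hat, retained = pca_project_reconstruct(X, d=5)
print(Y.shape, X_hat.shape, round(retained, 3))   # (500, 5) (500, 20) <variance fraction>
```

The reconstruction error shrinks as d grows and vanishes at d = D, which is exactly the trade-off described above.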