I can perform PCA in scikit-learn with the code below (X_train has 279180 rows and 104 columns):
import numpy as np
from sklearn.decomposition import PCA
pca = PCA(n_components=30)
X_train_pca = pca.fit_transform(X_train)
Now, when I want to project the data onto the principal axes (the eigenvectors) in feature space, I do the following:
""" Projection """
comp = pca.components_                  # 30x104
com_tr = np.transpose(pca.components_)  # 104x30
proj = np.dot(X_train, com_tr)          # 279180x104 * 104x30 = 279180x30
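One thing this raw dot product misses is centering: `pca.transform` subtracts the fitted mean (`pca.mean_`) before projecting, so the two only agree if the data is centered first. A minimal sketch on small synthetic data (shapes here are illustrative, not the real 279180x104):

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.RandomState(0)
X = rng.rand(200, 10)  # small stand-in for X_train

pca = PCA(n_components=3)
pca.fit(X)

# transform() centers with pca.mean_ before projecting onto components_
proj = np.dot(X - pca.mean_, pca.components_.T)  # 200x10 * 10x3 = 200x3
print(np.allclose(proj, pca.transform(X)))       # True
```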
But I am hesitant about this step, because the scikit-learn documentation says:
components_: array, [n_components, n_features]
Principal axes in feature space, representing the directions of maximum variance in the data.
It seems to me that it is already projected, but when I checked the source code, it returns only the eigenvectors.
What is the right way to project it?
Ultimately, I am aiming to calculate the MSE of the reconstruction.
""" Reconstruct """
recon = np.dot(proj, comp)  # 279180x30 * 30x104 = 279180x104
""" MSE Error """
print("MSE = %.6G" % np.mean((X_train - recon)**2))
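For the reconstruction, the mean has to be added back after projecting into the original space; `pca.inverse_transform` does exactly this. A hedged sketch of the full round trip on synthetic data (again with illustrative shapes):

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.RandomState(0)
X = rng.rand(200, 10)  # small stand-in for X_train

pca = PCA(n_components=3)
proj = pca.fit_transform(X)  # 200x3, already mean-centered internally

# reconstruct: project back onto the components, then un-center
recon = np.dot(proj, pca.components_) + pca.mean_  # 200x3 * 3x10 = 200x10
print(np.allclose(recon, pca.inverse_transform(proj)))  # True

mse = np.mean((X - recon) ** 2)
print("MSE = %.6G" % mse)
```

With all 10 components retained the MSE would drop to numerical zero; with 3 it measures the variance discarded by the truncation.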