I am trying to learn the basics of PCA in Python using scikit-learn (in particular sklearn.decomposition and sklearn.preprocessing). The goal is to import data from images into a matrix X (each row is a sample, each column is a feature), standardize X, use PCA to extract principal components (the 2 most important, the 6 most important, ..., the 6 least important), project X onto these principal components, reverse the transformation, and plot the result to see how it differs from the original image or images.
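For reference, this is a minimal sketch of the pipeline I have in mind for the "k most important components" case. The random matrix is only a placeholder for my image data, and the 8x8 image size and the variable names are assumptions made for illustration.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Placeholder data: 20 "images" of 8x8 pixels, one flattened image per row.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 64))

# Standardize each feature (pixel) to zero mean and unit variance.
scaler = StandardScaler()
X_std = scaler.fit_transform(X)

# Keep the k most important principal components.
k = 2
pca_k = PCA(n_components=k)
scores = pca_k.fit_transform(X_std)          # projected data, shape (20, k)

# Map back to standardized feature space, then undo the standardization.
X_std_rec = pca_k.inverse_transform(scores)  # shape (20, 64)
X_rec = scaler.inverse_transform(X_std_rec)  # back in original pixel units

# Reshape one reconstructed row into an image for plotting.
first_image = X_rec[0].reshape(8, 8)
```

The reconstructed rows can then be passed to something like matplotlib's imshow and compared with the original images.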
Now let's say that I do not want to keep the 2, 3, 4, ... most important principal components, but instead the N least relevant ones, say N = 6.
How should the analysis be done? I mean, I can't simply standardize, call PCA().fit_transform() and then revert with inverse_transform() to plot the results.
At the moment I am doing something like this:
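from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X_std = StandardScaler().fit_transform(X)  # standardize original data
pca = PCA()
model = pca.fit(X_std)  # fit the model with all components
Xprime = model.components_[-6:, :]  # rows of the last 6 PCs (directions, not projected data)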
And then I stop, because I know I should call transform() but I do not understand how to do it. I tried several times without success.
Can someone tell me whether the previous steps are correct and point me in the right direction?
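To make the missing step concrete, here is a sketch of what I think the projection onto the last 6 components could look like when done by hand with NumPy instead of through transform(). It assumes the default whiten=False, so that projecting is just a matrix product with the selected rows of components_, and it uses random placeholder data instead of my images.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Placeholder data standing in for the image matrix.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 64))
X_std = StandardScaler().fit_transform(X)

# Fit with all components; there are min(n_samples, n_features) of them.
model = PCA().fit(X_std)

# Rows of components_ are the principal directions, sorted by decreasing variance.
W = model.components_[-6:, :]            # last 6 PCs, shape (6, 64)

# Project: center with the mean PCA learned, then multiply by the directions.
scores = (X_std - model.mean_) @ W.T     # shape (20, 6)

# Reconstruct in standardized feature space from those 6 components only.
X_rec_last6 = scores @ W + model.mean_   # shape (20, 64)
```

If this is right, scores should match the last 6 columns of model.transform(X_std).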
Thank you very much
EDIT: I have now adapted the solution suggested by the first answer to my question:
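import copy

model = PCA().fit(X_std)
model2pc = copy.deepcopy(model)  # real copy; plain assignment would alias model and zero its components too
model2pc.components_[2:, :] = 0  # keep only the first 2 PCs
Xp_2pc = model2pc.transform(X_std)
Xr_2pc = model2pc.inverse_transform(Xp_2pc)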
I then do the same for the first 6 PCs, the first 60 PCs, and the last 6 PCs. I have noticed that this is very time consuming. I would like to get a model that directly extracts the principal components I need (without zeroing out the others) and then call transform() and inverse_transform() on that model.
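One direction I am considering is to fit PCA only once and reuse slices of components_ for every subset, instead of refitting or copying the model each time. The sketch below assumes whiten=False and that most of the cost is in the repeated fits, which I have not measured. The helper name reconstruct is hypothetical (it is not part of scikit-learn), and the random data is a placeholder.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

def reconstruct(model, X_std, idx):
    """Project X_std onto the components selected by idx and map back.

    Hypothetical helper, not a scikit-learn function; assumes whiten=False.
    """
    W = model.components_[idx]                       # selected PCs as rows
    return (X_std - model.mean_) @ W.T @ W + model.mean_

# Placeholder data: 100 samples, 64 features, so 64 components are available.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 64))
X_std = StandardScaler().fit_transform(X)

model = PCA().fit(X_std)                             # fitted once, never modified

Xr_2pc = reconstruct(model, X_std, slice(0, 2))      # first 2 PCs
Xr_6pc = reconstruct(model, X_std, slice(0, 6))      # first 6 PCs
Xr_60pc = reconstruct(model, X_std, slice(0, 60))    # first 60 PCs
Xr_last6 = reconstruct(model, X_std, slice(-6, None))  # last 6 PCs
```

For the leading components this should give the same reconstruction as PCA(n_components=2) followed by transform() and inverse_transform(), but the same fitted model also serves the "last 6" case.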