
I am trying to train an SVM classifier using scikit-learn. At training time I want to reduce the feature vector dimension, and I have used PCA for this:

from sklearn.decomposition import PCA

pp = PCA(n_components=400).fit(features)
features = pp.transform(features)

PCA requires an m x n dataset to determine the variance, but at inference time I have only a single image and its corresponding 1-D feature vector. I am wondering how to reduce the feature vector at inference time so that it matches the training dimension.


3 Answers


Like all preprocessing modules in scikit-learn, PCA includes a transform method that does exactly that, i.e. it transforms new samples according to an already fitted PCA; from the docs:

transform(self, X)

Apply dimensionality reduction to X.

X is projected on the first principal components previously extracted from a training set.

Here is a short demo with dummy data, adapting the example from the documentation:

import numpy as np
from sklearn.decomposition import PCA

X = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]])
pca = PCA(n_components=2)
pca.fit(X)

X_new = np.array([[1, -1]]) # new data; notice the double brackets (a 2D array)

X_new_pca = pca.transform(X_new)
X_new_pca
# array([[-0.2935787 ,  1.38340578]])

If you want to avoid the double brackets for a single new sample, you should make it into a numpy array and reshape it as follows:

X_new = np.array([1, -1])
X_new_pca = pca.transform(X_new.reshape(1, -1))
X_new_pca
# array([[-0.2935787 ,  1.38340578]]) # same result
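
Also, since training and inference typically happen in separate runs, you may want to persist the fitted PCA object and reload it at inference time alongside your classifier. A minimal sketch using joblib (the file name is just a placeholder):

import joblib

# After fitting, save the PCA object to disk
joblib.dump(pca, 'pca.joblib')

# At inference time, load it back and transform the single sample
pca = joblib.load('pca.joblib')
X_new_pca = pca.transform(np.array([1, -1]).reshape(1, -1))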

After "training" the PCA (or mathematically speaking, after the dimensionality reduction matrix is computed), you can use the transform function on any matrix or vector with suitable sizes, regardless of the original data.

from sklearn.decomposition import PCA
import numpy as np

m = 100
n = 200

features = np.random.randn(m, n)
print(features.shape)
>> (100, 200)

# Learn the PCA
pp = PCA(n_components=50).fit(features)
low_dim_features = pp.transform(features)
print(low_dim_features.shape)
>> (100, 50)

# Perform dimensionality reduction on a new sample
new_sample = np.random.randn(1, n)
low_dim_sample = pp.transform(new_sample)
print(low_dim_sample.shape)
>> (1, 50)
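
Since the original goal is an SVM classifier, you can also chain the reduction and the classifier with a scikit-learn Pipeline, so that predict applies the same fitted PCA to each new sample automatically. A minimal sketch continuing the example above (the labels are random dummies, just for illustration):

from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

# Dummy binary labels, one per training sample
y = np.random.randint(0, 2, size=m)

# PCA + SVM in one estimator: fit learns the projection and the classifier
clf = make_pipeline(PCA(n_components=50), SVC())
clf.fit(features, y)

# predict transforms the single sample with the fitted PCA, then classifies
print(clf.predict(new_sample).shape)
>> (1,)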

PCA works perfectly fine for this case; it doesn't matter that you have only a single image at test time.

Assume your training set is 100 samples by 1000 features. Fitting PCA on the training set gives you 1000-dimensional eigenvectors, because the covariance matrix is 1000 x 1000. Through eigendecomposition you then select only a fraction of those eigenvectors; say you keep 25, so your projection matrix is 1000 x 25.

At test time, with a single example of 1 x 1000 features, you only need to project the features onto that 1000 x 25 eigenspace, and you get 1 x 25 reduced features. Your training set then has 100 x 25 features and your single test sample 1 x 25 features, and you can train and test any machine learning classifier with that.
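
For illustration, here is a minimal NumPy sketch of that projection, doing the eigendecomposition by hand; the shapes match the numbers above, and all variable names are made up for the example:

import numpy as np

m, n, k = 100, 1000, 25

X_train = np.random.randn(m, n)   # 100 x 1000 training set
x_test = np.random.randn(1, n)    # single 1 x 1000 test sample

# Center both with the *training* mean
mean = X_train.mean(axis=0)
Xc = X_train - mean

# 1000 x 1000 covariance matrix and its eigendecomposition
eigvals, eigvecs = np.linalg.eigh(np.cov(Xc, rowvar=False))

# Keep the 25 eigenvectors with the largest eigenvalues -> 1000 x 25
W = eigvecs[:, np.argsort(eigvals)[::-1][:k]]

# Project onto the eigenspace
X_train_reduced = Xc @ W               # 100 x 25
x_test_reduced = (x_test - mean) @ W   # 1 x 25
print(X_train_reduced.shape, x_test_reduced.shape)
>> (100, 25) (1, 25)

(In practice you would just use the fitted PCA's transform, as in the other answers; scikit-learn computes the projection via SVD rather than an explicit covariance eigendecomposition, but the idea is the same.)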