scikit-learn PCA doesn't have 'score' method

Question

I am trying to identify the type of noise based on this article:

Model selection with Probabilistic PCA and Factor Analysis (FA)

I am using scikit-learn-0.14.1.win32-py2.7 on Windows 8 (64-bit). I know that the article refers to version 0.15. However, the version 0.14 documentation mentions that a score method is available for PCA, so I assumed it should work:

sklearn.decomposition.ProbabilisticPCA

The problem is that no matter which PCA I pass to *cross_val_score*, I always get a TypeError saying that the PCA estimator does not have a score method:

*TypeError: If no scoring is specified, the estimator passed should have a 'score' method. The estimator PCA(copy=True, n_components=None, whiten=False) does not.*

Any ideas why this is happening?
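The message comes from the check `cross_val_score` performs when `scoring` is left as `None`: it falls back to the estimator's own `score` method and rejects estimators that do not define one. Below is a simplified, hypothetical re-creation of that check (not scikit-learn's actual code; `check_scoring_sketch` and both stand-in classes are invented for illustration), showing why an estimator without `score`, like `PCA` in 0.14, fails while one with `score`, like `ProbabilisticPCA`, passes:

```python
class NoScoreEstimator:
    """Hypothetical stand-in for an estimator without score (like PCA in 0.14)."""

    def fit(self, X, y=None):
        return self

    def __repr__(self):
        return "NoScoreEstimator()"


class ScoredEstimator(NoScoreEstimator):
    """Hypothetical stand-in for an estimator that defines score (like PPCA)."""

    def score(self, X, y=None):
        return -1.0

    def __repr__(self):
        return "ScoredEstimator()"


def check_scoring_sketch(estimator, scoring=None):
    """Simplified, hypothetical version of the default-scorer check."""
    if scoring is not None:
        # An explicit scorer is used as-is; the estimator needs no score method
        return scoring
    if not hasattr(estimator, "score"):
        raise TypeError(
            "If no scoring is specified, the estimator passed should have a "
            "'score' method. The estimator %r does not." % (estimator,))
    # Default scorer: delegate to the estimator's own score method
    return lambda est, X, y=None: est.score(X, y)


scorer = check_scoring_sketch(ScoredEstimator())
print(scorer(ScoredEstimator(), [[0.0]]))
try:
    check_scoring_sketch(NoScoreEstimator())
except TypeError as exc:
    print(exc)
```

Passing an explicit `scoring` callable sidesteps the check, but for PCA model selection the meaningful default is the estimator's own log-likelihood `score`.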

Many thanks in advance

Christos

X has 1000 samples with 40 features.

Here is a portion of the code:

```python
import numpy as np
import csv
from scipy import linalg
from sklearn.decomposition import PCA, FactorAnalysis
from sklearn.cross_validation import cross_val_score
from sklearn.grid_search import GridSearchCV
from sklearn.covariance import ShrunkCovariance, LedoitWolf

# read in the training data
train_path = '<train data path>/train.csv'

reader = csv.reader(open(train_path, "rb"), delimiter=',')
train = list(reader)
X = np.array(train).astype('float')

n_samples = 1000
n_features = 40
n_components = np.arange(0, n_features, 4)

def compute_scores(X):
    pca = PCA()

    pca_scores = []
    for n in n_components:
        pca.n_components = n
        # With no scoring argument, cross_val_score falls back to pca.score,
        # which PCA in 0.14 does not define, hence the TypeError
        pca_scores.append(np.mean(cross_val_score(pca, X, n_jobs=1)))

    return pca_scores

pca_scores = compute_scores(X)
n_components_pca = n_components[np.argmax(pca_scores)]
```
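For comparison, here is a sketch of the same loop against a current scikit-learn release. As far as I know, `PCA` there exposes `score` itself (the average log-likelihood under the probabilistic PCA model), and `cross_val_score` has moved to `sklearn.model_selection`. The random low-rank `X` is an assumption standing in for train.csv. Also note that recent releases default to 5 folds rather than 3.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for the 1000 x 40 training matrix (assumption: the real
# data would be read from train.csv instead): rank-10 signal plus unit noise
rng = np.random.RandomState(0)
n_samples, n_features, rank = 1000, 40, 10
X = rng.randn(n_samples, rank) @ rng.randn(rank, n_features)
X += rng.randn(n_samples, n_features)

n_components = np.arange(0, n_features, 4)


def compute_scores(X):
    pca = PCA(svd_solver="full")
    pca_scores = []
    for n in n_components:
        pca.n_components = n
        # scoring=None -> cross_val_score calls pca.score on each held-out fold
        pca_scores.append(np.mean(cross_val_score(pca, X)))
    return pca_scores


pca_scores = compute_scores(X)
n_components_pca = n_components[np.argmax(pca_scores)]
print(n_components_pca)
```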

Data can be found here: kaggle.com/c/data-science-london-scikit-learn/data — celetron
If you use probabilistic PCA, the error should not occur. However, examples usually only work with the version they ship with. — Andreas Mueller
Thanks Andreas, that is correct. If you use probabilistic PCA, it works fine. There are no examples for version 0.14.1, but it works. I suppose that in the new version it will also work for PCA. — celetron
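The score that a probabilistic PCA model returns is the average log-likelihood of the data under a Gaussian whose covariance is built from the retained components plus isotropic noise. A minimal NumPy sketch of that computation follows. The helper `ppca_score` is hypothetical, and the formula reflects my understanding of what newer `PCA.score` computes (Tipping and Bishop's model, with the noise variance set to the mean of the discarded eigenvalues); it is not code taken from scikit-learn 0.14.

```python
import numpy as np


def ppca_score(X_train, X_test, n_components):
    """Average log-likelihood of X_test under a probabilistic PCA model
    fitted on X_train (hypothetical helper, not a scikit-learn API)."""
    mean = X_train.mean(axis=0)
    Xc = X_train - mean
    n_samples, n_features = Xc.shape

    # Eigenvalues/eigenvectors of the sample covariance via SVD of centred data
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    eigvals = s ** 2 / (n_samples - 1)

    # Noise variance = mean of the discarded eigenvalues (0 if none discarded)
    discarded = eigvals[n_components:]
    noise_var = discarded.mean() if discarded.size else 0.0

    # Model covariance: W^T diag(lambda - sigma^2) W + sigma^2 I
    W = Vt[:n_components]
    cov = (W.T * (eigvals[:n_components] - noise_var)) @ W
    cov += noise_var * np.eye(n_features)

    # Gaussian log-density of each held-out row, then the average
    diff = X_test - mean
    maha = np.sum(diff * np.linalg.solve(cov, diff.T).T, axis=1)
    _, logdet = np.linalg.slogdet(cov)
    loglik = -0.5 * (maha + logdet + n_features * np.log(2 * np.pi))
    return loglik.mean()


# Usage on synthetic rank-2 data in 10 dimensions
rng = np.random.default_rng(0)
Z = rng.standard_normal((600, 2))
X = Z @ rng.standard_normal((2, 10)) + 0.1 * rng.standard_normal((600, 10))
for k in (0, 2, 5):
    print(k, ppca_score(X[:400], X[400:], k))
```

Cross-validating this held-out log-likelihood over a grid of `n_components` values is what the model-selection example does with the estimator's `score`.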

celetron · Accepted Answer · 2013-11-12T19:56:52

OK, I think I found the problem. It does not work with PCA, but it does work with PPCA. However, when no cv value is given, cross_val_score automatically uses 3-fold cross-validation, which creates three sets of sizes 334, 333 and 333 (my initial training set contains 1000 samples). Since numpy.mean cannot compare sets of different sizes (334 vs 333), Python raises an exception. Thanks.
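The fold sizes quoted above follow from how an unshuffled K-fold split divides the samples: the first `n_samples % n_folds` folds each receive one extra sample, which is the rule scikit-learn's `KFold` uses. A minimal sketch of that arithmetic (the helper `kfold_sizes` is hypothetical):

```python
def kfold_sizes(n_samples, n_folds=3):
    """Fold sizes of an unshuffled K-fold split: the first
    n_samples % n_folds folds each get one extra sample."""
    base, extra = divmod(n_samples, n_folds)
    return [base + 1 if i < extra else base for i in range(n_folds)]


print(kfold_sizes(1000))  # [334, 333, 333]
```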
