
I am trying to identify the type of noise, based on this article:

Model selection with Probabilistic PCA and Factor Analysis (FA)

I am using scikit-learn 0.14.1 (scikit-learn-0.14.1.win32-py2.7) on Windows 8 (64-bit). I know that the article refers to version 0.15; however, the version 0.14 documentation mentions that a score method is available for PCA, so I assumed it should work:

sklearn.decomposition.ProbabilisticPCA
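For context, the score that `cross_val_score` falls back to here is the average log-likelihood of the data under the probabilistic PCA model, i.e. a Gaussian whose covariance is the low-rank PCA fit plus isotropic noise. The sketch below assumes a current scikit-learn release (where `PCA` itself has `score`, unlike 0.14's `PCA`) and uses random stand-in data instead of `train.csv`; it checks `PCA.score` against the same Gaussian log-likelihood computed with SciPy.

```python
import numpy as np
from scipy.stats import multivariate_normal
from sklearn.decomposition import PCA

# Hypothetical stand-in for the 1000 x 40 training matrix
rng = np.random.RandomState(0)
X = rng.randn(1000, 40)

pca = PCA(n_components=10).fit(X)

# PCA.score: average log-likelihood of the samples under the probabilistic PCA model
sk_score = pca.score(X)

# The same quantity by hand: a Gaussian with mean pca.mean_ and the model
# covariance (low-rank part plus noise_variance_ * I) returned by get_covariance()
model = multivariate_normal(mean=pca.mean_, cov=pca.get_covariance())
manual_score = model.logpdf(X).mean()

print(sk_score, manual_score)
```

Because this score is a held-out log-likelihood, a higher cross-validated value means the chosen number of components explains unseen data better, which is what the model-selection article relies on.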

The problem is that no matter which PCA variant I pass to *cross_val_score*, I always get a TypeError saying that the PCA estimator does not have a score method:

*TypeError: If no scoring is specified, the estimator passed should have a 'score' method. The estimator PCA(copy=True, n_components=None, whiten=False) does not.*

Any ideas why this is happening?

Many thanks in advance

Christos

X has 1000 samples with 40 features.

Here is a portion of the code:

```python
import numpy as np
import csv
from scipy import linalg
from sklearn.decomposition import PCA, FactorAnalysis
from sklearn.cross_validation import cross_val_score
from sklearn.grid_search import GridSearchCV
from sklearn.covariance import ShrunkCovariance, LedoitWolf

# read in the training data
train_path = '<train data path>/train.csv'

# "rb" mode is what the csv module expects under Python 2.7
with open(train_path, "rb") as f:
    reader = csv.reader(f, delimiter=',')
    train = list(reader)
X = np.array(train).astype('float')

n_samples = 1000
n_features = 40
# candidate numbers of components: 0, 4, 8, ..., 36
n_components = np.arange(0, n_features, 4)

def compute_scores(X):
    pca = PCA()

    pca_scores = []
    for n in n_components:
        pca.n_components = n
        # with no scoring argument, cross_val_score calls the estimator's
        # score() method; the 0.14 PCA class has none, hence the TypeError
        pca_scores.append(np.mean(cross_val_score(pca, X, n_jobs=1)))

    return pca_scores

pca_scores = compute_scores(X)
# number of components with the best mean cross-validated score
n_components_pca = n_components[np.argmax(pca_scores)]
```
With probabilistic PCA, the error should not occur. However, examples usually only work with the version they ship with. – Andreas Mueller

Thanks, Andreas, that is correct: if you use probabilistic PCA, it works fine. There are no examples for version 0.14.1, but it works. I suppose that in the new version it will also work for PCA. – celetron
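Since scikit-learn 0.15, `PCA` itself has a `score` method (and `ProbabilisticPCA` was deprecated), so the question's loop works with plain `PCA` in later releases. The sketch below assumes scikit-learn 0.18 or later, where `cross_val_score` lives in `sklearn.model_selection` rather than `sklearn.cross_validation`. The data is hypothetical (a rank-10 signal in 40 features plus unit homoscedastic noise), standing in for `train.csv`; the component grid is also an illustrative choice and starts at 5 rather than 0.

```python
import numpy as np
from sklearn.decomposition import PCA, FactorAnalysis
from sklearn.model_selection import cross_val_score

# Hypothetical synthetic data: 1000 samples, 40 features,
# a rank-10 signal plus homoscedastic (equal-variance) noise
rng = np.random.RandomState(42)
n_samples, n_features, rank = 1000, 40, 10
U, _, _ = np.linalg.svd(rng.randn(n_features, n_features))
X = rng.randn(n_samples, rank) @ U[:, :rank].T + rng.randn(n_samples, n_features)

# candidate numbers of components: 5, 10, ..., 35
n_components = np.arange(5, n_features, 5)

def compute_scores(X):
    pca, fa = PCA(svd_solver="full"), FactorAnalysis()
    pca_scores, fa_scores = [], []
    for n in n_components:
        pca.n_components = n
        fa.n_components = n
        # no scoring argument: cross_val_score uses each estimator's score(),
        # the average held-out log-likelihood
        pca_scores.append(np.mean(cross_val_score(pca, X, cv=3)))
        fa_scores.append(np.mean(cross_val_score(fa, X, cv=3)))
    return pca_scores, fa_scores

pca_scores, fa_scores = compute_scores(X)
n_components_pca = n_components[np.argmax(pca_scores)]
n_components_fa = n_components[np.argmax(fa_scores)]
print("best n_components by PCA CV:", n_components_pca)
print("best n_components by FA CV:", n_components_fa)
```

Comparing the two curves is the idea behind the article: PCA assumes the same noise variance on every feature, whereas factor analysis allows a separate variance per feature, so which model scores better hints at the type of noise.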

1 Answer


OK, I think I found the problem. It does not work with PCA, but it does work with PPCA. However, when no cv value is provided, cross_val_score automatically uses 3-fold cross-validation, which creates 3 sets with sizes 334, 333 and 333 (my initial training set contains 1000 samples). Since numpy.mean cannot make a comparison between sets of different sizes (334 vs. 333), Python raises an exception. Thanks.
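The 334/333/333 split can be checked directly. This sketch assumes a current scikit-learn (`sklearn.model_selection.KFold`) and random stand-in data. It prints the test-fold sizes for 1000 samples and also the shape of the array that `cross_val_score` returns, which is what `np.mean` receives: one scalar score per fold.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.model_selection import KFold, cross_val_score

# Hypothetical stand-in for the 1000 x 40 training matrix
rng = np.random.RandomState(0)
X = rng.randn(1000, 40)

# 3-fold split without shuffling; the first 1000 % 3 folds get one extra sample
kf = KFold(n_splits=3)
test_sizes = [len(test) for _, test in kf.split(X)]
print(test_sizes)  # [334, 333, 333]

# cross_val_score returns one scalar score per fold
scores = cross_val_score(PCA(n_components=5), X, cv=kf)
print(scores.shape)  # (3,)
print(np.mean(scores))
```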