4 votes

I am currently carrying out cross-validation with a support vector machine classifier on DICOM images, using the following code:

    # Cross-validation using k folds
    from sklearn import svm, cross_validation

    clf = svm.SVC(kernel='linear')
    scores = cross_validation.cross_val_score(clf, X, Y, cv=16)
    print(scores)
    print("Accuracy: %0.2f (+/- %0.2f)" % (scores.mean(), scores.std() * 2))

As you can see, I am currently using 16 folds. How would I find the best number of folds to use? Is it a case of more being better?

Also, I have found that while using cross-validation my accuracy scores vary massively, from 66% to 100%, usually giving a mean accuracy of 82%-85%. Is there any advice on how I could improve this, and perhaps ensure that each fold contains an equal number of images from each class?

Sorry, I'm very new to Python!

Thank you for any advice!

1 Answer

0 votes

Try using GridSearchCV. For example, I can create a pipeline such as:

    from sklearn.pipeline import Pipeline
    from sklearn.linear_model import LogisticRegression
    from sklearn.grid_search import GridSearchCV

    pipeline = Pipeline([
        ('clf', LogisticRegression())
    ])

    parameters = {
        'clf__C': (0.1, 1, 10, 20, 30)
    }

So, here I am providing 5 different options for the C parameter of my LogisticRegression() classifier, which is denoted by clf in my pipeline.

Of course, instead of LogisticRegression() you can use SVC.
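For instance, here is a minimal sketch of the same pipeline with SVC (the linear kernel is taken from the question, and the C grid is just the one above; adjust both to your data):

    from sklearn.pipeline import Pipeline
    from sklearn.svm import SVC

    # Same structure; only the estimator in the 'clf' step changes
    pipeline = Pipeline([
        ('clf', SVC(kernel='linear'))
    ])

    # The grid keys address the step name, so 'clf__C' still works
    parameters = {
        'clf__C': (0.1, 1, 10, 20, 30)
    }

Then build and fit the grid search: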

    grid_search = GridSearchCV(pipeline, parameters, n_jobs=3, verbose=1, scoring='accuracy')
    grid_search.fit(X, Y)

Then something like:

    bestParameters = grid_search.best_estimator_.get_params()
    for param_name in sorted(parameters.keys()):
        print('\t%s: %r' % (param_name, bestParameters[param_name]))

will print the best value found for each parameter you searched over.
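Once you have the tuned estimator, you can feed it straight back into the cross-validation from your question. A sketch, assuming the same X and Y and the older cross_validation module your code uses:

    from sklearn import cross_validation

    # best_estimator_ is the pipeline refit with the winning parameters
    best_clf = grid_search.best_estimator_
    scores = cross_validation.cross_val_score(best_clf, X, Y, cv=16)
    print("Accuracy: %0.2f (+/- %0.2f)" % (scores.mean(), scores.std() * 2))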