2 votes

Here is my chunk of Python (2.7; I learned Python 3, so I import print_function from __future__ to get the print formatting I'm used to) using scikit-learn from a few revisions back. I'm locked into both versions by corporate IT policy. It uses the SVC estimator. What I don't understand is why the results for the +/- 1 case differ between the first approach (using simple_clf) and the second. Structurally I think they're identical, with the first processing an entire array of data at once and the second using the data one piece of the array at a time, yet the results don't agree. The values generated for the average (mean) score should be decimal fractions (0.0 to 1.0). In some cases the difference is small, but in others it is more than large enough to make me ask my question.

from __future__ import print_function
import os
import numpy as np
from numpy import array, loadtxt
from sklearn import cross_validation, datasets, svm, preprocessing, grid_search
from sklearn.cross_validation import train_test_split
from sklearn.metrics import precision_score

GRADES = ['A+', 'A', 'A-', 'B+', 'B', 'B-', 'C+', 'C', 'C-', 'M']

# Initial processing
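# Note: FEATUREVECFILE, SCORESFILE (and test_type, used in the prints below)
# are defined elsewhere in the full script; this is just the relevant chunk.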
featurevecs = loadtxt( FEATUREVECFILE )
f = open( SCORESFILE )
scorelines = f.readlines()[ 1: ] # Skip header line
f.close()
scorenums = [ GRADES.index( l.split( '\t' )[ 1 ] ) for l in scorelines ]
scorenums = array( scorenums )

# Need this step to normalize the feature vectors
scaler = preprocessing.Scaler()
scaler.fit( featurevecs )
featurevecs = scaler.transform( featurevecs )

# Break up the vector into a training and testing vector
# Need to keep the training set somewhat large to get enough of the
# scarce results in the training set or the learning fails
X_train, X_test, y_train, y_test = train_test_split(
    featurevecs, scorenums, test_size = 0.333, random_state = 0 )

# Define a range of parameters we can use to do a grid search
# for the 'best' ones.
CLFPARAMS = {'gamma':[.0025, .005, 0.09, .01, 0.011, .02, .04],
             'C':[200, 300, 400, 500, 600]}

# do a simple cross validation
simple_clf = svm.SVC()
simple_clf = grid_search.GridSearchCV( simple_clf, CLFPARAMS, cv = 3 )
simple_clf.fit( X_train, y_train )
y_true, y_pred = y_test, simple_clf.predict( X_test )
match = 0
close = 0
count = 0
deviation = []
for i in range( len( y_true ) ):
    count += 1
    delta = np.abs( y_true[ i ] - y_pred[ i ] )
    if( delta == 0 ):
        match += 1
    elif( delta == 1 ):
        close += 1
    deviation = np.append( deviation, 
                           float( np.sum( np.abs( delta ) <= 1 ) ) )
avg = float( match ) / float( count )
close_avg = float( close ) / float( count )
# Note: deviation.mean() should equal avg + close_avg
print( '{0} Accuracy (+/- 0) {1:0.4f} Accuracy (+/- 1) {2:0.4f} (+/- {3:0.4f}) '.format(
    test_type, avg, deviation.mean(), deviation.std() / 2.0 ), end = "" )

# "Original" code
# do LeaveOneOut item by item
clf = svm.SVC()
clf = grid_search.GridSearchCV( clf, CLFPARAMS, cv = 3 )
toleratePara = 1
thecurrentScoreGraded = []
loo = cross_validation.LeaveOneOut( n = len( featurevecs ) )
for train, test in loo:
    try:
        clf.fit( featurevecs[ train ], scorenums[ train ] )
        rawPredictionResult = clf.predict( featurevecs[ test ] )

        errorVec = scorenums[ test ] - rawPredictionResult
        print( len( errorVec ), errorVec )
        thecurrentScoreGraded = np.append( thecurrentScoreGraded,
                                           float( np.sum( np.abs( errorVec ) <= toleratePara ) ) / len( errorVec ) )
    except ValueError:
        pass
print( '{0} Accuracy (+/- {1:d}) {2:0.4f} (+/- {3:0.4f})'.format(
    test_type, toleratePara, thecurrentScoreGraded.mean(), thecurrentScoreGraded.std() / 2 ) )

Here are my results; you can see that they don't match. My actual work task was to see whether changing exactly what sort of data is gathered to feed the learning engine would help with accuracy, or whether combining data into a bigger teaching vector would help, so I'm working on a bunch of combinations. Each pair of lines is for one type of learning data: the first line is my result, the second the result from the "original" code.

original Accuracy (+/- 0) 0.2771 Accuracy (+/- 1) 0.6024 (+/- 0.2447) 
                        original Accuracy (+/- 1) 0.6185 (+/- 0.2429)
upostancurv Accuracy (+/- 0) 0.2718 Accuracy (+/- 1) 0.6505 (+/- 0.2384) 
                        upostancurv Accuracy (+/- 1) 0.6417 (+/- 0.2398)
npostancurv Accuracy (+/- 0) 0.2718 Accuracy (+/- 1) 0.6505 (+/- 0.2384) 
                        npostancurv Accuracy (+/- 1) 0.6417 (+/- 0.2398)
tancurv Accuracy (+/- 0) 0.2330 Accuracy (+/- 1) 0.5825 (+/- 0.2466) 
                        tancurv Accuracy (+/- 1) 0.5831 (+/- 0.2465)
npostan Accuracy (+/- 0) 0.3398 Accuracy (+/- 1) 0.7379 (+/- 0.2199) 
                        npostan Accuracy (+/- 1) 0.7003 (+/- 0.2291)
nposcurv Accuracy (+/- 0) 0.2621 Accuracy (+/- 1) 0.5825 (+/- 0.2466) 
                        nposcurv Accuracy (+/- 1) 0.5961 (+/- 0.2453)
upostan Accuracy (+/- 0) 0.3398 Accuracy (+/- 1) 0.7379 (+/- 0.2199) 
                        upostan Accuracy (+/- 1) 0.7003 (+/- 0.2291)
uposcurv Accuracy (+/- 0) 0.2621 Accuracy (+/- 1) 0.5825 (+/- 0.2466) 
                        uposcurv Accuracy (+/- 1) 0.5961 (+/- 0.2453)
upos Accuracy (+/- 0) 0.3689 Accuracy (+/- 1) 0.6990 (+/- 0.2293) 
                        upos Accuracy (+/- 1) 0.6450 (+/- 0.2393)
npos Accuracy (+/- 0) 0.3689 Accuracy (+/- 1) 0.6990 (+/- 0.2293) 
                        npos Accuracy (+/- 1) 0.6450 (+/- 0.2393)
curv Accuracy (+/- 0) 0.1553 Accuracy (+/- 1) 0.4854 (+/- 0.2499) 
                        curv Accuracy (+/- 1) 0.5570 (+/- 0.2484)
tan Accuracy (+/- 0) 0.3107 Accuracy (+/- 1) 0.7184 (+/- 0.2249) 
                        tan Accuracy (+/- 1) 0.7231 (+/- 0.2237)

1 Answer

0 votes

What do you mean by "structurally they are identical"? You use different subsets for training and for testing, and those subsets have different sizes. If you don't train on exactly the same data, I don't see why you would expect the results to be identical.
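To make that concrete, here is a minimal sketch (with a made-up sample count and dummy data, using the same old-style scikit-learn API as your code) of how differently sized the training sets are in the two approaches:

from __future__ import print_function
import numpy as np
from sklearn.cross_validation import train_test_split, LeaveOneOut

N = 120                              # hypothetical sample count, just for illustration
X = np.random.rand( N, 5 )           # dummy feature vectors
y = np.random.randint( 0, 10, N )    # dummy grade indices

# Approach 1: one model trained on roughly 2/3 of the data, scored on the other 1/3
X_train, X_test, y_train, y_test = train_test_split( X, y, test_size = 0.333, random_state = 0 )
print( X_train.shape[ 0 ], 'training rows,', X_test.shape[ 0 ], 'test rows' )

# Approach 2: N models, each trained on N - 1 rows and scored on a single row
loo = LeaveOneOut( n = N )
for train, test in loo:
    print( X[ train ].shape[ 0 ], 'training rows,', X[ test ].shape[ 0 ], 'test row' )
    break

So not only are the training subsets different: each LOO model is fit on roughly half again as much training data as your train_test_split model, and each LOO "score" is based on a single held-out sample.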

By the way, also take a look at the note on LeaveOneOut in the scikit-learn documentation: LOO can have a high variance.
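If you want a feel for how much this variance matters for your data, one cheap check (a sketch that reuses featurevecs, scorenums and CLFPARAMS from your code above) is to repeat your held-out evaluation with several different random splits and look at the spread of the +/- 1 accuracy:

from __future__ import print_function
import numpy as np
from sklearn import svm, grid_search
from sklearn.cross_validation import train_test_split

# Reuses featurevecs (already scaled), scorenums and CLFPARAMS from the question.
scores = []
for seed in range( 10 ):             # 10 different random train/test splits
    X_tr, X_te, y_tr, y_te = train_test_split(
        featurevecs, scorenums, test_size = 0.333, random_state = seed )
    clf = grid_search.GridSearchCV( svm.SVC(), CLFPARAMS, cv = 3 )
    clf.fit( X_tr, y_tr )
    y_pred = clf.predict( X_te )
    # Fraction of test predictions within one grade of the true grade
    scores.append( np.mean( np.abs( y_te - y_pred ) <= 1 ) )

scores = np.array( scores )
print( '+/- 1 accuracy over 10 splits: mean {0:0.4f}, std {1:0.4f}'.format(
    scores.mean(), scores.std() ) )

If the split-to-split standard deviation is on the same order as the gap between your two numbers, the disagreement is just sampling noise from which rows happen to land in the training set versus the test set.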