I am using KNeighborsRegressor from sklearn to do some learning. I split a dataset of 30,000 observations into training (70%) and testing (30%) sets. However, I cannot understand why two methods of evaluating the same model yield such different results.
More specifically, when I compute the R² score on the whole testing set at once, I get a much higher value (~0.70) than when I run k-fold cross-validation on the testing set. Why are these scores so different when the exact same model is being evaluated on exactly the same data? I am sure I am doing something wrong, but I have no clue what. Please help!
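For reference, the 70/30 split was made roughly like this (exact variable names and random_state may differ; X and y hold the full feature matrix and target):

from sklearn.model_selection import train_test_split

# 70% training / 30% testing split of the ~30,000 observations
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)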
from sklearn import neighbors, model_selection
from sklearn.metrics import make_scorer, r2_score
from sklearn.model_selection import cross_val_score

r2_scorer = make_scorer(r2_score)
clf = neighbors.KNeighborsRegressor()
clf = clf.fit(X_train, y_train)

# Method 1: score the fitted model on the full testing set at once
score1 = r2_score(y_test, clf.predict(X_test))
> 0.68777300248206585

# Method 2: 10-fold cross-validation on the same testing set
kfold = model_selection.KFold(n_splits=10, random_state=42)
scores2 = cross_val_score(clf, X_test, y_test, cv=kfold, scoring=r2_scorer)
scores2
> array([ 0.05111285,  0.65697228,  0.57468009,  0.6706573 ,  0.46720042,
          0.3763054 ,  0.56881947,  0.32569462, -0.16607888, -0.6862521 ])
scores2.mean()
> 0.28391114469744039
scores2.std()
> 0.4118551721575503