Say I'm plotting a learning curve for an SVM with sklearn's learning_curve. I'm also doing 5-fold cross-validation, which, as I understand it, means splitting the training data into 5 pieces, training on four of them, and testing on the last one.
So my question is: since the size of the training set is different for each data point on the learning curve (because we want to see how the model performs with an increasing amount of data), how does cross-validation work in that case? Does it still split the whole training set into 5 equal pieces? Or does it split the current point's training set into 5 smaller pieces and then compute the test score? Also, is it possible to get a confusion matrix (true positives, true negatives, etc.) for each data point? I don't see a way to do that based on the sklearn learning_curve code.
Is the number of cross-validation folds related to the number of training-set sizes we evaluate with train_sizes = np.linspace(0.1, 1.0, 5)?
train_sizes, train_scores, test_scores, fit_times, _ = learning_curve(
    estimator, X, y, cv=cv, n_jobs=n_jobs, scoring=scoring,
    train_sizes=train_sizes, return_times=True)
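For context, here is a minimal sketch of what I imagine I would have to do by hand to get a confusion matrix per training-set size, since learning_curve doesn't seem to expose one. The toy dataset, SVC, and the way I subset the training indices are just my assumptions about how learning_curve behaves internally:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import StratifiedKFold
from sklearn.svm import SVC

# Placeholder data and estimator, just for illustration.
X, y = make_classification(n_samples=500, random_state=0)
estimator = SVC()

cv = StratifiedKFold(n_splits=5)
train_fracs = np.linspace(0.1, 1.0, 5)
matrices = {}  # training fraction -> confusion matrix summed over the 5 folds

for frac in train_fracs:
    cm_sum = np.zeros((2, 2), dtype=int)
    for train_idx, test_idx in cv.split(X, y):
        # My assumption: take only the first `frac` of this fold's training
        # indices, mimicking how learning_curve subsets the training split.
        n = int(len(train_idx) * frac)
        estimator.fit(X[train_idx[:n]], y[train_idx[:n]])
        y_pred = estimator.predict(X[test_idx])
        cm_sum += confusion_matrix(y[test_idx], y_pred, labels=[0, 1])
    matrices[frac] = cm_sum

print(matrices[1.0])  # confusion matrix at the full training size
```

Is this roughly what learning_curve does internally, or is there a built-in way to get these matrices?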
Thank you!