Test set partition in scikit-learn

Question

In "plot_gmm_classifier.py" at http://scikit-learn.org/stable/auto_examples/mixture/plot_gmm_classifier.html, the training and test data are defined as follows:

skf = StratifiedKFold(iris.target, n_folds=4)
# Only take the first fold.
train_index, test_index = next(iter(skf))

X_train = iris.data[train_index]
y_train = iris.target[train_index]
X_test = iris.data[test_index]
y_test = iris.target[test_index]

It looks to me as though labels are provided for the test data in y_test = iris.target[test_index]. If so, why? I thought test data shouldn't be labelled. If not, what is actually happening here?

Ando Saabas · Accepted Answer · 2013-12-03T20:17:41

In this particular example, the test data labels are used so that the accuracy of the method can be evaluated (by comparing predicted test labels to true test labels) and for plotting the true labels on the graph.
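To make the point concrete, here is a minimal sketch in plain Python. It is not the code from plot_gmm_classifier.py: the nearest-centroid classifier, the tiny data set, and the helper names (fit_centroids, predict, accuracy) are all made up for illustration. The thing to notice is that the model is fitted with the training labels only, and y_test is touched only at the very end, to score the predictions.

```python
# Toy stand-in for the GMM example: a nearest-centroid classifier.
# All names and data below are hypothetical, not part of scikit-learn.

def fit_centroids(X_train, y_train):
    """Compute one mean point per class, using ONLY the training labels."""
    sums, counts = {}, {}
    for x, label in zip(X_train, y_train):
        acc = sums.setdefault(label, [0.0] * len(x))
        for i, value in enumerate(x):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}

def predict(centroids, X):
    """Assign each sample to the class with the nearest centroid."""
    def sq_dist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return [min(centroids, key=lambda label: sq_dist(centroids[label], x)) for x in X]

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

X_train = [[1.0, 1.0], [1.2, 0.8], [5.0, 5.0], [5.2, 4.8]]
y_train = [0, 0, 1, 1]
X_test = [[0.9, 1.1], [5.1, 5.1], [3.2, 3.2]]
y_test = [0, 1, 0]  # held back from training, used only for scoring

centroids = fit_centroids(X_train, y_train)  # y_test is not seen here
y_pred = predict(centroids, X_test)          # predictions use X_test only
test_accuracy = accuracy(y_test, y_pred)     # true test labels used only here
print(y_pred, test_accuracy)
```

Without y_test there would be no way to tell that the third test point was misclassified, which is exactly why an evaluation example keeps the true labels for the held-out fold.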
