0
votes

In "plot_gmm_classifier.py" at http://scikit-learn.org/stable/auto_examples/mixture/plot_gmm_classifier.html, the training and test data are defined as follows:

skf = StratifiedKFold(iris.target, n_folds=4)
# Only take the first fold.
train_index, test_index = next(iter(skf))

X_train = iris.data[train_index]
y_train = iris.target[train_index]
X_test = iris.data[test_index]
y_test = iris.target[test_index]

It looks to me as though labels are provided for the test data via y_test = iris.target[test_index]. If so, why? I thought test data shouldn't be labelled. If not, what else is happening here?


1 Answer

2
votes

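In this particular example, the test labels are not used to fit the model. They are used only to evaluate the accuracy of the method (by comparing the predicted test labels to the true test labels) and to plot the true labels on the graph. Held-out data needs known labels for exactly this reason: without them there would be nothing to compare the predictions against.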
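To make the separation concrete, here is a minimal sketch of the same train/evaluate flow. It is not the example's original code: it assumes a recent scikit-learn, so it swaps in sklearn.model_selection.StratifiedKFold (with n_splits and split()) and GaussianMixture in place of the older calls quoted in the question. The settings covariance_type="full", max_iter=20 and random_state=0 are illustrative choices of mine.

```python
import numpy as np
from sklearn import datasets
from sklearn.mixture import GaussianMixture
from sklearn.model_selection import StratifiedKFold

iris = datasets.load_iris()

# Assumption: current scikit-learn API. The splitter is configured first,
# then split(X, y) yields (train_index, test_index) pairs.
skf = StratifiedKFold(n_splits=4)
# Only take the first fold.
train_index, test_index = next(iter(skf.split(iris.data, iris.target)))

X_train, y_train = iris.data[train_index], iris.target[train_index]
X_test, y_test = iris.data[test_index], iris.target[test_index]

n_classes = len(np.unique(y_train))

# The training labels seed one mixture component per class, so that
# component k corresponds to class k.
means_init = np.array(
    [X_train[y_train == k].mean(axis=0) for k in range(n_classes)]
)
model = GaussianMixture(
    n_components=n_classes,
    covariance_type="full",  # illustrative choice
    means_init=means_init,
    max_iter=20,
    random_state=0,
)

# Fitting sees X_train only; y_test is never passed to the model.
model.fit(X_train)

# y_test enters only here, as the ground truth for scoring the predictions.
y_test_pred = model.predict(X_test)
test_accuracy = np.mean(y_test_pred == y_test) * 100
print(f"Test accuracy: {test_accuracy:.1f}%")
```

The only place y_test appears after the split is the accuracy line, which is the sense in which the test set is "labelled": the labels are held back from training and used purely as a reference for scoring.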