
Load the popular digits dataset from the sklearn.datasets module and assign it to the variable digits.

Split digits.data into two sets named X_train and X_test. Also, split digits.target into two sets named Y_train and Y_test.

Hint: Use the train_test_split method from sklearn.model_selection, set random_state to 30, and perform stratified sampling. Build an SVM classifier from the X_train set and Y_train labels, with default parameters. Name the model svm_clf.

Evaluate the model's accuracy on the test set and print its score. I used the following code:

import sklearn.datasets as datasets
import sklearn.model_selection as ms
from sklearn.model_selection import train_test_split


digits = datasets.load_digits()
X = digits.data
y = digits.target

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=30)

print(X_train.shape)
print(X_test.shape)

from sklearn.svm import SVC
svm_clf = SVC().fit(X_train, y_train)
print(svm_clf.score(X_test,y_test))

I got the following output:

(1347,64)
(450,64)
0.4088888888888889

But I am not able to pass the test. Can someone help me find what is wrong?


1 Answer


You are missing the stratified sampling requirement; modify your split to include it:

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=30, stratify=y)

See the train_test_split documentation for details on the stratify parameter.
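For reference, here is a minimal sketch of the whole exercise with the stratified split applied. It also compares class proportions between the full label set and the test split, to illustrate what stratify=y does. The names full_props, test_props, and max_gap are only illustrative and are not part of the exercise; the score you get may differ from the one in the question depending on your scikit-learn version.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

digits = load_digits()
X, y = digits.data, digits.target

# stratify=y keeps the share of each digit class roughly equal
# in the training and test splits.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, random_state=30, stratify=y
)
print(X_train.shape)
print(X_test.shape)

# SVM classifier with default parameters, as the exercise asks.
svm_clf = SVC().fit(X_train, y_train)
accuracy = svm_clf.score(X_test, y_test)
print(accuracy)

# Illustrative check (not required by the exercise): compare the class
# proportions of the full label set with those of the test split.
full_props = np.bincount(y) / len(y)
test_props = np.bincount(y_test) / len(y_test)
max_gap = np.abs(full_props - test_props).max()
print(max_gap)
```

With stratification, max_gap should be very small, since each class appears in the test split in nearly the same proportion as in the full dataset.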