0
votes

I’m training a linear model on the MNIST dataset, but I want to train it on only one digit, 4. How do I choose my X_train, X_test, y_train, and y_test?

2 Answers

0
votes

A classifier learns to discriminate between classes, so it needs examples of each one. If you only care about digit 4, split each of your training and test sets into two groups:

  • Class 4 instances
  • Not class 4 instances: union of all other digits

Otherwise, the train/test split is the usual one: the training and test sets must not overlap.
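A minimal sketch of that idea, assuming NumPy and a synthetic stand-in for MNIST (the array sizes, the 20% hold-out, and the variable names are illustrative assumptions, not something from the question):

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for MNIST: 1000 flattened 28x28 images with labels 0-9.
X = rng.random((1000, 784))
y = rng.integers(0, 10, size=1000)

# Shuffle the indices once, then cut them into two disjoint groups.
indices = rng.permutation(len(X))
n_test = len(X) // 5  # hold out 20% for testing (an arbitrary choice)
test_idx, train_idx = indices[:n_test], indices[n_test:]

X_train, X_test = X[train_idx], X[test_idx]
y_train, y_test = y[train_idx], y[test_idx]

# No sample index appears in both sets.
assert np.intersect1d(train_idx, test_idx).size == 0

# Within the training set, the two groups the classifier must tell apart.
fours = X_train[y_train == 4]
others = X_train[y_train != 4]
```

The same two masks (`y_test == 4` and `y_test != 4`) give the corresponding groups in the test set.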

0
votes

If you only need to recognize 4s, it's a binary classification problem, so you just need to create a new target variable: Y=1 if the class is 4, Y=0 if the class is not 4.

  • Train_X will be unchanged
  • Train_Y will be your new target variable related to Train_X
  • Test_X will be unchanged
  • Test_Y will be your new target variable related to Test_X

The data will be somewhat imbalanced (4s make up roughly 10% of MNIST), but that should not be an issue!
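Something like the following would build the new target, assuming the labels are available as NumPy arrays. The helper name `make_binary_target` and the tiny label arrays are hypothetical, used only for illustration; with real MNIST you would pass in the label arrays you loaded.

```python
import numpy as np

def make_binary_target(labels, digit=4):
    """Return 1 where the label equals `digit`, 0 elsewhere."""
    return (np.asarray(labels) == digit).astype(int)

# Hypothetical original labels standing in for the MNIST label arrays.
train_y_orig = np.array([4, 0, 7, 4, 9, 1])
test_y_orig = np.array([2, 4, 4, 5])

train_y = make_binary_target(train_y_orig)  # [1, 0, 0, 1, 0, 0]
test_y = make_binary_target(test_y_orig)    # [0, 1, 1, 0]

# Fraction of positive examples, a quick check on class balance.
positive_fraction = train_y.mean()
```

Train_X and Test_X are passed to the model as they are; only the label arrays change.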