1
votes

I'm training a few models on the MNIST dataset using sklearn. How do I train a linear model using only two digits, 4 and 9 (two classes), from the MNIST dataset?

  • How do I pick my X_train, X_test, y_train, and y_test?
1
Thanks for answering. What if I had to choose only 4? – Shopping Deals
See my updated answer. – seralouk

1 Answer

2
votes

So you only want to use the images of the digits 4 and 9.

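You need boolean indexing like X[np.logical_or(y == 4, y == 9)]. The example uses sklearn's small built-in digits dataset (load_digits, 8x8 images) rather than the full MNIST, but the same indexing applies to any X, y arrays: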

import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.datasets import load_digits

digits = load_digits()

X = digits.data
y = digits.target

# Select only the digit 4 and 9 images.
# Build the mask once, before y is overwritten, and reuse it for both arrays.
mask = np.logical_or(y == 4, y == 9)
X = X[mask]
y = y[mask]

# verify selection
np.unique(y)
#array([4, 9])

# Now split them
X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=200, test_size=100)
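
The question also asks how to train the linear model on these two classes. A minimal sketch follows, assuming LogisticRegression as the linear model (the question does not name one; any sklearn linear classifier would slot in the same way). The stratify, random_state, and max_iter settings are illustrative choices, not part of the original answer.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

digits = load_digits()
X, y = digits.data, digits.target

# Keep only the digit 4 and 9 images
mask = np.logical_or(y == 4, y == 9)
X, y = X[mask], y[mask]

# stratify=y keeps the 4/9 ratio similar in both splits (assumed choice);
# random_state makes the split reproducible
X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=200, test_size=100, stratify=y, random_state=0)

# Assumption: LogisticRegression stands in for "the linear model".
# max_iter is raised because the pixel features are unscaled.
clf = LogisticRegression(max_iter=5000)
clf.fit(X_train, y_train)

# Mean accuracy on the held-out 4-vs-9 images
accuracy = clf.score(X_test, y_test)
print(f"Test accuracy on 4 vs 9: {accuracy:.3f}")
```

Swapping in another linear classifier (for example SGDClassifier or LinearSVC) only changes the line that constructs clf.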

To use only the digit 4:

X = digits.data
y = digits.target

# Select only the digit 4 images
X = X[y == 4]
y = y[y == 4]
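
Note that a subset containing only the digit 4 has a single class, so a classifier such as LogisticRegression cannot be fitted on it (it needs samples from at least two classes). If "only 4" is meant as "detect 4 against everything else" (an assumption; the comment does not say), one possible sketch is to keep all images and turn the labels into a binary target. The name y_is_four and the split settings are hypothetical choices for illustration.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

digits = load_digits()
X, y = digits.data, digits.target

# Binary target: 1 where the image is a 4, 0 for every other digit
y_is_four = (y == 4).astype(int)

# Stratify so both splits contain a similar share of 4s (assumed choice)
X_train, X_test, y_train, y_test = train_test_split(
    X, y_is_four, test_size=0.25, stratify=y_is_four, random_state=0)

# Assumption: LogisticRegression as the linear model, as a stand-in
clf = LogisticRegression(max_iter=5000)
clf.fit(X_train, y_train)

# Mean accuracy on the held-out images
test_accuracy = clf.score(X_test, y_test)
print(f"Test accuracy on 4 vs rest: {test_accuracy:.3f}")
```

Because 4s are only about a tenth of the images, accuracy alone can look high even for a weak model, so a per-class metric may be more informative here.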