0
votes

This is mostly a question out of curiosity. The sklearn load_digits dataset (http://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_digits.html) exposes two different arrays, images and data, and I wonder whether I can use them interchangeably for training. I ask because I was able to train a NN on images instead of data, and it converged to around 0.5% train error and 8% validation error with an 80-20 split. If they are interchangeable, what is the difference between the two in terms of features?

The documentation doesn't say much about the two, except that you can use the images array for visualization.

1 Answer

2
votes

Consider this:

from sklearn.datasets import load_digits
digits = load_digits()

In terms of features, there is no difference between digits.data and digits.images. Both contain the pixel values of the same 8x8 images. The first is a (1797, 64) numpy.ndarray, while the second is a (1797, 8, 8) numpy.ndarray. The only difference is that digits.images[i] is digits.data[i] reshaped to 8x8, which is more suitable for visualization.
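
If you want to check this yourself, here is a minimal sketch that flattens each image row by row and compares the result with data. The variable names flat_images and same are mine, for illustration only; everything else is standard numpy and sklearn.

```python
import numpy as np
from sklearn.datasets import load_digits

digits = load_digits()

# Same samples, two layouts: one flat row per image vs. one 8x8 grid per image.
print(digits.data.shape)    # (1797, 64)
print(digits.images.shape)  # (1797, 8, 8)

# Flatten each 8x8 image row by row into a 64-element vector.
flat_images = digits.images.reshape(len(digits.images), -1)

# Compare the flattened images with the data array, element by element.
same = np.array_equal(flat_images, digits.data)
print(same)
```

If same comes out True, a model that flattens its input sees exactly the same 64 features either way, so which array you pass only matters for estimators that insist on 2D input of shape (n_samples, n_features).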