1
vote

I'm trying to load the MNIST dataset into arrays. When I use (X_train, y_train), (X_test, y_test) = mnist.load_data(), I get an array y_test with shape (10000,), but I want it to have shape (10000, 1). What is the difference between an array of shape (10000, 1) and one of shape (10000,)? How can I convert one into the other?

1 Answer

3
votes

Your first array, with shape (10000,), is a 1-dimensional np.ndarray. The shape attribute of a NumPy array is a tuple, and a tuple of length 1 needs a trailing comma, so the shape is written (10000,) and not (10000) (which would just be an int). So currently your data looks like this:

import numpy as np
a = np.arange(5)   # >>> array([0, 1, 2, 3, 4])
print(a.shape)     # >>> (5,)
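
To make the difference between the two shapes concrete, here is a small sketch that puts a 1-dimensional array next to a 2-dimensional single-column array holding the same values. The names flat and column are made up for this illustration.

```python
import numpy as np

flat = np.arange(5)                           # 1-D array, shape (5,)
column = np.array([[0], [1], [2], [3], [4]])  # 2-D array with one column, shape (5, 1)

print(flat.ndim, flat.shape)      # 1 (5,)
print(column.ndim, column.shape)  # 2 (5, 1)

# One index reaches a scalar in the 1-D array; the 2-D array needs two.
print(flat[2])       # 2
print(column[2])     # [2]
print(column[2, 0])  # 2
```

Both arrays hold the same five numbers; only the number of axes, and therefore the way you index them, differs.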

What you want is a 2-dimensional array with shape (10000, 1). Adding a dimension of length 1 doesn't require any additional data; it is basically an "empty" dimension. To add a dimension to an existing array you can use either np.expand_dims() or np.reshape().

Using np.expand_dims:

import numpy as np
b = np.arange(5)  # >>> array([0, 1, 2, 3, 4])
b = np.expand_dims(b, axis=1)  # >>> array([[0],[1],[2],[3],[4]])

np.expand_dims exists specifically to add length-1 dimensions to arrays. The axis keyword specifies the position the new dimension occupies in the resulting shape.
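
As a quick sketch of what the axis keyword does, the same 1-dimensional array can be expanded at position 0 or position 1. The variable names v, as_row and as_column are only illustrative.

```python
import numpy as np

v = np.arange(5)  # shape (5,)

as_row = np.expand_dims(v, axis=0)     # new axis goes first: shape (1, 5)
as_column = np.expand_dims(v, axis=1)  # new axis goes last:  shape (5, 1)

print(as_row.shape)     # (1, 5)
print(as_column.shape)  # (5, 1)
```

For the (10000, 1) shape asked about in the question, axis=1 is the one you want.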

Using np.reshape:

import numpy as np
a = np.arange(5)
X_test_reshaped = np.reshape(a, (-1, 1))  # >>> array([[0],[1],[2],[3],[4]])

The shape (-1, 1) specifies what the new shape should look like after the reshape operation. The -1 is a placeholder: NumPy infers that dimension's length from the total number of elements, so here it becomes 5. The shape is passed positionally because the shape= keyword is not accepted by older NumPy releases, while the positional form works across versions. Reshape is more general than expand_dims and can be used in many other ways; see numpy.reshape() in the NumPy docs.
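
Applied to the question, the conversion might look like the sketch below. It does not download MNIST; y_test here is a stand-in array of zeros with the same shape as the labels returned by mnist.load_data(), and the dtype is chosen only for illustration.

```python
import numpy as np

# Stand-in for the y_test from mnist.load_data(): 10000 labels, shape (10000,).
y_test = np.zeros(10000, dtype=np.uint8)

y_col = y_test.reshape(-1, 1)  # -1 is inferred as 10000
print(y_col.shape)             # (10000, 1)

y_flat = y_col.reshape(-1)     # back to a single dimension
print(y_flat.shape)            # (10000,)
```

The second reshape shows the opposite direction, in case you ever need to go from (10000, 1) back to (10000,).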