I'm trying to load the MNIST dataset into arrays. When I use (X_train, y_train), (X_test, y_test) = mnist.load_data(), I get an array y_test with shape (10000,), but I want it to have shape (10000, 1). What is the difference between an array of shape (10000,) and one of shape (10000, 1)? How can I convert the first array to the second?
1 Answer
Your first array, with shape (10000,), is a 1-dimensional np.ndarray. The shape attribute of a NumPy array is a tuple, and a tuple of length 1 needs a trailing comma, so the shape is written (10000,) and not (10000) (which would just be an int). So currently your data looks like this:
import numpy as np
a = np.arange(5) # >>> array([0, 1, 2, 3, 4])
print(a.shape) # >>> (5,)
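The trailing comma is easy to overlook, so here is a minimal sketch of the tuple-versus-int distinction described above. The variable names are illustrative only.

```python
import numpy as np

a = np.arange(5)

shape_tuple = (5,)   # a tuple containing one element
not_a_tuple = (5)    # just the integer 5 wrapped in parentheses

print(type(shape_tuple).__name__)  # tuple
print(type(not_a_tuple).__name__)  # int
print(a.shape == shape_tuple)      # True
print(a.ndim)                      # 1
```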
What you want is a 2-dimensional array with shape (10000, 1).
Adding a dimension of length 1 doesn't require any additional data; it is basically an "empty" dimension. To add a dimension to an existing array, you can use either np.expand_dims() or np.reshape().
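As a rough illustration of why the extra dimension needs no extra data, the sketch below compares a 1-D array with a 2-D column holding the same values. The names flat and column are made up for this example.

```python
import numpy as np

flat = np.arange(5)            # shape (5,)
column = flat.reshape(5, 1)    # shape (5, 1), same five values

print(flat.size, column.size)  # 5 5
print(flat[2])                 # 2
print(column[2])               # [2]
print(column[2, 0])            # 2
```

The element count is identical; only the indexing changes, since the 2-D version needs a row and a column index to reach a single value.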
Using np.expand_dims:
import numpy as np
b = np.arange(5) # >>> array([0, 1, 2, 3, 4])
b = np.expand_dims(b, axis=1) # >>> array([[0],[1],[2],[3],[4]])
This function exists specifically to add length-1 dimensions to arrays. The axis keyword specifies the position the new dimension will occupy in the resulting shape.
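To show what the axis keyword does, here is a small sketch that inserts the new dimension at two different positions. The names as_row and as_column are illustrative.

```python
import numpy as np

b = np.arange(5)                       # shape (5,)

as_row = np.expand_dims(b, axis=0)     # new dimension first
as_column = np.expand_dims(b, axis=1)  # new dimension second

print(as_row.shape)     # (1, 5)
print(as_column.shape)  # (5, 1)
```

For the (10000, 1) shape asked about in the question, axis=1 is the one you want.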
Using np.reshape:
import numpy as np
a = np.arange(5)
# The shape is passed positionally because the keyword name for this argument has differed between NumPy versions.
X_test_reshaped = np.reshape(a, (-1, 1)) # >>> array([[0],[1],[2],[3],[4]])
The (-1, 1) argument specifies what the new shape should look like after the reshape operation. NumPy replaces the -1 with whatever length fits the data, so here it becomes 5.
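To make the -1 placeholder concrete, here is a sketch applied to an array shaped like the y_test from the question. The array of zeros is only a stand-in so the example runs without downloading MNIST, and the shape is passed positionally.

```python
import numpy as np

# Stand-in for the labels returned by mnist.load_data(); zeros are used
# here only so the example runs without the dataset.
y_test = np.zeros(10000, dtype=np.uint8)

y_test_2d = np.reshape(y_test, (-1, 1))  # -1 is inferred as 10000
print(y_test.shape)     # (10000,)
print(y_test_2d.shape)  # (10000, 1)

# -1 can stand in for any single dimension, as long as the sizes divide evenly.
pairs = np.reshape(y_test, (-1, 2))
print(pairs.shape)      # (5000, 2)
```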
np.reshape is a more powerful function than np.expand_dims and can be used in many different ways. You can read more about its other uses in the NumPy docs for numpy.reshape().
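As one example of such another use, the sketch below flattens a batch of 28 x 28 images into rows and then restores them. The batch of zeros is a hypothetical stand-in for MNIST images, not data from the question.

```python
import numpy as np

# Stand-in for a batch of MNIST images (28 x 28 pixels each); zeros are
# used only so the example runs without the dataset.
images = np.zeros((100, 28, 28), dtype=np.uint8)

# Flatten each image into a 784-element row; -1 keeps the batch size.
flat_images = np.reshape(images, (-1, 28 * 28))
print(flat_images.shape)  # (100, 784)

# Reshape back to the original layout.
restored = np.reshape(flat_images, (-1, 28, 28))
print(restored.shape)     # (100, 28, 28)
```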