
Despite having previous coding experience in R and Python, I am brand new to Jupyter Notebook, TensorFlow, and deep learning model-building, so I am looking for someone to help me diagnose this error. I am following a tutorial demonstrating how to classify imagery with deep learning models (https://www.youtube.com/watch?v=wQ8BIBpya2k&list=PLQVvvaa0QuDfhTox0AjmQ6tvTgMBZBEXN). The model loads the MNIST image dataset and classifies the images into ten classes (the digits 0-9). Running three epochs, the overall training accuracy comes out to about 97%.

#Import module
import tensorflow as tf

#Import image dataset
mnist = tf.keras.datasets.mnist

(x_train, y_train), (x_test, y_test) = mnist.load_data()

#Normalize the training and test data
x_train = tf.keras.utils.normalize(x_train, axis = 1)
x_test = tf.keras.utils.normalize(x_train, axis = 1)

#Define model
model = tf.keras.models.Sequential() #Feed-forward model

#Define input layer
model.add(tf.keras.layers.Flatten()) #Input layer

#Two hidden layers
model.add(tf.keras.layers.Dense(128, activation=tf.nn.relu)) #Rectified linear unit (ReLU), a common default choice
model.add(tf.keras.layers.Dense(128, activation=tf.nn.relu))

#Output layer
#Corresponds to the number of classifications; ten in this case
#No relu because it is a probability distribution
model.add(tf.keras.layers.Dense(10, activation=tf.nn.softmax))

#Defining training parameters for the model
#Loss is error, or what you have gotten wrong
model.compile(optimizer = 'adam',
             loss= 'sparse_categorical_crossentropy', #could use binary if looking for cats/dogs
             metrics = ['accuracy'])

#Training the model
#Epochs = number of full passes over the training data
model.fit(x_train, y_train, epochs = 3)

Output is as follows:

Epoch 1/3
1875/1875 [==============================] - 1s 578us/step - loss: 0.2612 - accuracy: 0.9236
Epoch 2/3
1875/1875 [==============================] - 1s 571us/step - loss: 0.1068 - accuracy: 0.9668
Epoch 3/3
1875/1875 [==============================] - 1s 562us/step - loss: 0.0721 - accuracy: 0.9773

When I try to evaluate the model loss on the test set...

#Evaluate the model on the test set and print the loss and accuracy
val_loss, val_acc = model.evaluate(x_test, y_test)
print(val_loss, val_acc)

...I get this error:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-6-3452f7a38776> in <module>
----> 1 val_loss, val_acc = model.evaluate(x_test, y_test)
      2 print(val_loss, val_acc)

~\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\keras\engine\training.py in _method_wrapper(self, *args, **kwargs)
    106   def _method_wrapper(self, *args, **kwargs):
    107     if not self._in_multi_worker_mode():  # pylint: disable=protected-access
--> 108       return method(self, *args, **kwargs)
    109 
    110     # Running inside `run_distribute_coordinator` already.

~\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\keras\engine\training.py in evaluate(self, x, y, batch_size, verbose, sample_weight, steps, callbacks, max_queue_size, workers, use_multiprocessing, return_dict)
   1354             use_multiprocessing=use_multiprocessing,
   1355             model=self,
-> 1356             steps_per_execution=self._steps_per_execution)
   1357 
   1358       # Container that configures and calls `tf.keras.Callback`s.

~\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\keras\engine\data_adapter.py in __init__(self, x, y, sample_weight, batch_size, steps_per_epoch, initial_epoch, epochs, shuffle, class_weight, max_queue_size, workers, use_multiprocessing, model, steps_per_execution)
   1115         use_multiprocessing=use_multiprocessing,
   1116         distribution_strategy=ds_context.get_strategy(),
-> 1117         model=model)
   1118 
   1119     strategy = ds_context.get_strategy()

~\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\keras\engine\data_adapter.py in __init__(self, x, y, sample_weights, sample_weight_modes, batch_size, epochs, steps, shuffle, **kwargs)
    280             label, ", ".join(str(i.shape[0]) for i in nest.flatten(data)))
    281       msg += "Please provide data which shares the same first dimension."
--> 282       raise ValueError(msg)
    283     num_samples = num_samples.pop()
    284 

ValueError: Data cardinality is ambiguous:
  x sizes: 60000
  y sizes: 10000
Please provide data which shares the same first dimension.

What can I do to fix this error and compute the loss?


1 Answer


As the error says, the number of samples in x_test does not match the number of samples in y_test:

  x sizes: 60000
  y sizes: 10000
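
You can confirm the mismatch by printing the shapes of the arrays you pass to evaluate (a quick sanity check that only uses the variables already defined in your code):

print(x_test.shape)   # (60000, 28, 28) -- built from x_train because of the bug below
print(y_test.shape)   # (10000,)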

If you check your code, you will see that there is a bug in the line that creates x_test.

It should be:

x_test = tf.keras.utils.normalize(x_test, axis = 1)

not:

x_test = tf.keras.utils.normalize(x_train, axis = 1)
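
For completeness, here is a minimal sketch of the corrected preprocessing and evaluation, assuming the rest of the model code from the question is unchanged:

#Normalize the training and test data (note: x_test on the right-hand side)
x_train = tf.keras.utils.normalize(x_train, axis = 1)
x_test = tf.keras.utils.normalize(x_test, axis = 1)

#With 10000 samples in both x_test and y_test, evaluate runs without the ValueError
val_loss, val_acc = model.evaluate(x_test, y_test)
print(val_loss, val_acc)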