2
votes

I'm trying to create a batched dataset from a tensor dataset and am having trouble with the shape. I saw some references to enqueue, but they are a couple of years out of date, and tf.data.Dataset.batch doesn't have any options that help. My dataset looks like this:

X_test1 = tensorflow.data.Dataset.from_tensors((X_test_images, X_test_labels))
<TensorDataset shapes: ((5512, 256, 256, 3), (5512,)), types: (tf.float32, tf.int32)>

These are 256 x 256 image arrays with 3 color channels, plus a label vector, for 5512 images/labels.

But when I try to batch it, it creates a new dimension:

new = X_test1.batch(32)
<BatchDataset shapes: ((None, 5512, 256, 256, 3), (None, 5512)), types: (tf.float32, tf.int32)>

What I really want is:

<BatchDataset shapes: ((None, 256, 256, 3), (None,)), types: (tf.float32, tf.int32)>

Here None is the batch dimension (32), possibly with a smaller remainder in the last batch.

Thanks!!

1
I have a similar problem; can you please help me? My dataset looks like this: <BatchDataset shapes: ((None, 256, 256, 3), (None,)), types: (tf.float32, tf.int32)>. What does None mean here? – Samar Pratap Singh

1 Answer

1
votes

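You should initialize the dataset using from_tensor_slices. from_tensors wraps the whole (images, labels) tuple as a single dataset element, so batch() stacks that one element and adds a new leading dimension. from_tensor_slices slices along the first axis instead, giving one element per image/label pair, which batch(32) then groups:

import tensorflow as tf
X_test1 = tf.data.Dataset.from_tensor_slices((X_test_images, X_test_labels))
new = X_test1.batch(32)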
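
To see the difference between the two constructors side by side, here is a minimal sketch assuming TensorFlow 2.x. The arrays are hypothetical stand-ins (10 images of 8 x 8 x 3 rather than 5512 of 256 x 256 x 3), and the names whole and sliced are made up for illustration:

```python
import numpy as np
import tensorflow as tf

# Hypothetical stand-in data: 10 tiny "images" instead of 5512 at 256 x 256.
images = np.zeros((10, 8, 8, 3), dtype=np.float32)
labels = np.arange(10, dtype=np.int32)

# from_tensors: the whole (images, labels) tuple becomes ONE dataset element,
# so batch() adds a new leading dimension in front of the existing one.
whole = tf.data.Dataset.from_tensors((images, labels))
whole_shapes = [tuple(spec.shape.as_list()) for spec in whole.batch(4).element_spec]

# from_tensor_slices: one element per row along the first axis,
# so batch() groups individual (image, label) pairs.
sliced = tf.data.Dataset.from_tensor_slices((images, labels))
sliced_shapes = [tuple(spec.shape.as_list()) for spec in sliced.batch(4).element_spec]

print(whole_shapes)   # [(None, 10, 8, 8, 3), (None, 10)]
print(sliced_shapes)  # [(None, 8, 8, 3), (None,)]
```

The second result has the same layout as the shape asked for in the question, just with the smaller stand-in image size.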

See the documentation for tf.data.Dataset.from_tensor_slices.
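
On the comment's question about None: it marks a dimension whose size is not known statically. With batch(32) the last batch can be smaller than the rest, so the batch dimension is reported as None. A short sketch, again assuming TensorFlow 2.x with eager execution and the same hypothetical 10-image stand-in data, shows the actual batch sizes and what drop_remainder=True changes:

```python
import numpy as np
import tensorflow as tf

# Hypothetical stand-in data: 10 samples, batched in groups of 4.
images = np.zeros((10, 8, 8, 3), dtype=np.float32)
labels = np.arange(10, dtype=np.int32)
ds = tf.data.Dataset.from_tensor_slices((images, labels))

# Default: the last batch keeps the leftover samples.
batch_sizes = [int(x.shape[0]) for x, _ in ds.batch(4)]

# drop_remainder=True discards the incomplete final batch...
full_only = [int(x.shape[0]) for x, _ in ds.batch(4, drop_remainder=True)]

# ...which also makes the batch dimension statically known.
static_shape = ds.batch(4, drop_remainder=True).element_spec[0].shape.as_list()

print(batch_sizes)   # [4, 4, 2]
print(full_only)     # [4, 4]
print(static_shape)  # [4, 8, 8, 3]
```

Dropping the remainder throws away data (2 of the 10 samples here), so it is mainly useful when a fixed batch shape is required.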