I want to build a data pipeline using TensorFlow Datasets, but the images in the dataset have different shapes, so batching fails.

import tensorflow_datasets as tfds
import tensorflow as tf

dataset_builder = tfds.builder("oxford_flowers102")
dataset_builder.download_and_prepare()

train_data = dataset_builder.as_dataset(split=tfds.Split.TRAIN)
train_data = train_data.repeat().batch(32)
train_data = train_data.prefetch(tf.data.experimental.AUTOTUNE)
train_iterator = train_data.make_one_shot_iterator()
train_next_element = train_iterator.get_next()

with tf.Session() as sess:
    train_batch = sess.run(train_next_element)

The code above gives me this error:

"tensorflow.python.framework.errors_impl.InvalidArgumentError: Cannot batch tensors with different shapes in component 1. First element had shape [500,666,3] and element 1 had shape [752,500,3]."

I want all images to have the shape [224, 224, 3]. How can I resize the images in an existing TensorFlow dataset?

1 Answer

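You can resize the images on the fly with map. Each element of this dataset is a dictionary (with "image" and "label" keys, among others), so the mapped function has to index into it. For example, keeping only the image and label:

train_data = train_data.map(lambda x: {"image": tf.image.resize_image_with_crop_or_pad(x["image"], 224, 224), "label": x["label"]})

Do this right before train_data = train_data.repeat().batch(32). Note that tf.image.resize_image_with_crop_or_pad centrally crops or zero-pads each image to the target size rather than rescaling it; if you want the whole image scaled to 224x224, use tf.image.resize_images instead. More generally, the tf.data.Dataset.map(...) method lets you apply a variety of transformations to your images before batching them.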
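
As a rough sketch of the map-before-batch order, the example below uses TensorFlow 2.x eager execution (an assumption; the question uses the 1.x session API) and a small synthetic dataset as a hypothetical stand-in for oxford_flowers102, so nothing is downloaded. It assumes elements are dictionaries with "image" and "label" keys, and it uses tf.image.resize, which rescales the image instead of cropping or padding. The helper names make_variable_size_dataset and resize_example are illustrative only.

```python
import numpy as np
import tensorflow as tf


def make_variable_size_dataset():
    """Hypothetical stand-in for the tfds dataset: dict elements whose images differ in size."""
    shapes = [(500, 666, 3), (752, 500, 3), (300, 200, 3), (224, 224, 3)]

    def gen():
        for label, shape in enumerate(shapes):
            yield {"image": np.zeros(shape, dtype=np.uint8), "label": np.int64(label)}

    return tf.data.Dataset.from_generator(
        gen,
        output_signature={
            "image": tf.TensorSpec(shape=(None, None, 3), dtype=tf.uint8),
            "label": tf.TensorSpec(shape=(), dtype=tf.int64),
        },
    )


def resize_example(example, size=(224, 224)):
    """Rescale one image to a fixed size so that elements can be batched."""
    image = tf.image.resize(example["image"], size)  # bilinear by default, returns float32
    return {"image": image, "label": example["label"]}


# Resize first, then batch: every image now has the same static shape.
dataset = make_variable_size_dataset().map(resize_example).batch(2)
first_batch = next(iter(dataset))
print(first_batch["image"].shape)
```

With the real dataset, the same resize_example function could be passed to train_data.map(...) before repeat().batch(32). Calling batch before map would reproduce the original error, because batching needs equal shapes.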