I'm using TensorFlow 2.1 to build a data pipeline. I wrote a function to do the data preprocessing:

def preprocessing(path):
    path = str(path.numpy(), 'utf-8')
    label = Path(path).parent.name
    image = tf.io.read_file(path)
    image = tf.image.decode_image(image)
    image = tf.image.convert_image_dtype(image, dtype=tf.float32)
    image = tf.image.central_crop(image, central_fraction=0.5)
    image = tf.image.resize(image, size=[224, 224])
    image = tf.image.random_flip_left_right(image)
    image = tf.image.random_brightness(image, max_delta=0.2)
    return image, label

When I verify the preprocessing function with the following code, it works:

ds = tf.data.Dataset.list_files('../datasets/hymenoptera_data/train/ants/*.jpg')
path = next(iter(ds))
image, label = preprocessing(path)
plt.imshow(image)
plt.show()

The result of print(path) is tf.Tensor(b'..\datasets\hymenoptera_data\train\ants\886401651_f878e888cd.jpg', shape=(), dtype=string). But if I use map() to process the dataset ds, an error is raised:

ds_new = ds.map(preprocessing, num_parallel_calls=tf.data.experimental.AUTOTUNE)
for i in ds_new.take(1):
    plt.imshow(i)
    plt.show()

AttributeError: 'Tensor' object has no attribute 'numpy'. The error is raised by the line path = str(path.numpy(), 'utf-8') in the preprocessing function.

I don't understand why this happens. Can anyone help with this issue? I'd really appreciate it!

1 Answer

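Dataset.map() traces your function in graph mode, so inside map() the path argument is a symbolic tensor and has no .numpy() method. That is why the function works when you call it directly on an eager tensor but fails inside map(). Use only TensorFlow ops in the function instead. I also replaced decode_image with decode_jpeg(channels=3), because decode_image does not set a static shape and tf.image.resize can then fail inside map(). Try this function for preprocessing: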

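import os
import tensorflow as tf

def preprocessing(path):
    # The label is the parent directory name, extracted with a TF string op
    # so that it also works on symbolic tensors inside Dataset.map().
    label = tf.strings.split(path, os.path.sep)[-2]
    image = tf.io.read_file(path)
    # decode_jpeg with channels=3 yields a rank-3 tensor that resize accepts.
    image = tf.image.decode_jpeg(image, channels=3)
    image = tf.image.convert_image_dtype(image, dtype=tf.float32)
    image = tf.image.central_crop(image, central_fraction=0.5)
    image = tf.image.resize(image, size=[224, 224])
    # Random augmentations
    image = tf.image.random_flip_left_right(image)
    image = tf.image.random_brightness(image, max_delta=0.2)
    return image, label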
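
One caveat about the label line: splitting on os.path.sep assumes the paths use the separator of the OS the code runs on. If that is not guaranteed, a possible variation is to normalize the separators with TensorFlow string ops first. This is only a sketch: the helper name label_from_path and the example paths are made up for illustration, and no files are read.

```python
import tensorflow as tf


def label_from_path(path):
    # Replace every backslash with a forward slash, then split on "/".
    # The regex r"\\" matches a single literal backslash.
    normalized = tf.strings.regex_replace(path, r"\\", "/")
    parts = tf.strings.split(normalized, "/")
    # The second-to-last component is the parent directory name.
    return parts[-2]


# Hypothetical paths mixing both separator styles.
paths = tf.data.Dataset.from_tensor_slices(
    ["train/ants/a.jpg", "train\\bees\\b.jpg"]
)
labels = [label.numpy().decode("utf-8") for label in paths.map(label_from_path)]
print(labels)
```

Both ops are graph-compatible, so the helper can be called from inside map() just like the function above.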

The preprocessing function above works both when called directly on a single path and inside Dataset.map():

import tensorflow as tf
import os
import matplotlib.pyplot as plt

paths = tf.data.Dataset.list_files('images/*.jpg')
path = next(iter(paths))
image, label = preprocessing(path)
plt.imshow(image)
plt.show()

filenames = tf.data.Dataset.list_files('images/*.jpg')
ds = filenames.map(preprocessing)
for image, label in ds.take(1):
    plt.imshow(image)
    plt.show()
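
If you really want to keep plain Python logic such as path.numpy() and pathlib, another option is to wrap that part in tf.py_function, which runs the wrapped code eagerly even inside map(). A minimal sketch, assuming you only need the label from the path; the helper name label_from_path_eager, the function name with_label and the example paths are invented for illustration, and no image files are read:

```python
from pathlib import Path

import tensorflow as tf


def label_from_path_eager(path):
    # Runs eagerly inside tf.py_function, so .numpy() is available here.
    path_str = path.numpy().decode("utf-8")
    return Path(path_str).parent.name


def with_label(path):
    # Tout declares the dtype of the value returned by the Python function.
    label = tf.py_function(label_from_path_eager, inp=[path], Tout=tf.string)
    # py_function outputs carry no static shape, so set the scalar shape by hand.
    label.set_shape([])
    return path, label


# Hypothetical paths, used only to show the label extraction.
paths = tf.data.Dataset.from_tensor_slices(
    ["train/ants/a.jpg", "train/bees/b.jpg"]
)
labels = [label.numpy().decode("utf-8") for _, label in paths.map(with_label)]
print(labels)
```

The trade-off is that code inside tf.py_function runs in the Python interpreter, so it is usually slower than pure TensorFlow ops and does not benefit as much from num_parallel_calls. For a simple case like this one I would stick with tf.strings.split.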