TensorFlow Dataset Generator With Mixed Datatypes

Question

I'm using the TensorFlow Datasets API (https://www.tensorflow.org/guide/datasets), in particular together with the TensorFlow Estimators API (https://www.tensorflow.org/guide/datasets_for_estimators), which recommends using a generator function.

I'm having trouble writing a generator function that yields features with different output types (e.g., a mix of int, float, and string). I've figured out how to give the features and the label different types in the from_generator call, but only when all the feature types are identical.

However, suppose you have a variety of feature types to emit. In the typical imports85 TensorFlow demonstration, for example, you would emit car make and model as strings (which get categorized downstream), Highway-MPG as float32, and number-of-doors as int. How does one specify the various feature types in the Dataset from_generator call?

dataset = tf.data.Dataset.from_generator(generator=self._generator, output_types=(tf.float32, tf.int32), output_shapes=(tf.TensorShape([None]), tf.TensorShape([1])))

I've already tried the obvious approach of using output_types=((tf.float32, tf.float32, tf.string, tf.string), tf.int32) without luck. Any help would be appreciated.

kvish · Accepted Answer · 2018-09-27T17:46:18

From the official documentation:

It is not possible to have a tf.Tensor with more than one data type. It is possible, however, to serialize arbitrary data structures as strings and store those in tf.Tensors.

So you might need to store the features as strings and then decode them downstream into their real types, for example with functions like decode_raw.
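As a rough illustration of the string approach, here is a minimal sketch. It assumes TensorFlow 2.x with eager execution (for the final iteration step), made-up rows in the spirit of imports85, and hypothetical feature names. Note that it serializes the numbers as text and decodes them with tf.strings.to_number rather than decode_raw, simply because text is easier to inspect; decode_raw would be the counterpart if the values were packed as raw bytes.

```python
import tensorflow as tf

# Hypothetical rows: (make, highway_mpg, num_doors, label). Not real imports85 data.
ROWS = [
    ("audi", 30.0, 4, 1),
    ("bmw", 28.5, 2, 0),
]


def generator():
    for make, highway_mpg, num_doors, label in ROWS:
        # Serialize every feature as text so all of them fit in one tf.string tensor.
        yield [make, str(highway_mpg), str(num_doors)], label


def decode(features, label):
    # Turn the string tensor back into individually typed features.
    decoded = {
        "make": features[0],  # stays a string
        "highway_mpg": tf.strings.to_number(features[1], out_type=tf.float32),
        "num_doors": tf.strings.to_number(features[2], out_type=tf.int32),
    }
    return decoded, label


dataset = tf.data.Dataset.from_generator(
    generator,
    output_types=(tf.string, tf.int32),
    output_shapes=(tf.TensorShape([3]), tf.TensorShape([])),
).map(decode)

# Materialize the elements as NumPy values (TF 2.x only).
decoded_rows = list(dataset.as_numpy_iterator())
```

Each element of decoded_rows is a (features, label) pair, where features is a dict holding one value per feature with its own dtype. Under TensorFlow 1.x the dataset construction should be the same, but you would read from it through an iterator inside a session instead of as_numpy_iterator.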
