24 votes

I want to create TensorFlow records to feed my model; so far I use the following code to store a uint8 numpy array in TFRecord format:

def _int64_feature(value):
  return tf.train.Feature(int64_list=tf.train.Int64List(value=[value]))


def _bytes_feature(value):
  return tf.train.Feature(bytes_list=tf.train.BytesList(value=[value]))


def _floats_feature(value):
  return tf.train.Feature(float_list=tf.train.FloatList(value=[value]))


def convert_to_record(name, image, label, map):
    filename = os.path.join(params.TRAINING_RECORDS_DATA_DIR, name + '.' + params.DATA_EXT)

    writer = tf.python_io.TFRecordWriter(filename)

    image_raw = image.tostring()
    map_raw   = map.tostring()
    label_raw = label.tostring()

    example = tf.train.Example(features=tf.train.Features(feature={
        'image_raw': _bytes_feature(image_raw),
        'map_raw': _bytes_feature(map_raw),
        'label_raw': _bytes_feature(label_raw)
    }))        
    writer.write(example.SerializeToString())
    writer.close()

which I read back with this example code:

features = tf.parse_single_example(example, features={
  'image_raw': tf.FixedLenFeature([], tf.string),
  'map_raw': tf.FixedLenFeature([], tf.string),
  'label_raw': tf.FixedLenFeature([], tf.string),
})

image = tf.decode_raw(features['image_raw'], tf.uint8)
image.set_shape(params.IMAGE_HEIGHT*params.IMAGE_WIDTH*3)
image = tf.reshape(image, (params.IMAGE_HEIGHT,params.IMAGE_WIDTH,3))

map = tf.decode_raw(features['map_raw'], tf.uint8)
map.set_shape(params.MAP_HEIGHT*params.MAP_WIDTH*params.MAP_DEPTH)
map = tf.reshape(map, (params.MAP_HEIGHT,params.MAP_WIDTH,params.MAP_DEPTH))

label = tf.decode_raw(features['label_raw'], tf.uint8)
label.set_shape(params.NUM_CLASSES)

and that's working fine. Now I want to do the same with my array "map" being a float numpy array instead of uint8, and I could not find examples of how to do it. I tried the function _floats_feature, which works if I pass a scalar to it, but not an array; with uint8 the serialization can be done with the method tostring().

How can I serialize a float numpy array and how can I read that back?


5 Answers

16 votes

FloatList and BytesList expect an iterable, so you need to pass them a list of floats. Remove the extra brackets in your _floats_feature, i.e.

def _floats_feature(value):
  return tf.train.Feature(float_list=tf.train.FloatList(value=value))

numpy_arr = np.ones((3,)).astype(np.float)
example = tf.train.Example(features=tf.train.Features(feature={"bytes": _floats_feature(numpy_arr)}))
print(example)

features {
  feature {
    key: "bytes"
    value {
      float_list {
        value: 1.0
        value: 1.0
        value: 1.0
      }
    }
  }
}
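To read a feature written this way back out (the second half of the question), a minimal sketch using the same tf.parse_single_example / tf.FixedLenFeature API as in the question; serialized_example stands in for the serialized Example string, and the fixed length (3 here) must match the number of floats that were written:

features = tf.parse_single_example(serialized_example, features={
    'bytes': tf.FixedLenFeature([3], tf.float32),
})
float_arr = features['bytes']  # float32 tensor of shape (3,); no tf.decode_raw needed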

5 votes

I will expand on Yaroslav's answer.

Int64List, BytesList and FloatList expect an iterable of the underlying elements (they are repeated fields). In your case you can use a list as that iterable.

You mentioned that it works if you pass a scalar, but not with arrays. That is expected: when you pass a scalar, your _floats_feature wraps it in a list with one float element (exactly what FloatList expects). But when you pass an array, you create a list containing an array and hand that to a function which expects a list of floats.

So just remove the extra list construction from your function: float_list=tf.train.FloatList(value=value)
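Put together, a version of the helper that accepts either a bare scalar or a 1-D sequence of floats could look like this (a sketch, not part of the original answer):

def _floats_feature(value):
    # Wrap a bare scalar in a list; lists and 1-D numpy arrays pass through unchanged.
    if not hasattr(value, '__iter__'):
        value = [value]
    return tf.train.Feature(float_list=tf.train.FloatList(value=value))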

3 votes

I've stumbled across this while working on a similar problem. Since part of the original question was how to read back the float32 feature from tfrecords, I'll leave this here in case it helps anyone:

If map.ravel() was used to feed a map of dimensions [x, y, z] into _floats_feature, it can be read back like this:

features = {
    ...
    'map': tf.FixedLenFeature([x, y, z], dtype=tf.float32)
    ...
}
parsed_example = tf.parse_single_example(serialized=serialized, features=features)
map = parsed_example['map']
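For reference, the matching write side flattens the array before storing it; a sketch that assumes the corrected _floats_feature helper from the accepted answer and the writer and map variables from the question's convert_to_record:

example = tf.train.Example(features=tf.train.Features(feature={
    'map': _floats_feature(map.ravel()),  # flatten [x, y, z] to 1-D before writing
}))
writer.write(example.SerializeToString())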

1 vote

Yaroslav's example failed when an n-d array was the input:

numpy_arr = np.ones((3,3)).astype(np.float)

I found that it worked when I used numpy_arr.ravel() as the input. But is there a better way to do it?
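A minimal sketch of that workaround, reusing the _floats_feature helper from the accepted answer:

numpy_arr = np.ones((3, 3)).astype(np.float)
example = tf.train.Example(features=tf.train.Features(
    feature={"bytes": _floats_feature(numpy_arr.ravel())}))  # ravel() flattens to 1-D
print(example)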

0 votes

First of all, many thanks to Yaroslav and Salvador for their enlightening answers.

In my experience, their method only works when the input is a 1-D NumPy array of size (n,). When the input is a NumPy array with more than one dimension, the following error appears:

def _float_feature(value):
    return tf.train.Feature(float_list=tf.train.FloatList(value=value))

numpy_arr = np.arange(12).reshape(2, 2, 3).astype(np.float)
example = tf.train.Example(features=tf.train.Features(feature={"bytes": 
_float_feature(numpy_arr)}))
print(example)


TypeError: array([[0., 1., 2.],
   [3., 4., 5.]]) has type numpy.ndarray, but expected one of: int, long, float

So I'd like to expand on Tsuan's answer: flatten the input before feeding it into the TF example. The modified code is as follows:

def _floats_feature(value):
    return tf.train.Feature(float_list=tf.train.FloatList(value=value))

numpy_arr = np.arange(12).reshape(2, 2, 3).astype(np.float).flatten()
example = tf.train.Example(features=tf.train.Features(feature={"bytes": 
_float_feature(numpy_arr)}))
print(example)

In addition, numpy_arr.flatten() may be preferable to np.ravel() here, since flatten() always returns a copy while ravel() returns a view when possible.
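To read this back with its original shape, the FixedLenFeature approach from the earlier answer applies; a sketch assuming the (2, 2, 3) array above and a serialized Example string named serialized:

features = tf.parse_single_example(serialized, features={
    'bytes': tf.FixedLenFeature([2, 2, 3], tf.float32),
})
restored = features['bytes']  # float32 tensor of shape (2, 2, 3)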