2
votes

I would like to use the TensorFlow Object Detection API for multi-channel images (e.g., 4 channels: RGB + infrared). There is a tutorial on how to modify the API to add additional channels. However, the tutorial was written a year ago, the API has evolved since then, and it seems that the API may accept multi-channel images natively now.

For example, in tensorflow-models/research/object-detection/data_decoders/tf_example_decoder.py, in addition to fields.InputDataFields.image there is now fields.InputDataFields.image_additional_channels. Can it be used for any additional channels one has in an input image beyond the standard 3 channels fed into fields.InputDataFields.image? I cannot figure out the purpose of image_additional_channels or how to use it.

More generally, my question is how to use the TensorFlow Object Detection API for multi-channel (>3) images. Are they accepted, i.e. taken into account, by default? I can feed them in to train a model, but at inference time the object_detection_tutorial notebook cannot accept more than 3 channels, which makes me wonder whether the 4th channel is simply ignored during training.

I am using TensorFlow 1.12.0 and the latest commit (7a75bfc) of the Object Detection API. image_additional_channels was added in commit 9fce9c6 on 6 June 2018.


2 Answers

1
votes

I'm trying to do the same thing. The API seems to accept additional channels during training (you need to add them when creating your TFExample file(s)). You also need to set num_additional_channels in the train_input_reader portion of the pipeline config file to the number of channels you've added.
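For reference, the relevant part of the pipeline config might look like this. This is a sketch: num_additional_channels is the real field in the input reader proto, but the paths are placeholders for your own files:

train_input_reader: {
  tf_record_input_reader {
    input_path: "train.record"
  }
  label_map_path: "label_map.pbtxt"
  num_additional_channels: 1  # e.g. one extra depth or infrared channel
}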

However, the script for exporting the model for inference does not seem to support exporting the model in a way that allows it to accept additional channels.

As you can see here: https://github.com/tensorflow/models/blob/master/research/object_detection/exporter.py#L129

The input tensor is only a standard image tensor and the tensor_dict[fields.InputDataFields.image_additional_channels] is not included in the input.

I'm about to fix this for my project, so I'll try to open a pull request and get them to merge it in.

0
votes

For the TFRecord creation, you must edit the usual create_tf_example helper, shown here:

def create_tf_example(group, path):
    with tf.gfile.GFile(os.path.join(path, '{}'.format(group.filename)), 'rb') as fid:
        encoded_jpg = fid.read()
    encoded_jpg_io = io.BytesIO(encoded_jpg)
    image = Image.open(encoded_jpg_io)
    width, height = image.size

This portion of the code opens and reads an image from a file path; if you have multiple images per example, you have to load each one into a different variable. For example, I use something like this:

# dictionary[0]: path to the RGB image; dictionary[1]: path to the depth image
with tf.gfile.GFile(dictionary[0], 'rb') as fid:
    encoded_jpg = fid.read()
if ndata > 1:  # ndata: number of images per example
    with tf.gfile.GFile(dictionary[1], 'rb') as fid:
        encoded_depth = fid.read()
    encoded_inputs = encoded_depth

where dictionary[0] contains the path of the RGB image and dictionary[1] contains the path of the depth image.
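For example, the list might be built from matching filename pairs. This is a stdlib-only sketch; paired_paths and the "_depth" naming scheme are hypothetical, not part of the API:

```python
import os

def paired_paths(stem, root="."):
    """Return [rgb_path, depth_path] for a hypothetical naming scheme
    where the depth image shares the RGB image's stem plus a suffix."""
    return [os.path.join(root, stem + ".jpg"),
            os.path.join(root, stem + "_depth.png")]

dictionary = paired_paths("img_0001")
ndata = len(dictionary)  # number of images per example
print(dictionary)  # ['./img_0001.jpg', './img_0001_depth.png']
```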

Then, the TFRecord must be created like this:

tf_example = tf.train.Example(features=tf.train.Features(feature={
    'image/height': dataset_util.int64_feature(height),
    'image/width': dataset_util.int64_feature(width),
    'image/filename': dataset_util.bytes_feature(filename),
    'image/source_id': dataset_util.bytes_feature(filename),
    'image/encoded': dataset_util.bytes_feature(encoded_jpg),
    'image/additional_channels/encoded': dataset_util.bytes_feature(encoded_inputs),
    'image/format': dataset_util.bytes_feature(image_format),
    'image/object/bbox/xmin': dataset_util.float_list_feature(xmins),
    'image/object/bbox/xmax': dataset_util.float_list_feature(xmaxs),
    'image/object/bbox/ymin': dataset_util.float_list_feature(ymins),
    'image/object/bbox/ymax': dataset_util.float_list_feature(ymaxs),
    'image/object/class/text': dataset_util.bytes_list_feature(classes_text),
    'image/object/class/label': dataset_util.int64_list_feature(classes),
}))
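As I understand it, the decoder then concatenates the additional channels onto the decoded image along the channel axis, so a 3-channel RGB image plus a 1-channel depth map becomes a 4-channel input. A stdlib-only sketch of that concatenation idea, with hypothetical pixel values and no TensorFlow required:

```python
# Hypothetical 2x2 RGB image and matching 1-channel depth map,
# stored as nested lists of per-pixel tuples.
rgb = [
    [(10, 20, 30), (40, 50, 60)],
    [(70, 80, 90), (100, 110, 120)],
]
depth = [
    [(5,), (6,)],
    [(7,), (8,)],
]

def concat_channels(image, extra):
    """Append extra channels to each pixel (like concatenating
    image and image_additional_channels on the channel axis)."""
    return [
        [px + ex for px, ex in zip(img_row, ex_row)]
        for img_row, ex_row in zip(image, extra)
    ]

rgbd = concat_channels(rgb, depth)
print(rgbd[0][0])  # (10, 20, 30, 5) -> 4 channels per pixel
```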


I would like to know that too! I managed to train without problems, but I can't use the trained model since the additional channel cannot be loaded... Did you fix that?