Is there any way to use the pre-trained models in the TensorFlow Object Detection API, which were trained on RGB images, for single-channel grayscale (depth) images?

1 Answer

I tried the following approach to run object detection on grayscale (1-channel) images using a pre-trained model (faster_rcnn_resnet101_coco_11_06_2017) in TensorFlow, and it worked for me.

The model was trained on RGB images, so I only had to modify some code in object_detection_tutorial.ipynb, which is available in the TensorFlow repo.

First change: the existing code in the notebook was written for 3-channel images, so change the load_image_into_numpy_array function as shown.

From

def load_image_into_numpy_array(image):
  (im_width, im_height) = image.size
  return np.array(image.getdata()).reshape(
      (im_height, im_width, 3)).astype(np.uint8)

To

def load_image_into_numpy_array(image):
  (im_width, im_height) = image.size
  channel_dict = {'L': 1, 'RGB': 3}  # 'L' is grayscale (1 channel), 'RGB' is 3 channels
  return np.array(image.getdata()).reshape(
      (im_height, im_width, channel_dict[image.mode])).astype(np.uint8)
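
As a quick sanity check of the modified loader, the sketch below runs it on two synthetic Pillow images. The image sizes and pixel values are made up for illustration and are not from the original notebook.

```python
import numpy as np
from PIL import Image

def load_image_into_numpy_array(image):
    (im_width, im_height) = image.size
    # 'L' is grayscale (1 channel), 'RGB' is 3 channels.
    channel_dict = {'L': 1, 'RGB': 3}
    return np.array(image.getdata()).reshape(
        (im_height, im_width, channel_dict[image.mode])).astype(np.uint8)

# Hypothetical test images: 4 pixels wide, 3 pixels high.
gray = Image.new('L', (4, 3), color=7)
rgb = Image.new('RGB', (4, 3), color=(1, 2, 3))

gray_np = load_image_into_numpy_array(gray)
rgb_np = load_image_into_numpy_array(rgb)
print(gray_np.shape)  # (3, 4, 1)
print(rgb_np.shape)   # (3, 4, 3)
```

Note that any image mode other than 'L' or 'RGB' (for example 'RGBA') raises a KeyError with this lookup, so such images would need to be converted first.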

Second change: grayscale images have data in only one channel. To perform object detection we need 3 channels, because the inference code was written for 3 channels.

This can be achieved in two ways: a) duplicate the single-channel data into two more channels, or b) fill the other two channels with zeros. Both will work; I used the first method.
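
The two options can be compared on a tiny array. This is only a minimal sketch: the 2x2 array and its values are hypothetical, and np.repeat is used here as one possible way to duplicate the channel.

```python
import numpy as np

# Hypothetical 2x2 single-channel image, shaped (height, width, 1).
gray = np.array([[[10], [20]], [[30], [40]]], dtype=np.uint8)

# (a) Duplicate the single channel into all three channels.
duplicated = np.repeat(gray, 3, axis=-1)

# (b) Keep the data in the first channel and zero-fill the other two.
zeros = np.zeros(gray.shape[:-1] + (2,), dtype=gray.dtype)
zero_filled = np.concatenate((gray, zeros), axis=-1)

print(duplicated.shape, zero_filled.shape)  # (2, 2, 3) (2, 2, 3)
print(duplicated[0, 0], zero_filled[0, 0])  # [10 10 10] [10  0  0]
```

With option (b) only the first channel carries data, so the image appears red when it is displayed as RGB.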

In the notebook, go to the section where the images are read and converted into numpy arrays (the for loop at the end of the notebook).

Change the code from:

for image_path in TEST_IMAGE_PATHS:
  image = Image.open(image_path)
  # the array based representation of the image will be used later in order to prepare the
  # result image with boxes and labels on it.
  image_np = load_image_into_numpy_array(image)
  # Expand dimensions since the model expects images to have shape: [1, None, None, 3]
  image_np_expanded = np.expand_dims(image_np, axis=0)

To this:

for image_path in TEST_IMAGE_PATHS:
  image = Image.open(image_path)
  # the array based representation of the image will be used later in order to prepare the
  # result image with boxes and labels on it.
  image_np = load_image_into_numpy_array(image)
  if image_np.shape[2] != 3:
    # Duplicate the single channel into all three channels.
    image_np = np.broadcast_to(image_np, (image_np.shape[0], image_np.shape[1], 3)).copy()
    # Alternative (not recommended): fill the other two channels with zeros.
    # This tints the visualized result red, because only the first channel holds data.
    # z = np.zeros(image_np.shape[:-1] + (2,), dtype=image_np.dtype)
    # image_np = np.concatenate((image_np, z), axis=-1)
  # Expand dimensions since the model expects images to have shape: [1, None, None, 3]
  image_np_expanded = np.expand_dims(image_np, axis=0)
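
As an alternative to patching the loader, Pillow can do the channel duplication itself: calling convert('RGB') on an 'L' image copies the gray value into all three channels. The sketch below is not the approach I tested above, and load_as_rgb_batch is a hypothetical helper name. It assumes 8-bit images; 16-bit depth maps would probably need rescaling to 8 bits first, which is not covered here.

```python
import numpy as np
from PIL import Image

def load_as_rgb_batch(image):
    """Return a uint8 array of shape (1, height, width, 3) for a PIL image."""
    # For an 'L' image, convert('RGB') copies the gray value into R, G and B.
    rgb = image.convert('RGB')
    image_np = np.asarray(rgb, dtype=np.uint8)
    # Add the batch dimension the model expects: [1, None, None, 3].
    return np.expand_dims(image_np, axis=0)

# Hypothetical grayscale test image: 4 pixels wide, 3 pixels high, value 128.
gray = Image.new('L', (4, 3), color=128)
batch = load_as_rgb_batch(gray)
print(batch.shape)  # (1, 3, 4, 3)
```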

That's it. Run the notebook and you should see the results. These are my results:

[Images: detection on a grayscale image; detection on an RGB image]