Okay, I have a similar problem (although I'm not working with medical images) and found a solution, so I hope others will find it useful too.
- I assume that you need a custom function to retrieve the images in batches: they won't all fit into memory at once, the *.hdr file format is not supported, and the existing Keras helper functions don't cover regression targets. (I guess you're doing some type of segmentation, since you're using the U-Net?)
- I also assume that you need the ImageDataGenerator (IDG), because you don't want to implement the data augmentation yourself.
So, because of the first point you'll need to use the fit_generator function in conjunction with an IDG; the only problem is that the IDG does not support custom generators.
There are of course cases where you can use the IDG together with fit_generator: the IDG's flow function returns an Iterator of type NumpyArrayIterator. You can't use that one here, though, because it requires all the data to fit into working memory.
The way IDG.flow works is that you first create an instance of the IDG object and then call the flow function, which creates and returns a NumpyArrayIterator that holds a reference to your IDG object.
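To make that relationship concrete, here is a toy sketch of the pattern (illustrative names only, not the real Keras classes): the flow-style method is a factory that builds an iterator, and the iterator keeps a reference back to the generator that created it.

```python
# Illustrative sketch only (not the real Keras classes): a flow-style
# method is a factory that returns an iterator holding a reference back
# to the generator object that created it.

class ToyGenerator:
    def flow(self, data, batch_size):
        return ToyIterator(self, data, batch_size)

class ToyIterator:
    def __init__(self, generator, data, batch_size):
        self.generator = generator  # reference back to the "IDG"
        self.data = data
        self.batch_size = batch_size

    def __iter__(self):
        # yield the data in consecutive batch-sized slices
        for start in range(0, len(self.data), self.batch_size):
            yield self.data[start:start + self.batch_size]

gen = ToyGenerator()
iterator = gen.flow(list(range(7)), batch_size=3)
batches = list(iterator)  # [[0, 1, 2], [3, 4, 5], [6]]
```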
One solution is to write your own custom DataGenerator which inherits from the keras.preprocessing.image.Iterator class and implements _get_batches_of_transformed_samples (see here). Then you extend the IDG class with a flow_from_generator function which returns an instance of your custom DataGenerator. This sounds more taxing than it really is, but be sure to familiarize yourself with the IDG, NumpyArrayIterator and Iterator code first.
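In particular, the shuffling/batching bookkeeping that the Iterator base class does for you boils down to roughly this (a simplified NumPy sketch, not the actual Keras implementation):

```python
import numpy as np

def index_batches(n, batch_size, shuffle=False, seed=None):
    # build a (possibly shuffled) index array and slice it into batches;
    # each slice is what _get_batches_of_transformed_samples receives
    # as its index_array argument
    rng = np.random.RandomState(seed)
    indices = np.arange(n)
    if shuffle:
        rng.shuffle(indices)
    for start in range(0, n, batch_size):
        yield indices[start:start + batch_size]

batches = [b.tolist() for b in index_batches(n=7, batch_size=3)]
# batches == [[0, 1, 2], [3, 4, 5], [6]]
```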
Here is what this looks like:
import numpy as np
import keras

class DataGenerator(keras.preprocessing.image.Iterator):

    def __init__(self, image_data_generator, n, img_shape,
                 output_channel_num, batch_size=32, shuffle=False, seed=None):
        # init whatever you need here (file paths, image shape, ...)
        self.image_data_generator = image_data_generator
        self.img_shape = img_shape  # (x_img_size, y_img_size, input_channel_num)
        self.output_channel_num = output_channel_num
        # call the Iterator constructor:
        super(DataGenerator, self).__init__(n, batch_size, shuffle, seed)

    def _get_batches_of_transformed_samples(self, index_array):
        '''Here you retrieve the images and apply the image augmentation,
        then return the augmented image batch.
        index_array is just a list of indices that takes care of the
        shuffling for you (see the super class), so this function will be
        called with e.g. index_array=[1, 6, 8] if your batch size is 3.
        '''
        x_transformed = np.zeros((len(index_array),) + self.img_shape,
                                 dtype=np.float32)
        y_transformed = np.zeros((len(index_array),) + self.img_shape[:-1]
                                 + (self.output_channel_num,), dtype=np.float32)
        for i, j in enumerate(index_array):
            x = get_input_image_from_index(j)   # your own loading code
            y = get_output_image_from_index(j)  # your own loading code
            # draw the random parameters once and apply the same
            # transform to input and target, so they stay aligned:
            params = self.image_data_generator.get_random_transform(self.img_shape)
            x = self.image_data_generator.apply_transform(x, params)
            x = self.image_data_generator.standardize(x)
            x_transformed[i] = x
            y = self.image_data_generator.apply_transform(y, params)
            y = self.image_data_generator.standardize(y)
            y_transformed[i] = y
        return x_transformed, y_transformed


class ImageDataGeneratorExtended(keras.preprocessing.image.ImageDataGenerator):

    def flow_from_generator(self, *args, **kwargs):
        return DataGenerator(self, *args, **kwargs)
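One detail worth emphasizing: get_random_transform is called once per sample and the resulting params are then applied to both x and y, which is what keeps the input and the target aligned. A minimal NumPy illustration of that idea (with a made-up flip transform standing in for the Keras one):

```python
import numpy as np

def apply_transform(img, params):
    # made-up stand-in for the real augmentation: horizontal flip
    return img[:, ::-1] if params['flip'] else img

rng = np.random.RandomState(0)
x = rng.rand(4, 4, 1)             # fake input image
y = (x > 0.5).astype(np.float32)  # fake mask derived from x

params = {'flip': True}           # draw the parameters once...
x_aug = apply_transform(x, params)  # ...and reuse them for both,
y_aug = apply_transform(y, params)  # so the mask still matches the input
```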
Okay, I hope that helps. I've used my own version of the above code, but haven't completely tested it (although it works for me now), so take it with a grain of salt :P
For the *.hdr issue: it seems that you can use the imageio package (it supports the HDR and DICOM formats, although I've never personally used that library).
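A minimal sketch of how that would look, assuming imageio is installed and your files are readable by one of its plugins (I haven't verified this against Analyze-style *.hdr files myself):

```python
import imageio
import numpy as np

# the same imread call would be used for *.hdr / DICOM files, e.g.
#   volume = imageio.imread('scan.hdr')
# demonstrated here with a small PNG round-trip instead:
data = (np.random.rand(8, 8, 3) * 255).astype(np.uint8)
imageio.imwrite('example.png', data)
img = imageio.imread('example.png')  # returns a NumPy array
```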