
I have 4 separate image folders, each with its own label (images in folder 1 correspond to label 1, and so on).

However, the dataset is imbalanced: I have too many images with labels 1 and 2, but not enough images with labels 3 and 4.

As such, I decided to use image augmentation to boost my image dataset.

Here's what my code looks like:

train_datagen = ImageDataGenerator(rotation_range=20,
                                   width_shift_range=0.2,
                                   height_shift_range=0.2,
                                   preprocessing_function=preprocess_input,
                                   horizontal_flip=True)

train_generator = train_datagen.flow_from_directory('/trainImages',
                                                    target_size=(80,80),
                                                    batch_size=32,
                                                    class_mode='categorical')

All the image folders are under the path "/trainImages" (e.g. "/trainImages/1", "/trainImages/2").

The problem with this approach is that the augmentation is also applied to the images in folders 1 and 2 (which do not need augmentation).

Is there a way to customize the ImageDataGenerator to ignore the image augmentation arguments for folders 1 and 2?

I'm rather new at both Python and Keras...

1 Answer


You can create two folder structures:

  • Folder 1 - a structure containing only the classes that should not be augmented
  • Folder 2 - a structure containing the classes to augment

Then you create two distinct generators.

dataGen1 = ImageDataGenerator(preprocessing_function=preprocess_input) #no augmentation
dataGen2 = ImageDataGenerator(rotation_range=20, width_shift_range=0.2,
                              height_shift_range=0.2, horizontal_flip=True,
                              preprocessing_function=preprocess_input) #with augmentation

sequencer1 = dataGen1.flow_from_directory(dir1, target_size=(80,80),
                                          batch_size=32, class_mode='categorical')
sequencer2 = dataGen2.flow_from_directory(dir2, target_size=(80,80),
                                          batch_size=32, class_mode='categorical')

(One caveat: each flow_from_directory call numbers its classes relative to its own directory, so check that the one-hot labels produced by the two sequencers don't collide.)

Now you create your own generator, which keeps a list of (sequencer, batch index) pairs covering both sequencers.

This code was not tested; if there are bugs, leave a comment and I'll test it tomorrow.

import numpy as np

def myGenerator(seq1, seq2, multiplySeq2By):

    generators = [seq1,seq2]

    #here we're creating indices to get data from the generators
    len1 = len(seq1)
    len2 = len(seq2)

    #dtype=int matters: float indices cannot be used to index the generators
    indices1 = np.zeros((len1,2), dtype=int)
    indices2 = np.ones((len2,2), dtype=int)

    indices1[:,1] = np.arange(len1) #pairs like [0,0], [0,1], [0,2]....
    indices2[:,1] = np.arange(len2) #pairs like [1,0], [1,1], [1,2]....

    indices2 = [indices2] * multiplySeq2By #repeat indices2 to generate more from it
    allIndices = np.concatenate([indices1] + indices2, axis=0)

    #you can randomize the order here:
    np.random.shuffle(allIndices)

    #now we loop the indices infinitely to get data from the original generators
    while True: 
        for g, el in allIndices:
            x,y = generators[g][el]
            yield x,y #when training, or "yield x" when testing

        #reshuffle so the next epoch sees a different order
        np.random.shuffle(allIndices)

Remember to use steps_per_epoch = len(seq1) + (multiplySeq2By * len(seq2)).
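As a quick sanity check of the index bookkeeping (the Keras parts are left out — plain Python lists stand in for sequencer1 and sequencer2, and each "batch" is just a dummy tuple), the generator above can be exercised like this:

```python
import numpy as np
from collections import Counter

def myGenerator(seq1, seq2, multiplySeq2By):
    #same logic as above, with dtype=int so the pairs can index the sequences
    generators = [seq1, seq2]
    len1, len2 = len(seq1), len(seq2)

    indices1 = np.zeros((len1, 2), dtype=int)
    indices2 = np.ones((len2, 2), dtype=int)
    indices1[:, 1] = np.arange(len1)
    indices2[:, 1] = np.arange(len2)

    allIndices = np.concatenate([indices1] + [indices2] * multiplySeq2By, axis=0)
    np.random.shuffle(allIndices)

    while True:
        for g, el in allIndices:
            yield generators[g][el]
        np.random.shuffle(allIndices)

#dummy stand-ins: 5 "batches" of common classes, 2 "batches" of rare classes
seq1 = [("x1_%d" % i, "y1_%d" % i) for i in range(5)]
seq2 = [("x2_%d" % i, "y2_%d" % i) for i in range(2)]

gen = myGenerator(seq1, seq2, multiplySeq2By=3)
steps_per_epoch = len(seq1) + 3 * len(seq2)   #5 + 3*2 = 11

batches = [next(gen) for _ in range(steps_per_epoch)]
counts = Counter(x for x, y in batches)
#each rare "batch" appears 3 times per epoch, each common one appears once
```

With the real Keras iterators you would instead pass the generator to training, e.g. model.fit_generator(myGenerator(sequencer1, sequencer2, 3), steps_per_epoch=steps_per_epoch, ...).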