I am finding it hard to understand how flow_from_directory of ImageDataGenerator works. I am using the following code to augment image data for my CNN model, because very few training images are available.

from keras.preprocessing.image import ImageDataGenerator

batch_size = 16
train_transformed = 'dataset/train_transformed'
train_datagen = ImageDataGenerator(
    rescale=1./255,
    horizontal_flip=True,
    fill_mode='nearest')

train_generator = train_datagen.flow_from_directory(
    'dataset/train',
    target_size=(150, 150),
    batch_size=batch_size,
    class_mode='binary',
    save_to_dir=train_transformed,
    save_prefix='train_aug',
    save_format='png')

It's a binary classification problem with 20 positive and 20 negative images, so the dataset/train folder has 2 subfolders with 20 images each. When I train the model with the above image generator, I can see 4160 images being saved in the dataset/train_transformed folder, and I presume 4160 images are being used to train the model.

model.fit_generator(
    train_generator,
    steps_per_epoch=1000 // batch_size,
    epochs=5,
    validation_data=validation_generator,
    validation_steps=100 // batch_size)

According to my understanding, number of samples in each epoch = batch_size x steps_per_epoch.
As my steps_per_epoch = 1000 // 16 = 62,
the number of samples in each epoch should be 62 x 16 = 992.
The number of epochs is set to 5, so the total number of generated images should be 992 x 5 = 4960.
Also, the number of images generated varies from run to run with the same hyperparameters.
I would just like an explanation of the above configuration.

1 Answer

Your calculation is right as an upper bound: the generator yields 62 batches per epoch (steps_per_epoch), which comes to 992 images if every batch is full (batch_size times steps_per_epoch). Also be aware that flow_from_directory has a shuffle argument (True by default), so the order in which the images appear will be different for each epoch.
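
One thing that may explain why the saved-file count is below 4960: with 40 images and batch_size = 16, a pass over the directory does not divide evenly, and the Keras directory iterator emits a smaller final batch (16, 16, 8) before starting over. Here is a rough sketch of the arithmetic. The helper images_yielded is made up for illustration, and it assumes the iterator simply keeps cycling across epochs without being reset.

```python
import math

def images_yielded(n_images, batch_size, total_steps):
    """Count images produced over total_steps batches, assuming the iterator
    cycles through the data and the last batch of each pass may be short.
    (Hypothetical helper, not part of Keras.)"""
    batches_per_pass = math.ceil(n_images / batch_size)
    full_passes, leftover_steps = divmod(total_steps, batches_per_pass)
    # Leftover steps fall at the start of a new pass, so those batches are full.
    return full_passes * n_images + leftover_steps * batch_size

batch_size = 16
steps_per_epoch = 1000 // batch_size   # 62
epochs = 5

nominal = steps_per_epoch * batch_size * epochs
actual = images_yielded(40, batch_size, steps_per_epoch * epochs)

print(nominal)  # 4960, assuming every batch is full
print(actual)   # 4136, with batches of 16, 16, 8 per pass
```

So roughly 4136 images are needed for training under these assumptions, not 4960. fit_generator also fills a background queue of batches ahead of training (max_queue_size defaults to 10), so the number of files written by save_to_dir can exceed what the model actually consumes and can differ between runs. The 4160 files you see equal exactly 104 passes over the 40 images, which would fit the iterator being drained in whole passes or a couple of prefetched batches, but that part is a guess.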