Train and test split set using ImageDataGenerator and flow

Question

I'm trying to train a network with data augmentation.

First I use ImageDataGenerator with validation_split=0.2.

train_generator = ImageDataGenerator(
    rotation_range=90,
    zoom_range=0.15,
    width_shift_range=0.2,
    height_shift_range=0.2,
    fill_mode="nearest",
    validation_split=0.2
)

Then I tried to create augmented training data and non-augmented validation data. I have to use flow instead of flow_from_directory.

train_augm = train_generator.flow([data_train, ebv_train], z_train, batch_size=128, subset='training')
valid_augm = train_generator.flow([data_train, ebv_train], z_train, batch_size=1, subset='validation')

I get this error message:

ValueError: Training and validation subsets have different number of classes after the split. If your numpy arrays are sorted by the label, you might want to shuffle them.

What am I doing wrong?

The model.fit code is something like this:

training_history = model.fit(
    train_augm,
    steps_per_epoch=len(data_train) // 128,
    epochs=10,
    validation_data=valid_augm
)

Nazmul Hasan · Accepted Answer · 2020-06-22T03:23:16

The error means that the classes found in the training subset do not match the classes found in the validation subset. If you haven't shuffled your arrays, shuffle them first. If you still get the error, I assume that some class has very few samples. You can reshuffle, but you may still hit the same error. In that case, either add more data for that class or split the data into training and validation sets manually.
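To see why unshuffled arrays trigger this, here is a minimal sketch of the kind of check the error message describes. It assumes the split is taken by index and that the unique label values on each side are compared; the helper name label_sets_match and the toy labels are hypothetical and not part of Keras.

```python
import numpy as np

def label_sets_match(y, validation_split=0.2):
    """Return True if both sides of an index-based split hold the same label values."""
    split_idx = int(len(y) * validation_split)
    return bool(np.array_equal(np.unique(y[:split_idx]), np.unique(y[split_idx:])))

# Hypothetical labels sorted by class: 50 samples of class 0, then 50 of class 1
sorted_labels = np.array([0] * 50 + [1] * 50)
sorted_ok = label_sets_match(sorted_labels)  # False: the first 20 samples are all class 0

# Shuffling with a fixed seed mixes both classes into each side of the split
rng = np.random.default_rng(0)
shuffled_labels = rng.permutation(sorted_labels)
shuffled_ok = label_sets_match(shuffled_labels)

# Continuous targets (an assumption about what z_train might hold) essentially
# never share the same set of unique values on both sides
continuous_ok = label_sets_match(rng.random(100))  # False
```

If z_train holds continuous values rather than class labels (the question does not say), a comparison like this would fail however the data is shuffled, which makes a manual split the safer route.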

For a random split, take a look at scikit-learn's train_test_split function.
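A manual random split can also be sketched with NumPy alone, which avoids an extra dependency; scikit-learn's train_test_split accepts several arrays and does the equivalent in one call. The helper name split_in_unison, the array shapes, and the placeholder values below are assumptions for illustration only.

```python
import numpy as np

def split_in_unison(arrays, validation_split=0.2, seed=0):
    """Shuffle equally long arrays with one shared permutation, then split them."""
    n = len(arrays[0])
    if any(len(a) != n for a in arrays):
        raise ValueError("all arrays must have the same length")
    rng = np.random.default_rng(seed)
    order = rng.permutation(n)            # one permutation keeps rows aligned
    n_val = int(n * validation_split)
    val_idx, train_idx = order[:n_val], order[n_val:]
    train = [a[train_idx] for a in arrays]
    val = [a[val_idx] for a in arrays]
    return train, val

# Placeholder arrays standing in for data_train, ebv_train and z_train
data_train = np.zeros((100, 8, 8, 3), dtype="float32")
ebv_train = np.arange(100, dtype="float32")
z_train = np.arange(100, dtype="float32") * 0.01

(data_tr, ebv_tr, z_tr), (data_val, ebv_val, z_val) = split_in_unison(
    [data_train, ebv_train, z_train], validation_split=0.2
)
```

The training arrays could then be passed to train_generator.flow([data_tr, ebv_tr], z_tr, batch_size=128) without the subset argument, and the untouched validation arrays to validation_data=([data_val, ebv_val], z_val). That also keeps the validation data free of augmentation, and validation_split on the generator is then no longer needed.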
