How to use python generators with neural networks that take in data with x_train and y_train variables?

Question

I have used Keras' ImageDataGenerator to create labelled data by following the example in Chapter 5 of Francois Chollet's book "Deep Learning with Python." As an example, I subdivided my training directory into cat and dog subdirectories and populated them with images. Using the following code, I created a variable that I believe contains both the images and the labels.

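from keras.preprocessing.image import ImageDataGenerator

# Rescale pixel values from [0, 255] to [0, 1]
train_datagen = ImageDataGenerator(rescale=1./255)
train_generator = train_datagen.flow_from_directory(
    train_dir,                # directory with one subdirectory per class
    target_size=(150, 150),   # resize every image to 150x150
    batch_size=20,
    class_mode='binary')      # two classes, so labels are 0/1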

Later on, after defining a model, you would use the following code to run the model:

history = model.fit_generator(
    train_generator,
    steps_per_epoch=100,
    epochs=30,
    validation_data=validation_generator,
    validation_steps=50)  # the argument is validation_steps, not validation_step

Many online examples of Neural Networks have separate variables that hold the test and training data (e.g. x_train, y_train, x_test, y_test). This seems the most popular method. As an example:

(x_train, y_train), (x_test, y_test) = mnist.load_data()

And you would run the model with the following code:

history = model.fit(x_train, y_train, batch_size=128, epochs=5, verbose=False, validation_split=.1)
loss, accuracy  = model.evaluate(x_test, y_test, verbose=False)

Is there a way to convert the data created with ImageDataGenerator into correctly formatted x_train, y_train, x_test, and y_test variables? Thanks.

Dawei Wang · Accepted Answer · 2020-09-09T16:26:45

Disclaimer: I have never used Keras' ImageDataGenerator before, but from the code you provided, I'm guessing you would have to create a separate generator for each of the training, validation, and test sets:

train_generator = train_datagen.flow_from_directory(
    train_dir,
    target_size=(150, 150),
    batch_size=20,
    class_mode='binary')
valid_generator = train_datagen.flow_from_directory(
    valid_dir,
    target_size=(150, 150),
    batch_size=20,
    class_mode='binary')

and so on. Also, note that model.fit_generator() is deprecated; in TensorFlow 2.1 and later, model.fit() accepts generators directly.
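If you really do need in-memory arrays in the x_train, y_train form, one option is to pull a fixed number of batches out of a generator and concatenate them. The following is only a rough sketch: collect_batches and fake_image_batches are hypothetical names I made up, and fake_image_batches is a stand-in that mimics a generator yielding (images, labels) tuples so the snippet runs without any image files.

```python
import numpy as np

def collect_batches(generator, num_batches):
    """Draw num_batches (x, y) batches from a generator and stack them into two arrays."""
    xs, ys = [], []
    for _ in range(num_batches):
        x_batch, y_batch = next(generator)
        xs.append(x_batch)
        ys.append(y_batch)
    # Concatenate along the first (sample) axis
    return np.concatenate(xs), np.concatenate(ys)

def fake_image_batches(batch_size=20):
    """Hypothetical stand-in for train_generator: yields (images, labels) forever."""
    while True:
        images = np.zeros((batch_size, 150, 150, 3), dtype="float32")
        labels = np.ones(batch_size, dtype="float32")
        yield images, labels

x_train, y_train = collect_batches(fake_image_batches(), num_batches=3)
print(x_train.shape, y_train.shape)
```

With a real Keras directory generator you would pass train_generator instead of the stand-in; as far as I know, len(train_generator) gives the number of batches in one pass over the data, which would be the natural value for num_batches. Keep in mind that this loads every image into memory at once, which is exactly what generators are meant to avoid.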

In my experience, the best workflow is to write the data generator yourself. There are many examples of this, for example https://stanford.edu/~shervine/blog/keras-how-to-generate-data-on-the-fly. Basically, instead of loading the entire dataset and handing it back with a single return, the function loops over the dataset one batch at a time and yields each batch.
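To make the return versus yield idea concrete, here is a minimal sketch of a hand-written batch generator, assuming the data already fits in two NumPy arrays. The name batch_generator and the random demo data are my own inventions for illustration, not something taken from the linked post.

```python
import numpy as np

def batch_generator(x, y, batch_size, shuffle=True, seed=0):
    """Yield (x_batch, y_batch) tuples indefinitely, one batch at a time."""
    rng = np.random.default_rng(seed)
    n = len(x)
    while True:  # loop forever so training can run for any number of epochs
        # Pick a fresh sample order at the start of every pass over the data
        order = rng.permutation(n) if shuffle else np.arange(n)
        for start in range(0, n, batch_size):
            idx = order[start:start + batch_size]
            yield x[idx], y[idx]

# Hypothetical demo data: 50 random 8x8 RGB "images" with binary labels
x_demo = np.random.rand(50, 8, 8, 3).astype("float32")
y_demo = np.random.randint(0, 2, size=50)

gen = batch_generator(x_demo, y_demo, batch_size=20)
x_batch, y_batch = next(gen)
print(x_batch.shape, y_batch.shape)
```

Because the generator never ends, you have to tell Keras how many batches make up one epoch, for example model.fit(gen, steps_per_epoch=3, epochs=5) for the 50 samples above. Note that the last batch of each pass is smaller (10 samples here) because 50 is not a multiple of 20.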
