
My model has 100,000 training samples (images). How do I modify my code below to train it in batches? With model.fit_generator I have to specify the batching inside the generator function:

from numpy import array

def data_generator(descriptions, features, n_step, max_sequence):
    # loop forever so the generator never runs out during training
    while 1:
        # step over the photo identifiers in the dataset, n_step at a time
        for i in range(0, len(descriptions), n_step):
            Ximages, XSeq, y = list(), list(), list()
            for j in range(i, min(len(descriptions), i + n_step)):
                image = features[j]
                # retrieve the text input
                desc = descriptions[j]
                # generate input-output pairs
                in_img, in_seq, out_word = preprocess_data([desc], [image], max_sequence)
                for k in range(len(in_img)):
                    Ximages.append(in_img[k])
                    XSeq.append(in_seq[k])
                    y.append(out_word[k])
            # yield this batch of samples to the model as (inputs, targets)
            yield [array(Ximages), array(XSeq)], array(y)

My model.fit_generator code:

model.fit_generator(data_generator(texts, train_features, 1, 150), 
                    steps_per_epoch=1500, epochs=50, callbacks=callbacks_list, verbose=1)

Any assistance would be great. I'm training on a cloud 16 GB Tesla V100.

Edit: My image caption model creates a training sample for each token in the DSL (250 tokens). With a dataset of 50 images (equivalent to 12,500 training samples) and a batch size of 1, I get an OOM. With about 32 images (equivalent to 8,000 samples) and a batch size of 1, it trains just fine. My question is: can I optimize my code further, or is my only option to use multiple GPUs?

Fix:

steps_per_epoch must be equal to ceil(num_samples / batch_size), so with a dataset of 1500 samples and a batch size of 1, steps_per_epoch should be 1500. I also reduced my LSTM sliding window from 48 to 24.

steps_per_epoch: Integer. Total number of steps (batches of samples) to yield from generator before declaring one epoch finished and starting the next epoch. It should typically be equal to ceil(num_samples / batch_size). Optional for Sequence: if unspecified, will use the len(generator) as a number of steps.
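
A minimal sketch of that calculation, using the example figures from this question (num_samples and batch_size are placeholders to replace with your own values):

    import math

    num_samples = 1500   # total training samples in the dataset (example value)
    batch_size = 1       # samples yielded per generator step (example value)

    steps_per_epoch = math.ceil(num_samples / batch_size)
    print(steps_per_epoch)  # 1500 for these example values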

It's already in batches. Each yield is a batch. – Daniel Möller
How do I control the batch size? – Paul Gwamanda
If the answer below is not enough, you should explain your question properly. It answers exactly what you're asking. – Daniel Möller
So are you saying that the code cannot be optimized, and the only solution is to train on multiple V100 GPUs? – Paul Gwamanda
What is your question? You asked how to change the batch size. – Daniel Möller

2 Answers


The generator already returns batches.

Every yield is a batch. It's entirely up to you to design the generator to build the batches the way you want.

In your code, the batch size is n_step.
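
For illustration, a hedged sketch of how the original call could be adjusted: n_step controls how many descriptions go into each yielded batch (each description expands into many token-level samples), so steps_per_epoch should count batches of descriptions, not individual samples. The value 4 for n_step is only a placeholder to tune against available memory; texts, train_features, and callbacks_list are the names from the question.

    import math

    n_step = 4  # descriptions per yielded batch; tune to the largest value that fits in memory
    steps_per_epoch = math.ceil(len(texts) / n_step)  # batches needed to cover every description once

    model.fit_generator(
        data_generator(texts, train_features, n_step, 150),
        steps_per_epoch=steps_per_epoch,
        epochs=50,
        callbacks=callbacks_list,
        verbose=1,
    )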


Here's a cleaner way of using generators: write a generator that yields individual samples, create a tf.data.Dataset from it, and use the batch method on that object. Then tune the batch size to find the largest value that doesn't cause an OOM.

import tensorflow as tf

def data_generator(descriptions, features, max_sequence):
    # yields one (inputs, target) pair at a time; batching is handled by tf.data
    def _gen():
        for img, seq, word in zip(*preprocess_data(descriptions, features, max_sequence)):
            yield {'image': img, 'seq': seq}, word
    return _gen


ds = tf.data.Dataset.from_generator(
    data_generator(descriptions, features, max_sequence),
    output_types=({'image': tf.float32, 'seq': tf.float32}, tf.int32),
    output_shapes=({
            'image': tf.TensorShape([blah, blah]),  # fill in the actual image feature shape
            'seq': tf.TensorShape([blah, blah]),    # fill in the actual sequence shape
        },
        tf.TensorShape([blah])                      # fill in the actual target shape
    )
)

ds = ds.batch(n_step)
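
A possible way to consume the batched dataset, sketched as an assumption rather than part of the answer: the prefetch call is an optional addition, and it relies on the model's two inputs being named 'image' and 'seq' to match the dictionary keys above.

    # optional: overlap data preparation with training on the GPU
    ds = ds.prefetch(tf.data.experimental.AUTOTUNE)

    # model.fit can consume a tf.data.Dataset directly and will run each epoch
    # until the dataset is exhausted, so no separate generator call is needed
    model.fit(ds, epochs=50, callbacks=callbacks_list, verbose=1)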