I'm training a model that seems to take far longer than it should compared to other datasets: about 1 h 20 min per epoch. I suspect the problem is that the dataset is not being cached in RAM, but I'm not sure.
The code is the following:
```python
from keras.preprocessing.image import ImageDataGenerator

def load_data():
    # Training generator with on-the-fly augmentation
    train_datagen = ImageDataGenerator(rescale=1./255, shear_range=0.2,
                                       zoom_range=0.2, horizontal_flip=True)
    train_generator = train_datagen.flow_from_directory(path1, target_size=(200, 200),
                                                        batch_size=32, class_mode="binary")
    # Test generator: rescaling only, no augmentation
    test_datagen = ImageDataGenerator(rescale=1./255)
    test_generator = test_datagen.flow_from_directory(path2, target_size=(200, 200),
                                                     batch_size=32, class_mode="binary")
    return train_generator, test_generator
```
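Whether the dataset could fit in RAM can be checked with back-of-envelope arithmetic. The image count (1201) and size (200x200) come from the post; assuming 3 color channels and float32 pixels, the whole decoded dataset is well under 1 GB:

```python
# Rough RAM footprint of the full dataset decoded as float32 arrays
num_images = 1201
height, width, channels = 200, 200, 3   # channels assumed: RGB
bytes_per_float32 = 4

total_bytes = num_images * height * width * channels * bytes_per_float32
total_mb = total_bytes / 1024**2
print(round(total_mb))  # 550 (MB) -- comfortably within 8 GB of RAM
```

So the data itself is small; if disk reads were the bottleneck, the images could be loaded once into NumPy arrays and fed with `ImageDataGenerator.flow` instead of `flow_from_directory`.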
Model:
- Sequential model
- 2 Convolutional layers with 32 neurons, activation = relu.
- 1 Convolutional layer with 64 neurons, activation = relu.
- Flattening and Dense layer, activation = relu.
- Dropout of 0.5
- Output layer (Dense) with sigmoid activation.
- Adam optimizer.
- Loss: binary cross entropy.
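The bullet list above could be sketched as the following Keras model. Note this is a reconstruction, not the author's exact code: kernel sizes, max-pooling, and the Dense layer width (64) are assumptions, since the post only lists filter counts, activations, dropout, optimizer, and loss.

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense, Dropout

def build_model(input_shape=(200, 200, 3)):
    # 3x3 kernels, 2x2 pooling, and Dense(64) are assumed; the post
    # only specifies filter counts (32, 32, 64) and activations.
    model = Sequential([
        Conv2D(32, (3, 3), activation="relu", input_shape=input_shape),
        MaxPooling2D((2, 2)),
        Conv2D(32, (3, 3), activation="relu"),
        MaxPooling2D((2, 2)),
        Conv2D(64, (3, 3), activation="relu"),
        MaxPooling2D((2, 2)),
        Flatten(),
        Dense(64, activation="relu"),
        Dropout(0.5),
        Dense(1, activation="sigmoid"),  # single-unit binary output
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy",
                  metrics=["accuracy"])
    return model
```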
Fit:
```python
# x = train_generator, y = test_generator
model.fit_generator(x,
                    steps_per_epoch=len(x),   # one full pass over the data
                    epochs=50,
                    validation_data=y,
                    validation_steps=len(y),  # len() already counts batches; don't divide by 32
                    callbacks=[tensorboard])
```
(The original call used `steps_per_epoch=500` and `validation_steps=len(y)/32`; `len(generator)` already returns the number of batches, so both values were too large.)
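The epoch time follows directly from these numbers. With 1201 images and `batch_size=32`, one full pass over the data is only 38 batches, so `steps_per_epoch=500` makes each "epoch" repeat the dataset roughly 13 times, which accounts for most of the 1 h 20 min:

```python
import math

num_images = 1201
batch_size = 32

# Batches in one true epoch (what len(train_generator) would return)
steps_per_epoch = math.ceil(num_images / batch_size)
print(steps_per_epoch)  # 38

# How many passes over the data steps_per_epoch=500 actually performs
passes_per_epoch = 500 / steps_per_epoch
print(round(passes_per_epoch, 1))  # 13.2
```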
- My dataset has 1201 images in 2 classes.
- I built the model following this tutorial.
- My GPU is a GTX 1060 3 GB.
- 8 GB of RAM.
- The images are being resized to 200x200.
If you could help me I'd appreciate it. Thank you very much!
EDIT: I've done what Matias Valdenegro suggested. While it's true that each epoch now takes less time, I realized that my GPU takes about 10 s to complete a single step, and that is what I really want to improve. Sorry for the confusion.
Run `nvprof <your_program> [args...]` to get CUDA kernel execution time. – Kh40tiK