2
votes

I have a relatively small CNN:

import time

import numpy as np
import tensorflow as tf

model = tf.keras.models.Sequential([
    tf.keras.layers.Conv2D(input_shape=(400,400,3), filters=6, kernel_size=5, padding='same', activation='relu'),
    tf.keras.layers.Conv2D(filters=12, kernel_size=3, padding='same', activation='relu'),
    tf.keras.layers.Conv2D(filters=24, kernel_size=3, strides=2, padding='valid', activation='relu'),
    tf.keras.layers.Conv2D(filters=32, kernel_size=3, strides=2, padding='valid', activation='relu'),
    tf.keras.layers.Conv2D(filters=48, kernel_size=3, strides=2, padding='valid', activation='relu'),
    tf.keras.layers.Conv2D(filters=64, kernel_size=3, strides=2, padding='valid', activation='relu'),
    tf.keras.layers.Conv2D(filters=96, kernel_size=3, strides=2, padding='valid', activation='relu'),
    tf.keras.layers.Conv2D(filters=128, kernel_size=3, strides=2, padding='valid', activation='relu'),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(256, activation='relu'),
    tf.keras.layers.Dense(256, activation='relu'),
    tf.keras.layers.Dense(256, activation='relu'),
    tf.keras.layers.Dense(240, activation='softmax')
])
model.compile(optimizer='adam', loss='categorical_crossentropy')

I use the following code to measure model performance:

for img_per_batch in [1, 5, 10, 50]:
    # warm up the model
    image = np.random.random(size=(img_per_batch, 400, 400, 3)).astype('float32')
    model(image, training=False)

    n_iter = 100
    start_time = time.time()
    for _ in range(n_iter):
        image = np.random.random(size=(img_per_batch, 400, 400, 3)).astype('float32')
        model(image, training=False)
    dt = (time.time() - start_time) * 1000
    print(f'img_per_batch = {img_per_batch}, {dt/n_iter:.2f} ms per iteration, {dt/n_iter/img_per_batch:.2f} ms per image')

My output (Nvidia Jetson Xavier, tensorflow==2.0.0):

img_per_batch = 1, 21.74 ms per iteration, 21.74 ms per image
img_per_batch = 5, 42.35 ms per iteration, 8.47 ms per image
img_per_batch = 10, 68.37 ms per iteration, 6.84 ms per image
img_per_batch = 50, 312.83 ms per iteration, 6.26 ms per image

Then I add a dropout layer after each of the fully connected layers:

model = tf.keras.models.Sequential([
    # ... convolution layers are same
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(256, activation='relu'),
    tf.keras.layers.Dropout(.3),
    tf.keras.layers.Dense(256, activation='relu'),
    tf.keras.layers.Dropout(.3),
    tf.keras.layers.Dense(256, activation='relu'),
    tf.keras.layers.Dropout(.3),
    tf.keras.layers.Dense(240, activation='softmax')
])

With the added layers, the output becomes:

img_per_batch = 1, 31.18 ms per iteration, 31.18 ms per image
img_per_batch = 5, 76.15 ms per iteration, 15.23 ms per image
img_per_batch = 10, 127.91 ms per iteration, 12.79 ms per image
img_per_batch = 50, 513.85 ms per iteration, 10.28 ms per image

In theory, a dropout layer shouldn't impact inference performance. But in the code above, adding the dropout layers increases single-image prediction time by a factor of 1.5, and 10-image batch prediction is almost twice as slow as without dropout. Am I doing something wrong?
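(For reference, a standalone `tf.keras.layers.Dropout` really is an identity at inference time; a minimal check, independent of the model above:)

```python
import numpy as np
import tensorflow as tf

drop = tf.keras.layers.Dropout(.3)
x = np.random.random(size=(4, 3200)).astype('float32')

# With training=False, Keras dropout applies no masking and no scaling
# (inverted dropout scales during training instead), so the output
# should equal the input exactly.
y = drop(x, training=False)
assert np.allclose(y.numpy(), x)
```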

Look at the shapes in model.summary(), count how many elements are there and how much work dropout has to do. Your network has almost no downsampling so many layers output a lot of elements. - Dr. Snoopy
The question was not why the dropout layer is slow, but why it slows down inference. In my understanding a dropout layer should be active in training mode only and disabled during prediction (when passing training=False). And the output of the Flatten layer has the shape (None, 3200); that is not a lot of elements for a dropout layer and cannot explain the twofold performance decrease (every convolutional layer has strides=2, so the network does have downsampling). - Konstantin Vdovkin
I agree that this doesn't make any sense, but whatever is going wrong here, it's clearly fixed in tf2.2: colab.research.google.com/drive/… - mdaoust
Well, technically at inference time you should still multiply the layer inputs by p_keep to preserve the shape of the training distributions, right? Perhaps this could explain the difference - rvinas

1 Answer

5
votes

Apparently this is a known problem in TensorFlow 2.0.0: see this GitHub comment.

Try to use model.predict(x) instead of model(x).

This can also be fixed by updating to a more recent version of TensorFlow like 2.1.0.
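As a minimal sketch of the workaround (using a small stand-in model with a Dropout layer; any such model should do), either calling `model.predict(x)` or wrapping the call in `tf.function` runs inference in graph mode, where the inactive dropout branch is traced away:

```python
import numpy as np
import tensorflow as tf

# Stand-in model (assumption: any Dropout-containing model shows the issue)
model = tf.keras.models.Sequential([
    tf.keras.layers.Dense(256, activation='relu', input_shape=(3200,)),
    tf.keras.layers.Dropout(.3),
    tf.keras.layers.Dense(240, activation='softmax'),
])

x = np.random.random(size=(10, 3200)).astype('float32')

# Option 1: model.predict runs a compiled graph and returns a NumPy array
out1 = model.predict(x)

# Option 2: wrap the eager call in tf.function
infer = tf.function(lambda t: model(t, training=False))
out2 = infer(x)
```

Both paths produce identical results since dropout is disabled in either case; only the eager `model(x)` path in TF 2.0.0 pays the extra overhead.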

Hope this helps