I am unable to match the inference times reported by Google for the models released in their model zoo. Specifically, I am trying their faster_rcnn_resnet101_coco model, for which the reported inference time is 106ms on a Titan X GPU.
My serving system runs TF 1.4 in a container built from the Dockerfile released by Google, and my client is modeled after the inception client, also released by Google.
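Here is a minimal sketch of the client path, assuming the gRPC beta stubs from the TF Serving 1.4 APIs; the host, port, and model name are placeholders from my setup, and the `serving_default` signature with the `inputs` key is what I believe the zoo's exported detection models use:

```python
# Sketch of my client, modeled on inception_client.py (TF Serving 1.4 era).
# Host/port, model name, and the dummy image are placeholders.
import numpy as np
import tensorflow as tf
from grpc.beta import implementations
from tensorflow_serving.apis import predict_pb2
from tensorflow_serving.apis import prediction_service_pb2

channel = implementations.insecure_channel('localhost', 9000)
stub = prediction_service_pb2.beta_create_PredictionService_stub(channel)

# Stand-in for a decoded image; the real client loads a JPEG into a
# uint8 HxWx3 array.
image = np.zeros((600, 800, 3), dtype=np.uint8)
batched = np.expand_dims(image, axis=0)  # model expects [1, H, W, 3]

request = predict_pb2.PredictRequest()
request.model_spec.name = 'faster_rcnn_resnet101_coco'
request.model_spec.signature_name = 'serving_default'
# This make_tensor_proto call is where I measure ~150ms.
request.inputs['inputs'].CopyFrom(tf.contrib.util.make_tensor_proto(batched))

result = stub.Predict(request, 10.0)  # 10 second timeout
```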
I am running Ubuntu 14.04 and TF 1.4 with a single Titan X. My total inference time is ~330ms, about 3x worse than Google's reported figure: building the tensor proto takes ~150ms and the Predict call takes ~180ms. My saved_model.pb comes directly from the tar file downloaded from the model zoo. Is there something I am missing? What steps can I take to reduce the inference time?
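For reference, this is roughly how I measure the two phases, continuing from the client sketch above (`batched`, `request`, and `stub` are set up there):

```python
import time

# Phase 1: serializing the numpy array into a TensorProto.
start = time.time()
tensor_proto = tf.contrib.util.make_tensor_proto(batched)
print('make_tensor_proto: %.1fms' % ((time.time() - start) * 1000))  # ~150ms

request.inputs['inputs'].CopyFrom(tensor_proto)

# Phase 2: the round trip through TF Serving.
start = time.time()
result = stub.Predict(request, 10.0)
print('Predict: %.1fms' % ((time.time() - start) * 1000))  # ~180ms
```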