I am unable to match the inference times reported by Google for the models released in their model zoo. Specifically, I am trying their faster_rcnn_resnet101_coco model, for which the reported inference time is 106ms on a Titan X GPU.
My serving system runs TF 1.4 in a container built from the Dockerfile released by Google, and my client is modeled after the inception client, also released by Google.
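Here is a minimal sketch of the client path, assuming the gRPC beta stubs from the TF Serving 1.4 APIs; the host, port, and model name are placeholders from my setup, and the `serving_default` signature with the `inputs` key is what I believe the zoo's exported detection models use:

```python
# Sketch of my client, modeled on inception_client.py (TF Serving 1.4 era).
# Host/port, model name, and the dummy image are placeholders.
import numpy as np
import tensorflow as tf
from grpc.beta import implementations
from tensorflow_serving.apis import predict_pb2
from tensorflow_serving.apis import prediction_service_pb2

channel = implementations.insecure_channel('localhost', 9000)
stub = prediction_service_pb2.beta_create_PredictionService_stub(channel)

# Stand-in for a decoded image; the real client loads a JPEG into a
# uint8 HxWx3 array.
image = np.zeros((600, 800, 3), dtype=np.uint8)
batched = np.expand_dims(image, axis=0)  # model expects [1, H, W, 3]

request = predict_pb2.PredictRequest()
request.model_spec.name = 'faster_rcnn_resnet101_coco'
request.model_spec.signature_name = 'serving_default'
# This make_tensor_proto call is where I measure ~150ms.
request.inputs['inputs'].CopyFrom(tf.contrib.util.make_tensor_proto(batched))

result = stub.Predict(request, 10.0)  # 10 second timeout
```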
I am running Ubuntu 14.04 and TF 1.4 with a single Titan X. My total inference time is ~330ms, about 3x worse than Google's reported figure: building the tensor proto takes ~150ms and the Predict call takes ~180ms. My saved_model.pb comes directly from the tar file downloaded from the model zoo. Is there something I am missing? What steps can I take to reduce the inference time?
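For reference, this is roughly how I measure the two phases, continuing from the client sketch above (`batched`, `request`, and `stub` are set up there):

```python
import time

# Phase 1: serializing the numpy array into a TensorProto.
start = time.time()
tensor_proto = tf.contrib.util.make_tensor_proto(batched)
print('make_tensor_proto: %.1fms' % ((time.time() - start) * 1000))  # ~150ms

request.inputs['inputs'].CopyFrom(tensor_proto)

# Phase 2: the round trip through TF Serving.
start = time.time()
result = stub.Predict(request, 10.0)
print('Predict: %.1fms' % ((time.time() - start) * 1000))  # ~180ms
```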