I've trained a model in TensorFlow 2.0 and am trying to improve prediction time when moving to production (on a server with GPU support). In TensorFlow 1.x I was able to speed up prediction by freezing the graph, but this has been deprecated as of TensorFlow 2. In NVIDIA's description of TensorRT, they suggest that using TensorRT can speed up inference by 7x compared to TensorFlow alone. Source:
TensorFlow 2.0 with Tighter TensorRT Integration Now Available
I have trained my model and saved it to a .h5 file using TensorFlow's SavedModel format. Now I am following NVIDIA's documentation to optimize the model for inference with TensorRT: TF-TRT 2.0 Workflow With A SavedModel.
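For reference, the workflow I am trying to follow looks roughly like the sketch below. The function name and the directory arguments are placeholders of mine, not names from the documentation, and the import is placed inside the function only so that defining it does not fail on a build without TF-TRT. It assumes the input is a SavedModel directory rather than a single .h5 file.

```python
def convert_saved_model(input_saved_model_dir, output_saved_model_dir):
    """Convert a SavedModel with TF-TRT and save the optimized result.

    Both directory arguments are placeholders chosen for this sketch.
    """
    # Imported lazily: this line raises ModuleNotFoundError on TensorFlow
    # builds that do not include the TensorRT integration.
    from tensorflow.python.compiler.tensorrt import trt_convert as trt

    # Build a converter for the SavedModel on disk (default parameters).
    converter = trt.TrtGraphConverterV2(
        input_saved_model_dir=input_saved_model_dir
    )
    # Replace TensorRT-compatible subgraphs with TensorRT ops.
    converter.convert()
    # Write the converted model out as a new SavedModel.
    converter.save(output_saved_model_dir)
```

The converted directory could then be loaded with tf.saved_model.load for inference, assuming the conversion succeeds.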
When I run:
import tensorflow as tf
from tensorflow.python.compiler.tensorrt import trt_convert as trt
I get the error: ModuleNotFoundError: No module named 'tensorflow.python.compiler.tensorrt'
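To narrow this down, a small check like the following could tell whether the installed TensorFlow build ships the TF-TRT module at all. This is only a sketch: the helper name module_available is my own, and it only reports whether Python can locate the module, not whether TensorRT itself is installed or working.

```python
import importlib.util


def module_available(dotted_name):
    """Return True if Python can locate the module, False otherwise."""
    try:
        # find_spec imports parent packages and raises ModuleNotFoundError
        # (a subclass of ImportError) when one of them is missing.
        return importlib.util.find_spec(dotted_name) is not None
    except (ImportError, ValueError):
        return False


if __name__ == "__main__":
    for name in ("tensorflow", "tensorflow.python.compiler.tensorrt"):
        print(name, "->", module_available(name))
```

If the first name is found but the second is not, the installed TensorFlow package itself would seem to lack the TensorRT integration, rather than my import statement being wrong.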
They give another TensorFlow 2.0 example here: Examples. However, it imports the same module as above, and I get the same error.
Can anyone suggest how to optimize my model with TensorRT?