Performance issue with Tensorflow model server on CPU in comparison with Tensorflow model inference

Question

I observe performance issue for CPU with Tensorflow model server. It doubles the time for inference in comparison to raw Tensorflow model inference. Both built with MKL for CPU only.

Code to reproduce: https://github.com/BogdanRuzh/tf_model_service_benchmark

Tensorflow MKL build: bazel build --config=mkl -c opt --copt=-msse4.1 --copt=-msse4.2 --copt=-mavx --copt=-mavx2 --copt=-mfma --copt=-O3 //tensorflow/tools/pip_package:build_pip_package

Tensorflow server MKL build: bazel build --config=mkl --config=opt --copt=-msse4.1 --copt=-msse4.2 --copt=-mavx --copt=-mavx2 --copt=-mfma --copt=-O3 tensorflow_serving/model_servers:tensorflow_model_server

The target model is simple CNN for segmentation.

Raw Tensorflow model process an image in 0.17s. Tensorflow model server process the same image in 0.32s.

How can I improve this performance? It's very critical for my application.

AlexSomov AlexSomov · Accepted Answer · 2019-09-26T06:53:09

I suppose that explonation will help you. It's said that with bad configuration tensorflow with intel optimizations may have worse performance then clear build https://github.com/tensorflow/serving/issues/1272#issuecomment-477878180

You may try to configure parameters of batching (with config file and --enable_batching parameter) https://github.com/tensorflow/serving/tree/master/tensorflow_serving/batching

And set (inter/intra)_op_parallelism_threads.

Additionaly, MKL has it's own flags for improving performance https://www.tensorflow.org/guide/performance/overview#tuning_mkl_for_the_best_performance

Performance issue with Tensorflow model server on CPU in comparison with Tensorflow model inference

2 Answers