
When I serve my TF model with TensorFlow Serving 2.1.0 through Docker and stress-test it with JMeter, I see a problem: TPS hits about 4400 when testing with a single repeated example, but only reaches about 1700 when the requests draw different examples from a txt file. The model is a BiLSTM that I trained without any cache settings. All experiments run against the local server rather than over the network.
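For context, the requests go to TensorFlow Serving's standard REST predict endpoint. A minimal sketch of one such call, assuming the model is exported under the name `bilstm` and served on the default REST port 8501 (both are placeholders for my actual setup):

```python
import requests

# TensorFlow Serving REST API: POST /v1/models/<model_name>:predict
URL = "http://localhost:8501/v1/models/bilstm:predict"

# One example; the inner list is a hypothetical tokenized input whose
# shape must match the BiLSTM's input signature.
payload = {"instances": [[101, 2054, 2003, 1037, 14343, 102]]}

resp = requests.post(URL, json=payload)
resp.raise_for_status()
print(resp.json())  # {"predictions": [...]}
```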

Metrics:

In the single-data task, 30 request threads send HTTP requests with identical data, with no interval between requests, for 10 minutes.
  • TPS: 4491
  • CPU occupied: 2100%
  • 99th-percentile latency (ms): 17
  • error rate: 0
In the multiple-data task, 30 request threads send HTTP requests whose payloads are read from a txt file, a dataset of 9,740,000 different examples (both request loops are sketched after this list).
  • TPS: 1711
  • CPU occupied: 2300%
  • 99th-percentile latency (ms): 42
  • error rate: 0
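To make the two tasks concrete, here is a minimal Python sketch of both request loops against the same hypothetical endpoint as above. In the multiple-data loop, all threads pull lines from one shared file handle, which approximates a single shared data source in JMeter; each line is assumed to hold one JSON-encoded example:

```python
import json
import threading
import requests

URL = "http://localhost:8501/v1/models/bilstm:predict"  # hypothetical endpoint
FIXED_PAYLOAD = {"instances": [[101, 2054, 2003, 1037, 14343, 102]]}

def single_data_worker(n_requests):
    # Single-data task: every request sends the identical payload.
    for _ in range(n_requests):
        requests.post(URL, json=FIXED_PAYLOAD)

shared_file = open("examples.txt")  # one handle shared by all 30 threads
file_lock = threading.Lock()        # concurrent reads must be serialized

def multiple_data_worker(n_requests):
    # Multiple-data task: each request first pulls the next line from
    # the shared file, so the 30 threads queue up on this lock.
    for _ in range(n_requests):
        with file_lock:
            line = shared_file.readline()
        if not line:
            break  # reached the end of the dataset
        requests.post(URL, json={"instances": [json.loads(line)]})
```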

Hardware:

  • CPU cores: 12
  • logical processors: 24
  • Intel(R) Xeon(R) Silver 4214 CPU @ 2.20GHz

Is there a cache in TensorFlow Serving?

Why is TPS roughly three times higher in the single-data stress test than in the multiple-data test?


1 Answer


I've solved the problem. The request threads all read from the same file, so each thread has to wait its turn for file access, and that contention costs CPU on the machine running JMeter itself.
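In other words, the bottleneck was the load generator, not TensorFlow Serving. A minimal sketch of the workaround, using the same hypothetical endpoint as in the question: preload the dataset once and give every thread its own independent slice, so no thread ever waits on a shared file handle:

```python
import json
import threading
import requests

URL = "http://localhost:8501/v1/models/bilstm:predict"  # hypothetical endpoint
N_THREADS = 30

# Read the whole dataset once, up front, instead of during the test.
with open("examples.txt") as f:
    lines = f.readlines()

def worker(thread_id):
    # Each thread iterates over its own slice: no shared handle, no lock.
    for line in lines[thread_id::N_THREADS]:
        requests.post(URL, json={"instances": [json.loads(line)]})

threads = [threading.Thread(target=worker, args=(i,)) for i in range(N_THREADS)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

In JMeter terms, the equivalent fix is to stop all 30 threads from contending on one shared data source, for example by splitting the input file so each thread reads its own copy.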