I serve my TF model with TensorFlow Serving 2.1.0 through Docker and run a stress test with JMeter. There is a problem: TPS reaches about 4400 when testing with a single data item, but only about 1700 with multiple data items read from a txt file. The model is a BiLSTM that I trained without any cache settings. All experiments run on the local server rather than over the network.
Metrics:
In the single-data task, I send HTTP requests with identical data, with no interval, using 30 request threads for 10 minutes:
- TPS: 4491
- CPU occupied: 2100%
- 99% latency line (ms): 17
- error rate: 0
In the multiple-data task, requests draw different data items from the txt file:
- TPS: 1711
- CPU occupied: 2300%
- 99% latency line (ms): 42
- error rate: 0
Hardware:
- CPU cores: 12
- logical processors: 24
- Intel(R) Xeon(R) Silver 4214 CPU @ 2.20GHz
Is there a cache in TensorFlow Serving?
Why is the TPS with single-data testing three times higher than with varied-data testing in this stress test?