Intro
I have a setup with a client and a server communicating over a TCP connection, and I am seeing weird latency behaviour that I can't explain.
Context
The client sends a request message to the server, which replies with a response message to the client. I define latency as the time from sending a request message to receiving the response message. I can send request messages at different rates (throttling the frequency of requests), but I always have at most one outstanding request message at any time, i.e. no concurrent/overlapping request-response pairs.
I have implemented the request/response exchange in three ways: first directly on TCP sockets with my own serialization method, second using gRPC (RPC over HTTP/2), and third using Apache Thrift (an RPC framework similar to gRPC). The gRPC version is in turn implemented in 4 different client/server types, and the Thrift version in 3 different client/server types.
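For illustration, here is a minimal sketch of what the raw-TCP variant's sendRequest/receiveResponse could look like, assuming POSIX sockets and length-prefixed framing (sock, serialize and deserialize are placeholder names, not the exact code; error and partial-write handling omitted):

#include <arpa/inet.h>   // htonl/ntohl
#include <sys/socket.h>  // send/recv
#include <string>
#include <vector>

extern int sock; // connected TCP socket (placeholder)

void sendRequest(const RequestType& request) {
    std::string payload = serialize(request);  // my own serialization
    uint32_t len = htonl(payload.size());      // 4-byte length prefix
    send(sock, &len, sizeof(len), 0);
    send(sock, payload.data(), payload.size(), 0);
}

void receiveResponse(ResponseType* response) {
    uint32_t len;
    recv(sock, &len, sizeof(len), MSG_WAITALL);      // read the prefix
    std::vector<char> buf(ntohl(len));
    recv(sock, buf.data(), buf.size(), MSG_WAITALL); // read the body
    *response = deserialize(buf);                    // my own deserialization
}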
In all solutions I see a decrease in latency when I increase the sending rate of request messages (in gRPC and Thrift a request-response pair is communicated via an RPC method). The best latency is observed when not throttling the request rate at all, i.e. sending a new request as soon as the previous response is received. Latency is measured using std::chrono::steady_clock. I have no idea what's causing this. I make sure to warm up the TCP connection (getting past the TCP slow-start phase) by sending 10k request messages before the real measurements start.
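A minimal sketch of that warmup phase, assuming it is simply an unthrottled loop over the same helpers used below (the real warmup code is not shown here):

// Warmup sketch: 10k unthrottled request-response pairs to get
// the connection past TCP slow start before measuring.
for (int w = 0; w < 10000; w++) {
    RequestType request("ABCDEFGHIJKLMNOPQRSTUVWXYZ");
    ResponseType response;
    sendRequest(request);
    receiveResponse(&response);
}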
How I implement the throttling and measure latency (on the client, of course):
#include <chrono>
#include <iostream>
#include <thread>

double rate;
std::cout << "Enter rate (requests/second):" << std::endl;
std::cin >> rate;
// time between consecutive send times, e.g. 1000 req/s -> 1000 us
auto interval = std::chrono::microseconds(1000000) / rate;
// warmup phase is here, but not included in this code.
auto total_lat = std::chrono::microseconds(0);
auto iter_time = std::chrono::steady_clock::now();
int i = 0;
for (i = 0; i < 10000; i++) { // send 10k requests
    iter_time = std::chrono::steady_clock::now();
    RequestType request("ABCDEFGHIJKLMNOPQRSTUVWXYZ");
    ResponseType response;
    auto start = std::chrono::steady_clock::now();
    sendRequest(request);       // these look different depending on gRPC/Thrift/"TCP"
    receiveResponse(&response);
    auto end = std::chrono::steady_clock::now();
    auto dur = std::chrono::duration_cast<std::chrono::microseconds>(end - start);
    total_lat += dur;
    std::this_thread::sleep_until(iter_time + interval); // throttle the sending
}
// mean latency: total_lat / i
I run the client and server in separate Docker containers via docker-compose, and I have also run them in a Kubernetes cluster; in both setups I see the same behaviour. I am wondering whether my throttling/time-measuring code is doing something I don't know about or understand.
The TCP sockets are set to TCP_NODELAY in all cases. The servers are single- and multi-threaded, blocking and non-blocking, in all kinds of variations, and the clients are some synchronous, some asynchronous, etc. So there is a lot of variation, yet the same behaviour across all of them.
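For reference, a sketch of how TCP_NODELAY is set in the raw-TCP case (POSIX sockets, sock being the connected descriptor; gRPC and Thrift expose their own options for this):

#include <netinet/in.h>  // IPPROTO_TCP
#include <netinet/tcp.h> // TCP_NODELAY
#include <sys/socket.h>  // setsockopt

int flag = 1;
setsockopt(sock, IPPROTO_TCP, TCP_NODELAY, &flag, sizeof(flag)); // disable Nagle's algorithm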
Any ideas as to what could cause such behaviour?