3
votes

First off, has anyone done a performance comparison of throughput/latency between a gRPC client-server implementation vs. a WebSocket+protobuf client-server implementation? Or at least something similar.

To that end, I am trying out the example Java helloworld gRPC client-server and comparing the latency of its responses against a similar WebSocket client-server. Currently I am running both client and server on my local machine.

The WebSocket client-server has a simple while loop on the server side. The gRPC server, I notice, uses an asynchronous execution model. I suspect it creates a new thread for each client request, resulting in additional processing overhead. For instance, the WebSocket response latency I am measuring is on the order of 6-7 ms, while the gRPC example shows a latency of about 600-700 ms, even after accounting for the protobuf overhead.

To make a fair comparison, is there a way to run the gRPC server synchronously? I want to eliminate the overhead of thread creation/dispatch and other internal overhead introduced by the asynchronous handling.

I do understand that gRPC involves protobuf overhead that is not present in my WebSocket client-server example. However, I can account for that by separately measuring the overhead introduced by protobuf processing.

Also, if I cannot run the gRPC server synchronously, can I at least measure the thread dispatch/asynchronous processing overhead?

I am relatively new to Java, so pardon my ignorance.

2

2 Answers

5
votes

Benchmarking in Java is easy to get wrong. You need many seconds of warm-up for the multiple levels of JIT compilation to kick in. You also need time for the heap size to level off. In a simplistic one-shot benchmark, it's easy for the code that runs last to appear fastest (independent of what that code is), due to class loading. 600 ms is an insanely large number for gRPC latency; we see around 300 µs median latency on Google Compute Engine between two machines, with TLS. I expect you have no warm-up, so you are counting the time it takes Java to class-load gRPC, and you are measuring gRPC running in Java's interpreter.
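To illustrate the warm-up idea, here is a minimal pure-JDK sketch. The `doWork` method is just a hypothetical stand-in for the call you actually want to time (e.g. one gRPC round trip); the structure is what matters: run the hot path for a while first, and only then take timed samples.

```java
import java.util.concurrent.TimeUnit;

public class WarmupBench {
    static long sink = 0; // consumed below so the JIT cannot eliminate the work

    // Stand-in workload for the real call under test (e.g. one gRPC round trip).
    static long doWork(long x) {
        long h = x;
        for (int i = 0; i < 1000; i++) {
            h = h * 31 + i;
        }
        return h;
    }

    // Warm up first, then time `samples` iterations; returns mean ns per op.
    static long measure(long warmupMillis, int samples) {
        long warmupEnd = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(warmupMillis);
        while (System.nanoTime() < warmupEnd) {
            sink += doWork(sink);
        }
        long start = System.nanoTime();
        for (int i = 0; i < samples; i++) {
            sink += doWork(sink);
        }
        return (System.nanoTime() - start) / samples;
    }

    public static void main(String[] args) {
        // Use several seconds of warm-up; a short warm-up still measures the interpreter.
        System.out.println("mean ns/op: " + measure(5_000, 100_000));
    }
}
```

For anything serious, a harness like JMH handles warm-up, dead-code elimination, and statistics for you.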

There is no synchronous version of the gRPC server, and even if there were, it would still run on a separate thread by default. grpc-java uses a cached thread pool, so after the initial request gRPC should be able to re-use a thread when calling the service.

The cost of jumping between threads is generally low, although it can add tail latency. In some in-process NOOP benchmarks we see RPC completion in 8 µs with the extra threads and 4 µs without. If you really want, though, you can use serverBuilder.directExecutor() to avoid the thread hop. Note that most services would get slower with that option and have really poor tail latency, since service processing can delay I/O.
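You can see the size of that thread hop with plain JDK code, no gRPC involved. This sketch compares completing a task inline (which is effectively what directExecutor() does) against handing it to a cached thread pool (the default-style behavior); the numbers are illustrative, not gRPC's actual figures:

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class ThreadHopDemo {
    // Time how long one no-op task takes to complete via `executor`.
    static long oneHopNanos(Executor executor) {
        long start = System.nanoTime();
        CompletableFuture<Void> done = new CompletableFuture<>();
        executor.execute(() -> done.complete(null));
        done.join(); // wait for the task to finish
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        Executor direct = Runnable::run;                        // like directExecutor(): run inline
        ExecutorService pool = Executors.newCachedThreadPool(); // like the default cached pool

        // Warm both paths up before comparing.
        for (int i = 0; i < 10_000; i++) {
            oneHopNanos(direct);
            oneHopNanos(pool);
        }
        System.out.println("direct: " + oneHopNanos(direct) + " ns");
        System.out.println("pool:   " + oneHopNanos(pool) + " ns");
        pool.shutdown();
    }
}
```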

0
votes

In order to do a similar comparison for grpc, is there a way to run the grpc server synchronously? I want to be able to eliminate the overhead of the thread creation/dispatch and other such internal overhead introduced by the asynchronous handling.

You can create a synchronous (blocking) client, but generally the asynchronous one is much faster (tested in Scala), because it uses all available resources in a non-blocking way. I would write a test measuring how many requests from how many clients the server can handle per second. You can then limit the incoming requests per client to make sure your service will not crash. Asynchronous handling is also a better fit for HTTP/2, which provides multiplexing.
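A rough sketch of such a requests-per-second test, in pure JDK code (the `doRequest` method here is a placeholder you would replace with a real client call):

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.LongAdder;

public class ThroughputTest {
    // Placeholder for one request; swap in a real client call here.
    static void doRequest() {
        long h = 0;
        for (int i = 0; i < 100; i++) h = h * 31 + i; // simulate a little work
        if (h == 42) System.out.print("");            // keep the work observable
    }

    // Run `clients` concurrent request loops for `seconds`; return completed requests/sec.
    static long measureRps(int clients, int seconds) {
        LongAdder completed = new LongAdder();
        ExecutorService pool = Executors.newFixedThreadPool(clients);
        long end = System.nanoTime() + TimeUnit.SECONDS.toNanos(seconds);
        for (int c = 0; c < clients; c++) {
            pool.execute(() -> {
                while (System.nanoTime() < end) {
                    doRequest();
                    completed.increment();
                }
            });
        }
        pool.shutdown();
        try {
            pool.awaitTermination(seconds + 5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return completed.sum() / seconds;
    }

    public static void main(String[] args) {
        System.out.println("requests/sec with 4 clients: " + measureRps(4, 2));
    }
}
```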

For benchmarking I can recommend the Metrics library. You can expose the metrics via a log or an HTTP endpoint.