2
votes

I am a bit new to (g)RPC, and I do not really understand the concept. We have a set of Node.js servers in a Kubernetes cluster that communicate with each other through gRPC. Each client sets up its RPC interfaces to the servers when the client starts up.

We have recently discovered that when a server restarts, its clients lose the connection to that server. That is, RPC calls to the server that previously worked no longer work after the server restarts. They only start functioning again once we restart the servers in the right order.

What I thought was that through an address (host + port) you told the client 'here is a procedure you can call.' Upon calling the procedure, the address would be contacted, the call processed on the server, and the result returned. If it worked like this, the client would not care whether the server had restarted 0 or 100 times between RPC calls.

But given the failing/timing-out client RPC calls described above, it seems there is a socket-like connection that is established once and maintained for as long as both parties are running.

How does it actually work, and do I need to implement health checks against my RPC server in my clients so they can re-establish the interfaces after a server restart?

Thanks for your time.

1
What version of gRPC are you using? - murgatroid99
What errors are you seeing that make you think that's the case? Sometimes gRPC logs a warning but keeps working fine. Please be more descriptive and specific in the question so that people can help. - Ahmet Alp Balkan

1 Answer

2
votes

https://github.com/grpc/grpc/blob/master/doc/connectivity-semantics-and-api.md suggests that the Channel will eventually go from "transient_failure" to "connecting" (and back to "ready"), but because of exponential backoff between reconnect attempts, this could take a long time.

https://github.com/grpc/grpc/blob/master/doc/connection-backoff.md describes something called MAX_BACKOFF. In https://github.com/grpc/grpc-node/blob/master/packages/grpc-js/src/channel.ts that appears to be hardcoded to two minutes.
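To get a feel for how long a reconnect might be delayed, here is a rough sketch of the backoff schedule. The constants (1 s initial backoff, multiplier 1.6, jitter 0.2, 120 s cap) are the defaults I remember from connection-backoff.md, and I am assuming the client follows them; the function names are made up for illustration and are not part of any gRPC API, so check the values against the library version you actually run.

```javascript
// Values as described in gRPC's connection-backoff.md (assumed here;
// a given client library or channel option may override them).
const INITIAL_BACKOFF_MS = 1000;
const MULTIPLIER = 1.6;
const JITTER = 0.2;
const MAX_BACKOFF_MS = 120000; // the "two minutes" mentioned above

// Hypothetical helper: un-jittered delay before reconnect attempt n (0-based).
function baseBackoffMs(attempt) {
  const grown = INITIAL_BACKOFF_MS * Math.pow(MULTIPLIER, attempt);
  return Math.min(grown, MAX_BACKOFF_MS);
}

// Hypothetical helper: same delay with jitter of up to +/- JITTER * delay.
// `random` is injectable so the result can be made deterministic.
function jitteredBackoffMs(attempt, random = Math.random) {
  const base = baseBackoffMs(attempt);
  return base * (1 + JITTER * (2 * random() - 1));
}

// Print the un-jittered schedule for the first few attempts.
for (let attempt = 0; attempt <= 12; attempt++) {
  console.log(`attempt ${attempt}: ~${Math.round(baseBackoffMs(attempt))} ms`);
}
```

Under these assumptions the delay grows past a minute after about ten failed attempts and then stays at the two-minute cap, so a client that kept failing while the server was down may wait up to roughly two minutes (plus jitter) before it tries again, even though the server is already back.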