20
votes

I'm using gRPC with Python as client/server inside kubernetes pods... I would like to be able to launch multiple pods of the same type (gRPC servers) and let the client connect to them (randomly).

I dispatched 10 pods of the server and setup a 'service' to target them. Then, in the client, I connected to the DNS name of the service - meaning kubernetes should do the load-balancing and direct me to a random server pod. In reality, the client calls the gRPC functions (which works well) but when I look at the logs I see that all calls going to the same server pod.

I presume the client is doing some kind of DNS caching which leads to all calls being sent to the same server. Is this the case? Is there anyway to disable it and set the same stub client to make a "new" call and fetch a new ip by DNS with each call?

I am aware of the overhead I might cause if it will query the DNS server each time but distributing the load is much more important for me at the moment.

3

3 Answers

25
votes

Let me take the opportunity to answer by describing how things are supposed to work.

The way client-side LB works in the gRPC C core (the foundation for all but the Java and Go flavors or gRPC) is as follows (the authoritative doc can be found here):

Client-side LB is kept simple and "dumb" on purpose. The way we've chosen to implement complex LB policies is through an external LB server (as described in the aforementioned doc). You aren't concerned with this scenario. Instead, you are simply creating a channel, which will use the (default) pick-first LB policy.

The input to an LB policy is a list of resolved addresses. When using DNS, if foo.com resolves to [10.0.0.1, 10.0.0.2, 10.0.0.3, 10.0.0.4], the policy will try to establish a connection to all of them. The first one to successfully connect will become the chosen one until it disconnects. Thus the name "pick-first". A longer name could have been "pick first and stick with it for as long as possible", but that made for a very long file name :). If/when the picked one gets disconnected, the pick-first policy will move over to returning the next successfully connected address (internally referred to as a "connected subchannel"), if any. Once again, it'll continue to choose this connected subchannel for as long as it stays connected. If all of them fail, the call would fail.

The problem here is that DNS resolution, being intrinsically pull based, is only triggered 1) at channel creation and 2) upon disconnection of the chosen connected subchannel.

As of right now, a hacky solution would be to create a new channel for every request (very inefficient, but it'd do the trick given your setup).

Given changes coming in Q1 2017 (see https://github.com/grpc/grpc/issues/7818) will allow clients to choose a different LB policy, namely Round Robin. In addition, we may look into introducing a "randomize" bit to that client config, which would shuffle the addresses prior to doing Round-Robin over them, effectively achieving what you intend.

5
votes

Usual K8S load balancing doesn't work for gRPC. The following link explains why. https://kubernetes.io/blog/2018/11/07/grpc-load-balancing-on-kubernetes-without-tears/

This is because gRPC is built on HTTP/2, and HTTP/2 is designed to have a single long-lived TCP connection, across which all requests are multiplexed—meaning multiple requests can be active on the same connection at any point in time. Normally, this is great, as it reduces the overhead of connection management. However, it also means that (as you might imagine) connection-level balancing isn’t very useful. Once the connection is established, there’s no more balancing to be done. All requests will get pinned to a single destination pod.

Most modern ingress controllers can handle this, but they are either hot of the oven (nginx), or in alpha version (traefik), or require the latest version of K8S (Linkerd). You can do client-side load balancing, of which you can find a Java solution here.

0
votes

If you've created a vanilla Kubernetes service, the service should have its own load-balanced virtual IP (check if kubectl get svc your-service shows a CLUSTER-IP for your service). If this is the case, DNS caching should not be an issue, because that single virtual IP should be splitting traffic among the actual backends.

Try kubectl get endpoints your-service to confirm that your service actually knows about all of your backends.

If you have a headless service, a DNS lookup will return an A record with 10 IPs (one for each of your Pods). If your client is always choosing the first IP in an A record, that would also explain the behavior you're seeing.