Istio Ingress Gateway - Visibility into gRPC connections and load balancing

Question

We have a gRPC application deployed in a cluster (v 1.17.6) with Istio (v 1.6.2) set up. The cluster has istio-ingressgateway set up as the edge LB, with SSL termination. The istio-ingressgateway is fronted by an AWS ELB (classic LB) in passthrough mode. This setup is fully functional and, in general, traffic flows as intended. The setup looks like this:

ELB => istio-ingressgateway => virtual service => app service => [(envoy)pods]

We are running load tests on this setup using GHZ (ghz.sh), which runs outside the application cluster. From the tests we've run, we have observed that each app container seems to get about 300 RPS routed to it, no matter how the GHZ test is configured. For reference, we have tried various combinations of the --concurrency and --connection settings. This ~300 RPS is lower than what we expect from the app, so we need many more pods to provide the required throughput.

We are really interested in understanding the details of the physical connection (gRPC/HTTP2) setup in this case, all the way from the ELB to the app/envoy, and the details of the load balancing being done. Of particular interest is the case where the same client (GHZ, for example) opens multiple connections (specified via the --connection option). We have looked at Kiali and it doesn't give us the visibility we need.

Questions:

1. How can we get visibility into the physical connections being set up from the ingress gateway to the pod/proxy?
2. How is the “per request gRPC” load balancing happening?
3. What options might exist to optimize the various components involved in this setup?

Thanks.

Jakub Jakub · Accepted Answer · 2020-11-02T09:55:09

1. How can we get visibility into the physical connections being set up from the ingress gateway to the pod/proxy?

If Kiali doesn't show exactly what you need, maybe you could try Jaeger?

Jaeger is an open source end to end distributed tracing system, allowing users to monitor and troubleshoot transactions in complex distributed systems.

There is Istio documentation about Jaeger.

Additionally, Prometheus and Grafana might be helpful here.

2. How is the “per request gRPC” load balancing happening?

As mentioned in the Istio documentation:

By default, the Envoy proxies distribute traffic across each service’s load balancing pool using a round-robin model, where requests are sent to each pool member in turn, returning to the top of the pool once each service instance has received a request.

If you want to change the default round-robin model, you can use a Destination Rule. Destination rules let you customize Envoy’s traffic policies when calling the entire destination service or a particular service subset, such as your preferred load balancing model, TLS security mode, or circuit breaker settings.

There is Istio documentation about that.

The Envoy documentation has more about load balancing.
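
As a rough sketch of what such a Destination Rule could look like for the setup in the question: the resource name, namespace, and host below are placeholders I made up, and the numeric values are illustrative rather than recommendations. It switches the load balancer from the default round-robin to LEAST_CONN and sets HTTP/2 connection pool limits.

```yaml
# Sketch only: name, namespace, and host are hypothetical placeholders.
apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
  name: grpc-app-lb              # hypothetical resource name
  namespace: default             # assumed namespace of the app service
spec:
  host: grpc-app.default.svc.cluster.local   # hypothetical app service host
  trafficPolicy:
    loadBalancer:
      simple: LEAST_CONN         # instead of the default ROUND_ROBIN
    connectionPool:
      http:
        http2MaxRequests: 1000         # illustrative cap on requests to the destination
        maxRequestsPerConnection: 0    # 0 means no per-connection request limit
```

Whether a different load balancing model changes the ~300 RPS per pod would need to be checked with the same GHZ test before and after applying the rule.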

3. What options might exist to optimize the various components involved in this setup?

I'm not sure there is anything to optimize in the Istio components themselves; maybe some custom configuration in a Destination Rule?
