
I have two gRPC services, and one calls the other through a plain unary gRPC method (no streaming on either side). I'm using Istio as the service mesh and have the sidecar injected into the Kubernetes pods of both services.

The gRPC call works correctly under normal load, but under high-concurrency load the gRPC client keeps throwing the following exception:

io.grpc.StatusRuntimeException: UNAVAILABLE: upstream connect error or disconnect/reset before headers
    at io.grpc.Status.asRuntimeException(Status.java:526)
    at i.g.s.ClientCalls$StreamObserverToCallListenerAdapter.onClose(ClientCalls.java:434)
    at i.g.PartialForwardingClientCallListener.onClose(PartialForwardingClientCallListener.java:39)
    at i.g.ForwardingClientCallListener.onClose(ForwardingClientCallListener.java:23)
    at i.g.ForwardingClientCallListener$SimpleForwardingClientCallListener.onClose(ForwardingClientCallListener.java:40)
    at i.g.i.CensusStatsModule$StatsClientInterceptor$1$1.onClose(CensusStatsModule.java:678)
    at i.g.PartialForwardingClientCallListener.onClose(PartialForwardingClientCallListener.java:39)
    at i.g.ForwardingClientCallListener.onClose(ForwardingClientCallListener.java:23)
    at i.g.ForwardingClientCallListener$SimpleForwardingClientCallListener.onClose(ForwardingClientCallListener.java:40)
    at i.g.i.CensusTracingModule$TracingClientInterceptor$1$1.onClose(CensusTracingModule.java:397)
    at i.g.i.ClientCallImpl.closeObserver(ClientCallImpl.java:459)
    at i.g.i.ClientCallImpl.access$300(ClientCallImpl.java:63)
    at i.g.i.ClientCallImpl$ClientStreamListenerImpl.close(ClientCallImpl.java:546)
    at i.g.i.ClientCallImpl$ClientStreamListenerImpl.access$600(ClientCallImpl.java:467)
    at i.g.i.ClientCallImpl$ClientStreamListenerImpl$1StreamClosed.runInContext(ClientCallImpl.java:584)
    at i.g.i.ContextRunnable.run(ContextRunnable.java:37)
    at i.g.i.SerializingExecutor.run(SerializingExecutor.java:123)
    at j.u.c.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at j.u.c.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)

Meanwhile, there is no exception on the server side, and no error in the istio-proxy container of the client's pod either. But if I disable Istio sidecar injection so that the two services talk to each other directly, the errors disappear.

Could somebody kindly tell me why this happens, and how to resolve it?

Thanks a lot.

That error message is not generated by grpc-java, so it is probably generated by Istio. – Eric Anderson
@shizhz The error message is generated by Envoy (the ingress gateway or the sidecar in the service) because it cannot reach the upstream. Can you post the manifests? – Rinor
@rinormaloku Thanks for your response. I've found the reason and posted it as an answer; hope it's helpful for other people facing the same problem :-) – shizhz

1 Answer


Finally I found the reason: it's caused by the default circuit breaker settings of the Envoy sidecar. By default the options max_pending_requests and max_requests are set to 1024, and the default connectTimeout is 1s, so under high-concurrency load, when the server side has too many pending requests waiting to be served, the sidecar's circuit breaker kicks in and reports to the client that the server-side upstream is UNAVAILABLE.
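
If you want to confirm that the circuit breaker is what's tripping, you can query the Envoy stats through the client pod's istio-proxy container; in the sketch below, <client-pod> and <target-service> are placeholders for your own pod and service names:

    # A non-zero upstream_rq_pending_overflow counter means requests were rejected
    # by the sidecar's circuit breaker rather than by the server itself.
    kubectl exec <client-pod> -c istio-proxy -- pilot-agent request GET stats \
        | grep <target-service> \
        | grep -E 'upstream_rq_pending_overflow|upstream_cx_connect_timeout'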

To fix this problem, apply a DestinationRule for the target service with reasonable trafficPolicy settings, in particular the connectionPool limits; see the sketch below.
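
As a minimal sketch (not my exact manifest), assuming the target service is named my-grpc-server in the default namespace and that the limits below are placeholder values you should tune for your own load:

    apiVersion: networking.istio.io/v1alpha3
    kind: DestinationRule
    metadata:
      name: my-grpc-server
    spec:
      host: my-grpc-server.default.svc.cluster.local
      trafficPolicy:
        connectionPool:
          tcp:
            maxConnections: 1000        # raise the per-endpoint connection cap
            connectTimeout: 5s          # give the upstream more time than the 1s default
          http:
            http1MaxPendingRequests: 10000   # pending-request limit that was overflowing at 1024
            http2MaxRequests: 10000          # max concurrent requests (gRPC runs over HTTP/2)
            maxRequestsPerConnection: 0      # 0 = no limit per connection

After applying it with kubectl apply -f, the client's sidecar raises its circuit-breaker thresholds for that host, and the UNAVAILABLE errors under load should go away as long as the server can actually keep up with the traffic.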