Main question:
Can we exclude a path from the cloud endpoint statistics/monitoring while still allowing traffic to our actual backend?
Explanation:
We have a backend running on Kubernetes and are now trying out Google Cloud Endpoints. We added the ESP (Extensible Service Proxy) container to the pod in front of the backend container. As everywhere else, we use health checks both in Kubernetes and from the Google (L7) load balancer in front. For the health check to reach our backend, its path has to be defined in the OpenAPI YAML file used by the ESP container, e.g.:
...
paths:
  "/_ah/health":
    get:
      operationId: "OkStatus"
      security: []
      responses:
        200:
          description: "Ok message"
...
The issue is that these requests muddle the monitoring/tracing/statistics for our actual API. The latency numbers registered by Cloud Endpoints are useless: they show a 50th percentile of 2ms and a 95th percentile of 20s, because the distribution is dominated by health-check traffic. The health checks fire multiple times per second, each taking 2ms, and make up 90% of all requests, so the actual requests that take 20+ seconds appear only as 'exceptions' in the margin.
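To make the skew concrete, here is a minimal sketch with made-up numbers that match the description above: 900 health checks at 2ms and 100 real requests at 20s. The sample counts, the `percentile` helper and the nearest-rank method are assumptions for illustration; Cloud Endpoints may compute its percentiles differently.

```python
import math


def percentile(samples, p):
    """Nearest-rank percentile (p in 0..100) of a non-empty list."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


# Hypothetical traffic mix: 90% health checks (2 ms), 10% real API calls (20 s).
health_ms = [2] * 900
api_ms = [20_000] * 100
combined = health_ms + api_ms

mixed_p50 = percentile(combined, 50)  # dominated by health checks
mixed_p95 = percentile(combined, 95)  # falls in the real-request tail
api_p50 = percentile(api_ms, 50)      # what we would like to see instead

print("mixed p50:", mixed_p50, "ms")
print("mixed p95:", mixed_p95, "ms")
print("API-only p50:", api_p50, "ms")
```

With the health checks filtered out, the median already lands on the slow requests, which is the picture we actually need from the statistics.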
Therefore, we'd like to exclude this health traffic from the endpoint statistics, but keep the health check functional.
I have not found anything about this in the documentation, nor a solution elsewhere on the web.
Possible alternative solution:
We can add an extra service to our Kubernetes setup that reaches our backend directly and is used only for the health check. Problems with this are:
- Extra k8s service, configuration, firewall rules ... required
- We do not health check the actual setup. If the ESP container fails to direct traffic to our backend, this will go unnoticed.
- We encrypt traffic between the load balancer and the backends with SSL, so our actual backend would now need an extra SSL-aware web server in front of it. For a health check that carries no actual data this is a minor issue, but it would still be an exception to the rule.
We could add an additional health check for the ESP container as well. But since this should not show up in the stats either, it would have to be something like a request for an undefined path, checking that the response is the ESP response for that case:
{"code": 5, "message": "Method does not exist.", "details": [{ "@type": "type.googleapis.com/google.rpc.DebugInfo", "stackEntries": [], "detail": "service_control" }] }
This is not ideal either. It does check that the container is running at the very least, but it is more of an 'it's not down' than an 'it's working' approach, so a lot of other issues will go unnoticed.
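For reference, the 'it's not down' check described above could be sketched roughly as follows. This is only an illustration: the function names `is_esp_not_found` and `probe_esp` are hypothetical, and it assumes the body quoted above is what ESP returns for an undefined path. It only inspects the JSON body, not the HTTP status.

```python
import json
import urllib.error
import urllib.request

# The ESP reply for an undefined path, as quoted above.
SAMPLE_BODY = (
    '{"code": 5, "message": "Method does not exist.", "details": '
    '[{ "@type": "type.googleapis.com/google.rpc.DebugInfo", '
    '"stackEntries": [], "detail": "service_control" }] }'
)


def is_esp_not_found(body: str) -> bool:
    """Return True if body looks like the ESP reply for an undefined path."""
    try:
        payload = json.loads(body)
    except ValueError:
        return False  # not JSON at all, so something else answered
    if not isinstance(payload, dict):
        return False
    return (payload.get("code") == 5
            and payload.get("message") == "Method does not exist.")


def probe_esp(url: str, timeout: float = 2.0) -> bool:
    """Request an undefined path and check that ESP itself answered."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            body = resp.read().decode("utf-8", errors="replace")
    except urllib.error.HTTPError as err:
        # urllib raises on error statuses; the error body is still readable.
        body = err.read().decode("utf-8", errors="replace")
    except (urllib.error.URLError, OSError):
        return False  # connection refused, timeout, DNS failure, ...
    return is_esp_not_found(body)


print(is_esp_not_found(SAMPLE_BODY))
```

In a real setup `probe_esp` would be pointed at some path that is deliberately absent from the OpenAPI file. As noted, this only shows that the proxy responds, not that it can reach the backend.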