I need to expose application-wide metrics for Prometheus collection from a Kubernetes application that is deployed in multiple instances, e.g. scaled by the Horizontal Pod Autoscaler. The scrape endpoint is exposed by every pod instance for failover purposes; however, I do not want Prometheus to actually call the scrape endpoint on every instance. It should call only one instance at a time and fail over to another instance only if necessary.
The statistics are application-wide, not per pod: every instance's endpoint reports the same data, so calling them in parallel serves no useful purpose and only increases the load on the backend system that has to be queried for the statistics. I do not want 30 calls to the backend (assuming the app is scaled up to 30 pods) where a single call would suffice.
I hoped that exposing the scrape endpoint as a Kubernetes service (and annotating the service for scraping) would do the trick. However, instead of going through the service proxy and letting it route the request to one of the pods, Prometheus seems to go directly to the instances behind the service, and to all of them, rather than to only one at a time.
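If it matters, Prometheus is set up for the usual annotation-driven discovery. The job below is only a sketch based on the stock kubernetes_sd_configs example (role: endpoints; the job name and relabel rules are assumptions, not my exact config), but as I understand it such a job resolves every endpoint behind an annotated service, i.e. every pod, which would explain the behaviour I am seeing:

scrape_configs:
  - job_name: kubernetes-service-endpoints   # job name is an assumption
    kubernetes_sd_configs:
      - role: endpoints                      # discovers every pod backing an annotated service
    relabel_configs:
      # keep only targets whose service carries prometheus.io/scrape: 'true'
      - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape]
        action: keep
        regex: true
      # honour prometheus.io/path
      - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_path]
        action: replace
        target_label: __metrics_path__
        regex: (.+)
      # honour prometheus.io/port
      - source_labels: [__address__, __meta_kubernetes_service_annotation_prometheus_io_port]
        action: replace
        regex: ([^:]+)(?::\d+)?;(\d+)
        replacement: $1:$2
        target_label: __address__

So presumably the annotations are picked up from the service but applied to each discovered endpoint individually, rather than to the service's cluster IP.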
Is there a way to keep Prometheus from calling all the instances and have it call only one?
The service is defined as:
apiVersion: v1
kind: Service
metadata:
  name: k8worker-msvc
  labels:
    app: k8worker-msvc
  annotations:
    prometheus.io/scrape: 'true'
    prometheus.io/path: '/metrics'
    prometheus.io/port: '3110'
spec:
  selector:
    app: k8worker
  type: LoadBalancer
  ports:
    - protocol: TCP
      port: 3110
      targetPort: 3110
If this is not possible, what are my options other than running leader election inside the app and reporting empty metrics from the non-leader instances?
Thanks for any advice.