Heyo,
I've deployed a prometheus, grafana, kube-state-metrics, alertmanager, etc. setup on Kubernetes in GKE v1.16.x, using https://github.com/do-community/doks-monitoring as a jumping-off point for the YAML files.
I've been trying to debug a situation for a few days now and would be very grateful for some help. My prometheus nodes are not getting metrics from cadvisor.
- All the services and pods in the deployments are running - prometheus, kube-state-metrics, node-exporter - all up, no errors.
- The cadvisor targets in prometheus UI appear as "up".
- Prometheus is able to collect other metrics from the cluster, but no pod/container level usage metrics.
- I can see cadvisor metrics when I query kubectl get --raw "/api/v1/nodes/<your_node>/proxy/metrics/cadvisor", but when I look in prometheus for container_cpu_usage or container_memory_usage, there is no data (exact commands and queries are below, after the scrape config).
- My cadvisor scrape job config in prometheus:
- job_name: kubernetes-cadvisor
  honor_timestamps: true
  scrape_interval: 15s
  scrape_timeout: 10s
  metrics_path: /metrics/cadvisor
  scheme: https
  bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
  tls_config:
    ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
    insecure_skip_verify: true
  kubernetes_sd_configs:
  - role: node
  relabel_configs:
  - action: labelmap
    regex: __meta_kubernetes_node_label_(.+)
This is cribbed from the prometheus/docs/examples.
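For completeness, this is roughly how I'm comparing the two sides - <your_node> and <prometheus-svc> are placeholders for my setup, and I'm using container_cpu_usage_seconds_total as a concrete metric name to grep for and query:

    # Pull cadvisor metrics straight from the kubelet via the API server proxy
    # (this returns plenty of data for me):
    kubectl get nodes -o name
    kubectl get --raw "/api/v1/nodes/<your_node>/proxy/metrics/cadvisor" | grep container_cpu_usage_seconds_total | head

    # Ask Prometheus for the same metric over its HTTP API (this comes back empty);
    # I'm just port-forwarding to the Prometheus service to reach it:
    kubectl port-forward svc/<prometheus-svc> 9090:9090 &
    curl -s 'http://localhost:9090/api/v1/query?query=container_cpu_usage_seconds_total'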
I've tried a whole bunch of different variations on paths and scrape configs, but no luck. Since I can query the metrics with kubectl get --raw (so they exist on the nodes), it seems to me the issue is prometheus talking to the cadvisor target.
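For example, one of the variations was the API-server-proxy form of the cadvisor job from the upstream prometheus-kubernetes.yml example - rewriting the address and metrics path so the scrape goes through kubernetes.default.svc instead of hitting the kubelet directly. I'm not sure this is the right shape for GKE 1.16, so take it as just one of the things I tried:

    - job_name: kubernetes-cadvisor
      scheme: https
      tls_config:
        ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
      bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
      kubernetes_sd_configs:
      - role: node
      relabel_configs:
      - action: labelmap
        regex: __meta_kubernetes_node_label_(.+)
      # route the scrape through the API server proxy instead of the node address
      - target_label: __address__
        replacement: kubernetes.default.svc:443
      - source_labels: [__meta_kubernetes_node_name]
        regex: (.+)
        target_label: __metrics_path__
        replacement: /api/v1/nodes/${1}/proxy/metrics/cadvisor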
If anyone has experience getting this configured I'd sure appreciate some help debugging.
Cheers
…Prometheus pod? Are there any warnings that could shed some light on why you can't get the cadvisor metrics? - Dawid Kruk