I have a Kubernetes cluster set up using kubeadm. I installed prometheus and node-exporter on top of it based on:
- https://github.com/bibinwilson/kubernetes-prometheus
- https://github.com/bibinwilson/kubernetes-node-exporter
The pods seem to be running properly:
kubectl get pods --namespace=monitoring -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
node-exporter-jk2sd 1/1 Running 0 90m 192.168.5.20 work03 <none> <none>
node-exporter-jldrx 1/1 Running 0 90m 192.168.5.17 work04 <none> <none>
node-exporter-mgtld 1/1 Running 0 90m 192.168.5.15 work01 <none> <none>
node-exporter-tq7bx 1/1 Running 0 90m 192.168.5.41 work02 <none> <none>
prometheus-deployment-5d79b5f65b-tkpd2 1/1 Running 0 91m 192.168.5.40 work02 <none> <none>
I can see the endpoints, as well:
kubectl get endpoints -n monitoring
NAME ENDPOINTS AGE
node-exporter 192.168.5.15:9100,192.168.5.17:9100,192.168.5.20:9100 + 1 more... 5m3s
I also did: kubectl port-forward prometheus-deployment-5d79b5f65b-tkpd2 8080:9090 -n monitoring
and when I access the prometheus web UI > Status > Targets, I don't find node-exporters there. When I start typing a query for a metric reported by node-exporter, it doesn't automatically show up in the query editor.
Logs coming from the prometheus pod seem to have a lot of errors:
kubectl logs prometheus-deployment-5d79b5f65b-tkpd2 -n monitoring
level=info ts=2021-08-11T16:24:21.743Z caller=main.go:428 msg="Starting Prometheus" version="(version=2.29.1, branch=HEAD, revision=dcb07e8eac34b5ea37cd229545000b857f1c1637)"
level=info ts=2021-08-11T16:24:21.743Z caller=main.go:433 build_context="(go=go1.16.7, user=root@364730518a4e, date=20210811-14:48:27)"
level=info ts=2021-08-11T16:24:21.743Z caller=main.go:434 host_details="(Linux 5.4.0-70-generic #78-Ubuntu SMP Fri Mar 19 13:29:52 UTC 2021 x86_64 prometheus-deployment-5d79b5f65b-tkpd2 (none))"
level=info ts=2021-08-11T16:24:21.743Z caller=main.go:435 fd_limits="(soft=1048576, hard=1048576)"
level=info ts=2021-08-11T16:24:21.743Z caller=main.go:436 vm_limits="(soft=unlimited, hard=unlimited)"
level=info ts=2021-08-11T16:24:21.745Z caller=web.go:541 component=web msg="Start listening for connections" address=0.0.0.0:9090
level=info ts=2021-08-11T16:24:21.745Z caller=main.go:812 msg="Starting TSDB ..."
level=info ts=2021-08-11T16:24:21.748Z caller=tls_config.go:191 component=web msg="TLS is disabled." http2=false
level=info ts=2021-08-11T16:24:21.753Z caller=head.go:815 component=tsdb msg="Replaying on-disk memory mappable chunks if any"
level=info ts=2021-08-11T16:24:21.753Z caller=head.go:829 component=tsdb msg="On-disk memory mappable chunks replay completed" duration=4.15µs
level=info ts=2021-08-11T16:24:21.753Z caller=head.go:835 component=tsdb msg="Replaying WAL, this may take a while"
level=info ts=2021-08-11T16:24:21.754Z caller=head.go:892 component=tsdb msg="WAL segment loaded" segment=0 maxSegment=0
level=info ts=2021-08-11T16:24:21.754Z caller=head.go:898 component=tsdb msg="WAL replay completed" checkpoint_replay_duration=75.316µs wal_replay_duration=451.769µs total_replay_duration=566.051µs
level=info ts=2021-08-11T16:24:21.756Z caller=main.go:839 fs_type=EXT4_SUPER_MAGIC
level=info ts=2021-08-11T16:24:21.756Z caller=main.go:842 msg="TSDB started"
level=info ts=2021-08-11T16:24:21.756Z caller=main.go:969 msg="Loading configuration file" filename=/etc/prometheus/prometheus.yml
level=info ts=2021-08-11T16:24:21.757Z caller=kubernetes.go:282 component="discovery manager scrape" discovery=kubernetes msg="Using pod service account via in-cluster config"
level=info ts=2021-08-11T16:24:21.759Z caller=kubernetes.go:282 component="discovery manager scrape" discovery=kubernetes msg="Using pod service account via in-cluster config"
level=info ts=2021-08-11T16:24:21.762Z caller=kubernetes.go:282 component="discovery manager scrape" discovery=kubernetes msg="Using pod service account via in-cluster config"
level=info ts=2021-08-11T16:24:21.764Z caller=main.go:1006 msg="Completed loading of configuration file" filename=/etc/prometheus/prometheus.yml totalDuration=7.940972ms db_storage=607ns remote_storage=1.251µs web_handler=283ns query_engine=694ns scrape=227.668µs scrape_sd=6.081132ms notify=27.11µs notify_sd=16.477µs rules=648.58µs
level=info ts=2021-08-11T16:24:21.764Z caller=main.go:784 msg="Server is ready to receive web requests."
level=error ts=2021-08-11T16:24:51.765Z caller=klog.go:116 component=k8s_client_runtime func=ErrorDepth msg="pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:167: Failed to watch *v1.Pod: failed to list *v1.Pod: Get \"https://10.96.0.1:443/api/v1/pods?limit=500&resourceVersion=0\": dial tcp 10.96.0.1:443: i/o timeout"
level=error ts=2021-08-11T16:24:51.765Z caller=klog.go:116 component=k8s_client_runtime func=ErrorDepth msg="pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:167: Failed to watch *v1.Endpoints: failed to list *v1.Endpoints: Get \"https://10.96.0.1:443/api/v1/endpoints?limit=500&resourceVersion=0\": dial tcp 10.96.0.1:443: i/o timeout"
level=error ts=2021-08-11T16:24:51.765Z caller=klog.go:116 component=k8s_client_runtime func=ErrorDepth msg="pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:167: Failed to watch *v1.Service: failed to list *v1.Service: Get \"https://10.96.0.1:443/api/v1/services?limit=500&resourceVersion=0\": dial tcp 10.96.0.1:443: i/o timeout"
level=error ts=2021-08-11T16:24:51.766Z caller=klog.go:116 component=k8s_client_runtime func=ErrorDepth msg="pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:167: Failed to watch *v1.Pod: failed to list *v1.Pod: Get \"https://10.96.0.1:443/api/v1/pods?limit=500&resourceVersion=0\": dial tcp 10.96.0.1:443: i/o timeout"
level=error ts=2021-08-11T16:24:51.766Z caller=klog.go:116 component=k8s_client_runtime func=ErrorDepth msg="pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:167: Failed to watch *v1.Node: failed to list *v1.Node: Get \"https://10.96.0.1:443/api/v1/nodes?limit=500&resourceVersion=0\": dial tcp 10.96.0.1:443: i/o timeout"
level=error ts=2021-08-11T16:25:22.587Z caller=klog.go:116 component=k8s_client_runtime func=ErrorDepth msg="pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:167: Failed to watch *v1.Service: failed to list *v1.Service: Get \"https://10.96.0.1:443/api/v1/services?limit=500&resourceVersion=0\": dial tcp 10.96.0.1:443: i/o timeout"
level=error ts=2021-08-11T16:25:22.855Z caller=klog.go:116 component=k8s_client_runtime func=ErrorDepth msg="pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:167: Failed to watch *v1.Pod: failed to list *v1.Pod: Get \"https://10.96.0.1:443/api/v1/pods?limit=500&resourceVersion=0\": dial tcp 10.96.0.1:443: i/o timeout"
level=error ts=2021-08-11T16:25:23.153Z caller=klog.go:116 component=k8s_client_runtime func=ErrorDepth msg="pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:167: Failed to watch *v1.Endpoints: failed to list *v1.Endpoints: Get \"https://10.96.0.1:443/api/v1/endpoints?limit=500&resourceVersion=0\": dial tcp 10.96.0.1:443: i/o timeout"
level=error ts=2021-08-11T16:25:23.261Z caller=klog.go:116 component=k8s_client_runtime func=ErrorDepth msg="pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:167: Failed to watch *v1.Pod: failed to list *v1.Pod: Get \"https://10.96.0.1:443/api/v1/pods?limit=500&resourceVersion=0\": dial tcp 10.96.0.1:443: i/o timeout"
level=error ts=2021-08-11T16:25:23.335Z caller=klog.go:116 component=k8s_client_runtime func=ErrorDepth msg="pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:167: Failed to watch *v1.Node: failed to list *v1.Node: Get \"https://10.96.0.1:443/api/v1/nodes?limit=500&resourceVersion=0\": dial tcp 10.96.0.1:443: i/o timeout"
level=error ts=2021-08-11T16:25:54.814Z caller=klog.go:116 component=k8s_client_runtime func=ErrorDepth msg="pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:167: Failed to watch *v1.Pod: failed to list *v1.Pod: Get \"https://10.96.0.1:443/api/v1/pods?limit=500&resourceVersion=0\": dial tcp 10.96.0.1:443: i/o timeout"
level=error ts=2021-08-11T16:25:55.282Z caller=klog.go:116 component=k8s_client_runtime func=ErrorDepth msg="pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:167: Failed to watch *v1.Node: failed to list *v1.Node: Get \"https://10.96.0.1:443/api/v1/nodes?limit=500&resourceVersion=0\": dial tcp 10.96.0.1:443: i/o timeout"
level=error ts=2021-08-11T16:25:55.516Z caller=klog.go:116 component=k8s_client_runtime func=ErrorDepth msg="pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:167: Failed to watch *v1.Service: failed to list *v1.Service: Get \"https://10.96.0.1:443/api/v1/services?limit=500&resourceVersion=0\": dial tcp 10.96.0.1:443: i/o timeout"
level=error ts=2021-08-11T16:25:55.934Z caller=klog.go:116 component=k8s_client_runtime func=ErrorDepth msg="pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:167: Failed to watch *v1.Endpoints: failed to list *v1.Endpoints: Get \"https://10.96.0.1:443/api/v1/endpoints?limit=500&resourceVersion=0\": dial tcp 10.96.0.1:443: i/o timeout"
level=error ts=2021-08-11T16:25:56.442Z caller=klog.go:116 component=k8s_client_runtime func=ErrorDepth msg="pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:167: Failed to watch *v1.Pod: failed to list *v1.Pod: Get \"https://10.96.0.1:443/api/v1/pods?limit=500&resourceVersion=0\": dial tcp 10.96.0.1:443: i/o timeout"
level=error ts=2021-08-11T16:26:30.058Z caller=klog.go:116 component=k8s_client_runtime func=ErrorDepth msg="pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:167: Failed to watch *v1.Pod: failed to list *v1.Pod: Get \"https://10.96.0.1:443/api/v1/pods?limit=500&resourceVersion=0\": dial tcp 10.96.0.1:443: i/o timeout"
level=error ts=2021-08-11T16:26:30.204Z caller=klog.go:116 component=k8s_client_runtime func=ErrorDepth msg="pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:167: Failed to watch *v1.Endpoints: failed to list *v1.Endpoints: Get \"https://10.96.0.1:443/api/v1/endpoints?limit=500&resourceVersion=0\": dial tcp 10.96.0.1:443: i/o timeout"
level=error ts=2021-08-11T16:26:30.246Z caller=klog.go:116 component=k8s_client_runtime func=ErrorDepth msg="pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:167: Failed to watch *v1.Service: failed to list *v1.Service: Get \"https://10.96.0.1:443/api/v1/services?limit=500&resourceVersion=0\": dial tcp 10.96.0.1:443: i/o timeout"
level=error ts=2021-08-11T16:26:30.879Z caller=klog.go:116 component=k8s_client_runtime func=ErrorDepth msg="pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:167: Failed to watch *v1.Pod: failed to list *v1.Pod: Get \"https://10.96.0.1:443/api/v1/pods?limit=500&resourceVersion=0\": dial tcp 10.96.0.1:443: i/o timeout"
level=error ts=2021-08-11T16:26:31.479Z caller=klog.go:116 component=k8s_client_runtime func=ErrorDepth msg="pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:167: Failed to watch *v1.Node: failed to list *v1.Node: Get \"https://10.96.0.1:443/api/v1/nodes?limit=500&resourceVersion=0\": dial tcp 10.96.0.1:443: i/o timeout"
level=error ts=2021-08-11T16:27:09.673Z caller=klog.go:116 component=k8s_client_runtime func=ErrorDepth msg="pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:167: Failed to watch *v1.Pod: failed to list *v1.Pod: Get \"https://10.96.0.1:443/api/v1/pods?limit=500&resourceVersion=0\": dial tcp 10.96.0.1:443: i/o timeout"
level=error ts=2021-08-11T16:27:09.835Z caller=klog.go:116 component=k8s_client_runtime func=ErrorDepth msg="pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:167: Failed to watch *v1.Service: failed to list *v1.Service: Get \"https://10.96.0.1:443/api/v1/services?limit=500&resourceVersion=0\": dial tcp 10.96.0.1:443: i/o timeout"
level=error ts=2021-08-11T16:27:10.467Z caller=klog.go:116 component=k8s_client_runtime func=ErrorDepth msg="pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:167: Failed to watch *v1.Pod: failed to list *v1.Pod: Get \"https://10.96.0.1:443/api/v1/pods?limit=500&resourceVersion=0\": dial tcp 10.96.0.1:443: i/o timeout"
level=error ts=2021-08-11T16:27:11.170Z caller=klog.go:116 component=k8s_client_runtime func=ErrorDepth msg="pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:167: Failed to watch *v1.Endpoints: failed to list *v1.Endpoints: Get \"https://10.96.0.1:443/api/v1/endpoints?limit=500&resourceVersion=0\": dial tcp 10.96.0.1:443: i/o timeout"
level=error ts=2021-08-11T16:27:12.684Z caller=klog.go:116 component=k8s_client_runtime func=ErrorDepth msg="pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:167: Failed to watch *v1.Node: failed to list *v1.Node: Get \"https://10.96.0.1:443/api/v1/nodes?limit=500&resourceVersion=0\": dial tcp 10.96.0.1:443: i/o timeout"
level=error ts=2021-08-11T16:27:55.324Z caller=klog.go:116 component=k8s_client_runtime func=ErrorDepth msg="pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:167: Failed to watch *v1.Service: failed to list *v1.Service: Get \"https://10.96.0.1:443/api/v1/services?limit=500&resourceVersion=0\": dial tcp 10.96.0.1:443: i/o timeout"
level=error ts=2021-08-11T16:28:01.550Z caller=klog.go:116 component=k8s_client_runtime func=ErrorDepth msg="pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:167: Failed to watch *v1.Pod: failed to list *v1.Pod: Get \"https://10.96.0.1:443/api/v1/pods?limit=500&resourceVersion=0\": dial tcp 10.96.0.1:443: i/o timeout"
level=error ts=2021-08-11T16:28:01.621Z caller=klog.go:116 component=k8s_client_runtime func=ErrorDepth msg="pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:167: Failed to watch *v1.Endpoints: failed to list *v1.Endpoints: Get \"https://10.96.0.1:443/api/v1/endpoints?limit=500&resourceVersion=0\": dial tcp 10.96.0.1:443: i/o timeout"
level=error ts=2021-08-11T16:28:04.801Z caller=klog.go:116 component=k8s_client_runtime func=ErrorDepth msg="pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:167: Failed to watch *v1.Pod: failed to list *v1.Pod: Get \"https://10.96.0.1:443/api/v1/pods?limit=500&resourceVersion=0\": dial tcp 10.96.0.1:443: i/o timeout"
level=error ts=2021-08-11T16:28:05.598Z caller=klog.go:116 component=k8s_client_runtime func=ErrorDepth msg="pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:167: Failed to watch *v1.Node: failed to list *v1.Node: Get \"https://10.96.0.1:443/api/v1/nodes?limit=500&resourceVersion=0\": dial tcp 10.96.0.1:443: i/o timeout"
level=error ts=2021-08-11T16:28:57.256Z caller=klog.go:116 component=k8s_client_runtime func=ErrorDepth msg="pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:167: Failed to watch *v1.Pod: failed to list *v1.Pod: Get \"https://10.96.0.1:443/api/v1/pods?limit=500&resourceVersion=0\": dial tcp 10.96.0.1:443: i/o timeout"
level=error ts=2021-08-11T16:29:04.688Z caller=klog.go:116 component=k8s_client_runtime func=ErrorDepth msg="pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:167: Failed to watch *v1.Pod: failed to list *v1.Pod: Get \"https://10.96.0.1:443/api/v1/pods?limit=500&resourceVersion=0\": dial tcp 10.96.0.1:443: i/o timeout"
Is there a way to solve this issue and make node-exporters show up in the targets?
Version details:
kubectl version
Client Version: version.Info{Major:"1", Minor:"20", GitVersion:"v1.20.5", GitCommit:"6b1d87acf3c8253c123756b9e61dac642678305f", GitTreeState:"clean", BuildDate:"2021-03-18T01:10:43Z", GoVersion:"go1.15.8", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"20", GitVersion:"v1.20.9", GitCommit:"7a576bc3935a6b555e33346fd73ad77c925e9e4a", GitTreeState:"clean", BuildDate:"2021-07-15T20:56:38Z", GoVersion:"go1.15.14", Compiler:"gc", Platform:"linux/amd64"}
Edit: The cluster was set up as follows:
sudo kubeadm reset
sudo rm $HOME/.kube/config
sudo kubeadm init --pod-network-cidr=192.168.5.0/24
mkdir -p $HOME/.kube; sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config; sudo chown $(id -u):$(id -g) $HOME/.kube/config
kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml
It is using flannel.
flannel pods are running:
kube-flannel-ds-45qwf 1/1 Running 0 31h x.x.x.41 work01 <none> <none>
kube-flannel-ds-4rwzj 1/1 Running 0 31h x.x.x.40 mast01 <none> <none>
kube-flannel-ds-8fdtt 1/1 Running 24 31h x.x.x.43 work03 <none> <none>
kube-flannel-ds-8hl5f 1/1 Running 23 31h x.x.x.44 work04 <none> <none>
kube-flannel-ds-xqtrd 1/1 Running 0 31h x.x.x.42 work02 <none> <none>
route -n
). Otherwise, try something similar from a Pod running on your worker nodes (without hostNetwork / must be within the SDN). If you can't reach the API, issue could be with iptables (iptables -nL
) or ipvs (ipvsadm -l-n
), maybe kube-proxy, or still flannel (checkkubectl logs
), ... If you find a node that works: compare iptables/ipvs configuration. – SYN--pod-network-cidr=192.168.5.0/24
. Sounds wrong. I think the default host subnet length is 24 as well: whenever a new node joins the cluster, a portion of your cluster pod network cidr is allocated to it. If you whole pod subnet is a /24, I suspect only your master had its pod subnet properly allocated, you may already be out of addresses for the others... checkkubectl get nodes -o yaml
. With flannel, you should find aspec.podCIDR
and/orspec.podCIRDs
array. Make sure all your nodes have their own subnet, within your cluster pod network. – SYN--pod-network-cidr=10.244.0.0/16
. See github.com/flannel-io/flannel/issues/1054 – SYN