7
votes

I have installed metrics-server on my local K8s cluster on VirtualBox using https://github.com/kubernetes-sigs/metrics-server#installation
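(For reference, the installation from that page is essentially a single apply of the released manifest, roughly like the command below; the exact URL depends on the release you pick.)

kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml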

But the metrics-server pod is stuck in CrashLoopBackOff:

metrics-server-844d9574cf-bxdk7      0/1     CrashLoopBackOff   28         12h     10.46.0.1      kubenode02   <none>           <none>

Events from kubectl describe on the pod:

Events:
  Type     Reason          Age                    From                 Message
  ----     ------          ----                   ----                 -------
  Normal   Scheduled       <unknown>                                   Successfully assigned kube-system/metrics-server-844d9574cf-bxdk7 to kubenode02
  Normal   Created         12h (x3 over 12h)      kubelet, kubenode02  Created container metrics-server
  Normal   Started         12h (x3 over 12h)      kubelet, kubenode02  Started container metrics-server
  Normal   Killing         12h (x2 over 12h)      kubelet, kubenode02  Container metrics-server failed liveness probe, will be restarted
  Warning  Unhealthy       12h (x7 over 12h)      kubelet, kubenode02  Liveness probe failed: HTTP probe failed with statuscode: 500
  Warning  Unhealthy       12h (x7 over 12h)      kubelet, kubenode02  Readiness probe failed: HTTP probe failed with statuscode: 500
  Normal   Pulled          12h (x7 over 12h)      kubelet, kubenode02  Container image "k8s.gcr.io/metrics-server/metrics-server:v0.4.0" already present on machine
  Warning  BackOff         12h (x35 over 12h)     kubelet, kubenode02  Back-off restarting failed container
  Normal   SandboxChanged  55m (x22 over 59m)     kubelet, kubenode02  Pod sandbox changed, it will be killed and re-created.
  Normal   Pulled          55m                    kubelet, kubenode02  Container image "k8s.gcr.io/metrics-server/metrics-server:v0.4.0" already present on machine
  Normal   Created         55m                    kubelet, kubenode02  Created container metrics-server
  Normal   Started         55m                    kubelet, kubenode02  Started container metrics-server
  Warning  Unhealthy       29m (x35 over 55m)     kubelet, kubenode02  Liveness probe failed: HTTP probe failed with statuscode: 500
  Warning  BackOff         4m45s (x202 over 54m)  kubelet, kubenode02  Back-off restarting failed container

Logs from the metrics-server deployment, obtained with kubectl logs deployment/metrics-server -n kube-system:

E1110 12:56:25.249873       1 pathrecorder.go:107] registered "/metrics" from goroutine 1 [running]:
runtime/debug.Stack(0x1942e80, 0xc0006e8db0, 0x1bb58b5)
        /usr/local/go/src/runtime/debug/stack.go:24 +0x9d
k8s.io/apiserver/pkg/server/mux.(*PathRecorderMux).trackCallers(0xc0004f73b0, 0x1bb58b5, 0x8)
        /go/pkg/mod/k8s.io/[email protected]/pkg/server/mux/pathrecorder.go:109 +0x86
k8s.io/apiserver/pkg/server/mux.(*PathRecorderMux).Handle(0xc0004f73b0, 0x1bb58b5, 0x8, 0x1e96f00, 0xc0005dc8d0)
        /go/pkg/mod/k8s.io/[email protected]/pkg/server/mux/pathrecorder.go:173 +0x84
k8s.io/apiserver/pkg/server/routes.MetricsWithReset.Install(0xc0004f73b0)
        /go/pkg/mod/k8s.io/[email protected]/pkg/server/routes/metrics.go:43 +0x5d
k8s.io/apiserver/pkg/server.installAPI(0xc00000a1e0, 0xc00013d8c0)
        /go/pkg/mod/k8s.io/[email protected]/pkg/server/config.go:711 +0x6c
k8s.io/apiserver/pkg/server.completedConfig.New(0xc00013d8c0, 0x1f099c0, 0xc000697090, 0x1bbdb5a, 0xe, 0x1ef29e0, 0x2cef248, 0x0, 0x0, 0x0)
        /go/pkg/mod/k8s.io/[email protected]/pkg/server/config.go:657 +0xb45
sigs.k8s.io/metrics-server/pkg/server.Config.Complete(0xc00013d8c0, 0xc00013cb40, 0xc00013d680, 0xdf8475800, 0xc92a69c00, 0x0, 0x0, 0xdf8475800)
        /go/src/sigs.k8s.io/metrics-server/pkg/server/config.go:52 +0x312
sigs.k8s.io/metrics-server/cmd/metrics-server/app.runCommand(0xc0001140b0, 0xc0000a65a0, 0x0, 0x0)
        /go/src/sigs.k8s.io/metrics-server/cmd/metrics-server/app/start.go:66 +0x157
sigs.k8s.io/metrics-server/cmd/metrics-server/app.NewMetricsServerCommand.func1(0xc000618b00, 0xc0002c3a80, 0x0, 0x4, 0x0, 0x0)
        /go/src/sigs.k8s.io/metrics-server/cmd/metrics-server/app/start.go:37 +0x33
github.com/spf13/cobra.(*Command).execute(0xc000618b00, 0xc000100060, 0x4, 0x4, 0xc000618b00, 0xc000100060)
        /go/pkg/mod/github.com/spf13/[email protected]/command.go:842 +0x453
github.com/spf13/cobra.(*Command).ExecuteC(0xc000618b00, 0xc00012a120, 0x0, 0x0)
        /go/pkg/mod/github.com/spf13/[email protected]/command.go:950 +0x349
github.com/spf13/cobra.(*Command).Execute(...)
        /go/pkg/mod/github.com/spf13/[email protected]/command.go:887
main.main()
        /go/src/sigs.k8s.io/metrics-server/cmd/metrics-server/metrics-server.go:38 +0xae
I1110 12:56:25.384926       1 secure_serving.go:197] Serving securely on [::]:4443
I1110 12:56:25.384972       1 requestheader_controller.go:169] Starting RequestHeaderAuthRequestController
I1110 12:56:25.384979       1 shared_informer.go:240] Waiting for caches to sync for RequestHeaderAuthRequestController
I1110 12:56:25.384996       1 dynamic_serving_content.go:130] Starting serving-cert::/tmp/apiserver.crt::/tmp/apiserver.key
I1110 12:56:25.385018       1 tlsconfig.go:240] Starting DynamicServingCertificateController
I1110 12:56:25.385069       1 configmap_cafile_content.go:202] Starting client-ca::kube-system::extension-apiserver-authentication::client-ca-file
I1110 12:56:25.385083       1 shared_informer.go:240] Waiting for caches to sync for client-ca::kube-system::extension-apiserver-authentication::client-ca-file
I1110 12:56:25.385105       1 configmap_cafile_content.go:202] Starting client-ca::kube-system::extension-apiserver-authentication::requestheader-client-ca-file
I1110 12:56:25.385117       1 shared_informer.go:240] Waiting for caches to sync for client-ca::kube-system::extension-apiserver-authentication::requestheader-client-ca-file
E1110 12:56:25.385521       1 server.go:132] unable to fully scrape metrics: [unable to fully scrape metrics from node kubenode02: unable to fetch metrics from node kubenode02: Get "https://192.168.56.4:10250/stats/summary?only_cpu_and_memory=true": x509: cannot validate certificate for 192.168.56.4 because it doesn't contain any IP SANs, unable to fully scrape metrics from node kubenode01: unable to fetch metrics from node kubenode01: Get "https://192.168.56.3:10250/stats/summary?only_cpu_and_memory=true": x509: cannot validate certificate for 192.168.56.3 because it doesn't contain any IP SANs, unable to fully scrape metrics from node kubemaster: unable to fetch metrics from node kubemaster: Get "https://192.168.56.2:10250/stats/summary?only_cpu_and_memory=true": x509: cannot validate certificate for 192.168.56.2 because it doesn't contain any IP SANs]
I1110 12:56:25.485100       1 shared_informer.go:247] Caches are synced for RequestHeaderAuthRequestController 
I1110 12:56:25.485359       1 shared_informer.go:247] Caches are synced for client-ca::kube-system::extension-apiserver-authentication::requestheader-client-ca-file 
I1110 12:56:25.485398       1 shared_informer.go:247] Caches are synced for client-ca::kube-system::extension-apiserver-authentication::client-ca-file
2
Hi there. CrashLoopBackOff just means it is never "up", and 28 is the number of restarts. The events tell us that it is failing the liveness probe, which checks whether the container is healthy; in this case there is probably something wrong inside the container. Try getting the logs from the Deployment with kubectl logs deployment/metrics-server -n kube-system. This might tell us why the app in the container is unhappy and reporting unhealthy to its livenessProbe. – Justin Tamblyn
Looking at the events further, we see HTTP probe failed with statuscode: 500. As I'm sure you know, 500 is an internal server error, so there is probably something wrong in the container. Let's see what the logs say. – Justin Tamblyn
@JustinTamblyn added the logs. – Himadri Ganguly
Which K8s version are you using? Have you gone through all the requirements needed to install metrics-server? – acid_fuji
@thomas I am using K8s v1.19.3 and the cluster has been set up using kubeadm. – Himadri Ganguly

2 Answers

11
votes

The error is due to the kubelet's self-signed serving certificate, which does not contain any IP SANs. Adding - --kubelet-insecure-tls to the metrics-server container args in components.yaml and re-applying it to the K8s cluster fixes the issue.
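For context, the flag goes under the metrics-server container's args in the Deployment inside components.yaml. A rough sketch of that section (the surrounding args are from the v0.4.0 manifest and may differ slightly between releases):

      containers:
      - name: metrics-server
        image: k8s.gcr.io/metrics-server/metrics-server:v0.4.0
        args:
        - --cert-dir=/tmp
        - --secure-port=4443
        - --kubelet-preferred-address-types=InternalIP,ExternalIP,Hostname
        - --kubelet-use-node-status-port
        - --kubelet-insecure-tls    # added: skip verification of the kubelet serving certs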

Ref: https://github.com/kubernetes-sigs/metrics-server#configuration
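Once the pod restarts and becomes Ready you can confirm metrics are being collected (the k8s-app=metrics-server label below is the one the upstream manifest uses):

kubectl -n kube-system get pods -l k8s-app=metrics-server
kubectl top nodes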

1
vote

I think a better approach would be to reissue the certificates for the nodes (workers) and add the node IP to the SANs. cat w2k.csr.json:

{
  "hosts": [
    "w2k",
    "w2k.rezerw.at",
    "172.16.8.113"
  ],
  "CN": "system:node:w2k",
  "key": {
    "algo": "ecdsa",
    "size": 256
  },
  "names": [
    {
      "O": "system:nodes"
    }
  ]
}

and run these commands:

cat w2k.csr.json | cfssl genkey - | cfssljson -bare w2k

cat w2k.csr | base64

This will output a base64-encoded string to put into spec.request in a new YAML file:

apiVersion: certificates.k8s.io/v1
kind: CertificateSigningRequest
metadata:
  name: worker01
spec:
  request: "LS0tLS1CRUdJ0tLS0tCg=="
  signerName: kubernetes.io/kubelet-serving
  usages:
  - digital signature
  - key encipherment
  - server auth

Apply it.

kubectl apply -f w2k.csr.yaml
certificatesigningrequest.certificates.k8s.io/worker01 configured
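Optionally, verify the request is there and still pending before approving it (worker01 is the metadata.name from the manifest above):

kubectl get csr worker01

It should show the kubernetes.io/kubelet-serving signer with a Pending condition.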

Approve the CSR.

kubectl certificate approve worker01
certificatesigningrequest.certificates.k8s.io/worker01 approved

Get the certificate and put it, together with its key, on the node in /var/lib/kubelet/pki.
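One way to fetch the issued certificate from the approved CSR (using the worker01 name from above) is:

kubectl get csr worker01 -o jsonpath='{.status.certificate}' | base64 -d > w2k-cert.pem

Copy w2k-cert.pem and the w2k-key.pem generated earlier by cfssl onto the node, then rename them to the kubelet's default filenames: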

root@w2k:/var/lib/kubelet/pki# mv w2k-key.pem  kubelet.key
root@w2k:/var/lib/kubelet/pki# mv w2k-cert.pem kubelet.crt
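For the kubelet to pick up the new certificate and key you will most likely need to restart it on that node, e.g. with a systemd-managed kubelet:

systemctl restart kubelet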

https://kubernetes.io/docs/tasks/tls/managing-tls-in-a-cluster/#create-a-certificate-signing-request-object-to-send-to-the-kubernetes-api