I've been attempting to use Prometheus to monitor pod statistics such as http_request_rate and/or packets_per_second. To do so, I was planning on using the Prometheus Adapter, which, from what I've read, requires the Prometheus Operator.
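For context, the kind of rule I'd eventually feed the Prometheus Adapter looks like the sketch below. It is based on the adapter's documented rule format; the metric name http_requests_total and the 2m rate window are assumptions on my part, not something my apps necessarily expose yet:

# Sketch of a hypothetical adapter rule turning a counter into a per-pod rate:
cat <<'EOF' > adapter-rules.yaml
rules:
# Match a counter exposed with namespace/pod labels (metric name assumed)
- seriesQuery: 'http_requests_total{namespace!="",pod!=""}'
  resources:
    overrides:
      namespace: {resource: "namespace"}
      pod: {resource: "pod"}
  # Rename http_requests_total -> http_requests_per_second in the metrics API
  name:
    matches: "^(.*)_total$"
    as: "${1}_per_second"
  # Convert the counter into a rate over a 2-minute window
  metricsQuery: 'sum(rate(<<.Series>>{<<.LabelMatchers>>}[2m])) by (<<.GroupBy>>)'
EOF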

I've had issues installing the Prometheus Operator from the Helm stable charts. When running the installation command helm install prom stable/prometheus-operator, I get the following warning message displayed six times:

manifest_sorter.go:192: info: skipping unknown hook: "crd-install"

The installation continues and the pods are deployed; however, the prometheus-node-exporter pod goes into CrashLoopBackOff.

I can't see a detailed reason for this, as the only message when describing the pod is "Back-off restarting failed container".

I'm running Minikube version 1.7.2.

I'm running Helm version 3.1.1.


>>>Update<<<

Output of describing the problematic pod:

$ kubectl describe pod prom-oper-prometheus-node-exporter-2m6vm -n default
Name:           prom-oper-prometheus-node-exporter-2m6vm
Namespace:      default
Priority:       0
Node:           max-ubuntu/10.2.40.198
Start Time:     Wed, 04 Mar 2020 18:06:44 +0000
Labels:         app=prometheus-node-exporter
                chart=prometheus-node-exporter-1.8.2
                controller-revision-hash=68695df4c5
                heritage=Helm
                jobLabel=node-exporter
                pod-template-generation=1
                release=prom-oper
Annotations:    <none>
Status:         Running
IP:             10.2.40.198
IPs:
  IP:           10.2.40.198
Controlled By:  DaemonSet/prom-oper-prometheus-node-exporter
Containers:
  node-exporter:
    Container ID:  docker://50b2398f72a0269672c4ac73bbd1b67f49732362b4838e16cd10e3a5247fdbfe
    Image:         quay.io/prometheus/node-exporter:v0.18.1
    Image ID:      docker-pullable://quay.io/prometheus/node-exporter@sha256:a2f29256e53cc3e0b64d7a472512600b2e9410347d53cdc85b49f659c17e02ee
    Port:          9100/TCP
    Host Port:     9100/TCP
    Args:
      --path.procfs=/host/proc
      --path.sysfs=/host/sys
      --web.listen-address=0.0.0.0:9100
      --collector.filesystem.ignored-mount-points=^/(dev|proc|sys|var/lib/docker/.+)($|/)
      --collector.filesystem.ignored-fs-types=^(autofs|binfmt_misc|cgroup|configfs|debugfs|devpts|devtmpfs|fusectl|hugetlbfs|mqueue|overlay|proc|procfs|pstore|rpc_pipefs|securityfs|sysfs|tracefs)$
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       Error
      Exit Code:    1
      Started:      Wed, 04 Mar 2020 18:10:10 +0000
      Finished:     Wed, 04 Mar 2020 18:10:10 +0000
    Ready:          False
    Restart Count:  5
    Liveness:       http-get http://:9100/ delay=0s timeout=1s period=10s #success=1 #failure=3
    Readiness:      http-get http://:9100/ delay=0s timeout=1s period=10s #success=1 #failure=3
    Environment:    <none>
    Mounts:
      /host/proc from proc (ro)
      /host/sys from sys (ro)
      /var/run/secrets/kubernetes.io/serviceaccount from prom-oper-prometheus-node-exporter-token-n9dj9 (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  proc:
    Type:          HostPath (bare host directory volume)
    Path:          /proc
    HostPathType:
  sys:
    Type:          HostPath (bare host directory volume)
    Path:          /sys
    HostPathType:
  prom-oper-prometheus-node-exporter-token-n9dj9:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  prom-oper-prometheus-node-exporter-token-n9dj9
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     :NoSchedule
                 node.kubernetes.io/disk-pressure:NoSchedule
                 node.kubernetes.io/memory-pressure:NoSchedule
                 node.kubernetes.io/network-unavailable:NoSchedule
                 node.kubernetes.io/not-ready:NoExecute
                 node.kubernetes.io/pid-pressure:NoSchedule
                 node.kubernetes.io/unreachable:NoExecute
                 node.kubernetes.io/unschedulable:NoSchedule
Events:
  Type     Reason     Age                    From                 Message
  ----     ------     ----                   ----                 -------
  Normal   Scheduled  5m26s                  default-scheduler    Successfully assigned default/prom-oper-prometheus-node-exporter-2m6vm to max-ubuntu
  Normal   Started    4m28s (x4 over 5m22s)  kubelet, max-ubuntu  Started container node-exporter
  Normal   Pulled     3m35s (x5 over 5m24s)  kubelet, max-ubuntu  Container image "quay.io/prometheus/node-exporter:v0.18.1" already present on machine
  Normal   Created    3m35s (x5 over 5m24s)  kubelet, max-ubuntu  Created container node-exporter
  Warning  BackOff    13s (x30 over 5m18s)   kubelet, max-ubuntu  Back-off restarting failed container

Output of the problematic pod's logs:

$ kubectl logs prom-oper-prometheus-node-exporter-2m6vm -n default
time="2020-03-04T18:18:02Z" level=info msg="Starting node_exporter (version=0.18.1, branch=HEAD, revision=3db77732e925c08f675d7404a8c46466b2ece83e)" source="node_exporter.go:156"
time="2020-03-04T18:18:02Z" level=info msg="Build context (go=go1.12.5, user=root@b50852a1acba, date=20190604-16:41:18)" source="node_exporter.go:157"
time="2020-03-04T18:18:02Z" level=info msg="Enabled collectors:" source="node_exporter.go:97"
time="2020-03-04T18:18:02Z" level=info msg=" - arp" source="node_exporter.go:104"
time="2020-03-04T18:18:02Z" level=info msg=" - bcache" source="node_exporter.go:104"
time="2020-03-04T18:18:02Z" level=info msg=" - bonding" source="node_exporter.go:104"
time="2020-03-04T18:18:02Z" level=info msg=" - conntrack" source="node_exporter.go:104"
time="2020-03-04T18:18:02Z" level=info msg=" - cpu" source="node_exporter.go:104"
time="2020-03-04T18:18:02Z" level=info msg=" - cpufreq" source="node_exporter.go:104"
time="2020-03-04T18:18:02Z" level=info msg=" - diskstats" source="node_exporter.go:104"
time="2020-03-04T18:18:02Z" level=info msg=" - edac" source="node_exporter.go:104"
time="2020-03-04T18:18:02Z" level=info msg=" - entropy" source="node_exporter.go:104"
time="2020-03-04T18:18:02Z" level=info msg=" - filefd" source="node_exporter.go:104"
time="2020-03-04T18:18:02Z" level=info msg=" - filesystem" source="node_exporter.go:104"
time="2020-03-04T18:18:02Z" level=info msg=" - hwmon" source="node_exporter.go:104"
time="2020-03-04T18:18:02Z" level=info msg=" - infiniband" source="node_exporter.go:104"
time="2020-03-04T18:18:02Z" level=info msg=" - ipvs" source="node_exporter.go:104"
time="2020-03-04T18:18:02Z" level=info msg=" - loadavg" source="node_exporter.go:104"
time="2020-03-04T18:18:02Z" level=info msg=" - mdadm" source="node_exporter.go:104"
time="2020-03-04T18:18:02Z" level=info msg=" - meminfo" source="node_exporter.go:104"
time="2020-03-04T18:18:02Z" level=info msg=" - netclass" source="node_exporter.go:104"
time="2020-03-04T18:18:02Z" level=info msg=" - netdev" source="node_exporter.go:104"
time="2020-03-04T18:18:02Z" level=info msg=" - netstat" source="node_exporter.go:104"
time="2020-03-04T18:18:02Z" level=info msg=" - nfs" source="node_exporter.go:104"
time="2020-03-04T18:18:02Z" level=info msg=" - nfsd" source="node_exporter.go:104"
time="2020-03-04T18:18:02Z" level=info msg=" - pressure" source="node_exporter.go:104"
time="2020-03-04T18:18:02Z" level=info msg=" - sockstat" source="node_exporter.go:104"
time="2020-03-04T18:18:02Z" level=info msg=" - stat" source="node_exporter.go:104"
time="2020-03-04T18:18:02Z" level=info msg=" - textfile" source="node_exporter.go:104"
time="2020-03-04T18:18:02Z" level=info msg=" - time" source="node_exporter.go:104"
time="2020-03-04T18:18:02Z" level=info msg=" - timex" source="node_exporter.go:104"
time="2020-03-04T18:18:02Z" level=info msg=" - uname" source="node_exporter.go:104"
time="2020-03-04T18:18:02Z" level=info msg=" - vmstat" source="node_exporter.go:104"
time="2020-03-04T18:18:02Z" level=info msg=" - xfs" source="node_exporter.go:104"
time="2020-03-04T18:18:02Z" level=info msg=" - zfs" source="node_exporter.go:104"
time="2020-03-04T18:18:02Z" level=info msg="Listening on 0.0.0.0:9100" source="node_exporter.go:170"
time="2020-03-04T18:18:02Z" level=fatal msg="listen tcp 0.0.0.0:9100: bind: address already in use" source="node_exporter.go:172"

2 Answers

3 votes

This is a known issue related to Helm 3. It affected many charts, such as Argo or Ambassador. The Helm docs note that the crd-install hook was removed:

Note that the crd-install hook has been removed in favor of the crds/ directory in Helm 3.
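In practice that means a Helm 3 chart ships its CRDs as plain manifests in a top-level crds/ directory, which Helm applies before rendering any templates; the layout below is purely illustrative:

mychart/
  Chart.yaml
  crds/                # plain CRD manifests, applied before templates render
    servicemonitor.yaml
  templates/
    daemonset.yaml

If the CRDs already exist in the cluster, Helm 3 also accepts --skip-crds on install.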

I've deployed this chart myself; I also got the message that Helm skipped the unknown hook, but I had no issues with the pods.

An alternative approach is to create the CRDs before installing the chart. Steps to do that can be found here.

The first step is to create the CRDs:

kubectl apply -f https://raw.githubusercontent.com/coreos/prometheus-operator/release-0.36/example/prometheus-operator-crd/monitoring.coreos.com_alertmanagers.yaml
kubectl apply -f https://raw.githubusercontent.com/coreos/prometheus-operator/release-0.36/example/prometheus-operator-crd/monitoring.coreos.com_podmonitors.yaml
kubectl apply -f https://raw.githubusercontent.com/coreos/prometheus-operator/release-0.36/example/prometheus-operator-crd/monitoring.coreos.com_prometheuses.yaml
kubectl apply -f https://raw.githubusercontent.com/coreos/prometheus-operator/release-0.36/example/prometheus-operator-crd/monitoring.coreos.com_prometheusrules.yaml
kubectl apply -f https://raw.githubusercontent.com/coreos/prometheus-operator/release-0.36/example/prometheus-operator-crd/monitoring.coreos.com_servicemonitors.yaml
kubectl apply -f https://raw.githubusercontent.com/coreos/prometheus-operator/release-0.36/example/prometheus-operator-crd/monitoring.coreos.com_thanosrulers.yaml
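Before installing the chart, you can verify that all six CRDs registered correctly:

kubectl get crds | grep monitoring.coreos.com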

The last step is to run the Helm install:

helm install --name my-release stable/prometheus-operator --set prometheusOperator.createCustomResource=false

But Helm 3 will not recognize the --name flag:

Error: unknown flag: --name

You have to remove this flag, so the command looks like this:

$ helm install prom-oper stable/prometheus-operator --set prometheusOperator.createCustomResource=false
NAME: prom-oper
LAST DEPLOYED: Wed Mar  4 14:12:35 2020
NAMESPACE: default
STATUS: deployed
REVISION: 1
NOTES:
The Prometheus Operator has been installed. Check its status by running:
  kubectl --namespace default get pods -l "release=prom-oper"

$ kubectl get pods
NAME                                                     READY   STATUS    RESTARTS   AGE
alertmanager-prom-oper-prometheus-opera-alertmanager-0   2/2     Running   0          9m46s
...
prom-oper-prometheus-node-exporter-25b27                 1/1     Running   0          9m56s

If you have issues regarding the repo, you just need to execute:

helm repo add stable https://kubernetes-charts.storage.googleapis.com
helm repo update
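You can then confirm the chart is visible with, for example:

helm search repo stable/prometheus-operator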

If this alternative approach doesn't help, please add to your question the output of:

kubectl describe pod <pod-name> -n <pod-namespace>
kubectl logs <pod-name> -n <pod-namespace>

1 vote

This issue turned out to be caused by running Minikube with --vm-driver=none, which runs Kubernetes directly on the host. node-exporter binds host port 9100, and that port was apparently already taken by another process on the machine, which explains the fatal "bind: address already in use" in the logs. To solve the issue, Minikube was rebuilt using --vm-driver=kvm2 with --memory=6g. This allowed stable/prometheus-operator to install and all pods to run without crashing.
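A sketch of the fix (assuming the kvm2 driver is already installed on the host; the ss check just confirms the port collision):

# On the host, see what is already listening on node-exporter's port 9100:
sudo ss -ltnp | grep ':9100'

# Rebuild Minikube inside a VM so pods no longer share the host's ports:
minikube delete
minikube start --vm-driver=kvm2 --memory=6g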