Openshift Monitoring - cAdvisor + Prometheus - Docker

Question

I am trying to implement a monitoring solution for an OpenShift cluster based on Prometheus + node-exporter + Grafana + cAdvisor.

I have a serious problem with the cAdvisor component. I have tried many configurations (the changes always involve the volumes), but none of them works well: the container either restarts every ~2 min or does not collect all metrics (processes).

Example configuration (with this config the container does not restart every 2 min, but it does not collect process metrics). I know that /rootfs is missing from the volumes, but with it mounted the container runs for about 5 s and then goes down:

  containers:
    - image: >-
        google/cadvisor@sha256:fce642268068eba88c27c666e92ed4144be6188447a23825015884741cf0e352
      imagePullPolicy: IfNotPresent
      name: cadvisor-new-version
      ports:
        - containerPort: 8080
          protocol: TCP
      resources: {}
      securityContext:
        privileged: true
      terminationMessagePath: /dev/termination-log
      terminationMessagePolicy: File
      volumeMounts:
        - mountPath: '/sys/fs/cgroup/cpuacct,cpu'
          name: sys
          readOnly: true
        - mountPath: /var/lib/docker
          name: docker
          readOnly: true
        - mountPath: /var/run/containerd/containerd.sock
          name: docker-socketd
          readOnly: true
  dnsPolicy: ClusterFirst
  restartPolicy: Always
  schedulerName: default-scheduler
  securityContext: {}
  serviceAccount: cadvisor-sa
  serviceAccountName: cadvisor-sa
  terminationGracePeriodSeconds: 300
  volumes:
    - hostPath:
        path: '/sys/fs/cgroup/cpu,cpuacct'
      name: sys
    - hostPath:
        path: /var/lib/docker
      name: docker
    - hostPath:
        path: /var/run/containerd/containerd.sock
      name: docker-socketd

I use a service account with the privileged SCC in my OpenShift project.

OpenShift version - 3.6
Docker version - 1.12
cAdvisor version - I tried every version from v0.26.3 to the newest

I found a post saying that the problem may be the old version of Docker. Can anyone confirm this?

Has anyone found a working configuration and deployed cAdvisor on OpenShift?

Example logs:

I0409 08:41:46.661453       1 manager.go:231] Version: {KernelVersion:3.10.0-693.17.1.el7.x86_64 ContainerOsVersion:Alpine Linux v3.4 DockerVersion:1.12.6 DockerAPIVersion:1.24 CadvisorVersion:v0.28.3 CadvisorRevision:1e567c2}
E0409 08:41:50.823560       1 factory.go:340] devicemapper filesystem stats will not be reported: usage of thin_ls is disabled to preserve iops
I0409 08:41:50.825280       1 factory.go:356] Registering Docker factory
I0409 08:41:50.826394       1 factory.go:54] Registering systemd factory
I0409 08:41:50.826949       1 factory.go:86] Registering Raw factory
I0409 08:41:50.827388       1 manager.go:1178] Started watching for new ooms in manager
I0409 08:41:50.838169       1 manager.go:329] Starting recovery of all containers
W0409 08:41:56.853821       1 container.go:393] Failed to create summary reader for "/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-podc323db44_39a9_11e8_accd_005056800e7b.slice/docker-26db795af0fa28047f04194d8169cf0249edf2c918c583422a1404d35ed5b62c.scope": none of the resources are being tracked.
I0409 08:42:03.953261       1 manager.go:334] Recovery completed
I0409 08:42:37.874062       1 cadvisor.go:162] Starting cAdvisor version: v0.28.3-1e567c2 on port 8080
I0409 08:42:56.353574       1 fsHandler.go:135] du and find on following dirs took 1.20076874s: [ /rootfs/var/lib/docker/containers/2afa2c457a9c1769feb6ab542102521d8ad51bdeeb89581e4b7166c1c93e7522]; will not log again for this container unless duration exceeds 2s
I0409 08:42:56.453602       1 fsHandler.go:135] du and find on following dirs took 1.098795382s: [ /rootfs/var/lib/docker/containers/65e4ad3536788b289e2b9a29e8f19c66772b6f38ec10d34a2922e4ef4d67337f]; will not log again for this container unless duration exceeds 2s
I0409 08:42:56.753070       1 fsHandler.go:135] du and find on following dirs took 1.400184357s: [ /rootfs/var/lib/docker/containers/2b0aa12a43800974298a7d0353c6b142075d70776222196c92881cc7c7c1a804]; will not log again for this container unless duration exceeds 2s
I0409 08:43:00.352908       1 fsHandler.go:135] du and find on following dirs took 1.199079344s: [ /rootfs/var/lib/docker/containers/aa977c2cc6105e633369f48e2341a6363ce836cfbe8e7821af955cb0cf4d5f26]; will not log again for this container unless duration exceeds 2s

Is there anything from the logs you can share, since you say it works for 5 s before it exits? - Graham Dumpleton

jorgemoralespou · Accepted Answer · 2018-04-10T16:28:20

A cAdvisor process is already embedded in OpenShift's kubelet. Maybe there's a race condition between it and your pod that makes the pod crash.
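
If the embedded cAdvisor is enough for your needs, one option is to have Prometheus scrape the kubelet through the API server proxy instead of running a separate cAdvisor pod. The following is only a sketch modelled on the example Kubernetes configuration shipped with Prometheus, not something verified on OpenShift 3.6. The job name, the in-cluster service account file paths, and the `kubernetes.default.svc:443` address are assumptions about your setup. The metrics path is also an assumption: as far as I know, kubelets based on Kubernetes 1.7.3 or later serve container metrics at `/metrics/cadvisor`, while older ones (OpenShift 3.6 is based on Kubernetes 1.6) serve them from the kubelet's plain `/metrics`, so adjust the path to your version.

```yaml
scrape_configs:
  - job_name: kubernetes-cadvisor        # hypothetical job name
    scheme: https
    tls_config:
      # Assumes Prometheus runs in-cluster with a mounted service account
      ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
    bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
    kubernetes_sd_configs:
      - role: node                       # one target per cluster node
    relabel_configs:
      # Copy node labels onto the scraped series
      - action: labelmap
        regex: __meta_kubernetes_node_label_(.+)
      # Send every scrape to the API server instead of the node itself
      - target_label: __address__
        replacement: kubernetes.default.svc:443
      # Proxy through the API server to the kubelet's metrics endpoint.
      # Assumption: use .../proxy/metrics on pre-1.7.3 kubelets.
      - source_labels: [__meta_kubernetes_node_name]
        regex: (.+)
        target_label: __metrics_path__
        replacement: /api/v1/nodes/${1}/proxy/metrics/cadvisor
```

The service account Prometheus runs under would also need permission to read nodes and their proxy subresource for this to work.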
