First of all, I'd like to understand one thing clearly: if I run a Telegraf DaemonSet in a Kubernetes cluster, will it collect the metrics of the pods, or the metrics of the physical nodes?
I've created a Telegraf DaemonSet in my test Kubernetes cluster, which runs on my laptop under Hyper-V, based on this Kubernetes cluster installation:
I would like to collect the pod metrics, but they don't arrive at the Kafka machine. I get this error in the logs:
2019-05-08T02:36:35Z I! Starting Telegraf 1.9.2
2019-05-08T02:36:35Z I! Using config file: /etc/telegraf/telegraf.conf
2019-05-08T02:46:36Z E! [agent] Failed to connect to output kafka, retrying in 15s, error was 'kafka: client has run out of available brokers to talk to (Is your cluster reachable?)'
This is the daemonset definition file:
apiVersion: v1
kind: ConfigMap
metadata:
  name: telegraf
  namespace: monitoring
  labels:
    k8s-app: telegraf
data:
  telegraf.conf: |+
    [global_tags]
      env = "$ENV"
    [agent]
      hostname = "$HOSTNAME"
      interval = "60s"
      round_interval = true
      metric_batch_size = 1000
      metric_buffer_limit = 10000
      collection_jitter = "0s"
      flush_interval = "10s"
      flush_jitter = "2s"
      precision = ""
      debug = false
      quiet = true
      logfile = ""
    [[outputs.kafka]]
      brokers = ["10.121.63.5:9092", "10.121.63.18:9092", "10.121.62.64:9092", "10.121.62.80:9092", "10.121.63.22:9092"]
      topic = "telegraf-measurements-json"
      client_id = "golangsarama__1.18.0__serverinfra__telegraf"
      routing_tag = "host"
      version = "0.11.0.2"
      compression_codec = 2
      required_acks = 1
      data_format = "json"
    [[inputs.cpu]]
      percpu = true
      totalcpu = true
      collect_cpu_time = false
      report_active = false
    [[inputs.disk]]
      ignore_fs = ["tmpfs", "devtmpfs", "devfs"]
    [[inputs.diskio]]
    [[inputs.kernel]]
    [[inputs.mem]]
    [[inputs.processes]]
    [[inputs.swap]]
    [[inputs.system]]
    [[inputs.docker]]
      endpoint = "unix:///var/run/docker.sock"
    [[inputs.kubernetes]]
      url = "https://192.168.213.18:6443"
      insecure_skip_verify = true
---
# Section: Daemonset
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: telegraf
  namespace: monitoring
  labels:
    k8s-app: telegraf
spec:
  selector:
    matchLabels:
      name: telegraf
  template:
    metadata:
      labels:
        name: telegraf
    spec:
      containers:
      - name: telegraf
        image: docker.io/telegraf:1.9.2
        resources:
          limits:
            memory: 500Mi
          requests:
            cpu: 500m
            memory: 500Mi
        env:
        - name: HOSTNAME
          valueFrom:
            fieldRef:
              fieldPath: spec.nodeName
        - name: "HOST_PROC"
          value: "/rootfs/proc"
        - name: "HOST_SYS"
          value: "/rootfs/sys"
        - name: ENV
          valueFrom:
            secretKeyRef:
              name: telegraf
              key: env
        volumeMounts:
        - name: sys
          mountPath: /rootfs/sys
          readOnly: true
        - name: proc
          mountPath: /rootfs/proc
          readOnly: true
        - name: docker-socket
          mountPath: /var/run/docker.sock
        - name: utmp
          mountPath: /var/run/utmp
          readOnly: true
        - name: config
          mountPath: /etc/telegraf
      terminationGracePeriodSeconds: 30
      volumes:
      - name: sys
        hostPath:
          path: /sys
      - name: docker-socket
        hostPath:
          path: /var/run/docker.sock
      - name: proc
        hostPath:
          path: /proc
      - name: utmp
        hostPath:
          path: /var/run/utmp
      - name: config
        configMap:
          name: telegraf
This is the article that I followed to create a daemonset.
Here are the pods:
NAMESPACE     NAME                                 READY   STATUS    RESTARTS   AGE
default       nginx-65f88748fd-jztrz               1/1     Running   0          7d18h
kube-system   coredns-fb8b8dccf-rl48l              1/1     Running   0          7d18h
kube-system   coredns-fb8b8dccf-x8fvx              1/1     Running   0          7d18h
kube-system   etcd-k8s-master                      1/1     Running   2          7d18h
kube-system   kube-apiserver-k8s-master            1/1     Running   2          7d18h
kube-system   kube-controller-manager-k8s-master   1/1     Running   0          7d18h
kube-system   kube-flannel-ds-amd64-96tsl          1/1     Running   0          7d18h
kube-system   kube-flannel-ds-amd64-b884r          1/1     Running   0          7d18h
kube-system   kube-flannel-ds-amd64-pdqmq          1/1     Running   0          7d18h
kube-system   kube-proxy-42k2g                     1/1     Running   0          7d18h
kube-system   kube-proxy-77pw9                     1/1     Running   0          7d18h
kube-system   kube-proxy-n5mbs                     1/1     Running   0          7d18h
kube-system   kube-scheduler-k8s-master            1/1     Running   2          7d18h
monitoring    telegraf-dvtcl                       1/1     Running   5          117m
monitoring    telegraf-n2mqz                       1/1     Running   5          117m
tcpdump shows that something is being sent from the DaemonSet:
09:52:59.002901 IP 192.168.1.10.45546 > sdsfdsf.XmlIpcRegSvc: Flags [S], seq 3040818525, win 28200, options [mss 1410,sackOK,TS val 158999344 ecr 0,nop,wscale 7], length 0
E..<2.@.@......
y?...#..?5]......n(._.........
z#0........................
09:52:59.002901 IP 192.168.1.10.45546 > sdsfdsf.XmlIpcRegSvc: Flags [S], seq 3040818525, win 28200, options [mss 1410,sackOK,TS val 158999344 ecr 0,nop,wscale 7], length 0
E..<2.@.@......
y?...#..?5]......n(._.........
But I can't see anything on our Grafana dashboard. If I install a standalone RPM-based Telegraf on the nodes, it sends the metrics out and I can see them. But I'm curious about the pod metrics.
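To separate a plain network problem from a Kafka configuration problem behind the "run out of available brokers" error, a reachability check of the broker list could look like the sketch below. It is only a sketch under assumptions: `check_brokers` is a hypothetical helper name (not part of Telegraf or Kafka), the broker list is copied from the `[[outputs.kafka]]` section above, and it assumes Python 3 is available wherever it is run, which the stock telegraf image may not provide.

```python
import socket

# Broker list copied from the [[outputs.kafka]] section of telegraf.conf.
BROKERS = [
    "10.121.63.5:9092",
    "10.121.63.18:9092",
    "10.121.62.64:9092",
    "10.121.62.80:9092",
    "10.121.63.22:9092",
]


def check_brokers(brokers, timeout=3.0):
    """Try a plain TCP connect to each "host:port" entry.

    Returns a dict mapping each broker to None when the connect
    succeeded, or to the error text when it failed or timed out.
    """
    results = {}
    for broker in brokers:
        host, _, port = broker.rpartition(":")
        try:
            # Opens the connection and closes it again immediately.
            with socket.create_connection((host, int(port)), timeout=timeout):
                results[broker] = None
        except OSError as exc:  # covers refused, unreachable, and timeout
            results[broker] = str(exc)
    return results
```

Printing the items of `check_brokers(BROKERS)` from inside the pod and again from the node itself would show whether the two network paths differ. A successful connect only proves that the port is open from that location; it does not prove that the addresses the brokers advertise to clients are reachable as well.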