I am setting up monitoring for Kafka consumers and brokers. Monitoring the server metrics seems fairly straightforward, but I am confused about the Kafka consumer metrics, specifically lag.
I initially obtained the consumer lag per partition for a topic by programmatically running the kafka-consumer-groups.sh script with --describe --group. There is also the __consumer_offsets topic, which I believe reveals the lag as well. However, I was told that this lag value is not accurate and that I should obtain it via JMX metrics on the consumer host instead. Can someone verify whether this is correct, and why? Basically, I want to know the most reliable way to find the correct lag for a consumer.
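For reference, the lag the script reports for each partition is LOG-END-OFFSET minus CURRENT-OFFSET (the last committed offset). Below is a minimal Java sketch of how that output could be parsed programmatically. It assumes the tabular layout with TOPIC, PARTITION and LAG header columns; the class and method names (DescribeLagParser, lagByPartition) are hypothetical and not part of Kafka.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class DescribeLagParser {

    /**
     * Parses the table printed by kafka-consumer-groups.sh --describe and
     * returns the lag keyed by "topic-partition". Columns are located by
     * header name because their order differs between Kafka versions.
     */
    public static Map<String, Long> lagByPartition(String describeOutput) {
        Map<String, Long> lag = new LinkedHashMap<>();
        int topicIdx = -1, partitionIdx = -1, lagIdx = -1;

        for (String line : describeOutput.split("\\R")) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue;
            }
            List<String> cols = Arrays.asList(trimmed.split("\\s+"));

            // Header row: remember where the columns of interest are.
            if (cols.contains("TOPIC") && cols.contains("PARTITION") && cols.contains("LAG")) {
                topicIdx = cols.indexOf("TOPIC");
                partitionIdx = cols.indexOf("PARTITION");
                lagIdx = cols.indexOf("LAG");
                continue;
            }

            int needed = Math.max(topicIdx, Math.max(partitionIdx, lagIdx));
            if (lagIdx < 0 || cols.size() <= needed) {
                continue; // no header seen yet, or not a data row
            }

            // A "-" in the LAG column means no offset has been committed yet;
            // anything else that is not a plain number is skipped as noise.
            String lagValue = cols.get(lagIdx);
            if (!lagValue.matches("\\d+")) {
                continue;
            }
            lag.put(cols.get(topicIdx) + "-" + cols.get(partitionIdx), Long.parseLong(lagValue));
        }
        return lag;
    }

    /** Sums the per-partition lag into a single figure for the group. */
    public static long totalLag(String describeOutput) {
        return lagByPartition(describeOutput).values().stream()
                .mapToLong(Long::longValue)
                .sum();
    }
}
```

Because this figure is based on committed offsets, it only moves when the consumer commits, which may be part of why it was described to me as inaccurate.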
This is what I was told I should be retrieving: kafka.consumer:type=consumer-fetch-manager-metrics,client-id={client-id}, attribute: records-lag-max
The problem is that I am not sure how to connect to the consumer client's host over JMX if I have not been given a port. Is there a default port for this? Also, do all Kafka consumer clients register JMX metrics?
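To make the JMX part concrete, here is a rough Java sketch of how I would expect to read that attribute using only the standard javax.management API. It assumes the consumer JVM was started with remote JMX enabled (for example via the -Dcom.sun.management.jmxremote.port system property) and without authentication or SSL. The host, the port 9999, and the class name ConsumerLagJmx are placeholders of mine, not defaults.

```java
import java.io.IOException;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.management.JMException;
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class ConsumerLagJmx {

    /** Reads records-lag-max for every consumer client-id visible on the connection. */
    public static Map<String, Double> recordsLagMax(MBeanServerConnection conn)
            throws IOException, JMException {
        // The wildcard matches any client-id, so the id need not be known up front.
        ObjectName pattern = new ObjectName(
                "kafka.consumer:type=consumer-fetch-manager-metrics,client-id=*");

        Map<String, Double> result = new LinkedHashMap<>();
        for (ObjectName name : conn.queryNames(pattern, null)) {
            Object value = conn.getAttribute(name, "records-lag-max");
            result.put(name.getKeyProperty("client-id"), ((Number) value).doubleValue());
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        // Placeholder host and port; the port must match whatever the consumer JVM exposes.
        String host = args.length > 0 ? args[0] : "localhost";
        String port = args.length > 1 ? args[1] : "9999";

        JMXServiceURL url = new JMXServiceURL(
                "service:jmx:rmi:///jndi/rmi://" + host + ":" + port + "/jmxrmi");

        try (JMXConnector connector = JMXConnectorFactory.connect(url)) {
            recordsLagMax(connector.getMBeanServerConnection())
                    .forEach((id, lag) -> System.out.println(id + " records-lag-max=" + lag));
        }
    }
}
```

Taking an MBeanServerConnection as the parameter means the same method would also work from inside the consumer process itself, by passing ManagementFactory.getPlatformMBeanServer(), in which case no remote port is needed.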
Thanks