We have run into a Kafka event consumption problem in production. The total volume is 3.4 billion events spread over 40 partitions. The messages are distributed almost evenly, with 80+ million events per partition.
We allocate 40 consumer streams, one per thread, for 40 threads in total (BTW, we're using Kafka client 0.8.2).
During consumption, the lag on every partition drops steadily for the first 4 hours. In the last hour, about 2/3 of the consumer streams have already finished; fewer than 10 streams are still receiving events, and the lag on each of those partitions is in the range of 2-3 million. In other words, most of the consumer pool gradually goes idle while it waits for the few remaining consumers to finish.
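To put rough numbers on this, here is a small back-of-the-envelope sketch. The totals come from the figures above; the function names are made up for illustration, and I plug in 10 lagging partitions at 3 million each as an upper bound on what we actually see.

```python
# Figures taken from the description above.
TOTAL_EVENTS = 3_400_000_000
PARTITIONS = 40  # also the number of consumer streams/threads

# Even distribution: events each partition holds.
events_per_partition = TOTAL_EVENTS // PARTITIONS


def idle_share(active_streams, total_streams=PARTITIONS):
    """Fraction of consumer threads that have nothing left to read."""
    return (total_streams - active_streams) / total_streams


def remaining_share(lagging_partitions, lag_per_partition, total=TOTAL_EVENTS):
    """Fraction of the total volume still unread on the lagging partitions."""
    return lagging_partitions * lag_per_partition / total


if __name__ == "__main__":
    print("events per partition:", events_per_partition)
    # Upper-bound assumption: 10 lagging partitions, 3 million events each.
    print("idle share of the pool:", idle_share(10))
    print("share of volume still unread:", remaining_share(10, 3_000_000))
```

So with at most 10 streams still active, at least 75% of the threads are idle while less than 1% of the total volume is left to read.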
Assuming CPU cores, memory, and network bandwidth are all sufficient, are there any tips to make sure the Kafka consumers finish the whole backlog at about the same time, with no consumer falling behind (other than increasing the number of partitions)?