Looking at the latest (v0.10) Kafka Consumer documentation:
"The position of the consumer gives the offset of the next record that will be given out. It will be one larger than the highest offset the consumer has seen in that partition. It automatically advances every time the consumer receives data calls poll(long) and receives messages."
Is there a way to query for the largest offset available for the partition on the server side, without retrieving all the messages?
The logic I am trying to implement is as follows:
- query every second for the number (A) of pending messages in a topic
- if A > threshold, wake up a processor that retrieves all the messages and processes them
- otherwise do nothing (sleep 1)
The motivation is that I need to do some batch processing, but I want the processor to wake up only when there is enough data (and I don't want to retrieve all the data twice).
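To make the intended logic concrete, here is a minimal sketch of the loop described above. It does not talk to Kafka: the class name `BacklogGate` and the two `LongSupplier` arguments are hypothetical stand-ins for "largest offset on the server" and "my consumer's position". I assume A can be computed as the difference between those two numbers. As far as I can tell, newer clients (kafka-clients 0.10.1.0 and later) expose `KafkaConsumer.endOffsets(...)` and `position(...)`, which might be usable as the two suppliers, but I have not verified that against my setup.

```java
import java.util.function.LongSupplier;

public class BacklogGate {
    // Pending messages (A): offset of the next record to be written to the
    // partition minus the offset of the next record this consumer would read.
    static long pending(long endOffset, long position) {
        return Math.max(0L, endOffset - position);
    }

    // The processor should only wake up when A is strictly above the threshold.
    static boolean shouldWake(long pending, long threshold) {
        return pending > threshold;
    }

    // Checks the backlog every intervalMillis and returns it once it exceeds
    // the threshold. Both suppliers are hypothetical hooks; no data is fetched.
    static long awaitBacklog(LongSupplier endOffset, LongSupplier position,
                             long threshold, long intervalMillis)
            throws InterruptedException {
        while (true) {
            long a = pending(endOffset.getAsLong(), position.getAsLong());
            if (shouldWake(a, threshold)) {
                return a;
            }
            Thread.sleep(intervalMillis); // otherwise do nothing
        }
    }

    public static void main(String[] args) throws InterruptedException {
        // Stand-in for a partition whose end offset grows by 40 on each check,
        // while the consumer position stays at 100.
        long[] end = {100};
        long backlog = awaitBacklog(() -> end[0] += 40, () -> 100L, 100, 10);
        System.out.println("waking processor, pending = " + backlog);
    }
}
```

In the real setup the interval would be 1000 ms and the processor would start consuming from the current position once `awaitBacklog` returns, so nothing is retrieved twice.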