I have an Apache Beam 2.2.0 streaming (real-time) job running on the Google Dataflow platform. The job reads JSON events from Kafka, transforms them, and writes them to BigQuery. The problem is that the Google Dataflow runner constantly closes the existing Kafka consumer and then creates a new one, logging the following message:
logger: "com.google.cloud.dataflow.worker.ReaderCache"
message: "Closing idle reader for S4-0000000000000014"
stage: "S4"
step: "Read from Kafka/KafkaIO.Read/KafkaIO.Read/Read(UnboundedKafkaSource)/DataflowRunner.StreamingUnboundedRead.ReadWithIds"
However, according to the Kafka consumer offset-lag monitoring, there are plenty of messages in Kafka left to consume.
The question is: why does Google Dataflow do this? How does it determine that a reader is idle? And how can this behaviour be prevented?