I'm running a job using the Beam KafkaIO source on Google Dataflow and cannot find an easy way to persist offsets across job restarts (the job update option is not enough; I need to restart the job).
Comparing Beam's KafkaIO against PubsubIO (or, to be precise, KafkaCheckpointMark against PubsubCheckpoint), I can see that checkpoint persistence is not implemented in KafkaIO: the KafkaCheckpointMark.finalizeCheckpoint method is empty, whereas PubsubCheckpoint.finalizeCheckpoint acknowledges the consumed messages back to Pub/Sub.
Does this mean I have no way of reliably managing Kafka offsets across job restarts with minimal effort?
Options I considered so far:
Implement my own logic for persisting offsets: sounds complicated, and I'm using Beam through Scio in Scala.
Do nothing, but that would result in many duplicates on job restarts, since the topic has a 30-day retention period.
Enable auto-commit, but that would result in lost messages, which is even worse.
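For context, here is roughly what I mean by the last option: a minimal sketch (broker address, topic, and group id are placeholders; my actual read is wrapped via Scio's customInput) of re-enabling the Kafka consumer's auto-commit on the KafkaIO source. Since auto-commit happens on a timer inside the consumer, offsets can be committed before Dataflow has actually processed the records, which is why I expect lost messages on restart:

```scala
import org.apache.beam.sdk.io.kafka.KafkaIO
import org.apache.kafka.clients.consumer.ConsumerConfig
import org.apache.kafka.common.serialization.StringDeserializer
import scala.collection.JavaConverters._

// Sketch only: placeholder broker/topic/group names.
val read = KafkaIO.read[String, String]()
  .withBootstrapServers("broker-1:9092")
  .withTopic("my-topic")
  .withKeyDeserializer(classOf[StringDeserializer])
  .withValueDeserializer(classOf[StringDeserializer])
  // Pass consumer properties through to the underlying KafkaConsumer.
  .updateConsumerProperties(Map[String, AnyRef](
    ConsumerConfig.GROUP_ID_CONFIG           -> "my-consumer-group",
    // Auto-commit commits on an interval, independent of whether the
    // pipeline has finished processing the records: at-most-once.
    ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG -> java.lang.Boolean.TRUE
  ).asJava)
```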