0
votes

I'm going to create Cloud Dataflow pipeline (using Apache beam) which assumes next steps:

  1. Reading messages from Kafka
  2. Processing messages
  3. Writing processed messages to Google Cloud Storage

I would like to commit offset to Kafka only if the message is stored in GCS succesfully, that is, implement exactly-once semantic to this flow.

How can I do it, is there any out-of-box support in KafkaIO.Read of at least any possibility to manage offsets manually?

2

2 Answers

0
votes

There is a Kafka Connect plugin for this https://docs.confluent.io/current/connect/kafka-connect-gcs/gcs_connector.html

If you prefer to not use Kafka Connect, the source is open for that plugin and you can see how exactly once is achieved. However, I'd suggest giving it a go with Connect. It's pretty straightforward.

0
votes

Did we find any thing here. I am also looking to create a apache beam job which reads data from kafka and writes to GCS . I want to commit checkpoint(offset) after the upload is succ