0
votes

I have a use case where I need to read data from a topic, batch it (100 records), and write each batch to a specific file or external store. I am planning to use the Processor API for this: accumulate records in the process method using a state store backed by Kafka, write to the file once the batch reaches 100 records, and then clear the batch from the state store to start a fresh one.

One more requirement is that we cannot have duplicates in the data. This means the same record cannot appear in two different batches.

Does Kafka Streams' exactly-once guarantee fit this use case? I read in the design docs that it is not recommended if we are batching data, and most articles on this say that exactly-once works only for the consume-process-produce pattern.

2
Why don't you use the Kafka consumer? – Deadpool
Yes, that's what I would lean toward if Streams is not the best fit for this. – user9656219

2 Answers

2
votes

Kafka Streams' exactly-once guarantee only works if you write the result back to Kafka. Because you want to write data to an external system, Kafka cannot help with exactly-once guarantees: Kafka transactions are not cross-system transactions.
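For the Kafka-to-Kafka case where the guarantee does apply, it is switched on through the Streams configuration. A minimal sketch, assuming Kafka Streams 3.0 or later (older versions use the value `exactly_once`); the class name, application id, and broker address are placeholders, not taken from the question:

```java
import java.util.Properties;

class EosConfig {
    /** Builds Streams properties with exactly-once processing enabled. */
    static Properties streamsProps() {
        Properties props = new Properties();
        // Placeholder values; replace with your own application id and brokers.
        props.setProperty("application.id", "batching-app");
        props.setProperty("bootstrap.servers", "localhost:9092");
        // Covers reads, state store updates, and writes that stay inside Kafka.
        // It does not cover a file or external store written from a processor.
        props.setProperty("processing.guarantee", "exactly_once_v2");
        return props;
    }
}
```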

0
votes

As @Matthias pointed out, exactly-once semantics only work for Kafka-to-Kafka Streams applications; integrating with an external system is likely to break the guarantee. You can read more about it in this article.

I would suggest using the Kafka Consumer API, as it provides the best balance between flexibility and abstraction for your use case. All you need to do is set enable.auto.commit=false and commit manually with consumer.commitSync() after the batch has been successfully written to the external system.
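The write-then-commit ordering could be sketched roughly as follows. This is a library-free model, not real consumer code: `Rec`, `BatchingWriter`, `sink`, and `committer` are hypothetical names, and it assumes a single partition so one offset is enough. In an actual consumer, `accept` would be called for each record returned by `poll()`, and `committer` would wrap `consumer.commitSync(...)`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.LongConsumer;

/** One consumed record: its partition offset and its payload. */
class Rec {
    final long offset;
    final String value;
    Rec(long offset, String value) { this.offset = offset; this.value = value; }
}

/** Buffers records and commits an offset only after the batch was written. */
class BatchingWriter {
    private final int batchSize;
    private final Consumer<List<String>> sink;   // writes to the file or external store
    private final LongConsumer committer;        // stand-in for a manual offset commit
    private final List<Rec> buffer = new ArrayList<>();

    BatchingWriter(int batchSize, Consumer<List<String>> sink, LongConsumer committer) {
        this.batchSize = batchSize;
        this.sink = sink;
        this.committer = committer;
    }

    /** Call once per consumed record. */
    void accept(Rec rec) {
        buffer.add(rec);
        if (buffer.size() >= batchSize) {
            flush();
        }
    }

    /** Writes the buffered batch, then commits the offset of the next record to read. */
    void flush() {
        if (buffer.isEmpty()) {
            return;
        }
        List<String> values = new ArrayList<>();
        for (Rec r : buffer) {
            values.add(r.value);
        }
        sink.accept(values); // if this throws, nothing below runs and nothing is committed
        long nextOffset = buffer.get(buffer.size() - 1).offset + 1;
        committer.accept(nextOffset);
        buffer.clear();
    }
}
```

Note that a crash after the write but before the commit causes the same batch to be delivered again on restart, so this ordering gives at-least-once delivery on its own.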

Ensuring exactly-once can get difficult depending on your use case. Manual commits alone give you at-least-once delivery, so you'll need to make your consumer idempotent with custom logic. Consider using external persistent storage to keep the hash (or the key, if it is unique) of each message, and check every incoming message against it to see whether it has already been processed. You can also use a state store for this purpose, but I have found that clearing a state store sometimes becomes a hassle; it depends a lot on your use case.
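A rough sketch of that duplicate check, assuming the message payload is hashed with SHA-256. `DedupFilter` is a hypothetical name, and the in-memory `HashSet` is only a stand-in for the external persistent store, which would have to survive restarts:

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashSet;
import java.util.Set;

class DedupFilter {
    // Stand-in for external persistent storage of already-processed hashes.
    private final Set<String> seen = new HashSet<>();

    /** Returns true the first time a payload is seen, false for a repeat. */
    boolean firstTime(String payload) {
        return seen.add(sha256Hex(payload));
    }

    /** Hex-encoded SHA-256 digest of the payload. */
    static String sha256Hex(String payload) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            byte[] digest = md.digest(payload.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is required on every Java platform, so this is unexpected.
            throw new IllegalStateException(e);
        }
    }
}
```

Records for which `firstTime` returns false would be skipped before they reach the batch. For this to hold across crashes, recording the hashes and writing the batch ideally need to happen atomically in the external store.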

You can check out this article if it helps.