Kafka Streams exactly-once processing use case

Question

I have a use case where I need to read data from a topic, batch the data (100 records), and write each batch to a specific file or external store. I am planning to use the Processor API for this: batch the data in the process method using a state store backed by Kafka, write the batch to the file once it reaches 100 records, and then clear the batch from the state store to start a fresh one.
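The batching step described above can be sketched without any Kafka dependency. In this minimal sketch the class RecordBatcher and its sink callback are hypothetical names, and an in-memory list stands in for the Kafka-backed state store the question mentions; add() plays the role of the process method.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical helper mirroring the plan in the question: buffer records,
// flush when the batch is full, then start a fresh batch.
// Assumption: an in-memory list replaces the Kafka-backed state store.
public class RecordBatcher<V> {
    private final int batchSize;
    private final Consumer<List<V>> sink;   // e.g. a hypothetical file writer
    private final List<V> buffer = new ArrayList<>();

    public RecordBatcher(int batchSize, Consumer<List<V>> sink) {
        if (batchSize <= 0) throw new IllegalArgumentException("batchSize must be positive");
        this.batchSize = batchSize;
        this.sink = sink;
    }

    // Called once per input record, like the process method of a processor.
    public void add(V value) {
        buffer.add(value);
        if (buffer.size() >= batchSize) {
            // Hand an immutable copy to the sink, then clear for the next batch.
            // If the sink write succeeds but the application crashes before the
            // cleared state and consumer offsets are committed, this batch will
            // be rebuilt and written again after restart.
            sink.accept(List.copyOf(buffer));
            buffer.clear();
        }
    }

    // Number of records waiting for the next flush.
    public int pending() {
        return buffer.size();
    }

    public static void main(String[] args) {
        List<List<Integer>> written = new ArrayList<>();
        RecordBatcher<Integer> batcher = new RecordBatcher<>(100, written::add);
        for (int i = 0; i < 250; i++) {
            batcher.add(i);
        }
        System.out.println(written.size() + " batches written, " + batcher.pending() + " pending");
    }
}
```

Running main prints "2 batches written, 50 pending". The comment inside add() marks the window that matters for the no-duplicates requirement: the external write and the Kafka commit are two separate steps.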

One more requirement is that we cannot have duplicates in the data. This means the same record cannot appear in two different batches.

Does Kafka Streams exactly-once fit this use case? I read in the design document that it is not recommended if we are batching data, and most of the articles about it say that exactly-once works only for the consume-process-produce pattern.

Yes, that is what I would lean toward if Streams is not the best fit for this. – user9656219

Matthias J. Sax · Accepted Answer · 2019-01-13T01:11:29

Kafka Streams' exactly-once guarantee only works if you write the result back to Kafka. Because you want to write data to an external system, Kafka cannot provide any help for exactly-once guarantees: Kafka transactions are not cross-system transactions.
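Since Kafka transactions stop at the Kafka boundary, duplicate prevention has to happen on the external side. One common approach, which is not part of the answer above and not a Kafka feature, is an idempotent sink: name each batch deterministically from its source partition and first offset, so that a batch replayed after a crash overwrites the same file instead of producing a second copy. In this minimal sketch the class IdempotentBatchSink is hypothetical, a map stands in for the file system, and it is assumed that a replayed batch always contains the same records with the same first offset.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of an idempotent external sink. Kafka does not provide
// this; it only works if the external store supports overwrite-by-key.
public class IdempotentBatchSink {
    // Stand-in for a file system or external store: batch id -> contents.
    private final Map<String, List<String>> store = new TreeMap<>();

    // Deterministic batch id. Assumption: one batch buffer per input partition,
    // and a replayed batch starts at the same offset as the original attempt.
    static String batchId(int partition, long firstOffset) {
        return "partition-" + partition + "-offset-" + firstOffset;
    }

    // Overwrite semantics: writing the same id twice leaves a single copy.
    void write(int partition, long firstOffset, List<String> records) {
        store.put(batchId(partition, firstOffset), List.copyOf(records));
    }

    int batchCount() {
        return store.size();
    }

    public static void main(String[] args) {
        IdempotentBatchSink sink = new IdempotentBatchSink();
        List<String> batch = new ArrayList<>();
        for (long offset = 0; offset < 100; offset++) {
            batch.add("record-" + offset);
        }

        sink.write(0, 0L, batch);   // first attempt reaches the external store
        // Simulated crash before offsets are committed: the same records are
        // re-read and the same batch is written again.
        sink.write(0, 0L, batch);

        System.out.println(sink.batchCount() + " batch stored");
    }
}
```

Running main prints "1 batch stored". The design choice is to make the write safe to repeat rather than to make it happen only once, which is the usual option when the target system cannot join a Kafka transaction.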

