0
votes

We have Samza tasks which reads messages from Kafka Output stream but if there is any retryable failure while processing the message then i would want my Samza task to read the same message again and reprocess it. And after successfully processing the message acknowledge it for checkpointing.

Is there a way to manually control the checkpoint(just like what Kafka Consumer provides "Manual Offset Control" by setting enable.auto.commit to false : https://kafka.apache.org/0100/javadoc/index.html?org/apache/kafka/clients/consumer/KafkaConsumer.html )

I came across this doc https://samza.apache.org/learn/documentation/0.13/jobs/reprocessing.html which talks about reprocessing previously processed data but it is not offering any acknowledge based checkpoint control.

Found relate issues https://github.com/zendesk/ruby-kafka/issues/304

1

1 Answers

3
votes

Sid,

You can perform manual commit from your StreamTask. If you set task.commit.ms is set to -1, then auto-commit in the Samza job is disabled. In such a case, the task should manually trigger a commit by invoking taskCoordinator.commit() when you are ready to acknowledge.

You can find documentation regarding checkpointing here - http://samza.apache.org/learn/documentation/0.13/container/checkpointing.html. I believe documentation is insufficient in the website as it doesn't elaborately cover the manual commit scenario. The configuration table also needs to be updated so that it is obvious to the user that manual commit is supported.

HTH :)