Is there a way to rewind the offset in Structured Streaming? I am using Spark 3 with startingOffsets set to earliest, and every restart after the first run picks up its offsets from the checkpoint directory.

For example: the current offset in Kafka is 1000 and the committed offset in the checkpoint directory is 900. I want to re-consume from offset 800. How can I achieve this?

If I cancel the current run and reset the offset for the consumer group using the command below, will Structured Streaming pick up the offset from there on restart instead of using the checkpoint directory?

kafka-consumer-groups.sh --bootstrap-server <broker hostname> \
  --group <consumer group> --reset-offsets --to-offset 800 \
  --topic <topicName>:<partition number> \
  --execute

1 Answer

Spark Structured Streaming does not commit offsets back to Kafka; it only tracks the consumed offsets in its checkpoint files.

That means using the kafka-consumer-groups.sh tool will not help.

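If you want to start reading from offset 800, you need to delete your checkpoint files (or point the query at a new checkpoint location) and set the readStream option startingOffsets to a JSON string of per-partition offsets, as described in the Structured Streaming + Kafka Integration Guide. Note that startingOffsets only applies when a new query starts without an existing checkpoint; a resumed query always continues from where it left off.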
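
As a rough illustration of the startingOffsets approach, here is a minimal PySpark sketch. The helper names (starting_offsets_json, read_from_offsets) are made up for this example, and it assumes an existing SparkSession with the Kafka connector on the classpath. As far as I know, when you pass explicit offsets as JSON you have to list every partition of the subscribed topic, so treat the single-partition call below as a simplification.

```python
import json


def starting_offsets_json(topic, partition_offsets):
    """Build the JSON string for the Kafka source's startingOffsets option.

    partition_offsets maps partition number -> offset to start reading from.
    (Hypothetical helper, not part of Spark.)
    """
    # JSON object keys must be strings, so partition numbers are converted.
    return json.dumps(
        {topic: {str(p): int(o) for p, o in partition_offsets.items()}}
    )


def read_from_offsets(spark, brokers, topic, partition_offsets):
    """Return a streaming DataFrame that starts at the given offsets.

    `spark` is assumed to be an existing SparkSession. The offsets are only
    honoured if the query is later started with a fresh (empty) checkpoint
    location; an existing checkpoint takes precedence.
    """
    return (
        spark.readStream.format("kafka")
        .option("kafka.bootstrap.servers", brokers)
        .option("subscribe", topic)
        .option("startingOffsets", starting_offsets_json(topic, partition_offsets))
        .load()
    )


# Partition 0 of "topicName" restarts at offset 800.
print(starting_offsets_json("topicName", {0: 800}))
# prints {"topicName": {"0": 800}}
```

The writeStream side should then use a checkpointLocation that does not contain the old offsets, otherwise the restart will resume from 900 as before.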