I am using Spring Boot 2.1.1.RELEASE and Spring Cloud Greenwich.RC2, and the managed version for spring-cloud-stream-binder-kafka is 2.1.0RC4. The Kafka version is 1.1.0. I have set the following properties as the messages should not be consumed if there is an error.
spring.cloud.stream.bindings.input.group=consumer-gp-1
...
spring.cloud.stream.kafka.bindings.input.consumer.autoCommitOnError=false
spring.cloud.stream.kafka.bindings.input.consumer.enableDlq=false
spring.cloud.stream.bindings.input.consumer.max-attempts=3
spring.cloud.stream.bindings.input.consumer.back-off-initial-interval=1000
spring.cloud.stream.bindings.input.consumer.back-off-max-interval=3000
spring.cloud.stream.bindings.input.consumer.back-off-multiplier=2.0
....
There are 20 partitions in the Kafka topic and Kerberos is used for authentication (not sure if this is relevant).
The Kafka consumer is calling a web service for every message it processes, and if the web service is unavailable then I expect that the consumer will then try to process the message for 3 times before it moves on to the next message. So for my test, I disabled the webservice, and therefore none of the message could be processed correctly. From the logs I can see that this is happening.
After a while I stopped and then restarted the Kafka consumer (webservice is still disabled). I was expecting that after the restart of the Kafka consumer, it would attempt to process the messages that was not successfully processed the first time around. From the logs (I printed out each message with its fields) after the restart of the Kafka Consumer I couldn't see this happening. I thought the partition might be influencing something, but I check the logs and all 20 partitions were assigned to this single consumer.
Is there a property I have missed? I thought the expected behavior when I restart the consumer the second time, is that Kafka broker would pass the records that were not successfully processed to the consumer again.
Thanks
>Then after retrying many records 3 times, record 3 is retried again but not record 1 or 2.
That part makes no sense; the running app should not go back to "record 3". Perhaps edit the question to add some logs to show the behavior you are seeing. – Gary Russell