0 votes

I am trying to build a Flink job that reads data from a Kafka source, does some processing including a few REST calls, and finally sinks into another Kafka topic.

The problem I am trying to address is message retries. What if there are transient errors in the REST API? How can I retry these messages with exponential backoff, the way Storm supports?

I can think of two approaches:

  1. Use TimerService to schedule retries (a rough sketch follows this list), but in case of prolonged failures the state will start to grow uncontrollably.
  2. Write failed messages to a different Kafka topic and reprocess them after a delay of sorts, but what happens if that sink itself is down for a few minutes?
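For context, approach #1 might look roughly like the sketch below. It assumes String records and a hypothetical callRest() helper (names are illustrative, not any real API); the pending element plus its attempt counter is the per-key state that keeps accumulating while the endpoint stays unhealthy.

```java
import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.functions.KeyedProcessFunction;
import org.apache.flink.util.Collector;

public class TimerRetryFunction extends KeyedProcessFunction<String, String, String> {

    private transient ValueState<String> pending;    // element waiting to be retried
    private transient ValueState<Integer> attempts;  // retry count for the pending element

    @Override
    public void open(Configuration parameters) {
        pending = getRuntimeContext().getState(
                new ValueStateDescriptor<>("pending", String.class));
        attempts = getRuntimeContext().getState(
                new ValueStateDescriptor<>("attempts", Integer.class));
    }

    @Override
    public void processElement(String value, Context ctx, Collector<String> out) throws Exception {
        tryOnce(value, ctx, out);
    }

    @Override
    public void onTimer(long timestamp, OnTimerContext ctx, Collector<String> out) throws Exception {
        String value = pending.value();
        if (value != null) {
            tryOnce(value, ctx, out);
        }
    }

    private void tryOnce(String value, Context ctx, Collector<String> out) throws Exception {
        try {
            out.collect(callRest(value));   // hypothetical (and blocking) REST call
            pending.clear();
            attempts.clear();
        } catch (Exception transientError) {
            int n = attempts.value() == null ? 0 : attempts.value();
            attempts.update(n + 1);
            pending.update(value);          // this is the state that grows under failure
            long backoffMs = Math.min(60_000L, 1_000L << Math.min(n, 6)); // capped exponential backoff
            ctx.timerService().registerProcessingTimeTimer(
                    ctx.timerService().currentProcessingTime() + backoffMs);
        }
    }

    private String callRest(String value) {
        // Placeholder for the real REST client call.
        return value;
    }
}
```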

Is there a better, more robust, and simpler way to achieve this?

1
For approach #2, is your Kafka cluster down so often that simply relying on Flink to fail if the sink is down and then restart from a checkpoint isn't good enough? – David Anderson

1 Answer

1 vote

I would use Flink's AsyncFunction to make the REST calls. If needed, it will backpressure the source(s) rather than use more than a configured amount of state. For retries, see AsyncFunction retries.
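A minimal sketch of what that could look like, assuming String records and Java 11's built-in HttpClient for the non-blocking call; the class name, endpoint URL, timeout, and capacity values are illustrative, not prescribed by Flink:

```java
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.functions.async.ResultFuture;
import org.apache.flink.streaming.api.functions.async.RichAsyncFunction;

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.Collections;

public class RestEnrichFunction extends RichAsyncFunction<String, String> {

    private transient HttpClient httpClient;

    @Override
    public void open(Configuration parameters) {
        httpClient = HttpClient.newHttpClient();
    }

    @Override
    public void asyncInvoke(String input, ResultFuture<String> resultFuture) {
        // Hypothetical REST endpoint; the call itself must be non-blocking.
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("https://example.com/enrich?key=" + input))
                .GET()
                .build();

        httpClient.sendAsync(request, HttpResponse.BodyHandlers.ofString())
                .thenAccept(resp -> resultFuture.complete(Collections.singleton(resp.body())))
                .exceptionally(err -> {
                    // Completing the future exceptionally either fails the job (restart
                    // from the last checkpoint) or triggers a configured retry strategy.
                    resultFuture.completeExceptionally(err);
                    return null;
                });
    }
}
```

Wiring it in with the built-in retry support (available since roughly Flink 1.15; exact class locations may differ between versions, so check the async I/O docs for your release) could then look like the fragment below. A fixed-delay strategy is shown; newer versions also ship an exponential-backoff builder (ExponentialBackoffDelayRetryStrategyBuilder) in the same utility class, which matches the original backoff requirement.

```java
import org.apache.flink.streaming.api.datastream.AsyncDataStream;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.util.retryable.AsyncRetryStrategies;
import org.apache.flink.streaming.util.retryable.RetryPredicates;

import java.util.concurrent.TimeUnit;

// Retry up to 3 times, 1s apart, whenever the async call completed exceptionally.
var retryStrategy = new AsyncRetryStrategies.FixedDelayRetryStrategyBuilder<String>(3, 1000L)
        .ifException(RetryPredicates.HAS_EXCEPTION_PREDICATE)
        .build();

DataStream<String> enriched = AsyncDataStream.unorderedWaitWithRetry(
        kafkaSource,                 // the DataStream<String> read from the Kafka source
        new RestEnrichFunction(),
        30, TimeUnit.SECONDS,        // overall timeout per element, including retries
        100,                         // max in-flight requests; beyond this, backpressure
        retryStrategy);
```

The capacity argument is what bounds the amount of in-flight work: once that many requests are outstanding, the operator backpressures the Kafka source instead of buffering more state.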