Flink shut down hook to minimize data loss/duplication

Question

I have a flink job that reads data from kafka, does some reads from redis and then writes aggregated windowed data to a redis sink (the redis writes are actually calling a lua script loaded into redis that increments existing values, so I can only increment here and not update).

the problem is that when I'll stop the job (maintenance, code changes, etc), even with savepoints, I'm bound to either write duplicate data to redis or lose some data when resuming, because as far as I understand it, redis sink has no guarantees regarding semantics (exactly/at-least/at-most once).

The question is - is there some kind of a shut-down signal that will allow me to cleanly shut down the job to guarantee exactly once semantics?

In other words, what I'm looking for is to:

receive the shut-down signal (from canceling the job?)
stop reads from kafka and commit the offset (already done inside the connector?)
finish processing the remaining data (windows are very short - tumbling windows of 15 seconds, processing time)
write the last output of the last window back to redis
shutdown the job

Is this possible to do? any other ideas on how I can deal with downtime (planned/unplanned) will be welcomed.

David Anderson David Anderson · Accepted Answer · 2017-05-03T12:35:44

Since version 1.2, Flink has a cancel with savepoint operation that's available from both the CLI and the rest API. See the docs and pull request.

Flink shut down hook to minimize data loss/duplication

1 Answers