Fault tolerance never provides an "execute exactly once" guarantee; that is not possible. If a cluster member crashes, you don't know whether it executed the REST operation or not. Even if the REST operation appears to have failed, it might have been executed remotely, with only the delivery of the response failing. You can't tell the difference.
Instead, operations that fail are retried. Internal Jet vertices, such as the window accumulators, save all their state to the snapshot. In other words, these vertices have no state that isn't saved to the snapshot. So if a job is restarted, the actions performed after the last snapshot are discarded and the state is restored as if those actions had never been performed. That's why we can call it "exactly once".
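To illustrate the restore-and-replay idea, here is a toy model in plain Java. It is not the Jet API: `SummingVertex`, `run()` and the crash simulation are hypothetical. It also assumes that the source position is stored alongside the vertex state, so the source can re-emit the items processed after the snapshot.

```java
import java.util.List;

// Toy model of a stateful vertex that saves its whole state to a snapshot.
// Not the Jet API: SummingVertex, run() and the replay logic are hypothetical.
public class SnapshotDemo {
    // The running sum is the vertex's entire state.
    static final class SummingVertex {
        long sum;

        void process(long item) { sum += item; }

        long snapshot() { return sum; }

        void restore(long saved) { sum = saved; }
    }

    // Processes 'input', takes a snapshot after 'snapshotAfter' items and
    // simulates a crash after 'crashAfter' items, then restarts from the snapshot.
    static long run(List<Long> input, int snapshotAfter, int crashAfter) {
        SummingVertex vertex = new SummingVertex();
        long savedState = 0;
        int savedOffset = 0; // assumed: source position stored together with the state

        for (int i = 0; i < crashAfter; i++) {
            vertex.process(input.get(i));
            if (i + 1 == snapshotAfter) {
                savedState = vertex.snapshot();
                savedOffset = i + 1;
            }
        }

        // Restart: the in-memory state is lost, so everything done after the
        // snapshot is discarded. Restore the snapshot and replay from the saved offset.
        vertex = new SummingVertex();
        vertex.restore(savedState);
        for (int i = savedOffset; i < input.size(); i++) {
            vertex.process(input.get(i));
        }
        return vertex.sum;
    }

    public static void main(String[] args) {
        List<Long> input = List.of(1L, 2L, 3L, 4L, 5L);
        // Items 3 and 4 are processed twice, yet the result equals a failure-free run.
        System.out.println(run(input, 2, 4)); // 15
    }
}
```

Items 3 and 4 are physically processed twice, but because their first effect is thrown away with the lost in-memory state, the final state is the same as if each item had been processed once.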
However, this is not generally possible with sinks: once we execute a REST operation, there's no way to reset the remote side to a state as if the operation had never been executed. If there were, you could write an exactly-once sink.
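The following sketch shows why retrying a sink operation is not enough when the remote side has no way to undo it. The `RemoteCounter` service is hypothetical: its increment takes effect, then the reply is lost, and the caller's retry applies it a second time.

```java
import java.io.IOException;

// Hypothetical remote service with a non-idempotent operation (an increment).
public class RetryDemo {
    static final class RemoteCounter {
        int value;

        // The operation takes effect first; 'loseResponse' then simulates the
        // reply being lost, so the caller sees a failure.
        void increment(boolean loseResponse) throws IOException {
            value++;
            if (loseResponse) {
                throw new IOException("response lost");
            }
        }
    }

    // Retries until a response arrives; the first 'lostResponses' replies are lost.
    // Returns the number of attempts made.
    static int callWithRetry(RemoteCounter remote, int lostResponses) {
        int attempt = 0;
        while (true) {
            try {
                remote.increment(attempt < lostResponses);
                return attempt + 1;
            } catch (IOException e) {
                attempt++; // from the caller's side, indistinguishable from "not executed"
            }
        }
    }

    public static void main(String[] args) {
        RemoteCounter remote = new RemoteCounter();
        int attempts = callWithRetry(remote, 1);
        // One logical increment was requested, but the remote side applied it twice.
        System.out.println("attempts=" + attempts + ", value=" + remote.value);
    }
}
```

Running it prints `attempts=2, value=2`: the caller only ever saw one success, yet the remote state reflects two executions.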
We call the IMap sink exactly-once because if you execute `map.put("key", "value")` multiple times, the value for key `"key"` will still be `"value"`. This is called idempotence: even though the `put` operation might be executed multiple times, the effect is as if it had been executed once.
Idempotence might be the way to go for your REST service. It can be implemented, for example, by having the service ignore duplicate requests. There's no way to solve this with IMap: even if you were somehow able to "listen exactly once", the REST operation might still fail, and you wouldn't know for sure whether it was executed on the remote side or not.
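One possible way to ignore duplicates is for the service to remember which requests it has already applied. The sketch below assumes the client sends a unique request ID with each logical operation; `DepositService` and the ID format are hypothetical.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical REST-side handler: a non-idempotent deposit is made idempotent
// by remembering which request IDs have already been applied.
public class DedupDemo {
    static final class DepositService {
        private final Set<String> processedIds = new HashSet<>();
        private long balance;

        // Applies the deposit only the first time a given requestId is seen;
        // a duplicate (e.g. a retry after a lost response) is acknowledged
        // with the current balance but not applied again.
        synchronized long deposit(String requestId, long amount) {
            if (processedIds.add(requestId)) {
                balance += amount;
            }
            return balance;
        }

        synchronized long balance() {
            return balance;
        }
    }

    public static void main(String[] args) {
        DepositService service = new DepositService();
        service.deposit("req-1", 100);
        service.deposit("req-1", 100); // retried request: ignored
        service.deposit("req-2", 50);
        System.out.println(service.balance()); // 150
    }
}
```

This sketch glosses over two things. The sender must derive the request ID deterministically from the item (not generate a fresh random ID per attempt), so that a replay after a job restart carries the same ID. And a real service would need to store the processed IDs durably and atomically with the effect, rather than in an in-memory set.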