1
votes

I need to process an infinite incoming stream and send the results to an external server, for example via REST. I need an "exactly-once" guarantee. Can I achieve this with Hazelcast Jet?

The docs say:

As of version 0.6, Hazelcast Jet supports exactly-once processing with the source being either a Hazelcast IMap or a Kafka topic, and the sink being a Hazelcast IMap.

It's OK for me to use an IMap as the sink, but I'm confused about how to "extract" new data from it. Is there an "exactly-once" IMap event listener?


1 Answer

3
votes

Fault tolerance never provides an "execute exactly once" guarantee; that is not possible. If a cluster member crashes, you don't know whether it executed the REST operation or not. Even if the REST operation itself appeared to fail, it might have been executed remotely with only the response delivery failing; you can't tell.

Rather, if operations fail, they are retried. Internal Jet vertices, such as the window accumulators, save all their state to the snapshot. In other words, those vertices have no state that is not saved to the snapshot. So, if a job is restarted, the actions performed after the last snapshot are discarded and the state is restored as if those actions had never been performed. That's why we can call it "exactly once".
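To make the snapshot-and-replay idea concrete, here is a toy sketch in plain Java. It is not Jet's API: `SnapshotReplayDemo`, `State`, `process` and `runWithCrash` are hypothetical names, and the "source" is just a list that can be re-read from a saved offset (the role a replayable source such as a Kafka topic plays). It only shows that discarding post-snapshot work and replaying from the saved offset gives the same result as a run without failure.

```java
import java.util.List;

class SnapshotReplayDemo {

    // Toy processor state: a running sum plus the offset of the next item to read.
    // Saving both together is what makes the restart consistent.
    static final class State {
        final long sum;
        final int offset;

        State(long sum, int offset) {
            this.sum = sum;
            this.offset = offset;
        }
    }

    // Consume source items from state.offset (inclusive) to upTo (exclusive).
    static State process(State state, List<Integer> source, int upTo) {
        long sum = state.sum;
        for (int i = state.offset; i < upTo; i++) {
            sum += source.get(i);
        }
        return new State(sum, upTo);
    }

    // Take a snapshot at snapshotAt, "crash" at crashAt, then restart from the snapshot.
    static long runWithCrash(List<Integer> source, int snapshotAt, int crashAt) {
        State snapshot = process(new State(0, 0), source, snapshotAt);
        // Work done after the snapshot is lost in the crash, so its result is dropped.
        process(snapshot, source, crashAt);
        // Restart: restore the snapshot and replay the source from the saved offset.
        return process(snapshot, source, source.size()).sum;
    }

    public static void main(String[] args) {
        List<Integer> source = List.of(1, 2, 3, 4, 5);
        long withoutFailure = process(new State(0, 0), source, source.size()).sum;
        long withFailure = runWithCrash(source, 2, 4);
        System.out.println(withoutFailure); // 15
        System.out.println(withFailure);    // 15
    }
}
```

Items 3 and 4 are processed twice in the failing run, but because the first attempt's state is thrown away, they are counted once in the result.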

However, this is not generally possible with sinks: if we execute a REST operation, there's no way to reset the remote side to a state as if the operation had never been executed. If there were, you could write an exactly-once sink.

We call the IMap sink exactly-once because if you execute map.put("key", "value") multiple times, the value for key "key" will still be "value". This is called idempotence. Even though the put operation might be executed multiple times, the effect is as if it was executed once.
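A small sketch of the difference, using a plain `java.util.HashMap` as a stand-in for the IMap so that it runs without a cluster (that substitution, and the names `IdempotentPutDemo`, `idempotentWrite` and `nonIdempotentWrite`, are assumptions for illustration). The loop simulates the same write being replayed three times after restarts.

```java
import java.util.HashMap;
import java.util.Map;

class IdempotentPutDemo {

    // Idempotent: writing the same key/value again leaves the map unchanged.
    static void idempotentWrite(Map<String, String> map) {
        map.put("key", "value");
    }

    // Not idempotent: every replay changes the stored value.
    static void nonIdempotentWrite(Map<String, Integer> counters) {
        counters.merge("key", 1, Integer::sum);
    }

    public static void main(String[] args) {
        Map<String, String> map = new HashMap<>();
        Map<String, Integer> counters = new HashMap<>();
        // Simulate a restarted job replaying the same write three times.
        for (int i = 0; i < 3; i++) {
            idempotentWrite(map);
            nonIdempotentWrite(counters);
        }
        System.out.println(map);      // {key=value}
        System.out.println(counters); // {key=3}
    }
}
```

The put ends in the same state no matter how often it is replayed, while the increment shows what a non-idempotent sink operation would do under retries.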

Idempotence might be the way to go for your REST service. It can be implemented, for example, by ignoring duplicates. There's no way to solve it with IMap: even if you were somehow able to "listen exactly once", the REST operation might fail and you wouldn't know for sure whether it was executed on the remote side or not.
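One possible way to "ignore duplicates" on the receiving side is sketched below. Everything here is hypothetical: `DedupingReceiver`, `handle` and the request ID are illustrative names, not part of Jet or any REST framework. The sketch assumes the sender attaches a deterministic ID to each result (for example, derived from the event key and window timestamp), so that a retried request carries the same ID as the original.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class DedupingReceiver {
    // IDs of requests whose effect has already been applied.
    private final Set<String> seenIds = new HashSet<>();
    // Stand-in for the real side effect of the REST operation.
    private final List<String> applied = new ArrayList<>();

    // Returns true if the payload was applied, false if the ID was a duplicate.
    synchronized boolean handle(String requestId, String payload) {
        if (!seenIds.add(requestId)) {
            return false; // already processed: ignore the retry
        }
        applied.add(payload);
        return true;
    }

    synchronized List<String> applied() {
        return new ArrayList<>(applied);
    }

    public static void main(String[] args) {
        DedupingReceiver receiver = new DedupingReceiver();
        System.out.println(receiver.handle("window-1", "sum=10")); // true
        // The sender never saw the response and retries after a restart.
        System.out.println(receiver.handle("window-1", "sum=10")); // false
        System.out.println(receiver.handle("window-2", "sum=7"));  // true
        System.out.println(receiver.applied());                    // [sum=10, sum=7]
    }
}
```

In a real service the set of seen IDs would have to be stored durably, and updated atomically with the effect itself; otherwise a crash of the receiver reintroduces the same uncertainty.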