After reading the flink docs, (relevant part noted below) I still did not completely understand the atomicity and key distribution.
i.e consider a graph consisting of keyby->flatmap(containing a map state), and parallelism set as 1 with 4 task slots, does flink ensure that each key exists only once (in one task slot) in the distributed environment, and is it the atomic unit? Thanks in advance to all helpers.
You can think of Keyed State as Operator State that has been partitioned, or sharded, with exactly one state-partition per key. Each keyed-state is logically bound to a unique composite of
<parallel-operator-instance, key>
, and since each key “belongs” to exactly one parallel instance of a keyed operator, we can think of this simply as<operator, key>
.Keyed State is further organized into so-called Key Groups. Key Groups are the atomic unit by which Flink can redistribute Keyed State; there are exactly as many Key Groups as the defined maximum parallelism. During execution each parallel instance of a keyed operator works with the keys for one or more Key Groups.