2
votes

I use mapWithState to keep a pair per key: the key is a String and the state is an object that contains an array. I update the array whenever a new batch in the stream contains the same key. Is there a possibility that the array will be updated twice if the Spark app runs on multiple nodes, or does Spark let only one node at a time update the state for a given key? I don't know exactly how the mapWithState execution model works.

Thank you!

1 Answer

1
votes

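The function you pass to StateSpec is called once for each key-value pair, so a key's state can be updated several times within a single batch. Those updates do not conflict, though: the state is partitioned by key, each partition is processed by a single task, and the updates within a partition are applied sequentially. If conflicting updates from multiple nodes are what you're worried about, they cannot happen.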
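
To illustrate why key-based partitioning rules out conflicting updates, here is a rough sketch in plain Java. It is not Spark code and does not use the Spark API; `KeyedStateModel`, `partitionFor` and `processBatch` are hypothetical names made up for this example. It assumes hash partitioning of keys (which, as far as I know, is what mapWithState uses when no partitioner is set on the StateSpec) and models each partition's task as one thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Plain-Java model of key-partitioned state. Illustration only, NOT Spark code.
class KeyedStateModel {

    // Hash partitioning: a given key always maps to the same partition.
    static int partitionFor(String key, int numPartitions) {
        return Math.floorMod(key.hashCode(), numPartitions);
    }

    // Processes one batch with one thread ("task") per partition.
    // Returns the per-key arrays after all updates have been applied.
    static Map<String, List<Integer>> processBatch(
            List<Map.Entry<String, Integer>> batch, int numPartitions) {
        // Route every record to the partition that owns its key.
        List<List<Map.Entry<String, Integer>>> partitions = new ArrayList<>();
        for (int i = 0; i < numPartitions; i++) {
            partitions.add(new ArrayList<>());
        }
        for (Map.Entry<String, Integer> rec : batch) {
            partitions.get(partitionFor(rec.getKey(), numPartitions)).add(rec);
        }

        // The map is shared, but each key's list is only ever touched by the
        // single thread that owns the key's partition, so a plain ArrayList is enough.
        Map<String, List<Integer>> state = new ConcurrentHashMap<>();
        List<Thread> tasks = new ArrayList<>();
        for (List<Map.Entry<String, Integer>> partition : partitions) {
            Thread task = new Thread(() -> {
                // Updates inside one partition are applied one after another.
                for (Map.Entry<String, Integer> rec : partition) {
                    state.computeIfAbsent(rec.getKey(), k -> new ArrayList<>())
                         .add(rec.getValue());
                }
            });
            tasks.add(task);
            task.start();
        }
        for (Thread task : tasks) {
            try {
                task.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for tasks", e);
            }
        }
        return state;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> batch =
                List.of(Map.entry("a", 1), Map.entry("b", 2), Map.entry("a", 3));
        System.out.println(processBatch(batch, 4));
    }
}
```

Both records for key "a" land in the same partition, so one thread appends 1 and then 3 to that key's array. No other thread ever sees that array, which is the same reason two nodes cannot update the state of one key at the same time.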