I have been researching Flink for more than a week now. We consume events from Kafka, and events belonging to a specific object id need to be processed in event-time order. So far my research suggests I should be using keyBy and timeWindows. Is my understanding correct?
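To make the first question concrete, here is a plain-Java sketch (no Flink dependency) of the property I am counting on from keyBy: every event with the same key is routed to the same parallel subtask, so per-key order is preserved. The hash function below is only an illustration; Flink's real key-group hashing is different, but the one-key-to-one-subtask property is the same.

```java
import java.util.*;

public class KeyRouting {
    // Illustrative routing only: Flink actually hashes keys into key
    // groups, but the guarantee is the same: one key -> one subtask.
    static int subtaskFor(String objectId, int parallelism) {
        // non-negative hash modulo parallelism
        return Math.floorMod(objectId.hashCode(), parallelism);
    }

    public static void main(String[] args) {
        int parallelism = 4;
        int first = subtaskFor("agent-42", parallelism);
        // every event for "agent-42" lands on the same subtask
        for (int i = 0; i < 10; i++) {
            if (subtaskFor("agent-42", parallelism) != first)
                throw new AssertionError("key moved between subtasks");
        }
        System.out.println("agent-42 always routed to subtask " + first);
    }
}
```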
Another question: when one TaskManager goes down, will only the events belonging to that TaskManager stop being processed until it comes back up? Is the checkpoint mechanism aware of the events that have not yet been processed, and how does it request those events from Kafka?
Question with a use case below:
In a call center, agents receive calls and move through different states. For every agent action (say login, idle, busy, etc.) we get an agent event carrying that state through Kafka. The requirement is to process the events in order per agent: we cannot process an agent's idle event before the login event. We need to process these in order and at the same time be able to scale out.
In a Flink cluster with parallel processing, we must not end up processing one agent's information in different partitions/task slots, which would leave that agent in a bad state. My question is: would keyBy(agentId) divide the stream into substreams and always process each substream in a designated partition, so that the order of event processing is maintained?
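To illustrate the per-agent ordering I need, here is a plain-Java sketch. In Flink the same logic would sit in a KeyedProcessFunction after keyBy(agentId), with the map below replaced by keyed state; the event and state names are made up for the example.

```java
import java.util.*;

public class AgentStateMachine {
    // agentId -> last known state; stands in for Flink keyed state
    static final Map<String, String> state = new HashMap<>();

    // returns true if the transition is legal, false for an
    // out-of-order event (e.g. IDLE before LOGIN)
    static boolean process(String agentId, String event) {
        String current = state.getOrDefault(agentId, "LOGGED_OUT");
        // an agent must log in before going idle or busy
        if (current.equals("LOGGED_OUT") && !event.equals("LOGIN"))
            return false;
        state.put(agentId, event.equals("LOGIN") ? "LOGGED_IN" : event);
        return true;
    }

    public static void main(String[] args) {
        System.out.println(process("a1", "IDLE"));  // false: not logged in yet
        System.out.println(process("a1", "LOGIN")); // true
        System.out.println(process("a1", "IDLE"));  // true
    }
}
```

If events for one agent were split across subtasks, two subtasks could each see a partial history and both reject or misorder transitions, which is exactly why I want keyBy to pin each agentId to one subtask.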
Also, another question: if there is an exception, or the TaskManager processing a particular agent's data goes down, how does Flink know to request only that agent's events from Kafka after recovery?
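My current understanding of the recovery part, as a plain-Java sketch (no Flink; the topic contents, crash point, and checkpoint interval are all made up): Flink does not track unprocessed events individually. The Kafka source stores the consumed offsets inside each checkpoint, and on recovery it seeks back to the last checkpointed offset, so in-flight events are simply re-read from Kafka and replayed.

```java
import java.util.*;

public class OffsetReplay {
    static List<String> consume(List<String> topic) {
        int checkpointedOffset = 0;
        List<String> processed = new ArrayList<>();

        int offset = 0;
        while (offset < topic.size()) {
            processed.add(topic.get(offset));
            offset++;
            // take a "checkpoint" every 2 events: snapshot the offset
            if (offset % 2 == 0) checkpointedOffset = offset;
            if (offset == 3) break; // simulate a TaskManager crash here
        }

        // recovery: discard uncheckpointed work, seek back to the
        // checkpointed offset, and replay from Kafka
        processed.subList(checkpointedOffset, processed.size()).clear();
        for (int i = checkpointedOffset; i < topic.size(); i++) {
            processed.add(topic.get(i));
        }
        return processed;
    }

    public static void main(String[] args) {
        // prints [login, idle, busy, idle, logout]: every event is
        // processed despite the crash, with replay from the checkpoint
        System.out.println(consume(List.of("login", "idle", "busy", "idle", "logout")));
    }
}
```

What I am still unsure about is whether replay happens only for the keys/partitions owned by the failed TaskManager or for the whole job, which is the heart of my question.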