My Flink job reads from Kafka using FlinkKafkaConsumer010 and sinks into HDFS using a CustomBucketingSink. The pipeline is a series of transformations: kafka -> flatMaps (2-3 transformations) -> keyBy -> tumblingWindow (5 mins) -> aggregation -> hdfsSink. The Kafka input is around 3 million events/min on average and around 20 million events/min at peak. The checkpoint interval and the minimum pause between two checkpoints are both 3 minutes, and I am using FsStateBackend.
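For reference, here is a minimal sketch of how I wire the job together, assuming Flink 1.x with the Kafka 0.10 and filesystem connectors. Class and topic names (KafkaToHdfsJob, input-topic, the HDFS paths) are placeholders, the anonymous flatMap/keyBy/reduce stand in for the real transformations, and the stock BucketingSink stands in for our CustomBucketingSink:

```java
import java.util.Properties;

import org.apache.flink.api.common.functions.FlatMapFunction;
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.api.java.functions.KeySelector;
import org.apache.flink.runtime.state.filesystem.FsStateBackend;
import org.apache.flink.streaming.api.CheckpointingMode;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.windowing.time.Time;
import org.apache.flink.streaming.connectors.fs.bucketing.BucketingSink;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer010;
import org.apache.flink.util.Collector;

public class KafkaToHdfsJob {

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Checkpoint every 3 minutes with a 3-minute minimum pause, state kept in FsStateBackend.
        env.enableCheckpointing(3 * 60 * 1000, CheckpointingMode.EXACTLY_ONCE);
        env.getCheckpointConfig().setMinPauseBetweenCheckpoints(3 * 60 * 1000);
        env.setStateBackend(new FsStateBackend("hdfs:///flink/checkpoints"));

        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "kafka:9092");
        props.setProperty("group.id", "hdfs-sink-job");

        DataStream<String> source = env.addSource(
                new FlinkKafkaConsumer010<>("input-topic", new SimpleStringSchema(), props));

        DataStream<String> aggregated = source
                // Stand-in for the 2-3 flatMap transformations in the real job.
                .flatMap(new FlatMapFunction<String, String>() {
                    @Override
                    public void flatMap(String value, Collector<String> out) {
                        out.collect(value.trim());
                    }
                })
                // Key extraction is a placeholder; the real job keys on a domain field.
                .keyBy(new KeySelector<String, String>() {
                    @Override
                    public String getKey(String value) {
                        return value.isEmpty() ? "empty" : value.substring(0, 1);
                    }
                })
                // 5-minute tumbling window followed by an aggregation (reduce as a stand-in).
                .timeWindow(Time.minutes(5))
                .reduce((a, b) -> a + "," + b);

        // The real job uses a CustomBucketingSink; BucketingSink stands in here.
        aggregated.addSink(new BucketingSink<String>("hdfs:///data/output"));

        env.execute("kafka-to-hdfs-sketch");
    }
}
```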
Here are my assumptions:
Flink consumes some fixed number of events from Kafka (multiple offsets from multiple partitions at once), waits until they reach the sink, and then checkpoints. On success it commits the Kafka partition offsets it read and maintains some state related to the HDFS file it was writing. While the transformations run after Kafka hands events over to the other operators, the Kafka consumer sits idle until it gets confirmation of success for the events it sent. So we can say that while the sink is writing data to HDFS, all the previous operators are sitting idle. In case of failure, Flink goes back to the previous checkpoint state, points to the last committed Kafka partition offsets, and points to the HDFS file offset from which it should resume writing.
Here are my doubts based on the above assumptions:
1) Is the above assumption correct?
2) Does it make sense for the tumbling window to keep state, since in case of failure we restart from the last committed Kafka partition offset anyway?
3) If the tumbling window does keep state, when can this state be used by Flink?
4) Why do the checkpoint and savepoint state sizes vary?
5) In case of any failure, does Flink always restart from the source operator?