I have a Flink job that reads user events, applies session windows, and writes back to Kafka.
The state backend that I'm using is S3 (no HDFS cluster, just using the libraries).
The problem is that the end-to-end checkpoint duration keeps rising until checkpoints are dropped, and most of that time is spent on "Alignment".
My question is: why does this happen, and how can I solve it without setting the checkpointing mode to AT_LEAST_ONCE?