I came across the post "Scaling Klaviyo's Event Processing Pipeline with Stream Processing". In it, the team at Klaviyo counts events over different timeframes: hourly, daily, even monthly.
I have a couple of questions. If I understand correctly, they're using time windows, but is it normal to use a time window that long, e.g. a whole day?!
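To make the question concrete, here's a minimal Python sketch of what I understand a daily tumbling window to mean: each event falls into exactly one calendar-day bucket, and the bucket's counter is updated incrementally as events arrive (the bucketing function and counter here are my own illustration, not from the post):

```python
from datetime import datetime, timezone
from collections import defaultdict

def day_bucket(ts_ms: int) -> str:
    # Tumbling daily window: group events by their UTC calendar day.
    return datetime.fromtimestamp(ts_ms / 1000, tz=timezone.utc).strftime("%Y-%m-%d")

counts = defaultdict(int)

def on_event(key: str, ts_ms: int) -> None:
    # Incremental update: each arriving event bumps one bucket's counter,
    # so a running daily count is available at any moment, not only after
    # a batch job closes out the day.
    counts[(key, day_bucket(ts_ms))] += 1
```

If that's roughly what a daily window does, the only advantage I can see over batch is that the count is continuously up to date before the day ends.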
That doesn't make sense to me: if you're doing daily or monthly counts, why not use batch processing? What is the fundamental benefit of streaming in that case?
A different case: if I need to count Kafka events from the very beginning, in real time, what is the real-world solution? Use Flink streaming to update a "counter" in Redis every time an event arrives? If the Kafka topic is quite busy, say several million messages per second, wouldn't that cause too much IO and network traffic?
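I imagine the answer involves pre-aggregating in memory and flushing deltas periodically, so millions of per-event writes become a handful of INCRBY-style calls per key per interval. Something like this sketch (the class, sink, and flush interval are purely illustrative, not anything from the post):

```python
import time
from collections import Counter

class BatchedCounter:
    """Buffer counts in memory and flush them as deltas on an interval,
    trading a little staleness for far fewer writes to the backing store."""

    def __init__(self, sink, flush_interval_s: float = 1.0):
        self.sink = sink                    # e.g. a Redis client's incrby
        self.flush_interval_s = flush_interval_s
        self.pending = Counter()            # in-memory pre-aggregation
        self.last_flush = time.monotonic()

    def on_event(self, key: str) -> None:
        self.pending[key] += 1
        if time.monotonic() - self.last_flush >= self.flush_interval_s:
            self.flush()

    def flush(self) -> None:
        # One network call per distinct key per interval, regardless of
        # how many events arrived for that key in between.
        for key, delta in self.pending.items():
            self.sink(key, delta)
        self.pending.clear()
        self.last_flush = time.monotonic()
```

Is this kind of micro-batching before the sink how it's actually done in practice, or is there a better pattern?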