0
votes

My job does the following things:

  1. Consumes events from Kafka topic based on event time.
  2. Computes a window size of 7 days and in a slide of 1 day.
  3. Sink the results to Redis.

I have several issues:

  1. In case it consumes Kafka events from the lastest record, after 1 day the job is alive, the job closes the window and computes 7 days window. The problem is that the job has only data for 1 day and hence the results are wrong.
  2. If I try to let it consumes the Kafka events from a timestamp of 7 days ago, as the job starts, it calculates the whole windows from the first day, and it took a lot of time. Also, I want just the last window results because this is what matters for me.

Have I missed something? Is there a better way to do that?

1
I am not really sure what is Your problem here. The window with size of 7 days and slide of 1 day means that the window should be closed after 7 days but new window will be created each day. From Your post It seems that the whole 7 day window is closed after 1 day of data received, which should not happen. - Dominik WosiƄski
The problem is when I start the job and the first event arrived with an event time of week ago, flink closes the window and computes it for a week ago behind this event time. - Yair Cohen
Can you add your topology? It might be that your timestamp extractor and watermark assigner are not working as expected. It would also help if you could give a minimal example of what you see and what you expect. - Arvid Heise

1 Answers

3
votes

Flink aligns time windows to the epoch. So if you have windows that are one hour long, they run from the top of the hour to the top of the hour. Day long windows run from midnight to midnight. The same principle applies to windows that are seven days long, and since the epoch began on a Thursday (Jan 1, 1970), a window that is seven days long should close at midnight on Wednesday night / Thursday morning.

You can supply an offset to the window constructor if you want to shift the windows to start at a different time.