I am reading events from PubSub and the goal is to group them into windows. I would like to make the end of each window coincide with the minutes 0, 15, 30 and 45 of each hour.
Since this is a streaming job, it could be launched at any time, and I would like to find a way to align the size of the first window with the next ones.
This would be the stream:
- Launch the job
- Define as
window_size
the time remaining between this moment and the next quarter of an hour - Starting from the end of this first window, set the
window_size = int(15*60)
(seconds).
For example:
- Launch the job
- Now it's 11:18, so fix
window_size = (11:30-11:18).seconds
- When this first window will end, set
window_size = int(15*60)
(seconds)
In one of the examples provided by Google, the pipeline working with windowing is defined as follows, where window_size
is a parameter passed as input by the user:
def expand(self, pcoll):
return (
pcoll
| "Window into Fixed Intervals" >> beam.WindowInto(window.FixedWindows(self.window_size))
| "Add Key" >> beam.Map(lambda elem: (None, elem))
| "Groupby" >> beam.GroupByKey()
| "Abandon Key" >> beam.MapTuple(lambda _, val: val)
)