I have two unbounded streams, each with a 2-minute fixed window and an AfterWatermark.pastEndOfWindow() trigger. After checking the results of an outer join, it seems that the windows are not aligned: Beam aligns the data on the left side of the join, but takes data on the right side across overlapping 4-minute intervals:

Stream A |--|          (observed range after join from window 1)
            |--|       (observed range after join from window 2)
               |--|    (observed range after join from window 3)
Stream B |----|        (observed range after join from window 1)
            |----|     (observed range after join from window 2)
               |----|  (observed range after join from window 3)

So, for example, window 1 contains events from Stream A for the 0-2 minute period (as expected) but events from Stream B for the 0-4 minute period, and window 2 contains events from Stream A for the 2-4 minute period and from Stream B for the 2-6 minute period.

How does Beam decide which window the data goes into when joining two fixed-window streams whose windows are not aligned?

I think that if you'd like to align and join two streams, you should pick a key and apply a GroupByKey, so that you're working on a stream of data grouped by key. If possible, could you please share some code showing how you perform the join? - Zhou Yunqing
@ZhouYunqing I forgot to mention that I apply a group-by to the two streams I join. - jamborta
This still does not clarify how you perform the join. If you are applying a ParDo with one of the streams as a side input, the resulting PCollection should group data by 2-minute windows. If you are using CoGroupByKey, both of your streams must have the same windowing strategy, and the result can only be a stream grouped by 2-minute windows. At this point, it would make things easier if you shared your code. - Lefteris S

1 Answer

If you're using CoGroupByKey to join PCollections, Beam requires them to have roughly the same windowing (the exact notion of equality depends on the implementation of WindowFn#verifyCompatibility). Therefore, the scenario you describe above does not happen with CoGroupByKey.
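
To make the window-alignment argument concrete, here is a minimal plain-Python model of it. This is not Beam code and does not use the Beam API. It assumes fixed windows are assigned as start = timestamp - (timestamp - offset) % size and that the join groups by (key, window). The helper names (fixed_window, co_group_by_key_and_window) and the sample records are hypothetical, chosen only for illustration.

```python
from collections import defaultdict

WINDOW_SIZE = 120  # seconds; the 2-minute window from the question


def fixed_window(timestamp, size=WINDOW_SIZE, offset=0):
    """Return the [start, end) fixed window that contains `timestamp`."""
    start = timestamp - (timestamp - offset) % size
    return (start, start + size)


def co_group_by_key_and_window(left, right):
    """Group (key, value, timestamp) records from both inputs by (key, window).

    Each result value is a pair (left_values, right_values); either list may
    be empty, which corresponds to the unmatched rows of an outer join.
    """
    grouped = defaultdict(lambda: ([], []))
    for side, records in enumerate((left, right)):
        for key, value, timestamp in records:
            grouped[(key, fixed_window(timestamp))][side].append(value)
    return dict(grouped)


# Hypothetical sample data: (key, value, event time in seconds).
stream_a = [("k", "a1", 30), ("k", "a2", 150)]
stream_b = [("k", "b1", 60), ("k", "b2", 200), ("k", "b3", 250)]

joined = co_group_by_key_and_window(stream_a, stream_b)
for (key, window), (a_values, b_values) in sorted(joined.items()):
    print(key, window, a_values, b_values)
```

Under these assumptions, a Stream B value is only ever paired with Stream A values whose timestamps fall in the same 2-minute window, so a B event at 200 seconds cannot show up next to A events from the 0-2 minute window.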