1
votes

I have two streams left and right. For the same time window let's say that

  • the left stream contains the elements L1, L2 (the number is the key)
  • the right stream contains the elements R1, R3

I wonder how to implement a LEFT OUTER JOIN in Apache Flink so that the result obtained when processing this window is the following:

(L1, R1), (L2, null)

L1, R1 are matching by key (1), and L2, R3 do not match. L2 is included because is at left

1
This greatly depends on the specific usecase, but I can think of some methods to emulate this behaviour. Could You tell what should happen in case if one stream is (L1, L1, L2, L3, L4) and the other (R1, R3, R3, R4, R5) ? Do You even consider duplicates or are You deduplicating the streams beforehand ?Dominik Wosiński
in my case key is a UUID, therefore I dont expect a duplicated key within the same windowjmhostalet

1 Answers

1
votes

Well, You should be able to obtain the proper results with the coGroup operator and properly implemented CoGroupFunction. The function gives You access to the whole group in the coGroup method. The documentation states that for CoGroupFunction one of the groups may be empty, so this should allow You to implement the Outer Join. The only issue is the fact that groups are currently created in memory, so You need to verify that Your groups won't grow too big as they can effectively kill the JVM.