I need to join two event sources based on a key. The gap between the events can be up to 1 year(ie. event1 with id1 may arrive today and the corresponding event2 with id1 from the second event source may arrive a year later). Assume I want to just stream out the joined event output.
I am exploring the option of using Flink with the RocksDB backend(I came across Table APIs which appear to suit my use case). I am not able to find references architectures that do this kind of long window joins. I am expecting the system to process about 200M events a day.
Questions:
Are there any obvious limitations/pitfalls of using Flink for this kind of Long Window joins?
Any recommendations on handling this kind of long window joins
Related: I am also exploring using Lambda with DynamoDB as the state to do stream joins(Related Question). I will be using managed AWS services if this info is relevant.