I'm building a Flink streaming system that can handle both live and historical data. All data comes from the same source and is then split into historical and live streams. The live data gets timestamped and watermarked, while the historical data arrives in order. After the live stream is windowed, both streams are unioned and flow into the same processing pipeline.
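For context, here is a minimal sketch of the topology (not my actual code). It assumes a simple `Record` POJO with an `isHistorical` flag to tell the two paths apart, a bounded-out-of-orderness watermark strategy on the live path only, and a per-window reduce standing in for the real windowing logic:

```java
import java.time.Duration;

import org.apache.flink.api.common.eventtime.SerializableTimestampAssigner;
import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.windowing.assigners.TumblingEventTimeWindows;
import org.apache.flink.streaming.api.windowing.time.Time;

public class BackfillTopology {

    // Simple POJO; the isHistorical flag is an assumption about how
    // the two kinds of records are distinguished.
    public static class Record {
        public String id;
        public long timestamp;
        public boolean isHistorical;
        public double value;

        public Record() {}

        public Record(String id, long timestamp, boolean isHistorical, double value) {
            this.id = id;
            this.timestamp = timestamp;
            this.isHistorical = isHistorical;
            this.value = value;
        }
    }

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Stand-in for the shared source; both kinds of records arrive here.
        DataStream<Record> source = env.fromElements(
                new Record("a", 1_000L, false, 1.0),
                new Record("a", 2_000L, true, 5.0));

        // Split into live and historical paths.
        DataStream<Record> live = source.filter(r -> !r.isHistorical);
        DataStream<Record> historical = source.filter(r -> r.isHistorical);

        // Only the live path gets timestamps and watermarks.
        DataStream<Record> liveTimestamped = live.assignTimestampsAndWatermarks(
                WatermarkStrategy
                        .<Record>forBoundedOutOfOrderness(Duration.ofSeconds(10))
                        .withTimestampAssigner(
                                (SerializableTimestampAssigner<Record>) (r, ts) -> r.timestamp));

        // Event-time window on the live path, then union with historical.
        DataStream<Record> windowedLive = liveTimestamped
                .keyBy(r -> r.id)
                .window(TumblingEventTimeWindows.of(Time.minutes(1)))
                .reduce((a, b) -> new Record(a.id, Math.max(a.timestamp, b.timestamp),
                        false, a.value + b.value));

        // Both paths flow into the same keyed pipeline from here on.
        windowedLive.union(historical)
                .keyBy(r -> r.id)
                .print();

        env.execute("live + historical backfill");
    }
}
```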
I cannot find anywhere whether all records in an event-time streaming environment need to be timestamped, or whether Flink can even handle this mix of live and historical data at the same time. Is this approach feasible, or will it create problems that I am too inexperienced to see? What will the impact be on the ordering of the data?
We have this setup to allow us to do partial backfills. Each stream is keyed by an id, and we send historical data to replace the observed data for a single id without affecting the live processing of other ids.
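To make the per-key isolation concrete, here is a hypothetical `KeyedProcessFunction` in the style of the sketch above (it would replace the `.print()` call with `.process(new ReplaceOrUpdateFunction())`). The replace/accumulate rule is invented for illustration; the point is that keyed state is scoped to the current key, so a backfill for one id cannot touch the state of any other id:

```java
import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.functions.KeyedProcessFunction;
import org.apache.flink.util.Collector;

// Hypothetical per-key processor: a historical record overwrites the
// state for its own id only.
public class ReplaceOrUpdateFunction
        extends KeyedProcessFunction<String, BackfillTopology.Record, BackfillTopology.Record> {

    private transient ValueState<Double> latest;

    @Override
    public void open(Configuration parameters) {
        latest = getRuntimeContext().getState(
                new ValueStateDescriptor<>("latest", Double.class));
    }

    @Override
    public void processElement(BackfillTopology.Record r, Context ctx,
                               Collector<BackfillTopology.Record> out) throws Exception {
        if (r.isHistorical) {
            // Historical record: replace whatever was observed for this id.
            latest.update(r.value);
        } else {
            // Live record: accumulate (invented rule, stands in for real logic).
            Double current = latest.value();
            latest.update(current == null ? r.value : current + r.value);
        }
        out.collect(new BackfillTopology.Record(r.id, r.timestamp, r.isHistorical, latest.value()));
    }
}
```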
