I'm processing a stream of events coming from IoT devices.
Each event carries a first-level timestamp set by the network, and also packs together several measures taken at different points in time. For instance:
- network time 9:08
- measure M1 taken at 8:52
- measure M2 taken at 9:07
The measures are to be aggregated hourly, so in this case M1 should go in the 8:00-9:00 window and M2 in the 9:00-10:00 window.
I wonder what the proper way is to design my Flink app, manage these timestamps, and handle the related watermarks. From my understanding so far:
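To make the expected bucketing concrete, here is a minimal plain-Java sketch of the hourly assignment I have in mind. It assumes timestamps are epoch milliseconds in UTC and that windows are aligned on the hour; the class name HourlyWindowSketch and the date 2020-01-01 are made up for illustration.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical helper, not part of Flink: maps a measure timestamp to its hourly window.
final class HourlyWindowSketch {
    static final long HOUR_MS = Duration.ofHours(1).toMillis();

    // Inclusive start of the hour-aligned window containing the timestamp.
    static long windowStart(long epochMillis) {
        return epochMillis - Math.floorMod(epochMillis, HOUR_MS);
    }

    // Exclusive end of that window.
    static long windowEnd(long epochMillis) {
        return windowStart(epochMillis) + HOUR_MS;
    }

    public static void main(String[] args) {
        // The date is an assumption; only the times of day come from the example above.
        long m1 = Instant.parse("2020-01-01T08:52:00Z").toEpochMilli();
        long m2 = Instant.parse("2020-01-01T09:07:00Z").toEpochMilli();
        System.out.println(Instant.ofEpochMilli(windowStart(m1))); // 2020-01-01T08:00:00Z
        System.out.println(Instant.ofEpochMilli(windowStart(m2))); // 2020-01-01T09:00:00Z
    }
}
```

So the two measures end up in different windows even though they arrive in the same network event.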
- I should probably put all the processing related to network time (9:08) in a separate Flink app.
- Have a Flink app process the measures after they are unpacked (flat-mapped), then assign the timestamps with assignTimestampsAndWatermarks(), correct? What strategy should I use, given the 15-minute spread between measures that arrive together?
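To illustrate what worries me about the spread, here is a small plain-Java simulation of a bounded-out-of-orderness watermark, which as far as I understand is what WatermarkStrategy.forBoundedOutOfOrderness(Duration) gives you in Flink 1.11. This is only a sketch of my mental model, not Flink code: the class and method names are made up, timestamps are milliseconds since midnight, and I assume a record is dropped as late once the watermark has reached the last millisecond of its window (no allowed lateness).

```java
import java.time.Duration;

// Hypothetical simulation of a bounded-out-of-orderness watermark; not a Flink class.
final class BoundedOutOfOrdernessSketch {
    static final long HOUR_MS = Duration.ofHours(1).toMillis();

    private final long boundMs;
    private long maxTimestampMs;

    BoundedOutOfOrdernessSketch(Duration bound) {
        this.boundMs = bound.toMillis();
        // Start low enough that the watermark computation cannot underflow.
        this.maxTimestampMs = Long.MIN_VALUE + boundMs + 1;
    }

    // Track the highest measure timestamp seen so far.
    void onEvent(long timestampMs) {
        maxTimestampMs = Math.max(maxTimestampMs, timestampMs);
    }

    // Watermark trails the highest timestamp by the bound (minus 1 ms).
    long currentWatermark() {
        return maxTimestampMs - boundMs - 1;
    }

    // Late if the watermark already passed the last millisecond of the hourly window.
    boolean isLate(long timestampMs) {
        long windowEnd = timestampMs - Math.floorMod(timestampMs, HOUR_MS) + HOUR_MS;
        return windowEnd - 1 <= currentWatermark();
    }

    // Milliseconds since midnight for a time of day.
    static long at(int hour, int minute) {
        return Duration.ofHours(hour).plusMinutes(minute).toMillis();
    }

    public static void main(String[] args) {
        BoundedOutOfOrdernessSketch noSlack = new BoundedOutOfOrdernessSketch(Duration.ZERO);
        BoundedOutOfOrdernessSketch slack15 = new BoundedOutOfOrdernessSketch(Duration.ofMinutes(15));
        // Suppose M2 (9:07) is seen before M1 (8:52).
        noSlack.onEvent(at(9, 7));
        slack15.onEvent(at(9, 7));
        System.out.println(noSlack.isLate(at(8, 52))); // true: the 8:00-9:00 window is already closed
        System.out.println(slack15.isLate(at(8, 52))); // false: the watermark is still before 9:00
    }
}
```

In this model a 15-minute bound is enough to keep M1 in its window when it is seen after M2, but it delays every hourly result by the same amount, and it does not account for any extra delay between the measures and the network time, hence my question about the right strategy.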
--
PS: No, I can't change the IoT devices.
PPS: I plan to use EMR, so Flink 1.11, in case that has any impact on the design.