Say I have some a streaming data of the schema as follows:
uid: string
ts: timestamp
Now assuming the data has been partitioned by uid
(in each partition, the data is minimal, e.g. less than 1 row/sec).
I would like to put the data (in each partition) into windows based on event time ts
, then sort all the elements within each window (based on ts
as well), at last apply a custom transformation on each of the element in the window in order.
Q1: Is there any way to get an aggregated view of the window, but keep each element, e.g. materialize the all the elements in a window into a list?
Q2: If Q1 is possible, I would like to set a watermark and trigger combination, which triggers once at the end of the window, then either trigger periodically or trigger every time late data arrives. Is it possible?