0
votes

First, this is my code logic

source.keby(key)
.window(1d)
.process(computeLogic)
.sink(sinkFunction)

My question is:

If I have too many keys, let's say more than 10,000 keys, will there be one window for each key? Does too many Windows take up too much memory and cause OOM ?

1
10,000 keys is a small number and should be no problem, unless you've got some huge record per-key that has to be kept around. Normally if the thing being aggregated is a metric (e.g. an int, or double) then you don't need to worry until you get to millions of keys per operator sub-task.kkrugler

1 Answers

0
votes

will there be one window for each key?

It depends. You can define your own Window which achieves one key per window. For pre-defined window, like Tumbling Window, there will be multiple windows per key. E.g. For Tumbling Event Window of 5 seconds with allowed lateness, there would be multiple windows for the same key.

Does too many Windows take up too much memory and cause OOM ?

If there are many keys, you can add more parallelism to Flink job, so each task will handle less keys.

For the case with lots of windows on the task, if you use Heap State(which is memory based state), then it may cause OOM. For state backend like RocksDB state backend, then it should be fine as the state will be flushed to disk.

For more details, you can refer to: https://ci.apache.org/projects/flink/flink-docs-stable/dev/stream/operators/windows.html