I'm trying to understand Flink's runtime behavior when a job has multiple data streams and multiple operators per data stream.
Use case: N data streams in a single Flink job, each representing one device and each with a different latency. Each data stream is split into two streams: one goes into a set of CEP operators, and the other into a process function.
Questions:
- At runtime, will the engine create one thread per data stream? Or one thread per operator?
- Is it possible to create the data streams dynamically when the job starts? (e.g., if N is read from a file at startup and the corresponding N streams need to be created)
- Are there any specific performance impacts when a large number of streams (N ~ 10000) are created, as opposed to N partitions within a single stream?
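
To make the last question concrete, here is a rough back-of-the-envelope count of the logical operators each layout would put in the job graph. It is only a sketch under my own assumptions: one source per stream, a hypothetical 3 CEP operators per stream, one process function per stream, and no operator chaining taken into account. The class and method names are made up for illustration and are not Flink APIs.

```java
// Counts logical operators only; says nothing about threads, slots, or chaining.
class OperatorCountSketch {

    // Layout A: every device gets its own stream with its own operators.
    static long perDeviceStreams(long devices, long cepOperatorsPerStream) {
        // one source + the CEP operators + one process function (assumed layout)
        long operatorsPerStream = 1 + cepOperatorsPerStream + 1;
        return devices * operatorsPerStream;
    }

    // Layout B: one stream partitioned by device; the count does not depend on N.
    static long singlePartitionedStream(long cepOperatorsPerStream) {
        return 1 + cepOperatorsPerStream + 1;
    }

    public static void main(String[] args) {
        long devices = 10_000;      // N from the question
        long cepOperators = 3;      // hypothetical value, not from a real job

        System.out.println("Per-device streams: "
                + perDeviceStreams(devices, cepOperators) + " operators");
        System.out.println("Single partitioned stream: "
                + singlePartitionedStream(cepOperators) + " operators");
    }
}
```

Under these assumptions the operator count grows linearly with N in the first layout and stays constant in the second, which is why I suspect the two behave very differently at runtime.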