4
votes

My question is regarding the Apache Flink framework.

Is there any way to consume more than one streaming source, such as Kafka and Twitter, in a single Flink job? Is there any workaround? Can we process more than one streaming source at a time in a single Flink job?

I am currently working with Spark Streaming, and this is a limitation there.

Is this achievable with other streaming frameworks such as Apache Samza, Storm, or NiFi?

Any response is much appreciated.


2 Answers

6
votes

Yes, this is possible in Flink and Storm (no clue about Samza or NiFi, though...).

You can add as many source operators as you want, and each can consume from a different source.

StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

Properties properties = ... // see Flink webpage for more details

DataStream<String> stream1 = env.addSource(new FlinkKafkaConsumer08<>("topic", new SimpleStringSchema(), properties));
DataStream<String> stream2 = env.readTextFile("/tmp/myFile.txt");

DataStream<String> allStreams = stream1.union(stream2);
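One caveat: union requires all input streams to have the same element type. If your sources produce different types, Flink's connect combines two streams and lets you process them with a CoMapFunction. A minimal sketch continuing the example above (the socket source and the field prefixes are illustrative assumptions, not part of the original answer):

// Sketch: combining two differently typed sources via connect.
// "counts" here is a hypothetical second source of a different type.
DataStream<Integer> counts = env.fromElements(1, 2, 3);

DataStream<String> combined = stream1
    .connect(counts)
    .map(new CoMapFunction<String, Integer, String>() {
        @Override
        public String map1(String value) { // invoked for elements of stream1
            return "text: " + value;
        }

        @Override
        public String map2(Integer value) { // invoked for elements of counts
            return "count: " + value;
        }
    });

Unlike union, connect keeps the two inputs distinct until your function maps them into a common output type.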

For Storm using the low-level API, the pattern is similar. See: An Apache Storm bolt receive multiple input tuples from different spout/bolt

0
votes

Some solutions have already been covered; I just want to add that in a NiFi flow you can ingest many different sources and process them either separately or together.

It is also possible to ingest a source once and have multiple teams build flows on top of it, without needing to ingest the data multiple times.