I'm working with these two real-time data stream processing frameworks. I've searched everywhere, but I can't find a big difference between them. In particular, I would like to know how they differ based on the size of the data, the topology, etc.
1 Answer
The difference is mainly in the level of abstraction you get when processing streams of data.
Apache Storm is a bit lower level, dealing with data sources (spouts) and processors (bolts) connected together to perform transformations and aggregations on individual messages in a reactive way.
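To give a feel for that lower level, here is a rough sketch of the raw spout/bolt API (not from the question or the Storm docs; SentenceSpout and WordCountBolt are hypothetical placeholders, and the package names assume Storm 1.x):
import org.apache.storm.topology.BasicOutputCollector;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.topology.TopologyBuilder;
import org.apache.storm.topology.base.BaseBasicBolt;
import org.apache.storm.tuple.Fields;
import org.apache.storm.tuple.Tuple;
import org.apache.storm.tuple.Values;

// A bolt reacts to one tuple (message) at a time.
public class SplitSentenceBolt extends BaseBasicBolt {
    @Override
    public void execute(Tuple tuple, BasicOutputCollector collector) {
        for (String word : tuple.getStringByField("sentence").split("\\s+")) {
            collector.emit(new Values(word));
        }
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("word"));
    }
}

// Wiring spouts and bolts into a topology (SentenceSpout and WordCountBolt are stand-ins):
TopologyBuilder builder = new TopologyBuilder();
builder.setSpout("sentences", new SentenceSpout());
builder.setBolt("split", new SplitSentenceBolt(), 4).shuffleGrouping("sentences");
builder.setBolt("count", new WordCountBolt(), 4).fieldsGrouping("split", new Fields("word"));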
On top of that, there is the Trident API, which abstracts away from this low-level, message-driven view into more aggregated, query-like constructs, which makes things a bit easier to put together. (There is also an SQL-like interface for querying data streams, but it is still marked as experimental.)
From the documentation:
// "spout" emits tuples with a single "sentence" field.
TridentTopology topology = new TridentTopology();
TridentState wordCounts =
    topology.newStream("spout1", spout)
        .each(new Fields("sentence"), new Split(), new Fields("word"))   // split each sentence into words
        .groupBy(new Fields("word"))                                     // group the stream by word
        .persistentAggregate(new MemoryMapState.Factory(), new Count(), new Fields("count"))   // keep running counts in state
        .parallelismHint(6);
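For reference, Split in the snippet above is a small user-defined Trident function (Count is a built-in aggregator); roughly, as sketched in the Trident tutorial, with package names again assuming Storm 1.x:
import org.apache.storm.trident.operation.BaseFunction;
import org.apache.storm.trident.operation.TridentCollector;
import org.apache.storm.trident.tuple.TridentTuple;
import org.apache.storm.tuple.Values;

public class Split extends BaseFunction {
    @Override
    public void execute(TridentTuple tuple, TridentCollector collector) {
        // Emit one tuple per word in the incoming "sentence" field.
        for (String word : tuple.getString(0).split(" ")) {
            collector.emit(new Values(word));
        }
    }
}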
Apache Flink has a more functional interface for processing events. If you are used to the Java 8 style of stream processing (or to other functional-style languages like Scala or Kotlin), it will look very familiar. It also has a nice web-based monitoring tool. The nice thing about it is that it has built-in constructs for aggregating by time windows and the like (which you can probably also do in Storm with Trident).
From the documentation:
// "text" is a DataStream<String> (e.g. read from a socket); WordWithCount is a simple POJO
// with public "word" and "count" fields, as in Flink's SocketWindowWordCount example.
DataStream<WordWithCount> windowCounts = text
        .flatMap(new FlatMapFunction<String, WordWithCount>() {
            @Override
            public void flatMap(String value, Collector<WordWithCount> out) {
                for (String word : value.split("\\s")) {
                    out.collect(new WordWithCount(word, 1L));
                }
            }
        })
        .keyBy("word")                                    // partition the stream by the "word" field
        .timeWindow(Time.seconds(5), Time.seconds(1))     // sliding window: 5 s long, evaluated every 1 s
        .reduce(new ReduceFunction<WordWithCount>() {
            @Override
            public WordWithCount reduce(WordWithCount a, WordWithCount b) {
                return new WordWithCount(a.word, a.count + b.count);
            }
        });
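To make the Java 8 comparison concrete, roughly the same pipeline can be written with lambdas. This is a sketch against the same, older DataStream API as the snippet above; the .returns(...) hint is needed because Flink cannot infer the output type of a lambda flatMap:
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.windowing.time.Time;
import org.apache.flink.util.Collector;

DataStream<WordWithCount> windowCounts = text
        .flatMap((String value, Collector<WordWithCount> out) -> {
            for (String word : value.split("\\s")) {
                out.collect(new WordWithCount(word, 1L));
            }
        })
        .returns(WordWithCount.class)           // type hint, since lambdas lose generic type info
        .keyBy(wc -> wc.word)                   // key selector instead of a field name
        .timeWindow(Time.seconds(5), Time.seconds(1))
        .reduce((a, b) -> new WordWithCount(a.word, a.count + b.count));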
When I was evaluating the two, I went with Flink, simply because at the time it felt better documented and I got started with it much more easily; Storm was slightly more obscure. There is a course on Udacity that helped me understand Storm much better, but in the end Flink still felt like a better fit for my needs.
You might also want to look at this answer here, although it is a bit old, so both projects have probably evolved since then.