4
votes

While exploring Akka streams, I also came across Apache Flink which stream processing engine. Akka streams implements reactive streams and supports back pressure.

So if I have to make decision between two, which one should I go for? How do they differ and whats the similarity? What should be the criteria here?

2

2 Answers

8
votes

Akka Streams is a library implementing reactive streams specification.

Apache Flink is a streaming engine.

The main high level difference is that in Apache Flink you create a job by coding against one of Flink APIs and you submit that job to Apache Flink cluster. It is the Apache Flink cluster that executes your stream processing job. By using Akka Streams you are creating a standalone application. In that sense Akka Streams is a more lightweight of the two.

You can still distribute Akka Streams based app by using StreamRefs, though you need to do that explicitly in the code and you need to run Akka Cluster. Apache Flink already manages a cluster so you don't need to do that explicitly in your code (though you still need the cluster set up and running to submit your jobs to). Apache Flink has smarts built in to take a job and execute it in an optimal way. Parallelizing/distributing execution when possible. You don't get that with Akka Streams.

Apache Flink stream processing is designed to achieve end2end exactly once processing semantics in face of failures. In Akka Streams such guarantee would need to be implemented explicitly in your code.

Akka Streams as reactive streams specification implementation is all about asynchronous and memory bound processing. Akka HTTP for example is built on top of Akka Streams and as a result implements a very efficient and lightweight client and server sides of HTTP protocol.

Akka Streams implements asynchronous non-blocking backpressure (as per reactive streams specification) to guarantee the memory boundedness during execution. Apache Flink also has a backpressure mechanism, though it's not implemented in the same way.

Akka Streams as an implementation of reactive streams specification can interoperate with other implementations like RxJava or Project Reactor. Apache Flink is not part of any broader standard.

I would say the main reasons to go for Apache Flink is the exactly once guarantees and automated distribution that comes with it. Otherwise Akka Streams is a very powerful API with simpler execution model.

EDIT: Probably worth mentioning project Alpakka that brings a lot of technologies to Akka Streams so that they can be plugged in to reactive streams based processing.

5
votes

I am not an expert in Akka Streams, but as far as I know, the main difference is that Flink offers the distribution of processing out of the box, while Akka Streams does not, since it was designed to process data on a single node.

The similarity between the two is that they both offer stream processing capabilities and in this sense, they probably have similar functionality.

But, Flink has multiple additional modules like SQL, CEP, or Machine Learning that You won't be able to get in Akka Streams. Also, Flink provides fail-safety and state recovery, which I am not sure if is present in Akka Streams out of the box.

On the other hand, setting up Akka Streaming will require less work as You don't need to care about setting JobManager & TaskManager but You can simply create a Java/Scala application, dockerize & run it somewhere.

So, the main question You should ask Yourself is, if the data You are processing is big enough that it will need to be processed on multiple nodes if it is then You really have no choice other than Flink (just in scenario Akka Streams vs. Flink). If however, the data You are going to process can be processed on a single node, then You should assess the fail-safety & message delivery guarantees You need. In the general case scenario, using Akka Streams may be easier to start with, but Flink may take over when it comes to productionizing the app.