2
votes

I have a question about the use of Akka Streams and Akka Cluster. I'm trying to make a version of distributed word count using Akka Streams and Akka Cluster.

I would like to build an Akka Streams client that reads a text file as streaming I/O and sends the stream of words to a remote cluster. This is the code of the client:

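final Path file = Paths.get("example.txt");
final Source<ByteString, CompletionStage<IOResult>> read = FileIO.fromPath(file);

// Split the byte stream on spaces and decode each frame as a UTF-8 word.
// ALLOW lets the last word through even without a trailing delimiter.
final Source<String, CompletionStage<IOResult>> words =
  read
    .via(Framing.delimiter(ByteString.fromString(" "), 256, FramingTruncation.ALLOW))
    .map(ByteString::utf8String);

// runWith materializes the stream, so its result is not a Source.
words.runWith(/* send to Akka cluster */);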

I don't understand what I should use to send streaming data to an Akka cluster without losing the core guarantees of Akka Streams (backpressure, etc.).

I know of the existence of Stream refs and Cluster Client but I don't understand which of them to use.

2
There is a way to integrate actors with Akka Streams: doc.akka.io/docs/akka/2.5.3/scala/stream/…, though I don't think it's a good idea, since preserving backpressure requires adding a lot of boilerplate across your actors. - RoberMP

2 Answers

0
votes

Direct Answer

I don't think the functionality that you are looking for is available as of version 2.5.18. The inventory of cluster functionality doesn't list anything related to streams.

Indirect Answer

The computational requirements for your use case would have to be rather extreme to justify an akka-stream spanning multiple servers. The amount of parallelism available in a single server is rather large given the explosion of core count on modern processors. Therefore, each step of computation in your stream would have to require a huge amount of processor resources to justify spanning the stream across a network.
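To give a rough sense of how far a single machine can go, here is a minimal sketch of a word count that spreads the work over local cores. It uses plain Java parallel streams rather than Akka Streams, and the class name `LocalWordCount` and the sample sentence are made up for illustration.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

public class LocalWordCount {

    // Counts whitespace-separated words. parallel() spreads the work over the
    // common fork-join pool, i.e. over the cores of a single machine.
    static Map<String, Long> countWords(String text) {
        return Arrays.stream(text.split("\\s+"))
                .parallel()
                .filter(word -> !word.isEmpty()) // skip the empty token produced by leading whitespace
                .collect(Collectors.groupingBy(word -> word, TreeMap::new, Collectors.counting()));
    }

    public static void main(String[] args) {
        // Hypothetical sample input; a real job would read the file instead.
        String text = "to be or not to be";
        System.out.println(countWords(text)); // prints {be=2, not=1, or=1, to=2}
    }
}
```

This is not a replacement for a backpressured stream. It only shows that the counting step itself parallelizes without any network hop.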

If you are truly working on such a large project, then a tool like Apache Spark may better suit your needs.

0
votes

I don't think exactly what you are looking for currently exists. However, there is a similar feature called StreamRefs, which lets you run reactive streams over the network. Take a look here: https://doc.akka.io/docs/akka/2.5/stream/stream-refs.html