
I have been trying to explore Akka Streams, but I am failing to understand how to achieve the same kind of parallelism we get with plain Akka actors. Let's say Actor A consumes data from Kafka and writes it to S3, Actor B also consumes from Kafka and writes it to Postgres, and Actor C reads from the DB and produces to another Kafka topic. All three actors can live in different actor systems and need not depend on one another. How do I achieve something similar using Akka Streams? I believe an Akka Streams pipeline has stages where A does something and pipes it to B, and so on until we reach the sink. I do realise there is mapAsync, which can be used to parallelise things, but I am not sure how it would play out in this context, or what it means for ordering guarantees.
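Here is a minimal sketch of how I imagine mapAsync would fit in (writeToS3 and the numbers are just placeholders):

import akka.actor.ActorSystem
import akka.stream.scaladsl.{Sink, Source}
import scala.concurrent.Future

implicit val system: ActorSystem = ActorSystem("mapasync-sketch")
import system.dispatcher

// Hypothetical asynchronous write; stands in for a real S3 call.
def writeToS3(n: Int): Future[Int] = Future { n }

// mapAsync runs up to 4 writes concurrently but still emits results
// downstream in upstream order; mapAsyncUnordered would instead emit
// each result as soon as its Future completes, giving up ordering.
Source(1 to 100)
  .mapAsync(parallelism = 4)(writeToS3)
  .runWith(Sink.foreach(println))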


1 Answer


Single Source

For the particular use case you've listed, you can use BroadcastHub to "fan out" each data item from Kafka to each of the Sinks:

import akka.NotUsed
import akka.actor.ActorSystem
import akka.stream.scaladsl.{BroadcastHub, Keep, RunnableGraph, Sink, Source}

implicit val system: ActorSystem = ActorSystem("broadcast-example")

type Data = ??? // placeholder for your element type

val kafkaSource: Source[Data, _] = ???

// Materializing this graph attaches BroadcastHub.sink to the Kafka
// source and yields a Source that any number of consumers can attach to.
val runnableGraph: RunnableGraph[Source[Data, NotUsed]] =
  kafkaSource.toMat(BroadcastHub.sink(bufferSize = 256))(Keep.right)

val kafkaHub: Source[Data, NotUsed] = runnableGraph.run()

val s3Sink: Sink[Data, _] = ???
val postgresSink: Sink[Data, _] = ???

// Each consumer is materialized as its own stream; both receive
// every element emitted by the hub.
kafkaHub.to(s3Sink).run()
kafkaHub.to(postgresSink).run()

Multiple Sources

One important drawback of the above implementation is that, as the Akka documentation puts it, "the rate of the producer will be automatically adapted to the slowest consumer".
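If you stay with a single hub, a partial mitigation is to give each consumer its own buffer so that a briefly stalled consumer does not immediately throttle the others. This is only a sketch; the slowest consumer still bounds sustained throughput:

import akka.stream.OverflowStrategy

// The buffer absorbs short stalls in the S3 writer; once it fills,
// backpressure reaches the hub again and all consumers slow down.
kafkaHub
  .buffer(64, OverflowStrategy.backpressure)
  .to(s3Sink)
  .run()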

Therefore, if you're able to open multiple connections to the ultimate source, you will likely get better throughput, because each stream can run at its own pace:

val kafkaSource: () => Source[Data, _] = ???

// stream 1: reads from its own connection and writes to S3
kafkaSource().to(s3Sink).run()

// stream 2: reads from its own connection and writes to Postgres
kafkaSource().to(postgresSink).run()
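If the upstream is Kafka, the factory could, for example, be backed by Alpakka Kafka (akka-stream-kafka), with each call creating an independent consumer; the bootstrap server, group id, and topic below are placeholders:

import akka.kafka.{ConsumerSettings, Subscriptions}
import akka.kafka.scaladsl.Consumer
import org.apache.kafka.clients.consumer.ConsumerRecord
import org.apache.kafka.common.serialization.StringDeserializer

// Each invocation materializes its own Kafka consumer, so the two
// streams pull from the broker independently of each other.
val kafkaSource: () => Source[ConsumerRecord[String, String], _] = () =>
  Consumer.plainSource(
    ConsumerSettings(system, new StringDeserializer, new StringDeserializer)
      .withBootstrapServers("localhost:9092")
      .withGroupId("example-group"),
    Subscriptions.topics("example-topic")
  )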