I'm trying to design a streaming architecture for streaming analytics. Requirements:
- RT and NRT streaming data input
- Stream processors implementing some financial analysis
- RT and NRT analysis output stream
- Reference data requests during stream processing
I'm exploring Kafka and Kafka Streams for stream processing and RT/NRT realtime messaging. My question is: I need to perform some query to external systems (info providers, MongoDB etc etc) during stream pocessing. These queries could be both sync and async req-response, based on the external system characteristics.
I've read this post explaining how to join KStream and KTable during processing and it's very interesting but in this scenario KTable is not dependant on input parameters coming from the KStream, it's just a streaming representation of a table.
I need to query the external system foreach KStream message, passing some message fields as query parameters and enrich the streaming message with query result, then publish the enriched message to an output topic. Is there any consolidated paradigm to design this stream processing? Is there any specific technology I'd better to use? Remember that queries can be sync and async.
I'd also like to design wrappers to these external systems, implementing a sort of distributed RPC, callable from a Kafka Stream processing. Could you suggest any technology/framework? I was considering Akka actors for distributing query responders but I can't understand if Akka fits well with the request-response paradigm.
Thanks