I'm implementing a streaming pipeline that resembles the illustration below:
*K-topic1* ---> processor1 ---> *K-topic2* ---> processor2 -->
*K-topic3* ---> processor3 --> *K-topic4*
The K-topic components represent Kafka topics and the processor components code (Python/Java).
For the processor component, the intention is to read/consume data from the topic, perform some processing/ETL on it, and persist the results to the next topic in the chain as well as persistent store such as S3.
I have a question regarding the design approach.
The way I see it, each processor component should encapsulate both consumer and producer functionality.
Would the best approach be to have a Processor module/class that could contain KafkaConsumer and KafkaProducer classes ? To date, most examples I've seen have separate consumer and producer components which are run separately and would entail running double the number of components as opposed to encapsulating producers & consumers within each Processor object.
Any suggestions/references are welcome.
This question is different from
Designing a component both producer and consumer in Kafka
as that question specifically mentions using Samza which is not the case here.