I have built my own Spring Cloud Data Flow processor with Python inside, using this sample as a guide: https://dataflow.spring.io/docs/recipes/polyglot/processor/.
Then I want to scale out and run three of these processors, so by setting spring.cloud.deployer.myApp.count=3
I create 3 pods, each with the Python code inside.
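
For reference, I deploy the stream from the SCDF shell roughly like this (the stream name is a placeholder; the exact shell syntax may differ by SCDF version):

    stream deploy --name my-stream --properties "spring.cloud.deployer.myApp.count=3"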
I slightly modified the code from the sample: when creating the Kafka consumer, I also pass a group id, so that messages get load-balanced across the instances.
    from kafka import KafkaConsumer

    consumer = KafkaConsumer(get_input_channel(), group_id=get_consumer_group(), bootstrap_servers=[get_kafka_binder_brokers()])
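
For completeness, here is a minimal, self-contained sketch of what my consume loop boils down to. The get_* helpers come from the recipe; inlining them as plain environment-variable lookups is only an approximation of how the recipe actually resolves these values, and the variable names below are placeholders:

    import os
    from kafka import KafkaConsumer

    # Approximation: the recipe resolves these from the properties SCDF
    # passes to the pod; the env variable names here are placeholders.
    def get_kafka_binder_brokers():
        return os.environ.get("KAFKA_BROKERS", "localhost:9092")

    def get_input_channel():
        return os.environ.get("INPUT_CHANNEL", "my-stream.my-app")

    def get_consumer_group():
        # My addition: every instance passes the same group id, so Kafka
        # assigns each instance a subset of the topic's partitions.
        return os.environ.get("CONSUMER_GROUP", "my-stream")

    consumer = KafkaConsumer(get_input_channel(),
                             group_id=get_consumer_group(),
                             bootstrap_servers=[get_kafka_binder_brokers()])

    for message in consumer:
        # With only 1 partition, one instance receives everything and the
        # other group members sit idle -- which is exactly my problem.
        print(message.value)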
The issue is that SCDF creates the Kafka topic with only 1 partition, so all messages arrive at one pod only. So I am wondering:
- Should I somehow configure SCDF to create the Kafka topic with 3 partitions? (See the property sketch after this list.)
- Or should I not rely on SCDF and create the topics myself in Python (see the admin-client sketch below)? I suppose this would be redundant, since SCDF creates this topic anyway.
- Which component of SCDF is actually responsible for creating the Kafka topics, and how can I influence the number of partitions it creates?
- If I stop this stream and launch it again with 4 processor instances, should the topic be extended with a 4th partition? Currently no new partitions get created.
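
On the first bullet: these Spring Cloud Stream Kafka binder properties look relevant and can be passed per app at deploy time, though I am not sure this is the intended approach in SCDF, or whether they have to be set on the upstream (producer) app whose binder actually provisions the topic:

    # ask the binder to provision at least 3 partitions for the topic
    app.myApp.spring.cloud.stream.kafka.binder.minPartitionCount=3
    # allow the binder to add partitions to an already existing topic
    app.myApp.spring.cloud.stream.kafka.binder.autoAddPartitions=true
    # keep the number of instances equal to the number of partitions
    spring.cloud.deployer.myApp.count=3

The autoAddPartitions property also looks relevant to the last bullet (growing the topic when I redeploy with a higher instance count), but I have not verified that.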
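
On the second bullet: if I were to manage the topic myself, I assume it would look roughly like this with kafka-python's admin client (a sketch only; topic name and broker address are placeholders):

    from kafka.admin import KafkaAdminClient, NewTopic, NewPartitions
    from kafka.errors import TopicAlreadyExistsError

    admin = KafkaAdminClient(bootstrap_servers="localhost:9092")

    try:
        # Create the input topic with 3 partitions up front...
        admin.create_topics([NewTopic(name="my-stream.my-app",
                                      num_partitions=3,
                                      replication_factor=1)])
    except TopicAlreadyExistsError:
        # ...or, if SCDF created it already, grow it to 4 partitions
        # (Kafka can only ever increase the partition count).
        admin.create_partitions({"my-stream.my-app": NewPartitions(total_count=4)})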