
According to the Kafka documentation, you can improve throughput by giving a topic multiple partitions and then creating a consumer group with at most as many consumer instances as there are partitions. That way every consumer instance is assigned at least one partition, and with equal counts each instance gets exactly one.

I can create a topic with multiple partitions, then configure flume-kafka-channel to use that topic.

However, regardless of how many partitions the topic has, the flume-kafka-channel only creates a single consumer (at least based on what I see in the Flume logs).

Is there a way I could configure the Kafka-Channel to spawn as many consumers as there are partitions?

I am guessing the answer is no, since there could be only a single source for a channel.

You could run multiple Flume processes, depending on the sources and sinks - OneCricketeer

1 Answer


You can run multiple Flume agents with the same consumer group ID, so that the agents share the topic's partitions. For example, if your topic has 20 partitions and you run 4 Flume agents, each agent is assigned 5 partitions. I think that is the only way to achieve parallelism with the Kafka Channel in Flume.
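
As a rough sketch of what that could look like, each agent's properties file would point its Kafka Channel at the same topic and the same consumer group. The agent name `a1`, channel name `c1`, broker addresses, topic name, and group name below are placeholders I made up for illustration; the property keys are the ones used by the Kafka Channel in Flume 1.7 and later, so check them against the user guide for your version.

```properties
# Hypothetical agent "a1" with one Kafka Channel "c1".
# Use the same channel settings in the config of every agent you start.
a1.channels = c1
a1.channels.c1.type = org.apache.flume.channel.kafka.KafkaChannel

# Placeholder broker list; replace with your own brokers.
a1.channels.c1.kafka.bootstrap.servers = kafka-1:9092,kafka-2:9092

# Every agent must use the same topic...
a1.channels.c1.kafka.topic = my-flume-channel-topic

# ...and the same consumer group ID, so that Kafka spreads the
# topic's partitions across the agents instead of giving each
# agent a full copy of the data.
a1.channels.c1.kafka.consumer.group.id = my-flume-channel-group
```

With this in place, starting four agents against a 20-partition topic should leave each agent consuming from 5 partitions. Starting more agents than there are partitions would leave the extra ones idle.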