0 votes

We are designing an integration that uses Apache Kafka to send critical business data. We have one producer and 5 consumers, so I created one topic with 5 partitions in order to assign one partition to every consumer. However, we need the information delivered in the same order it was sent by the producer, and we were unable to achieve that. I have read that order is only guaranteed per partition, so with a single partition I should be able to do it, but since I have 5 consumers I need partitions to parallelize the topic.

So I think I must use topic keys, but since order is only guaranteed per partition, I have some questions: if I use keys in the Kafka producer, should I send the payload specifying the partition number (i.e., in the producer code, write the message 5 times, once for each partition)? Or, by only sending keyed data to the topic, does Kafka replicate and write the data in the same order in each partition? Example:

    for (int i = 0; i < partitionsNumber; i++) {
        sendToKafka(i, key, payload);
    }

In this case, should I use one topic for every consumer instead of partitions?

What is the best strategy to send data in the same order to all consumers?

Note: The only key in the messages is of type string.

2 Answers

2 votes

I wasn't able to add this as a comment, as it is quite long.

What you mentioned in your comment, that “we need an equal number of partitions for consumer applications”, is correct. However, it only applies if all the consumers (in your case, 5) belong to the same consumer group.

For example, suppose a topic T has 5 partitions, and we create a consumer C1 in consumer group G1. C1 will get messages from all 5 partitions of topic T. If we then add a consumer C2 to the same consumer group G1, C1 will consume from 3 partitions and C2 from the remaining 2 (or vice versa). What you mentioned, “one partition per consumer application”, is the ideal scenario in this situation: 5 consumers in the same consumer group (G1) can consume from all 5 partitions in parallel. This is how Kafka achieves scalability.

Now, in your case you need the same data to be read 5 times because you have 5 consumers. Instead of publishing the same messages to all 5 partitions and then consuming them from all 5 consumers, you can write a simple producer app that publishes the data to a topic with 1 partition. Then your 5 consumer apps can consume the same data independently; that is why I suggested assigning each consumer application a random consumer-group name, so that each one consumes the messages (and commits its offsets) independently.
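The producer side can be sketched like this (a minimal sketch, assuming the kafka-clients library; the topic name `critical-data`, the broker address, and the String-key/byte-array-value types are assumptions, not from the question):

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;

public class SinglePartitionProducer {

    // Producer configuration; broker address is an assumption.
    static Properties producerProps() {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG,
                "org.apache.kafka.common.serialization.StringSerializer");
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG,
                "org.apache.kafka.common.serialization.ByteArraySerializer");
        return props;
    }

    public static void main(String[] args) {
        // With a single partition there is no loop over partitions:
        // every record lands in partition 0 and keeps the send order.
        try (KafkaProducer<String, byte[]> producer =
                     new KafkaProducer<>(producerProps())) {
            producer.send(new ProducerRecord<>("critical-data", "some-key",
                    "payload".getBytes()));
        }
    }
}
```

Because all records go to the one partition, every consumer group reads them in exactly the order they were sent.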

Below is a code snippet showing two consumers consuming messages from the same topic (1 partition) in parallel:

Consumer 1:

    Properties props = new Properties();
    props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
    props.put(ConsumerConfig.GROUP_ID_CONFIG, UUID.randomUUID().toString()); // randomise consumer group for consumer 1
    props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, "org.apache.kafka.common.serialization.StringDeserializer");
    props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, "org.apache.kafka.common.serialization.ByteArrayDeserializer");
    props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");
    props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, false);

    KafkaConsumer<String, byte[]> consumerLiveVideo = new KafkaConsumer<>(props);
    consumerLiveVideo.subscribe(Collections.singletonList(topicName[0])); // topic with 1 partition

Consumer 2:

    Properties props = new Properties();
    props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
    props.put(ConsumerConfig.GROUP_ID_CONFIG, UUID.randomUUID().toString()); // randomise consumer group for consumer 2
    props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, "org.apache.kafka.common.serialization.StringDeserializer");
    props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, "org.apache.kafka.common.serialization.ByteArrayDeserializer");
    props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");

    KafkaConsumer<String, byte[]> consumerLiveVideo = new KafkaConsumer<>(props);
    consumerLiveVideo.subscribe(Collections.singletonList(topicName[0])); // topic with 1 partition
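The snippets above only configure and subscribe; each consumer still needs a poll loop to actually receive records. A minimal sketch (the class and helper names are illustrative, and `commitSync` is only needed where auto-commit is disabled, as in consumer 1):

```java
import java.time.Duration;
import java.util.Collections;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class PollLoop {

    // Pure helper so the log line is easy to test without a broker.
    static String describe(long offset, String key) {
        return "offset=" + offset + " key=" + key;
    }

    // Poll forever; within a single partition, records arrive in the
    // exact order the producer sent them.
    static void run(KafkaConsumer<String, byte[]> consumer, String topic) {
        consumer.subscribe(Collections.singletonList(topic));
        while (true) {
            ConsumerRecords<String, byte[]> records = consumer.poll(Duration.ofMillis(100));
            for (ConsumerRecord<String, byte[]> record : records) {
                System.out.println(describe(record.offset(), record.key()));
            }
            consumer.commitSync(); // only needed where auto-commit is disabled
        }
    }
}
```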

You also asked about the correct approach; in my opinion, a single consumer application is all you need. Also, don't mix up the concepts of replication and scalability in Kafka, as both are very important.

Also, since you said the data is critical, you can read about the producer configuration parameter acks (use acks=1 or acks=all depending on your scenario).
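As a sketch, this is just one more producer property (the broker address is an assumption; the string key "acks" is the plain name behind ProducerConfig.ACKS_CONFIG):

```java
import java.util.Properties;

public class AcksConfig {

    // Returns producer properties with the requested acks level.
    static Properties withAcks(String acks) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // assumption
        // "1"  : wait only for the partition leader's acknowledgement
        // "all": wait until all in-sync replicas have the record
        props.put("acks", acks);
        return props;
    }
}
```

acks=all gives the strongest durability guarantee at the cost of latency, which is usually the right trade-off for critical business data.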

For more details about scalability, replication, consumer groups, and consumers/producers/brokers/topics, please go through chapters 1-5 of Kafka: The Definitive Guide.

0 votes

You need all your consumers to read the same messages published by the producer, right?

If that's the case, you don't have to publish/produce the same messages to all 5 partitions of your topic.

A simpler approach would be to create a topic with 1 partition and have your producer app publish all the messages to that topic/partition.

Now you can easily create consumer applications in different consumer groups consuming data from the same topic. Assign a random group id to each consumer; this way all 5 consumers will be able to consume from the one topic/partition and commit their offsets independently.

Just add the code snippet below to the properties of all 5 consumer apps.

props.put(ConsumerConfig.GROUP_ID_CONFIG, UUID.randomUUID().toString()); // randomise consumer group.

Let me know if you have any questions.