0 votes

We are designing an integration that uses Apache Kafka to send critical business data. We have one producer and 5 consumers, so I created one topic with 5 partitions in order to assign one partition to every consumer. However, we need the information delivered in the same order it was sent by the producer, and we were unable to achieve that. I have read that order is only guaranteed per partition, so with a single partition I should be able to do it, but since I have 5 consumers I need partitions to parallelize the topic.

So I think I must use topic keys, but since order is only guaranteed per partition, I have some questions: if I use keys in the Kafka producer, should I send the payload specifying the partition number (i.e., in the producer code, write the message 5 times, once for each partition)? Or, by only sending keyed data to the topic, does Kafka replicate and write the data in the same order in each partition? Example:

    for (int i = 0; i < partitionsNumber; i++) {
        sendToKafka(i, key, payload);
    }

In this case, should I use one topic for every consumer instead of partitions?

What is the best strategy to send data in the same order to all consumers?

Note: The only key in the messages is of type string.

2 Answers

2 votes

I wasn't able to add this as a comment, as it is quite long.

What you mentioned in your comment, that “we need an equal number of partitions for consumer applications”, is correct. However, it only applies if all the consumers (in your case, 5) belong to the same consumer group.

For example, suppose a topic T has 5 partitions, and we create a consumer C1 in consumer group G1. C1 will get messages from all 5 partitions of topic T. If we then add a consumer C2 to the same consumer group G1, C1 will consume from 3 partitions and C2 from the remaining 2 (or vice versa). What you mentioned, “one partition per consumer application”, is the ideal scenario in this situation: 5 consumers in the same consumer group (G1) can consume from all 5 partitions in parallel. This is how Kafka achieves scalability.

Now, in your case you need the same data to be read 5 times because you have 5 consumers. Instead of publishing the same messages to all 5 partitions and then consuming them from all 5 consumers, you can write a simple producer app that publishes the data to a topic with 1 partition. Then your 5 consumer apps can consume the same data independently; that is why I suggested assigning each consumer application a random consumer-group name, so that each one consumes the messages (and commits its offsets) independently.
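The producer side can be sketched like this (a minimal sketch, assuming the kafka-clients library; the topic name `critical-data`, the broker address, and the String-key/byte-array-value types are assumptions, not from the question):

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;

public class SinglePartitionProducer {

    // Producer configuration; broker address is an assumption.
    static Properties producerProps() {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG,
                "org.apache.kafka.common.serialization.StringSerializer");
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG,
                "org.apache.kafka.common.serialization.ByteArraySerializer");
        return props;
    }

    public static void main(String[] args) {
        // With a single partition there is no loop over partitions:
        // every record lands in partition 0 and keeps the send order.
        try (KafkaProducer<String, byte[]> producer =
                     new KafkaProducer<>(producerProps())) {
            producer.send(new ProducerRecord<>("critical-data", "some-key",
                    "payload".getBytes()));
        }
    }
}
```

Because all records go to the one partition, every consumer group reads them in exactly the order they were sent.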

Below is a code snippet showing two consumers consuming messages from the same topic (1 partition) in parallel:

Consumer 1:

    Properties props = new Properties();
    props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
    props.put(ConsumerConfig.GROUP_ID_CONFIG, UUID.randomUUID().toString()); // randomise consumer group for consumer 1
    props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, "org.apache.kafka.common.serialization.StringDeserializer");
    props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, "org.apache.kafka.common.serialization.ByteArrayDeserializer");
    props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");
    props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, false);

    KafkaConsumer<String, byte[]> consumerLiveVideo = new KafkaConsumer<>(props);
    consumerLiveVideo.subscribe(Collections.singletonList(topicName[0])); // topic with 1 partition

Consumer 2:

    Properties props = new Properties();
    props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
    props.put(ConsumerConfig.GROUP_ID_CONFIG, UUID.randomUUID().toString()); // randomise consumer group for consumer 2
    props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, "org.apache.kafka.common.serialization.StringDeserializer");
    props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, "org.apache.kafka.common.serialization.ByteArrayDeserializer");
    props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");

    KafkaConsumer<String, byte[]> consumerLiveVideo = new KafkaConsumer<>(props);
    consumerLiveVideo.subscribe(Collections.singletonList(topicName[0])); // topic with 1 partition
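The snippets above only configure and subscribe; each consumer still needs a poll loop to actually receive records. A minimal sketch (the class and helper names are illustrative, and `commitSync` is only needed where auto-commit is disabled, as in consumer 1):

```java
import java.time.Duration;
import java.util.Collections;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class PollLoop {

    // Pure helper so the log line is easy to test without a broker.
    static String describe(long offset, String key) {
        return "offset=" + offset + " key=" + key;
    }

    // Poll forever; within a single partition, records arrive in the
    // exact order the producer sent them.
    static void run(KafkaConsumer<String, byte[]> consumer, String topic) {
        consumer.subscribe(Collections.singletonList(topic));
        while (true) {
            ConsumerRecords<String, byte[]> records = consumer.poll(Duration.ofMillis(100));
            for (ConsumerRecord<String, byte[]> record : records) {
                System.out.println(describe(record.offset(), record.key()));
            }
            consumer.commitSync(); // only needed where auto-commit is disabled
        }
    }
}
```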

You also asked about the correct approach; in my opinion, a single consumer application is all you need. Also, don't mix up the concepts of replication and scalability in Kafka, as both are very important.

Also, since you said the data is critical, you can read about the producer configuration parameter acks (use acks=1 or acks=all depending on your scenario).
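As a sketch, this is just one more producer property (the broker address is an assumption; the string key "acks" is the plain name behind ProducerConfig.ACKS_CONFIG):

```java
import java.util.Properties;

public class AcksConfig {

    // Returns producer properties with the requested acks level.
    static Properties withAcks(String acks) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // assumption
        // "1"  : wait only for the partition leader's acknowledgement
        // "all": wait until all in-sync replicas have the record
        props.put("acks", acks);
        return props;
    }
}
```

acks=all gives the strongest durability guarantee at the cost of latency, which is usually the right trade-off for critical business data.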

For more details about scalability, replication, consumer groups, and consumers/producers/brokers/topics, please go through chapters 1-5 of Kafka: The Definitive Guide.

0 votes

You need all your consumers to read the same messages published by the producer, right?

If that's the case, you don't have to publish/produce the same messages to all 5 partitions of your topic.

A simpler approach would be to create a topic with 1 partition and have your producer app publish all the messages to that topic/partition.

Now you can easily create consumer applications in different consumer groups consuming data from the same topic. Assign a random group id to each consumer; this way all 5 consumers will be able to consume from the one topic/partition and commit their offsets independently.

Just add the code snippet below to the properties of all 5 consumer apps.

props.put(ConsumerConfig.GROUP_ID_CONFIG, UUID.randomUUID().toString()); // randomise consumer group.

Let me know if you have any questions.