How do I use multiple consumers in Kafka?

Question

I am a new student studying Kafka and I've run into some fundamental issues with understanding multiple consumers that articles, documentation, etc. have not been very helpful with so far.

One thing I have tried is to write my own high-level Kafka producer and consumer and run them simultaneously, publishing 100 simple messages to a topic and having my consumer retrieve them. I have managed to do this successfully, but when I introduce a second consumer to consume from the same topic the messages were just published to, it receives no messages.

It was my understanding that for each topic, you could have consumers from separate consumer groups and each of these consumer groups would get a full copy of the messages produced to some topic. Is this correct? If not, what would be the proper way for me to set up multiple consumers? This is the consumer class that I have written so far:

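import java.time.Duration;
import java.util.Collections;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class AlternateConsumer extends Thread {
    private final KafkaConsumer<Integer, String> consumer;
    private final String topic;

    public AlternateConsumer(String topic, String consumerGroup) {
        Properties properties = new Properties();
        properties.put("bootstrap.servers", "localhost:9092");
        properties.put("group.id", consumerGroup);
        // The newer client expects a fully qualified assignor class name here
        properties.put("partition.assignment.strategy", "org.apache.kafka.clients.consumer.RoundRobinAssignor");
        properties.put("enable.auto.commit", "true");
        properties.put("auto.commit.interval.ms", "1000");
        properties.put("session.timeout.ms", "30000");
        properties.put("key.deserializer", "org.apache.kafka.common.serialization.IntegerDeserializer");
        properties.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        consumer = new KafkaConsumer<>(properties);
        // subscribe() takes a collection of topic names
        consumer.subscribe(Collections.singletonList(topic));
        this.topic = topic;
    }

    @Override
    public void run() {
        try {
            while (true) {
                // Wait up to 100 ms for records instead of busy-spinning with poll(0)
                ConsumerRecords<Integer, String> records = consumer.poll(Duration.ofMillis(100));
                for (ConsumerRecord<Integer, String> record : records) {
                    System.out.println("We received message: " + record.value() + " from topic: " + record.topic());
                }
            }
        } finally {
            // Leave the group cleanly if the loop exits with an exception
            consumer.close();
        }
    }
}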
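To make the consumer-group model described in the question concrete, here is a small in-memory toy. The ToyTopic class is hypothetical and uses no Kafka classes: it keeps one committed offset per group, so each group independently reads the whole log, which is the behavior the question expects from separate groups. Unlike Kafka's default, a group with no committed offset starts at offset 0 in this toy, which is analogous to auto.offset.reset=earliest.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory model of a single-partition topic, NOT the Kafka API.
// Each consumer group has its own committed offset, so every group sees every message.
class ToyTopic {
    private final List<String> log = new ArrayList<>();
    // Next offset to read, tracked separately for each consumer group.
    private final Map<String, Integer> committed = new HashMap<>();

    void produce(String message) {
        log.add(message);
    }

    // Returns all messages this group has not consumed yet and advances its offset.
    // A group seen for the first time starts at offset 0 (like "earliest").
    List<String> poll(String groupId) {
        int from = committed.getOrDefault(groupId, 0);
        List<String> batch = new ArrayList<>(log.subList(from, log.size()));
        committed.put(groupId, log.size());
        return batch;
    }
}
```

Producing 100 messages and then polling with "groupA" and "groupB" returns all 100 messages to each group; a second poll by "groupA" returns nothing, because only that group's offset has advanced.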

Furthermore, I noticed something while testing the above consumer against a topic 'test' with only a single partition. When I added another consumer to an existing consumer group, say 'testGroup', this triggered a Kafka rebalance that increased the latency of my consumption significantly, on the order of seconds. I thought this was a rebalancing issue caused by having only a single partition, but when I created a new topic 'multiplepartitions' with 6 partitions, similar issues arose: adding more consumers to the same consumer group caused latency issues. I have looked around and people are telling me I should be using a multi-threaded consumer. Can anyone shed light on that?
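The single-partition observation follows from how a group divides partitions: within one consumer group, each partition is owned by exactly one consumer, so members beyond the partition count sit idle, and every membership change triggers a reassignment (a rebalance) during which consumption pauses. The sketch below is a simplified, hypothetical round-robin assignor for a single topic; Kafka's real assignors also handle multiple topics and differing subscriptions, which this ignores.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified round-robin assignment of one topic's partitions
// to the members of a single consumer group.
class ToyAssignor {
    static Map<String, List<Integer>> assign(List<String> consumers, int numPartitions) {
        Map<String, List<Integer>> assignment = new LinkedHashMap<>();
        for (String consumer : consumers) {
            assignment.put(consumer, new ArrayList<>());
        }
        for (int p = 0; p < numPartitions; p++) {
            // Partition p goes to consumer (p mod N): every partition has exactly one owner.
            String owner = consumers.get(p % consumers.size());
            assignment.get(owner).add(p);
        }
        return assignment;
    }
}
```

With consumers c1 and c2 and one partition, c1 gets [0] and c2 gets [], so the second consumer receives nothing. With six partitions, c1 gets [0, 2, 4] and c2 gets [1, 3, 5].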

There's a great example of a high-level consumer here for Kafka 0.8.1. — chrsblck
@chrsblck thanks for the link. I've actually examined that previously and probably didn't understand it as well as I could have -- could you perhaps explain a little bit how that example makes use of the threads? I don't fully understand what they're doing at the moment. — Jeff Gong
One way is to have the same number of threads as partitions for a given topic. From the article: grab a list of streams with List<KafkaStream<byte[], byte[]>> streams = consumerMap.get(topic); ... then assign each thread a partition with executor.submit(new ConsumerTest(stream, threadNumber)). — chrsblck
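The thread-per-partition idea from these comments can be sketched with a plain ExecutorService. Here each hypothetical "partition" is just a list of strings and each task drains exactly one of them, so ordering within a partition is preserved. This is a stand-in under stated assumptions, not the 0.8 KafkaStream API; with the newer KafkaConsumer the usual equivalent is one consumer instance per thread, since KafkaConsumer is not safe for multi-threaded access.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch of "one thread per partition". Each inner list stands in
// for one partition; no Kafka classes are used.
class ThreadPerPartition {
    static List<List<String>> consumeAll(List<List<String>> partitions) {
        // One worker thread per partition (at least one, since a pool of size 0 is invalid).
        ExecutorService executor = Executors.newFixedThreadPool(Math.max(1, partitions.size()));
        try {
            List<Future<List<String>>> results = new ArrayList<>();
            for (List<String> partition : partitions) {
                // Each task drains exactly one partition, so per-partition order is kept.
                results.add(executor.submit(() -> {
                    List<String> processed = new ArrayList<>();
                    for (String message : partition) {
                        processed.add(message); // stand-in for real message handling
                    }
                    return processed;
                }));
            }
            List<List<String>> out = new ArrayList<>();
            for (Future<List<String>> result : results) {
                try {
                    out.add(result.get());
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException(e);
                } catch (ExecutionException e) {
                    throw new IllegalStateException(e.getCause());
                }
            }
            return out;
        } finally {
            executor.shutdown();
        }
    }
}
```

The design choice is that parallelism comes from partitions, not from sharing one consumer: adding threads beyond the partition count would leave the extra threads with nothing to read.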

Chris Gerken · Accepted Answer · 2015-06-17T18:37:08

I think your problem lies with the auto.offset.reset property. When a new consumer reads from a partition and its group has no previously committed offset, the auto.offset.reset property decides what the starting offset should be. If you set it to "largest" (the default), you start reading at the end of the log, so only messages produced after the consumer joins are delivered. If you set it to "smallest", you get the first available message.

So add:

properties.put("auto.offset.reset", "smallest");

and try again.
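A toy model of the decision described above may help: a consumer uses its group's committed offset if one exists, and otherwise falls back to the reset policy. OffsetResetSketch is hypothetical and only mirrors the newer policy names ("earliest", "latest", "none"); the real client also applies the policy when a committed offset is out of range, which this sketch ignores.

```java
// Hypothetical model of how a consumer picks its starting offset for one partition.
// Not the Kafka client API: offsets are plain longs, the policy is a plain string.
class OffsetResetSketch {
    static long startingOffset(Long committed, long logStart, long logEnd, String policy) {
        if (committed != null) {
            // A group that has committed before resumes where it left off.
            return committed;
        }
        switch (policy) {
            case "earliest":
                return logStart; // first available message
            case "latest":
                return logEnd;   // only messages produced from now on
            default:
                // Mirrors "none": no committed offset and no fallback is an error.
                throw new IllegalStateException("no committed offset and policy is " + policy);
        }
    }
}
```

With 100 messages already in the log (offsets 0 to 99, so the log end is 100), a brand-new group resolves to offset 100 under "latest" and sees none of them, which matches the symptom in the question; under "earliest" it resolves to 0.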

* edit *

"smallest" and "largest" were deprecated a while back. With the newer consumer you should use "earliest" or "latest" instead. If in doubt, check the docs.

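Putting the answer together with the newer values, a second consumer in its own group might be configured as below. ConsumerConfigSketch.forGroup is a hypothetical helper that only builds a java.util.Properties; the result would be passed to new KafkaConsumer<>(props) before subscribing to the topic.

```java
import java.util.Properties;

// Hypothetical helper that builds settings for a consumer in a fresh group.
// Only java.util.Properties is used; no Kafka classes are needed to build it.
class ConsumerConfigSketch {
    static Properties forGroup(String bootstrapServers, String groupId) {
        Properties props = new Properties();
        props.put("bootstrap.servers", bootstrapServers);
        // A distinct group.id gives this consumer its own full copy of the topic.
        props.put("group.id", groupId);
        // With no committed offset yet, start from the first available message.
        props.put("auto.offset.reset", "earliest");
        props.put("key.deserializer", "org.apache.kafka.common.serialization.IntegerDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        return props;
    }
}
```

Note that auto.offset.reset only matters while the group has no committed offset; once "groupB" has committed, restarting it resumes from that position regardless of this setting.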