
I have a Kafka cluster with multiple topics. I'm going to set one partition per topic, and all those topics will be consumed by a single EC2 instance running 3 Kafka consumer threads (one consumer per thread), all belonging to the same consumer group.

I haven't tried it yet, but I'm wondering whether Kafka will balance the partitions of all topics evenly across the 3 threads, or will it assign all partitions to only one thread?


2 Answers

2 votes

The Kafka consumer is NOT thread-safe, so you should not share the same consumer instance between threads. Instead, create a new instance for each thread.

From the documentation at https://kafka.apache.org/0100/javadoc/org/apache/kafka/clients/consumer/KafkaConsumer.html#multithreaded:

1. One Consumer Per Thread

A simple option is to give each thread its own consumer instance. Here are the pros and cons of this approach:

  • PRO: It is the easiest to implement
  • PRO: It is often the fastest as no inter-thread co-ordination is needed
  • PRO: It makes in-order processing on a per-partition basis very easy to implement (each thread just processes messages in the order it receives them).
  • CON: More consumers means more TCP connections to the cluster (one per thread). In general Kafka handles connections very efficiently so this is generally a small cost.
  • CON: Multiple consumers means more requests being sent to the server and slightly less batching of data which can cause some drop in I/O throughput.
  • CON: The number of total threads across all processes will be limited by the total number of partitions.
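A minimal sketch of the one-consumer-per-thread pattern, assuming a hypothetical `FakeConsumer` class as a stand-in for `KafkaConsumer` so it runs without a broker. Each thread constructs its own instance, so the unsynchronized position counter is never touched by two threads. Unlike a real group member, the stand-in is simply handed its partition instead of getting it from the group.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for a non-thread-safe client such as KafkaConsumer:
// it holds mutable state (its position) with no synchronization at all.
class FakeConsumer {
    private final String partition;
    private long position = 0;

    // Simplification: the partition is passed in directly instead of being assigned by a group.
    FakeConsumer(String partition) { this.partition = partition; }

    // Returns the next "record" and advances the position; unsafe to call from two threads.
    String poll() { return partition + "@" + (position++); }
}

class ConsumerPerThread {
    // Starts one thread per partition; each thread creates and owns its own consumer.
    static Map<String, List<String>> run(List<String> partitions, int recordsEach) {
        Map<String, List<String>> received = new ConcurrentHashMap<>();
        List<Thread> threads = new ArrayList<>();
        for (String p : partitions) {
            Thread t = new Thread(() -> {
                FakeConsumer consumer = new FakeConsumer(p); // created inside the thread, never shared
                List<String> mine = new ArrayList<>();
                for (int i = 0; i < recordsEach; i++) {
                    mine.add(consumer.poll()); // records handled in the order they arrive
                }
                received.put(p, mine);
            });
            threads.add(t);
            t.start();
        }
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return received;
    }

    public static void main(String[] args) {
        System.out.println(run(List.of("topicA-0", "topicB-0", "topicC-0"), 3));
    }
}
```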

If a topic has several partitions, messages from different partitions can be processed in parallel. You can create several consumer instances with the same group.id, and each consumer will get a subset of the partitions to consume.
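To make the "subset of partitions" idea concrete, here is a simplified model of range-style assignment for a single topic. It is a sketch of the idea, not Kafka's own `RangeAssignor` code, and the consumer names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class RangeSplit {
    // Simplified range-style assignment for ONE topic: consumers are sorted,
    // partitions are split into contiguous ranges, and the first
    // (numPartitions % consumers) consumers get one extra partition each.
    static Map<String, List<Integer>> assign(int numPartitions, List<String> consumers) {
        List<String> sorted = new ArrayList<>(consumers);
        Collections.sort(sorted);
        int perConsumer = numPartitions / sorted.size();
        int extra = numPartitions % sorted.size();
        Map<String, List<Integer>> result = new LinkedHashMap<>();
        for (int i = 0; i < sorted.size(); i++) {
            int start = i * perConsumer + Math.min(i, extra);
            int length = perConsumer + (i < extra ? 1 : 0);
            List<Integer> parts = new ArrayList<>();
            for (int p = start; p < start + length; p++) {
                parts.add(p);
            }
            result.put(sorted.get(i), parts);
        }
        return result;
    }

    public static void main(String[] args) {
        // 6 partitions, 3 consumers: prints {c1=[0, 1], c2=[2, 3], c3=[4, 5]}
        System.out.println(assign(6, List.of("c1", "c2", "c3")));
        // 3 partitions, 4 consumers: prints {c1=[0], c2=[1], c3=[2], c4=[]}
        System.out.println(assign(3, List.of("c1", "c2", "c3", "c4")));
    }
}
```

The second call illustrates the last CON in the documentation quote: with more consumers than partitions, the extra consumer receives nothing to read.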

Kafka doesn't balance partitions across different topics by default. By this I mean that, with the default range assignment strategy, the group's assignment is computed topic by topic, so partitions from different topics might not be assigned evenly.
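For the question's exact scenario (one partition per topic, three consumers in one group), the difference between assigning topic by topic and pooling all partitions can be sketched as below. Both methods are simplified models of the range and round-robin strategies, not the library's implementation, and the topic and thread names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class CrossTopicAssignment {
    // Simplified range-style strategy: each topic is split on its own.
    // With one partition per topic, every partition 0 lands on the first sorted consumer.
    static Map<String, List<String>> rangePerTopic(Map<String, Integer> topics, List<String> consumers) {
        List<String> sorted = new ArrayList<>(consumers);
        Collections.sort(sorted);
        Map<String, List<String>> result = emptyAssignment(sorted);
        for (Map.Entry<String, Integer> topic : topics.entrySet()) {
            int n = topic.getValue();
            int per = n / sorted.size();
            int extra = n % sorted.size();
            for (int i = 0; i < sorted.size(); i++) {
                int start = i * per + Math.min(i, extra);
                int length = per + (i < extra ? 1 : 0);
                for (int p = start; p < start + length; p++) {
                    result.get(sorted.get(i)).add(topic.getKey() + "-" + p);
                }
            }
        }
        return result;
    }

    // Simplified round-robin strategy: partitions of all topics are pooled
    // (topics sorted by name) and dealt out one at a time to the sorted consumers.
    static Map<String, List<String>> roundRobin(Map<String, Integer> topics, List<String> consumers) {
        List<String> sorted = new ArrayList<>(consumers);
        Collections.sort(sorted);
        List<String> names = new ArrayList<>(topics.keySet());
        Collections.sort(names);
        List<String> all = new ArrayList<>();
        for (String name : names) {
            for (int p = 0; p < topics.get(name); p++) {
                all.add(name + "-" + p);
            }
        }
        Map<String, List<String>> result = emptyAssignment(sorted);
        for (int i = 0; i < all.size(); i++) {
            result.get(sorted.get(i % sorted.size())).add(all.get(i));
        }
        return result;
    }

    private static Map<String, List<String>> emptyAssignment(List<String> sortedConsumers) {
        Map<String, List<String>> result = new LinkedHashMap<>();
        for (String c : sortedConsumers) {
            result.put(c, new ArrayList<>());
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Integer> topics = new LinkedHashMap<>();
        for (String t : List.of("orders", "payments", "shipments", "users")) {
            topics.put(t, 1); // one partition per topic, as in the question
        }
        List<String> consumers = List.of("thread-1", "thread-2", "thread-3");
        // prints {thread-1=[orders-0, payments-0, shipments-0, users-0], thread-2=[], thread-3=[]}
        System.out.println(rangePerTopic(topics, consumers));
        // prints {thread-1=[orders-0, users-0], thread-2=[payments-0], thread-3=[shipments-0]}
        System.out.println(roundRobin(topics, consumers));
    }
}
```

In a real consumer the strategy is chosen with the `partition.assignment.strategy` property; configuring every consumer in the group with `org.apache.kafka.clients.consumer.RoundRobinAssignor` is one way to spread single-partition topics across the members.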

0 votes

One should not have more consumers than partitions. Within a group, Kafka assigns each partition to at most one consumer, because that is what keeps per-partition message order and offset tracking working; any extra consumers simply stay idle. Partially because of this, the Kafka (Java) consumer is not thread-safe.

So in Kafka's case, the number of partitions is your parallelism: a group can have at most as many active consumers as there are partitions.

So in your scenario, having one partition, run exactly one consumer instance in exactly one thread (you can, of course, hand the messages off to a pool of threads for later processing).
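A minimal sketch of that hand-off, assuming a hypothetical `RecordSource` interface in place of `KafkaConsumer.poll()` so it runs without a broker: one thread polls and a fixed pool does the processing. Once records are spread over several workers their processing order is no longer guaranteed, and a real implementation would also have to decide when it is safe to commit offsets for records still sitting in the pool.

```java
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

class PollAndHandOff {
    // Hypothetical stand-in for KafkaConsumer.poll(); in this sketch an empty batch means "no more data".
    interface RecordSource {
        List<String> poll();
    }

    // The calling thread is the only one that polls; each message is handed to a worker pool.
    static void consume(RecordSource source, int workers, Consumer<String> handler) {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            List<String> batch;
            while (!(batch = source.poll()).isEmpty()) {
                for (String message : batch) {
                    pool.execute(() -> handler.accept(message)); // order across workers is not preserved
                }
            }
        } finally {
            pool.shutdown(); // stop accepting work, let queued tasks finish
        }
        try {
            if (!pool.awaitTermination(10, TimeUnit.SECONDS)) {
                throw new IllegalStateException("workers did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Iterator<List<String>> batches = List.of(List.of("m1", "m2"), List.of("m3")).iterator();
        consume(() -> batches.hasNext() ? batches.next() : List.<String>of(), 2,
                message -> System.out.println(Thread.currentThread().getName() + " processed " + message));
    }
}
```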