- When a producer is producing a message - it will specify the topic it wants to send the message to, is that right? Does it care about partitions?
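Yes, the producer specifies the topic. The partition is optional: the record below names one explicitly, but if it is omitted the producer's partitioner picks one, typically by hashing the key.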
producer.send(new ProducerRecord<byte[], byte[]>(topic, partition, key1, value1), callback);
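To make the "does it care about partitions" part concrete, here is a rough sketch of how a partition could be chosen for a record. It is not Kafka's code: the class and method names are made up for illustration, `Arrays.hashCode` is only a stand-in for the hash the real client uses, and the handling of records without a key is a placeholder.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical illustration of partition selection; not part of the Kafka API.
final class PartitionChoiceSketch {

    // explicitPartition may be null, meaning "let the partitioner decide".
    static int choosePartition(Integer explicitPartition, byte[] key, int numPartitions) {
        if (explicitPartition != null) {
            return explicitPartition; // the caller pinned the record to a partition
        }
        if (key != null) {
            // Assumption: Arrays.hashCode stands in for the client's real key hash.
            return Math.floorMod(Arrays.hashCode(key), numPartitions);
        }
        // Placeholder for records without a key; the real client spreads these across partitions.
        return 0;
    }

    public static void main(String[] args) {
        byte[] key = "user-42".getBytes(StandardCharsets.UTF_8);
        System.out.println(choosePartition(3, key, 6));    // explicit partition wins
        System.out.println(choosePartition(null, key, 6)); // derived from the key
    }
}
```

The point of the sketch is only that the same key always lands on the same partition as long as the partition count does not change, which is what preserves per-key ordering.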
In general, the more partitions there are in a Kafka cluster, the higher the throughput one can achieve. A rough formula for picking the number of partitions is based on throughput. You measure the throughput that you can achieve on a single partition for production (call it p) and consumption (call it c). If your target throughput is t, you then need at least max(t/p, t/c) partitions.
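As a worked example of that sizing rule, the sketch below assumes a target throughput t and takes the partition count as max(t/p, t/c), rounded up. The class name, method name and the numbers are illustrative only; t, p and c just need to be in the same unit (for example MB/s).

```java
// Hypothetical helper for the throughput-based rule of thumb; not part of any Kafka library.
final class PartitionCountSketch {

    // t = target throughput, p = measured produce throughput per partition,
    // c = measured consume throughput per partition (all in the same unit).
    static int minPartitions(double t, double p, double c) {
        if (t <= 0 || p <= 0 || c <= 0) {
            throw new IllegalArgumentException("throughput values must be positive");
        }
        double needed = Math.max(t / p, t / c); // the slower side dictates the count
        return (int) Math.ceil(needed);         // a fraction of a partition rounds up
    }

    public static void main(String[] args) {
        // Target 100 MB/s, 10 MB/s per partition when producing, 20 MB/s when consuming.
        System.out.println(minPartitions(100, 10, 20)); // 10
    }
}
```

In this example the producer side is the bottleneck, so it decides the result; treat the number as a lower bound rather than a recommendation.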
- When a subscriber is running - does it specify its group id so that it can be part of a cluster of consumers of the same topic or several topics that this group of consumers is interested in?
Yes, the consumer sets group.id in its configuration. If that group.id does not exist yet (i.e. there are no existing consumers that are part of the group), the consumer group is created automatically when the consumer joins.
If all consumers in a group leave the group, the group becomes empty and is eventually removed automatically.
- Does each consumer group have a corresponding partition on the broker or does each consumer have one?
Neither, exactly. A consumer group as a whole reads every partition of the topics it subscribes to, and within the group each partition is assigned to exactly one consumer. Several consumer groups can read the same partition, but no two consumers in the same group are assigned the same partition: records in a partition are consumed sequentially within a group, and if several consumers from one group read the same partition the ordering could be lost. Groups are logically independent of each other, so they can all consume from the same partition.
- Are the partitions created by the broker, and therefore not a concern for the consumers?
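Yes. Partitions are created on the brokers when a topic is created (or when its partition count is increased), so consumers do not create them.
As a rule of thumb, each broker can have up to 4,000 partitions and each cluster up to 200,000 partitions.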
Whenever a consumer enters or leaves a consumer group, the brokers rebalance the partitions across consumers, meaning Kafka handles load balancing with respect to the number of partitions per application instance for you.
Before assigning partitions to a consumer, Kafka first checks whether there are any existing consumers with the given group.id.
When there are no existing consumers with the given group.id, it assigns all the partitions of that topic to this new consumer.
When there are already two consumers with the given group.id and a third consumer joins with the same group.id, it spreads the partitions as evenly as possible among all three consumers. No two consumers with the same group.id are assigned the same partition.
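The assignment behaviour described above can be pictured with a small sketch. It deals partitions out round-robin, which is only one possible strategy and an assumption on my part; the class and method names are made up, and Kafka's real assignors are more involved.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical round-robin assignment within one consumer group; not Kafka's assignor code.
final class GroupAssignmentSketch {

    // Returns consumer name -> partitions it owns. Consumer names are assumed to be unique.
    static Map<String, List<Integer>> assign(List<String> consumers, int numPartitions) {
        Map<String, List<Integer>> assignment = new LinkedHashMap<>();
        for (String consumer : consumers) {
            assignment.put(consumer, new ArrayList<>());
        }
        if (consumers.isEmpty()) {
            return assignment; // nobody in the group, nothing to assign
        }
        for (int partition = 0; partition < numPartitions; partition++) {
            // Each partition gets exactly one owner inside the group.
            String owner = consumers.get(partition % consumers.size());
            assignment.get(owner).add(partition);
        }
        return assignment;
    }

    public static void main(String[] args) {
        System.out.println(assign(List.of("c1"), 6));             // {c1=[0, 1, 2, 3, 4, 5]}
        System.out.println(assign(List.of("c1", "c2", "c3"), 6)); // {c1=[0, 3], c2=[1, 4], c3=[2, 5]}
    }
}
```

Calling assign again with a different consumer list is roughly what a rebalance amounts to in this model: the same partitions are redistributed over the current members.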
- Since this is a queue with an offset for each partition, is it the responsibility of the consumer to specify which messages it wants to read? Does it need to save its state?
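Offsets are handled by Kafka. The current offset is a pointer to the last record that Kafka has already sent to a consumer in the most recent poll, so the consumer doesn't get the same record twice during normal operation. The committed offset is stored by Kafka for the group, which lets a restarted consumer resume where the group left off without saving state itself.
The offset doesn't need to be specified explicitly, although a consumer can seek to a specific offset if it wants to.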
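To illustrate the difference between the position a consumer advances while polling and the offset it has committed, here is a toy model of a single partition. Everything in it (class, methods, the in-memory list standing in for the log) is hypothetical and only meant to show the bookkeeping, not how the Kafka client is implemented.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of one consumer reading one partition; not Kafka client code.
final class OffsetTrackingSketch {
    private final List<String> log; // records of one partition; the list index plays the role of the offset
    private long position = 0;      // offset of the next record to hand out
    private long committed = 0;     // position last saved on behalf of the group

    OffsetTrackingSketch(List<String> log) {
        this.log = log;
    }

    // Returns up to maxRecords records and moves the position past them.
    List<String> poll(int maxRecords) {
        List<String> batch = new ArrayList<>();
        while (batch.size() < maxRecords && position < log.size()) {
            batch.add(log.get((int) position));
            position++;
        }
        return batch;
    }

    // Saves the current position so a restart can resume from here.
    void commit() {
        committed = position;
    }

    // Simulates a crash and restart: reading resumes from the last committed offset.
    void restart() {
        position = committed;
    }

    long position() {
        return position;
    }

    long committed() {
        return committed;
    }
}
```

For example, after poll(2), commit(), poll(1) and then restart() on a log of four records, the next poll starts again at the third record, because that one was delivered but never committed.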
- What happens when a message is deleted from the queue? - For example, the retention was for 3 hours, then the time passes, how is the offset being handled on both sides?
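Retention deletes old records on the broker whether or not they have been consumed, and the offsets of the remaining records do not change. A consumer whose position is already past the deleted records is not affected. If a consumer's position points at a record that has been deleted, the offset is out of range and the consumer's auto.offset.reset setting decides what happens: with earliest or latest the consumer is repositioned automatically, and with none it gets an error.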
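A small sketch of how a position that points at an expired record could be resolved, assuming the consumer's auto.offset.reset setting (earliest, latest or none) is what decides. The class, enum and method are hypothetical, and IllegalStateException is just a stand-in for whatever exception the real client raises.

```java
// Hypothetical model of resolving a consumer position after retention removed old records.
final class OffsetResetSketch {

    enum ResetPolicy { EARLIEST, LATEST, NONE }

    // logStart = offset of the oldest record still retained,
    // logEnd   = offset the next produced record will receive.
    static long resolvePosition(long requested, long logStart, long logEnd, ResetPolicy policy) {
        if (requested >= logStart && requested <= logEnd) {
            return requested; // still valid, nothing to do
        }
        switch (policy) {
            case EARLIEST:
                return logStart; // jump to the oldest record that still exists
            case LATEST:
                return logEnd;   // skip ahead and wait for new records
            default:
                // Stand-in for the error surfaced to the application.
                throw new IllegalStateException(
                        "offset " + requested + " is outside [" + logStart + ", " + logEnd + "]");
        }
    }

    public static void main(String[] args) {
        // Records 0..99 have expired; the log now covers offsets 100..199.
        System.out.println(resolvePosition(50, 100, 200, ResetPolicy.EARLIEST)); // 100
        System.out.println(resolvePosition(150, 100, 200, ResetPolicy.NONE));    // 150
    }
}
```

Note that with the earliest policy the records between the old position and the new log start are simply skipped; they are gone and cannot be recovered by the consumer.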