I am new to streaming brokers (like Kafka) and am coming from queueing/messaging systems (like JMS, RabbitMQ).

I read in the Kafka docs that messages are stored in Kafka partitions as records, each at an offset, and that a consumer reads from an offset.

What is the difference between a message and a record? (Do multiple/partial messages constitute a record?)

When a consumer reads from an offset, is there a possibility that it reads a partial message? Is there a need for the consumer to stitch these partial messages together based on some logic?

Or is it simply 1 message = 1 record = 1 offset?

EDIT1:

The question popped up because the "batch size" decides how many bytes of messages are published to the broker at once. Let's say there are 2 messages, message1 = 100 bytes and message2 = 200 bytes, and the batch size is set to 150 bytes. Does this mean 100 bytes from message1 and 50 bytes from message2 are sent to the broker at once? If yes, how are these 2 messages stored at offsets?

Answer:

In Kafka, a Producer sends messages or records (the two terms are used interchangeably) to Topics. A topic is divided into one or more Partitions, which are distributed across the Brokers of a Kafka Cluster (a production cluster typically has at least three Brokers).

A message/record is sent to the partition's leader replica (hosted by a single broker) and is assigned an Offset. An Offset is a monotonically increasing numerical identifier that uniquely identifies a record within a partition; e.g. the first message stored in a partition gets offset 0, the next one offset 1, and so on.
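To make the "one record, one offset" idea concrete, here is a minimal in-memory sketch of a single partition as an append-only log. It is a toy model, not the Kafka API: the class `PartitionLog` and its methods are hypothetical, and it assumes no log compaction or transactions, so offsets are contiguous and start at 0.

```python
# Toy model of ONE Kafka partition (hypothetical class, not the Kafka API).
# Assumption: no compaction/transactions, so offsets are 0, 1, 2, ... with no gaps.


class PartitionLog:
    """Append-only log in which every record gets exactly one offset."""

    def __init__(self) -> None:
        self._records: list[bytes] = []

    def append(self, record: bytes) -> int:
        # The next offset is the current log length, so offsets only ever increase.
        offset = len(self._records)
        self._records.append(record)
        return offset

    def read(self, offset: int) -> bytes:
        # Reading at an offset always returns one whole record, never a fragment.
        return self._records[offset]


log = PartitionLog()
o1 = log.append(b"a" * 100)  # message1, 100 bytes
o2 = log.append(b"b" * 200)  # message2, 200 bytes
print(o1, o2)                # 0 1
print(len(log.read(o2)))     # 200
```

Whatever its size, each message occupies exactly one slot in the log and is addressed by exactly one offset.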

Offsets identify both the position of a message within a topic partition and the position (i.e. the committed progress) of a Consumer Group in that partition.
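The second use of offsets, tracking where each consumer group is, can be sketched as a per-group "next offset to read". This is a simplified, hypothetical model (`poll` and `committed` are illustrative names, not the client API): it fetches and commits in one step for brevity, and it assumes a group with no committed offset starts at 0, similar to `auto.offset.reset=earliest`. It follows Kafka's convention that the committed offset is the offset of the *next* record to consume.

```python
# Hypothetical sketch (not the Kafka client API). The partition is a plain list,
# and the list index plays the role of the offset.
partition = [b"m0", b"m1", b"m2", b"m3"]

# Committed position per consumer group: the offset of the NEXT record to read.
committed: dict[str, int] = {}


def poll(group: str, max_records: int) -> list[tuple[int, bytes]]:
    """Return up to max_records whole records starting at the group's position."""
    start = committed.get(group, 0)  # assumption: start from the beginning
    end = min(start + max_records, len(partition))
    batch = [(off, partition[off]) for off in range(start, end)]
    if batch:
        # Commit "last processed offset + 1" (fetch and commit merged for brevity).
        committed[group] = batch[-1][0] + 1
    return batch


print(poll("billing", 2))  # [(0, b'm0'), (1, b'm1')]
print(poll("billing", 2))  # [(2, b'm2'), (3, b'm3')]
print(poll("audit", 1))    # [(0, b'm0')]  each group has its own position
print(committed)           # {'billing': 4, 'audit': 1}
```

Two groups reading the same partition keep independent positions, and every record they receive is whole.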

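For optimisation purposes, a producer batches messages per partition. A batch is considered ready to send when it reaches the configured batch.size or when linger.ms has elapsed, whichever happens first. A message is only added to a batch if it fits whole: for example, with batch.size set to 200KB, if you send two messages of 150KB and 100KB, the second one does not fit in the remaining 50KB, so it starts a new batch instead of being split. A message larger than batch.size is sent in a batch of its own. In the example from the question, message1 (100 bytes) and message2 (200 bytes) therefore end up in separate batches, each stored whole at its own offset. The producer never fragments a single message into chunks.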
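The batching rule can be sketched as a small function that groups record sizes into batches. This is a simplified model under stated assumptions, not the producer's actual implementation: `build_batches` is a hypothetical name, sizes are raw payload bytes (real batches also carry header and per-record overhead), there is a single partition, and `linger.ms` is ignored.

```python
# Simplified model of per-partition producer batching (hypothetical helper).
# Assumptions: sizes are payload bytes only (no batch/record overhead),
# one partition, linger.ms ignored. Records are never split.


def build_batches(record_sizes: list[int], batch_size: int) -> list[list[int]]:
    batches: list[list[int]] = []
    current: list[int] = []
    used = 0
    for size in record_sizes:
        if current and used + size > batch_size:
            # The record does not fit whole in what is left: close this batch.
            batches.append(current)
            current, used = [], 0
        # Append the WHOLE record. If it alone exceeds batch_size, it simply
        # ends up as the only record in its batch.
        current.append(size)
        used += size
    if current:
        batches.append(current)
    return batches


print(build_batches([100, 200], 150))  # [[100], [200]]  the question's example
print(build_batches([150, 100], 200))  # [[150], [100]]  sizes in KB
print(build_batches([150, 50], 200))   # [[150, 50]]     both fit exactly
```

Because a record is either placed whole into the open batch or starts a new one, no batch ever contains part of a message.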

No, a consumer cannot read partial messages: each offset points to exactly one whole record, so 1 message = 1 record = 1 offset.