2
votes

When sending messages to a Kafka topic, I may occasionally get a single message that is much larger than the others.

So I need compression at the level of a single message. According to https://cwiki.apache.org/confluence/display/KAFKA/Compression:

A set of messages can be compressed and represented as one compressed message.

Also, the description of the compression.type property in https://github.com/apache/kafka/blob/0.10.1/clients/src/main/java/org/apache/kafka/clients/producer/ProducerConfig.java says:

Compression is of full batches of data, so the efficacy of batching will also impact the compression ratio (more batching means better compression).

Should I set the batch size to one, or disable batching, so that compression is applied to each message individually?

1
Why is compressing batches not acceptable? – ftr
I'm not saying that. My question is: if one message is very big and batching is enabled, will the producer compress and send that message right away, or wait for further messages to fill the batch? – Anil Kumar
@AnilKumar, did you get an answer to this question? I am now facing the same issue: each message is very big, and I want to compress each message and send it. Is that possible? – Bravo
No, Bravo, I didn't get any. – Anil Kumar

1 Answer

1
votes

Compression is orthogonal to the question of whether or not you produce in batches. However, as stated in the documentation:

more batching means better compression

Compression can be set at the topic level (https://kafka.apache.org/documentation/#topicconfigs) or as part of the producer config (https://kafka.apache.org/documentation/#producerconfigs). Moreover, different messages in the same topic can be compressed with different compression types, because the compression type is part of the record batch metadata (https://kafka.apache.org/documentation/#recordbatch), and this is seamless to the consumer.

However, if you need to compress only some messages, this cannot be done with a single producer, because the producer configuration is static. Whatever the motivation for such a choice, you could simply create two producer instances (one with compression enabled and one without) and decide, based on the message content, which producer to use for each send.
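A minimal sketch of the two-producer idea, assuming a size threshold is the selection criterion. The class name `ProducerSelection`, the helper names, the 64 KiB threshold, and the `localhost:9092` address are all hypothetical; only the config keys (`bootstrap.servers`, `key.serializer`, `value.serializer`, `compression.type`) and the values `none` and `gzip` are real producer settings. The sketch builds the two configurations with plain `java.util.Properties` so that it runs without a broker; with kafka-clients on the classpath, each `Properties` object would be passed to a `KafkaProducer` constructor.

```java
import java.util.Properties;

class ProducerSelection {
    // Hypothetical cut-off: payloads at or above this size go to the compressing producer.
    static final int LARGE_MESSAGE_BYTES = 64 * 1024;

    // Builds a producer configuration that differs only in compression.type.
    static Properties producerConfig(String bootstrapServers, String compressionType) {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", bootstrapServers);
        props.setProperty("key.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        props.setProperty("value.serializer",
                "org.apache.kafka.common.serialization.ByteArraySerializer");
        props.setProperty("compression.type", compressionType);
        return props;
    }

    // Decides which of the two producers a payload should be sent with.
    static boolean shouldCompress(byte[] payload) {
        return payload.length >= LARGE_MESSAGE_BYTES;
    }

    public static void main(String[] args) {
        Properties plainConfig = producerConfig("localhost:9092", "none");
        Properties gzipConfig = producerConfig("localhost:9092", "gzip");

        // With kafka-clients available (assumption), one would create:
        //   new KafkaProducer<String, byte[]>(plainConfig)
        //   new KafkaProducer<String, byte[]>(gzipConfig)
        // and call send(...) on whichever one shouldCompress(payload) selects.
        byte[] small = new byte[100];
        byte[] large = new byte[LARGE_MESSAGE_BYTES];
        System.out.println("small uses " + (shouldCompress(small)
                ? gzipConfig : plainConfig).getProperty("compression.type"));
        System.out.println("large uses " + (shouldCompress(large)
                ? gzipConfig : plainConfig).getProperty("compression.type"));
    }
}
```

Note that `batch.size` is an upper bound in bytes rather than a message count, so a batch that happens to contain a single large record is still compressed as a batch; the two-producer split is only needed when some messages must stay uncompressed.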