1
votes

I am creating a Kafka producer with the "acks=all" config.

Is there any significance to calling flush() with the above config?

Will the producer wait for flush() to be invoked before sending records to the broker?

As per the documentation:

acks=all: This means the leader will wait for the full set of in-sync replicas to acknowledge the record. This guarantees that the record will not be lost as long as at least one in-sync replica remains alive. This is the strongest available guarantee. This is equivalent to the acks=-1 setting.

2
Just call flush() from a finally clause (when you exit the program and don't want to lose the last messages). The Kafka producer is smart enough to know (based on your params) when it must send the messages for a certain partition, so calling flush() manually after every message will decrease performance, since you are effectively creating a totally synchronous producer (acks=all and flush on every message). - aran
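The pattern suggested in this comment could look roughly like the sketch below. It assumes a kafka-python style producer object that exposes send(topic, value) and flush(); the function name publish_all is hypothetical and only there for illustration.

```python
def publish_all(producer, topic, values):
    """Send every value, then flush exactly once on the way out."""
    sent = 0
    try:
        for value in values:
            # send() only hands the record to the producer's buffer;
            # the actual network send happens in the background.
            producer.send(topic, value)
            sent += 1
    finally:
        # A single flush at the end, even if the loop raised, so that
        # records still sitting in the buffer are not lost on exit.
        producer.flush()
    return sent
```

Flushing once at the end keeps the batching benefits; moving flush() inside the loop would make each send wait for its acknowledgement.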

2 Answers

4
votes

As per documentation

flush():

Invoking this method makes all buffered records immediately available to send (even if linger_ms is greater than 0) and blocks on the completion of the requests associated with these records. The post-condition of flush() is that any previously sent record will have completed (e.g. Future.is_done() == True). A request is considered completed when either it is successfully acknowledged according to the ‘acks’ configuration for the producer, or it results in an error.

Other threads can continue sending messages while one thread is blocked waiting for a flush call to complete; however, no guarantee is made about the completion of messages sent after the flush call begins.

flush() will still block the client application until all messages are sent, even with acks=0. The only difference is that it won't wait for an acknowledgement; it blocks only until the buffered records have been sent out.

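With acks=all, once flush() returns, every previously sent record has either been acknowledged by the full set of in-sync replicas or has failed with an error. flush() itself does not tell you which, so you still need to check the returned futures or callbacks.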
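To make the "completed means acknowledged or failed" point concrete, here is a minimal sketch. It assumes a kafka-python style producer whose send() returns a future with a get(timeout=...) method that returns the record metadata or raises on failure; the function name send_and_confirm is hypothetical.

```python
def send_and_confirm(producer, topic, values, timeout=10.0):
    """Send all values, flush, then report the outcome of each record."""
    # Keep the future returned by each send() so results can be inspected.
    futures = [producer.send(topic, value) for value in values]

    # Blocks until every request above has completed, i.e. was acknowledged
    # according to the producer's acks setting or ended in an error.
    producer.flush()

    results = []
    for future in futures:
        try:
            # After flush() the future is already done, so get() does not
            # wait; it returns the metadata or raises the send error.
            results.append(("ok", future.get(timeout=timeout)))
        except Exception as exc:
            results.append(("error", exc))
    return results
```

With acks=all, an "ok" entry means the in-sync replicas acknowledged that record; an "error" entry means the request completed without the record being confirmed.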

Finally, to answer your question: Will it wait for flush to be invoked before being sent to broker?

Answer: No. The producer sends batches in the background as soon as a batch fills up (batch.size) or the linger time (linger.ms) elapses; buffer.memory only controls the total amount of memory available to the producer for buffering. It is still good practice to call flush() (or close()) before shutting down, to make sure all buffered messages are sent.
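The behaviour described above can be illustrated with a toy model. This is not the Kafka client: ToyProducer and its attributes are hypothetical, the linger timer is left out, and "delivery" is just appending to a list. It only shows that full batches leave without flush(), while flush() pushes out whatever is left.

```python
class ToyProducer:
    """Toy model of producer-side batching; NOT the real Kafka client."""

    def __init__(self, batch_size):
        self.batch_size = batch_size
        self.buffer = []      # records waiting in the current batch
        self.delivered = []   # batches "sent to the broker", in order

    def _dispatch(self):
        # Move the current batch out of the buffer, if there is one.
        if self.buffer:
            self.delivered.append(list(self.buffer))
            self.buffer.clear()

    def send(self, record):
        self.buffer.append(record)
        # A full batch is sent on its own, without any flush() call.
        if len(self.buffer) >= self.batch_size:
            self._dispatch()

    def flush(self):
        # Sends the partial batch that was still waiting.
        self._dispatch()


producer = ToyProducer(batch_size=3)
for i in range(7):
    producer.send(i)
before_flush = list(producer.delivered)  # [[0, 1, 2], [3, 4, 5]]
producer.flush()
after_flush = list(producer.delivered)   # [[0, 1, 2], [3, 4, 5], [6]]
```

Six of the seven records went out before flush() was ever called; only the last, partial batch needed it.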


3
votes

Let me first try and call out the distinction between flush() and acks before I get to the 2 questions.

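flush() - This is a method invoked on the producer to push the messages from the producer-side buffer (configurable) to the brokers and wait for them to complete. You do not have to call it for messages to be sent: a background sender thread transmits batches on its own as they fill up or as the linger time expires. You would invoke flush() or close() to make sure that whatever is still in the buffer goes through to the brokers. Note that flush() is not invoked automatically when the producer's buffer memory fills up; in that case send() blocks (up to max.block.ms) until space is freed (see Manoj's answer for the buffering settings).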

acks=all, however, concerns what the broker does: the partition leader sends an acknowledgement back to the producer only after the record has been replicated to the full set of in-sync replicas. You would use this setting to tune your message delivery semantics. In this case, as soon as the messages are replicated to all in-sync replicas, the broker sends the acknowledgement to the producer saying "I got your messages".

Now, on to your questions, i.e. whether there is any significance to calling flush() with the acks setting, and whether or not the producer waits for flush() to be invoked before sending to the broker.

Well, the asynchronous nature of the producer ensures that it does not wait: send() returns immediately and the records go out in the background. If, however, you invoke flush() explicitly, the calling thread blocks until all previously sent records have completed, which with acks=all means until the broker has acknowledged them (or they have failed). Other threads can keep sending in the meantime. So, the relationship between these 2 is very subtle.

I hope this helps!