Kafka: Why isn't the broker pull-based like consumers?

Question

I was reading the Kafka docs, where it is mentioned that:

Consumers pull data from the broker by requesting it from an offset.
Producers push messages to the broker.

Making Kafka consumers pull-based makes sense: the consumers can drive the pace, and the broker can store the data for a really long time.

However, with producers being push-based, how does Kafka make sure that a speed mismatch between the producer and Kafka won't happen? Also, producers don't have persistence by design. This seems to be a bigger problem when producers and brokers are separated by a high-latency network (the internet).

Why the negative vote? I clearly did research into how Kafka works. It is not a broad question, and it is relevant to programming and to a popular tool. — Mangat Rai Modi

Robin Moffatt · Accepted Answer · 2017-11-02T18:34:39

As a distributed commit log, Kafka solves exactly this (impedance mismatch). You produce your events into Kafka at the rate at which they occur, and then you consume them at the rate your application can handle. The data is persisted in Kafka regardless. If your application needs to consume at a greater rate, you scale it out, partition your topic, and consume in parallel. Because the data is persisted, the only factor is how fast you want to consume it.
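To make the decoupling concrete, here is a minimal sketch in plain Python. It is a toy in-memory model, not the Kafka client API: `PartitionLog`, `partition_for`, and the two-partition `topic` list are hypothetical names invented for illustration. The producer appends at its own pace, and each consumer pulls by offset in small batches whenever it is ready.

```python
from dataclasses import dataclass, field


@dataclass
class PartitionLog:
    """Toy append-only log standing in for one partition (hypothetical, not Kafka's API)."""
    records: list = field(default_factory=list)

    def append(self, value):
        # Producer side: push a record; its offset is its position in the log.
        self.records.append(value)
        return len(self.records) - 1

    def fetch(self, offset, max_records):
        # Consumer side: pull up to max_records starting at the requested offset.
        return self.records[offset:offset + max_records]


def partition_for(key, num_partitions):
    # Deterministic toy partitioner (assumption: the real client uses a different hash).
    return sum(key.encode()) % num_partitions


# A topic modelled as two partitions.
topic = [PartitionLog(), PartitionLog()]

# The producer pushes events as they occur, without waiting for any consumer.
for i in range(10):
    key = f"user-{i}"
    topic[partition_for(key, len(topic))].append(f"event-{i}")

# One slow consumer per partition, each pulling at most two records per poll.
consumed = []
for log in topic:
    offset = 0
    while True:
        batch = log.fetch(offset, max_records=2)
        if not batch:
            break  # caught up with the end of this partition
        consumed.extend(batch)
        offset += len(batch)  # the consumer, not the log, tracks its position
```

Nothing is lost while the consumers lag, because the records stay in the log until they are fetched. Adding partitions (and one consumer per partition) is what lets consumption scale out in parallel.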
