How does kafka streams guarantee that the partitions assigned to it
from the kafka broker match the partitions assigned for other topics?
Kafka streams application subscribes to one or more topics under an application.id
which resembles group.id
in Kafka-clients.
When a client requests Kafka broker for subscription of a topic with a particular group.id
it returns a set of partitions for that topic.
If all the topics partitions are assigned to any streams instance under the same application.id
, a re-balance will be triggered and the newly started streams instance will receive its share of the partitions and the old instance will no longer be listening to those partitions.
Does kafka streams always read the compacted topic from the beginning,
or does it read from latest offset?
Whether compacted or otherwise, Kafka streams applications reads from the last committed offset.
Once it reads the messages from the compacted topic, does it commit
the offset?
From the wiki, it is stated that..
Kafka Streams commit the current processing progress in regular
intervals (parameter commit.interval.ms). If a commit is triggered,
all state stores need to flush data to disk, i.e., all internal topics
needs to get flushed to Kafka. Furthermore, all user topics get
flushed, too. Finally, all current topic offsets are committed to
Kafka. In case of failure and restart, the application can resume
processing from its last commit point (providing at-least-once
processing guarantees).
While writing a Kafka streams application, the developer need not manually take care of committing the offsets because it is internally done by Kafka streams.