In Kafka, I can split my topic into many partitions. I cannot have more consumers than partitions in Kafka, because the partition is used as a way to scale out a topic. If I have more load, I can increase the number of partitions, which will allow me to increase the number of consumers, which will allow me to have more threads / processes processing on a given topic.
In Kafka, there is a concept of a Consumer Group. If we have 10 consumer groups on a single topic, each consumer group will have the opportunity to process every message in a topic. The consumer group still takes advantage of the scalability from the partitions (i.e. Each consumer group can have up to 'n' consumers, where 'n' is the number of partitions on a topic). This is the beauty of kafka, scalability and multi-channel reading are two separate concepts with two separate knobs to turn.
In Kinesis, we are told that, if you use the Kinesis Library Client you can get the same functionality as consumer groups by defining different Kinesis Applications. In other words, we can have different Kinesis Applications independently streaming all records from the same stream and different times.
We are also told that "Amazon Kinesis Client Library (KCL) automatically creates an Amazon DynamoDB table for each Amazon Kinesis Application to track and maintain state information such as resharding events and sequence number checkpoints."
OK, So I'm getting ready to start reading through the KCL code here, but I'm hoping someone can answer these questions to save me some time.
- How does the KCL actually do this?
- Are there diagrams somewhere explaining the process?
- If I started a new Kinesis Application (MyKinesisApp1) after a record was already produced and consumed by all prior Kinesis Applications, will the new Kinesis Application (MyKinesisApp1) still have an opportunity to consume that record? In other words, does Kinesis remove the record from its stream after it has been processed, or does it leave it there for the 7 days no matter what?
I have seen this question here but it doesn't answer my question. Especially my third question! Also, this question does a direct comparison between two similar technologies. It will help people that know Kafka, learn Kinesis more quickly.