
In Kafka, I can split my topic into many partitions. Within a consumer group, I cannot have more active consumers than partitions, because the partition is the unit Kafka uses to scale out a topic. If I have more load, I can increase the number of partitions, which lets me increase the number of consumers and so have more threads or processes working on a given topic.

In Kafka, there is a concept of a Consumer Group. If we have 10 consumer groups on a single topic, each consumer group will have the opportunity to process every message in the topic. Each consumer group still takes advantage of the scalability from the partitions (i.e., each consumer group can have up to 'n' consumers, where 'n' is the number of partitions on the topic). This is the beauty of Kafka: scalability and multi-channel reading are two separate concepts with two separate knobs to turn.
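As a minimal sketch of that mental model (plain Python, not the Kafka client API; `assign_partitions` is a hypothetical helper, and the round-robin rule is an assumption rather than any of Kafka's real assignment strategies), partitions are handed out within one group, and any consumer beyond the partition count is left idle:

```python
def assign_partitions(num_partitions, consumers):
    """Spread partitions over the consumers of ONE consumer group.

    Toy round-robin rule, assumed for illustration only.
    """
    assignment = {consumer: [] for consumer in consumers}
    for partition in range(num_partitions):
        owner = consumers[partition % len(consumers)]
        assignment[owner].append(partition)
    return assignment


# Two independent groups on the same 3-partition topic: each group covers
# every partition, so each group sees every message.
group_a = assign_partitions(3, ["a0", "a1", "a2", "a3"])  # a3 gets no partition
group_b = assign_partitions(3, ["b0"])                    # b0 reads all three
print(group_a)
print(group_b)
```

The two knobs show up as the two arguments: the partition count bounds parallelism inside a group, while the number of groups is unrelated to it.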

In Kinesis, we are told that if you use the Kinesis Client Library (KCL), you can get the same functionality as consumer groups by defining different Kinesis Applications. In other words, we can have different Kinesis Applications independently streaming all records from the same stream at different times.

We are also told that "Amazon Kinesis Client Library (KCL) automatically creates an Amazon DynamoDB table for each Amazon Kinesis Application to track and maintain state information such as resharding events and sequence number checkpoints."

OK, so I'm getting ready to start reading through the KCL code here, but I'm hoping someone can answer these questions to save me some time.

  1. How does the KCL actually do this?
  2. Are there diagrams somewhere explaining the process?
  3. If I started a new Kinesis Application (MyKinesisApp1) after a record was already produced and consumed by all prior Kinesis Applications, will the new Kinesis Application (MyKinesisApp1) still have an opportunity to consume that record? In other words, does Kinesis remove the record from its stream after it has been processed, or does it leave it there for the 7 days no matter what?

I have seen this question here, but it doesn't answer mine, especially my third question. Also, this question makes a direct comparison between two similar technologies, so it should help people who know Kafka learn Kinesis more quickly.

Did you read this answer: stackoverflow.com/a/42833193/1622134 - az3
This question and its answer do a good job of comparing two similar but different technologies. I see them being similar to those questions, but not the same. - CBP
As a quick follow-up, my comment on the answer below was the missing information for me to understand this problem. It isn't explicitly written anywhere (that I have seen). I realized this after reading the answer below. I'm confident this question will help people in the future. - CBP

1 Answer

  1. In the KCL configuration, there is a setting "appName" ("Application Name"), which plays the same role as a consumer group in Kafka. For each consumer group (i.e., Kinesis Streams consumer application) there is a DynamoDB table. You can see an example DynamoDB table here (the KCL appName is 'quickstats-development'): AWS Kinesis leaseOwner confusion

  2. No, as far as I know, there are not. "Kinesis Streams" is conceptually similar to Kafka, but beyond that I have not found much in the way of diagrams.

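  3. Yes. Kinesis does not remove a record once it has been read; the record stays in the stream until the retention period expires. Each KCL application (the equivalent of a Kafka consumer group) has its own DynamoDB table, so different Kinesis consumer applications can consume the same record independently. A new application will see older records as long as they are still within the retention period and its initial position is set to TRIM_HORIZON rather than LATEST. A checkpoint in Kinesis corresponds to an offset in Kafka: it is stored in DynamoDB and acts as the cursor marking the read position in a Kinesis shard. Read this answer for a similar example: https://stackoverflow.com/a/42833193/1622134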
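
To illustrate the idea, here is a toy sketch in plain Python. It is not the KCL or the AWS SDK: `ToyShard`, `ToyApp`, and `start_at_oldest` are hypothetical names, the in-memory `next_index` stands in for the checkpoint that the KCL keeps in each application's DynamoDB table, and retention expiry is not modeled.

```python
class ToyShard:
    """Append-only list standing in for one shard; reading never deletes."""

    def __init__(self):
        self.records = []

    def put(self, data):
        self.records.append(data)


class ToyApp:
    """Stand-in for one consumer application with its own checkpoint."""

    def __init__(self, app_name, start_at_oldest=True):
        self.app_name = app_name
        self.start_at_oldest = start_at_oldest
        self.next_index = None  # checkpoint: index of the next unread record

    def poll(self, shard):
        if self.next_index is None:
            # First run: begin at the oldest retained record or at the tip.
            self.next_index = 0 if self.start_at_oldest else len(shard.records)
        batch = shard.records[self.next_index:]
        self.next_index = len(shard.records)  # advance this app's checkpoint only
        return batch


shard = ToyShard()
shard.put("r1")

old_app = ToyApp("MyKinesisApp0")
print(old_app.poll(shard))  # consumes r1

new_app = ToyApp("MyKinesisApp1")  # started after r1 was already consumed
print(new_app.poll(shard))  # still receives r1, because each app has its own cursor
```

Because each application advances only its own cursor, one application consuming a record has no effect on what another application can read.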