How are shards from a Kinesis stream assigned to multiple instances of a Kinesis consumer?

Question

I have a Kinesis stream with 20 shards that is consumed by a KCL-based Kinesis consumer. The consumer is deployed in ECS with 20 instances (meaning multiple KCL instances?).

What I believed would happen in this scenario is:

Each instance would create 20 worker threads, one per shard, independently of the other instances.
So at any given time, a shard would have 20 separate threads connecting to it.
The same set of records would get processed by each instance (i.e., duplicate record processing would not be prevented across the instances).
This would also exceed the per-shard consumer rate limit (5 transactions per second).
Running a single instance of my consumer is sufficient. In other words, scaling the consumer across multiple instances would not bring any benefit at all.

This answer seems to suggest that the "shard's lease" would ensure that a shard is only processed by a single instance. However, the second answer here says that "A KCL instance will only start one process per shard, but you can have another KCL instance consuming the same stream (and shard), assuming the second one has permission.".

Further, this documentation suggests "Increasing the number of instances up to the maximum number of open shards" as a possible scale-up approach, which contradicts some of the above points.

How do the consumer instances actually function in this scenario?

stevenv · Accepted Answer · 2020-09-06T08:44:59

In the scenario you describe, each of the 20 workers will eventually process exactly one shard.

At startup, each worker will try to claim as many shards as it can by creating leases for those shards. When all 20 workers start simultaneously, they will all try to create leases for all 20 shards, but this will not succeed for all of them. One worker may end up with, e.g., 5 shards, and others with 2 or 3. After a few iterations of lease taking, though, each worker should hold only 1 shard. This way the AWS rate limits are respected.
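To make the balancing easier to picture, here is a toy simulation in Python. It is only a sketch under simplifying assumptions, not KCL's actual lease-taking algorithm: every worker is assumed to know how many leases each other worker holds, and a worker below its fair share takes one lease per round from the most loaded worker. The function name `rebalance` and the uneven starting distribution are made up for illustration.

```python
import math
from collections import Counter


def rebalance(owners, workers, max_rounds=100):
    """Move leases until no worker holds more than its fair share.

    owners maps shard id -> worker id and is modified in place.
    Returns the number of rounds in which at least one lease moved.
    """
    target = math.ceil(len(owners) / len(workers))  # fair share per worker
    for round_no in range(max_rounds):
        counts = Counter({w: 0 for w in workers})
        counts.update(owners.values())
        moved = False
        for w in workers:
            if counts[w] >= target:
                continue  # this worker already has its share
            victim, victim_count = counts.most_common(1)[0]
            if victim_count <= target:
                continue  # nobody is overloaded, nothing to steal
            shard = next(s for s, o in owners.items() if o == victim)
            owners[shard] = w  # "steal" one lease from the busiest worker
            counts[victim] -= 1
            counts[w] += 1
            moved = True
        if not moved:
            return round_no
    return max_rounds


workers = [f"worker-{i}" for i in range(20)]
# Hypothetical uneven start: five workers grabbed everything (5+5+4+3+3 = 20).
initial = [0] * 5 + [1] * 5 + [2] * 4 + [3] * 3 + [4] * 3
owners = {f"shard-{s}": workers[w] for s, w in enumerate(initial)}
rounds = rebalance(owners, workers)
print(rounds, Counter(owners.values()).most_common(1))
```

In this toy model the 20 leases end up spread one per worker. The real library needs more iterations because workers only see the lease table as of their last scan.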

While this balancing process is under way, two workers may briefly process the same records. The window lies between the moment one worker steals a lease from another and the moment the previous owner tries to update that lease, either through periodic renewal or through checkpointing, and discovers that another worker has taken it.
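The reason the previous owner only notices late can be sketched with a compare-and-set on a lease counter. KCL keeps its leases in a DynamoDB table; the class below is an in-memory stand-in, and `LeaseTable`, `take` and `renew` are hypothetical names, not the library's API.

```python
class LeaseTable:
    """In-memory stand-in for a lease table with conditional updates."""

    def __init__(self):
        self._leases = {}  # shard_id -> (owner, counter)

    def take(self, shard_id, new_owner):
        """Claim (or steal) a lease; bumping the counter invalidates the old owner."""
        _, counter = self._leases.get(shard_id, (None, 0))
        self._leases[shard_id] = (new_owner, counter + 1)
        return counter + 1

    def renew(self, shard_id, owner, expected_counter):
        """Renew only if nobody touched the lease since we last saw it."""
        if self._leases.get(shard_id) != (owner, expected_counter):
            return None  # lease was lost: stop processing this shard
        self._leases[shard_id] = (owner, expected_counter + 1)
        return expected_counter + 1


table = LeaseTable()
counter_a = table.take("shard-0", "worker-A")
counter_b = table.take("shard-0", "worker-B")  # B steals the lease

# A keeps reading records until its next renewal, which now fails.
lost = table.renew("shard-0", "worker-A", counter_a) is None
print("worker-A lost its lease:", lost)
```

Everything worker-A processed between the steal and its failed renewal may also be processed by worker-B, which is where the duplicates come from.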

After this initial lease division, though, this will not happen anymore. When the workers are restarted, they resume the leases they had previously. But when a worker is down for a long time, other workers will take over its leases.

Kinesis has an at-least-once processing model because of this. It is best to design your application so that operations on the data are idempotent.
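One way to get idempotent handling is to key every write on something that identifies the record, so that a second delivery becomes a no-op. The sketch below assumes that the pair (shard ID, sequence number) identifies a record, and it uses an in-memory dict where a real consumer would need a durable store with conditional writes. `IdempotentSink` is a hypothetical name.

```python
class IdempotentSink:
    """Applies each record at most once, keyed by (shard_id, sequence_number)."""

    def __init__(self):
        self._seen = {}

    def apply(self, shard_id, sequence_number, payload):
        """Return True if the record was applied, False if it was a duplicate."""
        key = (shard_id, sequence_number)
        if key in self._seen:
            return False  # already handled, e.g. by the previous lease owner
        self._seen[key] = payload
        return True

    def __len__(self):
        return len(self._seen)


sink = IdempotentSink()
first = sink.apply("shard-0", "1001", b"order created")
second = sink.apply("shard-0", "1001", b"order created")  # redelivery
print(first, second, len(sink))
```

With this in place it does not matter if two workers briefly overlap on a shard: the second attempt to apply the same record changes nothing.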

Scaling is useful if you want to be fault-tolerant (other workers will take over from a failed worker) or your data processing is so time-consuming that one worker would not be able to cope with 20 shards. Scaling beyond the number of shards is indeed only useful for fault-tolerance purposes.
