We currently have an application that receives a large amount of sensor data. Each sensor has its own unique sensor id (e.g. '5834f7718273f92cc326f620') and emits its status at different intervals. The processing order across sensors is not important: for example, a newer message from one sensor can be processed before an older message from another sensor. What does matter, though, is that the messages for any given sensor are processed sequentially, in the order that they arrived in the stream.

I have taken a look at the Kinesis Client Library and understand that KCL pushes messages to a single processor per shard. Does this mean that a stream with only one shard will have only one processor, and couldn't this create a bottleneck? Or does KCL have more than one processor per shard and somehow, perhaps using the partition key, ensure that messages with the same partition key are never processed concurrently?

Note: We have taken a look at SQS FIFO, but ruled it out because its limit of 300 messages per second would soon become an issue.

1 Answer

Yes, each shard can only have one processor at a given moment (per application).

However, you can use the sensor id as the partition key in your Kinesis PutRecord request (see here).

This ensures that all events from a given sensor go to the same shard, and therefore to the same processor. If you do that, you can scale out your shards and processors and still have each sensor's events handled by a single processor, in the order they arrived in the shard.
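
To make the idea concrete, here is a minimal Python sketch. It assumes you would send records with boto3's `put_record`, which takes `StreamName`, `Data` and `PartitionKey`; the stream name `sensor-events` and the helper names are made up for illustration. The second function only approximates how Kinesis picks a shard: Kinesis MD5-hashes the partition key into a 128-bit integer and matches it against each shard's hash key range, and here I assume the shards split that range evenly, which need not be true of a real stream.

```python
import hashlib
import json


def build_put_record_args(stream_name, sensor_id, reading):
    """Build keyword arguments for a Kinesis PutRecord call.

    Using the sensor id as the partition key means every record from
    that sensor is routed to the same shard.
    """
    return {
        "StreamName": stream_name,
        "Data": json.dumps(reading).encode("utf-8"),
        "PartitionKey": sensor_id,
    }


def shard_index_for_key(partition_key, shard_count):
    """Approximate which shard a partition key maps to.

    Assumption: the shards divide the 128-bit hash key space evenly.
    A real stream may have uneven ranges after splits or merges.
    """
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    hash_value = int.from_bytes(digest, "big")
    range_size = 2**128 // shard_count
    # Clamp so the few values above the last even boundary stay in range.
    return min(hash_value // range_size, shard_count - 1)


if __name__ == "__main__":
    sensor_id = "5834f7718273f92cc326f620"
    args = build_put_record_args("sensor-events", sensor_id, {"status": "ok"})

    # With AWS credentials configured, the record could be sent like this:
    #   import boto3
    #   boto3.client("kinesis").put_record(**args)

    # The same sensor id always lands on the same (simulated) shard.
    print(shard_index_for_key(sensor_id, 4))
    print(shard_index_for_key(sensor_id, 4))
```

Since the shard depends only on the hash of the partition key, adding more shards spreads different sensors over more processors, while any single sensor still maps to exactly one shard at a time.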