I have gone through the AWS documentation on Kinesis. All I have found is how a producer puts data onto a Kinesis stream and how a consumer reads the stream once it is initialized (a kind of FIFO model). Since data sent to the stream stays in the shard for 24 hours, I would like to access a particular value multiple times, but I did not find a suitable mechanism for that. Is there a way to scan a Kinesis stream rather than processing it in FIFO order?
2 Answers
No, unfortunately you can't do that.
If you know the position of your data (i.e. the checkpoint value, which is its sequence number), you can start reading the shard from that place. Otherwise, there is no search mechanism.
If you really need to catch a specific value and process it multiple times, you might want to use an in-memory, database-like cache in your consumer application. Redis, Memcached or maybe VoltDB can be helpful if you have a large volume of data moving at high speed.
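As a rough illustration of the caching idea, here is a minimal in-process sketch in Python. It stands in for Redis or Memcached only to show the pattern; `RecordCache` and `handle_records` are hypothetical names, and the record layout (`SequenceNumber`, `Data`) is assumed to match the `Records` list returned by GetRecords.

```python
from collections import OrderedDict


class RecordCache:
    """Small LRU cache for records the consumer wants to read again later."""

    def __init__(self, max_items=1000):
        self.max_items = max_items
        self._items = OrderedDict()

    def put(self, key, data):
        # Insert or refresh the entry, then evict the least recently used one
        # if the cache has grown past its limit.
        self._items[key] = data
        self._items.move_to_end(key)
        if len(self._items) > self.max_items:
            self._items.popitem(last=False)

    def get(self, key):
        # Return None on a miss; a hit marks the entry as recently used.
        if key not in self._items:
            return None
        self._items.move_to_end(key)
        return self._items[key]


def handle_records(records, cache):
    # Assumption: each record is a dict with "SequenceNumber" and "Data" keys.
    for record in records:
        cache.put(record["SequenceNumber"], record["Data"])
```

The consumer would call `handle_records` on each batch it reads and later call `cache.get(sequence_number)` as many times as needed. A real deployment would more likely key the cache by a business identifier and use an external store so that several consumers can share it.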
When you put a record into Kinesis, the producer gets back a sequence number and a shard ID (see the API for PutRecord here: http://docs.aws.amazon.com/kinesis/latest/APIReference/API_PutRecord.html).
Response Syntax:
{
"SequenceNumber": "string",
"ShardId": "string"
}
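A sketch of how a producer might keep that response around, assuming Python with boto3. The `client` argument is assumed to be `boto3.client("kinesis")`; `put_and_remember`, `positions` and `record_id` are hypothetical names used only for illustration.

```python
def put_and_remember(client, stream_name, data, partition_key, positions, record_id):
    """Put one record and remember where Kinesis stored it."""
    # put_record returns the ShardId and SequenceNumber shown above.
    response = client.put_record(
        StreamName=stream_name,
        Data=data,
        PartitionKey=partition_key,
    )
    # Assumption: `positions` is any dict-like store shared with the consumer.
    positions[record_id] = (response["ShardId"], response["SequenceNumber"])
    return positions[record_id]
```

In practice `positions` would need to be something the consumer can also reach, such as a database table, since the stream itself cannot be searched for the record afterwards.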
You can use this sequence number and shard ID to fetch the record from the Kinesis stream on the consumer's side (see the API for GetShardIterator here: http://docs.aws.amazon.com/kinesis/latest/APIReference/API_GetShardIterator.html).
Request Syntax:
{
"ShardId": "string",
"ShardIteratorType": "string",
"StartingSequenceNumber": "string",
"StreamName": "string"
}
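For illustration, the request above could be issued from a consumer roughly like this, again assuming Python with boto3 and a `client` created with `boto3.client("kinesis")`. `fetch_record` is a hypothetical helper; it uses the `AT_SEQUENCE_NUMBER` iterator type so that reading starts exactly at the stored position.

```python
def fetch_record(client, stream_name, shard_id, sequence_number):
    """Re-read a single record whose shard ID and sequence number are known."""
    # Ask for an iterator positioned exactly at the stored sequence number.
    iterator = client.get_shard_iterator(
        StreamName=stream_name,
        ShardId=shard_id,
        ShardIteratorType="AT_SEQUENCE_NUMBER",
        StartingSequenceNumber=sequence_number,
    )["ShardIterator"]

    # Limit=1 because only the record at that position is wanted.
    response = client.get_records(ShardIterator=iterator, Limit=1)
    records = response["Records"]

    # Only return the record if it is the one that was asked for.
    if records and records[0]["SequenceNumber"] == sequence_number:
        return records[0]
    return None
```

This can be called as often as needed while the record is still within the stream's retention period. The sketch makes a single GetRecords call; a more careful version might follow `NextShardIterator` when a call comes back empty and handle the error cases for records that have already expired.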
Please note that if you are looking for more of a pub-sub model, you should use SNS rather than Kinesis, which is optimized for processing event streams (mainly in FIFO order) in near real time.