I am having problems implementing DynamoDB Streams. We want to receive change records as soon as the DynamoDB table is modified.

We've used the Java example from https://docs.aws.amazon.com/en_en/amazondynamodb/latest/developerguide/Streams.LowLevel.Walkthrough.html and translated it for our C++ project. Instead of ShardIteratorType.TRIM_HORIZON we use ShardIteratorType.LATEST. Also, I am currently testing with an existing table and do not know how many records to expect.

Most of the time, when I iterate over the shards returned by Aws::DynamoDBStreams::DynamoDBStreamsClient for an Aws::DynamoDBStreams::Model::DescribeStreamRequest, I do not see any records. For testing, I change entries in the DynamoDB table through the AWS console. But sometimes (and I do not know why) there are records and it works as expected.

I am sure that I misunderstand the concept of streams, and especially of shards and records. My thinking is that I need a way to find the most recent shard and then the most recent data in that shard.

Isn't this what ShardIteratorType.LATEST would do? How can I find the most recent data in my stream?

I appreciate all of your thoughts and am curious about what happens to my first Stack Overflow post ever.

Best, David

1 Answer

How can I find the most recent data in my stream?

How would you define the most recent data? Last 10 entries? Last entry? Or data that is not yet in the shard? The question may sound silly but the answer makes a difference.

The option you are using, LATEST, positions the iterator right after the last entry in the shard. This means that unless new data arrives after the iterator has been created, there will be nothing to read.

If by the most recent data you mean records that are already in the shard, then you can't use LATEST. The easy option is to use TRIM_HORIZON, which starts at the oldest record still available in the shard.
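To make the difference between the two iterator types concrete, here is a minimal toy model in C++. It is not the AWS SDK: ToyShard, IteratorType, make_iterator and read are hypothetical names invented for this illustration, and a shard is modeled as nothing more than an append-only list of strings. It only sketches where each iterator type starts reading.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Toy stand-in for one stream shard: an append-only list of change records.
// This is NOT the AWS SDK; it only models where an iterator starts reading.
enum class IteratorType { TrimHorizon, Latest };

struct ToyShard {
    std::vector<std::string> records;

    // TrimHorizon starts at the oldest record; Latest starts just past
    // the newest record that exists at the moment the iterator is created.
    std::size_t make_iterator(IteratorType type) const {
        return type == IteratorType::TrimHorizon ? 0 : records.size();
    }

    // Returns every record from `pos` up to the current end and advances
    // `pos`, similar in spirit to continuing with the next iterator after
    // each read instead of creating a fresh one.
    std::vector<std::string> read(std::size_t& pos) const {
        const auto first = records.begin() + static_cast<std::ptrdiff_t>(pos);
        std::vector<std::string> out(first, records.end());
        pos = records.size();
        return out;
    }
};
```

In this model, if the shard already holds two records, an iterator made with Latest reads nothing on its first call, while one made with TrimHorizon reads both. Once a third record is appended, reading again with the same Latest iterator returns exactly that new record. The same idea would explain the behavior in the question: a LATEST iterator only yields something if a change happens after the iterator was created and the same iterator chain is read again.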

Even easier would be to subscribe a Lambda function to that stream. It will be invoked automatically whenever a new record is put into the stream, with the record passed to the function as its payload. This might be preferable if you need to handle events in near-real time.