
Disclaimer: I've never used Kinesis.

Context: we are designing a service that fetches data from a DynamoDB table with ~4.1M items every minute, runs a grouping job based on one of the items' attributes, and writes the result to a data store (accessible from EC2 only, not from Lambda). We aim to complete each job in under 1 minute, ideally 15 seconds. An item in DDB can receive an update to one of its attributes at any time (i.e., the item is overwritten in the DDB table). Only the most recent version of an item must be used in the grouping job.
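Roughly, the job today is a full table scan plus in-memory grouping; here is a minimal sketch (the table name, the `group_attribute` name, and `write_to_data_store()` are hypothetical placeholders, not our real names):

```python
import boto3
from collections import defaultdict

dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table("items-table")  # hypothetical table name


def run_grouping_job():
    """Scan the whole table and group items by one attribute."""
    groups = defaultdict(list)
    scan_kwargs = {}
    while True:
        # Paginated full scan; at ~4.1M items this is the expensive part
        # we would like to finish in under a minute.
        page = table.scan(**scan_kwargs)
        for item in page["Items"]:
            groups[item["group_attribute"]].append(item)  # hypothetical attribute
        if "LastEvaluatedKey" not in page:
            break
        scan_kwargs["ExclusiveStartKey"] = page["LastEvaluatedKey"]
    write_to_data_store(groups)


def write_to_data_store(groups):
    # Stub: the real write goes to the EC2-only data store, not shown here.
    pass
```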

This is a basic architecture diagram:

[architecture diagram]

Supposing that DDB is provisioned with the right RCUs, my questions are:

  1. Is there a good use case for Kinesis here? Is there any major design issue in this solution for our use case?
  2. How do I guarantee that only the last version of the DDB item (which is updated over time) is used for the grouping job?

1 Answer


If you are doing very similar operations each time you fetch these rows, you probably want to use DynamoDB Streams and do aggregation as seen here, whereby you use the stream to process changes as they arrive and fold the results into time intervals, let's say 10 minutes.

Then you can aggregate incrementally and roll up to hour, day, year, etc., keeping an accuracy of ~10 minutes if that is your smallest unit, and the results are live, in the sense that they are available in near real time. You should save significantly on capacity units and on duplicate processing, if there is any.
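A minimal sketch of that bucketing idea, assuming a consumer that receives DynamoDB Stream records in the standard shape (eventName plus dynamodb.NewImage, which requires the NEW_IMAGE or NEW_AND_OLD_IMAGES stream view type); the `group_attribute` name and the in-memory aggregate store are hypothetical:

```python
import time
from collections import defaultdict

BUCKET_SECONDS = 600  # 10-minute buckets, the smallest aggregation unit
buckets = defaultdict(lambda: defaultdict(int))  # bucket_start -> group -> count


def process_stream_records(records):
    """Fold a batch of DynamoDB Stream records into 10-minute aggregates."""
    for record in records:
        if record["eventName"] not in ("INSERT", "MODIFY"):
            continue
        new_image = record["dynamodb"]["NewImage"]
        group = new_image["group_attribute"]["S"]  # hypothetical attribute
        # Bucket by processing time; the record's ApproximateCreationDateTime
        # could be used instead for event-time bucketing.
        bucket_start = int(time.time()) // BUCKET_SECONDS * BUCKET_SECONDS
        buckets[bucket_start][group] += 1
    # Closed buckets can then be rolled up into hourly/daily totals and
    # flushed to the data store from the EC2 consumer.
```

Rolling up to hourly or daily totals is then just a matter of summing the finished 10-minute buckets, which is what keeps the per-run cost small.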

To answer your questions specifically:

  1. DynamoDB Streams exposes a Kinesis-style stream (you can consume it with the Kinesis Client Library via the DynamoDB Streams Kinesis Adapter), so I would say yes.
  2. You have different event types to consume (INSERT, MODIFY, REMOVE), so you subscribe to the events you are interested in. Ordering of changes to a given item is also preserved, so the last record you see for a key is its latest version, as in the sketch below.
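To illustrate point 2, a sketch of how per-item ordering lets a consumer keep only the latest version: overwrite on INSERT/MODIFY, drop on REMOVE. The key attribute name (`pk`) is a hypothetical placeholder, and NewImage again assumes the NEW_IMAGE or NEW_AND_OLD_IMAGES view type.

```python
latest = {}  # primary key -> most recent item image seen on the stream


def apply_stream_record(record):
    """Keep only the newest image per item; last write wins within a key."""
    key = record["dynamodb"]["Keys"]["pk"]["S"]  # hypothetical key attribute
    if record["eventName"] in ("INSERT", "MODIFY"):
        latest[key] = record["dynamodb"]["NewImage"]
    elif record["eventName"] == "REMOVE":
        latest.pop(key, None)
```

The grouping job then reads from `latest` (or whatever materialized view you maintain from it) instead of re-scanning the table every minute.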