I have a Lambda that makes an API request to a service and gets a list of matches for a given identifier. Once it has the list of matches, it updates a DynamoDB table as follows:
requestId (partitionKey) | ID | matches | status |
---|---|---|---|
uuid1 | ID1 | [match1,match2, match3...] | FOUND_MATCHES |
For each match in this table, a message is sent to an SQS queue that a Lambda listens to. For each match, that Lambda calls a different service and updates a second table that keeps track of each match's execution:
Match (partitionKey) | requestId (sortKey + GSI Partition Key) |
---|---|
match1 | uuid1 |
match2 | uuid1 |
... | ... |
match10 | uuid1 |
Now, given the requestId, I would like to know whether the 2nd table has entries for all of the matches.
One option I'm considering is to look up the first table by requestId to get the list of matches, then call the 2nd table once per match using the partition key/sort key combination and compile a result. The other option is, instead of looking up each item by partition key/sort key, to query the GSI whose partition key is requestId and get all the matches at once (as a paginated list).
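To make the second option concrete, here is a rough Python sketch of the GSI approach, assuming boto3 and a `Table` resource. The index name `requestId-index` and all function names are hypothetical; the attribute names `Match` and `requestId` come from the table above. The pagination and comparison helpers do not depend on boto3, so they can be exercised with a fake page fetcher.

```python
from typing import Callable, Iterable, Iterator, Optional, Set


def iter_processed_matches(query_page: Callable[[Optional[dict]], dict]) -> Iterator[str]:
    """Yield match IDs from a paginated Query, following LastEvaluatedKey."""
    start_key = None
    while True:
        page = query_page(start_key)
        for item in page.get("Items", []):
            yield item["Match"]
        start_key = page.get("LastEvaluatedKey")
        if not start_key:
            break  # no more pages


def missing_matches(expected: Iterable[str], processed: Iterable[str]) -> Set[str]:
    """Return the matches from table 1 that have no entry in table 2."""
    return set(expected) - set(processed)


def make_gsi_query(table, request_id: str, index_name: str = "requestId-index"):
    """Build a page fetcher for a boto3 Table resource.

    The index name is an assumption; use whatever the GSI is actually called.
    """
    # Imported lazily so the helpers above can be used without boto3 installed.
    from boto3.dynamodb.conditions import Key

    def query_page(start_key: Optional[dict]) -> dict:
        kwargs = {
            "IndexName": index_name,
            "KeyConditionExpression": Key("requestId").eq(request_id),
            # Only the match ID is needed; alias it in case the name is reserved.
            "ProjectionExpression": "#m",
            "ExpressionAttributeNames": {"#m": "Match"},
        }
        if start_key:
            kwargs["ExclusiveStartKey"] = start_key
        return table.query(**kwargs)

    return query_page
```

With a real table this would be called as `missing_matches(matches_from_table_1, iter_processed_matches(make_gsi_query(table, "uuid1")))`; an empty result would mean every match has an entry in the 2nd table.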
I would like to expose this operation via an API call, and I'm wondering whether I'll run into the 30-second API Gateway (APIG) timeout (I know I'll need to run some experiments with my dataset, but I wanted to see if there are other options I can consider before doing this). If the number of matches exceeds, say, 50,000 and each get call takes roughly 20 ms, the lookups will take about 50,000 * 20 ms = 1,000 seconds, which is far more than the APIG limit. Batch calls might help a bit, but I'm not sure there is much room there.
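The back-of-the-envelope estimate can be written out as a small Python sketch. The 20 ms figure is the rough per-call latency from above. The batch size of 100 is the BatchGetItem per-request item limit. Treating a batch call as costing the same 20 ms as a single get is purely an assumption for illustration, and real batch calls will likely be slower.

```python
import math


def sequential_seconds(items: int, per_call_ms: float, items_per_call: int = 1) -> float:
    """Total wall-clock time if calls are issued one after another."""
    calls = math.ceil(items / items_per_call)
    return calls * per_call_ms / 1000


# One GetItem per match: 50,000 calls at ~20 ms each.
single_gets = sequential_seconds(50_000, 20)

# BatchGetItem with 100 keys per request: 500 calls.
# Assumes (optimistically) the same 20 ms per call.
batched_gets = sequential_seconds(50_000, 20, items_per_call=100)

print(single_gets, batched_gets)
```

Under these assumptions the sequential gets are far over the 30-second limit, while the batched version may fit, which is why measuring real batch latency matters.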
Essentially, I would like to update the status in the first table from FOUND_MATCHES to ALL_MATCHES_PROCESSED.
- Ideally, the status gets updated automatically.
- Alternatively, a Get status API triggers the calculation and updates the status (maybe asynchronously, to get past the 30-second APIG limit).
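Whichever option triggers it, the status change itself could look roughly like the following Python sketch. It only builds the arguments for a boto3 `Table.update_item` call; the function name is hypothetical, and it assumes the first table has `requestId` as its only key attribute. The condition guards against overwriting a status that is no longer FOUND_MATCHES.

```python
from typing import Optional


def build_status_update(request_id: str, all_processed: bool) -> Optional[dict]:
    """Return update_item kwargs, or None if the matches are not all processed yet."""
    if not all_processed:
        return None
    return {
        # Assumes requestId is the only key attribute of the first table.
        "Key": {"requestId": request_id},
        "UpdateExpression": "SET #s = :done",
        # Only flip the status if it is still FOUND_MATCHES.
        "ConditionExpression": "#s = :found",
        # "status" is a DynamoDB reserved word, so it needs an alias.
        "ExpressionAttributeNames": {"#s": "status"},
        "ExpressionAttributeValues": {
            ":done": "ALL_MATCHES_PROCESSED",
            ":found": "FOUND_MATCHES",
        },
    }
```

With a real table this would be applied as `table.update_item(**kwargs)` when the result is not `None`; if the condition fails, boto3 raises a `ClientError` with the code `ConditionalCheckFailedException`.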