I want to migrate my data from DynamoDB to Redshift. I don't want to scan the whole table at once, as this might result in throttling. My table is as below:

acountId (hash key), lastUpdatedTime.

I thought I could create a GSI on lastUpdatedTime and then query for the data between day 1 and day 5, and the next day query for the data between day 6 and day 7. But my understanding is that even with a GSI it will scan the whole table, as I won't have any hash key to provide. I only have a range of timestamps to query.

1 Answer

Creating a GSI is indeed the right approach. However, creating the GSI can be slow and expensive if you project all attributes into it. I would recommend creating the GSI on lastUpdatedTime and projecting only the table's key attributes (the partition key, and the sort key if you have one) using KEYS_ONLY. Then, when you scan the index, you retrieve only the item keys, and you can fetch each full item from the table by its key as you migrate.
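As a rough illustration of that flow, here is a minimal Python sketch. It assumes `client` is a boto3 DynamoDB client (for example `boto3.client("dynamodb")`), that lastUpdatedTime is stored as a number, and that acountId is the table's only key attribute. The function name, the table and index names, the page size, and the pause are all placeholders I made up, not anything from your setup.

```python
import time


def migrate_window(client, table, index, start_ts, end_ts,
                   page_size=100, pause_s=0.5):
    """Yield full items whose lastUpdatedTime lies in [start_ts, end_ts].

    Assumption: `client` behaves like a boto3 DynamoDB client and the
    KEYS_ONLY index returns acountId for every entry.
    """
    scan_args = {
        "TableName": table,
        "IndexName": index,
        "FilterExpression": "lastUpdatedTime BETWEEN :lo AND :hi",
        "ExpressionAttributeValues": {
            ":lo": {"N": str(start_ts)},  # assumes a numeric timestamp
            ":hi": {"N": str(end_ts)},
        },
        # Limit caps the items evaluated per page; staying at or below
        # 100 keeps each page within the BatchGetItem key limit.
        "Limit": page_size,
    }
    while True:
        page = client.scan(**scan_args)
        # Keep only the base table's key attribute for the lookup.
        keys = [{"acountId": item["acountId"]} for item in page.get("Items", [])]
        request = {table: {"Keys": keys}} if keys else {}
        while request:
            resp = client.batch_get_item(RequestItems=request)
            yield from resp.get("Responses", {}).get(table, [])
            # Retry whatever DynamoDB could not process this round.
            request = resp.get("UnprocessedKeys") or {}
        last_key = page.get("LastEvaluatedKey")
        if not last_key:
            break
        scan_args["ExclusiveStartKey"] = last_key
        time.sleep(pause_s)  # crude pacing between pages to limit throttling
```

You would call it as something like `for item in migrate_window(client, "MyTable", "lastUpdatedTime-index", day1, day5): ...` and write each item to your Redshift staging step. Note that the filter is applied after the read, so the scan still walks every index entry. With KEYS_ONLY those entries are small, and the page size plus the pause let you control how fast you consume capacity.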

I recommend reading up on GSIs here: https://docs.aws.amazon.com/fr_fr/amazondynamodb/latest/developerguide/GSI.html