I have a RDS database whose snapshot is taken everyday and is kept in a S3 bucket. I copy the RDS snapshot data from S3 to Amazon Redshift database daily. I can use copy
to copy the tables but instead of copying the whole table, I want to copy only the rows which were added since the last snapshot was taken(Incremental copying).
For example, in RDS, there is a table name "user" which looks like this at 25-05-2021
id | username
1 | john
2 | cathy
When I will run the data loader for first time on 26-05-2021, it will copy these two rows into the Redshift table with the same name.
Now on 26-05-2021, the table in RDS looks like this:
id | username
1 | john
2 | cathy
3 | ola
4 | mike
When I will run the data loader on 27-05-2021, instead of copying all three rows, I want to copy/take only the rows which has been newly added(id = 3 and id = 4
) as I already have the other rows.
What should be the best way of doing this incremental loading?
VACUUM
? – Max Ganz II