We have a set of tables in Redshift whose columns use the IDENTITY property for sequence generation. During the testing phase we need to back up and restore these tables, and this is a repetitive activity for each testing cycle. We followed the processes below to back up and restore, and hit the following issues:
- Traditional way: We created backup tables in a separate backup schema with CREATE TABLE XYZ_BKP AS SELECT * FROM XYZ. But this loses the IDENTITY and other attributes of the table. So if during restore you create the table directly from the backup, the column properties are gone, and you CAN'T ALTER the column afterwards to add the IDENTITY property back.
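To make the failure mode concrete, here is a minimal sketch (table and schema names are hypothetical stand-ins for our real objects):

```sql
-- Hypothetical source table with an IDENTITY column
CREATE TABLE xyz (
    id      BIGINT IDENTITY(1, 1),
    payload VARCHAR(100) NOT NULL
);

-- CTAS copies only the data and base column types; the IDENTITY property
-- (along with defaults and encodings) is NOT carried over to the backup:
CREATE TABLE bkp.xyz_bkp AS SELECT * FROM xyz;

-- And Redshift provides no ALTER COLUMN syntax to add IDENTITY afterwards,
-- so the property cannot be restored on bkp.xyz_bkp or on a table created from it.
```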
- Traditional backup with a different restore method: This time we dropped and recreated the table from its original DDL first, and then tried INSERT INTO ... SELECT from the backup. But the INSERT can't supply values for the IDENTITY columns.
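One workaround we are aware of (a sketch, using the same hypothetical table names as above) is to list only the non-IDENTITY columns in the INSERT, letting Redshift generate fresh sequence values:

```sql
-- Table recreated from its original DDL, so IDENTITY is intact.
-- Insert only the non-IDENTITY columns; Redshift generates new id values:
INSERT INTO xyz (payload)
SELECT payload
FROM bkp.xyz_bkp;
```

The caveat is that the sequence is regenerated, so the restored rows do not keep their original identity values, which is a problem if other tables reference those IDs.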
- UNLOAD and COPY: We also tried Redshift utilities, using UNLOAD to back the table up to S3 and COPY to restore it. It worked in general, but then we faced other issues: a. DATE fields with leading zeros were not extracted properly in the UNLOAD output. For example, the date '0001-01-01' was extracted as '1-01-01', and COPY then fails saying it is not a valid date. COPY throws several other errors during the restore as well, such as missing data for NOT NULL fields or invalid values for INT columns. In other words, UNLOAD and COPY don't work in sync out of the box, and values change in transit.
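For reference, this is roughly the shape of the UNLOAD/COPY pair we would expect to round-trip cleanly (a sketch; the bucket, prefix, and IAM role ARN are placeholders, and the format options on both sides must mirror each other):

```sql
-- Backup: delimited, quoted, with an explicit NULL marker so COPY can round-trip it
UNLOAD ('SELECT * FROM xyz')
TO 's3://my-bucket/backups/xyz_'                         -- placeholder bucket/prefix
IAM_ROLE 'arn:aws:iam::123456789012:role/RedshiftUnload' -- placeholder role
DELIMITER '|' ADDQUOTES ESCAPE NULL AS '\\N' ALLOWOVERWRITE;

-- Restore: EXPLICIT_IDS loads the exported values into the IDENTITY column
-- instead of generating new ones; the other options mirror the UNLOAD above
COPY xyz
FROM 's3://my-bucket/backups/xyz_'
IAM_ROLE 'arn:aws:iam::123456789012:role/RedshiftUnload'
DELIMITER '|' REMOVEQUOTES ESCAPE NULL AS '\\N' EXPLICIT_IDS;
```

EXPLICIT_IDS is the documented COPY option for preserving IDENTITY values on load; it does not, however, address the mangled DATE values we saw in the UNLOAD output.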
- Table restore from snapshot: I haven't tried this, but I understand AWS now supports table-level restore from a snapshot. Again, though, it's a tedious job to set this up individually for 500 tables, and you have to keep and track snapshots for a long time.
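For completeness, the per-table restore would look roughly like this with the AWS CLI (a sketch; the cluster, snapshot, database, and table identifiers are placeholders, and this would have to be scripted 500 times over):

```shell
# Restore one table from a cluster snapshot into a new table name
aws redshift restore-table-from-cluster-snapshot \
    --cluster-identifier my-cluster \
    --snapshot-identifier my-snapshot \
    --source-database-name mydb \
    --source-table-name xyz \
    --target-database-name mydb \
    --new-table-name xyz_restored
```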
It would be very helpful if you could suggest the best possible way to back up and restore in my scenario, OR the best practices organizations follow.