0 votes

I have a Data Pipeline that exports data from a local DB to Redshift via S3 (very similar to the "Incremental copy of RDS MySQL table to Redshift" template). I have defined a primary key and set insertMode to "OVERWRITE_EXISTING" in the pipeline definition; however, I noticed that some rows eventually ended up duplicated. In what cases does this happen, and how do I prevent it?


2 Answers

1 vote

Redshift does not enforce primary keys, so it will not reject duplicate values on its own. What we do is load the incremental data into a temporary staging table, then upsert into the target table with a delete-and-insert merge that checks whether each record already exists, as in the sketch below.
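A minimal sketch of that staging-table merge, assuming a target table named `target` keyed on a column `id`; the table names, key column, S3 prefix, and IAM role are all placeholders:

```sql
BEGIN;

-- Stage the incremental export; the temp table inherits target's columns
CREATE TEMP TABLE stage (LIKE target);

COPY stage
FROM 's3://my-bucket/incremental/'                         -- placeholder export prefix
IAM_ROLE 'arn:aws:iam::123456789012:role/MyRedshiftRole'   -- placeholder role
FORMAT AS CSV;

-- Delete target rows that are about to be replaced
-- (the merge pattern described in Redshift's deep-copy/upsert docs)
DELETE FROM target
USING stage
WHERE target.id = stage.id;

-- Insert the fresh rows
INSERT INTO target
SELECT * FROM stage;

END;
```

Running the delete and insert inside a single transaction keeps readers from ever seeing the table half-merged.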

Thanks!!

0 votes

Just found this post after several years; adding an answer in case it helps someone else:

In addition to the primary key, the OVERWRITE_EXISTING mode also matches rows on the distribution key (distkey). So in my case, an updated value in the distkey column caused Redshift to create a duplicate row even though the primary key was unchanged.
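To illustrate with a hypothetical schema (the table and column names are made up): if the distkey sits on a column whose value can change, the overwrite lookup misses the old row and a duplicate gets inserted.

```sql
-- Hypothetical table: the distribution key is on a mutable column
CREATE TABLE events (
    event_id BIGINT,        -- declared primary key below; Redshift does not enforce it
    region   VARCHAR(16),   -- mutable business attribute
    payload  VARCHAR(256),
    PRIMARY KEY (event_id)
)
DISTKEY (region);

-- If an incoming row has event_id = 42 but its region changed from 'us-east' to
-- 'eu-west', the overwrite matches on (primary key + distkey), finds no existing
-- row, and inserts a new one: event_id 42 now exists twice. Putting the distkey
-- on an immutable column (often the primary key itself) avoids this:
-- CREATE TABLE events (...) DISTKEY (event_id);
```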