I am configuring an AWS pipeline to load a Redshift table with data from a JSON file in S3.

I'm using RedshiftActivity, and everything was fine until I tried to configure the KEEP_EXISTING load method. I really do not want to truncate my table with each load; I want to keep the existing information and add new records.

RedshiftActivity seems to require a PRIMARY KEY defined on the table in order to work (OK). Now it is also requesting that I configure a DISTRIBUTION KEY, but I am interested in EVEN distribution, and it seems that a DISTRIBUTION KEY cannot be used together with the EVEN distribution style.

Can I simulate EVEN distribution using a distribution key?

Thanks.

1 Answer

I don't bother with a primary key when creating tables in Redshift. For the distkey, you ideally want to pick a field whose values are randomly distributed, so that rows spread evenly across the slices.
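One rough way to judge whether a candidate distkey column is "randomly distributed" is to hash a sample of its values into buckets and compare the bucket sizes. The sketch below is only an illustration: `slice_counts` and `skew_ratio` are hypothetical helpers, MD5 is a stand-in (it is not Redshift's actual hash function), and the slice count of 4 is an assumed value.

```python
import hashlib
from collections import Counter


def slice_counts(values, num_slices=4):
    """Count how many values land in each simulated slice.

    MD5 is only a stand-in here; it is NOT the hash Redshift uses.
    """
    counts = Counter()
    for v in values:
        digest = hashlib.md5(str(v).encode("utf-8")).digest()
        counts[int.from_bytes(digest[:8], "big") % num_slices] += 1
    return [counts.get(i, 0) for i in range(num_slices)]


def skew_ratio(values, num_slices=4):
    """Largest slice divided by the ideal (even) slice size; 1.0 is perfectly even."""
    counts = slice_counts(values, num_slices)
    total = sum(counts)
    if total == 0:
        raise ValueError("need at least one value")
    return max(counts) / (total / num_slices)


# A high-cardinality column (e.g. a unique id) spreads out almost evenly,
# while a column with a single repeated value piles every row onto one slice.
print(skew_ratio(range(10000)))
print(skew_ratio(["US"] * 1000))
```

A ratio close to 1.0 suggests the column would behave much like EVEN distribution; a ratio near the number of slices means nearly all rows would land on one slice.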

In your case of incremental insertion, what I normally do is just use SQLActivity to copy the data from S3 to a staging table in Redshift. Then I perform the update/insert/dedup or whatever other steps the business logic requires. Finally, I drop the staging table. Done.
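As a minimal sketch of that staging-table flow, the helper below builds the SQL statements that could go into the SQLActivity script. It assumes the goal from the question (keep existing rows, add only new ones) and a single key column. The function name, table names, S3 path, and IAM role are all hypothetical placeholders, and identifiers are interpolated without any escaping.

```python
def build_incremental_load_sql(target, staging, key, s3_path, iam_role):
    """Return SQL statements for a staging-table load that only adds new rows.

    All arguments are placeholders supplied by the caller; nothing is escaped.
    """
    return [
        # Empty staging table with the same columns as the target.
        f"CREATE TABLE {staging} (LIKE {target});",
        # Load the JSON file from S3 into the staging table.
        f"COPY {staging} FROM '{s3_path}' IAM_ROLE '{iam_role}' "
        f"FORMAT AS JSON 'auto';",
        # Insert only rows whose key is not already present in the target.
        f"INSERT INTO {target} SELECT s.* FROM {staging} s "
        f"LEFT JOIN {target} t ON s.{key} = t.{key} WHERE t.{key} IS NULL;",
        # Clean up the staging table.
        f"DROP TABLE {staging};",
    ]


# Example with made-up names; replace them with your own table, bucket and role.
statements = build_incremental_load_sql(
    target="events",
    staging="events_staging",
    key="id",
    s3_path="s3://my-bucket/events.json",
    iam_role="arn:aws:iam::123456789012:role/MyRedshiftRole",
)
print("\n".join(statements))
```

This version does not deduplicate rows inside the staging table itself; if the incoming file can contain repeated keys, a dedup step would be needed before the insert.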