0
votes

We are starting to use Amazon Redshift for reporting. We can load our full data set into Redshift through S3, and we can apply the daily incremental load manually. We are now automating the whole process so that the scripts run at a scheduled time and the tables are updated with each day's data automatically.

For the incremental load we use the method suggested in the documentation:

http://docs.aws.amazon.com/redshift/latest/dg/merge-create-staging-table.html

This works fine manually, but while automating the process I am not sure how to obtain the primary key for each table, which is what the existing records are matched on when they are updated. In short, how can I obtain the primary key field from Redshift? Is there something like an "index", or some other term, that can be used to look up the primary key or even the distkey? Thanks in advance.

1 Answer

1
votes

I'm still working on the details of a query that extracts this information more easily, but in the meantime you can use this query

-- One row per column of the table; the pg_index columns repeat on every row.
SELECT a.attname AS "column_name",
       format_type(a.atttypid, a.atttypmod) AS "column_type",
       format_encoding(a.attencodingtype::integer) AS "encoding",
       a.attisdistkey AS "distkey",
       a.attsortkeyord AS "sortkey",
       a.attnotnull AS "notnull",
       a.attnum,
       i.*
FROM pg_namespace n
JOIN pg_class c ON n.oid = c.relnamespace
JOIN pg_attribute a ON c.oid = a.attrelid AND a.attnum > 0 AND NOT a.attisdropped
-- LEFT JOIN keeps the column rows even when the table has no primary key.
LEFT JOIN pg_index i ON c.oid = i.indrelid AND i.indisprimary = 'true'
WHERE c.relname = 'mytablename'
  AND n.nspname = 'myschemaname'
ORDER BY a.attnum

to find most of the interesting details about a table. In the output, pg_index.indkey is a space-delimited list of the primary key columns (there can be more than one, since the key may be compound). Each entry is a column number that ties back to pg_attribute.attnum.
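
For the automation part, the mapping from indkey back to column names could be done in the script that runs the load. The following is only a rough sketch in Python, not a tested Redshift client. It assumes the rows have already been fetched and reduced to (attnum, column_name, indkey) tuples, and that the driver returns indkey as a space-delimited string (some drivers may return it in another form). The names `primary_key_columns` and `merge_join_condition` are made up for this illustration.

```python
from typing import Iterable, List, Optional, Tuple

# Each row mirrors three columns of the query output: (attnum, column_name, indkey).
# Assumption: indkey arrives as a space-delimited string such as "1 3", and is
# None when the table has no primary key (the LEFT JOIN found no pg_index row).
Row = Tuple[int, str, Optional[str]]


def primary_key_columns(rows: Iterable[Row]) -> List[str]:
    """Return the primary key column names in key order, or [] if there is none."""
    rows = list(rows)
    if not rows or rows[0][2] is None:
        return []
    name_by_attnum = {attnum: name for attnum, name, _ in rows}
    # indkey repeats on every row, so reading it from the first row is enough.
    return [name_by_attnum[int(number)] for number in rows[0][2].split()]


def merge_join_condition(target: str, staging: str, key_columns: List[str]) -> str:
    """Build the predicate that matches staging rows to target rows on the key."""
    return " AND ".join(
        f"{target}.{column} = {staging}.{column}" for column in key_columns
    )


if __name__ == "__main__":
    sample = [(1, "id", "1 3"), (2, "name", "1 3"), (3, "region", "1 3")]
    keys = primary_key_columns(sample)
    print(keys)
    print(merge_join_condition("sales", "sales_stage", keys))
```

The predicate is meant to be dropped into the delete or update step of the staging-table merge from the linked documentation. The table names `sales` and `sales_stage` are placeholders.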