Identity Column not respected on Insert into() (Amazon Redshift)

Question

I initially ran into this when I was selecting from one table with an identity, primary key and sort key into another table with its own set of identity, primary, sort. Instead of respecting the (1,1) identity as it was defined, it doing (1,8) (sometimes 3,8). I think it might be because the original table was sorted? In trying to figure out what was going on, I made a much simpler query and data and found a reproducible example across multiple redshift clusters. Take this test example:

drop table if exists test;
create temp table test (id int identity(1,1) not null
                    , value varchar(16)
                    , primary key (id))
                    diststyle all
                    sortkey (id);
insert into test (value) select 'a';
insert into test (value) select 'b';
insert into test (value) select 'c' union select 'd';
insert into test (value) values ('e'), ('f'), ('g');

select * from test;

The output I get is:

id  value
1   a
2   b
9   c
10  d
3   e
4   f
5   g

You'll notice the identity column is not incrementing correctly. I had friends on other clusters try this, they got 20, 27 and 65, 60 for the c and d columns, while the other columns are in order. Please note that the output is still "sorted" correctly, by the sortkey/order of input, despite that the id column isn't physically in order.

The only similarity I can think of between the weird original results I got when first finding this and the test query is that unions are sorted and my table had a sortkey on it.

Other thoughts as to why this is happening and how to fix it are welcome.

Paladin Paladin · Accepted Answer · 2016-01-30T18:39:52

Redshift identity columns are not guaranteed to be incremental as defined by the identity skip value. But, it is guaranteed that the values will never collide (i.e. it will always be unique).

The skip in value comes because of the distributed architecture of Redshift. Each node reserves some values on the number line (n mod x where x is the number of nodes in the cluster). So, if all the nodes are not getting equal amount of rows, you will see skips in the identity values.

Identity Column not respected on Insert into() (Amazon Redshift)

1 Answers