Redshift allows designating multiple columns as SORTKEY
columns, but most of the best-practices documentation is written as if there were only a single SORTKEY.
If I create a table with SORTKEY (COL1, COL2)
, does that mean that all columns are stored sorted by COL1, then COL2? Or maybe, since it is a columnar store, each column gets stored in a different order? I.e. COL1 in COL1 order, COL2 in COL2 order, and the other columns unordered?
My situation is that I have a table with (among others) a type_id and a timestamp column. Data arrives roughly in timestamp order. Most queries are joined against / restricted by both type_id and timestamp. Usually the type_id clauses are more specific, meaning a much larger percentage of rows can be excluded by looking at the type_id clause than by looking at the timestamp clause. type_id is the DISTKEY for this reason. I'm trying to understand the pros and cons of SORTKEY (type_id)
, SORTKEY (stamp)
, SORTKEY (type_id,stamp)
, SORTKEY (stamp,type_id)
.
Thanks.