Compound Sort Key vs. Sort Key

Question

I have another question about Redshift sort keys. We're planning to set the sort key on the columns most frequently used in WHERE clauses.

So far, the best combination for our system seems to be: DISTSTYLE EVEN + COMPOUND SORTKEY + compressed columns (except for the first sort key column).

I'm wondering which is better, a simple SORTKEY or a COMPOUND SORTKEY, for our BI tables, which can receive widely varying queries depending on the users' analysis.

For example, we set the compound sort key based on how often each column appears in the WHERE clauses of several queries, as follows.

COMPOUND SORTKEY
(
PURCHASE_DATE,  -- set as first sort key since it's a date column
STORE_ID,
CUSTOMER_ID,
PRODUCT_ID
)

But sometimes actual queries filter only on PRODUCT_ID, without the other listed sort key columns, or they use the columns in a different order from the compound key.

In that case, is the COMPOUND SORTKEY useless, and would a simple SORTKEY be more effective?

I'd be grateful to hear your ideas and experience.

John Rotenstein · Accepted Answer · 2018-10-17T11:32:05

The simple rules for Amazon Redshift are:

Use DISTKEY on the column that is most frequently used with JOIN
Use SORTKEY on the column(s) most frequently used with WHERE

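You are correct that the compound sort key above only helps when PURCHASE_DATE is included in the WHERE clause. A compound key orders rows by its first column first, so a filter on PRODUCT_ID alone gets little benefit from it.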
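To see why the leading column matters, here is a small Python sketch of the idea. It is a toy model, not Redshift itself: it assumes rows are stored in sorted blocks and that each block keeps a min/max range per column (a zone map) that lets a scan skip blocks. The block size, the synthetic data, and the helper names `build_blocks` and `blocks_scanned` are all made up for illustration.

```python
from itertools import product

BLOCK_SIZE = 100  # rows per block; hypothetical, chosen to keep the numbers round

SORT_COLS = ["purchase_date", "store_id", "customer_id", "product_id"]


def build_blocks(rows, sort_cols, block_size=BLOCK_SIZE):
    """Sort rows by a compound key and record per-block (min, max) for each column."""
    ordered = sorted(rows, key=lambda r: tuple(r[c] for c in sort_cols))
    blocks = []
    for i in range(0, len(ordered), block_size):
        chunk = ordered[i:i + block_size]
        blocks.append(
            {c: (min(r[c] for r in chunk), max(r[c] for r in chunk)) for c in sort_cols}
        )
    return blocks


def blocks_scanned(blocks, column, value):
    """Count blocks whose (min, max) range for `column` could contain `value`."""
    return sum(1 for zone in blocks if zone[column][0] <= value <= zone[column][1])


# Synthetic table: 10 dates x 10 stores x 10 customers x 10 products = 10,000 rows.
rows = [dict(zip(SORT_COLS, vals)) for vals in product(range(10), repeat=4)]

blocks = build_blocks(rows, SORT_COLS)
total = len(blocks)
by_date = blocks_scanned(blocks, "purchase_date", 3)
by_product = blocks_scanned(blocks, "product_id", 3)

print(f"total blocks:             {total}")
print(f"WHERE purchase_date = 3:  {by_date} blocks scanned")
print(f"WHERE product_id = 3:     {by_product} blocks scanned")
```

With this synthetic data, the filter on the leading column touches 10 of 100 blocks, while the filter on `product_id` alone touches all 100, because every block contains the full range of product values.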

An alternative is to use Interleaved Sort Keys, which give equal weighting to many columns and can be used where different fields are often used in the WHERE. However, Interleaved Sort Keys are much slower to VACUUM and are rarely worth using.
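The "equal weighting" can be illustrated with another toy Python model. It uses a Z-order (Morton) bit interleaving as a stand-in for an interleaved sort; whether Redshift's internals match this is an assumption here, and the block size, data, and function names are hypothetical. It only models block skipping, not the VACUUM cost mentioned above.

```python
from itertools import product

BITS = 3           # each column holds values 0..7
BLOCK_SIZE = 256   # rows per block; hypothetical
COLS = ["purchase_date", "store_id", "customer_id", "product_id"]


def compound_key(row):
    """Compound ordering: compare the first column, then the second, and so on."""
    return tuple(row[c] for c in COLS)


def interleaved_key(row):
    """Z-order stand-in: take one bit from each column in turn, high bits first."""
    key = 0
    for bit in range(BITS - 1, -1, -1):
        for c in COLS:
            key = (key << 1) | ((row[c] >> bit) & 1)
    return key


def zone_maps(rows, key_fn):
    """Sort rows with key_fn and record per-block (min, max) for each column."""
    ordered = sorted(rows, key=key_fn)
    maps = []
    for i in range(0, len(ordered), BLOCK_SIZE):
        chunk = ordered[i:i + BLOCK_SIZE]
        maps.append({c: (min(r[c] for r in chunk), max(r[c] for r in chunk)) for c in COLS})
    return maps


def blocks_scanned(maps, column, value):
    """Count blocks whose (min, max) range for `column` could contain `value`."""
    return sum(1 for zone in maps if zone[column][0] <= value <= zone[column][1])


# Synthetic table: every combination of 8 values in 4 columns = 4,096 rows.
rows = [dict(zip(COLS, vals)) for vals in product(range(2 ** BITS), repeat=len(COLS))]

compound = zone_maps(rows, compound_key)
interleaved = zone_maps(rows, interleaved_key)

for name, maps in (("compound", compound), ("interleaved", interleaved)):
    for col in ("purchase_date", "product_id"):
        print(f"{name:12s} WHERE {col} = 3: {blocks_scanned(maps, col, 3)} of {len(maps)} blocks")
```

In this model the compound layout scans 2 of 16 blocks for the date filter but all 16 for the product filter, while the interleaved layout scans 8 of 16 for either one: no column is favored, but none gets the sharp pruning that the leading column of a compound key does.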

So, aim to choose a SORTKEY that benefits most of your queries, but don't worry too much about the other queries unless you are having particular performance problems.

See: Redshift Sort Keys - Choosing Best Sort Style | Hevo Blog
