Which option is best for selecting column compression encoding [ COPY VS ANALYZE COMPRESSION ]

Question

Scenario : I have to change existing table's column encoding

a) If I execute ANALYZE COMPRESSION table_name ---this approach is suggested to use ZSTD compression for all columns including SORT-KEY column.

b) I have created new table using existing table’s DDL and used copy command in order to get column compression encoding (Copy select column compression encoding when load data into an empty table) ---COPY command suggested LZO for all columns including SORT-KEY column.

Question :

Which approach is correct or optimised ? SORT-KEY column compression is bad so will ZSTD for SORT-KEY column improve performance ?

Fine-tuning should be done based on your workload. Did you go through docs.aws.amazon.com/redshift/latest/dg/… to get the idea of how to measure performance before and after your adjustments? — Nikita Sokolov
I always find that analyze compression recommends zstd, but it over estimates the benefits! (in my experience little or no difference from lzo) — Jon Scott

Tony Gibbs Tony Gibbs · Accepted Answer · 2018-08-30T01:13:27

ANALYZE COMPRESSION only looks at the effectiveness of the compression based on storage and does not consider other factors.

In many cases the first column of the SORT KEY compresses well and is typically filtered on (predicate in the where clause). If for some reason you never filtered on the column (maybe a merge join) it would be okay to compress the SORT KEY.

The reason we recommend decompressing the first column of the SORT KEY is because when you filter with a range restricted scan on a column that is highly compressed compared to the other columns you are scanning it can result in a slight decrease in performance.

Which option is best for selecting column compression encoding [ COPY VS ANALYZE COMPRESSION ]

2 Answers