Redshift - Using sort key is doubling the table size?

Question

I am trying to optimize some tables and found something strange: a table that never had a sort key defined has a table size (in 1 MB blocks) of 720. When I define a sort key, or a compound sort key, the table size exactly doubles to 1440. I have tried this in conjunction with different distribution styles, but that doesn't change the table size. I also ran ANALYZE COMPRESSION, but there was nothing to improve.

So what should I do in this case? I need the tables to have a sort key since they are used in a lot of joins, but the doubling of the table size is not ideal. I tried googling for this particular issue, but there seems to be nothing on this problem.

Edit: I did not try to implement this on the existing table. Instead, I created a new table like the current one, then inserted all the data. Just to be sure, I also updated statistics and ran a vacuum on the new table.

What happens if you create a new table with the sort key schema and attempt copying the contents into that table? — Chris Williams
Perhaps this is relevant? stackoverflow.com/questions/51623529/redshift-table-size — Chris Williams
But it doesn't help at all. I tried all the different sort keys, but it still doubles my table size. — NewGirl

Joe Harris · Accepted Answer · 2020-08-18T19:39:26

Your table size is being determined by the minimum table size. See this support document for more information: Why does a table in my Amazon Redshift cluster consume more disk storage space than expected?

For tables created using the KEY or EVEN distribution style: Minimum table size = block_size (1 MB) * (number_of_user_columns + 3 system columns) * number_of_populated_slices * number_of_table_segments.

The minimum table size is increased when you have a sort key because Redshift allocates extra space for new unsorted data to be added to the table.

If an Amazon Redshift table has a sort key, the table has two segments—one sorted segment and one unsorted segment. If an Amazon Redshift table has no sort key, all data is unsorted, and therefore the table has one unsorted segment.

When data is added to an existing table with a sort key, the new data is maintained in a separate segment that contains unsorted data—the data is not inserted into the original sorted key segment until a VACUUM operation is performed.
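To make the arithmetic concrete, here is a minimal Python sketch of the formula quoted above for KEY or EVEN distribution. The function name, the 42 user columns, and the 16 populated slices are assumptions chosen only so that the result matches the 720 and 1440 figures in the question; the question does not state the real column or slice counts.

```python
BLOCK_SIZE_MB = 1      # Redshift block size, per the quoted formula
SYSTEM_COLUMNS = 3     # system columns added to every table, per the quoted formula


def minimum_table_size_mb(user_columns: int, populated_slices: int,
                          has_sort_key: bool) -> int:
    """Minimum size in MB for a KEY or EVEN distributed table.

    Hypothetical helper: it only restates the quoted formula.
    """
    # A table with a sort key has two segments (sorted + unsorted);
    # a table without one has a single unsorted segment.
    segments = 2 if has_sort_key else 1
    return (BLOCK_SIZE_MB
            * (user_columns + SYSTEM_COLUMNS)
            * populated_slices
            * segments)


if __name__ == "__main__":
    # Assumed values for illustration only: 42 user columns, 16 populated slices.
    print(minimum_table_size_mb(42, 16, has_sort_key=False))  # 720
    print(minimum_table_size_mb(42, 16, has_sort_key=True))   # 1440
```

Under these assumed values the only term that changes is the segment count, which is why the size doubles exactly and why changing between KEY and EVEN distribution or re-running compression analysis makes no difference. Once the table holds enough data to exceed the minimum, its size is driven by the data instead.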
