1
votes

I would like to copy a table in Redshift, such that I can preserve the existing column encodings while also adding DIST and SORT keys. Is there a simple repeatable way to do this?

Previously I have copied the table DDL (which doesn't have encodings), ANALYZED the table, manually typed in the column encodings, manually added DIST and SORT keys, created the new table, inserted the data, dropped the old, then altered the new. This is quite time consuming and prone to error.

Is there a better way?

3

3 Answers

3
votes

Redshift automatically applies compression to new empty tables. Note that it typically defaults to LZO in this situation.

You can specify the DISTKEY and SORTKEY in a CREATE TABLE … AS query. So the simplest way to do what you want is:

CREATE TABLE new_table 
DISTKEY(new_dist_col) 
SORTKEY(new_sortkey_1, new_sortkey_2)
AS 
SELECT * FROM old_table;

EDIT: To keep existing encodings, use the v_generate_tbl_ddl view from our Utils library to get DDL of existing table with encoding. Then modify it to create the new target and INSERT INTO from the old table.

SELECT ddl 
FROM admin.v_generate_tbl_ddl 
WHERE tablename = 'old_table'
;
                   ddl
-------------------------------------------------
 --DROP TABLE "my_schema"."old_table";
 CREATE TABLE IF NOT EXISTS "my_schema"."old_table"
 (
         "dist_col" BIGINT   ENCODE zstd
         ,"sortkey_2" TIMESTAMP
         ,"sortkey_1" NUMERIC(28,4)   ENCODE zstd
         ,"column_d" VARCHAR(256)   ENCODE zstd
 )
 DISTSTYLE KEY
 DISTKEY ("dist_col")
 SORTKEY (
         "sortkey_2"
         "sortkey_1"
         )
 ;
0
votes

You can parse the built-in metadata tables "information_schema.tables" and "information_schema.columns" (and others) in order to programmatically determine the table definition. It is possible then to completely automate the creation of a new table clone, and insert all rows from one table into another.

0
votes

Amazon (AWS Labs) maintains an amazon-redshift-utils repository containing a ColumnEncodingUtility, which supports in-place Updates to Distribution and Sort Keys (in addition to its main function of optimizing Column-Encoding compression for existing tables).

This utility simply performs the migration process outlined in your question (create new table based on original DDL, update column-encoding and DIST/SORT keys, migrate data using INSERT...SELECT..., then alter-rename the new table to replace the old). However, the utility fully automates the process and it's officially maintained by Amazon, so it's less prone to error than performing these steps manually every time.