7
votes

Is there a way to change the column ordering in Amazon Redshift (or Postgres, since it is based on it)? Or maybe add a column at a specific position?

In MySQL you can do:

ALTER TABLE MY_TABLE
ADD COLUMN {NEW_COL} AFTER {EXISTING_COL}

But this does not work in Redshift. Any ideas?

4
Purists would argue that you should never need to, because columns should be referred to by name, not order, but in practice SQL does treat columns as ordered, so it sort of makes sense to want to choose that order. The ALTER TABLE docs don't mention any way to do it, but that doesn't prove there isn't a workaround of some sort, particularly in Redshift, which uses a "column-oriented" storage model. - IMSoP
When importing data with a COPY command, you need the column order in the table to match the column order of the CSV file. - Martin Taleski
Probably a better link would be to the Redshift docs, since it diverged from Postgres some time ago. However, now you mention COPY, I think we may have an X/Y Problem here... - IMSoP
Can I ask - why do you need to change the order of the columns? The way I look at it, redshift column order should not matter. - Jon Scott

4 Answers

9
votes

From your comments, it seems that what you actually need is to be able to COPY into a table from a file which has columns in a particular order.

According to the Redshift documentation for the COPY command:

(column1 [, column2, ...])

Specifies an optional column list to load data fields into specific columns. The columns can be in any order in the COPY statement, but when loading from flat files, such as in an Amazon S3 bucket, their order must match the order of the source data. [...] If no column list is specified, the command behaves as if a complete, in-order column list was specified.

So rather than re-ordering the columns in your table, you just need to specify them in the COPY statement, as in some of the examples in the docs:

copy venue_new(venueid, venuename, venuecity, venuestate) 
from 's3://mybucket/data/venue_noseats.txt' 
credentials 'aws_access_key_id=<access-key-id>;aws_secret_access_key=<secret-access-key>'
delimiter '|';
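As a hedged illustration of the same idea (hypothetical table and file names): suppose your table was created as `users(id, name, email)`, but the flat file's fields arrive in the order email, id, name. Listing the columns in *file* order lets COPY map each field to the right column without touching the table definition:

```sql
-- Hypothetical example: table is users(id, name, email), but the file's
-- fields are ordered email, id, name. The column list in COPY must match
-- the file order, not the table order.
copy users (email, id, name)
from 's3://mybucket/data/users.csv'
iam_role 'arn:aws:iam::123456789012:role/MyRedshiftRole'
delimiter ',';
```

(The `iam_role` clause is the more modern alternative to the inline `credentials` string shown above; the bucket, file, and role ARN here are placeholders.)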
6
votes

The answer is no: Redshift does not (easily) support column reordering, which is strange, since tables are stored as individual columns. There is no way to do this without unloading/loading or copying the table.

Unload/load is said to be the preferred method, as it takes advantage of any parallelism you have configured for your table.

Thus, the standard methodology is necessary: unload the data, re-create the table with the new column order, and load the data back.

There may be a "secret way" to do this with only the one column in question (dump the column, drop it, re-add it in the new position, reload it), but that sounds incredibly sketchy and should be avoided.
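The unload/reload cycle can sometimes be shortened to an in-cluster "deep copy". A minimal sketch, using a hypothetical table `my_table(a, b, c)` that should become `(b, a, c)`:

```sql
-- Deep-copy sketch (hypothetical table and column names).
-- Note: CREATE TABLE AS does not carry over DISTKEY, SORTKEY, or column
-- encodings; use an explicit CREATE TABLE if you need to preserve them.
CREATE TABLE my_table_new AS
SELECT b, a, c
FROM my_table;

DROP TABLE my_table;
ALTER TABLE my_table_new RENAME TO my_table;
```

This avoids the round trip through S3, at the cost of briefly holding two copies of the data on the cluster.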

1
votes

Redshift doesn't support column reordering at all. I had to solve the same problem in my case, and this is how I did it.

The best option is the following: unload the data to S3, then drop and re-create the table with the new column order, and reload.

1) Unload to S3

unload ('select Column1, column2, Column3, ..., Column_n from original_table')
to 's3://<your_bucket>/<your_file>'
CREDENTIALS 'aws_access_key_id=<your_key>;aws_secret_access_key=<your_secret>'
MANIFEST DELIMITER '|' GZIP NULL AS 'null_string' ESCAPE ALLOWOVERWRITE;

2) Drop and re-create

CREATE TABLE duplicate_table (Column1, column2, Column3, ..., Column_n);
-- make sure the columns are listed in the new sequence you want

3) Reload

copy duplicate_table (Column1, column2, Column3, ..., Column_n)
from 's3://<your_bucket>/<your_file>manifest'
CREDENTIALS 'aws_access_key_id=<your_key>;aws_secret_access_key=<your_secret>'
MANIFEST DELIMITER '|' GZIP NULL AS 'null_string' ESCAPE ALLOWOVERWRITE;
1
votes

You can simply create a new table in Redshift with the required ordering

CREATE TABLE temp_table_name (column1 dtype1, column2 dtype2, column3 dtype3 ...);

and insert data from the source table in the required order.

INSERT INTO temp_table_name SELECT column1, column2, column3 ... FROM table_name;

Then drop the original table

DROP TABLE table_name;

and rename the temp table to the original table

ALTER TABLE temp_table_name RENAME TO table_name;