What I'm really trying to accomplish is to change the data types of a couple of columns; in particular, a couple of columns that are stored as strings need to become dates. I was trying to use a CREATE TABLE / SELECT statement for this, with similar results: I/O errors or broken connections.
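For context, the in-place conversion I was attempting looked roughly like this CREATE TABLE / SELECT sketch (the new table name, the string column, and the date format are placeholders for illustration):

```sql
-- Sketch: rebuild the table with a string column cast to a date.
-- 'actions_new' and 'action_date_str' are hypothetical names.
CREATE TABLE actions_new AS
SELECT weekday, day, month, year, guid,
       TO_DATE(action_date_str, 'YYYY-MM-DD') AS action_date
FROM actions;
```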
I figured it might be more efficient to export and re-import the data than to abuse the cluster by reading and writing so much data in a circle.
I've tried both Aginity Workbench and SQL Workbench/J with large timeouts (10 hours) configured. SQL Workbench/J managed to run for 2 hours and 12 minutes today before failing with the same error I keep seeing over and over:
```
An I/O error occured while sending to the backend.
```
This is a fair chunk of data: 2,028,448,405 rows at present (I say "at present" because we're adding about 70 million rows per day). But I would expect Redshift to handle this easily.
```sql
UNLOAD ('select weekday, day, month, year, guid,
    ..., colN from actions a where a.colN in (\'my\',\'penguin\',\'lolz\')')
TO 's3://penguin-lolz-bucket/all.csv'
CREDENTIALS 'aws_access_key_id=$id;aws_secret_access_key=$key'
ALLOWOVERWRITE;
```
Googling the error message has just led me to a ton of posts about timeout configurations or general connection issues.
UPDATE 1:
So I made several changes, and so far the query has been running for 3 hours without error:
- Removed the file name from the TO field
- Created a new bucket just for this process
- Added the GZIP and PARALLEL options to the query
This was based on the idea that I might be exceeding the capacity of the bucket and/or the max object size on a per-file basis. The revised statement looks roughly like the sketch below.
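For reference, here's approximately what the revised UNLOAD looks like now (the new bucket name and file prefix are placeholders; the column list is abbreviated as before):

```sql
UNLOAD ('select weekday, day, month, year, guid,
    ..., colN from actions a where a.colN in (\'my\',\'penguin\',\'lolz\')')
TO 's3://penguin-lolz-unload/actions_'  -- prefix only, no file name
CREDENTIALS 'aws_access_key_id=$id;aws_secret_access_key=$key'
GZIP            -- compress each output file
ALLOWOVERWRITE
PARALLEL ON;    -- write multiple files in parallel, one per slice
```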
UPDATE 2: The UNLOAD performs as expected now (after the changes I made in UPDATE 1).
PRO TIP: When doing a large unload like this, keep an eye on the status of your query in the AWS Redshift console (you'll find it in one of the tabs when you drill into your cluster details). I was using SQL Workbench/J for this; after about 5 hours it still showed the query as running, but the AWS Redshift console showed it was complete, and it was.
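If you'd rather check progress from a second SQL session instead of the console, the STV_RECENTS system table lists in-flight queries. A minimal sketch (I relied on the console myself):

```sql
-- Show queries currently running on the cluster.
SELECT pid, user_name, starttime, duration, query
FROM stv_recents
WHERE status = 'Running';
```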