Redshift: Truncate VARCHAR value automatically on INSERT or maybe use max length?

Question

When performing an INSERT, Redshift does not allow you to insert a string value that is longer/wider than the target field in the table. Observe:

CREATE TEMPORARY TABLE test (col VARCHAR(5));
-- result: 'Table test created'

INSERT INTO test VALUES('abcdefghijkl');
-- result: '[Amazon](500310) Invalid operation: value too long for type character varying(5);'

One workaround for this is to cast the value:

INSERT INTO test VALUES('abcdefghijkl'::VARCHAR(5));
-- result: 'INSERT INTO test successful, 1 row affected'

The annoying part about this is that now all of my code will have to have these cast statements on every INSERT for each VARCHAR field like this, or the application code will have to truncate the string before trying to construct the query; either way, it means that the column's width specification has to go into the application code, which is annoying.

Is there any better way of doing this with Redshift? It would be great if there was some option to just have the server truncate the string and perform (and maybe raise a warning) the way it does with MySQL.

One thing I could do is just declare these particular fields as a very large VARCHAR, perhaps even 65535 (the maximum).

create table analytics.testShort (a varchar(3));
create table analytics.testLong (a varchar(4096));
create table analytics.testSuperLong (a varchar(65535));

insert into analytics.testShort values('abc'); 
insert into analytics.testLong values('abc');
insert into analytics.testSuperLong values('abc');

-- Redshift reports the size for each table is the same, 4 mb

The one disadvantage of this approach I have found is that it will cause bad performance if this column is used in a group by/join/etc:

https://discourse.looker.com/t/troubleshooting-redshift-performance-extensive-guide/326 (search for VARCHAR)

I am wondering though if there is no harm otherwise if you plan to never use this field in group by, join, and the like.

Some things to note in my scenario: Yes, I really don't care about the extra characters that may be lost with truncation, and no, I don't have a way to enforce the length of the source text. I am capturing messages and URLs from external sources which generally fall into certain range in length of characters, but sometimes there are longer ones. It doesn't matter in our application if they get truncated or not in storage.

Ramy Nasr Ramy Nasr · Accepted Answer · 2016-08-26T14:36:57

The only way to automatically truncate the strings to match the column width is using the COPY command with the option TRUNCATECOLUMNS

Truncates data in columns to the appropriate number of characters so that it fits the column specification. Applies only to columns with a VARCHAR or CHAR data type, and rows 4 MB or less in size.

Otherwise, you will have to take care of the length of your strings using one of these two methods:

Explicitly CAST your values to the VARCHAR you want:

INSERT INTO test VALUES(CAST('abcdefghijkl' AS VARCHAR(5)));
Use the LEFT and RIGHT string functions to truncate your strings:

INSERT INTO test VALUES(LEFT('abcdefghijkl', 5));

Note: CAST should be your first option because it handles multi-byte characters properly. LEFT will truncate based on the number of characters not bytes and if you have a multi-byte character in your string, you might end up exceeding the limit of your column.

Redshift: Truncate VARCHAR value automatically on INSERT or maybe use max length?

1 Answers