When performing an INSERT, Redshift does not allow you to insert a string value that is longer/wider than the target field in the table. Observe:
CREATE TEMPORARY TABLE test (col VARCHAR(5));
-- result: 'Table test created'
INSERT INTO test VALUES('abcdefghijkl');
-- result: '[Amazon](500310) Invalid operation: value too long for type character varying(5);'
One workaround for this is to cast the value:
INSERT INTO test VALUES('abcdefghijkl'::VARCHAR(5));
-- result: 'INSERT INTO test successful, 1 row affected'
The annoying part about this is that now all of my code will have to have these cast statements on every INSERT for each VARCHAR field like this, or the application code will have to truncate the string before trying to construct the query; either way, it means that the column's width specification has to go into the application code, which is annoying.
Is there any better way of doing this with Redshift? It would be great if there was some option to just have the server truncate the string and perform (and maybe raise a warning) the way it does with MySQL.
One thing I could do is just declare these particular fields as a very large VARCHAR, perhaps even 65535 (the maximum).
create table analytics.testShort (a varchar(3));
create table analytics.testLong (a varchar(4096));
create table analytics.testSuperLong (a varchar(65535));
insert into analytics.testShort values('abc');
insert into analytics.testLong values('abc');
insert into analytics.testSuperLong values('abc');
-- Redshift reports the size for each table is the same, 4 mb
The one disadvantage of this approach I have found is that it will cause bad performance if this column is used in a group by/join/etc:
https://discourse.looker.com/t/troubleshooting-redshift-performance-extensive-guide/326 (search for VARCHAR)
I am wondering though if there is no harm otherwise if you plan to never use this field in group by, join, and the like.
Some things to note in my scenario: Yes, I really don't care about the extra characters that may be lost with truncation, and no, I don't have a way to enforce the length of the source text. I am capturing messages and URLs from external sources which generally fall into certain range in length of characters, but sometimes there are longer ones. It doesn't matter in our application if they get truncated or not in storage.