1 vote

I am moving a table from SQL Server to Redshift. I've exported the data and gotten it into a UTF-8 text file. When trying to load to Redshift, the COPY command fails, complaining the data exceeds the width of the field.

The destination Redshift table schema matches that of the source SQL Server table (i.e. the varchar field widths are the same). If I understand correctly, Redshift's varchar size is measured in bytes, whereas SQL Server's is in characters. So multi-byte characters are causing the "too wide" problem.

I'd like to run a query to determine how big to make my varchar fields, but there doesn't seem to be a function that returns the number of bytes a string requires, only the number of characters in that string.

How have others solved this problem?
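To make the question concrete, here is a rough sketch of the kind of per-field byte measurement I'm after. It assumes SQL Server 2019 or later (it relies on casting to a *_UTF8 collation so DATALENGTH returns UTF-8 bytes), and the table and column names are placeholders:

    -- Sketch only: measures the exact UTF-8 byte footprint of each value.
    -- Requires SQL Server 2019+ (UTF-8 collations); names are placeholders.
    SELECT
        MAX(DATALENGTH(CAST(comment_text AS varchar(max))
                COLLATE Latin1_General_100_CI_AI_SC_UTF8)) AS comment_text_max_utf8_bytes,
        MAX(LEN(comment_text))                             AS comment_text_max_chars
    FROM dbo.source_table;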

Is the SQL Server data type varchar or nvarchar? If it's nvarchar then you'll need to double the max length from SQL Server when moving to Redshift, i.e. nvarchar(50) becomes varchar(100). – Brian Driscoll
The VARCHAR data type supports UTF-8 multi-byte characters up to a maximum of four bytes. Five-byte or longer characters are not supported. To calculate the size of a VARCHAR column that contains multi-byte characters, multiply the number of characters by the number of bytes per character. For example, if a string has four Chinese characters, and each character is three bytes long, then you will need a VARCHAR(12) column to store the string. – Sandesh Deshmane
Brian, in both databases the fields are defined as varchar (not nvarchar). – Todd
Sandesh - that is exactly the problem - finding the number of bytes per character across 6.5 billion rows, each with 5 potentially problematic fields. – Todd
Todd, Sandesh has the correct answer. If the field in SQL Server is a varchar(3) then you just multiply the field length by 4 and use a varchar(12) in Redshift (although we've found x3 works for all our migrated data). If you don't want to use that default configuration, then you'll have to run a max(length(<yourfield>)) operation in SQL Server and multiply the result by 4, though that's not terribly safe against future source data that might be longer (see the sketch after these comments). – mike_pdb
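A minimal sketch of the worst-case sizing mike_pdb describes, using SQL Server's LEN (the table and column names are placeholders); multiplying the longest character count by 4 covers the UTF-8 worst case:

    -- Worst-case UTF-8 sizing: longest value in characters x 4 bytes per character.
    SELECT
        MAX(LEN(comment_text)) * 4 AS comment_text_suggested_varchar,
        MAX(LEN(city_name))    * 4 AS city_name_suggested_varchar
    FROM dbo.source_table;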

1 Answer

0 votes

Field lengths, and as a consequence field types, can be critical in Redshift. Load sample data into a Redshift table defined with maximum field sizes; the sample should be as large as possible. Then you will be able to calculate the real field sizes, regardless of the definitions in the SQL Server source, which may be much larger than you actually need.
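A rough sketch of that verification step, assuming the sample was loaded into a staging table with generous column widths (table and column names are placeholders). Redshift's OCTET_LENGTH returns the byte count, while LEN returns the character count:

    -- Compare byte vs. character footprint of the sampled data in Redshift.
    SELECT
        MAX(OCTET_LENGTH(comment_text)) AS comment_text_max_bytes,
        MAX(LEN(comment_text))          AS comment_text_max_chars
    FROM staging.sample_table;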