UTF-8 error when bulk loading data into Snowflake that was exported from SQL Server using BCP

Question

I have a CSV file I'm creating by exporting a table in SQL Server 2016 SP2 using the bulk copy utility (bcp.exe). I'm setting the code page to 65001 (which Microsoft's documentation states is UTF-8). However, when I stage the file in Snowflake and then try to use the COPY command to move it into a table, I get an error that says, "Invalid UTF8 detected in string '0xFF0xFE00x0000x0010x0010x0060x0000x0000x0000x0000x0010x00n0x0040x00M0x00M0x00c0x00A0x00A0x00M0x00'."

If I use the IGNORE_UTF8_ERRORS flag, I get data in my table that is unintelligible. Any suggestions about how to fix the issue would be gratefully received.

Here's my BCP call:

BCP "SELECT Id, Name FROM database_name.owner_name.table_name WHERE Id = '0011602001r4ddgAZA'" queryout C:\temp\test.csv "-t|" -w -T -S. -C 65001

Here's the code in Snowflake:

--Create a file format
create or replace file format SFCI_Account
  type = 'CSV'
  field_delimiter = '|'
  validate_utf8 = True
 ;

-- Create a Stage object
create or replace stage SFCI_Account_stage
file_format = SFCI_Account;

-- Check my file is there  
list @SFCI_Account_stage;

-- Copy the file into the table
copy into Test
from @SFCI_Account_stage
file_format = (format_name = SFCI_Account)
pattern='.*.csv.gz'
on_error = 'skip_file';

ASturt ASturt · Accepted Answer · 2019-10-09T22:33:29

Apparently, all I needed to do was change the -w to -c in my BCP call and add the following:

-r "\r\n"

So, my final BCP call looks like this:

BCP "SELECT Id, Name FROM database_name.owner_name.table_name WHERE Id = '0011602001r4ddgAZA'" queryout C:\temp\test.csv "-t|" -c -T -S. -C 65001 -r "\r\n"

Now, that fixed the issue of the the UTF-8 error, but now I have to figure out how to deal with carriage returns in the data.

UTF-8 error when bulk loading data into Snowflake that was exported from SQL Server using BCP

1 Answers