
I have an SSIS data flow task that reads from a CSV file and stores the results in a table.

I am simply loading the CSV file by rows (not even separating the columns) and dumping each entire row into the database, so it is a very simple process.

The file contains UTF-8 characters, and I have verified that it already has a UTF-8 BOM.

Now when I load the file using a flat file connection, I have the following settings currently:

  1. Unicode checked
  2. Advanced editor shows the column as "Unicode text stream DT_NTEXT".

When I run the package, I get this error:

[Flat File Source [16]] Error: The data type for "Flat File Source.Outputs[Flat File Source Output].Columns[DataRow]" is DT_NTEXT, which is not supported with ANSI files. Use DT_TEXT instead and convert the data to DT_NTEXT using the data conversion component.

[Flat File Source [16]] Error: Unable to retrieve column information from the flat file connection manager.

It is telling me to use DT_TEXT, but my file is UTF-8 and it will lose its encoding, right? Makes no sense to me.

I have also tried with the Unicode checkbox unchecked and the code page set to "65001 UTF-8", but I still get an error like the above.

Why does it say my file is an ANSI file?

I have opened my file in Sublime Text and saved it as UTF-8 with BOM. The preview of the flat file does show other languages correctly, such as Chinese and English combined.

When I left Unicode unchecked, I would also get an error saying the flat file's error output column is DT_TEXT, and when I tried to change it to Unicode text stream, it gave me a popup error and would not allow the change.


1 Answer


I have faced this same issue for years, and to me it seems like it could be a bug in the Flat File Connection provider in SQL Server Integration Services (SSIS). I don't have a direct answer to your question, but I do have a workaround: before I load data, I convert all UTF-8 encoded text files to UTF-16LE (little endian). It's a hassle, and the files take up about twice as much space uncompressed, but when it comes to loading Unicode into MS SQL, UTF-16LE just works!

As for the actual conversion step, that is for you to decide based on what works best in your workflow. When I have just a few files, I convert them one by one in a text editor; when I have a lot of files, I use PowerShell. For example,

Powershell -c "Get-Content -Encoding UTF8 'C:\Source.csv' | Set-Content -Encoding Unicode 'C:\UTF16\Source.csv'"
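
If you need to convert a whole folder of files at once, you can loop over Get-ChildItem instead of converting one file at a time. This is just a rough sketch; the source and destination folders ('C:\Source' and 'C:\UTF16') are placeholders for your own paths:

Get-ChildItem -Path 'C:\Source' -Filter '*.csv' | ForEach-Object {
    # Read each file as UTF-8 and rewrite it as UTF-16LE ("Unicode" in PowerShell terms)
    Get-Content -Path $_.FullName -Encoding UTF8 |
        Set-Content -Path (Join-Path 'C:\UTF16' $_.Name) -Encoding Unicode
}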