3
votes

What is the purpose of adding a text qualifier to an SSIS flat text file output?

I'm pulling data out of a SQL database that has quotes, commas, pipes, and many other common delimiters in the data.

Extreme example of a data point in a column:

"Johnson"|Smith,Jones

I set up the export as a comma delimited, with a double quote " text qualifier. I assumed it would export the data like so, and it did:

,""Johnson"|Smith,Jones",

Now I'm testing re-importing the data as comma delimited with a double quote text qualifier. I got errors saying SSIS couldn't find the delimiter. I thought it would recognize the combination of comma and double quote essentially as a more complex delimiter.

If adding a text qualifier to the output doesn't help with the problem of having the characters in the actual data, what does it do?

The person receiving the data might use a tool like Excel to process it, and Excel doesn't seem to be able to handle a complex multi-character delimiter like |". Is the best way to handle this to remove the most common delimiter from my data and use that as the delimiter? Probably pipe in my case, instead of comma.


1 Answer

4
votes

A text qualifier is used when a column delimiter can appear inside a cell's data. Typically, the text qualifier is a double quote. If a cell contains a delimiter and no text qualifier is used, the data after that delimiter spills into the next column. From there, every following column in the row is shifted and nothing lines up. It can be a real mess.
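To make the spill concrete, here is a minimal sketch using Python's standard csv module as a stand-in for a flat-file reader. It is not SSIS, and the sample row is made up for illustration; it only shows what the qualifier buys you when a value contains the column delimiter.

```python
import csv
import io

# Hypothetical three-column row; the middle value contains the column delimiter.
row = ["1", "Smith,Jones", "NY"]

# No text qualifier: the embedded comma looks exactly like a column delimiter,
# so a naive reader sees one column too many.
unqualified = ",".join(row)
naive_fields = unqualified.split(",")
print(len(naive_fields), naive_fields)

# Double-quote text qualifier around every value: the reader treats anything
# between the qualifiers as data, including the comma.
buf = io.StringIO()
csv.writer(buf, delimiter=",", quotechar='"', quoting=csv.QUOTE_ALL).writerow(row)
qualified = buf.getvalue().rstrip("\r\n")
parsed = next(csv.reader(io.StringIO(qualified), delimiter=",", quotechar='"'))
print(qualified)
print(len(parsed), parsed)
```

The unqualified line parses into four fields, while the qualified line parses back into the original three.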

Additionally, you will not see the text qualifier in applications, like Excel. However, if you open the file in Notepad++, then you will see the text qualifiers. There can be a lot of data (e.g., text qualifiers, new line characters, column delimiters, etc.) that is contained within a file but is not displayed in certain applications. This data typically is used to define the structure of the data as opposed to being the actual data.

For your problem, you will need to remove the double quotes from the source data or use a different text qualifier. You could use a single quote, but what if you have data like Jones's? The idea here is that the text qualifier should be unique in defining the data structure, which, as I understand it, means that you cannot have a text qualifier that is actually a part of the data (see note from Microsoft below - emphasis mine).
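If you go the route of picking a qualifier (or delimiter) that never occurs in the data, it helps to check the data first rather than guess. A rough sketch, assuming the column values have already been pulled into Python strings; the function name unused_characters and the candidate list are made up for this example:

```python
def unused_characters(values, candidates=('"', "'", "|", "~", "^")):
    """Return the candidate characters that appear in none of the values."""
    used = set()
    for value in values:
        # Record every candidate that occurs somewhere in this value.
        used.update(ch for ch in candidates if ch in value)
    # Preserve the order of the candidate list in the result.
    return [ch for ch in candidates if ch not in used]


# Sample values based on the ones mentioned in this thread.
sample = ['"Johnson"|Smith,Jones', "Jones's", "plain text"]
safe = unused_characters(sample)
print(safe)
```

For this sample, the double quote, single quote, and pipe are all ruled out, leaving only the tilde and caret as safe choices.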

Per Microsoft:

Specify a text qualifier character. Each column can be configured to recognize a text qualifier.

The use of a qualifier character to embed a qualifier character into a qualified string is supported by the Flat File Connection Manager. The double instance of a text qualifier is interpreted as a literal, single instance of that string. For example, if the text qualifier is a single quote and the input data is 'abc', 'def', 'g''hi', the output data is abc, def, g'hi. However, an instance of a qualifier embedded in a qualified string causes the Flat File Source to fail with the error DTS_E_PRIMEOUTPUTFAILED.
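The doubled-qualifier convention described in that note can be illustrated with Python's standard csv module, which follows the same convention when doublequote=True. This is only a sketch of the convention itself, not of SSIS behavior, and the surrounding column values are invented for the example:

```python
import csv
import io

# The problem value from the question: contains the qualifier, a pipe, and a comma.
value = '"Johnson"|Smith,Jones'

# Write with a double-quote qualifier; embedded qualifiers are doubled on output.
buf = io.StringIO()
writer = csv.writer(buf, delimiter=",", quotechar='"',
                    doublequote=True, quoting=csv.QUOTE_ALL)
writer.writerow(["1", value, "x"])
line = buf.getvalue().rstrip("\r\n")
print(line)

# Read it back; each doubled qualifier collapses to a single literal quote.
fields = next(csv.reader(io.StringIO(line), delimiter=",", quotechar='"',
                         doublequote=True))
print(fields)
```

The written line is "1","""Johnson""|Smith,Jones","x", and it reads back as the original three values. Compare that with the export in the question, where the embedded quotes were not doubled, which is the case the note says makes the Flat File Source fail.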

