I'm trying to create a stored procedure in SQL Server 2016 that converts XML that was previously converted into Varbinary back into XML, but getting an "Illegal XML character" error when converting. I've found a workaround that seems to work, but I can't actually figure out why it works, which makes me uncomfortable.
The stored procedure takes data that was converted to binary in SSIS and inserted into a varbinary(MAX) column in a table and performs a simple
CAST(Column AS XML)
It worked fine for a long time, and I only began seeing an issue when the initial XML started containing an ® (registered trademark) symbol.
Now, when I attempt to convert the binary to XML I get this error
Msg 9420, Level 16, State 1, Line 23
XML parsing: line 1, character 7, illegal xml character
However, if I first convert the binary to varchar(MAX), then convert that to XML, it seems to work fine. I don't understand what is happening when I perform that intermediate CAST that is different than casting directly to XML. My main concern is that I don't want to add it in to account for this scenario and end up with unintended consequences.
Test code:
DECLARE @foo VARBINARY(MAX)
DECLARE @bar VARCHAR(MAX)
DECLARE @Nbar NVARCHAR(MAX)
--SELECT Varbinary
SET @foo = CAST( '<Test>®</Test>' AS VARBINARY(MAX))
SELECT @foo AsBinary
--select as binary as varchar
SET @bar = CAST(@foo AS VARCHAR(MAX))
SELECT @bar BinaryAsVarchar -- Correct string output
--select binary as nvarchar
SET @nbar = CAST(@foo AS NVARCHAR(MAX))
SELECT @nbar BinaryAsNvarchar -- Chinese characters
--select binary as XML
SELECT TRY_CAST(@foo AS XML) BinaryAsXML -- ILLEGAL XML character
-- SELECT CONVERT(xml, @obfoo) BinaryAsXML --ILLEGAL XML Character
--select BinaryAsVarcharAsXML
SELECT TRY_CAST(@bar AS XML) BinaryAsVarcharAsXML -- Correct Output
--select BinaryAsNVarcharAsXML
SELECT TRY_CAST(@nbar AS XML) BinaryAsNvarcharAsXML -- Chinese Characters
varcharmeans ASCII, or at least, single-byte-encoded text. ® is outside the 0-127 range that isn't affected by codepages. Try with nvarchar andCAST( N'<Test>®</Test>' as varbinary(max)).nvarcharmeans UTF16 ie two bytes, which is why the conversion from varchar to varbinary to nvarchar failed. - Panagiotis Kanavosnvarcharfields, parameters and string literals. - Panagiotis Kanavosvarchar, run into conversion issues and instead of using the correct column collations or switching tonvarchar, just covered up the problem by usingvarbinary. That didn't fix anything, just moved the encoding problems to the reader. That worked as long as there were no non-Latin characters (which wouldn't have caused problems with varchar either). When the first non-Latin character was added, boom, the reader failed - Panagiotis KanavosCAST(CAST( '<Test>®</Test>' AS VARBINARY(MAX)) AS NVARCHAR(MAX))Uh, don't do that. The fundamental rule of character encodings is to read with the encoding that was used for writing. - Tom Blodget