0
votes

I have the following xml:

<?xml version="1.0" encoding="utf-8"?>
<products>
        <product>
              <ProductDescription><![CDATA[SOMETHIN''WHATEVER]]></ProductDescription>
        </product>


        <product>
              <ProductDescription><![CDATA[TYPICAL_TEXT ΜΑΣΚΑΡΑ]]></ProductDescription>
            </product>
</products>

which I want to make into columns. By searching around, I arrived at the following solution:

declare @input XML='copy-paste-my-xml'


SELECT 
Item.value('(ProductDescription)[1]', 'nvarchar(max)') as ProductDescription
from
@input.nodes('//product') AS T(Item)

Running the query produces this error:

Msg 9420, Level 16, State 1, Line 1 XML parsing: line 9, character 57, illegal xml character

The place indicated by the error is where the first Unicode character starts: ΜΑΣΚΑΡΑ. I deduce this because removing the unicode part makes the query run succesfully. But I've read that xml supports unicode. I also tried using N' but I got:

Msg 9402, Level 16, State 1, Line 1 XML parsing: line 1, character 38, unable to switch the encoding

What should I do to get the unicode value?

1
XML does support Unicode. It is possible however that your xml parsing needs to be configured to match the same unicode serialisation, which should be UTF-8.Dragonthoughts
Your XML is in an ASCII string: 'xml here'. Have you tried putting in a Unicode string: N'xml here'?Richard
Your tag and comment @Dragonthoughts helped me solve this. I had to add N' AND change encoding to uts-16George Menoutis
@GeorgeMenoutis You are welcome, I do my best to help. May I suggest you actually complete answer, so we can help people in the future who have a similar problem?Dragonthoughts
I thought I'd wait for you to make one so I could accept...if not, tell me and I will make one myselfGeorge Menoutis

1 Answers

1
votes

The problem here is two-folded:

  1. You are using an xml declaration, stating, that the XML is encoded in utf-8. Within SQL-Server this is obsolet. There is no utf-8 support on this level. So the XML is a liar.
  2. You are using a literal like 'Some XML here', which is implicitly of type CHAR / VARCHAR. This single-byte-encoded string is extended ASCII with very limited support for foreign characters (uses a character map through collation). You should use the literal with a leading N to make it NCHAR/NVARCHAR with a wide range of unicode support (to be exact: UCS-2).

In my following example I use your XML without the declaration. It is best to omit this, as it is senseless within SQL-Server. But alternatively you can change utf-8 to utf-16 or ucs-2:

declare @input XML=
N'<products>
        <product>
              <ProductDescription><![CDATA[SOMETHIN''WHATEVER]]></ProductDescription>
        </product>


        <product>
              <ProductDescription><![CDATA[TYPICAL_TEXT ΜΑΣΚΑΡΑ]]></ProductDescription>
            </product>
</products>';


SELECT 
Item.value('(ProductDescription)[1]', 'nvarchar(max)') as ProductDescription
from
@input.nodes('//product') AS T(Item)