2
votes

I am greeted with an illegal xml character error when parsing a table record into xml.

SELECT 
    mb.ProductTitle,mb.ProductDescription,
    CAST((
        SELECT
            Id,                             
            ProductDescription,
        FROM ProductsManagement AS mpm
        WHERE mpm.MattressId = 6
        FOR XML PATH('ProductItemListModel'), 
        ROOT('MattressBarndProductItemList'))as XML)
FROM Brands AS mb
WHERE mb.Id = 6
FOR XML PATH(''), ROOT('ProductModel')

or

SELECT CONVERT(XML,'lift')

The record for the description is as follows:

Ease™ by Sealy adjustable base is the simple way to turn your bed into the perfect place to relax. The wireless remote controls the head and leg lift, for virtually unlimited range of ergonomic positions."

The above is not parsing into xml.

1
You're trying to aggregate strings with this code. In SQL Server 2017 you can use STRING_AGG. In earlier versions this technique is used to do the same thing. Illegal XML characters are a known problem of this technique.Panagiotis Kanavos
All SO questions about string aggregation and string splitting use techniques from Aaron Bertrand's articles, directly or indirectly. Using a SQLCLR function is the fastest technique and doesn't have any issues with invalid characters.Panagiotis Kanavos
@PanagiotisKanavos, perhaps OP wants to generate a XML export :)Alexander Volok

1 Answers

5
votes

This is because there is a list of known illegal characters in XML standard(s). Mostly those characters are not even visible, for instance a "terminal bell", or CHAR(7). Such character and other from the list will cause that error you now encounter.

There are few workarounds available, but all of them is about removing illegal chars.

Following example is based on a scalar function approach, therefore warning: it can perform slow on large amounts of data:

CREATE FUNCTION [dbo].RemoveInvalidXMLCharacters (@InputString VARCHAR(MAX))
RETURNS VARCHAR(MAX)
AS
BEGIN
    IF @InputString IS NOT NULL
    BEGIN
      DECLARE @Counter INT, @TestString NVARCHAR(40)

      SET @TestString = '%[' + NCHAR(0) + NCHAR(1) + NCHAR(2) + NCHAR(3) + NCHAR(4) + NCHAR(5) + NCHAR(6) + NCHAR(7) + NCHAR(8) + NCHAR(11) + NCHAR(12) + NCHAR(14) + NCHAR(15) + NCHAR(16) + NCHAR(17) + NCHAR(18) + NCHAR(19) + NCHAR(20) + NCHAR(21) + NCHAR(22) + NCHAR(23) + NCHAR(24) + NCHAR(25) + NCHAR(26) + NCHAR(27) + NCHAR(28) + NCHAR(29) + NCHAR(30) + NCHAR(31) + ']%'

      SELECT @Counter = PATINDEX (@TestString, @InputString COLLATE Latin1_General_BIN)

      WHILE @Counter <> 0
      BEGIN
        SELECT @InputString = STUFF(@InputString, @Counter, 1, ' ')
        SELECT @Counter = PATINDEX (@TestString, @InputString COLLATE Latin1_General_BIN)
      END
    END
    RETURN(@InputString)
END

So, adjusted query will be similar to:

SELECT 
    [dbo].RemoveInvalidXMLCharacter(smb.ProductTitle) as ProductTitle
,   [dbo].RemoveInvalidXMLCharacter(mb.ProductDescription) as ProductDescription
,    CAST((
        SELECT
            Id,                             
            [dbo].RemoveInvalidXMLCharacter(ProductDescription) ProductDescription
        FROM ProductsManagement AS mpm
        WHERE mpm.MattressId = 6
        FOR XML PATH('ProductItemListModel'), ROOT('MattressBarndProductItemList'))as XML)
FROM Brands AS mb
WHERE mb.Id = 6
FOR XML PATH(''), ROOT('ProductModel')

Another method is an conversation to VARBINARY and back described also in this linked topic: TSQL "Illegal XML Character" When Converting Varbinary to XML