1
votes

I have a table with nvarchar(max) column that may contain xml of known structure. I'd like to parse it to perform some aggregations so need to filter out "bad" entries. Here is the test case:

create table TestTable ([Message] nvarchar(max))
insert into TestTable ([Message]) values 
    ('<root m="1"/>'),
    ('<root m="7"/>'),
    ('<rooo')
go

set quoted_identifier on
go   

create view TestView as
select data.value('(/root/@m)[1]', 'int') as MyValue
from (
    select cast([Message] as xml) as data
    from (
        select [Message] from dbo.TestTable where [Message] like '<root%>'
    ) as T1
) as T2
where data.exist('/root') = 1
go

select * from TestView

This produces:

Msg 9400, Level 16, State 1, Line 1 XML parsing: line 1, character 5, unexpected end of input

I don't understand why because if i run the nested query:

    select cast([Message] as xml) as data
    from (
        select [Message] from dbo.TestTable where [Message] like '<root%>'
    ) as T1

it perfectly returns 2 valid rows. Why??

p.s. Microsoft SQL Server 2008 (SP3) - 10.0.5500.0 (X64) Sep 21 2011 22:45:45 Copyright (c) 1988-2008 Microsoft Corporation Express Edition (64-bit) on Windows NT 6.1 (Build 7601: Service Pack 1)

1
SQL Server can re-order expressions so there is no guarantee the where [Message] like '<root%>' happens first. You would need to put the potentially problematic expressions in a CASE statementMartin Smith
How is that possible if this breaks select semantics? Can my query be somehow rewritten to achieve what i need?UserControl
RE: Breaking SELECT semantics this is discussed here. Use select case when [Message] like '<root%>' then cast([Message] as xml) end as dataMartin Smith
Thanks, Martin! Can you please repost your comment as the answer and i'll mark it.UserControl

1 Answers

1
votes

Your issue is not in the select clause but the where clause:

where data.exist('/root') = 1

I suspect you are thinking: "Aha! if the xml is not formatted properly then this will return 0 or NULL". Nope, this function -- as with the other xml functions -- requires valid xml. Or it gets an error.

You might be interested in SQL Server: inline conditional convert with XML? and In SQL Server, what is the best way to determine if a given string is a valid XML or not?.

There does not appear to be a simple way to do what you want. However, you might be able to do a simple check on the original string to see if it is reasonable xml. For instance, the following checks if there are an equal number of "<" and ">":

select (case when len(replace(val, '<', '')) = len(replace(val, '>', ''))
             then 'MAYBE OKAY'
             else 'NOPE'
        end)
from (select '<badness' as val) t