2
votes

Using SQL Server 2008 R2. I'm trying to convert some very poor XML data that is stored as one big string in a column. Right now, I'm extracting a value from it using string parsing routines but I'd like to use XPath and XQuery and ultimately want to create a stored procedure that just requires the InstrumentID as the required input with the start and end dates as optional parameters.

Here is the way I'm doing it now using T-SQL:

Use MyDb
DECLARE @First VARCHAR(15), @Second VARCHAR(15),
        @DateLow datetime, @DateHigh datetime, 
        @InstrumentID Varchar(15)

SET @First = '<InstrumentID>'
SET @Second = '</InstrumentID>'
SET @DateLow = '2003-04-01'  
SET @DateHigh = '2004-01-13' 
SET @InstrumentID = NULL
SET @InstrumentID = 'SB2093780001'

select * 
from
    (SELECT 
         out_interface_id, msg_id, 
         SUBSTRING(xml_msg, CHARINDEX(@First, xml_msg) + LEN(@First), 
         CHARINDEX(@Second, xml_msg) - 
         CHARINDEX(@First, xml_msg) - 
         LEN(@First)) as InstrumentID,
         msg_type, xml_msg, CAST(xml_msg AS XML) as [Quick_XML], 
         date_received,status, last_modified, environment,
         transaction_closed_date
     FROM
         MyTable
     WHERE
         msg_type IN ('ABC','DEF') 
         AND date_received >= @DateLow
         AND date_received < DATEADD(DAY, 1, @DateHigh)
    ) x
where 
    (x.InstrumentID = @InstrumentID or x.InstrumentID = NULL)
order by 
    date_received desc

And here is what I have tried so far using XPath and XQuery

Use MyDb
declare @x xml;

select
    out_interface_id, msg_id,
    CAST(xml_msg AS XML) as [Quick_XML],
    @x.query('//InstrumentID') AS InstrumentID,
    msg_type, xml_msg,
    date_received, status, last_modified, environment,
    transaction_closed_date
from 
    MyTable
where
    msg_type in ('ABC','DEF') 
order by 
    date_received desc

I know @x.query won't work because it doesn't know that I want to extract the field from the xml_msg column but everything I've tried so far fails. I think I'm getting close but I've been reading everything I can find on XPath and XQuery and I can't find anything with this specific scenario. The xml_msg column is untyped, horrible xml that is just one long string. Maybe I will have to clean it up before I can query it? Some of the extracts from that field are hundreds of lines long after I cast it to XML. As always any help will be much appreciated.

XML Test Data

The xml_msg column is varchar(max)

<Proponix>
    <Header>
        <DestinationID>ABC</DestinationID>
        <SenderID>PRO</SenderID>
        <OperationOrganizationID>ABT1</OperationOrganizationID>
        <MessageType>ACCTV21</MessageType>
        <DateSent>20160701</DateSent>
        <TimeSent>1934</TimeSent>
        <MessageID>1091697493</MessageID>
    </Header>
        <Subheader>
            <OriginalInstrumentID>SB1499780001</OriginalInstrumentID>
                <SplitActivities>
                    <InstrumentID>SB1499780001</InstrumentID>
                </SplitActivities>
        </Subheader>
</Proponix>

You will see in the test xml that there is OriginalInstrumentID and InstrumentID. I'm only concerned with InstrumentID. OriginalInstrumentID is only there for certain types of Messages as defined in the msg_type field.

Test

select out_interface_id, msg_id,
CAST(xml_msg as XML).value('//InstrumentID[1]','nvarchar(MAX)') AS InstrumentID
from MyTable
order by date_received desc

When I try this code, I get the following error:

Msg 2389, Level 16, State 1, Line 3 XQuery [value()]: 'value()' requires a singleton (or empty sequence), found operand of type 'xdt:untypedAtomic *'

I've tried lots of different things but nothing is working.

Final Working query

It was just a matter of parenthesis.

select out_interface_id, msg_id,
    CAST(xml_msg as XML).value(
    '(//InstrumentID)[1]','nvarchar(MAX)')  AS InstrumentID
    ,msg_type, xml_msg, CAST(xml_msg AS XML) as [Quick_XML] 
    ,date_received,status, last_modified, environment
    ,transaction_closed_date
from MyTable
    where msg_type in ('ABC','DEF') 
    and date_received >= CAST(GETDATE() -1 as DATE)
    and date_received < CAST(GETDATE() as DATE)
order by date_received desc
1
Please show your (reduced) XML. Is it valid XML? What is the datatype of that column? - Shnugo

1 Answers

2
votes

Some things to think about:

  • VARCHAR(MAX) is a bad place to store XML. It is not directly casteable to XML and it is not fit for special characters. Best use real XML or - if there is any good reason to keep this as string - use NVARCHAR(MAX). SQL Server uses UTF16 internally and you will waste a lot on casting VARCHAR to XML...
  • With XML one should be as specific as possible. Using an XPath like (//InstrumentID)[1] is easy to type, but needs a deep search and could return unexpected results, if there are InstrumentIDs in other places too.

Very simple and straight:

DECLARE @tbl TABLE(YourXMLColumn VARCHAR(MAX), OtherVAlue VARCHAR(MAX));
INSERT INTO @tbl VALUES
(
'<Proponix>
  <Header>
    <DestinationID>ABC</DestinationID>
    <SenderID>PRO</SenderID>
    <OperationOrganizationID>ABT1</OperationOrganizationID>
    <MessageType>ACCTV21</MessageType>
    <DateSent>20160701</DateSent>
    <TimeSent>1934</TimeSent>
    <MessageID>1091697493</MessageID>
  </Header>
  <Subheader>
    <OriginalInstrumentID>SB1499780001</OriginalInstrumentID>
    <SplitActivities>
      <InstrumentID>SB1499780001</InstrumentID>
    </SplitActivities>
  </Subheader>
</Proponix>','Some other values'
);
SELECT tbl.OtherVAlue
      ,CAST(tbl.YourXMLColumn AS XML).value('(//InstrumentID)[1]','nvarchar(max)') AS InstrumentID
FROM @tbl AS tbl 

Any other value can be taken from your XML just as easy as above.

I'd advise you to use the most specific path like

SELECT tbl.OtherVAlue
      ,CAST(tbl.YourXMLColumn AS XML).value('(/Proponix/Subheader/SplitActivities/InstrumentID)[1]','nvarchar(max)') AS InstrumentID
FROM @tbl AS tbl