3
votes

I am relatively new to XPath and XQuery. I am working with XML stored in a table called tblXML. XML is stored in this table by caseID. For this example, this value is unique. The XML itself is stored in a column labeled Data.

If you were to select * from tblXML where caseID = '12345' you would see this:

caseID        data
------        ------
12345         <root><patient><diagnosis preferred_name="diagnosis" tier="1">Melanoma</diagnosis> (XML file stored as blob in database)

If you were to dive into the XML, it would look something like this. For example, this is the XML in the data column for caseID '12345':

<root>
    <patient>
        <diagnosis preferred_name="diagnosis" tier="1">Melanoma</diagnosis>
        <month_of_birth preferred_name="month_of_birth" tier="1">02</month_of_birth>
        <day_of_birth preferred_name="day_of_birth" tier="2">01</day_of_birth>
        <year_of_birth preferred_name="year_of_birth" tier="1">1960</year_of_birth>
        <new_tumor_events>
            <month_of_nte preferred_name="month_of_nte" tier="1">12</month_of_nte>
            <day_of_nte preferred_name="day_of_nte" tier="2">30</day_of_nte>
            <year_of_nte preferred_name="year_of_nte" tier="1">1994</year_of_nte>
        </new_tumor_events>
        <follow_ups>
            <follow_up>
                <month_of_fu preferred_name="month_of_fu" tier="1">12</month_of_fu>
                <day_of_fu preferred_name="day_of_fu" tier="2">31</day_of_fu>
                <year_of_fu preferred_name="year_of_fu" tier="1">1995</year_of_fu>
            </follow_up>
            <follow_up>
                <month_of_fu preferred_name="month_of_fu" tier="1">12</month_of_fu>
                <day_of_fu preferred_name="day_of_fu" tier="2">31</day_of_fu>
                <year_of_fu preferred_name="year_of_fu" tier="1">1996</year_of_fu>
            </follow_up>
        </follow_ups>
    </patient>
</root>

Goals: I am trying to select all values for all elements at all levels/nodes of the XML document that have a @Tier attribute of "1". I would like to see each value be output in it's own row in a column called 'Value' like this:

caseID     Value
------     ------
12345      Melanoma
12345      02
12345      1960
12345      12
12345      1994
12345      12
12345      1995
12345      12
12345      1996

Here's SQL that's analogous to what I'm trying to write:

DECLARE @tier int 
SET @tier = '1' --To search by tier explicitly via variable

    select 
        x.caseID,
    bar.value('(//*[@tier = sql:variable("@tier")])[1]','varchar(max)') as value
from tblXML as x
    CROSS APPLY data.nodes('//*:patient') AS foo(bar)
where
    x.caseID = '12345'
    and x.data.value('(//@tier)[1]', 'varchar(3)') = '1'

This gives me...

case ID        value
------         ------  
12345          Melanoma

...but not the additional rows I'm looking for.

I think the my confusion centers around both the constraint imposed by the singleton [1] and the context node in my CROSS APPLY statement. I'm not sure how to get everything in the document that's Tier 1, not just the first instance of Tier 1 that the query finds.

For this example I know my context node in my CROSS APPLY statement is focused on the patient section of the XML, but I would like to be able to search at all levels of the XML tree both up and down. I know the // for @Tier lets me search at all levels of the document, but my context node seems to drive/constrain me to some degree on how deep into the document I can search.

It's my understanding that I can drill deep into the XML (follow_up perhaps?), then search from the top down. However, the deepest node will change from caseID to caseID and XML document to XML document. Is there a way to wildcard my CROSS APPLY statement into the deepest node in a given XML document and then look for any occurrence of @Tier within that document, then retrieve its corresponding value?

I realize that method lacks elegance and is not performant. I have not been able to find clear documentation thus far to give me the results I'm looking for.

Thanks in advance for your help.

EDIT: I should specify that this SQL will be the core of a table-valued function in SQL Server.

4

4 Answers

2
votes

Define a key:

<xsl:key name="t" match="*[@tier]" use="@tier"/>

and find all the teir=1 elements using

<xsl:for-each select="key('t', '1')">
  ....
</xsl:for-each>
1
votes

Try removing the singleton in your select:

DECLARE @tier int 
SET @tier = '1' --To search by tier explicitly via variable
    select 
        x.caseID,
        bar.value('/string()','varchar(max)')
from tblXML as x
    CROSS APPLY data.nodes('//*[@tier = sql:variable("@tier")]') AS foo(bar)
where
    x.caseID = '12345'

The xpath in your select already filters nodes based on the matching attribute value, so you don't need to do it again in your where constraint. The xpath predicate is analogous to where.

0
votes

Try this:

SELECT  
    x.caseID,
    bar.value('.','varchar(max)') as value
FROM tblXML as x
    CROSS APPLY data.nodes('//*[@tier=1]') AS foo(bar)
0
votes

You can basically do the tier lookup from the CROSS APPLY statement, and take value of the root in your select ('.' as first argument).

DECLARE @s XML;
SET @s='<root>
    <patient>
        <diagnosis preferred_name="diagnosis" tier="1">Melanoma</diagnosis>
        <month_of_birth preferred_name="month_of_birth" tier="1">02</month_of_birth>
        <day_of_birth preferred_name="day_of_birth" tier="2">01</day_of_birth>
        <year_of_birth preferred_name="year_of_birth" tier="1">1960</year_of_birth>
        <new_tumor_events>
            <month_of_nte preferred_name="month_of_nte" tier="1">12</month_of_nte>
            <day_of_nte preferred_name="day_of_nte" tier="2">30</day_of_nte>
            <year_of_nte preferred_name="year_of_nte" tier="1">1994</year_of_nte>
        </new_tumor_events>
        <follow_ups>
            <follow_up>
                <month_of_fu preferred_name="month_of_fu" tier="1">12</month_of_fu>
                <day_of_fu preferred_name="day_of_fu" tier="2">31</day_of_fu>
                <year_of_fu preferred_name="year_of_fu" tier="1">1995</year_of_fu>
            </follow_up>
            <follow_up>
                <month_of_fu preferred_name="month_of_fu" tier="1">12</month_of_fu>
                <day_of_fu preferred_name="day_of_fu" tier="2">31</day_of_fu>
                <year_of_fu preferred_name="year_of_fu" tier="1">1996</year_of_fu>
            </follow_up>
        </follow_ups>
    </patient>
</root>';

DECLARE @tblXML TABLE(caseID INT PRIMARY KEY,data XML);
INSERT INTO @tblXML(caseID,data)VALUES(12345,@s);

DECLARE @tier INT;
SET @tier=1;

SELECT
    t.caseID,
    node.value('.','NVARCHAR(512)') AS value
FROM
    @tblXML AS t
    CROSS APPLY t.data.nodes('//*[@tier=sql:variable("@tier")]') AS t1n(node)
WHERE
    t.caseID=12345;