XPath 2.0 - select all nodes between 2 elements


I have the following XML file:

<document>
  <article>
    <head>headline 1</head>
    <text>
      <paragraph>foo</paragraph>
      <paragraph>bar</paragraph>
    </text>
    <date>
      <day>10</day>
      <month>05</month>
      <year>2002</year>
    </date>
    <source>some text</source>
    <portal>ABC</portal>
    <ID number="1"/>
  </article>
  <article>
    <head>headline 2</head>
    <text>
      <paragraph>lorem ipsum</paragraph>
    </text>
    <date>
      <day>10</day>
      <month>05</month>
      <year>2002</year>
    </date>
    <source>another source</source>
    <portal>DEF</portal>
    <ID number="2"/>
  </article>
</document>

Now I'd like to return all nodes of each article that occur after the head node and before the portal node, so I've been looking into the XPath 2.0 node comparison operators << and >>.

What I have so far is the following, which returns an empty result:

<xsl:template match="/">
  <xsl:copy-of select="/document/article/head/following-sibling::*[. << ./article/portal]"/>
</xsl:template>

Any ideas how to fix that XPath query?

Tags: xml, xslt, xpath, xslt-2.0, xpath-2.0

Answer (2 votes):

A simple XPath 1.0 expression should work for such a case:

/document/article/head/following-sibling::*[following-sibling::portal]
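To make the semantics of that expression concrete, here is a rough emulation using only Python's standard-library `xml.etree.ElementTree`. It is a sketch, not part of the answer: ElementTree has no `following-sibling` axis, so the axis is reproduced by comparing positions among an article's element children, and the helper name `between_head_and_portal_xpath1` and the compacted copy of the sample document are illustrative. An element is kept when some `head` sibling precedes it and some `portal` sibling follows it, which is what the union of every head's following siblings, filtered by `[following-sibling::portal]`, selects.

```python
import xml.etree.ElementTree as ET

# Compact copy of the question's sample document (whitespace trimmed).
SAMPLE = (
    "<document>"
    "<article><head>headline 1</head>"
    "<text><paragraph>foo</paragraph><paragraph>bar</paragraph></text>"
    "<date><day>10</day><month>05</month><year>2002</year></date>"
    "<source>some text</source><portal>ABC</portal><ID number='1'/></article>"
    "<article><head>headline 2</head>"
    "<text><paragraph>lorem ipsum</paragraph></text>"
    "<date><day>10</day><month>05</month><year>2002</year></date>"
    "<source>another source</source><portal>DEF</portal><ID number='2'/></article>"
    "</document>"
)


def between_head_and_portal_xpath1(root):
    """Emulate /document/article/head/following-sibling::*[following-sibling::portal].

    Hypothetical helper: ElementTree has no sibling axes, so siblings are
    compared by their index among the article's element children.
    """
    result = []
    for article in root.findall("article"):
        children = list(article)  # element children in document order
        for j, child in enumerate(children):
            # following-sibling::* of some head: a head occurs before position j
            after_a_head = any(c.tag == "head" for c in children[:j])
            # [following-sibling::portal]: a portal occurs after position j
            before_a_portal = any(c.tag == "portal" for c in children[j + 1:])
            if after_a_head and before_a_portal:
                result.append(child)
    return result


root = ET.fromstring(SAMPLE)
print([e.tag for e in between_head_and_portal_xpath1(root)])
# ['text', 'date', 'source', 'text', 'date', 'source']
```

Note that this relies on the data having a `portal` after the selected siblings. Unlike the `<<`/`>>` versions, it does not require exactly one `head` or `portal` per article.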

Answer (2 votes):

Use:

/*/*/node()[. >> ../head and ../portal >> .]

Here is a complete transformation:

<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output omit-xml-declaration="yes" indent="yes"/>

    <xsl:template match="/">
        <xsl:sequence select="/*/*/node()[. >> ../head and ../portal >> .]"/>
    </xsl:template>
</xsl:stylesheet>

When this transformation is applied to the XML document from the question, the wanted, correct result is produced:

    <text>
        <paragraph>foo</paragraph>
        <paragraph>bar</paragraph>
    </text>
    <date>
        <day>10</day>
        <month>05</month>
        <year>2002</year>
    </date>
    <source>some text</source>

    <text>
        <paragraph>lorem ipsum</paragraph>
    </text>
    <date>
        <day>10</day>
        <month>05</month>
        <year>2002</year>
    </date>
    <source>another source</source>
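The `>>` expression above can be sketched in Python to show what the node comparisons do. This is an emulation under assumptions, not the answer's code. ElementTree has no document-order operators, so "after head" and "before portal" become index comparisons among an article's element children. Whitespace text nodes, which `node()` would also select, are ignored.

XPath 2.0 requires each operand of `<<`/`>>` to be a single node or an empty sequence. For that reason, the hypothetical helper `between_by_document_order` raises `ValueError` when an article has more than one `head` or `portal`. It skips articles missing either one, because a comparison with an empty sequence yields an empty result, which counts as false.

```python
import xml.etree.ElementTree as ET


def between_by_document_order(root):
    """Emulate /*/*/node()[. >> ../head and ../portal >> .] on element children."""
    result = []
    for article in root:  # /*/* : each child of the document element
        children = list(article)
        heads = [i for i, c in enumerate(children) if c.tag == "head"]
        portals = [i for i, c in enumerate(children) if c.tag == "portal"]
        # XPath 2.0 raises a type error (XPTY0004) for a multi-node operand.
        if len(heads) > 1 or len(portals) > 1:
            raise ValueError("node comparison needs at most one head and one portal")
        # An empty operand makes the comparison empty, i.e. false.
        if not heads or not portals:
            continue
        h, p = heads[0], portals[0]
        # Strictly after head and strictly before portal.
        result.extend(children[h + 1:p])
    return result


doc = ET.fromstring(
    "<document>"
    "<article><head>headline 1</head><text/><date/><source>some text</source>"
    "<portal>ABC</portal><ID number='1'/></article>"
    "<article><head>headline 2</head><text/><date/><source>another source</source>"
    "<portal>DEF</portal><ID number='2'/></article>"
    "</document>"
)
print([e.tag for e in between_by_document_order(doc)])
# ['text', 'date', 'source', 'text', 'date', 'source']
```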

Update:

In a comment, Roman Pekar specified a new requirement: he wants all such nodes that lie between the first head and the first portal of each article.

This is straightforward: just change the above expression to:

/*/*/node()[. >> ../head[1] and ../portal[1] >> .]
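With `[1]` on both paths, each operand of `>>` is at most one node, so repeated `head` or `portal` elements no longer cause a type error. Only the first of each is used. Below is a Python sketch of that variant under stated assumptions: ElementTree child positions stand in for document order, only element children are considered, and the helper name and the two-pair test article are hypothetical.

```python
import xml.etree.ElementTree as ET


def between_first_head_and_first_portal(root):
    """Emulate /*/*/node()[. >> ../head[1] and ../portal[1] >> .] on elements."""
    result = []
    for article in root:
        children = list(article)
        tags = [c.tag for c in children]
        if "head" not in tags or "portal" not in tags:
            continue  # empty operand: predicate is false for every child
        h = tags.index("head")    # position of ../head[1]
        p = tags.index("portal")  # position of ../portal[1]
        # Empty slice when the first portal does not come after the first head.
        result.extend(children[h + 1:p])
    return result


# Hypothetical article with two head/portal pairs; only the first pair counts.
doc = ET.fromstring(
    "<document><article>"
    "<head>h1</head><text>t1</text><portal>p1</portal>"
    "<head>h2</head><text>t2</text><portal>p2</portal>"
    "</article></document>"
)
print([e.text for e in between_first_head_and_first_portal(doc)])
# ['t1']
```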

Answer (2 votes):

I often work with XML in SQL Server, so when I saw Dimitre Novatchev's answer, I tried it in SSMS. It didn't work. SQL Server's XQuery implementation is statically typed and does static type checking, so the << and >> operators need operands that are statically known to be single nodes, which ../head and ../portal are not. So I looked for a form of this expression that works in SQL Server. Here it is:

/document/article/*[. >> ../head[1] and . << ../portal[1]]

The full query is:

declare @Data xml

select @Data = '
<document>
  <article>
    <head>headline 1</head>
    <text>
      <paragraph>foo</paragraph>
      <paragraph>bar</paragraph>
    </text>
    <date>
      <day>10</day>
      <month>05</month>
      <year>2002</year>
    </date>
    <source>some text</source>
    <portal>ABC</portal>
    <ID number="1"/>
  </article>
  <article>
    <head>headline 2</head>
    <text>
      <paragraph>lorem ipsum</paragraph>
    </text>
    <date>
      <day>10</day>
      <month>05</month>
      <year>2002</year>
    </date>
    <source>another source</source>
    <portal>DEF</portal>
    <ID number="2"/>
  </article>
</document>
'

-- Children of each article that lie between its first head and its first portal
select @Data.query('/document/article/*[. >> ../head[1] and . << ../portal[1]]')

Answer (1 vote):

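You already got a good suggestion that doesn't use << at all, but if you do want to use it, I think this fixes your attempt:

<xsl:copy-of select="/document/article/head/following-sibling::*[. << parent::article/portal]"/>

In your version, ./article/portal is evaluated relative to each following sibling of head (for example text), which has no article child. The right-hand operand of << is therefore an empty sequence, so the predicate is always false. parent::article/portal instead goes up from the sibling to its article and then down to the portal.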
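The reason the original predicate is always false can be reproduced with Python's standard-library `xml.etree.ElementTree`. This is an illustrative sketch only: it is not XPath 2.0, and ElementTree has no parent axis, so the fixed path is shown by starting from the article element itself.

```python
import xml.etree.ElementTree as ET

article = ET.fromstring(
    "<article><head>headline 1</head>"
    "<text><paragraph>foo</paragraph></text>"
    "<portal>ABC</portal></article>"
)
sibling = article.find("text")  # one of head's following siblings

# Question's path: ./article/portal with the sibling as context finds nothing,
# because <text> has no <article> child.
from_sibling = sibling.findall("./article/portal")

# Fixed path (parent::article/portal): look for portal under the article.
# ElementTree has no parent axis, so the article element is used directly.
from_article = article.findall("portal")

print(from_sibling)                    # []
print([p.text for p in from_article])  # ['ABC']
```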
