4
votes

What is the XPath expression to select all nodes of a document?

Given this example XML:

<div class="header"/>

It contains three nodes: <div> (element), class= (attribute) and "header" (text).

$doc = new DOMDocument;
$doc->loadXml('<div class="header"/>');
$xpath = new DOMXPath($doc);

I tried with //node():

$xpath->query('//node()');

which returns only element nodes (I assume because of //). Is there a way to also include other nodes, such as attributes and the text nodes inside attribute values?


Additional example:

I can obtain each node by using the DOMDocument API, e.g. to obtain the text node of the attribute value:

$doc = new DOMDocument;
$doc->loadXml('<div class="header"/>');
$class = $doc->documentElement->getAttributeNode('class');
echo $class->childNodes->item(0)->nodeName;

Which gives:

#text

How can I obtain the set of all nodes with a single XPath expression, in particular including the text node that is the child of that class attribute node?


4 Answers

3
votes

Use:

//node() | //@* | //namespace::*

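//node() selects every element, text, comment and processing-instruction node, //@* adds every attribute node, and //namespace::* adds every namespace node. Together they cover every node type of the XPath 1.0 data model except the document (root) node itself, which //node() does not select because it is not the child of any node; add | / to the expression if you need it as well.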

How you access the resulting node list (an XmlNodeList in .NET, for example) depends on the API of the specific XPath engine you are using; consult its documentation.
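In PHP, for instance, DOMXPath::query() returns a DOMNodeList you can iterate. A minimal sketch, assuming PHP's bundled DOM extension and the question's one-element document; get_class() is used only to show which kind of node each result is (namespace nodes come back as DOMNameSpaceNode objects):

```php
<?php
// Assumes PHP's DOM extension (libxml2); sample document from the question.
$doc = new DOMDocument();
$doc->loadXML('<div class="header"/>');
$xpath = new DOMXPath($doc);

// Union of element/text/comment/PI nodes, attribute nodes and namespace nodes.
$nodes = $xpath->query('//node() | //@* | //namespace::*');

$selected = array();
foreach ($nodes as $node) {
    // get_class() distinguishes DOMElement, DOMAttr, DOMText, DOMComment,
    // DOMProcessingInstruction and DOMNameSpaceNode results.
    $selected[] = get_class($node) . ' ' . $node->nodeName;
    echo get_class($node), ': ', $node->nodeName, "\n";
}
```

The div element and the class attribute are both selected, but no DOMText node is: the XPath data model has no text node inside an attribute, so the attribute's value is only available as a string (for example $attr->value, or string(@class) in XPath).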

XSLT-based example:

<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output omit-xml-declaration="yes" indent="yes"/>
 <xsl:strip-space elements="*"/>

 <xsl:template match="/">

  <xsl:for-each select=
   "//node() | //@* | //namespace::*">

   Type: <xsl:text/>

   <xsl:choose>
    <xsl:when test="not(..)">
     <xsl:text>document node </xsl:text>
    </xsl:when>
    <xsl:when test="self::*">
     <xsl:text>element </xsl:text>
    </xsl:when>
    <xsl:when test="self::text()">
     <xsl:text>text-node </xsl:text>
    </xsl:when>
    <xsl:when test="self::comment()">
     <xsl:text>comment-node </xsl:text>
    </xsl:when>
    <xsl:when test="self::processing-instruction()">
     <xsl:text>PI-node </xsl:text>
    </xsl:when>
    <xsl:when test="count(.|../@*) = count(../@*)">
     <xsl:text>attribute-node </xsl:text>
    </xsl:when>
    <xsl:when test=
    "count(.|../namespace::*) = count(../namespace::*)">
     <xsl:text>namespace-node </xsl:text>
    </xsl:when>
   </xsl:choose>

   <xsl:text>Name: "</xsl:text>
   <xsl:value-of select="name()"/>" <xsl:text/>

   <xsl:text>Value: </xsl:text>
   <xsl:value-of select="."/>

  </xsl:for-each>

 </xsl:template>
</xsl:stylesheet>

When this XSLT transformation is applied to any XML document, it selects all nodes using the above XPath expression (it intentionally strips white-space-only text nodes via xsl:strip-space) and outputs, in document order, the type, name and string-value of each selected node.

For example, when applied on this XML document:

<networkOfBridges xmlns:x="x">
    <bridge id="1"  otherside="A" />
    <!-- A Comment -->
    <bridge id="2"  oneside="A"/>
    <?PI Processing Instruction ?>
    <bridge id="3"  oneside="A" otherside="A" />
</networkOfBridges>

the result is:

   Type: element Name: "networkOfBridges" Value: 

   Type: namespace-node Name: "xml" Value: http://www.w3.org/XML/1998/namespace

   Type: namespace-node Name: "x" Value: x

   Type: element Name: "bridge" Value: 

   Type: namespace-node Name: "xml" Value: http://www.w3.org/XML/1998/namespace

   Type: namespace-node Name: "x" Value: x

   Type: attribute-node Name: "id" Value: 1

   Type: attribute-node Name: "otherside" Value: A

   Type: comment-node Name: "" Value:  A Comment 

   Type: element Name: "bridge" Value: 

   Type: namespace-node Name: "xml" Value: http://www.w3.org/XML/1998/namespace

   Type: namespace-node Name: "x" Value: x

   Type: attribute-node Name: "id" Value: 2

   Type: attribute-node Name: "oneside" Value: A

   Type: PI-node Name: "PI" Value: Processing Instruction 

   Type: element Name: "bridge" Value: 

   Type: namespace-node Name: "xml" Value: http://www.w3.org/XML/1998/namespace

   Type: namespace-node Name: "x" Value: x

   Type: attribute-node Name: "id" Value: 3

   Type: attribute-node Name: "oneside" Value: A

   Type: attribute-node Name: "otherside" Value: A

3
votes

In the XPath data model, your example contains only two nodes: an element (div) and an attribute (class="header"). "header" is the value of the attribute, not a separate node.

Text nodes do exist, but they represent character data in element content (the text between tags). For example, in <title>Alice in wonderland</title> there are two nodes: an element (title) and a text node (Alice in wonderland).
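This is easy to check with the question's own PHP setup. A minimal sketch, assuming PHP's DOM extension: on the <title> document, //node() returns exactly the element and its text child.

```php
<?php
// Assumes PHP's DOM extension (libxml2).
$doc = new DOMDocument();
$doc->loadXML('<title>Alice in wonderland</title>');
$xpath = new DOMXPath($doc);

$kinds = array();
foreach ($xpath->query('//node()') as $node) {
    // nodeName is "title" for the element and "#text" for its text child.
    $kinds[] = $node->nodeName;
}
print_r($kinds);
```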

Therefore, the best you can do in this case is //*|//@*.

EDIT, after your update to the question:

That text node comes from the DOM API that PHP's DOMDocument implements (DOM exposes an attribute's value as a child text node); it is not part of the XPath data model. XPath sees only 2 nodes here, regardless of the implementation.

Having said that, you could use some XPath functions to get what you want. The function name() returns the name of a node and the function string() returns its string-value. You could use these to get strings as a result (instead of nodes).
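A minimal sketch of that idea in PHP, assuming the DOM extension: DOMXPath::evaluate() (unlike query()) can return scalar results, so name() and string() yield plain strings. Evaluating relative to each attribute node gives a name/value pair per attribute.

```php
<?php
// Assumes PHP's DOM extension; same sample document as the question.
$doc = new DOMDocument();
$doc->loadXML('<div class="header"/>');
$xpath = new DOMXPath($doc);

// name() and string() on a node-set use its first node in document order.
$name  = $xpath->evaluate('name(//@*)');
$value = $xpath->evaluate('string(//@*)');
echo $name, ' = ', $value, "\n";

// For every attribute, evaluate relative to that attribute node.
$pairs = array();
foreach ($xpath->query('//@*') as $attr) {
    $pairs[] = $xpath->evaluate('concat(name(), "=", string())', $attr);
}
print_r($pairs);
```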

1
votes

Have you tried something like //*|//@*|//text()?
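A quick check of what this returns for the question's document, as a minimal sketch assuming PHP's DOM extension: //text() matches only text children of elements, so for <div class="header"/> it adds nothing, and the attribute's value is still not returned as a node.

```php
<?php
// Assumes PHP's DOM extension; sample document from the question.
$doc = new DOMDocument();
$doc->loadXML('<div class="header"/>');
$xpath = new DOMXPath($doc);

$names = array();
foreach ($xpath->query('//*|//@*|//text()') as $node) {
    $names[] = $node->nodeName;
}
print_r($names);
```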

-1
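votes

$doc = new DOMDocument();
$doc->loadXML('<div class="header"/>');
$xpath = new DOMXPath($doc);

$result = array();
// Leaf elements only: elements without child elements.
foreach ($xpath->query('//*[count(*) = 0]') as $node) {
    $path = array();
    $val = $node->nodeValue;
    // Walk up to the root element, collecting element names.
    do {
        $path[] = $node->nodeName;
        $node = $node->parentNode;
    } while ($node instanceof DOMElement);
    $result[implode('/', array_reverse($path))] = $val;
}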