The XML document:
<Home>
<Addr>
<Street>ABC</Street>
<Number>5</Number>
<Comment>BLAH BLAH BLAH <br/><br/>ABC</Comment>
</Addr>
</Home>
The XPath expression:
//*[contains(text(), 'ABC')]
//* matches any element that is a descendant of the root node. In XPath the root node is the document node, not the Home element, so this means every element in the document, Home included.
[...] is a predicate; it filters the node-set, keeping only the nodes for which ... is true:
A predicate filters a node-set [...] to produce a new node-set. For each node in the node-set to be filtered, the PredicateExpr is evaluated [...]; if PredicateExpr evaluates to true for that node, the node is included in the new node-set; otherwise, it is not included.
contains('haystack', 'needle') returns true if haystack contains needle:
Function: boolean contains(string, string)
The contains function returns true if the first argument string contains the second argument string, and otherwise returns false.
But contains() takes a string as its first parameter, and here it is passed nodes. To deal with that, any node or node-set passed as the first parameter is converted to a string by the string() function:
An argument is converted to type string as if by calling the string function.
The string() function returns the string-value of the first node:
A node-set is converted to a string by returning the string-value of the node in the node-set that is first in document order. If the node-set is empty, an empty string is returned.
The string-value of an element node:
The string-value of an element node is the concatenation of the string-values of all text node descendants of the element node in document order.
The string-value of a text node:
The string-value of a text node is the character data.
So, basically, the string-value is all the text contained in a node (the concatenation of all its descendant text nodes).
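To make the string-value rule concrete, here is a minimal Python sketch that computes it by hand for the document above, using the standard library's xml.dom.minidom. The helper name string_value is my own, not part of any XPath API, and the code only emulates the rule quoted from the spec rather than running a real XPath engine:

```python
from xml.dom import minidom

XML = """<Home>
<Addr>
<Street>ABC</Street>
<Number>5</Number>
<Comment>BLAH BLAH BLAH <br/><br/>ABC</Comment>
</Addr>
</Home>"""


def string_value(node):
    """Hypothetical helper: concatenate all descendant text nodes in document order."""
    if node.nodeType == node.TEXT_NODE:
        return node.data
    return "".join(string_value(child) for child in node.childNodes)


doc = minidom.parseString(XML)
street = doc.getElementsByTagName("Street")[0]
comment = doc.getElementsByTagName("Comment")[0]

print(repr(string_value(street)))   # 'ABC'
print(repr(string_value(comment)))  # 'BLAH BLAH BLAH ABC'
```

Note that the two br elements contribute nothing, so the two text nodes of Comment end up joined.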
text() is a node test that matches any text node:
The node test text() is true for any text node. For example, child::text() will select the text node children of the context node.
That said, //*[contains(text(), 'ABC')] matches any element (but the root node) whose first child text node contains ABC. This is because text() returns a node-set containing all child text nodes of the context node (the node relative to which the expression is evaluated), but contains() only looks at the first one. So for the document above the path matches only the Street element: the first text node of Comment is "BLAH BLAH BLAH ", which does not contain ABC.
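The "first text node only" behaviour can be emulated by hand. This is a rough Python sketch using xml.dom.minidom, assuming XPath 1.0 conversion rules as quoted above; first_child_text is a made-up helper, not a library function:

```python
from xml.dom import minidom

XML = """<Home>
<Addr>
<Street>ABC</Street>
<Number>5</Number>
<Comment>BLAH BLAH BLAH <br/><br/>ABC</Comment>
</Addr>
</Home>"""


def first_child_text(element):
    """Hypothetical helper: mimic string(text()), i.e. the first child text node or ''."""
    for child in element.childNodes:
        if child.nodeType == child.TEXT_NODE:
            return child.data
    return ""


doc = minidom.parseString(XML)

# Emulates //*[contains(text(), 'ABC')]
matches = [
    el.tagName
    for el in doc.getElementsByTagName("*")
    if "ABC" in first_child_text(el)
]
print(matches)  # ['Street']
```

Home and Addr do have child text nodes here, but they are only the line breaks between the tags, so they don't match either.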
The following expression, //*[text()[contains(., 'ABC')]], matches any element (but the root node) that has at least one child text node containing ABC. Here . represents the context node, which in this case is a child text node of the element being tested. So for the document above the path matches the Street and Comment elements.
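For comparison, here is a small Python sketch of the "at least one child text node" check, again emulated by hand with xml.dom.minidom rather than evaluated by an XPath engine; child_texts is a name I made up for illustration:

```python
from xml.dom import minidom

XML = """<Home>
<Addr>
<Street>ABC</Street>
<Number>5</Number>
<Comment>BLAH BLAH BLAH <br/><br/>ABC</Comment>
</Addr>
</Home>"""


def child_texts(element):
    """Hypothetical helper: data of every direct child text node, like text()."""
    return [c.data for c in element.childNodes if c.nodeType == c.TEXT_NODE]


doc = minidom.parseString(XML)

# Emulates //*[text()[contains(., 'ABC')]]: test each child text node on its own.
matches = [
    el.tagName
    for el in doc.getElementsByTagName("*")
    if any("ABC" in text for text in child_texts(el))
]
print(matches)  # ['Street', 'Comment']
```

Comment now matches because its second child text node, the one after the br elements, is ABC.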
Now then, //*[contains(., 'ABC')] matches any element (but the root node) whose string-value (the concatenation of its descendant text nodes) contains ABC. For the document above it matches the Home, Addr, Street, and Comment elements. Likewise, //*[contains(., 'BLAH ABC')] matches the Home, Addr, and Comment elements.
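The same results can be reproduced with a hand-rolled string-value in Python. This is only a sketch with xml.dom.minidom under the rules quoted above; string_value and matching are hypothetical helper names:

```python
from xml.dom import minidom

XML = """<Home>
<Addr>
<Street>ABC</Street>
<Number>5</Number>
<Comment>BLAH BLAH BLAH <br/><br/>ABC</Comment>
</Addr>
</Home>"""


def string_value(node):
    """Hypothetical helper: concatenate all descendant text nodes in document order."""
    if node.nodeType == node.TEXT_NODE:
        return node.data
    return "".join(string_value(child) for child in node.childNodes)


def matching(doc, needle):
    """Emulates //*[contains(., needle)] and returns the matching tag names."""
    return [
        el.tagName
        for el in doc.getElementsByTagName("*")
        if needle in string_value(el)
    ]


doc = minidom.parseString(XML)
print(matching(doc, "ABC"))       # ['Home', 'Addr', 'Street', 'Comment']
print(matching(doc, "BLAH ABC"))  # ['Home', 'Addr', 'Comment']
```

"BLAH ABC" is never a single text node in the document; it only appears once the text on either side of the br elements is concatenated.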
//*[contains(text(),'ABC')] returns only the <Street> element. It doesn't return any ancestors of <Street> or <Comment>. – Ken Bloom