Using this online XPath tester on the following XML
<a>foo <![CDATA[ MyCData]]> baz</a>
with the XPath expression /a/text(), I get back all the text:
foo <![CDATA[ MyCData]]> baz
(This is structured as three nodes, as we can see using /a/text()[2], which returns baz.)
However, with javax.xml.xpath.XPath, the CDATA section and the last text node are not returned at all. I get a single node containing foo, and the remainder of the text, <![CDATA[ MyCData]]> baz, is simply not available. Regardless of how XPath treats the XML structure, it is a bug if we cannot access nodes at all.
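For reference, here is a minimal sketch of how this can be reproduced. The class and method names (XPathCDataRepro, textNodes) are made up for illustration, and the number of nodes printed may depend on the XPath implementation bundled with the JDK, so no expected output is shown.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical reproduction class; not part of any library.
final class XPathCDataRepro {
    static final String XML = "<a>foo <![CDATA[ MyCData]]> baz</a>";

    // Parses XML with a default (non-coalescing) factory and
    // returns whatever nodes /a/text() selects.
    static NodeList textNodes() {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            Document doc = dbf.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(XML)));
            return (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate("/a/text()", doc, XPathConstants.NODESET);
        } catch (Exception e) {
            // Wrap checked parser/XPath exceptions to keep the sketch short.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        NodeList nodes = textNodes();
        System.out.println("matched nodes: " + nodes.getLength());
        for (int i = 0; i < nodes.getLength(); i++) {
            System.out.println(i + ": [" + nodes.item(i).getNodeValue() + "]");
        }
    }
}
```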
However, if I call setCoalescing(true) on the DocumentBuilderFactory, it concatenates all the text and CDATA nodes into one. I might end up using that, but it converts the CDATA to escaped text in the output, which looks ugly, even if it is allowed by the standard. Also, I would prefer to be able to address the CDATA section separately as some sort of node, whether "just" a text node or some special type of CDATA node.
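A rough sketch of the coalescing variant follows. The setter on DocumentBuilderFactory is setCoalescing(boolean), with isCoalescing() as the matching getter. The class and method names (CoalescingDemo, coalescedText) are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical demo class; not part of any library.
final class CoalescingDemo {
    // Parses xml with coalescing enabled and returns the string
    // value of /a/text().
    static String coalescedText(String xml) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            // Convert CDATA sections to text and merge them with adjacent text nodes.
            dbf.setCoalescing(true);
            Document doc = dbf.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return XPathFactory.newInstance().newXPath().evaluate("/a/text()", doc);
        } catch (Exception e) {
            // Wrap checked parser/XPath exceptions to keep the sketch short.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("[" + coalescedText("<a>foo <![CDATA[ MyCData]]> baz</a>") + "]");
    }
}
```

With coalescing on, the element has a single text child, so the CDATA content can no longer be told apart from the surrounding text.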
By the way, if the CDATA section is the only content of its parent element, with no spaces or other text in front of it, an ordinary text-content XPath retrieves it successfully, even with coalescing left at its default (false). So it appears that the Java XPath always returns the first, and only the first, text node.
When I examine the full DOM tree of my DOM Document, with coalescing left at its default, I find that the CDATA section is represented as its own node of type cdata-section, which is great, but how can I access this node in XPath?
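The tree inspection described above can be sketched roughly as follows. It only shows how the cdata-section node appears when walking the DOM directly and is not an XPath solution. The class and method names (DomChildWalk, describeChildren) and the "text"/"cdata" labels are made up for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical inspection helper; not part of any library.
final class DomChildWalk {
    // Lists the children of the root element as "kind:value" strings.
    static List<String> describeChildren(String xml) {
        try {
            // Default factory: coalescing is off, so CDATA keeps its own node.
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            Document doc = dbf.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList kids = doc.getDocumentElement().getChildNodes();
            List<String> out = new ArrayList<>();
            for (int i = 0; i < kids.getLength(); i++) {
                Node n = kids.item(i);
                String kind;
                if (n.getNodeType() == Node.CDATA_SECTION_NODE) {
                    kind = "cdata";
                } else if (n.getNodeType() == Node.TEXT_NODE) {
                    kind = "text";
                } else {
                    kind = "other";
                }
                out.add(kind + ":" + n.getNodeValue());
            }
            return out;
        } catch (Exception e) {
            // Wrap checked parser exceptions to keep the sketch short.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        describeChildren("<a>foo <![CDATA[ MyCData]]> baz</a>")
                .forEach(System.out::println);
    }
}
```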