How can I access CDATA as a node using XPath in Java?

Question

Using this online XPath tester on the following XML

<a>foo <![CDATA[ MyCData]]>  baz</a>

with the XPath expression /a/text(), I get back all the text

foo <![CDATA[ MyCData]]>  baz

(This is structured as three nodes, as we can see using /a/text()[2], which returns baz.)

However, with javax.xml.xpath.XPath, the CData and the last text node are not returned at all. I get a single node with foo, and the remainder of the text <![CDATA[ MyCData]]> baz is just not available. Regardless of how XPath treats the XML structure, it is a bug if we cannot access nodes at all.

If I call setCoalescing(true) on the DocumentBuilderFactory, all the text and CData nodes are concatenated into one. I might end up using that, but it converts CData to escaped text in the output, which looks ugly, even if the standard allows it. I would also prefer to address the CData separately as some sort of node, whether "just" a text node or some special type of CData node.
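For reference, here is a minimal sketch of the coalescing setup described above. It assumes the JDK's built-in parser and XPath implementation. The setter on DocumentBuilderFactory is setCoalescing(true), and the class and method names (CoalescingSketch, coalescedText) are made up for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper class, not part of any library.
final class CoalescingSketch {
    static final String XML = "<a>foo <![CDATA[ MyCData]]>  baz</a>";

    // Parses XML with coalescing enabled and returns the string value of /a/text().
    static String coalescedText() {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            // Ask the parser to merge CDATA sections into the adjacent text nodes.
            factory.setCoalescing(true);
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(XML)));
            // evaluate(String, Object) returns the result converted to a string.
            return XPathFactory.newInstance().newXPath().evaluate("/a/text()", doc);
        } catch (Exception e) {
            throw new IllegalStateException("parsing or XPath evaluation failed", e);
        }
    }

    public static void main(String[] args) {
        // Brackets make the leading and trailing whitespace visible.
        System.out.println("[" + coalescedText() + "]");
    }
}
```

With coalescing on, the CDATA content arrives as ordinary characters inside the one merged text node, so there is no separate node left to address.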

By the way, if the CData is the only contents of its parent element, with no spaces or other text in front, an ordinary text-content XPath retrieves it successfully, even with isCoalescing at its default (false). So, we see that the Java XPath is always returning the first, and only the first, text node.

When I examine the full DOM tree of my DOM Document, with isCoalescing at its default, I find that the CData section is represented as its own node of type cdata-section, which is great, but how can I access this node in XPath?
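The DOM inspection described above can be sketched roughly as follows. This assumes the JDK's default DocumentBuilderFactory with coalescing left at false. CDataDomSketch and describeChildren are hypothetical names, and the code walks the DOM directly rather than using XPath.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class, not part of any library.
final class CDataDomSketch {
    static final String XML = "<a>foo <![CDATA[ MyCData]]>  baz</a>";

    // Lists the children of <a> as kind[value] entries separated by '|'.
    static String describeChildren() {
        try {
            // Coalescing is left at its default (false), so CDATA keeps its own node.
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(XML)));
            NodeList children = doc.getDocumentElement().getChildNodes();
            StringBuilder out = new StringBuilder();
            for (int i = 0; i < children.getLength(); i++) {
                Node child = children.item(i);
                String kind;
                switch (child.getNodeType()) {
                    case Node.CDATA_SECTION_NODE: kind = "cdata"; break;
                    case Node.TEXT_NODE: kind = "text"; break;
                    default: kind = "other";
                }
                if (i > 0) {
                    out.append('|');
                }
                out.append(kind).append('[').append(child.getNodeValue()).append(']');
            }
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("parsing failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(describeChildren());
    }
}
```

Checking getNodeType() against Node.CDATA_SECTION_NODE is how the DOM API itself distinguishes the section from the surrounding text nodes.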

Thanks, but that talks about XML inside CData. I just want the CData! In other XPath engines CData is simply a text node, but not in Java, as described. — Joshua Fox

Michael Kay · Accepted Answer · 2012-08-28T08:17:13

The online XPath tester is getting it wrong, I'm afraid. According to the XPath data model, the <a> element has a single text node child whose string value is "foo  MyCData  baz"; there is no second text node, so a request for the second text node should return nothing.

The XPath data model takes the view that CDATA is merely a convenient way of inputting data to avoid having to escape special characters; the presence of the CDATA does not affect the meaning or information content of the XML, so it is not made available to the application.
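As a rough illustration of that point, the sketch below evaluates string(/a) on a document that uses CDATA and on one that uses ordinary escaping, and compares the results. It assumes the JDK's built-in parser and XPath engine. CDataEquivalenceSketch and stringValue are hypothetical names, and the second pair of documents is an invented example.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper class, not part of any library.
final class CDataEquivalenceSketch {

    // Returns the XPath string value of the <a> element in the given XML.
    static String stringValue(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            // string(/a) concatenates all character data below <a>.
            return XPathFactory.newInstance().newXPath().evaluate("string(/a)", doc);
        } catch (Exception e) {
            throw new IllegalStateException("parsing or XPath evaluation failed", e);
        }
    }

    public static void main(String[] args) {
        // Same characters, written once with CDATA and once with escaping.
        String withCData = stringValue("<a><![CDATA[x < y]]></a>");
        String escaped = stringValue("<a>x &lt; y</a>");
        System.out.println(withCData.equals(escaped));

        // The document from the question, with and without the CDATA markup.
        String original = stringValue("<a>foo <![CDATA[ MyCData]]>  baz</a>");
        String plain = stringValue("<a>foo  MyCData  baz</a>");
        System.out.println(original.equals(plain));
    }
}
```

If the CDATA boundaries themselves matter to the application, they have to be read from the DOM directly (for example by checking for Node.CDATA_SECTION_NODE), since XPath does not expose them.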
