5
votes

I am using XPath to parse RSS XML data. The data is:

<rss version="2.0">
  <channel>
    <title>
      <![CDATA[sports news]]>
    </title>
  </channel>
</rss>  

I want to get the text "sports news" using the XPath expression "/rss/channel/title/text()", but the actual result is "\r\n". How can I get the result I want?

The code is below:

    Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(is);
    XPathFactory xpathFactory = XPathFactory.newInstance();
    XPath xPath = xpathFactory.newXPath();
    Node node = (Node) xPath.evaluate("/rss/channel/title/text()", doc,XPathConstants.NODE);
    String title = node.getNodeValue();

2 Answers

4
votes

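Try calling setCoalescing(true) on your DocumentBuilderFactory. This converts CDATA sections to text nodes and merges them with any adjacent text nodes, so text() returns the whole title text instead of only the leading whitespace.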
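
As a minimal sketch of this suggestion (the class name `CoalescingExample` and the helper `readTitle` are made up for illustration, and the sample XML is inlined as a string), the factory is configured before the builder is created. The merged text node still contains the whitespace around the CDATA section, so the sketch trims it:

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;

public class CoalescingExample {
    // Hypothetical helper: parses the stream and returns the trimmed title text.
    static String readTitle(InputStream is) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        // Convert CDATA sections to text and merge them with adjacent text nodes.
        factory.setCoalescing(true);
        Document doc = factory.newDocumentBuilder().parse(is);

        XPath xPath = XPathFactory.newInstance().newXPath();
        Node node = (Node) xPath.evaluate("/rss/channel/title/text()", doc, XPathConstants.NODE);
        // The merged node still includes the surrounding line breaks and indentation.
        return node.getNodeValue().trim();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<rss version=\"2.0\">\r\n  <channel>\r\n    <title>\r\n"
                + "      <![CDATA[sports news]]>\r\n    </title>\r\n  </channel>\r\n</rss>";
        InputStream is = new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8));
        System.out.println(readTitle(is));
    }
}
```

Without the call to setCoalescing(true), the title element has three children (whitespace text, the CDATA section, whitespace text), which is why the original code picks up only the first whitespace node.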

0
votes

You could try changing the XPath expression to

"string(/rss/channel/title)"

and use return type STRING instead of NODE:

    String title = (String) xPath.evaluate("string(/rss/channel/title)", doc,
                                           XPathConstants.STRING);

This way you are not selecting a text node, but rather the string value of the title element, which consists of the concatenation of all its descendant text nodes.
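
A self-contained sketch of this approach follows; the class name `StringValueExample` and the helper `readTitle` are illustrative only. It swaps `string()` for the standard XPath 1.0 function `normalize-space()`, which takes the same string value and also strips the line breaks and indentation around the CDATA section. That substitution is an addition here, not part of the original suggestion:

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;

public class StringValueExample {
    // Hypothetical helper: returns the whitespace-normalized string value of <title>.
    static String readTitle(InputStream is) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(is);
        XPath xPath = XPathFactory.newInstance().newXPath();
        // With XPathConstants.STRING the result is a String, not a Node.
        return (String) xPath.evaluate("normalize-space(/rss/channel/title)", doc,
                                       XPathConstants.STRING);
    }

    public static void main(String[] args) throws Exception {
        String xml = "<rss version=\"2.0\">\r\n  <channel>\r\n    <title>\r\n"
                + "      <![CDATA[sports news]]>\r\n    </title>\r\n  </channel>\r\n</rss>";
        InputStream is = new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8));
        System.out.println(readTitle(is));
    }
}
```

With plain string(...), the returned value would keep the surrounding whitespace, so calling trim() on the result is an alternative to normalize-space().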