1
votes

I'm trying to figure out how to use XPath to get the exceptionID and instrumentID values out of the XML embedded in the CDATA section of the following document. (Yes, having XML inside CDATA is a little odd, but that's what I get from the third-party service.)

<?xml version="1.0"?>
<exception>
  <info>
    <![CDATA[
      <info>
        <exceptionID>1</exceptionID>
        <instrumentID>1</instrumentID>
      </info>
    ]]>
  </info>
</exception>

Is it possible to get the values in one XPath statement?

I'm using javax.xml.xpath.XPath inside Java (JDK 1.5 with Xalan 2.7.1 and Xerces 2.9.1), for example:

XPath xpath = XPathFactory.newInstance().newXPath();

Long exceptionId  = new Long(((Double)xpath.evaluate(this.exceptionIdXPath, 
                               document, XPathConstants.NUMBER)).longValue());

It's the this.exceptionIdXPath variable that I'm not sure how to set. I know, for example, that:

/exception/info/text()/info/exceptionID won't work (text() returns the data inside the CDATA but with no 'knowledge' that it is XML)

2
What XPath/XSLT engine are you using? – Dirk Vollmar
I'm using javax.xml.xpath.XPath inside Java (JDK 1.5 with Xalan 2.7.1 and Xerces 2.9.1). – Martijn Verburg
XPath 3.0 could do that with one expression, parse-xml(/exception/info)/info/exceptionID, using the parse-xml function (saxonica.com/documentation/functions/intro/parse-xml.xml). Saxon 9.3 (saxonica.com) is implemented in Java and supports XPath 3.0 in its commercial versions. – Martin Honnen
@Martin Honnen - Thanks, I trialled this and it worked, but it introduces an extra lib to our solution for one (comparatively minor) issue. I've left a note in the code and my issue tracker to revisit if/when we move to XPath and its implementing libs. – Martijn Verburg
@karianna: Don't use unparsed data as parseable data. It's a bad design choice. – user357812

2 Answers

5
votes

Yes, you can do it. But anything inside the CDATA section is plain character data: the parser does not turn it into element nodes, so the embedded tags are not part of the DOM tree. Therefore, you have to use XPath's string manipulation functions.

In XPath you can use substring-before and substring-after. Something like this may work:

substring-before(substring-after(/exception/info,"<exceptionID>"), "</exceptionID>")
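
As a rough sketch of how that expression might be plugged into the javax.xml.xpath code from the question: the class name CdataSubstringExample, the extractId helper, and the inlined sample XML are made up for illustration and are not part of the original code. It relies on the string value of /exception/info being the raw CDATA text.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical demo class; names are illustrative only.
class CdataSubstringExample {

    // The document from the question, inlined so the sketch is self-contained.
    static final String XML =
        "<?xml version=\"1.0\"?>"
        + "<exception><info><![CDATA["
        + "<info><exceptionID>1</exceptionID><instrumentID>1</instrumentID></info>"
        + "]]></info></exception>";

    // Candidate value for this.exceptionIdXPath: cut the text between the two tags.
    static final String EXCEPTION_ID_XPATH =
        "substring-before(substring-after(/exception/info, '<exceptionID>'), '</exceptionID>')";

    // The same pattern applied to instrumentID.
    static final String INSTRUMENT_ID_XPATH =
        "substring-before(substring-after(/exception/info, '<instrumentID>'), '</instrumentID>')";

    // Parses the XML and evaluates the expression as an XPath number.
    static long extractId(String xml, String expression) {
        try {
            Document document = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            // Requesting NUMBER converts the extracted string to a Double.
            Double value = (Double) xpath.evaluate(expression, document, XPathConstants.NUMBER);
            return value.longValue();
        } catch (Exception e) {
            throw new IllegalStateException("Could not evaluate " + expression, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(extractId(XML, EXCEPTION_ID_XPATH));
        System.out.println(extractId(XML, INSTRUMENT_ID_XPATH));
    }
}
```

Note that this is plain string matching: it would break if the embedded tags carried attributes or if the same tag name appeared more than once.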
2
votes

This is going to be very specific to the tools you're using (it would be good to know what platform and libraries you're using), but generally you can't do this in a single step. The whole point of CDATA is that it's raw character data and not necessarily XML.

What you can do is capture the text() in exception/info (basically the contents of your CDATA block) and create a new XML document (in memory) from that, and then use XPath over that document.

The detailed steps for this are platform-dependent.
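
For the Java setup mentioned in the question (javax.xml.xpath with a DOM parser), the two steps could be sketched roughly as follows. The class name CdataReparseExample and the helpers parse and extractId are hypothetical, and the sample XML is inlined from the question. The sketch reads the string value of /exception/info rather than selecting text(), on the assumption that this avoids depending on how the parser splits the whitespace and CDATA into separate text nodes.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical demo class; names are illustrative only.
class CdataReparseExample {

    // The document from the question, inlined so the sketch is self-contained.
    static final String XML =
        "<?xml version=\"1.0\"?>"
        + "<exception><info><![CDATA["
        + "<info><exceptionID>1</exceptionID><instrumentID>1</instrumentID></info>"
        + "]]></info></exception>";

    // Builds an in-memory DOM document from an XML string.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    // Step 1: pull the CDATA text out of the outer document.
    // Step 2: parse that text as its own document and run a normal XPath over it.
    static long extractId(String outerXml, String innerExpression) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            Document outer = parse(outerXml);
            // With no return type given, evaluate returns the string value of the node.
            String embedded = xpath.evaluate("/exception/info", outer);
            Document inner = parse(embedded.trim());
            Double value = (Double) xpath.evaluate(innerExpression, inner, XPathConstants.NUMBER);
            return value.longValue();
        } catch (Exception e) {
            throw new IllegalStateException("Could not evaluate " + innerExpression, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(extractId(XML, "/info/exceptionID"));
        System.out.println(extractId(XML, "/info/instrumentID"));
    }
}
```

The second parse costs a little extra, but the inner expressions are ordinary XPath and do not depend on the exact text of the embedded tags.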