Good Morning,

I have a problem with an XML document that contains a CDATA section. Given this XML:

<?xml version="1.0" encoding="ISO-8859-1"?>
<character>
   <Body>
      <methodResult>
         <nodeOut>
            <![CDATA[  <film>Indiana Jones and the Kingdom of the Crystal Skull</film>]]>
         </nodeOut>
      </methodResult>
   </Body>
</character>

We need to have this:

<film>Indiana Jones and the Kingdom of the Crystal Skull</film>

What XSLT would do this? I want to extract only the CDATA content from the XML file and discard the rest. I am using XSLT 1.0.

Thank you!

3 Answers

1
votes

This will produce XML:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    version="1.0">

    <!-- ignore these elements if they occur in the input -->
    <xsl:template match="role|actor|part"/>

    <!-- output the remaining text unescaped, with whitespace normalized -->
    <xsl:template match="text()">
        <xsl:value-of select="normalize-space(.)" disable-output-escaping="yes"/>
    </xsl:template>

</xsl:stylesheet>

Output:

<?xml version="1.0" encoding="UTF-8"?><film>Indiana Jones and the Kingdom of the Crystal Skull</film>
0
votes

You could use a transformation that has the output method set to text and simply extracts the text node from the element that holds the CDATA section.

<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

    <xsl:output method="text" />

    <xsl:template match="node()|@*">
        <xsl:apply-templates select="node()|@*" />
    </xsl:template>

    <!-- nodeOut is the element holding the CDATA section in the question's sample -->
    <xsl:template match="nodeOut/text()">
        <xsl:value-of select="." />
    </xsl:template>

</xsl:stylesheet>

Note that this will fail if there are multiple CDATA sections in the element, and that you would need to create some sort of root element if there is more than one such element in your input. There is also leading whitespace in your CDATA section, so I suggest you trim the output. One way to do that in the XSLT itself is the normalize-space() function, but it would also affect whitespace inside the CDATA "xml". This method also emits no XML declaration, so whether the output is accepted as valid XML depends on what you feed it to.

But this is a good place to start.
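For the case with several elements, a minimal XSLT 1.0 sketch could wrap everything in one root element. It assumes the CDATA lives in nodeOut as in the question's sample; the wrapper name films is made up for illustration. It also relies on disable-output-escaping, which is an optional serializer feature, so whether it takes effect depends on your processor and on how the result is serialized.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

    <xsl:output method="xml" />

    <!-- Emit a single root so the result stays well-formed even when
         several nodeOut elements exist. "films" is a hypothetical name. -->
    <xsl:template match="/">
        <films>
            <xsl:apply-templates select="//nodeOut" />
        </films>
    </xsl:template>

    <!-- Write the string value of each nodeOut without escaping, so the
         markup stored in the CDATA section comes out as real elements. -->
    <xsl:template match="nodeOut">
        <xsl:value-of select="." disable-output-escaping="yes" />
    </xsl:template>

</xsl:stylesheet>
```

Because the output method is xml here, the processor writes the XML declaration itself, and the leading whitespace from the CDATA section ends up harmlessly inside the root element.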

0
votes

A clean solution is possible in XSLT 3.0 (as supported by Saxon 9.7 or Exselt) using

<xsl:template match="/">
  <xsl:copy-of select="parse-xml-fragment(character/Body/methodResult/nodeOut/text()[last()])"/>
</xsl:template>

See https://www.w3.org/TR/xpath-functions-30/#func-parse-xml-fragment.
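As a rough sketch, the template above could sit in a complete XSLT 3.0 stylesheet like the one below. It assumes the path from the question's sample and exactly one nodeOut element (string() raises a type error on a sequence of several). Selecting /* on the parsed fragment is my own addition: it copies only the top-level elements and so drops the whitespace around the CDATA content.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="3.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

    <xsl:output method="xml" />

    <xsl:template match="/">
        <!-- Parse the text held in the CDATA section as an XML fragment,
             then copy only its top-level elements to the result. -->
        <xsl:copy-of
            select="parse-xml-fragment(string(/character/Body/methodResult/nodeOut))/*" />
    </xsl:template>

</xsl:stylesheet>
```

Unlike the disable-output-escaping approach, this builds real nodes in the result tree, so it fails with an error if the CDATA content is not well-formed.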