2
votes

I am working on an XSLT transformation to re-arrange XML blocks to validate NewsML files. Some of these files contains encoded characters (such as & " etc...). The problem is the XSLT transformation is converting these characters to their literal string (ie "and", "'"). This is causing problems. I do not want this to happen.

I have experimented with various techniques (uses of <xsl:text>, <xsl:value-of> and the disable-output-escaping flag, <xsl:output method='xml|html|xhtml|text'>) to no avail. These methods either, convert the characters, or simply leave them out.

eg, a string which starts with "stars on PM&amp;apos;s cards" can end up as

  • stars on PM's cards
  • stars on PMs cards

I am using the Saxonica (http://www.saxonica.com/) processing app.

The basic XSLT I am using is provided below. (There are other things but the problem exists even with this simplest stylesheet)

<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
   <xsl:output method="xml" indent="no" />
   <xsl:template match="@*|node()">
      <xsl:copy>
         <xsl:apply-templates select="@*|node()"/>
      </xsl:copy>
   </xsl:template>
</xsl:stylesheet>

Any ideas on how to prevent this conversion would be most appreciated. The requirement is to keep the original text as it appears.

2

2 Answers

1
votes

I think you need to do both the disable-output-escaping="yes" and set the document to HTML at the same time.

FROM W3C (emphasis mine):

It is an error for output escaping to be disabled for a text node that is used for something other than a text node in the result tree. Thus, it is an error to disable output escaping for an xsl:value-of or xsl:text element that is used to generate the string-value of a comment, processing instruction or attribute node; it is also an error to convert a result tree fragment to a number or a string if the result tree fragment contains a text node for which escaping was disabled. In both cases, an XSLT processor may signal the error; if it does not signal the error, it must recover by ignoring the disable-output-escaping attribute.

The disable-output-escaping attribute may be used with the html output method as well as with the xml output method. The text output method ignores the disable-output-escaping attribute, since it does not perform any output escaping.

An XSLT processor will only be able to disable output escaping if it controls how the result tree is output. This may not always be the case. For example, the result tree may be used as the source tree for another XSLT transformation instead of being output. An XSLT processor is not required to support disabling output escaping. If an xsl:value-of or xsl:text specifies that output escaping should be disabled and the XSLT processor does not support this, the XSLT processor may signal an error; if it does not signal an error, it must recover by not disabling output escaping.

If output escaping is disabled for a character that is not representable in the encoding that the XSLT processor is using for output, then the XSLT processor may signal an error; if it does not signal an error, it must recover by not disabling output escaping.

Since disabling output escaping may not work with all XSLT processors and can result in XML that is not well-formed, it should be used only when there is no alternative.

1
votes

These are entities. Usually they get mapped to a unicode representation of that entity. The final stream will just contain the characters. If you output the stream it's up to the serializer to escape the characters depending on the output type (which is what you can disable with disable-output-escaping). So a proper serializer should turn this

<xsl:output method="html" encoding="UTF-8"/>
<xsl:text>some&#160;test</xsl:text>

into

some&nbsp;test

See section 5 on this article.

So I would check that with your XSLT processor first.