0
votes

I'm using XSLT to translate some XML into HTML. The XML was not created by us and follows a long and complex schema with many custom formatting tags that need to be transformed into the appropriate HTML elements. When I transform it, tags that are not valid HTML are getting silently dropped.

For example,

<P>(1) something something <PRTPAGE P="783"/> something else. </P>

Becomes:

<P>(1) something something  something else.</P>

Is there a way to output some kind of warning when a tag like PRTPAGE is dropped?

Since the schema re-uses the same tag names for multiple purposes, it's hard for me to figure out what tags are valid inside the one I'm transforming. I'm thinking that there may be some tags that require their own transformation rules and these warnings can help refine the XSL.

This is my XSL so far. I'm using the built-in javax.xml.transform.Transformer to do the transformation.

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE stylesheet [
        <!ENTITY mdash  "&#x2014;" >
        ]>
<xsl:stylesheet
        xmlns:xsl=
                "http://www.w3.org/1999/XSL/Transform"
        version="3.0"
>

    <xsl:character-map name="cm">
        <xsl:output-character character="&mdash;" string="—"/>
    </xsl:character-map>

    <xsl:output use-character-maps="cm" method="xml" />
    <xsl:template match="//E[@T='03']">
        <span class="italic underline">
            <xsl:apply-templates/>
        </span>
    </xsl:template>
</xsl:stylesheet>
If PRTPAGE is being dropped, then it is because you have not added code to the XSLT to tell it to be copied. XSLT's built-in templates will not copy elements, only text nodes. If you are using XSLT 3.0, you can just add <xsl:mode on-no-match="shallow-copy"/> to tell XSLT to copy elements too. Thanks! - Tim C
Are you sure it is only PRTPAGE being dropped, by the way, and not P too? - Tim C
I'm actually processing the body of the <P> tags. I was just trying to abbreviate the backstory a bit. If I don't enable shallow-copy, is there a way to know when tags are dropped and which ones (i.e. which tags do not match)? - Frank Riccobono
Your XSL is replacing the U+2014 character with… the U+2014 character. In an XML document, writing &#x2014; is exactly the same as writing a literal — (U+2014 EM DASH). In other words, your character-map is doing nothing. - VGR
The built-in XSLT processor in Java is an internal version of Apache Xalan, which is an XSLT 1.0 processor that does not support xsl:character-map. Since you use version="3.0", Xalan would operate in forwards-compatible processing mode and ignore the xsl:character-map. Is that what you want? Or are you using XSLT 3 with Saxon 9? In that case there is <xsl:mode warning-on-no-match="yes"/>, see w3.org/TR/xslt-30/#modes. - Martin Honnen

1 Answer

1
votes

This was a bit too long to write in comments, but if XSLT is dropping elements, it is because you have not added any template to explicitly copy them. When XSLT selects an element for which there is no matching template, it falls back on its built-in templates, which skip over the element itself and copy only its descendant text nodes.

What you could try is adding a generic template to your XSLT that matches all other elements and uses xsl:message to write out a message listing the element name:

  <xsl:template match="*">
    <xsl:message>
      <xsl:text>Dropping </xsl:text>
      <xsl:value-of select="name()" />
    </xsl:message>
    <xsl:apply-templates />
  </xsl:template>

(Note that I don't know enough about javax.xml.transform.Transformer to say how you can actually read these messages.)
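
On reading those messages from Java, here is a minimal sketch. It assumes the JDK's built-in processor reports xsl:message output through the Transformer's ErrorListener as a warning; other processors, such as Saxon, may route messages differently. The class name DroppedElementReport, the helper collectMessages, and the inline version="1.0" stylesheet are hypothetical and only for illustration.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.List;
import javax.xml.transform.ErrorListener;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class DroppedElementReport {

    // Hypothetical stylesheet: P has its own rule, every other element hits the catch-all.
    static final String XSLT =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
      + "<xsl:template match='P'><p><xsl:apply-templates/></p></xsl:template>"
      + "<xsl:template match='*'>"
      + "<xsl:message><xsl:text>Dropping </xsl:text><xsl:value-of select='name()'/></xsl:message>"
      + "<xsl:apply-templates/>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    /** Runs the transformation and returns whatever was passed to ErrorListener.warning. */
    static List<String> collectMessages(String xslt, String xml) throws TransformerException {
        List<String> messages = new ArrayList<>();
        Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xslt)));

        // Assumption: the processor delivers xsl:message text as a warning.
        transformer.setErrorListener(new ErrorListener() {
            @Override public void warning(TransformerException e) {
                messages.add(e.getMessage());
            }
            @Override public void error(TransformerException e) throws TransformerException {
                throw e;
            }
            @Override public void fatalError(TransformerException e) throws TransformerException {
                throw e;
            }
        });

        // The transformed output is discarded here; only the messages are of interest.
        transformer.transform(new StreamSource(new StringReader(xml)),
                              new StreamResult(new StringWriter()));
        return messages;
    }

    public static void main(String[] args) throws TransformerException {
        String xml = "<P>(1) something something <PRTPAGE P=\"783\"/> something else. </P>";
        collectMessages(XSLT, xml).forEach(System.out::println);
    }
}
```

With the sample input from the question, the collected list should include a message naming PRTPAGE, since P has its own template and PRTPAGE falls through to the catch-all. Collecting into a list rather than printing directly makes it easy to deduplicate element names across a large document.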