I have a number of XML files that should follow this format:

<root>
<question>What is the answer?</question>
<answer choice="A">Some</answer>
<answer choice="B">Answer</answer>
<answer choice="C">Text</answer>
</root>

But they come in from a web interface (I cannot control the output) with comments, and end up looking like this:

<root>
<question>What is the answer?</question>
<answer choice="A"><!--some comment
-->
Some
</answer>
<answer choice="B">

<!--some comment
     -->
    Answer
    </answer>
    <answer choice="C"><!--another comment
    -->
   Text</answer>
   </root>

The output, after removing the comments, ends up like this:

What is the answer?
A\t


Some
B\t
Answer
C\t     
Text

I have an XSL stylesheet set up to strip out comments using:

<xsl:template match="comment()"/>

along with an identity template and a few templates that call it.

I would use normalize-space(), but it also removes the newlines inside the answer text, which I want to keep. What I am looking for is a way to remove only the blank lines and the extra leading and trailing newlines. Is there a good way to do this?

Also note: the final output goes to Adobe InDesign, which uses XSLT 1.0.

[Edit - the XSL is below].

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">

<xsl:strip-space elements="*" />

<xsl:template match = "@*|node()|processing-instruction()" name="identity">
<xsl:copy>
<xsl:apply-templates select="@*|node()|processing-instruction()"/>
</xsl:copy>
</xsl:template>

<xsl:template match="comment()"/>

<xsl:template match="//answer"><xsl:value-of select="@choice"/>
<xsl:text>&#09;</xsl:text><xsl:call-template name="identity"/>
</xsl:template> 

<xsl:template match="//question">
<xsl:text>00&#09;</xsl:text><xsl:call-template name="identity"/>
</xsl:template>

</xsl:stylesheet>
It would help if you showed exactly what you have done so far (XSLT), and show an example of how the output fails to meet your expectations/requirements. - Jim Garrison
I have added the XSL and cleaned up the output. Basically, when the comments are stripped out, the whitespace that surrounded them in the XML file is left behind. Since that whitespace is part of the answer node's text, xsl:strip-space cannot remove it. I also need to keep any intentional line breaks, so normalize-space() is out of the question as well. - DurkD

1 Answer

This stylesheet:

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
    <xsl:output omit-xml-declaration="yes"/>
    <xsl:strip-space elements="*" />
    <xsl:template match="@*|node()" name="identity">
        <xsl:copy>
            <xsl:apply-templates select="@*|node()"/>
        </xsl:copy>
    </xsl:template>
    <xsl:template match="comment()"/>
    <xsl:template match="@choice">
        <xsl:value-of select="concat(.,'&#x9;')"/>
    </xsl:template>
    <xsl:template match="question|answer">
        <xsl:call-template name="identity"/>
        <xsl:text>&#xA;</xsl:text>
    </xsl:template>
</xsl:stylesheet>

Output:

<root><question>What is the answer?</question>
<answer>A   Some</answer>
<answer>B   Answer</answer>
<answer>C   Text</answer>
</root>
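
Note that xsl:strip-space only drops text nodes that consist entirely of whitespace, so a text node such as a newline followed by "Some" and another newline is copied through unchanged. If those leading and trailing newlines still show up, one XSLT 1.0 option is to trim the text nodes explicitly. The following is a minimal sketch, not a tested InDesign solution: the template names "trim" and "rtrim" and the parameter "s" are made up for illustration, and it assumes each answer holds a single run of text once the comments are gone.

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
    <xsl:output omit-xml-declaration="yes"/>
    <xsl:strip-space elements="*"/>

    <!-- Identity template: copy everything not handled elsewhere -->
    <xsl:template match="@*|node()" name="identity">
        <xsl:copy>
            <xsl:apply-templates select="@*|node()"/>
        </xsl:copy>
    </xsl:template>

    <!-- Drop comments -->
    <xsl:template match="comment()"/>

    <!-- Emit the choice letter followed by a tab -->
    <xsl:template match="@choice">
        <xsl:value-of select="concat(., '&#x9;')"/>
    </xsl:template>

    <xsl:template match="question|answer">
        <xsl:call-template name="identity"/>
        <xsl:text>&#xA;</xsl:text>
    </xsl:template>

    <!-- Trim text directly inside question/answer; interior newlines are kept -->
    <xsl:template match="question/text()|answer/text()">
        <xsl:call-template name="trim">
            <xsl:with-param name="s" select="."/>
        </xsl:call-template>
    </xsl:template>

    <!-- "trim" (hypothetical name): strip leading, then trailing whitespace -->
    <xsl:template name="trim">
        <xsl:param name="s"/>
        <!-- First non-whitespace character, or '' if there is none -->
        <xsl:variable name="first" select="substring(normalize-space($s), 1, 1)"/>
        <xsl:if test="$first != ''">
            <xsl:call-template name="rtrim">
                <!-- Everything before the first $first is whitespace, so cut there -->
                <xsl:with-param name="s" select="concat($first, substring-after($s, $first))"/>
            </xsl:call-template>
        </xsl:if>
    </xsl:template>

    <!-- "rtrim" (hypothetical name): drop one trailing whitespace character per call -->
    <xsl:template name="rtrim">
        <xsl:param name="s"/>
        <xsl:choose>
            <xsl:when test="$s != '' and normalize-space(substring($s, string-length($s))) = ''">
                <xsl:call-template name="rtrim">
                    <xsl:with-param name="s" select="substring($s, 1, string-length($s) - 1)"/>
                </xsl:call-template>
            </xsl:when>
            <xsl:otherwise>
                <xsl:value-of select="$s"/>
            </xsl:otherwise>
        </xsl:choose>
    </xsl:template>
</xsl:stylesheet>
```

The design choice here is to use normalize-space() only to locate the first non-whitespace character and to test whether the last character is whitespace, never to produce the output itself, so line breaks in the middle of an answer should survive. One caveat: if a comment sits in the middle of an answer's text, the pieces on either side are separate text nodes and each is trimmed on its own, so whitespace next to that comment is lost.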