4
votes

A quick question on using XSLT 1.0 that you might be able to help me with. I have an input XML that looks like this:

<Root>
    <FirstName>Bob</FirstName>
    <LastName>Marley</LastName>
    <ID>BM1234</ID>
    <Songs>
        <Song>
            <EmptyElements></EmptyElements>
            <SongName>No woman no cry</SongName>
            <Year>1974</Year>
            <album></album>
            <studio></studio>
            <rating></rating>
        </Song>
    </Songs>
</Root>

The output needs to look like

<Root>
    <FirstName>Bob</FirstName>
    <LastName>Marley</LastName>
    <ID>BM1234</ID>
    <Songs>
        <Song>
            <EmptyElements>album, studio, rating</EmptyElements>
            <SongName>No woman no cry</SongName>
            <Year>1974</Year>
        </Song>
    </Songs>
</Root>

So basically, the EmptyElements tag should hold a comma-separated list of all the empty elements.

2
I suggest searching the net for the identity transform template and xsl:for-each in XSLT. These will help you solve your problem, or at least take you to the next step. - Lingamurthy CS
I would strongly recommend against doing this at all: by consolidating multiple pieces of data in a single node like this, you make your data far less queryable and far more difficult to work with generally. There's a rule in database design called 'first normal form', which states that every field in a record should be atomic (i.e. one piece of data only). There's a very strong argument for applying the same principle to XML nodes. - Flynn1179

2 Answers

2
votes

Or simply:

XSLT 1.0

<xsl:stylesheet version="1.0" 
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
<xsl:strip-space elements="*"/>

<!-- identity transform -->
<xsl:template match="@*|node()">
    <xsl:copy>
        <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
</xsl:template>

<xsl:template match="Song">
    <xsl:copy>
        <EmptyElements>
            <xsl:for-each select="*[not(node() or self::EmptyElements)]">
                 <xsl:value-of select="name()"/>
                 <xsl:if test="position()!=last()">
                    <xsl:text>, </xsl:text>
                 </xsl:if>
            </xsl:for-each> 
        </EmptyElements>
        <xsl:apply-templates select="*[node()]"/>
    </xsl:copy>
</xsl:template>

</xsl:stylesheet>

Note:

This solution proudly uses the last() function. There are no performance issues related to using this function.

The XPath specification states:

The last function returns a number equal to the context size from the expression evaluation context.

And the XSLT specification tells us that:

Expression evaluation occurs with respect to a context. ... The context consists of:

• a node (the context node)
• a pair of non-zero positive integers (the context position and the context size)
...

The idea that the processor would go back and count all the nodes in the current node list, again and again, for each node in the list is simply preposterous. Once the context has been established (by calling either xsl:for-each or xsl:apply-templates), the context size is known and isn't going to change.

This conclusion can also be easily put to a test: using a list of 10k items, no discernible difference was found when evaluating:

<xsl:for-each select="item">
    <xsl:value-of select="position()!=last()"/>
</xsl:for-each>

against:

<xsl:for-each select="item">
    <xsl:value-of select="not(position() = 1)"/>
</xsl:for-each>

(tested with libxslt, Xalan and Saxon).
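
To reproduce this comparison, the first fragment can be wrapped in a complete stylesheet along the following lines. This is only a minimal sketch: the root element name items is an assumption (the test input is not shown here), and the text output method is chosen just to keep the result small.

<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="text"/>

<!-- "items" is a hypothetical root element holding the 10k item children -->
<xsl:template match="/items">
    <xsl:for-each select="item">
        <!-- writes "true" for every item except the last one -->
        <xsl:value-of select="position()!=last()"/>
    </xsl:for-each>
</xsl:template>

</xsl:stylesheet>

Swapping the select expression for not(position() = 1) gives the second variant, so both can be timed against the same input.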

1
votes

This transformation:

<xsl:stylesheet version="1.0"  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output omit-xml-declaration="yes" indent="yes"/>
 <xsl:strip-space elements="*"/>

  <xsl:template match="node()|@*">
    <xsl:copy>
      <xsl:apply-templates select="node()|@*"/>
    </xsl:copy>
  </xsl:template>

  <xsl:template match="EmptyElements" priority="5">
    <xsl:copy>
      <xsl:apply-templates mode="enumerate" select=
      "../*[not(self::EmptyElements) and not(node())]" />
    </xsl:copy>
  </xsl:template>

  <xsl:template match="Songs/Song/*" mode="enumerate">
    <xsl:value-of select="substring(',', not(position() = 1), 1)"/>
    <xsl:value-of select="name()"/>
  </xsl:template>
  <xsl:template match="Songs/Song/*[not(node())]"/>
</xsl:stylesheet>

when applied on the provided source XML document:

<Root>
    <FirstName>Bob</FirstName>
    <LastName>Marley</LastName>
    <ID>BM1234</ID>
    <Songs>
        <Song>
            <EmptyElements></EmptyElements>
            <SongName>No woman no cry</SongName>
            <Year>1974</Year>
            <album></album>
            <studio></studio>
            <rating></rating>
        </Song>
    </Songs>
</Root>

produces the wanted, correct result:

<Root>
   <FirstName>Bob</FirstName>
   <LastName>Marley</LastName>
   <ID>BM1234</ID>
   <Songs>
      <Song>
         <EmptyElements>album,studio,rating</EmptyElements>
         <SongName>No woman no cry</SongName>
         <Year>1974</Year>
      </Song>
   </Songs>
</Root>

Explanation:

  1. The identity rule, when selected for execution, copies the matched node "as-is".
  2. The template matching Songs/Song/*[not(node())], when selected for execution, does nothing, which results in "deleting" (not copying) the matched node in the output.
  3. The template matching EmptyElements has a higher priority specified than the "deleting" template mentioned above, so it is selected for execution on any EmptyElements element.
  4. The matched EmptyElements element is shallow-copied to the output, and then its content (body) is produced by applying templates in mode enumerate to all of its empty sibling elements.
  5. Finally, the template in mode enumerate matches any child element of a Song element that is a child of a Songs element. It is selected for execution by the <xsl:apply-templates> instruction in step 4 above and is applied only to the empty sibling elements of the EmptyElements element. This template does two things: a) it outputs a comma if this is not the first node in the node-list; b) it outputs the name of the matched element. In this way the names of all empty sibling elements of the EmptyElements element are output, separated by commas.
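
The comma logic in step 5 relies on XPath 1.0 type conversion, which is easy to misread. The annotated copy of the enumerate template below traces the two cases; the comments are added for illustration only, and the instructions themselves are unchanged from the stylesheet above.

  <xsl:template match="Songs/Song/*" mode="enumerate">
    <!-- substring() expects a number as its second argument, so the boolean
         not(position() = 1) is converted: false becomes 0, true becomes 1.
         First node:  substring(',', 0, 1) selects positions p with 0 <= p < 1;
                      string positions start at 1, so the result is ''.
         Later nodes: substring(',', 1, 1) selects position 1, giving ','. -->
    <xsl:value-of select="substring(',', not(position() = 1), 1)"/>
    <xsl:value-of select="name()"/>
  </xsl:template>

The separator here is a bare comma, which is why the result above reads album,studio,rating rather than the album, studio, rating shown in the question.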

Update:

Dear reader, we have a companion answer to this question, which starts with "Or simply:" and implies that it is simpler than the code in this answer.

Instead of telling you that this answer is simpler than the "Or simply:" answer, I have summarized a few facts related to simplicity, so you can draw your own conclusion. In the following table, the value in each left sub-column is for this solution, and the value in each right sub-column is for the "Or simply:" solution:

[Image: table comparing the two solutions; contents not available in this text]

In addition, the "Or simply:" solution has a potential performance issue and a definite streamability issue; see this fragment:

             <xsl:if test="position()!=last()">
                <xsl:text>, </xsl:text>
             </xsl:if>

Compare with what the current solution uses:

not(position() = 1)

See Dr. Michael Kay's recommendation that the latter is "a much better way of coding this" than the former, and his explanation of why:

"Why? Because however hard the optimizer works, the last() function is hard work: it involves some kind of lookahead. With "position() ne last()" the lookahead might be limited to one element, but it's still a lot more complicated than testing whether the position is 1.

With streaming coming along, the latter formulation is also more likely to be streamable (because lookahead is impossible with streaming)."
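
As a sketch of what that recommendation would mean for the loop quoted above (an illustration, not code taken from either answer), the separator can be written before every item except the first, so that only position() is consulted:

<xsl:for-each select="*[not(node() or self::EmptyElements)]">
    <!-- leading separator: skipped for the first item, so last() is never needed -->
    <xsl:if test="position() != 1">
        <xsl:text>, </xsl:text>
    </xsl:if>
    <xsl:value-of select="name()"/>
</xsl:for-each>

For the three empty elements in the sample input this writes album, studio, rating, the same text the position()!=last() version produces.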

Conclusion: Whenever someone tells us: "Or simply:", it is good to take a few metrics before taking their statement for granted ...