5
votes

I am struggling with grouping (on multiple keys) table-based XML into a hierarchy with XSLT.

The grouping is based on the first four elements (E1, E2, E4, E5); however, a group must break whenever a record with a different key appears in between, so only adjacent records with the same key are merged.

Source XML:

<RECORDS> 
<RECORD>
    <E1>MICKEY</E1>
    <E2>TEST</E2>
    <E4>14</E4>
    <E5>196</E5>
    <F1>A1</F1>
</RECORD>
<RECORD>
    <E1>MICKEY</E1>
    <E2>TEST</E2>
    <E4>14</E4>
    <E5>196</E5>
    <F1>A2</F1>
</RECORD>
 <RECORD>
    <E1>MICKEY</E1>
    <E2>TEST</E2>
    <E4>14</E4>
    <E5>195</E5>
    <F1>A3</F1>
  </RECORD>
 <RECORD>
    <E1>MICKEY</E1>
    <E2>TEST</E2>
    <E4>14</E4>
    <E5>195</E5>
    <F1>A4</F1>
  </RECORD>
 <RECORD>
    <E1>MICKEY</E1>
    <E2>TEST</E2>
    <E4>14</E4>
    <E5>196</E5>
    <F1>A5</F1>
  </RECORD>
     <RECORD>
    <E1>DONALD</E1>
    <E2>TEST</E2>
    <E4>14</E4>
    <E5>196</E5>
    <F1>A6</F1>
  </RECORD>
 <RECORD>
    <E1>DONALD</E1>
    <E2>TEST</E2>
    <E4>14</E4>
    <E5>196</E5>
    <F1>A7</F1>
  </RECORD>
 </RECORDS>

Output XML

 <RECORDS>
 <RECORD>
    <E1>MICKEY</E1>
    <E2>TEST</E2>
    <E4>14</E4>
    <E5>196</E5>
    <F>
     <F1>A1</F1>
     <F1>A2</F1>
    </F>
  </RECORD>
  <RECORD>
    <E1>MICKEY</E1>
    <E2>TEST</E2>
    <E4>14</E4>
    <E5>195</E5>
    <F>
     <F1>A3</F1>
     <F1>A4</F1>
    </F>
  </RECORD>
  <RECORD>
   <E1>MICKEY</E1> <!--Must break and not merge in first group -->
   <E2>TEST</E2>
   <E4>14</E4>
   <E5>196</E5>
   <F>   
   <F1>A5</F1>
   </F>
  </RECORD>
  <RECORD>
    <E1>DONALD</E1>
    <E2>TEST</E2>
    <E4>14</E4>
    <E5>196</E5>
    <F>
     <F1>A6</F1>
     <F1>A7</F1>
    </F>
  </RECORD>
 </RECORDS>

Here is the XSL I have come up with so far...

<?xml version="1.0"?>
<xsl:stylesheet
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<xsl:output method="xml" indent="yes"/>
 <xsl:key name="grouped" match="RECORD"
  use="concat(E1, '+', E2, '+', E4 , '+', E5 )"/>

<xsl:template match="node()|@*">
  <xsl:copy>
   <xsl:apply-templates select="node()|@*"/>
  </xsl:copy>
 </xsl:template>
 <xsl:template match="/*">
  <RECORDS>
   <xsl:apply-templates select=
   "RECORD[generate-id()
          =
           generate-id(key('grouped',
                        concat(E1, '+', E2, '+', E4 , '+', E5 )
                          )
                           [1]
                      )
           ]
   "/>
  </RECORDS>
 </xsl:template>
 <xsl:template match="RECORD">
   <RECORD>
  <E1><xsl:value-of select="E1"/></E1>
<E2><xsl:value-of select="E2"/></E2>
<E4><xsl:value-of select="E4"/></E4>
<F>
<xsl:for select="F1">
<F1><xsl:value-of select="F1"/></F1>
</xsl:for>

</F>
   </RECORD>

</xsl:template>
</xsl:stylesheet>

The issue is that I am unable to generate the inner F1 tag repeating for each F1. Also, I should get four RECORD elements, not the three that I get with this:

<RECORDS>
  <RECORD>
    <E1>MICKEY</E1>
    <E2>TEST</E2>
    <E4>14</E4>
    <F></F>
  </RECORD>
  <RECORD>
    <E1>MICKEY</E1>
    <E2>TEST</E2>
    <E4>14</E4>
    <F></F>
  </RECORD>
  <RECORD>
    <E1>DONALD</E1>
    <E2>TEST</E2>
    <E4>14</E4>
    <F></F>
  </RECORD>
</RECORDS>
2
Please pinpoint your difficulty - preferably, post your attempt so we can fix it. – michael.hor257k

2 Answers

5
votes

Here is a solution using keys. It is shorter (28% fewer lines of code, and no horizontal scrolling needed) and more robust (see the end of this answer for details).

It is also more general: it works even when other elements that must be ignored sit between the elements we want to group, that is, when preceding-sibling::*[1] may be an element we want excluded from grouping (in the current problem, an element that is not a RECORD):

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output omit-xml-declaration="yes" indent="yes"/>

  <xsl:key name="kStartGroup" match="/*/*" use=
    "generate-id(preceding-sibling::*
      [not(concat(E1, '|', E2, '|', E4, '|', E5)
          = concat(current()/E1, '|', current()/E2, '|', current()/E4, '|', current()/E5)
          )
      ][1])"/>
  <xsl:template match="*[not(concat(E1, '|', E2, '|', E4, '|', E5) 
                            = 
                              concat(preceding-sibling::*[1]/E1, '|', 
                                     preceding-sibling::*[1]/E2, '|', 
                                     preceding-sibling::*[1]/E4, '|',
                                     preceding-sibling::*[1]/E5)
                             )]">
    <xsl:copy>
      <xsl:copy-of select="E1 | E2 | E4 | E5"/>
      <F><xsl:copy-of select=
                      "key('kStartGroup', generate-id(preceding-sibling::*[1]))/F1"/></F>
    </xsl:copy>
  </xsl:template>
  <xsl:template match="/*"><xsl:copy><xsl:apply-templates/></xsl:copy></xsl:template>      
  <xsl:template match="text()"/>
</xsl:stylesheet>
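
As a minimal sketch of the "other elements in between" case mentioned above, the same key technique can be restricted to RECORD elements by replacing the `*` name tests with `RECORD`. This is an adaptation under assumptions, not something tested against the asker's real data: it assumes any interleaved siblings (for example a hypothetical `<NOTE>` element between two records) should simply be dropped from the output.

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output omit-xml-declaration="yes" indent="yes"/>

  <!-- Index every RECORD by the id of the nearest preceding RECORD
       whose composite key differs (empty string for the first group) -->
  <xsl:key name="kStartGroup" match="RECORD" use=
    "generate-id(preceding-sibling::RECORD
      [not(concat(E1, '|', E2, '|', E4, '|', E5)
          = concat(current()/E1, '|', current()/E2, '|', current()/E4, '|', current()/E5)
          )
      ][1])"/>

  <!-- A RECORD starts a group when its key differs from the previous RECORD,
       regardless of any non-RECORD siblings (assumed, e.g. NOTE) in between -->
  <xsl:template match="RECORD[not(concat(E1, '|', E2, '|', E4, '|', E5)
                            =
                              concat(preceding-sibling::RECORD[1]/E1, '|',
                                     preceding-sibling::RECORD[1]/E2, '|',
                                     preceding-sibling::RECORD[1]/E4, '|',
                                     preceding-sibling::RECORD[1]/E5)
                             )]">
    <xsl:copy>
      <xsl:copy-of select="E1 | E2 | E4 | E5"/>
      <!-- All RECORDs sharing this group's "nearest different predecessor" -->
      <F><xsl:copy-of select=
                      "key('kStartGroup', generate-id(preceding-sibling::RECORD[1]))/F1"/></F>
    </xsl:copy>
  </xsl:template>

  <xsl:template match="/*"><xsl:copy><xsl:apply-templates/></xsl:copy></xsl:template>
  <!-- Suppress text so that non-starting RECORDs and other elements emit nothing -->
  <xsl:template match="text()"/>
</xsl:stylesheet>
```

Elements that are neither the root nor a group-starting RECORD fall through to the built-in templates, and because text nodes are suppressed they contribute nothing to the result.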

Robustness / Scalability

Because this transformation doesn't contain recursion (nested calls to <xsl:apply-templates>), it is robust and scalable when applied to large XML files.

By contrast, the "sibling recursion" solution provided in another answer crashes with a stack overflow when the transformation is applied to a sufficiently large XML document. In my case the crash was observed with a source XML document of about 13,000 lines; this may vary depending on available RAM, the XSLT processor, etc.

The current transformation executes successfully even on extremely large XML documents, such as one with 1,200,000 lines.

3
votes

Apparently you want to do in XSLT 1.0 the equivalent of XSLT 2.0's group-adjacent. This can be achieved using a technique known as "sibling recursion":

XSLT 1.0

<xsl:stylesheet version="1.0" 
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>

<xsl:template match="/RECORDS">
    <xsl:copy>
        <!-- start the first group -->
        <xsl:apply-templates select="RECORD[1]"/>
    </xsl:copy>
</xsl:template>

<xsl:template match="RECORD">
    <xsl:variable name="key" select="concat(E1, '+', E2, '+', E4 , '+', E5)" />
    <xsl:copy>
        <xsl:copy-of select="E1 | E2 | E4 | E5"/>
        <F>
            <xsl:copy-of select="F1"/>
            <!-- immediate sibling in the same group -->
            <xsl:apply-templates select="following-sibling::RECORD[1][concat(E1, '+', E2, '+', E4 , '+', E5) = $key]" mode="collect"/>
        </F>
    </xsl:copy>
    <!-- start the next group -->
    <xsl:apply-templates select="following-sibling::RECORD[not(concat(E1, '+', E2, '+', E4 , '+', E5)=$key)][1]"/>
</xsl:template>

<xsl:template match="RECORD" mode="collect">
    <xsl:variable name="key" select="concat(E1, '+', E2, '+', E4 , '+', E5)" />
    <xsl:copy-of select="F1"/>
    <!-- immediate sibling in the same group -->
    <xsl:apply-templates select="following-sibling::RECORD[1][concat(E1, '+', E2, '+', E4 , '+', E5) = $key]" mode="collect" />
</xsl:template> 

</xsl:stylesheet>
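
For comparison, here is a rough sketch of the XSLT 2.0 `group-adjacent` equivalent mentioned at the top of this answer. It assumes an XSLT 2.0 processor is available, which the question does not state, and it reuses the same `+`-separated composite key as the 1.0 stylesheet above.

```xml
<xsl:stylesheet version="2.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>

<xsl:template match="/RECORDS">
    <xsl:copy>
        <!-- a new group starts whenever the composite key changes
             between one RECORD and the next -->
        <xsl:for-each-group select="RECORD"
            group-adjacent="concat(E1, '+', E2, '+', E4, '+', E5)">
            <RECORD>
                <!-- the context item is the first RECORD of the group -->
                <xsl:copy-of select="E1 | E2 | E4 | E5"/>
                <F>
                    <!-- F1 of every RECORD in the current group -->
                    <xsl:copy-of select="current-group()/F1"/>
                </F>
            </RECORD>
        </xsl:for-each-group>
    </xsl:copy>
</xsl:template>

</xsl:stylesheet>
```

Because `group-adjacent` only merges consecutive items with equal keys, a later run of records with a previously seen key forms its own group, which is the "must break" behaviour the question asks for.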