XSLT - XML to CSV dynamic template using XSLT 1

Question

I am attempting to parse an XML file into a flat file. Of the many topics I have found on this subject on SO, these two each partially accomplish what I want:

XML to CSV using XSLT help

XML to CSV using XSLT

Example XML

<env:Envelope xmlns:env="http://schemas.xmlsoap.org/soap/envelope/">
   <env:Body>
      <wd:Get_Schools_Response wd:version="v29.1" xmlns:wd="urn:com.workday/bsvc">
         <wd:Response_Filter>
            <wd:Page>1</wd:Page>
            <wd:Count>50</wd:Count>
         </wd:Response_Filter>
         <wd:Response_Group>
            <wd:Include_Reference>0</wd:Include_Reference>
         </wd:Response_Group>
         <wd:Response_Results>
            <wd:Total_Results>19448</wd:Total_Results>
            <wd:Total_Pages>389</wd:Total_Pages>
            <wd:Page_Results>50</wd:Page_Results>
            <wd:Page>1</wd:Page>
         </wd:Response_Results>
         <wd:Response_Data>
            <wd:School>
               <wd:School_Data>
                  <wd:ID>Chonnam_National_University_Yosu</wd:ID>
                  <wd:School_Name>Chonnam National University (Yosu)</wd:School_Name>
                  <wd:Country_Reference>
                     <wd:ID wd:type="WID">7a5a2aadf9d34086a2bfbfd408bc28da</wd:ID>
                     <wd:ID wd:type="ISO_3166-1_Alpha-2_Code">KR</wd:ID>
                     <wd:ID wd:type="ISO_3166-1_Alpha-3_Code">KOR</wd:ID>
                     <wd:ID wd:type="ISO_3166-1_Numeric-3_Code">410</wd:ID>
                  </wd:Country_Reference>
                  <wd:Inactive>0</wd:Inactive>
               </wd:School_Data>
            </wd:School>
            <wd:School>
               <wd:School_Data>
                  <wd:ID>Asian_University_Of_Science_Technology</wd:ID>
                  <wd:School_Name>Asian University of Science &amp; Technology</wd:School_Name>
                  <wd:Country_Reference>
                     <wd:ID wd:type="WID">873d0f604e3b458c990cb4d83a5c0f14</wd:ID>
                     <wd:ID wd:type="ISO_3166-1_Alpha-2_Code">TH</wd:ID>
                     <wd:ID wd:type="ISO_3166-1_Alpha-3_Code">THA</wd:ID>
                     <wd:ID wd:type="ISO_3166-1_Numeric-3_Code">764</wd:ID>
                  </wd:Country_Reference>
                  <wd:Inactive>0</wd:Inactive>
               </wd:School_Data>
            </wd:School>
            <wd:School>
               <wd:School_Data>
                  <wd:ID>Groep_T_Leuven</wd:ID>
                  <wd:School_Name>Groep T Leuven</wd:School_Name>
                  <wd:Country_Reference>
                     <wd:ID wd:type="WID">a04ea128f43a42e59b1e6a19e8f0b374</wd:ID>
                     <wd:ID wd:type="ISO_3166-1_Alpha-2_Code">BE</wd:ID>
                     <wd:ID wd:type="ISO_3166-1_Alpha-3_Code">BEL</wd:ID>
                     <wd:ID wd:type="ISO_3166-1_Numeric-3_Code">56</wd:ID>
                  </wd:Country_Reference>
                  <wd:Inactive>0</wd:Inactive>
               </wd:School_Data>
            </wd:School>
            <wd:School>
               <wd:School_Data>
                  <wd:ID>Tohono_O_Odham_Community_College</wd:ID>
                  <wd:School_Name>Tohono O'Odham Community College</wd:School_Name>
                  <wd:Country_Region_Reference>
                     <wd:ID wd:type="WID">c7b20b0d4bc04711a00900569e9afabd</wd:ID>
                     <wd:ID wd:type="Country_Region_ID">USA-AZ</wd:ID>
                     <wd:ID wd:type="ISO_3166-2_Code">AZ</wd:ID>
                  </wd:Country_Region_Reference>
                  <wd:Country_Reference>
                     <wd:ID wd:type="WID">bc33aa3152ec42d4995f4791a106ed09</wd:ID>
                     <wd:ID wd:type="ISO_3166-1_Alpha-2_Code">US</wd:ID>
                     <wd:ID wd:type="ISO_3166-1_Alpha-3_Code">USA</wd:ID>
                     <wd:ID wd:type="ISO_3166-1_Numeric-3_Code">840</wd:ID>
                  </wd:Country_Reference>
                  <wd:Inactive>0</wd:Inactive>
               </wd:School_Data>
            </wd:School>
         </wd:Response_Data>
      </wd:Get_Schools_Response>
   </env:Body>
</env:Envelope>

<xsl:stylesheet version="1.0"

In the case of the first link I get the following:

1|50|0|19448|389|50|1|Chonnam_National_University_Yosu|Chonnam National University (Yosu)|7a5a2aadf9d34086a2bfbfd408bc28da|KR|KOR|410|0|Asian_University_Of_Science_Technology|Asian University of Science & Technology|873d0f604e3b458c990cb4d83a5c0f14|TH|THA|764|0|Groep_T_Leuven|Groep T Leuven|a04ea128f43a42e59b1e6a19e8f0b374|BE|BEL|56|0|Tohono_O_Odham_Community_College|Tohono O'Odham Community College|c7b20b0d4bc04711a00900569e9afabd|USA-AZ|AZ|bc33aa3152ec42d4995f4791a106ed09|US|USA|840|0

This is a good solution because it drills down into each child node and puts in a separator, but it doesn't know about the child nodes of the previous ancestor. In addition, I do not want the page/results/total_pages information to come over. I added the standard template override, but that didn't do anything.

<xsl:template match="text()|@*">
  <!--<xsl:value-of select="."/>
      Do nothing -->
</xsl:template>

In the case of the second:

ID|School_Name|Country_Reference|Inactive|Country_Region_Reference
Chonnam_National_University_Yosu|Chonnam National University (Yosu)|7a5a2aadf9d34086a2bfbfd408bc28daKRKOR410|0|
Asian_University_Of_Science_Technology|Asian University of Science & Technology|873d0f604e3b458c990cb4d83a5c0f14THTHA764|0|
Groep_T_Leuven|Groep T Leuven|a04ea128f43a42e59b1e6a19e8f0b374BEBEL56|0|
Tohono_O_Odham_Community_College|Tohono O'Odham Community College|bc33aa3152ec42d4995f4791a106ed09USUSA840|0|c7b20b0d4bc04711a00900569e9afabdUSA-AZAZ

The second example is not dynamic enough: it doesn't add bars between the child values. I tried doing things like this:

<xsl:key name="field" match="/*/*/*/*/*/*/*/child::*" use="local-name()"/>

<!-- variable containing the first occurrence of each field -->
   <xsl:variable name="allFields"
     select="/*/*/*/*/*/*/*/child::*[generate-id()=generate-id(key('field', local-name())[1])]" />

Which produces something like:

ID
Chonnam_National_University_Yosu
Asian_University_Of_Science_Technology
Groep_T_Leuven
Tohono_O_Odham_Community_College

What I am hoping for is to dynamically drill into all children, grandchildren, etc., and produce a flat file with delimiters for all values, even if the previous node didn't have those values, and finish each line with a line feed. In addition, I want to get rid of 1|50|0|19448|389|50|1 from the first result:

Chonnam_National_University_Yosu|Chonnam National University (Yosu)|7a5a2aadf9d34086a2bfbfd408bc28da||||KR|KOR|410|0
Asian_University_Of_Science_Technology|Asian University of Science & Technology|873d0f604e3b458c990cb4d83a5c0f14||||TH|THA|764|0
Groep_T_Leuven|Groep T Leuven|a04ea128f43a42e59b1e6a19e8f0b374||||BE|BEL|56|0
Tohono_O_Odham_Community_College|Tohono O'Odham Community College|c7b20b0d4bc04711a00900569e9afabd|USA-AZ|AZ|bc33aa3152ec42d4995f4791a106ed09|US|USA|840|0

I am using XSLT but I am open to suggestions on other tools or methods.

Daniel Haley · Accepted Answer · 2018-01-24T00:35:42

I had a stylesheet that I created in XSLT 1.0 for a similar question. (I'm unable to find a link to it for reference. I believe the question was deleted before I could submit the answer. Luckily I saved it.)

I made some changes to it that appear to produce the output you're looking for.

Based on your examples and descriptions, these are what I thought the requirements should be:

All text that was a descendant of wd:School_Data needed to be output.
There was a single row for each wd:School_Data.
The column/field that the text belonged in was based on either: 1) the element name or 2) the value of the wd:type attribute and the name of the parent (if the wd:type attribute was specified).
All rows should have an entry for each column, even if that row did not have a value for that column.

The first thing that I needed to do was determine what the unique columns were going to be.

To do this I first created an xsl:key (named cols in the stylesheet) that matched all elements that were descendants of wd:School_Data and contained text. The key used a combination of the local name (without prefix) and the wd:type attribute separated by ~. For example, the element <wd:ID>Groep_T_Leuven</wd:ID> would have the key ID~ and <wd:ID wd:type="ISO_3166-1_Alpha-2_Code">BE</wd:ID> would have the key ID~ISO_3166-1_Alpha-2_Code.
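Since the question mentions being open to other tools, here is a rough sketch of the same key scheme outside XSLT, using Python's standard `xml.etree.ElementTree`. The helper names `local_name` and `col_key` are made up for this illustration and are not part of the stylesheet. The `e.text` check only approximates `*[text()]` combined with `xsl:strip-space`.

```python
import xml.etree.ElementTree as ET

WD = "urn:com.workday/bsvc"  # namespace URI taken from the sample XML

def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]

def col_key(elem):
    """Mimic concat(local-name(), '~', @wd:type); a missing attribute gives ''."""
    return local_name(elem.tag) + "~" + elem.get("{%s}type" % WD, "")

SAMPLE = """
<wd:School_Data xmlns:wd="urn:com.workday/bsvc">
  <wd:ID>Groep_T_Leuven</wd:ID>
  <wd:Country_Reference>
    <wd:ID wd:type="ISO_3166-1_Alpha-2_Code">BE</wd:ID>
  </wd:Country_Reference>
</wd:School_Data>
"""

school = ET.fromstring(SAMPLE)
# Keep only descendants whose own text is not just whitespace
# (an approximation of *[text()] after xsl:strip-space).
keys = [col_key(e) for e in school.iter()
        if e is not school and e.text and e.text.strip()]
print(keys)
```

For this fragment the container element `Country_Reference` is skipped because it holds only whitespace, so only the two `ID` elements contribute keys.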

The second step was to create a variable (allCols) that contained a unique list of all the keys (local-name()/wd:type combos) separated by |. This variable would allow me to recursively process it so that each row was guaranteed to have an entry for each column and that the order would always be the same. The value was created by processing only the first node from each key (Muenchian Grouping) with a moded template (mode getCols).

Since the column entries in the first row (the header) needed to differ from the values in allCols, I created another variable named header. The value for this variable was created similarly to allCols, but it used either just the local name or the parent's local name and the value of the wd:type attribute (depending on whether or not the attribute existed). It also used a moded template, with the mode header. You'll notice that I used substring($temp,2) in this variable. This was to remove the unwanted | from the beginning of $temp.
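The first-occurrence idea behind allCols and header can be sketched in Python as well (again only an illustration, not a replacement for the stylesheet). An insertion-ordered dict plays the role of the Muenchian grouping: only the first element seen for each key contributes a column. The function name `collect_columns` and the trimmed sample data are assumptions made for this sketch.

```python
import xml.etree.ElementTree as ET

WD = "urn:com.workday/bsvc"   # namespace URI taken from the sample XML
TYPE = "{%s}type" % WD        # ElementTree spelling of the wd:type attribute

SAMPLE = """
<wd:Response_Data xmlns:wd="urn:com.workday/bsvc">
  <wd:School><wd:School_Data>
    <wd:ID>Groep_T_Leuven</wd:ID>
    <wd:Country_Reference>
      <wd:ID wd:type="ISO_3166-1_Alpha-2_Code">BE</wd:ID>
    </wd:Country_Reference>
    <wd:Inactive>0</wd:Inactive>
  </wd:School_Data></wd:School>
  <wd:School><wd:School_Data>
    <wd:ID>Tohono_O_Odham_Community_College</wd:ID>
    <wd:Country_Region_Reference>
      <wd:ID wd:type="ISO_3166-2_Code">AZ</wd:ID>
    </wd:Country_Region_Reference>
    <wd:Country_Reference>
      <wd:ID wd:type="ISO_3166-1_Alpha-2_Code">US</wd:ID>
    </wd:Country_Reference>
    <wd:Inactive>0</wd:Inactive>
  </wd:School_Data></wd:School>
</wd:Response_Data>
"""

def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]

def collect_columns(root):
    """Return (all_cols, header) built from the first element seen per key."""
    # ElementTree elements have no parent pointer, so build a lookup first.
    parent_of = {child: parent for parent in root.iter() for child in parent}
    labels = {}  # key -> header label; dicts keep insertion order (Python 3.7+)
    for school in root.iter("{%s}School_Data" % WD):
        for elem in school.iter():  # document order, starting with school itself
            if elem is school or not (elem.text and elem.text.strip()):
                continue
            wd_type = elem.get(TYPE, "")
            key = local_name(elem.tag) + "~" + wd_type
            if key not in labels:
                if wd_type:
                    labels[key] = local_name(parent_of[elem].tag) + " " + wd_type
                else:
                    labels[key] = local_name(elem.tag)
    all_cols = "".join(key + "|" for key in labels)  # each key ends with '|'
    header = "|".join(labels.values())               # no leading '|' to trim
    return all_cols, header

all_cols, header = collect_columns(ET.fromstring(SAMPLE))
print(all_cols)
print(header)
```

Because the join is done on a list, the sketch does not need the `substring($temp,2)` step; that trick is only required when the separator is emitted before every entry, as in the header-mode template.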

To limit the processing to just wd:School_Data, I matched the root element (/*) and only selected wd:School_Data in the xsl:for-each. I also output the value of the header variable.

When a wd:School_Data element was processed, I called the named template outputFields. I passed a parameter containing the allCols variable.

The outputFields template is where most of the processing happens. First, four variables are created.

The first two variables, field and leftToProcess, are the first field and the remaining fields.

The second two variables, elemName and elemType, are the separated element name and type values.

In the xsl:choose the value of the matching element is selected. How it's selected depends on whether there was a type value.

The xsl:if is the recursive part of the template. If leftToProcess has fields in it, the template is called again with the leftToProcess as the toProcess parameter value.
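The recursion described above can be mirrored in Python to make the string handling easier to follow. This is a minimal sketch under assumptions: `output_fields` is a hypothetical counterpart of the outputFields template, `ALL_COLS` is a hand-written stand-in for the allCols variable, and the sample data is a trimmed version of the XML in the question.

```python
import xml.etree.ElementTree as ET

WD = "urn:com.workday/bsvc"   # namespace URI taken from the sample XML
TYPE = "{%s}type" % WD        # ElementTree spelling of the wd:type attribute

def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]

def output_fields(school, to_process):
    """Return one '|'-separated row for a School_Data element."""
    # field / leftToProcess: first entry and the remainder of the column list
    field, _, left_to_process = to_process.partition("|")
    # elemName / elemType: the two halves of a 'name~type' key
    elem_name, _, elem_type = field.partition("~")
    value = ""
    for elem in school.iter():  # descendants in document order
        if elem is school or local_name(elem.tag) != elem_name:
            continue
        if elem_type and elem.get(TYPE) != elem_type:
            continue
        value = "".join(elem.itertext())  # first match wins, like xsl:value-of
        break
    if left_to_process:  # more columns left: emit a separator and recurse
        return value + "|" + output_fields(school, left_to_process)
    return value

SAMPLE = """
<wd:Response_Data xmlns:wd="urn:com.workday/bsvc">
  <wd:School_Data>
    <wd:ID>Groep_T_Leuven</wd:ID>
    <wd:Country_Reference><wd:ID wd:type="ISO_3166-1_Alpha-2_Code">BE</wd:ID></wd:Country_Reference>
    <wd:Inactive>0</wd:Inactive>
  </wd:School_Data>
  <wd:School_Data>
    <wd:ID>Tohono_O_Odham_Community_College</wd:ID>
    <wd:Country_Region_Reference><wd:ID wd:type="ISO_3166-2_Code">AZ</wd:ID></wd:Country_Region_Reference>
    <wd:Country_Reference><wd:ID wd:type="ISO_3166-1_Alpha-2_Code">US</wd:ID></wd:Country_Reference>
    <wd:Inactive>0</wd:Inactive>
  </wd:School_Data>
</wd:Response_Data>
"""

# Assumed column list in the same 'name~type|' format as allCols.
ALL_COLS = "ID~|ID~ISO_3166-2_Code|ID~ISO_3166-1_Alpha-2_Code|Inactive~|"

root = ET.fromstring(SAMPLE)
rows = [output_fields(s, ALL_COLS) for s in root.iter("{%s}School_Data" % WD)]
print("\n".join(rows))
```

The first row comes out as `Groep_T_Leuven||BE|0`: the school has no `ISO_3166-2_Code` element, so that column is left empty but still gets its separator, which is the behavior the fourth requirement asks for.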

XSLT 1.0 (working example here: http://xsltfiddle.liberty-development.net/pPgCcor/1)

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:wd="urn:com.workday/bsvc">
  <xsl:output method="text"/>
  <xsl:strip-space elements="*"/>

  <xsl:key name="cols" match="wd:School_Data//*[text()]" use="concat(local-name(),'~',@wd:type)"/>

  <xsl:variable name="allCols">
    <xsl:apply-templates 
      select="//wd:School_Data//*[text()][count(.|key('cols',concat(local-name(),'~',@wd:type))[1])=1]"
      mode="getCols"
    /> 
  </xsl:variable>

  <xsl:variable name="header">
    <xsl:variable name="temp">
      <xsl:apply-templates 
        select="//wd:School_Data//*[text()][count(.|key('cols',concat(local-name(),'~',@wd:type))[1])=1]"
        mode="header"
      />      
    </xsl:variable>
    <xsl:value-of select="substring($temp,2)"/>
  </xsl:variable>

  <xsl:template match="/*">
    <xsl:message><xsl:value-of select="$allCols"/></xsl:message>
    <xsl:value-of select="concat($header,'&#xA;')"/>
    <xsl:for-each select="//wd:School_Data">
      <xsl:call-template name="outputFields">
        <xsl:with-param name="toProcess" select="$allCols"/>
      </xsl:call-template>
      <xsl:text>&#xA;</xsl:text>
    </xsl:for-each>
  </xsl:template>

  <xsl:template name="outputFields">
    <xsl:param name="toProcess"/>
    <xsl:variable name="field" select="substring-before($toProcess, '|')"/>
    <xsl:variable name="leftToProcess" select="substring-after($toProcess, '|')"/>
    <xsl:variable name="elemName" select="substring-before($field,'~')"/>
    <xsl:variable name="elemType" select="substring-after($field,'~')"/>
    <xsl:choose>
      <xsl:when test="$elemType">
        <xsl:value-of select=".//*[local-name()=$elemName and @wd:type=$elemType]"/>
      </xsl:when>
      <xsl:otherwise>
        <xsl:value-of select=".//*[local-name()=$elemName]"/>        
      </xsl:otherwise>
    </xsl:choose>
    <xsl:if test="$leftToProcess">
      <xsl:text>|</xsl:text>
      <xsl:call-template name="outputFields">
        <xsl:with-param name="toProcess" select="$leftToProcess"/>
      </xsl:call-template>
    </xsl:if>
  </xsl:template>

  <xsl:template match="*" mode="getCols">
    <xsl:value-of select="concat(local-name(),'~',@wd:type,'|')"/>
  </xsl:template>

  <xsl:template match="*" mode="header">
    <xsl:value-of select="'|'"/>
    <xsl:choose>
      <xsl:when test="@wd:type">
        <xsl:value-of select="concat(local-name(..),' ',@wd:type)"/>
      </xsl:when>
      <xsl:otherwise>
        <xsl:value-of select="local-name()"/>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>

</xsl:stylesheet>
