39
votes

I'm having a hard time wrapping my head around XSLT, but I heard it's possible to split an XML file into multiple files. Basically, I'd like to copy all the elements that come before the first file element and after the last one, and then put the content of a single file element into each output file.

Could someone give me some pointers on this if it's even possible?

Thanks,

complete.xml

<rootelem>
  <elem>
    <file attr1='1'>
      <content>content file 1</content>
    </file>
    <file attr2='2'>
      <content>content file 2</content>
    </file>
    <file attr3='3'>
      <content>content file 3</content>
    </file>
  </elem>
</rootelem>

OUTPUT:

complete_PART1.xml

<rootelem>
  <elem>
     <file attr1='1'>
        <content>content file 1</content>
     </file>
  </elem>
</rootelem>

complete_PART2.xml

<rootelem>
  <elem>
    <file attr2='2'>
      <content>content file 2</content>
    </file>
  </elem>
</rootelem>

complete_PART3.xml

<rootelem>
  <elem>
     <file attr3='3'>
        <content>content file 3</content>
     </file>
  </elem>
</rootelem>
Good question, +1. See my answer for directions about the standard support in XSLT (1.0 and 2.0) for producing multiple output results. - Dimitre Novatchev
I have a requirement to split large XML files into smaller files, and I was going to write a program to do it (it needs to poll a folder and process files over x megs big). There are loads of different file types, so I don't know the XML structure up front and need a generic splitter. Can this be done with XSLT, or should I use the .NET XML reading tools? - Rodney

3 Answers

19
votes

Responding to your comment on @Dimitre's answer...

You wrote,

<xsl:template match="/">
  <xsl:for-each select="elem/file">
    <xsl:result-document method="xml" href="file_{@id}-output.xml">
      <xsl:copy-of select="."/>
    </xsl:result-document>
  </xsl:for-each>
</xsl:template> 

This doesn't quite match your XML, which has rootelem as the outermost element, while your comment uses root. You probably want something like this:

<xsl:template match="/root">
  <xsl:for-each select="elem/file">
    <xsl:result-document method="xml" href="file_{@id}-output.xml">
      <root>
        <xsl:copy-of select="/root/@*" />
        <elem>
          <xsl:copy-of select="../@* | ." />
        </elem>
      </root>
    </xsl:result-document>
  </xsl:for-each>
</xsl:template> 

You could get fancier, trying to use <xsl:copy> instead of literal result elements for root and elem, but it doesn't seem worth the effort unless they're going to vary.
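For what it's worth, here is a rough sketch of what the "fancier" xsl:copy variant could look like as a complete XSLT 2.0 stylesheet. It is written against the sample complete.xml from the question (so it uses rootelem rather than root), and it names the outputs with position() because the sample file elements have no id attribute. The mode name split and the tunnel parameter keep are names invented for this example. Treat it as a starting point, not a tested solution.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Drop whitespace-only text nodes so removed siblings leave no blank lines. -->
  <xsl:strip-space elements="*"/>

  <!-- Write one result document per file element. -->
  <xsl:template match="/">
    <xsl:for-each select="rootelem/elem/file">
      <xsl:result-document method="xml" indent="yes"
                           href="complete_PART{position()}.xml">
        <!-- Re-walk the whole tree, remembering which file element to keep. -->
        <xsl:apply-templates select="/*" mode="split">
          <xsl:with-param name="keep" select="." tunnel="yes"/>
        </xsl:apply-templates>
      </xsl:result-document>
    </xsl:for-each>
  </xsl:template>

  <!-- Identity copy for everything that is not a file element. -->
  <xsl:template match="@* | node()" mode="split">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()" mode="split"/>
    </xsl:copy>
  </xsl:template>

  <!-- Copy only the file element selected for this output; skip the others. -->
  <xsl:template match="file" mode="split">
    <xsl:param name="keep" tunnel="yes"/>
    <xsl:if test=". is $keep">
      <xsl:copy-of select="."/>
    </xsl:if>
  </xsl:template>

</xsl:stylesheet>
```

Because the wrapper elements are copied rather than written as literals, this version keeps working if rootelem or elem gain attributes or extra children. The cost is that every output re-walks the whole input tree, which may matter for large files.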

15
votes

It is not possible in pure XSLT 1.0 to produce more than one output file. One could use the <exslt:document> extension element for this purpose, provided the processor supports it.

In XSLT 2.0 use the <xsl:result-document> element.
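As a minimal sketch of the XSLT 1.0 route, assuming a processor that implements the EXSLT document extension (xsltproc/libxslt does, as far as I know; many others do not), the stylesheet might look like the following. The exsl prefix and the complete_PART naming are choices made for this example, matched to the sample in the question.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:exsl="http://exslt.org/common"
    extension-element-prefixes="exsl">

  <xsl:template match="/rootelem">
    <xsl:for-each select="elem/file">
      <!-- Extension element: only works where the processor supports it. -->
      <exsl:document href="complete_PART{position()}.xml"
                     method="xml" indent="yes">
        <rootelem>
          <!-- Carry over any attributes of the original wrappers. -->
          <xsl:copy-of select="/rootelem/@*"/>
          <elem>
            <xsl:copy-of select="../@* | ."/>
          </elem>
        </rootelem>
      </exsl:document>
    </xsl:for-each>
  </xsl:template>

</xsl:stylesheet>
```

The body is the same as it would be inside <xsl:result-document>; only the wrapping element and the namespace declarations differ.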

2
votes

If you want to use

<xsl:result-document method="xml" href="file_{@id}-output.xml">

from an Ant xslt call, you need an XSLT 2.0 processor such as Saxon. Just add the following inside your Ant xslt task:

<classpath location="/home/ap/saxon/saxon8.jar" />

Then specify version="2.0" in your stylesheet, and enjoy file splitting.
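To show where that classpath line goes, here is a hedged sketch of a complete Ant build file. The project and target names, split.xsl, and split-report.xml are placeholders made up for this example; the jar path is the one from above. The nested factory element is an assumption on my part: it tells Ant to use Saxon's TransformerFactory explicitly rather than relying on it being picked up from the classpath.

```xml
<project name="split-xml" default="split">
  <target name="split">
    <!-- "out" receives the principal result; the split files come from
         xsl:result-document inside split.xsl (hypothetical file name). -->
    <xslt in="complete.xml" out="split-report.xml" style="split.xsl">
      <classpath location="/home/ap/saxon/saxon8.jar"/>
      <!-- Assumed: force the Saxon XSLT 2.0 processor. -->
      <factory name="net.sf.saxon.TransformerFactoryImpl"/>
    </xslt>
  </target>
</project>
```

Relative href values in <xsl:result-document> should normally resolve against the location of the out file, so the split files would end up next to it.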