There are a number of flattening questions here, but none deal with this level of complexity.
I have an XML document that looks something like this:
<document>
<div class='target-one'>
maybe some text node, maybe not...1
<randomElement>
maybe some text node, maybe not...2
</randomElement>
<div class='target-one'>
<randomElement>
maybe some text node, maybe not...3
</randomElement>
</div>
maybe some text node, maybe not...4
<randomElement>
maybe some text node, maybe not...5
</randomElement>
<div class='target-two'>
maybe some text node, maybe not...6
<randomElement>
maybe some text node, maybe not...7
</randomElement>
</div>
maybe some text node, maybe not...8
<randomElement>
maybe some text node, maybe not...9
</randomElement>
</div>
<div class='target-two'>
maybe some text node, maybe not...10
<randomElement>
maybe some text node, maybe not...11
</randomElement>
<div class='target-one'>
<randomElement>
maybe some text node, maybe not...12
</randomElement>
</div>
maybe some text node, maybe not...13
<randomElement>
maybe some text node, maybe not...14
</randomElement>
<div class='target-two'>
maybe some text node, maybe not...15
<randomElement>
maybe some text node, maybe not...16
</randomElement>
</div>
maybe some text node, maybe not...17
<randomElement>
maybe some text node, maybe not...18
</randomElement>
</div>
</document>
So there is a list of target elements, and they can be nested in any order. Whenever targets are nested, I would like to flatten them: each run of text nodes and non-target elements (randomElement here) gets wrapped in its own copy of the parent target, and the nested target children become siblings of those copies. In other words, the output should look like:
<document>
<div class='target-one'>
maybe some text node, maybe not...1
<randomElement>
maybe some text node, maybe not...2
</randomElement>
</div>
<div class='target-one'>
<randomElement>
maybe some text node, maybe not...3
</randomElement>
</div>
<div class='target-one'>
maybe some text node, maybe not...4
<randomElement>
maybe some text node, maybe not...5
</randomElement>
</div>
<div class='target-two'>
maybe some text node, maybe not...6
<randomElement>
maybe some text node, maybe not...7
</randomElement>
</div>
<div class='target-one'>
maybe some text node, maybe not...8
<randomElement>
maybe some text node, maybe not...9
</randomElement>
</div>
<div class='target-two'>
maybe some text node, maybe not...10
<randomElement>
maybe some text node, maybe not...11
</randomElement>
</div>
<div class='target-one'>
<randomElement>
maybe some text node, maybe not...12
</randomElement>
</div>
<div class='target-two'>
maybe some text node, maybe not...13
<randomElement>
maybe some text node, maybe not...14
</randomElement>
</div>
<div class='target-two'>
maybe some text node, maybe not...15
<randomElement>
maybe some text node, maybe not...16
</randomElement>
</div>
<div class='target-two'>
maybe some text node, maybe not...17
<randomElement>
maybe some text node, maybe not...18
</randomElement>
</div>
</document>
So I wind up with many more of the parent divs, but all the text and the other nodes are in the right place. Please note that randomElement might itself be a div that does not have a target class.
This is for reformatting ebooks for paging in an online library, so there might be an enormous number of elements before we actually hit a problem div. Thus we need some way to select all the elements and text nodes between problem child divs as a group, because if they are each wrapped in their own div, it does no good - we will wind up with every p, em or span as its own page.
At the same time, most parent divs have no problem children. As long as the solution passes them through, I can clean up any empty divs with another run, but I do need this to work at least on a rudimentary level with text that has no child elements as well.
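The grouping described above (wrap each run of non-target nodes in a copy of the parent, and promote nested targets to siblings) can be sketched with XSLT 2.0's xsl:for-each-group and group-adjacent. This is only a minimal sketch under some assumptions, not a tested solution for real ebook input: the helper function f:is-target and its namespace urn:example:flatten are made-up names, the output method is switched to xml, and whitespace-only text nodes are assumed to be stripped so they do not produce empty wrappers. It should run in an XSLT 2.0 processor such as Saxon.

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:xs="http://www.w3.org/2001/XMLSchema"
                xmlns:f="urn:example:flatten"
                exclude-result-prefixes="xs f"
                version="2.0">

  <xsl:output method="xml" indent="yes" encoding="utf-8"/>
  <xsl:strip-space elements="*"/>

  <!-- Hypothetical helper: true when the node is a div with a target class -->
  <xsl:function name="f:is-target" as="xs:boolean">
    <xsl:param name="n" as="node()"/>
    <xsl:sequence
        select="exists($n/self::div[matches(@class, 'target-one|target-two', 'i')])"/>
  </xsl:function>

  <!-- Identity template: everything that is not a target passes through -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- A target div emits no wrapper of its own. Its children are split into
       adjacent runs of "target" and "not target" nodes. -->
  <xsl:template match="div[f:is-target(.)]">
    <xsl:variable name="parent" select="."/>
    <xsl:for-each-group select="node()" group-adjacent="f:is-target(.)">
      <xsl:choose>
        <!-- Run of nested targets: recurse, so they come out as siblings -->
        <xsl:when test="current-grouping-key()">
          <xsl:apply-templates select="current-group()"/>
        </xsl:when>
        <!-- Run of text and other elements: wrap it in a copy of the parent -->
        <xsl:otherwise>
          <xsl:element name="{name($parent)}"
                       namespace="{namespace-uri($parent)}">
            <xsl:copy-of select="$parent/@*"/>
            <xsl:apply-templates select="current-group()"/>
          </xsl:element>
        </xsl:otherwise>
      </xsl:choose>
    </xsl:for-each-group>
  </xsl:template>

</xsl:stylesheet>
```

Two limitations of this sketch: a target div with no children at all produces no output, and a target that sits inside a non-target element (for example inside randomElement) is flattened internally but is not lifted out of that element.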
This is my first question on StackOverflow because I just don't get the recursion that would be necessary for this.
Thanks!
EDIT BASED ON THE ANSWER BY user52889. This never worked out but I am leaving it here for readability:
XSL that I can fire off in saxon:
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">
<xsl:output method="html"
indent="yes"
encoding="utf-8"/>
<xsl:strip-space elements="*"/>
<xsl:template match="@*|node()">
<xsl:copy>
<xsl:apply-templates select="@*|node()"/>
</xsl:copy>
</xsl:template>
<xsl:template match="/">
<xsl:apply-templates />
</xsl:template>
<xsl:template match="div[matches(@class,'target-one|target-two','i')]">
<xsl:for-each select="node()">
<xsl:choose>
<xsl:when test="self::*[matches(@class,'target-one|target-two','i')]">
<xsl:apply-templates select="."/>
</xsl:when>
<xsl:when test="preceding-sibling::node()[0][not(self::*[matches(@class,'target-one|target-two','i')])]">
<!-- do nothing, it will be handled by the next case -->
</xsl:when>
<xsl:otherwise>
<!--
create a copy of the element matched by the template, with its attrs
add to it the current node and all nodes which follow it, up to the next SIGNIFICANT node
or, put another way, all following siblings which either
a) do not have a preceding significant node, or
b) whose nearest preceding significant node is the same as the nearest preceding significant node of the current node, i.e. its following sibling node is the current node.
-->
<xsl:element name="{../name()}">
<xsl:apply-templates select="../@*"/>
<xsl:apply-templates select="following-sibling::node()[
not(preceding-sibling::*[matches(@class,'target-one|target-two','i')])
or
count(preceding-sibling::*[matches(@class,'target-one|target-two','i')][0]/following-sibling::node()[0] | current()) = 1
]" />
</xsl:element>
</xsl:otherwise>
</xsl:choose>
</xsl:for-each>
</xsl:template>
</xsl:stylesheet>
Current output from this stylesheet, which still contains nested children and duplicated content:
<document>
<div class="target-one">
<randomElement>
maybe some text node, maybe not...2
</randomElement>
<div class="target-one"></div>
maybe some text node, maybe not...4
<randomElement>
maybe some text node, maybe not...5
</randomElement>
<div class="target-two">
<randomElement>
maybe some text node, maybe not...7
</randomElement>
</div>
<div class="target-two"></div>
maybe some text node, maybe not...8
<randomElement>
maybe some text node, maybe not...9
</randomElement>
</div>
<div class="target-one">
<div class="target-one"></div>
maybe some text node, maybe not...4
<randomElement>
maybe some text node, maybe not...5
</randomElement>
<div class="target-two">
<randomElement>
maybe some text node, maybe not...7
</randomElement>
</div>
<div class="target-two"></div>
maybe some text node, maybe not...8
<randomElement>
maybe some text node, maybe not...9
</randomElement>
</div>
<div class="target-one"></div>
<div class="target-one">
<randomElement>
maybe some text node, maybe not...5
</randomElement>
<div class="target-two">
<randomElement>
maybe some text node, maybe not...7
</randomElement>
</div>
<div class="target-two"></div>
maybe some text node, maybe not...8
<randomElement>
maybe some text node, maybe not...9
</randomElement>
</div>
<div class="target-one">
<div class="target-two">
<randomElement>
maybe some text node, maybe not...7
</randomElement>
</div>
<div class="target-two"></div>
maybe some text node, maybe not...8
<randomElement>
maybe some text node, maybe not...9
</randomElement>
</div>
<div class="target-two">
<randomElement>
maybe some text node, maybe not...7
</randomElement>
</div>
<div class="target-two"></div>
<div class="target-one">
<randomElement>
maybe some text node, maybe not...9
</randomElement>
</div>
<div class="target-one"></div>
<div class="target-two">
<randomElement>
maybe some text node, maybe not...11
</randomElement>
<div class="target-one"></div>
maybe some text node, maybe not...13
<randomElement>
maybe some text node, maybe not...14
</randomElement>
<div class="target-two">
<randomElement>
maybe some text node, maybe not...16
</randomElement>
</div>
<div class="target-two"></div>
maybe some text node, maybe not...17
<randomElement>
maybe some text node, maybe not...18
</randomElement>
</div>
<div class="target-two">
<div class="target-one"></div>
maybe some text node, maybe not...13
<randomElement>
maybe some text node, maybe not...14
</randomElement>
<div class="target-two">
<randomElement>
maybe some text node, maybe not...16
</randomElement>
</div>
<div class="target-two"></div>
maybe some text node, maybe not...17
<randomElement>
maybe some text node, maybe not...18
</randomElement>
</div>
<div class="target-one"></div>
<div class="target-two">
<randomElement>
maybe some text node, maybe not...14
</randomElement>
<div class="target-two">
<randomElement>
maybe some text node, maybe not...16
</randomElement>
</div>
<div class="target-two"></div>
maybe some text node, maybe not...17
<randomElement>
maybe some text node, maybe not...18
</randomElement>
</div>
<div class="target-two">
<div class="target-two">
<randomElement>
maybe some text node, maybe not...16
</randomElement>
</div>
<div class="target-two"></div>
maybe some text node, maybe not...17
<randomElement>
maybe some text node, maybe not...18
</randomElement>
</div>
<div class="target-two">
<randomElement>
maybe some text node, maybe not...16
</randomElement>
</div>
<div class="target-two"></div>
<div class="target-two">
<randomElement>
maybe some text node, maybe not...18
</randomElement>
</div>
<div class="target-two"></div>
</document>