I have a very flat document that contains implied groups of elements, defined by their position after a Heading element:
<Document>
  <Body>
    ...
    <Heading>Section 1</Heading>
    <Item Id="1.1">Alpha</Item>
    <Item Id="1.1">Bravo</Item>
    ...
    <Heading>Section 2</Heading>
    <Item Id="2.1">Alpha</Item>
    <Item Id="2.1">Bravo</Item>
    ...
  </Body>
</Document>
From this document, I want to extract the groups, but also filter the items in each group so that only the first item with a given identifier is kept. For example, where there are two items with the ID "1.1", only the first of them is expected in the output. I intend to do additional processing later to include the duplicates as children of that first item.
To do this, I am using Muenchian grouping, with the identifier value as the key:
<xsl:key
    name="ItemsById"
    match="/Document/Body/Item"
    use="@Id"/>
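For reference, here is a minimal sketch of the plan described above (keep the first Item for each identifier and nest its later duplicates inside it), using the ItemsById key as-is. It assumes an XSLT 1.0 processor and the simplified document shown earlier, where no example Items share identifiers with the real ones; it ignores the Heading grouping for brevity.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>

  <!-- Same key as above: every Item, indexed by its Id attribute. -->
  <xsl:key name="ItemsById" match="/Document/Body/Item" use="@Id"/>

  <xsl:template match="/">
    <Document>
      <!-- Muenchian test: visit only the first Item carrying each Id. -->
      <xsl:apply-templates select="/Document/Body/Item[
          generate-id() = generate-id(key('ItemsById', @Id)[1])]"/>
    </Document>
  </xsl:template>

  <xsl:template match="Item">
    <xsl:copy>
      <!-- Copy this Item's own attributes and text. -->
      <xsl:copy-of select="@* | node()"/>
      <!-- Nest the later Items with the same Id (document order) as children. -->
      <xsl:copy-of select="key('ItemsById', @Id)[position() > 1]"/>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>
```

On the full document further down this would also nest the Section 1 items under "Example Alpha", which is exactly the problem this question is about.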
The key works well, except that a number of Item elements, defined as examples, happen to use the same identifiers and end up in the node-set matched by the key.
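A small diagnostic stylesheet (a sketch that reuses the same ItemsById key) makes the problem visible: it prints every Item the key returns for "1.1", in document order, together with the nearest preceding Heading.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <xsl:key name="ItemsById" match="/Document/Body/Item" use="@Id"/>

  <xsl:template match="/">
    <!-- key() returns its nodes in document order, so the first line
         printed is what key('ItemsById', '1.1')[1] selects. -->
    <xsl:for-each select="key('ItemsById', '1.1')">
      <xsl:value-of select="preceding-sibling::Heading[1]"/>
      <xsl:text>: </xsl:text>
      <xsl:value-of select="."/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

Against the full source document further down, the first line printed should be "Example: Example Alpha", i.e. the first key hit for "1.1" is the example item rather than the "Alpha" item in Section 1.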
Since only a range in the middle of the document matters to me, I am using the Kayessian method of node-set intersection to restrict the node-set to just the section I am interested in:
<xsl:variable
    name="section"
    select="(/Document/Body/Heading[text() = 'Example']
               /following-sibling::*[2]/following-sibling::*)[
              count(. | /Document/Body/Heading[text() = 'Appendix B']
                          /preceding-sibling::*)
              = count(/Document/Body/Heading[text() = 'Appendix B']
                          /preceding-sibling::*)
            ]" />
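The section node-set is the intersection of two node-sets: all the elements from the "Section 1" Heading onwards (the siblings that follow the second element after the "Example" Heading, so the "Section 1" Heading itself is included) and all the elements before the "Appendix B" Heading.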
The section node-set matches the elements I care about; however, since the key is unfiltered, the "first" item for a given identifier is sometimes outside this node-set. I tried using the variable in the key, but I've since discovered that XSLT 1.0 does not allow variable references in the match or use attributes of xsl:key.
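One workaround that stays within XSLT 1.0: the variable cannot appear in the key declaration, but it can appear in the expression that consumes the key. The sketch below keeps the unfiltered ItemsById key and applies the same Kayessian test to the key's result, so "first" means "first occurrence inside $section". It assumes the document structure shown in this question; the Item template using xsl:copy-of is a simplification of the identity template.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>

  <!-- Same range as in the question: "Section 1" up to "Appendix B". -->
  <xsl:variable name="section"
      select="(/Document/Body/Heading[text() = 'Example']
               /following-sibling::*[2]/following-sibling::*)[
                count(. | /Document/Body/Heading[text() = 'Appendix B']
                          /preceding-sibling::*)
                = count(/Document/Body/Heading[text() = 'Appendix B']
                          /preceding-sibling::*)]"/>

  <!-- Unfiltered key: it still indexes the example Items. -->
  <xsl:key name="ItemsById" match="/Document/Body/Item" use="@Id"/>

  <xsl:template match="/">
    <Document>
      <xsl:apply-templates select="$section[self::Heading]"/>
    </Document>
  </xsl:template>

  <xsl:template match="Heading">
    <Section Title="{.}">
      <xsl:variable name="heading" select="generate-id()"/>
      <!-- An Item heads its group when it is the first key hit that also
           lies inside $section (Kayessian test applied to the key result). -->
      <xsl:apply-templates select="$section[self::Item
          and generate-id(preceding-sibling::Heading[1]) = $heading
          and generate-id() = generate-id(
                key('ItemsById', @Id)[count(. | $section) = count($section)][1])]"/>
    </Section>
  </xsl:template>

  <!-- Copy each selected Item unchanged. -->
  <xsl:template match="Item">
    <xsl:copy-of select="."/>
  </xsl:template>
</xsl:stylesheet>
```

The count(. | $section) test is evaluated for every key hit of every candidate Item, so this may get slow on large documents; for a document of this size it should not matter.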
Here is the full source document:
<Document>
  <Body>
    <Heading>Preamble</Heading>
    <Para>
      Lorem ipsum dolor sit amet, consectetur
      adipiscing elit, sed do eiusmod tempor incididunt
      ut labore et dolore magna aliqua.
    </Para>
    <Heading>Example</Heading>
    <Item Id="1.1">Example Alpha</Item>
    <Item Id="1.1">Example Bravo</Item>
    <Heading>Section 1</Heading>
    <Item Id="1.1">Alpha</Item>
    <Item Id="1.1">Bravo</Item>
    <Item Id="1.2">Charlie</Item>
    <Item Id="1.3">Delta</Item>
    <Item Id="1.3">Echo</Item>
    <Item Id="1.4">Foxtrot</Item>
    <Heading>Section 2</Heading>
    <Item Id="2.1">Alpha</Item>
    <Item Id="2.1">Bravo</Item>
    <Item Id="2.2">Charlie</Item>
    <Item Id="2.3">Delta</Item>
    <Item Id="2.3">Echo</Item>
    <Item Id="2.4">Foxtrot</Item>
    <Heading>Appendix A</Heading>
    <Item Id="A.1">Alpha</Item>
    <Item Id="A.1">Bravo</Item>
    <Item Id="A.2">Charlie</Item>
    <Item Id="A.3">Delta</Item>
    <Item Id="A.3">Echo</Item>
    <Item Id="A.4">Foxtrot</Item>
    <Heading>Appendix B</Heading>
    <Para>
      Lorem ipsum dolor sit amet, consectetur
      adipiscing elit, sed do eiusmod tempor incididunt
      ut labore et dolore magna aliqua.
    </Para>
  </Body>
</Document>
I'm applying the following stylesheet:
<xsl:stylesheet
    version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>
  <xsl:template match="@* | node()">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()"/>
    </xsl:copy>
  </xsl:template>
  <!-- The node-set which covers the wanted section of elements. -->
  <xsl:variable
      name="section"
      select="(/Document/Body/Heading[text() = 'Example']
                 /following-sibling::*[2]/following-sibling::*)[
                count(. | /Document/Body/Heading[text() = 'Appendix B']
                            /preceding-sibling::*)
                = count(/Document/Body/Heading[text() = 'Appendix B']
                            /preceding-sibling::*)
              ]" />
  <!-- The items keyed by their ID. -->
  <xsl:key
      name="ItemsById"
      match="/Document/Body/Item"
      use="@Id"/>
  <!-- Matches the root to begin the output structure. -->
  <xsl:template match="/">
    <Document>
      <!-- Apply templates to the headings. -->
      <xsl:apply-templates select="$section[local-name() = 'Heading']" />
    </Document>
  </xsl:template>
  <xsl:template match="/Document/Body/Heading">
    <Section>
      <xsl:attribute name="Title">
        <xsl:value-of select="."/>
      </xsl:attribute>
      <xsl:variable
          name="heading"
          select="generate-id()" />
      <!-- Apply templates to the items in this set. -->
      <xsl:apply-templates
          select="$section[
                    local-name() = 'Item'
                    and
                    generate-id() = generate-id(key('ItemsById', @Id)[1])
                    and
                    $heading = generate-id(preceding-sibling::Heading[1])
                  ]" />
    </Section>
  </xsl:template>
</xsl:stylesheet>
This is the current output:
<Document>
  <Section Title="Section 1">
    <Item Id="1.2">Charlie</Item>
    <Item Id="1.3">Delta</Item>
    <Item Id="1.4">Foxtrot</Item>
  </Section>
  <Section Title="Section 2">
    <Item Id="2.1">Alpha</Item>
    <Item Id="2.2">Charlie</Item>
    <Item Id="2.3">Delta</Item>
    <Item Id="2.4">Foxtrot</Item>
  </Section>
  <Section Title="Appendix A">
    <Item Id="A.1">Alpha</Item>
    <Item Id="A.2">Charlie</Item>
    <Item Id="A.3">Delta</Item>
    <Item Id="A.4">Foxtrot</Item>
  </Section>
</Document>
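The problem is that Item 1.1 is missing from Section 1: the first item the key returns for "1.1" is "Example Alpha", which lies outside the section, so neither of the Section 1 items with that ID passes the Muenchian test.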
Is there a different approach I can try to achieve the same grouping over only the section I'm interested in?
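Another possible approach, sketched under the assumption that each Item belongs to the nearest preceding Heading: build the group key from both the heading and the identifier, so an example Item can never be the "first" of a Section 1 group. Variable references are not allowed in the key's use attribute, but a function call such as generate-id() is. The key name ItemsBySectionAndId and the '|' separator are hypothetical choices, and the range of headings is selected by title here instead of with the $section variable.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>

  <!-- Key each Item by its governing Heading plus its Id, so the example
       Items form groups of their own. -->
  <xsl:key name="ItemsBySectionAndId" match="/Document/Body/Item"
      use="concat(generate-id(preceding-sibling::Heading[1]), '|', @Id)"/>

  <xsl:template match="/">
    <Document>
      <!-- Headings strictly between "Example" and "Appendix B". -->
      <xsl:apply-templates select="/Document/Body/Heading[
          preceding-sibling::Heading[text() = 'Example']
          and following-sibling::Heading[text() = 'Appendix B']]"/>
    </Document>
  </xsl:template>

  <xsl:template match="Heading">
    <xsl:variable name="heading" select="generate-id()"/>
    <Section Title="{.}">
      <!-- Keep only the first Item of each (Heading, Id) group. -->
      <xsl:apply-templates select="following-sibling::Item[
          generate-id(preceding-sibling::Heading[1]) = $heading
          and generate-id() = generate-id(
                key('ItemsBySectionAndId', concat($heading, '|', @Id))[1])]"/>
    </Section>
  </xsl:template>

  <!-- Copy each selected Item unchanged. -->
  <xsl:template match="Item">
    <xsl:copy-of select="."/>
  </xsl:template>
</xsl:stylesheet>
```

Because the group key already separates sections, the same composite key could later be used to pull in the duplicates of each item (key(...)[position() > 1]) without picking up the examples.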