-- Modified question --
Thanks to everyone who provided potential solutions, but these are in line with what I had already tried, so I assume I should have been clearer. I extended the XML a bit to make the problem more transparent.
The XML is actually a compilation of various files, containing translated content, and the aim is to get a unified document containing only the unique English strings, and (after manual review and cleaning) a single translated one for each string, so it can be used for translation memory. That's why it's now a big file with loads of redundant information.
Each Para element contains the English master (which can be repeated dozens of times within the file) and the translation variants. In many cases it's easy because all translated versions are identical, so I would end up with a single line, but other cases are more complex.
So, assume that today I have 10 Para elements containing the same English content (#1), with 2 different German variations, 3 different French variations, and only one variation for each of the remaining locales. What I need to get is:
1 Para containing: 1 EN / 2 DE (v1 and v2) / 3 FR (v1, v2 and v3) / ...
And this is repeated for every unique English value in my list.
The modified XML:
<Books>
<!--First English String (#1) with number of potential translations -->
<Para>
<EN>English Content #1</EN>
<DE>German Trans of #1 v1</DE>
<FR>French Trans of #1 v1</FR>
<!-- More locales here -->
</Para>
<Para>
<EN>English Content #1</EN>
<DE>German Trans of #1 v2</DE>
<FR>French Trans of #1 v1</FR>
<!-- More locales here -->
</Para>
<Para>
<EN>English Content #1</EN>
<DE>German Trans of #1 v1</DE>
<FR>French Trans of #1 v2</FR>
<!-- More locales here -->
</Para>
<!--Second English String (#2) with number of potential translations -->
<Para>
<EN>English Content #2</EN>
<DE>German Trans of #2 v1</DE>
<FR>French Trans of #2 v1</FR>
<!-- More locales here -->
</Para>
<Para>
<EN>English Content #2</EN>
<DE>German Trans of #2 v3</DE>
<FR>French Trans of #2 v1</FR>
<!-- More locales here -->
</Para>
<Para>
<EN>English Content #2</EN>
<DE>German Trans of #2 v2</DE>
<FR>French Trans of #2 v1</FR>
<!-- More locales here -->
</Para>
<!--Loads of additional English Strings (#3 ~ #n) with number of potential translations -->
</Books>
Current solutions give me the following output:
<Books>
<Para>
<EN>English Content #1</EN>
<DE>German Trans of #1 v1</DE>
<DE>German Trans of #1 v2</DE>
<DE>German Trans of #2 v1</DE>
<DE>German Trans of #2 v3</DE>
<DE>German Trans of #2 v2</DE>
<FR>French Trans of #1 v1</FR>
<FR>French Trans of #1 v1</FR>
<FR>French Trans of #1 v2</FR>
<FR>French Trans of #2 v1</FR>
</Para>
</Books>
So they take only the first EN tag and then group all the other elements, regardless of the differences between the English master strings. What I'm aiming for is the following:
<Books>
<!-- First Grouped EN string and linked grouped translations -->
<Para>
<EN>English Content #1</EN>
<DE>German Trans of #1 v1</DE>
<DE>German Trans of #1 v2</DE>
<FR>French Trans of #1 v1</FR>
<FR>French Trans of #1 v2</FR>
</Para>
<!-- Second Grouped EN string and linked grouped translations -->
<Para>
<EN>English Content #2</EN>
<DE>German Trans of #2 v1</DE>
<DE>German Trans of #2 v3</DE>
<DE>German Trans of #2 v2</DE>
<FR>French Trans of #2 v1</FR>
</Para>
<!-- 3rd to nth Grouped EN strings and linked grouped translations -->
</Books>
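The grouping described above can be sketched outside XSLT to make the intended logic explicit. This is only an illustration in Python, not an XSLT solution: the function name `group_by_english` and the `master_tag` parameter are hypothetical, and it assumes every Para has exactly one EN child and that translations count as duplicates only when their text matches exactly. The key point is the two-level grouping: first by the EN text, then by locale tag plus translation text within each EN group.

```python
import xml.etree.ElementTree as ET


def group_by_english(xml_text, master_tag="EN"):
    """Merge Para elements sharing the same EN text, keeping unique translations.

    Hypothetical helper for illustration; returns a new root element.
    """
    root = ET.fromstring(xml_text)  # XML comments are dropped by the default parser

    # EN text -> {locale tag -> list of unique translations, in first-seen order}
    groups = {}
    for para in root.iter("Para"):
        english = para.findtext(master_tag)
        if english is None:
            continue  # assumption: a Para without an EN master is skipped
        locales = groups.setdefault(english, {})
        for child in para:
            if not isinstance(child.tag, str) or child.tag == master_tag:
                continue
            variants = locales.setdefault(child.tag, [])
            text = child.text or ""
            if text not in variants:
                variants.append(text)

    # Rebuild one Para per unique EN string, with translations grouped per locale.
    out = ET.Element(root.tag)
    for english, locales in groups.items():
        para = ET.SubElement(out, "Para")
        ET.SubElement(para, master_tag).text = english
        for tag, variants in locales.items():
            for variant in variants:
                ET.SubElement(para, tag).text = variant
    return out


if __name__ == "__main__":
    sample = """<Books>
    <Para><EN>English Content #1</EN><DE>German Trans of #1 v1</DE><FR>French Trans of #1 v1</FR></Para>
    <Para><EN>English Content #1</EN><DE>German Trans of #1 v2</DE><FR>French Trans of #1 v1</FR></Para>
    <Para><EN>English Content #2</EN><DE>German Trans of #2 v1</DE><FR>French Trans of #2 v1</FR></Para>
    </Books>"""
    print(ET.tostring(group_by_english(sample), encoding="unicode"))
```

In XSLT terms, this corresponds to keying the Para elements on EN and then keying each translation on the combination of its EN value, element name, and text, rather than on the element name alone (which is what produces the single merged Para shown earlier).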
<EN></EN>
values. Can you show your first stab at XSLT as well, to show your existing logic? – Merlyn Morgan-Graham