I've been working on a Geo application. Over time the product's XML has grown a bit messy. The problem arises when synchronizing changes across multiple environments, like Dev, Test, etc. I'm trying to figure out a way to normalize the content so that editing and merging become less cumbersome and development stays productive. I know it sounds crazy, and there's a lot of background, but let me skip the history and jump to the actual issue.
Here's the issue:
- Multiple sorting orders are applied:
  - Sort based on the reverse domain name. For example, for sorting it should read d.c.b.a as a.b.c.d, or map.google.com as com.google.map.
  - When the domain contains a non-alphanumeric character, like *, ?, [, ], etc., that node should come after the specific one, as its scope is wider.
  - Sort on port and path as the secondary, subsequent sort keys.
  - Apply a similar sorting order to the tags under the <tgt> element, if present.
- Eliminate the <scheme> and <port> tags when their values are generic, like http / https for the scheme tag and 80 or 443 for the port tag; otherwise retain them. Also remove them if there's no value, like <scheme/>.
- Preserve all other tags and values as-is.
- Trivial things, like indenting with 2 space characters and keeping the actual data without unwanted boilerplate.
Here's a bit of the problematic XML:
XML
<?xml version='1.0' encoding='UTF-8' ?>
<?tapia chrome-version='2.0' ?>
<mapGeo>
  <a>blah</a>
  <b>blah</b>
  <maps>
    <mapIndividual>
      <src>
        <scheme>https</scheme>
        <domain>photos.yahoo.com</domain>
        <path>somepath</path>
        <query>blah</query>
      </src>
      <loc>C:\var\tmp</loc>
      <x>blah</x>
      <y>blah</y>
    </mapIndividual>
    <mapIndividual>
      <src>
        <scheme>tcp</scheme>
        <domain>map.google.com</domain>
        <port>80</port>
        <path>/value</path>
        <query>blah</query>
      </src>
      <tgt>
        <scheme>https</scheme>
        <domain>map.google.com</domain>
        <port>443</port>
        <path>/value</path>
        <query>blah</query>
      </tgt>
      <x>blah</x>
      <y>blah</y>
    </mapIndividual>
    <mapIndividual>
      <src>
        <scheme>http</scheme>
        <domain>*.c.b.a</domain>
        <path>somepath</path>
        <port>8085</port>
        <query>blah</query>
      </src>
      <tgt>
        <domain>r.q.p</domain>
        <path>somepath</path>
        <query>blah</query>
      </tgt>
      <x>blah</x>
      <y>blah</y>
    </mapIndividual>
    <mapIndividual>
      <src>
        <scheme>http</scheme>
        <domain>d.c.b.a</domain>
        <path>somepath</path>
        <port>8085</port>
        <query>blah</query>
      </src>
      <tgt>
        <domain>r.q.p</domain>
        <path>somepath</path>
        <query>blah</query>
      </tgt>
      <x>blah</x>
      <y>blah</y>
    </mapIndividual>
  </maps>
</mapGeo>
I was able to apply basic sorting on the values as they are, but couldn't figure out a way to generate the reversed domain name. I came across XSL extensions, but haven't tried them yet. Here's the beginning of the solution I was working on, which is very basic.
XSL
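<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>

  <!-- Identity template: copy attributes and nodes unchanged by default -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Sort the children of <maps> by source domain (as is), then by source port -->
  <xsl:template match="maps">
    <xsl:copy>
      <xsl:apply-templates select="*">
        <xsl:sort select="src/domain"/>
        <!-- Compare ports numerically, so that 443 sorts before 8085 -->
        <xsl:sort select="src/port" data-type="number"/>
      </xsl:apply-templates>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>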
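The elimination rule from the list above could probably be handled separately from the sorting, as two empty templates on top of an identity transform. This is only a minimal sketch under assumptions that the requirements do not spell out: values are compared exactly (lower case only) after trimming whitespace, and an element counts as "no value" when its text is empty or whitespace-only. It uses nothing beyond XSLT 1.0.

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>
  <!-- Drop whitespace-only text nodes so the serializer can re-indent the output -->
  <xsl:strip-space elements="*"/>

  <!-- Identity template: copy everything that is not matched below -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Empty template: drop <scheme> when it is empty, http, or https -->
  <xsl:template match="scheme[normalize-space() = ''
                              or normalize-space() = 'http'
                              or normalize-space() = 'https']"/>

  <!-- Empty template: drop <port> when it is empty, 80, or 443 -->
  <xsl:template match="port[normalize-space() = ''
                            or normalize-space() = '80'
                            or normalize-space() = '443']"/>
</xsl:stylesheet>
```

Because these patterns carry a predicate, they take priority over the identity template, so a value such as tcp or 8085 still falls through and is copied. Note that indent="yes" only asks for indentation; the width (2 spaces or otherwise) is left to the processor.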
Expected Output
<?xml version='1.0' encoding='UTF-8' ?>
<?tapia chrome-version='2.0' ?>
<mapGeo>
  <a>blah</a>
  <b>blah</b>
  <maps>
    <mapIndividual>
      <src>
        <domain>d.c.b.a</domain>
        <path>somepath</path>
        <port>8085</port>
        <query>blah</query>
      </src>
      <tgt>
        <domain>r.q.p</domain>
        <path>somepath</path>
        <query>blah</query>
      </tgt>
      <x>blah</x>
      <y>blah</y>
    </mapIndividual>
    <mapIndividual>
      <src>
        <domain>*.c.b.a</domain>
        <path>path1</path>
        <port>8085</port>
        <query>blah</query>
      </src>
      <tgt>
        <domain>r.q.p</domain>
        <path>path2</path>
        <query>blah</query>
      </tgt>
      <x>blah</x>
      <y>blah</y>
    </mapIndividual>
    <mapIndividual>
      <src>
        <scheme>tcp</scheme>
        <domain>map.google.com</domain>
        <path>/value</path>
        <query>blah</query>
      </src>
      <tgt>
        <domain>map.google.com</domain>
        <path>/value</path>
        <query>blah</query>
      </tgt>
      <x>blah</x>
      <y>blah</y>
    </mapIndividual>
    <mapIndividual>
      <src>
        <domain>photos.yahoo.com</domain>
        <path>somepath</path>
        <query>blah</query>
      </src>
      <loc>C:\var\tmp</loc>
      <x>blah</x>
      <y>blah</y>
    </mapIndividual>
  </maps>
</mapGeo>
Note: I'd prefer XSLT 1.0 as that's supported in the current environment. XSLT 2.0 would be a plus.
Update: I figured out a way to support XSLT 2.0 and XSLT 3.0, so please ignore my previous note about XSLT 1.0.
Thank you in advance!
Cheers,
[...] mappings or selects sourceLocation, elements which are not even present in your XML input sample. Also, if this is XSLT 1 and you expect to break up and/or reverse some sort keys, do you have access to EXSLT extensions? Do you know exactly which XSLT processor you use and which extension mechanisms it supports? – Martin Honnen
xsl:sort select="string-join(reverse(tokenize(src/domain, '\.')), '.')" I think. I haven't understood the second requirement, about the non-alphanumeric characters: can they all be treated the same by replacing them with some character that sorts after alphanumerics? Or do you also need to implement some ordering between e.g. *.com and ?.com? – Martin Honnen
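Building on the expression in the comment above, a minimal XSLT 2.0 sketch of the sort might look like the following. Several things here are assumptions rather than anything confirmed in the thread: the f prefix, its urn:example:domain-sort namespace and the f:domain-key function are made-up names; every character other than a letter, digit, dot or hyphen is treated as the same "wildcard" and mapped to ~; the codepoint collation is used so that ~ reliably sorts after letters and digits; and each entry has at most one src/domain and one tgt/domain.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:f="urn:example:domain-sort"
    exclude-result-prefixes="xs f">

  <xsl:output method="xml" indent="yes"/>
  <xsl:strip-space elements="*"/>

  <!-- Hypothetical helper: builds a sort key from a domain name.
       d.c.b.a becomes a.b.c.d and *.c.b.a becomes a.b.c.~ -->
  <xsl:function name="f:domain-key" as="xs:string">
    <xsl:param name="domain" as="xs:string?"/>
    <!-- Lower-case, split on dots, and reverse the labels -->
    <xsl:variable name="labels"
        select="reverse(tokenize(lower-case(string($domain)), '\.'))"/>
    <!-- Map every wildcard-like character to '~' so it sorts after a-z and 0-9 -->
    <xsl:sequence
        select="replace(string-join($labels, '.'), '[^a-z0-9.\-]', '~')"/>
  </xsl:function>

  <!-- Identity template: copy everything that is not matched below -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Sort entries by reversed source domain, then port, path, and target domain -->
  <xsl:template match="maps">
    <xsl:copy>
      <xsl:apply-templates select="*">
        <xsl:sort select="f:domain-key(src/domain)"
            collation="http://www.w3.org/2005/xpath-functions/collation/codepoint"/>
        <!-- A missing port becomes NaN, which sorts before any real number -->
        <xsl:sort select="number(src/port)"/>
        <xsl:sort select="string(src/path)"
            collation="http://www.w3.org/2005/xpath-functions/collation/codepoint"/>
        <xsl:sort select="f:domain-key(tgt/domain)"
            collation="http://www.w3.org/2005/xpath-functions/collation/codepoint"/>
      </xsl:apply-templates>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>
```

For the four sample entries the keys would be a.b.c.d, a.b.c.~, com.google.map and com.yahoo.photos, which in codepoint order matches the sequence shown in the expected output. If *.com and ?.com ever need to be ordered relative to each other, the single ~ replacement would have to be refined.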