I've been working on a Geo application. Over time, the product's XML has grown a bit messy. The problem arises when synchronizing changes across multiple environments, such as Dev, Test, etc. I'm trying to find a way to normalize the content so that editing and merging become less cumbersome and development stays productive. I know it sounds crazy, and there's a lot of background, but let me skip the history and jump to the actual issue.

Here's the issue:

  1. Several sort orders need to be applied:

    • Sort on the reversed domain name. For example, d.c.b.a should be read as a.b.c.d, and map.google.com as com.google.map, for sorting purposes.
    • When the domain contains a non-alphanumeric character such as *, ?, [ or ], that node should come after the more specific one, because its scope is wider.
    • Sort on port and path as the secondary and subsequent keys.
    • Apply a similar sort order to the tags under the <tgt> element, if present.
  2. Remove the <scheme> and <port> tags when their values are generic, i.e. http or https for <scheme> and 80 or 443 for <port>; otherwise retain them. Also remove them when they have no value, like <scheme/>.
  3. Preserve all other tags and values as-is.
  4. Trivial things such as indenting with 2 spaces and emitting only the actual data, without unwanted boilerplate.

Here's a bit of the problematic XML:

XML

<?xml version='1.0' encoding='UTF-8' ?>
<?tapia chrome-version='2.0' ?>
<mapGeo>
  <a>blah</a>
  <b>blah</b>
  <maps>
    <mapIndividual>
      <src>
        <scheme>https</scheme>
        <domain>photos.yahoo.com</domain>
        <path>somepath</path>
        <query>blah</query>
      </src>
      <loc>C:\var\tmp</loc>
      <x>blah</x>
      <y>blah</y>
    </mapIndividual>
    <mapIndividual>
      <src>
        <scheme>tcp</scheme>
        <domain>map.google.com</domain>
        <port>80</port>
        <path>/value</path>
        <query>blah</query>
      </src>
      <tgt>
        <scheme>https</scheme>
        <domain>map.google.com</domain>
        <port>443</port>
        <path>/value</path>
        <query>blah</query>
      </tgt>
      <x>blah</x>
      <y>blah</y>
    </mapIndividual>
    <mapIndividual>
      <src>
        <scheme>http</scheme>
        <domain>*.c.b.a</domain>
        <path>somepath</path>
        <port>8085</port>
        <query>blah</query>
      </src>
      <tgt>
        <domain>r.q.p</domain>
        <path>somepath</path>
        <query>blah</query>
      </tgt>
      <x>blah</x>
      <y>blah</y>
    </mapIndividual>
    <mapIndividual>
      <src>
        <scheme>http</scheme>
        <domain>d.c.b.a</domain>
        <path>somepath</path>
        <port>8085</port>
        <query>blah</query>
      </src>
      <tgt>
        <domain>r.q.p</domain>
        <path>somepath</path>
        <query>blah</query>
      </tgt>
      <x>blah</x>
      <y>blah</y>
    </mapIndividual>
  </maps>
</mapGeo>

I was able to apply basic sorting on the values as they are, but couldn't figure out how to generate the reversed domain name. I came across XSLT extensions, but haven't tried them yet. Here's the beginning of the solution I was working on, which is very basic.

XSL

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>

  <xsl:template match="node()">
    <xsl:copy>
      <xsl:apply-templates select="node()"/>
    </xsl:copy>
  </xsl:template>

  <xsl:template match="maps">
    <xsl:copy>
      <xsl:apply-templates select="*">
        <xsl:sort select="src/domain" />
        <xsl:sort select="src/port" />
      </xsl:apply-templates>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>

Expected Output

<?xml version='1.0' encoding='UTF-8' ?>
<?tapia chrome-version='2.0' ?>
<mapGeo>
  <a>blah</a>
  <b>blah</b>
  <maps>
    <mapIndividual>
      <src>
        <domain>d.c.b.a</domain>
        <path>somepath</path>
        <port>8085</port>
        <query>blah</query>
      </src>
      <tgt>
        <domain>r.q.p</domain>
        <path>somepath</path>
        <query>blah</query>
      </tgt>
      <x>blah</x>
      <y>blah</y>
    </mapIndividual>
    <mapIndividual>
      <src>
        <domain>*.c.b.a</domain>
        <path>somepath</path>
        <port>8085</port>
        <query>blah</query>
      </src>
      <tgt>
        <domain>r.q.p</domain>
        <path>somepath</path>
        <query>blah</query>
      </tgt>
      <x>blah</x>
      <y>blah</y>
    </mapIndividual>
    <mapIndividual>
      <src>
        <scheme>tcp</scheme>
        <domain>map.google.com</domain>
        <path>/value</path>
        <query>blah</query>
      </src>
      <tgt>
        <domain>map.google.com</domain>
        <path>/value</path>
        <query>blah</query>
      </tgt>
      <x>blah</x>
      <y>blah</y>
    </mapIndividual>
    <mapIndividual>
      <src>
        <domain>photos.yahoo.com</domain>
        <path>somepath</path>
        <query>blah</query>
      </src>
      <loc>C:\var\tmp</loc>
      <x>blah</x>
      <y>blah</y>
    </mapIndividual>
  </maps>
</mapGeo>

Note: I'd prefer XSLT 1.0 as that's supported in the current environment. XSLT 2.0 would be a plus.

Update: I figured out how to support XSLT 2.0 and XSLT 3.0 in my environment, so please ignore my previous note about XSLT 1.0.

Thank you in advance!

Cheers,

Apart from the identity transformation, your current stylesheet matches mappings and selects sourceLocation, elements which are not even present in your XML input sample. Also, if this is XSLT 1 and you expect to break up and/or reverse some sort keys: do you have access to EXSLT extensions, do you know exactly which XSLT processor you use, and which extension mechanisms it supports? - Martin Honnen
What exactly does "sort based on reverse domain name" mean? Do you want to sort alphabetically, based on the entire reversed name? Or do you want to sort by each label separately? - michael.hor257k
P.S. Please ask one question at a time. The sorting issue is complicated enough. Save issues #2, #3 and #4 for separate questions. - michael.hor257k
@MartinHonnen Thank you for pointing out the issue. I fixed it. - Rohit
In XSLT 2 or 3 you can express the first sorting requirement as xsl:sort select="string-join(reverse(tokenize(src/domain, '\.')), '.')", I think. I haven't understood the second requirement, the one about non-alphanumeric characters: can they all be treated the same, by replacing them with some character that sorts after the alphanumeric ones? Or do you also need to implement some ordering between e.g. *.com and ?.com? - Martin Honnen

2 Answers

0
votes

I don't think it's possible to sort in the reverse order you seek in a single pass using XSLT 1.0. Consider the following simplified example:

XML

<root>
    <item>
        <domain>t.q.p</domain>
    </item>
    <item>
        <domain>s.q.p</domain>
    </item>
    <item>
        <domain>photos.yahoo.com</domain>
    </item>
    <item>
        <domain>map.google.com</domain>
    </item>
    <item>
        <domain>aap.google.com</domain>
    </item>
    <item>
        <domain>r.q.p</domain>
    </item>
    <item>
        <domain>*.c.b.a</domain>
    </item>
    <item>
        <domain>d.c.b.a</domain>
    </item>
</root>

XSLT 1.0 (+ EXSLT node-set)

<xsl:stylesheet version="1.0" 
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:exsl="http://exslt.org/common"
extension-element-prefixes="exsl">
<xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
<xsl:strip-space elements="*"/>

<!-- identity transform -->
<xsl:template match="@*|node()">
    <xsl:copy>
        <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
</xsl:template>

<xsl:template match="/root">
    <!-- 1st pass -->
    <xsl:variable name="items">
        <xsl:for-each select="item">
            <xsl:copy>
                <xsl:attribute name="sort-string">
                    <xsl:call-template name="reverse-tokens">
                        <xsl:with-param name="text" select="domain"/>
                    </xsl:call-template>
                </xsl:attribute>
                <xsl:copy-of select="@*|node()"/>
            </xsl:copy>
        </xsl:for-each>
    </xsl:variable>
    <!-- output -->
    <xsl:copy>
        <xsl:apply-templates select="exsl:node-set($items)/item">
            <xsl:sort select="@sort-string" data-type="text" order="ascending"/>
        </xsl:apply-templates>
    </xsl:copy>
</xsl:template>

<xsl:template match="@sort-string"/>

<xsl:template name="reverse-tokens">
    <xsl:param name="text"/>
    <xsl:param name="delimiter" select="'.'"/>
    <xsl:variable name="token" select="substring-before(concat($text, $delimiter), $delimiter)"/>
    <xsl:if test="contains($text, $delimiter)">
        <!-- recursive call -->
        <xsl:call-template name="reverse-tokens">
            <xsl:with-param name="text" select="substring-after($text, $delimiter)"/>
        </xsl:call-template>
        <xsl:value-of select="$delimiter"/>
    </xsl:if>
    <xsl:choose>
        <xsl:when test="$token = '*'">
            <xsl:text>zzzz</xsl:text>
        </xsl:when>
        <xsl:otherwise>
            <xsl:value-of select="$token"/>
        </xsl:otherwise>
    </xsl:choose>
</xsl:template>

</xsl:stylesheet>

Result

<?xml version="1.0" encoding="UTF-8"?>
<root>
  <item>
    <domain>d.c.b.a</domain>
  </item>
  <item>
    <domain>*.c.b.a</domain>
  </item>
  <item>
    <domain>aap.google.com</domain>
  </item>
  <item>
    <domain>map.google.com</domain>
  </item>
  <item>
    <domain>photos.yahoo.com</domain>
  </item>
  <item>
    <domain>r.q.p</domain>
  </item>
  <item>
    <domain>s.q.p</domain>
  </item>
  <item>
    <domain>t.q.p</domain>
  </item>
</root>
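
As a side note, item 2 of the question (dropping generic scheme and port values) is independent of the sorting. A minimal sketch, assuming the values appear exactly as in the sample (lowercase, with at most surrounding whitespace), is to add two empty templates on top of the identity transform. It uses only XSLT 1.0 constructs and is not tied to the two-pass approach above:

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" encoding="UTF-8" indent="yes"/>
  <xsl:strip-space elements="*"/>

  <!-- identity transform: copy everything not matched below -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- drop <scheme> when it is empty, http or https -->
  <xsl:template match="scheme[normalize-space() = ''
                              or normalize-space() = 'http'
                              or normalize-space() = 'https']"/>

  <!-- drop <port> when it is empty, 80 or 443 -->
  <xsl:template match="port[normalize-space() = ''
                            or normalize-space() = '80'
                            or normalize-space() = '443']"/>
</xsl:stylesheet>
```

The two empty templates win over the identity template because a pattern with a predicate has a higher default priority. Note that the width of the indentation produced by indent="yes" is up to the processor, so the 2-space requirement may need a processor-specific setting.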
0
votes

This XSLT 1.0 stylesheet (without extensions)

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
    <xsl:output indent="yes" />
    <xsl:strip-space elements="*"/>
    <xsl:template match="@*|node()">
        <xsl:copy>
            <xsl:apply-templates select="@*|node()"/>
        </xsl:copy>
    </xsl:template>

    <xsl:template match="maps">
        <xsl:copy>
            <xsl:apply-templates select="*">
                <xsl:sort 
                    select="translate(src/domain,translate(src/domain,'.',''),'')" 
                    order="descending"/>
                <xsl:sort 
                    select="
                      substring-after(
                        substring-after(
                          substring-after(translate(src/domain,'*','~'),'.'),'.'),'.')"/>
                <xsl:sort 
                    select="
                        substring-after(
                            substring-after(translate(src/domain,'*','~'),'.'),'.')"/>
                <xsl:sort 
                    select="substring-after(translate(src/domain,'*','~'),'.')"/>
                <xsl:sort select="translate(src/domain,'*','~')" />
                <xsl:sort select="src/port" />
            </xsl:apply-templates>
        </xsl:copy>
    </xsl:template>
</xsl:stylesheet>

Output

<?xml version="1.0" encoding="UTF-8"?>
<?tapia chrome-version='2.0' ?>
<mapGeo>
   <a>blah</a>
   <b>blah</b>
   <maps>
      <mapIndividual>
         <src>
            <scheme>http</scheme>
            <domain>d.c.b.a</domain>
            <path>somepath</path>
            <port>8085</port>
            <query>blah</query>
         </src>
         <tgt>
            <domain>r.q.p</domain>
            <path>somepath</path>
            <query>blah</query>
         </tgt>
         <x>blah</x>
         <y>blah</y>
      </mapIndividual>
      <mapIndividual>
         <src>
            <scheme>http</scheme>
            <domain>*.c.b.a</domain>
            <path>somepath</path>
            <port>8085</port>
            <query>blah</query>
         </src>
         <tgt>
            <domain>r.q.p</domain>
            <path>somepath</path>
            <query>blah</query>
         </tgt>
         <x>blah</x>
         <y>blah</y>
      </mapIndividual>
      <mapIndividual>
         <src>
            <scheme>tcp</scheme>
            <domain>map.google.com</domain>
            <port>80</port>
            <path>/value</path>
            <query>blah</query>
         </src>
         <tgt>
            <scheme>https</scheme>
            <domain>map.google.com</domain>
            <port>443</port>
            <path>/value</path>
            <query>blah</query>
         </tgt>
         <x>blah</x>
         <y>blah</y>
      </mapIndividual>
      <mapIndividual>
         <src>
            <scheme>https</scheme>
            <domain>photos.yahoo.com</domain>
            <path>somepath</path>
            <query>blah</query>
         </src>
         <loc>C:\var\tmp</loc>
         <x>blah</x>
         <y>blah</y>
      </mapIndividual>
   </maps>
</mapGeo>

Do note: this relies on the fact that . (dot) sorts before letters and ~ (tilde) sorts after them (at least for the US locale). It also might not scale well, since each additional domain label needs another nested substring-after sort key.

I agree with Martin Honnen's comment: this would be better solved in XSLT 2.0.
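
For comparison, here is a rough XSLT 2.0 sketch built around the expression from Martin Honnen's comment. Several choices in it are my assumptions rather than anything confirmed in the question: each mapIndividual has at most one src, all wildcard characters (*, ?, [, ]) are treated alike by mapping them to ~, the labels are joined with a space instead of a dot so that a shorter name sorts before its subdomains, and the codepoint collation is requested explicitly so the result does not depend on the locale.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" encoding="UTF-8" indent="yes"/>
  <xsl:strip-space elements="*"/>

  <!-- identity transform -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <xsl:template match="maps">
    <xsl:copy>
      <xsl:apply-templates select="@*"/>
      <xsl:apply-templates select="*">
        <!-- key 1: labels reversed, so d.c.b.a becomes "a b c d";
             wildcard characters become "~", which follows letters
             and digits in codepoint order -->
        <xsl:sort
            select="translate(
                      string-join(
                        reverse(tokenize(lower-case(src/domain), '\.')),
                        ' '),
                      '*?[]', '~~~~')"
            collation="http://www.w3.org/2005/xpath-functions/collation/codepoint"/>
        <!-- keys 2 and 3: port (compared as a number), then path -->
        <xsl:sort select="src/port" data-type="number"/>
        <xsl:sort select="src/path"/>
      </xsl:apply-templates>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>
```

For the sample input the first key evaluates to "a b c d", "a b c ~", "com google map" and "com yahoo photos", so the entries come out in the order d.c.b.a, *.c.b.a, map.google.com, photos.yahoo.com. The sketch does not attempt the tgt ordering or the scheme/port clean-up from the question.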