Parsing XSLT/XPath data for a list of XML nodes

Question

I am searching for a library, tool, or even some simple code that can parse the XPath/XSLT in our XSLT files and produce a Dictionary/List/Tree of all the XML nodes that the XSLT expects to work on or find. Sadly, everything I am finding deals with using XSLT to parse XML rather than parsing the XSLT itself. The really difficult part is how flexible XPath is.

For example, in the several XSLT files we work with, an entry may select on

nodeX/nodeY/nodeNeeded;

OR

../nodeNeeded;

OR

select nodeX then select nodeY then select nodeNeeded; and so forth.

What we would like to do is parse that XSLT document and get a data structure that explicitly tells us the XSLT is looking for nodeNeeded under the path nodeX, nodeY, so that we can custom-build the XML data in a minimal fashion.

Thanks!

Here is a mocked up sub-set of data for visualization purposes:

<server_stats>
    <server name="fooServer">
        <uptime>24d52m</uptime>
        <userCount>123456</userCount>
        <loggedInUsers>
            <user name="AnnaBannana">
                <created>01.01.2012:00.00.00</created>
                <loggedIn>25</loggedIn>
                <posts>3</posts>
            </user>
        </loggedInUsers>
        <temperature>82F</temperature>
        <load>72</load>
        <mem_use>45</mem_use>
        <visitors>
            <current>42</current>
            <browsers name="mozilla" version="X.Y.Z">22</browsers>
            <popular_link name="index.html">39</popular_link>
            <history>
                <max_visitors>789</max_visitors>
                <average_visitors>42</average_visitors>
            </history>
        </visitors>
    </server>
</server_stats>

From this, one customer may just want to create an admin HTML page where they pull the hardware stats out of the tree and perhaps run some load calculations from the visitor count. Another customer may want to pull just the visitor count information to display on their public site. To keep each customer's system load as small as possible, we would like to parse their stat-selecting XSLT and provide them with just the data it requests. The obvious issue is that one customer may select the visitor count node directly, while another may select the visitors node and then select each of the child nodes they want, and so on.

The two hypothetical customers looking for the "current" node in "visitors" might have XSLT like:

<xsl:template match="server_stats/server/visitors">
    <xsl:value-of select="current"/>
</xsl:template>

OR

<xsl:template match="server_stats">
    <xsl:for-each select="server">
        <xsl:value-of select="visitors/current"/>
        <xsl:value-of select="visitors/popular_link"/>
    </xsl:for-each>
</xsl:template>

In this example both are trying to select the same node, but the way they do it is different, and "current" is not very specific, so we also need the path they used to get there, since "current" could be a child of several different nodes. This stops us from simply searching for "current" in their XSLT, and because the way they reach the path can vary so much, we can't just search for the whole path either.

So the result we would like from parsing their XSLT is, say, a List of stats:

Customer 1:
visitors/current
Customer 2:
visitors/current
visitors/popular_link

etc.

Some example selects that break the solution provided below, which we will be working on solving:

<xsl:variable name="fcolor" select="'Black'"/>  This results in a /'Black' entry.
<xsl:for-each select="server">  We get the entry, but the paths of its children no longer include it.
<xsl:value-of select="../../@name"/>  This was kind of expected. We can try to figure out how to skip attribute-based selections, but the relative paths show up as I thought they would.
<xsl:when test="substring(someNode,1,2)=0 and substring(someNode,4,2)=0 and substring(someNode,7,2)>30">  This one is kind of throwing me, because the whole test shows up as a path item. It's due to the when check in the solution, but I don't see a nice fix, since the same basic statement could have been checking for a branching path. This might just be one of those cases we need to post-process.

Ann L. · Accepted Answer · 2012-03-02T19:36:29

That's going to be challenging, because XSLT is so context-dependent. You're right to call this "parsing" because you're going to have to duplicate a lot of the logic that would go into a parser.

My suggestion would be to start with a brute-force approach, and refine it as you find more test cases that it can't handle. Look at a couple of XSLT files and write code that can find the structures you're looking for. Look at a few more and if any new structures appear, refine your code to find those, too.
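As a rough illustration of what such a first brute-force pass might look like, here is a minimal Python sketch using the standard library's xml.etree.ElementTree. It is not the code from the original solution. It assumes that only xsl:template (match) and xsl:for-each (select) change the context, and that only xsl:value-of and xsl:copy-of read nodes; the names collect_paths, expected_paths and join_path are made up for this example, and the xsl:stylesheet wrapper around the second customer's template is added so the sample parses.

```python
import xml.etree.ElementTree as ET

XSL = "{http://www.w3.org/1999/XSL/Transform}"

# Assumption: these elements change the context node for their children.
CONTEXT_ATTRS = {XSL + "template": "match", XSL + "for-each": "select"}
# Assumption: these elements read a node relative to the current context.
LEAF_ATTRS = {XSL + "value-of": "select", XSL + "copy-of": "select"}


def join_path(context, step):
    """Join a context path and a relative step, tolerating empty parts."""
    if context and step:
        return context + "/" + step
    return step or context


def collect_paths(element, context=""):
    """Walk the stylesheet tree and gather every path it reads."""
    paths = []
    for child in element:
        if child.tag in CONTEXT_ATTRS:
            step = child.get(CONTEXT_ATTRS[child.tag], "")
            paths.extend(collect_paths(child, join_path(context, step)))
        elif child.tag in LEAF_ATTRS:
            step = child.get(LEAF_ATTRS[child.tag], "")
            if step:
                paths.append(join_path(context, step))
        else:
            # Literal result elements, xsl:if, xsl:choose, ...: keep the context.
            paths.extend(collect_paths(child, context))
    return paths


def expected_paths(xslt_text):
    """Return the sorted, de-duplicated paths a stylesheet appears to read."""
    root = ET.fromstring(xslt_text)
    return sorted(set(collect_paths(root)))


CUSTOMER_2 = """<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="server_stats">
    <xsl:for-each select="server">
      <xsl:value-of select="visitors/current"/>
      <xsl:value-of select="visitors/popular_link"/>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>"""

for path in expected_paths(CUSTOMER_2):
    print(path)
# server_stats/server/visitors/current
# server_stats/server/visitors/popular_link
```

Because the context is carried down the tree, the first customer's template (match="server_stats/server/visitors" with select="current") produces the same server_stats/server/visitors/current entry. The sketch deliberately ignores xsl:apply-templates, variables, and test expressions, and it treats a match pattern as if it were a path from the root, which is not always true.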

This will not find every possible way that XSLT and XPath can be used, as an exhaustive, specification-driven approach to parsing these files would, but it will be a much smaller project and will find the structures that whoever developed the files tended to use.
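One refinement of this kind, prompted by the failing selects listed in the question, might be a post-processing step that resolves each expression against its context and sets aside anything that is not a plain location path. The following Python sketch is only one possible approach: the function name resolve and the regular expression are assumptions made for this example, and dropping the attribute step while keeping the element that owns it is a design choice, not something the question specifies.

```python
import re

# Assumption: only plain location paths are resolved, i.e. names, ".", ".."
# and "@attr" steps separated by "/". Anything else is set aside.
SIMPLE_PATH = re.compile(r"^/?[\w.@:-]+(/[\w.@:-]+)*$")


def resolve(context, expr):
    """Resolve a select/test expression against a context path.

    Returns the resolved element path, or None when the expression is a
    string literal, a function call, a comparison, or similar, so that it
    can be reviewed separately.
    """
    expr = expr.strip()
    if not SIMPLE_PATH.match(expr):
        return None
    # An expression starting with "/" ignores the context.
    steps = [] if expr.startswith("/") else [s for s in context.split("/") if s]
    for step in expr.split("/"):
        if step in ("", "."):
            continue
        if step == "..":
            if steps:
                steps.pop()  # move up to the parent element
        elif step.startswith("@"):
            break  # keep the owning element, drop the attribute itself
        else:
            steps.append(step)
    return "/".join(steps)


user_context = "server_stats/server/loggedInUsers/user"
print(resolve(user_context, "../../@name"))                   # server_stats/server
print(resolve("server_stats/server", "'Black'"))              # None
print(resolve("server_stats/server", "substring(someNode,1,2)=0"))  # None
```

Expressions that come back as None, such as the xsl:when test, still need a real XPath tokenizer or a manual look if the node names inside them matter. A bare number such as 42 would pass this filter and be treated as an element name, so the check is only a rough first cut.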
