I need to compare XML files from two folders and collect the XML elements that appear in only one of the two files.

The XML files in the two folders have the same file names. Below is a sample of what I want to do:

old/booklist1.xml

<books>
    <book type="fiction">
        <isn>12345678</isn>  
        <name>xxxx</name>
    </book>
</books>

new/booklist1.xml

<books>
    <book type="fiction">
        <isn>12345678</isn>  
        <name>xxxx</name>
    </book>
    <book type="history">
        <isn>23456789</isn>  
        <name>yyyyy</name>
    </book>
</books>

The output I need for booklist1.xml is:

<books>
    <book type="history">
        <isn>23456789</isn>  
        <name>yyyyy</name>
    </book>
</books> 

I have the findDiff.xsl below, which works when I hard-code the XML file name:

<xsl:key name="book" match="book" use="." />

<xsl:template match="/books">
    <xsl:copy>
        <xsl:copy-of select="book[not(key('book', ., document('old_booklist1.xml')))]"/>
    </xsl:copy>
</xsl:template>

findDiff.xsl is currently associated with new/booklist1.xml. I copied old/booklist1.xml into the same folder as new/booklist1.xml, renamed it old_booklist1.xml, and the XSLT above works with that hard-coded URI. Now I have to loop through the XML files in folder new and compare each one with the same-named XML file in folder old.

I am thinking of building the XML file URI as follows:

  1. loop over the files in new and get each file's URI

  2. build the URI of the matching XML file in the old folder

    <xsl:variable name="xmlFilePath" select="document-uri()"/>

    <xsl:variable name="compareWithPath" select="replace($xmlFilePath, 'new', 'old')"/>

and then pass compareWithPath to the template below:

<xsl:template match="/books">
        <xsl:copy>
            <xsl:copy-of select="book[not(key('book',., document($compareWithPath)))]"></xsl:copy-of>
        </xsl:copy>
    </xsl:template>

But I get the error "The system cannot find the file specified" for file:/C:/Users/phyllis/Documents/old/booklist1.xml

Michael Kay mentioned that we can convert the file name to a URI and use doc() or document() to load it. I built the file URI in exactly the same form as the one I get from document-uri(). What am I doing wrong here?

The converted file URI looks like this:

<compareWithPath>file:/C:/Users/phyllis/Documents/old/booklist1.xml</compareWithPath>

Checking that file URI with doc-available() returns false:

<fileExist><xsl:value-of select="doc-available($compareWithPath)"/></fileExist>

Comments:

So where do you "loop" through files in a folder? Are you using Saxon's uri-collection('old?select=*.xml'), or where/how exactly do you try to find and load the URIs? - Martin Honnen

There might be less memory consumption if you use a version of Saxon (9.8 or later) that supports XSLT/XPath 3, where in addition to the collection function there is the uri-collection function, which only gives you the URIs of the files in the collection but doesn't pull them in all together. Additionally, in the commercial editions of Saxon you have a discard-document function to avoid memory problems, I think. - Martin Honnen

Once you have the file access working, I would look at whether you can improve the key you have with <xsl:key name="book" match="book" use="." />. Given that book seems to have various child elements and whitespace, any key on the complete contents can easily break with a change in indentation or whitespace stripping. Perhaps a composite key on the particular elements you need to identify a book is a better approach in XSLT 3. - Martin Honnen

If you have uri-collection() and want to process each document, use e.g. <xsl:apply-templates select="uri-collection() ! doc(.)"/> or <xsl:iterate select="uri-collection() ! doc(.)">...</xsl:iterate>, or for-each with the same select if wanted/needed. - Martin Honnen

Perhaps ask about that key problem in a new, separate question with all details. I would still think that in XSLT 3 using a composite key with e.g. <xsl:key name="book" match="book" composite="yes" use="*" /> and book[not(key('book', *, document($compareWithPath)))] is a cleaner approach. - Martin Honnen

1 Answer

The XSLT below solved my problem:

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema" exclude-result-prefixes="xs" version="3.0">
   
    <xsl:output method="xml" version="1.0" encoding="utf-8" indent="yes"/>
   
    <xsl:key name="book" match="book" composite="yes" use="*"/>
   
    <xsl:template match="books">
        <xsl:param name="compareWith"/>
        <xsl:copy>
            <xsl:copy-of
                select="book[not(key('book', *, document($compareWith)))]"/>
        </xsl:copy>
    </xsl:template>
   
    <xsl:template match="/">
        <xsl:copy>
            <!-- Load every document in folder 'new', one at a time. -->
            <xsl:iterate select="uri-collection('new') ! doc(.)">
                <!-- Write the result to 'update/' under the same file name. -->
                <xsl:variable name="fileUri" select="concat('update/', tokenize(document-uri(.), '/')[last()])"/>
                <xsl:result-document method="xml" href="{$fileUri}">
                    <xsl:apply-templates select="books">
                        <!-- Compare with the same-named file in folder 'old'. -->
                        <xsl:with-param name="compareWith" select="concat('old/', tokenize(document-uri(.), '/')[last()])"/>
                    </xsl:apply-templates>
                </xsl:result-document>
            </xsl:iterate>
        </xsl:copy>
    </xsl:template>
</xsl:stylesheet>

A single key() is enough to find the difference between two files, and the same technique also works for eliminating duplicates. I never thought this kind of problem could be solved so easily in XSLT, but once again XSLT proves its strength when it comes to XML.
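
To address the whitespace concern raised in the comments, the key can also be built from normalized values of specific child elements instead of use="*". The following is only a sketch under assumptions: the key name book-id and the default value of the compareWith parameter are made up for illustration, a book is assumed to be identified by isn and name alone (the type attribute is ignored), and the relative URI passed to doc() is resolved against the stylesheet's location.

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="3.0">

    <!-- Hypothetical parameter: URI of the old document to compare against. -->
    <xsl:param name="compareWith" select="'old/booklist1.xml'"/>

    <xsl:output method="xml" indent="yes"/>

    <!-- Composite key on whitespace-normalized isn and name values.
         Assumption: these two elements are enough to identify a book. -->
    <xsl:key name="book-id" match="book" composite="yes"
        use="normalize-space(isn), normalize-space(name)"/>

    <xsl:template match="/books">
        <xsl:copy>
            <!-- Keep only the books that have no match in the old document. -->
            <xsl:copy-of select="book[not(key('book-id',
                (normalize-space(isn), normalize-space(name)),
                doc($compareWith)))]"/>
        </xsl:copy>
    </xsl:template>
</xsl:stylesheet>
```

With a key like this, differences in indentation or in stray whitespace inside isn and name should no longer make two otherwise identical books look different.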

Iterating over uri-collection() with <xsl:iterate select="uri-collection() ! doc(.)">...</xsl:iterate> makes it easy to loop through the files in a folder.
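
The stylesheet above assumes every file in new has a counterpart in old. A variant that guards against a missing old file might look like the sketch below. It is not tested against my real data and makes several assumptions: the ?select=*.xml query is Saxon-specific syntax (as mentioned in the comments), the stylesheet sits in the parent folder of new and old, it is started from the initial template (for example with Saxon's -it option), and a file with no counterpart in old is treated as entirely new.

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="3.0">

    <xsl:output method="xml" indent="yes"/>

    <!-- Composite key: a book is identified by the values of its child elements. -->
    <xsl:key name="book" match="book" composite="yes" use="*"/>

    <!-- Entry point when the transformation is started without a source document. -->
    <xsl:template name="xsl:initial-template">
        <!-- Only the URIs are listed here; each document is loaded on demand. -->
        <xsl:for-each select="uri-collection('new?select=*.xml')">
            <xsl:variable name="fileName" select="tokenize(., '/')[last()]"/>
            <!-- Absolute URI of the same-named file in folder 'old'. -->
            <xsl:variable name="oldUri"
                select="resolve-uri(concat('old/', $fileName), static-base-uri())"/>
            <xsl:variable name="newDoc" select="doc(.)"/>
            <!-- href is resolved against the base output URI. -->
            <xsl:result-document href="update/{$fileName}">
                <books>
                    <!-- If the old file is missing, every book counts as new. -->
                    <xsl:copy-of select="
                        if (doc-available($oldUri))
                        then $newDoc/books/book[not(key('book', *, doc($oldUri)))]
                        else $newDoc/books/book"/>
                </books>
            </xsl:result-document>
        </xsl:for-each>
    </xsl:template>
</xsl:stylesheet>
```

Using uri-collection() without loading the documents up front follows the memory suggestion from the comments; whether it matters depends on how many files the folders hold.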