2
votes

I'm trying to find the differences between two XML documents using the following steps.

  1. Get all distinct paths of the inner child elements of both the left and the right XML.
  2. Loop through the left paths and check whether each one matches any right path.
  3. If it does not match, it is a new element.
  4. If it matches, I compute map1 - map2 of the child elements. This gives me the changed child elements (both element and attribute changes).

But I need to identify what has changed, whether it is element text or an attribute value, and list it.

Please let me know an approach in XQuery to do this.

2
@DavidBrossard Thanks! I'm able to find out whether the XMLs are different using the fn:deep-equal function. But I want to know what is different, whether it is an element value or an attribute value that has changed. - Antony
Could you include attribute paths in your distinct path lists, and have your map be path=>text instead of path=>element? Then changes to attributes and element text content fall out automatically. - BenW
@BenW How do I include the attribute paths? Currently I'm taking the paths of the second-level child elements as distinct-values(local:path-to-node($left/child::*/child::*)). This returns the elements but not the attributes. - Antony
I left an answer. - BenW

2 Answers

3
votes

Comparing XML is not entirely trivial, but it has been attempted often enough. Not the most recent, but you could take a look at this:

https://github.com/ryanjdew/marklogic-xml-diff

I have also used this in the past, although it is XSLT rather than XQuery, and not as detailed. It worked nicely for regression tests, though, and can be used in MarkLogic too:

http://xsltunit.org/xsltunit.xsl

There are also commercial products out there, like:

https://www.deltaxml.com/

I'm not sure whether the above can be used in a headless way, though.

HTH!

0
votes

As grtjn points out, comparing XML can get pretty complicated, particularly if your XML includes mixed content, or if you care about element ordering, or you have elements that can occur more than once. If those don't apply, then it's possible to take some shortcuts.

In your example, it might be good enough to add attributes to your path lists, and then compare everything (elements and attributes) based only on their string content.

The path list has to include attributes. You can take the union of different axes like this, or use whatever criteria you have for including attributes and leaf elements while excluding text nodes and non-leaf elements.

local:path-to-node($left/child::*/child::*/(self::*[text()]|attribute::*))

And your path-to-node function has to be able to cope with being passed attributes. So you'd have to make some changes to it to make sure attributes are named correctly (with an '@') and, if applicable, don't get a position marker:

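(: Builds a '/'-separated path from the ancestor element names plus the node's own name. :)
declare function local:path-to-node($node as node()){
  (: '@' prefix for attributes; the empty sequence concatenates as '' for everything else :)
  let $attr := typeswitch ($node) case attribute() return '@' default return ()
  return string-join(('', $node/ancestor::*/name(.), $attr||name($node)), '/')
};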
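
To see what this produces, here is a small sketch run against a made-up document. The sample XML and the `for` loop are my own assumptions, not part of your code. Because the function as declared above takes a single node, the loop calls it once per node rather than passing the whole sequence.

```xquery
(: Same function as above, repeated so this snippet runs on its own. :)
declare function local:path-to-node($node as node()) as xs:string {
  let $attr := typeswitch ($node) case attribute() return '@' default return ()
  return string-join(('', $node/ancestor::*/name(.), $attr || name($node)), '/')
};

(: Hypothetical sample document, for illustration only. :)
let $left :=
  <root><item id="1"><name lang="en">Widget</name><price>10</price></item></root>

(: Second-level children that contain text, plus the attributes of those children. :)
for $n in $left/child::*/child::*/(self::*[text()] | attribute::*)
return local:path-to-node($n)
```

For this sample I would expect the three paths /root/item/name, /root/item/name/@lang and /root/item/price. The id attribute on item is not listed because the expression only collects attributes of the second-level children, so widen the expression if you need attributes at other levels.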

Depending on how complicated your path-to-node function is, it might be easier to have two of them.

fn:distinct-values((
  local:path-to-element-node($left/child::*/child::*),
  local:path-to-attribute-node($left/child::*/child::*/@*)
))
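
Once elements and attributes both have paths, reporting what changed could look roughly like the sketch below. Treat it as a starting point under assumptions: the bodies of the two path functions, the helpers local:leaves and local:leaf-path, the local:diff function, the shape of the change element and the sample documents are all invented for illustration. It also assumes each path occurs only once, so repeated elements would need position markers in the paths.

```xquery
(: Hypothetical bodies for the two functions named above; each maps over a node sequence. :)
declare function local:path-to-element-node($elements as element()*) as xs:string* {
  for $e in $elements
  return string-join(('', $e/ancestor-or-self::*/name(.)), '/')
};

declare function local:path-to-attribute-node($attributes as attribute()*) as xs:string* {
  for $a in $attributes
  return string-join(('', $a/ancestor::*/name(.), '@' || name($a)), '/')
};

(: Hypothetical helper: second-level elements with text, plus their attributes. :)
declare function local:leaves($root as element()) as node()* {
  $root/child::*/child::*/(self::*[text()] | attribute::*)
};

(: Hypothetical helper: path of one leaf, chosen by node kind. :)
declare function local:leaf-path($leaf as node()) as xs:string? {
  typeswitch ($leaf)
    case $a as attribute() return local:path-to-attribute-node($a)
    case $e as element() return local:path-to-element-node($e)
    default return ()
};

(: Walks the left leaves and compares string values with the right leaf at the same path. :)
declare function local:diff($left as element(), $right as element()) as element(change)* {
  let $right-leaves := local:leaves($right)
  for $l in local:leaves($left)
  let $path := local:leaf-path($l)
  let $r := $right-leaves[local:leaf-path(.) eq $path][1]
  let $kind := if ($l instance of attribute()) then 'attribute' else 'element-text'
  return
    if (empty($r)) then
      <change kind="{$kind}" path="{$path}" status="only-in-left" left="{string($l)}"/>
    else if (string($l) ne string($r)) then
      <change kind="{$kind}" path="{$path}" status="changed" left="{string($l)}" right="{string($r)}"/>
    else ()
};

(: Hypothetical sample documents. :)
let $left  := <root><item><name lang="en">Widget</name><price>10</price></item></root>
let $right := <root><item><name lang="fr">Widget</name><price>12</price></item></root>
return local:diff($left, $right)
```

For these samples this should yield two change elements: one of kind attribute for /root/item/name/@lang (en versus fr) and one of kind element-text for /root/item/price (10 versus 12). It only looks from left to right, so call it again with the arguments swapped to find paths that exist only on the right. The nested lookup is quadratic in the number of leaves; for large documents a map keyed by path, as you already use, would be the better fit.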