1
votes

I'm trying to get a unique set of data from the XML below:

<output>
  <category>DB</category>
  <title>Database systems</title>
  <name>Smith</name>
  <name>John</name>
  <name>Adam</name>
</output>
<output>
  <category>DB</category>
  <title>Database systems</title>
  <name>John</name>
  <name>Smith</name>
  <name>Adam</name>
</output>
<output>
  <category>DB</category>
  <title>Database systems</title>
  <name>Adam</name>
  <name>Smith</name>
  <name>John</name>
</output>
<output>
  <category>Others</category>
  <title>Pattern Recognition</title>
  <name>Adam</name>
  <name>Jeff</name>
</output>
<output>
  <category>Others</category>
  <title>Pattern Recognition</title>
  <name>Jeff</name>
  <name>Adam</name>
</output>

Since the first three output blocks contain the same information, I only need to pick one of them. But when I use the distinct-values() function, I get all three of them, in their original order.

I have assigned the above XML to $final, and below is what I'm getting:

for $f in distinct-values($final)
return $f

Output:

DBDatabase systemsSmithJohnAdam
DBDatabase systemsJohnSmithAdam
DBDatabase systemsAdamSmithJohn

Expected:

<output>
  <category>DB</category>
  <title>Database systems</title>
  <name>Smith</name>
  <name>John</name>
  <name>Adam</name>
</output>
<output>
  <category>Others</category>
  <title>Pattern Recognition</title>
  <name>Adam</name>
  <name>Jeff</name>
</output>

The order of the names does not matter. I tried to sort the <name> elements, but it's not working out because it adds too much to the code. Is there any logic in XQuery to get one copy of each from the above XML?

The three <output> elements are structurally different, so a simple deep-equal(...) does not work. When exactly do you consider two elements to contain "the same information"? What if a <name> was duplicated in one of them? Do you only want to disregard ordering? - Leo Wörteler
If the 3 <output> tags are "equivalent" to you, why not just grab the first ([1]) tag and its children? - Jack Fleeting
I consider two elements equal when the info in them is the same. Think of the above case as a textbook and its authors: all three are the same. I can't take only the first, because this is one sample among many other textbooks, and the number of repetitions differs. - rachithr
If that's the case, you may need to expand the sample XML in the question to show another case and how these cases relate to each other; for example, is the "same information" repeated sequentially (for example, 3 times in a row, as in your question) or can it be mingled with "same information" from another book/authors? - Jack Fleeting
Updated the question. The "title" is the key here. I need to know the books and their authors. You can assume that there is no other entry for the same book with a different set of authors. - rachithr

3 Answers

1
votes

Try something along these lines on your actual xml:

let $inv :=
<doc>
 [your xml above]
</doc>
let $titles := $inv//output/title
for $title in distinct-values($titles)
(: compare the title value; a bare [$title] predicate is true for any non-empty string :)
return ($inv//output[title = $title])[1]

Output:

<output>
  <category>DB</category>
  <title>Database systems</title>
  <name>Smith</name>
  <name>John</name>
  <name>Adam</name>
</output>
<output>
  <category>Others</category>
  <title>Pattern Recognition</title>
  <name>Adam</name>
  <name>Jeff</name>
</output>
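
If the title alone were not a reliable key, the names could be compared regardless of their order. The following is only a sketch under assumptions: local:key is a hypothetical helper, not part of the answer above; each <output> is assumed to have exactly one <title>; and the "|" separator is assumed not to occur in the data. The sample is a trimmed copy of the question's XML.

```xquery
xquery version "1.0";

(: Hypothetical helper: builds an order-insensitive key for one <output>
   from its title followed by its names sorted alphabetically. :)
declare function local:key($o as element(output)) as xs:string {
  string-join(
    (string($o/title),
     for $n in $o/name
     order by string($n)
     return string($n)),
    "|")
};

let $inv :=
  <doc>
    <output><title>Database systems</title><name>Smith</name><name>John</name></output>
    <output><title>Database systems</title><name>John</name><name>Smith</name></output>
    <output><title>Pattern Recognition</title><name>Adam</name><name>Jeff</name></output>
  </doc>
(: one key per distinct title-plus-names combination :)
for $key in distinct-values(for $o in $inv/output return local:key($o))
(: keep the first <output> that produces this key :)
return ($inv/output[local:key(.) = $key])[1]
```

Since the question states that a title never appears with a different set of authors, keying on the title alone is enough for the data shown; the sorted key only matters if that guarantee does not hold.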
0
votes

An option could be:

doc("data.xml")//output/*[not(preceding::*=.)]

Output:

<category>DB</category>
<title>Database systems</title>
<name>Smith</name>
<name>John</name>
<name>Adam</name>
0
votes

In XQuery 3, I think the shortest and most efficient approach is to use group by:

for $output in //output
group by $title := $output/title
return head($output)

https://xqueryfiddle.liberty-development.net/jyH9Xv5
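
To try the same idea without an external document, here is a minimal self-contained sketch with a trimmed copy of the question's data inlined. The $final name comes from the question; the <doc> wrapper is an assumption, added only so the sample has a single root element.

```xquery
xquery version "3.0";

(: Trimmed copy of the sample data, wrapped in a hypothetical <doc> root. :)
let $final :=
  <doc>
    <output>
      <category>DB</category>
      <title>Database systems</title>
      <name>Smith</name>
      <name>John</name>
      <name>Adam</name>
    </output>
    <output>
      <category>DB</category>
      <title>Database systems</title>
      <name>John</name>
      <name>Smith</name>
      <name>Adam</name>
    </output>
    <output>
      <category>Others</category>
      <title>Pattern Recognition</title>
      <name>Adam</name>
      <name>Jeff</name>
    </output>
  </doc>
for $output in $final/output
(: string() makes the grouping key an explicit single string per <output> :)
group by $title := string($output/title)
(: after grouping, $output holds every <output> in the group; keep the first :)
return head($output)
```

This should return one <output> per distinct title. The order in which the groups come back is implementation-dependent, so add an order by clause if a particular order is needed.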