1 vote

I have removed duplicate entries based on a single attribute in XML. My problem is that I now need to remove duplicates by comparing multiple attributes.

Input:

    <Id>
        <tbl_Keysight_Input Auto_Id="66365" Product_No="10070D" Product_Option="10070D"/>
        <tbl_Keysight_Input Auto_Id="66365" Product_No="10070D" Product_Option="10070D"/>
        <tbl_Keysight_Input Auto_Id="66365" Product_No="10070D1" Product_Option="10070D"/>
        <tbl_Keysight_Input Auto_Id="66365" Product_No="10070D1" Product_Option="10070D"/>
        <tbl_Keysight_Input Auto_Id="66365" Product_No="10070D" Product_Option="10070D"/>
    </Id>

Expected output:

  <Id>
    <tbl_Keysight_Input Auto_Id="66365" Product_No="10070D" Product_Option="10070D"/>
    <tbl_Keysight_Input Auto_Id="66365" Product_No="10070D1" Product_Option="10070D"/>
  </Id>

Please provide an XQuery query for this requirement.

The query below removes duplicates based on Auto_Id only:

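for $d in distinct-values(xdmp:directory("/documents/", "1")//Id/tbl_Keysight_Input/@Auto_Id)
(: element names are case-sensitive, so this path uses Id, matching the line above :)
let $items := xdmp:directory("/documents/", "1")//Id/tbl_Keysight_Input[@Auto_Id = $d]
order by $d
return
    (: keep only the first element for each Auto_Id :)
    for $i in $items[position() le 1]
    return $i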
Using fn:deep-equal()? - har07
@har07 It only returns true or false, but I need the resulting elements. - Antony
Can't you apply the same strategy you used when removing duplicates based on one attribute? Only now use deep-equal() to determine equality between two elements (instead of comparing one attribute). - har07
Hi @har07, I have added my query. How can I check for duplicates based on Auto_Id and Product_No? - Antony
I'm curious about the functional side of this. What is the use case behind this? Where is the input coming from? Bear in mind that aggregate documents like the input above prevent solutions that would scale to large amounts of data. I'd normally think about storing items separately, and maybe using an MD5 or SHA hash key to trace duplicates. - grtjn

2 Answers

1 vote

Assuming that all elements to be compared reside within the same parent element, you can check, for each tbl_Keysight_Input, if any preceding-sibling element is deep-equal, and only return tbl_Keysight_Input where none of the preceding elements are deep-equal. So for each group of elements with the same attributes, only the first element will be taken since that one has no preceding duplicate.

I don't have MarkLogic available to test this, but the following should illustrate the idea in XQuery:

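(: element names are case-sensitive: the sample data uses Id, not id :)
for $x in xdmp:directory("/documents/", "1")/Id/tbl_Keysight_Input
(: keep $x only when no earlier sibling is deep-equal to it :)
where count($x/preceding-sibling::tbl_Keysight_Input[fn:deep-equal(., $x)]) = 0
return $x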
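
If only some attributes should decide equality (the follow-up comment asks about Auto_Id and Product_No), the same preceding-sibling idea can compare those attributes directly instead of calling fn:deep-equal(). The following is a minimal sketch, not tested on MarkLogic; it uses an inline, shortened copy of the question's sample in place of xdmp:directory so that it is self-contained, and the variable name $doc is only illustrative.

```xquery
xquery version "1.0";

(: Inline, shortened copy of the sample input; a real query would read
   the elements from the database instead (assumption for illustration). :)
let $doc :=
  <Id>
    <tbl_Keysight_Input Auto_Id="66365" Product_No="10070D" Product_Option="10070D"/>
    <tbl_Keysight_Input Auto_Id="66365" Product_No="10070D1" Product_Option="10070D"/>
    <tbl_Keysight_Input Auto_Id="66365" Product_No="10070D" Product_Option="10070D"/>
  </Id>
for $x in $doc/tbl_Keysight_Input
(: keep $x only if no earlier sibling has the same Auto_Id and Product_No :)
where empty(
  $x/preceding-sibling::tbl_Keysight_Input[
    @Auto_Id = $x/@Auto_Id and @Product_No = $x/@Product_No
  ]
)
return $x
```

For this sample the third element should be dropped, because its Auto_Id and Product_No match the first one. Like the query above, this only compares elements that share a parent.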
1 vote

The easiest way to compare and filter would be to use fn:deep-equal(). Since you have a directory of XML documents and want to compare these elements across documents, you may need to use a temporary XML structure.

You could select all of the tbl_Keysight_Input elements and put them into a temporary element structure, so that they share the same parent element. Then select and iterate through each tbl_Keysight_Input element and use fn:deep-equal() in a predicate to ensure that each one is unique.

The following will work, but depending on the number of documents in the directory, and the number of tbl_Keysight_Input elements that they contain, this might not scale.

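(: element names are case-sensitive: the sample data uses Id, not id :)
for $x in <temp>{xdmp:directory("/documents/", "1")/Id/tbl_Keysight_Input}</temp>/*
(: the predicate is non-empty only when no earlier sibling is deep-equal to $x :)
where $x[not(preceding-sibling::*[fn:deep-equal(., $x)])]
return $x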
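
An alternative that avoids pairwise deep-equal comparisons is to build one string key per element from the attributes of interest and keep the first element for each distinct key. This is only a sketch under stated assumptions: local:key is a hypothetical helper, "|" is an assumed separator that must not occur in the attribute values, and the inline sequence stands in for the elements selected from the directory. It has not been run on MarkLogic.

```xquery
xquery version "1.0";

(: Hypothetical helper: joins the attributes that define a duplicate.
   string() turns a missing attribute into "", so the key keeps three parts. :)
declare function local:key($e as element()) as xs:string {
  string-join(
    (string($e/@Auto_Id), string($e/@Product_No), string($e/@Product_Option)),
    "|"
  )
};

(: Inline stand-in for the elements selected from the documents. :)
let $items := (
  <tbl_Keysight_Input Auto_Id="66365" Product_No="10070D" Product_Option="10070D"/>,
  <tbl_Keysight_Input Auto_Id="66365" Product_No="10070D1" Product_Option="10070D"/>,
  <tbl_Keysight_Input Auto_Id="66365" Product_No="10070D" Product_Option="10070D"/>
)
(: one pass per distinct key; return the first element carrying that key :)
for $k in distinct-values(for $i in $items return local:key($i))
return ($items[local:key(.) = $k])[1]
```

Note that distinct-values does not guarantee the order of its results, so add an order by clause if the output order matters. To deduplicate on fewer attributes, drop the unwanted ones from local:key.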