2
votes

I have an XML file made up of <element> elements, and each <element> has two child elements, <a> and <b>.

I want to write a query using XQuery that returns true or false.

Return false: if any <a> has the same value as an <a> in another <element> and their <b> values are different.

Otherwise return true: either the <a> values are different in each <element>, or some <a> values match but their <b> values are the same.

For example:

<root>
<element>
   <a>ttt</a>
   <b>tttsame</b>
</element>
<element>
   <a>ttt</a>
   <b>tttsame</b>
</element>
<element>
   <a/>
   <b>value</b>
</element>
<element>
   <a>rrr</a>
   <b>rrrvalue</b>
</element>
<element>
   <a>mmm</a>
   <b>rrrvalue</b>
</element>
<element>
   <a>mmm</a>
   <b>rrrvalue</b>
</element>
</root>

This one is okay and should return true.

<root>
<element>
   <a>ttt</a>
   <b>ttt value</b>
</element>
<element>
   <a>ttt</a>
   <b>ttrdiff</b>
</element>
<element>
   <a/>
   <b>value</b>
</element>
<element>
   <a>mmm</a>
   <b>rrrvalue</b>
</element> 
</root>

This one shouldn't be accepted, because ttt has two different <b> values; it should return false.

3
This is still unclear. What's the difference between the "good" XML and the "bad" XML? - zx485
There is still no clear relation between input XML and output XML. - zx485

3 Answers

2
votes

You could group on a and then check if there is more than one distinct b in any group, for instance with

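not
(
    for $a-group in root/element
    group by $a := $a-group/a
    (: exists() tests for a non-empty sequence; a bare tail(...) would raise
       an effective boolean value error for three or more distinct b values :)
    where exists(tail(distinct-values($a-group/b)))
    return $a-group
)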

https://xqueryfiddle.liberty-development.net/6qM2e2r/0 and https://xqueryfiddle.liberty-development.net/6qM2e2r/1 show the query run against your two input samples.

As for how it works, the question asks to return false "if there is any <a> has the same value as another <a> in another element && there <b>'s value are different".

To find element elements with the same a child element, we can use group by $a := $a-group/a inside a for $a-group in root/element expression. The distinct b values in each group of elements with the same a value are computed by distinct-values($a-group/b). If there are at least two different b values, then tail(distinct-values($a-group/b)) contains at least one value; otherwise it is an empty sequence.

This works because of how XQuery 3's group by clause rebinds variables: "In the post-grouping tuple generated for a given group, each non-grouping variable is bound to a sequence containing the concatenated values of that variable in all the pre-grouping tuples that were assigned to that group" (https://www.w3.org/TR/xquery-31/#id-group-by). So after the group by $a := $a-group/a clause, the variable $a-group is bound to the sequence of element elements that share the same grouping key, which is based on the a child element.
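The rebinding of the non-grouping variable can be seen in a minimal sketch. The inline data and the names $elements, $e and $key are made up for illustration and are not taken from the question's file:

```xquery
xquery version "3.1";

(: Illustrative inline data, not the asker's file. :)
let $elements := (
  <element><a>x</a><b>1</b></element>,
  <element><a>x</a><b>2</b></element>,
  <element><a>y</a><b>3</b></element>
)
for $e in $elements
(: Before grouping, $e is one element; afterwards it is the whole group. :)
group by $key := string($e/a)
order by $key
return concat($key, ": ", count($e), " element(s), ",
              count(distinct-values($e/b)), " distinct b value(s)")
```

This should return the two strings "x: 2 element(s), 2 distinct b value(s)" and "y: 1 element(s), 1 distinct b value(s)". The first group is the one that the where clause in the answer is meant to catch.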

So the complete for .. group by .. where .. return selects the groups of element elements with the same a value where there are at least two different/distinct b values.

As the requirement is to "return false" if any such group exists, the not() function is applied to implement that condition: the effective boolean value of a non-empty sequence of nodes is true, so not(..) gives false whenever there are any elements meeting the condition expressed in the for selection.
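For experimenting without an external file, the same idea can be written as a self-contained sketch. The function name local:consistent and the variables $key and $bad are hypothetical. It uses count(...) > 1 in place of tail(...) and empty(...) in place of not(...), which is assumed to be equivalent for this purpose:

```xquery
xquery version "3.1";

(: Hypothetical helper: true() when no a value is paired with more than
   one distinct b value, false() otherwise. :)
declare function local:consistent($root as element(root)) as xs:boolean {
  empty(
    for $e in $root/element
    group by $key := string($e/a)
    where count(distinct-values($e/b)) > 1
    return $key
  )
};

(: Inline copy of the question's second sample. :)
let $bad :=
  <root>
    <element><a>ttt</a><b>ttt value</b></element>
    <element><a>ttt</a><b>ttrdiff</b></element>
    <element><a/><b>value</b></element>
    <element><a>mmm</a><b>rrrvalue</b></element>
  </root>
return local:consistent($bad)
```

With this input the ttt group has two distinct b values, so the expected result is false. Returning $key rather than the elements also makes it easy to drop the empty(...) wrapper and list the offending a values while debugging.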

2
votes

Simple XPath 2.0:

empty(
        (for $parentA-Dubled in /*/*[a = following-sibling::*/a]
           return
             empty($parentA-Dubled/following-sibling::*
                                        [$parentA-Dubled/a eq a and $parentA-Dubled/b ne b])
        )
        [not(.)]
      )

XSLT 2.0 - based verification:

<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output omit-xml-declaration="yes" indent="yes"/>

  <xsl:template match="/">
    <xsl:value-of select=
    "empty(
            (for $parentA-Dubled in /*/*[a = following-sibling::*/a]
              return
                empty($parentA-Dubled/following-sibling::*
                                         [$parentA-Dubled/a eq a and $parentA-Dubled/b ne b])
            )
             [not(.)]
          )
     "/>
    </xsl:template>
</xsl:stylesheet>

When this transformation is applied on any XML document, it evaluates the XPath expression and outputs the result of this evaluation.

When applied on the first provided XML document, the wanted, correct result is produced:

true

When applied on the second provided XML document, again the wanted, correct result is produced:

false

Explanation:

This sub-expression:

(for $parentA-Dubled in /*/*[a = following-sibling::*/a]
               return
                 empty($parentA-Dubled/following-sibling::*
                          [$parentA-Dubled/a eq a and $parentA-Dubled/b ne b])
)

evaluates to a sequence of boolean values: true() / false()

true() is returned when this is true:

empty($parentA-Dubled/following-sibling::*
                          [$parentA-Dubled/a eq a and $parentA-Dubled/b ne b])

This means that true() is returned for every $parentA-Dubled for which no following sibling has an a child with the same value as $parentA-Dubled/a but a b child whose value differs from that of $parentA-Dubled/b.

To summarize: true() is returned for an element when every following element with the same a value also has the same b value.

When, then, is false() returned?

Returning false() means that empty() returned false() -- that is, there exists at least one occasion of two a elements that have the same value, but their b siblings have different values.

Thus, the sub-expression above returns a sequence such as:

true(), true(), true(), ..., true() -- all values are true()

or

true(), true(), true(), ..., false(), ..., true() -- at least one of the values is false()

The original problem requires us to return true() in the first case and to return false() in the second case.

This is easy to express as:

empty($booleanSequence[. eq false()]) -- and this is equivalent to the shorter:

empty($booleanSequence[not(.)])
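The reduction can be tried on literal sequences. In this small sketch the variable names $allTrue and $oneFalse are made up, and the every ... satisfies form is shown only as an equivalent way of writing the same test:

```xquery
xquery version "3.1";

let $allTrue  := (true(), true(), true())
let $oneFalse := (true(), false(), true())
return (
  empty($allTrue[not(.)]),            (: true: nothing survives the filter :)
  empty($oneFalse[not(.)]),           (: false: one false() survives :)
  every $x in $oneFalse satisfies $x  (: false: same reduction, spelled out :)
)
```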

Now, we just need to substitute in the above expression $booleanSequence with the first sub-expression that we analyzed above:

(for $parentA-Dubled in /*/*[a = following-sibling::*/a]
               return
                 empty($parentA-Dubled/following-sibling::*
                          [$parentA-Dubled/a eq a and $parentA-Dubled/b ne b])
)

Thus we obtain the complete XPath expression that solves the original problem:

empty(
        (for $parentA-Dubled in /*/*[a = following-sibling::*/a]
           return
             empty($parentA-Dubled/following-sibling::*
                                        [$parentA-Dubled/a eq a and $parentA-Dubled/b ne b])
        )
        [not(.)]
      )
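
As an alternative phrasing (not the expression above, but one assumed to give the same answer), the nested empty(...) calls can be folded into a quantified expression. The sketch wraps an inline copy of the question's second sample in a document node so that it runs without an input file; $doc and $e are illustrative names:

```xquery
xquery version "3.1";

(: Inline copy of the question's second sample, wrapped in a document node
   so that the /*/* style of path has a root to start from. :)
let $doc := document {
  <root>
    <element><a>ttt</a><b>ttt value</b></element>
    <element><a>ttt</a><b>ttrdiff</b></element>
    <element><a/><b>value</b></element>
    <element><a>mmm</a><b>rrrvalue</b></element>
  </root>
}
return
  (: true only if no element has a later sibling with the same a
     but a different b :)
  every $e in $doc/*/* satisfies
    empty($e/following-sibling::*[a eq $e/a and b ne $e/b])
```

For this input the first ttt element has a following sibling with the same a and a different b, so the expected result is false. The quantified form checks every element, not only those whose a value is duplicated, which is simpler to read at the cost of a few extra comparisons.
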
0
votes

Try this XQuery code to keep only one element per distinct <a> value (the question does not specify which <b> value to keep; here, the first matching element is chosen):

let $file := doc("input.xml")/root,
    $vals := distinct-values($file/element/a) return
  <root>
    {for $i in $vals return $file/element[a=$i][1]}
  </root>

Its result is:

<root>
    <element>
        <a>ttt</a>
        <b>ttt value</b>
    </element>
    <element>
        <a/>
        <b>value</b>
    </element>
    <element>
        <a>rrr</a>
        <b>rrrvalue</b>
    </element>
    <element>
        <a>mmm</a>
        <b>rrrvalue</b>
    </element>
</root>