
I found that the following code worked on a small subset of my data, but I didn't realize that none of my samples had multiple comments. When I applied the code to the actual database, which has multiple comments per entry, I received the error shown below.

Current code:

for $doc in doc('test')
let $results :=
(
  let $pKeywords := ('best clients', 'Very', '20')
  return
    for $kw in $pKeywords
    return
    (
      $doc/set/entry[contains(comment, concat('!', $kw))],
      $doc/set/entry[contains(comment, $kw)]
    )
  [not(position() gt 2)]
)
for $i in (1 to count($results))
return
(
  subsequence($results/comment, $i, 1),
  subsequence($results/buyer, $i, 1)
)

Document:

<set>
  <entry>
    <comment>The client is only 20 years old.  Do not be surprised by his youth.</comment>
    <buyer></buyer>
    <id>1282</id>
    <industry>International Trade; Fish and Game</industry>
  </entry>
  <entry>
    <comment>!On leave in October.</comment>
    <comment>!Planning to make a large purchase before Christmas.</comment>
    <buyer></buyer>
    <id>709</id>
    <industry>Real Estate</industry>
  </entry>
  <entry>
    <comment>Is often !out between 1 and 3 p.m.</comment>
    <buyer></buyer>
    <id>127</id>
    <industry>Virus Software Marketting</industry>
  </entry>
  <entry>
    <comment>Very personable.  One of our best clients.</comment>
    <buyer></buyer>
    <id>14851</id>
    <industry>Administrative support.</industry>
  </entry>
  <entry>
    <comment>!Very difficult to reach, but one of our top buyers.</comment>
    <comment>His wife often answers the phone.  That means he is out of the office.</comment>
    <buyer></buyer>
    <id>1458</id>
    <industry>Construction</industry>
  </entry>
  <entry>
    <comment></comment>
    <buyer></buyer>
    <id>276470</id>
    <industry>Bulk Furniture Sales</industry>
  </entry>
  <entry>
    <comment>A bit of an eccentric.  One of our best clients.</comment>
    <buyer></buyer>
    <id>1506</id>
    <industry>Sports Analysis</industry>
  </entry>
  <entry>
    <comment>Very gullible, so please !be sure she needs what you sell her.  She's one of our best clients.</comment>
    <buyer></buyer>
    <id>1523</id>
    <industry>International Trade</industry>
  </entry>
  <entry>
    <comment>He wants to buy everything, but !he has a tight budget.</comment>
    <comment>!His company may be closing soon.</comment>
    <buyer></buyer>
    <id>1524</id>
    <industry>Public Relations</industry>
  </entry>
</set>

The result:

Stopped at line 9, column 22: [XPTY0004] document-node()(...): function(item()*) as item()* expected, document-node() found.

I ran into a similar error before and was able to fix it, but applying the same fix here did not work. For example:

  $doc('test')/set/entry[contains(., concat('!', $kw))],
  $doc('test')/set/entry[contains(., $kw)]

returns the same error.

Walking through the desired result:

The first return should return every entry, with its children, if any of the entry's comment children contains one of the three keywords in $pKeywords.

concat('!', $kw) is meant to give priority to comments that contain a !.
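For reference, here is a quick check of what concat('!', $kw) produces and what contains() then matches. The comment strings are shortened from the sample document, and the case-sensitivity note assumes the default Unicode codepoint collation (the usual default): the ! has to sit directly before the keyword for the priority branch to match.

```xquery
let $kw := ('best clients', 'Very', '20')
return (
  (: concat() joins its arguments into one string :)
  concat('!', $kw[1]),
  (: true: "!Very" occurs verbatim in this comment :)
  contains('!Very difficult to reach', concat('!', $kw[2])),
  (: false: the "!" appears elsewhere, not directly before "Very" :)
  contains('Very gullible, so please !be sure', concat('!', $kw[2])),
  (: false: codepoint comparison is case-sensitive, "very" is not "Very" :)
  contains('He wants to buy everything', $kw[2])
)
(: returns "!best clients", true, false, false :)
```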

The second return slices the comment and buyer nodes from the results of the first return.
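One detail of the first return that is easy to miss: the [not(position() gt 2)] filter directly follows the parenthesized pair of paths, so it is applied to each keyword's sequence separately rather than to the whole result. A stand-alone illustration, with the hypothetical strings 'a' and 'b' standing in for the keywords:

```xquery
(: The predicate is attached to the parenthesized sequence, so it is
   evaluated once per $kw and keeps the first two items of each. :)
for $kw in ('a', 'b')
return (concat($kw, '1'), concat($kw, '2'), concat($kw, '3'))[not(position() gt 2)]
(: returns "a1", "a2", "b1", "b2" :)
```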

As long as there is exactly one comment element in every entry, the code executes fine. When an entry has two or more comment elements, the code fails, and the processor reports the error mentioned above:

Stopped at line 9, column 22: [XPTY0004] document-node()(...): function(item()*) as item()* expected, document-node() found.

-Edit-

Desired result:

<comment>The client is only 20 years old.  Do not be surprised by his youth.</comment>
<buyer/>
<comment>Very personable.  One of our best clients.</comment>
<buyer/>
<comment>!Very difficult to reach, but one of our top buyers.</comment>
<buyer/>
<comment>A bit of an eccentric.  One of our best clients.</comment>
<buyer/>

Clarifying the desired result:

<!-- Contains ! and the first keyword, "best clients", so the first result should come from this entry. -->
  <entry>
    <comment>Very gullible, so please !be sure she needs what you sell her.  She's one of our best clients.</comment>
    <buyer></buyer>
    <id>1523</id>
    <industry>International Trade</industry>
  </entry>

<!-- Only one entry contains both ! and "best clients", so the next entry containing "best clients" supplies the nodes for the second result. -->
  <entry>
    <comment>Very personable.  One of our best clients.</comment>
    <buyer></buyer>
    <id>14851</id>
    <industry>Administrative support.</industry>
  </entry>

<!-- This contains ! and the second keyword, "Very", but it is a duplicate, so ideally its children should not be returned. -->
  <entry>
    <comment>!Very difficult to reach, but one of our top buyers.</comment>
    <comment>His wife often answers the phone.  That means he is out of the office.</comment>
    <buyer></buyer>
    <id>1458</id>
    <industry>Construction</industry>
  </entry>

<!-- This contains ! and the string "very" (part of "everything"). Nodes from this entry should be returned as the third result. -->
  <entry>
    <comment>He wants to buy everything, but !he has a tight budget.</comment>
    <comment>!His company may be closing soon.</comment>
    <buyer></buyer>
    <id>1524</id>
    <industry>Public Relations</industry>
  </entry>

<!-- The only entry whose comment child contains the keyword "20". No !-containing comment includes 20, so this is the top and only entry whose children should be returned. -->
  <entry>
    <comment>The client is only 20 years old.  Do not be surprised by his youth.</comment>
    <buyer></buyer>
    <id>1282</id>
    <industry>International Trade; Fish and Game</industry>
  </entry>

-Edit 2-

The next pass gives a better idea of what I'm trying to accomplish, but it contains some obvious syntax errors (for example, I'm still learning how to work with arrays, as seen on line 8). I will update this as I resolve the syntax errors:

<set>
{
    let $kw := ('best clients', 'Very', '20')
    let $entry := doc('test')/set/entry
    let $priority := '!'

    for $i in (1, count($kw))
    let $priority_result[$i] :=
    (
        for $entries in $entry
        where $entry contains(., $priority) and where $entry contains $kw[$i]
        return subsequence($priority_result[$i], 1, 2)
    )

    if $priority_result[$i] < 2
    for $i in (1, count($kw))
    let $secondary_result[$i] :=
    (
        for $entries in $entry
        where $entry contains $kw[$i] and where $entry not($priority_result) and where $entry not($secondary_result[1..($i-1)])
        return $secondary_result[$i]
    )
    else let $secondary_result[$i] := ''

    for $i in (1, count($kw))
    return
    (
        $primary_result[$i],
        $secondary_result[$i]
    )
}
</set>
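The intent of this pass (a priority pass, then a secondary pass that skips entries already chosen, at most two per keyword) can be sketched in XQuery 3.0 with fold-left(), which carries the entries chosen so far from one keyword to the next. This is only a sketch under assumptions read from the pseudocode: "priority" means a single comment containing both ! and the keyword, and matching is case-sensitive.

```xquery
(: Sketch of the Edit 2 logic. Assumptions:
   - priority entry = one comment contains both '!' and the keyword
   - at most two entries per keyword, priority entries first
   - an entry chosen for an earlier keyword is skipped later :)
let $entries  := doc('test')/set/entry
let $keywords := ('best clients', 'Very', '20')
let $selected :=
  fold-left($keywords, (), function($chosen, $kw) {
    (: entries with at least one comment containing the keyword :)
    let $matches   := $entries[comment[contains(., $kw)]]
    (: entries where a single comment contains both '!' and the keyword :)
    let $priority  := $matches[comment[contains(., '!') and contains(., $kw)]]
    let $secondary := $matches except $priority
    (: drop entries already chosen for an earlier keyword :)
    let $fresh     := ($priority, $secondary)[not(some $c in $chosen satisfies $c is .)]
    return ($chosen, subsequence($fresh, 1, 2))
  })
for $entry in $selected
return ($entry/comment, $entry/buyer)
```

On the sample document this should pick entries 1523 and 14851 for "best clients", 1458 for "Very" (14851 and 1523 are already taken) and 1282 for "20". Wrapping both arguments of contains() in lower-case() would also let "very" inside "everything" match.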

And here is the suggested change, which returns an empty result:

for $doc in doc('test')
let $results :=
(
  let $pKeywords := ('best clients', 'Very', '20')
  return
    for $kw in $pKeywords
    return
    (
      $doc/set/entry/comment[contains(., concat('!', $kw))],
      $doc/set/entry/comment[contains(., $kw)]
    )
  [not(position() gt 2)]
)
for $i in (1 to count($results))
return
(
  subsequence($results/comment, $i, 1),
  subsequence($results/buyer, $i, 1)
)

2 Answers


The error message seems to be complaining about trying to call a document-node() as a function.

$doc('test') vs $doc


Either that, or contains(...) only works on a single node, not on a sequence of nodes:

contains(comment, $kw) vs comment/contains(.,$kw)
or comment[contains(.,$kw)]
or comment[contains(text(),$kw)]
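Both points can be checked on a two-comment entry like the one with id 709. A minimal sketch, with the entry inlined into a document node as a stand-in for doc('test') (the inline data is an assumption for illustration):

```xquery
(: Assumption: a tiny inline document stands in for doc('test'). :)
let $doc := document {
  <set>
    <entry>
      <comment>!On leave in October.</comment>
      <comment>!Planning to make a large purchase before Christmas.</comment>
    </entry>
  </set>
}
(: $doc is already a document node, so paths start at $doc/...;
   $doc('test') would try to call the node as a function. :)
return (
  (: contains(comment, 'October') would raise XPTY0004 on this entry,
     because it has two comment children; test each comment instead :)
  count($doc/set/entry[comment[contains(., 'October')]]),
  $doc/set/entry/comment[contains(., 'Christmas')]/string()
)
(: returns 1, "!Planning to make a large purchase before Christmas." :)
```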


This worked for me:

<set>{
    for $entry in doc('test')/set/entry
    let $kw := (
        for $prefix in ('!','')
        for $kw in ('best clients', 'Very', '20')
        where exists($entry/comment[contains(., concat($prefix,$kw))])
        return concat($prefix,$kw)
    )[1]
    where exists($kw)
    order by not(starts-with($kw,'!'))
    return <entry keyword="{$kw}">{
      ( $entry/comment,
        $entry/buyer )
    }</entry>
}</set>

Result (multiple comments per <entry>):

<set>
   <entry keyword="!Very">
      <comment>!Very difficult to reach, but one of our top buyers.</comment>
      <comment>His wife often answers the phone.  That means he is out of the office.</comment>
      <buyer/>
   </entry>
   <entry keyword="20">
      <comment>The client is only 20 years old.  Do not be surprised by his youth.</comment>
      <buyer/>
   </entry>
   <entry keyword="best clients">
      <comment>Very personable.  One of our best clients.</comment>
      <buyer/>
   </entry>
   <entry keyword="best clients">
      <comment>A bit of an eccentric.  One of our best clients.</comment>
      <buyer/>
   </entry>
   <entry keyword="best clients">
      <comment>Very gullible, so please !be sure she needs what you sell her.  She's one of our best clients.</comment>
      <buyer/>
   </entry>
</set>

This will give you separate entries for each comment:

<set>{
    for $entry in doc('test')/set/entry
    for $comment in $entry/comment
    let $kw := (
        for $prefix in ('!','')
        for $kw in ('best clients', 'Very', '20')
        where exists($comment[contains(., concat($prefix,$kw))])
        return concat($prefix,$kw)
    )[1]
    where exists($kw)
    order by not(starts-with($kw,'!'))
    return <entry keyword="{$kw}">{
      ( $comment,
        $entry/buyer )
    }</entry>
}</set>

Output:

<set>
   <entry keyword="!Very">
      <comment>!Very difficult to reach, but one of our top buyers.</comment>
      <buyer/>
   </entry>
   <entry keyword="20">
      <comment>The client is only 20 years old.  Do not be surprised by his youth.</comment>
      <buyer/>
   </entry>
   <entry keyword="best clients">
      <comment>Very personable.  One of our best clients.</comment>
      <buyer/>
   </entry>
   <entry keyword="best clients">
      <comment>A bit of an eccentric.  One of our best clients.</comment>
      <buyer/>
   </entry>
   <entry keyword="best clients">
      <comment>Very gullible, so please !be sure she needs what you sell her.  She's one of our best clients.</comment>
      <buyer/>
   </entry>
</set>

For reference's sake, this is the code I started with (it's a little daunting, and I still don't understand all of it):

for $doc in doc('test')
let $results :=
(
  let $pKeywords := ('best clients', 'Very', '20')
  return
    for $kw in $pKeywords
    return
    (
      $doc/set/entry[contains(comment, concat('!', $kw))],  (: *1 :)
      $doc/set/entry[contains(comment, $kw)]                (: *1 :)
    )
  [not(position() gt 2)]
)
for $i in (1 to count($results))
return
(
  subsequence($results/comment, $i, 1), (: *2 :)
  subsequence($results/buyer, $i, 1)    (: *2 :)
)

The error itself goes away the usual way, by moving contains() into a predicate on comment (marked *1). It took me a while to catch the second error, marked *2: because the search now goes one level deeper (*1), the result paths have to go back up one level with .. to reach each entry's comment and buyer children:

for $doc in doc('test')
let $results :=
(
  let $pKeywords := ('best clients', 'Very', '20')
  return
    for $kw in $pKeywords
    return
    (
      $doc/set/entry/comment[contains(., concat('!', $kw))], (: *1, went deeper :)
      $doc/set/entry/comment[contains(., $kw)]               (: *1, went deeper :)
    )
  [not(position() gt 2)]
)
for $i in (1 to count($results))
return
(
  subsequence($results/../comment, $i, 1), (: *2, added .. :)
  subsequence($results/../buyer, $i, 1)    (: *2, added .. :)
)
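One side effect of the .. fix may explain why the output does not follow the priority order: a path step returns nodes in document order with duplicates removed. So $results/../comment yields every comment of every matched entry, in document order, and its count can differ from $results/../buyer, which throws off the pairing by $i. A minimal sketch on a hypothetical two-entry set:

```xquery
let $set :=
  <set>
    <entry><comment>A</comment><buyer>x</buyer></entry>
    <entry><comment>B1</comment><comment>B2</comment><buyer>y</buyer></entry>
  </set>
(: matched comments in "priority" order: B2 first, then A :)
let $results := ($set/entry[2]/comment[2], $set/entry[1]/comment)
return (
  count($results/../comment),           (: 3: all comments of both parents :)
  count($results/../buyer),             (: 2: one buyer per parent :)
  string-join($results/../comment, ' ') (: document order, not priority order :)
)
(: returns 3, 2, "A B1 B2" :)
```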

What I'm still struggling with:

1) The use of concat(). My understanding is that it joins its arguments, so for $kw[1] the result would be "!best clients". The output doesn't reflect that, though: in the result, the exclamation point does not always stand directly before the priority keyword.

2) Not returning duplicate results. I'd like every entry to be unique, so I need to add a step somewhere that either keeps duplicates out of the result set or removes them before [not(position() gt 2)], where the number of results is trimmed.
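Part of the duplication comes from the pair of paths itself: any comment that contains concat('!', $kw) also contains $kw, so those entries show up in both halves before [not(position() gt 2)] trims the list. A minimal sketch of removing duplicates while keeping the priority order, on hypothetical id-only entries. A path step such as $results/. would also remove duplicates, but it re-sorts into document order and loses the priority.

```xquery
let $set :=
  <set>
    <entry id="1"/>
    <entry id="2"/>
    <entry id="3"/>
  </set>
(: a result list in priority order that contains a duplicate: 3, 1, 3, 2 :)
let $results := ($set/entry[3], $set/entry[1], $set/entry[3], $set/entry[2])
(: keep only the first occurrence of each node, preserving order :)
let $unique :=
  for $node at $pos in $results
  where not(some $earlier in subsequence($results, 1, $pos - 1)
            satisfies $earlier is $node)
  return $node
(: "!" maps over $unique without re-sorting, unlike a "/" path step :)
return $unique ! string(@id)
(: returns "3", "1", "2" :)
```

In the original query, the same filter could be applied to each keyword's pair of paths before [not(position() gt 2)].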

Thanks to everyone who has looked at this or is working on it! I'm still looking forward to better answers!