The following code worked on a small sample of my data, but I didn't realize that the sample contained no entries with multiple comments. When I applied the code to the actual database, which has multiple comments per entry, I received the error mentioned above.
Current code:
for $doc in doc('test')
let $results :=
  (
    let $pKeywords := ('best clients', 'Very', '20')
    return
      for $kw in $pKeywords
      return
        (
          $doc/set/entry[contains(comment, concat('!', $kw))],
          $doc/set/entry[contains(comment, $kw)]
        )
        [not(position() gt 2)]
  )
for $i in (1 to count($results))
return
  (
    subsequence($results/comment, $i, 1),
    subsequence($results/buyer, $i, 1)
  )
Document:
<set>
<entry>
<comment>The client is only 20 years old. Do not be surprised by his youth.</comment>
<buyer></buyer>
<id>1282</id>
<industry>International Trade; Fish and Game</industry>
</entry>
<entry>
<comment>!On leave in October.</comment>
<comment>!Planning to make a large purchase before Christmas.</comment>
<buyer></buyer>
<id>709</id>
<industry>Real Estate</industry>
</entry>
<entry>
<comment>Is often !out between 1 and 3 p.m.</comment>
<buyer></buyer>
<id>127</id>
<industry>Virus Software Marketting</industry>
</entry>
<entry>
<comment>Very personable. One of our best clients.</comment>
<buyer></buyer>
<id>14851</id>
<industry>Administrative support.</industry>
</entry>
<entry>
<comment>!Very difficult to reach, but one of our top buyers.</comment>
<comment>His wife often answers the phone. That means he is out of the office.</comment>
<buyer></buyer>
<id>1458</id>
<industry>Construction</industry>
</entry>
<entry>
<comment></comment>
<buyer></buyer>
<id>276470</id>
<industry>Bulk Furniture Sales</industry>
</entry>
<entry>
<comment>A bit of an eccentric. One of our best clients.</comment>
<buyer></buyer>
<id>1506</id>
<industry>Sports Analysis</industry>
</entry>
<entry>
<comment>Very gullible, so please !be sure she needs what you sell her. She's one of our best clients.</comment>
<buyer></buyer>
<id>1523</id>
<industry>International Trade</industry>
</entry>
<entry>
<comment>He wants to buy everything, but !he has a tight budget.</comment>
<comment>!His company may be closing soon.</comment>
<buyer></buyer>
<id>1524</id>
<industry>Public Relations</industry>
</entry>
</set>
The result:
Stopped at line 9, column 22: [XPTY0004] document-node()(...): function(item()*) as item()* expected, document-node() found.
I ran into a similar error before and was able to fix it, but applying the same fix here did not work. Example:
$doc('test')/set/entry[contains(., concat('!', $kw))],
$doc('test')/set/entry[contains(., $kw)]
This returns the same error.
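A note on the message itself, as far as I understand it: in XQuery 3.0, a variable reference followed directly by parentheses, as in `$doc('test')`, is read as a dynamic function call on the value of `$doc`, not as a call to `doc()`. Since `$doc` holds a document node rather than a function, that would account for "function(item()*) as item()* expected, document-node() found". A minimal sketch of the two spellings that are plain path expressions, assuming the same document name and keywords as above:

```xquery
(: Sketch only: the document name 'test' and the keywords come from the code above. :)
for $doc in doc('test')
for $kw in ('best clients', 'Very', '20')
return (
  (: $doc is already the document node, so the path starts from it directly :)
  $doc/set/entry[contains(., concat('!', $kw))],
  (: the same path started from the doc() function instead of the variable :)
  doc('test')/set/entry[contains(., $kw)]
)
```

Note that `contains(., $kw)` in this position tests the whole text of each entry (id and industry included), not only its comments.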
Walking through the desired result:
The first return should return every entry and its children if the entry's comment child contains any of the three keywords in $pKeywords.
concat('!', $kw) is supposed to make !-containing comments the priority.
The second return slices the comment and buyer nodes from the results of the first return.
As long as every entry has exactly one comment element, the code executes fine. When an entry has two or more comment elements, the code fails with the error mentioned above:
Stopped at line 9, column 22: [XPTY0004] document-node()(...): function(item()*) as item()* expected, document-node() found.
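A minimal sketch of the difference, assuming the failure with several comments comes from `contains()` receiving a sequence of `comment` nodes as its first argument (it accepts at most one item there). Moving the test into a predicate on `comment` checks each comment on its own, so an entry matches when at least one of its comments does:

```xquery
(: Sketch only: the document name 'test' and the keywords come from the code above. :)
let $entries := doc('test')/set/entry
for $kw in ('best clients', 'Very', '20')
return (
  (: entries where at least one comment contains '!' directly followed by the keyword :)
  $entries[comment[contains(., concat('!', $kw))]],
  (: entries where at least one comment contains the keyword :)
  $entries[comment[contains(., $kw)]]
)[position() le 2]
```

An entry that satisfies both predicates appears twice in the combined sequence, so the limit of two per keyword can be used up by a single entry.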
-Edit-
Desired result:
<comment>The client is only 20 years old. Do not be surprised by his youth.</comment>
<buyer/>
<comment>Very personable. One of our best clients.</comment>
<buyer/>
<comment>!Very difficult to reach, but one of our top buyers.</comment>
<buyer/>
<comment>A bit of an eccentric. One of our best clients.</comment>
<buyer/>
Clarifying the desired result:
//contains ! and the first keyword, "best clients"; so, the first result should come from this entry.
<entry>
<comment>Very gullible, so please !be sure she needs what you sell her. She's one of our best clients.</comment>
<buyer></buyer>
<id>1523</id>
<industry>International Trade</industry>
</entry>
//Only one entry contains ! and "best clients". So, the second result should come from the first entry that contains "best clients" without !.
<entry>
<comment>Very personable. One of our best clients.</comment>
<buyer></buyer>
<id>14851</id>
<industry>Administrative support.</industry>
</entry>
//This contains ! and the second keyword, "Very", but it is a duplicate. So, ideally its children should not be returned.
<entry>
<comment>!Very difficult to reach, but one of our top buyers.</comment>
<comment>His wife often answers the phone. That means he is out of the office.</comment>
<buyer></buyer>
<id>1458</id>
<industry>Construction</industry>
</entry>
//This contains ! and a string, "very" (part of everything). Nodes from this entry should be returned as the third result.
<entry>
<comment>He wants to buy everything, but !he has a tight budget.</comment>
<comment>!His company may be closing soon.</comment>
<buyer></buyer>
<id>1524</id>
<industry>Public Relations</industry>
</entry>
//The only entry whose comment child contains the keyword '20'. There is no '!'-containing comment with 20, so this node is the top and only node whose children should be returned.
<entry>
<comment>The client is only 20 years old. Do not be surprised by his youth.</comment>
<buyer></buyer>
<id>1282</id>
<industry>International Trade; Fish and Game</industry>
</entry>
-Edit 2-
My next pass gives a better idea of what I'm trying to accomplish, but it has some obvious syntax errors (for example, I'm still discovering how to work with arrays, as seen on line 8). I will update this as I resolve the syntax errors:
<set>
{
let $kw := ('best clients', 'Very', '20')
let $entry := doc('test')/set/entry
let $priority := '!'
for $i in (1, count($kw))
let $priority_result[$i] :=
(
for $entries in $entry
where $entry contains(., $priority) and where $entry contains $kw[$i]
return subsequence($priority_result[$i], 1, 2)
)
if $priority_result[$i] < 2
for $i in (1, count($kw))
let $secondary_result[$i] :=
(
for $entries in $entry
where $entry contains $kw[$i] and where $entry not($priority_result) and where $entry not($secondary_result[1..($i-1)])
return $secondary_result[$i]
)
else let $secondary_result[$i] := ''
for $i in (1, count($kw))
return
(
$primary_result[$i],
$secondary_result[$i]
)
}
</set>
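For comparison, here is one way the intent of this pass might be written as valid XQuery 3.0. It is only a sketch under assumptions that are not confirmed above: an entry counts as a priority match when one of its comments contains both the keyword and '!', at most two entries are kept per keyword, an entry already picked for an earlier keyword is skipped, and matching is case-sensitive because that is how `contains()` behaves by default. The names `$entries`, `$picked`, `$seen`, `$candidates`, `$first` and `$second` are placeholders, and the sketch relies on `fold-left`, so it needs a processor that supports XQuery 3.0 or later.

```xquery
(: Sketch only, under the assumptions listed in the paragraph above. :)
let $keywords := ('best clients', 'Very', '20')
let $entries  := doc('test')/set/entry
let $priority := '!'
let $picked :=
  (: carry the entries chosen so far ($seen) from one keyword to the next :)
  fold-left($keywords, (), function($seen, $kw) {
    (: entries with a matching comment that were not picked for an earlier keyword :)
    let $candidates := $entries[comment[contains(., $kw)]] except $seen
    (: priority matches: a single comment holds both the keyword and '!' :)
    let $first  := $candidates[comment[contains(., $kw) and contains(., $priority)]]
    let $second := $candidates except $first
    (: priority matches come first; keep at most two entries per keyword :)
    return ($seen, subsequence(($first, $second), 1, 2))
  })
return
  <set>{
    for $e in $picked
    return ($e/comment, $e/buyer)
  }</set>
```

The fold carries the picked entries from keyword to keyword, which stands in for the indexed `$priority_result[$i]` and `$secondary_result[$i]` variables, since a `let` clause can only bind a plain variable name.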
And the suggested change, which returns an empty result:
for $doc in doc('test')
let $results :=
  (
    let $pKeywords := ('best clients', 'Very', '20')
    return
      for $kw in $pKeywords
      return
        (
          $doc/set/entry/comment[contains(., concat('!', $kw))],
          $doc/set/entry/comment[contains(., $kw)]
        )
        [not(position() gt 2)]
  )
for $i in (1 to count($results))
return
  (
    subsequence($results/comment, $i, 1),
    subsequence($results/buyer, $i, 1)
  )
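A likely reason for the empty result, as far as I can tell: with this change `$results` holds `comment` elements rather than `entry` elements, so `$results/comment` and `$results/buyer` look for children of a comment and find none. A minimal sketch that returns each matched comment and then steps up to its parent entry for the buyer, assuming the same document name and keywords:

```xquery
(: Sketch only: the document name 'test' and the keywords come from the code above. :)
let $doc := doc('test')
let $results :=
  for $kw in ('best clients', 'Very', '20')
  return (
    $doc/set/entry/comment[contains(., concat('!', $kw))],
    $doc/set/entry/comment[contains(., $kw)]
  )[position() le 2]
for $c in $results
(: $c is a comment element, so the buyer is reached through the parent entry :)
return ($c, $c/parent::entry/buyer)
```

A comment that matches both predicates still appears twice in `$results`, as in the original query.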