0 votes

I am using MarkLogic to query a relatively large XML dataset. I am running two queries like these:

  1. xdmp:estimate(cts:search(fn:doc(), cts:and-query(($query, $text-query, $sent-query))))

  2. xdmp:estimate(cts:search(fn:doc(), cts:and-not-query(cts:and-query(($query, $text-query)), $sent-query)))

where:

    $query      := cts:word-query("diet coke")
    $text-query := cts:word-query("coke")
    $sent-query := cts:and-query((
                     cts:element-range-query(xs:QName("score_id"), ">=", $lowValue),
                     cts:element-range-query(xs:QName("score_id"), "<", $hiValue)))
    $lowValue   := 13264683002210000000
    $hiValue    := 13264683002211000000

For both queries (1 and 2) I get non-zero counts. But when I remove the xdmp:estimate wrapper, cts:search() for query 1 returns the XML documents, whereas for query 2 it returns an empty sequence.

My question is: if cts:search does not return any documents, how can xdmp:estimate count nodes for query 2?

Or does cts:and-not-query not work well with cts:element-range-query?

N.B. xdmp:estimate returns its counts without any errors, and a range index has been created on "score_id".


3 Answers

2 votes

Try adding the unfiltered option to the cts:search calls.

http://docs.marklogic.com/5.0doc/docapp.xqy#display.xqy?fname=http://pubs/5.0doc/apidoc/SearchBuiltins.xml&category=SearchBuiltins&function=cts:search

Chances are that cts:search returns an empty sequence because fragments match the indexes but are then filtered out as false positives. The unfiltered results should help you decide whether that is what is happening, or whether you may have found a bug.
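A minimal sketch of that check, reusing the definitions from the question verbatim (including the literal range values) and assuming the range index on score_id exists as the question states. It counts query 2 once with the default, filtered cts:search and once with the "unfiltered" option. If the unfiltered count is non-zero while the filtered count is zero, every index match was discarded as a false positive during filtering. The `<counts>` wrapper element is just an illustrative name for readable output.

```xquery
xquery version "1.0-ml";

(: Definitions copied from the question. :)
let $lowValue := 13264683002210000000
let $hiValue := 13264683002211000000
let $query := cts:word-query("diet coke")
let $text-query := cts:word-query("coke")
let $sent-query := cts:and-query((
  cts:element-range-query(xs:QName("score_id"), ">=", $lowValue),
  cts:element-range-query(xs:QName("score_id"), "<", $hiValue)))

(: Query 2 from the question. :)
let $q2 := cts:and-not-query(cts:and-query(($query, $text-query)), $sent-query)

return
  <counts
    (: Default search: index resolution followed by filtering. :)
    filtered="{fn:count(cts:search(fn:doc(), $q2))}"
    (: Index resolution only: candidates, false positives included. :)
    unfiltered="{fn:count(cts:search(fn:doc(), $q2, "unfiltered"))}"/>
```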

2 votes

In particular, xdmp:estimate uses only the indexes. If you do not have the "fast phrase" index enabled, the server cannot tell directly from the indexes whether the phrase "diet coke" occurs in a document; instead, in the "index resolution" phase of the query, it uses the indexes to find all documents containing both "diet" and "coke". Later, when the real query runs, it "filters" these candidate documents to check whether the two words actually appear right next to each other.
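To see the two phases separately, the sketch below approximates the filtering step by hand for the phrase query from the question: it takes the unfiltered candidates (index resolution only) and keeps those for which cts:contains confirms a real match. This is only an approximation of what the server does internally, and the `<phrase-check>` element name is illustrative. Because it examines every candidate document, expect it to be slow on a large database.

```xquery
xquery version "1.0-ml";

(: The phrase query from the question. :)
let $query := cts:word-query("diet coke")

(: Index resolution only. Without a phrase index these candidates may
   just contain "diet" and "coke" somewhere, not necessarily adjacent. :)
let $candidates := cts:search(fn:doc(), $query, "unfiltered")

(: Manual filtering: keep only candidates whose content really
   matches the phrase query. :)
let $confirmed := $candidates[cts:contains(., $query)]

return
  <phrase-check candidates="{fn:count($candidates)}"
                confirmed="{fn:count($confirmed)}"/>
```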

See the MarkLogic Search Developer's Guide, paying particular attention to "index resolution" and "filtering".

1 vote

The problem probably comes from xdmp:estimate: it returns the number of fragments that could potentially match (false positives included), not only the ones that actually do.

Try using fn:count instead of xdmp:estimate: it produces the correct result every time, the only drawback being that it is slower than xdmp:estimate.
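A minimal sketch comparing the two for both queries, again reusing the question's definitions verbatim. For each query it reports the index-only xdmp:estimate next to the fn:count of the filtered cts:search, so a mismatch shows how many fragments were false positives. The `<query>` element and its attribute names are illustrative only.

```xquery
xquery version "1.0-ml";

(: Definitions copied from the question. :)
let $lowValue := 13264683002210000000
let $hiValue := 13264683002211000000
let $query := cts:word-query("diet coke")
let $text-query := cts:word-query("coke")
let $sent-query := cts:and-query((
  cts:element-range-query(xs:QName("score_id"), ">=", $lowValue),
  cts:element-range-query(xs:QName("score_id"), "<", $hiValue)))

(: Queries 1 and 2 from the question. :)
let $q1 := cts:and-query(($query, $text-query, $sent-query))
let $q2 := cts:and-not-query(cts:and-query(($query, $text-query)), $sent-query)

for $q at $i in ($q1, $q2)
return
  <query n="{$i}"
    (: Fast, index-only; may include false positives. :)
    estimate="{xdmp:estimate(cts:search(fn:doc(), $q))}"
    (: Exact, but runs the filtered search over every match. :)
    count="{fn:count(cts:search(fn:doc(), $q))}"/>
```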

For more information, see MarkLogic Server: Search Developer's Guide, Chapter 11.