
I understand that Lucene's AND (&&), OR (||) and NOT (!) operators are shorthand for REQUIRED, OPTIONAL and EXCLUDE respectively, which is why one can't treat them as true boolean operators (adhering to boolean algebra).

I have been trying to construct a simple OR expression, as follows:

q = +(field1:value1 OR field2:value2)

The intent is a match on either field1 or field2. But since OR merely marks each clause as optional, for documents where both field1:value1 and field2:value2 match, the query returns a score that reflects a match on both clauses.

How do I enforce short-circuiting in this context? In other words, how can I implement short-circuiting as in boolean algebra, where an expression A || B || C returns true if A is true without even looking at whether B or C could be true?

Did you already read this? searchhub.org/2011/12/28/why-not-and-or-and-not - arun
@arun - Thanks for posting the link. It's a good overview of the various operators Solr/Lucene provides, but it doesn't answer my question regarding short-circuiting. It seems to me that one way of simulating what I want is to write the following query: (x !y !z) OR (y !z !x) OR (z !x !y). But the problem with this query is that all three clauses are executed irrespective of a match on any one clause (defeating the purpose of the OR operator). - Deepak
I think the suggested URL has moved to lucidworks.com/post/why-not-and-or-and-not - cquezel

1 Answer


Strictly speaking, no, there is no short-circuiting boolean logic. If a document is found for one term, you can't simply tell Lucene not to check for the other. Lucene is an inverted index, so it doesn't really check documents for matches directly. If you search for A OR B, it looks up A in the index and gets all the documents that have that value indexed. Then it looks up B in the index and gets the list of all documents containing it (this is simplifying somewhat, but I hope it gets the point across). It doesn't really make sense for it to skip B for the documents in which A is found. Further, for the query provided, all the matches on a document still need to be enumerated in order to compute a correct score.
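To make the point about postings lists concrete, here is a minimal toy sketch in plain Python. It is not Lucene code; `build_index` and `search_or` are hypothetical helpers, and the three documents are made up. It only illustrates that an OR walks one postings list per term, so there is no per-document moment at which B could be skipped because A already matched.

```python
from collections import defaultdict

def build_index(docs):
    """Map each (field, value) term to the sorted list of doc ids containing it."""
    postings = defaultdict(set)
    for doc_id, fields in docs.items():
        for field, value in fields.items():
            postings[(field, value)].add(doc_id)
    return {term: sorted(ids) for term, ids in postings.items()}

def search_or(index, terms):
    """Return {doc_id: [matching terms]} for a disjunction of terms.

    Each term's postings list is read in full; documents are never
    inspected one by one, so there is nothing to short-circuit.
    """
    hits = defaultdict(list)
    for term in terms:
        for doc_id in index.get(term, []):
            hits[doc_id].append(term)
    return dict(hits)

# Made-up documents for illustration only.
docs = {
    1: {"field1": "value1", "field2": "other"},
    2: {"field1": "other", "field2": "value2"},
    3: {"field1": "value1", "field2": "value2"},
}
index = build_index(docs)
hits = search_or(index, [("field1", "value1"), ("field2", "value2")])
```

In this toy model, document 3 appears in both postings lists, which is why it ends up with two matching clauses.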

However, you did mention scores! I suspect what you are really after is that, when one query term in a set is found, its score should not be compounded with the scores of the other terms. That is, for (A OR B), the score should be either score-A or score-B, rather than score-A + score-B or some such (sorry if I am making a wrong assumption here, of course).

That is what DisjunctionMaxQuery is for. Adding each subquery to it yields a score equal to the maximum of the scores of the matching subqueries (plus an optional tie-breaker contribution from the others), rather than their sum.
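The difference in scoring can be sketched as follows. This is a toy model in Python, not the Lucene implementation: `boolean_or_score` and `dismax_score` are hypothetical helper names, the clause scores are invented, and real Lucene scores involve further factors. It assumes that a plain boolean OR adds up the scores of the matching optional clauses, while a disjunction-max takes the best clause score plus a tie-breaker multiple of the remaining ones.

```python
def boolean_or_score(clause_scores):
    """Toy model of a plain OR: matching clause scores are added together."""
    return sum(clause_scores)

def dismax_score(clause_scores, tie_breaker=0.0):
    """Toy model of disjunction-max: best clause score, plus
    tie_breaker times the sum of the other matching clause scores."""
    if not clause_scores:
        return 0.0
    best = max(clause_scores)
    return best + tie_breaker * (sum(clause_scores) - best)

# Invented scores for a document that matches both clauses.
matched = [0.8, 0.5]

summed = boolean_or_score(matched)                   # both clauses contribute
max_only = dismax_score(matched)                     # only the best clause counts
with_tie = dismax_score(matched, tie_breaker=0.1)    # small credit for the second match
```

With a tie-breaker of 0, a document matching both clauses scores the same as one matching only the stronger clause, which is the closest scoring analogue to "short-circuiting".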

In Solr, you should learn about the DisMaxQParserPlugin and its more recent incarnation, the ExtendedDisMax parser, which, if I'm close to the mark here, should serve you very well.
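As a rough sketch of what such a request could look like, the snippet below builds a Solr select URL with Python's standard library. The host, port and core name (`mycore`) are placeholders, and it assumes the simpler case where the same term is searched across both fields, which is not identical to the field1:value1 OR field2:value2 query in the question. `defType`, `qf` and `tie` are standard dismax/edismax request parameters.

```python
from urllib.parse import urlencode

# Assumed request parameters; adjust field names and the term to your schema.
params = {
    "defType": "edismax",      # use the Extended DisMax query parser
    "q": "value1",             # the user's search term
    "qf": "field1 field2",     # fields to search; the best-scoring field wins
    "tie": "0.0",              # 0.0 means only the maximum field score counts
}
query_string = urlencode(params)

# Hypothetical local Solr instance and core name.
url = "http://localhost:8983/solr/mycore/select?" + query_string
```

Raising `tie` above 0 gives documents that match in more than one field a small boost over documents that match in only one.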