Lucene query parsing behaviour - joining query parts with AND

Question

Let's say we have a Lucene index containing a few documents, indexed with an analyzer that uses StopAnalyzer.ENGLISH_STOP_WORDS_SET. A user issues two queries:

foo:bar
baz:"there is"

Let's assume that the first query yields some results because there are documents matching that query.

The second query yields 0 results. This is because, when baz:"there is" is parsed, it ends up as a void query: both there and is are stop words (technically speaking, it is converted to an empty BooleanQuery with no clauses). So far so good.
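
To make the mechanism concrete, here is a minimal sketch in plain Java (not Lucene code) of how stop-word removal can leave a phrase with no terms at all. The class name StopwordPhraseSketch, the analyze helper, and the two-word stop list are hypothetical and chosen only for illustration; Lucene's real analysis chain is more involved.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/** Toy model of stop-word filtering; an illustration, not Lucene's implementation. */
final class StopwordPhraseSketch {
    // Assumption: a tiny stop list containing just the two words from the example.
    static final Set<String> STOP_WORDS = new HashSet<>(Arrays.asList("there", "is"));

    /** Lowercases the text, splits it on whitespace and drops every stop word. */
    static List<String> analyze(String text) {
        List<String> kept = new ArrayList<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (!token.isEmpty() && !STOP_WORDS.contains(token)) {
                kept.add(token);
            }
        }
        return kept;
    }
}
```

Under these assumptions, analyze("there is") returns an empty list, so there is nothing left to search for, while analyze("bar") keeps its single term.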

However, any of the following combined queries

+foo:bar +baz:"there is"
foo:bar AND baz:"there is"

behaves exactly the same way as the query +foo:bar; that is, it brings back some results, even though the second ANDed part yields no results on its own.

One would expect that, when ANDing, both conditions have to be met, but here they aren't.
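
One reading consistent with the behaviour described above is that a clause left empty after analysis is skipped when the combined query is built, rather than kept as a required clause that matches nothing. The plain-Java sketch below models that assumption only. RequiredClauseSketch, requiredClauses, and the two-word stop list are hypothetical names for illustration, not Lucene's API, and whether Lucene 3.0.2 does exactly this would need to be checked against its QueryParser source.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/** Toy model of combining required clauses; an illustration, not Lucene code. */
final class RequiredClauseSketch {
    // Assumption: a tiny stop list containing just the two words from the example.
    static final Set<String> STOP_WORDS = new HashSet<>(Arrays.asList("there", "is"));

    /** Lowercases the text, splits it on whitespace and drops every stop word. */
    static List<String> analyze(String text) {
        List<String> kept = new ArrayList<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (!token.isEmpty() && !STOP_WORDS.contains(token)) {
                kept.add(token);
            }
        }
        return kept;
    }

    /**
     * Takes alternating field/text arguments and returns the required clauses
     * as "field:terms" strings. A clause whose text analyzes to nothing is
     * skipped (the modelled assumption), so it places no constraint on the result.
     */
    static List<String> requiredClauses(String... fieldTextPairs) {
        List<String> clauses = new ArrayList<>();
        for (int i = 0; i + 1 < fieldTextPairs.length; i += 2) {
            List<String> terms = analyze(fieldTextPairs[i + 1]);
            if (!terms.isEmpty()) {
                clauses.add(fieldTextPairs[i] + ":" + String.join(" ", terms));
            }
        }
        return clauses;
    }
}
```

In this model, requiredClauses("foo", "bar", "baz", "there is") yields the single clause foo:bar, which mirrors the combined query behaving like +foo:bar, while requiredClauses("baz", "there is") yields no clauses at all.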

This seems contradictory, as an atomic query component has a different impact on the overall query depending on the context. Is there any logical explanation for this? Can it be addressed in any way, preferably without writing my own QueryAnalyzer? Can this be classified as a Lucene bug?

If it makes any difference, the observed behaviour occurs under Lucene v3.0.2.

This question was also posted on the Lucene Java users mailing list; no answers have come so far.

Sindri Traustason · Accepted Answer · 2011-05-18T15:56:38

I would suggest not using the StopAnalyzer if you want to be able to search for phrases like "there is". StopAnalyzer is essentially a lossy optimization method and unless you are indexing huge text documents it's probably not worth it.

