Lucene: Searching multiple fields with default operator = AND

Question

To allow users to search across multiple fields with Lucene 3.5, I currently create a QueryParser for each field to be searched and add each parsed query to a DisjunctionMaxQuery. This works great when using OR as the default operator, but I now want to change the default operator to AND to get more accurate (and fewer) results.

The problem is that queryParser.setDefaultOperator(QueryParser.AND_OPERATOR) misses many documents, since all terms must then appear together in at least one field.

For example, consider the following data for a document: title field = "Programming Languages", body field = "Java, C++, PHP". If a user were to search for Java Programming, this document would not be included in the results, since neither the title nor the body field contains all terms in the query, although combined they do. I would want this document returned for the above query but not for the query HTML Programming.

I've considered a catchall field, but I have a few problems with it. First, users frequently include per-field terms in their queries (author:bill), which is not possible with a catchall field alone. Also, I highlight certain fields with FastVectorHighlighter, which requires them to be indexed and stored. So by adding a catchall field I would have to index most of the same data twice, which costs both time and space.

Any ideas?

Regarding indexing a catchall field, have you observed a time/space hit that is cause for concern? My experience has been that indexing the same data in a specific stored field, and then also adding it to a generalized index-only field, has a pretty minimal impact on performance or index size. — femtoRgon
Also, I wonder what the end query's structure looks like. Particularly, how the dis-max queries are set up. Easy to kill your ability to get meaningful scores with them. — femtoRgon
@femtoRgon The DisjunctionMaxQuery structure is like this: '((title:java title:programming) | (body:java body:programming))~0.2'. You bring up a good point that adding a catchall field may have little impact as far as time/space is concerned. I definitely considered it but decided against it, as I would also like to keep the ability to search by field, such as author:bill. Not only do users use this feature, but I use it behind the scenes. Thx. — Chris Davi

Chris Davi · Accepted Answer · 2012-12-17T22:10:05

Guess I should have done a little more research. Turns out MultiFieldQueryParser provides the exact functionality I was looking for. For whatever reason I was creating a QueryParser for each field I wanted to search like this:

// One QueryParser per field; with AND, each parser requires every term in its own field
String[] fields = {"title", "body", "subject", "author"};
QueryParser[] parsers = new QueryParser[fields.length];
for(int i = 0; i < parsers.length; i++)
{
   parsers[i] = new QueryParser(Version.LUCENE_35, fields[i], analyzer);
   parsers[i].setDefaultOperator(QueryParser.AND_OPERATOR);
}

This would result in a query like this:

(+title:java +title:programming) | (+body:java +body:programming)

...which is not what I was looking for. Now I create a single MultiFieldQueryParser like this:

MultiFieldQueryParser parser = new MultiFieldQueryParser(Version.LUCENE_35, new String[]{"title", "body", "subject"}, analyzer);
parser.setDefaultOperator(QueryParser.AND_OPERATOR);

This gives me the query I was looking for:

+(title:java body:java) +(title:programming body:programming)
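
To make the difference between the two query shapes concrete, here is a minimal sketch in plain Java that models their matching logic on the example document from the question. It does not use Lucene at all: the class name, the method names, and the crude tokenizer standing in for an analyzer are all assumptions made for illustration, and scoring is ignored entirely.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical illustration only: models boolean matching, not Lucene's API or scoring.
public class CrossFieldAndDemo {

    // Crude stand-in for an analyzer: lowercase, then split on anything
    // that is not a letter, a digit, or '+' (so "C++" survives as "c++").
    static List<String> tokenize(String text) {
        return Arrays.stream(text.toLowerCase(Locale.ROOT).split("[^a-z0-9+]+"))
                .filter(t -> !t.isEmpty())
                .collect(Collectors.toList());
    }

    // Shape 1: (+title:a +title:b) | (+body:a +body:b)
    // Some single field must contain every query term.
    static boolean matchesPerFieldAnd(Map<String, String> doc, String query) {
        List<String> terms = tokenize(query);
        return doc.values().stream()
                .anyMatch(value -> tokenize(value).containsAll(terms));
    }

    // Shape 2: +(title:a body:a) +(title:b body:b)
    // Every query term must occur in at least one field, not necessarily the same one.
    static boolean matchesCrossFieldAnd(Map<String, String> doc, String query) {
        return tokenize(query).stream()
                .allMatch(term -> doc.values().stream()
                        .anyMatch(value -> tokenize(value).contains(term)));
    }

    // The example document from the question.
    static Map<String, String> exampleDoc() {
        Map<String, String> doc = new LinkedHashMap<>();
        doc.put("title", "Programming Languages");
        doc.put("body", "Java, C++, PHP");
        return doc;
    }

    public static void main(String[] args) {
        Map<String, String> doc = exampleDoc();
        System.out.println(matchesPerFieldAnd(doc, "Java Programming"));   // false
        System.out.println(matchesCrossFieldAnd(doc, "Java Programming")); // true
        System.out.println(matchesCrossFieldAnd(doc, "HTML Programming")); // false
    }
}
```

The second shape returns the document for Java Programming because each term is found in some field, while HTML Programming is still rejected because html appears in no field.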

Thanks to @seeta and @femtoRgon for the help!
