15
votes

I am using Lucene to allow a user to search for words in a large number of documents. Lucene seems to default to returning all documents containing any of the words entered.

Is it possible to change this behaviour? I know that '+' can be use to force a term to be included but I would like to make that the default action.

Ideally I would like functionality similar to Google's: '-' to exclude words and "abc xyz" to group words.

Just to clarify I also thought of inserting '+' into all spaces in the query. I just wanted to avoid detecting grouped terms (brackets, quotes etc) and potentially breaking the query. Is there another approach?

5

5 Answers

32
votes

This looks similar to the Lucene Sentence Search question. If you're interested, this is how I answered that question:

String defaultField = ...;
Analyzer analyzer = ...;
QueryParser queryParser = new QueryParser(defaultField, analyzer);

queryParser.setDefaultOperator(QueryParser.Operator.AND);

Query query = queryParser.parse("Searching is fun");
2
votes

Like Adam said, there's no need to do anything to the query string. QueryParser's setDefaultOperator does exactly what you're asking for.

0
votes

Why not just preparse the user search input and adjust it to fit your criteria using the Lucene query syntax before passing it on to Lucene. Alternatively, you could just create some help documentation on how to use the standard syntax to create a specific query and let the user decide how the query should be performed.

0
votes

Lucene has a extensive query language as described here that describes everything you want except for + being the default but that's something you can simple handle by replacing spaces with +. So the only thing you need to do is define the format you want people to enter their search queries in (I would strongly advise to adhere to the default Lucene syntax) and then you can write the transformations from your own syntax to the Lucene syntax.

0
votes

The behavior is hard-coded in method addClause(List, int, int, Query) of class org.apache.lucene.queryParser.QueryParser, so the only way to change the behavior (other than the workarounds above) is to change that method. The end of the method looks like this:

if (required && !prohibited)
  clauses.addElement(new BooleanClause(q, BooleanClause.Occur.MUST));
else if (!required && !prohibited)
  clauses.addElement(new BooleanClause(q, BooleanClause.Occur.SHOULD));
else if (!required && prohibited)
  clauses.addElement(new BooleanClause(q, BooleanClause.Occur.MUST_NOT));
else
  throw new RuntimeException("Clause cannot be both required and prohibited");

Changing "SHOULD" to "MUST" should make clauses (e.g. words) required by default.