Lucene Query Parser

Question

I am currently trying to get some fulltext querying done in Lucene. What I would like to achieve is the following:

Upon getting a search term like

"hello AND world"

I would like a query that searches for both terms across all fields. The two terms do not have to occur in the same field, but each term has to occur in at least one of the fields.

Thus, the result should look like:

+(field1:hello field2:hello) +(field1:world field2:world)

When using a MultiFieldQueryParser I only get the following:

(+field1:hello +field1:world) (+field2:hello +field2:world)

As I understand it, this requires both terms to occur in the same field.

Is there any chance to get such a behavior realised using default Lucene features, or do I have to implement my own query parser?

My current approach is to concatenate all the field contents of the domain object into a single field and query only that one. However, this approach is pretty ugly...

Thanks, Matthias

femtoRgon · Accepted Answer · 2013-05-09T17:00:07

I absolutely disagree that your current approach is ugly. I find that collecting all content into an everything field is the cleanest way to enable a find-it-anywhere search.

If you are manually concatenating fields, though, that could be a bit messy. Instead, you can add multiple fields with the same name, which will all be effectively concatenated in the index. Something like:

//Don't actually construct your fields this way.
//Just cutting out some of the boilerplate for simplicity.
document.add(new Field("field1", firstvalue));
document.add(new Field("everything", firstvalue));
document.add(new Field("field2", nextvalue));
document.add(new Field("everything", nextvalue));

This gets everything into the same field quite nicely. Generally, as long as the "everything" field isn't stored (it certainly shouldn't be), this should have very little impact on index size, and should perform well. I've previously just created a utility method that adds the field to the document and transparently adds it to the "everything" or "all" field as well, for anything being indexed.
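A minimal sketch of such a utility, in plain Java so it runs without Lucene on the classpath. The class name EverythingFieldWriter and the sink parameter are hypothetical; in real indexing code the sink would presumably wrap a call to document.add(...) that creates an indexed, unstored field.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiConsumer;

// Hypothetical helper, not part of Lucene: forwards every field to a sink and
// mirrors its value into a catch-all field. The sink stands in for a call
// to document.add(...).
class EverythingFieldWriter {
    static final String CATCH_ALL = "everything";

    private final BiConsumer<String, String> sink;

    EverythingFieldWriter(BiConsumer<String, String> sink) {
        this.sink = sink;
    }

    // Adds the value under its own name, then again under the catch-all name.
    void add(String name, String value) {
        sink.accept(name, value);
        sink.accept(CATCH_ALL, value);
    }

    public static void main(String[] args) {
        // Record the (name, value) pairs instead of writing to a real index.
        List<String> added = new ArrayList<>();
        EverythingFieldWriter writer =
                new EverythingFieldWriter((name, value) -> added.add(name + "=" + value));
        writer.add("field1", "hello");
        writer.add("field2", "world");
        System.out.println(added);
    }
}
```

Callers then only ever go through add(...), so no field can be indexed without also landing in the catch-all field.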

Check out the Solr docs, which recommend this pattern for this situation through their copyField schema element.

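If you really want to use MultiFieldQueryParser, you may need to parse subqueries separately, and join them with a BooleanQuery, like:

// Parse each term separately so that each becomes its own required clause.
// multifieldQP.parse(...) throws ParseException, which must be handled.
BooleanQuery bq = new BooleanQuery();
bq.add(multifieldQP.parse("hello"), BooleanClause.Occur.MUST);
bq.add(multifieldQP.parse("world"), BooleanClause.Occur.MUST);
// IndexSearcher.search needs a result limit; 10 is an arbitrary choice.
TopDocs hits = searcher.search(bq, 10);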

But if the query is user-entered, breaking it up automatically gets complicated. Again, I'd stick with what you are currently doing.
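To give a feel for the splitting step, here is a rough plain-Java sketch that handles only the simplest case: single-word terms joined by AND. It builds the query string from the question, which could then be handed to a standard query parser. The class and method names are hypothetical, and it ignores phrases, wildcards, nesting, OR/NOT, and escaping of special characters (Lucene's QueryParser.escape would be one way to handle the last of these), which is exactly where the complexity comes in.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Lucene: builds a query string in classic
// Lucene syntax in which each term must match in at least one field.
class PerTermQueryStrings {

    // Splits input such as "hello AND world" on the AND keyword only.
    // Assumption: plain single-word terms, no phrases, wildcards, or nesting.
    static List<String> splitOnAnd(String userInput) {
        List<String> terms = new ArrayList<>();
        for (String part : userInput.trim().split("\\s+AND\\s+")) {
            if (!part.isEmpty()) {
                terms.add(part);
            }
        }
        return terms;
    }

    // Emits one required group per term, with one optional clause per field
    // inside each group, e.g. "+(f1:t1 f2:t1) +(f1:t2 f2:t2)".
    static String build(List<String> terms, List<String> fields) {
        StringBuilder sb = new StringBuilder();
        for (String term : terms) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append("+(");
            for (int i = 0; i < fields.size(); i++) {
                if (i > 0) {
                    sb.append(' ');
                }
                sb.append(fields.get(i)).append(':').append(term);
            }
            sb.append(')');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<String> terms = splitOnAnd("hello AND world");
        System.out.println(build(terms, Arrays.asList("field1", "field2")));
    }
}
```

For "hello AND world" over field1 and field2, this yields the query string shown in the question.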
