5
votes

I have a Lucene index whose documents have a "type" field that takes one of three values: "article", "forum" or "blog". I want the user to be able to restrict a search to selected types (there is a checkbox for each document type).

How do I create a Lucene query dependent on which types the user has selected?

A couple of requirements:

  • If the user doesn't select one of the types, I want no results from that type.
  • The ordering of the results should not be affected by restricting the type field.

For reference, if I were to write this in SQL (for a "blog or forum" search), I'd write:

SELECT * FROM Docs
WHERE [type] in ('blog', 'forum')

3 Answers

4
votes

For reference, should anyone else come across this problem, here is my solution:

// Requires: using System.Linq; using Lucene.Net.Index; using Lucene.Net.Search;
IList<string> ALL_TYPES = new[] { "article", "blog", "forum" };
string q = ...; // The user's search string
IList<string> includeTypes = ...; // List of types to include
Query searchQuery = parser.Parse(q);
// Declare as BooleanQuery (not Query) so that Add() is available
BooleanQuery parentQuery = new BooleanQuery();
parentQuery.Add(searchQuery, BooleanClause.Occur.SHOULD);
// Invert the logic: exclude the types that were not selected
foreach (var type in ALL_TYPES.Except(includeTypes))
{
    parentQuery.Add(
        new TermQuery(new Term("type", type)),
        BooleanClause.Occur.MUST_NOT
    );
}
searchQuery = parentQuery;

I inverted the logic (i.e. excluded the types the user had not selected) because otherwise the ordering of the results is lost. I'm not sure why, though. It is a shame, as it makes the code less clear and less maintainable, but at least it works!

3
votes

Add a constraint that rejects documents of the types that weren't selected. For example, if only "article" was checked, the constraint would be:

-(type:forum type:blog)
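
A minimal sketch of how such a constraint could be assembled from the checkbox state, using plain string handling only. The class `TypeFilter` and the helper `BuildExclusionClause` are hypothetical names introduced for illustration; the resulting string would still need to be appended to the user's query text and handed to the query parser.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class TypeFilter
{
    // All values the "type" field can take, per the question.
    public static readonly string[] AllTypes = { "article", "blog", "forum" };

    // Hypothetical helper: builds a prohibited clause such as
    // "-(type:blog type:forum)" covering the types that were NOT selected.
    // Returns an empty string when every type is selected.
    public static string BuildExclusionClause(IEnumerable<string> selectedTypes)
    {
        string[] excluded = AllTypes
            .Except(selectedTypes)
            .Select(t => "type:" + t)
            .ToArray();

        return excluded.Length == 0
            ? ""
            : "-(" + string.Join(" ", excluded) + ")";
    }
}
```

For instance, `TypeFilter.BuildExclusionClause(new[] { "article" })` yields `-(type:blog type:forum)`, which matches the constraint above apart from the order of the two terms.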

0
votes

While erickson's suggestion seems fine, you could instead use a positive constraint ANDed with your search term: text:foo AND type:article when only "article" is checked, or text:foo AND (type:article OR type:forum) when both "article" and "forum" are checked.
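
A rough sketch of building that positive form as a query string. The names `PositiveTypeFilter` and `BuildQuery` are hypothetical, and the code assumes the user's text is already valid query syntax, so it can be wrapped in parentheses without escaping. Returning `null` when nothing is selected is also an assumption: it signals to the caller that the search should be skipped, since no type may contribute results.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class PositiveTypeFilter
{
    // Hypothetical helper: ANDs the user's query with an OR-group of the
    // selected types, e.g. "(text:foo) AND (type:article OR type:forum)".
    // Returns null when no type is selected (assumed to mean "do not search").
    public static string BuildQuery(string userQuery, IEnumerable<string> selectedTypes)
    {
        string[] types = selectedTypes
            .Select(t => "type:" + t)
            .ToArray();

        if (types.Length == 0)
        {
            return null;
        }

        return "(" + userQuery + ") AND (" + string.Join(" OR ", types) + ")";
    }
}
```

Note that the answer above with the MUST_NOT clauses reports that restricting positively changed the ordering of the results, so it is worth checking this form against the ordering requirement in the question before relying on it.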