2
votes

I am aware that the Lucene documentation says:

Note: The NOT operator cannot be used with just one term. For example, the following search will return no results:

NOT "jakarta apache"

However, I would like to be able to form a query that returns all documents NOT containing a term. I have looked into stringing together a MatchAllDocsQuery and a TermQuery into a BooleanQuery, but I cannot seem to find the right combination.

If I index the following two documents:

Doc0: content:The quick brown fox jumps over the lazy dog.
Doc1: (empty string)

The query *:* -content:fox returns both documents when I just want one document.

The RegexQuery content:^((?!fox).)*$ suggested by this StackOverflow answer returns one document. However, it does not seem to be working correctly, because content:^((?!foo).)*$ also returns one document when I expect it to return two.
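As a sanity check of the pattern itself, outside Lucene, here is a minimal sketch using only java.util.regex. It applies the same negative-lookahead pattern first to the whole field value and then to each whitespace-separated token. The token-level check is one possible explanation for the result above, under the assumption that RegexQuery tests individual indexed terms rather than the whole field value; the class and method names are made up for illustration.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

final class NegativeLookaheadCheck {
    // Same shape as the pattern in the question: matches only strings
    // that contain no occurrence of `word`.
    static Pattern notContaining(String word) {
        return Pattern.compile("^((?!" + Pattern.quote(word) + ").)*$");
    }

    // True if the pattern matches the entire field value.
    static boolean matchesWholeValue(String value, String word) {
        return notContaining(word).matcher(value).matches();
    }

    // True if the pattern matches at least one whitespace-separated token.
    // An empty value yields no tokens, so it can never match here.
    static boolean matchesAnyToken(String value, String word) {
        Pattern p = notContaining(word);
        return Arrays.stream(value.split("\\s+"))
                .filter(t -> !t.isEmpty())
                .anyMatch(t -> p.matcher(t).matches());
    }

    public static void main(String[] args) {
        String doc0 = "The quick brown fox jumps over the lazy dog.";
        String doc1 = "";
        for (String word : new String[] {"fox", "foo"}) {
            System.out.println(word + " whole value: doc0=" + matchesWholeValue(doc0, word)
                    + " doc1=" + matchesWholeValue(doc1, word));
            System.out.println(word + " any token:   doc0=" + matchesAnyToken(doc0, word)
                    + " doc1=" + matchesAnyToken(doc1, word));
        }
    }
}
```

In this sketch the whole-value check behaves as intended for both words. The token-level check accepts Doc0 and rejects the empty Doc1 for fox and for foo alike, which would be consistent with each query returning exactly one document.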

I am aware of the performance implications of what I want to do. The query will only be run on a few documents so I am not too worried about performance.

Is there a way to write a Lucene query to get what I want?

2 Answers

5
votes

You can match every document and then exclude the ones containing the term:

import org.apache.lucene.index.Term;
import org.apache.lucene.search.*;

// Open the index and build the two clauses.
IndexSearcher searcher = new IndexSearcher("path_to_index");
MatchAllDocsQuery everyDocClause = new MatchAllDocsQuery();
TermQuery termClause = new TermQuery(new Term("text", "exclude_term"));
// Require every document, then drop those that contain the term.
BooleanQuery query = new BooleanQuery();
query.add(everyDocClause, BooleanClause.Occur.MUST);
query.add(termClause, BooleanClause.Occur.MUST_NOT);
Hits hits = searcher.search(query);
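
To make the MUST / MUST_NOT combination concrete, here is a toy model in plain Java with no Lucene dependency. It treats the match-all clause as the set of all document ids and the excluded term as a postings set to subtract. The ToyIndex class, its crude tokenizer, and its method names are hypothetical and only illustrate the set logic; they are not Lucene's implementation.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

final class ToyIndex {
    // term -> ids of the documents that contain it
    private final Map<String, Set<Integer>> postings = new HashMap<>();
    private int docCount = 0;

    // Index one document and return its id.
    // Tokenization is a crude lowercase split on non-letters.
    int add(String content) {
        int id = docCount++;
        for (String token : content.toLowerCase(Locale.ROOT).split("[^a-z]+")) {
            if (!token.isEmpty()) {
                postings.computeIfAbsent(token, k -> new TreeSet<>()).add(id);
            }
        }
        return id;
    }

    // Counterpart of the match-all clause: every id, including documents with no terms.
    Set<Integer> matchAll() {
        Set<Integer> all = new TreeSet<>();
        for (int id = 0; id < docCount; id++) {
            all.add(id);
        }
        return all;
    }

    // MUST(match all) combined with MUST_NOT(term): all ids minus the term's postings.
    Set<Integer> allWithout(String term) {
        Set<Integer> result = matchAll();
        result.removeAll(postings.getOrDefault(term, Collections.<Integer>emptySet()));
        return result;
    }

    public static void main(String[] args) {
        ToyIndex index = new ToyIndex();
        index.add("The quick brown fox jumps over the lazy dog.");
        index.add("");
        System.out.println(index.allWithout("fox")); // only the empty document remains
        System.out.println(index.allWithout("foo")); // no document contains "foo", so both remain
    }
}
```

The point of the model is that the negative clause needs a positive set to subtract from. Without the match-all clause there is nothing to start with, which is why a lone NOT returns no results.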

Alternatively, add a dummy field with some fixed value to every document and use the query

+dummy_field:dummy_value -exclude_term
1
vote

Can't you append an "artificial" token to each document and then search for "'added token' AND NOT 'what you want to avoid'"?
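
A rough sketch of the artificial-token idea, again as a plain-Java toy rather than real Lucene code: every document gets the same marker token at index time, so the marker's postings cover the whole index and can serve as the positive clause. The MARKER value, the class, and its methods are assumptions made for illustration.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

final class MarkerTokenIndex {
    // Hypothetical marker; the tokenizer below only emits lowercase letters,
    // so real content can never produce this token.
    static final String MARKER = "__all__";

    private final Map<String, Set<Integer>> postings = new HashMap<>();
    private int nextId = 0;

    private void post(String token, int id) {
        postings.computeIfAbsent(token, k -> new TreeSet<>()).add(id);
    }

    // Index one document, always adding the marker token alongside its own terms.
    int add(String content) {
        int id = nextId++;
        post(MARKER, id);
        for (String token : content.toLowerCase(Locale.ROOT).split("[^a-z]+")) {
            if (!token.isEmpty()) {
                post(token, id);
            }
        }
        return id;
    }

    // "+required -excluded": documents containing `required` but not `excluded`.
    Set<Integer> search(String required, String excluded) {
        Set<Integer> result =
                new TreeSet<>(postings.getOrDefault(required, Collections.<Integer>emptySet()));
        result.removeAll(postings.getOrDefault(excluded, Collections.<Integer>emptySet()));
        return result;
    }

    public static void main(String[] args) {
        MarkerTokenIndex index = new MarkerTokenIndex();
        index.add("The quick brown fox jumps over the lazy dog.");
        index.add("");
        // The marker matches every document, so only the exclusion narrows the result.
        System.out.println(index.search(MARKER, "fox"));
        System.out.println(index.search(MARKER, "foo"));
    }
}
```

The trade-off in this approach is that the marker must be added when documents are indexed, so existing documents would need re-indexing before the query can see them.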