I want to construct a Lucene query that only matches documents with exactly the terms I specify: no fewer, and no more. The "no fewer" part is easy: a BooleanQuery with all mandatory terms. However, I'm not sure how to do the "no more" part. In essence what I need is a query which says "the result documents cannot have any terms other than what I've specified in the query." Any ideas? Thanks!
3
votes
What do you mean by 'any fields other than specified in the query'? You mean 'terms'?
– Artur Nowak
Sorry, yes. So if I had a document with a field called "string", and Document A had values "mystring1" and "mystring2" for that field, while Document B had values "mystring1", querying using Document B would not return Document A since Document A has "mystring2", which Document B does not have.
– joshlf
1 Answer
5
votes
I think you can approach this problem as follows:
- You need to create an analyzer that extracts the tokens, removes duplicates, and then concatenates them in a fixed order (e.g. lexicographical). So if you have three documents:
doc1: "lorem ipsum", doc2: "lorem ipsum dolor", doc3: "lorem ipsum lorem"
it will produce the following values for them:
doc1: "ipsum lorem", doc2: "dolor ipsum lorem", doc3: "ipsum lorem"
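- Then create a field that is filled by this analyzer.
- Finally, apply this analyzer to your query and match against this special field. For the query "lorem ipsum", the only term you would search for is "ipsum lorem". Because the field holds each document's whole term set as a single value, a document matches only when its term set is identical to the query's: doc1 and doc3 match, doc2 does not.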
The code to achieve this would be too long to fit in the answer, but I hope you get the general idea -- to create a field that you can match fully against.
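The full analyzer is beyond a short snippet, but the canonicalization step alone can be sketched in plain Java. This is only an illustration under stated assumptions: the class `CanonicalTerms` and the method `canonicalize` are hypothetical names, tokens are assumed to be whitespace-separated, and lowercasing is an assumption about how terms should be compared.

```java
import java.util.Locale;
import java.util.TreeSet;

public class CanonicalTerms {

    // Hypothetical helper, not part of Lucene: lowercases the text, splits it on
    // whitespace, drops duplicate tokens, sorts them lexicographically, and joins
    // them with a single space.
    public static String canonicalize(String text) {
        // TreeSet removes duplicates and keeps its elements in sorted order.
        TreeSet<String> terms = new TreeSet<>();
        for (String token : text.toLowerCase(Locale.ROOT).trim().split("\\s+")) {
            if (!token.isEmpty()) {
                terms.add(token);
            }
        }
        return String.join(" ", terms);
    }

    public static void main(String[] args) {
        System.out.println(canonicalize("lorem ipsum"));        // ipsum lorem
        System.out.println(canonicalize("lorem ipsum dolor"));  // dolor ipsum lorem
        System.out.println(canonicalize("lorem ipsum lorem"));  // ipsum lorem
    }
}
```

One way to use the result without writing a custom analyzer would be to compute this string yourself at index time, store it in a non-analyzed field such as a `StringField`, and query that field with a `TermQuery` built from the canonicalized query text. The space separator is safe here only because whitespace tokenization guarantees no token contains a space.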