1
votes

tl;dr:

What's the best way to bulk-fetch documents from Lucene using an exact-match on a set of keys?


Long version:

We have a Lucene index persisted to disk that is read through a DirectoryReader.

It contains 2,000,000 documents with the schema:

{"key": "20-character-string", "value": "1-1000-character-string"}

We now need to perform the equivalent of a SELECT document WHERE document.key IN $keyArray -- i.e. return the documents whose key appears in $keyArray (a 10,000-item array of keys), using an exact match.

Is there a better way than performing 10,000 separate searches?

1
I believe TermInSetQuery is what I'm after. - Lawrence Wagerfield

1 Answer

0
votes

You should use TermInSetQuery.

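Under the hood, for small sets (16 terms or fewer in the versions I've looked at) it rewrites itself to a BooleanQuery of TermQuery clauses ORed together. For larger sets it does something more efficient: rather than a hash set, it walks the terms dictionary for the field and collects the matching documents into a doc-ID set, giving every hit the same constant score. That also means a 10,000-key lookup doesn't run into BooleanQuery's maximum clause count (1024 by default).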
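Something along these lines should work for the lookup in the question. Treat it as a rough sketch: it assumes the "key" field was indexed untokenized (e.g. as a StringField) so that exact matching works, that each key maps to at most one document, and a Lucene 8.x/9.x-style API. KeyLookup, buildKeyQuery and fetchByKeys are made-up names for illustration.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

import org.apache.lucene.document.Document;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TermInSetQuery;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.util.BytesRef;

// Hypothetical helper class, not part of Lucene.
class KeyLookup {

    // Builds one query that matches any document whose "key" term is in the set.
    static Query buildKeyQuery(Collection<String> keys) {
        List<BytesRef> terms = new ArrayList<>(keys.size());
        for (String key : keys) {
            terms.add(new BytesRef(key));
        }
        return new TermInSetQuery("key", terms);
    }

    // Runs the query once and loads the stored fields of every hit.
    static List<Document> fetchByKeys(IndexSearcher searcher, Collection<String> keys)
            throws IOException {
        List<Document> docs = new ArrayList<>();
        if (keys.isEmpty()) {
            return docs; // search() rejects a hit limit of 0
        }
        // Assumption: keys are unique per document, so keys.size() hits is enough.
        TopDocs hits = searcher.search(buildKeyQuery(keys), keys.size());
        for (ScoreDoc hit : hits.scoreDocs) {
            // On Lucene 9.5+ the preferred call is searcher.storedFields().document(hit.doc).
            docs.add(searcher.doc(hit.doc));
        }
        return docs;
    }
}
```

You'd wrap your existing DirectoryReader in an IndexSearcher and call KeyLookup.fetchByKeys(searcher, keyList). If a key can appear in more than one document, raise the hit limit or use a custom Collector instead of the top-N search.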