
Here's the problem I'm trying to solve:

  • I have multiple Lucene indices, each containing a subset of the same data structure (they share the same fields, but any given field may or may not be present in a particular document)
  • There is a global key shared across indices: with 4 indices, up to 4 documents (one per index) may share a single key.
  • I have a single Lucene query

I query all indices together through a MultiReader, and I can tell which sub-index a hit comes from using ReaderUtil. So far so good, but here's the problem:

To run some (rather complex) merging logic, I need the documents from all sub-indices for every key that matched at least one document in the original query.

Here's an example:

Index 1

1: {key: "foo", name: "Name A", something: 42}

2: {key: "bar", something: 2}

Index 2

27: {key: "foo", something: 2}

Index 3

102: {key: "foo", name: "Name B"}

103: {key: "bar", something: 999}

Now, if I query for name "Name A", I only get document 1 from index 1.

What I actually need is every document, from every index, whose key was hit by that query. Here, that means all documents with key foo:

  • doc 1 from index 1
  • doc 27 from index 2
  • doc 102 from index 3

based on the original query for name: "Name A".
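To make the required semantics concrete, here is a plain-Java sketch with no Lucene involved. It models the sample documents above as in-memory records (the `Doc` class and `expandByKey` method are hypothetical names), collects the keys of the documents matching name == "Name A", then returns every document, in any index, that carries one of those keys. It is the brute-force version of the result being asked for, not a suggested implementation.

```java
import java.util.*;

// Plain-Java model of the desired result; no Lucene involved.
class KeyExpansionExample {

    // Hypothetical stand-in for a Lucene document: which index it lives in,
    // its doc ID within that index, and its (possibly sparse) fields.
    static final class Doc {
        final int index;
        final int id;
        final Map<String, Object> fields;

        Doc(int index, int id, Map<String, Object> fields) {
            this.index = index;
            this.id = id;
            this.fields = fields;
        }
    }

    static List<Doc> expandByKey(List<Doc> allDocs, String field, Object value) {
        // Step 1: keys of the docs that match the original query.
        Set<Object> hitKeys = new HashSet<>();
        for (Doc d : allDocs) {
            if (value.equals(d.fields.get(field))) { // missing field -> null -> no match
                hitKeys.add(d.fields.get("key"));
            }
        }
        // Step 2: every doc, from any index, whose key was hit.
        List<Doc> result = new ArrayList<>();
        for (Doc d : allDocs) {
            if (hitKeys.contains(d.fields.get("key"))) {
                result.add(d);
            }
        }
        return result;
    }

    // The sample data from the question.
    static List<Doc> sampleDocs() {
        return List.of(
            new Doc(1, 1, Map.of("key", "foo", "name", "Name A", "something", 42)),
            new Doc(1, 2, Map.of("key", "bar", "something", 2)),
            new Doc(2, 27, Map.of("key", "foo", "something", 2)),
            new Doc(3, 102, Map.of("key", "foo", "name", "Name B")),
            new Doc(3, 103, Map.of("key", "bar", "something", 999)));
    }

    public static void main(String[] args) {
        for (Doc d : expandByKey(sampleDocs(), "name", "Name A")) {
            System.out.println("doc " + d.id + " from index " + d.index);
        }
    }
}
```

Running `main` prints doc 1 from index 1, doc 27 from index 2 and doc 102 from index 3, matching the list above.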

Can I achieve this without 2 separate queries, the second being a massive OR based on the keys retrieved in the first? Is there a more efficient way?


1 Answer


OK, here's how I got it to work:

First, use a TermFirstPassGroupingCollector with the key field (here, id) as the group field, and run a search with the actual query (e.g. name:"Name A"):

TermFirstPassGroupingCollector firstPassCollector = new TermFirstPassGroupingCollector(
            "<grouping field name, e.g. id>",
            Sort.INDEXORDER,
            topNGroups); // maximum number of distinct keys (groups) to collect

searcher.search(query, firstPassCollector);

// Term-based grouping uses BytesRef group values; may be null if nothing matched
Collection<SearchGroup<BytesRef>> firstPassResult = firstPassCollector.getTopGroups(0, false);
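As a rough mental model of what this first pass yields with Sort.INDEXORDER, here is a plain-Java sketch (not Lucene code; `FirstPassSketch` and `topGroups` are hypothetical names). Ranking groups by index order means ranking them by their earliest document, so walking the matching documents in doc-ID order, the top groups are simply the first `topNGroups` distinct keys encountered. The sketch also returns null when nothing matched, mirroring the null case mentioned at the end of this answer.

```java
import java.util.*;

// Hypothetical model of the first grouping pass under Sort.INDEXORDER.
class FirstPassSketch {

    // matchingKeys.get(i) is the group key (e.g. the id field) of the i-th
    // matching document, in doc-ID order. Returns the first topNGroups
    // distinct keys, or null when there are no groups at all.
    static List<String> topGroups(List<String> matchingKeys, int topNGroups) {
        LinkedHashSet<String> groups = new LinkedHashSet<>(); // keeps first-seen order
        for (String key : matchingKeys) {
            if (groups.size() == topNGroups) {
                break; // N groups found; later docs cannot introduce a new top group
            }
            groups.add(key); // no-op if this key was already seen
        }
        return groups.isEmpty() ? null : new ArrayList<>(groups);
    }

    public static void main(String[] args) {
        System.out.println(topGroups(List.of("foo", "bar", "foo", "baz"), 2)); // [foo, bar]
        System.out.println(topGroups(List.of(), 10));                          // null
    }
}
```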

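Then, run a second pass collector over a MatchAllDocsQuery. The second pass only collects documents whose group value is one of the groups found in the first pass, so this returns every document (from every sub-index) that shares a key with a hit: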

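TermSecondPassGroupingCollector secondPassCollector = new TermSecondPassGroupingCollector(
            fieldNaming.getIdFieldName(), // must be the same group field as in the first pass
            firstPassResult,
            Sort.INDEXORDER,  // group sort, same as in the first pass
            Sort.INDEXORDER,  // sort within each group
            maxDocsPerGroup,  // should be >= the number of indices, or documents get dropped
            false,            // getScores
            false,            // getMaxScores
            false);           // fillSortFields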

searcher.search(new MatchAllDocsQuery(), secondPassCollector);
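Likewise, here is a plain-Java sketch of what the second pass collects (`SecondPassSketch` and `collect` are hypothetical names; this models the behavior, not the collector's implementation). With MatchAllDocsQuery every document is visited, documents whose key is not among the first-pass groups are ignored, and the rest are bucketed by key in index order, capped at `maxDocsPerGroup`.

```java
import java.util.*;

// Hypothetical model of the second grouping pass run with MatchAllDocsQuery.
class SecondPassSketch {

    // allKeys.get(docId) is the group key of top-level document docId
    // (one ID space across the MultiReader). Returns key -> doc IDs.
    static Map<String, List<Integer>> collect(List<String> allKeys,
                                              List<String> firstPassGroups,
                                              int maxDocsPerGroup) {
        Map<String, List<Integer>> groups = new LinkedHashMap<>();
        for (String g : firstPassGroups) {
            groups.put(g, new ArrayList<>());
        }
        // Visit every document, as MatchAllDocsQuery does.
        for (int docId = 0; docId < allKeys.size(); docId++) {
            List<Integer> docs = groups.get(allKeys.get(docId));
            if (docs != null && docs.size() < maxDocsPerGroup) { // unknown key -> skipped
                docs.add(docId);
            }
        }
        return groups;
    }

    public static void main(String[] args) {
        // The five sample docs from the question flattened into one ID space
        // (0..4 = index 1 docs 1, 2; index 2 doc 27; index 3 docs 102, 103).
        List<String> keys = List.of("foo", "bar", "foo", "foo", "bar");
        System.out.println(collect(keys, List.of("foo"), 10)); // {foo=[0, 2, 3]}
    }
}
```

The result `{foo=[0, 2, 3]}` corresponds to docs 1, 27 and 102. Since a key appears at most once per index, a `maxDocsPerGroup` of at least the number of indices keeps any group from being truncated.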

Now I can iterate over all matched groups and get every document in each group, whether or not that document itself matched the query:

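TopGroups<BytesRef> topGroups = secondPassCollector.getTopGroups(0);
GroupDocs<BytesRef>[] documentGroups = topGroups.groups;

for (GroupDocs<BytesRef> groupDocs : documentGroups) {

    if (groupDocs.totalHits == 0) {
        continue;
    }

    for (ScoreDoc scoreDoc : groupDocs.scoreDocs) {

        // scoreDoc.doc is a top-level doc ID, so load it through the MultiReader
        Document document = reader.document(scoreDoc.doc);
        ...
    }
}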

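Problem solved. Make sure you handle edge cases such as the first pass collector's getTopGroups() returning null, which happens when no groups were collected (e.g. the query matched nothing). In that case, skip the second pass entirely.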