I'm using Lucene 5.3 to index a set of documents, and I search them with a BooleanQuery in which each term is boosted by some score.

My problem is that when I search the index, I get fewer hits than there are documents in the index.

    System.out.println( "docs in the index = " + reader.numDocs() );
    //e.g., docs in the index = 92
    TopDocs topDocs = indexSearcher.search( q, reader.numDocs() ); //this ensures no result is omitted from the search.
    ScoreDoc[] hits = topDocs.scoreDocs;
    System.out.println( "results found: " + topDocs.totalHits );
    //e.g., results found: 44

What is the reason for this behaviour? Does Lucene ignore documents with a zero score?

How do I get all the documents in the index no matter what score they have?

Are you expecting this query to match all of the documents in your index? Or do you want it to return all of the documents whether they match the query or not? - femtoRgon
I want to return all the documents. I will then rank them by score, so the non-matching documents will be at the bottom. - samsamara
@KillBill Looking at the code in IndexSearcher::search(Weight weight, ScoreDoc after, int nDocs), docs could potentially be left out based on the maxScore: return new TopDocs(totalHits, scoreDocs, maxScore); so I would use TopDocs searchAfter(ScoreDoc after, Query query, int n). - D_K
@D_K Which Lucene version is that? I don't see such a method (search(Weight weight, ScoreDoc after, int nDocs)) in Lucene 5.3.1 org.apache.lucene.search.IndexSearcher. - samsamara
@KillBill Right, I forgot to mention it was Lucene 4.10.2. - D_K

1 Answer

Lucene will only return results which actually match the query. If you want to get all the documents as results, you need to make sure they all match. You can do this with a MatchAllDocsQuery:

    Query query = new BooleanQuery.Builder()
            .add(new BooleanClause(new MatchAllDocsQuery(), BooleanClause.Occur.MUST))
            .add(new BooleanClause(myOldQuery, BooleanClause.Occur.SHOULD))
            .build();
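
If it helps to see why the hit count changes, here is a rough toy model in plain Java. It is not Lucene code and does not reproduce Lucene's scoring. The class `MatchAllSketch`, its methods and the sample documents are all made up for illustration. It only mimics the matching rule: without a match-all clause a document is a hit only if at least one query term matches, while with one every document is a hit and the boosted terms only affect the ranking.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Toy model only: hypothetical names, NOT Lucene's API or its scoring formula.
class MatchAllSketch {

    // Builds a "document" as a set of terms.
    static Set<String> doc(String... terms) {
        return new HashSet<>(Arrays.asList(terms));
    }

    // Score = sum of the boosts of the query terms the document contains.
    static double score(Set<String> docTerms, Map<String, Double> boostedTerms) {
        double sum = 0.0;
        for (Map.Entry<String, Double> e : boostedTerms.entrySet()) {
            if (docTerms.contains(e.getKey())) {
                sum += e.getValue();
            }
        }
        return sum;
    }

    // Returns the ids of the hits, highest score first.
    // matchAll = false: a doc is a hit only if at least one query term matches.
    // matchAll = true:  every doc is a hit (the role of the MUST match-all clause).
    static List<Integer> search(List<Set<String>> docs,
                                Map<String, Double> boostedTerms,
                                boolean matchAll) {
        List<Integer> hits = new ArrayList<>();
        for (int id = 0; id < docs.size(); id++) {
            boolean anyTermMatches = !Collections.disjoint(docs.get(id), boostedTerms.keySet());
            if (matchAll || anyTermMatches) {
                hits.add(id);
            }
        }
        // Stable sort, so docs with equal scores keep their id order.
        hits.sort((a, b) -> Double.compare(
                score(docs.get(b), boostedTerms),
                score(docs.get(a), boostedTerms)));
        return hits;
    }

    public static void main(String[] args) {
        List<Set<String>> docs = Arrays.asList(
                doc("lucene", "index"),
                doc("cooking"),
                doc("lucene", "query", "boost"));

        Map<String, Double> query = new LinkedHashMap<>();
        query.put("lucene", 2.0);
        query.put("boost", 1.5);

        System.out.println(search(docs, query, false)); // prints [2, 0]
        System.out.println(search(docs, query, true));  // prints [2, 0, 1]
    }
}
```

In this sketch the document that matches no term (id 1) is missing from the first result and appears last in the second, which is the ordering the question asks for.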