
How can I get the number of hits (occurrences of the search term) per document in Lucene, using Java? I have the following:

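    // Lucene 3.x API; index, analyzer, hitsPerPage, start and end are defined elsewhere
    import java.io.File;
    import org.apache.lucene.index.IndexReader;
    import org.apache.lucene.index.TermFreqVector;
    import org.apache.lucene.queryParser.QueryParser;
    import org.apache.lucene.search.*;
    import org.apache.lucene.store.FSDirectory;
    import org.apache.lucene.util.Version;

    IndexReader reader = IndexReader.open(FSDirectory.open(new File(index)), true); // read-only
    Searcher searcher = new IndexSearcher(reader);
    String field = "contents";
    QueryParser parser = new QueryParser(Version.LUCENE_CURRENT, field, analyzer);
    Query query = parser.parse("test");
    TopScoreDocCollector collector = TopScoreDocCollector.create(5 * hitsPerPage, false);
    searcher.search(query, collector);
    ScoreDoc[] hits = collector.topDocs().scoreDocs;
    int numTotalHits = collector.getTotalHits();
    System.out.println(numTotalHits + " total matching documents");

    for (int i = start; i < end; i++) {
        int id = hits[i].doc;
        // Returns null if no term vectors were stored for this document
        TermFreqVector[] tfv = reader.getTermFreqVectors(id);
    }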

tfv comes back null :( Can someone point me in the right direction for getting the number of hits in each document from here?

EDIT:

It works if TermVector.YES is set on the field when indexing.


2 Answers

Answer 1

You can write a custom Similarity implementation. It gives you access to the term frequency, i.e. the number of times a given term occurs in a given document.
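To see why a custom Similarity can yield per-document counts, here is a simplified, standalone model (not Lucene code) of how Lucene 3.x scores a single-term query with boost 1: score = tf(freq) * idf * (idf * queryNorm) * lengthNorm. The `Sim` interface and every name below are hypothetical; `DEFAULT` copies DefaultSimilarity's formulas but ignores the lossy byte encoding of norms. In real Lucene the equivalent would presumably be subclassing `DefaultSimilarity`, overriding `tf(float)`, `idf(int, int)` and `queryNorm(float)`, and installing it with `searcher.setSimilarity(...)`; the length norm is applied at index time, so neutralizing it would mean re-indexing with the same similarity. Treat this as a sketch to verify against your Lucene version.

```java
public class RawCountSimilaritySketch {

    /** Hypothetical stand-in for the Similarity hooks that affect one term query. */
    interface Sim {
        double tf(double freq);
        double idf(int docFreq, int numDocs);
        double queryNorm(double sumOfSquaredWeights);
        double lengthNorm(int numTerms);
    }

    /** Simplified model of a single-term query score with boost 1 (norm encoding ignored). */
    static double score(Sim sim, int freq, int docFreq, int numDocs, int fieldLength) {
        double idf = sim.idf(docFreq, numDocs);
        // The query weight is idf normalized by queryNorm over the sum of squared weights
        double queryWeight = idf * sim.queryNorm(idf * idf);
        return sim.tf(freq) * queryWeight * idf * sim.lengthNorm(fieldLength);
    }

    /** Mirrors DefaultSimilarity's formulas. */
    static final Sim DEFAULT = new Sim() {
        public double tf(double freq) { return Math.sqrt(freq); }
        public double idf(int docFreq, int numDocs) {
            return Math.log(numDocs / (double) (docFreq + 1)) + 1.0;
        }
        public double queryNorm(double s) { return 1.0 / Math.sqrt(s); }
        public double lengthNorm(int numTerms) { return 1.0 / Math.sqrt(numTerms); }
    };

    /** Every factor except tf is neutralized, so the score equals the raw term count. */
    static final Sim RAW_COUNT = new Sim() {
        public double tf(double freq) { return freq; }
        public double idf(int docFreq, int numDocs) { return 1.0; }
        public double queryNorm(double s) { return 1.0; }
        public double lengthNorm(int numTerms) { return 1.0; }
    };

    public static void main(String[] args) {
        // The term occurs 3 times in a 100-term field; 5 of 1000 documents contain it
        System.out.println(score(DEFAULT, 3, 5, 1000, 100));
        System.out.println(score(RAW_COUNT, 3, 5, 1000, 100)); // 3.0
    }
}
```

The model only covers the single-term case; with several query terms the per-term scores are summed (and multiplied by a coordination factor), so the score would no longer be a clean count for one term.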

Answer 2

This is a duplicate of "Get search word Hits (number of occurences) per document in Lucene".

As the answer there says, you can use the term frequency vector. jarekrozanski's approach is faster, but it requires writing a custom Similarity class, which you may prefer to avoid.
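With term vectors stored (the TermVector.YES setting from the question's edit), Lucene 3.x exposes a document's term vector for one field through `reader.getTermFreqVector(id, "contents")`; its `getTerms()` and `getTermFrequencies()` are parallel arrays, and `indexOf(term)` returns -1 for a term that does not occur. The standalone sketch below mimics that lookup without Lucene so it can run on its own; `FreqVector` and `hits` are hypothetical names, and the token array stands in for what the analyzer would produce for one document.

```java
import java.util.Arrays;
import java.util.TreeMap;

public class TermVectorSketch {

    /** Hypothetical stand-in for one field's term vector: sorted terms plus parallel frequencies. */
    static final class FreqVector {
        final String[] terms;
        final int[] freqs;

        FreqVector(String[] tokens) {
            // TreeMap keeps terms sorted, so indexOf can binary-search them
            TreeMap<String, Integer> counts = new TreeMap<>();
            for (String t : tokens) counts.merge(t, 1, Integer::sum);
            terms = counts.keySet().toArray(new String[0]);
            freqs = new int[terms.length];
            int i = 0;
            for (int f : counts.values()) freqs[i++] = f; // same order as keySet()
        }

        /** Position of term in terms, or -1 if absent. */
        int indexOf(String term) {
            int i = Arrays.binarySearch(terms, term);
            return i >= 0 ? i : -1;
        }
    }

    /** Number of occurrences of term in the document, 0 if it does not occur. */
    static int hits(FreqVector v, String term) {
        int i = v.indexOf(term);
        return i >= 0 ? v.freqs[i] : 0;
    }

    public static void main(String[] args) {
        // Hypothetical analyzed tokens for one document's "contents" field
        FreqVector doc = new FreqVector("test field test data test".split(" "));
        System.out.println(hits(doc, "test"));    // 3
        System.out.println(hits(doc, "missing")); // 0
    }
}
```

In the question's loop, the same lookup would run once per hit id, so each matching document gets its own count without changing how scores are computed.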