In Lucene, I want to find out how many inverted index accesses a query performs.

I assume Lucene keeps an inverted index like this:

cat dog
----- -----
d01 d02
d02 d01
d03 d03
----- -----

If I run the query "cat dog", Lucene will read the posting lists of both terms in order. If I then ask for the top-2 results, Lucene can return d01 and d02 after only 4 accesses. I want to obtain that access count (in this example, "4").
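To make the counting I have in mind concrete, here is a minimal sketch in plain Java. It is not Lucene code and does not claim to show how Lucene actually traverses its index. It assumes the two posting lists are read round-robin, one entry at a time, and that reading stops as soon as k documents have been seen in every list. The class name `SortedAccessCounter` and the method `countAccesses` are hypothetical.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of the access model described above; not Lucene code.
class SortedAccessCounter {

    /**
     * Reads the posting lists round-robin and returns how many entries were
     * read before k documents had been seen in every list (or the lists ran out).
     * Assumes each document appears at most once per list.
     */
    static int countAccesses(List<List<String>> postings, int k) {
        Map<String, Integer> listsSeenIn = new HashMap<>();
        int accesses = 0;
        int complete = 0; // documents seen in all lists so far

        int maxLen = 0;
        for (List<String> list : postings) {
            maxLen = Math.max(maxLen, list.size());
        }

        for (int depth = 0; depth < maxLen; depth++) {
            for (List<String> list : postings) {
                if (depth >= list.size()) {
                    continue; // this list is exhausted
                }
                accesses++;
                int seen = listsSeenIn.merge(list.get(depth), 1, Integer::sum);
                if (seen == postings.size()) {
                    complete++;
                }
                if (complete >= k) {
                    return accesses;
                }
            }
        }
        return accesses;
    }

    public static void main(String[] args) {
        List<String> cat = Arrays.asList("d01", "d02", "d03");
        List<String> dog = Arrays.asList("d02", "d01", "d03");
        // Top-2 for the example index above.
        System.out.println(countAccesses(Arrays.asList(cat, dog), 2)); // prints 4
    }
}
```

With the example index, the reads are cat:d01, dog:d02, cat:d02, dog:d01, after which both d01 and d02 have been seen in both lists. That is the number I would like Lucene to report.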

Currently, I use Lucene like this.

// Parse the query string against the "title" field.
Query q = new QueryParser(Version.LUCENE_35, "title", analyzer).parse(querystr);
int hitsPerPage = 10;
// Open the index read-only.
IndexSearcher searcher = new IndexSearcher(index, true);
// Collect the hitsPerPage highest-scoring documents.
TopScoreDocCollector collector = TopScoreDocCollector.create(hitsPerPage, true);
searcher.search(q, collector);
ScoreDoc[] hits = collector.topDocs().scoreDocs;

Thank you.

1 Answer

Asymptotically, if there are p matches and you're finding the top k, the time will be O(p log k). So in your case, taking the logarithm base 2, 6 log 2 = 6. (Of course, with such small numbers this formula gives ridiculous results.)

See this for more info.
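The p log k bound is what you get from keeping a heap of at most k candidates while scanning all p matches. A minimal sketch of that idea in plain Java follows. It is an illustration, not Lucene's own implementation: the class `TopKSketch`, the method `topK`, and the scores are made up for the example.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical illustration of top-k selection with a bounded min-heap.
class TopKSketch {

    /** Returns the k highest scores in descending order. */
    static List<Double> topK(double[] scores, int k) {
        if (k <= 0) {
            return new ArrayList<>();
        }
        // Min-heap: the weakest of the current top-k candidates sits at the head.
        PriorityQueue<Double> heap = new PriorityQueue<>();
        for (double score : scores) {          // p iterations
            if (heap.size() < k) {
                heap.offer(score);             // O(log k)
            } else if (score > heap.peek()) {
                heap.poll();                   // evict the weakest candidate, O(log k)
                heap.offer(score);             // O(log k)
            }
        }
        List<Double> result = new ArrayList<>(heap);
        result.sort(Collections.reverseOrder());
        return result;
    }

    public static void main(String[] args) {
        // Six made-up scores, one per match (p = 6), top k = 2.
        double[] scores = {0.3, 0.9, 0.1, 0.7, 0.5, 0.2};
        System.out.println(topK(scores, 2)); // prints [0.9, 0.7]
    }
}
```

Every match is examined once, and each heap operation costs at most log k, which gives the p log k total.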

Note that "top two" doesn't mean "first two", but rather "two highest scoring". Depending on the weights in your example, it's possible that Lucene could ignore d03.