I am using Lucene 6.x.x. I am not aware of an easy way to do this, but a solution is better than no solution at all. Something like this, built on MatchAllDocsQuery, works for me:
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

import org.apache.lucene.document.Document;
import org.apache.lucene.index.IndexableField;
import org.apache.lucene.index.Terms;
import org.apache.lucene.index.TermsEnum;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.MatchAllDocsQuery;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.util.BytesRef;

private static void printWholeIndex(IndexSearcher searcher) throws IOException {
    // Match every document in the index.
    MatchAllDocsQuery query = new MatchAllDocsQuery();
    TopDocs hits = searcher.search(query, Integer.MAX_VALUE);
    if (hits.scoreDocs == null || hits.scoreDocs.length == 0) {
        System.out.println("No Hits Found with MatchAllDocsQuery");
        return;
    }
    // Maps each term to the sorted set of IDs of the documents that contain it.
    Map<String, Set<Integer>> invertedIndex = new HashMap<>();
    for (ScoreDoc hit : hits.scoreDocs) {
        Document doc = searcher.doc(hit.doc);
        // getFields() returns only the stored fields of the document.
        for (IndexableField field : doc.getFields()) {
            // Single-document inverted index; null if the field has no term vector.
            Terms terms = searcher.getIndexReader().getTermVector(hit.doc, field.name());
            if (terms == null) {
                continue;
            }
            TermsEnum termsEnum = terms.iterator();
            BytesRef term;
            while ((term = termsEnum.next()) != null) {
                // Create the doc-ID set on first sight of the term, then record this document.
                invertedIndex.computeIfAbsent(term.utf8ToString(), k -> new TreeSet<>()).add(hit.doc);
            }
        }
    }
    System.out.println("Printing Inverted Index:");
    invertedIndex.forEach((key, value) -> System.out.println(key + ":" + value));
}
Two points:
1. The maximum number of documents supported is Integer.MAX_VALUE. I have not tried it, but this limit can probably be eliminated by using the searchAfter method of the searcher and performing multiple searches.
2. doc.getFields() returns only the fields that are stored. If not all of your indexed fields are stored, you can probably keep a static array of field names instead, since the call searcher.getIndexReader().getTermVector(hit.doc, field.name()) works for fields that are not stored too (provided term vectors were enabled for them at index time; otherwise it returns null).
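To make the first point a bit more concrete, here is a rough sketch of the paging loop I have in mind. I have not run it against a real index, so it is written in plain Java: PageSource and fakeIndex are hypothetical stand-ins I made up for the searcher, and the comment marks where IndexSearcher.searchAfter(ScoreDoc, Query, int) would go in Lucene 6.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntConsumer;

public final class SearchAfterPaging {

    /** Hypothetical stand-in for the searcher; not a Lucene type. */
    interface PageSource {
        // Returns up to pageSize doc IDs that follow afterDocId (null means "start at the beginning").
        // With Lucene 6 this would wrap searcher.searchAfter(lastScoreDoc, query, pageSize).
        List<Integer> fetchAfter(Integer afterDocId, int pageSize);
    }

    /** Visits every doc ID while holding at most one page of hits at a time. */
    static int forEachDoc(PageSource source, int pageSize, IntConsumer visitor) {
        Integer last = null; // cursor: the last hit of the previous page
        int visited = 0;
        while (true) {
            List<Integer> page = source.fetchAfter(last, pageSize);
            if (page.isEmpty()) {
                break; // no hits after the cursor, so we are done
            }
            for (int docId : page) {
                visitor.accept(docId);
                visited++;
            }
            last = page.get(page.size() - 1); // advance the cursor
        }
        return visited;
    }

    /** In-memory fake holding doc IDs 0..maxDoc-1, only there to exercise the loop. */
    static PageSource fakeIndex(int maxDoc) {
        return (after, pageSize) -> {
            int start = (after == null) ? 0 : after + 1;
            List<Integer> page = new ArrayList<>();
            for (int id = start; id < maxDoc && page.size() < pageSize; id++) {
                page.add(id);
            }
            return page;
        };
    }

    public static void main(String[] args) {
        List<Integer> seen = new ArrayList<>();
        int count = forEachDoc(fakeIndex(25), 10, seen::add);
        System.out.println(count + " documents visited in pages of 10");
    }
}
```

In the real method, the visitor body would be the per-document term vector loop from above, and the cursor would be the last ScoreDoc of the previous page rather than a bare doc ID.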