The example in this question and some others I've seen on the web use postings
method of a TermVector
to get terms positions. Copy paste from the example in the linked question:
IndexReader ir = obtainIndexReader();
Terms tv = ir.getTermVector( doc, field );
TermsEnum terms = tv.iterator();
PostingsEnum p = null;
while( terms.next() != null ) {
p = terms.postings( p, PostingsEnum.ALL );
while( p.nextDoc() != PostingsEnum.NO_MORE_DOCS ) {
int freq = p.freq();
for( int i = 0; i < freq; i++ ) {
int pos = p.nextPosition(); // Always returns -1!!!
BytesRef data = p.getPayload();
doStuff( freq, pos, data ); // Fails miserably, of course.
}
}
}
This code works for me but what drives me mad is that the Terms
type is where the position information is kept. All the documentation I've seen keep saying that term vectors keep position data. However, there are no methods on this type to get that information!
Older versions of Lucene apparently had a method but as of at least version 6.5.1 of Lucene, that is not the case.
Instead I'm supposed to use postings
method and traverse the documents but I already know which document I want to work on!
The API documentation does not say anything about postings returning only the current document (the one the term vector belongs to) but when I run it, I only get the current doc.
Is this the correct and only way to get position data from term vectors? Why such an unintuitive API? Is there a document that explains why the previous approach changed in favour of this?