What does the in-memory representation (as opposed to the file format) of Lucene's index look like? Is the whole inverted index loaded into memory, e.g. as an array of posting lists (where each posting list contains document IDs, term frequencies in the document, and positions)? Something like:
```java
class Posting {
    private int docID;
    private int termFreq;
    private int[] termPositions;
}

class PostingList {
    private Posting[] postings;
}

public class SomeClassThatHoldsTheIndexInMemory {
    private PostingList[] index; // Indexed by some internal term ID?
}
```
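To make the hypothetical model above concrete, here is a minimal sketch of how such a structure could be filled from pre-tokenized documents. It is not Lucene code. `ToyInvertedIndex`, `addDocument` and `postingList` are made-up names. The sketch keys posting lists by the term string in a `TreeMap` rather than by an int term ID, so it deliberately leaves the term-to-ID question open. It needs Java 16+ for the nested `record`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical toy version of the in-memory model from the question; not Lucene's classes.
public class ToyInvertedIndex {

    // One document's entry in a posting list: docID, term frequency, positions.
    record Posting(int docID, int termFreq, int[] termPositions) {}

    // Term -> postings; each list stays sorted by docID because docs are added in order.
    private final Map<String, List<Posting>> postings = new TreeMap<>();
    private int nextDocID = 0;

    // Adds a pre-tokenized document and returns the docID assigned to it.
    public int addDocument(String[] tokens) {
        int docID = nextDocID++;
        // Collect the positions of every distinct term in this document.
        Map<String, List<Integer>> positions = new TreeMap<>();
        for (int pos = 0; pos < tokens.length; pos++) {
            positions.computeIfAbsent(tokens[pos], t -> new ArrayList<>()).add(pos);
        }
        // Append one Posting per distinct term to that term's posting list.
        for (Map.Entry<String, List<Integer>> e : positions.entrySet()) {
            int[] posArray = e.getValue().stream().mapToInt(Integer::intValue).toArray();
            postings.computeIfAbsent(e.getKey(), t -> new ArrayList<>())
                    .add(new Posting(docID, posArray.length, posArray));
        }
        return docID;
    }

    // Returns the posting list for a term, or an empty list if the term is unknown.
    public List<Posting> postingList(String term) {
        return postings.getOrDefault(term, List.of());
    }

    public static void main(String[] args) {
        ToyInvertedIndex index = new ToyInvertedIndex();
        index.addDocument("to be or not to be".split(" "));
        index.addDocument("be quick".split(" "));
        for (Posting p : index.postingList("be")) {
            // Prints "0 2 [1, 5]" and then "1 1 [0]".
            System.out.println(p.docID() + " " + p.termFreq() + " " + Arrays.toString(p.termPositions()));
        }
    }
}
```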
I understand that not everything that makes up the index (including auxiliary information about terms) is necessarily held in memory, but surely something is?
Which classes define the in-memory representation of the index? If the index looks something like the above, how does Lucene go from a term (a string) to a term ID (an int)?
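One way the hypothetical `PostingList[] index` could be addressed is a term dictionary that maps a term string to an int ID. This minimal sketch assumes the set of terms is known up front. It makes a term's ID its position in a sorted, de-duplicated array and finds it with binary search. `SortedTermDictionary`, `termId` and `term` are illustrative names only. Whether Lucene does anything like this is exactly what the question asks, so treat this as an illustration of the model, not of Lucene.

```java
import java.util.Arrays;

// Hypothetical term dictionary: a term's ID is its index in a sorted array of distinct terms.
public class SortedTermDictionary {

    private final String[] sortedTerms;

    public SortedTermDictionary(String[] terms) {
        // De-duplicate and sort so IDs are dense (0..n-1) and follow term order.
        this.sortedTerms = Arrays.stream(terms).distinct().sorted().toArray(String[]::new);
    }

    // Returns the term ID (usable as an index into a PostingList[]), or -1 if absent.
    public int termId(String term) {
        int i = Arrays.binarySearch(sortedTerms, term);
        return i >= 0 ? i : -1;
    }

    // Reverse mapping, from a term ID back to the term string.
    public String term(int termId) {
        return sortedTerms[termId];
    }

    public static void main(String[] args) {
        SortedTermDictionary dict = new SortedTermDictionary(new String[] {"quick", "be", "to", "be"});
        System.out.println(dict.termId("quick"));   // 1 (sorted terms: be, quick, to)
        System.out.println(dict.termId("missing")); // -1
    }
}
```

A sorted array rather than a `HashMap` keeps terms in order. That would also make prefix and range lookups possible, at the cost of O(log n) lookups and a rebuild whenever a new term is added.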