What are the storage and search internals that make this possible? I mean the nitty-gritty details.
For example, say an AND query has two terms: the first matches a million documents and the second matches another million. How does Lucene intersect them so quickly and give me the top k?
Does it store each term's documents in increasing order of doc ID? Then, when two terms' documents have to be intersected, does it find the first k documents common to both sets by iterating over both lists incrementally in a single pass?
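The single-pass idea in the previous paragraph is the classic two-pointer merge over doc-ID-sorted lists. Here is a minimal sketch, assuming each term's postings are plain Python lists of increasing doc IDs; `intersect_first_k` is a hypothetical name, not a Lucene API. Note that the first k common doc IDs are not necessarily the top k by relevance score.

```python
from typing import List


def intersect_first_k(a: List[int], b: List[int], k: int) -> List[int]:
    """Return the first k doc IDs present in both sorted lists a and b."""
    i = j = 0
    result: List[int] = []
    # Each step advances whichever pointer is behind, so the whole
    # intersection costs at most len(a) + len(b) comparisons.
    while i < len(a) and j < len(b) and len(result) < k:
        if a[i] == b[j]:
            result.append(a[i])  # doc matches both terms
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1  # a is behind: move it forward
        else:
            j += 1  # b is behind: move it forward
    return result
```

For example, `intersect_first_k([1, 3, 5, 7, 9, 11], [2, 3, 4, 7, 8, 11], k=2)` stops as soon as it has found 3 and 7, without reading the rest of either list.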
Or does it build a simple unordered hash set from the larger document array and use it to find the common documents?
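For comparison, the hash-set alternative from the previous paragraph might look like the sketch below, assuming the postings fit in memory as Python lists; `intersect_with_hash_set` is a hypothetical name used only for illustration.

```python
from typing import Iterable, List


def intersect_with_hash_set(larger: Iterable[int], smaller: Iterable[int], k: int) -> List[int]:
    """Probe a hash set built from the larger list with each doc of the smaller one."""
    # Building the set reads the entire larger list up front and needs
    # memory proportional to its size before any match can be reported.
    lookup = set(larger)
    result: List[int] = []
    for doc_id in smaller:
        if doc_id in lookup:  # average O(1) membership test
            result.append(doc_id)
            if len(result) == k:
                break
    return result
```

The trade-off shows up directly: with a million-entry list, the set construction alone touches every entry, whereas the sorted merge needs no extra memory and can stop early. Results also come out in the smaller list's iteration order rather than being guaranteed sorted.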
Or are both of these (or possibly more) intersection policies used, depending on factors such as the number of documents the user asks for and the number matched by each individual term?
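To make the "choose a policy based on list sizes" idea concrete, here is one way such a dispatch could look. It is a sketch under stated assumptions, not a claim about what Lucene does: the names and the `SKEW_RATIO` threshold are hypothetical, and galloping (exponential) search is swapped in as the strategy for very unequal list sizes.

```python
from bisect import bisect_left
from typing import List

# Hypothetical threshold: gallop when the longer list is at least this many
# times longer than the shorter one. Chosen only for illustration.
SKEW_RATIO = 32


def merge_intersect(a: List[int], b: List[int]) -> List[int]:
    """Two-pointer merge over two sorted doc-ID lists."""
    i = j = 0
    out: List[int] = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out


def galloping_intersect(small: List[int], large: List[int]) -> List[int]:
    """For each doc in the short list, exponentially search the long list."""
    out: List[int] = []
    pos, n = 0, len(large)
    for doc_id in small:
        # Double the step until large[hi] >= doc_id (or we run off the end);
        # every entry before lo is known to be < doc_id.
        lo = hi = pos
        step = 1
        while hi < n and large[hi] < doc_id:
            lo = hi + 1
            hi = lo + step
            step *= 2
        # Binary search only inside the bracketed window [lo, hi).
        pos = bisect_left(large, doc_id, lo, min(hi, n))
        if pos == n:
            break
        if large[pos] == doc_id:
            out.append(doc_id)
            pos += 1
    return out


def adaptive_intersect(a: List[int], b: List[int]) -> List[int]:
    """Pick a strategy from the relative list sizes (hypothetical policy)."""
    small, large = (a, b) if len(a) <= len(b) else (b, a)
    if small and len(large) // len(small) >= SKEW_RATIO:
        return galloping_intersect(small, large)
    return merge_intersect(small, large)
```

With two lists of a million entries each, the ratio is 1 and the plain merge is used; when one term is rare, galloping touches only about O(m log(n/m)) entries of the long list instead of all n.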
Any articles that explain the nitty-gritty of merging document arrays would be appreciated.
Edit: Thanks for the info, guys. It makes sense now: skip lists do the magic. I will dig into them further to get a clear understanding.
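A toy version of skip-list-driven intersection helps show why it is fast: instead of stepping one doc at a time, each list can jump forward to a target doc ID. This is a simplified sketch, assuming a single level of implicit skip pointers every √n entries (a common textbook heuristic); Lucene's codecs store skip data in their own compressed, block-based on-disk format, which this does not reproduce. `SkipPostingList` and `leapfrog_intersect` are hypothetical names; `advance(target)` only loosely mirrors the role of Lucene's `DocIdSetIterator.advance`.

```python
import math
from typing import List

# Sentinel for an exhausted list, mirroring the idea of Lucene's
# DocIdSetIterator.NO_MORE_DOCS (Integer.MAX_VALUE).
NO_MORE_DOCS = 2**31 - 1


class SkipPostingList:
    """Sorted doc IDs plus implicit skip pointers every sqrt(n) entries (hypothetical)."""

    def __init__(self, doc_ids: List[int]) -> None:
        self.docs = doc_ids
        self.interval = max(1, math.isqrt(len(doc_ids)))
        self.pos = -1         # -1 means "not started yet"
        self.skips_taken = 0  # counts how often a skip pointer was followed

    def doc_id(self) -> int:
        if self.pos < 0:
            return -1
        if self.pos >= len(self.docs):
            return NO_MORE_DOCS
        return self.docs[self.pos]

    def advance(self, target: int) -> int:
        """Move to the first doc ID >= target and return it."""
        i = max(self.pos, 0)
        # Follow skip pointers while the doc they land on is still <= target,
        # so whole blocks of smaller doc IDs are never scanned.
        while True:
            nxt = (i // self.interval + 1) * self.interval
            if nxt < len(self.docs) and self.docs[nxt] <= target:
                i = nxt
                self.skips_taken += 1
            else:
                break
        # Finish with a short linear scan inside the current block.
        while i < len(self.docs) and self.docs[i] < target:
            i += 1
        self.pos = i
        return self.doc_id()


def leapfrog_intersect(lists: List[SkipPostingList], k: int) -> List[int]:
    """Return the first k doc IDs present in every list."""
    # Lead with the shortest list so the others are probed as rarely as possible.
    lead, *others = sorted(lists, key=lambda p: len(p.docs))
    out: List[int] = []
    target = lead.advance(0)
    while target != NO_MORE_DOCS and len(out) < k:
        for p in others:
            d = p.advance(target)
            if d > target:
                # p has no doc equal to target: the lead jumps straight to d.
                target = lead.advance(d)
                break
        else:
            out.append(target)  # every list is positioned on target
            target = lead.advance(target + 1)
    return out
```

For instance, intersecting a 10,000-entry list with a two-entry list `[5000, 9999]` reaches both matches by following skip pointers over most of the long list rather than scanning it entry by entry.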