From a data structure point of view how does Lucene filter over a range of continuous values?
I understand that Lucene relies upon a compressed bit array data structure akin to CONCISE. Conceptually this bit array holds a 0 for every document that doesn't match a term and a 1 for every document that does match a term. But the cool/awesome part is that this array can be highly compressed and is very fast at boolean operations. For example if you want to know which documents contain the terms "red" and "blue" then you grab the bit array corresponding to "red" and the bit array corresponding to "blue" and AND them together to get a bit array corresponding to matching documents.
But Lucene also provides very fast lookup on ranges of continuous values and if Lucene still uses this same compressed bit array approach, I don't understand how this happens efficiently in computation or memory. Here's my assumption, you tell me how close I am: Lucene discretizes the continuous field and creates a bit array for each of these (now discrete) values. Then when you want a range across values you grab the bit arrays corresponding to this range and just AND them together. Since I suspect there would be TONS of arrays to AND together, then maybe you do this at several levels of granularity so that you can AND together big chunks as much as possible.
UPDATE: Oh! I just realized another alternative that would be faster. Since we know that the bit arrays for the discretized ranges mentioned above will not overlap, then we can store the bit arrays sequentially. If we have a start and end values for our range, then we can keep an index to the corresponding points in this array of bit arrays. At that point we just jump in the the array of bit arrays and start scanning it as if we're scanning a single bit array. Am I close?