In SQL an index is typically some kind of balanced tree (ordered nodes that point to the real table row to make searching possible in O(log n)
). Walking through such a tree tree actually is the searching process.
Now Lucene uses an inverted index with term frequencies: It stores for each term how often it occurs in which documents. This is easy to understand. But this doesn't explain how a search on such an index is actually performed.
The search string is analyzed and split the same way into terms, of course, and then "the index is searched" for these terms to find the documents containing them – but how? Is the Lucene index itself also ordered and tree-like organized in some way for O(log n)
to be possible? Or is walking the Lucene index on search-time actually linear, so O(n)
?