4
votes

In some blogs and lucene website,I know lucene use data structure "skip list" in inverted index. But I have some puzzle about it.

1:In general,skip list maybe used in memory ,but inverted index is stored in disk. So how lucene use it when search on the index? just scanning it on disk or load it to memory?

2:skip list's insert operator often use random(0,1) to decide whether insert to next level,but in luncene introdution,it seems a fixed interval in every terms,so how lucene create the "skip list" different or not?

Please correct me if I am wrong.

2

2 Answers

3
votes

Lucene uses memory in a couple different ways, even though the index is persisted on a disk when the IndexReader is created for searching and for operations like sorting (field cache):

http://blog.mikemccandless.com/2010/07/lucenes-ram-usage-for-searching.html

Basically those binary files get copied into RAM for much faster scanning and reducing I/O. You get a hint in the above link how searching with some parameters can force Lucene to "skip terms in searching" Hence, where that data structure can be used.

Lucene is open source, so you can see the code for yourself what is being used in Java or Lucene.NET for the C# implementation.