
I am unable to understand the difference between mergeFactor and minMergeDocs.

For example, I want to index 10,000 Documents, and say 100 of those Documents fill up my RAM buffer, so Lucene writes these 100 Documents out as a new segment. Now if I set mergeFactor=5, when a fifth segment is about to be written to disk, Lucene will merge all the existing segments into a single segment, and so on.

1. Where does minMergeDocs fit into this? If I have mergeFactor=5 and minMergeDocs=10, does mergeFactor take precedence over minMergeDocs?

2. Also, when Lucene merges segments on disk, does it also delete the individual segments that are now part of the new segment file?

Thanks in advance for your response,


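The merge factor defines how many segments of roughly the same size are merged at once, and therefore how often merges happen. Its default value is 10. It is a different setting from minMergeDocs: in older Lucene versions, minMergeDocs (also 10 by default) set how many documents were buffered in memory before being flushed to disk as a new segment. So with the defaults, a new segment is created for every 10 documents. When the number of such 10-document segments reaches the merge factor (10), they are merged into a single segment of 100 docs; ten 100-doc segments are in turn merged into one of 1,000 docs, and so on. This is the log merge policy (LogMergePolicy).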
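To make the interplay concrete, here is a hypothetical, self-contained Java simulation of the idealized log merge described above. The class `LogMergeSketch` and its methods are made up for illustration; this is not Lucene's `LogMergePolicy` code, which also deals with deleted documents, uneven flush sizes and byte-based sizing. It assumes every flush writes exactly `bufferedDocs` documents (the minMergeDocs role) and treats segments of equal size as belonging to the same level:

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Idealized model of log merging (an illustration, not Lucene source):
 * every flush writes a segment of bufferedDocs documents, and whenever the
 * last mergeFactor segments all have the same size, they are replaced by
 * one merged segment. Merges cascade upward level by level.
 */
public class LogMergeSketch {

    /** Returns the doc counts of the segments left after indexing totalDocs documents. */
    public static List<Integer> simulate(int totalDocs, int bufferedDocs, int mergeFactor) {
        List<Integer> segments = new ArrayList<>();
        int remaining = totalDocs;
        while (remaining > 0) {
            int flushed = Math.min(bufferedDocs, remaining);
            segments.add(flushed); // buffer is full: write a new segment
            remaining -= flushed;
            mergeTail(segments, mergeFactor);
        }
        return segments;
    }

    /** Merges the last mergeFactor segments while they all have the same size. */
    private static void mergeTail(List<Integer> segments, int mergeFactor) {
        while (segments.size() >= mergeFactor) {
            int n = segments.size();
            int size = segments.get(n - 1);
            for (int i = n - mergeFactor; i < n; i++) {
                if (segments.get(i) != size) {
                    return; // fewer than mergeFactor segments on this level
                }
            }
            // The merged segment replaces its inputs in the list.
            segments.subList(n - mergeFactor, n).clear();
            segments.add(size * mergeFactor);
        }
    }

    public static void main(String[] args) {
        // Defaults from the answer: 10 docs per flush, mergeFactor = 10.
        System.out.println(simulate(100, 10, 10));    // [100]
        // Question 1: mergeFactor = 5, 10 docs per flush (minMergeDocs = 10).
        System.out.println(simulate(1000, 10, 5));    // [250, 250, 250, 250]
        // The question's scenario: 10,000 docs, 100 per flush, mergeFactor = 5.
        System.out.println(simulate(10000, 100, 5));  // [2500, 2500, 2500, 2500]
    }
}
```

In this model the two settings do not compete: the per-flush document count fixes the size of each new segment, and the merge factor fixes how many equal-size segments are combined at each level. It also shows that a merge does not collapse everything into one segment; 100 flushes with mergeFactor=5 leave four segments on the same level.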

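minMergeDocs no longer exists in Lucene 3.0; the number of documents buffered before a new segment is flushed is now set with setMaxBufferedDocs().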

For finer control over indexing, you can use the setMaxBufferedDocs(), setRAMBufferSizeMB(), or setMaxMergeDocs() methods of IndexWriter.
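As a rough illustration of what a cap like setMaxMergeDocs() changes, here is a second hypothetical sketch under the same idealized assumptions. It flushes by document count (the setMaxBufferedDocs() role) and does not model flushing by RAM usage (setRAMBufferSizeMB()). It uses the simplifying rule that a segment which has already reached `maxMergeDocs` documents is never merged again. The class and method names are made up, and nothing here calls Lucene:

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical extension of the idealized log-merge model with a
 * maxMergeDocs cap: segments that already hold maxMergeDocs or more
 * documents are left alone instead of being merged further.
 */
public class MaxMergeDocsSketch {

    /** Returns the doc counts of the segments left after indexing totalDocs documents. */
    public static List<Integer> simulate(int totalDocs, int maxBufferedDocs,
                                         int mergeFactor, int maxMergeDocs) {
        List<Integer> segments = new ArrayList<>();
        int remaining = totalDocs;
        while (remaining > 0) {
            int flushed = Math.min(maxBufferedDocs, remaining);
            segments.add(flushed); // flush a new segment
            remaining -= flushed;
            mergeTail(segments, mergeFactor, maxMergeDocs);
        }
        return segments;
    }

    /** Merges the last mergeFactor equal-size segments unless they have hit the cap. */
    private static void mergeTail(List<Integer> segments, int mergeFactor, int maxMergeDocs) {
        while (segments.size() >= mergeFactor) {
            int n = segments.size();
            int size = segments.get(n - 1);
            if (size >= maxMergeDocs) {
                return; // segments this large are excluded from merging
            }
            for (int i = n - mergeFactor; i < n; i++) {
                if (segments.get(i) != size) {
                    return; // not enough equal-size segments yet
                }
            }
            segments.subList(n - mergeFactor, n).clear();
            segments.add(size * mergeFactor);
        }
    }

    public static void main(String[] args) {
        // 1,000 docs, 10 per flush, mergeFactor 10, capped at 100 docs:
        System.out.println(simulate(1000, 10, 10, 100).size());               // 10
        // Same run without a cap ends in a single segment:
        System.out.println(simulate(1000, 10, 10, Integer.MAX_VALUE));        // [1000]
    }
}
```

With the cap, the index stops at ten 100-document segments instead of one 1,000-document segment, trading more segments for smaller (shorter) merges.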