
I would like to understand Solr's merge behaviour well. I did some research on the different merge policies, and it seems that TieredMergePolicy is better than the older merge policies (LogByteSizeMergePolicy, etc.). That's why I use it, and it is also the default policy in recent Solr versions.

First, here are some interesting links I've read to get a better idea of the merge process: http://java.dzone.com/news/merge-policy-internals-solr http://blog.mikemccandless.com/2011/02/visualizing-lucenes-segment-merges.html

I have several questions about the official Lucene documentation for TieredMergePolicy: http://lucene.apache.org/core/3_2_0/api/all/org/apache/lucene/index/TieredMergePolicy.html

Questions

1- The official documentation lists a method called setExpungeDeletesPctAllowed(double v). However, when I checked the TieredMergePolicy class in Solr 4.3.0, I didn't find this method. There is a similar-looking method called setForceMergeDeletesPctAllowed(double v). Is there any difference between the two?

2- Is the setting from either method above used only when you run expungeDeletes or an optimize, or is it also used during normal merges?

3- I've read that merges between segments are chosen according to the percentage of deleted documents in each segment, and that this percentage defaults to 10%. Is it possible to set this value to 0% to make sure no deleted documents remain in the index after merging?

I need to reduce the size of my index without calling optimize() if possible, so any information about the merge process would be helpful.

Thanks


You appear to be mixing up your documentation. If you are using Lucene 4.3.0 (which is what Solr 4.3.0 is built on), use the documentation for that version (see the correct documentation for TieredMergePolicy in 4.3.0) rather than the documentation for 3.2.0.

Anyway, for these particular questions, see LUCENE-3577.

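1 - For all intents and purposes, this is mainly a name change: setExpungeDeletesPctAllowed became setForceMergeDeletesPctAllowed, in line with expungeDeletes being renamed to forceMergeDeletes.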

2 - First, IndexWriter.expungeDeletes() no longer exists in 4.3.0. You can use IndexWriter.forceMergeDeletes() if you must, though it is strongly discouraged because it is very, very costly. I believe the percentage setting only affects forceMergeDeletes() calls, not normal merges. If you want normal merging to favor reclaiming deletions, configure that in the MergePolicy with TieredMergePolicy.setReclaimDeletesWeight(double).
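To make the effect of setReclaimDeletesWeight more concrete, here is a small self-contained model of just the deletes term of TieredMergePolicy's merge scoring. It assumes, based on my reading of the 4.x source, that each candidate merge's score is multiplied by (bytesAfterMerge / bytesBeforeMerge) raised to reclaimDeletesWeight, and that a lower score wins. The real score also accounts for skew and total merge size, which are left out here. The class and method names are hypothetical and are not Lucene API.

```java
import java.util.Locale;

// Hypothetical model: only the "reclaim deletes" factor of a TieredMergePolicy-style
// merge score. Lower scores are preferred, so a smaller factor makes a merge more attractive.
class ReclaimDeletesModel {

    /**
     * @param bytesBefore          total size of the candidate segments, including deleted docs
     * @param bytesAfter           estimated size after the merge, with deleted docs dropped
     * @param reclaimDeletesWeight assumed exponent; 2.0 is, as far as I know, the 4.x default
     */
    static double deletesFactor(double bytesBefore, double bytesAfter, double reclaimDeletesWeight) {
        double nonDelRatio = bytesAfter / bytesBefore; // 1.0 means the merge reclaims nothing
        return Math.pow(nonDelRatio, reclaimDeletesWeight);
    }

    public static void main(String[] args) {
        // Two hypothetical 100 MB candidate merges: one would reclaim 30 MB, the other 5 MB.
        for (double weight : new double[] {2.0, 4.0}) {
            double heavyDeletes = deletesFactor(100, 70, weight);
            double lightDeletes = deletesFactor(100, 95, weight);
            System.out.printf(Locale.ROOT, "weight=%.1f heavyDeletes=%.4f lightDeletes=%.4f%n",
                    weight, heavyDeletes, lightDeletes);
        }
    }
}
```

Raising the weight from 2.0 to 4.0 widens the gap between the two candidates, which is the sense in which a higher reclaimDeletesWeight makes normal merging favor segments with many deletions. It still does not guarantee that every deletion is eventually merged away.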

3 - The allowed percentage is exactly what the method from your first question sets, so you can pass 0 to setForceMergeDeletesPctAllowed. Keep in mind, though, that forcing every deletion to be merged out when calling forceMergeDeletes() makes an already very expensive operation even more expensive.
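The following is a simplified, hypothetical model of how a forceMergeDeletes() pass could pick segments. It assumes the rule is "rewrite a segment if its percentage of deleted documents is greater than forceMergeDeletesPctAllowed", and that the setter accepts values from 0.0 to 100.0 inclusive; the range check below is my assumption about what setForceMergeDeletesPctAllowed enforces. Real TieredMergePolicy also groups eligible segments into merges and skips segments that are already being merged, which this sketch ignores. All names here are illustrative, not Lucene API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of segment selection for forceMergeDeletes().
class ForceMergeDeletesModel {

    // A segment reduced to the numbers this model needs.
    static final class Segment {
        final String name;
        final int docCount; // documents in the segment, including deleted ones
        final int delCount; // documents marked as deleted

        Segment(String name, int docCount, int delCount) {
            this.name = name;
            this.docCount = docCount;
            this.delCount = delCount;
        }

        double pctDeleted() {
            return 100.0 * delCount / docCount;
        }
    }

    // Returns the names of segments that would be rewritten for the given threshold.
    static List<String> eligible(List<Segment> segments, double pctAllowed) {
        // Assumed to mirror the setter's range check: 0.0 to 100.0 inclusive.
        if (pctAllowed < 0.0 || pctAllowed > 100.0) {
            throw new IllegalArgumentException("pctAllowed must be between 0.0 and 100.0, got " + pctAllowed);
        }
        List<String> result = new ArrayList<>();
        for (Segment s : segments) {
            if (s.pctDeleted() > pctAllowed) { // strictly greater: a clean segment never qualifies
                result.add(s.name);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Segment> segments = List.of(
                new Segment("_a", 1000, 0),    // no deletions
                new Segment("_b", 1000, 50),   // 5% deleted
                new Segment("_c", 1000, 200)); // 20% deleted
        System.out.println("pctAllowed=10 -> " + eligible(segments, 10.0));
        System.out.println("pctAllowed=0  -> " + eligible(segments, 0.0));
    }
}
```

In this model, the 10% threshold rewrites only _c, while 0% rewrites _b as well and leaves _a alone because it has no deletions. That is why 0% leaves no deleted documents behind, and also why it costs more: every segment with even a single deletion gets rewritten.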

Just to venture a guess: if you need to save the disk space taken by your index, you will likely have much more success looking closely at how much data you are storing in the index. There isn't enough information here to say for sure, of course, but it seems a likely solution to consider.