2
votes

I have a large Solr search index with many segments. I want to merge them so that the index takes up less disk space and searches run faster against a smaller index, since deleted documents are removed during the segment merge.

The default behavior of optimize is to merge all segments until only one is left. I want to avoid that and stop earlier, with a specified number of segments remaining. Otherwise the merge might fail with an out-of-memory exception while trying to merge two chunks that together are larger than the available RAM.

1 Answer

3
votes

First, let's have a look at the index segments on disk:

tomcat/solr/coreName/data/index$ ls -htlr --sort=size | grep .nvd

This lists the index segments ordered by size, so you can count how many segments should be left over when the merge stops. Merging always starts with the smallest chunks.
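If you would rather get the count directly than count lines by eye, a small sketch like the following may help. It assumes each segment has exactly one .nvd file, as the listing above implies; the function name count_segments is only a placeholder for illustration.

```shell
#!/bin/sh
# Count index segments by counting the .nvd files in an index directory.
# Assumption: one .nvd file per segment, as in the listing above.
count_segments() {
    index_dir="$1"
    # find prints one line per matching file; wc -l counts the lines.
    # tr strips the padding some wc implementations add.
    find "$index_dir" -maxdepth 1 -type f -name '*.nvd' | wc -l | tr -d ' '
}

# Example (the path is the index directory from the answer):
# count_segments tomcat/solr/coreName/data/index
```

Subtract the number of large segments you want to leave untouched from this count to get a candidate value for maxSegments.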

curl -X POST http://localhost:8080/solr/coreName/update -H "Content-Type: text/xml" --data-binary '<update> <optimize maxSegments="80"/> </update>'

This triggers an optimize on the index, merging groups of segments into larger ones according to the maxMergeAtOnce / mergeFactor settings configured in solrconfig.xml.

The maxSegments parameter specifies how many segments should remain when merging stops, so you can stop before the largest chunks of your index are merged.

Make sure to send a POST request whose body contains the update XML <update> <optimize maxSegments="80"/> </update>, which wraps the optimize command with the maxSegments parameter set. Sending the parameters as query parameters on a GET request will not work.
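To avoid retyping the XML each time, the request can be wrapped in a small script. This is only a sketch built from the curl command above; the function names optimize_body and partial_optimize are made up for illustration, and the host, port and core name are the ones used in this answer.

```shell
#!/bin/sh
# Build the XML body for a partial optimize.
# Argument: number of segments that should remain.
optimize_body() {
    printf '<update> <optimize maxSegments="%s"/> </update>' "$1"
}

# POST the body to a core's update handler.
# Arguments: base URL, core name, number of segments to leave.
partial_optimize() {
    curl -X POST "$1/solr/$2/update" \
        -H "Content-Type: text/xml" \
        --data-binary "$(optimize_body "$3")"
}

# Example, using the host and core name from the answer:
# partial_optimize http://localhost:8080 coreName 80
```

Keeping the body in its own function makes it easy to print and check the XML before sending anything to the server.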

I also noticed that I needed to restart Solr to clean up the old, already merged index files on disk. After the merge had completed successfully but before the restart, those files were still present on disk.