We currently have a Solr instance with about 50 million documents. There is a long field that we often sort by, using the standard long field type with a precisionStep of zero:
<fieldType name="long" class="solr.TrieLongField" precisionStep="0" omitNorms="true" positionIncrementGap="0"/>
<field name="row" type="long" indexed="true" stored="true" />
When it comes to doing a sort, the field's values need to be loaded into memory. In our case, with a large range of row values, we need between 500 MB and 1 GB of heap to do the sort.
I am wondering whether this memory requirement can be reduced somehow.
Would increasing the precisionStep of the row field decrease the index size and therefore reduce the amount of memory required for sorting? Is there a trade-off in sorting speed when doing this? And would sorts still be completely correct with a higher precision step (the row values must be strictly in order)?
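For illustration, increasing the precisionStep would only change that attribute on the field type above; the value 8 below is an arbitrary example, not a recommendation:
<fieldType name="long" class="solr.TrieLongField" precisionStep="8" omitNorms="true" positionIncrementGap="0"/>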
1 GB of heap is acceptable for now, but I am wary that if we add many more documents with more row values, the memory requirements will become too high.
(added after jpountz's answer)
While this fits in memory currently, it won't scale with the number of documents we expect to add in the next few months. We will probably fetch the results from Solr unsorted and sort them on the client side with the disk-based java-merge-sort.
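For context, a rough sketch of what that client-side step could look like follows. This is not the java-merge-sort library's actual API; it is just a minimal, self-contained illustration of a disk-based (external) merge sort, and it assumes the unsorted row values have already been fetched from Solr and written to a text file with one value per line (class name, file handling, and chunk size are hypothetical):

import java.io.*;
import java.nio.file.*;
import java.util.*;

// Minimal sketch of a disk-based (external) merge sort for long row values.
// Illustration only: it does not use the java-merge-sort library's API, and
// the chunk size and temp-file handling are simplifying assumptions.
public class ExternalRowSort {

    private static final int CHUNK_SIZE = 1_000_000; // rows held in memory at once (assumption)

    // Sort the long values in `input` (one per line) into `output` using temp files.
    public static void sort(Path input, Path output) throws IOException {
        List<Path> chunks = new ArrayList<>();

        // Phase 1: read fixed-size chunks, sort each in memory, spill to disk.
        try (BufferedReader reader = Files.newBufferedReader(input)) {
            long[] buffer = new long[CHUNK_SIZE];
            int n;
            while ((n = fillBuffer(reader, buffer)) > 0) {
                Arrays.sort(buffer, 0, n);
                Path chunk = Files.createTempFile("row-chunk-", ".txt");
                try (BufferedWriter w = Files.newBufferedWriter(chunk)) {
                    for (int i = 0; i < n; i++) {
                        w.write(Long.toString(buffer[i]));
                        w.newLine();
                    }
                }
                chunks.add(chunk);
            }
        }

        // Phase 2: k-way merge of the sorted chunks using a priority queue.
        PriorityQueue<ChunkCursor> heap =
            new PriorityQueue<>(Comparator.comparingLong((ChunkCursor c) -> c.current));
        for (Path chunk : chunks) {
            ChunkCursor cursor = new ChunkCursor(chunk);
            if (cursor.advance()) heap.add(cursor);
        }
        try (BufferedWriter out = Files.newBufferedWriter(output)) {
            while (!heap.isEmpty()) {
                ChunkCursor c = heap.poll();
                out.write(Long.toString(c.current));
                out.newLine();
                if (c.advance()) heap.add(c);
            }
        }
        for (Path chunk : chunks) Files.deleteIfExists(chunk);
    }

    // Read up to buffer.length values from the reader; returns how many were read.
    private static int fillBuffer(BufferedReader reader, long[] buffer) throws IOException {
        int n = 0;
        String line;
        while (n < buffer.length && (line = reader.readLine()) != null) {
            buffer[n++] = Long.parseLong(line.trim());
        }
        return n;
    }

    // Iterates over one sorted chunk file during the merge phase.
    private static final class ChunkCursor {
        final BufferedReader reader;
        long current;

        ChunkCursor(Path chunk) throws IOException {
            this.reader = Files.newBufferedReader(chunk);
        }

        boolean advance() throws IOException {
            String line = reader.readLine();
            if (line == null) { reader.close(); return false; }
            current = Long.parseLong(line.trim());
            return true;
        }
    }
}

The heap usage of this approach is bounded by the chunk size rather than by the total number of documents, which is the main reason we are considering moving the sort off the Solr side.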