3 votes

We currently have a Solr instance with about 50 million documents. There is a long field that we often sort by, using the standard long field type with a precisionStep of zero:

<fieldType name="long" class="solr.TrieLongField" precisionStep="0" omitNorms="true" positionIncrementGap="0"/>
<field name="row" type="long" indexed="true" stored="true" />

When it comes to doing a sort, the sort field needs to be loaded into memory. In our case, with a large range of row values, we need between 500 MB and 1 GB of heap to do the sort.

I am wondering whether this memory usage requirement can be reduced somehow.

Would increasing the precisionStep of the row field decrease the index size and therefore reduce the amount of memory required for sorting? Is there a trade-off against sorting speed? And would sorts still be completely correct with a higher precisionStep (the row values must be strictly in order)?
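For reference, this is roughly what a higher precisionStep on the row field would look like in schema.xml (a sketch only; the value of 8 is an arbitrary example, not a recommendation):

<fieldType name="long" class="solr.TrieLongField" precisionStep="8" omitNorms="true" positionIncrementGap="0"/>
<field name="row" type="long" indexed="true" stored="true" />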

1 GB of heap is acceptable for now, but I am wary that if we add many more documents with more row values, the memory requirement will become too high.


(added after jpountz's answer)

While this fits in memory currently, it won't scale with the number of documents we expect to add in the next few months. We will probably retrieve the results from Solr unsorted and sort them on the client side with the disk-based java-merge-sort.
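To illustrate the client-side plan, here is a minimal sketch of a disk-based (external) merge sort of row values fetched unsorted from Solr. This is a hand-rolled illustration, not the java-merge-sort library's API; the class name, the one-value-per-line file layout, and RUN_SIZE are assumptions made for the example.

import java.io.*;
import java.nio.file.*;
import java.util.*;

public class ExternalRowSort {

    // Number of values sorted in memory per run; tune to the available heap.
    private static final int RUN_SIZE = 1_000_000;

    public static void sort(Path unsortedInput, Path sortedOutput) throws IOException {
        List<Path> runs = new ArrayList<>();

        // Phase 1: split the input into sorted runs written to temp files.
        try (BufferedReader in = Files.newBufferedReader(unsortedInput)) {
            long[] buffer = new long[RUN_SIZE];
            int n = 0;
            String line;
            while ((line = in.readLine()) != null) {
                buffer[n++] = Long.parseLong(line.trim());
                if (n == RUN_SIZE) {
                    runs.add(writeRun(buffer, n));
                    n = 0;
                }
            }
            if (n > 0) {
                runs.add(writeRun(buffer, n));
            }
        }

        // Phase 2: k-way merge of the runs using a priority queue.
        PriorityQueue<RunReader> heap =
                new PriorityQueue<>(Comparator.comparingLong((RunReader r) -> r.current));
        for (Path run : runs) {
            RunReader r = new RunReader(run);
            if (r.advance()) {
                heap.add(r);
            } else {
                r.close();
            }
        }
        try (BufferedWriter out = Files.newBufferedWriter(sortedOutput)) {
            while (!heap.isEmpty()) {
                RunReader r = heap.poll();
                out.write(Long.toString(r.current));
                out.newLine();
                if (r.advance()) {
                    heap.add(r);
                } else {
                    r.close();
                }
            }
        }
        // Temp run files could be deleted here once the merge has finished.
    }

    // Sort the first n values of the buffer and write them to a temp file.
    private static Path writeRun(long[] buffer, int n) throws IOException {
        Arrays.sort(buffer, 0, n);
        Path run = Files.createTempFile("row-run", ".txt");
        try (BufferedWriter out = Files.newBufferedWriter(run)) {
            for (int i = 0; i < n; i++) {
                out.write(Long.toString(buffer[i]));
                out.newLine();
            }
        }
        return run;
    }

    // One open sorted run during the merge phase.
    private static final class RunReader implements Closeable {
        final BufferedReader reader;
        long current;

        RunReader(Path run) throws IOException {
            this.reader = Files.newBufferedReader(run);
        }

        boolean advance() throws IOException {
            String line = reader.readLine();
            if (line == null) {
                return false;
            }
            current = Long.parseLong(line.trim());
            return true;
        }

        @Override
        public void close() throws IOException {
            reader.close();
        }
    }
}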


1 Answer

2 votes

The precisionStep parameter is only relevant for range queries. To perform a sort, Lucene needs to load field values in the field cache. A long being 8 bytes, the field cache for your field should require about 8B * 50M ~ 400 MB. If you really need a long for this field, there is no way to reduce memory usage (on the other hand, using an int instead would only require ~200MB).
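For illustration, the back-of-the-envelope estimate above in code form (raw value storage only; the actual field cache has some additional per-segment overhead, so treat these numbers as a lower bound):

public class FieldCacheEstimate {
    public static void main(String[] args) {
        long numDocs = 50_000_000L;

        long longCacheBytes = numDocs * 8; // a long is 8 bytes -> 400,000,000 bytes (~381 MiB)
        long intCacheBytes  = numDocs * 4; // an int is 4 bytes -> 200,000,000 bytes (~190 MiB)

        System.out.println("field cache as long: ~" + longCacheBytes / (1024 * 1024) + " MB");
        System.out.println("field cache as int:  ~" + intCacheBytes / (1024 * 1024) + " MB");
    }
}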