6
votes

We have a Solr core with about 250 TrieIntFields (declared as dynamicField). The index holds about 14M docs, and many documents have a value in many of these fields. Over time, we need to sort on all 250 of these fields.

The issue we are facing is that the underlying Lucene fieldCache fills up very quickly. We have a 4 GB box and the index size is 18 GB. After sorting on 40 or 45 of these dynamic fields, memory consumption reaches about 90% and we start getting OutOfMemory errors.
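A rough back-of-the-envelope estimate helps show why memory runs out at around 40 to 45 fields. The sketch below assumes that sorting on an int field makes the fieldCache hold one uncompressed 4-byte value per document for that field; that per-entry size is an assumption, not a measured figure, and it ignores any per-field overhead. The function name `field_cache_bytes` is made up for illustration.

```python
# Rough fieldCache sizing for sorted int fields.
# Assumption: one uncompressed 4-byte int per document per sorted field.
BYTES_PER_INT = 4

NUM_DOCS = 14_000_000  # document count stated in the question


def field_cache_bytes(num_docs: int, num_sorted_fields: int) -> int:
    """Estimated bytes held once `num_sorted_fields` fields have been sorted on."""
    return num_docs * BYTES_PER_INT * num_sorted_fields


for fields in (1, 45, 250):
    gib = field_cache_bytes(NUM_DOCS, fields) / 2**30
    print(f"{fields:>3} sorted fields: about {gib:.2f} GiB")
```

Under this assumption, 45 sorted fields already need roughly 2.3 GiB and all 250 fields would need about 13 GiB, which is far more than a 4 GB box can hold. The estimate also scales with document count and number of sorted fields, not with the number of distinct values.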

For now, we have a cron job that runs every minute and restarts Tomcat if total memory consumption exceeds 80%.

From what I have read, I understand that restricting the number of distinct values in sortable Solr fields reduces the fieldCache footprint. The values in these sortable fields can be any integer from 0 to 33000 and are quite widely distributed. We have a few scaling solutions in mind, but what is the best way to handle this whole issue?

UPDATE: We thought that if we used boosting instead of sorting, the query would not go through the fieldCache. So instead of issuing a query like

select?q=name:alba&sort=relevance_11 desc

we tried

select?q={!boost relevance_11}name:alba

but unfortunately boosting also populates the fieldCache :(


2 Answers

2
votes

I think you have two options:

1) Add more memory.
2) Force Solr not to use the field cache by specifying facet.method=enum, as per the documentation.

There's also a solr-user mailing list thread discussing the same problem.

Unless your index is huge, I'd go with option 1). RAM is cheap these days.

0
votes

We have a way to rework the schema so that it keeps a single sort field. The dynamic fields we have are named like relevance_CLASSID. The current schema has a unique key NODEID and a multi-valued field CLASSID; the relevance scores are for these class IDs. If we instead keep one document per CLASSID per NODEID, i.e. the new schema has NODEID:CLASSID as its unique key and stores some redundant information across documents with the same NODEID, then we can sort on a single field relevance and apply a filter query on CLASSID.
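The rework described above can be sketched as a small transformation from the current per-node document to one document per class. This is only an illustration under stated assumptions: the key field name `NODEID_CLASSID`, the default score of 0 when a class has no relevance value, the sample document, and the helper names `flatten_node` and `sort_params` are all hypothetical.

```python
from typing import Any


def flatten_node(doc: dict[str, Any]) -> list[dict[str, Any]]:
    """Split one per-node document into one document per (NODEID, CLASSID)."""
    # Everything except CLASSID and the relevance_* fields is copied into each
    # new document; this is the redundant information mentioned above.
    shared = {
        key: value
        for key, value in doc.items()
        if key != "CLASSID" and not key.startswith("relevance_")
    }
    flat = []
    for class_id in doc.get("CLASSID", []):
        new_doc = dict(shared)
        # Hypothetical name for the new unique key field.
        new_doc["NODEID_CLASSID"] = f"{doc['NODEID']}:{class_id}"
        new_doc["CLASSID"] = class_id
        # Assumption: a missing score defaults to 0.
        new_doc["relevance"] = doc.get(f"relevance_{class_id}", 0)
        flat.append(new_doc)
    return flat


def sort_params(query: str, class_id: int) -> dict[str, str]:
    """Request parameters that sort on the single field and filter by class."""
    return {"q": query, "fq": f"CLASSID:{class_id}", "sort": "relevance desc"}


# Made-up sample document in the current schema.
node = {
    "NODEID": 42,
    "name": "alba",
    "CLASSID": [11, 12],
    "relevance_11": 300,
    "relevance_12": 7,
}
docs = flatten_node(node)
params = sort_params("name:alba", 11)
```

With this layout only one field is ever sorted on, so only one sort field should end up in the fieldCache. The trade-off is a larger document count, since every node is repeated once per class it belongs to.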