2
votes

I have a server with 12 cores at 2.4GHz and 64GB of RAM. I have 8 shards on the same machine, each about 400GB in size. The total index takes up about 3TB of a 4TB SSD.

I am using some complex Solr operations such as highlighting, faceting, and grouping, and query performance is very slow (a few seconds to a minute).

If I increase the number of shards, will that improve performance? Is there any way to improve performance without splitting the shards across multiple machines?

3 Answers

1
votes

Increasing the number of shards might, depending on usage, decrease performance. Sharding has an overhead cost: for example, each shard might have to retrieve N documents to satisfy a "give me N rows" request, since a single shard might contain all the documents in the result set. Faceting has to generate facets on each shard and then merge them on the shard that answers the query, which adds further overhead.
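To make the "N rows from every shard" overhead concrete, here is a minimal Python sketch of the idea. It is not Solr's actual implementation; the function names and the `(score, id)` tuples are assumptions chosen for illustration only.

```python
import heapq
from typing import List, Tuple

Hit = Tuple[float, str]  # (score, document id); a hypothetical shape for illustration


def shard_top_n(hits: List[Hit], n: int) -> List[Hit]:
    """Each shard returns its own best n hits, because the whole
    global top n could happen to live on that one shard."""
    return heapq.nlargest(n, hits)


def distributed_top_n(shards: List[List[Hit]], n: int) -> Tuple[List[Hit], int]:
    """Merge the per-shard results on the coordinating node.

    Returns the global top n and the number of hits that had to be
    shipped to the coordinator to compute it.
    """
    per_shard = [shard_top_n(hits, n) for hits in shards]
    transferred = sum(len(result) for result in per_shard)
    merged = heapq.nlargest(n, (hit for result in per_shard for hit in result))
    return merged, transferred


shards = [
    [(0.9, "a"), (0.8, "b"), (0.1, "c")],
    [(0.7, "d"), (0.2, "e"), (0.05, "f")],
]
top, transferred = distributed_top_n(shards, 2)
print(top, transferred)
```

In this toy case both of the top 2 hits come from the first shard, yet 4 hits are collected and merged. The amount of merged data grows with the number of shards, which is the overhead described above.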

You might also want to look into cache performance, to see whether you're evicting entries from the cache on each and every query (and keep in mind that you'll have one cache for each shard).
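As a rough way to judge whether a cache is doing any good, you can compute a hit ratio and an eviction rate from the cache counters. The sketch below assumes you have already collected the `lookups`, `hits`, `inserts` and `evictions` counters for each shard's cache (for example from the Solr admin statistics page); the function and the sample numbers are hypothetical.

```python
from typing import Dict


def cache_summary(stats: Dict[str, int]) -> Dict[str, float]:
    """Summarise one cache from its raw counters.

    A low hit ratio combined with an eviction rate close to 1.0 suggests
    that entries are thrown out before they are ever reused.
    """
    lookups = stats["lookups"]
    inserts = stats["inserts"]
    return {
        "hit_ratio": stats["hits"] / lookups if lookups else 0.0,
        "eviction_rate": stats["evictions"] / inserts if inserts else 0.0,
    }


# Made-up counters for two shards; remember that every shard has its own cache.
per_shard_stats = {
    "shard1": {"lookups": 1000, "hits": 50, "inserts": 950, "evictions": 900},
    "shard2": {"lookups": 1000, "hits": 800, "inserts": 200, "evictions": 0},
}
for shard, stats in per_shard_stats.items():
    print(shard, cache_summary(stats))
```

With these made-up numbers, the first shard's cache is barely helping and the second one is working well.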

The only usable answer for your requirement is "try and see" (experiment with different shard sizes). I'd try to avoid creating artificial, local shards if possible. Sharding is useful for distributing an index across several servers, not for splitting it internally on one server (there are use cases for that as well, but as far as I know performance isn't one of them).

See the SolrPerformanceProblems and SolrPerformanceFactors pages on the Solr wiki.

0
votes

SSDs help tremendously with the random I/O that Solr needs, but they are not magic. Assuming you also update your shards, 64GB of RAM for 3TB of index seems underpowered to me. I am guessing that most of your memory is used just running the Solr instance?

We have a multi-TB Solr setup with relatively little free memory for disk cache. We ran some tests and saw a huge degradation of performance when going from 0.5% of the index in disk cache to 0.1%. The limits for your setup are likely to be different, but if your box has very little free memory (I'll hand wave and say 10GB), my guess is that adding a relatively modest amount of RAM (let's say 32 or 64GB) will help a lot.
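For a back-of-the-envelope feel for those percentages, here is a small Python sketch that computes what fraction of the index could sit in the disk cache. It assumes the index is roughly 3000GB (the "about 3TB" from the question) and uses the hand-waved 10GB of free memory as the starting point; the function name is hypothetical.

```python
def disk_cache_fraction(free_ram_gb: float, index_size_gb: float) -> float:
    """Fraction of the index that free memory could hold as disk cache,
    capped at 1.0 when the whole index would fit."""
    return min(1.0, free_ram_gb / index_size_gb)


INDEX_GB = 3000  # assumption: "about 3TB" taken as 3000GB

# Current guess (10GB free), then with 32GB or 64GB of extra RAM added.
for free_gb in (10, 10 + 32, 10 + 64):
    fraction = disk_cache_fraction(free_gb, INDEX_GB)
    print(f"{free_gb:>3} GB free -> {fraction:.2%} of the index in disk cache")
```

Under these assumptions, 10GB of free memory covers roughly 0.3% of the index, while adding 32GB or 64GB brings that to roughly 1.4% or 2.5%. That is well above the 0.5% mark from our tests, though your thresholds may differ.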

0
votes

Sharding on the same machine won't give you any performance benefit, since the 64GB of RAM will be shared across all shards. For the best query performance you would ideally want to hold the entire index in memory. Deploying each shard on a separate machine should improve query performance.