
Does running multiple Solr shards on a single machine improve performance? I would expect Lucene to be multi-threaded, but it doesn't seem to be using more than a single core on my server with 16 physical cores. I realize this is workload dependent, but any statistics or benchmarks would be very useful!

Did you read Hacker News yesterday, by any chance? carsabi.com/car-news/2012/03/23/… - Jesvin Jose
Yep, I wrote that :) I was hoping other people had some stats that I could compare with, though. - cberner
@cberner Is any of this true for indexing performance, or is that a completely different animal? I need to update my index frequently with user content and am looking to speed it up. - ted.strauss
@ted.strauss I didn't test it with indexing, since we were only indexing tens or hundreds of items per second. My guess would be that indexing is different and wouldn't benefit, but that's just a guess. However, one thing I found that helped a lot with indexing was enabling soft commits, if you need near-real-time updates. - cberner
@cberner Thanks for your helpful comments, especially since my question is languishing: stackoverflow.com/q/13500955/241677 - ted.strauss
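A minimal sketch of what "enabling soft commits" can look like from the client side, assuming Solr 4.x or later (where the `/update` handler accepts a JSON array of documents by content type and honours a `softCommit=true` request parameter). The core URL `http://localhost:8983/solr/collection1` and the document fields are hypothetical placeholders. The server-side alternative is an `autoSoftCommit` setting in `solrconfig.xml`.

```python
import json
import urllib.request
from urllib.parse import urlencode

# Hypothetical core URL; adjust host, port and core name for your setup.
SOLR_CORE = "http://localhost:8983/solr/collection1"


def build_soft_commit_update(docs, core_url=SOLR_CORE):
    """Build (but do not send) a JSON update request that asks Solr for a
    soft commit, so the documents become searchable without a hard commit."""
    params = urlencode({"softCommit": "true"})
    body = json.dumps(docs).encode("utf-8")
    return urllib.request.Request(
        f"{core_url}/update?{params}",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_soft_commit_update([{"id": "42", "title": "example"}])
print(req.full_url)  # http://localhost:8983/solr/collection1/update?softCommit=true
# urllib.request.urlopen(req) would actually send it to a running Solr.
```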

2 Answers


I ran some benchmarks of our search stack and found that adding more Solr shards (on a single machine with 16 physical cores) did improve performance, up to about 8 shards (where I got a 6.5x speedup). This is on an index with ~1.5 million documents, running complex range queries.
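Put another way, the figures above correspond to a parallel efficiency of roughly 81%. This is only arithmetic on the quoted numbers (speedup S = 6.5 with p = 8 shards), not an additional measurement:

```latex
E = \frac{S}{p} = \frac{6.5}{8} \approx 0.81
```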

So it seems that Solr doesn't take advantage of multiple physical cores when running queries against a single index.
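For reference, a minimal sketch of querying several shards on one machine through Solr's distributed search `shards` request parameter. The layout is a hypothetical assumption, not the setup from the benchmark: one Solr instance on `localhost:8983` with cores named `shard0` through `shard7`, each holding a disjoint slice of the documents. The coordinating core sends the query to every listed shard concurrently and merges the results, which is how the extra CPU cores get used.

```python
from urllib.parse import parse_qs, urlencode, urlparse

# Hypothetical layout: 8 cores named shard0..shard7 in one Solr instance,
# each holding a disjoint slice of the index.
HOST = "localhost:8983"
NUM_SHARDS = 8


def distributed_query_url(q, num_shards=NUM_SHARDS, host=HOST):
    """Build a select URL that fans the query out over all local shards."""
    # Shard entries are host:port/path, without the http:// scheme.
    shard_list = ",".join(f"{host}/solr/shard{i}" for i in range(num_shards))
    params = urlencode({"q": q, "shards": shard_list, "wt": "json"})
    # Any one core can act as the coordinator; shard0 is used here.
    return f"http://{host}/solr/shard0/select?{params}"


url = distributed_query_url("price:[10 TO 100]")
print(parse_qs(urlparse(url).query)["shards"][0])
```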


If you currently have a single box with a single shard, then splitting this shard into several shards:

  • is likely to worsen throughput,
  • may improve latency, by parallelizing query execution.

I can't provide you with statistics or benchmarks because it depends on whether query execution is CPU bound or I/O bound: if query execution is already I/O bound on a single box, then splitting the shard into several shards will make throughput even worse. You will need to test this yourself: take a production log and replay it in both scenarios.
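A minimal replay harness along those lines, as a sketch only. It assumes the queries have already been extracted from the production log into a list of strings, and `run_query` is a hypothetical callable you supply that sends one query to the setup under test (for example, a function wrapping `urllib.request.urlopen`). Running it with `concurrency=1` compares latency; running it with a concurrency close to production load compares throughput, which is exactly the distinction drawn in the list above.

```python
import math
import statistics
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Sequence


def replay(queries: Sequence[str],
           run_query: Callable[[str], object],
           concurrency: int = 1) -> Dict[str, float]:
    """Replay logged queries with `concurrency` parallel clients and report
    throughput plus median and 95th-percentile latency (in seconds)."""
    if not queries:
        raise ValueError("need at least one query to replay")

    def timed(q: str) -> float:
        t0 = time.perf_counter()
        run_query(q)  # e.g. send the query to Solr and read the response
        return time.perf_counter() - t0

    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        latencies = sorted(pool.map(timed, queries))
    elapsed = time.perf_counter() - start

    # Nearest-rank 95th percentile of the sorted latencies.
    p95 = latencies[math.ceil(0.95 * len(latencies)) - 1]
    return {
        "queries": len(latencies),
        "throughput_qps": len(latencies) / elapsed if elapsed > 0 else math.inf,
        "median_latency_s": statistics.median(latencies),
        "p95_latency_s": p95,
    }
```

Run the same query list once against the single-shard setup and once against the multi-shard one, keeping `concurrency` fixed between the two runs so the numbers are comparable.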