Elasticsearch sharding basics with respect to query speed

Question

Pardon my ignorance if this question is too broad or vague. I'm playing with Elasticsearch with out-of-the-box settings on my laptop and it works just great.

It's a 8 core Macbook and 6G of heap is given to Elasticsearch and it works pretty well for a large dataset (just over 7 Million documents).

I'm keen to set up a multinode cluster (2 machines) and before I assume a few things, I would like to get expert views on a few key points.

I understand "How many shards per node" is a very subjective question and one answer will not fit all situations.

I understand that sharding helps to distribute the indices to multiple nodes so that the storage footprint is optimal per node. But mainly, I'd like to understand how the sharding effects on the query speed & effective CPU cores utilization.

When a single search query comes in, does ES fire internal subqueries to all the shards in parallel, and therefore it can keep all the cores busy (if the no of shards equals no of cores)?

Can I also be pointed to a few useful links that will help me? Thanks.

Andrei Stefan Andrei Stefan · Accepted Answer · 2016-05-13T06:50:03

Your understanding is pretty much spot on.

The basic concept to understand is that one query on a single shard will use one thread. One thread boils down to one core CPU. If the query needs to touch multiple shards, then ES will make sure the shards involved are queried. This means each shard will do its part of the job using one thread.

The size of the shard and the complexity of the query translates to how much time is being spent in that thread. But the OS will not give one CPU core to that thread all the time, the OS it's scheduling jobs and other processes get a slice of the CPU core.

Ideally, yes, you would have number of shards = number of cores, but rarely clusters out there use this setup. Mainly those clusters that have a lot of concurrent requests per seconds and they demand a strict response time.

Elasticsearch sharding basics with respect to query speed

2 Answers

No of shards > No of cores