1 vote

We are looking into addressing some performance issues with our ES cluster and have been examining the shard distribution across the data nodes. I know the advice is to keep shards evenly distributed between nodes, which leads to my question:

For a cluster with 8 data nodes, we have some indexes with 2 primary shards and 3 replicas (so 8 shards in total). We also have some indexes with 1 primary shard and 3 replicas (so 4 in total).

My question is: can that setup be considered "evenly distributed"? We were thinking it is not, and we considered giving each index 1 primary shard and 7 replicas (so every index would be hosted on all 8 nodes), but we don't know whether such a setup makes sense. If not, what would you recommend instead to distribute the shards more evenly?

Here is the output of the _cat/shards query:

id1     0 p STARTED  2138  16.1mb x.x.x.x node1
id1     0 r STARTED  2138  16.1mb x.x.x.x node2
id1     0 r STARTED  2138  16.1mb x.x.x.x node3
id1     0 r STARTED  2138  16.1mb x.x.x.x node4
id2     0 r STARTED  3379  26.8mb x.x.x.x node5
id2     0 r STARTED  3379  26.8mb x.x.x.x node3
id2     0 r STARTED  3379  26.8mb x.x.x.x node4
id2     0 p STARTED  3379  26.8mb x.x.x.x node6
id3     0 r STARTED 20086  76.1mb x.x.x.x node1
id3     0 r STARTED 20086  76.1mb x.x.x.x node5
id3     0 p STARTED 20086  76.1mb x.x.x.x node6
id3     0 r STARTED 20086  76.1mb x.x.x.x node7
id4     0 r STARTED  2754   7.3mb x.x.x.x node2
id4     0 r STARTED  2754   7.3mb x.x.x.x node3
id4     0 r STARTED  2754   7.3mb x.x.x.x node8
id4     0 p STARTED  2754   7.3mb x.x.x.x node7
id5     0 r STARTED 10239  42.3mb x.x.x.x node1
id5     0 p STARTED 10239  42.3mb x.x.x.x node4
id5     0 r STARTED 10239  42.3mb x.x.x.x node6
id5     0 r STARTED 10239  42.3mb x.x.x.x node8
id6     0 r STARTED 13388  42.4mb x.x.x.x node1
id6     0 p STARTED 13388  42.4mb x.x.x.x node5
id6     0 r STARTED 13388  42.4mb x.x.x.x node3
id6     0 r STARTED 13388  42.4mb x.x.x.x node8
id7     1 r STARTED 27483 136.2mb x.x.x.x node2
id7     1 r STARTED 27483 136.2mb x.x.x.x node3
id7     1 r STARTED 27483 136.3mb x.x.x.x node8
id7     1 p STARTED 27483 136.2mb x.x.x.x node7
id7     0 r STARTED 27189 146.5mb x.x.x.x node1
id7     0 p STARTED 27189 146.6mb x.x.x.x node5
id7     0 r STARTED 27189 146.6mb x.x.x.x node4
id7     0 r STARTED 27189 146.7mb x.x.x.x node6
.kibana 0 r STARTED    13 106.8kb x.x.x.x node2
.kibana 0 p STARTED    13 106.8kb x.x.x.x node3
id8     1 r STARTED 13555  80.8mb x.x.x.x node2
id8     1 r STARTED 13555  80.8mb x.x.x.x node4
id8     1 r STARTED 13555  80.8mb x.x.x.x node8
id8     1 p STARTED 13555  80.8mb x.x.x.x node7
id8     0 r STARTED 13390    63mb x.x.x.x node1
id8     0 p STARTED 13390  62.7mb x.x.x.x node5
id8     0 r STARTED 13390  62.7mb x.x.x.x node6
id8     0 r STARTED 13390  62.8mb x.x.x.x node7
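To see how balanced the cluster actually is, you can tally shard copies per node straight from the _cat/shards output. A minimal sketch in Python, using a few sample rows from the listing above (the node name is the last column):

```python
from collections import Counter

# A few rows pasted from the _cat/shards output above
# (columns: index, shard, prirep, state, docs, store, ip, node).
CAT_SHARDS = """\
id1 0 p STARTED 2138 16.1mb x.x.x.x node1
id1 0 r STARTED 2138 16.1mb x.x.x.x node2
id1 0 r STARTED 2138 16.1mb x.x.x.x node3
id1 0 r STARTED 2138 16.1mb x.x.x.x node4
id2 0 r STARTED 3379 26.8mb x.x.x.x node5
id2 0 r STARTED 3379 26.8mb x.x.x.x node3
id2 0 r STARTED 3379 26.8mb x.x.x.x node4
id2 0 p STARTED 3379 26.8mb x.x.x.x node6
"""

def shards_per_node(cat_output: str) -> Counter:
    """Count how many shard copies each data node hosts."""
    counts = Counter()
    for line in cat_output.strip().splitlines():
        node = line.split()[-1]  # node name is the last column
        counts[node] += 1
    return counts

print(shards_per_node(CAT_SHARDS))
```

Running this over the full listing shows each node hosting roughly the same number of shard copies, which is what the balancer aims for by default.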
1
You are in a classic oversharding situation: you have too many small indices. See if you can consolidate them, then run a force merge on the new, bigger indices. - Jaycreation
The content of the indices changes often; is a force merge a good idea in such a setup? - Michał Przybylak
It's good practice to run a force merge on a regular basis. But the first thing you need to do is change your sharding strategy (follow Opster's article). - Jaycreation
@MichałPrzybylak, it would be great if you could upvote and accept my answer if it was helpful, TIA :) - user156327
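For reference, the force merge mentioned in the comments is a single API call. A minimal sketch, assuming the cluster is reachable at localhost:9200 and using index names from the question; force merging is best reserved for indices that are no longer receiving heavy writes:

```shell
# Hypothetical host; adjust to your cluster.
# Merge each index's segments down to a single segment:
curl -X POST "localhost:9200/id1,id2,id3/_forcemerge?max_num_segments=1"
```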

1 Answer

2 votes

Distributing every index's shards across all ES data nodes doesn't make sense, for several reasons.

  1. The number of primary shards should be chosen based on shard size; primary shards are what let you scale an index horizontally.
  2. Replica shards give you high availability and increase search performance.

It's really difficult to achieve a perfect shard balance in an ES cluster (in terms of shard count, size, and traffic). That said, since your shards are really small (less than 100 MB each), you could go with 1 primary shard and 7 replicas for all your indices; even so, you should benchmark and choose the right number of shards and replicas based on your cluster setup and use cases.
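If you do experiment with the 1-primary/7-replica layout, note that `number_of_replicas` can be changed on a live index at any time, while `number_of_shards` cannot (changing it requires reindexing or the shrink API). A minimal sketch, assuming a cluster at localhost:9200 and the index id1 from the listing in the question:

```shell
# Hypothetical host; adjust to your cluster.
# Raise the replica count so each of the 8 nodes holds one copy:
curl -X PUT "localhost:9200/id1/_settings" \
  -H 'Content-Type: application/json' \
  -d '{"index": {"number_of_replicas": 7}}'

# Check where the copies were allocated:
curl "localhost:9200/_cat/shards/id1?v"
```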