
I am new to Elasticsearch (version 7.4), and even after a lot of reading it is still not clear to me how many shards and nodes are recommended for a given index. I have currently configured 3 shards and 2 replicas on 3 nodes (each with 8 GB RAM and a 500 GB HDD), with 55 GB of data in one index. I would like your views and suggestions on the following points.

  1. Are the numbers of shards, nodes, and replicas given above sufficient?
  2. In terms of the CAP theorem, I prefer CP (consistency and partition tolerance) for this 3-node cluster:
    • For consistency, I configured write_consistency=all.
    • For partition tolerance, I set the number of master-eligible nodes to (N/2) + 1; in my case it is 3.

1 Answer


I can hopefully give you some useful advice from my time running Elasticsearch clusters :)

1)

Shards: See this blog post for more information, but your average shard will be 55 GB / 3 ≈ 18 GB, which is a good shard size (in my experience it's best to keep shards between 5 GB and 25 GB, and the ES docs recommend this as well).

Replicas: 2 replicas is my go-to for a good balance between failure tolerance and performance, so this is fine.

Nodes: Those 3 nodes should be sufficient, and you won't need that much disk. With 2 replicas you'll store 3 copies of the data, roughly 55 GB * 3 = 165 GB (could be more depending on your mapping), across 1500 GB of total disk, so perhaps you could save some money by using nodes with 100 GB disks.
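The arithmetic above can be sketched in a few lines of Python. This is only a back-of-the-envelope helper: the function name `shard_plan` and its parameters are made up for illustration, and it assumes the data is spread evenly across shards and nodes and ignores mapping and index overhead.

```python
def shard_plan(index_gb, primary_shards, replicas, nodes, disk_gb_per_node):
    """Rough sizing estimate; hypothetical helper, not an Elasticsearch API."""
    copies = 1 + replicas                  # the primary plus each replica
    shard_gb = index_gb / primary_shards   # average size of one primary shard
    total_gb = index_gb * copies           # data stored across the whole cluster
    per_node_gb = total_gb / nodes         # assumes an even spread over nodes
    return {
        "shard_gb": shard_gb,
        "total_gb": total_gb,
        "per_node_gb": per_node_gb,
        "disk_used_fraction": per_node_gb / disk_gb_per_node,
    }


# Numbers from the question: 55 GB index, 3 shards, 2 replicas, 3 nodes, 500 GB disks.
plan = shard_plan(index_gb=55, primary_shards=3, replicas=2, nodes=3, disk_gb_per_node=500)
for name, value in plan.items():
    print(f"{name}: {value:.2f}")
```

Running it with `disk_gb_per_node=100` instead gives a quick check of how much headroom the smaller disks would leave.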

2)

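For partition tolerance I might suggest setting write_consistency=quorum. That way, even if you lose a node and therefore one copy of each shard, you'll still be able to write with 1 primary and 1 replica left. With write_consistency=all, you'd need to reboot/recreate that node to start writing again. See https://www.elastic.co/guide/en/elasticsearch/reference/2.4/docs-index_.html#index-consistency for more details. Note that this page documents Elasticsearch 2.4; from 5.0 onward the consistency parameter was replaced by wait_for_active_shards, so on 7.4 that is the setting to look at.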
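To make the difference between the consistency levels concrete, here is a small Python model of the rule. It is a simplified sketch, not Elasticsearch code: `write_quorum` and `can_write` are hypothetical names, and the model only counts how many copies of a shard are currently available.

```python
def write_quorum(replicas):
    """Majority of the shard copies (1 primary + replicas)."""
    copies = 1 + replicas
    return copies // 2 + 1


def can_write(replicas, available_copies, consistency):
    """Simplified model: is a write accepted at the given consistency level?"""
    required = {
        "one": 1,                         # the primary alone is enough
        "quorum": write_quorum(replicas), # a majority of copies
        "all": 1 + replicas,              # every copy must be available
    }[consistency]
    return available_copies >= required


# 2 replicas means 3 copies per shard; losing one node leaves 2 copies.
print(write_quorum(2))            # 2
print(can_write(2, 2, "quorum"))  # True
print(can_write(2, 2, "all"))     # False
```

With one node down, the quorum level still accepts writes while the all level rejects them, which is the trade-off described above.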

Master-eligible: Yes, I recommend a minimum of 3 master-eligible nodes, so you'll want to set all 3 of these nodes to be both master-eligible and data nodes.