
I have an Elasticsearch (v5.6.10) cluster with 3 nodes.

  • Node A: master-only
  • Node B: master + data
  • Node C: master + data

There are 6 shards per data node, with the replica count set to 1: all 6 primary shards are on Node B and all 6 replica shards are on Node C.
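For reference, this is how I check the current shard layout (assuming the same localhost:9200 endpoint used in the request below):

# Lists every shard with its state (primary/replica) and the node it sits on
curl -XGET 'localhost:9200/_cat/shards?v'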

My requirement is to take Node B out, do some maintenance work, and put it back into the cluster without any downtime.

I checked the Elastic documentation, the discussion forum, and Stack Overflow questions, and found that I should first execute the request below to move the shards on that node to the remaining nodes.

curl -XPUT localhost:9200/_cluster/settings -H 'Content-Type: application/json' -d '{
  "transient" : {
      "cluster.routing.allocation.exclude._ip" : "<Node B IP>"
   }
}';echo

Once all the shards have been reallocated I can shut down the node and do my maintenance work. When I am done, I remove the exclusion so the node is included in allocation again, and Elasticsearch will rebalance the shards.
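For completeness, here is a sketch of those follow-up steps as I understand them (same localhost:9200 endpoint; setting the transient exclusion back to null is what makes the node eligible for allocation again):

# Wait until no shards are relocating before shutting Node B down
curl -XGET 'localhost:9200/_cluster/health?pretty'

# After maintenance, clear the exclusion so Node B can receive shards again
curl -XPUT localhost:9200/_cluster/settings -H 'Content-Type: application/json' -d '{
  "transient" : {
      "cluster.routing.allocation.exclude._ip" : null
   }
}'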

Now, I also found another discussion where a user ran into yellow/red cluster health because they had only one data node but had still set the replica count to 1, leaving shards unassigned. It seems to me that this exercise takes my cluster towards exactly that state.

So my concern is whether this is the correct approach, given that all my primary shards are on the node (Node B) I am taking out of the cluster and the replica count is only 1.


1 Answer

  1. With only two data nodes and one of them being shut down, you can't really reallocate the shards. Elasticsearch never allocates a primary and its replica on the same node; that would add nothing in availability or performance and would only double the disk usage. So the exclusion command achieves nothing here, because the shards have nowhere to move.
  2. Do a synced flush and then an orderly shutdown of the node, as sketched below. The replica shards on the remaining node will automatically be promoted to primary shards. Your cluster will go yellow until the other node rejoins, but there isn't really a way around that in your scenario (anything else would be either a hack or overkill). And this is fine: as long as you always have a replica, it will live on the other node and your cluster will keep working as expected.
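A minimal sketch of that sequence, assuming the same localhost:9200 endpoint used in the question (the synced flush API is available in 5.x):

# Synced flush so replica recovery is quick when Node B rejoins
curl -XPOST 'localhost:9200/_flush/synced?pretty'

# ... shut down Node B, do the maintenance, start it again ...

# The cluster will be yellow while Node B is away; watch it return to green
curl -XGET 'localhost:9200/_cluster/health?pretty'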