
I have a four-node Elasticsearch cluster. After inserting about 100 GB of data, I restarted the cluster and found that recovering the shards took a long time. I noticed that:

  1. All primary shards recover from the local node through the gateway, and they recover very quickly.
  2. All replica shards recover from their primary shard. The replica data seems to be copied from the node holding the primary shard to another node in the cluster.
  3. After that first long restart finished, I shut the cluster down and restarted it again, and this time recovery took only a few minutes.

I am confused about why the shards are copied again when I restart the cluster. What happened to the original replica data on my nodes?

I have read some related questions and documentation, such as:

quick recovery after node restart in elasticsearch

https://www.elastic.co/guide/en/elasticsearch/reference/1.7/indices-synced-flush.html

but it didn't work. Can I stop shards from being copied across nodes when I restart the cluster for the first time?


1 Answer


It is fine for the replicas to recover slowly. Once the primary shards are recovered, the cluster is usable and its status should be yellow. It can accept queries and gives them priority, throttling other operations such as writing replicas.

The two links you cite are helpful. When a node disappears, the cluster assumes it is broken and immediately starts reallocating its shards to other nodes. That makes no sense when the node is only gone for a short restart or rejoins a little late. In those cases, the settings discussed there and temporarily disabling allocation help. Newer releases also wait a while before reallocating (https://www.elastic.co/guide/en/elasticsearch/reference/current/delayed-allocation.html).
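As a rough sketch of what disabling allocation and delayed allocation look like in practice, the snippet below builds the corresponding REST requests without sending them. The host `http://localhost:9200`, the `5m` timeout, and the helper names are assumptions for illustration only. The setting names `cluster.routing.allocation.enable` and `index.unassigned.node_left.delayed_timeout` come from the Elasticsearch documentation; the delayed timeout needs 1.7 or later.

```python
import json
import urllib.request

# Assumption: a node reachable on the default HTTP port.
ES_URL = "http://localhost:9200"


def put_json(path, body):
    """Build (but do not send) a PUT request carrying a JSON body."""
    return urllib.request.Request(
        ES_URL + path,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


def set_allocation(mode):
    """Use "none" before shutting down and "all" once every node has rejoined.

    "persistent" is used so the setting survives a full cluster restart;
    a "transient" setting would be lost when all nodes go down.
    """
    return put_json(
        "/_cluster/settings",
        {"persistent": {"cluster.routing.allocation.enable": mode}},
    )


def set_delayed_timeout(timeout):
    """Make the cluster wait before reallocating replicas of a departed node."""
    return put_json(
        "/_all/_settings",
        {"settings": {"index.unassigned.node_left.delayed_timeout": timeout}},
    )


if __name__ == "__main__":
    steps = (set_allocation("none"), set_delayed_timeout("5m"), set_allocation("all"))
    for req in steps:
        # Print instead of sending; urllib.request.urlopen(req) would send it.
        print(req.get_method(), req.full_url, req.data.decode("utf-8"))
```

The order matters: disable allocation, restart the nodes, wait until they have all rejoined, then set allocation back to `all`. If it is left at `none`, replicas are never assigned.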

The link about synced flush is also a good one. A synced flush makes rebuilding the replicas faster. In my experience, though, it is not much faster.
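For reference, a synced flush is a single `POST /_flush/synced` call, issued after indexing has stopped and before the nodes are shut down. According to the linked documentation, it writes a sync marker to shard copies so that a replica whose marker matches the primary's can skip the file copy. The sketch below builds that request and checks a response for failed shard copies. The host, the helper names, and the sample response values are assumptions for illustration, not output from a real cluster.

```python
import urllib.request

# Assumption: a node reachable on the default HTTP port.
ES_URL = "http://localhost:9200"


def synced_flush_request():
    """Build (but do not send) the synced flush request for all indices."""
    return urllib.request.Request(ES_URL + "/_flush/synced", method="POST")


def failed_shard_copies(response):
    """Return how many shard copies did not receive a sync marker."""
    return response["_shards"]["failed"]


if __name__ == "__main__":
    req = synced_flush_request()
    print(req.get_method(), req.full_url)

    # Hypothetical response body, shaped like the example in the docs.
    sample = {"_shards": {"total": 8, "successful": 7, "failed": 1}}
    if failed_shard_copies(sample) > 0:
        print("some copies were not synced; they will recover by copying files")
```

A copy typically fails to sync when indexing is still running against it, which is one possible reason the synced flush appears not to help.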