We have an Elasticsearch cluster with 5 data nodes and 2 master nodes. The Elasticsearch service on one master node is always disabled, so that only one master is active at any time. Today the active master node went down for some reason, so we started the service on the second master node. All the data nodes connected to the new master and all the primary shards were assigned successfully, but none of the replicas were assigned, and I am left with almost 384 unassigned shards.

What should I do now to assign them?

What is the best practice in such circumstances, and what steps should be carried out?

This is what my http://es-master-node:9200/_settings looks like: http://pastebin.com/mK1QBfP6

When I try to manually allocate the shards, I get the following error:

➜  Desktop curl -XPOST http://localhost:9200/_cluster/reroute\?pretty -d '{
  "commands": [
    {
      "allocate": {
        "index": "logstash-1970.01.18",
        "shard": 1,
        "node": "node-name",
        "allow_primary": true
      }
    }
  ]
}'
{
  "error" : {
    "root_cause" : [ {
      "type" : "illegal_argument_exception",
      "reason" : "[allocate] allocation of [logstash-1970.01.18][1] on node {node-name}{vrVG4CBbSvubWHOzn2qfQA}{10.100.0.146}{10.100.0.146:9300}{master=false} is not allowed, reason: [YES(allocation disabling is ignored)][NO(more than allowed [85.0%] used disk on node, free: [13.671127301258165%])][YES(shard not primary or relocation disabled)][YES(target node version [2.2.0] is same or newer than source node version [2.2.0])][YES(no allocation awareness enabled)][YES(shard is not allocated to same node or host)][YES(allocation disabling is ignored)][YES(below shard recovery limit of [2])][YES(total shard limit disabled: [index: -1, cluster: -1] <= 0)][YES(node passes include/exclude/require filters)][YES(primary is already active)]"
    } ],
    "type" : "illegal_argument_exception",
    "reason" : "[allocate] allocation of [logstash-1970.01.18][1] on node {node-name}{vrVG4CBbSvubWHOzn2qfQA}{10.100.0.146}{10.100.0.146:9300}{master=false} is not allowed, reason: [YES(allocation disabling is ignored)][NO(more than allowed [85.0%] used disk on node, free: [13.671127301258165%])][YES(shard not primary or relocation disabled)][YES(target node version [2.2.0] is same or newer than source node version [2.2.0])][YES(no allocation awareness enabled)][YES(shard is not allocated to same node or host)][YES(allocation disabling is ignored)][YES(below shard recovery limit of [2])][YES(total shard limit disabled: [index: -1, cluster: -1] <= 0)][YES(node passes include/exclude/require filters)][YES(primary is already active)]"
  },
  "status" : 400
}
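
The reason string above is a chain of allocation decider verdicts, and only the NO entries block the allocation; here the failing one is the disk usage check (more than 85% of the disk used on the target node). A minimal sketch for pulling out just the failing verdicts, assuming the reason text is available on stdin and that a verdict message never contains a closing parenthesis (true for the messages shown above, not guaranteed in general). `failed_deciders` is a hypothetical helper name, and the `reason` value is a shortened copy of the message above:

```shell
#!/bin/bash
# Print only the decider verdicts that answered NO.
# Each verdict has the form [YES(message)] or [NO(message)].
# Assumption: the message itself contains no ")" character.
failed_deciders() {
    grep -o '\[NO([^)]*)]'
}

# Shortened copy of the reason string from the error above.
reason='[YES(allocation disabling is ignored)][NO(more than allowed [85.0%] used disk on node, free: [13.671127301258165%])][YES(primary is already active)]'
printf '%s\n' "$reason" | failed_deciders
# prints: [NO(more than allowed [85.0%] used disk on node, free: [13.671127301258165%])]
```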

Any help will be appreciated.

1 Answer

So, here are the things that I did to allocate the unassigned shards:

I spawned 5 new ES data servers and waited for them to join the cluster. Once they were in the cluster, I used the following script:

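#!/bin/bash
# Assign every UNASSIGNED shard to the new data nodes in round-robin order.
array=(node1 node2 node3 node4 node5)
node_counter=0
length=${#array[@]}
IFS=$'\n'   # split the curl output on newlines only, one shard per iteration
for line in $(curl -s 'http://ip-adress:9200/_cat/shards' | fgrep UNASSIGNED); do
    INDEX=$(echo "$line" | awk '{print $1}')   # column 1: index name
    SHARD=$(echo "$line" | awk '{print $2}')   # column 2: shard number
    NODE=${array[$node_counter]}
    echo "$NODE"
    # Caution: allow_primary also permits forcing a primary shard, which might result in data loss.
    curl -XPOST 'http://IP-adress:9200/_cluster/reroute' -d '{
        "commands": [
        {
            "allocate": {
                "index": "'$INDEX'",
                "shard": '$SHARD',
                "node": "'$NODE'",
                "allow_primary": true
            }
        }
        ]
    }'
    # Move to the next node, wrapping back to index 0 after the last one.
    node_counter=$(( (node_counter + 1) % length ))
done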

The script assigns the unassigned shards to the new data nodes. It took about 5 to 6 for the cluster to recover again. This is a hack, though; a proper answer would make more sense.
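
The round-robin assignment can be checked without touching the cluster. The following is only a sketch: `plan_allocations` is a hypothetical helper, the node names are passed as arguments, and the input is assumed to be in the `_cat/shards` column layout (index name first, shard number second). It prints which node each unassigned shard would go to and sends nothing.

```shell
#!/bin/bash
# Dry run: read _cat/shards-style lines on stdin and print
# "index shard node" for every UNASSIGNED shard, cycling through the nodes.
plan_allocations() {
    local nodes=("$@")
    local count=${#nodes[@]}
    local i=0 index shard rest
    # Without at least one node there is nothing to cycle through.
    [ "$count" -gt 0 ] || return 1
    while read -r index shard rest; do
        case "$rest" in
            *UNASSIGNED*) ;;      # keep unassigned shards
            *) continue ;;        # skip everything else
        esac
        printf '%s %s %s\n' "$index" "$shard" "${nodes[i]}"
        # Indices run from 0 to count-1, then wrap around.
        i=$(( (i + 1) % count ))
    done
}

# Example with made-up input lines (not real cluster output):
printf '%s\n' \
    'logstash-1970.01.18 0 r UNASSIGNED' \
    'logstash-1970.01.18 1 r UNASSIGNED' \
    'logstash-1970.01.18 2 r UNASSIGNED' | plan_allocations node1 node2
```

With real data, the output of `curl -s 'http://ip-adress:9200/_cat/shards'` could be piped into the function instead of the sample lines, and the printed plan reviewed before any reroute command is sent.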

The following questions remain unanswered:

  • The shards were already present on the old nodes; why didn't the ES master realize that?
  • How can we explicitly ask the ES master to scan the existing data nodes and collect information from them (their current state, the replicas and shards they hold, etc.)?