Elasticsearch unassigned shards warning every couple of hours

Question

Our cluster has 3 elasticsearch data pods / 3 master pods / 1 client and 1 exporter. The problem is the alert "Elasticsearch unassigned shards due to circuit breaking exception". You can check further about this in this question

Now, by making the curl call http://localhost:9200/_nodes/stats, I've figured that the heap usage is average across data pods.

The heap_used_percent of eleasticsearch-data-0, 1 and 2 are 68%, 61% and 63% respectively.

I made below API calls and can see the shards are almost evenly distributed.

curl -s http://localhost:9200/_cat/shards |grep elasticsearch-data-0 |wc -l

curl -s http://localhost:9200/_cat/shards |grep elasticsearch-data-1 |wc -l

curl -s http://localhost:9200/_cat/shards |grep elasticsearch-data-2 |wc -l

Below is the output of allocate explain curl call

curl -s http://localhost:9200/_cluster/allocation/explain | python -m json.tool

{
    "allocate_explanation": "cannot allocate because allocation is not permitted to any of the nodes",
    "can_allocate": "no",
    "current_state": "unassigned",
    "index": "graph_24_18549",
    "node_allocation_decisions": [
        {
            "deciders": [
                {
                    "decider": "max_retry",
                    "decision": "NO",
                    "explanation": "shard has exceeded the maximum number of retries [50] on failed allocation attempts - manually call [/_cluster/reroute?retry_failed=true] to retry, [unassigned_info[[reason=ALLOCATION_FAILED], at[2020-10-31T09:18:44.115Z], failed_attempts[50], delayed=false, details[failed shard on node [nodeid1]: failed to perform indices:data/write/bulk[s] on replica [graph_24_18549][0], node[nodeid1], [R], recovery_source[peer recovery], s[INITIALIZING], a[id=someid], unassigned_info[[reason=ALLOCATION_FAILED], at[2020-10-31T09:16:42.146Z], failed_attempts[49], delayed=false, details[failed shard on node [nodeid2]: failed to perform indices:data/write/bulk[s] on replica [graph_24_18549][0], node[nodeid2], [R], recovery_source[peer recovery], s[INITIALIZING], a[id=someid2], unassigned_info[[reason=ALLOCATION_FAILED], at[2020-10-31T09:15:05.849Z], failed_attempts[48], delayed=false, details[failed shard on node [nodeid1]: failed to perform indices:data/write/bulk[s] on replica [tsg_ngf_graph_1_mtermmetrics1_vertex_24_18549][0], node[nodeid1], [R], recovery_source[peer recovery], s[INITIALIZING], a[id=someid3], unassigned_info[[reason=ALLOCATION_FAILED], at[2020-10-31T09:11:50.143Z], failed_attempts[47], delayed=false, details[failed shard on node [nodeid2]: failed to perform indices:data/write/bulk[s] on replica [graph_24_18549][0], node[o_9jyrmOSca9T12J4bY0Nw], [R], recovery_source[peer recovery], s[INITIALIZING], a[id=someid4], unassigned_info[[reason=ALLOCATION_FAILED], at[2020-10-31T09:08:10.182Z], failed_attempts[46], delayed=false, details[failed shard on node [nodeid1]: failed to perform indices:data/write/bulk[s] on replica [graph_24_18549][0], node[nodeid1], [R], recovery_source[peer recovery], s[INITIALIZING], a[id=someid6], unassigned_info[[reason=ALLOCATION_FAILED], at[2020-10-31T09:07:03.102Z], failed_attempts[45], delayed=false, details[failed shard on node [nodeid2]: failed to perform indices:data/write/bulk[s] on replica [graph_24_18549][0], node[nodeid2], [R], recovery_source[peer recovery], s[INITIALIZING], a[id=someid7], unassigned_info[[reason=ALLOCATION_FAILED], at[2020-10-31T09:05:53.267Z], failed_attempts[44], delayed=false, details[failed shard on node [nodeid2]: failed to perform indices:data/write/bulk[s] on replica [graph_24_18549][0], node[nodeid2], [R], recovery_source[peer recovery], s[INITIALIZING], a[id=someid8], unassigned_info[[reason=ALLOCATION_FAILED], at[2020-10-31T09:04:24.507Z], failed_attempts[43], delayed=false, details[failed shard on node [nodeid1]: failed to perform indices:data/write/bulk[s] on replica [graph_24_18549][0], node[nodeid1], [R], recovery_source[peer recovery], s[INITIALIZING], a[id=someid9], unassigned_info[[reason=ALLOCATION_FAILED], at[2020-10-31T09:03:02.018Z], failed_attempts[42], delayed=false, details[failed shard on node [nodeid2]: failed to perform indices:data/write/bulk[s] on replica [graph_24_18549][0], node[nodeid2], [R], recovery_source[peer recovery], s[INITIALIZING], a[id=someid10], unassigned_info[[reason=ALLOCATION_FAILED], at[2020-10-31T09:01:38.094Z], failed_attempts[41], delayed=false, details[failed shard on node [nodeid1]: failed recovery, failure RecoveryFailedException[[graph_24_18549][0]: Recovery failed from {elasticsearch-data-2}{}{} into {elasticsearch-data-1}{}{}{IP}{IP:9300}]; nested: RemoteTransportException[[elasticsearch-data-2][IP:9300][internal:index/shard/recovery/start_recovery]]; nested: CircuitBreakingException[[parent] Data too large, data for [<transport_request>] would be [2012997826/1.8gb], which is larger than the limit of [1972122419/1.8gb], real usage: [2012934784/1.8gb], new bytes reserved: [63042/61.5kb]]; ], allocation_status[no_attempt]], expected_shard_size[4338334540], failure RemoteTransportException[[elasticsearch-data-0][IP:9300][indices:data/write/bulk[s][r]]]; nested: AlreadyClosedException[engine is closed]; ], allocation_status[no_attempt]], expected_shard_size[5040039519], failure RemoteTransportException[[elasticsearch-data-1][IP:9300][indices:data/write/bulk[s][r]]]; nested: CircuitBreakingException[[parent] Data too large, data for [<transport_request>] would be [2452709390/2.2gb], which is larger than the limit of [1972122419/1.8gb], real usage: [2060112120/1.9gb], new bytes reserved: [392597270/374.4mb]]; ], allocation_status[no_attempt]], expected_shard_size[2606804616], failure RemoteTransportException[[elasticsearch-data-0][IP:9300][indices:data/write/bulk[s][r]]]; nested: AlreadyClosedException[engine is closed]; ], allocation_status[no_attempt]], expected_shard_size[4799579998], failure RemoteTransportException[[elasticsearch-data-0][IP:9300][indices:data/write/bulk[s][r]]]; nested: AlreadyClosedException[engine is closed]; ], allocation_status[no_attempt]], expected_shard_size[4012459974], failure RemoteTransportException[[elasticsearch-data-1][IP:9300][indices:data/write/bulk[s][r]]]; nested: CircuitBreakingException[[parent] Data too large, data for [<transport_request>] would be [2045921066/1.9gb], which is larger than the limit of [1972122419/1.8gb], real usage: [1770141176/1.6gb], new bytes reserved: [275779890/263mb]]; ], allocation_status[no_attempt]], expected_shard_size[3764296412], failure RemoteTransportException[[elasticsearch-data-0][IP:9300][indices:data/write/bulk[s][r]]]; nested: AlreadyClosedException[engine is closed]; ], allocation_status[no_attempt]], expected_shard_size[2631720247], failure RemoteTransportException[[elasticsearch-data-1][IP:9300][indices:data/write/bulk[s][r]]]; nested: CircuitBreakingException[[parent] Data too large, data for [<transport_request>] would be [2064366222/1.9gb], which is larger than the limit of [1972122419/1.8gb], real usage: [1838754456/1.7gb], new bytes reserved: [225611766/215.1mb]]; ], allocation_status[no_attempt]], expected_shard_size[3255872204], failure RemoteTransportException[[elasticsearch-data-0][IP:9300][indices:data/write/bulk[s][r]]]; nested: CircuitBreakingException[[parent] Data too large, data for [<transport_request>] would be [2132674062/1.9gb], which is larger than the limit of [1972122419/1.8gb], real usage: [1902340880/1.7gb], new bytes reserved: [230333182/219.6mb]]; ], allocation_status[no_attempt]], expected_shard_size[2956220256], failure RemoteTransportException[[elasticsearch-data-1][IP:9300][indices:data/write/bulk[s][r]]]; nested: CircuitBreakingException[[parent] Data too large, data for [<transport_request>] would be [2092139364/1.9gb], which is larger than the limit of [1972122419/1.8gb], real usage: [1855009224/1.7gb], new bytes reserved: [237130140/226.1mb]]; ], allocation_status[no_attempt]]]"
                },
{
                    "decider": "same_shard",
                    "decision": "NO",
                    "explanation": "the shard cannot be allocated to the same node on which a copy of the shard already exists [[graph_24_18549][0], node[nodeid2], [P], s[STARTED], a[id=someid]]"
                }
            ],
            "node_decision": "no",
            "node_id": "nodeid2",
            "node_name": "elasticsearch-data-2",
            "transport_address": "IP:9300"
        }

What needs to be done now? Because I don't see the heap is shooting. I've already tried below API, which helps and assigns all unassigned shards, but the problem reoccurs every couple of hours.

curl -XPOST ':9200/_cluster/reroute?retry_failed

please add /_cat/allocation?v and /_cat/thread_pool/search?v and /_cat/thread_pool/write?v and /_cat/nodes?v to question. — hamid bayat
what is the number of shard per indices? do you have heavy index/update indices with 1 shard? — hamid bayat
The heap usage seems to be sorted now and I don't notice spike. Yet it's failing. Have edited the question appropriately. Thanks Hamid — Gokul

Ricardo Ricardo · Accepted Answer · 2021-01-08T00:54:15

Which ElasticSearch version are you using? 7.9.1 and 7.10.1 have better retry failed replication due to CircuitBreakingException and better indexing pressure

I would recommend you to try upgrading you cluster. Version 7.10.1 seems to have fixed this issue for me. See more: Help with unassigned shards / CircuitBreakingException / Values less than -1 bytes are not supported

Elasticsearch unassigned shards warning every couple of hours

1 Answers