0 votes

I have basic knowledge of Elasticsearch. I came across the following phrase from https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-replication.html

In the case that the primary itself fails, the node hosting the primary will send a message to the master about it. The indexing operation will wait (up to 1 minute, by default) for the master to promote one of the replicas to be a new primary.

The question: how does the node hosting the shard know about the failure of the shard? As I understand it, a shard is a Lucene instance that runs on a data node.


3 Answers

1 vote

Most likely (with some improvements since Elasticsearch version 1.4), this is detected via checksums: if any segment file within the shard has an incorrect checksum, the shard is marked corrupt. This may happen on recovery (after the node starts up) or when any I/O operation is performed on the segment (i.e. when it is read by a search or by the merge policy).
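
If you want to see this kind of verification yourself, you can run Lucene's CheckIndex tool against a shard's index directory while the node is stopped; it validates the checksum footer of every segment file and reports corrupt segments. A rough sketch, where the jar and shard paths are illustrative and depend on your installation and version:

    # Offline integrity check of one shard (stop the node first!)
    # lucene-core ships in Elasticsearch's own lib directory
    java -cp /usr/share/elasticsearch/lib/lucene-core-*.jar \
         org.apache.lucene.index.CheckIndex \
         /var/lib/elasticsearch/nodes/0/indices/<index-uuid>/0/index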

This page for 7.8 (select the version you use for accurate docs) describes how to drop corrupt data or, if the data is important, how to restore from a snapshot, which is the best way: https://www.elastic.co/guide/en/elasticsearch/reference/7.8/shard-tool.html#_description_7
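
For reference, the tool on that page is invoked as below (my_index and the snapshot names are placeholders, and the node must be stopped before running the shard tool):

    # Drop the corrupted portion of shard 0 of my_index (loses data!)
    bin/elasticsearch-shard remove-corrupted-data --index my_index --shard-id 0

    # Safer, if a snapshot exists: restore the index from it
    curl -X POST "localhost:9200/_snapshot/my_repository/my_snapshot/_restore" \
         -H 'Content-Type: application/json' -d '{"indices": "my_index"}'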

1 vote

I guess you are getting confused by this statement:

How does the node hosting the shard know about the failure of the shard? As I understand it, a shard is a Lucene instance that runs on a data node.

While it's true that every shard is a Lucene instance (index), it's not a 1:1 mapping: one Elasticsearch data node can host multiple shards, not just one, and the failure of a Lucene shard doesn't always mean the failure of the data node.
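
You can see that the mapping is not 1:1 with the cat shards API; the same node name usually shows up in many rows, one per shard copy it hosts (host and port are assumptions):

    # List every shard copy with its type (p = primary, r = replica),
    # state, and the node it currently lives on
    curl -s 'localhost:9200/_cat/shards?v&h=index,shard,prirep,state,node'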

The node holding the primary shard knows whether it is connected to the network, whether it is able to index data, and whether the shard is corrupted (as mentioned by @julian). It can then send that information to the master node, which promotes one of the replicas to be the new primary and records this in the cluster state, which all nodes hold.
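
That in-sync bookkeeping is visible in the cluster state. A quick way to look at it (the index name my_index is just an example):

    # Show which shard copies the master currently considers in-sync
    curl -s 'localhost:9200/_cluster/state/metadata/my_index?pretty&filter_path=metadata.indices.my_index.in_sync_allocations'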

In the network-failure case, all the primary shards hosted on that node are replaced by copies on other nodes, and this is easy to detect, as the master will no longer receive a heartbeat from that data node.
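
The "heartbeat" here is the follower check that the elected master sends to every node. In 7.x it is controlled by the following settings (the values shown are the defaults); if retry_count consecutive checks fail, the master removes the node from the cluster and its shards are reallocated:

    # elasticsearch.yml -- master-to-node fault detection (7.x defaults)
    cluster.fault_detection.follower_check.interval: 1s
    cluster.fault_detection.follower_check.timeout: 10s
    cluster.fault_detection.follower_check.retry_count: 3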

Hope this is what you were looking for; otherwise feel free to comment and I will try to explain further.

0 votes

It's confusing at first sight, but if you look deeper it is still a valid scenario, and it matches what the documentation describes at a high level.

Let's say a coordinating node receives a request to index data. The master node maintains the list of in-sync shard copies in the cluster state, and the coordinating node forwards the request to the node that holds the current primary shard. As you mentioned, a shard is a Lucene index. The node that received the request has to index the data into the primary shard. If that is not possible, because a portion of the shard is corrupted or similar, it informs the master so that another copy can be promoted to primary.
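
The "wait up to 1 minute" from the documentation quoted in the question corresponds to the timeout parameter of the write request itself, which defaults to 1m; for example (index name and body are made up):

    # The request waits up to 1 minute for a usable in-sync primary
    curl -X PUT "localhost:9200/my_index/_doc/1?timeout=1m" \
         -H 'Content-Type: application/json' -d '{"message": "hello"}'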

The master also monitors each shard, tells another node to prepare a new primary shard if needed, and demotes a shard from primary if needed. The master does more in these cases.
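
Normally the master does all of this on its own, but there is also a manual escape hatch: the cluster reroute API can force a stale copy to become primary. A sketch (the index and node names are assumptions; accept_data_loss is required because this can lose acknowledged writes):

    curl -X POST "localhost:9200/_cluster/reroute" \
         -H 'Content-Type: application/json' -d '
    {
      "commands": [{
        "allocate_stale_primary": {
          "index": "my_index",
          "shard": 0,
          "node": "node-2",
          "accept_data_loss": true
        }
      }]
    }'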

Elasticsearch maintains a list of shard copies that should receive the operation. This list is called the in-sync copies and is maintained by the master node

Once the replication group has been determined, the operation is forwarded internally to the current primary shard of the group