4
votes

I read from hadoop operations that if a datanode fails during writing process,

A new replication pipeline containing the remaining datanodes is opened and the write resumes. At this point, things are mostly back to normal and the write operation continues until the file is closed. The namenode will notice that one of the blocks in the file is under-replicated and will arrange for a new replica to be created asynchronously. A client can recover from multiple failed datanodes provided at least a minimum number of replicas are written (by default, this is one).

But what happens if all the datanodes fail? i.e., minimum number of replicas are not written? Will client ask namenode to give new list of datanodes? or will the job fail?

Note : My question is NOT what happens when all the data nodes fails in the cluster. Question is what happens if all the datanodes to which the client was supposed to write, fails, during the write operation

Suppose namenode told the client to write BLOCK B1 to datanodes D1 in Rack1, D2 in Rack2 and D3 in Rack1. There might be other racks also in the cluster(Rack 4,5,6,...). If Rack1 and 2 failed during the write process, client knows that the data was not written successfully since it didn't receive the ACK from the datanodes, At this point, will it ask Namenode to give new set of datanodes? may be in the still alive Racks ?

1

1 Answers

1
votes

OK I got what you are asking. DFSClient will get a list of datanodes from the namenode where it is supposed to write a block (say A) of a file. DFSClient will iterate over that list of Datanodes and write the block A in those locations. If block write fails in the first datanodes, it'll abandon the block write and ask namenode a new set of datanodes where it can attempt to write again.

Here the sample code from DFSClient that explains that -

private DatanodeInfo[] nextBlockOutputStream(String client) throws IOException {
    //----- other code ------
    do {
            hasError = false;
            lastException = null;
            errorIndex = 0;
            retry = false;
            nodes = null;
            success = false;

            long startTime = System.currentTimeMillis();
            lb = locateFollowingBlock(startTime);
            block = lb.getBlock();
            accessToken = lb.getBlockToken();
            nodes = lb.getLocations();

            //
            // Connect to first DataNode in the list.
            //
            success = createBlockOutputStream(nodes, clientName, false);

            if (!success) {
              LOG.info("Abandoning block " + block);
              namenode.abandonBlock(block, src, clientName);

              // Connection failed.  Let's wait a little bit and retry
              retry = true;
              try {
                if (System.currentTimeMillis() - startTime > 5000) {
                  LOG.info("Waiting to find target node: " + nodes[0].getName());
                }
                Thread.sleep(6000);
              } catch (InterruptedException iex) {
              }
            }
          } while (retry && --count >= 0);

          if (!success) {
            throw new IOException("Unable to create new block.");
          }
     return nodes;
}