0 votes

When our standby namenode (running Hadoop v2.4.1) restarted after a failure, we found that it was too busy to respond in time once it left safe mode. We dumped several thread stacks, and all of them looked like this:

Thread 212 (IPC Server handler 148 on 8020):
  State: WAITING
  Blocked count: 66
  Waited count: 598
  Waiting on java.util.concurrent.locks.ReentrantLock$FairSync@60ea5634
  Stack:
    sun.misc.Unsafe.park(Native Method)
    java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
    java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:834)
    java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:867)
    java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1197)
    java.util.concurrent.locks.ReentrantLock$FairSync.lock(ReentrantLock.java:229)
    java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:290)
    org.apache.hadoop.hdfs.server.namenode.FSNamesystem.writeLock(FSNamesystem.java:1378)
    org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.processReport(BlockManager.java:1676)
    org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.blockReport(NameNodeRpcServer.java:1019)
    org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolServerSideTranslatorPB.blockReport(DatanodeProtocolServerSideTranslatorPB.java:152)
    org.apache.hadoop.hdfs.protocol.proto.DatanodeProtocolProtos$DatanodeProtocolService$2.callBlockingMethod(DatanodeProtocolProtos.java:28061)
    org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
    org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
    org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013)
    org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009)
    java.security.AccessController.doPrivileged(Native Method)
    javax.security.auth.Subject.doAs(Subject.java:415)
    org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1556)
    org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007)
   

Almost all of the server handlers are waiting for the FSNamesystem#writeLock in order to process incremental/full block reports!

Conf:

dfs.blockreport.initialDelay: 120 (seconds)
dfs.blockreport.intervalMsec: 6 hours (21600000 ms)
Number of NameNode server handlers: 200
Number of datanodes: 400
The namenode takes 0.5~1 s to process one block report.
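In hdfs-site.xml terms, the setup above corresponds roughly to the sketch below (values taken from this question; dfs.namenode.handler.count is the key that sets the handler count). The numbers also explain the contention: 400 datanodes at 0.5~1 s of write-lock time per full report need about 200~400 s of serialized lock time, while the initial delay spreads the first reports over only a random 0~120 s window, so handler threads pile up on the lock, datanode RPCs time out and are retried, and the retried reports get processed yet again.

<!-- hdfs-site.xml sketch: values quoted from the question above -->
<property>
  <name>dfs.blockreport.initialDelay</name>
  <value>120</value>            <!-- first block report delayed by a random 0-120 s -->
</property>
<property>
  <name>dfs.blockreport.intervalMsec</name>
  <value>21600000</value>       <!-- full block report every 6 hours -->
</property>
<property>
  <name>dfs.namenode.handler.count</name>
  <value>200</value>            <!-- NameNode RPC/IPC handler threads -->
</property>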

The datanode logs show a lot of offerService IOExceptions and retries.

The NN log indicated that the same datanode storage had been processed more than twice, some even up to 10 times, judging from this log statement at the end of BlockManager#processReport:

// Logged once per processed storage report in BlockManager#processReport
blockLog.info("BLOCK* processReport: from storage " + storage.getStorageID()
    + " node " + nodeID + ", blocks: " + newReport.getNumberOfBlocks()
    + ", processing time: " + (endTime - startTime) + " msecs");

Has anyone seen the same problem? Any ideas?

How many datanodes do you have? – Thomas Jungblut
Thanks for your reply. We have 400 datanodes. – Dengzh
I think the problem is related to the conf dfs.blockreport.initialDelay. – Dengzh
Your problem is clearly contention on modifying the block map. Randomizing the initial delay on different datanodes might help, but it will certainly still take a long time to get the namenode out of safe mode. Also try playing around with the RPC timeouts. – Thomas Jungblut
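For what it's worth, the initial delay is already randomized: each datanode waits a random time between 0 and dfs.blockreport.initialDelay seconds before its first full report, so widening that window is one way to spread the reports out. A minimal sketch of such a tuning (1800 s is only an illustrative value, not something given in this thread):

<property>
  <name>dfs.blockreport.initialDelay</name>
  <value>1800</value>           <!-- spread first block reports over a random 0-30 min window -->
</property>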

1 Answer

1 vote

In the end we reduced the number of the server's IPC handlers to a value suited to the machine, and that solved the problem. Hope this helps anyone who is stuck in the same trouble!
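For reference, the handler count mentioned above is set by dfs.namenode.handler.count in hdfs-site.xml; a sketch of lowering it from the original 200 (64 is only an illustrative value, the answer does not say which number was finally used):

<property>
  <name>dfs.namenode.handler.count</name>
  <value>64</value>             <!-- illustrative; was 200 in the original setup -->
</property>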