Hadoop verification block

Question

I have a problem when starting Hadoop.

DataBlockScanner consumes up to 100% of one CPU.

Master log is:

2012-04-02 11:25:49,793 INFO org.apache.hadoop.hdfs.StateChange: BLOCK NameSystem.processReport: from 192.168.33.44:50010, blocks: 16148, processing time: 13 msecs

Slave log is:

2012-04-02 11:09:34,109 INFO org.apache.hadoop.hdfs.server.datanode.DataBlockScanner: Verification succeeded for blk_-1757906724564777881_10532084

I ran hadoop fsck and it reported no errors or corrupt blocks.

Why is the CPU usage so high, and how can I stop the block verification?

Chris White · Accepted Answer · 2012-07-02T11:11:05

Without digging through the source to confirm, this is probably only a problem on startup, as the datanode has to walk the data directory (or directories) to discover all the blocks and then report them to the namenode. Again, without the source I can't confirm whether the checksums of each block are also verified on startup, which could be the cause of the 100% CPU.
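To give a rough feel for the work described above, here is a minimal standalone sketch. It is not the datanode's actual code. It assumes block data files are named `blk_<id>` and sit next to `blk_<id>_<genstamp>.meta` files, and it uses a plain whole-file CRC32 as a stand-in for whatever checksum verification the scanner really performs. The class and method names (`BlockWalkSketch`, `findBlockFiles`, `crc32Of`) are made up for illustration.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;
import java.util.zip.CRC32;

// Illustrative only: a hypothetical stand-in, not Hadoop's DataNode code.
final class BlockWalkSketch {

    // Assumption: data files are "blk_<id>", checksum files end in ".meta".
    static boolean isBlockFile(Path p) {
        String name = p.getFileName().toString();
        return name.startsWith("blk_") && !name.endsWith(".meta");
    }

    // Recursively walk the data directory and collect every block data file.
    static List<Path> findBlockFiles(Path dataDir) throws IOException {
        try (Stream<Path> paths = Files.walk(dataDir)) {
            return paths.filter(Files::isRegularFile)
                        .filter(BlockWalkSketch::isBlockFile)
                        .sorted()
                        .collect(Collectors.toList());
        }
    }

    // Stream the whole file through CRC32. Doing this for every block
    // touches every byte on disk, which is where CPU and I/O time would go.
    static long crc32Of(Path file) throws IOException {
        CRC32 crc = new CRC32();
        byte[] buf = new byte[64 * 1024];
        try (InputStream in = Files.newInputStream(file)) {
            int n;
            while ((n = in.read(buf)) != -1) {
                crc.update(buf, 0, n);
            }
        }
        return crc.getValue();
    }

    public static void main(String[] args) throws IOException {
        // The data directory path is supplied by the caller; "." is a placeholder.
        Path dataDir = Paths.get(args.length > 0 ? args[0] : ".");
        List<Path> blocks = findBlockFiles(dataDir);
        System.out.println("block files found: " + blocks.size());
        for (Path b : blocks) {
            System.out.printf("%s crc32=%08x%n", b.getFileName(), crc32Of(b));
        }
    }
}
```

With 16148 blocks, as in the master log above, the walk itself should be cheap. Reading and checksumming every block is the part that could plausibly keep a core busy, if that is indeed what happens on startup.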
