
I have some corrupt blocks in my Hadoop cluster, and the replication factor we use is 3. My understanding is that even if one replica of a block is corrupt, we should still have 2 good replicas on other nodes. When I run fsck on a good file path, I get the details below, with the locations of all the replicas:

    /location/to/goodfile1 29600 bytes, 1 block(s):  OK
    0. BP-xxxx-xx.1xx.1xx.xx-1364828076720:blk_1114138336_1099565732615 len=29600 Live_repl=3 [/default/xx.1xx.1xx.xx:50010, /default/xx.1xx.1xx.xx:50010, /default/xx.1xx.1xx.xx:50010]

    Status: HEALTHY
     Total size:    29600 B
     Total dirs:    0
     Total files:   1
     Total symlinks:    0
     Total blocks (validated):  1 (avg. block size 29600 B)
     Minimally replicated blocks:   1 (100.0 %)
     Over-replicated blocks:    0 (0.0 %)
     Under-replicated blocks:   0 (0.0 %)
     Mis-replicated blocks:     0 (0.0 %)
     Default replication factor:    3
     Average block replication: 3.0
     Corrupt blocks:        0
     Missing replicas:      0 (0.0 %)
     Number of data-nodes:      14
     Number of racks:       1
    FSCK ended at Fri Dec 29 02:32:32 MST 2017 in 1 milliseconds

But when I run fsck /corruptfile -blocks -locations -files on the corrupt file, I do not get the replica locations, and I see the average block replication as 0.0:

    Status: CORRUPT
     Total size:    27853 B
     Total dirs:    0
     Total files:   1
     Total symlinks:    0
     Total blocks (validated):  1 (avg. block size 27853 B)


      UNDER MIN REPL'D BLOCKS:  1 (100.0 %)
       dfs.namenode.replication.min: 1
      CORRUPT FILES:    1
      MISSING BLOCKS:   1
      MISSING SIZE:     27853 B
      CORRUPT BLOCKS:   1


     Minimally replicated blocks:   0 (0.0 %)
     Over-replicated blocks:    0 (0.0 %)
     Under-replicated blocks:   0 (0.0 %)
     Mis-replicated blocks:     0 (0.0 %)
     Default replication factor:    3
     Average block replication: 0.0
     Corrupt blocks:        1
     Missing replicas:      0
     Number of data-nodes:      14
     Number of racks:       1
    FSCK ended at Fri Dec 29 02:39:50 MST 2017 in 0 milliseconds

Can anyone explain:

1. Since I see the average replication as 0.0, does that mean we have no replicas at all for the corrupt block?
2. We generally remove the corrupt block to make the cluster healthy; in this case, is removing the block the correct option?
3. Why don't I see replica locations for this corrupt block?
4. Can anyone post a sample of an fsck run on a corrupt block of theirs?

Thank you.


1 Answer


You can check datanode:50075/blockScannerReport?listblocks (50075 is the default datanode HTTP port, so the report is served per datanode); it lists the status of all blocks, and can be a very long page.

Then run fsck (the file system checking utility):

    hadoop fsck fullAddressOfFileInHDFS -files -blocks -locations -racks

After you run it, you get (as you have also illustrated) a summary like:

     Over-replicated blocks:    0 (0.0 %)
     Under-replicated blocks:   0 (0.0 %)
     Mis-replicated blocks:     0 (0.0 %)

Your Average block replication must be at least 1.0 for the file to be healthy, but it shows 0.0 precisely because of Corrupt blocks: 1. The file's only block has no live replicas left, so there is nothing to average, and that is also why fsck prints no replica locations for it.
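As a side note, a sketch of mine (not part of the fsck utility itself): the summary is easy to check from a script by pulling out the Corrupt blocks counter. The sample text below mirrors the output shown in the question; in real use you would capture it with hadoop fsck first.

```shell
# Sketch: extract the "Corrupt blocks" counter from a saved fsck summary.
# In practice you would capture it with: hadoop fsck /path > fsck.out
fsck_out='Status: CORRUPT
Average block replication: 0.0
Corrupt blocks: 1'

# Split each line on "colon + spaces" and print the value for the matching key.
corrupt=$(printf '%s\n' "$fsck_out" | awk -F': *' '/^Corrupt blocks/ {print $2}')
echo "corrupt blocks: $corrupt"
```

Anything greater than 0 means the file needs attention.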

Note that it is the block that got corrupted, not the file itself, so there are several ways to proceed:

  1. First fetch the file locally using hadoop fs -get. If the local copy you get is good and as expected, delete the file from the cluster and then put it back in the same location using hadoop fs -put.
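A minimal sketch of that first option; /user/data/myfile is a hypothetical path (substitute the path fsck reported as corrupt), and this of course needs a live cluster to run:

```shell
# /user/data/myfile is a placeholder, not a path from the original post.
hadoop fs -get /user/data/myfile /tmp/myfile    # fetch a local copy
# ...verify /tmp/myfile locally (size, checksum, open it)...
hadoop fs -rm /user/data/myfile                 # drop the corrupt file
hadoop fs -put /tmp/myfile /user/data/myfile    # re-upload the good copy
```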

  2. Alternatively, find the file that the block belongs to (or, if you already have the file, check its health status). If it shows healthy, enter safe mode with hadoop dfsadmin -safemode enter, do the maintenance and check the datanodes manually, then after any configuration changes leave safe mode, run hadoop dfsadmin -refreshNodes, and finally run the hadoop balancer command. This can resolve the issue and avoids a risk of point 1: other tools may be connected to and dependent on that file.
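The maintenance sequence from the second option, sketched with the stock dfsadmin/balancer commands (needs a live cluster; the fsck -delete flag at the end is a real last-resort option for dropping unrecoverable corrupt files):

```shell
hdfs dfsadmin -safemode enter    # freeze namespace changes during maintenance
# ...check the datanodes manually, fix hardware/configuration...
hdfs dfsadmin -safemode leave
hdfs dfsadmin -refreshNodes      # re-read the datanode include/exclude lists
hdfs balancer                    # even out block distribution afterwards

# Last resort, only if the data is truly unrecoverable:
# hadoop fsck /path/to/corruptfile -delete
```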

That is what I would try; the choice is yours. Happy New Year 2018 in advance, thanks.