Hadoop: FSCK result shows missing replicas

Question

Could anyone let me know how to fix missing replicas?

============================================================================

Total size: 3447348383 B

Total dirs: 120

Total files: 98

Total blocks (validated): 133 (avg. block size 25919912 B)

Minimally replicated blocks: 133 (100.0 %)

Over-replicated blocks: 0 (0.0 %)

Under-replicated blocks: 21 (15.789474 %)

Mis-replicated blocks: 0 (0.0 %)

Default replication factor: 3

Average block replication: 2.3834586

Corrupt blocks: 0

Missing replicas: 147 (46.37224 %)

Number of data-nodes: 3

Number of racks: 1

============================================================================

As per Hadoop: The Definitive Guide,

Corrupt or missing blocks are the biggest cause for concern, as it means data has been lost. By default, fsck leaves files with corrupt or missing blocks, but you can tell it to perform one of the following actions on them:

• Move the affected files to the /lost+found directory in HDFS, using the -move option. Files are broken into chains of contiguous blocks to aid any salvaging efforts you may attempt.

• Delete the affected files, using the -delete option. Files cannot be recovered after being deleted.

My question is: how do I find the affected files? I have already run Hive queries against this data and got the required output without any issues. Will the missing replicas affect the performance or speed of query processing?

Regards,

Raj

highlycaffeinated · Accepted Answer · 2013-04-19T00:17:14

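Missing replicas should be self-healing over time: the NameNode re-replicates under-replicated blocks on its own, provided there are enough DataNodes to hold the target number of replicas. However, if you want to move files with corrupt or missing blocks to /lost+found, you can use: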

hadoop fsck / -move

Or delete them with:

hadoop fsck / -delete

If you just want to identify the files with under-replicated blocks, use:

hadoop fsck / -files -blocks -locations

That will give you lots of detail, including the list of expected/actual block replication counts.
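To narrow that output down to just the affected paths, a small filter can help. The sketch below assumes plain `hadoop fsck /` output (without -files), where each problem is reported on a line of the form `<path>:  Under replicated <block>. Target Replicas is N but found M replica(s).`; that format can differ between Hadoop versions, so check it against your own output first. The function name under_replicated_files is made up for this example.

```shell
#!/bin/sh
# Hypothetical helper: reads fsck output on stdin and prints each path that
# has at least one under-replicated block, once per path.
# Assumes lines of the form "<path>:  Under replicated <block>. ...".
under_replicated_files() {
  grep 'Under replicated' \
    | sed 's/: *Under replicated.*$//' \
    | sort -u
}

# Usage against a live cluster (not executed here):
#   hadoop fsck / | under_replicated_files
```

In the report from the question, 147 missing replicas over 21 under-replicated blocks works out to 7 per block. That would be consistent with files whose target replication is 10 on a 3-node cluster, though this is an inference from the numbers and not something the report states. If that turns out to be the case, those blocks can never reach their target, and lowering the target on the listed paths with `hadoop fs -setrep 3 <path>` is one possible way to clear the warning.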
