Could anyone let me know how to fix the missing replicas reported below?
============================================================================
Total size: 3447348383 B
Total dirs: 120
Total files: 98
Total blocks (validated): 133 (avg. block size 25919912 B)
Minimally replicated blocks: 133 (100.0 %)
Over-replicated blocks: 0 (0.0 %)
Under-replicated blocks: 21 (15.789474 %)
Mis-replicated blocks: 0 (0.0 %)
Default replication factor: 3
Average block replication: 2.3834586
Corrupt blocks: 0
Missing replicas: 147 (46.37224 %)
Number of data-nodes: 3
Number of racks: 1
============================================================================
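The summary figures are internally consistent, and the arithmetic suggests where the 147 missing replicas could come from. The Python check below recomputes them under one clearly labeled assumption: a target replication of 10 for the 21 under-replicated blocks is a guess, not something fsck printed. One plausible source would be MapReduce/Hive job submission files, whose replication is set by mapred.submit.replication (default 10). A 3-node cluster can never hold 10 copies of a block.

```python
# Figures copied from the fsck summary above.
total_blocks = 133
avg_replication = 2.3834586
under_replicated_blocks = 21
missing_replicas = 147

# Live replicas actually stored; fsck's "Missing replicas" percentage
# appears to be relative to this number.
live_replicas = round(total_blocks * avg_replication)
missing_pct = 100 * missing_replicas / live_replicas

# ASSUMPTION (not in the output): each under-replicated block has a target
# replication of 10 but, with only 3 DataNodes, holds just 3 replicas.
assumed_target = 10
data_nodes = 3
shortfall = under_replicated_blocks * (assumed_target - data_nodes)

print(f"live replicas: {live_replicas}")
print(f"missing replicas: {missing_pct:.5f} %")
print(f"shortfall under the assumption: {shortfall}")
```

If the guess is right, those 21 blocks already have a replica on every DataNode, so nothing is lost. fsck is only reporting that the requested 10 copies cannot be placed.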
As per Hadoop: The Definitive Guide:
Corrupt or missing blocks are the biggest cause for concern, as it means data has been lost. By default, fsck leaves files with corrupt or missing blocks, but you can tell it to perform one of the following actions on them:
• Move the affected files to the /lost+found directory in HDFS, using the -move option. Files are broken into chains of contiguous blocks to aid any salvaging efforts you may attempt.
• Delete the affected files, using the -delete option. Files cannot be recovered after being deleted.
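The -move and -delete options quoted above act only on files with corrupt or missing blocks. Here is a minimal sketch that reads the summary counts and checks whether any block is actually lost. It assumes the summary keeps the `Label: number` layout shown above. `parse_counts` is a hypothetical helper, and treating a block below minimal replication as lost is a simplification.

```python
import re

# A subset of the summary lines from the fsck report above.
SUMMARY = """\
Total blocks (validated): 133 (avg. block size 25919912 B)
Minimally replicated blocks: 133 (100.0 %)
Under-replicated blocks: 21 (15.789474 %)
Corrupt blocks: 0
Missing replicas: 147 (46.37224 %)
"""


def parse_counts(text):
    """Map each 'Label: <integer> ...' line to its leading integer."""
    counts = {}
    for line in text.splitlines():
        m = re.match(r"\s*([^:]+):\s*(\d+)", line)
        if m:
            counts[m.group(1).strip()] = int(m.group(2))
    return counts


counts = parse_counts(SUMMARY)
# Simplification: a block below the minimal replication level is treated as lost.
lost_blocks = counts["Total blocks (validated)"] - counts["Minimally replicated blocks"]
data_lost = counts["Corrupt blocks"] > 0 or lost_blocks > 0
print(f"data lost: {data_lost}; under-replicated blocks: {counts['Under-replicated blocks']}")
```

On the numbers above, this reports no lost blocks and only under-replicated ones. In that case -move and -delete would have nothing to act on.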
My question is: how do I find out which files are affected? I have already run Hive queries and got the required outputs without any issue. Will this affect the performance/speed of query processing?
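To find which files are affected, one approach is to save the output of a plain `hadoop fsck /` (or `hdfs fsck /` on Hadoop 2.x), run without -files, and group its "Under replicated" lines by path. The sketch below assumes lines shaped like the hypothetical SAMPLE. Its paths and block IDs are made up, and the exact wording varies between Hadoop versions. `under_replicated_files` is an illustrative helper, not part of Hadoop.

```python
import re
from collections import defaultdict

# Hypothetical sample of `hadoop fsck /` output; paths and block IDs are made up.
SAMPLE = """\
/user/example/data/part-00000:  Under replicated blk_-1234_1001. Target Replicas is 3 but found 2 replica(s).
/user/example/job.jar:  Under replicated blk_5678_1002. Target Replicas is 10 but found 3 replica(s).
/user/example/job.jar:  Under replicated blk_5679_1003. Target Replicas is 10 but found 3 live replica(s), 0 decommissioned replica(s).
Status: HEALTHY
"""

# Assumed line shape: "<path>:  Under replicated <block>. Target Replicas is N but found M ..."
UNDER_REPLICATED = re.compile(
    r"^(?P<path>/.*?):\s+Under replicated (?P<block>\S+?)\.\s+"
    r"Target Replicas is (?P<target>\d+) but found (?P<found>\d+)"
)


def under_replicated_files(fsck_output):
    """Group under-replicated blocks by file: path -> [(block, target, found), ...]."""
    files = defaultdict(list)
    for line in fsck_output.splitlines():
        m = UNDER_REPLICATED.match(line)
        if m:
            files[m["path"]].append((m["block"], int(m["target"]), int(m["found"])))
    return dict(files)


if __name__ == "__main__":
    for path, blocks in under_replicated_files(SAMPLE).items():
        shortfall = sum(target - found for _, target, found in blocks)
        print(f"{path}: {len(blocks)} block(s), {shortfall} missing replica(s)")
```

A file whose target exceeds the number of DataNodes (3 here) can never be fully replicated. Lowering its target, for example with `hadoop fs -setrep 3 <path>`, is one way to clear the warning.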
Regards,
Raj