
Suppose the size of a file XYZ is 68MB. With the default block size of 64MB, the blocks will be A (64MB) and B (4MB). In block B, the rest of the space is occupied by another data block.

So when the XYZ data file is processed, the data in blocks A and B will be processed. Since block B also contains data for another file, how does Hadoop know which part of block B to process?


1 Answer


If you have a file (XYZ) of 68MB and your block size is 64MB, the data will be split into 2 blocks. Block A will store 64MB of data, and Block B will store the remaining 4MB, after which the block is closed. There is no wasted space here, and no other file's data will ever be put into Block B.
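To make the arithmetic concrete, here is a minimal Python sketch of how a file's data maps onto HDFS blocks. The function name and interface are illustrative, not part of any Hadoop API; the key point it models is that the last block only holds the leftover bytes and is never shared with another file:

```python
def hdfs_blocks(file_size_mb, block_size_mb=64):
    """Return the sizes (in MB) of the HDFS blocks a file occupies.

    Every block except possibly the last is full-size; the last block
    holds only the remaining data. HDFS does not pad the last block,
    and it never stores another file's data in it.
    """
    blocks = []
    remaining = file_size_mb
    while remaining > 0:
        blocks.append(min(block_size_mb, remaining))
        remaining -= block_size_mb
    return blocks

print(hdfs_blocks(68))   # [64, 4]  -> Block A = 64MB, Block B = 4MB
print(hdfs_blocks(130))  # [64, 64, 2]
```

Since each block belongs to exactly one file, the NameNode's file-to-block mapping is all MapReduce needs to find the right data.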

So while processing, MapReduce knows exactly which blocks to read for a given file. Of course, there are other considerations, such as input splits, which MapReduce takes into account while processing the blocks in order to figure out record boundaries.
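The point that input splits are not the same thing as blocks can be sketched as follows. This is a simplified model of Hadoop's `FileInputFormat` split calculation (the 1.1 "slop" factor comes from Hadoop's source; min/max split-size settings and multi-block files spread across hosts are ignored here for brevity):

```python
SPLIT_SLOP = 1.1  # Hadoop's tolerance for avoiding a tiny trailing split


def input_splits(file_len, split_size):
    """Sketch of FileInputFormat-style split computation.

    Emits (offset, length) pairs. If the leftover data is small
    relative to the split size, it is folded into the final split
    rather than becoming a separate tiny split.
    """
    splits = []
    remaining = file_len
    while remaining / split_size > SPLIT_SLOP:
        splits.append((file_len - remaining, split_size))
        remaining -= split_size
    if remaining > 0:
        splits.append((file_len - remaining, remaining))
    return splits

MB = 1024 * 1024
# 68MB file, 64MB split size: 68/64 = 1.0625 <= 1.1,
# so a single 68MB split covers both blocks A and B.
print(input_splits(68 * MB, 64 * MB))
```

So for the 68MB file in the question, both physical blocks may well be handed to a single mapper as one logical split, which is exactly why splits, not blocks, are what MapReduce reasons about when finding record boundaries.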