0 votes

We have dfs.blocksize set to 512 MB for one of our MapReduce jobs, which is a map-only job. But some of the mappers are outputting more than 512 MB, e.g. 512.9 MB.

I believe the mapper output size should be constrained by dfs.blocksize. Appreciate any input. Thanks.

File size != block size – OneCricketeer

2 Answers

1 vote

I believe the mapper output size should be constrained by dfs.blocksize.

This is not true. Files can be larger than block size. They'll just span multiple blocks in that case.
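To illustrate why a file can exceed the block size, here is a minimal sketch of the arithmetic HDFS applies (the sizes are taken from the question; the computation is an illustration, not an HDFS API call):

```python
import math

BLOCK_SIZE = 512 * 1024 * 1024          # dfs.blocksize from the question: 512 MB
file_size = int(512.9 * 1024 * 1024)    # a mapper output slightly over one block

# HDFS stores the file as ceil(file_size / block_size) blocks;
# all blocks are full-size except possibly the last one.
num_blocks = math.ceil(file_size / BLOCK_SIZE)
last_block = file_size - (num_blocks - 1) * BLOCK_SIZE

print(num_blocks)  # the 512.9 MB file spans 2 blocks
print(last_block)  # the second block holds only the ~0.9 MB remainder
```

So a 512.9 MB output is simply one full 512 MB block plus a small second block; no limit is violated.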

1 vote

Mappers do not save their outputs in HDFS - they write results to the regular local file system. This is done to avoid replicating temporary data across servers in the HDFS cluster. So the HDFS block size has nothing to do with the size of a mapper's output files.
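For reference, the two settings live in different config files and govern different storage layers. A sketch, assuming standard Hadoop 2.x+ property names (the 512 MB value is from the question; the local-dir path is a hypothetical example):

```xml
<!-- hdfs-site.xml: governs how HDFS splits stored files into blocks -->
<property>
  <name>dfs.blocksize</name>
  <value>536870912</value> <!-- 512 MB -->
</property>

<!-- mapred-site.xml: where map tasks spill intermediate/local output;
     this is plain local disk, so dfs.blocksize does not apply here -->
<property>
  <name>mapreduce.cluster.local.dir</name>
  <value>/data/mapred/local</value>
</property>
```

Even for a map-only job whose final output does land in HDFS, a file larger than dfs.blocksize is fine: it just occupies more than one block.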