In Hadoop, data is split into blocks of either 64 MB or 128 MB. Say I have a 70 MB file. Is it split into two blocks of 64 MB and 6 MB? If so, since the second block holds only 6 MB, is the remaining space in that block wasted, or can it be used by another block?
2 Answers
1 vote
In Hadoop, the block size can be chosen by the application that writes to HDFS, via the dfs.blocksize property:
http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/hdfs-default.xml
There is no restriction that it must be 64 MB or 128 MB; those are just historical and current defaults. Current Hadoop versions default to 128 MB.
Different block sizes can be set on different files.
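For example, a cluster-wide default can be set in hdfs-site.xml (a sketch; the value is in bytes, here 128 MB, and an application writing a file may still override it per file):

```xml
<property>
  <name>dfs.blocksize</name>
  <value>134217728</value> <!-- 128 MB = 128 * 1024 * 1024 bytes -->
</property>
```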
No space is wasted if a file is smaller than the block size: HDFS blocks are logical units, and the last block of a file occupies only as much disk space on the DataNode as the data it actually contains. However, storing many small files is not recommended, mainly because each file and block consumes NameNode memory. More on this problem and how to resolve it: https://developer.yahoo.com/blogs/hadoop/hadoop-archive-file-compaction-hdfs-461.html
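The arithmetic from the question can be checked with a short sketch (plain Python, not Hadoop code): the last block records only the leftover bytes, which is why nothing is wasted on disk.

```python
def block_layout(file_size, block_size):
    """Return the sizes of the HDFS blocks a file of file_size
    bytes occupies, given a block size of block_size bytes."""
    full, rest = divmod(file_size, block_size)
    blocks = [block_size] * full
    if rest:
        blocks.append(rest)  # the last block stores only the remainder
    return blocks

MB = 1024 * 1024
# A 70 MB file with the older 64 MB default: two blocks, 64 MB + 6 MB.
print([b // MB for b in block_layout(70 * MB, 64 * MB)])   # [64, 6]
# With the current 128 MB default it fits in a single 70 MB block.
print([b // MB for b in block_layout(70 * MB, 128 * MB)])  # [70]
```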