2
votes

This is with reference to the question Small files and HDFS blocks, where the answer quotes Hadoop: The Definitive Guide:

Unlike a filesystem for a single disk, a file in HDFS that is smaller than a single block does not occupy a full block’s worth of underlying storage.

I completely agree with this because, as per my understanding, blocks are just a way for the namenode to map which piece of a file lives where in the cluster. And since HDFS is an abstraction over our regular filesystems, there is no way a 140 MB file will consume 256 MB of space on HDFS with a 128 MB block size; in other words, the remaining space in the last block will not get wasted.
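Just to spell out the arithmetic I have in mind (a plain Python sketch, not Hadoop code; the 140 MB / 128 MB figures are from my example above):

```python
import math

# Back-of-the-envelope sketch: block size only determines how a file is split,
# not how much underlying disk it uses.
block_size_mb = 128
file_size_mb = 140

blocks_used = math.ceil(file_size_mb / block_size_mb)  # 2 blocks
storage_used_mb = file_size_mb                          # ~140 MB on disk, not 256 MB

print(f"Blocks: {blocks_used}, underlying storage: ~{storage_used_mb} MB "
      f"(the last block holds only {file_size_mb - block_size_mb} MB)")
```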

However, I stumbled upon another answer here, in Hadoop Block size and file size issue, which says:

There are limited number of blocks available dependent on the capacity of the HDFS. You are wasting blocks as you will run out of them before utilizing all the actual storage capacity.

Does that mean that if I have 1280 MB of HDFS storage and I try to load 11 files of 1 MB each (considering a 128 MB block size and a replication factor of 1), HDFS will throw an error about running out of storage?
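To make the scenario concrete, this is the counting I am doing (again just a plain Python sketch, not anything HDFS runs):

```python
# 11 files of 1 MB each, 128 MB block size, replication factor 1.
capacity_mb = 1280
num_files = 11
file_size_mb = 1
block_size_mb = 128

blocks_referenced = num_files                    # each 1 MB file gets its own (mostly empty) block
storage_consumed_mb = num_files * file_size_mb   # only the real data hits the disks

print(f"Nominal 'block capacity': {capacity_mb // block_size_mb} blocks")        # 10
print(f"Blocks referenced by the 11 files: {blocks_referenced}")                 # 11
print(f"Actual storage consumed: {storage_consumed_mb} MB of {capacity_mb} MB")  # 11 of 1280
```

So the 11th file needs an 11th block even though only 11 MB of the 1280 MB would actually be used.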

Please correct me if I am assuming anything wrong in the entire process. Thanks!

1
Your understanding is correct. On disk, the regular Linux filesystem format is used, so a 10 KB file is not going to take 256 MB of disk space. HDFS blocks exist to manage the volume of data and cluster resources, not to manage disk space. - Sandeep Singh

1 Answer

3
votes

No, HDFS will not throw an error, because:

  1. The 1280 MB storage limit is not exhausted (the 11 files consume only 11 MB of actual disk space).
  2. Eleven metadata entries won't come close to the namenode's memory limits.

For example, say we have 3 GB of memory available on the namenode. The namenode needs to store a metadata entry for each file and for each block, and each of these entries takes approximately 150 bytes. A one-block file therefore needs roughly 300 bytes of namenode memory, so you can store at most roughly 10 million such files. Thus, even if you have much more raw storage capacity, you will not be able to utilize it fully if lots of small files exhaust the namenode's memory first.
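As a rough sketch of that estimate (plain Python; the 3 GB heap, the 150-byte entry size, and the two-entries-per-file assumption are the figures stated above, not exact namenode accounting):

```python
# Rough namenode-memory estimate for small, one-block files.
namenode_memory_bytes = 3 * 1024**3   # assume 3 GB of heap available for metadata
bytes_per_meta_entry = 150            # approx. size of one file or block entry
entries_per_small_file = 2            # one file entry + one block entry

max_small_files = namenode_memory_bytes // (bytes_per_meta_entry * entries_per_small_file)
print(f"Roughly {max_small_files:,} one-block files before namenode memory runs out")
# => on the order of 10 million, vastly more than the 11 files in the question
```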

But the specific example mentioned in the question comes nowhere near this memory limit, so there should not be any error.

Consider a hypothetical scenario where the memory available on the namenode is just 300 bytes * 10, i.e. enough metadata space for only 10 one-block files. In that case, the namenode would give an error on the request to store the 11th file's block.
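A toy simulation of that hypothetical (the 300-bytes-per-file figure and the loop are purely for illustration, not how the namenode is implemented):

```python
# Toy model: metadata budget of 300 bytes * 10, i.e. room for 10 one-block files.
memory_budget = 300 * 10
bytes_per_file = 300   # file entry + block entry, ~150 bytes each

used = 0
for i in range(1, 12):  # try to store 11 one-block files
    if used + bytes_per_file > memory_budget:
        print(f"File {i}: rejected, namenode metadata memory exhausted")
    else:
        used += bytes_per_file
        print(f"File {i}: stored (metadata used: {used} bytes)")
# Files 1-10 are stored; file 11 is rejected.
```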
