I read about Hadoop's HDFS and learned that Hadoop is designed to process a small number of large files rather than a large number of small files.
The stated reason is that a large number of small files quickly eats up the NameNode's memory. I am having difficulty understanding this argument.
Consider the following scenario:
1000 small files, each 128 MB in size (the same as the HDFS block size).
So this would mean 1000 entries in the NameNode's memory holding this block information.
Now consider a second scenario:
a single BIG file whose size is 128 MB * 1000 (i.e., 1000 blocks).
Won't the NameNode also have 1000 block entries for this single BIG file?
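To make my counting concrete, here is a rough back-of-the-envelope sketch (purely my own illustrative arithmetic, assuming one in-memory entry per block, not anything taken from the Hadoop API):

    # Assumption: the NameNode keeps one in-memory entry per block.
    BLOCK_SIZE_MB = 128

    # Scenario 1: 1000 files, each exactly one 128 MB block
    files_small = 1000
    blocks_per_small_file = 1
    entries_small = files_small * blocks_per_small_file      # 1000 block entries

    # Scenario 2: one big file of size 128 MB * 1000
    big_file_size_mb = BLOCK_SIZE_MB * 1000
    entries_big = big_file_size_mb // BLOCK_SIZE_MB          # 1000 block entries

    print(entries_small, entries_big)  # 1000 1000 -> same block count either way?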
Is my conclusion correct that in both cases the NameNode would hold the same number of block entries in memory? If so, how is Hadoop more efficient with a small number of large files than with a large number of small files?
Can anyone help me understand this?