1
votes

Every time I use hadoop fs -ls /path_to_directory or hadoop fs -ls -h /path_to_directory, the result looks like the following:

drwxr-xr-x   - hadoop supergroup          0 2016-08-05 00:22 /user/hive-0.13.1/warehouse/t_b_city
drwxr-xr-x   - hadoop supergroup          0 2016-06-15 16:28 /user/hive-0.13.1/warehouse/t_b_mobile

The size of a directory in HDFS is always shown as 0, regardless of whether there are files within it.

Browsing from the web UI gives the same result:

drwxr-xr-x  hadoop  supergroup  0 B 0   0 B t_b_city
drwxr-xr-x  hadoop  supergroup  0 B 0   0 B t_b_mobile

However, there actually are files within those directories. When using the command hadoop fs -du -h /user/hive-0.13.1/warehouse/, the directory sizes are shown correctly:

385.5 K   /user/hive-0.13.1/warehouse/t_b_city
1.1 M     /user/hive-0.13.1/warehouse/t_b_mobile

Why do the hadoop fs -ls command and the HDFS web UI always show 0 for a directory?

Also, the hadoop fs -ls command usually finishes immediately, while hadoop fs -du takes some time to execute. It seems that hadoop fs -ls doesn't actually spend any time calculating the total size of a directory.
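The same distinction exists on a local filesystem, and can be sketched in Python (a local-filesystem analogy, not the HDFS API): the size recorded for a directory entry itself is unrelated to the total size of its contents, which has to be computed by walking the tree the way du does.

```python
import os
import tempfile

# Create a small tree: one directory containing a 1000-byte file.
root = tempfile.mkdtemp()
with open(os.path.join(root, "data.bin"), "wb") as f:
    f.write(b"\x00" * 1000)

# "ls"-style size: just the directory entry's own metadata size.
entry_size = os.path.getsize(root)

# "du"-style size: walk the tree and sum every file's size.
total = sum(
    os.path.getsize(os.path.join(dirpath, name))
    for dirpath, _, names in os.walk(root)
    for name in names
)

print(entry_size)  # small, platform-dependent, unrelated to contents
print(total)       # the real payload size of the files inside
```

Getting entry_size is a single metadata lookup (instant, like -ls); computing total requires visiting every file (slow on a big tree, like -du).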

2
When you run an ls -l command on Linux, the "size" displayed for directories is not related to the size of the files inside. So why would you expect HDFS to work differently? - Samson Scharfrichter
BTW, the NameNode stores the whole filesystem metadata in RAM, not on disk, therefore a directory entry requires zero bytes on disk. Linux filesystems, on the other hand, require a few disk segments to persist each directory (list of inodes, permissions, etc.). - Samson Scharfrichter
Thanks. It seems my understanding of the ls command has long been wrong. I took it for granted that ls would show the size of both files and directories. - Heyang Wang
Again, the size of a directory is the size of the directory object, just like the size of a file is the size of the file. Full stop. - Samson Scharfrichter

2 Answers

2
votes

It is working as designed. Hadoop is designed for big files, and one should not expect it to compute the total size of a directory every time hadoop fs -ls is run. If it worked the way you want, consider the point of view of someone who just wants to check whether a directory exists: they would end up waiting a long time simply because Hadoop was busy calculating folder sizes. Not so good.

1
votes

Try using a wildcard with the du option so that all the files under a database are listed with their sizes. The only catch is that you need multiple levels of wildcard pattern matching so that every level under the parent directory is covered.

hadoop fs -du -h /hive_warehouse/db/*/* > /home/list_du.txt
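To see why multiple wildcard levels are needed, here is a minimal local-filesystem sketch (hypothetical directory names, using Python's glob rather than HDFS globbing): a single * only matches the immediate children, so each extra level of nesting needs another /*.

```python
import glob
import os
import tempfile

# Hypothetical layout mimicking db/<table>/<partition>.
root = tempfile.mkdtemp()
os.makedirs(os.path.join(root, "db", "t_b_city", "part-0"))
os.makedirs(os.path.join(root, "db", "t_b_mobile", "part-0"))

one_level = glob.glob(os.path.join(root, "db", "*"))        # matches tables only
two_levels = glob.glob(os.path.join(root, "db", "*", "*"))  # reaches partitions

print(len(one_level))   # the two table directories
print(len(two_levels))  # the two partition directories beneath them
```

Each additional /* descends one more level, which is why the command above uses /*/* to cover everything two levels below the db directory.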