0
votes

We need to count the number of files in many directories on a multi-tenant, multi-node cluster that holds a large amount of data. Where does the command "hdfs dfs -count /path/to/directory" get its information? Does it work like hdfs dfs -ls, or does it get its information directly from the NameNode?

Thanks a lot!

1 Answer

2
votes

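It calls the getContentSummary method of the FileSystem API. On HDFS (DistributedFileSystem) this is a single request that the NameNode answers from its namespace, so the client does not walk the directory tree the way a recursive hdfs dfs -ls would: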

ContentSummary summary = src.fs.getContentSummary(src.path);
out.println(summary.toString(showQuotas) + src);

Source code for org.apache.hadoop.fs.FsShell.Count

Source code for getContentSummary
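
For the use case in the question (file counts for many directories), the command accepts several paths at once and prints one summary line per path. Below is a minimal sketch of parsing that output in Java. It assumes the default layout without -q, which is DIR_COUNT, FILE_COUNT, CONTENT_SIZE, PATHNAME separated by whitespace. The class name, method names, and sample paths are hypothetical and only serve as illustration.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: parses lines printed by "hdfs dfs -count <paths>".
// Assumed column layout (no -q): DIR_COUNT FILE_COUNT CONTENT_SIZE PATHNAME
final class CountOutputParser {

    // One parsed summary line.
    static final class Entry {
        final long dirCount;
        final long fileCount;
        final long contentSize;
        final String path;

        Entry(long dirCount, long fileCount, long contentSize, String path) {
            this.dirCount = dirCount;
            this.fileCount = fileCount;
            this.contentSize = contentSize;
            this.path = path;
        }
    }

    // Splits into at most four fields so a path containing spaces stays intact.
    static Entry parse(String line) {
        String[] parts = line.trim().split("\\s+", 4);
        if (parts.length < 4) {
            throw new IllegalArgumentException("Unexpected -count line: " + line);
        }
        return new Entry(
                Long.parseLong(parts[0]),
                Long.parseLong(parts[1]),
                Long.parseLong(parts[2]),
                parts[3]);
    }

    // Sums FILE_COUNT over all non-blank lines.
    static long totalFiles(List<String> lines) {
        long total = 0;
        for (String line : lines) {
            if (!line.trim().isEmpty()) {
                total += parse(line).fileCount;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        // Sample lines in the assumed layout; the paths are made up.
        List<String> sample = Arrays.asList(
                "           3           12               4096 /data/tenant_a",
                "           1            5               1024 /data/tenant_b");
        System.out.println("files: " + totalFiles(sample));
    }
}
```

Passing all directories to one hdfs dfs -count invocation avoids starting a new JVM per directory, although each path still results in its own getContentSummary request.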