I have a file stored on HDFS and I need to get its size. I ran the following command at the prompt to get the file size:
hadoop fs -du -s train.csv | awk '{s+=$1} END {printf s}'
I know that Hadoop stores multiple copies of each file, as determined by the replication factor. So when I run the command above, is the returned size the file size times the replication factor, or just the file size?
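To make clear what the awk part of the command above does on its own, here is a minimal sketch that runs without a cluster. The helper name `sum_first_column` and the two input lines are made up for illustration and are not real hadoop output. It only shows that the pipeline adds up the first whitespace-separated column, so the answer depends entirely on what hadoop prints in that column.

```shell
# Hypothetical helper: sum the first whitespace-separated column of stdin.
# Same idea as the awk program in the command above, but printed with an
# explicit format so large byte counts are not mangled.
sum_first_column() {
  awk '{s+=$1} END {printf "%.0f\n", s}'
}

# Made-up input in the shape "<number> <path>"; NOT real hadoop output.
printf '100 /tmp/a.csv\n250 /tmp/b.csv\n' | sum_first_column
```

With `-du -s` and a single file there should be only one line of input, so the sum is just the first number on that line.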