2 votes

I am trying to load our data into Hadoop HDFS. After some test runs, when I check the Hadoop web UI, I realise that a lot of space is consumed under the title "Non-DFS used". In fact, "Non-DFS used" is more than "DFS used", so almost half the cluster is consumed by Non-DFS data.

Even after reformatting the namenode and restarting, this "Non-DFS" space is not freed up.

Also I am not able to find the directory under which this "Non-DFS" data is stored, so that I can manually delete those files.

I have read many threads online from people stuck on this exact issue, but none of them got a definitive answer.

Is it really so difficult to empty this "Non-DFS" space? Or should I not be deleting it at all? How can I free up this space?


2 Answers

4 votes

In HDFS, "Non-DFS used" is the storage on a datanode that is not occupied by HDFS data.

Look at the datanode's hdfs-site.xml: the directories set in the property dfs.data.dir or dfs.datanode.data.dir are used for DFS. All other used storage on the datanode is counted as Non-DFS storage.
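
For reference, you can check which directories a datanode is actually using for DFS by reading them out of hdfs-site.xml. A minimal sketch in Python (the config path is an assumption; adjust it for your install):

    # Print the DFS data directories configured in hdfs-site.xml.
    # /etc/hadoop/conf/hdfs-site.xml is an assumed location; Ambari/CDH layouts differ.
    import xml.etree.ElementTree as ET

    HDFS_SITE = "/etc/hadoop/conf/hdfs-site.xml"

    tree = ET.parse(HDFS_SITE)
    for prop in tree.getroot().findall("property"):
        name = prop.findtext("name")
        if name in ("dfs.data.dir", "dfs.datanode.data.dir"):
            # The value is a comma-separated list of local directories holding DFS blocks.
            print(name, "=", prop.findtext("value"))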

You can free it up by deleting any unwanted files from the datanode machine, such as Hadoop logs and any non-Hadoop-related files (other data on the disk). It cannot be done using any Hadoop commands.
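
As a rough illustration, a sketch like the one below can help locate the biggest non-HDFS consumers on a datanode disk before deleting anything by hand. The mount point and DFS data directory are assumptions; substitute your own values:

    # Report top-level directory sizes on a datanode disk, skipping the DFS data
    # directory, to see where the Non-DFS space is going (logs, temp files, etc.).
    import os

    MOUNT = "/data"                      # assumed mount point of the datanode disk
    DFS_DIR = "/data/hadoop/hdfs/data"   # assumed dfs.datanode.data.dir

    def dir_size(path):
        total = 0
        for root, _, files in os.walk(path, onerror=lambda e: None):
            if root.startswith(DFS_DIR):
                continue  # blocks under the DFS dir count as "DFS used", not Non-DFS
            for name in files:
                try:
                    total += os.path.getsize(os.path.join(root, name))
                except OSError:
                    pass  # file may have vanished or be unreadable
        return total

    for entry in sorted(os.listdir(MOUNT)):
        path = os.path.join(MOUNT, entry)
        if os.path.isdir(path):
            print(f"{dir_size(path) / 1024**3:8.2f} GiB  {path}")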

Non-DFS used is calculated with the following formula:

Non DFS used = ( Total Disk Space - Reserved Space) - Remaining Space - DFS Used
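
To make the formula concrete, here is a small worked example in Python with made-up numbers (all values are hypothetical and in GB):

    # Non DFS used = (Total Disk Space - Reserved Space) - Remaining Space - DFS Used
    def non_dfs_used(total, reserved, remaining, dfs_used):
        return (total - reserved) - remaining - dfs_used

    # A 1000 GB disk with 50 GB reserved (dfs.datanode.du.reserved), 400 GB still
    # remaining and 300 GB of DFS blocks leaves 250 GB reported as Non-DFS used.
    print(non_dfs_used(total=1000, reserved=50, remaining=400, dfs_used=300))  # 250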

See this similar question:

What exactly Non DFS Used means?

0 votes

I had been facing the same issue for a while, and my non-DFS usage had reached about 13 TB! I tried many reconfigurations of YARN, Tez, MR2, etc., but with no success; the usage just kept increasing and my cluster usage reached almost 90%. This in turn led to a lot of vertex failures while running my scripts, and my repeated attempts at reconfiguring the system also failed.

What worked for me, though (funny story), was just a simple restart of all the datanodes from Ambari. It cut the non-DFS usage from 13 TB to just over 6 TB. My resource manager had been up for about 160 days, and I am guessing that restarting the datanodes might simply have cleared out the log files.