I am trying to load our data into Hadoop HDFS. After some test runs, when I check the Hadoop web UI, I see that a lot of space is consumed under the heading "Non-DFS used". In fact, "Non-DFS used" is more than "DFS used", so almost half the cluster's capacity is taken up by non-DFS data.
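For reference, here is how I am reading those numbers. From what I have read (and I may be misunderstanding this), the web UI derives "Non-DFS used" rather than measuring an actual directory, roughly as capacity minus DFS used minus remaining. The same figures can be printed per datanode from the command line:

```
# Print configured capacity, DFS used, non-DFS used and remaining
# for the cluster and for each datanode.
# My understanding: Non-DFS used = configured capacity - DFS used - DFS remaining
hdfs dfsadmin -report
```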
Even after reformatting the namenode and restarting the cluster, this "Non-DFS" space is not freed up.
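For completeness, these are roughly the steps I tried; treat this as a sketch, since the exact script names and paths depend on your Hadoop version and install layout:

```
# Stop HDFS, wipe the namenode metadata, and bring the cluster back up
stop-dfs.sh
hdfs namenode -format
start-dfs.sh
```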
Also, I am not able to find the directory where this "Non-DFS" data is stored, so I cannot manually delete those files.
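The closest I have gotten is comparing the datanode's own data directory against total disk usage on the same partition. A sketch of what I mean, assuming the directory configured under dfs.datanode.data.dir in hdfs-site.xml is /data/hdfs/dn (substitute your own path):

```
# Space taken by actual HDFS block data on this disk
du -sh /data/hdfs/dn

# Space used on the whole partition; the gap between the partition's
# "used" figure and the HDFS directory above should be the non-DFS part
df -h /data/hdfs/dn
```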
I have read many threads online from people stuck on the exact same issue, but none of them got a definitive answer.
Is it really that difficult to empty this "Non-DFS" space? Or should I not be deleting it at all? How can I free up this space?