I have a three-node Hadoop cluster with replication factor = 3.
The storage directory is /app/hadoop/tmp/dfs/ on each system.
Each datanode has a hard-disk capacity of 221 GB.
The effective data in HDFS is 62 GB, which with replication is 62 * 3 = 186 GB.
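The 62 GB is the logical size of the data; with replication 3 it occupies three times that on disk. It can be cross-checked with the standard HDFS CLI:

hadoop fs -dus /    # logical size of everything under the HDFS root, ~62 GB here (-du -s on newer releases)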
The problem is that I am falling short of storage even though I have only 186 GB of data on a ~660 GB cluster. HDFS shows a huge difference in the space available for use on each node:
datanode1 = 7.47 GB
datanode2 = 17.7 GB
datanode3 = 143 GB
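These per-node figures are the DFS Remaining values; the full per-datanode breakdown comes from the NameNode:

hadoop dfsadmin -report    # Configured Capacity, DFS Used, Non DFS Used and DFS Remaining per datanode

The Non DFS Used field is the one to watch here, since it counts everything on the disk that is not HDFS block data.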
To make sure that this space is used by Hadoop's local storage, I ran this command on each datanode. For datanode1:
du -h --max-depth=1 /app/hadoop/tmp/
63G /app/hadoop/tmp/dfs
139G /app/hadoop/tmp/mapred
201G /app/hadoop/tmp/
For datanode2:
du -h --max-depth=1 /app/hadoop/tmp/
126G /app/hadoop/tmp/mapred
62G /app/hadoop/tmp/dfs
188G /app/hadoop/tmp/
For datanode3:
du -h --max-depth=1 /app/hadoop/tmp/dfs/
62G /app/hadoop/tmp/dfs/data
62G /app/hadoop/tmp/dfs/
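Note that dfs/ holds the expected ~62 GB on every node; the difference is almost entirely under mapred/, which is MapReduce's local scratch space rather than HDFS blocks. The same check can be pushed one level deeper to see what is accumulating there (the path assumes the default mapred.local.dir under hadoop.tmp.dir):

du -h --max-depth=2 /app/hadoop/tmp/mapred/    # per-subdirectory breakdown of the MapReduce local scratch space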
In total, datanode1 has used 201 GB of local space. I tried the load balancer, but it reports that the cluster is balanced. Here is the output:
start-balancer.sh
starting balancer, logging to /usr/lib/hadoop-0.20/logs/hadoop-ocpe-balancer-blrkec241933d.out
Time Stamp Iteration# Bytes Already Moved Bytes Left To Move Bytes Being Moved
The cluster is balanced. Exiting...
Balancing took 622.0 milliseconds
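As I understand it, the balancer only looks at HDFS block placement (which, per the du output above, really is even at ~62 GB per node) and uses a 10% utilization threshold by default, so this result is consistent. A stricter run would be:

start-balancer.sh -threshold 5    # rebalance until every node's DFS usage is within 5% of the cluster average

but it presumably cannot help with space consumed outside of dfs/.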
Recently one of my datanodes went down for a few days; this problem appeared after I fixed it. How do I balance the load?