I have an 11-node cluster running Cloudera Express 5.11 on CentOS. It originally had only 7 nodes; 4 more were added later. Disk capacity is the same on every node: 5.4 TB.
The problem I'm having is that the hdfs dfsadmin -report command shows wrong disk usage values, especially for Configured Capacity. It reports 6.34 TB for the first 7 nodes and 21.39 TB for the last 4.
For example, on one node I get the following report:
Decommission Status : Normal
Configured Capacity: 23515321991168 (21.39 TB)
DFS Used: 4362808995840 (3.97 TB)
Non DFS Used: 14117607018496 (12.84 TB)
DFS Remaining: 3838187159552 (3.49 TB)
DFS Used%: 18.55%
DFS Remaining%: 16.32%
Configured Cache Capacity: 2465202176 (2.30 GB)
Cache Used: 0 (0 B)
Cache Remaining: 2465202176 (2.30 GB)
Cache Used%: 0.00%
Cache Remaining%: 100.00%
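As a sanity check, the percentages in the report can be recomputed from its byte counts. The sketch below is Python with the values copied from the report above. The 5.4 TB figure is the physical capacity per node stated earlier, and treating 1 TB as 2**40 bytes is an assumption, chosen because it reproduces the rounded sizes shown in the report.

```python
TIB = 2 ** 40  # assumption: the report's "TB" means 2**40 bytes

# Values copied from the dfsadmin report above
configured_capacity = 23515321991168
dfs_used = 4362808995840
dfs_remaining = 3838187159552

# Physical disk capacity per node, as stated in the question
actual_capacity = 5.4 * TIB


def pct(part, whole):
    """Return part/whole as a percentage rounded to two decimals."""
    return round(100.0 * part / whole, 2)


reported_used_pct = pct(dfs_used, configured_capacity)
reported_remaining_pct = pct(dfs_remaining, configured_capacity)
used_pct_of_actual = pct(dfs_used, actual_capacity)

print(f"Configured Capacity: {configured_capacity / TIB:.2f} TB")
print(f"DFS Used% (vs. configured capacity): {reported_used_pct}%")
print(f"DFS Remaining% (vs. configured capacity): {reported_remaining_pct}%")
print(f"DFS Used% (vs. 5.4 TB physical capacity): {used_pct_of_actual}%")
```

The first three figures match the report, so the percentages are internally consistent with the inflated Configured Capacity. Measured against 5.4 TB, the same DFS Used would be roughly 73%.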
Running the df command on the dfs.data.dir folders showed me that the DFS Used value (not the percentage) is correct, but the others are way off. I have read that HDFS may show values that are not up to date, but I have been seeing the same values for several days, even after restarting all services and rebooting all machines.
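The df check can also be scripted. Here is a minimal Python sketch that sums filesystem capacity over a list of data directories in two ways: once per underlying device and once per directory. The directory paths and function names are hypothetical placeholders, and this is not a claim about how HDFS itself computes Configured Capacity; it only mirrors the manual df comparison.

```python
import os
import shutil


def total_capacity(data_dirs):
    """Sum filesystem capacity for data_dirs, counting each device once."""
    seen_devices = set()
    total = 0
    for path in data_dirs:
        device = os.stat(path).st_dev  # identifies the underlying filesystem
        if device in seen_devices:
            continue  # this filesystem was already counted
        seen_devices.add(device)
        total += shutil.disk_usage(path).total
    return total


def naive_capacity(data_dirs):
    """Sum capacity per directory, without checking for shared filesystems."""
    return sum(shutil.disk_usage(path).total for path in data_dirs)


if __name__ == "__main__":
    # Hypothetical paths; replace with the actual dfs.data.dir entries
    dirs = ["/data/1/dfs/dn", "/data/2/dfs/dn"]
    existing = [d for d in dirs if os.path.isdir(d)]
    print("per-device total (bytes):", total_capacity(existing))
    print("per-directory total (bytes):", naive_capacity(existing))
```

If the two totals differ, several of the listed directories live on the same filesystem, and a per-directory sum counts that disk more than once.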
What bugs me the most is that:
- The Configured Capacity is way higher than the true capacity (how could it infer 21 TB when I have only 5 TB?)
- The two sets of nodes report two different values, even though their disks are identical
What could be causing these values? And is there a way to fix them?
PS: the reason I'm asking is that, with the wrong values, HDFS underestimates the DFS Used% and thus fails to rebalance files across the nodes. Indeed, the node whose values I posted has:
DFS Used: ~4 TB (correct)
DFS Used%: ~19% (wrong)
Every other node has:
DFS Used: ~2 TB (correct)
DFS Used%: ranging from 11% to 28% (wrong)
This puts the DFS Used% of the affected node below the average, so the HDFS balancer concludes that the node does not need rebalancing.
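To illustrate why the balancer skips this node, here is a minimal Python sketch of the comparison. It assumes the balancer treats a DataNode as over-utilized when its used percentage exceeds the cluster average by more than the threshold (10 percentage points by default). The function name and the 20% cluster average are hypothetical, picked only for illustration; 18.55% comes from the report, and 73.48% is DFS Used divided by the 5.4 TB physical capacity.

```python
def is_over_utilized(node_used_pct, cluster_avg_pct, threshold_pct=10.0):
    """Simplified model: over-utilized if above cluster average + threshold."""
    return node_used_pct > cluster_avg_pct + threshold_pct


cluster_avg_pct = 20.0  # hypothetical cluster-wide average, for illustration

# Same node, judged by the reported percentage and by the physical capacity
reported = is_over_utilized(18.55, cluster_avg_pct)
expected = is_over_utilized(73.48, cluster_avg_pct)

print("over-utilized using reported DFS Used%:", reported)
print("over-utilized using 5.4 TB capacity:", expected)
```

This is a simplification: with corrected capacities the cluster average would shift as well, but the gap between the two inputs is large enough to change the outcome for this node.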
PS2: one thing I have noticed is that the first set of nodes runs CentOS 6.9, while the second runs CentOS 6.8. Could this contribute to the problem?