We have an HDP cluster, version 2.6.5, with 8 data nodes. All machines run RHEL 7.6.
The HDP cluster is based on the Ambari platform, version 2.6.1.
Each data node (worker machine) has two disks, and each disk is 1.8T in size.
When we log in to the data-node machines, we see large differences in how full the disks are.
For example, on the first data node the usage is (from df -h):
/dev/sdb 1.8T 839G 996G 46% /grid/sdc
/dev/sda 1.8T 1014G 821G 56% /grid/sdb
On the second data node the usage is:
/dev/sdb 1.8T 1.5T 390G 79% /grid/sdc
/dev/sda 1.8T 1.5T 400G 79% /grid/sdb
On the third data node the usage is:
/dev/sdb 1.8T 1.7T 170G 91% /grid/sdc
/dev/sda 1.8T 1.7T 169G 91% /grid/sdb
And so on for the remaining data nodes.
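To put a number on the imbalance, here is a minimal Python sketch that uses only the Use% values from the df -h output above. The labels datanode1, datanode2 and datanode3 are just names I am giving the three machines shown; they are not real hostnames.

```python
# Use% values copied from the df -h output quoted above.
# The node labels are hypothetical names for the three data nodes shown.
usage_pct = {
    "datanode1": {"/grid/sdc": 46, "/grid/sdb": 56},
    "datanode2": {"/grid/sdc": 79, "/grid/sdb": 79},
    "datanode3": {"/grid/sdc": 91, "/grid/sdb": 91},
}


def node_average(disks):
    """Return the mean Use% across the disks of one data node."""
    return sum(disks.values()) / len(disks)


# Average usage per node, and the gap between the fullest and emptiest node.
averages = {node: node_average(disks) for node, disks in usage_pct.items()}
spread = max(averages.values()) - min(averages.values())

for node, avg in averages.items():
    print(f"{node}: average disk usage {avg:.1f}%")
print(f"spread between fullest and emptiest node: {spread:.1f} percentage points")
```

For these three nodes the gap between the fullest and the emptiest node is 40 percentage points, which is the difference we are asking about.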
The big question is: why does HDFS not rebalance the data across the HDFS disks?
We expected the used space to be roughly the same on all disks of all data-node machines.
Why does the used size differ so much between datanode1, datanode2, datanode3, etc.?
Any advice about HDFS tuning parameters that can help us?
This is critical for us, because one disk can reach 100% usage while others are only about 50% full.