6
votes

I recently upgraded my Cloudera environment from 5.8.x (Hadoop 2.6.0, hdfs-1) to 6.3.x (Hadoop 3.0.0, hdfs-1). After some days of data loads with moveFromLocal, I realized that the DFS Used% of the datanode on which I execute moveFromLocal is about 3x that of the other nodes.

I then ran fsck with the -blocks, -locations and -replicaDetails flags over the HDFS path I load the data into, and observed that the replicated blocks (RF=2) all sit on that same server and are not distributed to the other nodes unless I manually run the HDFS balancer.
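For reference, the fsck invocation described above would look something like this (the path /data/parquet is a placeholder for the actual load directory; note that in the fsck option grammar -replicaDetails takes the place of -locations under -files -blocks):

```shell
# Show block placement under the load path (path is a placeholder)
hdfs fsck /data/parquet -files -blocks -locations

# Per-replica detail for the same path
hdfs fsck /data/parquet -files -blocks -replicaDetails
```

The output lists, for each block, the datanodes holding its replicas, which is how the skewed placement shows up.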

There is a related question asked a month ago, hdfs put/moveFromLocal not distributing data across data nodes?, but it does not really answer this. The files I keep loading are Parquet files.

There was no such problem in Cloudera 5.8.x. Is there some new configuration I should make in Cloudera 6.3.x related to replication, rack awareness, or something like that?
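As a starting point for checking the configuration in question, hdfs getconf reads the effective cluster settings (these are standard keys, not values specific to my cluster):

```shell
# Effective default replication factor
hdfs getconf -confKey dfs.replication

# Rack-awareness topology script, if any is configured
hdfs getconf -confKey net.topology.script.file.name
```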

Any help would be highly appreciated.

2
I didn't know it was possible to have more than one replica of a block on the same datanode. Can you double-check that? – mazaneicha
Yes, it is possible if replicas are distributed across separate disks on a server, and (I'm not sure, but) they may even end up on the same disk if you have fewer datanodes than the RF, which defeats the purpose of replication anyway. But what I meant, and my case, is entirely different: while the original blocks (primaries) are distributed across all the datanodes, the replicas (secondaries) of each block of loaded data sit on the server where I run the loading operation, i.e. moveFromLocal. – belce

2 Answers

2
votes

According to the HDFS Architecture doc, "For the common case, when the replication factor is three, HDFS’s placement policy is to put one replica on the local machine if the writer is on a datanode..."

Per the same doc, "Because the NameNode does not allow DataNodes to have multiple replicas of the same block, maximum number of replicas created is the total number of DataNodes at that time."

1
votes

You are probably running moveFromLocal on one of your datanodes. It seems you need to run moveFromLocal from a non-datanode (e.g. an edge/gateway node) to get even distribution across your cluster.
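A sketch of the two workarounds, assuming placeholder paths (/local/staging and /data/parquet are examples, not values from the question):

```shell
# Option 1: run the load from an edge node, so the writer is not a
# datanode and the first replica is not pinned to the local machine
hdfs dfs -moveFromLocal /local/staging/*.parquet /data/parquet/

# Option 2: after loading on a datanode, spread the blocks manually;
# -threshold is the allowed deviation (in %) of per-node utilization
hdfs balancer -threshold 10
```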