
If I copy a set of files to HDFS on a 7-node Hadoop cluster, will HDFS automatically balance the data across the 7 nodes? Is there any way to tell HDFS to constrain or force data onto a particular node in the cluster?


The NameNode is the master that decides which DataNodes store each data block in the cluster. You should not normally override this behavior. When you copy files into the cluster, the NameNode automatically spreads their blocks, and the replicas of those blocks, roughly evenly across the DataNodes. (One exception: if you upload from a machine that is itself a DataNode, the first replica of each block is kept on that node.)
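To make the automatic spreading concrete, here is a toy Python simulation. It is not HDFS code. It assumes 128 MB blocks and a replication factor of 3 (the usual defaults for `dfs.blocksize` and `dfs.replication` in Hadoop 2.x and later), a single rack, and a client outside the cluster. The real NameNode policy also weighs rack topology, node load and free space, and the node names and file sizes below are hypothetical.

```python
import random
from collections import Counter

BLOCK_SIZE_MB = 128  # assumed default dfs.blocksize (Hadoop 2.x+)
REPLICATION = 3      # assumed default dfs.replication


def place_blocks(file_sizes_mb, datanodes, replication=REPLICATION,
                 block_size_mb=BLOCK_SIZE_MB, seed=0):
    """Toy placement: for each block, pick `replication` distinct DataNodes at random.

    File sizes are whole megabytes. Racks, load and free space are ignored.
    Returns a list of (file_index, block_index, [nodes]) tuples.
    """
    rng = random.Random(seed)
    placements = []
    for f, size in enumerate(file_sizes_mb):
        n_blocks = -(-size // block_size_mb)  # ceiling division; empty file -> 0 blocks
        for b in range(n_blocks):
            nodes = rng.sample(datanodes, replication)  # distinct nodes per block
            placements.append((f, b, nodes))
    return placements


def replicas_per_node(placements):
    """Count how many block replicas each DataNode ended up holding."""
    counts = Counter()
    for _, _, nodes in placements:
        counts.update(nodes)
    return counts


if __name__ == "__main__":
    datanodes = [f"dn{i}" for i in range(1, 8)]  # hypothetical 7-node cluster
    files = [1000, 4096, 300, 2048]              # hypothetical file sizes in MB
    placements = place_blocks(files, datanodes)
    for node, count in sorted(replicas_per_node(placements).items()):
        print(node, count)
```

Running it prints a replica count per node that is roughly, not exactly, equal. To see where the blocks of a real file actually landed, `hdfs fsck <path> -files -blocks -locations` lists each block together with the DataNodes holding it.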

If you still want to override this behavior (not recommended), these posts may help:

  1. How to put files to specific node?

  2. How to explicitly define datanodes to store a particular given file in HDFS?