On a distributed Hadoop cluster, can I copy the same hdfs-site.xml file to the namenodes and datanodes?
Some of the set-up instructions I've seen (e.g. Cloudera's) say to put the dfs.data.dir property in this file on the datanodes and the dfs.name.dir property in this file on the namenode, meaning I would have two copies of hdfs-site.xml: one for the namenode and one for the datanodes.
But if it's all the same, I'd rather own and maintain one copy of the file and push it to ALL nodes any time I change it. Is there any harm or risk in having both the dfs.name.dir and dfs.data.dir properties in the same file? What issues might arise if a datanode sees the "dfs.name.dir" property? And if there are issues, what other properties should be in the hdfs-site.xml file on the namenode but not on the datanodes, and vice versa?
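To make the question concrete, here is a minimal sketch of the single shared file I have in mind. The directory paths are made-up placeholders, not values from a real cluster, and the file would carry whatever other properties the cluster needs.

```xml
<?xml version="1.0"?>
<configuration>
  <!-- Intended for the namenode; the path is a hypothetical placeholder -->
  <property>
    <name>dfs.name.dir</name>
    <value>/data/1/dfs/nn</value>
  </property>
  <!-- Intended for the datanodes; the path is a hypothetical placeholder -->
  <property>
    <name>dfs.data.dir</name>
    <value>/data/1/dfs/dn</value>
  </property>
</configuration>
```

The idea would be to push this exact file, unchanged, to the namenode and to every datanode.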
And finally, what properties need to be included in the hdfs-site.xml file that I copy to a client machine (one that isn't a tasktracker or datanode, but just talks to the Hadoop cluster)?
I've searched around, including the O'Reilly operations book, but can't find a good article describing how the config file needs to differ across nodes. Thanks!