
I recently set up a test Hadoop cluster: one master and two slaves.

The master is NOT a DataNode (although some people use the master node as both master and slave), so I effectively have 2 DataNodes. The default replication factor is 3. Initially I did not change any configuration in conf/hdfs-site.xml, and I was getting the error "could only be replicated to 0 nodes, instead of 1". I then changed the configuration in conf/hdfs-site.xml on both my master and slaves as follows:

  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>

and lo, everything worked fine! My question is: does this setting apply to the NameNode or to the DataNodes? I changed hdfs-site.xml on all of my DataNodes and on the NameNode.

If my understanding is correct, the NameNode allocates blocks to DataNodes, so the replication setting on the master/NameNode is what matters, and it is probably not needed on the DataNodes. Is this correct?
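For what it's worth, the replication factor of existing files can be inspected and changed from the command line without touching any XML; a minimal sketch, where /user/test/sample.txt is a hypothetical file already in HDFS:

  # The second column of the listing is the file's current replication factor
  hadoop fs -ls /user/test/sample.txt

  # Change the replication factor to 2 and wait (-w) until re-replication completes
  hadoop fs -setrep -w 2 /user/test/sample.txt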

I am also confused about the actual purpose of the different XML configuration files in the Hadoop framework. From my limited understanding:

1) core-site.xml - configuration parameters for the entire framework, such as where the log files should go, the default name of the filesystem, etc.

2) hdfs-site.xml - applies to individual DataNodes: the replication factor, the data directory on the DataNode's local filesystem, the block size, etc.

3) mapred-site.xml - applies to the DataNode and holds the configuration for the TaskTracker.

Please correct me if this is wrong. These configuration files are not well explained in the tutorials I followed, so this summary comes from my own look at the defaults in those files; minimal sketches of all three are below.
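For concreteness, here are minimal sketches of the three files for a Hadoop 1.x-style cluster. The hostname master, the ports, and the paths are hypothetical placeholders; note that every property must sit inside the <configuration> root element:

  <!-- core-site.xml: the default filesystem URI -->
  <configuration>
    <property>
      <name>fs.default.name</name>
      <value>hdfs://master:9000</value>
    </property>
  </configuration>

  <!-- hdfs-site.xml: replication set to 2 so that, with only two
       DataNodes, every block can actually reach its target replica count -->
  <configuration>
    <property>
      <name>dfs.replication</name>
      <value>2</value>
    </property>
    <property>
      <name>dfs.data.dir</name>
      <value>/var/lib/hadoop/dfs/data</value>
    </property>
  </configuration>

  <!-- mapred-site.xml: where the JobTracker runs -->
  <configuration>
    <property>
      <name>mapred.job.tracker</name>
      <value>master:9001</value>
    </property>
  </configuration>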


2 Answers

0 votes

This is my understanding and I may be wrong.

hdfs-site.xml - for the properties of HDFS (Hadoop Distributed File System)

mapred-site.xml - for the properties of MapReduce

core-site.xml - for other properties that touch both HDFS and MapReduce

0 votes

This error is usually caused by insufficient disk space.

Please check your cluster's total capacity and the used/remaining ratio using:

  hdfs dfsadmin -report

Also check dfs.datanode.du.reserved in hdfs-site.xml; if this value is larger than your remaining capacity, the DataNodes are treated as full.
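For illustration, a sketch of that property in hdfs-site.xml; the 1 GB value is an arbitrary example, not a recommendation:

  <property>
    <!-- bytes reserved per volume for non-DFS use; HDFS refuses new blocks
         once free space falls below this threshold -->
    <name>dfs.datanode.du.reserved</name>
    <value>1073741824</value>
  </property>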

Also look for other possible causes explained here.