I recently set up a test Hadoop cluster with one master and two slaves.
The master is NOT a DataNode (although some people use the master node as both master and slave).
So basically I have 2 DataNodes. The default configuration for replication is 3.
Initially, I did not change any configuration in conf/hdfs-site.xml. I was getting the error "could only be replicated to 0 nodes instead of 1".
I then changed the configuration in conf/hdfs-site.xml on both my master and my slaves as follows:
<property>
  <name>dfs.replication</name>
  <value>3</value>
</property>
and lo! everything worked fine. My question is: does this setting apply to the NameNode or to the DataNodes? I changed hdfs-site.xml on the NameNode and on all my DataNodes, so I cannot tell which copy actually matters.
If my understanding is correct, the NameNode allocates blocks to the DataNodes, so the replication setting on the master (the NameNode) is the important one and it is probably not needed on the DataNodes. Is this correct?
I am also confused about the actual purpose of the different XML files in the Hadoop framework. From my limited understanding:
1) core-site.xml - configuration parameters for the entire framework, such as where the log files should go, what the default filesystem name is, etc.
2) hdfs-site.xml - applies to individual DataNodes: the replication factor, the data directory in the DataNode's local filesystem, the block size, etc.
3) mapred-site.xml - applies to the DataNodes and gives the configuration for the TaskTracker.
Please correct me if this is wrong. These configuration files are not well explained in the tutorials I followed, so this understanding comes from looking through the default files myself.
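To make my current picture concrete, here is a rough sketch of one or two settings per file as I imagine them. The property names are the Hadoop 1.x-style ones as far as I know (later releases renamed several, e.g. fs.defaultFS, dfs.datanode.data.dir, dfs.blocksize), and every host, port and path below is a placeholder I made up for illustration, not something from my actual cluster:

```xml
<!-- conf/core-site.xml: framework-wide settings (my assumption) -->
<configuration>
  <property>
    <name>fs.default.name</name>
    <!-- placeholder NameNode host and port -->
    <value>hdfs://master:9000</value>
  </property>
</configuration>

<!-- conf/hdfs-site.xml: HDFS settings -->
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
  <property>
    <name>dfs.data.dir</name>
    <!-- placeholder local directory where a DataNode stores blocks -->
    <value>/path/to/datanode/storage</value>
  </property>
  <property>
    <name>dfs.block.size</name>
    <!-- 64 MB expressed in bytes -->
    <value>67108864</value>
  </property>
</configuration>

<!-- conf/mapred-site.xml: MapReduce settings -->
<configuration>
  <property>
    <name>mapred.tasktracker.map.tasks.maximum</name>
    <!-- illustrative value only -->
    <value>2</value>
  </property>
</configuration>
```

If this grouping is wrong, or if some of these values are only read by a particular daemon (or by the client), that is exactly the part I would like clarified.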