I have installed Hadoop 2.7.2 in pseudo-distributed mode (machine-1). I want to add a new datanode to it to turn it into a cluster, but the problem is that the two machines have different disk partitions.
I installed the same version of Hadoop 2.7.2 on the new datanode (machine-2), and it can also SSH with machine-1. After googling many websites, they all have tutorials mentioning that we have to have the same configuration files inside the /etc/hadoop/ folder.
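From what I read, I am assuming I also have to list the new node in the slaves file on machine-1 so that start-dfs.sh starts a datanode on it. This is just my guess; "machine2-hostname" is a placeholder for machine-2's actual hostname, and the CP000187 line is there only because machine-1 currently runs a datanode too:

etc/hadoop/slaves (on machine-1):
CP000187
machine2-hostname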
With that said, my existing configuration files on machine-1 are:
core-site.xml:
<configuration>
<property>
<name>hadoop.tmp.dir</name>
<value>/home1/tmp</value>
<description>A base for other temporary directories</description>
</property>
<property>
<name>fs.default.name</name>
<value>hdfs://CP000187:9000</value>
</property>
<property>
<name>hadoop.proxyuser.vasanth.hosts</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.vasanth.groups</name>
<value>*</value>
</property>
</configuration>
hdfs-site.xml:
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:/home1/hadoop_data/hdfs/namenode</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:/home1/hadoop_store/hdfs/datanode</value>
</property>
<property>
<name>dfs.permissions</name>
<value>false</value>
</property>
</configuration>
/home1 is a disk mounted on machine-1. Machine-2 has two disks mounted, namely /hdd1 and /hdd2.
Now, what should I specify in hdfs-site.xml on the new machine (machine-2) to make use of both /hdd1 and /hdd2? (I have put my rough guess at the bottom of this question.)
Does the value of dfs.datanode.data.dir need to be the same on all nodes?
Is the dfs.namenode.name.dir property required in hdfs-site.xml on machine-2 (since it is not a namenode)?
To simplify: is it mandatory to replicate the master node's configuration files on the slave nodes as well? Please help me out on this.
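For reference, here is my rough guess at hdfs-site.xml for machine-2, with both disks listed comma-separated in dfs.datanode.data.dir. The directory paths under /hdd1 and /hdd2 are just placeholders I made up, and I am not sure whether the namenode property should simply be left out there:

hdfs-site.xml (machine-2, my guess):
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:/hdd1/hadoop_store/hdfs/datanode,file:/hdd2/hadoop_store/hdfs/datanode</value>
</property>
<property>
<name>dfs.permissions</name>
<value>false</value>
</property>
</configuration>

If a comma-separated list is the right way to make the datanode use both disks, please confirm; otherwise please correct me.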