I have installed Hadoop 2.7.2 in pseudo-distributed mode (machine-1). I want to add a new datanode to it to turn it into a cluster, but the problem is that the two machines have different disk partitions.
I installed the same version of Hadoop 2.7.2 on the new datanode (machine-2), and it can also SSH with machine-1. After googling many websites, they all have tutorials mentioning that we have to have the same configuration files inside the /etc/hadoop/ folder.
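From what I read, I am assuming I also have to list the new node in the slaves file on machine-1 so that start-dfs.sh starts a datanode on it. This is just my guess; "machine2-hostname" is a placeholder for machine-2's actual hostname, and the CP000187 line is there only because machine-1 currently runs a datanode too:

etc/hadoop/slaves (on machine-1):
CP000187
machine2-hostname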
With that said, my existing configuration files on machine-1 are:
core-site.xml:
<configuration>
<property>
<name>hadoop.tmp.dir</name>
<value>/home1/tmp</value>
<description>A base for other temporary directories</description>
</property>
<property>
<name>fs.default.name</name>
<value>hdfs://CP000187:9000</value>
</property>
<property>
<name>hadoop.proxyuser.vasanth.hosts</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.vasanth.groups</name>
<value>*</value>
</property>
</configuration>
hdfs-site.xml:
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:/home1/hadoop_data/hdfs/namenode</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:/home1/hadoop_store/hdfs/datanode</value>
</property>
<property>
<name>dfs.permissions</name>
<value>false</value>
</property>
</configuration>
/home1 is a disk mounted on machine-1. Machine-2 has two disks mounted, namely /hdd1 and /hdd2.
Now, what should I specify in hdfs-site.xml on the new machine (machine-2) to make use of both /hdd1 and /hdd2? (I have put my rough guess at the bottom of this question.)
Does the value of dfs.datanode.data.dir need to be the same on all nodes?
Is the dfs.namenode.name.dir property required in hdfs-site.xml on machine-2 (since it is not a namenode)?
To simplify: is it mandatory to replicate the master node's configuration files on the slave nodes as well? Please help me out on this.
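For reference, here is my rough guess at hdfs-site.xml for machine-2, with both disks listed comma-separated in dfs.datanode.data.dir. The directory paths under /hdd1 and /hdd2 are just placeholders I made up, and I am not sure whether the namenode property should simply be left out there:

hdfs-site.xml (machine-2, my guess):
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:/hdd1/hadoop_store/hdfs/datanode,file:/hdd2/hadoop_store/hdfs/datanode</value>
</property>
<property>
<name>dfs.permissions</name>
<value>false</value>
</property>
</configuration>

If a comma-separated list is the right way to make the datanode use both disks, please confirm; otherwise please correct me.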