I've recently been trying to build and configure a (8-Pi) Raspberry Pi 3 Hadoop-cluster (as a personal project over the summer). Please bear with me (unfortunately I am a little new to Hadoop). I am using is Hadoop version 2.9.2. I think its important to note that right now I am trying to just get one Namenode and one Datanode completely functional with one-another, before moving ahead and replicating the same procedure on the remaining seven Pi's.
The issue: My Namenode (alias: master) is the only node that is being displayed as a 'Live Datanode' under both the dfs-health interface, and through the use of :
dfsadmin -report
Even though the Datanode is being displayed as an 'Active Node' (within the Nodes of the cluster Hadoop UI) and 'master' is not listed within the slaves file. The configuration I am aiming for is that the Namenode should not perform any of Datanode operations. Additionally I am trying to configure the cluster in such a way that the command above will display my Datanode (alias: slave-01) as a 'Live Datanode'.
I suspect that my issue is caused by the fact that both my Namenode and Datanode make use of the same host-name (raspberrypi), however am unsure of the configuration changes I am required to make in order to correct the issue. After having looked into the documentation, I unfortunately couldn't find a conclusive answer as to whether this is allowed or not.
If someone could please help me solve this issue it would be extremely appreciated! I have provided any relevant file-information below (which I thought may be useful for solving the issue). Thank you :)
PS: All files are identical within the Namenode and Datanode unless otherwise specified.
===========================================================================
Update 1
I have removed localhost from the slaves file on both the Namenode and Datanode, and changed their respective hostnames to 'master' and 'slave-01' as well.
After running JPS: I have noticed that all of the correct processes are running on the master node, however I am having an error on the Datanode for which the log shows:
ExitCodeException exitCode=1: chmod: changing permissions of '/opt/hadoop_tmp/hdfs/datanode': Operation not permitted.
If someone could please help me solve this issue it would be extremely appreciated! Unfortunately the issue persists despite changing permissions using 'chmod 777'. Thanks in advance :)
===========================================================================
Hosts File
127.0.0.1 localhost
::1 localhost ip6-localhost ip6-loopback
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
127.0.1.1 raspberrypi
192.168.1.2 master
192.168.1.3 slave-01
Master File
master
Slaves File
localhost
slave-01
Core-Site.xml
<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://master:9000/</value>
</property>
<property>
<name>fs.default.FS</name>
<value>hdfs://master:9000/</value>
</property>
</configuration>
HDFS-Site.xml
<configuration>
<property>
<name>dfs.datanode.data.dir</name>
<value>/opt/hadoop_tmp/hdfs/datanode</value>
<final>true</final>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>/opt/hadoop_tmp/hdfs/namenode</value>
<final>true</final>
</property>
<property>
<name>dfs.namenode.http-address</name>
<value>master:50070</value>
</property>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
</configuration>
Mapred-Site.xml
<configuration>
<property>
<name>mapreduce.job.tracker</name>
<value>master:5431</value>
</property>
<property>
<name>mapred.framework.name</name>
<value>yarn</value>
</property>
</configuration>
Yarn-Site.xml
<configuration>
<property>
<name>yarn.resourcemanager.resource-tracker.address</name>
<value>master:8025</value>
</property>
<property>
<name>yarn.resourcemanager.scheduler.address</name>
<value>master:8035</value>
</property>
<property>
<name>yarn.resourcemanager.address</name>
<value>master:8050</value>
</property>
</configuration>
job.tracker, which doesn't exist in Hadoop 2, and the duplicatedfs.default.FS, which is the real value that overwrites the other, looks like you're guessing at config files... You do not need YARN or mapreduce files at all to only get a namenode up and running, then focus on the datanode, and then focus on ResourceManager, NodeManager, and finish off with submitting code - OneCricketeer