Short Answer
On the Master Side
hadoop-daemons.sh
In $HADOOP_HOME/sbin/hadoop-daemons.sh (not $HADOOP_HOME/sbin/hadoop-daemon.sh; note the s in the filename), there is a line calling $HADOOP_HOME/sbin/slaves.sh. In my version (Hadoop v2.7.7), it reads:
exec "$bin/slaves.sh" --config $HADOOP_CONF_DIR cd "$HADOOP_PREFIX" \; "$bin/hadoop-daemon.sh" --config $HADOOP_CONF_DIR "$@"
Change it to the following line to make it respect slave-side environment variables:
exec "$bin/slaves.sh" "source" ".bash_aliases" \; "hadoop-daemon.sh" "$@"
yarn-daemons.sh
Similarly, in $HADOOP_HOME/sbin/yarn-daemons.sh, change the line:
exec "$bin/slaves.sh" --config $YARN_CONF_DIR cd "$HADOOP_YARN_HOME" \; "$bin/yarn-daemon.sh" --config $YARN_CONF_DIR "$@"
to
exec "$bin/slaves.sh" "source" ".bash_aliases" \; "yarn-daemon.sh" "$@"
On the Slave Side
Put all Hadoop-related environment variables into $HOME/.bash_aliases.
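For example, a minimal .bash_aliases might look like the sketch below. All paths are illustrative (in particular, the JAVA_HOME value is an assumption); adjust them to your installation.

# Example $HOME/.bash_aliases on a slave
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64   # assumed JDK path
export HADOOP_HOME="$HOME/hadoop-2.7.7"
export HADOOP_CONF_DIR="$HADOOP_HOME/etc/hadoop"
# hadoop-daemon.sh is invoked without a path after the change above,
# so $HADOOP_HOME/sbin must be on PATH:
export PATH="$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin"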
Start / Stop
To start HDFS, just run start-dfs.sh on the master side. The data node on each slave will be started as if hadoop-daemon.sh start datanode were executed from an interactive shell on that slave.
To stop HDFS, just run stop-dfs.sh.
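With the changes above, the command that sbin/slaves.sh sends to each slave is roughly the following (the --script argument comes from the callers of hadoop-daemons.sh and is discussed in the Note below):

ssh slave1 'source .bash_aliases ; hadoop-daemon.sh --script hdfs start datanode'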
Note
The above changes complete the fix. But for perfectionists, you may also want to fix the callers of sbin/hadoop-daemons.sh so that the commands look correct when you dump them. In that case, find all call sites of hadoop-daemons.sh in the Hadoop scripts and replace --script "$bin"/hdfs with --script hdfs (and, in general, --script "$bin"/something with --script something). In my case, all the occurrences are hdfs, and since the slave side rewrites the command path when it is hdfs-related, the command works just fine with or without this fix.
Here is an example fix in sbin/start-secure-dns.sh.
Change:
"$HADOOP_PREFIX"/sbin/hadoop-daemons.sh --config $HADOOP_CONF_DIR --script "$bin"/hdfs start datanode $dataStartOpt
to
"$HADOOP_PREFIX"/sbin/hadoop-daemons.sh --config $HADOOP_CONF_DIR --script hdfs start datanode $dataStartOpt
In my version (Hadoop v2.7.7), the following files need to be fixed:
sbin/start-secure-dns.sh (1 occurrence)
sbin/stop-secure-dns.sh (1 occurrence)
sbin/start-dfs.sh (5 occurrences)
sbin/stop-dfs.sh (5 occurrences)
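A possible way to apply all of these replacements at once is the one-liner below; it is a sketch that assumes GNU sed and the file list above.

cd "$HADOOP_HOME"
sed -i 's|--script "$bin"/hdfs|--script hdfs|g' \
    sbin/start-secure-dns.sh sbin/stop-secure-dns.sh \
    sbin/start-dfs.sh sbin/stop-dfs.sh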
Explanation
In sbin/slaves.sh, the line which connects the master to the slaves via ssh reads:
ssh $HADOOP_SSH_OPTS $slave $"${@// /\\ }" \
2>&1 | sed "s/^/$slave: /" &
I added 3 lines before it to dump the variables:
printf 'XXX HADOOP_SSH_OPTS: %s\n' "$HADOOP_SSH_OPTS"
printf 'XXX slave: %s\n' "$slave"
printf 'XXX command: %s\n' $"${@// /\\ }"
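Incidentally, the ${@// /\\ } expansion used here replaces each space inside every argument with a backslash-escaped space, so that arguments containing spaces survive word splitting on the remote shell. A minimal stand-alone demo (not part of the Hadoop scripts):

#!/bin/bash
set -- hadoop-daemon.sh --config "/path with spaces" start datanode
printf '<%s>\n' "${@// /\\ }"
# <hadoop-daemon.sh>
# <--config>
# </path\ with\ spaces>
# <start>
# <datanode>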
In sbin/hadoop-daemons.sh, the line calling sbin/slaves.sh reads (I split it into 2 lines to prevent scrolling):
exec "$bin/slaves.sh" --config $HADOOP_CONF_DIR cd "$HADOOP_PREFIX" \; \
"$bin/hadoop-daemon.sh" --config $HADOOP_CONF_DIR "$@"
The sbin/start-dfs.sh script calls sbin/hadoop-daemons.sh. Here is the result when sbin/start-dfs.sh is executed:
Starting namenodes on [master]
XXX HADOOP_SSH_OPTS:
XXX slave: master
XXX command: cd
XXX command: /home/hduser/hadoop-2.7.7
XXX command: ;
XXX command: /home/hduser/hadoop-2.7.7/sbin/hadoop-daemon.sh
XXX command: --config
XXX command: /home/hduser/hadoop-2.7.7/etc/hadoop
XXX command: --script
XXX command: /home/hduser/hadoop-2.7.7/sbin/hdfs
XXX command: start
XXX command: namenode
master: starting namenode, logging to /home/hduser/hadoop-2.7.7/logs/hadoop-hduser-namenode-akmacbook.out
XXX HADOOP_SSH_OPTS:
XXX slave: slave1
XXX command: cd
XXX command: /home/hduser/hadoop-2.7.7
XXX command: ;
XXX command: /home/hduser/hadoop-2.7.7/sbin/hadoop-daemon.sh
XXX command: --config
XXX command: /home/hduser/hadoop-2.7.7/etc/hadoop
XXX command: --script
XXX command: /home/hduser/hadoop-2.7.7/sbin/hdfs
XXX command: start
XXX command: datanode
slave1: bash: line 0: cd: /home/hduser/hadoop-2.7.7: Permission denied
slave1: bash: /home/hduser/hadoop-2.7.7/sbin/hadoop-daemon.sh: Permission denied
Starting secondary namenodes [master]
XXX HADOOP_SSH_OPTS:
XXX slave: master
XXX command: cd
XXX command: /home/hduser/hadoop-2.7.7
XXX command: ;
XXX command: /home/hduser/hadoop-2.7.7/sbin/hadoop-daemon.sh
XXX command: --config
XXX command: /home/hduser/hadoop-2.7.7/etc/hadoop
XXX command: --script
XXX command: /home/hduser/hadoop-2.7.7/sbin/hdfs
XXX command: start
XXX command: secondarynamenode
master: starting secondarynamenode, logging to /home/hduser/hadoop-2.7.7/logs/hadoop-hduser-secondarynamenode-akmacbook.out
As you can see from the above result, the script respects neither the slave-side .bashrc nor etc/hadoop/hadoop-env.sh.
Solution
From the result above, we know that the variable $HADOOP_CONF_DIR is resolved on the master side. The problem would be solved if it were resolved on the slave side. However, since the shell created by ssh (with a command attached) is a non-interactive shell, the .bashrc script is not loaded on the slave side. Therefore, the following command prints nothing:
ssh slave1 'echo $HADOOP_HOME'
We can force it to load .bashrc:
ssh slave1 'source .bashrc; echo $HADOOP_HOME'
However, the following block in .bashrc (the default in Ubuntu 18.04) makes it return early in non-interactive shells:
# If not running interactively, don't do anything
case $- in
*i*) ;;
*) return;;
esac
At this point, you could remove the above block from .bashrc to try to achieve the goal, but I don't think that's a good idea. I have not tried it, but I believe the guard is there for a reason.
On my platform (Ubuntu 18.04), when I log in interactively (via the console or ssh), .profile loads .bashrc, and .bashrc loads .bash_aliases. Therefore, I have a habit of keeping .profile, .bashrc, and .bash_logout unchanged, and putting any customizations into .bash_aliases.
If .bash_aliases is not loaded on your platform, append the following code to .bashrc:
if [ -f ~/.bash_aliases ]; then
. ~/.bash_aliases
fi
Back to the problem: we can load .bash_aliases instead of .bashrc. The following command does the job, and the slave-side $HADOOP_HOME is printed:
ssh slave1 'source .bash_aliases; echo $HADOOP_HOME'
By applying this technique to the sbin/hadoop-daemons.sh script, we arrive at the Short Answer given above.