Short Answer
On the Master Side
hadoop-daemons.sh
In $HADOOP_HOME/sbin/hadoop-daemons.sh (not $HADOOP_HOME/sbin/hadoop-daemon.sh; note the s in the filename), there is a line calling $HADOOP_HOME/sbin/slaves.sh. In my version (Hadoop v2.7.7), it reads:
exec "$bin/slaves.sh" --config $HADOOP_CONF_DIR cd "$HADOOP_PREFIX" \; "$bin/hadoop-daemon.sh" --config $HADOOP_CONF_DIR "$@"
Change it to the following line to make it respect slave-side environment variables:
exec "$bin/slaves.sh" "source" ".bash_aliases" \; "hadoop-daemon.sh" "$@"
yarn-daemons.sh
Similarly, in $HADOOP_HOME/sbin/yarn-daemons.sh, change the line:
exec "$bin/slaves.sh" --config $YARN_CONF_DIR cd "$HADOOP_YARN_HOME" \; "$bin/yarn-daemon.sh" --config $YARN_CONF_DIR "$@"
to
exec "$bin/slaves.sh" "source" ".bash_aliases" \; "yarn-daemon.sh" "$@"
On the Slave Side
Put all Hadoop-related environment variables into $HOME/.bash_aliases.
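For example, a minimal .bash_aliases might look like the sketch below. All paths are illustrative (in particular, the JAVA_HOME value is an assumption); adjust them to your installation.

# Example $HOME/.bash_aliases on a slave
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64   # assumed JDK path
export HADOOP_HOME="$HOME/hadoop-2.7.7"
export HADOOP_CONF_DIR="$HADOOP_HOME/etc/hadoop"
# hadoop-daemon.sh is invoked without a path after the change above,
# so $HADOOP_HOME/sbin must be on PATH:
export PATH="$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin"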
Start / Stop
To start HDFS, just run start-dfs.sh on the master side. The data node on each slave will be started as if hadoop-daemon.sh start datanode were executed from an interactive shell on that slave.
To stop HDFS, just run stop-dfs.sh.
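With the changes above, the command that sbin/slaves.sh sends to each slave is roughly the following (the --script argument comes from the callers of hadoop-daemons.sh and is discussed in the Note below):

ssh slave1 'source .bash_aliases ; hadoop-daemon.sh --script hdfs start datanode'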
Note
The above changes complete the fix. But for perfectionists, you may also want to fix the callers of sbin/hadoop-daemons.sh so that the commands look correct when you dump them. In that case, find all call sites of hadoop-daemons.sh in the Hadoop scripts and replace --script "$bin"/hdfs with --script hdfs (and, in general, --script "$bin"/something with --script something). In my case, all the occurrences are hdfs, and since the slave side rewrites the command path when it is hdfs-related, the command works just fine with or without this fix.
Here is an example fix in sbin/start-secure-dns.sh.
Change:
"$HADOOP_PREFIX"/sbin/hadoop-daemons.sh --config $HADOOP_CONF_DIR --script "$bin"/hdfs start datanode $dataStartOpt
to
"$HADOOP_PREFIX"/sbin/hadoop-daemons.sh --config $HADOOP_CONF_DIR --script hdfs start datanode $dataStartOpt
In my version (Hadoop v2.7.7), the following files need to be fixed:
sbin/start-secure-dns.sh (1 occurrence)
sbin/stop-secure-dns.sh (1 occurrence)
sbin/start-dfs.sh (5 occurrences)
sbin/stop-dfs.sh (5 occurrences)
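A possible way to apply all of these replacements at once is the one-liner below; it is a sketch that assumes GNU sed and the file list above.

cd "$HADOOP_HOME"
sed -i 's|--script "$bin"/hdfs|--script hdfs|g' \
    sbin/start-secure-dns.sh sbin/stop-secure-dns.sh \
    sbin/start-dfs.sh sbin/stop-dfs.sh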
Explanation
In sbin/slaves.sh, the line which connects the master to the slaves via ssh reads:
ssh $HADOOP_SSH_OPTS $slave $"${@// /\\ }" \
2>&1 | sed "s/^/$slave: /" &
I added 3 lines before it to dump the variables:
printf 'XXX HADOOP_SSH_OPTS: %s\n' "$HADOOP_SSH_OPTS"
printf 'XXX slave: %s\n' "$slave"
printf 'XXX command: %s\n' $"${@// /\\ }"
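Incidentally, the ${@// /\\ } expansion used here replaces each space inside every argument with a backslash-escaped space, so that arguments containing spaces survive word splitting on the remote shell. A minimal stand-alone demo (not part of the Hadoop scripts):

#!/bin/bash
set -- hadoop-daemon.sh --config "/path with spaces" start datanode
printf '<%s>\n' "${@// /\\ }"
# <hadoop-daemon.sh>
# <--config>
# </path\ with\ spaces>
# <start>
# <datanode>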
In sbin/hadoop-daemons.sh, the line calling sbin/slaves.sh reads (I split it into 2 lines to prevent scrolling):
exec "$bin/slaves.sh" --config $HADOOP_CONF_DIR cd "$HADOOP_PREFIX" \; \
"$bin/hadoop-daemon.sh" --config $HADOOP_CONF_DIR "$@"
The sbin/start-dfs.sh script calls sbin/hadoop-daemons.sh. Here is the result when sbin/start-dfs.sh is executed:
Starting namenodes on [master]
XXX HADOOP_SSH_OPTS:
XXX slave: master
XXX command: cd
XXX command: /home/hduser/hadoop-2.7.7
XXX command: ;
XXX command: /home/hduser/hadoop-2.7.7/sbin/hadoop-daemon.sh
XXX command: --config
XXX command: /home/hduser/hadoop-2.7.7/etc/hadoop
XXX command: --script
XXX command: /home/hduser/hadoop-2.7.7/sbin/hdfs
XXX command: start
XXX command: namenode
master: starting namenode, logging to /home/hduser/hadoop-2.7.7/logs/hadoop-hduser-namenode-akmacbook.out
XXX HADOOP_SSH_OPTS:
XXX slave: slave1
XXX command: cd
XXX command: /home/hduser/hadoop-2.7.7
XXX command: ;
XXX command: /home/hduser/hadoop-2.7.7/sbin/hadoop-daemon.sh
XXX command: --config
XXX command: /home/hduser/hadoop-2.7.7/etc/hadoop
XXX command: --script
XXX command: /home/hduser/hadoop-2.7.7/sbin/hdfs
XXX command: start
XXX command: datanode
slave1: bash: line 0: cd: /home/hduser/hadoop-2.7.7: Permission denied
slave1: bash: /home/hduser/hadoop-2.7.7/sbin/hadoop-daemon.sh: Permission denied
Starting secondary namenodes [master]
XXX HADOOP_SSH_OPTS:
XXX slave: master
XXX command: cd
XXX command: /home/hduser/hadoop-2.7.7
XXX command: ;
XXX command: /home/hduser/hadoop-2.7.7/sbin/hadoop-daemon.sh
XXX command: --config
XXX command: /home/hduser/hadoop-2.7.7/etc/hadoop
XXX command: --script
XXX command: /home/hduser/hadoop-2.7.7/sbin/hdfs
XXX command: start
XXX command: secondarynamenode
master: starting secondarynamenode, logging to /home/hduser/hadoop-2.7.7/logs/hadoop-hduser-secondarynamenode-akmacbook.out
As you can see from the above result, the script respects neither the slave-side .bashrc nor etc/hadoop/hadoop-env.sh.
Solution
From the result above, we know that the variable $HADOOP_CONF_DIR is resolved on the master side. The problem would be solved if it were resolved on the slave side. However, since the shell created by ssh (with a command attached) is a non-interactive shell, the .bashrc script is not loaded on the slave side. Therefore, the following command prints nothing:
ssh slave1 'echo $HADOOP_HOME'
We can force it to load .bashrc:
ssh slave1 'source .bashrc; echo $HADOOP_HOME'
However, the following block in .bashrc (the default in Ubuntu 18.04) makes it return early in non-interactive shells:
# If not running interactively, don't do anything
case $- in
*i*) ;;
*) return;;
esac
At this point, you could remove the above block from .bashrc to try to achieve the goal, but I don't think that's a good idea. I have not tried it, but I believe the guard is there for a reason.
On my platform (Ubuntu 18.04), when I log in interactively (via the console or ssh), .profile loads .bashrc, and .bashrc loads .bash_aliases. Therefore, I have a habit of keeping .profile, .bashrc, and .bash_logout unchanged, and putting any customizations into .bash_aliases.
If .bash_aliases is not loaded on your platform, append the following code to .bashrc:
if [ -f ~/.bash_aliases ]; then
. ~/.bash_aliases
fi
Back to the problem: we can load .bash_aliases instead of .bashrc. The following command does the job, and the slave-side $HADOOP_HOME is printed:
ssh slave1 'source .bash_aliases; echo $HADOOP_HOME'
By applying this technique to the sbin/hadoop-daemons.sh script, we arrive at the Short Answer given above.