
I recently started studying Hadoop and did an experiment to really understand it. Here is the tutorial: http://www.codeproject.com/Articles/757934/Apache-Hadoop-for-Windows-Platform What I want to ask is: what are the background processes that occur in the computer system after I run the commands from the tutorial, such as:

  • hadoop namenode -format
  • javac -classpath C:\hadoop-2.3.0\share\hadoop\common\hadoop-common-2.3.0.jar;C:\hadoop-2.3.0\share\hadoop\mapreduce\hadoop-mapreduce-client-core-2.3.0.jar;C:\hadoop-2.3.0\share\hadoop\common\lib\gson-2.2.4.jar;C:\hadoop-2.3.0\share\hadoop\common\lib\commons-cli-1.2.jar Recipe.java
  • jar -cvf Recipe.jar *.class
  • hadoop fs -mkdir /in
  • hadoop fs -copyFromLocal c:\Hwork\recipeitems-latest.json /in
  • hadoop jar c:\Hwork\Recipe.jar Recipe /in /out
  • hadoop fs -ls /out
  • hadoop fs -cat /out/part-r-00000

2 Answers


hadoop fs runs Unix-like commands (ls, cp, cat etc.) on the HDFS file system; you can see the full list in the FileSystem Shell documentation
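
If you want to see what those map to in code: the fs shell is a thin wrapper over Hadoop's Java FileSystem API. Here is a minimal sketch of the same operations done programmatically (my own illustration, not the shell's actual code):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class FsExample {
        public static void main(String[] args) throws Exception {
            // Reads fs.defaultFS from core-site.xml, same as the shell does
            FileSystem fs = FileSystem.get(new Configuration());

            // roughly "hadoop fs -mkdir /in"
            fs.mkdirs(new Path("/in"));

            // roughly "hadoop fs -copyFromLocal c:\Hwork\recipeitems-latest.json /in"
            fs.copyFromLocalFile(new Path("c:/Hwork/recipeitems-latest.json"),
                                 new Path("/in"));

            // roughly "hadoop fs -ls /out"
            for (FileStatus status : fs.listStatus(new Path("/out"))) {
                System.out.println(status.getPath());
            }
        }
    }

FileSystem.get() picks up the same configuration the shell commands use, so it talks to the same HDFS.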

hadoop namenode -format is initialization of the namenode, i.e. erasing everything stored in Hadoop - note that on newer Hadoop versions you'd do that via hdfs namenode -format, see here

The other two commands (javac and jar) have to do with compiling a Java program and packing the resulting classes into a jar
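
I don't know what the tutorial's Recipe.java actually contains, but a job you compile and pack that way typically has the standard MapReduce skeleton below (the mapper/reducer bodies here are hypothetical placeholders, not the tutorial's logic):

    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class Recipe {

        public static class RecipeMapper
                extends Mapper<Object, Text, Text, IntWritable> {
            private static final IntWritable ONE = new IntWritable(1);

            @Override
            protected void map(Object key, Text value, Context context)
                    throws IOException, InterruptedException {
                // hypothetical: emit one count per input record
                context.write(new Text("records"), ONE);
            }
        }

        public static class RecipeReducer
                extends Reducer<Text, IntWritable, Text, IntWritable> {
            @Override
            protected void reduce(Text key, Iterable<IntWritable> values,
                                  Context context)
                    throws IOException, InterruptedException {
                int sum = 0;
                for (IntWritable v : values) sum += v.get();
                context.write(key, new IntWritable(sum));
            }
        }

        public static void main(String[] args) throws Exception {
            Job job = Job.getInstance(new Configuration(), "recipe");
            job.setJarByClass(Recipe.class);
            job.setMapperClass(RecipeMapper.class);
            job.setReducerClass(RecipeReducer.class);
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(IntWritable.class);
            FileInputFormat.addInputPath(job, new Path(args[0]));   // /in
            FileOutputFormat.setOutputPath(job, new Path(args[1])); // /out
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }

hadoop jar c:\Hwork\Recipe.jar Recipe /in /out then runs main() with /in and /out as its arguments.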


You can track the flow by browsing through the 'hadoop' file in the /bin folder under your Hadoop home directory - despite being called a binary, it is a readable shell script.

When you submit a Hadoop command, it behaves like a normal UNIX shell command (cat, ls, awk). The shell goes to the Hadoop /bin directory and executes the hadoop script with the other options (fs, jar, distcp, job, namenode, jt, ...) as arguments. Depending on the option given to the hadoop command, the next script is invoked with the remaining options as arguments, and finally a Java class is executed with the requested options. I've provided a brief overview of how this works for 'hadoop fs -cat '

bin/hadoop

    COMMAND=$1
    case $COMMAND in
      # usage flags
      --help|-help|-h)
        print_usage
        exit
        ;;
      ...
      namenode|secondarynamenode|datanode|dfs|dfsadmin|fsck|balancer|fetchdt|oiv|dfsgroups|portmap|nfs3)
        ...
        if [ -f "${HADOOP_HDFS_HOME}"/bin/hdfs ]; then
          exec "${HADOOP_HDFS_HOME}"/bin/hdfs ${COMMAND/dfsgroups/groups} "$@"
        elif [ -f "${HADOOP_PREFIX}"/bin/hdfs ]; then
          exec "${HADOOP_PREFIX}"/bin/hdfs ${COMMAND/dfsgroups/groups} "$@"

bin/hdfs

    elif [ "$COMMAND" = "dfs" ] ; then
    CLASS=org.apache.hadoop.fs.FsShell
    HADOOP_OPTS="$HADOOP_OPTS $HADOOP_CLIENT_OPTS"
elif [ "$COMMAND" = "dfsadmin" ] ; then
    CLASS=org.apache.hadoop.hdfs.tools.DFSAdmin
    HADOOP_OPTS="$HADOOP_OPTS $HADOOP_CLIENT_OPTS"
.
.
exec "$JAVA" -Dproc_$COMMAND $JAVA_HEAP_MAX $HADOOP_OPTS $CLASS "$@"

A sample Java class implementation is here: http://grepcode.com/file/repository.cloudera.com/content/repositories/releases/com.cloudera.hadoop/hadoop-core/0.20.2-320/org/apache/hadoop/fs/FsShell.java

There you can see how a command (cat) is implemented in Java. I believe this should give you a brief overview of how Hadoop commands work in the background.
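
If you dig into that class, the heart of cat boils down to opening an HDFS input stream and copying it to stdout. A stripped-down equivalent using the public FileSystem API (my sketch, not FsShell's exact code):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IOUtils;

    public class Cat {
        public static void main(String[] args) throws Exception {
            FileSystem fs = FileSystem.get(new Configuration());
            // open the HDFS file and stream it to stdout, which is
            // essentially what "hadoop fs -cat" prints to your console
            FSDataInputStream in = fs.open(new Path(args[0]));
            try {
                IOUtils.copyBytes(in, System.out, 4096, false);
            } finally {
                IOUtils.closeStream(in);
            }
        }
    }

So the whole chain is: the hadoop wrapper script decides which Java class handles the command (org.apache.hadoop.fs.FsShell for file system commands), execs the JVM with it, and FsShell performs the stream-copy above for cat.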