I am absolutely new to Apache Hadoop and I am following a video course on Udemy.
The course is based on Hadoop 1.2.1. Is this version too old? Would it be better to start studying with a course based on a more recent version, or is it OK?
I have installed Hadoop 1.2.1 on an Ubuntu 12.04 system and configured it in pseudo-distributed mode.
Following the tutorial, I used these settings in the following configuration files:
1) conf/core-site.xml:
<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:9000</value>
</property>
</configuration>
2) conf/hdfs-site.xml:
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
</configuration>
3) conf/mapred-site.xml:
<configuration>
<property>
<name>mapred.job.tracker</name>
<value>localhost:9001</value>
</property>
</configuration>
Then in the Linux shell I run:
ssh localhost
So I am connected through SSH to my local system.
Then I go into the Hadoop bin directory, /home/andrea/hadoop/hadoop-1.2.1/bin/, and run this command, which is supposed to format the NameNode (what exactly does that mean?):
bin/hadoop namenode –format
And this is the output I get:
andrea@andrea-virtual-machine:~/hadoop/hadoop-1.2.1/bin$ ./hadoop namenode –format
16/01/17 12:55:25 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG: host = andrea-virtual-machine/127.0.1.1
STARTUP_MSG: args = [–format]
STARTUP_MSG: version = 1.2.1
STARTUP_MSG: build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1.2 -r 1503152; compiled by 'mattf' on Mon Jul 22 15:23:09 PDT 2013
STARTUP_MSG: java = 1.7.0_79
************************************************************/
Usage: java NameNode [-format [-force ] [-nonInteractive]] | [-upgrade] | [-rollback] | [-finalize] | [-importCheckpoint] | [-recover [ -force ] ]
16/01/17 12:55:25 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at andrea-virtual-machine/127.0.1.1
************************************************************/
andrea@andrea-virtual-machine:~/hadoop/hadoop-1.2.1/bin$
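Since the NameNode printed only its usage message here, one thing that may be worth ruling out is that the option was typed or pasted with a character other than the plain ASCII hyphen. The following is a minimal shell sketch of such a check; the function name has_non_ascii is hypothetical and not part of Hadoop.

```shell
# Succeeds (exit status 0) if the argument contains any byte outside
# the printable ASCII range (space through tilde).
has_non_ascii() {
  # LC_ALL=C makes grep compare raw bytes instead of locale characters.
  printf '%s' "$1" | LC_ALL=C grep -q '[^ -~]'
}

# Example: paste the option exactly as it was typed.
# has_non_ascii "<pasted option>" && echo "contains non-ASCII characters"
```

If the check reports non-ASCII characters, retyping the option by hand in the terminal would be the next thing to try.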
Then I try to start all the nodes with this command:
./start-all.sh
and now I get:
andrea@andrea-virtual-machine:~/hadoop/hadoop-1.2.1/bin$ ./start-all.sh
starting namenode, logging to /home/andrea/hadoop/hadoop-1.2.1/libexec/../logs/hadoop-andrea-namenode-andrea-virtual-machine.out
localhost: starting datanode, logging to /home/andrea/hadoop/hadoop-1.2.1/libexec/../logs/hadoop-andrea-datanode-andrea-virtual-machine.out
localhost: starting secondarynamenode, logging to /home/andrea/hadoop/hadoop-1.2.1/libexec/../logs/hadoop-andrea-secondarynamenode-andrea-virtual-machine.out
starting jobtracker, logging to /home/andrea/hadoop/hadoop-1.2.1/libexec/../logs/hadoop-andrea-jobtracker-andrea-virtual-machine.out
localhost: starting tasktracker, logging to /home/andrea/hadoop/hadoop-1.2.1/libexec/../logs/hadoop-andrea-tasktracker-andrea-virtual-machine.out
andrea@andrea-virtual-machine:~/hadoop/hadoop-1.2.1/bin$
Now I try to open the following URLs in the browser:
http://localhost:50070/
which can't be opened (page not found)
and:
http://localhost:50030/
which opens correctly and redirects to this JSP page:
http://localhost:50030/jobtracker.jsp
So, in the shell I run the jps command, which lists all the running Java processes for the user:
andrea@andrea-virtual-machine:~/hadoop/hadoop-1.2.1/bin$ jps
6247 Jps
5720 DataNode
5872 SecondaryNameNode
6116 TaskTracker
5965 JobTracker
andrea@andrea-virtual-machine:~/hadoop/hadoop-1.2.1/bin$
As you can see, it seems that the NameNode is not started.
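To make this check repeatable, a small shell sketch like the following could list which of the five daemons started by start-all.sh are absent from the jps output. The function name missing_daemons is hypothetical; the daemon names are the ones that appear in the start-all.sh output above.

```shell
# Read jps output on stdin and print each expected Hadoop 1.x daemon
# that does not appear in it.
missing_daemons() {
  jps_output=$(cat)
  for daemon in NameNode DataNode SecondaryNameNode JobTracker TaskTracker; do
    # Compare against the whole second field, so that "NameNode"
    # is not satisfied by a "SecondaryNameNode" line.
    if ! printf '%s\n' "$jps_output" |
         awk -v d="$daemon" '$2 == d { found = 1 } END { exit !found }'; then
      echo "$daemon"
    fi
  done
}

# Example: jps | missing_daemons
```

With the jps output shown above, this should report only NameNode.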
The tutorial that I am following says that:
If NameNode or DataNode is not listed than it might happen that the namenode's or datanode's root directory which is set by the property 'dfs.name.dir' is getting messed up. It by default points to the /tmp directory which operating system changes from time to time. Thus, HDFS when comes up after some changes by OS, gets confused and namenode doesn't start.
To solve this problem, the tutorial provides the following solution (which doesn't work for me).
First, stop all nodes with the stop-all.sh script.
Then explicitly set the 'dfs.name.dir' and 'dfs.data.dir' properties.
So I created a dfs directory in the Hadoop path and, inside it, two directories at the same level: data and name (the idea is to have two folders, one used by the DataNode daemon and one by the NameNode daemon).
So I have something like this:
andrea@andrea-virtual-machine:~/hadoop/hadoop-1.2.1/dfs$ tree
.
├── data
└── name
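The layout above could be created with something like the following shell sketch. The function name make_dfs_dirs is hypothetical and not from the tutorial. The chmod 755 step is an assumption on my part: as far as I know, Hadoop 1.x checks the DataNode data directory against dfs.datanode.data.dir.perm, which defaults to 755, and a directory created under a more permissive umask may not match it.

```shell
# Create the name and data directories under <base>/dfs.
make_dfs_dirs() {
  base="$1"
  mkdir -p "$base/dfs/name" "$base/dfs/data"
  # Assumption: match the Hadoop 1.x default data directory permission.
  chmod 755 "$base/dfs/data"
}

# Example: make_dfs_dirs "$HOME/hadoop/hadoop-1.2.1"
```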
Then I use this configuration for hdfs-site.xml, where I explicitly set the two directories above:
<configuration>
<property>
<name>dfs.data.dir</name>
<value>/home/andrea/hadoop/hadoop-1.2.1/dfs/data/</value>
</property>
<property>
<name>dfs.name.dir</name>
<value>/home/andrea/hadoop/hadoop-1.2.1/dfs/name/</value>
</property>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
</configuration>
After this change, I ran the command to format the NameNode again:
hadoop namenode –format
And I get this output:
andrea@andrea-virtual-machine:~/hadoop/hadoop-1.2.1/dfs$ hadoop namenode –format
16/01/17 13:14:53 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG: host = andrea-virtual-machine/127.0.1.1
STARTUP_MSG: args = [–format]
STARTUP_MSG: version = 1.2.1
STARTUP_MSG: build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1.2 -r 1503152; compiled by 'mattf' on Mon Jul 22 15:23:09 PDT 2013
STARTUP_MSG: java = 1.7.0_79
************************************************************/
Usage: java NameNode [-format [-force ] [-nonInteractive]] | [-upgrade] | [-rollback] | [-finalize] | [-importCheckpoint] | [-recover [ -force ] ]
16/01/17 13:14:53 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at andrea-virtual-machine/127.0.1.1
************************************************************/
andrea@andrea-virtual-machine:~/hadoop/hadoop-1.2.1/dfs$
So I start all the nodes again with start-all.sh, and this is the output:
andrea@andrea-virtual-machine:~/hadoop/hadoop-1.2.1/bin$ start-all.sh
starting namenode, logging to /home/andrea/hadoop/hadoop-1.2.1/libexec/../logs/hadoop-andrea-namenode-andrea-virtual-machine.out
localhost: starting datanode, logging to /home/andrea/hadoop/hadoop-1.2.1/libexec/../logs/hadoop-andrea-datanode-andrea-virtual-machine.out
localhost: starting secondarynamenode, logging to /home/andrea/hadoop/hadoop-1.2.1/libexec/../logs/hadoop-andrea-secondarynamenode-andrea-virtual-machine.out
starting jobtracker, logging to /home/andrea/hadoop/hadoop-1.2.1/libexec/../logs/hadoop-andrea-jobtracker-andrea-virtual-machine.out
localhost: starting tasktracker, logging to /home/andrea/hadoop/hadoop-1.2.1/libexec/../logs/hadoop-andrea-tasktracker-andrea-virtual-machine.out
andrea@andrea-virtual-machine:~/hadoop/hadoop-1.2.1/bin$
Then I run the jps command to check whether all the nodes started correctly, but this is what I get:
andrea@andrea-virtual-machine:~/hadoop/hadoop-1.2.1/bin$ jps
8041 SecondaryNameNode
8310 TaskTracker
8406 Jps
8139 JobTracker
andrea@andrea-virtual-machine:~/hadoop/hadoop-1.2.1/bin$
The situation got worse, because now two nodes are not started: the NameNode and the DataNode.
What am I missing? How can I solve this issue and start all my nodes?
Thanks