0
votes

I am absolutely new to Apache Hadoop and I am following a video course on Udemy.

The course is based on Hadoop 1.2.1: is that too old a version? Would it be better to start my study with another course based on a more recent version, or is it OK?

So I have installed Hadoop 1.2.1 on an Ubuntu 12.04 system and configured it in pseudo-distributed mode.

Following the tutorial, I did this using the settings below in the following configuration files:

1) conf/core-site.xml:

<configuration>
    <property>
        <name>fs.default.name</name>
        <value>hdfs://localhost:9000</value>
    </property>
</configuration>

2) conf/hdfs-site.xml:

<configuration>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
</configuration>

3) conf/mapred-site.xml:

<configuration>
    <property>
        <name>mapred.job.tracker</name>
        <value>localhost:9001</value>
    </property>
</configuration>

Then in the Linux shell I run:

ssh localhost

So I am connected through SSH to my local system.

Then I go into the Hadoop bin directory, /home/andrea/hadoop/hadoop-1.2.1/bin/, and run this command, which is supposed to format the name node (what exactly does that mean?):

./hadoop namenode –format

And this is the output I obtain:

andrea@andrea-virtual-machine:~/hadoop/hadoop-1.2.1/bin$ ./hadoop namenode –format
16/01/17 12:55:25 INFO namenode.NameNode: STARTUP_MSG: 
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG:   host = andrea-virtual-machine/127.0.1.1
STARTUP_MSG:   args = [–format]
STARTUP_MSG:   version = 1.2.1
STARTUP_MSG:   build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1.2 -r 1503152; compiled by 'mattf' on Mon Jul 22 15:23:09 PDT 2013
STARTUP_MSG:   java = 1.7.0_79
************************************************************/
Usage: java NameNode [-format [-force ] [-nonInteractive]] | [-upgrade] | [-rollback] | [-finalize] | [-importCheckpoint] | [-recover [ -force ] ]
16/01/17 12:55:25 INFO namenode.NameNode: SHUTDOWN_MSG: 
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at andrea-virtual-machine/127.0.1.1
************************************************************/
andrea@andrea-virtual-machine:~/hadoop/hadoop-1.2.1/bin$ 
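Looking at the log above, STARTUP_MSG shows args = [–format] and Hadoop then prints its usage line instead of formatting, which suggests it did not recognise the flag. One possible cause (my assumption, not something the log alone proves): the dash in the command is an en-dash (U+2013, easily picked up by copy-pasting from slides or a PDF) rather than the ASCII hyphen the command expects. A quick shell check of the two characters:

```shell
# Compare the dash as it appears in the command above with a plain
# ASCII hyphen. If they differ, Hadoop never saw "-format" at all.
endash='–'   # the character copied from the command above
hyphen='-'   # the ASCII hyphen Hadoop expects
printf '%s' "$endash" | od -An -tx1   # en-dash: three UTF-8 bytes (e2 80 93)
printf '%s' "$hyphen" | od -An -tx1   # hyphen: one byte (2d)
[ "$endash" != "$hyphen" ] && echo "not the same character"
```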

Then I try to start all the nodes by running this command:

./start-all.sh

and now I obtain:

andrea@andrea-virtual-machine:~/hadoop/hadoop-1.2.1/bin$ ./start-all.sh 
starting namenode, logging to /home/andrea/hadoop/hadoop-1.2.1/libexec/../logs/hadoop-andrea-namenode-andrea-virtual-machine.out
localhost: starting datanode, logging to /home/andrea/hadoop/hadoop-1.2.1/libexec/../logs/hadoop-andrea-datanode-andrea-virtual-machine.out
localhost: starting secondarynamenode, logging to /home/andrea/hadoop/hadoop-1.2.1/libexec/../logs/hadoop-andrea-secondarynamenode-andrea-virtual-machine.out
starting jobtracker, logging to /home/andrea/hadoop/hadoop-1.2.1/libexec/../logs/hadoop-andrea-jobtracker-andrea-virtual-machine.out
localhost: starting tasktracker, logging to /home/andrea/hadoop/hadoop-1.2.1/libexec/../logs/hadoop-andrea-tasktracker-andrea-virtual-machine.out
andrea@andrea-virtual-machine:~/hadoop/hadoop-1.2.1/bin$ 

Now I try to open the following URLs in the browser:

http://localhost:50070/

but I can't open it (page not found),

and:

http://localhost:50030/

this one opens correctly and redirects to this JSP page:

http://localhost:50030/jobtracker.jsp

So, in the shell I run the jps command, which lists all the running Java processes for the current user:

andrea@andrea-virtual-machine:~/hadoop/hadoop-1.2.1/bin$ jps
6247 Jps
5720 DataNode
5872 SecondaryNameNode
6116 TaskTracker
5965 JobTracker
andrea@andrea-virtual-machine:~/hadoop/hadoop-1.2.1/bin$ 

As you can see, it seems that the NameNode has not started.

The tutorial that I am following says:

If NameNode or DataNode is not listed, it might be that the namenode's or datanode's root directory, which is set by the property 'dfs.name.dir', has been messed up. By default it points to the /tmp directory, which the operating system changes from time to time. Thus, when HDFS comes back up after some changes by the OS, it gets confused and the namenode doesn't start.
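For context on the /tmp default that quote mentions: as I understand the Hadoop 1.x defaults (this layout is my reading of the docs, not something from the tutorial), dfs.name.dir falls back to ${hadoop.tmp.dir}/dfs/name, and hadoop.tmp.dir defaults to /tmp/hadoop-${user.name}. A tiny sketch of what that default path looks like:

```shell
# Default namenode metadata location in Hadoop 1.x when dfs.name.dir
# is not set: ${hadoop.tmp.dir}/dfs/name, with hadoop.tmp.dir
# defaulting to /tmp/hadoop-${user.name}.
default_name_dir="/tmp/hadoop-$(whoami)/dfs/name"
echo "$default_name_dir"
```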

To solve this problem it provides the following solution (which doesn't work for me).

First, stop all the nodes with the stop-all.sh script.

Then I have to explicitly set 'dfs.name.dir' and 'dfs.data.dir'.

So I created a dfs directory inside the Hadoop path, and inside it I created two directories at the same level, data and name (the idea is that these two folders will be used by the datanode daemon and the namenode daemon respectively).

So I have something like this:

andrea@andrea-virtual-machine:~/hadoop/hadoop-1.2.1/dfs$ tree
.
├── data
└── name
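The layout above can be created in one step; a small sketch (the base path is the one from my machine, and DFS_BASE is just my own convenience variable, not anything Hadoop reads):

```shell
# Create the dfs/data and dfs/name directories in one go.
# DFS_BASE is my own placeholder, not a Hadoop setting.
base="${DFS_BASE:-$HOME/hadoop/hadoop-1.2.1/dfs}"
mkdir -p "$base/data" "$base/name"
ls "$base"
```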

Then I use this configuration for hdfs-site.xml, where I explicitly set the previous two directories:

<configuration>
    <property>
        <name>dfs.data.dir</name>
        <value>/home/andrea/hadoop/hadoop-1.2.1/dfs/data/</value>
    </property>

    <property>
        <name>dfs.name.dir</name>
        <value>/home/andrea/hadoop/hadoop-1.2.1/dfs/name/</value>
    </property>

    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
</configuration>

So, after this change, I run the command to format the NameNode again:

hadoop namenode –format

And I obtain this output:

andrea@andrea-virtual-machine:~/hadoop/hadoop-1.2.1/dfs$ hadoop namenode –format
16/01/17 13:14:53 INFO namenode.NameNode: STARTUP_MSG: 
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG:   host = andrea-virtual-machine/127.0.1.1
STARTUP_MSG:   args = [–format]
STARTUP_MSG:   version = 1.2.1
STARTUP_MSG:   build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1.2 -r 1503152; compiled by 'mattf' on Mon Jul 22 15:23:09 PDT 2013
STARTUP_MSG:   java = 1.7.0_79
************************************************************/
Usage: java NameNode [-format [-force ] [-nonInteractive]] | [-upgrade] | [-rollback] | [-finalize] | [-importCheckpoint] | [-recover [ -force ] ]
16/01/17 13:14:53 INFO namenode.NameNode: SHUTDOWN_MSG: 
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at andrea-virtual-machine/127.0.1.1
************************************************************/
andrea@andrea-virtual-machine:~/hadoop/hadoop-1.2.1/dfs$ 

So I start all the nodes again with start-all.sh, and this is the output:

andrea@andrea-virtual-machine:~/hadoop/hadoop-1.2.1/bin$ start-all.sh
starting namenode, logging to /home/andrea/hadoop/hadoop-1.2.1/libexec/../logs/hadoop-andrea-namenode-andrea-virtual-machine.out
localhost: starting datanode, logging to /home/andrea/hadoop/hadoop-1.2.1/libexec/../logs/hadoop-andrea-datanode-andrea-virtual-machine.out
localhost: starting secondarynamenode, logging to /home/andrea/hadoop/hadoop-1.2.1/libexec/../logs/hadoop-andrea-secondarynamenode-andrea-virtual-machine.out
starting jobtracker, logging to /home/andrea/hadoop/hadoop-1.2.1/libexec/../logs/hadoop-andrea-jobtracker-andrea-virtual-machine.out
localhost: starting tasktracker, logging to /home/andrea/hadoop/hadoop-1.2.1/libexec/../logs/hadoop-andrea-tasktracker-andrea-virtual-machine.out
andrea@andrea-virtual-machine:~/hadoop/hadoop-1.2.1/bin$ 

Then I run the jps command to check whether all the nodes started correctly, but this is what I get:

andrea@andrea-virtual-machine:~/hadoop/hadoop-1.2.1/bin$ jps
8041 SecondaryNameNode
8310 TaskTracker
8406 Jps
8139 JobTracker
andrea@andrea-virtual-machine:~/hadoop/hadoop-1.2.1/bin$ 

The situation has worsened, because now two nodes are not started: the NameNode and the DataNode.

What am I missing? How can I try to solve this issue and start all my nodes?

Thanks

2
Is the hadoop version command working? – Kishore
Add JAVA_HOME in the hadoop-env.sh file. – Kishore
Can you paste your hosts file here? (vi /etc/hosts) – Kishore
Can you post the namenode logs? – BruceWayne

2 Answers

0
votes

Would you try turning off iptables once, exporting the Java path, and reformatting?

0
votes

If you have configured this in hdfs-site.xml:

<property>
        <name>dfs.name.dir</name>
        <value>/home/andrea/hadoop/hadoop-1.2.1/dfs/name/</value>
 </property>

then while formatting the name node you should see a

> successfully formatted /home/andrea/hadoop/hadoop-1.2.1/dfs/name/

message if the name node format is successful. From your logs I am not able to see that success message, so there may be permission issues. If it still doesn't start, try using another command:

hadoop-daemon.sh start namenode

Hope it works...
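As a generic sketch of the permission check suggested above (NAME_DIR is a placeholder I made up for the dfs.name.dir value from hdfs-site.xml, not a Hadoop variable):

```shell
# Check that the user who starts Hadoop owns and can write the name
# directory; if not, fix it with chown/chmod before formatting again.
dir="${NAME_DIR:-/home/andrea/hadoop/hadoop-1.2.1/dfs/name/}"
if [ -d "$dir" ] && [ -w "$dir" ]; then
    echo "$dir is writable"
else
    echo "$dir is missing or not writable (try chown/chmod)"
fi
```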