
I am trying to install Hadoop 1.2.1 on a (test) cluster of 5 machines, with one node serving as JobTracker, NameNode, and Secondary NameNode. The remaining 4 machines are slaves.

There are two issues.

1) In the master's conf/masters and conf/slaves files, I provided the IP addresses of the master and slaves respectively. On the slaves, the masters file is empty and the slaves file contains its own IP.

When starting up Hadoop (bin/start-all.sh), the TaskTracker and DataNode don't start. I put the host names of these machines in the /etc/hosts file and tried putting their hostnames in the masters and slaves files as well. This doesn't make any difference -- the TaskTracker and DataNode still don't start.

While starting up the Hadoop services, I get a message that the TaskTracker and DataNode logs have been written. But strangely, I don't find them in that location. These are the messages I get:

10.6.80.4: starting datanode, logging to /home/ubuntu/hadoop-1.2.1/libexec/../logs/hadoop-ubuntu-datanode-dsparq-instance4.out

10.6.80.2: starting tasktracker, logging to /home/ubuntu/hadoop-1.2.1/libexec/../logs/hadoop-ubuntu-tasktracker-dsparq-instance2.out

2) In the JobTracker/NameNode log, the following exception is listed multiple times:

error: java.io.IOException: File <> could only be replicated to 0 nodes, instead of 1

The solutions to these problems (on StackOverflow) suggest reformatting HDFS and checking the entries in /etc/hosts. I tried both, but that didn't help.

Please let me know how to fix these errors. Thank you in advance.


Adding the contents of core-site.xml and mapred-site.xml (same on all the machines):

core-site.xml

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>

<property>
<name>fs.default.name</name>
<value>hdfs://10.6.80.21:8020</value>
</property>

<property>
<name>hadoop.tmp.dir</name>
<value>/hadoop/hdfs</value>
</property>

</configuration>

mapred-site.xml

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<configuration>

<property>
<name>mapred.job.tracker</name>
<value>hdfs://10.6.80.21:8021</value>
</property>

</configuration>


1 Answer


The startup messages say that logs are written to *.out files, but those don't contain much information. Look for the *.log files for the latest logs of the NameNode and the other daemons: in the /var/log/hadoop/ directories if you have an RPM-based installation, otherwise in the $HADOOP_HOME logs folder. Coming to the above issue, make sure that core-site.xml on all the nodes has the NameNode details specified. Similarly, check mapred-site.xml across the cluster; it should contain the address of the JobTracker.
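For example, on one of the slaves you could check which daemons actually came up and read the real log (the log file name here is assumed from the startup messages in the question):

```
# which Hadoop daemons are actually running on this node?
$ jps

# the .out file is usually near-empty; the failure reason is in the .log file
$ tail -n 50 /home/ubuntu/hadoop-1.2.1/logs/hadoop-ubuntu-datanode-*.log
```

If jps shows no DataNode/TaskTracker, the tail end of the .log file normally names the exact cause (bad hostname, permission denied, namespace ID mismatch, etc.).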

Also make sure that hostnames are consistent across the cluster, or just use IP addresses everywhere. hadoop.tmp.dir (the location we mention in core-site.xml) must be created on every node, and it should have appropriate file permissions so that your hdfs user can perform reads/writes.
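A minimal sketch of that step, to run on every node. The question's core-site.xml uses /hadoop/hdfs; a /tmp path stands in here so the sketch runs without root, and the chown to the daemon user (ubuntu in the question's log paths) is shown as a comment:

```shell
# Path must match hadoop.tmp.dir in core-site.xml.
# The question uses /hadoop/hdfs; /tmp/hadoop-tmp-demo stands in here
# so this can be tried unprivileged.
HDFS_TMP=${HDFS_TMP:-/tmp/hadoop-tmp-demo}

mkdir -p "$HDFS_TMP"      # create the directory on this node
chmod 755 "$HDFS_TMP"     # daemons must be able to read/write here

# On a real cluster, also give ownership to the user running the daemons, e.g.:
#   sudo chown -R ubuntu:ubuntu /hadoop/hdfs

ls -ld "$HDFS_TMP"
```

Repeat on each node (or push it out with ssh); if the directory is missing or unwritable on a slave, its DataNode dies at startup and the NameNode sees 0 live nodes.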

core-site.xml

<property>
  <name>hadoop.tmp.dir</name>
  <value>/location/for/temp/dir</value>
  <description>A base for other temporary directories.</description>
</property>

<property>
  <name>fs.default.name</name>
  <value>hdfs://namenodehostname:8020</value>
  <description>The name of the default file system.</description>
</property>

mapred-site.xml

<property>
  <name>mapred.job.tracker</name>
  <value>jobtrackerhostname:8021</value>
  <description>The host and port that the MapReduce job tracker
  runs at. Note: this is host:port only, with no hdfs:// scheme.</description>
</property>
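After fixing the configs and restarting the cluster, you can confirm that the NameNode actually sees live DataNodes, which is what the "could only be replicated to 0 nodes" error is complaining about:

```
$ hadoop dfsadmin -report
```

The report should list each slave under the datanodes section; if it shows 0 live datanodes, go back to the DataNode *.log files on the slaves.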

If all the above properties are set in your cluster and you are still facing the issue, please post your complete logs along with the config files.