0
votes

I am new to Hadoop. My laptop has 32 GB of RAM and a 4-core Core i5 processor. I have created a multi-node (3 DataNode) Apache Hadoop 2.7.4 cluster on it using virtual machines, assigning 8 GB of RAM and 2 CPU cores to each DataNode and the ResourceManager virtual machine. When I run the MapReduce example jobs from the NameNode, the job fails almost every time because map or reduce tasks fail.

I didn't see any specific error in the logs, but I noticed that all map and reduce task containers are requested on the same data node; only after a couple of failed attempts does the ApplicationMaster pick another node for the available containers.

Is there any way to assign containers to data nodes in a round-robin fashion?

Any help would be appreciated.
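
The per-node container and resource usage can be checked from the command line while the job is running (the node ID in the second command is only an illustration, built from the DN1 hostname and the 8050 NodeManager port used below; substitute an ID from the first command's output):

# List all NodeManagers with their running containers and memory/vcore usage
yarn node -list -all

# Show detailed status for a single node (hostname:port as reported by the list)
yarn node -status DN1:8050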

Output:

hduser@NameNode:/opt/hadoop/etc/hadoop$ hadoop jar /opt/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.4.jar pi 2 4
Number of Maps  = 2
Samples per Map = 4
Wrote input for Map #0
Wrote input for Map #1
Starting Job
.....
17/11/02 12:53:33 INFO mapreduce.Job: Running job: job_1509607315241_0001
17/11/02 12:53:40 INFO mapreduce.Job: Job job_1509607315241_0001 running in uber mode : false
17/11/02 12:53:40 INFO mapreduce.Job:  map 0% reduce 0%
17/11/02 12:53:55 INFO mapreduce.Job: Task Id : attempt_1509607315241_0001_m_000001_0, Status : FAILED
17/11/02 12:53:55 INFO mapreduce.Job: Task Id : attempt_1509607315241_0001_m_000000_0, Status : FAILED
17/11/02 12:54:01 INFO mapreduce.Job:  map 50% reduce 0%
17/11/02 12:54:09 INFO mapreduce.Job: Task Id : attempt_1509607315241_0001_m_000001_1, Status : FAILED
17/11/02 12:54:14 INFO mapreduce.Job: Task Id : attempt_1509607315241_0001_r_000000_0, Status : FAILED
17/11/02 12:54:24 INFO mapreduce.Job: Task Id : attempt_1509607315241_0001_m_000001_2, Status : FAILED
17/11/02 12:54:30 INFO mapreduce.Job: Task Id : attempt_1509607315241_0001_r_000000_1, Status : FAILED
17/11/02 12:54:40 INFO mapreduce.Job:  map 100% reduce 100%
17/11/02 12:54:44 INFO mapreduce.Job: Job job_1509607315241_0001 failed with state FAILED due to: Task failed task_1509607315241_0001_m_000001
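
(If log aggregation is enabled via yarn.log-aggregation-enable, which is not shown in the configuration below, the container logs for the failed attempts can be pulled with the application ID from the output above; otherwise they stay under each NodeManager's local log directory.)

yarn logs -applicationId application_1509607315241_0001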

yarn-site.xml

<configuration>

<!-- Site specific YARN configuration properties -->
  <property>
        <name>yarn.resourcemanager.hostname</name>
        <value>192.168.10.109</value>
        <description> The hostname of the machine the resource manager runs on. </description>
  </property>
  <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
        <description>A list of auxiliary services run by the node manager. A service is implemented by the class defined by the property yarn.nodemanager.aux-services.service-name.class. By default, no auxiliary services are specified.</description>
  </property>
  <property>
        <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
        <value>org.apache.hadoop.mapred.ShuffleHandler</value>
        <description>The class that implements the mapreduce_shuffle auxiliary service.</description>
  </property>
  <property>
        <name>yarn.nodemanager.resource.memory-mb</name>
        <value>7096</value>
        <description>The amount of physical memory (in MB) that may be allocated to containers being run by the node manager         </description>
  </property>
  <property>
        <name>yarn.scheduler.maximum-allocation-mb</name>
        <value>6196</value>
        <description>The RM can only allocate memory to containers in increments of "yarn.scheduler.minimum-allocation-mb", not exceeding "yarn.scheduler.maximum-allocation-mb"; this value should not be more than the total memory allocated to the node.</description>
  </property>
  <property>
        <name>yarn.nodemanager.delete.debug-delay-sec</name>
        <value>6000</value>
  </property>
  <property>
        <name>yarn.scheduler.minimum-allocation-mb</name>
        <value>1024</value>
        <description>The RM can only allocate memory to containers in increments of "yarn.scheduler.minimum-allocation-mb", not exceeding "yarn.scheduler.maximum-allocation-mb"; this value should not be more than the total memory allocated to the node.</description>
  </property>

  <property>
        <name>yarn.app.mapreduce.am.resource.mb</name>
        <value>2048</value>
  </property>
  <property>
        <name>yarn.app.mapreduce.am.command-opts</name>
        <value>-Xmx2048m</value>
  </property>
  <property>
        <name>yarn.nodemanager.vmem-check-enabled</name>
        <value>false</value>
  </property>
  <property>
        <name>yarn.nodemanager.resource.cpu-vcores</name>
        <value>2</value>
        <description>The number of CPU cores that may be allocated to containers being run by the node manager.</description>
  </property>
  <property>
        <name>yarn.resourcemanager.bind-host</name>
        <value>192.168.10.109</value>
        <description> The address the resource manager’s RPC and HTTP servers will bind to.</description>
  </property>
  <property>
        <name>yarn.resourcemanager.address</name>
        <value>192.168.10.109:8032</value>
        <description>The hostname and port that the resource manager’s RPC server runs on.</description>
  </property>
  <property>
        <name>yarn.resourcemanager.admin.address</name>
        <value>192.168.10.109:8033</value>
        <description>The resource manager’s admin RPC server address and port. This is used by the admin client (invoked with yarn rmadmin, typically run outside the cluster) to communicate with the resource manager. </description>
  </property>
  <property>
        <name>yarn.resourcemanager.scheduler.address</name>
        <value>192.168.10.109:8030</value>
        <description>The resource manager scheduler’s RPC server address and port. This is used by (in-cluster) application masters to communicate with the resource manager.</description>
  </property>
  <property>
        <name>yarn.resourcemanager.resource-tracker.address</name>
        <value>192.168.10.109:8031</value>
        <description>The resource manager resource tracker’s RPC server address and port. This is used by (in-cluster) node managers to communicate with the resource manager.</description>
  </property>
  <property>
        <name>yarn.nodemanager.hostname</name>
        <value>0.0.0.0</value>
        <description>The hostname of the machine the node manager runs on. </description>
  </property>
  <property>
        <name>yarn.nodemanager.bind-host</name>
        <value>0.0.0.0</value>
        <description>The address the node manager’s RPC and HTTP servers will bind to. </description>
  </property>
  <property>
        <name>yarn.nodemanager.local-dirs</name>
        <value>/opt/hadoop/hdfs/yarn</value>
        <description>A list of directories where nodemanagers allow containers to store intermediate data. The data is cleared out when the application ends.</description>
  </property>
  <property>
        <name>yarn.nodemanager.address</name>
        <value>0.0.0.0:8050</value>
        <description>The node manager’s RPC server address and port. This is used by (in-cluster) application masters to communicate with node managers.</description>
  </property>
  <property>
        <name>yarn.nodemanager.localizer.address</name>
        <value>0.0.0.0:8040</value>
        <description>The node manager localizer’s RPC server address and port. </description>
  </property>
  <property>
        <name>yarn.resourcemanager.webapp.address</name>
        <value>192.168.10.109:8088</value>
        <description> The resource manager’s HTTP server address and port.</description>
  </property>
  <property>
        <name>yarn.nodemanager.webapp.address</name>
        <value>0.0.0.0:8042</value>
        <description>The node manager’s HTTP server address and port. </description>
  </property>
  <property>
        <name>yarn.web-proxy.address</name>
        <value>192.168.10.109:9046</value>
        <description>The web app proxy server’s HTTP server address and port. If not set (the default), the web app proxy runs inside the resource manager process. The MapReduce ApplicationMaster REST APIs are accessed through this Web Application Proxy, an optional service in YARN: an administrator can configure it to run on a particular host (stand-alone mode), or leave it to run as part of the ResourceManager, in which case REST calls go through the ResourceManager web address on port 8088.</description>
  </property>

</configuration>

mapred-site.xml

<configuration>
        <property>
                <name>mapreduce.framework.name</name>
                <value>yarn</value>
                <description>Default framework to run.</description>
        </property>
      <!--  <property>
                <name>mapreduce.jobtracker.address</name>
                <value>localhost:54311</value>
                <description>MapReduce job tracker runs at this host and port.</description>
        </property> -->
        <property>
                <name>mapreduce.jobhistory.webapp.address</name>
                <value>192.168.10.109:19888</value>
                <description>The MapReduce job history server’s address and port.</description>
        </property>
        <property>
                <name>mapreduce.shuffle.port</name>
                <value>13562</value>
                <description>The shuffle handler’s HTTP port number. This is used for serving map outputs, and is not a user-accessible web UI.</description>
        </property>
        <property>
                <name>mapreduce.jobhistory.address</name>
                <value>192.168.10.109:10020</value>
                <description>The job history server’s RPC server address and port. This is used by the client (typically outside the cluster) to query job history.</description>
        </property>
        <property>
                <name>mapreduce.jobhistory.bind-host</name>
                <value>192.168.10.109</value>
                <description>The address the job history server binds to. Setting this and the other bind addresses to 0.0.0.0 would cause the MapReduce daemons to listen on all addresses and interfaces of their hosts.</description>
        </property>
        <property>
                <name>mapreduce.job.userhistorylocation</name>
                <value>/opt/hadoop/hdfs/mrjobhistory</value>
                <description>A user can specify a location to store the history files of a particular job. If nothing is specified, the logs are stored in the job's output directory, under "_logs/history/". Logging can be disabled by giving the value "none".</description>
        </property>
        <property>
                <name>mapreduce.jobhistory.intermediate-done-dir</name>
                <value>/opt/hadoop/hdfs/mrjobhistory/tmp</value>
                <description>Directory where history files are written by MapReduce jobs.</description>
        </property>
        <property>
                <name>mapreduce.jobhistory.done-dir</name>
                <value>/opt/hadoop/hdfs/mrjobhistory/done</value>
                <description>Directory where history files are managed by the MR JobHistory Server.</description>
        </property>

        <property>
                <name>mapreduce.map.memory.mb</name>
                <value>2048</value>
        </property>

        <property>
                <name>mapreduce.reduce.memory.mb</name>
                <value>3072</value>
        </property>

        <property>
                <name>mapreduce.map.cpu.vcores</name>
                <value>1</value>
                <description> The number of virtual cores to request from the scheduler for each map task.</description>
        </property>

        <property>
                <name>mapreduce.reduce.cpu.vcores</name>
                <value>1</value>
                <description> The number of virtual cores to request from the scheduler for each reduce task.</description>
        </property>

        <property>
                <name>mapreduce.task.timeout</name>
                <value>1800000</value>
        </property>

        <property>
                <name>mapreduce.map.java.opts</name>
                <value>-Xmx1555m</value>
        </property>

        <property>
                <name>mapreduce.reduce.java.opts</name>
                <value>-Xmx2048m</value>
        </property>

        <property>
                <name>mapreduce.job.running.map.limit</name>
                <value>2</value>
                <description> The maximum number of simultaneous map tasks per job. There is no limit if this value is 0 or negative.</description>
        </property>

        <property>
                <name>mapreduce.job.running.reduce.limit</name>
                <value>1</value>
                <description> The maximum number of simultaneous reduce tasks per job. There is no limit if this value is 0 or negative.</description>
        </property>

        <property>
                <name>mapreduce.reduce.shuffle.connect.timeout</name>
                <value>1800000</value>
                <description>Expert: The maximum amount of time (in milli seconds) reduce task spends in trying to connect to a tasktracker for getting map output.</description>
        </property>

        <property>
                <name>mapreduce.reduce.shuffle.read.timeout</name>
                <value>1800000</value>
                <description>Expert: The maximum amount of time (in milli seconds) reduce task waits for map output data to be available for reading after obtaining connection.</description>
        </property>
<!--
        <property>
                <name>mapreduce.job.reducer.preempt.delay.sec</name>
                <value>300</value>
                <description> The threshold (in seconds) after which an unsatisfied mapper request triggers reducer preemption when there is no anticipated headroom. If set to 0 or a negative value, the reducer is preempted as soon as lack of headroom is detected. Default is 0.</description>
        </property>

        <property>
                <name>mapreduce.job.reducer.unconditional-preempt.delay.sec</name>
                <value>400</value>
                <description> The threshold (in seconds) after which an unsatisfied mapper request triggers a forced reducer preemption irrespective of the anticipated headroom. By default, it is set to 5 mins. Setting it to 0 leads to immediate reducer preemption. Setting to -1 disables this preemption altogether.</description>
        </property>
-->
</configuration>
Can you please go to the DataNode logs present on the data nodes and provide the error info from there? – KrazyGautam
@KrazyGautam I didn't see any error there. I can't paste the logs here due to the length limit. – Hitesh Mundra

2 Answers

0
votes

The problem was with the /etc/hosts file on the data nodes. You have to comment out the line that points the hostname at its loopback address. I traced this error with the following line in the logs:

INFO [main] org.apache.hadoop.mapreduce.v2.app.client.MRClientService: Instantiated MRClientService at DN1/127.0.1.1:54483

Before

127.0.1.1 DN1
192.168.10.104 dn1

After

# 127.0.1.1 DN1
192.168.10.104 DN1 
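
To verify the fix, the hostname on each data node should now resolve to the LAN address instead of the loopback one, and the NodeManager needs a restart so it re-registers with that address (paths assume the same /opt/hadoop install as in the question):

# Should print 192.168.10.104, not 127.0.1.1
getent hosts DN1
hostname -f

# Restart the NodeManager so it re-registers with the ResourceManager
/opt/hadoop/sbin/yarn-daemon.sh stop nodemanager
/opt/hadoop/sbin/yarn-daemon.sh start nodemanager
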
0
votes

I suggest adding the following properties to mapred-site.xml:

<property>
    <name>mapreduce.map.maxattempts</name>
    <value>20</value>
</property>

<property>
    <name>mapreduce.reduce.maxattempts</name>
    <value>20</value>
</property>
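
The default for both properties is 4 attempts. As a quick test, the same settings can also be passed per job with generic -D options (this should work for the pi example because it runs through ToolRunner; the jar path matches the one used in the question):

hadoop jar /opt/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.4.jar pi \
    -D mapreduce.map.maxattempts=20 \
    -D mapreduce.reduce.maxattempts=20 \
    2 4

Keep in mind that raising maxattempts only gives failing tasks more retries; it does not address the root cause of the failures.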