
I am running a simple sort program, but I get the errors below.

12/06/15 01:13:17 WARN mapred.JobClient: Error reading task outputServer returned HTTP response code: 403 for URL: http://192.168.1.106:50060/tasklog?plaintext=true&attemptid=attempt_201206150102_0002_m_000001_1&filter=stdout
12/06/15 01:13:18 WARN mapred.JobClient: Error reading task outputServer returned HTTP response code: 403 for URL: http://192.168.1.106:50060/tasklog?plaintext=true&attemptid=attempt_201206150102_0002_m_000001_1&filter=stderr
12/06/15 01:13:20 INFO mapred.JobClient:  map 50% reduce 0%
12/06/15 01:13:23 INFO mapred.JobClient:  map 100% reduce 0%
12/06/15 01:14:19 INFO mapred.JobClient: Task Id : attempt_201206150102_0002_m_000000_2, Status : FAILED
Too many fetch-failures
12/06/15 01:14:20 WARN mapred.JobClient: Error reading task outputServer returned HTTP response code: 403 for URL: http://192.168.1.106:50060/tasklog?plaintext=true&attemptid=attempt_201206150102_0002_m_000000_2&filter=stdout

Does anyone know what the reason is and how to resolve it?

------- Update: more log information -------

2012-06-15 19:56:07,039 WARN org.apache.hadoop.util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2012-06-15 19:56:07,258 WARN org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Source name ugi already exists!
2012-06-15 19:56:07,339 INFO org.apache.hadoop.mapred.Task: Using ResourceCalculatorPlugin : null
2012-06-15 19:56:07,346 INFO org.apache.hadoop.mapred.ReduceTask: ShuffleRamManager: MemoryLimit=144965632, MaxSingleShuffleLimit=36241408
2012-06-15 19:56:07,351 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201206151954_0001_r_000000_0 Thread started: Thread for merging on-disk files
2012-06-15 19:56:07,351 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201206151954_0001_r_000000_0 Thread started: Thread for merging in memory files
2012-06-15 19:56:07,351 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201206151954_0001_r_000000_0 Thread waiting: Thread for merging on-disk files
2012-06-15 19:56:07,352 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201206151954_0001_r_000000_0 Need another 2 map output(s) where 0 is already in progress
2012-06-15 19:56:07,352 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201206151954_0001_r_000000_0 Thread started: Thread for polling Map Completion Events
2012-06-15 19:56:07,352 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201206151954_0001_r_000000_0 Scheduled 0 outputs (0 slow hosts and 0 dup hosts)
2012-06-15 19:56:12,353 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201206151954_0001_r_000000_0 Scheduled 1 outputs (0 slow hosts and 0 dup hosts)
2012-06-15 19:56:32,076 WARN org.apache.hadoop.mapred.ReduceTask: attempt_201206151954_0001_r_000000_0 copy failed: attempt_201206151954_0001_m_000000_0 from 192.168.1.106
2012-06-15 19:56:32,077 WARN org.apache.hadoop.mapred.ReduceTask: java.io.IOException: Server returned HTTP response code: 403 for URL: http://192.168.1.106:50060/mapOutput?job=job_201206151954_0001&map=attempt_201206151954_0001_m_000000_0&reduce=0
    at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1436)
    at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.getInputStream(ReduceTask.java:1639)
    at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.setupSecureConnection(ReduceTask.java:1575)
    at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.getMapOutput(ReduceTask.java:1483)
    at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.copyOutput(ReduceTask.java:1394)
    at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.run(ReduceTask.java:1326)

2012-06-15 19:56:32,077 INFO org.apache.hadoop.mapred.ReduceTask: Task attempt_201206151954_0001_r_000000_0: Failed fetch #1 from attempt_201206151954_0001_m_000000_0
2012-06-15 19:56:32,077 INFO org.apache.hadoop.mapred.ReduceTask: Failed to fetch map-output from attempt_201206151954_0001_m_000000_0 even after MAX_FETCH_RETRIES_PER_MAP retries... or it is a read error, reporting to the JobTracker
2012-06-15 19:56:32,077 WARN org.apache.hadoop.mapred.ReduceTask: attempt_201206151954_0001_r_000000_0 adding host 192.168.1.106 to penalty box, next contact in 12 seconds
2012-06-15 19:56:32,077 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201206151954_0001_r_000000_0: Got 1 map-outputs from previous failures
2012-06-15 19:56:47,080 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201206151954_0001_r_000000_0 Scheduled 1 outputs (0 slow hosts and 0 dup hosts)
2012-06-15 19:56:56,048 WARN org.apache.hadoop.mapred.ReduceTask: attempt_201206151954_0001_r_000000_0 copy failed: attempt_201206151954_0001_m_000000_0 from 192.168.1.106
2012-06-15 19:56:56,049 WARN org.apache.hadoop.mapred.ReduceTask: java.io.IOException: Server returned HTTP response code: 403 for URL: http://192.168.1.106:50060/mapOutput?job=job_201206151954_0001&map=attempt_201206151954_0001_m_000000_0&reduce=0
    at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1436)
    at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.getInputStream(ReduceTask.java:1639)
    at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.setupSecureConnection(ReduceTask.java:1575)
    at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.getMapOutput(ReduceTask.java:1483)
    at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.copyOutput(ReduceTask.java:1394)
    at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.run(ReduceTask.java:1326)

2012-06-15 19:56:56,049 INFO org.apache.hadoop.mapred.ReduceTask: Task attempt_201206151954_0001_r_000000_0: Failed fetch #2 from attempt_201206151954_0001_m_000000_0
2012-06-15 19:56:56,049 INFO org.apache.hadoop.mapred.ReduceTask: Failed to fetch map-output from attempt_201206151954_0001_m_000000_0 even after MAX_FETCH_RETRIES_PER_MAP retries... or it is a read error, reporting to the JobTracker
2012-06-15 19:56:56,049 WARN org.apache.hadoop.mapred.ReduceTask: attempt_201206151954_0001_r_000000_0 adding host 192.168.1.106 to penalty box, next contact in 16 seconds
2012-06-15 19:56:56,049 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201206151954_0001_r_000000_0: Got 1 map-outputs from previous failures
2012-06-15 19:57:11,053 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201206151954_0001_r_000000_0 Need another 2 map output(s) where 0 is already in progress
2012-06-15 19:57:11,053 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201206151954_0001_r_000000_0 Scheduled 0 outputs (1 slow hosts and 0 dup hosts)
2012-06-15 19:57:11,053 INFO org.apache.hadoop.mapred.ReduceTask: Penalized(slow) Hosts:
2012-06-15 19:57:11,053 INFO org.apache.hadoop.mapred.ReduceTask: 192.168.1.106 Will be considered after: 1 seconds.
2012-06-15 19:57:16,055 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201206151954_0001_r_000000_0 Scheduled 1 outputs (0 slow hosts and 0 dup hosts)
2012-06-15 19:57:25,984 WARN org.apache.hadoop.mapred.ReduceTask: attempt_201206151954_0001_r_000000_0 copy failed: attempt_201206151954_0001_m_000000_0 from 192.168.1.106
2012-06-15 19:57:25,984 WARN org.apache.hadoop.mapred.ReduceTask: java.io.IOException: Server returned HTTP response code: 403 for URL: http://192.168.1.106:50060/mapOutput?job=job_201206151954_0001&map=attempt_201206151954_0001_m_000000_0&reduce=0
    at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1436)
    at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.getInputStream(ReduceTask.java:1639)
    at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.setupSecureConnection(ReduceTask.java:1575)
    at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.getMapOutput(ReduceTask.java:1483)
    at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.copyOutput(ReduceTask.java:1394)
    at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.run(ReduceTask.java:1326)

Best Regards,

Can you share some logs from the JobTracker? - Ashish
@Ashish I have added some logs; is that what you meant? - Yongwei Xing
It seems like you have a permission problem in the shuffle phase. What version of Hadoop are you using? I guess you are setting up/configuring a new cluster, right? mapred.local.dir is where the map output is stored locally. Check the permissions on this directory. - root1982
@root1982 I use Hadoop 1.0.3 on Mac OS X 10.7.4, in pseudo-distributed mode on a single node. Where do I set mapred.local.dir? Do you mean hadoop.tmp.dir? It worked fine a few days ago; this just started happening suddenly. - Yongwei Xing
Did you actually resolve this? I am facing a similar issue. - daydreamer
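For reference, in Hadoop 1.x mapred.local.dir can be set in conf/mapred-site.xml; if left unset it defaults to ${hadoop.tmp.dir}/mapred/local (and hadoop.tmp.dir itself defaults to /tmp/hadoop-${user.name}). A minimal sketch, where the value shown is just the default path spelled out explicitly:

```xml
<property>
  <name>mapred.local.dir</name>
  <value>${hadoop.tmp.dir}/mapred/local</value>
</property>
```

Whatever directory this resolves to must be readable and writable by the user running the TaskTracker, or the shuffle-phase fetches can fail as above.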

1 Answer


I had the same issue. After digging in, I pinpointed the problem to host name resolution. Check the log of the particular attempt in

$HADOOP_HOME/logs/userlogs/JobXXX/attemptXXX/syslog

and if it has something like

WARN org.apache.hadoop.mapred.ReduceTask: java.net.UnknownHostException: slave-1.local.lan

then just add the appropriate entry to /etc/hosts. After I did this, the error was resolved and the next attempt completed fine.
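For example, if the unresolvable host were slave-1.local.lan at 192.168.1.106 (the hostname from the warning above and the IP address from the question's logs; substitute your own values), the /etc/hosts entry would look like:

```
192.168.1.106   slave-1.local.lan
```

You can then confirm the name resolves, e.g. with ping slave-1.local.lan, before re-running the job.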