Getting error in apche Pig when running over yarn “org.apache.hadoop.ipc.Client - Retrying connect to server: tasktracker3/10.201.2.169:50000”

Question

I am running Apache Pig 0.11.2 with Hadoop 2.2.0.

Most simple jobs that I run in Pig work perfectly fine.

However, whenever I try to use GROUP BY on a large dataset, or the LIMIT operator, I get these connection errors:

2013-12-18 11:21:28,400 [main] INFO org.apache.hadoop.ipc.Client - Retrying connect to server: tasktracker2/10.201.2.145:54957. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=3, sleepTime=1 SECONDS) 2013-12-18 11:21:29,402 [main] INFO org.apache.hadoop.ipc.Client - Retrying connect to server: tasktracker2/10.201.2.145:54957. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=3, sleepTime=1 SECONDS) 2013-12-18 11:21:30,403 [main] INFO org.apache.hadoop.ipc.Client - Retrying connect to server: tasktracker2/10.201.2.145:54957. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=3, sleepTime=1 SECONDS) 2013-12-18 11:21:30,507 [main] INFO org.apache.hadoop.mapred.ClientServiceDelegate - Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server 2013-12-18 11:21:31,703 [main] INFO org.apache.hadoop.ipc.Client - Retrying connect to server: tasktracker1/10.201.2.20:49528. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=3, sleepTime=1 SECONDS) 2013-12-18 11:21:32,704 [main] INFO org.apache.hadoop.ipc.Client - Retrying connect to server: tasktracker1/10.201.2.20:49528. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=3, sleepTime=1 SECONDS) 2013-12-18 11:21:33,705 [main] INFO org.apache.hadoop.ipc.Client - Retrying connect to server: tasktracker1/10.201.2.20:49528. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=3, sleepTime=1 SECONDS) 2013-12-18 11:21:33,809 [main] INFO org.apache.hadoop.mapred.ClientServiceDelegate - Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server 2013-12-18 11:21:34,890 [main] INFO org.apache.hadoop.ipc.Client - Retrying connect to server: tasktracker3/10.201.2.169:50000. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=3, sleepTime=1 SECONDS) 2013-12-18 11:21:35,891 [main] INFO org.apache.hadoop.ipc.Client - Retrying connect to server: tasktracker3/10.201.2.169:50000. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=3, sleepTime=1 SECONDS) 2013-12-18 11:21:36,893 [main] INFO org.apache.hadoop.ipc.Client - Retrying connect to server: tasktracker3/10.201.2.169:50000. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=3, sleepTime=1 SECONDS) 2013-12-18 11:21:36,996 [main] INFO org.apache.hadoop.mapred.ClientServiceDelegate - Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server 2013-12-18 11:21:37,152 [main] INFO org.apache.hadoop.mapred.ClientServiceDelegate - Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server

The strange thing is that after these errors keeping appearing for about minutes, they'll stop, and the correct output shows up at the bottom.

So Hadoop is running fine and computing the proper output. The problem is just these connection errors that keep popping up. and that causing increase in execution time of the script.

One thing that I have noticed is that whenever this error appears, the job had created and ran multiple JAR files during the job. However, after a few minutes of these message popping up, the correct output finally appears.

I have 5 nodes cluster 1 namenode and 4 datanode. All the daemons are running fine.

Any suggestions on how to get rid of these messages?

Daniel Kvasnicka Daniel Kvasnicka · Accepted Answer · 2014-06-03T19:37:54

Looks like your job history server is not running.

Turn on log aggregation (you may have already done that and you're only missing the server) - put this to your yarn-site.xml:
```
<property>
   <name>yarn.log-aggregation-enable</name>
   <value>true</value>
</property>
```

Run the job history server:

$HADOOP_INSTALL/sbin/mr-jobhistory-daemon.sh start historyserver

Try running the Pig script again

Getting error in apche Pig when running over yarn “org.apache.hadoop.ipc.Client - Retrying connect to server: tasktracker3/10.201.2.169:50000”

1 Answers