Hadoop Terasort unstable benchmark results

Question

I have a Cloudera Hadoop cluster and I'm benchmarking it with Terasort, but I'm getting very unstable results, ranging from 105 to 150 minutes. Sometimes I've seen it replicating more than usual or doing a lot of garbage collection, but at other times they were pretty much the same.

I don't know the reason for the unstable results; any hint or recommendation would be very welcome :)

I run the benchmarks as follows:

I've chosen the number of map and reduce tasks following this guide: http://wiki.apache.org/hadoop/HowManyMapsAndReduces

Speculative execution is off for both maps and reduces.

Generating dataset:

10,000,000,000 rows of 100 bytes ~= 953674 MB
Block size = 128 MB
Number of map tasks = 3725, i.e. (number-of-rows * row-size) / (block-size * 2). I multiply the block size by 2 because otherwise the map tasks were too short, around 7 seconds each.

sudo -u hdfs hadoop jar /usr/lib/hadoop-0.20-mapreduce/hadoop-examples.jar teragen -Ddfs.replication=3 -Dmapred.map.tasks=3725 10000000000 /terasort-in
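The sizing arithmetic above can be checked with a short sketch. It uses only the values stated in the question; the variable names are my own, and "MB" is taken to mean 1024 * 1024 bytes, which is what makes the figures above come out.

```python
# Sizing arithmetic from the question; every input is a value stated there.
ROWS = 10_000_000_000          # rows passed to teragen
ROW_SIZE = 100                 # bytes per row
BLOCK_SIZE = 128 * 1024**2     # 128 MB HDFS block, in bytes
BLOCKS_PER_MAP = 2             # the "times 2" factor used to lengthen each map task

dataset_bytes = ROWS * ROW_SIZE
dataset_mb = dataset_bytes // 1024**2                        # whole megabytes
map_tasks = dataset_bytes // (BLOCK_SIZE * BLOCKS_PER_MAP)   # rounded down

print(dataset_mb)   # 953674
print(map_tasks)    # 3725
```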

Running terasort:

num-of-worker-nodes = 4
num-of-cores-per-node = 8
Reduce tasks = 56 ( 1.75 * num-of-worker-nodes * num-of-cores-per-node )

sudo -u hdfs hadoop jar /usr/lib/hadoop-0.20-mapreduce/hadoop-examples.jar terasort -Ddfs.replication=1 -Dmapred.reduce.tasks=56 /terasort-in /terasort-out
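For reference, the reduce count above works out as follows. This is just the formula from the question written out, with cores per node standing in for per-node reduce capacity as the question does; the names are illustrative.

```python
# Reduce-task arithmetic from the question.
WORKER_NODES = 4
CORES_PER_NODE = 8
FACTOR = 1.75   # multiplier used in the question

reduce_tasks = int(FACTOR * WORKER_NODES * CORES_PER_NODE)
print(reduce_tasks)   # 56
```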

The service and role distribution among nodes is as follows:

6 nodes, each with 8 cores, 16 GB RAM and 2 hard disks, running just HDFS and MapReduce:

1st node, just master roles:
- Namenode.
- Cloudera management services.
2nd node, just master roles:
- JobTracker.
- SecondaryNamenode.
3rd to 6th nodes, just worker roles:
- TaskTracker.
- Datanode.

I use the 2nd node as the client because it is the one with the lowest load.

Please tell me if you need any configuration property value or detail.

Update: After Chris White's answer I tried to reduce the amount of polling between the JobTracker and the TaskTrackers by using just 1 worker and very few maps and reduces. Now the benchmarks are pretty stable :)

In answer to Chris White's question. 100% of the maps were local. — jpgerek

Chris White · Accepted Answer · 2013-11-07T11:30:54

There are many factors that you need to take into consideration when looking at performance:

This could be a polling problem combined with the small number of processing slots you have available.

The Task Trackers poll the running tasks periodically to determine whether they have finished, and the Job Tracker also polls the Task Trackers. With your ~3700 map tasks (if I've read your question correctly), a difference of, say, ~1 second in polling time per task could account for the ~hour of variation you are seeing in your timings.
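As a rough back-of-the-envelope check of that estimate, the sketch below adds up a 1-second delay over every map task. The slot count is an assumption of mine (4 workers with 8 map slots each); the question does not state it. How much of the summed delay actually reaches wall-clock time depends on how the delays overlap across slots, which this sketch does not model: it only gives the two extremes.

```python
# Rough bounds on the polling effect described in the answer.
MAP_TASKS = 3725          # map task count from the question
DELAY_PER_TASK_S = 1.0    # the "~1 second" polling difference from the answer
MAP_SLOTS = 32            # ASSUMPTION: 4 workers x 8 map slots, not stated in the question

# Extreme 1: the delays are paid one after another (no overlap at all).
serial_delay_min = MAP_TASKS * DELAY_PER_TASK_S / 60

# Extreme 2: the delays overlap perfectly across all concurrently running slots.
overlapped_delay_min = serial_delay_min / MAP_SLOTS

print(f"summed over all tasks: {serial_delay_min:.1f} min")
print(f"perfectly overlapped:  {overlapped_delay_min:.1f} min")
```

The first figure is about an hour, which matches the order of magnitude quoted in the answer; the second shows why more slots, or far fewer tasks, would be expected to shrink the effect.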

If you had a larger cluster with more processing slots, I imagine this number would become more stable, but no MR job will ever have a constant running time: there are too many polling and other external timings (JVM start-up time, for example) that can affect the overall runtime.

What did the data locality counters say for both jobs? If one job had considerably more data-local tasks than the other, then I would expect it to run faster too.
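One way to compare two runs on this point is to turn the job counters into a data-local fraction. The sketch below assumes the counter values have been copied by hand from the job's counter page into dictionaries. The counter names are the MRv1 display names as I recall them and should be checked against your own output, and the numbers are made up for illustration.

```python
def data_local_fraction(counters):
    """Fraction of launched map tasks that ran on a node holding their input block."""
    launched = counters["Launched map tasks"]
    local = counters.get("Data-local map tasks", 0)
    return local / launched if launched else 0.0

# Hypothetical counter values for two runs (not taken from the question).
run_a = {"Launched map tasks": 3725, "Data-local map tasks": 3725}
run_b = {"Launched map tasks": 3725, "Data-local map tasks": 3100,
         "Rack-local map tasks": 625}

for name, counters in (("run A", run_a), ("run B", run_b)):
    print(f"{name}: {data_local_fraction(counters):.1%} data-local maps")
```

If both runs report close to 100% here, locality is unlikely to explain the difference in running time.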
