I have simple mappers and the following simple reducer (it joins two large tables on one field):
protected void reduce(StringLongCompositeKey key, Iterable<Text> values, Context context)
        throws IOException, InterruptedException {
    for (Text text : values) {
        // do some operations with one record and then emit it using context.write,
        // so nothing is stored in memory; one text record is small (no more than 1000 chars)
    }
}
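In full it is essentially the following (a sketch only: the class name JoinReducer is made up for this post, StringLongCompositeKey is my own composite key class, and the real per-record processing is simplified here to a plain pass-through emit):

import java.io.IOException;

import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class JoinReducer extends Reducer<StringLongCompositeKey, Text, Text, Text> {

    private final Text outKey = new Text();

    @Override
    protected void reduce(StringLongCompositeKey key, Iterable<Text> values, Context context)
            throws IOException, InterruptedException {
        outKey.set(key.toString());          // the join field as the output key
        for (Text text : values) {
            // only the current record is touched; it is written out immediately
            // and never added to any in-memory collection
            context.write(outKey, text);
        }
    }
}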
but I got the following error:
14/09/25 17:54:59 INFO mapreduce.Job: map 100% reduce 28%
14/09/25 17:57:14 INFO mapreduce.Job: Task Id : attempt_1410255753549_9772_r_000020_0, Status : FAILED
Container [pid=24481,containerID=container_1410255753549_9772_01_001594] is running beyond physical memory limits. Current usage: 4.1 GB of 4 GB physical memory used; 4.8 GB of 8.4 GB virtual memory used. Killing container.
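For reference, the 4 GB physical limit in that message is, as I understand it, the reduce container size configured for the job. A sketch of how it would be set in the driver, assuming Hadoop 2.x / YARN property names (the class name JoinDriver and the concrete values are just an illustration, not what our cluster actually uses):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class JoinDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // physical memory limit of each reduce container (the 4 GB in the error message)
        conf.setInt("mapreduce.reduce.memory.mb", 4096);
        // heap for the reducer JVM inside that container, kept somewhat below the container size
        conf.set("mapreduce.reduce.java.opts", "-Xmx3276m");
        Job job = Job.getInstance(conf, "two-table join");
        // ... set jar, mapper/reducer classes, key/value types, input/output paths,
        // then System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}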
There is one nuance :-)
Iterable<Text> values
is very long! As I assumed before, and still believe, that Iterable loads the next record on demand, so it shouldn't be a problem for Hadoop to process it without consuming a lot of RAM.
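To make that concrete: the only way I can see the values of one key costing real memory is if they were buffered, which my reducer does not do. A hypothetical contrast (the class name BufferingJoinReducer is made up; the new Text(text) copy is needed there because, as far as I know, the framework reuses the Text instance it hands out):

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

// Hypothetical contrast only: buffering the whole group needs memory proportional
// to the number of values per key, so a very long group could genuinely exceed
// the container limit. My reducer does NOT do this.
public class BufferingJoinReducer extends Reducer<StringLongCompositeKey, Text, Text, Text> {
    @Override
    protected void reduce(StringLongCompositeKey key, Iterable<Text> values, Context context)
            throws IOException, InterruptedException {
        List<Text> buffered = new ArrayList<>();
        for (Text text : values) {
            buffered.add(new Text(text)); // copy, since the framework reuses the Text object
        }
        // ... join logic over the buffered list, then context.write(...) for each result
    }
}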
Could this error appear during shuffling or sorting? Is there anything special to know about processing very long value sequences?