0 votes

In a map-reduce job, I got the error "java.lang.OutOfMemoryError: Java heap space". Since the error occurs in a mapper function, I thought that lowering the input size per mapper would make it go away, so I changed mapred.max.split.size to a much lower value.
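For example, something like this (the exact value here is only illustrative, to show what I changed):

# illustrative: 16 MB splits instead of the default block-sized splits
mapred.max.split.size=16777216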

Then I started the job again and saw that the "number of mapper tasks to be executed" had increased, so I thought that lowering mapred.max.split.size was a good idea: more mappers, each with lower memory requirements.

BUT, I got the "java.lang.OutOfMemoryError: Java heap space" error again and again.

It seems that I did not understand how Hadoop works.

Any suggestions?

1
Can you share your mapper code? – Ashish
I am using Mahout's seq2sparse function. – ndemir

1 Answer

3 votes

You can increase the child JVM heap size with mapred.child.java.opts=-Xmx3000m (in newer APIs you can be more specific with mapreduce.map.java.opts). You can also tune each node by choosing how many map and reduce tasks run in parallel; this is controlled by the number of map and reduce slots available on a TaskTracker, for example:

mapred.tasktracker.map.tasks.maximum=7
mapred.tasktracker.reduce.tasks.maximum=3
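If you set the heap size in mapred-site.xml it applies to every job on the cluster; a minimal sketch (the -Xmx3000m value is just an example, size it to your nodes' RAM):

<property>
  <name>mapred.child.java.opts</name>
  <value>-Xmx3000m</value>
</property>

Keep in mind that heap size times slot count has to fit in physical memory: with the example values above, 7 + 3 = 10 slots at 3000m each could demand roughly 30 GB per node.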

There are more options, such as mapred.cluster.map.memory.mb=300 and mapred.job.map.memory.mb=600, but I don't think you will need them for now.