I am trying to run a Mahout item similarity job on an input of ~250 million pairs (rows) on an Amazon EMR cluster (m3.2xlarge, 10 core nodes). I am hitting a Java heap space error while running the similarity job.
Things I have tried to solve this issue:
- Increase the heap size of the name node by defining it in a bootstrap action, like this:
  --bootstrap-action s3://elasticmapreduce/bootstrap-actions/configure-daemons --args --namenode-heap-size=8192
- Use the memory-intensive bootstrap action recommended by AWS (s3://elasticmapreduce/bootstrap-actions/configurations/latest/memory-intensive).
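For reference, a full cluster launch combining both bootstrap actions might look like the sketch below. This is only an assumption of how the pieces fit together: the cluster name, AMI version, and instance count are placeholders, and the `--bootstrap-actions` shorthand syntax should be checked against your AWS CLI version.

```shell
# Hedged sketch: launch an AMI-based EMR cluster with both bootstrap
# actions applied. Name, AMI version, and instance count are examples.
aws emr create-cluster \
  --name "mahout-itemsimilarity" \
  --ami-version 3.11.0 \
  --instance-type m3.2xlarge \
  --instance-count 11 \
  --bootstrap-actions \
    Path=s3://elasticmapreduce/bootstrap-actions/configure-daemons,Args=["--namenode-heap-size=8192"] \
    Path=s3://elasticmapreduce/bootstrap-actions/configurations/latest/memory-intensive
```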
Set MAHOUT_HEAPSIZE manually.
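One thing worth noting here: as far as I know, MAHOUT_HEAPSIZE only raises the heap of the local driver JVM that the `bin/mahout` script launches; the map/reduce task JVMs on the cluster take their heap from `mapred.child.java.opts` instead. A sketch of setting both (the input/output paths and similarity class are placeholders for illustration):

```shell
# MAHOUT_HEAPSIZE (in MB) sets the heap of the client/driver JVM only.
export MAHOUT_HEAPSIZE=4096

# Hypothetical invocation: pass the task heap explicitly so the
# distributed similarity computation itself gets more memory.
mahout itemsimilarity \
  -Dmapred.child.java.opts=-Xmx4096m \
  --input s3://my-bucket/pairs \
  --output s3://my-bucket/similarity-out \
  --similarityClassname SIMILARITY_LOGLIKELIHOOD
```

If the OutOfMemoryError appears in the task logs rather than on the client, raising the task heap this way is the knob that matters.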
None of these solved the problem. Is there any other way to solve it?
Profile your Java application and you can see the behavior of your heap. Either it keeps growing until it reaches the limit (option 1), or it varies through the application in a strange manner (GC is not working properly). – Payam
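A minimal way to do that profiling on the node where the job runs, assuming the standard JDK tools (`jps`, `jstat`, `jmap`) are available; `<pid>` is a placeholder for the JVM's process id:

```shell
# List running Java processes to find the PID of the task or driver JVM.
jps -l

# Sample GC behavior every 5 seconds: if the old generation (O column)
# climbs steadily toward 100% even after full GCs, the heap is either
# too small for the working set or objects are being retained.
jstat -gcutil <pid> 5000

# Histogram of live objects, to see which classes fill the heap.
jmap -histo:live <pid> | head -20
```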