I am trying to reuse a static HashMap inside the setup method of a Hadoop job.
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

private static Map<Long, String> amostraTable = null; // class variable, kept across tasks that share a JVM

@Override
protected void setup(Context context) throws IOException, InterruptedException {
    if (amostraTable == null) {
        amostraTable = new HashMap<Long, String>();
        System.out.println("Hashmap allocated!");
    } else {
        System.out.println("Hashmap reused");
    }
}
I set mapreduce.job.jvm.numtasks=-1.
I just want to reuse the HashMap, but every mapper is logging: Hashmap allocated!
Is there any other parameter to set? The mapper tasks are spending too much CPU allocating and populating the HashMap.
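For context, here is roughly how I set the property in the driver (a sketch using the new mapreduce API; the job name is just a placeholder):

// inside the job driver
Configuration conf = new Configuration();
conf.setInt("mapreduce.job.jvm.numtasks", -1); // -1 is meant to allow unlimited tasks per JVM
Job job = Job.getInstance(conf, "my-job");     // job name is a placeholder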
EDIT: Look at this post: http://mail-archives.apache.org/mod_mbox/hadoop-mapreduce-user/201206.mbox/%3COFC497A21A.62B05EC6-ON85257A14.006F8FF6-85257A14.006FE8C7@freddiemac.com%3E
"If i understood correctly, then if I initialize a static variable (say var) in setup() and when mapper is started for the 2nd time on same JVM, the that var would be already initialized before setup() is called i.e it is retaining its value from previously run mapper. Is this the way ?"
EDIT: Both property names now report the same thing:
mapred.job.reuse.jvm.num.tasks: JVM reuse no longer supported
mapreduce.job.jvm.numtasks: JVM reuse no longer supported
EDIT: Hadoop 2.x does not support JVM reuse. So my first option is to use MultithreadedMapper and make my HashMap thread-safe. Is this a good option?
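Here is a minimal sketch of what I have in mind (the class name, thread count, and the lookup logic in map() are placeholders; the HashMap is swapped for a ConcurrentHashMap, and setup() is guarded with a lock because MultithreadedMapper runs one delegate mapper per thread):

import java.io.IOException;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.map.MultithreadedMapper;

public class SampleMapper extends Mapper<LongWritable, Text, LongWritable, Text> {

    // shared by all threads that MultithreadedMapper spawns inside one map task
    private static volatile Map<Long, String> amostraTable = null;

    @Override
    protected void setup(Context context) throws IOException, InterruptedException {
        // setup() runs once per thread, so guard the lazy initialization
        synchronized (SampleMapper.class) {
            if (amostraTable == null) {
                amostraTable = new ConcurrentHashMap<Long, String>();
                // ... populate the table here ...
                System.out.println("Hashmap allocated!");
            } else {
                System.out.println("Hashmap reused");
            }
        }
    }

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // placeholder lookup; the real logic is application-specific
        String hit = amostraTable.get(key.get());
        if (hit != null) {
            context.write(key, new Text(hit));
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "sample-join"); // name is a placeholder
        job.setJarByClass(SampleMapper.class);
        job.setMapperClass(MultithreadedMapper.class);               // mapper handed to the framework
        MultithreadedMapper.setMapperClass(job, SampleMapper.class); // delegate run by each thread
        MultithreadedMapper.setNumberOfThreads(job, 8);              // 8 threads is an arbitrary choice
        // input/output paths, key/value classes etc. omitted
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

As far as I understand, this only shares the table among the threads of a single map task; separate map tasks still run in separate JVMs, so each task would still pay the allocation cost once.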